HELSINKI UNIVERSITY OF TECHNOLOGY Department of Automation and Systems Technology

System Analysis Laboratory

Olli Väyrynen

Identifying Undervalued Stocks with Multiple Financial Ratios

Master’s thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Technology

Espoo, Finland, December 2, 2007

Supervisor: Professor Ahti Salo
Instructor: Ph.D. Hannes Kulvik

ABSTRACT OF THE MASTER’S THESIS

Author: Olli Väyrynen
Date: 3.12.2007
Number of pages: 86
Major subject: Systems and operations research
Minor subject: Strategy and international business
Title: Identifying Undervalued Stocks with Multiple Financial Ratios
Title in Finnish: Aliarvostettujen osakkeiden identifiointi tunnuslukujen perusteella
Chair: Mat-2. Systems and operations research
Supervisor: Professor Ahti Salo
Instructor: Ph.D. Hannes Kulvik

Abstract:

The process of determining the current value of a company is of interest to legislators, company management, asset managers and individuals. There are numerous methods for determining the value; some of them are subjective and some more objective. Public companies announce their results according to certain economic legislation in order to improve the transparency of their businesses. At worst, a lack of integrity and transparency in profit calculation, combined with unethical corporate governance, leads to severe financial collapses. A company’s value lies in its potential to generate a stream of profits in the future.

The goal of this Thesis is to form an accurate model to identify undervalued stocks. The identification is based on eight financial ratios, so the analysis is multivariate in nature. The valuation of stocks is based on the dividend discount model (DDM), using actual cash flow data.
Stocks are then classified as undervalued or overvalued with linear discriminant analysis (LDA), which is widely used in studies of corporate performance. Different ratio combinations are evaluated in order to find the most discriminating ratio profile. The statistical assumptions of discriminant analysis are examined in depth, as they influence both the statistical and practical significance levels as well as the prediction capability of the model. Based on this Thesis, the LDA-based multivariate identification of undervalued stocks has some predictive capability. As expected, the predictive capability deteriorates substantially when predicting to other sectors or to other periods of time. Overall, more research is needed to develop the model so that it can be utilized in practice.

Keywords: discriminant analysis, stock valuation, financial ratio analysis, bankruptcy prediction

TEKNILLINEN KORKEAKOULU (Helsinki University of Technology)
Department of Automation and Systems Technology

ABSTRACT OF THE MASTER’S THESIS (translated from the Finnish abstract, DIPLOMITYÖN TIIVISTELMÄ)

Author: Olli Väyrynen
Date: 3.12.2007
Number of pages: 86
Major subject: Systems and operations research
Minor subject: Corporate strategy and international business
Title: Aliarvostettujen osakkeiden identifiointi tunnuslukujen perusteella (Identifying undervalued stocks on the basis of financial ratios)
Chair: Mat-2. Systems and operations research
Supervisor: Professor Ahti Salo
Instructor: D.Sc. (Tech.) Hannes Kulvik

Abstract:

Legislators, company management, portfolio managers and private individuals are all interested in determining the present value of a company. There are numerous ways to determine the value of a company; some of them are subjective and others more objective. Public limited companies report their results in accordance with certain financial legislation, the purpose of which is to improve the transparency of business. Inconsistent and opaque profit calculation together with unethical corporate governance has caused severe financial collapses during recent decades. A company’s value lies in its potential to generate profit in the future.

The goal of the thesis is to form a usable model for identifying undervalued stocks. The identification is based on eight financial ratios, so the study is multivariate. The valuation of stocks is based on the dividend discount model (DDM), which uses realized cash flows. Stocks are classified as undervalued or overvalued with linear discriminant analysis (LDA), which is widely used in assessing company performance. To find the best discrimination, different ratio combinations are evaluated with analytical methods. The statistical assumptions of discriminant analysis are examined thoroughly, because they affect the predictive capability of the model as well as both the statistical and practical significance levels. On the basis of this study, multivariate identification of undervalued stocks based on linear discriminant analysis showed some predictive capability. As expected, the predictive capability deteriorates considerably when predicting to another sector or another period of time. All in all, further research is needed to develop the model so that it could be utilized in practice.

Keywords: discriminant analysis, stock pricing, financial ratio analysis, bankruptcy prediction

Preface

I would like to thank Sifterfund for giving me the opportunity to write this Thesis. I would especially like to thank my instructor Hannes Kulvik for his guidance and interest, and Professor Ahti Salo for his comments and guidance. Finally, thanks to my parents, friends and Maria for their support.
Espoo, 3.12.2007

Olli Väyrynen

Contents

1 Introduction
1.1 Problem context
1.2 Research objectives
1.3 Scope
1.4 Structure
2 Stock valuation
2.1 Financial ratio analysis
2.1.1 Price to earnings ratio (PE)
2.1.2 Earnings before interests and taxes margin (EBITM)
2.1.3 Cash flow to price yield (CPY)
2.1.4 Free cash flow to price yield (FCPY)
2.1.5 Return on capital employed (ROCE)
2.1.6 Price to book value ratio (PB)
2.1.7 Price to sales ratio (PS)
2.1.8 Gearing ratio (GEA)
2.2 Dividend discount model
2.2.1 Discount factor
3 Bankruptcy prediction models
3.1 Statistical Models
3.1.1 Linear Discriminant Analysis (LDA)
3.1.2 Logit models
3.2 Artificial intelligent expert system models (AIESM)
3.2.1 Neural networks
3.3 Models based on economic theories
4 Empirical analysis
4.1 Valuation phase
4.2 Basic sample, hold-out sample and external sample
4.3 Selection procedure of companies
4.4 Approaches to sample forming
4.5 Ratio profile
5 Results on predictive accuracy
5.1 Mean values and variances in the basic sample
5.2 Variable correlations
5.3 Variable normality
5.4 Equality of covariance matrixes
5.5 Classification capability
5.6 Variable influence section
5.7 Prediction to hold-out sample
5.8 Prediction to another sector
6 Sensitivity analysis
6.1 Excluding outliers
6.1.1 Normality
6.1.2 Equality of covariance matrixes
6.1.3 Classification capability
6.1.4 Variable influence
6.1.5 Out-of-sample prediction capability
6.2 Variable transformations
6.2.1 Transformations and the basic sample
6.2.2 Transformations and the trimmed sample
6.3 Superior ratio profile
6.3.1 The proposed models for classifying and predicting
7 Discussion and conclusions
7.1 Evaluation of tested approaches
7.2 Suggestions for further study
8 References

1 Introduction

1.1 Problem context

During the last three decades, financial markets have experienced an upheaval as computers have gained capacity and financial information has become accessible in a moment. Financial institutions compete with each other by offering the most accurate and relevant financial information. Ever more sophisticated decision support systems are built on mathematical models and forecasts.

People with many different backgrounds value securities for a variety of purposes. A common purpose is to benefit from knowing the real value of a security. People valuing stocks can be divided into six broad categories (Hoover 2006) according to their objectives:

Corporate managers benefit from knowing the value of their company, especially when making strategic decisions about raising money, because in times of overvaluation a company can raise larger amounts of money in a share issue. They also benefit from knowing the values of other companies in case of an acquisition or a merger.

Financial analysts in investment banking help corporate managers to carry out additional share issues, acquisitions and mergers.
Financial analysts in equity research track companies and give recommendations to public and/or private investors, although there is little empirical evidence that investors can earn abnormal profits from the information offered by equity researchers.

Asset managers are professional investors who manage the funds of individuals and organizations in order to earn abnormal profits. In order to “beat the market”, asset managers continuously seek and act on market misvaluation of securities.

Individuals are rarely recommended to pick stocks; instead, they are encouraged to invest in indices or funds. Still, stock picking is often seen as an enjoyable hobby that brings excitement.

Economic policymakers value stock markets as a whole in order to observe the stability of the markets. They have to be up to date as they make decisions about interest rates and money supply and as they enact laws.

The two extremes of security valuation are technical analysis and fundamental analysis (Ross et al. 1999). Technical analysis (Murphy 1999) disregards the underlying business and values securities by statistics, such as past prices and volumes. Advocates of technical analysis predict market prices and movements based on the dynamics of market price and volume, rather than on the fundamentals of the corporation. Fundamental analysis strives to measure the intrinsic value of a security by studying all the information that can affect the value. Both microeconomic and macroeconomic factors are considered in fundamental analysis. Many of the best-performing investors support fundamental analysis.

Stocks are commonly given scores on the basis of their fundamentals. Nowadays, various mathematical tools are used to aid in valuing and picking stocks. Financial ratios are interpreted and compared to assess the intrinsic value of stocks.
A classical scoring method is linear discriminant analysis (Altman 1968; Lachenbruch 1975; Sharma 1996), which classifies stocks into two or more groups based on multiple financial ratios. The most recent models for stock valuation and classification are neural network models (Brocket et al. 1994, 2006; Zapranis & Ginoglou 2000).

In this Thesis, stock valuation and classification are studied from the asset manager’s point of view. Sifterfund supplied access to Bloomberg for gathering the data for the Thesis. The data covers two sub-sectors of the US manufacturing sector: industrial manufacturing and non-cyclical consumer goods. The data is gathered from the years 1989-2006.

1.2 Research objectives

The Thesis focuses on the relationship between stock valuation and financial ratios. The primary objective is to form a model that classifies stocks into groups of undervalued and overvalued based on their financial ratios. The underlying idea is that the valuation level of a stock can be assessed from its financial ratios. The valuation level is not apparent in any single ratio, so a multivariate technique is required.

The Thesis concentrates on two-group discriminant analysis as a method for group classification and prediction. The classification capability is evaluated from both a practical and a statistical point of view. The classification capability is validated by testing the prediction capability within the same sector as well as in another sector. Therefore, three samples are carefully formed: the basic and hold-out samples from the industrial sector and an additional sample from the non-cyclical consumer goods sector. The basic sample is used for fitting the model and the other two for testing the prediction capability. The suitability and limitations of discriminant analysis for the identification of undervalued stocks are inspected as well.

An important phase in the formation of the model is the definition of the ratio profile to be used.
Thus, various ratio profiles will be tested in order to find the most suitable one. The candidate ratios were agreed with the instructor. As the ultimate goal is to achieve the highest possible classification and prediction rates, the statistical assumptions of discriminant analysis are examined in depth.

Other, secondary objectives include a literature review, an assessment of the prudence of the Thesis in general, and the identification of relevant areas for further study.

1.3 Scope

The main constraint in this work is that only discriminant analysis is used, because the scope of the Thesis was estimated to be suitable with discriminant analysis alone.

The number of financial ratios in the Thesis is eight. The eight ratios were agreed with the instructor as the most relevant ones in this regard. The limitation is reasonable because a larger number of candidates would require an additional model for ratio selection. Originally there were nine ratios, but dividend yield was excluded because of its partially continuous nature. The exclusion was also supported by its correlation with the valuation method: the dividends from the following seven years affect the valuation of a stock. No qualitative measures are included in the Thesis either.

The Thesis is limited to two sub-sectors of the US manufacturing sector: industrial and non-cyclical consumer goods. The two sub-sectors were selected because they are considered to have low volatility and because of the popularity of the manufacturing sector in earlier research. The market capitalization is limited to roughly the mid-cap range, from $300m to $3bn. Larger companies are deemed to be somewhat self-sustaining and therefore not characteristic of the markets. Smaller companies, in turn, tend to suffer from overall uncertainty.

From a practical implementation point of view, the largest constraint is the valuation method.
The valuation method uses realized cash flows from the following seven years, which impedes the direct implementation of the model in practice. A satisfactory prediction capability seven years into the future is not realistic.

The valuation phase and the sample formation phase both decrease sample sizes, because observations with incomplete history data are excluded. The financial ratio history data was commonly available from 1989 onwards. The actual valuation begins from the year 1991 in order to increase the sample sizes.

1.4 Structure

The literature survey of stock valuation consists of financial ratio analysis, the dividend discount model and bankruptcy prediction models. The financial ratio analysis concentrates on individual financial ratios and their characteristics. The dividend discount model is reviewed with a focus on the discount factor and the time span to be used in the valuation. The chapter on bankruptcy prediction models covers the various models used in company performance research.

The empirical data section consists of the valuation and sample forming procedures. This section is important because the rest of the work is based on the samples formed here. The samples can be formed in endless ways, and the choice largely depends on the characteristics of the data at hand.

The results section first analyses the results from the discriminant analysis. The Thesis continues with a sensitivity analysis, in which history data is processed and trimmed and various ratio profiles are analyzed. The results section culminates in the superior discriminating functions. The final section assesses the analysis critically and forms suggestions for further study.

The overall research process is shown in Figure 1, which illustrates the crucial points in the research. The analysis diagram can be used as a guide through the Thesis.

Figure 1 Analysis diagram.
2 Stock valuation

The starting point for asset valuation based on fundamentals is that the present value of an asset depends on its future cash flows. Stocks, for example, provide two kinds of cash flows: dividends and the sale price at the end (Ross et al. 1999). If the valuation concerns bonds, coupons are received, and if real projects are valued, after-tax cash flows are discounted to the present value. Summing up the discounted future cash flows yields the discounted cash flow model (DCF), which is the same regardless of the type of asset.

The subjectivity of fundamental analysis often crystallizes when an investor realizes that the portfolio holds only quality companies. It is a natural tendency to analyze and choose only high-quality companies, because the markets, that is supply and demand, are defined by human behavior. The demand for quality stocks can be viewed as substantially high and the demand for low-quality stocks as low. This can create a wide gap between the market prices and the real values of stocks. After all, stock picking is largely about timing and understanding the behavior of other people in the markets.

Stracca made a comprehensive review of behavioral finance (Stracca 2004) and described it as the most promising field of economics at the moment. Conclusive evidence that behavioral sciences help in outperforming the market is yet to be provided. On the other hand, many studies provide evidence that the market functions rather irrationally. To be exact, behavioral finance scientifically studies human and social cognitive and emotional biases to better understand economic decision-making under uncertainty. It also studies how the biases affect market prices and the allocation of resources.

The Efficient Market Hypothesis (EMH) asserts that the prices of traded assets are unbiased and reflect all the information available. EMH was introduced by Eugene Fama in 1970 (Fama 1970) and it is one of the cornerstones of the modern theories of finance.
According to EMH, it is not possible to consistently outperform the market with the information already available on the market. The three forms of EMH are the weak, semi-strong and strong form. Weak-form efficiency implies that technical analysis will not yield excess returns in the long run. Semi-strong efficiency implies that fundamental analysis cannot yield excess returns in the long run either. Strong-form efficiency, in turn, implies that security prices reflect all the information available and no one can earn excess returns. In an efficient market, an above-average return has more to do with luck than skill.

There are numerous studies and statements for and against EMH. In the case of strong-form EMH, analyzing information would not benefit anyone. Weak-form EMH is hardly the case, because the majority of active asset managers under-perform their appropriate benchmarks.

The pervasive concepts of financial literature are value investing and growth investing (Hoover 2006). Value investing is an investment strategy that favors stocks that are undervalued, for example because of an overreaction to news flow. Growth investing is an investment strategy that favors stocks that are expected to achieve above-average earnings growth compared to the market. As the EMH implies: “If the market prices stocks accurately, there is no consistent advantage in choosing between one type of stocks over another” (Hoover 2006). Growth investors typically screen out low PE companies and believe that they are able to predict high-growth phases for companies. In contrast, value investors typically screen out high PE companies, and their idea is to identify undervalued stocks with relatively slower growth. The academic community has generally come to agree that value investing is the better performing strategy of the two (Chan & Lakonishok 2004). The two popular investing strategies have different starting points but the same objectives, so they are not so different after all.
As Warren Buffett puts it: “Growth and Value Investing are joined at the hip”.

2.1 Financial ratio analysis

One of the most common ways of assessing the relative values of stocks among practitioners is to compare financial ratios. The main advantage of using financial ratios instead of amounts from the income statement is that they are independent of the size of the company. Thus, financial ratios allow an elegant comparison of securities.

Academicians have been studying financial ratios widely for almost a century. As computers developed and financial reporting became more regular, statistics emerged as a notable way of studying security valuation. Statistics are useful with large sample sizes. Edward I. Altman said in 1968 (Altman 1968): “Academicians seem to be moving toward the elimination of ratio analysis as an analytical technique in assessing the performance of the business enterprise”. Despite Altman’s argument, researchers have been publishing papers on this matter frequently (Dimitras et al. 1996). Although there is only a little formal empirical evidence that financial ratios help in picking well-performing stocks (Adnan & Dar 2006), ratio analysis is widely used and agreed to be useful (Campbell & Shiller 1998).

The comparison of financial ratios is used to assess a company’s financial condition, operations and attractiveness as an investment. Based on their characteristics, the ratios can be divided into five categories:

Leverage ratios (or gearing) show the extent of long-term debt in the capital structure of a company.

Liquidity ratios indicate a company’s ability to pay off its short-term debt obligations.

Operational ratios indicate how efficiently a company uses its assets to generate profit.

Profitability ratios show a company’s ability to generate profit relative to sales, the relevant costs or capital.

Solvency ratios give a picture of a company’s ability to generate cash flow to pay its financial obligations with available cash.
The eight potential ratios were agreed with the instructor and they are examined below.

2.1.1 Price to earnings ratio (PE)

The most popular valuation ratio is the price to earnings ratio (PE), which is usually the first thing to be examined about a security. PE can also be seen in the news as institutions assess the economy as a whole. Stock exchanges are also valued as a whole, for example as “valued above historical average”. Practitioners rely heavily on the PE ratio.

Technology and other volatile stocks generally sell at high PE ratios because they are expected to grow fast in the future. Valuation levels can get out of hand, as was seen at the beginning of the millennium when the internet bubble burst. Small, loss-making internet companies could be valued tens of times higher than, for example, stable manufacturing companies because of the growth opportunities investors believed the companies would have. Billions and billions of dollars disappeared as the market corrected itself.

PE ratios are sector specific, and a PE comparison among the companies in a sector gives a fast, tentative estimate of the valuation level of the companies. The general PE levels also vary from country to country. For example, in Japan (Ross et al. 1999) the average multiple for the Tokyo Stock Exchange has been 40-100, while in America it has been around 25. This suggests huge and constant growth opportunities for Japanese companies, but it can also depend on the culture and on the level the market has become used to. The PE trend in Japan is downward sloping, probably because companies are becoming ever more multinational.

In addition to growth opportunities (Ross et al. 1999), the PE ratio can be high because of low risk or because earnings are accounted for in a conservative manner, the first reason being the most important. Usually the earnings figure is from the last four quarters (trailing) or the expected next four quarters, but sometimes the two past quarters are combined with predictions for the two future quarters.
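As a minimal sketch of the trailing convention described above, the snippet below sums the four most recent quarterly earnings-per-share figures and divides the share price by that sum. The function name, the input values and the choice to return None for non-positive earnings are illustrative assumptions, not part of the Thesis.

```python
def trailing_pe(share_price, quarterly_eps):
    """Trailing PE: share price divided by the sum of the last four quarterly EPS.

    Returning None for non-positive earnings is an assumption made here,
    since a PE multiple is not meaningful for loss-making companies.
    """
    if len(quarterly_eps) != 4:
        raise ValueError("expected exactly four quarterly EPS figures")
    eps = sum(quarterly_eps)
    if eps <= 0:
        return None
    return share_price / eps


# Hypothetical company: price 50.0 and EPS of 0.5 in each of the last four quarters.
example_pe = trailing_pe(50.0, [0.5, 0.5, 0.5, 0.5])
print(example_pe)
```

A forward-looking variant would pass expected quarterly figures instead of reported ones; the arithmetic stays the same.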
Earnings typically refer to after-tax net income, which is the ultimate success factor for businesses. The PE ratio is calculated according to the equation

\[ \mathrm{PE} = \frac{\text{share price}}{\text{earnings per share}} \tag{1} \]

2.1.2 Earnings before interests and taxes margin (EBITM)

Earnings before interests and taxes (EBIT) is a very popular financial figure indicating the profitability of a company. The difference from the earnings used in the PE ratio is that interest expenses and taxes are not deducted from operating income. Corporate management has a rather wide margin for adjusting EBIT because of the amortization of intangibles and the depreciation of tangibles. Yet one has to keep in mind that expenses incurred from the firm’s capital structure do not affect EBIT, and thus it cannot be considered in isolation. Another pitfall occurs with research and development expenses because, for example, technology companies treat them as an operating expense, although R&D is the single most important capital expenditure in a technology company (Damodaran 2001). Another phenomenon is that company management postpones earnings that exceed analyst estimates.

EBITM is EBIT compared to net sales, and it is also called the operating margin. It indicates how effective a company is at controlling the costs and expenses associated with its normal business operations. The ratio is calculated as

\[ \mathrm{EBITM} = \frac{\text{revenue} - \text{operating expenses}}{\text{net sales}}, \tag{2} \]

where revenue less operating expenses is also equal to the sum of earnings, interest expenses and taxes.

2.1.3 Cash flow to price yield (CPY)

CPY is a reliable measure of the sustainability of a business because cash is concrete, in contrast to PE, which is easily manipulated. Cash either comes in or it does not. If a company books a cost one year ahead, earnings weaken by that amount but cash flows are unchanged until the next year. Many practitioners and investor gurus place a great deal of emphasis on CPY. Jing Liu (Liu et al.
2007) recently studied the difference between EPS (which, scaled by price, is the inverse of the PE ratio) and CPY in the context of stock valuation and concluded that, in general, EPS outperformed CPY. He used estimates instead of reported financials and also argued that the estimates outperformed the reported financials. However, a logical explanation for the popularity of CPY among investor gurus might be the clarity that the measure offers when assessing individual companies. There has been a growing trend of analysts making cash flow forecasts, especially for firms with, for example, poor financial health or high earnings volatility (Defond 2003). Cash flow forecasts assist in interpreting earnings and assessing firm viability. CPY is calculated as

\[ \mathrm{CPY} = \frac{\text{operating cash flow}}{\text{common shares outstanding}} \tag{3} \]

Operating cash flow is the difference between the revenue from the products/services (operating revenue) and the costs incurred from producing the products/services in question (operating costs). Operating cash flow (OCF) equals the sum of EBIT and depreciation less taxes.

2.1.4 Free cash flow to price yield (FCPY)

Free cash flow (FCF) is a measure of financial performance representing the cash that is left after the costs of maintaining the company’s asset base. It is calculated by deducting capital expenditures from operational income, and the yield is obtained as below:

\[ \mathrm{FCPY} = \frac{\text{free cash flow per share}}{\text{current market price per share}} \tag{4} \]

Free cash flow is the sum of net income and amortization/depreciation less changes in working capital and capital expenditures. Free cash flow is another concrete measure of a company’s ability to generate profits, in addition to cash flow. Even profitable businesses can have negative cash flow if they face increased financing costs from additional capital.
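The cash flow definitions above can be sketched in a few lines of code. The functions follow the verbal definitions of OCF and FCF and equations (3) and (4) as they are written in the text; the function names and the input figures are hypothetical and serve only to show the arithmetic.

```python
def operating_cash_flow(ebit, depreciation, taxes):
    """OCF as defined in the text: EBIT plus depreciation less taxes."""
    return ebit + depreciation - taxes


def free_cash_flow(net_income, depreciation_amortization,
                   change_in_working_capital, capital_expenditures):
    """FCF as defined in the text: net income plus amortization/depreciation,
    less changes in working capital and capital expenditures."""
    return (net_income + depreciation_amortization
            - change_in_working_capital - capital_expenditures)


def cpy(ocf, shares_outstanding):
    """Equation (3) as written: operating cash flow over common shares outstanding."""
    return ocf / shares_outstanding


def fcpy(fcf, shares_outstanding, price_per_share):
    """Equation (4): free cash flow per share over current market price per share."""
    return (fcf / shares_outstanding) / price_per_share


# Hypothetical figures (millions, except the share price).
ocf = operating_cash_flow(ebit=120.0, depreciation=30.0, taxes=25.0)
fcf = free_cash_flow(net_income=80.0, depreciation_amortization=30.0,
                     change_in_working_capital=10.0, capital_expenditures=40.0)
print(ocf, fcf, cpy(ocf, 20.0), fcpy(fcf, 20.0, 15.0))
```

With these inputs the free cash flow is smaller than the operating cash flow, which illustrates why FCF is the stricter of the two measures.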
The difference between OCF and FCF is that FCF is stricter: it takes into account changes in working capital and capital expenditures to reveal the hard cash that the company has left after all the costs the business requires. Amortization/depreciation is added back because FCF measures the cash flow at that moment, so the effects of investments made in past years are eliminated. Investors are quite interested in FCF because the growth of a company requires cash and, even more importantly, the stream of dividends is paid in hard cash as well. When a stock price is relatively low and FCF is rising steadily, a profitable investment opportunity might have arisen. If the company is not wasting the incoming money, earnings will eventually rise. Conversely, if FCF weakens for too long, the company will face liquidity problems and become indebted. FCF is the cash that can be used to invest in and upgrade businesses. Excessive shareholder rewards can deplete the FCF, in which case considerably more expensive money has to be borrowed from outside the firm, increasing risk and lowering future cash flows. The interests of corporate managers and shareholders conflict in major ways, and these conflicts have drawn some attention in the academic community (Jensen 1986). It is also claimed that a high payout ratio reduces managers’ power because it reduces the resources they are in charge of, which might give managers an incentive to grow the company beyond its optimal size (Jensen 1986). The bias develops further because managers’ compensation is positively related to the growth in sales (Murphy 1985). The free cash flow hypothesis claims that high levels of FCF lead to wasteful activities by the management (Ross et al. 1999). According to the hypothesis, without excess cash, management operates as if in a riskier situation and thus avoids projects with negative net present value (NPV) (Mitra et al. 1991).
The hypothesis supports debt financing as the principal source of funds, because interest payments reduce the free cash flow and thereby the opportunity for managers to waste resources. According to a survey of the US oil industry supporting the free cash flow hypothesis (Griffin 1988), the oil industry changed considerably in the early 1980s through mergers and share buybacks. Market values increased even though debt to equity ratios increased substantially, meaning that the markets viewed increased debt as beneficial.

2.1.5 Return on capital employed (ROCE)

ROCE measures the profitability of a company’s capital investments. The ratio is defined as

$$ ROCE = \frac{EBIT}{TOTAL\_ASSETS - CURRENT\_LIABILITIES}. \tag{5} $$

Capital employed includes fixed tangible assets, other operating assets and working capital. In other words, capital employed is the value of all the assets employed in a business. ROCE is closely related to return on equity (ROE), which is the ratio between net income and average stockholders’ equity. The difference from ROCE is that in ROE interest and taxes have been subtracted to arrive at net income, and long-term liabilities are also subtracted from total assets. ROCE is a much overlooked ratio, possibly because it is not as intuitive as many others, but it is useful for assessing the efficiency of a company’s capital investments. A public company has to raise capital to achieve higher returns, and ROCE measures the company’s ability to earn operating profit on operating assets. As a rule of thumb, ROCE should always be higher than the rate at which the company borrows. A stable history of high ROCE suggests high growth for a company. ROCE is especially essential for capital intensive companies because huge sums of money are needed for investments and, once again, it is vital to invest in order to grow.
On the other hand, as Helfert (Helfert 2001) puts it, ROCE “does not, however, relate well to economic measures used in judging new investments, nor does it assist in making day-to-day decisions on an economic basis”. Also, ROCE has a tendency to rise even when cash flow stays the same, because assets are being depreciated all the time. This is not a flaw as such, because companies increase debt rather frequently. When studying ROCE, long-run averages should be used for assets. Another point worth considering with ROCE is that inflation increases revenues but does not affect the book value of assets, which might increase the ratio substantially in times of high inflation. Andersson (Andersson 2006) reports an interesting statistic about the ROCE of the 158 survivor companies of the S&P 500 (those in the index from 1980 until 2003): it has been stable at around 12 % for over two decades. In the recession of 1990 ROCE fell below 10 %, and in the internet crash of 2001 it dropped to 5 %. Yet, it seems to recover quickly.

2.1.6 Price to book value ratio (PB)

Price to book value is the intuitive comparison of the market capitalization and the book value of the company in the balance sheet. There are slightly varying ways of defining the book value, but the basic way is to use share capital, which is the difference between total assets and total liabilities. Retained earnings are included in the equation because they are profit retained in the company after paying the shareholders, so they really are a tangible asset. The PB ratio is

$$ PB = \frac{MARKET\_PRICE}{TOTAL\_ASSETS - TOTAL\_LIABILITIES + RETAINED\_EARNINGS}. \tag{6} $$

Book value manipulation is possible because plant depreciates and the management can choose the pace at which it is depreciated. Annual depreciation rates are regulated, but companies can adjust depreciation according to their results. Old buildings owned by a company are commonly complicated to value: they can be depreciated to “worthless” in the balance sheet while in reality they might be worth millions.
Conversely, some equipment can be depreciated for a couple of years and still have value in the balance sheet, although nobody would buy the used equipment. If a company is trading below its book value, it is usually thought of as cheap. PB ratios are usually low in capital intensive industries, such as engineering and the metal industry, because these industries are not expected to grow rapidly in the future and investments are time-consuming processes. The PB ratio is less meaningful for companies that possess hidden assets, such as intellectual property, which are not reflected in the book value. Penman (Penman 1996) nominated PB as an appropriate indicator of earnings growth (and argued that PE alone is not sufficient) because PB is unaffected by current profitability. Fama and French (Fama & French 1992, 1995) show that firms with low PB have persistently low earnings, high financial leverage and are more likely to cut dividends than companies with high PB.

2.1.7 Price to sales ratio (PS)

The price to sales ratio values a stock by dividing the market price by the trailing 12-month revenue. The ratio does not take capital structure into account, so only similar companies should be compared. For similar companies, say companies with a similar capital composition, the price to sales ratio tells a lot about a company’s ability to generate revenue and about how much the markets value every dollar of the company’s sales. The price to sales ratio is very handy in cases where large-scale costs occur and the PE ratio becomes useless because earnings may fall even to a negative level. A company might have been investing heavily while its revenues are rocketing, so valuation ratios should be studied in a multivariate manner. The PS ratio is

$$ PS = \frac{CURRENT\_MARKET\_PRICE}{12\_MONTH\_TRAILING\_REVENUE}. \tag{7} $$

One should be careful with revenues, because they are sometimes reported as net revenues, meaning that cash discounts have been subtracted.
Some practitioners consider a relatively low price to sales ratio combined with a rising stock price to be an investment opportunity for a growth stock. A warning sign, on the other hand, might be rising receivables even though revenue growth is strong, because then revenues are not being collected. PS is suggested to be a stable stock price predictor, but the PE ratio outperforms it in most cases (Senchack & Martin 1987; Park & Lee 2003).

2.1.8 Gearing ratio (GEA)

Gearing is a financial ratio describing the level of a company’s debt compared to its share capital. The gearing equation below indicates the degree to which the firm is funded by creditors’ rather than owners’ money. High levels of gearing are considered risky but, on the other hand, organic growth is not enough in most cases and financial leverage is required to survive. The gearing level must be considered in relation to the company’s peers, and substantially high gearing should be regarded as risky because, in an economic downturn, debt service causes serious risk for the company. The gearing ratio is

$$ GEARING = \frac{NET\_DEBT}{SHARE\_CAPITAL} \cdot 100. \tag{8} $$

Net debt is total debt less liquid assets, that is, cash and assets that can be converted to cash immediately, such as savings deposits, certificates of deposit, money market accounts and money market mutual funds. Capital intensive industries, such as the automobile industry, tend to have gearing ratios as high as 2 (200 %), compared to ratios well below 1 (100 %) in less capital intensive industries. Textbooks put it briefly (Ross et al. 1999): “Changes in capital structure benefit the stockholders if and only if the value of the firm increases”. Traditional corporate finance is based on the Modigliani-Miller theorem (Modigliani & Miller 1958), which states that the value of the firm is unaffected by how the firm is financed in the absence of tax effects, transaction costs, asymmetric information and bankruptcy costs.
In practice, the capital structure is optimized with respect to the factors that the theorem assumes away, mostly according to tax effects and the economic situation. Equity issuance is beneficial for a firm in times of high stock prices because more money can be raised, and managers avoid equity issuance if they consider their stock undervalued. Recent studies also suggest that a firm’s history plays an important role in determining its capital structure (Hovakimian et al. 2001). Highly profitable firms also tend to pay down their debt and become less leveraged. A high payout ratio affects gearing as well because it supports taking on debt for investments. Jensen (Jensen 1986) claims that shareholders support paying dividends in order to reduce the resources under management’s control, thus reducing wasteful investment in negative-NPV targets (see also the discussion of free cash flow above).

Some terminology explaining corporate financing behavior:

The static trade-off theory (Myers 1984) says that a firm sets a target debt-to-equity ratio and makes its choices according to the current and target ratios. The theory rests on the tax benefit of debt and claims that the marginal benefit of further increases in debt decreases, so the debt-to-equity ratio must be optimized according to marginal benefits.

The pecking order theory (Myers 1984) says that a firm prefers internal and debt financing and that there is no actual target debt-to-equity ratio. The theory suggests that companies make their financing decisions according to the law of least effort, or least resistance. Hence, the hierarchy of financing is internal funds, then debt, and equity as the last resort. The theory also claims that firms adapt their dividend payout ratio to their investment opportunities, which makes dividend policies sticky.
The market timing hypothesis (Baker & Wurgler 2002) does not generally care whether debt or equity is used; the choice depends on the current situation of the financial markets and the price to be paid for the capital. Equity is issued when prices are high and repurchased when prices are low. Firms take advantage of perceived “mis-pricing” in the markets when financing their business, and therefore the hypothesis belongs to behavioral finance.

The neutral mutation hypothesis (Miller 1977) says that firms fall into financing patterns and habits which have no effect on firm value. Habits make interest groups feel comfortable and make predictions more accurate.

2.2 Dividend discount model

Stocks are valued according to their future cash flows to investors, meaning dividends, if any, and the sale price after the holding period. The future cash flows are discounted according to the investor’s yield requirement on the investment. The dividend discount model (DDM) (Ross et al. 1999) is the general starting point for all security valuation methods, and a number of researchers have found a positive correlation between dividend yields and future stock returns over a multiple-year time horizon (Goetzmann & Jorion 1995). LeRoy and Porter (Leroy & Porter 1981) and Shiller (Shiller 1981), on the other hand, questioned the usefulness of DDM by claiming that stock prices appear to be too volatile to be explained by fundamentals. DDM is mostly used with future estimates, and using it with realized dividends and market prices divides academics into supporters and opponents. The basic idea of DDM is (Ross et al. 1999)

$$ P_0 = \frac{Div_1}{1+r} + \frac{P_1}{1+r}. \tag{9} $$

As the equation above shows, the present value of a stock, considered one year ahead, is the sum of the dividend and the sale price after that year, discounted by the investor’s yield requirement r. When the following n years are considered, the formula becomes (Ross et al. 1999)

$$ P_0 = \frac{Div_1}{1+r} + \frac{Div_2}{(1+r)^2} + \frac{Div_3}{(1+r)^3} + \dots + \frac{Div_n}{(1+r)^n} + \frac{P_n}{(1+r)^n} = \sum_{t=1}^{n} \frac{Div_t}{(1+r)^t} + \frac{P_n}{(1+r)^n}. \tag{10} $$

Macroeconomic conditions vary substantially over a ten-year period, which is why dividends and investors’ yield requirements cannot be assumed to be flat. A very popular version of DDM is the constant growth version, in which dividends are assumed to grow at a constant rate g, as in the equation (Ross et al. 1999)

$$ P_0 = \frac{Div}{r-g}. \tag{11} $$

The equation is called the Gordon formula and, for the summation to be finite, it requires g to be smaller than r. Estimation of the growth rate of dividends is usually based on the historical trend and future prospects. To illustrate reality more accurately, the equation can be extended to the differential growth DDM, which has more than one distinct growth rate $g_i$. There are two different growth rates in the equation (Ross et al. 1999)

$$ P_0 = \sum_{t=1}^{T} \frac{Div\,(1+g_1)^t}{(1+r)^t} + \frac{\dfrac{Div_{T+1}}{r-g_2}}{(1+r)^T}. \tag{12} $$

The discount rate r can be entered into the equation in exactly the same differential manner. DDM has problems, as does any model. The model can be viewed as too static if constant discount rates are used over long periods. Another problem is that infinite sums might diverge, depending on the relation between the growth factor and the discount factor. Yet another departure from reality is that the commonly used zero-growth dividends are not realistic. Sometimes companies cannot afford to pay dividends at all, and usually they prefer paying constantly rising dividends.

2.2.1 Discount factor

The risk-adjusted discount factor used to discount future cash flows into present value consists of the risk-free rate of return and a risk premium. The risk-free rate of return is the minimum return to be expected from any investment, although nothing is purely risk-free. The risk-free rate of return is usually taken to be the interest rate of three-month US Treasury bills. The risk premium is the extra return investors expect for tolerating higher risk.
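A risk-adjusted rate built from these two components is what enters the DDM formulas (10)-(12) as r. The following Python sketch shows how the three DDM variants could be evaluated. It is a minimal illustration, not the procedure used in this Thesis: the function names and all numbers are hypothetical, and the first dividend of the second stage is assumed to be the last first-stage dividend grown once at the rate g2.

```python
def price_finite_horizon(dividends, sale_price, r):
    """Equation (10): discounted dividends plus the discounted sale price."""
    n = len(dividends)
    pv_dividends = sum(d / (1 + r) ** t
                       for t, d in enumerate(dividends, start=1))
    return pv_dividends + sale_price / (1 + r) ** n


def price_constant_growth(next_dividend, r, g):
    """Equation (11), the Gordon formula; finite only when g < r."""
    if g >= r:
        raise ValueError("growth rate g must be smaller than discount rate r")
    return next_dividend / (r - g)


def price_two_stage(div0, g1, g2, r, T):
    """Equation (12): growth g1 for T years, then g2 in perpetuity."""
    stage_one = sum(div0 * (1 + g1) ** t / (1 + r) ** t
                    for t in range(1, T + 1))
    # Assumption: Div_{T+1} is the year-T dividend grown once at g2.
    div_next = div0 * (1 + g1) ** T * (1 + g2)
    terminal = price_constant_growth(div_next, r, g2) / (1 + r) ** T
    return stage_one + terminal


# Hypothetical inputs: risk-free rate plus risk premium gives r.
risk_free, risk_premium = 0.04, 0.06
r = risk_free + risk_premium

print(price_finite_horizon([2.0], sale_price=108.0, r=r))
print(price_constant_growth(next_dividend=2.0, r=r, g=0.05))
print(price_two_stage(div0=2.0, g1=0.08, g2=0.03, r=r, T=5))
```

A useful sanity check on such a sketch is that the two-stage price collapses to the Gordon price when both growth rates are equal.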
As Luenberger (1997) puts it, a simple way of taking uncertainty into account is to increase the interest rate. The discount rate is also the name of the rate at which US banks borrow from the US Federal Reserve. It is often mentioned together with the Fed’s key rate, the federal funds rate, although strictly speaking the latter is the rate at which banks lend reserves to each other overnight. The Fed adjusts the key rate to control liquidity in the markets and thereby inflation. As banks lend the money onward, their business is to benefit from it by charging a higher interest rate than the key rate. The key rate sets the general level of interest rates, and the end-user interest rates for each period of time are determined by the supply of and demand for money, which in turn are greatly affected by the economic outlook.

The risk premium is the rate of return above the risk-free interest rate, in other words, the reward for holding a risky investment rather than a risk-free one. Risk can be divided in two: market risk and specific risk. Market risk (systematic risk) cannot be avoided because economic cycles affect the whole market, although stocks are affected differently than bonds. Specific risk (unsystematic risk) depends on the investment, giving the investor more control over the risk she is willing to take. Unlike market risk, specific risk can be diversified away.

Inflation is a risk for the lender because the purchasing power of the amount lent is, in most cases, lower at repayment than it was in the beginning. As the general price level of goods and services rises, lenders expect to be compensated for lending the money. The best-known inflation measures are the CPI (Consumer Price Index), which tracks consumer prices continuously and defines the change as inflation, and the GDP deflator, which measures the cost of goods purchased by U.S. households, government and industry.

Liquidity premium is a term used to explain the difference between two loans that are otherwise similar except for their maturity dates. A short-term loan is expected to be less risky than a long-term one, giving the long-term loan a wider premium.
Graphically, this is shown by the term structure, which describes the relationship between spot rates of different maturities and in which the yield curve slopes upward. A company’s cost of capital is frequently used as the standard interest rate for discounting future cash flows into present value. The cost of capital is a weighted sum of the cost of equity and the cost of debt (weighted average cost of capital, WACC), in which the tax benefit of deductible interest payments is included:

$$ WACC = \frac{EQUITY}{EQUITY + DEBT}\, r_{equity} + \frac{DEBT}{EQUITY + DEBT}\, r_{debt}\,(1 - TAX). \tag{13} $$

The proportions of equity and debt are calculated with market values instead of book values. The cost of debt is the easy part of the WACC because it is usually clear how much a company is paying for its loans and bonds, but the cost of equity is trickier. The cost of equity is normally higher than the cost of debt because it involves the risk premium. In effect, the cost of equity is the yield requirement of shareholders for providing the capital and bearing the risk of ownership. A common method for estimating the cost of a firm’s equity is the dividend capitalization model, which approximates the future dividends that capitalize to the current market price. If the company is not paying dividends, they can be estimated by comparing its average net income and cash flow with those of a similar-size firm. The formula of the dividend capitalization model is also called the Gordon model (Gordon 1962):

$$ COST\_OF\_EQUITY = \frac{NEXT\_YEARS\_DVDS}{MARKET\_VALUE} + DVD\_GROWTH\_RATE. \tag{14} $$

The Gordon model itself is primarily used as a stock valuation method, but it can also be used to assess the cost of equity from the dividend trend. The model should only be used for mature firms with low growth rates because of the assumption of a constant growth rate in perpetuity. Another model for calculating the cost of equity is the Capital Asset Pricing Model (CAPM), which describes the relation between expected return and risk.
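Before turning to CAPM in detail, equations (13) and (14) can be illustrated with a short Python sketch. The function names and the input figures are hypothetical and serve only to show how the terms combine; per-share dividends and prices are assumed for the dividend capitalization model.

```python
def cost_of_equity_gordon(next_year_dividend, market_price, dividend_growth):
    """Equation (14): dividend yield plus the dividend growth rate."""
    return next_year_dividend / market_price + dividend_growth


def wacc(equity, debt, r_equity, r_debt, tax_rate):
    """Equation (13): market-value weighted cost of equity and after-tax debt."""
    total = equity + debt
    return (equity / total) * r_equity + (debt / total) * r_debt * (1 - tax_rate)


# Hypothetical firm: market values in millions, rates as decimals.
r_equity = cost_of_equity_gordon(next_year_dividend=2.0, market_price=40.0,
                                 dividend_growth=0.05)
firm_wacc = wacc(equity=600.0, debt=400.0, r_equity=r_equity,
                 r_debt=0.06, tax_rate=0.25)

print(f"cost of equity = {r_equity:.3f}, WACC = {firm_wacc:.3f}")
```

The tax shield only lowers the debt term, so the WACC here falls between the after-tax cost of debt and the cost of equity. The cost of equity passed to `wacc` could equally come from CAPM.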
The model begins with the time value of money in the form of the risk-free rate and continues by taking into account the asset’s sensitivity to market risk. The risk premium is the difference between the market return and the risk-free rate, and the premium is multiplied by the sensitivity coefficient beta, β, which yields the return above the risk-free rate. In other words, beta is the asset’s volatility in relation to the rest of the market. In theory, the market portfolio includes all the assets in the economy in proportion to their size, but in practice the S&P 500 index is often taken as the market portfolio with a beta of 1. Fortunately, news agencies such as Reuters and Bloomberg offer betas for many listed stocks. The model is defined as

$$ r_{asset} = r_{free} + \beta\,(r_{market} - r_{free}). \tag{15} $$

3 Bankruptcy prediction models

3.1 Statistical Models

Statistical models focus on symptoms of failure drawn mainly from company accounts. They follow classical standard modelling procedures and can be multivariate or univariate. Statistical models are mostly based on financial ratios, but the calculation methods divide them into two dominant types: discriminant models and logit models. Both types lack the direct influence of corporate governance structures and management practices in numerical form (Adnan & Dar 2006), yet they are found to be useful and intuitive explanatory models.

3.1.1 Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) (Lachenbruch 1975) concentrates on assigning observations to two or more distinct groups. The discrimination is based on the characteristics of the observations, so that multidimensional data is projected onto one dimension. The observations are assigned to groups so that the separation between groups is maximal, and the question is how to select the characteristics involved in the analysis appropriately.
LDA assumes that the initial data can be classified correctly into different groups, which are then used to estimate a weight for each variable (characteristic). LDA is a parametric method, meaning that all the variables should be normally distributed. LDA is closely related to a very common statistical method, regression analysis, with the difference that regression has a quantitative dependent variable whereas LDA has a categorical one. The idea is to discriminate the observations into q different groups $C_1, C_2, \dots, C_q$. The first objective in LDA is to identify the set of variables that has the strongest discriminating ability (Sharma 1996). Those variables are called discriminator variables. Means $\bar{x}_{iC_j}$, $i = 1, 2, \dots, r$, $j = 1, 2, \dots, q$, are calculated for each variable $x_i$ in each group, where r denotes the number of variables involved. The categorization of observations is based on their “z-scores”, given by the weighted ($w_i$) sum of the variables, which is also called the discriminant function:

$$ z = x_1 w_1 + x_2 w_2 + \dots + x_r w_r = X' w. \tag{16} $$

Two conditions must be satisfied to provide the maximum separation for z: the group means of z should be as far apart as possible, and the values of z within each group should be as homogeneous as possible. The two conditions are combined by requiring the maximum between-group sum of squares and the minimum within-group sum of squares. The second objective of LDA is to identify the z that provides the best separation into the distinct categories. The third objective is to classify future observations into the groups. LDA problems can be solved with Fisher’s (Fisher 1936) criterion

$$ J(w) = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2}, \tag{17} $$

where the within-group variances are calculated by

$$ s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \quad i = 1, 2, \dots, q. \tag{18} $$

Maximizing Fisher’s criterion yields a closed-form solution.
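As a rough illustration of that closed-form solution, the sketch below computes Fisher weights for two groups and two discriminator variables in plain Python. It assumes the standard two-group result in which the weight vector is the inverse of the pooled within-group covariance matrix times the difference of the group mean vectors. The data, the function names and the midpoint cutoff (which presumes equal priors and misclassification costs) are illustrative assumptions, not the data or the procedure of this Thesis.

```python
def mean_vector(rows):
    """Column means of a list of equally long observation tuples."""
    n = len(rows)
    return [sum(row[k] for row in rows) / n for k in range(len(rows[0]))]


def pooled_covariance(group1, group2):
    """Pooled within-group covariance matrix of two groups."""
    p = len(group1[0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in (group1, group2):
        m = mean_vector(rows)
        for row in rows:
            for a in range(p):
                for b in range(p):
                    scatter[a][b] += (row[a] - m[a]) * (row[b] - m[b])
    dof = len(group1) + len(group2) - 2
    return [[value / dof for value in line] for line in scatter]


def fisher_weights(group1, group2):
    """w = S_W^{-1} (mean1 - mean2) for exactly two variables."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s = pooled_covariance(group1, group2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def z_score(x, w):
    """Discriminant score: weighted sum of the variables."""
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical observations: (free cash flow yield, price to book).
undervalued = [(0.08, 1.2), (0.10, 1.0), (0.09, 1.5), (0.12, 0.9)]
overvalued = [(0.03, 3.0), (0.04, 2.5), (0.02, 3.5), (0.05, 2.8)]

w = fisher_weights(undervalued, overvalued)
z1 = z_score(mean_vector(undervalued), w)
z2 = z_score(mean_vector(overvalued), w)
cutoff = (z1 + z2) / 2  # assumption: midpoint between group mean scores


def classify(x):
    return "undervalued" if z_score(x, w) > cutoff else "overvalued"


print(w, cutoff, classify((0.09, 1.1)))
```

The explicit 2x2 inverse keeps the example free of dependencies; with more variables a linear algebra library would be the natural choice.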
In order to find the maximizing vector $w_{opt}$, we have to calculate the first derivative of the criterion and solve the equation $\partial J(w) / \partial w = 0$. The criterion needs to be rewritten to solve the equation (Sharma 1996). It can be written as

$$ J(w) = \frac{w^T B w}{w^T W w}. \tag{19} $$

B is the between-groups matrix and W is the within-groups matrix of sums of squares and cross products. Together they form a matrix which has the sums of squares as diagonal values and the sums of cross products as off-diagonal values. It is called the SSCP matrix (Sums of Squares and Cross Products):

$$ SSCP = X^T X = B + W. \tag{20} $$

The solution for the weights in the criterion is

$$ w = S_W^{-1} (\bar{x}_1 - \bar{x}_2), \tag{21} $$

where $S_W$ denotes the within-group covariance matrix.

Discriminant analysis has three assumptions that the data should meet: multivariate normality, equality of the covariance matrices and independence of observations (Sharma 1996). In order for the significance tests to be valid, the discriminator variables must come from a multivariate normal distribution. In theory, classification results are also affected if the assumption is violated. Unfortunately, there is no clear-cut answer as to how much the variables can deviate from normality. Studies have shown, however, that even if the overall classification rate is not affected, some groups might be overestimated or underestimated (Lachenbruch et al. 1973). Violation of the assumption of equal covariance matrices also affects the significance tests and the classification results. The degree to which they are affected depends on the group sizes and the number of discriminator variables (Marks & Dunn 1974). With unequal group sizes and a large number of discriminator variables, the null hypothesis of equal group means is rejected too often. The assumption of equal covariance matrices can be tested with Box’s M test statistic, which, in turn, can be approximated as an F-statistic.
Discriminant analysis is quite robust to violations of these two assumptions, but it is beneficial to know the possible effects of violating them. The final assumption, independence of observations, is less discussed, but it has a substantial effect on the power and on the significance level as well. The assumption is often violated when elaborate sampling procedures are used, causing correlation among the observations. One can use more stringent alpha levels if the observations are assumed not to be independent of each other.

In 1968, Edward Altman carried out the pioneering research in discriminant analysis with financial ratios. He assessed the quality of ratio analysis as an analytical technique, using the prediction of corporate failure as an illustrative case. He gathered sixty-six firms to fit the model to best discriminate between bankrupt and non-bankrupt firms. The firms were from the US manufacturing sector from the years 1946-1965, and their mean asset size was $6.4 million. He used five financial ratios as explanatory variables. Altman achieved a 94 % classification rate with the initial sample. He tested the model with several secondary samples, which validated his results. The model could predict bankruptcy two years prior to the actual failure. According to a comprehensive history review (Adnan & Dar 2006), about 30 % of bankruptcy prediction research has been carried out with discriminant analysis. The geometric mean of the prediction rates is 85 % among the 25 past studies gathered in the review, and DA ranked number one in the bankruptcy prediction field.

The two most frequently used methods for deriving the variable profile in LDA are the simultaneous method and the stepwise method (Laitinen & Laitinen 1998). The simultaneous method is direct: the discriminant analysis is executed with an ex-ante defined variable profile. The stepwise method, in contrast, uses forward selection, backward elimination, or stepwise selection.
Forward selection begins with no variables in the model and, at each step, adds the variable that contributes the most to the classifying ability, measured by the coefficient of determination $R^2$, provided that it meets the criterion to enter. Once in the profile, a variable stays there. Backward elimination begins with the full variable profile, and variables are eliminated one by one if they do not contribute significantly to the degree of discrimination, $R^2$. The stepwise procedure is a compromise between the two other procedures. It starts out empty, and the order of entry of the variables into the model is based solely on statistical criteria (Tabachnick & Fidell 2000). Variables can also be eliminated from the profile if they are no longer found significant. The interpretation of the variables plays no role in any of these methods, which might not correspond to reality, where some variables are more meaningful than others. These methods are useful for testing explanatory variables, especially when there are many of them, but the researcher must be cautious not to discard important variables inadvertently.

3.1.2 Logit models

Logit models provide results that are easy to interpret because they are based on probabilities. Each $x_i$ represents a certain financial ratio, and the ratios are weighted (a and $b_i$) according to past data, usually by the method of maximum likelihood. In other words, logistic regression determines whether each explanatory variable has a predictive relationship with a dichotomous dependent variable. Logit models are analogous to multiple linear regression methods when the dependent variable is binary. The model in its basic form is

$$ \ln\!\left(\frac{p}{1-p}\right) = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n. \tag{22} $$

The main advantage over discriminant analysis is that the logistic regression model does not require the variables to be normally distributed or the samples to have equal covariance matrices.
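The log-odds form of equation (22) can be turned into a probability by inverting the logit, as the following Python sketch shows. The coefficients and ratio values are hypothetical; in practice a and the $b_i$ would be estimated by maximum likelihood, which is not shown here.

```python
import math


def log_odds(x, a, b):
    """Right-hand side of equation (22): a + b1*x1 + ... + bn*xn."""
    return a + sum(bi * xi for bi, xi in zip(b, x))


def probability(x, a, b):
    """Probability obtained by inverting the logit of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x, a, b)))


# Hypothetical intercept, weights and financial ratios of one company.
a = -1.5
b = [2.0, -0.5]
ratios = [0.9, 1.4]

p = probability(ratios, a, b)
print(f"log-odds = {log_odds(ratios, a, b):.3f}, p = {p:.3f}")
```

Whatever the inputs, the result stays strictly between 0 and 1, and log-odds of zero correspond to a probability of one half.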
The probability p of the event, for example a delayed payment, is solved from equation 22 as

$$ p = \frac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots)}}. \tag{23} $$

The logistic curve illustrates the relation between the probability p and the independent variable x (Figure 2). The logistic function “normalizes” all the values non-linearly to the probability scale from 0 to 1.

Figure 2 Logistic curve (p on the vertical axis from 0 to 1, X on the horizontal axis).

In 1985, Zavgren published a bankruptcy prediction study of 90 companies (Zavgren 1985) from the years 1972-1978. Predictions were made for one to five years prior to the actual corporate failure. The prediction was reliable only one year prior to the failure, with an accuracy rate of 82 %. According to the comprehensive history review (Adnan & Dar 2006), logit models account for about 21 % of bankruptcy prediction research. The average predictive accuracy among 19 past studies of logit models is 87 %. Logit models can be seen as a simple and competitive method for bankruptcy prediction.

3.2 Artificial intelligent expert system models (AIESM)

Artificial intelligent expert system models also focus on symptoms of failure drawn from company accounts. AIES models depend heavily on computer technology, and they are multivariate in nature. There are many different kinds of AIES models (Adnan & Dar 2006), but neural networks reflect the basic idea of the field.

3.2.1 Neural networks

Compared to discriminant analysis, neural networks have the major advantage of not requiring a priori assumptions about the underlying structure of the relationship. Non-parametric models illustrate reality better than linear, parametric models, but linear models are easier to calculate. Neural networks consist of neurons that work together to produce an output from inputs. Neural networks were initially invented to simulate the processes of the human brain (Brockett et al. 2006), which gives the networks their neuron structure.
Neurons are interconnected, meaning that a neuron has many inputs and calculates a weighted sum of the inputs to give an output. Mathematical neural networks function by constantly adjusting the weights of the summation. The process of learning (recognizing patterns, changing the interconnections, developing generalizations, etc.) is called the training rule (Brockett 1994). Neurons are grouped in layers, so the neural network functions in parallel, which gives the advantage of functioning even if some of the neurons malfunction. As with the discriminant z-score, a weighted aggregate of the inputs is calculated. The sum is then passed through the activation function before being sent out from the neural unit. The activation function is usually a logistic function (Smith & Gupta 2002)

$$ F(Z) = \frac{1}{1 + e^{-aZ}}, \tag{24} $$

in which a determines the steepness of the slope and Z is the weighted summation score. If the activation function were to mimic a real neuron, it would give a binary output, but for many practical reasons a smooth function is used. The output is usually centered on small values around zero. The most often-used activation functions are the threshold, sigmoid and hyperbolic tangent functions. The choice depends on the characteristics and the range of the desired outputs. Neural networks offer strong results for corporate failure prediction. However, neural network models neither explain how they arrived at a classification nor give a likelihood of possible failure. Training a neural network is time consuming, and finding the most suitable model may be difficult because there are many different models to choose from. A neuron is depicted below (Figure 3).

Figure 3 Neural network neuron.

The neural network approach outperforms linear discriminant analysis marginally in corporate failure prediction, especially in classifying financially troubled firms (Zapranis & Ginoglou 2000). Yang et al.
(1999) compared neural networks with DA using data gathered from the years 1984-1989 from 122 companies in the US oil and gas industry. Five financial ratios were used to explain bankruptcy: net cash flow to total assets, total debt to total assets, exploration expenses to total reserves, current liabilities to total debt and the trend in total reserves. Yang’s results were somewhat ambiguous depending on the data processing methods, but the conclusion was that Fisher’s discriminant analysis predicted bankrupt companies more accurately than the neural networks.

According to the history review (Adnan & Dar 2006), past studies of neural networks have had an average prediction rate of 87 % and about 9 % of the bankruptcy prediction studies have been neural network studies. Artificial intelligent models cover about a fourth of the bankruptcy prediction research.

3.3 Models based on economic theories

Theoretical models concentrate on qualitative causes of failure and they are drawn from information that could satisfy the theoretical argument. Theoretic models are also multivariate in nature and they usually employ a statistical technique to support the qualitative theoretical argument. Four types of theoretic models are reviewed below.

Balance sheet decomposition measures the changes in the balance sheet and relies on the assumption that companies try to maintain equilibrium in their assets. Heavy changes may be signs of financial distress in the future. Decomposition measures can include current assets as a fraction of total assets, current liabilities as a fraction of total assets, long-term liabilities as a fraction of total assets, etc. Booth (1983) produced empirical evidence that failed and non-failed companies have distinct characteristics in the composition of their balance sheets, even though his model was unable to successfully classify non-failed companies. Nevertheless, balance sheet decomposition is a useful tool for assessing companies’ financial condition.
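As a rough illustration of the decomposition measures listed above, the following Python sketch expresses balance sheet items as fractions of total assets for two consecutive years and computes the change in each fraction. The figures and the function name `decomposition` are hypothetical and are not taken from Booth (1983); the sketch only shows the arithmetic behind the idea.

```python
def decomposition(balance_sheet):
    """Return each balance sheet item as a fraction of total assets."""
    total = balance_sheet["total_assets"]
    return {item: value / total
            for item, value in balance_sheet.items()
            if item != "total_assets"}


# Hypothetical figures (in millions) for two consecutive years.
year_1 = {"total_assets": 1000.0, "current_assets": 400.0,
          "current_liabilities": 250.0, "long_term_liabilities": 300.0}
year_2 = {"total_assets": 1000.0, "current_assets": 300.0,
          "current_liabilities": 350.0, "long_term_liabilities": 300.0}

shares_1 = decomposition(year_1)
shares_2 = decomposition(year_2)

# A large shift in any share is the kind of change the theory treats as a warning sign.
changes = {item: shares_2[item] - shares_1[item] for item in shares_1}
```

In this made-up case the share of current liabilities rises by ten percentage points while the share of current assets falls by the same amount.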
In Gambler’s ruin theory, firms are seen as gamblers betting constantly with some probability of loss (Adnan & Dar 2006). Ultimately the game ends as the firm fails. Flipping a coin is a good example of gambler’s ruin theory, because the player who starts with more coins is more likely to win, even with equal odds.

Cash management theory puts weight on the short-term cash balances of a firm (Adnan & Dar 2006). Imbalances between inflows and outflows can cause financial distress and insolvency. Cash management theory models minimize the costs of cash management, optimize the capital structure and maximize the present value of net cash flows (Zapranis & Ginoglou 2000). Cash management would be much simpler in an optimal business world without lags in payments. Cash management is needed to prepare for unexpected costs, debt service, inventory fills, reserve cash for varying revenue, etc. The variables used in the cash management models vary, but they can include, for example, the elasticity of the cash balance with respect to the volume of transactions or the elasticity of the cash balance with respect to the opportunity cost rate. Empirical studies support the idea that cash management behaviour changes notably prior to times of financial distress (Zapranis & Ginoglou).

Credit risk theories mainly concern firms that borrow money. Financial institutions have created a number of models for measuring credit risk, and the models are based on the international bank regulatory framework, Basel, as well as on corporate finance theories. Two influential benchmark models (Gordy 2000) are J.P. Morgan’s CreditMetrics and Credit Suisse Financial Products’ CreditRisk+. Specifically, the models measure the portfolio value-at-risk for market risk. In a comparison of the two, the models were found not to have serious differences, and they are better suited to comparing the relative risk levels of two portfolios than to producing absolute levels of risk (Gordy 2000).
4 Empirical analysis

Altman (1968), as well as other researchers, have used the US manufacturing sector as the test field for their statistical multivariate prediction models. The reasons for this concentration are the enormous size of the US economy, the large number of listed companies and the availability of financial information. Also, many researchers come from the US and it is a common tradition to inspect domestic markets. The manufacturing sector is suitable for statistical multivariate research because it is stable and more visible than, for example, the financial sector, where the business is not as concrete as transforming raw materials into goods. The manufacturing sector produces finished goods from materials offered by the primary sector and sells them to other businesses, to export or to domestic consumers.

The economic sector used in this research is the more extensive industrial sector, as Bloomberg defines it. The industrial sector consists of the following sub-sectors: aerospace/defence, building materials, electrical components and equipment, electronics, engineering and construction, environmental control, hand and machine tools, construction and mining machinery, diversified machinery, metal fabricates and hardware, miscellaneous manufacturing, packaging and containers, shipbuilding, transportation, and trucking and leasing. The industrial sector is capital intensive, as the companies need plants for production, and the sector seems rather stable. On the other hand, the industrial sector suffers from raw material and energy price hikes and from increased competition from emerging economies, in which the cost of labour is lower. The industrial sector also includes cyclical industries, such as construction, but when compared to the other sectors in Bloomberg’s partition, the industrial sector is considered the most suitable for the research.
The industrial sector included 970 listed companies with a market capitalization above $1 million at the end of the year 2006.

The other sectors in Bloomberg’s partition are basic materials, communication, cyclical consumer products, non-cyclical consumer products, diversified, energy, financial, technology and utilities. A second sector is included in the analysis to examine the inter-sector predictive capability of the discriminating function. Validation of the predictive capability is an essential part of forming a prediction model. The reference sector is non-cyclical consumer goods, as it was thought to be the sector most similar to the industrial sector in Bloomberg’s partition. The non-cyclical consumer sector includes agriculture, beverages, biotechnology, commercial services, cosmetics and personal care, food, healthcare products, healthcare services, household products and wares, and pharmaceuticals. Non-cyclical consumer companies are called defensive and they tend to outperform the market in times of recession because they produce products that consumers are not used to living without. At the end of the year 2006, there were 446 non-cyclical companies with a market capitalization above $1 million.

The first decision was to use annual historical data instead of quarterly data because annual data seemed to have less variance. Companies may knowingly adjust their quarterly results even though the underlying business is unchanged, for example to please shareholders. Results can be adjusted in the short run but there is less room for adjustment in annual results, in which “all comes together”. That makes annual results clearer than quarterly data, and annual data is therefore used in this Thesis. Annual data was also more readily available than quarterly data. After some examination, it seemed that Bloomberg has history data from the beginning of the 1990s.
There is not much financial ratio data available from the 1980s, although price history may be available much further back. Therefore, the history data used for the samples begins in 1989 and continues until the latest full year, 2006.

4.1 Valuation phase

All the companies in the two selected sectors, industrial and non-cyclical, are valued with the dividend discount model. The discount rates are calculated from the Fed Funds rate, as it can be seen as the risk-free return on investment. The Fed Funds rate is used in order to discount dividends and market prices in a realistic manner. The goal of the Thesis is to value stocks on a relative basis, which is why the general level of the discount rate is not of greatest interest. What matters is that the discounting is the same for all the observations. The annual Fed Funds rates for the years 1989-2006 are shown in Table 1.

Table 1 Fed Funds rate.

Year    Fed Funds rate (%)
1989    9,156
1990    7,875
1991    5,250
1992    3,438
1993    3,000
1994    4,500
1995    5,813
1996    5,250
1997    5,500
1998    5,250
1999    5,125
2000    6,375
2001    3,375
2002    1,625
2003    1,063
2004    1,563
2005    3,500
2006    5,125

The valuation of a stock is the ratio between the price at time t and the sum of the discounted dividends from the years t, t+1, t+2, t+3, t+4, t+5, t+6 and t+7 and the discounted sale price from year t+7:

\[ \text{VALUATION} = \frac{P_0}{\sum_{t=1}^{7} \frac{Div_t}{(1 + r_t)^{t}} + \frac{P_7}{(1 + r_7)^{7}}} . \tag{24} \]

As the valuation is carried out with a 7 year dividend discount model, the latest possible valuation point in time is the year 1999, with data reaching to the year 2006. This is the major limitation of the analysis. The history data starts from the year 1989 but the valuation is started two years later, in the year 1991. The aim is to avoid the mispricing of stocks in the first few years after the stock exchange listing. When a new company is about to raise capital by listing itself on a stock exchange, it arouses plenty of prejudice and enthusiasm about its value.
Companies are usually heavily mispriced at the moment of listing but, sooner or later, the markets tend to fix the price at the correct level. According to the two year existence restriction, the valuation is carried out for the years 1991-1999. As there are nine separate periods for the valuation, nine separate sets of interest rates need to be formed from the annual Fed Funds spot rates. The required short discount factors can be calculated by rolling over an investment each year at the spot rate:

\[ r_{t^*} = \prod_{t = t^*}^{t^* + 7} \frac{1}{1 + \text{Fed\_Funds\_rate}_t / 100} . \tag{25} \]

As can be seen in Table 2, the discount factors derived from the short rates vary somewhat because of the sliding “current time”. In other words, different sets of spot rates are used to calculate them.

Table 2 Discount factors.

Period       t       t+1     t+2     t+3     t+4     t+5     t+6     t+7
1989-1996    1,000   0,927   0,881   0,851   0,827   0,791   0,748   0,710
1990-1997    1,000   0,950   0,919   0,892   0,853   0,807   0,766   0,726
1991-1998    1,000   0,967   0,935   0,907   0,868   0,821   0,780   0,739
1992-1999    1,000   0,971   0,929   0,878   0,834   0,791   0,751   0,715
1993-2000    1,000   0,957   0,904   0,859   0,814   0,774   0,736   0,692
1994-2001    1,000   0,945   0,898   0,851   0,809   0,769   0,723   0,700
1995-2002    1,000   0,950   0,901   0,856   0,814   0,765   0,740   0,728
1996-2003    1,000   0,948   0,901   0,857   0,805   0,779   0,767   0,759
1997-2004    1,000   0,950   0,904   0,850   0,822   0,809   0,800   0,788
1998-2005    1,000   0,951   0,894   0,865   0,851   0,842   0,829   0,801
1997-2006    1,000   0,950   0,904   0,850   0,822   0,809   0,800   0,788

Companies with a market capitalization below $300 million or above $3 billion at the moment of examination are excluded from the samples. Therefore, the companies can be called mid-caps, although the definition of mid-caps varies greatly. The mid-cap restriction is not very strict but it excludes very unstable start-up companies and, on the other hand, giant businesses that are very powerful and exceptional in numerous ways.
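The discount factors and the valuation ratio can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factor is taken to be 1 in the valuation year and each later year is discounted with that year's own Fed Funds rate, which is the convention that appears to reproduce the 1989-1996 row of Table 2. The function names and the dividend layout (dividends of years t+1 to t+7, following the summation limits of the valuation equation) are hypothetical.

```python
# Annual Fed Funds rates in percent, copied from Table 1.
FED_FUNDS = {
    1989: 9.156, 1990: 7.875, 1991: 5.250, 1992: 3.438, 1993: 3.000,
    1994: 4.500, 1995: 5.813, 1996: 5.250, 1997: 5.500, 1998: 5.250,
    1999: 5.125, 2000: 6.375, 2001: 3.375, 2002: 1.625, 2003: 1.063,
    2004: 1.563, 2005: 3.500, 2006: 5.125,
}


def discount_factors(start_year, horizon=7):
    """Cumulative discount factors for start_year .. start_year + horizon.

    Assumption: factor 1 at the valuation year, then a yearly roll-over
    where each year is discounted with its own spot rate.
    """
    factors = [1.0]
    for year in range(start_year + 1, start_year + horizon + 1):
        factors.append(factors[-1] / (1.0 + FED_FUNDS[year] / 100.0))
    return factors


def valuation(price_now, dividends, sale_price, factors):
    """Price divided by discounted dividends plus discounted sale price.

    dividends[k] is assumed to be the dividend of year t + k + 1.
    A small value means a cheap stock.
    """
    discounted = sum(d * f for d, f in zip(dividends, factors[1:]))
    discounted += sale_price * factors[-1]
    return price_now / discounted


factors_1989 = discount_factors(1989)

# Hypothetical stock: price 20, dividend 1 per year, sold for 25 after 7 years.
example = valuation(20.0, [1.0] * 7, 25.0, factors_1989)
```

Because every observation in a given year is discounted with the same factors, the ranking of stocks within a year does not depend on the general level of the rate, only on the relative size of the cash flows.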
4.2 Basic sample, hold-out sample and external sample

The nine year period is divided into a basic sample, 1991-1996, and a hold-out sample, 1997-1999. The idea of the two samples is that the basic sample is used to set up the coefficients of the discriminant function (in-sample prediction) and the hold-out sample is used to try out the predictive capability of the discriminant function (out-of-sample prediction). Yet another out-of-sample prediction is carried out for the external sector of non-cyclical consumer goods for the years 1991-1996.

Researchers constantly argue whether to use in-sample prediction or out-of-sample prediction (Inoue & Kilian 2002). The conventional wisdom is that in-sample prediction has the weakness of model overfitting and that out-of-sample prediction protects against it. The overfitting in in-sample prediction stems from using (at least partially) the same data both to adjust the model and to assess its predictive capability.

4.3 Selection procedure of companies

The industrial sector has 290 companies with a full price history starting from 1989. The procedure is to start from the year 1991 and select the nine smallest valuations, meaning the nine cheapest companies. Each observation is required to have full ratio data and a market capitalization between $300 million and $3 billion. The chosen companies are then excluded from the list and the same selection is carried out for the nine most expensive companies. This is repeated for each of the six years, and the basic sample then consists of 108 companies, 54 cheap and 54 expensive ones.

The reason for starting from the year furthest in the past is that there are more companies with full history data the closer the year is to the present day. That leaves a bigger sample for the hold-out period, in which companies have to have a complete price history from two years before the start, that is, from 1995.
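The selection procedure can be sketched as follows. This is a minimal Python illustration, not the procedure actually used for the Thesis data: the record fields (`name`, `valuation`, `mcap_musd`, `has_full_ratios`) are hypothetical, and the sketch assumes that a company chosen in one year is excluded from the selection in later years as well, which is one reading of the description above.

```python
def eligible(company):
    """Screen applied before ranking: full ratio data and mid-cap size."""
    return company["has_full_ratios"] and 300.0 <= company["mcap_musd"] <= 3000.0


def build_sample(yearly_data, n_per_group=9):
    """Pick the n cheapest and n most expensive companies for each year.

    yearly_data maps a year to a list of company records.
    n_per_group is assumed to be positive.
    """
    chosen = set()
    sample = []
    for year in sorted(yearly_data):
        pool = [c for c in yearly_data[year]
                if eligible(c) and c["name"] not in chosen]
        pool.sort(key=lambda c: c["valuation"])   # smallest valuation = cheapest
        cheap = pool[:n_per_group]
        rest = pool[n_per_group:]                 # chosen ones are excluded
        expensive = rest[-n_per_group:]
        for label, group in (("cheap", cheap), ("expensive", expensive)):
            for company in group:
                chosen.add(company["name"])
                sample.append((year, company["name"], label))
    return sample


# Hypothetical data: ten companies, two years, two picks per group and year.
companies = [{"name": f"C{i}", "valuation": 0.5 + 0.1 * i,
              "mcap_musd": 500.0, "has_full_ratios": True}
             for i in range(10)]
demo = {1991: companies, 1992: companies}
sample = build_sample(demo, n_per_group=2)
```

With nine picks per group and six years, the same loop would yield the 54 cheap and 54 expensive observations of the basic sample, provided that enough eligible companies exist in every year.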
The hold-out sample is formed in a similar manner, starting from the year 1997 with the nine cheapest companies. Under the two year existence rule, the hold-out time period includes 300 additional companies compared to the basic sample. However, the number of companies with full ratio data that meet the restrictions mentioned earlier is rather limited. The nine cheapest and the nine most expensive companies are chosen for each year, starting from the year 1997. Consequently, the hold-out sample consists of 54 companies, 27 cheap and 27 expensive ones.

The extra sector of non-cyclical consumer products is treated in exactly the same fashion. The list of potential companies is smaller than the list of industrial companies, with a total of 130 companies that have a full price history from the year 1989 until 2006. This time, the five cheapest and the five most expensive companies are chosen each year, which results in a sample of 60 companies.

4.4 Approaches to sample forming

After many trials, forming the samples in yearly order and matching the same number of the cheapest and the most expensive companies seems practical. The procedure balances all the years with the same number of observations in both categories. An alternative procedure would be simply to choose the cheapest companies without paying attention to the year in which they occur. These could then be matched with the most expensive ones. In that case, the discriminating function might concentrate on some of the years more than others and its universal applicability might become biased.

The paired-sample design is a frequently used technique for forming samples. Beaver (1966) is a well recognized researcher in bankruptcy prediction and he, among others, used the paired-sample design in his analysis. At first, Beaver chose the bankrupt companies (cf. cheap companies) and, next, he started selecting pairs (cf. expensive companies) for each bankrupt company.
The pairs needed to be from the same sector, to have rather equal market capitalization and to be from the same year. Compared to Beaver’s paired-sample design, the market capitalization matching has been left out, as the market capitalization restriction of $300m-$3bn is thought to provide adequate homogeneity among the companies. Also, the paired-sample design turned out to require much bigger initial samples because the sampling rules are so strict.

In this case, the paired-sample design felt somewhat like data mining, although the Thesis is about data analysis. To be clear, data mining focuses on extracting useful information from large sets of data (Mannila et al. 2001), and the crucial point is that data mining applications are to some degree self-guiding. Data analysis, for its part, does not aim at discovering unforeseen patterns hidden in the data, but at fitting an existing model to the data or extracting parameters for a model to adapt it to reality.

4.5 Ratio profile

The ratio data gathered for each company in the two sectors includes the price to earnings ratio (PE), gearing ratio (GEA), price to book value ratio (PB), free cash flow per share yield (FCPY), cash flow per share yield (CPY), return on capital employed (ROCE), earnings before interest and taxes margin (EBITM) and price to sales ratio (PS). The dividend yield is excluded from the profile because a company that does not pay dividends shows up as a zero or an empty value, which is a systematic error. All the variables need to be continuous. For example, companies with negative earnings must be excluded because only positive PEs are reported; the rest are reported as zero or empty values, which would make the variable discontinuous.

5 Results on predictive accuracy

5.1 Mean values and variances in the basic sample

The class specific and overall mean values and variances of the basic sample are given in Table 3. Each of the eight ratios is inspected as well as market capitalization of the companies.
The variances are given relative to the mean value, that is, as coefficients of variation (CV). From now on, group I refers to cheap and group II to expensive companies.

Table 3 Mean values and variances of the basic sample.

Variable   Cheap      CV       Expensive   CV       Overall    CV
PE         34.858     3.068    28.957      1.152    31.908     2.473
GEA        60.966     2.789    46.380      1.675    53.673     2.455
PB         2.875      0.876    3.170       0.941    3.023      0.911
FCPY       0.095      2.918    0.004       20.179   0.049      4.201
CPY        0.275      3.347    0.090       0.930    0.183      3.604
ROCE       15.832     0.602    26.242      3.851    21.037     3.405
EBITM      11.628     0.585    10.779      0.558    11.203     0.572
PS         1.102      1.076    1.483       1.410    1.293      1.317
MCAP       748.199    0.862    874.486     0.668    811.343    0.759
Count      54                  54                   108

The market capitalization is restricted to $300m-$3bn and the average capitalization in the group of expensive stocks is about one fifth higher than in the group of cheap ones. The overall average market capitalization is $811m with a variance of 75.9 %. The expensive companies seem to be more homogeneous in size, with a variance of 66.8 %.

In general, the starting point with valuation ratios is that the higher the ratio, the more expensive the stock. The price to sales ratio and the price to book ratio are about 35 % and 10 % higher in group II, which agrees with intuition. The PE ratio, in contrast, is higher in group I than in group II. The average PE ratio of 31.908 is very high in general, but the massive variance of 306.8 % in group I points to outliers, which in turn can explain the illogical order of magnitude.

The group of cheap stocks is more heavily geared than the group of expensive stocks but, again, there is a tremendous variance of 278.9 % in the group of cheap stocks. A somewhat logical explanation for the higher gearing levels in the group of cheap stocks could be that small, growing companies need to take on more debt to be able to grow. Markets might price the group I stocks low because of the risk stemming from the relatively high debt. The overall gearing ratio is a moderate 54 %.
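The effect of a single outlier on the coefficient of variation can be illustrated with a short Python sketch. The PE values below are hypothetical, and the sketch assumes the sample standard deviation (divisor n - 1); the source does not state which divisor the statistical software used.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical PE ratios: one extreme value dominates the dispersion.
pe_with_outlier = [12.0, 15.0, 18.0, 14.0, 400.0]
pe_without_outlier = pe_with_outlier[:-1]

cv_with = coefficient_of_variation(pe_with_outlier)
cv_without = coefficient_of_variation(pe_without_outlier)
```

In this made-up example the coefficient of variation exceeds 100 % only because of the single extreme observation, which is the kind of pattern that the high values in Table 3 hint at.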
The cash flow ratios CPY and FCPY are stronger among the cheap stocks but they are the most severely affected by the variance. The FCPY ratio in the group of expensive stocks has the record variance of 2017.9 %, which is not bearable for any multivariate analysis.

The profitability ratios ROCE and EBITM are at an understandable level, except for the high variance of ROCE in the expensive group. EBITM is marginally stronger in the group of cheap stocks whereas ROCE is weaker in the group of cheap stocks.

5.2 Variable correlations

Correlations between the variables, without paying attention to groups, are shown in Table 4. The biggest correlation is between FCPY and CPY, which is understandable because of the similarity of the calculation methods. The FCPY subtracts more than just the operational costs, namely also capital expenditures and changes in working capital. The correlation of 0.92 indicates multicollinearity, which complicates the interpretation of the influence of individual variables in the composition of the discriminant score.

The second highest correlation, 0.75, is between PB and PS. This correlation is also a strong one and it stems from basic financial relationships: sales, with costs subtracted, feed the balance sheet. Also, the companies in the sample must have quite similar EBITM because correlated amounts of money are spent and brought to the balance sheet. The lowest value among the overall coefficients of variation (Table 3) supports the conclusion of similar EBITM between the groups. The third highest correlation is between EBITM and PS, which complements the aforementioned computational relationship.

Table 4 Variable correlation in the basic sample.
Variable   PE      GEA     PB      FCPY    CPY     ROCE    EBITM
GEA        0.15
PB         0.24    0.09
FCPY      -0.12   -0.03   -0.12
CPY       -0.06    0.07   -0.15    0.92
ROCE       0.01   -0.00    0.08   -0.00   -0.03
EBITM     -0.11   -0.21    0.22    0.06    0.06    0.04
PS         0.14   -0.26    0.75   -0.10   -0.13    0.02    0.49

Other correlations between the variables are not significant. The highest negative correlation, -0.26, is between PS and GEA and it suggests that the level of net debt increases as the sales increase or the market price decreases.

5.3 Variable normality

As is the case with most multivariate techniques, the significance of statistical tests in discriminant analysis requires certain assumptions to be fulfilled. Violating the assumptions can influence the significance and the power of the statistical tests. The first assumption is that the data comes from a multivariate normal distribution. An overall glance at the mean values and variances of the basic sample suggests that the variables are not normally distributed.

At first, the individual variable dispersions are examined graphically with normal probability plots, in which the ordered observations are plotted against the corresponding values of the inverse standard normal cumulative distribution. First, the data is arranged from smallest to largest, and then the cumulative percentiles are determined. The z-scores are then calculated from the standard normal distribution and each z-value is plotted against the corresponding data value. If the data is normally distributed, the plotted points form a straight line, and stragglers at either end indicate outliers. The variables are examined by groups.

The probability plots revealed that in most cases there are just a couple of sky high outliers. The assumption of variable normality is rejected for all the variable groups because of those extreme values. The groups closest to normality are GEA and CPY in group II, ROCE in group I and EBITM in both groups.
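The coordinates of such a normal probability plot can be computed with the Python standard library alone. The sketch below is illustrative: the sample values are hypothetical, and the plotting position (j - 0.5)/n is an assumption, since the text does not state how the cumulative percentiles were determined for the univariate plots.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Return (z, x) pairs: theoretical z-score versus ordered observation."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()            # mean 0, standard deviation 1
    points = []
    for j, x in enumerate(ordered, start=1):
        percentile = (j - 0.5) / n            # assumed plotting position
        z = standard_normal.inv_cdf(percentile)
        points.append((z, x))
    return points


# Hypothetical ratio values with one sky high outlier.
sample_values = [9.0, 11.0, 10.0, 12.0, 8.0, 95.0]
points = normal_probability_points(sample_values)
```

Plotting the pairs would show five points close to a straight line and the outlier far above it at the right end, which is the pattern described for most of the variable groups.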
The most famous and seemingly the most powerful individual test for normality is the Shapiro-Wilk test. The Shapiro-Wilk test is carried out for both groups with all 8 variables. The test values, probability levels and decisions at the significance level of 0.05 are shown in Table 5. Normality is rejected for all the variables in both groups at the significance level of 5 %. Only 6 out of 16 have a probability level above zero with six decimals. These largely coincide with the variable groups that were concluded to be close to normally distributed in the “eye-balling” earlier. CPY in the group of expensive stocks is the most normal with a probability level of 0.006998.

Table 5 Shapiro-Wilk normality test for the variables.

Variable           Test Value   Prob Level   Decision (5 %)
PE - Group I       0.201        0.000000     Reject Normality
PE - Group II      0.475        0.000000     Reject Normality
GEA - Group I      0.521        0.000000     Reject Normality
GEA - Group II     0.883        0.000080     Reject Normality
PB - Group I       0.681        0.000000     Reject Normality
PB - Group II      0.521        0.000000     Reject Normality
FCPY - Group I     0.411        0.000000     Reject Normality
FCPY - Group II    0.864        0.000021     Reject Normality
CPY - Group I      0.213        0.000000     Reject Normality
CPY - Group II     0.937        0.006998     Reject Normality
ROCE - Group I     0.897        0.000221     Reject Normality
ROCE - Group II    0.174        0.000000     Reject Normality
EBITM - Group I    0.856        0.000012     Reject Normality
EBITM - Group II   0.897        0.000218     Reject Normality
PS - Group I       0.701        0.000000     Reject Normality
PS - Group II      0.519        0.000000     Reject Normality

The assumption of multivariate normality is necessary for the significance tests of the explanatory variables and for the discriminant function itself. The degree to which the assumption can be violated cannot be specified scientifically, but earlier research has shown that although the overall classification rate remains unaffected, some groups might suffer from underestimation and some from overestimation (Lachenbruch et al. 1973).
As stated in the help page of NCSS (statistical software) about discriminant analysis, “a sample size of at least twenty observations in the smallest group is usually adequate to ensure robustness of any inferential tests that may be made”. The phrase in the help page refers to the central limit theorem, which states that the sum of independent variables approaches a normal distribution as the number of observations increases, regardless of the distributions they come from. In this sense, sums and means tend towards normality even when the underlying variables are discontinuous. The multivariate normality assumption is not very strict and researchers have not paid much attention to it in earlier research.

Unfortunately, there are very few tests for the examination of multivariate normality and therefore graphical examination is used (Johnson & Wichern 1987). The multivariate normality is checked with a Q-Q plot, which is formed in the following way. First, Mahalanobis distances are calculated for each of the companies. The Mahalanobis distance is a statistical distance from the sample centroid. The distances are calculated with the equation

\[ MD_{rs}^{2} = (x_r - x_s)' \, V^{-1} (x_r - x_s) . \tag{26} \]

The matrix V in the middle is the covariance matrix of x, V = cov(x), which enters the equation as its inverse. Second, the distances are sorted in order of magnitude and percentiles are calculated for the distances according to the observation number j with the equation (j-0.5)/n, where n is the total number of observations. Third, Chi-square values are calculated for the percentiles. It has been shown that when the sample size is sufficiently large (25 or more) and when the parent population is normal, the Mahalanobis distances behave like a Chi-square random variable (Gnanadesikan 1977).

Figure 4 Chi-square plot for the basic sample.

The Chi-square values are plotted against the Mahalanobis distance in Figure 4.
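The three steps of the Chi-square plot can be sketched in Python. To keep the example free of external libraries, it is deliberately simplified to two variables instead of the eight used in the Thesis: the 2x2 covariance matrix has a closed-form inverse, and the Chi-square quantile with two degrees of freedom is exactly -2 ln(1 - p). The data points and function names are hypothetical, and the distances are measured from the sample centroid.

```python
import math


def squared_mahalanobis_2d(data):
    """Squared Mahalanobis distance of each (x, y) row from the centroid."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix V = [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in data:
        dx, dy = x - mx, y - my
        # d' V^-1 d written out with the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi_square_plot_points(data):
    """Pairs of (sorted squared distance, Chi-square quantile with 2 df)."""
    ordered = sorted(squared_mahalanobis_2d(data))
    n = len(ordered)
    points = []
    for j, dist in enumerate(ordered, start=1):
        p = (j - 0.5) / n                         # percentile of observation j
        quantile = -2.0 * math.log(1.0 - p)       # exact for 2 degrees of freedom
        points.append((dist, quantile))
    return points


# Hypothetical observations of two ratios for six companies.
observations = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5),
                (4.0, 3.0), (5.0, 5.5), (6.0, 4.0)]
plot_points = chi_square_plot_points(observations)
```

For the full eight-variable case the same steps apply, but the covariance matrix inverse and the Chi-square quantiles with eight degrees of freedom would normally be taken from a numerical library.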
A line is also fitted to the points in Figure 4 with the least squares method to examine the linearity of the plot. The plot does not appear to be linear, at least because of the extreme observations on the right side. The majority of the observations are concentrated at the small end, where the slope seems rather steep, and the plot is also skewed to the left. The normality assumption is rejected after the graphical observation of the plot.

Obviously, the test based on the Q-Q plot is subjective, because the researcher determines the linearity visually. A more analytical way of assessing the linearity of the plot is to compare the correlation coefficient of the plot to critical values. The critical values give the percent points of the cumulative sampling distribution of the correlation between sample values and theoretical quantiles, obtained empirically by Filliben (1975). The correlation of the plot is 0.740, whereas the critical value at the alpha level of 0.05 with 100 observations (the closest tabulated value to 108) is 0.987 (Sharma 1996:446), so the plot is not even close to linear. Although the critical values are computed for a univariate distribution, they can be used as a benchmark.

5.4 Equality of covariance matrices

The second assumption for discriminant analysis is the equality of covariance matrices. Violation of the assumption affects both the type I and the type II error, but type I is more severely affected (Sharma 1996). The type I error refers to the probability of falsely rejecting the null hypothesis due to chance, and the type II error is the probability of failing to reject the null hypothesis when it is false. As came up with the normality assumption, group sizes should be kept equal because then the significance level is not affected as badly by the inequality of covariance matrices. The equality of covariance matrices is tested with the Bartlett-Box homogeneity test for the individual variables and with Box’s M test for all the variables together.
The test results for the equality of covariance matrices are shown in Table 6.

Table 6 Bartlett-Box tests for the equality of the covariance matrices.

Variable   Bartlett Value   DF1   DF2     F Approx   F Prob   Chi2 Approx   Chi2 Prob
PE         59.877           1     33708   59.420     0.000    59.310        0.000
GEA        29.6714          1     33708   29.420     0.000    29.390        0.000
PB         1.5138           1     33708   1.500      0.221    1.500         0.221
FCPY       70.8414          1     33708   70.320     0.000    70.170        0.000
CPY        181.9493         1     33708   181.210    0.000    180.230       0.000
ROCE       177.8006         1     33708   177.060    0.000    176.120       0.000
EBITM      0.8097           1     33708   0.800      0.370    0.800         0.370
PS         16.1772          1     33708   16.030     0.000    16.020        0.000
Box's M    590.5288         36    37807   15.090     0.000    543.790       0.000

Since the probability levels are below 0.05 for all the variables except PB and EBITM, these variables are concluded to have significantly unequal variances between the groups. For PB (0.221) and EBITM (0.370) the hypothesis of equal variances cannot be rejected, so they pass the test. Box’s M test is significant at any significance level with a probability of 0.000, indicating that the equality of covariance matrices is not fulfilled. The tests for the covariance matrices and the variances are affected by the non-normality of the variables.

5.5 Classification capability

The classification results were weak for the basic sample, with a total classification rate of 56 % (Table 7). The classification is practically at the level of a random guess, that being 50 %. There are 22 cheap companies that are classified as expensive (type I error) and 25 expensive companies that are classified as cheap (type II error). The error rates are 41 % and 46 % for types I and II. As discussed earlier, the assumptions of multivariate normality and equality of variances were violated severely, which is likely to weaken the classification results.

Table 7 Classification count table for basic sample.
Actual      Predicted Group I   Predicted Group II   Total
Group I     32                  22                   54
Group II    25                  29                   54
Total       57                  51                   108

Overall classification rate: 56 %

Wilks’ Lambda is used to test the significance of the discriminant function as a whole (Table 8). Wilks’ Lambda is a direct measure of the proportion of the variance in the combination of dependent variables that is not accounted for by the grouping variable. Wilks’ Lambda is about 0.87, which suggests that the group means are not substantially different from each other. The F-value is an approximate transformation of Wilks’ Lambda, and the null hypothesis is that the groups are not statistically different from each other. The null hypothesis cannot be rejected at the significance level of 0.05, thus the discriminant function is not statistically significant, although the probability level of 0.0737 is not far from being significant.

The square of the canonical correlation can be used as a measure of the practical significance of the discriminant function. The squared canonical correlation equals 0.1311 (Table 8), meaning that about 13 % of the variation between the two groups is accounted for by the discriminating variables, which appears to be quite unimpressive.

Table 8 The significance of the discriminant function of the basic sample of 108 companies.

Canon Corr      0,3621
Canon Corr2     0,1311
F-Value         1,9
Numer DF        8
Denom DF        99
Prob Level      0,0737
Wilks' Lambda   0,868901

There is no analytical way of defining how high is “high” for the practical significance, but it is similar to R-squared in multiple regression and it is used to determine whether the relationship is strong on a relative basis.

Frank et al. (1965) studied the biases in discriminant analysis and suggested using split sample validation. The sample is randomly divided into five distinct subsets, each consisting of 14 cheap stocks and 14 expensive stocks. The prediction is performed for the remaining 80 companies based on the coefficients acquired from the sub-sample of 28 companies.
All five prediction accuracies (Table 9) are in line with the original classification rate of 56 %, the final sub-sample even outperforming the original rate.

Table 9 Split sample validation for the basic sample

             Sub 1    Sub 2    Sub 3    Sub 4    Sub 5
Prediction   53 %     55 %     54 %     51 %     58 %
t-value      3.3      2.6      1.3      5.2      1.2
p-value      0.001    0.005    0.094    0.000    0.115

The t-values are based on a test of whether the proportion of correctly classified cases in the sample is significantly different from the proportion that would be obtained by chance. The intuitively clear boundary is 50 %, but the test takes into account the number of observations in the prediction. According to Frank, the t-value is biased in the direction of showing greater prediction rates than there would be in the whole population, but the magnitude of the bias decreases as the sample sizes become larger.

The test values are calculated with the equation

\[ t = \frac{\text{proportion correctly classified} - P}{\sqrt{P(1-P)/n}} \qquad (27) \]

P is the theoretical likelihood of belonging to each group, which is 0.5 in the two-group case. The number of observations predicted is n (80).

5.6 Variable influence section

This time, the Wilks' Lambda test is used to test the statistical significance of discriminant functions that consist of only one variable at a time, and also to test the effect on significance of removing one variable at a time from the full profile. The variable influence section gives clues about the most and least discriminating variables, which is important especially when the ratio profile is not fixed. If the stepwise procedure is used to try out different combinations of variables, the variable influence section gives guidelines for the entry and removal probabilities of the variables. The variable influence results are shown in Table 10.

Table 10 Variable influence for the basic sample of 108 companies.
Variable   Removed    Removed   Removed    Alone      Alone     Alone      R-Squared
           Lambda     F-Value   F-Prob     Lambda     F-Value   F-Prob     Other X's
PE         0.985312   1.48      0.227325   0.998588   0.15      0.699459   0.123465
GEA        0.997176   0.28      0.597645   0.996909   0.33      0.567667   0.323457
PB         0.997792   0.22      0.640783   0.997094   0.31      0.579475   0.694694
FCPY       0.929438   7.52      0.007258   0.951233   5.43      0.021637   0.857286
CPY        0.958672   4.27      0.041452   0.97993    2.17      0.143597   0.859114
ROCE       0.989552   1.05      0.309091   0.99467    0.57      0.452727   0.019774
EBITM      0.969838   3.08      0.082405   0.99556    0.47      0.493228   0.327410
PS         0.980207   2         0.160534   0.987369   1.36      0.246839   0.752938

The alone statistical significances reveal that only FCPY is statistically significant (0.021637) at the significance level of 0.05. The significance means that the null hypothesis of equal group centroids is rejected. The second best alone F-probability is 0.143597 for CPY, but the null hypothesis remains in force. The third most significant is PS with a probability of 0.246839, but the rest of the alone probabilities are a lot weaker, PE, PB and GEA being the weakest discriminators.

The removed Wilks' Lambda is computed to test the impact of removing the variable from the profile. The impact is statistically significant for FCPY and CPY at the significance level of 0.05, which is in line with their strong alone F-probabilities. EBITM is also close to being significant with a probability of 0.082405. If removed, PB and GEA have the weakest effect.

The last column in Table 10 is the R-squared value that would be obtained if the variable in question were regressed on all the other independent variables. Values higher than 0.99 suggest severe multicollinearity among the variables, and removal is advised for variables with such high R-squared values. The variables that performed well in the two previous tests are now the ones most correlated with the rest of the variables, with values of 0.857286 and 0.859114.
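The F-values reported alongside the Lambda values can be reproduced with the usual exact conversion for a single discriminant function. The sketch below is a minimal check, assuming degrees of freedom of 1 and n - 2 for a variable used alone, 1 and n - p - 1 for a variable removed from the full profile, and p and n - p - 1 for the full function (n = 108 companies, p = 8 ratios). The function name `wilks_to_f` is illustrative and is not taken from the software used in the Thesis.

```python
def wilks_to_f(wilks_lambda, df_num, df_den):
    """Convert Wilks' Lambda to an F-value (two-group case, one discriminant function)."""
    return (1.0 - wilks_lambda) / wilks_lambda * df_den / df_num


n, p = 108, 8  # basic sample size and number of ratios in the full profile

# Full discriminant function (Lambda from Table 8)
f_full = wilks_to_f(0.868901, p, n - p - 1)
# FCPY used alone (Alone Lambda from Table 10)
f_fcpy_alone = wilks_to_f(0.951233, 1, n - 2)
# FCPY removed from the full profile (Removed Lambda from Table 10)
f_fcpy_removed = wilks_to_f(0.929438, 1, n - p - 1)

print(round(f_full, 2), round(f_fcpy_alone, 2), round(f_fcpy_removed, 2))
```

Under these assumptions the three values agree with the tabulated F-values of 1.9, 5.43 and 7.52 to the precision shown in the tables.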
The variables are regressed on all the other variables, but the cash flow ratios can be concluded to cause most of the correlation with each other, as their R-squared values are almost equal. The values are not alarming, but the possibility that one ratio is a linear combination of the others has to be taken into account. ROCE is the most independent variable and PE the second. The higher values for the cash flow ratios can be explained by the similarities in the way the ratios are calculated and, in the end, all the variables are connected through the balance sheet.

The standardized canonical coefficients (same as standardized discriminant scores) aid the interpretation of the variates by showing the weight given to each variable in the construction of the score (Table 11). The standardized coefficients are analogous to standardized beta coefficients in multiple regression analysis. The cheap group has scores below zero and the expensive group above zero.

Table 11 Standardized canonical coefficients for the basic sample.

FCPY    CPY    PS     EBITM   PE      ROCE   PB      GEA
-1.96   1.51   0.78   -0.59   -0.36   0.29   -0.23   -0.18

It is interesting that the largest absolute weight is on FCPY and the second largest on CPY, but they have opposite signs. The PS weight of 0.78 and the EBITM weight of -0.59 are also relatively large, but the rest of the weights are quite equal.

The correlation coefficients between the discriminant scores and the discriminator variables help to interpret the relative contribution each variable has on the discrimination. The result matrix is called the structure matrix:

Table 12 Structure matrix for the basic sample.

FCPY    CPY     PS     ROCE   EBITM   GEA     PB     PE
-0.58   -0.37   0.29   0.19   -0.17   -0.14   0.14   -0.10

The cash flow ratios are the two variables most correlated with the discriminant scores, but their mutual relation is ambiguous because this time both correlations have a negative sign.
The negative correlation of FCPY suggests that the cheap stocks have a relatively strong FCPY ratio, because the cheaper the stock, the more negative the discriminant score. A high FCPY ratio can easily be accepted as a common characteristic of a cheap stock. According to the standardized weights, the CPY ratio could be interpreted as a moderating factor for the FCPY ratio, but the interpretation becomes more complex when its negative correlation with the discriminant scores is accounted for. A cautious guess is that CPY moderates the FCPY ratio a little but that the combined effect remains positive for the cheap stocks.

The PS ratio shows its influence on the discriminant scores with the third place in both the tables above. The positive correlation with the discriminant score suggests high PS ratios for the expensive stocks, which is understandable as low sales result in a high PS ratio. Therefore, high sales figures (and thus low PS ratios) can be accepted as a characteristic of cheap stocks. The conclusion is consistent with the much debated FCPY ratio as well, because weak sales figures are not likely to cause a strong FCPY.

The success of FCPY and CPY is repeated in the dichotomous classification test for the basic sample (Table 13).

Table 13 Dichotomous classification rates of the basic sample of 108 companies.

Variable   Group I   Group II   Overall
PE         17 %      81 %       49 %
GEA        33 %      63 %       48 %
PB         74 %      35 %       55 %
FCPY       52 %      76 %       64 %
CPY        24 %      87 %       56 %
ROCE       76 %      15 %       45 %
EBITM      35 %      67 %       51 %
PS         76 %      26 %       51 %

The overall classification rate is at the level of random guessing, with the exception of FCPY, which classified 64 % of the companies in the basic sample correctly. There seems to be a trend of predicting expensive stocks more accurately than the cheap ones. In an investing situation, avoiding the type I error is more important, because investing in expensive stocks based on an incorrect classification can be disastrous. There are exceptionally high classification rates in group II for PE, FCPY and CPY.
On the other hand, the overall classification rates are evened out because these ratios classify almost all the companies as expensive.

5.7 Prediction to hold-out sample

The prediction capability is tested with a hold-out sample of 27 cheap companies and 27 expensive companies from the following years, 1997-1999. The companies have the same selection procedure and the same market capitalization limits as in the basic sample.

As expected, the prediction rate is weaker than the original classification rate, reaching only 44 % (Table 14). The discriminant function managed to predict 15 out of 27 cheap companies, while it predicted only 9 out of 27 expensive companies. Thus, the type I error of 67 % increased by 21 percentage points, but the type II error of 44 % increased by only 3 percentage points. From a practical point of view, the high type I error is hazardous, because 67 % of the expensive stocks are thought of as cheap. The explanation for the type I error could be a too short statistical distance between the groups, while the discriminant function leans towards group I. Possible extreme points in the basic sample might also affect the classification rate.

Table 14 Prediction results of the hold-out sample.

                     Actual
Predicted    Group I   Group II   Total
Group I         15        18        33
Group II        12         9        21
Total           27        27        54
Prediction rate: 44 %

5.8 Prediction to another sector

Yet another prediction is made to another sector in order to assess the universal prediction capability. The sector is the non-cyclical consumer goods sector in the U.S., and the sample is formed by a procedure similar to that of the other samples. The non-cyclical sample consists of 60 companies, 30 cheap ones and 30 expensive ones, 10 for each year in 1991-1996.

The prediction rate is what was expected: 43 % with error rates of 69 % and 73 %. Interpreting rates below the random rate of 50 % is quite useless, but the prediction rate can be compared to the internal classification rate of the non-cyclical sector.
A discriminant function is formed from the 60 companies in the non-cyclical sector, and the same 8 explanatory variables are included. This quick fit results in 67 % discrimination between the groups, and it is cross-validated to 50 %. The inter-sector prediction failed as badly as the prediction to the hold-out sample, although the non-cyclical sector itself could be discriminated more accurately.

The estimated discriminant function coefficients for the basic sample are in Table 15.

Table 15 Discriminant function for the basic sample.

Constant   PE        GEA       PB        FCPY      CPY      ROCE     EBITM     PS
0.8836     -0.0045   -0.0014   -0.0850   -9.6840   2.3117   0.0040   -0.0923   0.4616

6 Sensitivity analysis

The predictive accuracy obtained from the unprocessed data was weak. The assumptions of normality and equality of covariance matrices were severely violated and, next, the data will be processed to fulfill the assumptions, at least approximately. The sensitivity chapter focuses on the classification and prediction rates as the variables approach the normal distribution and as different ratio profiles are tried. The procedure is more or less trial and error because of the endless possibilities of executing the analysis.

6.1 Excluding outliers

There are opposing viewpoints on whether or not it is wise to remove extreme points from the data. The grounds for and against the exclusion are not studied in this Thesis; the focus is on the effects of the exclusion on the classification and prediction assignments.

The variable normality is approached by excluding outliers one by one. Outliers are excluded by observing the multivariate probability plot and the individual probability plots at the same time. In many cases, the most extreme univariate outliers turned out to be multivariate outliers as well. Research has shown that the significance level is not appreciably affected if the group sizes are equal (Sharma 1996), even if the covariance matrices are not equal.
According to the same source, every effort should be made to keep the group sizes equal.

A total of 25 outliers are excluded based on our graphical assessment. The outliers correspond to only 18 companies, because some companies are outliers in several dimensions. The current exclusion is suitable, because the same number of companies (9) is removed from both groups and because the sample size is not reduced too much. The original sample is reduced by 17 %, and from now on the reduced sample is called the trimmed sample.

As expected, the overall variation dropped dramatically for almost all the variables (Table 16) as the outliers were excluded. FCPY, CPY and ROCE experienced the heaviest drops in the coefficient of variation, more than 230 percentage points each, with PE following at 190 percentage points. The lightest drop was for EBITM, only 3.6 percentage points.

Table 16 Coefficients of the overall variance development.

Variable   CV 108 Basic   CV 90 Trimmed
PE         2.473          0.512
GEA        2.455          1.951
PB         0.911          0.478
FCPY       4.201          1.889
CPY        3.604          0.785
ROCE       3.405          0.528
EBITM      0.572          0.536
PS         1.317          0.862
MCAP       0.759          0.717

6.1.1 Normality

The exclusion of the 18 companies made four out of 16 groups of variables normally distributed at a significance level of 0.05 according to the Wilks-Shapiro test (Table 17).

Table 17 Wilks-Shapiro variable normality test for the trimmed sample.
Variable          Test Value   Prob Level   Decision 5%
PE Group I        0.789        0.000001     Reject Normality
PE Group II       0.924        0.005698     Reject Normality
GEA Group I       0.878        0.000208     Reject Normality
GEA Group II      0.984        0.796612     Accept Normality
PB Group I        0.862        0.000077     Reject Normality
PB Group II       0.932        0.011279     Reject Normality
FCPY Group I      0.875        0.000171     Reject Normality
FCPY Group II     0.961        0.132815     Accept Normality
CPY Group I       0.821        0.000007     Reject Normality
CPY Group II      0.976        0.478296     Accept Normality
ROCE Group I      0.988        0.915369     Accept Normality
ROCE Group II     0.943        0.028268     Reject Normality
EBITM Group I     0.837        0.000017     Reject Normality
EBITM Group II    0.930        0.009540     Reject Normality
PS Group I        0.824        0.000008     Reject Normality
PS Group II       0.755        0.000000     Reject Normality

The GEA ratio in group II and ROCE in group I achieved strong normality after the exclusion of the outliers, with probability levels of about 0.80 and 0.92. CPY and FCPY in group II achieved probability levels of about 0.48 and 0.13. Other ratios close to being accepted are PE, PB and ROCE in group II. The PS ratio in group II is the only one to remain at a probability level of zero. The variable normality improved comfortably on a univariate basis.

The multivariate normality is again assessed with the help of the plot of Chi square values against Mahalanobis distances (Figure 5).

Figure 5 Chi Square plot for the trimmed sample.

Based on a visual examination of the Chi square plot, multivariate normality cannot be accepted. The plot could be seen as partly linear if it were cut in two at a Mahalanobis distance of around 7. The first half is steeper than the second half, and the values are concentrated at the small end. The correlation between the Mahalanobis distances and the Chi square values is 0.934, which is much higher than the 0.740 for the basic sample. Yet the correlation is smaller than the critical value of 0.985 and the plot is not linear. The normality assumption was rejected, but the plot improved in terms of linearity.
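The correlation check behind the Chi square plot can be sketched as follows. This is a minimal illustration and not the procedure of the statistical software used in the Thesis: the squared Mahalanobis distances are taken as given (their computation requires the inverse of the sample covariance matrix and is omitted), the plotting positions (i - 0.5)/n are an assumption, and the Chi square quantiles are obtained with the Wilson-Hilferty approximation rather than an exact inverse. The helper names are hypothetical, and the demonstration data are synthetic.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(prob, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chi2_plot_correlation(sq_distances, df):
    """Correlate ordered squared Mahalanobis distances with chi-square quantiles."""
    ordered = sorted(sq_distances)
    n = len(ordered)
    quantiles = [chi2_quantile_wh((i + 0.5) / n, df) for i in range(n)]
    return pearson(ordered, quantiles)


# Synthetic illustration with 90 observations and 8 variables:
# distances that follow the quantiles exactly, and the same set with one extreme point.
ideal = [chi2_quantile_wh((i + 0.5) / 90, 8) for i in range(90)]
with_outlier = ideal[:-1] + [10 * ideal[-1]]

r_ideal = chi2_plot_correlation(ideal, 8)
r_outlier = chi2_plot_correlation(with_outlier, 8)
CRITICAL = 0.985  # critical value quoted in the text
print(r_ideal >= CRITICAL, r_outlier >= CRITICAL)
```

The synthetic example only illustrates why a single extreme observation can pull the correlation well below the critical value, which is in line with the improvement seen after the outliers were excluded.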
6.1.2 Equality of covariance matrices

The equality of covariance matrices was earlier reasoned to be dependent on the normality of the variables. Now that the normality assumption is a little closer to being satisfied, the equality of the covariance matrices and variances is tested again with the Bartlett-Box test. The null hypothesis that the within-group variances are similar is accepted for GEA, PB, CPY, ROCE and EBITM (Table 18). If probabilities in the range 0.01-0.05 are considered to indicate marginally equal variances between the groups, PE has marginally equal variances. PS is also close to having marginally equal variances with the value 0.009. FCPY remains at the probability level of zero, which makes it the number one suspect for the rejection of the Box's M test. The F approximation of the test was rejected at the significance level of 0.05, but the value, 0.006, rose promisingly from zero. As discussed earlier, the equality of the covariance matrices was found to improve the type I error rate.

Table 18 Bartlett-Box tests for the equality of the covariance matrices for the trimmed sample.

Variable   Bartlett Value   DF1   DF2     F Approx   F Prob   Chi2 Approx   Chi2 Prob
PE          4.0201          1     23232    3.980     0.046     3.970        0.046
GEA         0.6414          1     23232    0.630     0.426     0.630        0.426
PB          0.0112          1     23232    0.010     0.916     0.010        0.916
FCPY       13.4873          1     23232   13.340     0.000    13.330        0.000
CPY         2.9138          1     23232    2.880     0.090     2.880        0.090
ROCE        1.0398          1     23232    1.030     0.311     1.030        0.311
EBITM       0.0599          1     23232    0.060     0.808     0.060        0.808
PS          6.8076          1     23232    6.730     0.009     6.730        0.009
Box's M    67.6068          36    26057    1.700     0.006    61.160        0.006

A quick trial of the Box's M test without FCPY revealed that without it, the equality of covariance matrices is accepted with an F-probability of 0.055. With the original basic sample, the coefficients of variation for FCPY were 2.918 for group I and 20.179 for group II. In the trimmed sample, the coefficients were 1.431 and 2.873, respectively.
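The relation between the columns of the Bartlett-Box table can be illustrated with a short sketch. It assumes that the "Bartlett Value" column is the uncorrected Bartlett statistic and that "Chi2 Approx" is obtained by dividing it by Bartlett's standard small-sample correction factor; the exact correction applied by the software used in the Thesis is not documented here, so small differences in the last decimal are possible. The function names are illustrative.

```python
import math


def bartlett_correction(group_sizes):
    """Bartlett's small-sample correction factor C for k groups."""
    k = len(group_sizes)
    total_df = sum(n - 1 for n in group_sizes)
    inv_sum = sum(1.0 / (n - 1) for n in group_sizes)
    return 1.0 + (inv_sum - 1.0 / total_df) / (3.0 * (k - 1))


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# PE in the trimmed sample: Bartlett value 4.0201, 45 companies in each group
chi2_pe = 4.0201 / bartlett_correction([45, 45])
p_pe = chi2_sf_1df(chi2_pe)
print(round(chi2_pe, 2), round(p_pe, 3))
```

Under these assumptions the corrected statistic and its probability are close to the tabulated values of 3.970 and 0.046 for PE. With two groups the test has one degree of freedom, which is why the tail probability can be written with the complementary error function.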
The gap between the coefficients of variation narrowed, but the coefficient is still twice as large in group II as in group I. In favor of the FCPY ratio, it was the only one to reject the null hypothesis of equal group centroids when considered alone, earlier with the basic sample. The FCPY ratio is a strong but temperamental predictor.

6.1.3 Classification capability

The classification capability improved by 10 percentage points to 66 % (Table 19). The type I error is 33 % and the type II error 36 %, compared to 46 % and 41 % prior to the exclusion of the outliers. The classification results improved from the level of random guessing to the level of being indicative.

Table 19 Classification count table for the trimmed sample.

                     Actual
Predicted    Group I   Group II   Total
Group I         29        15        44
Group II        16        30        46
Total           45        45        90
Classification rate: 66 %

The exclusion of the extreme points from the basic sample has a moderate effect on the classification rate. The improvement is understandable, because such extreme values easily bias the discriminant function in one direction or another. High within-group variance is very harmful in discriminant analysis; the differences should be between the groups.

6.1.4 Variable influence

The FCPY and CPY ratios are significant predictors if used alone. They both achieved statistically significant discriminant functions, with probabilities of 0.004 and 0.008 (Table 20). On the other hand, they are easier to regress on the other variables than the rest of the variables, except the PS ratio. The multicollinearity situation has changed, because earlier the R-squared values were almost equal for the two cash flow ratios. EBITM and GEA were the ratios that weakened as alone predictors, but the rest of them improved somewhat.

The rank 1 alone classifier, FCPY, is still the best alone predictor. However, the removal F-probability of FCPY weakened from the previous 0.007 to 0.339. According to this test, PE (0.082) and ROCE (0.098) hold the top positions in the list of ratios not to be removed from the full ratio profile.
Table 20 Variable influence section for the trimmed sample of 90 companies.

Variable   Removed    Removed   Removed    Alone      Alone     Alone      R-Squared
           Lambda     F-Value   F-Prob     Lambda     F-Value   F-Prob     Other X's
PE         0.963179   3.1       0.082236   0.997806   0.19      0.66109    0.437305
GEA        0.987205   1.05      0.30859    0.999454   0.05      0.826927   0.380058
PB         0.986721   1.09      0.299554   0.985097   1.33      0.2517     0.432491
FCPY       0.988714   0.92      0.339121   0.911228   8.57      0.004343   0.635176
CPY        0.992533   0.61      0.437293   0.922106   7.43      0.007725   0.718753
ROCE       0.96662    2.8       0.098289   0.987987   1.07      0.303772   0.360985
EBITM      0.987048   1.06      0.305624   0.999992   0         0.979479   0.616409
PS         0.975639   2.02      0.15882    0.978474   1.94      0.167615   0.725707

6.1.5 Out-of-sample prediction capability

Hold-out sample prediction is supposed to resemble a real decision-making situation and, thus, the normality and the equality of covariance matrices are not checked. Hold-out samples are rarely processed in the earlier studies, either.

The prediction rate of the trimmed sample to the hold-out sample reveals more surprising results (Table 21). The overall prediction rate jumped from the non-informative 44 % to 61 %. The prediction rate is not high enough to be used in practice, but the improvement is promising. The type I error is 48 % and the type II error is a creditable 30 %. The trend of leaning towards group I seems to exist in the trimmed sample as well as earlier with the original sample. The lean might also be caused by differences in the ratios in general, because the hold-out sample comes after the trimmed sample in time.

Table 21 Prediction count table of the trimmed sample to the hold-out sample.

                     Actual
Predicted    Group I   Group II   Total
Group I         19        13        32
Group II         8        14        22
Total           27        27        54
Prediction rate: 61 %

The removal of the 18 companies with extreme characteristics turned out to be beneficial for the classification and prediction rates.
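The rates quoted for the hold-out prediction follow directly from the count table. The short sketch below assumes the convention used in this section, where rows are predicted groups and columns are actual groups, the type I error is the share of expensive (group II) companies predicted as cheap, and the type II error is the share of cheap (group I) companies predicted as expensive. The function and argument names are illustrative.

```python
def classification_rates(pred1_act1, pred1_act2, pred2_act1, pred2_act2):
    """Return (overall rate, type I error, type II error) from a 2x2 count table."""
    n_cheap = pred1_act1 + pred2_act1       # actual group I
    n_expensive = pred1_act2 + pred2_act2   # actual group II
    overall = (pred1_act1 + pred2_act2) / (n_cheap + n_expensive)
    type_i = pred1_act2 / n_expensive       # expensive predicted as cheap
    type_ii = pred2_act1 / n_cheap          # cheap predicted as expensive
    return overall, type_i, type_ii


# Counts from Table 21 (trimmed sample predicting the hold-out sample)
overall, type_i, type_ii = classification_rates(19, 13, 8, 14)
print(f"{overall:.0%} {type_i:.0%} {type_ii:.0%}")
```

With the counts of Table 21 this reproduces the 61 % prediction rate and the error rates of 48 % and 30 % quoted above.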
The research continues with both samples, the original basic sample (108 observations) and the trimmed sample (90 observations), in order to test the variable transformations and ratio profiles on them. The trimmed sample has a lead over the basic sample because of the significant improvement in the classification and prediction rates.

6.2 Variable transformations

There is a great variety of mathematical transformations for achieving variable normality. A variable can be multiplied, squared, raised to a power, converted to a logarithmic scale, inverted, etc. Data transformations should be utilized only when there is a clear reason to do so, and keeping them to a minimum reduces the chance of drawing incorrect conclusions. The impacts of three basic transformations are examined in this regard: the square root, the logarithmic and the inverse transformation.

The square root transformation is the most common transformation and it is the most suitable for counts. It can be applied to distributions that look quite like the normal distribution but lean to the left; in other words, the distribution has a relatively large number of small values. As the square root cannot be taken of negative values, a constant is added to the values to make them all positive. Besides that, the square root is highly nonlinear for values between zero and one, so the minimum should be set higher than that.

The logarithmic transformation is commonly used for proportions, but with counts it can be used similarly to the square root transformation when the distribution leans left exceedingly. The base of the logarithm changes the nature of the transformation, and the natural base e is deemed suitable for the variable values in the Thesis. Greater bases for the logarithmic transformation, such as 10 and 100, will result in a loss of resolution with smaller values.
The logarithm cannot be taken of negative values either; a constant must be added to level up the values.

The inverse transformation makes very small values very large and very large values very small. This transformation has the effect of reversing the order of the scores and, therefore, the values must first be reflected by multiplying them by -1 and adding a constant to bring the minimum back above 1.0. The inverse transformation is suitable for variables whose distribution looks like a steep leftward slope. That means that there is a large number of small values and the number of larger values decreases rather linearly.

The logarithmic transformation also works for data that has a growing residual as the value of the variable grows. The three transformations were introduced in the order of their power, starting from the weakest. Following the guidance of the literature (Osborne 2002), all the variables are anchored to a minimum of 1.0.

6.2.1 Transformations and the basic sample

All the variables are first transformed in order to perform an exploratory analysis. Univariate normality, multivariate normality and the classification rates are in focus.

The probabilities of the normality test indicate that there are three variable groups significantly normal after the square root transformation, five after the logarithmic transformation and five as well with the inverse transformation (Table 22). The probabilities without the transformations are very close to zero because of the outliers, and they improved slightly as a consequence of the transformations. On the other hand, the classification rates improved well above the original level of 56 %. The classification rate improves as the transformation gets stronger, the inverse transformation resulting in a creditable improvement of 9 percentage points to 65 %.
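The three transformations described in Section 6.2 can be sketched as follows. This is one possible reading of the description and of the anchoring to a minimum of 1.0, not the exact procedure used in the Thesis: the shift constants are assumptions, the function names are illustrative, and the input values are made up for demonstration.

```python
import math


def anchor_to_one(values):
    """Shift the values so that the smallest one equals 1.0."""
    shift = 1.0 - min(values)
    return [v + shift for v in values]


def sqrt_transform(values):
    """Square root of the anchored values."""
    return [math.sqrt(v) for v in anchor_to_one(values)]


def log_transform(values):
    """Natural logarithm of the anchored values."""
    return [math.log(v) for v in anchor_to_one(values)]


def inverse_transform(values):
    """Reflect the anchored values, then invert, so the original order is kept."""
    anchored = anchor_to_one(values)
    top = max(anchored)
    # Reflection: multiply by -1 and add a constant so the minimum is 1.0 again
    reflected = [top + 1.0 - v for v in anchored]
    return [1.0 / v for v in reflected]


ratios = [-3.0, 0.0, 5.0]  # hypothetical ratio values, including a negative one
roots = sqrt_transform(ratios)
logs = log_transform(ratios)
inverses = inverse_transform(ratios)
print(roots, logs, inverses)
```

For the made-up input the anchored values are 1, 4 and 9, so the square roots are 1, 2 and 3, and the reflected inverse keeps the original ordering of the observations.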
The prediction rates improve quite linearly with the classification rates, but even the highest rate of 54 %, achieved with the inverse transformation, is too close to the random guess rate to be considered noteworthy.

Table 22 Wilks-Shapiro normality test after transformations for the basic sample.

Variable              Original   SQRT       LN         INV
PE Group I            0.000000   0.000000   0.000006   0.000000
PE Group II           0.000000   0.000000   0.000042   0.027550
GEA Group I           0.000000   0.000000   0.000161   0.181271
GEA Group II          0.000080   0.058680   0.908347   0.001157
PB Group I            0.000000   0.000004   0.015348   0.000016
PB Group II           0.000000   0.000000   0.000647   0.286330
FCPY Group I          0.000000   0.000000   0.000000   0.000000
FCPY Group II         0.000021   0.000007   0.000003   0.000000
CPY Group I           0.000000   0.000000   0.000000   0.000000
CPY Group II          0.006998   0.020684   0.055728   0.275222
ROCE Group I          0.000221   0.027411   0.394959   0.025635
ROCE Group II         0.000000   0.000000   0.000000   0.000000
EBITM Group I         0.000012   0.005744   0.560079   0.000628
EBITM Group II        0.000218   0.156928   0.099314   0.000000
PS Group I            0.000000   0.000001   0.000539   0.847031
PS Group II           0.000000   0.000000   0.000002   0.057886
Classification Rate   56 %       59 %       62 %       65 %
Prediction Rate       44 %       46 %       50 %       54 %

The primary objective of the analysis of the transformations is to achieve higher classification rates through the implicit goal of improving the variable normality and the equality of the covariance matrices. Based on this objective, a test profile can be formed to represent the highest normality probabilities on a univariate basis. The test profile consists of inverted PE, PB, CPY and PS, logarithmic GEA, ROCE and EBITM, and plain FCPY, according to the probabilities from the Wilks-Shapiro test. The multivariate normality was earlier assessed with the correlation between the Mahalanobis distances and the corresponding Chi square values. For the test sample, the correlation is 0.822539.
The correlation is better than the original 0.740, but the multivariate normality is rejected, the critical value being 0.987 (Sharma 1996:466) for n=100 and alpha=0.05. The test profile yields a classification rate of 61 %, which outperforms the crude square root transformation. The logarithmic and the inverse transformations result in higher rates, and the process of deciding on the transformations based on their univariate normality is found useless.

The multivariate normality correlation was initially 0.740, and it improves to 0.769, 0.841 and 0.833 for the transformations in the order of their power (sqrt, ln and inv). The assumption of multivariate normality is rejected for all three transformations, as the critical value is 0.987.

The F approximations for the Box's M test remained at the zero level, thus none of the crude transformations improved the equality of the covariance matrices. As discussed earlier, the assumption is very sensitive to outliers and to multivariate normality.

Since the classification rates improved with the slight improvements in the normality assumptions, a simple sensitivity analysis is carried out in order to examine the effects of transforming one variable at a time. There is only one transformation with a negative effect on the classification rate: the square root of ROCE (Table 23). The transformations of the GEA ratio turned out to strengthen the classification rate by as much as 8 percentage points at best, with the logarithmic transformation. The inverse transformation improves the classification the most, and the square root and logarithmic transformations cause rather equal improvements.

Table 23 Classification rate: one transformation at a time.

Variable   SQRT    LN     INV
PE         2 %     0 %    1 %
GEA        2 %     8 %    7 %
PB         2 %     2 %    5 %
FCPY       1 %     1 %    0 %
CPY        4 %     4 %    6 %
ROCE       -1 %    2 %    3 %
EBITM      4 %     4 %    3 %
PS         1 %     2 %    4 %

The main conclusion is that nearly any of the variable transformations is beneficial for the classification rate of the basic sample.
The improvement might stem from the smoothing of the extreme distances between the observations by the transformations.

Dichotomous classification rates are also of interest and, despite their complex relationship with multivariate classification, they give hints of the discriminating power of the variables. The three transformations are tested on the eight variables and the dichotomous classification rates are compared to the original rate. There are three notable improvements among the dichotomous classification rates: 12 and 10 percentage points with the PE ratio and 11 percentage points with the ROCE ratio (Table 24). The three improvements are notable, but the levels reached are still weak. The logarithmic transformation raises PE to the rate of 61 % and ROCE to 56 %, the original level of the multivariate case. FCPY is originally the strongest classifier, but it weakens as it is transformed. The effects of the transformations vary quite a lot, but the logarithmic transformation is the only one that does not decrease any of the rates.

Table 24 Dichotomous classification rates after transformations.

Variable   Original   SQRT    LN     INV
PE         49 %       7 %     12 %   10 %
GEA        48 %       -1 %    7 %    9 %
PB         55 %       3 %     3 %    0 %
FCPY       64 %       -1 %    0 %    -4 %
CPY        56 %       5 %     2 %    0 %
ROCE       45 %       -4 %    11 %   6 %
EBITM      51 %       3 %     0 %    1 %
PS         51 %       4 %     5 %    0 %

The number of combinations that can be formed with eight variables and four choices for each is too large to go through manually. Yet, numerous combinations were tried out and they did not result in significantly higher classification rates than the 65 % that was achieved by the comprehensive inverse transformation. Even where a few percentage points were gained, the gain did not follow logically from the results depicted earlier.

Therefore, transformations are considered useful for increasing the classification rate with the basic sample, which includes univariate and multivariate outliers.
The inverse transformation was the most beneficial for the classification and prediction rates. Although the literature stresses that variable transformations must be well reasoned, a simple crude trial of the inverse transformation yielded the highest results.

6.2.2 Transformations and the trimmed sample

As with the basic sample earlier, all the variables are first transformed in order to see what happens. The univariate normality and the multivariate normality, as well as the classification and prediction rates, are examined. The Wilks-Shapiro test results and the classification rates are shown in Table 25. The last rows of the table reveal that when all the variables were transformed, none of the classification rates improved compared to the starting point of 66 %. The same goes for the prediction rates; none of them improved.

The exclusion of the outliers made four out of sixteen variable groups normally distributed at the significance level of 0.05. Taking the square root of the variables kept the same four variable groups at an acceptable level, and an additional three variable groups are also accepted as being normally distributed. The number of normally distributed variable groups is 10 and 6 for the logarithmic and inverse transformations, respectively. The variable normality is tested by groups, but if a transformation is accepted, both groups of the variable have to be transformed.

Table 25 Wilks-Shapiro normality test after transformations for the trimmed sample.
Variable              Trimmed    SQRT       LN         INV
PE Group I            0.000001   0.000453   0.057554   0.000013
PE Group II           0.005698   0.141455   0.347929   0.002027
GEA Group I           0.000208   0.026774   0.614070   0.274891
GEA Group II          0.796612   0.982204   0.473473   0.000139
PB Group I            0.000077   0.003091   0.087513   0.590774
PB Group II           0.011279   0.159392   0.639762   0.385379
FCPY Group I          0.000171   0.000618   0.002124   0.018725
FCPY Group II         0.132815   0.097257   0.068865   0.031599
CPY Group I           0.000007   0.000021   0.000063   0.000534
CPY Group II          0.478296   0.597056   0.702221   0.840173
ROCE Group I          0.915369   0.538298   0.105115   0.000341
ROCE Group II         0.028268   0.000220   0.000000   0.000000
EBITM Group I         0.000017   0.004104   0.312850   0.000624
EBITM Group II        0.009540   0.453685   0.038543   0.000000
PS Group I            0.000008   0.000368   0.015145   0.802802
PS Group II           0.000000   0.000013   0.000615   0.123910
Classification rate   66 %       61 %       62 %       59 %
Prediction rate       61 %       56 %       57 %       59 %

Now that the effects of the transformations on the normality have been examined, a test sample can be formed by choosing for each variable the transformation with the best sum of the normality test probabilities in Table 25. The test sample is then the most normal on a univariate basis, and the effects on the classification rate can be observed. The test sample consists of logarithmic PE, GEA and EBITM, inverted PB, CPY and PS, and plain FCPY and ROCE. The test sample classifies 62 % of the observations correctly.

Now there are only four variable groups that are not normally distributed at the significance level of 0.05, yet two of them are close: the logarithmic EBITM ratio in group II with a probability of 0.038543 and the ROCE ratio in group II with a probability of 0.028268. The weakest two are the FCPY ratio in group I (0.000171) and the inverted CPY ratio in group I (0.000534).

The correlation in the Chi square plot of the Mahalanobis distances is 0.9686, which means that the multivariate normality is rejected when compared to the critical value of 0.985 (Sharma 1996:466).
Nonetheless, the correlation improved from the earlier 0.934 for the trimmed sample. In general, both the univariate and the multivariate normality assumptions are very close to being met. The classification rate of the test sample weakened by four percentage points because of the transformations. Therefore, the method of selecting variable transformations based on univariate probabilities from the Shapiro-Wilk normality test is considered useless.

The equality of the covariance matrices did not improve earlier after transforming the variables in the basic sample. The probability of the F approximation of Box's M for the trimmed sample is 0.006; it improves slightly after the square root transformation (0.026) and the logarithmic transformation (0.007) but fades after the inverse transformation (0.000). Therefore, the assumption of equal covariance matrices was not met with any of the three transformations.

A simple sensitivity analysis, in which one variable at a time was transformed in the trimmed sample, resulted in equal or weaker classification rates compared to the 66 % of the non-transformed ratio profile (Table 26). According to the classification results, there are no grounds for any of the transformations taken in the test sample.

Table 26 Classification rates after transforming variables one at a time.

        PE     GEA    PB     FCPY   CPY    ROCE   EBITM  PS
SQRT     0 %   -3 %   -2 %    0 %    0 %   -3 %   -2 %   -4 %
LN      -2 %   -3 %   -4 %   -2 %   -2 %   -4 %   -2 %   -5 %
INV     -4 %   -2 %   -3 %    0 %   -2 %   -6 %   -5 %    0 %

The transformations did not improve the classification rate on a dichotomous basis (Table 27). The drops of 8 percentage points with PE and 6 percentage points with ROCE and EBITM can be considered harmful. The rest of the changes in the classification rate are rather marginal. The interesting question is why FCPY and CPY maintain their classification rates even when they are transformed. Bartlett's test for equality of variances points out that FCPY is furthest away from equality (Table 18).
CPY, in turn, was ranked fifth. This suggests that the violation of equal variances does not affect the classification rate, at least not very obviously. A comparison of the group I mean values to the group II mean values reveals that FCPY has the biggest gap (72 %) and CPY the second biggest (36 %), while the rest of the gaps range from 0.29 % to 28 %. Such a wide gap between the means makes FCPY a strong classifier, even if transformed.

Table 27 Effects of the transformations on the dichotomous classification rate.

Variable  PE     GEA    PB     FCPY   CPY    ROCE   EBITM  PS
Trimmed   63 %   56 %   59 %   63 %   53 %   56 %   52 %   56 %
SQRT      -2 %    1 %   -1 %    0 %    0 %   -1 %   -4 %    0 %
LN        -4 %    1 %   -1 %    0 %    1 %   -6 %   -6 %    1 %
INV       -8 %   -1 %   -2 %    0 %    3 %   -3 %   -3 %   -3 %

The variable transformations did not improve the discriminating power of the analysis (Table 27) even though the normality assumptions were better met. The assumptions of normality and of equal covariance matrices do not seem very strict in discriminant analysis. The assumptions can be looser because the ultimate output is a binary variable instead of a continuous variable.

6.3 Superior ratio profile

With eight variables under consideration, there are 255 distinct combinations (2^8 - 1 non-empty subsets). The purpose of this section is to strive for higher classification and prediction rates by trying out the different combinations. The three samples are the basic sample of 108 companies in its original and inverse-transformed versions, and the trimmed sample of 90 companies. The inclusion of the three samples enables a comparison of the classification and prediction rates after the two data processing techniques, outlier exclusion and variable transformation.

The gearing ratio was discussed with the instructor, and the conclusion was to try it out as such at first but also to substitute the ratio with a binary variable. Dummy variables are recommended for use with logit models rather than with discriminant analysis.
Since there is no clear reason not to use dummy variables, GEA MEDIAN is introduced. GEA MEDIAN simply divides the companies in the sample in two, half of them being geared and the other half free from debt. The reason for the substitute ratio is that slight differences in the gearing ratios do not affect the "goodness" of the companies, so such companies are considered equal. The companies that are heavily geared are, for their part, considered worse (different) than the rest of the companies. In other words, the additional information that the continuous gearing ratio offers was questioned and the simpler division was introduced. The new variable serves as a substitute for the original GEA ratio.

Substituting the original GEA ratio with the GEA MEDIAN ratio improved the classification and prediction rates enough for the substitution to be approved (Table 28). Both the inversed basic sample and the trimmed sample achieved 71 %, which is 15 percentage points higher than the original rate of 56 % for the basic sample. The cell size can be considered to affect the credibility of the results. Therefore, the 71 % for the inversed basic sample is more reliable than that for the trimmed sample, because the trimmed sample is 17 % smaller. Surprisingly, the prediction rate of the trimmed sample fell when the GEA MEDIAN substitute was introduced. So far, the inversed basic sample is deemed to have the best discriminating capability.

Table 28 GEA MEDIAN substitute; classification and prediction rates.

                 ORIGINAL GEA                 GEA MEDIAN
                 Classification  Prediction   Classification  Prediction
Basic            56 %            44 %         65 %            46 %
Inversed basic   65 %            44 %         71 %            61 %
Trimmed          66 %            61 %         71 %            57 %

The most discriminating ratio profile for the basic sample is achieved without ROCE and EBITM; thus PE, GEA MEDIAN, PB, FCPY, CPY and PS are included. The classification rate is 69 %, while the error rates are 33 % for type I and 28 % for type II.
The new ratio profile improved the prediction rate by 13 percentage points to 59 %, and the prediction errors are 30 % and 52 % for type I and type II. As can be seen in Table 28, higher rates can be achieved by outlier exclusion and variable transformation.

The trials of different variable combinations for the inversed basic sample did not yield higher classification rates than the 71 % of the full ratio profile. Equal classification rates can be achieved by leaving PE, PB or ROCE out, one at a time, or by leaving ROCE and EBITM out together. The four additional ratio profiles have an error distribution that leans more towards type I, so the original full profile is chosen for further examination. The errors are rather uniformly distributed, 28 % for type I and 30 % for type II, and the classification is strongly supported by the cross-validation of 67.6 % correct classifications. The cross-validation is of the leave-one-out kind, in which each observation is classified by the functions derived from all cases other than that case. Although the inversion is based on a trial rather than on a well justified variable normality approach, it yielded the second highest classification rate and the highest prediction rate.

The highest classification rate for the trimmed sample is 73 %. It is achieved by two different combinations, but the one with the higher type I error is discarded. The ratio profile excludes the EBITM and PS ratios, similarly to the basic sample, leaving PE, GEA MEDIAN, PB, FCPY, CPY and ROCE as explanatory variables. The type I error is 22 % while the type II error is 31 %, and the rate of 73 % is cross-validated as 61 % by the leave-one-out type of cross-validation. The cross-validation suggests that chance played some part in the high classification rate. The exclusion of the EBITM and PS ratios did not improve the prediction rate, which remained at 57 %. The prediction errors were 37 % and 48 %.
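The search over ratio profiles described in this section, combined with leave-one-out cross-validation, can be sketched as follows. This is a hedged illustration rather than the procedure used in the Thesis: to stay dependency-free, a nearest-centroid rule is swapped in for the linear discriminant function, and the data layout (a list of `(ratios, group)` pairs) and all function names are assumptions.

```python
from itertools import combinations

RATIOS = ["PE", "GEA", "PB", "FCPY", "CPY", "ROCE", "EBITM", "PS"]


def ratio_profiles(names):
    """Yield every non-empty combination of the given ratio names."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def nearest_centroid(train, cols, x):
    """Stand-in classifier (NOT LDA): pick the group with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted({lab for _, lab in train}):
        rows = [feat for feat, lab in train if lab == label]
        means = [sum(r[c] for r in rows) / len(rows) for c in cols]
        dist = sum((x[c] - m) ** 2 for c, m in zip(cols, means))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_rate(data, cols):
    """Classify each observation with a rule fitted on all other cases."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid(train, cols, x) == label
    return hits / len(data)


def best_profile(data, names):
    """Return the first profile with the highest cross-validated rate."""
    return max(ratio_profiles(names),
               key=lambda cols: leave_one_out_rate(data, cols))


# Eight ratios give 2**8 - 1 non-empty profiles.
print(len(list(ratio_profiles(RATIOS))))
```

A usage example with purely hypothetical data: `best_profile([({"A": 0.0, "B": 5.0}, "I"), ...], ["A", "B"])` returns the tuple of ratio names whose leave-one-out rate is highest. Replacing `nearest_centroid` with a proper discriminant function would bring the sketch closer to the analysis reported here.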
6.3.1 The proposed models for classifying and predicting

The trimmed sample with the GEA MEDIAN substitution and the exclusion of EBITM and PS resulted in the highest classification rate in this Thesis, 73 %. The highest prediction rate, 61 %, was achieved by the inversed basic sample with the GEA MEDIAN substitution. The two samples are now called the ultimate classifier and the ultimate predictor. The statistical and practical significance of the ultimate samples is compared in Table 29.

Table 29 Significances of the ultimate functions.

                          Canon Corr   Canon Corr ^ 2   Prob Level   Wilks' Lambda
The ultimate classifier   0.4123       0.17             0.0147       0.83001
The ultimate predictor    0.4362       0.1903           0.0059       0.809717

As expected, the null hypothesis can be rejected at an alpha level of 0.05, implying that in both cases the two groups are significantly different with respect to the explanatory variables taken jointly. Even though the discriminant functions are statistically significant, the difference between the groups might not be large, especially with large sample sizes. The practical significance of the contenders is measured by the square of the canonical correlation, which is equal to the share of the between-group variance in the total variance. The square of the canonical correlation is analogous to the R-squared in multiple regression; hence, it measures the strength of the discriminant function. The ultimate predictor sample is two percentage points higher than the ultimate classifier in the squared canonical correlation, but the 19 % cannot be considered impressive. There is a slight improvement from the 13 % that was achieved by the original basic sample.

Standardized coefficients are normally used for assessing the relative importance of the discriminator variables forming the discriminant function. Table 30 exhibits the coefficients for the two ultimate samples as well as for the basic sample with GEA MEDIAN for comparison.
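The figures in Table 29 can be cross-checked against each other. With two groups there is a single discriminant function, and in that case Wilks' lambda equals one minus the squared canonical correlation. The short sketch below applies this identity to the reported canonical correlations; the function and dictionary names are illustrative assumptions.

```python
def wilks_lambda(canonical_corr):
    """Wilks' lambda implied by a single canonical correlation (two groups)."""
    return 1.0 - canonical_corr ** 2


# Canonical correlations reported in Table 29.
TABLE_29 = {"ultimate classifier": 0.4123, "ultimate predictor": 0.4362}

for name, r in TABLE_29.items():
    print(f"{name}: r^2 = {r ** 2:.4f}, lambda = {wilks_lambda(r):.4f}")
```

Up to rounding, this reproduces the squared correlations and the Wilks' lambda values listed in Table 29.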
Table 30 Standardized coefficients for the ultimate functions.

Basic sample          The ultimate classifier   The ultimate predictor
FCPY         -1.56    CPY           0.66        INVERSED PE      -0.90
PS            1.13    GEA MEDIAN   -0.50        INVERSED CPY     -0.83
CPY           1.08    ROCE          0.49        INVERSED PS       0.70
EBITM        -0.52    PB           -0.43        GEA MEDIAN        0.67
PB           -0.48    PE            0.43        INVERSED FCPY    -0.59
GEA MEDIAN    0.48    FCPY          0.30        INVERSED ROCE    -0.32
PE           -0.37                              INVERSED EBITM   -0.30
ROCE          0.24                              INVERSED PB       0.01

The existence of multicollinearity is evident from the varying relative importance of the variables between the samples. For example, the FCPY ratio is the most important variable in the basic sample, but in the ultimate classifier sample (outliers excluded) it is the least important contributor. Similarly, the PE ratio is the most important in the ultimate predictor sample, compared to the second last position in the two other samples. A conservative conclusion can be drawn about the joint importance of the cash flow ratios, given the high ranking of the CPY ratio and the varying ranking of the FCPY ratio. Strong inferences regarding the importance of the discriminator variables should be avoided because of the multicollinearity present in the variables.

Another way to rank the discriminator variables is to compare the correlation coefficients between the discriminant score and each discriminator variable, which vary from +1 to -1. A correlation close to either end of the range indicates high communality between the discriminator variable and the discriminant score. The structure matrices of the two ultimate samples and of the basic sample with the GEA MEDIAN variable substitute are compared below.

Table 31 Comparison of the structure matrices.
Basic sample          The ultimate classifier   The ultimate predictor
FCPY         -0.54    FCPY         -0.69        INVERSED FCPY    -0.61
CPY          -0.34    CPY          -0.64        INVERSED CPY     -0.55
PS            0.27    GEA MEDIAN    0.35        INVERSED PE       0.36
GEA MEDIAN    0.27    PB            0.27        INVERSED ROCE    -0.29
ROCE          0.17    ROCE         -0.24        INVERSED PS       0.28
EBITM        -0.16    PE            0.10        INVERSED PB       0.28
PB            0.13                              GEA MEDIAN        0.23
PE           -0.09                              INVERSED EBITM   -0.22

The structure matrices support the conservative notion suggested earlier about the importance of the cash flow ratios. The FCPY ratio possesses the highest and the CPY ratio the second highest correlation with the discriminant scores in all three samples in Table 31. Although the variables are inversed in the ultimate predictor sample, the order of the FCPY and CPY ratios is the same as in the non-inversed samples. Compared to a value of 0.50, which is sometimes considered a cut-off value by researchers, the cash flow ratios are the only ones to be seriously correlated. The conclusion is that, in this regard, the cheap stocks have stronger cash flow ratios than the expensive stocks. The PE ratio is the least correlated with the discriminant score until it is transformed, which makes it the third most important variable. Furthermore, the GEA MEDIAN ratio slips from being mediocre in the basic and the ultimate classifier samples to second last in the ultimate predictor sample. The ambiguities in the rankings after variable transformations suggest that rigorous conclusions cannot be drawn from the composition of the discriminant scores.

The two ultimate functions are finally tested with the non-cyclical sector. Prediction to the external sample of 60 companies did not yield reasonable rates: 43 % for the ultimate classifier and 43 % for the ultimate predictor. The conclusion is that the prediction models formed in this Thesis are not capable of out-of-sample prediction to another sector, even though the external sample can be discriminated more accurately with the existing variables.
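The structure correlations compared in Table 31 rank the variables by their correlation with the discriminant score. A minimal sketch of that ranking is given below. It assumes that the discriminant scores are already available and uses plain total-sample Pearson correlations, whereas statistical packages often report pooled within-group correlations instead; the function names and the input layout are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def structure_ranking(scores, variables):
    """Rank variables by absolute correlation with the discriminant score.

    scores:    discriminant score of each observation
    variables: dict mapping a variable name to its observed values
    """
    loadings = {name: pearson(values, scores)
                for name, values in variables.items()}
    return sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)
```

Sorting by the absolute value keeps strongly negative correlations, such as those of FCPY and CPY in Table 31, at the top of the ranking.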
The estimated variable coefficients for the two ultimate functions:

Table 32 Two ultimate discriminant functions.

The ultimate classifier      The ultimate predictor
GEA MEDIAN    -1.001         INVERSED PE        8.454
PE             0.040         INVERSED PB       -0.050
PB            -0.347         INVERSED FCPY      8.281
FCPY           3.889         INVERSED CPY       9.165
CPY            7.782         INVERSED ROCE     22.978
ROCE           0.067         INVERSED EBITM     2.963
Constant      -1.407         INVERSED PS       -3.579
                             GEA MEDIAN        -1.330
                             Constant         -33.961

7 Discussion and conclusions

7.1 Evaluation of tested approaches

This Thesis examines the relationship between stock valuation and financial ratios. In the beginning, the classification rate of the discriminant analysis was 56 %, which is too weak to be of use in practice. Later on, as the data was transformed, various ratio combinations were analyzed and one new ratio was introduced, a classification rate of 73 % was achieved. This analysis suggests that the valuation level of a stock can be identified from combinations of financial ratios.

The sample forming procedure used in the Thesis is built on balanced assembly. The same number of companies were gathered for each year and for both groups. The major limitation of the procedure lies within the valuation phase. As the companies are valued according to the cash flows of the following seven years, the analysis lags seven years behind the present time. Shortening the time span was thought to cause volatility in the company valuation. Therefore, the whole valuation procedure should be redesigned if the time span is shortened.

Besides the current valuation method, the sample size is also limited by the market capitalization restriction of $300m-$3bn and by the exclusion of companies with incomplete history data. The size and quality of history data was poor before the 1990s. A larger sample would allow a wider separation between the two groups, because more intervening companies could be excluded.
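The seven-year lag discussed above follows from valuing each company by the cash flows it actually produced over the following seven years. The sketch below illustrates the idea only under strong assumptions: a plain present value without a terminal value, a hypothetical discount rate, and a simple rule that labels a stock undervalued when the present value exceeds its price. None of these details are confirmed by the text.

```python
def present_value(cash_flows, discount_rate):
    """Discount yearly cash flows, the first one being received in one year."""
    return sum(cf / (1.0 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def is_undervalued(price, cash_flows, discount_rate):
    """Assumed rule: undervalued if the discounted cash flows exceed the price."""
    return present_value(cash_flows, discount_rate) > price


# Hypothetical example: seven yearly cash flows per share and a 10 % rate.
flows = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
print(round(present_value(flows, 0.10), 2))
print(is_undervalued(5.0, flows, 0.10))
```

Shortening the list of cash flows shortens the lag, but it also makes the resulting value more sensitive to single-year fluctuations, which is the trade-off noted in the text.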
Classification and prediction rates improved as the data was harmonized by excluding outliers or by variable transformation. As the original sample was trimmed by 17 % by excluding multivariate and univariate outliers, the analysis was able to classify 66 % of the observations correctly. A similar improvement was achieved by transforming the variables with the basic transformations: square root, logarithmic and inverse. An optimal transformation for each of the variables was assessed based on trials, tests and histograms, but in the end the crude inverse transformation turned out to outperform all the other alternatives. By inversing all the variables, the classification rate improved to the level of 65 %. Thus, variable transformations and outlier exclusion are concluded to substitute rather than complement each other in data harmonization.

The substitution of the gearing ratio by a binary variable, GEA MEDIAN, further improved the discriminating capability. The entrant variable simply divided the gearing ratios in half, which improved the classification rates to the level of 71 % for both the trimmed and the inversed sample. Finally, various ratio combinations were analyzed and a classification rate of 73 % was achieved. The corresponding prediction rate to the hold-out sample is 57 %. The inversed basic sample maintained the second highest classification rate of 71 % and the highest prediction rate of 61 % to the hold-out sample. The prediction rates were further assessed with a sample from another sector, which resulted in 43 % for both functions. While the classification rate may be deemed satisfactory, the prediction capability of the formed functions is weak.

The conclusion is that variable transformations and outlier exclusion are useful tools for approaching multivariate normality and better discrimination results.
Variable transformations yielded ambiguous results, and the most discriminating results were achieved by a trial rather than by proceeding in a strictly analytical manner. The exclusion of outliers, for its part, is easy to execute and it improved the prediction capability, but the problem is the diminishing sample size. The assumption of equal covariance matrices was concluded to follow the normality assumption quite faithfully. Therefore, it seemed like another way to assess normality rather than a distinct requirement for discriminant analysis.

The conclusion is that growth stems from strong cash flow. The conclusion is in line with the real life situation because, after all, a company needs hard cash in order to grow.

The importance of an individual financial ratio is always of interest to economists. Throughout the Thesis, the most widespread attention was gained by the cash flow ratios, FCPY and CPY. The cash flow ratios were, as expected, closely related, with the highest pairwise correlation of 0.92. Despite the correlation, it was not beneficial to remove one or the other in almost any of the samples and ratio profiles. FCPY was the most and CPY the second most correlated variable with the discriminant scores. Both correlations were strong and negative, which indicates that the cheap stocks have stronger cash flow ratios. The dichotomous classification rate was strong for the cash flow ratios: 64 % for FCPY and 56 % for CPY with the original basic sample.

In summary, it was found that the interpretation of individual variables in a multivariate discriminant function is complicated and ambiguous. In some cases of the sensitivity analysis, the composition of a discriminant score varied significantly when the inputs were changed. This raises the question of whether it is relevant to examine the score composition instead of focusing only on the classification capability.
Despite the various analyses, it was difficult to piece together the behavior of the score compositions. That is why an iterative program would be needed to try out many, if not all, of the combinations that can be formed with the ratios and transformations. Probably because of the irregular data and the multicollinearity, the stepwise procedure did not result in consistent variable profiles. The stepwise procedure was found to require cleaner data because it selects the variables on the basis of a statistical criterion.

7.2 Suggestions for further study

For practical purposes, the time lag, currently seven years, should be shortened to just a couple of years. Additionally, if the model values companies based on data from, say, two years ahead, the prediction should be rather accurate two years into the future. The idea behind the long valuation time span was to protect the valuation from momentary price fluctuations. Because of the volatility of financial ratios and stock prices, the use of moving averages should be examined. It is possible that smoothed data is more informative than the original data and that the predictive capability improves. Furthermore, smoothed data might improve the robustness of the model, as the case-specific differences in the data sets might be reduced.

Tried and tested portfolios can also be used to adjust the discriminant function. Assuming that the portfolio consists of cheap stocks and that a group of expensive stocks is defined, the model could be used to classify the rest of the stocks in the light of the existing portfolio.

The financial ratios used in this Thesis were pre-determined. It would be interesting to examine and design different kinds of ratio selection procedures in order to find the superior discriminator combinations. Five out of the eight ratios in this Thesis were pure valuation ratios, meaning that they are compared to the market price.
A logical direction for broadening the ratio base would be towards ratios that are not compared to the market price. The ratio examination could also be expanded into qualitative measures, such as management competency, working environment and political risk. The inclusion of qualitative factors is not complicated, because many of them are already being measured by financial institutions and news agencies. Including all the available company financials would be a practical starting point for further study. An abundant ratio base and advanced mathematical optimization methods might result in models that would be of use for practical purposes.

There are several other multivariate techniques to study, such as logit and probit models as well as the more recent invention of neural networks. Neural networks are more complicated than the basic MANOVA techniques, and it is time-consuming to program the models to learn from past events. Nonetheless, researchers have achieved excellent bankruptcy prediction results with neural networks, and it would be interesting to apply them to the problem of identifying cheap stocks. Neural networks can be used to model complex relationships and patterns in data.

The identification of undervalued stocks could also be approached from the bottom up, with an agent-based simulation. A model is built starting from simple rules at the level of an individual company. The idea is to test how changes in individual behaviors affect the system as a whole, as the system consists of numerous heterogeneous companies. An individual agent has its own interests and limited knowledge, and it can, for example, learn from the past. A successful simulation generates behavior similar to empirical data, which helps in understanding the behavior of the stock markets. Agent-based modeling attempts to re-create and predict the behavior of complex systems that are not easily explained rationally.
Finally, various models could be combined into hybrid models that would give an output based on a certain weighting of the outputs from the individual models. Hybrid models are usually formed in order to achieve higher levels of prediction robustness, because the strengths of some models may offset the weaknesses of others. Prediction robustness could be surveyed extensively by expanding the analysis to stocks from various countries and sectors as well as to various time frames.

More complex models do not inevitably mean better prediction results. On the contrary, they often mean less transparency and a growing danger of overfitting. When expanding the research, the costs and benefits should be evaluated thoroughly.

8 References

Adnan, M. & Dar, H.A. 2006, "Predicting corporate bankruptcy: where we stand?", Corporate Governance: The International Journal of Effective Board Performance, vol. 6, no. 1, pp. 18.
Altman, E.I. 1968, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", The Journal of Finance, vol. 23, no. 4, pp. 589-609.
Altman, E.I. 1968, "The Prediction of Corporate Bankruptcy: A Discriminant Analysis", The Journal of Finance, vol. 23, no. 1, pp. 193-194.
Andersson, T. & Lee, E. 2006, "Financialized accounts: Restructuring and return on capital employed in the S&P 500", Accounting Forum (Elsevier), vol. 30, no. 1, pp. 21.
Baker, M. & Wurgler, J. 2002, "Market Timing and Capital Structure", The Journal of Finance, vol. 57, no. 1, pp. 1-32.
Beaver, W.H. 1966, "Financial Ratios as Predictors of Failure", Journal of Accounting Research, vol. 4, no. 3, pp. 71.
Blake, D. 2000, Financial Market Analysis, 2nd edn, John Wiley & Sons, New York.
Booth, P.J. 1983, "Decomposition Measures and the Prediction of Financial Failure", Journal of Business Finance & Accounting, vol. 10, no. 1, pp. 67.
Brockett, P.L., Golden, L.L., Jang, J. & Yang, C.
2006, "A Comparison of Neural Network, Statistical Methods, and Variable Choice for Life Insurers' Financial Distress Prediction", Journal of Risk & Insurance, vol. 73, no. 3, pp. 397.
Brockett, P.L., Cooper, W.W., Golden, L.L. & Pitaktong, U. 1994, "A Neural Network Method for Obtaining an Early Warning of Insurer Insolvency", The Journal of Risk and Insurance, vol. 61, no. 3, pp. 402-424.
Campbell, J.Y. & Shiller, R.J. 1998, "Valuation Ratios and the Long-Run Stock Market Outlook", Journal of Portfolio Management, vol. 24, no. 2, pp. 11.
Chan, L. & Lakonishok, J. 2004, "Value and growth investing: Review and update", Financial Analysts Journal, vol. 60, no. 1, pp. 71-86.
Damodaran, A. 2001, The dark side of valuation: valuing old tech, new tech, and new economy companies, Financial Times/Prentice Hall, New York.
DeFond, M.L. 2003, "An empirical analysis of analysts' cash flow forecasts", Journal of Accounting & Economics, vol. 35, no. 1, pp. 73.
Dimitras, A.I., Zanakis, S.H. & Zopounidis, C. 1996, "A survey of business failures with an emphasis on prediction methods and industrial applications", European Journal of Operational Research, vol. 90, no. 3, pp. 487.
Fama, E.F. 1970, "Efficient Capital Markets: A Review of Theory and Empirical Work", Journal of Finance, vol. 25, no. 2, pp. 383.
Fama, E.F. & French, K.R. 1995, "Size and Book-to-Market Factors in Earnings and Returns", The Journal of Finance, vol. 50, no. 1, pp. 131-155.
Fama, E.F. & French, K.R. 1992, "The Cross-Section of Expected Stock Returns", The Journal of Finance, vol. 47, no. 2, pp. 427-465.
Filliben, J. 1975, "Probability Plot Correlation Coefficient Test for Normality", Technometrics, vol. 17, no. 1, pp. 111-117.
Fisher, R.A. 1936, "The use of multiple measurements in taxonomic problems", Annals of Eugenics, vol. 7, pp. 179-188.
Frank, M. & Jagannathan, R. 1998, "Why do stock prices drop by less than the value of the dividend? Evidence from a country without taxes", Journal of Financial Economics, vol. 47, no. 2, pp. 161.
Frank, R.E., Massy, W.F. & Morrison, G. 1965, "Bias in Multiple Discriminant Analysis", Journal of Marketing Research (JMR), vol. 2, no. 3, pp. 250.
Gnanadesikan, R. 1977, Methods for Statistical Data Analysis of Multivariate Observations, 1st edn, John Wiley & Sons.
Goetzmann, W.N. & Jorion, P. 1995, "A Longer Look at Dividend Yields", The Journal of Business, vol. 68, no. 4, pp. 483-508.
Gordon, M.J. 1962, "[Security and a Financial Theory of Investment]: Reply", The Quarterly Journal of Economics, vol. 76, no. 2, pp. 315-319.
Gordy, M.B. 2000, "A comparative anatomy of credit risk models", Journal of Banking & Finance, vol. 24, no. 1, pp. 119.
Griffin, J.M. 1988, "A Test of the Free Cash Flow Hypothesis: Results from the Petroleum Industry", The Review of Economics and Statistics, vol. 70, no. 1, pp. 76-82.
Hagstrom, R.G. 2001, The Essential Buffett: Timeless Principles for the New Economy, John Wiley & Sons.
Helfert, E.A. & ebrary, I. 2001, Financial analysis [Electronic resource]: tools and techniques: a guide for managers, McGraw-Hill, New York.
Hoover, S. & ebrary, I. 2006, Stock valuation [Electronic resource]: an essential guide to Wall Street's most popular valuation models, McGraw-Hill, New York.
Hovakimian, A., Opler, T. & Titman, S. 2001, "The debt-equity choice", Journal of Financial and Quantitative Analysis, vol. 36, no. 1, pp. 1-24.
Jensen, M. 1986, "Agency Costs of Free Cash Flow, Corporate Finance, and Takeovers", American Economic Review, vol. 76, no. 2, pp. 323-329.
Johnson, R.A. & Wichern, D.W. 1987, Applied Multivariate Statistical Analysis, Longman Higher Education.
Lachenbruch, P.A., Sneeringer, C. & Revo, L.T. 1973, "Robustness of linear and quadratic discriminant function to certain types of non-normality", Communications in Statistics Theory and Methods, vol. 1, no. 1, pp. 39-56.
Lachenbruch, P.A. 1975, Discriminant Analysis, Macmillan Pub Co, New York.
Laitinen, T., Back, B., Sere, K. & Wezel, M. 1995, "Choosing Bankruptcy Predictors Using Discriminant Analysis, Logit Analysis and Genetic Algorithms", Proceedings of the first International Meeting on Artificial Intelligence in Accounting, Finance and Tax, pp. 337-356.
Laitinen, E.K. & Laitinen, T. 1998, "Cash Management Behavior and Failure Prediction", Journal of Business Finance & Accounting, vol. 25, no. 7, pp. 893.
LeRoy, S.F. & Porter, R.D. 1981, "The Present-Value Relation: Tests Based on Implied Variance Bounds", Econometrica, vol. 49, no. 3, pp. 555.
Liu, J., Nissim, D. & Thomas, J. 2007, "Is Cash Flow King in Valuations?", Financial Analysts Journal, vol. 63, no. 2, pp. 56.
Luenberger, D.G. 1997, Investment Science, Oxford University Press, New York.
Mannila, H., Smyth, P. & Hand, D.J. 2001, Principles of Data Mining (Adaptive Computation and Machine Learning), The MIT Press.
Marks, S. & Dunn, O.J. 1974, "Discriminant Functions When Covariance Matrices Are Unequal", Journal of the American Statistical Association, vol. 69, no. 346, pp. 555.
Michaud, R. & Davis, P. 1982, "Valuation Model Bias and the Scale Structure of Dividend Discount Returns", Journal of Finance, vol. 37, no. 2, pp. 563-573.
Miller, M. 1977, "Debt and Taxes", Journal of Finance, vol. 32, no. 2, pp. 261-275.
Mitra, D., Biswas, A. & Owers, J. 1991, "A Direct Test of the Free Cash Flow Hypothesis", Financial Management, vol. 20, no. 1, pp. 13-14.
Modigliani, F. & Miller, M. 1958, "The Cost of Capital, Corporation Finance and the Theory of Investment", American Economic Review, vol. 48, no. 3, pp. 261-297.
Murphy, J.J. 1999, Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, New York Institute of Finance.
Murphy, K.J. 1985, "Corporate Performance and Managerial Remuneration: An Empirical Analysis", Journal of Accounting & Economics, vol. 7, no. 1, pp. 11.
Myers, S.C. 1984, "The Capital Structure Puzzle", The Journal of Finance, vol. 39, no. 3, pp. 575-592.
Ohlson, J.A. 1980, "Financial Ratios and the Probabilistic Prediction of Bankruptcy", Journal of Accounting Research, vol. 18, no. 1, pp. 109.
Osborne, J. 2002, "Notes on the use of data transformations", Practical Assessment, Research & Evaluation, vol. 8, no. 6.
Park, Y.S. & Lee, J. 2003, "An empirical study on the relevance of applying relative valuation models to investment strategies in the Japanese stock market", Japan & the World Economy, vol. 15, no. 3, pp. 331.
Penman, S.H. 1996, "The Articulation of Price-Earnings Ratios and Market-to-Book Ratios and the Evaluation of Growth", Journal of Accounting Research, vol. 34, no. 2, pp. 235.
Ross, S.A., Westerfield, R.W., Jaffe, J. & Ku, S. 1999, Corporate Finance, McGraw-Hill College.
Senchack Jr., A.J. & Martin, J.D. 1987, "The Relative Performance of the PSR and PER Investment Strategies", Financial Analysts Journal, vol. 43, no. 2, pp. 46.
Sharma, S. 1996, Applied Multivariate Techniques, John Wiley, New York.
Shiller, R.J. 1981, "Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?", American Economic Review, vol. 71, no. 3, pp. 421.
Smith, K.A., Gupta, J.N.D. & ebrary, I. 2002, Neural networks in business [Electronic resource]: techniques and applications, Idea Group Pub: Information Science Pub, Hershey, PA.
Stracca, L. 2004, "Behavioral finance and asset prices: Where do we stand?", Journal of Economic Psychology, vol. 25, no. 3, pp. 373.
Tabachnick, B.G. & Fidell, L.S. 2000, Using Multivariate Statistics, 4th edn, Allyn & Bacon.
Yang, Z.R., Platt, M.B. & Platt, H.D. 1999, "Probabilistic Neural Networks in Bankruptcy Prediction", Journal of Business Research, vol. 44, no. 2, pp. 67.
Zapranis, A. & Ginoglou, D. 2000, "Forecasting Corporate Failure with Neural Network Approach: The Greek Case", Journal of Financial Management & Analysis, vol. 13, no. 2, pp. 11.
Zavgren, C.V. 1985, "Assessing the Vulnerability to Failure of American Industrial Firms: A Logistic Analysis", Journal of Business Finance & Accounting, vol. 12, no. 1, pp. 19.
