International Journal of Computer Science and Information Security (IJCSIS) provides a forum for publishing empirical results relevant to both researchers and practitioners, and also promotes the publication of industry-relevant research, to address the significant gap between research and practice.
Being a fully open access scholarly journal, it publishes original research works and review articles in all areas of computer science, including emerging topics like cloud computing, software development etc. It continues to promote insight and understanding of the state of the art and trends in technology. To a large extent, the credit for the high quality, visibility and recognition of the journal goes to the editorial board and the technical review committee.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, survey works and industrial experiences. The topics covered by this journal are diverse. (See monthly Call for Papers)
For complete details about IJCSIS archived publications, abstracting/indexing, editorial board and other important information, please refer to the IJCSIS homepage. IJCSIS appreciates all the insights and advice from authors, readers and reviewers. Indexed by the following international agencies and institutions: EI, Scopus, DBLP, DOI, ProQuest, ISI Thomson Reuters. The average acceptance rate for the period January-March 2012 is 31%.
We look forward to receiving your valuable papers. If you have further questions please do not hesitate to contact us at email@example.com. Our team is committed to providing a quick and supportive service throughout the publication process.
A complete list of journals can be found at: http://sites.google.com/site/ijcsis/
IJCSIS Vol. 10, No. 3, March 2012 Edition
ISSN 1947-5500 © IJCSIS, USA & UK.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 10, No. 3, March 2012
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Data Mining Techniques: A Key for Detection of Financial Statement Fraud

Rajan Gupta
Research Scholar, Dept. of Computer Sc. & Applications, Maharshi Dayanand University, Rohtak (Haryana), India. Email: email@example.com

Nasib Singh Gill
Head, Dept. of Computer Sc. & Applications, Maharshi Dayanand University, Rohtak (Haryana), India. Email: firstname.lastname@example.org

Abstract

In recent times, most of the news from the business world has been dominated by financial statement fraud. A financial statement becomes fraudulent if it contains false information incorporated intentionally by the management. This paper implements data mining techniques, namely CART, the Naïve Bayesian classifier and Genetic Programming, to identify companies that issue fraudulent financial statements. Each of these techniques is applied to a dataset from 114 companies. CART outperforms the other techniques in detection of fraud.

1. Introduction

Financial statement fraud is a serious social and economic problem worldwide and is more severe in developing countries. A company listed on any stock exchange is required to publish its financial statements, such as the balance sheet, income statement, statement of retained earnings and cash flow statement, yearly and quarterly. The financial statements of a company reflect its actual financial health; by analysing them, stockholders can form a wise decision about investing in the company. An intentional distortion of information in the financial statement is termed financial statement fraud.

Conventionally, auditors are responsible for identification and detection of fraudulent financial statements, although their formal duty is to report whether the statement conforms to GAAP or not. With an increase in the number of high profile fraud cases, auditors are overburdened with the additional duty of detecting fraud. Hence, various data mining techniques are being used to ease this extra pressure on auditors.

Some of the world's major fraud cases include Enron, WorldCom, Satyam and many more. A number of Chinese companies listed on US stock exchanges have faced accusations of accounting fraud, and in June 2011 the U.S. Securities and Exchange Commission warned investors against investing in Chinese firms listing via reverse mergers. While over 20 US listed Chinese companies were de-listed or halted in 2011, a number of others have been hit by the resignation of their auditors. Warnings of fraud in US listed Chinese companies have grown in recent months. In January 2011, the shares of China Forestry Holdings were suspended after the auditor KPMG informed the board of directors of possible irregularities in its accounting books. On 11 April 2011 the SEC suspended trading in RINO International due to questions surrounding the accuracy and completeness of information contained in RINO's public filings, and the company's failure to report the resignation of its chairman, directors of the board and an outside lawyer and forensic accountants brought in to investigate allegations of fraud. The finger was pointed at Sino-Forest Corporation, a Toronto-listed forestry firm, on 2 June 2011, after a short-seller accused the firm of inflating its assets. More recently, the unravelling of Longtop Financial Technologies Ltd highlighted the scale of the problem. The company regularly reported income that was slightly higher than executives' predictions [1]. The "cash balance" on Longtop's balance sheet was fake, a fiction created by the company's managers with bank complicity [2].

Data Mining is an iterative process within which progress is defined by discovery of knowledge. Data Mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome [3]. The application of Data Mining techniques for detection and identification of financial statement fraud is a fertile research area. Several law enforcement agencies and special investigative units have used data mining techniques successfully for detection of financial frauds.

In this study, we analyse the financial statements of various organisations for detection of financial statement fraud by using data mining techniques. This research aims at identifying the financial ratios / items from financial statements that help auditors in assessing the probability of financial fraud. Three data mining techniques, namely CART, the Naive Bayesian Classifier and Genetic Programming, are tested for their applicability in detecting fraudulent financial statements and differentiating between fraud and non fraud reporting. The dataset consists of financial ratios obtained from publicly available financial statements.

The paper is organised as follows: Section 2 discusses the relevant prior research, followed by Section 3, which describes the various tricks adopted by management for falsifying financial statements. Section 4 presents the key variables and financial ratios related to detection of financial statement fraud. Section 5 provides an insight into the data mining techniques used in this study. Section 6 analyses the results, followed by concluding remarks (Section 7).

2. Related Work

An overview of the academic literature concerning detection of financial statement fraud is given here. A number of studies, such as PwC [4] and ACFE [5], report on how fraud is detected. Their findings suggest that fraud has often been detected by chance or by accident. For example, the PwC report [4] reveals that 41% of the fraud cases were detected by means of tip-offs or by chance.

Several groups of researchers have devoted a significant amount of effort to studying the use of data mining techniques in detection of financial statement fraud from different perspectives. Beasley [6] used Logit regression to test the prediction that the inclusion of larger proportions of outside members on the board of directors significantly reduces the likelihood of financial statement fraud, with a sample of 150 American firms. The study found that non-fraud firms have boards with significantly higher percentages of outside members than fraud firms. Green and Choi [7] presented a neural network fraud classification model employing endogenous financial data. A classification model created from the learned behavior pattern is then applied to a test sample. Fanning and Cogger [8] also used an artificial neural network to predict management fraud. Using publicly available predictors of fraudulent financial statements, they found a model of eight variables with a high probability of detection. Kirkos [9] carried out an in-depth examination of publicly available data from the financial statements of various firms in order to detect fraudulent financial statements (FFS) by using Data Mining classification methods. In that study, three Data Mining techniques, namely Decision Trees, Neural Networks and Bayesian Belief Networks, are tested for their applicability in management fraud detection. Spathis et al. [10] compared multi-criteria decision aids with statistical techniques such as logit and discriminant analysis in detecting fraudulent financial statements. Cecchini et al. [11] developed a novel financial kernel using support vector machines for detection of management fraud. An innovative fraud detection mechanism was developed by Huang et al. [12] on the basis of Zipf's Law. This technique reduces the burden on auditors of reviewing overwhelming volumes of data and assists them in identifying potential fraud records. Hoogs et al. [13] present a genetic algorithm approach to detecting financial statement fraud. Cerullo and Cerullo [14] explained the nature of fraud and financial statement fraud along with the characteristics of neural networks (NN) and their applications. They illustrated how NN packages could be utilized by various firms to predict the occurrence of fraud. Koskivaara [15] proposed NN based support systems as a possible tool for use in auditing and demonstrated that the main application areas of NN were detection of material errors and management fraud. Busta and Weinberg [16] used NN to distinguish between 'normal' and 'manipulated' financial data. They examined the digit distribution of the numbers in the underlying financial information. Koh and Low [17] constructed a decision tree to predict hidden problems in financial statements by examining the following six variables: quick assets to current liabilities, market value of equity to total assets, total liabilities to total assets, interest payments to earnings before interest and tax, net income to total assets, and retained earnings to total assets. Belinna et al. [18] examined the effectiveness of CART in identification and detection of financial statement fraud. They concluded that CART is a very effective technique for distinguishing fraudulent financial statements from non fraudulent ones. Further, Deshmukh and Talluru [19] demonstrated the construction of a rule-based fuzzy reasoning system to assess the risk of management fraud and proposed an early warning system by finding 15 rules related to the probability of management fraud. Zhou & Kapoor [20] examined the effectiveness and limitations of data mining techniques such as regression, decision trees, neural networks and Bayesian networks. They explored a self-adaptive framework based on a response surface model with domain knowledge to detect financial statement fraud. Recently, Ravisankar et al. [21] used data mining techniques such as Multilayer Feed Forward Neural Network (MLFF), Support Vector Machines (SVM), Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR), and Probabilistic Neural Network (PNN) to identify companies that resort to financial statement fraud. They found that PNN outperformed all the techniques without feature selection, and that GP and PNN outperformed the others with feature selection, with marginally equal accuracies.

If we summarize the existing academic research, we arrive at the conclusion that detection of financial statement fraud is an instance of a classification and decision problem. In the present research, we apply the same idea and implement data mining classification methods for differentiating between fraudulent and non fraudulent observations.

3. Artifice used by top level executives for fraudulent financial reporting:

Financial statements are a company's basic documents to reflect its financial status [22]. A complete and thorough analysis of financial statements could help investors in judging the financial status of a company. Any material misstatement in the financial statement has been a major apprehension to investors worldwide. The techniques associated with the production of fraudulent financial statements have been discussed in Schilit's book "Financial Shenanigans" [23]. The book reported seven common tricks:
(1) Recording revenue before it is earned;
(2) Creating fictitious revenue;
(3) Boosting profits with non-recurring transactions;
(4) Shifting current expenses to a later period;
(5) Failing to record or disclose liabilities;
(6) Shifting current income to a later period; and
(7) Shifting future expenses to an earlier period.
The first five tricks aim at boosting current year earnings, and the last two shift current-year earnings to the future in order to create an illusion of steady income over the years.

A number of studies have been conducted to find indicators of fraudulent financial statements. One such study, conducted by C. Fei in China, found four types of companies which are more prone to financial scandals [24]:
(i) Companies with frequent capital operations and related-party transactions;
(ii) Companies with high and volatile stock prices;
(iii) Initial Public Offering (IPO) companies; and
(iv) Companies in a declining or over-competitive business environment.

Management of an organisation may falsify the financial statement to achieve the following:
a. A good amount of loan sanctioned from a bank
b. Paying less dividends to shareholders
c. Avoiding payment of taxes and
d. Inflated stock prices

At present there is a steady increase in the number of companies which falsify their financial statements in order to present a rosy picture of their financial status to stockholders and to make selfish gains. Hence, detection of such fraud is an additional responsibility of auditors and is the need of the hour.

4. Key financial items / ratios relevant to the detection of fraud:

The values and numbers present in financial statements can be easily interpreted with the help of different financial ratios. Financial ratios assist investors / auditors in evaluating the actual position of the company. On the basis of existing academic research and experts' knowledge, we identify the following financial variables / ratios (Table 1).

(a) Z-score: Financial distress may be a motivation for management fraud. To measure financial distress, the Z-score was developed by Altman [25]. It is a formula for estimating the financial status of a company and is also helpful in bankruptcy prediction. The formula for the Z-score for public companies is given by:

Z-score = 1.2 × (Working capital / Total assets)
        + 1.4 × (Retained earnings / Total assets)
        + 3.3 × (Earnings before interest and tax / Total assets)
        + 0.6 × (Market value of equity / Book value of total liabilities)
        + 0.999 × (Sales / Total assets)

(b) Debt structure: A high debt structure may be an indicator of fraudulent financial reporting, because it shifts the risk from managers to debt owners. Hence we can state that higher levels of debt may increase the likelihood of financial statement fraud, and one should carefully consider the financial ratios related to debt structure.

(c) Continuous growth: The need for continuous growth may be another motivational factor for financial statement fraud [26]. So, the sales growth ratio should be measured as a fraudulent financial statement indicator.

Sales growth ratio = (Current year's sales - Last year's sales) / (Last year's sales)

(d) Other items: A company may manipulate accounts receivable, inventories and gross margin. Accounts receivable may be manipulated by recording sales before they are earned. Inventory is also prone to manipulation. Managers may manipulate inventory either by reporting inventory at lower cost or by carrying obsolete inventory, or both. A company may use gross margin as a factor for falsifying the financial statement. The company may not match its sales with the corresponding cost of goods sold, thus increasing gross margin and net income and strengthening the balance sheet.

(e) Other qualitative variables: Qualitative variables such as the qualification or composition of the administrative board of a company, previous auditors, high turnover of CEO and CFO, and the size and age of a company could prove helpful in searching for indicators of cooked books.
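The two quantitative indicators in items (a) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the third term is read as earnings before interest and tax, and the fourth term is assumed to be market value of equity over book value of total liabilities, as in Altman's model for public companies.

```python
def altman_z_score(working_capital, retained_earnings, ebit,
                   market_value_equity, total_liabilities,
                   sales, total_assets):
    """Z-score with the weights quoted in item (a).

    Assumption: the 0.6 term is market value of equity divided by
    book value of total liabilities (public-company variant).
    """
    return (1.2 * working_capital / total_assets
            + 1.4 * retained_earnings / total_assets
            + 3.3 * ebit / total_assets
            + 0.6 * market_value_equity / total_liabilities
            + 0.999 * sales / total_assets)


def sales_growth(current_sales, last_sales):
    """Sales growth ratio from item (c): relative change in yearly sales."""
    return (current_sales - last_sales) / last_sales


if __name__ == "__main__":
    # Made-up figures for a hypothetical firm, not taken from the dataset.
    z = altman_z_score(working_capital=200.0, retained_earnings=300.0,
                       ebit=150.0, market_value_equity=800.0,
                       total_liabilities=400.0, sales=1000.0,
                       total_assets=1000.0)
    print(f"Z-score: {z:.3f}")
    print(f"Sales growth: {sales_growth(120.0, 100.0):.2f}")
```

Both values would then enter the input vector as ordinary numeric features alongside the other ratios.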
5. Research Methodology

5.1 Dataset

The dataset used in this research is obtained from 114 companies listed on different stock exchanges. Out of these 114 firms, 85 have not reported their financial statements fraudulently, whereas 29 organisations face different charges of fraudulent financial reporting. The data has been collected from www.wikinvest.com for all the 114 companies. We reviewed AAERs (Accounting and Auditing Enforcement Releases) published by the SEC (U.S. Securities and Exchange Commission) between 2007 and 2012 to identify companies accused of falsifying financial statements. All the firms in the sample have been checked by auditors. There was a clear indication of fraudulent financial reporting for the 29 fraud firms. Some of the indicators of fraud include: resignation by the auditors, chairman and board of directors, doubts reported by auditors, and observations by the tax authorities.

The 29 fraud firms have been matched with 85 non fraud organisations. These firms are classified as non fraud because no published indication or proof of fraud is present. However, absence of any proof does not guarantee that these firms have not falsified their financial statements or will not do so in future. This research only asserts that no fraudulent reporting has been found for these firms.

5.2 Variables

All the variables used as candidates for participation in the input vector have been extracted from published financial statements such as the income statement and balance sheet. The dataset contains 52 financial ratios / items for each of the 114 companies. A list of these financial ratios / items is presented in Table 1. The selection of these financial variables is based on prior research and on financial ratios covering liquidity, safety, profitability and efficiency of the organisations under consideration. During the preprocessing stage, each of the independent financial variables has been normalized. In order to improve the reliability of the result further, we perform ten-fold cross validation.

Table 1: Items / Ratios from financial statement to be used for detection of financial statement fraud

| S.No. | Financial items / Ratios |
|---|---|
| 1 | Debt |
| 2 | Total assets |
| 3 | Gross profit |
| 4 | Net profit |
| 5 | Primary business income |
| 6 | Cash and deposits |
| 7 | Accounts receivable |
| 8 | Inventory / Primary business income |
| 9 | Inventory / Total assets |
| 10 | Gross profit / Total assets |
| 11 | Net profit / Total assets |
| 12 | Current assets / Total assets |
| 13 | Net profit / Primary business income |
| 14 | Accounts receivable / Primary business income |
| 15 | Primary business income / Total assets |
| 16 | Current assets / Current liabilities |
| 17 | Primary business income / Fixed assets |
| 18 | Cash / Total assets |
| 19 | Inventory / Current liabilities |
| 20 | Total debt / Total equity |
| 21 | Long term debt / Total assets |
| 22 | Net profit / Gross profit |
| 23 | Total debt / Total assets |
| 24 | Total assets / Capital and reserves |
| 25 | Long term debt / Total capital and reserves |
| 26 | Fixed assets / Total assets |
| 27 | Deposits and cash / Current assets |
| 28 | Capitals and reserves / Total debt |
| 29 | Accounts receivable / Total assets |
| 30 | Gross profit / Primary business profit |
| 31 | Undistributed profit / Net profit |
| 32 | Primary business profit / Primary business profit of last year |
| 33 | Primary business income / Last year's primary business income |
| 34 | Accounts receivable / Accounts receivable of last year |
| 35 | Total assets / Total assets of last year |
| 36 | Debt / Equity |
| 37 | Accounts receivable / Sales |
| 38 | Inventory / Sales |
| 39 | Sales - Gross margin |
| 40 | Working capital / Total assets |
| 41 | Net profit / Sales |
| 42 | Sales / Total assets |
| 43 | Net income / Fixed assets |
| 44 | Quick assets / Current liabilities |
| 45 | Revenue / Total assets |
| 46 | Current liabilities / Revenue |
| 47 | Total liability / Revenue |
| 48 | Sales growth ratio |
| 49 | EBIT |
| 50 | Z-score |
| 51 | Retained earnings / Total assets |
| 52 | EBIT / Total assets |

We compiled all the financial items / ratios of Table 1. We applied one way ANOVA on the dataset to reduce dimensionality and to test whether the differences between the two classes, namely fraud and non fraud, were significant for each variable. The variables with a high p-value are considered non informative. Variables with p-value <= 0.05 are considered informative and are tested further using data mining methods. The financial ratios which are considered informative are presented in Table 2 along with their F-values and p-values.

Table 2: Informative financial ratios / items

| S.No. | Financial Ratios / Items | P-value | F-value |
|---|---|---|---|
| 1 | Debt | 0.028 | 1.345 |
| 2 | Inventory / Primary business income | 0.001 | 3.031 |
| 3 | Inventory / Total assets | 0.046 | 5.749 |
| 4 | Net profit / Total assets | 0.001 | 3.045 |
| 5 | Cash / Total assets | 0.001 | 2.906 |
| 6 | Total debt / Total assets | 0.002 | 2.303 |
| 7 | Fixed assets / Total assets | 0.05 | 2.075 |
| 8 | Deposits and cash / Current assets | 0.001 | 2.937 |
| 9 | Working capital / Total assets | 0.001 | 2.863 |
| 10 | Sales / Total assets | 0.002 | 12.077 |
| 11 | Net income / Fixed assets | 0.001 | 3.04 |
| 12 | Revenue / Total assets | 0.002 | 12.077 |
| 13 | EBIT | 0.026 | 4.297 |
| 14 | Z-score | 0.001 | 3.137 |
| 15 | Accounts receivable / Primary business income | 0.018 | 6.099 |
| 16 | Primary business income / Total assets | 0.001 | 3.048 |
| 17 | Primary business income / Fixed assets | 0.001 | 3.057 |
| 18 | Capitals and reserves / Total debt | 0.003 | 2.22 |
| 19 | Gross profit / Primary business profit | 0.008 | 3.925 |
| 20 | Accounts receivable / Sales | 0.013 | 2.311 |
| 21 | Retained earnings / Total assets | 0.001 | 3.044 |
| 22 | EBIT / Total assets | 0.001 | 3.059 |

5.3 Data Mining Methods

Detection of financial statement fraud can be considered a classical classification problem. Classification includes two steps. In the first step, a model that describes a set of predetermined classes is constructed. The sample used in this process is known as the training sample. Each tuple in the training set is supposed to belong to a predefined class as determined by the class label attribute. This step of supervised learning is followed by a second step in which the model attempts to classify new objects, which form the validation sample. Data mining offers a number of classification techniques which have an excellent reputation for their classification capabilities. Most of these classification methods are derived from artificial intelligence and statistics. Three such classification methods, namely CART, the naive Bayesian classifier and Genetic Programming, are employed in this research study.

5.3.1 CART

Classification and Regression Tree (CART) is a computerized, non-parametric data exploration and prediction technique which uses historical data with pre-assigned classes. CART is a classification method different from traditional statistical methods. It is a decision tree learning technique that produces a classification tree if the dependent variable is categorical and a regression tree otherwise. The method classifies the samples into a number of non-overlapping regions. Tree construction using the CART methodology includes three steps. The first step, also known as the growing phase, consists of constructing the maximum tree, which means that splitting of the learning sample is carried out up to the point where each terminal node contains observations of only one class. This step is the most time consuming because each iteration seeks the best splitting variable. A tree constructed in this way may consist of hundreds of levels and insignificant nodes or subtrees. Therefore, in the second step this complex tree is pruned, using cross validation as one of the pruning algorithms. Pruning results in a right-sized tree, which is used in the third step for classifying new data.

CART as a classification method does not require variables to be selected in advance. The method automatically identifies significant variables and ignores the non significant ones. Moreover, CART is very sensitive to the training data if it consists of outliers. Outliers are very prominent in financial data due to financial crises. Its non parametric nature and its capability of handling noisy data are among the reasons for selecting CART as one of the methods to be used in this research.

5.3.2 Naïve Bayesian Classifier

The naive Bayesian classifier is a probabilistic learning technique based on applying Bayes' theorem with a class conditional independence assumption. This strong (naive) independence assumption states that the presence or absence of an attribute of a class is not related to the presence or absence of any other attribute. Bayes' theorem calculates the posterior probability as

P(H|X) = (P(X|H) * P(H)) / P(X)

where H is a hypothesis such as "the object X belongs to class C". If an object X belongs to one of i alternative classes, then in order to classify the object a Bayesian classifier calculates the probabilities P(Ci|X) for all the possible classes Ci and assigns the object to the class with the maximum probability P(Ci|X). The conditional distribution over the class variable C can be expressed as the class prior multiplied by the conditional probabilities of the individual attributes given the class, divided by a scaling factor Z.
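Written out under the class conditional independence assumption, this is the standard naive Bayes factorisation. The symbols $x_1, \dots, x_n$ for the attribute values of the object X are introduced here for illustration and are not notation from the paper:

```latex
P(C \mid x_1, \dots, x_n) \;=\; \frac{1}{Z}\, P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad Z = P(x_1, \dots, x_n)
```

Because Z is the same for every class, assigning X to the class with the maximum posterior only requires comparing the numerators $P(C_i)\prod_{j} P(x_j \mid C_i)$ across the classes $C_i$.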
Where Z (the evidence) is a scaling factor dependent only on the feature variables, i.e., a constant if the values of the feature variables or attributes are known. If the assumption of class-conditional independence holds true, the naive Bayesian classifier produces the best accuracy rates. This classification technique requires only a small amount of data to estimate the parameters, such as the means and variances of the variables, necessary for classification. This assumption greatly reduces the computational cost, as only the class distribution has to be counted.

However, this assumption of independence may not be valid in many cases, because attributes are generally dependent in nature. This naive design and simplified assumption should not be taken as a limitation, because the naive Bayes classifier often works much better than the assumption would suggest in many complex real-world situations.

5.3.3 Genetic Programming

Genetic programming (GP) is an evolutionary learning technique that offers great potential for classification [27]. GP follows Darwin's theory of evolution, commonly known as "survival of the fittest". There is a randomly generated initial population of solutions that reproduce with each other using various genetic operators such as reproduction, crossover and mutation. Each cycle of this process of evolution is termed a generation. GP is essentially considered to be a variant of genetic algorithms (GA) that uses a complex representation language to codify individuals [28]. The basic difference between GP and GA is the representation of solutions. Genetic programming follows these sequential steps for solving a problem [29]:

a) Create a random population of programs, or rules, using the symbolic expressions provided as the initial population.
b) Evaluate each program or rule by assigning a fitness value according to a predefined fitness function that can measure the capability of the rule or program to solve the problem.
c) Use the reproduction operator to copy existing programs into the new generation.
d) Generate the new population with crossover, mutation, or other operators from a randomly chosen set of parents.
e) Repeat the second to the fourth steps for the new population until a predefined termination criterion is satisfied, or a fixed number of generations is completed.
f) The solution to the problem is the genetic program with the best fitness within all generations.

The most important operation for generating a new population in GP is crossover. Crossover is achieved by selecting two parent trees and combining them to form two new solutions. The parent trees are selected from the initial population by a function of the fitness of the solutions. An offspring is created by deleting the crossover fragment of the first parent and then inserting the crossover fragment of the second parent. The second offspring is produced in a symmetric manner. The fitness function used to search for the most efficient computer program that can solve the given problem is given below.

\[ \text{Fitness} = \frac{\text{No. of samples classified correctly}}{\text{No. of samples used for training during evaluation}} \]

The application of GP in pattern classification offers the following advantages.

1) GP is very flexible, which means it can be adapted to the needs of each particular problem.
2) GP can be employed on the data in its original form.
3) A priori knowledge about the distribution of the data is not required, since GP is distribution-free.
4) GP can easily express unknown relationships among the data in mathematical expressions.
5) GP can be useful in preprocessing and postprocessing along with classification in order to enhance the classifier.
6) GP can be helpful in finding the majority of the discriminating features of a class in the training stage.

6. Experimental Results and Analysis

The three data mining methods discussed above have been implemented on the dataset and compared on the basis of sensitivity and specificity. The sensitivity of a method is measured as the ratio of the number of fraudulent organisations accurately identified as fraudulent to the total number of actual fraudulent firms, whereas specificity is the ratio of the number of non-fraud firms identified as non-fraud to the total number of real non-fraudulent companies (Table 8).

In this study, the CART model is constructed using SIPINA Research edition software, 32-bit version. The tree given below has been built by using the whole sample as the training set with a confidence level of 0.05.

Figure: 1 CART

Table: 5 (Confusion matrix for Naïve Bayesian Classifier)

Label   NF (Non Fraud)   F (Fraud)
NF      83               2
F       10               19

Genetic programming has been implemented using the tool Discipulus version 5.1. The data set has been divided into training and validation data. The training data set is used to train the model and the validation data set is used exclusively for the purpose of validation: 80% of the whole data set is used for training, while 20% is used for validation. Since our dependent variable (target output) is binary, we select "hits then fitness" as the fitness function. Every single run of Discipulus has been set to terminate after 50 generations with no improvement in fitness. The confusion matrix for genetic programming is given in Table 6.
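The sensitivity and specificity definitions given at the start of this section can be checked directly against a confusion matrix. The following is a minimal Python sketch, not the authors' code: the class and function names are hypothetical, and it assumes that the rows of each confusion matrix are the actual classes and the columns the predicted classes (an assumption consistent with the 85 non-fraud and 29 fraud firms implied by the row totals). It uses the counts of Table 5 as input.

```python
from typing import NamedTuple


class ConfusionMatrix(NamedTuple):
    """Two-class counts. Assumption: rows = actual class, columns = predicted class."""
    nf_as_nf: int  # actual non-fraud predicted as non-fraud
    nf_as_f: int   # actual non-fraud predicted as fraud
    f_as_nf: int   # actual fraud predicted as non-fraud
    f_as_f: int    # actual fraud predicted as fraud


def sensitivity(cm: ConfusionMatrix) -> float:
    """Fraud firms identified as fraud / all actual fraud firms."""
    return cm.f_as_f / (cm.f_as_f + cm.f_as_nf)


def specificity(cm: ConfusionMatrix) -> float:
    """Non-fraud firms identified as non-fraud / all actual non-fraud firms."""
    return cm.nf_as_nf / (cm.nf_as_nf + cm.nf_as_f)


def accuracy(cm: ConfusionMatrix) -> float:
    """Correctly classified firms / all firms."""
    return (cm.nf_as_nf + cm.f_as_f) / sum(cm)


# Counts taken from Table 5 (Naive Bayesian Classifier).
naive_bayes = ConfusionMatrix(nf_as_nf=83, nf_as_f=2, f_as_nf=10, f_as_f=19)

print(f"sensitivity: {sensitivity(naive_bayes):.1%}")  # 65.5%
print(f"specificity: {specificity(naive_bayes):.1%}")  # 97.6%
print(f"accuracy:    {accuracy(naive_bayes):.1%}")     # 89.5%
```

Under this reading of the table, the results agree with the 65.5% sensitivity and 97.6% specificity reported for the Naïve Bayesian Classifier in Table 8 and with the stated 89% of correctly classified cases. Applying the same functions to the Table 6 counts gives about 55.2% and 98.8%, which differ slightly from the Genetic Programming row of Table 8 (53 and 99.2); the source does not explain the difference.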
Table: 6 (Confusion Matrix for Genetic Programming)

Label   NF (Non Fraud)   F (Fraud)
NF      84               1
F       13               16

From Table 7 we can observe the impact of the various input parameters on the model.

Table: 7 Impact of input variables (Genetic Programming)

S.No.  Variable                                      Frequency  Average Impact  Maximum Impact
1      Debt                                          0.06       00.00000        00.00000
2      Inventory/Primary business income             0.35       22.52747        53.84615
3      Inventory/Total assets                        0.35       09.70696        20.87912
4      Net profit/Total assets                       0.06       02.19780        02.19780
5      Cash/Total assets                             0.29       03.84615        05.49451
6      Total debt/Total assets                       0.12       00.00000        00.00000
7      Fixed assets/Total assets                     0.00       00.00000        00.00000
8      Deposits and cash/Current assets              0.18       06.59341        06.59341
9      Working Capital / Total Assets                0.06       00.00000        00.00000
10     Sales / Total Assets                          0.00       00.00000        00.00000
11     Net income / Fixed Assets                     0.41       07.69231        09.89011
12     Revenue /Total Assets                         0.29       09.01099        14.28571
13     EBIT                                          0.06       05.49451        05.49451
14     Z score                                       0.06       19.78022        19.78022
15     Accounts receivable/Primary business income   0.29       00.54945        01.09890
16     Primary business income/Total assets          0.18       02.74725        03.29670
17     Primary business income/Fixed assets          0.41       03.29670        08.79121
18     Capitals and reserves/Total debt              0.00       00.00000        00.00000
19     Gross profit/Primary business profit          0.53       05.65149        09.89011
20     Accounts Receivable / Sales                   0.00       00.00000        00.00000
21     Retained earnings / Total Assets              0.18       02.93040        04.39560
22     EBIT / Total Assets                           0.24       03.29670        05.49451

The confusion matrix for CART is given below (Table: 3).

Table: 3 (Confusion Matrix for CART)

Label            NF (Non Fraud)   F (Fraud)
NF (Non Fraud)   85               0
F (Fraud)        4                25

CART manages to classify 96% of the cases correctly. The method correctly classifies all the non-fraud cases (100%) and misclassifies only 4 fraud cases. The percentage of correct classification for fraud cases is 86%. The tree presented here uses the deposits and cash to current assets ratio as the first splitter. This ratio indicates how well the company converts its non-liquid assets into cash. At the second level of the tree, retained earnings / total assets and fixed assets / total assets have been used as splitters. Table 4 lists all the ratios used by the tree.

Table: 4

S. No.  Financial Ratios / Items
1       Net profit/Total assets
2       Fixed assets/Total assets
3       Deposits and cash/Current assets
4       Working Capital / Total Assets
5       Sales / Total Assets
6       Retained earnings / Total Assets

The second classification technique, the Naïve Bayesian Classifier, has been implemented using SIPINA Research edition software, 32-bit version. The method correctly classifies 89% of the cases. The confusion matrix is given in Table 5.

Table: 8 (Performance Matrix)

S.No.  Predictor                   Sensitivity (%)  Specificity (%)
1      CART                        86.2             100
2      Naïve Bayesian Classifier   65.5             97.6
3      Genetic Programming         53               99.2

7. Conclusion

In this study, data mining methods of good repute are implemented on a dataset collected from the financial statements of 114 companies for classifying organizations as fraud or non-fraud. We collected and compiled 52 financial variables / ratios. Then, one-way ANOVA is used for finding informative variables on the basis of p-value. Three intelligent classification methods, namely CART, Naïve Bayesian Classifier and Genetic Programming, are then applied on the 22 informative ratios. In order to have better reliability of the results, ten-fold cross validation has been implemented throughout the study. All three methods have been compared on the basis of sensitivity and specificity. CART produces the best sensitivity and specificity as compared with the other two methods. The accuracy rate of these methods can be further enhanced by using some qualitative information, such as the composition of the administrative board, along with the financial ratios used in this research.

References:

[1] http://www.financierworldwide.com
[2] http://www.ethicssage.com
[3] Kantardzic M. (2002), 'Data Mining: Concepts, Models, Methods, and Algorithms', Wiley – IEEE Press.
[4] PriceWaterhouse&Coopers: Economic crime: People, culture and controls. The 4th Biennial Global Economic Crime Survey (2007), available at: www.pwc.com
[5] Association of Certified Fraud Examiners: 2006 ACFE Report to the nation on Occupational fraud and abuse (2006), Technical report, Association of Certified Fraud Examiners, USA, available at: www.acfe.com
[6] Beasley, M. (1996). An empirical analysis of the relation between board of director composition and financial statement fraud. The Accounting Review, 71(4), 443–466.
[7] Green, B. P., & Choi, J. H. (1997). Assessing the risk of management fraud through neural-network technology. Auditing: A Journal of Practice and Theory, 16(1), 14–28.
[8] Fanning, K., & Cogger, K. (1998). Neural network detection of management fraud using published financial data. International Journal of Intelligent Systems in Accounting, Finance & Management, 7(1), 21–24.
[9] Efstathios Kirkos, Charalambos Spathis & Yannis Manolopoulos (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications 32 (23) (2007) 995–1003.
[10] C. Spathis, M. Doumpos, C. Zopounidis, Detecting falsified financial statements: a comparative study using multicriteria analysis and multivariate statistical techniques, European Accounting Review 11 (3) (2002) 509–535.
[11] M. Cecchini, H. Aytug, G.J. Koehler, and P. Pathak. Detecting Management Fraud in Public Companies. http://warrington.ufl.edu/isom/docs/papers/DetectingManagementFraudInPublicCompanies.pdf
[12] S.-M. Huang, D.C. Yen, L.-W. Yang, J.-S. Hua, An investigation of Zipf's Law for fraud detection, Decision Support Systems 46 (1) (2008) 70–83.
[13] Hoogs Bethany, Thomas Kiehl, Christina Lacomb and Deniz Senturk (2007). A Genetic Algorithm Approach to Detecting Temporal Patterns Indicative Of Financial Statement Fraud, Intelligent systems in accounting finance and management 2007; 15: 41 – 56, John Wiley & Sons, USA, available at: www.interscience.wiley.com
[14] M.J. Cerullo, V. Cerullo, Using neural networks to predict financial reporting fraud: Part 1, Computer Fraud & Security 5 (1999) 14–17.
[15] E. Koskivaara, Artificial neural networks in auditing: state of the art, The ICFAI Journal of Audit Practice 1 (4) (2004) 12–33.
[16] B. Busta, R. Weinberg, Using Benford's law and neural networks as a review procedure, Managerial Auditing Journal 13 (6) (1998) 356–366.
[17] H.C. Koh, C.K. Low, Going concern prediction using data mining techniques, Managerial Auditing Journal 19 (3) (2004) 462–476.
[18] Belinna Bai, Jerome Yen, Xiaoguang Yang, False Financial Statements: Characteristics of China listed companies and CART Detection Approach, International Journal of Information Technology and Decision Making, Vol. 7, No. 2 (2008), 339 - 359.
[19] A. Deshmukh, L. Talluru, A rule-based fuzzy reasoning system for assessing the risk of management fraud, International Journal of Intelligent Systems in Accounting, Finance & Management 7 (4) (1998) 223–241.
[20] Wei Zhou, G. Kappor, Detecting evolutionary financial statement fraud, Decision Support Systems 50 (2011) 570 – 575.
[21] P. Ravisankar, V. Ravi, G. Raghava Rao, I. Bose, Detection of financial statement fraud and feature selection using data mining techniques, Decision Support Systems 50 (2011) 491 - 500.
[22] W.H. Beaver, Financial ratios as predictors of failure, Journal of Accounting Research 4 (1966) 71–111.
[23] H. M. Schilit, Financial Shenanigans (McGraw-Hill, Inc., New York, 1993).
[24] C. Fei, The performances of four classes of listed companies are incredible (in Chinese), Hunan Daily (28-Sep-01).
[25] E.I. Altman, Financial ratios, discriminant analysis and prediction of corporate bankruptcy, The Journal of Finance 23 (4) (1968) 589–609.
[26] Stice J., Albrecht S. and Brown L., (1991), 'Lessons to be learned-ZZZZBEST, Regina, and Lincoln Savings', The CPA Journal, April, pp. 52-53.
[27] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone, Genetic Programming: An Introduction; On the Automatic Evolution of Computer Programs and its Applications. San Mateo, CA/Heidelberg, Germany: Morgan Kaufmann/dpunkt.verlag, 1998.
[28] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.
[29] K.M. Faraoun, A. Boukelif, Genetic programming approach for multi-category pattern classification applied to network intrusion detection, International Journal of Computational Intelligence and Applications 6 (1) (2006) 77–99.

Rajan Gupta obtained a master's degree in computer application from the Department of Computer Science & Application, Guru Jambheshwar University, Hisar, Haryana, India and a Master of Philosophy degree in Computer Science from Madurai Kamraj University, Madurai, India. He is currently pursuing a Doctorate degree in Computer Science from the Department of Computer Science & Application, Mahrshi Dayanand University, Rohtak, Haryana, India.

Dr Nasib S. Gill obtained a Doctorate degree in computer science and carried out post-doctoral research in Computer Science at Brunel University, U.K. He is currently working as Professor and Head in the Department of Computer Science and Application, Mahrshi Dayanand University, Rohtak, Haryana, India. He has more than 22 years of teaching and 20 years of research experience. His interest areas include software metrics, component based metrics, testing, reusability, Data Mining and Data warehousing, NLP, AOSD, Information and Network Security.