Paper-A comparison of two data mining techniques 
Intelligent Data Analysis 7 (2003) 3–13
IOS Press

A comparison of two data mining techniques to predict abnormal stock market returns

Alan M. Safer
Department of Mathematics, California State University, Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840-1001, USA
Tel.: +1 562 985 4731; Fax: +1 562 598 7257; E-mail: asafer@csulb.edu

Received 4 February 2002
Revised 14 March 2002
Accepted 4 June 2002

Abstract. Two data mining techniques were compared for their ability to improve the prediction of abnormal returns using insider stock trading data. The two were neural networks (NN) and Multivariate Adaptive Regression Splines (MARS). In the comparison, both analyzed abnormal stock market returns from the same 343 companies over the identical 4½ year period (1/93–6/97). The major findings were: 1) both NN and MARS generally identified the same industries as having the most predictable abnormal stock returns; 2) both found that predictions further in the future (12 and 9 months ahead) were more accurate than predictions closer to the trading date (6 and 3 months ahead); 3) both obtained better predictive accuracy using four, rather than two, months of back aggregated stock data; 4) NN identified a substantially greater percentage of stocks in the group with the highest explained variance than did MARS; 5) data from small and midsize companies led to higher predictive accuracy than data from large (S&P 500) companies using NN, but not MARS. The findings illustrate that the very complex interaction between insider trading data and abnormal stock returns can be systematically analyzed using nonlinear techniques. Of the two assessed, NN led to comparatively more accurate predictions than did MARS.

1. Introduction

Most previous research studies find that insider traders usually make abnormal returns [11,16,24]. Outsiders who use legal insider information can also make increased profits [2,13,24].
The ability of outsiders using insider trading information to predict abnormal returns can be increased by focusing on data such as the size of the company and the number of months in the future that are predictive for stock prices [11,18,23]. A more mathematically precise analysis using insider trading data for the prediction of abnormal returns is possible with the aid of recent data mining technology, such as neural networks (NN) and Multivariate Adaptive Regression Splines (MARS). NN and MARS are nonparametric techniques useful for analyzing nonlinear data sets such as those that characterize stock price information. In scholarly journals, neural networks, but not MARS, have been used to analyze stock market data [10,21,25]. However, only one previous study, by the author, has used a data mining technique to predict abnormal returns of stocks based on insider trading information [19]. That preliminary study using neural networks was limited by having a very small sample of companies (n = 36).

This study will expand data mining approaches and the data set in order to improve the prediction of abnormal stock price returns from legal insider trading. It will apply and compare two different techniques, NN and MARS, and will increase the number of companies, the number of industries evaluated, the number of variables used in the prediction of abnormal stock returns, and the number of previous and future months on which the prediction is based. It is based on a 4½ year study from January 1993 to June 1997 covering 343 companies.

1088-467X/03/$8.00 2003 – IOS Press. All rights reserved

2. Technical background of data mining techniques

2.1. Neural networks

A neural network is a set of highly interconnected computational units (nodes), arranged in a network of layers, whose parameters (weights) are estimated from the data. Most neural networks use only one-directional (feedforward) signal flow. Furthermore, most feedforward neural networks are organized in layers. An example of the three layer feedforward neural network (input, hidden, and output layers) is shown in Fig. 1.

Fig. 1. An example of the three layer feedforward neural network.

The feedforward neural network is used for nonlinear transformations (mapping) of a multidimensional input variable into another multidimensional output variable [26]. In theory, any input-output mapping should be possible if the neural network has enough neurons in its hidden layers (the size of the output layer is set by the number of outputs required). Practically, this is not an easy task since there is no satisfactory method to define how many neurons should be used in hidden layers. Consequently, the size of the hidden layers is usually found by a trial and error method. In general, it is known that if more neurons are used, more complicated shapes can be mapped. Nonetheless, networks with a large number of neurons increasingly lose their ability to generalize.

Weights in nodes are adjusted during a training procedure. Various learning algorithms have been developed, but only a few are suitable for multi-layer neural networks. The backpropagation algorithm was a significant breakthrough in neural network research. Backpropagation trains a neural network using a gradient descent algorithm to minimize the mean square error between the network's output and the desired output. This creates a global cost function which is minimized iteratively by 'backpropagating' the error from the output nodes to the input nodes. Once the network's error has fallen below the specified threshold, the network has converged and is considered trained. Nonetheless, backpropagation is also viewed as an algorithm with a very poor convergence rate.
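To make the training procedure concrete, the following is a minimal sketch of backpropagation with plain gradient descent for a network with one input, one tanh hidden layer, and one linear output. It is an illustration only and not the software used in this study: the function name `train_network`, the learning rate, the number of epochs, and the toy target function are all assumptions made for this example.

```python
import math
import random


def train_network(samples, n_hidden=3, lr=0.05, epochs=1000, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer.

    samples: list of (x, y) pairs.
    Returns (predict, history), where history holds the mean squared
    error observed during each epoch. All settings are illustrative.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [0.0] * n_hidden                                # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0                                             # output bias

    def forward(x):
        hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        out = sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2
        return hidden, out

    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, y in samples:
            hidden, out = forward(x)
            err = out - y  # gradient of 0.5 * err**2 with respect to out
            squared_error += err * err
            for j in range(n_hidden):
                # Propagate the output error back through the tanh unit
                # before the outgoing weight w2[j] is changed.
                grad_hidden = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
        history.append(squared_error / len(samples))

    return (lambda x: forward(x)[1]), history


# Toy usage: learn y = x^2 on [-1, 1] (hypothetical data, not from the study).
toy = [(i / 5.0 - 1.0, (i / 5.0 - 1.0) ** 2) for i in range(11)]
predict, history = train_network(toy)
print("first-epoch MSE:", history[0], "last-epoch MSE:", history[-1])
```

The sketch updates the weights after every sample; a stopping rule based on an error threshold, as described above, could replace the fixed number of epochs.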
The Levenberg-Marquardt (LM) algorithm used in this study is a variation of the backpropagation method. LM is now considered the most efficient algorithm, since it combines speed and stability [26].

A training set consists of inputs and associated outputs used for learning the values of the weights. A holdout or validation set is used to better evaluate the parameters. Neural networks can be valuable for determining the functional relationship between dependent and independent variables. The networks are able to approximate any continuous function, so one does not have to guess the functional form [4].

2.2. MARS

MARS is a nonparametric technique useful for analyzing nonlinear data sets such as those that characterize stock price information. It was created in 1991 by Jerome Friedman [7]. MARS is one of the more useful nonlinear nonparametric methods for solving statistical applications in business, but it is infrequently reported in scholarly journals. In contrast to global parametric methods, which are most appropriate when the user has a very good knowledge of the function, many of the nonlinear nonparametric methods, such as MARS, are highly data driven and therefore computer intensive. With the computing power available today, MARS can be used very effectively for complex business data analysis.

The main concept involved in MARS is that in different areas of the input space different variables may have a greater or lesser contribution to the response surface. The adaptive term in MARS refers to the ability of the algorithm to select the dominant variables in each of the sub-regions. In addition, MARS determines from the data the number of intervals needed for each variable [22]. MARS segments the space of possible input cases into rectangular regions that are fit with linear or cubic splines (moderately complex curves).
A spline is a piecewise polynomial whose different polynomial segments are joined at points called knots so as to ensure continuity. The knot designates the end of one region of data and the start of another. It is the point where the pattern of the function changes. In classical splines, the knots are predetermined and evenly spaced. In MARS, the knots are found by a search procedure. Consequently, only as many knots as needed are used in the model. Then the knots which contribute least to the overall fit are removed during the backward pruning step.

In MARS, basis functions are used for generalizing the search for knots in multiple dimensions. Basis functions re-express the association of the predictor and response variable [8]. The basic MARS algorithm involves building a set of product spline basis functions and fitting the coefficients of these basis functions to the data by the least squares method. It uses "hockey stick" functions (so called because the function is flat initially and then has a slope, so that it looks like a hockey stick) to represent splines. Then, instead of using the variables in the regression, MARS uses the basis functions. If a pair of basis functions (the primary and mirror image) contributes more to the model when interacting with a basis function already in the model, an interaction is added to the model instead of a main effect.

MARS models the true underlying function $f(x)$ by:

\[ \hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} B_{km}(x_{v(k,m)}) \tag{1} \]

In effect, it is a sum of tensor product spline basis functions. The coefficient of the constant basis function is $a_0$ and the coefficient of the $m$th basis function is $a_m$. The basis functions, $B_{km}$, are first-order truncated power splines:

\[ B_{km}(x) = \pm (x - t_{km})_+ \]

where $t_{km}$ is the knot of the input variable, and

\[ (x - t)_+ = \begin{cases} 0, & x \le t \\ x - t, & x > t. \end{cases} \]

In essence, $(x - t)_+$ is $\max(0, x - t)$. For $K_m = 1$, the model is additive. For $K_m = 2$, the model allows pairwise interactions.

The user sets a maximum, $M$, on the number of basis functions allowed in the model. There are $2M + 1$ basis functions in the model after the $M$th iteration. First, the constant basis function is entered. Following this, two new basis functions (the primary and mirror image), those which most decrease the residual sum of squares, are added to the model at each step until the $M$th iteration. The predictor-knot pair that most reduces the residual sum of squares is then selected and utilized [5]. Essentially, the maximum number of basis functions allowed is entered in a deliberate attempt to overfit the data. After the $2M + 1$ basis functions have entered the model, the least important basis functions are individually removed from the model based on the generalized cross-validation (GCV) criterion [14]. The GCV is obtained using the following formula:

\[ \mathrm{GCV}(M) = \frac{\frac{1}{N} \sum_{i=1}^{N} [y_i - \hat{y}_i]^2}{\left[ 1 - \frac{C(M)}{N} \right]^2} \tag{2} \]

where the denominator is a penalty for increasing $C(M)$, the complexity of a model containing $M$ basis functions, and where $N$ is the number of cases. In the GCV formula, $C(M) > M$, where $M$ is the usual measure used in linear regression. The modified $C(M)$ allows for the more general case where the basis functions and the expansion coefficients are data determined, to reflect the additional degree to which the model is being fit to the data. Specifically, $C(M) = M(d/2 + 1) + 1$, where $M$ is the number of non-constant basis functions in the model. The degrees of freedom parameter, $d$, represents an additional contribution from each basis function to the overall model complexity. It results from fitting the basis function parameters to the data at each iterative step. It is actually $d/2$ since there are 2 basis functions for each nonlinear fit [8]. Thus, it allows charging each basis function with more than one degree of freedom.
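As a small illustration of the quantities defined above, the sketch below evaluates a hinge ("hockey stick") basis function, the complexity term $C(M) = M(d/2 + 1) + 1$, and the GCV criterion of Eq. (2). It is not a MARS implementation: the forward knot search and backward pruning are omitted, the function names are hypothetical, and the default $d = 3$ is an assumed setting rather than a value reported in this study.

```python
def hinge(x, knot, sign=1):
    """First-order truncated power basis: max(0, sign * (x - knot)).

    sign=+1 gives the primary function (x - t)_+, sign=-1 its mirror
    image (t - x)_+.
    """
    return max(0.0, sign * (x - knot))


def effective_params(m, d):
    """Complexity C(M) = M * (d/2 + 1) + 1 for M non-constant basis functions."""
    return m * (d / 2.0 + 1.0) + 1.0


def gcv(y_true, y_pred, m, d=3.0):
    """Generalized cross-validation score of Eq. (2).

    d=3.0 is an assumed default, not a value taken from the paper.
    """
    n = len(y_true)
    mean_sq_error = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    penalty = (1.0 - effective_params(m, d) / n) ** 2
    return mean_sq_error / penalty


# Hypothetical additive model with one mirrored pair of hinges at knot t = 2:
#   f(x) = 1.0 + 0.5 * (x - 2)_+ - 0.25 * (2 - x)_+
def toy_model(x):
    return 1.0 + 0.5 * hinge(x, 2.0, +1) - 0.25 * hinge(x, 2.0, -1)


xs = [0.5 * i for i in range(12)]
ys = [toy_model(x) + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
print("GCV with M=2:", gcv(ys, [toy_model(x) for x in xs], m=2))
```

For a fixed residual error, the score grows as more basis functions are retained, which is what drives the backward pruning step.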
After the MARS model is determined, there is a grouping of all the basis functions that involve main effects and another grouping of basis functions that involve pairwise interactions (and even higher level interactions when applicable). This grouping of basis functions is as follows:

\[ \hat{f}(x) = a_0 + \sum_{K_m = 1} f_i(x_i) + \sum_{K_m = 2} f_{ij}(x_i, x_j) + \cdots \tag{3} \]

This procedure is called ANOVA decomposition [14].

2.3. MARS and neural networks compared

MARS and neural networks can be used to solve similar problems, but each has a distinct approach. MARS has the following differences and advantages over neural networks:
1) it will run faster;
2) it is less of a black box than a neural network, as it develops a functional form of the relationship between the inputs and output;
3) an input selection process is part of the procedure, and the overall variable importance is also determined;
4) MARS provides boundaries (knots) for the inputs, which make apparent the locations where changes in the data occur;
5) the model can be represented in a form that separately identifies additive contributions and those involving interactions [6];
6) it has an adaptive knot selection, so that outlier observations in the response only affect the fit locally as opposed to globally [8].

The curse of dimensionality involves the need for an exponential increase in sample size to accommodate a linear increase in the number of predictor variables. MARS attempts to counteract this problem by utilizing its localized low-dimensional data structure during construction of the model [12]. MARS uses a tree-based regression to partition the input variable space dependent upon the data. Neural networks can prune weights to help with this same type of problem.

Neural networks have their own advantages.
1) They tend to handle correlated data better than MARS;
2) they allow for high order variable interactions because of their increased connectivity [5];
3) they can have multiple outputs without being modified, although a modified MARS is also capable of having multiple outputs [17].

3. Methods

3.1. Stock selection

The insider trading data used in this study are from January 1993 to mid June 1997. The stocks used in the analyses included all stocks in the S&P 600 (small cap), S&P 400 (midsize cap) and S&P 500 (large cap) as of June 1997 that had insider records for the entire period of the study. There were 946 stocks in the three market caps which had available data for the whole time period. From the list of 946 stocks, the sample included every stock that averaged at least 2 buys per year (at least 9 total) during the 4½ year study period. The resultant number of stocks used for the study was 343. The reason for using insider purchases rather than sales is that they are more closely aligned with a company's prospects and are therefore more useful for the prediction of abnormal returns [11,23]. The rationale for requiring two purchases per year is that it provides sufficient transaction data for the analyses.

The original data came from the Securities and Exchange Commission (SEC). These data include: company, name of insider, rank, transaction date, stock price, shares traded, type of transaction (buy or sell), and shares held after trade. Reporting the data to the SEC can be delayed up to a maximum of 1 month and 10 days after each transaction.

3.2. Past and future periods of analysis

In the present study, an important design issue involves finding the optimal length of time in the past from which to analyze buy and sell transaction data. Many studies take an aggregate of insider activities one month before the current date and then predict future returns [24].
Based on the knowledge of a leading investigator who has followed insider trading for over 15 years [13], this study uses 2 months (specifically 9 weeks) and 4 months (18 weeks) of insider trading history to appraise past trading patterns. When considering transactions by insiders, it is useful to include trades several months before the current date. This is because insiders do not expect the stock to change immediately after their transactions. If they did, they might violate insider trading laws. Consequently, insiders will often make transactions over a period of time. In addition, some insiders might have confidence in the longer term performance of the stock, knowing that the company has a competitive advantage that will show up sometime in the future. The period of time in the future used to predict abnormal returns was then arbitrarily set to 3, 6, 9, and 12 months.

3.3. Abnormal returns

In order to control for risk and determine abnormal returns for stocks, an event analysis similar to the one by Brown and Warner [3] was used. Consequently, the Sharpe-Lintner form of the Capital Asset Pricing Model (CAPM) [9] was utilized in this study. In the CAPM, risk and the reward for taking risk are quantified. Risk is defined using the concept of beta (β). Beta is the ratio of the movements of an individual stock relative to the movements of the overall market. Consequently, stocks with betas of 1 move similarly to the market. Those stocks with betas greater than 1 fluctuate more than the market and are riskier than the overall market; conversely, those stocks with betas lower than 1 fluctuate less than the market as a whole. The CAPM implies that the expected return of an asset is linearly related to the covariance of its return with the return of the overall market. The CAPM uses the analysis from a period before and after an "event".
The event in this study is not the traditional event, such as an earnings report date [20]. In this study, the event is the current day when an investor decides to make a transaction in a particular stock. This method includes a pre-event period starting the day before the event and going back 3 months. The output variable in this study, abnormal return, is calculated using the estimates gathered from the pre-event period. These estimates help determine the abnormal returns 3, 6, 9, and 12 months ahead.

3.3.1. Part 1 of CAPM

First, a linear regression analysis was used to estimate $\alpha_i$ (the intercept) and $\beta_i$ (a measure of the systematic risk of an asset) based on the pre-event period (t-1 day to t-90 days). This is done using the following equation:

\[ R_{i,t} = \alpha_i + \beta_i R_{m,t} + \varepsilon_{i,t} \tag{4} \]

where:
$R_{i,t}$ is the pre-event return on stock $i$ for day $t$;
$R_{m,t}$ is the pre-event return of the market for day $t$;
$\varepsilon_{i,t}$ is the error for stock $i$;
$t$ is the pre-event period from t-90 to t-1 (t-0 is the event day).

3.3.2. Part 2 of CAPM

To calculate abnormal returns, $\alpha_i$ and $\beta_i$ from part 1's pre-event period were used. The outputs are the abnormal returns 3, 6, 9, and 12 months ahead:

\[ \varepsilon_i = R_{i,T} - \alpha_i - \beta_i R_{m,T} \tag{5} \]

where $T$ is the event period (either 3, 6, 9, or 12 months ahead).

3.4. Variables

The variables used in the study to predict abnormal returns are shown in Table 1. Variable 1 is the number of new shareholders in the company. Variable 2 is the eight-week ratio of the number of sell transactions over the number of buy transactions for the market as a whole. Variables 3 and 4 include the median number of shares bought and sold relative to the amount the insiders held before the transaction. Variable 5 retroactively reports the overall average percent change of all traders' individual returns from the previous 2 and 4 month periods to the subsequent 3 months of insider buy transactions. Variables 6 and 7 convey an overall average percent change in returns as in variable 5.
However, these variables involve returns from the previous 2 and 4 month periods to the subsequent 6 and 9 month periods of transactions. The reason for using variables 5–7 is to ascertain whether traders whose past stock purchases resulted in financial gain achieved similar results in subsequent trading. This is relevant because some insiders have been more aware of their company's prospects than others. The company rank of the insider (e.g., CEO, CFO) was not used as a variable in this study because it has had mixed results in regard to predicting abnormal stock price returns [11,15]. For similar reasons, insiders who own 10% or more of company shares but who were uninvolved in company decisions were not included [2].

Table 1
Predictor variables used in this study

| No. | Variable name | Description of variable |
|---|---|---|
| 1 | # new holders | Number of new shareholders |
| 2 | 8 week ratio of sells/buys | 8 week ratio of # selling transactions/# buying transactions for the market as a whole |
| 3 | median buy | Median individual insider shares bought relative to holdings (# shares bought/# shares held before trade) |
| 4 | median sell | Median individual insider shares sold relative to holdings (# shares sold/# shares held before trade) |
| 5 | avg pct change (3 months) | Look 3 months ahead and see avg of pct change from past insiders who bought |
| 6 | avg pct change (6 months) | Look 6 months ahead and see avg of pct change from past insiders who bought |
| 7 | avg pct change (9 months) | Look 9 months ahead and see avg of pct change from past insiders who bought |
| 8 | # buys | # buy transactions in period |
| 9 | # sells | # sell transactions in period |
| 10 | buy volume | # shares bought in period |
| 11 | sell volume | # shares sold in period |
| 12 | buy value | Dollar value of buy transactions |
| 13 | sell value | Dollar value of sell transactions |

3.5. Data mining specifications

The inputs used in neural networks and MARS are listed in Table 1.
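The output variable paired with these inputs is the abnormal return of Section 3.3. As a rough sketch of that two-part calculation, the code below first estimates $\alpha_i$ and $\beta_i$ by ordinary least squares on pre-event returns (Eq. 4) and then computes the abnormal return for an event period (Eq. 5). The function names and the sample figures are hypothetical, and the sketch leaves open details the text does not specify, such as how the daily intercept is matched to a multi-month event return.

```python
def fit_market_model(stock_returns, market_returns):
    """Least squares estimates (alpha, beta) for R_i,t = alpha + beta * R_m,t + e."""
    n = len(stock_returns)
    mean_stock = sum(stock_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum(
        (m - mean_market) * (s - mean_stock)
        for s, m in zip(stock_returns, market_returns)
    )
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    beta = covariance / variance
    alpha = mean_stock - beta * mean_market
    return alpha, beta


def abnormal_return(alpha, beta, stock_return, market_return):
    """Eq. (5): event-period return minus the return implied by alpha and beta."""
    return stock_return - alpha - beta * market_return


# Hypothetical pre-event returns (the study uses the 90 days before the event).
market = [0.010, -0.020, 0.030, 0.000, 0.015]
stock = [0.001 + 1.5 * m for m in market]

alpha, beta = fit_market_model(stock, market)
print("alpha:", alpha, "beta:", beta)
print("abnormal return:", abnormal_return(alpha, beta, 0.10, 0.04))
```

With a beta above 1, as in this toy series, a larger share of the stock's event-period return is attributed to the market move and less to the abnormal component.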
The abnormal return (3, 6, 9, and 12 months ahead) is the numeric output variable (as described in Section 3.3 and shown in Sections 3.3.1 and 3.3.2), predicted from the previous 9 and 18 weeks of aggregate data, wherein 223 total weeks were covered. The analysis randomly assigned approximately 80% of the data to the training set and the rest (approximately 20%) to the validation set.

3.6. Neural network specifications

There was one hidden (middle) layer in this neural network analysis (i.e., 1 input layer, 1 hidden middle layer, 1 output layer). The number of nodes in the hidden layer varied depending on the stock, but usually was between 5 and 9. For the 9 week and the 18 week analyses, different numbers of neurons in the hidden layer were used. The number of neurons selected was the number giving the network with the lowest mean squared error on the validation set. For the 2 sets of data described, the data were aggregated. That is, the inputs were aggregated from the week of the transaction decision event and included every week up to 9 weeks back and every week up to 18 weeks back.

3.7. MARS specifications

Pairwise interactions were allowed in the MARS model. The maximum number of basis functions was set at 60. Thirteen input variables were included in this model. MARS attempts to get as close as possible to the maximum number of basis functions. No interactions higher than pairwise were used so as to keep the number of estimated parameters from being extremely high.

Table 2
Comparison of NN and MARS using 18 weeks back data: number (percent) of stocks having 60% or higher explained variance

| Forecast period | Neural networks | MARS |
|---|---|---|
| 12 months ahead | 119 (34.7%) | 84 (24.5%) |
| 9 months ahead | 102 (29.7%) | 66 (19.2%) |
| 6 months ahead | 59 (17.2%) | 34 (9.9%) |
| 3 months ahead | 11 (3.2%) | 6 (1.2%) |

3.8. Standard Deviation (S.D.) ratio by industry

The S.D. ratio is very useful in determining the model fit. The S.D. ratio is defined as:

\[ \text{S.D. ratio} = \frac{\text{standard deviation of errors for the output variable}}{\text{standard deviation of the target output variable}} \tag{6} \]

The explained variance of the model can be found by subtracting the S.D. ratio from 1 (i.e., 1 − S.D. ratio).

In order to determine what types of industries have the highest percentage of stocks with the highest explained variance, Standard Industrial Classification (SIC) codes from 1987–1997 were used. They cover the 10 major industry categories and are grouped using the first two digits of a 4 digit SIC code. Included are 00–09 (Agriculture, Forestry, and Fishing), 10–14 (Mining), 15–17 (Construction), 20–39 (Manufacturing), 40–49 (Transportation, Communication, Electric, Gas, and Sanitary Services), 50–51 (Wholesale Trade), 52–59 (Retail Trade), 60–67 (Finance, Insurance, and Real Estate), 70–89 (Services), and 91–97 (Public Administration). Within each division, there are major subgroups within the first 2 digits of the 4 digit SIC code. These major subgroups are the basis for the industry codes in this study. Arbitrarily in this study, the 5 top-ranked industry groups for the 4 future month prediction periods (3, 6, 9, and 12 months) are listed in the results.

4. Results

Both MARS and neural networks had many similar findings, but NN had better accuracy in predicting abnormal stock returns from insider trading data (Table 2). Both MARS and NN identified similar industries as having the most predictable abnormal stock returns. These industries were: industry group 36 (Electronic and other electrical equipment and components, except computer equipment), 28 (Chemical and allied products), 37 (Transportation equipment), and 73 (Business services). MARS additionally identified industry group 35 (Industrial and commercial machinery and computer equipment) as being highly predictable in terms of abnormal stock returns.
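The counts in Table 2 and the industry rankings rest on the S.D. ratio of Eq. (6) and the 60% explained variance cut-off. A minimal sketch of that calculation for a single stock is given below; the function names are hypothetical, and the use of the population standard deviation is an assumption (the ratio is the same for the sample version when both series have equal length).

```python
import statistics


def sd_ratio(targets, predictions):
    """Eq. (6): S.D. of the prediction errors divided by S.D. of the targets."""
    errors = [t - p for t, p in zip(targets, predictions)]
    return statistics.pstdev(errors) / statistics.pstdev(targets)


def explained_variance(targets, predictions):
    """Explained variance as defined in Section 3.8: 1 - S.D. ratio."""
    return 1.0 - sd_ratio(targets, predictions)


def meets_threshold(targets, predictions, threshold=0.60):
    """True when a stock falls in the '60% or higher explained variance' group."""
    return explained_variance(targets, predictions) >= threshold


# Hypothetical validation-set abnormal returns and model predictions.
actual = [0.05, -0.02, 0.10, 0.01, -0.04]
predicted = [0.04, -0.01, 0.08, 0.02, -0.03]
print("S.D. ratio:", sd_ratio(actual, predicted))
print("in top group:", meets_threshold(actual, predicted))
```

Because the ratio is built from standard deviations of the errors, a prediction that is off by a constant amount is not penalized by this measure.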
Both neural networks and MARS found that the predictions further in the future, 12 months and 9 months ahead, as compared to those closer to the present, 6 and 3 months ahead, were more successful at identifying abnormal insider trading variations (Tables 2 and 3). In addition, both methods found that evaluating companies with 4 months of back aggregated data gave better predictions than only 2 months of back aggregated data.

Nonetheless, there were prominent differences in the results obtained by MARS and neural networks in the analysis. Across industries, a greater percentage of stocks were in the highest explained variance group using neural networks than using MARS (Tables 3a–3c). Consequently, neural network analyses were more accurate than MARS in using insider trading to predict abnormal returns. Table 2 makes this point very obvious in respect to individual stocks that had over 60% explained variance. The other comparative results were similar.

Table 3a
Industries whose insider trading accounts for at least 60% of the explained variance of abnormal returns for its companies: 18 weeks back, 12 months ahead prediction. Percentages are within industry group (groups with at least 10 companies)

| SIC code | Industry group | NN | MARS |
|---|---|---|---|
| 36 | Electronic equipment except computer equipment | 73% (11 out of 15) | 67% (9 out of 15) |
| 73 | Business services | 60% (6 out of 10) | 40% (4 out of 10) |
| 35 | Industrial and commercial machinery and computer equipment | – | 40% (4 out of 10) |
| 49 | Electric, gas, and sanitary services | 45% (17 out of 38) | – |
| 28 | Chemical and allied products | 42% (8 out of 19) | 32% (6 out of 19) |
| 37 | Transportation equipment | 38% (5 out of 13) | 38% (5 out of 13) |

Table 3b
Industries whose insider trading accounts for at least 60% of the explained variance of abnormal returns for its companies: 18 weeks back, 9 months ahead prediction. Percentages are within industry group (groups with at least 10 companies)

| SIC code | Industry group | NN | MARS |
|---|---|---|---|
| 73 | Business services | 50% (5 out of 10) | 30% (3 out of 10) |
| 35 | Industrial and commercial machinery and computer equipment | – | 50% (5 out of 10) |
| 28 | Chemical and allied products | 47% (9 out of 19) | 42% (8 out of 19) |
| 36 | Electronic equipment except computer equipment | 47% (7 out of 15) | 27% (4 out of 15) |
| 37 | Transportation equipment | 46% (6 out of 13) | 31% (4 out of 13) |
| 20 | Food and kindred products | 29% (4 out of 14) | – |

Table 3c
Industries whose insider trading accounts for at least 60% of the explained variance of abnormal returns for its companies: 18 weeks back, 6 months ahead prediction. Percentages are within industry group (groups with at least 10 companies)

| SIC code | Industry group | NN | MARS |
|---|---|---|---|
| 28 | Chemical and allied products | – | 37% (7 out of 19) |
| 20 | Food and kindred products | – | 23% (3 out of 13) |
| 36 | Electronic equipment except computer equipment | 20% (3 out of 15) | 13% (2 out of 15) |
| 27 | Printing, publishing, and allied industries | 20% (2 out of 10) | – |
| 73 | Business services | 20% (2 out of 10) | 10% (1 out of 10) |
| 49 | Electric, gas, and sanitary services | 18% (7 out of 38) | – |
| 60 | Depository institutions | 18% (6 out of 33) | – |
| 35 | Industrial and commercial machinery and computer equipment | – | 10% (1 out of 10) |

Neural networks showed that by analyzing small, and to a lesser extent midsize, companies (S&P 600 and S&P 400 respectively), it was easier to predict abnormal stock returns from insider trading than by analyzing larger companies (S&P 500). MARS, however, did not show company size differences in this respect (Table 4).

Table 4
Explained variance for 18 weeks back and 12 months future prediction: percent of stocks in each explained variance (%) range

| Method | Company size | < 40 | 40–60 | > 60 |
|---|---|---|---|---|
| MARS | Small cap | 31.7 | 44.2 | 24.0 |
| MARS | Mid cap | 26.9 | 48.4 | 24.7 |
| MARS | Large cap | 32.2 | 43.8 | 24.0 |
| NN | Small cap | 15.4 | 46.2 | 38.5 |
| NN | Mid cap | 17.0 | 44.7 | 38.3 |
| NN | Large cap | 18.5 | 52.1 | 29.5 |

5. Conclusion

Using data mining technology revealed that the prediction of abnormal returns from insider trading data can be maximized in the following ways:
1) extending the time of the future forecast up to 1 year;
2) increasing the period of back aggregated data;
3) selecting certain industries, such as those manufacturing electronic equipment (except computer equipment) and business services, rather than others;
4) choosing neural networks to achieve a higher predictive accuracy as compared to MARS.

Certain findings match those reported in studies that do not use data mining technologies. For example, it has been reported that 12 months in the future is a better predictor than shorter forecasts [11], and that abnormal returns are more predictable for smaller than for larger companies, as found here using NN [1,11].

Data mining as exemplified by this study has certain advantages over previous insider trading research approaches. It uses up to 4 months of back aggregated data. Industries are analyzed by company type and are compared in relation to their prediction of abnormal returns. Furthermore, the very complex interaction between insider trading data and abnormal stock returns can be systematically analyzed with useful results using nonlinear techniques.

6. Future research

Several ways to extend this study are:
1) including composite industry-wide insider trading as an input variable;
2) increasing the number of years in the study;
3) using lagged data instead of aggregate data;
4) comparing this MARS analysis with other nonlinear techniques.
Acknowledgments

The author thanks the Editor for his help and the anonymous reviewers for providing valuable comments.

References

[1] R. Banz, The relationship between return and market value of common stocks, Journal of Financial Economics 9(1) (March 1981), 3–18.
[2] C. Bettis, D. Vickrey and D.W. Vickrey, Mimickers of Corporate Insiders Who Make Large-Volume Trades, Financial Analysts Journal 53(5) (September/October 1997), 57–66.
[3] S. Brown and J. Warner, Using Daily Stock Returns: The Case of Event Studies, Journal of Financial Economics 14(1) (March 1985), 3–31.
[4] B. Cheng and D.M. Titterington, Neural Networks: A Review from a Statistical Perspective, Statistical Science 9(1) (1994), 2–54.
[5] R.D. De Veaux, D.C. Psichogios and L.H. Ungar, A Comparison of Two Nonparametric Estimation Schemes: MARS and Neural Networks, Computers & Chemical Engineering 17(8) (1993), 819–837.
[6] W. Dwinnell, Exploring MARS: An Alternative to Neural Networks, PCAI 12(4) (2000), 21–24.
[7] J.H. Friedman, Multivariate Adaptive Regression Splines, Annals of Statistics 19 (1991), 1–141.
[8] J.H. Friedman and C.B. Roosen, An Introduction to Multivariate Adaptive Regression Splines, Statistical Methods in Medical Research 4 (1995), 197–217.
[9] E. Guo, S. Nilanjan and D. Shome, Analysts' Forecasts: Low-Balling, Market Efficiency, and Insider Trading, The Financial Review 30(3) (August 1995), 529–539.
[10] L. Kryzanowski, M. Galler and D. Wright, Using Artificial Neural Networks to Pick Stocks, Financial Analysts Journal 49(4) (July/August 1993), 21–27.
[11] J. Lakonishok and I. Lee, Are Insiders' Trades Informative? National Bureau of Economic Research, Inc., Cambridge, MA, Working Paper 6656, 1998.
[12] P.A.W. Lewis, B.K. Ray and J.G. Stevens, Modeling Time Series by Using Multivariate Adaptive Regression Splines (MARS), in: Time Series Prediction: Forecasting the Future and Understanding the Past, A.S. Weigend and N.A. Gershenfeld, eds, Addison-Wesley, 1993, pp. 297–318.
[13] J. Moreland, Profit from Legal Insider Trading: Invest Today on Tomorrow's News, Dearborn Publishing, Chicago, IL, 2000.
[14] V. Nguyen-Cong, G. Van Dang and B.M. Rode, Using Multivariate Adaptive Regression Splines to QSAR studies of dihydroartemisinin derivatives 31 (1996), 797–803.
[15] K.P. Nunn, G.P. Madden and M. Gombola, Are Some Insiders More Inside Than Others? Journal of Portfolio Management 9(3) (Spring 1983), 18–22.
[16] D. Pescatrice, V. Calluzzo and M. Fragola, Insider Trading Characteristics Offering Superior Investment Returns, American Business Review 10(2) (June 1992), 73–77.
[17] D.T. Pham and B.J. Peat, Automatic learning using neural networks and adaptive regression, Measurement + Control 32 (1999), 270–274.
[18] M.S. Rozeff and M.A. Zaman, Market Efficiency and Insider Trading: New Evidence, Journal of Business 61(1) (January 1988), 25–44.
[19] A.M. Safer, B.M. Wilamowski and R. Anderson-Sprecher, Neural Networks for Prediction Using Legal Insider Stock Trading Data, Intelligent Engineering Systems Through Artificial Neural Networks Vol 8, ANNIE'98 (Artificial Neural Networks in Engineering), St. Louis, MO, Nov. 1998, pp. 683–689.
[20] A.M. Safer and B.M. Wilamowski, Using Artificial Neural Networks to Predict Abnormally High Stock Returns Around Quarterly Earning Reports, IJCNN'99 (International Joint Conference on Neural Networks), Washington, DC, #302: 1–8, July 1999.
[21] E. Schoneburg, Stock Price Prediction Using Neural Networks: A Project Report, Neurocomputing 2 (1990), 17–27.
[22] S. Sekulic and B.R. Kowalski, MARS: A Tutorial, Journal of Chemometrics 6 (1992), 199–216.
[23] H.N. Seyhun, Insiders' profits, costs of trading, and market efficiency, Journal of Financial Economics 16(2) (June 1986), 189–212.
[24] H.N. Seyhun, Investment Intelligence from Insider Trading, MIT Press, Cambridge, MA, 1998.
[25] G. Swales and Y. Yoon, Applying Artificial Neural Networks to Investment Analysis, Financial Analysts Journal 48(5) (September/October 1992), 78–80.
[26] B. Warner and M. Misra, Understanding Neural Networks as Statistical Tools, The American Statistician 50(4) (November 1996), 284–292.