Ordinary Least Squares Regression with Shazam

Data

Data files in the statistics class are usually Excel files stored on the hard drive or a floppy disk, but data may also be entered directly by selecting “file”, “new”, and “dataset” on the standard toolbar (the first toolbar, or second row) of the Shazam screen. Usually it is easier to create Excel files with your data. Each row of the Excel spreadsheet should be an observation and each column a variable. Suppose pcenrgy, popdnsty, pcincome, imptergy, and tropics are five variables in a cross-section analysis of 31 countries. Each observation (row) would be a country, and each column would contain the data for one of the variables for that country.

To import the Excel data into Shazam, the first row should contain the variable names. Shazam will look for these names and show them as column headings when the data is imported. Choose variable names of 8 or fewer characters whose first character is a letter, and do not put any special characters or spaces in them. The Excel file should contain no blank cells (missing data) if you wish to avoid complications. To simplify matters, the first column should be the dependent variable. Shown below is the file in Excel.

    pcenrgy  popdnsty  pcincome  imptergy  tropics
       1525        13      8570       -25        2
       5215         2     20540       -98        1
       3279        97     27980        68        2
       5167       310     26420        78        2
        772        19      4720        40        1
       7879         3     19290       -50        2
         47        19       110         5        1
        707       129       860        -2        2
       6918       123     32500        24        2
       5613        17     24080        55        3
       4150       106     26050        47        2
       4156       234     28260        58        2
         92        75       370        66        1
       2454       111      4430        47        2
        260       313       390        18        1
       3003       269     15810        97        1
       3964       333     37850        80        2
        109        47       330        82        1
        692        61      4680       -88        1
       1456        48      3680       -51        1
         33       150       210        86        2
       4741       456     25820        10        2
       4290        13     19480        19        2
        265        36       410        74        1
        308        12      2010      -141        2
       1939       108     10450        90        2
       7162      4896     32940       100        1
       5736        21     26220        38        3
        878       116      2800        63        1
       3786       243     20710       -15        2
       7905        29     28740        20        2
       2158        25      3450      -298        1

The sheet as originally prepared also contained a row with no data in it; a row like that must be deleted before importing.
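If you want to double-check these layout rules before importing, one option is to save the sheet as a CSV file and scan it with a short script. The sketch below is not part of Shazam: it assumes a CSV export, treats “special characters” as anything other than letters and digits, and `check_layout` is a hypothetical helper name. It reports variable names that break the naming rules above and any row with a blank or missing cell, such as an empty row that still needs to be deleted.

```python
import csv
import io
import re

# Naming rules from the handout: a letter first, 8 or fewer characters in
# total, no spaces or special characters (assumed: letters and digits only).
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def check_layout(csv_text):
    """Return a list of layout problems in a CSV copy of the data sheet."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]  # the first row must hold the variable names
    problems = []
    for name in header:
        if not NAME_RULE.fullmatch(name):
            problems.append(f"bad variable name: {name!r}")
    # Spreadsheet row numbers start at 2 because row 1 is the header.
    for row_no, row in enumerate(rows[1:], start=2):
        # csv.reader returns [] for a completely empty line, so the length
        # check catches blank rows as well as rows with missing cells.
        if len(row) != len(header) or any(cell.strip() == "" for cell in row):
            problems.append(f"row {row_no}: blank or missing cell")
    return problems


sample = (
    "pcenrgy,popdnsty,pcincome,imptergy,tropics\n"
    "1525,13,8570,-25,2\n"
    ",,,,\n"
    "5215,2,20540,-98,1\n"
)
print(check_layout(sample))
```

Here the empty third row is reported as `row 3: blank or missing cell`, which is the row that would have to be deleted in Excel.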
Save the file where you can find it on a disk or the hard drive.

Shazam

In Shazam, the top row across the screen reads “Shazam – Professional Edition” etc. The second row is a toolbar with a Shazam symbol first, followed by file, edit, project, data, etc. to the right along the toolbar. In the third row, a toolbar has “New” and “Open” as the first two options, with a file symbol next to “Open”. When “Open” is selected with the mouse, Shazam brings up a Windows dialog that enables you to select the Excel file you stored earlier. Be sure to choose the appropriate “type of file” so that Windows will show all files of that type stored on the drive. The default type of file Shazam looks for is a Shazam file; change this to “microsoft excel” or “all files” so that the window will show the Excel files on the drive. Select the file you stored and then select “open”. Shazam will then show a message about variable names and data and ask “do you want to continue”; answer “yes”. Another dialog will appear that lets you select a spreadsheet; normally select sheet 1 and open it. Another popup will appear asking whether you wish to add the data set to the current project; answer “yes”. Then give the data set a name and select “save”, and when another window appears, give the project a name and select “save”. Check again to be sure the variable names are correctly entered and that there are no blank cells in the data set. At this point, “load” the data.

You are using Shazam to do ordinary least squares regression and to test whether the assumptions of regression have been met by testing for multicollinearity, autocorrelation, and heteroskedasticity.
You will want a regression equation, t tests of significance for the partial regression coefficients (variable coefficients), a Durbin-Watson statistic, R squared and adjusted R squared, an F test for the significance of the equation, a correlation matrix of the variables to check for multicollinearity, and a White test for heteroskedasticity. Shazam will do all these things, some automatically with the OLS command; others are obtained as options or with a second command. The Shazam edition we are using provides “wizards” that help you write the appropriate commands for ordinary least squares and other procedures. Although the program has these aids, it remains a command-driven program: command windows send commands to the program, and output windows show the results of those commands.

At this point, select Command Editor on the third toolbar of the Shazam screen. The data window will recede into the background and a new window will appear with a fourth toolbar. To obtain the matrix of correlation coefficients among the variables, enter the following command in the blank command editor:

    stat pcenrgy popdnsty pcincome imptergy tropics/pcor

Notice that if the command abbreviations are correctly entered, Shazam will recognize them and show them in blue.

Next, use the wizard to help write the commands for the ordinary least squares procedure. Select “Wizards” on the second toolbar, and a window appears that describes the purpose of the wizards. Use the wizard to construct the commands needed to complete the multiple regression project. Select “Next” and a menu of choices will appear. Select Ordinary least squares regression and “Next”. In the “Tasks to Perform” menu, select all of the boxes except the one for forecasting. The next window is a summary of what you have chosen so far; move to the following window by selecting “Next”. A window now appears that allows you to select the dependent and independent variables.
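The correlation matrix requested with `/pcor` consists of ordinary Pearson correlation coefficients, one for each pair of variables. The sketch below shows that arithmetic in plain Python; `pearson` and `correlation_matrix` are hypothetical helper names, not Shazam commands, and Shazam's own printout is laid out differently. Large absolute correlations between two independent variables are the warning sign for multicollinearity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every pair of variable names to their correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# First six observations of three of the variables, for illustration only.
data = {
    "pcenrgy": [1525, 5215, 3279, 5167, 772, 7879],
    "pcincome": [8570, 20540, 27980, 26420, 4720, 19290],
    "imptergy": [-25, -98, 68, 78, 40, -50],
}
for name, row in correlation_matrix(data).items():
    print(name, [round(r, 3) for r in row.values()])
```

Each diagonal entry is 1 and the matrix is symmetric, which is also what the Shazam matrix should look like.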
Highlight per capita consumption of energy and use the “Add” button to move it to the dependent variable box. Highlight the other variables (population density, per capita income, imports of energy, tropics) and add them to the independent variable list. Lags could be introduced at this point, but in this practice problem you do not want to lag anything. If you wanted to use only part of the data, you could specify that part here; in this practice problem you will use the entire sample, so make sure the “use existing” box is marked. Then go on to the next window.

This window gives a number of options that could be used in the regression. For this regression, select nothing in this window. Supposedly, leaving “suppress ANOVA” unselected makes the program perform an analysis of variance automatically, but this feature does not work as it should: you will have to add a command option to obtain the analysis of variance. Notice that “Model form” is “Linear”. If you wanted to do a regression in logarithms, you would change “Linear” to one of the other options at this point; in this practice problem, leave it as “Linear”.

Go to the next window, which is a menu of diagnostics. Select “print observed, predicted and residuals” and “heteroskedasticity tests” and move to the next window. In the practice problem there are no restrictions, so select “Next”. There are no hypotheses to specify, so move to the next window. This window provides an opportunity to obtain confidence intervals for the variable coefficients. Highlight all the variables and move them to the selected side; then go to the next window, which is entitled Final Step. Be sure to select the “Generate commands and insert into currently active editor” box. After you select “Finish”, the wizard returns you to the command editor.
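For reference, the equations behind three of the wizard choices are sketched below; the notation (b_j, se(b_j), n, k, q) is ours, not Shazam's. The default linear form is the first equation. A regression in logarithms replaces variables by their natural logs, which is only possible for strictly positive data, so imptergy, which has negative values in this data set, could not simply be logged. The confidence intervals from confid follow the usual OLS formula with the t distribution on n - k degrees of freedom, where k = 5 here (the intercept plus four slopes). White's test, one of the heteroskedasticity tests requested, regresses the squared OLS residuals on the regressors, their squares, and their cross products; in that standard form, four regressors give q = 4 + 4 + 6 = 14 auxiliary regressors besides the constant.

```latex
% Linear form (the wizard default)
\text{pcenrgy}_i = \beta_0 + \beta_1\,\text{popdnsty}_i + \beta_2\,\text{pcincome}_i
  + \beta_3\,\text{imptergy}_i + \beta_4\,\text{tropics}_i + u_i

% A log-log alternative, possible only for strictly positive variables
\ln \text{pcenrgy}_i = \beta_0 + \beta_1 \ln \text{popdnsty}_i + \beta_2 \ln \text{pcincome}_i + \dots + u_i

% Confidence interval for coefficient j (n observations, k estimated coefficients)
b_j \pm t_{\alpha/2,\; n-k}\, \operatorname{se}(b_j)

% White's test statistic, asymptotically chi-square under homoskedasticity
W = n\, R^2_{\text{aux}} \;\sim\; \chi^2_q
```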
You should see the following in the command editor:

    stat pcenrgy popdnsty pcincome imptergy tropics/pcor
    ols pcenrgy popdnsty pcincome imptergy tropics
    confid popdnsty pcincome imptergy tropics
    diagnos / list het

Notice that new commands have been added to the command editor in addition to the “stat” command entered earlier. You need to add an option to obtain the analysis of variance. This is done by adding a slash and “anova” after the variable list in the ols command. The command editor window should now look like this:

    stat pcenrgy popdnsty pcincome imptergy tropics/pcor
    ols pcenrgy popdnsty pcincome imptergy tropics/anova
    confid popdnsty pcincome imptergy tropics
    diagnos / list het

Select “run” on the fourth toolbar and Shazam will complete all the tasks you have selected for it. Be sure to look at the bottom of the window for any errors or warnings. Pay attention to these because they indicate data problems. You may need to correct your data.

Print

The print command will give you everything in the “Command Editor (output)” window. A copy of the data is obtained by selecting the “energy2.xls” file (or whatever you have named the data) on the third toolbar and then printing. If you have problems with the data, correct them; you will then have to reload the data.

To begin a new regression, you need a new “command editor” box. This is done by selecting “New” on the second toolbar. If you have several command editors and data sets open, the third toolbar fills up, and an arrow appears at its right end so that you can see all the previous command editors and data sets.

At this point it is wise to review the output from your commands to be sure you have everything needed: the correlation matrix of the variables; R squared and adjusted R squared; the analysis of variance; confidence intervals; variable coefficients, t ratios, and p values; the Durbin-Watson statistic; and the heteroskedasticity tests.
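Several numbers in this checklist can be reproduced from the observed and predicted values that `diagnos / list` prints, which is a useful way to see what each one measures. The sketch below assumes an OLS regression with an intercept and that `k` counts all estimated coefficients including the intercept (k = 5 for this model); `fit_summary` is a hypothetical helper, not a Shazam command. The F statistic is the analysis-of-variance test that all slope coefficients are zero, and a Durbin-Watson value near 2 suggests no first-order autocorrelation in the residuals.

```python
def fit_summary(observed, predicted, k):
    """R squared, adjusted R squared, ANOVA F and Durbin-Watson for an OLS
    fit with an intercept; k counts all coefficients incl. the intercept."""
    n = len(observed)
    resid = [y - f for y, f in zip(observed, predicted)]
    mean_y = sum(observed) / n
    sse = sum(e * e for e in resid)                 # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in observed)  # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # F test that all slopes are zero: (SSR / (k - 1)) / (SSE / (n - k))
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
    # Durbin-Watson: squared successive residual differences divided by SSE
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / sse
    return {"r2": r2, "adj_r2": adj_r2, "F": f_stat, "DW": dw}


# A straight line fitted by OLS to five points (x = 1..5): y-hat = 0.3 + 0.9x
summary = fit_summary([1, 3, 2, 4, 5], [1.2, 2.1, 3.0, 3.9, 4.8], k=2)
print({name: round(value, 3) for name, value in summary.items()})
```

For this small example, R squared is 0.81, adjusted R squared is about 0.747, F is about 12.79, and the Durbin-Watson statistic is about 3.18. For the energy regression you would pass the observed and predicted pcenrgy values from the Shazam listing with k=5.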