Econ515 SAS Guide

							A CONCISE GUIDE TO THE SAS STATISTICAL PACKAGE

                 Professor Thornton
                   Economics 515
                   Econometrics
INTRODUCTION

This guide provides an overview of the SAS statistical package and an explanation of a number of useful
SAS commands and capabilities. It does not explain all SAS commands and capabilities. SAS is an
extremely powerful statistical package, and if you desire to learn more about what it can do you should
consult the appropriate SAS Users Manual or purchase one of the many SAS companion books available
in bookstores that provide a more detailed explanation about various facets of the SAS system.

DATA SETS

In this guide, SAS commands are explained in the context of examples. The examples are based on the
following eight data sets. It is assumed that each data set is contained in an ASCII file on a floppy disk in
drive A.

DATA7-2

The data file DATA7-2 comes with the Ramanathan econometrics text book. It consists of a cross-
section of 49 workers. The variables are WAGE = monthly wage, EDUC = years of education beyond
the eighth grade, EXPER = years of experience, AGE = age of worker, GENDER = indicator variable for
gender (1 if male, 0 if female), RACE = indicator variable for race (1 if white, 0 if nonwhite),
CLERICAL = indicator variable for clerical worker (1 if clerical worker, 0 otherwise), MAINT =
indicator variable for maintenance worker (1 if maintenance worker, 0 otherwise), CRAFTS = indicator
variable for crafts worker (1 if crafts worker, 0 otherwise).

CONSUMER

The data file CONSUMER consists of a cross-section of 30 individual consumers. The variables are
QBEEF = pounds of beef consumed per year, PBEEF = price of beef per pound, INCOME = annual
consumer income in thousands, PFISH = price of fish per pound.

DOCTOR1

The data file DOCTOR1 consists of a cross-section of 87 primary care physicians for the year 1985. The
variables are ID = identification number for the physician, VISITS = number of patient visits per week,
HOURS = physician hours worked per week, AIDES = number of non-physician employees in the
medical practice, DOCWAGE = average hourly earnings of physician, AIDEWAGE = weekly wage of
non-physician employee.

DOCTOR2

The data file DOCTOR2 consists of the same cross-section of 87 primary care physicians as in the data
file DOCTOR1. The variables are ID = identification number for the physician, PRICE = fee charged by
doctor for office visit with an established patient, YDUM = indicator variable for non-medical income (1
if non-medical income of more than $10,000, 0 otherwise), AGE = age of physician, PCINC = per capita
income in county in which physician practices, POPTOT = population in county in which physician
practices.

MACROCON

The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The
variables are YEAR = year, CONS = annual consumption spending in billions of dollars, DISINC =
annual disposable income in billions of dollars, PRICE = consumer price index, PRIME = the prime
interest rate, UN = unemployment rate.

DEMAND

The Limdep data file CONSUMER.LPJ consists of prices and quantities purchased of three goods, and
income, for a cross section of 30 individual consumers. These data are simulated, not real world data. The
variables are: Q1 = quantity purchased of good 1, Q2 = quantity purchased of good 2, Q3 = quantity
purchased of good 3, P1 = price of good 1, P2 = price of good 2, P3 = price of good 3, I = consumer income.

PRODUCER

The Limdep data file PRODUCER.LPJ consists of cross-section data for 92 dairy farm households for the
year 1986. These data were obtained from a random sample of Utah dairy farmers in five counties that were
the major dairy production centers. The variables are: OUTPUT = pounds of milk produced per year,
LABOR = hours worked per year by household members, CAPITAL = units of capital, LAND = units of
land, PCAPITAL = price per unit of capital, PLAND = price per unit of land, POUTPUT = price per pound
of milk, PLABOR = hourly wage of labor. Note that the price of labor and the price of land do not vary
across dairy farms; that is, all 92 dairy farms can purchase labor and land at the same price.

LABOR

The Limdep data file LABORSUPPLY.LPJ consists of cross-section data for 100 families taken from the
1976 panel study of income dynamics, and is based on data for the year 1975. The variables are: LFP = a
dummy variable for wife labor force participation (1 if wife worked in 1974, 0 otherwise), WHRS = wife’s
hours of work in 1975, KL6 = number of children less than 6 years old in household, K618 = number of
children between 6 and 18 in the household, WA = wife’s age, WE = wife’s years of education, WW =
wife’s hourly wage for 1975, HHRS = husband’s hours worked in 1975, HA = husband’s age, HE =
husband’s years of education, HW = husband’s hourly wage rate for 1975, FAMINC = total family income
for 1975, MTR = marginal tax rate for wife, WMED = wife’s mother’s years of education, WFED = wife’s
father’s years of education, UN = unemployment rate in county of residence (percentage), CIT = dummy
variable for urban area (1 if family lives in large city, 0 otherwise), AX = wife’s years of labor market
experience.

BACKGROUND INFORMATION

SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS
allows you to read data in a variety of different formats, transform the data to conduct statistical analyses,
analyze the data, and present the results.

A SAS program has two major components: Data Steps and Procedures. The data step allows you to
read SAS data sets or raw data, perform transformations on the data, create new variables, and recode
existing variables. The data step is the component of the program that creates SAS datasets. The
procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and
procedures consist of one or more statements. A statement is usually identified by a keyword
that suggests the statement’s function (e.g., INPUT, INFILE, MEANS, RUN). Every statement ends with
a semicolon.

EXECUTING A SAS PROGRAM




A SAS program can be executed in different ways. The two most important ways are batch mode and
interactive windows mode. In batch mode you use a text editor (such as Microsoft WordPad) to write a
SAS program in an input file in ASCII format. You then tell SAS to execute the program in the input file
and place the resulting output in an output file. You then use a text editor to view the output file.

In interactive windows mode, you can either type SAS statements in a Program Editor window or use
the SAS program builder. To use the SAS program builder, you use the mouse to point and click on the
appropriate selections and enter the necessary information in dialogue boxes. When SAS statements are
executed the output is displayed in an Output window. A Log window is also displayed that contains
the log for any SAS statements that are executed. The log window is very useful in writing SAS
programs. The log is displayed whether the program works or not. It repeats the SAS statements that are
executed, documents any SAS datasets that are created, warns you about potential problems with
your program, and reports error messages for mistakes such as incorrect syntax.

This guide explains how to create and execute SAS programs in interactive windows mode, using both the
Program Editor window and the program builder.

CREATING A SAS DATASET

The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can
be used to read raw data into a SAS dataset. This process is called importing. The raw data used to
create a SAS dataset can be in a number of different formats and locations. The data is usually either
stored in an external data file or is entered manually when you write a SAS program. Data entered
manually when writing a SAS program is called in-stream data.

The most important SAS statements in a data step are DATA, INFILE, CARDS, INPUT, and LIBNAME.
The DATA statement gives a name to the SAS dataset you are creating. The INFILE statement tells
SAS that the raw data are located in an external data file. The CARDS statement tells SAS that the data
will be entered manually in the program. The INPUT statement names the variables in the dataset. It also
tells SAS the layout of the raw data. The LIBNAME statement tells SAS where to store the SAS dataset
you create so that you can save it for future use.
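
As a minimal sketch of how these statements fit together, the data step below uses CARDS to read in-stream data. The dataset name SAMPLE, the variable names, and the three data lines are hypothetical and are used only for illustration.

```sas
/* Hypothetical example: dataset name, variables, and values are illustrative only */
DATA sample;               /* DATA names the SAS dataset being created */
  INPUT wage educ exper;   /* INPUT names the variables in the order they appear */
  /* CARDS signals that the data lines follow directly in the program */
  CARDS;
1500 6 10
1200 2 20
2100 8 5
;
RUN;

/* Display the dataset in the Output window */
PROC PRINT data=sample;
RUN;
```

The line containing only a semicolon marks the end of the in-stream data. To read the same values from an external file instead, the CARDS statement and the data lines would be replaced by an INFILE statement placed before INPUT.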

EXTERNAL DATA FILE

The following example explains how to create a SAS dataset with raw data that is contained in an external
data file in ASCII format.

Example

The data file named DATA7-2 that comes with the Ramanathan econometrics text book is an example of
an external data file in ASCII format. This file contains only numbers – the names of the variables are
documented in a separate location. The file is located on a floppy disk in drive A. This file has 49
observations on 9 variables. Each row is an observation (also called a record). Each column is a variable
(also called a field). This is a common layout for most external data files. Thus, there are 49 rows and 9
columns of numbers. The names of the variables are WAGE, EDUC, EXPER, AGE, GENDER, RACE,
CLERICAL, MAINT, CRAFTS. Data for the variable WAGE are contained in column 1 in the data file.
Data for the variable EDUC are contained in column 2 in the data file. Etc.

You want to create a SAS dataset named EARNINGS with the data contained in the external data file
named DATA7-2.



Program Editor Window

Enter the following SAS statements in the Program Editor window

DATA earnings;
INFILE 'a:data7-2';
INPUT wage educ exper age gender race clerical maint crafts;
PROC PRINT data=earnings;
RUN;

This is a SAS program. A SAS program can be written in either uppercase or lowercase or both. In this
example, keywords are in uppercase and the rest of a statement is in lowercase. This program has one
data step and one procedure. It consists of five SAS statements. If you desire, you can write more than
one statement on the same line. Also, a statement can extend to more than one line. However, each
statement must end with a semicolon. In the above example, each line has one statement. The DATA
statement tells SAS to create a SAS dataset and name it EARNINGS. The INFILE statement tells SAS to
read the raw data in the external file named DATA7-2 located on the floppy disk in drive A. Note that the
name and location of the file must be enclosed in single quotation marks. The INPUT statement tells
SAS the names of the variables. The PROC PRINT statement tells SAS to display the data set
EARNINGS in the Output window so you can see it. Note that if you did not include DATA =
EARNINGS in this statement, SAS would print the most recently created SAS dataset. This is true for any PROC
statement. The RUN statement tells SAS to execute the previous statements.

Executing the Program

There are two ways to execute the above SAS program. 1) Click the Run button on the tool bar. This is
the button with a picture of a runner. 2) Click Locals on the menu bar. Click Submit on the locals menu.
Note that SAS will execute all statements that appear in the Program Editor window. If you want SAS to
execute a subset of statements that appear in the Program Editor window, use the mouse to highlight these
statements, and then click the Run button, or click Locals and then Submit.

Viewing the Output

After SAS executes the program, the Output window appears and the data are displayed.

Storing the SAS Dataset

The SAS dataset named EARNINGS is a temporary SAS dataset. It is saved in the SAS Library
named Work. To verify this, proceed as follows. Click Globals on the menu bar. Point to Access on the
globals menu. Click Display Libraries on the access menu. In the dialogue box that appears, highlight
Work and click. This indicates that the SAS library named Work contains the SAS dataset named
EARNINGS. Once your SAS session ends, this dataset is automatically deleted. To make EARNINGS a
permanent SAS dataset, you must use a LIBNAME statement. This is explained below.
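
The same check can be made from the Program Editor window. The following is a minimal sketch: the small EARNINGS dataset built here is a hypothetical stand-in (two variables, two observations) so that the example runs without the data disk.

```sas
/* Hypothetical stand-in for EARNINGS so the sketch runs on its own */
DATA earnings;
  INPUT wage educ;
  CARDS;
1500 6
1200 2
;
RUN;

/* List the datasets currently stored in the Work library.
   NODS suppresses the variable-by-variable details for each dataset. */
PROC CONTENTS data=work._all_ nods;
RUN;
```

The listing in the Output window should include EARNINGS as a member of the Work library.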

Comments

The INPUT statement tells SAS the names of the variables, the type of variable, and how the data is
arranged. In the above example, the only information provided in the INPUT statement is the names of
the variables. This is because all of the variables are numeric and there is at least one blank space
between adjacent values in the data lines in the external file. If the data set includes one or more
character variables (variables whose values contain letters of the alphabet), then the symbol $ must be placed in
the INPUT statement directly after the name of each character variable. If there is not at least one blank
space between the values in the data lines, then you must tell SAS the column number(s) in which the
data for each variable is located in the data file. An example is provided below.
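
A minimal sketch of both cases follows. The dataset name WORKERS, the variable NAME, and the data values are hypothetical. NAME is a character variable, so it is followed by $, and each variable is given the column range it occupies because the third data line has no blank between two of its values.

```sas
/* Hypothetical example: names and values are illustrative only */
DATA workers;
  /* name is character ($) in columns 1-8; wage is in columns 9-12; educ is in columns 13-14 */
  INPUT name $ 1-8 wage 9-12 educ 13-14;
  CARDS;
Smith   1500 6
Jones   1200 2
Garcia  210010
;
RUN;

PROC PRINT data=workers;
RUN;
```

In the third data line the values 2100 and 10 run together, but SAS can still separate them because the INPUT statement states the columns in which each variable is located.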

CREATING A PERMANENT SAS DATASET

The SAS datasets EARNINGS and DATASET1 created in the above examples are temporary SAS
datasets. They are saved in the SAS Library named Work. Once your session ends, these data sets are
automatically deleted. The following example explains how to create a permanent SAS dataset.

Example

You want to create a permanent SAS dataset using the data contained in the file named DATA7-2 located
on a floppy disk in drive A. You want to save this SAS dataset on the floppy disk in drive A.

Program Editor Window

Enter the following SAS statements in the Program Editor window

LIBNAME econ515 'a:';
DATA econ515.earnings;
INFILE 'a:data7-2';
INPUT wage educ exper age gender race clerical maint crafts;
RUN;

The LIBNAME statement tells SAS to store the SAS dataset that follows in the library named ECON515,
which is located on the floppy disk in drive A. Note that the location of the file must be enclosed by
single quotation marks. If this library does not exist, then SAS will create it. If this library already exists,
then SAS will store the subsequent SAS dataset in it. You can store as many SAS datasets as you want in
a single library. The DATA statement tells SAS to create a SAS dataset named EARNINGS and store it
in the library named ECON515. Note that you must prefix the name of the dataset with the name of the
library in which it will be stored. The rest of the statements are the same as in the above example.

ACCESSING A PERMANENT SAS DATASET

The following examples explain how to load a permanent SAS dataset that you have created and create
new temporary or permanent SAS datasets from it.

Example

You want to access the dataset named EARNINGS which is stored in the library named ECON515 on a
floppy disk in drive A. You want to create a temporary SAS data set named EARN1.

Program Editor Window

LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;
RUN;




The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement
tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to access the
permanent SAS dataset named EARNINGS that is located in the library named ECON515. To verify that
you have accessed EARNINGS and created EARN1, use the mouse to click Globals, point to Access, and
click Display Libraries. In the dialogue box that appears, you will see the library ECON515 listed. If
you click ECON515, you will see the dataset EARNINGS listed. If you click the library named WORK
you will see the temporary dataset EARN1 listed. Note that when you end your session, the temporary
dataset EARN1 will be deleted. If you want to store this new dataset permanently in the library named
ECON515, then replace the DATA statement above with the following DATA statement

DATA econ515.earn1;

If you want to store all changes made in the current session in the permanent SAS dataset named
EARNINGS, then replace the DATA statement above with the following DATA statement

DATA econ515.earnings;

In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS
dataset EARNINGS with any changes that you make to the data during the current session.

Program Builder

To access EARNINGS click Globals, point to Access, click Display Libraries, and click the New
Libraries… button. In the Library Assignment box type Econ515. In the Folder to Assign box type a:
and then click Assign. The Libraries box now includes ECON515. To create the temporary SAS dataset named
EARN1, exit the Libraries box. Click Globals. Point to Analyze. Click Interactive Data Analysis. Click
ECON515. Click EARNINGS. Click Open. This opens the spreadsheet that contains the EARNINGS
data. On the menu bar click File. Point to Save. Click Data… A Save Data box appears. Highlight the
Library name WORK. Next to Data Set: type EARN1. Click OK. If you want to store EARN1
permanently in the library named ECON515, then highlight the library ECON515 rather than the library
WORK. If you want to save any changes that you make in the current session in the dataset EARNINGS,
then simply use this dataset during your session.

CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS

Assignment statements and logical expressions can be used for many purposes, such as creating new
variables from existing variables, recoding variables, and deleting observations from the current sample.
Each of these is explained below.

ASSIGNMENT STATEMENTS

Assignment statements allow you to create new variables from existing variables. Assignment statements
use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition),
and - (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and
division (from left to right), and then addition and subtraction (from left to right). The function for
the natural logarithm is LOG.
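
The order of operations can be checked with a short data step. This is a minimal sketch: the dataset name PRECEDENCE and the variable names a through e are hypothetical, and the values in the comments follow from ordinary arithmetic.

```sas
/* Hypothetical example illustrating the order of arithmetic operations */
DATA precedence;
  a = 2 + 3 * 4;      /* multiplication before addition: 2 + 12 = 14 */
  b = (2 + 3) * 4;    /* parentheses are evaluated first: 5 * 4 = 20 */
  c = 2 * 3 ** 2;     /* exponentiation before multiplication: 2 * 9 = 18 */
  d = 12 / 3 * 2;     /* division and multiplication from left to right: 4 * 2 = 8 */
  e = log(exp(1));    /* LOG is the natural logarithm, so this equals 1 */
RUN;

PROC PRINT data=precedence;
RUN;
```

When in doubt about the order in which an expression will be evaluated, use parentheses to make the intended order explicit.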

Example

You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains
all the variables in EARNINGS plus additional variables that you want to create.



Program Editor Window

LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;
logwage = log(wage);
yearwage = wage*12;
daywage = wage / 30;
agesq = age**2;
agecub = age**3;
toteduc = educ + 8;
RUN;

SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in
the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.

Program Builder

Access the permanent SAS dataset named EARNINGS. Create the temporary SAS dataset named
EARN1. (See instructions in previous section). Click Edit on the menu bar. Point to Variables. Click
log(Y). Click Wage. Click the Y button. After Name: type logwage. Click the OK button. Click Edit.
Point to Variables. Click Other… Click Wage. Click the Y button. Click a + b * Y. After A: type 0.
After B: type 12. After Name: type yearwage. Click the OK button. Repeat this process to create the
remaining variables. Make sure that you save these new variables in the temporary dataset EARN1. To
do so click File on the menu bar. Point to Save. Click Data… Click the OK button.

LOGICAL EXPRESSIONS

Logical expressions use conditional IF, THEN, ELSE statements, and comparison and logical operators.
The comparison operators are:

Equal to                         =       eq
Greater than                     >       gt
Less than                        <       lt
Greater than or equal to         >=      ge
Less than or equal to            <=      le
Not equal to                     ^=      ne
In                                       in
Notin                                    notin

The logical operators are:

And                              &       and
Or                               |       or

In the following example, a description of each logical expression and its use is given directly below the
expression for ease of reference.

Example




You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new
variables, recode existing variables, and delete observations from the sample to construct EARN1.

Program Editor Window

LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;

This accesses the permanent SAS dataset named EARNINGS from the library named ECON515, and
creates the temporary SAS dataset named EARN1.

IF educ > 4 THEN college = 1;
ELSE college = 0;

This creates a dummy variable named college that can take two values: 1 or 0. The IF THEN statement
assigns a value of 1 to the variable college if the variable educ is greater than 4. The ELSE statement
assigns a value of 0 to the variable college for all other observations.

IF age > 50 THEN newage = 2;
ELSE IF age > 25 THEN newage = 1;
ELSE newage = 0;

This creates a multinomial variable called newage that can take three values: 2, 1, or 0. The IF THEN
statement assigns a value of 2 to the variable newage if the variable age is greater than 50. The ELSE IF
THEN statement assigns a value of 1 to the variable newage if the variable age is greater than 25 and
equal to or less than 50. The ELSE statement assigns a value of 0 to the variable newage for all
observations that do not have a value of 2 or 1. Note that only one ELSE statement is allowed per IF
THEN statement.

LENGTH sex $ 6;
IF gender = 1 THEN sex = 'male';
ELSE sex = 'female';

This creates a character variable named sex that can take two values: male or female. The LENGTH
statement sets the length of sex to 6 characters. Without it, SAS would take the length from the first
value assigned (male, 4 characters) and truncate female to fema. The IF THEN
statement assigns the value male to the variable sex if the variable gender is equal to 1. The ELSE
statement assigns the value female to the variable sex for all other observations.

IF wage > 1300;

This keeps any observation for which the variable wage is greater than 1300. It deletes all observations
for which wage is 1300 or less.

IF exper = 1 THEN delete;

This deletes any observation for which the variable exper is equal to 1.

IF exper = 3 and gender = 1 then delete;

This deletes any observation for which both the variable exper is equal to 3 and the variable gender is
equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.



IF educ = 11 or age >= 57 then delete;

This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater
than or equal to 57.

IF wage = . THEN delete;

SAS represents a missing numeric value with a period (.). This deletes any observation for which the
variable wage has a missing value.

IF age = . then age = 65;

This assigns the value of 65 to the variable age for any observation in which age is missing.

RUN;
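
Several of the statements above can be tried without the data disk. The following is a minimal sketch that uses in-stream data: the dataset name DEMO and the four data lines are hypothetical, and only four of the variables are included.

```sas
/* Hypothetical example: dataset name and values are illustrative only */
DATA demo;
  LENGTH sex $ 6;                     /* make room for the longer value, female */
  INPUT wage educ age gender;
  IF wage = . THEN delete;            /* drop observations with a missing wage */
  IF educ > 4 THEN college = 1;       /* dummy variable for more than 4 years of educ */
  ELSE college = 0;
  IF gender = 1 THEN sex = 'male';    /* character variable built from gender */
  ELSE sex = 'female';
  CARDS;
1500 6 30 1
1200 2 45 0
. 4 52 1
2100 8 58 0
;
RUN;

PROC PRINT data=demo;
RUN;
```

The third data line has a missing wage, so that observation is deleted and the printed dataset contains the remaining three observations.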

DELETING VARIABLES FROM A SAS DATASET

Example

You want to create two new permanent SAS datasets from the permanent SAS dataset named
EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want
EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain
the variables WAGE, EDUC.

Program Editor Window

LIBNAME econ515 'a:';
DATA econ515.earnsub1;
SET econ515.earnings;
KEEP wage educ exper age;
DATA econ515.earnsub2;
SET econ515.earnsub1;
KEEP wage educ;
RUN;

An alternative program that would accomplish the same task is the following.

LIBNAME econ515 'a:';
DATA econ515.earnsub1;
SET econ515.earnings;
DROP gender race clerical maint crafts;
DATA econ515.earnsub2;
SET econ515.earnsub1;
DROP exper age;
RUN;

The LIBNAME statement tells SAS to access and/or store permanent SAS datasets in the library named
ECON515, which is located on the floppy disk in drive A. The first DATA statement tells SAS to create
a new permanent SAS dataset named EARNSUB1 and store it in the library named ECON515. The first
SET statement tells SAS to access the permanent SAS dataset named EARNINGS located in the library
named ECON515. The KEEP statement tells SAS to include only the variables WAGE, EDUC, EXPER, and AGE
from the dataset EARNINGS in the dataset EARNSUB1; the variables GENDER, RACE, CLERICAL, MAINT, and
CRAFTS are left out. Alternatively, the DROP statement tells SAS to leave the variables GENDER, RACE,
CLERICAL, MAINT, and CRAFTS out of the dataset EARNSUB1, so that only WAGE, EDUC, EXPER, and AGE
remain. In either case the dataset EARNINGS itself is not changed. The second DATA statement
tells SAS to create a new permanent SAS dataset named EARNSUB2 and store it in the library named
ECON515. The second SET statement tells SAS to access the permanent SAS dataset named EARNSUB1
located in the library named ECON515. The KEEP statement tells SAS to include only the variables WAGE
and EDUC from the dataset EARNSUB1 in the dataset EARNSUB2. Alternatively, the DROP statement
tells SAS to leave the variables EXPER and AGE out of the dataset EARNSUB2.
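
One way to confirm which variables a dataset contains is PROC CONTENTS. The following is a minimal sketch that uses temporary datasets so it runs without the data disk: the small EARNINGS dataset built here is a hypothetical stand-in with only five of the variables and two observations.

```sas
/* Hypothetical stand-in for EARNINGS: illustrative values only */
DATA earnings;
  INPUT wage educ exper age gender;
  CARDS;
1500 6 10 30 1
1200 2 20 45 0
;
RUN;

/* Keep four of the five variables in the new dataset */
DATA earnsub1;
  SET earnings;
  KEEP wage educ exper age;
RUN;

/* List the variables in EARNSUB1; GENDER should not appear */
PROC CONTENTS data=earnsub1;
RUN;
```

Running PROC CONTENTS on EARNINGS as well would show that all five variables are still present there.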

Program Builder

To access EARNINGS click Globals, point to Access, click Display Libraries, and click the New
Libraries… button. In the Library Assignment box type Econ515. In the Folder to Assign box type a:
and then click Assign. The Libraries box now includes ECON515. To create the permanent SAS dataset named
EARNSUB1, exit the Libraries box. Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click EARNINGS. Click Open. This opens the spreadsheet that contains the
EARNINGS data. On the menu bar click File. Point to Save. Click Data… A Save Data box appears.
Highlight the Library name ECON515. Next to Data Set: type EARNSUB1. Click OK. Close the
spreadsheet that contains the file EARNINGS. Open the dataset EARNSUB1. To do this, click Globals.
Point to Analyze. Click Interactive Data Analysis. Click ECON515. Click EARNSUB1. Click Open. To
delete the variable GENDER click on GENDER in the spreadsheet. Click Edit in the menu bar. Click
Delete. This deletes the variable GENDER from the spreadsheet. Repeat this process to delete the
variables RACE, CLERICAL, MAINT, and CRAFTS. Save the file EARNSUB1. To do so, click File.
Point to Save. Click Data… A Save Data box appears. Click OK. To create the permanent SAS dataset
EARNSUB2, repeat the steps given above and delete the variables EXPER and AGE from the dataset
EARNSUB1 to create EARNSUB2.

DISPLAYING A SAS DATASET

Example

You want to display the data in the permanent SAS dataset named EARNINGS.

Program Editor Window

LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;
PROC PRINT data=earn1;
RUN;

The temporary SAS dataset EARN1 that contains the data from the permanent SAS dataset EARNINGS
will be displayed in the Output Window.

Program Builder




Access the permanent SAS dataset named EARNINGS. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click EARNINGS. Click Open. A spreadsheet appears that contains the EARNINGS
data.

COMBINING TWO OR MORE DATASETS

Almost any combination of SAS datasets is possible. Three commonly used techniques for combining SAS
datasets are matched merge, concatenation, and interleaving. This section explains the matched merge
technique. The matched merge allows you to combine two or more datasets by linking observations that
share the same value of a common variable, called the BY variable. Each dataset must already be sorted
by the BY variable (for example, with PROC SORT) before it is merged.
Each observation in the new dataset will contain all of the variables of each of the separate datasets.

Example

You want to combine the two external ASCII data files named DOCTOR1 and DOCTOR2 located on the
floppy disk in drive A into a single permanent SAS dataset named DOCTOR and store it in the library
named ECON515 on the floppy disk in drive A.

Program Editor Window

LIBNAME econ515 'a:';
DATA doctor1;
INFILE 'a:doctor1';
INPUT id visits hours aides docwage aidewage;
DATA doctor2;
INFILE 'a:doctor2';
INPUT id price ydum age pcinc poptot;
RUN;
DATA econ515.doctor;
MERGE doctor1 doctor2;
BY id;
RUN;

This program creates two temporary SAS datasets named DOCTOR1 and DOCTOR2. It then merges
these two temporary SAS datasets by the variable ID to create the permanent SAS dataset named
DOCTOR. The dataset DOCTOR contains the values of the variables ID, VISITS, HOURS, AIDES,
DOCWAGE, AIDEWAGE, PRICE, YDUM, AGE, PCINC, and POPTOT for each ID number (i.e., each
physician).
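
The program above relies on both files already being in ID order. If they are not, each dataset can be sorted before the merge. The following is a minimal sketch with in-stream data: the values are hypothetical, only a few of the variables are included, and temporary datasets are used so that the example runs without the data disk.

```sas
/* Hypothetical data, deliberately not in id order */
DATA doctor1;
  INPUT id visits hours;
  CARDS;
2 110 45
1 95 40
3 130 50
;
RUN;

DATA doctor2;
  INPUT id price age;
  CARDS;
3 40 51
1 35 44
2 30 38
;
RUN;

/* Sort each dataset by the BY variable before merging */
PROC SORT data=doctor1;
  BY id;
RUN;
PROC SORT data=doctor2;
  BY id;
RUN;

/* Matched merge: one observation per id, with the variables from both datasets */
DATA doctor;
  MERGE doctor1 doctor2;
  BY id;
RUN;

PROC PRINT data=doctor;
RUN;
```

The merged dataset has three observations, one for each ID, and each observation contains ID, VISITS, HOURS, PRICE, and AGE.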

FREQUENCY DISTRIBUTIONS AND SCATTER DIAGRAMS

The easiest way to display frequency distributions and scatter diagrams is to use the program builder.

Example

You want to access the permanent SAS dataset named DOCTOR which is stored in the library named
ECON515 on a floppy disk in drive A. You want to display an absolute frequency distribution for the
variable VISITS, a relative frequency distribution for the variable VISITS, and a scatter diagram for the
variables VISITS and HOURS.



Program Builder

Access the permanent SAS dataset named DOCTOR. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click DOCTOR. Click Open. A spreadsheet appears that contains the DOCTOR data.
Click Analyze on the menu bar. A pop-up menu appears that has 8 choices. Three of these choices are
Histogram/Bar Chart (Y), Scatter Plot (Y,X), and Distribution (Y). To display an absolute frequency
distribution of VISITS, click Histogram/Bar Chart (Y). Click Visits. Click the Y button. Click the OK
button. The absolute frequency distribution of VISITS is now displayed. To display a relative frequency
distribution of VISITS, click Distribution (Y). Click Visits. Click the Y button. Click the OK button.
The relative frequency distribution of VISITS is now displayed, along with an assortment of descriptive
statistics such as the mean, variance, standard deviation, and coefficient of variation.
To display a scatter diagram of VISITS and HOURS, click Scatter Plot (Y,X). Click Visits. Click the Y
button. Click Hours. Click the X button. Click the OK button. A scatter diagram for VISITS and
HOURS is now displayed.
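
A text-based alternative is available from the Program Editor window. The following is a minimal sketch that uses PROC FREQ for the frequency table and PROC PLOT for the scatter diagram. The small DOCTOR dataset built here is a hypothetical stand-in with illustrative values, and the output is a table and a character plot rather than the graphs produced by the program builder.

```sas
/* Hypothetical stand-in for DOCTOR: illustrative values only */
DATA doctor;
  INPUT id visits hours;
  CARDS;
1 95 40
2 110 45
3 95 42
4 130 50
;
RUN;

/* Frequency table for VISITS: counts and percentages for each value */
PROC FREQ data=doctor;
  TABLES visits;
RUN;

/* Scatter diagram with VISITS on the vertical axis and HOURS on the horizontal axis */
PROC PLOT data=doctor;
  PLOT visits*hours;
RUN;
QUIT;
```

In the PROC FREQ output the Frequency column corresponds to the absolute frequency distribution and the Percent column to the relative frequency distribution.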

DESCRIPTIVE STATISTICS

You want to access the permanent SAS dataset named DOCTOR which is stored in the library named
ECON515 on a floppy disk in drive A. You want to calculate the mean, variance, standard deviation, and
coefficient of variation for the variables VISITS, HOURS, and AIDES. You also want to calculate the
covariances and correlation coefficients for these variables.

Program Editor Window

LIBNAME econ515 'a:';
DATA doc1;
SET econ515.doctor;
PROC MEANS mean var std cv;
VAR visits hours aides;
PROC CORR COV;
VAR visits hours aides;
RUN;

The LIBNAME, DATA and SET statements access the permanent SAS dataset named DOCTOR and
create the temporary SAS dataset named DOC1. Note that this temporary dataset will be deleted when
your session ends. The PROC MEANS statement and the options MEAN, VAR, STD, and CV tell SAS
to calculate the mean, variance, standard deviation, and coefficient of variation. The VAR statement tells
SAS to calculate these statistics for the variables VISITS, HOURS, and AIDES only. If you omit the
VAR statement, then SAS will calculate descriptive statistics for all variables in the dataset DOCTOR.
The PROC CORR COV statement tells SAS to calculate the correlation matrix and covariance matrix.
The VAR statement tells SAS to calculate the correlation coefficients and covariances for the variables
VISITS, HOURS, and AIDES only. If you want SAS to provide a full range of descriptive statistics, you
can replace the PROC MEANS mean var std cv; statement with the following statement.

PROC UNIVARIATE;

SAS will provide a large number of different types of descriptive statistics for the variables VISITS,
HOURS, and AIDES.
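
For reference, a minimal sketch of the complete program with PROC UNIVARIATE substituted for PROC MEANS is shown here. It assumes the same library and dataset names used above; nothing else is changed.

```sas
LIBNAME econ515 'a:';
DATA doc1;
SET econ515.doctor;
/* PROC UNIVARIATE replaces PROC MEANS; the VAR statement is unchanged */
PROC UNIVARIATE;
VAR visits hours aides;
PROC CORR COV;
VAR visits hours aides;
RUN;
```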

Program Builder


Access the permanent SAS dataset named DOCTOR. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click DOCTOR. Click Open. A spreadsheet appears that contains the DOCTOR data.
Click Analyze on the menu bar. Click Distribution (Y). Click Visits. Click the Y button. Click Hours.
Click the Y button. Click Aides. Click the Y button. Click the OK button. Relative frequency
distributions for VISITS, HOURS, and AIDES are now displayed. In addition, a full range of descriptive
statistics are provided below the relative frequency distributions. These are the same descriptive statistics
that are provided by the statement PROC UNIVARIATE. To calculate the covariances and correlation
coefficients, with the spreadsheet open click Analyze on the menu bar. Click Multivariate (Y’s). Click the
Output button. Click the boxes next to CORR and COV. Click the OK button. Click Visits. Click the Y
button. Click Hours. Click the Y button. Click Aides. Click the Y button. Click the OK button.

LINEAR REGRESSION

Many of the following examples use the data in the external data files named CONSUMER and
MACROCON, which are assumed to be located on a floppy disk in drive A. The following program
creates a permanent SAS dataset named CONSUMER and saves it in the library named ECON515
located on the floppy disk in drive A.

LIBNAME econ515 'a:';
DATA econ515.consumer;
INFILE 'a:consumer';
INPUT qbeef pbeef income pfish;
RUN;

The following program creates a permanent SAS dataset named MACROCON and saves it in the library
named ECON515 located on the floppy disk in drive A.

LIBNAME econ515 'a:';
DATA econ515.macrocon;
INFILE 'a:macrocon';
INPUT year cons disinc price prime un;
RUN;
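
Before running regressions, it may help to confirm that the permanent datasets were created as intended. The following is a minimal sketch, assuming the two programs above ran without error. PROC CONTENTS lists the variables in a dataset, and the OBS=5 dataset option (an illustrative choice) limits PROC PRINT to the first five observations.

```sas
LIBNAME econ515 'a:';
/* List the variable names and the number of observations */
PROC CONTENTS DATA=econ515.consumer;
/* Print the first five observations as a quick check of the INPUT statement */
PROC PRINT DATA=econ515.consumer (OBS=5);
PROC CONTENTS DATA=econ515.macrocon;
PROC PRINT DATA=econ515.macrocon (OBS=5);
RUN;
```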

Example #1

You want to use the data in the SAS dataset CONSUMER to run a linear regression of QBEEF on
INCOME. You also want to print the variance-covariance matrix for the parameter estimates.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income / covb;
RUN;

The PROC REG statement tells SAS to run a linear regression using the OLS estimator. The MODEL
statement tells SAS the dependent variable, independent variable(s), and any optional output to print. The
dependent variable is on the left-hand side of the equal sign and the independent variable(s) are on the
right-hand side. The / separates the regression equation from the options. The option covb tells SAS to
display the variance-covariance matrix of estimates in the Output window along with the standard
regression results. If you do not give SAS any options, then you do not have to include the / .

Program Builder

Access the permanent SAS dataset named CONSUMER. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click CONSUMER. Click Open. A spreadsheet appears that contains the CONSUMER
data. Click Analyze on the menu bar. Click Fit (Y X). Click the Output button. Click the box next to
Estimated Covariance Matrix. Click the OK button. Click Qbeef. Click the Y button. Click Income.
Click the X button. Click the OK button.

Example #2

You want to use the data in the SAS dataset CONSUMER to run a linear regression of QBEEF on
INCOME. You want to test the null hypothesis that the coefficient of INCOME is 1.5.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income;
TEST income = 1.5;
RUN;

The TEST statement tells SAS to test a hypothesis or restriction on the parameters of the statistical
model. The equation INCOME = 1.5 tells SAS to test the hypothesis that the coefficient of INCOME is
equal to 1.5. Note that SAS will print out an F-statistic and P-value, not a t-statistic. In this example, the
F-statistic is 1.16 and the P-value is 0.29. However, it can be shown that the square root of the F-statistic,
which is 1.08, is the absolute value of the t-statistic, and the P-value for the F-test is exactly the same as
the P-value for the two-sided t-test. Therefore, the TEST statement can be used to conduct a t-test.
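
The relationship between the F-test and the t-test can be checked with a short DATA step. This is only a sketch: it assumes 28 degrees of freedom (30 observations less 2 estimated parameters) and uses the rounded F-statistic reported above, so the P-values it writes to the Log window should be close to, but not necessarily identical to, the 0.29 that SAS reports.

```sas
DATA _NULL_;
f = 1.16;                      /* F-statistic reported by the TEST statement */
df = 28;                       /* assumed: n - k = 30 - 2 */
t = SQRT(f);                   /* absolute value of the t-statistic */
p_f = 1 - PROBF(f, 1, df);     /* P-value for the F-test */
p_t = 2*(1 - PROBT(t, df));    /* P-value for the two-sided t-test */
PUT t= p_f= p_t=;              /* results are written to the Log window */
RUN;
```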

Program Builder

Not applicable.

Example #3

You want to use the data in the SAS dataset CONSUMER to run a linear regression of QBEEF on
INCOME, PBEEF, and PFISH. You also want to print the variance-covariance matrix for the parameter
estimates.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income pbeef pfish / covb;
RUN;

This program is the same as the program for example #1, except we include the two additional
independent variables, PBEEF and PFISH, in the MODEL statement.

Program Builder

Access the permanent SAS dataset named CONSUMER. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click CONSUMER. Click Open. A spreadsheet appears that contains the CONSUMER
data. Click Analyze on the menu bar. Click Fit (Y X). Click the Output button. Click the box next to
Estimated Covariance Matrix. Click the OK button. Click Qbeef. Click the Y button. Click Income.
Click the X button. Click Pbeef. Click the X button. Click Pfish. Click the X button. Click the OK
button.

Example #4

You want to use the data in the SAS dataset CONSUMER to run a linear regression of QBEEF on
INCOME, PBEEF, and PFISH. You want to test the following hypotheses. 1) The price of beef and the
price of fish have no joint effect on beef consumption; that is, the coefficient of PBEEF and the
coefficient of PFISH are jointly equal to zero. 2) The coefficient of PBEEF and the coefficient of PFISH
are equal in magnitude and opposite in sign; that is, the sum of coefficients of PBEEF and PFISH is zero.
3) The sum of the coefficients of INCOME, PBEEF, and PFISH is equal to 5.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income pbeef pfish;
TEST pbeef = 0, pfish = 0;
TEST pbeef + pfish = 0;
TEST income + pbeef + pfish = 5;
RUN;

Note that one or more TEST statements can follow a MODEL statement. Because we are testing three
different hypotheses for the same regression model, we have three TEST statements that follow the model
statement. Note that when you are testing a joint hypothesis (i.e., two or more restrictions jointly), you
list the equations that define the individual restrictions in a single TEST statement and separate them with commas.

Program Builder

Not applicable.

Example #5

You want to use the data in the SAS dataset CONSUMER to run a linear regression of QBEEF on
INCOME, PBEEF, and PFISH, and impose the restriction that the coefficient of PBEEF and the
coefficient of PFISH are equal in magnitude and opposite in sign; that is, the sum of coefficients of
PBEEF and PFISH is zero. Thus, your objective is to estimate a restricted model that imposes a
restriction on the model parameters.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income pbeef pfish;
RESTRICT pbeef + pfish = 0;
RUN;

The RESTRICT statement tells SAS to impose a restriction on the parameters of the statistical model.
The restriction that you want to impose is given by the equation after the RESTRICT statement. Note
that the format of the RESTRICT statement is identical to the format of the TEST statement. SAS will
display the parameter estimates for the restricted model in the Output window. In addition, it provides an
estimate for a parameter called RESTRICT. This is a parameter estimate for a Lagrange parameter that is
introduced during the estimation process. If the coefficient of RESTRICT is not significantly different from zero, then the restricted and
unrestricted estimates are not significantly different, which means that the restriction is not binding. In this
example, a t-test cannot reject the null hypothesis that the coefficient of RESTRICT is zero. This
indicates that the data do not reject the restriction.

Program Builder

Not applicable.

Example #6

You want to use the SAS dataset named CONSUMER to run a linear regression of QBEEF on INCOME,
PBEEF and PFISH. You want to check for multicollinearity among the independent variables. To do this
you want to run a regression of each independent variable on all remaining independent variables so you
can calculate variance inflation factors. You also want to calculate the correlation coefficients for the
independent variables.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income pbeef pfish;
MODEL income = pbeef pfish;
MODEL pbeef = income pfish;
MODEL pfish = income pbeef;
PROC CORR;
VAR income pbeef pfish;
RUN;




You can use the R² statistic for the last three models to calculate variance inflation factors for INCOME,
PBEEF and PFISH. You can check the correlation matrix for high correlation coefficients between the
independent variables. Note that SAS will display certain multicollinearity diagnostics, such as
eigenvalues and condition indexes, if you use the MODEL statement

MODEL qbeef = income pbeef pfish / collin;
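
The variance inflation factor for an independent variable is 1/(1 - R²), where R² comes from the regression of that variable on the remaining independent variables. As an alternative to computing it by hand from the three auxiliary regressions, the sketch below uses the VIF option of the MODEL statement, which asks PROC REG to print these factors directly. It assumes the same dataset as the program above.

```sas
LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
/* VIF prints a variance inflation factor for each independent variable;
   COLLIN prints eigenvalues and condition indexes */
MODEL qbeef = income pbeef pfish / vif collin;
RUN;
```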

Program Builder

Run the 4 separate regressions using example #3 as a prototype of how to run a regression. Calculate
the correlation coefficients as in the section entitled DESCRIPTIVE STATISTICS. If you want SAS to
display certain multicollinearity diagnostics, such as eigenvalues and condition indexes, before you run
the regression of QBEEF on INCOME, PBEEF, and PFISH, click the Output button in the Fit (Y X)
dialogue box. Click the box next to Collinearity Diagnostics. Click the OK button.

Example #7

You want to use the SAS dataset named CONSUMER to run a linear regression of QBEEF on INCOME.
You want to do a Lagrange multiplier test to test whether the variables PBEEF and PFISH should be
included in the model.

Program Editor Window

LIBNAME econ515 'a:';
DATA beef1;
SET econ515.consumer;
PROC REG;
MODEL qbeef = income;
OUTPUT out=beef1 residual=resid;
PROC REG;
MODEL resid = income pbeef pfish;
Run;

The OUTPUT statement that follows the MODEL statement for the regression of QBEEF on INCOME
tells SAS to save the residuals from this regression as the variable named RESID (residual=resid), and
include the variable named RESID in the temporary SAS data set named BEEF1 (out=beef1). To
calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of
RESID on INCOME, PBEEF, and PFISH (R² = 0.39) and multiply it by the sample size (n = 30). For this
example, the Lagrange multiplier test statistic is LM = (0.39)(30) = 11.7.
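
To complete the test, the LM statistic is compared with a chi-square distribution. The sketch below assumes the usual large-sample result that, under the null hypothesis, LM has a chi-square distribution with degrees of freedom equal to the number of excluded variables (2 here), and it assumes a 5 percent significance level. The variable names are illustrative only.

```sas
DATA _NULL_;
r2 = 0.39;                     /* unadjusted R-squared from the regression of RESID */
n = 30;                        /* sample size */
q = 2;                         /* number of excluded variables: PBEEF and PFISH */
lm = n*r2;                     /* Lagrange multiplier test statistic */
crit = CINV(0.95, q);          /* 5 percent chi-square critical value */
pvalue = 1 - PROBCHI(lm, q);   /* P-value for the LM test */
PUT lm= crit= pvalue=;         /* results are written to the Log window */
RUN;
```

If LM exceeds the critical value, you reject the null hypothesis that PBEEF and PFISH can be excluded from the model.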

Program Builder

Run the regression of QBEEF on INCOME using example #3 as a prototype of how to run a regression.
However, before you run the regression click the Output button in the Fit (Y X) dialogue box. Click the
Output Variables button. Click the box next to Residual. Click the OK button. When you run the
regression, SAS will save the residuals as the variable R_QBEEF_2. This variable will now appear in the
spreadsheet containing the CONSUMER data. Run the regression of R_QBEEF_2 on INCOME,
PBEEF, and PFISH. Calculate LM test statistic using the output from this regression.

Example #8



You want to use the SAS dataset named EARNINGS to estimate a varying slope parameter model where
WAGE depends upon EXPER, and the coefficient of EXPER depends upon GENDER.

Program Editor Window

LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;
int = gender*exper;
PROC REG;
MODEL wage = exper int;
RUN;

Note that to estimate this model, you first had to create an interaction term for EXPER and GENDER.
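
In this model the coefficient of EXPER is the slope when GENDER = 0, and the sum of the coefficients of EXPER and INT is the slope when GENDER = 1 (assuming GENDER is a 0/1 indicator variable in EARNINGS). A natural follow-up, sketched here as an assumption rather than as part of the original example, is to add a TEST statement to check whether the slope differs by gender.

```sas
LIBNAME econ515 'a:';
DATA earn1;
SET econ515.earnings;
int = gender*exper;
PROC REG;
MODEL wage = exper int;
/* Null hypothesis: the coefficient of EXPER does not depend on GENDER */
TEST int = 0;
RUN;
```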

Program Builder

Access the permanent SAS dataset named EARNINGS. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Click Globals. Point to Analyze. Click Interactive Data Analysis.
Click ECON515. Click EARNINGS. Click Open. A spreadsheet appears that contains the EARNINGS
data. Click Analyze on the menu bar. Click Fit(Y X). Click Wage. Click the Y button. Click Exper.
Click the X button. Click Exper. Hold down the Ctrl key on the keyboard and click Gender. Click the
Cross button. This creates the interaction term for EXPER and GENDER. Click the OK button.

Example #9

You want to use the SAS dataset named DOCTOR to run a linear regression of VISITS on HOURS and
AIDES. You then want to use White’s test to test for heteroscedasticity.

Program Editor Window

LIBNAME econ515 'a:';
DATA doc1;
SET econ515.doctor;
PROC REG;
MODEL visits = hours aides;
OUTPUT out=doc1 residual=resid;
DATA doc2;
SET doc1;
hourssq = hours**2;
aidessq = aides**2;
ha = hours*aides;
residsq = resid**2;
PROC REG;
MODEL residsq = hours aides hourssq aidessq ha;
RUN;

Note that in this program we use two DATA statements. The first DATA statement creates the temporary
SAS dataset named DOC1, which contains all of the variables in the permanent SAS dataset named
DOCTOR. The OUTPUT statement then adds the variable RESID to DOC1. The second DATA statement
creates the temporary SAS dataset named DOC2, which contains all of the variables in the temporary SAS
dataset named DOC1, including RESID. The variables HOURSSQ, AIDESSQ, HA, and
RESIDSQ that are created with assignment statements are placed in the dataset DOC2. The dataset
DOC2 is then used for the regression of RESIDSQ on HOURS, AIDES, HOURSSQ, AIDESSQ, and HA.
To calculate the Lagrange multiplier test statistic for the White test, take the unadjusted R² statistic from
this regression (R² = 0.19) and multiply by the sample size (n = 87). For this example, the Lagrange
multiplier test statistic is LM = (0.19)(87) = 16.53.
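
Two supplementary sketches follow. The first uses the SPEC option of the MODEL statement, which asks PROC REG to print a specification test of the form proposed by White, so it can serve as a cross-check on the hand calculation. The second computes a P-value for the hand-calculated statistic, assuming the usual large-sample result that LM has a chi-square distribution with degrees of freedom equal to the number of regressors in the auxiliary regression (5 here). The statistic printed by SPEC need not match the hand calculation exactly.

```sas
LIBNAME econ515 'a:';
DATA doc1;
SET econ515.doctor;
PROC REG;
/* SPEC requests White's specification test for heteroscedasticity */
MODEL visits = hours aides / spec;
RUN;

DATA _NULL_;
lm = 0.19*87;                  /* sample size times R-squared from the auxiliary regression */
pvalue = 1 - PROBCHI(lm, 5);   /* 5 regressors in the auxiliary regression */
PUT lm= pvalue=;               /* results are written to the Log window */
RUN;
```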

Program Builder

Access the permanent SAS dataset named DOCTOR. (See section entitled ACCESSING A
PERMANENT SAS DATASET). Use example #7 as a prototype. To create the new variables needed to
calculate the test statistic for White’s test, see ASSIGNMENT STATEMENTS in section entitled
CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS.

Example #10

You want to use the SAS dataset named DOCTOR to run a linear regression of VISITS on HOURS and
AIDES. You then want to estimate this model using the FGLS estimator (weighted least squares)
assuming that the variance of the error term is a linear function of HOURS.

Program Editor Window

LIBNAME econ515 'a:';
DATA doc1;
SET econ515.doctor;
PROC REG;
MODEL visits = hours aides;
OUTPUT out=doc1 residual=resid;
DATA doc2;
SET doc1;
residsq = resid**2;
PROC REG;
MODEL residsq = hours;
OUTPUT out=doc2 predicted=varhat;
DATA doc3;
SET doc2;
IF varhat <= 0 THEN varhat = residsq;
w = 1/varhat;
PROC REG;
MODEL visits = hours aides;
WEIGHT w;
RUN;

In this program we use three DATA statements to create three temporary SAS datasets. The OUTPUT
statement that follows the MODEL statement for the regression of RESIDSQ on HOURS tells SAS to
save the predicted values of RESIDSQ for this regression as the variable named VARHAT
(predicted=varhat), and include this variable in the temporary SAS dataset named DOC2 (out=doc2). The
conditional IF THEN statement tells SAS to replace any value of the variable VARHAT that is negative
or zero with the value for the variable RESIDSQ. We must do this because an estimated variance must be
positive before it can be used to form a weight. The assignment statement for W sets the weight equal to
the reciprocal of the estimated variance. The WEIGHT statement in PROC REG minimizes the weighted sum
of squared residuals, so the weight must be the reciprocal of the estimated variance (1/VARHAT), not the
reciprocal of the estimated standard deviation. The WEIGHT statement that follows the last MODEL
statement tells SAS to run a weighted least squares regression using the variable W as the weight. This is
the FGLS estimator.

Program Builder

Not applicable.

Example #11

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption
expenditures (RCONS) on real disposable income (RDISINC) and PRIME. Real consumption
expenditures is defined as CONS divided by PRICE, with the appropriate adjustment for the decimal
point. Real disposable income is defined as DISINC divided by PRICE, with the appropriate adjustment
for the decimal point. You want to do a Lagrange multiplier test to test for second-order autocorrelation.

Program Editor Window

LIBNAME econ515 'a:';
DATA con1;
SET econ515.macrocon;
rcons = cons/(price/100);
rdisinc = disinc/(price/100);
PROC REG;
MODEL rcons = rdisinc prime;
OUTPUT out=con1 residual=resid;
DATA con2;
SET con1;
resid1 = lag1(resid);
resid2 = lag2(resid);
PROC REG;
MODEL resid = rdisinc prime resid1 resid2;
RUN;

The assignment statements for RCONS and RDISINC tell SAS to create the new variables RCONS and
RDISINC and save them in the temporary SAS dataset named CON1. The OUTPUT statement that
follows the MODEL statement for the regression of RCONS on RDISINC and PRIME tells SAS to save
the residuals from this regression as the variable named RESID, and include the variable named RESID in
the temporary SAS dataset named CON1. The second DATA statement tells SAS to create a second
temporary SAS dataset named CON2. The SET statement tells SAS to include all of the variables in the
temporary SAS dataset CON1 in the temporary SAS dataset named CON2. The assignment statement
RESID1 = LAG1(RESID) tells SAS to create a new variable named RESID1 that is equal to the variable
RESID lagged one period. The assignment RESID2 = LAG2(RESID) tells SAS to create a new variable
named RESID2 that is equal to the variable RESID lagged two periods. The variables RESID1 and
RESID2 are saved in the temporary SAS dataset CON2. To calculate the Lagrange multiplier test
statistic, take the unadjusted R² statistic from the regression of RESID on RDISINC, PRIME, RESID1,
and RESID2 (R² = 0.31) and multiply by the sample size (n = 35). Note that you lose two observations
when running this regression because you have a variable that is lagged two periods. For this example,
the Lagrange multiplier test statistic is LM = (0.31)(35) = 10.85.
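
As a cross-check on the hand calculation, the sketch below uses the GODFREY= option of the MODEL statement in PROC AUTOREG, which requests Godfrey's Lagrange multiplier tests for autocorrelation. It also computes a P-value for the hand-calculated statistic, assuming the usual large-sample chi-square distribution with 2 degrees of freedom (one for each lagged residual). The statistic SAS prints may differ somewhat from the hand calculation because of how the initial observations are treated.

```sas
LIBNAME econ515 'a:';
DATA con1;
SET econ515.macrocon;
rcons = cons/(price/100);
rdisinc = disinc/(price/100);
PROC AUTOREG;
/* GODFREY=2 requests LM tests for autocorrelation up to order 2 */
MODEL rcons = rdisinc prime / godfrey=2;
RUN;

DATA _NULL_;
lm = 0.31*35;                  /* sample size times R-squared from the auxiliary regression */
pvalue = 1 - PROBCHI(lm, 2);   /* 2 lagged residuals in the auxiliary regression */
PUT lm= pvalue=;               /* results are written to the Log window */
RUN;
```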

Program Builder



Too cumbersome because of the need to create lagged variables.

Example #12

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption
expenditures (RCONS) on real disposable income (RDISINC) and PRIME. You want to estimate this
model using the FGLS Cochrane-Orcutt estimator to correct for first-order autocorrelation.

Program Editor Window

LIBNAME econ515 'a:';
DATA con1;
SET econ515.macrocon;
rcons = cons/(price/100);
rdisinc = disinc/(price/100);
PROC AUTOREG itprint;
MODEL rcons = rdisinc prime / nlag=1 iter converge=0.0001;
RUN;

The PROC AUTOREG statement tells SAS to run a linear regression and correct for autocorrelation.
The option ITPRINT tells SAS to print out each iteration that SAS performs so you can see how the
estimate of the autocorrelation coefficient (ρ) changes. The MODEL statement tells SAS to run a linear
regression of RCONS on RDISINC and PRIME. The / tells SAS that options follow. The option
NLAG=1 tells SAS to correct for first-order autocorrelation. The ITER option tells SAS to use the
Cochrane-Orcutt estimator, which involves doing iterations. The CONVERGE=0.0001 option tells SAS to
stop iterating when the estimates of ρ from two successive iterations differ by no more than 0.0001. If you do
not include the CONVERGE option, SAS will use its own default value for when convergence is
achieved. It is important to note that SAS will print out the negative of the estimate of the autocorrelation
coefficient, ρ. Thus, if SAS prints a negative value, the estimate of ρ is positive, indicating positive
autocorrelation. If SAS prints a positive value, the estimate of ρ is negative, indicating negative autocorrelation.

Program Builder

Create the new variables RCONS and RDISINC and place them in a temporary SAS dataset named
CON1 along with all variables in the SAS dataset MACROCON. The easiest way to do this is to write
the following program in the Program Editor Window.

LIBNAME econ515 'a:';
DATA con1;
SET econ515.macrocon;
rcons = cons/(price/100);
rdisinc=disinc/(price/100);
RUN;

Click Globals. Click SAS/ASSIST. Click the box named Data Analysis. Click the box named Time
Series. Click Regression with correction for autocorrelation… Click the Active Data Set button. Scroll
down the list of data sets until you find the temporary SAS dataset named WORK.CON1, which is most
likely at the bottom of the list. Click on CON1. Click the Dependent Variable button. Click RCONS.
Click OK. Click the Independent Variable button. Click RDISINC and PRIME. Click OK. Click on the
Lags To Be Fit button. Enter 1 after Order of the autoregressive process. Click on the Additional
Options button. Click on Estimation Options. Click next to Yule-Walker. Click next to Iterate
Yule-Walker Estimation Method. Click OK. Click the Locals button on the menu bar. Click Run. Note: If you
want to print out each iteration that SAS performs so you can see how the estimate of the autocorrelation
coefficient (ρ) changes, in the ADDITIONAL OPTIONS dialogue box, click Printing options… Click
next to Details at Each Iteration.

NONLINEAR, SYSTEMS OF EQUATIONS, AND LIMITED DEPENDENT VARIABLE
MODELS

Many of the following examples use the data in the external data files named DEMAND, PRODUCER,
and LABOR, which are assumed to be located on a floppy disk in drive A. The following program
creates permanent SAS datasets for each of these files. Note that the Program Builder cannot be used for
these more complex models; therefore, you must type the appropriate SAS statements in the Program
Editor Window.

LIBNAME econ515 'a:';
DATA econ515.demand;
INFILE 'a:demand';
INPUT p1 p2 p3 I q1 q2 q3;
RUN;

LIBNAME econ515 'a:';
DATA econ515.producer;
INFILE 'a:producer';
INPUT output labor capital land pcapital pland poutput plabor ;
RUN;

LIBNAME econ515 'a:';
DATA econ515.labor;
INFILE 'a:labor';
INPUT lfp whrs kl6 k618 wa we ww hhrs ha he hw faminc mtr wmed wfed un cit ax;
RUN;

NONLINEAR LEAST SQUARES REGRESSION

Example

You want to use the SAS dataset DEMAND to create a new variable, Q1EXP = P1*Q1, and use the
nonlinear least squares estimator to run a regression of Q1EXP on P1, P2, P3, and I, that is nonlinear in
parameters. In particular, you want to estimate a Stone-Geary demand equation.


LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
q1exp = p1*q1;
PROC NLIN method=dud;
PARMS a=50 c=0.5 d=30 e=40;
MODEL q1exp = a*p1 + c*(i - a*p1 - d*p2 - e*p3);
RUN;



An assignment statement is used to create the new variable Q1EXP. The PROC NLIN statement tells
SAS to run a nonlinear regression using the nonlinear least squares estimator. The option
METHOD=DUD tells SAS to use the DUD ("doesn't use derivatives") algorithm, which approximates the
derivatives numerically instead of requiring analytical derivatives when applying the nonlinear least squares
estimator. You can provide your own derivatives by using the statement DER.parametername = followed
by the expression for the derivative, for each parameter. The PARMS statement tells SAS the names of
the parameters and their starting values. The MODEL statement tells SAS the functional form to
estimate. Note that the default algorithm is the Gauss-Newton iterative algorithm. Other algorithms are
also available. To use an alternative algorithm, you must specify it as an option in the PROC NLIN
statement.
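
As an illustration of the DER. statements, the sketch below supplies analytical derivatives for the Stone-Geary equation and requests the Gauss-Newton algorithm with METHOD=GAUSS. The derivative expressions were obtained by differentiating the right-hand side of the MODEL statement with respect to each parameter; they are offered as a worked example and should be checked against your own derivation.

```sas
LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
q1exp = p1*q1;
PROC NLIN method=gauss;
PARMS a=50 c=0.5 d=30 e=40;
MODEL q1exp = a*p1 + c*(i - a*p1 - d*p2 - e*p3);
/* Partial derivatives of the model with respect to each parameter */
DER.a = p1*(1 - c);
DER.c = i - a*p1 - d*p2 - e*p3;
DER.d = -c*p2;
DER.e = -c*p3;
RUN;
```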

LINEAR SEEMINGLY UNRELATED REGRESSIONS

Example #1

You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly.
For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For
equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I.

LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i / covb;
good2: MODEL q2 = p1 p2 i / covb;
RUN;

The PROC SYSLIN statement tells SAS that you are going to estimate a system of linear equations. The
option SUR tells SAS to estimate the system of equations using the FGLS estimator (Zellner’s SUR
estimator). If you want SAS to estimate the system of equations using the iterated FGLS estimator
(iterated SUR estimator), replace the option SUR with the option ITSUR. The option VARDEF=N tells
SAS to use the sample size as the denominator when calculating estimates of the variances and
covariances. If you omit this option, SAS will use the degrees of freedom (n – k) as the denominator.
The MODEL statement tells SAS the equation to estimate. The model statement is prefixed with a name
for the equation followed by a colon. In the above example, the name of the first equation is GOOD1 and
the name of the second equation is GOOD2. You may use any name you desire. The option COVB tells
SAS to print out the variance-covariance matrix of estimates for the system of equations.

Example #2

You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly.
For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For
equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I. You want to test
the two cross-equation restrictions that the coefficient of P1 in equation 1 is equal to the coefficient of P1
in equation 2, and the coefficient of P2 in equation 1 is equal to the coefficient of P2 in equation 2.

LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i ;
good2: MODEL q2 = p1 p2 i ;
STEST good1.p1 = good2.p1, good1.p2 = good2.p2;
RUN;

The STEST statement tells SAS that you want to test a cross-equation restriction. The form of the STEST
statement is the same as the TEST statement in PROC REG, except you must attach the name of the
equation to the variable so that SAS knows to which equation the variable belongs. The STEST
statement calculates the F-statistic for the approximate F-Test.

Example #3

You want to use the SAS dataset DEMAND to estimate the parameters of two linear equations jointly.
For equation 1, the dependent variable is Q1. The independent variables are P1, P2, P3, and I. For
equation 2, the dependent variable is Q2. The independent variables are P1, P2, and I. You want to
impose the two cross-equation restrictions that the coefficient of P1 in equation 1 is equal to the
coefficient of P1 in equation 2, and the coefficient of P2 in equation 1 is equal to the coefficient of P2 in
equation 2.

LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
PROC SYSLIN sur vardef=n;
good1: MODEL q1 = p1 p2 p3 i ;
good2: MODEL q2 = p1 p2 i ;
SRESTRICT good1.p1 = good2.p1, good1.p2 = good2.p2;
RUN;

The SRESTRICT statement tells SAS that you want to impose a cross-equation restriction. The form of
the SRESTRICT statement is the same as the RESTRICT statement in PROC REG, except you must
attach the name of the equation to the variable so that SAS knows to which equation the variable belongs.

NONLINEAR SEEMINGLY UNRELATED REGRESSIONS

Example

You want to use the SAS dataset DEMAND to create two new variables, Q1EXP = P1*Q1, and Q2EXP =
P2*Q2, and estimate two equations jointly that are nonlinear in parameters. For equation 1, the dependent
variable is Q1EXP. The independent variables are P1, P2, P3, and I. For equation 2, the dependent variable is
Q2EXP. The independent variables are P1, P2, P3, and I.

LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
q1exp = p1*q1;
q2exp = p2*q2;
PROC MODEL;
PARMS a c d e f;
q1exp = a*p1 + c*(i - a*p1 - d*p2 - e*p3);
q2exp = d*p2 + f*(i - a*p1 - d*p2 - e*p3);
FIT q1exp q2exp / itsur;
RUN;



Two assignment statements are used to create the new variables Q1EXP and Q2EXP. The PROC
MODEL procedure can be used to estimate, and simulate, systems of linear or nonlinear equations. The
PROC MODEL statement tells SAS that you are going to estimate or simulate a system of linear or
nonlinear equations. The PARMS statement tells SAS the names of the parameters and their starting
values. If you don’t include starting values (as in this example), then SAS will use a small default starting
value (0.0001) for each parameter. The next two assignment statements tell SAS the specific functional form of
the equations to be estimated. Note that in this example, the values of the parameters a, d, and e are
forced to be the same in both equations because the same letter is used to designate the parameters in each
equation. The FIT statement tells SAS the equations to be estimated, which are indicated by the left-hand
side variable. The option ITSUR tells SAS to estimate the system of equations using the iterated
seemingly unrelated regressions estimator. The default maximum number of iterations is 40. If you want
to increase or decrease the maximum iterations, then after ITSUR include the option MAXIT = and the
number of iterations.

LINEAR TWO-STAGE LEAST SQUARES REGRESSION

Example

You want to use the SAS dataset PRODUCER to create three new variables, the logarithm of OUTPUT,
LABOR, and CAPITAL, and use the two-stage least squares (2SLS) estimator to run a linear regression
of the log of output on the log of labor and the log of capital. You assume that OUTPUT and LABOR are
endogenous variables. You assume that CAPITAL, LAND, PCAPITAL, and POUTPUT are exogenous
variables.

LIBNAME econ515 'a:';
DATA producer;
SET econ515.producer;
loutput = log(output);
llabor = log(labor);
lcapital = log(capital);
PROC SYSLIN 2sls vardef=n first;
ENDOGENOUS loutput llabor;
INSTRUMENTS lcapital land pcapital poutput;
pf: MODEL loutput = llabor lcapital;
RUN;

The first three assignment statements create the new variables. The PROC SYSLIN statement tells SAS
that you are going to estimate at least one equation in a system of linear equations. The option 2SLS tells
SAS to estimate the system of equations using the two-stage least squares estimator. The option
VARDEF=N tells SAS to use the sample size as the denominator when calculating estimates of the
variances and covariances. If you omit this option, SAS will use the degrees of freedom (n – k) as the
denominator. The option FIRST tells SAS to print out the results of the first-stage regression. Note that
because you used the option VARDEF=N, the standard errors for both the first-stage and second-stage
regressions are calculated with the sample size as the denominator. The
ENDOGENOUS statement tells SAS which variables are endogenous. The INSTRUMENTS statement tells
SAS which variables to use as instruments in the first-stage regression. The MODEL
statement tells SAS the equation to estimate. The MODEL statement is prefixed with a name for the
equation followed by a colon. In the above example, the name of the equation is PF (which is short for
production function). You may use any name you desire.
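
To see what the 2SLS option is doing, the two stages can be sketched by hand with PROC REG. This is only
an illustration, and it assumes that the temporary dataset PRODUCER from the example above has already
been created. The dataset name STAGE1 and the variable name LLABHAT are made up for this sketch.

PROC REG DATA=producer;
MODEL llabor = lcapital land pcapital poutput;   * First stage: regress llabor on all instruments;
OUTPUT OUT=stage1 P=llabhat;                     * Save the fitted values as llabhat;
RUN;
QUIT;
PROC REG DATA=stage1;
MODEL loutput = llabhat lcapital;                * Second stage: replace llabor with its fitted values;
RUN;
QUIT;

The coefficient estimates from the second regression should match those reported by PROC SYSLIN. The
standard errors will not, because PROC REG computes the residuals using LLABHAT rather than LLABOR. For
this reason the hand calculation is useful for understanding the estimator, not for inference.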

NONLINEAR TWO-STAGE LEAST SQUARES REGRESSION


Example

You want to use the SAS dataset PRODUCER and a nonlinear two-stage least squares (N2SLS)
estimator to run a regression of OUTPUT on LABOR and CAPITAL that is nonlinear in parameters. In
particular, you want to estimate a constant elasticity of substitution (CES) production function. You
assume that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL, LAND,
PCAPITAL, and POUTPUT are exogenous variables.

LIBNAME econ515 'a:';
DATA producer;
SET econ515.producer;
PROC MODEL;
PARMS a=2.1 e=0.3 c=0.1 d=0.5;
output = a*(e*labor**(-c) + (1-e)*capital**(-c))**(-d/c);
ENDOGENOUS output labor;
INSTRUMENTS capital land pcapital poutput;
FIT output / n2sls;
RUN;

The PROC MODEL statement tells SAS that you are going to estimate or simulate at least one equation
in a system of linear or nonlinear equations. The PARMS statement tells SAS the names of the
parameters and their starting values. The next assignment statement tells SAS the specific functional
form of the equation to be estimated. The ENDOGENOUS statement tells SAS which variables are
endogenous. The INSTRUMENTS statement tells SAS which variables to use as instruments for the
endogenous right-hand side variable(s). The FIT statement tells SAS the equation(s) to be estimated, which are
indicated by the left-hand side variable. The option N2SLS tells SAS to estimate the equation(s) using
the nonlinear two-stage least squares estimator. This estimator uses an iterative procedure to obtain the
estimates. The default maximum number of iterations is 40. If you want to increase or decrease the
maximum iterations, then after N2SLS include the option MAXIT = and the number of iterations.
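
As a sketch of how the iteration limit can be changed, the example above is repeated here with the
option placed after N2SLS in the FIT statement. The limit of 200 is an arbitrary value chosen for
illustration, and the option is written as MAXITER=, the full spelling used in the SAS/ETS documentation.
It is assumed that the temporary dataset PRODUCER has already been created.

PROC MODEL DATA=producer;
PARMS a=2.1 e=0.3 c=0.1 d=0.5;
output = a*(e*labor**(-c) + (1-e)*capital**(-c))**(-d/c);
ENDOGENOUS output labor;
INSTRUMENTS capital land pcapital poutput;
FIT output / n2sls maxiter=200;                  * Allow up to 200 iterations;
RUN;
QUIT;

If the estimator still fails to converge, different starting values in the PARMS statement are usually
worth trying before raising the limit further.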

LINEAR THREE-STAGE LEAST SQUARES REGRESSION

Example

You want to use the SAS dataset PRODUCER to estimate two simultaneous equations jointly. Equation
1 is a production function. The left-hand side variable is the log of OUTPUT. The right-hand side
variables are the log of LABOR and the log of CAPITAL. Equation 2 is a labor demand equation. The
left-hand side variable is the log of LABOR. The right-hand side variables are the log of the real price of
labor (LRPLAB), the log of the real price of capital (LRPCAP), and the log of OUTPUT. You assume
that OUTPUT and LABOR are endogenous variables. You assume that CAPITAL and the real prices
are exogenous variables.

LIBNAME econ515 'a:';
DATA producer;
SET econ515.producer;
loutput = log(output);
llabor = log(labor);
lcapital = log(capital);
lrplab = log(plabor/poutput);
lrpcap = log(pcapital/poutput);
PROC SYSLIN 3sls vardef=n first;
ENDOGENOUS loutput llabor;
INSTRUMENTS lcapital lrplab lrpcap;
pf: MODEL loutput = llabor lcapital;
ld: MODEL llabor = lrplab lrpcap loutput;
RUN;

The first five assignment statements create the new variables. The PROC SYSLIN statement tells SAS
that you are going to estimate a system of linear equations. The option 3SLS tells SAS to estimate the
system of equations using the three-stage least squares estimator. If you want SAS to estimate the system
of equations using the iterated three-stage least squares estimator, replace the option 3SLS with the option
IT3SLS. The option VARDEF=N tells SAS to use the sample size as the denominator when calculating
estimates of the variances and covariances. If you omit this option, SAS will use the degrees of freedom
(n – k) as the denominator. The option FIRST tells SAS to print out the results of the first-stage
regression(s). Note that because you used the option VARDEF=N, the standard errors for both the
first-stage and second-stage regressions are calculated with the sample size as the denominator.
The ENDOGENOUS statement tells SAS which variables are endogenous. The INSTRUMENTS
statement tells SAS which variables to use as instruments in the first-stage regressions.
The MODEL statements tell SAS the equations to estimate. Each MODEL statement is prefixed with a
name for the equation followed by a colon. In the above example, the names of the equations are PF and
LD. You may use any name you desire.
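
The switch to the iterated estimator described above can be sketched as follows. Only the PROC SYSLIN
statement changes. The sketch assumes that the temporary dataset PRODUCER, including the five new
variables, has already been created by the DATA step in the example.

PROC SYSLIN DATA=producer it3sls vardef=n;       * IT3SLS replaces 3SLS;
ENDOGENOUS loutput llabor;
INSTRUMENTS lcapital lrplab lrpcap;
pf: MODEL loutput = llabor lcapital;
ld: MODEL llabor = lrplab lrpcap loutput;
RUN;

The option FIRST is left out here only to shorten the output. It can be added back if you want to see
the first-stage regressions.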

NONLINEAR THREE-STAGE LEAST SQUARES REGRESSION

Example

You want to use the SAS dataset PRODUCER to estimate two simultaneous equations jointly; at least one
of these equations is nonlinear in parameters. Equation 1 is a constant elasticity of substitution (CES)
production function, and therefore is nonlinear in parameters. The left-hand side variable is OUTPUT.
The right-hand side variables are LABOR and CAPITAL. Equation 2 is a labor demand equation that is
linear in parameters. The left-hand side variable is LABOR. The right-hand side variables are the real
price of labor (RPLAB), the real price of capital (RPCAP), and OUTPUT. You assume that OUTPUT
and LABOR are endogenous variables. You assume that CAPITAL, RPLAB, and RPCAP are exogenous
variables.

LIBNAME econ515 'a:';
DATA producer;
SET econ515.producer;
rplab = plabor/poutput;
rpcap = pcapital/poutput;
PROC MODEL;
PARMS a=2.1 e=0.3 c=0.1 d=0.5 b=0 f=0 g=0 h=0;
output = a*(e*labor**(-c) + (1-e)*capital**(-c))**(-d/c);
labor = b + f*rplab + g*rpcap + h*output;
ENDOGENOUS output labor;
INSTRUMENTS capital rplab rpcap;
FIT output labor / n3sls;
RUN;

The two assignment statements create the new variables. The PROC MODEL statement tells SAS that
you are going to estimate or simulate a system of linear or nonlinear equations. The PARMS statement
tells SAS the names of the parameters and their starting values. The next two assignment statements tell
SAS the specific functional forms of the equations to be estimated. The ENDOGENOUS statement tells
SAS which variables are endogenous. The INSTRUMENTS statement tells SAS which variables to use
as instruments for the endogenous variables OUTPUT and LABOR. The FIT statement tells SAS the
equations to be estimated, which are indicated by their left-hand side variables. The option N3SLS tells SAS
to estimate the system of equations using the nonlinear three-stage least squares estimator. This estimator
uses an iterative procedure to obtain the estimates. The default maximum number of iterations is 40. If
you want to increase or decrease the maximum iterations, then after N3SLS include the option MAXIT =
and the number of iterations.

BINARY PROBIT REGRESSION

Example

You want to use the SAS dataset LABOR to estimate a labor force participation equation for women.
The dependent variable is LFP (a dummy variable). The independent variables are WA, WE, KL6, K618,
CIT, UN. You want to analyze the impact that each of the independent variables has on the probability
that a woman will choose to work (the probability that LFP=1).

LIBNAME econ515 'a:';
DATA labor;
SET econ515.labor;
lfpnew = 1 - lfp;
PROC PROBIT;
CLASS lfpnew;
MODEL lfpnew = wa we kl6 k618 cit un;
RUN;

The first assignment statement creates a new variable, named LFPNEW in this example, that is one minus
LFP. SAS models the probability of the lower value of the dependent variable. In this example, LFP = 1
if a woman works, LFP = 0 if a woman does not work. Therefore, if you use LFP as the dependent
variable, SAS would estimate the probability that a woman does not work. To estimate the probability
that a woman works, you must create the new variable LFPNEW. The PROC PROBIT statement tells
SAS to estimate a probit model. The CLASS statement tells SAS that LFPNEW is a categorical variable,
which PROC PROBIT requires when the dependent variable in the MODEL statement is binary. The
MODEL statement tells SAS the equation to estimate.
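
An alternative that avoids creating LFPNEW is sketched below. It uses PROC LOGISTIC, which is a
different procedure from the one in the example above, with a probit link. The option DESCENDING
reverses the ordering of the dependent variable, so that SAS models the probability that LFP = 1. The
sketch assumes that the temporary dataset LABOR has already been created and that your release of SAS
accepts LINK=PROBIT.

PROC LOGISTIC DATA=labor DESCENDING;             * Model the probability of the higher value of lfp;
MODEL lfp = wa we kl6 k618 cit un / LINK=PROBIT; * Probit link instead of the default logit link;
RUN;

The coefficient estimates should agree with those from PROC PROBIT with LFPNEW as the dependent
variable, up to small numerical differences.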

BINARY LOGIT REGRESSION

Example

The example is the same as for the probit model. The SAS statements are also the same, except that you
must include the option D = LOGISTIC in the MODEL statement.

MODEL lfpnew = wa we kl6 k618 cit un / d = logistic;

TOBIT (CENSORED) REGRESSION

Example

You want to use the SAS dataset LABOR to estimate an hours of work
equation for wives. The dependent variable is WHRS. The explanatory variables are KL6, K618, WA,
WE. Fifty of the 100 wives in the sample do not work, and therefore WHRS is zero for these 50
observations. However, we do have data for the explanatory variables for all 100 wives. In this case, the
distribution of the dependent variable, WHRS, is censored from below at zero. The appropriate
regression model is a Tobit (censored) regression model.

LIBNAME econ515 'a:';
DATA labor;
SET econ515.labor;
IF whrs = 0 THEN lower = .; ELSE lower = whrs;
PROC LIFEREG;
MODEL (lower, whrs) = kl6 k618 wa we / d = normal covb itprint;
RUN;

The IF/THEN/ELSE statement creates the variable named LOWER, which is missing when WHRS is zero
and equal to WHRS otherwise. The PROC LIFEREG
statement tells SAS to estimate a censored regression model. The MODEL statement tells SAS the
equation to estimate. Note that the dependent variable (lower, whrs) specifies two variables, a lower bound
and an upper bound. If the value of the variable LOWER is missing, then SAS treats the observation as
censored from below at the value of WHRS, which is zero in this example.
The option D = NORMAL tells SAS to assume that the error term has a normal distribution, which gives
the standard Tobit model. The option COVB tells SAS to print out the variance/covariance matrix of estimates. The
option ITPRINT tells SAS to print out the iterations.
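
If your installation includes a release of SAS/ETS with PROC QLIM, the same Tobit model can be sketched
without the LOWER variable. This is an alternative to the PROC LIFEREG approach described above, not a
replacement for it. The sketch assumes that the temporary dataset LABOR has already been created.

PROC QLIM DATA=labor;
MODEL whrs = kl6 k618 wa we;
ENDOGENOUS whrs ~ CENSORED(LB=0);                * whrs is censored from below at zero;
RUN;

Here the censoring point is stated directly with LB=0, so the zero observations do not have to be
recoded as missing values.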

MATRIX COMMANDS

To do matrix and vector operations, you use PROC IML. IML stands for Interactive Matrix Language.
The general form of a SAS matrix program is

PROC IML;
IML statements;
QUIT;

PROC IML tells SAS that you want to start doing matrix operations. QUIT tells SAS that you are
finished doing matrix operations. IML statements are arranged in groups called modules. To begin a
module, you use a START statement. To end a module, you use a FINISH statement. To execute the
module, you use a RUN statement. The general form of an IML module is

START module name;
IML statements;
FINISH;
RUN module name;

Defining Matrices and Vectors

Example

Suppose that you want to create the following matrices and vectors.


    | 7 4 9 |           | 1 5 2 |           | 5 |
a = | 2 8 9 |       b = | 6 7 4 |       c = | 2 |       d = | 1 8 9 |
    | 5 4 7 |           | 7 1 3 |           | 9 |

PROC IML;
START first;
a = {7 4 9, 2 8 9, 5 4 7};
b = {1 5 2, 6 7 4, 7 1 3};
c = {5,2,9};
d = {1 8 9};
PRINT ,a,b,c,d;
FINISH;
RUN first;
QUIT;

The PROC IML statement tells SAS to start doing matrix operations. The START statement tells SAS to
begin a new module, and name this module FIRST. The next four lines create two matrices named a and
b, and two vectors named c and d. Note that braces { } are used to define matrices and vectors. The rows
of a matrix or vector are separated by a comma. The PRINT statement tells SAS to show the matrices in
the output window. If commas are used to separate the names of the matrices and vectors (as in the
example above), this tells SAS to print each matrix and vector on a new line. The FINISH statement tells
SAS to end the module. The RUN statement tells SAS to execute the statements in the module FIRST. If
you do not give a name to your module, SAS names it MAIN, and a RUN statement without a module name
executes the module MAIN. Thus, in this example it is not necessary to name the module.

Defining Matrices and Vectors with Existing Data

Example

You want to use the SAS dataset DEMAND to create a column vector for the variable Q1, and a data
matrix that includes P1, P2, I, and a column of 1’s for the constant term.

LIBNAME econ515 'a:';
DATA demand1;
SET econ515.demand;
PROC IML;
START first;
USE demand1;
READ all var{q1} into q1;
PRINT q1;
FINISH;
RUN first;
START second;
USE demand1;
READ all var{p1 p2 I} into Z;
t = NROW(Z);
ones = J(t, 1, 1);
X = ones||Z;
PRINT / X;
FINISH;
RUN second;
QUIT;

The LIBNAME, DATA, and SET commands tell SAS to access the permanent SAS dataset named
DEMAND and put the data in the temporary SAS dataset named DEMAND1. The PROC IML statement
tells SAS to start doing matrix operations. The START statement tells SAS to begin a new module, and
name this module FIRST. The USE statement tells SAS to open the SAS dataset named DEMAND1 for
reading in PROC IML. The READ statement tells SAS to take data from the SAS dataset DEMAND1 and place it in
a matrix or vector. The options ALL, VAR{Q1}, and INTO Q1 tell SAS to use all of the observations
for the variable Q1 and read them into a vector named Q1. The PRINT statement tells SAS to display the
vector Q1 in the output window. The FINISH statement tells SAS to end the module. The RUN
statement tells SAS to execute the statements in the module FIRST. The second START statement tells
SAS to begin a new module, and name this module SECOND. The USE statement tells SAS to open the
SAS dataset named DEMAND1 for reading. The READ statement tells SAS to take data from the
SAS dataset DEMAND1 and place it in a matrix. The options ALL, VAR{P1 P2 I}, and INTO Z tell
SAS to use all of the observations for the variables P1, P2, and I, and read them into a matrix named Z.
The T= NROW(Z) statement (which uses the NROW function) tells SAS to count the number of rows in
the matrix Z and assign this value to T. The ONES = J(t, 1, 1) statement (which uses the J function) tells
SAS to create a matrix that has identical values. This statement tells SAS to create a matrix named ONES
with T rows and 1 column, and fill it with all 1’s. Therefore, SAS will create a Tx1 column vector of 1’s.
The X = ONES||Z statement tells SAS to create a new matrix named X, which stacks the vector ONES
and the matrix Z side by side (this is called horizontal concatenation, and the operator for this is ||). The
PRINT / X statement tells SAS to display the matrix X in the output window. The option “/” tells SAS to
skip to a new page when displaying X. The FINISH statement tells SAS to end the module. The RUN
statement tells SAS to execute the statements in the module SECOND. The QUIT statement tells SAS
that you are finished with matrix operations.

Defining an Identity Matrix

An identity matrix is a square matrix with ones on the principal diagonal and zeros off the principal
diagonal.

Example

You want to create a 30x30 identity matrix named IMATRIX.

PROC IML;
START ;
imatrix = I(30);
PRINT imatrix;
FINISH;
RUN;
QUIT;

The imatrix = I(30) statement tells SAS to create an identity matrix (I is called the I function) named
IMATRIX. The number inside parentheses defines the number of rows and columns of the matrix.

Matrix Operations and Matrix Algebra

The following is an example of some matrix operations that can be done with PROC IML. The SAS
statements are in the form of NAME = OPERATION, where NAME is the name of the matrix or scalar
that results from the operation that is performed.

Example

You want to create two matrices, a and b, and two vectors, c and d, and perform a number of operations
on them. A description of each operation is given to the right of the SAS statement.

PROC IML;
START first;
a = {7 4 9, 2 8 9, 5 4 7};           * Module first creates the matrices and vectors;
b = {1 5 2, 6 7 4, 7 1 3};
c = {5,2,9};
d = {1 8 9};
FINISH;
RUN first;
START second;
e = a||b;                  * e is a new matrix that is the horizontal concatenation of a and b (a and b are
                              placed side by side to create e);
f = a//b;                  * f is a new matrix that is the vertical concatenation of a and b (a is stacked on
                              top of b to create f);
g = BLOCK(a,b);            * g is a new matrix that is a block diagonal matrix with a and b as separate
                              blocks;
h = DIAG(c);               * h is a new matrix that is a diagonal matrix with the elements of the vector c on
                             the principal diagonal;
i = INV(a);                * i is the inverse of the matrix a;
j = DET(a);                * j is the determinant of the matrix a;
k = TRACE(a);              * k is the trace of the matrix a;
l = T(a);                  * l is the transpose of a;
m = a + b;                 * m is the sum of a and b;
n = a - b;                 * n is the difference between a and b;
o = a*b;                   * o is the product of a and b;
p = 5#b;                   * p is the matrix that results from multiplying each element in b by the scalar 5;
q = b/5;                   * q is the matrix that results from dividing each element in b by the scalar 5;
r = a@b;                   * r is the Kronecker product of a and b;
s = VECDIAG(a);            * s is a vector whose elements are the elements of the principal diagonal of a;
PRINT ,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s;
FINISH;
RUN second;
QUIT;
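
As a quick check on two of the operations above, the following sketch multiplies a by its inverse and
compares the result with an identity matrix. It also uses the backquote operator, which is a shorthand
for the T function. The names AT, CHECK, and DIFF are made up for this sketch, and the statements are
written without a module, in which case PROC IML executes each statement as it is submitted.

PROC IML;
a = {7 4 9, 2 8 9, 5 4 7};
at = a`;                   * Transpose of a, the same result as T(a);
check = a*INV(a);          * Should be numerically close to the identity matrix I(3);
diff = check - I(3);       * Elements should be zero up to rounding error;
PRINT at, check, diff;
QUIT;

Small nonzero values in DIFF are normal, because the inverse is computed in floating point arithmetic.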

Matrix Command Examples

Example #1

You want to use the SAS dataset named DEMAND to calculate estimates of the parameters of the linear
regression of Q1 on a constant, P1, P2, and I, using the OLS estimator. You want to calculate the
variance/covariance matrix of estimates, standard errors of the estimates, and t-statistics for the zero null
hypothesis.

LIBNAME econ515 'a:';
DATA demand1;                             * Accesses SAS dataset DEMAND and creates temporary SAS;
SET econ515.demand;                       * data set DEMAND1;

PROC IML;                                 * Initiates matrix operations;

START first;
USE demand1;
READ all var{q1} into q1;               * Module FIRST creates vector of observations for q1 and data;
READ all var{p1 p2 I} into z;           * matrix;
t = NROW(z);
ones = J(t, 1, 1);
x = ones||z;
PRINT q1 x;
FINISH;
RUN first;

START second;                         * Starts module named SECOND;
xt = T(x);                            * Transpose of data matrix;
xtx = xt*x;                           * Transpose of data matrix times the data matrix;
xtxi = INV(xtx);                      * Inverse of xtx;
b = xtxi*xt*q1;                       * OLS estimator;
q1fit = x*b;                          * Vector of fitted values for q1;
res = q1-q1fit;                       * Vector of residuals;
rss = T(res)*res;                     * Residual sum of squares;
df = NROW(x)-NCOL(X);                 * Degrees of freedom;
sig2 = rss/df;                        * Estimate of the error variance;
covb=sig2#xtxi;                       * Variance/covariance matrix;
stderr = SQRT(VECDIAG(covb));         * Vector of standard errors of the estimates;
tstat = b/stderr;                     * Vector of t-statistics;
PRINT b df sig2 stderr tstat covb;    * Display estimates;
Results = b || stderr || tstat;       * Create matrix with estimates, standard errors, and t-statistics;
estname = {int p1 p2 I};              * Create row names;
col = {estimate se tstat};            * Create column names;
PRINT / results [rowname = estname colname = col] covb; * Display estimates in table form;
FINISH;                               * Ends module named SECOND;
RUN second;                           * Runs module named SECOND;

QUIT;                                   * Ends matrix operations;
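
One way to check the matrix calculations in Example #1 is to run the same regression with PROC REG and
compare the output. The sketch below assumes that the temporary dataset DEMAND1 has already been
created by the DATA step above.

PROC REG DATA=demand1;
MODEL q1 = p1 p2 i / COVB;                * COVB prints the variance/covariance matrix of estimates;
RUN;
QUIT;

The parameter estimates, standard errors, t-statistics, and covariance matrix reported by PROC REG
should agree with B, STDERR, TSTAT, and COVB from the module SECOND, because both use n - k as the
denominator for the estimate of the error variance.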



