An Introduction to SAS for Windows
Spring 2006

I. Starting SAS
To start SAS in Windows: Under the “Start” menu select “Programs”. Then select “SAS System”, then select “SAS System for Windows V 9.1.”
For each SAS job, you will see three windows in SAS:

PROGRAM EDITOR: The window where you write, edit and execute SAS programs.
LOG: A record of the program statements you submitted, together with the messages generated by the system. If there are any errors, they will appear in red.
OUTPUT: The window where the results of the program appear.

You can easily move between the windows by clicking on the one you want (PROGRAM EDITOR, LOG or OUTPUT).
To Run a SAS Program
1. Create or edit your program in the PROGRAM EDITOR window.
2. Submit the program by clicking the Submit icon (the small running figure).
3. View the SAS LOG first to see if the program ran correctly.
4. Then view the SAS OUTPUT.
Elements of a SAS Job
1. SAS program (.sas): A text file containing SAS commands.
2. SAS log (.log): A text file generated when a program is submitted in SAS.
3. SAS output (.lst): A text file generated as a result of PROC steps in a SAS program.
4. SAS data set (.sas7bdat)
II. Two Parts of a SAS Program
There are two basic building blocks of a SAS program: the DATA step and the PROCedure (PROC) step. Both are made up of statements.

1. DATA steps read and modify data and create SAS data sets. Every DATA step begins with a DATA statement and ends with a RUN statement. The keyword DATA is followed by a name that the user assigns to the data set (I like to start with data1 for the first step, data2 for the second and so on). A DATA step can create permanent or temporary SAS data sets, which can then be used in subsequent PROC steps. The basic format is as follows:

    data name;
        y = x + 1;
    run;    /*creates a temporary data set*/

2. PROC steps perform specific analyses or functions and produce the results. Each PROC step starts with the word PROC followed by the name of the procedure (e.g., PRINT, SORT, UNIVARIATE, MEANS or PLOT) and ends with a RUN statement.

    proc print data = name;
    run;

Note: SAS executes a block of DATA or PROC statements together. Each group of statements ends with a RUN statement. When you run SAS interactively you need to add the RUN statement; without it, the last step is not executed until another step or a RUN statement is submitted.
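The two kinds of steps can be seen working together in a short sketch. It assumes the DATALINES statement, which this handout does not otherwise cover, so that the data values can be typed directly into the program; the variable names x and y and the three values are purely illustrative.

```sas
/* DATA step: build a small temporary data set from in-line values */
data data1;
    input x;        /* read one numeric value per line */
    y = x + 1;      /* computed for every observation */
    datalines;
1
2
3
;
run;

/* PROC step: print the data set created above */
proc print data = data1;
run;
```

After submitting this, the LOG should report that data1 has 3 observations and 2 variables, and the OUTPUT window should list x and y side by side.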
SAS Syntax Rules
1. SAS statements or SAS code (command lines) usually begin with a SAS keyword (e.g. DATA, INPUT, PROC). Each statement ends with a semicolon (;). Statements can begin anywhere on the line, and they can extend across lines or be grouped together on one line.

2. Variable names can be up to 32 characters in length. They must begin with a letter or an underscore. They can include letters, numbers and underscores but cannot contain blanks or special symbols (&, %, $, *, #, etc.).

3. Character strings are surrounded by quotation marks. For example:

    if state = 'Nevada';

Items in quotation marks are case sensitive, so 'Nevada' is not the same as 'nevada'. In most cases you can use single or double quotation marks, but they must match.

4. You can include comments anywhere in the SAS program. The easiest way to do this is to enclose the comment between a forward slash and asterisk (/*) and an asterisk and forward slash (*/), as in the following example:

    /*This step creates new variables*/
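The rules above can be tried out in a small program. This is only a sketch: the data set names demo and nv, the variables state and pop, and the values are invented for illustration, and DATALINES is assumed as a convenient way to supply the data in-line.

```sas
/* Two statements grouped on one line */
data demo; input state $ pop;
    datalines;
Nevada 2
nevada 3
Utah 1
;
run;

data nv;
    set demo;
    if state = 'Nevada';   /* keeps only the exact-case match */
run;

/* One statement spread across two lines */
proc print
    data = nv;
run;
```

Because quoted strings are case sensitive, the observation with the value nevada is not kept in nv.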
III. Reading Raw Data into a SAS Data File
In order to perform statistical operations in SAS, raw data must first be read into a SAS data file. A SAS data file is a group of variables stored under the same name. There are two types of SAS data files:

1) Permanent data sets are stored in a user-designated directory (e.g. a:\mydata.sas7bdat). They are useful if the data they contain will be used in a future application. Permanent data sets have the extension “.sas7bdat”.
2) Temporary data sets are automatically deleted at the end of a SAS session. They can be referenced only within the current session. A data set created in a DATA step with a one-part name (such as data1) is temporary.

When reading a data set into SAS, the first step is to write a FILENAME statement, which tells SAS where your ASCII data set is. The FILENAME statement creates a logical name (a nickname) for an ASCII data file. We are going to read in the ASCII data set called “CEOSAL1.raw”.

    filename ceo 'a:\CEOSAL1.raw';

The next step is to read in the data using a DATA step. The INFILE statement opens the ASCII data set 'a:\CEOSAL1.raw' through its nickname ceo.

    data data1;
        infile ceo;

The INPUT statement describes the arrangement of values in a data file and assigns the values to SAS variables. It is here that you assign the variable names, so the first row of your data should NOT contain the variable names. Variable names followed by a '$' are character variables and those without a '$' are numeric.

        input salary pcsalary sales roe pcroe ros indus finance consprod
              utility lsalary lsales;
    run;

To view what our data look like, let's print them using PROC PRINT.

    proc print data=data1;
    run;

If you intend to use or create a permanent data set, you will also need a LIBNAME statement, which lets you reference a data directory. The LIBNAME statement establishes a LIBREF (nickname) for a directory which contains (or will contain) a permanent SAS data set. After the keyword LIBNAME, write the LIBREF and specify the location of your permanent data set. The following program will create a permanent data set named CEOSAL1 and save it into a folder called MYSASLIB on the A: drive (you must first create the MYSASLIB folder on your A: drive). In the DATA statement, the first part of the data set name is the LIBREF assigned in the LIBNAME statement.

    libname sample 'a:\mysaslib';

    data sample.ceosal1;
        set data1;
    run;

Note: If you look in the MYSASLIB folder you will see that the data set is actually called CEOSAL1.SAS7BDAT.
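To see the '$' rule in action without needing a file on disk, here is a small sketch that reads in-line data instead of CEOSAL1.raw. The data set name firms, the variable names, and every value are made up for illustration, and DATALINES is assumed in place of the FILENAME and INFILE statements.

```sas
data firms;
    /* firm is followed by $, so it is character; the rest are numeric */
    input firm $ salary sales;
    datalines;
Alpha 900 12000
Beta 1200 30000
Gamma 750 5000
;
run;

proc print data = firms;
run;
```

If the $ were left out, SAS would try to read the firm names as numbers, set them to missing, and write invalid-data notes to the LOG, which is a good reason to check the LOG before the OUTPUT.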
IV. Reading a permanent SAS data set
Reading a permanent SAS data set is much easier than importing ASCII data. If the data already exist as a temporary or permanent SAS data set, we do not need to use the INFILE or INPUT statements. Instead we use the SET statement.

The SET statement reads an existing SAS data set into a new SAS data set. SET can refer to either a temporary or a permanent SAS data set. The following program will read the permanent SAS data set called CEOSAL1.SAS7BDAT.

    libname ceos 'a:\mysaslib';

    data data1;
        set ceos.ceosal1;
    run;
V. Working with Data
Following are some of the data manipulations that can only be done within a SAS DATA step.

Creating and Redefining Variables

You can create and redefine variables with assignment statements using the basic format:

    variable = expression;

    salary_new = salary/10;
    ln_salary = log(salary);
    salary2 = salary*salary;

The first line above creates a new variable called salary_new, which is the old salary divided by 10. The second creates a new variable called ln_salary, which is the natural log of salary. The third line creates a variable called salary2, which is the square of salary.

IF-THEN Statements

Often you want an assignment statement to apply to some observations but not all. This can be done with the IF-THEN statement, together with the keywords AND, OR and ELSE. The basic structure of an IF-THEN statement is

    IF condition THEN action;

Here are some basic comparison operators associated with the IF-THEN statement.

    Mnemonic   Symbolic   Meaning
    eq         =          Equals
    ne         ~=         Not equal
    gt         >          Greater than
    ge         >=         Greater than or equal
    lt         <          Less than
    le         <=         Less than or equal

Whether you use a mnemonic or a symbolic operator depends on your preference and the availability of symbols on your keyboard. IF-THEN statements are often used to create a new variable in the following manner:

    if variable1 = value then variable2 = some value;

The following statement will create a variable called INDUSTRY which is equal to 'finance' whenever the variable FINANCE equals 1.

    if finance = 1 then industry = 'finance';

The ELSE statement issues instructions if the condition(s) in the IF-THEN statement is/are false. With the ELSE statement we can make INDUSTRY equal to 'nonfin' whenever FINANCE is not equal to 1 (in this case when it is equal to zero).

    if finance = 1 then industry = 'finance';
    else industry = 'nonfin';

You can also specify multiple conditions with the keywords AND and OR:

    if finance = 1 and pcroe ge 0 then finroe = 'positive';

This will create a new variable called FINROE which is equal to 'positive' if the firm is in the finance industry and pcroe is greater than or equal to zero. Note: when pcroe < 0, the entry under FINROE will be blank unless we specify what we want this variable to say. For example, you can add:

    else finroe = 'negative';

Be aware that the ELSE applies to every observation for which the condition is false, so non-finance firms would also be labeled 'negative'.

The IF-THEN statement is an ideal way to create a dummy variable, which usually takes on a value of 1 when a condition is true and 0 when the condition is false. The following statements will create a dummy variable roe_dum equal to 1 when pcroe is greater than or equal to zero, and 0 otherwise.

    roe_dum = 0;
    if pcroe ge 0 then roe_dum = 1;
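The IF-THEN pieces above can be combined into one DATA step, sketched below. The data set name ceo_demo and its four observations are invented for illustration, and DATALINES is assumed so the example runs on its own. The LENGTH statement is an addition not discussed in this handout: SAS fixes the length of a character variable at its first assignment, so declaring the length up front guards against longer values being cut off later.

```sas
data ceo_demo;
    input finance pcroe;
    length industry $ 8 finroe $ 8;   /* room for the longest label */

    if finance = 1 then industry = 'finance';
    else industry = 'nonfin';

    /* label only finance firms; others are left blank */
    if finance = 1 and pcroe ge 0 then finroe = 'positive';
    else if finance = 1 then finroe = 'negative';

    /* dummy variable: 1 when pcroe >= 0, otherwise 0 */
    roe_dum = 0;
    if pcroe ge 0 then roe_dum = 1;

    datalines;
1 12
1 -5
0 3
0 -8
;
run;

proc print data = ceo_demo;
run;
```

Chaining ELSE IF, as done for finroe here, is one way to keep the 'negative' label from being applied to non-finance firms.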
VI. Commonly Used Procedures
Procedures are the reason why you will want to use SAS. All statistical operations take place in the procedure (PROC) steps of a SAS program. Available procedures range from descriptive statistics (e.g. frequency distributions or means) to inferential statistics (e.g. regressions). The names of these procedures are generally mnemonic. The procedures themselves contain many options that are useful in tailoring your output. This section contains a brief description of the seven procedures that you will find most useful in this course. For more details see the SAS Procedures Guide.

Contents

PROC CONTENTS provides a summary of the data set, including the number of observations, the list of variables in your data set, the names, lengths and types of those variables, how the data set is sorted, etc. The syntax of this procedure is:

    proc contents data = name;
    run;

Let's apply this to our data set.

    proc contents data = data1;
    run;

Frequency

PROC FREQ computes a frequency distribution of the variables specified and has the following format:

    proc freq data = name;
        tables variable list;
    run;

The list of variables can contain single variable names (e.g. salary) or combinations of variables for which cross tabulations are to be computed. For example:

    proc freq data = data1;
        tables finance sales finance*sales;
    run;

This will produce three tables: (1) a frequency distribution of the variable finance; (2) a frequency distribution of the variable sales; and (3) a cross tabulation of these two variables.

Means

PROC MEANS computes the mean value of variables in a data set. The basic format is:

    proc means data = name options;
        by variable list;
        var variable list;
    run;

Many of the lines above are optional. If you do not specify options, SAS will report for each variable the number of nonmissing values, the mean, the standard deviation, and the minimum and maximum values. The VAR statement specifies which numeric variables to use in the analysis. If it is absent, all numeric variables are used. The BY statement performs separate analyses for each level of the variables in the list. The data must first be sorted by the same variables, in the same order as the variable list (use PROC SORT to do this).

Print

PROC PRINT prints the values of variables in a data set. It has the following format:

    proc print data = name;
        var variable list;
    run;

The data set specified is the one to be printed, but the variables printed will be restricted to those specified by the VAR statement. If the VAR statement is absent, all of the variables will be printed.

Correlations

PROC CORR provides a correlation matrix for the variables specified and has the following format:

    proc corr data = name;
        var variable list;
    run;

Sorting data

PROC SORT sorts the data by a particular variable. It has the following format:

    proc sort data = name;
        by variable;
    run;

By default, SAS sorts in ascending order (smallest to largest). For character variables, SAS will sort in ascending alphabetical order (a to z). If you need to sort in descending order, place the word DESCENDING before the variable name:

    proc sort data = name;
        by descending variable;
    run;

SAS can also sort by more than one variable. It will first sort the data by the variable specified first and then, within each category of that variable, by the variable specified second.

    proc sort data = name;
        by variable1 variable2;
    run;

Regression

PROC REG estimates OLS regressions. The syntax of a regression is:

    proc reg data = name;
        model dependent variable = independent variable list;
    run;
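As a rough illustration of how several of these procedures fit together, the sketch below sorts a data set, computes means by group, and then runs a regression. The data set ceo_demo and all of its values are made up for this example (they are not taken from CEOSAL1.raw), and DATALINES is assumed so that the program is self-contained. The closing QUIT statement is also an addition: PROC REG stays active after RUN when SAS is used interactively, and QUIT ends it.

```sas
/* Illustrative data only */
data ceo_demo;
    input salary sales roe finance;
    datalines;
900 12000 10 1
1200 30000 14 1
750 5000 8 0
1100 22000 16 0
1500 41000 12 1
820 9000 20 0
1300 35000 9 1
980 15000 18 0
;
run;

/* BY-group processing requires the data to be sorted first */
proc sort data = ceo_demo;
    by finance;
run;

/* Separate summary statistics for finance = 0 and finance = 1 */
proc means data = ceo_demo;
    by finance;
    var salary sales;
run;

/* OLS regression of salary on sales and roe */
proc reg data = ceo_demo;
    model salary = sales roe;
run;
quit;
```

If the PROC SORT step is skipped and the data are not already in order, PROC MEANS stops with an error in the LOG saying the BY variables are not properly sorted.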
Ending Your Session
If you want to save your program, log file or output file, make sure that the corresponding window is active. Under the “File” menu, choose “Save As” and then choose the directory where you want to save it as well as the name of the file. It is usually a good idea to save the program, log and output (list) files under the same name if they correspond to the same SAS job (they will have different extensions). If you do not wish to save anything, choose “Close”.
Some Useful Assignment Statements
    Assignment Statement      Expression
    newvar = 10;              Numeric constant
    newvar = 'ten';           Character constant
    newvar = oldvar;          A variable
    newvar = oldvar + 10;     Addition
    newvar = oldvar - 10;     Subtraction
    newvar = oldvar * 10;     Multiplication
    newvar = oldvar / 10;     Division

Some Useful Functions

    Function   Returns
    ABS        Absolute value
    EXP        Exponential
    LOG        Natural logarithm
    MAX        Largest value
    MIN        Smallest value
    SQRT       Square root
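The functions listed above are used inside assignment statements in a DATA step. The following is a minimal sketch; the data set name fn_demo, the variables a and b, and the two observations are hypothetical, and DATALINES is assumed to supply the values in-line.

```sas
data fn_demo;
    input a b;
    abs_a  = abs(a);       /* absolute value of a */
    exp_b  = exp(b);       /* e raised to the power b */
    log_b  = log(b);       /* natural logarithm of b */
    max_ab = max(a, b);    /* larger of a and b */
    min_ab = min(a, b);    /* smaller of a and b */
    sqrt_b = sqrt(b);      /* square root of b */
    datalines;
-4 1
3 9
;
run;

proc print data = fn_demo;
run;
```

The values of b are kept positive here because LOG of a non-positive number and SQRT of a negative number return a missing value and write a note to the LOG.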