Introduction to SAS

Statistics Outreach Center: Short Course

Brad Brossman

Getting Started

Students at the University of Iowa can use SAS on their “Virtual Desktop.” This site can
be found at: https://virtualdesktop.uiowa.edu/Citrix/VirtualDesktop/auth/login.aspx



To open SAS:

        1. Go to the Virtual Desktop website shown above.
        2. Log in using your HawkID username and password.
        3. You will see a main menu with several folders. Click on “SAS 9”.
        4. You will see SAS 9_1_3 and SAS 9_2. Click on “SAS 9_2” (you could also
           click on SAS 9_1_3, which is just an earlier version of the program).
        5. You will see a pop-up window titled “Getting Started with SAS.” Click
           “Close” for the time being.
        6. You are ready to begin using SAS.

Note:

These instructions can be used on computers that do not have SAS installed. If you are
using a computer that has SAS installed, you can run SAS directly from the installed
program. Click the “Start” menu in the lower left-hand corner of the screen, click on
“All Programs,” and then click on “SAS (English).”

SAS Basics

There are 5 main “windows” you can view when using SAS: Explorer, Results, Editor, Log,
and Output. The tabs for Explorer and Results are at the bottom of the left-hand panel,
and the tabs for Editor, Log, and Output are at the bottom of the main panel. A brief
description of what each window does appears below:

Explorer:       This contains the folders “Libraries,” “File Shortcuts,” “Favorite Folders,”
and “My Computer.” The two most commonly used folders in this environment are
“My Computer” and “Libraries.” “My Computer” gives you access to all files on your
computer. “Libraries” gives you access to SAS datasets that you create.

Results:        Results from SAS procedures that you have run during your work session
are stored here.

Editor:         This window is where you type in (and “edit”) your SAS code. Your SAS
program is run from this window.

Log:            After you “run” a program, the Log contains notes concerning your code.
This window keeps track of how procedures were performed and gives indications of any
errors in your SAS code.

Output:         Output from the requested procedures is displayed in the Output window.

Sample SAS Program
DM 'LOG;CLEAR;OUT;CLEAR;';    /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/*PROJECT: SAS Short Course                                       */
/*     FOR: COE Students                                          */
/*      BY: Sheila Barron                                         */
/*    DATE: November 13, 2007                                     */
/* NOTES: Entering data and checking it                           */
/*****************************************************************/
DATA WORK.CLASSDAT;
   INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max    M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee    M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;

PROC PRINT DATA=WORK.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Components of a SAS program

In the SAS editor you can type in the commands you want SAS to execute.

A simple SAS program can be thought of as having two important parts (although it is
not necessary that every program have both parts).

      SAS data step: The word DATA tells SAS that you want to work with your data
       set – either inputting the data or manipulating the data.

      SAS procedures step: The word PROC tells SAS you want to do something
       with the data (e.g., print it out, calculate statistics).

A few things to remember about SAS:

      Each SAS statement must end with a semicolon “;”

      At the end of your program you must have a run statement, “RUN;”. Otherwise
       the last SAS data step or SAS procedure will not get executed.

      SAS comments: Anything written between “/*” and “*/” is treated as
       documentation that the person writing the program did not intend SAS to try to
       execute. In other words, SAS will pass over anything that is written between “/*”
       and “*/”. A statement that begins with “*” and ends with “;” is also treated as a
       comment. It is a good idea to use comments to document what you are doing in
       your program. If you come back to the program later, the comments will help you
       understand the purpose of the program.
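
The two comment styles described above can be sketched as follows. This is a minimal illustration that assumes the WORK.CLASSDAT dataset from the sample program already exists; the comment wording is only an example.

```sas
/* Block comment: SAS skips everything between the two delimiters. */

* Statement comment: begins with an asterisk and ends at the next semicolon;

PROC PRINT DATA=WORK.CLASSDAT;   /* a block comment can also follow a statement */
   TITLE 'SAS SHORT COURSE';
RUN;
```

Because a statement comment ends at the first semicolon, the block style is usually the safer choice when the text you want to hide contains semicolons of its own.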

Running your program

      When you want SAS to execute the statements you have written, click the
       “running man” icon on the toolbar. Or click on the Run pull-down menu and
       select “Submit.”

      If you want to run the entire program, make sure nothing in the program is
       highlighted when you click Run.

      If you only want to run part of the program, highlight the part you want to run
       and then click Run. SAS will only process the part of the program that you have
       highlighted.

SAS Datasets

Before SAS can perform any of its functions, it first needs to know what dataset it is
going to use. SAS datasets contain columns corresponding to specific variables (e.g.,
height, weight, etc.) and rows corresponding to specific observations (e.g., persons,
clinic sites, etc.). SAS can read data in two different ways:

        1. Data can be directly embedded in the Editor window
        2. Data can be imported from a file (e.g., a text file or an Excel file)

Suppose we want to use the following dataset in SAS. Note that each row corresponds to
a specific observation (person), and each column corresponds to a specific variable (ID,
Name, Gender, Exam1, and Grade).

S01   Max     M   84   A
S02   John    M   89   A
S03   Sarah   F   86   B
S04   Lee     M   85   B
S05   Rosa    F   94   A
S06   Ming    F   84   C

SAS variables can be one of two possible types: character and numeric. “Numeric”
variables are numbers, and mathematical operations can be performed on them.
“Character” variables are typically letters or strings of letters and numbers, and
mathematical operations cannot be performed on them. In the example above, only
“Exam1” will be numeric.
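
As a small sketch of the difference, the DATA step below reads two of the rows shown above and does arithmetic on the numeric variable. The dataset name WORK.CURVED and the variable EXAM1_PLUS5 are hypothetical names chosen for this illustration.

```sas
DATA WORK.CURVED;                      /* hypothetical dataset name */
   INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
   EXAM1_PLUS5 = EXAM1 + 5;            /* allowed because EXAM1 is numeric */
DATALINES;
S01 Max M 84 A
S02 John M 89 A
;

PROC PRINT DATA=WORK.CURVED;
RUN;
```

The dollar sign after ID, NAME, SEX, and GRADE in the INPUT statement is what marks those variables as character; EXAM1 has no dollar sign, so it is read as numeric.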
Data Embedded in Editor Window


Option 1: List Input


Notice that the data set does not have any missing data and there is always at least 1
blank space between variables. When your data are set up like this it is OK to list the
variables in the INPUT statement without telling SAS where to find each variable. This
is called “list input”: SAS will read the INPUT statement and expect the variables to be
in the order they are listed and separated by at least one space. If you have missing data
that are represented by blanks, variables that include blanks, or variables that have no
spaces between them, “list input” won’t work as written. For missing data, you will need
to put a “.” in place of each missing value.
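
A minimal sketch of list input with a missing value follows. The dataset name WORK.LISTDEMO is hypothetical, and the missing exam score for S02 is invented purely to show the “.” placeholder.

```sas
DATA WORK.LISTDEMO;                    /* hypothetical dataset name */
   INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M . A
S03 Sarah F 86 B
;

PROC PRINT DATA=WORK.LISTDEMO;
RUN;
```

Without the period on the second data line, SAS would take the next value it finds (here “A”) as EXAM1, and the remaining variables would be misread.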


Option 2: Column Input


Another option is “column input.” In order to use “column input,” values for each
variable must line up; that is, they must always be in the same columns. Then in the
INPUT statement you add column numbers to tell SAS in which column or columns to find
each variable.


 INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
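
Placed in a complete DATA step, the INPUT statement above might be used as in the sketch below. The dataset name WORK.COLDEMO is hypothetical, and the blank exam score for S02 is invented to illustrate that column input can read a blank field as missing.

```sas
DATA WORK.COLDEMO;                     /* hypothetical dataset name */
   INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
DATALINES;
S01 Max   M 84 A
S02 John  M    A
S03 Sarah F 86 B
;

PROC PRINT DATA=WORK.COLDEMO;
RUN;
```

Every value must sit in exactly the columns named in the INPUT statement, so the spacing in the data lines matters here in a way it does not for list input.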


Option 3: Informats


A third way of reading in data is to use SAS informats. SAS informats tell the computer
the format of the data that are to be read in. The most commonly used informats are date
informats. Dates are a little tricky to deal with in computer programs if you want to use
them in calculations.


A numeric informat consists of the following pieces: name, width, a period, and the
number of places after the decimal. For example, an informat for a date that is written
month, day, year, separated by slashes (e.g., 11/10/2007) is “MMDDYY10.” The name of
this informat is MMDDYY, the width is 10, and next is the period. A date is not a number
with a decimal, so the number of places after the decimal is omitted. Note also that
character informats start with a dollar sign “$”.
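
A short sketch of the MMDDYY10. informat in use is given below. The dataset WORK.DATEDEMO, the variables TESTDATE and DAYS_SINCE, and the dates themselves are hypothetical. The colon before the informat and the FORMAT statement are not covered in this handout; they are included here only as assumptions that make the example self-contained.

```sas
DATA WORK.DATEDEMO;                        /* hypothetical dataset name */
   INPUT ID $ TESTDATE : MMDDYY10.;        /* the colon reads up to the next blank using the informat */
   FORMAT TESTDATE MMDDYY10.;              /* display the stored date as mm/dd/yyyy */
   DAYS_SINCE = TESTDATE - '01NOV2007'D;   /* dates are stored as numbers, so subtraction works */
DATALINES;
S01 11/10/2007
S02 11/13/2007
;

PROC PRINT DATA=WORK.DATEDEMO;
RUN;
```

Reading the date with an informat is what lets SAS treat it as a number that can be used in calculations rather than as a string of characters.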


We will not be discussing informats in great detail. However, to look up other SAS
informats, go to the HELP menu, select SAS Help and Documentation and Contents.
Then go to:


SAS products
    Base SAS
        SAS 9.2 Language Reference: Dictionary
            Dictionary of Language Elements
                Informats
                    Informats by Category
Data from an external file

We will discuss how to import data from two common sources: an EXCEL file and a
TEXT file. For the most part, the input statement will follow all the same rules as if the
data were in the program but you need to tell SAS where to find the data.


Data from a text file using FILENAME statement:


DM 'LOG;CLEAR;OUT;CLEAR;';   /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/*PROJECT: SAS Short Course                                      */
/*    FOR: COE Students                                          */
/*     BY: Sheila Barron                                         */
/*   DATE: November 13, 2007                                     */
/* NOTES: Entering data and checking it                          */
/*****************************************************************/

FILENAME IN1 'H:\RA_SAS_Short_Course_INTRO_TXT.TXT';

DATA WORK.CLASSDAT;
   INFILE IN1;
   INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
RUN;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;

PROC PRINT DATA=WORK.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Data from a text file without using FILENAME statement (simpler):


DM 'LOG;CLEAR;OUT;CLEAR;';   /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/*PROJECT: SAS Short Course                                      */
/*    FOR: COE Students                                          */
/*     BY: Sheila Barron                                         */
/*   DATE: November 13, 2007                                     */
/* NOTES: Entering data and checking it                          */
/*****************************************************************/

DATA WORK.CLASSDAT;
   INFILE 'H:\RA_SAS_Short_Course_INTRO_TXT.TXT';
   INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;
RUN;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;

PROC PRINT DATA=WORK.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
Importing Data from an Excel file:

Pulldown: File
    Import Data
    Next
    Browse for workbook (appropriate EXCEL file)
    Sheet name [Next]
    Data name [Next]
    SAS File Name [Finish]
[Open new SAS program]

Notice that when you read the data in from EXCEL, SAS tries to assign informats that
seem the most logical. This can be a big help; for example, SAS will often correctly
read in dates. But it can also be a pain when the informat SAS picks is not the correct
one. Thus, after you import data, look carefully to make sure the data were read in
correctly.
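
The same import can also be written as code with PROC IMPORT. The sketch below is an assumption-laden illustration rather than the course's own method: the file path and sheet name are hypothetical, and the DBMS= value that works depends on your SAS installation (reading Excel files this way requires SAS/ACCESS Interface to PC Files).

```sas
PROC IMPORT DATAFILE='H:\classdat.xls'    /* hypothetical Excel file */
            OUT=WORK.CLASSDAT
            DBMS=XLS                      /* assumption: may need to differ on your installation */
            REPLACE;
   SHEET='Sheet1';                        /* hypothetical sheet name */
   GETNAMES=YES;                          /* first row holds the variable names */
RUN;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;
```

Running PROC CONTENTS right after the import is a quick way to check which variables SAS read as character and which as numeric.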
Data in SAS

There are two types of datasets that can be used in SAS: temporary (work) datasets and
permanent (saved) datasets.


Temporary datasets:


Work datasets are temporary datasets. SAS remembers them during the particular
session that you are working in, but will “forget” them for subsequent sessions. Up until
this point, we’ve only been working with work datasets, which is why all of the datasets
specified so far use the “WORK.______” format.


Permanent datasets:


Permanent datasets can be created (and stored) using the “SAVE.______” format for
specified datasets. To do this you need to start your program with a LIBNAME statement
that defines a library reference (libref). Then use that libref as the first part of the
dataset name you assign. For example, I like to call my library SAVE, so I use the
following statement.


LIBNAME SAVE 'H:\';


LIBNAME     lets SAS know that a permanent directory is going to be specified. SAVE is
the libref, the name used to refer to the external data library; it can be any valid SAS
name of up to eight characters. 'H:\' is the full pathname of that library.


When specifying the data, use the “SAVE._______” format. For example, using the
LIBNAME statement above, SAVE.CLASSDAT would create a permanent file, in contrast
to WORK.CLASSDAT, which we have been using.
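
Putting the pieces together, a permanent version of the class dataset might be created as in the following sketch. It assumes the libref SAVE and the 'H:\' folder from the text, and it uses only the first two data rows for brevity.

```sas
LIBNAME SAVE 'H:\';                    /* SAVE is the libref; 'H:\' is the folder */

DATA SAVE.CLASSDAT;                    /* two-level name: stored permanently in H:\ */
   INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
;

PROC PRINT DATA=SAVE.CLASSDAT;
RUN;
```

In a later session, submitting the same LIBNAME statement again should make SAVE.CLASSDAT available without re-entering the data.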
Ready to Begin Data Analysis
Now that your data is in SAS, you are ready to conduct statistical procedures. SAS has
literally hundreds of procedures that will do just about any quantitative analysis you
want. To get an overview of the procedures go to the HELP menu, select SAS Help and
Documentation and Contents. Then go to:


SAS products
    SAS/STAT
        SAS/STAT Users Guide


In the user guide you will find overviews for different types of analyses as well as details
on specific procedures.
SAS Code for Common Procedures
DM 'LOG;CLEAR;OUT;CLEAR;';    /* CLEARING LOG & OUTPUT WINDOWS */
/*****************************************************************/
/*PROJECT: SAS Short Course                                       */
/*     FOR: COE Students                                          */
/*      BY: Sheila Barron                                         */
/*    DATE: November 13, 2007                                     */
/* NOTES: Entering data and checking it                           */
/*****************************************************************/
DATA WORK.CLASSDAT;
   INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max    M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee    M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM; RUN;

PROC PRINT DATA=WORK.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

PROC PRINT DATA=WORK.CLASSDAT (OBS=3);
VAR NAME GRADE;

PROC FREQ;
TABLES EXAM1;

PROC FREQ;
TABLES EXAM1*GRADE;

PROC FREQ;
TABLES EXAM1*GRADE /LIST;

PROC MEANS;
VAR EXAM1;

PROC UNIVARIATE;
VAR EXAM1;
RUN;   /* without a final RUN; the last procedure would not be executed */


/*****************************************************************/
/*DM 'OUT;FILE OUT REP;';
DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/
PROC CONTENTS DATA=WORK.CLASSDAT VARNUM;



Use PROC CONTENTS to get a listing of the variables in a data set along with other
information about the dataset.


PROC PRINT;


Use PROC PRINT to print out a data set (it is often good to check the data using PROC
PRINT before running any analyses).


PROC PRINT DATA=WORK.CLASSDAT (OBS=3);
VAR NAME GRADE;


If the data set is small you can print out the whole thing. If it is large you may want to
select particular variables to print using a VAR statement or select particular observations
to print using an OBS= option.


PROC FREQ;
TABLES EXAM1;


Use PROC FREQ to produce a frequency distribution for a variable (specify the variable
using the “TABLES” statement).

PROC FREQ;
TABLES EXAM1*GRADE;


PROC FREQ will also produce two-way (or higher) cross-tabulations of the data.

PROC FREQ;
TABLES EXAM1*GRADE /LIST;


If there are lots of unique values for the variables, you may want to try a LIST option to
produce more concise output.

PROC MEANS;
VAR EXAM1;

PROC UNIVARIATE;
VAR EXAM1;

To produce means and other descriptive statistics use PROC MEANS or PROC
UNIVARIATE. PROC UNIVARIATE will produce more extensive output. (Note that
the specific variable is specified by the VAR statement. If no VAR statement is included,
by default SAS will produce output for all numeric variables in the dataset.)
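
As a sketch of these options, the first step below requests a specific set of statistics for EXAM1, and the second omits the VAR statement. The choice of statistics and the MAXDEC= setting are illustrative assumptions, not part of the course program.

```sas
PROC MEANS DATA=WORK.CLASSDAT N MEAN MEDIAN MIN MAX MAXDEC=1;
   VAR EXAM1;                          /* analyze only EXAM1 */
RUN;

PROC MEANS DATA=WORK.CLASSDAT;         /* no VAR statement: all numeric variables are analyzed */
RUN;
```

In WORK.CLASSDAT the only numeric variable is EXAM1, so both steps summarize the same variable; in a dataset with many numeric variables the VAR statement keeps the output manageable.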

NOTE:

In some PROC statements the “DATA=” option is specified; in others it is omitted. When
DATA= is omitted, SAS uses the most recently created dataset. It is therefore necessary
to tell SAS which dataset to use if you have not yet created a dataset in your SAS session
or if you want a dataset other than the most recently created one. If you are continuing
to analyze the dataset you just created, as in the program above, DATA= can be omitted.
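
The default can be seen in a small sketch with two datasets. When DATA= is omitted, SAS uses the most recently created dataset. The dataset names WORK.DEMO_A and WORK.DEMO_B and their contents are hypothetical.

```sas
DATA WORK.DEMO_A;                      /* hypothetical dataset, created first */
   INPUT X;
DATALINES;
1
2
;

DATA WORK.DEMO_B;                      /* hypothetical dataset, created last */
   INPUT Y;
DATALINES;
10
20
;

PROC PRINT;                            /* no DATA=: prints WORK.DEMO_B */
RUN;

PROC PRINT DATA=WORK.DEMO_A;           /* DATA= names the dataset explicitly */
RUN;
```

Writing DATA= on every procedure costs little and avoids surprises when a program creates more than one dataset.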
Miscellaneous

Note that there are some lines in this program that we have not talked about.

      The top line (DM 'LOG; CLEAR; OUT; CLEAR; ';) tells SAS to clear out the log
       and output windows. Without this line, each time you run the program, SAS will
       add the log and output to the end of the old log and output. This can sometimes
       be useful, but it can be confusing after several runs of a program.

      The two lines that start with “FILENAME” tell SAS where the log and output are
       to be saved (not included in this program).

      The last two lines tell SAS to save the log and output and if those files already
       exist, to replace the old versions with these versions (not included in this
       program).
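
Since those lines are not included in the program, the sketch below shows one way they might look. The file paths are hypothetical, and the fileref names LOG and OUT are an assumption inferred from the commented-out DM lines at the bottom of the sample program.

```sas
DM 'LOG;CLEAR;OUT;CLEAR;';             /* clear the Log and Output windows */

FILENAME LOG 'H:\classdat.log';        /* hypothetical path for the saved log */
FILENAME OUT 'H:\classdat.lst';        /* hypothetical path for the saved output */

/* ... DATA and PROC steps go here ... */

DM 'OUT;FILE OUT REP;';                /* save the Output window, replacing any old file */
DM 'LOG;FILE LOG REP;';                /* save the Log window, replacing any old file */
```

In the sample program the last two DM lines are wrapped in a comment, so they do nothing until the comment delimiters around them are removed.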

When you have written a program it is a good idea to save it. Go to the FILE menu and
click SAVE AS. It will prompt you for a name. After that, you can save your revisions
by selecting SAVE or clicking the save icon. When you come back later, you can open
the program and continue working.

								