VA Information Resource Center (VIReC) SAS Programming Efficiency

					                                           Your Guide to VA Data




SAS® Programming Efficiency Tips




                Phil Colin
  VA Information Resource Center (VIReC)
            Hines VA Hospital

                     September 4, 2001
        SAS is a registered trademark of the SAS Institute, Inc.
SAS Programming Efficiency Tips
Overview
Reading external data
Processing SAS data sets
Storing SAS data sets
Sorting SAS data sets
Documentation and coding
System related issues


                     Overview
One of the keys to successful programming is to write
efficient and understandable source code. Efficiency, as
defined by the SAS Institute, can be thought of as “getting
more results from fewer resources”. By following certain
guidelines when writing SAS source code we can:
    Reduce processing time and system resource demands
    (CPU, I/O, memory and storage).
    Reduce programming time.
    Design programs which are more understandable to
    others.
This presentation lists some techniques which, when
implemented, should improve the performance, reliability
and readability of your SAS programs.
        Reading External Data
Read in only the data elements that you need.
(DROP or KEEP statement/data set option).
Read selection criteria fields first and eliminate
unwanted records before reading in the rest of the
fields.
Keep only the records that you need for
subsequent processing.
Define variables with the smallest length possible
(LENGTH, INFORMAT or ATTRIB statement).
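
The tips on this slide can be sketched as a single DATA step. This is only an illustration: the record layout, the variable names (rectype, patid, age, visitdt) and the in-stream records are hypothetical, and a real program would name an external file on the INFILE statement instead of DATALINES.

```sas
/* Hypothetical layout: col 1 record type, cols 3-11 patient ID,
   cols 13-15 age, cols 17-24 visit date (yyyymmdd)               */
data work.visits (keep=patid age visitdt);   /* keep only what is needed */
    length patid $ 9 age 3;          /* 3 bytes is enough for small integers such as age */
    infile datalines truncover;
    input @1 rectype $1. @;          /* read the selection field and hold the record */
    if rectype ne 'V' then delete;   /* discard unwanted records before reading the rest */
    input @3 patid $9. @13 age 3. @17 visitdt yymmdd8.;
    format visitdt date9.;
    datalines;
V 000000001  67 20010115
X 000000002  54 20010220
V 000000003  71 20010305
;
run;
```

Only the two records of type V reach the second INPUT statement, so the cost of reading the remaining fields is never paid for the rejected record.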

       Reading External Data
Store numeric-looking values (IDs, codes) as character
rather than numeric if they will not be used in
arithmetic operations and are less than 8 bytes long.
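
A minimal sketch of the storage difference, assuming a hypothetical 3-digit clinic code (the names stop_chr and stop_num are invented for illustration). The same code is stored once as a 3-byte character variable and once as a default 8-byte numeric variable.

```sas
data work.clinics;
    length stop_num 8 stop_chr $ 3;
    input stop_chr $3.;                /* stored in 3 bytes */
    stop_num = input(stop_chr, 3.);    /* same value stored in 8 bytes */
    datalines;
323
502
;
run;

/* The Len column of the PROC CONTENTS listing shows 3 versus 8 bytes */
proc contents data=work.clinics;
run;
```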




     Processing SAS Data Sets
Process and store only the variables that you need
(DROP or KEEP statement/data set option).
Create as many data sets in one DATA step as
possible (OUTPUT statement).
Read in as many SAS data sets in one DATA step
as possible (SET or MERGE statement).
Assign a value to a constant only once (RETAIN
statement).
Use mutually exclusive conditions
(IF-THEN / ELSE or SELECT statements).
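
As a rough sketch of several of these tips together, the step below creates two data sets in one pass, sets a constant once with RETAIN, and uses IF-THEN / ELSE so each record is tested only as often as necessary. The data set names, variables and the cutoff of 65 are hypothetical.

```sas
data work.patients;
    input patid $ age sex $;
    datalines;
A01 34 F
A02 67 M
A03 71 F
A04 52 M
;
run;

/* One pass over the input creates both output data sets */
data work.under65 work.age65plus;
    retain cutoff 65;                  /* initialized once, not reassigned on every iteration */
    set work.patients (keep=patid age);   /* read only the variables needed */
    if age < cutoff then output work.under65;
    else output work.age65plus;        /* mutually exclusive branches */
    drop cutoff;
run;
```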
     Processing SAS Data Sets
Put only statements affected by a program loop
(DO…END statements) in the loop.
Take advantage of SAS procedures.
Utilize SAS automatic variables (FIRST.variable,
LAST.variable, _N_, _I_, _ERROR_).
Utilize SAS reserved words (_NULL_, _LAST_,
_INFILE_, _ALL_, _NUMERIC_,
_CHARACTER_).
Utilize SAS functions whenever possible.
Use the IN operator rather than a series of logical OR comparisons.
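
The following sketch shows a few of the features named on this slide in one place: FIRST. and LAST. variables for per-patient totals, the IN operator in place of chained OR comparisons, and a DATA _NULL_ step that reports results without creating a data set. All data set and variable names are hypothetical, and the input is assumed to be already in patid order.

```sas
data work.visits;
    input patid $ clinic $ cost;
    datalines;
A01 ER 120
A01 PC 80
A02 ER 200
A03 SU 50
A03 PC 75
;
run;

data work.totals (keep=patid total);
    set work.visits;
    by patid;                              /* creates first.patid and last.patid */
    where clinic in ('ER', 'PC');          /* instead of clinic='ER' or clinic='PC' */
    if first.patid then total = 0;
    total + cost;                          /* sum statement: total is retained across rows */
    if last.patid then output;             /* one row per patient */
run;

/* _NULL_ runs the step without writing a data set */
data _null_;
    set work.totals;
    put patid= total=;
run;
```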
     Processing SAS Data Sets
Use the macro facility for repetitious code.
Utilize arrays whenever feasible.
Set the lower bound of an array to 0 (not 1).
Utilize global statements and variables whenever
possible.
Take advantage of the SAS System defaults.
Eliminate processing unwanted observations in
procedures (WHERE statement).
Know your data and code for unknown data.
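
A minimal sketch of the array, macro and WHERE tips above. The survey items q1-q5, the "unknown" code 9 and the macro name itemmean are all invented for illustration.

```sas
data work.scores;
    input id $ q1-q5;
    datalines;
A 1 9 3 9 5
B 9 2 9 4 9
;
run;

/* Recode the hypothetical "unknown" code 9 to missing in every item */
data work.clean;
    set work.scores;
    array q{0:4} q1-q5;            /* lower bound of 0 instead of the default 1 */
    do i = 0 to 4;
        if q{i} = 9 then q{i} = .;
    end;
    drop i;
run;

/* One macro definition replaces several near-identical PROC steps */
%macro itemmean(var);
    proc means data=work.clean n mean;
        where id = 'A';            /* the procedure processes only the rows it needs */
        var &var;
    run;
%mend itemmean;

%itemmean(q1)
%itemmean(q3)
```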
     Processing SAS Data Sets
Check for undesirable conditions and stop
processing (STOP, ABORT or ENDSAS
statements).
Avoid automatic data type conversions; convert
explicitly with the PUT (numeric-to-character) or
INPUT (character-to-numeric) function.
Test by reading in a sample subset of your data
(OBS= option or _N_ automatic variable).
When coding nested loops, put the loop with the
fewest iterations outermost.
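
One way to sketch the conversion, early-exit and test-subset tips is shown below. The data and variable names are hypothetical; the third record deliberately carries a non-numeric age so that the STOP branch is exercised.

```sas
data work.raw;
    input zip_num age_chr $;
    datalines;
60141 67
60612 54
60153 7X
;
run;

data work.converted;
    set work.raw;
    zip_chr = put(zip_num, z5.);          /* explicit numeric-to-character conversion */
    age_num = input(age_chr, ?? 3.);      /* explicit character-to-numeric conversion;
                                             ?? suppresses the invalid-data messages */
    if age_num = . then do;               /* undesirable condition: report it and stop */
        put 'ERROR: non-numeric age found, ' zip_chr= age_chr=;
        stop;
    end;
run;

/* While testing, look at a small subset only */
proc print data=work.raw (obs=2);
run;
```

Because STOP executes before the implicit OUTPUT at the end of the step, the record with the bad value is not written and WORK.CONVERTED holds only the two records read before it.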
     Processing SAS Data Sets
Write condition check statements in order of
descending probability.
Format variables instead of assigning them
different values (PROC FORMAT).
Verify the contents of a SAS data set. (PROC
CONTENTS).
Store data in permanent SAS data sets.
When uniformly changing the value of a variable
use the MODIFY statement instead of SET.
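
A short sketch of two of these tips, using made-up data: a user-defined format groups ages at report time without creating a recoded variable, and MODIFY applies a uniform change in place rather than rewriting the data set with SET. The format name agegrp and the 5 percent adjustment are assumptions for illustration.

```sas
proc format;
    value agegrp
        low -< 45 = 'Under 45'
        45 -< 65  = '45-64'
        65 - high = '65 and over';
run;

data work.patients;
    input patid $ age cost;
    datalines;
A01 34 100
A02 67 200
A03 52 150
;
run;

/* Group by formatted value; the stored ages are unchanged */
proc freq data=work.patients;
    tables age;
    format age agegrp.;
run;

/* Uniform change applied in place; each observation is replaced automatically */
data work.patients;
    modify work.patients;
    cost = round(cost * 1.05, 0.01);
run;
```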

       Sorting SAS Data Sets
Plan sorting to reduce the number of sorts.
Sort data only when necessary.
Sort as few observations as possible (WHERE
statement).
When available, use a CLASS statement in
procedures to avoid sorting the data first.
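
The sorting tips can be illustrated roughly as follows, with hypothetical visit data that is not in clinic order. PROC MEANS with a CLASS statement summarizes by clinic without a prior sort, and when a sort is really needed a WHERE statement limits it to the observations of interest.

```sas
data work.visits;
    input clinic $ cost;
    datalines;
ER 120
PC 80
ER 200
SU 50
PC 75
;
run;

/* CLASS groups the data without requiring it to be sorted first */
proc means data=work.visits noprint nway;
    class clinic;
    var cost;
    output out=work.clinic_cost (drop=_type_ _freq_) sum=total_cost;
run;

/* Sort only the observations that are needed; OUT= leaves the input intact */
proc sort data=work.visits out=work.er_sorted;
    where clinic = 'ER';
    by cost;
run;
```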




       Storing SAS Data Sets
Store only the variables that you need. (DROP or
KEEP statement/data set option).
Store summarized procedure output in SAS data
sets to avoid rereading the detail data
(OUTPUT statement or OUT= option).
Minimize the storage space used for variables
(LENGTH or INFORMAT statement).
Store numeric categorical data that do not need to
be processed numerically in character variables.
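
As an illustration of storing summarized output, the sketch below writes PROC FREQ counts to a data set and keeps only the variables needed, so later steps can read the small summary instead of the detail data. The data set and variable names are hypothetical.

```sas
data work.encounters;
    input clinic $ cost;
    datalines;
ER 120
PC 80
ER 200
SU 50
PC 75
;
run;

/* Save the counts once; KEEP= stores only the needed variables */
proc freq data=work.encounters noprint;
    tables clinic / out=work.clinic_counts (keep=clinic count);
run;

/* Later steps read the summary, not the detail records */
proc print data=work.clinic_counts;
run;
```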

        Storing SAS Data Sets
Store large SAS data sets in the compressed
format (COMPRESS= data set option).
For large data sets, consider using indexes.
Store numeric variables as packed decimal
whenever feasible.
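
A small sketch of the COMPRESS= and INDEX= data set options, using generated data with hypothetical names. On a data set this small, compression and indexing may not pay off; the example only shows the syntax.

```sas
/* Generate 1,000 hypothetical encounter records */
data work.encounters (compress=yes index=(patid));
    length patid $ 3 note $ 200;
    do i = 1 to 1000;
        patid = put(mod(i, 100), z3.);   /* IDs 000 through 099 */
        note  = 'Routine follow-up';     /* long, mostly blank field compresses well */
        output;
    end;
    drop i;
run;

/* A WHERE expression on the indexed variable can use the index */
data work.one_patient;
    set work.encounters;
    where patid = '042';
run;
```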




    Documentation and Coding
Attach labels to your permanent SAS data sets
(LABEL= data set option).
Attach labels and formats to your permanent SAS
data set variables (LABEL or ATTRIB statement).
Use meaningful variable and data set naming
conventions (32-character maximum in Version 8).
Separate DATA and PROC steps with RUN
statements to create a step boundary which makes
the SAS log easier to read.
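
The documentation tips on this slide might look like this in practice. The data set label, variable labels and data are invented for illustration; each step ends with its own RUN statement so the log shows a clear boundary between steps.

```sas
data work.patients (label='Hypothetical patient demographics');
    input patid $ age admitdt :yymmdd8.;
    label patid   = 'Patient identifier'
          age     = 'Age at admission (years)'
          admitdt = 'Admission date';
    format admitdt date9.;
    datalines;
A01 34 20010115
A02 67 20010220
;
run;

/* Labels and formats travel with the data set and appear here */
proc contents data=work.patients;
run;
```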

    Documentation and Coding
Insert blank lines between DATA and PROC steps
for easier readability.
Indent subsequent statements in DATA and PROC
steps for easier readability.
Group declarative statements.
Retrieve non-volatile external source code with
the %INCLUDE statement during testing but place
all source code together in the finalized program.
Reference array variables using an abbreviated
style (e.g., var1-var10).
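
A brief sketch of the abbreviated variable-list style, with hypothetical variables var1-var10. The same list notation works on the INPUT statement, in an ARRAY definition and in functions that accept an OF list.

```sas
data work.labs;
    input id $ var1-var10;                 /* numbered range list on INPUT */
    array v{10} var1-var10;                /* same list defines the array */
    total     = sum(of v{*});              /* sum of the non-missing values */
    n_missing = nmiss(of var1-var10);      /* count of missing values */
    datalines;
A 1 2 3 4 5 6 7 8 9 10
B 1 . 3 . 5 6 7 8 9 10
;
run;
```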
    Documentation and Coding
Insert comment lines in your program.
Simplify complex expressions as much as
possible.
Document your procedure output (TITLEn or
FOOTNOTEn statements).
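
For example, a commented step with titles and a footnote could be written as below. The report text and data are hypothetical.

```sas
/* Hypothetical detail data for the report */
data work.visits;
    input clinic $ cost;
    datalines;
ER 120
PC 80
;
run;

/* Titles and footnotes are global: they stay in effect until changed */
title1 'Hypothetical Clinic Cost Report';
title2 'Illustrative data only';
footnote1 'Source: WORK.VISITS';

proc print data=work.visits;
run;

/* Clear them so they do not carry over to later output */
title;
footnote;
```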




        System Related Issues
Compile your programs before execution to test
syntax.
If extra memory is needed and the program does
not use any macro variables or functions, disable
the macro facility (NOMACRO system option).
Protect existing data sets during the testing phase
(NOREPLACE system option).
Review the SAS log after the program completes
to verify results.
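
One common way to check syntax before a full run, sketched here as an assumption rather than the author's stated method, is to set OBS=0 together with NOREPLACE: the steps compile and run against zero observations, and permanent data sets cannot be overwritten. SASHELP.CLASS is a sample data set shipped with SAS; the output data set name and the bmi calculation are hypothetical.

```sas
/* Syntax check: process no observations and protect permanent data sets.
   NOREPLACE is documented for permanent libraries; it is assumed here
   not to restrict the WORK library.                                      */
options obs=0 noreplace;

data work.check;
    set sashelp.class;
    bmi = weight / height**2 * 703;
run;

/* Restore normal processing once the log is clean */
options obs=max replace;
```

After this runs, WORK.CHECK exists with all of its variables but no observations, and the log shows any syntax errors.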

        System Related Issues
Manage your SAS data sets (PROC DATASETS
or PROC DELETE).
Schedule batch jobs to run off-hours if possible.
Become familiar with the operating system and
software that you are working with:
  Austin Automation Center (OS/390) - JCL/TSO/ISPF.
  Local systems (OpenVMS) - DCL.
Know which SAS engine (version) you are
using and take advantage of any new
enhancements available in a version upgrade.
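
Managing data sets with PROC DATASETS can be sketched as follows. The data set names are hypothetical; the point is that intermediate data sets are deleted as soon as they are no longer needed.

```sas
/* Three hypothetical data sets, two of them temporary scratch files */
data work.temp1 work.temp2 work.keepme;
    x = 1;
run;

/* Delete the scratch files and leave the rest of the library alone */
proc datasets library=work nolist;
    delete temp1 temp2;
quit;
```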
       System Related Issues
Know the SAS system options in effect (PROC
OPTIONS) and modify if necessary (OPTIONS
statement).
At AAC specify the most appropriate batch job
service level (6 – 9).
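
Checking and changing a system option might look like the sketch below. LINESIZE is used only as an example option, and the macro variable name ls_in_effect is hypothetical.

```sas
/* Show the current setting in the log */
proc options option=linesize;
run;

/* Change it for the rest of the session */
options linesize=120;

/* GETOPTION returns the value in effect, here saved in a macro variable */
%let ls_in_effect = %sysfunc(getoption(linesize));
%put NOTE: LINESIZE is now &ls_in_effect;
```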




        Additional Resources
SAS Programming Tips: A Guide to Efficient SAS Processing
Efficiency: Improving the Performance of Your SAS Applications
In the Know: SAS Tips & Techniques From Around The Globe
SAS TODAY! – A Year of Terrific Tips
SAS Companion for the OpenVMS Environment
SAS Companion for the OS/390 Environment

Questions?




     Contact Information
               Phil Colin
VA Information Resource Center (VIReC)
     Edward Hines, Jr. VA Hospital
          PO Box 5000 (151V)
         Hines, IL 60141-5000
     Phone: (708) 202-8387 x27042
          Fax: (708) 202-2415
Email: colin@research.hines.med.va.gov