# Introduction to Arrays

```
Introduction to Arrays

Bob Virgile
Robert Virgile Associates, Inc.

Overview

This paper explains the basics of defining and using an array. The
information and examples will be useful to the programmer who is
either unfamiliar with or confused by arrays. The basics are simple
enough that most programmers can begin to successfully use arrays
immediately after reading this paper.

This paper illustrates the most common ARRAY statement syntax and
usage. For all the details you might ever want to know, see SAS®
Language: Reference, Version 6, First Edition, pp. 160-171 and
pp. 292-306.

What Is an Array?

An array means "a subset of the variables that make up one
observation of a SAS data set." A sample data set might consist of
11 variables, with 7 of those variables making up an array.

Variables within SAS data set CITIES:

    STATE
    BIRDS      }
    BEES       }  These
    FLOWERS    }  variables
    TREES      }  make up
    SKY        }  an array.
    ABOVE      }
    LOVE       }
    INDEX
    CITY
    POP

By defining these seven variables as an array, the program can
process the variables easily and economically. Usually the array
helps when all its variables will be processed in a similar fashion.
In the following example, all seven variables are being processed in
exactly the same fashion. Therefore, this program is a prime
candidate for constructing and using an array:

    DATA NEW;
    SET CITIES;
    IF BIRDS = . THEN BIRDS = 0;
    IF BEES = . THEN BEES = 0;
    IF FLOWERS = . THEN FLOWERS = 0;
    IF TREES = . THEN TREES = 0;
    IF SKY = . THEN SKY = 0;
    IF ABOVE = . THEN ABOVE = 0;
    IF LOVE = . THEN LOVE = 0;

A revised program would place all seven variables into an array and
then process all variables within the array:

    DATA NEW;
    SET CITIES;
    ARRAY LYRICS {7} BIRDS BEES FLOWERS
                     TREES SKY ABOVE LOVE;
    DO I = 1 TO 7;
      IF LYRICS{I} = .
      THEN LYRICS{I} = 0;
    END;

This revised program produces the same result more economically.
First, the program is three lines shorter. Second, it becomes very
clear that the program processes exactly seven variables. Therefore,
it becomes easier to understand and maintain the second program.
(Note the difference if the array contained 80 variables instead of
7. The second program would still contain six statements, although
the ARRAY statement would be longer. The first program, however,
would add 73 more statements.)

"Economical" does not mean the program requires less CPU time. If
anything, arrays require slightly more CPU time. However, this is a
minor expense compared to the savings in the length (and
maintainability) of the program.

Basic Rules

The ARRAY statement defines which variables are included in the
array. The statement appears within a DATA step and defines the
array for the duration of that DATA step. Array definitions do not
carry over from one DATA step to the next. The same DATA
step can contain many ARRAY statements.

The word "element" is frequently used to refer to a variable in an
array. The previous array contained seven elements. (Technically,
one variable could be two elements if the array statement were to
list that variable twice.)

A single array cannot contain both character and numeric variables.
This makes sense since the whole purpose of the array is to process
many variables in the same fashion. After all, how much sense would
this statement make:

    IF TREES = . THEN TREES = 0;

if TREES were a character variable?

Finally, arrays work with one observation at a time. They never
compare information in one observation with information in another
observation. If your program must make such a comparison, use other
standard tools such as a RETAIN statement, the LAG function, or BY
variables.

Syntax for the ARRAY Statement

The ARRAY statement supplies the following information:

1.   A name for the array.

2.   The number of variables in the array.

3.   A list of the variable names.

Variations exist. For example, the array statement can omit the
element names as long as it specifies the number of elements. The
software then creates the element names by appending numbers to the
name of the array. If the array were named TEST, for example, the
software would create the elements TEST1, TEST2, TEST3, etc.

These are valid ARRAY statements:

    ARRAY ELEMENTS {5} ELEMENT1-ELEMENT5;
    ARRAY LYRICS {7} BIRDS BEES FLOWERS
                     TREES SKY ABOVE LOVE;

Use any valid SAS name as the name of the array, but avoid:

1.   The name of a variable in the SAS data set. This is an error.

2.   The name of a SAS function (such as LENGTH, COMPRESS, or TRIM).
     This is not an error, but it does disable that function for the
     duration of the DATA step.

Next, specify the number of variables in the array, putting the
number in curly brackets. The ARRAY statements above used {5} and
{7} to indicate the number of elements. Parentheses (5) or square
brackets [5] are also permitted.

Due to laziness or other more complex factors, the number of
elements in the array may be unknown. The asterisk can replace the
actual number in the ARRAY statement. The DIM function then becomes
very useful; it counts the elements in an array. For example, in the
following program:

    DATA NEW;
    SET CITIES;
    ARRAY LYRICS {*} BIRDS BEES FLOWERS
                     TREES SKY ABOVE LOVE;
    SIZE = DIM(LYRICS);

the variable SIZE has a value of 7 because the array LYRICS contains
7 elements.

Lastly, the ARRAY statement lists the names of all variables that
make up the array. The two most common methods for listing variables
are:

1.   Naming each variable, as in the ARRAY statement above.

2.   Specifying a numbered list. For example, ELEMENT1-ELEMENT5
     means the five variables ELEMENT1, ELEMENT2, ELEMENT3,
     ELEMENT4, and ELEMENT5.

The SAS system supports other methods for specifying a list of
variable names. However, these methods are complex and unnecessary
99% of the time.

ARRAY statements can utilize two additional features. First, the
statement may define a default length for new variables. If
ELEMENT1-ELEMENT5 are character variables, and NEWVAR has never been
defined, these two sets of statements would both define NEWVAR as
character with a length of 12:

    ARRAY ADD1 {6} $ 12 ELEMENT1-ELEMENT5
                        NEWVAR;

    LENGTH NEWVAR $ 12;
    ARRAY ADD1 {6} ELEMENT1-ELEMENT5
                   NEWVAR;

Lastly, you may encounter implicitly subscripted arrays. The syntax
varies slightly:

    ARRAY LYRICS (_I_) BIRDS BEES FLOWERS
                       TREES SKY ABOVE LOVE;

Parentheses (not brackets) now contain a variable name rather than a
number or an asterisk. Slight differences in syntax will arise when
referring to an element of an implicitly subscripted array. See the
next section of this paper for details.

Implicitly subscripted arrays are not recommended style. Any program
that uses them could also use regular (explicitly subscripted)
arrays just as easily or more easily. (For that matter, any program
that uses arrays could be written without arrays. However, it might
become a much longer program.) These arrays are described here so
that you will recognize them when you see them, not to encourage you
to use them.

Referring to an Array Element

Later statements in the DATA step refer to an array element by
referring to the array name rather than the variable name. One
previous program used this technique:

    DATA NEW;
    SET CITIES;
    ARRAY LYRICS {7} BIRDS BEES FLOWERS
                     TREES SKY ABOVE LOVE;
    DO I = 1 TO 7;
      IF LYRICS{I}=. THEN LYRICS{I}=0;
    END;

LYRICS{I} refers to one variable in the array, depending on the
current value for the variable I. When I=4, LYRICS{I} means the
variable TREES (the fourth element in the array). When I=7,
LYRICS{I} means the variable LOVE (the seventh element in the
array). Since LYRICS contains seven variables, the statement:

    DO I = 1 TO 7;

processes each element in the array, one by one.

If the number of array elements were unknown, the DIM function could
count them. The following program produces an identical result:

    DATA NEW;
    SET CITIES;
    ARRAY LYRICS {*} BIRDS BEES FLOWERS
                     TREES SKY ABOVE LOVE;
    DO I = 1 TO DIM(LYRICS);
      IF LYRICS{I}=. THEN LYRICS{I}=0;
    END;

Finally, implicitly subscripted arrays use only the array name to
refer to an element.

    DATA NEW;
    SET CITIES;
    ARRAY LYRICS (_I_) BIRDS BEES FLOWERS
                       TREES SKY ABOVE LOVE;
    DO _I_ = 1 TO 7;
      IF LYRICS=. THEN LYRICS=0;
    END;

or

    DO OVER LYRICS;
      IF LYRICS=. THEN LYRICS=0;
    END;

Herein lies the lone advantage of implicitly subscripted arrays over
explicitly subscripted arrays. The DO OVER syntax (illegal with
explicitly subscripted arrays) conveniently processes every element
in the implicitly subscripted array. Still, implicitly subscripted
arrays are not recommended. They are described here so that you will
recognize them when you see them.

Usefulness of Arrays: A Sample Problem

In this sample problem, the SAS data set OLD contains 20 character
variables named LINE1 through LINE20. Each has a length of 50. These
variables contain text information and are intended to be printed
one beneath the next with a statement like:

    PUT LINE1 / LINE2 / LINE3 / LINE4 /
        LINE5 / LINE6 / LINE7 / LINE8 /
        LINE9 / LINE10 / LINE11 /
        LINE12 / LINE13 / LINE14 /
        LINE15 / LINE16 / LINE17 /
        LINE18 / LINE19 / LINE20;

The three variables LINE10 through LINE12 always contain the
following text:

LINE10 reads:

    SUMMARY STATISTICS, ALL DIVISIONS

LINE11 reads:

    (THESE ARE PRELIMINARY FIGURES ONLY.

LINE12 reads:

    FINAL NUMBERS WILL ARRIVE SOON.)

Now another month has passed, the final numbers are in, and the note
in parentheses (LINE11 and LINE12) no longer applies. With or
without arrays, a program should blank out the note in parentheses.
Without arrays, the program would be:

    DATA NEW;
    SET OLD;
    LINE11=' ';
    LINE12=' ';

With arrays, the program would be:

    DATA NEW (DROP=I);
    SET OLD;
    ARRAY LINES {20} LINE1-LINE20;
    DO I=11 TO 12;
      LINES{I}=' ';
    END;

The second program is longer and more complex. So why bother with
arrays?

As the program's objective becomes more and more complex, arrays can
simplify the program considerably. As an example, consider one
shortfall of the previous programs. When printing the report, there
would now be two blank lines in the middle. A more complex objective
would be to remove the note, without leaving any blank lines in the
middle. Without arrays, the program would be:

    DATA NEW;
    SET OLD;
    LINE11 = LINE13;
    LINE12 = LINE14;
    LINE13 = LINE15;
    LINE14 = LINE16;
    LINE15 = LINE17;
    LINE16 = LINE18;
    LINE17 = LINE19;
    LINE18 = LINE20;
    LINE19 = ' ';
    LINE20 = ' ';

Using arrays, the program becomes:

    DATA NEW (DROP=I);
    SET OLD;
    ARRAY LINES {20} LINE1-LINE20;
    DO I=11 TO 18;
      LINES{I} = LINES{I+2};
    END;
    DO I=19 TO 20;
      LINES{I}=' ';
    END;

This program is shorter and more flexible. If the number of
variables increases from 20 to 50, the first program (without
arrays) would have to add 30 lines. But this program (with arrays)
would remain virtually unchanged.

Let's add one more wrinkle to the original problem. Suppose the
three key lines don't necessarily begin with LINE10. The text:

    SUMMARY STATISTICS, ALL DIVISIONS

appears anywhere from LINE5 through LINE15. Now the program must
first locate the text and then change all subsequent variables. The
program with arrays takes 12 statements:

    DATA NEW (DROP=I START);
    SET OLD;
    ARRAY LINES {20} LINE1-LINE20;
    DO I=5 TO 15;
      IF LINES{I}=
         'SUMMARY STATISTICS, ALL DIVISIONS'
      THEN START=I+1;
    END;
    DO I=START TO 18;
      LINES{I}=LINES{I+2};
    END;
    DO I=19 TO 20;
      LINES{I}=' ';
    END;

The first DO group locates which of the variables among
LINE5-LINE15 contains the key text. The program will change the
values of all "subsequent" variables. For example, if LINE8 contains
the key text, then the program will change the values of LINE9
through LINE20. Therefore, the program notes (and assigns to the
variable START) the number of the first variable to be changed. In
this case, START would receive a value of 9. The last two DO groups
work as before, modifying values of the variables.

The same program without arrays would take about 80 lines! (Being
extremely clever, you could write this program in 30 lines without
arrays. But if you are that clever, you don't need to be reading
this paper. I will honor all written requests for 30-line
solutions.)

For the Future

In most applications that use arrays, arrays are 10% of the pie and
other tools make up 90%. Therefore, knowledge of arrays must be
combined with knowledge of other tools. DATA step tools, especially
different forms of the DO statement, are most important.

Consider the sample application, for example. It eliminated two
lines, shifting a block of text "up" by two lines. In practice, this
program is likely to be part of a more complex system that uses a
much greater variety of tools. A similar program might "insert"
blank lines by shifting a block of text "down." A third program
might change the order of the variable values, equivalent to "moving
a paragraph." Finally, the system might include menus for running
these programs interactively. The menus might allow users to make
requests such as:

•    For observation 5, insert 3 blank lines after line 4.

•    For observation 25, move lines 14 through 18 to after line 3.

To a small extent, a SAS-based system is now operating as a word
processing system! But arrays constitute one of many tools needed to
accomplish this.

In your future programs, more complex uses for arrays may come into
play, including:

1.   Creating arrays to hold sets of constants.

2.   Relating multiple arrays in one DATA step. For example, based
     on the 80 variables HEIGHT1 through HEIGHT80 and 80 more
     variables WIDTH1 through WIDTH80, compute 80 variables AREA1
     through AREA80 (AREA1 = HEIGHT1 * WIDTH1, etc.).

3.   Defining multidimensional arrays.

4.   Reading or writing array elements with INPUT or PUT statements.

These types of capabilities are invaluable for solving certain
problems. However, for most problems, the introductory concepts and
techniques in this paper will be more than sufficient.

The author welcomes questions, comments, and requests for 30-line
solutions. Feel free to call or write:

    Bob Virgile
    Robert Virgile Associates, Inc.
    3 Rock Street
    Woburn, MA 01801
    (781) 938-0307
    virgile@attbi.com

SAS is a registered trademark of SAS Institute Inc.
```
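
The paper's closing list mentions relating multiple arrays in one DATA step (computing AREA variables from matching HEIGHT and WIDTH variables). A minimal sketch of that idea follows. It is not from the paper: the data set names SHAPES and RESULT, the array names HTS, WDS, and AREAS, and the sample values are all hypothetical, and three HEIGHT/WIDTH pairs stand in for the 80 the paper describes.

```sas
/* Hypothetical sample data: 3 HEIGHT/WIDTH pairs stand in for the paper's 80. */
DATA SHAPES;
  INPUT HEIGHT1-HEIGHT3 WIDTH1-WIDTH3;
  DATALINES;
2 3 4 5 6 7
1 2 3 3 3 3
;
RUN;

/* Three parallel arrays share one subscript, so AREAn = HEIGHTn * WIDTHn. */
DATA RESULT (DROP=I);
  SET SHAPES;
  ARRAY HTS   {3} HEIGHT1-HEIGHT3;   /* existing numeric variables          */
  ARRAY WDS   {3} WIDTH1-WIDTH3;     /* existing numeric variables          */
  ARRAY AREAS {3} AREA1-AREA3;       /* new numeric variables, created here */
  DO I = 1 TO DIM(HTS);
    AREAS{I} = HTS{I} * WDS{I};
  END;
RUN;

/* List the input variables alongside the computed areas. */
PROC PRINT DATA=RESULT;
RUN;
```

Under these assumptions, scaling up to the paper's 80 pairs would mean changing only the array sizes and the numbered variable lists; the DO group itself would stay the same, which is the economy the paper argues for.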