Multiple Uses for a Simple SQL Procedure

Document Sample
Multiple Uses for a Simple SQL Procedure Powered By Docstoc
					Multiple Uses for a Simple SQL
          Procedure
          Rebecca Larsen
     University of South Florida
                              Introduction
• The SQL Procedure…Equivalent or advantageous
  to other SAS data steps and procedures?
• Handy SQL Procedure:
    CREATE TABLE…
    INSERT INTO…SELECT
• Several brief examples of using the above
  procedure will be presented
SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries.
   ® indicates USA registration.
Other brand and product names are registered trademarks or Trademarks of their respective
    companies.
       Overview of Topics
1. Inserting Rows from Multiple Tables into
   One Table
2. Inserting Rows from Multiple Queries into
   One Table
3. Changing the Order of Variables in a SAS
   Data Set
4. Changing Variable Attributes and/or Names
   in a SAS Data Set
1. Inserting Rows from Multiple
     Tables into One Table
• Why would you want to insert rows from
  multiple tables into one table?
  – Information related to one
    customer/patient/client is stored in multiple
    data sets and linked by a common identifier
  – Information related to one issue/problem you
    would like to study is stored in multiple data
    sets and needs to be combined
   Health Care Claims Example
• In this first example, there are multiple source
  files with different record layouts
   –   HomeHealth_clm – record layout 1
   –   Hospice_clm – record layout 1
   –   SkilledNursing_clm – record layout 1
   –   Inpatient_clm – record layout 1
   –   Outpatient_clm – record layout 2
   –   PartB_clm – record layout 2
• All files contain information needed to complete
  your research study
• You need to combine information from all source
  data sets into one “master” analysis data set
 Record Layouts of Health Care
   Claims Example Data Sets
• Record Layout 1:            • Record Layout 2:
  –   person_id                   –   person_id
  –   health_status_cd            –   health_status_cd
  –   service_category_cd         –   service_category_cd
  –   race_cd                     –   race_cd
  –   procedure_amt*              –   service_amt*
  –   procedure_dt*               –   service_dt*

*In this example, source variables have different names
in the different record layouts, although they contain
analogous information and should be analyzed as one
variable in the “master” data set
 Create a new, blank table using
      the following code:
PROC SQL;
CREATE TABLE allclaims
(
person_id                char(11),
health_status_cd         char(2),
service_category_cd      char(2),
race_cd                  char(1),
serv_amt_cu              num,
service_dt               num
)
;
QUIT;
 Use a macro to insert values from
multiple source files into one data set

/** 1. Define macro **/
%MACRO insert(amount, date, file);
  PROC SQL;
   INSERT INTO allclaims
    SELECT person_id,
           status_cd,
           service_category_cd,
           race_cd,
           &amount.,
           &date.
   FROM &file.;
  QUIT;
%MEND insert;
                 (continued)

/** 2. Call macro **/
%insert(service_amt, service_dt, HomeHealth_clm);
%insert(service_amt, service_dt, Hospice_clm);
%insert(service_amt, service_dt, Inpatient_clm);
%insert(service_amt, service_dt, SkilledNursing_clm);
%insert(procedure_amt, procedure_dt, Outpatient_clm);
%insert(procedure_amt, procedure_dt, PartB_clm);
                      Notice…
• …that different source variables for the original
  files are inserted into a single variable in the new
  file
   – Both variables must be of the same type in the original
     file as being created in the new file (i.e. numeric into
     numeric)
      • service_amt (num) vs. procedure_amt (num)
      • service_dt (num) vs. procedure_dt (num)
• One master data set was created for the purpose of
  completing analysis on information related to
  health care claims data from multiple source files
2. Inserting Rows from Multiple
     Queries into One Table
• Why would you want to insert rows from
  multiple queries into one table?
  – You want to create summary records from your
    master data set based on different “where
    clause” conditions in separate queries and insert
    them into one summary data set
Health Care Claims Example, Part 2
• Continuing with the health care claims
  example provided above, a summary table
  with one record per person is desired to be
  used in statistical analyses, i.e. ANOVA,
  PROC LOGISTIC, etc.
• You want the summary data set to be
  created with the following characteristics:
  – One record per person,
  – “Race_cd” (5 class levels) to be categorized to
    create the new variable, “minority_cd” (2 class
    levels),
               (continued)
– Summary variable “mean_cu” to be created as
  the mean amount paid per person,
– Records with missing service amount to be
  excluded,
– Persons with more than one value for
  minority_cd were excluded (to eliminate
  persons with indistinguishable minority status)
   • For statistical tests with a CLASS variable such as
     minority status, you want to make sure the classes
     are mutually exclusive, or you will compromise the
     validity of your results
 First, create the blank summary
 table using the following code:
PROC SQL;
CREATE TABLE minority
(
person_id               char(11),
health_status_cd        char(2),
service_category_cd     char(2),
minority_cd             char(4),
mean_cu                 num
)
;
Next, Insert values from multiple queries
 into one summary data set (Query 1)
/** Query 1 **/
INSERT INTO minority
 SELECT
  distinct person_id,
  health_status_cd,
  service_category_cd,
  case
  when race_cd = ‘1’
   then ‘1’
   else ‘0’
  end as minority_cd,
  mean(serv_amt_cu) as mean_cu
 FROM allclaims
 WHERE serv_amt_cu is not missing
   and service_category_cd = ‘mh’
   and count(distinct minority_cd) = 1
 GROUP BY person_id;
     Continue inserting values into the
       summary data set (Query 2)
/** Query 2 **/
INSERT INTO minority
 SELECT
  distinct person_id,
  health_status_cd,
  service_category_cd,
  case
  when race_cd = ‘1’
   then ‘1’
   else ‘0’
  end as minority_cd,
  mean(serv_amt_cu) as mean_cu
 FROM allclaims
 WHERE serv_amt_cu is not missing
   and service_category_cd = ‘ph’
   and count(distinct minority_cd) = 1
 GROUP BY person_id;
QUIT;
                  Cool Code…
• The count function in this WHERE clause is an
  example of PROC SQL code that can accomplish
  in one step, what would take more than one step
  and several more lines of regular SAS code
   where   serv_amt_cu is not missing
     and   service_category_cd = ‘ph’
     and   count(distinct minority_cd) = 1
   group   by person_id
• The case expression can be used to create a new
  variable that is a “re-categorization” of the values
  of another variable
   case
     when race_cd = ‘1’ then ‘1’
     else ‘0’
   end as minority_cd
    3. Changing the Order of
   Variables in a SAS Data Set
• Why would you want to change the order of
  the variables in your data set?
  – Perhaps your data step manipulations have
    altered the order of the variables from the
    original, and you need to export the data in the
    same format you received it
  – Or you wish to arrange certain variable types
    together for coding processes to be run on
    sequential variables in your data set
 Example of Changing the Order
  of Variables in Your Data Set
• In the following example, oldfile (Table 1)
  has variables in a certain order that you
  would like to change, and newfile (Table 2)
  has variables in the order to which you
  would like to change
 Before changing                     …After changing
Table 1. Order of variables before   Table 2. Order of variables after changing
changing (oldfile)                   (newfile)
pin                                  pin
sex_cd                               sex_cd
race_cd                              race_cd
diagnosis_cd                         DOB
procedure_cd                         age
DOB                                  diagnosis_cd
enrollment_dt                        procedure_cd
termination_dt                       enrollment_dt
death_dt                             termination_dt
age                                  death_dt
 The following code was used to
   change the variable order:
        /** Step 1 **/             /** Step 2 **/
PROC SQL;                  INSERT INTO newfile
CREATE TABLE newfile        SELECT
(                            pin,
pin            char(11),     sex_cd,
sex_cd         char(1),      race_cd,
race_cd        char(1),      DOB,
DOB            num,          age,
age            num,          diagnosis_cd,
diagnosis_cd   char(5),      procedure_cd,
procedure_cd   char(5),      enrollment_dt,
enrollment_dt num,           termination_dt,
termination_dt num,          death_dt
death_dt       num          FROM oldfile;
);                         QUIT;
              Remember…
• …the order in which you specify the
  variables in the INSERT INTO…SELECT
  statement must be the exact order of the
  variables that are created in the blank data
  set, NOT the order of the data set from
  which they came.
 4. Changing Variable Attributes
 and/or Names in a SAS Data Set
• The names of the variables do not have to be
  identical in the newly created table as the table
  from which you are selecting the records
• Other attributes (i.e. formats, informats, labels,
  character length, etc.) can be easily added or
  changed
• For example, we will create a new file called,
  “newfile2”, with different variable names and
  attributes from the data set in the previous
  example called, “newfile”
    Use the following code to change the
  name of variables and add a date formats
              in your data set:
          /** Step 1 **/                     /** Step 2 **/
PROC SQL;                           INSERT INTO newfile2
CREATE TABLE newfile2                SELECT
(                                     pin,
pin_id         char(11),              sex_cd,
sex_cd         char(1),               race_cd,
race_cd        char(1),               DOB,
DOB_dt         num format=date9.,     age,
age_iv         num,                   diagnosis_cd,
diagnosis_cd   char(5),               procedure_cd,
procedure_cd   char(5),               enrollment_dt,
enrollment_dt num format=date9.,      termination_dt,
termination_dt num format=date9.,     death_dt
death_dt       num format=date9.     FROM newfile;
);                                  QUIT;
                Notice…
• If all you want to do is change names or
  attributes of variables in a data set, then
  PROC DATASETS is a much more
  efficient tool; however,
• If you have other steps that need to be
  accomplished by writing a new file, then the
  ability to change names and attributes is
  available in PROC SQL
                  Conclusion
• There are many more applications for the
  CREATE TABLE…INSERT INTO…SELECT
  statements in PROC SQL than the few examples
  shown
  – Many times it is just another way to do the same thing
  – Other times it may cut down the number of sorts, data
    steps & procedures or lines of code required
  – Try it out for yourself…you may find something you
    will wish you had always known!
• My preferred SQL coding resource is the
  following user’s guide:
  – SAS Institute Inc. (2000), SAS®
    SQL Procedure User’s Guide,
    Version 8, Cary, NC: SAS
    Institute Inc.
       About the Speaker
            Speaker:   Rebecca Larsen,
                       Assistant in Research

Location of Company:   13301 Bruce B. Downs Boulevard
                       MHC2617
                       University of South Florida
                       Tampa, Florida 33612-3807

          Telephone:   (813) 974-7206
                Fax:   (813) 974-6411

              Email:   rlarsen@fmhi.usf.edu

       To view the presentation slides online, go to
   http://psrdc.fmhi.usf.edu/SQL_Savannah_2002.ppt