STATA
           Today’s Workshop
• General comprehension of how STATA handles data /
  wants you to think

• Introduction to Commands

• Importing Data

• Running Regressions

• Regression Tests

• Manipulating Data
               Thinking About STATA
• Model for working with data: Word Processor
   – A copy of the data are loaded into memory and worked with there. (*No
     changes are made to the copy of the data on disk until you explicitly
     replace the file)

• Connected with the web and your folders
• Commands!
• File types:
    .do files → txt files with your commands, for future reference and editing
    .log files → txt files with your output, for future reference and printing
    .dta files → data files in Stata format
    .gph files → graph files in Stata format
    .ado files → programs in Stata
• The STATA windows:
   – Command Summary
   – Command Results: main place to monitor work
   – Viewer: a side window which provides additional information
     (such as a help query)
   – Data Summary
   – Command Window
• Syntax:
           command   varlist   if exp   in range

   – varlist: list of variables
   – if exp: if expression, set with a qualifier like >5, meaning “greater
     than 5,” or ==20, meaning “is 20”
   – in range: observation numbers, written beginning#/end# (ex: 1/10)

*For this presentation, all commands will be followed by a
colorful → and description of what they do
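
A minimal sketch of the syntax pattern above. It assumes auto.dta, the example dataset shipped with Stata, as a stand-in for the workshop data; the variables make, price, mpg, and rep78 belong to that dataset, not to the workshop file.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the workshop data
sysuse auto, clear

* command   varlist          if exp        in range
list        make price mpg   if mpg > 25   in 1/20

* == tests equality: count the observations where rep78 is 4
count if rep78 == 4
```

The if qualifier filters on values, while in restricts the command to a block of observation numbers; either one can be left out.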
            Most Common Commands
  Category                          Stata Commands

  Getting online help               search, findit, help

  Operating system interface        pwd, cd, sysdir, mkdir, rmdir, dir, erase, copy,

  Using and saving data from disk   use, save, append, merge, compress

  Inputting data into Stata         input, edit, infile, infix, insheet

  The Internet and updating Stata   update, net, ado, news

  Basic data reporting              describe, codebook, list, browse, count, inspect,
                                    summarize, table, tabulate

  Data manipulation                 generate, replace, egen, rename, drop, keep, sort,
                                    encode, decode, order, by, reshape, collapse

  Formatting                        format, label

  Keeping track of your work        log, notes

  Convenience                       display
• STATA will give you lots of information on how to fix
  whatever went wrong
   – Click on the blue error message
   – The Viewer will pop up with the reason

• Search
   – Help → Search
   – .help term
• Interactively connected to your folders
   – Can directly pull or save files from anywhere on your computer
   – .pwd → shows the directory you are currently in
   – Once you are in a directory, any file you save will be stored there
      .use filename → opens a file saved in that directory
      .save filename → saves the file in STATA format
      .save filename, replace → overwrites the existing file
      .mkdir dirname → makes a new directory, i.e. a new folder
   – .cd → changes your directory
      ex: .cd \users\f_desmarais\workshop takes you to the folder “workshop”
        in my folder on the h: drive

   * In general, do not save in the STATA directory
        • Back up your files by saving somewhere else
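
The directory commands above can be sketched as one short sequence. The folder name stata_workshop and the file name auto_copy are hypothetical, and auto.dta (shipped with Stata) is assumed as sample data.

```stata
* Show the current working directory
pwd

* Assumption: "stata_workshop" is a hypothetical folder name.
* capture keeps the run going if the folder already exists.
capture mkdir stata_workshop
cd stata_workshop

* Load sample data, save a copy in the new folder, then reopen it
sysuse auto, clear
save auto_copy, replace
clear
use auto_copy
```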
              Exercise #1
• Create a directory for your STATA work in
  your personal folder on the network
• Keep track of everything you do
   – Create a log (saves in current directory):
     .log using filename → starts a new log

   – Add to or overwrite a log:
     .log using filename, append → adds to the end of an existing log
     .log using filename, replace → overwrites an existing log

   – Close a log:
     .log close → closes a log
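
A short sketch of the log cycle described above. The log name workshop_log is hypothetical, and the text option (plain-text log rather than Stata's formatted default) is an added assumption.

```stata
* Close any log left open from earlier work (no error if none is open)
capture log close

* Start a plain-text log, overwriting any earlier file of the same name
log using workshop_log, replace text
sysuse auto, clear
summarize price
log close

* Reopen the same log later and add to the end of it
log using workshop_log, append text
describe
log close
```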
              Exercise #2
• Create a log for today’s workshop
                     Importing Data
• The Long Way…
  –   Open Data Editor
  –   Data → Data Editor
  –   Click on first cell and paste data
  –   Save as a .dta file
       .save filename → saves file in STATA format

• The Short Way…
  – Save data in Excel as an .xml file
  – Type Command:
     .xmluse filename → imports file
     .xmluse filename, firstrow → treats the first row as variable names
     .xmluse filename, firstrow allstring → imports all data as strings
     .destring varlist, replace → converts string variables to numeric
  – Save as a .dta file
       .save filename
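
The import, destring, and save steps can be sketched without the workshop's Excel file. This example swaps in export delimited and import delimited (available in newer versions of Stata) for xmluse, and writes its own small CSV from auto.dta first; the file name auto_sample is hypothetical.

```stata
* Assumption: build a small CSV from auto.dta so the example is self-contained
sysuse auto, clear
keep make price mpg
export delimited using "auto_sample.csv", replace

* Import it back, treating the first row as variable names
import delimited using "auto_sample.csv", varnames(1) clear

* Simulate a column that arrived as text, then convert it back to numeric
tostring price, replace
destring price, replace

* Save in Stata format
save auto_sample, replace
```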
                  Importing Data
• A few more things:
  – .clear → removes the data currently in memory. Unless you saved
    beforehand, nothing you did will affect the actual data on disk
    (important to do before you import new data)

  – Memory Space
     • STATA likes to use as little space on your computer as possible
         .set mem XXXm → expand memory size
         .compress → compresses data

  – Dictionaries
     • Can specify how you want to import data
       (search “dictionaries” if you want to learn more)
               Exercise #3
• Import the Micro file
• Save as a .dta file in your directory
                    Data Reporting
.describe → basic information on variables
.summarize → basic descriptive statistics
.codebook → descriptive statistics, lots of information
.list → spreadsheet form
.label → create variable labels and values
.table → frequency table
.q → stops STATA in whatever it is running
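
A brief sketch of the reporting commands on sample data. It assumes auto.dta (shipped with Stata); tabulate is used for the frequency table, as one of the commands listed earlier under basic data reporting.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear

describe                      // names, storage types, labels
summarize price mpg weight    // observations, mean, std. dev., min, max
codebook rep78                // detailed report on one variable
list make price in 1/5        // first five rows, spreadsheet style
tabulate rep78                // one-way frequency table
```

The standard deviation of each variable appears in the Std. Dev. column of the summarize output.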

              Exercise #4
• What is the standard deviation of each
               Running Regressions

• The Long Way…
  – Statistics → Linear models and related → Linear regression

  – Select the dependent variable from the list
  – Select the independent variables from the list
                          Exercise #5
   • Run a regression of price against lot_size,
     trees, and distance

Coefficient Information
(notice constant,
_cons, is listed last)
            Running Regressions
• The Short Way…

  – Command:
 regress depvar [indepvars] [if] [in] [weight] [, options]

  – For the regression we just ran:
      .regress price lot_size trees distance → regresses
        the first, dependent variable against the following
        independent variables
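
A minimal sketch of the regress syntax, assuming auto.dta (shipped with Stata) in place of the workshop file; price, mpg, weight, and foreign are variables of that dataset, not the workshop's lot_size, trees, and distance.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear

* Dependent variable first, independent variables after it
regress price mpg weight

* The same regression on a subsample, using an if qualifier
regress price mpg weight if foreign == 0

* Stored coefficients can be displayed by name; _cons is the constant
display _b[mpg]
display _b[_cons]
```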
                   Regression Tests & Fixes
   • Correlations and Covariance
        – Statistics → Summaries, tables, and tests → Summary and
          descriptive statistics → Correlations and covariances

Select variables

        – Command:
        .cor varlist → displays the correlation matrix of the listed
           variables (add the covariance option to show covariances instead)
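
A short sketch of the correlation command, assuming auto.dta (shipped with Stata) as sample data.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear

* Correlation matrix of three variables
correlate price mpg weight

* Same variables, reported as covariances
correlate price mpg weight, covariance
```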
           Regression Tests & Fixes
• Heteroskedasticity

  – Command:
     .imtest, white → White’s test for heteroskedasticity
       (typed as .estat imtest, white in newer versions of STATA)

  – Fix:
     .regress dependent independents, r → runs regression
       with robust standard errors
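
The test and the fix can be sketched together. This assumes auto.dta (shipped with Stata) and the newer estat form of the command; vce(robust) is the long form of the r option shown above.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear

* Fit the model, then run White's test on it
regress price mpg weight
estat imtest, white

* If heteroskedasticity is a concern, refit with robust standard errors
regress price mpg weight, vce(robust)
```

The coefficients are identical in both regressions; only the standard errors and the statistics built on them change.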
        Regression Tests & Fixes
• Looking at Residuals

  – Commands:
     .predict resid, residuals → creates residuals
     .plot resid dependent_variable → graphs residuals against the
       dependent variable
       Regression Tests & Fixes
• Normality

  – Skewness/Kurtosis Test:
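     .predict resid, residuals → creates residuals (without the residuals
       option, predict creates fitted values)
     .sktest resid → tests normality using skewness and kurtosis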
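
A sketch of the residual workflow from the last two slides, assuming auto.dta (shipped with Stata). scatter is swapped in for plot, which is the older text-mode graph command.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear
regress price mpg weight

* Residuals from the regression just run
predict resid, residuals

* Residuals against the dependent variable
scatter resid price

* Skewness/kurtosis test for normality of the residuals
sktest resid
```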
              Exercise #6
• Check to see if you have heteroskedasticity
                Data Manipulation
.generate → create new variable
  .generate variable_name = expression
  ex: .generate log_lotsize = log(lot_size)

.rename → rename a variable
  .rename old_name new_name
  ex: .rename trees shrubbery

.drop → delete variables or observations
.keep → keep only the variables or observations you list
.replace → change the values of an existing variable
.sort → sort observations in ascending order of a variable
.gsort → sort in ascending or descending order
.encode → string to numeric
.decode → numeric to string
.by → repeats a command separately for each group of a variable
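
The commands above can be sketched in one pass over sample data. It assumes auto.dta (shipped with Stata); the new variable names log_price, miles_per_gallon, origin_str, and origin_code are hypothetical.

```stata
* Assumption: auto.dta stands in for the workshop data
sysuse auto, clear

generate log_price = log(price)      // new variable from an expression
rename mpg miles_per_gallon          // old name, then new name
replace price = price / 1000         // change values of an existing variable
drop trunk                           // delete a variable

sort price                           // ascending order
gsort -price make                    // descending price, then ascending make

* Repeat a command for each group of foreign
by foreign, sort: summarize price

* Labeled numeric to string, then string back to labeled numeric
decode foreign, generate(origin_str)
encode origin_str, generate(origin_code)
```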
                 .ado Program Files
• Importing
   – Anyone can make them and post
   – Can download them into STATA: keep STATA updated!

• Create your own:
   – .doedit → opens the Do-file Editor, where you write and save a .do file
   – .do filename → runs the commands saved in a .do file
   – Each line is read as a command
   – Lines that begin with * are ignored and you can use them as notes
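
A minimal sketch of what a .do file might contain. The file name workshop.do and log name workshop_do_log are hypothetical, and auto.dta (shipped with Stata) is assumed as sample data.

```stata
* workshop.do (hypothetical file name)
* Lines that begin with * are notes and are ignored

capture log close
log using workshop_do_log, replace text

sysuse auto, clear
regress price mpg weight

log close
```

Saved as workshop.do in the current directory, the file would be run by typing do workshop in the Command Window.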
                      Help! (again)
• The commands:
   Help → Search
   .help command → command information
   .search keyword → searches all sources
   .search keyword, net → only searches the internet
   .findit keyword → searches unofficial sites as well

• The sources:
   (SJ) STATA Journal
   (STB) STATA Technical Bulletin
   STATA website
   Statalist – a listserv
   Internet
            Workshop Completed
If you would like more individual help,
  please see the Technical Statistical
            Coordinator 
