Stata Seminar Session 2: Syntax
Francisco Jose Gonzalez Carreras
fjg23@sussex.ac.uk
05/11/08
*Source: Kohler, U. and F. Kreuter (2005). Data Analysis Using Stata.

First: to read an Excel file

1. Download the file and save it.
2. Open the Excel file and save it as Text (tab delimited).
3. Open Stata and go to the directory where the new text file is. Remember cd f:\ (for example).
4. (In this case with data1) type set memory 5m (we are increasing the memory because the file is big and otherwise Stata would give us an error).
5. Type insheet using data1.txt, clear
6. Type save data1saved and it is already a .dta file.
7. Note that we lost the labels; it is better to download the data from the original source.

Elements of Stata Commands

How to use Stata commands. Elements can be:
    required
    permitted
    prohibited

Type help summarize

    summarize [varlist] [if] [in] [weight] [, options]

Command: summarize
[varlist] means variable list.
[if] stands for the if qualifier (if gender==1…). Qualifiers restrict the command to a particular subsample of the data.
[, options] modify the way the command is executed.

Start the session and load data1 (big brother and use):

    log using session2.log
    use data1, clear

Syntax: commands

Commands can be abbreviated. In the help, some letters of the command are underlined: these are the shortest possible abbreviation of the command. This means that summarize can be written as su, but also as sum, summ, summa…

Type su income and then sum income: the result is exactly the same.
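As a minimal sketch of how the elements of the syntax diagram combine, the commands below build up one summarize call piece by piece. It assumes the auto dataset that ships with Stata (loaded with sysuse) as a stand-in for data1, so the variable names price, mpg and foreign are not the session's own.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for data1
sysuse auto, clear

* command + varlist
summarize price mpg

* command + varlist + if qualifier + in qualifier + option
summarize price if foreign == 1 in 1/60, detail

* abbreviated and full command names run the same command
su price
summarize price
```

Only the command name is required here; the variable list, the qualifiers and the option are all permitted elements that can be left out.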
Syntax: Commands

Command      Recommended   Usage
describe     d             Describe data in memory
generate     gen           Create new variables
graph        graph         Graph data
help         h             Call online help
list         l             List data
regress      reg           Linear regression
summarize    sum           Descriptive statistics
save         save          Save data in memory
sort         sort          Sort data
tabulate     tab           Tables of frequencies
use          use           Load data into memory

Syntax: Variable list [varlist]

varlist means that you can use a variable list, which you enter by writing the names of the variables separated by spaces.
[varlist] in square brackets (as in summarize): the variable list is allowed but not required.
varlist without square brackets: the variable list is required; if you do not write it, the program reports an error.

Type help drop. Typing just drop gives an error. After that, type drop _all. All variables are dropped. We cannot reverse this command, so we have to load the file again:

    use data1, clear

Syntax: Variable list [varlist]

You can also see varname or depvar with no brackets. This is the case for variable lists that consist of one single variable. Sometimes it is one variable within a list of variables where the order is important.

Type help regress: you see depvar (for dependent variable) and then [indepvars], which you can enter or not.

Remember session 1:

    regress income sex fulltime

Syntax: Variable list [varlist]

Abbreviation rules

Single variables
    A variable name can be abbreviated as long as the abbreviation is unique: kitchen is the only variable that begins with k, so sum k summarizes kitchen.
    Use ~ to skip characters in the middle of a variable name: sum y~h summarizes ybirth and saves you typing birt.

Multiple variables
    Use ? for variables whose names are the same except for one character: sum np940?
    Use an asterisk * to specify variables that share characters in their names: sum np* summarizes all the variables that begin with np.
    Use a hyphen - to specify a range of variables, following their order in the dataset: sum kitchen-phone is the same as writing sum kitchen shower wc heating cellar balcony garden phone

Let's mix resources: sum r~s np* k-ph summarizes a lot of variables!

Syntax: Options [, options]

Commands have a default execution and options modify it. Options are different for each command and are allowed wherever you see the word options after a comma in the syntax diagram. The options of each command are described further down in its help file.

We did this in the first session. Type:

    summarize income, detail

See also the syntax of tabulate. Type:

    tabulate gender np9506, missing row

Qualifiers: [in] (order)

The in qualifier limits the execution of the command to a subset of observations, selected by their order in the dataset.

It is composed of the word in and a range of observations separated by a slash (/). The first observation for which the command will be executed goes before the slash and the last one goes after it: FIRST/LAST. If the range is a single observation, its number alone is enough.

Remember that it can be very important to sort the data first, to make sure that you execute the command for the observations that you want.

Examples:

    list persnr gender ybirth in 10        (only the tenth observation)
    list persnr gender ybirth in 10/14     (tenth to fourteenth)
    list persnr gender ybirth in -5/-1     (fifth from the last to the last)
    list persnr gender ybirth in -5/l      (the same; l is the letter l, for last)
    list persnr gender ybirth in 3330/-5   (3330th to fifth from the last)

Qualifiers: [if] (condition)

The if qualifier restricts the execution of the command to those observations that meet a particular condition, which has to follow the qualifier.
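A short sketch of a condition combined with a range of observations, assuming the auto dataset that ships with Stata as a stand-in for data1 (make, price and foreign are that dataset's variables, not the session's):

```stata
* Assumption: auto.dta (shipped with Stata) stands in for data1
sysuse auto, clear

* sort first so that "in" picks the observations you intend
sort price

* the five cheapest cars
list make price foreign in 1/5

* among those five, only the ones that meet the condition
list make price foreign in 1/5 if foreign == 1, nolabel
```

The in range is applied to the sorted order, and the if condition then filters inside that range, so the second list can show fewer than five rows.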
We did sum income if gender==1. See how if works:

    list income gender in 1/5, nolabel
    list income gender in 1/5 if gender==1, nolabel

Try these others (~= means not equal to):

    sum income if ybirth < 1979
    sum income if ybirth <= 1979
    sum income if ybirth ~= 1979

Careful with the infinite mistake: the missing trap!

    tab edu, missing nolabel
    sum ybirth if edu>=6

See that 28 missings are added: Stata treats a missing value (.) as larger than any number, so the condition edu>=6 also selects the observations where edu is missing.

Qualifiers: [if]

Relational operators

    ==    Equals
    >     Higher than
    <     Lower than
    >=    Higher than or equal to
    <=    Lower than or equal to
    &     Links two conditions: this AND that
    |     Or: comply either with this OR with that

Practice:

    sum ybirth if edu==6 | edu==7
    sum ybirth if edu>=6 & edu<=7
    sum ybirth if edu>=6 & edu<.

Expressions, operators

Expressions are allowed or required when the term exp appears in the syntax diagram.

Type help generate. This command needs an expression after the command name.

Stata calculator: display. Type:

    display 2+2

Operators: *, +, /, ^

Type:

    display 3==2
    display 2==3 | 2==2

Lists of numbers

    Number list                  Expands to
    1,2,3,4                      1,2,3,4
    1 2 3 4                      1,2,3,4
    1/4                          1,2,3,4
    2 4 to 8                     2,4,6,8
    8 6 to 2                     8,6,4,2
    2 4: 8                       2,4,6,8
    8 6: 2                       8,6,4,2
    2(2)8                        2,4,6,8
    8(-2)2                       8,6,4,2
    8/10 15 to 30 32 to 36       8,9,10,15,20,25,30,32,34,36
    8/10(5)30(2)36               8,9,10,15,20,25,30,32,34,36

Using filenames

In Stata some commands read or write a file. In the syntax diagram this is expressed with using filename.

Normally a filename consists of a directory, the name of the file itself, and the extension: F:\data1.dta

If you type only a name, Stata looks for the file in the current directory (shown in the bottom left-hand corner). If the file is not there, Stata reports an error.
If you type a filename without an extension, Stata looks for a file with the extension that is appropriate for the specified command. Extensions and commands are in the table:

    Extension   Commands
    .dta        use; save; post; append; merge; joinby; describe
    .smcl       log
    .txt        cmdlog
    .do         do; run
    .gph        graph using; graph, saving()

Repeating similar commands: by prefix

Sometimes you will need to type similar commands over and over again. There are two main options to do this: the by prefix and loops.

by prefix: already known; remember that it executes the command separately for each group defined by the variables in the prefix. Type:

    sort gender
    by gender: sum income
    by edu, sort: summarize income

Play with the by prefix:

    bysort edu: summarize income
    by gender edu, sort: sum income     (sorting and by in only one command, adding one variable)

Repeating similar commands: foreach or forvalues loops

DO NOT COPY-PASTE THE LOOPS: errors will appear because of the formatting (for example, typographic quotes instead of ` and ').

Syntax:

    foreach lname listtype list {
        commands referring to lname
    }

The first line starts the loop and ends with a {. It contains the element name (lname), the list type (listtype), and the foreach list (list).
Then add the Stata commands.
Close the loop with a }.

That is, you state the name of the element and the list of items you want to execute the commands on, then open the brace, then write the command(s), and close with the }.

This example uses a list of variables:

    foreach x of varlist np9501-np9504 {
        tabulate `x' gender
    }

Repeating similar commands: foreach

Other examples of foreach:

    foreach var of newlist r1-r10 {
        gen `var'=uniform()
    }

List of new variables. uniform() creates uniformly distributed random numbers.

    foreach num of numlist 1/10 {
        replace r`num'=uniform()
    }

List of numbers.
We use replace because the variables already exist.

Practice: let's label all the variables r1-r10 with a common label: 1st uniform variable, 2nd uniform variable… Type the first, second and third by hand, and do the rest with a loop.

Repeating similar commands: forvalues

It has a simplified syntax:

    forvalues lname=range {
        commands
    }

    forvalues num=1/10 {
        replace r`num'=uniform()
    }

Practice: replace the labels of the variables with a forvalues loop instead.

Repositories

Stata saves the results of statistical commands in r() and of estimation commands in e(). They are called repositories.

Statistical: summarize. Type summarize income and then return list, and you will see the contents of the last r-class command.
Estimation: regress. Type regress income yedu and then ereturn list.

You can operate with them. Type:

    sum income
    display r(mean) + 1.96*sqrt(r(Var)/r(N))
    display r(mean) - 1.96*sqrt(r(Var)/r(N))

Stored results are replaced when a new command stores its own results. Some commands store results in matrices, but we will not see them.

Unplug big brother

    log close
    doedit

Copy the commands in the Review window and paste them into the do-file editor. Save as session2.

    clear
    exit
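Loops and stored results can be combined. The sketch below runs the confidence-interval calculation from the Repositories slide for several variables in one foreach loop. It assumes the auto dataset that ships with Stata as a stand-in for data1; the variable names price, mpg and weight and the local macro names lower and upper are illustrative only.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for data1
sysuse auto, clear

foreach x of varlist price mpg weight {
    * summarize leaves r(mean), r(Var) and r(N) behind
    quietly summarize `x'

    * lower and upper are hypothetical macro names chosen for this sketch
    local lower = r(mean) - 1.96*sqrt(r(Var)/r(N))
    local upper = r(mean) + 1.96*sqrt(r(Var)/r(N))

    display "`x': mean = " r(mean) "  95% CI = [" `lower' ", " `upper' "]"
}
```

Copying the r() values into local macros right after summarize protects them from being replaced by a later command inside the loop.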

