					      More on Regression
          Topics Covered

• Dummy Variables      AWZ 11.6.1




How do you include non-numeric x-variables
in the methodology?
 • Model Selection       AWZ 12.6




How do you choose the best model from
     a given set of x-variables?
             Dummy Variables
A way of including qualitative variables in
a regression/prediction model.
Q: How do we do it?
A: Create 0/1 variables that code whether
an individual is, or is not, in a given group.
These are called dummy variables.
                           Dummy Variables
Salary   Age    Sector    Gender   Female   Male   Finance   Services   Telco
55901     53   Services    Male      0       1        0         1         0
54318     54   Finance    Female     1       0        1         0         0
44177     54   Finance    Female     1       0        1         0         0
46987     40   Finance    Female     1       0        1         0         0
44200     49    Telco      Male      0       1        0         0         1
32394     50   Finance    Female     1       0        1         0         0
48867     39   Services   Female     1       0        0         1         0
49816     55    Telco      Male      0       1        0         0         1
52681     44   Services   Female     1       0        0         1         0
46792     50   Finance    Female     1       0        1         0         0
42648     40    Telco      Male      0       1        0         0         1
51519     48   Finance    Female     1       0        1         0         0
58628     40   Services   Female     1       0        0         1         0
74450     50   Services   Female     1       0        0         1         0
55491     55   Services   Female     1       0        0         1         0
45490     55   Services    Male      0       1        0         1         0
56835     44   Finance    Female     1       0        1         0         0
31646     37   Finance    Female     1       0        1         0         0
41800     40   Services   Female     1       0        0         1         0
44199     44   Finance    Female     1       0        1         0         0
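
The 0/1 columns in the table can be generated mechanically from the Sector and Gender columns. A minimal sketch in plain Python, using the first five rows of the table; the helper name make_dummies is hypothetical and not part of the course software.

```python
# First five rows of the salary table above: (salary, age, sector, gender).
rows = [
    (55901, 53, "Services", "Male"),
    (54318, 54, "Finance", "Female"),
    (44177, 54, "Finance", "Female"),
    (46987, 40, "Finance", "Female"),
    (44200, 49, "Telco", "Male"),
]


def make_dummies(values):
    """Return {level: [0/1, ...]} with one 0/1 column per distinct level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


sector_dummies = make_dummies([r[2] for r in rows])
gender_dummies = make_dummies([r[3] for r in rows])

# Print each dummy column next to its name.
for name, column in {**gender_dummies, **sector_dummies}.items():
    print(f"{name:<9}{column}")
```

Each row has exactly one 1 among the gender columns and exactly one 1 among the sector columns, which is why one column per group can be dropped later.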
         Don’t need all of them. Within each group of dummies
         (Female/Male; Finance/Services/Telco), one column is always redundant.
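
One way to see the redundancy: Female + Male always equals 1, which is the same as the intercept column, so the three columns carry only two columns' worth of information. A small sketch with NumPy (an assumed tool, not the course software), using the gender of the first five rows of the table:

```python
import numpy as np

# Gender of the first five rows of the table: Male, Female, Female, Female, Male.
female = np.array([0, 1, 1, 1, 0])
male = 1 - female                      # Male is fully determined by Female
intercept = np.ones_like(female)

X_all = np.column_stack([intercept, female, male])   # both dummies kept
X_drop = np.column_stack([intercept, male])          # Female dropped (baseline)

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)
print("columns:", X_all.shape[1], "rank:", rank_all)
print("columns:", X_drop.shape[1], "rank:", rank_drop)
```

The three-column design has rank 2, so its coefficients are not uniquely determined; dropping one dummy removes the problem without losing information.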
             Dummy Variables
How to use them in regression:
• Include all but one of your dummy variables.
• The one not included becomes baseline.
• Obtain a “slope” estimate for each included group.
• Each slope measures the mean difference between
that group and the (omitted) baseline group.
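
A quick check of this interpretation, sketched with NumPy least squares (an assumed tool) on the 20 rows of the salary table. The model contains only an intercept and the Male dummy; if other x-variables such as age were included, the slope would become a difference adjusted for those variables rather than a raw difference in means.

```python
import numpy as np

# Salary and Male dummy for the 20 rows of the table above.
salary = np.array([55901, 54318, 44177, 46987, 44200, 32394, 48867, 49816,
                   52681, 46792, 42648, 51519, 58628, 74450, 55491, 45490,
                   56835, 31646, 41800, 44199], dtype=float)
male = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
                 1, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)

# Design matrix: intercept plus the Male dummy (Female is the baseline).
X = np.column_stack([np.ones(len(salary)), male])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, slope = coef

mean_male = salary[male == 1].mean()
mean_female = salary[male == 0].mean()

print("intercept:", intercept, " mean female salary:", mean_female)
print("slope:", slope, " male mean minus female mean:", mean_male - mean_female)
```

The intercept reproduces the baseline (female) mean and the slope reproduces the difference in group means.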
Include dummy for males only →
Slope estimates mean salary of males minus
mean salary for females (baseline).
Include dummy for females only →
Slope estimates mean salary of females minus
mean salary for males (baseline).
Include dummies for telco/finance only →
Slope estimate for telco estimates mean
salary of telco managers minus mean
salary of Services managers.
Include dummies for telco/finance only →
Slope estimate for finance estimates mean
salary of finance managers minus mean
salary of Services managers.
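
The same idea with three groups can be sketched as follows, using NumPy least squares (an assumed tool) on the first nine rows of the salary table. Services is the omitted baseline, so each slope should equal that sector's mean salary minus the Services mean.

```python
import numpy as np

# First nine rows of the salary table.
salary = np.array([55901, 54318, 44177, 46987, 44200,
                   32394, 48867, 49816, 52681], dtype=float)
sector = np.array(["Services", "Finance", "Finance", "Finance", "Telco",
                   "Finance", "Services", "Telco", "Services"])

# Include dummies for Telco and Finance only; Services is the baseline.
telco = (sector == "Telco").astype(float)
finance = (sector == "Finance").astype(float)
X = np.column_stack([np.ones(len(salary)), telco, finance])

coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b_telco, b_finance = coef

# Group means, for comparison with the fitted coefficients.
mean = {s: salary[sector == s].mean() for s in ("Services", "Telco", "Finance")}

print("intercept:", b0, " Services mean:", mean["Services"])
print("telco slope:", b_telco, " Telco minus Services:", mean["Telco"] - mean["Services"])
print("finance slope:", b_finance, " Finance minus Services:", mean["Finance"] - mean["Services"])
```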
Example
273 managers within one industry
Response: salary in US$ per annum
Explanatory variables:
         – experience
         – Education level
         – Age
         – Gender
Example
273 managers within one industry
Possible issues of interest:
  – Benchmarking salaries for HR (trend estimate
  and identifying high/low salaries)
  – Gender/Age discrimination (testing whether
  there is an effect)
  – Value of education (measuring an effect of
  particular interest)
  – identifying exceptional individuals
Class exercise
273 managers within one industry

Allow quadratic trend in age and experience,
and simple linear model in education.
Remove unnecessary x-variables (P>0.2).
What is your assessment of the presence of
gender discrimination?
What is your assessment of the amount of age
discrimination?
Example
273 managers within one industry

Does each level of education add the
same amount ($3000) to salary?
Dummy variables can be used to describe
the effect of education in a completely
flexible manner.
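
To illustrate the difference between the two codings, here is a sketch on made-up numbers (not the managers data; the four education levels and the salaries are invented for illustration). A linear term forces every extra level to add the same amount, while dummies give each level its own step relative to the baseline level.

```python
import numpy as np

# Made-up illustration data (NOT the managers data set): education level 1-4
# and salary, built so that the step between levels is deliberately unequal.
level = np.array([1, 1, 2, 2, 3, 3, 4, 4])
salary = np.array([40000, 42000, 43000, 45000,
                   52000, 54000, 53000, 55000], dtype=float)
ones = np.ones(len(level))

# Linear coding: one slope, so every extra level adds the same amount.
X_lin = np.column_stack([ones, level])
coef_lin, *_ = np.linalg.lstsq(X_lin, salary, rcond=None)

# Dummy coding: level 1 is the baseline, one 0/1 column for each other level.
X_dum = np.column_stack([ones] + [(level == k).astype(float) for k in (2, 3, 4)])
coef_dum, *_ = np.linalg.lstsq(X_dum, salary, rcond=None)

print("same step per level (linear):", coef_lin[1])
print("level k minus level 1 (dummies):", coef_dum[1:])
```

Comparing the single linear slope with the dummy slopes shows whether "the same amount per level" is a reasonable description of the data.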

Class exercises (c)-(f)
       Automatic Model Selection
What is the problem?
Example. Predict salary from age, gender,
experience, education, job class (4 of them).
Allowing quadratic effects, there are 10
possible predictors.
Number of possible models: 2^10 = 1024 > 1000.
Which is the best model?
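
The count comes from the fact that each predictor is either in or out of the model. A short sketch that enumerates every subset; the predictor names x1 to x10 are placeholders, not the actual variable names.

```python
from itertools import combinations

predictors = [f"x{i}" for i in range(1, 11)]  # 10 placeholder predictor names

# Every model is a subset of the predictors, including the empty
# (intercept-only) model and the full model.
models = [subset
          for k in range(len(predictors) + 1)
          for subset in combinations(predictors, k)]

print(len(models), 2 ** len(predictors))
```

Each extra candidate predictor doubles the number of models, which is why checking them all by hand quickly becomes impractical.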
       Automatic Model Selection
Model criteria.
A. Smallest estimated σ or largest adjusted r²
B. All included variables significant at user
   chosen level. All excluded variables
   insignificant at user chosen level.

B allows some user flexibility.
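
The two quantities in criterion A can be computed as below. This is a sketch assuming the usual definitions, estimated σ = sqrt(SSE / (n - p - 1)) and adjusted r² = 1 - [SSE / (n - p - 1)] / [SST / (n - 1)], for a model with p x-variables plus an intercept. The function name fit_criteria is hypothetical; the data are the first nine rows of the salary table.

```python
import numpy as np


def fit_criteria(X, y):
    """Return (sigma_hat, adjusted_r2) for least squares of y on X plus an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ coef) ** 2)   # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    sigma_hat = np.sqrt(sse / (n - p - 1))
    adj_r2 = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
    return sigma_hat, adj_r2


# First nine rows of the salary table: salary, age and the Male dummy.
salary = np.array([55901, 54318, 44177, 46987, 44200,
                   32394, 48867, 49816, 52681], dtype=float)
age = np.array([53, 54, 54, 40, 49, 50, 39, 55, 44], dtype=float)
male = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0], dtype=float)
X = np.column_stack([age, male])

sigma_age, r2_age = fit_criteria(X[:, :1], salary)   # model with age only
sigma_both, r2_both = fit_criteria(X, salary)        # model with age and Male

print("age only:  sigma =", sigma_age, " adjusted r2 =", r2_age)
print("age, male: sigma =", sigma_both, " adjusted r2 =", r2_both)
```

Because SST / (n - 1) is the same for every model fitted to the same response, the model with the smallest estimated σ is also the one with the largest adjusted r², so the two versions of criterion A always agree.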
       Automatic Model Selection
Search Strategy (after criterion).
• Check all possible models. (Ideal)
• Start with all variables in. Remove one by
one if this “improves” the model. (backward)
• Start with all variables out. Include one by
one if they “improve” the model. (forward)
• Combine forward and backward. (stepwise)
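
The first three strategies can be sketched as below. Two assumptions to note: "improve" is taken to mean "raises adjusted r²" (criterion A), whereas a significance-based rule (criterion B) would use p-values instead; and the data are simulated purely for illustration. All function names are hypothetical, and the stepwise combination is not shown.

```python
from itertools import combinations

import numpy as np


def adj_r2(X, y, cols):
    """Adjusted r-squared of the model using the columns in `cols` plus an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - (sse / (n - len(cols) - 1)) / (sst / (n - 1))


def all_subsets(X, y):
    """Check all possible models and return the best set of columns."""
    p = X.shape[1]
    subsets = (c for k in range(p + 1) for c in combinations(range(p), k))
    return max(subsets, key=lambda c: adj_r2(X, y, c))


def backward(X, y):
    """Start with all variables in; remove one at a time while that improves the model."""
    cols = list(range(X.shape[1]))
    while cols:
        current = adj_r2(X, y, cols)
        # Best model obtainable by removing exactly one variable.
        candidate = max(([c for c in cols if c != j] for j in cols),
                        key=lambda c: adj_r2(X, y, c))
        if adj_r2(X, y, candidate) <= current:
            break
        cols = candidate
    return cols


def forward(X, y):
    """Start with all variables out; add one at a time while that improves the model."""
    cols, remaining = [], list(range(X.shape[1]))
    while remaining:
        current = adj_r2(X, y, cols)
        best = max(remaining, key=lambda j: adj_r2(X, y, cols + [j]))
        if adj_r2(X, y, cols + [best]) <= current:
            break
        cols.append(best)
        remaining.remove(best)
    return cols


# Simulated data (illustration only): 5 candidate x-variables, of which
# only columns 0 and 2 actually affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=60)

best = all_subsets(X, y)
back = backward(X, y)
fwd = forward(X, y)
print("all subsets:", sorted(best))
print("backward:   ", sorted(back))
print("forward:    ", sorted(fwd))
```

Checking all subsets can never do worse on the chosen criterion than the one-at-a-time searches, but its cost doubles with every extra candidate variable; the one-at-a-time searches are cheaper and can miss the best model.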
       Automatic Model Selection
StatproGo Methods.
• Backward, Forward or Stepwise
• User sets significance levels for inclusion
and/or exclusion. (Warning: careful with 0,1)
• Different methods can sometimes lead to
different models.
• You should check residuals, measures of fit
and logic/interpretation afterwards.
Example (ctd.)

  273 managers within one industry
  Response: annual salary in US$
       Exp(2)
       Age(2)
       Gender(1)
       Education level(7)
  There are 2^12 = 4096 > 4000 possible models!
  Which are the best?
Exercise
HK real estate data
Try to find the best model you can.
• dummies for beds(3) and baths(2)
• quadratics for age(2) and area(2)
• cars as linear (1) (Why???)
• Run Stepwise with P-to-leave=0.19, P-to-enter=0.20.
• Predict the price of an apartment which is 2000 ft², 7 years old, with
  2 beds, 2 baths and no car space.
• How would you get a better model?


                                Slidesdata\clearwater bay.xls
KEY TAKEAWAYS FROM CLASS

Dummy variables for
  – Qualitative input variables
  – Flexible modelling of input variable
Model selection
  – Large number of possible models
  – Partly automated in StatproGo
  – Still need to check residuals
  – Possible to miss best model

				