CONSUMER BEHAVIOR MODELING by Y51jP30


Data Analysis: Review and Practical Application Using SPSS
Data of Interest
• National Insurance Company
  – 1000 questionnaires sent
  – 285 respondents
• Questionnaire presentation
  – Copy given in class
Coding
• Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis
• Steps
  – Transforming responses to each question into a set of meaningful categories
  – Assigning numerical codes to the categories
  – Creating a data set suitable for computer analysis
Transforming Responses into Meaningful Categories
• A structured question is pre-categorized
• Responses to a nonstructured or open-ended question must be grouped into a meaningful and manageable set of categories
Q1: How many questions in this questionnaire are not pre-categorized?
Missing-Value Category
• A missing value can stem from
  – A respondent's refusal to answer a question
  – An interviewer's failure to ask a question or record an answer
  – A "don't know" response that does not seem legitimate
• The best way to handle missing-value responses is to prevent them through
  – Sound questionnaire design
  – Tight control over fieldwork
Assigning Numerical Codes
• Assign appropriate numerical codes to responses that are not already in quantified form
• Numerical codes should be assigned so as to facilitate computer manipulation and analysis of the responses
Multiple Response Question – Rank Order Question
   Please rank the following insurance companies by placing a 1 beside the company you think is best overall, a 2 beside the company you think is second best, and so on.
     __________Progressive
     __________Allstate
     __________National

Q2: How would you code the previous question if it were added to the questionnaire?
This question requires as many variables (and columns) as there are objects to be ranked: 3 separate variables are needed.
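Here is a minimal sketch of coding the rank-order question as three separate variables, one per company. It assumes Python and uses hypothetical variable names (`rank_progressive`, `rank_allstate`, `rank_national`). It also checks that each respondent's answers form a valid ranking.

```python
# Hypothetical variable names: one variable (column) per company being ranked.
COMPANIES = ["progressive", "allstate", "national"]
VARIABLES = [f"rank_{c}" for c in COMPANIES]

def code_ranking(ranking):
    """Turn a respondent's ranking (best first) into one rank code per company.

    `ranking` lists company names from best to worst,
    e.g. ["national", "progressive", "allstate"].
    """
    if sorted(ranking) != sorted(COMPANIES):
        raise ValueError("each company must be ranked exactly once")
    # Rank 1 = best overall, 2 = second best, and so on.
    record = {f"rank_{company}": position
              for position, company in enumerate(ranking, start=1)}
    # Return the codes in a fixed column order so every record lines up.
    return [record[v] for v in VARIABLES]

print(VARIABLES)  # ['rank_progressive', 'rank_allstate', 'rank_national']
print(code_ranking(["national", "progressive", "allstate"]))  # [2, 3, 1]
```

A respondent who ranks National first therefore gets a 1 in the National column. The other two columns hold that respondent's ranks for Progressive and Allstate.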
Creating a Data Set
• A data set is an organized collection of data records
• Each sample unit within the data set is called a case or observation
• Structure of a data set
  – The number of observations is n
  – The total number of variables embedded in the questionnaire is m
  – The data set is then an n × m matrix of numbers
• Importance of the coding sheet: with it, anybody can enter or check the data set (copy of coding sheet)
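The n × m structure and the role of the coding sheet can be sketched in Python as follows. The coding sheet, its variable names and the three cases are hypothetical. The 10-point overall-quality item is assumed to be coded 1 to 10.

```python
# A hypothetical coding sheet: variable name -> allowed codes and their labels.
CODING_SHEET = {
    "gender": {1: "Male", 2: "Female"},
    "recommend": {1: "Yes", 2: "No"},
    "oq": {k: str(k) for k in range(1, 11)},  # overall quality, assumed coded 1-10
}
VARIABLES = list(CODING_SHEET)

# Each row is one case (observation); each column is one variable.
data = [
    [1, 1, 8],
    [2, 2, 5],
    [2, 1, 9],
]

def shape(rows):
    """Return (n, m): number of observations and number of variables."""
    return len(rows), (len(rows[0]) if rows else 0)

def invalid_entries(rows):
    """List (case, variable, value) for every code not on the coding sheet."""
    problems = []
    for case, row in enumerate(rows, start=1):
        for var, value in zip(VARIABLES, row):
            if value not in CODING_SHEET[var]:
                problems.append((case, var, value))
    return problems

print(shape(data))            # (3, 3)
print(invalid_entries(data))  # []
```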
SPSS Data Set
• 2 views: Variable View and Data View
• Raw variables (labels and values)
• Transformed variables (Compute and Recode)
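This is a rough Python analogue of the two SPSS transformations, not SPSS syntax. "Compute" builds a new variable from existing ones. "Recode" collapses a variable into new categories. The item names (`rel1` to `rel3`), the age variable and the cut points are hypothetical.

```python
from statistics import mean

# Hypothetical raw variables for one respondent: three reliability items and age.
respondent = {"rel1": 6, "rel2": 7, "rel3": 5, "age": 34}

def compute_average(record, items):
    """Compute-style transformation: a new variable built from existing ones."""
    return mean(record[item] for item in items)

def recode_age(age):
    """Recode-style transformation: collapse age into categories.

    The cut points are illustrative only.
    """
    if age < 30:
        return 1  # under 30
    if age < 50:
        return 2  # 30-49
    return 3      # 50 and over

respondent["reliability"] = compute_average(respondent, ["rel1", "rel2", "rel3"])
respondent["age_group"] = recode_age(respondent["age"])
print(respondent["reliability"], respondent["age_group"])  # 6 2
```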
Preliminary Data Analysis: Basic Descriptive Statistics
• Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set
• The measurement level of each variable dictates which statistics are appropriate
• It gives the researcher a feel for the data
• What can we compute for each variable? The limitations by measurement level are on the next slide.
  Run Descriptives (Output 1).
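The sketch below makes the statistics depend on measurement level. It follows the common textbook convention: nominal variables support frequencies and the mode, ordinal variables add the median and range, and interval/ratio ("metric") variables add the mean and standard deviation. The level names and the OQ ratings are hypothetical.

```python
from collections import Counter
from statistics import mean, median, median_low, stdev

def describe(values, level):
    """Descriptive statistics permitted by a variable's measurement level.

    level: "nominal", "ordinal", or "metric" (interval or ratio).
    """
    summary = {"n": len(values), "mode": Counter(values).most_common(1)[0][0]}
    if level in ("ordinal", "metric"):
        # For ordinal data, median_low always returns an observed category.
        summary["median"] = median(values) if level == "metric" else median_low(values)
        summary["range"] = (min(values), max(values))
    if level == "metric":
        summary["mean"] = mean(values)
        summary["std_dev"] = stdev(values)  # sample standard deviation (n - 1)
    return summary

oq_ratings = [7, 8, 8, 9, 6, 8, 10, 7]  # hypothetical overall-quality ratings
print(describe(oq_ratings, "metric"))
```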
Measures of Central Tendency and Dispersion for Different Types of Variables
Why Averages May Be Misleading
• Researchers tested a new sauce product and found that
  – The mean rating in the taste test was close to the middle of the scale, which had "very mild" and "very hot" as its bipolar adjectives
• Researcher’s conclusion
  – Consumers want a sauce that is neither really hot nor really mild
Why Averages May Be Misleading (Cont’d)
• Deeper examination revealed
  – A large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot
• Moral of the story
  – A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences. Examine skewness and kurtosis, not just the mean.
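The sauce story can be illustrated with hypothetical ratings on a 1 to 7 scale, where 1 = "very mild" and 7 = "very hot". The mean lands exactly in the middle, yet nobody chose a middle rating. The skewness and kurtosis functions are the simple moment-based versions. SPSS reports bias-adjusted sample versions, so its values would differ somewhat.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical ratings: half the tasters want mild, half want hot.
ratings = [1, 2, 1, 2, 1, 7, 6, 7, 6, 7]

def skewness(xs):
    """Moment-based skewness: the third standardized moment."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: fourth standardized moment minus 3."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3

print(mean(ratings))                      # 4: looks like "neither mild nor hot"
print(sorted(Counter(ratings).items()))   # [(1, 3), (2, 2), (6, 2), (7, 3)]
print(round(skewness(ratings), 3), round(excess_kurtosis(ratings), 3))  # 0.0 -1.878
```

A symmetric distribution gives zero skewness, so skewness alone hides the split. The strongly negative excess kurtosis is the warning sign: it indicates a flat or two-humped distribution.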
Crosstabs: Frequencies of Occurrence Under Specific Conditions
• Used mostly with categorical variables

• Examples to run
Cross-Tabulations – Comparing Frequencies: Chi-square Contingency Test
• Technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables
Cross-Tabulation Using SPSS for National Insurance Company
• One crucial issue in the customer survey of National Insurance Company was how a customer's education was associated with whether or not she or he would recommend National to a friend.
Need to Conduct a Chi-square Test to Reach a Conclusion
• The hypotheses are:
  – H0: There is no association between educational level and willingness to recommend National to a friend (the two variables are independent of each other).
  – Ha: There is some association between educational level and willingness to recommend National to a friend (the two variables are not independent of each other).
  – Let’s do it…
Conducting the Test
• The test involves comparing the actual, or observed, cell frequencies (Oij) in the cross-tabulation with a corresponding set of expected cell frequencies (Eij)
Expected Values

$$E_{ij} = \frac{n_i \, n_j}{n}$$

where n_i and n_j are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively, and n is the total sample size
Chi-square Test Statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where O_ij and E_ij are the observed and expected frequencies in cell (i, j), and r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1).
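Here is a minimal Python sketch of the test as described: expected counts, the chi-square statistic, degrees of freedom, and an upper-tail p-value. The 2 × 2 table is hypothetical, not the National Insurance data. The p-value uses the series expansion of the regularized incomplete gamma function. That is adequate for moderate statistics but loses precision for very small p-values.

```python
from math import exp, lgamma, log

def expected_counts(table):
    """E_ij = n_i * n_j / n for every cell of an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[ri * cj / n for cj in col_totals] for ri in row_totals]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom (r - 1)(c - 1))."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """P(chi-square with df degrees of freedom >= stat).

    Computed as 1 - P(df/2, stat/2), where P is the regularized lower
    incomplete gamma function evaluated by its power series.
    """
    if stat <= 0:
        return 1.0
    a, z = df / 2, stat / 2
    term = total = 1 / a
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * exp(-z + a * log(z) - lgamma(a))
    return max(0.0, 1.0 - lower)

observed = [[10, 20], [20, 10]]  # hypothetical 2 x 2 cross-tabulation
stat, df = chi_square(observed)
print(round(stat, 3), df)        # 6.667 1
print(chi_square_p_value(stat, df))
```

In practice, `scipy.stats.chi2_contingency` runs the same test. For 2 × 2 tables it applies Yates' continuity correction by default, so its statistic will differ from this uncorrected version.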
National Insurance Company Study
[SPSS chi-square output; callouts mark the computed chi-square value and the p-value]
National Insurance Company Study – P-Value Significance
• The actual significance level (p-value) = 0.019
• The chance of getting a chi-square value as high as 10.007 when there is no relationship between education and recommendation is about 19 in 1000.
• The apparent relationship between education and recommendation revealed by the sample data is therefore unlikely to have occurred by chance.
• We can reject the null hypothesis at the .05 significance level (though not at the .01 level).
Precautions in Interpreting Cross-Tabulation Results
• Two-way tables cannot provide conclusive evidence of a causal relationship
• Watch out for small cell sizes
• The risk of drawing erroneous inferences increases when more than two variables are involved
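One way to act on "watch out for small cell sizes" is the common rule of thumb for the chi-square test: no more than 20% of cells should have an expected count below 5, and none below 1. SPSS reports the share of such cells in a footnote under its chi-square table. The sketch below applies that rule to a hypothetical table. The 20% and 1 thresholds are the rule of thumb, not a strict law.

```python
def small_expected_cells(table, threshold=5.0):
    """Return (share of cells with expected count < threshold, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [ri * cj / n for ri in row_totals for cj in col_totals]
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected), min(expected)

share, smallest = small_expected_cells([[12, 3], [9, 2]])  # hypothetical counts
if share > 0.20 or smallest < 1:
    print(f"{share:.0%} of cells have expected count < 5; chi-square may be unreliable")
```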
Overview of Techniques for Examining Associations
• Spearman Correlation Coefficient Technique
• The technique is appropriate when
  – The degree of association between two sets of ranks (pertaining to two variables) is to be examined
• Illustrative research question(s) this technique can answer:
  – Is there a significant relationship between motivation levels of salespeople and the quality of their performance?
    Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable
Overview of Techniques for Examining Associations (Cont’d)
• Pearson Correlation Coefficient Technique
• This technique is appropriate when
  – The degree of association between two metric-scaled (interval or ratio) variables is to be examined
• Illustrative research question(s) this technique can answer:
  – Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?
Spearman Correlation Coefficient
A Spearman correlation coefficient is a measure of association between two sets of ranks

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

d_i = the difference between the ith sample unit's ranks on the two variables
n = the total sample size
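The Spearman formula translates directly into Python. The supervisor ranks below are hypothetical. The formula is exact only when there are no tied ranks. With ties, a Pearson correlation computed on the ranks (which is what `scipy.stats.spearmanr` does) is the usual fallback.

```python
def spearman(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), assuming no tied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rank lists with n >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical supervisor ranks for five salespeople.
motivation = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]
print(round(spearman(motivation, performance), 3))  # 0.8
```

Here every d_i is 1 or 0, so the sum of d_i² is 4. The result is 1 - 24/120 = 0.8.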
Pearson Correlation Coefficient
The Pearson correlation coefficient is the degree of association between variables that are interval- or ratio-scaled. For two such variables X and Y, the Pearson correlation coefficient (r_xy) between them is given by

$$r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y}$$

n = sample size (total number of data points)
X̄ and Ȳ = means of X and Y
X_i and Y_i = values for any sample unit i
s_x and s_y = standard deviations of X and Y
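Here is a minimal Python version of the Pearson formula, using sample standard deviations (divisor n - 1) to match the (n - 1) in the denominator. The age and image ratings are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """r_xy = sum((X_i - mean_X)(Y_i - mean_Y)) / ((n - 1) s_x s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cross / ((n - 1) * s_x * s_y)

age = [25, 32, 41, 50, 63]  # hypothetical ages in years
image = [3, 4, 4, 6, 7]     # hypothetical image ratings on a 1-7 scale
print(round(pearson(age, image), 4))
```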
National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs
• National Insurance Company was interested in the correlations between respondents’ overall service-quality perceptions (on the 10-point scale) and their average ratings along each of the five dimensions of service quality
National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs Using SPSS
Interpreting Pearson Correlation Coefficients
• Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to overall quality (OQ) at the .001 level of significance
• Responsiveness has the strongest correlation (.8625)
• Tangibles have the weakest correlation (.5038)
• All the correlations are strong enough to be meaningful
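Comparing Means
• Mainly t-tests and ANOVA
• Example: t-test of OQ by gender
Independent t-Tests
• The independent (grouping) variable has at most 2 categories
• Check the equality of variances (Levene's test; see the output)
• p = .88 for the observed difference of .04: if there were no true difference between the groups, a difference at least this large would occur about 88% of the time. We cannot reject the null hypothesis.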
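SPSS's independent-samples t-test output has two rows, "Equal variances assumed" and "Equal variances not assumed". Levene's test decides which one to read. The sketch below computes the t statistic and degrees of freedom for both: the pooled (Student) version and the Welch version. The OQ scores by gender are hypothetical. P-values are left out because they need the t distribution, for example `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Return (t_pooled, df_pooled, t_welch, df_welch) for two independent groups."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1)
    diff = mean(a) - mean(b)
    # Equal variances assumed: pooled-variance Student t.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch t with Welch-Satterthwaite df.
    se1, se2 = v1 / n1, v2 / n2
    t_welch = diff / sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_pooled, n1 + n2 - 2, t_welch, df_welch

oq_men = [7, 8, 6, 9, 8, 7]      # hypothetical OQ scores
oq_women = [8, 7, 7, 9, 8, 6]
print(independent_t(oq_men, oq_women))
```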
Analysis of Variance
• ANOVA is appropriate in situations where the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels
Example
• 24 stores chosen randomly for the study
• 8 stores randomly chosen for each treatment
  – Treatment 1: store brand sold at the regular price
  – Treatment 2: store brand sold at 50¢ off the regular price
  – Treatment 3: store brand sold at 75¢ off the regular price
• Monitor sales of the store brand for a week in each store
Table 15.2 Unit Sales Data Under Three Pricing Treatments

Treatment                 Regular price   50¢ off   75¢ off
Unit sales in each store  37              46        46
                          38              43        49
                          40              43        48
                          40              45        48
                          38              45        47
                          38              43        48
                          40              44        49
                          39              44        49
Number of stores          8               8         8
Mean sales                38.75           44.13     48.00
ANOVA – Grocery Store Hypotheses
• Grocery store example (μj = mean unit sales under treatment j)
  – H0: μ1 = μ2 = μ3
  – Ha: At least one μ is different from one or more of the others
• Hypotheses for k treatment groups or samples
  – H0: μ1 = μ2 = … = μk
  – Ha: At least one μ is different from one or more of the others
Exhibit 15.1 SPSS Computer Output for ANOVA Analysis

            Between-Subjects Factors

                          Value Label     N
Treatment group   1       Regular price   8
                  2       50 cents off    8
                  3       75 cents off    8
Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont’d)

            Tests of Between-Subjects Effects

Dependent Variable: SALES
Source           Type III Sum of Squares   df   Mean Square          F     Sig.
Corrected Model                 345.250a    2       172.625    137.445     .000
Intercept                     45675.375     1     45675.375  36367.123     .000
TREAT                           345.250     2       172.625    137.445     .000
Error                            26.375    21         1.256
Total                         46047.000    24
Corrected Total                 371.625    23
a. R Squared = .929 (Adjusted R Squared = .922)

There is less than a .001 probability of obtaining an F-value as high as 137.445 if the three treatment means are equal.
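A plain one-way ANOVA on the Table 15.2 sales data reproduces the sums of squares, degrees of freedom and F-value in Exhibit 15.1. The function is a minimal sketch and does not compute the p-value; `scipy.stats.f_oneway` would give both F and p.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f

# Unit sales from Table 15.2, one list per pricing treatment.
regular = [37, 38, 40, 40, 38, 38, 40, 39]
off_50 = [46, 43, 43, 45, 45, 43, 44, 44]
off_75 = [46, 49, 48, 48, 47, 48, 49, 49]

ssb, ssw, dfb, dfw, f = one_way_anova([regular, off_50, off_75])
print(round(ssb, 3), round(ssw, 3), dfb, dfw, round(f, 3))  # 345.25 26.375 2 21 137.445
```

SS_between corresponds to the TREAT row and SS_within to the Error row of the SPSS output. R Squared is 345.25 / 371.625 ≈ .929.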
ANOVA
• Examples to run: OQ by recommendation; OQ by other individual variables
• OQ by EDUC (with a graph) and post hoc tests
Overview of Techniques for Examining Associations (Cont’d)
• Simple Regression Analysis Technique
• This technique is appropriate when
  – A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that the values of one of the two variables are dependent on the values of the other
Overview of Techniques for Examining Associations – Simple Regression Analysis (Cont’d)
• Illustrative research question(s) this technique can answer:
  – Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)?
  – What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?
Overview of Techniques for Examining Associations (Cont’d)
• Multiple Regression Analysis Technique
• This technique is appropriate under the same conditions as simple regression analysis, except that more than two variables are involved, one of which is assumed to be dependent on the others
Overview of Techniques for Examining Associations (Cont’d)
• Illustrative research question(s) this technique can answer:
  – Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)?
  – What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?
Simple Regression Analysis
• Generates a mathematical relationship (called the regression equation) between one variable designated as the dependent variable (Y) and another designated as the independent variable (X)
Independent Variable vs. Dependent Variable
• Independent variable
  – Explanatory or predictor variable
  – Often presumed to be a cause of the other
• Dependent variable
  – Criterion variable
  – Influenced by the independent variable
Practical Applications of Regression Equations
• The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable
• The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable
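Here is a minimal least-squares sketch of both uses: the slope as a sensitivity measure and the equation as a forecasting tool. The advertising and sales figures are hypothetical. The forecast point (16) is deliberately kept inside the observed range of advertising values, in line with the precautions on the next slide.

```python
def simple_regression(xs, ys):
    """Least-squares fit of Y = a + bX. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in Y per unit change in X
    a = mean_y - b * mean_x    # intercept
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1 - ss_resid / ss_total

# Hypothetical advertising (X) and sales (Y), both in thousands of dollars.
advertising = [10, 12, 15, 18, 20]
sales = [105, 118, 134, 152, 161]
a, b, r2 = simple_regression(advertising, sales)
forecast = a + b * 16  # 16 lies inside the observed range 10-20
print(round(a, 2), round(b, 2), round(r2, 3), round(forecast, 1))
```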
Precautions in Using Regression Analysis
• Only capable of capturing linear associations between dependent and independent variables
• A significant R² value does not necessarily imply a cause-and-effect association between the independent and dependent variables
• A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation
Precautions in Using Regression Analysis (Cont’d)
• A regression equation based on relatively few data points cannot be trusted
• The ranges of data on the dependent and independent variables can affect the meaningfulness of a regression equation
Multiple Regression Analysis

$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$

• Y_i is the predicted value of the dependent variable for some unit i;
• X_1i, X_2i, …, X_ki are values on the independent variables for unit i;
• b_1, b_2, …, b_k are the regression coefficients;
• a is the Y-intercept, representing the prediction for Y when all independent variables are set to zero
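The coefficients a, b_1, …, b_k can be estimated by least squares. This sketch solves the normal equations (X'X)β = X'y by Gauss-Jordan elimination in plain Python. The advertising, price and sales figures are hypothetical. In practice, `numpy.linalg.lstsq` or SPSS's Linear Regression procedure would be used instead, since they are more numerically robust.

```python
def multiple_regression(rows, ys):
    """Least-squares coefficients [a, b1, ..., bk] for Y = a + b1 X1 + ... + bk Xk.

    rows: one list of k predictor values per unit i.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1 estimates the intercept a
    p = len(X[0])
    # Augmented matrix [X'X | X'y] for the normal equations.
    A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
         + [sum(X[i][r] * ys[i] for i in range(len(X)))] for r in range(p)]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Hypothetical data: (advertising, price) per period, and sales.
predictors = [[10, 5.0], [12, 4.8], [15, 5.2], [18, 4.5], [20, 4.9], [22, 4.6]]
sales = [105, 118, 128, 160, 162, 178]
a, b_adv, b_price = multiple_regression(predictors, sales)
print(round(a, 2), round(b_adv, 2), round(b_price, 2))
```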
National Insurance Company – Multiple Regression Using SPSS
• Jill and Tom were interested in conducting a multiple regression analysis wherein overall service-quality perception is the dependent variable and the average ratings along the five dimensions are the independent variables
Factor Analysis
• A data and variable reduction technique that attempts to partition a given set of variables into groups of maximally correlated variables
Factor Analysis Output and Its Interpretation
• The primary output of factor analysis is a factor-loading matrix
Table 15.4 Factor-Loading Matrix Based on Data from Study of Star Customers
(F1, F2 = factor loadings; Communality = achieved communality)

Var   F1     F2     Communality  Statement
X4    0.96   0.06   .926         My friends are very impressed with the Star VCR
X6    0.92   0.17   .875         No other brand of VCR even comes close to matching the Star
X1    0.89   0.15   .815         I did not mind paying the high price for my Star VCR
      (3 variables load high on factor 1)
X3    0.18   0.94   .916         I hardly ever worry about anything going wrong with my Star VCR
X5    0.09   0.88   .782         The Star VCR has the latest technology built into it
X2    0.16   0.86   .766         I am pleased with the variety of things that a Star VCR can do
      (3 variables load high on factor 2)

Eigenvalues (standardized variance explained by each factor): F1 = 2.626, F2 = 2.454
Proportion of the total variance explained by each factor: F1 = 0.438, F2 = 0.409
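The summary rows of Table 15.4 can be rebuilt from the loadings. A variable's communality is the sum of its squared loadings across factors. A factor's eigenvalue is the sum of squared loadings down its column. With six standardized variables, the total variance is 6. Because the published loadings are rounded to two decimals, some recomputed values differ from the table in the third decimal (e.g., X4 gives .925 rather than .926).

```python
# Factor loadings (F1, F2) from Table 15.4.
loadings = {
    "X4": (0.96, 0.06),
    "X6": (0.92, 0.17),
    "X1": (0.89, 0.15),
    "X3": (0.18, 0.94),
    "X5": (0.09, 0.88),
    "X2": (0.16, 0.86),
}
n_factors = 2

# Communality: sum of a variable's squared loadings across factors.
communalities = {v: sum(x ** 2 for x in ls) for v, ls in loadings.items()}

# Eigenvalue: sum of squared loadings down each factor's column.
eigenvalues = [sum(ls[f] ** 2 for ls in loadings.values()) for f in range(n_factors)]

# Standardized variables: total variance equals the number of variables.
proportions = [ev / len(loadings) for ev in eigenvalues]

# Group each variable with the factor on which it loads highest.
groups = {}
for v, ls in loadings.items():
    best = max(range(n_factors), key=lambda f: ls[f]) + 1
    groups.setdefault(best, []).append(v)

print([round(e, 3) for e in eigenvalues], [round(p, 3) for p in proportions])
print(groups)  # {1: ['X4', 'X6', 'X1'], 2: ['X3', 'X5', 'X2']}
```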
Reducing Star Data
• X1, X4, and X6 can be combined into one factor
• X2, X3, and X5 can be combined into a second factor
• The 6 variables can thus be reduced to two factors
Potential Applications of Factor Analysis
• Used to
  – Develop concise but comprehensive multiple-item scales for measuring various marketing constructs
  – Illuminate the nature of distinct dimensions underlying an existing data set
  – Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors
Cluster Analysis
• Segments objects into groups so that members within each group are similar to one another in a variety of ways
• Useful for segmenting customers, market areas, and products
Use of Cluster Analysis
• A firm offering recreational services wanted to enter a new region of the country
• It gathered data on more than 100 characteristics, including
  – Demographics
  – Expenditures on recreation
  – Leisure-time activities
  – Interests of household members
• The firm identified one or several household segments that are likely to be most responsive to its advertising and to its services
How Does Cluster Analysis Work?
• Cluster analysis measures the similarity between objects on the basis of their values on the various characteristics
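The slides do not name a specific clustering algorithm. The sketch below uses k-means, one common choice, with Euclidean distance as the similarity measure. Each object is repeatedly assigned to its nearest cluster centre, and each centre is then moved to the mean of its members. The household data and the two characteristics are hypothetical. Seeding is a simple deterministic farthest-first rule; production implementations use better seeding and several restarts.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Basic k-means on tuples of coordinates, using Euclidean distance."""
    # Farthest-first seeding: first point, then the point farthest from all chosen centres.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each object joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical households: (participation in outdoor sporting events,
# expenditure on recreation), both on an arbitrary 0-10 scale.
households = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
centroids, clusters = k_means(households, k=2)
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```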
Exhibit 15.8 Clusters Formed by Using Data on Two Characteristics
[Figure: clusters plotted on two characteristics, each axis running from low to high; horizontal axis: extent of participation in outdoor sporting events]