Multiple Sequence Analysis: a contextualized narrative approach to longitudinal data
        University of Stirling, September 2007

                   Gary Pollock
             Department of Sociology
         Manchester Metropolitan University
              g.pollock@mmu.ac.uk
     Longitudinal processes
• start and end times (event history analysis, EHA)
• competing risk, multi-episode (EHA)
• contiguous states as a single dependent variable (sequence analysis, SA)
• i.e. SA offers an alternative (complementary) approach to EHA
 Sequence analysis using OMA
1. Sequences of statuses are processed by…
2. Optimal Matching Analysis (OMA) which
   results in …
3. A distance matrix representing the closeness
   (proximity) of each sequence with all others
   which can then be processed by…
4. Cluster analysis which leads to the
   construction of…
5. A typology of sequence categories
             Single Sequences
• social class (S/N/M)
• e.g. case 1: SSSSSSSSSS
•      case 2: NNNNNSSSSS
•      case 3: NNNNNMMMMM etc.
• Case Analysis: resulting typology is an end-in-itself
• Variable Analysis: typology as a predictor or a
  dependent variable
• Class, employment status, qualifications, housing and
  marital status can all be analysed in this way, giving
  a range of typologies, but these do not account for
  interactions because each is arrived at independently
• Why not combine sequence data prior to analysis in
  order to capture interactions?
                 Analysis: process
• Create sequence data file
• Determine what to do with internal gaps (fill, delete or skip)
• Determine the ‘costs’ to be used in the OMA (indel and
  substitution). These are the parameters which define the
  distances between the sequences. They work by giving
  low distance scores to similar sequences and high scores
  to dissimilar sequences
• Perform the OMA (though there are other SA techniques)
• Weight the distance scores to account for different
  sequence lengths
• Perform cluster analysis
• Analyse clusters (i. sequence progression ii. covariates)
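
The slides do not say how the distance scores are weighted for sequence length. A minimal sketch, assuming one commonly used rule (divide the raw distance by the length of the longer sequence); the function name is hypothetical and the rule may differ from the one used in this analysis:

```python
def normalise_distance(raw_distance, seq_a, seq_b):
    """Scale a raw OMA distance by the longer sequence length (assumed rule)."""
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 0.0  # two empty sequences are treated as identical
    return raw_distance / longest


# The same raw distance counts for more when the sequences are shorter.
long_pair = normalise_distance(10, "SSSSSSSSSS", "NNNNNSSSSS")
short_pair = normalise_distance(10, "SSSSS", "NNNNN")
print(long_pair, short_pair)
```

Without some scaling of this kind, short sequences would tend to look closer to each other simply because they offer fewer positions at which to differ.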
       Indel and substitution costs
case 1:SSSSSSSSSS
case 2:NNNNNSSSSS
case 3:NNNNNMMMMM
• If INDEL = 1 and SUBS = 2 (often a default setting), the
  distances between pairs of cases are:
• 1,2 = 10
• 1,3 = 20
• 2,3 = 10
• If INDEL = 1 and SUBS = 2 (NM, MN, SM, MS) and 1.5
  (NS, SN), the distances become:
• 1,2 = 7.5
• 1,3 = 17.5
• 2,3 = 10
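
The distances above can be reproduced with a standard dynamic-programming edit distance. This is a minimal sketch, not the software used for the analysis; the function name `om_distance` and its parameters are hypothetical:

```python
def om_distance(a, b, indel=1.0, sub_cost=None, default_sub=2.0):
    """Optimal matching distance between two sequences of states.

    sub_cost maps (state_x, state_y) pairs to a substitution cost;
    pairs not listed fall back to default_sub.
    """
    sub_cost = sub_cost or {}
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the first i states of a into the first j of b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            s = 0.0 if x == y else sub_cost.get((x, y), default_sub)
            d[i][j] = min(
                d[i - 1][j] + indel,    # delete a state from a
                d[i][j - 1] + indel,    # insert a state from b
                d[i - 1][j - 1] + s,    # substitute (or match at zero cost)
            )
    return d[n][m]


case1, case2, case3 = "SSSSSSSSSS", "NNNNNSSSSS", "NNNNNMMMMM"

# Default setting: INDEL = 1, SUBS = 2 for every pair of states.
print(om_distance(case1, case2), om_distance(case1, case3), om_distance(case2, case3))

# Cheaper substitution between N and S (1.5); all other pairs stay at 2.
costs = {("N", "S"): 1.5, ("S", "N"): 1.5}
print(om_distance(case1, case2, sub_cost=costs),
      om_distance(case1, case3, sub_cost=costs),
      om_distance(case2, case3, sub_cost=costs))
```

Lowering the N/S substitution cost brings cases 1 and 2 closer together (7.5 rather than 10) while leaving cases 2 and 3 unchanged, which is how the cost settings shape the eventual clusters.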
      Data: BHPS 1991-2007
• born 1970-1975
• tracked from age 21 to 29
• data shifted to a common time axis
• class and qualifications examined here (housing,
  marriage, employment status and fertility status
  also processed)
• All internal gaps filled
• All sequence lengths included
• Year on year transitions used to inform
  substitution costs
Sequence gaps over waves A to N
Length of gaps across all waves A to N. A single case may have more than
one internal gap on a single variable, so a straight percentage of gaps over
the number of sequences is problematic; instead, each percentage is of all
gaps on that variable.
10,264 sequences   Class          Highest qualification
 1 yr gap          2,018 (54%)    1,224 (70%)
 2 yr gap            761 (20%)      274 (16%)
 3 yr gap            321  (9%)      121  (7%)
 4 yr gap            230  (6%)       67  (4%)
 5 yr gap            144  (4%)       19
 6 yr gap             87  (2%)       14
 7 yr gap             63  (2%)       10
 8 yr gap             35              6
 9 yr gap             34              5
10 yr gap             17              3
11 yr gap             10              1
12 yr gap             10              4
N                  3,730          1,748
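
Because one case can contain several internal gaps, the tabulation above counts gaps rather than cases. A minimal sketch of how internal gap lengths might be extracted from one sequence, assuming that -1 marks an unobserved year (as the sequence listings suggest) and that leading or trailing missing years do not count as internal; the function name and the example sequence are hypothetical:

```python
from collections import Counter

MISSING = -1  # assumed code for an unobserved year


def internal_gap_lengths(seq):
    """Return the lengths of runs of MISSING lying between observed values."""
    observed = [i for i, v in enumerate(seq) if v != MISSING]
    if not observed:
        return []
    first, last = observed[0], observed[-1]
    lengths, run = [], 0
    # Only scan between the first and last observed year, so leading and
    # trailing missing years are never counted.
    for v in seq[first:last + 1]:
        if v == MISSING:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    return lengths


# Illustrative sequence (not from the BHPS data): two internal gaps, one trailing gap.
example = [3, -1, 3, 3, -1, -1, 2, -1, -1]
gaps = internal_gap_lengths(example)
print(gaps)            # lengths of the internal gaps only
print(Counter(gaps))   # tally by gap length, one row of the table per key
```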
              Data: BHPS 1991-2007

[Figure: BHPS waves A (1991) to N (2004) plotted against age (16 to 29) for the birth cohorts 1970 to 1975. The cohort labels sit at waves A (born 1970) to F (born 1975), the wave in which each cohort is aged 21.]
           Single Sequences: class
Case  21  22  23  24  25  26  27  28  29
  1    3   3   3   3   3  -1  -1  -1  -1
  2    3   3   3   3   3   3   2   3   3
  3    4   3   3   3   3   3   3   2   3
  4    3   4   4   4   3   3   2   1  -1
  5    6   6   4   4   6  -1  -1  -1  -1
  6    5   5   5   5   5   6   6   6   6
  7    0   0   3   3   3   3   3   3   3
  8    0   0   0   2   4   2   2   2   2
  9    3   3   3   3   3   3   3   1   3
 10    2   2   4   4   2   2   5   6   1

0 = no job yet
1 = Service class Higher
2 = Service class Lower
3 = Non-manual
4 = Self
5 = Skilled
6 = unskilled
    Proportions of time spent in a
           particular class

N=810     sch    scl     nm   self   skil   unsk
  0.0    75.9   54.1   50.9   87.9   66.9   60.9
→ 0.2     8.5   10.4    8.6    4.1    8.1    7.3
→ 0.4     8.0   14.9   11.7    3.8    6.8   11.1
→ 0.6     4.3    9.8   10.0    2.0    7.0    7.7
→ 0.8     2.2    5.6    7.7    0.9    3.5    5.4
→ 1.0     1.0    5.3   11.1    1.4    7.7    7.7
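
To show what one cell of this table summarises, the sketch below computes the share of time a single case spends in each class and assigns it to a band. It assumes that the denominator is the number of observed years, that -1 marks an unobserved year, and that each band includes its upper bound; none of these is stated in the slides, and the function names are hypothetical:

```python
from collections import Counter

MISSING = -1  # assumed code for an unobserved year
CLASS_LABELS = {0: "none", 1: "sch", 2: "scl", 3: "nm",
                4: "self", 5: "skil", 6: "unsk"}


def time_shares(seq):
    """Share of observed years spent in each class for one case."""
    observed = [v for v in seq if v != MISSING]
    if not observed:
        return {}
    counts = Counter(observed)
    return {label: counts.get(code, 0) / len(observed)
            for code, label in CLASS_LABELS.items()}


def band(share):
    """Map a share to a table row; upper bounds are inclusive (an assumption)."""
    if share == 0:
        return "0.0"
    for upper in (0.2, 0.4, 0.6, 0.8):
        if share <= upper:
            return f"→ {upper}"
    return "→ 1.0"


# Case 2 from the class listing: non-manual throughout, apart from one year in scl.
shares = time_shares([3, 3, 3, 3, 3, 3, 2, 3, 3])
for label, share in shares.items():
    print(label, round(share, 2), band(share))
```

Repeating this for every case and tabulating the bands by class would give a table of the same shape as the one above.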
  Year on year class transitions

            0    sch    scl     nm   self   skil   unsk   Total
0         242     12     27     29     10     14     36     370
sch         0    247     75     32      6     14     10     384
scl         0    114    725    126     15     41     30    1051
nm          0     73    202    960     14     21     45    1315
self        0      5     13     10    175     14     25     242
skil        0     21     57     27     23    597     97     822
unsk        0     18     41     82     25    108    686     960
Total     242    490   1140   1266    268    809    929    5144
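
A transition table of this kind can be built by counting adjacent pairs of states within each sequence. The sketch below assumes that rows are the state in one year and columns the state in the following year (which the all-zero "no job yet" column suggests), and that pairs involving a missing year (-1) are dropped; the function name is hypothetical:

```python
from collections import Counter

MISSING = -1  # assumed code for an unobserved year


def count_transitions(sequences):
    """Count (state in year t, state in year t+1) pairs across all cases."""
    counts = Counter()
    for seq in sequences:
        for origin, destination in zip(seq, seq[1:]):
            if origin != MISSING and destination != MISSING:
                counts[(origin, destination)] += 1
    return counts


# Cases 2 and 7 from the class listing.
cases = [
    [3, 3, 3, 3, 3, 3, 2, 3, 3],
    [0, 0, 3, 3, 3, 3, 3, 3, 3],
]
transitions = count_transitions(cases)
print(transitions[(3, 3)])  # years spent staying in non-manual
print(transitions[(0, 3)])  # moves from no job yet into non-manual
```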
Year on year class transitions: off-diagonal
 proportions, % (N = 1512)
            SCH   SCL    NM   Self   Skilled   Unsk
None          1     2     2      1         1      2
SCH           -     5     2      0         1      1
SCL           8     -     8      1         3      2
NM            5    13     -      1         1      3
Self          0     1     1      -         1      2
Skilled       1     4     2      2         -      6
Unsk          1     3     5      2         7      -
Total                                            100
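
The off-diagonal percentages follow directly from the transition counts: drop the diagonal (no change of class) and express each remaining cell as a share of all changes. A minimal sketch using the counts from the class transition table; the function name is hypothetical:

```python
LABELS = ["none", "sch", "scl", "nm", "self", "skil", "unsk"]

# Counts copied from the year on year class transition table
# (same order for rows and columns, totals omitted).
COUNTS = [
    [242, 12, 27, 29, 10, 14, 36],
    [0, 247, 75, 32, 6, 14, 10],
    [0, 114, 725, 126, 15, 41, 30],
    [0, 73, 202, 960, 14, 21, 45],
    [0, 5, 13, 10, 175, 14, 25],
    [0, 21, 57, 27, 23, 597, 97],
    [0, 18, 41, 82, 25, 108, 686],
]


def off_diagonal_percentages(counts):
    """Return (number of changes, matrix of % of all changes; None on the diagonal)."""
    n_moves = sum(c for i, row in enumerate(counts)
                  for j, c in enumerate(row) if i != j)
    pct = [[None if i == j else 100 * c / n_moves
            for j, c in enumerate(row)]
           for i, row in enumerate(counts)]
    return n_moves, pct


n_moves, pct = off_diagonal_percentages(COUNTS)
print(n_moves)  # total number of year on year changes of class
nm, scl = LABELS.index("nm"), LABELS.index("scl")
print(round(pct[nm][scl]))  # non-manual to lower service class, % of all changes
```

The slides say only that these transitions "inform" the substitution costs; the exact mapping from percentages to costs is not given, so it is not reproduced here.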
        Class substitution costs

       None   sch    scl    nm     self   skil   unsk
None   0.0,   1.8,   1.8,   1.8,   1.8,   1.8,   1.8,
sch    1.8,   0.0,   1.2,   1.3,   1.8,   1.7,   1.7,
scl    1.8,   1.2,   0.0,   1.1,   1.7,   1.3,   1.3,
nm     1.8,   1.3,   1.1,   0.0,   1.7,   1.6,   1.3,
Self   1.8,   1.8,   1.7,   1.7,   0.0,   1.6,   1.6,
Skil   1.8,   1.7,   1.3,   1.6,   1.6,   0.0,   1.2,
unsk   1.8,   1.7,   1.3,   1.3,   1.6,   1.2,   0.0;
         Cluster analysis of class
               sequences
An eight cluster solution produces the following:
Clus % cases      description
1        17       non manual, little if any mobility
2        12       service class, lower, little mobility
3        13       unskilled, little mobility
4        12       moving from unskilled to skilled work
5        15       mixed
6         6       skilled, little mobility
7        19       upwards mobility, NM, SCL, SCH
8         5       self employed, little mobility
       Single Sequences: highest
              qualification
Case  21  22  23  24  25  26  27  28  29
  1    2   2   2   2   2  -1  -1  -1  -1
  2    2   2   2   2   2   2   2   1   1
  3    2   2   2   2   2   2   2   2   2
  4    2   1   1   1   1   1   1  -1  -1
  5    5   5   5   5   5   5   5   5   5
  6    3   3   3   3   3   3   3   3   3
  7    2   2   2   2   2   2   2   2   2
  8    3   3   2   2   2   2   2   2   2
  9    3   3   3   3   3   3   3   3   2
 10    2   2   2   2   2   2   2   1   1

1 = HE
2 = Post GCSE/O grade
3 = GCSE / O grade
4 = Other
5 = None/at school
Proportions of time spent in highest
       qualification statuses

N=831      HE   Post GCSE   GCSE   Other   None
  0      79.8        35.5   72.8    89.5   92.9
→ 0.2     0.5         7.9    1.9     0.4    0.1
→ 0.4     2.2         8.9    1.9     0.7    0.7
→ 0.6     1.6         4.7    2.4     1.0    0.4
→ 0.8     6.4         4.7    1.9     0.2    0.6
→ 1.0     9.6        38.3   19.0     8.2    5.3
    Year on year changes in HEQ
          HE      A      O   Other   None   Total
HE       710      0      0       0      0     710
A        136   2426      0       0      0    2562
O          4     73   1133       0      0    1210
Other      0     15      5     485      0     505
None       0     14      0       1    264     279
Total    850   2528   1138     486    264    5266
Year on year changes in HEQ: off-diagonal
 proportions, % (N = 248)
         HE    A    O   Other   None
HE        -    0    0       0      0
A        55    -    0       0      0
O         2   29    -       0      0
Other     0    6    2       -      0
None      0    6    0       0      -
Total
        HEQ substitution costs



      None   HE     A      O      oth    none
None 0.0,    2.0,   2.0,   2.0,   2.0,   2.0,
HE    1.8,   0.0,   2.0,   2.0,   2.0,   2.0,
A     1.8,   1.1,   0.0,   2.0,   2.0,   2.0,
O     1.8,   1.8,   1.2,   0.0,   2.0,   2.0,
Other 1.8,   1.7,   1.6,   1.7,   0.0,   1.8,
None 1.8,    1.8,   1.6,   1.7,   1.7,   0.0;
Cluster analysis of HEQ sequences
A seven cluster solution produces the following:
Clus % cases description
1    17          from GCSE to post-GCSE
2     7          ‘late’ post GCSE to HE
3    30          post GCSE, stable
4    13          ‘early’ post GCSE to HE
5     6          no qualifications
6    14          GCSE, stable
7    11          other, stable
Multiple Sequence Analysis (MSA)
• combine different sequences prior to OMA
  processing
• eg. class, qualifications, (housing, marital and
  fertility statuses) are combined in a single
  measure
• the sequences represent a narrative of change
  (or stability) on the measured dimensions
• the resulting typology can be analysed using
  case and variable methods as before but is in
  itself a representation of complex time
  embedded associations between the source
  variables
      Multiple Sequences: class and
           highest qualification
Case  21  22  23  24  25  26  27  28  29
  1   23  23  23  23  23  -1  -1  -1  -1
  2   23  23  23  23  23  23  22  13  13
  3   24  23  23  23  23  23  23  22  23
  4   23  14  14  14  13  13  12  -1  -1
  5   56  56  54  54  56  -1  -1  -1  -1
  6   35  35  35  35  35  36  36  36  36
  7   20  20  23  23  23  23  23  23  23
  8   30  30  20  22  24  22  22  22  22
  9   33  33  33  33  33  33  33  31  23
 10   22  22  24  24  22  22  25  16  11

1st Digit:
1 = HE
2 = Post GCSE/O grade
3 = GCSE / O grade
4 = Other
5 = None/at school

2nd Digit:
0 = no job yet
1 = Service class Higher
2 = Service class Lower
3 = Non-manual
4 = Self
5 = Skilled
6 = unskilled
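
The combined codes in the listing above can be rebuilt from the two single sequences by putting the qualification code in the first digit and the class code in the second. A minimal sketch, assuming that a year missing (-1) on either variable is missing in the combined sequence, which is what case 4 in the listings shows; the function name is hypothetical:

```python
MISSING = -1  # code for an unobserved year in the listings


def combine(qual_seq, class_seq):
    """Merge qualification and class sequences into two-digit states."""
    if len(qual_seq) != len(class_seq):
        raise ValueError("sequences must cover the same ages")
    return [MISSING if MISSING in (q, c) else 10 * q + c
            for q, c in zip(qual_seq, class_seq)]


# Case 2: highest qualification and class sequences from the single-sequence listings.
qual_case2 = [2, 2, 2, 2, 2, 2, 2, 1, 1]
class_case2 = [3, 3, 3, 3, 3, 3, 2, 3, 3]
print(combine(qual_case2, class_case2))

# Five qualification codes and seven class codes give 5 * 7 = 35 combined states.
n_states = len({10 * q + c for q in range(1, 6) for c in range(0, 7)})
print(n_states)
```

The number of combined states grows multiplicatively with each variable added, so a third variable would enlarge the substitution cost matrix considerably.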
           Year on year changes
• This is a large (35 by 35) matrix
• Calculation of substitution costs as for single
  sequence structure
• Frequent transitions:
•   12→11 (2.9%)
•   13→12 (2.3%)
•   21→22 (2.6%)
•   22→21 (2.6%)
•   22→23 (4.5%)
•   23→22 (5.9%)
•   26→25 (2.4%)
       Sequence analysis of class-
              HEQ data
Clus   %    description
1      11   post GCSE, NM, stable
2       8   post GCSE → HE, NM → SCL
3       5   no quals, self emp → unsk
4      10   GCSE, mixed emp (self, sk, unsk)
5       7   post GCSE, NM → SCL
6       7   GCSE, NM, both stable
7       4   post GCSE, skilled, both stable
8       6   from unsk and sk → SCH, HE
9      15   mixed
10      4   other quals and → SCL, SCH
11      3   post GCSE, SCH/SCL switching
12      6   other and sk/unsk, stable
13      8   post GCSE, unsk, stable
14      2   post GCSE, self, stable
         Advantages of MSA
• Is not limited to a single sequence
  measure
• Is not limited to a single event type
• Articulates the full scope of related
  sequences together
                 Issues
• Increasing complexity of the measure as
  new variables are drawn in
• computing time / software switching
• Lack of formal rules in executing the OMA
  and clustering processes
• Largely exploratory: scope to develop in
  relation to EHA

				