					                 DNA Forensics
• DNA Forensics deals with the use of recombinant DNA
  technology on one or more biological specimens for
  forensic investigation
• Common uses of DNA Forensics include: Human
  Identification, Kinship Analysis for Missing Person
  Identification, Parentage Testing, etc.
• Probability and Statistics play important roles in assessing
  the strength of DNA evidence in all such applications
• Events in DNA forensics are generally low probability
  events, and statistical assessment of DNA forensic data
  requires estimation based on sparse multi-dimensional data
    Brief Introduction of the DNA Forensics
           Session of the Symposium
•   Four talks will address some of the major Statistical/Probabilistic issues
    of DNA Forensics
•   Current paradigm of the topic will be the focus of the first talk (R.
    Chakraborty)
•   B. Budowle will address challenges to such paradigm, when DNA
    quantity is low, and for identification of source of microbial agents in
    forensic samples
•   T. Wang will introduce the need of pedigree-based probabilistic
    calculations for missing person identification
•   A. Eisenberg will discuss possible statistical formulations applicable for
    newer technologies that are being (or are about to be) implemented in the field
•   All four speakers are major players in DNA Forensics in the country;
    have contributed significantly to the development of DNA Forensics; and
    together, have over 75 years of experience working in the subject
   Statistical and Probabilistic Issues in
   DNA Forensics: Current Paradigms

                  Ranajit Chakraborty, PhD
       Robert A. Kehoe Professor and Director
            Center for Genome Information
         Department of Environmental Health
      University of Cincinnati College of Medicine
              Cincinnati, OH 45267, USA
           Tel. (513) 558-4925/3757; Fax (513) 558-4505
(Presentation at the University of Cincinnati Symposium on Probability Theory and
                          Applications on March 21, 2009)
                Overview of the Talk
• Brief History of DNA Forensics
• Currently used DNA Markers in Forensics
• Three Generic Forensic Scenarios
• Examples of DNA Evidence Data
• Frequency, Likelihood, and Bayesian Logic of DNA Statistics
• Population Substructure and Its Effect on DNA Statistics
• Lineage Markers (mtDNA and Y-STR haplotypes)
• Match and Partial Match in Databases
           Brief History of DNA Forensics
•   1980 – Ray White described the first hypervariable RFLP marker
•   1985 – Alec Jeffreys discovered multilocus VNTR probes (the term “DNA
    Fingerprinting” coined)
•   1985 – First paper on PCR published
•   1988 – In US, FBI started DNA forensic casework
•   1991 – First STR paper published
•   1992 – NRC-I Report Issued
•   1994 – CODIS STR Loci Characterized
•   1995 – FSS started UK DNA Database
•   1996 – NRC-II Report Issued; mtDNA introduced in Forensics
•   1997 – 13 CODIS STR Loci Validated for Forensic Use; Y-STRs described for
    forensic investigation purposes
•   1998 – FBI launched CODIS Database
•   2000 – RFLP Technology replaced by Multiplex STR Technology
•   2002 – FBI mtDNA Population Database published; Y-STR 20plex published
•   2002 – SNPs have been proposed as supplementary markers
•   2004 – Large sizes of “offenders’ data bases” opened issues of coincidental
    full/partial matches
•   2007 – Familial search through partial match occurrences in databases
        Advantages of Use of STR Loci
             in DNA Forensics
• PCR Based
• Low quantity DNA
• Degraded DNA
• Amenable to automation
• Non-isotopic
• Rapid typing
• Discrete alleles
• Abundant in genome
• Highly informative
(satisfied by the CODIS STRs)
[Figure: STR loci labeled alongside the list: TPOX, THO1, D13S317, D16S539, D18S51, D3S1358, D5S818, Penta D, Penta E]
                       15 CODIS STR Loci with
                        Chromosomal Positions
[Figure: chromosome map of the loci; labels recovered: D5S818, VWA, FGA, D7S820, Penta E, D16S539, D18S51, D21S11, Penta D]
 Three Types of DNA Forensic Issues
Transfer Evidence: The DNA profile of the evidence
  sample indicates that it comes from a single source

Mixture of DNA: The evidence sample’s DNA profile
  suggests a mixture of DNA from multiple (more than
  one) individuals

Kinship Determination: The evidence sample’s DNA profile,
  compared with that of one or more reference profiles, is
  used to determine the validity of a stated biological
  relatedness among individuals
Transfer Evidence – An Example
       DNA Mixture Analysis
(amelogenin, D8S1179, D21S11, D18S51)
Lineage Marker
                                Y-Chromosomal Genes

Lahn, Pearson & Jegalian 2001
Y STR Loci
Three Types of Conclusions

    Match, or Inclusion
  Statistical Assessment of DNA Evidence

Needed most frequently in the inclusionary events
(Apparent) exclusionary cases may also be sometimes
 subjected to statistical assessment, particularly for kinship
 determination because of genetic events such as mutation,
 recombination, etc.
Loci providing inconclusive results are often excluded
 from statistical considerations
Even if one or more loci show inconclusive results,
 inclusionary observations of the other typed loci can be
 subjected to statistical assessment
Approaches for Statistical Assessment of
            DNA Evidence
Frequentist Approach: indicating the coincidental chance
  of the event observed
Likelihood Approach: indicating relative support of the
  event observed under two contrasting (mutually exclusive)
  stipulations regarding the source of the evidence sample
Bayesian Approach: providing a posterior probability
  regarding the source, when data in hand is considered with
  a prior probability of the knowledge of the source (the latter is
  not generally provided by the DNA profiles being considered
  for statistical assessment)
           Frequentist Approach of Statistical
           Assessment for Transfer Evidence
  When the evidence sample DNA profile matches that of the reference
  sample, one or more of the following questions are answered:
 How often a random person would provide such a DNA match?
  Equivalently, what is the expected frequency of the profile observed in
  the evidence sample? – also called Random Match Probability,
  complement of which is the Exclusion Probability
 What is the expected frequency of the profile seen in the evidence
  sample, given that it is observed in another person (namely in the
  reference sample) – also called Conditional Match Probability
 What would be the expected frequency of the profile seen in the
  evidence sample in a relative (of specified kinship) of the reference
  individual, given the DNA match of the reference and evidence samples
  – also called the Match Probability in Relatives
            Frequentist Approach of Statistical
               Assessment for DNA Mixture
  When the evidence mixture DNA profile fails to exclude a reference sample as a part
  contributor, and more commonly a set of reference samples together explains all
  alleles seen in the mixture, one or more of the following questions are answered:
 How often a random person would be excluded as a part contributor of
  the mixture sample? – also called Exclusion Probability, the
  complement of which is the inclusion probability, giving the expected
  chance of Coincidental Inclusion
  (Note: This answer is based on the data on the evidence sample alone, without any
  consideration of the profiles of the reference samples)
 With a stipulation on the number of contributors, how often a random
  person’s DNA, mixed with that of one or more of the reference persons,
  would provide a mixture profile as seen in the evidence sample, given
  that the reference persons are also part contributors of the DNA mixture
  (Note: This answer considers data on the profiles of evidence sample as well as those
  of the reference samples stipulated to be part contributors)
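
The first question above (exclusion based on the evidence profile alone) can be sketched in Python. This is a minimal illustration, assuming Hardy-Weinberg proportions within a locus and independence across loci; the locus names and allele frequencies are hypothetical, not taken from the talk.

```python
from math import prod

def locus_inclusion_probability(allele_freqs):
    """Chance that a random person carries only alleles seen in the mixture
    at one locus: (sum of the mixture allele frequencies) squared, under HWE."""
    return sum(allele_freqs) ** 2

def combined_exclusion_probability(loci):
    """Exclusion probability over all loci, assuming independent loci."""
    inclusion = prod(locus_inclusion_probability(f) for f in loci.values())
    return 1.0 - inclusion

# Hypothetical frequencies of the alleles observed in the mixture at two loci.
mixture = {"locus_1": [0.10, 0.20, 0.30], "locus_2": [0.15, 0.25]}

print("Coincidental inclusion:", 1.0 - combined_exclusion_probability(mixture))
print("Exclusion probability: ", combined_exclusion_probability(mixture))
```

Only the alleles seen in the evidence enter the calculation, which mirrors the note that this answer ignores the reference profiles.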
   Kinship Assessment – Frequentist
  When comparisons of evidence and reference samples fail
  to exclude a stated relationship of the evidence sample with
  the reference individual(s), the frequency based question is
  of the form:
 What is the chance of excluding the stated relationship? –
  called the Exclusion Probability (PE), this is generally
  answered conditioned on the profiles of the reference
  samples and stated relationship
  Note: Average exclusion probability can also be computed
  disregarding the profiles examined, which rationalizes the
  choice of loci to be typed for validating the stated
  relationship
       Concept of Likelihood
A Likelihood represents the support of a given
hypothesis (of the value of a parameter) provided by the
observations in the data, written as
      Likelihood = Prob. (Data | Hypothesis).
Technically, likelihood is mathematically identical to
the probability of the data given the hypothesis, but
interpreted as a function of the hypothesis (or,
parameter values specified by the hypothesis) for
the observations in the data.
             Likelihood Ratio
With two (mutually exclusive) hypotheses, say H1
and H2, the likelihood ratio (LR) is the ratio of
probabilities of observing the same data under H1
and H2 , giving

     LR = Prob. (Data | H1) / Prob. (Data | H2).

Meaning of LR:
     LR < 1: Data less well supported by H1, compared with H2
       LR = 1: Data equally well supported by H1 and H2
     LR > 1: Data better supported by H1, compared with H2
        LR in Transfer Evidence
  DNA profile of evidence sample (E) matches that of
  the suspect (S); i.e., E = S
Contrasting Scenarios of Source (Hypotheses):
  Hp: DNA in the evidence sample came from the
  suspect
  Hd: DNA in the evidence came from someone other
  than the suspect, but it coincidentally matches the
  DNA profile of the suspect.
      LR in Transfer Evidence

        LR = Pr. (Data | Hp) / Pr. (Data | Hd)
          = Pr. (E = S | Hp) / Pr. (E = S | Hd)
          = 1 / Pr. (coincidental match)
Thus, LR in this case is simply the inverse
(reciprocal) of the relative frequency of the DNA
profile of the evidence sample in the population,
given that it is the same as of the suspect
       LR in Transfer Evidence
Since LR can be defined for any two mutually exclusive
hypotheses, one may also consider the alternative
hypothesis as:
Hr: A relative of the suspect is the source of evidence DNA
     In this case, the likelihood ratio, LR(r), will be
          LR(r) = Prob. (E=S | Hp) / Prob. (E =S | Hr)
                       = 1/ Pr. (DNA match in the relative),
which equals the reciprocal of the probability of the DNA
profile found in the evidence sample in the relative of the
suspect, given that the suspect has the same DNA profile
                 LR in DNA Mixture
Data: The DNA evidence profile, E (a DNA mixture) has alleles
  which are all explained by alleles present in the suspect’s
  DNA profile (S) and that of a victim’s DNA profile (V)
Contrasting Hypotheses:
  Hp: DNA in the evidence sample is the mixture of DNA of the
  suspect and that of the victim; (i.e., Hp: E = V + S)
  Hd1: Evidence DNA is a mixture of DNA from the victim and
  that of an unknown person (i.e., Hd1: E = V + UN)
  Hd2: Evidence DNA is a mixture of DNA from two unknown
  persons (i.e., Hd2: E = UN + UN)
                   LR in DNA Mixture
Pr. (Data | Hp: E = V + S) = 1, since data represents all alleles in
   the mixture are explained by alleles present in V and S, and
   no extra alleles are present in V and/or S.
   Hence under Hp: E = V + S, data observed is the only
   possible outcome, but
Pr. (Data | Hd1: E = V + UN) = relative frequency of a random
   person, whose DNA, mixed with the DNA of the victim,
   would yield a mixture that matched the evidence sample,
Pr. (Data | Hd2: E = UN + UN) = relative frequency of a pair of
   random persons, whose DNA mixture would match the
   profile seen in the evidence sample
             LR in DNA Mixture
 LR for Hp vs. Hd1: = 1 / Pr. (Data | Hd1: E = V + UN),
which becomes the reciprocal of the relative
frequency of a random person, whose DNA, mixed
with the DNA of the victim, would yield a mixture
that matched the evidence sample
LR for Hp vs. Hd2: = 1 / Pr. (Data | Hd2: E = UN + UN),
which is the inverse of the relative frequency of a
pair of random persons, whose DNA mixture would
match the profile seen in the evidence sample
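
For a single locus, the denominator under Hd1 can be obtained by enumerating every genotype an unknown person could have and keeping those that, together with the victim's alleles, reproduce exactly the alleles in the mixture. The sketch below assumes Hardy-Weinberg proportions, ignores peak heights and allele drop-out, and uses hypothetical allele labels and frequencies ("other" lumps all alleles not seen in the mixture).

```python
from itertools import combinations_with_replacement

def genotype_probability(a, b, freqs):
    """HWE genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def prob_unknown_completes_mixture(evidence, known, freqs):
    """Pr(Data | E = known + one unknown): total frequency of genotypes that,
    combined with the known contributor's alleles, give exactly the evidence alleles."""
    evidence, known = set(evidence), set(known)
    total = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if known | {a, b} == evidence:
            total += genotype_probability(a, b, freqs)
    return total

# Hypothetical single-locus case: mixture shows A, B, C; victim is A/B; suspect is C/C.
freqs = {"A": 0.10, "B": 0.20, "C": 0.30, "other": 0.40}
p_hd1 = prob_unknown_completes_mixture({"A", "B", "C"}, {"A", "B"}, freqs)
lr = 1.0 / p_hd1  # numerator is 1 under Hp: E = V + S

print("Pr(Data | Hd1) =", p_hd1)
print("LR (Hp vs. Hd1) =", lr)
```

Here the qualifying unknown genotypes are A/C, B/C and C/C, so the denominator is the sum of their three frequencies.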
  Other Considerations of Computing
            LR in DNA Mixture
 Computations of numerator and denominator
 of LR in mixture interpretation depend on:
Precise knowledge of the number of
 contributors in the DNA mixture
Assumptions regarding the biological
 relatedness of the unknown contributors
 (between themselves, or with the reference persons)
Population origin of the contributors
       Likelihood Ratio in Kinship
 Although the logic is similar, principles of LR
 formulation in kinship analysis can be simply
 illustrated with:
Standard paternity analysis (with DNA of
 mother, child, and alleged father typed for
 several loci), and
Kinship assessment for a pair of individuals
 (with genotype data from one or more loci)
  Interpretation of LR in Paternity Testing
 LR in paternity testing, also called PI, is the ratio of
  two conditional probabilities
 It contrasts the chance of observing the specific trio
  of genotypes (GC, GM, and GAF) given that AF = BF,
  as opposed to AF ≠ BF
 PI (or LR) can be computed even when M and AF, or
  AF and BF, are biologically related
 PI can be computed for apparent exclusion events
  as well, invoking mutation and/or recombination
  (generally leading to drastically reduced PI or LR for
  the loci where such events are observed)
        LR in Standard Paternity Testing

  Mother’s DNA profile (GM), and that of the child (GC)
  suggests that all obligatory alleles (i.e., the alleles that the
  child must have received from its biological father, BF) are
  present in the DNA profile of AF (GAF)
Hypotheses contrasted:
Hp: Alleged father (AF) is the biological father (BF) of the
  child (M is assumed to be the true mother); i.e., Hp: AF = BF
Hd: Alleged father is not the biological father, but he is not
  excluded from paternity (i.e., Hd: AF ≠ BF)
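
As a minimal single-locus sketch of this contrast: when the obligatory paternal allele is unambiguous, the paternity index is the chance that the alleged father transmits that allele divided by the chance that a random man does (the allele's population frequency). The function name, genotypes and frequencies below are hypothetical, and the sketch assumes no mutation and an unrelated random man under Hd.

```python
from math import prod

def paternity_index(alleged_father, obligate_allele, allele_freq):
    """PI at one locus when the obligatory paternal allele is unambiguous.
    alleged_father: tuple of his two alleles; allele_freq: population
    frequency of the obligatory allele."""
    copies = alleged_father.count(obligate_allele)
    transmission = copies / 2          # 0, 1/2, or 1
    return transmission / allele_freq

# Mother A1/A1, child A1/A2: the obligatory paternal allele is A2.
pi_het = paternity_index(("A2", "A3"), "A2", 0.20)   # heterozygous AF
pi_hom = paternity_index(("A2", "A2"), "A2", 0.20)   # homozygous AF

# Across independent loci the single-locus values multiply.
combined_pi = prod([pi_het, pi_hom])
print(pi_het, pi_hom, combined_pi)
```

A value of zero signals an apparent exclusion at that locus; handling it through a mutation model, as the talk mentions, is outside this sketch.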

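Under the mutation-drift balance, the probability of a
sample in which n_i copies of the allele A_i are observed,
for any set of n_i (i = 1, ..., k), is given by

\[
\Pr(n_1, \ldots, n_k) = \frac{\Gamma(\gamma)}{\Gamma(\gamma + n)} \prod_{i=1}^{k} \frac{\Gamma(\gamma p_i + n_i)}{\Gamma(\gamma p_i)},
\]

where n = n_1 + ... + n_k, γ = (1-θ)/θ,
p_i = freq. of allele A_i in the population,
and Γ(.) is the Gamma function, in which θ is the
coefficient of coancestry (equivalent to Fst or Gst, the
coefficient of gene differentiation between
subpopulations within the population)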
               Match Probability - Formulae

Genotype               Under HWE   With substructure adjustment
                                   Unconditional       Conditional
Homozygote (AiAi)      pi²         pi² + θpi(1-pi)     [pi(1-θ)+2θ] [pi(1-θ)+3θ] / [(1+θ)(1+2θ)]
Heterozygote (AiAj)    2pipj       2pipj(1-θ)          2[pi(1-θ)+θ] [pj(1-θ)+θ] / [(1+θ)(1+2θ)]

\[
\Pr(A_iA_i \mid A_iA_i) = \frac{[2\theta + (1-\theta)p_i]\,[3\theta + (1-\theta)p_i]}{(1+\theta)(1+2\theta)}
\]
\[
\Pr(A_iA_j \mid A_iA_j) = \frac{2\,[\theta + (1-\theta)p_i]\,[\theta + (1-\theta)p_j]}{(1+\theta)(1+2\theta)}
\]

Where pi, pj are frequencies of alleles Ai and Aj, and
θ = coefficient of co-ancestry (Fst/Gst) representing
     extent of population substructure effect
     (Balding and Nichols, 1994)
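
The three columns of the formula table translate directly into code. The sketch below is illustrative only: the function names are made up here, and the allele frequencies in the usage lines are hypothetical.

```python
def hwe(p_i, p_j=None):
    """Genotype frequency under HWE (homozygote if p_j is omitted)."""
    return p_i ** 2 if p_j is None else 2 * p_i * p_j

def unconditional(theta, p_i, p_j=None):
    """Substructure-adjusted genotype frequency, unconditional form."""
    if p_j is None:
        return p_i ** 2 + theta * p_i * (1 - p_i)
    return 2 * p_i * p_j * (1 - theta)

def conditional(theta, p_i, p_j=None):
    """Conditional match probability (Balding and Nichols, 1994):
    probability of the genotype given that it has already been seen once."""
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom

theta = 0.01
print("homozygote:  ", hwe(0.1), unconditional(theta, 0.1), conditional(theta, 0.1))
print("heterozygote:", hwe(0.1, 0.2), unconditional(theta, 0.1, 0.2), conditional(theta, 0.1, 0.2))
```

Setting theta to zero makes both adjusted forms collapse to the HWE column, which is a quick sanity check on any implementation.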
               Match Probability - examples
                          under HWE      with substructure adjustment (θ=.01)
                                          unconditional          conditional
D3S1358   (14, 18 )         0.0457            0.0457               0.0495

vWA       (14, 16)          0.0411            0.0411               0.0451

FGA       (23, 25)          0.0218            0.0218               0.0253

D8S1179 (12, 14)            0.0586            0.0586               0.0626

D21S11    (29, 30)          0.0840            0.0840               0.0881

D18S51    (13, 17)          0.0381            0.0381               0.0418

D5S818    (12, 12)          0.1252            0.1275               0.1367

D13S317   ( 9, 11)          0.0488            0.0488               0.0542

D7S820    (10, 10)          0.0844            0.0865               0.0949

Cumulative                  3.96×10^-12       4.13×10^-12          9.15×10^-12

Upper bound of 95% C.I.     1.02×10^-11       1.05×10^-11          2.17×10^-11
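
The cumulative row is the product of the per-locus values, which assumes independence across loci. The short check below multiplies the nine rounded table entries; because the inputs are rounded to four decimals, the products agree with the cumulative row only up to rounding. The confidence-interval row is not reproduced here, since the talk does not give the sampling details behind it.

```python
from math import prod

# (locus, under HWE, unconditional, conditional), copied from the table above
ROWS = [
    ("D3S1358", 0.0457, 0.0457, 0.0495),
    ("vWA",     0.0411, 0.0411, 0.0451),
    ("FGA",     0.0218, 0.0218, 0.0253),
    ("D8S1179", 0.0586, 0.0586, 0.0626),
    ("D21S11",  0.0840, 0.0840, 0.0881),
    ("D18S51",  0.0381, 0.0381, 0.0418),
    ("D5S818",  0.1252, 0.1275, 0.1367),
    ("D13S317", 0.0488, 0.0488, 0.0542),
    ("D7S820",  0.0844, 0.0865, 0.0949),
]

def cumulative(column):
    """Multiply one column (1 = HWE, 2 = unconditional, 3 = conditional)."""
    return prod(row[column] for row in ROWS)

for label, column in (("HWE", 1), ("unconditional", 2), ("conditional", 3)):
    print(f"{label:>13}: {cumulative(column):.3e}")
```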
       Paternity Testing – Frequentist
             Approach Example
In a standard paternity testing case, with mother’s genotype being A1A1, and
the child’s A1A2, an alleged father whose genotype does not contain the A2
allele would be excluded, giving

     PE = 1 – Freq.(A2A2) – Freq.(A2Ā2)
where Ā2 is any allele other than the allele A2. This computation assumes
that no mutation occurred during the transmission of alleles across
generations
Note: Average exclusion probability can also be computed disregarding the
profiles examined, which rationalizes the choice of loci to be typed for
validating the stated relationship
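
In the example above, the excluded men are exactly those who carry no copy of A2, so under Hardy-Weinberg proportions the exclusion probability is (1 - p2)^2, where p2 is the frequency of A2. The sketch below computes this and combines several loci, assuming independent loci and no mutation; the frequencies are hypothetical.

```python
from math import prod

def exclusion_probability(p_obligate):
    """Fraction of random men lacking the obligatory paternal allele (HWE)."""
    return (1.0 - p_obligate) ** 2

def combined_exclusion(per_locus_pe):
    """A random man is excluded overall if he is excluded at any one locus."""
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)

pe_single = exclusion_probability(0.20)   # hypothetical frequency of A2
pe_total = combined_exclusion([pe_single, 0.50])
print(pe_single, pe_total)
```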
    LR for Kinship of a Pair of Individuals

   DNA profile (GX) of one individual X, compared with that (GY)
   of another individual Y is considered to assess the accuracy
   of a specified stated biological relationship between X and Y

Hypotheses contrasted:
Hp: X and Y are biologically related (i.e., the stated
   relationship is correct)
Hd: X and Y are biologically not related
Note: Comparison between two stated relationships may also be tested
     IBD Probabilities – ITO Method
 Two individuals of genotypes GX and GY can have:
Both alleles IBD (called scenario I),
Only one allele from each IBD (scenario T),
None of their alleles IBD (scenario O).
  Their probabilities are denoted by Φ2, Φ1, and Φ0,
  respectively, and for any biological relatedness
 0 ≤ Φ2, Φ1, Φ0 ≤ 1, Φ2 + Φ1 + Φ0 = 1, and 4Φ0Φ2 ≤ Φ1²
Kinship Analysis of a pair of Individuals :
           IBD Coefficients In Relatives

Relationship Type    Symbol    Φ0     Φ1     Φ2
Monozygotic twins      MZ      0      0      1
Parent-Offspring       PO      0      1      0
Full Sib               S       1/4    1/2    1/4
First Cousin           1C      3/4    1/4    0
Unrelated              U       1      0      0
  Conditional Probability of Gy given Gx for
         specific kinship of x and y
• Stipulated kinship between x and y specifies the IBD
  probabilities Φ0, Φ1, Φ2 for x and y
• For observed Gx and Gy :
 Pr (Gy | Gx for the specified relationship)
 = Φ0•Pr(Gy | Gx under O) + Φ1•Pr(Gy | Gx under T)
     + Φ2•Pr(Gy | Gx under I)
 Rule: Conditional probability of Gy given Gx for a stated kinship is the
 weighted average of conditional probabilities of the same event under
 specified IBD described by the kinship
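
The weighted-average rule can be sketched for a single locus as follows. The IBD coefficients come from the table of relatives; the allele labels and frequencies are hypothetical, and the sketch assumes Hardy-Weinberg proportions, no mutation, and no population substructure.

```python
# (Phi0, Phi1, Phi2) for each relationship, from the IBD coefficient table
IBD = {"MZ": (0, 0, 1), "PO": (0, 1, 0), "S": (0.25, 0.5, 0.25),
       "1C": (0.75, 0.25, 0), "U": (1, 0, 0)}

def genotype_freq(g, freqs):
    """Pr(Gy | O): HWE frequency of genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def prob_one_ibd(gy, gx, freqs):
    """Pr(Gy | Gx, T): Y shares one of X's alleles (each with chance 1/2)
    and draws the other allele at random from the population."""
    target = tuple(sorted(gy))
    total = 0.0
    for shared in gx:
        for other, p in freqs.items():
            if tuple(sorted((shared, other))) == target:
                total += 0.5 * p
    return total

def conditional_probability(gy, gx, relationship, freqs):
    """Weighted average of Pr(Gy | Gx) over scenarios O, T and I."""
    phi0, phi1, phi2 = IBD[relationship]
    under_i = 1.0 if sorted(gy) == sorted(gx) else 0.0
    return (phi0 * genotype_freq(gy, freqs)
            + phi1 * prob_one_ibd(gy, gx, freqs)
            + phi2 * under_i)

freqs = {"A1": 0.10, "A2": 0.20, "A3": 0.70}
gx = gy = ("A1", "A2")
p_sib = conditional_probability(gy, gx, "S", freqs)
p_unrelated = conditional_probability(gy, gx, "U", freqs)
print("LR (full sibs vs. unrelated):", p_sib / p_unrelated)
```

Dividing the value for the stated relationship by the value for "U" gives the single-locus LR for Hp against Hd in the pairwise kinship comparison.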
            Bayes Formula (Odds form)

\[
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(E \mid H_1)}{P(E \mid H_2)} \times \frac{P(H_1)}{P(H_2)}
\]

posterior odds = likelihood ratio × prior odds
E = DNA evidence
H1 = alleged father is biological father
H2 = alleged father is not biological father
    Note: While the first factor of the RHS is computed from DNA evidence,
  the second factor, P(H1)/P(H2), is not necessarily a DNA-based information
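
A small numerical sketch of the odds form follows. The LR and prior values are hypothetical; as the note says, the prior has to come from non-DNA information, so the numbers below only show how strongly the posterior depends on it.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Convert a prior probability of H1 to a posterior probability via
    posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Same LR, three different (hypothetical) priors.
for prior in (0.5, 0.1, 0.001):
    print(prior, posterior_probability(100.0, prior))
```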
Synthesis of Three Approaches of Statistical
Assessment
Frequency-Approach provides the probability of the
 observed DNA evidence (unconditional as well as
 conditional) under a given stipulated hypothesis
Likelihood Ratio (LR) contrasts such probabilities for
 two mutually exclusive hypotheses
In Bayesian approach, with the use of prior probability,
 LR is transformed to obtain the relative odds of one
 hypothesis against another given the DNA data of the
 evidence (and that from known persons tested)
   Synthesis of Three Approaches (Contd.)

The three approaches are built on one another, and
 hence, it is inaccurate to say one is wrong and the
 others are correct
LR, without the transformation with the use of the prior
 probability, may be incorrectly interpreted as the
 answer of the Bayesian computation, but the numerator
 and denominator of LR can be stated with frequentist’s
 interpretation to avoid the error of reverse conditioning
The prior probability of the Bayesian approach
 generally comes from non-DNA evidence, and hence,
 their assumptions are untestable from DNA data
           Important Fact with An Example
          LR, by itself, is not a Bayesian Approach, and the
      prosecutor’s fallacy can be avoided by explaining the two
                 conditional probabilities separately

Example: Consider a mixture case, where victim’s profile (V) together with the defendant’s
   profile (S) explains all alleles in the mixture profile (E).

    Under Hp: E = V + S, the conditional probability of E given Hp is 1.0, but under Hd: E = V +
    UN, say the conditional probability of E given that the other contributor is unknown (UN) is
    1 in 100,000.
    Instead of stating LR = 100,000, it is less confusing to say that if we were to assume that the
    mixture DNA came from the victim and this defendant, this is the only observation possible
    (certain), but if the other contributor is unknown, we would on average have to sample 100,000 unrelated
    persons to find one whose DNA, mixed with that of the victim, would produce a
    profile matching the profile seen in the mixture DNA evidence sample.
Is the Extent of Population
  Substructure Uncertain
   for the Forensic Loci?
           Inbreeding Coefficient
                      African                          Native
          Caucasian   American   Hispanic    Asian    American
CSF1PO    -0.0007     -0.0009    -0.0003    -0.0012   0.0244
D13S317   -0.0008      0.0029     0.0047    0.0071    0.0157
D18S51     0.0001     0.0012      0.0011    0.0046    0.0268
D21S11     0.0008      0.0005     0.0013    0.0056    0.0371
D3S1358   -0.0009     -0.0009     0.0010    0.0035    0.0764
D5S818    -0.0001      0.0010     0.0010    0.0028    0.0656
D7S820    -0.0005      0.0000     0.0010    0.0039    0.0201
            Inbreeding Coefficient
                      African                         Native
          Caucasian   American   Hispanic    Asian   American
D8S1179    0.0000     -0.0001    0.0005     0.0025   0.0125
FGA       -0.0004      0.0004    0.0008     0.0029   0.0168
THO1      -0.0012      0.0015    0.0041     0.0058   0.0356
TPOX      -0.0015      0.0021    0.0024     0.0100   0.0164
VWA       -0.0011      0.0011    0.0029     0.0027   0.0172

Average   -0.0005      0.0006    0.0021     0.0039   0.0282
  The NRC-II recommendation

θ = 0.01 for large cosmopolitan populations
θ = 0.03 for small isolated populations
is well-validated by empirical as well as
         theoretical foundations
    Are the DNA Forensic
Population Databases Random
  and are their Sample Sizes
      Features of Genetic Databases
• Population Genetics historically always employed
  ‘convenient’ sampling, instead of strict random sampling
• ‘Convenient sampling’ defined as sampling of individuals
  without any prior knowledge of their DNA type is
  operationally random, in particular, when variations at
  DNA loci do not affect fertility, viability, cognitive or life
  achievement abilities
• Allele frequency estimates from convenient samples have
  been shown to well-approximate those estimates from
  structured strict random sampling
• Strict random samples collected at one point of time from a
  natural population may not remain random at another time
  point because of birth, death, immigration, and emigration
  Features of Genetic Databases - 2
• Allele frequencies from subjects of
  convenient samples described by ‘self-
  identified’ ethnicity have been shown to
  represent genetic affinities comparable with
  similar inferences drawn from
  anthropologically well-defined populations
• Occasional presence of biological relatives
  in convenient samples does not affect allele
  frequency estimates, but may produce
  excess allele/genotype sharing at some loci
Phylogenetic Tree (UPGMA) for some World
 Populations with allele frequency data of
           the CODIS STR Loci

                             SW Hispanic (TX)
                             SW Hispanic (CA)
                             US Caucasian
                             SE Hispanic (FL)
                             African American (TX)
                             African American (CA)
      Sample Size Limitation Issue
• Strictly speaking, no sample size is universally
  sufficient unless all individuals are continually
  genotyped over time
• Sample sizes such as 100 to 150 individuals per
  population have been shown to produce stable
  estimates of allele frequencies above a prescribed
  minimum threshold allele frequency
• Current forensic DNA statistics employ the
  concepts of minimum threshold allele frequency,
  and upper 95% confidence interval to account for
  sampling variation
Concerns Related to Databases
  Used for Lineage Markers
 (e.g., mtDNA and Y-STRs)
    Inheritance of Lineage Markers
  [Figure: pedigree illustrating inheritance of lineage markers]
  (NOTE: Colors denote mtDNA type; letters (X, A, B) indicate Y-linked
  information, where X denotes no Y chromosome, and A and B are
  Y-linked alleles or haplotypes)
  Introductory Comments on Lineage Markers

• mtDNA is maternally inherited, and Y-STRs are
  transmitted to only sons from fathers alone
• Barring mutations, all maternally related persons
  (males as well as females) will have the same
  mtDNA profile, and all paternally related males
  will have the same Y-STR profile
• Different markers on mtDNA are genetically
  linked (with virtually no recombination) and so
  are the Y-STRs (residing on the non-recombining
  region of the Y chromosome)
  Comments on Lineage Markers (Contd.)
• Consequently, mtDNA sequence data has to be treated like
  a haploid haplotype, frequency of which is NOT
  multiplicative across markers, and so is the case of Y-STR
  based profile
• Counting method is the one that captures the genetic
• Stated ethnicity of individuals does not necessarily reflect
  patrilineal or matrilineal ancestry (e.g., mtDNA of
  Hispanics may be almost entirely of Native American
  descent, while for the autosomal STRs, only 30-50% of
  their genes are of Native American descent)
• Thus, grouping of populations used for autosomal nuclear
  STR loci does not necessarily provide accurate frequency
  estimates of Y-linked STR haplotype, nor that of specific
  mtDNA sequence
Fundamental Difference between the Frequency of a CODIS
 STR DNA Profile and That of a Profile Based on mtDNA
                 and Y-STRs

• For CODIS STR loci, profile frequency provides
  information regarding the rarity of the profile in
  the population, or conditional probability given
  that the profile is found in someone else
• For mtDNA, it is the frequency among individuals
  who are NOT maternally related
• For Y-STRs, likewise, it is the frequency among
  individuals NOT paternally related
Computation of Frequency of Lineage-based
              Marker Profile
Using the general theory, the unconditional
frequency of a haplotype (say Ai), which is
count divided by sample size, can be
modified to get the conditional probability
     Pr. (Ai|Ai) = [pi² + θpi(1-pi)]/pi
              = pi + θ(1-pi)
              = θ + pi(1 - θ)
Hence, the conditional probability always
exceeds θ, the adjustment factor of possible
population substructure in the database used
Computation of Frequency of Lineage-based
         Marker Profile (Contd.)
Some advocates suggest that the quantity pi in
        Pr. (Ai|Ai) = θ + pi(1 - θ)
can be substituted by
        (Count of Ai + 2)/(N + 3),
where N is the sample size.
When N is large, this has little effect, but can be of
help when the count of Ai in the database is zero
(i.e., profile in evidence not seen in the database)
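
The counting estimate and the suggested substitution can be compared side by side. This is a sketch only: the function name, database size, count and θ value are hypothetical, and the (count + 2)/(N + 3) form is taken as stated on the slide rather than derived here.

```python
def conditional_haplotype_probability(count, n, theta, adjusted=False):
    """Pr(Ai | Ai) = theta + p_i * (1 - theta) for a lineage-marker haplotype.
    p_i is count / n by default, or (count + 2) / (n + 3) when adjusted=True."""
    p_i = (count + 2) / (n + 3) if adjusted else count / n
    return theta + p_i * (1 - theta)

# Haplotype not seen in a hypothetical database of 1000, theta = 0.01.
plain = conditional_haplotype_probability(0, 1000, 0.01)
adjusted = conditional_haplotype_probability(0, 1000, 0.01, adjusted=True)
print(plain, adjusted)
```

With a zero count the plain counting estimate falls back to θ itself, while the adjusted form stays slightly above it.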
    mtDNA and Y-STR θ-Value

Since, in terms of match versus non-match,
how different the haplotypes are is not an
issue, the θ values for mtDNA and Y-linked
haplotypes are to be computed not based on
mismatch based approaches (such as
AMOVA), but treating all haplotypes as
different alleles, generally leading to much
smaller θ value
Issues Related to DNA-Match
 Statistics when Suspects are
Identified by Database Search
  Three Approaches – Three Types of
• The NRC-I recommendation to use only the additional
  loci, not used in database search, is counter-productive
• The chance of coincidental finding of a profile in a
  database depends on the expected rarity of the profile and
  database size
• NRC-II’s Np rule answers the question of expected number
  of profiles matching a target profile (of rarity p) in a
  database (random with respect to crime) of size N
• Bayesian approach makes additional assumptions
  regarding the prior odds of each individual in the database
  being the contributor of the DNA of the target profile
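
The dependence on profile rarity and database size can be made concrete with a short sketch. It assumes that the profiles in the database are independent and unrelated to the true source; the numbers are hypothetical.

```python
def expected_matches(database_size, profile_rarity):
    """Np rule: expected number of database profiles matching the target."""
    return database_size * profile_rarity

def prob_at_least_one_match(database_size, profile_rarity):
    """Chance of one or more coincidental matches among N independent profiles."""
    return 1.0 - (1.0 - profile_rarity) ** database_size

n, p = 1_000_000, 1e-6   # hypothetical database size and profile rarity
print("Expected matches:      ", expected_matches(n, p))
print("P(at least one match): ", prob_at_least_one_match(n, p))
```

When Np is much smaller than 1 the two quantities are nearly equal, which is why Np is often read as an approximate probability in that range.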

Prob. that in a sample of n persons, all
  birthdays are different is given by
\[
P(\text{all different}) = \prod_{i=1}^{n-1} \left(1 - \frac{i}{365}\right)
\]
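
The birthday calculation, and its analogue for pairwise comparisons within a database, can be sketched as follows. The birthday part assumes 365 equally likely birthdays; the pairwise part assumes every pair of profiles matches independently with the same probability, which is a simplification introduced here, and the database size and match probability are hypothetical.

```python
from math import comb

def prob_all_distinct(n, categories=365):
    """Probability that n people all have different birthdays."""
    prob = 1.0
    for i in range(n):
        prob *= (categories - i) / categories
    return prob

def expected_matching_pairs(database_size, pair_match_prob):
    """Expected number of matching pairs among all N(N-1)/2 pairwise comparisons."""
    return comb(database_size, 2) * pair_match_prob

print("23 people, all birthdays different:", prob_all_distinct(23))
print("Pairs compared in a database of 1000:", comb(1000, 2))
print("Expected matching pairs:", expected_matching_pairs(1000, 1e-6))
```

The number of pairs grows roughly as N squared, so matching pairs can appear in a database even when any single comparison is very unlikely to match.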
                       OBSERVED AND EXPECTED MATCH PROBABILITY
[Figure: four panels (Caucasian, African-American, Hispanic, Caribbean); match probability (0 to 0.45) against Number of Loci (0 to 13)]
                       EFFECT OF PRESENCE OF RELATIVES
                    (Caucasian data on CODIS loci, θ = 0, N = 1000)
[Figure: Number of Pairs against Number of Loci (0 to 13); series: Unrelated, 1 Full sib, 10 Full sibs, 100 Full sibs]
• With the larger amount of data collected since 1996, and with
  experience of statistical results from casework, NRC-II
  recommendations remain appropriate suggestions for
  statistical evaluation of Forensic DNA evidence
• Statistical answers for different questions are necessarily
  different; they do not constitute lack of general acceptance
• mtDNA and Y-STR database groupings are necessarily
  different from that of autosomal STRs because of uni-
  parental ancestry of lineage markers
• Convenient sampling effect and sampling size limitations
  are imbedded in current protocols of DNA statistics
• A suspect identified through a database search raises multiple types of
  questions, the answers to which are different
• Dr. Bruce Budowle - from FBI Academy
• Hee S. Lee, Xiaohua Sheng, Jianye Ge -
  Graduate Students at CGI, Univ. Cincinnati
• SWGDAM members – for providing
• US Granting Agencies NIH and NIJ – for
  partial support of the research
Thank You!
