

Measurement of b-tagging Fake Rates in ATLAS Data
M. Saleem*
In collaboration with Alexandre Khanov**, F. Rizatdinova**, P. Skubic*
*University of Oklahoma, USA; **Oklahoma State University, USA
US ATLAS meeting, New York University (New York, USA), 03-05 Aug 2009

Motivations (I)
In ATLAS, b-tagging is important for the high-pT physics program, which includes:
      o Precision measurements of the top quark properties
      o Large cross-section, so a moderate b-tagging efficiency ($\varepsilon_b > 50\%$) will be good enough
           - Helps reduce the combinatorial W+jets background
           - S/B improves by ~2x (4x) if one (two) b-tagged jet(s) are required
      o Searches for SUSY particles and the Higgs boson (both Standard Model and non-Standard Model Higgs bosons)
           - H -> bb, ttH (H -> bb) with 4 b-jets (comparatively low cross-section; requires a high $\varepsilon_b \sim 70\%$)
           - SUSY Higgs (H+ -> tb)
In most cases simple kinematic cuts are not enough to separate the background from the signal. It is crucial to distinguish jets originating from b-, c-, and light quarks. B-tagging algorithms (taggers) are powerful tools that have been used by hadron collider collaborations for years. For a given jet, each algorithm provides a single number, the tag weight (w), which allows jets of various flavors to be separated on a statistical basis.

b-tagging: How this works
Each tagger is characterized by its b-tagging efficiency and mistag rate, defined as follows:
   $\varepsilon_b$ = ratio of the number of b-tagged jets (tag weight above a certain threshold $w_{cut}$) to the total number of b-jets.
   Mistag rate $\varepsilon_l$ = ratio of the number of tagged light jets (above the same threshold $w_{cut}$) to the total number of light jets in the sample.
Several b-tagging algorithms have been developed in ATLAS. In this presentation we concentrate on only 2 types of taggers (and their combination); both make use of the relatively long lifetime and large mass of B-hadrons:
      1. Impact parameter (IP) based taggers, which rely on the presence of tracks with a large impact parameter significance, $S(IP) = IP/\sigma(IP)$.
      2. Secondary vertex (SV) taggers, which attempt to reconstruct the decay vertices of B-hadrons inside the jet and use the decay length significance $S(L_{xy})$.
[Figure: decay length significance (DLS) distribution; the excess of events on the positive side is due to heavy-flavor jets.]

b-tagging: Mis-tag rate
For the tagger performance we cannot rely entirely on MC, due to discrepancies between data and MC simulation. During early running of the ATLAS detector it is also important to measure the tagger performance and mistag rates on data. Our discussion is devoted to the measurement of the mistag rate on data.
We cannot measure the mistag rate directly on data, since we cannot have a 100% pure sample of light jets. We have to find a way to measure the mistag rate on a sample contaminated with heavy-flavor jets (i.e. b- and c-jets present in an inclusive jet sample).
Major sources that lead to tagging of light jets:
      o Finite resolution of the reconstructed track/vertex parameters
      o Tracks/vertices from long-lived particles that decay in jets
We report on 2 approaches to measure the mistag rate on data:
   Method (I):
      o Based on the measurement of the negative tag rate.
   Method (II):
      o Makes use of tag weight templates.
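To make the two definitions concrete, here is a minimal Python sketch that counts jets above a threshold $w_{cut}$ for a given truth flavor, as one can only do on MC where flavor labels exist. The `Jet` class, the flavor labels, and the toy weights are hypothetical illustrations, not ATLAS software.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    flavor: str    # hypothetical truth label: "b", "c" or "light" (MC only)
    weight: float  # tag weight w returned by the tagger


def ip_significance(ip: float, sigma_ip: float) -> float:
    """Impact parameter significance S(IP) = IP / sigma(IP)."""
    return ip / sigma_ip


def tag_rate(jets: list[Jet], flavor: str, w_cut: float) -> float:
    """Fraction of jets of one flavor whose tag weight exceeds w_cut.

    For flavor "b" this is the b-tagging efficiency eps_b;
    for flavor "light" it is the mistag rate eps_l.
    """
    selected = [j for j in jets if j.flavor == flavor]
    if not selected:
        raise ValueError(f"no jets of flavor {flavor!r} in the sample")
    n_tagged = sum(1 for j in selected if j.weight > w_cut)
    return n_tagged / len(selected)


# Toy sample with illustrative weights only.
jets = [
    Jet("b", 6.1), Jet("b", 4.5), Jet("b", 1.2), Jet("b", 3.3),
    Jet("light", -0.7), Jet("light", 0.4), Jet("light", 2.9),
    Jet("light", -1.5), Jet("light", 0.1),
]
eff_b = tag_rate(jets, "b", w_cut=2.0)       # 3 of 4 b-jets pass
mistag = tag_rate(jets, "light", w_cut=2.0)  # 1 of 5 light jets passes
print(eff_b, mistag, ip_significance(0.05, 0.01))
```

Both quantities use the same counting function; only the truth flavor selected changes, which is exactly why the mistag rate cannot be computed this way on data, where the flavor is unknown.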

Mistag Rate – Using Negative Tag (I)
If the resolution of the track impact parameter significance or of the secondary vertex significance is perfectly symmetric, and the contribution from long-lived particles can be effectively suppressed, then the “negative” tagging rate should be close to the “positive” tagging rate.
Tracks with negative impact parameter significance (IPS) can be used to evaluate the tagging efficiency for light (uds) quark and gluon (g) jets. For jets of any flavor, the tagging efficiency obtained using the negative-IP tracks is called the negative tag rate ($\varepsilon^-$).
Since the IPS resolution tail is the same for all jet flavors, we can expect $\varepsilon^-_{incl} \approx \varepsilon^-_l$ (so instead of measuring the positive tag rate, we measure the inclusive negative tag rate).
Also, $\varepsilon^-_l \approx \varepsilon_l$ (the IPS distribution for light jets is symmetric around zero).
There are two issues with this approach:
      o Presence of tracks from b- and c-hadrons in the negative tails, in addition to the resolution effects.
      o Presence of tracks from long-lived particles, $\gamma$ conversions, and material interactions.

Mistag Rate – Using Negative Tag (II)
These issues are taken into account by introducing 2 correction factors: one due to the presence of decay products of long-lived particles in light jets,
      $k_{ll} = \varepsilon_l / \varepsilon^-_l$,
and a second due to the presence of an additional tail on the negative side from b- and c-hadrons,
      $k_{hf} = \varepsilon^-_l / \varepsilon^-_{incl}$.
These correction factors are estimated using MC, by calculating the ratio of the relevant tag rates. In both cases the effects are expected to be small ($k_{ll} \approx 1$; $k_{hf} \approx 1$), so the inclusive negative tag rate measured on data turns out to be a good approximation of the true mistag rate, which is obtained as:
      $\varepsilon_l = \varepsilon^-_{incl}\, k_{hf}\, k_{ll}$
A similar approach is used to measure the mistag rate for SV-based taggers. In this case negative tagging is performed by considering jets with an SV that has a negative decay length significance (DLS).

Mistag Rate – Using Negative Tag (III)
Since the inclusive negative tag rate depends both on the flavor composition of the jet sample and on the negative tag rates of each flavor, it is convenient to write:
      $k_{hf} = \dfrac{\varepsilon^-_l}{\varepsilon^-_{incl}} = \dfrac{1}{1 + f_b\left(\varepsilon^-_b/\varepsilon^-_l - 1\right) + f_c\left(\varepsilon^-_c/\varepsilon^-_l - 1\right)}$
where $f_b$ and $f_c$ are the fractions of b- and c-jets in the sample. These correction factors are evaluated on MC, which introduces systematic uncertainties.
Data sample: dijet MC samples.
Event selection:
   - Jet $p_T > 20$ GeV; $|\eta| < 2.5$
   - Require the 2 leading jets to be back-to-back, with $\Delta\phi > 2$
[Figure: correction factors $k_{ll}$ and $k_{hf}$, conventional efficiencies, true mistag rate, and heavy-flavor fractions for the IP3D and SV1 taggers.]
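The correction chain of Method (I) can be written out in a few lines of Python. This is a sketch only: in the method, $f_b$, $f_c$, the flavor-specific negative tag rates and $\varepsilon_l$ used for $k_{ll}$ come from MC, and $\varepsilon^-_{incl}$ would be measured on data, whereas here every input is an illustrative number (not an ATLAS result) chosen so the arithmetic can be checked by hand. The function names are hypothetical.

```python
def negative_tag_rate_inclusive(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg):
    """Inclusive negative tag rate of a sample with b- and c-jet fractions f_b, f_c."""
    return f_b * eps_b_neg + f_c * eps_c_neg + (1.0 - f_b - f_c) * eps_l_neg


def k_hf(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg):
    """Heavy-flavor correction k_hf = eps_l^- / eps_incl^- (extra negative tails of b, c jets)."""
    return 1.0 / (1.0
                  + f_b * (eps_b_neg / eps_l_neg - 1.0)
                  + f_c * (eps_c_neg / eps_l_neg - 1.0))


def k_ll(eps_l, eps_l_neg):
    """Long-lived-particle correction k_ll = eps_l / eps_l^- for light jets."""
    return eps_l / eps_l_neg


def mistag_rate(eps_incl_neg, kll, khf):
    """Light-jet mistag rate eps_l = eps_incl^- * k_hf * k_ll."""
    return eps_incl_neg * khf * kll


# Illustrative inputs only (hypothetical values, not measured rates).
f_b, f_c = 0.02, 0.05
eps_b_neg, eps_c_neg, eps_l_neg = 0.05, 0.02, 0.01
eps_l_true = 0.011  # on MC this would come from truth-matched light jets

eps_incl_neg = negative_tag_rate_inclusive(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg)
khf = k_hf(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg)
kll = k_ll(eps_l_true, eps_l_neg)
eps_l = mistag_rate(eps_incl_neg, kll, khf)
print(f"eps_incl^- = {eps_incl_neg:.4f}, k_hf = {khf:.4f}, "
      f"k_ll = {kll:.2f}, eps_l = {eps_l:.4f}")
```

With these inputs $\varepsilon^-_{incl} = 0.0113$ and $k_{hf} = 1/1.13$, so $\varepsilon^-_{incl} k_{hf}$ recovers $\varepsilon^-_l = 0.01$, and $k_{ll} = 1.1$ then gives $\varepsilon_l = 0.011$.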

Mistag Rate – Using Negative Tag (IV)
Closure test:
   - Divide the dijet sample into 2 halves; use the 1st part to get the correction factors.
   - Use the 2nd part to measure the inclusive negative tag rate and the true mistag rate (using MC truth information). The light jet (uds) mistag rate is then obtained as:
        $\varepsilon_l = \varepsilon^-_{incl}\, k_{ll}\, k_{hf}$
[Figure: light jet (uds) mistag rate and negative tag rate for the IP3D and SV1 taggers.]
Good agreement is observed for both the IP3D and SV1 taggers.

Mistag Rate – Template Method (I)
Split the data into 2 samples with different heavy-flavor compositions and compare them:
   - Assume that the distributions of the tag weight (the weight w, output of the tagging algorithm) for b- and c-jets are known.
   - The light tag weight template is unknown, but is assumed to be the same in both samples.
If the tag weight distribution has N bins, there are 2N-2 equations (N-1 independent bins for each of the 2 samples) and N+3 unknowns (the b- and c-fractions in each of the 2 samples, and the N-1 free bins of the normalized light tag weight distribution). If we have enough bins (2N-2 > N+3, i.e. N > 5), we can solve this over-constrained system and find the b- and c-fractions and the mistag rates.
Practical details: look at the 2 leading-pT jets. For a given tagger, look at the distribution of the tag weight w for the leading jet. Split the sample in two:
   p-sample = next-to-leading jet tagged ($w > w_{cut}$), enriched with b-jets;
   n-sample = next-to-leading jet untagged ($w < w_{cut}$).

Mistag Rate – Template Method (II)
Templates: normalized tag weight distributions for b-, c-, and light jets.
   - $b_i$, $c_i$, $l_i$ = value of the i-th template bin, where the i-th bin covers $w_i < w < w_{i+1}$, with $\sum_i b_i = \sum_i c_i = \sum_i l_i = 1$.
The tag weight distribution is directly related to the tagging efficiencies. The b-tag efficiency for $w > w_i$ is
      $\varepsilon_b^i = \sum_{j=i}^{N} b_j$,
similarly for the c-tagging efficiency, and the mistag rate for $w > w_i$ is
      $\varepsilon_l^i = \sum_{k=i}^{N} l_k$.
The system, assuming $b_i$ and $c_i$ are known, has 2N-2 equations (N = number of tag weight bins) and N+3 unknowns: $f_b^n, f_b^p, f_c^n, f_c^p, l_i$ with $i = 1, \dots, N-1$ ($\sum_i l_i = 1$):
      $n_i = n\left(f_b^n b_i + f_c^n c_i + (1 - f_b^n - f_c^n)\, l_i\right)$
      $p_i = p\left(f_b^p b_i + f_c^p c_i + (1 - f_b^p - f_c^p)\, l_i\right)$
where
   n (p) = total number of jets in the n (p) sample; $n_i$ ($p_i$) = number of jets in the i-th bin of the tag weight distribution;
   $f_b^n$ ($f_b^p$), $f_c^n$ ($f_c^p$) = fraction of b- and c-jets in the n (p) sample; $1 - f_b^n - f_c^n$ ($1 - f_b^p - f_c^p$) = light jet fraction;
   $n f_b^n$ ($p f_b^p$), $n f_c^n$ ($p f_c^p$) = number of b- and c-jets in the n (p) sample.
[Figure: tag weight templates for jet pT ranges 50-75, 75-100, 100-150, and 150-200 GeV.]
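The two-sample system can be sketched in Python for the case discussed in Template Method (IV), where the c/b ratios $r_n = f_c^n/f_b^n$ and $r_p = f_c^p/f_b^p$ are fixed from MC. Rather than a general fit of all N+3 unknowns, this sketch eliminates the common light template: substituting $l_i$ from the n-sample equation into the p-sample equation gives $\nu^p_i = \lambda \nu^n_i - \alpha b_i - \beta c_i$ (with $\nu_i$ the normalized bin contents), which is linear in $\lambda$, $\alpha$, $\beta$ and is solved by least squares with `numpy.linalg.lstsq`. This linear elimination is a swapped-in technique chosen for brevity, not necessarily the fitter used in the analysis; it needs $r_n \neq r_p$. The function names, the 5-bin toy templates, and all fractions are hypothetical.

```python
import numpy as np


def fit_fixed_cb(n_counts, p_counts, b, c, r_n, r_p):
    """Two-sample template fit with fixed c/b ratios (hypothetical helper).

    Model per bin i (templates normalized to 1):
        n_i / n = f_b^n b_i + f_c^n c_i + (1 - f_b^n - f_c^n) l_i
        p_i / p = f_b^p b_i + f_c^p c_i + (1 - f_b^p - f_c^p) l_i
    with f_c^n = r_n f_b^n and f_c^p = r_p f_b^p, where r_n != r_p.
    """
    nu_n = np.asarray(n_counts, float) / np.sum(n_counts)
    nu_p = np.asarray(p_counts, float) / np.sum(p_counts)
    b = np.asarray(b, float)
    c = np.asarray(c, float)
    # Eliminating l: nu_p = lam * nu_n - alpha * b - beta * c, where
    # lam = (1 - f_b^p - f_c^p) / (1 - f_b^n - f_c^n),
    # alpha = lam f_b^n - f_b^p, beta = lam f_c^n - f_c^p.
    design = np.column_stack([nu_n, -b, -c])
    (lam, alpha, beta), *_ = np.linalg.lstsq(design, nu_p, rcond=None)
    # Solve alpha = x - y and beta = r_n x - r_p y for x = lam f_b^n, y = f_b^p.
    f_b_p = (beta - r_n * alpha) / (r_n - r_p)
    f_b_n = (alpha + f_b_p) / lam
    f_c_n = r_n * f_b_n
    # Recover the light template from the n-sample.
    light = (nu_n - f_b_n * b - f_c_n * c) / (1.0 - f_b_n - f_c_n)
    return f_b_n, f_b_p, light


def mistag_above(light, i):
    """Mistag rate for w > w_i: light template summed from (zero-based) bin i upward."""
    return float(np.sum(light[i:]))


# Toy 5-bin templates (illustrative only); each sums to 1.
b = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
c = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
l = np.array([0.60, 0.25, 0.10, 0.04, 0.01])
# Noise-free pseudo-data: light-dominated n-sample, b-enriched p-sample.
n_counts = 100_000 * (0.05 * b + 0.05 * c + 0.90 * l)
p_counts = 20_000 * (0.40 * b + 0.20 * c + 0.40 * l)

f_b_n, f_b_p, light = fit_fixed_cb(n_counts, p_counts, b, c, r_n=1.0, r_p=0.5)
print(f_b_n, f_b_p, mistag_above(light, 3))
```

On these noise-free inputs the sketch returns $f_b^n \approx 0.05$, $f_b^p \approx 0.40$ and the input light template, so the mistag rate above the 4th bin edge is about 0.05.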

Mistag Rate – Template Method (III)
Performance of the method:
   - Split the initial sample of events in two (the 1st part is used to make the heavy-flavor templates, which are then used to evaluate the light template in the 2nd part).
   - The procedure is repeated $N_{att} = 1000$ times; for each fitted variable, plot the distribution of (fit - true)/true.
The mean and RMS of this distribution are taken as the measure of the uncertainty of the method. This includes the assumption that the template shapes are identical for the n- and p-samples (so it is not purely a closure test).
[Figure: example of an ensemble test result (combined tagger, 20 < pT < 35 GeV, w > 2, 3, 4 template bins).]
[Figure: relative statistical uncertainty on the mistag rate in %, defined as the r.m.s. of the ensemble tests.]

Mistag Rate – Template Method (IV)
The stability of the method is observed to depend on whether or not the c-jet fraction is fixed. If all 4 flavor fractions are left free to float, the measured mistag rate is biased. If instead we fix these fractions by assuming that the ratios $f_c^n/f_b^n$ ($f_c^p/f_b^p$) are known (from MC) and fit only the b-jet fraction, the fit is stable, but this gives rise to a systematic uncertainty due to the unknown c/b ratio.
[Figure: comparison between measured (blue) and true (red) mistag rates for the SV1, IP3D, and combined taggers at 2 operating points, w > 2 and w > 4, with fixed c/b ratio.]
[Figure: comparison between measured (blue) and true (red) mistag rates for the SV1, IP3D, and combined taggers at 2 operating points, w > 2 and w > 4, with floating c/b ratio.]

Mistag Rate – Systematics
The major sources of systematic uncertainty for the 2 methods are:
   1. Heavy-flavor fraction;    2. MC generators;
   3. Jet energy scale (JES);   4. b-jet energy scale.
Total systematic uncertainty using the negative tag method: the uncertainty increases with jet pT. For the combined tagger, the total systematic uncertainty is 6-12% (depending on the operating point).
Total systematic uncertainty using the template method: the uncertainty for this method is higher (20-30% for the combined tagger at the w > 4 operating point) due to its dependence on the b-tag efficiency (the b-template is taken from MC). In the future, measuring the b-tag efficiency on data with 5-10% accuracy will reduce the systematic uncertainty of this method.
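The ensemble test of Template Method (III) can be sketched as follows, under simplifying assumptions. The b and c templates are taken as exactly known, and each pseudo-experiment draws multinomial fluctuations of the n- and p-sample bin contents (via numpy's `Generator.multinomial`) instead of rebuilding the heavy-flavor templates from an independent half of the sample as the poster describes. The fit is a fixed-c/b template fit implemented by eliminating the common light template and solving the resulting linear relation by least squares, a swapped-in technique rather than the analysis fitter. `n_att` plays the role of $N_{att}$; all function names, templates, fractions and sample sizes are hypothetical toy values.

```python
import numpy as np


def fit_light_template(n_counts, p_counts, b, c, r_n, r_p):
    """Fixed-c/b two-sample template fit; returns the light template (hypothetical helper)."""
    nu_n = n_counts / n_counts.sum()
    nu_p = p_counts / p_counts.sum()
    # nu_p = lam * nu_n - alpha * b - beta * c is linear in (lam, alpha, beta).
    design = np.column_stack([nu_n, -b, -c])
    (lam, alpha, beta), *_ = np.linalg.lstsq(design, nu_p, rcond=None)
    f_b_p = (beta - r_n * alpha) / (r_n - r_p)
    f_b_n = (alpha + f_b_p) / lam
    return (nu_n - f_b_n * b - r_n * f_b_n * c) / (1.0 - (1.0 + r_n) * f_b_n)


def relative_deviation_stats(fitted, true_value):
    """Mean and RMS of (fit - true) / true; RMS is the standard deviation, as in ROOT histograms."""
    rel = (np.asarray(fitted, float) - true_value) / true_value
    return float(rel.mean()), float(rel.std())


def ensemble_test(b, c, l, frac_n, frac_p, n_jets, p_jets, i_cut, n_att=1000, seed=1):
    """Repeat the fit on n_att pseudo-experiments and summarize the mistag rate for w > w_i."""
    rng = np.random.default_rng(seed)
    (fb_n, fc_n), (fb_p, fc_p) = frac_n, frac_p
    nu_n = fb_n * b + fc_n * c + (1.0 - fb_n - fc_n) * l
    nu_p = fb_p * b + fc_p * c + (1.0 - fb_p - fc_p) * l
    true_mistag = l[i_cut:].sum()
    fitted = []
    for _ in range(n_att):
        n_counts = rng.multinomial(n_jets, nu_n).astype(float)
        p_counts = rng.multinomial(p_jets, nu_p).astype(float)
        light = fit_light_template(n_counts, p_counts, b, c, fc_n / fb_n, fc_p / fb_p)
        fitted.append(light[i_cut:].sum())
    return relative_deviation_stats(fitted, true_mistag)


# Toy 5-bin templates (illustrative only); each sums to 1.
b = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
c = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
l = np.array([0.60, 0.25, 0.10, 0.04, 0.01])
mean, rms = ensemble_test(b, c, l, frac_n=(0.05, 0.05), frac_p=(0.40, 0.20),
                          n_jets=100_000, p_jets=20_000, i_cut=3)
print(f"bias = {mean:+.3f}, relative stat. uncertainty = {100 * rms:.1f}%")
```

The mean of (fit - true)/true estimates the bias of the method and its RMS the relative statistical uncertainty on the mistag rate, matching the two quantities quoted for the ensemble tests; the toy numbers printed here say nothing about the ATLAS result.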
