Measurement of b-tagging Fake Rates in ATLAS Data
M. Saleem*
USATLAS meeting, New York University (New York, USA), 03-05 Aug 2009
In collaboration with Alexandre Khanov**, F. Raztidinova**, P. Skubic*
*University of Oklahoma, USA; **Oklahoma State University, USA
email@example.com

Motivations (I)
In ATLAS, b-tagging is important for the high-pT physics program, which includes:
o Precision measurements of the top quark properties
  - Large cross-section, so a moderate b-tagging efficiency ($\varepsilon_b > 50\%$) is good enough
  - Helps reduce the combinatoric background (W+jets)
  - S/B ~ 2x (4x) if one (two) b-tagged jet(s) are required
o Searches for SUSY particles and the Higgs boson (both Standard Model and non-Standard Model Higgs bosons)
  - H->bb, ttH(->bb) with 4 b-jets (low cross-section, requires high $\varepsilon_b \sim 70\%$)
  - SUSY Higgs (H+ -> tb)
In most cases simple kinematic cuts are not enough to separate the background from the signal. It is crucial to distinguish jets originating from b-, c-, and light quarks. B-tagging algorithms (taggers) are a powerful tool that hadron collider collaborations have used for years. For a given jet, each algorithm provides a single number, the tag weight (w), which allows us to separate jets of various flavors on a statistical basis.

b-tagging: How this works
Each tagger is characterized by its b-tagging efficiency and mistag rate, defined as follows:
- $\varepsilon_b$ = ratio of the number of b-tagged jets (above a certain weight threshold $w_{cut}$) to the number of jets of this particular flavor (b).
- Mistag rate $\varepsilon_l$ = ratio of the number of tagged light jets (above the same threshold $w_{cut}$) to the total number of light jets in the sample.
Several b-tagging algorithms have been developed in ATLAS. In this presentation we concentrate on only 2 types of taggers (and their combination). Both make use of the relatively long lifetime and large mass of B-hadrons:
1. Impact parameter (IP) based taggers rely on the presence of tracks with a large impact parameter significance, $S(IP) = IP/\sigma(IP)$.
2. Secondary vertex (SV) taggers attempt to reconstruct the decay vertices of B-hadrons inside the jet, using the decay length significance $S(L_{xy})$.
(Figure: decay length significance (DLS) distribution. The excess of events at positive DLS is due to heavy-flavor jets.)

b-tagging: Mis-tag rate
We cannot rely entirely on MC for the tagger performance because of discrepancies between data and MC simulation. It is also important, in the early running of the ATLAS detector, to measure the tagger performance and mistag rates on data. Our discussion is devoted to the measurement of the mistag rate on data.
We cannot measure the mistag rate directly on data, since we cannot obtain a 100% pure sample of light jets. We have to find a way to measure the mistag rate on a sample contaminated with heavy-flavor jets (i.e. b- and c-jets present in an inclusive jet sample).
Major sources that lead to tagging of light jets:
o finite resolution of the reconstructed track/vertex parameters
o tracks/vertices from long-lived particles that decay in jets
We report on 2 approaches to measure the mistag rates on data:
Method (I): based on the measurement of the negative tag rate.
Method (II): makes use of tag weight templates.

Mistag Rate – Using Negative tag (I)
Assume that the resolution of the track impact parameter significance (or of the secondary vertex significance) is perfectly symmetric, and that the contribution from long-lived particles can be effectively suppressed. Then the "negative" tagging rate should be close to the "positive" tagging rate.
Tracks with negative impact parameter significance (IPS) can be used to evaluate the tagging efficiency for light (uds) quark and gluon (g) jets. For jets of any flavor, the tagging efficiency computed using only the negative-IP tracks is called the negative tag rate ($\varepsilon^{neg}$).
Since the IPS resolution tail is the same for all jet flavors, we can expect $\varepsilon^{neg}_{incl} \approx \varepsilon^{neg}_{l}$ (so instead of measuring the positive tag rate, we measure the inclusive negative tag rate).
Also, $\varepsilon^{neg}_{l} \approx \varepsilon_{l}$ (the IPS distribution for light jets is symmetric around zero).
A similar approach is used to measure the mistag rate for SV-based taggers. In this case the negative tagging is performed by considering jets with an SV that has a negative decay length significance (DLS).

Mistag Rate – Using Negative tag (II)
There are two issues with this approach:
- $k_{hf}$: presence of tracks from b- and c-hadrons in the negative tails, in addition to the resolution.
- $k_{ll}$: presence of tracks from long-lived particles, conversions, and material interactions.
These issues are taken into account by introducing 2 correction factors: one due to the presence of decay products of long-lived particles in light jets, and a second due to the additional tail on the negative side from b- and c-hadrons:
$k_{ll} = \varepsilon_l / \varepsilon_l^{neg}, \qquad k_{hf} = \varepsilon_l^{neg} / \varepsilon_{incl}^{neg}$
These correction factors are evaluated using MC, by calculating the ratio of the relevant tag rates, and thus introduce systematic uncertainties. In both cases the effects are expected to be small ($k_{ll} \approx 1$, $k_{hf} \approx 1$), so the inclusive negative tag rate measured on data turns out to be a good approximation for the true mistag rate:
$\varepsilon_l = k_{hf}\, k_{ll}\, \varepsilon_{incl}^{neg}$
(Figures: IP3D taggers, conventional efficiencies; SV1 taggers.)

Mistag Rate – Using Negative tag (III)
Since the inclusive negative tag rate depends on both the flavor composition of the jet sample and the negative tag rate of each flavor, it is convenient to write:
$k_{hf} = \frac{\varepsilon_l^{neg}}{\varepsilon_{incl}^{neg}} = \frac{1}{1 + f_b\,(\varepsilon_b^{neg}/\varepsilon_l^{neg} - 1) + f_c\,(\varepsilon_c^{neg}/\varepsilon_l^{neg} - 1)}$
where $f_b$ and $f_c$ are the b- and c-jet fractions in the sample.
Data sample: dijet MC samples.
Event selection:
- Jet pT > 20 GeV; |η| < 2.5
- Require the 2 leading jets to have Δφ > 2 (back-to-back)
(Figure: heavy flavor fractions.)

Mistag Rate – Using Negative tag (IV)
Closure test:
- Divide the dijet sample into 2 halves; the 1st part is used to get the correction factors.
- The 2nd part is used to measure the inclusive negative tag rate and the true mistag rate (using MC truth information), with the mistag rate estimated as $\varepsilon_l = k_{ll}\, k_{hf}\, \varepsilon_{incl}^{neg}$.
(Figures: light-jet (uds) mistag rate and negative tag rate for IP3D and SV1.)
Good agreement is observed for both the IP3D and SV1 taggers.

Mistag Rate – Template Method (I)
Split the data into 2 samples with different heavy-flavor compositions and compare them:
- Assume that the distributions of the tag weight (the weight "w", output of the tagging algorithm) for b- and c-jets are known.
- The light-jet tag weight template is unknown, but is assumed to be the same in both samples.
If the tag weight distribution has N bins, there are 2N-2 equations (N-1 for each of the 2 samples) and N+3 unknowns (the b- and c-fractions in each of the 2 samples, and the N bins of the light tag weight distribution, of which N-1 are independent). If we have enough bins (2N-2 > N+3, i.e. N > 5), we can solve this over-constrained system and find the b- and c-fractions and the mistag rates.
Practical details:
- Look at the 2 leading-pT jets. For a given tagger, look at the distribution of the tag weight w for the leading jet.
- Split the sample in two: p-sample = next-to-leading jet tagged (w > $w_{cut}$), enriched with b-jets; n-sample = next-to-leading jet untagged (w < $w_{cut}$).

Mistag Rate – Template Method (II)
Templates: normalized tag weight distributions for b-, c-, and light jets:
- $b_i, c_i, l_i$ = value of the i-th template bin, with $\sum_i b_i = \sum_i c_i = \sum_i l_i = 1$.
The tag weight distribution is directly related to the tagging efficiencies. The i-th template bin covers $w_i < w < w_{i+1}$, so the b-tag efficiency for $w > w_i$ is $\varepsilon_b^i = \sum_{j=i}^{N} b_j$, similarly for the c-tagging efficiency, and the mistag rate for $w > w_i$ is $\varepsilon_l^i = \sum_{j=i}^{N} l_j$.
(Figure: tag weight templates for jet pT ranges 50-75, 75-100, 100-150, 150-200 GeV.)
The system: assuming $b_i$ and $c_i$ are known, there are 2N-2 equations (N = number of tag weight bins) and N+3 unknowns, $f_b^n, f_b^p, f_c^n, f_c^p, l_i$ ($i = 1, \ldots, N-1$; $\sum_i l_i = 1$):
$n_i = n\,[\,f_b^n b_i + f_c^n c_i + (1 - f_b^n - f_c^n)\, l_i\,]$
$p_i = p\,[\,f_b^p b_i + f_c^p c_i + (1 - f_b^p - f_c^p)\, l_i\,]$
where:
- n (p) = total number of jets in the n (p) sample; $n_i$ ($p_i$) = number of jets in the i-th bin of the tag weight distribution;
- $f_b^n$ ($f_b^p$), $f_c^n$ ($f_c^p$) = fraction of b-, c-jets in the n (p) sample; $1 - f_b^n - f_c^n$ ($1 - f_b^p - f_c^p$) = light-jet fraction;
- $n f_b^n$ ($p f_b^p$), $n f_c^n$ ($p f_c^p$) = number of b-, c-jets in the n (p) sample.

Mistag Rate – Template Method (III)
Performance of the method:
- Split the initial sample of events in two: the 1st part is used to make the heavy-flavor templates, which are then used to evaluate the light template in the 2nd part.
- The procedure is repeated $N_{att}$ = 1000 times; for each fitted variable, plot the distribution of (fit - true)/true.
- The mean and RMS of this distribution are taken as the measure of the uncertainty of the method.
This includes the assumption that the template shapes are identical for the n- and p-samples (so it is not purely a closure test).
(Figures: example of an ensemble test result (combined tagger, 20 < pT < 35 GeV, w > 2, 3, 4, template bins); relative statistical uncertainty on the mistag rate in %, defined as the r.m.s. of the ensemble tests.)

Mistag Rate – Template Method (IV)
We observe that the stability of the method depends on whether or not the c-jet fraction is fixed.
- If all 4 flavor fractions are left free (floating), the measured mistag rate is biased.
- If we fix these fractions by assuming that $f_c^n/f_b^n$ ($f_c^p/f_b^p$) are known (from MC) and only fit the b-jet fraction, the fit is stable, but this gives rise to a systematic uncertainty due to the unknown c/b ratio.
(Figures: comparison between measured (blue) and true (red) mistag rates for the SV1, IP3D and combined taggers at 2 operating points, w > 2 and w > 4, with fixed c/b ratio and with floating c/b ratio.)

Mistag Rate – systematics
The major sources of systematics for the 2 methods are:
1. heavy-flavor fraction; 2. MC generators; 3. JES; 4. b-jet energy scale.
Total systematic uncertainty using the negative tags method: the uncertainty increases with jet pT. For the combined tagger, the total systematic uncertainty is 6-12% (depending on the operating point).
Total systematic uncertainty using the template method: the uncertainty for this method is higher (20-30% for the combined tagger at the w > 4 operating point) because it depends on the b-tag efficiency (the b-template is taken from MC). In the future, measuring the b-tag efficiency on data with 5-10% accuracy will reduce the systematic uncertainty of this method.
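The definitions of $\varepsilon_b$ and the mistag rate $\varepsilon_l$ on the "b-tagging: How this works" slide are simple counting ratios once the true jet flavor is known, as it is in MC. Below is a minimal Python sketch. It assumes each jet is represented as a (flavor, weight) pair. The function `tag_rates` and the toy jets are hypothetical illustrations, not ATLAS software.

```python
def tag_rates(jets, w_cut):
    """Per-flavor tagging rate: fraction of jets of each flavor with weight w > w_cut.

    jets: iterable of (flavor, weight) pairs, e.g. flavor in {"b", "c", "light"}
    (hypothetical representation; real analyses use MC-truth labels).
    For flavor "b" this is epsilon_b; for "light" it is the mistag rate epsilon_l.
    """
    totals, tagged = {}, {}
    for flavor, weight in jets:
        totals[flavor] = totals.get(flavor, 0) + 1
        if weight > w_cut:
            tagged[flavor] = tagged.get(flavor, 0) + 1
    return {flavor: tagged.get(flavor, 0) / n for flavor, n in totals.items()}


# Toy jets with made-up tag weights
toy_jets = [("b", 5.1), ("b", 1.0), ("b", 3.2), ("b", 7.0),
            ("light", -0.5), ("light", 0.3), ("light", 4.5),
            ("light", 1.2), ("light", -2.0)]
rates = tag_rates(toy_jets, w_cut=4.0)  # 2 of 4 b-jets and 1 of 5 light jets pass
```

The same counting with a data sample instead of MC is exactly what is not possible for $\varepsilon_l$, since the light-jet denominator is unknown without truth labels.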
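The negative-tag relations on the "Mistag Rate – Using Negative tag" slides can be checked numerically. The sketch below builds $\varepsilon^{neg}_{incl}$ as the flavor-weighted sum of per-flavor negative tag rates, computes $k_{hf}$ from the closed form, and applies $\varepsilon_l = k_{hf} k_{ll} \varepsilon^{neg}_{incl}$. It assumes hypothetical fractions, negative tag rates and $k_{ll}$; none of these numbers come from the study.

```python
def inclusive_negative_rate(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg):
    """eps_incl^neg = f_b eps_b^neg + f_c eps_c^neg + (1 - f_b - f_c) eps_l^neg."""
    return f_b * eps_b_neg + f_c * eps_c_neg + (1.0 - f_b - f_c) * eps_l_neg


def k_hf(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg):
    """Heavy-flavor correction k_hf = eps_l^neg / eps_incl^neg (closed form)."""
    return 1.0 / (1.0 + f_b * (eps_b_neg / eps_l_neg - 1.0)
                  + f_c * (eps_c_neg / eps_l_neg - 1.0))


def mistag_from_negative_tags(eps_incl_neg, k_hf_value, k_ll_value):
    """Mistag estimate eps_l = k_hf * k_ll * eps_incl^neg."""
    return k_hf_value * k_ll_value * eps_incl_neg


# Hypothetical inputs: 2% b-jets, 4% c-jets, made-up negative tag rates
f_b, f_c = 0.02, 0.04
eps_b_neg, eps_c_neg, eps_l_neg = 0.05, 0.02, 0.005
k_ll = 1.2  # hypothetical eps_l / eps_l^neg, as would be taken from MC

eps_incl_neg = inclusive_negative_rate(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg)
khf = k_hf(f_b, f_c, eps_b_neg, eps_c_neg, eps_l_neg)
eps_l = mistag_from_negative_tags(eps_incl_neg, khf, k_ll)
```

With these inputs $\varepsilon^{neg}_{incl} = 0.0065$ and $k_{hf} = 1/1.3$. So $k_{hf}\,\varepsilon^{neg}_{incl}$ gives back $\varepsilon^{neg}_l = 0.005$, which is just the definition of $k_{hf}$ rearranged.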
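The slides do not specify which fitter is used to solve the template system on "Mistag Rate – Template Method (II)". The sketch below substitutes a simple grid search for the fixed-c/b-ratio variant described on "Template Method (IV)". For each trial b-fraction, it inverts one sample's bin equations for $l_i$. It then picks the pair $(f_b^n, f_b^p)$ for which the n- and p-samples imply the same light template. Finally, it sums the fitted $l_j$ above a bin to get a mistag rate. Templates, fractions, c/b ratios and counts are all hypothetical, and the counts are noise-free expectations rather than fluctuated data.

```python
# Toy sketch of the template method with fixed c/b ratios (grid search, hypothetical inputs).


def light_template(counts, b, c, f_b, r):
    """Invert one sample's bin equations for the light template l_i.

    counts: jets per tag-weight bin (n_i or p_i); b, c: known normalized templates;
    f_b: trial b-jet fraction, with the c fraction fixed to r * f_b.
    """
    total = sum(counts)
    f_c = r * f_b
    f_light = 1.0 - f_b - f_c
    return [(k / total - f_b * bi - f_c * ci) / f_light
            for k, bi, ci in zip(counts, b, c)]


def fit_template_method(n_counts, p_counts, b, c, r_n, r_p, step=0.002):
    """Grid-search (f_b^n, f_b^p) so both samples imply the same light template."""
    def grid(r):
        # keep the light fraction 1 - f_b * (1 + r) strictly positive
        k_max = int((1.0 - 1e-9) / (step * (1.0 + r)))
        return [k * step for k in range(k_max + 1)]

    n_side = [(f, light_template(n_counts, b, c, f, r_n)) for f in grid(r_n)]
    p_side = [(f, light_template(p_counts, b, c, f, r_p)) for f in grid(r_p)]
    best = None
    for f_bn, l_n in n_side:
        for f_bp, l_p in p_side:
            chi2 = sum((x - y) ** 2 for x, y in zip(l_n, l_p))
            if best is None or chi2 < best[0]:
                best = (chi2, f_bn, f_bp, l_n)
    return best[1], best[2], best[3]


def efficiency_above(template, i):
    """Efficiency for w > w_i: sum of template bins j >= i (0-based indexing here)."""
    return sum(template[i:])


def expected_counts(total, f_b, f_c, b, c, l):
    """Noise-free bin contents total * (f_b b_i + f_c c_i + (1 - f_b - f_c) l_i)."""
    return [total * (f_b * bi + f_c * ci + (1.0 - f_b - f_c) * li)
            for bi, ci, li in zip(b, c, l)]


b_tmpl = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]
c_tmpl = [0.15, 0.25, 0.25, 0.15, 0.10, 0.10]
l_true = [0.50, 0.30, 0.12, 0.05, 0.02, 0.01]

n_counts = expected_counts(100000, 0.04, 0.04, b_tmpl, c_tmpl, l_true)  # c/b = 1.0
p_counts = expected_counts(20000, 0.30, 0.09, b_tmpl, c_tmpl, l_true)   # c/b = 0.3
f_bn, f_bp, l_fit = fit_template_method(n_counts, p_counts, b_tmpl, c_tmpl,
                                        r_n=1.0, r_p=0.3)
mistag_above_bin3 = efficiency_above(l_fit, 3)  # 0.05 + 0.02 + 0.01 for this toy
```

The two c/b ratios must differ between the samples. If they are equal, the b- and c-templates enter both samples in the same combination, and the pair of b-fractions is no longer uniquely determined. A real fit would also weight each bin by its statistical uncertainty, which this sketch omits.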
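The ensemble test on "Mistag Rate – Template Method (III)" summarizes each fitted variable by the mean and RMS of $(fit - true)/true$ over $N_{att}$ pseudo-experiments. The sketch below assumes that "RMS" means the spread about the mean (population standard deviation). It uses five hypothetical fit results instead of the 1000 used in the study; `ensemble_summary` is an illustrative name.

```python
import statistics


def ensemble_summary(fitted_values, true_value):
    """Return (mean, RMS about the mean) of the relative deviations (fit - true) / true."""
    rel = [(f - true_value) / true_value for f in fitted_values]
    return statistics.fmean(rel), statistics.pstdev(rel)


# Hypothetical fitted mistag rates from five pseudo-experiments (true value 0.010)
toy_fits = [0.011, 0.009, 0.010, 0.012, 0.008]
bias, spread = ensemble_summary(toy_fits, 0.010)  # mean ~0, RMS ~0.141
```

A mean far from zero would signal a bias, as seen when all four flavor fractions float. The RMS plays the role of the relative statistical uncertainty on the mistag rate.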