QR’09 RESEARCH NEWS EXCLUSIVE
False Identity Detection: Most Wanted Order-of-Magnitude Based Approach
Tossapon Boongoen & Qiang Shen, Aberystwyth University, UK
An integration of qualitative reasoning and link analysis to detect the possible use of false (or deceptive) identities.
Date 24/06/09

Outline
• Background of False Identity
• False Identity Detection Approaches
• Order-of-Magnitude Based Model
• Experimental Results
• Conclusion

Background: Age of Terror
• False identity has become the common denominator of serious crime and terrorism.
• In the UK, financial losses due to false identity are reported to be around 1.3 billion pounds each year.
• In the 9/11 attack in particular, US authorities failed to discover the terrorists' use of false identities.
• 19 terrorists involved in 9/11 entered the US with false identities.

Background: Identity
Identity is a set of characteristic descriptors unique to a specific person.
• Attributed: name, date of birth (easy to falsify!)
• Biographical: educational, financial or criminal history
• Biometric: DNA and fingerprint

Attributed Identity
Name deception is the most common practice with attributed identity.
Attributes falsified in false (attributed) identities:
• Name (100%)
• DOB (66.7%)
• ID (56.3%)
• Resident (33.3%)
Forms of name deception:
• Completely different name
• Add-on abbreviation
• Similar pronunciation
• First-second name swap

False Identity Detection: Text-based approach
The text-based approach uses string-matching techniques, e.g. edit distance and Jaro, to compare the similarity of two strings (X, Y).
• Edit distance is based on the number of edit operations needed to transform X into Y.
• Jaro relies on the number and order of the characters common to X and Y. It is effective for short strings, especially personal names.
This approach is effective for problems caused by data-entry or translation errors, but it fails to deal with ‘highly deceptive cases’.
Problems of high deception:
• Bin laden ↔ The prince
• Bin laden ↔ The emir
• Fadil muhamad ↔ Harun fazul
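The limitation of the text-based approach can be illustrated with a minimal sketch of edit distance. The normalisation into a similarity score in [0, 1] and the misspelling "bin ladin" are assumptions made for illustration only; the alias pair "Bin laden" / "The prince" is taken from the slide.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that transform x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(x: str, y: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(x), len(y))
    return 1.0 if longest == 0 else 1.0 - edit_distance(x, y) / longest


# A data-entry style error (hypothetical misspelling) stays highly similar,
# whereas a highly deceptive alias from the slide does not.
typo_score = edit_similarity("bin laden", "bin ladin")
alias_score = edit_similarity("bin laden", "the prince")
print(f"typo: {typo_score:.2f}  deceptive alias: {alias_score:.2f}")
```

Under these assumptions the misspelling scores far above the deceptive alias, which is why a purely string-based matcher would miss the second pair.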

False Identity Detection: Link-based approach
• Despite using several false identities, a criminal (e.g. a terrorist) typically exhibits a unique pattern of relations to other information objects.
• The similarity of objects can be estimated from the link patterns they take part in.
• Example methods: SimRank (publication domain), PageSim (Internet domain).
[Figure: Identity A, Identity B and Identity C linked to shared information objects (house, phone no., cash card, email).]

Link Analysis
Terminology:
• Vertex → Name
• Edge → Co-occurrence relation
• Edge weight → Co-occurrence frequency
Link-based similarity: several methods use different properties of shared neighbours. E.g., for the neighbours of (A, B):
• Cardinality = 2 (i.e. C and D)
• Uniqueness = average of the uniquenesses of the individual neighbours = (Uniqueness of C + Uniqueness of D) / 2
[Figure: example link network with vertices A, B, C and D and weighted edges.]

Uniqueness Measure
Uniqueness is estimated for each shared neighbour k of vertices i and j:
$$UQ_{ij}^{k} = \frac{f_{ik} + f_{jk}}{\sum_{m} f_{mk}}$$
where
• $f_{ik}$: frequency of the link between vertices i and k,
• $f_{jk}$: frequency of the link between vertices j and k,
• $f_{mk}$: frequency of the link between any vertex m and the vertex k.
The uniqueness measure captures the relative density of unique links to the nodes in question.

Uniqueness Measure: example
For the shared neighbours C and D of (A, B) in the example network:
• Uniqueness of D = (1 + 3) / (1 + 3) = 1
• Uniqueness of C = (2 + 6) / (2 + 6 + 1 + 1) = 0.8

OM-based Model: Motivations
• Existing numerical techniques encounter the problem of inaccurate description (often caused by unduly large values). E.g. the normalised interpretation of a cardinality of 100 is 0.1 when the maximum cardinality is 1,000.
• Link properties, such as cardinality, are usually a matter of degree.
• Link property measures are therefore gauged and described qualitatively, using the order-of-magnitude (OM) formalism.
• Additionally, most link-based similarity methods take into account only one property of the neighbourhood context.
• Multiple properties (e.g. cardinality and uniqueness) are combined here to improve the quality of the similarity measure.

OM-based Model: Constructing an OM scale (cardinality)
A human analyst places landmarks on the numerical scale: landmark set = {2, 6}.
• Numerical scale: 0 … 2 … 6 … ∞
• OM scale: Small = [0, 2], Medium = (2, 6], Large = (6, ∞)

OM-based Model: OM space (cardinality)
From the most abstract to the most precise descriptions:
• [small, large]
• [small, medium], [medium, large]
• [small, small], [medium, medium], [large, large]

OM-based Model: Semi-supervised determination of landmarks
• Human-directed landmarks are not optimal for different datasets.
• A better alternative is to learn from data.
• In this work, a density function is used to determine landmarks:
$$D(t) = \frac{N(t)}{N^{*}}$$
where
• $D(t)$: density of property measure t,
• $N(t)$: number of entity pairs whose property measure is ≥ t,
• $N^{*}$: number of all entity pairs.

OM-based Model: Learning landmark values
[Figure: landmark values 4, 7, 10 and 23 located against the order-of-magnitude values of D(t).]

OM-based Model: Homogenisation of OM models
• Multiple link properties are described in different OM spaces.
• Prior to combining these measures, the homogenisation of property-specific OM scales is required.
For instance, the landmark sets of cardinality (CT) and uniqueness (UQ) are {2, 6} and {0.1, 0.3, 0.6, 0.8}, to be mapped onto the homogenised scale of {-3, -2, -1, 0, 1, 2, 3}.

Step 1.1: Select the central landmark ($l_c$), which is in the middle of each ordered landmark set.
CT = {2, 6} → $l_c$ = 2 or $l_c$ = 6
UQ = {0.1, 0.3, 0.6, 0.8} → $l_c$ = 0.3 or $l_c$ = 0.6

Step 1.2: Modify each original landmark $l_i$ to its new value $sl_i$, such that $sl_i = l_i - l_c$.
CT = {2, 6} → {0, 4}, with $l_c$ = 2
UQ = {0.1, 0.3, 0.6, 0.8} → {-0.2, 0, 0.3, 0.5}, with $l_c$ = 0.3

Step 2: Add landmark values, such that they appear symmetrically on both the positive and negative sides of 0.
CT = {0, 4} → {-4, 0, 4}
UQ = {-0.2, 0, 0.3, 0.5} → {-0.5, -0.3, -0.2, 0, 0.2, 0.3, 0.5}

Step 3: Add additional landmarks, such that all landmark sets have the same granularity.
CT = {-4, 0, 4} → {-4, -2, -1, 0, 1, 2, 4}
UQ = {-0.5, -0.3, -0.2, 0, 0.2, 0.3, 0.5} (unchanged)

Finally, map the modified scales to the homogenised set.

OM-based Model: Homogenised and original scales

Property     Label       Original     Homogenised
Cardinality  small       [0, 2]       (-∞, 0]
Cardinality  medium      (2, 6]       (0, 3]
Cardinality  large       (6, +∞)      (3, +∞)
Uniqueness   very low    [0, 0.1]     (-∞, -1]
Uniqueness   low         (0.1, 0.3]   (-1, 0]
Uniqueness   moderate    (0.3, 0.6]   (0, 2]
Uniqueness   high        (0.6, 0.8]   (2, 3]
Uniqueness   very high   (0.8, 1]     (3, +∞)

OM-based Model: Combining property measures
• Different properties have different relevance (importance) degrees.
• Qualitative relevance is used here: Cardinality (CT) = ++ (or 2) and Uniqueness (UQ) = + (or 1).
• OMS (Order-of-Magnitude based Similarity):
$$OMS = [\Sigma](CT, UQ, RV_{CT}, RV_{UQ}) = [\Sigma(CT, UQ, RV_{CT}, RV_{UQ})]_{S^{*}} = [2\,CT + UQ]_{S^{*}}$$
where
• $RV_{CT}$, $RV_{UQ}$: relevance degrees of CT and UQ, respectively,
• $\Sigma(.)$: real weighted sum,
• $[\Sigma(.)]$: qualitative expression of $\Sigma(.)$,
• $S^{*}$: OM space for expressing OMS values.

OM-based Model: Combining property measures (example)
CT = [medium, medium] and UQ = [moderate, high]
OMS = 2CT + UQ = (2×(0, 3] + (0, 2]) ∪ (2×(0, 3] + (2, 3]) = (0, 8] ∪ (2, 9] = (0, 9]

OM-based Model: Order-of-magnitude Similarity (OMS)
• Estimated with respect to the homogenised scale
• Described using the OM space S*, here with landmarks -1, 0, 6 and 9:
  VL = (-∞, -1], L = (-1, 0], M = (0, 6], H = (6, 9], VH = (9, +∞)
  OMS of (0, 9] = [M, H]
• A different S* can be used for the specific precision level required.

Terrorist Data
Terrorist data is extracted from online news and web stories, for example:
"Wanted Al-Qaeda chief Osama bin Laden and his top aide, Ayman al-Zawahri, have been witnessed ..."
"... Osama bin Laden and Ayman al-Zawahri, moved out of Pakistan and are believed to have crossed the border back into Afghanistan ..."
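How such text fragments turn into a weighted link network can be sketched as follows. This is only an illustration: the slides do not say how names are recognised or what counts as one co-occurrence, so the known-name list, the case-insensitive substring matching and the choice of one text fragment as the co-occurrence unit are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(fragments, names):
    """Count, for every pair of names, the number of fragments in which
    both names appear (edge weight = co-occurrence frequency)."""
    weights = Counter()
    for text in fragments:
        lowered = text.lower()
        # Assumed entity recognition: plain substring lookup of known names.
        present = sorted(n for n in names if n.lower() in lowered)
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights


# The two fragments quoted on the slide.
FRAGMENTS = [
    "Wanted Al-Qaeda chief Osama bin Laden and his top aide, "
    "Ayman al-Zawahri, have been witnessed ...",
    "... Osama bin Laden and Ayman al-Zawahri, moved out of Pakistan and "
    "are believed to have crossed the border back into Afghanistan ...",
]
# Hypothetical list of known names (vertices).
NAMES = ["Al-Qaeda", "Osama bin Laden", "Ayman al-Zawahri", "Afghanistan"]

weights = cooccurrence_weights(FRAGMENTS, NAMES)
for (u, v), w in sorted(weights.items()):
    print(f"{u} -- {v}: {w}")
```

With these assumptions, the pair that appears together in both fragments receives weight 2 and every other linked pair receives weight 1.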
[Figure: link network extracted from the text above, with vertices Al-Qaeda, Afghanistan, Ayman al-Zawahri and Osama bin Laden, and co-occurrence frequencies as edge weights.]

Example Data
[Figure: Terrorist example network with vertices Abu abdallah, Abu muhammad, September11 attack, Al qaida, Bin laden and Afghanistan, and weighted edges.]
[Figure: DBLP example network with vertices Chung-Hsing Yeh, Rowena Chen, Hepu Dong, Jisong Chen, Hepu Deng and Kate A. Smith, and weighted edges.]

OMS Performance: Different combination methods
• OMS: order-of-magnitude model with semi-supervised landmarks.
  For Terrorist, CT = {4, 7, 10, 23}, UQ = {0.05, 0.12, 0.27, 0.43, 1}
  For DBLP, CT = {2, 5, 9, 15}, UQ = {0.008, 0.04, 0.17, 0.31, 1}
• OMSH: OMS with human-directed landmarks.
  CT = {2, 6}, UQ = {0.1, 0.3, 0.6, 0.8}
• QT: numerical weighted summation.
Note that, again, the relevance degrees of CT and UQ are 2:1 here.

OMS Performance: With Terrorist data (Precision/Recall)
K = number of name-pairs with top values

Method   K=200        K=400        K=600        K=800        K=1,000
OMS      0.215/0.047  0.200/0.087  0.192/0.125  0.183/0.159  0.180/0.196
OMSH     0.045/0.009  0.143/0.062  0.151/0.099  0.134/0.120  0.138/0.150
QT       0.040/0.008  0.103/0.045  0.100/0.065  0.094/0.082  0.102/0.111

precision = #(disclosed alias pairs) / #(retrieved pairs)
recall = #(disclosed alias pairs) / #(alias pairs in dataset)

OMS Performance: With DBLP data (Precision/Recall)
K = number of name-pairs with top values

Method   K=100        K=200        K=300        K=400        K=500
OMS      0.040/0.174  0.025/0.217  0.017/0.217  0.015/0.261  0.020/0.435
OMSH     0.010/0.043  0.010/0.087  0.010/0.130  0.012/0.217  0.012/0.261
QT       0.010/0.043  0.005/0.043  0.007/0.087  0.010/0.174  0.010/0.217

OMS Performance: OMS against other link-based methods, with Terrorist data (Precision/Recall)
K = number of name-pairs with top values

Method   K=200        K=400        K=600        K=800        K=1,000
OMS      0.215/0.047  0.200/0.087  0.192/0.125  0.183/0.159  0.180/0.196
SimRank  0.000/0.000  0.000/0.000  0.002/0.001  0.001/0.001  0.002/0.002
PageSim  0.035/0.008  0.090/0.039  0.105/0.069  0.099/0.086  0.092/0.100

OMS Performance: OMS against other link-based methods, with DBLP data (Precision/Recall)
K = number of name-pairs with top values

Method   K=100        K=200        K=300        K=400        K=500
OMS      0.040/0.174  0.025/0.217  0.017/0.217  0.015/0.261  0.020/0.435
SimRank  0.000/0.000  0.005/0.043  0.007/0.087  0.005/0.087  0.006/0.130
PageSim  0.010/0.043  0.005/0.043  0.003/0.043  0.005/0.087  0.008/0.174

Conclusion
Contribution:
• OMS, as a combination of OM reasoning and link analysis, with (semi-supervised) data-driven determination of landmarks.
• Usually performs better than numerical link-based approaches.
• Improves the similarity measure by combining link properties.
• Allows explanation, for possible reduction of false positives.
Further Work:
• Evaluation with more relevant data.
• Learning of relevance degrees from data.
Acknowledgement: This research is supported by UK EPSRC grant EP/D057086.
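
As a recap of the combination step, the worked example from the OM-based Model slides (CT = [medium, medium], UQ = [moderate, high], relevance 2:1) can be reproduced with simple interval arithmetic. This is a minimal sketch under stated assumptions, not the authors' code. Each label is stored as a half-open interval (low, high], a label range is taken as the contiguous span from its lowest to its highest label (which equals the union of the per-label sums used on the slide, because adjacent labels share a landmark), and all function names are hypothetical.

```python
import math

INF = math.inf

# Homogenised OM scales from the slides, each label as (low, high].
CT_SCALE = {"small": (-INF, 0), "medium": (0, 3), "large": (3, INF)}
UQ_SCALE = {"very low": (-INF, -1), "low": (-1, 0), "moderate": (0, 2),
            "high": (2, 3), "very high": (3, INF)}
# OM space S* from the OMS slide: landmarks -1, 0, 6 and 9.
S_STAR = [("VL", -INF, -1), ("L", -1, 0), ("M", 0, 6),
          ("H", 6, 9), ("VH", 9, INF)]


def label_range(scale, low_label, high_label):
    """Interval (low, high] spanned by the OM value [low_label, high_label]."""
    return scale[low_label][0], scale[high_label][1]


def weighted_sum(ct, uq, rv_ct=2, rv_uq=1):
    """Interval form of rv_ct * CT + rv_uq * UQ (positive weights assumed)."""
    return (rv_ct * ct[0] + rv_uq * uq[0], rv_ct * ct[1] + rv_uq * uq[1])


def qualitative(interval, space=S_STAR):
    """Express an interval in S* as [lowest label, highest label] it overlaps."""
    low, high = interval
    # (low, high] and (a, b] overlap exactly when low < b and a < high.
    hits = [name for name, a, b in space if low < b and a < high]
    return hits[0], hits[-1]


ct = label_range(CT_SCALE, "medium", "medium")  # (0, 3]
uq = label_range(UQ_SCALE, "moderate", "high")  # (0, 3]
oms_interval = weighted_sum(ct, uq)
oms_labels = qualitative(oms_interval)
print(oms_interval, oms_labels)
```

The result agrees with the slide: the interval (0, 9] is expressed in S* as [M, H]. Choosing a coarser or finer S* changes only the `S_STAR` table.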
