Macro Trends in Counter-Terrorism Technologies
And Thoughts on Responsible Innovation

DETECTER Project, Brussels
September 7th, 2011

Jeff Jonas, IBM Distinguished Engineer
Chief Scientist, IBM Entity Analytics
JeffJonas@us.ibm.com

Today's Material

• Background
• Macro Trends
• Detecting Bad Guys in Big Data
• Challenging Privacy and Civil Liberties Issues
• Privacy by Design (PbD) Considerations
• Questions and Answers

Background

• Early 1980s: Founded Systems Research & Development (SRD), a custom software consultancy
• 1989 – 2003: Built numerous systems for Las Vegas casinos, including a technology known as Non-Obvious Relationship Awareness (NORA)
• 2001/2003: Funded by In-Q-Tel
• 2005: IBM acquires SRD; now Chief Scientist, IBM Entity Analytics
• Cumulatively: I have had a hand in a number of systems with multiple billions of rows describing hundreds of millions of entities

Roles

• Member, Markle Foundation Task Force on National Security in the Information Age
• Board Member, US Geospatial Intelligence Foundation (USGIF), the GEOINT organizing body
• Senior Associate, Center for Strategic and International Studies (CSIS)
• Member, EPIC advisory board
• Advisor, Privacy International

Current Primary Area of Interest

• Making sense of information in large data sets, across complex ecosystems, with emphasis on privacy and civil liberties protections
– 1996: Created an identity-centric customer repository based on 4,200 disparate systems … >100 million resolved identities
– 2001: Assisted in various post-9/11 data analysis programs for the public and private sectors
– 2005: Missing persons project following Hurricane Katrina, resulting in the reunification of >100 loved ones

A Late Bloomer to Privacy

1980 – 2001: No clue whatsoever
2001 – 2006: Slowly waking up
2007 – 2011: Today, at best, a student of privacy

A Journey Fraught with Reflection and Rethinking

The greater my privacy and civil liberties awareness, the greater the number of imperfections that appear in my rearview mirror.

Katrina – Missing Persons Reunification Project

• Information about the status of persons quickly ends up scattered across countless databases
– Over 50 such web sites/organizations were identified as having victim-related data
– Many people were registered multiple times in the same database
– Many people were registered multiple times across databases
– Many people were registered as missing in one database and as found in another
• Connecting found persons previously reported as missing becomes nearly impossible
– Too many databases
– Constantly changing data
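To show why duplicates and "missing here, found there" records make reunification hard, here is a minimal sketch that groups reports and flags people reported both missing and found. It makes a deliberately simple assumption: two reports describe the same person when their normalized name and date of birth match exactly. The field names and sample records are hypothetical, and real entity resolution needs fuzzier matching than this.

```python
from collections import defaultdict

def normalize_name(name: str) -> str:
    # Case-fold and collapse whitespace: "MARY  SMITH" -> "mary smith".
    return " ".join(name.lower().split())

def person_key(report: dict) -> tuple:
    # Hypothetical resolution key: exact normalized name plus date of birth.
    return (normalize_name(report["name"]), report["dob"])

def reunification_candidates(reports: list) -> dict:
    """Return persons reported both missing and found, with the sources involved."""
    statuses = defaultdict(set)
    sources = defaultdict(set)
    for r in reports:
        key = person_key(r)
        statuses[key].add(r["status"])
        sources[key].add(r["source"])
    return {key: sorted(sources[key])
            for key, seen in statuses.items()
            if {"missing", "found"} <= seen}

reports = [
    {"source": "site_a", "name": "Mary Smith", "dob": "1950-04-02", "status": "missing"},
    {"source": "site_a", "name": "MARY SMITH", "dob": "1950-04-02", "status": "missing"},
    {"source": "shelter_b", "name": "mary  smith", "dob": "1950-04-02", "status": "found"},
    {"source": "site_c", "name": "John Doe", "dob": "1962-11-30", "status": "missing"},
]

print(len({person_key(r) for r in reports}))  # 2 unique persons from 4 records
print(reunification_candidates(reports))
# {('mary smith', '1950-04-02'): ['shelter_b', 'site_a']}
```

The duplicate registration at `site_a` collapses into a single person. The match across `site_a` and `shelter_b` is the kind of connection that becomes "nearly impossible" when each database is searched on its own.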

Katrina Reunification Project Statistics

• Total data sources: 15
• Usable records: 1,570,000
• Unique persons: 36,815
• Total loved ones reunited: >100

Katrina – Missing Persons Reunification Project

• Privacy by Design (PbD)
– Contractually authorized to delete all the data after the reunification office completed its work
– Hence, a few months later, all collected data and reporting products were deleted

DESTRUCTION OF EVIDENCE!
Data Decommissioning – Destruction of Accountability

Macro Trends










Good News: The World is Not More Dangerous

[Figure, two panels. Avg Age: 37 in 1900 (Western Europe) vs. 67 today (global average). Number Dead: 75M (~17+%) in the 1300s “Black Death” vs. 300M (~4.5%) today if America sank into the ocean and everyone died.]

Prediction



Your doctor is 102

and this is not weird.






Bad News: “More Death Cheaper in Future” Graph

[Figure: Complexity of Execution plotted against Death; data points: 10 Kiloton Nuke, 1918 Spanish Influenza]

1918 Spanish Influenza Genome










“More Death Cheaper in Future” Graph

[Figure: the same axes (Complexity of Execution, Death) and data points (10 Kiloton Nuke, 1918 Spanish Influenza), annotated “Easier”, “More Death” and “= Bad”]

Jerome Kerviel – US$7B

[Image source: www.chinapost.com.tw/news_images/20080127/p1d.jpg]

Jerome Kerviel – US$7B

[Figure: a 1 Day timeline with two Analytic Checkpoints and the repeated sequence “Back it out, Back it in, Back it out, Back it in”]

2050 Predictions



A single person can

kill 100M people for

<$1,000.




State of the Union:

Enterprise Amnesia










Amnesia, definition



A defect in memory, especially resulting

from brain damage.










US National Security Amnesia Events

9/11
Two known terrorists were admitted into the US (discovered only after the fact).

Christmas Day Bomber
Abdulmutallab held a multiple-entry visa while he was on the terrorist watch list (discovered only after the fact).

Trend: Organizations Are Getting Dumber

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
~ Eric Schmidt, CEO Google

[Figure: Computing Power Growth, Available Observation Space and Sensemaking Algorithms plotted over Time; labels: Context, Enterprise Amnesia]

Trend: Organizations Are Getting Dumber

[Figure: the same curves (Computing Power Growth, Available Observation Space, Sensemaking Algorithms over Time), now annotated “WHY?” and “Context”]

Algorithms at Dead End.

You Can't Squeeze Knowledge Out of a Pixel.

No Context

[Image: a single email address, scrila34@msn.com]

Context, definition



Better understanding

something by taking into

account the things around it.






Information without

context







is hardly actionable.


Lack of Context – Consequences

• Alert queues grow faster than humans can address them, and they fill mostly with false positives
• The top item in the queue is not the most relevant item
• Items require so much investigative effort that they are often abandoned prematurely
• Risk assessment becomes the risk

Information in Context … and Accumulating

[Figure: the same email address, scrila34@msn.com, now linked to accumulated context: Job Applicant, Most Trusted Source, Known Terrorist, No Fly List]

The Puzzle Metaphor

• Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
• What it represents is unknown – there is no picture
• Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
• Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
• Some pieces may even be professionally fabricated lies
• Until you take the pieces to the table and attempt assembly, you don't know what you are dealing with

Puzzling: 4 Puzzles, 620 Useful Pieces

[Figure: piles of 270 pieces (90%), 200 pieces (66%) and 150 pieces (50%), plus 30 pieces (10%, duplicates) and 6 pieces (2%, pure noise): +36 Useless Pieces!]

First Discovery

More Data Finds Data

Duplicates in Front of Your Eyes

First Duplicate Found Here

Incremental Context – Incremental Discovery

6:40pm START
22min “Hey, this one is a duplicate!”
35min “I think some pieces are missing.”
37min “Looks like a bunch of hillbillies on a porch.”
44min “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”

[Figure: 150 pieces, 50%]

Incremental Context – Incremental Discovery

47min “We should take the sky and grass off the table.”
2hr “Let's switch sides, and see if we can make sense of this from different perspectives.”
2hr10m “Wait, there are three … no, four puzzles.”
2hr17m “We need a bigger table.”
2hr18m “I think you threw in a few random pieces.”

Trend: Big Data [in context] = New Physics

More data: better predictions
– Lower false positives
– Lower false negatives

More data: bad data … good
– Suddenly glad your data was not perfect

More data: less compute

From Pixels to Pictures to Insight

[Figure: Observations → Contextualization → Persistent Context → Relevance Detection → Consumer (an analyst, a system, the sensor itself, etc.)]

One Form of Context is “Expert Counting”

• Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
• Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
• If one cannot count … one cannot estimate vector or velocity (direction and speed).
• Without vector and velocity … prediction is nearly impossible.
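The counting question can be made concrete with a minimal sketch. It assumes that two accounts belong to the same entity whenever they share any identifier, and it groups them with a union-find structure. The identifiers are hypothetical. The rule is transitive, so account A and account C can end up in one group through an intermediate account B.

```python
def count_entities(accounts):
    """Count distinct entities, treating accounts that share any identifier
    (phone, passport, email, ...) as belonging to the same entity."""
    parent = list(range(len(accounts)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # identifier -> index of the first account carrying it
    for i, identifiers in enumerate(accounts):
        for ident in identifiers:
            if ident in first_seen:
                parent[find(i)] = find(first_seen[ident])  # union the two groups
            else:
                first_seen[ident] = i
    return len({find(i) for i in range(len(accounts))})

# Five accounts (hypothetical identifiers).
accounts = [
    {"phone:555-0101"},
    {"phone:555-0101", "passport:X123"},
    {"passport:X123", "email:a@example.com"},
    {"email:a@example.com"},
    {"phone:555-0199"},
]
print(count_entities(accounts))  # 2 entities, not 5
```

Counting rows would report five people. The chain of shared identifiers reduces that to two.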

Skilled adversaries engage in “channel separation.”

Cell Phone #1: Unknown
Cell Phone #2: Unknown
Bank Acct #1: Billy K.
Passport #1: William A.

Hence, detection requires “channel consolidation.”

William A. aka Billy K.
• Cell Phone #1
• Cell Phone #2
• Bank Acct #1
• Passport #1

Expert Counting: Degrees of Difficulty

From hardest to easiest (pairs of records being compared):
• Deceit: Bob Jones, 123455 vs. Ken Wells, 550119
• Incompatible Features: Bob Jones, 123455 vs. bjones@hotmail
• Fuzzy: Bob Jones, 123455 vs. Robert T Jonnes, 000123455
• Exactly Same: Bob Jones, 123455 vs. Bob Jones, 123455
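The four degrees of difficulty can be sketched as a pairwise comparison. This is a sketch under stated assumptions, not IBM's matching logic. The nickname table is a tiny hypothetical sample, leading zeros are stripped from identifiers, and last names are compared with Python's `difflib.SequenceMatcher` using an arbitrary 0.8 threshold.

```python
from difflib import SequenceMatcher

# Tiny, hypothetical nickname table; real systems use far larger name resources.
NICKNAMES = {"bob": "robert", "bobby": "robert", "bill": "william", "billy": "william"}

def norm_name(name):
    tokens = name.lower().replace(".", "").split()
    first, last = tokens[0], tokens[-1]  # ignore middle names/initials
    return NICKNAMES.get(first, first), last

def norm_id(value):
    return value.lstrip("0")  # "000123455" -> "123455"

def compare(a, b, threshold=0.8):
    """Classify a pair of records by how hard they are to count correctly."""
    if a == b:
        return "exact"
    if not {"name", "id"} <= (a.keys() & b.keys()):
        return "incompatible"  # no shared features to compare
    fa, la = norm_name(a["name"])
    fb, lb = norm_name(b["name"])
    last_similarity = SequenceMatcher(None, la, lb).ratio()
    if fa == fb and norm_id(a["id"]) == norm_id(b["id"]) and last_similarity >= threshold:
        return "fuzzy match"
    return "no match"

bob = {"name": "Bob Jones", "id": "123455"}
print(compare(bob, {"name": "Bob Jones", "id": "123455"}))           # exact
print(compare(bob, {"name": "Robert T Jonnes", "id": "000123455"}))  # fuzzy match
print(compare(bob, {"email": "bjones@hotmail"}))                     # incompatible
print(compare(bob, {"name": "Ken Wells", "id": "550119"}))           # no match
```

The deceit pair comes back as "no match", and that result is the point of the slide. A pairwise comparison cannot see through deliberate deceit. That takes accumulated context, which the next slide covers.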

Deceit Detection Using Context Accumulation

[Figure: the Deceit pair, Bob Jones (123455) and Ken Wells (550119), cannot be matched directly. Feature accumulation adds records over time: Robert Jones, 123455, POB 13452, DOB 03/12/73; Ken Wells, 550119, POB 999911, DOB 03/12/73; Bob Jones, gw3e56@hotmail.com, POB 13452; and gw3e56@hotmail.com appearing again among the accumulated features. Result: Robert Jones (123455) and Ken Wells (550119): Resolved!]
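Context accumulation can be sketched as an incremental resolver. Each arriving record attaches to whatever entity already shares a linking feature with it. When one record touches two entities, those entities merge. The record stream below is illustrative and adapted from the values on the slide. The choice of which features link records is an assumption: identifier, POB and email link, while a shared date of birth alone does not. `Resolver` is a hypothetical class, not a product API.

```python
# Assumption: only these feature types link records; a shared DOB alone is too common.
LINKING = {"id", "pob", "email"}

class Resolver:
    """Incrementally accumulate records into entities (a sketch, not a product API)."""

    def __init__(self):
        self.entities = {}  # entity id -> set of (feature type, value) pairs
        self.index = {}     # linking feature -> entity id
        self.next_id = 0

    def add(self, record):
        features = set(record.items())
        hits = {self.index[f] for f in features if f[0] in LINKING and f in self.index}
        if hits:
            eid = min(hits)
            for other in hits - {eid}:  # this record bridges entities: merge them
                self.entities[eid] |= self.entities.pop(other)
        else:
            eid, self.next_id = self.next_id, self.next_id + 1
            self.entities[eid] = set()
        self.entities[eid] |= features
        for f in self.entities[eid]:  # re-point every linking feature at the survivor
            if f[0] in LINKING:
                self.index[f] = eid
        return eid

r = Resolver()
stream = [  # illustrative values adapted from the slide
    {"name": "Bob Jones", "id": "123455"},
    {"name": "Ken Wells", "id": "550119"},
    {"name": "Robert Jones", "id": "123455", "pob": "13452", "dob": "03/12/73"},
    {"name": "Ken Wells", "id": "550119", "pob": "999911", "dob": "03/12/73"},
    {"name": "Bob Jones", "email": "gw3e56@hotmail.com", "pob": "13452"},
]
for rec in stream:
    r.add(rec)
print(len(r.entities))  # 2: still looks like two people
r.add({"name": "Ken Wells", "id": "550119", "email": "gw3e56@hotmail.com"})
print(len(r.entities))  # 1: resolved
```

No single record proves that Bob Jones and Ken Wells are one person. The link shows up only after enough features have accumulated and a later record bridges the two entities.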

3 Models for

Information Sharing










1. Bulk Transfer

• Large collections are passed along to appropriate third parties
• May be required if the recipient must commingle the data in secret
• The recipients must have capacity much larger than their own native requirements
• The more copies, the more difficult it is to keep the information current across the ecosystem
• The more copies, the more difficult it is to prevent unintended disclosure
• Useful when the number of recipients and the transactional volumes are very small

2. Services for Inquiry

• Owners enable third-party inquiry (human or machine lookups)
• When many systems are integrated, federated search can be automated to search all third-party data sources from a single user/machine search
• Each system in the federation must be sized for the entire federation's query volume
• Third-party systems often lack the necessary indexes
• Nearly impossible to ensure every federated system is online
• Useful for periodic, on-demand inquiry using each third-party data source like a reference system – particularly appropriate for narrow investigative work and/or forensic analysis
• Not that useful for detect/preempt missions

3. Central Catalog/Index

• Parties interested in information sharing supply metadata to a central catalog (index)
• Inquiries can discover the location of all available documents using a single lookup
• Card catalogs provide pointers to source systems and documents, enabling efficient/scalable lookup (aka federated fetch)
• Easier to keep the data current than with bulk transfer
• Scales massively
• Easier to secure
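Here is a minimal sketch of the central catalog model. The catalog holds only prescribed metadata fields, and each entry points back to the original holder. Fetching the full record (the "federated fetch") goes through the holder, which decides at request time whether to release it. The `Holder` and `Catalog` classes, the field names and the permission rule are all hypothetical illustrations.

```python
from dataclasses import dataclass, field

@dataclass
class Holder:
    """An original data holder; the full record never leaves without its permission."""
    name: str
    records: dict
    allowed: set = field(default_factory=set)  # requesters this holder approves

    def fetch(self, record_id, requester):
        if requester not in self.allowed:
            raise PermissionError(f"{self.name} declined request from {requester}")
        return self.records[record_id]

class Catalog:
    """Central index of prescribed metadata fields, holding pointers only."""

    def __init__(self):
        self.index = {}  # (field, value) -> list of (holder, record_id) pointers

    def publish(self, holder, record_id, metadata):
        for key in metadata.items():
            self.index.setdefault(key, []).append((holder, record_id))

    def discover(self, field_name, value):
        return self.index.get((field_name, value), [])

agency = Holder("agency_a",
                {"r-17": {"who": "William A.", "passport": "P-1", "notes": "full record"}},
                allowed={"analyst_1"})
catalog = Catalog()
catalog.publish(agency, "r-17", {"who": "William A.", "what": "passport"})

holder, record_id = catalog.discover("who", "William A.")[0]  # one lookup finds the holder
print(holder.name, record_id)                                 # agency_a r-17
print(holder.fetch(record_id, "analyst_1")["passport"])       # P-1
```

Only the `who`/`what` metadata is centralized. The passport number and notes stay with `agency_a` until it approves a specific request. Keeping the full record at the source is also what makes this model easier to keep current and to secure than bulk transfer.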

Discovery at the Library

[Figure: a library card catalog, searchable by Subject, Title, Author]

Enterprise Discovery

[Figure: an enterprise catalog, searchable by Who, What, Where, When, How]

The Policy Focus Becomes … “Discoverability”

If you don't publish your metadata (who, what, where, when) to the enterprise catalog …
Information is not discoverable …
Therefore, the value of your operational system to the broad strategic interests of the enterprise is effectively ZERO!

Are You Playing Well With Others?

SHARING SCORECARD(*): DISCOVERABILITY

Organization      Records   Discoverable   %
This org          5B        2.5B           50%
That org          120B      6B             5%
The other org     3B        1B             33%
Their org         1B        750M           75%
Their other org   1B        500M           50%

(*) Any resemblance to real organizations and real numbers would be coincidental

Challenging

Privacy and Civil Liberties

Issues










Issue #1: Essential Secrets vs. Transparency

• To detect professionally fabricated lies using only data, one must either:
1. Collect observations the adversary doesn't know you have
2. Or, be able to compute over your observations in a manner the adversary cannot fathom
• The Challenge: How can organizations catch bad guys if there is transparency over their observational space and over what is computable?

Issue #2: More Data Good

• The good news: Those in the counterterrorism business and the privacy community equally detest false positives
– The government recognizes that false positives waste government resources
– The privacy community recognizes that false positives place the innocent under undeserved government scrutiny
• The challenge: There are two remedies for false positives
1. Change the rules to reduce the number of alerts (which increases the false negatives)
2. Add more information, so that the additional context permits greater discrimination
• The more data, the lower the false positives and the lower the false negatives
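Remedy 2 can be illustrated with a toy watch-list check. The records are hypothetical, and exact matching on the chosen fields is a simplifying assumption. Matching on name alone flags every namesake. Adding a date of birth, one more piece of context, removes the false positives without dropping the true hit, whereas remedy 1 would simply suppress the alerts.

```python
def alerts(travelers, watch_list, fields):
    """Flag travelers whose chosen fields exactly match a watch-list entry."""
    keys = {tuple(entry[f] for f in fields) for entry in watch_list}
    return [t for t in travelers if tuple(t[f] for f in fields) in keys]

# Hypothetical data: one listed person and two namesakes.
watch_list = [{"name": "john smith", "dob": "1970-01-01"}]
travelers = [
    {"name": "john smith", "dob": "1970-01-01"},  # the listed person
    {"name": "john smith", "dob": "1985-06-15"},  # namesake
    {"name": "john smith", "dob": "1991-09-09"},  # namesake
]

print(len(alerts(travelers, watch_list, ["name"])))         # 3 alerts, 2 of them false positives
print(len(alerts(travelers, watch_list, ["name", "dob"])))  # 1 alert, no false positives
```

The extra field helps only if the watch-list entry records it accurately. A wrong date of birth on the list would turn the true hit into a false negative.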

Issue #3: Necessity of Central Indexes

• Federated search is extremely limited
– Does not scale when the mission is to get “left of boom” (detection)
• Central card catalogs (indexes) are the only viable way forward
– Only the metadata is centralized, with pointers to the data; not all the data
• The Challenge: The general reaction to central databases, even if they are just an index

Issue #4: Lone Gunmen Surveillance

• Rare events planned by one person or a small group are more difficult to detect
• The size of the observation space needed to detect lone gunmen planning acts of terrorism … approaches ubiquitous surveillance
• Risk-based surveillance
– A car bomb in a public place
– A sector of national infrastructure at risk
– WMD over a major city
• The Challenge: At some point, when one person can create extraordinary damage, cheaply, without a trace … then what?

Issue #5: Fewer Secrets Lead to Chilling Effects?

• It is becoming harder and harder to have secrets
• Will this chill behavior?
– Will population behavior gravitate towards the center of the bell curve?
– Or, will mankind become more tolerant of diversity?

Privacy by Design (PbD)

Considerations










Universal Declaration of Human Rights

• Article 9
No one shall be subjected to arbitrary arrest, detention or exile.
• Article 12
No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
• Article 15
(1) Everyone has the right to a nationality.
(2) No one shall be arbitrarily deprived of his nationality nor denied the right to change his nationality.
• Article 17
(1) Everyone has the right to own property alone as well as in association with others.
(2) No one shall be arbitrarily deprived of his property.

PbD: Information Attribution

• Avoid the receipt of any data that does not come with an ability to track its pedigree/attribution.
• When passing your data into secondary systems, pass the data pedigree/attribution along to the recipient (even if that means only a pointer to your copy).
• If the “chain of where data came from” is not maintained in the information sharing ecosystem, there is no hope of keeping the data current, and reconciling cross-system consistency becomes very difficult.

More here:
Source Attribution, Don't Leave Home Without It
Out-bound Record-level Accountability in Information Sharing Systems
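Here is a minimal sketch of attribution that travels with the data. It assumes each record carries an immutable pedigree, a chain of hypothetical (system, local record id) hops. Every forward extends that chain instead of dropping it, and data arriving without any pedigree is refused. The system names and ids are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    data: dict
    pedigree: tuple = ()  # (system, local record id) hops, oldest first

def accept(record):
    """Refuse data that arrives without attribution."""
    if not record.pedigree:
        raise ValueError("record has no pedigree/attribution")
    return record

def forward(record, system, local_id):
    """Pass a record downstream, extending rather than dropping its pedigree."""
    return replace(record, pedigree=record.pedigree + ((system, local_id),))

source = accept(Record({"name": "Ken Wells"}, (("registry", "A-17"),)))
copy_2 = forward(source, "analysis_hub", "H-9")
copy_3 = forward(copy_2, "partner_x", "X-3")
print(copy_3.pedigree)
# (('registry', 'A-17'), ('analysis_hub', 'H-9'), ('partner_x', 'X-3'))
print(copy_3.pedigree[0])  # ('registry', 'A-17'): where to go for the current version
```

A third-hand copy can still name the system of record it came from. That chain is what makes it possible to refresh the copy or reconcile it with its source.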

PbD: Data Destruction

• When the data is no longer needed or there is a mandate … purge it.
• For example, at the close of a special information analysis project, consider decommissioning the data sets in proportion to the consequences of unintended disclosure or misuse.
• If there is a legal requirement to retain data, or long-term accountability is necessary, consider moving the data into forms of retrieval useful only for forensic/investigatory purposes.

More here:
Decommissioning Data: Destruction of Accountability

PbD: Limit Data Transfers

• If you don't have to move the entire record: don't.
• Using information sharing systems as an example, it is best not to send all the data to each (and every) information sharing partner. Better to create a central index with prescribed fields. The index then points to the original data holder, and getting access to the original record requires permission, at that time, from the original data holder. This ensures a degree of transparency.

More here:
Discoverability: The First Information Sharing Principle

PbD: Data Tethering

• When data is moved from systems of record into secondary systems, these secondary systems should be notified as the source data changes (adds, changes and deletes).
• If the secondary systems have themselves forwarded the data to tertiary systems, these same changes should be passed through the entire food chain.

More here:
Data Tethering: Managing the Echo
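Tethering can be sketched as change propagation. Each system remembers which systems it fed, and every add, change or delete is echoed down that chain. The `System` class and its method names are hypothetical. The sketch also assumes the forwarding graph has no cycles; a real implementation would need to guard against them.

```python
class System:
    """A system holding copies of records, tethered to the systems it fed."""

    def __init__(self, name):
        self.name = name
        self.records = {}
        self.downstream = []  # systems this one has forwarded data to

    def forward_to(self, other):
        self.downstream.append(other)
        other.records.update(self.records)  # initial copy

    def apply(self, op, key, value=None):
        """Apply an add/change/delete locally, then echo it down the food chain."""
        if op in ("add", "change"):
            self.records[key] = value
        elif op == "delete":
            self.records.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        for system in self.downstream:  # assumes no forwarding cycles
            system.apply(op, key, value)

source = System("system_of_record")
secondary = System("secondary")
tertiary = System("tertiary")
source.apply("add", "p1", {"address": "123 Main Street"})
source.forward_to(secondary)
secondary.forward_to(tertiary)

source.apply("change", "p1", {"address": "456 Oak Avenue"})
print(tertiary.records["p1"])  # {'address': '456 Oak Avenue'}
source.apply("delete", "p1")
print("p1" in tertiary.records)  # False
```

The delete matters most. Without tethering, the tertiary copy would outlive the source record indefinitely.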

PbD: Obfuscate Data

• With every copy, the risk of unintended disclosure increases.
• When there is an opportunity to perform data masking, anonymization, encryption … do it.
• Techniques now exist whereby data can first be obfuscated (e.g., encrypted, anonymized, masked) before information transfer ... while still retaining the capability to perform deep analytics (e.g., data matching) after obfuscation.

More here:
To Anonymize or Not Anonymize, That is the Question

Maximizing Discovery - Minimizing Disclosure

[Figure: Sensors feed Observations into Persistent Context, which holds only obfuscated feature values (Cd5dced41028cb …, 00c9782a552a2 …, 7f2b6e48ea7d0 …). Employee Database Record #A-701 (0d06b31faa7c…, B5e341a4b0c…, 00c9782a552…) and Fraud Database Record #B-9103 (FEATURES: Cd5dced41028cb7ea51, 00c9782a552a2d09b1b, 7f2b6e48ea7d042bbe8) share the value beginning 00c9782a552, which raises an alert (!)]

Maximizing Discovery - Minimizing Disclosure

[Figure: the same flow in cleartext, behind Policy Controls. Employee Database Record #A-701: Mark Randy Smith, DOB: 06/07/74, 123 Main Street, 713 731 5577. Fraud Database Record #B-9103: M. Randal Smith, DOB: 06/07/74, 713 731 5577. Discovery: Employee Record #A-701 Matches Record #B-9103]
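Here is a minimal sketch of one way to match records after obfuscation. It assumes the participants agree on normalization rules and a shared secret key. Each holder then publishes only keyed one-way hashes (HMAC-SHA256) of its normalized features. This sketch does not describe how IBM's products do it. The key and feature choices are hypothetical. The sample values come from the slide.

```python
import hashlib
import hmac

# Hypothetical key shared by the participants; real deployments need proper key management.
SHARED_KEY = b"example-shared-secret"

def digits(text):
    return "".join(ch for ch in text if ch.isdigit())

def anonymize(record):
    """Return keyed one-way hashes of normalized features; cleartext stays with the holder."""
    features = {
        "name": " ".join(record["name"].lower().split()),
        "dob+phone": digits(record["dob"]) + "|" + digits(record["phone"]),
    }
    return {kind: hmac.new(SHARED_KEY, value.encode(), hashlib.sha256).hexdigest()
            for kind, value in features.items()}

employee = {"id": "A-701", "name": "Mark Randy Smith", "dob": "06/07/74", "phone": "713 731 5577"}
fraud = {"id": "B-9103", "name": "M. Randal Smith", "dob": "06/07/74", "phone": "713 731 5577"}

hashed_a, hashed_b = anonymize(employee), anonymize(fraud)
shared = sorted(kind for kind in hashed_a if hashed_a[kind] == hashed_b[kind])
print(shared)  # ['dob+phone']
if shared:
    print(f"Discovery: record {employee['id']} matches record {fraud['id']}")
```

The names hash differently, but the combined DOB and phone feature collides, so the match is discovered without either side revealing cleartext. Only the record ids surface, and policy controls can then govern any disclosure of the underlying records. Fields such as DOB and phone have few possible values, so anyone holding the key can recover them by brute force. Protecting the key carries real weight in this design.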

PbD: Build Accountability into Systems

• Opt for the use of tamper-resistant audit logs. The less transparency there is, the greater the need for immutable logs, mandated or not.

More here:
Immutable Audit Logs (IALs)
Found: An Immutable Audit Log
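One common way to make an audit log tamper-evident is a hash chain. Each entry includes the SHA-256 hash of the entry before it, so editing or reordering any past entry breaks every later link. This is a minimal sketch of that general technique, not the specific design in the posts listed above. The event fields are hypothetical.

```python
import hashlib
import json

class AuditLog:
    """Append-only log; each entry commits to the hash of the entry before it."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev, event):
        body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, event):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"prev": prev, "event": event, "hash": self._digest(prev, event)})

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.append({"user": "analyst_1", "action": "query", "subject": "Record #B-9103"})
log.append({"user": "analyst_1", "action": "view", "subject": "Record #A-701"})
print(log.verify())  # True
log.entries[0]["event"]["action"] = "none"  # tamper with history
print(log.verify())  # False
```

A hash chain alone does not detect entries quietly removed from the end. Deployments typically publish or countersign the latest hash somewhere the log's operator cannot rewrite.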

Comments on: Data Mining

• Data mining is not bad. There are settings where data mining is very valuable and saves lives
• Predictive Data Mining – Limited efficacy without volumes of training data
• Predicate Triage Data – Used to organize data sets containing only “subjects of interest”

More here:
Effective Counter-Terrorism and the Limited Role of Predictive Data Mining
Data Mining, Predicate Triage and NSA Domestic Surveillance

Data Mining Defined (humorous)









“Torturing the data until it confesses …

and if you torture it enough, you can

get it to confess to anything.”

ACM SIGKDD Conference, Philadelphia 2006










Comments on: Link Analysis

• Link analysis is very powerful when used in a narrow fashion: inspecting outward from “subjects of interest.”
• Predicate-based link analysis: Big social maps are not useful unless one has an entry point.
• Link analysis: prune early

More here:
Hunting Bad Guys, Phone Records and a Few Good Dead Men
Predicate-based Link Analysis: A Post 9/11 Analysis (1+1= 13)
Sometimes a Big Picture is Worth a 1,000 False Positives
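Predicate-based link analysis with early pruning can be sketched as a bounded breadth-first expansion from a subject of interest. Two assumptions make "prune early" concrete here: expansion stops after `max_hops`, and nodes with more than `max_degree` links are reached but never expanded through. Both parameters and the call graph are hypothetical illustrations.

```python
from collections import deque

def expand_from_subject(graph, subject, max_hops=2, max_degree=50):
    """Breadth-first expansion outward from a subject of interest, pruning early:
    stop at max_hops and do not expand through hubs with more than max_degree links."""
    hops = {subject: 0}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        neighbors = graph.get(node, [])
        if node != subject and len(neighbors) > max_degree:
            continue  # a hub links nearly everyone, so expanding it adds noise
        for neighbor in neighbors:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical call graph: "pizza_shop" is a hub called by 1,000 customers.
graph = {
    "suspect": ["phone_b", "pizza_shop"],
    "phone_b": ["suspect", "phone_c"],
    "phone_c": ["phone_b", "phone_d"],
    "pizza_shop": ["suspect"] + [f"customer_{i}" for i in range(1000)],
}
print(expand_from_subject(graph, "suspect"))
# {'suspect': 0, 'phone_b': 1, 'pizza_shop': 1, 'phone_c': 2}
```

Without the degree cut, the two-hop neighborhood would pull in all 1,000 customers, and each of them would be a potential false positive.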

Comments on: Watch Listing and False Positives

• Distinguish between being wrongly named and being wrongly matched
• Low-fidelity watch lists are the single biggest cause of false positives; resolving this ambiguity requires additional data
• Minimize collection, maximize consumer participation and election
• Provide a redress process

More here:
Precision in TSA's Terrorist Watch List
Comments on the TSA No-Fly and Selectee Watch List Process

Closing Thoughts










“The data must find the data … and the relevance must find the user.”

In Closing

• There are going to be more sensors and more data
• This data will be commingled for greater accuracy, to serve consumers and protect countries
• What data is collected/observed, and when … will be the debate
• Chief privacy principle: Avoid consumer surprise
• If it has been collected, the holder has the obligation to make sense of it
• Organizations must harness data to be smart, efficient, and survive … but how smart do they need to be, and do we trust them?
• Hence the tension

Related Papers

Heritage Foundation: Paul Rosenzweig/Jeff Jonas
Correcting False Positives: Redress and the Watch List Conundrum

Cato Institute: Jeff Jonas/Jim Harper
Effective Counterterrorism and the Limited Role of Predictive Data Mining

Steptoe & Johnson: Stewart Baker
Anonymization, Data-Matching and Privacy: A Case Study

IEEE Security and Privacy: Jeff Jonas
Threat and Fraud Intelligence: Las Vegas Style

Giannino Bassetti Foundation: Jeff Ubois
Transparency, Privacy and Responsibility: An Interview with Jeff Jonas

Markle Foundation
Nation At Risk: Policy Makers Need Better Information to Protect the Country

Related Blog Posts

Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Puzzling: How Observations Are Accumulated Into Context
When Risk Assessment is the Risk
Big Data. New Physics.
The Christmas Day Intelligence Failure – Part II: Jeff Jonas' Christmas Wish List
Decommissioning Data: Destruction of Accountability
Source Attribution, Don't Leave Home Without It
Data Tethering: Managing the Echo
Out-bound Record-level Accountability in Information Sharing Systems
To Anonymize or Not Anonymize, That is the Question
Immutable Audit Logs (IALs)
The Information Sharing Paradox
Discoverability: The First Information Sharing Principle
When Federated Search Bites
Using Transparency As A Mask



