CRIME DATA MINING AND VISUALIZATION

Nemallapudi Chaitanya, Sunkara Anish, ELGammal Mahmoud
Agenda
   Overview
   Objective
   Motivation
   Approach
   Design
   Mining
   Visualization
Overview
   Implementation of a Crime Data Mining and
    Visualization application
   Use of mining and visualization techniques
     PostGIS
     Google Maps API
     WEKA
   Client-server approach using Java, JavaScript, and
    XML
Objective
   Implement data mining methodologies on
    crime data
   Provide visualizations for a better understanding of the
    data
   The work is based on publicly available dispatch reports from:
     City of Falls Church
     Fairfax County
     Arlington County
Motivation
   Data mining has proven to be a useful methodology for
    surfacing analytical insight that traditional methods
    normally miss.
   Because it can draw conclusions from many
    perspectives, it can be used to:
     Identify crime trends and patterns/series
     Assist law-enforcement agencies in resource planning
     Aid the investigation process by offering a different perspective
Approach
   Collect publicly available crime data
   Parse the useful data and load it into a database
   Use a spatial database to get the coordinates of
    crime locations, criminal locations, etc.
   Use mining algorithms (DBScan, K-means,
    and EM) to analyze trends in the data
   Use Google Maps to show the crime data by
    location
   Use prefuse visualization to show graphs based on the
    data collected
Input (Data sets)
 Quality is an important characteristic of any data set
 Challenges:
  Some attributes are difficult to extract (e.g. "… juvenile BMs
  1517yo wearing dark clothing …")
  Missing values (criminal age is not specified in every
  description)
  In some cases, latitude and longitude values are
  swapped.
 Develop a crime ontology
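Two of the challenges above, swapped coordinates and ages buried in free text, can be illustrated with a minimal Python sketch. The bounding box, the function names, and the "NNyo" age pattern are assumptions made for this example; the slides do not describe the project's actual cleaning rules.

```python
import re

# Hypothetical bounding box around the three Northern Virginia jurisdictions
# (an assumption for this sketch, not a value taken from the project).
LAT_RANGE = (38.5, 39.2)
LNG_RANGE = (-77.6, -76.9)

def in_box(lat, lng):
    """True when the pair falls inside the assumed study area."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LNG_RANGE[0] <= lng <= LNG_RANGE[1])

def fix_swapped(lat, lng):
    """Return (lat, lng), swapping the pair only when the swap is plausible."""
    if in_box(lat, lng):
        return lat, lng
    if in_box(lng, lat):
        return lng, lat
    return None  # neither order is plausible; leave for manual review

# Matches ages written like "17yo" or "17 yo"; anything else yields None,
# which models the missing-value case.
AGE_RE = re.compile(r"\b(\d{1,2})\s?yo\b")

def extract_age(description):
    match = AGE_RE.search(description)
    return int(match.group(1)) if match else None
```

Descriptions that do not fit the simple pattern are reported as missing rather than guessed, which keeps the cleaning step conservative.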
Data Preprocessing
   Dimensionality reduction is needed to:
     Reduce the time and memory required by data
      mining algorithms
     Allow the data to be visualized more easily
     Help eliminate irrelevant features or reduce noise
   Aggregation was implemented:
     863 crime types were reduced to 45 crime types
     Related crimes are grouped (e.g. Burglary Commercial and
      Burglary Residential are both classified as Burglary)
   Crime information extraction (XML parsing)
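The aggregation step can be pictured as a lookup from raw crime-type strings to a smaller set of categories. The sketch below is only an illustration: the keyword rules, the "Other" fallback, and the function name are hypothetical, since the slides do not list the real 863-to-45 mapping.

```python
# Hypothetical keyword-to-category rules; the project's actual mapping
# from 863 raw types to 45 categories is not given in the slides.
CATEGORY_KEYWORDS = [
    ("burglary", "Burglary"),
    ("auto theft", "Auto Theft"),
]

def aggregate_crime_type(raw_type):
    """Map a raw crime-type string to a coarse category."""
    # Normalize case, hyphens, and repeated whitespace before matching.
    normalized = " ".join(raw_type.lower().replace("-", " ").split())
    for keyword, category in CATEGORY_KEYWORDS:
        if keyword in normalized:
            return category
    return "Other"  # fallback label, also an assumption
```

With these rules, "Burglary Commercial" and "Burglary Residential" both map to "Burglary", matching the example on the slide.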
System Architecture

 [Diagram components: Home Page (input/request), Servlet, Database (data),
 Mining, Map Server, and Visualization output (charts, maps)]
Design
   Data Cleansing
     Automated grouping of the 863 crime types in the raw data
      into 45 final crime types
     A parser reads the data from the file, cleans up missing
      information, handles nulls, assigns data types, and
      loads the result into the database
   Data Model
     Indexes are added to the fields used most often in
      queries to improve performance.
   Column names:
     Id                           Zip
     Dataset                      Criminal_age
     Crime_type                   Criminal_gender
     Description                  Victim_age
     Crime_time                   Victim_gender
     address                      Crime_latlng
     Criminal_address
     Criminal_description
Indices
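As a rough sketch of the data model, the column names above can be turned into a table with a couple of indexes. SQLite is used here only so the example runs standalone; the project itself uses PostgreSQL with PostGIS. The table name, the column types, and the choice of indexed columns are all assumptions.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL/PostGIS in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crime (              -- table name is hypothetical
    Id INTEGER PRIMARY KEY,
    Dataset TEXT,
    Crime_type TEXT,
    Description TEXT,
    Crime_time TEXT,
    address TEXT,
    Criminal_address TEXT,
    Criminal_description TEXT,
    Zip TEXT,
    Criminal_age INTEGER,
    Criminal_gender TEXT,
    Victim_age INTEGER,
    Victim_gender TEXT,
    Crime_latlng TEXT             -- TEXT is a stand-in for a spatial type
);
-- Which fields were actually indexed is not stated; these two are guesses.
CREATE INDEX idx_crime_type ON crime (Crime_type);
CREATE INDEX idx_crime_dataset ON crime (Dataset);
""")

# List the indexes that now exist on the table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('crime')")]
```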
Data Mining
   Several APIs are available for data mining.
     Ex: WEKA, Java Data Mining Package (JDMP),
      RapidMiner (YALE)
   WEKA
     WEKA is a machine learning and data mining
      software tool written in Java
     Open source, well documented, with support for
      visualization
Data Mining Functions Implemented

   WEKA works with the Attribute-Relation File Format
    (ARFF).
   Filters:
     Supervised: interface for filters that make use of a
      class attribute.
       Ex: Discretize, NominalToBinary, Resample
     Unsupervised: interface for filters that do not need a
      class attribute.
       Ex: Standardize, StringToNominal, StringToWordVector
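For readers who have not seen ARFF, the following minimal Python sketch writes a tiny two-attribute file in that format. The relation name, attribute names, and rows are invented for illustration, and the helper handles only one nominal and one numeric attribute.

```python
def quote(value):
    """Quote a nominal value that contains spaces, as ARFF requires."""
    return f"'{value}'" if " " in value else value

def to_arff(relation, nominal_attr, numeric_attr, rows):
    """Render (label, number) rows as ARFF text."""
    labels = sorted({label for label, _ in rows})
    lines = [
        f"@relation {relation}",
        "",
        # A nominal attribute lists its allowed values inside braces.
        f"@attribute {nominal_attr} {{{','.join(quote(l) for l in labels)}}}",
        f"@attribute {numeric_attr} numeric",
        "",
        "@data",
    ]
    lines += [f"{quote(label)},{number}" for label, number in rows]
    return "\n".join(lines) + "\n"

# Hypothetical sample rows: (crime type, hour of day).
arff_text = to_arff("crimes", "crime_type", "hour",
                    [("Burglary", 14), ("Auto Theft", 2)])
```

The header declares each attribute and its type, and the rows after `@data` are comma-separated values in the same order.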
Functions Available
   Supported classification functions
     Ex: NaiveBayes, RandomizableClassifier
   Supported association functions
     Ex: Apriori, Associator
   Supported clustering algorithms
     Ex: Simple K-means, DBScan, EM
   Outlier detection based on location is facilitated by
    PostGIS
Implementation
   The data mining part is run separately on each of
    the 3 datasets
     The attributes vary between the datasets.
     Size differences between the datasets would otherwise
      skew the results.
       Ex: 725 records in Falls Church, compared to 11507 records
        in Fairfax (1705 records dealing with Auto-Theft)
   Fetching data
     Data is fetched from the database or CSV files using
      WEKA functions.
Implementation Continued…
   Filtering
       Unwanted attributes are filtered (removed) from the
        working data set. Ex: Criminal Description etc.
   Clustering
     The DBScan, K-means, and EM algorithms are run through the
      WEKA API.
     Simple K-means:
         Takes a range of values of K (say 1 to 45), since 45 is the
          number of different crime types in the database.
         Calculates the within-cluster sum of squared errors (SSE)
          for each value of K and picks a K where the SSE is low.
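The K-selection idea can be sketched in plain Python: run K-means for each candidate K and record the within-cluster SSE. This is not the WEKA implementation; the farthest-point initialization, the toy points, and the function names are assumptions. Note that SSE usually keeps shrinking as K grows, so in practice one often looks for the K beyond which it stops improving much rather than the absolute minimum.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point initialization (a choice of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_sse(points, k, iterations=20):
    """Run Lloyd's algorithm and return the within-cluster SSE."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy data: three well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
```

On this toy data the SSE drops sharply up to K = 3, the true number of groups, which is the kind of signal the K sweep is meant to expose.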
Implementation Continued…
   Visualization
     The mined data is then sent to the visualization
      explorer in WEKA, where different attributes can be
      graphed and represented.
   Some example visualizations:
Examples
   Figure: Arlington, day of week (DOW) vs. clusters: Auto Theft (C3) is low on weekends
Example 2
   Figure: Arlington data set: the data is skewed toward Wednesday
Example 3
   Figure: Fairfax, month vs. clusters: the data for June is very sparse
Advantages of this Implementation
   Does not depend on a single algorithm such as K-means.
   New modules can be added to the existing
    code to implement other algorithms, either directly or
    through the WEKA API.
   Open design: the algorithm in use can be
    switched with simple parameter changes.
Visualization
   Google Maps
     Visualization of the data is implemented using the Google
      Maps API
   WEKA is used for histograms and cluster visualization
PostGIS
   PostGIS is a spatial database extender for the
    PostgreSQL DBMS.
   Adds spatial functions (such as distance and area) and
    specialized geometry data types to the database.
   Relies on GiST (Generalized Search Tree) for
    indexing geometric data.
PostGIS (continued)
   Examples of geometry data types:
       POINT (2572292.2 5631150.7)
       LINESTRING (2566006.4 5633207.9, 2566028.6 5633215.1,
        2566062.3 5633227.1)
       POLYGON ((2568262.1 5635344.1, 2568298.5 5635387.6,
        2568261.04 5635276.15, 2568262.1 5635344.1))
   Examples of PostGIS functions/operators:
       Distance(), Intersects(), Within(), Contains(), Length(), Area(),
        ConvexHull(), Extent(), ...
       A ~ B       (Does A's bounding box contain B's?)
       A @ B       (Does B's bounding box contain A's?)
       A && B      (Do the bounding boxes of A and B overlap?)
       ...
Detecting Outliers By Location
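The slides do not say how location outliers were found beyond crediting PostGIS. As one plausible illustration only, the Python sketch below flags points that lie farther than a chosen distance from the median center of a dataset. The haversine distance, the threshold, and the function names are assumptions that stand in for a PostGIS distance query.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def location_outliers(points, max_km):
    """Return the points farther than max_km from the median center."""
    center = (median(p[0] for p in points), median(p[1] for p in points))
    return [p for p in points if haversine_km(p, center) > max_km]

# Hypothetical coordinates; the last pair has latitude and longitude swapped.
sample = [(38.88, -77.17), (38.89, -77.18), (38.87, -77.16), (-77.17, 38.88)]
outliers = location_outliers(sample, max_km=50)
```

A median center is used instead of a mean so that a single bad record does not drag the reference point away from the bulk of the data.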
Map Visualization
   The client UI is implemented in JavaScript.
   The Google Maps API is used to display the map
    and draw all necessary illustrations.
   The UI uses asynchronous (AJAX) requests to
    communicate with the server.
   The server replies in JSON (a data-interchange
    format native to JavaScript).
   Request/Reply batching is used to improve
    performance.
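Request/reply batching with JSON can be sketched as follows: the client sends several requests in one message and the server answers them all in one reply. The message shape ("id", "op", "params") and the handler table are hypothetical; the slides do not document the project's actual protocol, and the real server is a Java servlet rather than Python.

```python
import json

def handle_batch(batch_json, handlers):
    """Answer a JSON array of requests with a JSON array of replies."""
    requests = json.loads(batch_json)
    replies = [
        # Each reply echoes the request id so the client can match them up.
        {"id": req["id"], "result": handlers[req["op"]](req.get("params", {}))}
        for req in requests
    ]
    return json.dumps(replies)

# Hypothetical handler: count crimes for a dataset from a fixed lookup table.
COUNTS = {"Falls Church": 725, "Fairfax": 11507}
handlers = {"count": lambda params: COUNTS.get(params.get("dataset"), 0)}

batch = json.dumps([
    {"id": 1, "op": "count", "params": {"dataset": "Falls Church"}},
    {"id": 2, "op": "count", "params": {"dataset": "Fairfax"}},
])
reply = json.loads(handle_batch(batch, handlers))
```

Sending one batched message instead of two separate ones saves a network round trip, which is the performance benefit the slide refers to.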
Map Visualization (continued)
   The UI can generate map-based visualizations
    showing the following:
     The crime rate in different regions, filtered by dataset,
      crime status (attempted/committed), year, month, and day
      of week.
     The crime type, year, month, and day of week with
      the highest frequency in each region.
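The second kind of view, the most frequent value per region, amounts to a grouped count. The Python sketch below shows the idea for crime type; using the zip code as the region key, the record field names, and the sample records are assumptions for illustration.

```python
from collections import Counter, defaultdict

def top_crime_type_by_region(records, dataset=None):
    """Return {region: most frequent crime type}, optionally for one dataset."""
    counts = defaultdict(Counter)
    for rec in records:
        if dataset is not None and rec["dataset"] != dataset:
            continue  # apply the optional dataset filter
        counts[rec["zip"]][rec["crime_type"]] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}

# Hypothetical records; field names and zip codes are for illustration only.
records = [
    {"dataset": "Fairfax", "zip": "22030", "crime_type": "Auto Theft"},
    {"dataset": "Fairfax", "zip": "22030", "crime_type": "Auto Theft"},
    {"dataset": "Fairfax", "zip": "22030", "crime_type": "Burglary"},
    {"dataset": "Arlington", "zip": "22201", "crime_type": "Burglary"},
]
top_all = top_crime_type_by_region(records)
top_fairfax = top_crime_type_by_region(records, dataset="Fairfax")
```

The same grouping works for year, month, or day of week by counting a different field.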
References
   PostGIS (http://postgis.refractions.net/)
   Google Maps API (http://code.google.com/apis/maps/)
   JDMP (http://www.jdmp.org)
   WEKA (www.cs.waikato.ac.nz/ml/weka/)
     API (http://weka.sourceforge.net/doc/)
   RapidMiner (YALE) (www.rapidminer.com)
DEMO

				