									Data Mining
with JDM API
  Regina Wang
           Data Mining
Knowledge-Discovery in Databases (KDD)
Searching large volumes of data for patterns.
The nontrivial extraction of implicit, previously
unknown, and potentially useful information from
data.
The science of extracting useful information
from large data sets or databases.
Uses computational techniques from statistics,
machine learning, and pattern recognition.
Descriptive Statistics
Collect data
Classify data
Summarize data
Present data
Make inferences to draw conclusions
--Point and interval estimation
--Hypothesis testing
--Prediction
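As a rough illustration of point and interval estimation, the sketch below computes a sample mean (the point estimate) and an approximate 95% interval around it. The class name, the method names, and the use of the normal critical value 1.96 are assumptions made for this example; for small samples a t-distribution critical value would be the more usual choice.

```java
final class EstimationSketch {
    // Point estimate of the population mean: the sample mean.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation, using n - 1 in the denominator.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    // Approximate 95% interval for the mean: mean +/- 1.96 * s / sqrt(n).
    // The 1.96 value assumes a normal approximation (an assumption of this sketch).
    static double[] interval95(double[] xs) {
        double m = mean(xs);
        double half = 1.96 * sampleStdDev(xs) / Math.sqrt(xs.length);
        return new double[] { m - half, m + half };
    }

    public static void main(String[] args) {
        double[] sample = { 4.0, 5.0, 6.0, 5.0, 5.0 };
        double[] ci = interval95(sample);
        System.out.printf("mean=%.3f interval=[%.3f, %.3f]%n", mean(sample), ci[0], ci[1]);
    }
}
```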
   Machine Learning
Concerned with the development of
techniques which allow computers to
"learn".
Concerned with the algorithmic
complexity of computational
implementations.
Many inference problems turn out to be
NP-hard or harder.
Common Machine Learning
      Algorithm Types
 Supervised learning: relies on prior knowledge (labeled examples)
 Unsupervised learning: relies on the statistical
 regularity of the patterns
 Semi-supervised learning
 Reinforcement learning
 Transduction
 Learning to learn
     Pattern Recognition
The act of taking in raw data and taking an
action based on the category of the data.
Aims to classify data patterns based on prior
knowledge or on statistical info.
Based on availability of training set:
supervised and unsupervised learning
Two approaches: statistical (decision theory)
and syntactic (structural).
Supervised Techniques
  Classification:
  -- k-Nearest Neighbors
  --Naïve Bayes
  --Classification Trees
  --Discriminant Analysis
  --Logistic Regression
  --Neural Nets
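To make one of the listed classifiers concrete, here is a minimal sketch of k-nearest neighbors: a query point receives the majority label among its k closest training points. The class name, the Euclidean distance, and the tie-breaking rule (the label that reaches the top count first wins) are choices made for this illustration only, and the code is independent of JDM.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class KnnSketch {
    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the majority label among the k training points nearest to the query.
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        final double[] dist = new double[points.length];
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            dist[i] = distance(points[i], query);
            order[i] = i;
        }
        // Sort training-point indices by distance to the query, nearest first.
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = { { 1, 1 }, { 1, 2 }, { 8, 8 }, { 9, 8 } };
        String[] labels = { "low", "low", "high", "high" };
        System.out.println(classify(points, labels, new double[] { 2, 1 }, 3));
    }
}
```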
Supervised Techniques

  Prediction (Estimation):
  --Regression
  --Regression Trees
  --k-Nearest Neighbors
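For the prediction case, the simplest form of regression fits a straight line by least squares and uses it to estimate a numeric value for new input. The sketch below assumes a single input attribute; the class and method names are illustrative and not part of any library.

```java
final class RegressionSketch {
    // Fits y = intercept + slope * x by ordinary least squares.
    // Returns { intercept, slope }.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    // Applies the fitted line to a new input value.
    static double predict(double[] coefficients, double x) {
        return coefficients[0] + coefficients[1] * x;
    }

    public static void main(String[] args) {
        double[] coefficients = fit(new double[] { 1, 2, 3, 4 }, new double[] { 3, 5, 7, 9 });
        System.out.println(predict(coefficients, 5.0));
    }
}
```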
Unsupervised Techniques
  Cluster Analysis
  Principal Components
  Association Rules
  Collaborative Filtering
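Association rules are usually judged by two measures: support (how often an item set occurs across transactions) and confidence (how often the consequent occurs when the antecedent does). The sketch below computes both on a small made-up basket data set; the class name, helper methods, and sample transactions are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AssociationRuleSketch {
    // Convenience helper to build an item set.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Made-up sample data: four shopping baskets.
    static List<Set<String>> demoTransactions() {
        return Arrays.asList(
            setOf("bread", "milk"),
            setOf("bread", "butter"),
            setOf("bread", "milk", "butter"),
            setOf("milk"));
    }

    // Support: fraction of transactions that contain every item in the set.
    static double support(List<Set<String>> transactions, Set<String> items) {
        int hits = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(items)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    // Confidence of the rule antecedent -> consequent:
    // support(antecedent and consequent together) / support(antecedent).
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(transactions, both) / support(transactions, antecedent);
    }

    public static void main(String[] args) {
        List<Set<String>> tx = demoTransactions();
        System.out.println("support(bread) = " + support(tx, setOf("bread")));
        System.out.println("confidence(bread -> milk) = "
            + confidence(tx, setOf("bread"), setOf("milk")));
    }
}
```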
JAVA Data Mining API (JDM)
 Data-mining tools were traditionally
 provided in products with vendor-
 specific interfaces.
 The Java Data Mining API (JDM)
 defines a common Java API to interact
 with data-mining systems.
 Developed by the Java Community Process (JCP)
 Data Mining Expert Group
 JDM Current Versions
   JDM 1.0 (JSR 73) final specification in
   August, 2004
http://www.jcp.org/en/jsr/detail?id=73
   JDM 2.0 (JSR 247) Early Review
http://www.jcp.org/en/jsr/detail?id=247
  JDM targets the Java™ 2 Platform, Enterprise Edition
  (J2EE™) and Standard Edition (J2SE™)
    Data Mining System
A typical data-mining system consists of
--a data-mining engine
--a repository that persists the data-mining
artifacts, such as the models, created in
the process.
The actual data is obtained via a database
connection, or via a file-system API.
JDM Architectural components
Application programming interface (API)
Data mining engine (DME) – or data mining
server (DMS), provides the infrastructure
that offers a set of data mining services to its
API clients.
Mining object repository (MOR) - The
DME uses a mining object repository which
serves to persist data mining objects
    Key JDM API benefit:
abstracts the physical components, tasks, and
algorithms into Java classes




     Figure 1. Components of a data-mining system
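The role of the mining object repository can be pictured as a store of named objects that the engine looks up on behalf of API clients. The sketch below is a hypothetical in-memory stand-in written for this illustration; it is not the JDM implementation. Its saveObject method only mirrors the shape of the saveObject(name, object, flag) calls in this deck's code example, the lookup method is invented, and reading the boolean flag as "replace an existing object" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mining object repository (not part of JDM).
final class RepositorySketch {
    private final Map<String, Object> objects = new HashMap<>();

    // Persists an object under a name. When replace is false, an existing
    // object with the same name is protected and an exception is thrown.
    void saveObject(String name, Object object, boolean replace) {
        if (!replace && objects.containsKey(name)) {
            throw new IllegalStateException("Object already exists: " + name);
        }
        objects.put(name, object);
    }

    // Returns the object saved under the name, or null if there is none.
    Object lookup(String name) {
        return objects.get(name);
    }

    public static void main(String[] args) {
        RepositorySketch repository = new RepositorySketch();
        repository.saveObject("myBuildData", "placeholder for build data", false);
        System.out.println(repository.lookup("myBuildData"));
    }
}
```

Referring to saved objects by name is what lets a task be described with plain strings and executed later by the engine.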
Building a data-mining model
1. Decide what you want to learn.
2. Select and prepare your data.
3. Choose mining tasks and configure the
   mining algorithms.
4. Build your data-mining model.
5. Test and refine the models.
6. Report findings or predict future
   outcomes.
Data Mining Process




          Figure 2. Data mining steps.
    Usage of JDM API
Use JDM to explore the mining object
repository (MOR) and find out which
models and model-building parameters
work best.
Follow a few simple steps that map the
process to JDM interactions.
Build a Java data mining GUI application.




Figure 3. Top level packages.
Figure 4. Top level interfaces.
          Using the JDM API
1.   Identify the data you wish to use to build your
     model—your build data—with a URL that points to
     that data.
2.   Specify the type of model you want to build, such as
     clustering, classification, or association rules, and
     the parameters to the build process. Such parameters
     are termed build settings in JDM.
     These tasks are represented by API classes.
3.   Create a logical representation of your data to
     select certain attributes of the physical data, and
     then map those attributes to logical values.
        Using the JDM API
4. Specify the parameters to your data-mining
   algorithms
5. Create a build task, and apply to that task
   the physical data references and the build
   settings.
6. Finally, you execute the task. The outcome
   of that execution is your data model. That
   model will have a signature—a kind of
   interface—that describes the possible input
   attributes for later applying the model to
   additional data.
Using data model and results
 Once you've created a model, you can test
 that model, and then even apply the model
 to additional data. Building, testing, and
 applying the model to additional data is an
 iterative process that, ideally, yields
 increasingly accurate models.
 Those models can then be saved in the
 MOR, and used to either explain data, or
 to predict the outcome of new data in
 relation to your data-mining objective.
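The build, test, and apply cycle can be sketched without any mining library. In the illustration below the "model" is deliberately trivial: a majority-class baseline built from training labels and then scored on held-out labels. The class name, method names, and sample labels are made up for this example, and a real JDM model would be built and tested through the engine instead.

```java
import java.util.HashMap;
import java.util.Map;

final class BuildTestSketch {
    // Build: the "model" is simply the most frequent label in the training data.
    static String build(String[] trainLabels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : trainLabels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    // Test: fraction of held-out labels that the model predicts correctly.
    static double test(String model, String[] heldOutLabels) {
        int correct = 0;
        for (String label : heldOutLabels) {
            if (model.equals(label)) {
                correct++;
            }
        }
        return (double) correct / heldOutLabels.length;
    }

    public static void main(String[] args) {
        String model = build(new String[] { "yes", "no", "yes", "yes" });
        double accuracy = test(model, new String[] { "yes", "no", "yes", "no" });
        System.out.println("model=" + model + " accuracy=" + accuracy);
    }
}
```

A low score on the held-out data is the signal to go back, adjust the build settings or the data preparation, and build again.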
      JDM Data Connection
A JDM connection is represented by the engine
variable, which is of type
javax.datamining.resource.Connection. JDM
connections are very similar to JDBC
connections, with one connection per thread.

PhysicalDataSetFactory pdsFactory =
    (PhysicalDataSetFactory)
    engine.getFactory("javax.datamining.data.PhysicalDataSet");
      JDM Data Connection
Build data is referenced via a PhysicalDataSet
object, which, in turn, loads the data from a file
or a database table, referenced with a URL.

PhysicalDataSet dataSet = pdsFactory.create(
"file:///export/data/textFileData.data", true);
          Code Example: Building a
              clustering model
// Create the physical representation of the data
// (uri is the location of the build data, such as the file URL shown earlier)
PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)
    dmeConn.getFactory( "javax.datamining.data.PhysicalDataSet" );
PhysicalDataSet buildData = pdsFactory.create( uri, true );
dmeConn.saveObject( "myBuildData", buildData, false );
// Create the logical representation of the data from physical data
LogicalDataFactory ldFactory = (LogicalDataFactory)
    dmeConn.getFactory( "javax.datamining.data.LogicalData" );
LogicalData ld = ldFactory.create( buildData );
dmeConn.saveObject( "myLogicalData", ld, false );
// Create the settings to build a clustering model
ClusteringSettingsFactory csFactory = (ClusteringSettingsFactory)
    dmeConn.getFactory( "javax.datamining.clustering.ClusteringSettings" );
ClusteringSettings clusteringSettings = csFactory.create();
clusteringSettings.setLogicalDataName( "myLogicalData" );
clusteringSettings.setMaxNumberOfClusters( 20 );
       Code Example: Building a
        clustering model (cont'd)
clusteringSettings.setMinClusterCaseCount( 5 );
dmeConn.saveObject( "myClusteringBS", clusteringSettings, false );
// Create a task to build a clustering model with data and settings
BuildTaskFactory btFactory = (BuildTaskFactory)
    dmeConn.getFactory( "javax.datamining.task.BuildTask" );
BuildTask task = btFactory.create( "myBuildData", "myClusteringBS",
    "myClusteringModel" );
dmeConn.saveObject( "myClusteringTask", task, false );
// Execute the task and check the status
ExecutionHandle handle = dmeConn.execute( "myClusteringTask" );
handle.waitForCompletion( Integer.MAX_VALUE ); // wait until done
ExecutionStatus status = handle.getLatestStatus();
if( ExecutionState.success.equals( status.getState() ) ) {
    // task completed successfully...
}
         References
  Java Data Mining Specification (JSR 73)
  http://www.jcp.org/en/jsr/detail?id=73
  Frank Sommers, "Mine Your Own Data with the JDM API", July 7, 2005
  http://www.artima.com/lejava/articles/data_mining.html
  http://www.stanford.edu/class/cs345a/#handouts

								