Shared by: xiaopangnv
Posted: 12/11/2011
Data Mining

with JDM API

Regina Wang

Data Mining

Knowledge-Discovery in Databases (KDD)

Searching large volumes of data for patterns.

The nontrivial extraction of implicit, previously

unknown, and potentially useful information from

data.

The science of extracting useful information

from large data sets or databases.

Uses computational techniques from statistics,

machine learning, and pattern recognition.

Descriptive Statistics

Collect data

Classify data

Summarize data

Present data

Make inferences to draw conclusions

--Point and interval estimation

--Hypothesis testing

--Prediction
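
As a small illustration of summarizing data and of point and interval estimation, the sketch below computes a sample mean, a sample standard deviation, and an approximate 95% interval for the mean. The class name, method names, and sample values are made up for this example, and the 1.96 multiplier assumes a normal approximation, which is rough for a sample this small.

```java
final class SummaryStatsSketch {
    /** Point estimate of the population mean: the arithmetic mean of the sample. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) sumSquares += (x - m) * (x - m);
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    /** Approximate 95% interval for the mean: mean +/- 1.96 * s / sqrt(n). */
    static double[] meanInterval95(double[] xs) {
        double m = mean(xs);
        double halfWidth = 1.96 * sampleStdDev(xs) / Math.sqrt(xs.length);
        return new double[] { m - halfWidth, m + halfWidth };
    }

    public static void main(String[] args) {
        double[] sample = { 2, 4, 4, 4, 5, 5, 7, 9 }; // illustrative values only
        double[] interval = meanInterval95(sample);
        System.out.printf("mean=%.3f sd=%.3f interval=[%.3f, %.3f]%n",
                mean(sample), sampleStdDev(sample), interval[0], interval[1]);
    }
}
```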

Machine Learning

Concerned with the development of

techniques which allow computers to

"learn".

Concerned with the algorithmic

complexity of computational

implementations.

Many inference problems turn out to be

NP-hard or harder.

Common Machine Learning

Algorithms

Supervised learning—prior knowledge

Unsupervised learning—statistical

regularity of the patterns

Semi-supervised learning

Reinforcement learning

Transduction

Learning to learn

Pattern Recognition

The act of taking in raw data and taking an

action based on the category of the data.

Aims to classify data patterns based on prior

knowledge or on statistical information.

Based on availability of a training set:

supervised and unsupervised learning

Two approaches: statistical (decision theory)

and syntactic (structural).

Supervised Techniques

Classification:

--k-Nearest Neighbors

--Naïve Bayes

--Classification Trees

--Discriminant Analysis

--Logistic Regression

--Neural Nets
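
To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbors sketch: a query point receives the majority label among its k closest training examples. The class, the `Example` type, and the toy data are hypothetical and only serve the illustration; ties go to whichever label reaches the top vote count first when walking from nearest to farthest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KnnSketch {
    /** One labeled training example (hypothetical type for this sketch). */
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Squared Euclidean distance; enough for ranking neighbors. */
    static double squaredDistance(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    /** Returns the majority label among the k training examples closest to the query. */
    static String classify(List<Example> training, double[] query, int k) {
        List<Example> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(e -> squaredDistance(e.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(e.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = e.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Example> training = Arrays.asList(
                new Example(new double[] { 1, 1 }, "a"),
                new Example(new double[] { 1, 2 }, "a"),
                new Example(new double[] { 2, 1 }, "a"),
                new Example(new double[] { 8, 8 }, "b"),
                new Example(new double[] { 9, 8 }, "b"),
                new Example(new double[] { 8, 9 }, "b"));
        System.out.println(classify(training, new double[] { 1.5, 1.5 }, 3));
        System.out.println(classify(training, new double[] { 8.5, 8.5 }, 3));
    }
}
```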

Supervised Techniques



Prediction (Estimation):

--Regression

--Regression Trees

--k-Nearest Neighbors
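
For the prediction case, a minimal sketch of simple linear regression may help: it fits y = intercept + slope * x by ordinary least squares and then predicts a value for a new x. The class name and the data points are invented for the example; it assumes a single numeric predictor whose values are not all identical.

```java
final class LinearRegressionSketch {
    /** Returns {intercept, slope} minimizing squared error for y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    /** Applies a fitted {intercept, slope} model to a new x value. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 3, 5, 7, 9 }; // illustrative data lying on y = 1 + 2x
        double[] model = fit(x, y);
        System.out.printf("intercept=%.3f slope=%.3f prediction(5)=%.3f%n",
                model[0], model[1], predict(model, 5));
    }
}
```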

Unsupervised Techniques

Cluster Analysis

Principal Components

Association Rules

Collaborative Filtering
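
Association rules are usually scored by support and confidence. The sketch below computes both over a handful of made-up shopping baskets; the class name, method names, and items are illustrative assumptions, and no rule-mining algorithm (such as a frequent-itemset search) is attempted here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AssociationRuleSketch {
    /** Fraction of transactions that contain every item in the itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    /** Confidence of the rule antecedent -> consequent: support(both) / support(antecedent). */
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(transactions, both) / support(transactions, antecedent);
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"));
        System.out.printf("support(bread, milk)=%.3f%n",
                support(baskets, Set.of("bread", "milk")));
        System.out.printf("confidence(bread -> milk)=%.3f%n",
                confidence(baskets, Set.of("bread"), Set.of("milk")));
    }
}
```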

Java Data Mining API (JDM)

Data-mining tools were traditionally

provided in products with vendor-

specific interfaces.

The Java Data Mining API (JDM)

defines a common Java API to interact

with data-mining systems.

Developed by the Java Community Process

(JCP) Data Mining Expert Group

JDM Current Versions

JDM 1.0 (JSR 73) final specification in

August, 2004

http://www.jcp.org/en/jsr/detail?id=73

JDM 2.0 (JSR 247) Early Review

http://www.jcp.org/en/jsr/detail?id=247

JDM targets the Java™ 2 Platform, Enterprise

Edition (J2EE™) and Standard Edition (J2SE™)

Data Mining System

A typical data-mining system consists of

--a data-mining engine

--a repository that persists the data-mining

artifacts, such as the models, created in

the process.

The actual data is obtained via a database

connection, or via a file-system API.

JDM Architectural components

Application programming interface (API)

Data mining engine (DME), or data mining

server (DMS): provides the infrastructure

that offers a set of data mining services to its

API clients.

Mining object repository (MOR): the

repository the DME uses to persist data

mining objects.

Key JDM API benefit:

abstracts the physical components, tasks, and

algorithms into Java classes
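
The idea of a repository that persists mining objects by name can be pictured with a toy in-memory stand-in. This is not the JDM API: `ToyRepositorySketch` and its methods are hypothetical, and reading the boolean flag as "replace an existing object" is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class ToyRepositorySketch {
    // Named objects, standing in for persisted data sets, settings, tasks, and models.
    private final Map<String, Object> objects = new HashMap<>();

    /** Stores an object under a name; refuses to overwrite unless replace is true. */
    void saveObject(String name, Object object, boolean replace) {
        if (!replace && objects.containsKey(name)) {
            throw new IllegalStateException("object already exists: " + name);
        }
        objects.put(name, object);
    }

    /** Looks an object up by the name it was saved under. */
    Object retrieveObject(String name) {
        Object object = objects.get(name);
        if (object == null) {
            throw new IllegalArgumentException("no such object: " + name);
        }
        return object;
    }

    public static void main(String[] args) {
        ToyRepositorySketch repository = new ToyRepositorySketch();
        repository.saveObject("myBuildData", "file:///export/data/textFileData.data", false);
        // Later steps can refer to the stored object purely by name.
        System.out.println(repository.retrieveObject("myBuildData"));
    }
}
```

Referring to objects by name is what lets a later step, such as a build task, point at data and settings that were saved earlier without holding the objects themselves.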









Figure 1. Components of a data-mining system

Building a data-mining model

1. Decide what you want to learn.

2. Select and prepare your data.

3. Choose mining tasks and configure the

mining algorithms.

4. Build your data-mining model.

5. Test and refine the models.

6. Report findings or predict future

outcomes.

Data Mining Process









Figure 2. Data mining steps.

Usage of JDM API

Use JDM to explore the mining object

repository (MOR) and find out which

models and model-building parameters

work best.

Follow a few simple steps that map the

process to JDM interactions.

Build Java Data Mining GUI Application









Figure 3. Top level packages.

Figure 4. Top level interfaces.

Using the JDM API

1. Identify the data you wish to use to build your

model—your build data—with a URL that points to

that data.

2. Specify the type of model you want to build,

such as clustering, classification, or association

rules, and the parameters to the build process.

Such parameters are termed build settings in JDM.

These tasks are represented by API classes.

3. Create a logical representation of your data to

select certain attributes of the physical data, and

then map those attributes to logical values.

Using the JDM API

4. Specify the parameters to your data-mining

algorithms.

5. Create a build task, and apply to that task

the physical data references and the build

settings.

6. Finally, you execute the task. The outcome

of that execution is your data model. That

model will have a signature—a kind of

interface—that describes the possible input

attributes for later applying the model to

additional data.

Using data model and results

Once you've created a model, you can test

that model, and then even apply the model

to additional data. Building, testing, and

applying the model to additional data is an

iterative process that, ideally, yields

increasingly accurate models.
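
The build, test, and apply loop can be sketched without any mining library. Below, a deliberately trivial "model" (the most frequent label in the build data) is built on one part of the data and tested on a held-out part. The class name, the 70/30 split, and the labels are assumptions for illustration; a real project would substitute an actual model and its own split.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HoldoutSketch {
    /** "Builds" a trivial model: the most frequent label in the build data. */
    static String buildMajorityModel(List<String> buildLabels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : buildLabels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = label;
            }
        }
        return best;
    }

    /** Tests the model: fraction of held-out labels that equal its constant prediction. */
    static double accuracy(String model, List<String> testLabels) {
        long correct = testLabels.stream().filter(model::equals).count();
        return (double) correct / testLabels.size();
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList(
                "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes");
        int cut = (int) (labels.size() * 0.7); // first 70% builds, the rest tests
        String model = buildMajorityModel(labels.subList(0, cut));
        double acc = accuracy(model, labels.subList(cut, labels.size()));
        System.out.printf("model=%s accuracy=%.3f%n", model, acc);
    }
}
```

If the held-out accuracy is unsatisfactory, the settings or the data preparation are adjusted and the build is repeated, which is the iteration the slide describes.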

Those models can then be saved in the

MOR, and used to either explain data, or

to predict the outcome of new data in

relation to your data-mining objective.

JDM Data Connection

A JDM connection is represented by the engine

variable, which is of type

javax.datamining.resource.Connection. JDM

connections are very similar to JDBC

connections, with one connection per thread.



PhysicalDataSetFactory dataSetFactory =
    (PhysicalDataSetFactory) engine.getFactory(
        "javax.datamining.data.PhysicalDataSet");

JDM Data Connection

Build data is referenced via a PhysicalDataSet

object, which, in turn, loads the data from a file

or a database table, referenced with a URL.



PhysicalDataSet dataSet = dataSetFactory.create(

"file:///export/data/textFileData.data", true);

Code Example: Building a

clustering model

// Assumes dmeConn is an open javax.datamining.resource.Connection
// and uri is the location of the build data.

// Create the physical representation of the data
PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)
    dmeConn.getFactory( "javax.datamining.data.PhysicalDataSet" );
PhysicalDataSet buildData = pdsFactory.create( uri, true );
dmeConn.saveObject( "myBuildData", buildData, false );

// Create the logical representation of the data from physical data
LogicalDataFactory ldFactory = (LogicalDataFactory)
    dmeConn.getFactory( "javax.datamining.data.LogicalData" );
LogicalData ld = ldFactory.create( buildData );
dmeConn.saveObject( "myLogicalData", ld, false );

// Create the settings to build a clustering model
ClusteringSettingsFactory csFactory = (ClusteringSettingsFactory)
    dmeConn.getFactory( "javax.datamining.clustering.ClusteringSettings" );
ClusteringSettings clusteringSettings = csFactory.create();
clusteringSettings.setLogicalDataName( "myLogicalData" );
clusteringSettings.setMaxNumberOfClusters( 20 );

Code Example: Building a

clustering model (cont'd)

clusteringSettings.setMinClusterCaseCount( 5 );
dmeConn.saveObject( "myClusteringBS", clusteringSettings, false );

// Create a task to build a clustering model with data and settings
BuildTaskFactory btFactory = (BuildTaskFactory)
    dmeConn.getFactory( "javax.datamining.task.BuildTask" );
BuildTask task = btFactory.create( "myBuildData", "myClusteringBS",
    "myClusteringModel" );
dmeConn.saveObject( "myClusteringTask", task, false );

// Execute the task and check the status
ExecutionHandle handle = dmeConn.execute( "myClusteringTask" );
handle.waitForCompletion( Integer.MAX_VALUE ); // wait until done
ExecutionStatus status = handle.getLatestStatus();
if( ExecutionState.success.equals( status.getState() ) ) {
    // task completed successfully...
}
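
The example above leaves the choice of clustering algorithm to the engine. As a rough picture of what a clustering build might compute, here is a self-contained k-means sketch (Lloyd's algorithm) on one-dimensional data. k-means is swapped in purely for illustration and is not stated by the slides; the class name, the deterministic start, and the data are assumptions of this sketch, and nothing here uses the JDM API.

```java
import java.util.Arrays;

final class KMeansSketch {
    /** Runs Lloyd's algorithm on 1-D data and returns the final centroids in ascending order. */
    static double[] cluster(double[] data, int k, int maxIterations) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);

        // Deterministic start: spread the initial centroids across the sorted data.
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            centroids[c] = sorted[c * (sorted.length - 1) / Math.max(1, k - 1)];
        }

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: add each point to its nearest centroid's running sum.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double x : sorted) {
                int nearest = nearest(centroids, x);
                sum[nearest] += x;
                count[nearest]++;
            }
            // Update step: move each centroid to the mean of its assigned points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) continue; // an empty cluster keeps its centroid
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) {
                    centroids[c] = updated;
                    changed = true;
                }
            }
            if (!changed) break; // converged
        }
        Arrays.sort(centroids);
        return centroids;
    }

    /** Index of the centroid closest to x. */
    static int nearest(double[] centroids, double x) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[best])) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] data = { 1, 2, 3, 10, 11, 12 }; // two obvious groups
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```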

References

Java Data Mining Specification
http://www.jcp.org/en/jsr/detail?id=73

Mine Your Own Data with the JDM API, Frank Sommers, July 7, 2005
http://www.artima.com/lejava/articles/data_mining.html

http://www.stanford.edu/class/cs345a/#handouts


