					            Data Mining
• Data mining is the process of extracting
  previously unknown, valid and actionable
  information from large data sets and then using
  that information to make crucial
  business and strategic decisions.

• Its goal is to discover meaningful patterns and rules.
Data Warehouse to Data Mining
• A data warehouse (D.W.H) is a subject-oriented, integrated, time-variant
  and non-volatile collection of data that aims at supporting
  business decisions; the data mining (D.M) process is a
  natural and logical continuation of the D.W.H.
• D.M needs accurate, consistent, good-quality data,
  because the models mined must themselves be consistent and accurate.
  The D.W.H is a collection of all aspects.
• D.M can yield more relations, uncovered patterns and
  rules when it mines different and more numerous data sources.
• Queries are easy to direct in a D.W.H.
            Steps of Data Mining
1.   Identifying the data – Data could be anywhere in the world, not just in
     the enterprise, and it could be distributed (on paper, in people's heads, etc.).
2.   Getting the data ready – The data is put into
     the right format, and databases are built in the system.
3.   Mining the data – Once the right data is known, it is cleaned and scrubbed
     to remove unnecessary items and keep only the data essential for mining.
4.   Getting useful results – After mining, what outcomes do we
     want? Do we want the tools to find interesting patterns? Are these
     tools available, or do we build the tools?
5.   Identifying action – Having determined the data and tools,
     we run the tools on the data. This can produce a lot of output,
     so we need to know what to do with the patterns.
6.   Implementing the actions – After getting useful results,
     examine them and identify actions that can be taken,
     e.g. by analyzing which items go together.
7.   Evaluating the benefits – After the actions are implemented,
     we wait to see the results. These results may appear
     immediately or may take a long time. Once we are in a
     position to determine the benefits and costs, we re-evaluate
     the procedure. By then the data may have
     changed and new tools may be recommended, so plan
     the next mining cycle and decide how to go about it.
8.   Determining what to do next
9.   Carrying out the next cycle
    Outcomes of Data Mining
Data mining comprises six activities, which
   are known as data mining tasks or types:
1. Classification
2. Estimation
3. Prediction
4. Affinity grouping or Association rules
5. Clustering
6. Description and Visualization
• Classification consists of examining the features
  of a newly presented object and assigning it to a
  predefined class.
• Classification is carried out by developing
  training sets with pre-classified examples and
  then building a model that fits the description of
  the classes.
• In classification, a group of entities is partitioned
  based on a predefined value of some attributes.
• Classification deals with discrete outcomes: yes
  or no, debit card or car loan.
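
The idea of a training set with pre-classified examples can be illustrated with a very small sketch. It uses a 1-nearest-neighbour rule, which is only one of many possible classification models and is not named in these notes. The training data, the feature choice (age and monthly spend) and the function name `classify` are all hypothetical.

```python
import math

# Hypothetical pre-classified training set:
# (age, monthly spend) -> product class. All values are made up.
TRAINING_SET = [
    ((22, 300.0), "debit card"),
    ((25, 450.0), "debit card"),
    ((41, 2200.0), "car loan"),
    ((38, 1900.0), "car loan"),
]


def classify(features, training_set=TRAINING_SET):
    """Assign the class of the closest pre-classified example (1-nearest neighbour)."""
    nearest = min(training_set, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# A newly presented object is assigned to one of the predefined classes.
print(classify((24, 400.0)))
print(classify((40, 2000.0)))
```

The features are left unscaled here for brevity; with real data the attributes would normally be brought to comparable ranges first, otherwise the largest-valued attribute dominates the distance.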
• Estimation example: based on the spending patterns of a
  person and his age, one can estimate his
  salary or the number of children he has.
• Estimation deals with continuously valued outcomes.
• Classification and estimation are used
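
As a rough illustration of estimating a continuous value, the sketch below fits a straight line by ordinary least squares and uses it to estimate a salary from monthly spending. The linear model, the sample figures and the name `fit_line` are assumptions made for the example; the notes do not prescribe a particular estimation technique.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: monthly spend and known salary.
spend = [500.0, 1000.0, 1500.0, 2000.0]
salary = [2000.0, 3500.0, 5000.0, 6500.0]

slope, intercept = fit_line(spend, salary)

# Estimate the (continuous) salary of a person who spends 1200 per month.
estimate = slope * 1200.0 + intercept
print(round(estimate, 2))
```

Unlike the classification case, the output is a number on a continuous scale, not one of a fixed set of classes.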
• Prediction forecasts the future behaviour of some values.
  For example, based on a person's education, his
  current job and trends in the industry, one can predict
  that his salary will be a certain amount by the year 2008.
• Predictive tasks feel different because the records are
  classified according to some predicted future behaviour
  or estimated future value.
• With prediction, the only way to check the accuracy of the
  classification is to wait and see.
• Historical data is used to build a model that explains current
  observed behaviour; when applied to current input, it can
  predict future behaviour.
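
A minimal sketch of this idea: a model is built from historical salary figures and then applied to project a future year. The constant-growth model, the salary history and the names `average_growth` and `predict` are assumptions chosen for illustration only.

```python
def average_growth(history):
    """Average year-on-year growth factor across the historical series."""
    first_year, last_year = min(history), max(history)
    ratio = history[last_year] / history[first_year]
    return ratio ** (1 / (last_year - first_year))


def predict(history, target_year):
    """Project the latest known value forward at the average historical growth."""
    latest_year = max(history)
    growth = average_growth(history)
    return history[latest_year] * growth ** (target_year - latest_year)


# Hypothetical salary history (year -> salary).
history = {2004: 40000.0, 2005: 42000.0, 2006: 44100.0, 2007: 46305.0}

predicted_2008 = predict(history, 2008)
print(round(predicted_2008, 2))


def prediction_error(predicted, actual):
    """Only computable once the future value has actually been observed."""
    return abs(predicted - actual) / actual
```

`prediction_error` reflects the "wait and see" point: the accuracy of the projection cannot be measured until the actual 2008 figure is known.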
   Affinity grouping or Association
• The task of affinity grouping is to
  determine which things go together.
• For example: which people travel together? Which
  items are purchased together?
• Affinity grouping can also be used to identify
  cross-selling opportunities and to design attractive
  packages or groupings of products and services.
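
To make "items that are purchased together" concrete, the sketch below counts how often pairs of items share a basket and derives the two usual association-rule measures, support and confidence. The baskets and the function names are hypothetical; a full algorithm such as Apriori would add pruning and handle larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(item_a, item_b, baskets):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts(baskets)[pair] / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(1 for b in with_antecedent if consequent in b) / len(with_antecedent)


print(support("bread", "butter", BASKETS))
print(confidence("bread", "butter", BASKETS))
```

A pair with high support and high confidence is a candidate for cross-selling or for a bundled package.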
• Clustering is a DM task that is often confused with
• Clusters are formed by analyzing the data. For example, group X
  prefers the Zen, group Y prefers the Ford Icon and group Z prefers the Nano.
• Once clusters are obtained, each cluster can be
  examined and mined further for other outcomes such as
  estimation and classification.
• Clustering is often done as a prelude to some other
  form of data mining or modeling.
• Clustering might be the first step in a market segment
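
The sketch below shows one common way clusters can be formed from data alone, without predefined classes: a one-dimensional k-means. The algorithm choice, the car-budget figures and the name `k_means_1d` are assumptions for illustration; the notes do not name a specific clustering method.

```python
def k_means_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value to its nearest centroid.

    Assumes k >= 2 and at least k values.
    """
    ordered = sorted(values)
    # Spread the starting centroids across the sorted data (simple and deterministic).
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical car budgets of customers, in arbitrary units.
budgets = [2.0, 2.2, 1.9, 5.0, 5.3, 4.8, 9.0, 9.5, 8.8]

centroids, clusters = k_means_1d(budgets, k=3)
for centroid, cluster in zip(centroids, clusters):
    print(round(centroid, 2), cluster)
```

Each resulting group can then be examined on its own, for instance by fitting a separate classifier or estimator per cluster.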
    Description and Visualization

• For example, John usually goes shopping after he
  goes to the bank, but last week he went to
  church after shopping.
• Anomaly detection is a form of deviation
  detection and is used for applications such
  as detecting fraud and medical illness.
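
One simple way to flag a deviation from usual behaviour is to measure how many standard deviations a value lies from the mean (a z-score). The sketch below assumes this technique, a threshold of 2.0 and made-up daily card-spend figures; real fraud detection would use richer models.

```python
import statistics


def find_deviations(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical daily card spend; the last entry is far from the usual pattern.
daily_spend = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 250.0]

print(find_deviations(daily_spend))
```

Because a single extreme value inflates both the mean and the standard deviation, the threshold has to be chosen with the sample size in mind; robust alternatives based on the median exist for that reason.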
