Data Mining

 Mandeep Jandir
       What is Data Mining?
Data mining, or knowledge discovery, is
 the process of discovering hidden patterns
 and relationships in data in order to make
 better and more informed decisions.
Data mining tools predict behaviors and
 future trends, allowing businesses to make
 knowledge-driven decisions.
        Why use Data Mining?
Data mining is a technique that helps
 individuals or companies find useful
 information in large amounts of data so
 that they can make better decisions.
     Reduce risks
     Find problems and issues
     Save money
     Make high-confidence predictions
     Simplify information
          Goals of Data Mining
     Data mining can show how certain attributes
      within the data will behave in the future.
     Ex. - certain seismic wave patterns may
      predict an earthquake with high probability.
     Data patterns can be used to identify the
      existence of an item, an event, or an activity.
  Goals of Data Mining (cont’d)
     Data mining can partition the data so that
      different classes or categories can be
      identified based on combinations of parameters.
     Ex. - customers in a supermarket can be
      categorized into discount-seeking shoppers,
      shoppers in a rush, loyal regular shoppers,
      shoppers attached to name brands, and
      infrequent shoppers.
  Goals of Data Mining (cont’d)
     Optimize the use of limited resources such as
      time, space, money, or materials and
      maximize output variables such as sales or
      profits under a given set of constraints.
Types of Knowledge Discovered
      during Data Mining
Knowledge is often classified as inductive
 versus deductive.
     Deductive knowledge deduces new
      information based on applying pre-specified
      logical rules of deduction on the given data.
     Data mining addresses inductive knowledge,
      which discovers new rules and patterns from
      the supplied data.
Types of Knowledge Discovered
   during Data Mining cont’d
It is common to describe knowledge
 discovered during data mining as:
     Association rules
     Classification hierarchies
     Sequential patterns
     Patterns within time series
     Clustering
   Types of Association Rules
Market-Basket Model, Support, and Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm
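
Support and confidence are the two standard measures in the market-basket model. The sketch below shows one way to compute them for a rule such as milk => juice. The function names and the toy baskets are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) / support(lhs) for the rule lhs => rhs.

    Assumes lhs occurs in at least one transaction; otherwise the
    division below raises ZeroDivisionError.
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Hypothetical toy data: each basket is the set of items in one transaction.
baskets = [
    frozenset({"milk", "bread", "cookies", "juice"}),
    frozenset({"milk", "juice"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "cookies", "coffee"}),
]

milk_juice_support = support({"milk", "juice"}, baskets)
milk_juice_confidence = confidence({"milk"}, {"juice"}, baskets)
```

In this toy data, milk and juice appear together in 2 of 4 baskets, and milk appears in 3 of 4, so the rule milk => juice has support 0.5 and confidence 2/3.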
           Apriori Algorithm
Principle: Any subset of a frequent itemset
 must be frequent.
Generate candidate k-itemsets by joining large
 (k-1)-itemsets, then delete any candidate that
 is not large.
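
The join step can be sketched as follows. This is a minimal illustration, not a definitive implementation: `generate_candidates` is a hypothetical name, and itemsets are assumed to be Python frozensets. The subset check applies the principle above to discard candidates that cannot be large. Deciding whether a surviving candidate really is large still requires counting its support in the database.

```python
from itertools import combinations


def generate_candidates(large_prev, k):
    """Join large (k-1)-itemsets into candidate k-itemsets.

    A candidate is kept only if every one of its (k-1)-subsets is
    itself large, since any subset of a frequent itemset must be frequent.
    """
    large_prev = set(large_prev)
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            union = a | b
            if len(union) != k:
                continue
            # Prune: one non-large (k-1)-subset rules the candidate out.
            if all(frozenset(s) in large_prev
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical large 2-itemsets over the items a, b, c, d.
large_2 = {
    frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd"),
}
candidates_3 = generate_candidates(large_2, 3)
```

Here only {a, b, c} survives: {a, b, d} is dropped because {a, d} is not large, and {b, c, d} is dropped because {c, d} is not large.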
      Apriori Algorithm cont’d
Input: Database of m transactions, D, and
 a minimum support, mins, represented as
 a fraction of m.
Output: Frequent itemsets, L1, L2, …, Lk
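
Putting the input and output together, a level-wise loop might look like the sketch below. It assumes D is a list of item sets and mins is a fraction of m, as stated above. The function name `apriori`, the variable names, and the toy database are hypothetical, and the code favors clarity over efficiency.

```python
from itertools import combinations


def apriori(transactions, mins):
    """Return [L1, L2, ..., Lk], where Lk is the set of large k-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    m = len(transactions)

    def is_large(itemset):
        # An itemset is large if it occurs in at least mins * m transactions.
        count = sum(1 for t in transactions if itemset <= t)
        return count >= mins * m

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if is_large(frozenset([i]))}

    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Join: combine large (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Scan the database and delete candidates that are not large.
        current = {c for c in candidates if is_large(c)}
    return levels


# Hypothetical database D with m = 4 transactions.
D = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
frequent = apriori(D, mins=0.5)
```

With mins = 0.5, L1 contains milk, bread, cookies, and juice, L2 contains {milk, juice} and {bread, cookies}, and no 3-itemset is large, so the loop stops after two levels.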
Elmasri, R. and Navathe, S.:
 Fundamentals of Database Systems, 5th
