DATA MINING
Blake Lehmans
What is Data Mining?
• Analyzing data to extract meaning
• Discover hidden patterns
• Predict behavior and future trends
• Consists of three stages:
• 1. Initial exploration
• 2. Model building
• 3. Deployment
How Did Data Mining Originate?
• Statistics
• Artificial Intelligence
• Benefits from computer innovations
Why Data Mining?
• Improve customer service
• Target marketing campaigns
• Identify high-risk clients
• Improve production processes
• SAS
What Can Data Mining Do?
• Automated discovery of previously unknown patterns
• Automated prediction of trends and behaviors
• Areas of benefit:
• Market segmentation
• Customer churn
• Fraud detection
• Direct marketing
• Interactive marketing
• Market basket analysis
• Trend analysis
Build a Data Warehouse
• Gather Data
• Stores large quantities of data by specific categories
• Easily retrieved, interpreted, and sorted by users
• Centralization
• Data Cleaning
Regression Analysis
• Develops a mathematical formula to fit the data
• Y = a + bX
• Major limitation
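As a minimal sketch of fitting Y = a + bX, the ordinary least-squares estimates of a and b can be computed directly. The function name `fit_line` and the sample numbers are assumptions made for illustration; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data that lies exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")
```

Real data rarely falls on a straight line, so the fitted a and b describe only the best straight-line approximation.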
Clustering
• Segment population into groups with similar characteristics
• Can determine outliers
• Hierarchical clustering
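One way to picture hierarchical clustering is the bottom-up (agglomerative) variant: start with every record as its own group and repeatedly merge the two closest groups. The sketch below assumes one-dimensional data and single-linkage distance (the gap between the two closest members); the function name `single_linkage` and the ages are hypothetical.

```python
def single_linkage(points, num_clusters):
    """Merge the two closest groups until only num_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(p - q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical customer ages: two tight groups and one value far from both.
ages = [22, 25, 24, 61, 58, 63, 40]
groups = single_linkage(ages, 3)
print(groups)
```

With these numbers the value 40 ends up in a group of its own, which is how a clustering result can flag a possible outlier.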
Nearest Neighbor vs. K-Nearest Neighbor
• Nearest neighbor
• Prediction based on the closest data point
• K-nearest neighbor
• Considers multiple neighboring data points
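The difference between the two approaches can be sketched with a single function in which k = 1 gives plain nearest neighbor and larger k takes a majority vote. The function name `knn_predict`, the two-feature records, and the "stays"/"churns" labels are assumptions chosen for illustration.

```python
from collections import Counter
import math

def knn_predict(train, query, k=1):
    """Majority label among the k training points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers described by two numeric features.
train = [
    ((1.0, 1.0), "stays"),
    ((1.5, 2.0), "stays"),
    ((2.0, 1.0), "stays"),
    ((3.0, 3.0), "churns"),
    ((6.0, 5.0), "churns"),
    ((7.0, 7.0), "churns"),
]
query = (2.6, 2.6)
nearest = knn_predict(train, query, k=1)    # decided by the single closest point
k_nearest = knn_predict(train, query, k=3)  # decided by a vote of three points
print(nearest, k_nearest)
```

Here the single closest record is labeled "churns", while two of the three closest are labeled "stays", so the two methods disagree. Considering several neighbors makes the prediction less sensitive to one unusual record.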
Decision Tree
• Used for both exploration and prediction
• Develop good questions
• When to stop growing the tree
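A "good question" in a decision tree is one that separates the classes cleanly. The sketch below scores each candidate question of the form "value <= threshold?" with Gini impurity, which is one common scoring choice and not necessarily the one the slides had in mind. The names `gini` and `best_split` and the income data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold?' question."""
    best = None
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical clients: income in thousands and a risk label.
incomes = [20, 25, 30, 60, 65, 70]
risk = ["high", "high", "high", "low", "low", "low"]
threshold, impurity = best_split(incomes, risk)
print(threshold, impurity)
```

An impurity of 0.0 means both branches are pure, which is one natural point to stop growing the tree. Other stopping rules, such as a minimum number of records per branch, are also common.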
Artificial Neural Networks
• Highly accurate
• Designed to work like a brain
• Can slowly be changed
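One reading of "can slowly be changed" is that a network learns by nudging its weights a little after each mistake. The sketch below shows that idea on a single artificial neuron (a perceptron), which is far simpler than a full neural network. The function name `train_perceptron`, the learning rate, and the toy task (output 1 only when both inputs are 1) are assumptions for illustration.

```python
def train_perceptron(samples, epochs=20, rate=0.5):
    """Train one neuron; weights shift slightly whenever a prediction is wrong."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
            error = target - output  # 0 when correct, +1 or -1 when wrong
            w[0] += rate * error * x1
            w[1] += rate * error * x2
            bias += rate * error
    return w, bias

def predict(w, bias, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0

# Hypothetical task: respond only when both conditions are met.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, bias = train_perceptron(samples)
predictions = [predict(w, bias, x1, x2) for (x1, x2), _ in samples]
print(predictions)
```

After enough passes over the data the neuron reproduces all four targets. Practical networks stack many such units and use more refined update rules.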
Rule Induction
• “If this, then that”
• How often is the rule correct?
• How often does the rule apply?
• Not always causality
Rule | Accuracy | Coverage
If breakfast cereal purchased then milk purchased. | 85% | 20%
If bread purchased then Swiss cheese purchased. | 15% | 6%
If 42 years old and purchased pretzels and purchased dry roasted peanuts then beer will be purchased. | 95% | 0.01%
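The two questions asked of a rule map onto two simple ratios: accuracy is the share of cases where the rule is correct among those where it applies, and coverage is the share of all cases where it applies. The sketch below computes both; the function name `rule_stats` and the ten baskets are made up for illustration and are unrelated to the percentages in the table.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, coverage) for the rule 'if antecedent then consequent'."""
    applies = [b for b in baskets if antecedent <= b]    # rule's "if" part holds
    correct = [b for b in applies if consequent in b]    # "then" part also holds
    coverage = len(applies) / len(baskets)
    accuracy = len(correct) / len(applies) if applies else 0.0
    return accuracy, coverage

# Hypothetical shopping baskets.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal", "bread"},
    {"bread", "swiss cheese"},
    {"milk"},
    {"pretzels", "beer"},
    {"bread"},
    {"milk", "bread"},
    {"cereal", "milk", "beer"},
    {"beer"},
]
accuracy, coverage = rule_stats(baskets, {"cereal"}, "milk")
print(f"accuracy={accuracy:.0%}, coverage={coverage:.0%}")
```

A rule can score high on one measure and low on the other, as the third row of the table shows: very accurate, but applicable to almost no one.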
Results Validation
• Verify that the patterns produced also hold on data not used to build the model
• Overfitting: finding patterns in the training data that are not present in the general data set
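A common way to check for overfitting is to hold back part of the data and compare performance on the records used for building against the records held back. The sketch below uses a deliberately overfit "model" that only memorizes its training records; the names `memorizer` and `accuracy` and the customer data are hypothetical.

```python
def memorizer(train):
    """An overfit model: recalls training labels exactly, answers 'no' otherwise."""
    table = dict(train)
    return lambda x: table.get(x, "no")

def accuracy(model, data):
    """Fraction of records for which the model's answer matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical labeled records: (customer age, bought the product?)
records = [
    (21, "yes"), (23, "yes"), (25, "yes"), (27, "yes"),
    (41, "no"), (43, "no"), (45, "no"), (47, "no"),
    (22, "yes"), (26, "yes"), (42, "no"), (46, "no"),
]
train, holdout = records[:8], records[8:]  # last four records are held back
model = memorizer(train)
train_acc = accuracy(model, train)
holdout_acc = accuracy(model, holdout)
print(train_acc, holdout_acc)
```

A large gap between the two scores suggests the model has captured quirks of the training records and not a pattern that generalizes.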
Required Technology Infrastructure
• Size of the database
• Query complexity
Data Mining Social Media
• Information posted publicly is fair game
• Aggregators collate information and sell it to companies that want to learn about customers and what they do online
• Could it affect your credit offers?
Conclusion
• Helps discover and predict information that can cut costs and increase revenues
• Answers business questions that were traditionally too time-consuming to resolve
References
• http://databases.about.com/od/datamining/a/datamining.htm
• http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
• http://www.thearling.com/text/dmwhite/dmwhite.htm
• http://www.laits.utexas.edu/~norman/BUS.FOR/course.mat/Alex/
• http://mashable.com/2010/03/02/data-mining-social-media/
• http://www.thearling.com/text/dmtechniques/dmtechniques.htm
• http://databases.about.com/gi/o.htm?zi=1/XJ&zTi=1&sdn=databases&cdn=compute&tm=715&f=00&su=p504.1.336.ip_&tt=2&bt=0&bts=0&zu=http%3A//www.statsoft.com/textbook/stdatmin.html
• http://www.youtube.com/user/SASsoftware?v=cIcmd5zfu3c&feature=pyv&ad=8206369956&kw=data%20mining