Database Modeling and Design
Chapter 8 (Part D): Data Mining Basics
Instructor: Paul Chen

Topics
1. How Data Mining Evolved?
2. Decision Processing Overview and Tasks
3. Data Mining, What’s it?
4. Data Mining vs. Data Warehousing
5. How Data Mining Works? And Its Applications
6. Data Mining Operations and Associated Techniques
7. The Data Mining Process
8. Data Mining Tools
9. Data Mining Applications For CRM
10. Data Mining From Government Printing Office
11. Data Mining Techniques - A Summary

Topic 1: How Data Mining Evolved?
Many businesses have invested heavily in information technology to help them manage their businesses more effectively and gain a competitive edge. Increasingly large amounts of critical business data are being stored electronically, and this volume is expected to continue to grow. Data mining technology helps companies leverage their existing data more effectively and obtain insights that give them a competitive edge.

How Data Mining Evolved? (Time Line)
- 1960s: Data Collection
- 1970s-80s: RDBMS
- 1990s: OLAP and DW
- Late 1990s to now: Data Mining

Topic 2: Decision Processing Overview
Decision processing systems, and their underlying analytical applications, provide business users with the information they need to track and analyze business trends and to explore new business opportunities. As businesses become increasingly competitive and complex, effective decision processing systems are essential for success.

The Next Generation of Business Intelligence
A decision processing system analyzes business information captured from operational systems (back- and front-office, and e-business applications). Business information is distributed to business users via corporate intranets and extranets. The flow of data can be thought of as an information supply chain whose objective is to convert operational data into useful business information.
[Figure: The Decision Processing Information Supply Chain. Labels recoverable from the diagram: operational systems (e-business, back-office, and front-office applications), external data, staging area, DW and transaction information, business intelligence tools, analytic applications, collaborative office systems, business metrics, business decisions.]

Decision Processing: Four Tasks***
Extracting and transforming information
This involves capturing data from operational systems, transforming it into business information, and loading it into a data warehouse information store. Current extract templates on the market are primarily aimed at capturing data from ERP (Enterprise Resource Planning) transaction processing systems (for example, SAP Business Information Warehouse and PeopleSoft BPM data warehouse).
*** Mentioned in chapter 2

Decision Processing: Four Tasks (Cont’d)
Managing information
This task encompasses the maintenance of business information in information stores, and how these information stores are processed by business intelligence tools and analytic applications. The cornerstone of decision processing is data warehousing, and warehouse information stores should be organized and modeled in relational and multidimensional database products.

Decision Processing: Four Tasks (Cont’d)
Analyzing and modeling information
The traditional approach to decision processing is to build a data warehouse and supply business users with a set of business intelligence tools (query, reporting, OLAP, and data mining, for example) to process information in data warehouse information stores. A better approach is to employ turn-key, web-based analytic application packages that are designed to provide comprehensive analyses for the business area being researched. Key business metrics (for example, revenue dollars per sales rep per day) are useful.
Decision Processing: Four Tasks (Cont’d)
Distributing information
Business intelligence tools and analytic applications distribute information and the results of analysis operations to business users via standard graphical and Web interfaces. To help users uncover and organize this range of business information, an enterprise information portal (EIP) is required. An EIP provides a single point of entry to any piece of business information, no matter where it resides. The main components of an EIP are an information assistant (Web browser interface), an information directory, and a subscription facility.

Decision Making Under Risk
Decisions are made under three sets of conditions:
- Certainty: the decision makers know everything in advance of making the decision.
- Uncertainty: the decision makers know nothing about the probabilities or the consequences of decisions.
- Risk: the decision makers can estimate the probabilities of the possible outcomes but do not know in advance which outcome will occur.

Decision-Making Style
Decision-making styles of users are categorized as either analytic or heuristic.

Analytic and Heuristic Decision Making
Analytical decision maker:
- Learns by analyzing
- Uses step-by-step procedure
- Values quantitative information and models
- Builds mathematical models and algorithms
- Seeks optimal solution
Heuristic decision maker:
- Learns by acting
- Uses trial and error
- Values experiences
- Relies on common sense
- Seeks completely satisfying solution

Topic 3: Data Mining, What’s it?
Data mining has been defined as “a decision support process in which a search is made for patterns of information in data”. To detect patterns in data, data mining uses sophisticated statistical analysis and modeling technologies to uncover useful relationships hidden in databases. It predicts future trends and behavior, allowing businesses to make predictive, knowledge-driven decisions.

Data Mining, What’s it?
- The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
- Involves analysis of data and use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

Data Mining, What’s it?
- Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
- Patterns and relationships are identified by examining the underlying rules and features in the data.
- Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.

Data Mining, What’s it?
- Starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired and extended to larger sets of data.
- Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
- Relatively new technology, but already used in a number of industries.

Topic 4: Data Mining vs. Data Warehousing
Data mining does not require that a data warehouse be built. Often, data can be downloaded from the operational files to flat files that contain the data ready for the data mining analysis. Data mining can be implemented rapidly on existing software and hardware platforms. Data mining tools can analyze massive databases to deliver answers to questions such as, “Which customers are most likely to respond to my next promotional mailing, and why?”

Data Mining vs. Data Warehousing
- A major challenge in exploiting data mining is identifying suitable data to mine.
- Data mining requires a single, separate, clean, integrated, and self-consistent source of data.
- A data warehouse is well equipped for providing data for mining.
- Data quality and consistency are prerequisites for mining, to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.

Data Mining vs. Data Warehousing
- It is advantageous to mine data from multiple sources to discover as many interrelationships as possible.
  Data warehouses contain data from a number of sources.
- Selecting relevant subsets of records and fields for data mining requires the query capabilities of the data warehouse.
- Results of a data mining study are useful if there is some way to further investigate the uncovered patterns. Data warehouses provide the capability to go back to the data source.

Topic 5: How Data Mining Works?
How exactly is data mining able to tell you important things that you didn’t know, or what is going to happen next? The technique used in data mining is called predictive modeling, which in a broad sense is a knowledge discovery process based on relationships and patterns. Modeling is the act of building a model in one situation where you know the answer and then applying it to another situation where you don’t.

Examples of Applications of Data Mining via relationships and patterns
Retail / Marketing
- Identifying buying patterns of customers
- Finding associations among customer demographic characteristics
- Predicting response to mailing campaigns
- Market basket analysis
Banking
- Detecting patterns of fraudulent credit card use
- Identifying loyal customers
- Predicting customers likely to change their credit card affiliation
- Determining credit card spending by customer groups
Insurance
- Claims analysis
- Predicting which customers will buy new policies
Medicine
- Characterizing patient behaviour to predict surgery visits
- Identifying successful medical therapies for different illnesses

Examples of Applications of Data Mining via relationships and patterns (Cont’d)
- Customer profiling: characteristics of good customers are identified with the goals of predicting who will become one and helping marketers target new prospects. Targeting specific marketing promotions to existing and potential customers offers similar benefits.
- Market-basket analysis: with data mining, companies can determine which products to stock in which stores, and even how to place them within a store.
- Customer relationship management: by determining the characteristics of customers who are likely to leave for a competitor, a company can take action to retain those customers, which is usually far less expensive than acquiring new ones.
- Fraud detection: with data mining, companies can identify potentially fraudulent transactions before they happen.

Topic 6: Data Mining Operations and Associated Techniques
In previous foils, predictive modeling in essence includes other operations shown in the above table.
- Descriptive: “The dealer sold 200 cars last month.” Operational (OLTP).
- Explanatory: “For every increase of 1% in the interest rate, auto sales decrease by 5%.” Traditional DW, OLAP.
- Predictive: predictions about future buyer behavior. Data mining.

Level of Modeling vs. Level of Analytical Processing
Descriptive:
- Simple queries and reports
- Normalized tables
Explanatory:
- “What if” processing
- Analyze what has previously occurred to bring about the current state of the data
- Denormalized tables
- Roll-up; drill down
Predictive:
- Determine if any patterns exist by reviewing data relationships
- Denormalized tables plus statistical analysis / artificial intelligence
- Classification and value prediction

Predictive Modelling
- Similar to the human learning experience in using observations to form a model of the important characteristics of some phenomenon.
- Uses generalizations of the ‘real world’ and the ability to fit new data into a general framework.
- Can analyze a database to determine essential characteristics (a model) of the data set.

Predictive Modelling
- The model is developed using a supervised learning approach, which has two phases: training and testing.
- Training builds a model using a large sample of historical data called a training set.
- Testing involves trying out the model on new, previously unseen data to determine its accuracy and physical performance characteristics.

Predictive Modelling
- Applications of predictive modelling include customer retention management, credit approval, cross selling, and direct marketing.
- Two techniques are associated with predictive modelling, distinguished by the nature of the variable being predicted:
  A. classification
  B. value prediction

A predictive modeling example: statistical analysis of actual sales (dollars and quantities) relative to signage variables
- Signage variables: content, frequency, depth, focus, scale, length, location.
- Statistical analysis: correlation, regression, experiment design, optimization.
- Now it goes into real-time analysis.

Predictive Modelling - Classification
- Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
- Two specializations of classification: tree induction and neural induction.

Example of Classification using Tree Induction
Customer renting property > 2 years?
  No: Rent property
  Yes: Customer age > 45?
    No: Rent property
    Yes: Buy property

Example of Classification using Neural Induction
Each processing unit (circle) in one layer is connected to each processing unit in the next layer by a weighted value, expressing the strength of the relationship. The network attempts to mirror the way the human brain works in recognizing patterns by arithmetically combining all the variables with a given data point. In this way, it is possible to develop nonlinear predictive models that ‘learn’ by studying combinations of variables and how different combinations of variables affect different data sets.
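The two classification examples above can be sketched in Python. This is only an illustration under stated assumptions: the function names `classify_customer` and `layer_forward` are hypothetical, the thresholds (2 years, age 45) come from the tree on the slide, and the sigmoid activation, bias terms, and sample weights are assumptions that the slide does not specify. The first function applies an already-built tree; it does not perform the induction (learning) step itself.

```python
import math


def classify_customer(years_renting: float, age: int) -> str:
    """Apply the example tree: test rental duration first, then age."""
    if years_renting <= 2:
        return "Rent property"
    if age <= 45:
        return "Rent property"
    return "Buy property"


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weighted connection from input unit i to
    output unit j. Each weighted sum is passed through a sigmoid
    (an assumed activation), which makes the model nonlinear.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


if __name__ == "__main__":
    # A customer who has rented for 3 years and is 50 years old.
    print(classify_customer(years_renting=3, age=50))
    # Two inputs feeding two units; the weights are arbitrary sample values.
    print(layer_forward([0.5, 1.0], [[0.4, -0.2], [0.1, 0.3]], [0.0, 0.0]))
```

In a trained network the weights would be adjusted during the training phase so that the outputs match the known classes in the training set; here they are fixed only to show how the values are combined.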
Predictive Modelling - Value Prediction
- Used to estimate a continuous numeric value that is associated with a database record.
- Uses the traditional statistical techniques of linear regression and non-linear regression.
- Relatively easy to use and understand.

Predictive Modelling - Value Prediction
- Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
- The problem is that the technique only works well with linear data and is sensitive to the presence of outliers (i.e., data values that do not conform to the expected norm).

Predictive Modelling - Value Prediction
- Although non-linear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
- Statistical measurements are fine for building linear models that describe predictable data points; however, most data is not linear in nature.

Predictive Modelling - Value Prediction
- Data mining requires statistical methods that can accommodate non-linearity, outliers, and non-numeric data.
- Applications of value prediction include credit card fraud detection and target mailing list identification.

Database Segmentation
- The aim is to partition a database into an unknown number of segments, or clusters, of similar records.
- Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.

Database Segmentation
- Less precise than other operations and thus less sensitive to redundant and irrelevant features.
- Sensitivity can be reduced by ignoring a subset of the attributes that describe each instance or by assigning a weighting factor to each variable.
- Applications of database segmentation include customer profiling, direct marketing, and cross selling.
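As a rough illustration of database segmentation, the sketch below groups records with k-means, one common clustering method. Several things here are assumptions rather than part of the slides: the slides do not name k-means, the function name `kmeans` and the sample (age, income) records are hypothetical, and k-means needs the number of segments `k` to be chosen up front, whereas the slides describe the number of segments as unknown.

```python
def kmeans(points, k, iterations=20):
    """Partition points into k clusters by squared Euclidean distance.

    The first k points are used as starting centroids so the
    result is deterministic (a simplification for this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid.
        for idx, point in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(point, centroids[c])
                ),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customer records: (age, annual income).
    customers = [(25, 20000), (60, 90000), (27, 22000), (58, 88000)]
    labels, centroids = kmeans(customers, k=2)
    print(labels)
    print(centroids)
```

Because income is numerically much larger than age, it dominates the distance here. Assigning a weighting factor to each variable, as the slide suggests, or rescaling the columns before clustering is the usual way to control that.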
[Figure: Example of Database Segmentation using a Scatter plot]

Database Segmentation
Associated with demographic or neural clustering techniques, distinguished by:
- Allowable data inputs
- Methods used to calculate the distance between records
- Presentation of the resulting segments for analysis

[Figure: Example of Database Segmentation using a Visualization]

Link Analysis
- Aims to establish links (associations) between records, or sets of records, in a database.
- There are three specializations:
  - Associations discovery
  - Sequential pattern discovery
  - Similar time sequence discovery
- Applications include product affinity analysis, direct marketing, and stock price movement.

Link Analysis - Associations Discovery
- Finds items that imply the presence of other items in the same event.
- Affinities between items are represented by association rules.
- e.g. ‘When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.

Link Analysis - Sequential Pattern Discovery
- Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
- e.g. Used to understand long-term customer buying behaviour.

Link Analysis - Similar Time Sequence Discovery
- Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate.
- e.g. Within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.

Deviation Detection
- Relatively new operation in terms of commercially available data mining tools.
- Often a source of true discovery because it identifies outliers, which express deviation from some previously known expectation and norm.

Deviation Detection
- Can be performed using statistics and visualization techniques or as a by-product of data mining.
- Applications of deviation detection include fraud detection in the use of credit cards and insurance claims, quality control, and defects tracing.

A Summary: Data-Driven Techniques*
- Data Visualization
- Decision Trees
- Clustering
- Factor Analysis
- Neural Network
- Association Rules
- Rule Induction
* Based on Sakhr Youness’s book “Professional Data Warehousing with SQL Server 7.0 and OLAP Services”

Data Visualization
A pie chart showing the sales of a product by region is sometimes much more effective than presenting the same data in a text or tabular form.
[Figure: pie chart of sales by region. Regions shown: Northeast, South, North, West, East. Slice values shown: 9%, 11%, 39%, 21%, 20%.]

Decision Tree
[Figure: decision tree example]

Cluster Analysis
[Figure: cluster analysis example with three income segments: first segment (high income > 8,000), second segment (8,000 > middle income > 3,000), third segment (low income < 3,000). Attribute labels shown: have children, married, last car is a used one, own car.]

Factor Analysis
Unlike cluster analysis, factor analysis builds a model from data. The technique finds underlying factors, also called “latent variables”, and provides models for these factors based on variables in the data. For example, a software company is considering a survey to find out the nine most perceived attributes of one of its products. It might group these attributes into categories such as service for technical support, and availability for training and a help system. Factor analysis is also used for grouping together products based on a similarity of buying patterns, so that vendors may bundle several products as one and sell them together at a lower price than the sum of their individual prices.

Neural Networks
[Figure: neural network example]

Association Rules
Association models are models that examine the extent to which values of one field depend on, or are produced by, values of another field. These models are often referred to as Market Basket Analysis when they are applied to retail industries to study the buying patterns of their customers, especially in grocery and retail stores that issue their own credit cards.
Charging against these cards gives the store the chance to associate the purchases of customers with their identities, which allows it to study associations, among other things.

Rules Induction
This is a powerful technique that involves a large number of rules, expressed as a set of “if...then” statements, in the pursuit of all possible patterns in the dataset. For example: if the customer is male, is between 30 and 40 years of age, and has an income of less than $50,000 and more than $20,000, then he is likely to be driving a car that was bought new.

A Summary: Theory-Driven Techniques
- Correlations
- T-Tests
- Analysis of Variance
- Linear Regression
- Logistic Regression
- Discriminant Analysis
- Forecasting Methods

Topic 7: The Data Mining Process
- Define the problem.
- Select the data.
- Prepare the data.
- Mine the data.
- Deploy the model.
- Take business action.
- Are you ready for Data Mining?

Define the problem
A successful data mining initiative always starts with a well-defined project. To ensure that the project produces incremental value, include an assessment of the status quo solution and a review of technology, organization, and business processes.

Select the data
This step involves defining your data source (not every data source and record is required). The data is usually extracted from the source system to a separate server.

Prepare the data
This step represents up to 80 percent of the total project effort. For data mining, the data must reside in one flat table (each record has many columns). In addition to being the most time consuming, this step is also the most critical. The resulting models are only as good as the data used to create them.

Mine the data
Typically the easiest and shortest phase, this step involves applying statistical and AI tools to create mathematical models. Data mining typically occurs on a server separate from the data warehouse and other corporate systems.
Deploy the Model
Model deployment is the process of implementing the mathematical models into operational systems to improve business results.

Take Business Action
Use the deployed model to achieve improved results for the business problem identified at the beginning of the process.

[Figure: Steps to Implement Data Mining. Labels recoverable from the diagram: Prior Knowledge, Information, Discovery (patterns, relations, associations, etc.), Model, Validation, Deployment.]

ARE YOU READY FOR DATA MINING?
Just because you have a data warehouse doesn’t mean you’re necessarily ready for data mining. Much of the work our company does in the data mining arena has more to do with data mining readiness assessment than with actually performing data mining.

Metrics you can use to gauge your data mining readiness
- Do you have a staff of experienced knowledge workers?
- Do you have the data?
- Do you have marketing processes in place that can use this data?
- Do you have a business champion who can embrace the process and results?
- Do you have the technology infrastructure to support advanced analysis?

Topic 8: Data Mining Tools
Data mining tools are typically classified by the type of algorithm they use to identify hidden patterns. There are many different algorithms in use, but the four most popular are association, sequence, clustering (or segmentation), and predictive modeling.

Data Mining Tools
There are a growing number of commercial data mining tools on the marketplace. Important characteristics of data mining tools include:
- Data preparation facilities
- Selection of data mining operations
- Product scalability and performance
- Facilities for visualization of results

Data Mining vs. OLAP
They are two separate breeds of analysis with entirely different objectives, not to mention tools, skill sets, and implementation methods.

Data Mining
With canned reports, ad hoc querying, and OLAP, the end user defines a hypothesis and determines which data to examine.
With data mining, the tool identifies the hypothesis, and it actually tells the user where in the data to start the exploration process.

Data Mining
Rather than using SQL to filter out values and methodically reduce the data into a concise answer set, data mining uses algorithms that exhaustively review the relationships among data elements to determine if any patterns exist. The whole purpose of data mining is to yield new business information that a business person can act on.

OLAP vs. Data Mining Tools
OLAP tools:
- Are ad hoc, shrink-wrapped tools that provide an interface to data
- Are used when you have specific known questions
- Look and feel like a spreadsheet that allows rotation, slicing, and graphics
- Can be deployed to a large number of users
Data mining tools:
- Are methods for analyzing multiple data types: regression trees, neural networks, genetic algorithms
- Are used when you don’t know what the questions are
- Are usually textual in nature
- Are usually deployed to a small number of analysts

Data Mining Tools: ASSOCIATION
Association, also frequently referred to as "affinity analysis," reviews numerous sets of items and looks for common groupings. An example of association is market basket analysis, which involves reviewing the products that consumers purchase in a single trip to the grocery store.

ASSOCIATION
- Finds items that imply the presence of other items in the same event.
- Affinities between items are represented by association rules.
- e.g. ‘When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.

Data Mining Tools: SEQUENCE
Sequential analysis helps data miners identify a set of order-specific items or events. Association identifies the existence of patterns or groups of items; sequential analysis identifies the order of those patterns or groups of items.
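The difference between association and sequence can be made concrete with a small sketch. The function names and the sample data below are hypothetical. On the usual reading of association rules, the 40% figure in the rental example corresponds to the rule's confidence and the 35% figure to how often the association occurs (its support); the slides themselves do not use those terms.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction
    that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


def followed_by(sequences, first, later):
    """Fraction of event sequences in which `first` occurs and
    `later` occurs at some point after it (order matters)."""
    hits = 0
    for seq in sequences:
        if first in seq and later in seq[seq.index(first) + 1:]:
            hits += 1
    return hits / len(sequences)


if __name__ == "__main__":
    # Hypothetical shopping baskets: order inside a basket is irrelevant.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["milk"],
        ["bread", "milk", "butter"],
    ]
    print(support(baskets, ["bread", "milk"]))
    print(confidence(baskets, ["bread"], ["milk"]))

    # Hypothetical purchase histories: order of events is what is measured.
    histories = [
        ["house", "cooker"],
        ["cooker", "house"],
        ["house", "freezer", "cooker"],
    ]
    print(followed_by(histories, "house", "cooker"))
```

The first two functions ignore order and only ask whether items appear together, while `followed_by` counts a history only when the second event comes after the first.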
SEQUENCE
- Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
- e.g. Used to understand long-term customer buying behavior.

Link Analysis - Similar Time Sequence Discovery
- Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate.
- e.g. Within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.

Data Mining Tools: CLUSTERING
Cluster analysis lets the data miner assemble data into unforeseen groups containing similar characteristics. Also known as "segmentation," this type of data mining is probably the most widely used.

CLUSTERING
- The aim is to partition a database into an unknown number of segments, or clusters, of similar records.
- Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.

Data Mining Tools: PREDICTIVE MODELING
As the name implies, predictive modeling involves developing a model from historical data for predicting a future event. The power of predictive modeling engines is that they can use a broad range of data attributes to identify future behavior. Both cluster analysis and predictive modeling tools identify distinct groups of items with common attributes; the difference is that predictive modeling focuses on the likelihood of a particular outcome for a particular group.

Topic 9: Data Mining Applications for CRM
- Which customers are most profitable to me? Why?
- What promotions are most effective? For which customers?
- What kind of customers will be interested in my new product?
- What customers are at risk to defect to my competitor?
- How do I identify prospects with the greatest profit potential?
Customer information is rapidly becoming a company’s most important asset to answer these questions.
However, to answer these questions in broad generalities is not enough. Each customer must be analyzed and potentially treated uniquely. Customer relationship management provides the framework for analyzing customer profitability and improving marketing effectiveness.

Customer Relationship Management - Framework
Many organizations have collected and stored a wealth of data about their customers, suppliers, and business partners. However, the inability to discover valuable information hidden in the data prevents these organizations from transforming this data into knowledge. The business desire is, therefore, to extract valid, previously unknown, and comprehensible information from large databases and use it for profit. To fulfill these goals, organizations need to follow these steps:
- Capture and integrate both the internal and external data into a comprehensive view that encompasses the whole organization.
- “Mine” the integrated data for information.
- Organize and present the information with knowledge for decision-making.

Customer Relationship Management - Framework
From the architecture point of view, the entire CRM framework can be classified into three key components:
- Operational CRM: the automation of horizontally integrated business processes, including customer touch-points, channels, and front-back office integration.
- Analytical CRM: the analysis of data created by the Operational CRM.
- Collaborative CRM: applications of collaborative services including e-mail, personalized publishing, e-communities, and similar vehicles designed to facilitate interactions between customers and organizations.
CRM Architecture
[Figure: CRM architecture diagram. Labels recoverable from the diagram include: Business Rules and Metadata Management; Data Sources (Market Data, Contact History, Transaction History, Other External Data); ETL Tools; Data Store and Data Marts (Campaign Mgt, Call Center, Marketing, Reporting); Decision Support Applications (Campaign Mgt, Contact Mgt, Customer Service Analytics, Marketing Analytics, Data Mining, Reporting); Communication Channels (Direct Mails, Call Center, Customer Service Center, Internet, E-mail); Workflow Management.]

CRM - The Business Perspective
Tools and technologies will be applied to these real CRM business problems:
- Customer Profitability: provides a blueprint for how to define and use customer profitability as the bedrock for your CRM processes.
- Customer Acquisition: shows how to use data mining to acquire new customers in the most profitable way possible.
- Customer Cross-selling: details how the technology architecture can be used to increase the value of existing customers by selling more to them.
- Customer Retention: uses a case study from the telecommunications industry to show how to execute successful CRM systems to retain your profitable customers.
- Customer Segmentation: provides the business methodology of how to segment and manage your customers in a consistent and repeatable way across the enterprise.

Information Mining and Knowledge Discovery for Effective CRM
In the current and emerging competitive and highly dynamic business environment, only the most competitive companies will achieve sustained market success. In order to capitalize on business opportunities, these organizations will distinguish themselves by the capacity to leverage information about their marketplace, customers, and operations. A central part of this strategy for long-term, sustained success will be an active information repository: an advanced data warehouse, in which information from various applications or parts of the business is coalesced and understood.
Information Mining
The shortest path from complex data to knowledge discovery is information mining; the term is used instead of data mining to reflect the rich variety of forms that the information required for business intelligence can take. Information mining implies using powerful and sophisticated tools to do the following:
- Uncover associations, patterns, and trends
- Detect deviations
- Group and classify information
- Develop predictive models

Information Mining
From a technical perspective, the real keys to successful information mining are its algorithms: complex mathematical processes that compare and correlate data. Algorithms enable an information mining application to determine who the best customers for the business are or what they like to buy. They can also determine at what time of day and in what combinations customers buy, and how an organization can optimize inventory, pricing, and merchandising in order to retain these customers and cause them to buy more, at increased profit margins. A large volume of information is stored in non-numeric forms: documents, images, and video files.

Text Mining and Knowledge Management
Text mining is a subset of information mining technology that, in turn, is a component of a more general category of Knowledge Management (KM). Knowledge, in this case, refers to the collective expertise, experiences, know-how, and wisdom of an organization. In the business world, knowledge is represented not only by the structured data found in traditional databases, but also by a wide variety of unstructured sources such as word-processing documents, memos and letters, e-mail messages, news feeds, Web pages, and so forth.

Text Mining and Knowledge Management
Unlike data mining, text mining works with information stored in an unstructured collection of text documents. Specifically, online text mining refers to the process of searching through unstructured data on the Internet and deriving some meaning from it.
Text mining goes beyond applying statistical models to data files; it uncovers relationships in a text collection and leverages the creativity of the knowledge worker to explore these relationships and discover new knowledge.

Text Mining Technologies

There are two key technologies that make online text mining possible.

Internet Searching - It has been around for quite a few years. Yahoo, AltaVista, and Excite are three of the earliest services. Search engines (and discovery services) operate by indexing the content of a particular Web site and allowing users to search the indexes. Although useful, the first generation of these tools often returned wrong results because they did not correctly index the content they retrieved. Advances in text mining applied to Internet searching resulted in online text mining, which represents the new generation of Internet search tools. With these products, users can gain more relevant information by processing a smaller number of links, pages, and indexes.

Text Mining Technologies (Cont'd)

Text Analysis - It has been around longer than Internet searching. Indeed, scientists have been trying to make computers understand natural languages for decades; text analysis is an integral part of these efforts. The automatic analysis of text information can be used for several different general purposes:

1. To provide an overview of the contents of a large document collection; for example, finding significant clusters of documents in a customer feedback collection could indicate where a company's products and services need improvement.
2. To identify hidden structures between groups of objects; this may help to organize an intranet site so that related documents are all connected by hyperlinks.
3. To increase the efficiency and effectiveness of a search process to find similar or related information; for example, to search articles from a news service and discover all unique documents that contain hints on possible trends or technologies that have so far not been mentioned in other articles.
4. To detect duplicate documents in an archive.

Text Mining Technologies - Applications

1. E-mail management. A popular use of text analysis is message routing, in which the computer "reads" the message to decide who should deal with it. (Spam control is another good example.)
2. Document management. By mining documents for meaning as they are put into a document repository, a company can establish a detailed index that allows relevant documents to be located at any time.
3. Automated help desk. Some companies use text mining to respond to customer inquiries. Customers' letters and e-mails are processed by a text mining application.
4. Market research. A market researcher can use online text mining to gather statistics on the occurrences of certain words, phrases, concepts, or themes on the World Wide Web. This information can be useful for establishing market demographics and demand curves.
5. Business intelligence gathering. This is the most advanced use of text mining. (See next slide)

Blogger

Blogger is one of the most popular online blogging tools. It works with any browser and is free, well designed, and easy to use. Millions of people are changing their information acquisition habits, and the Web log, or "blog", has become a popular source.
Title: Publishing a Blog with Blogger / by Elizabeth Castro. Berkeley, Calif.: Peachpit, 2005.
Title: Blog: Understanding the Information Reformation That's Changing Your World / Hugh Hewitt. Nashville, Tenn.: Nelson Books, c2005.
Weblogs (ISBN 0321321235)

CRM in the e-Business World

As e-business continues to mature and effect radical changes throughout all aspects of business, the focus of new e-business-enabled application software will shift away from narrowly defined commerce platforms toward a broader vision of managing customer relationships. A new model that Forrester Research calls eRelationship Management (eRM) is defined as follows:

"A Web-centric approach to synchronizing customer relationships across communication channels, business functions, and audiences"

CRM in the e-Business World (Cont'd)

To implement this new e-business CRM model, companies should do the following:

Create a dynamic customer context that can address every customer interaction, as distinct from a view of the customer constructed only from data contained in the applications. This can be achieved by collecting and organizing customer data, calculating high-level metrics for each customer (i.e., customer profitability, satisfaction, and churn potential), and assembling and delivering dynamic context to customer touch points.
Generate consistent, custom responses by delivering a consolidated rules engine for routing, workflow, personalization, smart navigation, and consistent treatment of customers.
Build and maintain a Content Directory that points to company, product, and business partner content, and make it available to employees, business partners, and customers.

Topic 10: Data Mining From US Government Printing Office

Washington, March 25, 2003. Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census. Oversight hearing on "Data Mining: Current Applications and the Future Possibilities". Available via www.gpo.gov/congress/house or www.house.gov/reform.
Background: The hearing will explore instances where data mining technology is currently employed, examine the benefits and the pitfalls, and discuss the potential uses of data mining at the Federal level of government. A specific focus is on the privacy and abuse concerns surrounding this technology.

Data Mining: Current Applications and the Future Possibilities

Data mining technology has been utilized successfully for many years in both the private and public sectors to identify and analyze useful data that would otherwise be overlooked or inaccessible. Government agencies have also used data mining techniques quite extensively to identify and eliminate fraud, waste, and abuse. States work with localities by providing them access to their data sources. This has allowed local and state enforcement agencies to zero in on tax evaders, perpetrators of financial crimes, or those conducting any number of fraudulent activities. At the federal level, the Treasury Department uses this technology to identify and prosecute money laundering schemes, the IRS to track down delinquent taxpayers, and U.S. Customs to identify drug trafficking activities at U.S. borders.

Topic 11: Data Mining Techniques - A Summary

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a database.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
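The genetic algorithm definition above can be made concrete with a small sketch. It is not taken from the lecture: the bit-string encoding, the toy fitness function (count of 1-bits), and all parameter values are assumptions chosen only to show combination, mutation, and selection working together.

```python
import random


def fitness(bits):
    """Toy objective (an assumption): the number of 1-bits in the string."""
    return sum(bits)


def crossover(a, b, rng):
    """Genetic combination: splice two parents at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    """Run the loop and return (best individual, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Natural selection: only the fitter half survives, unchanged.
        survivors = pop[: pop_size // 2]
        # The rest of the next generation is bred from the survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng),
                   rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next; a real application would replace `fitness` with a business objective such as the score of a candidate rule set.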
Data Mining Techniques - A Summary (Cont'd)

Operations and their associated techniques:

Predictive modeling: classification, value prediction
Database segmentation: demographic clustering, neural clustering
Link analysis: association discovery, sequential pattern discovery, similar time sequence discovery
Deviation detection: statistics, visualization

Two Types of Data Mining Modeling - Verification and Discovery

The verification model utilizes a process that looks in a database to detect trends and patterns in data that will help answer some specific questions about the business. In this mode, the user generates a hypothesis about the data, issues a query against the data, and examines the results of the query. Either the results verify the hypothesis or the user decides that the hypothesis is not valid.

Verification Model

In this model, very little information is created in the extraction process: either the hypothesis is verified or it is not. Common tools used in this mode are queries, multidimensional analysis, and visualization. What these have in common is that the user is essentially "guiding" the exploration of the data being inspected.

Discovery Model

A more popular model is the discovery model, which utilizes a process that looks in a database to discover patterns and/or predict future ones. The discovery model is divided into two modes: "Descriptive" and "Predictive".

Discovery Model - Descriptive Mode

The descriptive mode finds hidden patterns without a predetermined idea or hypothesis about what the patterns may be. In other words, the data mining software takes the initiative in finding what the interesting patterns are, without the user thinking of the relevant questions first. In this mode, information is created about the data with very little or no guidance from the user. The exploration of the data is done in such a way as to yield as many useful facts about the data as possible in the shortest amount of time.
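The difference between the verification model and the descriptive mode of the discovery model can be sketched in a few lines. The market-basket data, the function names, and the 0.4 support threshold below are all hypothetical; the pair counting is a deliberately simple stand-in for association discovery, not a full algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def verify(transactions, item_a, item_b):
    """Verification model: the user supplies the hypothesis 'item_a and
    item_b sell together' and gets back the share of baskets with both."""
    hits = sum(1 for t in transactions if item_a in t and item_b in t)
    return hits / len(transactions)


def discover_pairs(transactions, min_support=0.4):
    """Descriptive discovery: no hypothesis is supplied. Every item pair is
    counted and those meeting the support threshold are reported."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    print(verify(TRANSACTIONS, "bread", "milk"))
    print(discover_pairs(TRANSACTIONS))
```

In `verify` the user has to think of the bread-and-milk question first; `discover_pairs` surfaces the beer-and-chips pair without anyone asking about it, which is the sense in which the software "takes the initiative".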
Discovery Model - Predictive Mode

In the predictive mode, patterns discovered from the database are used to predict future patterns or trends. Predictive modeling allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database.

In comparing the two models, one can state that verification can be very inefficient, time-consuming, and costly, whereas discovery modeling can be very efficient and cost effective, is less dependent on user input, and can increase modeling accuracy.
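The idea of submitting a record with an unknown field and letting the system guess it can be illustrated as follows. This is only a sketch under stated assumptions: the customer records and field names are invented, and the majority vote among the best-matching historical records is a simple stand-in for whatever predictive model a real tool would build.

```python
from collections import Counter

# Hypothetical historical customer records with a known outcome field.
HISTORY = [
    {"plan": "prepaid", "region": "north", "churned": "yes"},
    {"plan": "prepaid", "region": "south", "churned": "yes"},
    {"plan": "contract", "region": "north", "churned": "no"},
    {"plan": "contract", "region": "south", "churned": "no"},
    {"plan": "prepaid", "region": "north", "churned": "no"},
]


def predict(history, record, target):
    """Guess record[target] by majority vote among the historical records
    that share the most known field values with `record`."""
    known = {k: v for k, v in record.items() if k != target and v is not None}

    def overlap(row):
        # Number of known fields on which the historical row agrees.
        return sum(1 for k, v in known.items() if row.get(k) == v)

    best = max(overlap(row) for row in history)
    votes = Counter(row[target] for row in history if overlap(row) == best)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    new_customer = {"plan": "contract", "region": "north", "churned": None}
    print(predict(HISTORY, new_customer, "churned"))
```

The caller never states a hypothesis about churn; the guess comes entirely from patterns already present in the stored records.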