E Business Opportunities

Shared by: Jordanpugh
posted: 8/30/2009

Data Mining Meets E-Business: Opportunities and Challenges
Umeshwar Dayal (with colleagues from the Data Mining Solutions and E-Business Process Management research groups)
Hewlett-Packard Labs, Palo Alto, CA
dayal@hpl.hp.com

Outline
• Context: The E-Business Landscape and Data Mining Opportunities
• Four Cases
  – Customer Relationship Management
  – Catalog Creation and Service Discovery
    • Text Categorization
    • Information Extraction from Semi-structured Text
  – Business Process Intelligence
• Conclusions

The E-Business Landscape
[Figure: the intelligent enterprise (manufacturing, logistics, ERP) connected over the Internet to customers on the sell side (Customer Relationship Management: sales, marketing, support, …) and to suppliers and partners on the buy side (Supplier/Partner Management: design, procurement, outsourcing, supply chain, …).]
"The worldwide business-to-business Internet commerce market will boom to $8.5 trillion in 2005 despite economic slowdowns. B-to-B Internet commerce sales totaled more than $433B in 2000, up 189% from 1999, and are expected to more than double to $919B this year." [Gartner Report]
An intelligent enterprise in the e-services marketplace must achieve automation, integration, and optimization across all customer relationship, supply chain, and internal business processes. It does this by gathering, managing, and analyzing large amounts of data on its customers, products, services, operations, suppliers, and partners, and on all the transactions in between.

Data Mining Landscape
• Commercial activity: has shifted from horizontal software and toolkits to vertical applications, system integration, and services.
• Many data mining opportunities exist for the intelligent enterprise in the e-business marketplace:
  – Intelligent customer relationship management: segmentation, personalization, marketing, support
  – Supply chain management: procurement, dynamic discovery and bundling of services, pricing
  – End-to-end optimization of business processes: from customer demand through ERP and manufacturing to procurement
• Research: must shift from an obsession with algorithms to developing solutions enriched by data mining ("invisible, embedded data mining", "closing the loop").

What Industry Analysts Are Saying
• Top CIO priorities, 1999 (Gartner Group):
  – Business: improve customer service capabilities; develop new distribution channels; improve targeted marketing abilities; enable knowledge transfer; streamline internal business processes
  – Technical: build intranet and extranet capabilities; exploit data warehousing and data mining; implement e-commerce; build IT infrastructure; improve network and system security
• Market demand is very large.
• E-intelligence spending in 2003 is estimated to be $31B (IDC).
• It is the next wave in IT spending and will eventually reach or exceed the ERP market (Merrill Lynch).
• The CRM analytic application market is forecast to grow at 54.1% per year through 2003 (IDC).
• By 2002, the number of data mining projects will grow by more than 300% to improve customer relationships and help enterprises listen to their customers (Gartner Group, 1999).
• By 2003, at least 90% of all consumer-intensive industries with e-point-of-service/sales will use data mining models to predict customer preferences (Gartner Group, 1999).
[Figure labels: interactive personalization, text mining, resource optimization]

Challenges
• Scalability: very high data volumes and data flow rates
  – Large retail site: 35,000 products, 4.2 billion transactions, tens to hundreds of TBs per year
  – Have to consider scalability of the whole architecture
• Complex structured, semi-structured, and unstructured data
• Data extraction, cleaning, and consolidation from many sources
  – Integrate data warehousing, on-line analytical processing (OLAP), and data mining
• Interactive, on-line mining
  – Incorporate real-time data streams, "live" updates, user interactions
  – Incremental analysis
  – Interactive visualization
• Integrate into complete solutions
  – Use the results of analysis and mining for decision making, e.g., marketing campaigns, adapting business processes, supply chain optimization

Case 1: Intelligent Customer Relationship Management
[Figure: CRM architecture. Components: customers; web server, content server, commerce server; business event log manager; campaign/promotion manager; business rules engine; web log; transaction DB; product catalog DB; customer data; external data; data warehouse; reporting, analysis and mining. Outputs: product/page recommendations; target marketing, promotions; customer profiling; customer/market segmentation; product affinity analysis.]

Data Mining for Intelligent CRM
• Data sources:
  – Web logs: page accessed, IP address, time, referring site, bytes, …
  – Event logs: ads seen, products seen, products added to shopping cart, products bought, abandoned shopping carts, …
  – Transaction database: customer ID, products ordered, time, quantity, price, …
  – Query logs: search terms used, documents returned, …
• Types of analysis:
  – Multidimensional analysis (profiling)
  – Association rules (product affinities)
  – Clustering, classification (segmentation)
  – Similarity (collaborative filtering)

OLAP-Based Profiling Architecture
[Figure: usage data is extracted, transformed, and loaded into the data warehouse (current usage table, profile table); OLAP servers compute usage-pattern cubes of individual customers, a profile cube, an updated profile cube, and a profile snapshot cube; results are stored back and used by reporting/analysis/visualization tools.]
• Typically, OLAP (On-Line Analytical Processing) is used as a front-end tool for analysis.
• OLAP servers provide memory management and efficient computation over data cubes.
• Traditionally, OLAP is intended for relatively static operation: periodic batch refresh of the warehouse, re-computation of data cubes, re-evaluation of queries and reports.
• We use OLAP servers as data summarization engines in a computational pipeline.
Q. Chen, M. Hsu, U. Dayal, "OLAP-Based Scalable Profiling of Customer Behaviour", First Intl. Conf. on Data Warehousing and Knowledge Discovery (DAWAK), 1999.

OLAP: Operations on Data Cubes
[Figure: a sales-volume cube (measure) with dimensions time (Jan, Feb, Mar), area (L.A., S.F., NYC), and product (Music, Books, Electronics). Dimension hierarchies: area (country, state, city); time (year, month, week, day, hour); product (category, product).]
• Represent data by multidimensional cubes: (hierarchical) dimensions and measures
• Dice, slice: select a sub-cube, e.g., sales where city = LA and month = Jan98
• Roll-up (summarize), drill-down (detail): e.g., total sales of books for the first quarter of '98 in CA
• Ad-hoc queries
• Flexible report types
• Powerful derivations: derived measures, e.g., profit = (sales − expense), across all dimensions
• Ranking: e.g., top 10% of cities by average quarterly sales of books

OLAP-Based Mining
• Enables powerful analysis and multi-level summarization of e-commerce data.
• Scalable to large data volumes and data flow rates.
• Supports continuous, incremental analysis:
  – Use the OLAP server as a compute engine: create only the cubes that are needed (cubes can be thought of as materialized views over data in the warehouse); use only the dimensions needed for a particular analysis; use binning to reduce the cardinality of the dimensions.
  – Store results back persistently in the data warehouse (RDB) to overcome data size limitations.
• OLAP scripts serve as a high-level language for multi-dimensional, multi-level data mining.
• Model customer profiles, patterns, similarity measures, and association rules as cubes:
  – compute them efficiently using cube operations in the OLAP server
  – evolve them incrementally in real time as new data flows in
  – multi-dimensional, multi-level analysis over cubes provides enhanced expressive power (e.g., richer association rules) by integrating OLAP-style drill-down and roll-up operations with data mining tasks.

Cube-Based Associations
• Association rules are represented as cubes:
  – can be generated by cube operations
  – can be maintained as cube cells
  – scalable to large data sets
• Allows definition of new kinds of multi-level, multi-dimensional association rules with enhanced expressive power:
  – Scoped association rules based on different elements:
    Cross-sale rule based on transactions (traditional shopping-basket analysis):
      x ∈ Transactions: contain_product(x, A) ⇒ contain_product(x, B)
    Cross-sale rule based on customers (regardless of whether the products were purchased in the same transaction):
      x ∈ Customers: buy_product(x, A) ⇒ buy_product(x, B)
  – Multi-dimensional rule:
      [x ∈ Customers: buy_product(x, 'A') ⇒ buy_product(x, 'B')] | customer_group = 'engineer', area = 'Los Angeles', time = 'Jan98'
  – High-level rule:
      [x ∈ Customers: buy_product(x, 'A') ⇒ buy_product(x, 'B')] | customer_group = 'engineer', area = 'California', time = 'Year98'

Cube-Based Association Rule Mining
[Figure: rules are derived from a base cube (customer × product, customers S1–S3, products P1–P3) through a chain of cubes: volume cube |X∧Y| (product × product2), population cube |X|, support cube |X∧Y| / |B|, and confidence cube |X∧Y| / |X|. In the example |B| = 3 and the population cube holds |X| = 3, 1, 2 for P1, P2, P3; support for P1∧P2 is 1/3 and for P1∧P3 is 2/3; confidence is 1/3 for P1 ⇒ P2, 2/3 for P1 ⇒ P3, and 1/1 for P2 ⇒ P1 and P3 ⇒ P1.]

OLAP-Based Profiling: Scalability Challenges
• Huge data volumes and data flow rates: a busy e-commerce site can generate hundreds of millions of events per day.
  – Solution: scale using parallel loading and analysis.
• Fine-grained analysis (e.g., individual customer profiling) requires very large, very sparse cubes.
  – Example: a newspaper web site had 48,128 customers × 10,432 referring sites × 18,085 pages × 24 hours per day ⇒ ~200 trillion cells!
  – The cube is compressed for storage, but the cube roll-up operation is very slow (~10,000 hours!).
  – Solution: careful design and optimizations yielded a 3-4 orders of magnitude improvement.

Scalability of Cube Rollup
• Dimension hierarchies:
  – ip: 63.211.140.164 → origin: CA
  – uri: exp.com/TODAY/topstory.html → subject: exp.com/TODAY/
• Typical cube roll-up operation (embedded total):
  – When the original cube has multiple large dimensions, a large number of additional cells is needed to hold the embedded totals.
  – In the above example, these subtotals occupy approximately 50 trillion cells in the rolled-up cube, out of a total of 267 trillion cells.
  – Although the OLAP engine compresses sparse cubes for efficient storage, the cells containing nulls must still be checked in some way during the roll-up operation.
• Rolling up such a cube as a whole is impractical.
[Figure labels: aggregates; basic measures]

Scaling: Huge, Sparse Cubes
[Figure: web log records are loaded by two loaders (Loader1, Loader2) into the basic volume cube (BVC: EXPvolume), which holds the basic measures, and the high diagonal cube (HDC: EXPvolume.high), which holds the aggregates (dimensioned subtotals).]
Solution: careful design + optimizations
• Maintain the high diagonal cube (HDC) separately from the basic volume cube (BVC).
• Populate by direct loading and binning, not by roll-up.
• Maintain relationships between HDC and BVC for drill-down.
• Compute intermediate aggregates on demand.
• High-profile cubes: limit dimension elements to those corresponding to cells with large counts.
• Yielded a 3-4 orders of magnitude improvement.
Q. Chen, U. Dayal, M. Hsu, "An OLAP-Based Scalable Web Analysis Engine", Proc. 2nd Intl. Conf. on Data Warehousing and Knowledge Discovery (DAWAK), 2000.
[Figure labels: web log records (WLR) update the BVC cells containing basic data; separate update of the cells containing aggregated data.]

Case 2: Text Categorization
[Figure: the call centre/help desk and the customer support portal produce FAQs, case histories, query logs, and web logs; data mining over them produces topic hierarchies.]
• Mine content and usage data:
  – Automatically build a topic hierarchy and categorize documents to assist in search.
  – Extract problems/FAQs, and recommend relevant documents.

Text Categorization Framework
[Figure: decision flow over TEXT (4 million text documents, content map) and LOG (100,000 queries/week, search terms, FAQs). If a taxonomy exists, learn a classifier from training data (manual work); if not, learn a taxonomy.]

Topic Hierarchy Creation & Text Categorization
• Mine content and usage (query log) data:
  – Automatically build a topic hierarchy and categorize documents to assist in search.
  – Extract problems/FAQs and relevant solution documents, and place them on the topic hierarchy.
[Figure: pipeline from content and usage data through data cleaning & transformation, clustering into topics, and evaluation and visualization (hot topics).]
• Extract key words and phrases
• Transform documents and query log records into vectors
• Cluster hierarchically
• Label each cluster with significant words and phrases
• Visualize as a hyperbolic tree for navigation/browsing

Challenges in Text Categorization
• Problem: documents are noisy, conversational, and poorly structured, replete with typos, abbreviations, jargon, and unconventional text (e.g., code fragments, tables).
• Difficult issues:
  – Normalization and cleaning
  – Sentence boundary detection and extraction of the most significant sections of the document
  – Feature selection
  – Scalable, incremental, robust clustering algorithms
  – Clustering techniques were effective in producing the leaf nodes of the taxonomy
  – Hierarchical clustering to produce the higher nodes of the taxonomy proved very difficult
  – Labeling the nodes of the taxonomy (with terms that are semantically meaningful to humans) proved very difficult
• Data mining as an aid to human experts, e.g., suggestions for expanding or modifying a taxonomy, generating "hot topics" for placement in a taxonomy, generating cross-index terms.

Toolkit for Normalization and Summarization
• Stage 1 (general cleaning): anomalies: typos, misspellings, abbreviations; effect: false word occurrences; functionality required: unify the representation of words; tools: Thesaurus Assistant, Normalizer.
• Stage 2 (task-specific cleaning): anomalies: code, dumps, cryptic tables; effect: they complicate sentence identification, possibly without adding value; functionality required: removal of code, dumps, and tables; tools: Code Remover, Table Remover.
• Stage 3 (extraction): functionality required: obtain a summary; tools: Sentence Identifier, Sentence Scorer.
M. Castellanos, J. Stinger, "A Practical Approach to Extracting Relevant Sentences in the Presence of Dirty Text", SIAM Data Mining Workshop on Text Mining, April 2001.
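The topic-hierarchy steps listed earlier (transform documents into vectors, cluster, label each cluster with significant words) can be sketched in a few lines. The slides do not name the weighting, similarity, or clustering method. This sketch therefore assumes TF-IDF weighting, cosine similarity, a flat single-link clustering with a hypothetical `threshold` parameter, and labels made of the heaviest centroid terms. It produces only flat clusters, roughly the leaf-node level that the slides report worked well; building higher levels and choosing human-meaningful labels remain the hard parts. All function names and the stopword list are illustrative.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "my", "not", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphanumeric tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Turn each document into a unit-length TF-IDF vector stored as a sparse dict."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Dot product, which equals cosine similarity because the vectors are unit length."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def single_link_clusters(vectors: list[dict[str, float]], threshold: float = 0.1) -> list[list[int]]:
    """Merge two clusters whenever any cross-cluster pair is at least `threshold` similar."""
    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = max(cosine(vectors[i], vectors[j]) for i in clusters[a] for j in clusters[b])
                if sim >= threshold:
                    clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
                    merged = True
                    break
            if merged:
                break
    return clusters


def label_cluster(member_ids: list[int], vectors: list[dict[str, float]], k: int = 2) -> list[str]:
    """Label a cluster with the k terms of largest summed weight (its centroid direction)."""
    centroid = Counter()
    for i in member_ids:
        centroid.update(vectors[i])  # adds the float weights term by term
    return [term for term, _ in centroid.most_common(k)]
```

The pairwise merge loop is quadratic or worse, so it only illustrates the idea. It would not scale to the 4 million documents mentioned above, which is exactly the "scalable, incremental, robust clustering" challenge the slide lists.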

Thesaurus Generation for Feature Engineering
• In many text mining techniques, the basic ingredient is the frequency of occurrence of words.
• Typos, misspellings, and abbreviations mislead the results: different orthographic representations of the same "word" are taken as different words,
• unless we add a "clean-up" preprocessing step to the text mining task: normalization.
Example variants found in the text: omniback, omni back, desc omniback, omniback 11.0, omniback 3.0, 10.20 omniback, omniback 3.00, omniback 3.1, omniback 3.10, omniback ii, omnibackii, omniback2, omniback gui, omniback db, omniback emer, omnibook, omniback 2.55

Automatically Indexing Document Collections

Hierarchical Classification
[Figure: example topic hierarchy. Root with children HP-UX, MPE, NT, Databases, System Software, …, Applications, Networking; lower-level nodes include Oracle, Sybase, and PowerPoint.]
• Goal:
  – Given a clean document, find the best class for it in the topic hierarchy.
  – If a document is misclassified, at least have it placed somewhere reasonable.
  – Some human verification/correction/training is available.
  – Ideally, automate this (4,000,000 documents).
• Challenges:
  – How wrong is wrong? Evaluating the coherence of the hierarchy
  – Unbalanced datasets
  – Taking advantage of the hierarchy
  – Can we avoid enormous training sets (co-training)?
  – Evolution of the hierarchy

Case 3: Information Extraction for Catalog Creation and Service Discovery
• Applications: parametric search, supply chain applications, service discovery, e.g.:
  "Find a processor with low power consumption @ 3.3V, operating at a clock speed > 50 MHz, with lead time < 6 weeks and cost < $35 @ qty = 10,000"
• Web content mining turns HTML or PDF documents on the WWW (e.g., data sheets published by vendors) into a structured product catalog:
  – Web navigation
  – Document structure recognition (e.g., table recognition in PDF)
  – Attribute extraction and tagging
  – XML formulation

Problem: Attribute Values May Be Found in Free Text, Lists, Tables, Diagrams

Solution: Model-Driven Content Mining Agents
• Product concept model: product family hierarchy, applicable attributes, thesaurus (e.g., synonyms, units, conversions)
• Document model: document structure (section, paragraph, table, etc.), where to find attributes, extraction rules (e.g., patterns)
• Vendor-specific inputs: vendor site URL, navigation rules (e.g., look for a table of contents and follow links, fill out a query form), vendor-specific dictionary and document model
• Alternative approach: wrapping web sites. This does not work well for very heterogeneous web sites, is more sensitive to restructuring of the pages, and does not work with PDF content.
[Figure: a domain model parser and domain-specific scripts drive a vendor catalog navigator, which starts from a vendor URL and finds data sheet URLs on the WWW; an extractor turns the data sheets into XML-tagged component attribute-value data stored in a component DB.]
M. Castellanos, J. Stinger, M. Lemon, M. Hsu, U. Dayal, P. Siegel, "Component Advisor: A Tool for Automatically Extracting Electronic Component Data from Web Datasheets", WWW7 Workshop on Reuse of Web-based Information, April 1998.

Extraction from Data Sheets: Problems
• First identify hidden structures (tables, lists, paragraphs) in the data. This is easier for HTML-tagged documents than for PDF documents, but 95% of the data sheets are in PDF.
• Existing PDF-to-HTML/XML conversion tools have font and formatting problems and do not handle tables.
• The content mining agent combines several heuristics:
  – Font analysis: exploit cues inherent in font usage to detect potential section headings, row and column labels in tables, etc.
  – Image analysis: histograms of pixel density
  – Geometric analysis: spacing between words on a line, alignment of words in columns, etc.
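The model-driven idea above separates a product concept model (synonyms, units, conversions) from extraction rules (patterns), and the result supports parametric search. A minimal sketch of that combination for attribute values in free text might look like the following. Everything concrete is hypothetical: the attribute names, synonym lists, canonical units (MHz, V, mW), and the single regular-expression rule. The sketch does not handle tables, lists, diagrams, or PDF structure recovery, which the slides identify as the hard part. Lead time and cost from the example query are also not modeled.

```python
import operator
import re

# Hypothetical fragment of a product concept model: attribute synonyms plus
# conversion factors to one canonical unit per attribute (MHz, V, mW).
CONCEPT_MODEL = {
    "clock_speed": {"synonyms": ["clock speed", "clock frequency"],
                    "units": {"khz": 0.001, "mhz": 1.0, "ghz": 1000.0}},
    "supply_voltage": {"synonyms": ["supply voltage", "vcc"],
                       "units": {"mv": 0.001, "v": 1.0}},
    "power": {"synonyms": ["power consumption", "power dissipation"],
              "units": {"mw": 1.0, "w": 1000.0}},
}

# Hypothetical extraction rule: "<synonym> [':' or '='] <number> <unit>".
VALUE = r"\s*[:=]?\s*(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)"


def extract_attributes(text: str) -> dict[str, float]:
    """Map attribute -> value in canonical units, using the first synonym that matches."""
    found = {}
    for attr, model in CONCEPT_MODEL.items():
        for synonym in model["synonyms"]:
            m = re.search(re.escape(synonym) + VALUE, text, re.IGNORECASE)
            if m and m.group("unit").lower() in model["units"]:
                factor = model["units"][m.group("unit").lower()]
                found[attr] = float(m.group("num")) * factor
                break
    return found


OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def parametric_search(catalog: dict[str, dict[str, float]], **constraints) -> list[str]:
    """Return parts whose extracted attributes satisfy every (op, bound) constraint."""
    return [part for part, attrs in catalog.items()
            if all(attr in attrs and OPS[op](attrs[attr], bound)
                   for attr, (op, bound) in constraints.items())]
```

A query such as `parametric_search(catalog, clock_speed=(">", 50), supply_voltage=("<=", 3.3))` then runs against the normalized catalog. Normalizing units at extraction time is what lets a "0.04 GHz" sheet and a "66 MHz" sheet be compared directly.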

Case 4: Business Process Intelligence
• Goal: improve the quality of enterprise business processes and services:
  – Internal quality, as perceived by the service provider (e.g., reduced operating costs)
  – External quality, as perceived by the user (e.g., better service)
• Enterprise business processes are automated by workflow engines.
[Figure: example approval process with nodes Initiate, Notify Requester of Initiation, Get Approval, Join, Get Next Approver, Notify Approver of Work, Get Approver Decision, Check Approval Status, Notify Final Decision, Done.]
• These engines monitor many aspects of process execution and service delivery:
  – who does what, when, and how long it takes
• They record data in audit logs that can be used to analyze, understand, and optimize processes.

Problem: Current Situation with Reporting Tools
[Figure: workflow design engineers, business process analysts, system administrators, IT managers, and business managers/analysts use (built-in or external) reporting tools over the workflow audit logs written by the workflow engine.]
• Writing the "right" queries is very difficult and time-consuming, e.g.:
  – What is the performance and outcome of activities executed on Fridays?
  – Which resources perform best for a given activity?
  – How does the relative performance of a resource change as a function of time?
• Dirty data, missing values, special codes
• Query performance is poor: complex queries involving joins and aggregation
• Little support for integrating other data sources or for multidimensional analysis
• No support for understanding the causes of problems, predicting problems, or optimizing processes.
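The first two example questions above are, at heart, group-by aggregations over audit-log records. The sketch below shows that shape and assumes a hypothetical flat record `(instance_id, activity, resource, start, end, outcome)`, with an outcome of `"ok"` counting as success. Real workflow engines write engine-specific audit-log schemas, so the field layout and outcome codes are illustrative only. Even this toy version shows why the slides call for a warehouse and OLAP. Each new question (per month, per resource group, combined with other data sources) means another hand-written aggregation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Hypothetical audit-log record: (instance_id, activity, resource, start, end, outcome).
AuditRecord = tuple[str, str, str, datetime, datetime, str]


def hours(start: datetime, end: datetime) -> float:
    """Duration of one activity execution in hours."""
    return (end - start).total_seconds() / 3600


def stats_by_weekday(log: list[AuditRecord]) -> dict:
    """Average duration, success rate, and count per (activity, weekday of start)."""
    groups = defaultdict(list)
    for _, activity, _, start, end, outcome in log:
        groups[(activity, WEEKDAYS[start.weekday()])].append((hours(start, end), outcome == "ok"))
    return {
        key: {
            "avg_hours": mean(d for d, _ in rows),
            "success_rate": sum(ok for _, ok in rows) / len(rows),
            "count": len(rows),
        }
        for key, rows in groups.items()
    }


def best_resource(log: list[AuditRecord], activity: str) -> str:
    """Resource with the lowest average duration for the given activity."""
    durations = defaultdict(list)
    for _, act, resource, start, end, _ in log:
        if act == activity:
            durations[resource].append(hours(start, end))
    return min(durations, key=lambda r: mean(durations[r]))
```

Here "performs best" is taken to mean "shortest average duration". That choice is an assumption; outcome quality or cost could be equally valid criteria.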

Business Process Intelligence
[Figure: BPI architecture. Components: workflow engines A and B with their audit logs; other sources; ETL into a warehouse of process definition and execution data; the BPI engine (aggregated data, prediction models); the BPI console (reporting, simulation, OLAP/mining tools); and a monitoring and optimization manager that sends optimizations back to the workflow engines.]

Example Application: Exception Analysis, Prediction, and Prevention
• Service providers need to deliver services (execute processes) with high and predictable quality.
• A key issue is reducing the occurrence of exceptions.
  – Exception: a deviation from the optimal (or acceptable) execution. It is a high-level, user-defined, subjective concept.
• To help reduce the occurrence of exceptions, support:
  – Exception analysis: identify the causes of exceptional behaviors.
  – Exception prediction: predict the occurrence of exceptions as early as possible during process execution.
  – Exception prevention: take actions to avoid (when possible and convenient) the occurrence of the exceptional situation.
D. Grigori, F. Casati, U. Dayal, M.-C. Shan, "Improving Business Process Quality through Exception Understanding, Prediction, and Prevention", Proc. Intl. Conf. on Very Large Data Bases (VLDB), Sept. 2001.

Approach to Exception Analysis
• Mine process definition and execution data.
  – We treat exception analysis as a classification problem.
[Figure: process definitions, exception definitions, and process executions go through preparation and labeling into training and validation sets; mining produces classification rules, whose interpretation yields the causes of exceptions. The example decision tree splits on NumExec_GetApproverDecision (≤2, >2 and ≤6, >6), StartDay ({Sat, …, Thu} vs. {Friday}), and Resource_Init_GetApproverDecision ({Res1, …} vs. {Resn, …}), and shows exception rates on the training (T) and validation (V) sets at each node. For example, instances with more than 6 executions of Get Approver Decision were about 60% exceptional.]

Experimental Results: Analysis
• We applied the techniques to administrative processes to analyze process duration exceptions:
  – A process is considered "long" when it takes over 20 days.
  – On average, 15% of instances were exceptional.
• Analysis:
  – When a certain node was executed by resources in group A, 70% of the instances were exceptional.
  – When the node was executed by resources in group B, 5% of the instances were exceptional.
[Figure: the approval process, from Initiate to Done.]

Exception Prediction
• Goal: predict the occurrence of an exception as early as possible.
  – Prediction accuracy increases as process execution progresses.
[Figure: the same classification pipeline repeated for each execution stage, with one decision tree per stage; later-stage trees add attributes such as Duration_GetApproverDecision and Len_Approvers to NumExec_GetApproverDecision, StartDay, and Resource_Init_GetApproverDecision.]
• Several training/validation sets were prepared, one for each execution stage. Each set includes only the process execution attributes defined at that stage. A predictive model was generated for each stage.

Experimental Results: Prediction
• Good predictions at the very start of the process:
  – A process input variable determines the number of loops and was therefore correlated with the process duration.
• For some other combinations of input data, the exception probability was as high as 50%.
• After the execution of a "critical" node, prediction accuracy increased substantially.
• A lot more work needs to be done to prevent exceptions.
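The preparation, labeling, and per-stage modeling described above can be illustrated with the first split of a decision tree. The slides show decision trees but do not name the splitting criterion. This sketch assumes information gain (entropy reduction) and builds only a one-level stump. The exception definition (duration over 20 days) comes from the experimental results above. The `available_attrs` argument stands in for the rule that each stage's model may use only the attributes defined at that stage. The attribute names are hypothetical, and so are any input records you feed in. This is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

LONG_PROCESS_DAYS = 20  # exception definition from the slides: "long" means over 20 days


def label_exceptions(instances: list[dict]) -> list[dict]:
    """Preparation and labeling: mark each process instance as exceptional or not."""
    return [dict(inst, exceptional=inst["duration_days"] > LONG_PROCESS_DAYS) for inst in instances]


def entropy(labels: list[bool]) -> float:
    """Shannon entropy (bits) of a list of boolean labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: list[dict], attr: str) -> float:
    """Entropy reduction from splitting the rows on one categorical attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r["exceptional"])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r["exceptional"] for r in rows]) - remainder


def best_split(rows: list[dict], available_attrs: list[str]) -> str:
    """Root split for one stage's model, using only attributes known at that stage."""
    return max(available_attrs, key=lambda a: information_gain(rows, a))


def exception_rate_by_value(rows: list[dict], attr: str) -> dict:
    """Fraction of exceptional instances for each value of an attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r["exceptional"])
    return {value: sum(g) / len(g) for value, g in groups.items()}
```

At an early stage one might call `best_split(rows, ["start_day"])`. Once the critical node has run, the resource attribute becomes available and can be added to the candidate list. This mirrors, in miniature, why prediction accuracy improves as execution progresses. A full tree learner would recurse on each branch and validate on a held-out set.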
[Figure: prediction accuracy annotated along the approval process (Initiate … Done): 50%, 55%, 80%, and 90% at successive stages.]

Process Improvement
• Designing processes is challenging:
  – It is difficult to know the process (even for the people involved in it).
  – It is difficult for the modeler to ask the right questions and get the right answers.
• Business Process Intelligence supports process (re)design by highlighting problems and inefficiencies.
[Figure: the approval process alongside a supplier setup process with nodes such as prepare supplier setup form, request data from supplier, remind supplier, add supplier to quoting tool, set up supplier in procurement tool, notify accounting dept., and cancel; legend: start, end, node, loop, split, end branch.]

Conclusions
• Commercial landscape: shift from horizontal software and toolkits to vertical applications, system integration, and services.
• Research: must shift from an obsession with algorithms to developing solutions enabled by data mining ("invisible, embedded data mining").
• There are many applications of usage mining and content mining, and combinations of these, for e-business.
• Use many different techniques drawn from different disciplines:
  – For usage mining: OLAP, clustering, association rules, classification, …
  – For content mining: clustering, classification, information retrieval, linguistic analysis, …
• Have to address end-to-end scalability of the whole solution architecture.
• Data preparation and cleaning are still an art.
• It is important to close the loop: use the results of mining for decision making and for optimizing business processes.
