DATA MINING by gopishrine


Abstract: This paper gives a short introduction to the concepts of data mining and data warehousing, explains their general capabilities, and briefly describes their uses in the field of Enterprise System Integration. We also describe the concept of data mining by comparing traditional marketing research with relationship marketing. The background of data mining is discussed with special emphasis on related terms such as data warehouses and data marts, as well as knowledge discovery in databases (KDD) and continuous relationship marketing (CRM). The steps necessary for companies to implement successful data mining projects are enumerated, and there is much scope for future research. An enterprise website has become one of the most important channels between the enterprise and its existing and potential customers (visitors). We envision that better management of the visitor relationship will bring about loyalty from existing customers and stimulate interest in the enterprise among potential customers. In this paper, we apply the concept of CRM to the management of an enterprise website, that is, visitor relationship management. In other words, visitors are differentiated by their value and, based on an understanding of each visitor, served with different relationship-strengthening practices.

The present paper deals with the following topics:
I. Introduction
II. Why use data warehouse and data mining
III. Use in enterprise system integration
IV. Architecture of data warehouse
V. Working of data warehousing and data mining
VI. Characteristics of data warehouses
VII. Customer relationship management
VIII. Benefits of Customer Relationship Management
IX. Application of Data mining
X. Conclusions

I. INTRODUCTION: The promise of data mining in business environments is enormous. Until recently, capitalizing on that promise in a real-world business environment has sometimes been very difficult.
The promise is still as bright as ever, and the recent past has taught practitioners of data mining for CRM (customer relationship management) much about delivering high-return, practical results. Data mining is not a universal panacea for CRM success. Critical criteria include tool selection, business objective matching, data discovery, preparation and delivery. Successful data mining in a CRM environment is far more than the application of algorithms to data. Customer Relationship Management is a broad approach to doing business. It is holistic in that it encompasses all aspects and functions of a company, focusing on managing the relationship between customer and company just as much as that between company and customer. CRM requires a two-way street: an exchange of information just as much as of goods and services. There are five crucial CRM strategic business areas. Each area is examined separately to enable a clear view of the problems to be met, the business problem to be solved, and the methods for delivering value. The areas chosen for scrutiny are: 1) data preparation, 2) customer segmentation, 3) attrition, 4) cross-sell and 5) e-commerce. These extend the miner's skill set within CRM.

a) Data warehousing: The volume of data that a company collects may be very large, and its databases may be numerous. In such a case, a system that makes information retrieval easier and faster is needed. This instrument is a data warehouse. A common definition of a data warehouse is: "A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources." A data warehouse is a database that stores data from the company's other databases after these data have been pre-processed to make them more accessible.

b) Data mining: Data mining is a method for data processing; nowadays it can be considered one of the most powerful. Data mining is also known as Knowledge Discovery in Databases (KDD), and it can be defined as a method for retrieving information from data. Information and data are not the same thing: data is just something stored somewhere; information is something richer. Data mining has become a hot topic in recent years thanks to the increase in computing power: data that had been compiled but never analysed have now been analysed, and data mining techniques have improved. The power of data mining is its ability to uncover information that is stored in the data but not visible. Data mining finds patterns that turn data into information. No other traditional data processing method is so independent of the human way of thinking: data mining does not need a "guide" to obtain information. There is no need to tell it what to search for, and that is why it can find valuable, previously unknown information.
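To make the idea of finding patterns without a "guide" more concrete, the following is a minimal sketch in Python. It is not a method taken from this paper: the purchase records, the item names and the function `frequent_pairs` are all assumptions made for illustration. The code is never told which products to look at; it simply counts every pair of items bought together and reports the pairs that occur in a sufficiently large share of the records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records; item names are invented for illustration.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in at least
    `min_support` (a fraction between 0 and 1) of the baskets."""
    counts = Counter()
    for basket in baskets:
        # Count every unordered pair of items bought together.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(BASKETS, min_support=0.5))
```

On this toy data only the pair (bread, butter) reaches the 50% threshold. Real data mining tools use far more efficient algorithms, but the principle of letting the data suggest the relation is the same.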

Relation between data mining and data warehousing: Data mining is especially useful when there is a great amount of data to analyse, and the biggest and most complete data repositories are data warehouses. So the link between the two is very clear.

II. Why use data warehousing and data mining:

a) Data warehousing: In a company where there are different databases, organized in different ways according to the needs of a single department or unit of the enterprise, retrieving the information needed for strategy or other "high-level" decisions, such as marketing or customer service decisions, may be a difficult and slow process. Moreover, the databases of an enterprise are often based on different systems, such as mainframes and other "old" systems, called legacy systems, and "newer" systems such as client-server architectures. So, in order to provide an instrument that can support high-level decisions and give the right information at the right time, integration of the databases and pre-processing of the great amount of data are needed. These are the functions that a data warehouse implements. There is another task that a data warehouse can perform: it can be useful not only for retrieving information but also for "creating" new knowledge from the available data. In fact, a data warehouse is often used as a support for data mining.

b) Data mining: There are many methods for processing data, but most of them are closely tied to the ideas and way of thinking of the people who use them. They need to be guided in some way by human intelligence. Data mining can also work in this way, but it can work more independently of human minds as well. This is very useful when there is no concrete idea of the information to be found.
This feature can be very important in some fields of research: discovering a previously unconsidered relation between certain diseases and other factors, for example, can lead to a new approach to the study of these diseases.

III. Use in enterprise system integration:

One of the main purposes of enterprise system integration (ESI) is knowledge management, which can be split into three categories: knowledge acquisition, knowledge organization and knowledge deployment. The first and second categories are related to data mining and data warehousing. As previously said, data mining is a powerful instrument for information retrieval, and this is directly related to knowledge acquisition. As regards knowledge organization, one of the functions of data warehousing is storing data in order to support business analysis and management decision-making. So the use of data warehousing and data mining can help the ESI process; on the other hand, the process of creating a data warehouse and then performing data mining has to be led by the business policy of the company, especially during the pre-processing of the data from the different databases, in phases such as the elimination of "not useful" data and aggregation.

a) Data warehousing: Information is often split across different databases according to the needs of the different components of the company. The marketing division has its own database, with a structure that fulfills its needs, and the same holds for the sales division, the product development division, and so on. Data stored in this way are not very helpful for management purposes or for obtaining a complete overview of the company. Through data warehousing it is possible to process and combine data in an automated way in order to fulfill previously unsatisfied needs. This is needed for developing a decision support system.

b) Data mining: This instrument can be a very important help in discovering new information that can support the planning of new strategies for the company, the analysis of current strategies, the development of new products, and so on. One of the most important fields related to ESI in which data mining is used is CRM (Customer Relationship Management). CRM is a process that manages the interactions between a company and its customers. The primary users of CRM software applications are database marketers who are looking to automate the process of interacting with customers. Data mining applications automate the process of searching the mountains of data to find patterns that are good predictors of purchasing behavior. After mining the data, marketers must feed the results into campaign management software that, as the name implies, manages the campaign directed at the defined market segments. Data mining helps marketing users to target marketing campaigns more accurately and to align campaigns more closely with the needs, wants, and attitudes of customers and prospects. If the necessary information exists in a database, the data mining process can model virtually any customer activity. The key is to find patterns relevant to current business problems.

IV. Architecture of data warehouse:

[Figure: architecture of a data warehouse, showing the external data source, operational database, data warehouse, metadata repository and decision support system.]

• External data source: A source that is available externally and can be accessed by the data warehouse system.
• Operational database: An operational database is the system used for the day-to-day operations required in the business.
• Extract: Data extracts are subsets of data that are offloaded from the server machine onto other machines. Extracts can be unscheduled user extracts of some query results, or they can be scheduled extracts such as data mart refreshes. Data extraction takes data from the source systems and makes it available to the data warehouse; data load takes the extracted data and loads it into the data warehouse.
• Clean: Data that is no longer usable after a certain period of time is cleaned out; that is, older data is removed.
• Transform: Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. Whether or not you need to store any metadata for this process will depend on the complexity of the transformations that are to be applied to the data when moving it from source to data warehouse. It will also depend on the number of source systems and the type of each system. The more sources that are used to feed the data warehouse, the more likely it is that you will need to store metadata about the process.
• Load: Once the data is extracted from the source system, it is typically loaded into a temporary data store in order for it to be cleaned up and made consistent. These checks can be quite complex, and they identify consistency issues when integrating data from a number of data sources. In addition, as data changes over time, errors become apparent that had gone unnoticed because the day-to-day discrepancies were too small to detect.
• Refresh: The data is updated from time to time.
• Metadata repository: It will be necessary to keep changing the summaries that are produced to match the query profiles at each point in time. If we had to modify the warehouse manager every time we wished to add a new summary or change an existing one, the system would be perpetually in flux. Metadata can be used to address this issue by data-driving the generation of summaries. Within the database itself, we store descriptions of the summary tables we require in terms of facts and dimensions.
• OLAP: The term OLAP is an acronym for online analytical processing. Much has been written about the subject in the computer literature, and readers wanting a detailed discussion should consult some of that work. For our purposes it is sufficient to understand the term. OLAP is primarily about being able to access live data online and analyze it. It is about the methods, structures and tools required to perform this analysis, and about rapid access to and analysis of data. OLAP tools are designed to allow reasonably large quantities of data to be analyzed online. An OLAP tool will allow a user to quickly perform standard analytical functions on the data and to represent both the data and the results graphically. The idea is to allow the user to easily manipulate and visualize the data.
Relational technology has been around for many years and is fairly well understood these days.
• Decision support system: A DSS (decision support system), also known as an EIS (executive information system), supports an organization's leading decision makers with higher-level data for complex and important decisions.

V. Working of data warehousing and data mining:

a) Data warehousing: Data warehousing is something more than a second copy of the data; otherwise it would simply be a backup database. Creating and maintaining a data warehouse implies other operations, which can be classified as extraction, consolidation, filtering, cleansing, transformation, aggregation and updating:

• Extraction: periodic download of new data from various databases.
• Consolidation: combination of data from different databases in order to perform data analysis.
• Filtering: elimination of data not needed for analysis.
• Cleansing: finding and repairing errors due to data manipulations.
• Transformation: modification of data in order to make them consistent.
• Aggregation: summarization of data into appropriate units for analysis.
• Updating: adding new data.
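The operations listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the design of any particular warehouse product: the two source "databases", their field names and every function name are hypothetical, extraction is represented by rows that have already been downloaded, and cleansing is reduced to dropping records with a missing amount.

```python
from collections import defaultdict

# Hypothetical rows already extracted from two departmental databases.
# Field names and values are invented for illustration only.
SALES_DB = [
    {"product": "P123", "region": "reg1", "amount": "100.0"},
    {"product": "P124", "region": "REG1", "amount": "250.5"},
    {"product": "P123", "region": "REG2", "amount": None},  # missing amount
]
MARKETING_DB = [
    {"prod_id": "P123", "area": "REG2", "revenue": 80.0, "campaign": "spring"},
]


def consolidate(sales_rows, marketing_rows):
    """Consolidation: map both source schemas onto one common schema."""
    rows = [dict(r) for r in sales_rows]
    for r in marketing_rows:
        rows.append({"product": r["prod_id"], "region": r["area"],
                     "amount": r["revenue"], "campaign": r["campaign"]})
    return rows


def filter_columns(rows):
    """Filtering: keep only the fields needed for the analysis."""
    return [{k: r[k] for k in ("product", "region", "amount")} for r in rows]


def cleanse(rows):
    """Cleansing: drop records whose amount is missing."""
    return [r for r in rows if r["amount"] is not None]


def transform(rows):
    """Transformation: make types and region codes consistent."""
    return [{"product": r["product"], "region": r["region"].upper(),
             "amount": float(r["amount"])} for r in rows]


def aggregate(rows):
    """Aggregation: total amount per (product, region)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["product"], r["region"])] += r["amount"]
    return dict(totals)


def update(warehouse, new_totals):
    """Updating: add newly aggregated amounts to the warehouse totals."""
    for key, amount in new_totals.items():
        warehouse[key] = warehouse.get(key, 0.0) + amount
    return warehouse


warehouse = {}
rows = consolidate(SALES_DB, MARKETING_DB)
update(warehouse, aggregate(transform(cleanse(filter_columns(rows)))))
print(warehouse)
```

Keeping each operation as a separate function mirrors the classification in the text and makes it possible to rerun the same chain whenever a new batch of data is extracted.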

• Data modeling for data warehouses: Multidimensional models take advantage of inherent relationships in the data to populate data in multidimensional matrices called data cubes. For data that lends itself to dimensional formatting, query performance in multidimensional matrices can be much better than in the relational data model. Three examples of dimensions in a corporate data warehouse would be the corporation's fiscal periods, products, and regions.

• A two-dimensional matrix model:

[Figure: two-dimensional matrix model; the surviving labels are region REG1 and products P123, P124.]

A standard spreadsheet is a two-dimensional matrix. One example would be a spreadsheet of regional sales by product for a particular time period. Products could be shown as rows, with the sales revenue for each region making up the columns.
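The spreadsheet described above can be built from raw sales records with a few lines of Python. This is a minimal sketch: the records, the product and region codes and the function names `build_matrix` and `format_matrix` are assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical sales records for one time period: (product, region, revenue).
SALES = [
    ("P123", "REG1", 120.0),
    ("P123", "REG2", 80.0),
    ("P124", "REG1", 200.0),
    ("P123", "REG1", 30.0),
]


def build_matrix(records):
    """Return {product: {region: total revenue}} plus the sorted region list."""
    matrix = defaultdict(lambda: defaultdict(float))
    regions = set()
    for product, region, revenue in records:
        matrix[product][region] += revenue
        regions.add(region)
    return matrix, sorted(regions)


def format_matrix(matrix, regions):
    """Render the matrix as text: one row per product, one column per region."""
    lines = ["product " + " ".join(f"{r:>8}" for r in regions)]
    for product in sorted(matrix):
        cells = " ".join(f"{matrix[product].get(r, 0.0):>8.1f}" for r in regions)
        lines.append(f"{product:<8}" + cells)
    return "\n".join(lines)


matrix, regions = build_matrix(SALES)
print(format_matrix(matrix, regions))
```

Each cell holds the total revenue for one product in one region; a combination with no sales is shown as 0.0. Adding a further key, such as a fiscal quarter, to the nested dictionary would extend the same structure to more dimensions.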

• Three-dimensional data cube model: Adding a time dimension, such as an organization's fiscal quarters, would produce a three-dimensional matrix, which could be represented using a data cube as shown in the figure. In the figure there is a three-dimensional data cube that organizes product sales data by fiscal quarters and sales regions. Each cell could contain data for a specific product, specific fiscal quarter, and specific region. By including additional dimensions, a data hypercube can be obtained, although more than three dimensions cannot be easily visualized or presented graphically. The data can be queried directly in any combination of dimensions, bypassing complex database queries. Tools exist for viewing data according to the user's choice of dimensions.

• Pivoting: Changing from one dimensional hierarchy (orientation) to another is easily accomplished in a data cube by a technique called pivoting (also called rotation). In this technique the data cube can be thought of as rotating to show a different orientation of the axes. Multidimensional models lend themselves readily to hierarchical views in what is known as roll-up display and roll-down display.

b) Data mining: There are many techniques related to data mining, but the general process can be described using the following steps:
1) Identification of the problem.
2) Data preparation: before applying data processing techniques, the data needs to be manipulated in order to select the relevant items.
3) Creation of data mining patterns: Using different techniques it is possible to obtain different patterns. The patterns are obtained by selecting a training set of data (a subset of the existing data used to create the pattern) and by testing them on other subsets of data called testing sets. Testing sets are needed in order to avoid problems like overfitting, in which the pattern fits the given data well but is not useful for other sets of data because it is too closely tied to the training set. To choose between different patterns generated with different techniques, an evaluation of the kinds of errors that the patterns are likely to generate is needed.
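The role of the testing set can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the labelled records are invented, and the two "patterns" (one that memorizes the training set and one that learns a single age threshold) are deliberately simple stand-ins for real data mining techniques.

```python
from collections import Counter

# Hypothetical labelled customers: (age, label). Values are invented.
TRAIN = [(22, "not safe"), (25, "not safe"), (28, "not safe"),
         (33, "safe"), (41, "safe"), (47, "safe"), (52, "safe")]
TEST = [(20, "not safe"), (26, "not safe"), (27, "not safe"),
        (35, "safe"), (38, "safe"), (60, "safe")]


def fit_memorizer(train):
    """Over-fitted pattern: remember every training example exactly."""
    table = dict(train)
    # Unseen ages fall back to the most common training label.
    default = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda age: table.get(age, default)


def fit_threshold(train):
    """Simple pattern: 'safe' when age >= t, with t chosen on the training set."""
    def n_correct(t):
        return sum(("safe" if age >= t else "not safe") == label
                   for age, label in train)
    best_t = max(sorted(age for age, _ in train), key=n_correct)
    return lambda age: "safe" if age >= best_t else "not safe"


def accuracy(model, data):
    """Fraction of examples in `data` that `model` labels correctly."""
    return sum(model(age) == label for age, label in data) / len(data)


for name, fit in (("memorizer", fit_memorizer), ("threshold", fit_threshold)):
    model = fit(TRAIN)
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

Both patterns are perfect on the training set, so the training data alone cannot distinguish them. Only the testing set shows that the memorizing pattern is tied too closely to the records it was built from.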
The choice of technique is driven by the goal to be achieved. For example, fraud recognition in an insurance company suggests the use of a classification technique (data mining is used to find rules for classifying customers into categories such as "safe" and "not safe", using age, profession and other parameters), while product sales analysis in a supermarket calls for an association recognition technique (the collected data are used to find new relations between products). Other techniques include, for example, clustering and regression.

VI. Characteristics of data warehouses:

The diagram illustrates the characteristics of a data warehouse. Compared with transactional databases, data warehouses are nonvolatile. That means that information in the data warehouse changes far less often and may be regarded as non-real-time with periodic updating.

[Figure: data warehouse overview. Labels recoverable from the diagram: databases, other data inputs, updates/new data, cleaning, reformatting, back flushing, data warehouse (data, metadata), OLAP, DSS, EIS, data mining.]




VII. Customer relationship management:

The customer is not new, relations are as old as the buyer and the seller, and management is not new either. The concepts of CRM have existed since buying and selling came into being. Then what is creating waves in today's CRM industry? Is that small electronic 'e' changing the trend? In the Information Technology industry, CRM is considered to be a software tool and a technology solution. In fact, CRM is a strategy for achieving a holistic view of any partner engagement. CRM, which is a combination of marketing and business processes, is the basic understanding of customers and how organizations measure them. The mantra behind CRM is catering to customized needs "centrally". As defined by the "gurus" of CRM, Customer Relationship Management is a business strategy to select and manage the most valuable customer relationships. CRM requires a customer-centric business philosophy and culture to support effective marketing, sales and service processes. CRM applications can enable effective customer relationship management, provided that an enterprise has the right leadership, strategy and culture.

USE OF CRM: Given the pace at which technology is changing today, any company that is a step ahead of others because of some web product or service will not be able to hold on to that advantage for long. The key to stability in today's dynamic marketplace is forging long-term relationships with customers.

Customers can be divided into three zones:
1. The zone of defection, where customers are extremely hostile and have the lowest level of satisfaction.
2. The zone of indifference, where customers are not sure. They have a medium level of satisfaction and loyalty towards the company.
3. The zone of affection, whose customers are described as "Apostles".
CRM focuses on bringing customers from level 1 to level 3 and retaining apostle customers.

Traditional Marketing Research: Today the majority of companies that consider themselves market driven are still organized around their products. These companies position their products to a carefully researched segment of customers whose wants are unfulfilled. To virtually guarantee success, these companies believe that they must give additional value to the chosen segment by differentiating their product in some unique way. Companies of this type emphasize the refining of internal processes and outputs to meet the needs of the mass market, and customers are treated as a homogeneous and basically passive mass. A number of companies attempted to change or redirect their efforts in the late 1980s and early 1990s. At that time "customer service" became a "hot" topic. Everyone from CEOs to brand managers to hourly employees was admonished to "Take Care of the Customer." Traditional surveys of what customers want, or of the service they have received, are what many companies rely on today. Such a survey gives the company reliable information on what customers think they think or what they think they want, but it may not be what they really think or want. If you are only supplying what your customers want or think they want today, you are not tapping into the unspoken needs and unserved markets that may be the key to the customers of today and the potential customers of tomorrow. Companies that consider themselves market driven spend an inordinate amount of time differentiating their product through quality improvement. It is estimated that focusing on quality improvement is only about 10% of what you should be doing in your company. The overriding strategy of the past was to acquire customers and respond to their aggregate needs.

Relationship Marketing, the Modern View: Forward-looking companies of today believe that customers are what sustain any business and that they have "lifetime value," not just the value of a single sale. It is believed that customer groups, if managed and maintained, cannot be easily copied by the competition; that is, they are one of the few "sustainable" competitive advantages open to the company. Progressive companies of the future will know and understand the difference between knowledge of the customer and customer knowledge. For instance, knowledge of the customer is knowing how many hits a browser makes on your web site, whereas customer knowledge is knowing what to do with the hits. To benefit from this "new" philosophy, a company must change the entire business operation so that research and development and marketing work together seamlessly and financial resources are allocated in the "right" places.

The producers and suppliers must be able to put together the right mix of service and information surrounding the differentiated or personalized products of the future. This mix will be customized by creating very separate portraits of individual customers. The technology to develop these portraits exists in today's data mining technology. Companies are able to take information from their own database, augment it with enhancement information provided by a data compiler, and then apply a predictive model to the augmented data set using sophisticated data mining techniques. In this way we can understand some of the things the individuals in the year 2020 will want to achieve as customers.

VIII. Benefits of Customer Relationship Management:

• Centralized Customer Database: VRM installs a Companies folder that provides a one-to-many relationship between company and contacts. Company information is stored only once and maintained in one place. All contacts reference the same company information.
• Automatic E-mail Processing: VRM automates the process of creating a new contact and company record from information within the e-mail, increasing team productivity and information quality. It also automates the action of logging the transaction, and can create related tasks and appointments.
• Instant Set-Up: VRM can be installed and used immediately; there is no lengthy set-up and implementation time. The product can be used out of the box or easily customized to meet specific organizational requirements.
• Enhancements to Outlook: VRM implements a wealth of refinements and extensions to existing Microsoft Outlook features, such as enhanced linking to the journal, extending the logging of activities from a private journal folder to a public journal folder for shared access, and enforcing consistency of categories between multiple users.
• Builds on Existing Investment: VRM leverages your existing financial investment in Microsoft Outlook by building on current Microsoft Outlook features. It leverages your training and implementation

IX. Application of Data mining:

Data mining technologies can be applied to a large variety of decision-making contexts in business. In particular, areas of significant payoffs are expected to include the following:

Marketing: Applications include analysis of customer behavior based on buying patterns; determination of marketing strategies including advertising, store location, and targeted mailing; segmentation of customers, stores, or products; and design of catalogs, store layouts, and advertising campaigns.

Finance: Applications include analysis of the creditworthiness of clients; segmentation of accounts receivable; performance analysis of financial investments such as stocks, bonds, and mutual funds; evaluation of financing options; and fraud detection.

Manufacturing: Applications involve the optimization of resources such as machines, manpower, and materials; and the optimal design of manufacturing processes, shop-floor layouts, and products, such as automobiles designed according to customer requirements.

Health Care: Applications include discovering patterns in radiological images, analyzing microarray (gene-chip) experimental data to relate it to diseases, and analyzing the side effects of drugs.

X. Conclusions:

The "value" of an instrument or a technology is directly tied to the importance of the problems that it can help to resolve or the relevance of the results that it provides. By this measure, data warehousing and data mining are of "current" and "strategic" importance, since they address problems that are themselves "current" and "strategic". The reason data warehousing and data mining are so popular nowadays is perhaps that, in a world where information is such an important resource for an enterprise, they can "create" this resource and "make it more powerful", working not only by themselves but also integrated with other instruments or within a wider "philosophy" such as CRM. In order to better understand what they can do, and also what they cannot do, it is useful to look at how they work, what the relations between them are, and how they are linked to the needs of a company, such as enterprise integration or marketing research. A data warehouse makes information more accessible and useful for the whole company by processing the data collected in the different existing databases; data mining creates new knowledge by operating with algorithms that classify these data or find relations between them. Knowledge and information, which are central to an enterprise's health, are also the issues that data warehousing and data mining address.

