(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 9, December 2010




ENHANCING AND DERIVING ACTIONABLE KNOWLEDGE FROM DECISION TREES

P. Senthil Vadivu, Head, Department of Computer Applications, Hindusthan College of Arts and Science, Coimbatore-641028, Tamil Nadu, India. Email: sowju_sashi@rediffmail.com
Dr. (Mrs.) Vasantha Kalyani David, Associate Professor, Department of Computer Science, Avinashilingam Deemed University, Coimbatore, Tamil Nadu, India. Email: vasanthadavid@yahoo.com




Abstract

    Data mining algorithms are used to discover customer models from distributional information. In customer relationship management (CRM), customer profiles have been used to identify which customers are loyal and which are attritors, but they require human experts to discover such knowledge manually.
    Many post-processing techniques have been introduced, but they do not suggest actions that increase an objective function such as profit. In this paper, a novel algorithm is proposed that suggests actions to move a customer from an undesired status to a desired one. These algorithms can discover cost-effective actions that transform customers from undesirable classes to desirable ones. Many tests have been conducted, and the experimental results are analyzed in this paper.

    Key words: CRM, BSP, ACO, decision trees, attrition

1. Introduction

    Much research has been done in data mining. Various models such as Bayesian models, decision trees, support vector machines, and association rules have been applied to industrial applications such as customer relationship management (CRM) [1][2], which maximizes profit and reduces costs by relying on post-processing techniques such as visualization and interestingness ranking.
    Because of massive industry deregulation across the world, each customer faces an ever-growing number of choices in the telecommunications and financial services industries [3][10]. The result is that an increasing number of customers are switching from one service provider to another. This phenomenon is called customer "churning" or "attrition".
    A main approach in the data mining area is to rank customers according to their estimated likelihood of responding to direct marketing actions, and to compare the rankings using a lift chart or the area-under-curve measure from the ROC curve. Ensemble-based methods have been examined under cost-sensitive learning frameworks, for example, boosting algorithms integrated with cost considerations.
    A class of reinforcement learning problems and associated techniques is used to learn how to make sequential decisions based on delayed reinforcement so as to maximize cumulative rewards.
    A common problem in current applications of data mining in intelligent CRM is that people tend to focus on, and be satisfied with, building up the models and interpreting them, but not on using them to obtain profit explicitly. More specifically, most data mining algorithms only aim at constructing customer profiles, that is, predicting the characteristics of customers of certain classes. An example of such a question is: what kinds of customers are likely attritors, and what kinds are loyal customers? Actionable knowledge goes further; in the telecommunications industry, for example, an action may reduce the monthly rates or increase the service level for valuable customers.
    Unlike distributional knowledge, actionable knowledge must take into account resource constraints such as direct mailing and sales promotion [14]. To make a decision, one must weigh the cost as well as the benefit of the actions to the enterprise.
    This paper presents several algorithms, namely the creation of a decision tree, BSP (Bounded Segmentation Problem), Greedy-BSP, and ACO (Ant Colony Optimization), which help us obtain actions that maximize profit and find the number of customers who are likely to be loyal.




                                                                230                              http://sites.google.com/site/ijcsis/
                                                                                                 ISSN 1947-5500
2. Extracting Actions from Decision Trees

    For CRM applications, a decision tree can be built from a set of examples described by personal attributes such as name, sex, and birthday, financial information such as yearly income, and family information such as lifestyle and number of children.
    Decision trees are used in a vast area of data mining because one can easily convert them into rules and also obtain the characteristics of the customers who belong to a certain class. The algorithm used in this paper does not rely on prediction alone; it can also classify which customers are loyal, and such rules can easily be derived from decision trees.
    The first step is to extract rules when there is no restriction on the number of rules that can be produced. This is called the unlimited resource case [3]. The overall process of the algorithm is as follows:

Algorithm 1:
    Step 1: Import customer data, with data collection, data cleaning, data preprocessing, and so on.
    Step 2: Build a decision tree using a decision tree learning algorithm [11] to predict whether a customer is in the desired status or not. One improvement to the tree building is to use the area under the ROC curve [7].
    Step 3: Search for the optimal actions for each customer using the key component, the proactive solution [3].
    Step 4: Produce reports for domain experts to review the actions, and then deploy the actions.

2.1 Searching for a leaf node in the unlimited resource case
    This algorithm searches for optimal actions that transform each leaf node into another, more desirable node. Once the customer profile is built, each customer in the training examples falls into a particular leaf node; moving the customer to a leaf node with a more desirable status yields a probability gain, which can then be converted into an expected gross profit.
    When a customer is moved from one leaf node to another, some attribute values of the customer must be changed. When an attribute value is transformed from V1 to V2, this corresponds to an action that incurs a cost, which is defined in a cost matrix.
    The leaf-node search algorithm searches all leaves in the tree so that, for every leaf node, a best destination leaf node is found to move the customer to; this collection of moves is what maximizes the net profit.
    Using the domain-specific cost matrix, the net profit of an action set can be defined as follows:

        P_Net = P_E * P_gain - Σ COST_ij        (1)

where P_Net denotes the net profit, P_E denotes the total profit of the customer in the desired status, P_gain denotes the probability gain, and COST_ij denotes the cost of each attribute-value change (from value i to value j) involved in the action set.
    The leaf-node search algorithm for finding the best actions can be described as follows:

Algorithm: leaf-node search
    1. For each customer x, do
    2.    Let S be the source leaf node into which x falls;
    3.    Let D be the destination leaf node for x with the maximum net profit P_Net;
    4.    Output (S, D, P_Net).

An example of a customer profile (the decision tree of the original figure, rendered as text): the root node splits on Service (Low, Med, High). Under Service = Low, a SEX node leads to leaf A (F, loyalty probability 0.9) and leaf B (M, 0.2); Service = Med leads to leaf C (0.1); under Service = High, a RATE node leads to leaf D (L, 0.8) and leaf E (H, 0.5).

    Consider the above decision tree. The tree has five leaf nodes, A, B, C, D, and E, each with a probability that the customer is loyal; the probability of attrition is simply one minus this probability.
    Consider a customer, Alexander, whose record states Service = Low (the service level is low), Sex = M (male), and Rate = L (the mortgage rate is low). When classified by the decision tree, Alexander falls into leaf node B, which predicts that he has only a 20 percent chance of being loyal. The algorithm now searches through all the other leaves (A, C, D, and E) in the decision tree to see whether Alexander can be "moved" to a better leaf with the highest net profit.
    1. Consider leaf node A. Although it has a high probability of being loyal (90 percent), the cost of the required action would be very high if Alexander should be




changed to female, so the net profit would be negative infinity.
    2. Consider leaf node C. It has a lower probability of being loyal, so we can simply skip it.
    3. Consider leaf node D. The probability gain is 60 percent (80 percent - 20 percent) if Alexander falls into D; the action needed is to change Service from L (low) to H (high).
    4. Consider leaf node E. The probability gain is 30 percent (50 percent - 20 percent), which translates into $300 of expected gross profit. Assume that the cost of the actions (changing Service from L to H and changing Rate from L to H) is $250; the net profit of moving Alexander from B to E is then $50 (300 - 250).
    Clearly, the node with the maximum net profit for Alexander is D, which suggests the action of changing Service from L to H.

3. COST MATRIX
    Each attribute-value change incurs a cost, and the cost for each attribute is determined by domain experts. The values of many attributes, such as sex, address, and number of children, cannot be changed with any reasonable amount of money. These attributes are called "hard attributes", and the user must assign a large number to every corresponding entry in the cost matrix. Some values can be changed at reasonable cost; such attributes, for example the service level, interest rate, and promotion packages, are called "soft attributes".
    The hard attributes should nevertheless be included in the tree-building process in the first place; the reason customers cannot simply be moved to arbitrary leaves is that many hard attributes are important for accurate probability estimation of the leaves.
    For continuous-valued attributes, such as an interest rate that varies within a certain range, the numerical ranges should be discretized first for feature transformation.

4. THE LIMITED RESOURCE CASE: POSTPROCESSING DECISION TREES
4.1 BSP (Bounded Segmentation Problem)
    In the example considered above, each leaf node of the decision tree is a separate customer group, and for each customer group we design actions to increase the net profit. In practice, however, the company may be limited in its resources. When such limitations occur, the leaf nodes are merged into k segments, and to each segment a responsible manager can apply several actions to increase the overall profit. The input to the BSP problem is:
    Step 1: A decision tree with a collection S(m) of source leaf nodes and a collection D(m) of destination leaf nodes.
    Step 2: A constant k (k < m), where m is the total number of source leaf nodes.
    Step 3: A cost matrix for attribute-value changes from u to v.
    Step 4: A unit benefit vector specifying the benefit when a customer belongs to the positive class.
    Step 5: A set of test cases.
    The goal is to find a solution with maximum net profit by transforming the customers belonging to a source leaf node S into a destination leaf node D via a number of attribute-value-changing actions.
GOALS: Transform a set of source leaf nodes S to a destination leaf node D, written S -> D.
ACTIONS: Each change applies one attribute-value-changing action, denoted {Attr, u -> v}.
    Thus, the BSP problem is to find the best k groups of source leaf nodes {Group_i, i = 1, 2, ..., k} and their corresponding goals and associated action sets that maximize the total net profit for a given data set C_test.

(The second example decision tree of the original figure, rendered as text: the root splits on Service (Low, High). Under Service = Low, a STATUS node leads to leaf L1 (A, loyalty probability 0.9) and leaf L2 (B, 0.2); under Service = High, a RATE node leads to leaf L3 (C, 0.8) and leaf L4 (D, 0.5).)

    Example: To illustrate the limited-resources problem, consider again the decision tree in the figure above. Suppose that we wish to find a single customer segment (k = 1). A candidate group is {L2, L4}, with the selected action set {Service <- H, Rate <- C}, which can transform the group to node L3. Assuming an action cost of 0.1, in moving the group to leaf node L3, L2 changes the service level only and thus has a profit gain of (0.8 - 0.2) * 1 - 0.1 = 0.5, while L4 has a profit gain of (0.8 - 0.5) * 1 - 0.1 = 0.2. Thus, the net benefit for this group is 0.5 + 0.2 = 0.7.
    As an example of the profit matrix computation, the part of the profit matrix corresponding to the source leaf node L2 is shown in Table 1, where Aset1 = {Status = A}, Aset2 = {Service = H, Rate = C}, and Aset3 = {Service = H, Rate = D}. Here, for convenience, we ignore the source values of the attributes, which depend on the actual test cases.
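As a concrete illustration of the leaf-node search and Eq. (1), here is a minimal Python sketch of the Alexander example above. Two details are assumptions, not from the paper: the unit profit P_E = $1000 (implied by the $300 gross profit for a 30 percent gain), and the split of the $250 combined action cost as Service $200 plus Rate $50; all names are illustrative.

```python
# Minimal sketch of the leaf-node search (unlimited resource case).
# Each leaf: the attribute values that lead to it, and P(loyal).
LEAVES = {
    "A": ({"Service": "L", "Sex": "F"}, 0.9),
    "B": ({"Service": "L", "Sex": "M"}, 0.2),
    "C": ({"Service": "M"}, 0.1),
    "D": ({"Service": "H", "Rate": "L"}, 0.8),
    "E": ({"Service": "H", "Rate": "H"}, 0.5),
}

# Assumed per-attribute action costs; the hard attribute Sex is
# effectively unchangeable, so its cost is infinite.
COST = {"Service": 200, "Rate": 50, "Sex": float("inf")}

def best_destination(customer, source, leaves, cost, unit_profit):
    """Maximize Eq. (1): P_Net = P_E * P_gain - sum of action costs."""
    p_src = leaves[source][1]
    best = (source, 0.0)
    for name, (attrs, p_dst) in leaves.items():
        if name == source:
            continue
        # every attribute value the customer would have to change
        changes = [a for a, v in attrs.items() if customer.get(a) != v]
        p_net = unit_profit * (p_dst - p_src) - sum(cost[a] for a in changes)
        if p_net > best[1]:
            best = (name, p_net)
    return best

alexander = {"Service": "L", "Sex": "M", "Rate": "L"}
print(best_destination(alexander, "B", LEAVES, COST, 1000))
# -> node D with net profit of about $400; E yields only about $50,
#    and A is ruled out by the infinite cost of changing Sex.
```

Under these assumed costs the sketch reproduces the text's conclusion: D is the best destination, and B -> E is profitable but inferior.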




TABLE 1. An example of the profit matrix computation

    Aset1 (L2) (Goal = ->L1)   Aset2 (L2) (Goal = ->L3)   Aset3 (L2) (Goal = ->L4)
    0.6                        0.4                        0.1
    ...                        ...                        ...

TABLE 2. Illustrating the Greedy-BSP algorithm

    Source nodes   Aset1 (goal = ->D1)   Aset2 (goal = ->D2)   Aset3 (goal = ->D3)   Aset4 (goal = ->D4)
    S1             2                     0                     1                     1
    S2             0                     1                     0                     0
    S3             0                     1                     0                     0
    S4             0                     1                     0                     0
    Column sum     2                     3                     1                     1
    Selected                             X
    actions

    The BSP problem then becomes one of picking the best k columns of the matrix M such that the sum of the maximum net profit values for each source leaf node among the k columns is maximized. When all P_ij elements are of unit cost, this is essentially a maximum coverage problem, which aims at finding k sets such that the total weight of the covered elements is maximized, where the weight of an element is the same for all sets. A special case of the BSP problem is thus equivalent to the maximum coverage problem with unit costs, and our aim is to find approximate solutions to the BSP problem.

Algorithm for BSP:
    Step 1: Choose every combination of k action sets.
    Step 2: Group the leaf nodes into k groups.
    Step 3: Evaluate the net benefit of the action sets on the groups.
    Step 4: Return the k action sets, with their associated leaf nodes, that maximize the net benefit.

    Since BSP needs to examine every combination of k action sets, its computational complexity is high. To avoid this, we have developed the Greedy-BSP algorithm, which reduces the computational cost while guaranteeing the quality of the solution.
    We illustrate the intuition of the Greedy-BSP algorithm using the example profit matrix M shown in Table 2, where we assume a limit of k = 2. In this table, each number is a profit value P_ij computed from the input parameters. Greedy-BSP processes this matrix sequentially for k iterations: in each iteration it considers adding one additional column of the matrix M, until it has selected all k columns. That is, Greedy-BSP expands the customer group by one at a time, choosing the additional column that increases the total net profit to the highest value.
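The greedy column-selection step described above can be sketched as follows, using the profit matrix of Table 2 with k = 2. This is a minimal illustration; the function and variable names are not from the paper.

```python
# Greedy-BSP sketch: pick k columns of the profit matrix M so that the
# sum, over source nodes, of each node's best profit among the chosen
# columns is maximized. Values follow Table 2 (columns are 0-indexed:
# column 0 = Aset1, column 1 = Aset2, and so on).
M = {
    "S1": [2, 0, 1, 1],
    "S2": [0, 1, 0, 0],
    "S3": [0, 1, 0, 0],
    "S4": [0, 1, 0, 0],
}

def greedy_bsp(matrix, k):
    n_cols = len(next(iter(matrix.values())))

    def total(cols):
        # each source node is served by its best column among `cols`
        return sum(max(row[c] for c in cols) for row in matrix.values())

    chosen = []
    for _ in range(k):
        # add the column that raises the total net profit the most
        best = max((c for c in range(n_cols) if c not in chosen),
                   key=lambda c: total(chosen + [c]))
        chosen.append(best)
    return chosen, total(chosen)

print(greedy_bsp(M, 2))
# -> ([1, 0], 5): Aset2 (column sum 3) is picked first, as marked in
#    Table 2, then Aset1, for a total net profit of 5.
```

Note that each added column only needs one pass over the remaining candidates, which is what makes Greedy-BSP cheaper than enumerating every combination of k action sets.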




5. IMPROVING THE ROBUSTNESS USING MULTIPLE TREES
    The advantage of the Greedy-BSP algorithm is that it can significantly reduce the computational cost while at the same time guaranteeing a high-quality solution. In Greedy-BSP, however, the decision tree is built by always choosing the most informative attribute as the root node.
    Therefore, we have also proposed an algorithm, referred to as Greedy-BSP Multiple, which is based on integrating an ensemble of decision trees [16], [5], [15].
    The basic idea is to construct multiple decision trees using different top-ranked attributes as their root nodes. For each set of test cases, the ensemble of decision trees returns the median net profit and the corresponding leaf nodes and action sets as the final solution.
    Thus, we expect that when the training data are unstable, the ensemble-based decision tree methods perform much more stably than a single decision tree.

Algorithm Greedy-BSP Multiple:
Step 1: Given a training data set described by p attributes,
    1.1 Calculate gain ratios to rank all the attributes in descending order.
    1.2 For i = 1 to p:
            Use the ith attribute as the root node to construct the ith decision tree.
        End for
Step 2: Take a set of testing examples as input.
    2.1 For i = 1 to p:
            Use the ith decision tree to calculate the net profit by calling algorithm Greedy-BSP.
        End for
    2.2 Return the k action sets corresponding to the median net profit.

    Since Greedy-BSP Multiple relies on building multiple decision trees to calculate the median net profit, different sampling can only affect the construction of a small portion of the decision trees. Therefore, Greedy-BSP Multiple produces net profit with less variance.

6. AACO (ADAPTIVE ANT COLONY OPTIMIZATION)
    The searching process of ACO is based on positive-feedback reinforcement [4][12]; as a result, escaping from local optima is more difficult than with other metaheuristics. Therefore, recognizing the search status, together with a technique for escaping from local optima, is important for improving the search performance of ACO. To recognize the search status, the proposed algorithm utilizes the transition of the distance of the best (shortest) tour; the period in which every ant builds a tour represents one generation. In addition, a "cranky" ant is introduced, which deliberately selects a path other than the shortest one that would not otherwise be selected.

ADAPTIVE ANT COLONY ALGORITHM
    1. Initialize the parameters α, t_max, t, s_x, s_y.
    2. For each agent do
    3.     Place the agent at a randomly selected site on the grid.
    4. End for
    5. While (not termination)  // such that t <= t_max
    6.     For each agent do
    7.         Compute the agent's fitness f(agent) and activation probability P_a(agent) according to (4) and (7).
    8.         r <- random([0, 1])
    9.         If r <= P_a then
    10.            Activate the agent and move it to a randomly selected neighboring site not occupied by other agents.
    11.        Else
    12.            Stay at the current site and sleep.
    13.        End if
    14.    End for
    15.    Adaptively update the parameters α; t <- t + 1.
    16. End while
    17. Output the locations of the agents.

7. DIFFERENCE FROM PREVIOUS WORK
    Machine learning and data mining research has contributed to business practice by addressing several issues in marketing. One such issue is that direct marketing is cost sensitive; to address it, an association-rule-based approach has been used to differentiate between positive-class and negative-class members and to use these rules for segmentation.
    Collecting customer data [9] and using the data for direct marketing operations have become increasingly feasible. One approach, known as database marketing, creates a bank of information about individual customers from their orders, queries, and other activities, and uses it to analyze customer behavior and develop intelligent strategies [10], [13], [6].
    Another important computational aspect is segmenting a customer group into sub-groups. AACO is the new technique used in this paper to improve accuracy.
    All of the above research works have aimed at finding a segmentation of the customer database and taking a predefined action for every customer based on that customer's current status. None of them has addressed discovering actions from a customer database; in this paper, we have addressed how to extract actions and how to find the best accuracy for predicting whether a customer will be loyal.

EXPERIMENTAL EVALUATION
    (The original page shows a results chart here.) This experimental evaluation shows the entropy value that has been calculated for each parameter.
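The agent-activation loop of the Adaptive Ant Colony Algorithm above can be sketched in Python as follows. Equations (4) and (7) for the fitness and the activation probability are not reproduced in this paper, so simple placeholders stand in for them; the grid size, agent count, and iteration limit are likewise arbitrary choices for illustration.

```python
import random

# Sketch of steps 1-17 of the Adaptive Ant Colony Algorithm: agents on
# a grid either activate and move to a free neighboring site, or sleep.
def aaco_sketch(n_agents=10, grid=(8, 8), t_max=50, seed=0):
    rng = random.Random(seed)
    sx, sy = grid
    # Steps 2-4: place each agent at a distinct random grid site.
    sites = rng.sample([(x, y) for x in range(sx) for y in range(sy)],
                       n_agents)
    occupied = set(sites)
    for t in range(t_max):                      # Step 5: main loop
        for i, (x, y) in enumerate(sites):      # Steps 6-7
            fitness = rng.random()              # placeholder for Eq. (4)
            p_activate = 1.0 - fitness          # placeholder for Eq. (7)
            if rng.random() <= p_activate:      # Steps 8-10
                free = [(x + dx, y + dy)
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                        if (dx or dy)
                        and 0 <= x + dx < sx and 0 <= y + dy < sy
                        and (x + dx, y + dy) not in occupied]
                if free:                        # move to a free neighbor
                    new = rng.choice(free)
                    occupied.discard((x, y))
                    occupied.add(new)
                    sites[i] = new
            # Steps 11-12: otherwise the agent sleeps at its site.
        # Step 15 (parameter adaptation) is omitted in this sketch.
    return sites                                # Step 17

positions = aaco_sketch()
```

The invariant the sketch maintains is the one the algorithm requires: no two agents ever occupy the same grid site.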




The best tree is selected from the various experimental results analyzed.




In our example, each action set contains a number of actions, and there are four different action sets with attribute changes. The experiment with Greedy-BSP found action sets with maximum net profit, and Greedy-BSP is more efficient than the optimal BSP. We also conducted this experiment with AACO and found that AACO is more accurate than the Greedy-BSP algorithms.


REFERENCES
    [1] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, Sept. 1994.




    [2] Bank Marketing Association, Building a Financial Services Plan: Working Plans for Product and Segment Marketing, Financial Sourcebook, 1989.
    [3] A. Berson, K. Thearling, and S. J. Smith, Building Data Mining Applications for CRM, McGraw-Hill, 1999.
    [4] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, 1981.
    [5] M. S. Chen, J. Han, and P. S. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. Knowledge and Data Engineering, 1996.
    [6] R. G. Drozdenko and P. D. Drake, Optimal Database Marketing, 2002.
    [7] J. Huang and C. X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," IEEE Trans. Knowledge and Data Engineering, pp. 299-310, 2005.
    [8] Lyche, A Guide to Customer Relationship Management, 2001.
    [9] H. Mannila, H. Toivonen, and A. I. Verkamo, "Efficient Algorithms for Discovering Association Rules," Proc. Workshop on Knowledge Discovery in Databases.
    [10] E. L. Nash, Database Marketing, McGraw-Hill, 1993.
    [11] J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.
    [12] S. Rajasekaran and Vasantha Kalyani David, Pattern Recognition Using Neural and Functional Networks, 2008.
    [13] R. Shaw and M. Stone, Database Marketing, John Wiley, 1998.
    [14] Q. Yang, J. Yin, C. X. Ling, and T. Chen, "Postprocessing Decision Trees to Extract Actionable Knowledge," Proc. IEEE Int'l Conf. Data Mining, pp. 685-688, 2003.
    [15] X. Zhang and C. E. Brodley, "Boosting Lazy Decision Trees," Proc. Int'l Conf. Machine Learning (ICML), pp. 178-185, 2003.
    [16] Z. H. Zhou, J. Wu, and W. Tang, "Ensembling Neural Networks," Proc. IEEE Conf. Data Mining, pp. 585-588, 2003.

    P. Senthil Vadivu received her MSc (Computer Science) from Avinashilingam Deemed University in 1999 and completed her M.Phil. at Bharathiar University in 2006. She is currently working as the Head of the Department of Computer Applications, Hindusthan College of Arts and Science, Coimbatore-641028. Her research area is decision trees using neural networks.



