					                                                            (IJCSIS) International Journal of Computer Science and Information Security,
                                                            Vol. 8, No. 8, November 2010




       Therapeutic Diet Prediction for Integrated Mining of
      Anemia Human Subjects using Statistical Techniques

                        Sanjay Choudhary                                                                  Abha Wadhwa
         Department of Mathematics & Computer Science                                    Department of Computer Science & Application
                Govt. Narmada P.G. Mahavidyalaya                                                      Govt Girls P.G. College
                         Hoshangabad, India                                                             Hoshangabad, India
                schoudhari123@rediffmail.com                                                           abhahbd@gmail.com
                          Kamal Wadhwa                                                                    Anjana Mishra
         Department of Mathematics & Computer Science                                   Department of Mathematics & Computer Science
                 Govt. Narmada P.G. Mahavidyalaya                                              Govt. Narmada P.G. Mahavidyalaya
                         Hoshangabad, India                                                             Hoshangabad, India
                   wadhwakamal68@gmail.com                                                       anjanamishra10@yahoo.com




Abstract :- The chronic disease anemia [1] occurs when the blood does not have enough hemoglobin. Hemoglobin is a protein in red blood cells that carries oxygen from the lungs to the rest of the body. All parts of the body need oxygen, so anemia can starve the body of the oxygen it needs to survive. Possible causes of anemia include low vitamin B12 or folic acid intake and some chronic illnesses, but the most common cause is not having enough iron in the blood, which the body needs to make hemoglobin. This type of anemia is called iron deficiency anemia.

   Data mining is widely used in database communities because of its wide applicability. One major application area of data mining is therapeutic diet prediction. Several chronic diseases can be prevented through nutritious food. This paper presents the association and correlation between anemia human subjects and the prevention of anemia through diet nutrients. The role of diet in preventing and controlling iron deficiency is significant. Changes in dietary and lifestyle patterns can make anemia catastrophic, so by predicting proper and sufficient diet nutrients for individuals, we can reduce the impact of anemia on human subjects.

Keywords :- Chronic Disease, Anemia, Diet Nutrients, Clinical System, Correlation, Data Mining

                        I. INTRODUCTION

   Data mining, also referred to as knowledge discovery from databases [5], is the nontrivial extraction of implicit, previously unknown and potentially useful information from databases. It has wide application in information management, query processing, decision making, process control, statistical analysis, etc. [4], [5]. Association rule mining is an important process in data mining, which determines the correlation between items belonging to a transaction database [6], [7].

   A chronic disease [10] is a disease that is long lasting or recurrent. Anemia also becomes a chronic disease if it is not cured in time. Chronic diseases are prolonged, do not resolve spontaneously and are rarely cured completely. If anemia results from a diet that is low in iron, the doctor may suggest iron-rich foods or iron pills. Data mining technology can be used to improve the quality of an individual's health care. There is a need to lower the cost of health care facilities while providing quality treatment, especially for the poorer sections of our society.

   Health care organizations need to lower cost, raise quality and still remain competitive. IT can be used for patient health care; in an IT-driven society its role in health care is well established. We can use data mining techniques to analyze patient data and generate predictions and knowledge, which can in turn be used for faster and better clinical decision making. Medical data mining deals with large amounts of data; therefore we need data mining techniques that can explore hidden patterns in the data sets of the medical domain. These patterns can be used for clinical diagnosis and prediction.

   The available raw medical data are widely distributed, heterogeneous in nature and voluminous; therefore these data need to be collected in an organized form. The collected data can then be integrated to form a basis for predicting nutrients for chronic diseases. Data mining technology provides a user-oriented approach to extracting hidden patterns from the data and can deal with heterogeneous types of data. Medical data differ from data in other databases: they are heterogeneous and also contain text and images; for example, MRI, ECG, etc. generate huge amounts of heterogeneous medical data. Knowledge discovery from this type of data can greatly benefit mankind through improved diagnostic techniques.
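   The association rule mining mentioned in this introduction is usually evaluated with the standard measures support, confidence and lift. The following is a minimal sketch of those measures, not the authors' implementation: the function names and the `records` data are hypothetical and invented purely for illustration, and are not taken from the paper.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(A -> B) / support(B); values above 1 suggest a positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical patient records, invented for illustration only (not data from the paper).
records = [
    {"low_hb", "iron_rich_diet"},
    {"low_hb", "iron_rich_diet"},
    {"low_hb"},
    {"normal_hb"},
    {"normal_hb", "iron_rich_diet"},
]

rule = ({"low_hb"}, {"iron_rich_diet"})
print("support   :", support(records, rule[0] | rule[1]))
print("confidence:", confidence(records, *rule))
print("lift      :", lift(records, *rule))
```

   Lift is included because it is a common way to add a correlation check on top of support and confidence; any thresholds applied to these measures would be a modelling choice.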


                                                                          92                             http://sites.google.com/site/ijcsis/
                                                                                                         ISSN 1947-5500



A. Background

   Diet and nutrition are important factors in the promotion and maintenance of good health throughout the entire life course. The chronic diseases related to diet and nutrition include anemia, obesity, diabetes, cardiovascular disease, cancer, osteoporosis, etc. Anemia can be defined as a reduction in the hemoglobin, hematocrit or red cell number [10]. The sudden, rapid loss of 30% of the total blood volume often results in death. The burden of chronic disease is rapidly increasing worldwide. It has been calculated that in 2001 chronic diseases contributed approximately 60% of the 56.5 million total reported deaths in the world and approximately 46% of the global burden of disease.

   The chronic disease problem has affected a large proportion of our population. It has been projected that by 2020 chronic diseases will account for almost three quarters of all deaths worldwide, including deaths due to heart diseases. Chronic diseases are largely preventable. Research has shown that diet can control chronic diseases to a large extent [11]. Appropriate dietary patterns and physical activity patterns are beneficial in the prevention of chronic diseases. The chemical composition of food and the physiological response to diet can be discovered. Unraveling the interconnection between diet and health through data mining techniques is the basis of this paper. Identification of disease- and diet-associated composite molecular biomarkers will facilitate the identification of new molecular targets for the development of novel therapeutic agents to address diet-related chronic diseases.

B. Data Mining And Statistics

   Data mining is designed [2] to learn to make future predictions. A data mining tool checks the statistical significance of the predicted patterns and reports it. One difference between data mining and statistics is that data mining automates the statistical process required in several mining tools. Statistical inference is assumption driven, in the sense that a hypothesis is formed and tested against data. Data mining, in contrast, is discovery driven: the hypothesis is automatically extracted from the given data. Another difference is that data mining techniques tend to be more robust for real-world data and can also be used by less expert users.

C. Statistical Correlation

   Through correlation [3] we may find that a change in one variable results in a change in a second variable.

   Whenever there exists a relationship between two variables such that a change in one variable results in a positive or negative change in the other, and a greater change in one variable results in a correspondingly greater change in the other, the relationship is called correlation and the two variables are called correlated.

   Two variables are positively correlated if an increase (or decrease) in one variable corresponds to an increase (or decrease) in the other.

   Two variables are negatively correlated if an increase (or decrease) in one variable corresponds to a decrease (or increase) in the other.

              II. ASSOCIATION AND CORRELATION TO BUILD RELATIONSHIP

   In this paper we deal with two distinct systems: anemia human subjects and diet prediction. Based on the hemoglobin level of an individual, we have attempted to associate the human subject with an iron requirement. The average range of hemoglobin has been found to be 14-16 for males and 12-14 for females. According to the RDA (Recommended Dietary Allowances) table, the iron requirement for an individual in the average hemoglobin range is 28 mg for males and 30 mg for females. Tables 1 and 2 show that the iron requirement of females is higher than that of males. We have used the statistical technique of correlation to find the relationship between the two systems.

   In every case of anemia [9], the cause should be discovered and treated. In clinical practice, nutritional anemia is commonly associated with overall under-nutrition, and a balanced diet should be given. Usually, diet alone is not adequate and therapy with specific supplements, particularly iron, is also needed. Supplements of 10 mg (179 micromol) iron daily prevent iron deficiency. Association rule mining [5] uses the support-confidence framework. The statement that diet can cure anemia can be justified through support and confidence measures, and the support-confidence framework can be augmented through correlation. Karl Pearson's correlation technique (Section III) has been used to find the correlation between the required level of iron (in mg) and the hemoglobin range of individuals.

                         III.   EXPERIMENTAL RESULTS

   Sufficient, safe and varied food supplies not only prevent malnutrition but also reduce the risk of chronic diseases, whereas nutritional deficiency increases the risk of common infectious diseases, generally in children. This paper has used data mining techniques to investigate factors that contribute significantly to reducing the risk of the chronic disease anemia.

   There are two methods of analyzing correlation between variables:

   (i)       Karl Pearson's Method
   (ii)      Scatter Diagram Method

(i) Use of Karl Pearson's method to find the coefficient of correlation: Correlation is the relationship between two or more variables. We are finding the correlation between anemia human subjects (male and female) and predicted diet. Anemia is a chronic disease caused by several factors, among which inadequate nutrition and malnutrition are a major cause. We have generated the following correlation table (Table 1) for male anemia subjects, with the mid value of the hemoglobin range (Xm) and iron intake (Ym, in mg):

Hb range:            0-6      6-8     8-12    12-14   14-16
Iron intake (mg):    42       36      34      31      28

TABLE 1.

Xm     Ym     x = Xm - MXm    y = Ym - MYm    x^2      y^2      x*y
3      42     -6.6            7.8             43.56    60.84    -51.48
7      36     -2.6            1.8             6.76     3.24     -4.68
10     34     0.4             -0.2            0.16     0.04     -0.08
13     31     3.4             -3.2            11.56    10.24    -10.88
15     28     5.4             -6.2            29.16    38.44    -33.48
48     171                                    Σx^2 =   Σy^2 =   Σx*y =
                                              91.2     112.8    -100.6

MXm = ΣXm / n = 48/5 = 9.6

MYm = ΣYm / n = 171/5 = 34.2

The coefficient of correlation for male anemia subjects:

\[
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} \\
  &= \frac{-100.6}{\sqrt{(91.2)(112.8)}} \\
  &= \frac{-100.6}{\sqrt{10287.36}} \\
  &= \frac{-100.6}{101.4266} \\
  &= -0.9919
\end{aligned}
\]

(ii) Scatter Diagram Method: The diagrammatic representation of bivariate data is known as a scatter diagram. For bivariate data, the values of the variables X and Y are plotted (for males and females) in the X-Y plane. The X axis denotes the range of hemoglobin and the Y axis denotes the iron required by individuals.

Figure 1. Correlation between Hemoglobin and Iron Intake (Male). [Scatter plot; X axis: Hemoglobin Range, 0 to 20; Y axis: Iron Intake, 0 to 50.]

   In this scatter diagram the points lie close to a line that declines from left to right, which shows that there is a negative correlation of high degree between the variables, i.e. the lower the hemoglobin count of an individual, the higher the required iron.

   We have generated the following correlation table (Table 2) for female anemia subjects, with the mid value of the hemoglobin range (Xf) and iron intake (Yf, in mg):

TABLE 2.

Xf     Yf     x = Xf - MXf    y = Yf - MYf    x^2      y^2      x*y
3      45     -5.25           8.5             27.56    72.25    -44.625
7      38     -1.25           1.5             1.56     2.25     -1.875
10     33     1.75            -3.5            3.06     12.25    -6.125
13     30     4.75            -6.5            22.56    42.25    -30.875
33     146                                    Σx^2 =   Σy^2 =   Σx*y =
                                              54.74    129      -83.5

MXf = ΣXf / n = 33/4 = 8.25

MYf = ΣYf / n = 146/4 = 36.5

The coefficient of correlation for female anemia subjects:

\[
\begin{aligned}
r &= \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} \\
  &= \frac{-83.5}{\sqrt{(54.74)(129)}} \\
  &= \frac{-83.5}{\sqrt{7061.46}} \\
  &= \frac{-83.5}{84.0325} \\
  &= -0.9937
\end{aligned}
\]

Figure 2. Correlation between Hemoglobin and Iron Intake (Female). [Scatter plot; X axis: Hemoglobin Range, 0 to 15; Y axis: Iron Intake, 0 to 50.]

   If the measure of correlation is negative and lies between -0.75 and -1, the correlation is of high degree.

   Both methods confirm that there is a negative correlation of high degree between hemoglobin range and required iron.
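   As a cross-check, the Karl Pearson computation used for both tables can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' procedure: the names `pearson_r` and `is_high_negative` are assumptions of this example, while the data are the mid values of the hemoglobin ranges and the iron intakes listed in Tables 1 and 2. Because the script works with unrounded deviations, both coefficients come out at roughly -0.99, and small differences in the last decimal places from a hand calculation are to be expected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations of each series from its own mean."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


def is_high_negative(r):
    """Rule stated in the text: a coefficient between -0.75 and -1 is a
    negative correlation of high degree."""
    return r <= -0.75


# Mid values of the hemoglobin ranges and iron intakes (mg) from Tables 1 and 2.
male_hb, male_iron = [3, 7, 10, 13, 15], [42, 36, 34, 31, 28]
female_hb, female_iron = [3, 7, 10, 13], [45, 38, 33, 30]

r_male = pearson_r(male_hb, male_iron)
r_female = pearson_r(female_hb, female_iron)

print(f"male:   r = {r_male:.4f}, high negative: {is_high_negative(r_male)}")
print(f"female: r = {r_female:.4f}, high negative: {is_high_negative(r_female)}")
```

   Writing the formula in deviation form keeps the code line-for-line comparable with the table columns x, y, x^2, y^2 and x*y.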
Female subjects (source data for Table 2):
Hb range:            0-6      6-8     8-12    12-14
Iron intake (mg):    45       38      33      30

                         IV.   CONCLUSION AND FUTURE SCOPE

   Data mining can be used by dietitians and hospital administrations to prepare diet charts for anemic patients. The goal is
to detect when the Hb percent of a patient is very low, in which case the predicted diet should contain food high in iron, such as seafood, dried fruits like apricots, prunes and raisins, nuts, beans, green leafy vegetables, whole grains, etc.

   This research has shown that anemia human subjects should be recommended food containing a high percentage of iron, while those with a normal range of hemoglobin can take food containing an average percentage of iron.

   There exists a negative correlation [3] between a diet high in iron and the Hb% of a patient. If the Hb% of a patient is low, the diet should be rich in iron; in the reverse condition, i.e. when the Hb% of a patient is high, the diet can contain an average percentage of iron.

   This paper depicts the correlation between anemia human subjects and the diet intake of individuals. Anemia is a common disease affecting large masses of people, and if it is not cured in time it becomes catastrophic. Social service organisations can use the above result for preventing anemia in a particular area. Several areas suffer from some kind of deficiency: for example, calcium deficiency is common in Bhopal, iodine deficiency in Kannur and in Purulia District, West Bengal, and sickle cell anemia in the tribal populations of Maharashtra, Gujarat, Orissa and Tamil Nadu. They can use the above conclusion to predict diet for anemia-affected areas. Data mining deals with large data sets; if we consider the total human population of an area, we can classify the population into male and female data sets. In areas where iron deficiency is common, the hemoglobin range of individuals will be below 8%.

   This paper has used the statistical concept of correlation to show and justify that there is a high degree of correlation between the hemoglobin range of an individual and iron intake. Using this fact, organizations can plan diet patterns for areas facing the problem of anemia. The entire population of such an area can be prescribed iron-rich food. Dietitians can also use this fact to prevent this common disease in a particular geographical area. There are several remote areas in India which suffer from this common disease. Organisations can directly plan nutrients for those needy people without performing unnecessary calculations. This result will be beneficial for large masses of people: they can be protected from this chronic disease by giving them iron-rich food, so the above results are beneficial for our society.


                           REFERENCES

[1]   The College of Family Physicians of Canada, 2630 Skymark Avenue, Mississauga, ON L4W 5A4.
[2]   Jayanthi Ranjan, "Applications of Data Mining Techniques in Pharmaceutical Industry." Journal of Theoretical and Applied Information Technology, 2005-2007, 61-67.
[3]   H. K. Pathak and Dr. D. C. Agrawal, "Statistical Methods", Correlation and Regression, 258-265.
[4]   Yeong-Chyi Lee, Tzung-Pei Hong, Tien Chin Wang, "Multi-level fuzzy mining with multiple minimum supports." Expert Systems with Applications (Elsevier), vol. 34, pp. 459-468, 2008.
[5]   J. Han, M. Kamber, "Data Mining: Concepts and Techniques." The Morgan Kaufmann Series, 261-265, 2001.
[6]   J. Han, Y. Cai, and N. Cercone, "Data-Driven Discovery of Quantitative Rules in Relational Databases." IEEE Trans. Knowledge and Data Eng., vol. 5, pp. 29-40, 1993.
[7]   R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases." In Proceedings of the ACM SIGMOD Conference, pp. 207-216, 1993.
[8]   Abdullah H. Al-Assaf, "Anemia and Iron Intake of Adult Saudis in Riyadh City-Saudi Arabia." Pakistan Journal of Nutrition 6 (4): 355-358, 2007, ISSN 1680-5194.
[9]   Antia F. P. and Abraham Philip, "Clinical Dietetics and Nutrition", 329-330, 1997.
[10]  Corinne H. Robinson, Marilyn R. Lawler, "Normal and Therapeutic Nutrition", 511-519.
[11]  Report of a Joint WHO/FAO Expert Consultation, "Diet, Nutrition and the Prevention of Chronic Diseases", WHO Technical Report Series, 916, 4-5.
[12]  Swaminathan M., "Food & Nutrition", vol. II, Applied Aspects, 66-74, 2003.
[13]  Abidi, S.S.R. (2001), "Knowledge management in healthcare: towards 'knowledge-driven' decision-support services." International Journal of Medical Informatics 63, 5-18.
[14]  B. Shri Laxmi, "Food Science".
[15]  Cios, K.J., & Moore, G.W. (2000), "Medical Data Mining and Knowledge Discovery: An Overview." In Cios, K.J., Medical Data Mining and Knowledge Discovery. Heidelberg: Physica-Verlag.
[16]  J Am Coll Cardiol. Aug, 28 (2), 515-521. Data Science Journal, Volume 5, 19 October 2006.
[17]  J.S. Garrow, "Human Nutrition & Dietetics".



