Docstoc

Clinical Data Warehouse on Insect Vector Diseases to Human of Andhra Pradesh

Document Sample
Clinical Data Warehouse on Insect Vector Diseases to Human of Andhra Pradesh Powered By Docstoc
					                                                     (IJCSIS) International Journal of Computer Science and Information Security,
                                                     Vol. 8, No. 5, August 2010
              Clinical Data Warehouse on Insect Vector Diseases to Human of
                                    Andhra Pradesh
                              1
                                  Dr.M. Usha Rani, 2M.Kalpana Devi, 3 Dr.D.M. Mamatha
                                     4
                                       Dr.R.Seshadri , 5Yaswanth Kumar.Avulapti
                          1
                              Associate Professor, 2Research Scholar, Dept. of Computer Science,
                              3
                                Associate Professor, Dept. of Seri-Bio Sciences, SPMVV, Tirupati
                                4
                                  Director, S.V.U.Computer Center S.V.University, Tirupati
                               5
                                Research Scholar, Dept of Computer Science ,S.V.University, Tirupati

ABSTRACT                                                             of India, out of which 7,52,245 were from Karnataka alone.
         The Widespread of Insect Vector Diseases to                 Impact on disease spread includes socio-economic aspects,
humans is causing substantial morbidity and economic loss            clinical attendance and barriers to health care and lack of
to our nation. The year 2006 is likely to go down as one of          awareness to control the diseases. This stressed the need to
the worst years in terms of public health, which has                 track the relevant information, the various aspects and data
witnessed a high incidence of Insect Vector Diseases such            about these diseases.
as Malaria, Chikungunya, Dengue, Lymphatic Filariasis,
And Japanese Encephalitis. This stressed the need to track           3. Significance of the Work
the relevant information about these diseases. The reliable                   The epidemic diseases are a great threat to India
and quickly retrievable clinical data on disease wise is a           and there is a need to construct the data warehouse for
need of the hour with which planners can prepare their               prevention, early detection and to take control measures.
strategies to control and curb the diseases. From the                There is a need to aware the public about epidemic diseases.
aforesaid point of view this particular data warehouse               The information given by data warehouse is useful to the
(DWH) going to be handy to the planners.                             researchers, academicians, doctors, health workers and
                                                                     Govt. servants including common man. This data keeps us
Key Words: Insect Vector diseases, Chikungunya,                      aware and forearmed to prevent such attacks in future.
Malaria, Dengue, Lymphatic Filariasis           &    Japanese
Encephalitis, Clinical data, Data warehouse                          4. Objectives
                                                                               This work is proposed to be undertaken with the
1. Introduction                                                      following objectives:
          The epidemic diseases are a threat to the society           Persons at the helm of affairs at central Government in
starting from the stone age to till date. Even though we have             general and State Government in particular are worsely
good past experience about epidemic diseases but the                      in need of disease wise clinical data to equip
problems are not handled in a proper way. The control of                  themselves with corrective cum counter strategies. The
these diseases involves control of three living beings and                reliable and quickly retrievable clinical data on disease
their environment viz. man-the host, mosquito-the vector                  wise is a need of the hour with which planners can
and the deadly pathogen-the parasite. Since the vector and                prepare their strategies to control and curb the diseases
the pathogen are highly adaptable, much of the emphasis is                from this point of view this particular data warehouse
on man i.e. bringing the awareness in public related to the               going to be handy to the planners.
insect vector diseases.                                               This data warehouse is for the future use of the
          The National and International efforts over these               researchers, academicians, Doctors, Health workers and
Insect Vector Diseases control were highly successful in late             Govt. servants including common man. This data keeps
1950’s and the early 60’s. However, due to various reasons                us aware and forearmed to prevent such attacks in
the control programs received setbacks all over the world                 future.
and today it has come back with vengeance. Present                    The data warehouse and analysis reports will be made
epidemic of Chikungunya in India after a gap of 30 years, is              publicly available for further research.
the largest ever in the world, with over 1.3 million people
affected. For other mosquito borne diseases there has been a         5. Data Warehouse
threefold increase in Japanese Encephalitis since 2001.                       Data Warehousing is a buzz-phrase that has taken
Malaria infects 2 million Indians annually. It is time to            the information systems’ world by storm. A data warehouse
address the research on these lines to explore, where the            (DWH) can be looked at as an “informational database” that
system fails in combating these diseases.                            is maintained separately from an organization’s operational
                                                                     database. But that would fall short of the full technological
2. Origin of the Research Problem                                    implications of the DWH term. The process of transforming
          The widespread of Insect Vector diseases to                data into information and making it available to the user in a
human is causing substantial morbidity and economic loss             time bound manner to make a difference is known as data
to our nation. The year 2006, is likely to go down as one of         warehousing.
the worst years in terms of public health, which has
witnessed a high incidence of Insect Vector diseases such                     In order to serve the decision making process of the
as Malaria, Chikungunya, Dengue, Japanese Encephalitis.              management the data warehouse has to supply the following
The WHO regional office for South-East Asia has reported             primary functionality:
1.3 million cases from 152 districts in 10 states/provinces           The DWH is a reflection of the business rules of the
                                                                         enterprise – not just of a specific function or business

                                                               240                               http://sites.google.com/site/ijcsis/
                                                                                                 ISSN 1947-5500
                                                       (IJCSIS) International Journal of Computer Science and Information Security,
                                                       Vol. 8, No. 5, August 2010




    unit-as they apply to strategic decision support                  data warehouse. ETL also changes as the data warehouse
    information.                                                      evolves, so ETL processes must designed for ease of
 It is the collection point for the Integrated, Subject-             modification.
    Oriented strategic information that is handled by the                      Once the scope had been set, the relevant data has
    data acquisition process.                                         to be identified from the raw source data available, to
 It is the historical store of strategic information, with           formalize the approach of this task, a method which
    the history relating to either the data or its relationships.     transforms data from the raw data to the source of DWH has
 It is the source of stable data regardless of how the               three steps which are as follows:
    processes may change. This requires a data model that                   First stage data is the raw data from operational
    is not influenced by the operational processes creating                    database.
    the data.                                                               Second stage data is transformed, cleansed and
         Additionally the data warehouse provides as                           normalized from stage 1 data.
functionality for the support of ad hoc queries.                            Third stage data is further transformed from stage
                                                                               2, optimized for final fact data representation.
5.1 The Clinical Data Warehouse                                            “Data stage” software is an ETL Tool selected to
          A clinical data warehouse or CDWH is a facility             implement data warehouse.
that houses all electronic data collected at a clinical center.
For any modern clinical institute, it is necessary to separate        5.2.1 Stage 1- Raw Data
operational data from informational data by creating a                         Data collected from the Ministry of Health and
clinical data warehouse. A growing number of technologies             Family Welfare, Hyderabad, Andhra Pradesh. This stage
for integrating and performing structured analyses of data            data is considered as raw source data, which are of table
from disparate sources are competing to win the day for               format. Data about all the five diseases such as Dengue
healthcare organizations.                                             Table, Malaria Table, Chikungunya Table, Japanese
          A CDWH is therefore a DWH tailored for the                  Encephalitis Table, and Filariasis Table are of the same
needs of users in a clinical environment, combining                   format. Description of the sample table is as follows.
information from a variety of legacy health-care databases
and cleansed operational data to form a centralized data              Dengue Table (District Name                     Character
repository to answer the informational needs of all clinical                        Year                              Number
users.                                                                              Total Blood Samples
          Data warehouse in clinical context have                                     Collected                       Number
traditionally been administrative in nature, focusing on                            Confirmed Cases                   Number
patient billing and patient-care management, organizational                         Number of Deaths                  Number)
aspect of hospitals that were optimized using data
warehouse technology not much different than                          5.2.2 Stage 2 - Refined data
contemporary enterprises. Technology however evolved                            Based on the tables from Stage 1 certain design
quickly and more complex areas of clinical data                       decisions had to be made before any data modeling could
management could be tackled. The information technology               commence. Questions like
supported collection process of clinical data has had a long             What are the central facts?
history, and the promise of a new technology leveraging                  Which are the dimensions should be focused?
these collections put physicians, nurses and clinical                           Parallel to the modeling process, steps were taken
researchers right next to the administrators on the map.              to get an idea of what information could be derived from
                                                                      the data available. Stage one tables are used to construct text
5.2 Extraction, Transformation and Loading -                          files. The description of data files are as follows:
Three Stage Method
          As data warehouse data are highly aggregated, very          Disease Table
complex relationships are constructed from various data               No. of records: 6
sources. The process that is responsible for exactly that
transformations is called Extraction Transfer and                              We have taken five Mosquito Borne Diseases such
Loading(ETL) process and handles getting data out of one              as Malaria, Dengue, Chikungunya, Lymphatic Filariasis,
data store[extraction], modify it [transfer], and inserting it        and Japanese Encephalitis. Description of the table as
into a different data store[loading].                                 follows.
          Data are extracted from operational databases,
legacy systems and external data sources, transformed to              Disease Table (S. No                  Number,
match the DWH schema, and loaded into the data                                      Disease Id              Character,
warehouse database. Generally ETL is a complex                                      Disease Name            Character)
combination of processes and technology that consumes a               (S. No=0, No disease)
significant portion of the data warehouse development
resources and time. Further importance is placed on the ETL
                                                                      District Table
process due to the fact that it is not a one-time event, but          No. of records : 23 (Total number of districts in Andhra
staged periodically. Typical periodicity shows in monthly,            Pradesh)
weekly, or daily updates, depending on the purpose of the


                                                                241                                http://sites.google.com/site/ijcsis/
                                                                                                   ISSN 1947-5500
                                                      (IJCSIS) International Journal of Computer Science and Information Security,
                                                      Vol. 8, No. 5, August 2010




         Description of the table as follows.
District table (District Id          Character,
                District Name        Character)

Case–year Table
No. of records: Number of confirmed cases of a disease in
a particular year varies based on the disease and district.
          We have constructed 9 text files; each contains
data for a single year i.e. from 2000 to 2008.

Case Year Table (Case Id                        Character,
                 District Id                    Character,
                 Disease Id                     Character,
                 Blood Samples
                  Collected or Not              Logical,                                      Fig: 1
                 Year                           Number,
                 Disease Status                 Logical)                      Data from all the Case tables from 23 districts for a
                                                                     single year (for example 2009 data) are combined using link
Later we combine all the 9 text files into a single text file        collector into another sequential file by using Round Robin
called CASE_ENTIRE_YEAR text file.                                   Algorithm.

Case–District-2009 Table                                                 Text files for all
                                                                         Districts                                 Text file
Total No. of Records: 33601
         We have constructed 23 text files for 23 districts              Case _                                 Case _
each contain data for the year 2009. (These tables are for               Chittoor                               entire_year
the current year updation.)
                                                                         Case_                                  Case – id
Case District 2009(Case Id                Character,                                          Link
                                                                         Kadapa                                 District – id
                   Disease Id             Character,
                                                                                              Collector         Disease – id
                   Blood Samples Logical,                                                                       Blood –
                   Year                   Number,                    :            :                             samples
                   Disease Status         Logical)                                                              Year
                                                                         Case_
                                                                                                                Disease_
                                                                         Hyderab
Later we combine all the 23 districts text files into a single                                                  status
                                                                         ad
text file called CASE_HISTORY text file.

Death Table
No. of records: 1200                                                 Case_2009 Job

Death table (Case Id                 Character,
             District Id             Character,
             Disease Id              Character,
             Blood Samples
              Collected or not       Logical,
             Year                    Number,
             Disease Status          Logical,
             Death Id                Character)



5.2.3 Stage3-LoadingStage(Clinical Warehouse
creation)
                                                                                              Fig: 2
         Data from all the text files was extracted and stored
in an Oracle file, while transforming primary keys are to be         The resulted sequential table is then transformed to Oracle
specified in the oracle table.                                       table with the same attributes.

                                                                              Data from all the Case-year tables for 9 years
                                                                     (2000 to 2008) are combined using link collector into
                                                                     another sequential file by using Round Robin Algorithm.

                                                               242                                        http://sites.google.com/site/ijcsis/
                                                                                                          ISSN 1947-5500
                                                      (IJCSIS) International Journal of Computer Science and Information Security,
                                                      Vol. 8, No. 5, August 2010




                                                                     Query applied:
         Text files for 2000 to        Text file                     Select Count (*), District_id, Year
         2008 years data
                                                                         From Case_table
  Case _                                                                 Where blood_samples = ‘y’
  2000_                                                                  Group by District_id, Year;
  year
                                   Case _ history

  Case_                            Case – id
  2001_                Link        District – id
  year                 collector   Disease – id
                                   Blood –
     :                             samples
                                   Year
  Case_                            Disease - status
  2008_
  year


Case_History Job


                                                                                                Fig: 4
                                                                     Fact 2 Table
                                                                     Hash file 4: To find Distinct District_id, Disease_id from
                                                                     Case table
                                                                     Query applied:

                                                                     Select distinct District_id, Disease_id, Year

                                                                          From Case_table;




                       Fig: 3

         The resulted sequential table is then transformed to
Oracle table with the same attributes.

Fact 1 Table
Hash file 1: To find the death count we apply query on the
Death table

Query applied:

 Select Count (*), District_id, Disease_id, Year                                           Fig: 5
     From Death_table
     Group by District_id, Disease_id, Year;                                   All the 4 Hash tables, Oracle District table and
                                                                     Disease table are combined and transformed to create final
Hash file 2:        To find case count we apply query on the         fact table. In the transformation process we apply the
Case table                                                           following queries on the Oracle Disease table and District
                                                                     table to include the names of the Disease and Districts in the
Query applied :                                                      final fact table.
                                                                     Query applied on Disease table:
Select Count (*), District_id, Disease_id, Year                       Select Disease_tb. Disease_id, Disease_tb.Disease _name
    From Case_table                                                         From pro.Disease_tb Disease_tb
   Group by District_id, Disease_id, Year;                                  Where Disease_tb. Disease_id=:1;

Hash file 3: To find total blood samples collected we apply
query on the Case table



                                                               243                                  http://sites.google.com/site/ijcsis/
                                                                                                    ISSN 1947-5500
                                                  (IJCSIS) International Journal of Computer Science and Information Security,
                                                  Vol. 8, No. 5, August 2010




Query applied on District table:                                 Sample Report produced :
Select District_tb.District_id, District_tb. District _name
        From pro.District_tb, District_tb
        Where District_tb.District_id =:1;
Final Fact Job:




                                                                                                  Fig: 8
                                                                 Conclusions
                                                                           Modern tools now come in handy to address the
                                                                 issues on the disease surveillance, control, monitoring and
                                                                 evaluation, where should health care centers to be situated
                          Fig: 6
                                                                 and what services should they offer. Monitoring and
5.2.4 Data Model:                                                evaluation are an essential part of the health programme as
         Data modeling technique we used is Star schema.         well as other programmes related to development. Hence,
The advantage of Star schema is that it is easy to               there is a need to sensitize the public about epidemic
understand, easy to define hierarchies, reduces the number       diseases. This stressed the need to construct the data
of physical joins, and requires low maintenance and simple       warehouse for prevention, early detection and to take
Meta data. The actual data model for this data warehouse is      control measures. This data keeps us aware and forearmed
as follows.                                                      to prevent such attacks in future.
                                                                           The work is concentrated towards to build the data
                                                                 warehouse. Due to the time limitation, the current history
                                                                 file is constructed only based on the data from 2000 to 2009.
                                                                 This data to be extend regularly with the availability of the
                                                                 next year data. This data warehouse is for the future use of
                                                                 the researchers, academicians, Doctors, Health workers and
                                                                 Govt. servants.

                                                                 References
                                                                 [1]  Sid Adelman, Larissa Moss, “Data Warehouse Project Management”,
                                                                                 2000.
                                                                 [2] Codd E.F., “The Clinical Data Warehousing”, 2001.
                                                                 [3] www.etl-tools.info/en/bi/etl-process.htm
                                                                 [4] Jonathan G. Geiger, “The Data Warehouse Model”, 2000.
                                                                 [5]    Michael Haisten, “The real-time data warehouse: the next stage in
                                                                                 Data warehouse evolution”, 1999.
                                                                 [6] Harry Singh, “Interactive Data Warehousing”, PHI, 1994.
                                                                 [7] William Inmon, “Building the Data Warehouse”, 2nd ed, John Wiley,
                                                                                 New York, 1996.
5.3 Reporting Tool:                                              [8] “Clinical Data Warehouse Functionality” Peter Villiers, SAS
         Data Stage ETL Tool is used to create data                              Institute Inc., Cary, NC
warehouse .The final fact table produced from ETL Tool           [9] Laura Hadley, “Developing Data warehousing Architecture”, 2001.
                                                                 [10] Steven R. Meyer, “Which ETL Tool is Right for You?”, DM Review,
will be given to the Reporting Tool, which will produce                          Mar 2001.
Reports . Reporting Tool we used is Business Objects.            [11] Nigel Pendse, “What is OLAP? The Codd Rules and Features”, 2008.
Requested sample query:                                          [12] National Center for Infectious Diseases, Division of Parasitic Disease.
                                                                 [13] DJ. Power, “A Brief History of Decision Support Systems”, 2002.
                                                                 [14] Ralph Kimball, “The Data Warehouse Toolkit”, 1994.
                                                                 [15] Warren Thornthwaite, “Understanding Data Warehouse Architecture
                                                                                 requirements”, 2000.
                                                                 [16] Usha Rani M., Jyothi S., Rama Sree R. J., “Data Warehousing and
                                                                                 Data Mining”, Ikon Books, 2009.




                 Fig: 7


                                                           244                                      http://sites.google.com/site/ijcsis/
                                                                                                    ISSN 1947-5500

				
DOCUMENT INFO
Description: Vol. 8 No. 5 August 2010 International Journal of Computer Science and Information Security Publication August 2010, Volume 8 No. 5 (Download Full Journal) (Archive)