Clinical Data Warehouse on Insect Vector Diseases to Human of Andhra Pradesh
Description
Vol. 8 No. 5 August 2010 International Journal of Computer Science and Information Security Publication August 2010, Volume 8 No. 5 (Download Full Journal) (Archive)
Document Sample


(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 5, August 2010
Clinical Data Warehouse on Insect Vector Diseases to Human of
Andhra Pradesh
1
Dr.M. Usha Rani, 2M.Kalpana Devi, 3 Dr.D.M. Mamatha
4
Dr.R.Seshadri , 5Yaswanth Kumar.Avulapti
1
Associate Professor, 2Research Scholar, Dept. of Computer Science,
3
Associate Professor, Dept. of Seri-Bio Sciences, SPMVV, Tirupati
4
Director, S.V.U.Computer Center S.V.University, Tirupati
5
Research Scholar, Dept of Computer Science ,S.V.University, Tirupati
ABSTRACT of India, out of which 7,52,245 were from Karnataka alone.
The Widespread of Insect Vector Diseases to Impact on disease spread includes socio-economic aspects,
humans is causing substantial morbidity and economic loss clinical attendance and barriers to health care and lack of
to our nation. The year 2006 is likely to go down as one of awareness to control the diseases. This stressed the need to
the worst years in terms of public health, which has track the relevant information, the various aspects and data
witnessed a high incidence of Insect Vector Diseases such about these diseases.
as Malaria, Chikungunya, Dengue, Lymphatic Filariasis,
And Japanese Encephalitis. This stressed the need to track 3. Significance of the Work
the relevant information about these diseases. The reliable The epidemic diseases are a great threat to India
and quickly retrievable clinical data on disease wise is a and there is a need to construct the data warehouse for
need of the hour with which planners can prepare their prevention, early detection and to take control measures.
strategies to control and curb the diseases. From the There is a need to aware the public about epidemic diseases.
aforesaid point of view this particular data warehouse The information given by data warehouse is useful to the
(DWH) going to be handy to the planners. researchers, academicians, doctors, health workers and
Govt. servants including common man. This data keeps us
Key Words: Insect Vector diseases, Chikungunya, aware and forearmed to prevent such attacks in future.
Malaria, Dengue, Lymphatic Filariasis & Japanese
Encephalitis, Clinical data, Data warehouse 4. Objectives
This work is proposed to be undertaken with the
1. Introduction following objectives:
The epidemic diseases are a threat to the society Persons at the helm of affairs at central Government in
starting from the stone age to till date. Even though we have general and State Government in particular are worsely
good past experience about epidemic diseases but the in need of disease wise clinical data to equip
problems are not handled in a proper way. The control of themselves with corrective cum counter strategies. The
these diseases involves control of three living beings and reliable and quickly retrievable clinical data on disease
their environment viz. man-the host, mosquito-the vector wise is a need of the hour with which planners can
and the deadly pathogen-the parasite. Since the vector and prepare their strategies to control and curb the diseases
the pathogen are highly adaptable, much of the emphasis is from this point of view this particular data warehouse
on man i.e. bringing the awareness in public related to the going to be handy to the planners.
insect vector diseases. This data warehouse is for the future use of the
The National and International efforts over these researchers, academicians, Doctors, Health workers and
Insect Vector Diseases control were highly successful in late Govt. servants including common man. This data keeps
1950’s and the early 60’s. However, due to various reasons us aware and forearmed to prevent such attacks in
the control programs received setbacks all over the world future.
and today it has come back with vengeance. Present The data warehouse and analysis reports will be made
epidemic of Chikungunya in India after a gap of 30 years, is publicly available for further research.
the largest ever in the world, with over 1.3 million people
affected. For other mosquito borne diseases there has been a 5. Data Warehouse
threefold increase in Japanese Encephalitis since 2001. Data Warehousing is a buzz-phrase that has taken
Malaria infects 2 million Indians annually. It is time to the information systems’ world by storm. A data warehouse
address the research on these lines to explore, where the (DWH) can be looked at as an “informational database” that
system fails in combating these diseases. is maintained separately from an organization’s operational
database. But that would fall short of the full technological
2. Origin of the Research Problem implications of the DWH term. The process of transforming
The widespread of Insect Vector diseases to data into information and making it available to the user in a
human is causing substantial morbidity and economic loss time bound manner to make a difference is known as data
to our nation. The year 2006, is likely to go down as one of warehousing.
the worst years in terms of public health, which has
witnessed a high incidence of Insect Vector diseases such In order to serve the decision making process of the
as Malaria, Chikungunya, Dengue, Japanese Encephalitis. management the data warehouse has to supply the following
The WHO regional office for South-East Asia has reported primary functionality:
1.3 million cases from 152 districts in 10 states/provinces The DWH is a reflection of the business rules of the
enterprise – not just of a specific function or business
240 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 5, August 2010
unit-as they apply to strategic decision support data warehouse. ETL also changes as the data warehouse
information. evolves, so ETL processes must designed for ease of
It is the collection point for the Integrated, Subject- modification.
Oriented strategic information that is handled by the Once the scope had been set, the relevant data has
data acquisition process. to be identified from the raw source data available, to
It is the historical store of strategic information, with formalize the approach of this task, a method which
the history relating to either the data or its relationships. transforms data from the raw data to the source of DWH has
It is the source of stable data regardless of how the three steps which are as follows:
processes may change. This requires a data model that First stage data is the raw data from operational
is not influenced by the operational processes creating database.
the data. Second stage data is transformed, cleansed and
Additionally the data warehouse provides as normalized from stage 1 data.
functionality for the support of ad hoc queries. Third stage data is further transformed from stage
2, optimized for final fact data representation.
5.1 The Clinical Data Warehouse “Data stage” software is an ETL Tool selected to
A clinical data warehouse or CDWH is a facility implement data warehouse.
that houses all electronic data collected at a clinical center.
For any modern clinical institute, it is necessary to separate 5.2.1 Stage 1- Raw Data
operational data from informational data by creating a Data collected from the Ministry of Health and
clinical data warehouse. A growing number of technologies Family Welfare, Hyderabad, Andhra Pradesh. This stage
for integrating and performing structured analyses of data data is considered as raw source data, which are of table
from disparate sources are competing to win the day for format. Data about all the five diseases such as Dengue
healthcare organizations. Table, Malaria Table, Chikungunya Table, Japanese
A CDWH is therefore a DWH tailored for the Encephalitis Table, and Filariasis Table are of the same
needs of users in a clinical environment, combining format. Description of the sample table is as follows.
information from a variety of legacy health-care databases
and cleansed operational data to form a centralized data Dengue Table (District Name Character
repository to answer the informational needs of all clinical Year Number
users. Total Blood Samples
Data warehouse in clinical context have Collected Number
traditionally been administrative in nature, focusing on Confirmed Cases Number
patient billing and patient-care management, organizational Number of Deaths Number)
aspect of hospitals that were optimized using data
warehouse technology not much different than 5.2.2 Stage 2 - Refined data
contemporary enterprises. Technology however evolved Based on the tables from Stage 1 certain design
quickly and more complex areas of clinical data decisions had to be made before any data modeling could
management could be tackled. The information technology commence. Questions like
supported collection process of clinical data has had a long What are the central facts?
history, and the promise of a new technology leveraging Which are the dimensions should be focused?
these collections put physicians, nurses and clinical Parallel to the modeling process, steps were taken
researchers right next to the administrators on the map. to get an idea of what information could be derived from
the data available. Stage one tables are used to construct text
5.2 Extraction, Transformation and Loading - files. The description of data files are as follows:
Three Stage Method
As data warehouse data are highly aggregated, very Disease Table
complex relationships are constructed from various data No. of records: 6
sources. The process that is responsible for exactly that
transformations is called Extraction Transfer and We have taken five Mosquito Borne Diseases such
Loading(ETL) process and handles getting data out of one as Malaria, Dengue, Chikungunya, Lymphatic Filariasis,
data store[extraction], modify it [transfer], and inserting it and Japanese Encephalitis. Description of the table as
into a different data store[loading]. follows.
Data are extracted from operational databases,
legacy systems and external data sources, transformed to Disease Table (S. No Number,
match the DWH schema, and loaded into the data Disease Id Character,
warehouse database. Generally ETL is a complex Disease Name Character)
combination of processes and technology that consumes a (S. No=0, No disease)
significant portion of the data warehouse development
resources and time. Further importance is placed on the ETL
District Table
process due to the fact that it is not a one-time event, but No. of records : 23 (Total number of districts in Andhra
staged periodically. Typical periodicity shows in monthly, Pradesh)
weekly, or daily updates, depending on the purpose of the
241 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 5, August 2010
Description of the table as follows.
District table (District Id Character,
District Name Character)
Case–year Table
No. of records: Number of confirmed cases of a disease in
a particular year varies based on the disease and district.
We have constructed 9 text files; each contains
data for a single year i.e. from 2000 to 2008.
Case Year Table (Case Id Character,
District Id Character,
Disease Id Character,
Blood Samples
Collected or Not Logical, Fig: 1
Year Number,
Disease Status Logical) Data from all the Case tables from 23 districts for a
single year (for example 2009 data) are combined using link
Later we combine all the 9 text files into a single text file collector into another sequential file by using Round Robin
called CASE_ENTIRE_YEAR text file. Algorithm.
Case–District-2009 Table Text files for all
Districts Text file
Total No. of Records: 33601
We have constructed 23 text files for 23 districts Case _ Case _
each contain data for the year 2009. (These tables are for Chittoor entire_year
the current year updation.)
Case_ Case – id
Case District 2009(Case Id Character, Link
Kadapa District – id
Disease Id Character,
Collector Disease – id
Blood Samples Logical, Blood –
Year Number, : : samples
Disease Status Logical) Year
Case_
Disease_
Hyderab
Later we combine all the 23 districts text files into a single status
ad
text file called CASE_HISTORY text file.
Death Table
No. of records: 1200 Case_2009 Job
Death table (Case Id Character,
District Id Character,
Disease Id Character,
Blood Samples
Collected or not Logical,
Year Number,
Disease Status Logical,
Death Id Character)
5.2.3 Stage3-LoadingStage(Clinical Warehouse
creation)
Fig: 2
Data from all the text files was extracted and stored
in an Oracle file, while transforming primary keys are to be The resulted sequential table is then transformed to Oracle
specified in the oracle table. table with the same attributes.
Data from all the Case-year tables for 9 years
(2000 to 2008) are combined using link collector into
another sequential file by using Round Robin Algorithm.
242 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 5, August 2010
Query applied:
Text files for 2000 to Text file Select Count (*), District_id, Year
2008 years data
From Case_table
Case _ Where blood_samples = ‘y’
2000_ Group by District_id, Year;
year
Case _ history
Case_ Case – id
2001_ Link District – id
year collector Disease – id
Blood –
: samples
Year
Case_ Disease - status
2008_
year
Case_History Job
Fig: 4
Fact 2 Table
Hash file 4: To find Distinct District_id, Disease_id from
Case table
Query applied:
Select distinct District_id, Disease_id, Year
From Case_table;
Fig: 3
The resulted sequential table is then transformed to
Oracle table with the same attributes.
Fact 1 Table
Hash file 1: To find the death count we apply query on the
Death table
Query applied:
Select Count (*), District_id, Disease_id, Year Fig: 5
From Death_table
Group by District_id, Disease_id, Year; All the 4 Hash tables, Oracle District table and
Disease table are combined and transformed to create final
Hash file 2: To find case count we apply query on the fact table. In the transformation process we apply the
Case table following queries on the Oracle Disease table and District
table to include the names of the Disease and Districts in the
Query applied : final fact table.
Query applied on Disease table:
Select Count (*), District_id, Disease_id, Year Select Disease_tb. Disease_id, Disease_tb.Disease _name
From Case_table From pro.Disease_tb Disease_tb
Group by District_id, Disease_id, Year; Where Disease_tb. Disease_id=:1;
Hash file 3: To find total blood samples collected we apply
query on the Case table
243 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 5, August 2010
Query applied on District table: Sample Report produced :
Select District_tb.District_id, District_tb. District _name
From pro.District_tb, District_tb
Where District_tb.District_id =:1;
Final Fact Job:
Fig: 8
Conclusions
Modern tools now come in handy to address the
issues on the disease surveillance, control, monitoring and
evaluation, where should health care centers to be situated
Fig: 6
and what services should they offer. Monitoring and
5.2.4 Data Model: evaluation are an essential part of the health programme as
Data modeling technique we used is Star schema. well as other programmes related to development. Hence,
The advantage of Star schema is that it is easy to there is a need to sensitize the public about epidemic
understand, easy to define hierarchies, reduces the number diseases. This stressed the need to construct the data
of physical joins, and requires low maintenance and simple warehouse for prevention, early detection and to take
Meta data. The actual data model for this data warehouse is control measures. This data keeps us aware and forearmed
as follows. to prevent such attacks in future.
The work is concentrated towards to build the data
warehouse. Due to the time limitation, the current history
file is constructed only based on the data from 2000 to 2009.
This data to be extend regularly with the availability of the
next year data. This data warehouse is for the future use of
the researchers, academicians, Doctors, Health workers and
Govt. servants.
References
[1] Sid Adelman, Larissa Moss, “Data Warehouse Project Management”,
2000.
[2] Codd E.F., “The Clinical Data Warehousing”, 2001.
[3] www.etl-tools.info/en/bi/etl-process.htm
[4] Jonathan G. Geiger, “The Data Warehouse Model”, 2000.
[5] Michael Haisten, “The real-time data warehouse: the next stage in
Data warehouse evolution”, 1999.
[6] Harry Singh, “Interactive Data Warehousing”, PHI, 1994.
[7] William Inmon, “Building the Data Warehouse”, 2nd ed, John Wiley,
New York, 1996.
5.3 Reporting Tool: [8] “Clinical Data Warehouse Functionality” Peter Villiers, SAS
Data Stage ETL Tool is used to create data Institute Inc., Cary, NC
warehouse .The final fact table produced from ETL Tool [9] Laura Hadley, “Developing Data warehousing Architecture”, 2001.
[10] Steven R. Meyer, “Which ETL Tool is Right for You?”, DM Review,
will be given to the Reporting Tool, which will produce Mar 2001.
Reports . Reporting Tool we used is Business Objects. [11] Nigel Pendse, “What is OLAP? The Codd Rules and Features”, 2008.
Requested sample query: [12] National Center for Infectious Diseases, Division of Parasitic Disease.
[13] DJ. Power, “A Brief History of Decision Support Systems”, 2002.
[14] Ralph Kimball, “The Data Warehouse Toolkit”, 1994.
[15] Warren Thornthwaite, “Understanding Data Warehouse Architecture
requirements”, 2000.
[16] Usha Rani M., Jyothi S., Rama Sree R. J., “Data Warehousing and
Data Mining”, Ikon Books, 2009.
Fig: 7
244 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
Related docs
Other docs by ijcsis
Comparative Analysis between Split and HierarchyMap Treemap Algorithms for Visualizing Hierarchical Data
Views: 15 | Downloads: 0
Non-Preemptive Multi-Constrain Scheduling for Multiprocessor with Hopfield Neural Network
Views: 5 | Downloads: 0
Reliable Multipath Routing Protocol (RMRP) For Mobile Ad Hoc Networks Using Adaptive Video Compression
Views: 10 | Downloads: 1
Single CCTA-Based Four Input Single Output Voltage-Mode Universal Biquad Filter
Views: 36 | Downloads: 0
A Cloud Computing Architecture for E-Learning Platform, Supporting Multimedia Content
Views: 42 | Downloads: 0
Get documents about "