INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 176-181
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com



     DATA MINING WITH HUMAN GENETICS TO ENHANCE GENE
        BASED ALGORITHM AND DNA DATABASE SECURITY

                                   Vijay Arputharaj J
                     Research Scholar, Department of Computer Science,
                             Karpagam University, Coimbatore,
                                     Tamil Nadu, India
                                  Dr.R.Manicka Chezian
                    Associate Professor, Department of Computer Science,
                                NGM College (Autonomous),
                                 Pollachi,Tamil Nadu, India



 ABSTRACT

          The goal of data mining in a DNA database is to examine possible combinations of
 DNA sequences and to derive a common code or algorithm that describes how sequences
 change under mutation. Because data mining is well suited to analyzing and extracting
 data, it also helps in formulating such a common algorithm.
          In the study of human genetics, an important goal of data mining is to understand
 the mapping between inter-individual variation in human DNA sequences and variability in
 disease and mutation susceptibility. In lay terms, it is used to find out how changes in an
 individual's DNA sequence affect the risk of developing common diseases and mutations,
 while keeping the data highly secure. This investigation also helps in parental
 identification algorithms for DNA sequences and genome expressions. Data mining and data
 extraction techniques are used to analyze the large, complex, information-rich data sets
 found in DNA sequences.
          Regulation of gene expression includes the processes that cells and viruses use to
 regulate the way that the information in genes is turned into gene products. An important
 challenge in using large-scale gene expression data for biological classification arises
 when the expression dataset being analyzed involves multiple classes. Data mining is used
 to overcome problems of this kind.




Key Words- Data mining, DNA Database, DNA Sequence, Gene Expression, Biological
classification, Multiple class

1. INTRODUCTION

         The Human Genome Project is a worldwide scientific research project whose main aim
is to determine the sequence of chemical base pairs that make up DNA, and to identify and
map the genes of the human genome from both a physical and a functional standpoint.
A DNA database, or DNA databank, is a database that contains DNA data. A DNA
databank can be used in the analysis of parental comparison, genetic diseases, genetic
fingerprinting for criminology, genetic genealogy, etc.
         In the area of human genetics, an important goal of data mining is to understand the
mapping between individual variation in human DNA sequences and variability in the
algorithms used for database security, for mutation susceptibility, and for parental
identification. In India, which is densely populated, there is a great need for DNA
databases, which may help in stopping different types of fraud, such as passport fraud.
         Data mining and data extraction techniques are used to analyze the large, complex,
information-rich data sets found in DNA sequences. Several visualization and data mining
techniques are already available, and they are used to validate existing methods and to
attempt to discover new methods for differentiating coding DNA sequences (exons) from
non-coding DNA sequences (introns). Because data mining is well suited to analyzing and
extracting data, it also helps in formulating the common algorithm.

2. LITERATURE STUDY

  2.1 INTERNATIONAL STATUS
        In northern countries, data exploration techniques have been designed to classify
DNA sequences, using many different classification techniques including rule-based
classifiers and neural networks. Visualization of both the original data and the results of
the data mining is used to help verify patterns and to understand the distinctions between
the different types of data and classifications.
        Forensic identification problems are examples in which the study of DNA profiles is a
common approach. Here we present some problems and develop their treatment, focusing on
the use of Object-Oriented Bayesian Networks (OOBN). The use of DNA databases, which
began in 1995 in England, has created new challenges. In Portugal, the legislation for the
construction of a genetic database was defined in 2008. Cryptographic, authentication, and
high-definition security approaches for databases are used in several countries, such as
Thailand, the US, and the UK.

  2.2 NATIONAL STATUS
        Genetic features and environmental factors are both involved in multifactorial
diseases. Data mining tools were required to study them, and we proposed a two-phase
approach using a specific genetic algorithm. For the first phase, the feature selection
problem, we used a genetic algorithm (GA). To deal with this very specific problem, some
advanced mechanisms were introduced into the genetic algorithm, such as sharing, random
immigrants, and dedicated genetic operators, and a particular distance operator was defined.
The second phase, a clustering based on the features selected during the previous phase,
uses the k-means clustering algorithm.
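
The second phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the `selected_features` indices (standing in for whatever the genetic algorithm of the first phase would return), and the helper names are hypothetical and do not come from the study. The initial centroids are simply the first k points, so the run is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical samples with three features each; feature 1 is noise.
samples = [(0.0, 9.0, 0.1), (0.2, 1.0, 0.0), (5.0, 8.0, 5.1), (5.2, 2.0, 4.9)]

# Assumed output of phase 1 (feature selection): keep features 0 and 2.
selected_features = [0, 2]
projected = [tuple(s[i] for i in selected_features) for s in samples]

labels, centroids = kmeans(projected, k=2)
```

Clustering only on the selected features keeps the noisy feature from influencing the distance computation, which is the point of running feature selection first.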
   INDIA CHENNAI: The FBI has a DNA index system. The UK has a similar database. And
if Parliament passes the DNA Profiling Bill, 2007, India will soon join the league, creating a
national DNA database that will help police arrest serial offenders and give a boost to
forensic investigation. The bill, drafted and sent to all ministries and departments for their
feedback, has been modified. The final version has been sent to the law ministry, which has
sent it to the legal department for final drafting.

 2.3 SIGNIFICANCE OF THE STUDY
      • This research is significant for society as a whole: the identity of each citizen
        can be stored in a secured DNA database, which guards against fraud such as
        passport fraud and ration card fraud.
      • This research advances and aids criminal and forensic databases. This
        application is also useful for the government and for society.
      • This research primarily deals with the advancement of genetic algorithms with
        proper security features in DNA databases, and it enhances the special features of
        DNA database security.

3. RESEARCH STUDY AND DEVELOPMENT

3.1 AIMS AND OBJECTIVES
    • To enhance database security. This research primarily deals with the advancement
        of genetic algorithms with proper security features in DNA databases, and it
        enhances the special features of DNA database security.
    • To map relationships between DNA sequences and variability in disease and mutation
        susceptibility.
    • To provide an effective solution in parental identification algorithms for DNA
        sequences and genome expressions.

3.2 MATERIAL AND METHODS
  1. Data mining and information retrieval
  2. Visual Analytics and Collaboration
  3. Combination of Parallel algorithms for sequence analysis
  4. Seamless high-performance computing
  5. Security Algorithms
        a) Reverse Encryption algorithm to protect data
        b) Advance Cryptography algorithm to protect data
        c) Advanced Encryption Standard (AES)

        Among the above methodologies, the data mining technique is used for knowledge
discovery from the entire DNA database. There can be three levels of genome data mining. The
simplest is an in-depth analysis of the result from a single query using a genome browser. At
this level, one may start with a gene or marker name, or by mapping a sequence to the
genome. Cross comparison of various annotation 'tracks' may help make sense of the query
region. This is the most popular use of any genome browser. Data mining is the opposite of
information retrieval in the sense that it is not based on predetermined criteria; it uncovers
hidden patterns by exploring the data.
         Visual analytics and parallel algorithms are used in implementing security measures
in the database. Seamless high-performance computing relates to the speed of access to the
database records. Information retrieval is based on predetermined criteria: for example,
retrieving the group of people who belong to a certain class, have a certain mortgage plan,
or have certain characteristics that are already known.
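
The contrast between information retrieval and data mining can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the record layout, the made-up sequences, and the function names `retrieve` and `mine_shared_kmers` are hypothetical and are not part of any DNA database described here. Retrieval filters on a criterion known in advance, while the mining step looks for subsequences of length k that recur in every record without being told which ones to look for.

```python
from collections import Counter

# Made-up records; the field names are illustrative only.
records = [
    {"id": "s1", "population": "A", "sequence": "ATGCGATACG"},
    {"id": "s2", "population": "B", "sequence": "TTGCGATAAC"},
    {"id": "s3", "population": "A", "sequence": "GGGCGATATT"},
]


def retrieve(records, population):
    """Information retrieval: select records by a predetermined criterion."""
    return [r["id"] for r in records if r["population"] == population]


def mine_shared_kmers(records, k=4):
    """Data mining: discover the k-mers that occur in every record."""
    counts = Counter()
    for r in records:
        seq = r["sequence"]
        # A set counts each k-mer at most once per record.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return sorted(kmer for kmer, n in counts.items() if n == len(records))


known_group = retrieve(records, "A")
shared_patterns = mine_shared_kmers(records, k=4)
```

Here `known_group` answers a question that was fixed beforehand, whereas `shared_patterns` surfaces a common motif that was not specified in the query.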
         Cryptography is usually referred to as "the study of secrets", although nowadays it
is most closely associated with encryption. Encryption is the process of converting plain
("unhidden") text into cryptic ("hidden") text to secure it against data thieves. This process
has a second part, in which the cryptic text is decrypted at the other end so that it can be
understood.
         In the broad field of cryptography, encryption is the process of encoding messages
(or information) in such a way that hackers cannot read them, but authorized parties can.
         In an encryption scheme, the message or information, referred to as plaintext, is
encrypted using an encryption algorithm, turning it into unreadable ciphertext. This is
usually done with an encryption key, which specifies how the message is to be encoded.
Decryption is then performed by the authorized party.
         Encryption is a method of hiding data so that it cannot be read by anyone who does
not know the key. The key is used to lock and unlock the data. To encrypt data, one
performs some mathematical functions on it, and the result of these functions is output
that looks like garbage to anyone who does not know how to reverse the operations.
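
The idea of locking and unlocking data with a key can be shown with a deliberately simple Python sketch. It uses a repeating-key XOR, which is a toy cipher chosen only because the same function both encrypts and decrypts; it is not secure and is not one of the algorithms proposed in this work. The function name `xor_with_key`, the sample DNA fragment, and the key are all made up for illustration.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the repeating key (toy cipher, NOT secure)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = b"GATTACAGGCT"   # made-up DNA fragment
key = b"secret"              # made-up key

ciphertext = xor_with_key(plaintext, key)    # looks like garbage without the key
recovered = xor_with_key(ciphertext, key)    # the same key reverses the operation
```

Applying the function twice with the same key restores the plaintext because XOR is its own inverse; a different key yields different bytes.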
         The Advanced Encryption Standard (AES) is a specification for the encryption of
electronic data established by the U.S. National Institute of Standards and Technology
(NIST) in 2001.

 STEPS:
  1. KeyExpansion—round keys are derived from the cipher key using Rijndael's key
     schedule.
  2. InitialRound
         1. AddRoundKey—each byte of the state is combined with the round key using
             bitwise xor.
  3. Rounds
         1. SubBytes—a non-linear substitution step where each byte is replaced with
             another according to a lookup table.
         2. ShiftRows—a transposition step where each row of the state is shifted
             cyclically a certain number of steps.
         3. MixColumns—a mixing operation which operates on the columns of the state,
             combining the four bytes in each column.
         4. AddRoundKey
  4. Final Round (no MixColumns)
         1. SubBytes
         2. ShiftRows
         3. AddRoundKey
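
The steps listed above can be traced in a compact Python sketch of AES encryption for a single 16-byte block with a 128-bit key (ten rounds). It is meant as a teaching aid under stated assumptions, not as production cryptography: it is slow, it builds the lookup table by brute force, it covers only the 128-bit key size, and it has no mode of operation or padding. All function names are chosen here for illustration. The state is held as a flat list of 16 bytes in column-major order.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """SubBytes table: multiplicative inverse followed by an affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF  # rotate left by i
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """KeyExpansion: derive 11 round keys of 16 bytes from a 16-byte key."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = gmul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(state, round_key):
    """AddRoundKey: bitwise XOR of state and round key."""
    return [s ^ k for s, k in zip(state, round_key)]


def sub_bytes(state):
    """SubBytes: replace each byte through the lookup table."""
    return [SBOX[b] for b in state]


def shift_rows(state):
    """ShiftRows: row r (index i % 4) rotates left by r columns."""
    return [state[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(state):
    """MixColumns: combine the four bytes of each column."""
    out = []
    for c in range(0, 16, 4):
        a = state[c:c + 4]
        out += [gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]
    return out


def aes128_encrypt_block(block, key):
    """Encrypt one 16-byte block with a 16-byte key."""
    if len(block) != 16 or len(key) != 16:
        raise ValueError("block and key must both be 16 bytes")
    round_keys = expand_key(key)
    state = add_round_key(list(block), round_keys[0])      # initial round
    for rnd in range(1, 10):                               # nine full rounds
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))                   # final round
    return bytes(add_round_key(state, round_keys[10]))
```

A sketch like this can be checked against the example in Appendix C.1 of the NIST specification (FIPS-197), which uses the key 000102030405060708090a0b0c0d0e0f and the plaintext 00112233445566778899aabbccddeeff and lists the ciphertext 69c4e0d86a7b0430d8cdb78070b4c55a. For real database protection, a vetted cryptographic library should be used instead of hand-written code.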




3.3 FINDINGS
      • Sequencing the chemical bases of DNA, studied together with DNA aging, shows that
        the sequence largely changes in accordance with the biological changes of age. This
        yields new knowledge about fundamental biological processes. The initial segment of
        the task, called mapping, fragmented the chromosomes into groups as a combined set
        of regulated expressions. High-performance data mining processors can be used to
        point out the location of these grouped genes and the expression of genes.
      • Age correlated with an increasing percentage of sperm with highly damaged DNA
        (range: 0–83%) and tended to inversely correlate with the percentage of apoptotic
        sperm (range: 0.3%–23%).
      • Gene mutations can prevent one or more proteins from working properly. By
        changing a gene’s instructions for making a protein, a mutation can cause the
        protein to malfunction or to be missing entirely. When a mutation alters a protein
        that plays a critical role in the body, it can disrupt normal development or cause a
        medical condition. A condition caused by mutations in one or more genes is called
        a genetic disorder.

      •   FUTURE OF GENOMIC RESEARCH
          Develop and apply genome-based strategies for the early detection, diagnosis, and
          treatment of diseases.
          Develop new technologies to study genes and DNA on a large scale and to store
          genomic data efficiently.

5. RESULT AND DISCUSSION

       The study yields new knowledge about fundamental biological processes.
High-performance data mining processors can be used to point out the location of the grouped
genes and the expression of genes. Various algorithms and ideas for DNA database security
are also identified.

 AGE CORRELATION
    • Age correlated with an increasing percentage of sperm with highly damaged DNA
      (range: 0–83%) and tended to inversely correlate with the percentage of apoptotic
      sperm (range: 0.3%–23%).
    • Sequencing the chemical bases of DNA, studied together with DNA aging, shows that
      the sequence largely changes in accordance with the biological changes of age. This
      yields new knowledge about fundamental biological processes. The initial segment of
      the task, called mapping, fragmented the chromosomes into groups as a combined set
      of regulated expressions. High-performance data mining processors can be used to
      point out the location of these grouped genes and the expression of genes.

6. CONCLUSION

       The module on aging sequences of DNA genome expressions has been completed
successfully. The research has yet to achieve its further goals and objectives in disease and
mutation susceptibility and in the parental identification modules with DNA database security.







