A Genetic Algorithmic Approach to the Identification of the Crucial Human Factors Involved in the Development of Secure Software

					    International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Volume 1, Issue 2, July – August 2012                                          ISSN 2278-6856

A Genetic Algorithmic Approach to the Identification of the Crucial Human Factors Involved in the Development of Secure Software

Sumithra A1, Dr E Ramraj2
1 Research Scholar, Madurai Kamaraj University, Madurai, India.
2 Associate Professor, Alagappa University, Karaikudi, India.

Abstract: Software Project Managers today are increasingly confronted with the problem of managing the security risks of the software they are engineering. While Software Project Management itself is quite challenging, Security Risk Management adds to the severity of the problem. Human Factors play an important role in management, and Software Security Risk Management is no exception. The paper studies the influence of various Human Factors on the Security of the developed Software using a Genetic Algorithm.
Keywords: Human Factor, Genetic Algorithm, Security Risk Management, Project Management.

1. INTRODUCTION
As the usage arena of Software expands to include many security-sensitive domains such as National Security and Financial Management, the Security Risks posed by the Software assume manifold proportions. Every Software Project Manager has to deal with such Security Risks, and Security Risk is too important a risk to be considered on par with other risks. This necessitates the development of frameworks and techniques tailored to manage the Security Risks posed by the Software.

Human Factors play an important part in any management. Security Risk Management can be no exception to this. The Security of the Software developed tends to lean heavily on the quality of the people involved in its development. Even with the adoption of highly “secure” technologies and many sound principles of Secure Software Engineering, the importance of human factors cannot be overstated.
[1] lists many human factors that impinge on the Security of the developed Software. While it is clear that human factors play a key role in Software Security Risk Management, it is not always clear which subset of these factors most heavily impacts the Security of the developed Software. A clear understanding and identification of such factors can greatly aid the Software Project Manager in Security Risk Mitigation.

3. MOTIVATION
The motivation for the research presented in this paper is the seminal work of Vivanco et al. presented in [2], where a Genetic Algorithm is applied to identify a subset of metrics that clearly indicate the Maintainability of Java Classes. [2] claims that “given a set of objects (object-oriented classes), with known features (source code metrics) and class labels (expert quality rankings) building a classifier that will be able to predict the quality of the software from its metrics is a classification problem that lends itself to the application of genetic algorithms”.
The authors find many similarities between the search for a subset of human factors that deeply influence the security of the developed software and the search for a subset of metrics that adequately characterize the maintainability of a software product, which is the problem addressed in [2]. Since [2] has successfully applied a Genetic Algorithm to discover a subset of metrics that characterize the maintainability of a software product, the research attempts to use a Genetic Algorithm to identify a subset of human factors that influence the Security of the developed Software.

4. GENETIC ALGORITHM
Genetic Algorithms, which mimic the natural process of evolution to find solutions to problems, have been very promising in search and optimization problems. In Genetic Algorithms, a solution is represented by a set of genes. Information is represented in the genes using various encoding methods. From an initial set of genes, Genetic Algorithms attempt to discover the genes with the maximum fitness (very close to the correct solution) by employing various methods such as cross-over, where a new gene is produced by combining information from 2 parent genes, and mutation, where information in a gene is changed.
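The two operators named above can be illustrated with a minimal Python sketch. It assumes a gene encoded as a list of bits and a single random cut point for cross-over; the function names and the per-bit flip probability `rate` are illustrative assumptions, not details taken from the paper.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point cross-over: head of parent_a joined to tail of parent_b."""
    # The cut point lies strictly inside the string, so both parents contribute.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(gene, rate, rng):
    """Flip each bit independently with probability `rate` (an assumed scheme)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in gene]

rng = random.Random(0)
parent_a = [0] * 10
parent_b = [1] * 10
child = crossover(parent_a, parent_b, rng)   # zeros followed by ones
mutant = mutate(child, 0.1, rng)             # a few bits may be flipped
print(child, mutant)
```

Other encodings and operators are equally possible; bit strings are used here only because they make both operations easy to see.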


Even though Genetic Algorithms have been widely employed in the bio-medical field [3,4,5], there has been little attempt to try them in the domain of Software Project Management. The dearth of available literature supports this claim. The application of Genetic Algorithms in Software Engineering has been in the identification of fault-prone modules in the engineering of highly reliable systems [6,7].

5. RESEARCH METHODOLOGY
A web site developed by a software organization in the town, and in operation for over 2 years, is analyzed for the number of security attacks reported on it. A total of 211 Java Classes have been used in the web site. Every Class has been developed by an individual developer. The Security Attacks have been traced to their origin in the various classes and therefore, for all the 211 classes, information on the number of attacks reported on them is available.
The 211 classes are classified into 5 groups: a class included in Group 1 has reported a large number of attacks on it, while a class included in Group 5 has reported the least number of attacks. The classification into groups is done by selecting a threshold value for the number of attacks for each group and comparing the number of attacks in the classes with the threshold values. The proportion of Classes in the 5 groups is shown below:

   Table 1: Proportion of Classes in various Groups
   Group 1   Group 2   Group 3   Group 4   Group 5
   10        68        101       23        9

Interestingly, the proportion of classes in the 2 extreme groups is very low, and the proportion of classes in the intermediate groups (classes with “average” security) is very high. This adds to the difficulty in identifying the required subset of human factors that have a high impact on the Security of the developed classes.
The Project Manager is asked to rate the developers involved in the development of the 211 classes on the following factors.

   Table 2: Human Factors taken into account
   COMPTE   Competency in the Technology / Language (Java)
   EXPTE    Experience in the Technology / Language (Java)
   SECEXP   Exposure to the Security Principles of Java by attending Security Training Programs
   ABTM     Ability to work in a team
   EXPDOM   Experience in the Domain
   POLAW    Awareness of the Security Policies of the Organization
   MOT      Motivation
   COMMT    Personal Commitment to the success of the project
   LOYA     Loyalty to the organization
   RESP     Willingness to learn from mistakes

The rating of the Software Project Manager, who has 10 years of experience in Java and 5 years in Management, establishes the basis upon which the influence of the various human factors is studied. Even with the little subjectivity that is inherently involved in such ratings, the skill and the experience of the Manager are expected to be of immense utility in reducing the subjectivity of the judgment.

6. APPLICATION OF GA TO THE

In the developed Genetic Algorithm, genes are represented as 10-bit strings, where a 0 indicates that the corresponding human factor is not present in the subset and a 1 indicates the opposite. The fitness function is evaluated by using Linear Discriminant Analysis (LDA), a classifier strategy described in [8] and used by [2], with the leave-one-out method of training and testing. This means that first a class is selected, training is done with the remaining 210 classes, and it is observed whether the selected class is correctly grouped. The process is repeated for all the 211 classes. The number of genes in a population was set at 200 and the number of elite genes was fixed at 50. This means that in each generation, after the genes are sorted in decreasing order of fitness, the top 50 genes are passed on to the next generation as “elite” genes. A random probability P is generated and, from the remaining 150 genes, 2 genes with fitness greater than or equal to P are selected as parent genes. A cross-over point is selected at random and a new offspring is produced by combining bits from both the parents. If the generated probability is greater than 1 - 10% = 0.9, the offspring is mutated by changing each bit from 0 to 1 and vice versa. The process of cross-over and mutation is repeated until 150 new offspring are created, and with the new population of the 50 elite genes plus the 150 offspring the whole process is repeated for 200 generations. The classification rate was the number of times the class left out was correctly grouped.

7. RESULT AND DISCUSSION
To find the extent to which the performance of the algorithm is affected by the selected parameters, namely the mutation rate (10%), population size (200), number of generations (200) and number of elite genes (50), the process was repeated by changing these parameters. It was observed that while increasing the number of generations and the population size had a positive impact on the performance of the algorithm, changing the other parameters was of little, if any, value.
The maximum classification rates obtained were 67.81%, 64.97% and 62.15%. The human factors represented in the corresponding genes are listed in Table 3.
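The procedure of Section 6, with the parameters varied above, can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: `evolve`, `toy_fitness` and `TARGET` are hypothetical names, the fitness is a toy stand-in (not the leave-one-out LDA classification rate), a fresh random number is drawn for the mutation test, and the fallback used when fewer than 2 genes reach the threshold P is not specified in the text.

```python
import random

def evolve(fitness, n_bits=10, pop_size=200, n_elite=50,
           generations=200, mutation_threshold=0.9, seed=0):
    """Elitist GA over bit-string genes.

    Returns (best_gene, history), where history[i] is the best fitness
    found in the population at the start of generation i.
    """
    rng = random.Random(seed)
    cache = {}

    def score(gene):
        # Memoise fitness: in the paper each evaluation is a full leave-one-out run.
        if gene not in cache:
            cache[gene] = fitness(gene)
        return cache[gene]

    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        elite, rest = ranked[:n_elite], ranked[n_elite:]
        history.append(score(ranked[0]))
        offspring = []
        while len(offspring) < pop_size - n_elite:
            p = rng.random()                       # random probability P
            pool = [g for g in rest if score(g) >= p]
            if len(pool) < 2:
                pool = rest                        # assumed fallback, not in the text
            mother, father = rng.sample(pool, 2)
            cut = rng.randrange(1, n_bits)         # random cross-over point
            child = mother[:cut] + father[cut:]
            if rng.random() > mutation_threshold:  # fresh draw: an assumption
                child = tuple(1 - bit for bit in child)  # flip every bit
            offspring.append(child)
        population = elite + offspring
    return max(population, key=score), history


# Toy stand-in fitness: fraction of bits matching an arbitrary hidden mask.
TARGET = (1, 0, 1, 0, 1, 0, 1, 0, 1, 0)

def toy_fitness(gene):
    return sum(a == b for a, b in zip(gene, TARGET)) / len(TARGET)

# Smaller settings than the paper's 200/50/200 so the demo runs quickly.
best, history = evolve(toy_fitness, pop_size=40, n_elite=10, generations=30)
print(best, toy_fitness(best))
```

Because the elite genes are copied unchanged into the next generation, the best fitness in `history` never decreases. To reproduce the study, `toy_fitness` would be replaced by a function that trains a classifier on the selected factor columns and returns the leave-one-out classification rate.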
   Table 3: Human Factors represented in the “successful” genes
   Human Factor   67.81%   64.97%   62.15%
   COMPTE         Yes      Yes      Yes
   EXPTE          -        Yes      Yes
   SECEXP         Yes      -        Yes
   ABTM           -        Yes      -
   MOT            Yes      -        Yes
   COMMT          Yes      Yes      Yes
   LOYA           -        Yes      -
   EXPDOM         Yes      Yes      Yes
   RESP           -        Yes      -

It is interesting to note the human factors represented in all the 3 genes. Competency in the technology seems to be a very important contributor to the security of the developed classes, even more important than experience in the technology. Experience in the domain of Web Site Development also seems to heavily impact the security. This matches the intuition that people experienced in developing web sites are also likely to be aware of the various security issues that could arise. The most unexpected and interesting candidate represented in all the 3 genes was Personal Commitment to the success of the project. Often, this factor is underestimated and given lower priority by most managers when compared to other “important” factors like Experience in the technology, attending security training workshops, etc.

A Genetic Algorithm was applied to the problem of identifying the most significant human factors that contribute to the security of the developed software. It was observed that Competency in the Technology, Experience in the Domain and Personal Commitment to the success of the project dominate other factors in influencing the security of the developed Software. Software Project Managers interested in Security Risk Mitigation should focus on these areas.
The subjectivity involved in the research, namely the assessment of the various developers by the Software Project Manager, can be criticized, but the authors believe that when it comes to analyzing human factors, such subjectivity is
Other Training Methods can be tried instead of the LDA employed by the research. The method can be applied to many more projects to improve the credibility of the stated results.

  [1] Shareeful Islam, Wei Dong, Human Factors in Software Security Risk Management, Proc. of the First International Workshop on Leadership and Management in Software Architecture, New York, USA, 2008, pp 13-16.
  [2] Vivanco, Rodrigo and Pizzi, Nicolino, Finding Effective Software Metrics to Classify Maintainability Using a Parallel Genetic Algorithm, Proc. of Genetic and Evolutionary Computation Conference (GECCO 2004), Seattle, USA.
  [3] Nikulin A.E., Dolenko B., Bezabeth T., Somorjai R.J., Near-Optimal Feature Selection for Feature Space Reduction, Novel Preprocessing Methods for Classifying MR Spectra, NMR Biomed., Vol. 11, 1998, pp 209-216.
  [4] Yang J., Honavar V., Feature Subset Selection Using a Genetic Algorithm, IEEE Intelligent Systems, Vol. 13, 1998, pp 44-49.
  [5] Raymer M.L., Punch W.F., et al., Dimensionality Reduction Using Genetic Algorithms, IEEE Trans. on Evolutionary Computation, Vol. 4, 2000, pp 164-171.
  [6] Hochman R., Khoshgoftaar T.M., Allen A.B., Hudepohl J.P., Using the Genetic Algorithm to Build Neural Networks for Fault-Prone Module Detection, Proc. of 7th IEEE International Symposium on Software Reliability Engineering, New York, 1996, pp 152-162.
  [7] Liu Y., Khoshgoftaar T.M., Genetic Programming Model for Software Quality Classification, Proc. of 6th IEEE International Symposium on High Assurance Systems Engineering, 2001.
  [8] Duda R.O., Hart P.E., Stork D.G., Pattern Classification, Wiley & Sons, New York, USA,

AUTHOR
A. Sumithra received the BE degree from Sethu Institute of Engineering and the M.Tech. degree in Information Technology from Kalasalingam University in 2008 and 2010, respectively. At present, the author works at Velammal College of Engineering and Technology.

