INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 531-538
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
© IAEME

                   AND N.L.P TECHNIQUES

             V. Sujatha¹, K. Sriraman², K. Ganapathi Babu³, B.V.R.R. Nagrajuna⁴
        Asst.Prof, Department of CSE, Vignan's LARA Institute Of Technology & Science,
                              Vadlamudi Guntur Dist., A.P., India.
        Assoc.Prof, Department of CSE, Vignan's LARA Institute Of Technology & Science,
                              Vadlamudi Guntur Dist., A.P., India.
    Pursuing M.Tech in CSE at Vignan's LARA Institute Of Technology & Science, Vadlamudi
                                    Guntur Dist., A.P., India.
    Pursuing M.Tech in CSE at Vignan's LARA Institute Of Technology & Science, Vadlamudi
                                    Guntur Dist., A.P., India.


        Software engineering is the process of developing software, which is a challenging
  task for developers. Developers follow software development life cycle models to obtain
  better results. Among these phases, testing plays a vital role in the entire software
  development life cycle. Testing is the activity of reducing bugs and developing software
  with quality. This paper deals with a new approach to search-based testing and test case
  generation by implementing fuzzy logic with natural language processing.

  Index Terms: Test case generation, Fuzzy logic, Natural Language Processing.


        Testing [12] is the process of searching for bugs and errors, correcting them, and
  satisfying the user requirements in all aspects, so that after delivery of the product the
  stakeholder does not face any problem. The testing process examines the code in all
  environments to obtain better results. Testing mainly focuses on usability, scalability,
  performance, compatibility and reliability. Testing compares the behavior of the product
  against its principles and mechanisms, and recognizes the problems that occur. There are
  many testing techniques, such as search-based testing [3], test case generation [1], test
  suite generation [12], test case reduction [12] and automatic fault finding [12]. In this
  paper we mainly concentrate on


search-based testing [3], automatic test case generation [1] and automatic fault finding
using fuzzy logic with natural language processing techniques, through which we obtain
efficient results.

                           Fig 1: Overview of this testing technique


        Testing may be either dynamic testing [12] or static testing [12]. There are two
software engineering testing techniques: black box testing [12] and white box testing [12].
Black box testing is functional testing, while white box testing is structural testing [12].
White box testing chooses inputs to exercise paths through the code and determines the
appropriate outputs for test paths within a unit.
        The paths between units during integration, and between subsystems, are known only
through system-level tests. White-box test design techniques include control flow
testing [11], data flow testing [12], branch testing [11], path testing [12], statement
coverage [11] and decision coverage, whereas black box testing is a method of testing which
examines the functionality of an application without peering into its internal structures or
workings. Test cases [11] are built around specifications and requirements, that is, what the
application is supposed to do. Test cases are generally derived from external descriptions of
the software, including specifications, requirements and design parameters. Although these
tests are primarily functional [12] in nature, non-functional tests [11] are also performed.
The test designer selects both valid and invalid inputs and determines the correct output
without any knowledge of the test object's internal structure.
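The selection of valid and invalid inputs against a specification can be illustrated with a small sketch. This is illustrative Python, not code from the paper; the is_valid_age function and its 0-150 specification are hypothetical.

```python
# A minimal black-box test sketch: the tester works only from the
# specification of a hypothetical is_valid_age() function (accepts
# integers in the range 0-150), not from its implementation.
def is_valid_age(age):
    # Stand-in implementation; in black-box testing only the spec is known.
    return isinstance(age, int) and 0 <= age <= 150

def test_is_valid_age():
    # Valid inputs drawn from the specification (including boundaries).
    assert is_valid_age(0)
    assert is_valid_age(150)
    # Invalid inputs: out of range or of the wrong type.
    assert not is_valid_age(-1)
    assert not is_valid_age(151)
    assert not is_valid_age("42")

test_is_valid_age()
print("all black-box cases passed")
```

Note that the test body never inspects how is_valid_age works internally; only the specified input/output behaviour is checked.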


        Gray-box testing [12] is the combination of white-box testing and black-box testing.
The aim of this testing is to search for defects, if any, due to improper structure or
improper usage of applications. Gray-box testing is also known as translucent testing [12].
Gray-box testing is based on requirement test case generation because it presets all the
conditions before the program is tested, using the assertion method [11]. A requirement
specification language is used to state the requirements, which makes the requirements easy
to understand and their correctness easy to verify; the inputs for requirement test case
generation are the predicates and the verification expressed in the requirement
specification language.


Gray-box testing techniques are:
•        Matrix testing [11]: states the status report of the project.
•        Regression testing [12]: implies rerunning the test cases when new changes are made.
•        Pattern testing [11]: verifies that the application is well designed or architected.
•        Orthogonal array testing [12]: used as a subset of all possible combinations.
Gray-box testing is suited to functional or business domain testing. Functional testing is
basically a test of user interactions, possibly with external systems. Gray-box testing
efficiently suits functional testing due to its characteristics; it also helps to confirm
that the software meets the requirements defined for it [12].
          A test suite [11], less commonly known as a validation suite, is a collection of
test cases that are intended to be used to test a software program to show that it has some
specified set of behaviors. A test suite often contains detailed instructions or goals for each
collection of test cases and information on the system configuration to be used during testing.
The tester should be aware of what the software is supposed to do, but not of how it does it.
For instance, the tester may be aware that a particular input returns a certain invariable
output while not being aware of how the software produces that output in the first place.
          Test automation [12] is a special technique used to control the execution of tests
and the comparison of actual outcomes to predicted outcomes. Test automation can automate
some repetitive but necessary tasks in a formalized testing process already in place, or add
additional testing that would be difficult to perform manually. There are two general
approaches to test automation: code-driven testing [11] and graphical user interface
testing [12]. Code-driven testing allows the execution of unit tests to determine whether
various sections of the code are acting as expected under various circumstances. Graphical
user interface testing helps users to interactively record user actions and replay them any
number of times, comparing actual results to those expected.
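Code-driven testing as described above can be illustrated with Python's unittest module; the word_count unit under test is a hypothetical example, not code from the paper.

```python
import unittest

def word_count(text):
    """Hypothetical unit under test: counts whitespace-separated words."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    # Each test method checks one expectation about the unit's behaviour
    # under a particular circumstance.
    def test_simple_sentence(self):
        self.assertEqual(word_count("search based testing"), 3)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

# Load and run the suite programmatically, then inspect the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("tests run:", result.testsRun, "failures:", len(result.failures))
```

In a real project the same suite would run automatically on every change, which is what makes code-driven testing suitable for the repetitive tasks mentioned above.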
A framework [12] is an integrated system that sets the rules of automation for a specific
product. This system integrates the function libraries, test data sources, object details and
various reusable modules. These components act as small building blocks which need to be
assembled to represent a business process. The framework provides the basis of test
automation and simplifies the automation effort. Various techniques implemented in testing
are as follows.
Exploratory testing [11] is an approach to software testing that is concisely described as
simultaneous learning, test design and test execution: "a style of software testing that
emphasizes the personal freedom and responsibility of the individual tester to continually
optimize the quality of his/her work by treating test-related learning, test design, test
execution, and test result interpretation as mutually supportive activities that run in parallel
throughout the project." While the software is being tested, the tester learns things that,
together with experience and creativity, generate new good tests to run. Exploratory testing is
often thought of as a black box testing technique. Instead, those who have studied it consider
it a test approach that can be applied to any test technique, at any stage in the development
process. The key is not the test technique nor the item being tested or reviewed; the key is the
cognitive engagement of the tester, and the tester's responsibility for managing his or her
time.
Functional testing [11] refers to activities that verify a specific action or function of the
code. These are usually found in the code requirements documentation, although some


development methodologies work from use cases or user stories. Functional tests tend to
answer the question "can the user do this?" or "does this particular feature work?"
Non-functional testing [11] refers to aspects of the software that may not be related to a
specific function or user action, such as scalability or other performance behavior under
certain constraints, or security. Testing [12] will determine the breaking point, the point at
which extremes of scalability or performance lead to unstable execution. Non-functional
requirements tend to be those that reflect the quality of the product, particularly in the
context of the suitability perspective of its users. In this paper we mainly concentrate on
search-based testing, automatic test case generation and automatic fault finding using fuzzy
logic with natural language processing techniques, through which we obtain efficient results.


Fuzzy logic [6] is a form of many-valued logic; it deals with reasoning that is approximate
rather than exact. Compared to traditional binary sets, fuzzy logic variables may have truth
values that range in degree between 0 and 1. Fuzzy logic extends the concept of partial truth,
where the truth value may range between completely true and completely false. Furthermore,
when linguistic variables [6] are used, these degrees may be managed by specific functions
known as fuzzy functions [20]. Testing plays a vital role in the software development life
cycle. Fuzz testing or fuzzing [20] is a software testing technique, often automated or semi-
automated, that involves providing invalid, unexpected, or random data to the inputs of
a computer program. The program is then monitored for exceptions such as crashes or failing
built-in code assertions [12], or for finding potential memory leaks [12]. Fuzzing is commonly
used to test for security problems in software or computer systems [12]. Fuzz testing or
fuzzing is one form of black box testing that focuses on the robustness of the software, and
the history of fuzzing is relatively short in comparison to other software testing techniques.
There are two forms of fuzzing program, mutation-based and generation-based, which can be
employed as white-, grey- or black-box testing. Mutation-based fuzzers mutate [23] existing
data samples to create test data, while generation-based fuzzers [20] define new test data
based on models of the input. File formats and network protocols [12] are the most common
targets of testing, but any type of program input can be fuzzed. Interesting inputs
include environment variables, keyboard and mouse events, and sequences of API calls [12].
Even items not normally considered "input" can be fuzzed, such as the contents of databases,
shared memory [12], or the precise interleaving of threads [12]. The simplest form of
fuzzing technique is sending a stream of random bits to software, either as command line
options, randomly mutated protocol packets, or as events. This technique of random inputs
still continues to be a powerful tool to find bugs in command-line applications, network
protocols, and GUI-based applications [12] and services [12]. Another common technique
that is easy to implement is mutating existing input from a test suite by flipping bits at
random or moving blocks of the file around. However, the most successful fuzzers have a
detailed understanding of the format or protocol being tested.
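A minimal mutation-based fuzzer of the bit-flipping kind described above can be sketched as follows. This is an illustrative Python sketch, not code from the paper; the parse_record target program, the number of bit flips and the monitored exception set are all assumptions.

```python
import random

def mutate(sample, n_flips=3, seed=None):
    """Mutation-based fuzzing: flip a few random bits in an existing sample."""
    rng = random.Random(seed)
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)   # flip one random bit
    return bytes(data)

def parse_record(blob):
    """Hypothetical system under test: expects 'key=value' ASCII records."""
    text = blob.decode("ascii")        # may raise UnicodeDecodeError
    key, value = text.split("=", 1)    # may raise ValueError
    return key, value

# Drive the SUT with mutated variants of a valid seed input and monitor
# for exceptions, as a stand-in for crash/assertion monitoring.
seed_input = b"user=alice"
crashes = 0
for trial in range(200):
    fuzzed = mutate(seed_input, seed=trial)
    try:
        parse_record(fuzzed)
    except (UnicodeDecodeError, ValueError):
        crashes += 1   # a real harness would log the failing input here
print("inputs that triggered an exception:", crashes)
```

A real fuzzer would additionally record each failing input so the events leading to a failure can be replayed, as discussed later in the text.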
A better specification-based fuzzer involves writing the entire array of specifications into
the tool, and then using model-based test generation techniques to walk through the
specifications, adding anomalies in the data contents, structures, messages, and
sequences. This "smart fuzzing" [12] technique is also known as robustness testing [11],
syntax testing [12], grammar testing [6], and (input) fault injection [23]. The protocol
awareness can also be created heuristically from examples using a tool such as Sequitur [12].


These fuzzers can generate test cases from scratch, or they can mutate examples from test
suites or real life. They can concentrate on valid or invalid input, with mostly-valid input
tending to trigger the "deepest" error cases [12].
        There are two limitations of protocol-based fuzzing based on protocol
implementations of published specifications: 1) testing [11] cannot proceed until the
specification is relatively mature, since a specification is a prerequisite for writing such a
fuzzer; and 2) many useful protocols are proprietary, or involve proprietary extensions to
published protocols. If fuzzing [19] is based only on published specifications, test coverage
[11] for new or proprietary protocols will be limited or nonexistent. Fuzz testing can be
combined with other testing techniques [12]. White-box fuzzing uses symbolic execution [7]
and constraint solving. Evolutionary [12] fuzzing leverages feedback from a heuristic [11]
(e.g., code coverage in grey-box harnessing, or a modeled attacker behavior in black-box
harnessing), effectively automating the approach of exploratory testing.
        Fuzzing is a negative black box testing technique which aims to find security flaws
[12] and robustness defects in the SUT [11]. The fuzz testing process starts with feeding
unexpected and malformed inputs to the SUT through external interfaces in a repeated
manner. Response messages sent by the SUT can be logged and analysed. The SUT is also
monitored for any faulty behaviour throughout the testing process. To ease the analysis of
any found defects, failure monitoring should include a recording of the events leading to the
failure and the failure itself. Ideally, this process is highly automated, with the aim being to
perform as much testing as possible in a short time span without producing a large amount of
redundant testing data [19].


         A statistical language model [14] assigns a probability [12] score to a string that
estimates the likelihood of that string occurring in the language it models. A good language
model for English, for example, assigns higher probability scores to strings that resemble
well-formed words, such as "testing", and lower scores to strings that do not, e.g. "Qu5$-ua".
Language models are widely used in natural language and speech processing [6] for a wide
range of tasks, including machine translation [12] and automatic speech recognition [8]. In
this paper, a character-based language model [8] is used, where the language is represented as
a sequence of characters. The same basic approach is used by both word-based and character-
based models. In addition to the character-based model, we also use fuzz testing in this
language processing. A fuzz testing tool is generally called a fuzzer. Taken et al. describe a
four-way categorization of fuzzers which is founded on test case complexity. The following
categorization is sorted from the least intelligent fuzzers to the most intelligent ones. The
first category is static and random template fuzzers, which are typically used to test simple
request/response protocols or file formats. They generate arbitrary input data, and have very
little or no structural or semantic knowledge about the protocol used by the SUT. The
second category is block-based fuzzers, which contain an implementation of the
basic structure of a request/response protocol. When a fuzzer possesses structural details of
the protocol, it can focus on testing specific parts of the protocol messages at a time, while
possibly retaining other parts of the message valid and intact. The third category is dynamic
generation or evolution-based fuzzers, which are able to learn the protocol used by the SUT
based on the interpretation of the messages received from the SUT or recorded from similar


protocol exchanges between the SUT and other protocol implementations. Therefore, it is not
a necessity for the fuzzer to have preliminary knowledge of the protocol under fuzz testing.
The last category is model-based or simulation-based fuzzers [19], which incorporate either
an implementation of the protocol model used by the SUT, or a simulation of the interactions
within the protocol. A fully model-based fuzzer is able to achieve full interoperability with
the SUT, which means that the fuzzer exercises the input handling routines of the SUT
thoroughly. The model-based approach also enables fuzzing the latter parts of a complex
message sequence.
         However, the number of possible strings means that many of these sequences will not
be found, even in an extremely large corpus [12], making these probabilities impossible to
estimate directly. A language with c possible characters has c^n possible sequences of n
characters. For example, if we assume that there are 26 characters in English (i.e. ignoring
case, punctuation and whitespace), the number of possible 5-character sequences is over 11
million.
        Let c_1^n denote a sequence of n characters (c_1, c_2, ..., c_n). A language model
aims to assign a value to the probability P(c_1^n). This can be decomposed using the chain
rule of probability, allowing the probability of each character c_i to be estimated based on
the characters that preceded it:
        P(c_1^n) = P(c_1) P(c_2|c_1) P(c_3|c_1^2) ... P(c_n|c_1^(n-1)) = ∏_{i=1}^{n} P(c_i|c_1^(i-1)),
        where P(c_i|c_1^(i-1)) is the probability of character c_i following the sequence c_1^(i-1).
Consequently language models approximate the probability of strings by combining the
probabilities of shorter sequences, for which more reliable probabilities can be inferred from
the corpus. One approach is to estimate the probability of each character based only on the
character that immediately precedes it:
        P(c_1^n) ≈ ∏_{i=1}^{n} P(c_i|c_(i-1))
        This type of language model is known as a bigram model. However, even when using
a bigram model some pairs of characters will not be seen in large corpora, and in these cases
the probabilities are estimated by combining the probabilities of individual characters, i.e.
P(c_i), computed using smoothing and back-off techniques. In general, longer strings are less
likely to occur than shorter ones, and language models assign them lower probabilities. To
avoid bias in favour of shorter strings, the probability generated by the language model is
normalized by taking the geometric mean, i.e. the score assigned to a string, score(c_1^n), is
computed as score(c_1^n) = P(c_1^n)^(1/n).
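As a concrete sketch, the scoring scheme above can be implemented with a smoothed character bigram model. This is illustrative Python, not the paper's implementation; the toy corpus, the add-alpha smoothing (a simple stand-in for the back-off techniques mentioned above) and the function names are assumptions.

```python
import math
from collections import Counter

def train_bigram(corpus, alpha=1.0):
    """Estimate smoothed bigram character probabilities from a corpus.
    Add-alpha (Laplace) smoothing keeps unseen pairs from getting
    probability zero."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    vocab = len(set(corpus))
    def prob(prev, cur):
        return (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
    return prob

def score(string, prob):
    """score(c_1^n) = P(c_1^n)^(1/n): geometric-mean-normalized probability.
    Works in log space for numerical stability; the first character's
    unigram term is omitted for brevity."""
    logp = 0.0
    for prev, cur in zip(string, string[1:]):
        logp += math.log(prob(prev, cur))
    return math.exp(logp / len(string))

corpus = "testing is the process of searching the bugs and errors " * 20
prob = train_bigram(corpus)
# A well-formed string should outscore a scrambled one of the same letters.
print(score("testing", prob) > score("tstgnie", prob))
```

The normalization by 1/n is what lets strings of different lengths be compared on the same scale, as the text explains.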
        We introduce an advanced form of search-based test input generation, which has been
applied extensively to the generation of structural test data. The formulation of a fitness
function is the main feature of a search-based approach. The fitness function underpins the
test goal, rewarding inputs close to fulfilling the goal with good fitness values, while
punishing inputs that are far away with weak fitness values. A metaheuristic search
technique [5], such as an evolutionary or local search algorithm [13], is used to optimize the
fitness function. The search favours exploration around input values with the best fitness
values, on the assumption that they lie in the vicinity of inputs with even better fitness,
with the aim of finding inputs that lead to the satisfaction of the current test goal of
interest. The conventional fitness function for generating inputs to cover individual
branches is concerned with the control structure of the program and the values of variables at
decision points only. The fitness function is to be minimized, with a zero fitness value
representing the global optimum. The approach level (AL) scores how far down the control
dependency graph the input penetrated with respect to a target branch. Added to the approach
level is the normalized


branch distance (BD) metric, which scores how close the input was to taking the alternate
desired branch. With this fitness function, the goal of the search is to cover a particular
branch with any input that can be found. However, the inputs found tend to look random from
a human perspective, and the information encoded in string inputs requires work to decipher
due to arbitrary character sequences.
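The approach level and branch distance described above can be sketched as follows. This is an illustrative Python example, not the paper's implementation; the nested-branch target program, the d/(d+1) normalization and the simple hill-climbing loop are assumptions for the sketch.

```python
import random

def branch_distance(x, target):
    """Branch distance for the predicate x == target: 0 when satisfied,
    growing with |x - target|, normalized into [0, 1)."""
    d = abs(x - target)
    return d / (d + 1.0)

def fitness(x):
    """Fitness for covering the innermost branch of a hypothetical program:
        if x > 0:            # decision 1
            if x == 42:      # decision 2 (target branch)
    The approach level counts control-dependent decisions not yet penetrated;
    the normalized branch distance is added at the point of divergence."""
    if x > 0:
        al = 0                          # reached decision 2
        bd = branch_distance(x, 42)     # distance to taking the target branch
    else:
        al = 1                          # stuck at decision 1
        bd = branch_distance(x, 1)      # distance to making x > 0 true
    return al + bd

# A simple local (hill-climbing) search minimizing the fitness function;
# zero fitness means the target branch is covered.
rng = random.Random(0)
x = rng.randint(-1000, 1000)
while fitness(x) > 0:
    neighbour = x + rng.choice([-10, -1, 1, 10])
    if fitness(neighbour) <= fitness(x):
        x = neighbour
print("input covering the target branch:", x)
```

Because the fitness decreases monotonically toward the target, the local search is guided first over the approach level and then down the branch distance, exactly the two components described in the text.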


        This paper has presented an approach to automatic string input generation that deals
with natural language processing and uses the emerging technique of fuzzy logic to obtain
efficient results, where the strings are not only optimized for test coverage but also
according to a language model, in order to improve human readability. Since automated oracles
are frequently unavailable in software engineering practice, this work has an important
bearing on lowering the costs of human involvement in the testing process as a manual
oracle. Our technique improves the readability of strings without weakening the test adequacy
criterion. However, for non-stringent criteria like branch coverage (as used in our study),
where several inputs may execute a branch, the use of less readable strings may be more
effective at exposing corner cases and discovering flaws in an implementation. In this case,
the use of a language model may represent a trade-off for a tester. The future work proposed
for this paper is to maintain


[1]. Tim Miller, “Prioritisation of test suites containing precedence constraints”.
[2]. Dennis Bernard Jeffrey, “Test Suite Reduction With Selective Redundancy”.
[3]. Andrea Arcuri, "Automatic software generation and improvement Through Search Based
[4]. Phil Mcminn, Muzammil Shahbaz And Mark Stevenson “Search-Based Test Input
      Generation For String Data Types using The Results Of Web Queries”.
[5]. Phil McMinn, Mark Harman, Kiran Lakhotia, Youssef Hassoun, "Input Domain Reduction
      Through Irrelevant Variable Removal and Its Effect On Local, Global And Hybrid Search-
      Based Structural Test Data Generation”.
[6]. Sheeva Afshan, Phil Mcminn And Mark Stevenson “Evolving Readable String Test Inputs
      Using A Natural Language Model To Reduce Human Oracle Cost”.
[7]. Guandong Xu, Yanchun Zhang, and Xiaofang Zhou, "A Web Recommendation
      Technique Based on Probabilistic Latent Semantic Analysis”.
[8]. In-Young Ko, Robert Neches, and Ke-Thia Yao,“A Semantic Model and Composition
      Mechanism for Active Document Collection Templates in Web-based Information
      Management Systems”.
[9]. Steven P. Reiss, “Semantics-Based Code Search”.
[10]. Claudia d’Amato, “Similarity-Based Learning Methods for The Semantic Web”.
[11]. Jovanović, Irena,“Software testing methods and techniques”
[13]. P. Maragathavalli “Search-Based Software Test Data Generation Using Evolutionary
[14]. Anastasis A. Sofokleous, Andreas S. Andreou, Antonis Kourras “Symbolic Execution For
      Dynamic, Evolutionary Test Data Generation”.


[15]. M. Papadakis and N. Malevris, "Improving Evolutionary Test Data Generation With The
      Aid Of Symbolic Execution”.
[16]. Mateus Borges, Marcelo d'Amorim, Saswat Anand, David Bushnell, and Corina S.
      Păsăreanu, "Symbolic Execution With Interval Solving And Meta-Heuristic Search".
[17]. Tao Xie, Nikolai Tillmann, Jonathan de Halleux, Wolfram Schulte, "Fitness-Guided
      Path Exploration In Dynamic Symbolic Execution".
[18]. Tuomas Parttimaa,“Test Suite Optimisation Based On Response status Codes And
      Measured Code Coverage”.
[19]. Craig Stuart Carlson,“Fuzzy Logic Load forecasting With Genetic algorithm Parameter
[20]. Vikash Kumar, D. Chakraborty, "Optimizing Fuzzy Multi-Objective Problems Using
      Fuzzy Genetic Algorithms, FZDT Test Functions".
[21]. Arthur Baars, Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn, Paolo
      Tonella, Tanja Vos, "Symbolic Search-Based Testing". K. Kollman, J. H. Miller, and
      S. E. Page, "Adaptive parties in spatial elections," The American Political Science
      Review, vol. 86, no. 4, pp. 929-937, 1992.
[22]. Muzammil Shahbaz, Phil Mcminn, Mark Stevenson, “Automated Discovery Of Valid Test
      Strings From The Web Using dynamic Regular Expressions Collation And Natural
      Language Processing”.
[23] Salem F. Adra And Phil Mcminn, “Mutation Operators For Agent-Based Models”.
[24] Anand Handa, Ganesh Wayal Rkdfist and Rgpv Bhopal, “Software Quality Enhancement
      using Fuzzy Logic with Object Oriented Metrics in Design”, International Journal of
      Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012,               pp. 169 -
      179, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[25] Mousmi Chaurasia and Dr. Sushil Kumar, “Natural Language Processing Based
      Information Retrieval for the Purpose of Author Identification”, International Journal of
      Information Technology and Management Information Systems (IJITMIS), Volume 1,
      Issue 1, 2010, pp. 45 - 54, ISSN Print: 0976 – 6405, ISSN Online: 0976 – 6413.


  V. Sujatha, Asst. Prof, Department of CSE, Vignan's LARA Institute Of Technology &
Science, Vadlamudi, Guntur Dist., A.P., India. She completed her B.Tech at JNTU Hyderabad
in 2003 and her M.Tech at Nagarjuna University in 2007. Her research interests are AI, NN,
Software Engineering and Image Processing.

  K. Sriraman, Assoc. Prof, Department of CSE, Vignan's LARA Institute Of Technology
& Science, Vadlamudi, Guntur Dist., A.P., India. His research interests are Soft Computing,
AI, NN, FS, PR & Image Processing and Security. He is a Life Member of ISTE.

  K. Ganapathi Babu, pursuing M.Tech in the Department of CSE at Vignan's LARA
Institute Of Technology & Science, Vadlamudi, Guntur Dist., A.P., India. His research
interests include Image Processing, Software Engineering and Data Mining. He is a member
of the IACSIT, IAENG and IAEME journals and associations.

  B.V.R.R. Nagrajuna, pursuing M.Tech in the Department of CSE at Vignan's LARA
Institute Of Technology & Science, Vadlamudi, Guntur Dist., A.P., India. His research
interests include Software Engineering, Data Mining and Image Processing.

