
MSc Advanced Software Engineering
Academic Session 2004-05
CSMPRJ

FINAL REPORT
WebMark®
A Bug free WWW

Department of Computer Science, King's College London. 2nd of September 2005

Student: Mathulan Ganeshan (ganesma/0434410) Supervisor: Professor Mark Harman

Abstract
This project aims to develop a semi-automated web application testing tool. It uses some of the unique features of web applications to gather user session data. Gathering test data involves monitoring the HTTP packets and generating an XML log file on the server side; the server XML log files are then used to generate test cases with the algorithms described in a PhD research paper [1]. Enhancements include the development of a module that captures HTTP packets and generates the XML log files. Most of the currently available web application testing tools are either white box based or black box based, and they rely heavily on scripts generated on the client side of the web application and on the capture-replay technique. This project combines the features of white box and black box testing techniques and uses the data gathered on the server side.


Acknowledgements
My first heartfelt thanks go to Professor Mark Harman for creating a link between the BBC and King's so that they could work together. Having him as my supervisor and being guided by him towards the end of the project has been a truly amazing experience. I named the project WebMark to honour him by having his name in the project title. I am also thankful to Andrew Macinnes of the BBC for his invaluable help and the time he allocated to us. During our visit, Andrew's well organised tour of the BBC gave me an in-depth understanding of the dynamic work there and its challenges. I would also like to thank my colleagues Mohammed, Bojan, Nadia and Kiran for their feedback during discussions.

- Mathulan Ganeshan
King's College, London
September 2005


TABLE OF CONTENTS
INTRODUCTION
1.1 ABSTRACT
1.2 TESTING SOFTWARE APPLICATIONS
1.3 AUTOMATED TESTING TOOLS
1.4 WEB APPLICATION TESTING TOOLS
1.5 HTTP PROTOCOL
1.6 USER SESSION DATA GENERATION
1.7 LOG FILES
1.8 HTTP PACKET ANALYSIS
REVIEW
2.1 INITIAL SPECIFICATIONS
2.2 PACKET CAPTURE AND ANALYSIS
2.3 WEBPAGE ANALYSIS
2.4 ALGORITHMS FOR TEST CASE GENERATION
SPECIFICATION
3.1 CHOICE OF TECHNOLOGY
3.2 USE CASE DIAGRAM
3.3 OVERALL ARCHITECTURE
IMPLEMENTATION
4.1 INTERNALS
4.2 LOG FILE GENERATION
4.3 WEB APPLICATION TESTING
4.4 VIEW TEST RESULTS
EVALUATION
CONCLUSION
6.1 FUTURE WORK
BIBLIOGRAPHY


Chapter 1

Introduction

This chapter gives an overview of the initial research carried out on web application testing related areas. Areas of interest include testing of software applications, automated testing, HTTP protocol and available web application testing tools. It details the importance of testing and methods used to test web applications.

1.1 ABSTRACT
Web applications provide access to data for users around the world, regardless of time zone. Web pages are deployed on the internet and are available to users at any time. The internet is becoming more and more popular and reliable, and web applications demand high reliability, scalability and availability. It is important to make sure that these applications run 24/7 without system breakage or errors. This project aims to develop a testing server that tests web applications using some of their unique features. The project is based on a research paper [1] about testing web applications; enhancements to the proposed testing methods are also included in this project.

1.2 TESTING SOFTWARE APPLICATIONS
Testing involves operation of a system or application under controlled conditions and evaluating the results (e.g., 'if the user is in interface A of the application while using hardware B, and does C, then D should happen'). The controlled conditions should include both normal and abnormal conditions. Testing should intentionally attempt to make things go wrong, to determine whether things happen when they shouldn't or don't happen when they should.

Most software projects undergo various phases. The core phases [3] are as follows:
1. Business modelling
2. Requirements analysis
3. Analysis & design
4. Implementation
5. Test
6. Deployment

Because software applications are intangible, it is difficult to match the customer requirements against the system being developed. The testing phase is one of the most important phases, since it validates the requirements against the system being developed. Testing involves matching both the functional and non-functional requirements. Software applications need to be tested before deployment in order to:
• verify that all requirements have been implemented
• identify defects and ensure they are addressed prior to deployment
• verify proper integration of all components of the software
• verify the interaction between user and software

Common problems in the software development process that demand more testing before delivery include:


• Poor requirements - if requirements are unclear, incomplete, too general, or not testable, there will be problems.
• Unrealistic schedule - if too much work is crammed in too little time, problems are inevitable.
• Inadequate testing - no one will know whether or not the program is any good until the customer complains or systems crash.
• Featuritis - requests to pile on new features after development is underway; extremely common.
• Miscommunication - if developers don't know what's needed or customers have erroneous expectations, problems are guaranteed.

Most software is tested using one or more of the following types of test:

• Black box testing - not based on any knowledge of internal design or code. Tests are based on requirements and functionality.
• White box testing - based on knowledge of the internal logic of an application's code. Tests are based on coverage of code statements, branches, paths and conditions.
• Unit testing - the most 'micro' scale of testing; to test particular functions or code modules. Typically done by the programmer and not by testers, as it requires detailed knowledge of the internal program design and code. Not always easily done unless the application has a well-designed architecture with tight code; may require developing test driver modules or test harnesses.
• Integration testing - testing of combined parts of an application to determine if they function together correctly. The 'parts' can be code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.
• Functional testing - black-box type testing geared to the functional requirements of an application; this type of testing should be done by testers. This doesn't mean that the programmers shouldn't check that their code works before releasing it.
• Regression testing - re-testing after fixes or modifications of the software or its environment. It can be difficult to determine how much re-testing is needed, especially near the end of the development cycle. Automated testing tools can be especially useful for this type of testing.
• Acceptance testing - final testing based on specifications of the end-user or customer, or based on use by end-users/customers over some limited period of time.
• Stress testing - a term often used interchangeably with 'load' and 'performance' testing. Also used to describe such tests as system functional testing while under unusually heavy loads, heavy repetition of certain actions or inputs, input of large numerical values, large complex queries to a database system, etc.
• Usability testing - testing for 'user-friendliness'. Clearly this is subjective, and will depend on the targeted end-user or customer. User interviews, surveys, video recording of user sessions, and other techniques can be used. Programmers and testers are usually not appropriate as usability testers.
• Mutation testing - a method for determining if a set of test data or test cases is useful, by deliberately introducing various code changes ('bugs') and retesting with the original test data/cases to determine if the 'bugs' are detected. Proper implementation requires large computational resources.
• Alpha testing - testing of an application when development is nearing completion; minor design changes may still be made as a result of such testing. Typically done by end-users or others, not by programmers or testers.
• Beta testing - testing when development and testing are essentially completed and final bugs and problems need to be found before final release. Typically done by end-users or others, not by programmers or testers.

1.3 AUTOMATED TESTING TOOLS
Most traditional testing methods are white box based. In white box testing the tester needs knowledge about the application under test, and test data has to be entered manually. If it is possible to automate testing, it saves a lot of time and effort. The main question is when automated testing can be applied.

For small projects, the time needed to learn and implement automated tools may not be worth it; for larger projects, or ongoing long-term projects, they can be valuable.

A common type of automated tool is the 'record/playback' type. For example, a tester could click through all combinations of menu choices, dialog box choices, buttons, etc. in an application GUI and have them 'recorded' and the results logged by a tool. The 'recording' is typically in the form of text based on a scripting language that is interpretable by the testing tool. If new buttons are added, or some underlying code in the application is changed, the application might then be retested by just 'playing back' the 'recorded' actions and comparing the logged results to check the effects of the changes. The problem with such tools is that if there are continual changes to the system being tested, the 'recordings' may have to be changed so much that it becomes very time-consuming to continuously update the scripts. Additionally, interpretation and analysis of results (screens, data, logs, etc.) can be a difficult task. Note that there are record/playback tools for text-based interfaces also, and for all types of platforms.

Another common approach for automation of functional testing is 'data-driven' or 'keyword-driven' automated testing, in which the test drivers are separated from the data and/or actions utilised in testing (an 'action' would be something like 'enter a value in a text box'). Test drivers can be in the form of automated test tools or custom-written testing software. The data and actions can be more easily maintained, for example via a spreadsheet, since they are separate from the test drivers. The test drivers 'read' the data/action information to perform specified tests. This approach can enable more efficient control, development, documentation, and maintenance of automated tests/test cases.
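As an illustration of the data-driven style (not part of WebMark), the sketch below reads rows of action,target,value from a hypothetical testdata.csv file and dispatches each one; the file name and the actions are invented for the example.

import java.io.BufferedReader;
import java.io.FileReader;

// Illustrative data-driven test driver: the test data lives in a separate
// file of "action,target,value" rows and the driver merely interprets it.
public class DataDrivenDriver {
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader("testdata.csv"));
        String row;
        while ((row = reader.readLine()) != null) {
            String[] fields = row.split(",");
            if (fields.length < 2) {
                continue;                       // skip malformed rows
            }
            String action = fields[0];
            String target = fields[1];
            String value = fields.length > 2 ? fields[2] : "";

            if ("enterText".equals(action)) {
                System.out.println("Typing '" + value + "' into " + target);
            } else if ("click".equals(action)) {
                System.out.println("Clicking " + target);
            } else {
                System.out.println("Unknown action: " + action);
            }
        }
        reader.close();
    }
}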



Software applications of various types need to be tested using different testing methods. Research on web applications shows that these applications have some unique features that other applications do not have.

1.4 WEB APPLICATION TESTING TOOLS
Most of the tools available in the market are script based and gather user session data on the client side. These tools concentrate on either unit testing or capture-replay techniques to test the application. This contrasts with the testing server developed here, which works on data gathered on the server side. The key difference is that the implemented testing server combines the data gathered on the server side with capture-replay and also generates new test cases. Furthermore, most of the current testing methods are white box based; in white box techniques the tester provides data based on a clear understanding of the application architecture, and in some industries or software development processes the test data is generated while the software is being designed. Capture-replay techniques use tools to capture single-use navigations and replay them against the application under test. Below are some of the widely used HTTP testing tools.

HttpUnit
Open source Java API for testing web sites without a browser.

HtmlUnit
HtmlUnit is a Java unit testing framework for testing web based applications. It is similar in concept to HttpUnit but is very different in implementation. HttpUnit models the HTTP protocol, so the tester deals with request and response objects. HtmlUnit, on the other hand, models the returned document, so that you deal with pages, forms and tables.

MaxQ
MaxQ is a free web functional testing tool. It includes an HTTP proxy that records the test script, and a command line utility that can be used to playback tests. The proxy recorder automatically stores variables posted to forms.

JWebUnit
JWebUnit is a Java framework that facilitates the creation of acceptance tests for web applications. It evolved from a project in which its developers were using HttpUnit and JUnit to create acceptance tests.

SlimDog
SlimDog offers a simple script based web application testing tool. It is based on HttpUnit. The tool offers a wide range of commands to work with forms, check the content of tables and navigate between HTML pages. Rather than writing long JUnit test cases or complicated XML files, users can write simple text scripts. Each line of the script file contains one command, which forms one test node; all commands inside one file are processed as a testcase. The syntax of every command is simple and easy to learn. Several scripts can be combined into a test suite. The results are written either to the console, to a file or as an HTML page.

Imprimatur
Imprimatur is a web application testing tool. The tests are described in a simple XML file. Along with the standard GET and POST actions, Imprimatur handles HTTP sessions and file uploads. The responses can be validated using regular expressions and response code checks.

1.5 HTTP Protocol
HTTP [2] defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands. For example, when a user enters a URL in the browser, this actually sends an HTTP command to the Web server directing it to fetch and transmit the requested Web page. The other main standard that controls how the World Wide Web works is HTML, which covers how Web pages are formatted and displayed. HTTP is a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it. This is the main reason that it is difficult to implement Web sites that react intelligently to user input. This shortcoming of HTTP is being addressed in a number of new technologies, including ActiveX, Java, JavaScript and cookies.

Browser

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.

Structure of HTTP Transactions
Like most network protocols, HTTP uses the client-server model. An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions). The format of the request and response messages is similar, and English-oriented. Both kinds of messages consist of:
a) an initial line,
b) zero or more header lines,
c) a blank line (i.e. a CRLF by itself), and
d) an optional message body (e.g. a file, or query data, or query output)

Initial lines and headers should end in CRLF, though applications should gracefully handle lines ending in just LF. (More exactly, CR and LF here mean ASCII values 13 and 10, even though some platforms may use different characters.)
a) Initial Request Line

The initial line is different for the request than for the response. A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:

GET /path/to/file/index.html HTTP/1.0
Fig 1.1 HTTP GET request for a file

Notes:
• GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD. Method names are always uppercase.
• The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
• The HTTP version always takes the form "HTTP/x.x", uppercase.

Initial Response Line (Status Line)

The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:

HTTP/1.0 200 OK
or
HTTP/1.0 404 Not Found
Fig 1.2 HTTP request initial response header

Notes:
• The HTTP version is in the same format as in the request line, "HTTP/x.x".
• The status code is meant to be computer-readable; the reason phrase is meant to be human-readable, and may vary.
• The status code is a three-digit integer, and the first digit identifies the general category of response:
  o 1xx indicates an informational message only
  o 2xx indicates success of some kind
  o 3xx redirects the client to another URL
  o 4xx indicates an error on the client's part
  o 5xx indicates an error on the server's part

The most common status codes are:

200 OK
The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.

404 Not Found
The requested resource doesn't exist.

301 Moved Permanently
302 Moved Temporarily
303 See Other (HTTP 1.1 only)
The resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.

500 Server Error
An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.

b) Header Lines

Header lines provide information about the request or response, or about the object sent in the message body. The header lines are in the usual text header format: one line per header, of the form "Header-Name: value", ending with CRLF.

• As noted above, they should end in CRLF, but LF should be handled correctly.
• The header name is not case-sensitive (though the value may be).
• Any number of spaces or tabs may be between the ":" and the value.
• Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.

Header1: some-long-value-1a, some-long-value-1b
HEADER1:    some-long-value-1a,
            some-long-value-1b
Fig 1.3 Equivalent HTTP headers

HTTP 1.0 defines 16 headers, though none are required. HTTP 1.1 defines 46 headers, and one (Host:) is required in requests. For Net-politeness, consider including these headers in requests:

• From: header gives the email address of whoever is making the request, or running the program doing so. (This must be user-configurable, for privacy concerns.)
• User-Agent: header identifies the program that is making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program. For example, Netscape 3.0 sends the header "User-agent: Mozilla/3.0Gold".

Custom server headers can also be included; consider including these headers in responses:


• Server: header is analogous to the User-Agent: header: it identifies the server software in the form "Program-name/x.xx". For example, one beta version of Apache's server returns "Server: Apache/1.2b3-dev".
• Last-Modified: header gives the modification date of the resource that is being returned. It is used in caching and other bandwidth-saving activities. Use Greenwich Mean Time, in the format:

Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT



c) The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server. If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:

• Content-Type: header gives the MIME type of the data in the body, such as text/html or image/gif.
• Content-Length: header gives the number of bytes in the body.

Sample HTTP Exchange
a) To retrieve the file at the URL http://www.somehost.com/path/file.html
Fig 1.4 First HTTP request from browser

b) First open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:

GET /path/file.html HTTP/1.0
From: ganesma@dcs.kcl.ac.uk
User-Agent: HTTPTool/1.0
[blank line here]
Fig 1.5 Browser formats the request to be sent to the server

c) The server should respond with something like the following, sent back through the same socket:

HTTP/1.0 200 OK
Date: Fri, 02 Sep 2005 10:00:00 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy Report writing!</h1>
(more file contents)
.
.
.
</body>
</html>
Fig 1.6 Web server response to the browser

d) After sending the response, the server closes the socket.
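For illustration, the exchange above can be reproduced with a few lines of standard Java. The host, path and headers are the ones from the example, and error handling is omitted; this is a minimal sketch rather than production code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class SimpleHttpGet {
    public static void main(String[] args) throws Exception {
        // Open a socket to the host on the default HTTP port 80
        Socket socket = new Socket("www.somehost.com", 80);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));

        // Send the request: initial line, headers, then a blank line
        out.print("GET /path/file.html HTTP/1.0\r\n");
        out.print("From: ganesma@dcs.kcl.ac.uk\r\n");
        out.print("User-Agent: HTTPTool/1.0\r\n");
        out.print("\r\n");
        out.flush();

        // Print the status line, headers and body returned by the server
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        socket.close();
    }
}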

HTTP Proxies

An HTTP proxy is a program that acts as an intermediary between a client and a server. It receives requests from clients, and forwards those requests to the intended servers. The responses pass back through it in the same way. Thus, a proxy has functions of both a client and a server. Proxies are commonly used in firewalls, for LAN-wide caches, or in other situations. When a client uses a proxy, it typically sends all requests to that proxy, instead of to the servers in the URLs. Requests to a proxy differ from normal requests in one way: in the first line, they use the complete URL of the resource being requested, instead of just the path. For example:

GET http://www.somehost.com/path/file.html HTTP/1.0

That way, the proxy knows which server to forward the request to (though the proxy itself may use another proxy).

1.6 USER SESSION DATA GENERATION
The user session is a unique feature of web applications, and this project utilises user session data in order to create test cases. Web application accesses consist of a set of URLs.

GET http://dcs.kcl.ac.uk/timetable.php?course=mscase&batch=2005 HTTP/1.0
From: ganesma@dcs.kcl.ac.uk
User-Agent: HTTPTool/1.0
[blank line here]
Fig 1.7 GET Request received on server side

POST http://dcs.kcl.ac.uk/timetable.php HTTP/1.0
User-Agent: Mozilla
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

course=mscase&batch=2005
Fig 1.8 Post request received on the server side

The user or the application provides values for two parameters. In Fig 1.7 and Fig 1.8 the parameters course and batch are given the values mscase and 2005 respectively. These parameters are generated by either:
• the GET method of an application link, or
• an HTML form included in the web page.
In the former case the user clicks a link provided in the web page as part of the navigation. This is a predefined URL generated by the web application, and the web browser sends the request back to the web server as an HTTP GET. In the latter case the user fills in a form included within the web page and presses the submit button. Again the web browser sends the HTTP request back to the server, appending the form control values as name/value pairs. The method is either GET or POST, depending on the page; in the case of the POST method the HTML form will contain METHOD=POST for submitting the form.

<FORM METHOD=GET ACTION="http://dcs.kcl.ac.uk/timetable.php">
Course: <INPUT NAME="course"><BR>
Batch: <INPUT NAME="batch"><BR>
<INPUT TYPE=SUBMIT>
</FORM>
Fig 1.9 Using HTML form to generate GET request

<A HREF="http://dcs.kcl.ac.uk/timetable.php?course=mscase&batch=2005">Timetable</A>

Fig 1.10 Using HREF to generate GET request

<FORM METHOD=POST ACTION="http://dcs.kcl.ac.uk/timetable.php">
Course: <INPUT NAME="course"><BR>
Batch: <INPUT NAME="batch"><BR>
<INPUT TYPE=SUBMIT>
</FORM>
Fig 1.11 Using HTML form to generate POST request
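Whichever way the request is generated, the server side ultimately sees a URL-encoded string such as course=mscase&batch=2005. The following sketch (illustrative only; the class and method names are not part of the WebMark code) splits such a string into parameter name/value pairs:

import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringParser {
    // Splits a URL-encoded query string into parameter name/value pairs
    public static Map<String, String> parse(String query) throws Exception {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;                               // skip malformed pairs
            }
            String name = URLDecoder.decode(pair.substring(0, eq), "UTF-8");
            String value = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        // Prints {course=mscase, batch=2005}
        System.out.println(parse("course=mscase&batch=2005"));
    }
}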

1.7 LOG FILES
Web server log files contain information about the accesses made to the server. A log file for a set of user sessions can be generated either on the client side or on the server side. If the log file is generated on the client side, the data gathered belongs only to the users using that particular computer to access the web site. On the other hand, if the data is gathered on the server side, the log file will contain all the user sessions from users accessing the web site.

The server side session log file generation can be done by:

1. Using a server side scripting language to log the relevant information to the file system. This gives the programmer the flexibility to selectively log the data. The major disadvantage is that the application under test has to be modified in order to log the required data (a minimal sketch of this approach is given after this list).

2. Configuring the web server to log the requests to the file system.

2.1 Apache web server
Apache web server can be configured to log user requests to a log file [5]. The log file is in ASCII format. It can also be customised to log selected data in a configurable format.

2.2 Internet Information Server [6]
IIS supports the following log formats.

1. W3C Extended log format
W3C Extended format is a customizable ASCII format with a variety of different properties. It can log properties important to future use, while limiting log size by omitting unwanted property fields. Properties are separated by spaces. Time is recorded as UTC.

2. IIS log file format
IIS format is a fixed (cannot be customized) ASCII format. IIS format records more information than NCSA Common format. The IIS format includes basic items, such as the user's IP address, user name, request date and time, service status code, and number of bytes received. In addition, IIS format includes detailed items, such as the elapsed time, number of bytes sent, action (for example, a download carried out by a GET command), and target file. The items are separated by commas, making the format easier to read than the other ASCII formats, which use spaces for separators. The time is recorded as local time.

3. NCSA Common log file format
National Centre for Supercomputing Applications (NCSA) Common format is a fixed (cannot be customized) ASCII format, available for Web sites but not for FTP sites. NCSA Common format records basic information about user requests, such as remote host name, user name, date, time, request type, HTTP status code, and the number of bytes sent by the server. Items are separated by spaces; time is recorded as local time.

4. ODBC logging
ODBC logging format is a record of a fixed set of data properties in a database that complies with Open Database Connectivity (ODBC), such as Microsoft Access or Microsoft SQL Server. Some of the items logged include the user's IP address, user name, request date and time (recorded as local time), HTTP status code, bytes received, bytes sent, action carried out (for example, a download carried out by a GET command), and the target (for example, the file that was downloaded). With ODBC logging, you must both specify the database to be logged to, and set up the database to receive the data.
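As an illustration of option 1 above, the sketch below logs each request as a <transaction> element from inside the application itself. It assumes a Java servlet environment (javax.servlet); the filter name, the log file path and the exact XML written are illustrative only, and the WebMark log module instead works at the packet level precisely to avoid modifying the application under test.

import java.io.FileWriter;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

// Hypothetical request-logging filter: appends one <transaction> per request.
public class TransactionLogFilter implements Filter {

    private String logFile;

    public void init(FilterConfig config) throws ServletException {
        // Path is illustrative; in practice it would come from configuration
        logFile = "/var/log/webmark/session-log.xml";
    }

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) request;
        StringBuffer entry = new StringBuffer();
        entry.append("<transaction>\n");
        entry.append("  <clientIP>").append(http.getRemoteAddr()).append("</clientIP>\n");
        entry.append("  <request type=\"").append(http.getMethod())
             .append("\" document=\"").append(http.getRequestURL()).append("\"/>\n");
        entry.append("</transaction>\n");

        // Append the entry to the log file (no locking; illustration only)
        FileWriter writer = new FileWriter(logFile, true);
        writer.write(entry.toString());
        writer.close();

        chain.doFilter(request, response);
    }

    public void destroy() {
    }
}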

1.8 HTTP PACKET ANALYSIS
Analysing TCP/IP packets is a common requirement, and many tools exist for it. Below are the two open source projects that together enable packet capture from Java on the Windows platform.

WinPcap
WinPcap [7] is the industry-standard tool for link-layer network access in Windows environments. It allows applications to capture and transmit network packets bypassing the protocol stack, and has additional useful features, including kernel-level packet filtering, a network statistics engine and support for remote packet capture. WinPcap consists of a driver, which extends the operating system to provide low-level network access, and a library that is used to easily access the low-level network layers. This library also contains the Windows version of the well known libpcap Unix API.

Jpcap
Jpcap [8] is a Java class package which enables Java applications to capture and send IP packets. This package uses libpcap and the Raw Socket API, and supports Ethernet, IPv4, IPv6, ARP/RARP, TCP, UDP and ICMPv4. In order to capture HTTP data, Jpcap is extended in this project to filter packets so that only HTTP packets are captured (see the sketch below).
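The capture loop that drives this filtering might look like the following minimal sketch. It assumes the Jpcap 0.x API referenced by this report (Jpcap.openDevice() and a JpcapHandler callback whose handlePacket() is invoked per packet); class locations and signatures vary between Jpcap releases, so this is an illustration rather than the exact WebMark code.

import jpcap.Jpcap;
import jpcap.JpcapHandler;
import jpcap.Packet;

// Illustrative capture loop (Jpcap API assumed; see note above).
public class HttpCapture implements JpcapHandler {

    public void handlePacket(Packet packet) {
        // The real module inspects the TCP payload here, extracts the HTTP
        // request or response, and writes a <transaction> to the XML log.
        System.out.println("Captured: " + packet);
    }

    public static void main(String[] args) throws Exception {
        String device = Jpcap.getDeviceList()[0];      // first network interface
        Jpcap jpcap = Jpcap.openDevice(device, 2000, false, 20);
        jpcap.setFilter("tcp port 80", true);          // keep only HTTP (port 80) traffic
        jpcap.loopPacket(-1, new HttpCapture());       // capture indefinitely
    }
}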


Chapter 2

Review

This chapter gives an overview of the initial requirements and the scope of the project implemented. The software utilises the data gathered on the web server for different users' sessions. The algorithms used to generate test cases for web applications are explained.

2.1 INITIAL SPECIFICATIONS

Fig 2.1 Block diagram of systems and interactions

The above diagram details the entities and the interactions between them.
1. The user navigates to the web site.
2. The web server responds to the user, typically with the requested page.
3. The packet analyser module analyses the requests and generates a generic XML format, which is used to generate test cases.
4. Alternatively, an internal or built-in logging module logs the requests to an XML file; the XML log is written to the log file.
5. The testing server is given the XML log file; it analyses it and generates test cases from the XML file provided.
6. The new test cases are run against the web server.
7. The results are stored in a database for analysis.


2.2 PACKET CAPTURE AND ANALYSIS
This project addresses some of the drawbacks of the currently proposed [1] testing methods. In the proposed implementation, the log file generation is done by modifying the application under test. This is a major drawback, since the tester needs the technical knowledge to include scripts that capture the data values submitted by clients. In order to address this issue, an HTTP packet capturing module is developed. This module is responsible for generating log files in XML format. For implementation purposes the existing WinPcap [7] and Jpcap [8] libraries are extended to support and filter HTTP packets. The extracted packets are then used to generate XML log files. The XML files generated follow the same schema required by the session reading modules.

2.3 WEBPAGE ANALYSIS
A web site consists of a set of web pages. These web pages are linked to one another by hyperlinks. Any web page falls into one of two categories. The first is a static page, where the content is pure HTML and no data is generated dynamically. The second is a dynamic page, where the content is generated dynamically by a server side scripting language such as ASP.NET, PHP or JSP. Web pages may also contain HTML forms to get user input data for processing. A form contains an action attribute and a Submit button. When the user clicks on the "Submit" button, the content of the form is sent to another file. The form's action attribute defines the name of the file to send the content to, and the file defined in the action attribute usually does something with the received input. The diagram below shows the UML representation of a web site.

Fig 2.2 Web page and the relation between its elements


2.3.1 Web applications
Web application testing is important and challenging. First, most web applications change rapidly and their content is dynamic. Once a web application becomes known, users tend to visit the site more and more often, which can lead to a sudden increase in HTTP hits overnight; there are also hackers trying to break systems by generating fake HTTP requests to the server. Second, web applications undergo rapid maintenance: the look and feel of a site is changed often, and some links are removed or relocated. Third, modern web applications are multi-tiered and have very complex architectures; they depend on other application servers to deliver content to them.

2.4 ALGORITHMS FOR TEST CASE GENERATION

Since this project is based on a research paper [1], the following test case generation algorithms are implemented:
1. User Session 1
2. User Session 2
3. White box 1
4. White box 2
5. Hybrid

2.4.1 User Session - 1
This algorithm is the same as the capture-replay technique used in most existing testing tools. Current tools run on the client side of the application, capture the navigation a user performs, and then replay it in the same sequence as the user navigated. In this project the same concept is used, but the packets are captured on the server side. Since the navigation of multiple users is captured on the server, the effort required to capture the user sessions is minimal, and the amount of test data gathered is far larger than what can be gathered on a single client.

While NOT End of File
    Extract user request
    If NOT session available for that user
        Create new user session
    End if
    Add Request to user session
Do
Fig 2.3 User session – 1 Algorithm
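Rendered in Java, the grouping step of this algorithm might look like the sketch below, where a String[] of {clientIP, URL} stands in for the UserRequest objects described in Chapter 4; the class and method names are illustrative, not the actual WebMark code.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the User Session 1 grouping step: one session (ordered list of
// requests) per client IP, preserving the order in which they were captured.
public class SessionGrouper {

    public static Map<String, List<String>> group(List<String[]> loggedRequests) {
        Map<String, List<String>> sessions = new LinkedHashMap<String, List<String>>();
        for (String[] entry : loggedRequests) {
            String clientIp = entry[0];
            String requestUrl = entry[1];
            List<String> session = sessions.get(clientIp);
            if (session == null) {
                session = new ArrayList<String>();   // new session for this user
                sessions.put(clientIp, session);
            }
            session.add(requestUrl);
        }
        return sessions;
    }
}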

2.4.2 User Session - 2
In this algorithm new test cases are created by using existing user sessions. In order to create new test cases, user sessions from different users are combined: requests from one user are combined with requests from another user to create a new navigation with different data.

Read user sessions from log
While (unused user sessions remain)
    Select an unused user session (Ua)
    Copy requests from R1 to Ri to TestCase
    Select another unused user session (Ub [a != b])
    While (Search for Ri url in Ub)
        If (FOUND (Rj))
            Copy R1 to Ri to TestCase then Rj to Rn to TestCase
            Mark Ua "Used"
        End if
    Do
Do
Fig 2.4 User session – 2 Algorithm

2.4.3 White box - 1
The White box - 1 algorithm relies on the tester specifying the part of a web site's path to be tested. This algorithm uses user session values from the log and applies random parameter values for the test.

Read user sessions from log
Read Test path definition
For each url in Test path
    Select a session that matches the url
    Select a random user request belonging to that session
    If no user session defined
        Create new user session
    End if
    Add user request to the user session
Do
Fig 2.5 White box - 1 Algorithm

2.4.4 White box - 2
The White box - 2 algorithm exercises the entire web site using the available user session log file data. In order to test the web site, the user provides a sitemap of the web site under test.

Read user sessions from log
Read Test path definition
For each url in Test path
    Select a session that matches the url
    Select a random user request belonging to that session
    If no user session defined
        Create new user session
    End if
    Add user request to the user session
Do

Fig 2.6 White box - 2 Algorithm

2.4.5 Hybrid
In this implementation the white box test is always integrated with the user session log files.


Test Type        Type          Description
White box 1      Whitebox      Generate test cases based on a sequence of web navigation URLs
White box 2      Whitebox      Generate test cases to test all possible paths available from a web site
User session 1   UserSession   Replays the test cases in the same sequence as they occurred during user navigation
User session 2   UserSession   Similar to user session 1, but when generating test cases it mixes user sessions from more than one user
Hybrid           Hybrid        Uses user session data within white box tests

Table 2.7 Testcase generation algorithms and descriptions


Chapter 3

Specification

This chapter gives an overview of the design decisions and the overall architecture of the testing server and component interactions.

3.1 CHOICE OF TECHNOLOGY
The implementation of the project requires XML processing, Internet protocol handling, and string and HTTP data manipulation. The system deployment also depends on the web server platform if the HTTP log generation module has to be deployed there: the testing server should be able to run either on the same machine as the web server or on a different machine, while the packet capturing module must run on the same machine as the web server in order to gather log files. The most widely used web servers run on both UNIX and Windows platforms. The technologies considered for the implementation are as follows.

1. .NET
.NET [12] is a technology from Microsoft. It has good XML processing and string manipulation capabilities, and in addition it has an API for network programming, which satisfies most of the requirements of this project. Its major disadvantage is that it can only be deployed on the Windows platform, and for this reason .NET was not selected for the implementation.

2. Java
Java [11] provides most of the same features as .NET, and Java applications are "write once, run anywhere". The project requires a programming language that can be deployed on both Windows and UNIX; because Java is platform independent, it was chosen for the implementation.

3. XML
XML [4] was designed to describe and interchange data between applications. XML allows user-defined tags, which gives the flexibility to agree on a set of tags to be used between applications for data interchange. In this project the XML log files are generated by one module and processed by another. In order to agree on a set of tags and validate the XML documents, a DTD or an XML Schema can be used.

4. XML DTD
The purpose of a DTD [4] is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in an XML document, or as an external reference. XML provides an application independent way of sharing data; with a DTD, independent groups of people can agree to use a common DTD for interchanging data. An application can use a standard DTD to verify that data received from the outside world is valid, and it can also use a DTD to verify its own data.


5. XML Schema
XML Schema [4] provides an easy way to define complex data types. It also allows an XML document to be validated and processed as Java objects; this is the major advantage over DTD based document processing, since the programmer only has to deal with Java objects in order to process a document.

6. JAXB
JAXB [9] is a Java technology that generates Java classes from XML schemas by means of a JAXB binding compiler. The JAXB binding compiler takes XML schemas as input and then generates a package of Java classes and interfaces that reflect the rules defined in the source schema.

7. JDBC
JDBC allows Java applications to access relational databases through a uniform interface. If the underlying database changes from one product to another, the applications using JDBC need not be changed.

8. PHP
PHP [10] is a widely used, efficient and elegant web programming language. PHP is open source, provides easy access to database technologies, and is available on both Windows and UNIX. A user interface to view the test results is developed using PHP and MySQL.

9. MySQL
MySQL is one of the best open source databases and provides most of the functionality required of a relational database. MySQL was selected to store the results of the tests.


3.2 USE CASE DIAGRAM

Fig 3.1 Use case diagram of WebMark

The above UML use case diagram describes the functionality required by the tester. The tester specifies a log file to be used for testing. Once the test file is specified, he/she can select the type of test to carry out from the provided console user interface. At the end of the test he/she can view the test results using the provided HTTP interface.

Use case diagram: part of the Rational Unified Process [3] set of diagrams, used to describe an abstract view of the system in terms of the actors using the system and the functionality they expect from it.


3.3 OVERALL ARCHITECTURE

Fig 3.2 Overall module interactions

The overall component architecture of the system provides and requires a few external interfaces. The user is provided with two interfaces:
a) The first interface allows the user to perform the four types of test against a web site. This is a console based interface; it also provides options to upload a test file and selectively perform tests.
b) The second interface allows the user to view the test results. This is an HTTP (web) interface, so that the test results can be viewed remotely.
The testing server also requires an interface to connect to the internet: to carry out the tests against a web page, the testing server has to access the pages via the internet.


Chapter 4

Implementation

This chapter explains the internal implementation details of each module. The packages, UML class diagrams and database designs are explained.

4.1 INTERNALS
The project implementation is divided into three major modules.

a) Log file generation module
This module primarily extends the implementation of Jpcap [8] and is used together with the XML schema classes to generate the log file.

b) Web application testing module
This is the main module, which creates and executes the tests. It is divided into Java packages according to their usage.

c) Test results module
A web interface for viewing the outcome of the tests.

4.2 LOG FILE GENERATION
Log file generation is one of the enhancements to the initial implementation used in the research [1]. Since the log file generation module's output is passed as input to the testing module, an XML schema is defined to validate and exchange the log file data.

The schema consists of one main complex type, MultiHttpType, which defines several elements of complex and simple types. Some of the elements are global to the entire log file. The first is a global element title of simple type string, used to specify a title for the application under test. Next the schema defines another global element description, also of simple type, used to give a description of the application under test. The third is a global element config of complex type; the config element includes two elements, useragent and baseurl, both of simple types.

The last complex type element the schema contains is the Transaction element. The Transaction element is mapped to individual user requests: each user request is mapped to one Transaction element in the XML log file. The Transaction element contains clientIP of simple type, request of the Request complex type, and response of the Response complex type. In addition to these elements it also carries a description attribute.

The Request element has two attributes: the type of the HTTP request (attribute name type, of string simple type) and a document attribute indicating the URL resource document accessed. It contains zero or more complex type key elements. The key element has a name attribute, which holds the parameter name used for a testing request, while the value of the key element is the value of that parameter.

The Response element within the Transaction element is of complex type. It contains a single element named test, whose type attribute names the kind of check and whose value is the expected response value; based on the type, the value field is extracted.


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="multihttp" type="MultiHttpType"/>

  <xsd:complexType name="MultiHttpType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="config" type="Config"/>
      <xsd:element name="transaction" type="Transaction" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Config">
    <xsd:sequence>
      <xsd:element name="useragent" type="Useragent"/>
      <xsd:element name="baseurl" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Useragent">
    <xsd:attribute name="cookies" type="xsd:string"/>
  </xsd:complexType>

  <xsd:complexType name="Transaction">
    <xsd:sequence>
      <xsd:element name="clientIP" type="xsd:string"/>
      <xsd:element name="request" type="Request"/>
      <xsd:element name="response" type="Response" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="description" type="xsd:string"/>
  </xsd:complexType>

  <xsd:complexType name="Request">
    <xsd:sequence>
      <xsd:element name="key" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string">
              <xsd:attribute name="name" type="xsd:string"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="type" type="xsd:string"/>
    <xsd:attribute name="document" type="xsd:string"/>
  </xsd:complexType>

  <xsd:complexType name="Response">
    <xsd:sequence>
      <xsd:element name="test" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string">
              <xsd:attribute name="type" type="xsd:string"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Fig 4.1 XML schema definition for Log file processing and generation

When a TCP packet is captured from the network device, the extended handlePacket() method does a string comparison to find out whether it is an HTTP packet. If it is an HTTP packet, the packet is further analysed and the relevant HTTP information is extracted and logged.
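The string comparison itself can be as simple as checking whether the TCP payload begins with an HTTP method name or status line. A minimal helper in this spirit is shown below; the class name, method name and exact prefix set are illustrative rather than the actual WebMark code.

public class HttpPacketDetector {

    // Request methods and the response prefix that identify an HTTP payload
    private static final String[] HTTP_PREFIXES = {
        "GET ", "POST ", "HEAD ", "PUT ", "DELETE ", "OPTIONS ", "TRACE ", "HTTP/1."
    };

    // Returns true if the given TCP payload looks like the start of an
    // HTTP request or response.
    public static boolean isHttpPayload(byte[] payload) {
        if (payload == null || payload.length == 0) {
            return false;
        }
        String start = new String(payload, 0, Math.min(payload.length, 16));
        for (int i = 0; i < HTTP_PREFIXES.length; i++) {
            if (start.startsWith(HTTP_PREFIXES[i])) {
                return true;
            }
        }
        return false;
    }
}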

<?xml version="1.0"?>
<multihttp>
  <title>Postcoder, live</title>
  <description>Changing user's postcode</description>
  <config>
    <useragent cookies="keep"/>
    <baseurl>http://www.bbc.co.uk</baseurl>
  </config>
  <transaction description="Collect BBC UID cookie">
    <clientIP>212.139.243.64</clientIP>
    <request type="GET" document="http://www.bbc.co.uk/radio/index.shtml"/>
    <response>
      <test type="status">200</test>
    </response>
  </transaction>
</multihttp>
Fig 4.2 Sample Web server log file

4.3 WEB APPLICATION TESTING
The web application testing module contains six packages, which group the classes cohesively. The packages are as follows.

1. Data Access
This package implements the classes that access and provide services to data in the relational database and in XML storage. The classes within this package include:

a. DBConnection
This class abstracts the relational data access for the MySQL database. A singleton design pattern is followed for instances of the DBConnection class. Database operations such as insert, select, delete and update are encapsulated around the MySQL RDBMS.


Fig 4.3 DBConnection class

The DBConnection class uses JDBC to connect to the database. JDBC allows Java applications to access any underlying database through a uniform interface.

Class.forName("com.mysql.jdbc.Driver");
Properties props = new Properties();
props.put("autoReconnect", "true");
props.put("roundRobinLoadBalance", "true");
props.put("failOverReadOnly", "false");
props.put("user", userName);
props.put("password", password);
// Connection URL shown here is illustrative; the real database name is configured elsewhere
Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/webmark", props);

Fig 4.4 Setting up the environment for a JDBC connection

b. XMLProcessor Provides access to underlying XML files. This class also maintains the singleton design pattern for object instantiation of XMLProcessor.

Fig 4.5 XMLProcessor class

The getNodeByName() method extracts the nodes of an XML document and returns a NodeList. The NodeList is then traversed for the required elements by the calling application.

Package: a set of classes grouped cohesively under a common module name.
Singleton: ensures a class has only one instance and provides a global point of access to it.

2. HTTP
The HTTP package contains classes that provide an abstraction of the HTTP connection. The services provided include connecting to a remote web server over the internet using the HTTP protocol.


a. HTTPConnection
Encapsulates the functionality of making the various HTTP request methods and getting the response from the server.

Fig 4.6 HTTPConnection class
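For illustration, the kind of call HTTPConnection wraps can be written with the standard java.net API as in the sketch below; the URL reuses the earlier timetable example, and the real class adds further request types, headers and error handling.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleGet {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://dcs.kcl.ac.uk/timetable.php?course=mscase&batch=2005");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        // The status code is compared against the expected response from the log
        int status = connection.getResponseCode();
        System.out.println("Status: " + status);

        // Read the response body so its content can also be checked
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
        connection.disconnect();
    }
}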

If the testing server is deployed within a LAN that sits behind a proxy, then the server has to be configured to use the proxy server. A proxy server can be set using the following code.

System.getProperties().put("proxySet", "true");
System.getProperties().put("proxyHost", "proxy.dcs.kcl.ac.uk");
System.getProperties().put("proxyPort", "3128");

Fig 4.7 Setting proxy server values

Whenever the system needs to connect to the internet, it has to use the proxy server for the connection. The code fragment below shows how the proxy settings are used to connect to the internet.

String credentials = System.getProperties().get("proxyUsername").toString();
credentials += ":";
credentials += System.getProperties().get("proxyPassword").toString();
sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
String encoded = encoder.encode(credentials.getBytes());
connection.setRequestProperty(HEADER_PROXY_AUTHORIZATION, encoded);

Fig 4.8 Using the proxy to connect to the internet

b. IHTTP
An interface containing all HTTP related constants. Any class that needs HTTP default values can implement this interface so that consistent values are used throughout the system.


3. Session
User session related classes are grouped into this package. It contains the functionality needed to read and process the XML log files and to create objects that provide uniform access to the user-specific session data.

a. Parameter
A simple class holding a name/value pair. Its purpose is to store the parameters of an HTTP request URL. A Parameter instance is always contained in a UserRequest. During the test process the UserRequest object is saved for future analysis of the test result; while saving the UserRequest object it recursively saves the parameters included in that request.

b. UserRequest
Encapsulates an individual user request from the underlying XML log file. Each <transaction> element is mapped to a UserRequest object (i.e. there is a one to one mapping between Transaction and UserRequest). It also provides methods for matching the HTTP response from the server for the given UserRequest against the expected HTTP response provided in the XML file. The XML processing is achieved using JAXB, a framework for XML binding and processing. The getRequestUrl() method formats the URL based on the request type.

XML binding: an XML document consists of elements, and XML binding is the process of generating mapping classes that match the elements of an XML document. In order to generate the mapping between classes and elements, an XML schema is defined describing the elements and their data types.

c. UserSession
A set of UserRequests forms a UserSession. This real world concept of multiple user requests forming a single user session is abstracted in this class. User sessions are identified by the client's IP address.

d. SessionDataReader
This class is responsible for reading the web server's XML log file, creating UserRequest objects and grouping them into UserSession objects. While reading the XML log file it makes use of the JAXB framework classes to validate and process the log.

e. HttpResponse
Abstracts the HTTP response from the web server. This can be either the expected response for a given request, taken from the log, or the actual response obtained when a request with certain parameters from the log is made.


Fig 4.9 Session package class interactions

The session package is one of the most important packages as it deals with the session log. In order to process the XML log file, JAXB binding is used; the schema used to generate the Java classes for the XML log file was shown in Fig 4.1. There are seven general steps in the JAXB data binding process:

1. Generate classes. An XML schema is used as input to the JAXB binding compiler to generate source code for JAXB classes based on that schema.
2. Compile classes. All of the generated classes, source files, and application code must be compiled.
3. Unmarshal. XML documents written according to the constraints in the source schema are unmarshalled by the JAXB binding framework. Note that JAXB also supports unmarshalling XML data from sources other than files/documents, such as DOM nodes, string buffers, SAX sources, and so forth.
4. Generate content tree. The unmarshalling process generates a content tree of data objects instantiated from the generated JAXB classes; this content tree represents the structure and content of the source XML documents.
5. Validate (optional). The unmarshalling process optionally involves validation of the source XML documents before generating the content tree. Note that if you modify the content tree in step 6, below, you can also use the JAXB validate operation to validate the changes before marshalling the content back to an XML document.
6. Process content. The client application can modify the XML data represented by the Java content tree by means of interfaces generated by the binding compiler.
7. Marshal. The processed content tree is marshalled out to one or more XML output documents. The content may be validated before marshalling.

To synopsize, using JAXB involves two discrete sets of activities:
• Generate and compile JAXB classes from a source schema, and build an application that implements these classes.
• Run the application to unmarshal, process, validate, and marshal XML content through the JAXB binding framework.

The SessionDataReader class reads and processes the XML log file using JAXB. While it processes the file it creates a UserRequest object for each Transaction in the XML file. In order to construct a UserRequest object, the JAXB framework extracts the ClientIP, Request and Response objects. The Request element in a Transaction may have Parameters associated with it. The Response object in the log file is set as the expected response for that request. Fig 4.10 below shows a sample XML log file with multiple transactions; a sketch of how such transactions might be grouped into sessions is given after the figure.

<?xml version="1.0"?>
<multihttp>
  <title>Postcoder, live</title>
  <description>Changing user's postcode</description>
  <config>
    <useragent cookies="keep"/>
    <baseurl>http://www.bbc.co.uk</baseurl>
  </config>
  <transaction description="Collect BBC UID cookie">
    <clientIP>212.139.243.64</clientIP>
    <request type="GET" document="http://www.bbc.co.uk/radio/index.shtml"/>
    <response>
      <test type="status">200</test>
    </response>
  </transaction>
  <transaction description="BBC Television">
    <clientIP>212.139.243.65</clientIP>
    <request type="GET" document="http://www.bbc.co.uk/bbcfour/listings/index.shtml">
      <key name="service_id">4544</key>
    </request>
    <response>
      <test type="content">August</test>
    </response>
  </transaction>
</multihttp>
Fig 4.10 Part of a sample XML log file
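As a hypothetical illustration of how SessionDataReader might group the transactions of such a log into sessions keyed by client IP address, a short sketch follows. A minimal UserSession stand-in is included so the sketch compiles, building on the UserRequest sketch shown earlier; the constructor, add() and getRequests() names are assumptions, not the project's implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: one session per client IP address.
class UserSession {
    private final String clientIP;
    private final List<UserRequest> requests = new ArrayList<UserRequest>();

    UserSession(String clientIP) { this.clientIP = clientIP; }

    void add(UserRequest request) { requests.add(request); }

    List<UserRequest> getRequests() { return requests; }
}

class SessionGrouping {
    // Groups the UserRequest objects created from each <transaction>
    // into one UserSession per client IP address.
    static Map<String, UserSession> groupByClientIp(List<UserRequest> userRequests) {
        Map<String, UserSession> sessions = new HashMap<String, UserSession>();
        for (UserRequest request : userRequests) {
            String ip = request.getClientIP();
            UserSession session = sessions.get(ip);
            if (session == null) {
                session = new UserSession(ip);
                sessions.put(ip, session);
            }
            session.add(request);
        }
        return sessions;
    }
}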


Once the XML log is processed, SessionDataReader creates a UserSession associated with each client IP address.

4. TestCase
This package contains the classes which generate test cases using the algorithms described in the research paper [1]. All of the test case generation is implemented using polymorphic behaviour; a code sketch of the hierarchy is given after Fig 4.11 below.

a. TestCase
The root of all classes that implement a test case generation algorithm. It is an abstract class which defines abstract methods to be implemented by its subclasses, and it also implements methods that are common to all of them.

b. UserSession1
Defines the User Session 1 test case generation algorithm by overriding the base class's generate() method. Its implementation is the same as capture-replay, i.e. the test executes the requests in the same sequence as in the log file.

c. UserSession2
Implements the User Session 2 test case generation algorithm by overriding the base class's generate() method. It is a minor modification of User Session 1: the algorithm mixes user requests randomly from two different user sessions.

d. WhiteBox1
Implements the White Box 1 test case generation algorithm. In order to select test data from the pool of available user sessions, the tester has to specify the URLs to be tested. The algorithm then reads the configuration file and selects test data randomly from the pool.

e. WhiteBox2
Implements White Box 2 testing, and also integrates Hybrid testing by providing values from the log file. For this algorithm to select test data, the tester has to supply the entire web site configuration as an XML file. The algorithm creates the set of navigation URLs from the configuration file and, for each possible navigation path, selects random test data from the pool of available session data.

Fig 4.11 Hierarchy of the test case generation classes
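The sketch below illustrates the polymorphic design described above: an abstract TestCase root whose subclasses each override generate(). Only the capture-replay style User Session 1 algorithm is shown, and the constructor and field names are illustrative assumptions rather than the project's code.

import java.util.ArrayList;
import java.util.List;

// Sketch only: abstract root of the test case generation hierarchy.
abstract class TestCase {
    protected final List<UserSession> sessions;   // pool of sessions read from the log

    TestCase(List<UserSession> sessions) {
        this.sessions = sessions;
    }

    // Each algorithm supplies its own strategy for producing test requests.
    abstract List<UserRequest> generate();
}

// Sketch only: capture-replay style generation, requests in the same order as the log.
class UserSession1 extends TestCase {
    UserSession1(List<UserSession> sessions) {
        super(sessions);
    }

    List<UserRequest> generate() {
        List<UserRequest> testCases = new ArrayList<UserRequest>();
        for (UserSession session : sessions) {
            testCases.addAll(session.getRequests());
        }
        return testCases;
    }
}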


5. Testing
The testing package implements the testing procedure for the User Session and White Box testing methods. The classes within this package are responsible for communicating with the other packages in order to perform the test and store the results.

a. Test
The root class of the testing classes. It is an abstract class that provides common functionality for its subclasses and defines abstract methods to be implemented by them.

b. UserSessionTest
Implements the abstract run() method of the Test class with the algorithm required by user session based tests.

c. WhiteBoxTest
Implements the abstract run() method of the Test class with the algorithm required by the white box testing methods.

Fig 4.12 Classes responsible for running the tests

While the test is executed, the HTTP responses are analysed and checked against the expected responses of the system. First the run() method checks for any HTTP errors in the response. If the response does not contain any errors, it is then checked for equality against the expected HTTP response that was set during log file processing. If the response received during the test matches the expected response, the result is marked as a pass. A sketch of this decision follows below.
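A simplified sketch of that pass/fail decision; the parameters stand in for the project's HttpResponse class and the method name isPass() is an assumption. The content check mirrors the <test type="status"> and <test type="content"> entries of the log in Fig 4.10.

// Sketch only: decides whether a response received during the test counts as a pass.
class ResponseChecker {
    static boolean isPass(int actualStatus, String actualBody,
                          int expectedStatus, String expectedContent) {
        // Reject outright HTTP errors first (4xx and 5xx status codes).
        if (actualStatus >= 400) {
            return false;
        }
        // A status test passes when the codes are equal; a content test passes
        // when the expected text appears in the returned document.
        if (expectedContent == null) {
            return actualStatus == expectedStatus;
        }
        return actualBody != null && actualBody.indexOf(expectedContent) >= 0;
    }
}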


6. Overall UML Design

Fig 4.13 Overall UML design of the system


7. Database Design
A database is maintained by the system to store and analyse test results. The TestType and ResponseCode tables are filled with predefined values; the TestType table contains the four types of test that can be performed by the system and a description of what each test means. The other tables are filled with data as the tests progress.

Fig 4.14 Database design diagram for test analysis

1. TestType
The TestType table maintains a textual description of the types of test supported by the system. A TestTypeId uniquely identifies the type of test being carried out.

2. Test
This table contains details about each test carried out by the tester. When a test starts, an entry is made in the table with the TestType, the Date and an auto-generated TestID. The TestID is the primary key and is generated from the timestamp at which the test was started (a small sketch of generating this format is given after Fig 4.15).

26-7-2005 10:27:12:703
[Day]-[Month]-[Year] [Hour]:[Minute]:[Second]:[Millisecond]
Fig 4.15 Sample Test ID
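A small sketch of how such a TestID could be produced in Java; the pattern string is inferred from the sample above and may differ from the one actually used by the project.

import java.text.SimpleDateFormat;
import java.util.Date;

public class TestIdExample {
    public static void main(String[] args) {
        // Day-Month-Year Hour:Minute:Second:Millisecond, as in the sample TestID.
        SimpleDateFormat format = new SimpleDateFormat("d-M-yyyy HH:mm:ss:SSS");
        String testId = format.format(new Date());
        System.out.println(testId);   // e.g. 26-7-2005 10:27:12:703
    }
}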

3. ResponseCode
The ResponseCode table maintains the RFC-defined HTTP response codes and their descriptions.


4. Response
This table stores the response for a test request. ResponseId is an auto-generated primary key. Each response is associated with a Request; the RequestId foreign key maintains the relation with the Request table. An IsPassed flag keeps track of whether this response is considered a successful test or not. The Response field stores the response message returned by the HTTP request. The Html column is kept for future enhancements; it will store the actual content returned by the web server for that particular request.

2, 2, 200, OK, NULL, 1
Fig 4.16 Sample data for the Response table

5. Request
One test will include a number of requests, depending on the size of the log file. Each request has an auto-generated RequestId as its primary key, and a TestId foreign key that associates the request with the test it belongs to. The URL field stores the request URL used for testing. The ResponseCode field indicates the response expected for the request; it is compared with the actual response to decide whether the test is a success or a failure. The Type field indicates the type of request generated.

1, http://www.bbc.co.uk/tv/, 26-7-2005 10:27:12:703, 0, 200, ""
3, http://www.bbc.co.uk/bbcone/listings/index.shtml?service_id=4223, 26-7-2005 10:27:12:703, 0, -1, August
Fig 4.17 Sample Request table rows: the first checked against the ResponseCode and the second against the Response message

6. Parameter
Some HTTP requests have parameters associated with them. The parameters used to test the web application are stored in the database for future analysis. ParameterId is an auto-generated value, and since each parameter belongs to a Request the foreign key RequestId is also included in the table; together ParameterId and RequestId form a combined primary key. The remaining columns, ParameterName and ParameterValue, hold the values used for that particular request during testing. A hedged JDBC sketch of storing a test response row in this schema follows below.
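To illustrate how a test outcome might be written to this schema, a minimal JDBC sketch is shown. The connection URL, driver and the exact column list of the Response table are assumptions drawn from the descriptions above, not the project's configuration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StoreResponseExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; the real data source is project specific.
        Connection con = DriverManager.getConnection("jdbc:odbc:webmark", "user", "password");
        try {
            // Column names assumed from the description above; ResponseId is auto-generated.
            PreparedStatement ps = con.prepareStatement(
                "INSERT INTO Response (RequestId, ResponseCode, Response, Html, IsPassed) " +
                "VALUES (?, ?, ?, ?, ?)");
            ps.setInt(1, 2);                              // request this response belongs to
            ps.setInt(2, 200);                            // HTTP status returned during the test
            ps.setString(3, "OK");                        // response message
            ps.setNull(4, java.sql.Types.VARCHAR);        // Html column reserved for future use
            ps.setInt(5, 1);                              // IsPassed flag (1 = pass)
            ps.executeUpdate();
            ps.close();
        } finally {
            con.close();
        }
    }
}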


4.4 VIEW TEST RESULTS
An HTTP interface is provided to view the test results. The tester is able to select a test from the list of tests carried out by the system.

Fig 4.18 View Test Results

The requests which resulted in an error are highlighted in red. By selecting one of the requests, the tester is able to see the request in detail. The details page contains the parameters used for testing.

Fig 4.19 View Test URL parameters

In addition to viewing the request parameters, the tester is also able to see the real request made by the system by clicking the View full request link at the bottom. The link takes the tester to the real site with the parameters used for testing.

Chapter 5

Evaluation
“You are almost there. But there are a few things we have to discuss and address” - Andrew Macinnes, BBC Media Centre

Since this project is geared towards the interests of the BBC web site and the log files generated on their servers, it was presented to a BBC staff member, Andrew Macinnes. His initial feedback is quoted above.

The discussion with Andrew concluded that the following areas need more attention:

i. How the system will perform given gigabytes of data in the log file.
ii. The BBC would be interested in selectively testing a few pages. These pages have to be detected automatically by the system, which should select the pages accessed most frequently by users.
iii. Automating the selection of test data so that the data used for testing is the data most used by real users.

For the first point, it was not possible to obtain gigabytes of data within the time permitted. However, the technology used to process the log file is an industry-standard XML processing framework from Sun Microsystems, so we can expect that this issue can be addressed with minor changes.

The second point is similar to the White Box 1 testing algorithm, where the tester has to specify the pages to be tested by the application. In order to select frequently used pages, the test case generation algorithm would have to be modified to pre-process the log file and generate statistics of the number of accesses made against each URL page name. Using these statistics, the White Box 1 algorithm could automatically select the most hit pages for testing.

The third point can be addressed by adding a filter mechanism to the log file processing module. While processing the log file, the filtering module would be responsible for building statistics about the data used against the relevant URL page request. Once the statistics are available, the test case generation module would select data based on the statistics rather than selecting it at random.

The testing server was able to detect an error in the BBC web pages. On Fridays the timetable for BBC programmes is updated, and if a user accesses the Friday timetable page between midnight and 1.00 PM, no timetable is returned. This bug is already known to the BBC; the reason for it is that the timetable for the following week is updated during Friday midnight.

The table below shows the number of test cases generated for a log file with 20 transactions. The test was carried out against http://www.bbc.co.uk/tv.

Testcase         Number of Test cases   Number of Errors detected
User session 1   20                     2
User session 2   19                     2
White box 1       3                     1
White box 2      14                     0
Hybrid           17                     1

Table 5.1 Number of test cases generated and errors detected


[Bar chart: number of test cases generated by each algorithm: User session 1, User session 2, White box 1, White box 2, Hybrid]

Fig 5.2 Comparison of test cases generated for a given log file

As we can see from the graph, the number of test cases generated varies between algorithms. The User Session 1 algorithm always creates the same number of test cases as are available in the log file. The User Session 2 algorithm, on the other hand, creates either the same number of test cases or fewer than are available; the key point is that it creates test cases which differ from the ones in the log file. The White Box 1 test case generation algorithm creates test cases equal in number to the URLs specified in the white box configuration file, so in the case of White Box 1 the number of test cases is given by

Number of Test cases ∝ Number of URLs in the configuration file

Fig 5.3 Relation between number of test cases and URLs in the config file

Like the White Box 1 algorithm, the White Box 2 algorithm is also limited, in this case by the overall web site map configuration. If, for example, the web site contains 15 different navigation paths, the number of test cases generated will be equal to the number of possible navigations:

Number of Test cases ∝ Number of Navigation paths

Fig 5.4 Relation between number of test cases and web site map

To compare the error detection ability of the different algorithms, the bar chart below shows the number of test cases used for each algorithm and the number of errors the system was able to detect. The total number of errors present in the system was two. As we can observe from the statistics, both user session type algorithms were able to detect both errors, whereas the White Box 1 technique was able to detect only one. The error detection capability of the white box algorithms is restricted by the randomisation used in test data selection. For the White Box 1 algorithm to detect an error, two conditions have to be satisfied in the current implementation: first, the URL containing the error has to be specified in the White Box 1 configuration file; and second, when test data is selected from the log file, the set of data which causes the error has to be used during testing. The White Box 2 algorithm needs to satisfy only one condition in the current implementation: during test data selection, the algorithm has to pick the set of values which caused the error.
[Bar chart: test case size and errors detected for each algorithm: User session 1, User session 2, White box 1, White box 2, Hybrid]

Fig 5.5 Effectiveness in error detection

Chapter 6

Conclusion

Testing web applications is an interesting and challenging area, and a great deal of work has been carried out in both industry and academic research. Identifying the unique features of web applications and utilising user data to test and validate them is challenging and requires considerable time and effort. We identified features that are unique to web applications and used them in test case generation. The originally proposed method was modified so that the application under test does not need to be modified. Tests carried out against one of the world's most visited web sites, http://www.bbc.co.uk, showed that more scalability and further modifications to the algorithms are required to suit its needs. The main challenges faced in the implementation were, first, processing and transferring XML files using a generic interface; second, capturing HTTP packets on the server side; and third, deploying the testing server in different LANs, which requires the server to be configured differently to use a proxy. Discussions with BBC staff also point towards future work in the areas of scalability and test data selection.


6.1 Future work
While working on the initial requirements we found areas that could be further improved, relating to the number of test cases generated, the effectiveness of error detection, and scalability. The white box algorithms could be modified so that random data selection is replaced by either the most frequently used data or all available data; doing so would increase the effectiveness of error detection. Selecting frequently used data is one of the methods desired by the BBC (a small sketch of this idea is given below). To give more flexibility in error detection, the view results page could be modified so that, while viewing the test results, the tester can provide values at run time and re-execute the test with those dynamic values. The current testing server also runs in single-threaded mode; adding multithreading would allow the tester to test the system for scalability as well. The current system successfully tests pages with simple links, but there are applications which have parent-child relations between web pages, and most of the tools available in the industry do not support testing of this type of page organisation, so this is another area in which future work could be carried out. Finally, the HTTP packet capturing module needs modification in order to log the response messages from the server; at present the module only logs the requests to the log file.
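A minimal sketch of the frequency-based selection idea, assuming the candidate test data is represented simply as the request URLs taken from the log; the class and method names are illustrative only.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: picks the most frequently used request URL from the log
// instead of selecting test data at random.
class FrequencySelection {
    static String mostFrequentUrl(List<String> requestUrls) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String url : requestUrls) {
            Integer count = counts.get(url);
            counts.put(url, count == null ? Integer.valueOf(1) : Integer.valueOf(count + 1));
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }
}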


Chapter 7

Bibliography
GLOSSARY

DTD - Document Type Definition
HTTP - Hyper Text Transfer Protocol
JAXB - Java Architecture for XML Binding
JDBC - Java Database Connectivity
Jpcap - Java package for packet capture
LAN - Local Area Network
RFC - Request for Comment
URL - Uniform Resource Locator
UML - Unified Modelling Language
WinPcap - Windows Packet Capture Library
XML - EXtensible Mark-up Language

REFERENCE
[1] Improving Web Application Testing with User Session Data: Sebastian Elbaum, Srikanth Karre, Gregg Rothermel
[2] Hyper Text Transfer Protocol – HTTP 1.1: http://www.w3.org/Protocols/rfc2616/rfc2616sec10.html
[3] Rational Unified Process: http://www-306.ibm.com/software/rational/
[4] EXtensible Mark-up Language: http://www.w3schools.com
[5] Apache log files: http://httpd.apache.org/docs-2.0/logs.html
[6] IIS log files: http://www.microsoft.com/resources/documentation/WindowsServ/2003/standard/proddocs/en-us/Default.asp?url=/resources/documentation/WindowsServ/2003/standard/proddocs/enus/log_aboutlogging.asp
[7] Windows Packet Capture Library: http://www.winpcap.org/
[8] Java package for packet capture: http://netresearch.ics.uci.edu/kfujii/jpcap/doc/index.html
[9] Java Architecture for XML Binding: http://java.sun.com/xml/jaxb/about.html
[10] PHP Documentation: http://www.php.net/
[11] Java: http://java.sun.com
[12] .NET: http://msdn.microsoft.com/
[13] BBC TV web site: http://www.bbc.co.uk/tv
[14] HTTPUnit Testing tool: http://httpunit.sourceforge.net/


[15] HTMLUnit Testing tool: http://htmlunit.sourceforge.net/
[16] MaxQ Testing tool: http://maxq.tigris.org/
[17] JWebUnit Testing tool: http://jwebunit.sourceforge.net/
[18] SlimDog testing tool: http://slimdog.jzonic.org/


Appendix A – Project Source code
The following pages contain the source code of the project. The source code is also included on the CD.



								