INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 – 6375(Online) Volume 4, Issue 2, March – April (2013), © IAEME

ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 2, March – April (2013), pp. 356-365
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com




     IMPROVING ACCESS LATENCY OF WEB BROWSER BY USING
          CONTENT ALIASING IN PROXY CACHE SERVER

                                Sachin Chavan¹, Nitin Chavan²
              ¹ Department of Computer Engineering, MPSTME, NMIMS, Shirpur
              ² Department of Information Technology, MPSTME, NMIMS, Shirpur




  ABSTRACT

          The web community is growing so quickly that the number of clients accessing web
  servers is increasing at a tremendous rate. This rapid increase in web clients has affected
  several aspects of the web: available network bandwidth has fallen, latency has increased,
  and response times have risen for users who require large-scale web services. This paper
  considers different types of proxy actions, studies how they influence browser display time,
  and proposes a novel design and methodology to address these issues. It also discusses
  acceptable loading times and the scope of cacheable objects. The methodology works by
  analysing content in the proxy cache, identifying content aliasing, suppressing duplicates,
  and creating the corresponding soft links. The solution makes intelligent use of the proxy
  cache server to overcome these problems. In this study, proxies were designed to enable
  network administrators to control internet access from within the intranet. When a proxy
  cache is used, however, the problem of aliasing arises. Aliasing in proxy server caches
  occurs when the same content is stored in the cache several times. The methodology reduces
  access latency and browser response time, and at the same time avoids storing the same
  content in the cache multiple times, which would waste storage space.

  KEYWORDS: Access Latency, Cache, Web Proxy, Mirroring, Duplicate Suppression,
  Content Aliasing.





1. INTRODUCTION

        In the field of web server management, researchers have long studied aliasing in proxy
server caches. Web caching consists of storing frequently referenced objects on a caching
server instead of fetching them from the origin server each time, so that network bandwidth
is used more efficiently, the workload on web servers falls, and response time for users
improves. Aliasing means giving multiple names to the same thing.
        The proxy cache also stores the images and sub-files of visited pages, so if
the user jumps to a new page within the same site that uses, for example, the same images,
the proxy cache already has them stored and can load them into the user's browser more
quickly than if they had to be retrieved from the remote web server. Aliasing in proxy server
caches occurs when the same content is stored in the cache multiple times. On the World Wide
Web, aliasing commonly occurs when a client makes two requests to different URLs and both
responses carry the same payload. Currently, browsers perform cache lookups using Uniform
Resource Locators (URLs) as identifiers, so identical content under two URLs is cached twice.
        Websites that contain the same content are called mirrors. Mirrors are redundancy
mechanisms built into the web space to serve web pages faster, but they cost cache space.
As web traffic increases, efficient utilization of network bandwidth becomes increasingly
important. Web traffic must therefore be analysed to understand its characteristics; this
makes it possible to optimize the use of network bandwidth, reduce network latency, and
improve response time for users [8].
        A proxy cache is a shared network device that can undertake Web transactions on
behalf of a client, and, like the browser, the proxy cache stores the content. Subsequent
requests for this content, by this or any other client of the cache will trigger the cache to
deliver the locally stored copy of the content, avoiding a repeat of the download from the
original content source [4].




              [Figure: Proxy Cache Server; Bandwidth Saving and Traffic Reduction]

                         Figure 1. Concept of Caching (Proxy Cache)





1.1 Advantages of Caching
1. Web caching reduces the workload of the remote web server.
2. A client can obtain a cached copy from the proxy if the remote server is not available.
3. It provides a chance to analyze an organization's usage patterns.

1.2 Disadvantages of Caching
1. A client might be looking at stale data if the proxy is not updated properly.
2. Access latency may increase on a cache miss because of the extra proxy processing.
3. A single proxy cache can become a bottleneck.
4. A single proxy is a single point of failure.

2. RELATED WORK

2.1 The Access Latency
        Latency is defined as the delay between a request for a web page and receipt of that
page in its entirety. The latency problem occurs when users judge the download to take too
long. Unacceptable latency does not only harm user satisfaction: web pages that load faster
are also judged to be significantly more interesting than their slower counterparts [12].
        Studies on human cognition have revealed that a response time shorter than 0.1 second
is unnoticeable and that a delay of 1 second matches the pace of interactive dialog. Table 1
shows the transfer rates of different connection types.

                   Table 1. Transfer Rates for Different Connection Types
                  Connection Type                 Slow      Normal      Maximum
        Modem 33k6                               <2.734       ≈3          ≈3.65
        Modem 56k                                <4.199       ≈5          ≈6.08
        ISDN 64k                                 <5.469       ≈6          ≈6.94
        Cable                                    <9.766       ≈17      by provider
        ADSL                                     <12.21       ≈24         ≈732
        Ethernet 10Base-T (10 Megabits/sec)        <73       ≈195         ≈977

Table 1 shows parameters that affect the access time of the browser: the type of connection
used and the condition of that connection. The time of day at which the internet is used
also affects access latency, because bandwidth is shared.

2.2 Web Traffic
        Web traffic is the amount of data sent and received by visitors to a website. It is
analysed to gauge the popularity of websites and of individual pages or sections within a
site. Web traffic can be analyzed by viewing the traffic statistics found in the web server
log file, an automatically generated list of all the pages served.
        Traffic analysis is conducted using access logs from the web proxy server. Each entry
in the access log records the URL of the document requested, the date and time of the
request, the name of the client host making the request, the number of bytes returned to the
requesting client, and information that describes how the client's request was treated by
the proxy [1].
Processing these log entries can produce useful summary statistics about workload volume,
document types and sizes, document popularity, and proxy cache performance [5].
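
The summary statistics described above can be sketched in a few lines of Python. This is only an illustration: the log layout, the host names, and the `summarize` helper are assumptions made for the example, not the log format or tooling used in this study.

```python
from collections import Counter

# Hypothetical log layout (an assumption, not the format used in the paper):
# "<timestamp> <client_host> <bytes> <HIT|MISS> <url>"
SAMPLE_LOG = """\
1367800000 host-a 5120 MISS http://example.org/index.html
1367800005 host-b 5120 HIT http://example.org/index.html
1367800009 host-a 20480 MISS http://example.org/logo.png
1367800012 host-c 5120 HIT http://example.org/index.html
"""

def summarize(log_text):
    """Compute workload volume, hit ratio, and the most requested URL."""
    total_bytes = 0
    hits = 0
    requests = 0
    popularity = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, _client, size, result, url = line.split(maxsplit=4)
        requests += 1
        total_bytes += int(size)
        hits += result == "HIT"
        popularity[url] += 1
    return {
        "requests": requests,
        "total_bytes": total_bytes,
        "hit_ratio": hits / requests if requests else 0.0,
        "most_requested": popularity.most_common(1),
    }

summary = summarize(SAMPLE_LOG)
print(summary)
```

A real proxy log would carry more fields per entry, but the same pass over the entries yields the workload, popularity, and cache-performance figures mentioned in the text.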


2.3 Static Caching
        Static caching is an approach to web caching that uses yesterday's log to predict
today's user requests. The static caching algorithm defines a fixed set of URLs by analyzing
the logs of previous periods. It calculates a value for each unique URL, arranges the URLs in
descending order of value, and selects the URLs with the highest values. This set of URLs is
known as the working set. When a user requests a document and the document is present in the
working set, the request is fulfilled from the cache. Otherwise, the request is fulfilled
from the origin server [6].
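
A minimal sketch of the working-set idea follows. The text does not define how a URL's value is computed, so the example assumes the value is simply the request count in the previous period's log; `build_working_set` and `serve` are hypothetical names used only for illustration.

```python
from collections import Counter

def build_working_set(previous_log_urls, size):
    """Pick the `size` highest-valued URLs from the previous period's log.

    Assumption: a URL's value is its request count. Ties are broken
    alphabetically so the result is deterministic.
    """
    counts = Counter(previous_log_urls)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {url for url, _count in ranked[:size]}

def serve(url, working_set):
    """Say where a request would be fulfilled from."""
    return "cache" if url in working_set else "origin"

yesterday = ["/a", "/b", "/a", "/c", "/a", "/b"]
working_set = build_working_set(yesterday, size=2)
print(serve("/a", working_set), serve("/c", working_set))
```

Because the working set is fixed for the whole period, no replacement decisions are made at request time; the cost is that a document that becomes popular today is not cached until the next log analysis.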

2.4 Dynamic Caching
        Dynamic caching is more complex than static caching and requires detailed
knowledge of the application. Candidates for dynamic caching must be considered
carefully since, by its very nature, dynamically generated content can differ depending on
the state of the application. It is therefore important to consider under what conditions
dynamically generated content can be cached while still returning the correct response. This
requires knowledge of the application, its possible states, and other data, such as the
parameters that ensure the dynamic data is generated in a deterministic manner [3].
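
One way to picture the role of request parameters is a cache key built from the path plus every parameter that influences the output. The sketch below assumes the generated page depends only on those parameters; `cache_key`, `DynamicCache`, and `render` are hypothetical names, not part of any system described here.

```python
from urllib.parse import urlencode

def cache_key(path, params):
    """Build a key from the path and the sorted request parameters."""
    return path + "?" + urlencode(sorted(params.items()))

class DynamicCache:
    def __init__(self, generate):
        self._generate = generate   # hypothetical page generator
        self._store = {}
        self.generated = 0          # how many times the generator ran

    def get(self, path, params):
        key = cache_key(path, params)
        if key not in self._store:
            # Cache miss: generate the page once and remember it.
            self._store[key] = self._generate(path, params)
            self.generated += 1
        return self._store[key]

def render(path, params):
    # Stand-in for an application that builds a page from its parameters.
    return f"{path} rendered for {sorted(params.items())}"

pages = DynamicCache(render)
first = pages.get("/report", {"year": "2013", "dept": "it"})
second = pages.get("/report", {"dept": "it", "year": "2013"})  # same key
third = pages.get("/report", {"dept": "it", "year": "2012"})   # new key
```

If the output also depended on hidden application state (a session, the time of day), that state would have to become part of the key or the content would have to be left uncached.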

2.5 MD5 Algorithm
        MD5, developed by Ron Rivest in 1992, is a cryptographic hash algorithm
that succeeded the MD4 algorithm. MD5 takes an input of any length and generates a digest
of fixed length (128 bits, written as 32 hexadecimal characters). Because MD5 is
deterministic, a particular data string always generates the same MD5 hash.
The MD5 cryptographic hash offers several advantages over its predecessors (such as MD4) and
its competitors (such as SHA and SHA-1). One of these advantages is that MD5 is a one-way
cryptographic hash. Another is that MD5 accepts inputs of any length but still generates a
fixed-length output. MD5 is fast, and it is highly unlikely that two different strings hash
to the same digest by chance. Furthermore, MD5 is reliable in the sense that the same input
string always yields the same output digest [11].
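
The fixed-length and repeatability properties can be checked with Python's standard `hashlib` module. The sample payloads below are arbitrary; this is a small illustration, not code from the study.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()

short_digest = md5_hex(b"a")
long_digest = md5_hex(b"a" * 100_000)
repeat_digest = md5_hex(b"a")

# Inputs of very different lengths give digests of the same length,
# and the same input always gives the same digest.
print(len(short_digest), len(long_digest), short_digest == repeat_digest)
```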

3. EXPERIMENTAL SETUP

3.1 Changing of proxy server
         In most organizations and institutions the main server does not support proxy
caching, so it is difficult to use the main server as the cache server. The proxy server must
therefore be changed from the main server to another server [2].
The steps to switch a machine to the other proxy are as follows:
     1. Open the browser, e.g. Internet Explorer.
     2. In Internet Explorer, pull down the Tools menu and click Internet Options...
     3. Click the Connections tab.
     4. Click the LAN Settings... button.
     5. In the Address: box, change "proxy1 Address" to "proxy2 Address" or vice versa and
         click OK.
     6. Click OK on the Internet Options dialogue box to get back to the browser screen;
         you will now be able to reach external sites.
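
The steps above configure the browser by hand. For scripted measurements, the same switch can be sketched with Python's standard `urllib.request`; the proxy address below is a placeholder, since the addresses used in the experiment are not given.

```python
import urllib.request

# Hypothetical proxy address; replace with the second proxy's real host and port.
PROXY2 = "http://proxy2.example:3128"

def opener_via(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

opener = opener_via(PROXY2)
# opener.open("http://www.example.org/") would now be routed through PROXY2.
```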




3.2 Duplication of Data
        Duplication of data means storing multiple copies of the same data object. When an
object or web page is cached, it is stored in cache memory; when different users then request
the same page, multiple copies of that object or web page may be stored in cache memory,
which wastes storage space. Because maintaining a cache is expensive, such waste is not
affordable. To avoid duplicating data objects or web pages, a duplicate suppression mechanism
must be used [7]. Duplicate copies saved in the proxy cache occupy additional storage space;
the analysis shows the effect of duplication on cache space [4].

3.3 Duplicate Suppression
        Storage space requirements can be reduced by avoiding duplicate copies of the
same data. Content Engine provides the option to suppress storage of duplicate content
elements. Duplicate suppression applies to any kind of content. Incoming content is not
added to the storage area if identical content already exists there; only unique content is
added [14].
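
Duplicate suppression combined with URL aliases can be sketched as a content-addressed store: payloads are kept once under their MD5 digest, and each URL is a lightweight pointer to a digest, playing the role of a soft link. The `AliasingCache` class and the example URLs are hypothetical and only illustrate the idea; they are not the implementation evaluated in this paper.

```python
import hashlib

class AliasingCache:
    """Store each distinct payload once; URLs are aliases to a digest."""

    def __init__(self):
        self._content_by_digest = {}   # digest -> payload (stored once)
        self._digest_by_url = {}       # url -> digest (the "soft link")

    def put(self, url, payload: bytes) -> bool:
        """Cache `payload` for `url`. Return True if new content was stored."""
        digest = hashlib.md5(payload).hexdigest()
        is_new = digest not in self._content_by_digest
        if is_new:
            self._content_by_digest[digest] = payload
        self._digest_by_url[url] = digest
        return is_new

    def get(self, url):
        """Return the cached payload for `url`, or None on a miss."""
        digest = self._digest_by_url.get(url)
        return None if digest is None else self._content_by_digest[digest]

    def stored_bytes(self) -> int:
        """Total size of the payloads actually held in the cache."""
        return sum(len(p) for p in self._content_by_digest.values())

cache = AliasingCache()
page = b"<html>same page</html>"
stored_first = cache.put("http://site.example/index.html", page)
stored_mirror = cache.put("http://mirror.example/index.html", page)
```

Lookups are still by URL, as browsers and proxies expect, but the second URL adds only a small mapping entry instead of a second copy of the page.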
Because the web is so large, most pages are never referenced more than once by any one
cache. The pages that are re-referenced follow a distribution similar to Zipf's law: the
probability that the Kth most popular page is referenced is proportional to 1/K [9].
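
As a worked illustration of the 1/K relationship, the sketch below normalizes the weights 1/K over a finite set of pages so that they sum to one. The page count of 1000 is an arbitrary assumption for the example.

```python
def zipf_probabilities(n_pages):
    """Reference probability of the K-th most popular page, K = 1..n_pages,
    with each probability proportional to 1/K."""
    weights = [1.0 / k for k in range(1, n_pages + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_probabilities(1000)
top_ten_share = sum(probs[:10])   # share of references going to the top 10 pages
print(f"top 10 of 1000 pages receive {top_ten_share:.0%} of references")
```

Under this distribution a small set of popular pages attracts a large share of the requests, which is why even a modest cache can serve many of them.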

3.5 Experimental Results
        The experiments were carried out in the lab of our institute. Some popular websites
were selected and used to analyse the access latency of the browser under different
conditions. Keyword-based searches were also used to measure latency for two types of
content: text search and image search.

            Table 2. Response time of search engine for Text and Image Search.
                               Text Search                         Image Search
                          From            From                From            From
    Keywords            Web Server    Cache Server          Web Server    Cache Server
    SVKM                   250             140                 230             200
    NMIMS                  140             130                 300             100
    RCPIT                  250             120                 350             150
    CANNON                 240             130                 250             100
    SAMSUNG                210             140                 640             200
    NOKIA                  250             190                 240             120
    MATLAB                 240             160                 280             160
    OPERA                  250             150                 120             120
    SIEMENS                230             160                 310             100
    MICROMAX               160             140                 190             110
    MPSC                   170             140                 180             100
    UPSC                   210             150                 150             140
    IRCTC                  160             140                 330              90
    RRB                    260             120                 310              70



        Table 2 shows the reduction in browser response time when a page is fetched from the
cache server instead of the web server. The first column lists the keywords used for the
analysis; the same keywords are used for the text search and the image search. From the
table we can say that access latency is reduced considerably when the page is fetched from
the proxy cache.
        Figure 2 plots, for the text search, the response time when the page is fetched from
the main server against the response time when it is fetched from the proxy cache. For each
keyword, the first bar shows the response time when the page is fetched from the web server
and the second bar shows the response time when the page is fetched from the local proxy
cache server on which the content aliasing algorithm is implemented. Figure 2 shows a
considerable reduction in response time.




                 Figure 2. Response time of Search engine for Text Search

       According to Figure 2, text search for a keyword yields a reduction in response time
of 40 percent or more. For some keywords, such as Samsung, IRCTC, RRB, and Siemens, the
response time is reduced by more than 70 percent, whereas for Opera, SVKM, and UPSC the
reduction is negligible or at most 10 percent. This is attributed to the dynamic content
returned by the search.
       Figure 3 plots, for the image search, the response time for the given keywords when
the page is fetched from the main server against the response time when it is fetched from
the proxy cache. For each keyword, the first bar shows the response time when the page is
fetched from the web server and the second bar shows the response time when the page is
fetched from the local proxy cache server on which the content aliasing algorithm is
implemented. Figure 3 shows that the response times differ.







                Figure 3. Response time of Search engine for Image Search

       According to Figure 3, image search for a keyword yields only a small reduction in
response time, because images are more dynamic than text.

        Table 3. Connection Time and Response time of browser for some Websites.
                                       From Web Server          From Cache Server
      WEBSITE                        Connection Response       Connection Response
      www.nmims.edu                    7000      44000           3000       14000
      www.rcpit.ac.in                  6120      26140           3920       10310
      www.mpsc.gov.in                  5800      25700           3200       6390
      www.upsc.gov.in                  1890       4760            320        690
      www.unipune.ac.in                2480       8600           1130       1580
      www.wipro.com                    2300      24750            900       3780
      www.infosys.com                  1710      18180            770       1980
      www.techmahindra.com              990      18000           1260       7250
      www.jaihindcollege.com           1210      13230            500       1170
      www.jaihindcollege.ac.in         1800      15930            540       1040
      www.msbte.com                    1800      10170            810       1130
      www.msbshse.ac.in                1530       4550            540       1040
      www.cbse.nic.in                  1130       5580            630        900
      www.irctc.com                    1710      12960           1670       3240

        Table 3 shows the connection time and response time of the browser for various
websites, comparing the values obtained when a page is fetched from the cache server with
those obtained from the web server. The first column lists the websites used for the
analysis. From the table we can say that access latency is reduced considerably when the
page is fetched from the proxy cache instead of the main server.
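
The size of the reduction can be recomputed directly from Table 3. The sketch below does this for a few rows, using the response-time values exactly as tabulated (the table does not state their unit); `percent_reduction` is a helper name introduced only for this example.

```python
def percent_reduction(from_web_server, from_cache_server):
    """Reduction, in percent, relative to the time measured from the web server."""
    return 100.0 * (from_web_server - from_cache_server) / from_web_server

# (from web server, from cache server) response times, copied from Table 3
response_times = {
    "www.nmims.edu": (44000, 14000),
    "www.upsc.gov.in": (4760, 690),
    "www.techmahindra.com": (18000, 7250),
    "www.irctc.com": (12960, 3240),
}

for site, (web, cached) in response_times.items():
    print(f"{site}: {percent_reduction(web, cached):.1f}% lower from cache")
```

A negative result means the time increased, as happens for the connection time of www.techmahindra.com in Table 3 (990 from the web server against 1260 from the cache server).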






                      Figure 4. Connection time for different Websites

       Figure 4 shows the effect of content aliasing on the access time of the web browser
in terms of connection time. In most cases the reduction in connection time is more than 50
percent; in some cases it is 30-50 percent. For the IRCTC website the reduction in
connection time is negligible, and for the 'TECHMAHINDRA' website the connection time
increased, which is attributed to the larger share of dynamic content on that website.




                      Figure 5. Response time for Different Websites

       Figure 5 shows a comparative graph of browser response time for different
websites. When the web page is fetched from the cache server, the response time is lower.
From the graph and the values in Table 3, the reduction in response time is about 60 percent
or more in every case, and in some cases more than 90 percent. Using content aliasing in the
proxy cache server therefore saves a significant amount of time in terms of both response
time and connection time.

       The results show that content aliasing saves users a considerable amount of time.
The reduction in access latency was achieved while also taking into account other parameters
such as cache size and stale data.

4. CONCLUSION

         The experimental results demonstrate the need for a methodology that improves web
access performance, enhances bandwidth utilization, and increases connection speed. The
suggested design improves web performance in terms of reduced latency, improved user
response time, and optimal use of the existing bandwidth through web caching. Content
aliasing was successfully detected using a web-based application, database queries, and
file system calls. A considerable amount of duplicate storage can be avoided through the
suggested methodology. It is, therefore, a very useful mechanism for web proxy caches.
Moreover, the solution keeps cached pages synchronized with the pages on the web server,
checking for new pages when needed. This work can be further optimized with a daemon
process, designed to run periodically and check the consistency of the cached data against
the data on the web server. The daemon can be scheduled during slack time, when traffic is
low, so that it does not place an additional toll on the bandwidth, and it can also update
the TTL (Time to Live) period of the cached data.
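
One pass of such a consistency check might look like the sketch below. It is an assumption-laden illustration of the proposed future work, not an implemented component: the cache is modelled as a plain dictionary, `fetch` stands in for a request to the web server, and the TTL value is arbitrary.

```python
import hashlib
import time

def refresh_expired(cache, fetch, now=None, ttl_seconds=3600):
    """Re-validate entries whose TTL has passed; return the URLs that changed.

    `cache` maps url -> {"body": bytes, "expires_at": float}.
    `fetch` is a hypothetical callable returning the current body for a URL.
    """
    now = time.time() if now is None else now
    changed = []
    for url, entry in cache.items():
        if entry["expires_at"] > now:
            continue  # still fresh, no traffic needed
        fresh_body = fetch(url)
        # Compare digests to detect whether the origin copy has changed.
        if hashlib.md5(fresh_body).digest() != hashlib.md5(entry["body"]).digest():
            entry["body"] = fresh_body
            changed.append(url)
        entry["expires_at"] = now + ttl_seconds  # renew the TTL either way
    return changed

origin = {"/a": b"new version", "/b": b"unchanged"}
cached = {
    "/a": {"body": b"old version", "expires_at": 10.0},
    "/b": {"body": b"unchanged", "expires_at": 500.0},
}
changed_urls = refresh_expired(cached, origin.__getitem__, now=100.0, ttl_seconds=60)
```

Running this pass from a scheduler during low-traffic hours would match the slack-time scheduling described above.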

REFERENCES

[1]   Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A
      Murthy “Internet Activity Analysis through Proxy Log” IEEE, 2010.
[2]   E-Services Team, “Changing Proxy Server,” The Robert Gordon University, Schoolhill,
      Aberdeen, Scotland, 2006.
[3]   Chen, W.; Martin, P.; Hassanein, H.S., "Caching dynamic content on the Web,"
      Canadian Conference on Electrical and Computer Engineering, 2003, vol.2, no., pp.
      947- 950 vol.2, 4-7 May 2003.
[4]   Sadhna Ahuja, Tao Wu and Sudhir Dixit “On the Effects of Content Compression on
      Web Cache Performance,” Proceedings of the International Conference on Information
      Technology: Computers and Communications, 2003.
[5]   Mark S. Squillante, David D. Yao and Li Zhang “Web Traffic Modeling and Web
      Server Performance Analysis” Proceedings of the 38th Conference on Decision &
      Control, Phoenix, Arizona, USA, December 1999.
[6]   C. E. Wills and M. Mikhailov, “Studying the Impact of More Complete Server
      Information on Web Caching,” Computer Communications, vol. 24, no. 2, pp. 184-190,
      May 2000.
[7]   J Wang “A Survey of Web Caching Schemes for the Internet” - Cornell Network
      Research Group (C/NRG), Department of Computer Science, Cornell University 1999.
[8]   N. Shivakumar and H. Garcia-Molina, “Finding near Replicas of Documents on the
      Web” Proc. Workshop on Web Databases, Mar. 1998.
[9]   L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf like
      Distributions: Evidence and Implications. In Proc. Infocom ’99. New York, NY, March,
      1999.





[10]   Guerrero, C.; Juiz, C.; Puigjaner, R.; "Web Performance and Behavior Ontology,"
       Complex, Intelligent and Software Intensive Systems, 2008. CISIS 2008. International
       Conference on, vol., no., pp.219-225, 4-7 March 2008.
[11]   Kimmo Jarvinen, Matti Tommiska and Jorma Skytta, “Hardware Implementation
       Analysis of the MD5 Hash Algorithm,” IEEE Computer Society. 2005.
[12]   Andrzej Sieminski, “The impact of Proxy caches on Browser Latency” International
       Journal of Computer Science & Applications, 2005, Vol. II, No. II, pp. 5 – 21.
[13]   S B Patil, Sachin Chavan, Preeti Patil; “High quality design to enhance and improve
       performance of large scale web applications” International Journal of Computer
       Engineering and Technology (IJCET), Volume 3, Issue 1, January- June (2012),
       pp. 198-205, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[14]   S.Vikram Phaneendra, “Minimizing Client-Server Traffic Based on AJAX”,
       International journal of Computer Engineering & Technology (IJCET), Volume 3,
       Issue 1, 2012, pp. 10 - 16, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[15]   A. Suganthy, G.S.Sumithra, J.Hindusha, A.Gayathri and S.Girija, “Semantic Web
       Services and its Challenges”, International journal of Computer Engineering &
       Technology (IJCET), Volume 1, Issue 2, 2010, pp. 26 - 37, ISSN Print: 0976 – 6367,
       ISSN Online: 0976 – 6375.



