          Bandwidth Optimization for Mobile Thin Client Computing through Graphical Update Caching

   B. Vankeirsbilck, P. Simoens, J. De Wachter, L. Deboosere, F. De Turck, B. Dhoedt, P. Demeester
                IBBT - Ghent University, Department of Information Technology
                              Gaston Crommenlaan 8, bus 201
                                   9050 Gent, Belgium
                        Tel +32 9 33 14937 - Fax +32 9 33 14899
          {bert.vankeirsbilck, pieter.simoens, jeroen.dewachter, lien.deboosere}

   Abstract—This paper presents graphical update caching as a mechanism to reduce the network load generated by thin client computing systems. In such a system, the user interaction and the processing are separated by a network. User input such as keystrokes and mouse clicks is sent to the server over the network, and graphical updates are transported the reverse way. The cache proposed in this article is static, meaning that it is composed before the thin client computing session starts and that it does not change during the session. Through experiments with an implementation of the cache, we show that graphical update caching effectively reduces the network load generated by thin client computing.

                      I. INTRODUCTION

   The thin client computing (TCC) concept comes down to moving the user's applications to a distant server and running a thin client protocol between client and server. The client device only deals with user interaction. The protocol forwards user input (e.g. keystrokes) to the server and delivers graphical updates back to the client for presentation on screen. This approach has proven to work well over LAN and WAN with reasonably high bandwidth [1]. We focus on mobile client devices, which connect to the server over a wireless link. Both from the wireless network and the client battery lifetime perspective, it is necessary to optimize bandwidth usage. If the network cannot supply enough bandwidth at a given time, the user interface will stutter; and the more network activity is needed, the more energy is drained from the client device battery.
   In this paper we propose graphical update caching as a method to reduce long-term redundancies in thin client sessions. When analyzing the sequence of graphical updates generated by a typical user session on a desktop computer, we found that many frames resemble others that were transmitted earlier in that session. Since we operate in a thin client environment where all graphical updates have to be sent over a wireless, limited-bandwidth network, this phenomenon can be exploited by storing well-chosen key frames both at client and server side, and transmitting only the differences with respect to such a frame. Since less data is received from the network at the client, the battery autonomy of the device decreases more slowly. This contributes to user satisfaction and lessens the load on the environment.
   We assess the bandwidth optimization potential of a static cache. This is a cache for graphical updates that is filled before the thin client computing session starts. When the user logs in, the cache is loaded both at client and server side, and does not get altered during this session.
   This article is organized as follows: section II discusses related work. In section III, our caching concept and some significant remarks are presented. Section IV tackles experimental validation through an implementation, and the results of our experiments are presented and discussed. Finally, conclusions are drawn in section V.

                     II. RELATED WORK

   Research has been conducted on various sorts of caches in thin client computing systems. Generally speaking, the difference with this related work is that we aim to reduce long-distance redundancies in a sequence of graphical updates by caching complete frames that contain visual pixel information.

A. Compression history extension
   In order to reduce the required network bandwidth, graphical updates should be represented by as few bytes as possible and redundant data should be minimized. In [2] it is shown that up to 23.3% network traffic reduction can be achieved with 512 Kbytes used for vertically extending the LZ history buffer. In essence, this vertical LZ history buffer extension means using multiple separate histories and selecting the best fitting history for a given chunk of data that needs compression, instead of using one longer history (i.e. horizontal extension).
   In [3] the authors identify the presence of data spikes and their importance for momentary network load. The average bandwidth requirements of the thin client traffic can be fulfilled by the network, but peak demands can cause trouble: high delay, buffering (and overflow), packet loss, retransmission, and so on, leading to a stuttering user experience. The solution presented in that article is to cache fixed-length byte strings selected from data packets representing part of a compressed graphical update.
   Both of these papers focus on extending and tuning a compression scheme to the specific thin client traffic. They also try to eliminate the long-distance redundancies, but search for solutions in the compression stage of the image coding stack, while we work with the pixel data from the framebuffer.
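The vertical history extension of [2] can be illustrated with zlib preset dictionaries: each candidate history acts as a separate compression dictionary, and the best-compressing one is chosen per chunk. This is a hedged sketch of the idea only, not the implementation from [2]; the histories and the chunk below are invented.

```python
import zlib

def compress_with_best_history(chunk, histories):
    """Compress `chunk` once per candidate history (used as a zlib preset
    dictionary) and return (index of best history, smallest output)."""
    best_index, best_output = -1, None
    for i, history in enumerate(histories):
        comp = zlib.compressobj(level=9, zdict=history)
        output = comp.compress(chunk) + comp.flush()
        if best_output is None or len(output) < len(best_output):
            best_index, best_output = i, output
    return best_index, best_output

# Invented data: the chunk closely resembles the second history.
histories = [b"AAAA" * 256, b"screen update pixels " * 50, b"ZZZZ" * 256]
chunk = b"screen update pixels " * 20
index, compressed = compress_with_best_history(chunk, histories)
# The receiver must use the same history to decompress:
restored = zlib.decompressobj(zdict=histories[index]).decompress(compressed)
```

Selecting among several short histories this way corresponds to the "vertical" extension; a single longer, concatenated history would correspond to the "horizontal" extension the authors contrast it with.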
B. Hybrid protocol: video content
   In [4] it is shown that video content is better transmitted through video streaming than over a classic thin client protocol (such as VNC [5] or RDP [6]). The authors propose a hybrid approach. Depending on the amount of change in the subsequent frames (motion), the system switches between video streaming and classic VNC mode. This way, they succeed in delivering improved Quality of Experience (QoE) with reasonable bandwidth consumption. Their method improves classic thin client protocols by relieving them of encoding series of updates that they cannot handle well: high-motion content. The authors focus on better encoding and compressing of frequent, often independent frame updates.

C. NX-call caching
   FreeNX [7] is an open source thin client protocol that translates X-calls to the more bandwidth-efficient NX-call format before they are transported over the network. A complex system of caches exists in this protocol to avoid redundancy in the data of the NX-calls. The caches in this system are focused more on the reuse of graphic primitive calls and to a lesser extent on the content itself.

D. Caching in RDP and ICA
   Two popular thin client protocols are Microsoft Remote Desktop Protocol (RDP) [6] and Citrix Independent Computing Architecture (ICA) [8]. Both are closed source, so little detail is known about the caching involved in these protocols. We do know that both incorporate a bitmap cache.
   According to [9], small parts of the screen are cached: a (default and minimum) 1.5 MB volatile cache stored in RAM and a persistent bitmap cache on disk. In RDP version 5.0 the maximum sizes have been increased to 10 MB [10]. Xrdp [11], an open source terminal services client that complies with the Microsoft RDP server, confirms that smaller bitmaps are cached, not full frame updates. According to the presented implementation, the cache consists of slots with fixed sizes: 256 pixels or smaller, 1024 pixels or smaller and 4096 pixels or smaller. None of these single cache slots is big enough to store a full screen frame of today's popular resolutions.
   Even less is known about ICA. We know that storage is reserved for caching, but the content and the size of the cache are unknown.
   In contrast to our full frame caching and redundancy reduction attempt, both solutions cache smaller parts of the screen.

                   III. CACHING CONCEPT

   We investigated the benefit of a static cache, in which the cache frames are selected before the thin client session starts and are stored in a file. This selection is based on the expected graphical content that will be generated by the user. When the user connects, these cache frames are fetched from the file, and do not change during a user session. This means that the system is free of difficulties related to dynamic caches, e.g. cache eviction, run-time election of cache elements and client-server synchronization.

                  Figure 1. Caching architecture

   The architecture for integrating a static cache in a thin client computing system is shown in figure 1. At the server side, the executing applications of the user write their graphical output to the server's framebuffer. This framebuffer is analyzed in order to choose an optimal encoding method. If no suited cache frame is found for a graphical update, it is directly encoded using a native encoding scheme of the thin client protocol in use and sent to the client. If there is a well-matching cache frame, this cache frame is subtracted from the graphical update and the difference is encoded using the classic encoding scheme and sent to the client. In addition, a cache header containing the necessary parameters, such as the cache frame used, has to be sent to the client. At the client side, the received data is decoded and, depending on the presence of a cache header, the indicated cache frame is added to the decoded frame and delivered to the client framebuffer, which is eventually presented on the screen of the client device.
   We have designed a cache header in compliance with the header rectangles of the popular Virtual Network Computing (VNC) [5] thin client protocol. This header is 12 bytes long, consisting of a 4-byte encoding indicator (an integer value) and an 8-byte 'rfbRectangle'. In normal headers this rfbRectangle contains four 2-byte fields (short values), representing an x and y position and the width and height of the rectangle, stating the size of the update. We chose to use the x and y components to allow translations of the difference frame over the cache frame in case there is resolution dissimilarity between client and server. (This is explained in subsection III.D.) We use the width as an indicator for the cache frame against which the difference is computed. The height is unused for this static cache, but is kept for conformity.
   The key difficulties in implementing the presented architecture are the choice of the cache frames and the decision whether the cache will be used for encoding the frame at hand and, if so, which cache frame will give the best compression.
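The 12-byte cache header and the subtract/add operation at server and client side can be sketched as follows. This is an illustrative sketch, not the code of the described implementation: the field layout mirrors the VNC rectangle header (x, y, width, height as 16-bit values plus a 32-bit encoding indicator), the byte-wise difference is taken modulo 256, and the encoding-indicator value is an invented placeholder.

```python
import struct

# Hypothetical encoding number signalling a cache-difference update.
CACHE_DIFF_ENCODING = 0x1000

# x, y, width, height (unsigned 16-bit) + encoding (signed 32-bit) = 12 bytes,
# big-endian as on the VNC wire.
HEADER = struct.Struct(">HHHHi")

def pack_cache_header(x, y, cache_index):
    # The width field carries the index of the cache frame used;
    # the height field is unused for the static cache and kept 0.
    return HEADER.pack(x, y, cache_index, 0, CACHE_DIFF_ENCODING)

def diff_frame(update, cache_frame):
    """Server side: subtract the cache frame from the update, byte-wise mod 256."""
    return bytes((u - c) % 256 for u, c in zip(update, cache_frame))

def add_frame(diff, cache_frame):
    """Client side: add the cache frame back onto the decoded difference."""
    return bytes((d + c) % 256 for d, c in zip(diff, cache_frame))

# Round trip over invented pixel data:
cache = bytes(range(16))
update = bytes(reversed(range(16)))
header = pack_cache_header(0, 0, cache_index=3)
restored = add_frame(diff_frame(update, cache), cache)
```

In the real system the difference frame would additionally pass through the protocol's encoder (e.g. Tight) before transmission; the sketch only shows the framing and the reversible pixel arithmetic.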
A. Statically choosing cache frames
   We have recorded a thin client usage session by storing the sequence of graphical updates. This session consisted of, starting from the desktop background, opening Open Office Writer, typing a text, closing the office program, opening an internet browser, performing a Google search, followed by visiting the homepage of a local newspaper containing multimedia content (in the form of banners), which was scrolled down to the bottom. Finally the browser was closed, ending the trace with the desktop background again.
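The selection of static cache frames from such a recorded trace, via a matrix of mutual distances, can be sketched as follows. The sketch uses exhaustive search over invented 4-byte "frames"; the distance measure is the number of differing bytes, and the chosen subset minimizes the summed distance of every trace frame to its nearest cache frame.

```python
from itertools import combinations

def byte_distance(a, b):
    """Number of differing bytes between two equally sized raw frames."""
    return sum(x != y for x, y in zip(a, b))

def choose_cache_frames(frames, k):
    """Exhaustively pick the k trace frames that minimize the total distance
    of every frame in the trace to its closest chosen cache frame."""
    n = len(frames)
    dist = [[byte_distance(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]
    best, best_cost = None, None
    for subset in combinations(range(n), k):
        cost = sum(min(dist[i][j] for j in subset) for i in range(n))
        if best_cost is None or cost < best_cost:
            best, best_cost = subset, cost
    return best

# Invented toy trace: two clusters of similar frames.
trace = [b"AAAA", b"AAAB", b"ZZZZ", b"ZZZY", b"AABB"]
selected = choose_cache_frames(trace, 2)  # picks one frame from each cluster
```

Exhaustive search is only feasible for small traces and small k; for longer traces a greedy or clustering heuristic would be a natural substitute.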

        Figure 2. Number of bytes differing between subsequent
               frames in a thin client computing session.

   We have computed the byte-per-byte differences between subsequent full screen frames (in uncompressed format). We have taken the number of differing bytes as a measure for the resemblance between frames. Figure 2 presents these difference frames, which show a spiked path. The peaks are interpreted as big differences between two subsequent frames. Generally speaking, these peaks are followed by a tail of frames that do not differ much from their predecessor.
   Through a matrix of mutual distances, we were able to identify the optimal combination of a predefined number of cache frames, indicated by squares in figure 2. Optimal cache frames are those that, combined, result in the smallest distance to the complete sequence. After visual inspection we found that the optimal cache frames represented the applications that were executing, i.e. the desktop background, the office program, the Google start page and the homepage of the local newspaper.

B. Compression factors and difference frames
   We cannot simply state that the raw byte size of an image is a straight guideline for the compressed byte size. Much depends on the content of the image. This is why it is difficult to predict the bandwidth gain of encoding difference frames from the cache as a function of the bandwidth used by coding the frames themselves. Moreover, there is no strict correlation between the original image and the difference with the cache, because the separate cache frames are likely to produce very different difference frames.

          Figure 3. Compression factor for varying image sizes.

   Figure 3 shows the spread of the compression factor of a series of images taken from a thin client usage session. Image sizes are expressed as a percentage of a full screen frame of 1024 by 768 pixels. (The compression factor represents the raw image size divided by the compressed size.) The graph confirms that there is a large variance in compression factor for a given image size. This chart has been generated by executing a VNC thin client session and logging the update size (in pixels) and the compressed byte size. The same trace was performed as described in the previous subsection. The updates were all encoded with Tight encoding [12] (version 1.3.9), which makes use of the various compression schemes that best suit the content of the update at hand. The default settings for this encoding were adopted.

C. When will the cache be efficient?
   Figure 3 also teaches us that the bigger the image size, the higher the possible compression factor gets. Since the compression factor depends on the content, the difference of an image with a well-suited cache frame can result in a considerably higher compression factor.
   The cache will be efficient when a big update is requested. These big updates are expected when the user starts an application or switches between applications, and they comply with the spikes identified in figure 2. A video file, causing very fast successive peaks, will not be handled well by a cache though, since its frames will not often map to the cache frames that are predicted before the session starts. A solution for this, as suggested in [4], is to stream these video files.

D. Size of the cache frames
   In a practical setting, one cannot assume that thin client and server have the same screen/framebuffer resolution. This raises the question what the size of the cache frames should be. In our opinion the size of the cache frames should be dictated by the server resolution. This is an upper limit for the thin client session resolution in case the client resolution is higher than the server resolution; when the server resolution is higher than the client resolution, one possibility is that only the visible parts of the screen are requested by the client. If the size of the cache frames matches the server resolution, these solutions map directly onto updating the client framebuffer based on the cache. Note that in this setting the caches have to be equal on client and server side, so the size of the cache frames does not necessarily match the client resolution. Another technique to cope with smaller client screens is scaling at the server side. In this case the cache frames at the client side could be downscaled versions of those at the server.

                IV. EXPERIMENTAL VALIDATION

A. Background information on VNC
   The proposed architecture for the static cache has been implemented inside an existing Virtual Network Computing (VNC) [5] system. This system already contains some useful optimizations that deserve clarification before discussing the experiments performed on our implementation. VNC uses a thin client protocol that divides the graphical updates into rectangles. Each rectangle can be encoded in a different way, but in practice one and the same preferred encoding method is used. It is a pull-based protocol, demand driven by the client. This mechanism is very well suited for low bandwidth networks, because the slower the network or the client is, the slower the rate of updates becomes. VNC provides a mechanism of incremental updates. The requested region is analyzed to find the regions that are modified with respect to the screen information already visible at the client. Unmodified regions should be omitted from encoding and transmission to the client, as this is clearly redundant information. There is also the concept of the copy-region. This region is a part of the current screen update that can be copied into another part of this update. Identification of this kind of redundant information in individual graphical updates can result in a significant drop in required bandwidth, since only a translation vector has to be transmitted.
   In particular we used the Tight VNC [12] variant, which adds its own advanced encoding scheme. This encoding scheme divides an update into several rectangles and chooses the best suited encoding for each of these. It is able to compress images using JPEG, and the compression/quality level can be configured per session.

B. Experiment setup
   We have adapted only the server, so that it encodes the exact same graphical update in two ways: once using the classic Tight encoding of the frame at hand, and once Tight encoding the difference frame relative to the cache. For testing purposes, we have implemented the system in such a way that only the classic Tight encoded frames are sent over the network to the client for presentation. The encoded difference frames relative to the cache are not sent, because this would disturb the normal operation of the thin client protocol, since every update request would be responded to twice.
   In the architecture there is a component that is hard to implement because of the difficult a priori prediction of the compression factor for a given frame update. In our test implementation, the decision is made by encoding the update multiple times, i.e. plain Tight coding and coding relative to all cache frames, and choosing the best one to send to the client. This means we use an a posteriori decision process.
   Table 1 shows the machine configurations used in our experiments.

Table 1. Specification of test machine configurations.

Role          Hardware                      OS / Software
Thin Client   2.11 GHz AMD Athlon 64 X2     Kubuntu 8.04
              Dual Core, 512 MB RAM,        Tight VNC client 1.3.9
              10/100BaseT NIC
Thin Server   2.11 GHz AMD Athlon 64 X2     Kubuntu 8.04
              Dual Core, 512 MB RAM,        Tight VNC server 1.3.9
              10/100BaseT NIC

   We have experimented with a trace that was conducted as follows. The trace consisted of starting the thin client session, resulting in the desktop background being shown. A command shell window (with black background) was opened and some commands were entered. Then Open Office Writer was started, a text was typed, and the office program was minimized. A browser was started and a Google search was performed. The menus of the browser were then explored. The homepage of a local newspaper was loaded and scrolled down. The browser was closed. The shell window was closed. The office program was maximized and closed down. Then the thin client session was ended. This trace was simultaneously encoded using Tight encoding (default compression settings) and relative to five cache frames, computing the difference frames, which are Tight encoded with the same settings. An extensive log file has been kept, recording geometrical update sizes and byte sizes for the various encoding methods. A static cache was used and was kept unaltered during the session. The cache frames were statically chosen to be the startup screens of the used applications: the desktop background, an empty Open Office Writer document, a Konsole command shell window, a web browser with the Google search page loaded and a web browser with the homepage of the local newspaper loaded.

C. Experimental results
   Figure 4 shows the cumulative bandwidth usage of the trace encoded with classic Tight encoding, encoding every frame with respect to one statically chosen cache frame, and the optimal version that chooses between the two former encodings for each update. The frame we selected as cache frame was the desktop background; it was read in from file at the start of the sequence. The first conclusion we can draw from this figure is that a substantial overall bandwidth gain
can be reached by optimally encoding the update: in this trace 20.56% less bandwidth is needed than when using classic Tight encoding. The second conclusion is that encoding every update relative to the single cache frame results in higher bandwidth consumption in comparison to classic Tight encoding; for this trace this was 16.07% more. This is because the cache frame is selected to bring a high amount of bandwidth reduction for a specific set of updates, but is less efficient for others.

 Figure 4. Generated network traffic by encoding all updates with classic
Tight encoding, by encoding all updates relative to a cache containing only
one cache frame (image of desktop background), and choosing the best of
                            both at all times.

Figure 5. Generated network traffic by at all times making the optimal choice
between classic Tight encoding and encoding relative to a cache with multiple
                              cache frames.

   In figure 5, we can see that using more cache frames further decreases the needed bandwidth. The figure presents the gradual increase in bandwidth gain caused by the addition of a cache frame. Optimally encoding with respect to one cache frame is consistent with figure 4: the cache frame is the desktop background and yields 20.56% bandwidth reduction. Adding an extra cache frame, the browser with the homepage of the local newspaper, brings the total bandwidth gain to 29.94%. Eventually, with five cache frames we achieved a decrease in bandwidth requirements of 34.40% over classic Tight encoding of all updates.

 Figure 6. Momentary network traffic, reduction of spikes by using a cache
                        with multiple elements.

   Figure 6 presents the momentary network traffic generated by the classic Tight encoding and the optimal encoding using a cache with five frames. When investigating the effect of the five cache frames on momentary network traffic generation, we find that on average the frames were coded 35.17% more efficiently than through classic Tight encoding. The maximum spike reduction in the trace amounted to 99.81% of the classic Tight encoded update. The highest spike that occurs with the classic Tight encoding is 181.132 kB. By using the optimal encoding relative to the five cache frames, the spike maximum was reduced to 121.175 kB, a reduction of 33.10%.

D. Number of cache frames

  Figure 7. Increasing bandwidth gain with addition of extra cache frames.

   Figure 7 presents the increase in total bandwidth reduction with respect to classic Tight encoding, as a function of the number of frames in the cache. It is clear that adding an extra cache frame will never result in a decrease in bandwidth gain, as there is a per-frame decision on the encoding method and/or cache frame that yields the most gain. Although these results seem to indicate that it is preferable to use a very large
cache, the number of cache frames should be mitigated. The first driver for this mitigation is found in the symmetrical nature of the caching method: the cache has to be present both on the server and on the client. Generally speaking, the server will not be the limiting factor when it comes to storage space for a large cache. But since we target thin clients, the limited resources of these devices could be the decisive argument. The second driver is that the more cache frames are present, the more time has to be spent on computing the difference of the current screen update to all cache frames, in order to decide which of the frames will be Tight encoded and sent to the client. As this computation time is part of the end-to-end latency experienced by the user, the number of cache frames can be constrained by the network at hand and the upper boundary on end-to-end delay. In [13] and [14], indications are given for the upper boundaries on this end-to-end delay in order to guarantee a pleasant user experience. Key findings of these articles are that response times below 150 ms are imperceptible to the user. Higher response times are noticed by the user, and exceeding the 1 s barrier leads to frustration.
   If the ideal cache frames are the startup screens of the applications, as shown in section III.A, then the optimal number of cache frames is smaller than or equal to the number of used applications. (It can be smaller because (1) two applications can be visual look-alikes, and (2) high-motion applications (e.g. a media player) do not benefit from the cache and find more bandwidth efficiency in streaming.)

                       V. CONCLUSIONS

   In thin client computing systems, benefit can be found in caching certain graphical updates. Other updates that resemble one of these cache frames can be efficiently coded by computing the difference to the cache frame, which is then encoded and transported over the network to the thin client. This article presents how a static cache can reduce the generated network traffic by about 34.40%. It is shown that on a momentary basis the generated network traffic decreases on average by 35.17%, with a maximum of 99.81%. Furthermore, ideas are presented on how to select cache frames and how the number of cache frames can be deduced.

                       VI. FUTURE WORK

   Research will be conducted on the impact of dynamic caches, and the possible gain of having this kind of cache instead of, or in conjunction with, the static cache presented in this paper.

                      ACKNOWLEDGMENT

   Part of the research leading to these results was done for the MobiThin Project and has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement nr 216946. Lien Deboosere is funded by a Ph.D grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). Pieter Simoens is funded by a Ph.D grant of the Fund for Scientific Research, Flanders (FWO-V).

                        REFERENCES

[1]  A. Lai, J. Nieh. On the Performance of Thin Client Computing. ACM Transactions on Computer Systems (TOCS), 24 (2), pages 175-209, May 2006.
[2]  S. Yang, T. Y. Tiow. "Long Distance Redundancy Reduction in Thin Client Computing." 6th IEEE/ACIS International Conference on Computer and Information Science, pages 961-966, 11-13 July 2007.
[3]  S. Yang, T. Y. Tiow. "Improving Interactive Experience of Thin Client Computing by Reducing Data Spikes." 6th IEEE/ACIS International Conference on Computer and Information Science, pages 627-632, 11-13 July 2007.
[4]  D. De Winter, P. Simoens, L. Deboosere, F. De Turck, J. Moreau, B. Dhoedt, P. Demeester. A hybrid thin-client protocol for multimedia streaming and interactive gaming applications. Proceedings of Network and Operating Systems Support for Digital Audio and Video 2006 (NOSSDAV 2006), pages 86-92, 2006.
[5]  T. Richardson, Q. Stafford-Fraser, K. R. Wood, A. Hopper. Virtual Network Computing. IEEE Internet Computing, IEEE Computer Society, Volume 02, pages 33-38, 1998.
[6]  Microsoft Remote Desktop Protocol (RDP)
[7]  NoMachine FreeNX
[8]  Citrix Independent Computing Architecture (ICA)
[9]  Bitmap Cache for RDP
[10] Whitepaper on the features and performance of RDP 5.0
[11] Xrdp
[12] Tight VNC
[13] N. Tolia, D. G. Andersen, M. Satyanarayanan. Quantifying Interactive User Experience on Thin Clients. IEEE Computer, Volume 39 (3), pages 46-52, March 2006.
[14] N. Tolia, D. G. Andersen, M. Satyanarayanan. The Seductive Appeal of Thin Clients. February 2005.
