

Bandwidth Optimization for Mobile Thin Client Computing through Graphical Update Caching
B. Vankeirsbilck, P. Simoens, J. De Wachter, L. Deboosere, F. De Turck, B. Dhoedt, P. Demeester
IBBT - Ghent University, Department of Information Technology Gaston Crommenlaan 8, bus 201 9050 Gent, Belgium Tel +32 9 33 14937 - Fax +32 9 33 14899 {bert.vankeirsbilck, pieter.simoens, jeroen.dewachter, lien.deboosere}
Abstract-This paper presents graphical update caching as a mechanism to reduce the network load generated by thin client computing systems. In such a system, user interaction and processing are separated by a network: user input such as keystrokes and mouse clicks is sent to the server over the network, and graphical updates are transported in the reverse direction. The cache proposed in this article is static, meaning that it is composed before the thin client computing session starts and does not change during the session. Through experiments with an implementation of the cache, we show that graphical update caching effectively reduces the network load generated by thin client computing.

I. INTRODUCTION

The thin client computing (TCC) concept comes down to moving the user's applications to a distant server and running a thin client protocol between client and server. The client device only deals with user interaction. The protocol forwards user input (e.g. keystrokes) to the server and delivers graphical updates back to the client for presentation on screen. This approach has proven to work well over LAN and WAN with reasonably high bandwidth [1]. We focus on mobile client devices, which connect to the server over a wireless link. Both from the wireless network and the client battery lifetime perspective, it is necessary to optimize bandwidth usage. If the network cannot supply enough bandwidth at a given time, the user interface will hamper, and the more network activity is needed, the more energy is drained from the client device battery.

In this paper we propose graphical update caching as a method to reduce long term redundancies in thin client sessions. When analyzing the sequence of graphical updates generated by a typical user session on a desktop computer, we found that many frames resemble others that have already been transmitted earlier in that session. Since we operate in a thin client environment where all graphical updates have to be sent over a wireless, limited bandwidth network, we can benefit from this phenomenon by storing well-chosen key frames both at client and server side, and transmitting only the differences with respect to such a frame. Since less data has to be received from the network at the client, the battery autonomy of the device decreases more slowly. This contributes to user satisfaction and lessens the load on the environment.

We assess the bandwidth optimization potential of a static cache. This is a cache for graphical updates that is filled before the thin client computing session starts. When the user logs in, the cache is loaded both at client and server side, and does not get altered during the session.

This article is organized as follows: section II discusses related work. In section III, our caching concept and some significant remarks are presented. Section IV tackles experimental validation through an implementation; the results of our experiments are presented and discussed. Finally, conclusions are drawn in section V.

II. RELATED WORK

Research has been conducted on various sorts of caches in thin client computing systems. Generally speaking, the difference with this related work is that we aim to reduce long distance redundancies in a sequence of graphical updates by caching complete frames that contain visual pixel information.

A. Compression history extension

In order to reduce the required network bandwidth, graphical updates should be represented by as few bytes as possible and redundant data should be minimized. In [2] it is shown that up to 23.3% network traffic reduction can be achieved with 512 Kbytes used for vertically extending the LZ history buffer. In essence, this vertical LZ history buffer extension means using multiple separate histories and selecting the best fitting history for a given chunk of data that needs compression, instead of using one longer history (i.e. horizontal extension). In [3] the authors identify the presence of data spikes and their importance for momentary network load. Average bandwidth requirements of the thin client traffic can be fulfilled by the network, but peak demands can cause trouble: high delay, buffering (and overflow), packet loss and retransmission, all leading to a hampered user experience. The solution presented in [3] is to cache fixed-length byte strings selected from data packets representing a part of a compressed graphical update.

Both of these papers focus on extending and tuning a compression scheme to the specific thin client traffic. They try to eliminate the long distance redundancies too, but search for solutions in the compression stage of the image coding stack, while we work with the pixel data from the framebuffer.

B. Hybrid protocol: video content

In [4] it is shown that video content is better transmitted through video streaming than over a classic thin client protocol (such as VNC [5] or RDP [6]). The authors propose a hybrid approach: depending on the amount of change in the subsequent frames (motion), the system switches between video streaming and classic VNC mode. This way, they succeed in delivering improved Quality of Experience (QoE) with reasonable bandwidth consumption. The method proposed in [4] improves classic thin client protocols by relieving them of encoding series of updates that they cannot handle well: high motion content. The authors focus on better encoding and compressing frequent, often independent frame updates.

C. NX-call caching

FreeNX [7] is an open source thin client protocol that translates X-calls to the more bandwidth efficient NX-call format before they are transported over the network. A complex system of caches exists in this protocol to avoid redundancy in the data in the NX-calls. The caches in this system are more focused on the reuse of graphic primitive calls and to a lesser extent on the content itself.

D. Caching in RDP and ICA

Two popular thin client protocols are Microsoft Remote Desktop Protocol (RDP) [6] and Citrix Independent Computing Architecture (ICA) [8]. Both are closed source, so less detail is known about the caching involved in the protocols. We do know that both incorporate a bitmap cache. According to [9], small parts of the screen are cached: a (default and minimum) 1.5 MB volatile cache stored in RAM and a persistent bitmap cache on disk. In RDP version 5.0 the maximum sizes have been increased to 10 MB [10]. Xrdp [11], an open source terminal services client that complies with the Microsoft RDP server, confirms that smaller bitmaps are cached, not full frame updates.
According to the Xrdp implementation, the cache consists of slots with fixed sizes: 256 pixels or smaller, 1024 pixels or smaller and 4096 pixels or smaller. None of these single cache slots is big enough to store a full screen frame at today's popular resolutions. Even fewer details are known about ICA. We know that storage is reserved for caching, but the content and the size of the cache are unknown. In contrast to our full frame caching and redundancy reduction attempt, both solutions cache smaller parts of the screen.

III. CACHING CONCEPT

We investigated the benefit of a static cache, in which the cache frames are selected before the thin client session starts and are stored in a file. This selection is based on the expected graphical content that will be generated by the user. When the user connects, these cache frames are fetched from the file and do not change during the user session. This means that the system is free of difficulties related to dynamic caches, e.g. cache eviction, run-time election of cache elements and client-server synchronization.
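Because the cache is static, session start-up reduces to reading the cache frames from the file on both sides. The following Python sketch illustrates this; the on-disk layout (equal-sized raw frames concatenated in one file) and the digest check are assumptions, since the paper only states that the frames are fetched from a file and never change:

```python
import hashlib

def load_static_cache(blob, frame_size):
    """Split the cache file contents into fixed-size raw frames and
    compute a digest of the whole file. Because the cache never changes
    during a session, client and server can compare digests once at
    login to confirm they hold identical caches."""
    frames = [blob[i:i + frame_size] for i in range(0, len(blob), frame_size)]
    return frames, hashlib.sha256(blob).hexdigest()

# Toy cache file: three 16-byte "frames".
blob = bytes(range(48))
frames, digest = load_static_cache(blob, 16)
assert len(frames) == 3 and all(len(f) == 16 for f in frames)
```

A one-time digest comparison at login is all the synchronization this static scheme needs, which is exactly the simplification over dynamic caches noted above.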

Figure 1. Caching architecture

The architecture for integrating a static cache in a thin client computing system is shown in figure 1. At the server side, the executing applications of the user write their graphical output to the server's framebuffer. This framebuffer is analyzed in order to choose an optimal encoding method. If no suitable cache frame is found for a graphical update, it is directly encoded using a native encoding scheme of the thin client protocol in use and sent to the client. If there is a well matching cache frame, this cache frame is subtracted from the graphical update and the difference is encoded using the classic encoding scheme and sent to the client. In addition, a cache header containing the necessary parameters, such as the cache frame that was used, has to be sent to the client. At the client side, the received data is decoded and, depending on the presence of a cache header, the indicated cache frame is added to the decoded frame and delivered to the client framebuffer, which eventually is presented on the screen of the client device.

We have designed a cache header in compliance with the rectangle headers of the popular Virtual Network Computing (VNC) [5] thin client protocol. This header is 12 bytes long, consisting of a 4-byte encoding indicator (an integer value) and an 8-byte 'rfbRectangle'. In normal headers this rfbRectangle contains four 2-byte fields (short values), representing an x and y position and the width and height of the rectangle, stating the size of the update. We chose to use the x and y components to allow translations of the difference frame over the cache frame in case there is resolution dissimilarity between client and server (this is explained in subsection III.D). We use the width as indicator for the cache frame relative to which the difference is computed. The height is unused for this static cache, but is kept for conformity.
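The 12-byte cache header described above can be sketched in Python with the standard `struct` module, following the field order given in the text (4-byte encoding indicator, then the x, y, width and height shorts of the rfbRectangle, big-endian as in the RFB protocol); the encoding number itself is a placeholder, not a value from the paper:

```python
import struct

# Placeholder encoding number for the cache-based encoding; the real
# value would have to be registered alongside the other VNC encodings.
CACHE_ENCODING = 1001  # assumption, not taken from the paper

def pack_cache_header(x, y, cache_frame):
    """Pack the 12-byte cache header: a 4-byte encoding indicator, then
    the rfbRectangle's four 2-byte shorts. x and y carry the translation
    of the difference frame over the cache frame, width indexes the
    cache frame, and height is unused but kept for conformity."""
    return struct.pack(">iHHHH", CACHE_ENCODING, x, y, cache_frame, 0)

def unpack_cache_header(header):
    encoding, x, y, cache_frame, _unused = struct.unpack(">iHHHH", header)
    return encoding, x, y, cache_frame

header = pack_cache_header(0, 0, 3)
assert len(header) == 12
assert unpack_cache_header(header) == (CACHE_ENCODING, 0, 0, 3)
```

Reusing the rfbRectangle layout keeps the header parseable by the existing VNC message loop, at the cost of overloading the width and height fields.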
The key difficulties in implementing the presented architecture are the choice of the cache frames and the decision whether the cache will be used for encoding the frame at hand and if so, which cache frame will give the best compression.

A. Statically choosing cache frames

We have recorded a thin client usage session offline by storing the sequence of graphical updates. This session consisted of, starting from the desktop background, opening Open Office Writer, typing a text, closing down the office program, opening an internet browser, performing a Google search, followed by visiting the homepage of a local newspaper containing multimedia content (in the form of banners), which was scrolled down to the bottom. Finally the browser was closed, ending the trace with the desktop background again.

Figure 2. Number of bytes differing between subsequent frames in a thin client computing session.

Figure 3. Compression factor for varying image sizes.

We have computed the byte-per-byte differences between subsequent full screen frames (in uncompressed format). We have taken the number of differing bytes as a measure for the resemblance between frames. Figure 2 presents these differences, which form a spiked curve. The peaks are interpreted as big differences between two subsequent frames. Generally speaking these peaks are followed by a tail of frames that do not differ much from their predecessor. Through a matrix of mutual distances, we were able to identify the optimal combination of a predefined number of cache frames, indicated by squares in figure 2. Optimal cache frames are those that, combined, result in the smallest distance to the complete sequence. After visual inspection we found that the optimal cache frames represented the applications that were executing, i.e. the desktop background, the office program, the Google startup page and the homepage of the local newspaper.

B. Compression factors and difference frames

We cannot simply state that the raw byte size of an image is a straight guideline for its compressed byte size; a lot depends on the content of the image. This is why it is difficult to predict the bandwidth gain of encoding difference frames relative to the cache as a function of the bandwidth used by coding the frames themselves. Moreover, there is no strict correlation between the original image and the difference with the cache, because the separate cache frames are likely to produce very different difference frames.
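The cache frame selection from the matrix of mutual distances, as described in subsection A, can be sketched as an exhaustive search over combinations; this is a sketch of the selection criterion only, as the paper does not state the exact algorithm used:

```python
from itertools import combinations

def total_distance(dist, chosen):
    # Each frame in the trace is represented by its closest chosen cache frame.
    return sum(min(row[c] for c in chosen) for row in dist)

def optimal_cache_frames(dist, k):
    """Exhaustively pick the k candidate frames whose combination yields
    the smallest summed distance to the complete sequence.
    dist[i][j] = number of differing bytes between trace frame i and
    candidate cache frame j (the matrix of mutual distances)."""
    candidates = range(len(dist[0]))
    return min(combinations(candidates, k),
               key=lambda chosen: total_distance(dist, chosen))

# Toy distance matrix: frames 0/1 resemble each other, as do frames 2/3,
# so one cache frame from each pair covers the whole trace best.
dist = [[0, 10, 90, 80],
        [10, 0, 85, 95],
        [90, 85, 0, 5],
        [80, 95, 5, 0]]
best = optimal_cache_frames(dist, 2)
assert total_distance(dist, best) == 15
```

The exhaustive search is feasible because the selection runs offline, before any session starts; only the chosen frames are shipped to client and server.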

Figure 3 shows the spread of the compression factor for a series of images taken from a thin client usage session. Image sizes are expressed as the percentage of a full screen frame of 1024 by 768 pixels that was modified. (The compression factor is the raw image size divided by the compressed size.) The graph confirms that there is a big variance in compression factor for a given image size. This chart has been generated by executing a VNC thin client session and logging the update size (in pixels) and the compressed byte size. The same trace was performed as described in the previous subsection. The updates were all encoded with Tight encoding [12] (version 1.3.9), which makes use of the various compression schemes that best suit the content of the update at hand. The default settings for this encoding were adopted.

C. When will the cache be efficient?

Figure 3 also teaches us that the bigger the image size, the higher the possible compression factor gets. Since the compression factor depends on the content, the difference of an image with a well-suited cache frame can result in a considerably higher compression factor. The cache will be efficient when a big update is requested. These big updates are expected when the user starts an application or switches between applications, and correspond with the spikes identified in figure 2. A video file, causing very fast successive peaks, will not be handled well by a cache though, since its frames will rarely map onto cache frames that have to be predicted before the session starts. A solution for this, as suggested in [4], is to stream these video files.

D. Size of the cache frames

In a practical setting, one cannot assume that thin client and server have the same screen/framebuffer resolution. This raises the question of what the size of the cache frames should be. In our opinion the size of the cache frames will be dictated by the server resolution.
This is an upper limit for the thin client session resolution in case the client resolution is higher than the server resolution; when the server resolution is higher than the client resolution, one possibility is that only the visible parts of the screen will be requested by the client. If the size of the cache frames matches the server resolution, these solutions map directly onto updating the client framebuffer based on the cache. Note that in this setting the caches have to be equal on client and server side, so the size of the cache frames does not necessarily match the client resolution. Another technique to cope with smaller client screens is scaling at the server side. In this case the cache frames at the client side could be downscaled versions of those at the server.

IV. EXPERIMENTAL VALIDATION

A. Background information on VNC

The proposed architecture for the static cache has been implemented inside an existing Virtual Network Computing (VNC) [5] system. This system already contains some useful optimizations that deserve clarification before discussing the experiments performed on our implementation. VNC uses a thin client protocol that divides the graphical updates into rectangles. Each rectangle can be encoded in a different way, but in practice one and the same preferred encoding method is used. It is a pull based protocol, demand driven by the client. This mechanism is very well suited for low bandwidth networks, because the slower the network or the client is, the slower the rate of updates becomes. VNC also offers a mechanism of incremental updates: the requested region is analyzed to find the regions that are modified with respect to the screen information already visible at the client. Unmodified regions are omitted from encoding and transmission to the client, as they are clearly redundant information. There is also the concept of the copy-region: a part of the current screen update that can be copied into another part of this update.
Identification of this kind of redundant information in individual graphical updates can result in a significant drop in required bandwidth, since only a translation vector has to be transmitted. In particular we used the Tight VNC [12] variant, which adds its own advanced encoding scheme. This encoding scheme divides an update into several rectangles and chooses the best suited encoding for each of them. It is able to compress images using JPEG, and the compression/quality level can be configured per session.

B. Experiment setup

We have adapted only the server, so that it encodes the exact same graphical update in two ways: once using the classic tight encoding of the frame at hand, and once tight encoding the difference frame relative to the cache. For testing purposes, we have implemented the system in such a way that only the classic tight encoded frames are sent over the network to the client for presentation. The encoded difference frames relative to the cache are not sent, because this would disturb the normal operation of the thin client protocol, since every update request would be answered twice.
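The dual encoding performed at the server, and the matching decode step at the client, can be sketched as follows; zlib stands in for Tight encoding and a byte-wise XOR stands in for the pixel subtraction, so this is an illustration of the decision logic rather than the actual implementation:

```python
import zlib

def encode_with_cache(update, cache_frames):
    """Compress the raw update and the difference with every cache
    frame, then keep whichever result is smallest (the a posteriori
    decision). Returns (cache frame index or None, payload)."""
    best = zlib.compress(update)
    best_frame = None  # None means: plain encoding, no cache header needed
    for i, frame in enumerate(cache_frames):
        diff = bytes(a ^ b for a, b in zip(update, frame))
        candidate = zlib.compress(diff)
        if len(candidate) + 12 < len(best):  # 12-byte cache header overhead
            best, best_frame = candidate, i
    return best_frame, best

def decode_with_cache(best_frame, payload, cache_frames):
    """Client side: decompress, then re-apply the cache frame if one
    was used."""
    data = zlib.decompress(payload)
    if best_frame is None:
        return data
    return bytes(a ^ b for a, b in zip(data, cache_frames[best_frame]))

# An update almost identical to cache frame 0 compresses far better as
# a difference frame than on its own.
cache = [bytes(range(256)) * 16, b"\x00" * 4096]
update = bytearray(cache[0])
update[100] ^= 0xFF
frame_idx, payload = encode_with_cache(bytes(update), cache)
assert decode_with_cache(frame_idx, payload, cache) == bytes(update)
```

Note that the loop body is exactly the cost the paper attributes to large caches: one difference computation and one compression pass per cache frame, for every update.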

One component in the architecture is hard to implement: the a priori prediction of the compression factor for a given frame update. In our test implementation, the decision is therefore made by encoding the update multiple times, i.e. plain tight coding and coding relative to each cache frame, and choosing the best result to send to the client. This means we use an a posteriori decision process. Table 1 shows the machine configurations used in our experiments.
Table 1. Specification of test machine configurations.

Role         Hardware                               OS / Software
Thin Client  2.11 GHz AMD Athlon 64 X2 Dual Core,   Kubuntu 8.04,
             512 MB RAM, 10/100BaseT NIC            Tight VNC client 1.3.9
Thin Server  2.11 GHz AMD Athlon 64 X2 Dual Core,   Kubuntu 8.04,
             512 MB RAM, 10/100BaseT NIC            Tight VNC server 1.3.9

We have experimented with a trace that was conducted as follows. The trace consisted of starting the thin client session, resulting in the desktop background being shown. A command shell window (with black background) was opened and some commands were entered. Then Open Office Writer was started, a text was typed, and the office program was minimized. A browser was started and a Google search was performed. Then the menus of the browser were explored. The homepage of a local newspaper was loaded and scrolled down. The browser was closed, the shell window was closed, and the office program was maximized and closed down. Then the thin client session was ended.

This trace was simultaneously encoded using Tight encoding (default compression settings) and relative to five cache frames, computing the difference frames, which were Tight encoded with the same settings. An extensive log file has been kept, recording geometrical update sizes and byte sizes for the various encoding methods. A static cache was used and was kept unaltered during the session. The cache frames were statically chosen to be the startup screens of the used applications: the desktop background, an empty Open Office Writer document, a Konsole command shell window, a web browser with the Google search page loaded, and a web browser with the homepage of the local newspaper loaded.

C. Experimental results

Figure 4. Generated network traffic by encoding all updates with classic Tight encoding, by encoding all updates relative to a cache containing only one cache frame (image of desktop background), and choosing the best of both at all times.

Figure 4 shows the cumulative bandwidth usage of the trace encoded with classic tight encoding, encoding every frame with respect to one statically chosen cache frame, and the optimal version that chooses between the two former encodings for each update. The frame we selected as cache frame was the desktop background and was read in from file at the start of the sequence. The first conclusion we can draw from this figure is that a substantial overall bandwidth gain can be reached by optimally encoding the updates: in this trace 20.56% less bandwidth is needed than with classic tight encoding. The second conclusion is that encoding every update relative to the single cache frame results in higher bandwidth consumption in comparison to classic tight encoding; for this trace this was 16.07% more. This is because the cache frame is selected to bring a high amount of bandwidth reduction for a specific set of updates, but is less efficient for others.

Figure 6. Momentary network traffic, reduction of spikes by using a cache with multiple elements.

Figure 6 presents the momentary network traffic generated by the classic tight encoding and the optimal encoding using a cache with five frames. When investigating the effect of the five cache frames on momentary network traffic generation, we find that on average the frames were coded 35.17% more efficiently than through classic tight encoding. The maximum spike reduction in the trace amounted to 99.81% of the classic tight encoded update. The highest spike that occurs with the classic tight encoding is 181.132 kB. By using the optimal encoding relative to the five cache frames the spike maximum was reduced to 121.175 kB, a reduction of 33.10%.

D. Number of cache frames

Figure 5. Generated network traffic by at all times making the optimal choice between classic Tight encoding and encoding relative to a cache with multiple elements.

In figure 5, we can see that using more cache frames further decreases the needed bandwidth. The figure presents the gradual increase in bandwidth gain caused by the addition of cache frames. Optimally encoding with respect to one cache frame is consistent with figure 4: the cache frame is the desktop background and yields 20.56% bandwidth reduction. Adding an extra cache frame, the browser with the homepage of the local newspaper, brings the total bandwidth gain to 29.94%. Eventually, with five cache frames we achieved a bandwidth requirements decrease of 34.40% over classic tight encoding of all updates.

Figure 7. Increasing bandwidth gain with addition of extra cache frames.

Figure 7 presents the increase in total bandwidth reduction with respect to classic tight encoding, as a function of the number of frames in the cache. It is clear that adding an extra cache frame will never result in a decrease in bandwidth gain, as there is a per-frame decision of using the encoding method and/or cache frame that yields the most gain. Although these results seem to indicate that it is preferable to use a very large cache, the number of cache frames should be mitigated. The first driver for this mitigation is found in the symmetrical nature of the caching method: the cache has to be present on both server and client. Generally speaking the server will not be the limiting factor when it comes to storage space for a large cache, but since we target thin clients, the limited resources of these devices could be the decisive argument. The second driver is that the more cache frames are present, the more time is spent on computing the difference of the current screen update with all cache frames, in order to decide which of the frames will be tight encoded and sent to the client. As this computation time is part of the end-to-end latency experienced by the user, the number of cache frames can be constrained by the network at hand and the upper boundary on end-to-end delay. In [13] and [14], indications are given for the upper boundaries on this end-to-end delay in order to guarantee a pleasant user experience. Key findings of these articles are that response times below 150 ms are imperceptible to the user; higher response times are noticed by the user, and exceeding the 1 s barrier leads to frustration. If the ideal cache frames are the startup screens of the applications, as shown in section III.A, then the optimal number of cache frames is smaller than or equal to the number of used applications. (It can be smaller because (1) two applications can be visual look-alikes, and (2) high motion applications (e.g. a media player) do not benefit from the cache and find more bandwidth efficiency in streaming.)

V. CONCLUSIONS

In thin client computing systems, benefit can be found in caching certain graphical updates. Other updates that resemble one of these cache frames can be efficiently coded by computing the difference to the cache frame, which is then encoded and transported over the network to the thin client.
This article presents how a static cache can reduce the generated network traffic by about 34.40%. It is shown that on a momentary basis the generated network traffic decreases on average by 35.17%, with a maximum of 99.81%. Furthermore, ideas are presented on how to select cache frames and how the number of cache frames can be deduced.

VI. FUTURE WORK

Research will be conducted on the impact of dynamic caches, and the possible gain of having this kind of cache instead of, or in conjunction with, the static cache presented in this paper.

ACKNOWLEDGMENT

Part of the research leading to these results was done for the MobiThin Project and has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement nr 216946. Lien Deboosere is funded by a Ph.D grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). Pieter Simoens is funded by a Ph.D grant of the Fund for Scientific Research, Flanders (FWO-V).

REFERENCES
[1] A. Lai, J. Nieh, "On the Performance of Thin Client Computing," ACM Transactions on Computer Systems (TOCS), 24(2), pages 175-209, May 2006.
[2] S. Yang, T. Y. Tiow, "Long Distance Redundancy Reduction in Thin Client Computing," 6th IEEE/ACIS International Conference on Computer and Information Science, pages 961-966, 11-13 July 2007.
[3] S. Yang, T. Y. Tiow, "Improving Interactive Experience of Thin Client Computing by Reducing Data Spikes," 6th IEEE/ACIS International Conference on Computer and Information Science, pages 627-632, 11-13 July 2007.
[4] D. De Winter, P. Simoens, L. Deboosere, F. De Turck, J. Moreau, B. Dhoedt, P. Demeester, "A hybrid thin-client protocol for multimedia streaming and interactive gaming applications," Proceedings of Network and Operating Systems Support for Digital Audio and Video 2006 (NOSSDAV 2006), pages 86-92, 2006.
[5] T. Richardson, Q. Stafford-Fraser, K. R. Wood, A. Hopper, "Virtual Network Computing," IEEE Internet Computing, Volume 2, pages 33-38, 1998.
[6] Microsoft Remote Desktop Protocol (RDP).
[7] NoMachine FreeNX.
[8] Citrix Independent Computing Architecture (ICA).
[9] Bitmap Cache for RDP.
[10] Whitepaper on the features and performance of RDP 5.0.
[11] Xrdp.
[12] Tight VNC.
[13] N. Tolia, D. G. Andersen, M. Satyanarayanan, "Quantifying Interactive User Experience on Thin Clients," IEEE Computer, Volume 39(3), pages 46-52, March 2006.
[14] N. Tolia, D. G. Andersen, M. Satyanarayanan, "The Seductive Appeal of Thin Clients," February 2005.
