Secure and High-performance Web Server System for Shared Hosting Service

Daisuke Hara
Department of Computer Science
The University of Electro-Communications
Chofu, Tokyo 182-8585 Japan
email@example.com

Yasuichi Nakayama
Department of Computer Science
The University of Electro-Communications
Chofu, Tokyo 182-8585 Japan
firstname.lastname@example.org

Abstract

We developed Hi-sap, a web server system that ensures security in a server and has high performance when processing dynamic content. In existing servers, server embedded programs cannot be used safely in large-scale environments like a shared hosting service. These problems occur because server processes run under the privilege of the same user. For example, server embedded interpreters are commonly used to improve performance in processing dynamic content, like weblogs and wikis. However, other customers that share the same server can steal, delete, and tamper with data files of weblogs and wikis. To solve these problems, we designed a new web server system, Hi-sap. In the system, web objects that are stored in a server are divided into partitions. Server processes run under the privilege of different users in every partition. We implemented Hi-sap on a Linux OS and tested the effectiveness of the system. Experimental results show that Hi-sap has high performance and scalability.

1. Introduction

More people are creating their own websites as the Internet grows in popularity. Shared hosting services, where many customers share a server, are widely used.

With existing servers, server embedded programs, for example server embedded interpreters [1, 2, 3, 4] or WebDAV, cannot be used safely and conveniently in large-scale environments like shared hosting services. These problems occur because server processes that load server embedded programs run under the privilege of the same user (dedicated user¹). When server embedded interpreters that improve performance in processing dynamic content like weblogs, wikis, and content management systems (CMSs) are used, other customers that share the same server can steal, delete, or tamper with data files of weblogs and wikis. In addition, when WebDAV is used, the owner of created files is the dedicated user. Therefore, ordinary users that actually own files typically cannot edit these files directly.

To solve these problems, we designed and implemented a secure and high-performance web server system, Hi-sap. In the system, web objects² that are stored in a server are divided into partitions, for example sites and content. Server processes run under the privilege of different users for every partition. Thus, the system can prevent theft, deletion, and tampering when server embedded interpreters are used. Ordinary users can directly edit files that are created by way of WebDAV. Therefore, the system can solve the problems of using server embedded programs.

Also, we propose a web-server-level scheduler named Content Access Scheduler to enhance the scalability of the number of partitions in a server. We implemented Hi-sap on a Linux OS and tested the effectiveness of the system. Experimental results show that Hi-sap has high performance and scalability.

The remainder of this paper is structured as follows. In section 2, we describe the background of this work. In section 3, we describe the key aspects of our design. In section 4, we describe an overview of our implementation of Hi-sap on a Linux OS. In section 5, we present an evaluation of the system. In section 6, we describe the related work and discuss realization approaches of a server system. Finally, in section 7, we conclude.

¹ Dedicated to run server processes: apache, www-data, www, etc.
² Sets of public access files, directories, HTTP environment variables, etc.

2. Background

In this section, we describe the security in servers, server embedded interpreters, our previous work, Harache, and shared hosting services.

[Figure 1. Security in server]

[Figure 2. Security in server: server embedded interpreters]

2.1. Security in Servers

Existing servers run under the privilege of a dedicated user. Thus, it is required to grant read and execution permission to an “other” defined by the UNIX permission model, “owner/group/other”. Even if HTTP authentication is used to protect content against outside attacks, internal customers that share the same server can steal, delete, or tamper with this content (Figure 1).

It is required to use both suEXEC [9, 10, 11] and POSIX ACL (hereinafter, “suEXEC & POSIX ACL”) to solve these problems. First, read and execution permission³ of public access files are granted to only the dedicated user by using POSIX ACL. Therefore, content can be published without granting any permission to an “other”. Second, CGI scripts run under the privilege of the site owner by using suEXEC. Therefore, suEXEC & POSIX ACL can prevent malicious CGI scripts of other customers that share the same server from stealing, deleting, or tampering with content.

³ Data files of weblogs and wikis additionally require write permission.

2.2. Server Embedded Interpreters

Recently, dynamic content like weblogs, wikis, and CMSs is widely used. A CGI has been used to execute dynamic content. However, processing dynamic content at high speed is difficult for a CGI because it requires fork() and execve() for every request. Therefore, server embedded interpreters have been used as an alternative to CGI. They have server processes including interpreters of language processors. Server processes are used for many requests without termination. PHP scripts are usually executed by a server embedded interpreter rather than as a CGI. Similarly, server embedded interpreters are widely used as an alternative to CGI for other languages, such as Ruby, Perl, and Python [2, 3, 4].

However, server embedded interpreters cannot be used together with suEXEC. suEXEC has server processes terminate after each request because it is a CGI. Because embedded scripts that are executed by server embedded interpreters run under the privilege of the dedicated user, security in a server cannot be ensured (Figure 2).

2.3. Harache

We previously proposed a web server, Harache [13, 14]. It allows safe and convenient use of server embedded programs. In Harache, server processes run under the privilege of different users for every site (Figure 3). Therefore, Harache must grant permission of any content that includes embedded scripts only to an “owner”.

However, Harache cannot fully use the increased speed of server embedded interpreters because server processes terminate after each session.

Hi-sap allows high security in servers in the same way as Harache. In addition, it has high performance when processing dynamic content by fully using the increased speed of server embedded interpreters.

2.4. Shared Hosting Service

In shared hosting services, customers subscribe to hosting service providers for every site. They can store content on the disk space of the server by paying a small monthly fee.

First, web objects that are stored in a server are divided into partitions.
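The permission problem described in section 2.1 can be made concrete with a small sketch. The code below is an illustration only, not part of Hi-sap: it models the classic owner/group/other read check (ignoring POSIX ACLs and the administrator account), and the numeric user ids and the names `can_read` and `readers` are hypothetical. It shows that a file must carry “other” read permission when the server runs as a dedicated user, which also exposes it to other customers, whereas owner-only permission is enough when the server process runs as the file owner.

```python
import stat


def can_read(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Classic owner/group/other read check (no ACLs, no root override)."""
    if proc_uid == file_uid:          # the owner class is checked first
        return bool(mode & stat.S_IRUSR)
    if file_gid in proc_gids:         # then the group class
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)  # everyone else is an "other"


# Hypothetical numeric ids, chosen for illustration only.
USER_A, USER_B, DEDICATED = 1001, 1002, 33


def readers(mode):
    """Names of the example processes that can read a file owned by USER_A."""
    procs = {"userA": USER_A, "userB": USER_B, "dedicated": DEDICATED}
    return sorted(name for name, uid in procs.items()
                  if can_read(mode, USER_A, USER_A, uid, {uid}))


if __name__ == "__main__":
    # Server runs as the dedicated user: rw-/---/r-- is required,
    # so customer B can read customer A's file as well.
    print(readers(0o604))
    # Server process runs as user A: rw-/---/--- is sufficient.
    print(readers(0o600))
```

With mode rw-/---/r--, all three example processes pass the check; with rw-/---/---, only the process running as user A does, which is the situation that running server processes under a different user per partition aims for.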
[Figure 3. Harache. For a request such as GET /~userA/: (1) request to user A's website, (2) setuid to userA, (3) processing, (4) response to the client browser]

The biggest problem for hosting service providers is server footprints at the data center. Many customers must be accommodated in a server to make an acceptable profit. However, the amount of data transfer is generally limited in shared hosting services, and many personal sites receive little access. There could even be sites that are not accessed at all. Therefore, the number of customers accommodated on a server must be maximized by minimizing the computation resources for sites that are not accessed at all.

In shared hosting services, a charging system exists according to the amount of the data transferred. Dynamic content that transfers little data and uses a lot of CPU power and memory is executed for a nominal fee. This is not fair, because the dynamic content can cause harmful effects for other sites that share the same server.

3. Design

In this section, we describe the design of our proposed server system, Hi-sap. It can be used with UNIX-like OSes.

3.1. Design Principle

The design principles of the system are as follows.

• High security: Server processes run under the privilege of different users for every partition.

• High performance: The system pools server processes that run under the privilege of the different users.

• High scalability: Server processes are created and terminated dynamically.

Each partition is a site, a piece of content, or a QUERY_STRING⁴, according to the division method. To have high security in a server, server processes run under the privilege of different users for every partition in the same way as Harache.

Second, the system pools server processes that run under the privilege of the different users. This is different from Harache, whose server processes terminate after each session. Therefore, the system uses the increased speed of server embedded interpreters.

Third, the system creates and terminates server processes dynamically. As described in section 2.4, the scalability of the number of partitions in a server is an important factor in shared hosting services. In a web server, memory use strongly influences scalability. Therefore, in the system, we tried to save memory with a web-server-level scheduler, Content Access Scheduler. Because it creates and terminates server processes dynamically, the system can have high scalability of the number of partitions in a server.

3.2. Architecture

The system brings access control into operation with a secure OS. If the privilege of the administrator account is taken over because of a security hole or a misconfiguration, the access control of Hi-sap with different user privilege has no effect. Every web object can then be stolen, deleted, or tampered with by outside attackers. To prevent these incidents, the system ensures the security for each partition by using a secure OS.

An overview of the architecture of the system is shown in Figure 4. The system consists of a dispatcher and many workers. Each worker runs under the privilege of a different user and processes requests for a specific dedicated partition. The dispatcher is a reverse proxy server and distributes requests to workers.

Secure OSes have difficulty with transitions of user privilege. If the policy of a secure OS permits workers that run under the privilege of the administrator account to transfer to the privilege of ordinary users, then the security will have no effect. Therefore, in our system, workers run under the privilege of ordinary users initially.

3.3. Content Access Scheduler

We propose Content Access Scheduler, a web-server-level scheduler to enhance the scalability of the number of partitions in a server. It controls the creation and termination of workers. Its scheduling principles are as follows.
• Workers are created when a request occurs⁵.

• Running workers are terminated when the server is in a high-load state.

The scheduler can allow high scalability, in particular by optimizing the scheduling algorithm for content.

⁴ An argument given to a CGI script and an embedded script.
⁵ A worker of a partition that is not requested is not created permanently.

[Figure 4. Overview of architecture of Hi-sap]

[Figure 5. Partition: site]

3.4. Partitions

As described in section 2.4, customers subscribe to hosting service providers for each site. Also, current charging systems in shared hosting services do not take into account computation resource⁶ use.

In our system, the basic partition is a site. In Figure 5, worker A is dedicated to process requests for website A. Similarly, worker B is dedicated to website B, and worker C is dedicated to website C. Because server processes run under the privilege of different users for each site in the system, the computation resource use of each site can be easily measured. Also, the system can bring limitations of the computation resources for each site into operation.

In addition, the system extends partitions to content and QUERY_STRINGs. In Figure 6, website A in Figure 5 is divided into content. Worker A is dedicated to process requests for a wiki and a weblog, and worker A2 is dedicated to a CMS.

The measurement of the computation resource use and the limitations of computation resources can be computed for each content. Therefore, even if a server process that processes a request for a content runs out of control, other content in the site is not affected. Also, QUERY_STRINGs are used for page switching and operations, for example browse, edit, preview, and save, in weblogs and wikis. The system provides these fine-grained access controls at the web server level.

⁶ They are CPU and memory, etc.

4. Implementation

In this section, we describe the implementation of Hi-sap based on the design of the previous section. We implemented the system on a Linux OS. The dispatcher was implemented as an Apache module, mod_hisap, on Apache HTTP Server ver. 2.0.55. 1,000 instances of Apache HTTP Server ver. 2.0.55 were used as workers. Each worker waits for requests at a unique port. Also, Content Access Scheduler and other management facilities of workers were implemented as a daemon, hisapd. An overview of a request processing of the system is shown in Figure 7. The details of the dispatcher and hisapd are as follows.

[Figure 6. Partition: content]

[Figure 7. Overview of request processing of Hi-sap]

4.1. Dispatcher

When the dispatcher receives a request, for example for partition C in Figure 7, from a browser (Figure 7 (1)), it confirms whether the dedicated worker for partition C is active (Figure 7 (2)). If the worker (worker C) is inactive, the dispatcher asks hisapd to activate it (Figure 7 (3)). The communication method between the dispatcher and hisapd is a UNIX domain socket. Then Worker ID, an identifier of the requested worker, is recorded in a dedicated log file, Worker Request Log. After hisapd activates the worker (Figure 7 (4)), the dispatcher forwards the request to the worker (Figure 7 (5)). The worker processes the request (Figure 7 (6)). Then the dispatcher receives a response from the worker (Figure 7 (7)) and sends the response to the browser (Figure 7 (8)).

4.2. hisapd

As described in section 4.1, hisapd dynamically activates workers after requests from the dispatcher. This is the algorithm of Content Access Scheduler for worker activation.

In addition, an algorithm exists for worker termination. When thrashing occurs, hisapd terminates workers that have not been requested recently. Thrashing decreases the performance of web servers dramatically. The conditions for which hisapd judges that thrashing occurs are as follows.

• A swap-in occurs.

• A swap-out occurs.

• Memory use is 99% or more.

hisapd checks for these conditions every 5 seconds. When all of the conditions are met, hisapd terminates workers. Also, the conditions for which hisapd chooses workers to terminate are as follows.

• The worker is active.

• The worker is not recorded in the most recent 10,000 requests of the Worker Request Log.

Pseudo LRU is used to reduce the search time for the Worker Request Log. In Figure 7, hisapd chooses worker A (Figure 7 (i)) and terminates it (Figure 7 (ii)).

5. Evaluation

In this section, we present the results of our evaluation experiments of Hi-sap. The hardware configuration of the experimental environments is shown in Table 1.

Table 1. Hardware configuration of experimental environments

Network
  Switching Hub   DELL PowerConnect 2724, 1000 BASE-T ×24
Client
  CPU             Intel Pentium III Xeon 500 MHz ×4
  Memory          256 MB (swap 512 MB)
  OS              Fedora Core 4 (Linux 2.6.14)
  NIC             Intel PRO/1000XT 1 Gbps
Server
  CPU             AMD Opteron 240EE 1.4 GHz ×2
  Memory          4 GB (swap 8 GB)
  OS              Fedora Core 4 (Linux 2.6.14)
  NIC             Broadcom BCM5704C 1 Gbps

5.1. Basic Performance Evaluation

We evaluated the basic performance of Hi-sap in processing dynamic content to determine its usefulness. An Apache HTTP Server ver. 2.0.55 (Apache) and Apache that enables suEXEC (suEXEC) were used for comparisons. In the system and in Apache, a PHP script is executed by the server embedded interpreter. However, in suEXEC, a PHP script is executed as a CGI. The system, Apache, and suEXEC used the default configuration files. We used the httperf benchmark ver. 0.8 to measure the performance.

We sent requests to a PHP script and measured the response throughput. The script calls phpinfo(), which displays the system information of the PHP language processor. The traffic of the script is 40 KB per request. The results are shown in Figure 8. The x-axis shows the request frequency, and the y-axis shows the throughput. The system loses an average of 28.0% of the throughput relative to Apache. However, the system has high throughput relative to suEXEC. The overhead of the system is because of a reverse proxy. Therefore, this implementation is effective.

This experiment demonstrates the system has high performance while ensuring security in a server.

[Figure 8. Basic performance evaluation. x-axis: request frequency (#N/s); y-axis: throughput (#N/s); series: Apache, Hi-sap, suEXEC]

5.2. Scalability Evaluation

We evaluated the scalability of Hi-sap in processing dynamic content. The One-to-one approach was used for comparison. One-to-one uses networks with a reverse proxy, and has a dispatcher and many workers that are dedicated to process requests for each partition. Although One-to-one is similar to our system, mod_hisap and hisapd are not installed. All of the workers run from beginning to end. This experiment is intended to determine the effectiveness of Content Access Scheduler.

We sent 100 requests to a PHP script on each partition sequentially and measured the response throughput. The script was the same as that described in section 5.1. We used Apache HTTP server benchmarking tool ver. 2.0.41-dev included with Apache. The results are shown in Figure 9. The x-axis shows the number of partitions in the server, and the y-axis shows the throughput. Our system had substantially higher throughput than One-to-one from beginning to end. Also, the throughput decrement in the system due to an increase in the number of partitions was low. For One-to-one, the OS crashed due to a memory shortage when the number of partitions was 600.

The change of the memory use in the experiment is shown in Figure 10. The x-axis shows the number of partitions in the server, and the y-axis shows the memory use. The swap use of One-to-one dramatically increased due to an increase in the number of partitions. This is the reason for the crash of the OS. However, our system does not use swap space as much because of Content Access Scheduler.

This experiment demonstrates the system has high scalability for the number of partitions in a server.

[Figure 9. Scalability evaluation. x-axis: number of partitions (#N); y-axis: throughput (#N/s); series: Hi-sap, One-to-one]

[Figure 10. Scalability evaluation: memory use. x-axis: number of partitions (#N); y-axis: memory use (%); series: One-to-one memory, Hi-sap memory, One-to-one swap, Hi-sap swap]

Table 2. Comparison of approaches

                      Security in a Server   Basic Performance   Scalability      Generality
Apache                very bad               very good           good             good
suEXEC & POSIX ACL    good                   very bad            good             good
Sandbox / VM          very good              very good           bad / very bad   good
PHP safe mode         good                   very good           good             very bad
Apache perchild MPM   good                   -                   bad              good
Harache               good                   bad                 good             good
Hi-sap                very good              good                good             good

6. Related Work

In this section, we describe the related work about the security in a server and discuss approaches for creating a server system.

6.1. Sandboxes and VMs

Many sandboxes and virtual machines (VMs) have been proposed. These mechanisms isolate server software and OSes running server software from the rest of a server machine.

If a sandbox or a VM is assigned to every partition, it has high security in a server. However, the computation resource use per partition dramatically increases by using these mechanisms. This strongly influences the scalability of the number of partitions in a server.

6.2. Language Processor

PHP has a safe mode. This mechanism tries to create high security in a server by restricting the operations of PHP scripts. The restriction items are as follows.

• File handling is permitted only when the owner of the script is the same as the owner of the file that the script is about to handle.

• File handling is permitted only below specific directories.

• Environment variables that can be changed are restricted.

• Specific functions and classes are disabled.

However, this mechanism depends on the language processor and is not commonly used. Also, this mechanism is hard to use. In contrast, our system provides a general security mechanism.

6.3. Apache

perchild MPM is included in Apache HTTP Server ver. 2.0. In this mechanism, the user account and group account that executes server processes can be set for each site. Although this mechanism may create high security in a server, no reports show that it runs stably, and its development ended while still in the experimental phase. Also, dedicated server processes must be created initially for every site. This influences the scalability of the number of partitions in a server.

6.4. Discussion of Approaches

A comparison of the approaches of the different server systems is shown in Table 2. Normal Apache cannot have high security in a server. suEXEC & POSIX ACL has very low performance. Sandboxes and VMs have low scalability. PHP safe mode has low generality. Apache perchild MPM has low scalability and its performance is unknown because no reports show whether it runs stably. Harache has high security in a server, high scalability, and high generality. However, its performance is low.

Our system gets high marks in all items and does not have any weak points. Therefore, it is the most effective.

7. Conclusion

We designed Hi-sap, a secure and high-performance web server system, and we implemented it on a Linux OS. We described the security in a server and shared hosting services and mentioned their problems. We evaluated our system. Our results demonstrate the system has an advantage over other approaches in performance and scalability.

Acknowledgments

This work was supported in part by the Exploratory Software Project of the Information-technology Promotion Agency, Japan.

References

[1] PHP: Hypertext Preprocessor. http://www.php.net/
[2] mod_ruby. http://modruby.net/
[3] mod_perl. http://perl.apache.org/
[4] mod_python. http://www.modpython.org/
[5] Y. Goland, E. Whitehead, A. Faizi, S. Carter, and D. Jensen. HTTP Extensions for Distributed Authoring - WEBDAV, RFC 2518, 1999.
[6] WikiWikiWeb. http://c2.com/cgi/wiki?WikiWikiWeb
[7] S. Goodwin and R. Vidgen. Content, content, everywhere...time to stop and think? The process of Web content management, IEE Computing & Control Engineering Journal, Vol.13, No.2, pp. 66-70, 2002.
[8] D. Hara and Y. Nakayama. Design and Implementation of a Secure and High-performance Web Server, Proc. IPSJ 47th Programming Symposium, pp. 71-78, 2006 (in Japanese).
[9] Apache HTTP Server. http://httpd.apache.org/
[10] Nathan Neulinger. CGIWrap: User CGI Access. http://cgiwrap.unixtools.org/
[11] Sebastian Marsching. suPHP. http://www.suphp.org/
[12] A. Grunbacher. POSIX Access Control Lists on Linux, Proc. FREENIX Track: 2003 USENIX Annual Technical Conference, pp. 259-272, 2003.
[13] D. Hara, R. Ozaki, K. Hyoudou, and Y. Nakayama. Harache: A WWW Server Running with the Authority of the File Owner, Journal of Information Processing Society of Japan, Vol.46, No.12, pp. 3127-3137, 2005 (in Japanese).
[14] D. Hara, R. Ozaki, K. Hyoudou, and Y. Nakayama. Design and Implementation of A Web Server for A Hosting Service, Proc. the 9th IASTED International Conference on Internet and Multimedia Systems and Applications, pp. 69-74, 2005.
[15] P. Loscocco and S. Smalley. Integrating Flexible Support for Security Policies into the Linux Operating System, Proc. FREENIX Track: 2001 USENIX Annual Technical Conference, pp. 29-40, 2001.
[16] D. Mosberger and T. Jin. httperf - A Tool for Measuring Web Server Performance, Proc. the 1st Workshop on Internet Server Performance, pp. 59-67, 1998.
[17] P.-H. Kamp and R. N. M. Watson. Jails: Confining the omnipotent root, Proc. the 2nd International System Administration and Networking Conference, 2000.
[18] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization, Proc. the 19th ACM Symposium on Operating Systems Principles, pp. 164-177, 2003.