IBM BlogCentral Version 2 Architecting for Performance
Authors: Harold Hall, Rahul Jain, Helen Wu (High Performance On Demand Solutions team)
Web address: ibm.com/websphere/developer/zones/hipods
Project contact: hallh@us.ibm.com, rahulj@us.ibm.com, yuewu@us.ibm.com
Management contact: Christopher Roach (roachman@us.ibm.com)
Date: November 16, 2006
Status: Version 1.0
Abstract: BlogCentral is an intranet blogging platform that allows IBM employees around the world to share opinions on work-related topics. The office of the IBM CIO asked the HiPODS team to study BlogCentral performance and implement changes to improve response times and throughput. This paper describes the HiPODS effort to develop and deploy BlogCentral Version 2 with significantly improved performance. It includes a description of lessons learned, particularly in the areas of database tuning, and options for further performance improvements.
© Copyright IBM Corporation 2006
Executive summary
BlogCentral is an intranet blogging platform that allows IBM employees around the world to share opinions on work-related topics. BlogCentral Version 1 entered production in July 2003. The office of the IBM CIO asked the High Performance On Demand Solutions (HiPODS) team to develop and deploy BlogCentral Version 2. This paper describes the performance testing and tuning the HiPODS team did to ensure that Version 2 would meet its enterprise performance goal of matching or exceeding the performance of Version 1. The specific performance targets were derived by analyzing limited Version 1 production data. The Version 2 targets were:
• Number of concurrent users and throughput of 2.5 hits a second
• Response times of ten seconds or less (transaction dependent)
The test results demonstrate that the throughput and response times for Version 2 are significantly improved over those of Version 1, exceeding the performance targets derived from Version 1 by a wide margin. The considerably higher performance in Version 2 is attributed to the application level cache built into Roller¹ Version 2. Because the Roller cache is so critical to the performance of the application, a recommendation was made for the cache size to be used when Version 2 was initially deployed in production. However, changes in the application's use or volumes may necessitate a modification of the cache size; cache hit ratios should be monitored closely.
In addition to the caching and careful tuning of the environment that accomplished the performance gains of Version 2, other performance lessons were learned:
• Review by an external team is recommended
• Alternative implementations can satisfy requirements yet give different performance results
• Every significant application code path should be soak tested
Finally, these steps have been identified for potential further improvement of BlogCentral performance:
• Modify the Roller application caching method
• Explore the consolidation of the application and database servers on Power 5 hardware
• Eliminate the Hibernate persistence layer and use direct Java™ Database Connectivity (JDBC) calls to IBM® DB2®
In summary, Version 2 has been in production since March 2006 with an order of magnitude improvement in response time over the previous version. Nevertheless, there are options for further performance improvement that can be pursued as warranted.
¹ Roller is open source software that is made available under the Apache Software License.
Note: Before using this information, read the information in “Notices” on the last page.
Contents
Executive summary
Contents
Introduction to the architecture of BlogCentral
Architecting for performance
Performance goals
  Workload characteristics
  Response time targets
  Summary of performance targets data
  Additional goals for performance testing
Description of performance tests
  Data
  Test scenarios
  Testing methodology
  The role of caching
  Metrics
Results of performance tests
  Highlights
  Soak test
Lessons learned
  Invite review by an external team
  Realize alternative implementations can satisfy requirements yet give different performance results
  Soak test every significant code path
Results of a PageDetailer analysis
Tuning recommendations
Next steps
  Modify the Roller application caching method
  Explore the consolidation of the application and database servers on IBM System p5 servers
  Eliminate hibernate persistence layer and use direct JDBC calls to DB2
Description of performance tests
Appendix A: Performance data of BlogCentral Version 1
Appendix B: Expensive SQL query for BlogCentral dashboard content
Appendix C: Methodology for tuning DB2
Acknowledgements
Notices
Introduction to the architecture of BlogCentral
BlogCentral is a blogging application on IBM’s intranet that enables IBM employees worldwide to communicate about work-related topics. BlogCentral Version 1 went into production in July 2003. HiPODS developed BlogCentral Version 2. BlogCentral is one of the projects under the IBM Technology Adoption Program (TAP), run by the office of the IBM CIO to provide an environment for innovators to test new products and solutions internally.
IBM BlogCentral is based on the Apache Roller project, which is an open source development effort to create a blogging application based on the Java 2 Platform, Enterprise Edition (J2EE) architecture. Roller uses Tomcat for the servlet container runtime and MySQL for the database. BlogCentral Version 2 uses IBM® WebSphere® Application Server Version 6.0 for the servlet container runtime and IBM® DB2® Universal Database™ for the database.
The Version 2 application consists of three main components (Figure 1):
1. Version 2 blogging engine. This is the blogging engine runtime component.
2. Version 2 search application. This component interfaces with IBM® WebSphere® Information Integrator OmniFind™ Edition and provides advanced search capabilities for BlogCentral data.
3. Version 2 ToSearchDataFeeder application. This application extracts new and updated content in BlogCentral and pushes it to the search server for indexing so that it can be searched.
[Figure 1 diagram: BlogCentral components: BlogCentral App, BlogCentral Search App, BlogCentralToSearchDataFeeder App]
Figure 1. Key components of BlogCentral solution
Version 2 runs on IBM WebSphere Application Server Version 6.0 and stores all the data in an IBM DB2 Universal Database. IBM® HTTP Server Version 6 is the Web server used to intercept BlogCentral traffic, which is then routed appropriately to the application server. The Web server and the application server are located on the same physical machine. The operating system on the application server box is Red Hat® Enterprise Linux – Advanced Server Edition Version 4; the operating system on the database server is IBM® AIX® Version 5. This topology is illustrated in Figure 2.
[Figure 2 diagram. Clients: Web browser (HTML/HTTP), news reader (RSS/HTTP), blogger (XMLRPC/HTTP), Workplace client; addresses shown: blogs.svl.ibm.com and Http://blogs.tap.ibm.com/weblogs. Web server: IBM HTTP Server 6.0 with plug-in, Linux RHAS v4, xSeries 345. App server: WebSphere 6.0.2, Linux RHAS v4, xSeries 345, running the BlogCentral v2 App (based on Roller 2.0): JSP/VM, Struts servlet, business logic and value objects, persistence (Hibernate), JDBC. DB server: DB2 UDB 8.2 with BCv2 data, AIX 5.3, pSeries 630. Search server: OmniFind Server 8.2.2, AIX 5.3, pSeries 690 LPAR, with Web server, WAS, BCv2 Search App, search servlet, data listener, crawler, indexer, index, and search DB, connected over TCP/IP and JDBC. SVL TSM server: backup data.]
Figure 2. Topology diagram of BlogCentral solution
Architecting for performance
Achieving optimum application performance requires knowledge of the most frequently used features of the application. Performance is often characterized by how the application responds to these most frequent operations. As input to the development of BlogCentral Version 2, we determined the most frequent BlogCentral operations by analyzing the Web server traffic logs (IBM HTTP Server access logs) for Version 1. Our analysis revealed these BlogCentral operations were the most frequently used:
1. Browser based traffic to the BlogCentral dashboard
2. RSS feed traffic to the BlogCentral dashboard
3. Browser based traffic to individual blogs in BlogCentral
4. RSS feed traffic for individual blogs in BlogCentral
5. Posting of new entries
6. Posting of comments to existing entries
We used this list to quantify the performance goals.
Performance goals
The architecture of BlogCentral Version 2 changed major application components, namely, the servlet container runtime and the database. Because of the significance of these changes, we decided that the Version 2 performance goal would be to deliver, at a minimum, the performance characteristics of BlogCentral Version 1. The performance goals are expressed in terms of the following metrics:
• Response time
• Number of concurrent users
• Throughput (expressed as transactions per second)
The first task was to quantify Version 1 performance. The only data available was the Web server access logs. These logs contain data from which concurrency and throughput information can be extracted, but not response time information. In addition, the distribution of the workload, that is, the frequency of invocation for each operation in the application, can be determined. The results of this analysis are presented in the Workload characteristics section.
There was no historic data on which to base response time goals. Therefore, response time performance targets for Version 2 were determined empirically by measuring how long each operation took on Version 1 when invoked manually in the browser. See Response time targets for details.
Workload characteristics
The analysis of the IBM HTTP Server access logs from 28 December 2005 to 3 February 2006 provided the distribution of traffic among the operations that were identified as the primary performance drivers for Version 1. Access to blogs through RSS was by far the most used function, representing 68% of the traffic, while editing of blog entries accounted for only 1%. Figure 3 shows the distribution percentages for the 37 days.
Figure 3. Distribution of BCv1 traffic among key operations (based on 37 days of data)
Next, we analyzed the log data to determine the distribution of traffic when operations are classified only as read versus write. The read category consisted of the top five BlogCentral operations (Figure 3). The only write operation was editing blog entries. It was not surprising to find that the majority of the Version 1 traffic was read-only. For the time period, the read-only traffic was quantified at 94%; write operations accounted for only 1% (Figure 4). With this data it can safely be inferred that the performance of the BlogCentral server is best characterized by how it processes read-only traffic.
From an architectural perspective, an application whose performance is characterized by read-only operations is an excellent candidate for, and could probably benefit highly from, some form of caching. This topic is addressed in The role of caching.
Figure 4. Version 1 traffic: read vs. write operations
To determine the degree of concurrency Version 1 had to process for RSS traffic over one day, we analyzed an arbitrarily chosen date, 16 January 2006. The distribution of RSS traffic to various blogs was plotted (Figure 5).
Figure 5. Version 1 RSS traffic to various blogs over a sample 24-hour period
By averaging the transaction rates over longer intervals than are shown in Figure 5 we concluded that Version 1 saw, on average, a concurrency of about 2.5 hits a second over the course of the day. We also observed that Version 1 had peaks of 19 hits a second during this day. The traffic was bursty and peaks were generally followed by rather few hits in the next couple of seconds. However, the traffic appeared remarkably consistent over the day, and did not appear to have specific times of the day when traffic would be unusually high or low.
In the absence of any data on the kind of throughput Version 1 was processing, we assumed that the concurrency numbers observed, 2.5 hits a second, also represented the throughput for Version 1.
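The average and peak figures above can be derived from access log timestamps along the following lines. This is only a minimal sketch, not the analysis tooling the team used: it assumes Apache-style log lines with a bracketed timestamp such as [16/Jan/2006:10:00:00 -0500], the function name hits_per_second is hypothetical, and it averages over the whole observed span rather than over the longer intervals mentioned above.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of an Apache-style access log entry,
# for example [16/Jan/2006:10:00:00 -0500].
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hits_per_second(log_lines):
    """Return (average, peak) hits per second for the given log lines."""
    per_second = Counter()
    for line in log_lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # skip lines without a bracketed timestamp
        stamp = datetime.strptime(line[start + 1:end], LOG_TIME_FORMAT)
        per_second[stamp] += 1
    if not per_second:
        return 0.0, 0
    first, last = min(per_second), max(per_second)
    # Idle seconds inside the span count toward the average.
    span_seconds = (last - first).total_seconds() + 1
    average = sum(per_second.values()) / span_seconds
    peak = max(per_second.values())
    return average, peak
```

Because idle seconds lower the average but not the peak, bursty traffic of the kind described above shows up as a large gap between the two numbers.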
Response time targets
Historic Version 1 response time data was not available. To estimate response times for key operations, the test team invoked these key operations manually in a browser and recorded the wall clock elapsed time. Appendix A contains a table showing the key metrics results.
Summary of performance targets data
We established Version 2 performance targets (Table 1) based on the analysis of Version 1 data. The percentage of the transactions was slightly modified from the Version 1 distribution represented by Figure 3. This was due to the allocation of the load drivers in HP Mercury LoadRunner in the testing environment. To make the test representative of the real-world scenario, we used the average size of the Version 1 dashboard page and blog page that we obtained from the Version 1 analysis. Therefore, in the testing environment, the dashboard page size was 28 KB, and the heavy sample user blog page size was 46 KB.
Table 1. Version 2 performance targets
BlogCentral operation                     Percent traffic (Version 1 traffic mix)   Average TPS   Response time (Version 1)
Access to blogs with RSS                  69%                                       3             N/A
Access to dashboard with RSS              13%                                       2             N/A
Access to blogs with browser (HTML)       8%                                        1             10 sec
Access to dashboard with browser (HTML)   3%                                        1             3 sec
Access to comments with browser (HTML)    2%                                        1             10 sec
Edit blog entries                         5%                                        1             5 sec
Additional goals for performance testing
In addition to ensuring that Version 2 met its performance targets, determining the capacity of the chosen hardware, in terms of how much traffic volume it could support, was of interest. The testing goals were to answer these performance and scalability questions:
1. How many concurrent users can be supported by Version 2 with no loss in quality of service (response time less than three seconds) on the given hardware?
2. Are there any bottlenecks in Roller that prevent scalability, that is, prevent achieving 100% CPU utilization?
3. Are there any bottlenecks in deployment that prevent scalability?
4. What are the optimal configuration settings in the environment for maximum throughput?
5. What is the effect of Roller caching on throughput?
6. What is the overhead of accessing servers, such as w3 and bluepages, that are external to BlogCentral?
Lastly, it was important to understand the stability of the Version 2 application by executing a soak test. The soak test required Version 2 to serve simulated traffic (using the combined workload) at 60% CPU utilization for a 24-hour period. This test would be an indicator of the stability of the Version 2 application and the middleware settings, while also providing indicators on the rate of growth of both data and logs. Knowing the rate of growth of data and logs is useful in instituting appropriate processes for maintaining the file system size and archiving log files. Production level monitoring metrics were enabled on the middleware for all the performance and soak tests.
Description of performance tests
To simulate the anticipated real-world scenario and identify potential performance issues, HP Mercury LoadRunner Version 8.1 was used in conjunction with HiPODS’ performance testing best practices. We used a pre-GA release of Version 2 for testing. Roller cache was set to 300, and the JVM was specified with a minimum and maximum size of 512 MB.
Data
The data used for the results presented below is a 12 October 2005 snapshot of the BlogCentral Version 1 database that was migrated to BlogCentral Version 2 schemas using the custom data migration application on 6 November 2005.
Test scenarios
After analyzing the distribution in the data snapshot, we selected five key scenarios representing 95% of the Version 1 transactions:
1. View blog (a standard weblog testing sample was used as the benchmark)
2. View entry
3. View dashboard
4. Get RSS feed for blog
5. Get RSS feed for dashboard
We also selected some secondary scenarios. Although these scenarios represent a small amount of the total traffic, they involve committing updates to the database, which cause potential performance bottlenecks. Testing these scenarios would provide insight into database behaviors during heavy load on the application. The transactions selected were:
• Create a new blog entry
• Add a comment to an existing blog entry
Testing methodology
The first step in the testing methodology was to establish a baseline for performance. The performance baseline provides a benchmark for comparing the performance results of any changes instituted in the testing environment. Hence, establishing a performance baseline is a best practice when tuning performance and is highly recommended. For Web-based applications, the recommended starting point for establishing the performance baseline is a test run using one virtual user (vuser) with no think time. The response times recorded during this test run are the best that the application will provide for the given configuration. At higher loads, that is, more concurrent virtual users, response times will only degrade because processing concurrent loads always has a certain amount of overhead.
The first set of test runs was the execution, independently of each other, of the scenarios (see Test scenarios). The individual scenarios were run several times, each time with a different number of vusers ranging from one vuser to approximately forty vusers. These test runs provided specific scenario behaviors under different loads. This data is useful both in projecting the performance results of any mix of workloads, and in analyzing the performance of the mixed workloads.
The next set of tests, which established the Version 2 performance baseline, was performed using a mixed transaction workload that was designed to simulate production traffic. This was accomplished by using the same transaction mix, both by type of transaction and volume, as that recorded on Version 1 on 12 October 2005 (Table 1). As was done with the individual scenarios, the mixed workload scenario was executed using from one vuser to approximately forty vusers. Each test ran for six minutes, including the vuser ramp up and ramp down times. The test script was run using no think time in order to maximize the load on the application with a minimal number of concurrent vusers.
To record the performance metrics while the system was in a steady state, the thirty seconds of ramp up time and the thirty seconds of ramp down time were excluded from the reported results (Table 2).
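The steady-state calculation described above can be sketched as follows. This is an illustrative example only, not LoadRunner's own reporting logic: the function name steady_state_metrics and the sample layout (seconds since test start, response time in seconds) are assumptions, while the six-minute duration and thirty-second ramps come from the text.

```python
def steady_state_metrics(samples, test_seconds=360.0, ramp_seconds=30.0):
    """Summarize transactions that completed inside the steady-state window.

    samples: iterable of (offset_seconds, response_time_seconds) pairs,
    where offset_seconds is measured from the start of the test run.
    """
    window_start = ramp_seconds
    window_end = test_seconds - ramp_seconds
    window = window_end - window_start
    # Keep only transactions outside the ramp up and ramp down periods.
    kept = [rt for offset, rt in samples if window_start <= offset < window_end]
    if not kept or window <= 0:
        return {"transactions": 0, "tps": 0.0, "avg_response": None}
    return {
        "transactions": len(kept),
        "tps": len(kept) / window,
        "avg_response": sum(kept) / len(kept),
    }
```

With the defaults, a six-minute run yields a 300-second measurement window, so slow transactions during ramp up or ramp down do not distort the reported averages.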
The role of caching
As mentioned in Workload characteristics, an application whose performance is characterized by read-only operations is an excellent candidate for, and could probably benefit highly from, some form of caching. Not surprisingly, Apache Roller developers exploited this read-only workload characteristic, and implemented a cache that stores the fully composed dashboard page and the blog pages. They most likely realized that there will always be many more users who are reading blogs than those who are authoring blogs. The Highlights section of the results shows the performance difference for data served from the cache versus data retrieved from the database.
Caching considerations
When testing performance, it is important to keep in mind the effect of caching in the various middleware tiers. One such cache is the buffer pool cache in the database tier. IBM DB2, like most databases, provides a buffer pool cache. DB2 buffer pools allow the database to keep the most frequently used data in memory on the database server. As an application submits various SQL queries against the database, the database reads the data those queries need from persistent storage (hard disks) into memory on the database server. This data remains in the buffer pool cache according to a least recently used (LRU) policy. Data in memory can be accessed much faster than data on hard disk. Hence, for performance reasons, database tuning includes configuring the database server with as large a buffer pool as possible. An application like BlogCentral, where 95% of the workload is read-only, can substantially improve performance if the buffer pool is sized to hold the working set of data.
The caching behavior that is used by database products must be kept in mind while analyzing the results of the performance test. Take the example of a test scenario that visits the blogs of the top 200 most active users in BlogCentral. With a large enough database server, the buffer pool size could be configured to hold all the data for these 200 BlogCentral users. This cache implementation would result in the highest possible throughput of the database server. However, these test results are not particularly useful because the test scenario does not represent the production workload. In a BlogCentral production workload, blogs other than those by the 200 most active users will be visited on any given day.
Thus, a more useful test scenario would create a mix of database accesses that enables some requests to be served from the buffer pool cache, while other requests require the database to retrieve the data from disk. Retrieving data from disk causes the database server to incur the processing overhead of cache cleanup. Therefore, a representative workload that is run using an appropriately cached database will provide performance test results that can be helpful in projecting production performance.
A similar consideration applies to the cache available in the BlogCentral (Roller) application. Roller has an application level cache where it stores the fully rendered blog pages, thereby avoiding the overhead of dynamically composing the blog page every time a user reads it. The size of the Roller cache is a run time configuration option. As explained previously, the volume of test data needs to accurately represent production traffic for test results to provide meaningful production performance projections.
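The effect of working-set size on an LRU cache can be illustrated with a small simulation. This is a hedged sketch of a generic LRU cache, not Roller's cache implementation or DB2's buffer pool; the class name LruPageCache, the replay helper, and the capacity of 200 entries are assumptions chosen to mirror the 200-blog example above.

```python
from collections import OrderedDict


class LruPageCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        value = load(key)  # stands in for rendering a page or reading disk
        self.pages[key] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(cache, keys):
    """Send a sequence of blog identifiers through the cache."""
    for key in keys:
        cache.get(key, load=lambda k: f"rendered page for blog {k}")
    return cache.hit_ratio()
```

Replaying 2,000 requests that cycle over 200 blogs through a 200-entry cache misses only on the first pass, whereas cycling over 400 blogs evicts every entry before it is requested again. A test that touches only the cached working set therefore reports a far better hit ratio than a production mix would.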
Version 2 testing with caching
The focus of this performance testing was on the behavior of the Version 2 application in the application server tier (WebSphere). Hence the buffer pool in the database server was given a size large enough to hold almost all the testing data. By doing so, the likelihood of the application running out of resources in the application server tier first as opposed to the database tier was significantly increased.
To observe the effects of caching on the performance of Version 2, we ran two test cases that explicitly focused on caching. The first test case was a worst case scenario for performance; the second a best case scenario. The mixed transaction workload was used for both.
Worst case scenario: First the workload was run with no caching in either the application or database server. This forces the application and database servers to process all transactions as if they were the first transaction, that is, without any previously accessed data being available. Physical disk reads will be required for all data. The results of this test demonstrate the poorest performance that might be expected.
Best case scenario: The workload was run with caching in both the application and the database servers. The cache enabled run demonstrates the best performance that might be expected.
Knowing both performance extremes is important because the performance of the application in production will fall somewhere within this range when the environment is optimally tuned.
Finally, additional caching tests were executed to understand different dimensions of the performance behavior of Version 2:
With maximum caching:
• View the same blog repeatedly
• Guarantee 100% cache hit (Roller and DB buffer)
• No emulation of browser cache
With minimum caching:
• Run in authenticated mode (because the code path taken is different if you are viewing your own blogs)
• Hit 200 unique blogs
• Run with Roller cache and IBM® DB2® buffer pool cache disabled (100% cache miss)
Creating test data
One challenge in performance testing is creating the data. It is important to generate representative data to ensure valid testing. The methods we used to generate the required data are:
• Use the most active 200 blogs from the November data
• Select 200 test IDs (these have to be actual IDs because some portions of the blog page, such as photos, are populated from w3 bluepages) and replicate the weblog testing sample contents for these 200 chosen users
Metrics
These metrics were monitored and recorded for each server during every performance test run to understand the behavior of the application and the utilization of hardware resources:
• CPU utilization
• Memory consumption
• Disk I/O
• Network traffic and status
• Application response time
• Transactions per second (TPS)
Results of performance tests
Highlights
Table 2 shows that the throughput and response times for Version 2 are significantly improved over those of Version 1, exceeding the performance targets derived from Version 1 by a wide margin. The considerably higher performance in Version 2 is attributed to the application level cache built into Roller. Because the Roller cache is so critical to the performance of the application, a recommendation was made for the cache size to be used when Version 2 was initially deployed in production. The recommendation was based on the workload used during testing; the workload was designed to simulate production transactions to date. However, changes in the application’s usage or volume may necessitate a modification of the application cache size. To determine whether the Roller cache size warrants recalculation, cache hit ratios should be monitored closely. The Version 2 performance data shown in Table 2 was extracted from the performance test run book. The tests were run in February 2006, prior to the Version 2 launch.
Table 2. Highlights of performance test results
                                             Performance targets -- Version 1             Performance runs -- Version 2
BlogCentral operation                        Percent   TPS     TPS      Response          Percent   TPS     TPS      Response     App    DB
                                             of total  (avg)   (peak)   time (sec)        of total  (avg)   (peak)   time (sec)   CPU    CPU
                                             workload                                     workload
Access to blogs through RSS                  69        3       19       -                 100%      77      131      0.15         89%    6%
Access to dashboard through RSS              13        2       3        -                 100%      31      33       0.2          93%    5%
Access to blogs through browser (HTML)       10        1       4        10                100%      155     283      0.05         60%    3%
Access to dashboard through browser (HTML)   3         1       3        3                 100%      446     476      0.013        74%    1%
We tested all new Version 2 code for performance before it was deployed in production. The goal of this testing was to identify any performance degradation. Tables 3 and 4 present the results of test runs that show the difference in performance of BlogCentral when application level caching was enabled and when it was disabled.
Table 3. Performance test results when Roller cache is enabled
Build number: BCv209, timestamp 20060608092258. JVM size (min/max): 512/512.
Transactions    Roller cache (page)   TPS average   TPS peak   Response time (sec)   App CPU percent use
Dash_Browser    400                   3.7           4          0.263                 1%
Dash_Browser    400                   8.5           9.2        1.14                  2.5%
Entry_Browser   400                   6.7           10         3.42                  41%
RSS_Dash        400                   218           230        0.004                 18%
RSS_Dash        400                   512           720        0.033                 85%
RSS_Entry       400                   9             9.4        0.108                 25%
RSS_Entry       400                   28.6          31         0.639                 99%
Table 4. Performance test results when Roller cache is disabled

Build number: BCv209, timestamp 20060608092258. JVM size (min/max): 512/512.

Transactions          TPS average   TPS peak   Response time (sec)   App CPU percent use
Dash_browser          0.5           0.61       1.9                   1%
Dash_browser          1             1.2        1.9                   2%
Entry_browser         0.137         0.36       6.7                   24%
Entry_Browser_Luis    0.066         0.088      14.9                  1%
Entry_Browser_Luis    0.13          0.19       14.9                  1.80%
Entry_Browser_Luis    0.259         0.33       14.8                  3.3%
Entry_Browser_Luis    0.504         0.64       14.9                  6.70%
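The effect of the Roller cache in Tables 3 and 4 can be expressed as a throughput ratio. The sketch below uses the average TPS values from the two tables. Pairing the Dash_Browser and Entry_Browser rows by their order of appearance is an assumption, because the paper does not state the load level behind each row.

```python
# (transaction, TPS average with cache, TPS average without cache)
# Values are copied from Tables 3 and 4; pairing rows by position is an assumption.
ROWS = [
    ("Dash_Browser (first row)", 3.7, 0.5),
    ("Dash_Browser (second row)", 8.5, 1.0),
    ("Entry_Browser", 6.7, 0.137),
]


def throughput_gain(cached_tps: float, uncached_tps: float) -> float:
    """How many times more transactions per second the cached run delivered."""
    return cached_tps / uncached_tps


for name, cached, uncached in ROWS:
    print(f"{name}: {throughput_gain(cached, uncached):.1f}x")
```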
Soak test
In addition to good response times and throughput, an application’s stability is important to a well-running production environment. A soak test is used to observe the stability of an application over an extended period of time, typically 12-24 hours. This type of test provides the opportunity to observe the behavior not only of the application but also of the test environment. Version 2 was given a 12-hour soak test during which a variety of system elements were monitored. Figure 6 shows the LoadRunner throughput and response time results graphs for the 12-hour test.
Figure 6. Twelve-hour soak test
The nearly flat throughput and response time lines in the graphs indicate that the application was behaving consistently over a relatively long period of time. This demonstration of the stability of the application and its system environment provides a measure of confidence that the application will remain stable when put in production.
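One way to quantify "nearly flat" is to fit a straight line to the samples and express the fitted change across the run as a fraction of the mean. The following is a minimal sketch under that assumption. The sample values and the 5% tolerance are hypothetical and are not taken from the LoadRunner results.

```python
def relative_drift(samples):
    """Fitted change from first to last sample, as a fraction of the mean.

    Uses a least-squares line over equally spaced samples and assumes a
    nonzero mean. A value near 0.0 means the series is flat.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope * (n - 1) / mean_y


def is_stable(samples, tolerance=0.05):
    """True when the fitted drift stays within the (hypothetical) tolerance."""
    return abs(relative_drift(samples)) <= tolerance


# Hypothetical hourly response times (seconds) from a 12-hour run.
hourly = [0.21, 0.20, 0.22, 0.21, 0.20, 0.21, 0.22, 0.21, 0.20, 0.21, 0.22, 0.21]
print(relative_drift(hourly), is_stable(hourly))
```

A steadily rising response time or a steadily falling throughput would show up as a drift well outside the tolerance, which is the pattern a memory leak typically produces during a soak test.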
Lessons learned
Invite review by an external team
It is strongly recommended that the performance test plan and results be reviewed by an expert outside the performance test team. No matter how experienced the test team may be, some commonly known best-practice system and software settings can be overlooked. An external review team may catch these omissions and thereby avoid surprises and downtime in production.
Realize alternative implementations can satisfy requirements yet give different performance results
When Version 2 was first put in production, response times exceeded acceptable limits within 15 minutes. Diagnosis showed very high database response times and very high levels of disk usage. No amount of database tuning, such as increasing buffer pool sizes and sort heap sizes, completely fixed the problem.

The next step was to look inside the application at the database queries. DB2 SQL statement snapshots were taken and an unusually expensive query was singled out. The query, associated with bringing up the main page of Version 2, retrieved the Version 2 dashboard and the thirty most recent entries. (Appendix B contains the SQL for this query.) Every execution of the query for this fairly common application operation took sixty seconds.

Deeper analysis found that the SQL query was expensive because it touched all rows of the “weblogentry” table, which had a relatively large number of rows. The query brought all the rows from this table into temporary table space, sorted them by time, and then returned only the first thirty rows to be displayed. This was an expensive way to determine the thirty most recent entries in BlogCentral. A FETCH FIRST 30 ROWS ONLY clause could have allowed DB2 to process only the thirty rows of interest and avoid the overhead of joining and sorting all the rows, but recoding the SQL statement to add this clause was not possible because of the Hibernate persistence layer.

The solution was to restrict the initial selection of rows from the “weblogentry” table to those that were added in the last seven days. It was presumed that there was sufficient traffic on BlogCentral that it would easily get thirty entries over a seven-day period. With this restriction, a far smaller subset of the data was brought into the temporary table space and sorted by time, and the thirty most recent entries were picked out of this far smaller subset. Consequently, query execution time for this operation was reduced from sixty seconds to less than one second.
This type of application-level tuning is a good example of how desired results can be achieved by changing the business logic in the application. The only risk to Version 2 was that there would be fewer than thirty entries on the dashboard if there had been very little activity in the last seven days. The final assessment was that, in the case of a popular application like BlogCentral, the chances of this happening were very low; hence this implementation compromise was acceptable.

The example also highlights some challenges of performance testing with sample databases. Given the way the original query was designed, the response time would get progressively worse as the database grew. However, the negative effect of the original SQL query might not be observed in a test if the database did not hold a sufficiently large amount of data. For the redesigned query, it is important to make sure that the test database contains enough data added within the last seven days so that the query performs during testing as it would in a production environment.
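The effect of the seven-day restriction on the candidate set can be sketched with an in-memory SQLite database. This is an illustration only. SQLite stands in for DB2, LIMIT stands in for the row-number wrapper that Hibernate generated, the one-column table is a hypothetical reduction of “weblogentry”, and timings are not reproduced. The point is that both query shapes return the same thirty entries while the restricted one considers far fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblogentry (id INTEGER PRIMARY KEY, pubtime INTEGER)")
# 1000 hypothetical entries, one per hour; pubtime is hours since an arbitrary epoch.
conn.executemany(
    "INSERT INTO weblogentry (id, pubtime) VALUES (?, ?)",
    [(i, i) for i in range(1000)],
)

now = 999          # hypothetical "current time" in the same units
week = 7 * 24      # seven days, in hours

# Original shape: every row published up to now is a candidate for the sort.
all_rows = conn.execute(
    "SELECT id FROM weblogentry WHERE pubtime <= ? "
    "ORDER BY pubtime DESC LIMIT 30",
    (now,),
).fetchall()

# Revised shape: a lower bound limits the candidates to the last seven days.
recent = conn.execute(
    "SELECT id FROM weblogentry WHERE pubtime <= ? AND pubtime >= ? "
    "ORDER BY pubtime DESC LIMIT 30",
    (now, now - week),
).fetchall()

candidates_before = conn.execute(
    "SELECT COUNT(*) FROM weblogentry WHERE pubtime <= ?", (now,)
).fetchone()[0]
candidates_after = conn.execute(
    "SELECT COUNT(*) FROM weblogentry WHERE pubtime <= ? AND pubtime >= ?",
    (now, now - week),
).fetchone()[0]

print(all_rows == recent, candidates_before, candidates_after)
```

If fewer than thirty entries fall inside the window, the restricted query returns a shorter list, which is the compromise the text describes.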
Soak test every significant code path
We ran several soak tests with prerelease versions. These tests were flawed because they did not exercise all the key code paths in the application. For example, after BlogCentral had been in production for three months, a more thorough performance test revealed that at high traffic loads of RSS read operations (that is, at more than 70% application server CPU utilization) the application would run out of memory. We turned on WebSphere verbose garbage collection (GC) to write verbosegc output to the log file for JVM performance analysis. The verbosegc analysis, shown in Figure 7, provided the necessary evidence of this problem.
Figure 7. VerboseGC analysis
Investigations revealed that the application was inadvertently creating HTTP session objects even for “read-only” anonymous traffic. When the application was under very heavy load, the number of HTTP session objects was so large that it exhausted the entire heap and resulted in the application getting OutOfMemory exceptions. Under lighter loads, the older HTTP session objects would expire before memory was used up. Further code analysis showed that if users were not logged in, the application did not need to create a session. The fix was simply to avoid creation of these HTTP session objects for anonymous users. This issue had not been seen in production because of the combination of:
• A JVM heap size of 768MB
• An HTTP session timeout setting of two hours
• Relatively light RSS traffic in production (the performance tests simulated much greater loads)
• Enabling of “http session overflow”
This lesson also demonstrated that even an experienced performance test team can overlook best practice recommendations. For Web applications, one best practice clearly states that the number of HTTP session objects must always be capped and “http session overflow” must not be enabled.
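A rough calculation shows why uncapped sessions exhaust the heap under load. The sketch below applies Little’s law (objects alive = arrival rate x lifetime). It assumes that every anonymous request creates a new session that lives for the full timeout. The request rate is borrowed from Table 2 for illustration, and the per-session size is a hypothetical value because the paper does not give one.

```python
def steady_state_sessions(new_sessions_per_sec: float, timeout_sec: float) -> float:
    """Little's law: sessions alive = arrival rate x time each session lives."""
    return new_sessions_per_sec * timeout_sec


def heap_needed_mb(sessions: float, bytes_per_session: int) -> float:
    """Heap consumed by live sessions, in megabytes."""
    return sessions * bytes_per_session / (1024 * 1024)


TIMEOUT_SEC = 2 * 60 * 60   # two-hour session timeout, from the paper
RSS_TPS = 77                # average RSS blog TPS from Table 2, used as an example rate
BYTES_PER_SESSION = 2048    # hypothetical; the paper does not state a session size
HEAP_MB = 768               # production JVM heap size, from the paper

sessions = steady_state_sessions(RSS_TPS, TIMEOUT_SEC)
needed = heap_needed_mb(sessions, BYTES_PER_SESSION)
print(f"{sessions:.0f} live sessions need about {needed:.0f} MB of a {HEAP_MB} MB heap")
```

Under these assumptions the live sessions alone would need more memory than the whole heap, while a rate a few times lower would fit. This matches the observation that the problem appeared only under heavy load.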
Results of a PageDetailer analysis
The Page Detailer component of IBM® WebSphere® Studio was developed by HiPODS and IBM Research to monitor Web page performance from the client side. Page Detailer helps users understand the components and key metrics of a Web site, for example, the download time of a Web page, the size of the page, and the details of each component on it. Using Page Detailer, we compared the BlogCentral dashboards for Version 1 and Version 2. The results are presented in Figure 8 and Figure 9.
Date: 11/17/05
Download time: 6.043 seconds using T1
Page size: 213818 bytes
Page items: 37
Figure 8. PageDetailer analysis of Version 1 dashboard
Date: 7/07/06
Download time: 1.646 seconds using T1
Page size: 11679 bytes
Page items: 16
Figure 9. PageDetailer analysis of Version 2 dashboard
This comparison shows that Version 2 has a significantly smaller page size and far fewer components than Version 1. The Version 2 dashboard has only 16 components, compared to 37 in Version 1. A HiPODS best practice recommendation is to limit the number of components of each Web page to 20. This recommendation is made partly because each component, regardless of its size, involves a significant amount of HTTP overhead. For example, the Roller.js file size was zero (0) bytes but required 992 bytes of transfer overhead. In comparison, the 17.7 MB dogear-thumb.jpg file resulted in only 902 bytes of transfer overhead. In addition, each component takes server CPU time to load and network bandwidth to deliver. Thus, using the fewest possible components on a Web page saves machine cycles and network bandwidth for other Web services.

While the Version 1 dashboard was large at 214 KB, the Version 2 dashboard is a much smaller 12 KB. Reducing the amount of data downloaded reduces the load time, and keeping the response time for a Web page download as low as possible is key to a good customer experience. The large number of items and the large amount of data in Version 1 drove high server response times (machine cycles) and high delivery times (network bandwidth), which resulted in slow load times. Version 2 improved the load time from 6.043 to 1.646 seconds by reducing both the number of components and the overall size of the page.
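The Page Detailer figures can be reduced to percentage improvements with a short calculation. The sketch below uses only the values reported for Figure 8 and Figure 9. The dictionary keys are illustrative names.

```python
def percent_reduction(before: float, after: float) -> float:
    """Reduction from 'before' to 'after', as a percentage of 'before'."""
    return (before - after) / before * 100


# Values reported by Page Detailer for the two dashboards.
V1 = {"download_sec": 6.043, "bytes": 213818, "items": 37}
V2 = {"download_sec": 1.646, "bytes": 11679, "items": 16}

for key in V1:
    print(f"{key}: {percent_reduction(V1[key], V2[key]):.1f}% lower in Version 2")
```

The page size fell by a larger fraction than the download time, which is consistent with the point that per-component overhead, and not only total bytes, contributes to load time.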
Tuning recommendations
The efficient operation of the servers and middleware used by an application is another key element in the delivery of optimum application performance. Each of the three tiers of the Version 2 operating environment was highly tuned and methodically maintained. Table 5 lists the settings used for the key parameters in the database, WebSphere Application Server, and IBM HTTP Server tiers.
Table 5. Recommended tuning of middleware for BlogCentral Version 2

Server                        Parameter                          Settings
Database                      Database heap                      DBHEAP = 1200
                              Buffer pool size (pages)           BUFFPAGE = 1000
                              Max storage for lock list (4KB)    LOCKLIST = 1000
                              Log file size (4KB)                LOGFILSIZ = 1000
                              Number of primary log files        LOGPRIMARY = 10
                              Number of secondary log files      LOGSECOND = 2
WebSphere                     JVM                                512-768
                              PMI                                Extended Level
WebSphere: Web Container      Thread pool min                    5
                              Thread pool max                    10
                              Session Max Count                  10000
                              Allowed Overflow                   Disabled
                              Session TimeOut                    120 Min
                              Session Persistence                Disabled
WebSphere: Logging            Logging                            20 MB
                              Max Historical Log                 50 MB
WebSphere: JDBC datasource    Max Connections                    10
                              Min Connections                    5
                              Reap Time                          180
                              Unused TimeOut                     600
                              Aged TimeOut                       1800
                              Connection TimeOut                 180
                              Purge Policy                       Entire Pool
WebSphere: Security           Global Security Cache TimeOut      600 Seconds
HTTP Server                   MaxKeepAliveRequests               100
                              KeepAliveTimeout                   10
                              ThreadLimit                        25
                              ServerLimit                        64
                              StartServers                       2
                              MaxClients                         600
                              MinSpareThreads                    25
                              MaxSpareThreads                    75
                              ThreadsPerChild                    25
                              MaxRequestsPerChild                0
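The HTTP Server values in Table 5 depend on one another, and a short script can check that they are mutually consistent. The sketch below assumes that IBM HTTP Server is running the Apache worker MPM, which the ThreadLimit and ThreadsPerChild directives suggest but the paper does not state. The checks encode the documented Apache constraints that ThreadsPerChild cannot exceed ThreadLimit and that MaxClients cannot exceed ServerLimit times ThreadsPerChild.

```python
# HTTP Server settings from Table 5.
HTTP_SERVER = {
    "ThreadLimit": 25,
    "ServerLimit": 64,
    "StartServers": 2,
    "MaxClients": 600,
    "MinSpareThreads": 25,
    "MaxSpareThreads": 75,
    "ThreadsPerChild": 25,
}


def check_worker_settings(cfg):
    """Return a list of inconsistencies, assuming the Apache worker MPM."""
    problems = []
    if cfg["ThreadsPerChild"] > cfg["ThreadLimit"]:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    if cfg["MaxClients"] > cfg["ServerLimit"] * cfg["ThreadsPerChild"]:
        problems.append("MaxClients exceeds ServerLimit * ThreadsPerChild")
    if cfg["MaxClients"] % cfg["ThreadsPerChild"] != 0:
        problems.append("MaxClients is not a multiple of ThreadsPerChild")
    if cfg["MinSpareThreads"] > cfg["MaxSpareThreads"]:
        problems.append("MinSpareThreads exceeds MaxSpareThreads")
    return problems


# Child processes needed to serve MaxClients simultaneous connections.
processes_at_max = HTTP_SERVER["MaxClients"] // HTTP_SERVER["ThreadsPerChild"]
print(check_worker_settings(HTTP_SERVER), processes_at_max)
```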
In addition to the tuning of the system settings, the database log files were configured to use separate physical disks from the disks that stored the data. Further, “runstats” was periodically run on the BlogCentral database, which resulted in the updating of the physical statistics of the database tables and indexes, thereby helping the database optimizer to calculate the most efficient path to access the data.
On the WebSphere tier, analysis, testing, tuning, and the use of the Index advisor ensured that the right set of indexes was in place.
Next steps
BlogCentral Version 2 demonstrates that significant strides have been made in building BlogCentral into a highly available, high-performing pilot application. The next step is to focus on enabling BlogCentral to scale to the enterprise as part of the internal IBM production application suite. To accomplish this goal, some major focus areas have been identified.
Modify the Roller application caching method
The application-level caching built into the Roller code, and hence into Version 2, has the negative side effect that it prevents the application from being deployed in a clustered topology of more than one application server. This is because there is no reliable way to invalidate the cache entry in all the application servers simultaneously whenever a blog gets a new entry or a comment is posted to a blog entry. Roller only provides a way to invalidate the cache in the application server that processed the add entry or add comment operation. As a result, the caches in any other application servers continue to serve the older pages until the cache timeout forces the entry to be refreshed from the database. This cache inconsistency across application servers could be addressed by any of the following:
• Replacing the application level cache in Roller with an implementation that is based on IBM WebSphere DynaCache. DynaCache provides a mechanism for the caches in all the application servers to be invalidated when new content is posted, in this case, to a blog.
• Changing the application to build in an efficient way of checking every request to see if the entry in the cache is in sync with the data in the database.
• Using IBM® WebSphere® Application Server Edge Component as a caching store, instead of using the memory in the application JVM to store the fully rendered blog pages.
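The stale-read problem described above can be reproduced with a toy model of two application servers. This is a conceptual sketch only. The class and function names are hypothetical, the cache timeout is not modeled, and the broadcast function merely illustrates cluster-wide invalidation rather than showing the DynaCache API.

```python
class Node:
    """One application server with its own in-memory page cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, blog):
        # Serve from the local cache; fall back to the shared database on a miss.
        if blog not in self.cache:
            self.cache[blog] = self.db[blog]
        return self.cache[blog]

    def invalidate(self, blog):
        self.cache.pop(blog, None)


def post_local(writer, db, blog, content):
    """Behavior described in the text: only the node that handled the post is invalidated."""
    db[blog] = content
    writer.invalidate(blog)


def post_broadcast(nodes, db, blog, content):
    """Alternative: every node in the cluster is invalidated on a post."""
    db[blog] = content
    for node in nodes:
        node.invalidate(blog)


db = {"blog": "entry 1"}
a, b = Node(db), Node(db)
a.read("blog"), b.read("blog")            # both nodes now cache "entry 1"

post_local(a, db, "blog", "entry 2")
stale = (a.read("blog"), b.read("blog"))  # node b still serves the old page

post_broadcast([a, b], db, "blog", "entry 3")
fresh = (a.read("blog"), b.read("blog"))  # both nodes serve the new page
print(stale, fresh)
```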
Explore the consolidation of the application and database servers on IBM System p5 servers
An especially efficient way to improve performance and capacity would be to port the application to the latest IBM platforms, which provide outstanding capacity, scalability, and performance. The hardware that was used for the BlogCentral pilot was determined by what was available at the time. Now that Version 2 performance has been analyzed, there might be a very good justification for moving BlogCentral to the IBM® System p5™ platform.

With the Version 2 cache well sized, most user requests are served through the application cache, and thus the database server is often idle while the application server is under load. On the other hand, when the server starts up and has to initialize the cache, or when queries arrive that cannot be satisfied through the cache, the database layer is busy while the application server layer is idle. It appears that the peak loads for the application server and the database server do not occur at the same time. Consequently, by having both layers on an IBM System p5 server with CPU sharing enabled, it should be possible to have enough capacity to accommodate both the application server and database server loads -- even when one of those loads is at its peak. This approach would obviate the need for an application server and a database server that are each large enough to handle their respective peak loads. When each server is sized for a peak that occurs only occasionally, a great deal of capacity goes unused.
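The capacity argument can be made concrete with a small calculation. The load profiles below are hypothetical CPU demands chosen so that the two tiers peak at different times, as the text describes. They are not measurements from BlogCentral.

```python
def capacity_separate(app_load, db_load):
    """Capacity needed when each tier runs on its own server sized for its own peak."""
    return max(app_load) + max(db_load)


def capacity_shared(app_load, db_load):
    """Capacity needed when both tiers share CPUs: the peak of the combined load."""
    return max(a + d for a, d in zip(app_load, db_load))


# Hypothetical CPU demand per phase: cache warm-up, steady state, steady state, cache-miss burst.
app_load = [1, 8, 8, 2]
db_load = [6, 1, 1, 5]

print(capacity_separate(app_load, db_load), capacity_shared(app_load, db_load))
```

The saving exists only when the peaks do not coincide. If both tiers peaked in the same phase, the two functions would return the same value.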
Eliminate Hibernate persistence layer and use direct JDBC calls to DB2
Roller, and consequently BlogCentral, is implemented with a Hibernate persistence layer. We did not study the effect on performance of using the persistence layer compared to using straight JDBC. However, this may be an appropriate area for attention if the performance of BlogCentral degrades to the point where nothing else seems to help. It would then be worth revisiting this topic and considering the elimination of the application’s persistence layer. Several considerations make the removal of the persistence layer reasonable. The first is that BlogCentral runs on an IBM WebSphere Application Server Version 6.0 and IBM DB2 stack and apparently will for the foreseeable future. Given this circumstance, the value of the database abstraction provided by Hibernate is questionable. Additionally, there are likely to be advanced DB2 functions that are easy to use through JDBC calls but not through Hibernate. Appendix B has an example using FETCH FIRST n ROWS ONLY instead of the complex arrangement that Hibernate uses. Lastly, additional BlogCentral development and support would be facilitated because there are more resources skilled in DB2 than in Hibernate.
Description of performance tests
The test results demonstrate that the throughput and response times for BlogCentral Version 2 are significantly improved over those of Version 1, exceeding the performance targets for Version 1 by a significant margin. The considerably higher performance is attributed to the application level cache built into Roller 2. Because the Roller application cache is critical to performance, a recommendation was made for the cache size to be used when Version 2 was initially deployed in production. However, changes in the application’s usage or volume may necessitate a modification of the application cache size, so cache hit ratios should be monitored closely. In addition to the caching and careful environment tuning that accomplished the performance gains of Version 2, other performance lessons we learned and recommend are:
• Invite review by an external team
• Realize that alternative implementations can satisfy requirements yet give different performance results
• Soak test every significant application code path

Finally, we identified these potential next steps for further improvement of BlogCentral performance:
• Modify the Roller application caching method
• Explore the consolidation of the application and database servers on IBM System p5 hardware
• Eliminate the Hibernate persistence layer and use direct JDBC calls to DB2
Appendix A: Performance data of BlogCentral Version 1
Very little information on Version 1 performance target metrics was available. As a result, the performance goals for Version 1 were deduced by using the current performance results. The table below captures the current performance metrics. The data was extracted from a typical weblog page in BlogCentral v1. The target for Version 2 release 1 was to achieve the targets in this table as the minimum goal:

Performance Requirements                                  Version 1
Number of read page views / day                           25000
Size of each HTTP response for a page view [1]            46KB
Response time in browser [1]                              10 seconds
Number of page views of dashboard / day                   1100
Size of HTTP response for dashboard [2]                   28KB
Response time in browser [2]                              3 seconds
Number of blog entries created / day                      200
Size of each blog entry                                   1KB
Number of comments/trackbacks created / day               200
Size of each blog comment                                 500 Bytes
Number of updates (both blog entries / comments) / day    200
Number of blog entry deletes / day                        11
Number of blogs created / day                             20. Max:350
Will all traffic be secure (HTTPS)?                       No

As of 29 August 2005
Total user population of BlogCentral                      13000
Total number of blog entries                              23000
Total number of comments on August 29, 2005               21000
Total number of blogs                                     4200
Total number of active blogs (at least 2 posts)           1900
The average page size is 46 KB.
Reasoning: This is based on the size of the blog of an employee identified as representative of a standard user.

There are 200 blogs that are active.
Reasoning: Version 1 had 14000 registered users, 2000 of whom had blogs with two or more entries. Assuming 10% of these users with two or more entries are blogging on a given day, there are 200 blogs updated daily. This number has an impact on the JVM size and caching.

The database buffer pool for the initial set of runs will be as large as possible to attempt to ensure that every query gets a cache hit.
Reasoning: This will enable the initial focus to be on any bottlenecks on the Application Server.
Appendix B: Expensive SQL query for BlogCentral dashboard content
By taking DB2 SQL statement snapshots, we determined that the query shown below was causing performance problems in BlogCentral Version 2 shortly after launch. Although the query is large, the bulk of it is a join of several tables. Apart from the join predicates, there are only a few other predicates, involving status, publication time, and isEnabled. These extra predicates do not meaningfully decrease the quantity of data; they remove only the very few rows that are not active. Consequently, essentially the entire database is processed. When this data is joined, a row number is assigned and a temp table is created. Then the rows in the temp table are sorted and returned if the row number is <= some value. This is how Hibernate coded a query to return the most recent entries.

An alternative was suggested that used the DB2 FETCH FIRST n ROWS ONLY capability. While this should have worked and run very efficiently, it would have been difficult to code the application to make Hibernate do this, and replacing Hibernate was not deemed a preferable way to solve this problem. Replacing Hibernate was also considered as a way to reduce the database persistence overhead and improve performance. However, the Version 2 schedule did not allow the time required to rewrite the persistence layer.

Analyzing the code where it called Hibernate showed another potential solution. In addition to the predicate PUBTIME <= the current date and time, it was possible to add a predicate PUBTIME >= another value. Thus the product could return not the most recent fifty values, but the most recent fifty values within the last week. If there were no values within the last week, there would be no recent values. This alternative required minimal code changes but resulted in greatly improved performance. The query now sorts one week’s worth of data to find the most recent entries, instead of sorting the entire database. Additional testing assured that performance was acceptable when the system had many entries during the week.

Changing the requirement from showing the fifty most recent entries to showing the thirty most recent entries within the last week was a quick, simple, and effective solution. The Lessons learned section explains how this problem was addressed.
select * from (
  select rownumber() over(order by this_.pubtime desc) as rownumber_,
    this_.id as id7_, this_.categoryid as categoryid18_7_, this_.websiteid as websiteid18_7_,
    this_.userid as userid18_7_, this_.title as title18_7_, this_.text as text18_7_,
    this_.anchor as anchor18_7_, this_.pubtime as pubtime18_7_, this_.updatetime as updatetime18_7_,
    this_.status as status18_7_, this_.link as link18_7_, this_.plugins as plugins18_7_,
    this_.allowcomments as allowco13_18_7_, this_.commentdays as comment14_18_7_,
    this_.righttoleft as rightto15_18_7_, this_.pinnedtomain as pinnedt16_18_7_,
    weblogcate3_.id as id0_, weblogcate3_.name as name16_0_, weblogcate3_.description as descript3_16_0_,
    weblogcate3_.image as image16_0_, weblogcate3_.websiteid as websiteid16_0_,
    websitedat4_.id as id1_, websitedat4_.handle as handle19_1_, websitedat4_.name as name19_1_,
    websitedat4_.description as descript4_19_1_, websitedat4_.userid as userid19_1_,
    websitedat4_.defaultpageid as defaultp6_19_1_, websitedat4_.weblogdayid as weblogda7_19_1_,
    websitedat4_.enablebloggerapi as enablebl8_19_1_, websitedat4_.bloggercatid as bloggerc9_19_1_,
    websitedat4_.defaultcatid as default10_19_1_, websitedat4_.editorpage as editorpage19_1_,
    websitedat4_.ignorewords as ignorew12_19_1_, websitedat4_.allowcomments as allowco13_19_1_,
    websitedat4_.emailcomments as emailco14_19_1_, websitedat4_.emailfromaddress as emailfr15_19_1_,
    websitedat4_.emailaddress as emailad16_19_1_, websitedat4_.editortheme as editort17_19_1_,
    websitedat4_.locale as locale19_1_, websitedat4_.timeZone as timeZone19_1_,
    websitedat4_.datecreated as datecre20_19_1_, websitedat4_.defaultplugins as default21_19_1_,
    websitedat4_.isenabled as isenabled19_1_,
    userdata5_.id as id2_, userdata5_.isenabled as isenabled14_2_, userdata5_.username as username14_2_,
    userdata5_.passphrase as passphrase14_2_, userdata5_.fullname as fullname14_2_,
    userdata5_.emailaddress as emailadd6_14_2_, userdata5_.datecreated as datecrea7_14_2_,
    userdata5_.locale as locale14_2_, userdata5_.timeZone as timeZone14_2_,
    weblogcate6_.id as id3_, weblogcate6_.name as name16_3_, weblogcate6_.description as descript3_16_3_,
    weblogcate6_.image as image16_3_, weblogcate6_.websiteid as websiteid16_3_,
    weblogcate7_.id as id4_, weblogcate7_.name as name16_4_, weblogcate7_.description as descript3_16_4_,
    weblogcate7_.image as image16_4_, weblogcate7_.websiteid as websiteid16_4_,
    w1_.id as id5_, w1_.handle as handle19_5_, w1_.name as name19_5_,
    w1_.description as descript4_19_5_, w1_.userid as userid19_5_,
    w1_.defaultpageid as defaultp6_19_5_, w1_.weblogdayid as weblogda7_19_5_,
    w1_.enablebloggerapi as enablebl8_19_5_, w1_.bloggercatid as bloggerc9_19_5_,
    w1_.defaultcatid as default10_19_5_, w1_.editorpage as editorpage19_5_,
    w1_.ignorewords as ignorew12_19_5_, w1_.allowcomments as allowco13_19_5_,
    w1_.emailcomments as emailco14_19_5_, w1_.emailfromaddress as emailfr15_19_5_,
    w1_.emailaddress as emailad16_19_5_, w1_.editortheme as editort17_19_5_,
    w1_.locale as locale19_5_, w1_.timeZone as timeZone19_5_,
    w1_.datecreated as datecre20_19_5_, w1_.defaultplugins as default21_19_5_,
    w1_.isenabled as isenabled19_5_,
    userdata9_.id as id6_, userdata9_.isenabled as isenabled14_6_, userdata9_.username as username14_6_,
    userdata9_.passphrase as passphrase14_6_, userdata9_.fullname as fullname14_6_,
    userdata9_.emailaddress as emailadd6_14_6_, userdata9_.datecreated as datecrea7_14_6_,
    userdata9_.locale as locale14_6_, userdata9_.timeZone as timeZone14_6_
  from weblogentry this_
    inner join weblogcategory weblogcate3_ on this_.categoryid=weblogcate3_.id
    left outer join website websitedat4_ on weblogcate3_.websiteid=websitedat4_.id
    left outer join rolleruser userdata5_ on websitedat4_.userid=userdata5_.id
    left outer join weblogcategory weblogcate6_ on websitedat4_.bloggercatid=weblogcate6_.id
    left outer join weblogcategory weblogcate7_ on websitedat4_.defaultcatid=weblogcate7_.id
    inner join website w1_ on this_.websiteid=w1_.id
    inner join rolleruser userdata9_ on this_.userid=userdata9_.id
  where w1_.isenabled=? and this_.pubtime<=? and this_.status=?
  order by this_.pubtime desc
) as temp_
where rownumber_ <= ?
Appendix C: Methodology for tuning DB2
When the application is ready for testing, the SQL statement snapshot monitor can be used to find performance issues in DB2. Generally all the monitors are turned on. The best way to reset the monitors is to restart DB2. After DB2 is restarted, the application server needs to be restarted and some initial queries run. The HiPODS tuning methodology recommends taking a complete set of snapshots:
• Before running a test case
• After running the test case the first time
• After running the test case multiple times

Often many SQL statements are executed during startup or the first time a test case is run. These may be important, but typically the statements that run every time the test case is run are more important. By looking at the differences between the snapshots, it is possible to determine whether a query is part of the initialization logic or is run every time.

Total execution time is the first value to look at in the snapshot. Values below 0.01 seconds are typically not important. Often several values are of concern. The next values to look at are the total CPU time and the number of times that the query was executed. Queries with large elapsed time but small CPU time may need different fixes than queries with large CPU time. Similarly, queries that are cheap each time they run but run frequently need to be fixed using different techniques than queries that run infrequently but use a lot of resources.

Generally HiPODS’ test cases are executed by a single user. This eliminates problems that could be caused by contention, such as locking. After issues discovered during single-user tests are dealt with, test cases that emulate actual loads are run to determine whether there are contention issues. It is best to make queries efficient before addressing concurrency issues. Many times concurrency issues are resolved by making the queries efficient.
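The snapshot comparison described above can be sketched as a difference between counter sets. This is a minimal illustration of the reasoning, not a parser for DB2 snapshot output. The dictionary layout, the statement labels, the sample numbers, and the 50% CPU share used to separate CPU-bound from wait-bound queries are all hypothetical.

```python
def delta(before, after):
    """Per-statement change in counters between two snapshots."""
    out = {}
    for stmt, counters in after.items():
        base = before.get(stmt, {})
        out[stmt] = {k: v - base.get(k, 0) for k, v in counters.items()}
    return out


def classify(s0, s1, s2, min_exec_sec=0.01, cpu_share=0.5):
    """Label statements using snapshots taken before, after one run, and after many runs."""
    first, repeat = delta(s0, s1), delta(s1, s2)
    report = []
    for stmt, d in repeat.items():
        if d["executions"] == 0:
            # Ran during the first pass only: part of the initialization logic.
            if first.get(stmt, {}).get("executions", 0) > 0:
                report.append((stmt, "initialization only", 0.0))
            continue
        if d["exec_sec"] < min_exec_sec:
            continue  # too cheap to matter
        kind = "CPU-bound" if d["cpu_sec"] >= cpu_share * d["exec_sec"] else "wait-bound"
        report.append((stmt, kind, d["exec_sec"]))
    # Largest total execution time first.
    return sorted(report, key=lambda r: r[2], reverse=True)


# Hypothetical counters: executions, total execution seconds, total CPU seconds.
s0 = {}
s1 = {
    "load config": {"executions": 1, "exec_sec": 2.0, "cpu_sec": 0.5},
    "dashboard": {"executions": 1, "exec_sec": 60.0, "cpu_sec": 3.0},
    "lookup user": {"executions": 1, "exec_sec": 0.002, "cpu_sec": 0.001},
}
s2 = {
    "load config": {"executions": 1, "exec_sec": 2.0, "cpu_sec": 0.5},
    "dashboard": {"executions": 4, "exec_sec": 240.0, "cpu_sec": 12.0},
    "lookup user": {"executions": 4, "exec_sec": 0.008, "cpu_sec": 0.004},
}

report = classify(s0, s1, s2)
for stmt, kind, seconds in report:
    print(stmt, kind, seconds)
```

In this example the dashboard statement rises to the top as a repeated query whose elapsed time is far larger than its CPU time, the pattern that the text associates with a different class of fix than a CPU-heavy query.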
Acknowledgements
During the development, testing, and deployment of BlogCentral Version 2, the HiPODS teams in the United States, China, India, and the United Kingdom contributed significantly to the project’s success. The authors of this paper would like to express our most sincere thanks to those teams as well as to the Version 2 extended team who helped us along the way. In particular, we would like to acknowledge the contributions of Marsha Brundage, Jim Busche, Julian Friedman, Jigar Kapasi, Phay Lau, Jaykumar Patel, and Robert Witherspoon for their invaluable help with the performance testing and with preparing this paper.
Notices
Trademarks
The following are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both: AIX, DB2, DB2 Universal Database, IBM, WebSphere.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Mercury LoadRunner is a registered trademark of HP Mercury. Other company, product, and service names may be trademarks or service marks of others.
Special Notice The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While IBM may have reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environments does so at their own risk. Performance data contained in this document were determined in various controlled laboratory environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments as system performance standards. The results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.