
Evaluating Cloud Application Development and Runtime Platforms

					UNSW S1 2009
Research Project


Evaluating Cloud
Application Development
and Runtime Platforms

Project Team: Supervisors - A/Prof. Anna Liu (annaliu@cse)
and Dr. Helen Paik (hpaik@cse)

Students: Fei Teng (ften303@cse), Liang Zhao (lzha077@cse)
and Xiaomin Wu (xmwu432@cse)




                      Version: draft
TABLE OF CONTENTS




1 INTRODUCTION




2 EVALUATION METHODOLOGY

     2.1 OVERVIEW

Cloud platforms are still evolving, and no widely accepted public standard exists yet. This has not
stopped companies from promoting their products in the hope of being the first to succeed in the
cloud market.

This evaluation covers the cloud platforms of three major companies: Microsoft Windows Azure,
Amazon Elastic Compute Cloud and Google App Engine. In the absence of a widely accepted public
standard, each platform comes with its own technologies and models, which makes cross-platform
evaluation difficult. To keep the measurements comparable, the qualitative evaluation focuses on
timing, throughput and error rates, while the quantitative evaluation focuses on user experience.




     2.2 QUALITATIVE EVALUATION METHODOLOGY

          2.2.1 EVALUATION TERMINOLOGY

The qualitative evaluation is based mainly on timings and requests that are measured and observed
on the client side and on the cloud hosting server side. Before the evaluation methodologies are
introduced, the timing-related and request-related terms need to be defined.


               2.2.1.1 TIMING




                                    Figure 1: Time-relevant Terminologies



The figure above shows a full round-trip among a client, a cloud hosting server and a cloud database.


    Client Response Time: the network round-trip time between the client and the cloud
     host/database, plus the time required to process the client’s request on the cloud
     host/database. It is observed directly on the client.
    Processing Time: the time the server needs to execute a block of application logic. To measure
     it accurately, a timer is started immediately before the code block and stopped immediately
     after it. If the code block involves a database transaction, the processing time also includes
     the database operation time and the network round-trip time between the cloud hosting server
     and the cloud database. The processing time is attached to the response and returned to the
     client.
    Database Processing Time: it is practically impossible to measure exactly how long a cloud
     database takes to complete a transaction. Instead, between the cloud hosting server and the
     cloud database, it is approximated by starting a timer before the database API call and
     stopping it after the call returns. This value is returned in the response of the web
     application. For direct access from the client to the cloud database (which applies only to
     Azure Storage and Amazon S3), the client response time between the client and the database is
     used instead. This value is observed directly on the client.
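
The relationship between the three timings can be sketched in Python as follows. This is only an illustration under stated assumptions: `fake_database_write`, `handle_request` and `client_call` are hypothetical names, the database call is simulated with a short sleep, and the client and server run in one process, so no real network round trip is included.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results, label):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_database_write(payload):
    """Hypothetical stand-in for a cloud database API call."""
    time.sleep(0.01)
    return len(payload)


def handle_request(payload):
    """Server side: time the code block and the database call inside it."""
    timings = {}
    with timed(timings, "processing_time"):
        with timed(timings, "database_processing_time"):
            fake_database_write(payload)
    # The timings travel back to the client inside the response.
    return {"status": "ok", "timings": timings}


def client_call(payload):
    """Client side: the response time is observed around the whole call."""
    start = time.perf_counter()
    response = handle_request(payload)
    response["client_response_time"] = time.perf_counter() - start
    return response
```

Because the timers are nested, the client response time is never smaller than the processing time, which in turn is never smaller than the database processing time.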




               2.2.1.2 REQUEST

To make the requests easier to identify, they are categorized into four types according to their
outcome.

    Incomplete Request: a request that the client fails to send, or for which it fails to receive a
     response.
    Completed Request: a request that the client sends successfully and for which it eventually
     receives a response from the cloud host/database.
      Failed Request: a completed request whose response contains an error message.
      Successful Request: a completed request whose response contains no errors.
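
The categories above could be encoded as in the following Python sketch. The names `RequestOutcome`, `classify` and `is_completed` are illustrative assumptions and do not come from the evaluation code.

```python
from enum import Enum


class RequestOutcome(Enum):
    INCOMPLETE = "incomplete"   # never sent, or no response received
    FAILED = "failed"           # completed, response carries an error message
    SUCCESSFUL = "successful"   # completed, no error


def classify(response_received, error_message=None):
    """Map what the client observed to one of the request types."""
    if not response_received:
        return RequestOutcome.INCOMPLETE
    if error_message:
        return RequestOutcome.FAILED
    return RequestOutcome.SUCCESSFUL


def is_completed(outcome):
    """A completed request is either failed or successful."""
    return outcome in (RequestOutcome.FAILED, RequestOutcome.SUCCESSFUL)
```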




          2.2.2 TEST CASES

To maximize the coverage of the evaluation, the following scenarios are used to build test cases.

    Client – Cloud Host Evaluations: a user accesses a web application on the cloud from an
     end-user application. The client response time is the user’s first concern.
    Cloud Host – Cloud Database Evaluations: a user sends an article or a form to, or receives one
     from, the cloud database through the web application. The database processing time is the main
     factor to consider. The performance of the cloud database when thousands of users perform the
     same action concurrently is also of interest.
    Client – Cloud Database Evaluations: for large file transfers, the questions are whether a user
     can connect directly from the client to the cloud database without going through the cloud
     host, and how well such a connection performs.
Each test is described in more detail in its own section later.




          2.2.3 TEST STRATEGY

To perform the test cases above, three test strategies are adopted in the qualitative evaluation.

    Stress Test Strategy: to gain insight into the architecture, for instance its performance and
     potential errors, concurrent requests are sent to the cloud platforms in a specific
     configuration for each test case.
    Singleton Request Test Strategy: to measure the timing and throughput of cloud databases
     numerically, requests are sent one after another, so that the results are not affected by
     stress or network traffic.
    Singleton Transferring Test Strategy: a revised version of the singleton request test strategy,
     adapted for large file transfers. It measures the timing and throughput of RESTful cloud
     databases.




               2.2.3.1 STRESS TEST STRATEGY




                                Figure 2: The Flow Chart of the Stress Test Strategy



Several implementation details of the stress test strategy are worth discussing: sending three
requests within every thread, varying the number of concurrent threads, and repeating tests.



The stress test strategy is implemented with multi-threaded programming. Within every thread,
three requests are sent one after another, which ensures that there is a period, mostly in the
middle of the test, during which requests from different threads are in flight concurrently.

The number of concurrent threads can change after every round to suit the test. It can either be
increased to put more stress on the cloud platforms, or be kept fixed to verify results repeatedly.

Furthermore, because the network fluctuates heavily over time, outliers are likely to be
encountered during the evaluation. This is addressed by running tests for multiple rounds and by
scheduling tests repeatedly in different time slots. A cron job is used to schedule tests over a
24-hour period.

This test strategy is used in Client – Cloud Host Evaluations and Cloud Host – Cloud Database
Evaluations.
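
A minimal Python sketch of this strategy is given below. It is an illustration under stated assumptions rather than the evaluation code: `send_request` is a hypothetical stand-in that sleeps instead of calling a web service, and the round sizes are passed in as parameters.

```python
import threading
import time

REQUESTS_PER_THREAD = 3  # three consecutive requests per thread, as described above


def send_request(thread_id, seq):
    """Hypothetical stand-in for one web-service call; returns elapsed seconds."""
    start = time.perf_counter()
    time.sleep(0.001)
    return time.perf_counter() - start


def worker(thread_id, results, lock):
    """Send the requests of one thread one after another."""
    for seq in range(REQUESTS_PER_THREAD):
        elapsed = send_request(thread_id, seq)
        with lock:
            results.append((thread_id, seq, elapsed))


def run_round(num_threads):
    """Start num_threads concurrent threads and wait for all of them."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(i, results, lock))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_stress_test(initial_threads, step, rounds):
    """Run several rounds, adding `step` threads after every round."""
    return [run_round(initial_threads + step * r) for r in range(rounds)]
```

Setting `step` to 0 corresponds to keeping the number of threads fixed in order to verify results repeatedly.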




               2.2.3.2 SINGLETON REQUEST TEST STRATEGY




                          Figure 3: The Flow Chart of the Singleton Request Test Strategy



The flow chart of the singleton request test strategy is a modified version of the stress test flow
chart: multi-threading is disabled and the number of concurrent threads no longer increases. In the
chart, the number of concurrent threads is fixed at one, and every thread sends only one request,
with no follow-up requests.

Because there is only one request in every thread and only one thread at any time, the number of
rounds equals the total number of requests sent to a cloud platform. Requests are therefore sent
strictly one after another, which avoids putting the platform under stress.
This test strategy is adopted in Cloud Host – Cloud Database Evaluations.




                         Figure 4: The Flow Chart of the Singleton Transferring Test Strategy



               2.2.3.3 SINGLETON TRANSFERRING TEST STRATEGY

The flow chart above shows another revision of the singleton request strategy. It is designed
specifically for storage services that support large data, and tests throughput directly from the
client application to the cloud databases. Data of three sizes (1 megabyte, 10 megabytes and 15
megabytes) are sent via a RESTful protocol.

Furthermore, in order to address outlier issues which may occur in tests, multiple rounds are
enabled in this test as well.

This test strategy is adopted in Client – Cloud Database Evaluations.
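
The measurement loop of this strategy might look like the following Python sketch. The function `fake_put` is a hypothetical stand-in that copies the payload in memory instead of issuing a RESTful PUT, and a megabyte is assumed to be 1024 × 1024 bytes, which the report does not specify.

```python
import time

MEGABYTE = 1024 * 1024  # assumption: binary megabytes
PAYLOAD_SIZES = (1 * MEGABYTE, 10 * MEGABYTE, 15 * MEGABYTE)


def fake_put(data):
    """Hypothetical stand-in for a RESTful PUT: copies the payload in memory."""
    return len(bytearray(data))


def measure_throughput(size_bytes, rounds, put=fake_put):
    """Send `rounds` payloads one after another; return bytes/second per round."""
    payload = bytes(size_bytes)
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        put(payload)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        samples.append(size_bytes / elapsed)
    return samples
```

Running several rounds per payload size yields a list of samples per size, so that a single outlier does not dominate the result.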




     2.3 EVALUATION APPLICATION ARCHITECTURES

          2.3.1 APPLICATION ARCHITECTURES ON CLIENT

This section describes the architectures of the two client applications used in the evaluation. The
first, a Contract-First Web Service based client, is designed for the stress test strategy and the
singleton request test strategy; the second, a RESTful based client, is implemented for the
singleton transferring test strategy.




               2.3.1.1 CONTRACT-FIRST WEB SERVICE BASED CLIENT




                                Figure 5: Contract-First Web Service Based Client



The diagram above illustrates the testing model used for cross-platform evaluations based on
Contract-First Web Service.

As mentioned above, the three platforms offer different programming languages for web application
development. In this evaluation, the Microsoft Windows Azure application is written in C#, the
Google App Engine application in Python, and the Amazon application in Java, running on an
Ubuntu-based instance.

To keep as much commonality as possible among the three cloud platforms, the evaluation uses a
Contract-First Web Service, following these guidelines:

    A WSDL file is built first.
    The three hosting servers implement all functions defined in the WSDL file.
    A unified client application is created from the WSDL file, so it can communicate with the
     different platforms via the same protocol.



               2.3.1.2 RESTFUL BASED CLIENT




                                       Figure 6: RESTful Based Client



This application architecture is implemented for the singleton transferring test strategy,
specifically for Azure Blob Storage and Amazon S3. App Engine Datastore is not included in this
test, since it does not offer any protocol for direct communication between a client and the cloud
database. When performing a RESTful PUT, GET or DELETE action, the client reads binary data from
the local machine and sends it directly to Azure Blob Storage or Amazon S3.




          2.3.2 APPLICATION ARCHITECTURES ON CLOUD PLATFORMS

This section illustrates the application architectures implemented on each cloud platform. Since no
public standard for cloud platforms has been established so far, each platform applies its own
technologies and models to support the WSDL/REST-based evaluation.




               2.3.2.1 MICROSOFT WINDOWS AZURE

The diagram below illustrates the web application architecture used on a Microsoft Windows Azure
instance. Microsoft Windows Azure provides a Windows-based environment for running applications
and storing data, in a distributed manner, on servers in Microsoft data centres.

As can be seen in the figure, runtime environments, framework libraries, software development kits
and other Microsoft libraries are already bundled into Windows Azure, so developers can focus on
the design of business logic. In the evaluation, Windows Communication Foundation was selected
from the models available in the Framework to act as the web role; the service code is written in
C# and deployed as a web application. By invoking the Azure SDK, Windows Communication Foundation
communicates via a RESTful protocol with multiple Azure data storage services, which sit in the
cloud.

                         Figure 7: Web Application on Microsoft Windows Azure Instance




              2.3.2.2 AMAZON ELASTIC COMPUTE CLOUD




                                 Figure 8: Web Application on Amazon Instance



Amazon Elastic Compute Cloud, known as Amazon EC2, is a highly configurable cloud. Developers are
free to set up their preferred software and applications on any operating system that Amazon
instances support.

In the evaluation, an Ubuntu-based instance hosts Tomcat as the servlet container on top of the
Java 1.6 SDK. The third-party framework Apache CXF provides SOAP support. Through JPA 1.0, the
Apache CXF application communicates with PostgreSQL, which is installed on the same instance as the
web application. Through the Amazon API, it can also invoke Amazon’s cloud databases, Amazon
SimpleDB and Amazon S3, via SOAP or a RESTful protocol.



               2.3.2.3 GOOGLE APP ENGINE CLOUD




                                Figure 9: Web Application on Google App Engine



Google has made two programming languages available on Google App Engine. Python has been
supported since the first release of Google App Engine and is currently much more stable in
practice. Therefore, in the evaluation, Python and two third-party frameworks, ZSI and Zope
Interface, which provide SOAP support in Python, are used to develop and deploy the web
application.

In addition, the Google App Engine SDK is used to connect the server, via an internal protocol, to
Google’s stateful services, App Engine Datastore and App Engine Memcache. The former provides the
cloud database behind web applications, while the latter provides storage for data caching.




2.4 QUANTITATIVE EVALUATION METHODOLOGY




3 QUALITATIVE EVALUATION

     3.1 SUMMARY OF EVALUATION




     3.2 STRESS ROUND-TRIP TEST


The three figures above show how the average client response time changes when the cloud hosting
servers are under varying amounts of stress, at various times and dates.

The figures show that latency increases dramatically beyond 2100 concurrent requests. Two
observations can be made. Firstly, it may be difficult for a limited number of test machines to
challenge an entire cloud hosting service, so the increase may originate on the client side.
Secondly, even if the latency was caused by the load on the cloud hosting servers, a capacity of
2100 concurrent requests is sufficient for today’s enterprises. For example, the ticket booking
system of the 2008 Beijing Olympic Games crashed when its load reached 2200 requests per second.




     3.3 SINGLETON DATABASE WRITE/READ TEST

          3.3.1 TEST CONFIGURATION

This is a test case of the Cloud Host – Cloud Database Evaluations, based on the singleton request
test strategy. It follows the scenario in which a user sends an article or a form to, or receives
one from, the cloud database through the web application.

Based on this scenario, requests of various sizes, simulating a character (1 byte), a message (100
bytes), an article (1 kilobyte) and a small file (1 megabyte), are sent one by one to the web
application hosted on the cloud. The database processing time is measured on the cloud hosting
server and sent back to the client. For each test, the number of requests is fixed.

According to their specifications, Amazon SimpleDB and Azure Table Storage are intended for
structured data, while Amazon S3 and Azure Blob Storage are intended for binary data. In the
evaluation, a request no larger than 1 kilobyte is stored in the structured data store, and a
request larger than 1 kilobyte is put into the binary data store. App Engine Datastore does not
separate structured and binary data at the database level; all data are supported as string, text
or blob types at the property level.
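
The routing rule described above can be sketched in Python as follows. The platform keys and the function name `choose_store` are illustrative assumptions, and a kilobyte is assumed to be 1024 bytes.

```python
KILOBYTE = 1024  # assumption: binary kilobytes

STRUCTURED_STORES = {"amazon": "Amazon SimpleDB", "azure": "Azure Table Storage"}
BINARY_STORES = {"amazon": "Amazon S3", "azure": "Azure Blob Storage"}


def choose_store(platform, size_bytes):
    """Pick the target store for a request of the given size."""
    if platform == "appengine":
        # App Engine Datastore has no structured/binary split at the database level.
        return "App Engine Datastore"
    stores = STRUCTURED_STORES if size_bytes <= KILOBYTE else BINARY_STORES
    return stores[platform]
```

For example, `choose_store("azure", 100)` selects Azure Table Storage, while a 1 megabyte request on the same platform goes to Azure Blob Storage.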




          3.3.2 SINGLETON DATABASE WRITE TEST




                          Figure 10: Average Singleton Writing Time on Cloud Databases

In terms of average database processing time, the time for writing small requests (1 byte, 100
bytes and 1 kilobyte) varies within a small range on each cloud database. This suggests that the
size of small requests has little effect on the database processing time of any of the cloud
databases.

The figure also shows that Amazon LocalDB has the shortest times from 1 byte to 1 kilobyte. This is
mainly due to the stress-free test environment, in which a local database without any optimization
can still handle requests normally. In addition, hosting the local database and the web application
on the same Amazon instance may shorten the database processing time compared with the other cloud
hosting servers, whose corresponding cloud databases may not sit in the same cloud.

When the request size reaches 1 megabyte, Amazon S3 performs almost the same as App Engine
Datastore, while Azure Blob Storage takes less time than the others.

By dividing the request size by the database processing time, the speed of every database
transaction can be calculated to help build the CDF of the singleton write operation. It reflects
the different write speeds on each cloud database for different request sizes.

                         Figure 11: CDF of Singleton Write Throughput on Cloud Databases

Overall, the transfer speed increases progressively as the request size increases. This indicates
that the connection between the cloud hosts and the cloud databases is fast and stable on all three
cloud platforms. Comparatively, Amazon SimpleDB has the slowest speed, behind App Engine Datastore
and Azure Table Storage.

The first three tests (1 byte, 100 bytes and 1 kilobyte) are conducted on App Engine Datastore,
Azure Table Storage, Amazon LocalDB and Amazon SimpleDB. The ranking is quite stable, with Amazon
LocalDB performing much faster than the others.



In the 1 megabyte test, the three cloud platforms perform similarly. For approximately 80% of their
requests, the speed approaches 10 megabytes per second.
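
The throughput calculation and the CDF used in this section can be sketched in Python as follows. The function names are illustrative assumptions, and the CDF shown is the plain empirical CDF; the report does not state which estimator was used for the figures.

```python
def throughput(size_bytes, seconds):
    """Speed of one transaction: request size divided by processing time."""
    return size_bytes / seconds


def empirical_cdf(samples):
    """Return (value, cumulative fraction) pairs in ascending order of value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def fraction_at_most(samples, threshold):
    """Fraction of samples that do not exceed the threshold."""
    return sum(1 for s in samples if s <= threshold) / len(samples)
```

A statement such as "80% of requests approach a given speed" corresponds to reading one point off the curve returned by `empirical_cdf`.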




                         Figure 12: Average Singleton Reading Time on Cloud Databases



         3.3.3 SINGLETON DATABASE READ TEST

These two diagrams indicate database processing time of reading requests and CDF of read
throughput on different cloud databases.






                         Figure 13: CDF of Singleton Read Throughput on Cloud Databases
One interesting observation is that, compared with the singleton database write test, the database
processing time decreases dramatically on all cloud platforms except Amazon S3, which takes longer
than in the singleton write test.

Another observation is that Amazon SimpleDB moves from last place to second last. Azure Table
Storage performs better than Amazon SimpleDB in the write test, but worse in the read test.




     3.4 STRESS DATABASE WRITE/READ TEST

          3.4.1 TEST CONFIGURATION

Based on the stress test strategy, another case of the Cloud Host – Cloud Database Evaluations is
performed to simulate a scenario in which multiple users perform write/read actions concurrently.

In this test case, the number of requests varies, but the size of each request is fixed. Among the
three platforms, Google App Engine has a strict quota for free use: incoming bandwidth is limited
to a maximum of 56 megabytes per minute. Since the number of concurrent requests in the test varies
from 300 to 3300, the request size has to be set to 1 kilobyte to stay within the incoming
bandwidth limit.
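
The quota reasoning can be checked with a short calculation, sketched here in Python. Several assumptions are made that the report does not state: binary units are used, each concurrent thread sends three requests (as in the stress test strategy), and, as a worst case, all requests of the largest round arrive within one minute.

```python
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE
QUOTA_BYTES_PER_MINUTE = 56 * MEGABYTE  # free-use incoming bandwidth limit


def round_volume(concurrent_threads, requests_per_thread, request_size):
    """Total bytes sent to the platform in one round."""
    return concurrent_threads * requests_per_thread * request_size


def fits_quota(volume_bytes):
    """True if the volume stays within one minute of quota (worst case)."""
    return volume_bytes <= QUOTA_BYTES_PER_MINUTE
```

Under these assumptions, the largest round (3300 threads) sends roughly 9.7 megabytes with 1 kilobyte requests, which fits the quota, whereas 1 megabyte requests would exceed it by a wide margin.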

A cron job is scheduled to run the stress database tests repeatedly over 24 hours, and every test
produces a copy of the test results for further analysis. For each of the four databases tested, 6
copies of write test results and 5 copies of read test results are collected in different time
periods, giving 44 copies of results in all.

Analysis of these results yields error rate tables, error detail lists and a CDF of throughput, and
also reveals the heavy network fluctuations over time.




          3.4.2 ERROR DETAILS LIST

A variety of errors occurred during the stress database test. They are grouped into three
categories according to the phase in which the error is raised.

    Connection Error: the request does not reach the cloud host because of network connection
     problems, such as packet loss or a temporarily unavailable proxy gateway. Such a request is an
     incomplete request, according to the terminology defined earlier.
    Server Error: the fault occurs within the cloud hosting server, for instance when the web
     application is unable to allocate resources to the current request. The request eventually
     returns to the client with an error message, and is counted as a failed request.
    Database Error: the error comes from the cloud database during the database processing time.
     Such a request is also counted as a failed request.
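
The mapping from error categories to request types can be written down as a small Python sketch. The dictionary keys and the function name `tally` are illustrative assumptions.

```python
from collections import Counter

# Error category -> request type, following the definitions above.
REQUEST_TYPE_BY_ERROR = {
    "connection": "incomplete",
    "server": "failed",
    "database": "failed",
}


def tally(error_categories, total_requests):
    """Count request types given the error category of every faulty request."""
    counts = Counter(REQUEST_TYPE_BY_ERROR[c] for c in error_categories)
    # Every request without a recorded error is a successful request.
    counts["successful"] = total_requests - sum(counts.values())
    return counts
```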


The error details in each category are listed below.


                                          Table 1: Error Details Table

Error Category      Error Messages                Reason                                     Happened on

Database Errors     datastore_errors:             Multiple actions performed on the same     Google App Engine
                    Timeout                       entry: one will be processed, the          Datastore
                                                  others will fail.
                                                  Request takes too much time to process.
                    datastore_errors:             An error occurred for the API request      Google App Engine
                    TransactionFailedError        datastore_v3.RunQuery()                    Datastore
                    apiproxy_errors:              Too much contention on these datastore     Google App Engine
                    Error                         entities                                   Datastore
                    Amazon SimpleDB is            Too many concurrent requests               Amazon SimpleDB
                    currently unavailable

Server Errors       Unable to read data from      WCF failed to open connection              Microsoft Windows Azure
                    the transport connection
                    500 Server Error              HTTP 500 ERROR: Internal Error             Google App Engine
                    Zero Sized Reply                                                         Amazon EC2

Connection Errors   Read timed out                HTTP time out                              Microsoft Windows Azure
                                                                                             Amazon EC2
                    Access Denied                 HTTP 401 ERROR                             Microsoft Windows Azure
                                                                                             Google App Engine
                                                                                             Amazon EC2
                    Too many open files           Java IO exception: due to the machine      Microsoft Windows Azure
                                                  limit, too many concurrent requests        Amazon EC2
                                                  (too many threads) have been launched
                    Network Error                 Local proxy connection error               Microsoft Windows Azure
                    (tcp_error)                                                              Google App Engine
                    Unknown Host Exception                                                   Microsoft Windows Azure




          3.4.3 DATABASE AND SERVER ERROR PERCENTAGES OF REQUESTS

On a client application, the number of rounds is set to 6 and the initial number of concurrent
threads to 100, according to the stress test strategy. The six rounds therefore start 100, 300,
500, 700, 900 and 1100 concurrent threads respectively, with three consecutive requests in each
thread. In total, each client application sends 10800 requests.

To maximize the stress from the client side, three test machines run client applications
simultaneously at the scheduled time. In all, 32400 requests are sent in each scheduled test.
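
The request totals above follow from a simple calculation, sketched here in Python. The increment of 200 threads per round is inferred from the listed round sizes, and the function names are illustrative.

```python
def threads_per_round(initial, step, rounds):
    """Number of concurrent threads started in each round."""
    return [initial + step * r for r in range(rounds)]


def total_requests(initial, step, rounds, requests_per_thread, machines=1):
    """Total requests sent over all rounds by all test machines."""
    threads = sum(threads_per_round(initial, step, rounds))
    return threads * requests_per_thread * machines
```

With an initial 100 threads, a step of 200, six rounds and three requests per thread, one client application sends 3600 × 3 = 10800 requests, and three machines together send 32400.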




                              Figure 14: Overall Error Percentages of Writing Requests

The overall write error percentages chart is generated from the averaged results of the scheduled
tests run over 24 hours. The chart illustrates the performance of the cloud hosts and cloud
databases in terms of error rates.

Considering the total number of requests sent, although App Engine Datastore and Amazon SimpleDB
threw on average 31.67 and 111.17 faults respectively in each scheduled test, the overall
performance of all cloud databases is still acceptable, with a high rate of correct database
operations. Even the worst database handles more than 99.67% of completed requests correctly.

Among all cloud platforms, Google App Engine produces the largest number of server errors, which
contain “500 Server Error” messages. Its largest server error rate and database error rate occurred
in the test run after May 21 16:30 EST 2009, which was May 20 23:30 PST 2009 for Google App Engine.
Its host and database status diagrams for that day show some large latencies on both host and
database around half an hour to one hour before the scheduled test. This could be a cause, but
since it is hard to prove that the earlier latencies affected the later test, it is not certain
that they were the main reason for the high error rates.




                                 Figure 15: Overall Error Percentages of Reading Requests

In the overall read error percentages chart, the correct rates of the cloud databases and cloud
hosts are even higher, almost 99.99% of completed requests. All cloud platforms perform very well
over the different time periods.

In both the stress database write test and the read test, connection errors account for 15% to 20%
of all requests on Amazon LocalDB and App Engine Datastore. Amazon SimpleDB has the lowest rate,
less than 10% in both tests and almost 0% in the stress database read test. In contrast, Azure
Table Storage has the highest rate in the stress database read test, at more than 30%. Connection
errors are discussed in more detail in the next section.




          3.4.4 CONNECTION ERROR PERCENTAGES OF ROUNDS

Unlike database errors and server errors, connection errors are incomplete requests that fail to
reach the cloud hosts, mainly because of heavy network fluctuations, cloud hosting capacity or
security measures.


           Table 2: Average Connection Error Percentage of All Requests of Rounds in Stress Database Write Test


                             Round 0          Round 1          Round 2          Round 3          Round 4          Round 5
                               300              900             1500             2100             2700             3300
App Engine Datastore          4.61%           11.83%           23.46%           22.30%           26.67%           28.72%
 Azure Table Storage          0.00%            0.00%            0.21%            2.98%           19.62%           30.54%
  Amazon SimpleDB             1.15%            0.12%            0.97%            6.81%           11.01%           11.13%
   Amazon LocalDB             0.00%            0.53%            6.35%           16.02%           19.88%           23.93%
           Table 3: Average Connection Error Percentage of All Requests of Rounds in Stress Database Read Test


                             Round 0          Round 1          Round 2          Round 3         Round 4          Round 5
                               300              900             1500             2100            2700             3300
App Engine Datastore          2.11%           11.08%            9.83%           10.74%          23.17%           21.75%
 Azure Table Storage          0.00%            0.00%            0.03%           30.24%          48.65%           52.53%
  Amazon SimpleDB             0.00%            0.08%            0.00%            0.05%           0.32%            0.20%
   Amazon LocalDB             0.00%            0.48%            9.44%           17.09%          23.57%           29.97%



For each platform, in both the read and the write test, the average connection error percentage per
round tends to rise as the number of concurrent requests increases. App Engine Datastore and Amazon
SimpleDB show lower percentages in the read test than in the write test, while Azure Table Storage
and Amazon LocalDB show the opposite, with higher percentages in the read test than in the write
test.
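
The comparison between the write and read tests can be reproduced from Tables 2 and 3 with the Python sketch below. It assumes that each round's percentage should be weighted by the number of concurrent requests shown in the table header; the report does not state how its overall percentages were aggregated, so the weighting is an assumption.

```python
# Concurrent requests per round, from the header rows of Tables 2 and 3.
LOADS = [300, 900, 1500, 2100, 2700, 3300]

# Connection error percentages per round (Table 2: write, Table 3: read).
WRITE = {
    "App Engine Datastore": [4.61, 11.83, 23.46, 22.30, 26.67, 28.72],
    "Azure Table Storage": [0.00, 0.00, 0.21, 2.98, 19.62, 30.54],
    "Amazon SimpleDB": [1.15, 0.12, 0.97, 6.81, 11.01, 11.13],
    "Amazon LocalDB": [0.00, 0.53, 6.35, 16.02, 19.88, 23.93],
}
READ = {
    "App Engine Datastore": [2.11, 11.08, 9.83, 10.74, 23.17, 21.75],
    "Azure Table Storage": [0.00, 0.00, 0.03, 30.24, 48.65, 52.53],
    "Amazon SimpleDB": [0.00, 0.08, 0.00, 0.05, 0.32, 0.20],
    "Amazon LocalDB": [0.00, 0.48, 9.44, 17.09, 23.57, 29.97],
}


def weighted_rate(rates, loads=LOADS):
    """Overall percentage, weighting each round by its number of requests."""
    return sum(r * n for r, n in zip(rates, loads)) / sum(loads)


def higher_in_read():
    """Databases whose overall connection error rate is higher for reads."""
    return sorted(
        name for name in WRITE
        if weighted_rate(READ[name]) > weighted_rate(WRITE[name])
    )
```

Under this weighting, `higher_in_read()` returns Amazon LocalDB and Azure Table Storage, which agrees with the observation above.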

Amazon SimpleDB keeps the lowest percentages overall in both the write and read tests, approaching
0% in the read test. But Amazon LocalDB, which shares the same hosting server instance and cloud
network environment as Amazon SimpleDB, starts receiving high connection error percentages from
Round 2.

The likely reason for this phenomenon is that the local database running inside the instance for
Amazon LocalDB takes a lot of computing resources from the host, pushing the host to the limit of
its capacity and causing packets to be dropped.

For Azure Table Storage, the connection error percentages leap from less than 1% in Round 2 to
more than 50% in the read test and more than 30% in the write test in Round 5. Overall, failed
connections also account for almost one third of all read requests. Most of the connection errors
are raised due to “Read time out”. This error occurs 9728.20 times on average in each scheduled
test, ranging from 8156 to 12775 times.

The “Read time out” message means that the client application did not receive any response to a
request it sent before the timeout expired. Because there is no way to log into a Microsoft Windows
Azure instance, as there is with Amazon, the issue cannot be identified directly. Some possible
explanations are:

    • The cloud hosting server of the web application reaches its capacity.
    • The network from Australia to the USA is not as well connected as expected, due to
      insufficient external connections and geography.
    • Peers in the same network have some effect on the test.

For App Engine Datastore, the connection error percentages stay around 25% from Round 2 to Round 5
in the write test, and below 25% in the read test. Most connection errors from Google App Engine
contain an “Access Denied” message, reported as an HTTP 401 error. But there is no HTTP 401 error
in the web application's access logs, which means these requests are blocked before reaching the
web application. It can be presumed that the access is restricted by a firewall. When thousands of
requests arrive at Google App Engine concurrently from the same IP address, a firewall rule may be
triggered.
          3.4.5 CDF OF STRESS WRITE/READ THROUGHPUT ON CLOUD
                DATABASES




                        Figure 16: CDF of Stress Write/Read Throughput on Cloud Databases



Within every response to a successful request, a database processing time is attached. This time
can be used together with the request time to calculate the speed of every database transaction
and to draw the CDF of stress write/read throughput. All data from Round 3 of the stress tests are
used to draw this diagram.
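
As a rough illustration of how such a CDF can be derived from per-request measurements, the sketch
below computes a speed for each transaction and then the empirical CDF of those speeds. The sample
values are made up, and the definition of speed as payload size divided by database processing
time is an assumption, since the report does not spell out the formula.

```python
from bisect import bisect_right


def throughput(payload_bytes, db_time_ms):
    """Bytes per second for one transaction (assumed definition of speed)."""
    return payload_bytes / (db_time_ms / 1000.0)


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical measurements: (payload size in bytes, database time in ms).
measurements = [(1024, 10), (1024, 20), (1024, 40), (1024, 80)]
speeds = [throughput(size, ms) for size, ms in measurements]
cdf = empirical_cdf(speeds)
print(cdf(30000.0))  # fraction of transactions at or below 30000 bytes/s
```

Plotting `cdf` over a range of speeds for each platform gives one curve per platform; a curve that
rises further to the right indicates higher throughput.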

According to the diagram, the first thing to note is that, instead of taking first place in the
write and read tests, Amazon LocalDB performs the worst among all these platforms, which implies a
poor capability for handling concurrent requests. Moreover, App Engine Datastore, Amazon SimpleDB
and Azure Storage all show higher speed in the read test than in the write test, while the read
and write speeds of Amazon LocalDB seem quite similar.

Compared with the local database, all cloud databases show impressive scalability in write and
read to some extent.




     3.5 SINGLETON LARGE FILE WRITE/READ/DELETE TEST

          3.5.1 TEST CONFIGURATION

Singleton large file tests are based on the singleton transfer test strategy and are implemented
as cases of Client – Cloud Database Evaluations, simulating a scenario in which a user transfers
large files of different sizes directly to the cloud databases.

          3.5.2 SINGLETON LARGE FILE WRITE TEST




                  Figure 17: Large File Average Write Time Directly from Client to Different Platforms



This diagram illustrates the average time taken to upload large binary files to the cloud
databases directly from the client application. In the figure, the average write times of Azure
Blob Storage and Amazon S3 are exactly the same. This is probably due to the upload limit of the
local network environment: the test reaches the threshold of the local network before it can
reveal any difference between the cloud databases.


          3.5.3 SINGLETON LARGE FILE READ TEST




                  Figure 18: Large File Average Read Time Directly from Client to Different Platforms



This line chart shows the average time taken to retrieve binary files from the cloud databases to
the client application. Compared with the figure of large file average write time, it can be seen
that Amazon S3 is faster at writing than reading, while Azure Blob Storage is faster at reading
than writing.


          3.5.4 SINGLETON LARGE FILE DELETE TEST




                  Figure 19: Large File Average Delete Time Directly from Client to Different Platforms



This diagram shows the average time taken to process delete actions on the cloud databases,
triggered directly by the client application. It is confirmed that neither Amazon S3 nor Azure
Blob Storage deletes data entries on the fly when it receives the request. Both of them mark the
entry as “to be deleted” and reply to the client with a “successfully deleted” message at the
first stage. The real delete action is performed later.
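
The two-stage behaviour described above resembles a mark-then-purge scheme. The sketch below is a
toy in-memory model of that idea, not either vendor's actual implementation; the class and method
names are hypothetical.

```python
class LazyDeleteStore:
    """Toy object store that acknowledges deletes before reclaiming space."""

    def __init__(self):
        self._objects = {}        # key -> bytes
        self._tombstones = set()  # keys marked "to be deleted"

    def put(self, key, data):
        self._tombstones.discard(key)
        self._objects[key] = data

    def get(self, key):
        # A marked entry is invisible to readers even though it still exists.
        if key in self._tombstones or key not in self._objects:
            raise KeyError(key)
        return self._objects[key]

    def delete(self, key):
        """Mark the entry and reply immediately; the data is still held."""
        if key in self._objects:
            self._tombstones.add(key)
        return "successfully deleted"

    def purge(self):
        """Later step that performs the real delete; returns the count."""
        purged = len(self._tombstones)
        for key in self._tombstones:
            del self._objects[key]
        self._tombstones.clear()
        return purged

    def stored_bytes(self):
        return sum(len(value) for value in self._objects.values())
```

In this model the delete call costs the same regardless of object size, which would be consistent
with a delete time that does not grow with file size.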




4 QUANTITATIVE EVALUATION

     4.1 PRODUCTIVITY SUPPORTS FOR THE DEVELOPERS

          4.1.1 DEVELOPMENT UTILITIES

Taking a look at Microsoft Windows Azure, its heavily equipped frameworks and environments are its
highlight. Almost all existing Microsoft web development frameworks and runtime environments are
supported in Microsoft Windows Azure. With the aid of these toolkits, developers can simply focus
on implementing business logic in C# or PHP. But the downside is obvious as well: they have to
stick with Microsoft programming environments, for instance, Microsoft Visual Studio.

As for Amazon EC2, developers are granted an administrator role when using a virtual machine
instance. They are allowed to install whatever programming environments they want in the instance.
In other words, there is no restriction at all on selecting programming environments on Amazon
EC2. On the other hand, extra work needs to be done, for example, uploading and installing
application runtime environments, and manually setting up and connecting to cloud databases from
the instance.

Unlike Microsoft Windows Azure, which offers fully functional frameworks, and Amazon EC2, which
provides a highly configurable environment, Google App Engine re-implements programming languages
to suit its own platform. To date, Google has enabled Python and JVM-supported languages on its
cloud platform. Developers are free to choose frameworks based on Python and JVM-supported
languages to improve productivity. In practice, however, some limitations of Google App Engine
restrict the choices, for instance, no multi-threading, no local I/O access, and a limit of 30
seconds for a request handler. Besides these, Google offers other Google APIs to integrate Google
App Engine with other Google services.




          4.1.2 LEARNING RESOURCES

Microsoft provides several official channels to help developers.

    • "How Do I?" Videos for the Azure Services Platform
      http://msdn.microsoft.com/en-us/azure/dd439432.aspx
    • Channel9
      http://channel9.msdn.com/
    • Azure Service Platform Resources
      http://www.microsoft.com/azure/resources.mspx
    • Steve Marx Blog
      http://blog.smarx.com/


Amazon EC2



Google App Engine

    • Google App Engine on Google Code
      http://code.google.com/appengine/
    • Google App Engine Blog
      http://googleappengine.blogspot.com/
    • Google Developers’ Channel on YouTube
      http://www.youtube.com/user/GoogleDeveloper




    4.2 IMPLEMENTATION

         4.2.1 WEB HOSTING SERVICE

Microsoft Windows Azure provides the web role pattern for creating front-end web applications. A
web role is a web application which is accessible via an HTTP/HTTPS endpoint. Developers can also
change the number of instances to run more web roles, scaling resource usage up or down.

Google App Engine supports native Python and all JVM-supported languages, and the endpoint is
exposed via a unique URL. Choosing a proper third-party web framework is good enough to build any
web application. However, Google App Engine cannot process any single request for longer than 30
seconds.

Amazon EC2 offers virtualised hardware resources in an instance. In contrast to Microsoft Windows
Azure and Google App Engine, users can choose their own runtime environment from those available
on the market, ranging from open source to commercial products. Users have the flexibility to
follow their own preferences and to customise the environment for their on-demand use.




         4.2.2 COMPUTING SERVICE

The worker role pattern in Microsoft Windows Azure is designed for providing a back-end processing
application which can communicate with Azure Storage services and other Internet-based services.
But it is not allowed to expose any external endpoints to users, which means it has no listener
for incoming requests over HTTP/HTTPS. However, a web role and a worker role can live in the same
instance, handling listening and computing respectively. As with web roles, the number of worker
roles can also be modified to scale resource usage.



As for Amazon EC2, developers can run any computing software to take advantage of the computing
resources on an instance. Moreover, multiple instances can be created and deleted whenever needed
in order to provide scalable resources for computing services.

Essentially, Google App Engine is not designed for long-running processing. Any request that takes
longer than 30 seconds is terminated automatically. But by scheduling cron jobs in Google App
Engine, web applications on the cloud host can still be used to process some lightweight computing
tasks within 30 seconds.
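
One common way to live with a per-request deadline is to process work in small batches and leave
the remainder for the next scheduled run. The sketch below illustrates this, assuming the
30-second limit stated above; the function name, the safety margin and the injectable clock are
illustrative choices and not part of the Google App Engine API.

```python
import time


def process_until_deadline(items, handle, budget_s=30.0, margin_s=5.0,
                           clock=time.monotonic):
    """Handle items until the time budget is nearly used up.

    Returns the list of items that were not processed, so that a later
    scheduled run can continue from there.
    """
    start = clock()
    for index, item in enumerate(items):
        # Stop early, leaving a margin so the request ends on its own terms.
        if clock() - start >= budget_s - margin_s:
            return list(items[index:])
        handle(item)
    return []
```

A scheduled job would call this function on each run, store the returned remainder, and pick it
up again on the next run until the list is empty.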


          4.2.3 DATA SERVICE

Because the cloud databases used by the three vendors behave in a distributed manner, some
relational database logic does not fit the new situation. The most significant difference is that
none of them supports SQL join statements. Also, because these cloud databases are built on
different theories, their terminologies differ when describing relational-database-like concepts.

A relation in a relational database is called a kind in App Engine Datastore, or a bucket in
Amazon S3. An attribute is named a property in App Engine Datastore, while an entity in Google App
Engine is used to describe a tuple.

Microsoft provides two kinds of data services in the cloud, Windows Azure Storage and Microsoft
SQL Data Service; however, neither of them supported transactions at the time of writing.

Windows Azure Storage is a REST-based, HTTP 1.1 only, dynamic cloud storage service. It contains
three storage services, Azure Blob Storage, Azure Table Storage and Azure Queue Storage, aimed at
binary data, structured data, and communication between web roles and worker roles respectively.

For Azure Blob Storage, the maximum allowed file size is 50 gigabytes. But a single file that is
uploaded to the storage in one operation must be no larger than 64 megabytes. When the file
exceeds 64 megabytes, developers must split it into blocks of 4 megabytes each, up to 50 gigabytes
in total.
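
The block rule above can be sketched as a small planning function. It only computes byte ranges;
the actual upload calls are omitted, and the constant and function names are hypothetical. The
sizes are the ones quoted in this section.

```python
MB = 1024 * 1024
GB = 1024 * MB
SINGLE_UPLOAD_LIMIT = 64 * MB   # largest file uploadable in one operation
BLOCK_SIZE = 4 * MB             # block size for larger files
MAX_BLOB_SIZE = 50 * GB         # overall file size limit


def plan_upload(size_bytes):
    """Return a list of (offset, length) byte ranges to upload for a file."""
    if size_bytes < 0 or size_bytes > MAX_BLOB_SIZE:
        raise ValueError("file size outside the supported range")
    if size_bytes <= SINGLE_UPLOAD_LIMIT:
        return [(0, size_bytes)]
    # Larger files are cut into fixed-size blocks; the last one may be short.
    return [(offset, min(BLOCK_SIZE, size_bytes - offset))
            for offset in range(0, size_bytes, BLOCK_SIZE)]
```

For example, a 65-megabyte file would be planned as sixteen 4-megabyte blocks followed by one
1-megabyte block.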

For Azure Table Storage, all properties put into one row must be no larger than 1 megabyte in
total, and the number of properties must not exceed 255. Furthermore, every row of an entry must
be assigned a unique partition key under the account, and a unique row key under its partition.
Importantly, Azure Table Storage stores data in a distributed manner based on the partition key.
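
The constraints above can be expressed as a simple client-side check. This is a sketch only: the
size estimate (UTF-8 length of property names and stringified values) is a simplifying assumption
and not how the service itself measures a row, and all names are hypothetical.

```python
MAX_PROPERTIES = 255
MAX_ROW_BYTES = 1024 * 1024  # 1 megabyte in total per row


def validate_row(partition_key, row_key, properties, existing_keys):
    """Raise ValueError if the row breaks one of the limits above.

    existing_keys is a set of (partition_key, row_key) pairs already
    stored; on success the new pair is added to it.
    """
    if (partition_key, row_key) in existing_keys:
        raise ValueError("row key already used in this partition")
    if len(properties) > MAX_PROPERTIES:
        raise ValueError("too many properties")
    size = sum(len(name.encode("utf-8")) + len(str(value).encode("utf-8"))
               for name, value in properties.items())
    if size > MAX_ROW_BYTES:
        raise ValueError("row larger than 1 megabyte")
    existing_keys.add((partition_key, row_key))
```

Because data is distributed by partition key, rows that are usually read together would normally
share a partition key and differ only in their row keys.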

The primary role of Azure Queue Storage is to provide a way for web roles to communicate with
worker roles. Within Azure Queue Storage, no message put into a queue can be larger than 8
kilobytes, but the number of messages is not limited.

Regardless of whether data is stored in Azure Blob Storage, Azure Table Storage or Azure Queue
Storage, all data held in Windows Azure Storage is replicated three times for fault tolerance. The
cloud database guarantees consistency, so a read returns the data that is expected.

Microsoft also provided another kind of cloud storage service, named SQL Data Service, supporting
SOAP and REST. But the protocol was later deprecated. Instead, Microsoft announced that a new
protocol, called Tabular Data Stream, would be released in the future. Since SQL Data Service was
deprecated in the middle of the evaluation, and development documents about Tabular Data Stream
are still incomplete, the evaluation of cloud databases on Microsoft Azure mainly focuses on
Windows Azure Storage.

Amazon has two kinds of data services too, Amazon SimpleDB for structured data and Amazon S3 for
binary data.

Amazon SimpleDB provides the core database functions of data indexing and querying. Compared with
a traditional relational database, Amazon SimpleDB imposes less administrative burden for data
modelling, index maintenance, and performance tuning. However, there are some limitations of
Amazon SimpleDB for users to consider. For instance, an attribute value is limited to 1024 bytes,
and the number of attributes is limited to 255 in one table.

Amazon S3 claims to offer unlimited storage. With its simple design, developers can write, read
and delete objects ranging from 1 byte to 5 gigabytes each. However, before an object can be
created in the cloud, a bucket, whose name must be globally unique, has to be explicitly created.

Google App Engine packs its database functions in App Engine Datastore, another implementation of
Google BigTable, to index and store data. It is equipped with some unique features: support for
transactions, automatic index building according to the queries defined in web applications,
reference properties for cross-kind queries, and data models that support kind inheritance and
dynamically added properties. But up to now, App Engine Datastore can only be accessed through the
cloud hosting server via an internal protocol.

In App Engine Datastore, an entry is limited to 1 megabyte. Unlike other cloud databases, it does
not offer separate storage for different kinds of data at the database level. Instead, dedicated
property types are provided to store binary data, text data, string data and list data.


          4.2.4 DEPLOYMENT AND MANAGEMENT SUPPORT

Microsoft Windows Azure provides a web portal, Azure Services Developer Portal, for developers to
deploy and manage applications on the cloud host. Developers can choose between two deployment
methods, staging deployment and production deployment. The staging deployment is used only for
testing purposes. By default, every deployed application is placed in staging deployment at first,
and developers swap it to production deployment for release after testing. After a successful
deployment, developers can transfer application logs to Azure Blob Storage and adjust the number
of running instances.

Within Azure Services Developer Portal, analytics are provided to monitor the status of the
application, for example, virtual machine usage over a date range, and in-bound and out-bound
network bandwidth for both the Hosting Server and the Storage service.

Amazon EC2 provides a set of command line tools to help developers interact with the cloud
platform to create, reboot, and terminate an instance. Besides the command line tools, another
common tool used by most developers is the AWS Management Console, which gives a quick, global
picture of the cloud platform so that developers can access and manage web services. But a
graphical interface like the AWS Management Console is only available for Amazon EC2 and Amazon
Elastic MapReduce up to now; other infrastructure services will be supported in the console in the
future.

Google App Engine SDK comes along with command line tools to deploy applications onto Google App
Engine. Before submitting, a string value, called the version, can be defined to distinguish the
submitted version from hosted versions. Although only one version can be set as the released one,
multiple other versions can still be hosted in the cloud and accessed through corresponding
endpoints for testing and maintenance purposes.

A rich web portal is provided to manage and monitor the web application in Google App Engine.
Access logs, resource usage, performance charts, and error rates can all be viewed through this
web portal. But the accuracy and update frequency of the usage figures vary. Access logs are
updated on the fly when a new request comes in, but the usage of stored data is updated only once
a day, so the figures shown in the intervening period are just estimates.




          4.2.5 COSTS

Note that all pricing in this section is given in US dollars.

4.2.5.1 Microsoft Windows Azure

Microsoft Windows Azure is in Beta and is free to use, but some limitations apply: usage is free
for 2000 VM hours, the cloud storage capacity is at most 50GB, and the total bandwidth is 20GB per
day.

A Consumption-Based Model [http://www.microsoft.com/azure/pricing.mspx] has been proposed by
Microsoft for future charging, which charges users for the computing resources that their
applications use.

4.2.5.2 Amazon

Amazon Web Services has been commercially available since early 2006. Pricing details are listed
at [http://aws.amazon.com/s3/#pricing, http://aws.amazon.com/ec2/#pricing,
http://aws.amazon.com/simpledb/#pricing]. Amazon also provides an easy-to-use tool for estimating
monthly fees, the Simple Monthly Calculator [http://calculator.s3.amazonaws.com/calc5.html].

As for Amazon EC2, Amazon provides two kinds of instances, Reserved Instances and On-Demand
Instances. Reserved Instances enable users to make a one-off payment. Users can rent instances for
a certain period of time with no further obligation. However, this option is only available for
Linux/Unix instances [http://aws.amazon.com/ec2/#pricing].

For On-Demand Instances, users only pay for usage. When renting an instance from Amazon EC2,
charges are applied according to the type of instance, using the hour as the unit. The cost varies
based on several criteria, such as the location of the instance chosen by the user, the type of
instance with its CPU performance, and the operating system, Linux/UNIX or Windows, installed on
the instance.

Moreover, incoming and outgoing bandwidth is also charged. Incoming bandwidth is charged at a
fixed $0.10 per GB, while outgoing bandwidth starts from $0.17 per GB for the first 10 TB per
month and falls to $0.10 per GB above 150 TB per month, the same as for SimpleDB and S3. Extra
storage for an instance can be purchased, but the cost varies with the geographic location of the
instance and the I/O usage. Public IP addresses, Amazon CloudWatch and Load Balancing are all
charged in time units.

As for data storage, Amazon has released two cloud database services to the public, Amazon
SimpleDB and Amazon S3.

For Amazon SimpleDB, costs apply in three areas: Machine Utilization, Data Transfer and Data
Storage. Machine Utilization charges for the CPU time consumed in processing queries. Data
Transfer charges for incoming and outgoing data transfer, with the first 1 GB free every month.
Data Storage is charged at a fixed rate of $0.25 per GB-month thereafter.

On Amazon S3, costs apply in three areas: Requests, Data Transfer and Data Storage. Requests are
counted as the number of PUT, POST, COPY and LIST actions. Data Transfer and Data Storage are both
charged per GB, and the rates decrease once usage reaches a certain amount. But unlike Amazon
SimpleDB, there is no monthly free quota.



4.2.5.3 Google App Engine

Google App Engine is free to use within a free quota, which is refreshed every day. Only when the
application runs out of the free resources does Google App Engine charge the user, in seven areas:
requests, datastore, mail, UrlFetch, image manipulation, memcache and deployments
[http://code.google.com/appengine/docs/quotas.html]. In particular, Google App Engine provides
budgeting functionality, which lets users set maximum budgets for all seven areas.

Hosting an application in Google App Engine is free, but the same application must not be deployed
more than 250 times a day, and no single file may be larger than 10 megabytes.

In sum, Google App Engine charges for incoming and outgoing bandwidth per gigabyte, at $0.12 and
$0.10 respectively. CPU time costs $0.10 per hour of usage.

As for data storage, Google App Engine charges by the month and by the gigabyte: $0.15 is charged
per gigabyte of data per month.

In addition, the email service is charged by the number of recipients, at $0.0001 per recipient.
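
To make the rates above concrete, the sketch below adds them up for one month of usage beyond the
free quota. The rates are the ones quoted in this section; the function and parameter names are
hypothetical, and free quotas are ignored for simplicity.

```python
# Rates in US dollars, as quoted in this section.
RATES = {
    "incoming_gb": 0.12,       # per GB of incoming bandwidth
    "outgoing_gb": 0.10,       # per GB of outgoing bandwidth
    "cpu_hour": 0.10,          # per CPU hour
    "storage_gb_month": 0.15,  # per GB stored per month
    "recipient": 0.0001,       # per email recipient
}


def monthly_cost(incoming_gb, outgoing_gb, cpu_hours, stored_gb, recipients):
    """Billable cost in US dollars for usage beyond the free quota."""
    return (incoming_gb * RATES["incoming_gb"]
            + outgoing_gb * RATES["outgoing_gb"]
            + cpu_hours * RATES["cpu_hour"]
            + stored_gb * RATES["storage_gb_month"]
            + recipients * RATES["recipient"])


# Hypothetical usage figures for one month.
print(round(monthly_cost(10, 20, 100, 5, 1000), 2))
```

Keeping the rates in one table makes it easy to substitute another vendor's prices and compare
the totals for the same usage profile.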


5 SECURITY

5.1 Microsoft Windows Azure

When using Microsoft Windows Azure, a username and password, in the form of a Windows LiveID, are
needed as credentials. Under the LiveID, a limited number of Hosting Servers and Azure Storage
Services can be created. An application that has been deployed onto the Hosting Server is reached
via a unique URL over HTTP/HTTPS. When accessing the Azure Storage Service, a storage account and
a 256-bit shared key, generated by the Azure Services Developer Portal, are needed for
authentication. There is an exception for Windows Azure Blob Storage, whose containers
[http://msdn.microsoft.com/en-us/library/dd135733.aspx] can be set to be public. With this special
right granted, anyone in the world is allowed to access the data within that specific container.
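
Shared-key authentication of this kind is typically built on a keyed hash: the client signs a
description of the request with the shared key, and the service recomputes the signature to check
it. The sketch below shows the general idea with HMAC-SHA256 from the Python standard library. The
string being signed and the function names are simplified placeholders, not the exact format that
Windows Azure Storage requires.

```python
import base64
import hashlib
import hmac
import os


def sign(shared_key_b64, string_to_sign):
    """Return a Base64-encoded HMAC-SHA256 signature of the given string."""
    key = base64.b64decode(shared_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(shared_key_b64, string_to_sign, signature):
    """Constant-time signature check, as the service side might do it."""
    return hmac.compare_digest(sign(shared_key_b64, string_to_sign), signature)


# A random 256-bit key stands in for the one issued by the portal.
demo_key = base64.b64encode(os.urandom(32)).decode("ascii")
request = "GET\n/myaccount/mytable"  # placeholder, not the real canonical form
signature = sign(demo_key, request)
```

Since only the signature travels with the request, the shared key itself is never sent over the
network; anyone who alters the signed part of the request invalidates the signature.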



5.2 Amazon Web Service

To access Amazon EC2, a public/private X.509 key pair is used to access the instances that you
launch, if the instances are created from public AMIs provided by Amazon EC2. After you generate a
key pair, the public key is stored in Amazon EC2 under the key pair name you selected. Whenever
you launch an instance using the key pair name, the public key is copied to the instance metadata.
This allows you to access the instance securely using your private key. As for Amazon S3, in order
to give customers the permissions they wish to have, the data in Amazon S3 is organized at the
bucket level and the object level. By default, objects can only be accessed by their creator, so
customers can modify, delete and grant permissions for the data they created. Permissions in
SimpleDB are controlled at the domain level. Therefore, only the domain creator or an
authenticated user can access the data. Amazon SimpleDB also provides SSL-encrypted endpoints for
customers to access data. However, the data stored in SimpleDB is not encrypted; users who want
their data encrypted have to encrypt it before sending it to Amazon SimpleDB.



5.3 Google App Engine

In Google App Engine, a Google account is needed to deploy applications onto the cloud and to log
in to the web portal to manage cloud applications. Unlike the cloud data store services provided
by Microsoft and Amazon, App Engine Datastore is not exposed to the outside world. It can only be
accessed from the Google App Engine Hosting Service via an internal protocol. A unique URL is
provided by Google App Engine for people in the outside world to make requests to the cloud
application via HTTP/HTTPS.



