Cloud Auto-Scaling with Deadline and Budget Constraints


									Ming Mao, Jie Li, Marty Humphrey

             eScience Group

   CS Department, University of Virginia

        Grid 2010 – Oct 27, 2010
   A fast-growing computing platform
     IDC: cloud spending is growing 27.4% a year to $56
      billion (compared with 5% a year for traditional IT)
            $16.5 billion (2009) -> $55.5 billion (2014)
     Source: Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast


   Two most quoted benefits
     Scalable computing and storage
     Reduced cost

   Concerns
     Security, availability, cost management, integration,
      interoperability, etc.
   Q1. Cost – the most important factor in practice?

   Rate the benefits commonly ascribed to the cloud on-demand model
     Pay only for what you use                          77.90%
     Easy/fast to deploy to end-users                   77.70%
     Monthly payments                                   75.30%
     Encourages standard systems                        68.50%
     Requires less in-house IT staff, costs             67.00%
     Always offers latest functionality                 64.60%
     Sharing systems with partners simpler              63.90%
     Seems like the way of future                       54.00%
     Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

   How important is it that Cloud service providers...
     Offer competitive pricing                          91.60%
     Offer Service Level Agreements                     88.60%
     Option to move cloud offerings back on premise     87.80%
     Provide a complete solution                        86.00%
     Understand my business and industry                84.50%
     Allow managing on-premise & cloud together         82.10%
     Support many of my IT needs                        81.00%
     Offer both on-premise and public cloud services    79.20%
     Are a technology and business model innovator      78.30%
     Have local presence, can come to my offices        72.90%
     Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009




                      Q2. Moving into Cloud == Reduced Cost ?
   Resource utilization information based triggers (e.g.
    AWS auto-scaling, RightScale, enStratus, Scalr, etc)
   Multiple instance types


   Current billing models
     Full hour billing

   Non-negligible instance acquisition time
     7-15 min in Windows Azure

   More specific performance goals
   Budget awareness (e.g. dollars/month, dollars/job)
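
A minimal sketch of why full-hour billing matters for cost, assuming partial hours are rounded up to the next whole hour. The function name `billed_cost` and the example price are illustrative, not taken from any provider API.

```python
import math

def billed_cost(runtime_seconds: float, price_per_hour: float) -> float:
    """Cost of one instance when every started hour is billed in full."""
    if runtime_seconds <= 0:
        return 0.0
    billed_hours = math.ceil(runtime_seconds / 3600)  # partial hours round up
    return billed_hours * price_per_hour

# An instance that runs 61 minutes is billed for 2 hours.
one_hour = billed_cost(3600, 0.085)
sixty_one_minutes = billed_cost(3660, 0.085)
```

Under this assumption, releasing an instance a minute after an hour boundary costs as much as keeping it for the whole second hour.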
   [Figure: users, application, cloud servers, and jobs in the cloud, annotated with the two constraints: deadline (job finish time) and cost]

    Problem statement: how to enable cloud
    applications to finish all submitted jobs
    before a user-specified deadline at as little
    cost as possible using auto-scaling.
   The workload consists of independent jobs submitted
    to a job queue

   Jobs are served in FCFS order and distributed fairly

   Different classes of jobs

   Same performance goal (e.g. a 1-hour deadline)

   VM instances take time to start up
   Key variables used in the model: $n_j$, $t_j$, $I_i$, $d$, $V$
   Workload
     $W = \sum_j (J_j, n_j)$

   Computing power of instance $I_i$
     Running instance:
     $P_i = \sum_j \left( J_j, \frac{D \times n_j}{\sum_j t_{j,type(I_i)} \, n_j} \right)$

     Pending instance:
     $P_i = \sum_j \left( J_j, \frac{(D - (d_{type(I_i)} - s_i)) \times n_j}{\sum_j t_{j,type(I_i)} \, n_j} \right)$
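
The computing-power formula above can be sketched in Python as follows. This is an illustration under assumptions: the function name, the dictionary layout, and the sample numbers are hypothetical, and a pending instance is assumed to have only D - (d - s) seconds of useful time left before the deadline.

```python
def computing_power(jobs_in_queue, avg_time, deadline,
                    startup_delay=0.0, time_pending=0.0, pending=False):
    """Jobs of each class one instance can finish before the deadline.

    jobs_in_queue: {job class: n_j}
    avg_time:      {job class: t_j on this instance type, in seconds}
    deadline:      D in seconds
    """
    available = deadline
    if pending:
        # A pending instance still has to finish starting up.
        available = deadline - (startup_delay - time_pending)
    available = max(available, 0.0)
    total_time = sum(avg_time[j] * n for j, n in jobs_in_queue.items())
    if total_time == 0:
        return {j: 0.0 for j in jobs_in_queue}
    # Capacity is split across classes in proportion to the queue mix.
    return {j: available * n / total_time for j, n in jobs_in_queue.items()}

queue = {"a": 10, "b": 20}
times = {"a": 300, "b": 75}
running = computing_power(queue, times, deadline=3600)
pending = computing_power(queue, times, deadline=3600,
                          startup_delay=600, time_pending=120, pending=True)
```

With these sample numbers the running instance can finish 8 jobs of class a and 16 of class b in one hour, which uses exactly 8 x 300 + 16 x 75 = 3600 seconds.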
   Scale up
     Sufficient budget
     $\min \sum_i c_{type(I_i')}$ subject to $\sum_i P_i' \ge W - \sum_i P_i$

     Insufficient budget
     $\max \sum_i P_i'$ subject to $\sum_i c_{type(I_i')} \le C - \sum_i c_{type(I_i)}$

   Scale down
     $\sum_i P_i - P_s \ge W$
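
A minimal sketch of the scale-down test: an instance s can be released only if the remaining instances still cover the workload for every job class. The function name and the sample data are hypothetical.

```python
def can_release(workload, powers, candidate):
    """Check sum_i P_i - P_s >= W for every job class.

    workload:  {job class: jobs to finish}
    powers:    {instance id: {job class: computing power}}
    candidate: id of the instance s considered for release
    """
    for job_class, needed in workload.items():
        total = sum(p.get(job_class, 0.0) for p in powers.values())
        remaining = total - powers[candidate].get(job_class, 0.0)
        if remaining < needed:
            return False
    return True

workload = {"j1": 15, "j2": 20}
powers = {
    "I1": {"j1": 10, "j2": 5},
    "I2": {"j1": 10, "j2": 20},
    "I3": {"j1": 10, "j2": 10},
}
releasable = [s for s in powers if can_release(workload, powers, s)]
```

Here I1 and I3 can each be released on their own, while releasing I2 would leave too little capacity for class j2.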
   Worked example

     Workload still to be covered, $P' = W - I_1 - I_2$:
     j1: x = 60 - 10 - 10 = 40
     j2: y = 60 -  5 - 20 = 35
     j3: z = 60 - 20 -  5 = 35

     Required computing power from new instances of types V1, V2, V3:
     j1: 10 n1' + 10 n2' + 10 n3' >= x
     j2:  5 n1' + 20 n2' + 10 n3' >= y
     j3: 20 n1' +  5 n2' + 10 n3' >= z

     $\min (c_1 n_1' + c_2 n_2' + c_3 n_3')$
     where $c_1 n_1' + c_2 n_2' + c_3 n_3' + c_{type(I_1)} + c_{type(I_2)} \le C$
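
The example can be checked with a small brute-force search. This is only a sketch: the per-type costs, the budget, and the search bound are assumptions for illustration (the slide gives no values for c1, c2, c3 or C), and exhaustive search stands in for whatever solver the actual system uses.

```python
from itertools import product

W = (60, 60, 60)                                 # workload per job class
I1, I2 = (10, 5, 20), (10, 20, 5)                # power of running instances
V = [(10, 5, 20), (10, 20, 5), (10, 10, 10)]     # power per VM type V1..V3
COST = (0.17, 0.17, 0.085)                       # assumed $/hour for V1..V3
RUNNING_COST = 0.17 + 0.17                       # assumed cost of I1 and I2
BUDGET = 1.0                                     # assumed budget C

# P' = W - I1 - I2
required = tuple(w - a - b for w, a, b in zip(W, I1, I2))

def cheapest_plan(required, vm_power, vm_cost, max_per_type=10):
    """Try every (n1', n2', n3') up to max_per_type and keep the cheapest."""
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(vm_power)):
        supplied = [sum(n * vm[j] for n, vm in zip(counts, vm_power))
                    for j in range(len(required))]
        if not all(s >= r for s, r in zip(supplied, required)):
            continue  # plan does not cover the remaining workload
        cost = sum(n * c for n, c in zip(counts, vm_cost))
        if cost + RUNNING_COST > BUDGET:
            continue  # plan exceeds the budget
        if best is None or cost < best[0] - 1e-12:
            best = (cost, counts)
    return best

best_cost, best_counts = cheapest_plan(required, V, COST)
```

Under these assumed prices the cheapest feasible plan is four V3 instances, since every instance type contributes 10 units to j1 and at least four instances are therefore needed.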
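   Cloud Cruise Control
   [Architecture diagram. Components: Decider, Monitor, Repository (Config, Historical Data), VM Manager. Labeled flows: users enqueue jobs and VM instances dequeue them; workload and VM info updates feed the Repository; the Decider solves $\min \sum_i c_{type(I_i')}$ subject to $\sum_j P_j' \ge W - P$ and passes a VM plan to the VM Manager, which adds or removes (+, -) VM instances; the admin is notified and supplies dynamic configuration.]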
   Workload & VM simulation parameters

   Job arrival rate (Mix, Computing Intensive, and IO Intensive workloads alike):
   avg 30 jobs/hour, STD 5 jobs/hour

   Job processing time by VM type and workload, average (STD):

   VM type    Price         Delay   Mix          Computing Intensive   IO Intensive
   General    $0.085/hour   600s    300s (50s)   300s (50s)            300s (50s)
   High-CPU   $0.17/hour    720s    210s (25s)   75s (15s)             300s (50s)
   High-IO    $0.17/hour    720s    210s (25s)   300s (50s)            75s (15s)
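
A sketch of how these parameters could drive a simulation. The slide gives only averages and standard deviations, so the normal distribution, the clamping of negative samples, and all names below are assumptions.

```python
import random

# (mean, std) of job processing time in seconds, per VM type and workload.
PROC_TIME = {
    "General":  {"Mix": (300, 50), "Computing": (300, 50), "IO": (300, 50)},
    "High-CPU": {"Mix": (210, 25), "Computing": (75, 15),  "IO": (300, 50)},
    "High-IO":  {"Mix": (210, 25), "Computing": (300, 50), "IO": (75, 15)},
}
ARRIVALS = (30, 5)  # jobs per hour: (mean, std)

def sample_hourly_arrivals(rng):
    """Number of jobs arriving in one hour (never negative)."""
    return max(0, round(rng.gauss(*ARRIVALS)))

def sample_processing_time(rng, vm_type, workload):
    """Processing time of one job in seconds (at least one second)."""
    mean, std = PROC_TIME[vm_type][workload]
    return max(1.0, rng.gauss(mean, std))

rng = random.Random(42)  # fixed seed so a run can be repeated
n_jobs = sample_hourly_arrivals(rng)
times = [sample_processing_time(rng, "High-CPU", "Computing")
         for _ in range(n_jobs)]
```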
   Stable Workload & Changing Deadline
   [Figure: response time (sec, left axis, 0-7000) and utilization (%, right axis, 0-100) versus time (hour, 0-80); series: utilization, deadline, avg, max, min]
   Changing Workload & Fixed Deadline
   [Figure: response time (sec, left axis, 0-4000) and workload (job/h, right axis, 0-350) versus time (hour, 0-80); series: deadline, avg, max, min, workload]
   Choice      VM types                      Total cost   % more than optimal
   Choice #1   General                       $98.52       43%
   Choice #2   High-CPU                      $128.86      87%
   Choice #3   High-IO                       $129.71      88%
   Choice #4   General, High-CPU, High-IO    $78.62       14%
   Optimal     General, High-CPU, High-IO    $68.85       -
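
The "% more than optimal" figures follow directly from the total costs. A short check, with hypothetical names:

```python
OPTIMAL = 68.85
TOTAL_COST = {
    "Choice #1": 98.52,
    "Choice #2": 128.86,
    "Choice #3": 129.71,
    "Choice #4": 78.62,
}

def percent_over_optimal(cost, optimal=OPTIMAL):
    """How much more a choice costs than the optimal plan, in whole percent."""
    return round((cost / optimal - 1) * 100)

overhead = {name: percent_over_optimal(c) for name, c in TOTAL_COST.items()}
```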
   MODIS
     Notation: 200X is the year; Terra & Aqua are the satellites; (X-Y) means day X to day Y
     15 images / day
   Moderate-scale test (up to 20 instances)

   Terra 2004(10-12), total 45 jobs, theoretical cost 4 C.H.* or $0.48
     Deadline   Finish time     Cost
     1 hour     18 min late     9 C.H. or $1.08
     2 hours    8 min early     6 C.H. or $0.72
     3 hours    20 min early    5 C.H. or $0.60

   Aqua 2008(30-32), total 45 jobs, theoretical cost 4 C.H. or $0.48
     Deadline   Finish time     Cost
     1 hour     15 min late     10 C.H. or $1.20
     2 hours    20 min early    7 C.H. or $0.84
     3 hours    29 min early    5 C.H. or $0.60

   Large-scale test (up to 90 instances)

   Terra & Aqua 2006(1-75), total 1125 jobs, theoretical cost 93 C.H. or $11.16
     Deadline   Finish time     Cost
     2 hours    20 min late     170 C.H. or $20.40
     4 hours    6 min early     132 C.H. or $15.84

   Terra & Aqua 2006(1-150), total 2250 jobs, theoretical cost 185 C.H. or $22.20
     Deadline   Finish time        Cost
     2 hours    Admission denied
     4 hours    22 min early       243 C.H. or $29.16

   * C.H. = computing hour; 1 C.H. = $0.12 in Windows Azure
   Test: Terra & Aqua 2006(1-75), total 1125 jobs
                  6 min early
                  theoretical cost: 93 C.H. or $11.16
                  actual cost: 132 C.H. or $15.84
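
The dollar figures are computing hours times the quoted rate of $0.12 per computing hour, and the gap between actual and theoretical cost can be expressed as a percentage. A small sketch with hypothetical names:

```python
RATE_PER_CH = 0.12  # dollars per computing hour, as quoted for Windows Azure

def cost_in_dollars(computing_hours):
    """Convert computing hours to dollars at the quoted rate."""
    return round(computing_hours * RATE_PER_CH, 2)

theoretical_ch, actual_ch = 93, 132
theoretical_cost = cost_in_dollars(theoretical_ch)
actual_cost = cost_in_dollars(actual_ch)

# Extra computing hours paid relative to the theoretical minimum, in percent.
overhead_percent = round((actual_ch / theoretical_ch - 1) * 100, 1)
```

For this test the actual run used about 42% more computing hours than the theoretical cost.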
   Instance Acquisition and Release
   [Figure: instance number (0-40) versus time (hour, 0-5); series: Released, Acquiring, Ready]
   Conclusions
     More cost-efficient than a fixed-size instance choice
     VM startup delay can have a large effect in practice


   Future work
     More general cloud application model
     Multiple job classes
     Consider other instance types (e.g. spot instances &
      reserved instances)
     Data transfer performance and storage cost

								