PERFORMANCE ANALYSIS FOR MULTITASK IN DISTRIBUTED CLOUD COMPUTING

National Conference on Role of Cloud Computing Environment in Green Communication 2012


PERFORMANCE ANALYSIS FOR MULTITASK IN DISTRIBUTED CLOUD COMPUTING

Ms. S. Evangelin Perciyal                         Ms. P. Michal Jeba kumari M.E

II nd year M.E Computer Science Engineering, The Indian Engineering College, Vadakkankulam
evangelinselwyn@gmail.com



1. ABSTRACT

Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate
the need for maintaining expensive computing facilities by companies and institutes alike.
Through the use of virtualization and resource time sharing, clouds serve with a single set of
physical resources a large user base with different needs. Thus, clouds have the potential to
provide to their owners the benefits of an economy of scale and, at the same time, become an
alternative for scientists to clusters, grids, and parallel production environments. However,
the current commercial clouds have been built to support web and small database workloads,
which are very different from typical scientific computing workloads. Moreover, the use of
virtualization and resource time sharing may introduce significant performance penalties for
the demanding scientific computing workloads. In this work, we analyze the performance of cloud
computing services for scientific computing workloads. We quantify the presence in real
scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ
loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we
perform an empirical evaluation of the performance of four commercial cloud computing services
including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through
trace-based simulation the performance characteristics and cost models of clouds and other
scientific computing platforms, for general and MTC-based scientific computing workloads. Our
results indicate that the current clouds need an order of magnitude in performance improvement
to be useful to the scientific community, and show which improvements should be considered
first to address this discrepancy between offer and demand.

Index Terms: Distributed systems, distributed applications, performance evaluation,
metrics/measurement, performance measures.

2. INTRODUCTION

Scientific computing requires an ever-increasing number of resources to deliver results for
ever-growing problem sizes in a reasonable time frame. In the last decade, while the largest
research projects were able to afford expensive supercomputers, many projects were forced to
opt for cheaper resources such as commodity clusters and grids. Cloud computing proposes an
alternative in which resources are no longer hosted by the researchers’ computational
facilities, but are leased from big data centers only when needed. Despite the existence of
several cloud computing offerings by vendors such as Amazon and
        Department of CSE, Sun College of Engineering and Technology


GoGrid, the potential of clouds for scientific computing remains largely unexplored. To address
this issue, in this paper we present a performance analysis of cloud computing services for
many-task scientific computing.

2.1 CLOUD COMPUTING SERVICES FOR SCIENTIFIC COMPUTING

In this section, we provide a background to analyzing the performance of cloud computing
services for scientific computing. We first describe the main characteristics of the common
scientific computing workloads, based on previous work on analyzing and modeling of workload
traces taken from parallel production infrastructures (PPIs) and grids. Then, we introduce the
cloud computing services that can be used for scientific computing, and select four commercial
clouds whose performance we will evaluate empirically.

2.1.1 SCIENTIFIC COMPUTING

Job structure and source: PPI workloads are dominated by parallel jobs, while grid workloads
are dominated by small bags-of-tasks (BoTs) and sometimes by small workflows, comprising mostly
sequential tasks. Source-wise, it is common for PPI and grid workloads to be dominated by a
small number of users. We consider users that submit many tasks, often grouped into the same
submission as BoTs, as proto-MTC users, in that they will be most likely to migrate to systems
that provide good performance for MTC workload execution. We focus in Section 3 on a more
rigorous definition of MTC workloads, and on demonstrating their presence in recent scientific
workloads.

Bottleneck resources: Overall, scientific computing workloads are highly heterogeneous, and can
have any one of CPU, I/O, memory, and network as the bottleneck resource. Thus, in Section 4 we
investigate the performance of these individual resources.

Job parallelism: A large majority of the parallel jobs found in published PPI and grid traces
have up to 128 processors. Moreover, the average scientific cluster size was found to be around
32 nodes and to be stable over the past five years. Thus, in Section 4 we look at the
performance of executing parallel applications of up to 128 processors.

2.1.2 Four Selected Clouds: Amazon EC2, GoGrid, ElasticHosts, and Mosso

We identify three categories of cloud computing services:

    Infrastructure-as-a-Service (IaaS),

    Platform-as-a-Service (PaaS),

    Software-as-a-Service (SaaS).

3. TECHNIQUE

We provide a background to analyzing the performance of cloud computing services for scientific
computing. We first describe the main characteristics of the common scientific computing
workloads, based on previous work on analyzing and modeling of workload traces taken from PPIs
and grids. Then, we introduce the cloud computing services that can be used for scientific
computing, and select four commercial clouds whose performance we will evaluate empirically.

In the following, we describe Amazon EC2; the other three, GoGrid (GG), ElasticHosts (EH), and
Mosso, are IaaS clouds with provisioning, billing, and availability and performance guarantees
similar to Amazon EC2’s. The Amazon Elastic Compute Cloud (EC2) is an IaaS cloud computing
service that opens Amazon’s large computing infrastructure to its users. The service is elastic
in the sense that it enables the user to extend or shrink its infrastructure by launching or
terminating virtual machines (instances). The user can use any of the instance types currently
on offer; the characteristics and cost of the five instance types available in June 2009 are
summarized in Table 1. An EC2 Compute Unit (ECU) is the equivalent CPU power of a 1.0-1.2 GHz
2007 Opteron or Xeon processor.

4. EXPERIMENTS

MTC workloads may comprise tens of thousands to hundreds of thousands of tasks and BoTs [4],
and a typical period may be one year or the whole trace. Our method for identifying proto-MTC
users (users with a pronounced MTC-like workload, who are potential MTC users in the future) in
existing system workloads is based on the identification of users with many submitted tasks
and/or bags-of-tasks in the workload traces taken from real scientific computing
infrastructures. We define an MTC user to be a user that has submitted at least J jobs and at
least B bags-of-tasks. The user part of our definition serves as a coupling between jobs, under
the assumption that a user submits jobs for execution toward an arbitrary but meaningful goal.
The jobs part ensures that we focus on high-volume users; these users are likely to need new
scheduling techniques for good system performance. The bag-of-tasks part ensures that task
submission occurs within a short period of time; this submission pattern raises new challenges
in the area of task scheduling and management. Ideally, it should be possible to use a unique
pair of values for J and B across different systems. To identify MTC users, we first formulate
the identification criterion by selecting values for J and B. If B ≥ 1, we first identify the
BoTs in the trace using the method that we introduced in our previous work [22]: we use the BoT
identification information when it is present in the trace; otherwise, we identify BoTs as
groups of tasks submitted by the same user within short time intervals.

5. RELATED WORKS

In this section, we review related work from three areas: clouds, virtualization, and
performance evaluation. Our work also comprises the first characterization of the MTC component
in existing scientific computing workloads.

Performance evaluation of clouds and virtualized environments. There has been a recent spurt of
research activity in assessing the performance of virtualized resources, in cloud computing
environments and in general. In contrast to this body of previous work, ours is different in
scope: we perform extensive measurements using general purpose and high-performance computing
benchmarks to compare several clouds, and we compare clouds with other environments based on
real long-term scientific computing traces. Our study is also much broader in size: we perform
in this work an evaluation using over 25 individual benchmarks on over 10 cloud instance types,
which is an order of magnitude larger than previous work.

Performance studies using general purpose benchmarks have shown that the overhead incurred by
virtualization can be below 5 percent for computation and below 15 percent for networking.
Similarly, the performance loss due to virtualization for parallel I/O and web server I/O has
been shown to be below 30 and 10 percent [64], respectively. In contrast to these, our work
shows that virtualized resources obtained from public clouds can have a much lower
performance than the theoretical peak. Recently, much interest in the use of virtualization has
been shown by the HPC community, spurred by two seminal studies that find virtualization
overhead to be negligible for compute-intensive HPC kernels and applications such as the NAS
NPB benchmarks; other studies have investigated virtualization performance for specific HPC
application domains, or for mixtures of Web and HPC workloads running on virtualized resources.
Our work differs significantly from these previous approaches in target and in size. For
clouds, the study of performance and cost of executing a scientific workflow, Montage, in
clouds investigates cost-performance trade-offs between clouds and grids, but uses a single
application on a single cloud, and the application itself is remote from the mainstream HPC
scientific community.

6. CONCLUSIONS

With the emergence of cloud computing as a paradigm in which scientific computing can be done
exclusively on resources leased only when needed from big data centers, scientists are faced
with a new platform option. However, the initial target workloads of clouds do not match the
characteristics of MTC-based scientific computing workloads. Thus, in this paper we seek to
answer the research question: Is the performance of clouds sufficient for MTC-based scientific
computing? To this end, we first investigate the presence of an MTC component in existing
scientific computing workloads, and find that this presence is significant both in number of
jobs and in resources consumed. Then, we perform an empirical performance evaluation of four
public computing clouds, including Amazon EC2, one of the largest commercial clouds currently
in production. Our main finding here is that the compute performance of the tested clouds is
low. Last, we compare the performance and cost of clouds with those of scientific computing
alternatives such as grids and parallel production infrastructures. We find that, while current
cloud computing services are insufficient for scientific computing at large, they may still be
a good solution for the scientists who need resources instantly and temporarily.

Table caption: Relative Strategy Performance: Resource Bulk Allocation (S2) versus Resource
Acquisition and Release per Job (S1). Only performance differences above 5 percent are shown.

7. REFERENCES

[1] Amazon, Inc., “Amazon Elastic Compute Cloud (Amazon EC2),” http://aws.amazon.com/ec2/,
Dec. 2008.

[2] GoGrid, “GoGrid Cloud-Server Hosting,” http://www.gogrid.com, Dec. 2008.

[3] A. Iosup, O.O. Sonmez, S. Anoep, and D.H.J. Epema, “The Performance of Bags-of-Tasks in
Large-Scale Distributed Systems,” Proc. ACM Int’l Symp. High Performance Distributed Computing
(HPDC), pp. 97-108, 2008.

[4] I. Raicu, Z. Zhang, M. Wilde, I.T. Foster, P.H. Beckman, K. Iskra, and B. Clifford, “Toward
Loosely Coupled Programming on Petascale Systems,” Proc. ACM Conf. Supercomputing (SC), p. 22,
2008.

[5] A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, and L. Wolters, “How Are Real Grids Used? The
Analysis of Four Grid Traces and Its Implications,” Proc. IEEE Seventh Int’l Conf. Grid
Computing, pp. 262-269, 2006.



[6] U. Lublin and D.G. Feitelson, “Workload on Parallel Supercomputers: Modeling
Characteristics of Rigid Jobs,” J. Parallel
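
As an illustration of the identification criterion from Section 4 (a user counts as an MTC user after submitting at least J jobs and at least B bags-of-tasks), the following is a minimal Python sketch, not the authors' implementation. The record layout (`Job`), the function names, the 120-second gap used to approximate "short time intervals", and the rule that a group needs at least two jobs to count as a BoT are all assumptions made for this example; the text does not fix any of these values.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Job(NamedTuple):
    """One trace record (hypothetical layout, chosen for this sketch)."""
    user: str
    submit_time: float            # seconds since the start of the trace
    bot_id: Optional[str] = None  # BoT label, when the trace records one


def identify_bots(jobs, max_gap=120.0, min_bot_size=2):
    """Group each user's jobs into bags-of-tasks (BoTs).

    Jobs that carry a bot_id are grouped by that label. Unlabelled jobs of
    the same user are chained into one group while consecutive submissions
    are at most max_gap seconds apart (an assumed threshold). Groups smaller
    than min_bot_size are not counted as BoTs (also an assumption).
    Returns a dict mapping user -> list of BoTs, each BoT a list of jobs.
    """
    labelled = defaultdict(lambda: defaultdict(list))
    unlabelled = defaultdict(list)
    for job in jobs:
        if job.bot_id is not None:
            labelled[job.user][job.bot_id].append(job)
        else:
            unlabelled[job.user].append(job)

    groups_by_user = defaultdict(list)
    for user, by_label in labelled.items():
        groups_by_user[user].extend(by_label.values())
    for user, user_jobs in unlabelled.items():
        user_jobs.sort(key=lambda j: j.submit_time)
        current = [user_jobs[0]]
        for job in user_jobs[1:]:
            if job.submit_time - current[-1].submit_time <= max_gap:
                current.append(job)
            else:
                groups_by_user[user].append(current)
                current = [job]
        groups_by_user[user].append(current)

    return {
        user: [g for g in groups if len(g) >= min_bot_size]
        for user, groups in groups_by_user.items()
    }


def find_mtc_users(jobs, min_jobs, min_bots, max_gap=120.0):
    """Return users with at least min_jobs jobs (J) and min_bots BoTs (B)."""
    job_counts = defaultdict(int)
    for job in jobs:
        job_counts[job.user] += 1
    # BoTs only need to be identified when the criterion asks for them.
    bots = identify_bots(jobs, max_gap) if min_bots >= 1 else {}
    return {
        user
        for user, count in job_counts.items()
        if count >= min_jobs and len(bots.get(user, [])) >= min_bots
    }
```

Calling `find_mtc_users(trace, min_jobs=J, min_bots=B)` on a list of `Job` records returns the set of user names that satisfy both thresholds. Raising `max_gap` merges more submissions into the same BoT, so the number of BoTs per user, and with it the set of MTC users, depends on how "short time intervals" is interpreted.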



