
IEEE Projects 2012-2013 Grid Computing



            Elysium Technologies Private Limited
            Approved by ISO 9001:2008 and AICTE for SKP Training
            Singapore | Madurai | Trichy | Coimbatore | Cochin | Kollam | Chennai

        IEEE FINAL YEAR PROJECTS 2012 – 2013
                             Grid Computing
Corporate Office: Madurai
    227-230, Church road, Anna nagar, Madurai – 625 020.
    0452 – 4390702, 4392702, +9199447933980

Branch Office: Trichy
    15, III Floor, SI Towers, Melapudur main road, Trichy – 620 001.
    0431 – 4002234, +919790464324.

Branch Office: Coimbatore
    577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641 002.

Branch Office: Kollam
    Surya Complex, Vendor junction, Kollam – 691 010, Kerala.
    0474 – 2723622, +919446505482.

Branch Office: Cochin
    4th Floor, Anjali Complex, near south over bridge, Valanjambalam,
    Cochin – 682 016, Kerala.
    0484 – 6006002, +917736004002.

     IEEE Final Year Projects 2012 | Student Projects | Grid Computing Projects

                                         GRID COMPUTING                                                    2012 - 2013
EGC 9301  A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments

        In this paper, we present our design of a Processing Element (PE)-aware MapReduce-based framework, Pamar. Pamar is
        designed to support distributed computing on clusters where node PE configurations are asymmetric across different
        nodes. Pamar's main goal is to allow users to seamlessly utilize different kinds of processing elements (e.g., CPUs or
        GPUs) collaboratively for large-scale data processing. As a proof of concept, we have incorporated our designs into
        the Hadoop framework and tested it on cluster environments with asymmetric node PE configurations. We
        demonstrate Pamar's ability to identify the PEs available on each node and match-make user jobs with nodes based on job
        PE requirements. Pamar allows users to easily parallelize applications across large datasets and at the same time
        utilizes different PEs to process different classes of functions efficiently. The experiments show improvement in job
        queue completion time with Pamar on clusters with asymmetric nodes as compared to clusters with symmetric nodes.
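
        The match-making idea described above amounts to a simple filter: a job declares the PEs it
        needs, and only nodes whose configuration covers that set are candidates. The sketch below is
        illustrative only; the function and node names are invented, not Pamar's actual API.

```python
# Hypothetical sketch of PE-aware match-making (not Pamar's real interface).

def matching_nodes(job_pes, cluster):
    """Return nodes whose set of processing elements covers the job's needs."""
    return [node for node, pes in cluster.items() if job_pes <= pes]

cluster = {
    "node1": {"CPU"},
    "node2": {"CPU", "GPU"},
    "node3": {"CPU", "GPU"},
}

print(matching_nodes({"CPU"}, cluster))          # every node qualifies
print(matching_nodes({"CPU", "GPU"}, cluster))   # only the GPU-equipped nodes
```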

        A Multi-objective Approach for Workflow Scheduling in Heterogeneous Environments

        Traditional scheduling research usually targets makespan as the only optimization goal, while several isolated efforts
        have addressed the problem by considering at most two objectives. In this paper we propose a general framework and
        heuristic algorithm for multi-objective static scheduling of scientific workflows in heterogeneous computing
        environments. The algorithm uses constraints specified by the user for each objective and approximates the optimal
        solution by applying a double strategy: maximizing the distance to the constraint vector for dominant solutions and
        minimizing it otherwise. We analyze and classify different objectives with respect to their impact on the optimization
        process and present a four-objective case study comprising makespan, economic cost, energy consumption, and
        reliability. We implemented the algorithm as part of the ASKALON environment for Grid and Cloud computing. Results
        for two real-world applications demonstrate that the solutions generated by our algorithm are superior to user-defined
        constraints most of the time. Moreover, the algorithm outperforms a related bi-criteria heuristic and a bi-criteria genetic
        algorithm.
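
        The double strategy above can be made concrete with a toy scoring function: a solution that
        meets every user constraint is rewarded by its distance to the constraint vector, and a
        violating one is penalised by it. This is a minimal sketch under invented numbers; the paper's
        framework may normalise or weight objectives differently.

```python
import math

def score(objectives, constraints):
    """Reward dominant solutions (all constraints met) by their distance to
    the constraint vector; penalise violating solutions by the same distance.
    Illustrative only; normalisation details differ in the real heuristic."""
    dominant = all(o <= c for o, c in zip(objectives, constraints))
    d = math.dist(objectives, constraints)
    return d if dominant else -d

# Four objectives: makespan, economic cost, energy, (1 - reliability).
constraints = (100.0, 50.0, 10.0, 0.05)
print(score((80.0, 40.0, 8.0, 0.02), constraints) > 0)   # meets all constraints
print(score((120.0, 40.0, 8.0, 0.02), constraints) < 0)  # violates makespan
```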

 EGC     A Scalable Parallel Debugging Library with Pluggable Communication Protocols

        Parallel debugging faces challenges in both scalability and efficiency. A number of advanced methods have been


        invented to improve the efficiency of parallel debugging. As the scale of systems increases, these methods rely heavily on
        a scalable communication protocol in order to be utilized in large-scale distributed environments. This paper describes a
        debugging middleware that provides fundamental debugging functions supporting multiple communication protocols.
        Its pluggable architecture allows users to select proper communication protocols as plug-ins for debugging on different
        platforms. It aims to be utilized by various advanced debugging technologies across different computing platforms. The
        performance of this debugging middleware is examined on a Cray XE Supercomputer with 21,760 CPU cores.

         Automating Data-Throttling Analysis for Data-Intensive Workflows

         Data movement between tasks in scientific workflows has received limited attention compared to task execution. Often
         the staging of data between tasks is either assumed or the time delay in data transfer is considered to be negligible
         (compared to task execution). Where data consists of files, such file transfers are accomplished as fast as the network
         links allow, and once transferred, the files are buffered/stored at their destination. Where a task requires multiple files
         to execute (from different tasks), it must, however, remain idle until all files are available. Hence, network bandwidth
         and buffer/storage within a workflow are often not used effectively. We propose an automated workflow structural
         analysis method for Directed Acyclic Graphs (DAGs) which utilises information from previous workflow executions.
         The method obtains data-throttling values for the data transfer to enable network bandwidth and buffer/storage
         capacity to be managed more efficiently. We convert a DAG representation into a Petri net model and analyse the
         resulting graph using an iterative method to compute data-throttling values. Our approach is demonstrated using the
         Montage workflow.

 EGC      Distributed S-Net: Cluster and Grid Computing without the Hassle

        S-Net is a declarative coordination language and component technology primarily aimed at modern multi-core/many-
        core chip architectures. It builds on the concept of stream processing to structure dynamically evolving networks of
        communicating asynchronous components, which themselves are implemented using a conventional language suitable
        for the application domain. We present the design and implementation of Distributed S-Net, a conservative extension of
        S-Net aimed at distributed memory architectures ranging from many-core chip architectures with hierarchical memory
        organisations to more traditional clusters of workstations, supercomputers and grids. Three case studies illustrate how
        to use Distributed S-Net to implement different models of parallel execution. Runtimes obtained on a workstation cluster
        demonstrate how Distributed S-Net allows programmers with little or no background in parallel programming to make
        effective use of distributed memory architectures with minimal programming effort.

 EGC      Business-OWL (BOWL)—A Hierarchical Task Network Ontology for Dynamic Business
          Process Decomposition and Formulation


       Collaborative Business Processes (cBPs) form the backbone of enterprise integration. With the growing reliance on the
       web as a medium of business collaboration, there is an increasing need to quickly and dynamically form cBPs. However,
       current Business-to-Business (B2B) information systems are still static in nature, and are unable to dynamically form
        cBPs based on high-level Business Goals (BGs) and their underlying criteria (e.g., item cost, product name, order
        quantity, etc.). This paper introduces the Business-OWL (BOWL), an ontology rooted in the Web Ontology Language
        (OWL), and modeled as a Hierarchical Task Network (HTN) for the dynamic formation of business processes. An
        ontologized extension and augmentation of the traditional HTN, BOWL describes business processes as a hierarchical
        ontology of decomposable business tasks encompassing all possible decomposition permutations. Through BOWL,
        high-level business goals (e.g., "Buy") can be easily decomposed right down to the lowest-level tasks (e.g., "Send
        Purchase Order"), effectively bridging the gap between high-level business goals and operational-level tasks and
       complementing currently static business process modeling languages. The design of BOWL and a case study
       demonstrating its implementation are also discussed.
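
        The core HTN idea, decomposing a high-level goal like "Buy" down to primitive tasks like
        "Send Purchase Order", can be sketched with a recursive expansion. The task names and the
        two-level network below are invented for illustration and are not BOWL's actual ontology.

```python
# Toy HTN-style decomposition in the spirit of BOWL (hypothetical task names).

HTN = {
    "Buy": ["Select Supplier", "Negotiate", "Order"],
    "Order": ["Send Purchase Order", "Receive Invoice"],
}

def decompose(task, htn):
    """Expand a high-level task into its lowest-level (primitive) tasks."""
    if task not in htn:            # primitive task: no further decomposition
        return [task]
    leaves = []
    for sub in htn[task]:
        leaves.extend(decompose(sub, htn))
    return leaves

print(decompose("Buy", HTN))
# ['Select Supplier', 'Negotiate', 'Send Purchase Order', 'Receive Invoice']
```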

         Energy- and Cost-Efficiency Analysis of ARM-Based Clusters

        The general-purpose computing domain has experienced a strategy shift from scale-up to scale-out in the past decade. In
        this paper, we take a step further and analyze an ARM-processor-based cluster against an Intel x86 workstation, from both
        energy-efficiency and cost-efficiency perspectives. Three applications are selected and evaluated to represent
        diversified workloads: Web server throughput, in-memory database, and video transcoding. Through
        detailed measurements, we observe that the energy-efficiency ratio of the ARM cluster against the Intel
        workstation varies from 2.6-9.5 for the in-memory database, to approximately 1.3 for the Web server application, and 1.21 for video
        transcoding. We also find that for the Intel processor, which adopts dynamic voltage and frequency scaling (DVFS)
        techniques, the power consumption is not linear with the CPU utilization level. The maximum energy saving achievable
        from DVFS is 20%. Finally, by utilizing a monthly cost model of data centers, we conclude that ARM-cluster-based data
        centers are feasible, and are advantageous for computationally lightweight applications, e.g., in-memory databases and
        network-bound Web applications. The cost advantage of the ARM cluster diminishes progressively for computation-
        intensive applications, i.e., the dynamic Web server application and video transcoding, because the number of ARM
        processors needed to provide comparable performance increases.
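
        An energy-efficiency ratio like the "2.6x" figure above is simply useful work per joule on one
        platform divided by the same quantity on the other. The throughput and power numbers below are
        made up purely to show the arithmetic; they are not measurements from the paper.

```python
# Hypothetical numbers illustrating how an energy-efficiency ratio is derived.

def efficiency(throughput, power_watts):
    """Requests served per second per watt of power drawn."""
    return throughput / power_watts

arm_cluster = efficiency(throughput=900.0, power_watts=30.0)    # 30 req/s/W
x86_box     = efficiency(throughput=2300.0, power_watts=200.0)  # 11.5 req/s/W

print(round(arm_cluster / x86_box, 2))  # ARM-vs-x86 efficiency ratio, ~2.61
```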

         Evaluating Dynamics and Bottlenecks of Memory Collaboration in Cluster Systems

       Modern distributed applications are embedding an increasing degree of dynamism, from dynamic supply-chain
       management, enterprise federations, and virtual collaborations to dynamic resource acquisitions and service
       interactions across organizations. Such dynamism leads to new challenges in security and dependability. Collaborating
       services in a system with a Service-Oriented Architecture (SOA) may belong to different security realms but often need
       to be engaged dynamically at runtime. If their security realms do not have a direct cross-realm authentication


       relationship, it is technically difficult to enable any secure collaboration between the services. A potential solution to
       this would be to locate intermediate realms at runtime, which serve as an authentication path between the two separate
       realms. However, the process of generating an authentication path for two distributed services can be highly
       complicated. It could involve a large number of extra operations for credential conversion and require a long chain of
       invocations to intermediate services. In this paper, we address this problem by designing and implementing a new
       cross-realm authentication protocol for dynamic service interactions, based on the notion of service-oriented multiparty
       business sessions. Our protocol requires neither credential conversion nor establishment of any authentication path
       between the participating services in a business session. The correctness of the protocol is formally analyzed and
       proven, and an empirical study is performed using two production-quality Grid systems, Globus 4 and CROWN. The
        experimental results indicate that the proposed protocol and its implementation have a sound level of scalability and
        impose only a limited degree of performance overhead, comparable, for example, with the security-related
        overheads in Globus 4.

         Fine-Grained Access Control in the Chirp Distributed File System

       Although the distributed file system is a widely used technology in local area networks, it has seen less use on the wide
       area networks that connect clusters, clouds, and grids. One reason for this is access control: existing file system
       technologies require either the client machine to be fully trusted, or the client process to hold a high value user
       credential, neither of which is practical in large scale systems. To address this problem, we have designed a system for
       fine-grained access control which dramatically reduces the amount of trust required of a batch job accessing a
       distributed file system. We have implemented this system in the context of the Chirp user-level distributed file system
       used in clusters, clouds, and grids, but the concepts can be applied to almost any other storage system. The system is
       evaluated to show that performance and scalability are similar to other authentication methods. The paper concludes
       with a discussion of integrating the authentication system into workflow systems.

         Improving Grid Resource Usage: Metrics for Measuring Fragmentation

       In highly heterogeneous and distributed systems, like Grids, it is rather difficult to provide QoS to the users. As
       reservations of resources may not always be possible, another possible way of enhancing the perceived QoS is by
       performing meta-scheduling of jobs in advance, where jobs are scheduled some time before they are actually executed.
        Thanks to this, it is more likely that the appropriate resources are available to execute the job when needed. When using
       this type of scheduling, fragmentation appears and may become the cause of poor resource utilization. Because of that,
       some techniques are needed to perform rescheduling of tasks that may reduce the existing fragmentation. To this end,
       knowing the status of the system is a must. However, how to measure and quantify the existing fragmentation in a Grid
       system is a challenging task. This paper proposes different metrics aiming at measuring that fragmentation not only at
       resource level but also taking into account all the resources of the Grid environment as a whole. Finally, a performance


        evaluation of the proposed metrics over a real test bed is presented.
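
        One common way to quantify fragmentation on a single resource's schedule, used here purely as
        an illustration (the paper proposes its own metrics, at both the resource and whole-Grid
        level), is the fraction of free capacity that does not sit in the largest contiguous free gap.

```python
# Illustrative fragmentation metric over a resource's free time slots.

def fragmentation(free_gaps):
    """free_gaps: durations of contiguous free slots in a resource's schedule.
    Returns 0.0 for one unbroken gap, approaching 1.0 as free time is
    scattered into many small gaps that large jobs cannot use."""
    total = sum(free_gaps)
    if total == 0:
        return 0.0
    return 1.0 - max(free_gaps) / total

print(fragmentation([100]))             # 0.0: one large gap, no fragmentation
print(fragmentation([25, 25, 25, 25]))  # 0.75: free time badly scattered
```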

EGC     Investigation of data locality and fairness in MapReduce

       In data-intensive computing, MapReduce is an important tool that allows users to process large amounts of data easily.
       Its data locality aware scheduling strategy exploits the locality of data accessing to minimize data movement and thus
        reduce network traffic. In this paper, we first analyze state-of-the-art MapReduce scheduling algorithms and
        demonstrate that optimal scheduling is not guaranteed. We then mathematically reformulate the scheduling
        problem by using a cost matrix to capture the cost of data staging and propose an algorithm, lsap-sched, that yields
        optimal data locality. In addition, we integrate fairness and data locality into a unified algorithm, lsap-fair-sched, in which
        users can easily adjust the tradeoff between data locality and fairness. Finally, extensive simulation experiments are
       conducted to show that our algorithms can improve the ratio of data local tasks by up to 14%, reduce data movement
       cost by up to 90%, and balance fairness and data locality effectively.
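
        The cost-matrix formulation above is a Linear Sum Assignment Problem (LSAP): pick one slot per
        task so the total staging cost is minimal. A brute-force solver over permutations is enough to
        show the idea on a tiny instance; the matrix values below are invented, and real solvers use
        the Hungarian algorithm rather than enumeration.

```python
# Brute-force LSAP on a tiny, invented cost matrix of data-staging costs.
from itertools import permutations

def lsap(cost):
    """Return (assignment, total_cost) minimising sum of cost[i][perm[i]]."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))

# cost[task][slot]: staging cost; 0 means the slot's node already holds the data.
cost = [
    [0, 4, 5],
    [3, 0, 6],
    [7, 2, 0],
]
print(lsap(cost))  # ((0, 1, 2), 0): every task runs fully data-locally
```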

EGC    Lowering Inter-datacenter Bandwidth Costs via Bulk Data Scheduling

       Cloud service providers (CSP) of today operate multiple data centers, over which they provide resilient infrastructure,
       data storage and compute services. The links between data centers have very high capacity, and are typically purchased
        by the CSPs using established billing practices, such as 95th-percentile billing or average-usage billing. These links are
        used to serve both client traffic and CSP-specific bulk data traffic, such as backup jobs. Past studies have
        shown a diurnal pattern of traffic over such links. However, CSPs pay for the peak bandwidth, which implies that they
        are under-utilizing the capacity for which they have paid. We propose a scheduling framework that considers various
        classes of jobs that are encountered over such links, and propose GRESE, an algorithm that attempts to minimize
       overall bandwidth costs to the CSP, by leveraging the flexible nature of the deadlines of these bulk data jobs. We
       demonstrate the problem is not a simple extension of any well-known scheduling problems, and show how the GRESE
       algorithm is effective in curtailing CSP bandwidth costs.
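
        Under 95th-percentile billing, the provider samples link usage (typically every five minutes),
        discards the top 5% of samples, and bills for the highest remaining one, which is why shifting
        deadline-flexible bulk jobs off the peak lowers the bill. A minimal nearest-rank sketch, with
        invented sample values:

```python
# Nearest-rank 95th percentile over invented bandwidth samples (Mb/s).

def percentile_95(samples):
    """Billable rate under 95th-percentile billing (simple nearest-rank rule)."""
    ordered = sorted(samples)
    rank = max(0, int(len(ordered) * 0.95) - 1)
    return ordered[rank]

# 20 samples: a steady 10 Mb/s with one short burst to 100 Mb/s.
samples = [10] * 19 + [100]
print(percentile_95(samples))  # 10: the burst falls in the discarded top 5%
```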

EGC    Maestro: Replica-Aware Map Scheduling for MapReduce

       MapReduce has emerged as a leading programming model for data-intensive computing. Many recent research efforts
       have focused on improving the performance of the distributed frameworks supporting this model. Many optimizations
       are network-oriented and most of them mainly address the data shuffling stage of MapReduce. Our studies with Hadoop
       demonstrate that, apart from the shuffling phase, another source of excessive network traffic is the high number of map
       task executions which process remote data. That leads to an excessive number of useless speculative executions of
       map tasks and to an unbalanced execution of map tasks across different machines. All these factors produce a
       noticeable performance degradation. We propose a novel scheduling algorithm for map tasks, named Maestro, to


        improve the overall performance of the MapReduce computation. Maestro schedules the map tasks in two waves: first, it
        fills the empty slots of each data node based on the number of hosted map tasks and on the replication scheme for their
        input data; second, runtime scheduling takes into account the probability of scheduling a map task on a given machine
        depending on the replicas of the task's input data. These two waves lead to a higher locality in the execution of map
        tasks and to a more balanced intermediate data distribution for the shuffling phase. In our experiments on a 100-node
        cluster, Maestro achieves around 95% local map executions, reduces speculative map tasks by 80% and results in an
        improvement of up to 34% in the execution time.
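
        The spirit of Maestro's first wave can be sketched as a replica-aware preference rule: when
        filling a node's empty slot, prefer the unscheduled map task whose input data lives on that
        node and has the fewest replica hosts overall, since such tasks are the hardest to place
        locally later. All names below are hypothetical, not Maestro's actual interface.

```python
# Hypothetical replica-aware slot-filling rule in the spirit of Maestro.

def pick_task(node, pending, replicas):
    """replicas[task] = set of nodes holding that task's input data.
    Pick the local task with the fewest replicas, or None if none is local."""
    local = [t for t in pending if node in replicas[t]]
    if not local:
        return None
    return min(local, key=lambda t: len(replicas[t]))

replicas = {
    "t1": {"n1", "n2", "n3"},
    "t2": {"n1"},            # replicated only on n1: schedule it there first
}
print(pick_task("n1", ["t1", "t2"], replicas))  # 't2'
```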

 EGC      MARLA: MapReduce for Heterogeneous Clusters

        MapReduce has gradually become the framework of choice for "big data". The MapReduce model allows for efficient and
        swift processing of large scale data with a cluster of compute nodes. However, the efficiency here comes at a price. The
        performance of widely used MapReduce implementations such as Hadoop suffers in heterogeneous and load-
         imbalanced clusters. In this paper we show that the disparity in performance between homogeneous and
         heterogeneous clusters is high. Subsequently, we present MARLA, a MapReduce framework capable of performing well not only in
        homogeneous settings, but also when the cluster exhibits heterogeneous properties. We address the problems
        associated with existing MapReduce implementations affecting cluster heterogeneity, and subsequently present through
        MARLA the components and trade-offs necessary for better MapReduce performance in heterogeneous cluster and
        cloud environments. We quantify the performance gains exhibited by our approach against Apache Hadoop and
        MARIANE in data intensive and compute intensive applications.

EGC     Self-Healing of Operational Workflow Incidents on Distributed Computing

         Distributed computing infrastructures are commonly used through scientific gateways, but operating these gateways
         requires significant human intervention to handle operational incidents. This paper presents a self-healing process that
         quantifies incident degrees of workflow activities from metrics measuring long-tail effect, application efficiency, data
         transfer issues, and site-specific problems. These metrics are simple enough to be computed online, and they make few
         assumptions about the application or resource characteristics. Incidents are classified into levels and associated with sets of
        healing actions that are selected based on association rules modeling correlations between incident levels. The healing
        process is parametrized on real application traces acquired in production on the European Grid Infrastructure.
        Implementation and experimental results obtained in the Virtual Imaging Platform show that the proposed method
        speeds up execution up to a factor of 4 and properly detects unrecoverable errors.
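
        Classifying an incident degree into levels and looking up healing actions can be sketched as a
        threshold table. The thresholds, level names, and actions below are invented for illustration;
        the paper derives its rules from production traces.

```python
# Hypothetical incident-level thresholds and healing actions.

LEVELS = [(0.8, "critical"), (0.5, "major"), (0.2, "minor")]
ACTIONS = {"critical": "kill and resubmit", "major": "blacklist site",
           "minor": "monitor"}

def classify(incident_degree):
    """Map an incident degree in [0, 1] to a level and a healing action."""
    for threshold, level in LEVELS:
        if incident_degree >= threshold:
            return level, ACTIONS[level]
    return "nominal", "no action"

print(classify(0.9))   # ('critical', 'kill and resubmit')
print(classify(0.1))   # ('nominal', 'no action')
```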

EGC     Using Model Checking to Analyze the System Behavior of the LHC Production Grid


        DIRAC (Distributed Infrastructure with Remote Agent Control) is the grid solution designed to support production
        activities as well as user data analysis for the Large Hadron Collider "beauty" experiment. It consists of cooperating
        distributed services and a plethora of light-weight agents delivering the workload to the grid resources. Services accept
        requests from agents and running jobs, while agents actively fulfill specific goals. Services maintain database back-
        ends to store dynamic state information of entities such as jobs, queues, or requests for data transfer. Agents
        continuously check for changes in the service states, and react to these accordingly. The logic of each agent is rather
        simple, the main source of complexity lies in their cooperation. These agents run concurrently, and communicate using
        the services' databases as a shared memory for synchronizing the state transitions. Despite the effort invested in
        making DIRAC reliable, entities occasionally get into inconsistent states. Tracing and fixing such behaviors is difficult,
        given the inherent parallelism among the distributed components and the size of the implementation. In this paper we
         present an analysis of DIRAC with mCRL2, a process algebra with data. We have reverse engineered two critical and
         related DIRAC subsystems, and subsequently modeled their behavior with the mCRL2 toolset. This enabled us to easily
         locate race conditions and livelocks, which were confirmed to occur in the real system. We further formalized and
        verified several behavioral properties of the two modeled subsystems.

 EGC    Using Rules and Data Dependencies for the Recovery of Concurrent Processes in a
        Service-Oriented Environment

        This paper presents a recovery algorithm for service execution failure in the context of concurrent process execution.
        The recovery algorithm was specifically designed to support a rule-based approach to user-defined correctness in
        execution environments that support a relaxed form of isolation for service execution. Data dependencies are analyzed
        from data changes that are extracted from database transaction log files and generated as a stream of deltas from Delta-
        Enabled Grid Services. The deltas are merged by time stamp to create a global schedule of data changes that, together
        with the process execution context, are used to identify processes that are read and write dependent on failed
        processes. Process interference rules are used to express semantic conditions that determine if a process that is
        dependent on a failed process should recover or continue execution. The recovery algorithm integrates a service
        composition model that supports nested processes, compensation, contingency, and rollback procedures with the data
        dependency analysis process and rule execution procedure to provide a new approach for addressing consistency
        among concurrent processes that access shared data. We present the recovery algorithm and also discuss our results
        with simulation and evaluation of the concurrent process recovery algorithm.
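
        Merging per-service delta streams into a global schedule ordered by timestamp is a k-way merge
        of sorted streams. A minimal sketch with invented stream contents (each delta is a
        (timestamp, service, operation) tuple):

```python
# k-way merge of time-ordered delta streams into one global schedule.
import heapq

stream_a = [(1, "svc_a", "write x"), (4, "svc_a", "write y")]
stream_b = [(2, "svc_b", "read x"), (3, "svc_b", "write z")]

# heapq.merge assumes each input stream is already sorted, as deltas
# emitted in transaction-log order would be.
global_schedule = list(heapq.merge(stream_a, stream_b))
print([ts for ts, _, _ in global_schedule])  # [1, 2, 3, 4]
```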

EGC     WSCOM: Online Task Scheduling with Data Transfers

         This paper considers the online problem of task scheduling with communication. No information on tasks and
         communications is available in advance, except the DAG of the task topology. This situation is typically encountered
         when scheduling DAGs of tasks corresponding to Makefile executions. To tackle this problem, we introduce a new
         variation of the work-stealing algorithm: WSCOM. Its variants take advantage of knowledge of the DAG


topology to cluster communicating tasks together and reduce the total number of communications. Several variants are
designed to overlap communication or optimize the graph decomposition. Performance is evaluated by simulation and
our algorithms are compared with off-line list-scheduling algorithms and classical work-stealing from the literature.
Simulations are executed on both random graphs and a new trace archive of Makefile DAGs. These experiments validate
the different design choices taken. In particular we show that WSCOM is able to achieve performance close to off-line
algorithms in most cases, and is even able to achieve better performance in the event of congestion, due to less data
transfer. Moreover WSCOM can achieve the same high performance as classical work-stealing with up to ten times
less bandwidth.

