GRid_Web
Shared by: liwenting, posted 7/21/2012
Cluster / Grid with Web and
Semantic Services
Dr G Sudha Sadasivam
Professor, CSE
PSG College of Technology
Coimbatore- 641 004
Agenda
• Web Services
• SOA
• Semantics
• Grid Architecture
• 3rd Generation Grid Architecture
• Semantic Grid
• Cluster Architecture- Hadoop
• Amazon Web Services
• Work at Grid and Cloud Computing Lab -
PSGCT
ORGANISING A BIRTHDAY PARTY????
PRODUCTS AND SERVICES – A TRADITIONAL WAY
OF DISCOVERING AND ACCESSING
INFORMATION SERVICES
1. Web Service
A service is a set of actions that form a coherent whole from the point of view of
service providers and service requesters - Arranging for a birthday party.
Web services provide a standard means of interoperating between different
software applications, running on a variety of platforms and/or frameworks in a
transparent and loosely coupled manner
A Web service is a software system designed to support interoperable
machine-to-machine interaction. It
• has an interface described in a machine-processable format (WSDL)
• communicates using standard SOAP messages, typically over HTTP
• uses an XML serialization in conjunction with other Web-related standards
• can be published in a UDDI registry
• is identified by a URI
Web service is an entity that can be:
• Described (using WSDL)
• Published
• Discovered
• Invoked by a client
W3C technology standardization process
Web Service Interactions
COMPONENTS
• A Web service is an abstract notion that is implemented by a
concrete agent.
• Elements
– The provider entity is the person or organization that provides an
appropriate agent to implement a particular service.
– A requester entity is a person or organization that wishes to make use of
a provider entity's Web service.
– Registry – to register the services
• Web Service Discovery:
– Before message exchange, the requester entity and the provider entity
must first agree on both the semantics and the mechanics of the
message exchange
– The service description (WSD) (message formats, datatypes, transport
protocols, and transport serialization formats) represents a contract
governing the mechanics of interacting with a particular service.
– The semantics represents a contract governing the meaning
(consequence and purpose) of that interaction.
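The publish / discover / invoke interaction between provider, registry and requester can be sketched in a few lines. This is only an in-memory illustration: `ServiceRegistry`, `ServiceDescription` and `book_party` are hypothetical names standing in for UDDI, WSDL and a provider agent, not real Web service APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    """Stand-in for a WSDL document: a name, search keywords and an endpoint."""
    name: str
    keywords: List[str]
    endpoint: Callable[..., str]


class ServiceRegistry:
    """Toy in-memory registry playing the role of UDDI (assumption)."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceDescription] = {}

    def publish(self, description: ServiceDescription) -> None:
        # Provider entity registers its service description.
        self._services[description.name] = description

    def discover(self, keyword: str) -> List[ServiceDescription]:
        # Requester entity looks up services by keyword.
        return [d for d in self._services.values() if keyword in d.keywords]


def book_party(guests: int) -> str:
    # Hypothetical provider-side agent implementing the service.
    return f"Party booked for {guests} guests"


registry = ServiceRegistry()
registry.publish(ServiceDescription("PartyPlanner", ["birthday", "party"], book_party))

matches = registry.discover("birthday")
result = matches[0].endpoint(guests=20)  # invoke the discovered service
print(result)
```

The requester never imports `book_party` directly; it only depends on what the registry returns, which is the loose coupling the slides describe.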
2. SOA
• Aim: Alignment of Business needs with IT
• Architectural style of building enterprise solutions based on
services
• SOA is a blueprint that governs creation, deployment,
execution and management of reusable business services.
• WSA is an instance of SOA (Architecture – independent of
tech.)
• Services provide independent, loosely coupled, transparent,
composable invocation of tasks in a standard way.
• SOA separates functions into distinct units (services),
which can be distributed over a network and can be
combined and reused to create business applications.
These services communicate with each other by passing
data from one service to another, or by coordinating an
activity between two or more services.
• Guiding principles – Reusability, Open standards
Alignment of Business needs with IT
Requirements mapping to architectural principles
Guiding principle -> Realization
• Ability to model and execute the business processes -> Business process modeling and BPEL4WS
• Common shared services across various applications -> Service Oriented Architecture
• Facilitate integration between airlines' services -> Enterprise Service Bus
• Consistent UI for all integrating airlines' services -> Portal Framework
• Independently scale the components of the architecture -> Decoupling of the layers
Architecture Style Employed: SOA Mediator Pattern with ESB Broker variation
• Services created using an SOA and provided by an
organisation’s IT should directly support the services that
the organisation provides to its customers. (BP – IT)
[Figure: Service Oriented architecture]
• SO business delivers services to its customers: human-mediated service delivery, self-service, system-to-system service
• SOA is a blueprint that governs creation, deployment, execution and management of reusable business services
• Services are exposed through contracts over new, composite and legacy systems
• It aligns Business and Technology
SOA roles
• Business Role: SOA is viewed as a set of services
that a business wants to expose to customers and
clients.
• Architectural Role: SOA is an architectural style
which requires a service provider, requestor and a
service description. It provides services that foster
modularity, encapsulation, loose coupling, separation
of concerns, reuse, composability and single
implementation.
• Implementation Role: SOA is a complete
programming model (process) with standards, tools,
methods and techniques, technologies.
SOA suite
• Model and capture business processes and policies
• Develop, connect and bind services to build composite services and applications
• Integrate the services using ESB and orchestrate the services into BP
• Deploy composite applications and perform service level management
• Apply runtime policies to services and govern them
• Activity monitoring to gain real-time information on BP
Service
• A service is a manager entity that consists of a collection of components
that work together to deliver the business function (currency
conversion/airline reservations)
• A service maps to a business function but a component maps to business
entities and the business rules that operate on them.
• Bank teller application
– components - loan component, savings bank component (with
withdrawal / deposit), account manager (to create new accounts).
– Service - the interfaces of all components (group) can be composed
and exposed as services - creation of new accounts, withdrawal and
deposit services and loan service.
[Figure: SERVICES support BUSINESS GOALS, have a SERVICE DESCRIPTION, and can be combined through CHOREOGRAPHY and DYNAMIC RECONFIGURATION]
UI, Business processes, Service Layer,
Component Layer, Object Layer
• Presentation – portal for aggregation of contents to users
• Business process layer – automation logic, orchestration of services
• Service layer – collection of units of work (interfaces), processing logic
• Component layer – operations that are units of work, SLA
• Object layer / legacy – messages for communication (operational)
Terms in SOA
• Services
• Service provider
• Service consumer (or service requestor)
• Service locator or service registry
• Service broker – passes service requests to one or
more service providers.
SOA LIFE CYCLE
Driven by business drivers; incremental and iterative
• Expose – creation of services from existing / new components
• Compose – combine existing services
• Consume – use services
Consumer view: service identification, service categorisation, service exposure, choreography, QoS
Provider view: component identification, component specification, service realisation, service management, standards implementation
Advantages
• standardisation
• Faster time to market
• Operational efficiency and adaptability
• Agility to collaborate
• Continuous improvement
• Aligns business to IT
• Ease of introducing new technologies
• Return on Investment (ROI)
• Vendor diversity
• Services – encapsulation, loose coupling, contract,
reuse, composability, autonomous, dynamic,
higher granularity
SERVICE ORIENTED ARCHITECTURE
Business Process
Service
Service Registry
Management
Transaction
Security
Service Description
Policy
Service Communication
Protocol (ESB)
Transport layer
Problems in Web services (Point – Point)
• Service consumers need to be modified whenever the service
provider interface changes. (dynamic)
• Every consumer should have a suitable protocol adapter for each
provider it is connected to. (interoperability)
ESB
• ESB acts as a mediator that transforms, routes, notifies and augments
information.
• It provides virtualization of the enterprise resources.
• The Enterprise Service Bus is an enterprise-class messaging bus.
• It has the following facilities:
messaging infrastructure
message transformation facility between consumer and
provider
Content-based routing between service consumers and
providers.
Capability to convert transport protocols between
consumer and provider.
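The mediation facilities listed above (message transformation and content-based routing) can be illustrated with a toy bus. This is a minimal sketch, not an ESB product API: `MiniBus`, `rename_amount` and the two provider functions are hypothetical, and the `amt` / `amount` field names are assumed for the example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]


class MiniBus:
    """Toy mediator: picks a provider by inspecting the message content."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Callable[[Message], bool],
                                 Callable[[Message], str]]] = []

    def add_route(self, matches: Callable[[Message], bool],
                  provider: Callable[[Message], str]) -> None:
        self._routes.append((matches, provider))

    def send(self, message: Message) -> str:
        # Content-based routing: the first route whose predicate matches wins.
        for matches, provider in self._routes:
            if matches(message):
                return provider(message)
        raise LookupError("no provider registered for this message")


def rename_amount(message: Message) -> Message:
    """Message transformation: consumer sends 'amt', provider expects 'amount'."""
    transformed = dict(message)
    transformed["amount"] = transformed.pop("amt")
    return transformed


def loan_service(message: Message) -> str:
    return f"loan request for {message['amount']}"


def deposit_service(message: Message) -> str:
    return f"deposit of {message['amount']}"


bus = MiniBus()
bus.add_route(lambda m: m["type"] == "loan",
              lambda m: loan_service(rename_amount(m)))
bus.add_route(lambda m: m["type"] == "deposit",
              lambda m: deposit_service(rename_amount(m)))

reply = bus.send({"type": "deposit", "amt": 250})
print(reply)
```

If a provider changes its message format, only the transformation registered on the bus changes; consumers keep sending the same message.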
SOA based Web services
Business Process (BPEL)
Management (WSManageability)
Transaction (WSTransaction)
Service Registry (UDDI)
Service
Security (WSSecurity)
Policy (WSPolicy)
Service Description (XML, WSDL)
Service Communication Protocol
(SOAP)
Transport layer (HTTP, JMS, SMTP)
[Figure: SAHANA architecture. Responders (office systems, laptop/PDA/cell, web client) reach the PRESENTATION / UI layer through wired, mobile and Internet channels. The BUSINESS PROCESSES layer covers missing person, organisation, camps, request management, shelter and mobile registration, with search and match procedures, DDoS protection and load balancing.]
BUSINESS SERVICES OFFERED BY SERVER GRID
• Missing person’s registry with efficient search
• Organisation registry with efficient match and
volunteer coordination
• Camps registry
• Request management registry with inventory
management and optimisation – search
• Shelter registry
• Messaging alerts
• Damages registry
• Grid management module to manage
coordination efforts among districts and relief
organisations
• Bulletin board – user area
SOA – screen shots
1. Organisation Registry
• New Organization Registration with the
System
• Maintaining details about each
organization with unique ID
• Updating Organization’s services
DESCRIPTION
• When an Organization wants to provide a service, it
must provide the Organization name, city and branch to
the system
• By default, every Organization that registers for the
first time has to provide a single service
• On successful registration, an automatically
generated Organization ID will be displayed to the
Organization authority
• To update the services provided, both the Organization
ID and password are validated
• The various services are displayed in a form, from
which the Service provider selects its
additional services
NEW ORGANIZATION REGISTRATION:
[Figure: the SERVICE PROVIDER sends ORG NAME, CITY, BRANCH and SERVICE to the REGISTRATION SYSTEM, which performs an ORGANIZATION DB UPDATION and returns the ORGANIZATION ID]
ORGANIZATION’S SERVICE UPDATION
[Figure: the SERVICE PROVIDER sends ORG ID AND PASSWORD to the REGISTRATION SYSTEM; after RECORD RETRIEVAL AND VALIDATION it returns the VALIDATION RESULT and the SERVICES LIST; the SELECTED SERVICE triggers a SERVICE UPDATION and the SERVICE INFORMATION UPDATED FORM is returned]
BUSINESS PROCESSES
• Service Provider registers to the system
• Service provider login validation
• Services updating
FORMS
3 X Forms
• LOGIN XFORM
• ORGANIZATION DETAILS XFORM
• SERVICE UPDATED XFORM
BUSINESS
PROCESS
LOG IN X FORM
ORGANISATION DETAILS X FORM & GETTING DETAILS
XML
SERVICE UPDATED X FORM
SERVICE SELECTION XFORM
DATABASE RELATIONSHIPS
High Throughput Computing (HTC)
• Distributed computing, loosely coupled
• Disparate, autonomous, heterogeneous systems
• Computation intensive
High Performance Computing (HPC)
• Tightly coupled, fine-grain parallelism
• Homogeneous systems
• High computing power for a short period
• Low-latency communication
P2P
• Mainly for file sharing
• Geographically dispersed peers
• Autonomous nodes
• Decentralised
Clusters
• Resource sharing, single administration
• Close to each other
• Usually homogeneous
• Centralised control, cooperative working
Shared Memory Computing
• Parallel systems, multicore
• Divide and conquer
• Synchronization
• Tightly coupled
GRID
• Heterogeneous systems, HTC
• VO – trust groups, dynamic, cross-organisational
• Geographically dispersed resource sharing
• Scientific, distribution of work among all resources
CLOUD
• Heterogeneous systems, HPC
• On-demand resource provisioning over Internet
• Data centric with grid backbone, utility value
• Elastic, business, full utilization of resources
Virtualisation – system integration
Web Services – application integration, separation of concerns, data integration, interop
Virtualisation – viewing a single system as multiple resources
Multi tenancy – sharing a resource among multiple clients
Some Characteristics of Grids
• Numerous resources
• Owned by multiple organizations & individuals
• Connected by heterogeneous, multi-level networks
• Different security requirements & policies
• Different resource management policies
• Unreliable resources and environments
• Geographically separated
• Resources are heterogeneous
Stages to using the Grid – Classical View
• Write (code) to solve the problem
• "Compile" against middleware
• Submit to Grid middleware
• Select resources
• Stage data
• Deploy to resources
• Steering and visualisation
Supporting functions: security, advertise, accounting
Technical capabilities
• Resource modeling
• Monitoring and notification
• Allocation
• Provisioning, life-cycle management, and
decommissioning
• Accounting and auditing
• security
Overall GRID Architecture G2
Grid layer -> Internet layer
• Application -> Application
• Collective -> Application
• Resource -> Application
• Connectivity -> Transport, Internet
• Fabric -> Link
2/2/2010 Source: The Anatomy of the Grid, Foster, Kesselman and Tuecke
Fabric layer: Provides the resources for shared access
Connectivity layer: Core communication and authentication protocols
Resource layer: Protocols for secure negotiations, initiation, monitoring
control, accounting on individual resources.
Collective Layer: Protocols and services to capture interactions among a
collection of resources.
Application Layer: User applications that operate within VO environment.
G3- Services - OGSA
• Service based infrastructure for grid
• Grid aims to integrate, virtualize, and manage resources
and services within distributed, heterogeneous, dynamic
“virtual organizations”
• Standardization is critical to create interoperable, portable,
secure, robust, scalable and reusable components and
systems
• Goal is to standardize grid services by specifying set of
standard interfaces.
• Aims to develop a common , standard and open architecture
for grid based applications.
• Service-oriented architecture, based on the Open Grid
Services Architecture (OGSA), addresses this need for
standardization by defining a set of core capabilities and
behaviors that address key concerns in Grid systems.
• OGSA is based on Grid Service ( extension of web service) .
• OGSA realizes the logical middle layer in terms of
services, the interfaces these services expose, the
individual and collective state of resources belonging to
these services, and the interaction between these
services within a service-oriented architecture (SOA).
• The architecture is not layered.
• Services are loosely coupled peers that work either
singly or as part of an interacting group of services.
OGSI
• Requirements not met in Web services were implemented
as Grid services conforming to the OGSI specification
• OGSI specification defines
– How grid service instances are named and referenced
– How the interfaces and behaviors are common to all
Grid services
– How to specify additional interfaces, behaviors and
extensions
• GWSDL (Grid WSDL)
• Introduces Service Data Elements (SDEs)
• portType inheritance
• Grid Service Handle (GSH)
• Grid Service Reference (GSR)
• Factory
• Handle resolver
• Notification
• Service groups (light-weight registries)
Service relationships
Grid vs Web services
• Web Services
• Messages exchange
• Documents
• No notion of “pointer”
• Service orientation?
• Grid Services
• The architecture encourages everything to be exposed
through an interface rather than being sent as a
document
• GSH is the “pointer”
• Object orientation? (CORBA?)
• 2-level naming scheme – GSH and GSR
• Discovery – static in Web services vs dynamic
through SDEs
• Instantiation and life cycle management - factory
[Figure: STATEFUL WEBSERVICE, three numbered interaction steps; step 2 is CREATE]
G4- Grid WSRF
OGSA services defined and implemented as Web
Services
Grid Computing : Transition from OGSI to
WSRF
3. Semantic Web
• information management
– Keywords,
– Statistical,
– Natural Language,
– Semantic Web
• Semantic Web architecture
– automated conversion and storage of unstructured text
into a machine-processable format
– automatically extract and process the concepts and
context in the database –uses intelligent techniques
– Uses metadata to capture meaning of the information
To capture Knowledge
• Metadata
• Ontology –
– formal specification of information
– A network of concepts, relationships, and constraints
that provide context for data and information as well as
processes.
– classes (concepts) and relationships (hierarchy) in the
domain. It provides a shared understanding of the
domain.
– Ontology languages - XML, RDF, OWL
• Logic –
– formal languages for representing knowledge with
semantics
– Reasoners to infer conclusions
• Agents
– Pieces of software that work autonomously and
proactively
– Eg- search personalisation
Semantic Web Architecture
Architecture
• Unicode
– International encoding standard
– Any language can be used on the web using one
standardized form.
• Uniform Resource Identifier (URI)
– uniquely identify resources (e.g., doc)
– URL+URN
• XML
– language to write structured web documents with user
defined vocabulary
– To send documents across the Web
• RDF
– Data model (representation) of web objects
– XML based syntax
• RDFS
– Has modeling to organise web objects into hierarchies
(taxonomies) – class, subclass, properties, domain and
range restriction
– Based on RDF
– Used to write ontology
• Logic Layer
– Application specific declarative knowledge – RIF and
SWRL
• Proof layer
– Deductive process
– SPARQL can be used for querying ontologies and
knowledge bases – SQL like
• Trust layer
– Users trust using Web services
RDF
• RDF statements are subject-predicate-object triples
• Example: Joe Smith has homepage http://www.example.org/~joe/
– subject: http://www.example.org/~joe/contact.rdf#joesmith,
intended to identify Joe Smith
– predicate: http://xmlns.com/foaf/0.1/homepage
– object: Joe's homepage, http://www.example.org/~joe/
• A second statement: "Joe has family name Smith"
RDF graph describing Joe Smith
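The two statements about Joe Smith can be held as plain subject-predicate-object tuples and queried by pattern. This is a minimal sketch in plain Python rather than an RDF library; the `family_name` property URI and the helper `objects_of` are assumptions made for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
JOE = "http://www.example.org/~joe/contact.rdf#joesmith"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (JOE, FOAF + "homepage", "http://www.example.org/~joe/"),
    (JOE, FOAF + "family_name", "Smith"),  # assumed property URI
]


def objects_of(subject: str, predicate: str) -> list:
    """Return every object that completes the (subject, predicate, ?) pattern."""
    return [o for s, p, o in triples if s == subject and p == predicate]


homepage = objects_of(JOE, FOAF + "homepage")
family_name = objects_of(JOE, FOAF + "family_name")
print(homepage, family_name)
```

Because the subject URI is shared, both triples attach to the same node, which is what makes the set of triples a graph.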
RDFS for the company (resource) identified by the URI
http://www.w3.org/Organization/contact#WebifySolutions:
• name is Webify Solutions
• e-mail address is info@webifysolutions.com
• phone number is 1-800-4WEBIFY
OWL
• Classes - named class, intersection classes, union
classes, complement classes, restrictions, and
enumerated classes
• Properties
– Object type
– Data type
– Property types
• Functional
• Inverse functional
• Symmetric
• Transitive
• Individuals – instances of classes and properties relate
them
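What a reasoner gains from declaring a property transitive or symmetric can be shown on a handful of pairs. This is a sketch of the inference idea only, not an OWL reasoner; the property names `located_in` and `neighbour_of` and the individuals are hypothetical.

```python
def transitive_closure(pairs):
    """Infer (a, c) whenever (a, b) and (b, c) hold, until nothing new appears."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def symmetric_closure(pairs):
    """Infer (b, a) for every asserted (a, b)."""
    return set(pairs) | {(b, a) for a, b in pairs}


# Hypothetical transitive property.
located_in = {("Coimbatore", "Tamil Nadu"), ("Tamil Nadu", "India")}
# Hypothetical symmetric property.
neighbour_of = {("Kerala", "Tamil Nadu")}

inferred_located_in = transitive_closure(located_in)
inferred_neighbour_of = symmetric_closure(neighbour_of)
print(sorted(inferred_located_in))
print(sorted(inferred_neighbour_of))
```

Only two `located_in` facts were asserted; the third, that Coimbatore is located in India, is derived from the property type alone.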
Need for ontology in IT
• Bank
– Offers a number of services which can use the same data but with
redundancy
– New services can be added – but reuse existing data / functionality
• An ontology-driven approach
– can capture and represent its total product knowledge in a
language-neutral form
– deploy the knowledge in a central repository (shared).
– a single, unified view of data across its applications.
– precise retrieval of information and seamless enterprise
integration,
– business processes and various data sources can map to
each other through a common meta-model.
– shared ontology
• eliminates point-to point integration
• simplifies application integration
• reduces data redundancy and
• provides the same semantic meaning across applications,
• eases the bank's maintenance and upgrades.
Need for semantic web
– WWW has a vast amount of heterogeneous information
• Searching is based on contents
• Semantic meaning attached to content items describes
the information precisely
• Relevancy of information extraction can be improved.
– Provided services can be tagged with meaning;
• Web-based software agents can dynamically find these
services on the fly and use them to your benefit or in
collaboration with other services.
Need for semantics in SOA
• In SOA service representations of the available services
must be maintained.
– Metadata to discover and organize services
– Metadata to model and assemble services
– metadata to encapsulate business logic for dynamic
binding,
– Metadata manage with metadata.
• Ontologies provide a very powerful and flexible way to
aggregate, visualize, and normalize the service metadata
layer.
• Ontologies enhance service discovery, modeling, assembly,
mediation, and semantic interoperability
• Semantic technologies provide an abstraction layer above
existing IT technologies, one that enables the bridging and
interconnection of data, content, and processes across
business and IT silos.
Semantics for Business
• A business ontology is a formal specification of business
concepts and their interrelationships that facilitates
machine reasoning and inference.
• A business ontology ties systems together using
metadata, much as a database ties together discrete
pieces of data.
• Organizations can provide a single, unified view of data
across their applications,
• Allows for precise retrieval of information,
• simplifies enterprise and SOA integration,
• reduces data redundancy, and
• Provides uniform semantic meaning across applications.
• eases development, maintenance, and upgrades across
the enterprise.
Grid semantics
• The Grid’s vision - sharing diverse resources in a flexible,
coordinated and secure manner through dynamic formation and
disbanding of virtual communities, strongly depends on metadata.
Ad hoc expression and use of metadata causes chronic dependency
on human intervention
• The Semantic Grid is an extension of the Grid in which rich resource
metadata is exposed and handled explicitly, and shared and
managed via Grid protocols.
• It exposes semantically rich information associated with grid
resources to build more intelligent grid services
• The layering of an explicit semantic infrastructure over the Grid
Infrastructure leads to increased interoperability and greater
flexibility.
• Reference Architecture that extends OGSA (standardisation) to
support the explicit handling of semantics, and defines the
associated knowledge services to support a spectrum of service
capabilities.
• S-OGSA defines a model (abstraction), the capabilities (what) and
the mechanisms (how) for the Semantic Grid.
• Metadata – to label grid resources and
entities with concepts (data file according
to appln domain)
• Rules and classification-based reasoning
can be used to generate new metadata
from existing metadata. (VO membership)
• S-OGSA has
– Model (elements and relationships)
– Capabilities (services for the components)
– Mechanisms (elements to deliver the service)
S-OGSA entities and relationships
• Grid entities (id in grid)
• Knowledge entities (K-entities) – Grid entities to operate
on knowledge.
• Semantic Bindings – association between grid and
knowledge entities.
• Semantic grid entities – Grid entities that are the subject of a
semantic binding, are themselves a semantic binding, or are a knowledge entity.
S-OGSA
• Fabric layer – resources are virtualised
through Web services
• Grid middleware with services – OGSA
interact with one another. It deploys web
services with port types through which
resources are accessed
• OGSA is extended with light weight
semantics and knowledge services to
support a spectrum of service capabilities
• Top – application layer
• Semantics of middleware and fabric layers
are considered.
• Services
– Semantic provisioning services
• Knowledge provisioning services
• Semantic binding provisioning services
– Semantic aware grid services
• Consume semantic bindings and take actions
based on knowledge and metadata
Semantic aware authorisation service
Subject – John Doe, object – resource
Semantic bindings based on match
Ontology service provides knowledge to understand semantic bindings
Hadoop
What is Hadoop?
It's a framework for running applications that process
huge amounts of data on large clusters of commodity
hardware
Apache Software Foundation Project
Open source
Amazon’s EC2, Google
alpha (0.21) release available for download
Hadoop Includes
HDFS - a distributed filesystem
Map/Reduce - Hadoop implements this programming model. It
is an offline computing engine
Concept
Moving computation is more efficient than moving large
data
• Data intensive applications with Petabytes of data.
• Web pages - 20+ billion web pages x 20KB = 400+
terabytes
– One computer can read 30-35 MB/sec from disk:
~four months to read the web
– Same problem with 1000 machines: about 3 hours
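The read-time estimate is plain arithmetic and can be checked directly. The sketch below assumes the upper read rate of 35 MB/sec, decimal units (1 KB = 1000 bytes), and a perfectly even split of the data over 1000 machines with no coordination overhead.

```python
PAGES = 20e9                    # 20 billion web pages
PAGE_SIZE_BYTES = 20e3          # 20 KB per page
READ_RATE_BYTES_PER_S = 35e6    # 35 MB/sec, upper end of the quoted range
MACHINES = 1000

total_bytes = PAGES * PAGE_SIZE_BYTES            # 4e14 bytes = 400 TB
one_machine_seconds = total_bytes / READ_RATE_BYTES_PER_S

one_machine_days = one_machine_seconds / 86_400
cluster_hours = one_machine_seconds / MACHINES / 3_600

print(f"total data: {total_bytes / 1e12:.0f} TB")
print(f"one machine: {one_machine_days:.0f} days")
print(f"{MACHINES} machines: {cluster_hours:.1f} hours")
```

At 30 MB/sec the same calculation gives a somewhat longer time, so the figures are order-of-magnitude estimates rather than measurements.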
FACTS
Single-thread performance doesn’t matter
We have large problems and total throughput/price more
important than peak performance
Stuff Breaks – more reliability
• If you have one server, it may stay up three years (1,000 days)
• If you have 10,000 servers, expect to lose ten a day
“Ultra-reliable” hardware doesn’t really help
At large scales, super-fancy reliable hardware still fails, albeit
less often; software still needs to be fault-tolerant
Commodity machines without fancy hardware give a better
price-performance ratio.
DECISION : COMMODITY HARDWARE.
DFS : HADOOP – REASONS?????
WHAT SOFTWARE MODEL????????
Fundamental Dynamics
(Pace of change of the digital infrastructure)
Digital power = computing x communication x storage x content
• computing – Moore’s law: doubles every 18 months
• communication – fiber law: doubles every 9 months
• storage – disk law: doubles every 12 months
• content – community law: 2^n, where n is # people
(Source: Ian Foster’s Talk)
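The doubling periods translate into growth factors of 2^(t / period) after t months. A small worked calculation, assuming a 36-month horizon chosen only for illustration and leaving out the content term (which depends on the number of people, not on time):

```python
DOUBLING_MONTHS = {
    "computing": 18,      # Moore's law
    "communication": 9,   # fiber law
    "storage": 12,        # disk law
}


def growth_factor(months: float, doubling_period: float) -> float:
    """Growth after `months` for a quantity that doubles every `doubling_period`."""
    return 2 ** (months / doubling_period)


horizon = 36  # months, illustrative assumption
factors = {name: growth_factor(horizon, period)
           for name, period in DOUBLING_MONTHS.items()}

combined = 1.0
for factor in factors.values():
    combined *= factor

print(factors)
print(combined)
```

Communication capacity grows fastest under these laws, which is one argument for designs that can exploit the network while still avoiding unnecessary data movement.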
HDFS Why? Seek vs Transfer
BTree (Relational DBS)
– operate at seek rate, log(N) seeks/access
-- memory / stream based
sort/merge flat files (MapReduce)
– operate at transfer rate, log(N) transfers/sort
-- Batch based
Characteristics
• Fault tolerant, scalable, Efficient, reliable distributed
storage system
• Moving computation to place of data
• Single cluster with computation and data.
• Process huge amounts of data.
• Scalable: store and process petabytes of data.
• Economical
• Data Model
– Data is organized into files and directories
– Files are divided into uniform sized blocks and
distributed across cluster nodes
– Replicate blocks to handle hardware failure
– Checksums of data for corruption detection
and recovery
– Expose block placement so that computes
can be migrated to data
• large streaming reads and small random reads
• Files are broken in to large blocks.
– Typically 128 MB block size
– Blocks are replicated for reliability
– One replica on local node,
another replica on a remote rack,
Third replica on local rack,
Additional replicas are randomly placed
• Understands rack locality
– Data placement exposed so that computation can be
migrated to data
• Client talks to both NameNode and DataNodes
– Data is not sent through the namenode, clients
access data directly from DataNode
– Throughput of file system scales nearly linearly with
the number of nodes.
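How a file maps onto blocks and replicas can be worked out with simple arithmetic. This sketch assumes the 128 MB block size quoted above and three replicas per block; `block_layout` is a hypothetical helper, not an HDFS API.

```python
import math

BLOCK_SIZE = 128 * 1024 ** 2   # 128 MB block size from the slide
REPLICATION = 3                # assumed number of replicas per block


def block_layout(file_size_bytes: int,
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION):
    """Return (number of blocks, size of the last block, total stored replicas)."""
    if file_size_bytes == 0:
        return 0, 0, 0
    num_blocks = math.ceil(file_size_bytes / block_size)
    last_block = file_size_bytes - (num_blocks - 1) * block_size
    return num_blocks, last_block, num_blocks * replication


one_gb = block_layout(1024 ** 3)
small_file = block_layout(300 * 1024 ** 2)
print(one_gb)
print(small_file)
```

A 1 GB file becomes 8 full blocks and 24 stored replicas; a 300 MB file needs 3 blocks, the last one only partly filled.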
Block Placement
Hadoop Cluster Architecture:
Components
• DFS Master “Namenode”
– Manages the file system namespace
– Controls read/write access to files
– Manages block replication
– Checkpoints namespace and journals
namespace changes for reliability
Metadata of Name node in Memory
– The entire metadata is in main memory
– No demand paging of FS metadata
Types of Metadata:
List of files, file and chunk namespaces; list of
blocks, location of replicas; file attributes etc.
DFS SLAVES or DATA NODES
• Serve read/write requests from clients
• Perform replication tasks upon instruction by
namenode
Data nodes act as:
1) A Block Server
– Stores data in the local file system
– Stores metadata of a block (e.g. CRC)
– Serves data and metadata to Clients
2) Block Report: Periodically sends a report of all
existing blocks to the NameNode
3) Periodically sends heartbeat to NameNode (detect
node failures)
4) Facilitates Pipelining of Data (to other specified
DataNodes)
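Failure detection through heartbeats reduces to remembering when each node was last heard from. A minimal sketch, assuming a hypothetical `HeartbeatMonitor` class and explicit timestamps in seconds; the timeout value is illustrative and not a Hadoop default.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per DataNode and reports the silent ones."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Called whenever a DataNode reports in.
        self._last_seen[node] = now

    def dead_nodes(self, now: float) -> list:
        # A node is presumed failed once it has been silent longer than the timeout.
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=30)
monitor.heartbeat("datanode-1", now=0)
monitor.heartbeat("datanode-2", now=0)
monitor.heartbeat("datanode-1", now=25)   # datanode-2 stays silent

failed = monitor.dead_nodes(now=40)
print(failed)
```

Once a node is reported as failed, the master can schedule re-replication of the blocks that node held.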
• Map/Reduce Master “Jobtracker”
– Accepts MR jobs submitted by users
– Assigns Map and Reduce tasks to Tasktrackers
– Monitors task and tasktracker status,
re-executes tasks upon failure
• Map/Reduce Slaves “Tasktrackers”
– Run Map and Reduce tasks upon instruction
from the Jobtracker
– Manage storage and transmission of
intermediate output.
SECONDARY NAME NODE
• Copies FsImage and Transaction Log from
NameNode to a temporary directory
• Merges FSImage and Transaction Log into
a new FSImage in temporary directory
• Uploads new FSImage to the NameNode
– Transaction Log on NameNode is purged
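The checkpoint step amounts to replaying the transaction log on top of the last image. A sketch under simplifying assumptions: the image is a dictionary from path to block ids, the log holds only hypothetical `create` and `delete` entries, and `merge_checkpoint` is an illustrative name rather than Hadoop code.

```python
from typing import Dict, List, Tuple

FsImage = Dict[str, List[int]]          # path -> block ids
LogEntry = Tuple[str, str, List[int]]   # (operation, path, block ids)


def merge_checkpoint(fsimage: FsImage, transaction_log: List[LogEntry]) -> FsImage:
    """Apply every logged change to a copy of the image and return the new image."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for operation, path, blocks in transaction_log:
        if operation == "create":
            new_image[path] = list(blocks)
        elif operation == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return new_image


fsimage = {"/logs/a.txt": [1, 2]}
transaction_log = [
    ("create", "/logs/b.txt", [3]),
    ("delete", "/logs/a.txt", []),
]

new_fsimage = merge_checkpoint(fsimage, transaction_log)
transaction_log = []   # the log is purged once the new image is uploaded
print(new_fsimage)
```

Doing the merge on a separate node keeps the log short without making the NameNode pause to rewrite its own image.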
HDFS Architecture
• NameNode: filename, offset-> block-id, block -> datanode
• DataNode: maps block -> local disk
• Secondary NameNode: periodically merges edit logs
Block is also called chunk
JOBTRACKER, TASKTRACKER AND JOBCLIENT
Software Model - ???
• Parallel programming improves performance and
efficiency.
• In a parallel program, the processing is broken up into
parts, each of which can be executed concurrently
• Identify whether the problem can be parallelised (a Fibonacci
series cannot: each term depends on the previous ones)
• Matrix operations on independent elements can be parallelised
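CALCULATING PI
A circle of radius r is inscribed in a square of side 2r.
The area of the square, denoted As,
= (2r)^2 or 4r^2.
The area of the circle, denoted Ac, is
pi * r^2.
So Ac / As = pi / 4.
• pi = 4 * (number of points inside the circle) /
(number of points inside the square)
• MAP: generate random points in the square and count
those that also fall inside the circle
• REDUCE: add up the counts and compute
pi = 4 * (points in circle / points in square)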
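The estimate splits naturally into independent map tasks and one reduce step. The sketch below runs the tasks sequentially in one process; the function names, the seeds and the number of points per task are assumptions, and a real job would run the map tasks on different nodes.

```python
import random


def map_count_inside(seed: int, num_points: int):
    """Map task: throw random points into the square [-1, 1] x [-1, 1]
    and count those that land inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, num_points


def reduce_estimate(partials) -> float:
    """Reduce task: sum the partial counts and apply pi = 4 * inside / total."""
    inside = sum(count for count, _ in partials)
    total = sum(points for _, points in partials)
    return 4.0 * inside / total


# Four independent map tasks, each with its own seed.
partials = [map_count_inside(seed, 50_000) for seed in range(4)]
pi_estimate = reduce_estimate(partials)
print(pi_estimate)
```

Each map task needs no data from any other task, which is why the problem parallelises well; accuracy improves only slowly as more points are added.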
• Restricted parallel programming model meant
for large clusters
– User implements Map() and Reduce()
WORD COUNT EXAMPLE
• File
Hello World Bye World
Hello Hadoop Goodbye Hadoop
• Map
For the given sample input the first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
• The second map emits:
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>
The output of the first combine:
< Bye, 1>
< Hello, 1>
< World, 2>
The output of the second combine:
< Goodbye, 1>
< Hadoop, 2>
< Hello, 1>
Thus the output of the job (reduce) is:
< Bye, 1>
< Goodbye, 1>
< Hadoop, 2>
< Hello, 2>
< World, 2>
• Map()
– Input <filename, file text>
– Parses file and emits <word, count> pairs
• eg. <”hello”, 1>
• Reduce()
– Sums all values for the same key and emits
<word, TotalCount>
• eg. <”hello”, (3 5 2 7)> => <”hello”, 17>
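The word count walk-through can be reproduced with three small functions. This is a single-process sketch of the map, combine and reduce phases, not Hadoop API code; the function names are assumptions, and the second input line is written with "Goodbye" so that the keys match the emitted pairs.

```python
from collections import Counter, defaultdict


def map_words(text: str):
    """Map: emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combine: local aggregation of one map's output, sorted by key."""
    counts = Counter()
    for word, count in pairs:
        counts[word] += count
    return sorted(counts.items())


def reduce_counts(combined_outputs):
    """Reduce: sum all values for the same key across every combiner output."""
    grouped = defaultdict(list)
    for output in combined_outputs:
        for word, count in output:
            grouped[word].append(count)
    return {word: sum(counts) for word, counts in sorted(grouped.items())}


splits = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
combined_outputs = [combine(map_words(split)) for split in splits]
word_counts = reduce_counts(combined_outputs)
print(word_counts)
```

The combiner is optional: it only reduces the amount of intermediate data that has to move between the map and reduce phases.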
MR model
• Map()
– Process a key/value pair to generate
intermediate key/value pairs
• Reduce()
– Merge all intermediate values associated with
the same key
• Users implement interface of two primary methods:
1. Map: (key1, val1) → (key2, val2)
2. Reduce: (key2, [val2]) → [val3]
• Map - plays the role of the group-by clause (it chooses the key)
in an SQL aggregate query
• Reduce - plays the role of the aggregate function (e.g., average)
that is computed over all the rows with the same group-by
attribute (key).
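The SQL analogy can be made concrete with a generic driver: the mapper picks the group-by key, the reducer is the aggregate. A minimal single-process sketch; `map_reduce`, the record layout and the department data are all hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group the values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def by_department(record):
    # Map: (key1, val1) -> (key2, val2); the department is the group-by key.
    department, salary = record
    yield department, salary


def average(key, values):
    # Reduce: (key2, [val2]) -> val3; here the aggregate is an average.
    return sum(values) / len(values)


# Hypothetical rows, equivalent to: SELECT dept, AVG(salary) ... GROUP BY dept
records = [("cse", 40), ("ece", 30), ("cse", 60), ("ece", 50), ("it", 20)]
averages = map_reduce(records, by_department, average)
print(averages)
```

Swapping `average` for `sum` or `max` changes the aggregate without touching the grouping logic.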
Cloud need
• ‘Era of tera’
– ever-growing datasets,
– Changing demands/loads
– unpredictable traffic patterns, and
– the demand for faster response times.
• Elasticity – use and relinquish resources as per demand
• Software applications should be internet accessible
• Large scale applications –
cloud provides large number of machines, when
needed, distributes work among them, provisions new
machines on failure, auto scale, relinquish machines
when not needed
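Elasticity comes down to recomputing how many machines the current load needs. The sketch below is a hypothetical sizing rule, not a cloud provider's auto scaling API; the per-instance capacity and the minimum and maximum fleet sizes are assumed values.

```python
import math


def desired_instances(requests_per_s: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Number of machines needed for the load, clamped to the allowed range."""
    needed = math.ceil(requests_per_s / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


peak = desired_instances(950, capacity_per_instance=100)      # busy period
idle = desired_instances(0, capacity_per_instance=100)        # no traffic
capped = desired_instances(5000, capacity_per_instance=100)   # beyond the cap
print(peak, idle, capped)
```

Running the rule periodically acquires machines as demand rises and relinquishes them when it falls.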
Advantages
• Almost zero upfront infrastructure investment
• Just-in-time Infrastructure
• More efficient resource utilization
• Usage-based costing
• Potential for shrinking the processing time
• Less time for development
Basis – automated elasticity - on-demand and
elastic nature
Example – e-ticketing application
AWS
The Amazon Web Services (AWS) cloud provides a highly reliable
and scalable infrastructure for deploying web-scale solutions,
with minimal support and administration costs, and good
flexibility
• Amazon Elastic Compute Cloud (Amazon EC2) is a web
service that provides resizable compute capacity in the cloud.
• Operating system, application software and associated configuration
settings can be bundled in an Amazon Machine Image (AMI).
• Scale up / down is done by provisioning / decommissioning multiple
instances using simple web service calls
• On-Demand Instances / Reserved Instances / Spot Instances
• Amazon S3 to retrieve/store input /output datasets.
– store / retrieve large amounts of data as objects in buckets (containers)
on the web using standard HTTP
– Copies can be made in 14 locations using CloudFront
• Amazon Simple Queue Service (Amazon SQS) is a reliable,
highly scalable, distributed queue for storing messages as they
travel between computers and application components
• Amazon SimpleDB is a web service for real-time lookup
and simple querying of structured data
• Amazon Relational Database Service (Amazon RDS)
provides an easy way to setup, operate and scale a
relational database in the cloud
• On-demand hadoop cluster- distributed processing,
automatic parallelization, and job scheduling
• Amazon Elastic MapReduce provides a hosted Hadoop
framework running on the web-scale infrastructure of
Amazon Elastic Compute Cloud
• Amazon Virtual Private Cloud (Amazon VPC) extends
corporate network into a private cloud contained within
AWS
• Availability Zones are distinct locations engineered to be
insulated from failures in other Availability Zones and provide
inexpensive, low latency network connectivity to other
Availability Zones in the same Region.
• Elastic IP addresses are static IP addresses that can be
allocated and programmatically assigned to an instance.
• CloudWatch can monitor an Amazon EC2 instance for resource
utilization, operational performance, and overall demand
patterns .
• Auto scaling feature to create Auto-scaling Group.
• Incoming traffic can be distributed using elastic load balancing
service.
• Amazon Elastic Block Store (EBS) volumes provide
network-attached persistent storage to Amazon EC2 instances.
• AWS offers payment and billing services.
• Amazon CloudFront provides a high performance, globally
distributed content delivery system
GrepTheWeb Application
Cloud Services best practices
• Design for failure and nothing will fail - design,
implement and deploy for automated recovery from
failure.
• In AWS
– Failover gracefully using Elastic IPs
– Utilize multiple Availability Zones
– Maintain an Amazon Machine Image
– Utilize Amazon CloudWatch
• Decouple the components – based on the SOA design
principle of loosely coupled components for
scalability
– Message queues: If one component fails the system
will buffer the messages and get them processed when
the component comes back up.
1) SQS for decoupling and buffering
2) Service interfaces for components
3) AMI created
4) Stateless applications
• Implement elasticity
• Think parallel
The beauty of the cloud shines when you
combine elasticity and parallelization
• Keep dynamic data closer to the
compute and static data closer to the
end-user
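The decoupling practice above (a queue buffers messages while a component is down) can be imitated with Python's standard `queue` module. This is an in-process stand-in for a hosted queue service, not the SQS API; the job names and the `submit` and `drain` helpers are hypothetical.

```python
import queue

work_queue = queue.Queue()


def submit(jobs) -> None:
    """Producer component: enqueue work without knowing who will process it."""
    for job in jobs:
        work_queue.put(job)


def drain() -> list:
    """Consumer component: process everything that was buffered while it was down."""
    processed = []
    while True:
        try:
            job = work_queue.get_nowait()
        except queue.Empty:
            return processed
        processed.append(job.upper())   # stand-in for the real work
        work_queue.task_done()


# The consumer is "down" while these jobs arrive; the queue simply buffers them.
submit(["resize-1", "resize-2", "resize-3"])
buffered = work_queue.qsize()

# The consumer comes back up and catches up.
processed = drain()
print(buffered, processed)
```

Neither side calls the other directly, so either can fail, restart or be scaled out independently.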
PSG-Yahoo Grid and Cloud Computing Lab
2008 till date
• 54 rack servers – SC145 & PowerEdge
2950
• 40 end connectors
• 10 client nodes
• RHEL
• Hadoop
• Globus
• OpenVZ
• Xen
• Courses conducted – 10
• Papers published – 11
• Internship – 3
• Placement – 3
• PhD – 4
• Conference talks - 3
• An Efficient Approach to Task Scheduling in
Computational Grids
• Data Discovery in Grid using Content Based Searching
Technique
• P2P Information Retrieval Framework for Digital Library
System using Hadoop DFS.
• Integration of Xen and Hadoop framework
• DNA sequencing using hadoop data grids
• DNA sequencing in public clouds
• Virtualisation – using Xen and Open VZ- a comparison
of performance
• Grid Security – a tree based dynamic approach
• Study of some existing scheduling algorithms
• Grid Task Scheduling using PPSO
• Content based Image Retrieval
• Modification of fairshare scheduling in Hadoop
• Two level scheduler for clouds
• Hybrid Search using content based and semantic
approaches