Grid Technology

Grid Technology and Applications Developed in Academia Sinica
Eric Yen
Computing Centre, Academia Sinica
Jan. 23rd, 2003
• Basic research on larger-scale problems, e.g., High Energy Physics, Bioscience (Genomic & Protein Research), Remote Instrumentation, etc.
• Real Applications
• Generic Types of Problems, demanding:
   – Computing power from Teraflops to Petaflops
   – Storage capacity from Terabytes to Petabytes
   – More and more network bandwidth
   – Reliability
   – Security
                    What Can the Grid Do?
1. Coordinate the sharing of distributed resources and flexible
   collaboration through “virtual organizations”
2. Manage distributed heterogeneous resources effectively
3. Solve larger-scale problems that are beyond the capacity of any
   single institute or supercomputer in the world
4. Construct a secure, reliable, efficient, and scalable mass
   storage system environment
5. Optimize the usage of resources
6. Facilitate better sharing and integration of information resources
7. Meet the IT demands of scientific research in the new millennium
   – Management of petabyte-scale storage systems
   – Collaborative processing
   – Sharing of, and collaboration across, distributed resources
8. Grid is the mainstream of IT infrastructure
Computer and Network System Resources in AS
CPU Utilization @ Gauss
CPU Utilization @ Euler
Job Types in HPC @ IBM SP
Biomedical & Scientific Network
TW IPv6 Network Logical Map
                Grid Applications in AS
•   High Energy Physics (LCG): Computational Grid, Data Grid, Access Grid
•   BioGrid: Computational Grid, Data Grid, Access Grid
     – In charge of coordination of National Genomic and Protein Project
     – Bio-Computing
     – Bio-Informatics
     – Bio-Diversity
     – Bio-Portal
•   Computational Chemistry and Computational Physics: Computational Grid, Access Grid
•   National Digital Archives: Data Grid, Access Grid
     – In charge of the National Digital Archive Project
•   Earth Science and Astronomy Research: Computational Grid, Data Grid
     –   Earthquake Data Center
     –   Broadband Array in Taiwan for Seismology (BATS)
     –   Strong Motion Networks
     –   Taiwan Telemetered Seismographic Network (TTSN, 1973~1992)
•   Geospatial Information Science & Applications: Data Grid, Access Grid
     – NSDI
     – Web-based Space, Time and Language Content Architecture
•   eLearning: Access Grid, Data Grid and, to a lesser extent, Computational Grid
      The Infrastructure for Integrating Web Services & Grid Technology
Web Services & Grid Protocols

                                    Courtesy of IBM Taiwan
    Open Grid Services Architecture
• Manage resources across distributed heterogeneous platforms
• Deliver seamless QoS
• Provide a common base for autonomic management solutions
• Define open, published interfaces
• Exploit industry-standard integration technologies
   – Web Services, SOAP, XML, ...
• Integrate with existing IT resources
• Open Grid Services Infrastructure (OGSI)
Grid Service Implementation - Examples

                           Courtesy of IBM Taiwan
        Architecture Framework
OGSA Software Evolution

                          Courtesy of IBM Taiwan
    Grid Technology Development in AS
• Technology Developed for PC Cluster
   – Load Balance
   – Remote Execution Environment (LERR)
   – Meta-Queuing System (pQS)
• Resource Metadata and Management System for Grid
• Design and Operation of high performance network
• Construction of Storage Area Network
• We are now porting all of these to the Grid platform; they will be
  Globus-enabled.
       GRID Deployment in AS
• LCG test-bed for both the Computing Centre and the Institute of Physics, started in 2002
• Globus Toolkit 2.2.x test-bed for the parallel computing environment has been established
• Globus Toolkit 2.2.x test-bed for the BIO-Cluster (ready from 2003)
• Globus Toolkit 3.0 testing began in Jan. 2003
• Other work before July 2003
   – Building a mixed pQS & Globus Toolkit environment
   – Porting LERR-G
   – Working on data management issues
   – Promoting GRID technology to our partners
LHC Computing Grid (LCG)
    HEP groups in Taiwan and the collaborations they joined
• Academia Sinica: joined the Fermilab CDF Collaboration in 1993.
• National Central University: L3 at CERN (1990)
• National Taiwan University: Belle at KEK (1995)
• All three groups have joined the LHC at CERN.
   – LHC is the next-generation high-energy particle collider.
   – AS: ATLAS (A Toroidal LHC ApparatuS)
   – NCU and NTU: CMS
• CDF:
   – Top quark discovery in 1995; Taiwan is one of the 5 countries in CDF.
   – Evidence of CP violation in the B sector (1997) (sin(2beta) measurement)
• Belle:
   – CP violation in B physics (2002)
• L3:
   – Higgs search: found one Higgs candidate (2000)
• CDF: The Collider Detector at Fermilab
• More than 500 physicists work in the collaboration
• The discovery of the top quark was one of the major results of the CDF collaboration.

• Installation of the Silicon Detector into the CDF
• Sufficient bandwidth for multiple researchers to download data sets of 2 ~ 20 GBytes at the same time.
• Sufficient bandwidth and efficient management to support stable multipoint video conferencing
• Within 2 ~ 3 years, the network should be able to support bidirectional transfer of terabyte-scale data files, not including GRID requirements.
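To put these bandwidth requirements in perspective, the following minimal sketch estimates ideal transfer times for the data-set sizes mentioned above. The link rates (155M, 622M, 2.5G) are the ones labelled elsewhere in these slides; decimal gigabytes and zero protocol overhead are simplifying assumptions, so real transfers would be slower.

```python
def transfer_seconds(size_gb, link_mbps):
    """Ideal transfer time for size_gb decimal gigabytes over a link_mbps link."""
    bits = size_gb * 8e9              # 1 GB taken as 10^9 bytes (assumption)
    return bits / (link_mbps * 1e6)   # no protocol overhead (assumption)


# Link rates as labelled on the connectivity slides, in megabits per second.
LINKS_MBPS = {"155M": 155, "622M": 622, "2.5G": 2500}
# Data-set sizes from the requirements: 2 GB, 20 GB, and 1 TB (1000 GB).
SIZES_GB = [2, 20, 1000]

for size in SIZES_GB:
    for name, rate in LINKS_MBPS.items():
        minutes = transfer_seconds(size, rate) / 60
        print(f"{size:>5} GB over {name:>4}: {minutes:8.1f} min")
```

Under these assumptions a single 20 GB data set needs about 17 minutes on a 155M link but roughly one minute at 2.5G, and a 1 TB file needs close to an hour even at 2.5G.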
Taiwan LCG Structures:
• Taiwan domestic network. Minimum bandwidth is 2.5 Gbps.
• Taipei GigaPoP is a Metropolitan Fiber Ring, with the capability to upgrade from 10 Gbps to a Multi-Lambda network.
[Diagram: Academia Sinica (AS) as Tier 1/2, with NCU and NTU as Tier 2/3, around the Taipei GigaPoP (10G & 2.5G); other nodes shown: NCTU, YMU, NTOU, MOECC, CGU, TANet Schools, Taipei City School Net, GSN]

Taiwan International Connectivity:
• Broadband connections to US, Europe, Japan and Hong Kong are in place and will be upgraded when necessary.
[Diagram: Taipei GigaPoP links to US, EU, JP, HK, SG, TH, AU and CN (CERnet, CSTnet); labelled rates: 155M, 622M, and 1.2G or 2.5G via StarLight in Ph#1]
Year                              2002    2003    2004    2005
Processors                        30      90      200     400
SI2000                            15K     75K     220K    680K
Disk (TB)                         2       10      30      80
Tape (TB)                         30      60      120     240
Local Network Bandwidth (MB/s)    1200    1200    1200    1200
                                  1+1+2   1+1+2   1+1+2   1+1+2
Funding Status                    F       E       E       E
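As a rough reading aid for the capacity plan, the sketch below derives the SI2000 rating per processor and the year-over-year growth of total SI2000 from the figures in the table. It assumes "K" means thousands; the variable names are illustrative only.

```python
YEARS = [2002, 2003, 2004, 2005]
PROCESSORS = [30, 90, 200, 400]
SI2000 = [15_000, 75_000, 220_000, 680_000]  # "15K" read as 15,000 (assumption)

# Average SI2000 rating per processor in each year.
per_cpu = {y: s / p for y, p, s in zip(YEARS, PROCESSORS, SI2000)}

# Growth factor of total SI2000 relative to the previous year.
growth = {YEARS[i]: SI2000[i] / SI2000[i - 1] for i in range(1, len(YEARS))}

for year in YEARS:
    line = f"{year}: {per_cpu[year]:7.1f} SI2000 per processor"
    if year in growth:
        line += f", total x{growth[year]:.2f} vs previous year"
    print(line)
```

The per-processor figure rises from 500 in 2002 to 1700 in 2005, so the plan relies on faster processors as well as more of them.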
Bio-Computing and CRASA
Conceptual Bio-Grid Application
    Bio-Grid Application Platform
• Bio-Cluster
   – PC Farm Project (1996~)
   – More than 330 computing nodes (~2002)
   – Dedicated 64-CPU BioCluster (Mar. 2002)
   – BioGrid in AS (IBMS, ASCC)
• Business Process
   – Oracle (Compaq, IBM, SUN, Linux)
   – 64-CPU BioCluster
   – Parallel CRASA (IBMS)
   – IBM DiscoveryLink pilot project (dbEST, dbSNP, Swiss-Prot)
   – NGC LIMS
   – ENU mouse database design
   – Microarray Database (SMD on Linux)
        A New Tool for Sequence Analysis - CRASA

• What is CRASA?
 – Complexity Reduction Algorithm for Sequence Analysis

 – A homology-based tool for annotating long genomic sequences (e.g. Human Chromosome)

 – Global sequence alignment for genome annotation

• The advantages of CRASA
 – Dynamic (Progressive) data structure (Multi-level Pyramid Data Structure)

 – Parallel processing

 – Low memory requirement

 – Long genomic sequence annotation

 – High accuracy of gene prediction
                    System Overview (CRASA)
[Flow diagram: RepBase Database (masking), cDNA Database (HGI) and Genomic DNA Sequences are the inputs. Stages shown: CR Pyramid constructing (256 patterns), CR Processing, CR Pyramid, Pattern alignment, and a check on match length (60 bp) and match fragments (3) whose "No" branch leads to Filtering. Output: Exons.]
• Using Globus Toolkit v2.2
• Compiling the CRASA program with the MPICH-G/PGI compiler
• Using globusrun to run the CRASA program on the GRID
Start the Grid proxy
Compose the RSL script
CRASA is running on another machine
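The three steps above (start the proxy, compose the RSL script, run remotely) can be sketched as follows. This is a minimal illustration, not the script used in the slides: the executable path, input file name, gatekeeper host and process count are all hypothetical, and the helper only builds a single-resource Globus Toolkit 2 style RSL string without submitting anything.

```python
def rsl_quote(value):
    """Wrap a value in double quotes for use in an RSL relation."""
    text = str(value)
    if '"' in text:
        raise ValueError("values containing double quotes are not handled here")
    return f'"{text}"'


def compose_rsl(executable, arguments=(), count=1, jobtype="mpi"):
    """Build a single-resource RSL string such as &(executable=...)(count=...)."""
    relations = [
        ("executable", rsl_quote(executable)),
        ("count", str(count)),
        ("jobtype", jobtype),
    ]
    if arguments:
        relations.append(("arguments", " ".join(rsl_quote(a) for a in arguments)))
    return "&" + "".join(f"({key}={value})" for key, value in relations)


# Hypothetical path, input file and process count, for illustration only.
rsl = compose_rsl("/home/user/crasa", ["chr21.fa"], count=8)
print(rsl)

# Commands one would then run by hand (the host name is a placeholder):
commands = [
    ["grid-proxy-init"],                                               # start the Grid proxy
    ["globusrun", "-r", "gatekeeper.example.org", "-f", "crasa.rsl"],  # submit the RSL file
]
for cmd in commands:
    print(" ".join(cmd))
```

Saving the printed RSL string to `crasa.rsl` and running the two commands corresponds to the sequence shown in the screenshots; the exact attributes needed depend on the site's job manager.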
              Benchmark of CRASA – Part I

• Gene Prediction Performance

   – For Human Chromosome 21 (33.9 Mbp, gene poor)
    ~ 48 minutes (11.5 kbp/sec)

   – For Human Chromosome 22 (34.0 Mbp, gene rich)
    ~ 100 minutes (5.7 kbp/sec)

Gene prediction for the draft of the whole human genome (98.8%)
was completed with CRASA in two weeks.
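The throughput figures can be checked with simple arithmetic, taking the chromosome sizes as megabase pairs and the run times as quoted. The sketch below is only a consistency check; because the run times are approximate ("~"), the result for chromosome 21 comes out slightly above the quoted 11.5.

```python
def rate_kbp_per_s(size_mbp, minutes):
    """Throughput in kilobase pairs per second for a sequence of size_mbp megabase pairs."""
    base_pairs = size_mbp * 1e6
    seconds = minutes * 60
    return base_pairs / seconds / 1e3


# (size in Mbp, elapsed minutes) as quoted in the benchmark.
runs = {"Chromosome 21": (33.9, 48), "Chromosome 22": (34.0, 100)}

for name, (size_mbp, minutes) in runs.items():
    print(f"{name}: {rate_kbp_per_s(size_mbp, minutes):.2f} kbp/sec")
```

The gene-rich chromosome 22 is processed at roughly half the rate of chromosome 21, although the two sequences are nearly the same length.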
            Benchmark of CRASA – Part II

• Parallelization Speed Up
            Benchmark of CRASA – Part IV

• Length of Query Sequence vs. Elapsed Time
Digital Archive Data Grid
     Scope of Digital Archives
[Diagram labels: Domain Expertise; Culture and Knowledge Background; Digitised; Born Digital; e-Learning; Enterprise Intelligence; Business Process and Lifecycle; Knowledge Base]
     Why a Knowledge-based Approach for Digital Archives
• Passive Requirements: long-term, scalable and persistent archives while the technology evolves
• Active Requirements: generation of new knowledge (to easily discover new and
  unexpected patterns, trends and relationships that
  can be hidden deep within very large and diverse
      Content Management Challenges (1)
• Separating content from presentation
• Versioning, Roll-back
• Data/Information re-use
• Re-purposing of Information, flexible Output
• Workflow: submit, review, approve, store
      Content Management Challenges (2)
• Integrating diversified contents and external sources
• System and role-based security
• Metadata Management
• Compute and Storage resources on demand
• Reliability and Scalability
            Basic Functions of a CMS
• A CMS manages the path from authoring through to publishing, using a workflow scheme and providing a system for content storage and integration.
   – Authoring/Capturing
   – Workflow
   – Integration and Storage
   – Publishing/Dissemination
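The path from authoring to publishing can be pictured as a simple linear workflow. The sketch below is a hypothetical illustration only: the stage names are adapted from the "submit, review, approve, store" wording on the challenges slide, and the class is not taken from any particular CMS.

```python
# Hypothetical stage names, adapted from the slides' workflow wording.
STAGES = ["authored", "submitted", "reviewed", "approved", "stored", "published"]


class ContentItem:
    """A piece of content moving through the workflow one stage at a time."""

    def __init__(self, title):
        self.title = title
        self.stage = STAGES[0]
        self.history = [self.stage]  # kept so earlier stages stay visible

    def advance(self):
        """Move to the next stage; raise if the item is already published."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError(f"{self.title!r} is already published")
        self.stage = STAGES[position + 1]
        self.history.append(self.stage)
        return self.stage


item = ContentItem("Seismology archive record")
while item.stage != "published":
    print(item.title, "->", item.advance())
```

A real system would add branching (for example, a rejected review returning the item to its author) and role checks on each transition.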
The CMS Feature List
Access Grid
Access Grid for Collaborative Env.
• Multi-point Video Conference
   – MCU-based: 24 concurrent
• WhiteBoard
• Video Server
• Web-based Content Retrieval and
eLearning Data Grid
Challenge and Goals of eLearning
• Challenges
   – Building a Knowledge Society
   – Ubiquitous Learning
   – Emergence of New Learning Models --> Workflow
   – The most efficient implementation
   – Adaptation to technology changes
• Goals
   – Learning how to learn
   – Helping people with disabilities learn more easily
   – Life Long Learning and Life Long Teaching
   – Training at All Levels
   – Formation of a Learning Society
      Basic Requirements of eLearning
1. A combination of Learner-Centric and Teacher-Centric approaches,
   for the best outcome
2. Diversified, Large Amount, Distributed and better accessed
   Learning Resources
3. Well Organized and Complete Content Description
4. Integration of heterogeneous Information Resources
5. On Demand and Ubiquitous Learning for anyone
6. Toward Effective Knowledge Discovery and Good
   Knowledge Organization & Management
                        How to Get There?
• Open Source eLearning Platform
   – Web-based virtual learning, teaching and informing
   – Robust, distributed, collaborative and ubiquitous computing environment as the
     infrastructure --> demands a Grid Infrastructure!
   – Standardization
      – Well-defined specification
      – Interoperability mechanisms for conversion, transformation, exchange, etc.
      – Integration
• Building Community for
   – Developing common tools
   – Technical Study & Support
   – Requirements Collection
   – Planning
   – Suggestions to National Strategy
• Grid Infrastructure
• Learning Resources
• eLearning Services
 Progress of eLearning in Taiwan
• Master Plan of Information Technology in Education for Primary and Secondary Schools
   – Ministry of Education, 2001-2005
   – 20% of curriculum time using IT
   – 600 seed schools
   – Training teacher teams
   – Equipping teachers with notebook computers
• Program of Science and Technology for e-Learning (2003)
   – Cross-ministry initiative
   – US$130 million for 5 years
   – Led by President Liu of NCU
 Pilot Projects for eLearning in AS
• Social University for Adult Learning
• Community University for Minorities, e.g., Indigenous Peoples
• Parallel Programming and Computing Applications
• Survey of the standardization of metadata for
Grid Architecture for eLearning
GBIF --> Biodiversity Data Grid
International Biodiversity Collaboration
Earth Science Data Center
