DISTRIBUTED ENCODING ENVIRONMENT BASED ON GRIDS AND IBP INFRASTRUCTURE


           Petr Holub*‡ and Lukáš Hejtmánek*

*Faculty of Informatics and ‡Institute of Computer Science,
                    Masaryk University, Brno
                    and ‡CESNET, Prague
                        Czech Republic
                                      Motivation
• Huge production of multimedia content, especially video
      – education (lectures, educational movies), science, fun, etc.
• Need for transformation (transcoding) from source
  formats to formats suitable for downloading and
  streaming
      – very computationally demanding
• High demands on storage capacity

• BUT: we have a great Grid infrastructure! :-)

TERENA Networking Conference 2004, Rhodes, Greece                      2
                           Used Infrastructure
• MetaCenter Grid infrastructure in the Czech Republic
      – PC clusters
         • more than 80 dual-processor (PIII and P4) nodes with
           2 GB RAM and fast scratch disks
         • Gigabit Ethernet (GE) and Myrinet interconnects
         • Scheduling system: PBSPro
         • clusters are cheap and grow fast!
      – SGI machines, Alphas...
• Distributed Data Storage (DiDaS)
      – 15 TB of IBP-based distributed storage
  MetaCenter, DiDaS, CESNET Network




                                 IBP Overview




• exNode - serialized XML metadata
      – collection of capabilities of allocated IBP arrays
      – essential for file access
• We use AFS for storing exNodes
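To illustrate why the exNode is essential for file access, here is a minimal Python sketch. It assumes a hypothetical, heavily simplified XML layout (one `<mapping>` element per stored extent, with `offset`, `length` and `read` attributes); this is not the real exNode schema, and `cap-A`/`cap-B` merely stand in for real IBP read capabilities. The point is only the core idea: translating a byte range of the logical file into reads from individual IBP allocations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Mapping:
    """One extent of the logical file stored in a single IBP allocation.

    Hypothetical simplified model: a real exNode carries more metadata.
    """
    offset: int    # position of the extent in the logical file
    length: int    # extent length in bytes
    read_cap: str  # placeholder for an IBP read capability


def parse_exnode(xml_text: str) -> list[Mapping]:
    """Parse the simplified <exnode><mapping .../></exnode> layout."""
    root = ET.fromstring(xml_text)
    mappings = [
        Mapping(int(m.get("offset")), int(m.get("length")), m.get("read"))
        for m in root.iter("mapping")
    ]
    return sorted(mappings, key=lambda m: m.offset)


def caps_for_range(mappings: list[Mapping], start: int, size: int):
    """Split the byte range [start, start + size) into per-allocation reads.

    Returns (capability, offset inside that allocation, byte count) tuples.
    """
    end = start + size
    pieces = []
    for m in mappings:
        lo = max(start, m.offset)
        hi = min(end, m.offset + m.length)
        if lo < hi:  # this extent overlaps the requested range
            pieces.append((m.read_cap, lo - m.offset, hi - lo))
    return pieces


if __name__ == "__main__":
    doc = """<exnode>
      <mapping offset="0" length="1000" read="cap-A"/>
      <mapping offset="1000" length="1000" read="cap-B"/>
    </exnode>"""
    print(caps_for_range(parse_exnode(doc), 900, 200))
    # [('cap-A', 900, 100), ('cap-B', 0, 100)]
```

A read that straddles an extent boundary thus turns into two IBP reads; without the exNode, the client would have no way to find either allocation.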
                            Scheduling Model
• Selection of best hosts
      – based on Completion Time Estimate (CTE)
• Data location optimization
      – selection of best storage depots
      – prefetch support
• Simplified CTE


      – problem: estimating the network performance b_{D,p}(t)
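The simplified CTE formula itself is not reproduced here, so the following Python sketch uses an assumed stand-in: CTE = queue wait + transfer time (size / bandwidth) + encoding time (size / host throughput), with b_{D,p}(t) replaced by a fixed number per (depot, host) pair. It shows how host selection and depot selection can be folded into one minimization. `Host`, `cte` and `best_placement` are hypothetical names, not part of DEE.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Host:
    name: str
    queue_wait_s: float      # time until the host is free (assumed known)
    encode_rate_mb_s: float  # transcoding throughput in MB/s (assumed)


def cte(size_mb: float, host: Host, bandwidth_mb_s: float) -> float:
    """Stand-in CTE in seconds: waiting + data transfer + encoding."""
    transfer = size_mb / bandwidth_mb_s
    encode = size_mb / host.encode_rate_mb_s
    return host.queue_wait_s + transfer + encode


def best_placement(size_mb, hosts, depots, bandwidth):
    """Return (cte, host name, depot) with the smallest estimate.

    bandwidth[(depot, host_name)] plays the role of b_{D,p}(t); here it
    is a fixed number, which is exactly the hard part in practice.
    """
    return min(
        (cte(size_mb, h, bandwidth[(d, h.name)]), h.name, d)
        for h, d in product(hosts, depots)
    )


if __name__ == "__main__":
    hosts = [Host("node1", 0.0, 2.0), Host("node2", 10.0, 4.0)]
    depots = ["depotA", "depotB"]
    bandwidth = {
        ("depotA", "node1"): 10.0, ("depotB", "node1"): 5.0,
        ("depotA", "node2"): 2.0, ("depotB", "node2"): 20.0,
    }
    print(best_placement(100.0, hosts, depots, bandwidth))
    # (40.0, 'node2', 'depotB')
```

In the example the busy but faster node2 still wins because depotB gives it a fast link, which is why host and depot choices are best made together. Treating bandwidth as a constant is the weak spot noted on the slide: in reality b_{D,p}(t) varies over time and has to be predicted.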

                 Scheduling Algorithm (1/2)

• General scheduling ∈ NPO class
      – for uniform processors and jobs of different size
• Our greedy algorithm ∈ PO class when processors and
  depots are connected via a complete graph
      – takes advantage of uniform task size
      – formal proof of correctness
      – for a general graph, the scheduling belongs to the NPO class
        again, as the greedy algorithm might prevent maximum utilization
        of depots
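The slide does not spell out the greedy algorithm. As a hedged illustration of why uniform task size helps, here is a Python sketch of the textbook earliest-finish greedy rule for equal-size tasks on processors of different speeds, a setting in which this rule is known to minimize the makespan. It deliberately ignores storage depots and network links, so it is not the authors' algorithm; `greedy_schedule` and `task_time_on` are hypothetical names.

```python
import heapq


def greedy_schedule(num_tasks: int, task_time_on: list[float]):
    """Greedily assign equal-size tasks to processors of different speeds.

    task_time_on[p] is the time processor p needs for one task (assumed
    known). Each task goes to the processor on which it would finish
    earliest. Returns (makespan, number of tasks per processor).
    """
    # Heap entries: (finish time if p takes its next task, processor p)
    heap = [(t, p) for p, t in enumerate(task_time_on)]
    heapq.heapify(heap)
    counts = [0] * len(task_time_on)
    makespan = 0.0
    for _ in range(num_tasks):
        finish, p = heapq.heappop(heap)
        counts[p] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + task_time_on[p], p))
    return makespan, counts


if __name__ == "__main__":
    # One fast processor (1 time unit per task) and one slower one (2 units)
    print(greedy_schedule(6, [1.0, 2.0]))  # (4.0, [4, 2])
```

Because every task has the same size, a locally best choice never has to be undone later. Once depots with limited capacity enter the picture, as in the general-graph case above, such a greedy choice can block a depot that another processor needed.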

                 Scheduling Algorithm (2/2)




                               Implementation
• Distributed Encoding Environment
      – for steering the transcoding process
• libxio library
      – for enabling IBP in applications
• relies on transcode and HelixProducer for the actual
  transcoding
      – many input/output formats built into transcode: MPEG-1,
        MPEG-2, MPEG-4 (DivX, MS MPEG...), DV, RAW, etc.
      – RealMedia and others through external compression software
        (e.g. HelixProducer)
                              libxio library

• Provides equivalents for standard UNIX I/O functions
      – open, close, read, write, ftruncate, lseek, stat, fstat, and lstat
• IBP URI format


      – without the lors:// prefix, a local file is accessed
      – local_path/file specifies the serialized metadata (exNode)
      – the short form lors:///local_path/file is available
        for reading
• IBP-enabled transcode based on libxio
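The path rules above can be sketched as a small dispatcher in Python. The full IBP URI format is not reproduced on this slide, so the sketch treats whatever sits between `lors://` and the next `/` as opaque parameters, and it assumes the short form is read-only. `XioTarget` and `resolve` are hypothetical names, not libxio's actual C API; the IBP branch only returns what would be handed to an IBP backend.

```python
from dataclasses import dataclass

LORS_PREFIX = "lors://"


@dataclass
class XioTarget:
    backend: str      # "local" or "ibp"
    path: str         # local file, or local path of the serialized exNode
    params: str = ""  # text between "lors://" and the path, kept opaque


def resolve(uri: str, mode: str = "r") -> XioTarget:
    """Decide which backend a libxio-style open() would use (sketch)."""
    if not uri.startswith(LORS_PREFIX):
        # No lors:// prefix: plain local file access.
        return XioTarget("local", uri)
    rest = uri[len(LORS_PREFIX):]
    slash = rest.find("/")
    if slash < 0:
        raise ValueError(f"no exNode path in {uri!r}")
    params, path = rest[:slash], rest[slash:]
    if not params and mode != "r":
        # Assumption: the short form lors:///path is read-only.
        raise ValueError("short lors:/// form is only usable for reading")
    return XioTarget("ibp", path, params)


if __name__ == "__main__":
    print(resolve("/tmp/lecture.dv"))
    print(resolve("lors:///home/user/lecture.xnd"))
```

Keeping the decision purely on the path string is what lets an application such as transcode gain IBP support by swapping its I/O calls for the libxio equivalents, with local files continuing to work unchanged.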
        Distributed Encoding Environment
                     Overview




  Distributed Encoding Environment (1/3)




• lors tools are used for uploading from editing stations
  (Win32, Mac OS X)
• remultiplexing for proper video/sound interleaving

  Distributed Encoding Environment (2/3)




• image and audio transformations are performed using transcode
      – image size reduction, de-interlacing, noise reduction, color
        corrections, audio resampling and cleaning

  Distributed Encoding Environment (3/3)




• IBP-enabled servers
• IBP-enabled client applications
                      Pilot User Groups (1/2)
• Lecture recording @ Faculty of Informatics, MU
      – 20 hrs/week, new lecture halls with automatic video
        acquisition
         • HW conversion of analog signals to DV using Canopus
           ADVC-100 boxes
      – several target formats
         • high quality RealMedia (768×576 @ 25 fps, 3 Mbps)
         • low quality RealMedia (384×288 @ 15 fps, 56-768 kbps)
         • DivX (384×288 @ 25 fps, 1 CD)
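A quick back-of-the-envelope calculation shows why storage capacity matters for this pilot group. The Python sketch below assumes constant bitrates, decimal units, and roughly 25 Mbps for the DV source video (the nominal DV25 video rate, with audio and container overhead ignored); the RealMedia rates are the ones listed above. `gigabytes` and `RATES_BPS` are illustrative names.

```python
HOURS_PER_WEEK = 20  # lecture recording volume from the slide


def gigabytes(bitrate_bps: int, hours: float) -> float:
    """Storage in decimal gigabytes for a constant-bitrate stream."""
    return bitrate_bps * hours * 3600 / 8 / 1e9


RATES_BPS = {
    "DV source (~25 Mbps, assumed)": 25_000_000,
    "high-quality RealMedia": 3_000_000,
    "low-quality RealMedia (upper bound)": 768_000,
}

for name, rate in RATES_BPS.items():
    print(f"{name}: {gigabytes(rate, HOURS_PER_WEEK):.1f} GB/week")
```

This prints 225.0, 27.0 and 6.9 GB/week respectively: the DV masters dominate by an order of magnitude, which is where a large distributed store such as DiDaS pays off.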


                      Pilot User Groups (2/2)
• Neurosurgery department at St. Anna University
  Hospital in Brno
      – large archives of operation recordings
      – they are willing to make them available to medical
        students
      – some editing is necessary: selecting interesting pieces only
        and anonymizing patients
      – publishing to CESNET RealMedia streaming server




                                   Future Work
• Deployment of new scheduling systems
      – DataGrid/EGEE, GridLab, or something else?
• Network traffic prediction service
      – suitable for distributed data storage
      – support for regularly running jobs
      – support for in-advance bandwidth allocations
• GUI for DEE



                          Acknowledgements
• CESNET Development Foundation projects 017/2002
  (DEE) and 018/2002 (DiDaS)
• CESNET Research Intent MSM 6383917201
• Miloš Liška, Luděk Matyska, Eva Hladká and
  MetaCenter staff




Thank you for your attention!

            Q/A?

				