Introduction to Cloud Computing

Shared by: JasonDetriou
Posted: 6/24/2009
Language: English
Data-Intensive Text Processing with MapReduce (Bonus session)

Tutorial at the 2009 North American Chapter of the Association for Computational Linguistics: Human Language Technologies Conference (NAACL HLT 2009)

Jimmy Lin, The iSchool, University of Maryland
Chris Dyer, Department of Linguistics, University of Maryland
Sunday, May 31, 2009

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Agenda
- Hadoop “nuts and bolts”
- “Hello World” Hadoop example (distributed word count)
- Running Hadoop in “standalone” mode
- Running Hadoop on EC2
- Open-source Hadoop ecosystem
- Exercises and “office hours”

Hadoop “nuts and bolts”
Image source: http://davidzinger.wordpress.com/2007/05/page/2/

Hadoop Zen
- Don’t get frustrated (take a deep breath)…
- Remember this when you experience those W$*#T@F! moments
- This is bleeding edge technology:
    - Lots of bugs
    - Stability issues
    - Even lost data
    - To upgrade or not to upgrade (damned either way)?
    - Poor documentation (or none)
- But… Hadoop is the path to data nirvana?

Cloud9
- Library used for teaching cloud computing courses at Maryland
- Demos, sample code, etc.
    - Computing conditional probabilities
    - Pairs vs. stripes
    - Complex data types
    - Boilerplate code for working with various IR collections
- Dog food for research
- Open source, anonymous svn access

[Figure: cluster architecture. The client talks to the JobTracker on the master node; each of the three slave nodes runs a TaskTracker.]

From Theory to Practice
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
[Figure: the steps run between You and the Hadoop Cluster.]

Data Types in Hadoop
- Writable: defines a de/serialization protocol. Every data type in Hadoop is a Writable.
- WritableComparable: defines a sort order. All keys must be of this type (but not values).
- IntWritable, LongWritable, Text, …: concrete classes for different data types.
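To make the two roles above concrete, here is a minimal sketch in Python of a data type that follows a Writable-style serialization protocol and also defines a sort order. It is an illustration only, not Hadoop code: Hadoop's actual interfaces are Java, and the class name PairOfInts, the helper round_trip, and the snake_case method names are hypothetical stand-ins chosen for this example.

```python
import io
import struct
from functools import total_ordering


@total_ordering
class PairOfInts:
    """Hypothetical stand-in for a key type that is serializable and sortable."""

    def __init__(self, left=0, right=0):
        self.left = left
        self.right = right

    def write(self, out):
        # Serialization half of the protocol: two big-endian 32-bit signed ints.
        out.write(struct.pack(">ii", self.left, self.right))

    def read_fields(self, inp):
        # Deserialization half: read the fields back in the order they were written.
        self.left, self.right = struct.unpack(">ii", inp.read(8))

    def compare_to(self, other):
        # Sort order: by left element first, then by right element.
        a = (self.left, self.right)
        b = (other.left, other.right)
        return (a > b) - (a < b)

    def __eq__(self, other):
        return self.compare_to(other) == 0

    def __lt__(self, other):
        return self.compare_to(other) < 0

    def __repr__(self):
        return f"PairOfInts({self.left}, {self.right})"


def round_trip(pair):
    """Serialize a pair to bytes and rebuild a fresh object from those bytes."""
    buf = io.BytesIO()
    pair.write(buf)
    buf.seek(0)
    copy = PairOfInts()
    copy.read_fields(buf)
    return copy


if __name__ == "__main__":
    print(round_trip(PairOfInts(3, -7)))
    print(sorted([PairOfInts(2, 1), PairOfInts(1, 9), PairOfInts(1, 2)]))
```

Only the comparison method matters for use as a key; a value type would need just the write and read halves, which mirrors the "all keys must be of this type (but not values)" distinction.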
Complex Data Types in Hadoop
How do you implement complex data types?
- The easiest way:
    - Encode it as Text, e.g., (a, b) = “a:b”
    - Use regular expressions to parse and extract data
    - Works, but pretty hack-ish
- The hard way:
    - Define a custom implementation of WritableComparable
    - Must implement: readFields, write, compareTo
    - Computationally efficient, but slow for rapid prototyping
- Alternatives:
    - Cloud9 offers two other choices: Tuple and JSON
    - Plus, a number of frequently-used data types

[Figure: data flow. Input file (on HDFS) -> InputFormat (InputSplit, RecordReader) -> Mapper -> Partitioner -> Reducer -> OutputFormat (RecordWriter) -> Output file (on HDFS)]

What version should I use?

“Hello World” Hadoop example

Hadoop in “standalone” mode

Hadoop in EC2

On Amazon: With EC2
0. Allocate Hadoop cluster
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
7. Clean up!
[Figure: the steps run between You and Your Hadoop Cluster, which lives in EC2.]
Uh oh. Where did the data go?

On Amazon: EC2 and S3
[Figure: Your Hadoop Cluster runs in EC2 (The Cloud); S3 is the persistent store. Copy from S3 to HDFS before a job, and copy from HDFS to S3 afterwards.]

Open-source Hadoop ecosystem
- Hadoop/HDFS
- Hadoop streaming
- HDFS/FUSE
- EC2/S3/EBS
- EMR
- Pig
- HBase
- Hypertable
- Hive
- Mahout
- Cassandra
- Dryad
- CUDA
- CELL
Beware of toys!

Exercises

Questions? Comments?

Thanks to the organizations who support our work:
