									   Midterm Review

Exam Date
      October 25, 2011
      Location 107 Talbert
      Please bring
           Pencils, pens and erasers.
      This is a closed book exam.
      NO Other material is allowed.
      No calculators/phones.
      Arrive on time; no extra time will be
      given if you arrive late.
     Defining data-intensive computing (as in Fourth Paradigm: up
     to p. 19)
     Enabling Technologies (ET):
       ET1: Web service

       ET2: Special data structures and algorithms

       NO GAE

     MapReduce model components: Mapper, Reducer, Partitioner,
     Combiner; execution framework, shuffle and sort
     Hadoop (HDFS): as on the Yahoo site: Ch. 1, 2, 4; Ch. 5, partitioner only.
     Problem solving with MR:
       Chapters 1-4 in Lin and Dyer’s text

       Tom White analysis of web log (Don’t ask me for the
       handout, go find it)
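
The MapReduce components listed above (Mapper, Combiner, Partitioner, shuffle and sort, Reducer) can be traced end to end with a small word-count simulation. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names, the toy hash in `partitioner`, and the sample input are all assumptions made for illustration.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    """Sum counts locally on one map task, before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partitioner(key, num_reducers):
    """Choose a reducer for a key.

    Toy deterministic hash (assumption): real frameworks typically
    use hash(key) mod the number of reducers.
    """
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """Sum all partial counts that arrived for one key."""
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # Map and combine each input split, then route every pair
    # to a reducer partition (the "shuffle").
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [kv for line in split for kv in mapper(line)]
        for key, value in combiner(pairs):
            partitions[partitioner(key, num_reducers)][key].append(value)

    # Each reducer sees its keys in sorted order (the "sort").
    output = {}
    for part in partitions:
        for key in sorted(part):
            word, total = reducer(key, part[key])
            output[word] = total
    return output

# Two input splits, as if handled by two map tasks.
splits = [["a b a", "c a"], ["b b c"]]
counts = run_job(splits)
print(counts)
```

With this input the first map task's combiner emits (a, 3), (b, 1), (c, 1) and the second emits (b, 2), (c, 1), so the reducers only need to add two partial counts for b and c.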

     Defining data-intensive computing: J. Gray
     Given a problem, solve it using MR
     Given an MR program, provide a numerical example
     Best practices and design patterns described in the
     Lin & Dyer text
     Web services and project 1
     Hadoop (HDFS) architecture
     Functions of various MR modules
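
As practice for "given a problem, solve it using MR" with a numerical example, the sketch below computes a per-key mean with local aggregation. It assumes the usual fix for this problem: the combiner passes along (sum, count) pairs rather than partial means, because a mean of means is generally wrong. All names and data here are hypothetical, and the job is simulated in plain Python rather than run on Hadoop.

```python
def mapper(record):
    """Turn (key, value) into (key, (sum, count)) with count = 1."""
    key, value = record
    return key, (value, 1)

def combine(pairs):
    """Add up (sum, count) pairs per key within one map task."""
    partial = {}
    for key, (s, c) in pairs:
        prev_s, prev_c = partial.get(key, (0, 0))
        partial[key] = (prev_s + s, prev_c + c)
    return list(partial.items())

def reduce_mean(key, partials):
    """Merge all (sum, count) pairs for a key and divide once."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return key, total / count

def run(splits):
    # Group the combiner output by key, standing in for the shuffle.
    grouped = {}
    for split in splits:
        for key, partial in combine([mapper(r) for r in split]):
            grouped.setdefault(key, []).append(partial)
    return dict(reduce_mean(k, v) for k, v in sorted(grouped.items()))

# Numerical example: two splits, keys "x" and "y".
splits = [[("x", 1), ("x", 2), ("y", 10)], [("x", 6), ("y", 20)]]
means = run(splits)
print(means)
```

Working it by hand for key x: the first split contributes (3, 2) and the second (6, 1), so the reducer computes 9 / 3 = 3.0. Averaging the per-split means instead would give (1.5 + 6) / 2 = 3.75, which is incorrect.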

How to study?
     Make a list of all material to study.
     Study the material
     Practice writing pseudo code for the
     Use block diagrams and numerical
     examples when necessary

