									Computer science theory to
     support research
  in the information age
         John Hopcroft
       Cornell University
       Ithaca, New York


         University of Southern California
                   April 6, 2010
            Time of change
   The information age is a fundamental
    revolution that is changing all aspects of
    our lives.

   Those individuals and nations who
    recognize this change and position
    themselves for the future will benefit
    enormously.


        Drivers of change
   Merging of computing and
    communications
   Data available in digital form
   Networked devices and sensors
   Computers becoming ubiquitous


Internet search engines are
changing

   When was Einstein born?

    Einstein was born at Ulm, in Wurttemberg,
      Germany, on March 14, 1879.

    List of relevant web pages

Internet queries will be different

   Which car should I buy?
   What are the key papers in Theoretical
    Computer Science?
   Construct an annotated bibliography on
    graph theory.
   Where should I go to college?
   How did the field of CS develop?

Which car should I buy?
   Search engine response: Which criteria below
    are important to you?
       Fuel economy
       Crash safety
       Reliability
       Performance
       Etc.



Make           Cost     Reliability   Fuel economy   Crash safety   Links to photos/articles
Toyota Prius   23,780   Excellent     44 mpg         Fair
Honda Accord   28,695   Better        26 mpg         Excellent
Toyota Camry   29,839   Average       24 mpg         Good
Lexus 350      38,615   Excellent     23 mpg         Good
Infiniti M35   47,650   Excellent     19 mpg         Good




2010 Toyota Camry - Auto Shows
Toyota sneaks the new Camry into the Detroit Auto Show.

Usually, redesigns and facelifts of cars as significant as the hot-selling Toyota Camry are accompanied by a commensurate amount of fanfare. So we were surprised when, right about the time that we were walking by the Toyota booth, a chirp of our Blackberries brought us the press release announcing that the facelifted 2010 Toyota Camry and Camry Hybrid mid-sized sedans were appearing at the 2009 NAIAS in Detroit.

We'd have hardly noticed if they hadn't told us: the headlamps are slightly larger, the grilles on the gas and hybrid models go their own way, and taillamps become primarily LED. Wheels are also new, but overall, the resemblance to the Corolla is downright uncanny. Let's hear it for brand consistency!

Four-cylinder Camrys get Toyota's new 2.5-liter four-cylinder with a boost in horsepower to 169 for LE and XLE grades, 179 for the Camry SE, all of which are available with six-speed manual or automatic transmissions. Camry V-6 and Hybrid models are relatively unchanged under the skin.

Inside, changes are likewise minimal: the options list has been shaken up a bit, but the only visible change on any Camry model is the Hybrid's new gauge cluster and softer seat fabrics. Pricing will be announced closer to the time it goes on sale this March.

Top Competitors: Chevrolet Malibu, Ford Fusion, Honda Accord sedan
Which are the key papers in
Theoretical Computer Science?
   Hartmanis and Stearns, "On the computational complexity of algorithms"
   Blum, "A machine-independent theory of the complexity of recursive functions"
   Cook, "The complexity of theorem proving procedures"
   Karp, "Reducibility among combinatorial problems"
   Garey and Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness"
   Yao, "Theory and Applications of Trapdoor Functions"
   Shafi Goldwasser, Silvio Micali, Charles Rackoff, "The Knowledge Complexity of Interactive Proof Systems"
   Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy, "Proof Verification and the Hardness of Approximation Problems"




Temporal Cluster Histograms:
       NIPS Results
[Figure: histogram "NIPS k-means clusters (k=13)"; x-axis: Year (1 to 14); y-axis: Number of Papers (0 to 180)]
Cluster keywords:
   12: chip, circuit, analog, voltage, vlsi
   11: kernel, margin, svm, vc, xi
   10: bayesian, mixture, posterior, likelihood, em
   9: spike, spikes, firing, neuron, neurons
   8: neurons, neuron, synaptic, memory, firing
   7: david, michael, john, richard, chair
   6: policy, reinforcement, action, state, agent
   5: visual, eye, cells, motion, orientation
   4: units, node, training, nodes, tree
   3: code, codes, decoding, message, hints
   2: image, images, object, face, video
   1: recurrent, hidden, training, units, error
   0: speech, word, hmm, recognition, mlp
Shaparenko, Caruana, Gehrke, and Thorsten
Fed Ex package tracking




[Figure: NEXRAD radar map, Binghamton, NY (BGM): Base Reflectivity, 0.50 Degree Elevation, Range 124 NMI]
Collective Inference on Markov Models
for Modeling Bird Migration
[Figure: axes labeled space and time]
Daniel Sheldon, M. A. Saleh Elmohamed, Dexter Kozen
Science base to support activities

  Track flow of ideas in scientific
    literature
  Track evolution of communities in
    social networks
  Extract information from unstructured
   data sources.

Tracking the flow of ideas in
    scientific literature




Yookyung Jo

[Figure: keyword groups as laid out on the slide: (File, Retrieve, Text, Index); (Index, Probabilistic, Text); (Web, Chord, Usage); (Discourse, Word, Centering, Anaphora); (Retrieval, Query, Search, Text); (Web, Page, Search, Rank); (Page rank, Web, Link, Graph)]
Tracking the flow of ideas in the scientific literature
Yookyung Jo
Original papers
Original papers cleaned up
Referenced papers
Referenced papers cleaned up.
Three distinct categories of papers
Tracking communities in
social networks


Liaoruo Wang

"Statistical Properties of Community Structure
in Large Social and Information Networks",
Jure Leskovec; Kevin Lang; Anirban
Dasgupta; Michael Mahoney

   Studied over 70 large sparse real-world
    networks.

   Best communities approximately size 100
    to 150.

Our most striking finding is that in nearly
 every network dataset we examined, we
 observe tight but almost trivial
 communities at very small scales, and at
 larger size scales, the best possible
 communities gradually "blend in" with the
 rest of the network and thus become less
 "community-like."

Conductance
[Figure: conductance plotted against size of community; size 100 is marked on the axis]
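
The slides plot conductance without defining it. A minimal sketch, assuming the common definition (number of edges leaving the set divided by the smaller of the two degree sums); the function name and the toy graph are illustrative and not taken from the talk:

```python
def conductance(edges, community):
    """Cut edges divided by the smaller total degree of the two sides."""
    community = set(community)
    cut = 0
    vol_in = 0   # sum of degrees of vertices inside the community
    vol_out = 0  # sum of degrees of vertices outside the community
    for u, v in edges:
        u_in, v_in = u in community, v in community
        if u_in != v_in:
            cut += 1
        # each edge adds one unit of degree to each of its endpoints
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))  # one cut edge over a degree sum of 7
```

Low conductance means few edges leave the set relative to its size, which is the sense in which the plot compares candidate communities of different sizes.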
 Giant component




Whisker: A component with $v$ vertices
connected by $e \le v$ edges
  Our view of a community
[Figure: "Me" at the center of several groups: Colleagues at Cornell, Classmates, TCS, Family and friends]
More connections outside than inside
                      Core


Should we remove all whiskers and search
for communities in core?
Should we remove whiskers?
 Does there exist a core in social
  networks? Yes
 Experimentally it appears at p=1/n in
  G(n,p) model
 Is the core unique? Yes and No
 In G(n,p) model should we require that a
  whisker have only a finite number of edges
  connecting it to the core?
    Laura Wang
Algorithms
 How do you find the core?
 Are there communities in the core of social
  networks?




How do we find whiskers?
 NP-complete to decide if graph has a whisker
 There exist graphs with whiskers for
  which neither the union of two whiskers
  nor the intersection of two whiskers is a
  whisker



Graph with no unique core
[Figure: example graph, shown in two steps]
What is a community?
How do you find them?




Communities
 Conductance
 Can we compress graph?
   Rosvall and Bergstrom, "An information-theoretic
    framework for resolving community
    structure in complex networks"
 Hypothesis testing
   Yookyung Jo

Description of graph with
community structure
  Specify which vertices are in which
   communities
  Specify the number of edges between
   each pair of communities




Information necessary to specify
graph given community structure
   $m$ = number of communities
   $n_i$ = number of vertices in $i$th community
   $l_{ij}$ = number of edges between $i$th and $j$th
    communities

   $H(Z) = \log \left[ \prod_{i=1}^{m} \binom{n_i (n_i - 1)/2}{l_{ii}} \prod_{i>j} \binom{n_i n_j}{l_{ij}} \right]$
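
The quantity H(Z) counts the graphs consistent with a given community structure: one binomial coefficient for the edges inside each community and one for each pair of communities. A minimal sketch, assuming base-2 logarithms (bits) and a symmetric matrix for the edge counts; the function name and argument layout are illustrative:

```python
import math


def h_z(sizes, links):
    """H(Z) in bits.

    sizes[i] is n_i, the number of vertices in community i.
    links[i][j] is l_ij, the number of edges between communities i and j
    (links[i][i] counts edges inside community i).
    Assumes every l_ij is feasible, i.e. no larger than the number of pairs.
    """
    m = len(sizes)
    total = 0.0
    for i in range(m):
        pairs_inside = sizes[i] * (sizes[i] - 1) // 2
        total += math.log2(math.comb(pairs_inside, links[i][i]))
        for j in range(i):  # each unordered pair of communities once (i > j)
            total += math.log2(math.comb(sizes[i] * sizes[j], links[i][j]))
    return total


# Two communities of 2 vertices each, 1 edge inside each, 2 edges between.
print(h_z([2, 2], [[1, 2], [2, 1]]))  # log2 of C(4, 2) = log2(6)
```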




Description of graph consists of description
of community structure plus specification of
graph given structure.
   Specify community for each vertex and the
    number of edges between each pair of communities

   $n \log m + \frac{1}{2} m(m+1) \log l + H(Z)$

   The first two terms describe the community structure.

   Can this method be used to specify more
    complex community structure where
    communities overlap?
Hypothesis testing
 Null hypothesis: All edges generated with
  some probability p0
 Hypothesis: Edges in communities
  generated with probability p1, other edges
  with probability p0.
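
One way to make the comparison concrete is a log-likelihood ratio for a single candidate community. This is a sketch under stated assumptions, not the method of the talk: edges are independent, and pairs outside the community have probability p0 under both hypotheses, so they cancel. The function and parameter names are illustrative.

```python
import math


def log_likelihood_ratio(k, e, p0, p1):
    """Log of P(data | community) / P(data | null) for one candidate set.

    k  : number of vertices in the candidate community
    e  : number of edges observed among those k vertices
    p0 : edge probability under the null hypothesis
    p1 : edge probability inside a community
    """
    pairs = k * (k - 1) // 2
    present = e * math.log(p1 / p0)
    absent = (pairs - e) * math.log((1 - p1) / (1 - p0))
    return present + absent


# A 5-clique is strong evidence for a community when p1 is much larger than p0.
print(log_likelihood_ratio(k=5, e=10, p0=0.1, p1=0.9))
```

A positive value favors the community hypothesis; a negative value favors the null.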



Massively overlapping communities

 Are there a small number of massively
  overlapping communities that share a
  common core?
 Are there massively overlapping
  communities in which one can move from
  one community to a totally disjoint
  community?

Massively overlapping communities with a common core
Massively overlapping communities
Clustering Social networks
Mishra, Schreiber, Stanton, and
Tarjan
 Each member of community is connected
  to a beta fraction of community
 No member outside the community is
  connected to more than an alpha fraction
  of the community
 Some connectivity constraint
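
The two fraction conditions can be checked directly for a candidate set. A minimal sketch, assuming an adjacency-set representation and measuring each fraction against the full community size; it ignores the unspecified connectivity constraint, and the names are illustrative:

```python
def is_alpha_beta_community(adj, community, alpha, beta):
    """Check the alpha and beta conditions for a non-empty vertex set.

    adj maps each vertex to the set of its neighbors.
    """
    community = set(community)
    size = len(community)
    for v, neighbors in adj.items():
        fraction = len(neighbors & community) / size
        if v in community and fraction < beta:
            return False  # a member is too weakly tied to the community
        if v not in community and fraction > alpha:
            return False  # an outsider is too strongly tied to it
    return True


# Hypothetical example: a triangle {0, 1, 2} with an extra vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(is_alpha_beta_community(adj, {0, 1, 2}, alpha=0.4, beta=0.6))  # True
```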

In sparse graphs
 How do you find alpha-beta
  communities?
 What if each person in the community is
  connected to more members outside the
  community than inside?



       Transmission paths for
      viruses, flow of ideas, or
              influence


Sucheta Soundarajan

                                       Trivial model

Half of contacts                       Two thirds of contacts
[Figure: axes labeled time of first item and time of second item]
Theory to support
new directions
 Large graphs
 Spectral analysis
 High dimensions and dimension reduction
 Clustering
 Collaborative filtering
 Extracting signal from noise



     Theory of Large Graphs

   Large graphs with billions of vertices
   Exact edges present not critical
   Invariant to small changes in definition
   Must be able to prove basic theorems




                Erdős-Rényi
 $n$ vertices
 each of $n^2$ potential edges is present
  with independent probability $p$
[Figure: binomial degree distribution; x-axis: vertex degree; y-axis: number of vertices; curve labeled $\binom{N}{n} p^n (1-p)^{N-n}$]
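
The binomial shape of the degree distribution can be checked with a small simulation. A minimal sketch, assuming an undirected graph in which each unordered pair of vertices is tried once, so a vertex degree is binomial with n - 1 trials; the function names, seed and parameter values are illustrative:

```python
import math
import random


def gnp_degrees(n, p, rng):
    """Sample an undirected random graph and return the degree of each vertex."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is an edge with probability p
                degree[u] += 1
                degree[v] += 1
    return degree


def binomial_pmf(trials, k, p):
    """Probability of exactly k successes in the given number of trials."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)


rng = random.Random(0)
n, p = 200, 0.1
degrees = gnp_degrees(n, p, rng)
print("observed mean degree:", sum(degrees) / n)
print("expected mean degree:", (n - 1) * p)
print("predicted share of vertices with degree 20:", binomial_pmf(n - 1, 20, p))
```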
  Generative models for graphs

 Vertices and edges added at each unit of time
 Rule to determine where to place edges
    Uniform probability
    Preferential attachment: gives rise to power
     law degree distributions
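
The two placement rules can be compared with a small growth simulation. A minimal sketch, assuming one new vertex and one new edge per time step, starting from a single edge; the function name, seed and graph size are illustrative:

```python
import random


def grow_degrees(n, preferential, rng):
    """Grow a graph to n vertices, one vertex and one edge per step.

    Returns the list of vertex degrees. With preferential=True the new
    vertex attaches to an existing vertex chosen in proportion to its
    degree; otherwise the target is chosen uniformly at random.
    """
    degree = [1, 1]      # start from a single edge between vertices 0 and 1
    endpoints = [0, 1]   # every edge endpoint, so vertex v appears degree[v] times
    for new in range(2, n):
        if preferential:
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new, target))
    return degree


rng = random.Random(1)
print("max degree, preferential:", max(grow_degrees(2000, True, rng)))
print("max degree, uniform:     ", max(grow_degrees(2000, False, rng)))
```

Sampling from the list of edge endpoints is a simple way to pick a vertex with probability proportional to its degree. The heavy tail shows up as a much larger maximum degree under preferential attachment.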



Preferential attachment gives
rise to the power law degree
distribution common in many
graphs
[Figure: number of vertices plotted against vertex degree]
                   Protein interactions
   2730 proteins in database
   3602 interactions between proteins

Size of component      1    2   3   4   5  6  7  8  9  10  11  12  13  14  15  16  ...  1000
Number of components  48  179  50  25  14  6  4  6  1   1   1   0   0   0   0   1         0

     Only 899 proteins in components. Where are the 1851
     missing proteins?

 Science 1999 July 30; 285:751-753
                  Protein interactions
  2730 proteins in database
  3602 interactions between proteins

Size of component      1    2   3   4   5  6  7  8  9  10  11  12  13  14  15  16  ...  1851
Number of components  48  179  50  25  14  6  4  6  1   1   1   0   0   0   0   1         1

Science 1999 July 30; 285:751-753
        Giant Component

1. Create n isolated vertices
2. Add edges randomly one by one
3. Compute number of connected
   components
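
The three steps above can be sketched with a union-find structure. This is a minimal illustration, assuming each random edge joins two distinct vertices chosen uniformly (repeated edges are allowed); the function name and seed are illustrative:

```python
import random
from collections import Counter


def component_sizes(n, num_edges, rng):
    """Add random edges to n isolated vertices.

    Returns a Counter mapping component size to number of components.
    """
    parent = list(range(n))  # step 1: n isolated vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(num_edges):  # step 2: add edges one by one
        u, v = rng.sample(range(n), 2)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    # step 3: count the components of each size
    size_of_root = Counter(find(x) for x in range(n))
    return Counter(size_of_root.values())


rng = random.Random(0)
for num_edges in (0, 1, 250, 500, 1000):
    histogram = component_sizes(1000, num_edges, rng)
    print(num_edges, "edges, largest component:", max(histogram))
```

With 1000 vertices, one component grows to a constant fraction of the graph once the number of edges passes roughly half the number of vertices.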
            Giant Component
Size of component      1
Number of components   1000

Size of component      1     2
Number of components   998   1

Size of component      1     2    3    4    5   6   7   8   9   10   11
Number of components   548   89   28   14   9   5   3   1   1   1    1
           Giant Component

Size of component      1     2    3    4    5    6    7    8
Number of components   367   70   24   12   9    3    2    2

Size of component      9     10   12   13   14   20   55   101
Number of components   2     2    1    2    2    1    1    1
                Giant Component

Size of component      1     2    3    4    5    6    7    8    9     10
Number of components   252   39   13   6    3    6    2    1    1     0

Size of component      11    12   13   14   15   16   17   18   ...   514
Number of components   1     0    0    0    0    0    0    0    0     1
          Science base
 What do we mean by science base?

  Example: High dimensions
High dimension is
fundamentally different from 2
or 3 dimensional space




High dimensional data is
inherently unstable
   Given $n$ random points in $d$ dimensional
    space essentially all $n^2$ distances are
    equal.

   $|x - y|^2 = \sum_{i=1}^{d} (x_i - y_i)^2$
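
The claim that all pairwise distances become nearly equal can be observed numerically. A minimal sketch, assuming the random points have independent standard Gaussian coordinates (the slide does not name a distribution); the function name, seed and sizes are illustrative:

```python
import math
import random


def distance_ratio(num_points, d, rng):
    """Largest pairwise distance divided by the smallest, for random points."""
    points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_points)]
    distances = [
        math.dist(points[i], points[j])
        for i in range(num_points)
        for j in range(i)
    ]
    return max(distances) / min(distances)


rng = random.Random(0)
for d in (2, 10, 100, 1000):
    print(d, "dimensions, max/min distance:", round(distance_ratio(20, d, rng), 2))
```

The ratio is large in two dimensions and approaches 1 as the dimension grows, because each squared distance is a sum of d independent terms and concentrates around its mean.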


High Dimensions

Intuition from two and three dimensions not valid for high
dimension




        Volume of cube is               Volume of
        one in all                      sphere goes to
        dimensions                      zero

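2 Dimensions
[Figure: unit square and unit sphere]
   Distance from center of square to a vertex: $\sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = \tfrac{1}{\sqrt{2}} \approx 0.707$

4 Dimensions
   $\sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = 1$

d Dimensions
   $\sqrt{d \left(\tfrac{1}{2}\right)^2} = \tfrac{\sqrt{d}}{2}$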
Almost all area of the unit cube is
outside the unit sphere
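
A quick way to see this is to compare volumes: the unit cube always has volume 1, while the volume of the unit-radius ball shrinks toward zero. The sketch below uses the standard closed form $\pi^{d/2} / \Gamma(d/2 + 1)$ for the ball, which is not stated on the slides; the function name is illustrative.

```python
import math


def unit_ball_volume(d):
    """Volume of the radius-1 ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


for d in (2, 3, 5, 10, 20, 50):
    print(d, "dimensions, ball volume:", unit_ball_volume(d))
```

Since the part of the cube inside the ball can be no larger than the ball itself, the share of the cube lying inside the sphere must also go to zero.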




Gaussian distribution




       Probability mass concentrated
       between dotted lines
Gaussian in high dimensions
[Figure: annulus labeled $\sqrt{d}$ and 3]
Two Gaussians
[Figure: two annuli labeled $\sqrt{d}$ and 3]
Distance between two random
points from same Gaussian

   Points on thin annulus of radius $\sqrt{d}$

   Approximate by sphere of radius $\sqrt{d}$

    Average distance between two points is $\sqrt{2d}$
    (Place one point at the North Pole, the other at random. Almost surely
     the second point is near the equator.)
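
Both radii can be checked by sampling. A minimal sketch, assuming a spherical Gaussian with unit variance in every coordinate; the function name, seed and dimension are illustrative:

```python
import math
import random


def sample_norm_and_distance(d, rng):
    """Draw two points from a unit-variance spherical Gaussian in d dimensions.

    Returns the length of the first point and the distance between the two.
    """
    x = [rng.gauss(0, 1) for _ in range(d)]
    y = [rng.gauss(0, 1) for _ in range(d)]
    origin = [0.0] * d
    return math.dist(x, origin), math.dist(x, y)


rng = random.Random(0)
d = 1000
norm, distance = sample_norm_and_distance(d, rng)
print("length of a point:", norm, "compared with sqrt(d) =", math.sqrt(d))
print("distance between points:", distance, "compared with sqrt(2d) =", math.sqrt(2 * d))
```

The two points are nearly orthogonal, so the distance follows from the Pythagorean theorem with two legs of length about the square root of d.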


[Figure: right triangle with two sides of length $\sqrt{d}$ and hypotenuse $\sqrt{2d}$]
Expected distance between points from
two Gaussians separated by δ
[Figure: right triangle with sides $\delta$ and $\sqrt{2d}$ and hypotenuse $\sqrt{\delta^2 + 2d}$]
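Can separate points from two
Gaussians if
   $\sqrt{\delta^2 + 2d} > \sqrt{2d} + \gamma$
   $\sqrt{2d} \left(1 + \tfrac{1}{2} \tfrac{\delta^2}{2d}\right) > \sqrt{2d} + \gamma$
   $\tfrac{1}{2} \tfrac{\delta^2}{\sqrt{2d}} > \gamma$
   $\delta > \sqrt{2\gamma} \, (2d)^{1/4}$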
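
The result says the required separation grows only like the fourth root of the dimension. A minimal numeric sketch, assuming the condition is that between-Gaussian distances exceed within-Gaussian distances by a margin, here called `gamma` (a name chosen for this example), and using the first-order approximation of the square root:

```python
import math


def min_separation(d, gamma):
    """Approximate separation at which the distance gap reaches gamma.

    Solves (1/2) * delta**2 / sqrt(2d) = gamma for delta.
    """
    return math.sqrt(2 * gamma) * (2 * d) ** 0.25


def distance_gap(d, delta):
    """Exact gap between the between-Gaussian and within-Gaussian distances."""
    return math.sqrt(delta**2 + 2 * d) - math.sqrt(2 * d)


for d in (10, 100, 1000, 10000):
    delta = min_separation(d, gamma=1.0)
    print(d, "dimensions: separation", round(delta, 2), "gap", round(distance_gap(d, delta), 3))
```

Because the square root was linearized, the exact gap at this separation is slightly below the target margin and approaches it as the dimension grows.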




   We have just seen what a science base
    for high dimensional data might look like.

   What other areas do we need to develop a
    science base for?




 Ranking is important
   Restaurants, movies, books, web pages
   Multibillion-dollar industry
 Collaborative filtering
   When a customer buys a product, what else is
    he likely to buy?
 Dimension reduction
 Extracting information from large data
  sources
 Social networks
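
The collaborative filtering question above ("what else is he likely to buy?") can be illustrated with simple co-purchase counts. This is only a stand-in for the idea, not a method from the talk; the function name and the shopping baskets are made up for the example.

```python
from collections import Counter


def also_bought(baskets, item):
    """Rank other items by how often they appear in a basket with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(set(basket) - {item})
    # most frequent first; ties broken alphabetically for a stable order
    return sorted(counts, key=lambda other: (-counts[other], other))


# Hypothetical purchase histories.
baskets = [
    {"tent", "stove", "lantern"},
    {"tent", "stove"},
    {"tent", "lantern", "map"},
    {"stove", "map"},
]
print(also_bought(baskets, "tent"))
```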

Conclusions
   We are in an exciting time of change.

 Information technology is a big driver of that change.

   The computer science theory needs to be developed to
    support this information age.



