

                                            Spring 2011
                                         Stanford University
                                      Andreas S. Weigend, Ph.D.
                                     The Social Data Revolution:
                                Data Mining and Electronic Commerce
Andreas Weigend (www.weigend.com)

The Social Data Revolution: Data Mining and Electronic Commerce:
MS&E 237, Stanford University

May 31, 2011

Class 19



Andreas:        Welcome to class 19, our final class of this year’s Social Data Revolution. Here’s the
                agenda for today. I will talk about why we have the Social Data Revolution, and that is
                because our communication has changed. Then I’ll review a couple of building blocks:
                individuals, you know my love for individuals, weaving them together -- the social fabric,
                and the underlying of all of that -- data, data people share, data that are produced as
                byproducts during communication; if you will, the digital exhaust.

                Then in half an hour we’ll do a little exercise. And then we’ll move up in the stack, to
                business, society, the world; we’ll ask some “what if” questions, wrap up, and then have
                some food and drinks outside.

                I want to start with a question. What do you see this decade as being about? To frame
                it, I want to go back over the last 50 years. Fifty years ago people learned to build
                computers. That was the ‘70s. The ‘70s were about building computers. In the ‘80s people
                learned to connect computers. In the ‘90s people learned to connect pages. That’s when
                the web came into being. In the 2000s people learned to connect people. Think about
                Facebook. My first question to you in this last class is what do you see for this decade?

Audience:       Sensors.

Andreas:        Sensors -- there are about 35 billion connected sensors, connected devices on the
                internet as we speak, so on average about five sensors per person. But is it the sensors
                or something else connecting them?

Audience:       When you connect the data to start to make sense of what’s going on.

Andreas:        Making sense of the sensors, right.

Audience:       Data driven business decisions.

Andreas:        Yes, instrumenting businesses.

Audience:       I like the sensors idea but I would go one step further and say that it’s not just sensors
                but it’s mobile, ubiquitous computing in terms of those sensors traveling with people at all
                times. That changes the nature of what you can do with that data.

Andreas:        I would like to argue it’s not that smart phones are smart, but smart phones are smart
                because the cloud is smart, because of all the data they create. In that sense, I think all
                of your answers pointed towards what really is smart is what’s happening in the
                connection between those data.

                You remember that the amount of data people create this year is more than the entirety
                of mankind has created until the end of last year. That’s a pretty amazing fact. In this
                quarter we tried to make sense out of that.

                For me another important aspect is how can we use all that stuff, the data emerging from
                sensors, from people, to actually help people collaborate better, work together in a better
                way, create more easily, co-create things which we couldn’t think about. Wikipedia being
                one of the earliest examples, something that shouldn’t work because why should
                anybody spend their time editing other people’s crap? But it does. We looked into Quora
                as another example which does work.

                For me, the collaboration aspect is just as important as having sensors that talk to each
                other. Why collaboration? Because ultimately it’s about behavior change. But I’m getting
                ahead of myself.

                I want to talk about perspectives of communication. Just to remind you, EEs see the
                purpose of communication as transmitting information. I don’t think so. I
                think information is just an excuse for communication. For people [00:05:06.8] out with
                each other, like monkeys looking for lice in each other’s hair.

                When I’m talking about communication now, I want to talk about a number of dimensions
                and I’ll do this relatively fast. First, there is structured versus unstructured
                communication. For instance Twitter, where does Twitter fall here? Kind of unstructured,
                but in order for people to find stuff, in order for people to be found, the idea is to have
                things like hashtags, to try to have some ad hoc structures which actually make it easier
                to get through the information deluge.

                Think about relevance versus chronology. Traditionally search engines went for what?
                For relevance because people really didn’t have totally real-time data. Groups worked
                very hard to give you the most relevant result for something. Then Twitter came along
                and suddenly people had the real-time web.

                Now that tradeoff, which is one of the terms you’ll hear a bunch of times in class today,
                that tradeoff between relevance and recency is something we reflect back to the user:
                for instance, in Google or in Bing you can actually set how far back you want to go
                into the past, and if you just want to see results from today then you don’t expect
                them to be as authoritative as when you go back 10 years. But of course, they’re more
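
                The relevance-versus-recency tradeoff can be made concrete with a small sketch. It is
                an illustration only, not how Google or Bing actually rank results: the Result fields,
                the example scores, and the half-life parameter are all assumptions introduced here.
                The user-facing "how far back" setting is modeled as a half-life on the age of a result.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevance: float  # hypothetical authority score in [0, 1]
    age_days: float   # time since the item was published


def blended_score(result: Result, half_life_days: float) -> float:
    """Relevance discounted by age; the weight halves every half_life_days."""
    decay = 0.5 ** (result.age_days / half_life_days)
    return result.relevance * decay


def rank(results, half_life_days):
    """Return results ordered from highest to lowest blended score."""
    return sorted(results, key=lambda r: blended_score(r, half_life_days), reverse=True)


results = [
    Result("authoritative 10-year-old reference", relevance=0.9, age_days=3650),
    Result("tweet from this morning", relevance=0.3, age_days=0.25),
]

# A short half-life favors recency; a very long one favors authority.
print([r.title for r in rank(results, half_life_days=1)])
print([r.title for r in rank(results, half_life_days=36500)])
```

                With a one-day half-life the morning tweet ranks first; with a hundred-year half-life
                the older, more authoritative item does.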

                Another dimension of communication is synchronous versus asynchronous. A phone
                call, where does that fall? Synchronous. Usually you talk to somebody and expect an
                answer within a few seconds, unless you’re driving on 280 and the connection gets lost.
                Give me some examples of asynchronous communication.

Audience:       Mail.

Andreas:        Email.

Audience:       Facebook.

Andreas:        Facebook. The point I want to make is these are traditional ends of the spectrum. If you
                think about it, those ends of the spectrum come closer and closer together; if somebody
                sends me an email, they have some expectation of my response, which is probably not
                infinity but maybe within a day. Those expectations actually shift.

                Last year, I did a survey both in China and here. We asked people “What’s your
                expectation to get an answer to an email from a friend, a company, or a person you don’t
                know?” There’s a big difference between the U.S. and China. Surprisingly, what would
                you guess? What do you think is the expectation of people in China to get a response
                from individuals and companies, versus the U.S.? In other words, if you look at the social

                component, that people actually know each other, do you think it’s stronger in email in
                China or the United States?

Audience:       I would guess the expectation in China to get an answer from a company is longer than
                the expectation of an American to get an answer from a company.

Andreas:        Surprisingly, actually the Chinese have higher expectations to get something answered
                from a company than the Americans. We have come a long way in desensitizing our
                esteemed customers to expect anything from our customer service.

                Another couple of dimensions this year: searchability -- do you expect that what you put
                up is searchable and the communication is there forever? Legal aspects about
                discoverability or at least can you find it versus can you find it in principle? Time scales,
                we talked about response times and also decay times of knowledge, etc.

                Those are a few dimensions and there are many more. What I want us to ask here is
                how has the basically zero cost of communication, of reaching the world through Twitter
                for instance, how has that changed our behavior? Has it? Has your communication
                behavior been affected by being able to create and distribute information? You couldn’t
                potentially reach a billion people 20 years ago. Now you potentially can. Any effect on
                your communication behavior?

Audience:       I think just for social, it makes things so much easier… a party and it takes you 10
                minutes and you can invite 200 people whereas before you would have to call every
                single friend and say meet me … at 4:00 on Sunday. It makes things much faster and

Andreas:        It’s really the collaboration aspect here.

Audience:       Everything’s a prototype. Because it costs essentially nothing to put it up, you’re less …
                about what you are putting out there in a certain way. If something’s wrong you can
                easily change it.

Andreas:        Experimentation 24/7.

Audience:       I think it does cost something for you to put something on Twitter or Facebook, called
                social capital. The more you tweet, the more you update on Facebook, the less likely I
                think people will value it.

Andreas:        It depends on what you tweet. If you say really cool stuff, then probably your social
                capital increases. If you just interrupt people without giving them anything, crying wolf,
                then probably your probability of affecting them with what you’re saying decreases.

                Tim, that was an interesting point, actually allowing anybody to basically do experiments
                and that wouldn’t be possible without the communication aspects. The communication
                aspect is one. Let’s put this together with the sensors. That means we are now actually
                in a world which we instrumented, meaning we have sensors everywhere, and those
                sensors basically tell whoever wants to listen to them more or less in real time what’s

                Social media versus social data: with social media, people listen to what users are saying.
                With social data, people watch what people are doing. Revealed preferences versus merely
                stated preferences is actually why I am a big fan of social data, the data people create
                and share, and not just social media which we characterized before as taking old content
                and pushing it through new pipes.

                In all of this, the starting point for me is the individual. Before getting to the building
                blocks, I want to spend a couple of minutes to tell you how I think about learning. For
                me, learning means to be able to make distinctions. For instance, the distinction between
                implicit data, data you observe, and explicit data people knowingly and willingly

                The distinction, when we talk about influence, between the attributes of a person or node and
                the attributes of an arc, or an edge, or a connection. It’s a very important distinction. We
                saw that both of them are used very differently in marketing.

                Symmetrical relationships versus asymmetrical relationships, exemplified by Twitter,
                which is an asymmetrical relationship. I can follow Jason without Jason following me,
                versus Facebook in its canonical way, which is symmetrical relationships. He has to
                confirm my friend request.
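
                The difference between the two kinds of relationship shows up directly in the data
                structure. The sketch below is a minimal illustration under assumed names (FollowGraph,
                FriendGraph and their methods are hypothetical, not either company's API): a follow is
                a one-way edge that needs no approval, while a friendship is stored in both directions
                only once the other side confirms.

```python
from collections import defaultdict


class FollowGraph:
    """Asymmetric (Twitter-style): an edge a -> b needs no approval from b."""

    def __init__(self):
        self.following = defaultdict(set)

    def follow(self, a, b):
        self.following[a].add(b)

    def is_following(self, a, b):
        return b in self.following[a]


class FriendGraph:
    """Symmetric (canonical Facebook-style): a tie exists only after confirmation."""

    def __init__(self):
        self.pending = set()             # (requester, recipient) pairs
        self.friends = defaultdict(set)

    def request(self, a, b):
        self.pending.add((a, b))

    def confirm(self, b, a):
        """b confirms a's request; the tie is then stored in both directions."""
        if (a, b) in self.pending:
            self.pending.remove((a, b))
            self.friends[a].add(b)
            self.friends[b].add(a)

    def are_friends(self, a, b):
        return b in self.friends[a]


twitter = FollowGraph()
twitter.follow("andreas", "jason")
print(twitter.is_following("andreas", "jason"), twitter.is_following("jason", "andreas"))

facebook = FriendGraph()
facebook.request("andreas", "jason")
print(facebook.are_friends("andreas", "jason"))   # request is still pending
facebook.confirm("jason", "andreas")
print(facebook.are_friends("andreas", "jason"), facebook.are_friends("jason", "andreas"))
```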

                Any other distinctions you felt were important this quarter, things you didn’t distinguish
                before but then realized that these are actually interesting distinctions? For instance
                about data companies, let me give you one example.

                There are data companies that are essentially tech data companies, companies that
                scrape stuff like Rapleaf. It’s a data tech company where the consumer really has no
                view but they have a bunch of engineers that write code that goes and grabs stuff. That’s
                a data company. Compare that to an explicit data company which makes it easy for
                people to create and share data. What would be an example of an explicit data

Audience:       Reviews.

Andreas:        Review sites, Amazon, Facebook, Yelp. That means if you build an implicit data
                company you have to have a very different strategy. You spend your money very
                differently than if you have an explicit data company. Let’s think about some of the CEOs
                we saw. What about Skout? Where does Skout fall?

Audience:       It’s more implicit.

Andreas:        Actually I would say more of an explicit company. Why would you say more of an implicit

Audience:       The data of the third party that I can read from is not explicit because people aren’t … by
                trends I can see where people are going but it doesn’t tell me this person is going,
                whereas with Yelp it says this food is great. I find out that bar is a good place by gleaning
                traffic there where people are meeting up and stuff, so it feels more that I have to take one
                step of analysis to really find out the value of the data.

Andreas:        Good point. All these distinctions tend to be ends of a spectrum. In this case you’re
                actually right; the implicit component of Skout compared to old-style Match.com where
                you essentially fire off database queries where you put in gender, hair color and stuff like
                this, and then you get something back. So it definitely has an implicit component;
                primarily the distance component between two people has entered there.

                Think about Thomson Reuters. Where do they fall? It’s mainly an explicit company.
                Both the Thomson part, where people get paid to actually create and curate data, but also the
                Reuters part where people around the world get paid to create and curate news -- versus
                Wikinvest, Mike Sha’s company. It’s a technology play. People don’t enter stuff. They
                enter once, their user name for their brokerage and their password. Then the rest
                happens by itself.

                The horizontal layer we have, those distinctions for data, we can also make another
                horizontal layer of marketing. Marketing basically has not only changed how we observe
                people making decisions, by having very different access to help you make decisions, but
                the main information of the way people make decisions has also changed. It’s no longer
                the glossy brochures that influence people primarily but it’s the social data aspect, the
                Yelp aspect and the social recommendations, much more than just looking up specs on
                the manufacturer’s website.

                Why should we care about all these things? Why should we care about these
                distinctions? Do you care? Maybe you don’t. I think you should care about them
                because only if you know what these axes are, those dimensions, can you actually
                understand what tradeoffs to make, whether it’s in your private lives -- privacy being one
                example; or whether it’s in how you want to run a company or what the company actually
                does in terms of its data strategy.

                There are many more distinctions and distinctions are actually kind of boring to learn but
                they’re super useful to have down the road as your map, space, or axes in the space so
                you can put things down and make predictions because you know not just the data points
                but you know what’s in the vicinity, the neighborhood of certain data points.

                Building blocks, our next point, let’s start with the individual. We spent a lot of time in this
                class talking about individuals. Any things you remember as interesting in our
                conversation about individuals, digital identity? Think about zapped coming in your class,
                Evan McMillen. Let’s hear some of the underlying dimensions or distinctions you

Audience:       How the location of someone can be used to describe much about him.

Andreas:        Location, which basically was nonexistent as data about people 10 years ago, now being
                maybe the most important feature about us. If we know how we move every day, every
                night; if we know how we move today compared to the other days, we actually know a lot
                about the individual.

                We actually had good conversations about identity. For instance, what does it really
                mean to have an identity? Is it that I’m just my tweets? My Facebook updates? Is there
                more to Andreas than his tweets, his carefully constructed identity? What do you think?

Audience:       I think where we find you is not yourself and what you put out there but more your
                connections… all the connections around and that’s what is really interesting to study.

Andreas:        … story that, compared to faking my Facebook circle of friends, it’s pretty easy to fake a
                passport. As we said if you or I go to China, all white guys look the same, versus if you
                grab my Facebook ID, that wouldn’t last very long before somebody busts you. Here’s
                another one, when you make a restaurant reservation, what name do you give?

Audience:       First name.

Andreas:        Why? Why do you say it’s for John? Then you arrive and say this would be the
                reservation for John for 7:00 p.m.

Audience:       Why would you lie about it?

Andreas:        Why would you want the restaurant to actually have a persistent history about you? It’s a
                tradeoff you can make. A friend of mine, Dave Holtzman, who wrote a book called
                Privacy Lost used to be a spy for the Americans in Russia during the Cold War. Dave
                Holtzman never gives his real name. It’s not …, just to be clear, different people. He
                always says some random name, and as long as that transaction is completed he’s good.

                It’s worth understanding where we need to have persistent identity, and when we don’t
                need it. Should we just nuke it and have random tokens which we exchange and after
                the transaction has been completed we are done?

                Here’s another thought. Aging data -- historically we are used to the fact that paper ages.
                You know the rare books collection in the library? They are careful about what light they
                put out there because they’re worried they will deteriorate over the decades or centuries.
                Bits don’t age.

                Do we need to invent some aging of bits? The blog post Sammy and Jason did, they
                were thinking about taking norms of the offline world, which they funnily call the real
                world, and putting them into the online world. Should we age our bits? Like every year
                10% of the bits get randomly flipped? Which of these concepts actually we use here?

Audience:       As more and more data gets created over the years, it’s going to be difficult to assess
                what is relevant in time, separating it, or analyzing how data has changed over time
                specifically. In that sense it might be worthwhile for the actual server to have a data … or
                something like that… age the data.
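
                The thought experiment of flipping 10% of the bits every year can be worked out
                numerically. The sketch below assumes each bit is flipped independently with the same
                probability each year, which is an assumption added here, not something specified in
                class. A bit differs from its original value when it has been flipped an odd number of
                times, which gives the closed form used in the function.

```python
def fraction_changed(years: int, flip_rate: float = 0.10) -> float:
    """Expected fraction of bits that differ from the original after `years`.

    With an independent flip probability p per year, a bit has been flipped
    an odd number of times with probability (1 - (1 - 2p) ** years) / 2.
    """
    return (1.0 - (1.0 - 2.0 * flip_rate) ** years) / 2.0


for years in (1, 2, 5, 10, 50):
    print(years, round(fraction_changed(years), 3))
```

                Under this assumption the damage saturates: the fraction of changed bits approaches
                one half, at which point the stored data is indistinguishable from random noise.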

Andreas:        What is our expectation? Do we expect to actually be able to find something like …
                asked me whether I could find an mp3 of a workshop I did at a WPP conference in
                Athens last year, where we did a workshop making predictions for 2020. I know I have
                the mp3 somewhere but it kind of ages by just having mp3s produced at the rate of a few
                hours a week. It’s very difficult to find stuff, so there is sort of a natural aging, it’s on my
                desk at home where things just disappear.

                There are many data sources and I believe space, geolocation and time are probably, as
                far as people go, the most important and pretty underestimated criteria of organizing
                social data. I don’t know how you retrieve stuff, but I typically remember it by where I
                was, like at this conference in Athens; when it was, maybe I look up my calendar and

    then maybe if it was all indexed with geolocation like all my mp3 files, I would have a
    much easier time finding them if they had extra files in there than if it is just by a file

    Other examples of data sources by people: mobile, geolocation, photos, dating -- who
    you click on or don’t. Think about the data people create in Craigslist. The attention
    stream, like Jason now texting somebody because it’s hard to keep your attention. The
    intention stream where you search something on Google or your favorite search engine.
    There is actually so much going on in terms of these data we create.

    Where did you come out with the question of whether you’d be more comfortable if all
    your geolocation, all your searches, all your header information about your
    communication was shared with the rest of the world? It doesn’t matter where you come
    out on this but it’s actually worth thinking about how different the discomforts we feel are
    in those three cases.

    From a company perspective, and I was talking to Alex in the coffee house a couple of
    days ago; he said that he was so surprised that Amazon.com was tracking all those clicks
    from people from the very beginning. Anybody else surprised about that at the
    beginning of class? Even Aldo is nodding. Are you still surprised about that?
    Well trained, everybody is shaking their heads.

    It wasn’t just that Amazon was tracking it but remember the distinction between a short
    click and a long click? A click is not a click is not a click. If the user clicks on something and
    returns after a second or two, then what you showed was not what the user wanted. Otherwise
    he wouldn’t have gone back; he would have studied what you presented to him. If
    on the other hand it’s a long click and he starts reading it, maybe even converts, buys
    that item, then you know you want to value what you presented to him very differently.
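
    The short-click versus long-click distinction can be sketched as a simple rule on dwell
    time. This is a hedged illustration, not Amazon's actual logic: the two-second cutoff is
    taken from the "second or two" mentioned above, and the Click fields and the weights are
    hypothetical values chosen for the example.

```python
from dataclasses import dataclass

SHORT_CLICK_SECONDS = 2.0  # assumed cutoff, from "a second or two"

# Hypothetical weights: a short click counts against what was shown,
# a long click counts for it, and a purchase counts most.
WEIGHTS = {"short": -1.0, "long": 1.0, "conversion": 5.0}


@dataclass
class Click:
    item: str
    dwell_seconds: float     # time until the user came back
    converted: bool = False  # True if the user bought the item


def classify(click: Click) -> str:
    """Label a click as 'conversion', 'short', or 'long'."""
    if click.converted:
        return "conversion"
    if click.dwell_seconds <= SHORT_CLICK_SECONDS:
        return "short"
    return "long"


def item_value(clicks):
    """Sum the weights of all clicks per item."""
    totals = {}
    for click in clicks:
        totals[click.item] = totals.get(click.item, 0.0) + WEIGHTS[classify(click)]
    return totals


log = [
    Click("book-a", dwell_seconds=1.0),
    Click("book-a", dwell_seconds=40.0),
    Click("book-b", dwell_seconds=30.0, converted=True),
]
print(item_value(log))
```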

    In terms of nodes, remember you were surprised when I told you that I created 500
    attributes for each customer at Amazon.com? Are you still surprised? Kind of. It’s
    interesting what you find, like whatever card they first use actually has an effect. Amex
    people tend to become better customers than people who use a Visa. That’s all about

    What I want to make sure you understand is the one point on how to characterize people
    from all those clicks. Remember, there are good things and bad things in the world.
    There are expected and unexpected things. An example for a good expected thing is
    your paycheck. An example of a good unexpected thing is your IPO. Maybe your rent
    falls here, and having an accident is an unexpected bad thing.

    Finding underlying attributes about people, this level of generality is actually a very
    important way of trying to build user interfaces. Let me give you an example. You know
    from Danny Kahneman and Amos Tversky that on average a dollar lost hurts you twice
    as much as you get happy if somebody gives you a dollar. There are all kinds of tests
    they do but it makes you twice as unhappy if somebody steals a dollar or if you lose a
    dollar than if you get a dollar.
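
    The factor of two can be written down as a small value function. This is a simplified
    sketch: it is piecewise linear, whereas the value function in Kahneman and Tversky's
    prospect theory is curved, and the factor 2.0 is just the average quoted above, not a
    constant that holds for every person or culture.

```python
LOSS_AVERSION = 2.0  # the "twice as much" factor quoted in the talk


def felt_value(dollars: float, loss_aversion: float = LOSS_AVERSION) -> float:
    """Subjective value of a gain or loss; losses weigh loss_aversion times gains."""
    return dollars if dollars >= 0 else loss_aversion * dollars


print(felt_value(1.0))    # gaining a dollar
print(felt_value(-1.0))   # losing a dollar hurts twice as much

# A 50/50 bet to win or lose one dollar has zero expected money
# but a negative expected felt value.
print(0.5 * felt_value(1.0) + 0.5 * felt_value(-1.0))
```

    Making loss_aversion a parameter is deliberate: it is one candidate for the kind of
    per-person attribute that a user interface could adapt to.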

    Overall, cultures and individuals -- huge interpersonal differences. I believe that how we
    look at the unexpected part is even more important. In the Social Data Summit when we

                talked about insurance, when we actually really deconstructed what insurance in a day
                where we know so much data about people, what insurance actually will -- can -- could
                mean, and cannot mean in the future; that is really talking about this cell here.

                I think in the valley most of us ignore that cell. We work super hard for this cell, that IPO,
                or that sale for a few hundred million dollars. But maybe in the Midwest they worry more
                about managing the downside as opposed to managing the upside. Trying to collect the
                data we have about people and coming up with a low-dimensional representation which will
                influence what products we offer to them is one of the things a number of companies are
                working on. That is it for what we have about individuals. Now social fabric.

Audience:       This may be off topic but I was thinking about the negative unexpected things happening.
                You used the example of passports a lot, and looking at the future I could see a lot of
                passports being exchanged by … fingerprint on anything, and people’s identities can be
                erased because there’s no trace of anything once we convert everything into data versus
                having ….

Andreas:        What do others think about this? You’re thinking people’s identities can be erased.

Audience:       As long as I have friends and companies recognize me then I haven’t been completely

Andreas:        Yes, what other opinions do we have here? Do we think that in this world where if I say
                this neutrally -- we have lots and lots of distributed information about us in the cloud, is it
                easier to let somebody disappear, to erase somebody, or is it harder?

Audience:       I think obviously the personal relationships you have, unless we get into the matrix you
                can trust people’s memories can’t be erased. I agree with Yusef that it’s going to be
                much easier because if all your music, your passport, even your phone contacts; they
                were on your phone. Now they’re up in the cloud somewhere.

Andreas:        So it’s more difficult to erase things?

Audience:       No it will be easier to erase things that you own whether it be music, a passport, physical
                things, but your actual identity is going to be you and your friends know.

Audience:       I can disagree because what we said before that we don’t own the data anymore that it’s
                somewhere on the cloud. Facebook might own it and … many more companies… just
                once… if you … erase everything you need to go to all those places and I think it takes
                more than just having everything in the same place.

Andreas:        That was a very good point you made because it really means -- sure you can erase one
                copy somewhere but it is becoming essentially impossible to erase the tracks of a person
                the way you could do it before by paying somebody off in your hometown to remove your
                name from the birth registry.

Audience:       What about changing the data you have? For example you were talking about if
                someone wasn’t given entry to the U.S. because of Amazon. Let’s say you were … and
                you want to enter the country and they say -- there are no passports, but you just give
                them a number or something. They say we look at the system and you’ve done this, so
                we’re not going to let you in. I feel it’s a lot harder to argue and make a case.

Andreas:        Good points, but we today, having the last class, have no buffer at the end. Why don’t
                we take this conversation for our drinks afterwards. Is that okay? I do want to talk in this
                part of data about the business models, and remind you of this beautiful distinction Jay
                Nath made, the Head of Innovation of the City of San Francisco.

                People used to pay for data. Official Airline Guides being one example, or Acxiom, lots of
                examples. Now by liberating data, we open up the world to come up with very different,
                new business models. ParkSF being an example, where people build apps on top of the
                free data, the data we pay for with our tax money, and there’s innovation which is so
                much faster than what we could have ever seen before.

                Another example here was Westlaw, part of Thomson Reuters, where they went into
                the courthouses and tried to create electronic records of all these cases. That’s how they
                made their money, you subscribed to access to the data. The question is when the
                courts will make all those data publicly available, what will the company do that will
                actually create true value for the consumer beyond the value of access to the data?

                I now want to have a short exercise where you get to talk to your neighbors, for five
                minutes. I want to frame it this way. We have seen a whole bunch of trends that were
                contrasts of the past to the present. Like we went from a transaction economy to a
                relationship economy. I want you to talk to your neighbor and each pair to come up with
                one of those trends which is enabled through social data. One of those trends which
                might be relevant for business. To put something in your minds, remember the
                conversation we had about the DNA of companies, that companies have very different
                attitudes toward their customers’ data, and how that is reflected in the P/E ratio. A short
                exercise, come up with one trend you see right now that you can express. We’ll go from
                X to Y, like from transaction economy to a relationship economy.

                Let me give you another couple of examples. You probably came to the class thinking
                that I’ll emphasize the importance of analyzing the data and you’re probably leaving the
                class after the quarter realizing it’s not so much about analyzing data but about creating
                and distributing data. In a data poor world, where the bottleneck was how do we get
                data, indeed it was about analyzing, trying to wrench the last bit out of the data. In a data
                rich world, it’s creating incentives for people to provide you data by giving them value.

                I want people to talk for a couple of minutes and to come up with one thing where you
                think the world has changed, in some way related to data. I have my list here but I want
                you to think about that. Okay. Let’s hear a few. I want to have arrows between -- like
                from transactions to relationships. Can I hear pairs of words?

Audience:       Data asymmetry to data symmetry. Before where you had industries that revolved
                around you knowing nothing and the sales person knowing everything, it moved to where
                now everybody can know everything because they can share with each other.

Audience:       Wikinvest

Andreas:        Wikinvest. Used car salesman, all of that stuff, airline pricing. Good one.

Audience:       In terms of data retrieval, from push to pull.

Andreas:        Not only for data retrieval but think about marketing also.

Audience:       Could you explain that one?

Audience:       In the past you read news that was printed in the newspaper, so people selected it for
                you, and now you can pull the information that you are interested in.

Audience:       Order of none to order of one.

Andreas:        Wow.

Audience:       … like five years before when I go to a business and … I literally … didn’t get it. …. Right
                now I am valued as a customer because I can go and … and that data is going to
                be persistent throughout. Now every single time I go that person is aware, the seller is aware
                that I could … selling experience, the seller … money, but don’t give me the negative
                rating. This was not possible in ….

Andreas:        Great, and a wonderful way of framing it. See, if you’re a second year business school
                student, you know you’re really crisp at putting things nicely. It reminds me I live in The
                Castro and there’s this Indian place on Market Street. I only knew them from delivery.
                And I was walking by, by chance, and then … said that’s the place with the great tandoori
                chicken, so I decided I’d go in there and tell the guy what great chicken he makes
                because I really love their chicken.

                I go in and say I really love the food you deliver. He said why are you telling me? You
                should tweet about it. It’s amazing the shop owner saying why are you telling me. Tell
                Twitter. The power of none, the power of one.

Audience:       In terms of just the sheer amount of data going from essentially too little to potentially
                data overload, so the implications as you mentioned before, in the past you’re struggling
                to find ways to generate data and trying to grab data from wherever you can. Now it’s
                you have so much data you’re generating that the problem is separating the noise from
                what’s important so there’s implications for your feeds and how you have to digest
                information, how you analyze it in terms of figuring out what’s important versus what’s not
                because there’s so much data out there. The relevancy becomes such an issue.

Andreas:        Whenever we look at something and say gee, look how it’s changed, it’s also important to
                have a column where we write stuff that hasn’t changed. I think our “Oh my God, this
                information overload” thing goes back to probably when we got out of caves for the first
                time, or maybe we were still living in caves. I think it’s very interesting that there are
                papers in the ‘50s by this guy who wrote a book on The Attention Economy, Gold Harbor
                (ph.), he wrote a paper in the ‘50s before we were born actually talking about information
                overload. I think gee, what did people have in those days?

Audience:       I was going to go on the previous point.

Andreas:        Let’s have another couple of pairs from your discussion.

Audience:       I don’t know how to say this in succinct words but something like no strings attached to
                strings always attached.

Andreas:        Good one too. NSA -- what’s funny about it, isn’t that what National Security Agency
                stands for? NSA to no such thing as no strings attached. Is that what you mean?

Audience:       … a pack of cigarettes, you’ll forever be bombarded with ….

Andreas:        Yeah.

Audience:       Unstructured to structured.

Andreas:        Or structured to unstructured, which I think is more. The amount of data we create, the
                relative amount of unstructured data certainly increases.

Audience:       … APIs that …

Andreas:        It’s a good question. I think you can frame it -- API is a good one. I was just thinking that
                the last time I had to enter things in a structured way in a form, like the National Science
                Foundation has when you apply for a grant there’s literally a booklet of 50 pages. Please
                look up all the numbers that describe the fields of interest to you -- flipping through 50
                pages, nobody does that anymore.

Audience:       I’m not sure if it’s a good one, but in terms of algorithms, it changes from average
                algorithms to really smart ones because the amount of data we have is increasing.

Andreas:        Let me play devil’s advocate. Say the opposite. The more data you have, the dumber
                your algorithms can be. But the algorithm takes more data sources into account, for
                instance recommendations might now be looking at the situation you’re in, your heart
                rate, the air quality in the room, how many people are here. I think if you have this notion
                of a box being the algorithm and data going in, and recommendations coming out, I think
                the better data you have, the more data you have, the dumber the box can be. I see your
                point as well.

Audience:       Going from machine to human learning. Before it focused on the algorithms … to human
                learning… data… visualization or it’s about augmenting learning where there are
                machine tools but you use that as an analysis layer and there’s a human behind doing

Andreas:        Yeah.

Audience:       We’re talking about reliability of data to unreliability as more data is generated and its
                application in the real world day-to-day. Going back to the early… if I called a friend on
                my landline and told him to meet me tomorrow at 10:00 at night, I know he’s going to be
                there because there’s no other communication. But now with mobile, the flake factor is
                higher. I can call and cancel. Or if I send out an invitation on Facebook, that RSVP
                doesn’t have the same value anymore because it’s so transient. The data is unreliable
                because of the amount of speed.

Andreas:        Sometimes I say we go from an accounting mindset where people focused on getting that
                one data point right like for the earnings announcement, to the social data mindset where
                it’s much more like a dialog, perspective, and co-creation, and you never know whether
                any individual data point is right.

                One point to business, the impact on organizations; that is that the weakness I see in
                companies I work with is they don’t spend enough time really understanding the
                tradeoffs, really understanding the metrics and seeing how certain actions are predicted
                to affect a set of metrics.

                For instance, Friendster had a model where they said if we change this, we expect the
                relative conversion rate to go up by 14%. And actually it went up by 20%. Is that a good
                thing or bad thing? Conversion means -- you know what conversion is, right? The model
                predicts if I do this something goes up by 14%, it went up by 20%. Does that person
                expect to get a pat on the shoulder saying well done young man?

Audience:       It depends on the percent error on the 14% prediction. Is a big part chance or they don’t
                actually understand their inputs very well and that’s bad.

Andreas:        In principle, the guy did a bad job because his model wasn’t happening. He predicted
                something and something else was the case. The fact that what was the case tends to
                be better for the business is not really the evaluation criteria for him. Metrics, and as I
                mentioned to you I spent days with Jeff Bezos going through the company, metric-fying
                the company, instrumenting the company. It’s something which I can only really want
                you to know how important that is. Very few CEOs deeply understand the importance of

                Experiments make no sense if we don’t have a good set of metrics in place. We can play
                with stuff all day long. If we don’t actually learn from it, if we don’t know what’s
                happening, might as well forget about it. So metrics first. Zynga was the example I gave
                you in class, a company which is extremely well metric-fied. You see that in the market
                cap and valuation people put to it.
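
                The Friendster question above can be made concrete. The following is a minimal
                sketch, not something shown in class: it judges a forecast by how far the outcome
                landed from the prediction, not by whether the outcome was good for the business.
                The function names, the standard-error values, and the two-standard-error band are
                assumptions for illustration; only the 14% and 20% figures come from the lecture.

```python
def relative_forecast_error(predicted_lift, observed_lift):
    """Signed miss of the forecast, as a fraction of the predicted lift."""
    return (observed_lift - predicted_lift) / predicted_lift


def within_forecast_band(predicted_lift, observed_lift, std_error, z=2.0):
    """True if the outcome falls within z standard errors of the prediction.

    std_error is the uncertainty the modeler attached to the forecast;
    both it and z are hypothetical inputs, not values from the lecture.
    """
    return abs(observed_lift - predicted_lift) <= z * std_error


# Figures from the lecture: predicted +14% relative conversion, observed +20%.
predicted, observed = 0.14, 0.20

miss = relative_forecast_error(predicted, observed)
print(f"forecast missed by {miss:.0%} of the predicted lift")

# With a tight stated uncertainty the model is judged wrong, even though
# the business outcome was better than predicted.
tight = within_forecast_band(predicted, observed, std_error=0.02)
# With a looser stated uncertainty the same outcome is consistent with the model.
loose = within_forecast_band(predicted, observed, std_error=0.04)
print(tight, loose)
```

                This also captures the audience point: whether 20% against a predicted 14% is a
                miss depends on the error attached to the prediction.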

                Now we talked a lot about how social data has changed the world, basically. I know
                there are more but give me some examples where you think that data, social data has
                not changed the world. Where, what would be verticals? Remember my distinction
                between horizontal -- data being horizontal or marketing being horizontal versus
                verticals? Insurance is one vertical. Which verticals do you think haven’t been
                revolutionized by social data, maybe not yet?

Audience:       I think transportation hasn’t changed much yet in terms of… cars communicate
                what’s happening outside.

Andreas:        Let’s call it mobility. We have a group of three people who work on mobility here and
                Sammy here has some contact.

Audience:       Healthcare as well.

Andreas:        I think healthcare --

Audience:       I guess what I’m trying to say is with … healthcare, the areas are more heavily regulated,
                are seeing a slower ….

Andreas:        But I think in both cases, after having listened to me for 17 and three-quarters of lectures,
                you can see how social data will actually change those verticals. I’m very curious about
                what you guys are going to come up with in the mobility stuff. Healthcare, I think we all
                understand that by just instrumenting our bodies, quantifying ourselves, we can see
                things that are going wrong much faster.

Audience:       The energy sector.

Audience:       Education.

Andreas:        Expand please.

Audience:       … identifying the learning experience in kindergarten or elementary school. There’s a lot
                of things you can do and get … into the system but we’re not there yet.

Andreas:        There are some places. Khan Academy for instance, runs in the Los Gatos or Los Altos
                School District not far away from here. Runs some experiments where each individual
                student in class is carefully monitored in how well he does in adding numbers. Then the
                teacher has a dashboard where now it gets recommended to the teacher to pair up
                student A with student B because A is kind of slow and B is already bored. Why don’t we
                have B actually explain to A? That is very difficult for me in a classroom here: to know
                who is about to nod off and who is just thinking “what’s that guy talking about” and
                nodding off for that reason.
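
                The dashboard recommendation described above could be sketched roughly as
                follows. This is an illustration only, not Khan Academy's actual system: the
                mastery scores, the two thresholds, and the function name are all hypothetical.

```python
def recommend_pairs(mastery, slow_below=0.5, bored_above=0.9):
    """Pair students who are ahead with students who are struggling.

    mastery maps a student name to a score between 0 and 1 on the current
    skill (for example, adding numbers). Thresholds are assumed values.
    Returns a list of (tutor, learner) tuples.
    """
    # Weakest students first, so they are matched with the strongest tutors.
    learners = sorted(
        (s for s in mastery if mastery[s] < slow_below), key=mastery.get
    )
    tutors = sorted(
        (s for s in mastery if mastery[s] > bored_above),
        key=mastery.get,
        reverse=True,
    )
    # zip stops at the shorter list; unmatched students stay with the teacher.
    return list(zip(tutors, learners))


scores = {"A": 0.30, "B": 0.95, "C": 0.70, "D": 0.40, "E": 0.92}
pairs = recommend_pairs(scores)
print(pairs)
```

                Student C, who is neither slow nor bored on this skill, is left unpaired.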

Audience:       The financial services sector… we should be able to either eliminate or make things…
                cyber trading if we use social data properly. Secondly, in research because we still
                depend on the traditional conference model of sharing our results, whereas it could be
                more streamlined if I had access to the lab ---

Andreas:        I completely agree. For finance, with Wikinvest, we actually saw in the other examples --
                Mint being an example, where I think finance is probably going to change. As you
                probably noticed at the dinner with our Fidelity Investments head of innovation last
                Thursday after the lab, those guys were scared about how can they still tell people “I
                really have that mutual fund for you. It’s just the perfect fund for you” if most people now
                know it’s not really the case. Finance I think is ready.

                Research, I believe that it’s a beautiful example of something which was built around
                constraints that have disappeared. Traditionally it was that a professor has to do a bunch
                of things because communication was expensive. He had to teach class. She had to
                collect data. They had to analyze the data. They had to grade, etc.

                I think in this world we’re living in, those things are no longer in a personal union. The
                German word is … are no longer unified in one person. But you could really think about
                having the grading done in India for -- I know Jason might be doing that. The collection
                of data should be decoupled from the analysis of data.

                In 1991, at the Santa Fe Institute, a think-tank in New Mexico, Neil Gershenfeld, then
                Junior Fellow at Harvard, now professor at MIT’s Media Lab, and I ran a competition. We
                put up data from different labs and let the world analyze those data. I think for me, I’m a
                big fan of breaking down or decomposing problems, of rewriting not only the equations of
                business but also the equations of research.

                I was going to run through a number of verticals like making money, certainly has been
                affected by social data. Think of BranchOut, LinkedIn, how people find jobs. Think of
                future of work. Spending money has been dramatically affected. Think of social
                recommendations. Spending attention, getting attention; getting love, giving love; creating
                stuff, having people collaborate, all those things I think we have reached a very different
                world from what we had a couple of years ago.

                What I want to spend the last couple of minutes on with all the thought experiments, what
                if all the data you created would be there forever, I have a simple question for you. Why
                are you still here? Why did you come to class and why are you still here? I delivered
                some of the people I promised in the first class. Remember I went through that list of
                speakers. By the way, who was your favorite speaker?

Audience:       “Vladimir!”

Andreas:        He was not on the list. I want to understand, and thank you for giving me the feedback
                last week. I looked through what you had. But I want to actually ask more about why
                you’re still here, besides it’s only now that the food arrived. Was it for the distinctions we
                made, was it that we thought about behavior change, was it because of my funny accent?
                Was it about the amazing people we had in class? That’s actually one major reason why
                I’m here, because of you. What made you not come in the first place, but made you stay
                through 18 classes?

Audience:       I’m personally still fascinated about the stuff that can be done with social data and it feels
                like I’m just scraping the surface of it. It seems like every time….

Audience:       This class … other classes I’ve taken have probably … it’s not so much that academic
                but giving me a view into ….

Andreas:        Somebody said in the feedback that the cool stuff about this class was that “it’s nice to
                learn about people working on industry right now rather than a decade ago.” By the way,
                it’s much easier to teach quantum mechanics or linear algebra, and I love quantum
                mechanics. I love linear algebra. I told you in the first class that that’s what most of you
                expect; that A builds on B builds on C or maybe the other way around. That is very
                difficult in a class where we really stretch across such an interdisciplinary area of fields. My
                goal is that I wanted to convey to you some of the passion I have for that field, some of
                my love for ultimately sharing with you and having you realize just how important a part
                this is for our future here, what we’re doing. It’s not just about money, it’s about meaning.
                It’s sort of figuring out how technology actually enables and influences the world we live

Audience:       The most important thing is that it’s happening right now and … a lot of things that you
                learned, examples you gave that just happened. Right now it’s so exciting because you
                see things we talk about to be now in the next class. If something’s changed, you hear
                about new companies that do things completely differently and their new services. It’s
                the most exciting part.

Andreas:        I think ideas are not created in a vacuum. My goal here is I want to maximize what your
                potential really is. That was my reason why I shared so many of the things I believe in
                with you. As you know, it’s more about sharing dimensions, sharing tradeoffs, and
                having you realize that some of these axes exist.

                Some of the things are non-negotiable, like the importance of metrics or experiments.
                But how we feel about privacy, I certainly learned quite a bit from you, Jason. In that
                spirit, I hope I managed to inspire you a bit, that I managed to give you some ideas, that
                maybe you’ll think a bit differently about the world you live in, the world we live in from
                when we met at the beginning here, 19 classes ago. I hope you’ll go out and make the
                world a better place. Let me know if I can help you in some way with my connections or
                my knowledge or maybe just with drinks which are waiting for us downstairs. Thank you
                very much for this course.
