					                       Transcript of Andreas Weigend
           Data Mining and E-Business: The Social Data Revolution
                   Stanford University, Dept. of Statistics

Data Mining and Electronic Business: The Social Data Revolution

June 1, 2009

Class 8_1 Ads: (Part 1 of 2)



Transcript by Tamara Bentzur,                                              Page 1
Andreas:           Ladies and gentlemen, welcome to the 8th class of Data Mining and E-Business, or the
                   Social Data Revolution, here at Stanford University, spring 2009. We have a big agenda
                   for today. I want to start by reflecting on what we did last week and taking stock of where
                   we are on homework.

                   I would like to start by asking the three of you who helped last week with the Twitter
                   Hack-a-thon on Tuesday evening, Chris Anderson, Emir [0:00:32.7 unclear] and
                   [0:00:33.8 unclear] to tell us what you learned, how the experience was and basically
                   discovering people on Twitter. What is happening there?

Chris:             We spent a lot of time helping people set up their Python installations, and to get all their
                   necessary libraries installed. We also covered some topics like using sets to cover
                   people who may be your friends but not someone else’s friends so you could use an
                   intersection to figure out what’s going on there.
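
The set operations Chris mentions can be sketched in a few lines of Python; the handles below are invented for illustration:

```python
# Sketch of the set-intersection idea: given the accounts two users follow,
# set operations surface shared interests and candidate recommendations.
my_friends = {"alice", "bob", "carol"}
their_friends = {"bob", "carol", "dave", "erin"}

# Accounts we both follow -- evidence of shared interests.
common = my_friends & their_friends          # {'bob', 'carol'}

# Accounts they follow that I don't -- candidates I might want to follow.
candidates = their_friends - my_friends      # {'dave', 'erin'}

print(common, candidates)
```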

Student:           That’s pretty much it.

Student:           It was good to get it up and running, from a technical standpoint. A lot of people were
                   having trouble, like Chris said, just getting Python up and running and installing the
                   necessary libraries.
Chris:             We also covered a little bit about methods for collaborative filtering. There were some
                   listed on the wiki, like actual collaborative filtering, so you say people who follow things
                   like you; you should be following the same people, or using @ replies to figure out who
                   your friends are talking to. There were a couple of simple methods to figure out who
                   might be interesting to you, as a user.

Andreas:           Did you find any interesting solutions from the students? When you showed it to
                   your friends, did you discover that there were ways of discovering people that
                   worked better than others? What was the upshot of the Tuesday evening with
                   pizza and beer?

Chris:             I think we were just trying to get people up to speed. I think the most interesting solutions
                   were combinations of more than one of them. We might look at people who your
                   friends follow but you don’t follow to see how many of those are also the ones that
                   your friends talk to. If you could combine two of those solutions then you are
                   going to get better results.
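
A minimal sketch of combining the two signals Chris describes, who your friends follow and whom they @-reply, using toy data in place of real Twitter API results:

```python
from collections import Counter

# Hypothetical data: who each of my friends follows, and whom they @-reply.
follows = {
    "alice": {"dave", "erin", "frank"},
    "bob":   {"dave", "frank"},
}
replies = {
    "alice": {"dave"},
    "bob":   {"dave", "erin"},
}
me = {"frank"}  # accounts I already follow

# Score candidates by how many friends follow them AND talk to them;
# combining the two signals should rank candidates better than either alone.
score = Counter()
for friend in follows:
    for user in follows[friend] | replies[friend]:
        if user in me:
            continue
        score[user] += (user in follows[friend]) + (user in replies[friend])

print(score.most_common())  # 'dave' ranks highest: followed and replied to by both
```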

Andreas:           That’s actually one of the stories we’ve seen again, and again in class. It’s not
                   about the smart algorithms, often, but it’s about coming up with better ways of
                   using data and incentivizing people to actually create data about themselves. Is
                   there any student feedback? Have you actually shown this to ten of your friends, the ten
                   suggestions? I know it’s not due yet, but if you have some feedback, it would be great to hear.

0:02:50.4          Don’t start with it on Wednesday if it’s due on Thursday because it takes a while to
                   assemble those friends on Twitter, here.

Student:          I figured out the ones for myself. It actually worked reasonably well. I did what I was just
                  talking about, and it did uncover some people that I feel I should be following. There was
                  some success. I’m not sure if the rankings were exactly right, but there are a few in there
                  that I would definitely put in the top ten people I should follow.

Andreas:          One remark we saw in the last class was the difference between following
                  someone; there is basically no cost for the person who is being followed. It’s a
                  question of recommending, like recommending songs or stuff like this, versus the
                  Facebook or MySpace problem where if you actually make friends with somebody
                  and it’s mutually symmetrical, then there is a cost for the other party involved.
                  What is the objective function? What is the cost function for the entire network,
                  which in this case totally decomposes; people being followed is not that big a deal
                  for most of them.

Student:          I don’t really have that much else to add. Chris summed it up pretty well. It was pretty

Student:          The same for me, I think people exchange their friends, basically followed other people in
                  the class. That made it easier to get the required followers.

Andreas:          Let’s give them a hand for helping out in class. [Applauses]

                  The homework is due on Thursday and I hope it will be a good learning experience for
                  you to talk with your friends and figure out what works and what doesn’t work.

                  Ron, are you ready? Should we swap this? Matt, do you want to talk with us about
                  what’s happening with the Berkeley/Stanford contest and if you just use the mic.

Matt:             This is from the contest that we were doing earlier in the quarter. These are the current
                  stats for Saturday. As you can see; they’re still going pretty strongly. The highest page
                  views - the average over the last week was 315. That’s pretty good. There are definitely
                  interactions going on with Would You Rather; it seems they’re kind of spanking everyone.
                  They’re at 75, the next highest is 9.7. You can keep following it.

Andreas:          We should say hello to your Ustream followers here. Hello. We had the same problem
                  set at Stanford and Berkeley. We are trying to figure out who is able to have better
                  metrics of engagement. It’s not just about attention acquisition; it’s also about attention
                  retention. I love the rhythm of that, “attention retention.” The metrics we defined were
                  robust metrics that allow us to figure out how the intentional attention was going.

                  If I read this color coding right, the top one… eight. We are safe here, given that the reds
                  are Stanford.

Matt:             The reds are doing better than the blue, on the whole.


Andreas:          That’s why you’re wearing a blue t-shirt. Are there any comments? This will be up, and
                  as you know, whoever is going to be first on October 31, I’ll invite them for a nice dinner
                  with a couple of cool people. Keep up with the good work here. Thank you for putting
                  this up.

                  Ron Chung, it is about ten minutes into class today. Ron was so nice and has actually
                  set up the prediction market for us, in terms of background. Ron did his undergrad at that
                  blue school. He also did his grad work at that blue school over there but saw the light
                  and went to [0:07:01.5 unclear] in Singapore to do his MBA. He has a really cool startup,
                  about discovery stuff. You had a posting on the website, right?

Ron:              Right, for anyone who was interested in an internship to do some software development
                  work. We’re looking for some help.

Andreas:          He helped out a lot here. We have no TA in the class so he helped us out. He said the
                  prediction market is something that would be super at [0:07:30.5 unclear] and I’m turning
                  it over to you.

Ron:              I’m sure you guys have already done the prediction market homework so you are familiar
                  with the prediction markets and how you can use a lot of feedback from people to figure
                  out more information about what other people think. Hopefully, collectively, you can
                  come to a more probable answer. It’s not going to give you the exact answer all the time,
                  but hopefully something closer.

                  One of the things we wanted to do at 225 was to figure out how many people were in
                  class today, because one of the games that ended already was based on the number of
                  people in class. I guess if you don’t mind, I’ll do a quick count, or if someone could
                  volunteer to help me count. This way we can move forward.

Andreas:          Did you specify whether you were counting me, as well? People ask about the three
                  people for [0:08:18.5 unclear] at the back. It’s all tricky.

Ron:              Since I wasn’t explicit, I’m going to assume a generic “people.” Any person who is alive
                  and breathing will be counted. 39 – You can check the results in terms of who had stock
                  in that range. The payoffs will be paid out accordingly. Again, there is another game for
                  the number of computers in class, right after the break.

Andreas:          At 3:55, we’ll count the computers so you can still hook up some friends outside and
                  collect some computers or move some outside, or whatever.


Ron:              In general, the point of this exercise was to help you better understand how you
                  can use prediction markets to solve and provide answers to certain problems that
                  you might encounter, and then to have a good way to measure that response from
                  people, and to be able to test and reward them for helping out. It’s a lot about the
                  incentive systems.

                  Do continue to play; we’re going to finish on June 10 and hopefully, throughout the
                  experience, you will figure out what tricks people do to manipulate the markets; that
                  happens in the stock market all the time; and you will get more familiar with rational
                  decision making. One of the things you will notice is that people are very overconfident.
                  They might think, “This will hit this problem,” and the vast majority will probably try a bit
                   high. Some of those effects may be things you notice in the prediction market as you
                   play.

Andreas:          Are there any comments from the students about the prediction market, in terms of
                  framing it for class? Prediction markets are a way to elicit data from people and as you
                   know, trying to get more data is always a good game. The prediction market is
                   not like MTurk, where we collect these things independently; with the prediction
                   market there is a price. You interact with each other. It’s similar to what’s
                   happening on the wiki.

                   For me, that was an important realization that this price-discovery mechanism,
                   which is essentially what happens in a prediction market, is yet another way of
                  understanding something about the future of finding out what people are thinking
                  and of having a predictive character. This is different from having a statistical
                  model of the number of people who were in the past classes.

                   The next item here is YOBO. I organized a dataset from a company in China called
                   YOBO. YOBO is asking people – and they have millions of users by now – to answer
                   17 questions about their music behavior and their personality. Those questions are at
                   the bottom of the problem set, on the wiki page.

                   There is a belief that there is some predictive nature in understanding where people fall
                   in this 2-to-the-something times 3-to-the-something space. I’ve forgotten; I think it’s 8
                   binary questions, so it would be 2 to the 8, times 3 to the 17 minus 8. Those are
                   basically edges, corners, or points in a space.

                   Each of you who has answered those 17 questions falls into one point. If those
                  questions are well chosen, then there is a mapping from those questions or points onto
                  songs that would be recommended.

                   I gave you a relatively manageable, small dataset on the order of 100,000 people. There
                   are many more than that, but in order to understand what the problem is you don’t need
                   millions of people. In that problem, you have, for each user, the seventeen values, i.e., the
                   seventeen answers to the questions. Then, you have the songs they liked. There are
                   also the songs they didn’t like and the songs they skipped. Basically, you lump together
                   people who answered the same questions the same way and find a mapping from those
                   onto the songs.
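
Assuming each record in the dump holds a 17-answer vector plus the songs that user liked, the lumping-together step might be sketched like this; the space size follows the 2-to-the-8 times 3-to-the-9 figure from the lecture, and all user data below is invented:

```python
from collections import Counter, defaultdict

# Size of the answer space described in the lecture:
# 8 binary questions and 9 three-way questions (17 total).
n_points = (2 ** 8) * (3 ** 9)   # 5,038,848 possible answer vectors

# Toy stand-in for the YOBO data: each user is a 17-tuple of answers
# plus the songs they liked.
users = [
    ((0,) * 17, ["song_a", "song_b"]),
    ((0,) * 17, ["song_b", "song_c"]),
    ((1,) + (0,) * 16, ["song_d"]),
]

# Lump together users with identical answers and tally their liked songs.
by_point = defaultdict(Counter)
for answers, liked in users:
    by_point[answers].update(liked)

# Recommend to a new user at a known point: the group's most-liked song.
new_user = (0,) * 17
recommendation = by_point[new_user].most_common(1)[0][0]  # 'song_b'
```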

0:12:54.0         One way to predict, for a new user who is just arriving, what songs he might be
                  interested in is to show him what other people at the same point in this
                  seventeen-dimensional space are interested in. The CEO of the company, Adam
                  [0:13:10.6 unclear] calls this “music DNA,” as one way of looking at it. It’s very
                  different from what we have seen in the problem sets we had before. On the
                  Twitter recommendations we don’t have 17 standardized questions for people. We
                  don’t have a social graph or demographics.

                  On the Delicious recommendations, again, we don’t know anything. People don’t
                  fall into a space. We only have the space of the URLs. That’s the new feature
                  about this problem set, that people first do a survey.

                  What we did was split into training data and into testing data, 50/50. You build a model
                  on the first half of the data and then you evaluate the model on the second half of the
                  data. It is not easy to evaluate a model because the space is a very high dimensional
                  space. There are hundreds of thousands of songs. Because the space is high
                  dimensional in the attributes also, the probability that a few people will have exactly the
                  same songs is pretty small. That’s the difference between live data – if we actually run
                  the algorithm live and see how happy people are, are the songs presented being skipped
                   or do they actually listen to the songs – versus dead data, like the Netflix contest and
                   what we’re doing in the problem set here, where we just give them what people have
                   done in the past.
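
The 50/50 split described above can be sketched as follows; the records here are placeholders for the real user data:

```python
import random

# Hypothetical user records; in the real problem set each record would hold
# the 17 answers plus the liked/disliked/skipped songs.
data = list(range(100000))

# 50/50 split into training and testing data, as described in the lecture.
random.seed(0)           # fixed seed so the split is reproducible
random.shuffle(data)
half = len(data) // 2
train, test = data[:half], data[half:]

# Build the model on `train` only; `test` is touched once, for evaluation.
print(len(train), len(test))
```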

                  Here is the problem set, for those of you who read Chinese; these are originally Chinese
                  questions. Above them, and apparently with a good translation, is the English translation
                  of the questions. For instance, “I’m a talkative person,” etc. There were five steps that I
                  gave you in this problem set. It’s entirely voluntary because we have no TA to grade it for
                   you; however, Sean [0:15:09.4 unclear], who was a student in class last year, is currently
                  interning at YOBO in Beijing. He basically is available. He put his email on the wiki and
                   it was too early; I hoped to have him in on the phone, but given that it’s 5:30 in the
                   morning in Beijing right now, this was beyond what he was willing to do on a Tuesday.

                   The idea is to find a mapping from that seventeen-dimensional space into the songs;
                   have as a baseline whatever the people in the training set have said about these songs,
                   and see where you can improve on that.

0:15:47.2         Those seventeen questions at the bottom came from some professor dude somewhere in
                  Texas, I think. He probably had his 5 Psych 101 students that he tried it on, maybe 100,
                   50, or 20. Now you have millions of people, so it allows us to also go back to the space
                   and figure out if we can do better. For instance, some of these questions are kind of
                   collinear, trying to go after the same thing. Should we drop one of the three questions
                   that try to go after the same thing, or should we combine them; how should we combine
                   them? Should we maybe do a principal component analysis, boiling the
                   seventeen-dimensional space down to maybe five dimensions? How do the [0:16:30.5 unclear]
                  values drop? How much variance do we capture? Or, is that not even the right thing, for
                  variables that are actually binary?
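
As one illustration of the PCA idea, here is a sketch on random stand-in data. The real input would be the 100,000-by-17 answer matrix, and, as the lecture itself cautions, PCA may not even be the right tool for binary variables:

```python
import numpy as np

# Random binary stand-in for the answer matrix (rows = users, cols = questions).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 17)).astype(float)

# Center the columns, then compute PCA via the singular value decomposition.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s ** 2 / np.sum(s ** 2)   # fraction of variance per component

# How much variance would five dimensions capture?
top5 = var_ratio[:5].sum()
print(f"top 5 components explain {top5:.1%} of the variance")
```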

                  Those are questions for you - optional dataset, voluntary – if you want to explore a bit
                  more. I would love to be an advisor to your work, would love to discuss it with you, and I
                  would like to set a time. Since we don’t have class next Monday, I could potentially do an
                   office hour on Monday for people who have questions about this or about other things, at
                   the usual time when we are normally in class. If it’s appropriate to offer an office
                   hour during finals week, then I would like to do that.

                  Who would potentially be interested in coming? I will put it up on the wiki. If there are a
                  couple of people I will come down from San Francisco and be available for you.

                  Are there any other open items from your side, before we turn to the main attraction of
                  today? No questions? I’m disappointed that I have this page for you with wishes that I
                  made weeks ago, wishes by students. This page about wishes for course content,
                  guests etc., nobody writes anything there. [Laughter] What more do you want?

Student:          I had a question about the requirements about the survey.

Andreas:          We are doing so many interesting things here. Do you remember the survey we did? It
                  was 25 questions. We got about 106 responses which I had you look at. Some of them
                  were super interesting. I was actually very surprised about some of the answers. Some
                  people told me that the technical homework, the Python stuff, is too far away from them.
                  I don’t want to point anybody out here. That is totally understood. We have 10 people
                  from the graduate school of business, and that notion about programming Python didn’t
                  come as a requirement for them. I said for class “web programming,” but that’s okay.
                   Some people are very good. For business school people, or other people who prefer not
                   to do the quantitative stuff, we said that rather than forcing you to do it, you can look at
                   some of the groupings we have: Enrique put together the 25 questions into
                   5 sets of 5 each. These are sets A, B, C, D and E.

                  I made three slots. For set A, which consists of the question, “How did you first hear
                  about the swine flu outbreak,” that was 20% through television, 40% through Internet,
                  and the rest was through friends, etc. The second one, the biggest surprise was
                  everyone on the social network - those are all about discovery in the social media space,
                  discovery in this day and age. I want those three people here, who signed up for this one
                  already to come up with a single, deep insight that we can look at.

0:20:48.2         You put your names down because I believe we don’t do things anonymously. People
                  should be willing to [0:20:56.6 unclear]. Enrique said we want to have one graphic
                  visualization. Please, no pie charts; we talked about that already. If you have some rich,
                  additional data, meta information, this is your space. A lot of people will be looking at
                  that. We have so much in that survey. Tease some stuff out. In order to make it
                  palatable, we decided to break it down into five sets of questions. I made three slots for
                  each of them because I didn’t want everybody to hop on the first one, and then the
                  second through fourth set would not be filled.

Enrique:          Clarify… blogs, newspaper articles, something … public facing and accessible to the
                  general public….

Andreas:          Did this answer your question? Okay.

                  We are one minute early, but that’s okay. You can fill in an extra minute, or I can give
                  you a long introduction. It’s a pleasure for me to welcome John Carnahan. John goes
                  where the data is, particularly where the untapped data is. He did his PhD at UCLA in
                  the 1990s, in population genetics, which was really cool then. He then worked at the
                   Santa Fe Institute. After that, he went to Idea Lab, where he worked for Go2, which
                   then became Overture by a change of name.

                  I remember, when I was teaching at NYU twelve years ago, I read about Go2 and I
                  realized paying for placement is an amazing idea. That was an amazing idea when it first
                  came out. Overture was acquired by Yahoo and he stayed in the Yahoo Research Lab in
                  the LA area. At some stage, I’m not sure why, the LA area research lab disintegrated.
                  Yahoo people left and somebody from MySpace, which became Fox Interactive and then
                   part of Newscorp – I just happened to sit next to Rupert Murdoch at the conference this
                  week. I’m not sure how that story went. He had a lot of untapped data. That was way
                  before Facebook came along.

                  Somebody approached John and said, “Can you help us out here? Is there some way
                  we could sell stuff to people, market to people, get ads to people to make us some
                  money?” He’s modest, but everybody in the world says he invented hyper targeting and
                  that’s one of the things I want him to share with you, today.

                  Let’s welcome John. [Applause]

John:             That’s a great introduction. I’ll talk a bit about my experiences, but I want to leave it
                  open for questions that you have throughout the talk; feel free to ask them at any time. I
                  don’t think there is anything truly not very public in this world, so most of you probably
                  know a lot of this, already. If there are any intriguing questions about the history of this
                  stuff – there is a lot of money tied to this.

                   When he mentioned paying for placement, it pretty much kept the web alive after
                   the dot-com crash at the turn of the century. Feel free to ask any
                  questions you would like. I have a pretty intimate knowledge of the history of it.

                  He mentioned my background so I won’t go into this at all. As he said, it’s definitely
                   following the data. The one thing he did miss is that I actually was part of the whole
                   dot-com fiasco, for a good three or four years. The way I like to phrase it is selling
                   picks and shovels during a gold rush. I was in engineering at that time, so I was
                   building stuff as
                  people were pouring money into my little group. We were building stuff and hopefully
                  they made money off of it. Most of them failed, but that didn’t bother me too much.

0:25:45.8         Online advertising – again, I don’t have to tell you too much about this but I wanted
                  to go into what I consider to be four types of online advertising. The first one at

                   the top left, is search advertising: pay for placement. This is a screen shot of
                  Google. If you type in the search term “LCD TV” you can see that most of it is related to

                  There is content match, which is roughly the same thing, except there is targeting
                  that you are looking at, which is context…. In this article, it’s how the Samsung LED
                  TV is different from LCD TV. I was actually interested in that, so I looked it up. It turns
                   out there is nothing that different. You can see the bold ad there, “Which LCD TV
                  should I buy?”

                  Behavioral targeting - I don’t know how many people know about that, behavioral
                  targeting. There are a lot of companies out there that will gather information about
                  a user when they’re far down in the purchase funnel. That means once you’re right
                  about ready to buy something, they’re going to record some information about you
                  and build this vector of information about you, about your purchase behavior. I’m
                  sure Andreas mentioned this before; much like most recommendation systems on
                  Amazon and other places, if you’ve looked for something, chances are you are
                  interested in buying it. They use that information. The world is full, believe it or
                  not, of behavioral targeting and marketing engines, much more common than the
                  search of the context world.
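
The "vector of information" John describes can be sketched as a simple tally of a user's recent purchase-funnel events; the events below are invented:

```python
from collections import Counter

# Toy behavioral profile: tally a user's recent product-page events
# into an interest vector, as a behavioral-targeting engine might.
events = ["lcd_tv", "lcd_tv", "hdmi_cable", "lcd_tv", "headphones"]
profile = Counter(events)

# The strongest signal in the vector drives ad selection.
top_interest, count = profile.most_common(1)[0]
print(top_interest, count)  # lcd_tv 3
```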

                  That brings me to the last one, which is social. It’s easy to figure out search. You are
                  looking for something. Context, you are looking for something and it might be
                  commercial or it might not be, but they’re still going to show you an ad. Behavioral is
                  definitely something commercial there because they’re looking at you when you’re in the
                  commercial mood. Social networks – why would anybody want to advertise there?

                  I might say, on my page, that I want an LCD TV. How many people actually say that?
                  How many people actually go to a social network to talk about what their commercial
                  interests are? It’s hard to imagine a world where advertising on a social network is
                   anything but intrusive. I’m going to talk about why that interests me. It may just be
                   annoying, or it could be for a good purpose.

                   A little bit about the search marketplace: I mentioned the pay-per-click (PPC)
                   marketplace; PPC and pay-per-impression marketplaces are just some
                   acronyms to throw out in case you haven’t heard them enough. Search ads are built
                  into a massive economy, now. Both Google and Yahoo predominantly get most of
                  their money through search advertising.

0:28:50.6         You’re looking at people when they’re most interested in buying something or
                  getting information about something. You have to discern whether or not you
                  want to show somebody an ad, based on whether or not they’re looking for
                   information about something or whether or not they are really interested in buying.

                   The good things about this are that there are great metrics. Advertisers love it. You
                   have click-through rates and a CPC value, which is the price I’m interested in. That
                   accumulates into this thing called an eCPM, or effective cost per thousand
                   impressions, which really is the value of that marketplace, to both the advertiser and
                   to the publisher. You have explicit intent and commercial interests.
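
With made-up numbers, the relationship between these metrics can be sketched using the standard eCPM formula (CTR times CPC times 1,000):

```python
# Toy numbers illustrating the search-ad metrics: click-through rate (CTR),
# cost per click (CPC), and effective revenue per thousand impressions (eCPM).
impressions = 10_000
clicks = 150
cpc = 0.40  # dollars per click

ctr = clicks / impressions   # 0.015
ecpm = ctr * cpc * 1000      # dollars per 1,000 impressions

print(f"CTR = {ctr:.1%}, eCPM = ${ecpm:.2f}")  # CTR = 1.5%, eCPM = $6.00
```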

                  There are big difficulties with search marketplaces, however. The biggest one is
                  the notion of market liquidity. I still hold this as a failing on my part; when we were first
                  building Go2 and Overture, we were the guys who invented PPC marketplaces. Google
                  kicked our ass. Why did they kick our ass? It was mainly because they knew how to
                  handle a long tail better than we did.

                  If you think about a search marketplace, there are very few searches that a lot of
                  people make. There are a whole lot of searches that very few make. The problem
                  with that, in terms of building a marketplace, is that if you only have a few people,
                  the price is not very efficient for that marketplace. The history of search marketing
                  has been to roll things up from the tail, let’s accumulate these search terms
                  together into a single market to bring all of these advertisers together and to get
                  the kind of market liquidity that’s going to jack up the price.

                  That’s very inefficient. There are a lot of terms out there. You don’t know which
                   ones to merge. You don't know which advertisers are going to be angered by that. There's
                   also the lack of prediction. Because it is so sparse and because you have so many
                  individual features, it’s hard to predict anything about it. If somebody searches on this,
                  do I really know that if they searched on A that they will search on B or C? It’s hard to
                  make those kinds of leaps. You can look at overlap in terms of the users, but it’s really
                  not that helpful.

                  Content match ads – I won’t say too much about this. It’s very similar to search ads.
                  As a matter of fact, most search marketing companies just slap on the same sort of logic
                  and intuition and engineering to solving the content match problem as they do the search
                   problem, sadly. The danger of misplacement is there. Everybody has heard these
                  kinds of anecdotal stories; you have a diaper ad on a story about infant death
                  syndrome, or something like that. You have these content misplacements, which
                  always piss off advertisers.

0:31:49.6         You have multiple applicable queries from that context. You look at a page; there
                  might be several queries you could extract out of that. You can’t just use
                  something like information retrieval concepts. You can’t use TFIDF to figure out
                  what terms really belong on that page. It may be one topic it’s talking about; it may
                  be several. How do you know what ad to show? Also, publishers versus social
                  networks, it’s hard to know; these days, the number of publishers versus the number of
                  social network pages that are out there, the ratio is definitely changing. Doing content
                  match on social networks doesn’t make any sense, so it’s not very applicable. The
                  result is that you have much lower CTR, and much lower CPM.

                   Social network ads – the idea there, which has been around a long time, is how you
                   want to advertise on a social network. You want to figure out who your user is. You
                   want to target specific users. That makes sense. It's just like television. You know
                  who is watching that TV because it’s meant for a specific demographic. You show ads
                   that are applicable to it. That's been around for a long time; there is nothing very novel
                   about it.

                  Can we make the same mistakes that we did with content match? Is it really like
                  television? Do we really know more about these users than we do of the same people
                   who are watching television? Is Nielsen good enough? Is that all advertisers want?

                   One way to figure this out is to ask whether they respond to these ads better than they
                   do on television. It's hard to say; the metrics are completely different. How about the creep
                  factor? Somebody is looking at your emails, your comments back and forth; is that really
                  good data to mine to figure out what ad they’d be interested in?

                  This is the general assumption I live by; there will always be ads on social
                  networks. If we can make them perform better through targeting, if we can give
                  advertisers better ROI, the real question is can we serve fewer ads? Can we make
                   users happier on social networks? If you see a targeted ad, provided it doesn't have
                   too much of a creep factor, will the user actually be more attracted to it, and therefore
                   more responsive to it and feel less intruded upon? Would you rather see a “punch the
                  monkey” ad or would you rather see something about a product you’re really interested
                  in? I don’t know.

                  Most of you have already gone through this with Andreas’ class. Why do people
                  provide data? People provide data on a social network to interact with other
                  people. That’s one case. You have this private kind of group where you are
                  communicating every day, you’re following their feeds, and you’re interacting with
                  them. There is this other more public side of social networks; I forget who coined
                  the term, “niche envy.” I really like that term.

                  The reason people put in public information on a social network is that’s how they
                  want to appear when they’re found. This is how you want people to find you when
                  they search. That’s called “niche envy.” It may not be factual, but it’s how you
                  want to be perceived on that network.

                  From an advertiser’s perspective, do you want to advertise to the private discourse
                  that happens between individuals, or do you want to advertise to how somebody wants to
                  be perceived? It turns out that most advertisers want the latter. They want to know that
                   you aspire, some day, to own a mansion and a yacht; they don't need to know that
                   today you are a sanitation worker. They want to know that kind of aspirational information.

0:36:01.4         What kind of data do they provide? You probably know more about that than I do.
                   What is taboo? Are private profiles taboo? Are emails taboo? When I first started

                  this game, I imagined that email was completely taboo. I don’t think it is, anymore. I
                   think enough people have capitalized on email that it kind of voids that notion. How honest
                  are users? Should we care?

                  Why did I go to FIM? Fox Interactive Media, MySpace, IGN, all the various companies
                  had a rich data source of self expression, of communication, of how people wanted to be
                  perceived on a network. Nobody was touching it or working with it. Some outside
                  companies were using it and crawling it and doing all kinds of things with it, but internally,
                  very little was done with it. It was a perfect opportunity.

                  This is what the social graph looks like. If you have questions about what the social
                  graph looks like on MySpace, feel free to ask me.

Andreas:          What is at the X axis?

John:             This is the number of users with that many friends. This is the number of friends, this is
                  the number of users in 1,000’s, I think. I can’t read it from here. We have a large number
                  of people out there, at least on MySpace, with hundreds of thousands or a million friends.
                  You have to keep in mind, that in a lot of cases – MySpace is very different than
                  Facebook. MySpace started with the notion that you could be anyone you want
                  and you could set up any page for anybody you want. Facebook added that much
                   later; it was more about private discourse at first. MySpace is riddled with countless
                   Britney Spears pages and things like that.

Andreas:          I want to understand what’s on this axis here. You’re telling me that there are thousands
                  of people who have one friend.

John:             There are a large number of people with one friend. This is thousands.

Andreas:          I would plot it the other way, the way I plotted it when I was at Amazon. On
                   the X axis we have the number of friends somebody has, and on the Y axis the count.
                   That means there are slightly fewer people with 1, 2, and 3 friends, and the maximum
                   is at 4 friends. Then you have the long tail of the distribution, Power Law,
                  [0:39:30.4 unclear], and this would be very few people count as 1, count as 2, count as 3,
                  who have 1,000 friends. That’s how I read the graph.

John:             You could turn it on its side, if that helps. [Laughs] It’s very few people with a large
                  number of friends, a nice spread in the middle; it makes sense. That’s directly from
                   MySpace data. This is from a year and a half ago, but it's hard to say how much it's
                   changed since then.

0:40:00.8         When I first got there, one of the things I was tasked to do was to come up with
                  some way of describing the users on a social network. How can we say anything
                   about them? I started with the simplest thing, a nice big black box of supervised
                   machine learning models. Have you talked a lot about machine learning models?
                  Supervised models are pretty simple. You start with a labeled set. These are the
                  classes up there to the right, and this is an example of a profile.

                  The ten classes or so that we have, it says sports, action sports, travel, personal finance,
                  health wellness and fitness, etc. We took users and said this person is a sports fan, this
                  person is something else, and we created a label set. From that labeled set we learned a
                   model that could predict what class somebody would fall into. There are difficulties in
                   trying to describe users in this way.

                  It’s hard to get a labeled set for a large number of classes. Here, it’s just ten; it’s not
                  that hard; it’s pretty tractable. You can get enough people to say this person is in
                  this bucket or in that bucket. As soon as you start going to ten, a hundred, or a
                   thousand, or ten thousand classes, the chances of people even agreeing on a single
                   label go to about zero. It's an intractable problem when you have a large
                   number of users.

                  This doesn’t work very well if we really want to try to explain the variance in how
                  people perceive themselves on a social network. It requires a lot of training data
                  and conceptually, it’s very difficult to deal with negatives. When you’re building a
                  machine learn model, you need both positive and negative examples.

                  How do you know what’s not a sports fan on MySpace? Just because they don’t say
                  anything about it doesn’t mean they’re not. It’s very difficult to get negative examples as

                  I guess you have talked about collaborative filtering so I’ll go into a bit about that.
                  The next thing we looked at when we were looking at MySpace data was how rich is the
                  data for recommendation. Most of you probably know a little bit about collaborative
                  filtering. Without any kind of interesting heuristics, all we did was we took what
                  people said about music, movies, and television and we looked at all the text they
                  wrote. Everybody who has seen MySpace before, you have a little note that says
                  what movies, music, and television you’re interested in. We just took that
                   information and parsed it out. We built a recommender for each of these three
                   categories.

                  It was surprising that without any kind of normalization of the terms, without any
                  kind of canonicalization, without anything interesting put in there, at all, any
                  understanding about what the relationship is between any of these entities, we
                  said, “they have these users in common.” We could compare them, in terms of the
                  users that they have in common, in that space, and come up with a
                  recommendation system. There are no ratings; we just kind of threw it in there.
                  Surprisingly, even to the MySpace folks who actively follow large numbers of
                  music entities, it was very accurate.

0:43:41.0         How did we do it? This is where it gets really interesting. One way to build a
                  recommender is using cosine similarity. This is a collaborative filtering
                  recommender built by cosine similarity. This is generally how you do it. This
                  formulation above there is the formulation for cosine similarity. You start with two
                  vectors where you have users and all the bands that they’re interested in. You

                  want to calculate the co-occurrence of those bands in user space. How similar are
                  those bands when you look at where they overlap in terms of users?

                   The dot product, the inner product, is going to be the hardest thing to compute.
                   In the beginning, with small-scale data sets, you say, “I have users and the bands
                   they are interested in, and I created an inverted index of that.” Let's do a
                   transpose of that little matrix. Now, I have bands and the users who are
                   interested in them. Based on this initial data set here, I end up with band 1 and band 2.
                  They share both users A and B in common. There is some similarity between
                  those things.
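A minimal sketch of that computation in Python, treating each band as a binary vector over the users who list it (the users and bands here are made up):

```python
# Cosine similarity between two bands over a tiny invented user/band data set.
# For binary vectors, the dot product is just the size of the user intersection.
import math

user_bands = {
    "A": {"band1", "band2"},
    "B": {"band1", "band2", "band3"},
    "C": {"band3"},
}

def band_vector(band):
    # The "transpose": which users list this band?
    return {u for u, bands in user_bands.items() if band in bands}

def cosine(b1, b2):
    u1, u2 = band_vector(b1), band_vector(b2)
    return len(u1 & u2) / (math.sqrt(len(u1)) * math.sqrt(len(u2)))

# Bands 1 and 2 share both users A and B, so they are maximally similar
print(round(cosine("band1", "band2"), 6))  # → 1.0
```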

                  You can imagine, with MySpace, you have a lot of users and a lot of bands, so you don’t
                  have that much of a sparsity problem because there are a lot of users in common. There
                   are a lot of users who declare that kind of stuff. The hard part is that once you have
                   that information, you have to calculate the pair-wise similarity. What's the similarity
                   between each pair of those bands?

                  If you have 100,000 bands, and you’re trying to create a matrix like this, of how
                  similar these things are in space; this is a really big matrix, a huge matrix. All you
                  are trying to figure out is what number of users they have in common. It doesn’t
                  scale with data. Everything we do at FIM is not usually that interesting, in terms of novel
                  algorithms, or anything. It’s how do you deal with data sets this size. How do you say
                  something interesting and try to use as much data as you can? You can sample down,
                  but that’s not the point.

                  It has huge memory requirements. If you really want to do this quickly, you create this
                  whole matrix, load the whole damn thing in memory, and you do each pair-wise
                   comparison. The first time I tried this, I think I did do it in memory, and it was on the order
                  of 50,000 bands versus 50,000 bands, loaded it all into a 128 GB machine, ran it for a
                  couple of days, and it spit it out. I got my similarity matrix. That’s not too bad, but it
                  doesn’t answer the real problem. How do we deal with the large data sets?

                  There is an interesting paper, and part of my background is in distributed systems, as
                  well. Back at Yahoo Research, we started a little project called Nutch. Out of Nutch
                   came Hadoop. We hired Doug Cutting to help build Nutch and from there, to
                  help build what eventually became Hadoop. We had a lot of experience with Hadoop. It
                  wasn’t us, but some other person figured out how to do cosine similarity in Hadoop. It
                  was a very interesting anecdotal problem to solve. How do you do this without loading
                  everything in memory?

0:47:12.6         Here is an outline of exactly how you do collaborative filtering on large data sets in
                  this space. If you haven’t heard of Hadoop before, I’ll do a quick bit on it. It’s
                   based on something called MapReduce. MapReduce has a really long history. The
                   notion of map and reduce has been around for a long time; if anybody works with Lisp,
                   you use map and reduce functions all the time.

                  The way it works is you start with a map process. This map process takes some
                  input data. It doesn’t say what kind of input data it is. It doesn’t really matter what it is.
                  It outputs a set of keys and values. If you think of this in terms of text, you get a
                  webpage in and you’re going to output, “Here is the word I found and the
                  document I found it in.” Your key is the word; your value is a document.

                  Underneath it all, between the map and the reduce phase, Hadoop will do a sort. It
                   does a sort by default. What does that sort do? The power of that sort is that, as
                   the input to the reduce phase, it's going to give you a key mapped to an array of
                   values. It takes each of those keys, each of those words, and it will return to you an
                   array of all the documents where that word appears.
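The map → sort → reduce flow just described can be simulated in a few lines of plain Python (no Hadoop involved; the document IDs and text are invented):

```python
# Toy map/sort/reduce: map emits (word, document) pairs, the sort groups
# values by key, and reduce sees each word with its array of documents.
from collections import defaultdict

docs = {"d1": "hadoop sorts keys", "d2": "hadoop maps data"}

# Map phase: one (key, value) pair per word occurrence
emitted = [(word, doc_id) for doc_id, text in docs.items()
           for word in text.split()]

# The framework's sort: group all values under their key
grouped = defaultdict(list)
for word, doc_id in emitted:
    grouped[word].append(doc_id)

# Reduce phase: here, just emit the word with its list of documents
print(sorted(grouped["hadoop"]))  # → ['d1', 'd2']
```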

                  What you will output in most cases is another key and another value. It doesn’t
                  sound that interesting, and it’s not that interesting. Don’t think that MapReduce is some
                  panacea that you can use to solve all of your problems. The interesting thing about
                  MapReduce is that if you can frame your problem in the context of MapReduce,
                  you get a huge advantage. You get a huge advantage, mainly because when you
                   phrase it like that, you leverage a lot of data-local processing. It enables things to
                   go really fast and to deal with very large data sets, if you can frame the problem in
                   that way.

                  How did we phrase the collaborative filtering problem, in terms of MapReduce? You
                  take a step back. Now you say, “I have my user and I have my bands, but what is
                  the real question I’m trying to answer?” Here is my matrix of users and bands.
                  What’s the question I’m trying to answer? I’m trying to answer “For band 1, how
                  many users does it have in common?”

                   I start with this raw matrix here, and I'm going to take in this data and transform
                   the user A data into the pairs 1 and 2, and 2 and 3: bands 1 and 2 co-occur within
                   user A, and bands 2 and 3 co-occur within user A. I know that these bands are
                   similar because they share the same user.

                   You get the band co-occurrence counts without ever building that initial inverted
                   index. Your output for user A and user B is 1, 2; 2, 3; 1, 3; etc. In the
                   reduce phase, you get 1, which is a band, and you get the fact that you saw it with
                   band 2, band 3, and band 4. The output of this is that between bands 1 and
                   2, I have a count of 2. Between bands 1 and 3, I have a count of 1, and so on. By
                  doing this, I never have to load any of the information into memory. I can look at
                  things, one row at a time, and output directly what that inner product of that
                  formulation was.
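As a sketch of that formulation (made-up data, plain Python standing in for the Hadoop job): the map phase emits the co-occurring band pairs within each user's row, and the reduce phase counts how many users share each pair — one pass over the rows, no inverted index, nothing held in memory beyond one row's pairs:

```python
# MapReduce-style co-occurrence counting, simulated in plain Python.
from collections import Counter
from itertools import combinations

users = {"A": [1, 2], "B": [1, 2, 3], "C": [2, 3, 4]}  # invented data

def map_phase(bands):
    # Emit every pair of bands that co-occur within one user's row
    return [tuple(sorted(pair)) for pair in combinations(bands, 2)]

pairs = []
for bands in users.values():
    pairs.extend(map_phase(bands))

# The "sort + reduce" step: count how many users each pair shares,
# i.e. the inner product of the two bands' binary user vectors
counts = Counter(pairs)
print(counts[(1, 2)])  # bands 1 and 2 co-occur in users A and B → 2
```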

0:51:17.6         This is really profound. It makes solving large data problems something you can
                  handle in a distributed environment, without having any kind of specialty
                  hardware. Calculating that same thing, 50,000 by 50,000 on a Hadoop cluster of 10
                  machines is trivial. It’s not days. It’s not huge amounts of memory. It’s minutes!
                  When you’re talking about 50,000 to 100,000 or 200,000 you understand what the

                  order of complexity is when you increase like that. It’s still very reasonable
                  because it scales with how many rows you have. That’s linear.

                  No inverted index, no big memory requirement, and it’s one pass through the user
                  data to solve the problem. Now, what can we do with this powerful tool? What can we
                  do with the ability to produce similarity between features that users have? How do we
                  use that for advertising?

                  Hyper targeting is based on this notion that we have this similarity because we
                  have so many users talking about the same things. How can we use that
                  information for advertising? That’s exactly what hyper targeting is.

                  Hyper targeting makes the assumption that advertisers want to target to what they
                  think is their ideal user, to these archetypes. We give an advertiser the ability to say,
                  “I want somebody who is interested in this, has friends like this, and has all the qualities.
                  They might search for things,” whatever features you have about them, they can throw it
                  into this black box and describe what their ideal user is and they’re going to get back a
                  target that is not only predictive but also will have the best ROI for them. These are the
                  users that are going to click and buy their product.

                   It will take, as an input, anything that an advertiser thinks describes their desired
                   audience. But there is a trick; in order to do this, we need to find some way to
                  describe the variance of all these features about users. That’s the hard problem.

                  We took the same lessons that we had from collaborative filtering, and we applied
                  that to topic modeling. I’ll try to describe this – is it legible? In the same way we do
                  collaborative filtering, we can describe these high level descriptors about users.
                  We don’t want to use all the features that a user will say about themselves in a
                  social network. We want a smaller set.

                  We want a smaller set because by compressing data down, not only does it allow us
                  efficiency, but when you compress data you generalize data. You make whatever you
                  are collecting about these users more applicable to new features coming in, to additional
                  features that might be there; it’s really the only way to represent users in that space.

                   For example, here is what a latent semantic space looks like. Let's say you're trying to
                  predict for a given user whether they would be interested in Steel Magnolias, the
                  movie. You have these other movies that are described in this same latent
                  semantic space. You have Star Wars – I think red is action, green is science
                  fiction, and blue is comedy. Star Wars can be represented by mostly red, a little bit
                  of green. Rush Hour can be described as half red, half blue. Terminator, Lord of
                  the Rings, Water Boy can all be described in this space.

0:55:07.7         Now, if we know what Fred has said about himself, let’s say he listed all five of
                  those movies. We can also describe Fred in this latent semantic space, as mostly
                  red, midway green, and a little bit of blue. Now, if we know where Steel Magnolias
                  fits within this space, and we know what Fred looks like, then we also know what
                  Fred might be interested in. It’s another way of doing collaborative filtering, but in
                  a much more efficient and general way.
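The colored-bar example above can be sketched with made-up numbers: each movie is a vector of topic weights (the action/sci-fi/comedy labels are an assumption, as are all the weights), a user's profile is the average of the movies he listed, and a candidate movie is scored by cosine similarity against that profile:

```python
# Hypothetical latent-space scoring: topic weights are (action, sci-fi, comedy).
import math

movies = {
    "Star Wars": (0.8, 0.2, 0.0),
    "Rush Hour": (0.5, 0.0, 0.5),
    "Terminator": (0.7, 0.3, 0.0),
    "Steel Magnolias": (0.0, 0.0, 0.3),
}

def profile(titles):
    # User profile = average of the vectors of the movies they listed
    vecs = [movies[t] for t in titles]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

fred = profile(["Star Wars", "Rush Hour", "Terminator"])
score = cosine(fred, movies["Steel Magnolias"])
print(round(score, 2))  # low: Fred's profile is mostly "action"
```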

                  We took the MySpace feature data; this is an example of one of the things we did. We
                  took all their preferences in order to build a lower rank approximation of the matrix
                  that we found. Here are what the preferences are in this space; let’s call it “movies.”

                  Fred likes A, B, E, and F. Wilma likes B, C, and D etc. Right now, the space up
                  there is 6. We want to reduce it down to 2 that will best describe the variance
                  among users. This allows us to reduce noise, sparsity, dimensionality, and to
                  make better predictions.

                   How do we do it? I think you've mentioned it a little bit. There is a technique called
                   Singular Value Decomposition. I won't go into it too much, but it's a way for us to figure
                   out what those things are. If we start with a data set that looks like this, how do we
                   go to a matrix that looks like this? One method is called Singular Value
                   Decomposition. You mentioned PCA. It's very similar.

                   The idea is that if you start with a large matrix of users to features, you want to
                   break it down into a lower-dimensional set: here is the user matrix, here is the
                   feature matrix, and then some weight matrix that sits in between them, such that if
                   you multiply these three things together, you approximate the original matrix.
                   You condense all the description about these things down into just a small weight
                   matrix. That's the goal of an SVD.

                  Again, it’s a problem of scale. Here is an example of how you use it to figure out where
                  users are. Let’s say you have a features matrix of that same movie set again, and
                   you have Fred, Wilma, Betty, and Barney, and this is how they rated those movies.
                   You can reduce it down - this is the actual output of an
                  SVD - of that matrix and from this set, just using that S matrix there, you can say
                  exactly where a new user, for instance Mr. Slate, would fit between the rest of the
                  users who are there.
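A toy version of this step, assuming an invented 0/1 user-by-movie matrix; the fold-in formula for a new user (r · V · S^-1) is the standard LSI-style projection, not necessarily exactly what was used in the work described:

```python
# Rank-2 SVD of a tiny user-by-movie matrix, plus folding in a new user.
import numpy as np

# Rows: Fred, Wilma, Betty, Barney; columns: movies A-F (1 = listed)
R = np.array([
    [1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                        # keep the top two singular values
R2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-2 approximation of R

# Fold-in: project a new user (Mr. Slate) into the same 2-d space
slate = np.array([1, 0, 0, 0, 1, 1], dtype=float)
slate_2d = slate @ Vt[:k, :].T @ np.diag(1.0 / s[:k])
print(slate_2d.shape)  # → (2,)
```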

                  Seeing a new user or a new ad or whatever entity that you’re talking about, where would
                  it fit into the existing space of information that you have about them? How do you
                  deal with the large data sets? Here is the example of what we tried to solve.

                   We tried to solve it with movies. In the data set we were dealing with, we had four
                   million users and roughly twenty-four thousand different movies. The matrix was pretty
                   big, ninety-five billion cells. I think that's about right, but only thirty million or so with
                  values in them. If you tried to load all that into memory, there are not many machines in
                  the world that would hold that.

                   You can solve it with some simple tricks. One of them is gradient descent. Is
                   everybody familiar with the Netflix competition, where people were putting
                   algorithms out there to find the best movie recommendations?
                  There was a lot of work and it was really helpful that Netflix did this because it inspired a
                  lot of people in the machine learning world to do some pretty interesting things for
                  recommendation systems.

                   One of them came from one of the leading competitors for a while, who had a very
                   simple approach that worked really well with large data and was easy to put onto a
                   MapReduce framework – it used gradient descent. Gradient descent is exactly what
                   you think it is: you start with some objective function, something that you're trying
                   to minimize, in this case the root mean squared error used in their paper. You can do some
                  simple tricks with gradient descent to allow you to work with a very large data set
                  and make only one pass through the data. Again, that’s the trick, making only one
                  pass through the data in order to solve the problem.
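A hedged sketch of that kind of gradient-descent factorization, in the spirit of the Netflix-era "Funk SVD" approach; the data, rank, learning rate, and iteration count below are all invented for illustration, not the actual system being described:

```python
# SGD matrix factorization: learn user (P) and item (Q) factor vectors by
# descending on squared error, one observed (user, item, value) cell at a time.
import math
import random

random.seed(0)
n_users, n_items, k, lr = 4, 6, 2, 0.05
P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]

# Observed (user, item, liked?) cells -- the sparse 1/0 data
ratings = [(0, 1, 1.0), (0, 2, 0.0), (1, 1, 1.0), (2, 3, 1.0), (3, 0, 0.0)]

def predict(u, i):
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

for _ in range(500):                 # sweep the observed cells only
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):           # gradient step on both factor vectors
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * err * qi
            Q[i][f] += lr * err * pu

rmse = math.sqrt(sum((r - predict(u, i)) ** 2
                     for u, i, r in ratings) / len(ratings))
print(round(rmse, 3))                # should end up close to 0
```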

                   That was easy to implement. The problem was that this works well with rating data;
                   root mean squared error works really well if you have rating values in the data. We
                   didn't have that. We only had Boolean features; we only had whether or not they liked
                   that movie.

                   We made some modifications to the algorithm, using MAP, Mean Average Precision,
                   which measures where things show up in a ranking when you're trying to predict
                   them. If I'm trying to predict the fifth movie given the other four, and I predict
                   that that movie is at rank 20 out of 100, I know I didn't pick the top one;
                   therefore, it's not a close fit. If I had predicted it at rank 5, it would have been
                   a better fit. That's the general intuition of how MAP works in this context.
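That ranking intuition matches the standard definition of average precision; MAP is just its mean across users. A small sketch (the exact variant the team used is not given in the transcript):

```python
def average_precision(ranked_items, relevant):
    """Average precision: the mean of precision@k over each rank k at which
    a relevant item appears. Rewards putting relevant items near the top."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            total += hits / k          # precision at this cut-off
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevant_sets):
    """MAP: average precision averaged over many users (or queries)."""
    pairs = list(zip(rankings, relevant_sets))
    return sum(average_precision(r, s) for r, s in pairs) / len(pairs)
```

A held-out movie predicted at rank 1 scores 1.0; the same movie buried at rank 20 scores 0.05, which is the "not a close fit" case described above.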

                   We just substituted that for root mean squared error, ran MAP through it, and
                   we got this very fast SVD approximation of all the variance that existed within
                   users, within all the features that they had in that space.

                   In a nutshell, that's what hyper targeting is. It is not a very interesting algorithm, but
                   it's a way of applying the algorithms that are out there to very large data sets
                   in order to represent what we know about the users.

                  Where are we now? That was the history up to about two years ago. Where are we
                  now, with everything that we’re doing? There is a lot of interest, obviously, in using
                   the social graph. It's not just friends. It's who you're communicating with, and how
                   you can use information about everything from how memes are propagated to how
                   people join other groups.

1:02:24.5         More importantly, we do a lot of tests. There is a lot of intuition around whether
                   people should be interested in the same things their friends are. There is a lot of
                   research work, and I'm sure a lot of speculation about it, but one nice thing
                  about where we are is we can test that kind of thing. We can actually run ads to
                  peoples’ friends and see whether or not they’re more or less likely to click on it
                  than other people. That’s a great testing ground for any kind of research work where
                  what you’re really trying to do is figure out what is a friend in that context. Is it somebody
                  who is really close to you? Is it somebody who has similar interests to you? However

                  you want to define a friend, you can do tests where you can figure out whether or not
                  they’re more or less likely to click on an ad.

                   One of the more interesting things about this description of users in this space is
                   the marketplace overlap. The idea is, in the search marketplace, as I talked about
                   earlier, you have these very discrete and small marketplaces where you're defining
                   people. When you start talking about the smallest number of dimensions that
                   describe a large data set, you're going to have a much better place to define a set of

                  We are just like search marketing, in that we have a finite set of things. It’s very
                  different because it’s a much smaller set of things so there is no tail. Not only that,
                  the people that you’re advertising to are actual people so they can have many of
                  these attributes that are associated with them. In other words, the people that
                  you’re marketing to can exist in multiple markets at the same time. It
                  automatically, as soon as you create a target that’s out there, you automatically get
                  liquidity because there are already advertisers in the space because they’re
                  already advertising to those users.
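One way to picture that overlap: each user carries several attributes, advertisers target attributes, and any new segment intersects targets that already exist, so it inherits their bidders. A toy sketch with invented names:

```python
# Toy model: users carry multiple attributes, advertisers bid on attribute
# targets, so any new target that shares attributes with existing campaigns
# already has bidders ("liquidity"). All names here are invented.
users = {
    "u1": {"age:18-24", "likes:indie_rock", "geo:CA"},
    "u2": {"age:18-24", "likes:snowboarding", "geo:CO"},
    "u3": {"age:25-34", "likes:indie_rock", "geo:CA"},
}
advertisers = {
    "label_records": {"likes:indie_rock"},
    "board_shop": {"likes:snowboarding", "geo:CO"},
    "ca_dealer": {"geo:CA"},
}

def bidders_for(target_attrs):
    """Advertisers whose target overlaps the new segment's attributes."""
    return {a for a, attrs in advertisers.items() if attrs & target_attrs}

# A brand-new segment immediately has bidders, because its attributes
# already appear in existing campaigns.
new_segment = {"age:18-24", "likes:indie_rock"}
print(bidders_for(new_segment))  # overlaps label_records via likes:indie_rock
```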

                  I’ve been working with a lot of folks in prediction markets and in other places that are
                  keenly interested in how these overlapping marketplaces work in terms of both prediction,
                  and how this would work in other types of markets.

                  Targeting predictions – we have all this information about our MySpace users.
                  MySpace isn’t everything. MySpace doesn’t comprise the entire web. Can we say
                  these same sorts of things, can we figure out what attributes you have, even if
                  you’re not on MySpace? That’s a very important question to business. It’s not just
                  whether or not you’re saying something, but if there are associations between what
                  queries you make, where you go on the web, anything like that where based on other
                  people who do those same things that we know information about you, how well can we
                  predict those attributes, even if you’ve never been to MySpace before?

                  It’s the predictability of these things from other behaviors and other features that
                  we can gather about you on the web, and how valuable is that. Of course, click
                  prediction – you’ll find that the ability to predict whether or not somebody is going
                  to click, because we have a much smaller set of dimensions, is much easier to do.
                  It’s harder to predict whether or not somebody is going to click on a search ad
                  because you don’t really have that much information about a user.

1:06:11.1         For us, because we have users that are like that, that have clicked on ads like that
                  before, we can easily predict whether or not this user is going to click with some
                  pretty high accuracy. We know exactly what things might be interesting to you.
                   Again, this all goes back to what I talked about before: how can we best get ads
                   to people that will actually have some meaning for them, that they will be
                   interested in, and can we show fewer ads as a result and still have a lot of revenue
                   and money?
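None of the actual click models are described in the talk, but the idea that a small, dense set of user attributes makes click prediction tractable can be sketched with a toy logistic regression; the attributes and training data below are invented.

```python
import math

def train_ctr(events, dims, lr=0.1, epochs=200):
    """Tiny logistic-regression click model: x is a binary attribute vector,
    y is 1 if the user clicked. With few, dense dimensions, even a simple
    model separates likely clickers from non-clickers."""
    w = [0.0] * dims
    b = 0.0
    for _ in range(epochs):
        for x, y in events:
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y                                  # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Probability that a user with attribute vector x clicks."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
```

The point in the talk carries over: the fewer and denser the dimensions, the more past users share the new user's attribute vector, and the better a model like this can do.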

                   Also, with Hadoop and beyond, we're always interested in what else you can do
                   once you have the whole batch-processing side buttoned up. Things like streaming
                   data and online learning become very interesting once you go beyond simple
                   MapReduce problems. If you're really trying to do learning, how do you do online
                   learning? How do you do recursion? How do you do that in a distributed data model?

                  These are the kinds of things we’re looking at right now, or have been looking at for the
                   past year or so. The last thing I'll mention is a small plug: Fox Interactive Media and
                   Fox Audience Network. I went there because I went where the data was, and I
                   encourage you to do the same. If you're interested, whether in an internship or a
                   fulltime position, in looking at very interesting data sets, how to apply them, and
                   how to make money off of them, please come see me.

                  One last thing about it, since I can plug it, is one of the advantages of Fox Interactive
                  Media and Fox Audience Network is we’re still very much like a startup. It’s a very small
                  group and very focused on what they’re doing but still with the backing of somebody with
                   a nice big, fat wallet. It's a nice place to be; as long as we're making money for the
                   company, it works out well. It's like my entire life through grad school: nobody cared
                   what I did as long as I was producing papers. It's the same kind of thing here, as well.
                  [Laughter] Thank you very much.

Andreas:          Thank you very much, John. We have about five minutes before the break. What are
                  your thoughts or questions?

Student:          You mentioned CPC and CPM. What are your thoughts about CPA, where it's going, and
                   also what … pick from…?

Andreas:          CPA means Cost Per Action. Action can be many things. Action can be somebody
                  signing up for your card or buying a house or just being a lead.

John:             There is a nice continuum between the CPM marketplaces, CPC marketplaces, and
                  CPA marketplaces. CPA marketplaces – the data is sparse but it also puts all of
                  the risk on the publisher, on the person who is serving the marketplace. Back in
                   the old days, everything was CPM, where you just put out your impressions and
                   you paid for 1,000 impressions, a million impressions, or whatever. It was a
                   beautiful thing, and the revolution of CPC marketplaces is that the risk sat right in
                   the middle: half the risk was on the person serving the ad, making sure the user
                   actually saw and clicked the ad, and the other half was on the advertiser, making
                   sure the click actually converted.

1:10:27.9         The thing is, once they clicked on the ad, it’s kind of outside the domain of whoever is
                  serving the ads, whether or not they’re going to buy. It’s really hard to optimize past that
                  point if you’re the ad server.

                  I think CPA ads are great. I think that whole marketplace is great; however, it puts
                  all the risk on the ad server. That is a difficult thing to buy. At what point do you go
                  to the advertiser and say, “Fix your site, and fix your buying funnel so that you get more
                  conversions.” That’s hard to do. It’s hard to do at scale, at least. You can do it for
                  smaller advertisers.

                  There is a lot of good optimization around there, with CPA marketplaces. If you look at
                  behavioral pixels and things like that, it might be better. Again, it’s very sparse data.
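The risk continuum described here is easiest to see in expected revenue per thousand impressions (eCPM): each pricing model adds a rate the ad server cannot fully control. The formulas are standard; the example numbers are invented.

```python
def ecpm_cpm(cpm):
    """CPM: the advertiser pays per 1,000 impressions; the ad server
    bears essentially no performance risk."""
    return cpm

def ecpm_cpc(cpc, ctr):
    """CPC: the ad server's revenue now depends on click-through rate."""
    return cpc * ctr * 1000

def ecpm_cpa(cpa, ctr, cvr):
    """CPA: revenue also depends on conversion rate, which happens on the
    advertiser's site, outside the ad server's control."""
    return cpa * ctr * cvr * 1000

# Invented example: $2 CPM vs. $0.50/click at 0.4% CTR
# vs. $20/action at 0.4% CTR and 5% conversion rate.
print(ecpm_cpm(2.0), ecpm_cpc(0.50, 0.004), ecpm_cpa(20.0, 0.004, 0.05))
```

The CPA column is the one with the most multiplied-in uncertainty, which is the "all the risk on the ad server" point above, and why sparse conversion data makes it hard to optimize.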

John:             Click fraud can be anything. It can be –

Student:          …

John:             To reduce click fraud?

Student:          … control…

John:             Click fraud is uncontrollable? Well –

Andreas:          Just for everybody's benefit, click fraud means that it is not real people who are
                   clicking, whom you should be paying for, but maybe robots, or people in Uzbekistan or in

John:             It turns out that MySpace doesn't have a huge problem with click fraud. They actually
                   have a bigger problem with accidental clicks. Part of the problem with MySpace
                   is that there are ads in a lot of different places, and sometimes they move around on
                   you. You go to click one thing and you accidentally click on an ad, then stop and go
                   back or whatever. We actually don't have much click fraud. The bad buying behavior
                   we have usually gets fixed on the backend: if you're an advertiser and we identify an
                   obvious bot, we actually do refunds. The biggest problem we have is the accidental
                   stuff, by a much bigger margin.

Andreas:          That's a good example of modeling: if you have a certain baseline level and then
                   suddenly there is a spike, you know that the best thing is to ignore it and not tell
                   anybody about it. Don't talk about click fraud, and your advertiser doesn't
                   complain about it.

John:             These heuristics around time are all in there. Let’s say a user can’t click on one ad more
                  often than once every two minutes or five minutes. It’s more that you get clicks on this ad
                   from the same location on a regular basis, but that's easy to identify. You find that most
                   of the problems around click fraud are easy to solve with heuristics. Boris from Yahoo –
                   I worked with him quite a bit –

Andreas:          Yeltsin, Becker [Laughter]

John:             It’s not that much of a problem.
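The heuristics mentioned here (a per-user rate limit and a per-location volume check) might look something like this sketch; the thresholds and the click-record format are invented:

```python
from collections import defaultdict

MIN_INTERVAL = 120      # invented: one click per ad per user per two minutes
MAX_PER_LOCATION = 3    # invented: flag more than 3 clicks on an ad per IP

def filter_clicks(clicks):
    """Keep clicks that pass simple fraud heuristics.
    Each click is a (timestamp_sec, user_id, ip, ad_id) tuple."""
    last_seen = {}                      # (user, ad) -> last accepted time
    per_location = defaultdict(int)     # (ip, ad) -> accepted count
    kept = []
    for ts, user, ip, ad in sorted(clicks):
        if ts - last_seen.get((user, ad), -MIN_INTERVAL) < MIN_INTERVAL:
            continue                    # too soon after this user's last click
        if per_location[(ip, ad)] >= MAX_PER_LOCATION:
            continue                    # suspicious volume from one location
        last_seen[(user, ad)] = ts
        per_location[(ip, ad)] += 1
        kept.append((ts, user, ip, ad))
    return kept
```

Real systems layer many such rules and do backend refunds on top, as described above, but the per-rule logic really is this simple.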


Student:          How would you build an advertising model for Twitter?

John:             I have no idea.

Andreas:          [Laughter] I think that question is best left for the reception afterwards. Is there anything
                  more specific about MySpace, about Fox Interactive Media?

Enrique:          How do you feel about word of mouth marketing, banner ads such as
        , and other companies that show a banner ad? As soon as you
                  engage with it, it leverages your social graph, very similar to any other advertising with
                  applications on Facebook, for instance.

Andreas:          Just repeat the question for people out there. Enrique wants to know how you feel about
        , Seth Goldstein’s company, and banner ads in the space.

John:             Maybe we should leave that to the reception, as well. I think it’s great. The more
                   interactivity you have with ads, the better. Ads are evolving, and for people who are
                   getting into the ad game right now, what's out there is pretty phenomenal in terms of
                  interactivity. One of the biggest upcoming advertisers who is trying to REALLY leverage
                  the social network is Adobe. You can imagine the power that they have, both in terms of
                  – they see everybody. There is nobody in the world that Adobe doesn’t see. They own
                  the Flash player.

Andreas:          And they own the right to access the camera.

John:             Absolutely, and there are a lot of companies that are competing to get their attention to
                  take advantage of that fact, and the fact that they want to serve ads right there within the
                  other ads themselves, or within the movies. I’m all for it. It’s hard to know exactly what
                  the value of it is. As ads become closer to content, everything is kind of up in the air.

Student:          Could you talk a bit about… more engaging than standard banner ads….

Andreas:          Enrique’s second question was video advertising.

John:             We actually haven’t done any testing with video advertising ourselves. I can’t tell you
                  that. I know that we’ve done a lot of offline tests of that with companies that are involved
                  in that space to see what kind of lift you get. You get similar lift to what you get on a
                  social network, at least what data that we do have on that.

Andreas:          I have one question here. At the D7 Conference, All Things Digital, I think [1:16:31.9
                  unclear] showed that 61% of people say they use MySpace less now, than they used to
                  use it a year ago. Who here actually uses MySpace? Who of you has been on MySpace
                  in the last month, for more than a minute? Four people. Who of you has been on
                   Facebook, today, for more than an hour? [Laughter] That's interesting, how it has
                   totally shifted.


John:             Yeah, which is interesting. As a company, MySpace is still trying to find out exactly who
                  they are. They’re going through that process now, whether or not they’re an
                  entertainment site, a social networking site, exactly what they are. In my mind, however,
                  it’s more interesting to me to not try to evolve what social networking is. I think there are
                  enough people that are already working on that problem. It’s what you can do with social
                  data. That same information that we get from MySpace is equally applicable on
                   Facebook, on anything. Believe it or not, hyper targeting serves on Facebook all the
                   time. People don't know that, but the same targets and the same things work out
                   just as well there.

Andreas:          I worry about us having a coffee break. Should we listen to the three questions we have
                   here, and you can decide if you want to take any of them? We will be back here at 4:00.

Sean:             Just to go off … I've been a MySpace user since 2006, 43,000 friends. I can tell you, as
                   the advertising becomes more targeted and more intrusive, I log in 90% less.
                  It’s specifically why I was so excited about today’s class. I think as the advertising
                  becomes so evident, it turns people off. Whereas, with Facebook, their advertising is
                  very blended in, very clean, and it’s not that when you go to the page there is a big flash
                  that says, “… movie coming out.” Is there a way to work better with the creative…?

John:             Tell me if I’m wrong, but you’re not arguing that it’s more or less targeted. I would
                  argue that it’s just as targeted on Facebook because it’s in a lot of cases using our
                  algorithms. The issue is it’s more intrusive on MySpace. One of our goals is if we
                  can make the targeting work, can we actually reduce the number of ads and still
                   have the same ROI? Can we make things less intrusive? There is this gut reaction
                   that if your population is going down and you're making less money, you have to
                   start putting more ads in there.

                   MySpace has been putting ads in there for a long time, but the idea that putting more
                   ads in there will actually make more money isn't necessarily the case. There are a lot
                   of arguments and papers on this: how intrusive do ads need to be? Is not showing an
                   ad sometimes the best solution? If you really know that somebody is looking for
                   information on a search, should you even show them an ad?


Andreas:          There is a big difference between stated preferences and revealed preferences. I
                   remember doing an experiment with Alibaba about five years ago, in China. Chinese
                   people also say they hate blinking ads. Just like most people, though, they click on
                   them. On the one hand, you get this negative feedback that people hate the site
                   because, on any Chinese site, there are a lot of things happening on the page.
                   People don't like it, but it's still happening. How do you trade that off? That was
                   one of the questions we had in the survey: the perceived negativity of that ad
                   continually hitting you, where you really hate seeing that ad by the airline you
                   don't like, versus the fact that people still click, and it stays top of mind. That is
                   an interesting trade-off: basically, what metrics do you consider? If it's just clicks,
                   you're optimizing for clicks. If it is perception, you optimize for something else.

John:             It's really hard to say, because you can't say that people are using the site less because
                   of the ads. Correlation is not necessarily causation.

Andreas:          We need our coffee, so we’ll be back in 15 minutes, at 4:00. In terms of logistics, I want
                  to point out that we do have – I want to know – who is the good soul who managed to get
                  EtherPad social data? I have no idea how that happened. Ron, thanks – cool. We have
                  an EtherPad for the class. It’s good and I’m not getting paid by EtherPad. It’s great for
                  concurrent editing. It’s amazing how if you type something, it actually works.

                  In the second half I have significant notes here about what I want to get through, the
                  insights I got in the class that I want to share with you. I think it’s pretty interesting stuff. I
                  won’t project. Instead, we’ll have the stream up where you can put stuff on, like here.
                  We’ll have the EtherPad up. Let’s try this participation that we always talked about in
                  architecture of participation and see if it makes sense to either do it in a wiki style where
                  you can put questions up on the, or whether it makes more
                  sense to do it in a streaming way.

Ron:              Can I get a show of hands for people who have laptops in class? If you have multiple
                  ones – so we can get a count of how many people actually have laptops in class. 26 plus
                  mine, anyone else? So 28.

Andreas:          What does that mean? Who is happy about that? Who won? Interpret the results.
                  Statistics is about interpretation.

Ron:              I don’t know the numbers off the top of my head, based on the current prediction market
                  when it closed. I will go online now and switch out the data.

Andreas:          Are there any comments?

Student:          I have a quick comment about the prediction market. How many people participated in
                   predicting, and is it possible to game the system – to cheat it, to make false predictions,
                   to make something else happen?


Ron:              Since the market is designed for people to bet and trade shares, you can essentially sell
                   your shares first at a high price and, when the price goes back down, buy them back to
                   earn money. Those are some of the games you can play if you want to. That aside, you
                   could also manipulate the actual outcome, which is what some people have done or
                   will do, I'm sure: manipulate the results, whether it's to add more Twitter followers or
                   force some followers to un-follow. So on the game side you can manipulate, and on the
                   side of how you interact with people and buy and sell shares, there are obviously ways
                   you can provide people incentives to sell or buy stock. You can write comments on the
                   system, so you can influence it. I think that's a great test of the social piece: if you
                   are a great influencer, you can make a lot of money. That's why stock markets and
                   news, that whole momentum, cause some of these effects.
