IBM INVESTOR RELATIONS
IR PODCAST
IBM AND THE FUTURE OF OUR PAST
THE GENOGRAPHIC PROJECT, A LANDMARK FIVE-YEAR STUDY CONDUCTED BY IBM AND NATIONAL GEOGRAPHIC EXPLORES THE ROOTS OF HUMANITY
OCTOBER 7, 2005
To hear this podcast and others from IBM, please visit www.ibm.com/investor.
In the space of just a few thousand generations, the linguistic, cultural and genetic diversity of the human race has exploded. Scientists agree that all humans share a common ancestry in Africa. Much less well understood is how the human race migrated out of Africa and became so diverse. To discover more about this ancestral journey, IBM and National Geographic have launched the Genographic Project, a landmark five-year study which will collect a massive sample of human DNA to put together a map of how humans populated the planet.
EDWARDS:
I’m Ben Edwards. What do we know about our past? Scientists now agree we all share a common ancestry in Africa more than one hundred thousand years ago. But how did we get from there to here – from a few thousand early humans to an extraordinarily diverse, global population of six and a half billion?
To answer this question, IBM and National Geographic have embarked on the Genographic Project, a landmark five-year study that will collect and analyze DNA contributed by hundreds of thousands of people to map how humans populated the planet. The public can contribute by buying a Genographic Project cheek swab kit and submitting their own DNA to the study. In return, participants get a personal map of their own ancestral journey from the African plains.
Joining me today to discuss the future of our past are anthropologist and geneticist Dr Spencer Wells and Ajay Royyuru, head of IBM’s Computational Biology Center. Spencer is leading a team of field scientists to collect the project’s DNA samples; Ajay heads up the IBM team that will make sense of the massive amounts of information collected.
So, Spencer, tell us a little bit about the origins of the Genographic Project.
WELLS:
Genographic as an effort, as a research effort, really dates back a couple of years. The seeds of it came out of a project that I did with National Geographic kind of summarizing the work in the field of genetic anthropology up until a couple of years ago.
And this is work that's been carried on largely as a cottage industry. Researchers who are trained in genetics are applying the tools of genetics to figure out where we all came from. And the goal is to explain human diversity: do we have a common origin as a species? If so, when and where? And how did we spread around the world and generate the patterns of diversity -- linguistic diversity, physical diversity and so on -- that we see today?
But the story is still kind of murky. And at that point we had analyzed perhaps 10,000 or so DNA samples.
And so coming out of that project National Geographic started talking to me about taking the research on to the next level. And they said, well, what would you do if you could do anything?
And I said, well, we'd increase the sample size by an order of magnitude or more. So, to go from 10,000 to over 100,000 samples, that would give us the power we need to answer these questions.
And I said, okay, that sounds good. We're obviously going to need a partner for that. That's a lot of data. And we said, yes, I think we probably will. And so we approached IBM about getting involved. First choice. Obvious choice, you know, because of their expertise in setting up databases, and the scientific expertise that they have analyzing huge data sets.
And the guys at IBM loved it.
But we wanted to open it up to the general public as well. You know, I'm of mixed ancestry but I still want to know about my own DNA, and can I add something to the database. So we're allowing the general public to participate as well.
EDWARDS:
Ajay, how did you personally get involved in this project?
ROYYURU:
Well, I think it was about almost a couple of years ago that Spencer and I first met.
And because of the work that my group has been doing here in IBM Research at the Computational Biology Center, we were sort of the logical point of connection.
EDWARDS:
What work is that, that you've been doing?
ROYYURU:
We work, you know, primarily at the intersection of information technology and biology. Biology is an information science. Many have realized it now, in the post-genome era.
And the problems that are posed in biology are actually answerable by looking at the information that you can extract from the genome. So the Genographic Project is an embodiment of exactly that thought, that you carry the answer to your ancestry in your DNA. It's always there.
EDWARDS:
Right.
ROYYURU:
And the mystery really is how do you connect from what you can see in today's population to what must have been back there.
WELLS:
Right.
Yes, and again, it's that goal of trying to explain patterns of human diversity. I mean, there are 6,000 plus languages spoken in the world today. How did that pattern of diversity get generated? I mean, how did we go from a small population living in Africa, presumably speaking one or a few languages, to 6,000 all over the world?
EDWARDS:
Now, you mentioned, you know, we're going from a data set of perhaps 10,000 to 100,000. I mean, why is that valuable? I mean, 10,000 to me seems like a large number already.
WELLS:
There's 6.5 billion of us on the planet, so 10,000 isn't a great sample.
[LAUGHTER]
Not yet.
ROYYURU:
You know, Ben, if you just think of what does diversity mean in human population? You know, I'm aware of tribes in India, which is where I'm from, that have population sizes of two to three million. And we think of them as one tribe.
So if you think of, that is what makes up the diversity that we see on this planet, then 10,000 or 100,000 is actually a small number. To learn the most from this project what you want to do is have these 100,000 distributed in a manner that represents this diversity in the best possible manner.
So you have to actually work in each of the geographies, make contact with as many indigenous populations as you can, and even amongst them, try to get as diverse a population as possible.
EDWARDS:
I see.
ROYYURU:
It's better to sample 10 different villages and have 10 people from each of those villages than to get 100 people from one village, which means you really have to walk the planet.
WELLS:
And of those 10,000 to 15,000 people who have been genotyped today, most of them are coming from western Europe, they're coming from North America, they're coming from East Asia, Japan, China.
Most of the world has not been sampled adequately. You know, I'm off to Libya and Chad in a couple of weeks to sample some very interesting populations, a group of people who speak a language that's unrelated to any other in the world. And it may be a remnant of the first languages spoken in central Africa as humans were populating that continent 50,000 years ago.
There are only 750 speakers left, and they live in a couple of villages in southern Chad.
EDWARDS:
Wow.
WELLS:
You know, getting out there is not an easy thing to do, and that's the reason they haven't been sampled so far. But they could reveal something really important about early human history.
We have to go out and explain what it is we're going to do, the risks, which are minimal; and the benefits, which is finding out about your history. And get people excited. And most people really, when you start to explain that they're carrying this history book in their DNA, and that you can help them read it using the techniques of genetics, they're fascinated and they want to take part.
And you tell them, you've also got this genetic thread that connects you with people halfway around the world, and to me, and to people you've never met before and that I've never met before. And we can actually track that using DNA. They say, yes, that sounds really cool.
EDWARDS:
How did you come up with the idea of those kits, by the way? Because they're terrific.
[LAUGHTER]
WELLS:
Thanks a lot. Well, you know, this is something that those of us who work in the field, we've been approached by the public a lot in the past, people who have read about a paper, something that comes out in the New York Times, and they're like, that's really cool, I want to get tested. Can you test my DNA?
And academic labs are not really set up to do that. So we wanted to put a mechanism in place that would allow the public to take part and also add to the database. You know, we've sold, as of this morning, around 67,000 of those kits -- so that's a lot of data. This is already one of the largest genetic databases ever compiled.
EDWARDS:
Wow.
ROYYURU:
And Ben, you know, the manner in which we reach out to people both on the public as well as indigenous side really requires that we are very thoughtful and thorough in giving considerations to people's sensitivities, expectations of privacy, security of the data that you gather.
One way we are doing that is each of these kits is labeled with a participation ID which is anonymous. So if you as the participant receive a kit, it has some random number on it. That's basically all that we know about that kit.
The second thing we have done as infrastructure that we roll out into the field for gathering the data, as a technology partner we made sure that we do it in a manner that secures the data all the way from its origin.
So we've given ThinkPads, for example, that have a biometric fingerprint reader and have an encrypted database loaded on it. So as you acquire the data out in the field, it's secure right then and is transmitted as and when you connect on the wire to the central repository that we built at National Geographic which houses all the data.
And we have a stated goal to not attempt patenting any of this genetic information. This is data that we are obtaining from the people, and we will release all this data back to the public.
WELLS:
Yes, we see this as part of the common heritage of our entire species, and it's something everybody should have access to.
EDWARDS:
Can you tell me if someone has got a non-scientific, non-biological background, how the migratory history of my ancestors is recorded in my DNA?
WELLS:
Good question. You get your DNA from your parents, and they get their DNA from their parents and so on. So you have a line of descent that's traced through the passing on of this genetic information.
Now, when you're passing on this information, you have to copy it. And when you do, because it is very, very long, occasionally you make a mistake. They don't occur very often, but when they do occur and you pass them down through the generations, they become a marker of descent.
Then if you share a marker with someone else somewhere in the world, then you share an ancestor at some point in the past. And it could be a very recent ancestor a couple of generations ago, or it could be thousands of generations ago.
And it's by connecting up this ancestry through those markers that we can trace how people are related to each other and therefore how they've moved around the world.
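EDITOR'S NOTE:
The marker logic Wells describes can be illustrated with a minimal sketch. The individuals and marker labels below are hypothetical and are not project data. The sketch assumes only that a marker, once it arises, is passed on to every descendant, so that a shared marker implies a shared ancestor. Treating "more shared markers" as "more closely related" is a deliberate simplification.

```python
# Hypothetical individuals and marker labels, for illustration only.
MARKERS = {
    "person_a": {"m1", "m2", "m3"},
    "person_b": {"m1", "m2", "m4"},
    "person_c": {"m1", "m5"},
}


def shared_markers(a, b):
    """Return the markers carried by both individuals.

    Under the stated assumption, each shared marker was inherited
    from an ancestor the two individuals have in common.
    """
    return MARKERS[a] & MARKERS[b]


def closest_relative(person):
    """Return the other individual sharing the most markers with `person`.

    Simplification: the count of shared markers stands in for relatedness.
    """
    others = [p for p in MARKERS if p != person]
    return max(others, key=lambda p: len(shared_markers(person, p)))


print(sorted(shared_markers("person_a", "person_b")))  # ['m1', 'm2']
print(sorted(shared_markers("person_a", "person_c")))  # ['m1']
print(closest_relative("person_a"))                    # person_b
```

In this toy data all three individuals carry m1, so all share some ancestor, while person_a and person_b also share m2 and so sit on a common sub-branch.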
EDWARDS:
Now, before you embarked on this project, what were the kind of dominant and prevailing and sort of competing theories about human origins and migratory journeys and so on?
WELLS:
The big debate has been where did we originate as a species and when. Most evidence at the moment is pointing to Africa, and I think everyone would agree that ultimately our species did originate in Africa.
The question of when is still open to debate; probably within the last couple of hundred thousand years. That's on the basis of fossil evidence; the genetic data is clearly pointing to that as well.
So if we're in Africa a couple of hundred thousand years ago, when did we start to leave? How did we explode around the world? How did we go from...and again, the genetic data says the population size is relatively small; a few thousand individuals in Africa to six and a half billion today all over the world.
That's really unanswered.
EDWARDS:
How do these DNA samples get processed? How much data is involved in a sample?
ROYYURU:
At the moment I think it's of the order of maybe a few tens to few hundreds of megabytes per individual. We are looking at about between 10 to 20 markers from each individual.
For a male individual it's markers on the Y chromosome; for a female participant, it's markers from the mitochondrial DNA.
It's the number of participants that actually makes this project extremely interesting. When you put all of that together, it's a sizable...
EDWARDS:
I was just trying to work it out in my head and I couldn't...
[LAUGHTER]
ROYYURU:
A few terabytes at the moment.
EDWARDS:
A few terabytes, okay.
WELLS:
Yet ultimately this is not just a genetic project. It's motivated by an effort to understand diversity, history, anthropology and so on. It's using genetics as a tool to do that.
EDWARDS:
And once you've collected the data then what do you do with it? How do you process the information? How do you mine the information to extract your results?
ROYYURU:
There are some I would say key scientific questions that one has to address when you look at data like this. One specific question is how do you actually go about reconstructing or hypothesizing what might be a family tree that explains all this diversity we see today.
So in a sense you don't have knowledge of all those branches of the tree; what you see are just the leaves, the populations that exist today. If you go back in time, some of these populations had common ancestors and they had common ancestors and so on. So the tree that [you're stating it]...
EDWARDS:
You have to reconstruct the architecture of the tree.
ROYYURU:
Exactly. Exactly. And it may not be just one tree. It may not be just one tree that explains all that. But you want to have at least either the most parsimonious or most robust means by which you reconstruct the tree so that you can explain this diversity and recompile all the disparate information that stares at you today: the geography, the culture, the linguistics and so on.
We want to be able to reconcile all of that into one common view.
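EDITOR'S NOTE:
The idea of preferring the "most parsimonious" tree can be sketched as follows. This is an illustration only: it uses Fitch's small-parsimony algorithm on a single marker, which is a standard textbook technique chosen here as an assumption, and the transcript does not say which methods the project uses. The population names, marker states and candidate trees are hypothetical.

```python
def fitch_score(tree, leaf_states):
    """Count the minimum number of marker changes a tree requires.

    `tree` is a nested pair of leaf names, e.g. (("A", "B"), ("C", "D")).
    `leaf_states` maps each leaf name to its observed marker state.
    Uses Fitch's algorithm for one marker on a rooted binary tree.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed state
            return {leaf_states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        if a & b:                          # children agree: no change needed
            return a & b
        changes += 1                       # children disagree: one change
        return a | b

    visit(tree)
    return changes


# Hypothetical populations: A and B carry the marker (1), C and D do not (0).
states = {"A": 1, "B": 1, "C": 0, "D": 0}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch_score(tree_1, states))  # 1
print(fitch_score(tree_2, states))  # 2
```

Here tree_1 explains the observed leaves with a single change and tree_2 needs two, so tree_1 is the more parsimonious hypothesis. With real data the score would be summed over many markers and compared across many candidate trees.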
EDWARDS:
So Spencer, I'm taking a wild guess here, but I'm betting you've done a swab yourself.
WELLS:
I have.
EDWARDS:
So, tell us about your...what you found out about yourself and your history.
[LAUGHTER]
WELLS:
I'm African, as is everybody...
[LAUGHTER]
...ultimately. I...my ancestry on my father's side, this is the Y chromosome that I was looking up, I know that my father's family traces back to England a few hundred years ago via Connecticut.
And so I have a relatively common Y chromosome haplotype or set of linked markers that's found throughout southern England. Over 70 percent of the men in southern England have it.
Now, if you go a little bit further back, before they were in England my ancestors were living down in Spain. This is during the worst part of the last Ice Age. So much of Britain was covered with an ice sheet when you come down out of the North Pole.
And people living in Europe had withdrawn to what we call refugia. One was called the Franco-Cantabrian Refugium, it was down in Spain and southern France; another down in Italy south of the Alps, and a third in the Balkans.
So my ancestors were living down in Spain, so I'm effectively Basque [for] like 15,000 years. Before that, if you go back to around 35,000 years ago, they were living in central Asia of all places. So it's this tracing back down the tree, tracing that lineage and showing those connections.
Central Asia, probably hunting on the steppes woolly mammoths and so on, and they're sewing clothing out of fur. You know, during...again, we're still in an Ice Age, very extreme climatic conditions. And you know, these are people who probably were among the first to figure out how to live in these very cold, almost you know, Arctic zones, sub-Arctic zones.
Before they were in central Asia, they were in the Middle East. This was around 40,000 to 45,000 years ago, as part of the second migration, the major migration in fact out of Africa. And before that, they were back in Africa along with everybody else, around 50,000 years ago.
EDWARDS:
So there's two major migrations out of Africa?
WELLS:
As far as we can tell, yes. At the moment that's what it looks like. There was an early one probably around 50,000 to 55,000 years ago that took a southern coastal route, so along the south coast of Asia around the Arabian peninsula, southern India and ultimately ended up in Australia around 45,000 to 50,000 years ago, very rapid, mostly in the tropics. And so the way of life is very similar along the entire route. It's what we call a coastal super highway.
The second major migration, though, most of the people outside of Africa trace their ancestry back to a second one that came up through the Middle East, around 45,000 years ago. So at the moment that's what the data is telling us.
ROYYURU:
And Ben, you know, this is what makes this project really so interesting. There are an enormous number of questions that could possibly be at least approached if not answered authoritatively with this project.
To facilitate that we set up an advisory board that’s made up of geneticists, anthropologists, linguists, ethicists, representatives from indigenous groups, to really guide us in what are the questions that one needs to focus on and what will be the best way to connect with the rest of the community, not just genetics, in looking for answers to questions exactly like these.
I don’t think we’ll be able to answer every question, you know, it’s just not possible. But we want to focus on the key questions, which could possibly be answered by making the right connection between the data we gathered here versus what’s available out there in the field.
EDWARDS:
Well, good luck to both of you as you conduct this project over the next few years. It sounds fascinating.
WELLS:
Thanks a lot, Ben.
ROYYURU:
Thank you, Ben.
EDWARDS:
All right. Thanks.
[END OF SEGMENT].