The multinational research effort, led by Harvard Medical School geneticists and published Sept. 21 in Nature, also suggests that no single gene can explain the significant cultural and cognitive progress in human development that occurred about 50,000 years ago.
The study represents the largest data set yet of high-quality genome sequences from understudied populations, adding nearly 6 million DNA base pairs to the “canonical” human genome sequence published in 2001.
A heat map showing locations of previously unknown DNA variants. Red indicates a higher number of discoveries, black fewer.
Most genome-wide population sequencing studies to date have focused on a handful of large populations. The HMS-led study, by comparison, sequenced samples from 142 smaller populations, most of which were previously understudied.
“As humans, we are not just the people who live in industrialized countries, and we are not just the people who live in numerically large groups,” said David Reich, professor of genetics at HMS and senior author of the study. “If we want to understand who we really are, we have to realize that some of the most interesting aspects of human variation are only present in underrepresented, small populations.”
“We wanted to go out into the world and pull together as many of the ethnically, linguistically and anthropologically diverse samples as we possibly could,” said Swapan Mallick, bioinformatic systems director in the Reich lab and first author of the study.
The team’s analyses are already answering questions about various populations’ genetic origins, but, the researchers note, these insights are only a milestone on a longer journey.
“Of course, there are thousands of ethnically distinct populations in the world, and much more work needs to be done,” said Mallick.
Reich, Mallick and their international team of colleagues began by selecting two genomes each from 51 populations represented in a collection called the Human Genome Diversity Project. Next, they assembled samples from members of 91 other groups, including diverse Native American, South Asian, and African populations not previously included in genome-wide studies, and sent the DNA for sequencing. In all, the project analyzed the genomes of 300 people.
A key conclusion, that the vast majority of modern human ancestry in non-Africans derives from a single population that migrated out of Africa, is also supported by two other whole-genome sequencing studies appearing simultaneously in Nature. One, led by an Estonian group, focused on 379 whole-genome sequences; the other, led by a Danish group, analyzed 108 Australians and New Guineans.
Together, the three studies put to rest a lingering question about whether indigenous peoples of Australia, New Guinea and the Andaman Islands descend in large part from a second group that left Africa earlier and skirted the coast of the Indian Ocean. They do not, the HMS researchers say.
“Our best estimate for the proportion of ancestry from an early-exit population is zero,” said Reich, who is also an investigator of the Howard Hughes Medical Institute and an associate member of the Broad Institute. “Taken together, all three studies leave wiggle room for, at most, around two percent.”
The HMS-led study further revealed that the common ancestors of modern humans began to differentiate at least 200,000 years ago, long before the out-of-Africa dispersal occurred.
“It had been unclear whether the group that expanded out of Africa represented a large subset of the populations within Africa,” said Mallick. “This really shows that there was a lot of substructure prior to the expansion.”
The additional discovery that genetics alone can’t account for the acceleration of cultural, economic and intellectual progress in the last 50,000 years runs contrary to a popular hypothesis in the field.
“There does not seem to have been one or a few enabling mutations that suddenly appeared among our ancestors and allowed them to think in profoundly different ways,” said Reich.
Instead, the researchers say, a constellation of factors, including environment, lifestyle, and possibly genes, precipitated the rapid changes that occurred.
“Geneticists often search for examples where genetics is the explanation. Here, paradoxically, genetic data are showing that there will be no clear genetic answers,” Reich said.
Mallick and colleagues overcame significant logistical hurdles posed by sharing and processing an enormous amount of data.
Often, in studies of this size, data are collected in many laboratories that use different sequencing machines and different experimental protocols. This can create so-called batch effects, technical artifacts that make it difficult to tell true differences among samples from differences introduced by the way the samples were processed. The current study minimized batch effects by sending all of the samples to a single center to be sequenced at the same time.
The team made much of the data set publicly available in 2014; multiple research groups have already used it for their studies.
In a way, the authors say, the findings reported thus far are just the tip of the iceberg.
“It’s impossible for our group to analyze even a tiny fraction of what the data represents,” said Mallick. “Our goal is to push the data out and let people use it to consider their own questions.”
Primary funding for the study, called the Simons Genome Diversity Project, was provided by the Simons Foundation (SFARI 280376) and the National Science Foundation (BCS-1032255).