Show me your DNA, I’ll tell you where you come from

Rhizome Journal Club

Starting in 2020, Rhizome members meet up once a month to discuss a topic that one of them finds relevant for our community! You can read about these topics in our new monthly blog post series, Journal Club.

Contributed blog post by Maxime Borry (@notmaxib)

Each of us has a unique genome, yet the difference between two random humans on this planet is tiny: around 0.1% on average.

But don’t let this number fool you! This 0.1% difference between you and me contains enough information to tell much more about you than you’d suspect.
While our genome contains around 3.2 billion letters, or nucleotides (the A, T, G and C that make up our DNA), scientists can use just a fraction of them, around 200’000, to guess which country you’re from.

Single variation between you and me

If you ask a population geneticist about the difference between two individuals, they’ll probably tell you about SNPs.
While our Scottish readers might think of their governing party, SNP in this context stands for Single Nucleotide Polymorphism.
Single as in one position, polymorphism as in several possible nucleotides at that position.
But let’s not get ahead of ourselves: let’s talk about the fundamentals first.
As I mentioned above, each of us is defined by the 3.2 billion nucleotides that make up our genome. Just as the reference meter is defined as the length of the path travelled by light in vacuum during a tiny fraction of a second, biologists have defined a reference human genome, currently named hg38.

On average, your genome differs from hg38 at about 0.1% of its positions. When at least 0.5% of humans carry a different nucleotide at a given position of hg38, biologists call that position a SNP.

(Image source: Wikipedia)
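To make the definition a little more tangible, here is a toy sketch in Python. It assumes every person’s genome is a single, already aligned string with the same length as the reference. That means one copy per person and no insertions or deletions, which is a big simplification: real variant calling works from sequencing reads and diploid genotypes. The function name `call_snps` is hypothetical. It flags a position as a SNP when at least 0.5% of individuals differ from the reference there.

```python
from collections import Counter

def call_snps(reference, genomes, min_freq=0.005):
    """Toy SNP caller: return the positions where at least `min_freq` of the
    individuals carry a nucleotide different from the reference.

    Assumes every genome is one aligned string of the same length as
    `reference` (no insertions/deletions, one copy per person).
    """
    n = len(genomes)
    snps = {}
    for pos, ref_base in enumerate(reference):
        # tally the nucleotides seen at this position across all individuals
        counts = Counter(genome[pos] for genome in genomes)
        n_different = n - counts[ref_base]
        if n_different / n >= min_freq:
            snps[pos] = dict(counts)
    return snps

reference = "ACGTACGTAC"
# 199 people match the reference; one person has a G instead of an A at position 4
genomes = [reference] * 199 + ["ACGTGCGTAC"]
print(call_snps(reference, genomes))  # {4: {'A': 199, 'G': 1}}
```

With one person out of 200, the variant sits exactly at the 0.5% threshold. Add a 201st person matching the reference, and position 4 no longer counts as a SNP.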

Many variations can tell us a lot

0.1% of 3.2 billion nucleotides is roughly 3.2 million variations! To study them, scientists have come up with assays that look at a subset of these variations. Using one of these assays, Novembre et al. showed that they could predict the geographic origin of a person solely from their genetic data. To do so, they looked at 197’146 SNPs in 1’387 European individuals of known geographic origin.

Genetic data is like gravy: it’s best when reduced

I don’t know about you, but when I look at a graph in three dimensions, it already takes me a few seconds to figure out what is what. Now imagine a graph in 197’146 dimensions, each of which represents a SNP!
Luckily for us, the authors used a technique called Principal Component Analysis (PCA) to reduce the number of dimensions to only two. After a slight 16° rotation of the graph, they projected it onto the map of Europe. The resulting figure can be seen in their paper.

Projected onto two dimensions, each individual’s SNP data matches their self-reported country of origin darn well. Pretty interesting result, no?
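Here is a minimal Python/NumPy sketch of the PCA-plus-rotation step described above. It is not the authors’ pipeline, and the function name `pca_2d` is hypothetical. It assumes genotypes are coded as the number of non-reference copies (0, 1 or 2) each person carries at each SNP, a common convention. It centres each SNP, keeps the first two principal components through a singular value decomposition and rotates them by 16° counterclockwise. Real analyses usually also scale each SNP and apply quality filters. The sign of each principal component is arbitrary, so the rotation direction that lines the plot up with a map is an assumption here.

```python
import numpy as np

def pca_2d(genotypes, rotation_deg=16.0):
    """Project a genotype matrix (n_individuals x n_snps, values 0/1/2)
    onto its first two principal components, then rotate the 2D cloud."""
    X = np.asarray(genotypes, dtype=float)
    # centre each SNP (column) so PCA describes variation around the mean
    X = X - X.mean(axis=0)
    # SVD: singular values come sorted, so the first two columns of U give PC1, PC2
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = U[:, :2] * S[:2]  # coordinates of each individual on PC1 and PC2
    theta = np.deg2rad(rotation_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return pcs @ rot.T  # counterclockwise rotation of every point

# toy data: two "populations" of 50 people with slightly different allele frequencies
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.1, 0.9, size=1000)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.1, size=1000), 0.01, 0.99)
pop_a = rng.binomial(2, freqs_a, size=(50, 1000))
pop_b = rng.binomial(2, freqs_b, size=(50, 1000))
coords = pca_2d(np.vstack([pop_a, pop_b]))
print(coords.shape)  # (100, 2)
```

Each person goes from 1000 numbers down to two coordinates that can be plotted, or, as in the paper, laid over a map.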

From a 2D graph to prediction of geographic origin

Using the two dimensions of this graph, the authors built a simple mathematical model (a linear regression) to predict a person’s geographic origin from their SNP profile. With this method, they placed 90% of the 1’387 European individuals within 840 km of their actual geographic origin.
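Here is one way such a model could look, as a sketch under stated assumptions. Latitude and longitude are each regressed on PC1 and PC2 with ordinary least squares, and the prediction error is measured as great-circle distance on a spherical Earth. This is our reading of “simple linear regression”, not necessarily the authors’ exact setup, and the function names are hypothetical. The “90% within 840 km” statement then corresponds to `fraction_within(pred, true, 840)` returning 0.9.

```python
import numpy as np

def _design(pcs):
    # intercept column followed by PC1 and PC2
    pcs = np.asarray(pcs, dtype=float)
    return np.column_stack([np.ones(len(pcs)), pcs])

def fit_origin_model(pcs, lat_lon):
    """Least-squares fit of (latitude, longitude) as linear functions of PC1 and PC2."""
    coef, *_ = np.linalg.lstsq(_design(pcs), np.asarray(lat_lon, dtype=float), rcond=None)
    return coef  # shape (3, 2): intercept, PC1, PC2 weights for lat and for lon

def predict_origin(coef, pcs):
    return _design(pcs) @ coef

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between rows of (lat, lon) arrays given in degrees."""
    lat1, lon1 = np.radians(a[:, 0]), np.radians(a[:, 1])
    lat2, lon2 = np.radians(b[:, 0]), np.radians(b[:, 1])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(h))

def fraction_within(pred, true, max_km):
    """Share of individuals whose predicted origin is within max_km of their true origin."""
    return float(np.mean(haversine_km(pred, true) <= max_km))
```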

My DNA is a bit of your DNA, and it’s precious

Dear reader, I probably don’t know you personally, but as you’ve seen above, we share around 99.9% of our DNA.
And, as you might guess, you share even more with members of your family, in decreasing order of shared DNA: your parents and siblings, then your cousins, and so on.
These days, many companies offer to sequence your DNA for a moderate fee, promising insights into your ancestry and health. While this might seem like a tempting offer, be aware that you will share not only your own DNA with these companies, but also a large part of your family’s DNA, and your relatives might not exactly be ok with the FBI knowing about it.
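To put rough numbers on “decreasing order of shared DNA”, here is a small Python sketch of Sewall Wright’s classic coefficient of relationship. Each parent-to-child link on the path between two relatives halves the expected share of DNA inherited from a common ancestor, and contributions through different common ancestors add up. These are averages on top of the ~99.9% all humans share. The sketch assumes no inbreeding, and real siblings scatter around the 50% figure. The function name is illustrative only.

```python
def expected_shared_fraction(paths):
    """Expected fraction of the genome shared by descent (coefficient of relationship).

    `paths` lists, for each common ancestor, the number of parent-child links
    on the path from you up to that ancestor and down to the relative.
    Each link halves the expected share; paths via different ancestors add up.
    Assumes no inbreeding.
    """
    return sum(0.5 ** links for links in paths)

relatives = {
    "parent": [1],             # the parent is the common ancestor: one link
    "full sibling": [2, 2],    # via mother and via father
    "grandparent": [2],
    "aunt/uncle": [3, 3],      # via both shared grandparents
    "first cousin": [4, 4],    # via the two shared grandparents
}
for name, paths in relatives.items():
    print(f"{name:>13}: {expected_shared_fraction(paths):.1%}")
```

Parents and siblings come out at 50%, grandparents, aunts and uncles at 25% and first cousins at 12.5%. That is why one person’s test reveals a lot about their whole family.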

So before sending your saliva off for genome sequencing, think twice.