
Affecting up to 216,000 studies – Popular genetic method proven to be deeply flawed


A new study reveals flaws in a common analytical method in population genetics.

According to recent research from Lund University in Sweden, the most commonly used analytical method in population genetics is deeply flawed. This could have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, influencing results in medical genetics and even commercial ancestry testing. The findings were recently published in the journal Scientific Reports.

The rate at which scientific data can be collected is rapidly increasing, producing huge and highly complex datasets in what has been dubbed the “Big Data Revolution”. To make the data more manageable, researchers use statistical techniques that condense and simplify it while retaining most of the important information. PCA (Principal Component Analysis) is perhaps the most widely used of these approaches. Imagine PCA as an oven, with flour, sugar, and eggs serving as the input data. The oven always does the same thing, but the end result, a cake, depends heavily on the proportions of the ingredients and how they are mixed together.

“This method is expected to give decent results because it is so frequently used. But frequent use is no guarantee of reliability, nor does it produce statistically robust conclusions,” says Dr. Eran Elhaik, associate professor of molecular cell biology at Lund University.

According to Elhaik, the method has helped entrench outdated beliefs about race and ethnicity. It plays a role in shaping historical narratives about who people are and where they came from, narratives produced not only by the scientific community but also by commercial ancestry companies. A well-known example is a prominent American politician who took an ancestry test ahead of the 2020 presidential campaign to back up their ancestral claims. Another example is the misconception, driven by PCA results, that Ashkenazi Jews are an isolated group or race.

“This study demonstrates that these results were not reliable,” says Eran Elhaik.

PCA is used in many scientific fields, but Elhaik’s study focuses on its use in population genetics, where the explosion in dataset sizes is particularly acute, driven by the falling cost of DNA sequencing.

The field of paleogenomics, which seeks to learn about ancient peoples and individuals such as Copper Age Europeans, relies heavily on PCA. PCA is used to create a genetic map that positions an unknown sample alongside known reference samples. Thus far, unknown samples have been assumed to be related to whichever reference population they overlap or lie closest to on the map.

However, Elhaik discovered that the unknown sample could be made to lie close to virtually any reference population just by changing the numbers and types of the reference samples. This generates practically endless historical versions, all mathematically “correct,” of which at most one can be biologically correct.
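The effect described above can be illustrated with a toy sketch. It is not the study's code or data: the three reference populations (A, B, C), their two-dimensional coordinates, the unknown sample, and the function names are all hypothetical, and each population is idealized as identical copies of a single point. The sketch fits the first principal component on the reference samples only (found here by simple power iteration rather than a library routine), projects the unknown sample onto that one-dimensional "map", and reports which reference population it lands closest to as the sample counts change.

```python
import math

def first_pc(samples, iters=500):
    """Return (mean, unit vector of the first principal component)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly applying cov pulls v toward the
    # eigenvector with the largest eigenvalue, i.e. PC1.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(point, mean, v):
    """Coordinate of a point along PC1 after centering."""
    return sum((p - m) * c for p, m, c in zip(point, mean, v))

# Hypothetical toy data: one coordinate pair per reference population.
CENTERS = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
UNKNOWN = (8.0, 8.0)

def closest_on_pc1(counts):
    """Name of the reference population nearest the unknown sample on PC1."""
    samples = [CENTERS[name] for name, k in counts.items() for _ in range(k)]
    mean, v = first_pc(samples)
    t = project(UNKNOWN, mean, v)
    dist = {name: abs(t - project(c, mean, v)) for name, c in CENTERS.items()}
    return min(dist, key=dist.get)

# Same populations and same unknown sample; only the sample counts differ.
for counts in ({"A": 20, "B": 20, "C": 2},
               {"A": 20, "B": 2, "C": 20},
               {"A": 2, "B": 20, "C": 20}):
    print(counts, "->", closest_on_pc1(counts))
```

In this toy setup the unknown sample lands nearest B, then C, then A, even though in the original two-dimensional space it is equally distant from B and C and farthest from A. Real analyses use thousands of genetic markers and usually two or more components, so this is only a small-scale picture of the sensitivity to reference sampling that the study describes.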

In the study, Elhaik examined the twelve most common population genetic applications of PCA, using both simulated and real genetic data to show just how flexible PCA results can be. According to Elhaik, this flexibility means that conclusions based on PCA cannot be trusted, since any change to the reference or test samples will produce different results.

Between 32,000 and 216,000 scientific articles in genetics alone have employed PCA for exploring and visualizing similarities and differences between individuals and populations and based their conclusions on these results.

“I believe these results must be re-evaluated,” says Elhaik.

He hopes that the new study will spur the development of better approaches to questioning results and thus help make science more reliable. He has spent a significant portion of the past decade pioneering such methods, including Geographic Population Structure (GPS), which predicts biogeography from DNA, and the Pairwise Matcher, which improves the case-control matches used in genetic tests and drug trials.

“Techniques that offer such flexibility encourage bad science and are particularly dangerous in a world where there is intense pressure to publish. If a researcher runs PCA several times, the temptation will always be to select the output that makes the best story,” adds Professor William Amos of the University of Cambridge, who was not involved in the study.

Reference: “Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated” by Eran Elhaik, 29 August 2022, Scientific Reports.
DOI: 10.1038/s41598-022-14395-4

