What drives the reduced prediction accuracy of polygenic scores in non-European individuals?

Polygenic risk scores (PRS) summarise the genetic information spread across many genetic variants into a single number. This number can be used to predict an individual’s phenotype or, more realistically, to place an individual in a risk group according to their PRS.
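
To make the "single number" concrete, here is a minimal sketch of a PRS as a weighted sum of allele dosages, assuming the effect sizes are already available. The function name, the weights, and the genotypes are all made up for illustration and do not come from any real study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes, e.g. taken from GWAS summary statistics.
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes for three individuals, one dosage per variant.
genotypes = {
    "ind_a": [0, 1, 2],
    "ind_b": [2, 2, 0],
    "ind_c": [1, 0, 1],
}

# One score per individual, then a ranking from highest to lowest score.
scores = {name: polygenic_score(g, weights) for name, g in genotypes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking individuals by score, rather than reading the score as a phenotype prediction, is what placing people into risk groups amounts to in practice.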

Many different steps are involved in calculating a meaningful PRS. For instance, the quality of the summary statistics is paramount. The choice of which variants to include in the PRS is also far from trivial and can involve simple ad hoc approaches such as clumping and thresholding (C+T), or model-based approaches such as LDpred and LDpred2. Finally, there are the issues of validation and overfitting in the validation set, as well as the even more complicated issue of transferability of polygenic risk scores across ancestries. The interplay of this complex array of factors in determining prediction accuracy across ancestries is the focus of this project.
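
As a rough illustration of the variant-selection step, here is a simplified sketch of greedy clumping and thresholding. It assumes pairwise LD (r²) values are supplied in a lookup table and ignores the genomic distance windows that real tools apply. The variant IDs, p-values, LD values, and cutoffs are all hypothetical.

```python
def clump_and_threshold(stats, r2, p_threshold, r2_cutoff):
    """Greedy C+T sketch: visit variants from most to least significant,
    keeping a variant only if it passes the p-value threshold and is not
    in high LD with a variant that has already been kept."""
    kept = []
    for variant in sorted(stats, key=lambda v: v["p"]):
        if variant["p"] >= p_threshold:
            break  # every remaining variant is less significant
        in_ld = any(
            r2.get(frozenset((variant["id"], k["id"])), 0.0) >= r2_cutoff
            for k in kept
        )
        if not in_ld:
            kept.append(variant)
    return kept


# Hypothetical summary statistics: variant ID, effect size and p-value.
stats = [
    {"id": "snp1", "beta": 0.30, "p": 1e-8},
    {"id": "snp2", "beta": 0.28, "p": 1e-6},
    {"id": "snp3", "beta": 0.01, "p": 0.2},
    {"id": "snp4", "beta": -0.10, "p": 1e-4},
]

# Hypothetical pairwise LD; pairs that are not listed are treated as r2 = 0.
r2 = {frozenset(("snp1", "snp2")): 0.9}

selected = clump_and_threshold(stats, r2, p_threshold=1e-3, r2_cutoff=0.1)
selected_ids = [v["id"] for v in selected]
```

Both cutoffs are tuning parameters, which is one place where overfitting to the validation set can creep in. The LD table is also specific to a reference population, so the set of variants that survives clumping can differ between ancestries.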

I have been actively working on this with Iain Mathieson. We have a preprint and I have given several talks on this topic recently.

This is an ongoing topic of interest for me, and I am actively looking for guest-lecturing and collaboration opportunities, so do reach out if you’re interested in either.
