A model for health risks prediction based on inherited DNA variation and clinical data

Dr. Mykyta Artomov – Massachusetts General Hospital, Broad Institute, Nationwide Children’s Hospital, USA

The project is finished.

Simply Put

Complex interactions between inherited disease risks, lifestyle, and environment contribute significantly to individual trajectories of aging. A large amount of physiological and genetic data is required to predict critical timepoints of aging for a single individual – age of disease onset, overall health deterioration, and age of death. We aim to investigate such data from more than 180,000 individuals from Finland to understand which features define aging and contribute to the risks of diseases that impede healthy aging. The relevant components identified could then be brought together in a single statistical model to predict disease onset age and mortality.

Description

Aging is commonly associated with an increased risk of onset of complex phenotypes such as cardiovascular diseases, diabetes, and cancer. In this context, aging outcomes could be defined as the age of major disease onset, the age of critical health deterioration, or the age of death. Due to the high complexity of the genetic architecture and its interactions with the environment, modeling aging phenotypes requires large amounts of detailed clinical and genetic data.

In this project, data from several biobanks will be used to build a predictor for the individual risks of various aging outcomes. The available collections of exome sequencing, genotyping, and clinical record data enable both the construction of a predictor for disease onset age/mortality and its rigorous validation.

First, the data from Finnish biobanks assembled under the FinnGen project, one of the largest and most homogeneous collections of genetic and clinical data in the world, will be analyzed. In this project, more than 180,000 FinnGen participants with genetic data generated on genotyping arrays and clinical data with 3,882 data points per sample will be considered.

The clinical data for subsets of individuals with the most complete phenotypic information and a well-defined time of disease onset will be used:

  • to identify a set of phenotypic features describing common diseases of aging – cardiovascular problems, cancer, etc.
  • to work out an optimal protocol for handling clinical data
  • to define the set of clinical features most relevant for risk prediction that could be tested in currently healthy individuals
  • to design an optimal model for integrating clinical features into health risk predictions

Separately, the analysis of inherited genetic features will be performed using genotyping arrays that cover the entire genome with about 1,000,000 DNA variants. GWAS on FinnGen data allow evaluation of inherited risks contributing to the age of disease onset and mortality in the Finnish population. A polygenic risk score (PRS) model will initially be used to measure each individual's inherited predisposition to a phenotype conferred by the cumulative effect of relevant DNA alterations. Advanced statistical techniques will then be used to develop a more complex model for estimating individual inherited disease risks from the genotype data.
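As a minimal sketch of the basic PRS idea described above, a score can be computed as the sum of effect-allele counts weighted by per-variant effect sizes. The variant IDs, weights, and function name below are hypothetical illustrations, not values or code from the project.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs."""
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical per-variant weights (for example, log odds ratios from a GWAS).
example_weights = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}
# Hypothetical genotype: number of effect alleles (0, 1, or 2) carried at each variant.
example_person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

example_score = polygenic_risk_score(example_person, example_weights)
print(round(example_score, 4))
```

A raw score like this is only meaningful relative to a reference population, which is why such scores are usually reported as percentiles or standardized values.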

Initially, the analysis will be performed separately for inherited genetic features and clinical features to investigate technical specifics and the relative contribution of each feature type to overall disease onset age and mortality. Next, the relevant inherited risk features will be integrated with the relevant clinical data in a unified statistical model. The design and performance of candidate models will be investigated to create an optimal predictor. The effects of gender on prediction accuracy will be estimated to determine whether gender should be included as a feature in a unified model or whether building separate predictors for each gender achieves better predictive power.

After the statistical model with the best combination of important features has been validated on FinnGen data, the transferability of the model to other ancestries and cultural backgrounds will be investigated using similar datasets. Specifically, a well-phenotyped cohort from Russia will be used to replicate findings and investigate the effects of genetic diversity outside of the Finnish population.

First Results

The first case study results, “Supernormal Vascular Aging in Leningrad Siege Survivors,” were published in Frontiers in Cardiovascular Medicine on May 23, 2022.

Biological age can be described with a variety of markers ranging from clinical measurements, such as grip strength, to molecular-level signatures, such as DNA methylation. Blood vessel stiffness is one of the important structural metrics of biological age. The stiffness of the vessel is estimated by measuring the carotid-femoral pulse wave velocity (cfPWV). Comparing vascular age with chronological age yields three common phenotypes of vascular aging – normal, early vascular aging (EVA), and supernormal vascular aging (SUPERNOVA). Common factors affecting the likelihood of each phenotype include classic cardiovascular risk factors – smoking, hypercholesterolemia, hypertension, etc. Less is known about the role of inherited predisposition and of exposures in early stages of life, for example, stress and malnutrition.

In recently published work, the researchers describe a unique case study of two female patients who survived near-death starvation in early childhood during World War II. Both patients, despite a severe history of life-threatening exposures, presented a supernormal vascular aging phenotype during clinical visits (age 73 and 71 at the time of visit). In addition, neither patient had signs of dyslipidemia, kidney problems, or diabetes, and neither had received any antihypertensive or hypolipidemic medications. Furthermore, a common marker of subclinical atherosclerosis – carotid intima-media thickness – showed only minimal signs of thickening without the formation of cholesterol plaques.

The scientists investigated how such early exposures could be mitigated to delay cardiovascular system deterioration and promote healthy aging. Analysis of clinical and behavioral features showed that the diet of both patients did not follow any common healthy aging recommendations; however, there were no negative eating patterns either. Sufficient physical activity, parental longevity, favorable reproductive history, and a positive psychological state were among the distinctive clinical features.

Analysis of inherited susceptibilities was performed in comparison with 103 other patients with similar starvation and exposure experience in early childhood. Most notably, the polygenic risk scores for cfPWV in the two patients of interest were unremarkable – average compared to the rest of the cohort. However, both patients had a genetic susceptibility to lower high-density lipoprotein (HDL) levels, yet their actual measurements were within the normal range. Such a mismatch between low expected and normal observed HDL levels likely increases the potential protective effects of HDL against cardiovascular incidents.

This clinical study illustrates that a wide variety of factors shapes a favorable cardiovascular health trajectory throughout the lifespan. Even severely damaging early-life exposures can be mitigated when hereditary resistance to a disease and lifestyle practices are ideally aligned.

 

Uncovering Non-Additive Genetic Effects in Age-Related Disease, published in Nature Communications on December 13, 2025

Genome-wide association studies (GWAS) are a way to scan the DNA of very large numbers of people to find genetic differences linked to diseases. Most GWAS assume genetic effects are “additive” (each extra copy of a risk variant nudges risk up by a similar amount), but real biology isn’t always that simple: some variants act dominantly (one copy is enough to have the full effect) or recessively (you typically need two copies). Those “non-additive” patterns can reveal disease links that standard GWAS may miss – including in aging-related diseases, where many conditions have complex, multi-gene roots.
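To make the three inheritance patterns concrete, the sketch below shows how a genotype (the number of risk-allele copies, 0, 1, or 2) could be recoded under each model before being used as a predictor in an association test. The function name and model labels are illustrative assumptions, not taken from any specific GWAS tool.

```python
def encode_genotype(copies: int, model: str) -> int:
    """Recode a risk-allele count (0, 1, or 2) under a given inheritance model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    if model == "additive":
        return copies            # each extra copy adds the same amount
    if model == "dominant":
        return int(copies >= 1)  # one copy is enough for the full effect
    if model == "recessive":
        return int(copies == 2)  # two copies are needed for the effect
    raise ValueError(f"unknown model: {model}")


for model_name in ("additive", "dominant", "recessive"):
    print(model_name, [encode_genotype(c, model_name) for c in (0, 1, 2)])
```

Under the recessive coding, carriers of a single copy are grouped with non-carriers, which is why an additive test can dilute or miss such a signal.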

A recent publication from the Artomov Lab, supported by the Aging Biology Foundation, removes a major practical barrier: computational cost. Non-additive GWAS across thousands of conditions is usually too computationally expensive, so it has not been done broadly. The authors show a clever shortcut: use results from the usual additive GWAS to filter out genetic variants that are extremely unlikely to matter in a non-additive test, then run the heavier analysis only where it counts. In the FinnGen biobank (500,349 people; 2,329 diseases), this cut compute by about three orders of magnitude, dropping the estimated cost from about $27,000 to under $40, while still uncovering 781 new genetic regions missed by additive GWAS and then linking many of them to likely genes and mechanisms using fine-mapping and colocalization across 571 datasets.
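The general shape of such a two-stage scan can be sketched as follows. This is only an illustration of the screening idea under stated assumptions: the threshold value, the variant names, and the mock test function are hypothetical, and the filtering criterion actually used in the paper is not described here.

```python
from typing import Callable, Dict, List


def two_stage_scan(
    additive_p: Dict[str, float],
    non_additive_test: Callable[[str], float],
    screen_threshold: float,
) -> Dict[str, float]:
    """Run the expensive test only on variants that pass a lenient additive screen."""
    candidates = [v for v, p in additive_p.items() if p < screen_threshold]
    return {v: non_additive_test(v) for v in candidates}


calls: List[str] = []


def mock_non_additive_test(variant: str) -> float:
    """Stand-in for a costly non-additive association test; records each call."""
    calls.append(variant)
    return 1e-9  # placeholder p-value, not a real result


# Hypothetical additive GWAS p-values for four variants.
additive_results = {"variant_a": 0.8, "variant_b": 0.003, "variant_c": 0.4, "variant_d": 1e-6}
results = two_stage_scan(additive_results, mock_non_additive_test, screen_threshold=0.01)
print(f"expensive tests run: {len(calls)} of {len(additive_results)}")
```

The savings come from the fact that the cheap additive results already exist, so the cost of the second stage scales with the number of surviving candidates rather than with the full variant set.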

In the context of aging research, this makes it realistic to routinely look for “hidden” genetic effects across many diseases that rise with age (cardiovascular, metabolic, neurodegenerative, immune-mediated, and more). Genetic findings are especially valuable for prioritizing biological pathways and potential drug targets, and the paper emphasizes that genetic evidence can strengthen the chances that a drug target will succeed in clinical trials. This work not only reports new biological signals but also connects them to gene activity and proteins to suggest mechanisms. It makes finding clues for disease risk estimation and therapeutic target identification cheap and scalable, which can accelerate discovery across the full landscape of age-related disease.

Final Results

The inherited risk profile and the environmental exposures acquired throughout a lifetime shape disease risk at a given age. The key goal of this project was to leverage large-scale genetic and clinical data to create and validate statistical models for predicting the likelihood and timing of disease onset (and mortality).

UK Biobank and FinnGen were the two key datasets, providing longitudinal clinical data, lab measurements, and genetic data. They were used to build the predictive models, to estimate performance within and across datasets, and to investigate the interplay between different risk factors in order to suggest possible strategies for mitigating health risks.

Inherited risk for most common chronic diseases is spread across many genes and, as a result, should be assessed via metrics that accumulate risk effects across the genome, such as polygenic scores (PGS). This project has produced the first independent performance evaluation of the vast majority of PGS models available to date for predicting disease onset time and lifetime risks. Importantly, the off-target predictive power of PGS (when the PGS for a specific trait can also predict other traits) was used to construct optimized multi-PGS predictive models to further boost performance.

In this work, a major barrier to implementing open, personalized PGS interpretation was removed by creating a solution that enables genetic risk profiling of external subjects in comparison with ~500,000 FinnGen participants. This was achieved by creating parametrized population-based risk distributions that can be shared publicly, so that personalized genetic risk assessment can be performed locally, without sharing sensitive genetic data with external resources. The PGS browser (https://pgs.nchigm.org) is a publicly accessible resource that presents all of the performance estimates, provides access to the predictive models, and hosts the Docker images that can be used locally to estimate individual risk profiles (Kolosov et al, Nature Communications, 2026). These models have already been used by major biobanks to provide new insights into the structure of genetic risks across several diseases (Reeve et al, AJHG, 2025; Reeve et al, Nature Genetics, 2026; Figure 1).

Figure 1. Results and predictive models access via PGS Browser.
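The idea of comparing an individual against a shared, parametrized reference distribution can be sketched as follows. This sketch assumes, purely for illustration, that the reference distribution of a score is summarized by the mean and standard deviation of a normal distribution; the parametrization actually used by the PGS browser is not described here, and the numbers are hypothetical.

```python
from statistics import NormalDist


def local_percentile(score: float, ref_mean: float, ref_sd: float) -> float:
    """Percentile of a score within a reference population summarized as a normal distribution."""
    return 100.0 * NormalDist(mu=ref_mean, sigma=ref_sd).cdf(score)


# Hypothetical published summary parameters for one score in a reference cohort.
reference_mean = 0.0
reference_sd = 1.0

# The individual's score is computed and compared locally; only the two
# summary parameters above would ever need to be downloaded.
print(round(local_percentile(1.0, reference_mean, reference_sd), 1))
```

Because only summary parameters are shared, neither the reference cohort's individual-level data nor the external subject's genotype has to leave its own environment.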

Alongside the primary genetic predictive models, substantial contributions were made, as part of the technical groundwork, to the development of novel methods for understanding non-additive genetic effects and the impact of genetic data generation methods (Kolosov et al, PLOS ONE, 2022; Artomov et al, Genome Research, 2023; Usoltsev et al, Nature Communications, 2024).

Importantly, the resulting genetic predictive models are adjusted for genetic ancestry and are therefore broadly applicable (Figure 2).

Figure 2. Genetic ancestry adjustment for genetic predictive models.
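One commonly used way to adjust a score for ancestry is to regress it on genetic principal components and keep the residual. The sketch below does this with a single principal component and ordinary least squares. It is an assumption-laden illustration of the general idea, not the adjustment method used in this project, and the data are made up.

```python
from statistics import mean
from typing import List


def residualize(scores: List[float], pc1: List[float]) -> List[float]:
    """Remove the linear trend of scores along one ancestry principal component."""
    mean_pc, mean_score = mean(pc1), mean(scores)
    sxx = sum((x - mean_pc) ** 2 for x in pc1)
    sxy = sum((x - mean_pc) * (y - mean_score) for x, y in zip(pc1, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_pc
    # The residual is the part of the score not explained by the ancestry axis.
    return [y - (intercept + slope * x) for x, y in zip(pc1, scores)]


# Hypothetical raw scores and first principal component values for four people.
raw_scores = [0.1, 0.4, 1.1, 1.2]
pc1_values = [-1.0, 0.0, 1.0, 2.0]
adjusted_scores = residualize(raw_scores, pc1_values)
print([round(s, 3) for s in adjusted_scores])
```

In practice several principal components would typically be used, and the spread of the score may also need adjusting, so this single-component version should be read only as a conceptual sketch.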

The project has evaluated the performance of predictive models combining genetic and clinical features and quantified the importance of including both types of data across 58 common chronic and age-related diseases. On average, including genetic features increased model performance by 2.5%, with some diseases benefiting from an increase of up to 21%.

An alpha version of the public platform for personalized disease risk prediction includes automated estimation of ancestry-adjusted genetic risks (from multiple data types) and intake of clinical information relevant to predicting the target disease. It also enables exploration of feature importances to identify the potentially most impactful interventions. Furthermore, the results clearly highlight the impact of unified predictive models that capture both the inherited and clinical components of disease risk (Figure 3 shows an example of the risk trajectory predicted for the same subject under different heritable risk profiles).

Figure 3. Predicted risk profiles for Type 2 diabetes for the same subject assuming different genetic risk profiles, but keeping clinical features the same.

Public predictive models:

PGS browser – https://pgs.nchigm.org

Unified model – (public access coming in fall 2026).

Publications:

  1. N Kolosov, M P Reeve, P Della Briotta Parolo, M I Kurki, V Llorens, FinnGen, T Petteri Sipila, A Herman, I Molotkov, M Aavikko, S Ripatti, A Palotie, M J Daly, M Artomov. PGS Browser: a public platform for personalized polygenic score interpretation. Nature Communications, accepted, 05/2026.
  2. I Molotkov, M Kurki, FinnGen, A Palotie, MJ Daly, M Artomov. Novel method for scalable non-additive GWAS reveals 781 new loci across 2,329 traits. Nature Communications, 17, Article number: 580 (2026).
  3. M Reeve, M Kanai, D Graham, J Karjalainen, S Luo, N Kolosov, C Adams, J Ritari, K Karczewski, T Kiiskinen, Z Fuller, J Mehtonen, M Kurki, Z Khan, J Partanen, M McCarthy, M Artomov, T Tuomi, M Pirinen, J Kero, R Xavier, M Daly, S Ripatti, FinnGen. Autoimmune hypothyroidism GWAS reveals independent autoimmune and thyroid-specific contributions and an inverse relation with cancer risk. Nature Genetics, 58, 2026.
  4. D Usoltsev*, E Moguchaya*, M Boyarinova, E Kolesova, A Erina, K Tolkunova, N Paskar, A Alieva, E Vasilyeva, S Kibkalo, A Kostareva, A Konradi, E Shlyakhto, O Rotar, M Artomov. Analysis of Vascular Aging Phenotypes in a High Cardiovascular Risk Population. Scientific Reports, 2025.
  5. D Usoltsev, N Kolosov, O Rotar, A Loboda, M Boyarinova, E Moguchaya, E Kolesova, A Erina, K Tolkunova, V Rezapova, I Molotkov, O Melnik, O Freylikhman, N Paskar, A Alieva, E Baranova, E Bazhenova, O Beliaeva, E Vasilyeva, S Kibkalo, R Skitchenko, A Babenko, A Sergushichev, A Dushina, E Lopina, I Basyrova, R Libis, D Duplyakov, N Cherepanova, K Donner, P Laiho, A Kostareva, A Konradi, E Shlyakhto, A Palotie, M J Daly, M Artomov. Complex trait susceptibilities and population diversity in a sample of 4,145 Russians. Nature Communications, 15(1), 2024.
  6. MP Reeve, M Vehviläinen, S Luo, J Ritari, J Karjalainen, J Gracia-Tabuenca, J Mehtonen, S S Padmanabhuni, N Kolosov, M Artomov, H Siirtola, H M Olilla, D Graham, J Partanen, R J Xavier, M J Daly, S Ripatti, T Salo, M Siponen. Oral and non-oral lichen planus show genetic heterogeneity and differential risk for autoimmune disease and oral cancer. American Journal of Human Genetics, 111(6), 2024.
  7. M Artomov*,#, E Atkinson*,#, KJ Karczewski, AA Loboda, HL Rehm, DG MacArthur, BM Neale, MJ Daly. Discordant genotype calls across technology platforms elucidate variants with systematic errors in next-generation sequencing. Genome Research, 33(6), 2023.
  8. K Tolkunova, D Usoltsev, E Moguchaia, M Boyarinova, E Kolesova, A Erina, T Voortman, E Vasilyeva, A Kostareva, E Shlyakhto, A Konradi, O Rotar, M Artomov. Transgenerational effect of early childhood famine exposure in the cohort of offspring of Leningrad siege survivors. Scientific Reports, 13(1), 2023.
  9. N Kolosov, V Rezapova, O Rotar, A Loboda, O Freylikhman, O Melnik, A Sergushichev, C Stevens, T Voortman, A Kostareva, A Konradi, M J Daly, M Artomov. Genotype imputation and polygenic score estimation in northwestern Russian population. PLOS ONE, 17(6), 2022.