A model for health risks prediction based on inherited DNA variation and clinical data


Dr. Mykyta Artomov – Massachusetts General Hospital, Broad Institute, USA

Simply Put

Complex interactions between inherited disease risks, lifestyle, and environment contribute significantly to individual trajectories of aging. A large amount of physiological and genetic data is required to predict the critical timepoints of aging for a single individual – disease onset age, overall health deterioration, and age of death. We aim to investigate such data from more than 180,000 individuals from Finland to understand which features define aging and contribute to the risks of diseases that impede healthy aging. The relevant components identified could then be brought together in a single statistical model to predict disease onset age and mortality.


Aging is commonly associated with an increased risk of onset of complex phenotypes, such as cardiovascular diseases, diabetes, and cancer. In this context, aging outcomes could be defined as the age of major disease onset, the age of critical health deterioration, or the age of death. Because the genetic architecture is highly complex and interacts with the environment, modeling aging phenotypes requires large amounts of detailed clinical and genetic data.

In this project, data from several biobanks will be used to build a predictor for the individual risks of various aging outcomes. The available collections of exome sequencing, genotyping, and clinical record data enable both the construction of a predictor for disease onset age/mortality and its rigorous validation.

First, the data from Finnish biobanks assembled under the FinnGen project, one of the largest and most homogeneous genetic and clinical data collections in the world, will be analyzed. In this project, more than 180,000 FinnGen participants with genetic data generated on genotyping arrays and clinical data with 3,882 data points per sample will be considered.

The clinical data for subsets of individuals with the most complete phenotypic information and a well-defined time of disease onset will be used:

  • to identify a set of phenotypic features describing common diseases of aging – cardiovascular problems, cancer, etc.
  • to work out an optimal protocol for handling clinical data
  • to define a set of the clinical features most relevant to risk prediction that could be tested in currently healthy individuals
  • to design an optimal model for integrating clinical features into health risk predictions

Separately, the analysis of inherited genetic features will be performed using genotyping arrays that cover the entire genome with about 1,000,000 DNA variants. Genome-wide association studies (GWAS) on FinnGen data allow evaluation of the inherited risks contributing to the age of disease onset and mortality in the Finnish population. A polygenic risk score (PRS) model will initially be used to measure, for each individual, the inherited predisposition to a phenotype conferred by the cumulative effect of relevant DNA alterations. Advanced statistical techniques will then be used to develop a more complex model for estimating individual inherited disease risks from the genotype data.
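As an illustration of the cumulative effect described above, a polygenic risk score is commonly computed as a weighted sum of effect-allele counts, with the weights taken from GWAS effect sizes. The sketch below assumes this standard additive form; the function name, the variant identifiers, and the weights are hypothetical and are not taken from the project.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Additive PRS: sum of (effect size * effect-allele dosage).

    dosages: effect-allele count per variant for one person (0, 1, or 2,
             or a fractional value for imputed genotypes).
    weights: per-variant effect sizes, e.g. GWAS betas.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Assumption: variants missing for this person are skipped.
            continue
        score += beta * dosage
    return score


# Hypothetical variants and effect sizes, for illustration only.
example_weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
example_person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

prs = polygenic_risk_score(example_person, example_weights)
print(f"PRS = {prs:.2f}")
```

In practice the raw score is usually standardized against a reference cohort before it is interpreted, since its absolute value depends on the number of variants and the scale of the weights.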

Initially, the analysis will be performed separately for inherited genetic features and clinical features to investigate technical specifics and the relative contribution of each feature type to overall disease onset age and mortality. Next, the relevant inherited risk features will be integrated with the relevant clinical data in a unified statistical model. The design and performance of candidate models will be investigated to create an optimal predictor. The effect of gender on prediction accuracy will be estimated to determine whether gender should be included as a feature in the unified model or whether building a separate predictor for each gender achieves better predictive power.

After the statistical model with the best combination of important features has been validated on FinnGen data, the transferability of the model to other ancestries and cultural backgrounds will be investigated using similar datasets. Specifically, a well-phenotyped cohort from Russia will be used to replicate the findings and investigate the effects of genetic diversity outside the Finnish population.


The first case study results, “Supernormal Vascular Aging in Leningrad Siege Survivors,” were published in Frontiers in Cardiovascular Medicine on May 23, 2022.

Biological age can be described with a variety of markers ranging from clinical measurements, such as grip strength, to molecular-level signatures, such as DNA methylation. Blood vessel stiffness is one of the important structural metrics of biological age. Vessel stiffness is estimated by measuring the carotid-femoral pulse wave velocity (cfPWV). When vascular age is compared with chronological age, three common phenotypes of vascular aging emerge – normal, early vascular aging (EVA), and supernormal vascular aging (SUPERNOVA). Common factors affecting the likelihood of each phenotype include classic cardiovascular risk factors – smoking, hypercholesterolemia, hypertension, etc. Less is known about the role of inherited predisposition and of exposures in the early stages of life, such as stress and malnutrition.

In the recently published work, the researchers describe a unique case study of two female patients who survived near-death starvation in early childhood during World War II. Despite this history of life-threatening exposures, both patients presented a supernormal vascular aging phenotype during clinical visits (at ages 73 and 71). In addition, neither patient had signs of dyslipidemia, kidney problems, or diabetes, and neither had received any antihypertensive or hypolipidemic medications. Furthermore, carotid intima-media thickness measurements, a common marker of subclinical atherosclerosis, showed only minimal signs of thickening without the formation of cholesterol plaques.

The scientists investigated how such early exposures could be mitigated to delay cardiovascular system deterioration and stimulate healthy aging. Analysis of clinical and behavioral features showed that the diet of both patients did not follow any common recommendations for healthy aging; however, there were no negative eating patterns either. Sufficient physical activity, parental longevity, favorable reproductive history, and a positive psychological state were among the distinctive clinical features.

Analysis of the inherited susceptibilities was performed in comparison with 103 other patients who had similar experience of starvation and exposures in early childhood. Most notably, the polygenic risk scores for cfPWV in the two patients of interest were unremarkable – average compared to the rest of the cohort. However, both patients had a genetic susceptibility to lower high-density lipoprotein (HDL) levels, yet their actual measurements were within the normal range. Such a mismatch between low expected and normal observed HDL levels likely increases the potential protective effects of HDL against cardiovascular incidents.
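One common way to judge whether an individual's polygenic risk score is "unremarkable" relative to a cohort is to standardize it against the cohort distribution. The sketch below assumes a simple z-score; it is not the method reported in the publication, and the function name and all numbers are invented for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def cohort_z_score(score: float, cohort_scores: Sequence[float]) -> float:
    """Number of cohort standard deviations between a score and the cohort mean."""
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation
    return (score - mu) / sigma


# Invented cohort scores and patient score, for illustration only.
cohort = [-1.0, -0.5, 0.0, 0.5, 1.0]
patient_score = 0.1

z = cohort_z_score(patient_score, cohort)
print(f"z = {z:.2f}")
```

A z-score close to zero means the individual sits near the cohort average, which corresponds to the "average compared to the rest of the cohort" description above.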

This clinical study illustrates that a wide variety of factors shapes a favorable cardiovascular health trajectory throughout the lifespan. Even severely damaging early-life exposures can be mitigated when hereditary resistance to a disease and lifestyle practices align well.