Assess the heart disease risk of the Chinese elderly using a predictive model


  • Yu Fu, Central University of Finance and Economics



heart disease, Extreme Gradient Boosting, predictive model, elderly in China


The accelerating aging process worldwide has made chronic diseases the predominant public health risk, and heart disease is among the leading causes of mortality in the elderly. Studies have verified that interventions can prevent, reduce or delay the onset of chronic diseases. This paper aims to find the domain predictors of heart disease by applying the machine learning technique Extreme Gradient Boosting to 89 predictors extracted from genetic, lifestyle, economic condition, isolation, stressful life events, nutrition and availability of medical service indexes. The individual-level data come from the Chinese Longitudinal Healthy Longevity Survey and cover two periods, 2000 to 2002 and 2011 to 2014. We apply imputation and oversampling techniques to improve prediction performance and use a step-by-step parameter tuning process to select the hyper-parameters needed for the modeling. The fitted predictive model reaches a prediction accuracy above 90% on the independent test data set. A comparison of the first investigated period, 2000 to 2002, with the second period, 2011 to 2014, shows that the predictors associated with economic condition play an important role in the prediction. The nutrition factor, surprisingly, does not contribute significantly to the prediction capability.
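The abstract does not say which imputation or oversampling methods were used, so the following is only a minimal sketch of the two preprocessing ideas, assuming median imputation and random oversampling of the minority class. The function names `impute_median` and `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (hypothetical): two predictors with gaps, label 1 = heart disease.
X = [[70, 1.0], [82, None], [91, 0.0], [None, 1.0], [88, 0.0]]
y = [0, 0, 0, 0, 1]

X_filled = impute_median(X)
X_bal, y_bal = random_oversample(X_filled, y)
```

In practice the oversampling step would normally be applied to the training split only, so that the independent test set keeps its original class balance.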






How to Cite

Fu, Y. (2020). Assess the heart disease risk of the Chinese elderly using a predictive model. Advances in Social Sciences Research Journal, 7(2), 251–262.