Development and validation of the elderlies' diabetes risk predictive model using the Chinese data

  • Yu Fu, Central University of Finance and Economics
Keywords: diabetes risk, predictors, China, CLHLS


Ageing is closely related to functional decline and is the predominant cause of chronic diseases such as cardiovascular disease, stroke and diabetes. Population ageing worldwide is increasing the prevalence of chronic disease. According to WHO reports, ageing China faces a higher diabetes risk than other countries. We adapt a machine learning algorithm, Extreme Gradient Boosting, to model the incidence of diabetes in China, using a large number of individual-level characteristics as predictors. The model achieves a prediction accuracy above 85%, which we attribute to minority-class oversampling and a multi-variable grid search. We apply the model to the 2000-2002 and 2011-2014 waves of the Chinese Longitudinal Healthy Longevity Survey (CLHLS) to investigate how the leading predictors of diabetes risk change over time. The importance of socio-economic status, lifestyle and access to medical services rises in the later wave, while the relative importance of isolation and stressful life events, which are related to social-psychological health, declines over the investigated period, indicating a disparity in diabetes risk among subgroups with different economic conditions.
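
The two tuning steps named in the abstract, minority-class oversampling and grid search, can be illustrated with a minimal sketch. This is not the paper's implementation. The abstract does not say which oversampling method was used, so random duplication is assumed here. The function names, the parameter grid and the toy scoring function are all hypothetical. In a real pipeline the scoring function would return the cross-validated accuracy of a gradient boosting model (for example `xgboost.XGBClassifier`), and oversampling would be applied to the training folds only.

```python
import itertools
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best (params, score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data (assumption): 8 non-diabetic (0) and 2 diabetic (1) respondents.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = oversample_minority(rows, labels, seed=42)

# Hypothetical grid; the names mirror common boosting hyperparameters.
grid = {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.3]}


def toy_score(params):
    # Placeholder for the cross-validated accuracy of a fitted model.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


best_params, best_score = grid_search(grid, toy_score)
```

Balancing the classes before fitting keeps the classifier from favouring the majority (non-diabetic) class, and the exhaustive grid makes the choice of hyperparameters reproducible.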

Author Biography

Yu Fu, Central University of Finance and Economics

Insurance School, Central University of Finance and Economics


How to Cite
Fu, Y. (2020). Development and validation of the elderlies’ diabetes risk predictive model using the Chinese data. Transactions on Machine Learning and Artificial Intelligence, 8(5), 16-26.