Screening for diabetes mellitus in the US population using neural network-based modeling and complex survey designs

Complex survey designs are widely used in medical cohort studies. Developing risk score models that adequately account for the sampling design is essential to minimize selection bias and obtain representative population estimates. This work addresses three complementary objectives. First, we propose a general neural network-based predictive framework for regression and classification tasks that incorporates survey weights into model estimation. Second, we introduce a procedure for quantifying prediction uncertainty based on conformal inference, adapted to the characteristics of complex survey data. Third, we demonstrate the proposed methodology in a case study assessing the risk of diabetes mellitus in the US population, using the NHANES 2011–2014 cohort. The empirical results show that models of varying complexity, each using a different set of predictors, achieve different trade-offs between predictive performance and economic cost while maintaining generalizability at the population level. Although the case study focuses on diabetes, the proposed framework is directly applicable to the development of clinical prediction models for other diseases and other complex survey datasets. All software and data used in this study are publicly available on GitHub.

Keywords: survey data, neural networks, diabetes, disease scores