Predicting patient hospital charges using machine learning


  • Dolley Shukla, Shri Shankaracharya Technical Campus, Bhilai, India
  • Preeti Chandrakar, Shri Shankaracharya Technical Campus, Bhilai, India



hospital charges, machine learning, World Health Organization


As the health care system moves toward value-based care, the Clinical Management System (CMS) has designed a number of programs to improve the quality of patient care. One of these, the Hospital Patient Admission Cost Analysis Program, helps patients and hospitals diagnose disease and estimate the cost of hospitalization. According to the World Health Organization (WHO), personal and medical costs have risen faster than the global economy. Major factors that drive this increase in expenditure include smoking, ageing and increased Body Mass Index (BMI). In this study, we examine the correlation between medical costs and individual characteristics, using insurance data that records smoking status, age, number of children, region and BMI. The study also demonstrates different regression models that can be used to forecast insurance costs. Machine learning significantly reduces human effort, because a trained model can compute cost estimates in far less time than a manual calculation would take.
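The kind of analysis the abstract describes can be illustrated with a minimal sketch. It is not the authors' code and does not use their dataset. It assumes synthetic records with invented coefficients, omits the region attribute for brevity, and fits an ordinary least-squares linear regression with NumPy as one possible regression model. All function and variable names below are hypothetical.

```python
import numpy as np

# Feature names follow the attributes listed in the abstract (region omitted).
FEATURES = ["age", "bmi", "children", "smoker"]


def make_synthetic_data(n=500, seed=0):
    """Generate synthetic records; every coefficient here is invented."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 65, size=n).astype(float)
    bmi = rng.normal(30.0, 6.0, size=n)
    children = rng.integers(0, 6, size=n).astype(float)
    smoker = rng.integers(0, 2, size=n).astype(float)  # 1 = smoker, 0 = non-smoker
    noise = rng.normal(0.0, 2000.0, size=n)
    charges = 250.0 * age + 300.0 * bmi + 500.0 * children + 20000.0 * smoker + noise
    X = np.column_stack([age, bmi, children, smoker])
    return X, charges


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] match FEATURES


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


X, y = make_synthetic_data()

# Pearson correlation of each feature with the charges.
corr = {name: np.corrcoef(X[:, i], y)[0, 1] for i, name in enumerate(FEATURES)}

# Simple hold-out evaluation: first 400 rows for training, the rest for testing.
split = 400
coef = fit_linear_regression(X[:split], y[:split])
r2 = r2_score(y[split:], predict(coef, X[split:]))

for name in FEATURES:
    print(f"{name:>8}: correlation with charges = {corr[name]:.3f}")
print(f"held-out R^2 = {r2:.3f}")
```

On real insurance data the same steps would apply after encoding categorical attributes such as region and smoking status. Other regression models could be compared on the same hold-out split.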







