Predicting Rehabilitation Length of Stay in Singapore Tertiary Rehab: Insights from Machine Learning Models

Project Categories

Care Continuum

Care Process & Redesign

Technology

Platform

National Healthcare Innovation and Productivity Medals

Cluster

National Healthcare Group

28 January 2026

This project aims to use machine learning models to predict Rehabilitation Length of Stay (RLOS) and categorize patients. This study successfully developed a machine learning model capable of predicting rehabilitation length of stay with reasonable accuracy.

Year Submitted: 2025
Published Date: 28 January 2026

Tags: Machine Learning, Automation, Digitalisation, Technology, Care Continuum, Rehabilitative Care, Care Process & Redesign, Value Based Care, Length Of Stay

About this Content

Aims

This project aims to use machine learning models to predict Rehabilitation Length of Stay (RLOS) and categorize patients into short (less than or equal to 30 days) or long (more than 30 days) stays based on data available within the first 72 hours of admission.
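The binary target described above can be sketched as a simple labelling rule. This is a minimal illustration only: the 30-day threshold comes from the aims, while the function name `label_stay`, the constant name, and the sample values are assumptions made for the example.

```python
from typing import Literal

# Threshold taken from the project aims: 30 days or fewer counts as a short stay.
SHORT_STAY_MAX_DAYS = 30


def label_stay(rlos_days: float) -> Literal["short", "long"]:
    """Map an observed rehabilitation length of stay (in days) to the binary target."""
    if rlos_days < 0:
        raise ValueError("length of stay cannot be negative")
    return "short" if rlos_days <= SHORT_STAY_MAX_DAYS else "long"


# Hypothetical stays, chosen to show the boundary at exactly 30 days.
labels = [label_stay(days) for days in (12, 30, 31, 75)]
print(labels)  # ['short', 'short', 'long', 'long']
```

A stay of exactly 30 days falls in the short class, matching the "less than or equal to" wording of the aims.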

Background

RLOS is a closely tracked indicator, and accurate prediction of RLOS is essential for optimizing resource allocation and enhancing patient outcomes. Currently, RLOS is estimated from clinicians' judgement and experience. There is a need for data-driven methodologies for predicting RLOS, which could provide more consistent, objective, and potentially more accurate forecasting to support clinical decision-making and operational planning.

Methods

This study leveraged both structured and unstructured data from 10,466 patient records from Tan Tock Seng Hospital's rehabilitation unit, spanning January 2013 to June 2024. To enable early prediction, only data available within the first 72 hours of admission was included. The Functional Independence Measure (FIM) score was extracted using a combination of regular expressions and a large language model. The dataset underwent an 80:20 train-test split for model development and evaluation. Multiple machine learning algorithms were tested, including XGBoost, LightGBM, CatBoost, Random Forest, and Decision Tree models. Hyperparameter optimisation was performed using the Optuna framework, with model selection based on F1 score to balance precision and recall. Stratified 10-fold cross-validation was employed during training to ensure robust validation and guard against overfitting. Classification performance was assessed for every model using accuracy, recall, precision, and F1 score.
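As a rough illustration of the regular-expression half of the FIM extraction step, the sketch below pulls a total score out of a free-text note and hands anything it cannot parse to an optional fallback, which stands in for the large language model. The pattern, the function name `extract_fim`, the `fallback` parameter, and the sample notes are all assumptions for illustration; the project's actual patterns and prompts are not described here. The range check uses the standard FIM total score range of 18 to 126.

```python
import re
from typing import Callable, Optional

# Standard FIM total score range (18 items, each scored 1 to 7).
FIM_MIN, FIM_MAX = 18, 126

# Hypothetical pattern: the token "FIM", optional qualifier words,
# an optional separator, then a two- or three-digit number.
FIM_PATTERN = re.compile(
    r"\bFIM\b(?:\s+(?:total|score|on\s+admission|admission))*\s*[:=\-]?\s*(\d{2,3})\b",
    re.IGNORECASE,
)


def extract_fim(
    note: str,
    fallback: Optional[Callable[[str], Optional[int]]] = None,
) -> Optional[int]:
    """Return a plausible FIM total from a note, or defer to the fallback."""
    match = FIM_PATTERN.search(note)
    if match:
        score = int(match.group(1))
        if FIM_MIN <= score <= FIM_MAX:
            return score
    # Nothing parseable or out of range: defer to the fallback (for example
    # an LLM call) if one was supplied, otherwise report a missing value.
    return fallback(note) if fallback else None


# Hypothetical notes, not taken from the study data.
notes = ["Admission FIM score: 72/126", "FIM total = 95", "FIM not yet assessed"]
print([extract_fim(n) for n in notes])  # [72, 95, None]
```

Keeping the fallback as a plain callable means the rule-based path can be tested on its own, and notes that the pattern misses are easy to count before deciding how much work to route to a language model.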

Results

XGBoost achieved the best performance with a test F1 score of 0.721, outperforming all other algorithms across all metrics, with minimal overfitting observed (performance drops of only 2.5-3.1% from training to test). The model achieved 72.3% overall accuracy but showed asymmetric performance: higher sensitivity for cases ≤30 days (77.8%) versus >30 days (65.1%). SHapley Additive exPlanations (SHAP) analysis revealed admission FIM score as the dominant predictor (mean absolute SHAP value of 0.84), four times more influential than the next feature. Psychological services emerged as the second most important factor (0.42), nearly double the impact of speech therapy services (0.23), suggesting psychological interventions indicate more complex cases requiring extended care.
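To make the relationship between overall accuracy and per-class sensitivity concrete, the sketch below derives both from a single confusion matrix. The counts are invented for illustration and are not the study's results; the class name `Confusion` and the choice of which stay class is treated as "positive" are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; one stay class is arbitrarily called positive."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity_positive(self) -> float:
        # Recall for the positive class.
        return self.tp / (self.tp + self.fn)

    def sensitivity_negative(self) -> float:
        # Recall for the negative class (also called specificity).
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        p, r = self.precision(), self.sensitivity_positive()
        return 2 * p * r / (p + r)


# Hypothetical counts, NOT the study's data.
cm = Confusion(tp=80, fn=20, fp=30, tn=70)
print(f"accuracy={cm.accuracy():.3f}")
print(f"sensitivity (positive class)={cm.sensitivity_positive():.3f}")
print(f"sensitivity (negative class)={cm.sensitivity_negative():.3f}")
print(f"precision={cm.precision():.3f}, F1={cm.f1():.3f}")
```

Overall accuracy is a case-weighted average of the two class sensitivities, which is why a single accuracy figure can hide a noticeable gap between the classes.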

Conclusion

This study successfully developed a machine learning model capable of predicting rehabilitation length of stay with reasonable accuracy. The XGBoost algorithm demonstrated superior performance and identified admission FIM score as the primary predictor of RLOS, highlighting functional status at admission as the most critical determinant. The prominence of psychological services as the second most important predictor suggests that mental health interventions may serve as markers for complex cases requiring extended multidisciplinary care.
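The feature ranking referred to above rests on mean absolute SHAP values. As a minimal sketch of that aggregation step only, the code below averages the magnitude of each feature's per-patient SHAP values and sorts the result. The feature names and numbers are hypothetical, and computing the SHAP values themselves (normally done with a dedicated library) is outside this example.

```python
from typing import Dict, List, Tuple


def rank_by_mean_abs_shap(shap_values: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank features by the mean absolute value of their per-patient SHAP values."""
    importance = {
        feature: sum(abs(v) for v in values) / len(values)
        for feature, values in shap_values.items()
    }
    # Largest mean magnitude first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-patient SHAP values for three illustrative features.
example = {
    "speech_therapy": [-0.2, 0.1, 0.3, -0.2],
    "admission_fim": [0.9, -0.7, 1.1, -0.5],
    "psychology_referral": [0.4, -0.5, 0.3, 0.4],
}

for feature, score in rank_by_mean_abs_shap(example):
    print(f"{feature}: {score:.2f}")
```

Taking absolute values before averaging matters: a feature that pushes some patients strongly towards a long stay and others strongly towards a short stay would otherwise average out to a misleadingly small number.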

Lessons Learnt

Contrary to common expectations, age did not prove to be a strong predictor of RLOS in this analysis when evaluated alongside multiple other factors. This finding challenges the conventional assumption that older patients necessarily require longer hospital stays, suggesting that other clinical and functional factors may be more influential in determining discharge readiness. A significant challenge emerged when attempting to extract the FIM score, an important assessment tool for predicting RLOS, from free text. These insights highlight both the importance of questioning assumptions about predictive variables in healthcare settings and the practical challenges of working with real-world clinical data, where important information may be embedded in unstructured formats.

Additional Information

SHBC 2025 Best Poster (Health Services Research) - Bronze

Keywords

machine learning, rehabilitation length of stay

Innovators' Details

Healthcare Cluster(s)	National Healthcare Group
Organization(s) Involved	OCEAN-MID Tan Tock Seng Hospital, Tan Tock Seng Hospital Rehabilitation Centre
Platform(s)	National Healthcare Innovation and Productivity Medals
Healthcare Professional Group(s)	Medical
Applicable Specialty or Discipline	Rehabilitation Therapy
Project Lead(s)	Karen Sui Geok Chua
Project Member(s)	Yong Siang Ong, Yong Sheng Heng, Ye Li, Xiaojin Zhang, Kee Hao Leo, Kenneth Jun Hong Ngoh

Connect with this contributor!

Xiaojin Zhang - xiaojin.zhang@nhghealth.com.sg

Project Attachment

Predicting Rehabilitation Length of Stay in Singapore - Tertiary Rehab Insights from Machine Learning Models.pdf