Predicting Rehabilitation Length of Stay in Singapore Tertiary Rehab: Insights from Machine Learning Models
Care Continuum
Care Process & Redesign
Technology
National Healthcare Innovation and Productivity Medals
National Healthcare Group
28 January 2026
This project aims to use machine learning models to predict Rehabilitation Length of Stay (RLOS) and categorize patients into short or long stays. This study successfully developed a machine learning model capable of predicting rehabilitation length of stay with reasonable accuracy.
Year Submitted: 2025
Published Date: 28 January 2026
Tags: Machine Learning, Automation, Digitalisation, Technology, Care Continuum, Rehabilitative Care, Care Process & Redesign, Value Based Care, Length Of Stay
About this Content
Aims
This project aims to use machine learning models to predict Rehabilitation Length of Stay (RLOS) and to categorize patients into short (30 days or fewer) or long (more than 30 days) stays, using only data available within the first 72 hours of admission.
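The outcome definition and the 72-hour cut-off can be expressed as a small preprocessing step. The sketch below is illustrative only and is not the project's code: it assumes stay length is counted in calendar days between the admission and discharge dates, and that candidate features arrive as hypothetical (timestamp, name, value) records; all function and feature names are made up for the example.

```python
from datetime import datetime, timedelta

# From the Aims: a stay of 30 days or fewer is "short", more than 30 days is "long".
RLOS_THRESHOLD_DAYS = 30
# Only data captured in the first 72 hours after admission may be used as features.
PREDICTION_WINDOW = timedelta(hours=72)


def rlos_label(admitted: datetime, discharged: datetime) -> int:
    """Return 0 for a short stay (<= 30 days) and 1 for a long stay (> 30 days).

    Assumption: length of stay is counted in calendar days between the two dates.
    """
    los_days = (discharged.date() - admitted.date()).days
    return int(los_days > RLOS_THRESHOLD_DAYS)


def early_features(admitted: datetime, records: list) -> dict:
    """Keep only features recorded within 72 hours of admission.

    `records` is a hypothetical list of (timestamp, feature_name, value) tuples.
    If a feature appears more than once in the window, the latest value wins.
    """
    cutoff = admitted + PREDICTION_WINDOW
    features = {}
    for ts, name, value in sorted(records, key=lambda r: r[0]):
        if admitted <= ts <= cutoff:
            features[name] = value
    return features


admitted = datetime(2024, 1, 1, 8, 0)
sample_records = [
    (datetime(2024, 1, 2, 9, 0), "fim_admission", 54),
    (datetime(2024, 1, 3, 12, 0), "psychology_referral", 1),
    (datetime(2024, 1, 5, 9, 0), "fim_admission", 60),  # after 72 h: excluded
]
print(rlos_label(admitted, datetime(2024, 2, 1, 10, 0)), early_features(admitted, sample_records))
```

Filtering by timestamp before any modelling is what keeps later information (such as a repeat FIM assessment) from leaking into an "early" prediction.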
Background
RLOS is a closely tracked indicator, and accurate prediction of RLOS is essential for optimizing resource allocation and improving patient outcomes. RLOS is currently estimated from clinicians' judgement and experience. There is a need for data-driven methods of predicting RLOS that could provide more consistent, objective, and potentially more accurate forecasts to support clinical decision-making and operational planning.
Methods
This study used both structured and unstructured data from 10,466 patient records from Tan Tock Seng Hospital's rehabilitation unit, spanning January 2013 to June 2024. To enable early prediction, only data available within the first 72 hours of admission were included. Functional Independence Measure (FIM) scores were extracted from free text using a combination of regular expressions and a large language model. The dataset was split 80:20 into training and test sets for model development and evaluation. Several machine learning algorithms were compared: XGBoost, LightGBM, CatBoost, Random Forest, and Decision Tree. Hyperparameters were optimised with the Optuna framework, and models were selected on F1 score to balance precision and recall. Stratified 10-fold cross-validation was used during training to give robust validation and reduce overfitting. Performance was evaluated with accuracy, recall, precision, and F1 score for all models.
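The 80:20 split and the stratified 10-fold cross-validation both aim to preserve the ratio of short to long stays in every subset. The standard-library sketch below shows the idea under that assumption; it is not the project's code, and in practice scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` would normally be used instead. Inside an Optuna study, each trial's objective would plausibly be the mean F1 score over the 10 validation folds; that wiring is omitted here.

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group row indices by class label (e.g. 0 = short stay, 1 = long stay)."""
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    return groups


def stratified_split(labels, test_frac=0.2, seed=42):
    """Hold out `test_frac` of each class so train and test share the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, k=10, seed=42):
    """Yield (train_idx, val_idx) pairs; each class is dealt round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps fold sizes balanced across classes
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, val


# Toy labels: 60 short stays (0) and 40 long stays (1).
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels)
# Cross-validation runs on the training portion only; the test set stays untouched.
# Fold indices below are positions within train_labels, not within labels.
train_labels = [labels[i] for i in train_idx]
for fold_train, fold_val in stratified_kfold(train_labels):
    pass  # fit a candidate model on fold_train, score F1 on fold_val
```

Dealing each class round-robin into the folds means every fold receives either the floor or the ceiling of (class size / k) members of each class, so no fold ends up dominated by one stay category.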
Results
XGBoost achieved the best performance, with a test F1 score of 0.721, outperforming all other algorithms on every metric with minimal overfitting (performance dropped by only 2.5-3.1% from training to test). The model achieved 72.3% overall accuracy but performed asymmetrically, with higher sensitivity for stays of ≤30 days (77.8%) than for stays of >30 days (65.1%). SHapley Additive exPlanations (SHAP) analysis identified admission FIM score as the dominant predictor (mean absolute SHAP value 0.84), twice the value of the next most important feature. Psychological services were that second most important factor (0.42), nearly double the contribution of speech therapy services (0.23), suggesting that involvement of psychological services may mark more complex cases requiring extended care.
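All of the reported classification metrics derive from the confusion matrix, and the asymmetry between the two classes is simply the recall of each class. The sketch below assumes long stays (>30 days) are the positive class and that a single binary F1 is computed; the source does not state which class was treated as positive or whether F1 was averaged across classes. It also shows how a mean absolute SHAP ranking is formed from a matrix of per-patient SHAP values (one row per patient, one column per feature); the feature names are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics, treating `positive` (assumed: 1 = long stay) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity for the positive class
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall for the other class
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


def mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across rows (one row per patient)."""
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 1, 1])
ranking = mean_abs_shap([[0.5, -0.2], [-1.0, 0.4]], ["fim_admission", "psychology_services"])
```

Accuracy equals the class-size-weighted average of recall and specificity, which is why an overall accuracy lies between two unequal per-class sensitivities. Taking absolute values before averaging keeps positive and negative SHAP contributions from cancelling out.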
Conclusion
This study successfully developed a machine learning model that predicts rehabilitation length of stay with reasonable accuracy. XGBoost performed best of the algorithms tested, and SHAP analysis of the model identified admission FIM score as the primary predictor of RLOS, highlighting functional status at admission as the most critical determinant. The prominence of psychological services as the second most important predictor suggests that mental health interventions may serve as markers for complex cases requiring extended multidisciplinary care.
Lessons Learnt
Contrary to common expectations, age was not a strong predictor of RLOS in this analysis once other factors were taken into account. This finding challenges the conventional assumption that older patients necessarily require longer stays, and suggests that other clinical and functional factors may be more influential in determining discharge readiness. A significant practical challenge was extracting the FIM score, an important assessment that helps predict RLOS, from free-text records. These insights highlight both the importance of questioning assumptions about predictive variables in healthcare settings and the practical challenges of working with real-world clinical data, where important information may be embedded in unstructured formats.
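The difficulty of pulling FIM scores out of free text is visible even in a first-pass regular expression. The sketch below is hypothetical: its pattern covers only a few phrasings (for example "FIM: 54" or "total FIM score 72/126"), it discards values outside the valid FIM total range of 18 to 126, and it hands anything it cannot parse to an `llm_fallback` callable that stands in for the large language model step. The project's actual patterns and prompts are not described in the source.

```python
import re
from typing import Callable, Optional

# Hypothetical first-pass pattern. Matches phrasings such as "FIM: 54",
# "total FIM score 72/126" or "FIM on admission = 38". Real notes vary far more.
FIM_PATTERN = re.compile(
    r"\bFIM\b(?:\s+(?:total|score|on|admission))*\s*(?:[:=]|of|is)?\s*(\d{2,3})(?!\d)",
    re.IGNORECASE,
)
FIM_MIN, FIM_MAX = 18, 126  # 18 items scored 1-7 give a total between 18 and 126


def extract_fim_regex(note: str) -> Optional[int]:
    """Return the first plausible FIM total in `note`, or None if none is found."""
    match = FIM_PATTERN.search(note)
    if match is None:
        return None
    score = int(match.group(1))
    return score if FIM_MIN <= score <= FIM_MAX else None


def extract_fim(
    note: str, llm_fallback: Optional[Callable[[str], Optional[int]]] = None
) -> Optional[int]:
    """Try the regex first; pass unparsed notes to a hypothetical LLM extractor."""
    score = extract_fim_regex(note)
    if score is None and llm_fallback is not None:
        score = llm_fallback(note)
    return score
```

The pattern deliberately returns None for notes that report only motor and cognitive subscores (for example "FIM motor 40, cognitive 25"), so those cases fall through to the fallback instead of yielding a wrong total.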
Additional Information
SHBC 2025 Best Poster (Health Services Research) - Bronze
Keywords
machine learning, rehabilitation length of stay
Innovators' Details
Healthcare Cluster(s): National Healthcare Group
Organization(s) Involved: OCEAN-MID Tan Tock Seng Hospital, Tan Tock Seng Hospital Rehabilitation Centre
Platform(s): National Healthcare Innovation and Productivity Medals
Healthcare Professional Group(s): Medical
Applicable Specialty or Discipline: Rehabilitation Therapy
Project Lead(s): Karen Sui Geok Chua
Project Member(s): Yong Siang Ong
Contact: Xiaojin Zhang - xiaojin.zhang@nhghealth.com.sg
