Abstract:
Objective To compare the performance of prognostic models constructed with different machine learning algorithms in predicting the survival of lung transplantation (LTx) recipients.
Methods Data from 483 LTx recipients were retrospectively collected. Recipients were divided into a training set and a validation set at a ratio of 7:3. The 24 collected variables were screened by variable importance (VIMP). Prognostic models were constructed using random survival forest (RSF) and extreme gradient boosting (XGBoost). Model performance was evaluated using the integrated area under the curve (iAUC) and the time-dependent area under the curve (tAUC).
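The tAUC metric used here measures, at a fixed horizon (e.g. 6 months or 1 year), how well the model's risk scores rank recipients who died by that horizon above those who survived past it. Below is a minimal sketch of that idea; it simply drops subjects censored before the horizon rather than reweighting them (real analyses typically use inverse-probability-of-censoring weights, e.g. via scikit-survival's `cumulative_dynamic_auc`), and the toy data are illustrative, not from the study.

```python
import numpy as np

def tauc_at_horizon(times, events, risk_scores, horizon):
    """Time-dependent AUC at `horizon`: the probability that a subject
    who died by the horizon has a higher risk score than a subject known
    to survive past it. Subjects censored before the horizon are dropped
    (a simplification; weighted estimators handle them properly)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk_scores = np.asarray(risk_scores, dtype=float)

    cases = (times <= horizon) & events   # died by the horizon
    controls = times > horizon            # survived past the horizon
    if not cases.any() or not controls.any():
        raise ValueError("need at least one case and one control")

    case_scores = risk_scores[cases][:, None]
    ctrl_scores = risk_scores[controls][None, :]
    # concordant pairs count 1, tied pairs count 0.5
    return (np.mean(case_scores > ctrl_scores)
            + 0.5 * np.mean(case_scores == ctrl_scores))

# toy example: risk scores perfectly rank earlier deaths higher
t = [2, 5, 8, 12, 20, 24]       # follow-up times (months)
e = [1, 1, 1, 0, 0, 0]          # 1 = death, 0 = censored
s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(tauc_at_horizon(t, e, s, horizon=12))  # -> 1.0
```

A model's iAUC can then be viewed as this quantity integrated (averaged with appropriate weights) over a range of horizons.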
Results There were no statistically significant differences in variables between the training and validation sets. The top 15 variables ranked by VIMP were used for modeling, and length of intensive care unit (ICU) stay was identified as the most important predictor. Compared with the XGBoost model, the RSF model performed better in predicting recipient survival (iAUC 0.773 vs. 0.723), as well as in predicting 6-month survival (tAUC 0.884 vs. 0.809, P = 0.009) and 1-year survival (tAUC 0.896 vs. 0.825, P = 0.013). Based on each model's predicted-risk cut-off value, recipients were stratified into high-risk and low-risk groups; for both models, survival analysis showed a significantly lower survival rate in the high-risk group than in the low-risk group (P < 0.001).
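The risk stratification step amounts to thresholding each model's predicted risk at a cut-off and comparing survival between the resulting groups. The sketch below implements a plain Kaplan-Meier estimator in NumPy and applies it to two groups split at a hypothetical cut-off of 0.5; the data and cut-off are illustrative only, not the study's values.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; returns (event_times, S(t))."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    surv, s = [], 1.0
    event_times = np.unique(times[events])  # distinct death times
    for t in event_times:
        at_risk = np.sum(times >= t)            # still under observation
        d = np.sum((times == t) & events)       # deaths at time t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# illustrative predicted risks and outcomes; cut-off 0.5 is hypothetical
risk  = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
time  = np.array([3, 5, 7, 9, 18, 24, 30, 36])   # months
event = np.array([1, 1, 1, 0, 1, 0, 0, 0])       # 1 = death

high = risk >= 0.5
for label, mask in [("high-risk", high), ("low-risk", ~high)]:
    ts, S = kaplan_meier(time[mask], event[mask])
    print(label, dict(zip(ts.tolist(), np.round(S, 3).tolist())))
```

In practice the between-group comparison reported as P < 0.001 would come from a log-rank test (e.g. `lifelines.statistics.logrank_test`), which this sketch omits.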
Conclusions Compared with XGBoost, a machine learning prognostic model built on the RSF algorithm may better predict the survival of LTx recipients.