Comparison of multiple machine learning models for predicting the survival of recipients after lung transplantation

  • Abstract:
    Objective  To compare the performance of prognostic models built with different machine learning algorithms in predicting the survival of lung transplantation (LTx) recipients.
    Methods  Data from 483 recipients who underwent LTx were collected retrospectively. All recipients were divided into a training set and a validation set at a ratio of 7:3. The 24 collected variables were screened by variable importance (VIMP). Prognostic models were constructed using random survival forest (RSF) and extreme gradient boosting (XGBoost). Model performance was evaluated using the integrated area under the curve (iAUC) and the time-dependent area under the curve (tAUC).
    Results  There were no statistically significant differences in any variable between the training set and the validation set. The top 15 variables ranked by VIMP were used for modeling, and length of stay in the intensive care unit (ICU) was identified as the most important factor. Compared with the XGBoost model, the RSF model performed better in predicting recipient survival (iAUC 0.773 vs. 0.723). The RSF model also performed better in predicting 6-month survival (6-month tAUC 0.884 vs. 0.809, P = 0.009) and 1-year survival (1-year tAUC 0.896 vs. 0.825, P = 0.013). Using the prediction cut-off value of each algorithm, LTx recipients were divided into high-risk and low-risk groups. Survival analysis for both models showed that the survival rate in the high-risk group was significantly lower than that in the low-risk group (P<0.001).
    Conclusions  Compared with XGBoost, the machine learning prognostic model developed with the RSF algorithm may predict the survival of LTx recipients better.
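
The abstract does not state how tAUC and iAUC were estimated. As a rough illustration of what the two metrics measure, the following Python sketch computes a naive cumulative/dynamic AUC at a fixed horizon and averages it over several horizons. It is a simplification under stated assumptions: it ignores the inverse-probability-of-censoring weights that published estimators normally apply, it uses an unweighted mean for the integrated value, and the function names and toy data are hypothetical and not taken from the study.

```python
from typing import Sequence


def naive_time_dependent_auc(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
    horizon: float,
) -> float:
    """Naive cumulative/dynamic AUC at `horizon` (no censoring weights).

    Cases: subjects with an observed event at or before the horizon.
    Controls: subjects still under follow-up after the horizon.
    Subjects censored at or before the horizon are dropped (a simplification).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores)
             if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # higher risk assigned to the earlier death
            elif case_score == control_score:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))


def naive_integrated_auc(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
    horizons: Sequence[float],
) -> float:
    """Unweighted mean of the naive time-dependent AUC over `horizons`."""
    values = [naive_time_dependent_auc(times, events, risk_scores, h)
              for h in horizons]
    return sum(values) / len(values)


# Made-up follow-up data in months: event = 1 means death, 0 means censored.
toy_times = [2.0, 5.0, 7.0, 11.0, 14.0, 18.0]
toy_events = [1, 1, 0, 1, 0, 0]
toy_risk = [0.9, 0.4, 0.5, 0.6, 0.2, 0.1]

auc_6m = naive_time_dependent_auc(toy_times, toy_events, toy_risk, 6.0)
auc_12m = naive_time_dependent_auc(toy_times, toy_events, toy_risk, 12.0)
iauc = naive_integrated_auc(toy_times, toy_events, toy_risk, [6.0, 12.0])
print(auc_6m, auc_12m, iauc)
```

On real data, a censoring-aware estimator from an established survival analysis library would be the safer choice. The sketch is only meant to show that tAUC asks whether recipients who died by a given time received higher predicted risk than those still alive, and that iAUC summarizes this across time points.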

     
