Risk factors associated with major adverse cardiac and cerebrovascular events following percutaneous coronary intervention: a 10-year follow-up comparing random survival forest and Cox proportional-hazards model

Background Due to the limited number of studies with long-term follow-up of patients undergoing Percutaneous Coronary Intervention (PCI), we investigated the occurrence of Major Adverse Cardiac and Cerebrovascular Events (MACCE) during 10 years of follow-up after coronary angioplasty using Random Survival Forest (RSF) and Cox proportional-hazards models. Methods The current retrospective cohort study was performed on 220 patients (69 women and 151 men) undergoing coronary angioplasty from March 2009 to March 2012 in Farshchian Medical Center in Hamadan city, Iran. Survival time (in months), the response variable, was measured from the date of angioplasty to the main endpoint or the end of the follow-up period (September 2019). To identify the factors influencing the occurrence of MACCE, the performance of the Cox and RSF models was compared in terms of the C index, the Integrated Brier Score (IBS) and the prediction error. Results Ninety-six patients (43.7%) experienced MACCE by the end of the follow-up period, and the median survival time was estimated to be 98 months. Survival decreased from 99% during the first year to 39% at 10 years of follow-up. In the Cox model, the following predictors were identified: age (HR = 1.03, 95% CI 1.01–1.05), diabetes (HR = 2.17, 95% CI 1.29–3.66), smoking (HR = 2.41, 95% CI 1.46–3.98), and stent length (HR = 1.74, 95% CI 1.11–2.75). The predictive performance of the RSF model was slightly better than that of the Cox model (IBS of 0.124 vs. 0.135, C index of 0.648 vs. 0.626 and out-of-bag error rate of 0.352 vs. 0.374 for RSF vs. Cox, respectively). In addition to age, diabetes, smoking, and stent length, RSF also included coronary artery disease (acute or chronic) and hyperlipidemia as the most important variables. Conclusion Machine-learning prediction models such as RSF showed better performance than the Cox proportional-hazards model for the prediction of MACCE during long-term follow-up after PCI.

surgery (CABG), requiring shorter hospitalization and recovery times. Deployment of drug-eluting stents (DES) has largely replaced the use of bare-metal stents (BMS), improving long-term prognosis, mainly by reducing the rate of restenosis [2]. Identifying potential risk factors for subsequent major adverse cardiovascular events may offer additional advantages with respect to outcome [2][3][4][5]. However, this requires suitable models for determining the risk factors.
The Cox model is a widely used semi-parametric choice for analyzing censored data. This model relates the log of the hazard ratio to a linear function of the predictors. The Cox model has several limitations, such as the need for subject-matter knowledge to specify covariate interactions and complex nonlinear forms, as well as the proportional-hazards assumption [6,7]. Failing to check these assumptions, or ignoring them, can affect the validity of the results.
Random Survival Forest (RSF), an ensemble learning method, was developed to overcome these problems of the Cox model and other classical models for the analysis of survival data. An important feature of RSF is its ability to measure the importance of variables [8]. The model is also suitable for medical research involving high-dimensional data [9][10][11]. Various studies have evaluated the performance of the RSF model in comparison with the Cox model [12].
Several studies have been performed on the risk factors of future adverse events following PCI with the use of BMS and DES [5,13]. However, only a limited number of studies describe the results of long-term follow-up after PCI, and results from long-term follow-up may not necessarily match those of short-term follow-up. Furthermore, to the best of our knowledge, the RSF model has not previously been used to identify factors affecting the occurrence of MACCE in patients undergoing angioplasty with stent deployment. Therefore, we conducted a long-term study to identify factors affecting the occurrence of MACCE following coronary stenting, comparing the RSF and Cox proportional-hazards models.

Methods
The current retrospective cohort study was performed on 220 patients (69 women and 151 men) undergoing coronary angioplasty from March 2009 to March 2012 in Farshchian Medical Center in Hamadan city, Iran. In this study, major adverse cardiac and cerebrovascular events (MACCE), including death, CABG, stroke and repeat revascularization, were selected as the designated events for survival analysis.
Survival time (months), as the response variable, was considered from the date of angioplasty to the end of the follow-up period (September 2019) or the occurrence of MACCE. For the patients who had not experienced MACCE, the time from the date of angioplasty to the end of the follow-up time was considered as the censored survival time.
To identify the factors influencing the occurrence of MACCE during 10 years of follow-up after coronary angioplasty, the performance of the Cox and RSF models was investigated. In addition, the MACCE-free survival curve was constructed with the Kaplan-Meier method.
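The Kaplan-Meier construction mentioned above can be illustrated with a minimal sketch. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study; the code only shows the product-limit idea, in which the survival estimate is multiplied by (1 − d/n) at each time where d events occur among n patients still at risk.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per patient (e.g. months since angioplasty)
    events : 1 if the event (here MACCE) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_here = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            events_here += data[i][1]
            leaving += 1
            i += 1
        if events_here > 0:
            surv *= 1.0 - events_here / at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set.
        at_risk -= leaving
    return curve


# Toy data (hypothetical): months of follow-up and event indicators.
toy_times = [6, 12, 12, 30, 48, 60]
toy_events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but never reduce the survival estimate directly, which is why the curve only steps down at observed events.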
It should be noted that the restricted mean survival time (RMST) was reported as the between-group summary metric. Unlike the median survival time, it is estimable even under heavy censoring.
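As an illustration, the RMST up to a truncation time τ is the area under the survival curve between 0 and τ. The sketch below assumes a step-shaped (Kaplan-Meier type) curve; the function name `rmst`, the truncation time and the toy curve are hypothetical and not taken from the study.

```python
def rmst(curve, tau):
    """Area under a step survival curve from 0 to tau.

    curve : [(t, S(t))] sorted by t; S equals 1 before the first listed
            time and drops to S(t) at each listed time t
    tau   : truncation time, in the same unit as t
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # last rectangle, cut at tau
    return area


# Hypothetical curve: survival 0.9 from month 6, 0.7 from month 12, ...
toy_curve = [(6, 0.9), (12, 0.7), (30, 0.5)]
print(round(rmst(toy_curve, tau=24), 2))
```

Because the area is always defined up to τ, the RMST can be computed even when the curve never falls below 50% and the median is therefore not reached.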

Cox proportional hazard model
The Cox proportional-hazards model specifies the conditional hazard function based on the vector of predictor variables. The general form of the hazard for the i-th subject with covariate profile X_i at time t under the Cox model is as follows:

$$h(t \mid X_i) = h_0(t)\,\exp\left(\beta^{\top} X_i\right)$$

where β is the vector of regression coefficients. The Cox model consists of two components: a non-parametric component, the baseline hazard h_0(t), which is an unspecified non-negative function of time, and a parametric component, exp(β^T X_i), which acts multiplicatively on the baseline hazard and is log-linear in X_i [6].
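A short sketch may help to show how hazard ratios follow from this form: when two covariate profiles are compared, the baseline hazard cancels and only exp(β·(x_a − x_b)) remains. The coefficients below are recovered from the hazard ratios quoted in the abstract (age 1.03 per year, diabetes 2.17) via β = ln(HR); the two patient profiles and the function names are hypothetical, and all other covariates are assumed equal between the two patients.

```python
import math


def relative_hazard(beta, x):
    """exp(beta . x): the factor that multiplies the baseline hazard."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def hazard_ratio(beta, x_a, x_b):
    """Hazard ratio of profile a versus profile b; h0(t) cancels out."""
    return relative_hazard(beta, x_a) / relative_hazard(beta, x_b)


# Coefficients derived from the hazard ratios quoted in the abstract.
beta = [math.log(1.03), math.log(2.17)]   # [age per year, diabetes]

# Hypothetical profiles: [age in years, diabetes (0 = no, 1 = yes)].
patient_a = [70, 1]
patient_b = [60, 0]
print(round(hazard_ratio(beta, patient_a, patient_b), 2))
```

Under these assumptions the ratio is 1.03^10 × 2.17, roughly 2.9, and it does not depend on t. This constancy over time is exactly the proportional-hazards assumption.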

Random survival forest model
The RSF model, a tree-based, non-parametric ensemble algorithm, can address the limitations of the Cox model as well as identify and rank the most important variables affecting survival time. Ensemble learning is a supervised learning technique whose main idea is to fit several models to a training data set and then combine (average) their outputs or the hypotheses obtained from them [14].
In general, the RSF algorithm includes the following steps:
1. B bootstrap samples are drawn from the original data. In each bootstrap sample, about one-third of the data is left out of the bag. For example, if 1000 observations are drawn by bootstrap from the main data, about 670 distinct observations are used for training in each bootstrap sample, and the remaining out-of-bag (OOB) observations are used for testing and for estimating the prediction error.
2. A survival tree is grown for each bootstrap sample. At each node of the tree, mtry covariates are randomly selected out of all p covariates as candidates for splitting, and the variable that maximizes the separation between the two resulting nodes is chosen. Growth stops when a stopping condition is met (e.g., when the number of observations within a terminal node is less than a preset value or when the node becomes pure). By default, mtry = √p and the log-rank statistic is used as the splitting criterion.
3. To obtain an ensemble risk prediction, information from the terminal nodes (nodes with no further split) of the B survival trees is aggregated. For each tree, the cumulative hazard function (CHF) is calculated with the Nelson-Aalen estimator, and the average of these CHFs gives the ensemble CHF.
4. The prediction error is calculated for the ensemble CHF using the OOB data.
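Steps 1 and 3 above can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study: tree growing (step 2) is omitted, so each "tree" is replaced by a single node whose CHF is the Nelson-Aalen estimate on the in-bag data. All function names and the toy data are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob


def nelson_aalen(times, events):
    """Step CHF: H(t) = sum over event times t_j <= t of d_j / n_j."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum = 0.0
    chf = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = leaving = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            cum += d / at_risk
            chf.append((t, cum))
        at_risk -= leaving
    return chf


def chf_at(chf, t):
    """Evaluate a step CHF at time t (0 before the first event)."""
    value = 0.0
    for time, h in chf:
        if time > t:
            break
        value = h
    return value


def ensemble_chf(chfs, t):
    """Step 3: average the per-tree CHFs at time t."""
    return sum(chf_at(c, t) for c in chfs) / len(chfs)


# Toy data (hypothetical): follow-up in months and event indicators.
times = [6, 12, 12, 30, 48, 60, 72, 90, 100, 120]
events = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

rng = random.Random(0)
chfs = []
for _ in range(200):                      # B = 200 bootstrap samples
    in_bag, _oob = bootstrap_split(len(times), rng)
    chfs.append(nelson_aalen([times[i] for i in in_bag],
                             [events[i] for i in in_bag]))
print(ensemble_chf(chfs, 60))

# The expected OOB share is about 1 - 1/e, i.e. roughly 37% of the data.
_, oob = bootstrap_split(1000, rng)
print(len(oob) / 1000)
```

In a full RSF, the Nelson-Aalen estimate would be computed separately inside every terminal node, and a subject's prediction would use the node into which that subject falls in each tree.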
In this study, each fitted RSF model consisted of 2000 trees grown with the log-rank statistic as the splitting criterion. The relative importance of each variable was also assessed using the variable importance (VIMP) criterion. The larger the VIMP value of a variable, the more important its predictive role.
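To make the log-rank splitting rule concrete, the following sketch computes the standardized two-sample log-rank statistic for one candidate split of a node. It is an illustrative assumption about how such a score can be computed, not code from the study or from any specific package; the function name and the toy data are hypothetical. When a tree is grown, the candidate split with the largest value would be preferred.

```python
import math


def log_rank_statistic(times, events, in_left):
    """Absolute standardized log-rank statistic for a two-way split.

    times   : follow-up times of the subjects in the node
    events  : 1 if the event was observed, 0 if censored
    in_left : True if the subject goes to the left daughter node
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for tj in event_times:
        # Subjects still at risk at tj, overall and in the left node.
        n = sum(1 for t in times if t >= tj)
        n_left = sum(1 for t, g in zip(times, in_left) if t >= tj and g)
        # Events occurring exactly at tj, overall and in the left node.
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d_left = sum(1 for t, e, g in zip(times, events, in_left)
                     if t == tj and e == 1 and g)
        obs_minus_exp += d_left - n_left * d / n
        if n > 1:
            variance += (n_left * (n - n_left) * d * (n - d)
                         / (n * n * (n - 1)))
    if variance == 0.0:
        return 0.0          # degenerate split: one daughter node is empty
    return abs(obs_minus_exp) / math.sqrt(variance)


# Toy node (hypothetical): early events fall in the left daughter node.
print(log_rank_statistic([1, 2, 3, 4], [1, 1, 1, 1],
                         [True, True, False, False]))
```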

Evaluation of survival models
The Brier Score, a measure for evaluating the performance of survival models, is the mean squared error of the prediction and indicates the predictive ability of a model. Smaller values of the Brier Score indicate more accurate predictions. The general form of the score is as follows:

$$BS(t) = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i(t) - \hat{S}(t \mid X_i)\right)^{2}$$

where N is the number of subjects, Y_i(t) is the observed status of the i-th subject at time t (1 if still event-free at t, 0 otherwise), and Ŝ(t|X_i) is the survival probability for this subject at time t according to the model [15].
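A minimal sketch of the Brier Score at a single time point is given below. It assumes that Y_i(t) equals 1 when subject i is still event-free at t and 0 otherwise, and that no subject is censored before t; with censored data, inverse-probability-of-censoring weights are commonly added, which this sketch omits. The function name and the toy values are hypothetical.

```python
def brier_score(t, times, surv_probs):
    """Mean squared gap between observed status and predicted S(t | X_i).

    t          : evaluation time
    times      : observed event times (assumed uncensored before t)
    surv_probs : model-predicted survival probabilities at time t
    """
    total = 0.0
    for time_i, s_i in zip(times, surv_probs):
        y_i = 1.0 if time_i > t else 0.0   # 1 = still event-free at t
        total += (y_i - s_i) ** 2
    return total / len(times)


# Toy example (hypothetical): three subjects evaluated at month 12.
print(round(brier_score(12, [5, 20, 40], [0.2, 0.6, 0.9]), 3))
```

A model that always predicts a survival probability of 0.5 scores 0.25 at every time point, which is a common reference level when Brier Scores are read.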
Therefore, the IBS (Integrated Brier Score, i.e. the Brier Score integrated over the follow-up time) and the C index, both computed on the OOB data, were used to compare the performance of the Cox model and the random survival forest. It should be noted that for computing the evaluation criteria, all variables were included in both models.
In the next step of the analysis, variables with significant unadjusted hazard ratios were included in the multiple Cox regression, and variables with positive VIMP were included in the RSF. These two models were then compared. Under these conditions, the RSF (based on the six variables with positive VIMP) performed better than the Cox model (based on the four variables with significant unadjusted hazard ratios) (Table 5).
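The two criteria can be sketched as follows. The C index below follows Harrell's pairwise definition: among comparable pairs, it is the share in which the subject with the earlier observed event has the higher predicted risk. The IBS is approximated by a trapezoidal average of Brier Scores over a time grid. Function names, the handling of ties and the toy values are assumptions made for illustration only.

```python
def concordance_index(times, events, risks):
    """Harrell-type C index from predicted risk scores.

    A pair (i, j) is comparable when subject i had an observed event
    and a shorter follow-up time than subject j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5      # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def integrated_brier_score(grid, scores):
    """Trapezoidal average of Brier Scores over an increasing time grid."""
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (scores[k] + scores[k - 1]) * (grid[k] - grid[k - 1])
    return area / (grid[-1] - grid[0])


# Toy data (hypothetical): the third subject is censored.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1])
print(c, 1 - c)
print(integrated_brier_score([0, 12, 24], [0.0, 0.1, 0.2]))
```

The prediction error reported for each model corresponds to 1 − C: the values quoted in the abstract (C index of 0.648 and 0.626, error rate of 0.352 and 0.374) satisfy this relation.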

Discussion
In this study, the long-term survival of cardiovascular patients after angioplasty was investigated over a 10-year follow-up. A comparison of the two models showed that the predictive performance of RSF was better than that of the Cox model.
The Cox model showed that older age, diabetes, smoking, and longer stent length were the most important variables affecting patient survival. Based on the RSF model, the most important factors affecting patient survival were, in order of importance, diabetes, smoking, age, stent length, presentation of coronary artery disease, and hyperlipidemia.
To the best of our knowledge, there has been no similar study investigating the factors influencing the occurrence of MACCE after angioplasty with RSF. Until now, various studies have evaluated the short-term predictors of MACE (major adverse cardiac events) following PCI. However, few studies have focused on long-term follow-up outcomes. Most of these studies have reported short-term follow-up results and compared the complications and survival of patients with DES and BMS.
We observed a MACCE incidence of 43.7%, whereas Aghajan et al. [16] reported 14.4% MACE in elderly patients (with a mean age of 70.8 ± 4.7 years) during a 10-year follow-up period. During a shorter follow-up period of 2 years, Zhou et al. [17] reported 7.4% MACE, and after 3 years Meliga et al. [18] reported 26.5% MACE in patients treated at seven European and American medical centers.
As expected, the traditional risk factors (e.g., age, diabetes and smoking) increased the risk of MACCE. Each one-year increase in age increased the hazard of MACCE by 3% (HR = 1.03). The hazard of MACCE in smokers was 2.41 times that in non-smokers. These results are consistent with the findings of Farshidi et al. [21] and Tsai et al. [22], who reported a significant association of older age, smoking and diabetes with mortality in patients undergoing PCI.
The findings of this study showed that patients with three stents were 1.8 times more likely to experience MACCE than those with one stent. The chance of MACCE also increased with the number of involved vessels. Tsai et al. [22] found that triple vessel and stent implantation predicted the development of MACE in Chinese PCI patients.
Our results also showed no statistically significant effect of stent type on patient survival. The current study is observational and non-randomized; a comparison of the two stent types is therefore biased by lesion and patient characteristics, and any interpretation of the treatment effect is precluded (due to indication bias). However, in a randomized controlled trial conducted by Horst et al. in patients undergoing PCI, there were no significant differences in the composite outcome of death from any cause and nonfatal spontaneous myocardial infarction between the two types of stents after a median of 5 years of follow-up [23]. Flice et al. reported a statistically significant difference between the two types of stents in the occurrence of MACE during a 3-year follow-up (18% for DES versus 28% for BMS) in coronary patients with chronic obstructive pulmonary disease [24]. Cai et al. showed that the mortality rate of patients treated with BMS was significantly higher than that of patients treated with DES (P = 0.034). However, in the long-term follow-up, there was no significant difference in the mortality rate between the two types of stents [21]. Duggalb et al. reported a statistically significant difference in unadjusted mortality rates between BMS (5%) and DES (3.8%) [26]. Also, in the study by Dieguez et al. [27], the rate of all-cause mortality in patients treated with DES (6.5%) was significantly lower than that of patients treated with BMS (12.2%) (P = 0.049). The study conducted by Melberg et al. showed that after a median of 10 years of follow-up, a quarter of the patients had died, and more than half of those who died had died from non-cardiac causes. Moreover, the causes of death shift away from MACE (MACCE) and become dominated by cancer, especially after 5 years [28]. However, in the present study, cancer was the cause of death in only four patients.
One of the limitations of the present study is that it may be difficult to confirm the cause of death for people who died out of hospital. Because fewer diagnostic tests may be performed in terminally ill or elderly patients, the causes listed in the death certificates may be inconclusive. In addition, data of this type, with composite endpoints, could be analyzed from a competing-risks perspective.

Conclusion
The current study showed that the use of machine-learning prediction models such as RSF may improve long-term prediction in patients undergoing coronary stenting. Although the prediction performance of RSF based on the prediction error criteria was better than that of the Cox model, the most important variables identified by the two methods were similar. Our findings imply that the presentation of coronary artery disease (acute or chronic) and hyperlipidemia may also be considered as important prognostic variables in addition to diabetes, smoking, age, and stent length. The risk of complications may be modified by controlling these prognostic factors.