Prognostic nomogram for 30-day mortality of deep vein thrombosis patients in intensive care unit

Background We aimed to use the Medical Information Mart for Intensive Care III database to build a nomogram that identifies the 30-day mortality risk of deep vein thrombosis (DVT) patients in the intensive care unit (ICU). Methods Stepwise logistic regression and logistic regression with the least absolute shrinkage and selection operator (LASSO) were used to fit two prediction models. The bootstrap method was used for internal validation. Results We obtained baseline data of 535 DVT patients, 91 (17%) of whom died within 30 days. The discrimination of both new models was better than that of traditional scores. Compared with the simplified acute physiology score II (SAPSII), the predictive abilities of both new models were improved (net reclassification improvement [NRI] > 0; integrated discrimination improvement [IDI] > 0; P < 0.05). The Brier scores of the two new models in the training set were 0.091 and 0.108. After internal validation, the corrected areas under the curve for the two models were 0.850 and 0.830, while the corrected Brier scores were 0.108 and 0.114. The more concise model was chosen to make the nomogram. Conclusions The nomogram developed from the logistic regression with LASSO model can provide an accurate prognosis for DVT patients in ICU.

of relevant scoring or prediction models can assist doctors in ICU to make important decisions in a short time. However, there is a lack of scoring models for predicting the risk of mortality of DVT patients in ICU. The existing scores associated with DVT, such as Wells, Padua and Caprini, are not suitable for critically ill patients [8][9][10] and cannot predict future mortality [11][12][13]. We aimed to make full use of the initial admission information of DVT patients in ICU, consisting of general status, comorbidity information and important laboratory indicators, to build a concise prediction model and present it as a nomogram for predicting the risk of 30-day mortality.
A nomogram is a graphical calculation tool designed to perform complex calculations quickly. With the trend toward precision medicine and the need for simple clinical decision-making, nomograms are widely used in medicine, especially in clinical predictive modeling [14,15]. We therefore presented our prediction model as a nomogram, in order to support future decision-making for DVT patients in ICU.

Data retrieval
The data used in this study were obtained from the Medical Information Mart for Intensive Care III (MIMIC-III) database, which integrates information on more than 50,000 ICU patients hospitalized at Beth Israel Deaconess Medical Center in Boston, Massachusetts, from 2001 to 2012 [16]. The MIMIC database uses International Classification of Diseases (ICD)-9 diagnostic codes to define each patient's condition, and we used these codes to select the baseline data of patients diagnosed with lower extremity DVT (ICD-9 diagnostic codes 45340, 45341, 45342, 45350, 45351 and 45352). We extracted the admission data and clinical outcomes of all patients diagnosed with DVT. The admission data included general condition, comorbidity information, laboratory indicators and severity scores. The clinical outcome was defined as all-cause death within 30 days after admission. Variables with more than 30% missing data were excluded. Among the general admission variables, heart rate, systolic blood pressure, diastolic blood pressure, respiratory rate, body temperature, percutaneous oxygen saturation (spO2) and blood glucose were each taken as the average of the values recorded on the first day of ICU admission. Among the laboratory indicators, red blood cell volume distribution width (RDW), lymphocyte, monocyte, neutrophil, hematocrit, white blood cell and platelet count, blood sodium, blood potassium, serum anion gap, serum bicarbonate, serum chloride, blood urea nitrogen, serum creatinine, albumin, hemoglobin, serum lactate, partial thromboplastin time (PTT) and prothrombin time (PT) were each defined as the average of the values recorded on the first day of admission, or on the earliest day after admission if testing was delayed. Neutrophil-to-lymphocyte ratio (NLR) and lymphocyte-to-monocyte ratio (LMR) were also calculated from the first available laboratory measurements.
The comorbidity information and severity scores of each patient were collected using the code on GitHub [17], which establishes materialized views in PostgreSQL from which the relevant information was extracted. Comorbidity information consisted of the ICD-9 diagnoses recorded for the same hospital admission.
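The cohort selection and first-day averaging described above can be sketched as follows. This is a minimal illustration in Python, not the authors' PostgreSQL or R code: the record layouts, the function names and the choice of a 24-hour window for "first day" are assumptions made for the example, and the fallback to the earliest later day for delayed laboratory tests is not implemented.

```python
from statistics import mean

# ICD-9 codes for lower extremity DVT, as listed in the text.
DVT_ICD9_CODES = {"45340", "45341", "45342", "45350", "45351", "45352"}


def select_dvt_admissions(diagnoses):
    """Return the admission ids that carry at least one DVT code.

    `diagnoses` is an iterable of (admission_id, icd9_code) pairs.
    This layout is hypothetical, not the MIMIC-III schema.
    """
    return {adm for adm, code in diagnoses if code in DVT_ICD9_CODES}


def first_day_mean(measurements, hours=24.0):
    """Average the values recorded within `hours` of ICU admission.

    `measurements` is a list of (hours_since_icu_admission, value) pairs.
    Returns None when nothing was recorded in the window (assumed 24 h).
    """
    window = [value for t, value in measurements if 0.0 <= t < hours]
    return mean(window) if window else None


if __name__ == "__main__":
    dx = [(1, "45340"), (2, "4280"), (3, "45352"), (1, "4019")]
    print(select_dvt_admissions(dx))
    print(first_day_mean([(1.0, 80), (5.0, 90), (30.0, 200)]))
```

Returning None for an empty window keeps a missing measurement distinguishable from a measured zero, which matters for the later missing-data filter.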

Statistical analysis
In the baseline data of patients, continuous variables with a normal distribution were represented by mean with standard deviation (SD) and compared with the t-test. Continuous variables with a non-normal distribution were represented by median and interquartile range and compared with the Wilcoxon rank-sum test. Categorical variables were represented by frequency and percentage and compared with the chi-square test. We used general conditions, comorbidity information and laboratory indicators from patients' baseline data as candidate variables to develop new models. First, we transformed the non-normally distributed variables (RDW, monocyte, creatinine, lactate, NLR and LMR) into new variables that conformed to a normal distribution: RDW was transformed by subtracting each value from a fixed, sufficiently large constant of 50, and monocyte, creatinine, lactate, NLR and LMR were transformed using the base-10 logarithm. Variables that contained outliers which would affect the stability of the final model (glucose and anion gap) were truncated at the 1st and 99th percentiles. We then performed single imputation for the whole dataset based on fully conditional specification, using the predictive mean matching method to fill in missing values. Each incomplete variable was imputed by a separate model to ensure the validity of the imputation results. This process was implemented using the "mice" package in R software (Version: 3.6.1) [18]. For the convenience of using the model in the clinic, we chose to convert all continuous variables into categorical variables. We used restricted cubic splines to test whether the relationship between each continuous variable and the clinical outcome was linear. Continuous variables that were not linearly related to the clinical outcome were grouped according to the restricted cubic spline analysis [19].
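The variable transformations and outlier truncation described in this paragraph can be illustrated with a short Python sketch. It is not the authors' R code. The function names are hypothetical, and truncation is interpreted here as clipping values to the 1st and 99th percentiles (winsorizing), which is an assumption: the text does not say whether extreme values were clipped or dropped.

```python
import math


def reflect_rdw(rdw, constant=50.0):
    """Transform RDW as (constant - value), with the constant 50 from the text."""
    return constant - rdw


def log10_transform(value):
    """Base-10 logarithm, as applied to monocyte, creatinine, lactate, NLR, LMR."""
    return math.log10(value)


def percentile(sorted_values, q):
    """Percentile by linear interpolation between neighbouring order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def truncate(values, lower_q=0.01, upper_q=0.99):
    """Clip every value into the [1st percentile, 99th percentile] range."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    glucose = [float(v) for v in range(101)]  # hypothetical values 0..100
    clipped = truncate(glucose)
    print(min(clipped), max(clipped))
```

Clipping keeps every patient in the dataset, which matters with only 535 records; dropping rows would be the alternative reading.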
For continuous variables that had a linear relationship with the outcome, the best categorical cut-off value was determined by clinical significance or by receiver operating characteristic (ROC) curve analysis, at the point where the Youden index reached its maximum.
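As an illustration of the Youden-index criterion, the sketch below scans every observed value as a candidate cut-off and keeps the one that maximizes J = sensitivity + specificity - 1. It is a minimal Python example with a hypothetical function name and toy data. It assumes that higher values indicate higher risk and that a value equal to the cut-off counts as positive; for a protective variable such as albumin, the values could be negated first.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as high risk when value >= cutoff.
    `outcomes` holds 1 for death within 30 days and 0 for survival.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    for candidate in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v >= candidate and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v < candidate and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest cut-off on ties
            best_cutoff, best_j = candidate, j
    return best_cutoff, best_j


if __name__ == "__main__":
    ages = [50, 60, 70, 76, 80, 90]   # hypothetical data
    died = [0, 0, 0, 1, 1, 1]
    print(youden_cutoff(ages, died))
```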

Model building
First, we used stepwise logistic regression to fit a model with all candidate variables and the outcome of 30-day mortality, and selected the model with the smallest Akaike information criterion (AIC). Considering the convenience of clinical application, we then used LASSO to screen all candidate variables. LASSO simplifies the model by increasing the penalty coefficient λ, which shrinks the coefficient estimate of each variable [20]. Fewer variables were then selected according to the LASSO results and clinical significance, and multivariable logistic regression was used to fit a more concise model. After building the two new models using stepwise logistic regression and logistic regression with L1 regularization (i.e. LASSO), we calculated their discrimination and calibration in the original training set. Discrimination was measured by the area under the curve (AUC) and calibration was measured by the Brier score. The Brier score was calculated with the following formula:

$$\mathrm{BS} = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2$$

In the formula, N represents the total number of predictions, f_t represents the actual results and o_t represents the prediction probability of the model. AUC, NRI and IDI were then used as indicators to compare the predictive abilities of the two new models with those of the traditional scoring models. The bootstrap was used to validate the two new models internally, with the number of repetitions set to 100 [21]. Finally, for the convenience of clinical application, we made a nomogram from the model fitted by logistic regression with LASSO. All of the above data processing was done in R software (Version: 3.6.1).
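The two performance measures named above can be computed in a few lines. The following Python sketch is illustrative only (the study used R): the Brier score is the mean squared difference between the observed outcome and the predicted probability, and the AUC is computed through its rank interpretation, namely the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. Function names and data are hypothetical.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between outcome (0/1) and predicted probability."""
    n = len(outcomes)
    return sum((y - p) ** 2 for y, p in zip(outcomes, probabilities)) / n


def auc(outcomes, probabilities):
    """Area under the ROC curve via pairwise comparison of events and non-events."""
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(events) * len(non_events))


if __name__ == "__main__":
    died = [1, 0, 1, 0]                 # hypothetical outcomes
    risk = [0.9, 0.1, 0.8, 0.3]         # hypothetical predicted probabilities
    print(brier_score(died, risk), auc(died, risk))
```

A lower Brier score indicates better calibration and accuracy, while an AUC of 0.5 corresponds to no discrimination. The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred.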

Results
Of the 535 DVT patients enrolled in this study, 91 died within 30 days after admission. During data cleaning, we removed variables with more than 30% missing data, including most of the blood gas analysis variables. Among the remaining variables, lactate and albumin each had more than 20% missing data, and the other variables had no more than 10%. The five variables with the most missing data were albumin, lactate, LMR, NLR and neutrophils. The clinical outcome had no missing data. Fig. 1 shows the distribution of missing values. The statistical description and univariate analysis of the patients' baseline data are shown in Table 1. After converting all continuous variables into categorical variables, we entered the general condition, comorbidity information and laboratory indicators of all 535 patients into the stepwise logistic regression model. The location of DVT was excluded because of its uncertain classification, and monocyte was excluded because it predicted the outcome poorly. The specific classification of variables is shown in Additional file 1: Table S1. The final stepwise logistic regression model included 18 variables and had an AIC of 356.75. To further simplify the model, we used LASSO to screen all the candidate variables of the 535 patients. Tenfold cross-validation was conducted to help select a suitable penalty coefficient λ. The relationship between the penalty coefficient λ and the variables remaining in the model is shown in Fig. 2. Finally, a penalty coefficient λ of 0.0536 and a model consisting of 8 variables were chosen. The 8 selected variables were then entered into multivariable logistic regression to fit the model. The AIC of the logistic regression with LASSO model was 382.96. The variables and their coefficients in the stepwise logistic regression model and the logistic regression with LASSO model are shown in Table 2.
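To show how a larger penalty coefficient λ removes variables from a logistic model, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (gradient step followed by soft-thresholding). It is a simplified Python illustration, not the authors' R implementation. The solver, step size, iteration count and toy data are all assumptions, and the tenfold cross-validation used to choose λ is not included.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Shrink x toward zero by t; anything within [-t, t] becomes zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_l1_logistic(X, y, lam, step=0.5, iters=2000):
    """Minimize mean log-loss + lam * sum(|beta_j|); the intercept is unpenalized."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals (predicted probability minus outcome) at the current estimate.
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step, then soft-thresholding (the proximal map of the L1 penalty).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


if __name__ == "__main__":
    # Hypothetical toy data: x0 is informative, x1 carries no information.
    rows = [[1, 1]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4
    deaths = [1, 1, 1, 0] * 2 + [1, 0, 0, 0] * 2
    for lam in (0.01, 0.1, 1.0):
        print(lam, fit_l1_logistic(rows, deaths, lam)[1])
```

With a small λ only the informative variable keeps a non-zero coefficient, and with a large enough λ every coefficient is shrunk to zero, which is the pattern summarized in a coefficient-path plot such as Fig. 2.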
ROC curves were then plotted to show the predictive abilities of the two new models, SAPSII, the Oxford acute severity of illness score (OASIS) and the sequential organ failure assessment (SOFA) for the clinical outcome of DVT patients. The stepwise logistic regression model had the highest AUC, 0.885 (95% confidence interval [CI]: 0.849-0.921), followed by the logistic regression with LASSO model, with an AUC of 0.845 (95% CI: 0.804-0.885). Among the three severity scores, SAPSII had the highest AUC, 0.781 (95% CI: 0.732-0.831). The ROC curves are shown in Fig. 3. We then evaluated the calibration of the two new models in the original training set. The Brier score was 0.091 for the stepwise logistic regression model and 0.108 for the logistic regression with LASSO model. Calibration curves of the two new models are shown in Fig. 4. NRI and IDI were then used to evaluate the improvement in predictive ability of the two new models compared with SAPSII. The cut-off points for the mortality risk categories were set at 0.2, 0.6 and 0.8. The prediction or classification abilities of both new models were better than those of SAPSII (NRI [categorical and continuous] both > 0; IDI both > 0), and the differences were statistically significant. The NRI and IDI results are shown in Table 3. After bootstrap internal validation, the corrected AUC and Brier score of the stepwise logistic regression model were 0.850 and 0.105, respectively; for the logistic regression with LASSO model, they were 0.830 and 0.114, respectively. The internal validation results of the two new models are shown in Table 4. Although the prediction ability

Fig. 1 (caption fragment) Among the variables, lactate and albumin had the most missing data, each with more than 20% missing. NA not available, NLR neutrophil-to-lymphocyte ratio, LMR lymphocyte-to-monocyte ratio
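The categorical NRI and the IDI used in the comparison with SAPSII can be sketched as follows, using the risk cut-offs 0.2, 0.6 and 0.8 reported in the text. This is a minimal Python illustration with hypothetical function names and data. It assumes that both models supply a predicted probability of death for each patient (for SAPSII this would have to be derived from the score, which the text does not detail) and that a probability equal to a cut-off falls in the higher category.

```python
from bisect import bisect_right

RISK_CUTOFFS = (0.2, 0.6, 0.8)  # category boundaries reported in the text


def risk_category(probability, cutoffs=RISK_CUTOFFS):
    """Index of the risk category (0 = lowest) that a probability falls into."""
    return bisect_right(cutoffs, probability)


def categorical_nri(outcomes, p_old, p_new, cutoffs=RISK_CUTOFFS):
    """Net reclassification improvement of the new model over the old one."""
    up_event = down_event = up_non = down_non = 0
    for y, old, new in zip(outcomes, p_old, p_new):
        shift = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y == 1:
            up_event += shift > 0
            down_event += shift < 0
        else:
            up_non += shift > 0
            down_non += shift < 0
    n_event = sum(outcomes)
    n_non = len(outcomes) - n_event
    # Events should move up, non-events should move down.
    return (up_event - down_event) / n_event + (down_non - up_non) / n_non


def idi(outcomes, p_old, p_new):
    """Integrated discrimination improvement: gain in mean risk separation."""
    def mean_diff(label):
        diffs = [new - old for y, old, new in zip(outcomes, p_old, p_new) if y == label]
        return sum(diffs) / len(diffs)
    return mean_diff(1) - mean_diff(0)


if __name__ == "__main__":
    died = [1, 1, 0, 0]                      # hypothetical data
    old = [0.5, 0.1, 0.5, 0.3]
    new = [0.7, 0.3, 0.1, 0.3]
    print(categorical_nri(died, old, new), idi(died, old, new))
```

Both measures are positive when the new model moves patients who died toward higher risk and survivors toward lower risk, which is the direction reported for both new models against SAPSII.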

Discussion
This research used comprehensive data recorded in MIMIC-III to develop two complete models using stepwise logistic regression and logistic regression with LASSO. The two new models were shown to have good predictive ability for 30-day mortality of DVT patients (AUC both > 0.8 in the original data and after internal validation), good calibration (Brier scores of 0.091 and 0.108 in the original data set, and 0.105 and 0.114 after internal validation) and improved predictive ability compared with the widely used SAPSII score (NRI [categorical and continuous] both > 0; IDI both > 0; P both < 0.05). The model obtained by logistic regression with LASSO included only 8 variables, which made it more suitable for rapid decision-making in ICU, and it was used to make the final nomogram. DVT is a common and critical disease in ICU, which not only prolongs the time patients spend in bed but also endangers their lives [22,23]. Because critically ill patients in ICU usually have a variety of underlying diseases, such as cancer and multiple organ dysfunction (in this study, organ dysfunction was present in 54.6% of all DVT patients, metastatic cancer in 11.0% and solid tumor in 5.2%), anticoagulation or thrombolysis therapy may cause major bleeding and even death [2]. The condition of DVT patients in ICU poses difficulties and challenges for doctors, especially those who are not cardiovascular surgeons. A good scoring system or nomogram can help doctors in ICU predict the mortality risk of DVT patients so that they can strengthen monitoring or adjust their decisions.
Table footnote (fragment): DVT deep venous thrombosis, spO2 percutaneous oxygen saturation, RDW red blood cell volume distribution width, PTT partial thromboplastin time, PT prothrombin time, NLR neutrophil-to-lymphocyte ratio, LMR lymphocyte-to-monocyte ratio, OASIS oxford acute severity of illness score, SAPSII simplified acute physiology score II, SOFA sequential organ failure assessment. a Continuous variables that do not conform to normal distribution were represented by median and interquartile range

However, there is still a lack of a prediction model specifically built for DVT patients in ICU, and no previous study had evaluated the predictive abilities of SAPSII, OASIS and SOFA for mortality of DVT patients in ICU.
This study showed that the SAPSII score had better discrimination in predicting 30-day mortality in patients with DVT in ICU (AUC = 0.781 [95% CI 0.732-0.831]), while the other two scores performed worse (AUC both < 0.7). This may be because the OASIS score does not contain laboratory indicators or comorbidity information, while SOFA focuses more on organ failure. The SAPSII scoring model contains more than 15 indicators, which makes it more complicated to use than the nomogram developed in this study. The visual format and categorical indicators of the nomogram also make it more convenient for clinical application. Therefore, we recommend that doctors refer to our nomogram to quickly and accurately assess the 30-day mortality risk of patients with DVT in ICU. Old age can lead to a poor prognosis in patients with DVT in ICU, which may be due to a high risk of bleeding, cardiopulmonary circulatory disorders, or decreased immunity and greater susceptibility to infection [24]. Age is also a risk factor included in SAPSII and in the pulmonary embolism severity index (PESI), which is suitable for predicting mortality in patients with pulmonary embolism (PE) [25]. Because each model is developed for a specific population or disease, classification criteria differ between models. In this study, we chose 75 years as the best classification cut-off value according to the ROC curve analysis, which differs from SAPSII and PESI. Similarly, the classification criteria for systolic blood pressure differ between diseases. Both low and high systolic blood pressure are usually associated with poor clinical prognosis [26][27][28]. More attention should be paid to the harm caused by low systolic blood pressure in ICU, because patients are in a critical state and may also have heart failure or other organ dysfunction. This study used 110 mmHg as the cut-off value for systolic blood pressure, which differs from PESI.
Fig. 5 The nomogram obtained from the logistic regression with LASSO model. a If a patient meets the target requirement for a variable, he/she is given a score according to the points scale above (because the nomogram is difficult to read with the naked eye, the score for each indicator is marked). The total score, obtained by adding up the scores of all indicators, can be mapped to the 30-day death probability below. b A simple case analysis: a 64-year-old female had deep vein thrombosis when admitted to the intensive care unit. She had complications of sepsis and metastatic cancer. Her systolic blood pressure was 107 mmHg. On laboratory testing, her blood urea nitrogen was 32 mg/dL, albumin was 2.5 g/dL, lactate was 3.2 mmol/L and NLR was 10.7. According to our nomogram, she had a total score of 416 and a 30-day death probability higher than 0.75. LASSO least absolute shrinkage and selection operator, NLR neutrophil-to-lymphocyte ratio

The nomogram included sepsis and metastatic cancer among the comorbidity information of patients, and metastatic cancer had the highest score of all variables. Sepsis is a common disease in ICU that is also associated with high mortality [29]. From the perspective of Virchow's triad, systemic bacterial infection and inflammation, as well as damage to endothelial cells, are important factors in promoting thrombosis. Many studies have pointed out that patients with sepsis are prone to secondary thrombotic diseases [30,31], which might increase the risk of acute PE and death or make anticoagulation therapy more difficult. Metastatic cancer is also an important predictor in the SAPSII score. In this study, metastatic cancer was present in 11.0% of all DVT patients and was significantly correlated with the clinical outcome in both univariate and multivariate analyses (P = 0.001 in univariate analysis, P < 0.001 in the two multivariate models).
Cancer cells can differentially express genes including rat sarcoma family, phosphatase and tensin homolog, and tumor protein p53 to influence the genesis of thrombosis, which makes cancer patients themselves hypercoagulable [32,33]. Cancer patients also often have central venous catheters because they need long-term infusion [34], which damages the endothelium and increases the risk of thrombosis. However, patients with metastatic cancer complicated by DVT have an increased risk of major bleeding during anticoagulation or thrombolytic therapy [35,36], which makes this kind of patient difficult to treat. Whether inside or outside ICU, concerns about bleeding risk and unknown thrombosis are great barriers and challenges for thrombolytic therapy for DVT [37]. The nomogram proposed in this study can also assist doctors in risk grading or treatment of DVT patients with sepsis or cancer in ICU. Laboratory indicators can provide doctors with a reference for the patient's condition and help judge the prognosis. The nomogram in this study included 4 laboratory indicators: blood urea nitrogen, albumin, lactate and NLR. In previous studies, RDW and anion gap were considered to be related to the prognosis of ICU patients [38,39]. However, these two variables were eliminated by the LASSO screening and do not appear in the nomogram in this study. Blood urea nitrogen was included in our nomogram and is also included in SAPSII. Blood urea nitrogen is considered to be a key factor reflecting the complex relationship among nutritional status, protein metabolism and renal condition. Among several indicators of renal function (glomerular filtration rate, creatinine and blood urea nitrogen), blood urea nitrogen is more sensitive in reflecting the status of heart failure [40]. A previous retrospective study showed that high blood urea nitrogen was related to poor prognosis in critically ill patients in ICU. Even after adjustment for renal failure, blood urea nitrogen remained an independent predictor of poor prognosis [41].
This study also showed that blood urea nitrogen could independently predict 30-day mortality in patients with DVT in ICU, and 18.5 mg/dL was set as the classification cut-off value. Albumin is an important component of the human body with a variety of functions, acting as an extracellular antioxidant, buffer, immunomodulator and antidote, and it can alleviate inflammation caused by bacterial infection [42]. Previous studies showed that plasma albumin levels are reduced in many critically ill patients, with up to 50% of patients having a value lower than 35 g/L [43]. A decrease in albumin disturbs blood osmotic pressure and serum composition and may also be related to loss of liver function, which increases the mortality risk of DVT patients. In this study, the classification cut-off value of albumin was 3 g/dL. Many previous studies have pointed out the relationship between high lactate levels and high mortality or poor prognosis [44,45]. In this study, lactate could also be used as an independent variable to predict the clinical outcome, with a classification cut-off value of 2.25 mmol/L. NLR is an indicator that reflects the level of inflammation, and it was also an important risk factor for predicting the clinical outcome in this study. Inflammation not only promotes thrombosis but also has an important effect on disease progression in DVT patients in ICU [46]. The classification cut-off value of NLR was set as 12.6.
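The cut-off values reported in the discussion can be collected into a small helper that converts a patient's admission data into the eight categorical indicators of the nomogram. This Python sketch is illustrative only. The direction of each inequality follows the risk direction described in the text (older age, lower systolic blood pressure, higher blood urea nitrogen, lower albumin, higher lactate, higher NLR), but the handling of values exactly at a cut-off and the treatment of every variable as a two-level indicator are assumptions. The points assigned to each indicator are given only in Fig. 5 and are therefore not reproduced here.

```python
def risk_indicators(age, sbp, bun, albumin, lactate, nlr, sepsis, metastatic_cancer):
    """Map admission data to the eight nomogram indicators (True = risk present).

    Units: age in years, sbp in mmHg, bun in mg/dL, albumin in g/dL,
    lactate in mmol/L. Boundary handling (>= versus >) is an assumption.
    """
    return {
        "age": age >= 75,                 # cut-off 75 years
        "sbp": sbp < 110,                 # cut-off 110 mmHg
        "bun": bun > 18.5,                # cut-off 18.5 mg/dL
        "albumin": albumin < 3.0,         # cut-off 3 g/dL
        "lactate": lactate > 2.25,        # cut-off 2.25 mmol/L
        "nlr": nlr > 12.6,                # cut-off 12.6
        "sepsis": bool(sepsis),
        "metastatic_cancer": bool(metastatic_cancer),
    }


if __name__ == "__main__":
    # The case described in the Fig. 5 caption.
    case = risk_indicators(age=64, sbp=107, bun=32, albumin=2.5, lactate=3.2,
                           nlr=10.7, sepsis=True, metastatic_cancer=True)
    print(case)
```

For the case in the Fig. 5 caption, six of the eight indicators are flagged under these assumptions; age and NLR fall below their cut-offs.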
Our research has several strengths. First, our study is the first to develop a 30-day mortality risk score for DVT patients in ICU. Second, we used two methods to develop two complete models and tested their predictive abilities in the original training set and in internal validation. Of the two methods, LASSO is a machine learning technique that can simplify the model while preserving its predictive ability, which makes it methodologically convenient for building models suitable for clinical application. Third, our final model for clinical application was presented as a nomogram, and all the included continuous variables were converted into categorical variables according to defined criteria, which provides great convenience for clinicians.
Our research also has limitations. First, albumin and lactate, both of which are included in the nomogram, each had more than 20% missing data before imputation, which may affect the predictive ability of the model in external validation. Second, our study lacks blood gas analysis data, because we found too much missing data in these variables during the initial inclusion of variables. For the sake of model stability, we removed most of the blood gas analysis variables, although we considered many of them to be excellent predictors of the clinical outcome. Further studies can add some blood gas analysis items to our model to explore their ability to predict mortality in patients with DVT. Third, the clinical outcome we used was all-cause mortality rather than death directly related to DVT or embolic death (or PE), because of the recording limitations of the open database. In addition, we cannot access the patients' clinical end points or carry out further follow-up.

Conclusions
Using data from the open database MIMIC-III, we developed two prediction models for the 30-day mortality risk of DVT patients in ICU with two statistical methods, and selected the more concise model to make a nomogram for clinical application. The predictive ability of the nomogram is better than that of traditional scores, and the nomogram can be used to support decision-making for DVT, which is highly prevalent in ICU.