Identification of distinct clinical phenotypes of cardiogenic shock using machine learning consensus clustering approach

Background Cardiogenic shock (CS) is a complex state with many underlying causes and associated outcomes. It is still difficult to differentiate between various CS phenotypes. We investigated if the CS phenotypes with distinctive clinical profiles and prognoses might be found using the machine learning (ML) consensus clustering approach. Methods The current study included patients who were diagnosed with CS at the time of admission from the electronic ICU (eICU) Collaborative Research Database. Among 21,925 patients with CS, an unsupervised ML consensus clustering analysis was conducted. The optimal number of clusters was identified by means of the consensus matrix (CM) heat map, cumulative distribution function (CDF), cluster-consensus plots, and the proportion of ambiguously clustered pairs (PAC) analysis. We calculated the standardized mean difference (SMD) of each variable and used the cutoff of ± 0.3 to identify each cluster’s key features. We examined the relationship between the phenotypes and several clinical endpoints utilizing logistic regression (LR) analysis. Results The consensus cluster analysis identified two clusters (Cluster 1: n = 9,848; Cluster 2: n = 12,077). The key features of patients in Cluster 1, compared with Cluster 2, included: lower blood pressure, lower eGFR (estimated glomerular filtration rate), higher BUN (blood urea nitrogen), higher creatinine, lower albumin, higher potassium, lower bicarbonate, lower red blood cell (RBC), higher red blood cell distribution width (RDW), higher SOFA score, higher APS III score, and higher APACHE IV score on admission. The results of LR analysis showed that the Cluster 2 was associated with lower in-hospital mortality (odds ratio [OR]: 0.374; 95% confidence interval [CI]: 0.347–0.402; P < 0.001), ICU mortality (OR: 0.349; 95% CI: 0.318–0.382; P < 0.001), and the incidence of acute kidney injury (AKI) after admission (OR: 0.478; 95% CI: 0.452–0.505; P < 0.001). 
Conclusions ML consensus clustering analysis synthesized the pattern of clinical and laboratory data to reveal distinct CS phenotypes with different clinical outcomes. Supplementary Information The online version contains supplementary material available at 10.1186/s12872-023-03380-y.


Introduction
Cardiogenic shock (CS), a state of circulatory failure, can occur due to acute ischaemic or non-ischaemic cardiac events, or from the progression of longstanding underlying heart disease [2][3][4]. Unfortunately, despite recent advances in pharmacological intervention and mechanical support, CS mortality remains unacceptably high and highly variable, with 30-day mortality ranging from 50 to 90% [5]. This disparity in mortality rates suggests that CS patients are a heterogeneous population, and that some CS phenotypes differ so much in clinical features and prognosis that they cannot be regarded as a single population, either in clinical practice or in research. Additionally, most attempts at staging CS have been based on expert opinion and consensus [6][7][8][9]. To avoid complexity, some of these classification systems use only a few variables and depend on specific, although arbitrary, cutoffs, which may introduce bias and fail to capture the full variability of patient profiles [10]. Furthermore, most of these classifications were developed with the traditional logistic regression (LR) method, even though the predictors do not interact linearly and additively [11]. Therefore, a more accurate and granular classification of the CS spectrum is urgently needed to aid the critical task of selecting proper management, including targeting the most appropriate candidates for advanced therapies.
Machine learning (ML) algorithms have become more commonly used in individualized medicine to support clinical decision-making as electronic medical records and artificial intelligence have advanced [12,13]. Consensus clustering, an unsupervised ML approach, finds similarities and differences among numerous variables and then allocates patients into distinct phenotypes. It generates multiple clustering results over repeated iterations and merges them to arrive at the final clustering result [14]. Additionally, consensus clustering can provide a visual display of the multiple clustering results, which helps in understanding the clustering process and enhances the interpretability of the algorithm. Recent studies have reported that the ML consensus clustering approach may distinguish clinically distinct disease phenotypes, for example in cardiovascular diseases [9,15].
Given the heterogeneity of patients with CS on admission [16], we aimed to identify clinically meaningful phenotypes of patients with CS using an unsupervised ML approach and to assess mortality risks among these distinct clusters.

Study design and data source
We conducted a retrospective multi-center analysis using all the relevant data extracted from the electronic Intensive Care Unit (eICU) Collaborative Research Database. The database is a comprehensive ICU database covering more than 200,000 admissions from over 200 hospitals across the USA between 2014 and 2015 [17]. We completed the "Protecting Human Research Participants" curriculum and obtained permission to access the dataset (authorization codes: 33,281,932). The establishment of the eICU database was approved by the Institutional Review Boards of the Massachusetts Institute of Technology (Cambridge, Massachusetts, USA). All the data were anonymized by the eICU program prior to research analyses, and hence the requirement for informed consent was waived. The study adhered to the ethical standards set forth in the 1964 Declaration of Helsinki and its later amendments.

Patient selection
We included all critically ill patients with a primary diagnosis of CS using International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from the eICU database (ICD-9 codes: 758.51 and R57.0). Patients were excluded if they had: (1) an age of less than 18 years; (2) multiple ICU admissions; (3) a length of stay in the ICU of less than 24 h; or (4) incomplete information about study outcomes (all-cause in-hospital mortality, all-cause ICU mortality, and the incidence of acute kidney injury [AKI] after admission). For patients with multiple admissions, we retained information only on the patient's first admission to the ICU.

Data extraction and processing
Demographic data, medical history, vital signs, laboratory data, scoring systems, treatment information, and others were retrieved from the eICU database using structured query language with PostgreSQL (version 13.6, www.postgresql.org). We only used data that were available within 24 h of ICU admission for clustering analysis, since our aim was to phenotype CS patients based on the data available at the time of ICU admission. If patients had vital signs measured or laboratory tests performed more than once on the first day of admission, only the initial results were considered for subsequent analyses.

Endpoints
The study endpoints included all-cause in-hospital mortality, all-cause ICU mortality, and the incidence of AKI after admission. AKI was defined according to the KDIGO (Kidney Disease: Improving Global Outcomes) criteria [19], which are as follows: an increase in serum creatinine to ≥ 1.5 times baseline that occurred within the prior 7 days, or an increase in serum creatinine of ≥ 0.3 mg/dl within 48 h, or a urine volume < 0.5 ml/kg/h for 6 h or more. The baseline serum creatinine was the minimum serum creatinine value available within the 7 days before admission. If the pre-admission serum creatinine was not available in the eICU database, the first serum creatinine measured at admission was used as the baseline.
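The KDIGO rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' extraction code: the function names, the input layout (hours since admission paired with creatinine in mg/dl, hourly urine rates in ml/kg/h), the choice to measure the 7-day window from admission, and the small floating-point tolerance are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

TOL = 1e-9  # assumed tolerance to guard against floating-point noise at the thresholds


def baseline_creatinine(pre_admission: Sequence[float], first_on_admission: float) -> float:
    """Minimum value from the 7 days before admission, else the first admission value."""
    return min(pre_admission) if pre_admission else first_on_admission


def meets_creatinine_criteria(baseline: float, series: List[Tuple[float, float]]) -> bool:
    """series holds (hours_since_admission, creatinine_mg_dl) pairs."""
    ordered = sorted(series)
    # Relative criterion: value >= 1.5 x baseline within 7 days (168 h, assumed from admission).
    if any(t <= 168 and v >= 1.5 * baseline - TOL for t, v in ordered):
        return True
    # Absolute criterion: rise >= 0.3 mg/dl between two measurements at most 48 h apart.
    for i, (t0, v0) in enumerate(ordered):
        for t1, v1 in ordered[i + 1:]:
            if t1 - t0 <= 48 and v1 - v0 >= 0.3 - TOL:
                return True
    return False


def meets_urine_criterion(hourly_ml_per_kg: Sequence[float]) -> bool:
    """True if urine output stays below 0.5 ml/kg/h for 6 or more consecutive hours."""
    run = 0
    for rate in hourly_ml_per_kg:
        run = run + 1 if rate < 0.5 else 0
        if run >= 6:
            return True
    return False


def has_aki(baseline: float, series: List[Tuple[float, float]],
            hourly_ml_per_kg: Sequence[float]) -> bool:
    """AKI is flagged when any one of the three criteria is met."""
    return meets_creatinine_criteria(baseline, series) or meets_urine_criterion(hourly_ml_per_kg)
```

A patient is flagged as soon as one criterion is satisfied, which mirrors the "or" structure of the definition.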

Management of missing data
Variables with more than 20% missing values were excluded, since large amounts of missing data might cause bias. For variables with fewer than 20% missing values, multiple imputation was applied, based on 5 replications and a chained equation approach. Additionally, the extreme values were not omitted and treated as missing data for imputation [20].
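The variable-exclusion rule can be illustrated with a short Python sketch. The table layout (a dictionary of columns with `None` or NaN marking missing values) and the function names are hypothetical, and the handling of a variable with exactly 20% missing values, which the text does not specify, is an assumption (it is kept here). The chained-equation imputation step itself is not shown.

```python
import math
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_fraction(values: Column) -> float:
    """Share of entries that are None or NaN."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def drop_sparse_variables(table: Dict[str, Column], max_missing: float = 0.20) -> Dict[str, Column]:
    """Keep only variables whose missing fraction does not exceed max_missing."""
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= max_missing
    }
```

The variables that survive this filter are the ones that would then be passed to the imputation step.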

Cluster analysis
We applied an unsupervised ML consensus clustering approach to identify clinical phenotypes of ICU patients with CS. To avoid producing an excessive number of clusters that would not be clinically helpful, we employed a pre-specified subsampling parameter of 80% with 100 iterations and allowed the number of potential clusters (k) to vary from 2 to 8 in sequence with the K-means clustering algorithm. The optimal number of clusters was determined by the cumulative distribution function (CDF) plot, the delta area plot, the consensus matrix (CM) heat map, the cluster-consensus plot of the within-cluster consensus scores, and the proportion of ambiguously clustered pairs (PAC) analysis [21]. Pairwise consensus values, defined as 'the proportion of clustering runs in which two items are grouped together', are calculated and stored in a CM for each k. Then, for each k, a final agglomerative hierarchical consensus clustering using a distance of 1 − consensus values is completed and pruned to k groups, which are called consensus clusters. The within-cluster consensus score, ranging from 0 to 1, is defined as the average consensus value for all pairs of individuals within the same cluster [22]. A value closer to 1 indicates better cluster stability [22]. PAC is calculated as the proportion of all sample pairs with consensus values falling within the predetermined boundaries [21,22]. A value closer to zero indicates better cluster stability [21].
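To make the definitions above concrete, the following Python sketch computes a consensus matrix, the PAC, and a within-cluster consensus score from a set of clustering runs. It is a minimal illustration rather than the ConsensusClusterPlus implementation: the input format (one dictionary per subsampled run, mapping an item index to its cluster label), the function names, and the PAC boundaries of 0.1 and 0.9 are assumptions, since the text only refers to "predetermined boundaries".

```python
from itertools import combinations
from typing import Dict, List, Sequence

Run = Dict[int, int]  # item index -> cluster label, for the items drawn in one subsample


def consensus_matrix(runs: List[Run], n_items: int) -> List[List[float]]:
    """Proportion of runs in which two items cluster together, among runs sampling both."""
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for run in runs:
        for i, j in combinations(sorted(run), 2):
            sampled[i][j] += 1
            if run[i] == run[j]:
                together[i][j] += 1
    cm = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        cm[i][i] = 1.0
        for j in range(i + 1, n_items):
            if sampled[i][j]:
                cm[i][j] = cm[j][i] = together[i][j] / sampled[i][j]
    return cm


def pac(cm: List[List[float]], lower: float = 0.1, upper: float = 0.9) -> float:
    """Proportion of item pairs whose consensus value lies strictly between the bounds."""
    pairs = [cm[i][j] for i, j in combinations(range(len(cm)), 2)]
    return sum(1 for c in pairs if lower < c < upper) / len(pairs)


def cluster_consensus(cm: List[List[float]], members: Sequence[int]) -> float:
    """Average consensus value over all pairs of items within one consensus cluster."""
    pairs = [cm[i][j] for i, j in combinations(members, 2)]
    return sum(pairs) / len(pairs)
```

In a full pipeline each run would come from K-means on an 80% subsample, and the final consensus clusters would be obtained by hierarchical clustering on 1 − consensus; both steps are omitted here to keep the sketch short.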

Statistical analysis
After we identified the clusters of CS patients, we performed analyses to test the differences among the clusters. Data were presented as mean ± standard deviation (SD) and compared between groups using Student's t test if the measurement data were normally distributed and the variance was homogeneous. If these requirements were not satisfied, data were expressed as median (interquartile range [IQR]), and the Kruskal-Wallis rank test was used for comparisons between groups. Categorical data were reported as absolute numbers and percentages, with statistical analysis using Pearson's χ2 test or Fisher's exact test as appropriate.
We determined the clusters' key features using an absolute standardized mean difference (SMD) of > 0.3, following Thongprayoon's studies [23][24][25]. We then compared outcomes among the identified clusters. We assessed the association of the CS clusters with in-hospital mortality, ICU mortality, and the incidence of AKI after admission using the LR model. Cluster 1 was taken as the reference group in the further analysis. The extracted variables were not incorporated into the LR analysis because these characteristics had been used to identify the clusters through unsupervised ML. We performed all analyses using R, version 4.0.5 (RStudio, Inc., Boston, MA, USA; http://www.rstudio.com/), with the ConsensusClusterPlus package (version 1.54.0) for consensus clustering analysis [22].
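Because the clustering variables were left out of the LR model, a model with cluster membership as its only predictor yields the same odds ratio as the corresponding 2 × 2 table. The Python sketch below shows that calculation with a Wald 95% confidence interval. It assumes such an unadjusted model; the function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_wald(events_group: int, n_group: int,
                    events_ref: int, n_ref: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio of the group versus the reference group, with a Wald confidence interval."""
    a = events_group              # events in the group of interest
    b = n_group - events_group    # non-events in the group of interest
    c = events_ref                # events in the reference group
    d = n_ref - events_ref        # non-events in the reference group
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts (all must be non-zero).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only: 10/100 events versus 20/100 in the reference.
print(odds_ratio_wald(10, 100, 20, 100))
```

An odds ratio below 1 for Cluster 2, with Cluster 1 as the reference, corresponds to lower odds of the endpoint in Cluster 2.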

Identification of the optimal number of clusters
Out of 34,682 ICU admissions, 21,925 patients with a diagnosis of CS were enrolled in the final cohort (Fig. 1). The CDF plot shows the consensus distribution for each k (Fig. 2A). The delta area plot displays the relative change in the area under the CDF curve, and the largest changes in area occurred between k = 2 and k = 5 (Fig. 2B). The CM heat map demonstrated that, for k = 2 and k = 3, the ML algorithm identified clusters with clear boundaries, indicating good cluster stability over repeated iterations (Fig. S2A). As shown in the cluster-consensus plot, k = 2 and k = 3 had high stability given their high mean cluster consensus scores (Fig. S2B). Additionally, a favorably low PAC was demonstrated for 2 clusters (Fig. S1). In summary, the ML consensus clustering approach applied to baseline characteristics on admission identified 2 clusters that best represented the data.
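The quantity plotted in the delta area plot can be sketched as follows. This Python example assumes one common convention, in which the first k reports the area under the CDF itself and each later k reports the relative change from the previous k; the function names and input layout (pairwise consensus values listed per k) are hypothetical. It uses the identity that, for values in [0, 1], the area under the empirical CDF equals one minus their mean.

```python
from typing import Dict, List


def area_under_cdf(consensus_values: List[float]) -> float:
    """Area under the empirical CDF on [0, 1]; equals 1 - mean for values in [0, 1]."""
    return 1.0 - sum(consensus_values) / len(consensus_values)


def delta_area(values_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Relative change in CDF area between consecutive values of k."""
    ks = sorted(values_by_k)
    areas = {k: area_under_cdf(values_by_k[k]) for k in ks}
    deltas = {ks[0]: areas[ks[0]]}  # assumed convention: first k reports the area itself
    for prev, cur in zip(ks, ks[1:]):
        # Undefined if the previous area is zero (all pairs in perfect consensus).
        deltas[cur] = (areas[cur] - areas[prev]) / areas[prev]
    return deltas
```

A small relative change at a given k suggests that adding another cluster no longer alters the consensus distribution much.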

Selection of key features of the clusters
Figure S3 shows the missing rate for the clinical and laboratory variables extracted from the database. Cluster 1 had 9,848 (44.9%) patients, while Cluster 2 had 12,077 (55.1%) patients. As shown in Table 1, the clinical characteristics of the two identified clusters in the CS cohort were significantly different. Ages at presentation were 70 (59-80) years for the Cluster 1 cohort and 65 (63-76) years for Cluster 2, whereas male sex represented 53.8% and 51.2% of patients in these two cohorts, respectively. On the basis of |SMD| > 0.3, the key features of patients in Cluster 1, compared with Cluster 2, included: lower SBP, lower MBP, lower DBP, lower eGFR, higher BUN, higher creatinine, lower albumin, higher potassium, lower bicarbonate, lower RBC count, higher RDW, higher SOFA score, higher APS III score, and higher APACHE IV score (Fig. 3 and Table 2).
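The |SMD| > 0.3 rule can be illustrated with a short Python sketch for continuous variables. It assumes the widely used definition in which the difference in means is divided by the square root of the average of the two group variances; the text does not state which formula was used, and the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence


def smd(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference of group_a versus group_b (continuous variable)."""
    # Pooled SD as the root of the mean of the two sample variances; must be non-zero.
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


def key_features(cluster_a: Dict[str, List[float]],
                 cluster_b: Dict[str, List[float]],
                 cutoff: float = 0.3) -> List[str]:
    """Names of variables whose absolute SMD between the two clusters exceeds the cutoff."""
    return [
        name for name in cluster_a
        if abs(smd(cluster_a[name], cluster_b[name])) > cutoff
    ]


# Made-up values for illustration only.
cluster_1 = {"sbp": [1.0, 2.0, 3.0], "age": [1.0, 2.0, 3.0]}
cluster_2 = {"sbp": [2.0, 3.0, 4.0], "age": [1.1, 2.1, 3.1]}
print(key_features(cluster_1, cluster_2))
```

The sign of the SMD gives the direction of the difference, so a negative value for Cluster 1 versus Cluster 2 corresponds to a "lower" feature in the list above.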

Major findings
The unsupervised ML consensus clustering approach provides the ability to analyze and identify CS phenotypes more efficiently. SBP, MBP, DBP, eGFR, BUN, creatinine, albumin, potassium, bicarbonate, RBC, RDW, SOFA, APS III, and APACHE IV were the key features used to differentiate the phenotypes of CS. In addition, the two clusters were also associated with multiple study endpoints, including in-hospital mortality, ICU mortality, and the incidence of AKI after admission. A more accurate and granular classification could deepen our understanding of CS pathophysiology, be introduced into clinical practice as a risk assessment tool, and provide participant selection information for clinical trials.

Relation to other works
Several prognostic classifications or risk stratifications of CS have been reported. For example, with regard to hemodynamic phenotypes of CS, patients are generally classified into 4 phenotypes based on cardiac output (i.e., insufficient [cold] versus sufficient [warm]) and volume status (i.e., overloaded [wet] versus euvolemic [dry]), which reflect tissue perfusion and congestion, respectively [26,27]. The classic "cold and wet" profile is the most frequent CS phenotype, accounting for nearly two-thirds of patients with MI-associated CS [28]. Based on 6 variables with a maximum of 9 points, there are three risk categories in the IABP-SHOCK II score [29]. Patients in the low, intermediate, and high risk categories have an in-hospital mortality risk of 20-30%, 40-60%, and 70-90%, respectively. The recently proposed Society of Cardiovascular Angiography and Interventions (SCAI) staging, describing stages of CS from A to E, provides discriminatory potential for morbidity and mortality [6]. It can be used to track the severity of shock over the course of a hospital stay. However, some of these classification tools are based on expert consensus and theoretical considerations rather than on clinical evidence. To avoid complexity, some of these classifications contain only a few characteristics and depend on specific, although arbitrary, cutoff values that could result in bias and fail to capture the full variability of patient profiles. Additionally, some continuous variables in the classifications were converted into categorical variables, which might cause a loss of information on between-subject variability. Furthermore, most of these classifications, using the traditional LR method, were developed assuming that the predictors interact in a linear and additive way, despite the reality that the interactions are often non-linear and multifactorial [11].
To address the limitations above, clustering analysis was used in this study to capture the natural structure of multivariate data without a priori knowledge; it has been applied extensively in medical science, for example, to identify clinical phenotypes [30]. It can also treat multiple variables independently and continuous variables as continuous. Zweck et al. [9] used machine learning and identified 3 distinct CS phenotypes ("Noncongested" CS, "Cardiorenal" CS, and "Cardiometabolic" CS) with specific and reproducible associations with mortality. However, their study and ours differ in terms of the study cohort, sample size, and statistical methods. Additionally, multiple endpoints (in-hospital mortality, all-cause ICU mortality, and the incidence of AKI) were set in our study. AKI, which is reflected by a rise in serum creatinine and a potential reduction in urinary output, may indicate renal hypoperfusion in the setting of CS and is associated with poor outcomes [31]. In the current study, 2 phenotypes were identified. Compared with those in Cluster 2, patients in Cluster 1 had worse hemodynamic and metabolic parameters, higher severity scores, and worse clinical outcomes, which indicated that they were more likely to suffer from multisystem organ failure [4,29].
By calculating the SMD of each variable, we determined that SBP, MBP, DBP, eGFR, BUN, creatinine, albumin, potassium, bicarbonate, RBC, RDW, SOFA, APS III, and APACHE IV were the key features distinguishing the clusters. Some of these indicators have been found to be associated with the risk of mortality in CS. Patients with a creatinine greater than 1.33 had significantly higher mortality in the Intra-aortic Balloon Pump in CS (IABP-SHOCK II) trial [32]. Serum bicarbonate, especially when evaluated early in the course of CS, could offer information regarding prognosis. Wigger et al. [33] found that serum bicarbonate decreased prior to significant elevation of lactate, and that a low bicarbonate level predicted 30-day mortality better than the highest recorded lactate level. One recent study reported that a higher RDW is associated with an increased risk of all-cause mortality in critically ill patients with CS [34]. There is mounting evidence that the development of systemic inflammatory response syndrome (SIRS) plays an important role in the pathogenesis of CS. Pierce et al. [35] found that inflammatory cytokines might cause an increase in RDW by affecting iron metabolism and inhibiting the bone marrow. Additionally, CS can cause activation of the renin-angiotensin system, which leads to an increase in RDW through erythropoiesis [36]. Multiple scoring systems derived from the ICU population have been proposed to predict clinical outcomes in CS. A small study comparing the APACHE-II, APACHE-III, SAPS-II, and SOFA scoring systems in CS reported that APACHE-III and SAPS-II had the best mortality discrimination [10]. The latest version, APACHE IV, is calculated from 129 variables derived within the first 24 h of ICU admission and was assessed in 110,588 patients admitted to 104 ICUs across the USA [37,38]. The application of the APACHE IV score is limited by its complexity. However, as data science advances, the complexity of these scores could be overcome by electronic recording techniques and computing power.

Clinical implications
The strengths of our study include innovative findings obtained with an unsupervised ML consensus clustering approach applied to a large, multicenter population of ICU patients with CS covering a broad spectrum of etiologies. The identified clusters of CS may be used by clinicians in the ICU to quickly assess patients with CS, as the key features identified in this study are based on rapid, easy, and inexpensive laboratory tests. These clusters may enhance clinical trials by supporting treatment strategies tailored to a shock phenotype instead of aiming for a one-size-fits-all solution, thereby paving the way for more individualized health care. This new classification of different shock states may also help to make different trials of CS more comparable and may trigger new randomized trials on the preshock state.

Limitations
There were several limitations to our current study. First, this study was retrospective; future studies may collect comprehensive data prospectively and allow for an enhanced, more nuanced examination of the CS phenotypes. Second, in the eICU database, values for some important variables, including lactate, brain natriuretic peptide, and some advanced hemodynamic monitoring parameters, were documented incompletely and were not included in the current analysis. Third, restricted by the eICU database, the etiology of CS could not be identified accurately. Future studies should attempt to conduct subgroup analyses based on different causes of CS. Fourth, consensus clustering was performed on hospital admission and did not include data before or during hospitalization, which could affect hospitalization-related outcomes. Lastly, our classification tool only included variables from the early stage of CS and cannot evaluate the severity and progression of CS dynamically. Therefore, in the future, we will study the association between ML-derived phenotypes and endpoints within individual SCAI stages, which aim to characterize disease severity as it evolves over the course of a hospital stay.
Institutional Review Board of Changzheng Hospital (Naval Medical University, Shanghai, China).

Fig. 1 Flow diagram of patient inclusion and overview of the statistical analysis. Abbreviations: CS: cardiogenic shock; ML: machine learning; eICU: electronic Intensive Care Unit; ICD-9: International Classification of Diseases, Ninth Revision; CDF: cumulative distribution function; CM: consensus matrix; PAC: proportion of ambiguously clustered pairs; SMD: standardized mean difference; LR: logistic regression

Fig. 3 The SMD for each of the baseline characteristics across clusters. Abbreviations: SMD: standardized mean difference; BMI: body mass index; CICU: cardiac intensive care unit; CSICU: cardiac surgery intensive care unit; MICU: medical intensive care unit; SICU: surgery intensive care unit; CCU-CTICU: cardiac care unit-cardiac trauma/surgical intensive care unit; NICU: neuro intensive care unit; CTICU: cardiac trauma intensive care unit; CABG: coronary artery bypass grafting; PCI: percutaneous coronary intervention; COPD: chronic obstructive pulmonary disease; AIDS: acquired immunodeficiency syndrome; SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO2: oxygen saturation measured by pulse oximetry; WBC: white blood cell; RBC: red blood cell; RDW: red blood cell distribution width; BUN: blood urea nitrogen; eGFR: estimated glomerular filtration rate; SIRS: systemic inflammatory response syndrome; SOFA: Sequential Organ Failure Assessment; APS III: acute physiology score III; APACHE IV: Acute Physiology and Chronic Health Evaluation IV; IABP: intra-aortic balloon pump; RRT: renal replacement treatment

Table 1 Baseline characteristics of the clusters (continued)
Values are presented as means (standard deviations) or medians (interquartile ranges) for continuous variables, and as total numbers and percentages for categorical variables. BMI: body mass index; CICU: cardiac intensive care unit; CABG: coronary artery bypass grafting; PCI: percutaneous coronary intervention; COPD: chronic obstructive pulmonary disease; AIDS: acquired immunodeficiency syndrome; SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO2: oxygen saturation measured by pulse oximetry; WBC: white blood cell; RBC: red blood cell; RDW: red blood cell distribution width; BUN: blood urea nitrogen; eGFR: estimated glomerular filtration rate; SIRS: systemic inflammatory response syndrome; SOFA: Sequential Organ Failure Assessment; APS III: acute physiology score III; APACHE IV: Acute Physiology and Chronic Health Evaluation IV; IABP: intra-aortic balloon pump; RRT: renal replacement treatment

Table 2 Selection of key characteristics of the clusters
SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; BUN: blood urea nitrogen; eGFR: estimated glomerular filtration rate; RBC: red blood cell; RDW: red blood cell distribution width; SOFA: Sequential Organ Failure Assessment; APS III: acute physiology score III; APACHE IV: Acute Physiology and Chronic Health Evaluation IV.