Evaluating the association of social needs assessment data with cardiometabolic health status in a federally qualified community health center patient population

Background: Health systems are increasingly using standardized social needs screening and response protocols, including the Protocol for Responding to and Assessing Patients’ Assets, Risks, and Experiences (PRAPARE), to improve population health and equity. Despite established relationships between the social determinants of health and health outcomes, little is known about the associations between standardized social needs assessment information and patients’ clinical condition.
Methods: In this cross-sectional study, we examined the relationship between social needs screening assessment data and measures of cardiometabolic clinical health from electronic health records data using two modelling approaches: a backward stepwise logistic regression and a least absolute shrinkage and selection operator (LASSO) logistic regression. Primary outcomes were dichotomized cardiometabolic measures related to obesity, hypertension, and atherosclerotic cardiovascular disease (ASCVD) 10-year risk. Nested models were built to evaluate the utility of social needs assessment data from PRAPARE for risk prediction, stratification, and population health management.
Results: Social needs related to lack of housing, unemployment, stress, access to medicine or health care, and inability to afford phone service were consistently associated with cardiometabolic risk across models. Model fit, as measured by the c-statistic, was poor for predicting obesity (logistic = 0.586; LASSO = 0.587), moderate for stage 1 hypertension (logistic = 0.703; LASSO = 0.688), and high for borderline ASCVD risk (logistic = 0.954; LASSO = 0.950).
Conclusions: Associations between social needs assessment data and clinical outcomes vary by cardiometabolic condition. Social needs assessment data may be useful for prospectively identifying patients at heightened cardiometabolic risk; however, there are limits to the utility of social needs data for improving predictive performance.
Supplementary Information The online version contains supplementary material available at 10.1186/s12872-021-02149-5.

social needs associated with downstream consequences of the SDOH [3,4]. To this end, health care systems and payers are increasingly collecting population- and individual-level data on social needs, including food insecurity, unemployment, housing instability, and transportation barriers [5,6].
In 2016, the Protocol for Responding to and Assessing Patients' Assets, Risks, and Experiences (PRAPARE) was developed by the National Association of Community Health Centers and partnering organizations as a screening tool and corresponding clinical workflow to assess and respond to patients' social needs. PRAPARE has been the most prevalent social needs assessment in the United States, increasingly used by hospitals, health systems, and health plans [7,8]. The PRAPARE screening assessment bridges social risk and clinical risk indicators by being embedded into electronic health record (EHR) systems and has facilitated national standards surrounding social risk data capture, reporting, and population health and care management activities [9].
Despite these features and the prevalence of PRAPARE, there is little evidence on the relationship between data from PRAPARE and clinical outcomes of interest [10]. A recent systematic review of PRAPARE and similar social needs screening assessments found little evidence to evaluate predictive validity [11]. Addressing this gap is critical as health care systems consider delivery reforms that embrace population health management and care coordination across the health and social care continuum [4,12]. With a better understanding of how social needs screening assessment data predicts clinical risk, health systems and payers can identify patients with health-related social needs that are most predictive of poor health, provide care management that links the patients most likely to realize health benefits to appropriate wraparound services and community resources, measure the impact of interventions, manage patient panels, and inform care team composition [13]. As health systems and payers increasingly invest in collecting this information [4,14,15], there is a need to evaluate the relationship between patients' social needs and their clinical risk to design tailored interventions.
This cross-sectional study examined the relationship between responses to PRAPARE and cardiometabolic clinical outcomes among patients in a federally-qualified community health center (FQHC). We utilized two approaches to examine the association between social needs assessment data and the likelihood of the following clinical indicators of cardiometabolic health status: obesity, hypertension, and atherosclerotic cardiovascular disease (ASCVD) 10-year risk. These outcomes are important because cardiometabolic disease is the leading cause of death of people in the United States, and obesity and hypertension are both modifiable risk factors [16]. Our goals were to (1) better understand the social needs and health status of a defined population and (2) evaluate PRAPARE social needs assessment data's association with cardiometabolic health status to inform risk prediction, stratification, and population health management. We hypothesized that models using social needs data from PRAPARE would have moderate performance for three cardiometabolic health outcomes.

Study setting and data collection
The study was conducted at an FQHC in a medium-sized city in the southeastern United States. In 2019, the FQHC saw 36,361 unique patients, 97% of whom had incomes ≤ 200% of the federal poverty level; 57% were uninsured and 93% were members of racial and/or ethnic minorities.
The FQHC began implementing PRAPARE in mid-2017 in its Pediatric, Adult Medicine, and Family Medicine clinics. The PRAPARE tool was fully integrated within the FQHC's EHR. The social needs assessment is administered via patient interview, and referrals to community resources or social services are made based on identified needs. Additional detail on the FQHC's clinical workflow, patient population, EHR integration, and implementation logistics is published elsewhere [17].

Measures
We obtained the data used in this analysis through a retrospective query to abstract charts of patients who had received PRAPARE as part of their primary care clinical encounter. PRAPARE includes a set of national, well-validated core measures and additional optional measures to match community priorities [18]. The core measures evaluate the following: race, ethnicity, education, employment, migrant/seasonal farm work, insurance, veteran status, income, language, material security (food, clothing, childcare, utilities, medicine/health care, phone, and other), transportation, housing status, housing stability, social integration and support, address/neighborhood, and stress. The optional measures include incarceration history, safety, refugee status, and domestic or interpersonal violence. PRAPARE aligns with existing national initiatives [19], ICD-10 clinical coding, and the Uniform Data System used by the Health Resources and Services Administration. All core and optional measures were included as independent variables except for neighborhood, incarceration history, refugee status (not consistently collected during the chart abstraction period), language (data missingness and correlation with ethnicity), and income (data quality/missingness). Sex and age were included as covariates in the analysis.
We evaluated three cardiometabolic clinical outcomes: body mass index (BMI), systolic and diastolic blood pressure, and ASCVD 10-year risk. These outcomes were selected because of their relevance in primary care [20], data availability and integrity, and causal links to how social needs can affect clinical risk [21]. All three clinical outcomes were dichotomized. Obesity was defined as BMI ≥ 30 for all patients with recorded height and weight (n = 2153). Hypertension was defined by a stage-1 threshold using patients' systolic (> 130 mmHg) or diastolic (> 80 mmHg) blood pressure (n = 2174) [22]. Finally, the ASCVD risk score is an estimate of the likelihood of an ASCVD event over the following 10 years and was developed to identify patients who might benefit from primary prevention [23]. Patients with a risk score > 7.5% were classified as being at risk requiring clinical intervention using statin therapy based on existing guidelines [24]. Because the ASCVD risk score is not a valid estimate for patients younger than 40 years, these patients were excluded, leading to an analytic sample of 1468 patients after imputing missing cholesterol readings with a healthy value, consistent with best practices for managing missing EHR data [25].
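The dichotomization rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Vitals` container, its field names, and the choice to return `None` when an input is missing (including requiring both blood pressure readings) are hypothetical and are not taken from the study's Stata code. The thresholds are the ones stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """Hypothetical container for one patient's abstracted chart values."""
    bmi: Optional[float] = None             # kg/m^2
    systolic: Optional[float] = None        # mmHg
    diastolic: Optional[float] = None       # mmHg
    ascvd_risk_pct: Optional[float] = None  # 10-year risk, in percent
    age: Optional[int] = None               # years


def is_obese(v: Vitals) -> Optional[bool]:
    """BMI >= 30; None when BMI (height and weight) was not recorded."""
    return None if v.bmi is None else v.bmi >= 30


def has_stage1_hypertension(v: Vitals) -> Optional[bool]:
    """Systolic > 130 mmHg or diastolic > 80 mmHg, as stated in the text.

    Assumption: both readings must be present; otherwise return None.
    """
    if v.systolic is None or v.diastolic is None:
        return None
    return v.systolic > 130 or v.diastolic > 80


def has_elevated_ascvd_risk(v: Vitals) -> Optional[bool]:
    """Risk score > 7.5%; patients younger than 40 are excluded (None)."""
    if v.ascvd_risk_pct is None or v.age is None or v.age < 40:
        return None
    return v.ascvd_risk_pct > 7.5
```

Returning `None` rather than `False` keeps patients with missing inputs out of each outcome's denominator, which mirrors the outcome-specific sample sizes reported above.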

Regression methods
We employed two standard model selection approaches to assess the associations between PRAPARE responses and the three clinical outcomes: (1) backwards stepwise logistic regression, a parametric approach in which all predictor variables are initially included in the model and then removed one at a time if they are not statistically significant at a 0.157 level, a value recognized to optimize the Akaike information criterion (AIC) [26,27]; and (2) logistic least absolute shrinkage and selection operator (LASSO) regression, a type of supervised machine learning algorithm that performs model selection by "shrinking" or penalizing variables (i.e., setting certain coefficients to zero if they are not contributing explanatory power to the model) [28]. As a result, LASSO is designed to avoid overfitting better than regression models without a penalization function and uses a data-driven approach to select variables, minimize prediction error, maximize out-of-sample performance, and address issues with multicollinearity. We used three different LASSO models with different model selection criteria. One was based on an adaptive LASSO, one minimized the Bayesian information criterion (BIC), and one minimized AIC, which estimates the amount of information lost by using the model [29]. Model fit was similar across all three LASSO approaches, but the AIC approach consistently performed well and was selected based on performance and theory. Additional results comparing the different LASSO approaches for all variables of interest can be found in Additional file 1.
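The link between the 0.157 removal threshold and AIC can be checked numerically. Since AIC = 2k − 2 ln L, dropping a single parameter lowers AIC exactly when the likelihood-ratio statistic, 2 × (ln L_full − ln L_reduced), is below 2, and the upper-tail probability of a chi-square variable with one degree of freedom at 2 is about 0.157. The sketch below assumes that removal is judged by a one-parameter likelihood-ratio test (a Wald test is only approximately equivalent); the function names are illustrative.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for 1 degree of freedom.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def drop_lowers_aic(ll_full: float, ll_reduced: float) -> bool:
    """True when removing one parameter lowers AIC.

    Equivalent to the likelihood-ratio statistic being below 2.
    """
    lr_stat = 2 * (ll_full - ll_reduced)
    return lr_stat < 2.0


# p-value at which a single-parameter removal leaves AIC unchanged
aic_equivalent_alpha = chi2_sf_1df(2.0)
print(round(aic_equivalent_alpha, 3))
```

In other words, removing a covariate whose p-value exceeds roughly 0.157 is the same decision AIC would make for that single comparison, which is why the threshold is less strict than the conventional 0.05.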
For each outcome and regression approach, nested sequential models were built by adding social need and demographic covariates in groupings. This sequential approach highlighted changes in model performance and goodness-of-fit as PRAPARE covariates were added to the limited set of demographic characteristics usually operationalized in a typical EHR. Model 1 used only age and sex; Model 2 added race and ethnicity; and Model 3 added expanded demographic covariates from PRAPARE that can often be found in existing EHR data (e.g., household size, education, employment, insurance status). The full model, Model 4, added the remaining social needs covariates identified during screening. For each model, the backwards stepwise or LASSO approach selected variables to include. Variables excluded from one model were also excluded from all subsequent models to enable direct comparison of the nested models.
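The nesting logic, in which each model starts from the variables retained so far plus one new covariate group, can be sketched as follows. The `select` callable stands in for either selection procedure (backwards stepwise or LASSO) and is hypothetical, as are the covariate names and the toy selector; the sketch shows only how excluded variables stay excluded in later models.

```python
from typing import Callable, List, Sequence

# A selector takes candidate covariate names and returns the ones it keeps.
Selector = Callable[[Sequence[str]], List[str]]


def build_nested_models(groups: Sequence[Sequence[str]],
                        select: Selector) -> List[List[str]]:
    """Return the retained covariates for each sequential model.

    Candidates for each model are the variables retained by the previous
    model plus the newly added group, so a variable dropped once is never
    reintroduced in a later model.
    """
    retained: List[str] = []
    models: List[List[str]] = []
    for group in groups:
        candidates = retained + [v for v in group if v not in retained]
        kept = select(candidates)
        # Guard: a selector may only keep variables it was offered.
        retained = [v for v in candidates if v in kept]
        models.append(list(retained))
    return models


# Illustrative covariate groups for Models 1-4 (names are assumptions).
GROUPS = [
    ["age", "sex"],
    ["race", "ethnicity"],
    ["household_size", "education", "employment", "insurance"],
    ["housing", "stress", "transportation", "food"],
]


def toy_selector(candidates: Sequence[str]) -> List[str]:
    """Stand-in for stepwise/LASSO selection: drops a fixed set of names."""
    uninformative = {"ethnicity", "insurance", "food"}
    return [v for v in candidates if v not in uninformative]


nested = build_nested_models(GROUPS, toy_selector)
```

With the toy selector, each successive model is a superset of the previous one, which is what allows the pairwise likelihood-ratio and c-statistic comparisons between sequential models.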

Model evaluation
To evaluate model goodness-of-fit, a likelihood-ratio test was conducted for each successive pair of nested models. To evaluate model performance, we computed the concordance statistic (c) and assessed the statistical significance of c-statistic improvement between sequential models using an equality-of-areas test [30]. The c-statistic represents the area under the receiver operating characteristic curve, which can range from 0 to 1. A c-statistic of 0.5 indicates that the model performs as well as random chance at classifying outcomes and 1 indicates perfect accuracy. For the stepwise logistic regression models, c-statistics were computed over the full dataset; for the LASSO regression models, c-statistics were computed over a validation dataset that contained 20% of the observations, which were excluded from model training. We hypothesized that model performance for the full PRAPARE model (Model 4) would be satisfactory across the three clinical outcomes. We defined satisfactory performance as having a c-statistic > 0.65, the lower bound for moderate discrimination [31]. All statistical analysis was conducted in Stata version 16 [32].
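For readers less familiar with the c-statistic, the sketch below computes it directly from its definition: the share of (event, non-event) patient pairs in which the patient with the event received the higher predicted probability, with ties counted as one half. The function names are illustrative and this is not the study's Stata implementation; the 0.65 cut-off is the satisfactory-performance threshold stated above.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance statistic (area under the ROC curve) by pair counting.

    y_true holds 0/1 outcomes; y_score holds predicted probabilities.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


def is_satisfactory(c: float, threshold: float = 0.65) -> bool:
    """Apply the c-statistic > 0.65 rule used in the text."""
    return c > threshold
```

For example, with outcomes `[0, 0, 1, 1]` and predicted probabilities `[0.1, 0.4, 0.35, 0.8]`, three of the four event/non-event pairs are ranked correctly, giving c = 0.75. The pairwise loop is quadratic in sample size, which is acceptable for a few thousand patients but would be replaced by a rank-based formula for larger data.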

Study sample
Between May 2017 and February 2019, PRAPARE was delivered to 2192 patients, primarily those who were referred to behavioral health either as part of a primary care or stand-alone appointment (Table 1). The average patient age was 50 and ~ 60% of patients were female. Almost half of patients were African American and about a third were Hispanic. Approximately one third of patients lacked a high school education and the majority were uninsured or unemployed. The most commonly-reported social needs were social isolation, barriers to health care and medicine access, lack of housing, transportation barriers, food insecurity, and high stress. Over half of patients (54.5%) were obese, 52.8% had stage-1 hypertension, and 73.6% had borderline 10-year ASCVD risk or higher. Since the calculation of ASCVD risk is only valid for patients age 40-79, the ASCVD prevalence reflects only the 1468 patients in the ASCVD analytic sample. The presence of social needs was generally greater among patients with adverse health outcomes (Table 1). In bivariate analyses, food insecurity, lack of access to care and medicine, and inability to afford a phone plan were significantly more prevalent in patients with high blood pressure and borderline-or-higher ASCVD risk than in patients without. However, this trend did not hold for obese patients, who were more likely to have housing, had fewer transportation barriers, and reported lower stress.

Nested models
For both the stepwise logistic and LASSO approaches, the full models (Model 4) for all three clinical outcomes included both demographic and social need variables (Additional file 2). The number of variables retained in each model ranged between 7 and 13 for the stepwise logistic regression models and between 5 and 13 for the LASSO logistic regression models. The variables that were included in at least three models were: age, sex, race, lack of housing, unemployment, high stress, and access to health care or medicine.
In multivariable analyses, the magnitude and direction of the odds ratios were similar between the stepwise and LASSO approaches (Table 2). High stress and lack of housing were associated with decreased odds of obesity in full models. Identified needs related to health care and medicine access were associated with increased odds of both hypertension and borderline-or-higher ASCVD risk. Lack of housing was associated with lower odds of obesity, but higher odds of borderline-or-higher ASCVD risk. For all three clinical outcomes, the final stepwise and LASSO models performed similarly, as measured by c-statistic (Table 3). Model performance was poor for predicting obesity (stepwise, c = 0.617; LASSO, c = 0.590), moderate for hypertension (stepwise, c = 0.711; LASSO, c = 0.681), and high for ASCVD risk (stepwise, c = 0.944; LASSO, c = 0.949). The high prediction performance for ASCVD risk was expected as age, race, and sex are used to calculate the score and were included as covariates alongside the PRAPARE variables.
With each sequential model, improvements in performance were observed as covariates were added; however, improvements were not always statistically significant (Table 3). For obesity, the addition of the social needs covariates (Model 4) resulted in a statistically significant increase in the c-statistic, but only in the stepwise approach. For hypertension, only the inclusion of race and ethnicity covariates (for Model 2) was associated with a significant increase in the c-statistic for both approaches. For ASCVD risk, the addition of the social needs covariates (for Model 4) significantly increased the c-statistic for the stepwise approach; however, for the LASSO approach, the c-statistic plateaued in Models 2, 3 and 4, with a slight decrease in performance in Models 3 and 4.
Sequential models often demonstrated statistically significant improvements in goodness-of-fit (Table 3). In particular, for both regression approaches, the addition of the social needs covariates from PRAPARE in all three clinical outcomes significantly improved goodness-of-fit compared to the model that included extended demographics only.

Discussion
Assessing and responding to social needs is a major priority for health care systems seeking to deliver high-value care and improve population health. Efforts to better integrate these activities into routine clinical encounters and standard of care [6] include social needs EHR documentation strategies [33], innovative care models [34,35], and cross-sector collaboration [36]. This study builds on the existing literature by evaluating the relationship between responses to a standardized social needs assessment and accepted measures of cardiometabolic health outcomes. Our intention is to highlight practical analytical tools for leveraging social needs information from PRAPARE and similar screening tools [37,38] to better understand the association between social needs and commonly-studied cardiometabolic outcomes. The application of these analytical tools has the potential to enhance value-based care, population health management, panel management [13], and integrated social-medical care model design and implementation [39,40]. We evaluated the relationship between data from PRAPARE and three cardiometabolic outcomes using two predictive analytic approaches. We found that social needs were more prevalent in patients with hypertension and borderline-or-higher ASCVD risk. Interestingly, social needs were less prevalent in obese patients compared to those who were not obese. Lack of housing, high stress, and access to medicine and health care were the only social needs that were selected in models across more than one clinical risk outcome. These social needs may be proxies for additional, interrelated non-medical drivers of health and do not represent causal mechanisms between social needs and clinical outcomes.
The presence of social needs was associated with lower prevalence of obesity. Existing literature indicates that this counter-intuitive finding may be because obesity has a unique and multifactorial relationship to social needs and SDOH that varies by cultural context, race, ethnicity, and gender [41]. This finding highlights two points: associations between social needs and clinical outcomes depend on how adverse outcomes are defined, and the directionality of an effect should be interpreted cautiously because it depends on the variables retained in the model.
We hypothesized that the predictive analytic approaches would demonstrate moderate performance for all three cardiometabolic outcomes (c-statistic > 0.65); we found support for this hypothesis for ASCVD risk and hypertension, but not for obesity. We noted that model performance for ASCVD risk was very high even without including the clinical parameters or health behaviors used to calculate an ASCVD risk score as predictor variables (blood pressure, total and high density lipoprotein cholesterol, diabetes diagnosis, smoking status, and hypertension treatment). The comparison across nested models further contextualizes this finding. The inclusion of social needs variables for Model 4 resulted in a statistically significant increase in prediction performance compared to Model 3 in the backwards stepwise approach. The small decrease in the LASSO approach performance between the same models may be due to overfitting on a small sample, despite regularization to prevent it, and to limits on how much prediction performance can improve with increasingly granular data on an individual's unmet social needs. Nevertheless, this still suggests that social needs assessment data as a whole has useful associations with clinical outcomes in the absence of information on health behaviors and biometric data.
We found that the stepwise logistic and machine learning LASSO regression models demonstrated similar performance, a finding consistent with prior studies assessing the performance of predictive models [42,43]. A potential explanation is that the advantages of more advanced, resource-intensive machine learning techniques like LASSO regression or random forest models, compared to stepwise logistic regression models, require more observations and dimensionality to become apparent [31]. The number and functional form of included variables also can influence results, with similar research demonstrating better performance for machine learning approaches when more variables and continuous variables are used [44]. This underscores the importance of considerations regarding data transformation, variable functional form, and data missingness when using and selecting a predictive analytical approach.
This study has several limitations. Because PRAPARE was administered only to a subset of patients referred to behavioral health, there was less variation in social need levels on which to base predictive analytics. The smaller sample size of patients from multiple clinics within an FQHC may have limited prediction performance and generalizability. Generalizability may also be limited by the setting: FQHCs in the southeastern United States. Finally, the prediction modeling approaches used in this analysis do not support causal inference. Despite these limitations, this study contributes to an emerging evidence base that suggests the formal and pragmatic validity of PRAPARE and provides insights into how social needs data could be used in outpatient settings to predict cardiometabolic health outcomes.
Predictive analytics may have the potential to proactively identify patients at higher risk for poor health outcomes who could benefit from an intervention, even in the absence of data obtained in current screening methods for cardiometabolic outcomes; however, our results align with previous literature suggesting limits to the utility of SDOH data for risk prediction [45]. In other words, additional SDOH data do not always lead to statistically significant improvements in prediction performance. This is relevant as payers, including state Medicaid programs [46], collect social needs data for new enrollees to identify patients at risk for worsening medical complexity based on social needs in order to improve population health. Our findings suggest that performance may depend on how clinical outcomes are defined and that relationships between social needs assessment data and outcomes vary by disease pathway.

Future research directions
Future research should evaluate social needs prevalence and association with additional clinical outcomes, using prospective data to understand how social needs data can be used to predict clinical risk and, ultimately, improve population health. Though this study focused on moderate clinical outcomes that may be useful for proactive intervention, outcomes that correspond to more severe clinical conditions (e.g., A1c > 8.0%, ASCVD > 20%) may have differing and perhaps stronger associations with unmet social needs. Ideally, this would include linking multiple data sources to comprehensively describe patient behaviors and environment in addition to information on social needs to predict other clinical risk and health status indicators including uncontrolled diabetes, co-morbidity burden, and behavioral health outcomes. Moreover, risk prediction around social needs will only add significant value if it is coupled with evidence-based responses that meaningfully address those needs and improve health outcomes in a cost-effective manner. These responses, or social care interventions, will need to be rigorously tested in diverse settings among study samples of sufficient size to detect their impact on outcomes of interest including medication adherence, utilization, and cardiovascular health status.
Understanding the relationship between clinical outcomes and social needs may have important ramifications for how payers adjust for risk. In addition, future research should evaluate the relationship between social needs assessment data and the likelihood of requiring costly types of health care utilization including inpatient and emergency department visits. As social need screening becomes more widespread, there is a need to understand how this data can be used to improve health equity as health systems focus on improving population health. For example, our findings and future research can inform the business case for health systems to implement interventions to address social needs, which has the potential to narrow disparities in care resulting from social and economic inequities [47]. A critical step will be to design quality measures that complement care guidelines to focus support on medically vulnerable patients with unmet social needs [48].

Conclusions
Associations between standardized social needs assessment data and clinical outcomes vary by cardiometabolic outcome and may depend on how clinical outcomes are defined, which has implications for designing population health management and quality improvement initiatives using social needs assessment data on a defined patient population. Predictive analytics has the potential to leverage associations between clinical outcomes and social needs assessment data; however, there are limits to the utility of social needs data for improving predictive performance. Future research should further evaluate the utility of social needs assessment data to predict forms of clinical risk and health care utilization, especially as this data becomes more available in administrative or health records.