Study design and setting
This study was a cross-sectional survey conducted at two of the largest landfill sites in Johannesburg. The landfill sites were close to densely populated areas and had the largest numbers of waste pickers in Johannesburg. One of the study sites had approximately 600 waste pickers and the other approximately 3000.
Study population and sample
The study population included male and female waste pickers aged 18 years or older. Convenience sampling was used to recruit waste pickers into the study. All waste pickers who were available on the day of data collection were approached, and the study aim was explained to each of them in a language they understood. Those who gave written consent were included in the study.
Data collection tools and methods
A structured questionnaire was translated into the languages spoken by individual waste pickers. Data were entered directly into the REDCap database using the electronic version of the structured questionnaire (see Additional file 1). The research was conducted according to the ethical principles of the Declaration of Helsinki (1964). A detailed ethics statement is provided in the Declarations section of the manuscript.
The 10-year risk of fatal CVD was calculated using a method comparable to that of the European-based Systematic Coronary Risk Evaluation (SCORE) project [18]. We opted to use the SCORE method because the Framingham risk equations, an alternative analysis strategy for 10-year CVD risk, tended to overestimate absolute risk in populations with lower coronary heart disease rates [19, 20]. In this study, we assumed that waste pickers have a low CVD risk, comparable to that of the low-risk countries reported in the SCORE risk function. This is because they are constantly physically active, and physical inactivity is a major risk factor for the development of CVD. Their physical activity includes strenuous work such as pulling, pushing, bending, and walking during the collection of waste materials. In addition, the SCORE risk function allows for variation in the risk of CVD [18].
The risk factors used in the SCORE risk function are widely regarded as the classical CVD risk factors because of their strong association with CVD. Therefore, we used the SCORE low CVD risk estimates based on sex, age, systolic blood pressure, total cholesterol, and current smoking status. We took anthropometric measurements of weight and height to generate body mass index (BMI; weight in kilograms divided by height in meters squared) categories: underweight (< 18.5), normal weight (18.5 to < 25), overweight (25 to < 30), and obesity (30 and above) [21]. Systolic and diastolic blood pressure were each measured three times, and the average of the three measurements was used to generate the final variables. Hypertension was defined as systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg [21]. Point-of-care testing was done for random blood glucose, which was used as a proxy to indicate possible diabetes in the absence of a diagnosis. Random blood glucose was categorized into three groups: 4.5–7.8 mmol/L as normal, 7.8–11.1 mmol/L as moderate, and > 11.1 mmol/L as high [22]. Total cholesterol was also measured in mmol/L. Other data collected were self-reported and covered education, injuries, clinic visits, alcohol use, average monthly income, and HIV status.
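The cut-offs above can be written as a few small helper functions. The following is a minimal sketch, not the code used in the study; the function names are hypothetical. Two points are assumptions because the text does not settle them: a glucose value of exactly 7.8 mmol/L is placed in the moderate group, and values below 4.5 mmol/L are returned as unclassified.

```python
from statistics import mean


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI (kg/m^2) using the cut-offs stated in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def is_hypertensive(sbp_readings, dbp_readings) -> bool:
    """Average the repeated readings, then apply the >=140 and/or >=90 mmHg rule."""
    return mean(sbp_readings) >= 140 or mean(dbp_readings) >= 90


def glucose_category(rbg_mmol_l: float) -> str:
    """Categorise random blood glucose in mmol/L.

    Assumptions (not stated in the text): exactly 7.8 counts as moderate,
    and values below 4.5 are left unclassified.
    """
    if rbg_mmol_l < 4.5:
        return "unclassified"
    if rbg_mmol_l < 7.8:
        return "normal"
    if rbg_mmol_l <= 11.1:
        return "moderate"
    return "high"
```

For example, three systolic readings of 138, 142 and 141 mmHg average to about 140.3 mmHg, so the participant would be classified as hypertensive regardless of the diastolic readings.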
Statistical analysis
Statistical analysis was undertaken using Stata SE version 15.1 (StataCorp, 4905 Lakeway Drive, College Station, TX, USA). Continuous variables were summarized as means and standard deviations, and factor variables were presented as frequencies and percentages. We used the SCORE method to estimate the 10-year risk of fatal CVD [18]. The cardiovascular risk functions used a Weibull proportional hazards model for the baseline survival function only. The hazard functions for men and women were calculated separately, whereas the risk factor coefficients were estimated for both sexes combined. The hazard function was based on the participant's age, so that the estimate depends on the observed age rather than on the length of the study period, as in a traditional survival function. The 10-year risk of fatal CVD was based on the conditional probability of death in the next 10 years given a participant's current age. The risk of death was estimated by combining two separate models, one for coronary heart disease (CHD) and another for non-coronary heart disease (non-CHD).
To calculate the 10-year risk of fatal cardiovascular disease, we first estimated the underlying risk for CHD and non-CHD. This was done separately for each participant's current age and for their age in 10 years.
The underlying (baseline) survival probability is:
$$\begin{aligned} {\text{S}}_{0} \left( {\text{age}} \right) & = \exp \left\{ - \exp \left( \alpha \right) \left( {\text{age}} - 20 \right)^{p} \right\} \\ {\text{S}}_{0} \left( {\text{age}} + 10 \right) & = \exp \left\{ - \exp \left( \alpha \right) \left( {\text{age}} - 10 \right)^{p} \right\} \\ \end{aligned}$$
(1)
where α and p are the constant coefficients for low-risk regions available in the SCORE report [18]. In the Weibull model, the term exp(α) is expressed as λ, that is, λ = exp(α).
We then calculated the weighted sum, w, of the risk factors (cholesterol, smoking, and systolic blood pressure). This was also done separately for CHD and non-CHD.
$$w = \, \beta_{chol} \left( {{\text{Cholesterol}} - 6} \right) + \beta_{SBP} \left( {{\text{SBP}} - 120} \right) + \beta_{smoker} \left( {{\text{current}}} \right).$$
(2)
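where each β is a constant coefficient available in the SCORE report [18], cholesterol is in mmol/L, SBP is systolic blood pressure in mmHg, and "current" equals 1 for current smokers and 0 otherwise.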
To obtain the probability of survival at each age for each cause, we combined the underlying risk for CHD and non-CHD at each participant's current age and at their age in 10 years (Eq. 1) with the weighted sum of the participant's risk factors (Eq. 2).
$$\begin{aligned} {\text{S}}\left( {\text{age}} \right) & = \left\{ {\text{S}}_{0} \left( {\text{age}} \right) \right\}^{\exp (w)} \\ {\text{S}}\left( {\text{age}} + 10 \right) & = \left\{ {\text{S}}_{0} \left( {\text{age}} + 10 \right) \right\}^{\exp (w)} \\ \end{aligned}$$
The 10-year survival probability based on a participant's current age and age in 10 years is:
$${\text{S}}_{{{1}0}} \left( {{\text{age}}} \right) = {\text{S}}\left( {{\text{age}} + {1}0} \right)/{\text{S}}\left( {{\text{age}}} \right)$$
The 10-year risk for each end-point (CHD and non-CHD) is given by:
$${\text{Risk}}_{10} = 1 - {\text{S}}_{10} \left( {{\text{age}}} \right)$$
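Substituting Eq. 1 into the expressions above gives a closed form for each end-point, which may be useful when checking an implementation. This is only an algebraic consequence of the equations as written here, not an additional formula taken from the SCORE report:

$$\begin{aligned} {\text{S}}_{10} \left( {\text{age}} \right) & = \left\{ \frac{{\text{S}}_{0} \left( {\text{age}} + 10 \right)}{{\text{S}}_{0} \left( {\text{age}} \right)} \right\}^{\exp (w)} = \exp \left\{ - \exp \left( \alpha + w \right) \left[ \left( {\text{age}} - 10 \right)^{p} - \left( {\text{age}} - 20 \right)^{p} \right] \right\} \\ {\text{Risk}}_{10} & = 1 - \exp \left\{ - \exp \left( \alpha + w \right) \left[ \left( {\text{age}} - 10 \right)^{p} - \left( {\text{age}} - 20 \right)^{p} \right] \right\} \\ \end{aligned}$$

Written this way, the weighted sum w acts as a shift of α, so each unit increase in w multiplies the cumulative hazard over the 10-year window by exp(1).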
Finally, we combined the risk for CHD and non-CHD to produce the 10-year risk of fatal CVD.
$${\text{CVDRisk}}_{10} \left( {{\text{age}}} \right) = \left[ {{\text{CHDRisk}}\left( {{\text{age}}} \right)} \right] + \left[ {{\text{Non-CHDRisk}}\left( {{\text{age}}} \right)} \right]$$
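The calculation in Eqs. 1 and 2 and the expressions above can be sketched in Python as follows. This is an illustrative sketch, not the Stata code used for the analysis. The names `EndpointCoefficients`, `DEMO_CHD` and `DEMO_NON_CHD` are hypothetical, and the numeric coefficients are placeholders chosen only so that the example runs; the actual sex-specific values of α and p and the β coefficients must be taken from the SCORE report [18]. The text does not say how ages below 20 years were handled, so the sketch simply rejects them, because (age − 20)^p is not defined for a negative base and a non-integer p.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointCoefficients:
    alpha: float        # Weibull baseline coefficient (sex-specific)
    p: float            # Weibull shape coefficient (sex-specific)
    beta_chol: float    # per mmol/L of total cholesterol
    beta_sbp: float     # per mmHg of systolic blood pressure
    beta_smoker: float  # current smoker versus not


# PLACEHOLDER values for illustration only; NOT the published SCORE coefficients.
DEMO_CHD = EndpointCoefficients(-22.0, 4.7, 0.2, 0.02, 0.7)
DEMO_NON_CHD = EndpointCoefficients(-26.0, 5.6, 0.02, 0.02, 0.6)


def baseline_survival(age: float, alpha: float, p: float) -> float:
    """Eq. 1: S0(a) = exp(-exp(alpha) * (a - 20)^p), evaluated at a = age."""
    if age < 20:
        raise ValueError("baseline survival is undefined for age < 20 in this sketch")
    return math.exp(-math.exp(alpha) * (age - 20) ** p)


def endpoint_risk(age, chol, sbp, smoker, c: EndpointCoefficients) -> float:
    """10-year risk for one end-point (CHD or non-CHD)."""
    # Eq. 2: weighted sum of risk factors, centred at 6 mmol/L and 120 mmHg.
    w = (c.beta_chol * (chol - 6)
         + c.beta_sbp * (sbp - 120)
         + c.beta_smoker * (1 if smoker else 0))
    # Survival now and in 10 years, each raised to exp(w).
    s_now = baseline_survival(age, c.alpha, c.p) ** math.exp(w)
    s_later = baseline_survival(age + 10, c.alpha, c.p) ** math.exp(w)
    # Conditional 10-year survival, converted to risk.
    return 1 - s_later / s_now


def cvd_risk_10(age, chol, sbp, smoker, chd, non_chd) -> float:
    """10-year risk of fatal CVD: sum of the CHD and non-CHD risks."""
    return (endpoint_risk(age, chol, sbp, smoker, chd)
            + endpoint_risk(age, chol, sbp, smoker, non_chd))
```

A call such as `cvd_risk_10(50, 6.0, 120, False, DEMO_CHD, DEMO_NON_CHD)` returns a probability between 0 and 1. With a participant at the reference values (cholesterol 6 mmol/L, SBP 120 mmHg, non-smoker), w is zero and the result depends on the baseline survival function alone.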
We then used ordinary least squares regression to assess the association between the 10-year risk of fatal CVD and known metabolic risk factors. To avoid spurious findings, we first fitted a model that included all variables and then reduced it systematically. Separate models were fitted for continuous variables that had been transformed into factor variables, with adjustment for all other covariates. We used the bootstrapping technique with maximization of variance–covariance estimation, taking the log-transformed 10-year risk of fatal CVD as the response variable. The bootstrapped model produced bias-corrected 95% confidence intervals that account for underlying unobserved clustering in the data.
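The general idea of a bootstrapped regression on the log-transformed risk can be illustrated with a simplified sketch. It is not the study's Stata procedure: it fits a single covariate rather than the full adjusted model, it does not reproduce the variance–covariance maximization step or any handling of clustering, and it assumes that "bias-corrected" refers to Efron's bias-corrected percentile interval. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def ols_slope(x, y):
    """Slope of a one-covariate ordinary least squares fit of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_bc_ci(x, risk, n_boot=2000, level=0.95, seed=0):
    """Slope of log(risk) on x with a bias-corrected bootstrap interval."""
    rng = random.Random(seed)
    log_risk = [math.log(r) for r in risk]  # log-transformed response
    n = len(x)
    slope_hat = ols_slope(x, log_risk)

    # Resample participants with replacement and refit the model each time.
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(ols_slope([x[i] for i in idx], [log_risk[i] for i in idx]))
    boots.sort()

    # Bias correction: z0 reflects how far the bootstrap distribution is
    # shifted relative to the original estimate.
    nd = NormalDist()
    prop = sum(b < slope_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep z0 finite
    z0 = nd.inv_cdf(prop)
    z = nd.inv_cdf(1 - (1 - level) / 2)
    lower_q, upper_q = nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)
    return slope_hat, (quantile(boots, lower_q), quantile(boots, upper_q))
```

Because the response is on the log scale, exponentiating the slope gives the multiplicative change in 10-year risk per unit increase in the covariate.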