- Research article
- Open Access
- Open Peer Review
A methodology review on the incremental prognostic value of computed tomography biomarkers in addition to Framingham risk score in predicting cardiovascular disease: the use of association, discrimination and reclassification
BMC Cardiovascular Disorders volume 18, Article number: 39 (2018)
Computed tomography (CT) biomarkers claim to improve cardiovascular risk stratification. This review focuses on significant differences in incremental measures between adequate and inadequate reporting practise.
Studies included were those that used Framingham Risk Score as a baseline and described the incremental value of adding calcium score or CT coronary angiogram in predicting cardiovascular risk. Searches of MEDLINE, EMBASE, Web of Science and Cochrane Central were performed with no language restriction.
Thirty five studies consisting of 206,663 patients (men = 118,114, 55.1%) were included. The baseline Framingham Risk Score included the 1998, 2002 and 2008 iterations. Selective reporting, inconsistent reference groupings and thresholds were found. Twelve studies (34.3%) had major and 23 (65.7%) had minor alterations and the respective Δ AUC were significantly different (p = 0.015). When the baseline model performed well, the Δ AUC was relatively lower with the addition of a CT biomarker (Spearman coefficient = − 0.46, p < 0.0001; n = 33; 76 pairs of data). Other factors that influenced AUC performance included exploration of data analysis, calibration, validation, multivariable and AUC documentation (all p < 0.05). Most studies (68.7%) that reported categorical NRI (n = 16; 46 pairs of data) subjectively drew strong conclusions along with other poor reporting practices. However, no significant difference in values of NRI was found between adequate and inadequate reporting.
The widespread practice of poor reporting particularly association, discrimination, reclassification, calibration and validation undermines the claimed incremental value of CT biomarkers over the Framingham Risk Score alone. Inadequate reporting of discrimination inflates effect estimate, however, that is not necessarily the case for reclassification.
Selective reporting of association and non-standardisation of cut points make evidence synthesis difficult
There was a negative correlation between improved discrimination and the baseline performance of Framingham Risk Score
No evidence was found between poor reporting practice and the magnitude of categorical net reclassification index
Prediction models are not perfect  and often have methodological weaknesses meaning few are used in everyday practice . Framingham Risk Score (FRS)  is one of the exceptions because of its extensive validation . It is however, not perfect with researchers attempting to improve prediction within the intermediate category . Numerous other tests have been investigated hoping to improve upon this base model . This is also in keeping with the rising interest in novel biomarkers in medicine , particularly in the field of cardiovascular  and cancer  risk prediction. In search of a surrogate biomarker that detects subclinical disease, coronary calcium score (CACS) has been investigated for more than a decade  with some proposing screening with computed tomography (CT) in the general population . Also the quality of both primary and secondary prognostic studies is generally poor  but with a few exceptions . Thoracic calcium score is considered a relative of CACS where the Agatston method  is applied to the thoracic aorta . CT coronary angiogram (CTCA) has established itself in the acute chest pain setting and is now being investigated as a tool of reclassifying cardiac risk based on luminal stenosis (and other characteristics) in the CONFIRM cohort . These imaging biomarkers generate substantial interest and may add incremental value to traditional Framingham risk factors. Given the known methodological issues [15,16,17], we looked for differences between adequate and inadequate reporting practice.
This article is reported in line with Preferred Reporting Items for Systematic Reviews and Meta-Analyses . The protocol for this review was registered on PROSPERO (2015:CRD42015023795) . CLP searched MEDLINE, EMBASE, Web of Science and the Cochrane Central Register of Controlled Trials in July 2015. The bibliographies of all included studies were searched for further potential studies. Attempts were made to retrieve missing information in the included publications by contacting authors and grey literature was searched [20, 21]. Only full text publications were included. No language restrictions were applied. An update search was carried out in June 2016.
Screening Process & Study Selection
Two reviewers (CLP, NP) conducted title and abstract, and then full-text screening, against the inclusion criteria. A third reviewer (CH) was involved if disagreements were not resolved by consensus. We only included studies that examined Agatston or CACS, thoracic aorta calcium score (TACS) or CTCA as new predictors and used any iteration of FRS as a baseline model. We included any cohort study that reported the association of FRS with the defined predictors and cardiac endpoints and/or cardiovascular comorbidities. In addition, these studies had to report one of the following: summary statistics indicating incremental value of the predictor of interest in addition to the old model, such as difference in area under the receiver operating characteristics curve (Δ AUC) [22, 23], category-based net reclassification index (NRI) [24, 25], integrated discrimination improvement (IDI) , relative IDI (rIDI)  or other reclassification measures. Composite endpoints  were considered and studies using surrogate outcomes were excluded .
Two authors (CLP, NP) independently extracted data from the included studies, recording the first author, journal, publication year, outcome assessed, population evaluated and their inferences on whether the additional predictor improves prediction beyond the FRS. Publications were classified whenever possible as defined by Framingham/Wilson 1998, Framingham/Adult Treatment Panel III (ATP) 2002 and Framingham/ D’Agnostino 2008 [30, 31]. Original 95% CI, standard error, standard deviation or p-values of summary estimates of interest were extracted [32, 33]. If the Kaplan-Meier survival curve was available, any missing hazard ratio was estimated (by YW) . Specifically, the standard of reporting effect sizes that signalled incremental prognostic value was evaluated. The choice of optimal cut points/thresholds was also examined, particularly in relation to size effects that indicate association . Various methods of quantifying incremental prognostic value of an additional test have been described . We focused on the reporting characteristics of multivariable regression, calibration, discrimination and reclassification . For the documentation of multivariable regression, we determined its adequacy based on the availability of information on whether an additional predictor was significant at p < 0.05 level, or the use of tests that penalised the inclusion of an additional predictor. For discrimination, we assessed the documentation of the baseline AUC of FRS and the Δ AUC as a result of an additional predictor of interest. The adequacy of reporting baseline AUC relied on accurate documentation of the FRS as originally published . In brief, the calculation of FRS could be threatened by addition, deletion or modification of the original FRS items. Other aspects included whether coronary heart disease (CHD) was measured and the measured population was similar to the original FRS population. For reclassification, all publications were searched for NRI calculation or results. The type of NRI was verified. We considered established categories (e.g. < 10%, 10–20%, > 20%) or any justified use of categories as appropriate to relevant data sets. The recommendation of reporting reclassification was taken from .
We rated study quality using the Quality In Prognosis Studies (QUIPS) tool . Two reviewers (CLP, NHP) conducted the quality assessment on the major aspects and two reviewers (JP, YHW) assessed the statistical aspect of the studies independently.
For studies where the 95% CI or standard error was not reported, a correlation coefficient of 0.3 between FRS and CACS/CTCA was used to allow estimation of the 95% CI for Δ AUC based on data from . Numbers were displayed as exact numbers, median or percentages. The alteration of the risk factors used to calculate FRS was assessed based on previously published items  with modifications. The items were scored ordinally as either yes, no or unclear. FRS model of the 1998 iteration was scored against 18 items. FRS model of the 2002 and 2008 iterations were scored against 15 items (3 diabetes related items discounted). The summation of the individual item score indicated the overall level of alteration which was dichotomised into a binary variable: minor and major alterations. The threshold for dichotomisation was based on the median number of items altered among the included studies. The summary of NRIs and AUCs was displayed as medians and interquartile ranges. NRIs and AUCs were subsequently split into two groups depending on the practice of reporting being either adequate or inadequate, or as equivalent binary groups. The aspects of reporting were based on previously published work on AUC  and NRI  with adaptations. The respective groups were then compared using the Wilcoxon sign rank test at significance level p < 0.05. Specifically, we were looking for any particular practice of reporting AUC or NRI leading to excessive claims of any additional predictor. All statistical analysis was carried out using STATA version 14.0 (StataCorp, College Station, Texas).
Eight hundred and one unique hits were screened, leading to 35 studies (Fig. 1) encompassing 206,663 patients (men = 118,114, 55.1%) [4, 5, 9, 13, 14, 39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68]. All publications concluded that at least 1 imaging biomarker indicated either independent association with composite endpoints, improved discrimination or classification beyond traditional risk factors. However, there were reservations about TACS [13, 48] and some argued against the reclassification properties of CACS and CTCA [52, 59].
Quality of included studies
The included studies were usually at predominantly low risk of bias with regard to study participation, measurement of prognostic factors, outcome measure and confounding factors. There was low to moderate risk of bias for statistical analysis because several studies selectively reported results and/ were not clear about the process of model building. The majority of studies were at high risk of attrition bias because there were notable amount of missing data and/ the number of participants’ loss to follow-up was not accounted for. Figure 2 shows the overall bias assessment.
The types and calculation of Framingham risk score
Eleven studies (31.4%) adopted FRS 1998 [5, 39, 44, 46, 48, 52, 53, 59, 62, 66, 67], 13 studies (37.1%) adopted FRS 2002 [13, 40,41,42,43, 47, 49, 50, 54, 58, 63, 65, 68] and 3 studies adopted FRS 2008 [51, 56, 60]. Three studies used both FRS 1998 and 2002 [4, 9, 14]. Two studies did not specify the iteration of FRS used [55, 57]. According to previously published criteria , additions, deletions and modifications of risk factors are shown in Table 1. Six studies (17.1%) did not provide any mean estimate or a breakdown of different categories of FRS [41, 51, 55, 58, 67, 69]. The median number of items altered was 3. Using that as the threshold, twelve studies (34.3%) had major alterations [4, 9, 39, 41,42,43,44, 46, 51, 59, 67, 68] and 23 studies (65.7%) had minor alterations [5, 13, 14, 40, 45, 47,48,49,50, 52,53,54,55,56,57,58, 60,61,62,63,64,65,66]. Five of 23 studies that had minor alterations did not have any components of FRS altered [49, 55, 56, 60, 66]. Of those 5 studies, two studies did not provide any information about the components of FRS [56, 60] and were given the benefit of the doubt, however, findings should be interpreted with caution.
Thresholds and reporting of association
Odds ratio, relative risk, c-index and hazard ratio were used to indicate association of the imaging biomarker with outcomes. There was selective reporting of subgroups and p values among the reported subgroups (Table 2). The reference groups of investigations were not consistent. In CTCA, the reference group could either be no disease or non-obstructive disease. In both TACS and CACS, the reference group was not always a score of zero. The Additional file 1 displays the details regarding the thresholds of different types of investigation. The cut points that define respective categories were also variable.
Intended population for Framingham risk score
Four studies (11.4%) had an exclusively Caucasian population [5, 13, 52, 53]. Eleven studies (31.4%) had more than 10% non-Caucasian population [41, 48,49,50,51, 54, 57, 58, 61, 66, 68]. Twenty two studies (62.9%) had not recorded ethnicity as a variable [4, 9, 14, 40, 42,43,44,45,46,47, 52, 53, 55, 56, 58,59,60, 62,63,64,65, 67]. Five studies (14.3%) had documented CHD at baseline [45, 57, 63, 65, 67]. Considering all the information, only 4 studies (11.4%) were identified as similar to the original Framingham population [5, 13, 52, 53].
Documentation of regression, discrimination & AUC analysis
Of the 35 studies, the majority appropriately reported multivariable regression (74.3%). Thirty three studies reported AUC estimates for both FRS alone and the FRS with additional CT biomarkers with data on 76 such pairs of data. Appropriate documentation of AUC was not common practice (36.4%). The method used to compare receiver operating characteristics curves was not always described (39.4%). Only eight studies reported calibration (22.9%) [4, 48, 51,52,53,54,55, 68]. Table 3 shows the reporting of regression and discrimination. The AUC of FRS alone ranged from 0.53 to 0.77 (median = 0.68). The Δ AUC ranged from − 0.07 to 0.24 (median = 0.06). There was strong inverse correlation between the Δ AUC and the baseline FRS AUC (Spearman correlation coefficient, − 0.46, p < 0.0001). When the baseline FRS AUC performed well, the Δ AUC was relatively lower with the addition of a CT biomarker (Fig. 3).
Table 4 shows the median AUC values and the Δ AUC when the data was classified according to features of design and analysis . The baseline FRS AUC performs better with minor alterations compared with those with major alterations of the Framingham model (p = 0.0006). The improvement in AUC was greater in those with major alterations of the Framingham model (p = 0.015). Other factors that significantly affected the performance of AUC included the exploration of data analysis, reporting of calibration and validation, multivariable and AUC documentation (all p < 0.05). The types of incremental value reported were associated with a difference in AUC performance, but only significant when a threshold of 2 was chosen. In the sample population, measurement of CHD as an outcome or whether the population was similar to the original Framingham cohort did not significantly alter the AUC performance.
Documentation of reclassification and NRI analysis
Twenty three studies reported NRI estimates and all had at least 2 cut-offs, with those that had 3 cut-offs making up the biggest NRIs [59, 67]. The number of thresholds influenced the value of NRI . The most commonly used type of NRI was categorical NRI (69.6%). The studies that reported calibration were the same as those documented in the last section. Table 3 shows the reporting of reclassification analysis and NRI. Complete reporting of reclassification analysis using reclassification table or text was not common practice (31.4%). When reclassification analysis was done, only half was considered appropriate (56.3%). The actual number of patients being up or down classified to a different risk group was not documented. In conjunction with the documentation of NRI, the proportion of subjects being correctly reclassified was not always available (43.8%). Most studies subjectively drew strong conclusions from the NRI calculated (68.7%). The individual components of events and non-events and also their respective NRI components were not always available (at least 43.8%). Fifteen studies reported categorical NRI with 46 data points. The values of categorical combined NRI ranged from − 0.083 to 0.785 (median = 0.249). None of the aspects of reporting  was significantly related to the difference in the values of categorical combined NRI (Table 5).
The majority of studies claimed improved discrimination and reclassification of the outlined CT biomarkers over the established Framingham model. For association, hazard ratios were commonly used but the variation in reporting practice hindered evidence synthesis. Although all studies used a similar baseline model for AUC analysis, the performance of FRS varied. There was a clear negative correlation between improved discrimination and baseline performance of FRS. In contrast, despite the poor reporting, there was no difference in the magnitude of categorical NRI between adequate and inadequate reporting practice.
Selective reporting of association is known and remains an issue . We found that non-standardisation of thresholds and different reference groups across studies prohibit future meta-analysis. Here we substantiate this with 3 included studies. Chow et al. used non-obstructive coronary disease as the reference group for 3 different composite outcomes . The same author then used no coronary artery disease as the reference group in the CONFIRM cohort . In , we estimated hazard ratios using Kaplan-Meier curve . When non-obstructive coronary disease is the reference group, the estimated association of obstructive disease with composite outcome is smaller (HR = 2.03, 95% CI 1.47–2.79), compared with when no coronary disease/normal is the reference, the estimated association is bigger (HR = 3.24, 95% CI 2.28–4.63) .
NRI records the transformation in predicted risk that changes from one category to another category after the introduction of an additional test. However, it is only meaningful when the information about risk thresholds is available. The change in predicted risk could be correct or incorrect. In this population, the concept that the subject could be wrongly reclassified with an additional test was not clearly outlined. The combined NRI could have been driven by predominantly event NRI, leading to overestimation. This could have been clarified by reporting of the components of NRI but this was not standard practice, despite recommendation . Deriving from concern about miscalibration, another recommendation was the regular reporting of calibration [26, 71, 72], with graphical plot being the best assessment . Calibration, however, was regularly overlooked. To counteract the issues with missing data, we have met solutions such as Weibull extrapolation [5, 44, 47] and adjustment of risk cut-offs by the ratio of actual follow up. These strategies translate to a fact that a significant proportion of the included studies used non-standardised risk categories but almost all managed to justify. A more definitive solution would be a move towards decision curve analysis . Only 1 study provided information to allow adjustment using Kaplan-Meier estimates . This is on a background of insufficient reporting on the handling of censoring. This adjustment should receive more attention, especially when censoring happened early on during follow-up . There is currently no consensus on what is a large enough NRI. Overall, considering the uncertainty in NRI  and the small values of NRI, one should not draw strong conclusions from the use of NRI alone. Given the popularity of NRI in cardiovascular research , a framework of reporting NRI should be followed, for example in .
Discrimination as measured by AUC analysis is an established method of measuring incremental value . It was reported in almost all the studies but adequate documentation was not common practice. In our study, reporting of calibration, validation and AUC documentation all influenced the values of AUC. The baseline Framingham model is an established score even considering its various iterations . Big improvements in AUC was seen in cases where the baseline model performed badly. This echoes previous findings  but in a more defined population. As eluded to in , this phenomenon is similar to when a new drug is only effective when compared to an ineffective comparator drug . In addition to the above, inadequate reporting practices were associated with inflated estimates.
Our investigation on AUC  and NRI  were not empirical and can only serve as an update in a different population. The assessment on thresholds was minimal compared with previous investigation . The harm of imaging using CT (radiation burden) was not explored. The focus was solely on the potential benefit. Studies that indicated association or only had a reclassification table could be excluded because we focused on studies that had at least 1 summary estimate that indicated incremental value. We were unable to assess publication bias where articles showed no or worsening prediction if these studies remain unpublished. The calculation of model coefficients can significantly impact the baseline AUC. There are different ways of calculating the model coefficients (re-estimation for the new population, using published coefficients or a point based model) but these were not explored. The difference between using CHD or CVD as an outcome was investigated, however, the justification and transparency of reporting outcomes was not examined.
Association on its own is insufficient to substantiate incremental value  and large values are infrequent in biomarker research . AUC analysis is seen as a good starting point and reclassification should follow rather than replace AUC analysis . However, AUC analysis is not without fault . Transparent reporting of NRI should be compulsory, for example use of a reclassification table (38), and readers should be aware of the controversies surrounding NRI . The co-existence of a lack of increase in AUC and a positive NRI should alarm readers . In general, the reporting in prognosis studies needs to be more robust [82,83,84,85]. Data should be made available for individual patient data meta-analysis.
Inconsistent thresholds, reference groups and selective reporting prohibit future evidence synthesis of associations. Inadequate documentation of discrimination, calibration and validation are widespread. The variable baseline performance and other aspects of reporting discrimination inflate potential incremental values. Reporting of reclassification is also insufficient but significant differences between adequate and inadequate reporting practice have not been identified.
Area under the receiver operating characteristics curve
Coronary calcium score
Coronary heart disease
CoroNary CT Angiography Evaluation For Clinical Outcomes: An InterRnational Multicenter Registry (CONFIRM)
Computed tomographic coronary angiogram
Framingham Risk Score
Integrated discrimination improvement
Net reclassification index
Preferred Reporting Items of Systematic Review and Meta-analysis
Relative integrated discrimination improvement
Thoracic aorta calcium score
- Δ AUC:
Difference in area under the receiver operating characteristics curve
Damen JAAG, Hooft L, Schuit E, Debray TPA, Collins GS, Tzoulaki I, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416.
Bouwmeester W, Zuithoff NP, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9(5):1–12.
Wilson PW, D'Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97(18):1837–47.
Erbel R, Mohlenkamp S, Moebus S, Schmermund A, Lehmann N, Stang A, et al. Coronary risk stratification, discrimination, and reclassification improvement based on quantification of subclinical coronary atherosclerosis: the Heinz Nixdorf recall study. J Am Coll Cardiol. 2010;56(17):1397–406.
Kavousi M, Elias-Smale S, Rutten JH, Leening MJ, Vliegenthart R, Verwoert GC, et al. Evaluation of newer risk markers for coronary heart disease risk classification: a cohort study. Ann Intern Med. 2012;156(6):438–44.
Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? BMJ. 2009;338
Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation. 2009;119(17):2408–16.
O'Connor JP, Aboagye EO, Adams JE, Aerts HJ, Barrington SF, Beer AJ, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2017;14(3):169–86.
Valenti V, B OH, Heo R, Cho I, Schulman-Marcus J, Gransar H, et al. A 15-year warranty period for asymptomatic individuals without coronary artery calcium: a prospective follow-up of 9,715 individuals. JACC Cardiovasc Imaging 2015;8(8):900–909.
Oudkerk M, Stillman AE, Halliburton SS, Kalender WA, Möhlenkamp S, McCollough CH, et al. Coronary artery calcium screening: current status and recommendations from the European Society of Cardiac Radiology and North American Society for cardiovascular imaging. Int J Cardiovasc Imaging. 2008;24(6):645–71.
Ioannidis JP, Tzoulaki I. What makes a good predictor?: the evidence applied to coronary artery calcium score. JAMA. 2010;303(16):1646–7.
Agatston AS, Janowitz WR, Hildner FJ, Zusmer NR, Viamonte M Jr, Detrano R. Quantification of coronary artery calcium using ultrafast computed tomography. J Am Coll Cardiol. 1990;15(4):827–32.
Wong ND, Gransar H, Shaw L, Polk D, Moon JH, Miranda-Peats R, et al. Thoracic aortic calcium versus coronary artery calcium for the prediction of coronary heart disease and cardiovascular disease events. JACC-Cardiovasc Imag. 2009;2(3):319–26.
Hadamitzky M, Achenbach S, Al-Mallah M, Berman D, Budoff M, Cademartiri F, et al. Optimized prognostic score for coronary computed tomographic angiography: results from the CONFIRM registry (COronary CT angiography EvaluatioN for clinical outcomes: an InteRnational multicenter registry). J Am Coll Cardiol. 2013;62(5):468–76.
Tzoulaki I, Liberopoulos G, Ioannidis JP. Assessment of claims of improved prediction beyond the Framingham risk score. JAMA. 2009;302(21):2345–52.
Kyzas PA, Loizou KT, Ioannidis JP. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst. 2005;97(14):1043–55.
Tzoulaki I, Liberopoulos G, Ioannidis JP. Use of reclassification for assessment of improved prediction: an empirical evaluation. Int J Epidemiol. 2011;40(4):1094–105.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.
Pang CL, Peters J, Hyde C, Roobottom C. The added value of computed tomography coronary angiogram in predicting future cardiovascular events in a low risk population: comparison with Framingham Risk Score. PROSPERO: International prospective register for systematic reviews. 2015:CRD42015023795.
British Library e-theses online service. http://ethos.bl.uk/Home.do. Accessed Sept 03, 2015.
System for Information on Grey Literature in Europe. http://www.opengrey.eu/. Accessed Sept 03, 2015.
DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.
Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, Pepe MS. Net Reclassification indices for evaluating risk-prediction instruments: a critical review. Epidemiology (Cambridge, Mass). 2014;25(1):114–121.
Pencina MJ, D'Agostino RB Sr, Demler OV. Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models. Stat Med. 2012;31(2):101–13.
Pencina MJ, D'Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011;30(1):11–21.
Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157–72. discussion 207-12
Pencina MJ, D'Agostino RB, D'Agostino RB, Vasan RS. Comments on ‘integrated discrimination and net reclassification improvements—practical advice’. Stat Med. 2008;27(2):207–12.
Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, Busse JW, Heels-Ansdell D, Montori VM, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ. 2007;334(7597):786.
Ciani O, Buyse M, Garside R, Pavey T, Stein K, Sterne JAC, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.
Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) Final Report. Circulation. 2002;106(25):3143.
D’Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham heart study. Circulation. 2008;117(6):743–53.
Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148(3):839–43.
Altman DG, Bland JM. How to obtain the P value from a confidence interval. BMJ. 2011;343:d2304.
Guyot P, Ades AE, Ouwens MJNM, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12(1):9.
Leeflang MM, Moons KG, Reitsma JB, Zwinderman AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clin Chem. 2008;54(4):729–37.
Pencina MJ, D'Agostino RB, Pencina KM, Janssens AC, Greenland P. Interpreting incremental value of markers added to risk prediction models. Am J Epidemiol. 2012;176(6):473–81.
Leening MJG, Vedder MM, Witteman JCM, Pencina MJ, Steyerberg EW. Net reclassification improvement: computation, interpretation, and ControversiesA literature review and Clinician's guide. Ann Intern Med. 2014;160(2):122–31.
Hayden JA, van der Windt DA, Cartwright JL, Cote P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6.
Lau KK, Wong YK, Chan YH, Yiu KH, Teo KC, Li LS, et al. Prognostic implications of surrogate markers of atherosclerosis in low to intermediate risk patients with Type 2 Diabetes. Cardiovasc Diabetol. 2012;11(101).
Ahmadi N, Hajsadeghi F, Blumenthal RS, Budoff MJ, Stone GW, Ebrahimi R. Mortality in individuals without known coronary artery disease but with discordance between the Framingham risk score and coronary artery calcium. Am J Cardiol. 2011;107(6):799–804.
Budoff MJ, Shaw LJ, Liu ST, Weinstein SR, Mosler TP, Tseng PH, et al. Long-term prognosis associated with coronary calcification: observations from a registry of 25,253 patients. J Am Coll Cardiol. 2007;49(18):1860–70.
Raggi P, Shaw LJ, Berman DS, Callister TQ. Gender-based differences in the prognostic value of coronary calcification. J Women's Health. 2004;13(3):273–83.
Chang SM, Nabi F, Xu J, Pratt CM, Mahmarian AC, Frias ME, et al. Value of CACS compared with ETT and myocardial perfusion imaging for predicting long-term cardiac outcome in asymptomatic and symptomatic patients at low risk for coronary disease clinical implications in a multimodality imaging world. JACC-Cardiovasc Imag. 2015;8(2):134–44.
Elias-Smale SE, Proenca RV, Koller MT, Kavousi M, van Rooij FJ, Hunink MG, et al. Coronary calcium score improves classification of coronary heart disease risk in the elderly: the Rotterdam study. J Am Coll Cardiol. 2010;56(17):1407–14.
Forouzandeh F, Chang SM, Muhyieddeen K, Zaid RR, Trevino AR, Xu J, et al. Does quantifying epicardial and intrathoracic fat with noncontrast computed tomography improve risk stratification beyond calcium scoring alone? Circ Cardiovasc Imag. 2013;6(1):58–66.
Hadamitzky M, Meyer T, Hein F, Bischoff B, Martinoff S, Schomig A, et al. Prognostic value of coronary computed tomographic angiography in asymptomatic patients. Am J Cardiol. 2010;105(12):1746–51.
Elias-Smale SE, Wieberdink RG, Odink AE, Hofman A, Hunink MG, Koudstaal PJ, et al. Burden of atherosclerosis improves the prediction of coronary heart disease but not cerebrovascular events: the Rotterdam study. Eur Heart J. 2011;32(16):2050–8.
Yeboah J, Carr JJ, Terry JG, Ding J, Zeb I, Liu S, et al. Computed tomography-derived cardiovascular risk markers, incident cardiovascular events, and all-cause mortality in nondiabetics: the multi-ethnic study of atherosclerosis. Eur J Prev Cardiol. 2014;21(10):1233–41.
Greenland P, LaBree L, Azen SP, Doherty TM, Detrano RC. Coronary artery calcium score combined with Framingham score for risk prediction in asymptomatic individuals.[erratum appears in JAMA. 2004 Feb 4;291(5):563]. JAMA. 2004;291(2):210–5.
Yeboah J, McClelland RL, Polonsky TS, Burke GL, Sibley CT, O'Leary D, et al. Comparison of novel risk markers for improvement in cardiovascular risk assessment in intermediate-risk individuals. JAMA. 2012;308(8):788–95.
Matsushita K, Sang YY, Ballew SH, Shlipak M, Katz R, Rosas SE, et al. Subclinical atherosclerosis measures for cardiovascular prediction in CKD. J Am Soc Nephrol. 2015;26(2):439–47.
Mohlenkamp S, Lehmann N, Greenland P, Moebus S, Kalsch H, Schmermund A, et al. Coronary artery calcium score improves cardiovascular risk prediction in persons without indication for statin therapy. Atherosclerosis. 2011;215(1):229–36.
Mohlenkamp S, Lehmann N, Moebus S, Schmermund A, Dragano N, Stang A, et al. Quantification of coronary atherosclerosis and inflammation to predict coronary events and all-cause mortality. J Am Coll Cardiol. 2011;57(13):1455–64.
Polonsky TS, McClelland RL, Jorgensen NW, Bild DE, Burke GL, Guerci AD, et al. Coronary artery calcium score and risk classification for coronary heart disease prediction. JAMA. 2010;303(16):1610–6.
Raggi P, Cooil B, Callister TQ. Use of electron beam tomography data to develop models for prediction of hard coronary events. Am Heart J. 2001;141(3):375–82.
Rana JS, Gransar H, Wong ND, Shaw L, Pencina M, Nasir K, et al. Comparative value of coronary artery calcium and multiple blood biomarkers for prognostication of cardiovascular events. Am J Cardiol. 2012;109(10):1449–53.
Agarwal S, Cox AJ, Herrington DM, Jorgensen NW, Xu J, Freedman BI, et al. Coronary calcium score predicts cardiovascular mortality in diabetes: diabetes heart study. Diabetes Care. 2013;36(4):972–7.
Arad Y, Goodman KJ, Roth M, Newstein D, Guerci AD. Coronary calcification, coronary disease risk factors, C-reactive protein, and atherosclerotic cardiovascular disease events: the St. Francis heart study. J Am Coll Cardiol. 2005;46(1):158–65.
Cho I, Chang HJ, Sung JM, Pencina MJ, Lin FY, Dunning AM, et al. Coronary computed tomographic angiography and risk of all-cause mortality and nonfatal myocardial infarction in subjects without chest pain syndrome from the CONFIRM registry (coronary CT angiography evaluation for clinical outcomes: an international multicenter registry). Circulation. 2012;126(3):304–13.
Versteylen MO, Kietselaer BL, Dagnelie PC, Joosen IA, Dedic A, Raaijmakers RH, et al. Additive value of Semiautomated quantification of coronary artery disease using cardiac computed tomographic angiography to predict future acute coronary syndrome. J Am Coll Cardiol. 2013;61(22):2296–305.
Gibson AO, Blaha MJ, Arnan MK, Sacco RL, Szklo M, Herrington DM, et al. Coronary artery calcium and incident cerebrovascular events in an asymptomatic cohort the MESA study. JACC-Cardiovasc Imag. 2014;7(11):1108–15.
Hermann DM, Gronewold J, Lehmann N, Moebus S, Jockel KH, Bauer M, et al. Coronary artery calcification is an independent stroke predictor in the general population. Stroke. 2013;44(4):1008–13.
Chow BJ, Small G, Yam Y, Chen L, Achenbach S, Al-Mallah M, et al. Incremental prognostic value of cardiac computed tomography in coronary artery disease using CONFIRM: COroNary computed tomography angiography evaluation for clinical outcomes: an InteRnational multicenter registry. Circ Cardiovasc Imag. 2011;4(5):463–72.
Lin FY, Shaw LJ, Dunning AM, LaBounty TM, Choi JH, Weinsaft JW, et al. Mortality risk in symptomatic patients with nonobstructive coronary artery disease a prospective 2-center study of 2,583 patients undergoing 64-detector row coronary computed tomographic angiography. J Am Coll Cardiol. 2011;58(5):510–9.
Chow BJ, Wells GA, Chen L, Yam Y, Galiwango P, Abraham A, et al. Prognostic value of 64-slice cardiac computed tomography severity of coronary artery disease, coronary atherosclerosis, and left ventricular ejection fraction. J Am Coll Cardiol. 2010;55(10):1017–28.
Park HE, Chun EJ, Choi SI, Lee SP, Yoon CH, Kim HK, et al. Clinical and imaging parameters to predict cardiovascular outcome in asymptomatic subjects. Int J Cardiovasc Imag. 2013;29(7):1595–602.
Cho I, Chang HJ, Hartaigh BO, Shin S, Sung JM, Lin FY, et al. Incremental prognostic utility of coronary CT angiography for asymptomatic patients based upon extent and severity of coronary artery calcium: results from the COronary CT angiography EvaluatioN for clinical outcomes InteRnational multicenter (CONFIRM) study. Eur Heart J. 2015;36(8):501–8.
Han D, B OH, Gransar H, Yoon JH, Kim KJ, Kim MK, et al. Incremental benefit of coronary artery calcium score above traditional risk factors for all-cause mortality in asymptomatic Korean adults. Circulation J 2015;79(11):2445–2451.
Raggi P, Gongora MC, Gopal A, Callister TQ, Budoff M, Shaw LJ. Coronary artery calcium to predict all-cause mortality in elderly men and women. J Am Coll Cardiol. 2008;52(1):17–23.
Muhlenbruch K, Heraclides A, Steyerberg EW, Joost HG, Boeing H, Schulze MB. Assessing improvement in disease prediction using net reclassification improvement: impact of risk cut-offs and number of risk categories. Eur J Epidemiol. 2013;28(1):25–33.
Pepe MS, Feng Z, Gu JW. Comments on 'Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond' by M. J. Pencina et al., statistics in medicine (DOI: 10.1002/sim.2929). Stat Med. 2008;27(2):173–81.
Hilden J, Gerds TA. A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index. Stat Med. 2014;33(19):3405–14.
Cox DR. Two further applications of a model for binary regression. Biometrika. 1958;45(3/4):562–5.
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making. 2006;26(6):565–74.
Steyerberg EW, Pencina MJ. Reclassification calculations for persons with incomplete follow-up. Ann Intern Med. 2010;152(3):195–6. author reply 6-7
Janssens ACJW, Khoury MJ. Assessment of improved prediction beyond traditional risk factors. Circ Cardiovasc Genet. 2010;3(1):3.
Bero L, Oostvogel F, Bacchetti P, Lee K. Factors associated with findings of published trials of drug–drug comparisons: why some statins appear more efficacious than others. PLoS Med. 2007;4(6):e184.
Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–90.
Moons KGM. Criteria for scientific evaluation of novel markers: a perspective. Clin Chem. 2010;56(4):537.
Pepe MS, Fan J, Feng Z, Gerds T, Hilden J. The net reclassification index (NRI): a misleading measure of prediction improvement even with independent test data sets. Stat Biosci. 2015;7(2):282–95.
Mihaescu R, van Zitteren M, van Hoek M, Sijbrands EJ, Uitterlinden AG, Witteman JC, et al. Improvement of risk prediction by genomic profiling: reclassification measures versus the area under the receiver operating characteristic curve. Am J Epidemiol. 2010;172(3):353–61.
Hemingway H, Croft P, Perel P, Hayden JA, Abrams K, Timmis A, et al. Prognosis research strategy (PROGRESS) 1: a framework for researching clinical outcomes. BMJ. 2013;346
Riley RD, Hayden JA, Steyerberg EW, Moons KGM, Abrams K, Kyzas PA, et al. Prognosis research strategy (PROGRESS) 2: prognostic factor research. PLoS Med. 2013;10(2):e1001380.
Steyerberg EW, Moons KGM, van der Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.
Hingorani AD, Windt DAvd, Riley RD, Abrams K, Moons KGM, Steyerberg EW, et al. Prognosis research strategy (PROGRESS) 4: Stratified medicine research. BMJ. 2013;346:e5793.
Chun Lap Pang is sponsored by National Institute of Health Research and Plymouth Hospitals NHS Trust to undertake a research degree.
This research was funded by the National Insitute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care South West Peninsula (NIHR CLAHRC South West Peninsula). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Availability of data and materials
Data available from included publications.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.