A new efficient trial design for assessing reliability of ankle-brachial index measures by three different observer groups
© Endres et al. 2006
Received: 30 May 2006
Accepted: 27 July 2006
Published: 27 July 2006
The usual method of assessing the variability of a measure such as the ankle-brachial index (ABI) as a function of different observer groups is to obtain repeated measurements. Because the number of possible observer-subject combinations is impractically large, only a few small studies on inter- and intraobserver variability of ABI measures have been carried out to date. The present study proposes a new and efficient study design. This paper describes the study methodology.
Using a partially balanced incomplete block design, six angiologists, six primary-care physicians and six trained medical office assistants each performed two ABI measurements on six individuals from a group of 36 unselected subjects aged 65–70 years. Each subject was measured by one observer from each of the three observer groups, and each observer measured exactly six of the 36 subjects in the group. Each possible pairing of two observers from different observer groups occurred on exactly one subject within a group of 36 and was not repeated on a second subject. The study involved four groups of 36 subjects (144 in total), plus stand-bys.
The 192 volunteers present at the study day were similar in terms of demographic characteristics and vascular risk factors: mean age 68.6 ± 1.7; mean BMI 29.1 ± 4.6; mean waist-hip ratio 0.92 ± 0.09; active smokers 12%; hypertension 60.9%; hypercholesterolemia 53.4%; diabetic 17.2%. A complete set of ABI measurements (three observers performing two Doppler measurements each) was obtained from 108 subjects. From all other subjects at least one ABI measurement was obtained. The mean ABI was 1.08 (± 0.13), 15 (7.9%) volunteers had an ABI <0.9, and none had an ABI >1.4, i.e. a ratio that may be associated with increased stiffening of the arterial walls.
This is the first large-scale study investigating the components of variability and thus reliability in ABI measurements. The advantage of the new study design introduced here is that only one sixth of the number of theoretically possible measurements is required to obtain information about measurement errors. Bland-Altman plots show that there are only small differences and no systematic bias between the observers from three occupational groups with different training backgrounds.
Coronary heart disease and stroke as manifestations of atherosclerosis are among the leading causes of death. Consequently, there has been an intensive search for measures of atherosclerosis burden that would indicate the cardiovascular and overall risk of the individual patient. An early indicator of possible generalized atherosclerosis is peripheral arterial disease (PAD), which refers to the manifestation of atherosclerosis below the aortic bifurcation [2, 3].
An early and reproducible diagnosis of PAD is important for identifying high-risk patients as early as possible. Only a minority of patients exhibit the symptoms of intermittent claudication. Data from the literature suggest that for every patient with intermittent claudication, there are at least three with asymptomatic PAD.
The introduction of Doppler sonography has made it possible for general physicians to diagnose asymptomatic PAD by determining the ankle-brachial index (ABI), which represents the ratio of ankle to brachial systolic blood pressure (SBP) [5, 6]. ABI values below 0.9 are considered pathological (presence of PAD) [7, 8]. Compared to angiography, the sensitivity of a low ABI for leg artery stenosis of ≥ 50% is about 90%, and the specificity is about 98%. In a recent systematic review, the specificity of low ABI to predict future cardiovascular outcomes was high (e.g. 88% for cardiovascular mortality), but the sensitivity was low (41%).
To establish ABI determination as a screening procedure for asymptomatic PAD, the interobserver variability (same subject, different observers) and intraobserver variability (same subject, same observer) of ABI measures must be known. Of particular interest is how these two types of variability differ depending on the observer's level of specialization. The present study therefore set out to determine inter- and intraobserver variability of ABI measurements by three groups of observers (angiologists, general physicians and medical office assistants) and with a sufficiently large number of randomly selected subjects, age 65 to 70, free of serious disease and without symptomatic PAD (the target group of ABI screening measurements). In order to avoid having to make very large numbers of measurements (impractical for both subjects and observers), a study design was developed that provides the necessary information with one sixth the number of Doppler measures that would otherwise have to be performed. The paper deals with the study methodology.
A total of 382 people (36.0%) responded to the study center in Bochum, of whom 310 were interested in participating and 270 signed the statement of informed consent. To ensure that at least 144 subjects would be available for the ABI measurements on the day of the study (four groups of 36), the first 100 men and the first 100 women to respond to the invitation were asked to attend. As a result there were 14 more participants per group than the minimum of 36 required (stand-bys). The four groups were asked to arrive at 1.5- to 2.5-hour intervals. Another 20 subjects whose data were not to be included in the analysis of results were asked to participate in a "dress rehearsal" one day before the actual study day. The study was conducted in accordance with the Helsinki Declaration and the Guideline for Good Epidemiological Practice (GEP) issued by the German Working Group on Epidemiology. The study was approved by the ethics committee of the Ruhr-University of Bochum.
The duplicate ABI measurements were performed by six angiologists, six general physicians (GPs) and six medical office assistants with special training in Doppler measurements (MAs). The angiologists (internists with a specialization in angiology) were selected from the "centers of excellence" (coordinating and training centers) of the German Epidemiological Trial on Ankle-Brachial Index (getABI) [8, 12]. The GPs were also investigators in getABI and were chosen by the angiologists. The MAs were employees of the angiologists and specialized in ABI measurements. The basic physical examinations were performed by separate observers (three angiologists and three GPs) who were not involved in performing the duplicate ABI measurements. These extra angiologists/GPs also performed a single ABI measurement on the stand-by subjects.
Table 1. Demographic data for the 192 subjects who reported on day of study
Column headings: Group 5 Stand-bys; p-values (global test of all 5 groups)*; Validation groups (Groups 1, 2 and 3)
Row headings: Number of participants; Age: mean (± SD); Sex: N (%); BMI: mean (± SD); BMI < 25: N (%); BMI 25 – < 30: N (%); BMI ≥ 30: N (%); WHR: mean (± SD); Smoker status: N (%); Hypertension: N (%); Hypercholesterolemia: N (%); Diabetes: N (%); Vascular-widening procedures (n): N (%)
The first 36 consecutive subjects in each group were called up for repeat ABI measurements, and the remaining 14 subjects were registered as stand-bys. This procedure was followed for four groups. Stand-bys underwent only a single ABI measurement (not used for study purposes).
Combination schedule of observers and patients
Pat01 – MA1
Pat02 – MA2
Pat03 – MA3
Pat04 – MA4
Pat05 – MA5
Pat06 – MA6
Pat07 – MA2
Pat08 – MA3
Pat09 – MA4
Pat10 – MA5
Pat11 – MA6
Pat12 – MA1
Pat13 – MA3
Pat14 – MA4
Pat15 – MA5
Pat16 – MA6
Pat17 – MA1
Pat18 – MA2
Pat19 – MA4
Pat20 – MA5
Pat21 – MA6
Pat22 – MA1
Pat23 – MA2
Pat24 – MA3
Pat25 – MA5
Pat26 – MA6
Pat27 – MA1
Pat28 – MA2
Pat29 – MA3
Pat30 – MA4
Pat31 – MA6
Pat32 – MA1
Pat33 – MA2
Pat34 – MA3
Pat35 – MA4
Pat36 – MA5
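The MA column listed above follows a cyclic shift: within each block of six consecutive subjects, the MA number advances by one, and each new block starts one MA later. The sketch below reproduces that column and pairs it with angiologist and GP assignments. Only the MA column is preserved in this copy, so the angiologist and GP columns, the labels `A1`–`A6` and `GP1`–`GP6`, and the function name `build_schedule` are assumptions chosen to form a Latin square, not the authors' actual schedule. The check at the end confirms that this arrangement has the properties described in the text: every observer sees six subjects and every cross-group observer pair meets on exactly one subject.

```python
from collections import Counter
from itertools import combinations


def build_schedule(n=6):
    """Return (subject, angiologist, GP, MA) rows for one group of n*n subjects."""
    rows = []
    for block in range(n):        # block of n consecutive subjects
        for pos in range(n):      # position of the subject within the block
            subject = f"Pat{block * n + pos + 1:02d}"
            ma = f"MA{(block + pos) % n + 1}"   # reproduces the listed MA column
            angiologist = f"A{block + 1}"       # assumption: one angiologist per block
            gp = f"GP{pos + 1}"                 # assumption: one GP per position
            rows.append((subject, angiologist, gp, ma))
    return rows


schedule = build_schedule()

pair_counts = Counter()   # how often each pair of observers shares a subject
load = Counter()          # how many subjects each observer measures
for _subject, *observers in schedule:
    pair_counts.update(combinations(observers, 2))
    load.update(observers)

print(len(schedule), "subjects,", len(load), "observers")
print("distinct observer pairs:", len(pair_counts))
print("times each pair occurs:", set(pair_counts.values()))
print("subjects per observer:", set(load.values()))
```

With six observers in each of three groups there are 3 × 6 × 6 = 108 cross-group pairs, and 36 subjects with three pairs each supply exactly 108 slots, which is why every pair can occur once and only once.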
The primary purpose of the study was the assessment of three sources of variability, namely the true differences in ABI between subjects, the measurement error arising from duplicate measurements of the same subject by the same observer (intraobserver variability), and the additional error arising from measurements by different observers (interobserver variability). Secondary outcome measures were body mass index, waist-hip ratio, and risk factors for atherosclerotic complications (smoking, hypertension, lipid disorders, diabetes mellitus, symptomatic PAD).
The measurements in the study group began after a ten-minute rest period. Each observer was assigned an aide, who was responsible for guiding the observer to the numbered beds in the correct time sequence, as set out in a predetermined list, and for recording the readings on a standard form. The aide ensured the correct sequence of measurements and the blinding of the observer to his/her previous measurements on the same subject. The observers had no access to the recorded readings. Additionally, controllers verified the accuracy of the measurement sequence and procedure at each bed.
Over the course of about 90 minutes, each subject saw three observers, one from each of the occupational groups. Each observer was given a list of six subjects to measure, and always completed all six subjects on the list before returning (approximately 45 minutes later) to the first person on the list to repeat the measurements on the same subjects in the same sequence. The volunteers remained supine between observers, resulting in a resting period of 5–10 minutes between each set of readings (by different observers).
The Doppler ultrasound device with which the ABI readings were taken was the same one used in the getABI study [8, 12] (Kranzbühler 8 MHz, Solingen, Germany). Measurements were performed on subjects resting in a supine position with the upper body as flat as possible, since readings taken in the sitting or semi-sitting position may result in a substantial increase in the tibial artery blood pressure. After the initial ten-minute rest, each observer used a blood pressure cuff and a Doppler device to take bilateral readings of systolic blood pressure (SBP) at the anterior and posterior tibial arteries and the brachial artery (in that order). As the cuff was deflated, the first pulse sound audible through the Doppler device marked the systolic arterial blood pressure. To eliminate possible noise interference from activity at neighboring beds, headphones were used.
The ratio of ankle SBP to arm SBP yields the ABI value. Calculations were performed according to the recommendations of the American Heart Association. Ankle pressure on either side was the higher of the pressures at the anterior and posterior tibial arteries on that side. Arm pressure was either the average of the two arm pressure readings, if the difference between the two arms was <10 mmHg, or the higher of the two, if the difference between the two sides was ≥ 10 mmHg. The ABI value for left and right sides was calculated as the left or right ankle pressure divided by the arm pressure. The lower of the two ABI values (right or left) was the subject's overall ABI. Note that on the study day only raw values were recorded. The ABIs were calculated later by the statisticians. All data were double-entered into the database to ensure accuracy of data entry.
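The calculation rules described above can be written out as a short sketch. The function and parameter names are illustrative assumptions, and the pressure values in the example are made up for demonstration; they are not study data.

```python
from statistics import mean


def arm_pressure(left_arm, right_arm):
    """Arm SBP in mmHg: mean of both arms if they differ by <10 mmHg, else the higher."""
    if abs(left_arm - right_arm) < 10:
        return mean([left_arm, right_arm])
    return max(left_arm, right_arm)


def subject_abi(left_arm, right_arm,
                left_anterior, left_posterior,
                right_anterior, right_posterior):
    """Overall ABI of one subject from six raw systolic pressures (mmHg)."""
    arm = arm_pressure(left_arm, right_arm)
    # Ankle pressure per side: the higher of the two tibial artery readings.
    left_abi = max(left_anterior, left_posterior) / arm
    right_abi = max(right_anterior, right_posterior) / arm
    # The lower of the two side-specific values is the subject's overall ABI.
    return min(left_abi, right_abi)


# Hypothetical readings: arms differ by 20 mmHg, so the higher arm value is used.
example = subject_abi(left_arm=120, right_arm=140,
                      left_anterior=150, left_posterior=154,
                      right_anterior=133, right_posterior=128)
print(round(example, 2))
```

In this example the left side gives 154/140 = 1.10 and the right side 133/140 = 0.95, so the overall ABI is 0.95.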
In a previous study the difference between the standard deviations of the most and least experienced observers was found to be 0.05 ABI points (standard deviations of 0.073 and 0.120 respectively). We decided that our study should at least be able to detect a two-sided difference between the standard deviations of two observers of half the above value (0.025 ABI points). Given a sample size of 108 (three runs of the design), the power for this comparison in an F-test was calculated as 93%, which is acceptable.
For all statistical tests, two-sided p values <0.05 were considered statistically significant. Characteristics of the ABI study subjects were analyzed using Chi-square or Fisher's exact test for contingency tables, and t-test or one-way ANOVA for the comparison of two or more means.
Each ABI value was assumed to represent the sum of the true mean ABI value and random variation from several sources. ABI variability was analyzed using a mixed model (PROC MIXED, SAS™ Statistical Software Release 9.1.3) with two random factors, the subjects (108 levels) and the observers (18 levels), and their interaction. Note that including the subject as a factor makes it possible to account for and calculate the intra-subject correlation. No repeated measures model was used. Four variance components were distinguished:
a) Variance of the true ABI values based on mean ABI between different subjects, i.e. variance of the ABI that is medically interpreted.
b) Variance of ABI measurements when multiple measurements are performed on the same subject by the same observer: the so-called intraobserver variance, assumed to be due to measurement error and physiological variation.
c) Variance in ABI measurements when multiple measurements are performed on the same subject by different observers (first component of interobserver variance), due to a systematic measurement bias in the individual observer, e.g. an observer generally measuring higher or lower values than others (independent of the specific subject).
d) An additional variance of ABI measurements when multiple measurements are performed on the same subject by different observers, caused by interaction between subject and observer (second component of interobserver variance), i.e. an observer measuring higher or lower values than others for specific subjects only. For example, some observers may have a harder time measuring obese subjects than do other observers.
By adding the three variance components of multiple measurements on the same subject, namely the intraobserver variance b and the two interobserver variance components c and d, one obtains the total variance for multiple measurements on the same subject by different observers:
Total variance of all ABI values for a single subject = b + c + d
Simple addition is permissible when the individual variance components are independent of one another. The standard deviation of the ABI measurements for a single subject is the root of the total variance.
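One way to write down the model behind these components is sketched below. The notation is an assumption introduced here for illustration (the source gives no formula): y is the k-th ABI reading taken on subject i by observer j, and the letters a to d denote the between-subject variance, the intraobserver variance, the observer variance and the subject-observer interaction variance, respectively.

```latex
% Random-effects model with crossed factors "subject" and "observer"
y_{ijk} = \mu + s_i + o_j + (so)_{ij} + e_{ijk}

% Variance components (all effects assumed independent with mean zero)
\operatorname{Var}(s_i) = a, \qquad
\operatorname{Var}(e_{ijk}) = b, \qquad
\operatorname{Var}(o_j) = c, \qquad
\operatorname{Var}\bigl((so)_{ij}\bigr) = d

% Variance of readings on one fixed subject i taken by different observers
\operatorname{Var}(y_{ijk} \mid s_i) = b + c + d
```

Because the subject effect is held fixed in the last line, only the three remaining components contribute, which matches the sum given in the text.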
In order to evaluate the overall quality of the ABI as a measurement technique for discrimination between different subjects, all four variance components must be considered. An intraclass correlation coefficient (ICC) is normally formed for this purpose. The ICC indicates the proportion of the total variance in measurement results between subjects that can be explained by the "biological" or real variance between the subjects examined. A high ICC indicates that the measurements can be used to discriminate between individuals. ICC values ≥ 0.75 are said to be acceptable. The ICC can also be understood as a generalization of the coefficient of determination, or the square of the correlation coefficient in the case of only two sources of variation (two variables). Therefore, the interpretation of the ICC is the same as for the coefficient of determination, namely the proportion of total variance that is explainable by the real variance between subjects.
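As a minimal sketch of this definition, the ICC can be computed as the between-subject variance divided by the sum of all four components. The function names are assumptions, and the numeric values are purely illustrative; they are not estimates from this study.

```python
import math


def within_subject_variance(b, c, d):
    """Variance of readings on one subject taken by different observers."""
    return b + c + d


def icc(a, b, c, d):
    """Share of the total variance that is due to real differences between subjects."""
    return a / (a + within_subject_variance(b, c, d))


# Illustrative variance components only (NOT results from the study):
a = 0.0120   # between-subject variance
b = 0.0020   # intraobserver variance
c = 0.0005   # observer (systematic bias) variance
d = 0.0015   # subject-observer interaction variance

total_within = within_subject_variance(b, c, d)
sd_within = math.sqrt(total_within)   # SD of readings on a single subject

print(f"within-subject variance: {total_within:.4f}")
print(f"within-subject SD:       {sd_within:.4f}")
print(f"ICC:                     {icc(a, b, c, d):.2f}")
```

With these made-up numbers the within-subject variance is 0.004 and the ICC is 0.012 / 0.016 = 0.75, i.e. exactly at the threshold quoted above.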
The intraobserver variability of each individual observer can also be determined. By taking the mean of these values for all observers within one occupational group (angiologists, general physicians and medical office assistants) it is possible to assess the average quality of the measurements by each of the three groups.
Bland-Altman plots were used to visualize measurement errors. For each subject, the differences between the first and second measurements were plotted against the mean of the two measures with 95% limits of agreement.
All data analyses were performed using SAS™ Statistical Software (Release 9.1.3, SAS Institute Inc., Cary, NC, USA).
Of the 200 subjects invited to participate, 192 (96%) appeared on the study day. The data for the pilot study (rehearsal) were not analyzed. Demographic data for all subjects who appeared on the study day are shown in Table 1, subdivided into Groups 1–4 for duplicate ABI determination and a fifth group of stand-bys. The subjects in the five groups were comparable with respect to all recorded basic data, including PAD risk factors.
The 108 individuals (53.7% female) for whom complete duplicate measurements were obtained had a mean age of 68.7 (1.5) years, mean BMI of 29.0 (4.3) kg/m², and mean waist-hip ratio of 0.92 (0.09). Active smokers made up 9.3% of the sample and former smokers 40.7%; 58.1% were hypertensive, 54.8% dyslipidemic, and 15.7% diabetic. Only one subject (0.9%) of Groups 1–3 reported having had peripheral vascular-widening measures in the past. In Groups 4 and 5, five subjects (5.9%) reported having had such measures.
ABI measurements were performed on 189 of the 192 subjects who enrolled in the study. One subject dropped out after the basic examination, before any ABI determinations were performed, and on two other subjects no ABI measures could be obtained for technical reasons (very obese subjects with upper arm too large for the blood pressure cuff).
In Groups 1–3 (108 subjects), the ABI measurement program was completed as planned with full sets of duplicate measurements. Because of time restrictions, full sets of measurements were not obtained from all subjects in the fourth group. The analysis was based on the first three groups of subjects (58 women, 50 men) for whom complete measurements were obtained.
Results of duplicate (Groups 1–4) and single (Group 5) ABI measurements
Comparison of mean systolic arm pressure between subjects in stand-by group 5, duplicate measurement groups 1–3, and incomplete group 4
25th percentile
75th percentile
The simplest and most direct measure of agreement between two measured values is the difference between them. This is true not only when seeking to determine agreement between measures obtained by two different measurement methods, but also when seeking to determine the reproducibility of a measure, in this case an ABI measure, when repeat measurements are performed by the same observer on the same subject.
Bland-Altman plots were created to visualize the ABI measurement error. The difference between two ABI values was plotted against the mean of the same two values in order to show whether systematic differences exist between the repeat measurements as a function of the absolute value (mean) of the ABI. The two blue lines mark two standard deviations of the differences above and below the mean difference, i.e. the range within which about 95% of all differences between measures can be found. These are the limits of agreement (mean difference ± 2SD) as described by Bland and Altman.
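The quantities behind such a plot can be sketched as follows. The function name `limits_of_agreement` is an assumption, and the paired readings are invented for illustration; they are not measurements from the study. The sketch uses the ± 2SD convention stated above and leaves the actual plotting aside.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return per-pair means and differences, the bias and the ± 2SD limits."""
    diffs = [x - y for x, y in zip(first, second)]
    means = [(x + y) / 2 for x, y in zip(first, second)]
    bias = mean(diffs)            # mean difference (systematic bias)
    spread = 2 * stdev(diffs)     # two sample standard deviations
    return means, diffs, bias, bias - spread, bias + spread


# Hypothetical first and second ABI readings on four subjects:
first_reading = [1.10, 0.95, 1.20, 1.05]
second_reading = [1.05, 1.00, 1.15, 1.10]

means, diffs, bias, lower, upper = limits_of_agreement(first_reading, second_reading)

# x-axis of the plot: means; y-axis: diffs; horizontal lines at bias, lower, upper.
print(f"bias:  {bias:+.3f}")
print(f"lower: {lower:+.3f}")
print(f"upper: {upper:+.3f}")
```

A bias close to zero, as in this toy example, corresponds to the absence of a systematic difference between the first and second measurement.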
Previously only a few small studies (6–36 subjects) existed on the inter- and intraobserver variability of ABI measures [14, 17, 18, 19]. These studies worked with small numbers of observers and with patients who had symptomatic PAD or reduced ABIs (<0.95). By contrast, our study focused on a large number of elderly unselected individuals, who are the most likely to be subjects of screening measures, as in primary care. With a total of 18 observers of different technical backgrounds and levels of experience, the study also reflects the everyday conditions of primary care.
From the fact that the mean differences between the ABI measurements in Figure 3 and Figure 5 are very close to 0, the following two conclusions can be drawn: Neither the time difference of about 45 minutes between the two measures on the same subject (repeat measurements by angiologists, GPs or MAs) nor differences in observer category (e.g. angiologists versus GPs) resulted in a systematic bias in the measures. Furthermore, comparison between the two groups of physicians (Figure 5) leads to the conclusion that the ABI measures taken by the GPs are just as accurate as those taken by the angiologists. In sum, Bland-Altman plots show that there are only small differences and no systematic bias between the observers from three occupational groups with different training backgrounds. This confirms the appropriateness of ABI measurements for screening for PAD and generalised atherosclerosis in the GP setting.
Previous studies indicate both that the variability in ABI measures is highly dependent on the experience of the observers, and that the value of the ABI measures depends on the apparatus with which the measurements are performed [14, 17, 18, 19]. However, the strength of these studies' conclusions must be considered questionable, as all studies had one or more major methodological limitations such as relatively small samples of observers or patients, or the selection of patients with symptomatic PAD. The latter is of importance, as variability has been reported to differ at least slightly between diseased and normal subjects [17, 20].
The variability in ABI measures is of particular importance for patients moving from one observer to another, for example as the result of changing family doctors (interobserver variance). Moreover, the ABI should always be an average of several measurements. Under ideal measurement conditions, an ABI <0.9 is considered a readily obtained indicator for the possible presence of PAD [21, 22]. The most recent studies nevertheless recommend that patients with ABIs between 0.9 and 1.1 be treated as "borderline PAD", since a 25% higher mortality is found in this group compared to patients with ABIs of between 1.1 and 1.4. A significant increase in mortality was also recently observed in patients with ABIs >1.4. Similar results were published by Resnick et al. in 2004; however, because the study population consisted of aboriginal Americans only, the data could not necessarily be applied to the general population.
With the partially balanced incomplete block design, it was possible to test every possible observer combination exactly once while performing only 16.7% of all the repeat ABI measurements that would theoretically be required. Thus we established the conditions for assessing components of variance with large numbers of subjects and observers, which can be applied also to measures other than ABI.
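The 16.7% figure can be checked with simple arithmetic. The sketch below assumes that "theoretically required" means every one of the 18 observers measuring every one of the 36 subjects in a group twice; that reading, like the function names, is an assumption made here.

```python
def full_design(observers_per_group=6, observer_groups=3, subjects=36, repeats=2):
    """Measurements if every observer measured every subject `repeats` times."""
    return observers_per_group * observer_groups * subjects * repeats


def incomplete_design(observer_groups=3, subjects=36, repeats=2):
    """Measurements if each subject sees only one observer per observer group."""
    return observer_groups * subjects * repeats


full = full_design()               # 18 observers x 36 subjects x 2 readings
reduced = incomplete_design()      # 3 observers x 36 subjects x 2 readings
fraction = reduced / full

print(full, reduced, f"{fraction:.1%}")
```

Under this assumption a group of 36 subjects needs 216 instead of 1296 measurements, which is one sixth, or about 16.7%.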
PAD: peripheral arterial disease
BMI: body mass index in kg/m²
The study was supported by an unrestricted educational grant from Sanofi-Aventis, Berlin, Germany.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.