Address reprint requests to Dr Neil E. O'Connell, Department of Clinical Sciences, Brunel University London, Kingston Lane, Uxbridge UB8 3PH, United Kingdom.
Health Economics Research Group, Institute of Environment, Health and Societies, Department of Clinical Sciences, Brunel University London, Uxbridge, United Kingdom
• Bimodal outcome distributions are used to justify responder analyses in trials of interventions for pain.
• An analysis of trials in spinal pain found no evidence for bimodally distributed outcomes.
• This does not support the automatic prioritization of responder analyses to estimate treatment effectiveness.
Abstract
The presence of bimodal outcome distributions has been used as a justification for conducting responder analyses, in addition to, or in place of analyses of the mean between-group difference, in clinical trials and systematic reviews of interventions for pain. The aim of this study was to investigate the distribution of participants' pain outcomes for evidence of bimodal distribution. We sourced data on participant outcomes from a convenience sample of 10 trials of nonsurgical interventions (exercise, manual therapy, medication) for spinal pain. We assessed normality using the Shapiro-Wilk test. When the Shapiro-Wilk test suggested non-normality we inspected distribution plots visually and attempted to classify them. To test whether responder analyses detected a meaningful number of additional patients experiencing substantial improvements we also calculated the risk difference and number needed to treat to benefit. We found no compelling evidence suggesting that outcomes were bimodally distributed for any of the intervention groups. Responder analysis would not meaningfully alter our interpretation of these data compared with the mean between group difference. Our findings suggest that bimodal distribution of outcomes should not be assumed in interventions for spinal pain and do not support the automatic prioritization of responder analysis over the between group difference in the evaluation of treatment effectiveness for pain.
Perspective
Secondary analysis of clinical trials of nonsurgical interventions for spinal pain found no evidence for bimodally distributed outcomes. The findings do not support the automatic prioritization of responder analyses over the average between group difference in the evaluation of treatment effectiveness for spinal pain.
Randomized controlled trials (RCTs), and systematic reviews of RCTs are widely considered the most robust method to evaluate the efficacy and effectiveness of clinical interventions. In pain treatment a critical outcome is pain intensity, which is commonly measured using a self-reported visual analog scale (VAS) or numeric rating scale. In RCTs with outcomes measured on these scales, the only direct estimate of the effects caused by the intervention of interest is the average between group difference in pain score after the intervention. For example, if a hypothetical RCT, comparing an electrotherapy intervention with a placebo control, reported a postintervention between group mean difference of 1 point on a 0 to 10 pain intensity VAS, then we would interpret that the intervention specifically improved pain by 1 point, on average.
A range of examples from trials of pharmacologic therapies for pain show that the pattern of participant outcomes is often bimodally distributed.
Simply put, of the patients who receive the active intervention some experience substantial improvement (and are often referred to as “responders”), some have minimal to no change in their outcome (“nonresponders”), and very few experience intermediate (moderate) effects. This bimodality of outcomes is proposed to reflect that these interventions deliver large treatment effects for a minority of patients and that this large benefit to those individuals is essentially washed out in the between-group average difference, because of the large proportion of nonresponders, leading to underestimation of the effectiveness of the intervention. In this instance, the average treatment effect, estimated according to the between group difference, may not reflect the experience of many or indeed any participants who received the intervention. On that basis it has been proposed that in trials for chronic pain, the mean between group difference lacks utility as a measure of effectiveness.
A proposed alternative is responder analysis, which compares the proportion achieving a predetermined clinically important improvement from baseline in the treatment and control groups. It is proposed that this type of analysis better quantifies individual participant responses to treatment, enables the calculation of easily interpreted measures such as the number needed to treat to benefit (NNTB), and thereby provides more meaningful and interpretable estimates of treatment effectiveness than the between group mean difference. Such analyses can use thresholds for important change derived from patients. For example, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) consensus group recommendations identify improvements from baseline of 30% and 50% as moderately and substantially important change, respectively.
In responder analysis of clinical trials, the between group difference in the proportion of participants who experience a good outcome reflects the net increase in the proportion of patients who achieve the desired outcome because of the intervention. However, responder analyses have attracted criticism over certain conceptual and practical limitations. The term “responder analysis” encourages the conflation of patient outcomes with treatment effects.
‘Responders’ are identified by within person change from baseline. That change (or outcome) may be because of natural recovery, nonspecific treatment effects, and/or regression to the mean as well as, or instead of, the effects of the intervention and it is not possible to isolate treatment effects from these other causes of within person change.
As such, the label “responder” simply identifies a participant whose outcome improved, regardless of whether improvement was because of the treatment or not. In fact, treatment may be responsible for very little, or none of that improvement in outcome. It is also possible that some individuals who actually responded to the intervention might not be counted as responders. For example, if the natural history of a person during the treatment period would have been significant worsening, yet with treatment their condition remains stable, they will be counted as nonresponders despite receiving significant benefit from the intervention. Similarly, a lack of symptom improvement and significant worsening of symptoms are both counted simply as nonresponse. Finally, the dichotomization of outcomes in responder analyses greatly reduces the precision of estimates of effect, necessitating larger sample sizes to detect significant differences.
Fig 1 illustrates this issue. Data from the treatment arm of a hypothetical RCT are bimodally distributed with a small but distinct group of “responders” achieving substantial (≥50%) pain relief. Because a larger group achieves little or no relief the average reduction in pain in this group is small. In Fig 1B the control group data are added and are unimodally distributed. In this example the between group average difference in pain relief is small because of the large number in the treatment group who experienced little pain relief. In this case, a treatment that is actually effective for a certain group of people might be considered not meaningfully better than control, on the basis of a small mean between group difference. In this instance a case can be made that a responder analysis is more informative than the average between group difference in pain scores.
Figure 1. Impact of bimodally distributed outcomes. Hypothetical data from an active intervention arm of an RCT. A small proportion of participants experience substantial (≥50%) pain relief in the intervention whereas most experience little improvement. Because most participants experience little or no change the estimate of average pain relief experienced is modest. The control group data are superimposed (red). Outcomes are unimodally distributed in this group. The difference in average pain relief between groups is small despite the presence of more “responders” in the active group.
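The scenario in Fig 1 can be made concrete with a small numerical sketch. The data below are invented for illustration and are not taken from any of the included trials; the variable and function names are our own. The treatment arm has 20 of 100 participants with 70% pain relief and 80 with 5% relief, and the control arm is unimodal and centred on 10% relief.

```python
from statistics import fmean

# Hypothetical data: proportion of pain relief from baseline (0.70 = 70% relief).
# These values are invented to mimic Fig 1; they are not trial data.
treatment = [0.70] * 20 + [0.05] * 80  # bimodal: few large, many small
control = [0.00] * 10 + [0.05] * 20 + [0.10] * 40 + [0.15] * 20 + [0.20] * 10

RESPONDER_THRESHOLD = 0.50  # >=50% pain relief counts as a "responder"


def responder_proportion(relief, threshold=RESPONDER_THRESHOLD):
    """Share of participants whose relief meets or exceeds the threshold."""
    return sum(r >= threshold for r in relief) / len(relief)


mean_difference = fmean(treatment) - fmean(control)
risk_difference = responder_proportion(treatment) - responder_proportion(control)

print(f"Mean relief, treatment: {fmean(treatment):.2f}")
print(f"Mean relief, control:   {fmean(control):.2f}")
print(f"Between group mean difference: {mean_difference:.2f}")
print(f"Difference in responder proportion: {risk_difference:.2f}")
```

In this constructed example the mean difference is 8 percentage points of relief, whereas the responder proportions differ by 20 percentage points. This is the pattern that motivates responder analysis when outcomes really are bimodal.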
Bimodal outcome distributions have been reported in trials of pharmacologic therapies for pain, but it has not yet been established whether this pattern is commonly present across a wide range of interventions. As such, the case for the wider replacement of the between group difference with responder analysis in trials and reviews of interventions for pain has not been established. One finding that would support such a case would be the observation of bimodal outcome distributions (expressed as percentage change from baseline) in groups receiving various different types of interventions. This case would be further supported if the equivalent outcomes from placebo, waiting list, or minimal care control groups did not also show a bimodal outcome distribution, because evidence of bimodal outcomes is not in itself evidence of bimodal effects. When outcomes are normally distributed in all groups or similarly distributed in intervention and control groups, it is likely that the average between group effect remains the most representative and informative estimate of treatment effectiveness.
Aim
The aim of our exploratory study was to investigate the distribution of participants’ pain outcomes using data from a selection of trials of nonsurgical interventions for spinal pain.
Methods
This was a secondary analysis of data from recent trials of different interventions for spinal pain. We sourced data on participant outcomes from a convenience sample of 10 trials of nonsurgical interventions for low back and neck pain conducted by researchers at the George Institute for Global Health that measured pain intensity as a primary outcome. Our analysis included all relevant trials conducted by this group over a 9-year period.
For each trial, we calculated the percentage change in pain intensity score from baseline to the first follow-up point after intervention for each individual participant in the trial. We then aggregated these to create a distribution of change scores (outcomes) for each arm of the trial.
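As a minimal sketch of this step, the change score for one participant can be computed as below. The sign convention (−1.0 = 100% pain relief) follows the figure captions. The function name, the example scores, and the handling of a zero baseline are our assumptions, not details reported in the paper.

```python
def percentage_change(baseline, follow_up):
    """Change from baseline as a proportion; -1.0 means 100% pain relief.

    Returns None when the baseline score is 0, because the ratio is
    undefined (an assumption; the paper does not describe this case).
    """
    if baseline == 0:
        return None
    return (follow_up - baseline) / baseline


# Hypothetical (baseline, first follow-up) scores on a 0-10 scale for one arm.
arm_scores = [(8, 2), (6, 6), (5, 7), (4, 0)]

# Aggregating the per-participant values gives the arm's outcome distribution.
change_scores = [percentage_change(b, f) for b, f in arm_scores]
print(change_scores)
```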
We divided the samples according to duration of pain (acute and chronic) and pooled data according to the type of intervention. Before combining data from different trials we checked, using visual inspection, that baseline pain scores were similarly distributed. When this was the case we combined data from different trials in the following groups, established a priori:
Data from all comparison/control groups (placebo/sham, wait list, minimal care) were combined. For dropouts, missing data were imputed using the baseline observation carried forward method.
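Baseline observation carried forward can be illustrated as follows. This is a sketch on invented records with hypothetical names, in which `None` marks a missing follow-up score. A dropout's follow-up is replaced by that participant's own baseline, so the imputed change from baseline is zero.

```python
def impute_bocf(baseline, follow_up):
    """Baseline observation carried forward: a missing follow-up score
    (None) is replaced by the participant's own baseline score."""
    return baseline if follow_up is None else follow_up


# Hypothetical records: (baseline, follow-up); None marks a dropout.
records = [(7, 3), (6, None), (5, 5), (8, None)]

completed = [(b, impute_bocf(b, f)) for b, f in records]
changes = [(f - b) / b for b, f in completed]

# An available case analysis would instead drop the dropouts entirely.
available_cases = [(b, f) for b, f in records if f is not None]

print(completed)
print(changes)
print(available_cases)
```

Because every imputed participant is counted as unchanged, this method adds mass at zero change, which is one reason to compare it with an available case analysis.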
Tests for normality of the outcome distribution were conducted using the Shapiro-Wilk test. When these tests suggested non-normality we inspected the distribution plots visually and attempted to classify them. When separate distribution peaks suggestive of groups with distinct distributions were observed we planned to assign a cut point (“knot”), decided post hoc upon observation of the data and consider whether distributions on each side of the knot followed known parametric distributions, for example, normal, binomial, or Poisson distribution. However, we did not observe such distinct peaks in our data so this was not conducted. We conducted a sensitivity analysis to examine the effect of imputing missing data by reanalyzing the data using an available case approach. No adjustment was made for baseline levels of pain intensity.
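Visual classification is a matter of judgement, but a coarse histogram can help when screening many arms. The sketch below is not the authors' procedure, as they inspected distribution plots by eye. The helper names, bin settings, and data are assumptions, and the peak count depends heavily on the choice of bins.

```python
def bin_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # The top edge is folded into the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


def count_peaks(counts):
    """Number of bins strictly higher than both neighbours.

    Edges are padded with 0; flat-topped plateaus are not counted.
    """
    padded = [0] + counts + [0]
    return sum(
        padded[i - 1] < padded[i] > padded[i + 1]
        for i in range(1, len(padded) - 1)
    )


# Hypothetical change scores (-1.0 = 100% pain relief) with two clusters.
changes = [-0.95, -0.9, -0.85, -0.8, -0.5, -0.1, -0.05, 0.0, 0.05, 0.1, 0.3]

counts = bin_counts(changes, low=-1.0, high=1.0, n_bins=8)
for i, c in enumerate(counts):
    left = -1.0 + i * 0.25
    print(f"{left:+.2f} to {left + 0.25:+.2f} | {'#' * c}")
print("Local peaks:", count_peaks(counts))
```

A count of 2 or more peaks would only prompt closer inspection. It is not a formal test of bimodality.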
In a secondary analysis we tested the proposition that small average between group treatment effects in our data set might hide the fact that a meaningful number of additional patients experienced substantial improvements above what they would have received in the comparison condition. Using the IMMPACT group's recommended thresholds of 30% and 50% improvement from baseline as markers of moderately and substantially important change, respectively, we calculated the risk difference (absolute risk increase or decrease) and number needed to treat for all trials that reported an average between group difference of <10 mm on a 100-mm pain scale or <1 point on a 10-point pain scale. This threshold was chosen on the basis of the recent Outcome Measures in Rheumatology (OMERACT 12) consensus group recommendations for a threshold to define a minimal clinically important effect.
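The calculation can be sketched as follows for the 30% and 50% thresholds. The change scores are invented and the function names are hypothetical. Changes are expressed as proportions of baseline with negative values indicating improvement, and the number needed to treat is taken as the reciprocal of the absolute risk difference.

```python
def responder_proportion(changes, threshold):
    """Proportion whose pain fell by at least `threshold` (0.30 = 30%).

    `changes` are proportional changes from baseline; negative = improvement.
    """
    return sum(c <= -threshold for c in changes) / len(changes)


def risk_difference(treatment, control, threshold):
    """Treatment minus control responder proportion (positive favours treatment)."""
    return (responder_proportion(treatment, threshold)
            - responder_proportion(control, threshold))


def number_needed_to_treat(rd):
    """Label 1/|RD| as NNTB (benefit) or NNTH (harm); no finite NNT if RD is 0."""
    if rd == 0:
        return "no difference (NNT is infinite)"
    label = "NNTB" if rd > 0 else "NNTH"
    return f"{label} {1 / abs(rd):.1f}"


# Hypothetical change scores for two arms of 10 participants each.
treatment = [-0.9, -0.6, -0.55, -0.4, -0.35, -0.2, -0.1, 0.0, 0.0, 0.1]
control = [-0.8, -0.45, -0.35, -0.3, -0.2, -0.1, -0.1, 0.0, 0.05, 0.2]

for threshold in (0.30, 0.50):
    rd = risk_difference(treatment, control, threshold)
    print(f"{threshold:.0%} threshold: RD = {rd:+.2f}, {number_needed_to_treat(rd)}")
```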
Data were available from 10 RCTs published between 2007 and 2015 that included a total of 2,926 participants and were conducted by the George Institute for Global Health and the University of Sydney. In acute pain conditions we included data from 3 RCTs. One trial recruited participants with subacute low back pain, with pain of 6 to 12 weeks duration; we included this study in the chronic category but conducted a sensitivity analysis to see whether its exclusion from this group affected our results.
Interventions included exercise interventions (Tai Chi, motor control exercise, McKenzie spinal exercises, multimodal exercise, graded activity), manual therapy (spinal manipulative therapy), and medication (paracetamol 665 mg modified-release regular dose plus placebo tablets “as needed,” or 665 mg modified-release regular dose and 500 mg paracetamol immediate-release tablets “as needed”; diclofenac 50 mg twice daily). Placebo/minimal care control interventions included placebo tablets, a single advice session and booklet, and detuned electrotherapy. Two trials included participants with whiplash-associated disorder.
Table 1 outlines the included trials and interventions contributing to the analysis and Table 2 summarizes the characteristics of participants in the included trials.
Table 1. Included Trials According to Intervention Group
Participant outcomes were not normally distributed for any of the groups (Shapiro-Wilk test for all groups P < .05; Table 3). No evidence of bimodal distribution was observed for any of the groups (Fig 2). Distributions were generally unimodal with a strong positive skew, reflecting that most participants in these trials experienced complete or substantial improvement in their pain, whichever group they were assigned to, including placebo and minimal care groups. Participants receiving exercise interventions showed a broader and more variable distribution.
Table 3. Results of Tests of Normality According to Intervention Group (Shapiro-Wilk Test)
Figure 2. Distribution of participant outcomes in acute pain conditions; −1.0 = 100% pain relief. Please see this figure in color at http://www.jpain.org/.
Participant outcomes were not normally distributed for any of the groups (Shapiro-Wilk test for all groups P < .05; Table 3). No clear evidence of bimodal distribution was observed. The placebo/minimal care and exercise group distributions were unimodal with a positive skew (Fig 3), with the peak of the distribution reflecting that most participants experienced little change from baseline. The manual therapy distribution was broader and more irregular but without strong evidence to suggest bimodality. Reanalyzing with data from the study of Pengel et al
Figure 3. Distribution of participant outcomes in chronic pain conditions; −1.0 = 100% pain relief. Please see this figure in color at http://www.jpain.org/.
Reanalysis of all groups using an available case approach instead of imputing missing data using baseline observation carried forward did not reveal evidence of bimodal distribution for any of the intervention classes.
Secondary Analysis
We calculated the risk difference and NNTB for comparisons that showed small (<1 point change on a VAS) between group mean differences. These results are shown in Tables 4 and 5. In our sample, no comparisons with between group differences <1 point showed statistical significance at the P < .05 level. Similarly, the NNTB values were frequently high (>10), with 95% confidence intervals (CIs) that crossed the line of no effect for all comparisons.
Table 4. Results of Responder Analyses for Effect Sizes <1 Point on a 0 to 10 Pain Scale (Acute Pain)

| Comparison | Mean difference (95% CI) | Risk difference (95% CI) | NNT (95% CI) | Risk difference (95% CI) | NNT (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Placebo versus Paracetamol (regular dose + as needed) | .17 (−.21 to .46) | .02 (−.04 to .08) | NNTB 41 (NNTH 27 to NNTB 12) | .05 (−.02 to .12) | NNTB 21 (NNTH 43 to NNTB 8) |
| Placebo versus Paracetamol (regular dose) | −.07 (−.40 to .27) | −.007 (−.05 to .06) | NNTB 128 (NNTH 19 to NNTB 15) | .01 (−.06 to .08) | NNTB 91 (NNTH 18 to NNTB 13) |
| Paracetamol (regular dose + as needed) versus Paracetamol (regular dose) | −.19 (−.53 to .15) | −.016 (−.08 to .04) | NNTH 61 (NNTH 13 to NNTB 22) | −.04 (−.11 to .03) | NNTH 28 (NNTH 9 to NNTB 29) |
Abbreviations: NNTH, number needed to treat to harm; NNT, number needed to treat; NSAID, nonsteroidal anti-inflammatory drug; SMT, spinal manipulative therapy; GP, general practitioner management.
∗ Negative values indicate the comparator intervention was more beneficial.
† Because lower values of NNT represent larger effect sizes, no effect represents infinity; see Altman.
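The footnote on NNT confidence intervals can be illustrated with a short sketch of the reciprocal approach described by Altman. The function names are ours. The input is the rounded risk difference from one row of Table 4, so the output will not exactly reproduce the tabulated NNT values, which were presumably computed from unrounded estimates.

```python
def nnt_label(rd):
    """Express a risk difference as NNTB (rd > 0) or NNTH (rd < 0)."""
    if rd == 0:
        return "infinity"
    return f"{'NNTB' if rd > 0 else 'NNTH'} {1 / abs(rd):.1f}"


def nnt_with_ci(rd, lower, upper):
    """NNT and its confidence interval from a risk difference and its CI.

    The limits are the reciprocals of the risk difference limits. When the
    interval for the risk difference spans 0, the NNT interval passes
    through infinity, so it runs from an NNTH, via infinity, to an NNTB.
    """
    point = nnt_label(rd)
    if lower < 0 < upper:
        return f"{point} ({nnt_label(lower)} to infinity to {nnt_label(upper)})"
    return f"{point} ({nnt_label(lower)} to {nnt_label(upper)})"


# Rounded risk difference and CI from one row of Table 4: .05 (-.02 to .12).
print(nnt_with_ci(0.05, -0.02, 0.12))
```

An interval written as “NNTH 43 to NNTB 8” therefore includes infinity, which corresponds to no effect, rather than excluding it.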
In this sample of RCTs of interventions for spinal pain we found no compelling evidence suggesting that outcomes are bimodally distributed for the interventions studied. The distribution of outcomes was different between chronic and acute conditions, with acute conditions predictably showing a positively skewed, but essentially unimodal, distribution, indicating that most participants experienced substantial pain relief. This pattern was the same regardless of the receipt of any particular treatment, or control intervention. For chronic conditions outcome distributions did not meet the threshold for normality as per the Shapiro-Wilk test, but were more evenly spread from mild worsening, to little or no change, to mild or moderate improvement.
Recommendations to prioritize or at least coreport responder analyses over the mean between group difference have been made with particular regard to the evaluation of drug therapies for pain.
Specific examples of bimodal distribution of outcomes have been presented for: nonsteroidal anti-inflammatory drugs (etoricoxib and naproxen) in acute postoperative pain, osteoarthritis pain, chronic low back pain, and ankylosing spondylitis
In the data presented in this report we found no evidence for bimodality of outcomes when we pooled data from trials of paracetamol and diclofenac (n = 1,114) for acute low back pain. In the primary analyses of both of those trials no significant between group difference was observed between the drug and placebo groups, suggesting a lack of efficacy, which may explain the lack of bimodal distribution. There was also no evidence of bimodality when we investigated each drug individually or for nonpharmacological interventions. These findings suggest that bimodal distribution of outcomes for pain interventions cannot necessarily be assumed.
In our secondary analysis we sought to see if small (<1 point on a 0–10 VAS) between group differences in pain intensity corresponded to a potentially favorable NNTB. The proposed value of responder analysis may lie with interventions for which a between group difference compared with the comparison condition is present (for example as indicated by a threshold of statistical significance) but where that difference is small. The absence of statistical significance in the available comparisons in our analysis is reflected in CIs for the risk difference and number needed to treat that all cross the line of no effect. In these data, responder analysis would not meaningfully alter our interpretation of the data compared with the average between-group difference. The lack of bimodality of outcomes in our data may be interpreted as reflecting comparisons of treatments that lack even marginal effectiveness. The 2 comparisons in our data set that showed statistically significant between group differences (Tai Chi vs waiting list control for chronic low back pain, mean difference 1.2 [95% CI, .6–1.8], and advice and exercise versus placebo) had mean between group differences of >1 point on a pain scale, which exceeds the OMERACT 12 group's recent recommended threshold for a minimal clinically important effect. For those comparisons, outcomes in the intervention group did not show a bimodal distribution. Post hoc analysis of these comparisons reveals a relatively favorable NNTB for a 50% reduction in pain of 4 (95% CI, 2–12) for Tai Chi versus usual care and 4 (95% CI, 2–26) for advice and exercise versus placebo. In these examples a ‘responder’ analysis, although offering an alternative way to present the data, does not change our interpretation of the results compared with a traditional means analysis. Because of the absence of bimodal distribution in these data, one interpretation is that the between group difference is representative of the average true treatment effect but was adequate to tip a larger number of participants over the threshold from “non-responder” to “responder” in the treatment group. In this way generally modest treatment effects may give the false impression of large treatment “responses.”
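The tipping effect described above can be illustrated with two unimodal distributions that differ only by a uniform shift. This is a sketch under stated assumptions: improvement is modelled as normally distributed, and the means and standard deviation are invented, not estimated from the trials.

```python
from statistics import NormalDist

THRESHOLD = 50.0  # responder = at least 50% improvement from baseline

# Hypothetical, unimodal improvement distributions (percent improvement).
# Means and SD are invented for illustration only.
control = NormalDist(mu=30.0, sigma=30.0)
treatment = NormalDist(mu=45.0, sigma=30.0)  # same shape, shifted by 15 points

p_control = 1 - control.cdf(THRESHOLD)
p_treatment = 1 - treatment.cdf(THRESHOLD)
risk_difference = p_treatment - p_control

print(f"Responders, control:   {p_control:.1%}")
print(f"Responders, treatment: {p_treatment:.1%}")
print(f"Risk difference: {risk_difference:.3f}, NNTB about {1 / risk_difference:.1f}")
```

In this model every participant's outcome is shifted by the same modest amount and there is no distinct subgroup of responders, yet the responder proportions differ enough to produce an NNTB of roughly 5 to 6.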
It is clear that the mean between group difference is a somewhat blunt tool with limitations. It tells us little about the individual experience of participants in clinical trials or the factors that drive it. In certain circumstances (ie, in which strong evidence of bimodality is present), responder analysis may be useful, although it is crucial that the thresholds used to establish response have validity and are established a priori. Despite the problems outlined previously in establishing clinical “response,” it is arguable that responder analyses offer a simple and understandable estimate of what is likely to happen to a patient given a treatment in a clinical setting and are therefore useful to inform clinician and patient decision-making. However, it remains important to recognize the limitations of responder analysis as an estimate of efficacy. When outcomes are not bimodally distributed it is arguable whether responder analysis performs that task any better.
As well as aiming to offer a more meaningful interpretation of bimodally distributed outcomes, responder approaches have been recommended for identifying potentially important characteristics of responders that may act as effect modifiers.
However, as discussed previously, because “response” includes the influences of natural history, regression to the mean, and the nonspecific effects of interventions, such approaches conflate outcome with treatment response and risk identifying general prognostic indicators as opposed to actual treatment effect modifiers. There are alternative methods for identifying patients who are more likely to respond to a particular treatment that minimize that risk. These include exploring treatment effect modifiers using interaction analyses or exploring proposed mechanisms of action of different treatments using mediation analyses. These studies can also inform clinical decision-making, although they still require judgement regarding the size and likely clinical importance of any observed effects.
Other approaches to interpreting patient-reported outcomes are emerging. For example, cumulative distribution functions have been proposed, in which the full distribution of participants’ outcomes in each group is plotted. This approach visualizes the difference between groups in the proportion of participants with a specific outcome across the full range of possible outcomes allowing for the difference between groups to be assessed at any threshold for defining “response.”
Such an approach mitigates the risks associated with relying on a single threshold, although potentially allows increased freedom for selective interpretation.
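A cumulative proportion of responders curve can be sketched as below. The change scores are invented and the function name is hypothetical. For each candidate threshold, the function returns the proportion of an arm improving by at least that much, so the arms can be compared at any cut point rather than at a single one.

```python
def cumulative_responders(changes, thresholds):
    """For each threshold, the proportion improving by at least that much.

    `changes` are proportional changes from baseline (negative = improvement).
    """
    n = len(changes)
    return {t: sum(c <= -t for c in changes) / n for t in thresholds}


# Hypothetical change scores for two arms of 8 participants each.
treatment = [-1.0, -0.8, -0.6, -0.5, -0.3, -0.2, -0.1, 0.0]
control = [-0.9, -0.5, -0.3, -0.2, -0.1, 0.0, 0.0, 0.1]

thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
curve_t = cumulative_responders(treatment, thresholds)
curve_c = cumulative_responders(control, thresholds)

for t in thresholds:
    print(f">= {t:.0%} relief: treatment {curve_t[t]:.3f}, control {curve_c[t]:.3f}")
```

Plotting the two curves against the threshold gives the graph described above. The gap between the curves at a given threshold is the risk difference for that definition of response.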
An important limitation of our study is that it is restricted to a convenience sample of data from a relatively small number of trials of interventions for spinal pain. We may have lacked adequate power to reliably detect bimodality. Further analysis with data from a larger set of trials for a broader range of interventions and painful conditions would be useful. In addition, we have focused on pain as a sole outcome and it is possible that other important outcomes, for example, disability or quality of life, may have shown different results.
Conclusions
In a convenience sample of clinical trials of a variety of conservative interventions for spinal pain we did not find evidence that clinical outcomes were bimodally distributed. These findings suggest that bimodal distribution of outcomes should not be assumed in interventions for spinal pain and do not support the automatic prioritization of responder analysis over the between group difference in the evaluation of treatment effectiveness for pain. In light of this, a pragmatic approach for trials might be to prospectively plan for either analysis approach, contingent on the observed distribution of outcomes. We note, however, that this approach is likely to have practical implications for study design and conduct, because the required sample size for a responder analysis may be much larger than that for a between group difference. For systematic reviews of trials of interventions for pain, when information regarding the distribution of outcomes is likely to be missing in most cases and when individual patient data are often not accessible, the decision is likely to be driven by what is reported in the included studies. There is a case for including both types of analysis.
References
Altman D.G. Confidence intervals for the number needed to treat.
Use of the cumulative proportion of responders analysis graph to present pain data over a range of cut-off points: Making clinical trial data more understandable.
Estimating the number needed to treat from continuous outcomes in randomised controlled trials: Methodological challenges and worked example using data from the UK Back Pain Exercise and Manipulation (BEAM) trial.
Assessment of diclofenac or spinal manipulative therapy, or both, in addition to recommended first-line treatment for acute low back pain: A randomised controlled trial.
Why caution is recommended with post-hoc individual patient matching for estimation of treatment effect in parallel-group randomized controlled trials: The case of acute stroke trials.
ACTINPAIN Writing Group of the IASP Special Interest Group on Systematic Reviews in Pain Relief, Cochrane Pain, Palliative and Supportive Care Systematic Review Group Editors. “Evidence” in chronic pain - establishing best practice in the reporting of systematic reviews.
S.J.K.'s research is supported by the National Health and Medical Research Council of Australia. M.L.S.'s research is supported by the Chiropractic and Osteopathic College of Australasia Research Limited.
The authors have no conflicts of interest to declare.