Faculty of Dentistry, University of Toronto, Toronto, Ontario, Canada; University of Toronto Centre for the Study of Pain, University of Toronto, Toronto, Ontario, Canada; Department of Dentistry, Mount Sinai Hospital, Toronto, Ontario, Canada
Human pain neuroimaging has exploded in the past 2 decades. During this time, the broader neuroimaging community has continued to investigate and refine methods. Another key to progress is exchange with clinicians and pain scientists working with other model systems and approaches. These collaborative efforts require that non-imagers be able to evaluate and assess the evidence provided in these reports. Likewise, new trainees must design rigorous and reliable pain imaging experiments. In this article we provide a guideline for designing, reading, evaluating, analyzing, and reporting results of a pain neuroimaging experiment, with a focus on functional and structural magnetic resonance imaging. We focus in particular on considerations that are unique to neuroimaging studies of pain in humans, including study design and analysis, inferences that can be drawn from these studies, and the strengths and limitations of the approach.
This article provides an overview of the concepts and considerations of structural and functional magnetic resonance neuroimaging studies. The primer is written for those who are not familiar with brain imaging. We review key concepts related to recruitment and study sample, experimental design, data analysis and data interpretation.
Understanding how pain is encoded in the brain has been a fundamental challenge for pain researchers. Despite the universality of acute pain and the high prevalence of chronic pain, we have yet to precisely characterize the mechanisms of pain perception and modulation in health and disease. The complexity of identifying these mechanisms stems from the multidimensional nature of pain—pain is a complex amalgam of sensory, affective, cognitive, and motor responses to dynamic internal and external states. The challenge of characterizing mechanisms increases as pain becomes chronic, with widespread plasticity of nociceptive and modulatory pathways contributing to the ongoing experience of pain. The rise of neuroimaging techniques offers the potential for breakthroughs in these efforts. Neuroimaging approaches (including functional magnetic resonance imaging [fMRI], positron emission tomography, electroencephalography, and other approaches) are extremely powerful tools that offer unique, noninvasive, in vivo views of central processes. Indeed, functional and structural neuroimaging studies have identified neural responses and features of acute,
Because of the promise of these new techniques, pain imaging is rapidly growing and will continue to expand as scanning facilities become more available and analysis software becomes increasingly user-friendly. Although brain imaging findings can provide important insight into central mechanisms, there are many aspects of study design and analysis that must be carefully considered and planned a priori to obtain a robust, reproducible result. Indeed, a recent systematic review of fMRI data showed that a data set could be processed through almost 7,000 unique pipelines, with almost 35,000 resulting maps.
This highlights how important it is for non-imagers and new trainees who read neuroimaging reports to be familiar with some of these considerations and how they may affect outcomes and inferences. In this article, we provide an overview of these considerations and of key questions that should be asked when reading an imaging report. We will not weigh in on the many exciting theoretical debates in the pain neuroimaging community, such as the specificity of pain-evoked cortical and subcortical responses or the feasibility of brain-based biomarkers for pain (see, for example, Davis et al). We instead provide a primer and guidelines to assist trainees and nonexperts in designing and reporting pain neuroimaging experiments as well as reading and evaluating articles.
Because of the brain's key role in generating pain percepts, the ability to noninvasively examine brain function in vivo is critical. We focus specifically on functional and structural magnetic resonance imaging (MRI), because MRI is the most common tool used in human pain studies. Other promising human neuroimaging methods such as electroencephalography, magnetoencephalography, functional near-infrared spectroscopy, transcranial magnetic stimulation, positron emission tomography, and arterial spin labeling are beyond the scope of this review.
A cursory PubMed search with the search terms “(pain or nocicept*) AND brain AND MRI NOT review” on January 20, 2017 resulted in 4,895 studies (see Fig 1). A number of publications address methodological issues and statistical considerations associated with human neuroimaging in general, and we encourage neuroimagers to consult these reviews for additional guidelines, and more in-depth discussion of some of the technical considerations we delineate herein. Our aim is to address how these issues may specifically affect pain research. Our goal is to provide an introduction of particular use to a novice audience, including new trainees, clinicians, and/or non-imagers interested in evaluating studies on the neuroimaging of pain. We believe that awareness of methodological and inferential limitations can lead to positive advances. We focus on the elements that should be included in the Methods and Results sections of any report, and address inferences that may be drawn during the Discussion.
Recruitment and Sample
After deciding on a research question, the first aspect any researcher considers is his or her research population and sample. It is important to acknowledge that neuroimaging experiments impose unique constraints on enrollment, and many of these constraints might put a particular burden on neuroimaging studies of patient samples.
MRI scanners are large magnets with a narrow bore in which the participant lies during scanning. The participant's brain is scanned with specialized coils embedded in a small head cage, or head coil. This provides several constraints that are rarely acknowledged but can substantially affect a study. Subjects must fit in the narrow bore, and therefore MRI studies are likely to exclude obese patients, despite the fact that obese individuals are more likely to report severe pain than normal and underweight counterparts.
Because the MRI scanner uses a strong static magnetic field as well as radio waves, participants cannot have any ferromagnetic metal in their body: there is a risk of the main magnetic field pulling on the metal, especially as the subject enters the bore of the scanner. Radio waves can also cause heating in tissues, and this can be exacerbated by any electrically conducting materials, including cables and wires. Furthermore, metal in the head and neck region can cause image quality issues, because metal reflects the radio waves. These factors provide severe constraints on researchers interested in studying topics such as phantom pain in amputees or pain in individuals after certain surgical procedures, because these patients may have metallic (and possibly ferromagnetic) objects in their body that are not safe in a research MRI scanner. Studies of pain in other populations, such as the elderly, or patients with cardiac pain, may also be limited because of other types of incompatible implants or devices, including pacemakers or medication pumps. Finally, some patients may experience claustrophobia because of the narrow bore or confining coil surrounding their head, and so patients with comorbid anxiety disorders may be less likely to complete these experiments. These constraints mean that such studies might routinely exclude the most severe cases, which must be considered when drawing general inferences about pain populations. Studies must always measure and report complete characteristics of the sample as well as inclusion and exclusion criteria.
Because of the constraints on the participants that can be recruited for an imaging study, it is often tempting to recruit a convenience sample—usually healthy young adults from the university environment. However, such a sample is not representative of those disproportionately affected by chronic pain: women between the ages of 30 and 50 of lower socioeconomic position.
Nonetheless, many neuroimaging experiments are not designed to isolate mechanisms of chronic pain, and may instead be interested in more general neural mechanisms, such as those involved in the psychological modulation of pain, or basic processes of nociception and acute pain. Convenience samples may be appropriate in these cases, because they allow investigators to isolate basic mechanisms underlying pain perception in healthy individuals. However, even studies of acute pain in healthy volunteers should carefully consider generalizability to the larger population, because convenience samples may lack diversity in age, educational background, socioeconomic status, race and ethnicity, depending on the study, and each of these factors has been shown to influence pain.
All experiments should report full sample characteristics so that readers and reviewers can evaluate generalizability to the population.
Many experiments compare patients with pain disorders with matched control participants. If controls are not selected to match non-pain-related characteristics, there is a strong chance that differential findings between groups will reflect processes other than the pain disorder of interest (eg, group differences because of higher incidence of mood disorders, obesity, and comorbid chronic pains).
It is strongly recommended that such matching be done by carefully selecting the control population, rather than attempting to find “clean” pain patients (eg, those free of comorbid psychopathology). Although capturing processes related to comorbid disorders complicates interpretation, and can lead to confounds, individuals without such comorbidities may not be representative of most pain patients. Potential confounds and comorbidities should be carefully considered, and if they are unavoidable, accounted for with appropriate experimental designs and statistics. Researchers must also carefully consider differential artifacts related but not germane to clinical presentation. For example, recent work indicates that resting state fMRI (rs-fMRI) studies are highly susceptible to even small motion artifacts, such that a group difference could emerge if there was a systematic difference between the groups in movement (as discussed in the section, Preprocessing). Factors such as discomfort from lying in the scanner, comorbid movement disorders, or psychiatric disorders such as anxiety might be more likely in pain populations and lead to greater motion in patient groups, resulting in between-group effects related to movement rather than the measure of interest.
Thus, head motion and potential group differences in motion must be carefully considered and quantified in rs-fMRI as well as task-based fMRI studies, and data cleaning strategies should be carefully used to mitigate the contributions of such factors.
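One way to quantify head motion, so that groups can be compared on it, is a per-volume framewise displacement summary. The sketch below is illustrative only: it assumes that motion correction has already produced six realignment parameters per volume (three translations in mm, three rotations in radians), and it converts rotations to millimeters on a sphere of assumed 50 mm radius, a common convention. The function names and example values are hypothetical.

```python
def framewise_displacement(motion, head_radius_mm=50.0):
    """Per-volume framewise displacement (mm).

    motion: list of 6-tuples (tx, ty, tz in mm; rx, ry, rz in radians),
    one tuple per volume. The first volume is assigned 0 by convention.
    """
    fd = [0.0]
    for prev, curr in zip(motion, motion[1:]):
        # Sum of absolute volume-to-volume changes in translation.
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        # Rotations are converted to arc length on the assumed sphere.
        rot = sum(abs(c - p) * head_radius_mm
                  for p, c in zip(prev[3:], curr[3:]))
        fd.append(trans + rot)
    return fd


def mean_fd(motion):
    """Mean framewise displacement across a run (mm)."""
    fd = framewise_displacement(motion)
    return sum(fd) / len(fd)


if __name__ == "__main__":
    # Hypothetical realignment parameters for a 3-volume run.
    run = [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
           (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),
           (0.1, 0.0, 0.0, 0.002, 0.0, 0.0)]
    print(framewise_displacement(run))
    print(mean_fd(run))
```

Per-participant means from such a summary can then be compared between patients and controls, or entered as a covariate, before group differences in brain measures are interpreted.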
All studies must ensure adequate sample size and statistical power to reliably detect meaningful effects, and neuroimaging experiments are not unique in this regard. Neuroimaging studies are expensive to run, generally requiring hundreds or even thousands of dollars in scanning fees for each subject. It is therefore common for sample sizes to be smaller in neuroimaging studies than other types of experiments. Although small samples are common because of the financial burdens and additional constraints on neuroimaging studies, this has led to the publication of many studies that are likely to be underpowered for effects that might be of greatest clinical interest, such as individual differences in pain populations. Underpowered studies are less likely to detect true effects and more likely to find false positive results by overfitting data, thereby decreasing the likelihood that findings will replicate.
Studies of chronic pain populations might be particularly susceptible to such effects because of the breadth of diagnostic categories (meaning that patients with similar diagnoses but divergent symptom profiles might be included in the same small sample; eg, fibromyalgia, which is commonly a diagnosis of exclusion, where symptom profiles can vary greatly) and heterogeneity in terms of comorbid disorders and medication use.
Because of the potential for false positive results in underpowered imaging studies, studies with small samples are coming under increased scrutiny. The onus will increasingly be placed on researchers to show that samples are adequately powered to detect expected effects. Such demonstrations must be on the basis of justifiable, a priori calculations, so researchers should always estimate power before conducting an fMRI study, and report how the sample size was determined. Statisticians have recently introduced several new, easily accessible fMRI-specific power analyses that make it easier to consider desired effect and sample sizes (eg, fmriPower.org) and will help to justify the funding of fully powered studies. These approaches can be used a priori to estimate sample size on the basis of expected effect size. Importantly, a number of these approaches require researchers to have preexisting imaging data, which is not always possible for new researchers or those using new paradigms. Power calculation on the basis of behavioral effect sizes may also be useful because these do not rely on previous imaging studies. These can be computed using data from pilot studies completed outside of the fMRI environment, which would substantially reduce costs.
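As a rough illustration of an a priori calculation from a behavioral effect size, the sketch below estimates the per-group sample size for a two-group comparison. It is not one of the fMRI-specific tools mentioned above. It assumes a two-sided, two-sample comparison of means, a standardized effect size (Cohen's d), and the normal approximation, which slightly underestimates the sample size an exact t-test calculation would give. The function name and default values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d) ** 2,
    rounded up to the next whole participant.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    print(n_per_group(0.8))  # large effect  -> 25
    print(n_per_group(0.5))  # medium effect -> 63
    print(n_per_group(0.2))  # small effect  -> 393
```

The steep growth in required sample size as the expected effect shrinks is the practical reason small neuroimaging samples are poorly suited to questions about individual differences.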
The problem of underpowered studies can also be addressed by aggregating data via repositories and/or meta-analyses. Repositories facilitate data-sharing across imaging experiments. Some repositories host data from many different tasks, scanners, and populations. These repositories facilitate reproducibility, open science, and large-scale analyses that are relatively impervious to noise created by inconsistencies across individual experiments. Other repositories require contributors to collect brain images with standardized scanning protocols. These scans then undergo standardized preprocessing and analysis pipelines, allowing several groups with limited resources to pool brain imaging data and investigate larger cohorts of patients. Several pain imaging repositories (OpenPain [http://openpain.org]; Pain and Interoceptive Imaging Network [https://www.painrepository.org]) have been established, and have proven to be successful. However, there are legal and ethical considerations that may affect a group's ability to contribute to such repositories.
Finally, meta-analyses statistically test the distribution of findings across studies, which permits assessments of the consistency of effects and overcomes some of the limitations associated with small individual studies. These studies allow for a principled approach to determining the relationship between a particular brain region and behavior. Several pain imaging meta-analyses have been published, including meta-analyses of pain-evoked cortical responses,
Consider the Context: Constraints of the MRI Environment
Neuroimaging experiments (especially MRI) take place in a unique environment, so studies must be carefully designed and made suitable for the imaging suite. As mentioned previously, this restrains the patient populations that can be studied. It also places substantial constraints on the types of experiments pain imagers can conduct. All equipment must be MRI-compatible and suited to the unique scanner environment. There are few commercially available MRI-compatible devices capable of delivering nociceptive stimuli, because these must be completely nonferromagnetic and must not introduce electrical noise during data collection. Even commercially available devices may have differential success at different scanners depending on considerations such as MRI field strength, sequence design, and even bore size. Thus, many pain researchers have examined brain responses associated with acute thermal and electrical pain, but few have examined cold pain, cold allodynia, or chemical pain. During the experiment, painful stimuli are usually controlled by simple computer tasks that coordinate stimulus presentation and synchronize timing with the MRI scan (see the section, Materials and Procedures). Experimenters often use these programs to present visual stimuli (eg, task instructions, cues) and to measure responses (eg, pain ratings). Visual stimuli are usually presented on a computer screen that patients view through goggles or via a mirror that sits atop the head coil and reflects images displayed on a projector screen. Participants provide pain ratings and make other responses using devices such as button boxes, joysticks, mouse-like trackballs, or by moving their hands
because verbal responses are generally avoided: the scanner produces loud noises, and head motion must be minimized. Thus, fMRI studies of pain may not capture the social and interpersonal elements that are likely to contribute to verbal pain ratings when a patient informs her doctor about her pain. This can be seen as a strength (eg, for researchers who investigate ascending nociceptive pathways and want to minimize social modulatory influences) or as a weakness (eg, for researchers interested in these social and interpersonal elements, because few computer paradigms are able to capture these processes).
Head motion provides a third unique constraint afforded by the MRI scanner context. Participants cannot move their heads more than a few millimeters (typically <2 mm) during scanning without risking contamination of the data (see the section, Analysis). Some individuals, such as those with lower back pain, may not be able to lie still without pain, which provides additional constraints on patient samples and eligibility. In addition, painful stimuli that induce strong withdrawal responses such as startle and electric shock cause unique challenges because of task-related motion if not carefully mitigated (see the section, Preprocessing, for discussion of motion correction). Many researchers choose to familiarize subjects with the stimuli to minimize startle and other withdrawal-related behaviors, as well as to ensure stimuli are tolerable for the participants. This of course imposes limitations on the intensity of pain or novelty of the stimuli that can actually be administered in the scanner. Furthermore, it is unclear to what degree neural responses might reflect regulation of the prepotent withdrawal response even when the participant is able to voluntarily suppress motion. All of these considerations must be weighed in terms of the construct validity of painful stimuli presented within the scanning environment.
Another key consideration is the scanning parameters and sequences. Most sites will have “out of the box” sequences, but these should be selected with care so that the chosen sequence maximizes signal from regions of interest. Although the technical aspects of these decisions are outside the scope of this review, it should be recognized that these considerations must be tailored to experimental hypotheses. We therefore recommend that researchers work closely with magnetic resonance physicists and experts to ensure that the selected sequences are ideally suited to the temporal resolution of their effect of interest and the particular anatomical region(s) of interest. For example, some brain stem structures might require multiple acquisitions of different contrasts for precise anatomical localization. Additionally, protocols might be adjusted for experimental efficiency. For example, a standard diffusion-weighted study to investigate white matter in the brain might acquire 64 diffusion-encoding directions with a b-value of 1,000 s/mm². However, this acquisition is more than might be needed for a standard fractional anisotropy (FA) map, which requires a minimum of 6 diffusion-encoding directions, resulting in an unnecessary cost and participant burden. These study-specific considerations might also be balanced against the homogeneity requirements of data-sharing repositories, although collaborative and multisite studies also exist (eg, the Multi-Disciplinary Approach to the Study of Chronic Pelvic Pain).
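The minimum of 6 directions follows from the diffusion tensor being a symmetric 3 × 3 matrix with 6 unique elements. Once the tensor has been fitted, FA is a simple function of its three eigenvalues. The sketch below illustrates that final step only, assuming the eigenvalues have already been estimated; the function name and example values are hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three diffusion tensor eigenvalues.

    FA = sqrt(1/2) * sqrt((l1-l2)^2 + (l2-l3)^2 + (l3-l1)^2)
                   / sqrt(l1^2 + l2^2 + l3^2)
    Ranges from 0 (isotropic diffusion) to 1 (diffusion along one axis).
    """
    denom = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if denom == 0.0:
        return 0.0  # no diffusion signal: define FA as 0
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    return math.sqrt(0.5) * num / denom


if __name__ == "__main__":
    # Hypothetical eigenvalues in units of 10^-3 mm^2/s.
    print(fractional_anisotropy(1.0, 1.0, 1.0))  # isotropic voxel
    print(fractional_anisotropy(1.7, 0.3, 0.2))  # strongly directional voxel
```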
Authors should outline everything that participants did from the start of the study until they finished. A study begins when a participant provides informed consent, as approved by a local institutional review board, which evaluates the ethics of the study. Informed consent and ethics approval should be acknowledged in any study, and authors should include important details, for example whether authorized deception was used in any studies that include misleading information (eg, studies of placebo analgesia), or whether patients were asked to refrain from taking their prescribed medication. In some cases, understanding the relative rate of participation is important and studies should report the number of people contacted for recruitment. Were subjects debriefed at the end of the study? For studies of individual differences and/or clinical severity, which questionnaires were administered (not just the ones relevant to the current report)? Similarly, were any tasks or relevant procedures administered outside of the main experiment that are not analyzed in the report? This is important information because additional tasks and measures might influence behavior in the main paradigm, and reviewers and readers should be able to evaluate potential confounds.
The description of task design should include all details necessary for the purpose of external replication, including the instructions, counterbalancing schemes, and task timing. What platform was used for experimental programming, and how did subjects provide responses? If decisions about task design were made on the basis of earlier pilot testing, it can be useful to report these details (eg, “We collected pain ratings 20 seconds after heat offset. Pilot testing revealed that there was no difference between ratings made immediately after offset vs after a delay. We chose the current design so as to reduce contamination of the blood-oxygen-level dependent (BOLD) response to heat”). It is also important to report the exact instructions given, as well as a description of the scales used to assess painful percepts and other features of the stimulus. What was the resolution of the scale? What were its anchors?
Some pain studies use nociceptive stimuli to elicit pain in healthy subjects and/or in chronic pain patients. Researchers can choose between several nociceptive stimulation paradigms. Thermal pain is the best established and most common stimulus used in neuroimaging.
This is likely because the paradigm works well in the scanner and is relatively convenient. However, it is a poor model of chronic pain, because heat pain is qualitatively dissimilar to pain experienced in most chronic pain conditions. Therefore, the researcher must ensure that inferences are appropriate. It should not be taken for granted, for instance, that acute heat pain is a valid model for chronic pain that is neuropathic or musculoskeletal in nature. Other acute pain models such as ischemia, muscular hypertonic saline injections, bladder filling, and visceral/rectal distention might be more appropriate for questions about the neural bases of musculoskeletal or visceral chronic pain disorders.
However, not all pain experiments seek to model chronic pain; some focus on understanding neural processes associated with acute pain and its modulation. In this case, different considerations guide the evaluation of validity. Are experimental manipulations appropriate with regard to the psychological construct the researchers are testing? Is the chosen noxious stimulus appropriate for the questions of interest (eg, does thermal vs electrical vs laser pain activate different ascending fibers)?
Questionnaires are also often an important component of a study. These can serve as screening tools against exclusion criteria (eg, handedness, other neurological disorders, and MRI contraindications, such as claustrophobia), to characterize a chronic pain population (eg, the McGill Pain Questionnaire), or as a measure or covariate of interest. When used judiciously, such measures can help strengthen inferences about the mental processes that particular patterns of activation might be subserving. However, each added questionnaire inflates the number of statistical tests and increases the chance of false positive results if not carefully taken into account through multiple comparisons correction. As discussed in the section, Group-Level Analyses, the issue of type I error is particularly germane to neuroimaging studies. Each added questionnaire also increases patient burden in terms of time spent on the study. Thus, the type and number of questionnaires included should be carefully considered on the basis of clear hypotheses. Questionnaires expected to have large amounts of overlapping variance (eg, the Fear of Pain Questionnaire) can increase the chance of incidental findings with little additional explanatory power added to the study.
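To make the cost of added questionnaires concrete, the sketch below applies the Holm-Bonferroni step-down procedure to a set of p-values, one per questionnaire tested against a brain measure. This is one of several possible corrections, chosen here only for illustration; the function name and p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns a list of booleans (same order as p_values) that is True
    where the null hypothesis is rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-significant test
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four questionnaire-brain correlations.
    p = [0.01, 0.04, 0.03, 0.005]
    print(holm_bonferroni(p))  # -> [True, False, False, True]
```

In this example two results that would pass an uncorrected threshold of .05 no longer do so, and the thresholds become stricter with every questionnaire added to the family of tests.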
There are 3 different approaches that can be used to investigate neuroimaging data: hypothesis-driven, exploratory, and reproduction/replication. A typical neuroimaging study uses these respective approaches to test a hypothesis, develop new hypotheses, and confirm (or reproduce) the findings. Hypotheses should be selected a priori, before any data are collected, on the basis of theory and previous data. When a hypothesis is generated, a pilot study is usually performed on a small sample size to ensure that the experimental paradigm is valid (ie, to ensure that the hypothesis is appropriately operationalized), and to determine the feasibility of the study. Small details may be adjusted at this time (eg, instructions, task timing, etc). Pilot studies are rarely submitted for publication, because they often use liberal statistical thresholds (because of low power). They allow researchers to adjust the paradigm before fMRI scanning, because scan time is costly and one wants to make sure the task is clear so that scanning can proceed uninterrupted. Pilot studies can also be used to estimate effect size, although it is known that small samples can overestimate effect size.
Then, an independent sample (whose size has been determined using a power calculation) consisting entirely of new subjects should be acquired to formally test the hypothesis. Preregistration allows researchers and reviewers to distinguish between a priori, hypothesis-driven analyses and post hoc exploratory analyses, which are intended to generate new hypotheses. Several preregistration sites now exist (eg, Open Science Framework https://osf.io/registries/), and several neuroscience and psychology journals now allow researchers to submit preregistered reports (https://cos.io/rr/). We hope that interest in preregistration will grow within the pain community. We recognize that because of an emphasis on novelty among publication and funding outlets, the choice to follow a path from exploratory to replication studies is generally not in the hands of young or even senior investigators. However, because of the paucity of replication studies in fMRI, and its reliability being called into question,
In this section we provide an overview of the analyses that should appear in a typical imaging report and provide some guidance about how choices might affect pain studies. We focus primarily on the Results section of a typical fMRI-BOLD investigation, because these are the most prevalent type of studies, although we acknowledge structural MRI studies where relevant. Other modalities may have different analysis steps, and specific literature should be consulted. Furthermore, trainees should refer to more comprehensive articles on how to analyze and report neuroimaging studies.
When data are collected, several steps are necessary to transform the data into the proper multidimensional format that can undergo statistical analysis. These transformations are grouped together and generally referred to as preprocessing. A summary of these steps is provided in Fig 3. The fMRI pipeline usually includes steps to: 1) remove the first set of volumes acquired (usually between 5 and 10), because it takes some time for the MR signal to reach a steady state; 2) correct for the fact that fMRI data are collected in slices, and slice acquisition time differs within each time point (“slice time correction”); 3) test whether the head moved at any time point, which can induce artifacts, and correct for any head motion so that analyses capture the same brain region over time (“motion correction”; the interested reader is referred to other sources for in-depth discussion of the problem with motion in MRI, and reviews of approaches to motion correction); 4) remove high- or low-frequency noise that can contaminate the signal or lead to spurious results (“temporal filtering”); 5) register the subject's functional images with structural images (because the images are collected separately in time and head displacement might occur; “coregistration”); and 6) spatially transform the individual's images to a standard template brain (eg, Montreal Neurological Institute template, Talairach-Tournoux atlas, a group mean) with a specific stereotaxic space so that analyses are possible across individuals, who vary in neuroanatomy (“normalization”). The normalization step is also important for spatial specificity when reporting results of an fMRI study, and allows findings to be interpreted by other researchers and be included in subsequent meta-analyses. Several additional optional steps might occur at this stage (eg, correction for artifacts, spatial smoothing, etc), but details of these individual procedures are outside the scope of the current review.
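Two of these steps can be illustrated on a single voxel's time series. The toy sketch below discards the initial volumes (step 1) and removes a linear drift, which is only a crude stand-in for the temporal filtering in step 4. Real pipelines operate on whole 4-D images with dedicated software; the function names, the choice of 5 discarded volumes, and the example data are assumptions made for illustration.

```python
def drop_initial_volumes(series, n_drop=5):
    """Discard the first n_drop time points (pre-steady-state volumes)."""
    return series[n_drop:]


def remove_linear_trend(series):
    """Subtract the least-squares line from a time series.

    Removes the mean and slow linear drift; a simplified stand-in for
    high-pass temporal filtering.
    """
    n = len(series)
    if n == 0:
        return []
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


if __name__ == "__main__":
    # Hypothetical voxel signal: 5 unstable volumes, then slow upward drift.
    raw = [100.0, 90.0, 80.0, 70.0, 60.0] + [10.0 + 0.5 * t for t in range(10)]
    steady = drop_initial_volumes(raw, n_drop=5)
    cleaned = remove_linear_trend(steady)
    print(len(steady), max(abs(v) for v in cleaned))
```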
Preprocessing structural MRI (sMRI) data for gray matter analysis largely comprises: 1) removing scanner-induced noise from the structural T1-weighted brain images, and 2) segmenting tissues on the basis of image contrast, and/or modeling the tissue of interest. Finally, the images are transformed and registered to a stereotaxic coordinate space for analysis. For example, for a gray matter analysis the B1 field is calculated to remove signal inhomogeneities. This allows for better tissue classification, and better estimates of gray matter volume or cortical thickness. For diffusion-weighted scans, which are used for white matter analysis, artifacts such as susceptibility-induced distortions and eddy currents induced by the gradient coils are corrected using various algorithms. Next, a diffusion model (whether the tensor model to calculate fractional anisotropy, or a tractographic model) is calculated and applied to the data.
It is important to note that many of the algorithms used for these specific steps may vary as a function of software package (ie, SPM [http://www.fil.ion.ucl.ac.uk/spm/] vs AFNI [https://afni.nimh.nih.gov/] vs FSL [https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/]), and that individual programs exist to optimize each of these steps. We highly recommend that new researchers/trainees carefully investigate the specifics and defaults of the analysis software. Default assumptions are not always transparent, and may not always suit the needs of a given study. For example, if a researcher is interested in effects in small regions like the periaqueductal gray or the hippocampus, he or she might want to minimally spatially smooth the data (smoothing blurs the boundaries of image units [3-D pixels, or “voxels”]), rather than use the default values within a software package. Large smoothing kernels are suitable for analyses of cortical regions, but not for smaller, more discrete brain regions.
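Smoothing kernels are usually specified by their full width at half maximum (FWHM) in millimeters, whereas the underlying Gaussian is defined by its standard deviation. The sketch below shows the conversion and expresses the kernel width in voxels, which makes it easier to judge how far signal is blurred relative to a small structure. The kernel and voxel sizes used are illustrative values, not the defaults of any particular package, and the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian with the given FWHM.

    sigma = FWHM / (2 * sqrt(2 * ln 2)), roughly FWHM / 2.355.
    """
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_in_voxels(fwhm_mm, voxel_size_mm):
    """Gaussian standard deviation expressed in voxel units."""
    return fwhm_to_sigma(fwhm_mm) / voxel_size_mm


if __name__ == "__main__":
    # Illustrative comparison for an assumed 2 mm isotropic voxel.
    for fwhm in (8.0, 4.0):
        print(fwhm, round(fwhm_to_sigma(fwhm), 3),
              round(sigma_in_voxels(fwhm, 2.0), 3))
```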
As another example, the default settings of SPM (http://www.fil.ion.ucl.ac.uk/spm/) can restrict statistical analyses to regions where the calculated signal is greater than a set threshold. This can particularly affect voxels that are prone to signal loss from susceptibility artifacts in regions with tissue-air boundaries, such as the inferior temporal lobe and the orbitofrontal cortex and other ventral brain areas, which are of interest in many pain studies. These artifacts can be reduced through informed decisions about acquisition parameters.
Neuroimagers benefit from developing expertise across multiple analysis packages and making informed decisions about which approach to use for a given analysis. Imagers should also develop practices that involve visualizing data at each stage of analysis, which can help to identify such issues as substantial dropout, missing data, and poor normalization or coregistration, among other problems. Of course, when researchers elect to apply specific approaches, they should report their decision process and the reason that they opted for a given technique. The following 2 sections describe analysis considerations unique to task-based fMRI and rs-fMRI, respectively.
When fMRI data have been preprocessed, they can be analyzed in relation to the tasks that were administered during the fMRI session. The simplest way to think of the most common fMRI subject-level analysis approach, referred to as the general linear model (GLM), is that we test the correlation between each voxel's activity and the events that occurred during the experiment. To accomplish this, the researcher must carefully track the timing of all the events in scanning sessions relative to the start of scanning, including the stimuli delivered (eg, noxious stimuli, images) and any responses collected. It is also important to record responses required for the analysis (eg, pain ratings). A model representing the onset and duration of the various stimuli over the time course of the scan is constructed (Fig 2). This stimulus time course is then transformed to represent an ideal response to a given stimulus by combining the moments the stimulus was presented with a wave-like function that captures the biological delay in the BOLD response, referred to as a hemodynamic response function, or HRF. There are several options for HRF models, and these require an a priori understanding of how hemodynamics vary on the basis of the population studied, stimulus type, and the brain region of interest (ROI). For example, the HRF in the brainstem, an important region when studying descending modulation of pain, may differ from that in the cortex.
When the events are combined, or “convolved,” with the HRF, this generates an example time course of what the BOLD response would look like if activation within a voxel increased every time this stimulus was delivered (Fig 2). Different software packages model the HRF differently, and a researcher should understand how these differences can affect their results.
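As an illustration of the convolution step, the following minimal Python sketch builds a stimulus time course and convolves it with a double-gamma HRF. The HRF shape parameters, repetition time, onsets, and durations are assumptions chosen for illustration; they are not the defaults of any particular software package.

```python
import math
import numpy as np

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1 / 6):
    """Illustrative double-gamma HRF sampled at times t (seconds)."""
    t = np.asarray(t, dtype=float)
    peak = t ** (peak_shape - 1) * np.exp(-t) / math.gamma(peak_shape)
    undershoot = t ** (under_shape - 1) * np.exp(-t) / math.gamma(under_shape)
    hrf = peak - under_ratio * undershoot
    return hrf / hrf.max()

tr = 2.0                       # assumed repetition time (s)
n_scans = 100                  # assumed number of volumes
onsets_s = [20, 60, 100, 140]  # hypothetical stimulus onsets (s)
duration_s = 10                # hypothetical stimulus duration (s)

# Stimulus time course: 1 while the stimulus is on, 0 otherwise (one value per TR).
scan_times = np.arange(n_scans) * tr
boxcar = np.zeros(n_scans)
for onset in onsets_s:
    boxcar[(scan_times >= onset) & (scan_times < onset + duration_s)] = 1.0

# Convolve with the HRF and trim to the length of the scan.
hrf = double_gamma_hrf(np.arange(0, 32, tr))
regressor = np.convolve(boxcar, hrf)[:n_scans]
```

The resulting `regressor` rises and peaks several seconds after each stimulus onset, which is the delayed, smoothed response that the GLM then compares with each voxel's time course.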
The researcher can simultaneously model several different stimulus types or conditions at this stage, and can test whether the magnitude of the response varies as a function of another variable (eg, whether responses to a noxious stimulus are larger in the trials when the participant rated the stimulus as more painful). This is called a “parametric analysis.”
The different independent variables, or regressors, included in the GLM depend critically on the a priori design of the task, and the comparisons that the researchers planned to make. To enhance the ability to measure task-related activation (ie, activation that correlates with the design-based regressor created through the convolution steps described previously), researchers can include nuisance variables, or regressors of no interest (eg, head motion, intercepts for each block of data acquisition in case mean activation varies when the scanner stops and starts), to capture the noise in the MRI signal. When the design matrix is complete, each voxel's time course is regressed on it in a “mass univariate” approach, that is, separately for each of the hundred thousand or more voxels that make up a standard whole-brain fMRI acquisition. Effectively, this means that an analysis is performed for each voxel in the brain, leading to more than 100,000 statistical tests and thus a high potential for false positive results (see the section, Group-Level Analyses, for discussion on correction for multiple comparisons). This subject-level analysis results in a coefficient (“beta”) for each voxel for each element of the design matrix (all stimulus conditions and all nuisance variables). The coefficient describes the strength of the relationship between that regressor (eg, an experimental condition of interest) and the voxel's time course of activation. This usually involves averaging across stimuli and tasks, although some pain researchers use single trial analyses.
Researchers can also compute contrasts across these regressors at the subject-level (eg, to compare high- and low-intensity stimulation) to generate voxelwise contrast values that describe whether a given voxel's activation strongly differs on the basis of condition. Whole brain summary statistic maps of these voxelwise beta coefficients or contrast values are then passed to group-level analyses to facilitate statistical analyses across groups of subjects.
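The subject-level GLM and contrast computation can be sketched as follows. This is a simplified illustration on simulated data: the regressors are random numbers standing in for HRF-convolved condition regressors, and all variable names and effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans, n_voxels = 120, 5

# Hypothetical design matrix: two condition regressors, one head-motion
# nuisance regressor, and an intercept column.
high = rng.normal(size=n_scans)
low = rng.normal(size=n_scans)
motion = rng.normal(size=n_scans)
X = np.column_stack([high, low, motion, np.ones(n_scans)])

# Simulated data: voxel 0 responds more to "high" than to "low" stimulation.
true_betas = np.zeros((4, n_voxels))
true_betas[0, 0], true_betas[1, 0] = 2.0, 0.5
Y = X @ true_betas + rng.normal(scale=0.5, size=(n_scans, n_voxels))

# Ordinary least squares, fitted to every voxel at once (mass univariate).
betas, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Contrast: high-intensity minus low-intensity stimulation.
contrast = np.array([1.0, -1.0, 0.0, 0.0])
contrast_values = contrast @ betas   # one contrast value per voxel
```

Here `betas` holds one coefficient per regressor per voxel, and `contrast_values` is the voxelwise summary that would be passed to the group-level analysis.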
Subject-Level Rs-fMRI Analyses
Rs-fMRI is becoming increasingly popular in the pain neuroimaging community, because it allows the investigation of intrinsic brain functional connectivity associated with pain-related phenotypes, including pain-related characteristics (eg, severity, duration) as well as cognitions (eg, pain hypervigilance, fear of pain, catastrophizing).
The preprocessing steps for rs-fMRI are similar to those of task-based fMRI, with a few notable exceptions (Fig 3). Rs-fMRI does not rely on a task-based regressor. Rather, time courses from spontaneous activity serve as a regressor. There are 2 primary approaches to resting state analysis: seed-based connectivity or independent components analysis. The former relies on an a priori interest in the connectivity of a particular brain region, whose time course becomes the regressor. This approach requires additional nuisance regressors to correct for physiological noise, such as cardiac pulsatility and respiration-related motion. There are several approaches to correct for such noise,
or 4) global signal regression. In contrast, the independent components analysis method does not rely on the time course of a particular brain region, but rather is a data-driven approach that groups correlated regions. In this method, artifacts can be identified on the basis of the spatiotemporal patterns of the resulting components, and artifactual components can be removed from the signal. When summary statistics maps of connectivity have been produced, a group-level analysis can be performed, similar to task-based data.
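The seed-based approach can be illustrated with a short sketch on simulated data. The nuisance regressors here are random stand-ins for physiological noise estimates, and the helper `residualize` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans, n_voxels = 200, 6

nuisance = rng.normal(size=(n_scans, 2))   # hypothetical noise regressors
shared = rng.normal(size=n_scans)          # signal shared by the seed and voxel 0
data = rng.normal(size=(n_scans, n_voxels))
data[:, 0] += 2.0 * shared
seed = shared + 0.5 * rng.normal(size=n_scans)

# Contaminate the seed and every voxel with the same nuisance signal.
data += nuisance[:, [0]]
seed = seed + nuisance[:, 0]

def residualize(y, confounds):
    """Remove the least-squares fit of the confounds (plus intercept) from y."""
    C = np.column_stack([confounds, np.ones(len(confounds))])
    coef, *_ = np.linalg.lstsq(C, y, rcond=None)
    return y - C @ coef

seed_clean = residualize(seed, nuisance)
data_clean = residualize(data, nuisance)

# Pearson correlation between the cleaned seed and each cleaned voxel.
seed_z = (seed_clean - seed_clean.mean()) / seed_clean.std()
data_z = (data_clean - data_clean.mean(axis=0)) / data_clean.std(axis=0)
connectivity = seed_z @ data_z / n_scans
```

Without the nuisance regression step, the shared noise would induce spurious correlations between the seed and every voxel; after it, only voxel 0 remains strongly correlated with the seed.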
When each subject's fMRI data have been preprocessed, and statistics are performed on individual data sets, a group-level analysis is undertaken. The purpose of this analysis is to identify brain regions that are significantly activated across participants or between groups. At this stage, univariate statistical tests can be performed, or multivariate statistics can be used. We focus primarily on traditional univariate statistics, which are tests that identify voxels in which there is significant activation as a function of the condition (or contrast) of interest. Perhaps the most common group-level analysis is the one-sample t-test on contrast maps across all participants to test whether activation at each voxel differs significantly as a function of the contrast (for example, whether participants as a group show greater activation in response to noxious, relative to innocuous, stimuli). This generates a whole-brain statistical map (eg, a distribution of t-statistics across voxels), which must then be appropriately thresholded for inference. Other standard group-level analyses include two-sample t-tests, which compare activation across groups (eg, patients vs controls), and correlation analyses, which identify the strength of the correlation between voxelwise activation and individual differences in some known parameter (eg, behavioral performance, symptom severity, or questionnaire measure).
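A voxelwise one-sample t-test across participants can be sketched as follows. The contrast maps are simulated, and the sample size and effect size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_voxels = 20, 4

# Hypothetical subject-level contrast values (subjects x voxels);
# only voxel 0 has a true nonzero mean effect.
contrast_maps = rng.normal(size=(n_subjects, n_voxels))
contrast_maps[:, 0] += 1.5

mean = contrast_maps.mean(axis=0)
sem = contrast_maps.std(axis=0, ddof=1) / np.sqrt(n_subjects)
t_map = mean / sem   # one t-statistic per voxel, df = n_subjects - 1
```

The resulting `t_map` is the statistical map that must then be thresholded with an appropriate correction for multiple comparisons.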
As with any type of statistical analysis, one must ensure results are not driven by outliers. This is particularly important in small samples, and brain–behavior correlations, but is true of all experiments. As an example, with a small sample size, a correlation between pain duration and the thickness of the cortex in a particular brain region might be driven by a single subject, or a small subset of subjects. Results should be visualized (ie, scatter plots should be inspected to determine whether effects can be attributed to a small number of individuals). Alternatively, automated statistical techniques such as robust regression
The group-level analyses outlined previously, when conducted on each voxel throughout the entire brain, require tens of thousands of statistical tests (ie, multiple comparisons). Because each test carries its own chance of a false positive result (eg, 5% at P < .05 when there is no true effect), performing this many tests will necessarily produce a large number of false positive results. Therefore, it is important to adjust or correct for this inflated rate of false positive results (ie, correct for multiple comparisons). There are 2 fundamental approaches to the multiple comparisons problem: 1) ROI analyses, which reduce the number of multiple comparisons on the basis of a priori hypotheses, or 2) multiple comparisons corrections, which correct for the number of tests performed. These are not mutually exclusive, because testing multiple ROIs will still require multiple comparisons correction. We also note that thresholds can be computed at the level of the voxel or the level of the cluster (for a review of thresholding methods, see Poldrack et al). We describe each of these considerations in the following paragraphs.
To reduce the number of multiple comparisons, researchers can reduce the search area within the brain and restrict analyses to a priori ROIs. This excludes statistical tests from brain regions we do not expect to be implicated in the analysis, and reduces the number of comparisons, because tests are restricted to voxels within ROIs. When an analysis is performed on every voxel within an ROI, multiple comparisons corrections must still be conducted, but there will be substantially fewer comparisons for which to correct. Researchers can also extract various coefficients across the ROI, depending on their question of interest. For example, the mean of the time series across all voxels in the ROI can be extracted. Another, and perhaps better, option is to extract the first principal component of the time series in the voxels. The resulting time course (the first eigenvariate) is effectively a weighted average of the activation in the ROI, where atypical voxels are downweighted. When the metric of interest is extracted, a single statistical test can be conducted, which would therefore be evaluated with a standard P < .05 statistical threshold if data are combined (eg, averaged) across voxels within the ROI. However, if coefficients are extracted from multiple ROIs or multiple voxels within an ROI, multiple comparisons correction is required. So how do researchers identify appropriate ROIs? One way is to include a functional localizer, that is, an fMRI scan or contrast that excludes the condition of interest. For example, if we want to determine how a painkiller affects pain-related brain activation, a baseline scan of pain-related activation, in which each participant receives high- and low-intensity stimulation, can be performed before analgesic administration. This generates a functional localizer scan, which can be analyzed with whole-brain statistics. Regions that show significant pain-related activation would then be used as ROIs for subsequent tests of analgesic-related reductions.
In this case, ROIs are based on the same subjects within the same scanner; one can even identify subject-specific ROIs if the localizer task is designed properly. Researchers can also use a priori ROIs based on the relevant literature. In this case, ROIs may be defined on the basis of brain atlases that are in standardized stereotaxic space, using either neuroanatomical boundaries or extracting data from a sphere or box placed at specific coordinates. There are many atlases, and researchers must be judicious in their selection of ROIs, on the basis of their aims. Alternatively, ROIs can be based on meta-analyses (eg, with automated term-based meta-analyses such as through Neurosynth). Regardless of how individual ROIs are selected, the rationale for such decisions must be reported and should be carefully evaluated.
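The two ROI summary measures discussed in this section, the mean time series and the first principal component, can be compared in a brief sketch. The ROI data are simulated, with one deliberately atypical voxel; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_scans, n_roi_voxels = 150, 10

# Hypothetical ROI: nine voxels share a common signal, one voxel is unrelated noise.
signal = rng.normal(size=n_scans)
roi = signal[:, None] + 0.5 * rng.normal(size=(n_scans, n_roi_voxels))
roi[:, -1] = rng.normal(size=n_scans)

# Option 1: simple mean across voxels (every voxel weighted equally).
mean_ts = roi.mean(axis=1)

# Option 2: first principal component via SVD of the mean-centered data.
centered = roi - roi.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
weights = Vt[0]
if weights.sum() < 0:      # the sign of a component is arbitrary; make it positive
    weights = -weights
pc1_ts = centered @ weights
```

In this simulation the weight assigned to the atypical voxel (`weights[-1]`) is close to zero, whereas the simple mean gives it the same weight as every other voxel.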
If a researcher does not have strong a priori hypotheses about specific ROIs, or wants to conduct whole brain analyses, they must use multiple comparisons corrections to account for the number of tests and adjust P-value thresholds accordingly. Several appropriate methods exist. For example, Bonferroni correction, a simple way to control the familywise error rate, sets the adjusted threshold by dividing the desired threshold by the number of tests. This leads to a very stringent threshold. The false discovery rate (FDR), in contrast, is a more lenient method to correct for multiple comparisons, which controls the expected false-positive proportion (ie, the fraction of detected voxels that are false positives, which cannot be observed directly).
In FDR correction, a rate (q) between 0 and 1 is specified, which represents the maximum FDR that will be tolerated on average. Next, uncorrected P values are ranked on the basis of significance (from smallest/most significant to largest/least significant). Next, the largest P value that is less than or equal to its rank divided by the total number of tests, multiplied by the desired FDR (q-value), is identified. This voxel's (or cluster's) P value is set as the threshold for a given FDR.
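To make the difference between the two corrections concrete, the following sketch applies a Bonferroni threshold and the Benjamini-Hochberg FDR procedure to a small set of made-up P values. The function names are hypothetical, and software packages may implement FDR variants that differ from this basic form.

```python
import numpy as np

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the familywise error rate at alpha."""
    return alpha / n_tests

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg: largest p(i) with p(i) <= (i / m) * q, else None."""
    p_sorted = np.sort(np.asarray(p_values, dtype=float))
    m = p_sorted.size
    ranks = np.arange(1, m + 1)
    passing = p_sorted <= ranks / m * q
    if not passing.any():
        return None
    return float(p_sorted[np.nonzero(passing)[0].max()])

# Ten made-up uncorrected P values.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.35, 0.50, 0.62, 0.80, 0.95]

bonf = bonferroni_threshold(0.05, len(p_values))   # 0.05 / 10 = 0.005
fdr = fdr_threshold(p_values, q=0.05)              # 0.012
n_bonf = sum(p <= bonf for p in p_values)          # 1 test survives
n_fdr = sum(p <= fdr for p in p_values)            # 3 tests survive
```

With these values, only the smallest P value survives Bonferroni correction, whereas the three smallest survive FDR correction at q = .05, which illustrates why FDR is described as the more lenient method.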
Each software package might implement corrections slightly differently: some approaches account for the spatial contiguity of activation (cluster-wise correction), whereas others set thresholds on the basis of voxelwise values alone. Cluster correction is now advised
Comparisons across these approaches have been discussed elsewhere and we advise interested researchers to consult these articles for more thorough treatment of considerations involved in methods for multiple comparison corrections.
Another form of multiple comparisons correction is to perform nonparametric tests, such as permutation testing, which do not make parametric assumptions about the data. Permutation testing resamples data and randomizes the assignment of observations. This process is repeated many times (usually >1,000) to empirically determine a null distribution from the data, against which the observed differences are evaluated at an α value set to limit false-positive results. A notable advantage of permutation testing is that it identifies the false-positive rate on the basis of the data. However, permutation testing can be very computationally intensive. For a comprehensive review, see Hayasaka and Nichols.
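One common form of permutation testing for a one-sample group analysis is random sign flipping combined with a maximum-statistic threshold. The sketch below is a minimal illustration on simulated data, assuming that subject-level contrast values are symmetrically distributed around zero under the null hypothesis; the sample size, effect size, and number of permutations are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_voxels, n_perm = 16, 50, 1000

contrast_maps = rng.normal(size=(n_subjects, n_voxels))
contrast_maps[:, 0] += 3.0   # hypothetical true effect in voxel 0 only

def t_stat(x):
    """One-sample t-statistic per voxel (columns)."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

observed = t_stat(contrast_maps)

# Flip each subject's sign at random and keep the largest t across voxels;
# repeating this builds a null distribution of the maximum statistic.
max_null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
    max_null[i] = t_stat(contrast_maps * signs).max()

threshold = np.quantile(max_null, 0.95)   # one-sided familywise threshold, alpha = .05
significant = observed > threshold
```

Because the threshold is taken from the distribution of the maximum statistic across all voxels, it accounts for the number of tests without assuming a particular parametric distribution.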
This indicates that there is an abnormally large number of significant findings that are reported in these studies, implying that negative results are not being reported (the so-called “file-drawer issue”) and that only results that meet statistical significance are reported, despite a large number of statistical tests performed (“p-hacking”). The field has become increasingly aware of and vigilant against such practices, because they can hinder true progress, and perpetuate false-positive results.
Multivoxel Pattern Analyses
The analyses we reviewed are all referred to as mass univariate statistical tests, because each voxel is modeled as a single outcome in a statistical test, and multiple tests are conducted. In multivariate analyses, this relationship is switched: multiple voxels are modeled together rather than individually. One of the first multivariate analysis approaches was partial least squares, which identifies patterns, or distributed networks of brain activity related to a construct (such as a behavioral task). This approach was used by Seminowicz and Davis to investigate the neural underpinnings of pain–cognition interactions.
In multivoxel pattern analysis (MVPA), algorithms from computer science and machine learning identify patterns across voxels that relate to a single construct. For example, MVPA has been used to test whether voxels within the anterior cingulate discriminate physical pain from other stimuli.
The pattern is then tested against data that were not included in the training set to determine whether it can reliably discriminate between states (eg, predicting that the condition was acute pain). This is repeated iteratively with different sets of held-out data through “cross-validation,” which then allows researchers to assess how well the pattern predicts outcomes. MVPA is a robust method to recover information encoded in ensembles of voxels. There are different algorithms and approaches that can be used for MVPA, but these are outside the scope of this primer. For a review, please see Cohen et al.
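The train-and-test logic of cross-validation can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule on simulated trial-wise patterns; this classifier is chosen only for brevity and is not the algorithm used in the studies cited here. All data and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_voxels, n_folds = 80, 30, 5

# Hypothetical trial-wise patterns: label 1 ("pain") shifts a subset of voxels.
labels = np.repeat([0, 1], n_trials // 2)
patterns = rng.normal(size=(n_trials, n_voxels))
patterns[labels == 1, :10] += 0.8

order = rng.permutation(n_trials)
folds = np.array_split(order, n_folds)

accuracies = []
for test_idx in folds:
    train_idx = np.setdiff1d(order, test_idx)
    # Training: average pattern per class, using the training trials only.
    centroids = np.stack([
        patterns[train_idx][labels[train_idx] == c].mean(axis=0) for c in (0, 1)
    ])
    # Testing: assign each held-out trial to the nearest class centroid.
    dists = np.linalg.norm(patterns[test_idx][:, None, :] - centroids[None], axis=2)
    predicted = dists.argmin(axis=1)
    accuracies.append((predicted == labels[test_idx]).mean())

mean_accuracy = float(np.mean(accuracies))
```

The key point is that the held-out trials never contribute to the pattern that is used to classify them, so `mean_accuracy` estimates how well the pattern generalizes to new data.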
Another way to move beyond the consideration of individual activation clusters, or “blobs,” is by acknowledging the network nature of brain activity through functional connectivity analyses. Functional connectivity refers to a family of techniques that examine temporal correlations in BOLD activity across brain regions (highly related to the rs-fMRI techniques reviewed previously). Critically, despite the term “connectivity,” functional connectivity does not rely on anatomical connections, but on simple correlations between the time courses of different regions. Functional connectivity approaches can be divided into several classes: those that assume static relationships between regions over time (“static connectivity”), those that assume that relationships vary over time (“dynamic connectivity”), and those that identify the influence of neural regions on each other (which can provide directional inferences in connections; “effective connectivity”). The simplest static connectivity measure is the seed-based analysis, in which researchers extract the time series from a single ROI (the “seed”) and test for correlations between that region's activation and activation throughout the rest of the brain. The region's activation is used as the predictor in a GLM, and this results in a map of the strength of the correlations at each voxel, which can be thresholded according to the mass univariate approach described previously. For example, one study reported that patients with temporomandibular disorders (TMD) have abnormally strong connectivity between the medial prefrontal cortex and the posterior cingulate cortex/precuneus. The strength of this abnormal connectivity was related to pain rumination in these patients.
In this framework, brain structures are called nodes, and the connections between them are called edges. A graph theoretical approach captures the structure of a network (a set of nodes) and allows researchers to identify a variety of metrics for the network elements, including the density of connections of the brain regions (node degree), the clustering of neighboring nodes, the distance needed to travel between 2 regions (path length, or efficiency), which regions are highly connected to many other regions (ie, hubs), and which hubs are crucial (centrality). Together, these can be used to map networks in pain, and how such networks may be altered in chronic pain conditions. Studies indicate that graph theoretical analysis of rs-fMRI can discriminate between placebo responders and nonresponders in osteoarthritis,
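Two of these metrics, node degree and the clustering of neighboring nodes, can be computed directly from an adjacency matrix. The five-node network below is a made-up example; in practice, edges might be defined by thresholding a correlation matrix, which is an analysis choice not shown here.

```python
import numpy as np

# Hypothetical binary adjacency matrix for five regions (nodes);
# a 1 marks an edge between two regions.
adjacency = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

# Node degree: number of edges attached to each node.
degree = adjacency.sum(axis=1)

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = np.flatnonzero(adj[node])
    k = neighbors.size
    if k < 2:
        return 0.0
    links = adj[np.ix_(neighbors, neighbors)].sum() / 2
    return float(links / (k * (k - 1) / 2))

clustering = [clustering_coefficient(adjacency, n) for n in range(len(adjacency))]
hub = int(np.argmax(degree))   # node with the most connections
```

In this toy network, node 0 has the highest degree (3) and would be the candidate hub, whereas nodes 1 and 2 have a clustering coefficient of 1 because their neighbors are connected to each other.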
Whereas static connectivity measures assume that regional activity varies over time but that the correlations between regions remain stable, dynamic connectivity approaches allow the correlations across regions to vary as well. Dynamic connectivity has been growing in popularity within the pain community.
A psychophysiological interaction (PPI) analysis is essentially an interaction analysis, whereby a researcher can ask whether a condition modulates the connectivity between 2 brain regions. For example, a PPI was used to show that there is stronger coupling between the periaqueductal gray (PAG) and the rostral anterior cingulate cortex (rACC) under placebo analgesia than under a control condition.
PPI can also be used to examine whether altered connectivity is associated with a behavioral state. This approach has been used to show that increased ventrolateral prefrontal cortex (vlPFC) connectivity with the amygdala and nucleus accumbens during controllable pain was associated with reduced anxiety.
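The interaction logic of a PPI can be sketched as a regression that includes the seed time course, the task variable, and their product. This is a simplified illustration on simulated data: it forms the interaction directly on the observed signals and omits HRF-related steps that analysis packages typically apply. The block structure and effect sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n_scans = 200

# Hypothetical psychological variable: alternating 20-scan control (0) and
# placebo (1) blocks, mean-centered before forming the interaction.
task = np.tile(np.repeat([0.0, 1.0], 20), n_scans // 40)
task_c = task - task.mean()

seed = rng.normal(size=n_scans)   # seed-region time course
# Simulated target region: coupled to the seed only during placebo blocks.
target = 1.0 * seed * task + 0.5 * rng.normal(size=n_scans)

ppi = seed * task_c               # psychophysiological interaction regressor
X = np.column_stack([seed, task_c, ppi, np.ones(n_scans)])
betas, *_ = np.linalg.lstsq(X, target, rcond=None)
ppi_beta = betas[2]
```

A reliably nonzero `ppi_beta` indicates that the relationship between the seed and the target differs between conditions, over and above the main effects of the seed and of the task.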
In PPI analyses, the “dynamic” aspect is known and driven by the experimental design (eg, placebo blocks vs control blocks). In other approaches, dynamics are more fluid. Sliding window analyses examine connectivity across time by binning the task into specific chunks of time; this approach was used to show that individuals with more dynamic connectivity between the medial PFC (mPFC) and PAG were more apt to spontaneously disengage from pain.
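A sliding window analysis can be sketched in a few lines. The window length, step size, and simulated time courses below are arbitrary choices for illustration; appropriate values depend on the acquisition and the question being asked.

```python
import numpy as np

rng = np.random.default_rng(7)
n_scans, window, step = 240, 40, 10

# Hypothetical time courses: the two regions are coupled in the first half only.
region_a = rng.normal(size=n_scans)
region_b = rng.normal(size=n_scans)
region_b[: n_scans // 2] += 2.0 * region_a[: n_scans // 2]

# Correlation within each window as it slides along the scan.
starts = range(0, n_scans - window + 1, step)
windowed_r = np.array([
    np.corrcoef(region_a[s:s + window], region_b[s:s + window])[0, 1]
    for s in starts
])
variability = windowed_r.std()   # one simple summary of how much connectivity fluctuates
```

Here the windowed correlations are high early in the scan and fall toward zero later, a change that a single static correlation across the whole scan would average away.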
Finally, newer methods can use purely data-driven approaches to identify moments when networks reconfigure. For example, state-based dynamic connectivity analyses use latent models to estimate whether brain networks remain stable or shift connections over time.
This approach revealed dissociations among brain networks during remifentanil administration (eg, connectivity within networks associated with emotion remained stable, whereas connectivity within networks associated with pain reorganized as drug infusion increased).
In contrast to static and dynamic connectivity, effective connectivity tests the plausibility of directional brain network models; that is, it asks not only whether the activity of 2 brain regions is correlated, but also what effect one neural region has on another.
allow for whole brain searches. One such method, dynamic causal modeling, uses Bayesian statistics to adjudicate between physiologically plausible models of interactions between brain regions (or nodes).
When a model has been selected, the algorithm can determine how a condition can modulate the connectivity strength (or edges) between the nodes that comprise the network. This method has been used to determine the brain networks underlying isosalient nociceptive and innocuous stimuli of other modalities.
Mediation analysis tests whether the relationship between 2 variables is significantly reduced when a third, intervening variable is taken into account. Mediation can be used as a form of effective connectivity by testing whether the relationship between a brain region and behavior, or the correlation between 2 brain regions, can be partially explained by connections with another region. For example, this method has been used to show that connectivity between the ventral striatum and ventromedial prefrontal cortex mediates the effects of self-regulation on pain.
Another study used mediation analysis to show that functional connectivity between the lateral prefrontal cortex and the PAG during rectal distention in healthy controls and patients with ulcerative colitis was mediated by the medial prefrontal cortex.
Notably, such a relationship was not observed in patients with irritable bowel syndrome, although the difference between the 2 groups was not formally tested and cannot, therefore, be considered a group difference. Another example showed that the relationship between structural gray matter abnormalities in the supplementary motor area and pain-related helplessness in temporomandibular disorders was mediated by corticofugal motor white matter tracts.
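The logic of mediation can be illustrated with three ordinary least squares regressions on simulated data. The variables and effect sizes are hypothetical, and the sketch stops at estimating the paths; in practice, the significance of the indirect effect is commonly assessed with resampling methods such as bootstrapping, which are not shown.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300

# Hypothetical data in which M carries the effect of X on Y.
x = rng.normal(size=n)                    # eg, activity in one region
m = 0.8 * x + 0.5 * rng.normal(size=n)    # eg, activity in a mediating region
y = 0.7 * m + 0.5 * rng.normal(size=n)    # eg, pain rating

def ols(predictors, outcome):
    """Least-squares coefficients for the predictors (intercept fitted last)."""
    X = np.column_stack(predictors + [np.ones(len(outcome))])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

c_total = ols([x], y)[0]          # total effect of X on Y
a = ols([x], m)[0]                # path a: X -> M
c_prime, b = ols([x, m], y)[:2]   # direct effect c' and path b: M -> Y, controlling for X
indirect = a * b                  # mediated (indirect) effect
```

In this simulation the direct effect `c_prime` is close to zero once the mediator is included, whereas the total effect `c_total` is not; for least squares estimates, the total effect equals `c_prime + indirect`.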
One might assume that these group-level analysis decisions are atheoretical with respect to pain mechanisms. This is not, however, entirely the case. As an example, one of the most basic historical debates in the field is whether pain is the product of “labeled lines” running from the periphery to pain-specific areas in the brain, or rather a function of temporal and spatial patterns of activity throughout the neuroaxis. Labeled line theories date back centuries and are largely based on neuroanatomic studies of spinal pathways and lesion studies. This locationist approach led to statistical analyses biased toward clusters of activation, such as traditional univariate GLM analyses. Local interactions or spatial patterns within these clusters are obscured by spatial blurring. As such, these analyses have traditionally led to more modular interpretations of neural processing (ie, that single regions have single functions), a framework more consistent with a specificity account of neural pain processing. However, this account has largely been unable to identify specific brain regions that are necessary and/or sufficient for the experience of pain. As described previously, MVPA has been used to increase specificity by searching for spatial patterns associated with particular psychological experiences. Such an analysis technique operates on the belief that activation more specific to pain will be identified by searching for unique patterns of spatial distribution, which is clearly more consistent with a pattern account. Similarly, studies using functional connectivity or changes in functional connectivity across different time epochs are likely to result in interpretations of pain as a function of temporal and spatial patterning, rather than the product of activation of any single specific pain center.
Another approach to analyze fMRI data is the use of models to identify brain regions or networks related to perceptual experiences and more abstract concepts, such as the hedonic value of pain. One approach to modeling perceptual (and thus subjective) experiences, percept-related fMRI, was developed by Porro and colleagues.
This method requires a continuous rating of a measure of interest for the duration of the trial. This point-by-point perceptual rating curve is used as a regressor of interest in the statistical model. This method has been used to investigate prickle,
To investigate more complex behaviors, such as decision-making, valuations, and social interactions, neuroimagers have adopted engineering modeling approaches. Theoretical computational models of such processes are developed to estimate underlying computations and predictions about neural signals, and the dynamic interactions between theoretical signals. These models can generate predictive time-series that can be used as regressors in the statistical model to identify correlated brain activity, and identify regions underlying the process. Because of the complexity of pain, such approaches may be appropriate for pain research. Indeed, several families of computational models have been used to investigate the neural computations related to pain, including estimating the subjective value of pain and its relationship to cognitive factors and modulation.
The ability to observe changes in oxygenated blood flow throughout the brain while a subject is in a particular behavioral or cognitive state is extremely powerful. Interpreting these activations, however, can be complicated, and the inferences that can be drawn are affected by a number of factors. We discuss some of these factors, and the limitations they pose on the interpretability of data.
Because of the high cost of neuroimaging studies, and the difficulty in recruiting chronic pain populations, most pain imaging studies tend to be cross-sectional. These studies are important and useful because they can provide information on mechanisms. For example, if a brain activation is significantly correlated with pain intensity, one might conclude that the activity is pain-related. However, observed activation might reflect many other correlated mental processes, so control conditions and interpretations must be carefully considered before making such conclusions. For example, correlations with pain intensity could reflect processes involved in magnitude estimation, or other processes that accompany pain but are not specific to pain. One must ensure that the control condition is sufficient for making claims about differentiating between these processes. Furthermore, the exact relationship between BOLD activation and underlying neuronal activity is still an open area of investigation.
Brain activation is inherently correlational, and authors often speculate on the biological basis of a finding. Although observed activations are related to the stimulus, one cannot infer causation from fMRI alone. In the absence of a task or intervention that can directly test a brain region (eg, eliciting a virtual lesion using transcranial magnetic stimulation), we can only infer associations, but not causal links. For example, higher dorsolateral PFC (dlPFC) activation might be associated with increased placebo analgesia (ie, reduced pain), but if one wants to test whether the dlPFC causes placebo effects on pain, one must show that placebo effects are abolished when dlPFC activity is disrupted. Indeed, researchers have done exactly this, indicating that the dlPFC causally contributes to placebo analgesia. Although this limits the inferences that might be drawn from fMRI alone, it is also clear that activation-based fMRI studies help identify candidate brain regions for causal interventions, and thus their potential utility remains enormous.
A related inferential difficulty arises when group differences are observed between pain patients and healthy controls in cross-sectional studies. Without additional data, it is impossible to determine whether these differences preexisted the condition (and therefore represent predisposing or possibly causal factors) or whether they are related to the cumulative effects of living with a chronic condition. Nevertheless, such inferences are frequently implied. For example, in structural imaging, it is common to refer to observed group differences as “plasticity” or “changes,” implying (without support) that the differences represent cumulative effects of disease. Some groups have attempted to form stronger inferences by performing correlations with disease characteristics, such as pain intensity, duration, questionnaire data, and other metrics, but even when such correlations are found, it remains difficult to determine whether they are due to cumulative effects of pain or other long-term effects of chronic disease (eg, mood and behavioral changes). Furthermore, such inferences are still limited by their correlational nature. Modeling techniques, such as dynamic causal modeling, allow inferences about the direction of information flow in neural circuits during a mental process or across participants, but they are still limited in causal inferences because they rely on the statistical variance within a cross-sectional data set. The strongest way to address this issue is to perform longitudinal studies in healthy subjects and/or chronic pain patients, for example, by scanning subjects before and after a therapeutic intervention.
Although such designs pose challenges in terms of resources and time, they allow for stronger and more mechanistically informative inferences about temporal precedence.
Logical Fallacies in Neuroimaging
Reverse inference is the widespread tendency to ascribe a function to an observed pattern of activation (eg, “we observed activation in the visual cortex, therefore the participant is probably looking at something”). This issue has been discussed in depth in other reviews, so it will only be covered briefly here. Reverse inference is tempting and, to some extent, necessary, because the ultimate goal of scanning is elucidating mechanism. Caution is required, however, because reverse inference relies on a logical error (“affirming the consequent”). Despite the fact that the logic underlying such inferences is flawed, the probability that they are true can still be high if (as in the previously mentioned example of the visual cortex) the region in question is linked fairly exclusively with a particular function. In pain imaging, however, we frequently observe activation of regions (eg, insular and anterior cingulate cortices) that are involved in a wide variety of functions.
As a practical example of how this might affect inferences in pain studies, there is a widespread tendency to interpret observed activations in the periaqueductal gray (PAG) as an indication that descending modulation is taking place. Although such inferences can be strengthened by corroborating evidence (eg, correlation with reduced pain ratings), it is commonly forgotten that the PAG is involved in many functions also relevant to pain, including escape and avoidance responses.
Three common settings in which a difference in significance is mistaken for a significant difference are group comparisons, implied lateralization, and subregional specialization. Researchers often conduct separate analyses for each group in a between-groups study (eg, to examine activation within patients and activation within controls). In such cases, it is tempting to display separate thresholded group maps and draw qualitative comparisons on the basis of the extent of activation in each group. However, one cannot assume that a region was differentially activated simply because it meets a threshold for significance in one group and not in the other: apparent differences may simply reflect near-threshold differences (eg, the P value for a particular cluster was .049 for one group and .051 for the other). To conclude that there was a group difference, the groups must be formally compared using an appropriate statistical test (ie, an interaction analysis). Relatedly, when one group shows a significant correlation with a behavioral measure and a second group does not, it is tempting to conclude that the correlations differ between groups. However, the correlation coefficients must first be converted using the Fisher z-transformation and then formally compared. A similar error is often made with respect to laterality claims: if significant activation is observed on one side of the brain, but not in the symmetric contralateral brain region, it is common to conclude that the effect in question is lateralized. Without formally comparing the activation between the 2 sides, however, there is no basis for ruling out that processing is bilateral. Similarly, it is common to use significant activations to draw inferences about specialization of subregions of a given structure. As an example, on the basis of anatomical tracing and other evidence, the insula has often been divided into a posterior region associated with sensory input and an anterior region involved in more abstract components of pain perception (eg, interoception).
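The formal comparison of two correlations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular study: it assumes two independent groups, uses the standard large-sample normal approximation for Fisher z-transformed coefficients, and the function names, correlation values, and sample sizes are all hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    return math.atanh(r)


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def correlation_p(r, n):
    """Approximate test of H0: rho = 0 within one group (Fisher z approximation)."""
    return two_sided_p(fisher_z(r) * math.sqrt(n - 3))


def compare_independent_correlations(r1, n1, r2, n2):
    """Test H0: rho1 = rho2 for correlations from two independent groups."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    return z, two_sided_p(z)


# Hypothetical values: brain-behavior correlation in patients and in controls.
r_patients, n_patients = 0.50, 25
r_controls, n_controls = 0.30, 25

p_patients = correlation_p(r_patients, n_patients)   # within-group test
p_controls = correlation_p(r_controls, n_controls)   # within-group test
z_diff, p_diff = compare_independent_correlations(
    r_patients, n_patients, r_controls, n_controls
)

print(f"patients: P = {p_patients:.3f}")
print(f"controls: P = {p_controls:.3f}")
print(f"difference between groups: z = {z_diff:.2f}, P = {p_diff:.3f}")
```

With these assumed inputs, the correlation is significant at the .05 level in one group and not in the other, yet the direct comparison of the two coefficients is not significant. Only the direct comparison licenses a claim about a group difference.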
In this primer, we reviewed fundamental considerations in the design, analysis, description, and evaluation of a pain neuroimaging experiment. Investigators must make many careful decisions when designing experiments, and reports must be written so that reviewers and readers in the community can evaluate the work appropriately, whether or not the reviewer has expertise in imaging. We hope this primer provides a foundation for pain clinicians and trainees without advanced training in neuroimaging, and encourage interested potential investigators to continue to read about the unique considerations involved in fMRI experiments by consulting recent dialogues that address these issues and their implications for the broader neuroimaging community.