Original Report | Volume 16, Issue 5, P454-462, May 2015

Improving QST Reliability—More Raters, Tests, or Occasions? A Multivariate Generalizability Study

  • Søren O'Neill
    Address reprint requests to Søren O'Neill, PhD, MRehab, Spine Center of Southern Denmark, Lillebælt Hospital, Østre Hougvej 55, 5500 Middelfart, Denmark.
    Spine Center of Southern Denmark, Lillebælt Hospital, Middelfart, Denmark

    Institute of Regional Health Science Research, University of Southern Denmark, Odense, Denmark
  • Lotte O'Neill
    Centre for Medical Education, Aarhus University, INCUBA Science Park Skejby, Århus N, Denmark
Published: February 12, 2015


      • Disturbed pain sensitivity may be a complicating factor in clinical pain states.
      • Pain sensitivity can be assessed using quantitative sensory testing (QST).
      • The reliability of QST was improved substantially by applying a battery of tests.
      • Reliability was also improved substantially by repeated testing (occasions).
      • Little was gained from testing with different raters or differentiated weighting.


      The reliability of quantitative sensory testing (QST) is affected by error attributable to test occasion, to rater (examiner), and to the interactions between them. Most reliability studies account for only 1 source of error. The present study employed a fully crossed, multivariate generalizability design to account for rater and occasion variance simultaneously. Nineteen healthy volunteers were examined with a battery of 7 QST procedures 4 times, on 2 occasions by 2 raters. The QST battery was composed to include a mix of pain stimuli and response domains, including threshold, intensity, tolerance, and modulation with mechanical, thermal, and chemical stimuli. The classical test-retest and interrater reliability (intraclass correlation coefficients ranging from .19 to .92) was in line with the literature, and generalizability analysis indicated that the universe score was generally the dominant source of variation (relative contribution = 19% to 78%). Error attributable to the interaction between study participant and occasion was also influential. Dependability coefficients indicated that a substantial increase in reliability and feasibility could be achieved by employing a composite QST battery rather than single QST procedures. Reliability was improved more by repeated testing on separate occasions than by repeated testing by different raters.
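
      The comparison between adding occasions and adding raters can be illustrated with a minimal sketch of a univariate decision study for a fully crossed participant x rater x occasion design. It uses the standard generalizability theory expression for the dependability coefficient (universe score variance divided by universe score variance plus absolute error variance). The variance components below are hypothetical values chosen for illustration only; they are not estimates from this study, and the names `VarianceComponents`, `dependability`, and `HYPOTHETICAL` are assumptions introduced here. The study itself used a multivariate design, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for a fully crossed p x r x o design."""
    p: float      # participant (universe score)
    r: float      # rater
    o: float      # occasion
    pr: float     # participant x rater
    po: float     # participant x occasion
    ro: float     # rater x occasion
    pro_e: float  # three-way interaction confounded with residual error


def dependability(vc: VarianceComponents, n_raters: int, n_occasions: int) -> float:
    """Dependability coefficient for a mean over n_raters and n_occasions."""
    if n_raters < 1 or n_occasions < 1:
        raise ValueError("n_raters and n_occasions must be at least 1")
    # Absolute error variance: every component except the universe score,
    # each divided by the number of conditions it is averaged over.
    abs_error = (
        vc.r / n_raters
        + vc.o / n_occasions
        + vc.pr / n_raters
        + vc.po / n_occasions
        + (vc.ro + vc.pro_e) / (n_raters * n_occasions)
    )
    return vc.p / (vc.p + abs_error)


# Hypothetical components (NOT from the study), with a comparatively large
# participant x occasion interaction, as the abstract describes qualitatively.
HYPOTHETICAL = VarianceComponents(
    p=0.50, r=0.02, o=0.03, pr=0.05, po=0.25, ro=0.01, pro_e=0.14
)

if __name__ == "__main__":
    for n_r, n_o in [(1, 1), (2, 1), (1, 2), (2, 2)]:
        phi = dependability(HYPOTHETICAL, n_r, n_o)
        print(f"raters={n_r} occasions={n_o} dependability={phi:.3f}")
```

      Under these assumed components, a second occasion raises the coefficient more than a second rater does, because the participant x occasion term dominates the error variance. With different components the ordering could reverse.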


      When balancing reliability and feasibility, the current findings suggest that a carefully selected battery of QST procedures repeated on a few occasions may be optimal.



