Abstract

Tests and diagnoses used in sexually violent predator (SVP) evaluations must be reliable, because reliability is foundational to validity. The current study examined a stratified sample of evaluations of 395 individuals referred as potential SVPs between 2012 and 2017. Each individual was initially evaluated by at least two experts. The sample comprised three groups: individuals not meeting SVP criteria (N = 200, with 400 evaluations), individuals meeting SVP criteria (N = 95, with 190 evaluations), and individuals on whom evaluators disagreed (N = 100, with 200 evaluations). The sample also included 200 subsequent independent evaluations of these "disagree" cases. Interrater reliability of Static-99R scores, measured by the intraclass correlation coefficient (ICC), was good to excellent within each group and overall. Evaluators' Static-99R scores were within one point of each other 87% of the time. Diagnostic agreement (Cohen's kappa) for Pedophilic Disorder was substantial. Kappa values for Antisocial Personality Disorder (ASPD) and substance abuse were in the "fair" range, while agreement on Other Specified Paraphilic Disorder (OSPD) diagnoses in the positive group (individuals meeting SVP criteria) was moderate. Ethnic differences in diagnoses were consistent with those reported in other studies, and Static-99R ICC values were equivalent across ethnic groups. There were no significant differences between evaluators employed as state civil servants and contracted experts in Static-99R scores or final determinations. The results suggest that Static-99R scores have acceptable reliability in these evaluations and that Pedophilic Disorder (the most common paraphilic disorder in our study) and OSPD can be reliably diagnosed. We discuss the study's limitations, as well as the need for caution in high-stakes evaluations given the imperfect reliability of psychological measurements.

