DCP 2.0 Osteometric Data

Published: 18 February 2018 | Version 1 | DOI: 10.17632/6xwhzs2w38.1
Natalie Langley


This study evaluates the reliability of osteometric data commonly used in forensic case analyses, with specific reference to the measurements in Data Collection Procedures 2.0 (DCP 2.0). Four observers took a set of 99 measurements four times on a sample of 50 skeletons, so each observer took each measurement 200 times. Two-way mixed ANOVAs and repeated measures ANOVAs with pairwise comparisons were used to examine interobserver (between-subjects) and intraobserver (within-subjects) variability. Relative technical error of measurement (TEM) was calculated for measurements with significant ANOVA results to examine the error of a single observer repeating a measurement multiple times (i.e. repeatability or intraobserver error), as well as the variability between multiple observers (interobserver error). Two general trends emerged from these analyses: (1) maximum lengths and breadths have the lowest error across the board (TEM < 0.5), and (2) maximum and minimum diameters at midshaft are more reliable than their positionally dependent counterparts (i.e. sagittal, vertical, transverse, dorso-volar). Therefore, maxima and minima are specified for all midshaft measurements in DCP 2.0. Twenty-two measurements were flagged for excessive variability (interobserver, intraobserver, or both); 15 of these were part of the standard set of measurements in Data Collection Procedures for Forensic Skeletal Material, 3rd edition. Each flagged measurement was examined carefully to determine the likely source of the error (e.g. data input, instrumentation, observer’s method, or measurement definition). For several measurements (e.g. anterior sacral breadth, distal epiphyseal breadth of the tibia), only one observer differed significantly from the remaining observers, indicating a likely problem with the measurement definition as interpreted by that observer; these definitions were clarified in DCP 2.0 to eliminate this confusion.
Other measurements were taken from landmarks that are difficult to locate consistently (e.g. pubis length, ischium length); these measurements were omitted from DCP 2.0. The DCP 2.0 manual is available for free download online (https://fac.utk.edu/wp-content/uploads/2016/03/DCP20_webversion.pdf), along with an accompanying instructional video (https://www.youtube.com/watch?v=BtkLFl3vim4). Observer experience also played a role in the ability to consistently reproduce measurements. Average intraobserver relative TEM values for the measurements in Table 3, from lowest to highest, were 2.31 (Observer 2), 3.25 (Observer 1), 3.36 (Observer 3), and 3.41 (Observer 4). Observer 2 had the lowest TEM for most measurements, and Observer 4 had the highest TEM most frequently. While Observer 1 had the most experience in number of years (27 years), Observer 2 had more technical training than any other observer. Observer 2 had 14 years of experience but had measured approximately 900 skeletons during this time, more than any other observer.
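
The description does not state which formula was used for relative TEM. As a minimal sketch, assuming the commonly used multi-measurement formulation (within-subject sum of squares divided by N(K-1), square-rooted, then expressed as a percentage of the grand mean), the calculation could look like the following. The function name and the example values are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def relative_tem(rows):
    """Relative TEM (%) for N subjects, each measured K times.

    rows: list of N lists; each inner list holds the K repeated values
    of one measurement on one skeleton.
    ASSUMPTION: this is the common multi-measurement TEM formulation,
    not necessarily the exact computation used for this dataset.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("need at least one subject")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("each subject needs the same K >= 2 repeats")

    # Within-subject sum of squared deviations, summed over subjects.
    ss_within = sum(sum(x * x for x in r) - sum(r) ** 2 / k for r in rows)
    ss_within = max(0.0, ss_within)  # guard against tiny negative rounding

    tem = sqrt(ss_within / (n * (k - 1)))          # absolute TEM, in mm
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    return 100.0 * tem / grand_mean                # relative TEM, in %


# Hypothetical values (mm): three skeletons, four rounds by one observer.
intra_example = [
    [452.0, 452.5, 451.5, 452.0],
    [430.0, 430.0, 431.0, 430.5],
    [471.5, 471.0, 471.0, 471.5],
]
print(round(relative_tem(intra_example), 3))
```

Under this assumption the same function covers both kinds of error. For intraobserver error, each row holds one observer's four rounds on a skeleton; for interobserver error, each row holds the values recorded by the four observers for that skeleton.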


Steps to reproduce

The osteometric data were collected on a random sample of skeletons from the William M. Bass Donated Collection (n=50). Four observers measured the left elements of all 50 skeletons. The observers were assigned numbers based on experience level, with Observer 1 having the most experience (27 years) and Observer 4 having the least (3 years). Ninety-nine measurements were taken on each skeleton using the instrument specified in the measurement definition in Data Collection Procedures, 3rd edition (e.g. spreading calipers, digital sliding calipers, tape measure, osteometric board, mandibulometer). Once all 50 skeletons were measured, the process was repeated for a total of four rounds. Observers were provided copies of Data Collection Procedures for Forensic Skeletal Material [Moore-Jansen et al., 1994] and Cranial Variation in Man [Howells, 1972]; the latter describes how to locate cranial landmarks if sutures are obliterated, Wormian or apical bones are present, etc. Instruments were calibrated with calibration rods before each measuring session, and the following conditions were modeled to establish the repeatability of the measurements according to the National Institute of Standards and Technology’s Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results: (1) the measurement procedure was performed the same way each time, and (2) the same observer performed each measurement with the same measuring instrument.


Mercyhurst University, University of Tennessee, Mayo Clinic Arizona


Error Analysis, Approaches in Anthropology, Forensic Anthropology, Observer Variation, Biological Anthropology