Performance Characteristics and Measurement Quality Objectives (MQOs)
Basic performance characteristics of an assessment and its component methods include precision (consistency and repeatability), sensitivity (detection limit), and accuracy (proximity to analytical truth). Measurement quality objectives (MQOs) are control points or thresholds that provide perspective for judging the acceptability of data (Shewhart 1986, Flotemersch et al. 2006, Stribling et al. 2008). They can include acceptance criteria related to consistency (precision) of field sampling, consistency and reliability of different components of laboratory processing (sorting and identification), accuracy in detecting stressor conditions, and precision of final site assessments. Although MQOs allow decisions on data acceptability, their use does not dictate how data should be produced, that is, the specific method or technology to be used (Crumbling 2001). Further, exceeding an MQO by one or more individual measurement values does not automatically indicate that the value is unacceptable; rather, it means that the value(s) should be examined in more detail to understand the reasons for the exceedance (Stribling et al. 2008). For example, precision and sensitivity of benthic sampling and analysis methods can be determined using either kicknets or Surber samplers. If each method demonstrates acceptable repeatability (consistency, precision) between sample-pairs in numeric indicator values and/or assessment narratives, sample data produced by either field technique could be considered suitable for inclusion in analyses. Example MQOs are:
- Field sampling precision: Relative percent difference (RPD) for ‘n’ sample-pairs should be ≤ 15%
- Taxonomic precision: Percent taxonomic disagreement (PTD) should be ≤ 15%
- Sensitivity of a biological index to differentiate a priori determined stressed sites from reference: Discrimination efficiency (DE) should be ≥ 88%
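The first two example MQOs above can be checked with simple arithmetic. The sketch below is illustrative only. It assumes the commonly used forms RPD = |a - b| / ((a + b) / 2) x 100 and PTD = (1 - agreements / N) x 100, neither of which is spelled out in this section. The function names and the sample values are hypothetical.

```python
# Illustrative sketch: the formulas are assumed, and all data are made up.
RPD_MQO = 15.0  # percent, from the example MQO for field sampling precision
PTD_MQO = 15.0  # percent, from the example MQO for taxonomic precision


def relative_percent_difference(a: float, b: float) -> float:
    """RPD of a sample pair: absolute difference as a percent of the pair mean."""
    if a < 0 or b < 0:
        raise ValueError("index values are assumed to be non-negative")
    if a == b:
        return 0.0  # also covers the (0, 0) pair, where the mean is zero
    return abs(a - b) / ((a + b) / 2.0) * 100.0


def percent_taxonomic_disagreement(agreements: int, n_individuals: int) -> float:
    """PTD: percent of individuals on which two taxonomists disagree."""
    if n_individuals <= 0 or not 0 <= agreements <= n_individuals:
        raise ValueError("need 0 <= agreements <= n_individuals and n_individuals > 0")
    return (1.0 - agreements / n_individuals) * 100.0


# Hypothetical index values from duplicate samples at two sites.
pairs = [(62.0, 58.0), (45.0, 55.0)]

# Pairs that exceed the MQO are flagged for closer examination, not rejected.
flagged = [p for p in pairs if relative_percent_difference(*p) > RPD_MQO]

for a, b in pairs:
    print(f"pair ({a}, {b}): RPD = {relative_percent_difference(a, b):.1f}%")
print("pairs to examine:", flagged)

ptd = percent_taxonomic_disagreement(agreements=276, n_individuals=300)
print(f"PTD = {ptd:.1f}% (MQO met: {ptd <= PTD_MQO})")
```

Flagging rather than discarding mirrors the point made above that an exceedance calls for a closer look at the value, not automatic rejection.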
Precision estimates are important to help interpret results from bioassessment efforts. Two fundamental requirements for a biological assessment are that 1) samples are consistently collected, and 2) the data sufficiently characterize the sample. Estimates of precision focused on these two requirements elevate confidence in the assessment results. A sampling or analysis method that produces consistent, repeatable results has high precision (low values of the precision statistic) and thus gives greater confidence in overall assessments. Conversely, methods that are inconsistent and less repeatable have low precision (high values), which indicates greater variability and results in less confidence in assessment results. Precision can be estimated using several measures (e.g., ANOVA, coefficient of variation, confidence intervals, signal/noise ratio, or RPD) that evaluate sample pairs from a subset of sites, either the same or adjacent reaches, for either intra-team variability (same crews) or inter-team variability (different crews) (Kaufmann et al. 1999, Hughes et al. 1998, McCormick et al. 2001, Flotemersch et al. 2006, Stribling et al. 2008). Biological protocols can be highly precise yet have limited capacity for detecting ecological conditions; that is, the precision of a method says nothing about whether it detects degraded conditions (Stribling et al. 2008).
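As one illustration of the measures listed above, the coefficient of variation (CV) expresses the spread of replicate results as a percent of their mean. The sketch below uses the sample standard deviation from Python's standard library. The function name and the replicate values are hypothetical, and nothing here prescribes which measure a program should use.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are needed")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical index scores from replicate samples, grouped by who collected them.
same_crew = [60.0, 64.0, 62.0]        # intra-team replicates
different_crews = [60.0, 70.0, 53.0]  # inter-team replicates

cv_intra = coefficient_of_variation(same_crew)
cv_inter = coefficient_of_variation(different_crews)

# Lower CV means higher precision (more repeatable results).
print(f"intra-team CV = {cv_intra:.1f}%")
print(f"inter-team CV = {cv_inter:.1f}%")
```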
The primary purpose of using biological indicators, i.e., a calibrated index for an assemblage or specific population attributes, is to detect stressed conditions and support associated causal analyses. The sensitivity of the indicator can be characterized using discrimination efficiency (DE) (Flotemersch et al. 2006). The DE is the proportion of stressed sites, determined a priori via non-biological criteria, that the indicator correctly identifies as stressed. A high DE is desirable because it minimizes the chance of missing impairment in a routine monitoring and assessment program.
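The DE calculation described above can be sketched in a few lines. This example assumes that lower index scores indicate more stress and that a site counts as correctly identified when its score falls below a fixed impairment threshold. The threshold rule, the function name, and the scores are all assumptions made for illustration.

```python
def discrimination_efficiency(stressed_scores: list[float], threshold: float) -> float:
    """Percent of a priori stressed sites scoring below the impairment threshold."""
    if not stressed_scores:
        raise ValueError("need at least one a priori stressed site")
    detected = sum(1 for score in stressed_scores if score < threshold)
    return detected / len(stressed_scores) * 100.0


# Hypothetical index scores at sites judged stressed from non-biological criteria.
stressed = [22.0, 35.0, 41.0, 18.0, 55.0, 30.0, 27.0, 48.0]
threshold = 50.0  # assumed impairment threshold on the index scale

de = discrimination_efficiency(stressed, threshold)
print(f"DE = {de:.1f}%")  # 7 of 8 sites fall below the threshold
print("meets the example MQO of >= 88%:", de >= 88.0)
```

In this made-up case the index narrowly misses the example MQO, which would prompt a closer look at the one stressed site that scored above the threshold.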
There are many different components of bioassessment protocols for which performance characteristics can be calculated. Ideally, precision is known for all of these components. However, which and how many are calculated is ultimately determined by the needs of the data user. Some aspect of performance characteristics should be documented and presented with any biological assessment dataset; failure to do so will likely raise questions about data quality, which may diminish the defensibility of the assessment results.
Biological assessments are most useful when the methods distinguish natural and index variability (i.e., “noise”) from a true environmental effect (i.e., “signal”). The premise, therefore, is that the site is representative of the population of sites, the sample is representative of the site examined and the assemblage(s) measured, and the data produced are an accurate reflection of that sample. States typically establish a threshold based on this signal and then add other thresholds to distinguish among higher (e.g., outstanding natural resource waters, excellent warmwater habitat, or excellent/good habitat) and lower assessment categories (e.g., limited resource waters, fair/poor/very poor).
Quality assurance programs encourage the continued documentation of variability to ensure the ability to detect long-term trends. An ongoing quality assurance program is also useful for periodically reevaluating the performance of the indicator and calibrating reference conditions. Quality assurance procedures include examination of replicate field samples at some subset of the sample units (e.g., 10% of the sites) and reexamination of a proportion of samples by an independent taxonomist. For programs in which multiple field sampling crews are used, it is important to document variability in results caused by personnel. Side-by-side sampling by different field crews is done to document the magnitude of crew variability as a source of measurement error.
Overall variability (= total uncertainty, or error) of data from any measurement system results from accumulation of error from multiple sources (Taylor 1988; Clark and Whitfield 1994; Taylor and Kuyatt 1994; Diamond et al. 1996). Error can generally be divided into two types: systematic and random. Systematic error is the type of variability that results from a method and its application or misapplication. It is composed of bias that can, in part, be mediated by using an appropriate quality assurance program. Random error results from the sample itself or the population from which it is derived, and can only partly be controlled through a careful sampling design (see Figure below). It is often not possible to separate the effects of the two types of error, and they can directly influence each other (Taylor 1988). The overall magnitude of error associated with a dataset is known as data quality. How statements of data quality are made and communicated is critical for data users and decision makers to properly evaluate the extent to which they should rely on scientific information (Peters 1988; Costanza et al. 1992). A stream assessment (in particular, a biological assessment) is a series of methods taken together as a protocol (Diamond et al. 1996; USEPA 1999) and, as such, each method can contribute to overall variability (see Figure below). Thus, it is important to know something about the quality of the data produced at each step of the process.
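To see how error from several steps of a protocol can accumulate, the sketch below combines per-step standard uncertainties by root-sum-of-squares, the standard rule for uncorrelated components. It is a simplification: it assumes the components are independent and expressed in the same units, whereas the text above notes that error types can influence each other. The step names and values are hypothetical.

```python
import math


def combined_standard_uncertainty(components: dict[str, float]) -> float:
    """Root-sum-of-squares of per-step standard uncertainties (assumes independence)."""
    return math.sqrt(sum(u ** 2 for u in components.values()))


def variance_shares(components: dict[str, float]) -> dict[str, float]:
    """Fraction of total variance contributed by each step."""
    total_variance = sum(u ** 2 for u in components.values())
    if total_variance == 0:
        raise ValueError("all components are zero")
    return {name: u ** 2 / total_variance for name, u in components.items()}


# Hypothetical standard uncertainties, in index points, for three protocol steps.
steps = {
    "field sampling": 12.0,
    "laboratory sorting": 4.0,
    "taxonomic identification": 3.0,
}

total = combined_standard_uncertainty(steps)
shares = variance_shares(steps)

print(f"combined uncertainty = {total:.1f} index points")
for name, share in shares.items():
    print(f"  {name}: {share:.0%} of total variance")
```

Because variances add, the largest component dominates the total, which is one reason to know the data quality of each step separately.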

Regardless of the approach, the primary purpose of an analytical threshold is to establish levels of biological quality that can be used in determining attainment or non-attainment of the designated use. To facilitate water quality management decisions, the thresholds should allow for straightforward decisions including statements of uncertainty when biological data are compared against the thresholds. Decisions applying to thresholds also need to be documented in the record.
References
Clark, M.J.R. and P.H. Whitfield. 1994. Conflicting perspectives about detection limits and about the censoring of environmental data. Water Resources Bulletin 30:1063-1079.
Crumbling, D.M. 2001. Current Perspectives in Site Remediation Monitoring. EPA 542-R-01-014. October 2001.
Diamond, J.M., M.T. Barbour, and J.B. Stribling. 1996. Characterizing and comparing bioassessment methods and their results: A perspective. Journal of the North American Benthological Society 15:713-727.
Flotemersch, J.E., J.B. Stribling, and M.J. Paul. 2006. Concepts and approaches for the bioassessment of non-wadeable streams and rivers. EPA/600/R-06/127. Office of Research and Development, US Environmental Protection Agency, Cincinnati, Ohio.
Hughes, R.M., P.R. Kaufmann, A.T. Herlihy, T.M. Kincaid, L. Reynolds, and D.P. Larsen. 1998. Development and application of an index of fish assemblage integrity for wadeable streams in the Willamette Valley Ecoregion, Oregon, USA. Canadian Journal of Fisheries and Aquatic Sciences 55:1618-1631.
Kaufmann, P.R., P. Levine, E.G. Robison, C. Seeliger, and D.V. Peck. 1999. Quantifying physical habitat in wadeable streams. USEPA. 620/R-99/003. Corvallis, OR.
McCormick, F.H., R.M. Hughes, P.R. Kaufmann, D.V. Peck, J.L. Stoddard, and A.T. Herlihy. 2001. Development of an index of biotic integrity for the mid-Atlantic Highlands region. Transactions of the American Fisheries Society 130:857-877.
Peters, J.A. 1988. Quality control infusion into stationary source sampling. Chapter 22, pages 317-333, in Lawrence H. Keith (editor), Principles of Environmental Sampling. ACS Professional Reference Book. ISBN 0-8412-1173-6. American Chemical Society.
Shewhart, W.A. 1986. Statistical Method from the Viewpoint of Quality Control. ISBN 0-486-65232-7. Dover Publications, New York. 155pp.
Stribling, J.B., B.K. Jessup, and D.L. Feldman. 2008. Precision of benthic macroinvertebrate indicators of stream condition in Montana. Journal of the North American Benthological Society 27:58-67.
Taylor, J.K. 1988. Defining the Accuracy, Precision, and Confidence Limits of Sample Data. Chapter 6, pages 102-107, in Lawrence H. Keith (editor), Principles of Environmental Sampling. ACS Professional Reference Book. American Chemical Society. Columbus, OH.
Taylor, B.N. and C.E. Kuyatt. 1994. Guidelines for evaluating and expressing the uncertainty of NIST measurement results. NIST Technical Note 1297. National Institute of Standards and Technology, US Department of Commerce, Washington, DC.
USEPA. 1999. Rapid Bioassessment Protocols for Use in Streams and Wadeable Rivers: Periphyton, Benthic Macroinvertebrates and Fish. Second Edition. Office of Water, Washington, D.C.