over the past 60 years, including the open field test, elevated plus-maze, and light/dark box. He explained that these approach/avoidance tests are based on the simple premise that small prey animals such as rats and mice have an innate aversion to exploring open, brightly lit areas where the risk of predation is presumably high, yet at the same time they have a natural drive to explore novel, potentially fruitful environments where they might find food, mates, or new territory.
The conceptual framework of these tests is straightforward, but each laboratory conducting anxiety testing uses what it believes to be the best apparatus and testing approach. The question then, Holmes said, was whether this variability affects the ability to reproduce findings across laboratories and across studies. To illustrate the complexities of this issue, Holmes highlighted three studies. As background, he noted that it has been known for many decades that genetically inbred, isogenic strains of mice differ in various phenotypes, including measures related to anxiety. Using these inbred strains restricts the amount of variability in the population and presumably increases the ability to detect the influence of an environmental or procedural difference.
The first study Holmes described compared the results of standard tests and assays for anxiety across four different laboratories involved in a consortium project (Mandillo et al., 2008). Differences in equipment and apparatus were acknowledged as a possible confound in standardization. Each laboratory was allowed to use the apparatuses already in place, and no attempt was made to equate variables such as housing or the vendor from which the mice were purchased. In one test, for example, using the percentage of time spent in the center of the open field as a measure of anxiety-like behavior, there were marked differences between mouse strains within a laboratory. Although the magnitude of these differences varied among laboratories, the trends were preserved. The authors concluded that despite differences in equipment, vendors, and housing across laboratories, the results were reproducible and robust. They also suggested possible confounds that might limit tighter replication, including experimenter experience, animal husbandry, apparatus differences, and clarity of the standard operating procedure used.
In the second case highlighted by Holmes, the investigators went to “extraordinary lengths to equate the test apparatus, protocols, and all possible features of animal husbandry” that they could control (Crabbe et al., 1999, p. 1670). Across a battery of different tests, the sites sought to