Genes, Environments, and Mouse Behavior
John C. Crabbe
GENOTYPES AND ENVIRONMENTS
As in any area of science, investigators seek to reproduce, in their own laboratories, interesting results of behavioral and other neurobiological experiments with laboratory animals. This generalization of research findings is a crucial part of the scientific process in several ways. Reproducibility, in the broad sense, is taken as a sign of reliability. Failures to reproduce a finding can help to prune the literature of false-positive results. Successful export of a finding to multiple laboratories allows a scientific insight to be explored with diverse methods not available to the original investigator. In the specific case of studies with stable, reproducible genotypes, results accumulate across laboratories both in space and in time. Thus, one of the longest-standing (and most reproducible) findings in the modern history of studies with inbred mouse strains is that inbred mice of the C57BL lineage prefer alcohol solutions to plain tap water, whereas those of the DBA lineage are near-teetotalers and many other inbred strains show intermediate levels of alcohol preference (Belknap and others 1993; McClearn and Rodgers 1959; Rodgers 1972; Wahlsten and others 2003a).
However, it is nearly impossible to replicate an experiment exactly. For behavioral studies with laboratory mice, the subject of this paper, it is flatly impossible. Interest in behavioral genetics and genomics is on the rise, driven by the revolution in genomic and informatics capabilities. One of the simplest meaningful behavior genetics experiments with mice is to compare multiple inbred strains on the same task. Within a strain, each same-sex animal is genetically identical to all others, so individual differences within a strain arise from environmental sources, whereas differences among animals of different strains derive from both genetic and environmental sources. When between-strain differences exceed within-strain variability, a significant genetic influence is demonstrated. Because animal husbandry has begun to pay attention to the details of a mouse’s genetic background, it is possible to study the same strains on the same behavioral tasks under multiple environmental conditions. Thus, strains might be studied for their activity in a novel arena during their circadian day and night, at different ages, or in a different apparatus. The extent to which mean strain responses on two tasks are correlated may be taken as an estimate of the degree to which a common set of genes influences both traits (Hegmann and Possidente 1981), and a substantial correlation would suggest that common neurobiological mechanisms are involved.
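The two quantitative ideas in this paragraph, comparing between-strain with within-strain variability and correlating strain means across tasks, can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in any study cited here. The one-way ANOVA F ratio stands in for “between-strain differences exceed within-strain variability,” and the strain-mean correlation follows the logic attributed above to Hegmann and Possidente (1981). The function names, strain labels and scores are all hypothetical.

```python
import math
from statistics import mean

def strain_means(scores):
    """Return the mean score of each strain; scores maps strain -> list of individual values."""
    return {strain: mean(values) for strain, values in scores.items()}

def one_way_f(scores):
    """F ratio of between-strain to within-strain mean squares (one-way ANOVA)."""
    groups = list(scores.values())
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strain_mean_correlation(task_a, task_b):
    """Correlate strain means on two tasks, using only strains tested on both."""
    means_a, means_b = strain_means(task_a), strain_means(task_b)
    common = sorted(set(means_a) & set(means_b))
    return pearson_r([means_a[s] for s in common], [means_b[s] for s in common])

# Hypothetical scores for four hypothetical strains (S1-S4) on two tasks.
activity = {"S1": [10, 12, 11], "S2": [20, 22, 21], "S3": [15, 14, 16], "S4": [30, 29, 31]}
rearing = {"S1": [5, 6], "S2": [9, 10], "S3": [7, 7], "S4": [13, 12]}

f_activity = one_way_f(activity)                      # between- vs within-strain variability
r_tasks = strain_mean_correlation(activity, rearing)  # strain-mean correlation across tasks
print(f"F = {f_activity:.2f}, strain-mean r = {r_tasks:.3f}")
```

A large F ratio means that strain means differ by much more than animals within a strain do. On Python 3.10 and later, statistics.correlation computes the same Pearson coefficient as pearson_r. With only four strains, as in this toy example, a strain-mean correlation is very imprecise.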
The purpose of this presentation is to discuss some examples of the interplay between genotypes and environments, drawn from the behavioral responses of inbred strains of laboratory mice. I start by distinguishing between two broad sources of environmental influence: the laboratory environment and the test environment. Features of the laboratory environment include (but are certainly not limited to) the local air supply and its humidity, local tap water, noise in the colony rooms, lighting (type, intensity, and light/dark cycle), caging, bedding, food, water delivery system, and all other aspects of husbandry practice. Many of these are unique to a given facility and cannot be exactly duplicated elsewhere (e.g., air), whereas others can be mimicked elsewhere (e.g., food, bedding). Features of the test environment include the specific apparatus; the details of the test protocols for handling, treating, and scoring the animals; transport to and from the colony and home cage; and the specific experimenters performing the work. Test environments are somewhat more amenable to standardization. The principal point of this paper is that the behavior of a strain often depends on specific features of the environment. In other words, gene-by-environment (GXE) interactions occur, even when the exact environmental source of the influence cannot be identified.
A MULTISITE TRIAL
Several years ago, my colleagues Doug Wahlsten at the University of Alberta in Edmonton, Bruce Dudek at the State University of New York at Albany, and I set out to evaluate the stability of strain differences in some simple laboratory behaviors. Our principal interest was whether the reliability of the genetic differences in a behavior that we saw routinely within each of our laboratories predicted the reliability of those differences across laboratories. After numerous phone calls, meetings, and emails, we decided that one straightforward way to address this question was to standardize the laboratory and test environments nearly completely. We had also often heard during our careers that mice purchased from a supplier “behaved differently” from those reared locally, even when the same inbred genotype was studied. Such complaints were usually accompanied by the confident assertion that the “stress of shipping” had caused the purchased animals to abandon the true path. We could find no data to support or refute this well-entrenched piece of laboratory lore. We decided to test males and females of eight genotypes in all three laboratories simultaneously on a battery of tests, and to compare home-bred mice directly with mice shipped from a breeder.
Over several hundred more emails and phone calls, we agreed on a common set of husbandry parameters. We purchased the same bedding and food (although the food came from local vendors) and adopted the same light/dark cycle and cage-changing schedule. We purchased seven inbred strains and one F1 hybrid as breeding stock at each site, set up matings on the same day, and bred mice locally. We also had age-matched mice of each genotype shipped to us for comparison. We built or purchased identical apparatus, adopted exactly the same test protocols, and, when the time came, tested 379 mice for activity, elevated plus maze behavior, accelerating rotarod performance, water escape learning, and activity again after a cocaine injection. After a weekend off, the mice were given a test of alcohol preference drinking.
The results were largely as we expected, but there were also surprises (Crabbe and others 1999; Wahlsten and others 2003a). For every task except time spent in the open arms of the plus maze, by far the most important variable was the genotype of the mice. Alcohol preference, for example, differed highly significantly among strains, and strain was the only variable that mattered (although, as was already well known, females drank more than males). The pattern of strain differences was nearly identical in all three laboratories, and it made no difference whether the animals were shipped or locally bred. Across all behaviors, the next most important variable was the site at which the test was performed; for example, mice appeared less anxious in the plus maze in Edmonton than at the other sites. The sexes rarely differed, and the effect of shipping was negligible for nearly all variables. However, there were significant GXE interactions for many tests (e.g., some strains responded differently to cocaine in Edmonton than at the other sites). Thus, despite a ferocious level of standardization, which amounted to eliminating as much environmental variability from the experiment as possible, some strains behaved somewhat differently in different laboratories on some tasks.
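A GXE interaction of the kind reported above is the part of the variation that neither genotype nor site explains on its own. For a balanced design, with the same number of mice in every genotype-by-site cell, the sums of squares can be partitioned as in the sketch below. This is a hypothetical illustration rather than a reproduction of the published analyses; the multisite trial also included sex and shipping condition as factors. The function name, genotype and site labels, and scores are invented. They were chosen to show a pure interaction: each genotype scores high at one site and low at the other, so neither factor has a main effect.

```python
from statistics import mean

def two_way_ss(cells):
    """Partition sums of squares for a balanced, fully crossed genotype x site design.

    cells maps (genotype, site) -> list of scores, with the same number of mice per cell.
    """
    genotypes = sorted({g for g, _ in cells})
    sites = sorted({s for _, s in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(genotypes) * len(sites):
        raise ValueError("every genotype must be tested at every site")
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean(x for v in cells.values() for x in v)
    g_mean = {g: mean(x for s in sites for x in cells[(g, s)]) for g in genotypes}
    s_mean = {s: mean(x for g in genotypes for x in cells[(g, s)]) for s in sites}
    cell_mean = {key: mean(v) for key, v in cells.items()}

    ss_genotype = n * len(sites) * sum((g_mean[g] - grand) ** 2 for g in genotypes)
    ss_site = n * len(genotypes) * sum((s_mean[s] - grand) ** 2 for s in sites)
    # Interaction: what remains of each cell mean after removing both main effects.
    ss_gxe = n * sum(
        (cell_mean[(g, s)] - g_mean[g] - s_mean[s] + grand) ** 2
        for g in genotypes for s in sites
    )
    # Within-cell variation among mice of the same genotype at the same site.
    ss_within = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)
    return {"genotype": ss_genotype, "site": ss_site, "GxE": ss_gxe, "within": ss_within}

# Hypothetical scores: two genotypes tested at two sites, two mice per cell.
cells = {
    ("G1", "site_A"): [9, 11],
    ("G1", "site_B"): [19, 21],
    ("G2", "site_A"): [19, 21],
    ("G2", "site_B"): [9, 11],
}
partition = two_way_ss(cells)
grand_mean = mean(x for v in cells.values() for x in v)
total_ss = sum((x - grand_mean) ** 2 for v in cells.values() for x in v)
print(partition, total_ss)
```

Each sum of squares can be divided by its degrees of freedom to give the mean squares for the usual F tests. The degrees of freedom are genotypes minus 1, sites minus 1, their product, and the number of mice minus the number of cells. In a balanced design the four components add up exactly to the total sum of squares.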
SOURCES OF INDIVIDUAL DIFFERENCES
The sources of environmental influence that led to strain-specific responses in the multisite trial could not be identified. However, a more recent experiment offers some plausible suggestions. Over several years, during which Jeff Mogil advanced from postdoctoral fellow to associate professor, he and his assistants collected baseline data on a simple, spinally mediated reflex response to acute pain in mice, the tail withdrawal reflex (Chesler and others 2002a,b). Each mouse had its tail immersed in 49°C water, and the latency to withdraw it was recorded. In all, 12 different experimenters had amassed data on 8,034 mice from 40 genotypes. Because his laboratory records were scrupulous, he knew each mouse’s age, sex, and weight, as well as the season, humidity, temperature, cage density, time of day, and order of testing within the cage. He and his collaborator Elissa Chesler hit on the idea of mining this remarkable data set to ascertain which variables best predicted individual differences in pain sensitivity. They used a classification and regression tree (CART) analysis, an automated data-mining technique that derives rules for partitioning the data recursively. Essentially, it builds “trees,” somewhat resembling pedigrees, through successive branch points, serially splitting the data along the most important factors until as much of the variability in the data set as possible has been accounted for. It can handle unwieldy data sets like this one, which contain empty cells and nested factors.
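The recursive partitioning described above can be sketched as a small regression tree over categorical factors. This is a simplified illustration under stated assumptions, not the software or settings that Chesler and colleagues used. Each split separates one level of a factor from all the others; full CART also considers other groupings of levels and prunes the tree by cross-validation. Growth stops at a fixed depth or minimum node size. Each split's reduction in squared error is credited to its factor, a simplified importance measure that lets the factors be ranked. The function names, factor levels and latencies are hypothetical.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean: the impurity of a regression-tree node."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, ys, factors, min_size):
    """Best one-level-versus-rest split as (gain, factor, level), or None if none is valid."""
    parent = sse(ys)
    best = None
    for factor in factors:
        for level in sorted({row[factor] for row in rows}):
            left = [y for row, y in zip(rows, ys) if row[factor] == level]
            right = [y for row, y in zip(rows, ys) if row[factor] != level]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, factor, level)
    return best

def grow(rows, ys, factors, importance, depth=0, max_depth=3, min_size=4):
    """Recursively split the data, crediting each split's error reduction to its factor."""
    leaf = {"predict": mean(ys), "n": len(ys)}
    if depth >= max_depth:
        return leaf
    split = best_split(rows, ys, factors, min_size)
    if split is None or split[0] <= 0:
        return leaf
    gain, factor, level = split
    importance[factor] = importance.get(factor, 0.0) + gain
    yes = [i for i, row in enumerate(rows) if row[factor] == level]
    no = [i for i, row in enumerate(rows) if row[factor] != level]
    return {
        "split": (factor, level),
        "yes": grow([rows[i] for i in yes], [ys[i] for i in yes],
                    factors, importance, depth + 1, max_depth, min_size),
        "no": grow([rows[i] for i in no], [ys[i] for i in no],
                   factors, importance, depth + 1, max_depth, min_size),
    }

def rank_factors(rows, ys, factors):
    """Grow a tree and rank factors by the total squared error their splits removed."""
    importance = {}
    tree = grow(rows, ys, factors, importance)
    ranking = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return tree, ranking

# Hypothetical tail-withdrawal latencies (s) with invented effects:
# experimenter E2 adds 4 s, strain S2 adds 2 s, afternoon testing adds 0.5 s.
rows, ys = [], []
for experimenter in ["E1", "E2"]:
    for strain in ["S1", "S2"]:
        for time in ["am", "pm"]:
            for noise in [-0.1, 0.1, -0.05, 0.05]:
                rows.append({"experimenter": experimenter, "strain": strain, "time": time})
                ys.append(5 + 4 * (experimenter == "E2") + 2 * (strain == "S2")
                          + 0.5 * (time == "pm") + noise)

tree, ranking = rank_factors(rows, ys, ["experimenter", "strain", "time"])
print(tree["split"], ranking)
```

In these invented data the experimenter effect is built in as the largest. The root split therefore separates experimenters, and the ranking lists experimenter, then strain, then time of day. With real archives, unbalanced and empty cells make the ranking far less tidy.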
A CART analysis can also rank the factors by how well they explain individual differences. In their analysis, the most important variable was the experimenter who performed the test, followed closely by the genotype of the mouse. Season, cage density, and time of day also mattered a great deal; the other variables were less important. An attractive feature of this study is that the authors then tested these predictions directly. They obtained 192 new mice from three strains, which were tested on the same day, either in the morning or in the afternoon, by one of two experimenters. This new experiment confirmed the importance of the experimenter, the genotype, and the time of day. In other words, the variables that the CART analysis predicted to be important were verified in an independent study (Chesler and others 2002a,b). It is entirely possible that in the multisite trial the experimenters, who necessarily differed among laboratories, elicited strain-specific responses on certain tasks.
THE BABY AND THE BATH WATER
Does this mean that behavioral genetics is doomed? Are behavioral responses simply too variable, as we often hear from our molecularly inclined colleagues? Is the answer to remove the experimenter from the experiment through automation? We tend to disagree with these gloomy thoughts. Rather, we think that the stability of genetic influences is often overlooked. Genotype was the strongest effect for nearly all behaviors in the multisite trial. As Doug Wahlsten and I have continued to explore GXE interaction across strains in our two laboratories, we have been studying 21 strains drawn largely from the Mouse Phenome Project A and B lists (Paigen and Eppig 2000). We recently searched the literature for evidence for or against strain differences in behavior that remain stable over the years (Wahlsten and others 2003b). We sought tasks for which several of the same substrains had been used and very similar phenotypes had been studied, even though apparatus and procedures could not have been exactly the same over many years. Thus, we allowed a great deal more environmental variability than we had in the multisite trial. For each trait, we had collected data in Edmonton in 2002, and data had also been collected identically in Portland in 2002. We then correlated the strain means from the Portland study and from older studies with those gathered in Edmonton in 2002.
Another piece of untested laboratory lore is that morphology is less variable than behavior. One trait for which there are many historical data is mouse brain weight. In addition to the Edmonton and Portland data for 21 strains from 2002, we found eligible studies from 2000, 1973, and 1967. The correlations of the Portland 2002, 2000, and 1973 data with the Edmonton data were all between .84 and .97, accounting for 71 to 95% of the variance. However, the oldest study correlated less well with the modern data (r² = 0.23), although it was based on only four strains. For open field activity, we found studies from another laboratory in 2003, the Portland 2002 data, and studies from 1968 and 1953. All four correlations yielded r² = 0.90! Clearly, activity in mice is at least as stable across laboratories (and decades) as brain weight, and appears to be more so. Not all findings were so stable, however. Although the Portland and Edmonton 2002 elevated plus maze outcomes correlated well (r² = 0.78), a study from 1993 showed only a very modest relationship (r² = 0.37), because three of the seven strains in common behaved very differently in the two laboratories.
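The percentages of variance in this paragraph are the squares of the correlation coefficients (r²). The caution about the four-strain study can be made concrete with Fisher's z transformation, a standard approximate confidence interval for a correlation. The sketch assumes that the correlation behind the 1967 comparison was positive, because only r² = 0.23 is reported. The function names are hypothetical, and the 21-strain interval is an illustrative what-if, not a reported result.

```python
import math

def r_squared(r):
    """Proportion of variance shared by two sets of strain means."""
    return r * r

def fisher_ci(r, n_strains, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z transformation."""
    if n_strains <= 3:
        raise ValueError("the Fisher z interval needs at least four strains")
    z = math.atanh(r)                               # transform r to an approximately normal scale
    half_width = z_crit / math.sqrt(n_strains - 3)  # standard error of z is 1/sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)  # back-transform the bounds

# Variance shared at the rounded bounds quoted for brain weight.
low_share, high_share = r_squared(0.84), r_squared(0.97)

# The 1967 comparison: only r² = 0.23 is given, so assume r is positive.
r_1967 = math.sqrt(0.23)
ci_four_strains = fisher_ci(r_1967, 4)
ci_same_r_21_strains = fisher_ci(r_1967, 21)  # hypothetical: the same r with 21 strains
print(low_share, high_share, ci_four_strains, ci_same_r_21_strains)
```

Squaring the rounded bounds .84 and .97 gives about 0.71 and 0.94. The stated 95% is consistent with an unrounded r slightly above .97. For four strains, the interval around r of about 0.48 runs from strongly negative to nearly 1, so such a correlation says little either way. The same r based on 21 strains would give a much narrower interval.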
Understanding complex traits can be advanced through studies with mouse genetic models. However, modeling genetic effects cannot rely on simplistic assumptions about the environment. Although any careful experimenter standardizes conditions within his or her own laboratory to achieve reliable genetic results, it cannot be assumed that within-laboratory reliability translates directly into across-laboratory reliability. Some features of the laboratory environment are nearly impossible to duplicate.
Attempts to standardize the test environment can help improve reproducibility across laboratories, but they are not a panacea. Several caveats to enforced standardization of conditions, from husbandry to apparatus and protocols, should be considered. First, use of a single set of standard conditions could lead to false-negative conclusions. For example, if the effect of a genetically engineered null mutation is not apparent under the standard conditions, and every laboratory adopts them, a real gene effect could be missed. Second, a good deal of time could be wasted exploring apparent gene effects that occur only under the standard conditions. Finally, failing to explore a range of environmental conditions may lead investigators to underestimate the actual genetic influence, which is very likely to be expressed as GXE interaction.
These studies were supported by grants from the National Institutes of Health (NIAAA and NIDA) and the Department of Veterans Affairs, and by the Natural Sciences and Engineering Research Council of Canada.
Belknap, J.K., Crabbe, J.C., and Young, E.R. 1993. Voluntary consumption of ethanol in 15 inbred mouse strains. Psychopharmacology 112:503-510.
Chesler, E.J., Wilson, S.G., Lariviere, W.R., Rodriguez-Zas, S.L., and Mogil, J.S. 2002a. Identification and ranking of genetic and laboratory environment factors influencing a behavioral trait, thermal nociception, via computational analysis of a large data archive. Neurosci Biobehav Rev 26:907-923.
Chesler, E.J., Wilson, S.G., Lariviere, W.R., Rodriguez-Zas, S.L., and Mogil, J.S. 2002b. Influences of laboratory environment on behavior. Nat Neurosci 5:1101-1102.
Crabbe, J.C., Wahlsten, D., and Dudek, B.C. 1999. Genetics of mouse behavior: interactions with laboratory environment. Science 284:1670-1672.
Hegmann, J.P., and Possidente, B. 1981. Estimating genetic correlations from inbred strains. Behav Genet 11:103-114.
McClearn, G.E., and Rodgers, D.A. 1959. Differences in alcohol preference among inbred strains of mice. Q J Stud Alcohol 20:691-695.
Paigen, K., and Eppig, J.T. 2000. A mouse phenome project. Mamm Genome 11:715-717.
Rodgers, D.A. 1972. Factors underlying differences in alcohol preference in inbred strains of mice. In: Kissin, B., and Begleiter, H., eds. The Biology of Alcoholism. New York: Plenum. p. 107-130.
Wahlsten, D., Metten, P., Phillips, T.J., Boehm II, S.L., Burkhart-Kasch, S., Dorow, J., Doerksen, S., Downing, C., Fogarty, J., Rodd-Henricks, K., Hen, R., McKinnon, C.S., Merrill, C.M., Nolte, C., Schalomon, M., Schlumbohm, J.P., Sibert, J.R., Wenger, C.D., Dudek, B.C., and Crabbe, J.C. 2003a. Different data from different labs: Lessons from studies of gene-environment interaction. J Neurobiol 54:283-311.
Wahlsten, D., Mosher, T., and Crabbe, J.C. 2003b. (In)stability of brain size and behavior over decades in different laboratories (Abstract). Int Behav Neur Genet Soc Abstr.