functionally interoperable with the clinical-trial registration database, ClinicalTrials.gov.

THE RESEARCHER PERSPECTIVE: COLLECTING, ANALYZING, AND REPORTING SEX-SPECIFIC DATA

Researchers encounter barriers to the reporting of sex-specific biomedical research results well before the publication stage, said session moderator Jon Levine, director of the Wisconsin National Primate Research Center and editor-in-chief of Frontiers in Neuroendocrinology. Challenges emerge in designing experiments, applying for grants, and making the most of limited funding inasmuch as these activities build on the existing knowledge base, which is historically biased toward males.

Collecting the Data: Sex in Biomedical Research

The Politics of Sex Differences

Biases against studying females are embedded in the research culture, and there are numerous misconceptions, said Larry Cahill, professor of neurobiology and behavior at the University of California, Irvine. In neuroscience, for example, some think that if there is no behavioral difference between the sexes, there is no brain difference. It is known, however, that identical behaviors can be manifested through different neurobiologic mechanisms. Others assert that consideration of sex differences makes things more complicated. But analyzing data by sex can sometimes provide clarity.

Cahill offered an example of sex differences in brain function from his work on emotional memory. He discovered that the amygdala operates differently in men and women when they watch the same emotional event; activity in the left-hemisphere amygdala is more predictive of memory of a given event in women, while activity in the right-hemisphere amygdala is more predictive of memory of the same event in men.

The greatest obstacle to moving forward, Cahill said, is the profound biases that exist against the consideration of sex differences. Such biases may be even greater in studies of the brain. Sex differences in the liver or kidneys are not particularly controversial, but sex differences in the brain can become a political issue. Cahill said that researchers need



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




to be bold and assert that male-only studies are not good enough anymore. How many false conclusions have been published as a result of failure to consider sex differences? he asked.

Male Bias in Animal Studies

One argument for the preferential use of males in animal studies, said Rae Silver, Kaplan Professor of Natural and Physical Sciences at Barnard College and Columbia University, is that females are more variable than males, partly because of cyclic reproductive hormones. There is evidence that some behaviors exhibit cycle-related variations, but in most instances there is little or no evidence that such variations make female models inappropriate.

But the arguments persist. One commentary cited by Silver described how a particular rat model of arthritis was more reproducible in male rats and that therefore far fewer males than females were needed to achieve statistically significant results. The researcher asserted, however, that the results were applicable to both sexes.

One argument that is true is that the cyclic nature of female sex hormones necessitates larger samples and more test groups in rodent work. Studying females requires more time, is more labor-intensive, and is more expensive than studying only males. Researchers must often justify the cost, as well as the increased use of animals, to their administration or institutional animal-care-and-use committee.

Silver questioned whether it would be possible to require the animal-research community to include both males and females when appropriate, as has been done for humans. Workshop participant Vivian Pinn, director of ORWH, responded that it takes great effort for NIH to monitor the mandated inclusion of women and minorities in clinical trials, and it could become overwhelming to monitor the sex of animals in studies in the same way. It would be more practical, and probably as effective, if researchers knew that information on the sex of animals was desired or required when submitting the results of studies for publication.

Sex Differences Across the Full Spectrum of Research

Denise Faustman, director of the Immunobiology Laboratory at Massachusetts General Hospital, noted that three large phase 3 clinical trials of type I diabetes products had recently failed; together, they were estimated to have cost over $3 billion to conduct. In two of those trials, she said, enrollment of males and females was fairly well balanced—
about 60–70% men. The preclinical data that informed the human trials, however, were obtained solely in female mice. She asked how these large, expensive trials might have been designed differently if pharmacokinetics or responsiveness or the stage of the disease had been studied in both male and female animals. The blame for failed clinical trials is shared equally by the clinical researchers who design and conduct the trials and the basic researchers who continue to publish data on only males or only females because it is easier. Sex differences must be considered and reported across the whole spectrum of research, Faustman said.

Subpopulations of Males and Females

A participant pointed out that males and females constitute broad subpopulations that can each be divided. For example, women in the follicular phase are different from women in the luteal phase; prepubertal women are different from postpubertal women; women taking hormone-replacement therapy are different from women who are not; and women taking estradiol and progesterone are different from women taking Premarin with medroxyprogesterone acetate. Similarly, men taking androgens are different from men who are not. Those who understand or study reproduction or endocrinology are more aware of these issues, but researchers in other fields often are not. A challenge is how to make researchers more aware. Cahill concurred, noting that in his early work on emotional memory he simply divided subjects into men and women, but he later discovered that the division had led to false conclusions. He failed to find enhancing effects of stress hormones on memory in women, he explained, because he had not accounted for menstrual cycle or the use of hormonal contraception.

Analyzing the Data: Methods of Subgroup Analysis

Analysis and Interpretation of Subgroups

Clinical-trial data reflect groups of participants, explained John B. Wong, chief of the Division of Clinical Decision Making at Tufts Medical Center, but each patient that a physician sees is a unique individual with unique risk factors, genetic profile, experiences, and medications. The question is which of the participants in a randomized controlled trial is the same as the patient about to be treated. That is the driving force for subgroup analysis.
14 SEX-SPECIFIC REPORTING OF SCIENTIFIC RESEARCH Wong offered a cautionary tale about subgroup analysis. The In- ternational Study of Infarct Survival, a randomized controlled trial of thousands of patients, found an overall statistical benefit of aspirin over placebo in prevention of death (ISIS-2, 1988). Sleight (2000) conducted an analysis of 12 subgroups and identified two that had a nonsignificant adverse effect. Those two subgroups, Wong revealed, were participants whose astrologic signs were Gemini and Libra. That is amusing at first, but Sleight, a noted statistician, stressed in his publication that “when clinicians believe such subgroup analyses, there is real danger of harm to the individual patient” (Sleight, 2000, p. 25). Frequentist Statistics and Null-Hypothesis Errors The frequentist statistical perspective, sometimes called the null hypothesis, begins with the position that a drug and a placebo are equal. Given that assumption, any observed differences in results would be due to chance. Given the alternative hypothesis that the drug and the placebo are different, observed differences in results would be due to differences between the drug and the placebo, but the null hypothesis is easier to test. Wong pointed out the problems of type I and type II errors, and the often greater concern about the former, and the problem of statistical power where an inadequate sample size increases the chance of a type II error. Wong further explained that two types of errors can occur in asso- ciation with a hypothesis that there is no difference between drug and placebo (Table 1): either the drug is truly beneficial or not, and the study either suggests that the drug is beneficial or not. A type I error occurs when the study results show that the drug is beneficial but in fact it is not—a false positive. There is less than a 0.05 probability ( = 0.05) that this would be the case if it were assumed that the drug was equivalent to the placebo. 
A type II error occurs when the study results show that the drug is not beneficial but, in fact, it is—a false negative. There is usually a probability of 0.1–0.2 ( = 0.2 or = 0.2) that this would be the case if it were assumed that the drug was equivalent to the placebo. The consequence of these two kinds of errors in subgroup analy- sis is multiplicity. For a type II error, if a drug is truly beneficial (the unknown truth is that it works), the probability that the study will erro- neously find the drug to be not beneficial is about 20% [1 80% = 20%]. Assuming that each subgroup is independent, and two subgroups are ana- lyzed, the probability of erroneously finding the drug to be not beneficial in at least one subgroup increases to 36% [1 (80%)(80%) = 36%]. With

OCR for page 11
12 subgroups, there is a 93% chance of an erroneous finding that the drug is not effective in at least one subgroup [1 − (80%)^12 = 93%]. For a type I error, if the drug is truly not beneficial, the probability that the study will erroneously find it to be beneficial is 5% if there are no subgroups [1 − 95% = 5%], 10% if there are two independent subgroups [1 − (95%)(95%) = 10%], and 46% if there are 12 subgroups [1 − (95%)^12 = 46%].

TABLE 1 Errors of Hypothesis Testing

                                    Truth
Study Result           Drug Beneficial            Drug Not Beneficial
Drug Beneficial        1 − β = 0.80 (power)       α = 0.05 (type I error)
Drug Not Beneficial    β = 0.20 (type II error)   1 − α = 0.95

SOURCE: Wong, 2011, Slide 6.

Having described the general concerns about subgroup analysis, Wong suggested Bayesian statistical inference as one possible approach to reporting of sex-based subgroups. Bayesian inference is a method of showing how knowledge or belief is altered by data (for further background, see Goodman, 1999). It provides a framework for combining prior belief or evidence with current evidence. The FDA guidance on using Bayesian methods for medical-device clinical trials, Wong said, describes it as “learning from evidence as it accumulates” (FDA, 2010, p. 5).

To illustrate the use of Bayesian inference, Wong asked: What is the probability that an asymptomatic woman 40–50 years old with a positive mammogram has breast cancer? Prior knowledge is that about 0.8% of asymptomatic 40- to 50-year-old women have breast cancer. In other words, of 1,000 asymptomatic women, based on prior knowledge of prevalence, eight (0.8%) would have breast cancer. Seven of those eight (90%) would have positive mammograms. However, 69 (7%) of the remaining 992 women who do not have breast cancer would have positive mammograms. The Bayes rule, or a Bayesian interpretation, Wong explained, would suggest that the probability of breast cancer in those with positive mammograms is 7 of the total positive mammograms (7 + 69), or 9%, because so many more women do not have breast cancer
than have breast cancer.4 Most physicians, Wong noted, guess that the likelihood is over 90%.

Another way to look at the data is with what Wong referred to as a likelihood ratio. If a patient has a positive mammogram, the likelihood that she has breast cancer is 90%, and the likelihood that she does not is 7%. Hence, the patient is about 13 times as likely to have breast cancer as not if she has a positive mammogram (90% ÷ 7% ≈ 13).

Wong also described the Bayes factor, which compares how well a hypothesis predicts the data (for further background, see Goodman, 1999). All information from a clinical trial is taken into account in the Bayes factor, Wong noted; the Bayes factor indicates the likelihood of the effect discussed above. In essence, it is the probability of the data given the null hypothesis vs the probability of the data given the alternative hypothesis. As opposed to the frequentist statistical perspective discussed above, there is a separation between the probability of error, which is the null hypothesis, and the weight of the evidence from a particular clinical trial, which is the Bayes factor. In other words, a Bayesian integration gains strength from prior information whereas a frequentist approach cannot.

A Bayesian approach formally integrates prior knowledge with data (“sequential learning”). However, it requires a subjective prior belief or evidence; conclusions depend on the prior evidence, and different investigators may use different prior evidence (which may actually help to determine how robust the conclusions are). A Bayesian approach can be used for hierarchical modeling, which combines results or “borrows strength” from different studies.
For example, if the national prevalence of diabetes in the United Kingdom is 2% with a standard deviation of 0.5% and, in a local sample of 1,000 patients in a given city, 1.5% have diabetes, with the Bayesian framework the national and local data could be integrated to estimate that 1.7% of the patients in the city have diabetes (with a 95% credible interval of 1.2–2.4%). In contrast, the frequentist approach could not integrate the national data and would estimate that 1.5% of the patients in the city have diabetes (with a 95% confidence interval of 0.8–2.5%). It has also been suggested that a Bayesian approach can be used in the design and conduct of clinical trials and would facilitate flexibility, including adaptive randomization and stopping criteria (Berry, 2005).

4 Wong referred participants to Calculated Risks by Gerd Gigerenzer (2002) for further discussion.
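The arithmetic behind Wong's examples can be checked with a short script. This is an illustrative sketch only: the independence of the subgroups, and the normal approximation used below to combine the national and local diabetes data, are simplifying assumptions made here, not necessarily the models Wong presented.

```python
# Sketch: checking the arithmetic in Wong's examples. Assumptions: subgroups
# are treated as independent, and the diabetes example uses a normal
# approximation with precision weighting (my choice, not stated in the text).
import math

# 1. Subgroup multiplicity: with power 0.80 and alpha 0.05, the chance of at
#    least one erroneous subgroup finding grows with the number of subgroups.
def prob_at_least_one_error(per_test_error: float, k: int) -> float:
    """P(at least one error across k independent subgroup analyses)."""
    return 1 - (1 - per_test_error) ** k

type2_12 = prob_at_least_one_error(0.20, 12)  # false negatives, 12 subgroups
type1_12 = prob_at_least_one_error(0.05, 12)  # false positives, 12 subgroups

# 2. Bayes' rule for the mammogram example: prevalence 0.8%, sensitivity 90%,
#    false-positive rate about 7% (69 of 992).
prevalence, sensitivity, fp_rate = 0.008, 0.90, 69 / 992
p_positive = prevalence * sensitivity + (1 - prevalence) * fp_rate
posterior = prevalence * sensitivity / p_positive  # post-test probability
likelihood_ratio = sensitivity / fp_rate           # positive likelihood ratio

# 3. "Borrowing strength" for the diabetes example: a normal prior
#    (2% +/- 0.5%) combined with local data (15 of 1,000) by precision
#    weighting, a standard normal-normal update.
prior_mean, prior_sd = 0.02, 0.005
local_p, n = 0.015, 1000
local_se = math.sqrt(local_p * (1 - local_p) / n)
w_prior, w_local = 1 / prior_sd**2, 1 / local_se**2
post_mean = (prior_mean * w_prior + local_p * w_local) / (w_prior + w_local)

print(round(type2_12, 2), round(type1_12, 2), round(posterior, 2),
      round(likelihood_ratio), round(post_mean, 3))
```

Run as-is, the script reproduces the figures quoted above: a 93% chance of at least one false-negative subgroup and a 46% chance of at least one false-positive subgroup with 12 subgroups, a roughly 9% post-test probability of breast cancer with a likelihood ratio of about 13, and a combined diabetes prevalence estimate of about 1.7%.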

Wong pointed out, however, how assumptions about prior evidence can affect interpretation of a new study and have large effects on the conclusions drawn. Berlin suggested the need for a “research czar” that could help to facilitate some level of consistency among similar studies, for example, in common terminology and definitions. Wong noted that the Patient-Centered Outcomes Research Institute has a methodology committee that is attempting to address some of the issues, such as methodological standards, that would help facilitate assessment of data among studies. From an industry perspective, Berlin said, a barrier to sharing clinical-trial data is that participants sign an agreement that dictates whether and how their information can be shared. He said that sponsors should develop participant agreements to facilitate sharing.

Risk Stratification

Frank Davidoff, editor emeritus of Annals of Internal Medicine, suggested risk-stratification analysis as an alternative to Bayesian statistics for applying clinical-trial results to an individual patient, and he referred participants to the work of Kent and Hayward (2007a,b). When multiple risk factors are used to segregate a sizable study population into risk subgroups, the difference in rates of outcomes can be as great as a factor of 50, he said. For example, a drug that demonstrates an overall beneficial effect may have virtually no beneficial effect in some subgroups, probably because their baseline risk is small to begin with. At the other extreme, the intervention may have a large clinical effect in patients who have a high baseline risk. For many researchers, risk stratification is less difficult to grasp than Bayesian analysis, and Davidoff suggested that it is statistically robust. Risk-stratification analysis can be applied to existing trials to look for differences in intervention effects among different groups, including sex.
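Davidoff's point about baseline risk can be made concrete with a toy calculation. The strata, event rates, and the assumed uniform 25% relative risk reduction below are invented for illustration; they are not taken from Kent and Hayward or from the workshop.

```python
# Toy risk-stratification sketch (all numbers invented): a treatment with a
# constant 25% relative risk reduction yields very different absolute
# benefits across baseline-risk strata.
baseline_risk = {"low": 0.002, "medium": 0.02, "high": 0.10}  # event rates
relative_risk_reduction = 0.25  # assumed uniform across strata

for stratum, risk in baseline_risk.items():
    arr = risk * relative_risk_reduction  # absolute risk reduction
    nnt = 1 / arr                         # number needed to treat
    print(f"{stratum}: ARR = {arr:.4f}, NNT = {nnt:.0f}")

# Baseline risks spanning a factor of 50 (0.2% vs 10%) produce a 50-fold
# spread in absolute benefit, even though the relative effect is uniform.
```

With these invented rates, the absolute risk reduction ranges from 0.05% (NNT 2,000) in the low-risk stratum to 2.5% (NNT 40) in the high-risk stratum, a 50-fold spread arising from a uniform relative effect.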
There are methodologic challenges to risk stratification, he noted, including the need for an independent determination of the risk groups, and there is a potential for type I and type II errors.

Reporting the Data

A Role for Journals

Silver referred participants to the report of a 2010 IOM workshop, Sex Differences and Implications for Translational Neuroscience
Research, which focused on defining roles for industry, government, academe, and journals in the translation of sex differences in neuroscience from bench to bedside. One of the suggestions raised at that workshop was that journal publishers set standards “for the inclusion of sex-related subject information in all publications, including sex of origin of tissues, cell lines, etc.” and “establish guidelines to encourage authors to analyze data by sex and to report sex differences, or the lack thereof” (IOM, 2011, p. 77). It was noted, however, that it is not possible to study everything all the time, and one of the challenges raised at the 2010 workshop was to set priorities.

Silver cited the work of Beery and Zucker (2011), who analyzed the distribution of animal and human male and female subjects in published studies in journals in diverse biologic disciplines. The sex of subjects was not specified in a number of journals; in many cases in which sex was noted, there was a male bias. Silver noted that in nearly every discipline that was the case more often in nonhuman studies than in human studies. Silver quoted five of the recommendations of Beery and Zucker (2011, p. 570) that were based on their findings:

     If male and female models are thought to differ in response to an intervention, then the study must be designed with adequate sample size to answer the question for each sex.

     If prior research strongly indicates that there are no significant sex differences between male and female animals, then sex is not required in subject sex selection, but study of both males and females is both feasible and encouraged.

     If information about the existence of sex differences is absent or equivocal, then both sexes should be studied in numbers sufficient to permit valid analysis.

     Outreach training activities offering practical suggestions and additional sources of information should be made available by the NIH to help investigators design studies that fully incorporate female animals. . . .

     The review process for extramural funding should treat inclusion of females as a matter of scientific merit that affects funding.

Journal policies determine manuscript reporting requirements, Silver said, and if journal editors believe that it is important to know the sex of origin of a cell type that is being studied or the sex of animal or human participants, investigators will have to include that information.
Cahill suggested that for studies (of a non–sex-specific issue) in which only one sex has been used, journal editors should make the last two words of the article title “in males” or “in females.” In addition to providing immediate clarity to basic researchers as they refer to the literature, this truth-in-advertising policy would raise awareness and would be a powerful statement that sex matters. Davidoff noted that that is similar to what was done in the mid-1990s in publications of randomized controlled trials. Such publications were not always easily identifiable, partly because “randomized controlled trial” was not included in the title and partly because the articles were not indexed as trials in the U.S. National Library of Medicine’s online database (MEDLINE®/PubMed®). Proper titling and indexing of papers allow researchers to study the frequency with which types of studies are published and allow meta-analyses to be done more quickly, easily, and completely.

Sex-Based Comparisons vs Reporting of Participant Sex

Judith Lichtman, associate professor in the Department of Epidemiology and Public Health at Yale University School of Medicine, suggested that in considering standardization of journal policies for sex-specific reporting, it is important to remember that there are studies that are designed to assess sex-based differences, or of which such assessment is a natural extension, and studies in which sex-related data would be interesting to know but are not necessarily the focus. Studies designed to analyze by sex and studies that simply note the sex of participants as an observation present different methodologic issues. The extent to which sex is considered affects the focus of the work, the analyses, and often the length of the resulting paper. She suggested that requiring sex-based analysis takes study-design decisions out of the hands of the authors and peer reviewers and that comparisons drawn from studies that were not designed to assess sex differences may not be robust and could be misleading.

Sex-specific analysis presents methodologic and analytic challenges. For example, sample size is important. There must be enough data for adequate statistical power and useful comparisons. When the events being studied are very rare, there can be unintentional bias in enrollment or a disproportionate blend of women or men among study sites. There may also be differences in prevalence or risk factors between males and females, and differences in psychosocial factors may come into play in comparisons. Lichtman added that older datasets that do not have the desired distribution of men and women can still be of value
even though they may not have adequate power: relationships may be apparent, and they can help in generating hypotheses.

Lichtman described her quick survey of August 2011 issues of the Annals of Internal Medicine, JAMA, and the New England Journal of Medicine. Of 11 original contributions, four included some level of sex stratification of data, five that she thought probably should have included sex-specific analysis did not, and in the remaining two it was not clear whether stratification would have been appropriate (for example, an investigation of a nationwide outbreak of Salmonella infections associated with peanuts). She stressed that it is important to consider when sex-specific analysis makes sense and when it does not.

Other Subgroups: Race and Age

There is no question that sex is an important difference and one that has been underreported in the literature, Lichtman said, but differences are also associated with race and age, and perhaps reporting policies will need to be extended to those categories—although when sex, age, and race are considered, data presentation and interpretation can become complicated, and thought must be given to which comparisons are most useful.

Workshop participant Pinn pointed out that the law requires NIH to include women and minorities and their subpopulations in clinical research. Analysis by race can be challenging, and researchers are often confused about how to address subpopulations. Although ORWH focuses primarily on women, the NIH National Institute on Minority Health and Health Disparities (NIMHD) focuses on minorities and other health-disparity populations. Both ORWH and NIMHD report data by race and by sex.

Data on Sex-Specific Reporting

Pinn stressed that in looking at data on sex-specific reporting, it is important to know what studies the data are based on, for example, whether the data are only for clinical trials, or for clinical trials and observational studies, or whether the data are for studies funded by NIH or for all studies. She noted that NIH has been conducting analyses of clinical research and in looking at 12,000 protocols in FY 2010 found that 56% of the 23.3 million participants were women. When sex-specific studies of diseases that affect only women or only men were excluded from the analysis, 51.6% of the participants in NIH-funded extramural