During his tenure at the New England Journal of Medicine, Kassirer said, many policies were developed and implemented, such as the Ingelfinger Rule regarding duplicate publication and conflict-of-interest policies. Researchers are eager to have their papers published in high-profile journals, such as the New England Journal of Medicine, JAMA, and Annals of Internal Medicine. As a result, editorial policies implemented by those journals can be effective in modifying behavior. If the major clinical journals banded together to promote policies that foster sex-specific reporting, it seems reasonable to assume that investigators would take notice. Kassirer said that there is no bar to journals’ introducing policies that require sex-specific reporting, as was done at JNCI.
Davidoff pointed out that until about 10–15 years ago, nearly all clinical-journal editors were men. In recent years, however, four of the five major general clinical journals have had female editors (today, two of the five have female editors). That is an important change, he said.
Legato suggested that editors put out a request for papers that directly address sex differences, perhaps for publication in a supplemental issue. Robert Golub, deputy editor of JAMA, said that his journal publishes theme issues on topics that are of immediate relevance, and sex differences could be considered as one of those topics. The goal of a theme issue is to highlight research or current thinking in a field. Floyd Bloom, former editor-in-chief of Science, supported the idea of editors’ commissioning special issues on themes as a way to stimulate sex-specific science that could then be appropriately reported in journals; he noted that this has been successful in the neurosciences.
In basic-science research, it is very rare that male and female animals are studied side by side, said Levine. Even more problematic is the fact that male animals are often the default choice to avoid confounding effects of the estrous cycle or other female-specific physiologic circumstances. Levine suggested that basic-science journals consider a policy whereby authors are asked to justify the use of males vs females and
have reviewers check to see whether authors have considered the implications of their choice of males vs females in the design, conduct, and interpretation of their experiments.
Blaustein supported the idea of including the justification of the sex of animals as one of the items that reviewers and associate editors should look for, noting that it would help to increase awareness. He added that in many cases the default choice is actually immature animals: researchers, thinking that this avoids the effects of the estrous cycle, mix immature males and females.
Blaustein cited an article on sex bias in biomedicine by Zucker and Beery (2010, p. 690), who suggested that
to correct the sex bias in animal research, we need stringent, strictly enforced measures, not voluntary appeals. Journal editors and reviewers should require authors of research studies that use only male or only female animals to state this in the title of their papers. This would highlight sex biases and spur researchers to balance the numbers of males and females that they use. Funding agencies should refuse to consider grant proposals that do not properly acknowledge the sex of the animals to be used, and favor those that include males and females, and analyze data by sex.
Guidance on conducting sex-differences research is available, Blaustein noted. For example, an article that was the product of a series of discussions sponsored by the Society for Women’s Health Research outlined strategies and methods for research on sex differences in behavior and the brain (Becker et al., 2005). The article was followed by a book, Sex Differences in the Brain, that discusses the subject in more depth (Becker et al., 2008). Guidance on designing and analyzing experiments to consider sex differences can be provided, Blaustein said, but that will not force researchers to look at both sexes.
Katrina Kelner, editor of the new journal Science Translational Medicine and former deputy editor for biology at Science, said that the issue of sex-specific reporting “is not on the radar screen” of the interdisciplinary journals (such as Proceedings of the National Academy of Sciences, Science, and Nature). Major issues that editors at Science have been working to address over the last 10 years include image manipulation, conflict of interest, and availability of the data and materials for the
analysis being reported, whether in climate change, physics, biomedicine, or another discipline.
A challenge for any journal is to decide how stringently to enforce an editorial policy, Kelner said. High-profile journals must balance the large volume of manuscripts submitted against the limited time of reviewers and staff. A particular challenge for editors at an interdisciplinary journal is that one’s expertise in any particular topic is limited. Science has around 20 editors, about half of whom are physical scientists and half of whom are molecular biologists. If Science adopted the sex-specific reporting recommendation of the 2010 IOM report Women’s Health Research: Progress, Pitfalls, and Promise, the journal would have to rely, to some extent, on outside reviewers, Kelner said. There are limitations in relying on outside reviewers to ensure that appropriate analysis has been done. It can be difficult to find suitable reviewers—most are quite busy—and, because papers can be heterogeneous, editors do not always get the needed advice from a review.
Science Translational Medicine has adopted some of the ICMJE recommendations—including the institutional review board, authorship, and clinical-trial registration requirements—but has not adopted the recommendation about sex-specific reporting (although Kelner noted that she intends to revisit the issue after the workshop).
To aid editors of interdisciplinary journals, Kelner suggested that a role for the IOM and the editorial organizations could be to ensure that educational resources are available to editors, including information that they can refer to when assessing sex-specific reporting in papers.
Study Design, Analysis, and Reporting
Golub shared the results of his informal audit of the 50 most recent randomized controlled trials published in JAMA. Of the 50, 21 reported single-sex results (on sex-specific topics) or presented results analyzed and stratified by sex. Of the 29 studies that did not report results by sex, 19 included at least 40% women, 22 at least 30% women, and 27 at least 20% women. Golub opined that only one of the 29 studies was possibly adequately powered to do subgroup analyses; he reiterated that statistical power is a recurring problem. Adler suggested that annual publication of data similar to those Golub presented would be helpful.
The issue of reporting results by sex is related not simply to the percentage of female participants in studies but to fundamental questions of adequate power and of whether subgroup analysis is appropriate. The harm in separating males and females in the absence of sufficient statistical power is the risk of errors, such as a type II error in publishing comparisons by sex that show no significant difference. Negative results do not necessarily mean that there is no difference.
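The power loss underlying this type II risk can be made concrete with a back-of-the-envelope calculation. The sketch below is a hypothetical illustration (the trial size, effect size, and 40% female enrollment are invented for the example, not drawn from any study discussed at the workshop), using the normal approximation for a two-sample comparison of means: a trial with 400 participants per arm has about 80% power to detect a standardized effect of 0.2, but a women-only subgroup analysis at 40% enrollment has well under 50% power, so a nonsignificant sex-specific comparison says little.

```python
from statistics import NormalDist

def power_two_sample(n_per_group: int, effect_size: float,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a
    standardized mean difference (effect_size = delta / sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Noncentrality parameter for equal group sizes
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return NormalDist().cdf(noncentrality - z_alpha)

# Full trial: 400 participants per arm, standardized effect 0.2
full = power_two_sample(400, 0.2)

# Women-only subgroup: 40% of each arm -> 160 per arm
women = power_two_sample(160, 0.2)

print(f"power, full trial:     {full:.2f}")   # ~0.81
print(f"power, women subgroup: {women:.2f}")  # ~0.43
```

With roughly 43% power, a null result in the subgroup is nearly as likely under a real sex difference as under none, which is exactly why negative sex-specific comparisons from underpowered analyses should not be read as evidence of no difference.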
Women’s Health Research states that “in the absence of a compelling reason not to, it should be assumed that there are sex differences in conditions” (IOM, 2010, p. 233). If the intent is to report raw data by sex for future research, that assumption is not likely to be an issue. For analysis of data by sex, however, Golub said that unless there is a priori evidence to support the assumption, the analysis may be flawed, and a Bayesian statistical approach may lead to false associations. In other words, a starting assumption that there is a sex difference for any association being studied could lead to the publication of false conclusions.
Gregory Curfman, executive editor of the New England Journal of Medicine, concurred that the real challenge in reporting clinical-trial results separately for males and females is whether there is adequate statistical power for subgroup analysis. The rationale for reporting data by sex is indisputable, he said, but if a clinical trial has not been adequately powered to look at males and females separately, the conclusions are not going to be statistically sound. In many cases, achieving statistical significance for subgroup analyses would require unattainable or unjustifiable numbers of participants. Curfman therefore cautioned against editorial policies that require trials to be designed to reach valid statistical conclusions for males and females separately. Such editorial policies would create a “steep mountain to climb for investigators and for funding agencies,” he said. In addition, as has been discussed, it may not be enough to report results only by sex. Other important demographics, such as race and age, add to the complexity of reporting results and of sample-size calculations for large clinical trials. Curfman also raised a concern that reporting results by sex in studies underpowered for valid subgroup analysis may be misleading and may be subject to misinterpretation by the health care community.
Science and Science Translational Medicine have a strong emphasis on data access, and Kelner endorsed the idea that even if a study has not been properly powered for male and female subgroup analysis, the resulting data should still be made available for others, perhaps in an appendix (with the appropriate warning that, because of statistical limitations, sex-specific data do not necessarily indicate a sex-specific differ-
ence). Ensuring that all data are available to everyone once a study report is published is an important role for journals, Kelner said. Berlin added that appropriate study design and use of common definitions would improve the ability to conduct meta-analyses of such archived data.
Current NIH policy on the participation of women in clinical trials has not achieved the desired effect of having enough women enrolled in all studies for sufficient statistical power, Golub said. And statistical power is an ever-increasing problem. For example, improvements in standards of care mean that control groups have better outcomes and that more participants must be enrolled to find a significant difference between control and test treatments. In some studies, composite outcomes (such as major adverse cardiac events) are already necessary for achieving adequate power, and this limits the possibility of valid subgroup analysis on individual outcomes. Composite outcomes can hide important effects, Golub noted, and often the events that are driving a difference are the less important events, such as rehospitalization, whereas the number of deaths may not be statistically different or may be underpowered for analysis. Given those challenges, is it feasible to have enough women enrolled to facilitate statistical analyses of the sexes separately? Will a Bayesian statistical trial design enable studies with a lower aggregate enrollment? Once the primary hypothesis and primary outcomes are established, the necessary enrollment is determined on the basis of power analysis and expected attrition. Adding more patients is expensive, Golub said, and researchers cannot simply over-enroll in anticipation of unplanned subgroup analyses.
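Golub's point about improving standards of care can be illustrated with a standard two-proportion sample-size calculation; the sketch below is hypothetical (the event rates, risk reduction, and attrition figure are invented for the example). As the control-group event rate falls, the absolute difference corresponding to the same relative risk reduction shrinks, and the required per-arm enrollment grows; the planned enrollment must then also be inflated for expected attrition.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p_control: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided comparison of two
    proportions (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_control * (1 - p_control)
                        + p_treated * (1 - p_treated))) ** 2
    return ceil(num / (p_control - p_treated) ** 2)

def enrollment(n_required: int, attrition: float) -> int:
    """Inflate the required sample for expected dropout."""
    return ceil(n_required / (1 - attrition))

# Same 25% relative risk reduction, but control event rates of 10% vs 6%
n_old = n_per_group(0.10, 0.075)  # roughly 2,000 per arm
n_new = n_per_group(0.06, 0.045)  # roughly 3,500 per arm

print(n_old, n_new)
print("enroll per arm at 15% attrition:", enrollment(n_new, 0.15))
```

The lower event rate alone increases the required enrollment by roughly three-quarters before any sex-stratified subgroup analysis is even contemplated, which is why simply adding patients in anticipation of unplanned analyses is rarely affordable.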
Blaustein questioned the extent to which the number of participants would need to be increased to see an effect of sex. In animal studies, he noted, sex differences are often apparent even in small studies. Kassirer responded that as the difference in effect between two therapies becomes smaller and smaller, larger and larger studies are needed to be able to identify it. There are differences between animal studies and human studies and between different types of human studies, Golub added. For example, in clinical trials with major cardiac outcomes, severe events, including death, have become much more rare, and the effect size that the investigator is willing to accept may be fairly small compared with a more basic study in which one would be able to detect effect differences with 10 subjects in each group. Sufficient power for subgroup analysis may be easier to achieve in some types of studies and harder to achieve in large clinical trials with rare outcome events.
A participant raised a concern about the reliance on a statistical p value of 0.05 for accepting or rejecting a hypothesis. In some cases, a
finer p value, such as 0.000001, might be appropriate, and in others, a p value of 0.20 might be acceptable. It may be necessary sometimes to allow more leeway in interpretation (which, it was noted, is not the same as exploratory data analysis).
Christine Laine, editor-in-chief of Annals of Internal Medicine, said that publishing a large trial without sex-specific results does not necessarily mean that there are no sex-specific results; when discussing the limitations of a study, authors should point out that there could be sex differences. Absence of evidence is not evidence of absence, Davidoff agreed. He added that statisticians are often involved in clinical trials at a late stage and probably should be engaged much sooner.
One of the quandaries that an editor faces, Golub said, is the need to cater to two different readerships: researchers and clinicians. Researchers are interested in hypothesis-generating findings to inform the design of their next study, but the mission of medical journals is to publish studies that will affect clinical care, and it is increasingly difficult for clinical readers to understand the articles. New methods, including Bayesian analysis and adaptive clinical trials, will allow studies to be done without the need to anticipate everything from the start. The downside is that editors will need to re-educate themselves so that they can understand the methods. More statisticians will be required as it becomes more difficult to find reviewers who can review the statistical design of a study. And again, there is the issue of whether readers will be able to understand the reports.
A dilemma in drug development, said Marietta Anthony, director of women’s health programs at the Critical Path Institute, is that clinical studies are powered to show that a therapy is safe and effective in general, not necessarily safe and effective specifically in females or males. However, FDA guidelines mandate that medical products be demonstrated to be safe and effective in the populations that will use them, so there is some question of interpretation. Parekh explained that FDA guidance documents for preclinical animal studies recommend studying both male and female animals. Phase 1 clinical studies look for safety and pharmacokinetic differences among subpopulations of healthy volunteers, including women and men. If there are significant differences, they are monitored carefully in early phase 2 safety and efficacy studies. Early phase 2 studies in patients almost always include women. Phase 3 clinical trials are hypothesis-driven studies with prospectively defined end points and involve large numbers of patients. Sometimes the overall data will show no effect of a product but subgroup analysis reveals a significant effect in a particular population. Such a finding is hypothesis-
generating, Parekh said, and another prospectively designed trial is then conducted to confirm the finding.
Exploratory Subgroup Analyses
Kramer said that in his experience, the issue with subgroup analyses in observational studies is not getting authors to do sex-specific analyses but preventing overinterpretation of results of exploratory analyses. Exploratory analyses—whether they are sex-specific or specific for comorbidity, age, race, or socioeconomic status—are different from prospective primary end-point analyses that are protected by randomization. JNCI does not publish post hoc analyses or exploratory analyses in article abstracts, and such analyses are clearly labeled in the results.
Golub concurred in the need to exercise caution on exploratory analyses. When the primary outcome of a study is negative, authors will often stress a positive secondary or post hoc finding. Identifying and addressing it is part of the peer-review process. Like JNCI, JAMA does not include exploratory analyses in abstracts and requires that they be clearly described as exploratory in the text. Golub noted that there is a difference between a prespecified secondary analysis and a post hoc exploratory analysis.
Sex Subgroups
Sex hormones influence virtually all cells, and stage of reproductive life and development should be considered in designing and reporting studies, Blaustein said. For instance, children are not the same as adults, and estrous-cycling females are not the same as acyclic females. He acknowledged that considering such factors requires some knowledge of reproductive endocrinology or the involvement of a reproductive endocrinologist, which is unlikely in most cases. However, he pointed out that it is important to begin thinking about variations within the sexes.
Influence of Journal Editorial Policies
ICMJE and journal policies can influence researcher behavior. For example, Golub said, it is now rare to receive a report of a clinical trial that was not registered in a public database in accordance with the ICMJE policy, and all submissions are accompanied by author disclosure forms. It is important to note, however, that although a trial must be registered before the study begins enrollment, most policies (for example, requiring financial disclosure forms and race and ethnicity statements), unlike a sex-specific reporting policy, require little of the authors until
the study is complete. Even though the trial-registration policy changed researcher behavior, its implementation was straightforward and did not require any major changes in study design or any substantial investment of resources. Kramer added that journals can exert important influence on the quality of reporting. For example, journal editors have become more attuned to looking for discrepancies between a primary end point in the original study as designed and a primary end point that the authors discuss in their paper.
Golub questioned the extent to which an editorial policy could affect study design relative to sex-specific reporting, in light of the study-design issues involved (such as adequate power). Changing editorial policy might send a signal to researchers who are beginning to design a study, but would they be able to make the changes needed and design studies with adequate power? Larger studies would require additional funding, which may not be available. Changing standards and publishing analyses that are not likely to be valid is not a good solution either. It comes down to the design of the studies, and, inasmuch as study design has not changed substantially in the last 10 years, it is not clear what would make it change now. If editorial policies required sex-based analysis, would the funding follow?
As noted in Women’s Health Research (IOM, 2010), not considering sex and gender differences in the design, analysis, and reporting of studies has limited understanding of important sex differences and slowed progress in women’s health. Laine stressed that problems in the design and analysis of a study cannot be fixed simply by changing reporting requirements. Journal editors can ask authors to reanalyze their data but cannot ask them to redesign their studies and redo them. Journals could reject papers that do not report sex-specific results, but that is unlikely to happen.
Journals do not provide research funding, and Blaustein suggested that changes in how experiments are done start with funding agencies. Kelner concurred that funding agencies need to be partners in encouraging good research practices. There needs to be a culture shift within science. In a number of comments, participants raised questions about what editors and publishers can accomplish by setting standards for authors, whether recommendations or mandates, versus the role of federal agencies and other funders in shaping a research culture that embraces consideration of sex differences as part of sound study design.
As discussed earlier, the ICMJE policy is specific to medical journals. Laine listed several other editorial associations and published
guidelines that do not address sex-specific reporting at all, including the Council of Science Editors (which covers science broadly, not only the biologic but also the physical sciences), the World Association of Medical Editors, the Guideline on Good Publication Practice of the pharmaceutical industry, and the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network, an umbrella organization that catalogs numerous reporting guidelines, such as CONSORT (Consolidated Standards of Reporting Trials) for randomized trials and STROBE (Strengthening the Reporting of Observational studies in Epidemiology) for observational studies. (The CONSORT and STROBE guidelines cover the majority of the clinical research published in the major journals.)
Annals of Internal Medicine does not have a specific policy on sex-specific reporting but follows the ICMJE policy, Laine said, and encourages authors to follow reporting guidelines, including CONSORT and STROBE. She added that for many years, Annals has indicated in the title and abstract when a study includes only men or only women and indicates in the limitations of the study if data are insufficient to examine potentially relevant sex differences or racial or ethnic differences. Annals does not ask authors to report sex-specific results when the study design is insufficient to enable useful reporting of such results.
A journal can put a policy into place, but there has to be a way to implement it. Laine offered clinical-trial registration as a case example: the ICMJE put its registration policy into place before there was anywhere for sponsors to register their trials. Similarly, journals could require that studies be powered for subgroup analysis, but that would entail the availability of resources to fund those types of studies. It will not work if the funders, researchers, and journal editors are not aligned. Editors can foster more accurate reporting, but must be careful about making requirements that are not feasible, Laine cautioned. As data-sharing advances, journals may be different a decade from now, and researchers whose studies do not meet some set criteria may move away from traditional journals and publish their results in an open-access setting. Berlin added that a motivation for trial registration was to eliminate publication bias—to make all results available regardless of whether they are positive or negative. He cautioned that a situation in which only studies with sex-specific results are published is not desirable.
Another issue is the long pipeline of current high-quality clinical trials, many with long-term followup. These will be coming to completion over the next decade or later, and panelists discussed how any editorial policy that affects study design would need to be phased in over an
extended period. In the interim, Golub said, it is unlikely that journals would forgo publishing a well-done, informative study that could affect patient care solely because it lacked enough power to permit valid sex-specific reporting. Conversely, journals are not willing to publish poor-quality studies or invalid or meaningless data or analyses.
As discussed earlier, the present workshop was designed to consider a recommendation in Women’s Health Research (IOM, 2010) that “the International Committee of Medical Journal Editors and other editors of relevant journals should adopt a guideline that all papers reporting the outcomes of clinical trials report on men and women separately unless a trial is of a sex-specific condition” (IOM, 2010, p. 13). Laine raised several concerns about editorial policies that might be developed on the basis of that recommendation. First, it appears that observational studies are not included. Second, as discussed above, many trials include insufficient numbers of women or men to allow valid comparisons or within-group conclusions. Third, if randomization was not stratified by sex, the results should not be interpreted as causal relationships. Finally, simply reporting sex-specific results does not address the question of whether any of the observed sex differences are due to sex or to confounding factors.
On the basis of their experiences in implementing editorial policies, the panelists offered a variety of suggestions regarding the inclusion of sex-specific information in scientific publications (summarized in Box 2).
It was also suggested that the ICMJE consider adopting a stronger sex-specific analysis and reporting statement similar to that of JNCI. Laine predicted, however, that ICMJE members would question why only sex was being addressed and not other key factors, such as age, race, ethnicity, and insurance coverage.