The fourth session of the workshop focused on new and emerging designs for intervention studies. Intervention study designs are experimental designs to establish whether an intervention (such as a drug treatment or behavior modification) has the desired impact. There are challenges in designing intervention studies for small populations, as well as for studies that examine multiple determinants of health using multilevel interventions at the community level. The session, moderated by steering committee member James Allen (University of Minnesota), provided a brief conceptual portrait of work that is making the tools of intervention science accessible to small populations.
Amy Kilbourne (University of Michigan) provided an overview on designs used for dissemination and implementation research with small populations. She was followed by Christine Lu (Harvard University), who discussed quasi-experimental designs. Diane Korngiebel (University of Washington Medical School) addressed ethical and qualitative challenges of research with small populations. Invited discussant Patrick H. Tolan (University of Virginia) summarized key points.
DESIGNS FOR DISSEMINATION AND IMPLEMENTATION RESEARCH FOR SMALL POPULATIONS
Amy Kilbourne provided an overview of implementation and dissemination research and examined key intervention study designs that focus on implementation strategies—hybrid designs, stepped-wedge designs, and
sequential multiple assignment trials (SMART) designs. She described how they have been applied in particularly small and vulnerable populations.
The National Institutes of Health (NIH), she noted, has defined dissemination and implementation research:1
Dissemination research is the scientific study of targeted distribution of information and intervention materials to a specific public health or clinical practice audience. The intent is to understand how best to spread and sustain knowledge and the associated evidence-based health interventions.
Implementation research is the scientific study of the use of strategies to promote the uptake of evidence-based health interventions in clinical and community settings in order to improve patient/population outcomes.
Different types of designs used in dissemination and implementation research include classic randomized controlled trials and pragmatic clinical trial designs, interrupted time series, dynamic wait list designs, regression point displacement designs, and stepped-wedge designs. Kilbourne focused on hybrid effectiveness/implementation, stepped-wedge, and SMART designs because of the emerging science in implementation research that uses them.
Kilbourne illustrated a pathway from pre-intervention stage efficacy studies, which are the studies first done in a tightly controlled population; to effectiveness studies, which evaluate the effectiveness of an approach; to adoption, sustainability, and moving to scale, which often occur at the dissemination and implementation phase. SMART and adaptive designs can be used in any phase. Hybrid designs are used at the effectiveness, dissemination, and implementation phases. Stepped-wedge designs are used at the dissemination and implementation phases.
She said these types of study designs are used to answer particular questions about the best implementation strategies related to provider organizational behavior change. Implementation strategies are tools and methods that provide technical and interpersonal supports or methods that help providers adapt or adopt, sustain, and scale effective practices into routine care. According to Kilbourne, the words “technical” and “interpersonal” are two key words in implementation science. Examples of technical strategies are toolkits or training that are focused on how to help providers learn a new skill or intervention, or how to get teams of providers in a small clinic to work together. Interpersonal skills include leadership,
1 NIH PAR 16-238: Dissemination and Implementation Research in Health (R01).
business strategy, and strategic thinking—essential when community health workers work with a larger community organization, or when front-line providers or nurses work with physicians and health care executives. Both are essential for successful implementation strategies.
Hybrid designs can compare the effectiveness of different implementation strategies or look at both effectiveness of a clinical intervention and the strategy used to implement that intervention. Stepped-wedge designs are often used in clustered ways, randomizing sites to different types of implementation strategies. SMART and adaptive designs can be used throughout the whole process, including in effectiveness or efficacy trials to determine whether sites or people are responding to an initial intervention or strategy.
Kilbourne emphasized the importance of studying implementation strategies by pointing to the gap between research creation (the discovery of new evidence-based practices) and implementation so that people can benefit. Providers may choose not to implement for a variety of reasons, ranging from an intervention being too difficult to its not having been tested on the relevant patient population. Whether or not researchers publish research results has also been an issue, as well as whether the published reports are usable. Even effective and implementable interventions may or may not be implemented in practice. For every dollar spent on medical research, 20 cents may result in some value, she said.
Implementation science is intended to address this research-to-practice gap. Different challenges call for different strategies to consider. For example, some interventions may not be used for small populations. In this situation researchers may need to consider tools or methods to adapt to local settings and populations. Kilbourne referred to the barriers, discussed earlier, in sampling and having a sufficient number of sites to test clinical interventions in small practices or populations. Another barrier arises when interventions are rolled out with limited planning. There is always an urgency to move forward quickly, she acknowledged, but it is not necessarily easy to do. Intervention reach is also hard to sustain, she said, especially related to health.
A notable exception relates to U.S. Veterans Health Administration efforts to end homelessness. A plan was put in place to work with different agencies through a cross-agency priority goal that included training, facilitation, and community engagement. She said that this was a wonderful example of how planning of the implementation strategies ahead of time can make a big difference.
Hybrid Effectiveness/Implementation Designs
Kilbourne explained how hybrid effectiveness/implementation designs compare implementation strategies, address the limits of stepwise research and the research-to-practice gap, promote external validity, and blend the effectiveness and implementation stages. These designs are ideally used when there is an imperative to do something for a population but little evidence for a particular intervention. With a hybrid design, both effectiveness and implementation are tested.
There are three different types of hybrid implementation designs (see Table 5-1). Hybrid Type I is a standard effectiveness study with observations of its implementation process. A researcher observes the process to see whether or not the clinical intervention itself is effective. Hybrid Type III starts with an intervention that is presumably effective, but it is not known how best to implement and sustain it, so different implementation strategies are tested. Hybrid Type II is a combination of Types I and III, in which an implementation strategy is used in combination with the clinical intervention. The control group is often offered usual care, and the design is to test whether the whole package can/should be implemented and sustained.
TABLE 5-1 Hybrid Effectiveness/Implementation Designs
| | Type I | Type II | Type III |
| --- | --- | --- | --- |
| Design Characteristic | Test clinical intervention | Test clinical and implementation strategies | Test implementation strategy |
| Question | Is treatment effective vs. usual care? | Is treatment delivered through tailored provider coaching effective vs. usual care? | Does provider coaching vs. training alone improve treatment uptake? |
| Unit of Analysis | Patient | Providers/clinics | Providers/clinics |
| Primary Outcomes | Health outcomes | Process measures | Provider uptake, sustainability |
| Key Advantage | “Cleanest” in terms of determining effectiveness | Ideal when there is time-sensitive need to roll out intervention | All participants get intervention, focus on what it will take to sustain |
SOURCE: From workshop presentation by Amy Kilbourne.
Kilbourne provided examples of the three types. In the example of Type I,2 she and her colleagues studied a collaborative care model for Aetna enrollees (privately insured individuals) with mood disorders from small group practices. About 90 percent of Aetna enrollees were seen by a solo or small-group primary care practitioner. These small practices did not have the capacity to provide what is considered evidence-based treatment for depression—the collaborative care model. The collaborative care model in this study included care management and self-management support. Care management was provided by a national phone-based social worker who linked each person to the providers needed for recovery care and symptom management. The control group participants received usual care, which consisted of wellness mailings. The average depression symptom score was lower in the intervention group than in the control group over a 12-month period.
In the Type II example,3 doctor-office collaborative care was implemented to improve pediatric behavioral health outcomes. This was a variation on the collaborative care model but was for pediatric practices for children with behavioral health needs. It included care management and family support. In addition, it had provider training and support within those clinics. Those providers not only delivered the care management and family support; they got additional training on how to do it, and they got consultation on making sure that they did it properly. Usual care was standard primary care. The primary outcome here was quality of care. The doctor-office collaborative care improved quality of care over usual care within a 12-month period.
Kilbourne noted that a Type III study requires an implementation strategy. She and her colleagues have used an implementation framework prepared by the Centers for Disease Control and Prevention4 called the Enhanced Replicating Effective Programs (REP) Implementation Strategy that has
2 Kilbourne, A.M., Nord, K.M, Kyle, J., Van Poppelen, C., Goodrich, D.E., Kim, H.M., Eisenberg, D., Un, H., and Bauer, M.S. (2014). Randomized controlled trial of a health plan-level mood disorders psychosocial intervention for solo or small practices. BMC Psychology, 2(1):48. doi: 10.1186/s40359-014-0048-x.
3 Kolko, D.J., Campo, J., Kilbourne, A.M., Hart, J., Sakolsky, D., and Wisniewski, S. (2014). Collaborative care outcomes for pediatric behavioral health problems: A cluster randomized trial. Pediatrics, 133(4):981-992. doi: 10.1542/peds.2013-2516.
4 REP was developed by the CDC to rapidly translate prevention programs to community-based settings (Social Learning Theory, Rogers Diffusion model). See Kegeles, S.M., Rebchook, G.M., Hays, R.B., Terry, M.A., O’Donnell, L., Leonard, N.R., Kelly, J.A., and Neumann, M. (2000). From science to application: The development of an intervention package. AIDS Education and Prevention, 12(Suppl. 5):62-74; and Kilbourne, A.M., Neumann, M.S., Pincus, H.A., Bauer, M.S., and Stall, R. (2007). Implementing evidence-based interventions in health care: Application of the replicating effective programs framework. Implementation Science, 2:42. doi: 10.1186/1748-5908-2-42.
three phases: pre-implementation, implementation, and dissemination. These phases focus on packaging the intervention and identifying quality gaps and barriers, customizing the treatment package based on local input, and listing the essential core elements of the clinical intervention with some options for adaptation. It is essential to provide a community-based organization the opportunity to adapt interventions.
Her first example of a Hybrid Type III study was a pioneering intervention in HIV prevention in small communities5 that compared the effectiveness of having only a packaged manual of an HIV prevention intervention delivered to an AIDS service organization, versus delivering the manual plus training, versus delivering the manual, training, and additional technical assistance. She showed charts illustrating that the more implementation strategies provided, the more uptake of the intervention.
Her second example was a VA project that focused on homelessness prevention.6 This was a comparative effectiveness study of two implementation strategies: standard REP, with training and passive technical assistance (providers called if they needed help), versus enhanced REP, with active facilitation and coaching of the frontline providers responsible for providing outreach services to veterans with serious mental illness who had dropped out of care. After 6 months, the sites that started with immediate enhanced REP showed improvement over time, although that improvement began 3 months into the process. At the end of Phase 2, after the sites that had been given standard REP were given facilitation, they caught up to the group that had received enhanced REP from the beginning.
Kilbourne said that stepped-wedge designs are ideal when there is an intervention known to cause more good than harm, but resources are too limited to provide the intervention to all participants at the same time. In stepped-wedge designs, all participants receive the intervention, but the
5 Kelly, J.A., Somlai, A.M., DiFranceisco, W.J., Otto-Salaj, L.L., McAuliffe, T.L., Hackl, K.L., Heckman, T.G., Holtgrave, D.R., and Rompa, D. (2000). Bridging the gap between the science and service of HIV prevention: Transferring effective research-based HIV prevention interventions to community AIDS service providers. American Journal of Public Health, 90:1082-1088.
6 In this study, authors used Enhanced REP, adding facilitation (regular coaching by implementation expert) to support providers in implementing self-efficacy through identifying/mitigating barriers to adoption, building coalitions at sites, and enhancing communication with leaders. See Kilbourne, A.M., Almirall, D., Goodrich, D.E., Lai, Z., Abraham, K.M., Nord, K.M., and Bowersox, N.W. (2014). Enhancing outreach for persons with serious mental illness: 12-month results from a cluster randomized trial of an adaptive implementation strategy. Implementation Science, 9:163. doi: 10.1186/s13012-014-0163-3.
start-time is randomized. She cited an example of a provider facilitation–collaborative care stepped-wedge design in mental health clinics.7
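The defining feature of a stepped-wedge design, randomizing only the crossover time, can be sketched in a few lines of Python. This is a minimal illustration, not code from the studies described here; the function name and the even split of sites across steps are assumptions for the sketch.

```python
import random

def stepped_wedge_schedule(sites, n_steps, seed=0):
    """Assign each site a randomized crossover step (1..n_steps).

    Every site eventually receives the intervention; only the start
    time is randomized, spread as evenly as possible across steps.
    """
    rng = random.Random(seed)
    order = list(sites)
    rng.shuffle(order)  # randomize which sites cross over first
    return {site: 1 + i % n_steps for i, site in enumerate(order)}
```

Because every site is eventually treated, the comparison comes from contrasting treated and not-yet-treated periods across the steps.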
Sequential Multiple Assignment Trial (SMART) Designs
SMART designs are multistage trials that follow the same subjects throughout. Each stage corresponds to a critical decision point with pre-specified measures of responsiveness. At each randomization, subjects are randomized among a set of treatment options that is restricted depending on their history of responsiveness. The goal of a SMART design is to inform the development of an adaptive intervention strategy. This approach identifies which sites or organizations might need the most intensive level of implementation support or strategies.
SMART designs can be used when there is insufficient evidence to decide which implementation strategy to start with or how to phase in additional strategies. Kilbourne provided an example of a SMART design called the Adaptive Implementation of Effective Programs Trial, or ADEPT study.8 The research question was how best to implement a collaborative care model in community-based practices to improve patient mental health outcomes. The study was conducted in small practices in several counties across Michigan and Colorado. Three implementation strategies were considered: (1) standard REP, consisting of the intervention package, training, and brief passive technical assistance; (2) external facilitation, having a clinical expert call in to providers and provide guidance or mentoring on a regular basis; and (3) a combination of external and internal facilitation, with internal facilitation involving someone onsite to support providers in implementing the collaborative care model. The least expensive version was standard REP, and the most expensive in terms of money and time involved external and internal facilitators.
The study started with one implementation strategy where everyone received REP. Nonresponders were randomized either to added external facilitation (phone-based collaboration on using the collaborative care model) or to added external plus internal facilitation (in-person support
7 Bauer, M.S., Miller, C., Kim, B., Lew, R., Weaver, K., Coldwell, C., Henderson, K., Holmes, S., Seibert, M.N., Stolzmann, K., Elwy, A.R., and Kirchner, J. (2016). Partnering with health system operations leadership to develop a controlled implementation trial. Implementation Science, 11:22. doi: 10.1186/s13012-016-0385-7.
8 Kilbourne, A.M., Almirall, D., Eisenberg, D., Waxmonsky, J., Goodrich, D.E., Fortney, J.C., Kirchner, J.E., Solberg, L.I., Main, D., Bauer, M.S., Kyle, J., Murphy, S.A., Nord, K.M., and Thomas, M.R. (2014). Protocol: Adaptive Implementation of Effective Programs Trial (ADEPT): Cluster randomized SMART trial comparing a standard versus enhanced implementation strategy to improve outcomes of a mood disorders program. Implementation Science, 9:132. doi: 10.1186/s13012-014-0132-x.
for a clinic administrator to help incorporate the collaborative care model in routine practice). All were followed up to identify whether sites were responding to the strategy or not. Nonresponders to REP plus external facilitation alone were randomized to REP plus external facilitation (no change) or REP plus internal and external facilitation. The response variable was whether at least 50 percent of a site's patients received three or more collaborative care model self-management sessions, an indication that providers were delivering the evidence-based collaborative care model.
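The two-stage assignment logic of an ADEPT-style SMART can be sketched as follows. This is a hypothetical simplification for illustration, not the actual trial code; the function name and the arm labels (EF for external facilitation, IF for internal facilitation) are assumptions.

```python
import random

def adept_style_assignment(site_responses, seed=0):
    """Two-stage SMART assignment sketch: all sites start on REP;
    stage-1 nonresponders are randomized to added external facilitation
    (EF) or EF plus internal facilitation (IF); stage-2 nonresponders
    on EF alone are re-randomized between continuing EF and adding IF.

    site_responses maps site -> (responded_stage1, responded_stage2).
    """
    rng = random.Random(seed)
    final = {}
    for site, (resp1, resp2) in site_responses.items():
        if resp1:
            final[site] = "REP"  # stage-1 responders continue REP alone
            continue
        arm = rng.choice(["REP+EF", "REP+EF+IF"])
        if arm == "REP+EF" and not resp2:
            # nonresponders to external facilitation alone are re-randomized
            arm = rng.choice(["REP+EF", "REP+EF+IF"])
        final[site] = arm
    return final
```

The key property shown here is that later randomizations are restricted by earlier responsiveness, which is what lets a SMART inform an adaptive implementation strategy.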
Kilbourne concluded with her views on future directions in study designs for implementation research in small populations. One promising direction is enhancing reach—thinking about ways to engage community organizations, schools, and places where certain types of people may congregate. For example, many people with mental health symptoms do not go to primary care, but they can be reached through other community organizations or schools.
Second, it is important to find ways to design implementation strategies to compare the effectiveness of different strategies while at the same time giving all participants the same beneficial clinical intervention.
Third, she commented on the balance between making randomization efficient and fair and ensuring that it is perceived as an equitable process. It is important to work with stakeholder timelines and to address the urgency to act, questions about fairness, and whether or not people get the right resources. Finally, she noted the potential of data capture strategies, such as smartphones and other information technologies.
QUASI-EXPERIMENTAL DESIGNS WITH APPLICATIONS TO SMALL POPULATIONS
Christine Lu presented different types of quasi-experimental study designs. In some contexts or situations, randomized controlled trials are not feasible, which is where quasi-experimental study designs are particularly useful. In studying the impacts of health care policy, she has found that effects can be intended or unintended, desirable or undesirable, direct or indirect, and very obvious or not so obvious. Researchers try to ensure the study design chosen for the research question is the strongest possible in order to draw causal inference conclusions.
Interrupted Time Series
In interrupted time series designs, one of several types of quasi-experimental study designs, typically one time point represents when the intervention was implemented or introduced. The intervention may be a policy change or program intended to change behavior. There are observations of an outcome variable taken over time, preferably several observations before and several after the intervention. The interrupted time series uses observations of the same outcome measured across time interrupted by the intervention. In addition to the group that receives the intervention, there may be a comparison group that has not received the intervention. The study is stronger if it includes a comparison group.
Lu explained other quasi-experimental designs. In a pre-post with comparison group design, there is only one observation before the intervention and one after. This design is weaker than the interrupted time series design with multiple observations before and after. The pre-post–only design does not have a control group, and also has only one observation before and after the intervention. The post-only design (cross-sectional study design) does not have a control group and has only one observation after the intervention. The post-only with comparison group has a control group, but has only one observation after the intervention.
The strong designs are randomized controlled trials and interrupted time series with comparisons, Lu said. There are intermediate designs for drawing causal inference relationships between the exposure and the outcome: the single interrupted time series and the pre-post with comparison group designs. The weak designs are pre-post–only, post-only, and post-only with comparison group.
Lu said that interrupted time series designs are useful when there is a sharply defined known time point of the intervention. That makes it easier to design the outcome measures and to take measurements before and after that time point. It is more likely that any observed change is related to the intervention or that the intervention has caused the change. The basic design is to compare longitudinal trends before and after the intervention, for example, by using segmented linear regression. The major assumption is that the baseline trend (before the intervention) will reflect what would have happened without the intervention.
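The segmented linear regression Lu described can be written as an ordinary least-squares fit with a change in level and a change in trend at the intervention point. The sketch below is a minimal illustration (not analysis code from the studies cited), using equally spaced time points and a single interruption.

```python
import numpy as np

def segmented_fit(y, t0):
    """Fit a single-interruption segmented linear model:

        y = b0 + b1*t + b2*post + b3*(t - t0)*post

    where post = 1 for t >= t0. Returns [baseline level,
    baseline trend, level change, trend change].
    """
    t = np.arange(len(y))
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t, dtype=float), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef
```

The baseline terms (b0, b1) estimate the pre-intervention trend, which under the design's central assumption reflects what would have happened without the intervention; b2 and b3 estimate the immediate level change and the change in slope.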
She illustrated what would result from a pre-post with comparison group design. This design has four observations: one before and one after for the intervention group and one before and one after for the comparison group. Based on these four data points, one might conclude that the intervention caused a change, but with a few more observations in the pre-intervention time period to provide information on the “historical trend,”
one might come to a different conclusion. Conclusions will be stronger if researchers can collect more data even if only in the historical time period.
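The four-point pre-post with comparison group design described above reduces to a simple difference-in-differences calculation: the change in the intervention group minus the change in the comparison group. A minimal sketch, with illustrative values only:

```python
def difference_in_differences(pre_tx, post_tx, pre_ctl, post_ctl):
    """Four-point pre-post-with-comparison estimate: the change in the
    intervention group net of the change in the comparison group."""
    return (post_tx - pre_tx) - (post_ctl - pre_ctl)

# Hypothetical outcome rates: the intervention group falls from 10 to 6
# while the comparison group falls from 10 to 9, so the estimated
# intervention effect is -3.
effect = difference_in_differences(10, 6, 10, 9)
```

With only these four observations the estimate rests entirely on the assumption that both groups would have followed the same trend; adding more pre-period observations, as Lu noted, lets that assumption be checked against the historical trend.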
In Lu’s first example,9 the intervention was a Food and Drug Administration (FDA) antidepressant warning, the boxed warning issued in 2004. Before the boxed warning was issued, the FDA released several advisories noting that pediatric patients taking antidepressants should be monitored closely because of a possible increased risk of suicidality (suicidal thoughts and behavior). Many media reports exaggerated the suicide risk, likely scaring patients, parents, and the public. The goal of the research was to study the impact of the FDA warnings and media coverage on three outcomes: antidepressant dispensings, suicide attempts, and completed suicides in pediatric and adult populations. The last two are rare outcomes, and for that reason researchers decided to use a multisite study. The longitudinal interrupted time series design used data from the Mental Health Research Network, part of the Health Care Systems Research Network. An advantage of data networks10 is that researchers do not have to collect, clean, and combine data from individual sites. Instead, they design a distributed computer program and send it to individual sites to run against their data. They receive summary results that can be combined quickly with minimal concerns about privacy, data ownership, or data cleaning.
The FDA example is important because one of the weaknesses of interrupted time series is co-intervention. In essence, because the media attention and FDA warning happened around the same time, their effects cannot be separated from each other. The study considers the combined effect. Lu illustrated the results by displaying time series charts, one showing antidepressant dispensing over time, the other showing suicide attempts using psychotropic drug poisoning as a proxy. In both, the historical trends were relatively linear (showing an increase over time in dispensings, but a decline over time in suicide attempts by drug poisonings), but a visual inspection illustrated that post-intervention trends were not linear. Researchers included a quadratic term to adjust for the nonlinearity. Dispensings were lower by about 31 percent post-intervention among adolescents, but suicide attempts by poisoning increased post-intervention by 21.7 percent
9 Lu, C.Y., Zhang, F., Lakoma, M.D., Madden, J.M., Rusinak, D., Penfold, R.B., Simon, G., Ahmedani, B.K., Clarke, G., Hunkeler, E.M., Waitzfelder, B., Owen-Smith, A., Raebel, M.A., Rossom, R., Coleman, K.J., Copeland, L.A., and Soumerai, S.B. (2014). Changes in antidepressant use by young people and suicidal behavior after FDA warnings and media coverage: Quasi-experimental study. British Medical Journal, 18:348. doi: 10.1136/bmj.g3596.
10 Other research data networks include the Sentinel Program with about 223 million individuals and the Patient-Centered Clinical Research Network with about 10 million individuals. These are valuable resources developed with common data models.
among adolescents, illustrating the importance of monitoring for unintended consequences.
Lu’s second example was a natural experiment in MaineCare, the Maine Medicaid Program.11 The MaineCare Program recategorized some medications as nonpreferred, which presented an administrative barrier. In this example, New Hampshire served as a comparison state. Researchers focused on a number of second-generation antipsychotic and anticonvulsant drugs to understand whether the Maine policy had an impact on whether people started or stopped taking the medicine. Lu said that this example is an interrupted time series with a comparison series, one of the strongest designs possible. She showed a graph with two lines showing drug initiations in Maine and New Hampshire, with the intervention in the middle. In New Hampshire, the comparison state, the trend increased steadily over time. But in the study state, after the intervention—because of the intervention—there was a very clear reduction in new starts of those medications. Lu observed that this is the advantage of pairing a comparison series with an interrupted time series: it is clear what happened in a group not exposed to the intervention.
Using the same policy example, Lu showed another chart illustrating a pre-post with comparison series analysis of time to discontinuation of the drug. Essentially, there are four data points: the time to discontinuation in the pre-policy cohort and in the post-policy cohort in both states. The test used an adjustment accounting for the difference in relative hazard ratios between pre and post in the comparison state (New Hampshire) to estimate the impact in the study state (Maine). The result showed that the study state had a much higher likelihood of discontinuing the drugs after the policy started.
Strengths and Challenges of Interrupted Time Series
Lu said that the interrupted time series design is very useful when there is a sharply defined intervention, which minimizes biases due to co-intervention. It controls for most common threats to internal validity because of the availability of historical trends in the outcome, with and without control (comparison) groups. It is possible to directly estimate effects. Intuitive visual displays are available that facilitate analysis and communication of results.

11 Lu, C.Y., Soumerai, S.B., Ross-Degnan, D., Zhang, F., and Adams, A.S. (2010). Unintended impacts of a Medicaid prior authorization policy on access to medications for bipolar illness. Medical Care, 48(1):4-9; Lu, C.Y., Adams, A.S., Ross-Degnan, D., Zhang, F., Zhang, Y., Salzman, C., and Soumerai, S.B. (2011). Association between prior authorization for medications and health service use by Medicaid patients with bipolar disorder. Psychiatric Services, 62(2):186-193; and Zhang, Y., Adams, A.S., Ross-Degnan, D., Zhang, F., and Soumerai, S.B. (2009). Effects of prior authorization on medication discontinuation among Medicaid beneficiaries with bipolar disorder. Psychiatric Services, 60(4):520-527.
A challenge to this design is the need for reasonably stable data. In the FDA drug-warnings study, they used measures of antidepressant dispensings and suicide attempts, even though suicide attempts were rare. They had a suicide death outcome in children and in young adults separately, but these were too rare for interrupted time series to detect changes. Co-intervention is a potential problem. Interrupted time series is stronger if each segment before and after the intervention has more than eight data points. This provides a much more stable trend change estimate, even though a linear trend might not be realistic.
Lu said that interrupted time series designs are potentially useful for small populations. She considers a rare outcome as more of a problem than a small population. She observed that it is possible to design an interrupted time series analysis using data for an outcome from a single person before and after a specific intervention.
Lu said it is important to evaluate the impact of policies or interventions. Researchers should pay attention to different data sources and select outcome measures that have the greatest chance of being stably recorded across time points before and after the intervention so that changes in their rates can be reliably measured. Researchers should try to use the strongest study design. They should consider both unintended and intended consequences, and whether the intervention will have shorter- or longer-term outcomes. In her example the antidepressant dispensing was a short-term intended outcome and suicide attempts could be an unintended consequence or longer-term outcome. She also encouraged researchers to consider leveraging existing data networks whenever possible.
ADDRESSING THE CHALLENGES OF RESEARCH WITH SMALL POPULATIONS
Diane Korngiebel discussed bioethics and small population research, data aggregation and qualitative methods, and co-production and related approaches.
Korngiebel noted the first bioethical consideration is beneficence and non-maleficence or “do no harm.” She referred to standard references: the Belmont Report, the Common Rule, and 45 CFR Part 46.12 She said
bioethics offers no answers but many questions. If the question is whether a small group should be studied separately, bioethical questions include whether the population benefits from separate study, whether harm could ensue from a separate study, or, conversely, whether harm could ensue if the population is not studied separately.
Korngiebel said another bioethical pillar is the respect for autonomy. This may be respect for a person, but could also be extended to the respect for the autonomy and identity of a community. Tribes, for example, are sovereign nations. They have a right and the responsibility to their people to review, choose, and direct research priorities.
She noted that justice and equity constitute the last bioethical pillar. Inherent in the term “health disparities” is recognition that injustice (inequity) should be addressed. She questioned whether health disparities research could inform resource distribution to address social determinants of health. She referred to a study13 that looked at mortality rates. It found that medical advances averted about 178,000 deaths from 1996 to 2002, whereas eliminating education-associated differences in mortality could have averted 1.36 million deaths over the same period. She suggested creative thinking about health disparities and how to improve health.
Data Aggregation and Qualitative Methods
Korngiebel said a data cycle should flow from the population, to health data that inform policy, to funding distributed to programs and services that address needs, and back to the population. This cycle breaks down quickly if there are no data, and the inequitable distribution of resources continues. She noted the challenges of current data collection, especially for small populations such as American Indians, Alaska Natives, Native Hawaiians, and Pacific Islanders. Low survey response rates represent a major challenge at the state, regional, and national levels. Another major challenge is ethnic and racial misclassification, especially when the information is recorded by a third party. These issues result in data that are not reliable for these groups.
She suggested current methods or classifications do not support relevant aggregation of data for small populations. For example, groups of different sizes are collapsed, such as Asians (96%) with Pacific Islanders (4%). With this collapsing, the issues of the smaller group are subsumed by the larger. Another consideration centers on what constitutes valid data. When decisions are made about what data to collect and count, it is a form
13 Woolf, S.H., Johnson, R.E., Phillips, R.L., Jr., and Philipsen, M. (2007). Giving everyone the health of the educated: An examination of whether social change would save more lives than medical advances. American Journal of Public Health, 97(4):679-683.
of gatekeeping. Decisions about which data to collect may mean that some people are left out or not heard.
She and her colleagues talked to five tribal partners about what they would recommend for data aggregation.14 The study timeline started in 2009, but partnerships started long before that to build relationships and establish trust. A conference was held with indigenous and tribal health leaders at the University of Washington to talk about data-related issues. A bioethics administrative supplement asked tribes about combining their data with data from other tribes. There were 2 years for data collection and analysis, 2 years of tribal review, and finally a publication. The five partner tribes varied in size and were diverse in terms of culture and location. The engagement approach was the tribal participatory research method. The process involved extensive consultations, tribal council approvals, and memoranda of understanding. They learned that many factors could inform aggregation, including geographic proximity, community type (urban/rural, coastal/inland), culture, presence or absence of a contaminated environment, type/severity of health concerns, access to health care, and generational cohort.
Korngiebel discussed how researchers can leverage the community wisdom of small populations. She proposed a focus on co-production and co-creation15 as broad concepts to use in approaches, frameworks, and methodologies. She noted that the study she described was about co-production. Researchers used qualitative methods to identify important reasons for differences between tribes so that quantitative researchers could use them in their data aggregation activities. However, the activity was about creating value together. The tribes are the ones most affected by the data, most affected by the research, and most affected by the interventions, she stressed.
Korngiebel noted that one challenge of co-production is achieving consensus. A team may include community members who disagree. However, co-production is a process of consultation and negotiation, and all participants are engaged in the decision making. The result is development of a transparent and inclusive process. This is very different from having a decision imposed externally, she said.
14 Korngiebel, D.M., Taualii, M., Forquera, R., Harris, R., and Buchwald, D. (2015). Addressing the challenges of research with small populations. American Journal of Public Health, 105(9):1744-1747.
15 For explanations of these concepts, see the AMA Journal of Ethics, November 2017, Vol. 19, No. 11.
She cited a values-based approach from the social sciences: community-based participatory research. This values-based framework respects community needs and priorities. According to the originators of the process,16 all partners contribute expertise to defining the issue and determining action. The community does not expect the researcher to get nothing in return. Communities are consulted constantly before, during, and after the research. Communication is important because the relationship must be built and maintained.
Korngiebel also described user-centered design, an approach from industry. For some companies, including Apple, Amazon, Google, Microsoft, and Boeing, user-centered design is central to how they develop features, products, services, and activities. One feature is research into how something will be used within the fabric of a person’s life or the fabric of a community. Some user-centered design teams are developing qualitative methods that collect data in more kinesthetic, active ways, such as storytelling or camera-journal data collection.
Korngiebel said that researchers should consider interweaving available tools, starting with community-based participatory research for community engagement. At some point, the effort will diverge to user-centered design to create the content and delivery of the intervention. User-centered design includes a phase of generative and formative research, iterative creation, usability testing, revisions, and piloting. She concluded by asking how dissemination and implementation research might be incorporated with these frameworks to enhance success.
Patrick Tolan noted that he works on community-based, randomized controlled trials of behavioral interventions that use schools as units. These studies have large budgets, but they are small in terms of the number of units. He spends a lot of time talking to people who are not scientists and to scientists who are not intervention researchers. It is difficult to explain the need to do something that is contained, defined, and organized but given to only half the people by chance. Explaining this takes a lot of time and energy because many good ethical questions arise.
Tolan noted that much of the conversation about small populations is about small samples. One of the messages from the workshop, he said, is that the issue of small samples may be less challenging than how to define distinct populations. That definition tends to be related to marginalization; reduced political, economic, and social capital; differences in resources and
16 Israilov, S., and Cho, H.J. (2017). How co-creation helped address hierarchy, overwhelmed patients, and conflicts of interest in health care quality and safety. AMA Journal of Ethics, 19(11):1139-1145.
risk; and the ability to be engaged in the kinds of interventions that can make a difference.
Tolan extracted three themes from the session: (1) how to engage in statistical and causal inference and the challenges involved; (2) how to know the validity and completeness of knowledge and to work toward it; and (3) the relationship between health equity and justice and the distribution of access to resources, risk, and opportunities for care.
Statistical and Causal Inference
The statistical and causal inference challenges are critical. It can be difficult to obtain sample sizes that allow confidence in probability-based statistics when working with distinct populations or those who are marginalized and hard to reach. Small samples challenge the tenet that randomness has been achieved and that a reliable, sensitive estimate can be obtained with enough statistical power to detect meaningful differences. A number of designs and strategies can address those challenges, such as the use of interrupted time series. He noted that many promising strategies have been discussed to recruit populations that are difficult to reach, marginalized, or hostile to engagement. He said it may be a matter of political will and economic resources. In other words, he asked, do we want to fund the work to get this done? Do we as scientists want to spend our time on this work?
Validity and Completeness of Data
Tolan noted that the discussion raised the issue of replication in a new way. Replication is part of examining variation, and replicability is an important part of understanding the validity and meaning of findings. He suggested that work with small populations will contribute new ways of making comparisons around replication. A related consideration is how to decide which populations to distinguish.
Tolan commented that everybody is in groups. Non-Hispanic whites are not a very homogeneous group and may not be a very meaningful group to use as a norm for benchmarking. The discussion about the meaningfulness of geographic determination of groups (see Chapter 3) shows that a fundamental question revolves around how to use new tools to identify what is important about distinctions.
Tolan said researchers will start to treat group randomized trials, or large group trials with many comparisons, as the norm. For example, he and his colleagues have done studies of school-based interventions with 45 schools in one study and 37 in another. Thousands of students were involved, but both are small samples in terms of intervention units. He said he thinks that approach will become a norm.
He reflected on comments by Lisa Signorello (see Chapter 2) about the meaning of terms. Drawing on his research on parenting, he noted that terms mean different things and carry different weight for different people. For example, for a parenting intervention to be of interest to Mexican American families, it needed to address respect and responsibility, while non-Hispanic white parents were interested in autonomy and the development of independent capability, and African-American parents were concerned with love, connection, and dependability. One Native American tribe told researchers that they do not do parenting; the people around the family, the community, are responsible for the development of a child. Tolan noted that in studies like these, researchers are engaging in an exchange of cultural understanding and meaning. Many populations are marginalized and suspicious of researchers, he said. They are very sophisticated and aware of the nature of the relationship and the power dynamic.
He said that another question has to do with how to help people understand science. Random assignment in small populations may be a challenge for statistical reasons. But it may also be a challenge because there are populations that cannot see how it is ethical to exclude people from an intervention that could help them. Researchers have to think about these issues on a deeper level than “do you understand what I mean when I say that flipping a coin is fair?”
Researchers have to be thoughtful about the nature of the political and social standing of groups when they are engaged in interventions. An intervention occurs within a larger context of how much social or political capital a given group may have, how much access or impediments to access it may face, and how the intervention implementation itself may depend on that.
In studying small populations, Tolan said, researchers should recognize that their understanding of typicality, normality, or abnormality is biased and incomplete. How a benchmark was created may itself be problematic and biased. The struggle is to analyze variation from the benchmark when data are incomplete.
Tolan observed that often critical information is undiscovered. He referred to Gomez’s presentation (see Chapter 2) in which Asian women were shown to have lower rates of breast cancer. He said he thought, “We should be studying why the lower rates occur because we want everybody to look more like them.” If researchers look for the variation in risk among groups, there may be information about why some groups are better off in some ways that could be useful, he said.
Small populations are important because they help make clear that interventions occur within a social and political ecology. The determinants of risk, the causal mechanisms of disorder and disease, and the way interventions can be mounted and sustained all occur within that ecology.
Tolan turned to health equity and justice issues, saying deeper evidence about disparities is needed. One school of thought says a person’s zip code indicates his or her level of education, mortality, and many other characteristics. It is a quick (but inaccurate) way of saying that zip code determines development. But, he said, we need to dig further to understand the patterns of risk exposure, the kinds of mechanisms that are important in terms of likelihood of expression of disorder, and what interventions might be valuable to help. He stated that small population interventions are an important way to start to dig into the mechanisms of disparities. He asked whether interventions can further understanding of disparities based on economic level, ethnic group, or zip code and the extent to which they are related to access and distribution of resources.
Researchers should start to see their work as helping to build a solid case for health as a civil right and a justice issue, by showing the intimate connections among health risk, health outcomes, access to health resources, and the political and social capital of small populations. He suggested that scientists, funders, and administrators put health equity high on their list of scientific research priorities and consider the idea that the value of a given study should be filtered through the lens of how it helps address health equity. Tolan said small populations will come to be recognized for the value they have.
Kelly Devers asked the panel about examples of the use of electronic health records (EHRs) to complement other kinds of interventions and to speed the cycle between research and implementation. She also asked whether they had seen EHR data used by community schools and social service agencies.
Kilbourne replied that the Department of Veterans Affairs (VA) has pioneered research in utilizing EHRs to improve the quality and outcomes of care. From a personal and consumer perspective, the VA also implemented the Blue Button, a personal EHR, and researched how people use their own personal health record and information to communicate with their providers. There has also been pioneering research on shared decision making and personal health records in communities serving people with serious and persistent mental illness.
David Berrigan (National Cancer Institute) asked Lu about the need for additional control groups in natural experiments as suggested by Medical Research Council guidelines. He said that grantees and study sections have been grappling with how best to evaluate natural experiments. Lu replied the answer is to try to think creatively. In her FDA drug warning study,
the warning targeted the pediatric population. The group she and her colleagues used for comparison was not a control group per se but a group that was not targeted by that message: the adult population. Because adults were not targeted by the FDA warning or by the associated media coverage, they could serve as a comparison group. That group also reduced its antidepressant use (a spillover effect), but not to the magnitude observed in youth.
She suggested that another approach is to try to create comparison outcome measures to help determine whether what is observed is related to the intervention of interest. For example, the FDA example was about antidepressants. An alternative is to measure a medication not targeted by the same policy.
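The comparison logic Lu describes can be sketched as a simple relative-change calculation: the non-targeted group's change estimates spillover and secular trend, and the excess change in the targeted group is more plausibly attributable to the warning. All rates below are hypothetical, invented purely for illustration:

```python
# Hypothetical pre/post dispensing rates (per 1,000) for the targeted group
# (youth) and a non-targeted comparison group (adults); numbers are invented.
youth_pre, youth_post = 25.0, 18.0
adult_pre, adult_post = 40.0, 38.0

youth_change = (youth_post - youth_pre) / youth_pre   # change in targeted group
adult_change = (adult_post - adult_pre) / adult_pre   # spillover/secular trend

# The excess decline in youth beyond the adult change is the part more
# plausibly attributable to the targeted warning.
excess = youth_change - adult_change
print(f"youth: {youth_change:.1%}, adults: {adult_change:.1%}, excess: {excess:.1%}")
```

The same subtraction applies to her second suggestion: a comparison outcome (a medication not targeted by the policy) can stand in for the comparison group when no untargeted population is available.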
Aimee James (Washington University in St. Louis) said she advocates that researchers stop calling people “hard to reach” because “they are hard for us as researchers to reach if we are outsiders” or they are “hard to recruit” because of things “we have done poorly in the past.” She also encouraged the group to start being specific about the meaning of “small populations.”
Shobha Srinivasan (National Cancer Institute) observed that many of the workshop presentations alluded to interventions and data but did not talk about the process of implementing interventions. She noted that Kilbourne’s presentation described implementation science in terms of the process. She noted that sometimes a measured change in the outcome may be due to the intervention process and not the intervention itself and wondered how to properly account for this.
Kilbourne agreed and added that Korngiebel’s thoughts about community-based participatory research and user-centered design are the type of implementation strategies needed. They may quicken the evolution of what is thought of as an evidence-based practice into something better. User-centered design might help identify the best way of delivering an intervention more efficiently than just letting the process unfold.
Allen concluded by referring to a 10-article report on intervention work in Alaska. He said a colleague, Ed Trickett, helped write the last article and insisted that the title be “Most of the Story Is Missing.”17 He said the title speaks to the fact that how researchers report is all too often lacking in sufficient specification of what actually happened in the intervention, important events and elements in the implementation, and examples of responsiveness to local context that facilitated intervention success.
17 Trickett, E.J., Trimble, J.E., and Allen, J. (2014). Most of the story is missing: Advocating for a more complete intervention story. American Journal of Community Psychology, 54(1-2):180-186.