Developing a Methodological Research Program for Longitudinal Studies
Proceedings of a Workshop—in Brief
As background about the motivation for the workshop, John Haaga (National Institute on Aging) said that the types of surveys funded by NIA are complex and their data needs require increasingly complex methodological solutions. Moreover, longitudinal studies of aging have unique characteristics, so that the findings from the wealth of methodological research generated by the survey industry—primarily focused on cross-sectional studies—are not always applicable.
Graham Kalton (Westat) underscored that there are several different types of longitudinal studies, including cohort studies of individuals (such as the National Longitudinal Surveys or the National Health and Aging Trends Study) and household panel surveys (such as the Panel Study of Income Dynamics and Understanding Society, the UK Household Longitudinal Study). Rotating panel surveys (such as the Medical Expenditure Panel Survey and the Survey of Income and Program Participation) can also be considered a type of longitudinal survey, though they are typically used for cross-sectional analyses. Kalton provided an overview of long-standing methodological issues that are unique, or are of particular importance, to cohort studies of individuals and household panel surveys, the types of studies of primary interest for the workshop.
Kalton said that longitudinal studies need to be designed with a recognition that objectives will change over time. With that in mind, there are several basic design considerations. One is whether the survey is focused on a national or a local population, which has implications for the extent to which mobility in and out of the area will be a factor. Another is the types of characteristics used for oversampling, which are usually fixed characteristics, such as age
and race, rather than characteristics that can change, such as poverty. And a third is a plan for replenishing the sample to add new entrants to the population.
Additional considerations emphasized by Kalton as particularly important for the design of longitudinal surveys include: (1) rules that determine which sample members are followed from one wave to another, (2) sample loss from one wave to another, (3) strategies for maintaining high response rates, (4) use of administrative records, (5) use of adaptive design, (6) sample replenishment, (7) intervals between waves, (8) data collection modes, (9) panel conditioning effects, (10) implications for longitudinal analyses, (11) statistical disclosure control, and (12) management of a survey as a national resource.
Kalton highlighted several general priority areas for which new or continuing research is needed to inform the design of longitudinal surveys: research on how to obtain high initial response rates and maintain high retention rates; the effects of changing modes from one wave to another on longitudinal analyses; ways of reducing respondent burden; and opportunities for the use of administrative records and contextual variables to reduce burden and expand analysis capabilities.
LESSONS FROM THE UNITED KINGDOM
To provide a perspective based on the experience of a longitudinal survey that has a formal, large-scale mechanism built in for methodological research and experimentation, Annette Jäckle (University of Essex) described the UK’s Understanding Society Study and its Innovation Panel. The Innovation Panel is a survey of approximately 1,500 households and has been conducted annually since 2008. The survey includes the core Understanding Society questions and core methodological research experiments that have been developed to inform the design of the main survey. There is an open competition for researchers who would like to add their own methodological research to the Innovation Panel. On average, there are 11 experiments in each wave, and some experiments are conducted across multiple waves. Sometimes additional studies are conducted between the annual waves, using the Innovation Panel sample.
Jäckle said that the main strength of the Innovation Panel is that it is designed the same way and uses the same procedures as the main Understanding Society Study. However, she acknowledged, the large number of experiments raises questions about whether the findings can be generalized to the main survey. In particular, it is possible that the experiments could lead to differential errors related to attrition, context effects, and panel conditioning. Based on the research that has been conducted to date to examine these issues, Jäckle said it appears that the findings from the Innovation Panel have good external and internal validity. The experiments have been useful in informing decisions about such issues as mixed mode designs and a mobile app to measure monthly household expenditures.
OPTIMIZING PERIODICITY AND CONTENT
One of the workshop sessions focused on the research needs to inform decisions about the frequency and timing of data collection waves in longitudinal studies. Vicki Freedman (University of Michigan) emphasized that decisions about periodicity need to be considered in the context of the primary scientific focus of the study, which can be grouped into three broad categories of topics: (1) population-level trends, (2) life-course influences on later life, and (3) individual-level trajectories.
Freedman said that research is needed to better understand how wider periodicities affect costs, including interviewer recruiting, training, and retention costs, and the costs associated with sample member tracking and contacts between follow-ups. Life-course studies in particular would benefit from research on the relevant windows of vulnerability for different types of predictors of interest. It would also be useful to conduct research on the relevant ages for measuring outcomes of interest, for example, by identifying periods of steep change.
Freedman noted that secondary analysis of small-scale epidemiological studies with rich detail could shed further light on this question, even if the predictors in the available studies are not exactly the predictors of interest. In studies that aim to understand individual-level trajectories, it may be possible to assign a subset of the sample to more frequent data collections, some of which align with the data collections in a broader interviewer-assisted study. Secondary analysis of the dynamics could shed light on the extent of the bias as the interval is widened, she said.
Randall Olsen (Ohio State University) shared Freedman’s view that the ideal periodicity for a survey ultimately depends on its research goals. He argued that existing longitudinal surveys are a rich source of underutilized data that could be used to research periodicity. For example, the National Longitudinal Surveys have been conducted with varying periodicity since their beginning more than four decades ago, and several experiments were embedded into them over the years. He noted that sample members who do not respond during each follow-up wave introduce additional variation, but that some of the missing data are recovered during subsequent rounds using bounded interviewing techniques. He argued that the most
valuable experiments in the near future will likely be focused on fieldwork strategies, particularly on evaluating strategies that reduce nonresponse.
Marco Angrisani (University of Southern California) discussed a project that evaluated the feasibility of linking survey data from the Understanding America Study to data from real-time electronic financial transactions. Linking to those types of data is challenging because respondents not only have to consent to provide access but also have to take several steps to enable the linking: create an account with the financial management website used for the study, add a financial institution to the account, and keep the account up to date. Due to the burden of this process, only a subset of sample members who initially agree to participate actually complete all the steps. He said that additional research is needed to understand how to overcome these barriers and promote participation through the use of incentives, visual aids, a help line, or other methods. Studies that advance understanding on how to implement these linkages would be particularly valuable because access to data of this type could represent a paradigm shift from fixed-period survey designs to continuous or event-triggered designs.
ALTERNATIVES AND ENHANCEMENTS TO SELF-REPORTED DATA
One of the workshop sessions focused on the growing area of methodological research to identify and evaluate new sources and forms of data that can be linked to self-reported survey data. Pamela Herd (University of Wisconsin) discussed the feasibility of expanding biological data collections in population-based longitudinal studies to include data on the gut microbiome. These types of data collections involve logistical challenges and an increased burden on respondents, but a pilot study conducted as part of the Wisconsin Longitudinal Study had high response rates. She noted that the qualitative research conducted in preparation for the study was helpful in identifying the messages that were able to overcome respondent concerns and encourage participation. For example, the advance letter for the study included a Time magazine article about the gut microbiome. Herd underscored the benefits of interdisciplinary collaborations for these types of studies, particularly the greatly expanded usefulness of the data when biological data can be linked to population-based survey data.
Turning to two areas that are especially difficult to measure using self-reports, activity and sleep, particularly sleep quality, Phil Schumm (University of Chicago) described advances in using actigraphy to more directly measure activity and sleep in population-based longitudinal studies. He said that the latest generation of actigraphy devices is very capable and cost efficient and can be shared across studies. Work on new methods for extracting features is progressing quickly across many disciplines, he said, but more integration is needed with the types of research conducted by survey methodologists. In addition, Schumm said, there is a need to focus on new functional analysis approaches to the study of activity (and, possibly, sleep) throughout the entire day.
Jeffrey Kaye (Oregon Health and Science University) discussed other remote sensing methodologies for unobtrusive data collection. He noted that detecting meaningful change is of particular interest for longitudinal studies, but it is difficult to accomplish using traditional data collection methods. Technologies that can provide continuous data—such as passive sensors installed in people’s homes—can detect mobility, walking speed, sleep, and night-time behavior and can detect changes in variability often missed with sparsely spaced periodic data collections.
In addition to more frequent measurement, remote-sensing technologies can also provide more objective, reliable, and ecologically valid data than self-reports. Consequently, deeper analysis is possible because of greater time domain precision and the possibility of integrating uniformly time-stamped data across multiple domains. Although there are major challenges associated with the scalability of using remote-sensing technologies in large longitudinal studies, it is not impossible to do, particularly for studies that begin with an in-person interview. Kaye argued that smaller studies can continue to provide valuable information that helps researchers understand how to take advantage of these new sources of data on a larger scale in the near future.
Nancy Bates (U.S. Census Bureau) provided an overview of consent and confidentiality considerations when linking survey data to data from other sources. These considerations are becoming increasingly important as more and more data from different sources are integrated. She said that research is needed to build on studies that have begun to examine such questions as how the mode, framing, and placement of the request affect consent and the most effective ways for communicating the request. Of particular importance to studies on aging is a better understanding of how older people interpret consent. Bates noted that vast amounts of data are available from existing studies, which can be used to conduct analyses about the characteristics of people who do not provide consent.
Kelly Peters (American Institutes for Research) described the record linkage activities in Project Talent, a
longitudinal survey that began with a sample of high school students in 1960. The study team has worked on expanding the usefulness of the survey data with linkages to data from the Social Security Administration and the Centers for Medicare & Medicaid Services. Among several questions, methodological research as part of these projects has evaluated ways of obtaining consent. She noted that, overall, record linkages reduced cost and respondent burden for Project Talent.
Jennifer Ailshire (University of Southern California) discussed opportunities for expanding longitudinal studies of aging through linkages to contextual data, such as information about the socioeconomic and demographic characteristics of a place, the built environment, physical environment, and availability of health care. Although these types of linked datasets are not easy to manage and use, and data user access needs to be balanced with considerations of confidentiality and administrative burden, the datasets create valuable new research opportunities. To move forward, Ailshire said, it would be useful to establish a central data warehouse that could bring together linked data and facilitate their storage and distribution. Developing an infrastructure that would enable researchers to input respondent household or workplace coordinates and receive spatially linked data for these cases would be of great benefit to researchers, she said.
Building on work in the fields of cognitive neuroscience, psychology, and empirical social science, Seth Sanders (Duke University) described research on the possible uses of response-time data to model cognition and cognitive decline. He acknowledged that survey response-time data can conflate several different processes (such as reading and answering speed or interviewer and respondent behavior) and that researchers do not have the same control over the environment during a survey interview as neuroscientists have in a laboratory experiment. Nonetheless, these types of data are readily available as a byproduct of computer-assisted data collections and could be particularly useful in longitudinal surveys to compare two data points. Sanders reported that he and his colleagues used response time on the Montreal Cognitive Assessment screening test, but future research could evaluate whether other sets of survey questions could also be used to model cognitive decline. This research also raises the question of whether other neuroscience methods (such as eye tracking or functional magnetic resonance imaging) could also be used to improve the usefulness of response-time data in surveys and to investigate whether response-time data could improve measurement and modeling in other research.
As a wrap-up to the session on alternatives to self-reported data, Brian Harris-Kojetin (National Academies of Sciences, Engineering, and Medicine) summarized the work of a Committee on National Statistics panel that was asked to provide recommendations to increase the use of combinations of multiple data sources in federal statistical programs. The panel's first report discussed the use of both government and private-sector data and recommended the development of a framework for combining data from multiple sources. The report also discussed the increased privacy and confidentiality concerns when combining datasets. The panel's more comprehensive second report is scheduled to be released in late summer 2017.
REDUCING RESPONDENT BURDEN, INCREASING PARTICIPATION, AND IMPROVING DATA QUALITY
Mick Couper (University of Michigan) turned to the issues of respondent burden, participation rates, and data quality, which are complex, interconnected, and often involve tradeoffs. He provided an overview of the opportunities and challenges associated with the use of mixed-mode data collections and new interview modes. Research is needed to better understand how to target mixed-mode designs to maximum effect, he said, for example, by predicting who is most likely to respond through the Web and targeting requests accordingly. More work is also needed to identify the best ways for addressing differential Internet and smartphone coverage, he said.
Couper noted that longitudinal surveys collect an increasing variety of measurements, in addition to self-reported data, and there are questions about how to most efficiently integrate the collection of these types of data into the overall survey process. He offered several examples, including: What types of biological samples is it feasible to ask respondents to mail in? What are the implications of different modes of administration for tests of cognitive ability? He agreed with Bates that research is needed on how to increase consent rates for administrative data linkages, particularly on the Web. A related question is how to increase the use of new technologies that facilitate additional forms of measurement in longitudinal surveys, such as accelerometers or global positioning systems. Couper said that an underlying goal has to be to develop research designs that can answer questions related to mixed-mode data collections and new technologies without negatively affecting the core data collection processes in ongoing longitudinal surveys.
Chris Chapman (National Center for Education Statistics [NCES]) described efforts to increase response
rates and reduce nonresponse bias in the Beginning Postsecondary Student Longitudinal Study (BPS). He argued that longitudinal studies are particularly well suited for adaptive design strategies that rely on modeling response propensity and potential response bias because they benefit from the availability of data from the sampling frame or administrative records, paradata, substantive information from prior interviews, and, sometimes, mode flexibility. The BPS experiments allowed NCES to target incentives in ways that were most effective in improving data quality. Specifically, NCES was able to substantially reduce nonresponse bias by focusing on nonrespondents who had a high likelihood of contributing to nonresponse bias and had a high response propensity score, and by identifying the incentive amount ($45) that led to a reduction in bias in the largest number of estimates. Chapman noted that studying intervention options with a sub-sample prior to implementation of a full survey is prudent when it is feasible.
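The targeting logic Chapman described can be sketched abstractly. The sketch below is illustrative only: the propensity model, its coefficients, the cutoffs, and the case attributes are all assumptions invented for this example, not NCES's actual procedure (only the $45 incentive amount comes from the summary above).

```python
import math

def response_propensity(responded_waves, total_waves, has_email):
    """Toy logistic propensity model based on prior response history and
    contactability; the coefficients are invented for illustration."""
    x = -1.0 + 2.5 * (responded_waves / total_waves) + 0.8 * (1 if has_email else 0)
    return 1.0 / (1.0 + math.exp(-x))

def target_incentives(cases, propensity_cutoff=0.5, bias_cutoff=0.5, amount=45):
    """Offer the incentive only to nonrespondents who are BOTH likely to
    contribute to nonresponse bias AND likely to respond if pursued --
    the two criteria NCES combined, per the workshop summary."""
    offers = []
    for c in cases:
        p = response_propensity(c["responded"], c["waves"], c["email"])
        if p >= propensity_cutoff and c["bias_score"] >= bias_cutoff:
            offers.append((c["id"], amount))
    return offers
```

In practice the propensity model would be fit on frame data, paradata, and prior-wave responses rather than hard-coded, but the selection rule (intersecting a propensity score with a predicted bias contribution) is the core of the adaptive design.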
Rob Warren (University of Minnesota) turned to the subject of panel conditioning effects, the idea that the act of responding to survey questions can, over time, change attitudes, behaviors, or at least the quality of the reports on those attitudes and behaviors. This challenge is unique to longitudinal studies. Warren said that although it is generally well understood that these types of effects may occur, they rarely receive adequate attention. More research is needed to understand the circumstances and respondent characteristics that increase the likelihood of panel conditioning effects, he said. Building methodological experiments into data collections can improve understanding of these issues and help improve survey design in subsequent waves. Although it is not always practical to undertake elaborate experiments, rotating panels can usually be easily accommodated and can provide a mechanism for evaluating panel conditioning effects in a particular survey.
COST CONSIDERATIONS AND COST-EFFECTIVENESS MEASURES
Michaela Benzeval (University of Essex) talked about survey costs in the context of the Understanding Society Study described earlier. She said that funding for the study comes from a variety of sources, including the UK Economic and Social Research Council and government agencies. She noted that there is increasing interest in the ability of a study to demonstrate its value and a general trend toward tighter budgets. In recent years, the sponsors pushed for a switch to a sequential mixed-mode design for Understanding Society. In response, Benzeval and her team had to become more creative in managing the uncertainties and risks involved with this shift, in an environment in which the data collection contractors’ actual costs are not sufficiently transparent. Because only prices (no direct cost information) are available from the survey’s current contractor, an “open book” accounting framework was developed: that framework includes a detailed spreadsheet of variable prices associated with different survey modes, linked to predicted responses for those modes. If either the budget or the response rates differ substantially from what was anticipated, the detailed variable price information can inform decisions about how to vary activities in order to maximize response rates within the fixed budget.
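The variable-price framework Benzeval described can be illustrated with toy numbers. All prices and response rates below are invented placeholders, not figures from Understanding Society or its contractor; the sketch only shows how per-mode prices linked to predicted response rates let a team reason about expected completed interviews under a fixed budget.

```python
# Hypothetical per-issued-case variable prices and predicted response rates
# by mode (placeholder values, for illustration only).
MODE_PRICE = {"web": 12.0, "phone": 35.0, "face_to_face": 110.0}
MODE_RESPONSE_RATE = {"web": 0.35, "phone": 0.45, "face_to_face": 0.70}

def price_per_complete(mode):
    """Expected variable price of one completed interview in a given mode."""
    return MODE_PRICE[mode] / MODE_RESPONSE_RATE[mode]

def expected_completes(budget, allocation):
    """Expected completed interviews for a variable budget split across modes.
    allocation maps mode -> fraction of the budget assigned to that mode."""
    return sum(budget * share / price_per_complete(mode)
               for mode, share in allocation.items())
```

If realized response rates come in below prediction, the same price table can be re-solved to shift effort toward cheaper modes, which is the kind of mid-field decision the "open book" spreadsheet is meant to inform.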
The initial Understanding Society experience with introducing mixed-mode data collection was that the variable costs declined modestly, but the fixed costs and the overall cost per household increased substantially. Whether the fixed costs could be reduced in subsequent waves is an open question. In particular, the research team will want to investigate whether it is possible to encourage the Web response mode before other modes for the entire sample, whether it is possible to reduce face-to-face contact without a drop in response rates over waves, and whether there are other ways of engaging respondents—for example, through newsletters—that would reduce costs.
Stephen Smith (NORC at the University of Chicago) discussed survey costs, primarily based on his experience with the National Social Life, Health, and Aging Project. He noted that there are many differences among contractors in the metrics used to track data collection costs. These differences make comparisons difficult even when cost data might be available. Although there are cost implications associated with all of the methodological issues that have been discussed throughout the workshop, Smith argued that there are several areas that deserve particular attention from a cost perspective: (1) exploring different options for periodicity, including continuous data collections; (2) understanding the actual cost implications of “cheaper” modes; (3) chasing high response rates; (4) considering cost versus the quality of mail data collection; (5) assessing interviewer training approaches; (6) staying connected with respondents between data collection waves; (7) leveraging new technologies for respondent contact and response mode; (8) considering incentive levels; and (9) leveraging bulk purchasing power across studies (e.g., for specimen equipment).
Brad Edwards (Westat), using the example of the National Health and Aging Trends Study, pointed out that longitudinal studies have unique and complex cost implications that differ from cross-sectional studies. However, he said, there are some basic metrics related to variable costs that tend to be measured the same way across contractors, such as response rates, incentives per sample unit, and interviewer hours per completed case.
Edwards noted that there are several additional cost metrics that have the potential to be shared more openly. One set of metrics includes total project costs divided by years in funding vehicle, sample size, total number of interviews, number of rounds, and design type. A second set of metrics for fixed costs would be months from inception to launch, number of variables, and months from the end of data collection to the public release of data. A third set of metrics for variable costs could come from paradata on contact attempts and successes; incentive protocol and payouts; response rate and level-of-effort effects by incentives, as well as the total cost by sample unit and by incentives; and for Web and mail responses, the effects of reminders over time.
In terms of research needs, Edwards highlighted the question of what makes participation of value to respondents. In particular, he said, it would be useful to better understand how engagement, saliency, and gamification influence respondent retention and survey costs.
PERSPECTIVES ON KEY THEMES AND RESEARCH NEEDS
During the final session of the workshop, four of the workshop planning committee members offered their perspectives on the key themes and research needs that emerged from the discussion. Robert Hauser (University of Wisconsin) began by highlighting panel conditioning effects and consent for data linkage as two areas for which more research is needed. With new rules going into effect in 2018 for the protection of research participants, there will be increased emphasis on the informed aspects of consent, and a better understanding of what facilitates informed consent will become particularly important. Hauser argued that the tradeoffs associated with nonresponse need to be better understood, building on research on the bias implications associated with nonresponse, but also remembering the importance of maintaining sufficient statistical power for longitudinal analyses. He noted that the discussion of survey costs was very useful and urged the workshop participants to maintain a focus as part of the cost research on how methodological changes may affect subsequent waves of data collections in longitudinal studies.
Hauser suggested that more research may be needed to understand how questionnaire length relates to respondents’ willingness to complete a survey, and, in particular, the differences between the two self-administered modes, paper-and-pencil questionnaires and Web surveys. He also argued that the way researchers communicate with participants between survey waves deserves more attention and that new research in the area of science communication may be able to provide ideas.
Another area of research that Hauser said is promising is the use of paradata, such as contact records. The question of how interviewer characteristics affect interview outcomes is not new, but the discussion about continuous data collection designs raises this question in a new light. Finally, an important question that emerged from the workshop is whether it would be possible to develop a vehicle dedicated to methodological research, similar to the Innovation Panel of the Understanding Society Study. Hauser said that a panel that could simultaneously serve the needs of several longitudinal surveys may not be feasible, but whether individual surveys could include a methodological research panel deserves consideration.
Maria Glymour (University of California, San Francisco) argued that one of the most urgent research needs is related to the representativeness of data that may be available for potential linkages, and, in particular, how this affects the generalizability of research based on linked data. In addition to selective participation, selective survival of the records is also a concern. Glymour agreed with others who pointed out that the related question of what motivates people to consent to data linkage is an important area of research to pursue, and some of this research could be based on qualitative studies. She argued that a better understanding of what drives participation could be used to develop weighting models for nonrepresentative samples.
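One standard way to operationalize Glymour's suggestion is inverse-probability weighting: fit a model of consent from characteristics observed for everyone, then upweight consenters who resemble the non-consenters. The minimal sketch below assumes consent probabilities have already been estimated by such a model; the numbers in the usage note are synthetic.

```python
def consent_weights(consent_probs):
    """Inverse-probability weights: a consenter with a low predicted
    probability of consenting stands in for similar non-consenters."""
    return [1.0 / p for p in consent_probs]

def weighted_mean(values, weights):
    """Weighted mean of an outcome observed only for consenters."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, with two equally sized population strata whose consent rates are 0.8 and 0.2 and whose linked-outcome means are 10 and 20, the unweighted mean among consenters is 12, while the inverse-probability-weighted mean recovers the population value of 15. The correction only works to the extent that the consent model captures what actually drives consent, which is why Glymour tied this to qualitative research on consent motivations.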
In terms of research on aging, Glymour said that pursuing continuous data collection, possibly in a passive form, in combination with adaptive designs that could trigger targeted follow-ups during the last months of a person’s life, could provide invaluable information during a critical period in the life course. She noted that it is not clear which variables would be most useful to monitor, but the issue of representativeness is, again, an important consideration.
Another priority area for research highlighted by Glymour related to opportunities for expanding the use of existing data from nonsurvey sources. Clinical data are highly relevant for aging studies, and therefore the ability to link to electronic medical records needs to be explored. The usefulness of these types of data could further be expanded through more ambitious projects, she said, such as the use of machine learning and natural language processing techniques to extract detailed information from medical records and the potential addition of imaging data. Glymour also agreed that the use of paradata, such as response
time or interviewer effects, needs to be explored and that longitudinal studies, particularly those that are interested in cognition, present unique opportunities for expanding the use of these types of data. Glymour argued that interdisciplinary collaborations could make an especially powerful contribution to advancing research in this area. A final general area of research that Glymour emphasized as a priority is evaluating and reporting data quality, ideally using standardized metrics. As noted throughout the workshop, this topic becomes especially important with the increasing use of data from multiple sources.
Jäckle agreed with Hauser that implementing a survey similar to the Understanding Society Innovation Panel for multiple longitudinal surveys is unlikely to be feasible, due to the many differences in data collection methods among surveys. However, it may be worthwhile to pursue whether individual surveys could carve out subsets of their samples for methodological research, which could enable in-depth research that not only shows an effect (or the lack of an effect) but also provides insights into why a certain phenomenon occurs.
Jäckle agreed with others that research on consent is important. Increasingly complicated data linkages call for a better understanding of the factors that influence people’s willingness to provide consent, and in a broader sense of the barriers to participation, as illustrated by Angrisani’s research, described earlier. Jäckle echoed Glymour’s argument that it is also important to pursue research on the implications for data quality of linking information from different sources, particularly the implications in terms of representativeness.
Jäckle noted that the question of panel conditioning effects deserves renewed attention with the increasing interest in the use of data associated with technologies that were originally designed specifically to change behavior, such as actigraphy devices and financial aggregators. Jäckle pointed out that another characteristic of new technologies is that they are rapidly evolving, which could have implications for the comparability of data collected over several waves in a longitudinal study. She argued that research similar to the mode-effects studies may be needed to understand these potential effects.
Colm O’Muircheartaigh (University of Chicago) noted that the discussion of costs revealed that the challenges that stand in the way of meaningful exchanges about costs are not simply due to differences in accounting systems and concerns about the proprietary nature of the data; rather, much more work is needed to develop the types of metrics that would be most useful to have in order to inform the design of longitudinal studies. He argued that the general goal of increasing synergies across longitudinal surveys is worth exploring and that synergies can take many forms, but, he acknowledged, these types of collaborations are challenging and often involve compromises. He noted that involving survey methodologists and survey operations staff in discussions about particular surveys could make a substantial difference in researchers’ ability to identify opportunities to improve the designs.
In terms of the usefulness of a potential panel set aside for methodological research, O’Muircheartaigh agreed with Jäckle that this would be most useful if the test panel mirrors the design of the main survey. He said this goal could be accomplished by simply partitioning the sample into two parts: a core sample and a subset that could be in the field in parallel, or possibly 6 to 12 months ahead of the main sample. From the perspective of obtaining funding for such a panel, he said, an ideal structure may be one that is flexible enough to allow for the possibility of combining data from the subset with the data from the main survey, when the nature and the outcome of the experiments allow this kind of approach.
WORKSHOP PLANNING COMMITTEE: James Jackson (Chair), Russell Sage Foundation, New York, and Institute for Social Research, University of Michigan; Maria Glymour, School of Medicine, University of California, San Francisco; Robert Hauser, Department of Sociology (emeritus), University of Wisconsin—Madison; Annette Jäckle, Institute for Social and Economic Research, University of Essex, United Kingdom; Colm O’Muircheartaigh, Harris School of Public Policy, University of Chicago.
DISCLAIMER: This Proceedings of a Workshop—in Brief was prepared by Krisztina Marton, rapporteur, as a factual summary of what occurred at the meeting. The statements made are those of the rapporteur or individual meeting participants and do not necessarily represent the views of all meeting participants; the planning committee; the Committee on National Statistics; or the National Academies of Sciences, Engineering, and Medicine.
REVIEWERS: To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop—in Brief was reviewed by Maria Glymour, School of Medicine, University of California, San Francisco; Richard Jones, Department of Psychiatry and Human Behavior, Brown University; and Stephen Smith, NORC at the University of Chicago. Kirsten Sampson Snyder, National Academies of Sciences, Engineering, and Medicine, served as review coordinator.
SPONSORS: This workshop was supported by the National Institute on Aging of the National Institutes of Health. For additional information regarding the meeting, visit http://nas.edu/Longitudinal-Methods-Workshop.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2017). Developing a Methodological Research Program for Longitudinal Studies: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/24844.
Division of Behavioral and Social Sciences and Education
Copyright 2017 by the National Academy of Sciences. All rights reserved.