The outreach program, which the National Science Foundation (NSF) pursued after retracting its decision to suppress some data from the 2006 Survey of Earned Doctorates (SED), consisted of two major efforts: a web-based survey and a series of outreach meetings. At the conclusion of the outreach program, the Division of Science Resources Statistics (SRS) summarized the findings and considered them in selecting an approach to protecting confidentiality in the survey in the future.
Stephen Cohen reported on the SRS collaboration with the SED contractor to develop and implement a brief web survey to collect information from known users of the Race/Ethnicity, Gender, and Fine Field of Study Tables (called here the REG tables) and the Doctorate Recipients from United States Universities: Summary Report (also known as the Interagency Summary Report). The major purpose of the web survey was to gather information from these data users about their uses of the SED data and their preferences for disclosure protection alternatives.
The sample for the web survey consisted of three types of data providers and users: (1) deans of institutional units that administer the survey (n = 543), (2) data users who asked to receive the REG tables annually (n = 31), and (3) individuals who requested the 2006 REG tables via the SRS website between June and September 2008 (n = 297). The survey focused on the following information: frequency of SED data use; whether SED data were used to fulfill specific federal, state, or institutional reporting requirements; whether users aggregate or average data across fields of study, racial/ethnic categories, or years (and their preferences for these types of aggregation); and uses of other SED reports. NSF received responses from 373 (43 percent) of the 871 sample units.1
The major findings of the web-based survey were that most respondents (81 percent) use the Interagency Summary Report, half are REG tables users, and 25 percent are users of other SED data products. Although REG tables users and Interagency Summary Report users examine race/ethnicity/gender data across the range of degree fields, over 70 percent said they focus more on the degree counts of women and underrepresented minorities.
There is a core of long-term users of these data; nearly 30 percent of respondents stated that they have been using SED data products for more than 10 years. They use SED data for a variety of purposes; fulfilling state, federal, or institutional requirements is an important, but not the dominant, reason for using SED data. Approximately 30 percent of respondents aggregate SED data (across fields of degree or across years) for some reporting purposes, often for completing reports to be provided to various offices in NSF itself. Not surprisingly, about two-thirds of respondents prefer the option of a 2-year aggregation of SED data to a 3- or 4-year aggregation for disclosure protection (Simko and Dominguez, 2008).
DATA USER OUTREACH MEETINGS
Shirley McBay, president of the Quality Education for Minorities (QEM) Network, reported on her organization’s project to organize, schedule, and conduct eight outreach meetings with representatives of minority-serving doctoral degree-granting institutions, leading institutional producers of doctoral degrees to minority recipients, and science, technology, engineering, and mathematics (STEM) professional organizations. The meetings took place between mid-October and early December 2008 in geographically dispersed locations in order to reach a range of institutions and associations. A sample of job titles of the institutional participants includes assistant vice chancellor, associate vice provost for academic affairs, associate vice president for research, dean, and director of institutional research. Job titles of participants from associations include president, executive director, research manager, director of publications/journals, and principal research analyst.
The meetings were designed to provide opportunities for participants to learn about SED confidentiality and privacy requirements, describe their specific SED data uses and needs, and give feedback to SRS about the disclosure protection alternatives. The small (focus) group format of the meetings enabled participants to become informed about the details of and rationale for the alternative approaches, ask clarifying questions, and leverage the insights of fellow participants. As a result, in comparison to the web survey, the outreach meetings provided SRS with more comprehensive information about the impacts of the alternatives on the data uses and needs of these SED data users, as well as a deeper understanding of their preferences for the particular attributes of the different alternatives, albeit from a smaller group of respondents.
The outreach meetings followed a common agenda. QEM staff led introductions, reviewed the meeting agenda, and described the background materials prepared for all participants. SRS staff (at least two attended every meeting) described the mission and data resources of the SRS division, the broader issues of confidentiality/privacy and data access (including the relevant legislation and Office of Management and Budget guidelines), and the most common methods of protecting against the statistical disclosure of confidential information. SRS then presented three alternative disclosure protection approaches, fielded questions, and received initial feedback. Summarizing the lessons learned in these outreach meetings, McBay listed the ways that users say they have been using the SED data (see Box 4-1).
McBay pointed out that one practical concern voiced by the outreach meeting participants was that the loss of data on specific minority groups by field would undermine programs designed to increase minority participation in those fields. For example, participants observed that American Indians would be especially affected by suppression because their numbers are usually smaller than those of other underrepresented minority groups. The American Indian community is so small that its members in fact know the doctorate recipients in each field, and they actively seek to put these doctorate awardees forward as role models.
Some groups use the data to follow trends in awards to minorities by field and to make comparisons with peer institutions. Some of this information is needed to prepare proposals to other parts of NSF seeking support for programs that enhance doctoral opportunities for minorities. Accreditation agencies often require information comparing the institution with its peers and describing the size of the pool from which its student population is drawn.
BOX 4-1
How SED Data Are Used by Participating Institutions and Organizations
SOURCE: Workshop presentation by Shirley McBay.
Later in each meeting, participants described their uses of SED data and the impacts of data suppression, assessed the utility of the alternative disclosure protection approaches for their particular data uses, and expressed their preferences. New approaches raised by participants were also discussed (Quality Education for Minorities Network, 2009).
Finally, McBay summarized the preferences of the outreach meeting attendees for NSF to:
Obtain external advice on the interpretation of the confidentiality pledge and on ways to modify it to accommodate the reporting of small cells by race/ethnicity, gender, and citizenship.
Develop and discuss an action plan regarding data suppression with representatives of the other federal agency sponsors of the SED report—the National Institutes of Health, the National Aeronautics and Space Administration, the National Endowment for the Humanities, the U.S. Department of Education, and the U.S. Department of Agriculture—and, at NSF, the Social, Behavioral and Economic Directorate’s Advisory Committee, the SRS Human Resources Experts Panel, the Committee on Equal Opportunities in Science and Engineering, and senior staff/program officers in other NSF directorates responsible for specific initiatives for broadening participation.
In cases in which data on a racial/ethnic group would be lost in the aggregation, include “n < 3” in the table cell to indicate that the actual number in the cell ranges from zero (0) to two (2).
Aggregate the REG data to broad degree fields using the Classification of Instructional Programs (CIP) codes, which would support the accurate tracking, assessment, and reporting of fields of study and program completion activity (e.g., biological and biomedical sciences; computer and information sciences; engineering; mathematics and statistics; physical sciences; and psychology).
Document what REG data are lost at various threshold levels. Ensure that summary data tables include data on U.S. citizens and permanent residents to get a more accurate picture of the complete pool of STEM doctoral degree recipients in the United States, and revisit the policy of placing respondents who indicate more than one race in the “other” category (in conjunction with the implementation of the new standards for reporting race and ethnicity as directed by the Office of Management and Budget, which require separate reporting of people who indicate more than one race).2
Investigate and report on the impact of small data cell suppression on the SRS WebCASPAR (Integrated Science and Engineering Resources Data System) database, a system that contains information about academic science and engineering resources and is newly available on the web.
Develop a strategy to ensure that information about these potential changes is broadly communicated.
Modify the SRS licensing process to make it easier and timelier for individuals to access unpublished restricted data and review the list of individuals/groups with these licenses to determine if special steps/outreach efforts are needed to ensure greater diversity among those with licenses.
Continue to pursue development of a data enclave that will make STEM data easily accessible to data users.
Take steps to ensure that significant racial/ethnic diversity exists among the leaders, conveners, and participants in data workshops conducted with SRS support.
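The "n < 3" display convention that participants proposed can be illustrated with a short Python sketch. The function names, the {row: {column: count}} table layout, and the field and group labels in the usage note are hypothetical. For simplicity, the sketch flags every cell below the threshold, whereas the proposal applies the flag only where data on a racial/ethnic group would otherwise be lost in aggregation.

```python
from __future__ import annotations

# Hypothetical threshold: cells with fewer than 3 recipients are flagged,
# matching the "n < 3" label proposed by meeting participants.
SMALL_CELL_THRESHOLD = 3


def display_cell(count: int, threshold: int = SMALL_CELL_THRESHOLD) -> str:
    """Return the text to publish for one table cell.

    Counts below the threshold are replaced by a flag such as "n < 3",
    which tells readers the true value lies between 0 and threshold - 1
    without revealing it. Larger counts are shown as-is.
    """
    if count < 0:
        raise ValueError("a count of doctorate recipients cannot be negative")
    if count < threshold:
        return f"n < {threshold}"
    return str(count)


def display_table(
    table: dict[str, dict[str, int]], threshold: int = SMALL_CELL_THRESHOLD
) -> dict[str, dict[str, str]]:
    """Apply display_cell to every cell of a {row: {column: count}} table."""
    return {
        row: {col: display_cell(n, threshold) for col, n in cells.items()}
        for row, cells in table.items()
    }
```

For example, `display_table({"Field A": {"American Indian": 2, "Asian": 14}})` returns `{"Field A": {"American Indian": "n < 3", "Asian": "14"}}`. Unlike suppression, the flagged cell still appears and signals that the count is small. The sketch does not check whether published row or column totals would let a reader recover a flagged value by subtraction.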
Cohen referred to the NSF decision paper to summarize the preferences gleaned from analysis of the web-based survey and the outreach meetings (National Science Foundation, 2009):
Report small counts of doctorate recipients. In general, data users strongly prefer aggregation to data suppression as a method to protect the confidentiality of individually identifiable data. They want data cells containing small counts (including zero) to be displayed.
Disaggregate racial/ethnic categories. Data users strongly prefer that racial/ethnic categories be reported separately and not aggregated into a combined minorities category. Similarly, data users prefer that SRS report the multirace data separately, in its own (new) category, instead of combining those data in the Other-Unknown race/ethnicity category.
Minimize aggregation. Although users prefer aggregation over suppression, they prefer methods that result in less data aggregation.
If years are aggregated, data users prefer a 2-year aggregation over a 3- or 4-year one. Their first preference, however, is no aggregation of years, with REG data reported for single years.
Most data users prefer that more fine fields of degree be displayed as single fields rather than be aggregated into combined fields.
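The trade-off between single-year and multiyear reporting can be illustrated by pooling annual counts into blocks of consecutive years. In this Python sketch, the function name and the choice of non-overlapping blocks that begin at the earliest year in the data are assumptions, not a description of the SRS procedure, and the counts in the usage note are invented.

```python
from __future__ import annotations


def aggregate_years(counts_by_year: dict[int, int], window: int = 2) -> dict[str, int]:
    """Pool annual counts into non-overlapping blocks of `window` years.

    Blocks are aligned to the earliest year present; for example, with
    window=2 and data starting in 2003, the blocks are 2003-2004,
    2005-2006, and so on. Each label reflects the years actually present.
    """
    if window < 1:
        raise ValueError("window must be at least 1 year")
    if not counts_by_year:
        return {}
    start = min(counts_by_year)
    # Group years by block index; iterating over sorted years keeps blocks in order.
    blocks: dict[int, list[int]] = {}
    for year in sorted(counts_by_year):
        blocks.setdefault((year - start) // window, []).append(year)
    # Sum the counts within each block and label it by its first and last year.
    pooled: dict[str, int] = {}
    for years in blocks.values():
        label = str(years[0]) if len(years) == 1 else f"{years[0]}-{years[-1]}"
        pooled[label] = sum(counts_by_year[y] for y in years)
    return pooled
```

With invented counts `{2003: 1, 2004: 2, 2005: 0, 2006: 4}`, `aggregate_years(counts, 2)` returns `{"2003-2004": 3, "2005-2006": 4}`. Three of the four single-year cells fall below 3, but neither 2-year cell does; the cost is that the annual detail users said they prefer is no longer visible. A `window` of 3 or 4 removes small cells more aggressively at a greater loss of detail.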