National Academies Press: OpenBook

Protecting and Accessing Data from the Survey of Earned Doctorates: A Workshop Summary (2010)

Chapter: 6 Participant Views and Unresolved Issues

Suggested Citation:"6 Participant Views and Unresolved Issues." National Research Council. 2010. Protecting and Accessing Data from the Survey of Earned Doctorates: A Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/12797.

6
Participant Views and Unresolved Issues

The workshop served to emphasize the tension between data access and confidentiality protection. It pointed out the need for the National Science Foundation (NSF) to continually solicit input from respondents and data users on decisions regarding the balance between access and protection.

As for the specific options for protecting the Survey of Earned Doctorates (SED) data, workshop participants expressed the view that NSF had made the correct decision in selecting an aggregation approach to limit disclosure risk for the Race/Ethnicity, Gender, and Fine Field of Study Tables (called here the REG tables). In summarizing the workshop, Mark Schneider said that “we are now going in the direction of aggregation rules.” Jacob Bournazian’s overall assessment was that the data aggregation approach selected by NSF is compatible both with user needs and with future growth in data access. At the conclusion of his presentation, Jerome Reiter commended NSF for choosing an aggregation approach rather than a data suppression one.

Although most people at the workshop agreed with the approach selected by NSF, there were important caveats. Reiter, for example, agreed that suppression should be avoided but pointed out that aggregation, as envisioned in the NSF solution, also has some drawbacks. These drawbacks and other unresolved issues brought up in the general discussion period are outlined below.


Partially synthesized data. Reiter pointed out that the NSF solution protects against reidentification based only on field. It fails to protect against the possibility of identification by a colleague or self-identification. He was concerned that, if a colleague knows the field, the year, and the gender (or similar sensitive information) of a respondent, the data cannot be fully protected.

One means of protecting against this problem is to partially synthesize only the small cells that are most susceptible to being disclosed, publishing simulated data for the 4 percent of cells that are most sensitive. The simulated data would look and behave like actual data, valid inferences could be derived from statistical procedures, and longitudinal series could be preserved. This methodology would avoid the problem inherent in the “cutoff of 25” rule, in that cells would not be published one year and disappear in another because the cutoff was not attained.

However, as several participants pointed out, the users of the data need actual counts for policy evaluation and other purposes, and synthetic data would not suit their purposes. The use of synthetic data was compared to data perturbation, with the important difference that the aggregates would be unaffected. Schneider suggested that using synthetic data might lead to a further problem: a dual approach. Because some users want the real data, different representations of the same data cells would be generated from the restricted data sets, and two sets of numbers, synthetic and real, would be published.
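The idea of synthesizing only the small cells while leaving aggregates unaffected can be illustrated with a minimal sketch. It is not the method Reiter described: a real synthesizer would draw from a fitted statistical model, whereas this stand-in simply reallocates the combined total of the small cells among those same cells at random. The function name, the `threshold` parameter (used here in place of the "most sensitive 4 percent" selection), the add-one weighting, and the example cells are all assumptions made for illustration.

```python
import random


def partially_synthesize(counts, threshold, seed=0):
    """Return a copy of `counts` in which only cells below `threshold`
    are replaced by simulated values.

    Illustrative stand-in, not a model-based synthesizer: the pooled
    total of the small cells is reallocated among those same cells,
    so the grand total is unchanged and large cells are untouched.
    """
    rng = random.Random(seed)
    small = [key for key, n in counts.items() if n < threshold]
    pool = sum(counts[key] for key in small)
    synthetic = dict(counts)
    if len(small) < 2:
        return synthetic  # nothing to reallocate among
    # Add-one weights (an assumption) let zero cells receive counts.
    weights = [counts[key] + 1 for key in small]
    for key in small:
        synthetic[key] = 0
    for key in rng.choices(small, weights=weights, k=pool):
        synthetic[key] += 1
    return synthetic


# Hypothetical (field, gender) cells; not SED data.
cells = {
    ("Field A", "F"): 120,
    ("Field A", "M"): 95,
    ("Field B", "F"): 3,
    ("Field B", "M"): 1,
    ("Field C", "F"): 0,
}
released = partially_synthesize(cells, threshold=5, seed=1)
```

In this sketch the two large cells are released as actual counts and the overall total is preserved, which mirrors the contrast with perturbation noted above. The small-cell values are simulated, which is exactly the property that users who need actual counts objected to.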


Concentration criteria. A possible solution to the problem of colleague identification, as suggested by Stephen Cohen, would be to set criteria for concentrations. This approach would be similar to the rules now used by some government agencies to publish data only for companies in which there is not a concentration of the variable of interest. In this case, the concentration rules could be based on the number of schools that contribute doctorates to the field.

In the discussion that followed Reiter’s presentation, Cohen gave an example of how a smart intruder with some knowledge could identify a respondent through the published tables. The NSF contractor was able to use Google Scholar, dissertation abstracts, and a faculty photograph found on a departmental website, from which a candidate’s gender and race were surmised, to find a match in the SED. The result was judged to be a correct match. This exercise lent support to the aggregation decision.
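A concentration criterion of the kind Cohen suggested could take many forms; the following is only a sketch of one possibility. It assumes a field's cell is publishable when enough schools contribute doctorates to the field and no single school accounts for too large a share. The function name and both thresholds (`min_schools`, `max_share`) are hypothetical and were not specified at the workshop.

```python
def passes_concentration_rule(doctorates_by_school, min_schools=3, max_share=0.6):
    """Return True if a field's doctorates are spread across enough schools.

    Assumed rule (for illustration only): at least `min_schools` schools
    contribute, and the largest school's share is at most `max_share`.
    """
    contributing = [n for n in doctorates_by_school.values() if n > 0]
    total = sum(contributing)
    if total == 0 or len(contributing) < min_schools:
        return False
    return max(contributing) / total <= max_share


# Hypothetical fields; school names and counts are invented.
spread_out = {"School A": 5, "School B": 4, "School C": 3, "School D": 3}
dominated = {"School A": 18, "School B": 1, "School C": 1}
too_few = {"School A": 9, "School B": 1}

print(passes_concentration_rule(spread_out))  # True
print(passes_concentration_rule(dominated))   # False: one school has 90 percent
print(passes_concentration_rule(too_few))     # False: only two schools
```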


Volatility of the data. Among the issues that warrant additional investigation, according to several participants, is the volatility of the estimates when the “cutoff of 25” rule is applied. NSF proposes to reassess the fine fields every 3 years to identify fields to be added or deleted based on the number of doctorates in a field and the number of schools granting those doctorates. Currently, NSF adds 8 to 10 new fields and loses 2 or 3 fields as a result of these triennial reviews. The decision is usually made jointly by NSF and the sponsoring agencies of the survey.
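The volatility concern can be made concrete with a small sketch. It assumes that the cutoff means a field is published separately only when its count is at least 25, and that a triennial review fixes the publish decision for the following three years based on the review-year count alone. Both readings are assumptions: the review described above also considers the number of schools, which this sketch ignores. The counts are invented.

```python
def annual_status(counts_by_year, cutoff=25):
    """Publish decision re-evaluated every year (assumed: count >= cutoff)."""
    return {y: counts_by_year[y] >= cutoff for y in sorted(counts_by_year)}


def triennial_status(counts_by_year, cutoff=25, review_every=3):
    """Publish decision made in review years and held until the next review."""
    status = {}
    decision = None
    for i, year in enumerate(sorted(counts_by_year)):
        if i % review_every == 0:  # review year: re-evaluate the field
            decision = counts_by_year[year] >= cutoff
        status[year] = decision
    return status


def count_flips(status):
    """Number of year-to-year changes between published and not published."""
    flags = [status[y] for y in sorted(status)]
    return sum(a != b for a, b in zip(flags, flags[1:]))


# Hypothetical doctorate counts for one fine field hovering near the cutoff.
counts = {2004: 27, 2005: 24, 2006: 26, 2007: 23, 2008: 25, 2009: 24}

print(count_flips(annual_status(counts)))     # 5
print(count_flips(triennial_status(counts)))  # 1
```

Under these assumptions a field near the cutoff would appear and disappear almost every year if evaluated annually, while a triennial review yields a more stable series at the cost of acting on older counts.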


Informed consent. One means of avoiding the problem of potential identification of persons in the published small cells is to obtain the permission of individuals to have personal characteristics and other data published. This would be done by asking for their informed consent to make their data available. It was suggested that informed consent would be sought only for certain sensitive data items, such as gender, race, or ethnicity. This might avoid increased nonresponse that might accompany asking for informed consent for the whole array of data collected by the survey.

According to Cohen, the NSF legislation seems to prohibit requesting the informed consent of the respondents for release of their data. Although the Confidential Information Protection and Statistical Efficiency Act permits the solicitation of the informed consent of respondents, the NSF legislation and the data collection strategy militate against using this authority for SED. Nonetheless, Lynda Carlson agreed that it would be useful to test an application of informed consent to the SED to see if obtaining such consent would be feasible. If so, NSF would then be in a position to deal with the implementation of such a procedure.


The Survey of Earned Doctorates (SED) collects data on the number and characteristics of individuals receiving research doctoral degrees from all accredited U.S. institutions. The results of this annual survey are used to assess characteristics and trends in doctorate education and degrees. This information is vital for education and labor force planners and researchers in the federal government and in academia.

To protect the confidentiality of data, new and more stringent procedures were implemented for the 2006 SED data released in 2007. These procedures suppressed many previously published data elements. The organizations and institutions that had previously relied on these data to assess progress in measures of achievement and equality suddenly found themselves without a yardstick with which to measure progress.

Several initiatives were taken to address these concerns, including the workshop summarized in this volume. The goal of the workshop was to address the appropriateness of the decisions that NSF's Division of Science Resources Statistics (SRS) made and to help the agency and data users consider future actions that might permit release of useful data while protecting the confidentiality of the survey responses.
