Ethics pervades discussions of the collection and use of data. It is generally accepted that certain types of health data serve the public good and the interests of the community, and so can and should be collected. For example, certain vital statistics must be reported in all jurisdictions. Similarly, reporting of some conditions is mandatory. In the case of children, these include birth defects (in some jurisdictions), results of newborn metabolic screening, results of newborn hearing screening (in some jurisdictions), certain infectious diseases, all cancers, and suspected abuse and neglect. The exact list of reportable conditions is determined by state law. There is also general consensus that it is appropriate to track the rates of some potentially sensitive population characteristics using aggregated data (e.g., rates of suicide and adolescent pregnancy). Much of the information collected in national and state surveys is also reported as aggregated data. However, the collection and reporting of such data also raise issues of privacy and confidentiality.
Virtually all of the datasets from which aggregated data are derived contain detailed information on individuals and, in some cases, on all of the individuals in a family or household. These individual-level records make it possible to review and analyze data pertaining to individuals and to cross-tabulate the data to assess how two or more characteristics are associated. For example, one could determine the proportion of white males in rural areas who are immunized. To permit such analyses while preserving anonymity, identifiers are routinely removed.
Although the agency that collected the data could in principle identify individuals, and in some cases does return to them to collect additional information (as in a follow-back or longitudinal study), information that could be used for identification is stripped from the file before the data are analyzed. This protects the individuals who provided the information while still allowing analysts to query the data and combine multiple pieces of information to assess the relationships among elements in the dataset.
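The de-identification and cross-tabulation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not any agency's actual procedure: the field names, the set of direct identifiers (record_id, name, address), and the toy records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical list of direct identifiers; real surveys define their own.
DIRECT_IDENTIFIERS = {"record_id", "name", "address"}


def strip_identifiers(records):
    """Return copies of the records with direct identifiers removed."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def proportion_by_group(records, group_keys, outcome):
    """Cross-tabulate: the share of records with a true outcome in each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in group_keys)
        totals[group] += 1
        if record[outcome]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Made-up records for illustration only.
raw = [
    {"record_id": 1, "name": "A", "address": "x", "race": "white", "sex": "M", "residence": "rural", "immunized": True},
    {"record_id": 2, "name": "B", "address": "y", "race": "white", "sex": "M", "residence": "rural", "immunized": False},
    {"record_id": 3, "name": "C", "address": "z", "race": "white", "sex": "M", "residence": "rural", "immunized": True},
    {"record_id": 4, "name": "D", "address": "w", "race": "white", "sex": "F", "residence": "urban", "immunized": True},
]

deidentified = strip_identifiers(raw)
rates = proportion_by_group(deidentified, ["race", "sex", "residence"], "immunized")
print(round(rates[("white", "M", "rural")], 2))  # share of rural white males immunized
```

Removing direct identifiers is only a first step: combinations of the remaining attributes, such as race, sex, and place of residence, can still single out a person in a small group, which is why rules on reporting small units of analysis are also needed.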
Many datasets are available as public use files that can be analyzed both by public health officials and by investigators in research institutions. This mechanism has provided a great deal of current information about children's health. The removal of individual identifiers protects the individual's privacy while advancing the public interest in learning more about children's health. In addition, there are requirements to ensure that individuals with rare conditions cannot be identified; these requirements prohibit the reporting of data for units of analysis (such as geographic sites) that would allow a person to be traced.
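One common way to implement requirements of this kind is small-cell suppression, in which counts below a minimum size are withheld from published tables. The Python sketch below assumes a hypothetical threshold of 5 and made-up county-level counts; actual thresholds and rules are set by each agency, and leaving zero counts unsuppressed is itself an assumption of this sketch.

```python
# Hypothetical threshold: nonzero cells with fewer cases than this are withheld.
# Actual thresholds are set by each agency or data steward.
MIN_CELL_SIZE = 5


def suppress_small_cells(counts, min_size=MIN_CELL_SIZE):
    """Replace small nonzero counts (which could identify individuals) with None."""
    return {
        area: (None if 0 < n < min_size else n)
        for area, n in counts.items()
    }


# Made-up counts of a rare condition by county, for illustration only.
cases_by_county = {"County A": 42, "County B": 2, "County C": 0, "County D": 7}

published = suppress_small_cells(cases_by_county)
for county, n in published.items():
    print(county, "suppressed" if n is None else n)
```

Suppressing a single cell is not always sufficient: if a row or column total is also published, the withheld value can sometimes be recovered by subtraction, so agencies may also suppress additional cells (complementary suppression).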
Data anonymity is now possible with current technologies, even for linking