Cybersecurity tools and techniques are one of the foundations of trust that information will be protected: trust, for example, that trade secrets will be safeguarded or that personal information will be kept confidential. As people conduct more of their daily lives online, opportunities to acquire and misuse financial, medical, sexual, and other forms of personal information are multiplying. Furthermore, the continued development and spread of computer and communications technologies are creating new ways for companies, governments, and criminals to gain access to information that people would rather keep to themselves. And once data have been generated and exist somewhere, disclosure of those data creates the potential for harm. A particular challenge is that even if disclosure of some data is not likely to cause harm, aggregating those data with other data may be harmful. Researchers have explored potential technical solutions to some aspects of this problem, such as differential privacy, but these solutions work, at best, in limited circumstances, and the general challenge persists.
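The differential-privacy idea mentioned above can be illustrated with a small sketch: a counting query is released only after calibrated random noise is added, so any one person's presence in or absence from the data changes the output distribution very little. The function names and example data below are illustrative, not drawn from any particular library.

```python
import random

def laplace_noise(scale: float) -> float:
    # A Laplace(0, scale) draw is the difference of two Exp(1) draws,
    # multiplied by the scale.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the true count by at most 1), so Laplace noise with scale
    1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative use: how many survey respondents are over 65?
ages = [23, 67, 45, 71, 34, 69, 52, 80]
noisy = dp_count(ages, lambda a: a > 65, epsilon=0.5)
```

As the paragraph above notes, this works only in limited circumstances: repeated queries consume the privacy budget, and many analyses cannot tolerate the added noise.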
Individuals have many preferences about their privacy, and those preferences are not fixed. They are dynamic, informed by context, shaped by relationships with other people and institutions, and constantly under negotiation. Sometimes these preferences coalesce socially into expectations, norms, or conventions that are associated with particular contexts. At the same time, governments, communities, social networks, and businesses have legitimate interests in acquiring, analyzing, and using data about individuals. These interests may be commercial, governmental, or social, but they all create a desire or a need for personal information.
The advent of the era of “big data” is further complicating the protection of privacy. Today, data are being transacted, computed, observed, and sensed, and data from many different sources can be combined. Individuals do business with companies, live in communities, associate with each other in societies, and are overseen by governments. Data are used for health care, law enforcement, intelligence, politics, education, and virtually every business. All of these data can be stored indefinitely, replicated, and combined in unlimited ways. For example, software now exists that can analyze a person’s social media posts, connect them with other data about that person available online, and construct a surprisingly detailed and accurate profile of that individual.
In the modern world, an individual’s physical, mental, and emotional state is constantly being quantified based on the data he or she generates. In some cases, people are aware that they are generating data and may give permission for these data to be used in certain ways. But these data can be used for multiple purposes, some of which people want and others of which they do not want. As examples, data can be used to identify suspects in a crime, approve loans, sense early Alzheimer’s disease, detect a person’s learning style, infer sexual orientation or political affiliation, estimate income, identify a network of friends or acquaintances, recognize where a person is through public cameras, or detect when a person is home. Furthermore, data that can be used beneficially are the same data that can be misused. For example, data generated by playing a game online could be used to identify health problems among older people, or they could be used to calculate reaction times and discriminate against older employees.
Big data can reveal people’s activities at a continuous and intimate level and can be used in ways that make many people uncomfortable. For example, someone may enter a query into a search website and, the next day, encounter targeted advertisements for an associated product. But often the only way to acquire a service or product is to divulge the information demanded by the provider of that service or product. People may choose to use specialized ad-free search engines or browse the Web using privacy-enhancing technologies to limit the amount of targeted advertising they receive. However, people using these approaches may experience a lower quality or utility of service or even no service at all.
One way people may control the collection and use of their data is through the procedure known as notice and consent. It is a contractual agreement that assumes and respects the free exchange of information and services. It serves notice that an institution wants personal data, describes what the institution will do with the data, and explains what an individual will receive in exchange for the use of his or her data. The individual replies to this notice with a yes or a no (for instance, by clicking a button on a webpage). Yes gives consent and enables access to the service; no denies consent and, generally, the individual’s access to the service. In this way, individuals manage their privacy by trading it against incentives offered in the marketplace. Notice and consent makes no moral claim about whether privacy is good or not. It is simply an exchange agreement.
As originally developed in the 1970s, notice and consent was a simple and easy-to-understand system designed to respect individual autonomy and the desire to derive value from data. It worked well at a time when data collection was much less pervasive than it is today and did not include the collection of extremely fine-grained bits of data (such as the timing and targets of swipes on a smartphone). Today, notice and consent, as currently used, has serious flaws. First, for consent to be useful, it has to be informed. But to cover all contingencies, consent notices have become long, dense, difficult to read—and usually remain unread. If an individual is not informed, that person’s autonomy is largely an illusion. Furthermore, people cannot make informed decisions every time the use of a technology demands personal data, especially as technology becomes more embedded in everyday activities. The individual user is being asked to assess one of the psychologically more difficult trade-offs: that between an immediate, predictable good and a long-term, unspecified risk. Cumulative effects are also hard to assess: an individual piece of information may be harmless, but when many such pieces are aggregated, the aggregate may reveal sensitive information.
Notice and consent typically demands a yes or no answer, but someone may want their data used for some purposes and not for others. Also, preferences, technology, and the use of data can change over time, but notice and consent makes no provision for such change. Given the complexity of the digital world, most people would be hard pressed to manage every aspect of their privacy.
Even if people were given a set of options rather than a binary consent option for the use of their data, they generally cannot be told exactly how their data will be used in the future. Companies may do their best to lay out the risks of providing personal information, but they may not be able to anticipate all such risks. For example, a company may discover a use for data that was not apparent when the data were collected.
Asymmetric access to and use of information means that the users of a technology generally do not know much about what is done with their data. Many users also do not care much about the effects of disclosure in the distant future. Firms that depend on mining private data do not go out of their way to publicize their use of the information or the consequent threats to privacy.
Notice and consent does not, moreover, necessarily preclude transfer of data to third parties. As a result, information granted for one purpose may be transferred to someone else who uses it for another purpose. The existence of privacy policies does not necessarily safeguard privacy; such policies could specify, for instance, that all of a person’s data will be indiscriminately sold.
Finally, much of the information being gathered about individuals today is not subject to notice and consent. It is gathered through administrative records, transactions, and other activities of daily life, and what can be inferred by combining such data may be more harmful than any individual piece of data.
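The aggregation risk described above can be made concrete with a classic linkage sketch: a "de-identified" dataset is re-identified by joining it against a public one on quasi-identifiers such as ZIP code, birth date, and sex. All names and records below are invented for illustration.

```python
# Two datasets that look harmless on their own: a public roster with
# names, and a "de-identified" medical file with names removed.
voter_roll = [
    {"name": "A. Smith", "zip": "02139", "dob": "1965-07-01", "sex": "F"},
    {"name": "B. Jones", "zip": "02144", "dob": "1982-03-15", "sex": "M"},
]
medical_records = [
    {"zip": "02139", "dob": "1965-07-01", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "10001", "dob": "1990-11-02", "sex": "M", "diagnosis": "asthma"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

def link(public, deidentified):
    """Re-attach names to de-identified rows by joining on quasi-identifiers."""
    index = {tuple(p[k] for k in QUASI_IDENTIFIERS): p["name"] for p in public}
    linked = []
    for row in deidentified:
        key = tuple(row[k] for k in QUASI_IDENTIFIERS)
        if key in index:
            linked.append({"name": index[key], **row})
    return linked
```

Neither dataset required consent for the disclosure the join produces, which is precisely the gap described above.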
As discussed earlier, better cybersecurity protections and stronger accountability can help to ease the dilemmas associated with privacy. However, they cannot completely solve problems with privacy because, like notice and consent, they place an undue burden on the user. Third-party privacy services could place the task in the hands of experts, but if such services had to be purchased by individuals, inequities would be inevitable.
One alternative to notice and consent that is used more commonly in the European Union than in the United States is the concept of legitimate interests. It calls for balancing the interests of the data controller against the interests of the data subject. Under the framework outlined in the U.K.’s Data Protection Act, data controllers receive guidance about how to identify and protect these interests. In the United States, the Federal Trade Commission Act has an unfairness provision that might be used to implement a similar framework. Such a step would be consistent with the responsible use of data and could provide the basis for a universal approach to privacy protections.
One limitation of the legitimate interests approach is that it does not offer guidance on yet-to-be-invented uses of data. Also, how such a concept would be implemented remains uncertain. It could complement notice and consent, but other approaches are needed.
A pressing dilemma in the era of big data is that different stakeholders have conflicting interests in the balance between privacy and data collection. Even in a simple abstract model with just one data holder and two data subjects who exchange only cash and data, there are many scenarios in which the resulting flows of cash and data will not necessarily benefit everyone. In more complex situations, different definitions of optimality are similarly liable to lead to mixed distributions of both benefits and costs.
Within neoclassical economic theory, there are contrasting arguments for and against increased privacy protections. One argument is that privacy creates economic inefficiencies and therefore reduces economic welfare. Another argument is that stakeholders in the marketplace tend to overinvest in data collection and use, which is also inefficient and creates the risk of inadvertently releasing data that were never needed in the first place. Similarly, recent empirical research on privacy shows that both the protection and the collection of data can have beneficial and negative consequences. For example, in the United States, states that legislate stricter privacy for medical data have been shown to experience lower adoption of new health technologies, in particular electronic medical records. But other results show that states with more protections on health information are more likely to see creative and innovative approaches, because innovators have a better sense of what can and cannot be done and are less subject to regulatory uncertainty.
Similarly, the data industry can be viewed in different ways. If it allows a better match between consumers and merchants by enabling them to find each other with minimal costs, then consumers, merchants, and the data industry can all win. But if the data industry is an oligopoly with only a few gatekeepers who control the relationship or contracts between consumers and merchants, the data industry will have the upper hand with both merchants and consumers. In this case, the lack of competition can reduce choice, and resources can be transferred from consumers and merchants to the data industry rather than creating a bigger economic pie for everyone. The outcome remains an open question.
At the root of many of these discussions is the question of who owns the data. Can an optimal balance between privacy on the one hand and data collection and use on the other be identified or maintained? An even more relevant question may be whether the interests of different stakeholders can be balanced.
The more control consumers have over their data, the more risks they are likely to take with those data, in the same way that adding safety features to cars, such as anti-lock braking systems, may lead drivers to drive faster because they feel secure. Moreover, transparency and control are necessary but not sufficient conditions for privacy protection. In the absence of other protections, there may instead be “responsibilization,” whereby end users are forced to take responsibility for something over which they actually have little control.
Given the problems with current privacy regimes in this era of big data, rather than specifying that particular methods be used to protect privacy, government could regulate uses of data that pose risks. These risks could involve financial losses, physical injury, unlawful discrimination, identity theft, loss of confidentiality, and social or economic disadvantage. Under such a system, some uses of some data would be regulated or forbidden, even if the data were gathered through notice and consent. Data could continue to be used for beneficial purposes, while harmful uses would be avoided because they would be illegal. Regulations could be applied to what might be termed “personally impactful inferences”—the combinations of existing data that represent potentially harmful use. Controls over the use of data also could apply to profiling activity.
Decisions about how data would be used and how such use would be controlled could emerge from individuals, communities, businesses, government, and society at large. These decisions could take the form of legislation, regulation, or informal standards, although different entities would have to negotiate who makes the decision, and the approach would need to be scalable so as to be widely applicable. If such a regime were attempted, potential harms and benefits would become more apparent as people gained familiarity with it, so controls over use could change over time and vary from place to place.
An alternative or complement to controlling the use of data would be to control the collection of data. Disincentives to the bulk collection of data can be put in place. Entities that ask for too much data or permission to do too much with data can be identified and dissuaded from their actions—for example, by bringing those actions to the attention of potential users. The principle of purpose limitation in the European Union’s data protection directive, under which businesses can retain data only for as long as they need them, could be strengthened so that businesses do not retain data just in case a future use should arise.
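The purpose-limitation principle mentioned above is, at bottom, a retention rule, and can be sketched in a few lines: every record carries the purpose for which it was collected, each purpose has a retention window, and anything outside its window is purged rather than kept just in case a future use should arise. The purposes and windows below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per declared purpose of collection.
RETENTION = {
    "fraud_check": timedelta(days=90),
    "order_fulfilment": timedelta(days=30),
}

def purge_expired(records, now=None):
    """Keep only records still inside the retention window for their purpose."""
    now = now or datetime.now(timezone.utc)
    return [
        r for r in records
        if now - r["collected_at"] <= RETENTION[r["purpose"]]
    ]

now = datetime.now(timezone.utc)
records = [
    {"purpose": "fraud_check", "collected_at": now - timedelta(days=10)},
    {"purpose": "order_fulfilment", "collected_at": now - timedelta(days=45)},
]
kept = purge_expired(records, now=now)  # only the fraud_check record survives
```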
Users could be given more granular control over the data they generate. For example, they could have more control over the generation of data by technologies such as cell phones. However, other information is also being gathered, such as by municipal cameras that record license plates. Furthermore, computers connected to the Internet typically send out voluminous quantities of data that can be hard to hide, and exceptional efforts to turn on privacy controls can make a user even more visible to those who are looking for such actions. Indeed, people have little control over the generation of “microdata” from everyday activities even though such data can be combined in revealing ways.
A widely accepted set of norms for the use of data could help to protect privacy. For example, the following norms, similar to the framework provided by the Fair Information Practice Principles,[2] could be promoted and implemented:
- The use of data should benefit users or protect others. Benefits may be hard to pinpoint, but discussions among people representing multiple perspectives can often arrive at conclusions. At the least, the entities collecting the data could be required to explain to people how they or others are benefiting—if, say, such data collection is helping to stop fraud.
- Data should be kept secure. Security is essential to safeguard the uses of data and protect privacy.
- Users should be able to inspect, export, delete, and edit data they have provided. If people are able to review the data they have provided, they can see whether the information is accurate or they can decide to delete it. Allowing the data to be edited can be more of a challenge, since people may misrepresent themselves or their past activities or not understand the context in which the data were gathered and for what purposes. In some cases, moreover, deletion would be inappropriate—such as with financial data that need to be retained for accounting and legal purposes.
[2] The Fair Information Practice Principles are rooted in a 1973 report from the U.S. Department of Health, Education and Welfare, Records, Computers, and the Rights of Citizens.
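The inspect/export/delete/edit norm above can be sketched as a minimal data store. The class and its retention flag are hypothetical, but they show how deletion rights can coexist with the legal retention obligations the last bullet notes.

```python
import json

class UserDataStore:
    """Sketch of user-facing inspect/export/edit/delete over provided data.

    Each value carries a flag marking whether it must be retained (e.g.,
    financial data kept for accounting or legal purposes); such values can
    be inspected and exported but not deleted.
    """

    def __init__(self):
        self._data = {}  # user_id -> {field_name: (value, must_retain)}

    def collect(self, user_id, field_name, value, must_retain=False):
        self._data.setdefault(user_id, {})[field_name] = (value, must_retain)

    def inspect(self, user_id):
        # Let users review exactly what they have provided.
        return {k: v for k, (v, _) in self._data.get(user_id, {}).items()}

    def export(self, user_id):
        return json.dumps(self.inspect(user_id), sort_keys=True)

    def edit(self, user_id, field_name, new_value):
        _, retain = self._data[user_id][field_name]
        self._data[user_id][field_name] = (new_value, retain)

    def delete(self, user_id, field_name):
        _, retain = self._data[user_id][field_name]
        if retain:
            raise PermissionError(f"'{field_name}' must be retained")
        del self._data[user_id][field_name]
```

Editing is deliberately restricted to replacing a value, reflecting the caveat above that edits can be abused to misrepresent past activity.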