Security Limitations of Encrypted Databases—Future Possibilities for Securing and Accessing Data at Rest
Thomas Ristenpart, Cornell Tech
Thomas Ristenpart is an associate professor at Cornell Tech and in the Computer Science Department at Cornell University. He studies the theory of secure computer systems. He observed that while theory research is often focused on 10 or more years into the future, his interests lie in the near term. He spoke to the workshop attendees about the security of databases, highlighting examples of methods that have been developed, their vulnerabilities, and where future efforts might be successful.
First, Ristenpart defined databases as computing systems that hold data and allow efficient retrieval, noting that having a lot of data “is not very useful to us if we can’t get to the specific items that we want.” Many databases, such as those held by governments or companies, contain sensitive information, and many publications have made the point that data is a valuable resource, he added. As evidence of that value, he pointed to the number of data breaches in major industries and governments.
Ristenpart outlined two approaches to increasing data security: securing against a malicious threat on the client side, or securing the database itself. He focused his talk on the latter. He also described the tension between security and usability: in general, the more secure a database is, the harder it is to search and use, and the easier it is to search and use, the less secure it is. His research seeks a middle ground between a fully secure and a fully open database (see Figure 5.1). He outlined at least two approaches, used in concert to try to address these security issues, along with their weaknesses and vulnerabilities.
Ristenpart said that the basic idea behind these approaches is to use cryptography so that when an adversary inevitably gains access to a database, they cannot retrieve useful information from it. He explained that a system called an encryption proxy uses a secret key to convert plaintext data into ciphertext, that is, data that an adversary cannot use.
Ristenpart described an encryption method called deterministic encryption (DET). Conventional (randomized) encryption produces a different-looking ciphertext every time, so two encryptions of the same value, such as the same query term, yield different ciphertexts. This is good for security but makes searching encrypted data inefficient. With DET, by contrast, the same plaintext always encrypts to the same ciphertext. Because the ciphertexts match, an encrypted query can be compared directly against encrypted rows, enabling fast retrieval for users (e.g., find all rows with “Alice”), while the ciphertexts themselves remain unreadable to an attacker.
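To make the contrast between randomized encryption and DET concrete, here is a minimal Python sketch that uses only the standard library. It is a toy stream cipher built from HMAC-SHA256, not a production scheme and not a construction presented in the talk; the function names `encrypt` and `decrypt` and the SIV-style trick of deriving the initialization vector (IV) from the plaintext are illustrative assumptions. Real deployments would rely on a vetted deterministic mode such as AES-SIV.

```python
import hashlib
import hmac
import os

# Toy cipher for illustration only: NOT a vetted encryption scheme.
# Two subkeys are derived from one master key so that IV derivation
# and keystream generation use independent keys.

def _subkey(master: bytes, label: bytes) -> bytes:
    return hmac.new(master, label, hashlib.sha256).digest()

def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Concatenate HMAC-SHA256(key, iv || counter) blocks up to `length` bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(master: bytes, plaintext: bytes, deterministic: bool) -> bytes:
    if deterministic:
        # SIV-style (assumed here): the IV is a keyed hash of the plaintext, so
        # equal plaintexts always give equal ciphertexts. This is what DET leaks.
        iv = hmac.new(_subkey(master, b"iv"), plaintext, hashlib.sha256).digest()[:16]
    else:
        # Randomized: a fresh IV every time hides repeated values.
        iv = os.urandom(16)
    stream = _keystream(_subkey(master, b"enc"), iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(master: bytes, ciphertext: bytes) -> bytes:
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(_subkey(master, b"enc"), iv, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical column and query, mirroring "find all rows with Alice".
key = os.urandom(32)
names = [b"Alice", b"Bob", b"Alice", b"Tim"]
det_column = [encrypt(key, n, deterministic=True) for n in names]
rnd_column = [encrypt(key, n, deterministic=False) for n in names]

# The server sees only ciphertexts, yet can match a DET query by equality.
query = encrypt(key, b"Alice", deterministic=True)
matches = [i for i, ct in enumerate(det_column) if ct == query]
print(matches)  # [0, 2]
```

The same equality test run against `rnd_column` finds nothing, because each randomized ciphertext carries a fresh IV; that is exactly the security-versus-searchability trade-off described above.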
DET is known to be vulnerable to “frequency attacks,” according to Ristenpart. He illustrated the concept behind frequency attacks in a diagram (see Figure 5.2).1,2 An adversary builds a histogram of the ciphertexts and, given some information about the plaintext distribution, such as how frequently the names “Alice” or “Tim” occur, can use frequency analysis to recover plaintext information, Ristenpart explained.
2 M. Naveed, S. Kamara, and C.V. Wright, “Inference Attacks on Property-Preserving Encrypted Databases,” ACM Conference on Computer and Communications Security 2015:644-655.
Ristenpart calls this “leaking information.” He summarized that DET enables fast retrieval of matching rows, but its ciphertexts leak plaintext equality, which enables leakage-abuse attacks. He pointed out that DET frequency analysis can be very effective and cited academic work showing that researchers, in a simulated attack, were able to recover 100 percent of DET-encrypted values from hospital records.3 He commented that in a hospital database, “you have a column which is the sex and it has been encrypted, and that means there are only two ciphertext possibilities. And we know that more women than men go to hospitals, and so all you have to do is say well, the one that is more frequent, that’s the women, and now you have uncovered the plaintexts for both.”
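The hospital example amounts to simple rank matching: sort the ciphertexts by how often they occur, sort the candidate plaintexts by how common they are believed to be, and pair them up. The Python sketch below shows only this basic form of frequency analysis; the ciphertext labels, the 58/42 split, and the auxiliary frequencies are hypothetical numbers chosen for illustration, and the cited papers study more refined attacks.

```python
from collections import Counter

def frequency_attack(ciphertexts, aux_distribution):
    """Guess each ciphertext's plaintext by matching frequency ranks.

    ciphertexts: the DET-encrypted column as the attacker sees it.
    aux_distribution: the attacker's outside knowledge of how common each
    plaintext is (e.g., hospital admission rates by sex).
    """
    ct_by_rank = [ct for ct, _ in Counter(ciphertexts).most_common()]
    pt_by_rank = sorted(aux_distribution, key=aux_distribution.get, reverse=True)
    return dict(zip(ct_by_rank, pt_by_rank))

# Hypothetical encrypted "sex" column: only two distinct DET ciphertexts.
column = ["ct_9f3a"] * 58 + ["ct_c41e"] * 42
aux = {"F": 0.55, "M": 0.45}  # assumed public statistic: more women than men

guess = frequency_attack(column, aux)
print(guess)  # {'ct_9f3a': 'F', 'ct_c41e': 'M'}
```

No key is needed: the attacker uses only the ciphertext histogram plus public knowledge, which is why DET's equality leakage is so damaging for low-cardinality columns.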
Ristenpart explained another approach to securing databases, order-revealing encryption (ORE). Like DET, ORE reveals when two plaintexts are equal, but its ciphertexts also reveal the order of the underlying plaintexts, which allows efficient range searches. As an example, he described querying employees (e.g., Alice, Bob) by their ages or salaries. He noted that ORE supports more efficient queries than DET but is less secure, because it leaks more about the plaintexts.
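A toy way to see both the benefit and the leak is order-preserving encryption (OPE), a special case of ORE in which the ciphertexts themselves are numerically ordered; general ORE schemes instead expose order through a comparison procedure. The Python sketch below is a hypothetical illustration, not a scheme from the talk: a secret, strictly increasing lookup table stands in for the key, and Python's `random` module is used even though it is not cryptographically secure.

```python
import random

def make_ope_table(domain_size, seed):
    """Toy order-preserving encryption: a secret, strictly increasing table.

    The ciphertext of value v is table[v]. Strictly positive random gaps
    guarantee that v1 < v2 implies table[v1] < table[v2].
    The seed stands in for the secret key; `random` is NOT cryptographic.
    """
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)
        table.append(current)
    return table

table = make_ope_table(domain_size=130, seed=2024)  # ages 0..129

# Hypothetical employee ages, encrypted before upload.
ages = {"Alice": 34, "Bob": 51, "Carol": 29, "Dan": 45}
encrypted = {name: table[age] for name, age in ages.items()}

# Range query "age between 30 and 50": the client encrypts the bounds,
# and the server compares ciphertexts directly without decrypting.
low, high = table[30], table[50]
in_range = sorted(n for n, ct in encrypted.items() if low <= ct <= high)
print(in_range)  # ['Alice', 'Dan']

# The same comparisons let anyone holding the ciphertexts rank everyone by age.
print(sorted(encrypted, key=encrypted.get))  # ['Carol', 'Alice', 'Dan', 'Bob']
```

The second `print` is the extra leakage relative to DET: even without decrypting anything, an observer learns the full age ranking of the employees.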
Another approach Ristenpart briefly discussed is encryption wrapping, or “onioning.” In this approach, the DET ciphertext is itself encrypted with an outer layer, which is decrypted when a user accesses the information. He cited researchers who have noted that this approach works only against an attacker who obtains a single dump of the database entries (a snapshot adversary). Ristenpart said that the academic community largely agrees that no such attacker exists in practice; most attackers will collect multiple snapshots of a database and use frequency analysis or similar techniques to access the data.
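The snapshot argument can be illustrated with a small Python model. Everything here is a hypothetical simplification: an HMAC tag stands in for the inner DET layer, a nonce-plus-pad XOR stands in for the randomized outer layer, and we assume that serving an equality query strips the outer layer from the stored column. A dump taken before the query shows no repeated values; a dump taken afterward exposes the same equality pattern that frequency analysis exploits.

```python
import hashlib
import hmac
import os
from collections import Counter

def det_tag(key: bytes, value: bytes) -> bytes:
    """Stand-in for the inner DET layer: a keyed deterministic 32-byte tag."""
    return hmac.new(key, value, hashlib.sha256).digest()

def _pad(key: bytes, nonce: bytes) -> bytes:
    return hmac.new(key, nonce, hashlib.sha256).digest()  # 32-byte pad

def rnd_wrap(key: bytes, inner: bytes) -> bytes:
    """Outer randomized layer (toy): fresh nonce, XOR with a keyed pad."""
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(inner, _pad(key, nonce)))

def rnd_unwrap(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _pad(key, nonce)))

det_key, outer_key = os.urandom(32), os.urandom(32)
column = [b"Alice", b"Bob", b"Alice", b"Alice"]

# At rest, every cell carries the randomized outer layer.
stored = [rnd_wrap(outer_key, det_tag(det_key, v)) for v in column]
snapshot_1 = list(stored)  # first dump: no repeated values visible

# Simplifying assumption: answering an equality query strips the outer
# layer from the stored column, leaving the DET layer exposed.
stored = [rnd_unwrap(outer_key, ct) for ct in stored]
snapshot_2 = list(stored)  # later dump: repeats are visible again

print(max(Counter(snapshot_1).values()))  # 1
print(max(Counter(snapshot_2).values()))  # 3
```

An attacker who only ever saw `snapshot_1` would learn little, but any attacker who returns later sees `snapshot_2`, which is why wrapping does not stop frequency attacks against persistent adversaries.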
3 V. Bindschaedler, P. Grubbs, D. Cash, T. Ristenpart, and V. Shmatikov, “The Tao of Inference in Privacy-Protected Databases,” Proceedings of the VLDB Endowment, Vol. 11, No. 11, August 2018, https://doi.org/10.14778/3236187.3236217.
Ristenpart noted that all of the encryption schemes presented have vulnerabilities: “The consensus in the academic community now is these things are pretty dangerous; they are probably giving you a false sense of security … most people say just to avoid them is probably the easiest thing to do and we have got to look for other solutions.”
To review these encryption approaches, Ristenpart noted the following:
- Property-revealing encryption such as DET/ORE leaks information about plaintexts;
- Research shows that in many situations, property-revealing encryption is insecure; and
- Relatively cheap “fixes,” such as encryption wrapping, will not prevent these attacks in current practice.
He summarized his talk with the following three points:
- The inability to secure databases is a pressing issue for government, industry, and nongovernmental organizations.
- Encrypted databases have high potential as a strong security layer.
  - Some earlier works over-promised and under-delivered.
  - There are future directions worth exploring!
- Research(ers) with holistic, “mixed-methods” approaches are needed that
  - Understand attack methods,
  - Understand formal crypto analysis methods (e.g., “provable” security), and
  - Understand database systems.
Ristenpart believes that more research and development of encrypted database technologies is needed, and that the work showing deficiencies in current approaches is meant to help refine understanding and not chill work on the topic.
Robert Dynes, a planning committee member, asked about the uncertainty associated with frequency analysis. Ristenpart agreed that frequency analysis involves some uncertainty. He explained that values from highly skewed distributions, such as names or sex in hospital data sets, are very likely to be represented in a database and will be easy to recover. Lower-frequency values in the tail of the distribution are less likely to be recovered; they carry higher uncertainty, and an attacker can be less confident about them. He noted that the papers he referenced discuss this topic in more detail.
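The head-versus-tail point can be checked with a small simulation. The Python sketch below is hypothetical: the name distribution, the number of rows and trials, and the function names are assumptions, and "Dan" and "Eve" are deliberately given identical tail frequencies. It repeatedly samples a column, hides the values behind opaque labels (standing in for DET ciphertexts), runs rank-matching frequency analysis, and records how often each value is recovered correctly.

```python
import random
from collections import Counter

# Hypothetical plaintext distribution, assumed known to the attacker and also
# used to generate the simulated columns. "Dan" and "Eve" sit in the tail
# with identical frequencies.
AUX = {"Alice": 0.45, "Bob": 0.25, "Carol": 0.15, "Dan": 0.075, "Eve": 0.075}

def rank_match(ciphertexts, aux):
    """Frequency analysis: pair the k-th most common ciphertext with the
    k-th most likely plaintext."""
    ct_by_rank = [ct for ct, _ in Counter(ciphertexts).most_common()]
    pt_by_rank = sorted(aux, key=aux.get, reverse=True)
    return dict(zip(ct_by_rank, pt_by_rank))

def recovery_rates(trials=300, rows=200, seed=0):
    """Fraction of simulated databases in which each value is recovered."""
    rng = random.Random(seed)
    names, weights = list(AUX), list(AUX.values())
    # Opaque labels stand in for DET ciphertexts; rank_match never looks inside them.
    opaque = {name: f"ct{i}" for i, name in enumerate(names)}
    hits = Counter()
    for _ in range(trials):
        column = rng.choices(names, weights=weights, k=rows)
        guess = rank_match([opaque[n] for n in column], AUX)
        for name in names:
            if guess.get(opaque[name]) == name:
                hits[name] += 1
    return {name: hits[name] / trials for name in names}

rates = recovery_rates()
for name, rate in rates.items():
    print(f"{name:>5}: {rate:.2f}")
```

With these settings, one would expect the rate for the dominant value, Alice, to be close to 1.0, while the tied tail values Dan and Eve land roughly near a coin flip, because their ciphertext counts differ only by sampling noise.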
Mim John, the workshop chair, asked whether frequency analysis might be a good cueing source. Ristenpart responded that it could be one source of intelligence, but it would need to be backed up by other sources. One workshop participant wondered whether encryption could be used to protect backdoors. Ristenpart responded that it is difficult to secure data from adversaries while allowing access to legitimate users. He also noted that, unfortunately, the track record for keeping such secrets is poor.
Brian Allen asked whether bots could be used to issue queries that appear legitimate but change the frequency distribution, and whether that would eliminate the frequency analysis problem. Ristenpart replied that noise can be added, but the crux is how to do so in a principled way that really does flatten the distributions in a well-defined model. He highlighted a related and important question: “What if I want to clean my database of contamination?” If the database is encrypted, the service provider cannot do that, which is a tension, he acknowledged.
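One simple way to flatten a histogram, offered here as a hypothetical illustration rather than the principled method Ristenpart called for, is to pad each value with dummy rows until every value that appears has the same count. The sketch below assumes, as a labeled assumption, that each dummy row carries an encrypted “is dummy” flag so legitimate queries can discard it after decryption. It also makes the cost visible: the number of dummies grows with the skew of the data.

```python
from collections import Counter

def flatten_with_dummies(column):
    """Pad a column with dummy rows so every value present appears equally often.

    Returns the padded column and the number of dummy rows added (the cost).
    Assumption: each dummy carries an encrypted "is_dummy" flag so that
    legitimate queries can filter it out after decryption.
    """
    counts = Counter(column)
    target = max(counts.values())
    padded = list(column)
    for value, count in counts.items():
        padded.extend([value] * (target - count))
    return padded, len(padded) - len(column)

# Hypothetical hospital "sex" column, as in the frequency-analysis example.
column = ["F"] * 58 + ["M"] * 42
padded, overhead = flatten_with_dummies(column)
print(Counter(padded))  # Counter({'F': 58, 'M': 58})
print(overhead)         # 16
```

This flattens frequencies only for values already present in the column and only for a single snapshot; whether it holds up against repeated snapshots, queries, and updates is exactly the kind of question that requires the well-defined model Ristenpart described.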