Stefan Savage, University of California, San Diego
Stefan Savage is a professor of computer science and engineering at the University of California, San Diego. His work focuses on the empirical measurement and analysis of cybercrime and cybersecurity. He noted that his work addresses challenges similar in some ways to those faced by the Intelligence Community (IC).
- Much of the activity of interest takes place in private, limiting the observation window.
- The adversary is constantly changing their methods, either to improve them or in reaction to having been seen or disrupted.
- A great deal of the work is focused on understanding the relationship between the technical components and the nontechnical dimensions that motivate bad actors.
He noted that his work focuses on economics as well as some of the social factors that allow people to build networks of common interest.
Savage noted that, in spite of the challenges, one advantage researchers in this area have is that substantive cybercriminal organizations are rarely vertically integrated. They require the same kinds of economies of scale that legitimate businesses do. They may need to outsource components such as malware development, anonymization services, or even English language skills. Markets for these sorts of generalized commodities cannot be closed to only the cybercriminal sector. They are often sufficiently open that motivated researchers can participate in them. Even in the case of specialized capabilities that do not lead to open markets (e.g., drug and chemical supply or banking relationships), the specialization itself means that there is significant sharing across organizations, because the customers in these markets all end up working with similar people and organizations. These non-open capabilities eventually become visible once they are used.
Savage explained that his team’s work often involves engaging directly with these open markets or with criminals in order to develop more opportunities to observe what is happening. He said that with sufficient observation, his team may discover a leak (some ground truth) about the criminal behavior that can then become a Rosetta stone for discovering more. Even activities that cannot be measured directly can sometimes be measured via side effects or proxies. Savage noted that the remainder of his talk would provide several examples of application of this overall research approach.
The first example he described involved the research question of how to measure the security of a given cyber defense. He noted that this is a longstanding challenge because there is little theoretical basis for measuring defenses. What does it mean for an antivirus product to be “more” or “less” secure? He explained that his team’s insight was to take advantage of the adversarial environment and the existence of a marketplace focused on bypassing defenses. Examining how a change in a defense affects the pricing of underground services that bypass it provides a way to implicitly measure the defense’s security value. In other words, he said, if the defense is more effective, it becomes more expensive to bypass.
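This pricing-as-proxy idea can be sketched in a few lines of Python. The function and the price figures below are hypothetical, purely for illustration of the ratio being measured:

```python
# Hypothetical sketch: using underground bypass prices as an implicit
# measure of defensive value. The dollar figures are invented.

def security_value(price_before: float, price_after: float) -> float:
    """Relative increase in bypass cost after a defense change.

    A larger ratio suggests the change imposed more cost on attackers.
    """
    return price_after / price_before

# e.g., suppose bulk account registration was quoted at $30 per 1,000
# accounts before a new defense and $90 per 1,000 afterward:
ratio = security_value(30.0, 90.0)
print(f"bypass cost rose by a factor of {ratio:.1f}")  # factor of 3.0
```

The point of the sketch is only that the measurement is relative: no theory of defense strength is needed, just two market quotes.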
Savage described how CAPTCHAs1 provide a way to apply this approach. CAPTCHAs are meant to prevent mass signups to a service by bots. They require the entity signing up to do something that is difficult for computers to solve automatically but easy for humans to do. Adversaries attempting to bypass CAPTCHAs do not do so with clever computer vision algorithms, but with human labor outsourced to places such as China, India, or Bangladesh, which have low-cost labor but good Internet connectivity. The adversaries pay about $0.50 to $1.00 per 1,000 CAPTCHAs solved. At any point in time, he said, somewhere on the order of 1,000 workers are available instantaneously to do this kind of work.
He explained that his research team joined every underground CAPTCHA-solving service it could find, as both retail customers and laborers, in order to see the transactions from both sides. That allowed the team to construct experiments precisely, so that it could observe how many people worked at each service, how much capacity they had, what the expected accuracy was, and so on.
Savage said this approach also provided fresh insight into how to think about the problem. Cybersecurity experts often assume that CAPTCHAs cannot be a very effective defense, because they are comparatively inexpensive to defeat. However, Savage observed that people have used CAPTCHAs for so long because they provide a different kind of value: a source of friction for people who are engaging with a service. What this means to attackers is that if the value from compromising an account does not offset the cost of the CAPTCHA-solving service, then it will not be attacked. Effectively, he said, CAPTCHAs act as a filter that screens out all but the most highly motivated attackers.
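The filtering argument is back-of-the-envelope arithmetic. In the sketch below, only the $0.50–$1.00 per 1,000 solving rate comes from the talk; the per-account values and the function itself are invented for illustration:

```python
# Hypothetical sketch of the "CAPTCHA as economic filter" argument.

def attack_is_profitable(value_per_account: float,
                         solve_cost_per_1000: float,
                         success_rate: float = 1.0) -> bool:
    """An attacker only pays for CAPTCHA solving if the expected value
    of each account exceeds the amortized per-solve cost."""
    cost_per_attempt = solve_cost_per_1000 / 1000.0
    return value_per_account * success_rate > cost_per_attempt

# An account worth a twentieth of a cent does not justify $1/1,000 solving...
print(attack_is_profitable(value_per_account=0.0005, solve_cost_per_1000=1.0))
# ...while an account worth five cents easily does.
print(attack_is_profitable(value_per_account=0.05, solve_cost_per_1000=1.0))
```

The filter effect falls out directly: low-value mass abuse is priced out, while high-value attacks are not.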
Savage noted that this approach is also a useful way for companies to measure the effectiveness of changes in cyber defenses.
1 CAPTCHA is an acronym for “completely automated public Turing test to tell computers and humans apart,” coined in L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems for Security,” EUROCRYPT 2003: International Conference on the Theory and Applications of Cryptographic Techniques, May 2003, doi:10.1007/3-540-39200-9_18.
He described some work that his team did with Google. The company wanted to better understand account compromise and abuse and what defenses would be most helpful. Savage explained that his team added phone verification SMS challenges to the account workflow. The team was able to track the cost of bulk registrations of Google accounts versus other webmail providers. He said that when Google first rolled out SMS challenges there was a factor-of-10 increase in price, reflecting the additional burden placed on the people who were trying to abuse the process. Savage said that this sort of proxy measurement is now standard operating procedure inside Google and other companies as a way to evaluate the effectiveness of some of their defenses.
Another example related to measuring effectiveness involves the cost of hacking into individual email accounts. A service might charge $350 in return for the contents of a particular person’s Gmail account. At the time his team started to study this activity, Savage said, the extent and effectiveness of these services were unknown. He noted that a colleague, Geoffrey Voelker, likes to say that the team’s secret research methodology is shopping. Savage explained that the team found several of these services and then created two classes of online personas: buyers seeking to engage with these services, and fake personas (along with related social networks) of people the team wanted the services to attack. He said that his team asked the services to break into these fake personas’ email accounts. He noted that they were careful to work with university lawyers and Google’s lawyers during this research. This approach allowed the team to monitor very precisely what happened after attackers were paid to break into these accounts.
Savage said that most of these services were low quality, but several were quite effective. The latter services had workflows that were able to defeat Google’s two-factor authentication in many cases, primarily through phishing. Because the team had run a controlled study, he said, it could develop an understanding of how certain groups were targeting Google and then look retroactively to determine how many people had been compromised. The team found that hundreds of users had been compromised by the most effective attack site discovered. That result led to a number of changes in how Gmail works, and the price of those attack services doubled.
Savage provided another example focused on defenses and using economics as a lever to determine where to intervene to improve a system. He noted that most of the tools associated with cybercrime (trojans, botnets, exploit kits, and so on) are a cost center. None of them actually creates any revenue. So, he said, his team decided to look at an entire scam from the standpoint of this question: What are the value chain and cost structure of the business, and where are its weak points? He described an analysis of email spam.
A spam message must get into a user’s mailbox and evade the spam filter, and the user then needs to click on a link; but that, Savage noted, is only the beginning. For the scam to be effective, many other things need to happen as well. The domain clicked on needs to have a registrar; there needs to be a name server; that server needs to reach back to another server to provide content, and so on. Moreover, there needs to be a set of relationships to actually fulfill an order. In particular, there must be a bank to take credit card payments and a drop shipper to provide the product. Savage explained that each one of these items must work in order for the overall enterprise to make money. But almost none is run by the same organization. The process is heavily decentralized and outsourced.
Savage said the team turned to the question, Which one of the pieces of this process is most cost effective to disrupt in order to demonetize this activity? To find out, he said, the team basically built the world’s most gullible person: an account that received as much spam as possible and then clicked on everything offered. He said his team worked to associate particular websites with particular criminal organizations; a researcher went undercover and pretended to join more than half of them so that the team had ground truth. Next, the team cut a deal with a bank so that it could make purchases from all of these different classes of sites and track the flow of money back to the acquiring bank that would receive it. (As an aside, he noted that his is one of the few computer science departments with a drug room to manage all the drugs bought during this investigation, again pursuant to general counsel approval.)
The team learned that one reason these enterprises are so effective is that most of the resources needed to complete the scam have very low replacement cost. He said that if you shut down someone’s web server, take away their domain name, or take down their botnet, the cost to develop a replacement, both in time and money, is quite modest. The exception, the team found, is on the merchant bank side—that is, the side that actually receives the Visa payments from the customers (primarily in the United States and to a lesser extent Western Europe).
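The intervention logic above can be sketched as a toy ranking of links in the value chain by the attacker's replacement cost. All of the numbers below are invented; the talk reports only that the merchant bank relationship is the expensive outlier:

```python
# Toy model of the "where to intervene" analysis: rank each link in the
# spam value chain by the attacker's cost to stand up a replacement.
# Component names and (dollars, days) figures are hypothetical.

replacement_cost = {
    "web server": (50, 1),
    "domain name": (10, 1),
    "botnet rental": (500, 3),
    "merchant bank account": (10_000, 90),
}

# The most cost-effective point of intervention is the link whose
# replacement is most expensive for the attacker.
weakest_link = max(replacement_cost, key=lambda k: replacement_cost[k])
print(weakest_link)  # -> merchant bank account
```

The comparison uses the (dollars, days) tuple lexicographically, which here amounts to ranking by dollar cost first.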
The end result, the team found, is that spammers use a small number of banks. Moreover, if one of those accounts is shut down for malfeasance, all of the money held in it is swept up (in some cases resulting in multimillion-dollar losses). That result, he said, led the team to help establish an undercover takedown regime focused on money and the banks, which was very effective. The team first conducted a case study with Microsoft focused on shutting down counterfeit software sales. He said they effectively shut down all online sales of counterfeit Microsoft software for 18 months.
He described similar work with the pharmaceutical industry that shut down about 50 percent of the organizations that were selling drugs online. The European banks that were participating largely exited that market. He added that because the team had a researcher undercover in some of the organizations, it could record the messaging that the organizations were providing to their membership about events. That, he said, gave the team a great deal of information to help confirm what the organizations thought was happening.
Savage described another approach to blocking spam called domain blacklisting. A list of domains that are thought to be bad is made available, and mail systems then block messages containing them. He noted that these days everyone who uses email has domain blacklisting happening somewhere as part of their email chain or
browser. This mechanism is supposed to help prevent people from visiting bad sites. However, he said, the bad actors continue to send links to domains that have been blacklisted. This led the team to think that perhaps blacklisting is not as effective at disrupting the actual economics of this market as had been thought. In another project, with some help from journalists, the team acquired leaked data from very large illegal pharmaceutical programs related to every single sale over the course of several years. It was able to synchronize those data with contemporaneous blacklists to see how blacklists affect the market.
Savage noted that the team’s findings were not intuitive: most of the revenue generated by these criminal groups from domains that are blacklisted occurs after the domain is blacklisted. The reasons, he explained, are several. One is that blacklisting is not deployed universally; there is a long tail of email providers that do not have the latest information. Another, he said, is that somewhere between 20 and 40 percent of sales for counterfeit drugs come from people who went into their spam folder to find and click on the link.2 This means that, in fact, the filtering was not effective; rather, it provided one place (the spam folder) for people to look for the information. Another issue is that the replacement cost for domains is very low. He explained that because the team had the complete balance sheet of several of these organizations, it could see that less than 2 percent of their costs went to replacement domains, a consequence of the economics of the domain registration market.
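The synchronization step described here amounts to joining sale timestamps against blacklist timestamps and asking what share of revenue arrives after listing. The sketch below is hypothetical; the records, field layout, and figures are invented:

```python
# Sketch: join leaked sales records with blacklist timestamps to measure
# the fraction of a domain's revenue earned *after* it was blacklisted.
# All data below are invented for illustration.
from datetime import datetime

sales = [  # (domain, sale_time, revenue_usd)
    ("rx-example.com", datetime(2011, 3, 1), 120.0),
    ("rx-example.com", datetime(2011, 3, 9), 85.0),
    ("rx-example.com", datetime(2011, 3, 20), 140.0),
]
blacklisted_at = {"rx-example.com": datetime(2011, 3, 5)}

def revenue_after_listing(sales, blacklisted_at):
    total = after = 0.0
    for domain, when, revenue in sales:
        total += revenue
        listed = blacklisted_at.get(domain)
        if listed is not None and when >= listed:
            after += revenue
    return after / total if total else 0.0

frac = revenue_after_listing(sales, blacklisted_at)
print(f"{frac:.0%} of revenue came after blacklisting")
```

With the invented records above, roughly two-thirds of the revenue postdates the listing, mirroring the shape (though not the magnitude) of the finding in the talk.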
Savage turned to a more general issue regarding cyber defenses that are made public or can be seen publicly. When bad actors see evidence that they have been detected and blocked, they are forced to evolve their tactics; moreover, the defensive side loses an intelligence source. In other words, he said, public defenses may result in only a temporary reduction in attacks. He described a study in which Facebook tried to compare different abuse interventions. Whenever it directly blocked the attackers, the attackers evolved, usually within hours, if not sooner. The defenses deployed with longer-term value allowed the attackers to continue. Facebook then tracked the attackers and silently undid their actions at a random point in the future so that the attackers would not get a clear signal of when or how they were detected.
2 N. Chachra, D. McCoy, S. Savage, and G.M. Voelker, “Empirically Characterizing Domain Abuse and the Revenue Impact of Blacklisting,” Proceedings of the Workshop on the Economics of Information Security (WEIS), June 2014, https://cseweb.ucsd.edu/~voelker/pubs/namevalue-weis14.pdf.
Savage described a related study on data breaches. Usually, he said, data breaches become known to the public because someone discovers that it happened and then discloses it publicly. But, he asked, how many data breaches are undiscovered and/or undisclosed? The team developed a project to try to identify site compromise without the active participation of the sites being compromised.
He further described how the team focused on data breaches that involve the theft of credentials, in particular at online sites where usernames are typically specified in terms of someone’s email address. The user provides a password associated with the username. The standard way that passwords are implemented is that the site has a backend database with the username and a cryptographic hash of the password, which is used at login to ascertain that the user actually knows the password. The assumption his team made, he said, and one that has been borne out by a lot of research, is that password reuse between people’s email accounts and other sites that they log into is quite high. Approximately 40 percent of users reuse passwords, he said, and 20 percent share a password with their primary email account.3 As a result, it is quite common when there is a breach of credential files that the bad actors will try to crack the password hash file using a dictionary attack (where they try large lists of likely passwords) and then see whether they can log in to the associated email account.
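The attack described here can be sketched in a few lines. The example below uses unsalted SHA-256 purely for illustration (real sites should use salted, deliberately slow hashes such as bcrypt or scrypt, which make this attack far costlier); the credential data and dictionary are invented:

```python
# Minimal sketch of a dictionary attack against a leaked hash database.
import hashlib

def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

# A hypothetical leaked credential database: username -> password hash.
leaked = {"alice@example.com": sha256_hex("sunshine")}

dictionary = ["password", "123456", "sunshine", "letmein"]

def crack(leaked, dictionary):
    """Try every dictionary word against every leaked hash."""
    recovered = {}
    for user, digest in leaked.items():
        for guess in dictionary:
            if sha256_hex(guess) == digest:
                # An attacker would next try this password at the
                # victim's email provider, exploiting reuse.
                recovered[user] = guess
                break
    return recovered

print(crack(leaked, dictionary))  # recovers the reused password
```

The reuse statistics above are what turn this from a site-local problem into an email-account compromise.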
Savage’s team created a system that, in partnership with a major email provider, registered unique accounts at many Internet sites. In each case, it set the password on the site to be the same as that for the email account, effectively becoming one of the 20 percent. He said that the team created a lot of accounts and used each individual account to log in to a unique site. It then monitored for successful email logins under the assumption that the only way an attacker could log on with the correct password would be by having breached the corresponding site. These naïve accounts can be thought of as canary warnings.
3 A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang, “The Tangled Web of Password Reuse,” Proceedings of the Network and Distributed System Security Symposium, February 2014, https://www.ndss-symposium.org/ndss2014/programme/tangled-web-password-reuse/.
He explained that the team used two different kinds of passwords: ones that were very easy to crack (dictionary words), and ones that were long and random, for which a traditional cracking tool would not work. It did this, he said, mainly as a way to infer breach severity. If a bad actor broke into the second type of account, then either the site was not using password hashes or someone was doing a man-in-the-middle attack on account setup. The team ran this study on about 2,300 sites that it monitored for 1.5 years. He said that it detected 19 breaches (about 1 percent of the sites), none of which had been disclosed before. In almost every case, he said, the team talked to the site owners, who were not aware that the breach had happened.
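The severity-inference step can be written out explicitly. The sketch below is a hypothetical rendering of the reasoning, not the team's actual tooling; labels and strings are invented:

```python
# Sketch of the inference: which kind of registered password was used in
# a detected canary login reveals something about how the site stored it.

EASY = "easy"  # dictionary word: recoverable by cracking a stolen hash
HARD = "hard"  # long random string: not recoverable from a proper hash

def classify_breach(password_kind: str) -> str:
    if password_kind == EASY:
        # Consistent with a stolen, hashed credential file plus cracking.
        return "hashed database likely breached"
    # A hard password implies plaintext storage or interception at setup.
    return "plaintext storage or man-in-the-middle suspected"

print(classify_breach(EASY))
print(classify_breach(HARD))
```

The two password classes thus act as a one-bit probe into the victim site's password-handling practices.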
Savage noted that, here again, an interesting observation is that while the team could not measure the breach itself, it could set up the data in a way that a side effect allowed the determination that a breach occurred. His team also learned that breach disclosure is difficult, especially when it could tell the company only that the breach happened, but not how it happened.
As a final example, Savage described some of his team’s work on tracking Bitcoin. Bitcoin, he noted, is still the world’s largest cryptocurrency, at least in terms of exchange value, at about $150 billion. It is decentralized, and there is no oversight authority. Transactions in Bitcoin are public, irreversible, and pseudonymous. He said the advantage of this kind of cryptocurrency is that payment costs are potentially reduced. The risk is that there are no controls on abuse. He and his team tried to determine how anonymous Bitcoin actually is. They realized, he said, that many cryptocurrencies, and Bitcoin in particular, face two challenges.
The first, he said, is that no one really wants Bitcoin. It cannot be used for meaningful financial work. Eventually, one needs to move Bitcoin in and out of fiat currency, and the exchanges that do
this typically have “know your customer” requirements imposed by governments. This means that the exchange will know who receives the fiat currency. The other challenge, he said, is that Bitcoin identifiers are consistent over time. Although the identifiers are pseudonymous, once they are used (exchanged for a fiat currency somewhere), reidentification attacks become possible; all of the transactions that use the same public key are that same person.
Savage then explained that both on the retail side and on the exchange side, entities need to publicize their wallets in order to conduct business. And so again, he said, his team did a lot of shopping; it bought many things with Bitcoin and then placed orders with most of the exchanges until it had identified a number of wallets. It then clustered the flow of money across the Bitcoin transaction graph (with several hundred known ground truths) and found that a large fraction of the Bitcoin transaction flow can be traced back to an entity for which there are real-world data. He noted this sort of analysis has now become fairly standard practice.
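One common clustering heuristic in this line of work assumes that addresses co-signing the inputs of a single transaction share an owner; known wallets (identified by transacting with them, as described above) then label whole clusters. The sketch below illustrates that heuristic with a union-find structure; all transactions, addresses, and tags are invented:

```python
# Sketch of address clustering via the multi-input heuristic, with
# ground-truth tags propagated to whole clusters. Data are invented.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Each transaction lists its input addresses.
transactions = [
    ["addr1", "addr2"],
    ["addr2", "addr3"],
    ["addr4"],
]
known_tags = {"addr3": "ExampleExchange"}  # learned by transacting with it

uf = UnionFind()
for inputs in transactions:
    for addr in inputs[1:]:
        uf.union(inputs[0], addr)

def label(addr):
    """Return the ground-truth tag of addr's cluster, if any."""
    root = uf.find(addr)
    for tagged, name in known_tags.items():
        if uf.find(tagged) == root:
            return name
    return "unknown"

print(label("addr1"))  # clustered with addr3 -> ExampleExchange
print(label("addr4"))  # isolated -> unknown
```

A handful of ground-truth wallets can thus deanonymize entire clusters of addresses, which is why "a lot of shopping" goes such a long way.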
Dynes asked whether Savage’s team discussed risks or scenarios for retaliation against his team by the cybercriminals under study. Savage explained that this is a consideration and some areas remain to be explored. He placed the work into two categories: one where the adversary is nonviolent (e.g., the spam work) and one where the research team is not a direct cause of the cybercriminals’ problems. However, these categories did not hold in one case, and the team received focused attention from unwanted sources. He also noted that his team does not involve students in activities that seem too risky. He explained that generally speaking, his team is doing observational measurement studies.
Dynes asked whether the team thought about designing ways to retaliate against or foil attackers directly. Savage noted that a lot of the work was focused on identifying ways to make the criminal businesses fail. The team has provided a great deal of information to law enforcement, which has led to arrests in several cases. John expressed concern about whether criminal networks might then decide to attack the team to impede the work. Savage noted that others are doing this kind of work and that the team has shared the lessons learned with the operational sides of either industrial or government
organizations. He noted that it is not as appropriate for the team to continue when there is no new research question.
Brinsfield asked about lessons learned and insights about how to use related techniques to do a better job at surveillance and monitoring and noted that in intelligence the people and networks they may be interested in are increasingly transient. She asked whether there is a way to make connections between activities or to recognize when something is anomalous. Savage suggested that, to the extent possible, one create incentives for the people one is trying to monitor or to interact with. That could mean making available attractive services or needed commodities. He said that provides an opportunity to keep a finger on the pulse of their activities. He also observed that he has been surprised about the extent to which law enforcement professionals are not using the available data sets to generate leads (as opposed to using them to explain leads they already have).
Savage provided two examples with which he is familiar. He said that before Bitcoin, the most common cybercriminal currency was Liberty Reserve. Eventually, a case led to arrests, and through forfeiture, law enforcement acquired all of the back-end servers for Liberty Reserve, which held somewhere around $10 billion of mostly illegitimate transactions. He understands that very little case origination work has been done by mining that data set (as opposed to using it as a reference when looking someone up). As another example, the Financial Crimes Enforcement Network (FinCEN) of the U.S. Department of the Treasury conducts almost no case origination from Suspicious Activity Reports. He said, “I think this is unfortunate because it is an enormous data set and is exactly the kind of thing that modern data mining techniques would be good at analyzing.”
Sara Gamberini, National Defense University, asked about the data breach project. She wondered whether, in cases where the team discovered a breach that had not been made public, it felt it had a responsibility to the general public to disclose. Savage replied that his group debated that internally and that he personally would have preferred to disclose, but that legal concerns and a lack of consensus on what to do prevented that course of action. He also noted the problem of detecting the breach but not the method of breach. What
that means is that even if a company resets the passwords, it is not clear that the systems become safer.