A critical precondition of good empirical studies is the availability of data. One important type of economic data is administrative records maintained by government agencies. Research on patents has the benefit of an almost complete set of data on what patents are applied for (at least within 18 months), what patents are issued, and to whom they are initially issued, although there are important informational gaps because there is no formal reporting requirement for patent licenses or for changes in assignment or ownership. Since formal copyright registration is not required to obtain protection, there is no comprehensive set of administrative data on copyright protection or ownership. Data on registrations with the Copyright Office of the Library of Congress provide useful information on many of the most important copyrighted works, but such records are incomplete and historical records are difficult to access. On the other hand, as with patent litigation, the federal judicial system generates a complete set of data on who sues whom for what copyright violations and, with the important exception of out-of-court settlements, with what outcomes, as well as on criminal prosecutions and resulting penalties. Other important administrative data are collected in mandatory federal government surveys of businesses, employment, expenditures for research and development, and other business activities.
Unquestionably, the most crucial data for analyzing the impact of copyright and of digitization reside in the private sector. Fortunately, the digital revolution, while transforming the conditions underlying the copyright system, also means that a wealth of information relevant to the functioning of the copyright system is generated and stored routinely in the course of business, such as records of purchases, licensing transactions, and website views. The challenge is that these data are in the hands of a multitude of private collectors: sellers, Internet service providers, and search engines. Much of that information is proprietary or subject to trade secrecy and privacy protections and thus is not subject to disclosure. Little of it is in a form readily usable by researchers. Even for the data that businesses may be willing to share, there are often very substantial hurdles in collection, aggregation, and transmission.
We devote a good deal of this report to enumerating these data sources, explaining their relevance to public policy concerns, exhorting collectors to make them available on reasonable terms to qualified investigators, and demonstrating the importance of public and private investment in