The analogy to empirical patent research has limitations. Unlike patents, copyrighted works have no comprehensive repository. Measuring their value using sales or usage data is challenging because such data are unknown, dispersed, or privately owned. Owing to the vast, decentralized, and often private nature of the data, the costs and benefits of the collection process are often difficult to estimate. In some cases, such as orphan works, collection is simply infeasible. Thus, before describing some types of research projects that might be profitably undertaken, we outline in this chapter both key opportunities and formidable challenges associated with acquiring and using data related to copyright and identify some promising data resources to support policy-relevant empirical studies.
Copyright policy is most contentious and in flux in the digital realm. The introduction of CDs, DVDs, MP3 files, UGC websites, web-based content aggregators, and now streaming music and radio has created challenges for the interpretation and enforcement of copyright law not only in the music industry but also in other copyright-intensive industries such as newspapers, software, and film. Digital technology also enables rapid changes in the nature of consumption, which can expand quickly in new areas and contract just as swiftly in others.
The implications for data collection are also profound. Most promising, the process of digitizing and digitally distributing expressive works generates a digital data trail that researchers can use to study copyright policy. File-sharing is a prime example. By its design, file-sharing software requires an accounting infrastructure that keeps track of users connected to the system, including their location, operating system type and speed, as well as information on which files are being shared by whom in what way. These data are ostensibly public, although collecting, organizing, and making the data amenable to systematic research takes considerable effort. Several studies have collected different chunks of such file-sharing data and used them to telling effect. Such direct, comprehensive, data-based analysis of music sharing would have been impossible in a world where users swapped CDs and purchased bootleg copies from local dealers.
Although infringing use of music has been the phenomenon most thoroughly studied using this digital data trail, it is not inconceivable that similar methods could be applied to other industries as they become increasingly digitized. E-books provide a prime example. In a world where readers increasingly consume written content on digital devices,