4

Data Infrastructure for an Empirical Approach to Copyright Policy Research

Although the empirical research described in the previous chapter suggests that independent research on the copyright system’s impact on creativity and innovation can provide significant insights for policy makers, the availability of such research is very limited; and for questions on which some research exists, it is clearly at an early stage of development. The paucity of independent research can be explained by many factors, but the committee’s deliberations repeatedly returned to one key bottleneck—the quality and quantity of data across all of the principal content media—books, movies, recorded music, newspapers, and software. Categories include data on such matters as the costs of production, marketing, and distribution; prices of products and quantities sold; ancillary sources of revenue for creators such as live performances; consumption behavior; patterns of access, including unauthorized access, to copyrighted works; licensing terms and the efficacy of licensing arrangements; and the costs and efficacy of anti-piracy technologies and legal enforcement measures.

The situation with respect to copyright is analogous to discussions of the impact of the patent system some 15 years ago. There was no paucity of theory, but the difficulty of subjecting these theories to systematic and detailed empirical analysis meant that the debates went largely unresolved. There was even widespread skepticism that empirical research was feasible, let alone useful. This state of affairs has changed significantly over the past two decades. Most importantly, a number of key data sources were made available or created, spawning a diverse literature



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





on the operation and impact of the patent system. An important early effort was the establishment at the National Bureau of Economic Research (NBER) of the first publicly available patent dataset that incorporated both accessible patent citation data and links to Compustat data on individual firms (Jaffe and Trajtenberg, 2002). Extensive surveys of corporate R&D managers by researchers first at Yale University (Levin et al., 1987) and later at Carnegie Mellon University (Cohen et al., 2000) provided the first systematic data on how patents are used relative to other means of creating competitive advantage in different industries. Public agencies such as the National Science Foundation and, in recent years, the U.S. Patent and Trademark Office itself, have taken further steps to expand patent-related data collection and analysis.

A robust empirical research agenda in the copyright area will require data associated with the activities of very different stakeholders (originating artists, performers, and companies that publish and disseminate copyrighted works) as well as much more detailed user data that capture patterns of digitized material consumption and distribution across population groups.

The availability of systematic data and the emergence of a community of investigators able to identify the strengths and weaknesses of particular data sources for addressing particular issues were keys to an empirically oriented understanding of the patent system that has clearly influenced policy making in the area. The committee believes that creating a similar data infrastructure platform around copyright and enabling a community of investigators to study and engage directly in policy debates in the area of copyright would be immensely valuable.

Empirical copyright research has been undertaken in the past, although not on a sustained basis. Issues similar to today’s debates about anti-piracy measures arose at the dawn of the digital age more than two decades ago. With the advent of digital audio tape (DAT) technology, the record industry and the consumer electronics industry diverged on the need for government intervention. Both sides produced consumer surveys and studies supporting their points of view. The non-partisan Office of Technology Assessment (OTA), created to provide Congress with authoritative analysis of complex technical issues, sponsored theoretical, empirical, and survey research that addressed consumer patterns as well as the concerns about infringing use of home recording technology. Although the legislation growing out of this work, the Audio Home Recording Act of 1992, P.L. 102-563, 106 Stat. 4237, was soon eclipsed by more effective digital copying and playback technologies (e.g., computer ripping of audio files from CDs and MP3 players), the OTA studies, in particular the consumer survey, provided an objective basis for anticipating consumer behavior and evaluating policy options (U.S. Congress Office of Technology Assessment, 1989).

The analogy to empirical patent research has limitations. Unlike the patent system, the copyright system has no comprehensive repository of protected works. Measuring their value using sales or usage data is challenging because such data are unknown, dispersed, or privately owned. Owing to the vast, decentralized, and often private nature of the data, the costs and benefits of the collection process are often difficult to know. In some cases, such as orphan works, collection is simply infeasible. Thus, before describing some types of research projects that might be profitably undertaken, we outline in this chapter both key opportunities and formidable challenges associated with acquiring and using data related to copyright and identify some promising data resources to support policy-relevant empirical studies.

OPPORTUNITIES AND CHALLENGES ARISING FROM DIGITAL TECHNOLOGY

Copyright policy is most contentious and in flux in the digital realm. The introduction of CDs, DVDs, MP3 files, UGC websites, web-based content aggregators, and now streaming music and radio have all created challenges for the interpretation and enforcement of copyright law not only in the music industry but also in other copyright-intensive industries such as newspapers, software, and film. Digital technology also enables rapid changes in the nature of consumption, which can expand rapidly in new areas and contract just as swiftly in others.

The implications for data collection are also profound. Most promising, the process of digitizing and digitally distributing expressive works generates a digital data trail that can then be used by researchers to study copyright policy. File-sharing is a prime example. By its design, file-sharing software requires an accounting infrastructure that keeps track of users connected to the system, including their location, operating system type, and speed, as well as information on which files are being shared, by whom, and in what way. These data are ostensibly public, although collecting, organizing, and making them amenable to systematic research takes considerable effort. Several studies have collected different portions of such file-sharing data and used them to telling effect. Such direct, comprehensive, data-based analysis of music sharing would have been impossible in a world where users swapped CDs and purchased bootleg copies from local dealers.

Although infringing use of music has been the phenomenon most thoroughly studied using this digital data trail, it is not inconceivable that similar methods could be applied to other industries as they become increasingly digitized. E-books provide a prime example. In a world where readers increasingly consume written content on digital devices, usage data now exist that would have been prohibitively expensive to collect in the analog age. Software logs routinely record not only sales of books downloaded from centralized repositories like Amazon, but also whether and when a particular book was read, how quickly it was read, and so forth. Similar analyses could be done on e-magazines and blogs, where it is now possible to measure time spent on a particular article or blog post and click-through rates of particular hypertext links. In the context of streaming video, YouTube and Netflix collect data on user behavior including repeat consumption and the location and time of consumption. All of this information, if routinely collected by private and public entities and systematically organized, would be invaluable to the study of copyright in the digital age, as well as other aspects of the digital economy. Of course, proper use of these data will require taking steps to protect the privacy of consumers.

On the other hand, collecting such microdata for research remains a considerable challenge. Perhaps the biggest challenge lies in the fact that data about the creation, consumption, and distribution of digital media increasingly reside in the hands of private entities whose incentives diverge from those of researchers. Even if such data were available, constructing pseudo-experimental research designs places an additional burden on data when, as is usually the case, researchers are unable to directly run experiments. Finally, the problem of “free” goods is particularly salient in the digital domain. E-magazines and blogs are often free to read, free applications for smartphones abound, and free music and video are widely available. In such cases, it becomes hard to place a dollar value on such goods, compounding the difficulty of estimating consumer or producer surplus in these industries.
This section highlights the practical and conceptual challenges inherent in the collection of digital copyright-related data and its use in carefully designed research.

Incentives of Data Owners

Data collection can be costly. Firms and industries have some motivation to collect such information in the pursuit of profit maximization and industry-focused advocacy. To out-compete rivals they will want to keep some information proprietary, but in some cases they will be open to selectively sharing data that will help their industry in policy advocacy. They might also design studies and surveys to shape public or political elite perceptions in ways that favor their policy agenda. The home recording controversy described earlier is a good example. What private data holders do not have at present is an incentive to act in concert to share data with researchers whose results they do not control.

These challenges will undoubtedly persist as the Internet and digital technologies continue to evolve. For that reason, we believe that the policy agenda must begin with a multi-faceted, robust, broad-based, forward-looking data collection foundation.

Challenges of Research Design

Even if some of the adverse data-sharing incentives of data owners could be negotiated, credible research requires well-conceived research designs. The ideal approach is to experimentally subject a treatment group to a particular policy while leaving another, similar “control” group untouched, then to estimate the impact of the policy using relevant outcome variables. This simple comparative approach would work if we could experimentally expose, say, half a population to an opportunity to engage in infringing use of copyrighted content. But this may not be feasible. We assume that the people we would observe engaging in infringement are likely those with a high level of interest in the work. However, research into whether this assumption is valid may be a threshold step in this inquiry. For example, it may be that some people access the work without authorization merely for the purpose of skimming, sampling, or other initial inquiry, much as one would use a précis, index, or other aid. Gaining access to data while simultaneously implementing a credible research design is often a considerable challenge. Nevertheless, the more data collection is expanded, the more it will be possible to implement better research designs (Angrist and Pischke, 2008).

The copyright context may well be a source of pseudo-experimental comparisons. As a general rule, books and musical works published in 1923 are now in the public domain while some works produced a year later are not, making it possible for simple comparisons to provide important insight into the effect of copyright, although this may be complicated by the fact that there are often several editions of the same title.
If copyright protection inhibits use (or if being in the public domain promotes over-use), then the works still under copyright protection should see less use. As useful as this insight may be, a researcher of course still needs data on usage or other outcomes of interest. In particular, careful research designs must reflect the fact that copyrighted material is heterogeneous and ensure that “apples to apples” comparisons are being made when the objective is to determine the impact of copyright law on the creation, diffusion, and use of those works.

Free Goods

The challenges of incorporating the impact of digital technology into GDP are particularly troublesome in the case of digital goods and services whose price is zero. To see why, consider the usual approach to adjusting for quality. Suppose that technical change has allowed the price of lettuce to fall from $3.50 to $2.00 from 2009 to 2010 and that demand is perfectly inelastic, i.e., the quantity remains constant. While the total nominal sales of lettuce would decrease from 2009 to 2010, we can easily make a price index adjustment. Valued at 2009 prices for the same good, 2010 sales would have been higher than the nominal figure, and so we can use these quality-adjusted prices to characterize the impact of technical change on the lettuce industry.

If a good that formerly had a price becomes free, however, there is no procedure for incorporating it into GDP statistics. Suppose that in 2009 there were many sales of music CDs, but by 2010 consumers relied exclusively on infringing downloads, possibly in much higher volume. As customers download music without cost from the Internet in place of purchasing music CDs, both the price and quantity of music purchases disappear from GDP calculations. There is no simple price adjustment that will allow us to link the 2009 and 2010 distribution and account for the change in price. Instead, the entire category of music sales simply disappears from the GDP estimation.

A concrete example is the decline in sales of printed encyclopedias, initially attributed to the rise of Encarta, which was recorded as a drop in GDP, while the rise of Wikipedia, which displaced Encarta, is absent from the GDP statistics. Similarly, there is no direct accounting in GDP for the rise of online media services such as the New York Times or Washington Post except for the indirect sales generated through advertising revenue. This mismatch between the quantity of digital output and its measurement in copyright-relevant industries makes empirical analysis extremely hard to implement.
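The lettuce and music examples above can be worked through in a few lines. The following is a minimal sketch, not a description of how any statistical agency computes GDP: the function names, the quantity of 1,000 units, and the $12.00 CD price are assumptions made only for illustration, while the lettuce prices are the ones given in the text.

```python
from typing import Optional


def price_relative(price_now: float, price_base: float) -> float:
    """Ratio of the current price to the base-year price for a single good."""
    return price_now / price_base


def real_sales(nominal_sales: float, relative: float) -> Optional[float]:
    """Restate nominal sales at base-year prices by dividing by the price relative.

    Returns None when the price relative is zero: the division is undefined,
    which is the free-good problem described in the text.
    """
    if relative == 0:
        return None
    return nominal_sales / relative


# Lettuce: prices from the text; the constant quantity is hypothetical.
QUANTITY = 1_000
nominal_2009 = 3.50 * QUANTITY
nominal_2010 = 2.00 * QUANTITY
lettuce_real_2010 = real_sales(nominal_2010, price_relative(2.00, 3.50))

# Music: hypothetical 2009 CD price of $12.00; by 2010 the price paid is zero,
# so nominal sales are zero and no price adjustment can recover the quantity.
music_real_2010 = real_sales(0.0, price_relative(0.0, 12.00))

print(nominal_2009, nominal_2010, lettuce_real_2010, music_real_2010)
```

Under these assumptions, nominal lettuce sales fall from 3,500 to 2,000, while the 2010 figure restated at 2009 prices is again roughly 3,500. The music case returns no value at all, which mirrors the way the category drops out of the GDP estimate.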
Despite the formidable challenges of measuring the value of free goods, their increasing importance in many digital contexts requires that new research methods be developed and implemented. Contingent valuation, randomized controlled trials, and quasi-experimental settings are all potential methods for helping to determine what value consumers and other stakeholders ascribe to free goods on the Internet. Companies like Google have been measuring and benchmarking the impact of digital content. A website’s PageRank or reputation on the Web translates into how much attention or time it can expect to get from consumers, which translates into how much ad revenue it can demand from advertisers. These links fall short of scientific rigor, and it is debatable whether ad revenue captures all of the value and, if not, what the correct methodology should be.
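As an illustration of a quasi-experimental comparison, the 1923 public-domain cutoff discussed under research design could be examined with a simple difference in mean usage between adjacent publication years. The sketch below is only that: the `Work` record, the `uses` outcome, and every data value are hypothetical, and a real study would also have to address multiple editions and the heterogeneity of works.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


@dataclass(frozen=True)
class Work:
    title: str
    pub_year: int
    uses: int  # hypothetical outcome, e.g. downloads or loans in one year


def cutoff_comparison(works: List[Work], last_public_year: int = 1923) -> Tuple[float, float]:
    """Difference in mean use (public domain minus protected) and its standard error.

    Compares works published in the last public-domain year with works
    published one year later; the standard error uses unequal variances.
    """
    public = [w.uses for w in works if w.pub_year == last_public_year]
    protected = [w.uses for w in works if w.pub_year == last_public_year + 1]
    diff = mean(public) - mean(protected)
    se = sqrt(variance(public) / len(public) + variance(protected) / len(protected))
    return diff, se


# Entirely made-up usage counts for eight hypothetical titles.
sample = [
    Work("A", 1923, 12), Work("B", 1923, 15), Work("C", 1923, 9), Work("D", 1923, 14),
    Work("E", 1924, 8), Work("F", 1924, 11), Work("G", 1924, 7), Work("H", 1924, 10),
]
difference, std_error = cutoff_comparison(sample)
print(difference, round(std_error, 3))
```

A positive difference would be consistent with protected works seeing less use, but only if the two groups are otherwise comparable, which is exactly the “apples to apples” condition the text emphasizes.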

Measuring the Impact of Digital Technology

Many of the topics in copyright policy research require measuring some aspect of the transition to a digital age. The measurement questions are central in some cases, secondary in others; but measuring the emergence of digital technology is an underdeveloped field attracting a level of effort woefully small in comparison to its social and economic importance. A very large scale government enterprise measures GDP, the flow of pecuniary goods and services. The shifts toward digital goods and digital distribution command nothing like that level of attention in scale or sophistication.

The symptoms of underdevelopment are apparent in many aspects of U.S. policy. For example, the recently issued 360-page National Broadband Plan contains information from only a few statistical studies authored by neutral third parties, primarily academics. It contains little in the way of statistical analysis of the consequences of various policy options. This is not attributable to inadequate staff effort but reflects the inchoate state of economic research about digital infrastructure and digitization more broadly, in particular, the absence of an organized community of researchers with a large and well developed body of knowledge.

So incomplete a data foundation would be unthinkable in other infrastructure contexts. Every congressional bill supporting transportation infrastructure, for example, is accompanied by a forecast for the economic growth it will generate and the number of jobs it will create. Nothing comparable can be done for legislation shaping the information infrastructure because there is not even a simple measure of the size of the digital economy nor any apparatus in place to project its growth.
Many initiatives to improve measurement of the digital economy were launched in the 1990s at the Bureau of Economic Analysis, Census Bureau, Bureau of Labor Statistics (BLS), and the National Telecommunications and Information Administration. A few of these have survived, for example, a survey about the labor market for information technology workers, and an estimate of the scale of electronic commerce, called E-Stats. Others did not survive, however, for example, household and business surveys of broadband supply, adoption, and use.

Unlike in other developed countries, the best information about the online behavior of the U.S. population came not from a government-sponsored survey but instead from a private foundation, the Pew Internet and American Life Project. Although the Pew survey has been useful, especially in tracking social behavior online, its scale is limited, ranging from a little more than one thousand to several thousand households at a time. With these sample sizes the survey could only gauge general trends and gain some insight into their variance. It is incapable of achieving what the BLS survey, involving 80,000-100,000 households, does well: providing a picture of variance across populations in different regions with different gender, age, skill level, educational, and ethnic profiles.

WHAT DATA ARE NEEDED AND AVAILABLE, ACCESSIBLE, OR COULD BE CREATED?

Public discourse about copyright would benefit from a range of innovative institutions contributing to measurement efforts. What types of publicly accessible databanks would contribute to research efforts? What standards for data in this area would contribute to building further research? What data remain locked in proprietary vaults but could be unlocked by a standard process for protecting privacy while informing research? What is not being systematically measured but could be?

Assessing the health of the copyright system requires, at a minimum, documenting both the supply side and the demand side of the market for each content area: books, movies, recorded music, newspapers, software, etc. On the supply side, this means determining the number of products, and new products, available in each year, and the prices of each of the products. Generally a harder task is quantifying the consumer side of the market, not only the quantities sold but also the amount of use that each product gets. Harder still, but vital for answering important policy questions, is ascertaining the volume of unpaid use of each product over time. Because many copyright industries derive much of their revenue from ancillary activities, it would be useful to know about revenue flows to producers from these activities, including, for example, live performance revenue for musicians and speaking fees for authors. With data of these sorts, one could begin to address the following questions: What has happened to revenue? To what extent has unpaid consumption displaced sales? What has happened to the flow of new creative works?

To study the role of each agent in the digital economy (creator, marketer, distributor, and consumer), three categories of data are needed.
These include data that are currently available to the public but not extensively studied in the context of digital technology; data that exist but for whatever reason are not available to the general public; and data that do not currently exist but can be created.

Existing Accessible Data

We have found a wide range of data sources from government agencies to private institutions that can be used to measure the impact of copyright in the digital age. Most of the data are published on an annual or quarterly basis, although a few reports have been released on a one-time basis. First, we will examine data related to Internet use in general from public and private institutions. Then we will look at the relevant data sources for digital copyright in particular. See Table 4-1 for an annotated bibliography of these data sources.

The most comprehensive public domain report on the behaviors and demographics of Internet users is the Federal Communications Commission’s High-Speed Services for Internet Access, which focuses on the status of broadband in the United States. It shows the number of consumers connected on broadband through DSL, cable modem, FTTP, and satellite. The report further breaks down each population group into seven tiers by both upload and download speed. It also includes a geographical mapping of connection speeds on a state-by-state basis.

The Pew Research Center publishes an annual report that shows the number of Internet users by gender, race, age, household income, education, and community type. This report includes data on broadband and wireless penetration as well as the percentage of Internet users who carry out certain activities online such as reading the news or playing games. Together, the Federal Communications Commission and Pew reports describe some aspects of the user dynamics of the digital world and could support models of different aspects of consumer behavior online.

Private firms collect a great deal of information on products, prices, and volumes of paid consumption (see Table 4-2). Nielsen, for example, collects very detailed data on the quantities of books and music recordings sold as far back as 2001 in the case of books and the 1990s in the case of music. Nielsen also conducts a quarterly survey, the A2/M2 Three Screen Report, that tracks the penetration of broadband, HDTV, DVR, and smartphones. In addition, the report contains the number of users of and the hours spent on TV, Internet, and mobile phones, broken down by age group.
Although some researchers have gained access to Nielsen data, they have not been widely used because of the restrictive terms on which they are available. Movie box office revenue data are available from the Internet Movie Database (IMDb) and Box Office Mojo, among other sources. Information on sales of discs is available from Opus and other providers. The RIAA now provides substantial data on its member companies’ current and historical sales activity.

Perhaps the biggest void is data on the volume of unpaid consumption, yet that, too, is changing. Big Champagne has tracked the popularity of copyright-protected works through unpaid distribution channels for a decade. And Google’s recently developed Transparency Report portal provides real-time and historical data on take-down and user data requests.

TABLE 4-1 Data Requirements for Copyright Analysis: An Illustrative Framework

Music
Supply
• data on new records, music tracks including professional, semi-professional, and amateur recordings
• number of concerts (with details on venues, capacities, etc.)
• information on quality of new music recorded
• copyright status of recorded work
Demand
• number of new tickets sold
• music video plays on YouTube and elsewhere
• radio airplay and listening times (including online streaming services like Pandora and Spotify)
• record and Internet sales data
• data on unauthorized use

Performance Artists
Supply
• information on the careers, activities, and income of dancers, performers, musical artists, etc.
Demand
• information on the consumption of artistic performances of various types, and the impact of digitization on that

Original artistic productions
Supply
• information on the careers, activities, and income of originating artists including fine artists, architects, designers, sculptors, etc.
Demand
• information on the consumption of art by museums, collectors, galleries, corporations and the general public

Scientific papers and research reports
Supply
• data on scientific researchers
• data on the activities and finances of scientific publishers
Demand
• data on use of prior research by scientific researchers, by professional practitioners who rely on scientific findings (e.g., physicians), and by the general public

Movies
Supply
• data on new movies, video clips released
• quality measures of new video content
• copyright status of recorded work
Demand
• data from videos taken down from YouTube, Ustream and other video content sites
• cinema attendance numbers
• home movie watching including internet purchases, video rentals, streaming movie services, set-top box consumption, etc.
• data on unauthorized use

Software
Supply
• data on the amount of software produced, value of such software, and its diffusion in various formats
Demand
• data on the use and extension of software by users in both the private and public sectors
• data on user-generated software, including software from open source movement

Content
Supply
• data on publication of new content, by publication type (magazine, newspapers, blogs, websites, etc.)
• copyright status of work
Demand
• readership figures
• time and money spent in consuming content
• ad revenue for publishers
• data on unauthorized use

Another regularly published report, by the International Data Corporation (IDC), shows the size and growth of digital data over time. The numbers include both historical values and future projections as far into the future as 2020. The report shows the cost of information management, the percentages of Internet data that require various levels of security, and the number of people using social networks. The State of the Internet is a quarterly report published by Akamai that provides country-level Internet data. The statistics include Internet attack traffic, average connection speed, and number of unique IP addresses. The same data are available on a state-by-state basis for the United States.

Some copyright data from government and academic institutions have not yet been analyzed. The online U.S. Copyright Office Database contains roughly 20 million records of works registered since 1978 by creators of books, music, films, maps, software, etc. Each record contains the date of creation, date of publication, and basis of the copyright claim. Pre-1978 Copyright Office records are being digitized back to 1923. Another source of copyright data is the Stanford Copyright Renewal Database, which contains renewals of copyrighted books between 1950 and 1992. Each record shows the title, author, renewal date, and renewing entity.

Another category of government data, important for understanding copyright enforcement, is civil infringement suits filed in U.S. Federal District Courts and criminal prosecutions for infringement. These filings provide a record of plaintiffs, defendants, and judgments for cases that proceed through litigation. A private firm, Lex Machina, is preparing copyright litigation data in a form that should be useful to researchers.

We have also identified data in the private sector that can advance our understanding of the impact of copyright laws. The RIAA publishes an annual Music Consumer Profile report that estimates the market size

TABLE 4-2 Existing Data Sources and Stakeholders

Agents: Consumers; Database Name: A2/M2; Source: Nielsen; Frequency: Quarterly
Description
• Analyzes consumer behavior on video related media including TV, Internet, and mobile phones. Discusses what consumers watch, how much time spent, and how trends are changing.
• http://en-us.nielsen.com/main/insights/nielsen_a2m2_three

Agents: Consumers; Database Name: The Diverse and Exploding Digital Universe; Source: IDC; Frequency: One-time
Description
• Calibrates size and growth of digital data through 2011.
• Also explores the impact of scientific industries as well as the environmental footprints of digitization.
• http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf

Agents: Consumers; Database Name: The Digital Universe Decade; Source: IDC; Frequency: One-time
Description
• Estimates the size and growth of the digital universe through 2020. Also looks at the cost to manage information, security issues, and the prevalence of social networks.
• http://www.emc.com/collateral/demos/microsites/idc-digitaluniverse/iview.htm

Agents: Consumers; Database Name: High-speed services for Internet access; Source: FCC; Frequency: Semi-annual
Description
• Provides summary of subscribership data filed by annual providers of high-speed services. Includes details about subscribership differences among census tracts.
• http://www.fcc.gov/wcb/iatd/comp.html

Agents: Consumers; Database Name: Survey data; Source: Pew Internet; Frequency: Annual
Description
• Shows the current demographics of Internet users and the activities they do online. Describes the frequencies of Internet activities.
• http://www.pewinternet.org/Data-Tools/Download-Data.aspx

Agents: Consumers; Database Name: Soundscan, Social Media Report, Television Report; Source: Nielsen; Frequency: Ongoing
Description
• Overall sales and viewership figures on a variety of media platforms, including CD, DVD, consumption of social media, etc. Scattered across multiple reports and Nielsen channels.

Agents: Creators; Database Name: Copyright records; Source: U.S. Copyright Office; Frequency: Ongoing
Description
• Catalogs all registered books, music, art, periodicals, and other works. Includes the date of creation, basis of claim, previous registration, and claimant.
• http://www.copyright.gov/records/

Agents: Creators; Database Name: Copyright renewal database; Source: Stanford University; Frequency: One-time
Description
• Provides searchable copyright renewal records for books published between 1923 and 1963. Contains information on renewing entity, renewal date, and registration date.
• http://collections.stanford.edu/copyrightrenewals

Agents: Distributors; Database Name: 10-K and 10-Q reports; Source: Media distribution companies; Frequency: Annually and quarterly
Description
• Includes financial data such as net income, revenue, and cost of goods sold. Also discloses special events such as CEO departure, bankruptcy, and business risks. Available on company websites and from financial information services.

Agents: Copiers; Database Name: Digital Music Report 2010; Source: IFPI; Frequency: Annual
Description
• One measure of the incidence of global music revenue and the impact of unauthorized use across different domains. Imperfect sales suggest that the decline in global music revenue is a result of unauthorized use with certain regions suffering more than others.
• http://www.ifpi.org/content/library/DMR2010.pdf

Agents: Regulators; Database Name: Music Consumer Profile; Source: RIAA; Frequency: Annual
Description
• Provides benchmark on genre, format, age, and gender of music consumers. Estimates the overall size of the music industry.
• http://www.riaa.com/keystatistics.php

(continued)

OCR for page 45
58 COPYRIGHT IN THE DIGITAL ERA TABLE 4-2  Continued Agents Database Name Source Frequency Regulators The State of the Akamai Quarterly Internet Regulators Fair Use on the Library of One-Time Internet Congress Researchers/ Web of Science Thomson Reuters Ongoing Inventors Researchers/ USPTO Patent USPTO Ongoing Inventors Database for the industry. The market figures are broken down by genre, format, age and gender of consumer, and channel of sales. In addition, music recording companies publish annual financial 10-K reports that contain profit margins and revenues numbers. The same is true of publicly held companies in other copyright-intensive industries such as film, publish- ing, and software. These data can shed light on the stakes involved for copyright regulation. Lastly, there are one-time reports published by government institu- tions and special interest groups to address the issue of digital copyright. The Library of Congress published the Fair Use on the Internet report in 2002, which contains a list of court cases that can help define what is considered fair use and what is not. The International Federation of the Phonographic Industry (IFPI) Digital Music Report 2010 estimates the revenues lost due to music infringement in select countries around the world. Estimates include global revenues for games, music, films, newspapers, and magazines. The report also provides a list of legal music providers for each country. Existing Data with Limited Access Massive amounts of copyright-related data exist but are not readily available for public use for multiple reasons. For example, the records of customer purchases on eBay or Amazon.com can be used to study online consumer behaviors. Due to privacy issues, these data are not easily acces- sible by research institutions and have limited use even for keepers of the

OCR for page 45
DATA INFRASTRUCTURE FOR COPYRIGHT POLICY RESEARCH 59 Description • Includes data gathered across Akamai’s global server network about attack traffic, connection speeds, Internet penetration and broadband adoption. Also aggregates publicly available news and events. • http://www.akamai.com/stateoftheinternet • Assesses the merits of the fair use argument for actions on the Internet. Highlights the difficulty in creating a general guideline for fair use on the Internet. • http://www.fas.org/irp/crs/RL31423.pdf • Scientific publications and citations • http://thomsonreuters.com/products_services/science/science_products/a-z/ web_of_science/ • scientific publications and citations • http://patft.uspto.gov/ data. Another example is the amount and content nature of peer-to-peer file transfers that take place over the Internet. Some of that information exists on peer-to-peer network servers that are operating in questionable legal realms and some on individual personal computer hard drives. For these types of data, the first challenge is simply to identify the sources, then to overcome the legal barriers to access, and agree on protocols to protect privacy, and finally, to aggregate the data into one place. Currently Non-existent Data A full understanding of the digital economy will eventually require collection of additional data that currently do not exist. These data may not be quantitative or even quantifiable. In the Internet realm, with little control and regulation, the data collection process presents many techno- logical challenges. Examples of such data of interest include systematic measures of copyright enforcement, radio playlists for all stations, and licensed use of musical works in television and movies. Closing the Gap We have three suggestions to advance research to inform evidence- based policy making. First, we need to attract social science researchers’ attention to the questions we have identified. 
By forming a cohort of researchers from a wide variety of disciplines and by supporting them with a robust and comprehensive data infrastructure, we can make significant progress on a wide variety of policy issues relevant to copyright.

Second, public and private grant-making organizations should support research that builds the data infrastructure that would support research in this area. They could convene a representative group of researchers, for example, under the auspices of NBER, to further identify, characterize, and prioritize data sources. Funding agencies could then assist researchers in negotiating access to such data and in some cases fund their acquisition from industry stakeholders, perhaps through a research consortium. In many cases, private firms hold data that may be recent enough for some research purposes but obsolete commercially. They might be induced to release these to researchers on a rolling basis.

Third, as we have observed, the federal government needs to expand the collection of data on the digital economy as well as on intangible assets such as intellectual property holdings and their use. This should take several forms. First, agencies such as the Bureau of Labor Statistics and the Bureau of the Census should consider adding copyright-related information to regularly conducted surveys of businesses and consumers. One prime example would be revising the Bureau of Labor Statistics Time Use Survey to address questions of digital consumption in a contemporary way. In the current survey, there is no measurement of time spent listening to music exclusively rather than in combination with other activities. Although private sector sources of data are important, as we have noted, there are significant limitations of current surveys, and the availability of such data is limited for researchers.
The Bureau of Economic Analysis of the Commerce Department has very limited resources to acquire the types of business data described above that could be extremely useful in understanding the landscape of intangible assets.

The committee proposes a more ambitious approach. Agencies such as the Bureau of the Census, Bureau of Economic Analysis, National Science Foundation, U.S. Patent and Trademark Office, and the Copyright Office should form an interagency group that, along with expert advisors, would study the advisability and feasibility of an ongoing and systematic national business survey of intellectual property. Like the Business R&D and Innovation Survey (BRDIS), the IP survey would include samples of businesses in the service and manufacturing sectors. It would probe uses (e.g., licensing) and holdings of intellectual property and costs of acquisition and maintenance. Because of the nature of the production of digital goods, including the prominence of user-generated content, the business survey should be complemented, if at all feasible, by a detailed consumer survey of user-generated content and use. This would include, among other things, measurement of the amount of production and distribution of digital content by non-business entities (i.e., by users), and also measurement of the consumption of such content by both business and the population at large.

Unlike BRDIS, these surveys could be conducted periodically, such as every five years. The Bureau or the National Science Foundation would issue periodic reports of aggregated data, but detailed data would be available to qualified licensed researchers on the same basis as other business confidential information, through the Census data centers. Such survey data could never answer all of the research questions we pose in Chapter 3 but would be a considerable advance on the status quo, greatly contributing to our ongoing efforts to better understand the stock and flow of intangible assets in the economy.

We cast this proposal as a study recommendation because of the constraints of our charge and limitations of our expertise. Although a survey would be especially important for understanding copyrights because of the lack of a formal registration requirement, it would make little sense to mount a survey of copyrights alone, neglecting patents and trademarks. Nevertheless, other forms of intellectual property are outside our statement of work. Equally important, we are not in a position to judge two very important considerations that could render either or both surveys impracticable: the burden they would impose on respondents (e.g., the need for businesses to conduct patent and copyright searches) and the resources needed by agencies charged with carrying them out. The federal statistical agencies are generally tightly budget constrained and are having to cut back activities.

The gap between what would be ideal in terms of data requirements for a thriving research agenda around copyright and what exists currently is large.
Building easily accessible and comprehensive datasets relevant to the study of copyright-relevant industries is crucial for the development of a research community based around copyright issues. We hope the categories of data described in this chapter will help focus efforts to obtain and create high quality datasets for addressing some of the key policy questions described in this report.
