10. Three Legal Mechanisms for Sharing Data
Sarah Hinchliff Pearson¹
Creative Commons
Sharing data today can be easy: you can simply post them on the web. But doing so means losing
some control over the data, including control over whether you will be accurately and properly
credited. This is obviously the case when you share data without a related license, contract, or
waiver. As I will explain, it is true to a certain extent even when one of those legal mechanisms is used.
I will begin by defining some terms. For purposes of this presentation, attribution, credit, and
citation all have distinct meanings. Attribution refers to the legally imposed requirement to
acknowledge the rights holder when the data are copied or reused in a specified manner. The remedy
against someone who fails to attribute is a lawsuit, based on either breach of contract or
infringement of an intellectual property right, depending on the legal mechanism used to impose
the attribution requirement. Credit, on the other hand, is what we all want: explicit recognition
for our contribution to someone else's work. Finally, there is citation, which is rooted in the norms
of scholarly communication. The purpose of citation is to support an argument with evidence.
However, citation has also become a proxy for credit, albeit an imperfect one.
This is an important starting point. It reminds us that legal attribution requirements do not
necessarily match our expectations for receiving credit, nor do they perfectly map to accepted
standards of citation. When the remedy for failure to attribute is a lawsuit, we are well served to
recognize this incongruity. With that in mind, let us turn to the law.
There are three main legal mechanisms for sharing data: licenses, contracts, and waivers.
Whenever data are shared, there is a possibility they will not be properly cited upon reuse.
Licenses and contracts attempt to eliminate this risk by imposing legal attribution requirements.
Waivers, however, do not legally impose attribution. Instead, they rely on community norms to
ensure proper citation. There are consequences to each of the three approaches. I will address
each below.
Licenses
We will start with the approach for which Creative Commons is best known: licenses. Licenses
operate by granting permission to copy, distribute, and adapt data on certain conditions. In all
Creative Commons licenses, one of those conditions is attribution. A license sounds a lot like a
contract because it grants permission to use data under certain conditions. However, the two are
quite different, because a license is built on an underlying exclusive right. Therefore, to
understand the scope of a license, you have to understand the scope of the underlying right. In the
context of sharing scientific data, the rights involved are typically copyright or database rights.
¹ Presentation slides are available at http://sites.nationalacademies.org/PGA/brdi/PGA_064019.
DEVELOPING DATA ATTRIBUTION AND CITATION PRACTICES AND STANDARDS
We will begin by taking a closer look at copyright law. Copyright law grants a bundle of
exclusive rights to creators of original works at the moment the work is fixed in a tangible
medium. In plain terms, that means copyright is granted automatically once you write your
work down or enter it into a computer.
Copyright is limited in scope and duration, and the specific limitations vary by country. For
scientific data, the most important limitation of copyright is that copyright never extends to facts.
Copyright does, however, extend to a collection of facts if they are selected, coordinated, or
arranged in an original way. The required threshold of originality is low.
Even among copyright lawyers, there is significant uncertainty about where the line of copyright
protection falls. To complicate matters further, the line varies somewhat from country to
country.
Determining what is subject to copyright is only the first hurdle. The next task is identifying the
scope of copyright protection. Even when a database or a collection of facts is subject to
copyright, the facts themselves remain in the public domain. This means that the general rule in
the U.S. and elsewhere is that data can be extracted from a copyrighted database without
infringing copyright law.
That is not true, however, in the European Union (EU). In the EU and a few other countries,
governments have implemented what are called sui generis ("of their own kind") database rights.
These rights allow a database maker to prevent the extraction and reuse of a substantial part of
the contents of a database, even if the contents are otherwise in the public domain.
A license can be built on copyright, database rights, or both. Creative Commons ("CC") licenses,
for example, are copyright licenses. If a CC license is applied to a database, it covers both the
data and the database, each to the extent it is subject to copyright. Any use of the data or
database that implicates copyright requires attribution. Any use of the data that does not
implicate copyright (for example, because the data are in the public domain) does not require
attribution, even if it triggers database rights.
Because of the difficulty of deciphering the contours of copyright protection in scientific data
and databases, it is very hard for both the data provider and data user to know when the license
applies and when it does not. In other words, it is difficult to know when attribution is legally
required. This creates a number of risks.
For one, it creates the risk that data providers will be misled about what they are getting when
they apply a license to their data. They may believe that, once a license is applied, any use of
the data will require attribution. As I explained earlier, that is not the case. If the data are
in the public domain, or if the use of copyrighted data falls under fair use, the attribution
requirement is not triggered.
It also creates the risk that data users (the licensees) will misjudge their attribution
requirements because of the difficulty in determining when copyright applies. They may
under- or over-comply with the license without realizing it. Either situation can be problematic:
under-compliance exposes users to liability, while over-compliance adds needless burden.
In addition to the legal uncertainty, licenses also create the risk of imposing burdensome
attribution requirements. In the science context in particular, projects often rely on data gathered
from a variety of different sources. Depending on the licenses used, a project could be required to
attribute every individual or institution that contributed any piece of data to it.
This is a problem we call attribution stacking.
This raises yet another potential problem with attribution. Attribution obligations written into a
license are, by their nature, inflexible. No lawyer can anticipate every situation in which the
attribution requirements would be triggered and account for all of the circumstances in which
they will be applied. This can create some absurd situations where, for example, a user or
aggregator of data may technically be required to attribute 1,000 different data providers, each in
the idiosyncratic manner its rights holder has dictated. Conceivably, the user could do all
this and still not satisfy people's expectations for receiving credit or accepted standards of
citation.
Contracts
The next legal mechanism for requiring attribution is contract law. Contracts can have different
names and take a lot of different forms, but they are often called data use agreements or data
access policies.
Unlike a license, a contract does not necessarily require an underlying intellectual property right.
Technically, it requires a few legal formalities, including an offer and acceptance. In practice,
this sometimes takes the form of an online agreement, where the user has to click to accept the
terms to access the data. Other times the user is presumed to have accepted the terms by continuing
to use the site. If you read those terms, you may find that they require attribution.
Like licenses, contracts have a number of potential downsides. For one, they are likely to impose
confusing obligations on users who get data from a variety of sources, each subject to a different
user agreement. This problem is even more pronounced with contracts, because public licenses are
at least somewhat standardized. User agreements are not, which means each data source
likely has a different user agreement, filled with legalese imposing attribution and other
obligations on users. The consequence is that some data sources may go unused simply because
users cannot understand the terms.
Another limit of contract law is that it binds only the parties to the agreement. That may sound
obvious, but the same is not true of licenses. If someone obtains licensed data and shares them,
the person who obtains them from that second user is still bound by the conditions of the
license. If the data were shared by contract alone, the person who obtained the data from the
second user would not be bound by the terms of the contract, because they were not a party to the
original agreement. In this respect, contracts have a more limited reach than licenses.
In a different respect, contracts have a broader reach than licenses. Because they are not tied to
an underlying right, contracts can impose obligations on actions that are not restricted by
copyright or database rights. The effect could be to restrict or take away important rights granted
to the public. For example, in 2011, the Government of Canada launched an open data portal
with a related contract controlling access to the data. This agreement initially had a provision that
forbade any use of the data that would harm the reputation of Canada. The provision created
an uproar and was changed within a day. Nevertheless, this example shows the potential for
overreaching. This sort of thing is particularly troublesome in the context of standardized
contracts, where the terms are rarely read and almost never negotiated.
Waivers
The last legal mechanism is the waiver. Waivers can take many forms, but the purpose is to
dedicate the data to the public domain.
Waivers are not enforceable in every jurisdiction. To deal with this problem, CC has created a
tool called CC0 (pronounced "CC Zero") that uses a three-layered approach designed to make it
operable worldwide. The first layer is a waiver of copyright and all related rights. If the waiver
fails, CC0 has a fallback license that grants all permissions to the data without any conditions.
As a final backup, CC0 contains a non-assertion pledge, in which the rights holder promises not to
assert rights in the data.
Obviously, waiving rights to a dataset means the provider no longer has control over it. Among
other things, the data provider cannot require attribution (although they can certainly
encourage it). Yet, as mentioned above, nearly every approach requires giving up some measure of
control over the data. Waivers also provide legal certainty in a way that contracts and licenses do
not. There is no need to try to decipher the scope of copyright protection or consult a lawyer. Nor
is there a need to parse the legalese of a variety of different user agreements. Note that this
certainty does not exist when data are released without any legal mechanism. The silent approach
leaves people guessing about whether property rights exist in the dataset and whether they risk
liability by using it.
To summarize, each approach has consequences. With licenses, we face legal uncertainty about
the scope of the license, and we risk imposing attribution requirements that are inconsistent with
relevant community norms and expectations. With contracts, we gain some measure of legal
certainty, but we risk imposing even more burdensome attribution obligations as each institution
or data provider creates its own contractual terms. Contracts also pose the risk of overreaching
and imposing obligations that may restrict important rights of users. Waivers avoid the problems
associated with licenses and contracts, but they require giving up control.
It is important to remember that no mechanism can impose legally binding obligations in a way
that perfectly maps to our expectations for receiving credit or to accepted standards of citation.
By trying to use the law to control reuse, we risk imposing unnecessary transaction costs on data
sharing. We also risk pushing people away from our data sources. Choosing the right approach
requires an understanding of the consequences. The conversation at this workshop is a good start.