We will begin by taking a closer look at copyright law. Copyright law grants a bundle of exclusive rights to creators of original works at the moment the work is fixed in a tangible medium. In non-legalese, that means copyright is granted automatically once you write your work down or enter it into the computer.
Copyright is limited in scope and duration, and the specific limitations vary by country. For scientific data, the most important limitation of copyright is that copyright never extends to facts. Copyright does, however, extend to a collection of facts if they are selected, coordinated, or arranged in an original way. The required threshold of originality is low.
There is significant uncertainty, even among copyright lawyers, about where the line of copyright protection falls. To complicate matters further, this line varies somewhat according to the laws of each country.
Determining what is subject to copyright is only the first hurdle. The next task is identifying the scope of copyright protection. Even when a database or a collection of facts is subject to copyright, the facts themselves remain in the public domain. This means that the general rule in the U.S. and elsewhere is that data can be extracted from a copyrighted database without infringing copyright law.
That is not true, however, in the European Union (EU). In the EU and a few other countries, governments have implemented what are called sui generis (“of their own kind”) database rights. These rights allow a database maker to prevent the extraction and reuse of a substantial part of the contents of a database, even if the contents are otherwise in the public domain.
A license can be built atop copyright or database rights or both. By way of example, Creative Commons (“CC”) licenses are copyright licenses. If a CC license is applied to a database, it covers both the data and the database, each to the extent it is subject to copyright. Any use of the data or database that implicates copyright requires attribution. Any use of the data that does not implicate copyright (for example, because the data are in the public domain) does not require attribution, even if it triggers database rights.
Because of the difficulty of deciphering the contours of copyright protection in scientific data and databases, it is very hard for both the data provider and data user to know when the license applies and when it does not. In other words, it is difficult to know when attribution is legally required. This creates a number of risks.
For one, it creates the risk that data providers will be misled about what they are getting when they apply a license to their data. They may believe that if they apply a license to their data, any use of the data will require attribution. As I explained earlier, that is not the case. If the data are in the public domain, or if the use of copyrighted data falls under fair use, the attribution requirement is not triggered.
It also creates the risk that data users (also referred to as licensees) will misjudge their attribution requirements because of the difficulty in determining when copyright applies. They may under- or over-comply with the license without realizing it. Either situation can be problematic.