Based on the models I have been discussing and the needs of bioinformatics, I want to propose a set of goals against which we can measure these legal regimes.
Interoperable: data from many sources can be combined without restriction
Reusable: data can be repurposed into new and interesting contexts
Administrative Burden: low transaction costs and administrative costs over time
Legal Certainty: users can rely on legal usability of the data
Community Norms: consistent with community expectations and usages
The first goal is interoperability. The question is: Can data from different sources be combined? We have seen that the ability to combine data is very important for bioinformatics. You cannot link together knowledge that leads to new discoveries if you are not aware that such knowledge exists. While the growing cost of scientific periodicals has been widely discussed, the most important issue for scientists is not cost alone, but accessibility and searchability. In other words, the problem is interoperability of knowledge.
Public Domain ****
Can be combined with other data sources with ease
Community Licenses *** / **
Depends on type of license: share-alike or copyleft licenses are unsuitable, but attribution-only licenses are less problematic
Private Licenses * / **
Depends on restrictions, but not scalable: the number of permutations is too large
Transaction costs and the administrative burden are significant barriers to data integration. What are the costs, not only for any specific transaction, but over time? Even something as simple as an attribution requirement, where you are required to cite each source, can become a huge burden if you are dealing with thousands of different data sources or millions of data elements.
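To make the scale of that attribution burden concrete, here is a minimal sketch in Python. The function name, the simple counting model, and the figures are all assumptions chosen for illustration; they are not drawn from any particular license or database.

```python
def citations_owed(sources: int, elements_per_source: int, per_element: bool) -> int:
    """Count the citations a combined dataset would owe.

    Illustrative model only: assumes every source contributes the same
    number of elements and that attribution is owed either once per
    source or once per data element.
    """
    if per_element:
        return sources * elements_per_source
    return sources


# Hypothetical figures: a small merge versus a large-scale integration.
small = citations_owed(sources=5, elements_per_source=1000, per_element=False)
large = citations_owed(sources=5000, elements_per_source=1000, per_element=True)

print(small)  # 5
print(large)  # 5000000
```

Under these assumed numbers, the same attribution rule that costs five citations in a small project costs millions once the data are integrated at scale, and the obligation recurs every time the combined data are redistributed.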