To apply this idea to scientific information artifacts, one creates a set of conventions for syntactic and semantic compatibility among components, together with a standard packaging mechanism that makes selecting and installing components easy. One starts with the primary sources (databases, knowledge bases, and the like), applies a script to normalize them, and produces a packaged component. The resulting “binary” may or may not be collected with others to make a distribution. Someone creating a local installation optimized for local querying obtains the needed components from one or more distributions and installs them into their own environment.
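The workflow above can be illustrated with a minimal Python sketch. It is not the NeuroCommons tooling: the `Component` type, the `normalize` and `install` helpers, the `http://example.org/` base URI, and the sample records are all hypothetical, chosen only to show how a shared naming convention lets independently built components be selected and merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A packaged, normalized unit of data (hypothetical structure)."""
    name: str
    triples: tuple  # (subject, predicate, object) strings


def normalize(name, records, base_uri):
    """Turn raw source records into triples under one shared URI convention."""
    triples = []
    for rec in records:
        # Syntactic convention (assumed): identifiers are trimmed and lowercased.
        subject = f"{base_uri}{name}:{rec['id'].strip().lower()}"
        for key, value in sorted(rec.items()):
            if key == "id":
                continue
            predicate = f"{base_uri}vocab:{key}"
            triples.append((subject, predicate, str(value).strip()))
    return Component(name=name, triples=tuple(triples))


def install(distribution, wanted):
    """Merge only the requested components into a local triple store."""
    store = set()
    for name in wanted:
        store.update(distribution[name].triples)
    return store


BASE = "http://example.org/"  # hypothetical base URI, not a real namespace

# Two primary sources, each normalized into its own component.
genes = normalize("gene", [{"id": " G1 ", "label": "example gene"}], BASE)
diseases = normalize("disease", [{"id": "D7", "label": "example disease"}], BASE)

# A distribution is simply a collection of components.
distribution = {c.name: c for c in (genes, diseases)}

# A local installation picks only the components it needs.
local_store = install(distribution, ["gene"])
```

Because every component follows the same conventions, installing a subset requires no knowledge of how the other components were built.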
Some two dozen components have been created and collected in the NeuroCommons framework. The components are independent and the architecture is open, so anyone may pick and choose the ones they like without having to take all of them. One may create new components and either add them to the distribution (subject to quality control), create a new distribution, or just use them privately. Currently the NeuroCommons distribution is available either as a set of RDF files or as a database dump.
Bio2RDF (http://bio2rdf.org) is an open-source project that aims to facilitate biomedical knowledge discovery using Semantic Web technologies. Bio2RDF is an important contributor to the Linked Data Web, integrating over 30 major biological databases whose content ranges from biological sequences (such as those stored in UniProt, GenBank, RefSeq, and Entrez Gene), structures (from the Protein Data Bank), pathways and interactions (cPath), and diseases (OMIM) to community-developed biomedical ontologies (OBO).
This project builds on W3C standards for sharing information over the existing Web architecture and for representing biomedical knowledge in standardized logic-based languages. Powered by open-source tools, Bio2RDF enables scientists not only to explore manually curated and computationally aggregated knowledge about biological entities but also to link their own data, so that all scientists can ask fairly sophisticated questions across distributed, but integrated, biomedical resources. Bio2RDF linked data are available today as N3 files, indexed Virtuoso databases, and SPARQL endpoints across three mirrors located in Canada and Australia.
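As a rough illustration of how a client might pose a question to a SPARQL endpoint, the following Python sketch builds a request according to the W3C SPARQL Protocol, which passes the query text in a `query` parameter. The endpoint URL, the resource URI, and the helper names `build_sparql_url` and `run_query` are assumptions made for this example; they are not actual Bio2RDF addresses or APIs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (SPARQL Protocol 'query' parameter)."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint, query, timeout=30):
    """Send the query and parse a SPARQL JSON results document.

    Not called in this sketch, because the endpoint below is a placeholder.
    """
    request = Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint and resource URI, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?p ?o WHERE { <http://example.org/gene:g1> ?p ?o } LIMIT 10"

url = build_sparql_url(ENDPOINT, QUERY)
```

With a real endpoint substituted for the placeholder, `run_query(ENDPOINT, QUERY)` would return the variable bindings as a Python dictionary, assuming the server supports the JSON results format.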
With interest growing in the Bio2RDF data and services beyond the initial developers, the group is fielding requests to add more than 50 additional data sources in the areas of yeast and human biology, toxicogenomics, and drug discovery.