a model is of both kinds. The distinction between the two kinds of models is not always sharp, because data analysis often rests on basic assumptions that the data fit a specific model. To understand how the two categories are affected by the limitations of computational technology, each is first discussed separately; then the concerns that arise when they are considered together are discussed.

Analysis of Experimental Data

The analysis of genetic data,1 proteome data,2 morphologic data,3 and neuroimaging data4 is the most common use of computation in neuroscience. Computing has played a critical role in enabling the technologies that produce these data, a role that ranges from acquiring the data to creating the algorithms used to tease signals out of the data.

The hardware requirements for analyzing large data sets are relatively straightforward. Applications based on database search and local alignment depend on a style of high-performance computing (HPC) known as “embarrassingly parallel.”5 This style generally requires between 10 and 1,000 identical servers, each of which is given a portion of the search and alignment task to accomplish. Embarrassingly parallel computing requires little coordination among servers. The cost of HPC clusters is now well within the reach of individual departments and research groups owing to the emergence of Beowulf-class computer clusters, which are built from commodity hardware running the Linux operating system and open-source software.6
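The pattern described above can be sketched in a few lines. The following is a minimal illustration, not a production alignment tool: a hypothetical motif search is split into independent portions, each handled by a separate worker process, with no communication between workers beyond scattering the inputs and gathering the results. The query motif and sequence data are invented for the example.

```python
# Minimal sketch of an "embarrassingly parallel" search task.
# Each worker independently scans its own share of the sequences;
# workers never communicate with one another.
from multiprocessing import Pool

QUERY = "GATTACA"  # hypothetical query motif

def find_matches(sequences):
    """Return (sequence, offset) pairs where the query motif occurs."""
    hits = []
    for seq in sequences:
        pos = seq.find(QUERY)
        while pos != -1:
            hits.append((seq, pos))
            pos = seq.find(QUERY, pos + 1)
    return hits

def parallel_search(sequences, n_workers=4):
    # Split the data set into roughly equal, independent portions.
    # The only coordination is distributing inputs and collecting results.
    chunks = [sequences[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        per_chunk_hits = pool.map(find_matches, chunks)
    return [hit for chunk in per_chunk_hits for hit in chunk]

if __name__ == "__main__":
    data = ["AAGATTACAGG", "CCCCCC", "GATTACAGATTACA"]
    print(parallel_search(data, n_workers=2))
```

Because each chunk is processed independently, adding servers (or processes) scales the work almost linearly, which is why this style of computing maps so naturally onto commodity clusters.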

The challenges lie in the development of software for the analysis of large data sets, which requires advances in the science of bioinformatics, also known as computational biology. Bioinformatics is a multidisciplinary field at the intersection of computer science, statistics, and molecular biology. Neuroscience


1. Sources of genetic data include microarrays, whole-genome sequences, and epigenetic changes, among others.


2. An example of proteome data is data produced by mass spectrometry.


3. Morphologic data come from the quantitation, phenotyping, and localization of cells over time.


4. Techniques that generate functional neuroimaging data include electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and near-infrared spectroscopy (NIRS).


5. “Embarrassingly parallel” tasks typically involve solving an identical problem many times over with different starting configurations or different random seeds. In these cases, there is generally little or no communication between the processors. For additional information, see the Web site of the University of Melbourne’s Department of Computer Science and Software Engineering at http://www.cs.mu.oz.au/498/notes/node40.html. Last accessed on January 18, 2008.


6. “Beowulf cluster” describes a set of identical computing nodes connected somewhat loosely to enable communication among them. In general, the individual nodes are off-the-shelf computers connected to one another through a commodity means (usually just Ethernet). For additional information, see the Beowulf Project Overview Web site at http://www.beowulf.org/overview/index.html. Last accessed on January 18, 2008.

Copyright © National Academy of Sciences. All rights reserved.