and communication of digital data are done by computers with relatively little human oversight, erroneous data can be rapidly multiplied and widely disseminated. Some projects generate so much data that significant patterns or signals can be lost in a deluge of information. As an example of the challenges posed by digital research data, Box 2-1 explores these issues in the context of particle physics research.
Because digital data can be manipulated more easily than other forms of data can, they are particularly susceptible to distortion. Researchers, and others, may be tempted to distort data in a misguided effort to clarify results. In the worst cases, they may even falsify or fabricate data.
BOX 2-1 Digital Data in Particle Physics
From the invention of digital counting electronics in the early days of nuclear physics, to the creation of the World Wide Web and the data acquisition technology for the Large Hadron Collider (LHC), particle physics has been a major innovator of digital data technology. The LHC, which recently came into operation at the European Organization for Nuclear Research (CERN) in Geneva, has spawned a new generation of data-processing technology. The accelerator collides two beams of protons, resulting in about a billion proton-proton collisions every second. These collisions occur at several points around the 27-km circumference of the circular accelerator. This first step of the process is difficult enough to imagine, but the next steps are even more amazing.
Part of the energy carried by the two colliding protons is converted into matter by fundamental processes of nature. Some of these processes are well understood, but others might represent major discoveries that could deepen our understanding of the universe—for instance, the creation of particles that constitute the so-called dark matter inferred from astrophysical measurements.
The spray of energetic outgoing particles from one such collision is called an event.
The particles in the spray have speeds approaching the speed of light. They fly out of the proton-proton collision point into a surrounding region that is instrumented with an array of sophisticated particle detection devices, collectively called a detector. The detector senses the passage of subatomic particles, creating a detailed electronic image of the event and providing quantitative information about each particle such as its energy and its relation to certain other particles.
Each proton-proton collision generates about 1 megabyte of information, yielding a total rate of 1 petabyte per second. It is not practical to record this staggering amount of information, and so the experimenters have devised techniques for rapidly selecting the most promising subset of the data for exhaustive analysis.
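The quoted petabyte-per-second figure follows directly from the two numbers above. A minimal back-of-the-envelope sketch (in Python, purely illustrative; the variable names are ours, not the experiment's):

```python
# Illustrative arithmetic only: checking the LHC raw data rate described
# in the text. Figures are the approximate values quoted there.

COLLISIONS_PER_SECOND = 1_000_000_000   # about a billion proton-proton collisions per second
BYTES_PER_COLLISION = 1_000_000         # about 1 megabyte of detector data per collision

raw_rate_bytes_per_s = COLLISIONS_PER_SECOND * BYTES_PER_COLLISION
print(f"Raw data rate: {raw_rate_bytes_per_s:.0e} bytes/s")  # prints "Raw data rate: 1e+15 bytes/s"
# 1e15 bytes/s is 1 petabyte per second, matching the figure in the text.
```

A rate this large is why the experiments cannot simply record everything and must instead filter events in real time, as described next.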
Only a tiny fraction of the deluge—perhaps one in a trillion—will be due to new kinds of physical processes of fundamental importance. Once the detector has recorded an event, a high-speed system performs a rapid analysis (within 3 micro-