8
EMERGING METRICS AND MODELS
Continuing progress in measuring the returns on research
investments requires new metrics and models to analyze how the inputs
to research are converted into both short-term outputs and long-term
impacts. NSF Director Subra Suresh provided the context for this
discussion in a lunchtime keynote address that described five themes
guiding NSF’s investment decisions. Two separate sessions at the
workshop included seven speakers who examined specific tools and
approaches, from the creation of a science policy infrastructure at NSF to
visual analytics that can probe data sets for unexpected findings.
ASSESSING RESEARCH AT NSF
Traditional measures of research outputs provide only a partial
picture of the state of scientific research in the United States, said NSF
Director Subra Suresh during his keynote address at the workshop. For
example, if the percentage of scientific publications were extrapolated
into the future based on the trends of the last few years, China’s
percentage would surpass that of the United States in 2013 or 2014.
Publications are only one metric, Suresh acknowledged, and their impact
is a matter of debate, but “agencies like NSF are looking at the
significance, or lack thereof, of these kinds of metrics.”
Taking a different metric, the United States led the world until 2000
in R&D expenditures as a fraction of GDP. But in that year three
major competitors (Germany, Japan, and South Korea) surpassed the
United States, and several smaller countries have done so since. Other
countries, such as China and Singapore, are investing very heavily in
science and engineering research.
70 MEASURING THE IMPACTS OF FEDERAL INVESTMENTS IN RESEARCH
With the increasing globalization of research, metrics of the United
States’ competitive edge will inevitably change. But such changes raise
the question, said Suresh, of “what kind of metrics do we put in place so
that we can position ourselves most appropriately for the future?”
At the National Science Foundation, this question should be
considered within the context of five broad themes that are guiding the
agency. First, science has entered what Suresh called a “new era of
observation.” Digital technologies make it possible to generate data at an
unprecedented pace. These data, along with new computational tools, are
creating both tremendous excitement and new problems. NSF is devoting
considerable effort to the development of cyberinfrastructure that can
take advantage of these opportunities and solve the problems. In
particular, cyberinfrastructure provides new capabilities for assessment
of research. For example, the agency is asking what kinds of capabilities
it can put in place in situations where the research community uploads
data and information automatically. Researchers already have many
responsibilities, and NSF has to be careful not to impose unfunded
mandates on the community, said Suresh. But cyberinfrastructure makes
it possible to store, integrate, sort, extract, and permanently archive
information. How can this information best be used while protecting the
integrity and confidentiality of the scientific process, Suresh asked. How
can NSF work with other federal agencies and with its counterparts
around the world to use this information to move science and education
forward?
A second important opportunity, according to Suresh, is to integrate
data and ideas from the social sciences and from the natural sciences. As
an example, Suresh described NSF-sponsored research that identified the
potential economic benefits of auctioning off portions of the
electromagnetic spectrum. The 2012 federal budget projected that such
auctions would yield approximately $28 billion over the next
decade, with $10 billion of that set aside for budget deficit reduction.
“That’s a tangible contribution to policy of social sciences research
sponsored by NSF some 20 years ago,” Suresh said. The social sciences
research being sponsored by NSF offers many similar opportunities to
leverage natural sciences research. In the context of clean energy, for
example, Suresh has been talking with officials at the Department of
Energy on how social, behavioral, and economic research sponsored by
NSF can contribute to research supported by the department.
A third opportunity is to expand research partnerships both within
the United States and internationally and through people exchanges as
well as virtually through digital technologies. As NSF lacks the
capability to engage in multiple bilateral relationships with many
countries, Suresh has been exploring how NSF can work with private
foundations and with multilateral bodies such as the G20 countries to
enhance international cooperation.
Suresh’s fourth theme was the need to continue investing in the
development of human capital, especially the STEM workforce, not just
for the United States but for the world. Since 1952, Suresh noted, NSF
has funded 46,000 graduate research fellows. In 2010 it doubled the
number of graduate fellows to 2,000 per year and kept the number at
2,000 in 2011. In addition, the stipend was increased from $10,500 to
$12,000, and NSF’s goal is to sustain that level of support into the future.
NSF’s initial graduate fellows would be well into retirement by now.
How were their careers shaped by NSF’s support? Have the fellowships
helped women and underrepresented minority groups over the past 58
years? What effect have career awards and young investigator awards
had on researchers? New computer technologies could gather
information to help answer some of these questions and shape human
capital policies within the financial constraints expected in the future.
A fifth theme was the need to measure the impacts of NSF-funded
research intelligently and over a long period of time. Although a good
deal of the research NSF funds has purely scientific motivations, some of
it has helped generate entirely new industries making significant
contributions to the economy, Suresh observed. How can NSF help
match the products of research with the needs of the marketplace without
taking money away from fundamental research? How can the agency
reconcile the short-term economic focus of the country and its elected
leaders with the long-term benefits of basic research? How can NSF best
articulate the benefits of basic research funding over the course of
decades for the American public and the global society? Suresh
suggested that a possible model could be the studies of higher education
institutions’ contributions to the economy of the Boston area. He also
cited the number of startup companies that have emerged in part from
NSF-funded nanoscience and engineering centers. In addition, he
recounted physicist Michael Faraday’s response to William Gladstone
when asked about the practical value of electricity. Faraday replied, “One
day, sir, you may tax it.”
Suresh concluded his remarks with an invitation to workshop
participants to make suggestions to NSF on its policies and programs:
What new kinds of programs need to be put in place to take advantage of
current opportunities? Should NSF’s merit review process be changed to
recognize truly transformative multidisciplinary research? Can NSF
promote family-friendly policies that will enable women in much greater
numbers to join the STEM workforce? Such input “would be enormously
helpful,” Suresh said.
THE STAR METRICS PROJECT
In 2005, OSTP Director John Marburger observed at an AAAS
policy forum that he found it very difficult to provide an evidence-based
answer to the question, “How can the federal government optimize its
investments in science?” An interagency working group under the title of
Science of Science Policy came to a similar conclusion in 2008, noting
that no solid theoretical and empirical basis exists for deciding the level
or allocation of scientific investments.
Those observations, along with the establishment of the Science of
Science and Innovation Policy (SciSIP) program at NSF, culminated in
an initiative to build a data infrastructure that would help answer the
questions posed by Marburger and the interagency group. SciSIP
Director Julia Lane described this system, known as STAR Metrics, at
the workshop.
The Motivation for STAR Metrics
The motivation behind the system is threefold, said Lane. First, a
principle of good government is that officials should be able to document
the results of government spending. Instead, she said, most agencies are
unable to document which researchers are supported, let alone the
results of their work. Second, agencies need to be responsive to
stakeholders, and the Office of Management and Budget, Office of
Science and Technology Policy, and Congress are all asking for data.
Third, the utility of the data requires new analytical approaches and the
use of cutting-edge technologies. “Relying on manual and burdensome
reporting simply doesn’t make sense.”
What is STAR Metrics?
STAR Metrics is a federal and university partnership to document
the outcomes of science investments to the public. It is an OSTP
initiative partnering with NIH, NSF, DOE, and EPA that is divided into
two phases. Phase I involves establishing uniform, auditable, and
standardized measures of the initial impact of ARRA and base budget
science spending on job creation. Phase II calls for the collaborative
development of measures of the impact of federal science investments on
the creation and diffusion of scientific knowledge (through publications
and citations), economic growth (through patents, start-ups, and other
measures), workforce development (through student mobility and
employment), and social outcomes such as health and the environment.
This represents what Lane termed a “sea change” from the current
data infrastructure on public science. For 50 years, the science agencies
have essentially been proposal processing and award administration
factories, she said. They apply labor and capital to the receipt of
proposals, the awarding of grants and contracts, and the management of
their performance. The proposal or award is not a behavioral unit of
analysis but an intervention. The behavioral unit of analysis is the
individual scientist. There is a pressing need, said Lane, to restructure
the data system to “look at the human beings who are affected by science
funding and try to explain their behavior.”
Nevertheless, observed Lane, it makes less and less sense to talk
about the outcome of an individual award. Increasingly, the relevant unit
of analysis is a cluster of researchers, a scientific field or subdiscipline,
or an entire research agenda. In addition, principal investigators typically
get funding from a stream of activities, so being able to identify the
incremental impact of an individual award is extraordinarily difficult.
This has implications for the structure of the data within the agencies.
“You have to capture the activities of the scientists over their entire
period of activity, not just the period of the award.” Finally, the
outcomes of many awards occur long after the administration of the
award. Unless this long-term benefit is measured, the impact of a
scientific investment will be underestimated.
Capturing Data
In the twenty-first century, almost all scientific activity occurs
electronically, yet reporting of scientific activities is often still done
manually. “Submitting data that are in PDF format that are unstructured
and unsearchable means that you miss enormous amounts of what’s
going on,” said Lane.
In phase I, the STAR Metrics program sought to capture who is
being supported by scientific funding without burdening researchers. It
did that by using the internal administrative records of researchers’
institutions to capture that information as it flows from one place to
another. STAR Metrics receives 14 administrative data elements from
awards, grants, human resources, or finance systems on a quarterly basis.
Phase I began with a pilot project at six institutions. Since then, 75
institutions have joined on a voluntary basis. The data need not be
personally identifiable.
As an example of the information that can be generated in phase I,
Lane cited data on full-time equivalent (FTE) positions. The data yield
quarterly reports on FTE jobs generated by ARRA, total FTE jobs and
positions, FTE jobs generated through subawards and among vendors,
and jobs generated through overhead payments. “For the first time, for
each institution, we’re able to document how many people are
supported,” Lane said. Faculty are only a small proportion (about 20
percent) of the FTEs that are supported. Support services, graduate
students, postdoctoral fellows, undergraduate students, and others
represent 80 percent of the supported positions. An FTE may represent
several supported students. The data also make it possible to calculate the
total number of individuals supported by research funding, along with
the number of positions supported outside universities through vendor
and subcontractor funding. “Not a single PI lifted a pen or typed a
keyboard to enable us to pull this information, yet the information is very
powerful and can be used to inform federal and state lawmakers.”
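The kind of tabulation Lane described can be illustrated with a minimal sketch. It assumes a hypothetical payroll extract in which each row records the fraction of one person's time charged to an award; the field names and values are invented for illustration and are not the actual STAR Metrics data elements.

```python
from collections import defaultdict

# Hypothetical quarterly payroll extract, one row per supported person:
# (award_id, occupation, fraction_of_time_charged_to_award).
# All names and numbers here are illustrative assumptions.
records = [
    ("A-001", "faculty", 0.25),
    ("A-001", "graduate student", 0.50),
    ("A-001", "graduate student", 0.50),
    ("A-001", "postdoctoral fellow", 1.00),
    ("A-002", "faculty", 0.25),
    ("A-002", "undergraduate student", 0.25),
    ("A-002", "undergraduate student", 0.25),
    ("A-002", "research support", 1.00),
]


def fte_by_occupation(rows):
    """Sum the fraction of time charged to awards, grouped by occupation."""
    totals = defaultdict(float)
    for _award, occupation, fraction in rows:
        totals[occupation] += fraction
    return dict(totals)


def headcount_by_occupation(rows):
    """Count supported individuals (one row per person) by occupation."""
    counts = defaultdict(int)
    for _award, occupation, _fraction in rows:
        counts[occupation] += 1
    return dict(counts)


fte = fte_by_occupation(records)
headcount = headcount_by_occupation(records)
total_fte = sum(fte.values())
faculty_share = fte["faculty"] / total_fte

print(f"Total FTEs: {total_fte:.2f}")
print(f"Faculty share of FTEs: {faculty_share:.1%}")
print(f"Individuals supported: {sum(headcount.values())}")
```

In this invented extract, one graduate student FTE corresponds to two individuals, which is the distinction between FTEs and head counts noted above.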
Future Plans
The next step in STAR Metrics’ development is to develop the main
features of the phase II platform that will compile information from
individual researchers, commercial publication databases, administrative
data, and other sources to capture as much information about scientific
activities as possible. Federal policymakers, agency officials, research
institutions, and investigators “will have a common and coherent system
of understanding what they’re doing and the impact of what they’re
doing,” Lane said.
RECONSTRUCTING NETWORKS OF DISCOVERY
The media have been questioning the return on federal research
investments, noted Stefano Bertuzzi from the Office of Science Policy
Analysis in NIH’s Office of the Director. A 2008 article in Newsweek
concluded that “judging by the only criterion that matters to patients and
taxpayers— not how many interesting discoveries have been made, but
how many treatments for disease the money has bought— the return on
investment to the American taxpayer has been approximately as
satisfying as the AIG bailout.” A more recent article in Nature entitled
“What Science Is Really Worth” ran under the tagline, “Spending on
science is one of the best ways to generate jobs and economic growth,
say research advocates. But the evidence behind such claims is patchy.”
Building an Empirical Framework
Continuing the discussion of STAR Metrics, Bertuzzi described it as
a way of combining and linking input measures with economic,
scientific, and social outcomes. For example, when a new discovery or
technology is licensed to a company, the license represents a return on
research investments. STAR Metrics would “unpack what is inside the
black box of the licensing,” said Bertuzzi.
Bertuzzi demonstrated a prototype tool based on the discovery of
drugs for rheumatoid disease. These are transformative drugs that can
seem to bring people back from near death, and they generate billions of
dollars in sales each year. Using information from STAR Metrics, it is
possible to trace the developments that led to these drugs using the
scientist as the unit of analysis.
The scientific story began with fundamental research on
inflammation, which led to the discovery of tumor necrosis factor (TNF).
Further research on molecular mechanisms involving TNF gave rise to
several different drugs that work in different ways to reduce
inflammation.
STAR Metrics data show the levels of public and private funding
for this research as based on funding attributions in publications related
to TNF. Funding began largely in the public sector at NIH and then
decreased over time as private funding increased. The data also yield an
interactive website that presents a timeline of milestone events that led to
the approval of specific drugs. Clicking on an event in the timeline
produces a list of the scientists involved in publishing key papers.
Clicking on the paper pulls up a brief CV along with highlights of the
discovery and funding sources. Further links connect scientists with
patent databases and other information.
The links among scientists, discoveries, publications, patents, and
other information form networks that allow the process of discovery to
be visualized. Interactive websites make it possible to explore the
network to uncover collaborations, institutional connections, linked
events, and other aspects of innovation. “We will be able to collect,
through federal-wide profiles, what the scientists themselves tell about
their stories, their interests, and their discoveries,” said Bertuzzi. STAR
Metrics will make it possible to “disentangle and unpack all the
complexity of the network that eventually led to that particular
discovery.” A potential practical application would be to look for the
common features of successful discovery processes and then try to
replicate them.
CREATING KNOWLEDGE FROM DATA
The outputs of research historically have been viewed as consisting
of papers, patents and human resources, noted Ian Foster, Arthur Holly
Compton Distinguished Service Professor and Chan Soon-Shiong
Scholar at the University of Chicago. Papers document ideas, patents
establish ownership rights, and human resources constitute people who
are trained in ideas and in methods.
Today, said Foster, large amounts of human intellectual capital are
being captured in other forms, especially as data and computer
software. These resources also capture ideas and methods that can be
transferred from one person to another. Such resources have been
growing explosively. In 2001, according to an annual report from the
journal Nucleic Acids Research on the number of publicly available,
high-quality databases in molecular biology, there were 96 molecular
biology databases. In 2010, there were 1,070, and in 2011 there were
1,330. Some of these databases have tens of millions of entries and
billions of bytes of nucleic acid information. “Historically, we might
have thought of people as conducting an experiment, writing it up, and
putting the results into a paper which other people would read, build on,
and perhaps cite in their publications. Clearly, consulting databases
rather than the literature has become a primary means of accessing the
work of other investigators.”
In addition, an expanding set of online services provide access to
software. “Web services” is a term often used to refer to the software that
is made available over the internet by standardized protocols. One
registry lists 2,053 services provided by 148 providers. Some of these
provide very simple functions, but others provide sophisticated
computational capabilities to scientists who otherwise would not have
access to them. Furthermore, many of these services are made freely
available to others, often through large development and distribution
communities. “Data and software are two types of resources that are
becoming fundamental to how people do science, and they are being
shared in ways that are very different than just a few years ago.”
New methods are needed for evaluating these resources, said Foster,
including their impact on the research process as well as on downstream
activities such as job creation, patenting, and the formation of
companies. The fact that these resources are digital makes such
evaluations somewhat easier, because accessing an electronic database or
piece of software involves a digitally mediated transaction and can be
logged and analyzed in the future. Collective analysis of these
transactions, along with more conventional metrics, also can reveal the
ways in which knowledge is integrated. For example, the MyExperiment
project seeks to make the sharing of computational procedures, data, and
software as easy as sharing images on a social networking site. The site
also makes it possible to share workflows and reports on how often they
are used and for what purpose. “We can look not only at how people
interact with people via publications but also how software interacts with
data and data with software and people with software and data.”
The STAR Metrics program also seeks to capture research activities
and outputs in the form of a distributed database. In that context, it
becomes possible to automate many administrative tasks such as creating
biosketches, progress reports, final reports, and tenure reviews.
In this and other ways, researchers derive tremendous value from
such platforms, said Foster. Researchers are as interested as evaluators in
the connections between different knowledge bases. A system that links
all research outputs to all relevant research inputs would be invaluable to
researchers who are trying to determine which pathways have not been
explored and should be pursued, which research strategies are most
useful, and how a particular research problem has been tackled in the
past. “With luck we will find, as is often the case in science, that the very
activity of observing something will change the activity that we are
observing, and accelerate its process.”
MEASURING THE IMPACT OF STAR SCIENTISTS
Measuring the impact of research requires a long-term view, said
Lynne Zucker, Professor of Sociology and Policy Studies at the
University of California, Los Angeles. The short-term impact can be
much smaller than the long-run impact. To see these long-term impacts,
said Zucker, “ten years out is about the minimum, in my experience,
from having done a lot of evaluations of programs for the University of
California system and for the Advanced Technology Program and other
programs.”
Many new ideas are embodied in those who conceive them. People
have high amounts of tacit knowledge, and they can transmit this
knowledge to others. People who have been doing the same kind of
science often can absorb these ideas quickly, but in general the diffusion
of ideas is slow. Teams that include what Zucker called “star scientists”
have been located primarily in universities, but increasingly they occur in
firms, too. “There’s a lot of basic science going on in industry,” said
Zucker.
Biotechnology is an exemplar of a science-driven industry.
Scientific breakthroughs led to hundreds of new firms. Consolidation
occurred when scientific advances slowed, with some firms growing and
others failing. However, the number of jobs continued to grow, so that
people were absorbed into the successful companies. In the case of
biotechnology, the growth and change were revolutionary enough that an
entirely new industry was created.
Developing an infrastructure to collect data about knowledge flows
into industry is a complicated process and has not been done well in most
industries, according to Zucker. However, in biotechnology, a system
known as Bioscan makes it possible to track the process of transferring
knowledge from molecular biology into industry. Bioscan also shows
that firms in which star scientists are involved have higher employment
growth than others. “It’s a selection process— the top talent gets selected
first,” said Zucker.
A new model of a high-science firm emerged in biotechnology.
Scientists were free to publish and were rewarded for it, both in salary
and stock options. Firms had deep collaborations with university faculty,
and rewards were closely tied to the firms’ outputs. Large incumbent
firms learned to emulate this culture, and if they did not they had a
tendency to fade and die.
More recently, many nanotechnology firms have been adopting the
biotech model and are undergoing a similar process. Many startup and
incumbent firms are competing, with roughly one in ten firms having star
scientists involved in their firms. Nanotechnology is more geographically
distributed in the United States than biotechnology. But where star
nanoscientists are active has been a key determinant of where and when new
firms enter the field.
NSF funding for nanotechnology has had a large impact in the field,
Zucker observed, contributing to large increases in published nanoscale
articles and significant growth in nanoscale patenting.
The impacts of star scientists vary across S&T areas in
proportion to technological opportunity, said Zucker. Some areas have
had recent breakthroughs, and those areas are going to have more
opportunities than areas where the science is more mature. But scientific
fields also make their own opportunities, as when biotechnology firms
have begun working in nanotechnology.
In general, said Zucker, federal investments appear to be important
for impacts in all science and technology areas, but to test this idea she
and her colleagues have been developing an integrated database with
input from multiple sources. The resource is beginning to produce early
results, and “the general answer so far is yes, with some variation,
federal grants do make a big difference . . . for most science areas.”
The initial version of the resource, StarTechZD, is now available on
the web (http://startechzd.net) and permits the tracking of knowledge,
funding, and economic impacts. It can identify both organizations and
particular scientists within and across databases. It also can separate
organizational and individual efforts. Zucker called it a “quantum jump
in the ability to analyze science and technology. . . It’s an extremely
important tool.”
VISUAL ANALYTICS
Visual analytics is the science of analytical reasoning facilitated by
interactive visual interfaces, said John Stasko, Professor and Associate
Chair of the School of Interactive Computing at the Georgia Institute of
Technology. It combines automated analysis techniques with interactive
visualizations for effective understanding, reasoning, and decision
making on the basis of large and complex data sets. Another way to think
of visual analytics, said Stasko, is that it combines interactive
visualization, computational data analysis, and analytical reasoning.
“Visualization is not about making pretty pictures,” he said. “It’s about
helping people solve problems and gain insights from their data.”
Visualization is not appropriate for every problem. If someone is
interested in how many people are employed in an area, a data mining
algorithm can find the best fit. However, visualization is a powerful tool
in exploratory data analysis scenarios, “where someone drops a pile of
data in your lap and says ‘Help me understand what’s there.’” These are
scenarios where people typically do not know exactly which questions to
ask.
Effective visualization tools both answer questions and raise
questions. The interactive aspects of the data enable someone using the
tool to essentially have a conversation with the data. “You explore one
angle and a new question arises. It’s through the interaction where things
happen.”
Some existing visualizations can be frustrating, Stasko admitted.
For example, large network graphs such as maps of science do not
necessarily convey clear conclusions. A map might show that
mathematics is strongly related to computer science, but such an
observation is not very interesting. Also, one visualization cannot
necessarily show all of the variables that someone might want to
represent. Such visualizations present a static view of connectivity, clustering, or
centrality, “but you want to go beyond that.”
Stasko cited several examples of effective interactive visualizations.
The Social Action system uses social network analysis to measure the
centrality of different nodes in the network, thus combining the
algorithmic analysis of the data with interactive exploration. Another
system called Jigsaw does document analysis of unstructured text.
Through such processes as text mining and entity identification, it
produces multiple interactive visualizations of the content of the
documents for exploration. Finally, Stasko mentioned a system called
Ploceus (named after a weaver bird that creates elaborate nests) that does
network visualizations from tabular data. The system takes data from a
spreadsheet, for example, and creates networks that allow the data to be
explored.
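Two of the ideas in these examples, deriving a network from tabular data and scoring nodes by centrality, can be sketched in a few lines. The sketch below is not how SocialAction or Ploceus is implemented; it assumes a hypothetical table of (paper, author) rows, links authors who share a paper, and computes normalized degree centrality, one of the simplest centrality measures.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical spreadsheet-style table: one row per (paper, author).
# Paper IDs and author names are invented for illustration.
rows = [
    ("P1", "Ada"), ("P1", "Ben"),
    ("P2", "Ada"), ("P2", "Cy"),
    ("P3", "Ada"), ("P3", "Ben"), ("P3", "Dee"),
]


def coauthor_network(table):
    """Link every pair of authors who appear on the same paper.

    Authors with no coauthors do not appear as nodes in this sketch.
    """
    authors_by_paper = defaultdict(set)
    for paper, author in table:
        authors_by_paper[paper].add(author)

    neighbors = defaultdict(set)
    for authors in authors_by_paper.values():
        for a, b in combinations(sorted(authors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def degree_centrality(neighbors):
    """Fraction of the other nodes to which each node is directly linked."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


network = coauthor_network(rows)
centrality = degree_centrality(network)
for author, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{author}: {score:.2f}")
```

An interactive tool would let the analyst change which columns define nodes and links and then re-rank or filter the result; the static computation here is only the algorithmic half of that combination.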
Stasko concluded by saying that there are many different methods
of data analysis and they are not mutually exclusive. The best kinds of
data analysis combine statistical, automated computational, and visual
exploratory methods, he said. From such explorations of data, where the
questions are not necessarily defined beforehand, insightful discoveries
can emerge.
CONSIDERATIONS IN BUILDING COMPREHENSIVE
DATABASES
Adam Jaffe, Dean of Arts and Sciences and Fred C. Hecht Professor
in Economics at Brandeis University, commented on the importance of
creating a comprehensive database that contains all research inputs and
outputs. “It has been a long time in coming, and we’ve talked about it for
a long time, but we are now at a point where we can glimpse that it may
actually be happening.” The only thing that can protect science funding,
he said, is demonstrating the long-term and diffuse but tremendously
important impacts of science, “and that requires very extensive and
complicated data.”
One way to build such a database will be to take advantage of
automated data capture. Once the framework for the system has been
created, huge amounts of data can be collected automatically by
searching the web. Automated data capture will reduce the reporting
obligations imposed on institutions and individuals. “The ARRA
reporting requirements almost caused my office for research
administration to implode,” said Jaffe. Universities are under stress
because financial support from all sources is down while financial needs
are up. “Everyone is overworked, and when you put these reporting
requirements on top of that, it really is a significant issue that we need to
worry about.”
Such a database would be greatly advanced by a unique identifier
for each person who receives money from the federal government to
conduct research. “This is absolutely crucial,” said Jaffe. “If we
eventually fail to get to a system where each person is tagged with a
unique identifier, this project will not succeed.” Real data have many
ambiguities that need to be resolved, and a unique identifier would
resolve many of them.
Evaluations also need to track the failures—the students who
dropped out, the grant applications that were not funded, the projects that
produced negative results. “You don’t know the return to the successful
investments unless you can have some kind of ‘but for’ or counterfactual
to compare what occurred when you funded it to what might have
occurred otherwise.” Statistically, the best way to answer these questions
is to have data in the system on other than successful outcomes.
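A small numerical sketch shows why the unsuccessful cases matter. The records below are entirely hypothetical, and the comparison of funded with unfunded applicants is a deliberately naive stand-in for a counterfactual: it ignores selection effects that a real evaluation would have to address.

```python
from statistics import mean

# Hypothetical records: (applicant, was_funded, publications in later years).
# All values are invented for illustration only.
applications = [
    ("a", True, 6), ("b", True, 0), ("c", True, 3),
    ("d", False, 2), ("e", False, 0), ("f", False, 1),
]


def mean_outcome(rows, funded):
    """Average outcome for funded (True) or unfunded (False) applicants."""
    return mean(outcome for _id, was_funded, outcome in rows if was_funded == funded)


funded_mean = mean_outcome(applications, funded=True)
unfunded_mean = mean_outcome(applications, funded=False)

# Naive "but for" estimate: funded outcomes minus unfunded outcomes.
naive_effect = funded_mean - unfunded_mean

# The same comparison if funded projects with no output had been dropped
# from the database, leaving only the successes.
successes_only_mean = mean(
    outcome for _id, was_funded, outcome in applications if was_funded and outcome > 0
)
inflated_effect = successes_only_mean - unfunded_mean

print(f"Estimate with failures included: {naive_effect}")
print(f"Estimate with failures dropped:  {inflated_effect}")
```

With these invented numbers, omitting the funded project that produced nothing raises the apparent effect of funding, and without the unfunded applicants there would be no comparison group at all.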
Finally, Jaffe said, the data should extend beyond the biosciences. “I
know NIH is the 800-pound funding gorilla, but there are other sciences
and other industries out there.”
The indirect effects of research funding can be very difficult to
track. Things like the accumulation of human capital or the spillover
effects from research have very long lags and diffuse impacts. Data
collection therefore needs to be broad-based and multidimensional.
“What is so exciting about some of these projects is that we are
beginning to see an infrastructure where all the different pieces can be
connected together, where we can come to understand better how all
these things work.”
DISCUSSION
During the discussion period, the panelists discussed several
prominent issues associated with improving the accuracy of information
in databases. Administrative data tend to contain many errors, which can
reduce the value of analyses. Some disciplines have adopted systems in
which researchers are asked to review and correct errors in, for example,
listings of publications and citations. One approach would be to promote
researchers’ retention of permanent e-mail addresses that could function
both as identifiers and as a means of verifying information related to that
person.
Julia Lane cautioned that a unique identifier for each researcher may
not be practical and may not be essential. It may make more sense to
think of investigators having multiple identifiers that are interoperable.
Identification is a problem in many countries, not just the United States,
and efforts both within and across nations are now reaching the point
where progress can be made.
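The idea of multiple interoperable identifiers can be sketched as a crosswalk: pairs of identifiers known to refer to the same person are merged into groups, so that records keyed by any identifier in a group can be linked. The identifier formats below are hypothetical, and the union-find grouping is one possible approach, not a description of any system discussed at the workshop.

```python
# Hypothetical assertions that two identifiers refer to the same person.
# The prefixes and values are invented for illustration.
links = [
    ("agencyA:12345", "email:jdoe@univ.example"),
    ("email:jdoe@univ.example", "publisher:J-Doe-7"),
    ("agencyB:77", "email:rroe@lab.example"),
]


def build_crosswalk(pairs):
    """Group identifiers into sets, one set per person, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return list(groups.values())


def same_person(groups, a, b):
    """True if both identifiers fall in the same group."""
    return any(a in group and b in group for group in groups)


groups = build_crosswalk(links)
print(len(groups), "distinct people")
print(same_person(groups, "agencyA:12345", "publisher:J-Doe-7"))
print(same_person(groups, "agencyA:12345", "agencyB:77"))
```

The quality of such a crosswalk depends entirely on the accuracy of the asserted links, which is where researcher review and correction of records would come in.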
Spector suggested that databases need to leverage the federated
transparency of the Web rather than creating specific systems for
measuring the impacts of research. There are several ways of doing this.
Crowd-sourcing can be “incredibly powerful” because many people, and
particularly the younger generation, want to keep information up to date.
Natural language processing can help improve accuracy by comparing
information from many places on the Web. Finally, machine learning
algorithms are powerful categorization mechanisms. “Don’t build custom
systems,” Spector warned, “because they will be expensive [and]
bureaucratic.”
In response to a question about how advances in data presentation
and visualization can help policymakers better understand and use data,
Stasko said that it is critical for the designers of such systems to
understand the systems’ users and tasks. “What do you want to find out
about the data, and how can visualizations help?” The answers to
questions in areas such as patenting could change scientific practices and
help set the research agenda. And visualization can help convey the
complexity of the innovation ecosystem, with all its different and tangled
components.
Director Suresh was asked about the “broader impacts” criterion
that NSF uses to review proposals, with reference to the reauthorization
of the America COMPETES Act calling on NSF to broaden these
impacts to include such considerations as performance measures and
partnerships. Suresh responded that the National Science Board has been
investigating the broader impacts criterion. Researchers are
understandably confused, he said, about how many of these
considerations to incorporate into their research proposals, how much of
the burden to place on the individual versus the department versus the
school versus the institution, and how to consider such factors as
economic impact and workforce development. “This is very much a
work in progress.” A number of groups are working in parallel and in
conversation with one another, he said, ideally leading to clarity rather
than confusion on this issue.