TABLE F-5 Science and Engineering Indicators from SEI 2012 Digest Used in the Panel’s Analysis

Indicator Label             Indicator
RD_%_GDP                    R&D expenditures as a share of economic output = R&D as percentage of GDP
Deg_NatSci                  First university degrees in natural sciences
Deg_Eng                     First university degrees in engineering
Doct_NatSci                 Doctoral degrees in natural sciences
Doct_Eng                    Doctoral degrees in engineering
S&E_Art                     S&E journal articles produced
Eng_Share_S&E_Art           Engineering journal articles as a share of total S&E journal articles
Res_Art_Int_CoAuthor        Percentage of research articles with international coauthors
Share_Citation_Int_Lit      Share of region’s/country’s citations in international literature
Global_HighValue_Patents    Global high-value patents
Export_Comm_KIS             Exports of commercial knowledge-intensive services
HighTech_Exports            High-technology exports
Trade_Balance_KIS_IntAsset  Trade balance in knowledge-intensive services and intangible assets
VA_HighTech_Manu            Value added of high-technology manufacturing industries
VA_Health_SS                Global value added of health and social services
VA_Educ                     Global value added of education services
VA_Whole_Retail             Global value added of wholesale and retail services
VA_Real_Estate              Global value added of real estate services
VA_Transport_Storage        Global value added of transport and storage services
VA_Rest_Hotel               Global value added of restaurant and hotel services

NOTES: GDP = gross domestic product; KIS = knowledge-intensive services; R&D = research and development; RD = R&D; S&E = science and engineering.

SOURCE: Panel analysis and Science and Engineering Indicators 2012, see http://www.nsf.gov/statistics/seind12/tables.htm [November 2012].

are more similar. Clusters of subtopics are also observed: expenditure variables, trade variables, and patent variables are each more similar to the other variables within their own group. This shows that variables on a subtopic convey similar information; that is, they are proxy variables. For example, if an analyst selecting predictor variables for a regression model is unable to obtain data on technical staff, then data on researchers can be substituted. In some ways, this relieves the burden on statistical agencies/offices trying to follow the Frascati Manual’s recommendations. Even if they fall short in collecting certain variables, similar information can be gleaned from other variables on the same topic.
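The idea of proxy variables can be illustrated with a minimal sketch. It is not the panel's computation: the data are synthetic, the variable names are hypothetical, and the choice of 1 - |r| as a dissimilarity with average linkage is an assumption made for illustration. The sketch builds two pairs of variables that share a latent driver, computes their Pearson correlations, and checks that hierarchical clustering groups each pair together.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Synthetic data only: 40 hypothetical "countries" and two latent factors.
rng = np.random.default_rng(0)
n = 40
personnel = rng.normal(size=n)  # latent driver of the personnel variables
trade = rng.normal(size=n)      # latent driver of the trade variables

# Hypothetical variable names, invented for this illustration.
data = {
    "researchers": personnel + 0.1 * rng.normal(size=n),
    "technical_staff": personnel + 0.1 * rng.normal(size=n),
    "hightech_exports": trade + 0.1 * rng.normal(size=n),
    "kis_exports": trade + 0.1 * rng.normal(size=n),
}
names = list(data)
X = np.column_stack([data[k] for k in names])

# Pearson correlation between variables (columns of X).
corr = np.corrcoef(X, rowvar=False)

# Convert correlation to a dissimilarity: near 0 for strongly correlated pairs.
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchical clustering on the condensed distance matrix,
# cut into at most two clusters.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

clusters = dict(zip(names, labels))
print(clusters)
```

Variables that land in the same cluster carry largely the same information in this toy setting, which is the sense in which one can stand in for another when data are missing.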

The panel does not single out a primary indicator for each subtopic here, as doing so would invite conjecture. Nations should decide which variables to collect depending on ease of collection and budgetary constraints. The panel is not asserting that statistical offices around the world should stop collecting detailed S&T data, as the utility of variables is not limited to the ability to feed them into a regression model. National statistical offices collect detailed STI information through surveys and/or by using administrative records to answer specific policy questions, such as the mobility of highly skilled labor, the gender wage gap in S&T occupations, and the amount of investment moving into certain S&T fields. The S&T community is likewise interested in understanding the progress of nations in attracting the best talent, the broad careers pursued by Ph.D. holders in particular fields, and the R&D investment in environmental projects. The panel’s main concern was that detailed data become unavailable as the main variables are disaggregated. Apart from OECD and Eurostat member countries, the rest of the world has yet to keep pace in capturing STI information in accordance with the recommendations of the Frascati and Oslo Manuals. OECD and Eurostat have been frontrunners in pursuing valuable information, and they should be commended for their efforts. At the same time, the panel is not critical of non-OECD and non-Eurostat nations, as both data collection agencies and respondents must undergo a learning process to provide such fine-grained data in a consistent fashion.

Figures F-14 to F-16 show a cluster heat map, a hierarchical cluster tree, and the multidimensional scaling of a Pearson correlation matrix, respectively. The input matrix consists of S&E indicators from the SEI 2012 Digest. The red and orange squares along the diagonal of the heat map mark variables that are very closely related to one another; these could either be merged, or the most well-behaved and consistent variables among them could be selected. Figure F-15 shows clusters of indicators; Figure F-16 shows sets of indicators that are either similar or dissimilar. In these two figures, the dimensions have no interpretation; one looks instead for clusters of variables, which indicate that the variables belong together. Indicators representing the service sector are observed to be highly correlated with each other. Indicators denoting first university degrees are closely grouped together. The same conclusion can be drawn for indicators on generation of S&E knowledge (articles and citations). Therefore, clusters of subtopics are observed, similar to those observed for STI variables from the OECD, Eurostat, and UNESCO databases. Certain indicators, such as R&D as


