TABLE F-5 Science and Engineering Indicators from SEI 2012 Digest Used in the Panel’s Analysis
|RD_%_GDP||R&D expenditures as a share of economic output = R&D as percentage of GDP|
|Deg_NatSci||First university degrees in natural sciences|
|Deg_Eng||First university degrees in engineering|
|Doct_NatSci||Doctoral degrees in natural sciences|
|Doct_Eng||Doctoral degrees in engineering|
|S&E_Art||S&E journal articles produced|
|Eng_Share_S&E_Art||Engineering journal articles as a share of total S&E journal articles|
|Res_Art_Int_CoAuthor||Percentage of research articles with international coauthors|
|Share_Citation_Int_Lit||Share of region’s/country’s citations in international literature|
|Global_HighValue_Patents||Global high-value patents|
|Export_Comm_KIS||Exports of commercial knowledge-intensive services|
|Trade_Balance_KIS_IntAsset||Trade balance in knowledge-intensive services and intangible assets|
|VA_HighTech_Manu||Value added of high-technology manufacturing industries|
|VA_Health_SS||Global value added of health and social services|
|VA_Educ||Global value added of education services|
|VA_Whole_Retail||Global value added of wholesale and retail services|
|VA_Real_Estate||Global value added of real estate services|
|VA_Transport_Storage||Global value added of transport and storage services|
|VA_Rest_Hotel||Global value added of restaurant and hotel services|
NOTES: GDP = gross domestic product; KIS = knowledge-intensive services; R&D = research and development; RD = R&D; S&E = science and engineering.
SOURCE: Panel analysis and Science and Engineering Indicators 2012, see http://www.nsf.gov/statistics/seind12/tables.htm [November 2012].
are more similar. Clusters of subtopics are also observed: expenditure variables, trade variables, and patent variables are each more similar to the other variables within their own group. This shows that variables on a subtopic relay similar information; i.e., they are proxy variables. For example, if an analyst is looking at predictor variables for a regression model and is unable to obtain data on technical staff, then data on researchers can be substituted. In some ways, this relieves the burden on statistical agencies/offices trying to follow the Frascati Manual’s recommendations. Even if they fall short in collecting certain variables, similar information can be gleaned from other variables on the same topic.
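The proxy-variable idea can be illustrated with a minimal sketch. It is not the panel's actual procedure or data: the four series below are synthetic, the variable names are hypothetical stand-ins, and the choice of 1 - |r| as a distance with average linkage is an assumption made only for illustration. The sketch computes Pearson correlations between variables and merges the most similar ones until the requested number of groups remains.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_variables(data, n_clusters):
    """Average-linkage agglomerative clustering of variables.

    The distance between two variables is 1 - |r| (an assumed choice),
    so strongly correlated variables are treated as close together.
    """
    names = list(data)
    dist = {
        (a, b): 1.0 - abs(pearson(data[a], data[b]))
        for a in names
        for b in names
        if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                total = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Synthetic data only: two hidden factors, each driving two observed series.
rng = random.Random(0)
n_countries = 200
personnel_factor = [rng.gauss(0, 1) for _ in range(n_countries)]
trade_factor = [rng.gauss(0, 1) for _ in range(n_countries)]


def noisy(latent, scale=0.1):
    """Return the latent series plus small independent Gaussian noise."""
    return [v + rng.gauss(0, scale) for v in latent]


data = {
    "researchers": noisy(personnel_factor),      # hypothetical variable
    "technical_staff": noisy(personnel_factor),  # hypothetical variable
    "exports": noisy(trade_factor),              # hypothetical variable
    "trade_balance": noisy(trade_factor),        # hypothetical variable
}

clusters = cluster_variables(data, n_clusters=2)
print(clusters)
```

Under these assumptions the two personnel series fall into one group and the two trade series into the other, which is the sense in which one variable in a group can stand in for another when the second is unavailable.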
Single primary indicators for each subtopic are not identified here, as doing so would invite conjecture. Nations should decide which variables to collect depending on ease of collection and budgetary constraints. The panel is not asserting that statistical offices around the world should stop collecting detailed S&T data, as the utility of variables is not limited to the ability to feed them into a regression model. National statistical offices collect detailed STI information through surveys and/or by using administrative records to answer specific policy questions, such as the mobility of highly skilled labor, the gender wage gap in S&T occupations, and the amount of investment moving into certain S&T fields. The S&T community is likewise interested in understanding the progress of nations in attracting the best talent, the broad careers pursued by Ph.D. holders in particular fields, and the R&D investment in environmental projects. The main concern faced by the panel was that detailed data become unavailable as main variables are disaggregated. Apart from OECD and Eurostat member countries, the rest of the world has yet to keep pace in capturing STI information in accordance with the recommendations of the Frascati and Oslo Manuals. OECD and Eurostat have been frontrunners in pursuing valuable information, and they should be commended for their efforts. At the same time, the panel is not critical of non-OECD and non-Eurostat nations, as both data collection agencies and respondents must undergo a learning process to provide such fine-grained data in a consistent fashion.
Figures F-14 to F-16 show a cluster heat map, a hierarchical cluster tree, and the multidimensional scaling of a Pearson correlation matrix, respectively. The input matrix consists of S&E indicators from the SEI 2012 Digest. The red and orange squares along the diagonal of the heat map show that the corresponding variables are very closely related to each other; they could either be merged, or the most well-behaved and consistent variables among them could be selected. Figure F-15 shows clusters of indicators; Figure F-16 shows sets of indicators that are either similar or dissimilar. In these two figures, the dimensions have no interpretation; one is looking for clusters of variables that would indicate they belong together. Indicators representing the service sector are observed to be highly correlated with each other. Indicators denoting first university degrees are closely grouped together. The same conclusion can be drawn for indicators on generation of S&E knowledge (articles and citations). Therefore, clusters of subtopics are observed, similar to those observed for STI variables from the OECD, Eurostat, and UNESCO databases. Certain indicators, such as R&D as