National Academies Press: OpenBook
Suggested Citation:"4 Statistical Engineering Division." National Academies of Sciences, Engineering, and Medicine. 2021. An Assessment of Selected Divisions of the Information Technology Laboratory at the National Institute of Standards and Technology: Fiscal Year 2021. Washington, DC: The National Academies Press. doi: 10.17226/26354.

4

Statistical Engineering Division

The mission of the Statistical Engineering Division (SED) is to develop and apply statistical and probabilistic methods supporting research in measurement science and technology; implement methods for experimental design, data analysis, statistical modeling, and probabilistic inference; study and apply best practices for the characterization of measurement uncertainty; and disseminate sound statistical methods to U.S. industry and the scientific community. SED conducts statistical research, consulting, and collaboration in metrology, standard development, forensics, and other areas fundamental to NIST’s mission. While maintaining core competence in experimental design, statistical modeling, and data analysis, the division is expanding into the areas of machine learning (ML), artificial intelligence (AI), and data science. The activities of SED staff not only lead to advances in statistical methodology; they also bring significant benefits to the scientific programs of their many collaborators in government and industry. The division’s Boulder Statistics Group focuses on collaborations with NIST scientists in the Boulder, Colorado, campus. The Gaithersburg Group focuses on collaborations with NIST scientists in the Gaithersburg, Maryland, campus.

TECHNICAL QUALITY OF THE WORK

SED has considerable scientific expertise in the three distinct areas needed to accomplish its mission: metrology and inter-laboratory studies, collaborative research and education in support of other NIST programs, and standing as a neutral and trusted arbiter in areas of statistical controversy. However, advanced statistical computing techniques (beyond some software development) seem to be largely absent from the activities, priorities, and interests of SED. Some software developments were reported, but the role and importance of computing algorithms go beyond this. The strong links between statistical methods and research and computer science methods and research are an integral part of modern statistics.

The area of metrology and inter-laboratory studies has been a core technical component of SED since its founding in 1947. NIST appears to be the best in the world among all the organizations engaging in these activities. The activities are of two types. The first is publishing papers in applied science and metrology outlets to try to improve statistical practice. For instance, SED staff indicated that standard metrology assumes independence of data (a generally wrong assumption for metrology data). A SED staff member led the development of an international standard (ISO/CD 24185) that shows how to deal with dependence properly. This was not new statistics, but it shows NIST doing its job to continually improve metrology. Now statistical methodology development is expanding beyond the modeling stage to include the whole data science life cycle including data formulation and data cleaning, with consistent documentation and code repository including the data cleaning process.
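The effect of the independence assumption mentioned above can be illustrated with a minimal sketch. It is not the method of ISO/CD 24185; it assumes, purely for illustration, that n readings share a common standard deviation `sigma` and that the correlation between readings i and j decays as `rho**|i - j|`. The function name and all numbers are hypothetical.

```python
import math

def std_uncertainty_of_mean(sigma, n, rho=0.0):
    """Standard uncertainty of the mean of n readings.

    Assumption (illustrative only): every reading has standard deviation
    sigma, and the correlation between readings i and j is rho**|i - j|.
    """
    total_cov = 0.0
    for i in range(n):
        for j in range(n):
            # Sum every entry of the assumed covariance matrix.
            total_cov += sigma ** 2 * rho ** abs(i - j)
    # Var(mean) = (1 / n**2) * sum of all covariances.
    return math.sqrt(total_cov) / n

naive = std_uncertainty_of_mean(1.0, 10)            # independence assumed
dependent = std_uncertainty_of_mean(1.0, 10, 0.5)   # positively correlated readings
```

Under these assumptions, `naive` reduces to the familiar sigma / sqrt(n), while `dependent` is larger, so treating positively correlated readings as independent would understate the uncertainty of the mean.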

Obtaining inter-laboratory agreements (necessary for national and international laboratories to reach consensus on metrology issues) is an interesting statistical and political problem. The political aspect is that even laboratories that are hopelessly wrong ought not to simply be excluded from the analysis, because rejection of their results would be politically problematical. The SED chief scientist explained that disagreement could be addressed through Bayesian hierarchical modeling, whereby the wrong results would have little effect on the final answer, but the confidence intervals for the incorrect laboratories would still include the correct result, which would not then be rejected. This is a creative use of modern Bayesian analysis.
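A rough sketch of the general idea follows. It is not SED's model: it assumes a simple Gaussian random-effects model in which each laboratory value x_i is distributed as Normal(mu, u_i^2 + tau^2), with flat priors on a grid, and all data are hypothetical. With Gaussian laboratory effects a discrepant laboratory still pulls on the consensus value; a heavier-tailed distribution for the laboratory effects, not shown here, would be one way to limit that influence. The sketch illustrates only the second half of the argument: once the between-laboratory spread `tau` is included, the widened interval for each laboratory can cover the consensus value.

```python
import math

def consensus_posterior(values, uncerts, n_grid=120):
    """Grid approximation to the posterior means of (mu, tau).

    Assumed model (hypothetical, for illustration only):
        x_i ~ Normal(mu, u_i**2 + tau**2), flat priors on the grid.
    Reported uncertainties u_i must be positive.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    # mu ranges from lo - spread to hi + spread; tau from 0 to 2 * spread.
    mus = [lo - spread + 3 * spread * k / (n_grid - 1) for k in range(n_grid)]
    taus = [2 * spread * k / (n_grid - 1) for k in range(n_grid)]
    cells = []
    for mu in mus:
        for tau in taus:
            log_lik = 0.0
            for x, u in zip(values, uncerts):
                var = u * u + tau * tau
                log_lik -= 0.5 * (math.log(var) + (x - mu) ** 2 / var)
            cells.append((log_lik, mu, tau))
    top = max(c[0] for c in cells)           # subtract the max for numerical stability
    weights = [math.exp(c[0] - top) for c in cells]
    total = sum(weights)
    mu_hat = sum(w * c[1] for w, c in zip(weights, cells)) / total
    tau_hat = sum(w * c[2] for w, c in zip(weights, cells)) / total
    return mu_hat, tau_hat

# Hypothetical results: four labs agree, one is far off.
values = [10.0, 10.1, 9.9, 10.05, 12.0]
uncerts = [0.1] * 5
mu_hat, tau_hat = consensus_posterior(values, uncerts)

# Widened interval per lab: reported uncertainty plus between-lab spread.
covered = [abs(x - mu_hat) <= 2 * math.sqrt(u * u + tau_hat ** 2)
           for x, u in zip(values, uncerts)]
```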

In the area of collaborative research and education in support of other NIST programs, SED strongly supports other NIST programs that are in need of statistical support. SED seems to be among the best government and industry collaborative statistics groups. Sometimes this support is routine, as in the frequent need of other NIST scientists for assistance in standard experimental design (a SED strength). Even here, NIST is forward looking with developments on the leading edge of design. For instance, the ABACUS chemical analysis package was developed to provide all chemical analysis groups at NIST a way to generate optimal designs for their experiments. The package is a leading-edge hierarchical Bayes design package that could beneficially be made widely available to the chemical industry. It would be useful to compare Bayesian model conclusions with machine learning conclusions unless the Bayesian modeling approach is scientifically vetted and well documented. The qualitative scientific conclusions should not depend on whether a NIST statistician is Bayesian or not.

Often the collaborations involve development of novel statistical methodology. One such example was the LANTERN methodology, which focused on distilling large-scale genotype-phenotype measurements into an explainable low-dimensional representation, while using a modern Gaussian process implementation to deal with the highly variable response surface arising from the measurements. Across several large-scale benchmarks (including ML and AI), LANTERN’s predictive performance was outstanding, providing interpretable scientific insights concerning the way that the genotypes and phenotypes affect the analysis.

SED also provides numerous short courses and internal individual training opportunities in statistics for NIST staff. Such technology transfer is an important part of enhancing the overall scientific expertise at NIST. Often, this technology transfer occurs on an individual level. For example, a SED staff member developed a very complex statistical design/analysis implementation for problems a particular NIST researcher faced; by the end of the project, the scientist had internalized this very complex analysis and subsequently did not need statisticians to help address such problems.

In the area of standing as a neutral and trusted arbiter in areas of controversy, following its longstanding role as the setter of standards in metrology, SED is also seeking to establish standards in other areas of statistical controversy. An illustration of this was the work on the reporting of forensic evidence in court.1 A typical current practice for DNA evidence is to report likelihood ratios to represent evidential weight, but this has become a default practice rather than a carefully reasoned methodology, and as such is quite controversial. SED's contribution was to go back to the roots of the problem and try to identify methods of presenting forensic evidence that all could agree with.
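For readers unfamiliar with the quantity at issue, a likelihood ratio compares the probability of the evidence under two competing propositions, and under Bayes' rule it multiplies the prior odds to give posterior odds. The sketch below shows only that arithmetic; the function names and all probabilities are hypothetical and are not taken from the SED work.

```python
def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """LR = P(E | Hp) / P(E | Hd) for two competing propositions Hp and Hd."""
    return p_evidence_given_hp / p_evidence_given_hd

def posterior_odds(prior_odds, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    return prior_odds * lr

# Hypothetical numbers, chosen only to show the arithmetic.
lr = likelihood_ratio(0.99, 0.001)
odds = posterior_odds(1 / 1000, lr)
probability = odds / (1 + odds)
```

With these made-up inputs the likelihood ratio is close to 990, yet the posterior probability is just under one half, because the prior odds were set low. The same reported ratio can therefore support quite different conclusions depending on the prior a recipient holds.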

A 2015 National Academies of Sciences, Engineering, and Medicine report2 encouraged SED to broaden its engagement with the academic community, both to augment the expertise at NIST and to further raise the profile of SED. A number of steps were taken to do this and need to be pursued. However, a basic problem is that very few academic statisticians are involved with SED. Academics can spread the word that NIST is a great place to work and with which to collaborate. Academic involvement can also help address the deficiency that SED staff have very few publications in statistics journals (only 17 since 2015), a limiting factor in terms of visibility to the external community. This is to be expected for metrology and inter-laboratory agreements, as the audiences for these works are primarily non-statistical.

Most of the collaborative research being done by SED is of enough novelty that it could be published in mainstream statistical journals. The challenge to doing so is time; the primary audience for the work is typically the discipline in which the supported project originated, and publishing in that discipline has priority at NIST. Abstracting from the work of a supported project the novel methodology for a statistical journal can be time consuming, and SED staff often reasonably choose to spend their time elsewhere (e.g., on a new collaborative project). Academic engagement will help to develop these methodologies more fully. Additionally, this could be helpful in influencing the academic community to take on research topics and challenges of long-term interest to NIST, such as the development of methods at the intersection of statistics and machine learning.

___________________

1 National Institute of Standards and Technology, Information Technology Laboratory, “Likelihood Ratios and Evidence Communication,” presentation to the panel on June 22, 2021.

2 National Academies of Sciences, Engineering, and Medicine, 2015, Review of Three Divisions of the Information Technology Laboratory at the National Institute of Standards and Technology, Washington, DC: The National Academies Press.

RECOMMENDATION: SED should strengthen its engagement with the academic community and should consider assigning an academic statistician on a project to take the lead in the production of statistics journal articles.

There seem to be excellent synergies occurring between traditional statistics and data science, as reflected in the LANTERN project, which involved a very productive synergy between the two. The incorporation of data science within the existing activities of SED staff could become a challenge if data science were to replace the core statistical elements of SED. On the other hand, this may not become a problem if data science staff were added to SED as part of the overall Information Technology Laboratory (ITL) data science initiative.

RECOMMENDATION: SED should expand engagement with the data science, machine learning, and artificial intelligence researchers at NIST.

There are opportunities for NIST to leverage its reputation as a neutral arbiter in directions other than metrology, inter-laboratory agreements and forensics. For instance, SED is ideally positioned to propose a common language for ML, AI, data science and statistics, although it may well be that those ships have sailed.

RECOMMENDATION: SED should maintain and enhance its role as an impartial and trusted arbiter of statistical issues.

TECHNICAL EXPERTISE OF THE STAFF

SED scientists are highly capable of accomplishing this mission and supporting NIST’s diverse programs and objectives, both inside and outside of ITL. Division members are well-trained statistical generalists, each capable of supporting data collection and analysis efforts across a spectrum of technology applications. In some cases, these applications involve non-trivial existing methods that may be unknown to non-statisticians. In others (e.g., basic metrology), SED staff members improve well-established, but not the best, statistical practice methods and advocate the use of appropriate methods. New science in NIST programs often produces the need for new theoretically sound and practically effective statistical methodology. SED provides excellent support in all of these kinds of activities.

SED’s long-standing core responsibility to support measurement science has grown substantially since the division’s 2015 National Academies review. For example, its role in the development and documentation for every Standard Reference Material available from NIST is now quite substantial. It is making important national and international contributions to measurement and standards organizations.

In their responsibility to support new science and technology, SED staff members work as close collaborators with scientists and engineers, educating their partners and developing deep personal domain interest. They provide breadth of statistical expertise and domain engagement that impact outcomes across NIST. This includes both relatively small/focused and more major/broad initiatives like forensics and evaluation of the current state of diversity, equity, and inclusion.

AI and ML are areas of substantially increased interest at NIST, including ITL. Probabilistic modeling and advanced statistical methodology provide perspectives and tools with potential to greatly advance these fields. AI and ML experts typically have limited background in statistics, and not all statisticians have experience in these areas. Although some progress has already been made, there is need for additional SED staff who can very soon interact substantially with computer scientists and engineers in AI/ML. Part of the challenge here is staff size and the number and size of the efforts that SED is already supporting. Part of the challenge is to find statistical generalists who have existing experience in these areas.

RECOMMENDATION: In hiring of technical staff, SED should continue its practice of hiring statistical generalists (trained individuals with good understanding of the theory and the applications of current and proven statistical methodologies) and pay first attention to maintaining expertise to support its existing missions and broad statistical expertise. SED should also seek to recruit excellent statisticians with prior experience in artificial intelligence (AI)/machine learning (ML), targeting some recruiting efforts at good statistics and computer science Ph.D. programs with strong AI/ML components.

RECOMMENDATION: As key elements in its hiring plans, SED should seek individuals with expertise in statistical computing, artificial intelligence, machine learning, and research publication.

ADEQUACY OF RESOURCES

SED staff reported that the division has sufficient hardware, software, and information technology support for their mission. It has implemented the recommendation from the 2015 National Academies report to develop stronger ties with the statistical research community for its staff. Staff have been engaged with the American Society for Quality, the American Statistical Association, and the International Statistical Engineering Association; NIST hosted the 2019 Fall Technical Conference and participated regularly in the Defense and Aerospace Test and Analysis Workshop (DATAWorks). It sponsored a Virginia Tech Computational and Data Analytics Capstone Project course. SED has also participated in NIST’s efforts to expand diversity, equity, and inclusion and has participated in both the Inclusivity Network Analysis as a First Step to Harness Human and Social Capital for Innovation at NIST and Assessing Inclusivity of Women at NIST projects.

One of the key challenges for SED is how to recruit and retain the next generation of talent. Over 60 percent of SED staff are eligible for retirement. SED is one of the very best groups in the world in metrology and inter-laboratory experimentation and among the best government or industry applied collaborative statistics groups. As SED develops its strategy for growth, it will need to ensure that it maintains excellence in these areas.

RECOMMENDATION: SED’s staffing strategy should continue to support its excellence in metrology, reference material development and calibration, documentary standards development, and inter-laboratory comparisons. It should also include statisticians who can support broad scientific collaboration at NIST.

SED has a growth opportunity. AI and ML are a strategic technical focus for NIST. Statisticians are key players in AI/ML, and data science more broadly, and can make substantial contributions to collaborative projects. As an example, the LANTERN project focuses on developing an interpretable hierarchical Bayesian model to distill large-scale, genotype-phenotype landscape measurements into an explainable low-dimensional representation. Across several large-scale benchmarks, LANTERN’s predictive performance equals or outperforms all alternative approaches while providing interpretable scientific insights. Examples such as the LANTERN project illustrate the value that SED can bring to NIST in AI/ML and suggest that SED should leverage this opportunity to increase its staffing in these important areas.

RECOMMENDATION: SED should leverage the growth opportunity in artificial intelligence and machine learning to increase its staffing in these areas and in data science broadly.

SED is not well known among students and postdoctoral researchers in the statistics community. In addition, only 16 percent of SED’s technical staff are women. These factors suggest that developing new recruiting strategies needs to be a high priority for SED.

SED might consider adapting a model like that used in the Statistical Sciences Group at the Los Alamos National Laboratory (LANL).3 Starting over 20 years ago, that Statistical Sciences Group identified faculty from across the United States with expertise in areas of strength or strategic growth. These faculty were invited to spend portions of their summers or sabbaticals at LANL with their travel supported, and their students were encouraged to apply for internships. Faculty and students worked on projects, with particular emphasis on helping to prepare results for publication. Faculty were able to provide peer-review for internal technical documents, and students often chose to extend the initial project work into part of a master’s thesis or Ph.D. dissertation, which continued their collaborations into the academic year. Many students, after exposure to the deep scientific collaboration provided by collaborative projects, chose careers at the national laboratories. Intentionally focused engagement with the academic community may enable both recruitment and retention by allowing SED staff to continue their professional development.

This form of recruitment pipeline is widely used in both Department of Energy (DOE) and Department of Defense (DoD) agency settings. The DOE laboratories have especially well-developed internship programs attached to a variety of U.S. universities, including LANL’s and the Lawrence Livermore National Laboratory’s well-established historical ties to the University of California system’s various campuses. These internship and related faculty outreach activities provide not only an existing successful exemplar for science, technology, engineering, and mathematics (STEM) student recruitment, but also a risk-mitigation strategy for the problem of effective recruitment efforts being compromised by ineffective retention methods. Hiring student interns from U.S. universities provides a cost-effective means to evaluate the fit of potential candidates while helping students and faculty learn about the unique culture of these federal institutions.

Faculty and student outreach efforts also provide a means to enrich the diversity of the NIST workforce while remedying the problem of aging out of the NIST staff ranks. A good way to develop young STEM talent for a federal agency is through university outreach, and especially when the outreach collaborative efforts are focused on universities that recruit and retain high-quality students with diverse backgrounds. The collaboration with Morgan State University mentioned in the Information Access Division and SSD chapters above is a good start on solving the problem of the recruitment of underrepresented minorities, and many federal agencies have formed similar collaborations with universities that have high proportions of U.S. citizens among their student communities, as well as strong STEM programs that attract high-quality, technically oriented students. These universities are often found in large states with diverse populations. For example, Texas has several universities with diverse student bodies. The development of intern programs between NIST and these universities would provide a time-tested strategy for recruitment and retention of high quality diverse staff and management talent. If funding can be developed to support the year-round academic studies of these hiring candidates, the strategy only improves its utility by creating strong intellectual and personal bonds between NIST and key university faculty who recruit domestic students.

___________________

3 Los Alamos National Laboratory, “Statistical Sciences,” https://www.lanl.gov/org/ddste/aldsc/computercomputational-statistical-sciences/statistical-sciences/index.php, accessed July 12, 2021.

RECOMMENDATION: SED should develop a strategy to broaden its workforce pipeline.

In addition to recruiting and retaining staff, SED has an opportunity to expand its impact with more effective development and deployment of software products, particularly SED-developed web-based tools. An example of this kind of tool is ABACUS, a web-based tool that automates the design and analysis of data from quantitative chemical measurement procedures. In general, SED staff do not have the development experience to make robust, deployable tools, and they would benefit from collaboration with other ITL staff. In addition, SED staff reported that NIST policies and procedures currently make it difficult to deploy an outward-facing software product. SED could consider software options that include open-source software, third-party repositories (whether GitHub or an academic/commercial cloud system), applying resources to some level of user support, and investing in developing and presenting tutorials, documentation, and training.

RECOMMENDATION: ITL should identify resources and create processes that make software and tools developed by SED easily available both within and outside NIST.

EFFECTIVENESS OF DISSEMINATION OF OUTPUTS

The majority of SED’s research activities are applied in character and are characterized by close support of disciplinary research groups in NIST and engagement with standards bodies in the proper use of statistics. As noted earlier, the division also engages in methodological research. The research portfolio is driven by the needs of stakeholders across NIST and the standards communities. Research outcomes tend to be either contributions to disciplinary research products or methodological in nature. As such, appropriate levels of technology transfer and the dissemination of research results can be accomplished through publication of peer-reviewed journal and conference papers, publication of standard reference materials and documentary standards, presentation at scientific conferences and workshops, and dissemination of computational tools. SED has demonstrated continuing high levels of dissemination activity that over the past 5 years includes publication of 287 academic papers, reports, and book chapters, as well as characterization of numerous reference materials. SED researchers have presented at numerous scientific conferences and have hosted dozens of short courses and tutorials for the benefit of NIST colleagues, other government agencies, and the academic community. Additionally, division researchers have a number of active collaborations with members of the academic community that include cooperative research agreements and hosting visiting faculty, postdoctoral researchers, and students.

While SED is highly active in publishing in subject-matter journals and conferences, its dissemination of research results, as well as its standing and familiarity within the academic community, could be improved through increasing publication in top-shelf statistical journals. The scope of most top journals tends to be on foundational theory and novel methodology, whereas most SED research activity is applied and aims to support the practice of statistical analysis. A move to increase publication in statistics journals might be achieved by collaborations with academic statisticians to further develop and publish on the novel statistical issues arising from these applications. Additionally, it would be beneficial if SED staff expanded their publication and research presence beyond statistics communities to machine learning journals such as the Journal of Machine Learning Research and conferences such as the Annual Conference on Neural Information Processing Systems (NeurIPS) and the International Conference on Machine Learning.

RECOMMENDATION: SED should continue its highly active dissemination in subject-matter venues while increasing publication in highly ranked statistics journals.

The dissemination of software products by the division appears to be centered on open-source publication by the research groups themselves. Dissemination and technology transfer could be improved if these efforts were supported by either a dedicated software group that could manage and implement a pipeline for the translation of research tools into products with robustness and usability attributes appropriate for broader distribution and use, or by partnering with other units within NIST to establish such a pipeline for disseminating software.

RECOMMENDATION: SED should commit to providing translational software support for research products with the goal of disseminating software with usability at near-commercial levels.

GENERAL CONCLUSIONS

SED has maintained outstanding expertise to support core missions of NIST and has provided close and productive collaborations with other NIST divisions, other government agencies, and industry. It serves as a neutral and trusted arbiter on the interpretation of statistical evidence and the validity of statistical methods. With a relatively small staff size, SED faces a challenge of maintaining core competence in statistical design, modeling, data cleaning, and uncertainty measurement, while at the same time growing new competence to support NIST initiatives in areas such as AI. There is need and opportunity for improving the technical quality of programs, scientific expertise, resource development, and dissemination efforts. A common theme underlying many of these areas is the need to strengthen ties with the statistical community to improve the recruitment, retention, and professional development of staff, and to enhance visibility through collaborative publications with academic statisticians.

At the request of the National Institute of Standards and Technology (NIST), the National Academies of Sciences, Engineering, and Medicine has, since 1959, annually assembled panels of experts from academia, industry, medicine, and other scientific and engineering environments to assess the quality and effectiveness of the NIST measurements and standards laboratories. This report assesses the scientific and technical work performed by the NIST Information Technology Laboratory for the following divisions: Information Access, Software and Systems, and Statistical Engineering.