November 25, 1997
Dr. Tom Karl
Chairman, Climate Research Committee
of the National Research Council
Harris Building, Room 466
2001 Wisconsin Avenue, NW
Washington, DC 20007
RE: Climate Research Committee of the National Research Council Review and Assessment of Climate Modeling Activities in the U.S.
Dear Dr. Karl:
Per your request of November 3, 1997, we are pleased to provide the enclosed attachments:
1. NCAR measurements of single processor performance. The enclosed table summarizes a small set of measurements that we find useful for preliminary evaluation of computers. For example, many atmospheric models make heavy use of elementary functions so we measure them. With respect to computational kernels, “radabs” is a physics module from the NCAR Community Climate Model (CCM). The “shalxx” entries are for a 2D shallow water model at two grid sizes - 64 × 64 and 256 × 256. Our experience is that results from these kernels typically provide an upper bound on the performance of a specific computer relative to a broad set of atmospheric models. Because overall performance is often paced by memory performance, “copy” measures memory to memory transfer, “ia” measures indirect addressing, and “xpose” measures
transposition of arrays, which is fundamental to the implementation of the FFT. Probably the most important metric in this table is the performance of CCM2.
As you will note, the first five columns of the enclosed table give performance on leading edge microprocessor systems. The last two columns give performance for two parallel vector processing systems - the Cray C90, a second-generation vector computer, and the NEC SX-4, a state-of-the-art vector computer. Relative to the SX-4, microprocessors deliver from 7–17%, i.e., approximately one-tenth, of the performance of state-of-the-art vector processors. While the clock rate (MHz) of microprocessors now surpasses that of vector processors, our measurements show that for the past decade the ratio of sustained performance between the two has been approximately 10, in favor of vector processors. Thus, if one can achieve a certain level of performance, say 20 Gflops, using n vector processors, typically at least 10n microprocessors are required to achieve the same level of performance.
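The closing claim can be illustrated with a short back-of-the-envelope calculation. The per-processor rates below are illustrative assumptions, not values from the enclosed table; only the 10:1 ratio comes from our measurements.

```python
# Illustration of the ~10:1 sustained-performance ratio.
# vector_gflops is an assumed rate for illustration only.
target_gflops = 20.0       # desired sustained performance
vector_gflops = 2.0        # assumed sustained Gflops per vector processor
ratio = 10                 # measured sustained-performance ratio, vector : micro

n_vector = target_gflops / vector_gflops   # 10 vector processors
n_micro = n_vector * ratio                 # ~100 microprocessors for the same 20 Gflops

print(int(n_vector), int(n_micro))  # 10 100
```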
2. A Sampling of Computing Systems in Major Atmospheric Modeling Centers Around the World. Simply put, our international colleagues now enjoy a substantial computational advantage over U.S. modelers.
3. Comments from UCAR to the International Trade Commission. This document includes information as to our objectives in the procurement, details of the competing offers, and our rationale for selecting the SX-4.
Thank you for the opportunity to supply this information. If I can be of further assistance, please let me know.
Bill Buzbee, Director
Scientific Computing Division
National Center for Atmospheric Research
cc: R. Serafin, R. Anthes, C. Jacobs, J. Fein
NCAR Measurements of Single Processor Performance
A Sampling of Computing Systems in Major Atmospheric Modeling Centers Around the World
Bill Buzbee, Ph.D.
Director, NCAR Scientific Computing Division
November 25, 1997 (revised February 6, 1998)
I. Introduction
The National Center for Atmospheric Research (NCAR) and the community it serves currently enjoy world leadership in several areas of atmospheric sciences research that depend on high performance computing. In order to maintain this leadership, NCAR must have computing capabilities that are comparable to those of peer organizations throughout the world. The most powerful computer that NCAR has today is the Cray C90/16, and NCAR will soon install a 128 processor Distributed Shared Memory (DSM) microprocessor system. Neither of these systems will sustain more than 5 Gflops on a single application. However, NCAR's peer centers in Australia, Canada, England, and elsewhere are installing systems that by January '98 will sustain from 20–100 Gflops on a single application. With these systems, they can conduct, and are conducting, research that is far beyond the reach of their U.S. counterparts.
Section II of this paper summarizes the computing capabilities of a small number of forecast and climate modeling centers around the world, along with future plans at some of these centers. Section III discusses future developments abroad. Section IV summarizes computing capability at a small number of universities in Japan and Europe. Section V discusses the impact on U.S. atmospheric science. Overall, this paper shows that modelers outside of the U.S. have a substantial computational advantage over their U.S. colleagues and are likely to retain it for several years.
II. Systems Currently Installed
Table 1 lists some of NCAR's peer organizations and their associated computing systems that are capable of sustaining 20–100 Gigaflops on a single application.
In 1995, the European Center for Medium Range Weather Forecasting (ECMWF) selected the Fujitsu Vector Parallel Processor (VPP) system via competitive procurement. As of August 1997, the system has 116 processors, each of which sustains about 0.75 Gflops, giving the possibility of sustaining 80–100 Gflops on a single application. ECMWF is using the VPP to run the climate version of their forecast model (used in seasonal forecasts) at T63L50 resolution. In contrast, the NCAR Community Climate Model, Version 3 (CCM3), is typically run at T42L18. To move the CCM3 to a configuration similar to that being used at ECMWF would require a machine that can sustain at least 20 Gflops.
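The 80–100 Gflop figure is consistent with simple multiplication of the stated per-processor rate; the check below is derived arithmetic, not an additional measurement.

```python
# ECMWF Fujitsu VPP: 116 processors, ~0.75 sustained Gflops each
processors = 116
gflops_per_processor = 0.75

aggregate = processors * gflops_per_processor
print(aggregate)  # 87.0 -> within the quoted 80-100 Gflops range
```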
The Canadian Meteorological Center (CMC) has a 32 processor NEC SX-4. The CMC set a milestone recently by completing a 24-hour forecast over North America at 10-km resolution in about forty minutes of wallclock time. CMC was able to do this because the SX-4 sustains about 24 Gflops when executing the MC2 forecast model; consequently, CMC plans to reduce its operational forecast grid size to 10–15 km. By January of 1998, CMC will have two SX-4/32s, and by January of 2000 they will have four SX-4/32s that can be clustered into a single 128 processor system via NEC's fiber optic Internode Crossbar Switch, giving them an 80–100 Gflop capability. These machines will also be used for climate modeling.
In the spring of 1996, the UK Meteorological Office (UK Met) selected the Cray T3E with 696 processors but has not yet put it into operational use. They plan to dedicate 144 processors to the global operational forecast and 144 to the regional forecast. The remaining 408 processors are to be used for research, including climate modeling. This equipment is also used by the Hadley Centre.
Meteo-France has selected the Fujitsu VPP and currently has a system with 26 processors capable of sustaining 20 Gflops on a single model.
The Danish Meteorological Institute has two NEC SX-4s, one with sixteen processors and one with four. The sixteen processor system sustains approximately 12 Gflops. Twenty percent of the wallclock time on this machine is used for forecasting; the remaining eighty percent, together with the four processor system, is used for research, including climate modeling.
The most powerful system in the U.S. that is used for climate modeling is a Cray T90 with twenty-six processors at the Geophysical Fluid Dynamics Laboratory (GFDL) in Princeton, New Jersey. A single processor of the T90 sustains about 0.6 Gflops when executing the NCAR CCM; thus the GFDL machine is capable of approximately 15 Gflops.
The Australian Bureau of Meteorology has selected the NEC SX-4. The current system has sixteen processors, but will be upgraded to thirty-two processors in February 1998. A second SX-4 with twenty processors will be acquired in the third quarter of 1999. The two systems will be clustered via NEC's Internode Crossbar Switch, thus giving a 30–40 Gflop capability.
III. Future Developments Abroad
By 1999, the next generation of Japanese vector systems will probably be available with processors that may be more than twice as fast as the current generation. If so, it will be possible to sustain 80–100 Gflops with fewer than 50 processors and, obviously, implementing and managing models over 30–50 processors is much easier than over
hundreds of processors.
The Japanese Science and Technology Agency has established an “Earth Simulator” project. The project was launched in April 1997 with funding of approximately $400 million over five years. The project includes development of a high performance parallel computer with a sustained performance of one or more Teraflops by 2001. This system will be provided by either NEC or Fujitsu. For example, if the next generation Fujitsu VPP has a sustained performance of 2–3 Gflops per processor, then a few hundred of these processors could sustain one or more Teraflops.
IV. A Sample of Computing Systems in Universities Abroad
The National Science Foundation provides university scientists, including atmospheric scientists, with access to high performance computers. The most powerful computer supported by NSF is a 7.5 Gflop (twelve processor) Cray T90 located in San Diego. In contrast, the University of Stuttgart, the Swiss Center for Scientific Computing, and Osaka University have large SX-4s. The University of Tokyo, Nagoya University, and Kyushu University all have Fujitsu VPPs with at least forty processors. Thus, all of these universities have systems that are capable of 20 Gflops or more.
V. Impact on U.S. Atmospheric Science
U.S. atmospheric science modelers currently enjoy global leadership in several areas of research that depend on high performance computers. To maintain that leadership, they need computing capabilities that are comparable to those of their international peers. For example, a 1-km regional forecast using 4DVAR with a full physics adjoint is feasible, but to use it in time-critical (less than one hour) forecasting will probably require a machine that can sustain at least 50 Gflops. Another example is a recently developed NCAR global chemistry model (MOZART): in order to complete 100-year simulations of the climate within a reasonable timeframe, this model needs a computer that can sustain 20 to 40 Gigaflops.
The situation is particularly acute in climate modeling and is exemplified by the computational requirements of the NCAR Coupled System Model (CSM). Now that the CSM project has successfully completed a 350-year control run, there are two major studies that it
would like to undertake:
1. simulate the past 120 years of climate under at least six scenarios and four sensitivity studies per scenario and
2. simulate the next 200 years of climate under at least three scenarios and four sensitivity studies per scenario.
The total years to be simulated in 1) and 2) is 5280. At present, the flagship computer of the NCAR Climate Simulation Laboratory (CSL) is a Cray C90 that sustains 5 Gflops and that serves nine USGCRP projects including the CSM. On average, the CSM project can complete 100 years/month using the CSL C90. Thus, to complete 1) and 2) would require more than four calendar years, which is unacceptable relative to progress being made by our international peers.
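The arithmetic behind the 5,280-year total and the four-plus-year estimate can be sketched as follows, assuming one run per scenario-sensitivity combination (the source states only the totals, not the multiplication):

```python
# Study 1: 120 years x 6 scenarios x 4 sensitivity studies per scenario
study1 = 120 * 6 * 4          # 2880 simulated years
# Study 2: 200 years x 3 scenarios x 4 sensitivity studies per scenario
study2 = 200 * 3 * 4          # 2400 simulated years
total = study1 + study2       # 5280 simulated years

# At 100 simulated years per calendar month on the CSL C90:
months = total / 100          # 52.8 months, i.e. more than four calendar years
print(total, months)
```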
The CSM project also plans future improvements to the model such as semi-Lagrangian dynamics, prediction of cloud water, and a sulfate aerosol model. These improvements are expected to quadruple the amount of computation required per simulated year. Thus, a 20 Gflop machine will be required to maintain the current average of 100 years/month.
For ease of reference, we denote 1) and 2) as Part A of the CSM science plan. Similarly, we denote development and execution of the next generation of CSM as Part B of the CSM science plan.
Now that the U.S. Department of Commerce has issued an antidumping order against Japanese high performance computers, NCAR plans to continue operating the CSL C90 in FY98-99 and to install a 128 microprocessor, Distributed Shared Memory (DSM) system in the CSL in mid-FY98. Based on measured performance of two leading-edge 128 processor DSM systems executing the NCAR CCM (Community Climate Model) and POM (Parallel Ocean Model), we estimate that 128 processor DSMs will sustain about 5.0 Gflops on the CSM by mid-FY98. If so, then Part A of the CSM science plan can probably be completed by end of FY99.
However, we believe that it will be FY99-00 before 256 processor DSMs can approach 20 Gflops. Thus, the following are not possible in the near term:
• Part B of the CSM science plan,
• a 1-km regional forecast with 4DVAR in less than one wallclock hour, and
• routine use of MOZART in climate studies.
Meteorological organizations outside the U.S. either have or soon will have computing systems that can sustain 20–100 Gflops on climate simulations, high resolution forecasts, etc. With these systems, they can conduct, and they are conducting, research that is far beyond the reach of their U.S. counterparts.
The bottom line - earth systems modelers outside the U.S. have a substantial computational advantage over their U.S. colleagues and are likely to retain it for several years.
1. James Hack, NCAR CGD Division, personal communication, November 1997.
2. “NEC SX-4's Real-Time 24hr 10km Res Forecast Detailed,” HPCwire, August 1, 1997.
3. “CMC Plans to Reduce Forecasting Grid Size with NEC SX-4,” HPCwire, November 20, 1997.
4. “CMC Upgrades NEC SX-4 to Improve Forecasting,” HPCwire, November 21, 1997.
5. Dr. Paul Cluey, UK Met, personal communication, November 1997.
6. Dr. Leif Laursen, Danish Meteorological Institute, “Technical Advances in Short Range Weather Forecasting”, RCI European Member Management Symposium X, Rome, Italy, October 1997.
7. “Australian Meteorologists, Researchers to Receive NEC SX-4,” HPCwire, July 25, 1997.
8. Hisashi Nakamura, Director of Research for Computational Earth Science, Research Organization for Information Science and Technology, Tokyo, personal communication, November 1997.
9. Bill Kuo, NCAR MMM Division, personal communication, November 1997.
10. Stacy Walters, NCAR ACD Division, personal communication, November 1997.
Comments from UCAR to the International Trade Commission Hearing
August 27, 1997
Dr. Bill Buzbee
Director, NCAR Scientific Computing Division
Members of the Commission:
On behalf of the University Corporation for Atmospheric Research (UCAR), thank you for this opportunity to present information on the issue before you.
UCAR RFP B10-95P requested computers that could demonstrate robust operation and high performance when executing UCAR applications. Specifically, the RFP stated (pg. 2):
“1. The requested system will be a production-level, high-performance computing system. Production implies a high level of system availability and reliability, and both a robust batch capability and robust software development environment.
2. The primary objective … is high performance in executing existing parallel multi-tasked and/or message passing atmospheric models, ocean models, and/or full Climate System models…”
Hereafter, we will refer to 1. as “UCAR's robust operational requirement.”
The final Best and Final Offer (BAFO) from Cray Research was received February 28, 1996. It detailed an ensemble of eight computers spanning five different system models and both vector and
nonvector architectures. Only one of the eight systems could be tested with respect to UCAR's robust operational requirement and the performance measured on it was 5.4 billion arithmetic operations per second (Gigaflops). The final array of equipment, to be delivered in August '98, was estimated by Cray Research to sustain 50.3 Gigaflops. The one system that was tested would have been removed in August '98 and none of the August '98 systems could be tested. The inability to test any of the August '98 systems presented unacceptable risk to UCAR.
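The scale of the untested portion of the Cray offer follows from the two performance figures just given; the percentage below is derived here, not stated in the testimony.

```python
tested_gflops = 5.4       # measured on the one system Cray could demonstrate
offered_gflops = 50.3     # Cray's estimate for the full August '98 array

fraction = tested_gflops / offered_gflops
print(round(fraction * 100, 1))  # -> 10.7, i.e. ~10% of offered capacity was demonstrable
```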
In contrast, all of the equipment offered by the Federal Computer Corporation (FCC) could be tested and it demonstrated both robust operation and high performance. Simply put, Cray Research lost this procurement because their BAFO had unacceptable technical risk.
UCAR is a nonprofit Colorado membership corporation engaged in scientific and educational activities in the atmospheric and related sciences. With a membership of 62 universities, UCAR manages the National Center for Atmospheric Research (NCAR), under contract for the National Science Foundation. A major component of NCAR's mission is to provide state-of-the-art research tools and facilities to the U.S. atmospheric sciences community. These facilities include high performance computers.
NCAR has a long history of leadership in advancing technology for understanding and predicting the Earth's system. This research includes long-term development, documentation, and support of numerical models that require high performance computers. Thus, plans for acquiring and providing high performance computers are coordinated with plans for research projects that need these computers.
A New NCAR Climate Model
In 1995, NCAR scientists began development of a new climate model that substantially advances the state-of-the-art in climate modeling1 and this model requires a very high performance computer.
1 “Model Gets It Right - Without Fudge Factors,” Science, Vol. 276, 16 May 1997, p. 1941.
For example, if this model is run 24 hours per day on a computer that can sustain 5 Gigaflops, approximately 16 calendar days are required to simulate 100 years of climate. In the course of a single scientific study, scientists routinely need to simulate several climate scenarios and perform several sensitivity studies for each scenario. Thus, a single scientific study may involve 20 or more 100-year simulations. By October '98, the successor to this model will require a computer that can sustain approximately 25 Gigaflops in order to complete a single 100-year simulation within approximately two weeks of calendar time. The computational requirements for this and similar models were considered when the RFP was developed.
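The 16-day figure implies a fixed computational cost per simulated year, from which the 25-Gigaflop requirement can be sanity-checked. The flop counts below are derived from the text, not independently measured.

```python
sustained = 5e9                 # flops/s on the current machine
run_seconds = 16 * 24 * 3600    # ~16 calendar days at 24 hours/day
flops_per_run = sustained * run_seconds     # ~6.9e15 flops per 100-year simulation

# Successor model: ~25 Gflops to finish 100 years in ~2 weeks
successor_flops = 25e9 * 14 * 24 * 3600
growth = successor_flops / flops_per_run    # implied growth in computation
print(round(growth, 1))  # 4.4 -> successor needs ~4.4x the computation
```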
The RFP was open to computers of any architecture, e.g. vector, nonvector, massively parallel, etc. The RFP included a benchmark suite of computer programs that were designed to verify robust operation and to measure performance. The benchmark suite was provided to 14 supercomputer vendors for their critique prior to the release of the RFP. This was done to assure UCAR that the benchmark suite was objective and could be readily executed on a variety of computer architectures.
In March '95, the RFP was formally released to the 14 vendors: 12 U.S. manufacturers and two foreign manufacturers. The RFP included the option for vendors to bid on one or both of two scenarios: a three-year acquisition and a five-year acquisition.
All vendors were given two opportunities to ask questions and request clarifications to the RFP.
Four of the 14 vendors responded and three of those were within the competitive range. UCAR required each vendor to perform a
Live Test Demonstration (LTD) using the benchmark suite and the first LTDs were performed in August and September '95.
In October '95, UCAR issued guidelines for preparing a BAFO. The guidelines stated “…UCAR is prepared to accept a major change in system architecture and programming environment …” Also, UCAR required that each vendor perform a second LTD and these LTDs were undertaken in February '96.
Performance Expectations in the BAFO Guidelines
In its guidelines for preparing the BAFO, UCAR suggested that the vendors focus on the five-year scenario and UCAR refined its expectations of performance for this scenario:
By October '96 -
(1) at least one system that could sustain 5 Gigaflops when executing the NCAR community climate model from the benchmark suite, and
(2) an aggregate capacity of at least 20 Gigaflops.
By October '98 -
(3) at least one system that could sustain approximately 25 Gigaflops when executing the NCAR community climate model as specified in the BAFO guidelines, and
(4) an aggregate capacity of at least 45 Gigaflops.
Items 1) and 3) reflect the needs of the new NCAR Climate Model discussed previously. Items 2) and 4) could be met by offering an ensemble of systems. Item 1) was mandatory.
The BAFO from the Federal Computer Corporation (FCC)
The FCC BAFO provided:
• one SX-4/32 to be delivered shortly after signing of the contract;
• a second SX-4/32 to be delivered October 1, 1997;
• two additional SX-4/32s to be delivered October 1, 1998.
The February '96 LTD verified that the FCC BAFO met UCAR's robust operational requirement. The LTD also demonstrated that FCC met items 1), 2) and 4) of UCAR's performance expectations; specifically,
• A single SX-4 executed the benchmark for item 1) with a sustained performance of approximately 13 Gigaflops.
• With regard to item 2), the UCAR LTD for the SX-4 was conducted on a prototype machine with a 9.2 nanosecond cycle time. The prototype SX-4 executed the benchmark for item 2) with a sustained performance of 18 Gigaflops. Production versions of the SX-4 operate with an 8.0 nanosecond cycle time, so a production SX-4 will deliver 20.7 Gigaflops for item 2).
• The prototype SX-4 sustained 17 Gigaflops when executing the benchmark for item 3). Production versions of the SX-4 will deliver 19.5 Gigaflops. Further, the prototype SX-4 sustained 24 Gigaflops when executing a benchmark that is closely related to the benchmark for item 3).
• Since a production version of the SX-4 is projected to sustain 20.7 Gigaflops for item 2), it follows that the FCC BAFO meets item 4).
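The production-machine projections in the bullets above follow from simple cycle-time scaling of the prototype measurements, which can be reproduced as:

```python
proto_ns = 9.2   # prototype SX-4 cycle time (ns)
prod_ns = 8.0    # production SX-4 cycle time (ns)
scale = proto_ns / prod_ns   # 1.15x speedup from the faster clock

print(round(18 * scale, 1))  # 20.7 Gigaflops for item 2)
print(round(17 * scale, 1))  # 19.5 Gigaflops for item 3)
```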
Overall, the NEC SX-4/32 is by far the fastest computer that UCAR has ever evaluated.
The BAFO from Cray Research
After an amendment to its BAFO (see Ref. ), Cray Research offered an ensemble of vector and non-vector equipment that involved one system in May '96, two systems in September '96, and five systems
in August '98.2
Cray Research could only perform the LTD on the May '96 system and this system demonstrated the ability to meet UCAR's robust operational requirement. However, this system only met item 1) of UCAR's performance expectations and the BAFO required removal of this system in August '98.
Basis for Selection
Three factors weighed heavily in evaluating the FCC BAFO and the Cray Research BAFO:
a) FCC demonstrated that all of its equipment for the five-year scenario met UCAR's robust operational requirement, and met items 1),2), and 4) of UCAR's performance expectations.
b) Cray Research could demonstrate only one machine - the May '96 system - that met UCAR's robust operational requirement and its performance only met 1); this system would have been removed in August '98 and none of the systems to be installed in August '98 could be tested.3
2 For details, see amended Attachment II to UCAR's response to the Purchaser's Questionnaire, July 31, 1997.
3 Counsel for Cray Research has noted that at the conclusion of the benchmark test in February '96, NCAR personnel advised Cray Research that “there were no showstoppers.” The context of that remark is as follows:
a. The initial (November 30, 1995) BAFO from Cray Research included a new nonvector system that was the centerpiece of the offer and that was still being designed. An elementary analysis of the machine's specifications showed that it would not meet the performance levels that Cray Research was projecting for our applications. This was a “showstopper” for that offer, and we advised Cray Research of it. Cray Research verified our analysis, and amended their BAFO by replacing this machine with other equipment.
b. The amended BAFO from Cray Research contained their T90 as the flagship system for the first two years of the proposal, and the LTD was to be performed on it. Approximately two weeks before the February '96 LTD, Cray Research informed us that they could not perform the LTD on a T90 due to fundamental problems with the T90 memory system. This was another “showstopper” for the BAFO. Cray Research requested that they be allowed to perform the LTD on the C90. We agreed.
So with an amended BAFO and the last-minute change to perform the LTD on the C90, Cray Research finally made an offer that did not have any “showstoppers.” The remark did not mean that Cray Research had won the competition; rather, it meant they had qualified.
c) The May '96 system from Cray Research accounted for only about 10% of the total computing capacity in the BAFO.
Based on a) through c), UCAR concluded that FCC offered and demonstrated overwhelmingly superior technical performance and low risk relative to the Cray Research five-year offer. Thus, Cray Research lost this procurement because their BAFO had unacceptable technical risk - in particular, neither the September '96 nor any of the August '98 systems could be tested. In fact, as noted in the references below, had FCC withdrawn from the competition, UCAR would have selected the three-year offer from Cray Research due to the risks of their five-year offer.4
4 UCAR estimates that today - eighteen months after their BAFO - about 80% of the capacity offered by Cray Research is still not demonstrable.
If An Antidumping Order is Issued
UCAR and the community it serves currently enjoy world leadership in several areas of atmospheric sciences research that depend on high performance computing. In order to maintain this leadership, UCAR must have computing capabilities that are comparable to peer organizations throughout the world. The most powerful computer that UCAR has today sustains 5 Gigaflops. Meteorological centers in Australia, Canada, England, and elsewhere are installing systems that by January '98 will sustain from 20–80 Gigaflops on a single application. This is four to sixteen times as much computing capability as UCAR has at present. Further, we estimate that those centers are acquiring this capability at an annual cost that does not exceed the annual expenditure that UCAR offered in this RFP.
If an antidumping order is issued, then UCAR has two options:
1. Switch to highly parallel, nonvector systems. As evident in the RFP, we have the option to switch to these systems. Several U.S. manufacturers market parallel, nonvector systems. By switching to this technology - and the two are interchangeable - UCAR is assured a competitive marketplace from which to procure equipment. We have already noted that meteorological centers around the world are rapidly increasing their computational capability and doing so without an increase in cost. If an antidumping order is issued, then UCAR believes that the parallel, nonvector marketplace is our best hope for obtaining comparable amounts of computing per dollar. However, some time will be required to acquire and convert to the new systems.
2. Broaden our national and international collaborations to include access to high performance computing systems. The U.S. atmospheric sciences community routinely participates in national and international research projects and collaborations. When scientifically appropriate, these activities can occasionally include access to leading edge, high performance computers including computers in other countries. For technical reasons, this option is not a desirable way to compute. Moreover, this approach cannot be relied upon to meet UCAR's computing needs in a systematic manner that serves all of its users.
Both of these options will impede UCAR's rate of scientific progress while at the same time UCAR's international peers are accelerating their rate of progress. This will have far reaching, negative consequences. UCAR, plus the U.S. community it serves, may forfeit their research leadership in advancing technology for weather forecasting and climate modeling.
In summary:
1. Cray Research lost this procurement because of unacceptable technical risk in its BAFO.
2. An antidumping order will have far-reaching, negative impact on U.S. leadership in atmospheric science.
References:
1. “Accommodations Made for the Competitors,” Comments by the University Corporation for Atmospheric Research on the Antidumping Petition of Cray Research, Inc., dated August 16, 1996, and addendum thereto dated August 23, 1996 (both documents were made available to the ITC).
2. Letter from Frank Schuchat, Holme Roberts & Owen LLP, to Valerie Newkirk, Office of Investigations, U.S. ITC, dated September 18, 1996.