
Appendix I

Description of Different Codes

  1. ECMWF's IFS (Integrated Forecasting System)

The ECMWF's IFS code is a parallel spectral weather model that is also used for seasonal climate prediction. Its structure is similar to that of climate codes from NCAR, including CCM, but its parallel execution model is highly evolved. It uses domain decomposition in two dimensions and performs both Fourier and Legendre transformations on the grid data. The sustained rates reported for IFS are in units of forecast days per day (in other words, the ratio of simulated time to real wall-clock time). The two machines compared with this code are the Cray T3E (600-MHz Alpha processors) and the Fujitsu VPP5000. A single processor of the VPP5000 achieves 48 times the sustained speed of a single T3E processor. In parallel configurations this ratio of sustained rates per processor increases to 57 when IFS executes on 1,408 T3E processors and 98 VPP5000 processors. Notice that just 98 VPP5000 processors are therefore nearly four times faster than 1,408 T3E processors (57 × 98/1,408 ≈ 4).

Analysis of the performance of the IFS code (at T213 L31 resolution) on a variety of machines (both microprocessor- and vector-based) indicates that machines with small numbers of fast vector processors are superior to highly parallel microprocessor-based SMPs (Fig. A-1).

FIGURE A-1 Performance of the IFS code (at T213 L31 resolution) on a variety of machines, both microprocessor- and vector-based. The figure shows that machines with small numbers of fast vector processors are superior to highly parallel microprocessor-based SMPs. Courtesy of D. Dent, European Centre for Medium-Range Weather Forecasts.

  2. Environment Canada's MC2

MC2 is a regional, non-hydrostatic weather model. It uses a variety of sophisticated solvers and is structured somewhat like a global spectral model. In Table 4-3 we show the sustained performance in Mflops of MC2 for the Origin 2000 (250-MHz R10000 processors with 4-MB caches) versus the NEC SX-5, a Japanese VPP. Like the IFS code, the MC2 code executes nearly 50 times faster (on a per-processor basis) on the SX-5 compared to the R10000. The aggregate performance achieved on 28 SX-5 processors is 95,200 Mflops, almost 50% of the potential peak speed of the SX-5 in this parallel configuration. These speeds dwarf the 840 Mflops achieved on a smaller 12-processor configuration of the Origin.

  3. NCAR's MM5

MM5 is a grid-based, hydrostatic mesoscale model originally developed at Penn State. Its performance on a single 667-MHz Alpha processor is 360 Mflops, about 27% of peak performance. This is a much higher percentage of peak microprocessor speed than most weather models achieve, due primarily to MM5's good cache usage. When parallel execution of MM5 is considered, the AlphaServer cluster (128 4-processor machines connected into a 512-processor configuration) is about 60% faster than a 20-processor VPP5000.

The per-processor speed ratio between the 512-processor Alpha cluster and the 20-processor VPP5000 is 16 in favor of the vector machine (the cluster uses 25.6 times as many processors to run only 1.6 times faster), much less than the 50-times difference found with the other codes. However, the lower ratio is consistent with MM5's sustained-to-peak performance being a factor of 3 higher than that of the other codes.

  4. German Weather Service's LM and GME

Only limited performance data were available for the local (LM) and global (GME) weather models used by the German Weather Service (Deutscher Wetterdienst). LM is a regional model, while GME is a global model built on an icosahedral-hexagonal grid. Both models, whether run serially or in parallel, execute nearly 50 times faster on a VPP5000 than on a Cray T3E.

The actual performance of real weather and climate codes supports our contention that Japanese parallel vector supercomputers currently outperform American-manufactured MPPs based upon microprocessor technology by a significant margin. If VPPs are not available, it is more difficult to obtain good performance. This is particularly true for “capability computing” (see Section 3-1). As was demonstrated above, Amdahl's law requires a very high degree of parallelism in a model to achieve effective speedups on large numbers of processors.

Recognizing this, SMP vendors have been moving to a hybrid architecture that places multiple processors on each node. SMP clusters typically have nodes containing 2–16 processors that share uniform memory access (UMA) via a bus or, when the number of processors exceeds 8, a higher-performance intra-node network. To take advantage of this UMA feature, the preferred intra-node programming model is OpenMP threads, an evolving standard for what was once known as “multitasking” on Cray Research's parallel vector machines. MPI is used for the non-uniform memory access (NUMA) inter-node communications. Although this hybrid programming model adds another layer of complexity, it offers a useful path to parallelism if it can be implemented efficiently. For example, rather than doing a two-dimensional domain decomposition in longitude and latitude, with the hybrid model one might decompose and use MPI only in latitude while treating parallelism in longitude with threads spread across the processors on each node.
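As a concrete illustration of this decomposition, the following is a minimal, hypothetical Fortran sketch: MPI ranks split the grid in latitude, while an OpenMP directive threads the longitude loop within each node. The grid dimensions and the do_physics routine are invented for the example and are not drawn from any of the models discussed above.

```fortran
! Hypothetical hybrid MPI/OpenMP sketch: MPI decomposes latitude across
! nodes; OpenMP threads share the longitude loop within each node.
program hybrid_sketch
  use mpi
  implicit none
  integer, parameter :: nlon = 256, nlat = 128, nlev = 31
  integer :: ierr, rank, nranks, jbeg, jend, i, j
  real, allocatable :: field(:,:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! One-dimensional MPI decomposition in latitude (assumes nlat is
  ! evenly divisible by the number of ranks, one rank per node).
  jbeg = rank * (nlat / nranks) + 1
  jend = jbeg + (nlat / nranks) - 1
  allocate(field(nlon, jbeg:jend, nlev))
  field = 0.0

  do j = jbeg, jend
    ! Parallelism in longitude is handled by threads on the node.
    !$omp parallel do private(i)
    do i = 1, nlon
      call do_physics(field(i, j, :))
    end do
    !$omp end parallel do
  end do

  call MPI_Finalize(ierr)

contains

  subroutine do_physics(column)
    real, intent(inout) :: column(:)
    column = column + 1.0   ! stand-in for a real physics computation
  end subroutine do_physics

end program hybrid_sketch
```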

Important considerations from a software standpoint are:

  1. optimizing the placement of data with respect to the processor(s) that will use it most;

  2. minimizing the number and maximizing the size of messages sent between nodes (a sketch follows this list);

  3. maximizing the number of operations performed on data that is in cache, while minimizing the amount of data required to be in cache for these operations to occur.
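A minimal sketch of the message aggregation in item 2, assuming a hypothetical halo exchange of three fields (the names, sizes, and tag are invented): packing the fields into one buffer lets a single larger message replace three smaller ones.

```fortran
! Hypothetical sketch for item 2: pack three fields into one buffer so
! that one larger message crosses the network instead of three small ones.
subroutine send_halo(u, v, t, n, dest, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, dest, comm
  real, intent(in)    :: u(n), v(n), t(n)
  real    :: buf(3*n)
  integer :: ierr

  buf(1:n)       = u
  buf(n+1:2*n)   = v
  buf(2*n+1:3*n) = t
  call MPI_Send(buf, 3*n, MPI_REAL, dest, 0, comm, ierr)
end subroutine send_halo
```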

Normally, one expects the operating system or job scheduler to take care of item 1 automatically. If data are not localized on the same node as the processor that will use them most often, performance will suffer and is likely to vary considerably from run to run. Item 2 requires careful planning of MPI calls. Item 3 requires the most code changes, such as subdividing the computational domain into blocks small enough that all the data for any single block fit into cache. More radical steps involve converting from Fortran 90 to Fortran 77 in order to get explicit do-loops, re-ordering array and loop indices, in-lining subroutine calls, fusing loops, and applying other optimizations that would be left to the compilers if only they were capable.
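The blocking transformation described for item 3 can be sketched in a few lines of Fortran. This is a hypothetical illustration only: the tile sizes and the update itself are invented, and a real model would tune the block dimensions to the cache of the target processor.

```fortran
! Hypothetical cache-blocking sketch for item 3: the (i,j) domain is
! swept in small tiles so each tile's data stay cache-resident while
! the work on that tile is done. Sizes and the update are placeholders.
subroutine blocked_update(a, b, nlon, nlat)
  implicit none
  integer, intent(in) :: nlon, nlat
  real, intent(inout) :: a(nlon, nlat)
  real, intent(in)    :: b(nlon, nlat)
  integer, parameter  :: iblk = 32, jblk = 32   ! tile sizes tuned to cache
  integer :: i, j, ii, jj

  do jj = 1, nlat, jblk
    do ii = 1, nlon, iblk
      ! All operations on this tile occur while it is in cache.
      do j = jj, min(jj + jblk - 1, nlat)
        do i = ii, min(ii + iblk - 1, nlon)
          a(i, j) = a(i, j) + 0.5 * b(i, j)
        end do
      end do
    end do
  end do
end subroutine blocked_update
```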
