The scientific challenges of using both satellite and in situ data included the fact that the satellite data represent measurements of outgoing radiation at satellite level, which have a complicated dependence on multiple aspects of atmospheric structure, whereas in situ data typically provide a direct measurement of one aspect of atmospheric structure. Moreover, the satellite data constitute a continuous stream along a swath below the satellite orbit, but the instrument may not view a particular point again for many hours, or perhaps several days. In addition, depending on the viewing geometry and frequency band used by a satellite instrument, clouds and precipitation may limit observational capability. Furthermore, satellite and in situ data have very different error characteristics, which also complicate the inference of information.
Optimal estimation of the evolving state of the atmosphere over a period (say, 1 day) requires one to use not only the observations available in that time window but also earlier observations and knowledge of the laws governing atmospheric evolution. These laws are highly nonlinear and are expressed in the forecast equations of an atmospheric model. To begin the interpretation of the observations received between 1200 UTC yesterday and 1200 UTC today, one projects yesterday’s best estimate at 1200 UTC forward to 1200 UTC today by using the forecast model to provide an a priori estimate for calculation of today’s best estimate. The evolving a priori state is sampled through the 24 hours by a simulated observation network to provide the a priori estimate of the actual observations. The estimate of the observations (the expected values of the observations) includes simulations of the actual in situ observations and simulations of the observations of the actual fleet of satellites operating during the period.
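The forward projection and the simulated observation network described above can be sketched numerically. The following is a minimal illustration, not the operational system: the toy nonlinear model, the 1-D grid, and the observation locations are all hypothetical stand-ins for the full forecast equations and the real observing network.

```python
import numpy as np

def forecast_model(x, n_steps, dt=0.05):
    """Toy nonlinear forecast model (logistic-style tendency), standing in
    for the highly nonlinear forecast equations of an atmospheric model."""
    for _ in range(n_steps):
        x = x + dt * x * (1.0 - x)
    return x

def observation_operator(x, obs_indices):
    """Sample the model state at the locations of a hypothetical observing
    network -- the 'simulated observation network' of the text."""
    return x[obs_indices]

# Yesterday's best estimate at 1200 UTC, on a toy 1-D grid of 10 points.
x_analysis_yesterday = np.linspace(0.2, 0.8, 10)

# Project forward 24 hours to provide today's a priori (background) estimate.
x_background_today = forecast_model(x_analysis_yesterday, n_steps=24)

# A priori estimate of the actual observations at a few (sparse) locations.
obs_indices = np.array([1, 4, 7])
y_expected = observation_operator(x_background_today, obs_indices)
```

The separation between the model propagation and the observation operator mirrors the structure of the real system: the operator that maps model state to expected observations can be as simple as interpolation (for in situ data) or as complicated as a radiative transfer calculation (for satellite radiances).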
The mismatch between the actual observations and expected observations (simulated from the a priori forecast) is used in an iterative variational procedure to adjust the starting point for the forecast so that the trajectory of the forecast (the evolving model state) is increasingly close to the observations. The iterative nature of the calculation has many advantages, not least that one can make accurate use of observations that are nonlinear in the model variables and that many types of observations can contribute to the estimation. The algorithm is known as four-dimensional variational data assimilation (4D-Var), and the underlying Bayesian inference theory is closely related to such algorithms as the Kalman filter. The substantial computer costs are justified by the benefits of the calculation. A prime output of the calculation is the best estimate of the atmospheric state at 1200 UTC today. However, there are many other benefits, especially that it is a systematic resource for determining random and systematic errors in the observations, in the model, and in the procedure itself.
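The iterative adjustment of the forecast starting point can be illustrated with a drastically simplified strong-constraint 4D-Var sketch. Everything here is a hypothetical toy setup (linear two-variable dynamics, synthetic observations, identity-like error weights), and the general-purpose minimizer stands in for the tailored iterative schemes used operationally.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linear dynamics standing in for the forecast model.
M = np.array([[1.0, 0.1],
              [0.0, 0.95]])

def propagate(x0, k):
    """Run the model k steps from initial state x0 (the forecast trajectory)."""
    x = x0.copy()
    for _ in range(k):
        x = M @ x
    return x

# Synthetic truth and observations inside the assimilation window.
x_truth0 = np.array([1.0, 0.5])
obs_times = [4, 8, 12]
H = np.array([[1.0, 0.0]])              # observe the first component only
y_obs = [H @ propagate(x_truth0, k) for k in obs_times]

x_background = np.array([1.3, 0.2])     # a priori from yesterday's forecast
B_inv = np.eye(2)                       # background error weight
R_inv = np.eye(1) * 10.0                # observation error weight

def cost(x0):
    """4D-Var cost: misfit to the background plus misfit of the trajectory
    to the observations distributed through the window."""
    Jb = 0.5 * (x0 - x_background) @ B_inv @ (x0 - x_background)
    Jo = sum(0.5 * (y - H @ propagate(x0, k)) @ R_inv @ (y - H @ propagate(x0, k))
             for y, k in zip(y_obs, obs_times))
    return Jb + float(Jo)

# Iteratively adjust the starting point so the trajectory fits the observations.
x_analysis = minimize(cost, x_background, method="BFGS").x
```

Note how the observations constrain the unobserved second component indirectly, through the model dynamics that couple it to the observed one over the window; this is the mechanism by which 4D-Var spreads information in time.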
Current practice in operational data assimilation has evolved to its present state for two important reasons. First, the observations available at 1200 UTC today cannot provide a global picture, because of gaps in the spatial coverage (both in the horizontal and in the vertical), gaps in temporal coverage, gaps in the range of observed variables, and uncertainties and variations in the errors and sampling characteristics of different observing systems. The numerical model uses yesterday’s best estimate and observations taken within the assimilation window to fill the observational gaps by transporting information from data-rich to data-sparse areas. The second reason is related to a basic result in estimation theory. Suppose that one seeks a best estimate of the state of a system by using two sources of information with accuracies represented by A1 and A2. Theory says that in the best combination of the two estimates, the two sources of information are weighted by their accuracies, and the accuracy A of the resulting combination is given by
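A common concrete instance of this result takes accuracy to be the inverse of the error variance; the numbers below are a minimal illustrative sketch under that assumption, not taken from the text.

```python
# Two independent estimates of the same quantity, with accuracies
# (here taken as inverse error variances) A1 and A2 -- illustrative values.
x1, A1 = 10.0, 1.0 / 4.0    # estimate with error variance 4
x2, A2 = 12.0, 1.0 / 1.0    # estimate with error variance 1

# Best linear combination: weight each source by its accuracy.
x_best = (A1 * x1 + A2 * x2) / (A1 + A2)

# Under this definition the accuracy of the combination is A = A1 + A2,
# so the combined estimate is never less accurate than either source.
A_best = A1 + A2
```

The same weighting underlies data assimilation itself: the background (forecast) and the observations play the roles of the two information sources, each weighted by its accuracy.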