
GLOBAL AND REGIONAL SURFACE WIND FIELD INFERENCES FROM SPACEBORNE SCATTEROMETER DATA

TRANSCRIPT OF PRESENTATION

MR. NYCHKA: Our next speaker is Ralph Milliff from Colorado Research Associates in Boulder, Colorado.

MR. MILLIFF: I, too, would like to thank Doug for inviting me. The work that I am going to talk about began with collaborations in the Geophysical Statistics Project (GSP) at the National Center for Atmospheric Research, of which Doug is the director. My earliest collaborations, and ongoing, are with Mark Berliner, who is now at Ohio State—he was a prior director of GSP—and Chris Wikle, who was then a post-doc at GSP. In addition to Doug, Chris, and Mark, I am happy to acknowledge Tim Hoar, who is the staff scientist for the statistics project, and Jan Morzel. Sustaining research support has come from the NASA Earth Science Enterprise ocean vector wind science team, and I acknowledge and appreciate that very much.

Since there is a JPL person in the audience, I am obliged to show a picture of their instrument. This is what the people in the trade call eye candy. It is for you to look at. It isn't really the real system, but the data set that underlies the massive data stream I am going to talk about today is the surface winds over the global ocean. These have begun to explode in volume and precision since about 1991 when, within the Earth-observing era of satellite data sets, the first global wind data set began with the European Space Agency missions ERS-1 and 2.

Before I tell you what a scatterometer is and how it works, I should convince you a little bit that the global ocean surface wind field is a worthwhile field to measure. The surface winds transfer momentum between the atmosphere and ocean. They modulate the transfer of heat and material properties, like carbon dioxide and liquid water. These obviously have important implications for inferences on climate processes and the rates of climate change, when taken in a global perspective. On shorter time scales, the surface winds and these same exchanges are very important in predicting weather.

So, a scatterometer is a system that continually emits an active microwave pulse at the ocean's surface, where it is backscattered by capillary waves; the backscatter signal is detected by the same platform in space and related to a surface wind speed. So, this is a retrieval algorithm, of the kind that John Bates so cleanly described in his presentation. We are retrieving the surface wind from what the surface wind has done to ripple the surface and backscatter a pulse of radiation whose properties we know very well. The little waves that backscatter radiation from a scatterometer are called cat's paws by sailors. They are the little ripples that form on the surface when a puff of wind shears the surface of the ocean. Because we know the polarization, frequency, and angles of incidence of the emitted pulse very well, and the backscattered pulse as well, we use several returns to fit model functions for the wind speed and the wind direction separately, and retrieve the vector wind. The returns are aggregated over what we call wind vector cells. So, that is going to be my pixel. Instead of the radiance coming out of a particular patch or footprint on the surface of the Earth, I am looking at backscattered returns aggregated over an area.
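To make the retrieval idea concrete, here is a hedged toy sketch in Python: a made-up model function (standing in for an operational geophysical model function) relates backscatter to wind speed and the wind-relative azimuth, and a brute-force least-squares search over several looks inverts for the vector wind. Every coefficient and name here is invented for illustration; this is not any mission's retrieval code.

```python
# Toy scatterometer wind retrieval: fit (speed, direction) to several
# backscatter "looks" via a made-up model function. Illustrative only;
# real retrievals use empirically calibrated geophysical model functions.
import numpy as np

def toy_gmf(speed, wind_dir_deg, look_dir_deg):
    """Hypothetical model function: backscatter (linear units) as a function
    of wind speed and wind-relative azimuth. Coefficients are invented."""
    chi = np.deg2rad(wind_dir_deg - look_dir_deg)
    return 1e-3 * speed**1.5 * (1.0 + 0.4 * np.cos(chi) + 0.6 * np.cos(2 * chi))

def retrieve(sigma0, look_dirs_deg):
    """Brute-force least-squares inversion over a (speed, direction) grid."""
    speeds = np.linspace(1.0, 30.0, 120)
    dirs = np.arange(0.0, 360.0, 2.0)
    S, D = np.meshgrid(speeds, dirs, indexing="ij")
    cost = np.zeros_like(S)
    for s0, ld in zip(sigma0, look_dirs_deg):   # accumulate misfit over looks
        cost += (toy_gmf(S, D, ld) - s0) ** 2
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return speeds[i], dirs[j]

# Simulate three noisy looks at one wind vector cell and invert them.
truth = (12.0, 75.0)  # m/s, degrees
looks = [0.0, 45.0, 135.0]
obs = [toy_gmf(truth[0], truth[1], ld) * (1 + 0.02 * np.random.randn()) for ld in looks]
print(retrieve(np.array(obs), looks))  # approx (12, 75), up to ambiguities
```

The cos 2χ term in the toy model is what creates the familiar direction ambiguities that real retrievals must resolve with multiple azimuth looks.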
In terms of data volume, the European system that I mentioned was launched in 1991. It is the longest-lived scatterometer system on record; it continued until 2000. This is a 12-hour coverage of the globe. Within each swath there in black, there are 70-kilometer resolution cells that overlap each other. So, the wind vector cell in the ERS system was 70 kilometers, and we got about 41 percent of global coverage in 24 hours.

In 1996, NASA launched the NASA scatterometer, or NSCAT. It about doubled the precision and the volume of the surface vector wind data set. The data organization in this case is 25-kilometer resolution wind vector cells in two swaths that orient along each side of the satellite ground track. So, these are the polar orbits for a 12-hour coverage. You can see the gap at nadir in each swath, and then two swaths on either side of the satellite. The NSCAT system came to an abrupt end when the solar collector on the satellite bus failed dramatically in 1997. So, we had about nine months' worth of data.

In response to its failure, NASA launched what is called QuikSCAT—a quick scatterometer. It has an 1,800-kilometer swath, about 18 degrees of longitude, with 25-kilometer wind vector cell resolution. This is the 12-hour coverage. Now, we are seeing 92 percent of the globe every 24 hours. As of tonight, at 8:30, a second SeaWinds system—SeaWinds is the scatterometer aboard the QuikSCAT satellite—will launch aboard a Japanese satellite, hopefully, and we will have, for the first time, synoptic coverage of the surface wind field of the global ocean every day. This will be the 12-hour coverage from two SeaWinds systems. You can see that, in 12 hours, there are only very few gaps in the coverage of the surface wind fields.

When we started to think about this problem, there were very large gaps, and this is one of the problems that Amy brought up in her talk just a minute ago. So, our first statistical model addressed how to fill those gaps with physically sensible surface winds.

Well, what do I mean by physically sensible surface winds? One property of the surface wind field that has emerged from the scatterometer data set—an average property that we can hang our hat on as physicists and use statistical techniques to drive our space and time interpolations—is the spectral properties in wavenumber space. If you will permit me, I'll put the whole slide on this way. Along the ordinate we have power spectral density of kinetic energy; the abscissa is the spatial wavenumber. The spatial scales that correspond to those wavenumbers are listed on the top here. What we observe in the surface wind field for our planet is that the spectra obey an approximate power law: there is an almost constant slope in wavenumber space for the kinetic energy. That isn't the case, if you look at the other curves on this picture, for surface winds that come from weather center models. This is the climate model and these are forecast models. They depart from approximate power-law behavior on spatial scales much coarser than the grid resolution of the numerical models that generate the weather to begin with. So, we have a spatial constraint now, to use in our interpolation.

These spectra are for a box in the North Pacific Ocean, averaged over the calendar year 2000. There are interesting relations to two- and three-dimensional turbulence theories that also make this an appealing result, but they are not really relevant to this talk today. What we do notice now is that this approximate power law—the slope of that spectrum—has a spatial and temporal distribution.
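In symbols, the constraint we exploit is a power-law wavenumber spectrum whose slope varies with region and season; schematically (my notation, not the speaker's slide):

```latex
E(k) \;\approx\; C\,k^{-\beta(\mathbf{x},\,t)},
\qquad
\log E(k) \;\approx\; \log C \;-\; \beta(\mathbf{x},\,t)\,\log k ,
```

where E is the kinetic energy spectral density, k the spatial wavenumber, and β the slope. A straight-line fit in log-log space recovers β for each region and season from the scatterometer data.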
It is shallowest in the Tropics, where convective events supply energy at small scales that propagates upscale, in an inverse cascade, to the larger scales. The slopes are steeper as you go to higher latitudes. The western Pacific and the Indian Oceans are the shallowest. These typical slopes are for the zonal wind; there is a similar relationship for the meridional wind. There is also an annual march in the slope of these spectra: it is shallowest in the early part of the year and steepens slightly in the Tropics, with a less evident annual march in the midlatitude storm track regions. This pattern is repeatable. We can say that now because we have about three years. That was the picture for 2000; this is the picture for 2001. Again, we see the shallowest slopes in the western Pacific, early in the year.

We use these regional and seasonal properties of the surface wind field to perform the interpolation problem that I mentioned. We need to account for this wavenumber deficit in the weather center winds. What the weather center winds have going for them is that they are available four times a day everywhere. That isn't true of the satellite: it drops out in rain, and it has a nonuniform sampling pattern that has to do with its polar orbit configuration. What we did was use a multiresolution wavelet procedure to blend the wavenumber-deficient surface analyses of the surface wind, four times a day, with the available scatterometer data. The constraint, and the reason why we used wavelets, is that wavelets have a multiresolution property that allows you to specify the slope of a fractal process. The slopes we use, obviously, are those spectral slopes that distribute with space and time over the globe, as observed by the scatterometer. So, within each eight-degree square on the globe, every day, we collect the spectral slope over a 30-degree-long spectrum and store it, and we sample from the distribution of slopes based on that collection to augment the wavenumber-deficient weather center winds whenever we don't have a satellite observation. A toy sketch of this multiresolution idea follows.
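As referenced just above, here is a toy sketch of the multiresolution blending idea, using PyWavelets. The variance heuristic, the gap weighting, and the normalization are all invented for illustration; this is not the operational blending algorithm.

```python
# Toy sketch of multiresolution blending (illustration only): keep
# scatterometer winds where they exist, and in gaps augment the smooth
# weather-center analysis with wavelet detail whose variance follows a
# prescribed power-law (fractal) spectral slope.
import numpy as np
import pywt  # PyWavelets

def blend(analysis, obs, mask, slope=-2.0, wavelet="db4", level=5, seed=0):
    """analysis: gridded weather-center wind component (1-D for clarity);
    obs: scatterometer winds on the same grid (NaN in gaps);
    mask: True where observations exist;
    slope: assumed spectral slope; in practice it would be sampled from the
    regional, seasonal slope catalog described in the talk."""
    rng = np.random.default_rng(seed)
    merged = np.where(mask, obs, analysis)
    coeffs = pywt.wavedec(merged, wavelet, level=level)
    # Heuristic: for E(k) ~ k**slope, detail-coefficient standard deviation
    # shrinks toward fine scales for a red (slope < -1) spectrum. Both the
    # scaling rule and the 0.05 normalization are arbitrary in this toy.
    gap_fraction = 1.0 - mask.mean()
    for j, d in enumerate(reversed(coeffs[1:])):  # finest details first
        sd = 2.0 ** (-0.5 * (slope + 1.0) * j)
        d += gap_fraction * rng.normal(0.0, 0.05 * sd, size=d.shape)
    blended = pywt.waverec(coeffs, wavelet)[: analysis.size]
    return np.where(mask, obs, blended)  # never overwrite real observations

# Example: a smooth analysis with a contiguous observation gap.
x = np.linspace(0, 2 * np.pi, 512)
analysis = np.sin(x)
obs = analysis + 0.1 * np.random.default_rng(1).normal(size=x.size)
mask = np.ones(x.size, dtype=bool); mask[200:350] = False
obs[~mask] = np.nan
wind = blend(analysis, obs, mask)
```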
This is an example of that blending procedure. The field I am looking at now is a scalar: the wind stress curl, which combines derivatives of the two components of the wind, the east-west component and the north-south component (curl τ = ∂τ_y/∂x − ∂τ_x/∂y). It is going to be noisy, but it is a scalar, so I can show the effect of our blending technique, and it also points out some important meteorological features that are a benefit of this particular procedure. Wind stress curl extrema are positive in the Northern Hemisphere in the regions of atmospheric cyclones. These are storms in the storm track regions of the Pacific and Atlantic Oceans. Associated with these storms are frontal systems, and these are almost washed out in the weather center products. This is the wind stress curl from the National Centers for Environmental Prediction on a given day in 1996. This particular blending is for the NSCAT system; we do this on a regular basis for QuikSCAT, and those data are available from the data archive system at NCAR. Overlay the NSCAT swaths in the middle panel—that really doesn't show up well; I think this is in the abstract as well—but what you can see is that the wavenumber properties in the satellite data are much richer at high wavenumbers. So, let us blow up the North Atlantic. Here is that cyclone in the North Atlantic. You are looking at it from the bottom of the ocean; here it is from the top. The frontal system is here in yellow.

Here are the overlying scatterometer plots. You can see that the scatterometer detects the very sharp spatial features of the front, and the high-amplitude wind stress curl that occurs there when it crosses. The blending procedure, because it is a spatial model, can't keep track of the propagation of this system. A space-time model will, and we are working on that. Because it is a spatial model, within the eight-degree squares in the gap region, high-amplitude wind stress curl, commensurate with the wind stress curl that occurs in the front, is distributed. That is important for driving a global ocean model.

In fact, in 1999, we wrote a paper that showed that the ocean models we use for climate forecasting, for climate simulations, were very sensitive to this change in the surface wind fields. If you drive a model with the weather center winds, you are not getting this high-wavenumber forcing, and this high-wavenumber forcing turns out to be important. This is the annual mean response after spin-up of a three-degree global ocean model. Here is the difference with respect to a calculation that was done with the weather center winds—that is, the blended winds minus the weather center winds. You can see up to 7 meters per second, 3 1/2 knot differences in the currents of the upper ocean. More important, there is a big divergence pattern in the Tropics. This is the region of the El Niño Southern Oscillation signal, so the divergences there have big implications for cold water at the surface, and for the propagation or not of an El Niño signal into the eastern Pacific. So, there is a reason to do this for climate.

I am going to shift now to regional applications, and perhaps a little deeper statistical modeling. This comes from my learning at the feet of Berliner and Wikle. This is an AVHRR image—a radiometer image, basically a temperature in the infrared and visible regions of the spectrum—of the Labrador Sea. This is the coast of Labrador; here is the southwestern coast of Greenland. This is one of a few regions in the world ocean where the so-called ocean deep convection process occurs, and this is critical to climate. This is a place where the properties of the surface atmosphere are subducted to great depth, and for very long times, into the ocean. This is the so-called thermohaline circulation, or the global conveyor belt, that you might have heard of. The Labrador Sea is one place; the eastern Mediterranean is another; and a few points around Antarctica are the other places where the atmosphere and the deep ocean are in contact, very briefly, during these very brief ocean convection events, and they drive the redistribution of heat on the planet. The ocean part of that redistribution happens here.

So, those convective triggers are associated with weather patterns of the kind we see here. This is called a polar low. The low-pressure system is centered around the middle of the basin. The surface winds that are associated with this signal drag dry, cold continental air across the relatively warm sea surface, exchange those properties that I talked about at the beginning of the talk—heat, moisture and momentum—densify the surface ocean, and provide this plunging physical mechanism. Within an hour of this AVHRR image, a fragment of a swath from the NASA scatterometer, NSCAT, occurred here, and you can see the signature of the polar low. So, understanding these convective triggers, and certainly the surface wind field associated with them, is the target science problem that we dealt with. This is another polar low signal in the Labrador Sea.
This sets up the Bayesian hierarchical model that we use to retrieve a uniform surface wind field, with estimates of uncertainty at each grid point, from the scatterometer data. So, we are going to build a posterior distribution for the wind at the diamond grid points—that is our target grid—given the scatterometer data and a prior based on a physical balance. This is the work of Andy Royle, who was another post-doc at GSP, and Mark Berliner and myself.

So, I think, as a physicist, about Bayesian models in stages. The first is the data model stage, or what you would call the likelihood, and I think this is a very natural entity for satellite data. It is wrong to think of satellite data only as moorings or balloon traces that happen to be right next to each other in space. Instead, they inform probability distributions. So, probabilistic models are an essential new technique, I think, that we need a great deal of help with in the geophysical field. I think our preliminary pilot studies show that there is a great deal of play here. The likelihood model gives us a distribution for the data that naturally arises from the measurement error models that come with every satellite mission that is launched: we do what are called calibration/validation studies. Calibration/validation studies will inform the likelihood distribution to excellent precision and allow the satellite data—the volume of it—to actually speak to the posterior very clearly.

In the prior process model, we use a heritage in geophysical fluid dynamics that goes back generations. We have developed essential dynamical descriptions of processes of interest, and there is a whole branch of atmospheric and oceanographic science that does just this. We can blend these two heritages, the observational side of our field and the process model side of our field, to develop very useful posterior distributions. Of course, the analytic derivation of the posterior is out of range because the normalizer is intractable. So, we rely on the advances in your field, on Gibbs sampling and Markov chain Monte Carlo algorithms.

The data statement for this particular problem—the Bayesian formalism, as I hope I will get to at the end of the talk—is very amenable to multi-platform observation. In fact, we have a prototype model here that uses altimeter and scatterometer data for the same problem. The altimeter measures the sea-surface height, from which we can infer ocean-surface currents. What we use as a data statement involves an incidence matrix, and this is another problem that satellite data introduce, and that the statistical community can readily address: changes of support. We have a single point observation, perhaps, within the swath of a satellite; we want to infer something about a process in the grid cell of a model. You people know how to do this, and we are beginning to deal with that. That is the kind of information that goes into this incidence matrix, K. On the other hand, when we are given the abundance of data that comes with a satellite overpass, the incidence matrix need not be very sophisticated, we have found, and we have simple nearest-neighbor algorithms at present that yield the results I am about to show. Then, as I said before, the measurement error models come from the calibration/validation studies.
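Schematically, the stages factor as follows (my notation; the speaker describes the stages verbally):

```latex
[\,W, P, \theta \mid D\,] \;\propto\;
\underbrace{[\,D \mid W, \theta_D\,]}_{\text{data model / likelihood}}\;
\underbrace{[\,W \mid P, \theta_W\,]}_{\text{process model}}\;
\underbrace{[\,P, \theta\,]}_{\text{priors}},
\qquad
D = K\,W + \varepsilon, \quad \varepsilon \sim N(0, \Sigma_D),
```

where D are the scatterometer observations, W the winds on the target grid, P the hidden pressure process, K the incidence matrix that handles the change of support, and Σ_D the measurement error covariance informed by calibration/validation studies.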
For the process model, we use what we call stochastic geostrophy. This is a fundamental balance between the gradient of a pressure field and the implied velocity. Because we are on a rotating planet, any gradient in pressure or potential will initiate a flow from high potential to low potential but, because we are rotating, the resultant flow, which accounts for the rotation vector, is perpendicular to the gradient, along lines parallel to the isobars. This is called the geostrophic relation, and we can translate this differential expression into a probabilistic statement. So, for our priors, we say that the zonal wind, given some hidden pressure process and a variance, is distributed normally; the mean of that normal distribution is proportional to the gradient of the pressure, and the variance is expressed in terms of the covariance of the wind field, which we might know about from our observations.
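In symbols, the probabilistic translation of geostrophy reads (standard geostrophic balance; the particular prior form is my paraphrase of the verbal description):

```latex
u \mid P \;\sim\; N\!\left(-\frac{1}{\rho f}\,\frac{\partial P}{\partial y},\; \Sigma_u\right),
\qquad
v \mid P \;\sim\; N\!\left(+\frac{1}{\rho f}\,\frac{\partial P}{\partial x},\; \Sigma_v\right),
```

where f is the Coriolis parameter, ρ the air density, and Σ_u, Σ_v wind covariances that can be estimated from observations.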

At the second level, we prescribe a pressure field. Pressure, you know, is a good news/bad news sort of field. It is the reason why we can't do massively parallel calculations very efficiently in climate and weather forecasting: because it is quasi-elliptic, perturbations in the pressure at remote places have a very important impact on the pressure at the grid point of interest. So, it is bad in that sense. It is good in the following sense: since it is quasi-elliptic, it is relatively smooth, and it is well approximated by harmonic operators. It turns out to be a good thing to hide in a Bayesian hierarchical model, because there are analytic expressions for surface pressure that fit well with meteorological processes, and we have done some regional studies with drifters in the region that give us space and time scales of variability for the pressure field there. So, we can prescribe a pressure process solely in terms of its covariance structure, for models of that kind.

Building that prior distribution and the data distribution, and using a Gibbs sampler, we generate the following posterior mean field for the surface winds. There is already an important result here. Had the prior dominated, the flow, as I told you, should be parallel to the isobars. In fact, the posterior mean wind here is crossing isobars, and that means that the satellite data have spoken. The Bayesian formalism gives us a distribution not just for the dependent variable of interest in the deterministic sense, but also for all the parameters and the hidden processes. So, in addition to the surface wind, we have a posterior distribution for surface pressure as well. The right-hand panel shows what the weather center forecast for this particular time. That was a single realization from a deterministic forward model, and all they came up with is the following pressure distribution. When we overlay the original satellite data, it shows that, in fact, they misplaced the low-pressure center. So, their region of ocean deep convection triggering would have been in the wrong place and, in fact, the intensity was considerably weaker than it is in the posterior mean from the Bayesian hierarchical model.

We have done a similar and more sophisticated Bayesian hierarchical model to retrieve surface winds in the Tropics. In fact, thanks to Tim Hoar, we are providing 50 realizations of physically reasonable surface winds from the Indian Ocean to the dateline, 40 degrees North to 40 degrees South, four times a day. That is going to be available, and it will be interesting to see what the geophysical community does with it. I know what I am going to do with it, but there is a great deal that can be done with 50 realizations in terms of putting error bars on the deductions about weather and climate processes that we study. For example, John Bates mentioned the Madden-Julian Oscillation. What we typically have to do to study the Madden-Julian Oscillation, which takes about 10 days to propagate across the Indian Ocean and into the western Pacific, is average several events. These events happen every 40 to 50 days when they happen, and then they don't happen for a while.
So, the background flow is completely different in the situations that you have to composite, in some sense, to get an idea of what the generic Madden-Julian Oscillation looks like. Now, with 50 realizations of the surface winds for a single Madden-Julian Oscillation, we have a different concept of what the error bars are going to be on relationships between, for example, surface convergence and the propagation of this wave. That is an aside. The Bayesian hierarchical model that describes that wind blending in the Tropics has been published in JASA; there is a Wikle et al. (2001) paper that I would refer you to, and also Chris Wikle's Web page. That is one of the 30 or 40 recent publications on his page.

What I would like to talk about now is a prototype atmosphere-ocean model that is also set in the Bayesian hierarchical context. This is a probabilistic analog for the centerpiece tools of climate analysis and climate forecasting. People who analyze climate run massive—we really do mean massive now—coupled atmosphere-ocean simulations on the largest supercomputers that they can find, and they provide a single deterministic realization at the end of the day. I think that this community can guide those kinds of calculations, which are very expensive, by building the essential PDFs that whatever forward models or simulations they choose to run have to pass through, in some mode or some parameter sense. I can talk more about that in the breakout session.

What we did was combine the atmosphere model that I have just described for the Labrador Sea with an ocean model with slightly more sophisticated physics for the prior. In the data stage, or likelihood, we separated the errors of the scatterometer data with respect to the atmospheric process from the errors of the altimeter data with respect to the ocean process, so those were independent. In the process model stage, we simply factored the joint distribution of the atmosphere and ocean processes into an atmospheric process times an ocean process. We come up with the posterior of interest, which is a distribution for the atmosphere and ocean processes and all the parameters, given scatterometer and altimeter data. Then the horrible normalizer, of course, is handled by the simulation method. The simulation method is particularly clever—this was Mark Berliner's design—and I will come to it, I hope, at the end.

The process model is built on a dynamic differential equation that has proved itself; it is the original ocean model, actually. It is called quasi-geostrophy. We have terms for the evolution of the ocean stream function ψ: non-linear effects (advection), planetary vorticity, forcing by the surface wind, bottom friction, and internal friction. The first step that a deterministic modeler would take is to discretize these on a grid of interest and form matrix operators, and that is done here. This is changing a differential equation into a difference equation, a very standard technique. You will notice that it is also very Markovian: we have matrix expressions operating on the previous time-level stream function to give us an estimate of the next time-level stream function. We also separate out the boundary conditions which, when we jump to probabilistic state, will become a boundary process, and that is a very big issue in geophysical modeling of limited-area domains.

So, here is the leap to a probabilistic ocean stream function, and these operators are modeled directly after their finite-difference counterparts in the deterministic world. We have the linear operators on the previous time level, the non-linear operator, the surface wind stress, the boundary process, and we have added a model misfit term. A model misfit term is the only accounting for model error that forward-model data assimilation systems can make. In contrast, we have distributional parameters, which make random variables of the coefficients in front of every term here.
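Schematically (my notation; the operator symbols stand in for the finite-difference blocks on the slide), the deterministic update and its probabilistic counterpart are:

```latex
\psi_{t+1} \;=\; \mathbf{L}\,\psi_t + \mathbf{N}(\psi_t) + \mathbf{F}\,\tau_t + \mathbf{B}\,b_t
\quad\longrightarrow\quad
\psi_{t+1} \;=\; a_1\,\mathbf{L}\,\psi_t + a_2\,\mathbf{N}(\psi_t) + a_3\,\mathbf{F}\,\tau_t + \mathbf{B}\,b_t + \eta_t ,
```

where L collects the linear terms (planetary vorticity and the frictions), N is the non-linear advection operator, Fτ_t the surface wind stress forcing, b_t the boundary process, η_t the model misfit term, and the a_i are the random distributional parameters in front of every term.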
So, we have a term-by-term management of uncertainty, and the uncertainties can interact, in the way that advection uncertainty should interact with diffusion uncertainty in the dynamics. Along with this come several economies. Because the deterministic system is a stiff, elliptic difference equation, it is constrained to take very small time steps that are not relevant to the physics of interest. It is not important to a polar low what is happening on 15-second time intervals but, because it is an elliptic system, the way it is written in deterministic space, those models are constrained to take 15-second time steps, and this drives a huge expense. They are also constrained to take very small spatial steps. In a probabilistic model, we are not so constrained. There must be a constraint that makes physical sense of some kind in the probabilistic world, and this is the sort of theoretical problem that I think this community could pursue and make large contributions to. Nonetheless, we can take six-hour time steps and three times the grid spacing in our air-sea hierarchical Bayesian model.

The algorithm—I can't go through it in the time given—is a clever combination of Markov chain Monte Carlo for the atmosphere and importance sampling Monte Carlo for the ocean. That is atmosphere-ocean physics in probabilistic space. The importance weights are the likelihood distributions for the ocean data. So, I build a catalog of forcing from the atmosphere. I go ahead and generate the ocean stream functions that come from every member of that catalog. Then, the important ones are determined by how well they fit the data distributions from the ocean sensor, the altimeter.
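To make that MCMC/importance-sampling linkage concrete, here is a hedged toy sketch: draw atmospheric forcing samples, propagate each through a cheap stochastic ocean update, and weight each ocean member by the altimeter likelihood. Every function and distribution below is an invented stand-in, not the authors' model.

```python
# Toy sketch of the MCMC / importance-sampling Monte Carlo linkage.
# "atmosphere_posterior_sample" plays the role of an MCMC draw of surface
# forcing; "ocean_step" a one-step stochastic ocean update; weights come
# from a Gaussian altimeter likelihood.
import numpy as np

rng = np.random.default_rng(42)
n_members, n_grid = 200, 64

def atmosphere_posterior_sample():
    """Stand-in for one MCMC draw of wind-stress forcing on the ocean grid."""
    return rng.normal(0.0, 1.0, n_grid)

def ocean_step(psi, forcing):
    """Stand-in stochastic ocean update: damping + forcing + model misfit."""
    return 0.9 * psi + 0.1 * forcing + rng.normal(0.0, 0.05, n_grid)

def altimeter_loglik(psi, obs, obs_idx, sigma=0.1):
    """Gaussian log-likelihood of sparse altimeter track samples."""
    r = obs - psi[obs_idx]
    return -0.5 * np.sum((r / sigma) ** 2)

# Synthetic "truth" and sparse altimeter-track observations of it.
truth = np.sin(np.linspace(0, 2 * np.pi, n_grid))
obs_idx = np.arange(0, n_grid, 8)
obs = truth[obs_idx] + rng.normal(0.0, 0.1, obs_idx.size)

# Catalog of forced ocean members, then importance weights from the altimeter.
members = np.array([ocean_step(np.zeros(n_grid), atmosphere_posterior_sample())
                    for _ in range(n_members)])
logw = np.array([altimeter_loglik(m, obs, obs_idx) for m in members])
w = np.exp(logw - logw.max()); w /= w.sum()          # normalized weights
posterior_mean = (w[:, None] * members).sum(axis=0)  # weighted ocean estimate
```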
We tested this model in what is called an observing system simulation experiment. We had a truth simulation from primitive equations—a more sophisticated physical set, at very high resolution—and compared the Bayesian hierarchical model posterior distribution with that truth simulation over 10 days. First, we had to spin up the truth. So, for a year we forced a box that looked like the Labrador Sea with this kind of wind, and generated this kind of ocean stream function equivalent in primitive equations. There is a cyclonic eddy in the southwestern corner, a rim current that is similar to the Labrador Sea's, and then closed eddies on the interior of that rim current. Then, we idealized the surface forcing of a polar low, sampled it as though we had a scatterometer, sampled the ocean as though we had an altimeter, corrupted those data with proper measurement noise, and fed them to the data stages in the Bayesian hierarchical model. So, these are days one, three, five and seven of the simulated data. You can see the ocean stream function evolving underneath the altimeter. The altimeter tracks are here; this is representative of the TOPEX system. The simulated scatterometer is representative of the SeaWinds system, and you can see the polar low, which is perfectly circular and propagating perfectly zonally across this box, sampled, and then departing from the domain.

This is a comparison of the truth simulation on the left and the Bayesian hierarchical model posterior mean on the right, for the same four days—one, three, five and seven—of a 10-day simulation. I will show you a difference map in a minute, but the main difference is that the Bayesian hierarchical model is actually more responsive, in a posterior mean sense, to this polar low than was the primitive equation model.

As a physicist, my conception of statistical models always used to be: yes, they were generally right but, man, they were really sluggish and smooth, and the real detail that I needed wasn't available. Well, that seems to be quite the opposite in a posterior mean, let alone in the realizations from the posterior distribution.

Here are the difference plots, and I show two columns here. One is the difference for the full Bayesian hierarchical model. The other is the difference when I exclude the importance weighting. So, the right-hand column is a Bayesian hierarchical model for which we did not supply altimeter data in a separate data stage. This allows us to quantify, in a distributional sense, the value added by the altimeter data to the atmosphere-ocean problem. Interestingly, all of the differences isolate with features of interest. So, the cyclone in the southwestern corner is a place where differences exist and, because we have a posterior distribution, it is a place where uncertainty exists. This is a map of the standard deviation as a function of space for day seven, day five, day three, day one. Notice also that the boundary process is emerging as a source of uncertainty, and that is very consistent with the experience in forward modeling of limited-area domains in the atmosphere and ocean.

I am going to skip my last slide, which is a summary slide for this model and is in the abstract, and get to my conclusions. The regional and global surface wind data sets from space are important. With two SeaWinds systems, there will be 8×10^5 surface vector wind retrievals every 24 hours. That is a Level 2 product. A Level 1 product is an order of magnitude bigger than that; those are the backscatter observations. I would never use a Level 3 product—I hate to say this—because Level 3 depends very much on what you want to do, and I will build my own Level 3 products, as you have seen: I have blended winds from the weather center and the satellite.

[Question off microphone from audience.]

MR. MILLIFF: Level 3 is for eye candy. It makes for the prettiest slides and animations, but you can't do science. The problem that polar-orbiting and even equatorial-orbiting satellites pose for geophysicists is that they don't appear on regular grids; they don't leave a global field with uniform spatial and temporal resolution. So, that is a key issue that Amy brought up. We have used physical constraints to drive a process to build those uniformly distributed, spatially and temporally varying grids—multiresolution wavelets to impose the spectral constraints.

Bayesian hierarchical models exploit the massive remote sensing data sets. I think what I expect to hear, in parts of this meeting, is that we have a problem of trying to find a needle in a haystack. I think what geophysicists need to say is: wow, there is a haystack that we can use; never mind the needles. We used to just put the needles out there, as a few moorings in a few places. Now, in fact, there is a whole probability distribution that needs to be exploited that comes from these satellite data. Bayesian hierarchical models are amenable and readily adaptable to multi-platform data. The modern ocean observing system will involve remote sensors from space and in situ drifting, autonomous systems. The changes of support and the distributional interactions of the uncertainty in the signals from those data, I think, are readily handled by the Bayesian hierarchical model approach. There has been a demonstration of air-sea interaction through a Markov chain Monte Carlo–importance sampling Monte Carlo linkage. Thanks.
