Statistical Analysis of Massive Data Streams: Proceedings of a Workshop
Report from Breakout Group
Instructions for Breakout Groups
MS. KELLER-McNULTY: There are three basic questions, or issues, that we would like the subgroups to come back and report on.
First of all, what sort of outstanding challenges do you see relative to the collection of material that was in the session? In particular, we heard in all these cases that there are real, specific constraints on these problems that have to be taken into consideration. We can’t just assume we can process the data infinitely fast, or whatever we want.
The second thing is, what are the needed collaborations? It has been really wonderful today; so far, we have been hearing from a whole range of scientists. So, what are the needed collaborations to really make progress on these problems?
Finally, what are the mechanisms for collaboration? You know, Amy, for example, had a whole list of suggestions with her talk.
So, the three things are the challenges, what are the scientific challenges, what are the needed collaborations, and what are some ideas on mechanisms for realizing those collaborations?
Report from Atmospheric and Meteorological Data Breakout Group
MR. NYCHKA: The first thing that the reporter has to report is that we could not find another reporter except for me. I am sorry, I was hoping to give someone the opportunity, but everybody shrank from it.
So, we tried to keep on track on the three questions. I am sure that the other groups realized how difficult that was.
Let me first talk about some technical challenges. The basic product you get out of this is a field, maybe a variable collected over space and time. There are some basic statistical problems of how you summarize those fields in terms of probability density functions when you have multiple samples of them, and how you manipulate and work with them. Also, if you wanted to study, say, a particular variable under an El Niño period versus a La Niña period, there are all those kinds of conditioning issues. So, that is basically very mainstream space-time statistics.
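As a minimal sketch of the conditioning problem described above, the example below is entirely hypothetical: the field values, grid dimensions, and phase labels are invented for illustration, and real analyses would use observed or model output data rather than random numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical space-time field: monthly values on a small grid,
# with axes (time, lat, lon).
n_time, n_lat, n_lon = 120, 10, 20
field = rng.normal(loc=15.0, scale=2.0, size=(n_time, n_lat, n_lon))

# Invented climate-phase label for each time step:
# +1 for El Nino, -1 for La Nina, 0 for neutral.
phase = rng.choice([-1, 0, 1], size=n_time)

# Condition the field on phase by selecting the matching time steps.
el_nino = field[phase == 1]
la_nina = field[phase == -1]

# Pointwise summaries of the two conditional distributions.
mean_diff = el_nino.mean(axis=0) - la_nina.mean(axis=0)
print("grid cells:", mean_diff.shape)  # (10, 20)
print("domain-average difference: %.3f" % mean_diff.mean())
```

The same pattern extends to any categorical conditioning variable: label each time step, use a boolean mask to subset the field, and compute whatever pointwise summaries (means, quantiles, empirical densities) the question calls for.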
Another important component that came out of this is the whole issue of uncertainty. This is true in general, and there was quite a bit of discussion about aligning these efforts with the Climate Change Research Initiative, which is a very high-level, organized effort by the U.S. government to study climate. Uncertainty measures are an important part of that, and it is no surprise that the typically deterministic geophysical community tends to sort of ignore these, but it is something that needs to be addressed.
There was also sort of the sentiment that one limitation is partly people’s backgrounds. People use what they are familiar with, and what they tend to do is limited by the tools that they know. They are sort of reluctant to take on new tools. So, you have this sort of vicious circle where you only do things that you know how to do. I think an interesting thing that came out of this—and let me highlight this as a very interesting technical challenge, and it is one of these curious things where, all of a sudden, a massive