The following HTML text is provided to enhance online
readability. Many aspects of typography translate only awkwardly to HTML.
Please use the page image
as the authoritative form to ensure accuracy.
Statistical Analysis of Massive Data Streams: Proceedings of a Workshop
data set no longer becomes very massive. What John was bringing up is that these large satellite records typically have substantial non-zero biases, even when you average them. These biases are actually a major component of using these. So, a typical bias would be simply change a satellite platform that is measuring a particular remotely sensed variable, and you can see a level shift or some other artifact. In terms of combining different satellites, you need to address this. These biases need to be addressed empirically as an important problem.
The other technical challenge is reducing data. This is another interesting thing about massive data sets, that part of the challenge here is to make them useful. In order to make them useful, you have to have some idea of what the clientele is. We have had some discussion about being careful about that, that you don’t want to sort of create some kind of summary of the data and have that not be appropriate for part of the user community. The other thing is, whatever summary is done, the assumptions used to make it should be overt, and also there should be measures of uncertainty along with it.
Collaborations, I think for this we didn’t talk about this much, because I think they were so obvious. Obviously, the collaborators should be people in the geophysical community that actually work and compile this data with the statisticians.
Some obvious centers are JPL, NCAR, NOAA—Ralph, do you volunteer CORA as well?
MR. NYCHKA: John, NCDC, I am assuming you will accept visitors if they show up.
AUDIENCE: Sure will. It is a great place to be in the summer, between the Blue Ridge and the Great Smokeys.
MR. NYCHKA: Okay, so one thing statisticians should realize is that there are these centers of concentrations of geophysical scientists, and they are great places to visit. The other collaboration that was brought up is that there needs to be some training of computer science in this. The other point, coming back to the climate change research initiative, is that this is another integrator, in terms of identifying collaborations. In terms of how to facilitate these collaborations, one suggestion was—this is post docs in particular—post docs at JPL.
I tried to steer the discussion a little bit, just to test the waters. What I suggested is some kind of regular process where there are meetings that people can anticipate. I am thinking sort of along the interface model or research conference model. It seems like the knee jerk reaction in this is simply, people identify an interesting area that they declare, let’s have a workshop. We have the workshop, people get together, and then that is it. It is sort of the final point in time. I think John agreed with me, in particular, that a single workshop isn’t the way to address this. So, I am curious about pursuing a sort of more regular kind of series of meetings. Okay, and that is it.