OCR for page 203
Items for Ongoing Consideration

Data Preparation

  Elevation of status of data preparation and data quality stages in professional societies
  Clear articulation of what is meant by a massive data set
  Development of rigorous, theory-based methods for reduction of dimensionality
  Systematic study of how, when, and why methods used with small and medium-sized data sets break down with large size data sets; understanding of how far current methods, both statistical and computational, can be pushed; articulation of the variety of models that might be useful
  Development of methods for integration of tools and techniques
  Development of specialized tools in general "packages" for non-standard (e.g., sensor-based) data
  Establishment of better links between statistics and computer science
  Exploration of the use of "infinite" data sets to stimulate methods for massive data sets
  Creation of richer language for describing structure in data
  Educational opportunities for nonstatisticians who use some statistical techniques and for statisticians, to broaden the knowledge base and provide better links to computer science

Models and Data Presentation

Research Issues

  Discovery and comparison of homogeneous groups
  Communication and display of variability and bias in models
  Better design of hierarchical visual display
  New modeling metaphors and richer class of presentation approaches
  Methods to help "generalize" and "match" local models (e.g., automated agents)
  Robust or multiple models; sequential and dynamic models
  Alternatives to internal cross-validation for model verification
  Retooling of computing environment for modeling massive data sets
  Simple presentation of "massive" complex data analyses