Items for Ongoing Consideration
Data Preparation
Elevation of status of data preparation and data quality stages in professional societies
Clear articulation of what is meant by a massive data set
Development of rigorous, theory-based methods for reduction of dimensionality
Systematic study of how, when, and why methods used with small and medium-sized data sets break down with large data sets; understanding of how far current methods, both statistical and computational, can be pushed; articulation of the variety of models that might be useful
Development of methods for integration of tools and techniques
Development of specialized tools in general "packages" for non-standard (e.g., sensor-based) data
Establishment of better links between statistics and computer science
Exploration of the use of "infinite" data sets to stimulate methods for massive data sets
Creation of richer language for describing structure in data
Educational opportunities—for nonstatisticians who use some statistical techniques and for statisticians, to broaden the knowledge base and provide better links to computer science
Models and Data Presentation Research Issues
Discovery and comparison of homogeneous groups
Communication and display of variability and bias in models
Better design of hierarchical visual display
New modeling metaphors and richer class of presentation approaches
Methods to help "generalize" and "match" local models (e.g., automated agents)
Robust or multiple models; sequential and dynamic models
Alternatives to internal cross-validation for model verification
Retooling of computing environment for modeling massive data sets
Simple presentation of "massive" complex data analyses