ables whose values are not known but are generated from some probability distribution. For example, the number of people visiting a particular website on a given day is random. In order to “model”—or characterize the distribution of—this random variable, statistical quantities (or parameters) might be considered, such as the average number of visits over time, the corresponding variance, and so on. These quantities characterize long-term trends of this random variable and, thus, put constraints on its potential values. A better model for this random variable might take into account other observable quantities such as the day of the week, the month of the year, whether the date is near some major event, and so on. The number of visits to the website can be constrained or predicted by these additional quantities, and their relationship will lead to a better model for the variable. This approach to data modeling can be regarded as statistical modeling: although there are no precise formulas that can deterministically describe the relationship among observed variables, the distribution underlying the data can be characterized. In this approach, one can only guess a certain form of the relationship up to some unknown parameters, and the error—or what is missed in this formulation—will be regarded as noise. Statistical modeling represents a powerful approach for understanding and analyzing data (see McCullagh, 2002).
In what follows, the committee does not make a sharp distinction between “statistics” and “machine learning” and believes that any attempt to do so is becoming increasingly difficult. Statisticians and machine learners work on similar problems, albeit sometimes with a different aesthetic and perhaps different (but overlapping) skill sets. Some modeling activities seem especially statistical (e.g., repeated measures analysis of variance), while others seem to have more of a machine-learning flavor (e.g., support vector machines), yet both statisticians and machine learners can be found at both ends of the spectrum. In this report, terms like “statistical model” or “statistical approach” are understood to include rather than exclude machine learning.
There are two major lenses through which statistical models are framed, which are described briefly below.
The Frequentist View
The first viewpoint is from classical statistics, where models can take a variety of forms, as can the methods for estimation and inference. One might model the conditional mean of the response (target for prediction) as a parametrized function of the predictors (e.g., linear regression). Although not a requirement, this model can be augmented with an additive noise component to specify the conditional distribution of the response given the predictors. Logistic regression models the conditional distribution of