Table A.1. Commonly used statistical techniques and their advantages and disadvantages.

Pearson’s Correlation
Advantages: Well understood. Intuitive scale.
Disadvantages: Linear; sensitive to outliers; not designed to find causal relationships.
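
As an illustrative sketch only (the synthetic series and use of SciPy are assumptions, not part of the report), Pearson’s correlation can be computed as follows, with Spearman’s rank correlation shown alongside for comparison:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)                          # e.g., an index time series
    y = 0.6 * x + rng.normal(scale=0.8, size=200)     # a linearly related series plus noise

    r, p_r = stats.pearsonr(x, y)        # linear (Pearson) correlation and its p-value
    rho, p_rho = stats.spearmanr(x, y)   # rank (Spearman) correlation, resistant to outliers
    print(f"Pearson r = {r:.2f} (p = {p_r:.3f}); Spearman rho = {rho:.2f} (p = {p_rho:.3f})")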

Spearman’s Correlation
Advantages: Well understood. Intuitive scale. Resistant to outliers.
Disadvantages: Linear in the ranks.

EOF/PCA
Advantages: Well understood. Efficient compression of large datasets.
Disadvantages: Linear. Sensitive to sampling errors. In most applications, requires the estimation of the dimensionality of the signal. If identification of individual modes is desired, may require post-processing with an additional linear transformation.
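
A minimal EOF/PCA sketch, assuming a synthetic time-by-space anomaly matrix and NumPy’s singular value decomposition (illustrative only, not the report’s implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(120, 500))      # e.g., 120 months x 500 grid points
    anom = data - data.mean(axis=0)         # remove the time mean at each grid point

    # SVD: columns of U (scaled by s) are PC time series; rows of Vt are spatial EOF patterns
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)          # fraction of variance explained by each mode

    pcs = U[:, :3] * s[:3]                  # leading three PC time series
    eofs = Vt[:3]                           # leading three spatial patterns
    print("Variance explained by the first three modes:", var_frac[:3].round(3))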

Nonlinear (Complex) EOF/PCA
Advantages: Can result in very efficient compression of data.
Disadvantages: Sensitive to sampling errors. In most applications, requires the estimation of the dimensionality of the signal.
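
One variant is the complex (Hilbert) EOF, in which the analytic signal is formed before decomposition so that propagating features can be captured; a rough sketch under assumed synthetic data (illustrative only) is:

    import numpy as np
    from scipy.signal import hilbert

    rng = np.random.default_rng(0)
    t = np.arange(240)
    x = np.linspace(0, 2 * np.pi, 50)
    # Synthetic eastward-propagating wave sampled at 50 points, plus noise
    field = np.sin(x[None, :] - 0.1 * t[:, None]) + 0.3 * rng.normal(size=(240, 50))

    analytic = hilbert(field - field.mean(axis=0), axis=0)   # complex analytic signal in time
    U, s, Vt = np.linalg.svd(analytic, full_matrices=False)  # complex EOFs and PCs
    print("Variance fraction, leading complex mode:", (s[0]**2 / np.sum(s**2)).round(3))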

CCA/SVD
Advantages: Well understood and applied often.
Disadvantages: Linear. No guarantee that the cross-correlations or cross-covariances are larger than the correlations within each variable; to avoid this problem, the fields are often pre-processed by extracting EOFs. May need post-processing with linear transformations if more than one field is desired.
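
A hedged sketch using scikit-learn’s CCA on synthetic predictor and predictand fields (the data and settings are illustrative assumptions); in practice the fields are often first compressed with EOFs, as noted above:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    n = 150
    shared = rng.normal(size=(n, 2))                      # hidden signal coupling the two fields
    X = shared @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(n, 10))
    Y = shared @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(n, 8))

    cca = CCA(n_components=2)
    Xc, Yc = cca.fit_transform(X, Y)                      # canonical variate time series
    for k in range(2):
        r = np.corrcoef(Xc[:, k], Yc[:, k])[0, 1]
        print(f"Canonical correlation {k + 1}: {r:.2f}")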

Cluster Analysis
Advantages: Divides data into groups based on distance.
Disadvantages: Numerous clustering methods are available that give different results when applied to the same data set with the same distance measure. Since it is an exploratory tool, it does not provide rules for assigning membership to independent observations.
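
A minimal k-means sketch with scikit-learn; the number of clusters and the synthetic two-group data are illustrative assumptions, and other clustering methods could give different groupings:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Two synthetic groups in a 2-D feature space (e.g., two circulation regimes)
    data = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
                      rng.normal(loc=3.0, size=(100, 2))])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
    print("Cluster sizes:", np.bincount(km.labels_))
    print("Cluster centers:\n", km.cluster_centers_.round(2))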

Compositing
Advantages: Since it involves only averaging, it is well understood.
Disadvantages: Unless careful pre-screening of the data has been performed, multiple modes may be averaged together and unrepresentative results can emerge.
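
A minimal compositing sketch in NumPy: the field is averaged over times when an index exceeds a threshold. The index, threshold, and field here are illustrative placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    index = rng.normal(size=400)            # e.g., an ENSO index, one value per time step
    field = rng.normal(size=(400, 60))      # e.g., a gridded anomaly field (time x space)

    events = index > 1.0                    # select "event" times above one standard deviation
    composite = field[events].mean(axis=0)  # average the field over the selected times

    print(f"{events.sum()} events averaged; composite shape: {composite.shape}")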

Discriminant Analysis
Advantages: Well suited to separating a finite number of categories when linear separability is present. Numerous variations exist to allow for outliers and unequal variance in the groups. The classification rules learned can be applied to independent data.
Disadvantages: Linear separability is not often present in large-scale problems. Variable selection may be computationally intensive.
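
A hedged linear discriminant analysis sketch with scikit-learn, using synthetic classes and a held-out split to illustrate applying the learned rule to independent data (all settings are assumptions for illustration):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Two synthetic classes that are roughly linearly separable in four features
    X = np.vstack([rng.normal(loc=0.0, size=(150, 4)),
                   rng.normal(loc=1.5, size=(150, 4))])
    y = np.repeat([0, 1], 150)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
    print(f"Accuracy on independent data: {lda.score(X_te, y_te):.2f}")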

Regression
Advantages: Well understood in its basic form. Many variations exist for correlated predictors, nonlinear relationships, and cases where outliers are present.
Disadvantages: Traditional multiple linear regression makes numerous assumptions that are rarely met in climate analyses.
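
A minimal least-squares regression sketch with scikit-learn, with ridge regression shown as one common variant for correlated predictors; the data and penalty are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                        # three predictors
    y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

    ols = LinearRegression().fit(X, y)                   # ordinary least squares
    ridge = Ridge(alpha=1.0).fit(X, y)                   # regularized fit for correlated predictors

    print("OLS coefficients:  ", ols.coef_.round(2))
    print("Ridge coefficients:", ridge.coef_.round(2))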

Neural networks
Advantages: Allow for fitting nonlinear relationships.
Disadvantages: Can be complicated to fit properly. Can be computationally intensive for large datasets. Do not give good guidance on the physics of a problem, as there are no constant weights.
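
A hedged sketch of a small feed-forward network fit with scikit-learn’s MLPRegressor to a nonlinear target; the architecture and hyperparameters are illustrative choices, not recommendations:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(500, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)   # a nonlinear target plus noise

    nn = MLPRegressor(hidden_layer_sizes=(20, 20), activation="tanh",
                      max_iter=5000, random_state=0).fit(X, y)
    print(f"R^2 on training data: {nn.score(X, y):.2f}")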

Kernel methods
Advantages: Allow for fitting nonlinear relationships.
Disadvantages: An appropriate kernel must be found by testing. Can be computationally intensive for large datasets. Unless a linear programming approach is used, do not give good guidance on the physics of a problem, as there are no constant weights.
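
A hedged kernel-method sketch: support vector regression in scikit-learn with a small cross-validated search over candidate kernels, illustrating the need to test for an appropriate kernel (all settings are assumptions):

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(300, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=300)

    # Try two candidate kernels; cross-validation picks the better fit
    search = GridSearchCV(SVR(), {"kernel": ["rbf", "linear"], "C": [1.0, 10.0]}, cv=5)
    search.fit(X, y)
    print("Best kernel and C:", search.best_params_)
    print(f"Best cross-validated R^2: {search.best_score_:.2f}")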


