# Dimensionality reduction

In statistics, dimensionality reduction is the process of mapping a high-dimensional space into a space of fewer dimensions. It is sometimes the case that analysis such as regression or classification can be carried out in the reduced space more accurately than in the original space.

Consider a string of beads, first 100 black and then 100 white. If the string is wadded up, a classification boundary between black and white beads will be very complicated in three dimensions. However, there is a mapping from three dimensions to one dimension, namely distance along the string, which makes the classification trivial. Unfortunately, a simplification as dramatic as that is rarely possible in practice.
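The bead example can be sketched numerically. Below is a minimal illustration (the random-walk curve, sample sizes, and seed are arbitrary choices, not from the original text): the beads sit on a tangled curve in three dimensions, yet a single threshold on the one-dimensional "distance along the string" coordinate separates the colors perfectly.

```python
import numpy as np

rng = np.random.default_rng(2)

# A "wadded-up string": a smooth random 3-D curve sampled at 200 points.
t = np.linspace(0.0, 1.0, 200)            # position along the string (1-D coordinate)
curve = np.cumsum(rng.normal(size=(200, 3)), axis=0) * 0.1  # tangled 3-D path
colors = np.array([0] * 100 + [1] * 100)  # first 100 black (0), then 100 white (1)

# In 3-D the class regions interleave along the tangled curve; in the 1-D
# arc-length coordinate a single threshold classifies every bead correctly.
predicted = (t > t[99]).astype(int)
print((predicted == colors).mean())  # 1.0
```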

To give a more realistic example, consider the atmospheric state: monthly average surface pressure perhaps, represented on a one-degree grid. This will vary, month-by-month and point-by-point. If the correlation matrix of the data is constructed and the eigenvectors found (see principal components analysis) and listed in eigenvalue order, then (usually) just the first few eigenvectors can be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behaviour of the system. The original space (with dimension equal to the number of grid points) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
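The procedure above can be sketched with NumPy. This is a toy version, not real atmospheric data: the synthetic "monthly fields" are generated from two assumed large-scale patterns plus noise, and the grid size, month count, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "monthly fields": 120 months x 50 grid points, driven mostly
# by two underlying large-scale spatial patterns plus small noise.
n_months, n_points = 120, 50
patterns = rng.normal(size=(2, n_points))                # two spatial patterns
amplitudes = rng.normal(size=(n_months, 2)) * [3.0, 1.5]  # monthly strengths
data = amplitudes @ patterns + 0.1 * rng.normal(size=(n_months, n_points))

# Correlation matrix of the data; eigenvectors listed in descending
# eigenvalue order, as described in the text.
corr = np.corrcoef(data, rowvar=False)                   # n_points x n_points
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative fraction of total variance captured by the leading eigenvectors:
# here the first two modes account for nearly all of it.
explained = np.cumsum(eigvals) / eigvals.sum()
print(explained[:3])

# Project the standardized data onto the leading eigenvectors: the
# reduced-dimension representation (n_months x 2 instead of n_months x 50).
k = 2
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
reduced = standardized @ eigvecs[:, :k]
```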

Dimensionality reduction without loss of information is possible if the data in question fall exactly on a smooth, locally flat subspace; then the reduced dimensions are just coordinates in the subspace. More commonly, data are noisy and there does not exist an exact mapping, so there must be some loss of information. Dimensionality reduction is effective if the loss of information due to mapping to a lower-dimensional space is less than the gain from simplifying the problem.
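The lossless case can be demonstrated directly. In this sketch (dimensions and sample size are arbitrary assumptions) the data fall exactly on a two-dimensional plane embedded in five dimensions, so a singular value decomposition recovers the plane and the reduction loses nothing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Points lying exactly on a 2-D plane embedded in 5-D space.
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]   # orthonormal 5x2 basis
coords = rng.normal(size=(200, 2))                 # intrinsic 2-D coordinates
points = coords @ basis.T                          # 200 x 5, exactly rank 2

# SVD recovers the plane: only two singular values are (numerically) nonzero.
U, s, Vt = np.linalg.svd(points, full_matrices=False)
print(np.round(s, 6))

reduced = points @ Vt[:2].T       # map 5-D -> 2-D: coordinates in the plane
reconstructed = reduced @ Vt[:2]  # map back: lossless, since the data are flat
print(np.allclose(reconstructed, points))  # True
```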

Dimensionality reduction methods can be classified as linear or nonlinear. Linear methods attempt to find a globally flat subspace, while nonlinear methods attempt to find a locally flat subspace. As with many other statistical problems, linear methods are simpler and more completely understood, while nonlinear methods are more general and more difficult to analyze.
