Choosing data vectors representing a huge data set. Kohonen's SOM applied to the Kefallinia erosion data.

Citation:

Bartkowiak, A., Vassilopoulos, A., & Evelpidou, N. (2003). Choosing data vectors representing a huge data set. Kohonen's SOM applied to the Kefallinia erosion data. In 1st International Conference on Environmental Research & Assessment.

Abstract:

We consider a large data set comprising N=3422 data vectors, each containing observations on p=3 variables, and seek representative data vectors for it using the methodology of Kohonen's self-organizing maps (SOM). The representative vectors found in this way are called codebook vectors. In particular, we analyze two collections (assemblages) of codebook vectors, containing m=275 and m=120 elements. The quality of the representation is measured by two errors: the quantization error q1 and the topological error q2. We show for our data that the magnitude of these errors depends on the way the original data were standardized. After a thorough graphical analysis of the results, we conclude that codebook vectors obtained from data standardized by range yield a slightly better representation than those obtained from data standardized by variance. None of the representations is satisfactory from our point of view.
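
As a rough illustration of the quantities named in the abstract, the following sketch shows the two standardizations and one common way of computing the two errors for a given set of codebook vectors. It is not the authors' code. The abstract does not define q1 and q2, so the definitions used here are assumptions: q1 is taken as the mean Euclidean distance from each data vector to its best-matching codebook vector, and q2 as the fraction of data vectors whose best and second-best matching units are not neighbours on the map. The function names, the rectangular lattice with 4-neighbour adjacency, and the toy numbers are all hypothetical.

```python
import numpy as np

def standardize_by_range(x):
    """Scale each column to [0, 1] (assumes no column is constant)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def standardize_by_variance(x):
    """Center each column and scale it to unit variance (assumes no constant column)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def som_errors(data, codebook, grid):
    """Return (q1, q2) for data of shape (N, p), codebook of shape (m, p),
    and integer map coordinates grid of shape (m, 2).

    Assumed definitions (not taken from the paper):
      q1 = mean distance from a data vector to its best-matching unit (BMU)
      q2 = share of data vectors whose first and second BMUs are not
           4-neighbours on a rectangular lattice
    """
    # Pairwise Euclidean distances between data vectors and codebook vectors.
    dist = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    q1 = dist[np.arange(len(data)), bmu1].mean()
    # Manhattan distance on the map; a value above 1 means "not adjacent".
    map_dist = np.abs(grid[bmu1] - grid[bmu2]).sum(axis=1)
    q2 = (map_dist > 1).mean()
    return q1, q2

# Toy usage: a 1 x 3 map with one-dimensional codebook vectors.
toy_grid = np.array([[0, 0], [0, 1], [0, 2]])
toy_codebook = np.array([[0.0], [1.0], [2.0]])
toy_data = np.array([[0.1], [1.9]])
print(som_errors(toy_data, toy_codebook, toy_grid))
```

Because both errors are computed from Euclidean distances in the data space, rescaling the variables changes which codebook vector is nearest and by how much, which is consistent with the abstract's observation that the errors depend on the standardization.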