Computational methods for forming a nation-wide toponymic overview

Antti Leino

Presentation at the 14th NSA Names Congress, Ithala Game Reserve, 28th November 2006

Abstract

The geographical distribution of a single linguistic phenomenon -- for instance, a dialectal feature or a toponym -- has traditionally been presented in the form of a distribution map. Such maps are easy to read, but only when studying a few distributions at a time. Their usefulness diminishes rapidly as the number of phenomena increases, and a different approach is needed to form an overview of the entire system of toponyms in a region.

My goal is to present ideas on how computational data analysis can help distil the information in the distributions of several thousand distinct features into a few easy-to-read maps. A variety of methods, such as independent component analysis, can be used to find a small set of underlying variables that are independent of each other yet sufficient to explain most of the large-scale variation in the distributions.
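As a rough illustration of the idea, the following Python sketch applies a hand-written symmetric FastICA (one common algorithm for independent component analysis) to a synthetic region-by-name matrix. Everything in it is an assumption made for the example: the data are randomly generated rather than taken from any real register, the names `fast_ica`, `sym_decorrelate`, `n_regions` and `n_names` are hypothetical, and the choice of two components and a tanh nonlinearity is arbitrary. It is not the procedure used in the presentation.

```python
import numpy as np


def sym_decorrelate(W):
    """Return the orthogonal matrix closest to W (symmetric decorrelation)."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt


def fast_ica(X, n_components, max_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from X (rows: regions, columns: names).

    Returns an array of shape (n_regions, n_components); each column is one
    underlying variable that could be drawn as a single map.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centre every name's distribution

    # Whitening: project onto the leading singular vectors and rescale so
    # that the projected data are uncorrelated with unit variance.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_components] * np.sqrt(n)

    # Start from a random rotation and refine it with fixed-point updates.
    W = sym_decorrelate(rng.normal(size=(n_components, n_components)))
    for _ in range(max_iter):
        Y = Z @ W.T
        G = np.tanh(Y)  # nonlinearity g
        G_prime_mean = (1.0 - G ** 2).mean(axis=0)  # mean of g' per component
        W_new = sym_decorrelate((G.T @ Z) / n - G_prime_mean[:, None] * W)
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


# Synthetic stand-in data: two hidden regional variables drive 30 name types.
rng = np.random.default_rng(42)
n_regions, n_names = 1000, 30
S_true = rng.uniform(-1.0, 1.0, size=(n_regions, 2))  # hidden variables
A = rng.normal(size=(2, n_names))                     # mixing weights
X = S_true @ A + 0.05 * rng.normal(size=(n_regions, n_names))

S_est = fast_ica(X, n_components=2)

# Absolute correlation between each estimated and each hidden variable.
corr = np.abs(np.corrcoef(S_est.T, S_true.T)[:2, 2:])
print(np.round(corr, 2))
```

In a real application each column of `S_est` would be plotted on a map of the regions, so that thousands of individual distribution maps are replaced by a handful of component maps. The sign and order of the components are arbitrary, which is a general property of independent component analysis rather than of this sketch.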

To demonstrate such methods, I use examples from the Geographic Names Register compiled by the Finnish National Land Survey. It contains all the toponyms that appear on the 1:20 000 scale Basic Map of Finland. An analysis of the names of natural features such as lakes or hills -- which are for the most part relatively old -- reveals structures that can be interpreted in relation to the settlement history of the country.

