Computational Linguistics Research Group
Applications in digital humanities and social sciences
The Group conducts research on core problems in NLP:
- how language conveys information,
- how information can be extracted from text,
- how latent structure can be learned from observed data.
Projects:
- PULS:
  Analysis of big-data streams of news media.
  Information Extraction: finding facts and events in text, and reasoning over extracted data.
  Methods: neural networks, supervised and weakly-supervised machine learning.
  Domains: general news, business intelligence, epidemiological surveillance, cross-border security and crime.
  Collaboration: Please see the project page for partners.
  Funding: Tekes/Business Finland, European Commission.
- Revita:
  Computational modeling to support language learning.
  Revitalization of endangered languages from the Finno-Ugric, Turkic, and other language families.
  Collaboration: University of Helsinki (Department of Modern Languages; Department of Finnish, Finno-Ugrian and Scandinavian Studies), YLE, Opetushallitus, University of Jyväskylä, Università degli Studi di Milano.
  Funding: Academy of Finland, Project FinUgRevita.
- Etymon:
  Computational models of language evolution.
  Modeling the genetic relationships of Finnish within the Uralic language family, based on data in etymological databases.
  Methods: information theory, Minimum Description Length (MDL) principle.
  Applying the methods beyond the Uralic family: Turkic, Indo-European, Khoisan.
  Collaboration:
  - Russian Academy of Sciences (RAS), Institute of Linguistics: the StarLing Project, a collection of etymological databases for many language families of the world.
  - KOTUS: enhancement of the Finnish etymological dictionary "Suomen Sanojen Alkuperä". (The database is proprietary, and will be released for public access soon.)
- SIGSLAV: Special Interest Group of the Association for Computational Linguistics on NLP for Slavic languages.
Previous projects:
- Etymological BANANAS:
  Research: Modeling genetic relationships among members of a language family, using methods from population genetics.
  Application to etymological data from different language families, starting with Uralic and Turkic.
  Collaboration:
  - J. Corander's group, at the Department of Mathematics and Statistics: population-genetics models. Part of the COIN Center of Excellence of the Academy of Finland.
  - Russian Academy of Sciences: analysis of etymological databases, the StarLing Project.
  - KOTUS: etymological databases of the Uralic language family.
- Clarin:
  The EU Clarin project for building infrastructures for linguistic resources.
- ContentFactory:
  Collaboration between PULS and the TermFactory project at the Department of Modern Languages: large-scale ontologies for text-analysis tasks.