Etymon Project
Research Focus:
We develop computational methods for modeling language evolution and
relationships among languages within language families.
Supported by:
- Academy of Finland, the UraLink Project
- Russian Fund for the Humanities / Russian Foundation for Basic Research
- HIIT: Helsinki Institute for Information Technology
- Algodan Center of Excellence: Algorithmic Data Analysis
Language evolution and etymology:
Unsupervised Learning of Morphology:
- Evaluation package from the LREC-2016 paper
Transliteration:
Resources:
- StarLing: collection of etymological databases for many language families of the world. Web-based etymological database.
- Suomen Sanojen Alkuperä (SSA): Finnish Etymological Dictionary (to be published on the Web by Kotus; please contact us to request access)
- Wiki: Etymon Project (internal use)
- Wiki: Computational historical linguistics and modeling population history (internal use)
People:
- Mian Du, PhD student
- Suvi Hiltunen, MSc student
- Guowei Lv, MSc student
- Javad Nouri, MSc student
- Kirill Reshetnikov, Russian Academy of Sciences, Institute of Linguistics, Moscow
- Arto Vihavainen, MSc student
- Marjaana Välisalo, MSc student
- Hannes Wettig, PhD student
- Roman Yangarber, Project Lead
Collaboration:
- Russian Academy of Sciences (RAS), Institute of Linguistics.
- Bayesian Statistics Group, led by J Corander, Department of Mathematics and Statistics, COIN Center of Excellence of the Academy of Finland. Applying models from population genetics to linguistic data to model language evolution.
- KOTUS: analysis and enhancement of data in the Finnish etymological dictionary "Suomen Sanojen Alkuperä". (This database is proprietary and will be released for public access soon. Please contact us or KOTUS to request permission to access.)
Publications: conference and journal papers, book chapters, dissertations
- Modeling language evolution with codes that utilize context and phonetic features (pdf)
  Javad Nouri, Roman Yangarber. In Proceedings of CoNLL: 2016 Conference on Computational Natural Language Learning
  (2016) Berlin, Germany
- From alignment of etymological data to phylogenetic inference via population genetics (pdf)
  Javad Nouri, Jukka Sirén, Jukka Corander, Roman Yangarber. In Proceedings of CogACLL: the 7th Workshop on Cognitive aspects of Computational Language Learning, co-located with ACL-2016
  (2016) Berlin, Germany
- Minimum Description Length Models for Unsupervised Learning of Morphology (Master's Thesis)
  Javad Nouri
  (2016) University of Helsinki, Department of Computer Science
- A novel method for evaluation of morphological segmentation (pdf)
  Javad Nouri, Roman Yangarber. In Proceedings of LREC: 10th International Conference on Language Resources and Evaluation
  (2016) Portorož, Slovenia
- Measuring Language Closeness by Modeling Regularity (pdf)
  Javad Nouri, Roman Yangarber. In Proceedings of the EMNLP 2014 Workshop on Language Technology for Closely Related Languages and Language Variants
  (2014) Doha, Qatar
- Cognate discovery and alignment in computational etymology (Master's Thesis)
  Guowei Lv
  (2014) University of Helsinki, Department of Computer Science
- MDL-based Models for Transliteration Generation (pdf)
  Javad Nouri, Lidia Pivovarova, Roman Yangarber. SLSP 2013: International Conference on Statistical Language and Speech Processing. Springer Verlag, Lecture Notes in Artificial Intelligence (LNAI), Volume 7978
  (2013) Tarragona, Spain
- Information-theoretic modeling of etymological sound change (abstract)
  Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber. Invited chapter in Approaches to measuring linguistic differences (Lars Borin, Anju Saxena, eds.), Trends in Linguistics Series, Volume 265
  (2013) Mouton de Gruyter
- Probabilistic, Information-Theoretic Models for Etymological Alignment (Ph.D. Thesis)
  Hannes Wettig
  (2013) University of Helsinki, Department of Computer Science
- Information-theoretic Methods for Analysis and Inference in Etymology (pdf)
  Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber. In Proceedings of WITMSE-2012: the 5th Workshop on Information-theoretic Methods in Science and Engineering (Steven de Rooij, Wojciech Kotłowski, Jorma Rissanen, Petri Myllymäki, Teemu Roos & Kenji Yamanishi, eds.)
  (2012) Amsterdam, the Netherlands
- Minimum Description Length Modeling of Etymological Data (Master's Thesis)
  Suvi Hiltunen
  (2012) University of Helsinki, Department of Computer Science
- Using Context and Phonetic Features in Models of Etymological Sound Change (pdf)
  Hannes Wettig, Kirill Reshetnikov and Roman Yangarber. In Conference of the European Chapter of the Association for Computational Linguistics (EACL) Workshop on Visualization of Linguistic Patterns and Uncovering Language History from Multilingual Resources
  (2012) Avignon, France
- MDL-based models for alignment of etymological data (pdf)
  Hannes Wettig, Suvi Hiltunen, Roman Yangarber. RANLP-2011: Conference on Recent Advances in Natural Language Processing
  (2011) Hissar, Bulgaria
- MDL-based modeling of etymological sound change in the Uralic language family
  Hannes Wettig, Suvi Hiltunen, Roman Yangarber. WITMSE-2011: The 4th Workshop on Information Theoretic Methods in Science and Engineering
  (2011) Helsinki, Finland
- Probabilistic models for alignment of etymological data (pdf)
  Hannes Wettig, Roman Yangarber. Nodalida-2011: Nordic Conference on Computational Linguistics
  (2011) Riga, Latvia
- Hidden Markov models for induction of morphological structure of natural language
  Hannes Wettig, Suvi Hiltunen, Roman Yangarber. WITMSE-2010: Workshop on Information Theoretic Methods in Science and Engineering
  (2010) Tampere, Finland