Reading Group: Statistical Methods in Multilingual Information Access
Time: every second Thursday at 10-12
Place: A307
Description
In the reading group, we study state-of-the-art statistical methods in multilingual
information access. Problem domains of interest include statistical machine
translation (SMT) and cross-language information retrieval (CLIR). We also cover
the probabilistic modeling and machine learning techniques that can be used to solve
these problems. The reading group is related to the EU project
SMART (Statistical Multilingual Analysis for Retrieval and
Translation).
The following topics are of interest:
- feature representations for natural language,
- alignment/translation models,
- language models/translation fluency,
- Markov models for SMT,
- statistical parsing,
- structured output learning,
- latent semantic representations for natural language,
- probabilistic and language models for information retrieval.
Participants
Wray Buntine, Matti Kääriäinen, Petri Myllymäki, Jussi Piitulainen, Juho Rousu, Marko Salmenkivi,
Ville Tuulos, Kimmo Valtonen, Matti Vuorinen, Roman Yangarber, Huizhen (Janey) Yu
Format
For each meeting, we have a selected paper that everyone is expected to read
before the meeting. In addition, one of the participants will be the reader, 'esilukija',
who will guide us through the paper. Note that everybody is expected to have read the paper :)
Schedule
- Thu 16.10 Paper:
R. Kneser and H. Ney:
Improved backing-off for m-gram language modeling
In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 181--184, 1995.
Reader: Matti K.
- Thu 2.11 Paper:
P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer:
The Mathematics of Statistical Machine Translation: Parameter Estimation
Computational Linguistics, Volume 19, Number 2, June 1993.
Reader: Janey
- Thu 16.11 We continue with Brown et al...
- Thu 30.11 Paper:
- Thu 25.1.07 Paper:
C. Cortes, M. Mohri, and J. Weston:
A general regression technique for learning transductions
ICML 2005, pages 153--160.
Reader: Juho
Literature
To be extended...
- T.L. Griffiths and M. Steyvers,
"Finding scientific topics,"
PNAS Colloquium, 2004.
- D. Klein and C.D. Manning,
"Corpus-based induction of syntactic structure: Models of dependency
and constituency,"
ACL, pages 478-485, 2004.
- C. Sutton and A. McCallum,
"An Introduction to Conditional Random Fields for Relational Learning,"
in Introduction to Statistical Relational Learning,
edited by L. Getoor and B. Taskar, MIT Press, 2006.
- B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning,
"Max-margin parsing,"
in Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2004.