582602 Natural Language Processing
(8 cp, 4 cu)
Lectures: 5 Sep - 11 Oct 2006, 31 Oct - 5 Dec 2006, Tue 10-12, Room B222
Exercises: 14 Sep - 13 Oct 2006, 2 Nov - 7 Dec 2006, Thu 12-14, Room C221
Results
The results are now here.
Goals
To provide students with the basic foundations in the field of Natural Language Processing and Computational Linguistics. The course will introduce the students to:
- the range of problems that this field deals with, and the state of the art
- the standard methods it employs, and how they are applied and evaluated.
The course should prepare students to:
- attend higher-level seminars on advanced or special topics in NLP
- advance their understanding by taking courses in related subjects (e.g., machine learning)
- participate in research projects in related areas
Synopsis
Rule-based and statistical linguistic analysis:
- morphology,
- part-of-speech tagging,
- language modeling,
- name classification,
- grammars and parsing,
- shallow syntax/chunking,
- semantics,
- word sense disambiguation,
- discourse.
Applications that combine several levels of analysis: information extraction.
Assignments/Exercises, Project work, no Exam.
Text: D. Jurafsky & J.H. Martin, Speech and Language Processing [J&M]
Pre-requisites: Data Structures, Models of Programming and Computing.
Basic programming skills, interest in language or text.
Basic familiarity with these topics: finite state automata (FSA), regular expressions, regular languages (e.g., J&M, chapter 2).
Course Materials
Found here (requires local access). Contains:
- Lecture notes
- Course Wiki
- Assignments
- Additional materials
Tentative Schedule:
- Week 36: (2006.09.05)
Lecture: Introduction to NLP (RY)
- Week 37: (2006.09.12)
Lecture: High-level Application: Text Understanding and Information Extraction (RY)
Exercise session: (2006.09.14)
- Assignment 1: Manual annotation of facts in documents.
- Tutorial: Annotation and Evaluation tools for IE
- Week 38: (2006.09.19)
Lecture: Levels of analysis for IE (RY)
Exercise session: (2006.09.21)
- Review Assignment 1 (part 1).
- Tutorial on development environment for IE
- Introduce Project: build/customize simple IE system
- Week 39: (2006.09.26)
Lecture: Morphology. Transducers (RY)
Exercise session: (2006.09.28)
- Assignment 1 due (part 2).
- Assignment 2: finite state morphology.
- Tutorial on FS morphology/PC-Kimmo
- Introduce Project: Two-level Morphological analysis (non-English)
- Week 40: (2006.10.03)
Lecture: Language modeling. N-Grams. Spelling correction (RY)
Exercise session: (2006.10.05)
- Assignment 3: N-grams.
- Introduce Project: N-grams and Spelling correction.
- Week 41: (2006.10.10)
Lecture: Syntax. Parsing [J&M, chap. 11] (GL)
Exercise session: (2006.10.12)
- Assignment 4: CFG and Parsing (short).
- Week 42: (2006.10.17)
(No course meetings)
- Assignment 2 due (deadline moved from week 40). Email solutions to teachers.
- Week 43: (2006.10.24)
(No course meetings)
- Week 44: (2006.10.31)
Lecture: Parsing [J&M, chap. 12] (GL)
Exercise session: (2006.11.02)
- Assignment 3 due.
- Assignment 5: Parsing II
- Introduce Project: implement simple Grammar for Parsing, tools.
- Week 45: (2006.11.07)
Lecture: Parsing: Shallow Parsing/Chunking [J&M, chap. 12] (GL)
Exercise session: (2006.11.09)
- Assignment 4 due (deadline extended to 16 Nov; see course wiki, task and Q&A).
- Week 46: (2006.11.14)
Lecture: Part of speech tagging, HMMs (RY)
Exercise session: (2006.11.16)
- Assignment 4 due.
- Assignment 5 due.
- Assignment 6.
- Introduce Project: Implement simple POS tagger.
- Week 47: (2006.11.21)
Lecture: 10.a: HMMs/Algorithms (RY)
Exercise session: (2006.11.23)
- Lecture continuation, 10.b: HMM Training (RY)
- Week 48: (2006.11.28)
Lecture: 11.a: Word sense disambiguation: supervised methods (RY)
Exercise session: (2006.11.30)
- Lecture continuation, 11.b: Unsupervised WSD (Yarowsky) (RY)
- Introduce Project: Word sense disambiguation.
- Week 49: (2006.12.05)
Lecture: 12: Semantics: Distributional similarity (JP)
Exercise session: (2006.12.07)
- Lecture 13: Automatic acquisition of semantic knowledge (RY)
- Assignment 6 due.
Project work
During the course, there will be 5 or 6 suggested mini-projects, plus shorter exercises. Each student will be expected to do 3 of the mini-projects. (Each mini-project may require between 3 and 4 weeks of work.)
Grading
No exam. Students are graded based on their project work and their completed exercises.
Registration
Register through the department registration system from 24 August 2006.
Contact
Department of Computer Science
P.O. Box 68
FIN-00014 University of Helsinki
Finland
Street address: Exactum Building, Room A223, Gustaf Hällströmin katu 2B
Roman Yangarber
(Page layout < O. Heinonen < M. Raento < G. Lindén)