582410 Processing of large document collections, Material
Lecture 15.3.
- Slides: [PostScript] [Power Point] [PDF]
Other material used in the lecture:
Lecture 17.3.
- Slides: [PostScript] [Power Point] [PDF]
Reading:
- (Recommended) Fabrizio Sebastiani: Text categorization. In Alessandro Zanasi (ed.), Text Mining and its Applications, WIT Press, Southampton, UK, 2005. Forthcoming.
- (Additional, more technical) Fabrizio Sebastiani: Machine learning in automated text categorization. ACM Computing Surveys. (local copy, PostScript) (local copy, PDF)
Other material used in the lecture:
Lecture 22.3.
- Slides: [Power Point] [PDF]
Lecture 31.3.
- Slides: [Power Point] [PDF]
Reading:
- Schapire, Singer and Singhal: Boosting and Rocchio Applied to Text Filtering. Proceedings of SIGIR-98, the 21st ACM International Conference on Research and Development in Information Retrieval.
- H.P. Luhn: The Automatic Creation of Literature Abstracts. In "Advances in Automatic Text Summarization", eds. Inderjeet Mani and Mark T. Maybury. Originally in IBM Journal of Research and Development, April 1958.
Lecture 5.4.
- Slides: [Power Point] [PDF]
Reading:
Kupiec, Pedersen, Chen: A trainable document summarizer. Proceedings of the 18th ACM-SIGIR Conference, p. 68-73, 1995. Also Chapter 5 (p. 55-60) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999. [Local copy (PDF)]
Other material used in the lecture:
Lecture 7.4.
- Slides: [Power Point] [PDF]
Reading:
Boguraev, Kennedy: Salience-based content characterisation of text documents. Chapter (p. 99-110) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999.
Lecture 12.4.
- Slides: [Power Point] [PDF]
Reading:
Radev, Jing, Stys, Tam: Centroid-based summarization of multiple documents. Information Processing and Management, 40, 2004.
McKeown, Robin, Kukich: Generating concise natural language summaries. Information Processing and Management, 31 (5), 1995. Also Chapter 16 (p. 233-263) in Advances in Automatic Text Summarization, eds. Mani, Maybury. The MIT Press, 1999.
Lectures 14.4. and 19.4.
- Slides:
Information extraction process: [Power Point] [PDF]
Portability of IE systems: [Power Point] [PDF]
Reading:
Lecture 21.4.
- Slides: [Power Point] [PDF]
Reading:
Riloff: Automatically Constructing a Dictionary for Information Extraction Tasks (AutoSlog). Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93), 1993, p. 811-816.
Riloff: Automatically Generating Extraction Patterns from Untagged Text (AutoSlog-TS). Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96), 1996, p. 1044-1049.
Riloff, Jones: Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), 1999, p. 474-479.
Other material:
Lecture 26.4.
- Slides: [Power Point] [PDF]
Reading:
Moldovan, Harabagiu, Pasca, Mihalcea, Goodrum, Girju, Rus: The Structure and Performance of an Open-Domain Question-Answering System. Proceedings of the 38th Meeting of the Association for Computational Linguistics (ACL-2000), Hong Kong, October 2000, p. 563-570.
Cooper, Rüger: A Simple Question Answering System. TREC 2000.
Aunimo, Makkonen, Kuuskoski: Cross-Language Question Answering for Finnish. Proceedings of the Web Intelligence Symposium, held at the Finnish Artificial Intelligence Conference, September 2004.
Links:
WordNet, a lexical database for the English language [online search]
MOT Dictionary (accessible at least from the university's machines)
Lecture 28.4.
- Slides: [Power Point] [PDF]