Information extraction from text, Week 4

The solutions should be ready for inspection by Thursday 7.3.2002 (midnight).

Remember: whenever you are unsure what to do, you can ask Reeta or send a message to our newsgroup!

In this exercise, we study the AutoSlog-TS algorithm. Assume that our text collection contains the following documents:


text 1; relevant

s: A group of terrorists
v: attacked
do: a post
pp: in Nuevo Progreso.


text 2; relevant

s: The National offices
v: were attacked
time: today.
s: Unidentified individuals
v: detonated
do: a bomb.
s: The bomb
v: destroyed
do: a car.


text 3; not relevant

s: The Armed Forces units
v: killed
do: one rebel.
s: They
v: destroyed
do: an underground hideout.


text 4; relevant

s: Unidentified individuals
v: attacked
do: a high tension tower.
s: They
v: destroyed
do: it.


text 5; not relevant

s: The coca growers
v: protest
do: the destruction of their fields.
s: The strike
v: is supported
pp: by the Shining Path.

Explain how AutoSlog-TS processes these documents, and give the ranking of the extraction patterns it generates.

Abbreviations: s = subject, v = verb, do = direct object, pp = prepositional phrase
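As an illustration of Stage 1, here is a minimal Python sketch in which every noun phrase in every text, relevant or not, gets a candidate pattern. The Clause encoding, the passive(verb) notation and the three heuristics used (<subj> verb, verb <dobj>, verb prep <np>) are assumptions made for this sketch; they are a small subset of the AutoSlog heuristics, not Riloff's full rule set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Clause:
    """One parsed clause: subject, verb, optional direct object and PP."""
    subj: str
    verb: str
    dobj: Optional[str] = None
    pp: Optional[str] = None  # whole prepositional phrase, preposition first
    passive: bool = False


# Hypothetical encoding of the five exercise texts: id -> (relevant?, clauses).
# The time expression "today" in text 2 is left out, because none of the
# simplified (assumed) heuristics below apply to it.
CORPUS: Dict[int, Tuple[bool, List[Clause]]] = {
    1: (True, [Clause("A group of terrorists", "attacked", dobj="a post",
                      pp="in Nuevo Progreso")]),
    2: (True, [Clause("The National offices", "attacked", passive=True),
               Clause("Unidentified individuals", "detonated", dobj="a bomb"),
               Clause("The bomb", "destroyed", dobj="a car")]),
    3: (False, [Clause("The Armed Forces units", "killed", dobj="one rebel"),
                Clause("They", "destroyed", dobj="an underground hideout")]),
    4: (True, [Clause("Unidentified individuals", "attacked",
                      dobj="a high tension tower"),
               Clause("They", "destroyed", dobj="it")]),
    5: (False, [Clause("The coca growers", "protest",
                       dobj="the destruction of their fields"),
                Clause("The strike", "supported", pp="by the Shining Path",
                       passive=True)]),
}


def patterns_for(clause: Clause) -> List[str]:
    """Create one pattern per noun phrase (assumed subset of AutoSlog heuristics)."""
    verb = f"passive({clause.verb})" if clause.passive else clause.verb
    patterns = [f"<subj> {verb}"]                      # <subj> active/passive verb
    if clause.dobj is not None:
        patterns.append(f"{verb} <dobj>")              # verb <dobj>
    if clause.pp is not None:
        preposition = clause.pp.split()[0]
        patterns.append(f"{verb} {preposition} <np>")  # verb prep <np>
    return patterns


def stage1(corpus: Dict[int, Tuple[bool, List[Clause]]]) -> List[str]:
    """Stage 1: generate patterns for every NP in every text, relevant or not."""
    seen: List[str] = []
    for _, (_, clauses) in sorted(corpus.items()):
        for clause in clauses:
            for pattern in patterns_for(clause):
                if pattern not in seen:  # keep first-seen order, drop duplicates
                    seen.append(pattern)
    return seen


if __name__ == "__main__":
    for pattern in stage1(CORPUS):
        print(pattern)
```

Note that the relevance labels play no role yet: Stage 1 produces the same 14 candidate patterns whether a text is relevant or not, and only the second stage uses the labels.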

More information: Riloff, E. "Automatically Generating Extraction Patterns from Untagged Text." Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 1044-1049.
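Stage 2 can be sketched in the same way. Every pattern is run over the corpus again, and pattern i gets the relevance rate R = F/N, where F is the number of relevant texts and N the number of all texts in which it fires. The score used here is R * log2(F). Putting F (rather than N) inside the logarithm is an assumption of this sketch, to be checked against the paper. The per-text pattern sets are also an assumption: they are derived by hand from the five documents using simplified heuristics (<subj> verb, verb <dobj>, verb prep <np>, with passive voice written as passive(verb)). Since no pattern fires twice in one text here, counting texts and counting activations give the same numbers.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical Stage 1 result for the exercise: text id -> (relevant?, patterns
# that fire in that text), using assumed, simplified AutoSlog heuristics.
TEXTS: Dict[int, Tuple[bool, Set[str]]] = {
    1: (True, {"<subj> attacked", "attacked <dobj>", "attacked in <np>"}),
    2: (True, {"<subj> passive(attacked)", "<subj> detonated", "detonated <dobj>",
               "<subj> destroyed", "destroyed <dobj>"}),
    3: (False, {"<subj> killed", "killed <dobj>", "<subj> destroyed", "destroyed <dobj>"}),
    4: (True, {"<subj> attacked", "attacked <dobj>", "<subj> destroyed", "destroyed <dobj>"}),
    5: (False, {"<subj> protest", "protest <dobj>", "<subj> passive(supported)",
                "passive(supported) by <np>"}),
}


def rank(texts: Dict[int, Tuple[bool, Set[str]]]) -> List[Tuple[str, int, int, float, float]]:
    """Stage 2: count where each pattern fires and score it with R * log2(F)."""
    total: Counter = Counter()     # N: texts in which the pattern fires
    relevant: Counter = Counter()  # F: relevant texts in which the pattern fires
    for is_relevant, patterns in texts.values():
        for pattern in patterns:
            total[pattern] += 1
            if is_relevant:
                relevant[pattern] += 1
    rows = []
    for pattern, n in total.items():
        f = relevant[pattern]
        rate = f / n                                   # relevance rate R = F / N
        score = rate * math.log2(f) if f > 0 else 0.0  # assumed R * log2(F)
        rows.append((pattern, f, n, rate, score))
    # Highest score first; ties broken by relevance rate, then alphabetically.
    rows.sort(key=lambda row: (-row[4], -row[3], row[0]))
    return rows


if __name__ == "__main__":
    for pattern, f, n, rate, score in rank(TEXTS):
        print(f"{pattern:28} {f}/{n}  R={rate:.2f}  score={score:.2f}")
```

Under these assumptions, <subj> attacked and attacked <dobj> (R = 1, F = 2) rank highest, followed by <subj> destroyed and destroyed <dobj> (R = 2/3, F = 2); every pattern seen in just one relevant text scores log2(1) = 0. If N is used inside the logarithm instead, the two destroyed patterns score 2/3 * log2(3), about 1.06, and move to the top, so the exact definition in the paper matters for the final ranking.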

Compare the methods for learning information extraction rules (AutoSlog, Crystal, AutoSlog-TS, Multi-level bootstrapping, ExDisco):
- What are the prerequisites of the method? What kind of input does it need, and how should this input be processed/represented?
- Briefly describe the basic idea of the algorithm.
- What is the output?
- What does the human user have to do?

Helena Ahonen-Myka

Last modified: Wed Feb 27 19:03:55 EET 2002