Information extraction from text, Week 1



The solutions should be ready for inspection by Thursday 7.2.2002 (midnight).

Please submit the URL of your exercise "homepage" using the registration form. Do this only once, at the beginning of the course. In case of changes or errors, contact Reeta Kuuskoski.


  1. The goal of this exercise is to learn (or brush up on) some linguistic terminology that we need in the course. As material, we use the resources of the website Guide to Grammar and Writing. [Some terminology in Finnish]

    Study the page The Garden of Phrases and check your knowledge using the Quiz on Recognizing the Function of Phrases. As your "answer" to this exercise you should do the following:

    You may want to check these pages, as well:



  2. Study the following document fragments.

     
    Police sources have reported that
    unidentified individuals planted a bomb in front of a Mormon Church in
    Talcahuano District. The bomb, which exploded and caused property
    damage worth 50,000 pesos, was placed at a chapel of the Church of
    Jesus Christ of Latter-Day Saints located at No 3856 Gomez Carreno
    Street.
    
    Prosecutor Juan Carbone Herrera requested the 25 years imprisonment
    for General Rolando Cabezas Alarcon of the Republican Guard for
    ordering the shooting of 124 of the San Pedro prison inmates.
    
    Last night in San Clemente District, 9 km north of Pisco, a
    group of terrorists dynamited machinery belonging to Albolones
    Peruanos, Inc.
    

    Try to figure out what might happen in the lexical analysis and name recognition phases when these texts are processed. For instance,

    1. give examples of information that is available in the lexical analysis of these sentences. You can assume that some language analyzer is used, so you may find it useful to consult the list of tags that the Conexor analyzers use in their output (particularly 'Morphological tags'). You don't have to analyze all the text; just give some examples using the sample text fragments. (You can also experiment with the Conexor analyzer FDG or with the Lingsoft ENGTWOL or FINTWOL on the web.)

    2. give examples of names and special forms in the sample fragments. Try to formulate informal rules for finding the names and special forms, using the knowledge you found above in (1). You can also try to formulate the rules using regular expressions.
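
    To illustrate what "informal rules as regular expressions" might look like, here is a minimal sketch in Python. It is not a model solution. The labels (MONEY, DISTANCE, LOCATION, COMPANY, PERSON), the keyword lists and the function name find_special_forms are assumptions made up for this example, and the patterns rely only on surface cues (capitalization, digits, trigger words such as "District" or "Inc."), not on the output of any real analyzer.

```python
import re

# The three sample fragments from the exercise, joined into one string.
TEXT = (
    "Police sources have reported that unidentified individuals planted "
    "a bomb in front of a Mormon Church in Talcahuano District. The bomb, "
    "which exploded and caused property damage worth 50,000 pesos, was "
    "placed at a chapel of the Church of Jesus Christ of Latter-Day Saints "
    "located at No 3856 Gomez Carreno Street. "
    "Prosecutor Juan Carbone Herrera requested the 25 years imprisonment "
    "for General Rolando Cabezas Alarcon of the Republican Guard for "
    "ordering the shooting of 124 of the San Pedro prison inmates. "
    "Last night in San Clemente District, 9 km north of Pisco, a group of "
    "terrorists dynamited machinery belonging to Albolones Peruanos, Inc."
)

# Hypothetical hand-written rules: label -> pattern.
# A "capitalized word" is approximated as [A-Z][a-z]+.
CAP = r"[A-Z][a-z]+"
RULES = {
    # number with optional thousands separators + currency word
    "MONEY": re.compile(r"\b\d{1,3}(?:,\d{3})*\s+(?:pesos|dollars)\b"),
    # number + unit of length
    "DISTANCE": re.compile(r"\b\d+(?:\.\d+)?\s*km\b"),
    # capitalized words ending in a location trigger word
    "LOCATION": re.compile(rf"\b(?:{CAP}\s+)+(?:District|Street)\b"),
    # capitalized words followed by a company suffix
    "COMPANY": re.compile(rf"\b(?:{CAP}\s+)*{CAP},\s+Inc\."),
    # title word followed by capitalized words
    "PERSON": re.compile(rf"\b(?:Prosecutor|General)\s+(?:{CAP}\s+)*{CAP}"),
}


def find_special_forms(text):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, span) for _, label, span in found]


if __name__ == "__main__":
    for label, span in find_special_forms(TEXT):
        print(f"{label:9} {span}")
```

    Note what such surface rules miss: "San Pedro", "Pisco", "Republican Guard" and the church name are not found, because no trigger word marks them. This is where information from the lexical analysis in (1), such as part-of-speech tags, could make the rules more general.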



    Helena Ahonen-Myka
    Last modified: Wed Feb 6 11:32:11 EET 2002