Question Answering
Question answering (QA) is a task in which the user's information need is formulated as a natural language question and the answer is given in natural language as well. In general, the length of the answer varies from one word to a couple of sentences, depending on the question. The system should also detect situations in which it cannot provide an answer. QA systems can be designed for a specific domain only (e.g. as an aid for a company help desk) or they can be general-purpose systems (e.g. the TREC and CLEF QA tracks, AskJeeves). In general, question answering systems use unstructured text documents as their database, but they can also use lists of FAQs (Frequently Asked Questions) and structured databases. In open-domain QA systems, the Web is a major source of information. However, much of the research on QA systems concentrates on building systems for the CLEF and TREC evaluation campaigns, in which the main database is newspaper text.
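The no-answer case mentioned above is often handled with a confidence threshold: if no candidate answer scores highly enough, the system returns a "no answer" (NIL) response rather than guessing. A minimal sketch, where the candidate list, scores and threshold value are purely illustrative:

```python
def best_answer(candidates, threshold=0.5):
    """Return the highest-scoring candidate answer, or "NIL" when no
    candidate is confident enough (the no-answer case).

    `candidates` is a list of (answer, score) pairs; the scores and the
    threshold here are hypothetical placeholders.
    """
    if not candidates:
        return "NIL"
    answer, score = max(candidates, key=lambda pair: pair[1])
    return answer if score >= threshold else "NIL"

# A confident candidate is returned; a weak one is rejected as NIL.
print(best_answer([("Helsinki", 0.82), ("Espoo", 0.31)]))  # -> Helsinki
print(best_answer([("Helsinki", 0.12)]))                   # -> NIL
```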
Cross-language Question Answering
In cross-language question answering, the question is expressed in a language other than the one in which the documents from which the answer is extracted are written. The user can thus pose a question in one language and search for information in documents written in one or more other languages. This is useful because it would be tiresome to rewrite the question in many languages, and because many users have a good passive knowledge of several languages while their active knowledge is more restricted. In our research, the questions are expressed in Finnish and the document collection is in English; however, we believe that our methods can be extended to handle questions and documents in other languages as well. Cross-language QA is usually implemented either by first applying machine translation to the question and then passing it on to a monolingual QA system, or by integrating cross-language processing into the QA system itself. Our approach is the latter, because there is no reliable off-the-shelf machine translation software for Finnish. In addition, we expect to improve our results by using the original question as the basis of processing for as long as possible, because translation almost always alters the information content of the question.
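The two strategies above can be contrasted in a small sketch. The dictionary, the `translate` and `monolingual_qa` callables, and the Finnish words are illustrative stand-ins, not part of any actual system:

```python
# Toy Finnish-to-English dictionary (hypothetical entries for illustration).
TOY_DICT = {"missä": ["where"], "sijaitsee": ["is located", "lies"]}

def mt_then_qa(question_fi, translate, monolingual_qa):
    """Strategy 1: machine-translate the whole question up front,
    then hand it to an ordinary monolingual QA system."""
    return monolingual_qa(translate(question_fi))

def integrated_query(question_fi, bilingual_dict):
    """Strategy 2 (integrated): keep the original question as long as
    possible and translate only at the retrieval step, expanding each
    word to its dictionary translations (unknown words pass through)."""
    query_terms = []
    for word in question_fi.lower().split():
        query_terms.extend(bilingual_dict.get(word, [word]))
    return query_terms

print(integrated_query("Missä sijaitsee", TOY_DICT))
# -> ['where', 'is located', 'lies']
```

The integrated variant preserves translation ambiguity (several English alternatives per Finnish word) instead of committing to a single machine-translated question early on.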
This work is ongoing; the PhD students involved in QA-related research are Lili Aunimo, Reeta Kuuskoski and Juha Makkonen.
Links:
- Resources for Finnish-English QA developed in the Doremi Research Group.
- TREC QA, a monolingual track for English.
- CLEF QA, mono- and cross-language tracks for several European languages.
- Examples of domain-specific QA systems: Aunimo et al. (2003): a company help desk; Busemann et al. (2000): message classification in a call center; and Lamontagne and Lapalme (2003): an email response system.