Lecture
Template rule development for Information Extraction: the NeT method
Speaker: Kelly Zervanou
Date: Friday, 29 May 2009
Time: 11:00-13:00
Location: "Mediterranean Studies" Seminar Room, FORTH, Heraklion, Crete
Host: Prof. V. Christophides
Abstract:
Information Extraction (IE) is becoming increasingly important for the semantic analysis of free-text documents stored in large document repositories, such as the Web. Once free text has been analysed to recognise concepts and their interrelations in events and facts of interest, the resulting structured information becomes a valuable knowledge resource. This resource can be used by other information management technologies, such as document summarisation, ontology development, semantic document indexing and question answering, or exploited further by data mining and reasoning technologies.

A key element in extracting information from a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. One current IE research objective is the automatic or semi-automatic acquisition of these rules. Current approaches to this problem typically rely on training text data or on existing knowledge resources, such as domain ontologies. Within this research framework, we propose a knowledge-poor methodology for rule pattern acquisition.

Our proposed NeT method for knowledge acquisition in IE aims to facilitate the development and customisation of IE systems. It is a data-centric approach that requires neither manually annotated documents nor pre-existing domain knowledge resources. The NeT method is based on the hypothesis that terms (the linguistic representation of concepts in a specialised domain) and Named Entities (e.g., the names of persons and organisations, and dates of importance in the text) can together be considered the basic semantic entities of textual information, and can therefore serve as a basis for the conceptual representation of domain-specific texts. The extraction patterns discovered by this approach involve significant associations of these semantic entities with verbs, and they can subsequently be translated into the grammar formalism of choice.

The NeT method has been implemented in a demonstrator application that combines existing tools (ENGCG, BSEE, C/NC value) with custom-developed ones. The method has been evaluated against manually annotated data, with very promising results.
Bio:
Kalliopi Zervanou is an Associate Researcher at the Technical University of Crete
(TUC), Dept. of Electronics & Computer Engineering. She received her Bachelor's degree in
French Literature & Linguistics from the Aristotle University of Thessaloniki, and an MSc
in Machine Translation and a PhD in Information Extraction from UMIST and the
University of Manchester. She has worked as a Researcher at the Dept. of Computation,
UMIST, in the CONCERTO (ESPRIT n.29159: CONCEptual indexing, querying and
ReTrieval Of digital documents) and PARMENIDES (IST-2001-39023: Ontology
driven Temporal Text mining on organisational data for extracting temporal valid
knowledge) projects. She joined the Technical University of Crete, Dept. of
Electronics & Computer Engineering, in 2005, where she worked as one of the
principal investigators for the Information Extraction and Ontology Development
components of the TOWL Project (Time-determined ontology based information
system for real time stock market analysis). Her research interests include information
extraction, knowledge acquisition and representation techniques, development of
linguistic resources, terminology extraction, automatic summarisation and machine
translation.