Seminar: Another approach to Information Extraction using Extended Ontologies

Date and time: 1 June 2006, 10:30 - 12:00
Room: 403 NB

Presenter: Marek Nekvasil

The purpose of this work is to introduce an extension of advanced knowledge models, known as ontologies, so that they can be used for automated information extraction from web documents. First, we describe current approaches to acquiring information from web documents using so-called wrappers, together with various methods for creating such wrappers with as high a degree of automation as possible. We comment on these methods with respect to our intention to use extended ontologies in the automated creation of wrappers. Next, we describe ontology notation standards and the proposed extension of OWL, one of the ontology languages. This extension makes it possible to include in the ontology templates for the common values of the properties of the extracted class. The templates are designed so that they can be composed of hierarchically ordered partial patterns, and a few such patterns are proposed. We then propose and derive an inference model, based on the principles of fuzzy logic, for evaluating pattern matches and combining them into a template. This model can be used to automatically annotate the examples of properties of the extracted class in a document. Finally, we propose a simple type of wrapper and a way to learn it automatically using the previously proposed method of automated annotation with an extended ontology.