Seminar: Information Extraction using Presentation Ontologies

Date and time: 30 March 2006, 10:30 - 12:00
Room: 403 NB

Presenter: Martin Labský

We describe an approach to information extraction (IE) that attempts to integrate diverse sources of extraction knowledge. Our IE system, currently under construction, aims to perform reasonably well across a broad range of scenarios that differ widely in the amount of available training data, manually specified patterns, and document formatting structure. In our approach, the user first creates a presentation ontology that describes the objects to be extracted from both a domain and a presentation point of view. Training data can then be used to improve extraction performance. We will present initial experience with extracting computer monitor descriptions from heterogeneous websites.