Seminar: Information Extraction using Presentation Ontologies
Date and time | 30 March 2006, 10:30 - 12:00 |
---|---|
Room | 403 NB |
Presenter: Martin Labský
We describe an approach to information extraction (IE) that attempts to integrate diverse sources of extraction knowledge. The aim of our IE system, which is still under construction, is to perform reasonably well across a broad range of scenarios that differ widely in the amount of training data, manually specified patterns, and document formatting structure available. In our approach, the user first creates a presentation ontology, which describes the to-be-extracted objects from both a domain and a presentation point of view. Training data can then be used to improve extraction performance. Initial experience with extracting computer monitor descriptions from heterogeneous websites will be presented.