Seminar: Fuzzy ILP and Semantic Information Extraction from Texts

Date and time: 15 October 2009, 10:30 - 12:00
Room: 403 NB


Presenter: Jan Dědek

We deal with linguistic information extraction from Czech texts on the Web. Our method exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We propose a system that captures the text of web pages, annotates it linguistically with the PDT tools, extracts data, and stores the data in an ontology. We present methods for learning queries over linguistically annotated data. Our experiments in the domain of traffic accident reports enable, for example, summarizing the number of injured people. Inductive Logic Programming (ILP) plays one of the most interesting roles in our solution. We also present an ILP-based approach to fuzzy classification of textual web reports, built on Fuzzy Inductive Logic Programming. The main contributions are formal models, a prototype implementation, and evaluation experiments.
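One way to let an ordinary (crisp) ILP learner work with graded or fuzzy attributes is to "monotonize" them into "at least" facts. With these facts, a report that reaches level k also satisfies every lower level. The Python sketch below illustrates only this general idea and is not the presenter's implementation. The function `monotonize`, the attribute names `injured` and `serious`, and the level scales are all hypothetical.

```python
from typing import List, Sequence


def monotonize(report_id: str, attribute: str, value: float,
               levels: Sequence[float]) -> List[str]:
    """Encode a graded value as monotone 'at least' facts (hypothetical helper).

    For every level k with k <= value, emit the Prolog-style fact
    attribute_atl_k(report_id). Because a report at level k also gets
    the facts for all lower levels, a crisp ILP learner can express
    "at least this much" conditions with ordinary predicates.
    """
    facts = []
    for k in sorted(levels):
        if k <= value:
            # Replace the decimal point so the level forms a valid atom suffix.
            suffix = str(k).replace(".", "_")
            facts.append(f"{attribute}_atl_{suffix}({report_id}).")
    return facts


if __name__ == "__main__":
    # Hypothetical report r1 with 2 injured people on an ordinal scale 0..3.
    for fact in monotonize("r1", "injured", 2, [0, 1, 2, 3]):
        print(fact)
    # Hypothetical fuzzy degree 0.7 that r1 is a "serious" accident, cut at four levels.
    for fact in monotonize("r1", "serious", 0.7, [0.25, 0.5, 0.75, 1.0]):
        print(fact)
```

For the first call this prints `injured_atl_0(r1).`, `injured_atl_1(r1).` and `injured_atl_2(r1).`. The fuzzy degree 0.7 yields facts only for the 0.25 and 0.5 cuts. Encoding degrees as cuts keeps the learner itself crisp. The fuzziness then lives in how the levels are chosen and in how learned rules are read back as degrees.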