Seminar: RExtractor: a Robust Information Extractor

Date and time 21 May 2015, 10:30 - 12:00
Room 336 RB

RExtractor: a Robust Information Extractor

Presenter: Vincent Kríž

A year ago, we presented our initial steps towards linguistic processing of texts to detect entities and the relations between them. This work was an essential part of the INTLIB project, whose aim is to provide a more efficient and user-friendly tool for querying textual documents than full-text search. We now present the RExtractor system, which processes input documents with natural language processing tools and then queries the parsed sentences to extract a knowledge base of entities and their relations. The system's workflow is designed to be language and domain independent. We demonstrate RExtractor on Czech and English legal documents. In addition, we discuss the deployment of RExtractor in search engines used by customers from a particular domain.