Seminar: Extracting Structured Data about Product and Job Offers from Semi-Structured Web

Date and time: 5 April 2012, 10:30 - 12:00
Room: 403 NB

Presenter: Aleš Pouzar

The presentation focuses on practical aspects of information extraction based on extraction ontologies. Several experiments were performed with the Ex system using three types of extraction knowledge: manually written rules, formatting regularities, and machine learning models. The development of extraction ontologies is demonstrated on the e-commerce and job offer domains. The goal is to obtain fine-grained structured data that can be used to populate domain ontologies or in real applications such as product comparison and job offer search. The talk also covers the advantages of the presented approach, its limitations and constraints in the selected domains, and possible solutions to them. (Slides are in Czech.)