Research group: Data Mining and Knowledge Discovery (DMKD)

DMKD’s color among the KIZI groups is blue, referring to the ‘color’ of the ‘oceans’ of data that can be submitted to data mining and knowledge discovery tools.

News

  • January 23-26, 2017: Prof. Johannes Fürnkranz from the University of Darmstadt will visit the group and teach a guest course on Inductive Rule Learning.
  • 31 November 2016: The JWS journal paper by T. Kliegr, joint with O. Zamazal (SWOE group), was ranked 3rd in the annual Rector’s award. Congratulations!
  • October, 2016: The project “Pilot application for distributed analysis of big data” (led by T. Kliegr), funded by the CESNET association, has been successfully defended.
  • October 4, 2016: Tomáš Kliegr gave a talk on association rule classification and the EasyMiner system developed by the DMKD group at the IEEE Days at the University of West Bohemia.
  • October 2016: A new PhD student, Jiří Zettel, joined the group. He will be supervised by prof. Petr Berka and his topic will be related to data pre-processing for data mining.
  • September 20, 2016: Stanislav Vojíř has successfully defended his PhD thesis “Business Rule Learning using data mining of GUHA association rules”. He remains a member of our team.
  • August 2016: The LHD dataset developed by our group is available for download as part of the DBpedia 2015 release.
  • June 2016: A new paper by Tomáš Kliegr (and Ondřej Zamazal), “LHD 2.0: A text mining approach to typing entities in knowledge graphs”, appears in Elsevier’s Journal of Web Semantics. It follows up on the recent LHD paper in the same journal.
  • April 2016: T. Kliegr and his team presented EasyMiner at the popular Machine Learning Meetup.

(For older news see page bottom)

Research focus

The Data Mining and Knowledge Discovery (DMKD) group at KIZI (one of its four research groups, overarched by the virtual Knowledge Engineering Group) undertakes research in analyzing various kinds of data in structured, semi-structured and textual form, and deriving useful knowledge from it. The focal areas of the group currently are:

  • Developing new tools for dealing with domain knowledge in data mining
  • Developing new tools for automation of data mining
  • Accessing “classical” data mining tools via a web interface, using a “web search” metaphor, and sharing data mining results in structured form over the web
  • Extracting structured data from free or semi-structured text
  • Disambiguation of textual entities, their open-class classification and linking to semantic resources (such as DBpedia)
  • Web / multimedia usage mining and user preference learning.

Research on data mining was present at the Department long before the term was coined: tools for combinatorial data analysis (KAD) and “learning an expert system from observational data” (ESOD, later re-implemented by P. Berka as KEX), both derived from the even earlier GUHA method, appeared under the supervision of J. Ivánek in the early 1980s. Since the mid-1990s, the flagship data mining tool of KIZI has been the LISp-Miner system (conceived by J. Rauch and developed by M. Šimůnek), which has recently undergone a major redesign centered around the new LM Workspace module and gained scripting support based on the LISp-Miner Control Language. Most recently, a family of web-oriented data mining tools has arisen under the leadership of T. Kliegr, such as EasyMiner.eu (in 2011), integrating the CMS-based reporting tool SEWEBAR and leveraging background knowledge.

In parallel, there is ongoing work on mining from texts, with special focus on Wikipedia: the Targeted Hypernym Discovery method (THD, now part of the EntityClassifier.eu tool) and the associated LHD dataset.

The research has been supported by a number of research projects. The most important recent one was LinkedTV, an Integrated Project funded by the EU FP7 (2011-2015), under which the text mining tool EntityClassifier.eu and the InBeat.eu recommender were developed (under the supervision of T. Kliegr). There have also been several CSF (Czech Science Foundation) projects, coordinated by J. Rauch. Most recently, the group has become engaged in the EU Horizon 2020 project OpenBudgets.eu, where analyses of fiscal data are taking place.

The group has also co-organized several international events, most notably RuleML 2014 (T. Kliegr, J. Rauch) and ISMIS 2009 (J. Rauch, P. Berka), as well as several editions of the ECML/PKDD Discovery Challenge (P. Berka) and of the Linked Data Mining Challenge (V. Svátek).

Team

Group leaders: Jan Rauch, Tomáš Kliegr, Petr Berka, Milan Šimůnek

Other group members:

  • Faculty: David Chudán, Jiří Ivánek, Vojtěch Svátek, Stanislav Vojíř
  • Project worker: Jaroslav Kuchař (primarily at CTU, Prague)
  • PhD students: Viktor Nekvapil, Václav Zeman, Jiří Zettel
  • MSc students: Přemysl Václav Duben, Linda Horáková, Bohuslav Koukal

Past members: Jan Bouchner, Barbora Červenková, Milan Dojchinovski, Ivo Lašek, Andrej Hazucha, Martin Labský, Jan Nemrava, David Pejčoch, Radek Škrabal.

Collaborations

Within the University, the DMKD group mainly cooperates with

  • The Semantic Web and Ontological Engineering (SWOE) group within the same department. In particular, SWOE promotes the achievements of TMWE, such as the Linked Hypernym Dataset (LHD), in the semantic web and Linked Data community. There is also joint research in the field of background knowledge for text mining (e.g. the Ex information extractor project).
  • The business intelligence group (led by Dr. Ota Novotný) at the neighboring Dept. of Information Technology. The overlapping interest is in applying data mining techniques on top of OLAP-powered data warehouses. Since August 2014, three DMKD members have been directly cooperating on DIT’s BI-oriented project funded by TACR, the Technological Agency of the Czech Republic.

Within the Czech Republic, there is long-standing cooperation with the Web Intelligence group (led by Dr. Tomáš Vitvar) at the Czech Technical University. In particular, PhD students from the CTU group (M. Dojchinovski, J. Kuchař and I. Lašek) have been directly involved in research activities of the LinkedTV project.

At the international level, the group collaborates with numerous foreign partners, either within EU projects (in particular, the EU FP7 IP LinkedTV project) or on an informal basis. Examples of such joint research are:

  • Linked data mining, with University of Mannheim and University of Darmstadt
  • User interest mining, with University of Mons, Belgium (Numediart institute)
  • Action rules mining, with University of North Carolina, Charlotte, US
  • Logical calculi for data mining, with Technical University of Tampere, Finland

Selected recent publications

    • Kliegr T., Zamazal O.: LHD 2.0: A text mining approach to typing entities in knowledge graphs. J. Web Semantics, Volume 39, August 2016, 47-61.
    • Kliegr T.: Linked hypernyms: Enriching DBpedia with Targeted Hypernym Discovery. J. Web Semantics, Volume 31, March 2015, 59-69.
    • Rauch J., Šimůnek M.: Data Mining with Histograms – A Case Study. In: ISMIS 2015. Springer, LNCS.
    • Rauch J.: Formal Framework for Data Mining with Association Rules and Domain Knowledge – Overview of an Approach. Fundamenta Informaticae, 2015, Vol. 137, No. 2, 171–217.
    • Rauch J., Šimůnek M.: Dobývání znalostí z databází, LISp-Miner a GUHA. Oeconomica, 2014. 462 pages. ISBN 978-80-245-2033-9.
    • Fürnkranz J., Kliegr T.: A Brief Overview of Rule Learning. In: RuleML 2015: 54-69.
    • Rauch J., Šimůnek M.: Learning Association Rules from Data through Domain Knowledge and Automation. In: Rules on the Web (RuleML 2014). Springer LNCS, 2014.
    • Šimůnek M., Rauch J.: EverMiner Prototype Using LISp-Miner Control Language. In: Foundations of Intelligent Systems (ISMIS 2014). Springer LNCS.
    • Šimůnek M.: LISp-Miner Control Language description of scripting language implementation. Journal of systems integration, 2014, Vol. 5, No. 2, online.
    • Rauch J.: Observational Calculi and Association Rules. Studies in Computational Intelligence, Vol. 469, Springer, 2013.
    • Kuchař J., Kliegr T.: GAIN: web service for user tracking and preference learning – a smart TV use case. In: RecSys ’13, ACM, 2013.
    • Chudán D., Svátek V.: Advanced Mining of Association Rules over Periodic Snapshots in a Data Warehouse. In: I-KNOW 2013, ACM, 28:1-28:4, 2013.
    • Berka P.: Towards Comprehensive Concept Description Based on Association Rules. In: IDA’13, Springer LNCS, 2013.
    • Dojchinovski M., Kliegr T.: Entityclassifier.eu: Real-Time Classification of Entities in Text with Wikipedia. In: ECML-PKDD’13, Springer LNCS, 2013.
    • Škrabal R., Šimůnek M., Vojíř S., Hazucha A., Marek T., Chudán D., Kliegr T.: Association Rule Mining Following the Web Search Paradigm. In: ECML-PKDD’12, Springer LNCS, 2012.
    • Berka P.: Learning compositional decision rules using the KEX algorithm. Intelligent Data Analysis, 2012, Vol. 16, No. 4.

Education

Activities of the group are reflected in several courses taught at the University, most notably the MSc level courses:

A specialized Bc level course is:

A data mining primer is also provided as part of the Bc level course (mandatory for all students of the Informatics specialty):

Finally, there is also a relevant PhD-level course:

Older news:

  • November 2015: Jan Rauch and Milan Šimůnek have been awarded the UEP Rector’s Prize for their book on the LISp-Miner system.
  • September 2015: David Chudán successfully defended his PhD thesis on Association rule mining as a support for OLAP on September 22. Congratulations! (David remains at the Department, now as project worker and manager funded from OpenBudgets.eu.)
  • September 2015: The LHD dataset developed by our group is available for download as part of the DBpedia 2015 release.
  • September 2015: Tomáš Kliegr is starting his 6-month post-doc internship at University of Darmstadt. He will be mainly working with prof. Johannes Fürnkranz in the field of rule/preference learning.
  • August 2015: At the RuleML 2015 conference in Berlin, the DMKD team presented the new EasyMiner/R interface, and Tomáš Kliegr co-chaired the RecSysRules 2015 challenge.
  • June 2015: Václav Zeman defended his PhD project progress on “Data mining on linked data” (after first year).
  • May 2015: The successful Know@LOD workshop with the 3rd Linked Data Mining Challenge (LDMC), co-chaired by V. Svátek, was held at the ESWC2015 conference in Portoroz – it was the most attended of all 16 workshops.
  • May 2015: The EU Horizon 2020 project OpenBudgets.eu (web still under construction) started. The DMKD team (led by V. Svátek) will contribute to the WP related to budget/spending open data mining.
  • May 2015: The EU LinkedTV Integrated project was concluded by a successful final review. The rating of the project eventually was ‘Excellent Progress’ (i.e., the best possible). A decent part of the project outcomes is due to the DMKD group development efforts (coordinated by T. Kliegr).
  • February 2015: Tomáš Kliegr has been awarded a CESNET grant on “Pilot application for distributed analysis of big data”, which will allow his team to employ the computing capacity of the CESNET Metacenter for the research tasks addressed by the DMKD group.
  • February 2015: Stanislav Vojíř obtained the Best Paper award in the Applied Informatics category at the annual PhD research symposium of the Faculty.
  • February 2015: David Pejčoch successfully defended his PhD thesis on “complex management of data and information quality”.
  • November 2014: The article “Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery”, by Tomáš Kliegr, has been accepted to the Elsevier Journal of Web Semantics (IF=1.377).
  • October 2014: The Linked Hypernym Dataset (LHD), a large collection of type assignments to RDF entities built by the THD tool co-developed by Tomáš Kliegr, has been integrated into the official version of German DBpedia.
  • September 2014: A new PhD student, Václav Zeman, enrolled in September 2014 (with V. Svátek as Advisor) and joined the group (as well as the SWOE group). He will be working on novel data mining techniques (with special focus on linked data), as well as maintaining the Czech DBpedia.
  • August 2014: A project named Automated business rules extraction with feedback loop, funded by TACR, Technological Agency of the Czech Republic, started. Jan Rauch, Milan Šimůnek and Stanislav Vojíř take part in this project.
  • August 2014: The RuleML conference collocated with ECAI 2014 (Aug 18-20, 2014) was co-organized by Jan Rauch, Tomáš Kliegr and Stanislav Vojíř as Local Chairs. Tomáš also co-organized its Special Track on ‘Learning (Business) Rules from Data’.
  • May 2014: The second edition of the Linked Data Mining Challenge was co-organized by Vojtěch Svátek in connection with the Know@LOD workshop (May 25, 2014) collocated with the ESWC conference in Crete.