Research Focus

The Data Mining and Knowledge Discovery (DMKD) group at KIZI (one of its four research groups, overarched by the virtual Knowledge Engineering Group) undertakes research in analyzing various kinds of data in structured, semi-structured and textual form, and deriving useful knowledge from it. The focal areas of the group currently are:

  • Developing new tools for dealing with domain knowledge in data mining
  • Developing new tools for automation of data mining
  • Research on algorithmic bias in machine learning and on the understandability of data mining results
  • Mining from RDF knowledge graphs
  • Accessing “classical” data mining tools via a web interface, using a “web search” metaphor, and sharing data mining results in structured form over the web
  • Extracting structured data from free or semi-structured text
  • Disambiguation of textual entities, their open-class classification and linking to semantic resources (such as DBpedia)
  • Web / multimedia usage mining and user preference learning

Research on data mining was present at the Department long before the term was coined: tools for combinatorial data analysis (KAD) and “learning an expert system from observational data” (ESOD, later re-implemented by P. Berka as KEX), both derived from the even earlier GUHA method, appeared under the supervision of J. Ivánek in the early 1980s. Since the mid-1990s the flagship data mining tool of KIZI has been the LISp-Miner system (conceived by J. Rauch and developed by M. Šimůnek). It has recently undergone a major redesign centered around the new LM Workspace module and now offers scripting support based on the LISp-Miner Control Language (LMCL); an extensive bibliography of GUHA and LISp-Miner is available in Czech. Most recently, a family of web-oriented data mining tools has arisen under the leadership of T. Kliegr; in 2011 this line of work integrated the CMS-based reporting tool SEWEBAR, leveraging background knowledge.

In parallel, there has been ongoing work on mining from texts, with a special focus on Wikipedia: the Targeted Hypernym Discovery method (THD, now part of the tool) and the associated LHD dataset.

A recent thread addresses mining rules from RDF knowledge graphs: the RDFRules system.

The research has been supported by a number of research projects. The most important recent one was LinkedTV, an Integrated Project funded by the EU FP7 (2011-2015), under which a text mining tool and a recommender were developed (under the supervision of T. Kliegr). There have also been several CSF (Czech Science Foundation) projects, coordinated by J. Rauch. More recently, the group was engaged in an EU Horizon 2020 project (2015-2017), in which fiscal data were analyzed using various EasyMiner components.

The group has also co-organized several international events, most notably RuleML 2014 (T. Kliegr, J. Rauch) and ISMIS 2009 (J. Rauch, P. Berka), as well as several editions of the ECML/PKDD Discovery Challenge (P. Berka) and of the Linked Data Mining Challenge (V. Svátek).