Data Mining and Knowledge Discovery
The Data Mining and Knowledge Discovery (DMKD) group at KIZI is one of its four informal working groups. Research-wise, it is also a constituent of a larger research team called Data Science and Explainable AI (DSXAI); however, DMKD’s scope covers not only research but also education, including at the undergraduate level.
DMKD undertakes research in analyzing various kinds of data in structured, semi-structured and textual form, and deriving useful knowledge from it, which is now also commonly labeled as Data Science. The focal areas of the group currently are:
- Research on algorithmic bias in machine learning and on the understandability of data mining results
- Developing new tools for dealing with domain knowledge in data mining
- Developing new tools for automation of data mining
- Accessing “classical” data mining tools via a web interface, using a “web search” metaphor, and sharing data mining results in structured form over the web
- Extracting structured data from free or semi-structured text
- Disambiguation of textual entities, their open-class classification and linking to semantic resources (such as DBpedia)
- Web / multimedia usage mining and user preference learning.
DMKD’s color among the KIZI groups is blue, referring to the ‘color’ of the ‘oceans’ of data that can be submitted to data mining and knowledge discovery tools.
News
- 15 October 2024: The COST Action “GOBLIN: Global Network on Large-Scale, Cross-domain and Multilingual Open Knowledge Graphs” is starting, see info. Tomáš Kliegr was one of the initiators of the proposal, and will serve as a Management Committee member for Czechia.
- 31 May 2024: The CIMPLE project (with six DMKD members involved in the team) was successfully completed, producing four software prototypes directly or indirectly related to the tasks of misinformation detection and explanation, and 13 academic papers, including four in high-standing journals.
- 31 December 2023: The EU H2020 project HeartBIT 4.0 – “Application of innovative Medical Data Science for heart diseases”, with Prof. P. Berka as participant coordinator for VSE, was successfully completed.
- 1 April 2021: A new EU-level (CHIST-ERA programme) project is starting, under the name CIMPLE. Its topic is “Countering Creative Information Manipulation with Explainable AI” and its coordinator is Prof. R. Troncy from EURECOM (France); VSE is one of the other four partners. The team (consisting of members of SWOE, DMKD and others) is led by Vojtěch Svátek, and will provide expertise in knowledge graphs, NLP, machine learning as well as data visualization.
- Submission deadline: March 31, 2021. You are invited to submit to the Special Issue on Explainable and Interpretable Machine Learning and Data Mining of the Data Mining and Knowledge Discovery journal (DMKD, Springer). Guest editors: Martin Atzmueller, Johannes Fürnkranz (editor-in-chief), Tomáš Kliegr, Ute Schmid.
- 1 March 2021: Two new, two-year, IGA (i.e., institutionally-funded) projects are starting. One, on cyber threat detection in a university network, is coordinated by Pavel Strnad, and follows up on the successfully completed IGA project by the same coordinator from 2019-2020. The other, on action rule mining over textual data, is coordinated by Lukáš Sýkora.
- 17 February 2021: Tomáš Kliegr was invited to present at the First international school on teaching and learning Machine Learning in Business Schools, organized by BigML.
- 10 February 2021: The on-line first version of the article by Tomáš Kliegr, Štěpán Bahník and Johannes Fürnkranz, A review of possible effects of cognitive biases on the interpretation of rule-based machine learning models, appeared in Elsevier’s Artificial Intelligence journal (IF>6). The article is open access.
- 20 November 2020: An article on RDFRules: Making RDF Rule Mining Easier and Even More Efficient, authored by Václav Zeman, Tomáš Kliegr and Vojtěch Svátek, got accepted to IOS Press’ Semantic Web (IF > 3).
- 11 September 2020: A paper on Editable Machine Learning Models? A rule-based framework for user studies of explainability, by S. Vojíř and T. Kliegr, appeared online in Springer’s Journal of Advances in Data Analysis and Classification (IF>1.5).
- September 2020: A new PhD student joined the group. Lukáš Sýkora will be supervised by T. Kliegr, and will work on the topic of action rule mining.
- July 2020: An article on TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique, with Gollam Rabby as first author, appeared in a printed issue of Springer’s Cognitive Computation journal (IF > 4).
- July 2020: The paper presenting Lukáš Sýkora’s master thesis (supervised by T. Kliegr), Action Rules: Counterfactual Explanations in Python, won the 14th Rule Challenge 2020 competition.
- 29 June – 1 July 2020: Tomáš Kliegr was a Program Chair of the 4th International Joint Conference on Rules and Reasoning (RuleML+RR 2020), eventually held online.
- April 2020: An article on cognitive preferences and the plausibility of rule-based models, co-authored by Tomáš Kliegr, appeared (with open access!) in a printed issue of Springer’s Machine Learning journal (IF ~ 2.7).
- 5 February 2020: Gollam Rabby succeeded in receiving funding for his project on “Knowledge Engineering of Researcher Data” (KNERD) from the VSE internal grant agency (IGA). The project will span between the DMKD and SWOE groups, and will involve two faculty and two other students.
- 1 February 2020: Our paper Advances in machine learning for the behavioral sciences appears in a printed issue of Sage’s American Behavioral Scientist (IF ~ 1.4). Preprint: https://arxiv.org/abs/1911.03249.
- 1 January 2020: A new EU H2020 project with participation of a DMKD team (led by Petr Berka) is starting, under the name HeartBIT 4.0. DMKD will primarily provide methodological expertise in data mining, focused on the medical domain.
- 27 December 2019: An article on Associative Classification in R, co-authored by Tomáš Kliegr, appeared online (with open access!) in the R Journal (IF ~ 2.7).
- September 2019: A new PhD student joined the group. Gollam Rabby from Bangladesh (and with MSc. from Univ. Malaysia Pahang) will be supervised by Tomáš Kliegr, and will work on the topic of scholarly knowledge graphs. He will also collaborate with colleagues from the neighboring SWOE group.
- 22-24 September 2019: The Cognitive Systems research group from Univ. of Bamberg (led by prof. Ute Schmid) visited the Department upon an invitation by T. Kliegr. The visit mainly focused on technical discussions related to using our RDF-based association rule mining system RDFRules for learning ILP-style (Aleph) rules. As part of the visit, a CogSci group member, Bettina Finzel, gave a talk at our KEG seminar, and a workshop for students within the course Data Science in R and Python (4IZ566).
- 16-19 September 2019: The EasyMiner team was awarded the Best RuleML Challenge prize at RuleML’19 in Bolzano. The awarded paper is by Jiří Filip and Tomáš Kliegr: “PyIDS – Python Implementation of Interpretable Decision Sets Algorithm by Lakkaraju et al, 2016.” RuleML Challenge (2019). The implementation is available on GitHub as the pyIDS Python package.
- 26 April 2019: An article on data mining with histograms and domain knowledge, by J. Rauch and M. Šimůnek, appeared in IOS Press’ Fundamenta Informaticae journal (IF ~ 0.7).
- 1 April 2019: An article on expert deduction rules in data mining with association rules, by J. Rauch, appeared in Springer’s KAIS journal (IF > 2).
- 19 March 2019: An article on ML support for EU project categorization, by O. Zamazal, appeared in Oxford’s Computer Journal (IF ~ 0.8).
- 1 March 2019: Tomáš Kliegr successfully concluded the habilitation process and became Associate Professor. This entitles him to both full PhD student advisorship (he will be taking over Václav Zeman) and a tenured contract.
- 1 March 2019: A new, two-year, IGA (i.e., institutionally-funded) project is starting, coordinated by Pavel Strnad and involving 2 faculty and 4 further (PhD and MSc) students. The project focuses on security aspects of user behavior models in computer networks.
- 21-24 January 2019: We enjoyed a visit of Eneldo Loza Mencía from University of Darmstadt. He led a short course on Data Mining and Machine Learning, and also gave a KEG seminar talk on his recent research.
- 25 September 2018: Two new PhD students joined the group. Both Pavel Strnad (supervised by P. Berka) and Lukáš Švarc (supervised by J. Ivánek) will research machine-learning-based anomaly detection in (various kinds of) computer network data.
- 15 June 2018: An article on the EasyMiner system (see also the system webpage), by S. Vojíř and colleagues, appeared in Elsevier’s KnowSys journal (IF > 4.5).
- 25 May 2018: Tomáš Kliegr gave an interview to Deutsche Welle on the role of algorithmic bias in machine learning. The article entitled “Can AI be free of bias?” can be freely accessed.
- 24 May 2018: An article on term similarity benchmarks, joint work of T. Kliegr with O. Zamazal (from the SWOE group), appeared in Elsevier’s Data & Knowledge Engineering journal (IF ~ 1.7).
- 14 March 2018: An article on meta-learning in association rule post-processing, by P. Berka, appeared in IOS Press’ Intelligent Data Analysis journal (IF ~ 0.8).
- 12 February 2018: The first term of the brand-new MSc.-level course on Data Science in Python and R (taught by T. Kliegr, in English) started. The department’s curriculum is now better balanced wrt. the modern programming-oriented paradigms in data science.
- 13 November 2017: In the prestigious ACM-endorsed Czecho-Slovak competition of MSc. theses, IT SPY (the base round involving about 1900 theses from 20 universities), the thesis by B. Koukal, supervised by D. Chudán, was invited to the final round (of the nine best theses), where it obtained the SAP Award for Contribution to the Field of Enterprise Information Systems. Big congratulations!
- 1 November 2017: An article on the InBeat system, by J. Kuchař and T. Kliegr, appeared in Elsevier’s KnowSys journal (IF > 4.5).
- October 2017: A new PhD student, Ivan Jelínek, joined the group (being already in his 3rd year, by transfer from DIT due to his advisor’s retirement). His topic is “Unstructured Data Analysis on Social Networks”; he will be supervised by P. Strossa from SWOE (officially and along the linguistic line) but presumably also helped by relevant DMKD folks.
- 19 August 2017: An article comparing the GUHA and Apriori data mining methods, by J. Rauch and M. Šimůnek, appeared in IOS Press’ IDA journal.
- 21 June 2017: The LHD tool by T. Kliegr won the 1st round of the DBpedia Open Extraction Challenge (TextExt) collocated with the LDK conference.
- 31 November 2016: The JWS journal paper by T. Kliegr, joint with O. Zamazal (SWOE group), was ranked 3rd in the annual Rector’s award. Congratulations!
- October, 2016: The project “Pilot application for distributed analysis of big data” (led by T. Kliegr), funded by the CESNET association, has been successfully defended.
- October 4, 2016: Tomáš Kliegr gave a talk on association rule classification and the EasyMiner system developed by the DMKD group at the IEEE Days at the University of West Bohemia.
- October 2016: A new PhD student, Jiří Zettel, joined the group. He will be supervised by prof. Petr Berka and his topic will be related to data pre-processing for data mining.
- September 20, 2016: Stanislav Vojíř has successfully defended his PhD thesis “Business Rule Learning using data mining of GUHA association rules”. He remains a member of our team.
- August 2016: The LHD dataset developed by our group is available for download as part of the DBpedia 2015 release.
- June 2016: A new paper by Tomáš Kliegr (and Ondřej Zamazal), LHD 2.0: A text mining approach to typing entities in knowledge graphs, appears in Elsevier’s Journal of Web Semantics. It follows up on the recent LHD paper in the same journal.
- April 2016: T. Kliegr and his team presented EasyMiner at the popular Machine Learning Meetup.
- November 2015: Jan Rauch and Milan Šimůnek have been awarded the UEP Rector’s Prize for their book on the LISp-Miner system.
- September 2015: David Chudán successfully defended his PhD thesis on Association rule mining as a support for OLAP on September 22. Congratulations! (David remains at the Department, now as project worker and manager funded from OpenBudgets.eu.)
- September 2015: The LHD dataset developed by our group is available for download as part of the DBpedia 2015 release.
- September 2015: Tomáš Kliegr is starting his 6-month post-doc internship at University of Darmstadt. He will be mainly working with prof. Johannes Fürnkranz in the field of rule/preference learning.
- August 2015: At the RuleML 2015 conference in Berlin, the DMKD team presented the new EasyMiner/R interface, and Tomáš Kliegr co-chaired the RecSysRules 2015 challenge.
- June 2015: Václav Zeman defended his PhD project progress on “Data mining on linked data” (after first year).
- May 2015: The successful Know@LOD workshop with 3rd Linked Data Mining Challenge (LDMC), co-chaired by V. Svátek, was held at the ESWC2015 conference in Portoroz – it was the most attended of all 16 workshops.
- May 2015: The EU Horizon 2020 project OpenBudgets.eu (web still under construction) started. The DMKD team (led by V. Svátek) will contribute to the WP related to budget/spending open data mining.
- May 2015: The EU LinkedTV Integrated project was concluded by a successful final review. The rating of the project eventually was ‘Excellent Progress’ (i.e., the best possible). A decent part of the project outcomes is due to the DMKD group development efforts (coordinated by T. Kliegr).
- February 2015: Tomáš Kliegr has been awarded a CESNET grant on “Pilot application for distributed analysis of big data”, which will allow his team to employ the computing capacity of the CESNET Metacenter for the research tasks addressed by the DMKD group.
- February 2015: Stanislav Vojíř obtained the Best Paper award in the Applied Informatics category at the annual PhD research symposium of the Faculty.
- February 2015: David Pejčoch successfully defended his PhD thesis on “complex management of data and information quality”.
- November 2014: The article “Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery”, by Tomáš Kliegr, has been accepted to the Elsevier Journal of Web Semantics (IF=1.377), see the article page.
- October 2014: The Linked Hypernym Dataset (LHD), a large collection of type assignments to RDF entities built by the THD tool co-developed by Tomáš Kliegr, has been integrated into the official version of German DBpedia.
- September 2014: A new PhD student, Václav Zeman, enrolled (with V. Svátek as Advisor) and joined the group (as well as the SWOE group). He will be working on novel data mining techniques (with special focus on linked data), as well as maintaining the Czech DBpedia.
- August 2014: A project named Automated business rules extraction with feedback loop, funded by TACR, Technological Agency of the Czech Republic, started. Jan Rauch, Milan Šimůnek and Stanislav Vojíř take part in this project.
- August 2014: The RuleML conference collocated with ECAI 2014 (Aug 18-20, 2014) was co-organized by Jan Rauch, Tomáš Kliegr and Stanislav Vojíř as Local Chairs. Tomáš also co-organized its Special Track on ‘Learning (Business) Rules from Data’.
- May 2014: The second edition of the Linked Data Mining Challenge was co-organized by Vojtěch Svátek in connection with the Know@LOD workshop (May 25, 2014) collocated with the ESWC conference in Crete.