Seminar: Linking and debugging datasets with RDF Keys

Date and time: 15 May 2012, 10:00 - 12:00
Room: 403 NB

Linking and debugging datasets with RDF Keys

Presenter: Francois Scharffe

We introduce a novel method for analysing Web datasets based on key dependencies. This particular kind of functional dependency, widely studied in database theory, makes it possible to evaluate whether a set of properties is a key for the dataset under consideration. When this is the case, no two instances have identical values for all of these properties. After giving the necessary definitions, we propose an algorithm for detecting minimal keys and pseudo-keys in an RDF dataset. Pseudo-keys are a relaxed version of keys that tolerate a few instances having the same values on the key. We then use this algorithm to detect keys in datasets published as Web data, and we apply the approach in two applications: (i) reducing the number of properties to compare in order to discover equivalent instances between two datasets, and (ii) detecting errors inside a dataset.

The slides of the talk are available from http://www.scharffe.fr/presentations/20120515-VSE-Prague/
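To make the notions of key and pseudo-key from the abstract concrete, here is a minimal brute-force sketch in Python. It is not the algorithm presented in the talk. The function names (`duplicate_count`, `minimal_keys`), the `max_duplicates` tolerance, the toy `people` data, and the choice to treat a missing property as an empty value set are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def duplicate_count(instances, properties):
    """Count instances whose values on `properties` coincide with another instance.

    Assumption: a missing property is treated as an empty set of values.
    """
    signatures = Counter(
        tuple(frozenset(values.get(p, ())) for p in properties)
        for values in instances.values()
    )
    return sum(n for n in signatures.values() if n > 1)


def minimal_keys(instances, max_duplicates=0):
    """Return minimal property sets with at most `max_duplicates` clashing instances.

    max_duplicates=0 yields strict keys; a larger value yields pseudo-keys
    under this sketch's (hypothetical) tolerance measure.
    """
    all_properties = sorted({p for values in instances.values() for p in values})
    found = []
    # Enumerate candidates by increasing size so smaller keys are found first.
    for size in range(1, len(all_properties) + 1):
        for candidate in combinations(all_properties, size):
            # A superset of a key already found cannot be minimal.
            if any(set(key) <= set(candidate) for key in found):
                continue
            if duplicate_count(instances, candidate) <= max_duplicates:
                found.append(candidate)
    return found


# Toy data: subject -> property -> list of values (all names are made up).
people = {
    "ex:p1": {"foaf:name": ["Ann"], "ex:birthYear": ["1980"], "ex:city": ["Prague"]},
    "ex:p2": {"foaf:name": ["Ann"], "ex:birthYear": ["1975"], "ex:city": ["Prague"]},
    "ex:p3": {"foaf:name": ["Bob"], "ex:birthYear": ["1980"], "ex:city": ["Brno"]},
    "ex:p4": {"foaf:name": ["Eve"], "ex:birthYear": ["1990"], "ex:city": ["Brno"]},
}

strict_keys = minimal_keys(people)
pseudo_keys = minimal_keys(people, max_duplicates=2)
print(strict_keys)
print(pseudo_keys)
```

On this toy data no single property is a strict key, while two pairs of properties are. Allowing two clashing instances turns `ex:birthYear` and `foaf:name` into single-property pseudo-keys; the instances that clash under such a pseudo-key are natural candidates to inspect as duplicates or errors. The exhaustive enumeration is exponential in the number of properties, so it is only meant to illustrate the definitions.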