Seminar: A Tree-based Unsupervised Keyphrase Extraction Technique – CANCELLED
Date and time | 12 March 2020, 16:00 - 17:30 |
---|---|
Room | 473 NB |
A Tree-based Unsupervised Keyphrase Extraction Technique
Presenter: Gollam Rabby (KIZI VŠE; research carried out while at Univ. Pahang, Malaysia)
Automatic keyphrase extraction techniques aim to extract high-quality keyphrases that summarize a document at a high level. Among the existing techniques, some are domain-specific and require application domain knowledge, some are based on higher-order statistical methods and are computationally expensive, and some require large amounts of training data, which are rare for many applications. To overcome these issues, we introduce a new unsupervised automatic keyphrase extraction technique named TeKET (Tree-based Keyphrase Extraction Technique), which is domain-independent and employs limited statistical knowledge. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree structure is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived that denotes the degree of cohesiveness of a given node with respect to the root. CI is used to extract final keyphrases from the resultant tree in a flexible manner and, alongside Term Frequency, to rank keyphrases. The effectiveness of the proposed technique is evaluated experimentally on a benchmark corpus, SemEval-2010, with a total of 244 training and test articles, and compared with other relevant unsupervised techniques. Three evaluation metrics, namely precision, recall, and F1 score, are taken into consideration during the experiments. In this seminar, we will also introduce an idea for an automated researcher-profiling knowledge graph, which can extract information from the web and build a knowledge graph automatically.
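For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1 can be computed for one document's extracted keyphrases against a gold-standard list. It is not the evaluation code used in the work presented: the function name `evaluate_keyphrases`, the exact-match comparison after lowercasing, and the sample phrases are all assumptions made for illustration. Benchmark evaluations may normalise phrases differently (for example, by stemming) before matching.

```python
def evaluate_keyphrases(extracted, gold):
    """Return (precision, recall, f1) using exact matching on lowercased phrases.

    Hypothetical helper for illustration; the matching rule is an assumption.
    """
    extracted_set = {phrase.strip().lower() for phrase in extracted}
    gold_set = {phrase.strip().lower() for phrase in gold}

    # Phrases that appear in both the system output and the gold standard.
    matched = len(extracted_set & gold_set)

    # Precision: share of extracted phrases that are correct.
    precision = matched / len(extracted_set) if extracted_set else 0.0
    # Recall: share of gold phrases that were found.
    recall = matched / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data, not taken from SemEval-2010.
extracted = ["keyphrase extraction", "binary tree", "term frequency", "web page"]
gold = ["Keyphrase Extraction", "term frequency", "cohesiveness index"]

precision, recall, f1 = evaluate_keyphrases(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this made-up example, two of the four extracted phrases match the three gold phrases, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.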