Seminar: Lexical Association Measures and Collocation Extraction

Date and time: 26 November 2009, 10:30 - 12:00
Room: 403 NB

Lexical Association Measures and Collocation Extraction

Presenter: Pavel Pecina

We present an extensive empirical evaluation of collocation extraction methods based on lexical association measures and their combination. The experiments are performed on a set of collocation candidates extracted from the Prague Dependency Treebank, which carries manual morphosyntactic annotation. The candidates were manually labeled as collocational or non-collocational. The evaluation measures how well each method ranks the candidates by how likely they are to form collocations. The methods are compared using precision-recall curves and mean average precision scores. Further, we study the possibility of combining lexical association measures and present empirical results for several combination methods that significantly improve on the state of the art in this task. We also propose a model reduction algorithm that significantly reduces the number of combined measures without a statistically significant difference in performance.
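As a rough illustration of the evaluation setup described above, the following minimal sketch ranks a handful of candidates by one association measure and scores the ranking by average precision. Everything in it is an assumption made for illustration: pointwise mutual information stands in for the many measures covered in the talk, the counts and labels are invented toy data, and the function names `pmi` and `average_precision` are hypothetical rather than taken from the presenter's work.

```python
import math


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information of a word pair from raw counts.

    f_xy: joint frequency of the pair, f_x and f_y: marginal
    frequencies of each word, n: total number of pairs in the data.
    """
    return math.log2((f_xy * n) / (f_x * f_y))


def average_precision(ranked_labels):
    """Average precision of a ranked list of True/False labels.

    Precision is taken at each rank where a true collocation occurs
    and averaged over all true collocations in the list.
    """
    hits = 0
    total = 0.0
    for rank, is_collocation in enumerate(ranked_labels, start=1):
        if is_collocation:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Invented toy data: (candidate, f_xy, f_x, f_y, manual label).
N = 10000
candidates = [
    ("red tape", 30, 100, 60, True),
    ("of the", 200, 3000, 5000, False),
    ("prime minister", 40, 50, 80, True),
    ("new house", 10, 400, 300, False),
]

# Rank candidates from the highest to the lowest association score.
ranked = sorted(
    candidates,
    key=lambda c: pmi(c[1], c[2], c[3], N),
    reverse=True,
)
labels = [c[4] for c in ranked]
ap = average_precision(labels)

for name, *_ in ranked:
    print(name)
print(f"average precision: {ap:.2f}")
```

Averaging this score over several data splits or several measures would give a mean average precision figure; how exactly the talk aggregates the scores is not stated in the abstract.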