Seminar: English-to-Czech Machine Translation: Should We Go Shallow or Deep?

Date and time: 6 March 2008, 10:30 - 12:00
Room: 403 NB

Presenter: Ondřej Bojar

The purpose of my talk is to introduce two rather different approaches to machine translation (MT) that I am actively involved with. The first is so-called phrase-based MT, where sentences are treated as plain sequences of words, that is, opaque symbols. An input sentence is segmented into ‘phrases’ (or rather n-grams), and each phrase is translated nearly independently. A completely different approach is to automatically obtain a deep syntactic structure of the sentence (a tectogrammatical dependency tree in our case), decompose the tree into treelets, and translate the treelets independently. Both methods rely on large collections of training data, i.e. texts that were previously translated by humans. Because the two methods share training and evaluation data, we can directly compare their performance as well as their strong and weak points.