Seminar: Revelation of the author’s identity using machine learning and stylometry

Date and time: 12 March 2015, 10:30 - 12:00
Room: 336 RB

Presenter: Jan Rygl

Many Internet users encounter anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, so a universal automated authorship identification system that supports all types of documents is needed. Currently prevailing techniques build on machine learning, with stylometry-based algorithms used to extract the features. At the NLP Centre, we have developed the Authorship Recognition Tool (ART) for the Ministry of the Interior. We are now working on the Style & Identity Recognizer (SIR), which solves stylometry-based tasks such as authorship recognition, translation detection, and age and gender prediction. Techniques such as double-layer machine learning, similarity-based features, and authorship corpora will be presented.