Seminar: Document Classification with Supervised Latent Feature Selection

Date and time 8. 3. 2012 10:30 - 12:00
Room 403 NB

Speaker: Ondřej Háva

Classifying text documents into categories generally involves a high-dimensional structured representation of the documents. To favor the classifier's generality over its accuracy, some dimensionality reduction technique has to be applied. We propose a classification algorithm that exploits the hidden structure of uncorrelated topics extracted from the training documents, together with their known categories, which need not be independent. The proposed classifier relies on singular value decomposition of the input and target variables and can incorporate various methods of hidden feature selection. We evaluated three feature selection procedures on two different collections of text documents. Note: the slides are in Czech.
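The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the speaker's method. It assumes a small dense term-count matrix, one-hot category indicators as targets, and one possible selection rule (rank the latent input features by how strongly they align with the latent target directions). The function names `fit_latent_classifier` and `predict`, the toy data, and the scoring rule are all hypothetical.

```python
import numpy as np


def fit_latent_classifier(X, Y, n_keep):
    """Fit a linear classifier on supervised-selected SVD features.

    X: (n_docs, n_terms) term counts; Y: (n_docs, n_classes) one-hot targets.
    Assumption: this selection rule is illustrative, not taken from the talk.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean

    # SVD of the centered inputs: the columns of U are uncorrelated latent features.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    nonzero = s > s.max() * 1e-10
    U, s, Vt = U[:, nonzero], s[nonzero], Vt[nonzero]

    # SVD of the centered targets: latent directions of the (dependent) categories.
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    Uy = Uy[:, sy > sy.max() * 1e-10]

    # Supervised selection: score each input feature by its alignment with the targets.
    scores = ((U.T @ Uy) ** 2).sum(axis=1)
    selected = np.argsort(scores)[::-1][:n_keep]

    # Columns of U are orthonormal, so the least-squares fit is a plain projection.
    B = U[:, selected].T @ Yc
    # Maps a centered document onto the selected latent features.
    W = Vt[selected].T / s[selected]
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "B": B}


def predict(model, X):
    """Return the index of the highest-scoring category for each document."""
    Z = (X - model["x_mean"]) @ model["W"]
    return np.argmax(Z @ model["B"] + model["y_mean"], axis=1)


# Toy data (made up): terms 0-1 dominate category 0, terms 2-3 dominate category 1.
X_train = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [3, 3, 1, 0],
                    [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 3, 3]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
Y_train = np.eye(2)[labels]

model = fit_latent_classifier(X_train, Y_train, n_keep=1)
train_pred = predict(model, X_train)
new_pred = predict(model, np.array([[4.0, 3.0, 0.0, 0.0], [0.0, 0.0, 2.0, 4.0]]))
print(train_pred, new_pred)
```

Because the latent input features are orthonormal, dropping or adding a feature does not change the coefficients of the others, which is what makes it cheap to compare different selection procedures on the same decomposition.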