Document clustering and topic mining
Ομαδοποίηση εγγράφων και εξόρυξη θεμάτων

Master Thesis
Author
Ατλαμάζογλου, Ιωάννης
Atlamazoglou, Ioannis
Date
2021-07-05
Advisor
Πετάσης, Γεώργιος
Petasis, Georgios
Keywords
Document clustering ; Topic modeling ; Topic extraction
Abstract
The purpose of this thesis is topic extraction from documents in the Greek language
and clustering of the documents according to these topics, so that documents
that refer to the same topic, or are otherwise similar, belong to the same cluster. After
a review of related work, popular topic extraction models such as
LDA and text representation methods such as BERT and fastText, which are
among the state-of-the-art technologies for producing text representations in the
form of vectors, were explored and applied. To evaluate the quality of the document
clustering obtained from these vector embeddings, several metrics
suitable for such tasks are applied.
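
As an illustration of how a clustering of document vectors can be scored, the following is a minimal sketch and not the thesis's own code. The abstract does not name the metrics used, so the silhouette coefficient is chosen here purely as an assumption. The toy two-dimensional vectors, the label lists, and the helper names `euclidean` and `silhouette_score` are hypothetical stand-ins for real document embeddings and cluster assignments.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points (assumed metric)."""
    # Group point indices by their cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, vec in enumerate(vectors):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(vec, vectors[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(vec, vectors[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(vectors)


# Hypothetical toy "embeddings": two well-separated groups of three documents.
doc_vectors = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
               (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good = silhouette_score(doc_vectors, good_labels)
bad = silhouette_score(doc_vectors, bad_labels)
print(f"good clustering: {good:.3f}, bad clustering: {bad:.3f}")
```

A score close to 1 indicates compact, well-separated clusters, while a score near or below 0 indicates that documents sit closer to another cluster than to their own. With real embeddings, cosine distance is a common alternative to the Euclidean distance assumed here.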