Fact checking with large language models
Έλεγχος γεγονότων με την χρήση μεγάλων γλωσσικών μοντέλων (Greek title; "Fact checking using large language models")
Master's Thesis
Author
Koufopoulos, Ioannis - Aris
Κουφόπουλος, Ιωάννης - Άρης
Date
2024-07
Keywords
Large language models; Automated fact verification; Text embedding models; Greek claim verification dataset
Abstract
The recent increase in attention to misinformation has stimulated research in fact checking, the task of assessing the truthfulness of a claim. Automated fact-checking pipelines have attracted growing interest over the past few years, and the rapid development of technologies that can support such pipelines has paved the way for revising current methods and approaches. Swift advances in deep learning, together with the emergence of large language models (LLMs), have produced various techniques that can successfully replace traditional natural language processing (NLP) approaches. For the purposes of this thesis, we therefore take an established fact-checking pipeline and replace most of its methods with modern architectures, such as text embedding models, which are commonly found in LLMs. We also expand the dataset that was constructed to store Greek-language statements, and we replace the classification models used to produce a verdict with cutting-edge neural networks. The results show that our approach is very promising in many cases, often matching or surpassing the accuracy achieved by traditional NLP techniques. Finally, our methodology leaves considerable room for further enhancement, since the final pipeline is highly customisable and provides a basis for additional experimentation: the dataset can be enlarged further, more effective text embedding models may appear in the future, and classification models with different architectures may prove more reliable at capturing patterns.
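The abstract does not spell out the pipeline's individual components. As a rough illustration of the general idea of replacing traditional NLP steps with text embeddings, the following minimal Python sketch ranks candidate evidence sentences by cosine similarity to a claim and builds a claim/evidence feature vector that a verdict classifier could consume. It is not the thesis's implementation: `embed` is a hypothetical, dependency-free stand-in (a hashed bag of words) for a real text embedding model, and `retrieve_evidence`, `pair_features`, `DIM` and the `[u, v, |u - v|, u * v]` feature layout are assumptions chosen for illustration.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical embedding size for the toy encoder below


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for a text embedding model (hashed bag of words).

    A real pipeline would call a pretrained embedding model here; this
    hashing trick only exists so the sketch runs without dependencies.
    """
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_evidence(claim: str, sentences: list[str], k: int = 2) -> list[str]:
    """Return the k sentences whose embeddings are most similar to the claim's."""
    claim_vec = embed(claim)
    scored = [(cosine(claim_vec, embed(s)), s) for s in sentences]
    # Sort by score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def pair_features(claim: str, evidence: str) -> list[float]:
    """Claim/evidence features [u, v, |u - v|, u * v] for a downstream classifier."""
    u, v = embed(claim), embed(evidence)
    return u + v + [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]


if __name__ == "__main__":
    sentences = [
        "The Parthenon is located in Athens.",
        "Bananas are rich in potassium.",
        "Athens is the capital of Greece.",
    ]
    claim = "Athens is the capital of Greece"
    print(retrieve_evidence(claim, sentences, k=1))
    print(len(pair_features(claim, sentences[2])))  # 4 * DIM features
```

In practice `embed` would be replaced by a call to a pretrained (ideally multilingual) embedding model, and the output of `pair_features` would be passed to whichever neural classifier produces the verdict. Because `\w+` is Unicode-aware for Python 3 strings, Greek claims are tokenized the same way as English ones.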