Enhancing biomedical question answering systems for COVID-19
Βελτίωση συστημάτων απάντησης σε βιοϊατρικές ερωτήσεις για τον COVID-19
Master Thesis
Author
Philippas, Ioannis - Andreas
Φίλιππας, Ιωάννης - Ανδρέας
Date
2024-02
Keywords
Natural language processing ; NLP ; Question answering ; Information retrieval ; COVID-19 ; Transformers ; BERT ; SBERT ; Generative Pseudo Labeling (GPL)
Abstract
In biomedical research and services, retrieving relevant information from diverse data sources remains a critical challenge. Traditional Information Retrieval (IR) systems often struggle with the complexity and specificity of the biomedical domain. The COVID-19 pandemic has underscored the need for robust biomedical Question Answering (QA) systems capable of rapidly retrieving accurate and relevant information from validated biomedical literature sources. This thesis proposes an innovative approach that integrates dense neural networks with traditional IR methods to enhance the performance of biomedical QA systems, with a primary focus on COVID-19-related questions.
At its core, the system uses dense models, namely transformer-based architectures such as BERT (Bidirectional Encoder Representations from Transformers), which are known for their ability to capture semantic relationships and context in text. These models are trained on large-scale biomedical corpora to learn domain-specific language and terminology. Traditional IR methods such as BM25 complement the dense retrieval infrastructure by providing an efficient mechanism for initial document retrieval based on keyword matching and statistical relevance scoring. By combining the two approaches, the proposed system aims to improve the accuracy, relevance and efficiency of biomedical QA, particularly in the context of COVID-19. The system also incorporates a reader module trained on both biomedical and general QA datasets. This module applies machine reading comprehension techniques to the retrieved documents to extract precise answers to user queries.
The proposed QA system is supported by a web application that offers a user-friendly interface for submitting biomedical questions. The back end orchestrates the components that retrieve documents stored in a vector database, rank them by relevance, and extract or generate candidate answers, which are then presented to the user. Users can also customize system parameters through the interface, which enhances the system’s usability.
By adapting advanced neural networks such as BERT and other Transformer-based models to the biomedical domain, the system achieved higher evaluation scores than traditional and zero-shot methods. This thesis underscores the potential of dense models and QA systems to revolutionize biomedical IR, offering promising directions for future research and practical applications in enhancing the accessibility of critical biomedical knowledge.
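The two-stage retrieval idea described in the abstract (BM25 for initial document retrieval, followed by a dense model that ranks candidates by semantic similarity) can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the parameter values `k1=1.5` and `b=0.75`, and the sample documents are assumptions made for illustration, and `toy_embed` is a hypothetical stand-in (character trigram counts) for a neural encoder such as SBERT, which the real system would use instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return the Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def toy_embed(text):
    """Hypothetical placeholder for a neural sentence encoder.

    Returns character trigram counts; a real system would return a
    dense vector produced by a model such as SBERT.
    """
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_search(query, docs, embed, first_stage_k=10, top_k=3):
    """BM25 selects candidates; the embedding similarity re-ranks them."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i],
                        reverse=True)[:first_stage_k]
    q_vec = embed(query)
    reranked = sorted(((i, cosine(q_vec, embed(docs[i]))) for i in candidates),
                      key=lambda pair: pair[1], reverse=True)
    return reranked[:top_k]


if __name__ == "__main__":
    sample_docs = [
        "covid vaccine side effects in adults",
        "bm25 ranks documents by keyword overlap",
        "influenza symptoms and treatment",
    ]
    for idx, sim in hybrid_search("covid vaccine", sample_docs, toy_embed):
        print(idx, round(sim, 3), sample_docs[idx])
```

In this sketch, `embed` is passed in as a parameter so that the trigram placeholder can be replaced by a real encoder without changing the retrieval logic. A reader module would then extract answers from the top-ranked documents.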