R-GCN for drug repurposing in heterogeneous biomedical knowledge graph with contraindication-informed negative sampling

Master Thesis
Author
Kalogianni, Maria
Καλογιάννη, Μαρία
Date
2025-09
Keywords
GNN; R-GCN; Heterogeneous knowledge graph; Drug repurposing
Abstract
Drug repurposing represents a cost-effective strategy for therapeutic development, with computational methods increasingly applied to uncover novel drug–disease associations. Traditional approaches often rely on random negative sampling, overlooking clinically meaningful constraints. This work explores a contraindication-informed hybrid sampling strategy that integrates 30,675 adverse drug–disease relationships from PrimeKG with random negatives, enabling clinically grounded evaluation of biomedical link prediction.
We systematically benchmark this framework using Node2Vec embeddings with machine learning classifiers and R-GCN architectures under different negative sampling ratios (1:3, 1:10). Our R-GCN achieves AUPRC scores of 0.91–0.92, performing competitively with state-of-the-art methods such as TxGNN and HGTDR. Literature-based validation of novel predictions demonstrates strong concordance with biomedical evidence, underscoring the translational potential of the approach.
By embedding clinical safety constraints directly into the modeling process, this work provides a framework for safety-aware biomedical link prediction. The findings emphasize the importance of domain-informed evaluation protocols and highlight a practical path toward more reliable AI-driven drug repurposing.
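To make the hybrid sampling idea concrete, the following is a minimal sketch of how contraindicated pairs could be combined with random negatives at a fixed positive-to-negative ratio. It is an illustration under stated assumptions, not the thesis implementation: the function name `hybrid_negatives`, its parameters, the toy identifiers, and the rule "use contraindications first, then fill with random pairs" are all hypothetical, since the abstract does not specify how the two sources are mixed.

```python
import random


def hybrid_negatives(positives, contraindications, drugs, diseases, ratio=3, seed=0):
    """Return ratio * len(positives) negative (drug, disease) pairs.

    Assumption (not from the thesis): contraindicated pairs are used first
    as clinically grounded negatives, and random pairs fill the remainder.
    """
    rng = random.Random(seed)
    positives = set(positives)
    target = ratio * len(positives)

    # Contraindicated pairs that are not also known indications.
    hard = sorted(set(contraindications) - positives)
    rng.shuffle(hard)
    negatives = hard[:target]
    chosen = set(negatives)

    # Random pairs that are neither positives nor already selected.
    # Enumerating all pairs is fine for a toy example; a large graph
    # would need rejection sampling instead.
    pool = [
        (d, s)
        for d in drugs
        for s in diseases
        if (d, s) not in positives and (d, s) not in chosen
    ]
    needed = target - len(negatives)
    if needed > len(pool):
        raise ValueError("not enough candidate pairs for the requested ratio")
    rng.shuffle(pool)
    return negatives + pool[:needed]


# Toy usage with made-up identifiers and a 1:3 positive-to-negative ratio.
drugs = ["d1", "d2", "d3", "d4"]
diseases = ["s1", "s2", "s3", "s4"]
positives = [("d1", "s1"), ("d2", "s2")]
contraindications = [("d1", "s2"), ("d3", "s3")]
negs = hybrid_negatives(positives, contraindications, drugs, diseases, ratio=3)
```

Setting `ratio=10` in the same call would correspond to the 1:10 setting mentioned in the abstract, provided enough candidate pairs exist.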


