A comparative analysis of data parallelism and model parallelism for deep learning-based text classification

Bachelor Dissertation
Author
Giagias, Dimitrios
Date
2025-09
Advisor
Venetis, Ioannis
Keywords
Deep learning; Parallel computing; Data parallelism; Model parallelism; Text classification; Distributed training; Natural Language Processing (NLP)
Abstract
The increasing computational demands of modern deep learning models for natural language processing have made parallel training strategies essential in practice. While the theoretical foundations of parallel training are well established, empirical comparisons of different approaches applied to text classification architectures remain limited. This thesis presents an experimental comparison of data parallelism and model parallelism across four representative deep learning architectures: Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Transformers.
An experimental framework was developed using PyTorch to evaluate three training strategies: sequential baseline training, data parallelism using DistributedDataParallel, and model parallelism through manual layer partitioning across two GPUs. All experiments were conducted on the AG News dataset, ensuring standardized evaluation conditions across different architectures and parallelization approaches. The experimental evaluation focused on two critical aspects: computational efficiency measured through training time and speedup analysis, and model quality assessed through standard classification metrics including accuracy, precision, recall, and F1-score.
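To illustrate the two strategies evaluated in the framework, the sketch below shows (a) manual layer partitioning of a toy text classifier across two GPUs and (b) wrapping a single-device model in PyTorch's DistributedDataParallel. This is a minimal sketch, not the thesis code: the TwoGPUClassifier class, the ddp_setup helper, and all layer sizes are illustrative assumptions.

# Minimal sketch of the two parallelization strategies on a toy classifier.
import torch
import torch.nn as nn

# --- Model parallelism: manual layer partitioning across two GPUs ---
class TwoGPUClassifier(nn.Module):  # hypothetical example model
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256, num_classes=4):
        super().__init__()
        # First part of the network is placed on GPU 0 ...
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim).to("cuda:0")
        self.encoder = nn.Linear(embed_dim, hidden_dim).to("cuda:0")
        # ... and the classifier head on GPU 1.
        self.head = nn.Linear(hidden_dim, num_classes).to("cuda:1")

    def forward(self, tokens):
        x = torch.relu(self.encoder(self.embed(tokens.to("cuda:0"))))
        # Activations cross the device boundary here; this transfer is the
        # inter-GPU communication cost discussed in the results.
        return self.head(x.to("cuda:1"))

# --- Data parallelism: one full model replica per GPU via DDP ---
# Launched with e.g.: torchrun --nproc_per_node=2 train.py
def ddp_setup(single_device_model):  # hypothetical helper
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    dist.init_process_group(backend="nccl")  # rank/world size come from torchrun env vars
    rank = dist.get_rank()
    model = single_device_model.to(rank)
    # DDP keeps a full replica on each GPU and all-reduces gradients every step.
    return DDP(model, device_ids=[rank])

In the data-parallel path, the training DataLoader would typically use a DistributedSampler so that each process sees a distinct shard of the dataset in every epoch.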
Results demonstrated that Data Parallelism consistently delivered substantial speedups (1.30×–1.80×) across all architectures while maintaining or improving model accuracy, including a notable +4.58% gain for the GRU attributable to reduced overfitting. In contrast, Model Parallelism provided only modest acceleration (1.04×–1.12×) and was highly sensitive to hardware topology, with performance degrading when inter-GPU communication costs dominated.
The findings lead to a clear conclusion: Data Parallelism is the preferred strategy when models fit within a single device, offering strong throughput gains with minimal implementation cost, whereas Model Parallelism remains valuable primarily as a memory-scaling tool for architectures that exceed single-GPU capacity.


