
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160 • Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Architectural Evolution in Distributed Training: From Parameter Servers to Zero Redundancy Systems
| Author(s) | Aditya Singh |
| --- | --- |
| Country | United States |
| Abstract | The rapid evolution of deep learning models has necessitated fundamental changes in distributed training architectures. This article reviews the architectural transformation of distributed training systems, from traditional parameter server approaches to modern innovations such as Ring-AllReduce and pipeline parallelism. It examines how these architectural advances, coupled with the Zero Redundancy Optimizer (ZeRO), address the critical challenges of memory efficiency and hardware utilization in large-scale model training. It further analyzes the synergy between architectural innovations and optimization algorithms, focusing on the Layer-wise Adaptive Moments optimizer for Batch training (LAMB) and Layer-wise Adaptive Rate Scaling (LARS), which enable stable training with large batch sizes (see the illustrative sketch after this record). It also explores gradient compression and quantization techniques that reduce communication overhead while maintaining model quality. The analysis reveals how these combined advances have transformed large-scale model training, enabling unprecedented model sizes while maintaining computational efficiency. Finally, the article discusses emerging challenges and future directions in distributed training architectures, particularly system complexity, fault tolerance, and energy efficiency. |
| Keywords | Distributed Training Architectures, Ring-AllReduce Networks, Pipeline Parallelism, Large-Scale Optimization, Neural Network Scaling |
| Field | Computer |
| Published In | Volume 6, Issue 6, November-December 2024 |
| Published On | 2024-12-29 |
| DOI | https://doi.org/10.36948/ijfmr.2024.v06i06.34214 |
| Short DOI | https://doi.org/g8xgmm |
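
To make the abstract's mention of layer-wise adaptive optimizers concrete, below is a minimal sketch of the LARS trust-ratio update referenced above. The function name `lars_update`, the hyperparameter values, and the omission of LARS momentum are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the LARS (Layer-wise Adaptive Rate Scaling) step.
# Hyperparameters and structure are illustrative, not the paper's implementation.
import numpy as np

def lars_update(weights, grads, base_lr=0.1, weight_decay=1e-4, eps=1e-9):
    """Apply one LARS step per layer: scale the global learning rate by a
    layer-wise trust ratio ||w|| / (||g|| + weight_decay * ||w||)."""
    updated = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # The trust ratio keeps each layer's step proportional to its weight norm,
        # which is what stabilizes training at very large batch sizes.
        trust_ratio = w_norm / (g_norm + weight_decay * w_norm + eps) if w_norm > 0 else 1.0
        local_lr = base_lr * trust_ratio
        updated.append(w - local_lr * (g + weight_decay * w))
    return updated

# Example: two hypothetical "layers" with random weights and gradients.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
grads = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
weights = lars_update(weights, grads)
```

In LARS as originally proposed, the trust ratio scales a momentum update; the sketch omits momentum so the layer-wise scaling itself stays visible.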