
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Scalable approach for Distributed File Processing using Spring, Zookeeper, and Docker
Author(s) | Arjun Reddy Lingala |
---|---|
Country | United States |
Abstract | In modern distributed systems, handling large-scale data efficiently is a key challenge, especially when dealing with structured and unstructured files stored in the Hadoop Distributed File System (HDFS) [1]. This paper presents an API-based solution built on the Spring framework that processes files in a distributed file system, transforms them according to specific business requirements, and stores the results back in distributed storage. The proposed architecture ensures high availability, fault tolerance, and efficient workload distribution by integrating Apache Zookeeper [2] for consensus management and Docker [6] for containerized execution. Distributed processing frameworks such as Spark [8] exist, but they cannot be used when a processing step requires software that, for security reasons, cannot be installed on the distributed file system cluster. The approach discussed in this paper leverages the parallel execution of multiple Spring-based microservices, each deployed as an independent Docker [6] container, allowing for scalable and efficient processing. Multiple instances of the Spring application can run simultaneously, so files are processed in a distributed manner to maximize throughput. The API provides seamless interaction with the HDFS [1] cluster, enabling efficient read, transformation, and write operations. To coordinate the instances, Apache Zookeeper [2] manages leader election, task allocation, and synchronization, preventing conflicts and balancing load across nodes. This parallel processing workflow significantly improves the performance and resilience of the system. By running multiple instances in a containerized environment, the solution scales dynamically with workload demands. Additionally, Zookeeper [2] distributes processing tasks optimally, preventing redundant operations and maintaining system consistency. The paper demonstrates reduced processing time and improved fault tolerance compared to traditional single-instance processing methods, and highlights the benefits of combining Spring Boot [3], HDFS [1], Docker [6], and Zookeeper [2] for scalable and efficient distributed file processing. |
Keywords | Spring, Docker, Distributed Processing, Distributed Storage, HDFS, Zookeeper, Consensus, Coordination, Containerization, REST API, Monitoring, Logging |
Field | Engineering |
Published In | Volume 4, Issue 5, September-October 2022 |
Published On | 2022-09-07 |
DOI | https://doi.org/10.36948/ijfmr.2022.v04i05.37539 |
Short DOI | https://doi.org/g85spd |
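
The abstract states that Zookeeper [2] handles leader election among the Spring instances but does not say which recipe is used. A common choice is ZooKeeper's documented election recipe. Each instance creates an ephemeral, sequential znode under an election path. The instance holding the lowest sequence number becomes leader, and every other instance watches only its immediate predecessor. Apache Curator's `LeaderLatch` builds on the same idea. The minimal sketch below models that recipe in memory so it runs without a ZooKeeper ensemble. The class `InMemoryElection` and its methods are hypothetical stand-ins for what a real client would do, not the paper's code.

```python
# Hypothetical in-memory model of ZooKeeper's leader-election recipe.
# A real client would create EPHEMERAL_SEQUENTIAL znodes under an election
# path; here a counter and a dict stand in for the ZooKeeper ensemble.
from typing import Dict, Optional


class InMemoryElection:
    def __init__(self) -> None:
        self._next_seq = 0                # ZooKeeper hands out increasing sequence numbers
        self._nodes: Dict[str, str] = {}  # znode name -> owning instance id

    def join(self, instance_id: str) -> str:
        """Create a sequential node for this instance and return its name."""
        name = f"n_{self._next_seq:010d}"  # 10-digit zero-padded, like ZooKeeper's suffix
        self._next_seq += 1
        self._nodes[name] = instance_id
        return name

    def session_expired(self, instance_id: str) -> None:
        """Ephemeral nodes vanish when the owning instance's session ends."""
        for name in [n for n, owner in self._nodes.items() if owner == instance_id]:
            del self._nodes[name]

    def leader(self) -> Optional[str]:
        """The owner of the lowest-numbered node is the leader."""
        return self._nodes[min(self._nodes)] if self._nodes else None

    def predecessor(self, node_name: str) -> Optional[str]:
        """The node this one should watch: the next-lower name (avoids a herd effect)."""
        lower = [n for n in self._nodes if n < node_name]
        return max(lower) if lower else None


# Three containers join; the first becomes leader until its session expires.
election = InMemoryElection()
nodes = {i: election.join(i) for i in ["spring-a", "spring-b", "spring-c"]}
print(election.leader())                        # spring-a
print(election.predecessor(nodes["spring-c"]))  # n_0000000001
election.session_expired("spring-a")
print(election.leader())                        # spring-b
```

When each instance watches only its predecessor, a failure notifies just the next instance in line. This avoids waking every instance to contend for leadership at once.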
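
The abstract also says Zookeeper [2] is used for task allocation, so that work is distributed optimally and no file is processed twice, but it does not specify the allocation policy. This sketch assumes one plausible policy purely for illustration. The elected leader assigns each pending file to the live instance with the fewest outstanding files. When an instance's session expires, the leader redistributes that instance's files among the survivors. The functions `assign_least_loaded` and `reassign_on_failure`, as well as the file and instance names, are hypothetical.

```python
# Hypothetical least-loaded task allocation performed by the elected leader.
# In a real deployment the assignment map would be stored in ZooKeeper znodes
# so that every instance sees the same view; here it is a plain dict.
import heapq
from typing import Dict, List, Optional


def assign_least_loaded(files: List[str], workers: List[str],
                        current: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Give each file to the worker with the fewest files (ties broken by name)."""
    if files and not workers:
        raise ValueError("no live workers to assign files to")
    current = current or {}
    assignment = {w: list(current.get(w, [])) for w in workers}
    heap = [(len(assignment[w]), w) for w in workers]  # (load, worker) min-heap
    heapq.heapify(heap)
    for f in files:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(f)                   # each file goes to exactly one worker
        heapq.heappush(heap, (load + 1, worker))
    return assignment


def reassign_on_failure(assignment: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Hand a failed worker's outstanding files to the surviving workers."""
    orphaned = assignment.get(failed, [])
    survivors = {w: fs for w, fs in assignment.items() if w != failed}
    return assign_least_loaded(orphaned, sorted(survivors), survivors)


plan = assign_least_loaded([f"part-{i:04d}" for i in range(6)],
                           ["spring-a", "spring-b", "spring-c"])
print(plan["spring-a"])  # ['part-0000', 'part-0003']
# After spring-a fails, spring-b gains part-0000 and spring-c gains part-0003.
print(reassign_on_failure(plan, "spring-a"))
```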
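
The abstract further describes instances that read files from HDFS [1], transform them, and write the results back, with several instances running in parallel to raise throughput. The sketch below illustrates that read, transform, write loop executed concurrently. It is not the paper's implementation and rests on several assumptions. Threads stand in for Docker containers, and the hypothetical `InMemoryStore` class stands in for HDFS. The `/in/` to `/out/` path convention is invented for this example, and `str.upper` is a placeholder for a business transformation.

```python
# Hypothetical read -> transform -> write pipeline run by several workers in
# parallel. A lock-protected dict stands in for HDFS and threads stand in for
# containerized Spring instances.
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class InMemoryStore:
    """Minimal stand-in for a distributed file system: path -> file contents."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = dict(files)
        self._lock = threading.Lock()

    def read(self, path: str) -> str:
        with self._lock:
            return self._files[path]

    def write(self, path: str, data: str) -> None:
        with self._lock:
            self._files[path] = data

    def listing(self) -> List[str]:
        with self._lock:
            return sorted(self._files)


def process_file(store: InMemoryStore, path: str, transform: Callable[[str], str]) -> str:
    """Read one input file, apply the transformation, and write the result."""
    out_path = path.replace("/in/", "/out/", 1)  # invented output-path convention
    store.write(out_path, transform(store.read(path)))
    return out_path


def process_all(store: InMemoryStore, paths: List[str],
                transform: Callable[[str], str], workers: int = 4) -> List[str]:
    """Process files concurrently; output paths come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: process_file(store, p, transform), paths))


store = InMemoryStore({"/data/in/a.csv": "x,y", "/data/in/b.csv": "z"})
print(process_all(store, store.listing(), str.upper, workers=2))
# ['/data/out/a.csv', '/data/out/b.csv']
```

Threads are adequate here because the illustrated work is dominated by I/O. In the architecture the abstract describes, parallelism instead comes from separate containers, each handling the files assigned to it.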


A CrossRef DOI is assigned to each research paper published in the journal; the IJFMR DOI prefix is 10.36948/ijfmr.
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
