International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Challenges in Streaming ETL Pipelines for High-Frequency Data Ingestion and Real-Time Processing

Author(s) Shiva Kumar Vuppala, Manohar Reddy Sokkula
Country India
Abstract The transition from traditional ETL (Extract, Transform, Load) pipelines to streaming ETL pipelines seeks to make data more accessible and usable for everyone in the organization, including non-technical members. Traditional batch ETL pipelines, which process data at the end of a batch cycle, are unsuitable for high-frequency data ingestion and real-time applications. Streaming ETL handles data in real time as it is generated and accessed, allowing continuous processing of high-frequency data. The paper analyzes the challenges in designing and implementing streaming ETL pipelines for high-frequency data ingestion and real-time processing. A review of existing literature identified the major streaming ETL challenges and their solutions. The major challenges include ingesting high-frequency data, achieving low-latency data transformation, recovering from faults, ensuring privacy and security, visualizing data in real time, and finding staff with the required skills. Strategies for addressing these challenges include backpressure and buffering, distributed messaging systems, windowing, lightweight serialization formats, checkpointing, idempotent operations, and watermarking. Organizations should address these challenges to unlock the full potential of streaming ETL pipelines.
Keywords Streaming ETL, real-time data processing, latency optimization, out-of-order data, backpressure management, checkpointing, throughput
Published In Volume 6, Issue 6, November-December 2024
Published On 2024-12-30
DOI https://doi.org/10.36948/ijfmr.2024.v06i06.33506
Short DOI https://doi.org/g82gjz
