International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24



Implementing Machine Learning for ETL Data Transformation and Anomaly Detection

Author(s) Manohar Reddy Sokkula, Shiva Kumar Vuppala
Country India
Abstract The ETL (Extract, Transform, Load) process is a critical component of data processing. Traditional ETL processes lack the capabilities and agility needed to cope with the dynamic, evolving nature of modern data ecosystems, and they create several data-management challenges: inefficiency in handling high-volume, high-velocity data, difficulty with schema mapping, and difficulty preserving data quality. The purpose of this study was to apply machine learning (ML) to ETL workflows: to highlight the role of ML in improving data transformation and anomaly detection, to explore methods for integrating ML into ETL pipelines, and to analyze the impact of ML on ETL pipelines from both practical and theoretical perspectives. The credit card fraud dataset, comprising 284,807 rows and 31 columns, was downloaded from Kaggle. The most significant problem with this dataset was its severe class imbalance, which was addressed with the Synthetic Minority Over-sampling Technique (SMOTE). An Isolation Forest (IF) was used to detect anomalies in the dataset. The findings showed that implementing ML in ETL pipelines resolves the problem of feature-scale disparity and improves the balance and accuracy of the model. The project highlights the benefits of ML-driven ETL transformation and anomaly detection over traditional workflows.
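The abstract credits the pipeline with resolving feature-scale disparity and with balancing the classes via SMOTE, but does not show the transformation code. The sketch below is a stdlib-only illustration under stated assumptions: z-score standardization is assumed as the scaling step (in the Kaggle credit card dataset, `Time` and `Amount` are on raw scales while the other features are PCA outputs), and SMOTE is written out directly. The helper names `standardize` and `smote`, the toy data, and the defaults (`k=5`, `seed=0`) are hypothetical and are not the authors' code; production pipelines would more typically use a library class such as imbalanced-learn's `SMOTE`.

```python
import math
import random


def standardize(rows):
    """Z-score every column (mean 0, standard deviation 1) so that no single
    large-valued feature dominates Euclidean distances."""
    n, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows, SMOTE-style: pick a minority row,
    pick one of its k nearest minority neighbours, and interpolate between them."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority rows
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data: two features on very different scales, 8 majority rows vs 3 minority rows
majority = [[float(i), 1000.0 * i] for i in range(8)]
minority = [[10.0, 9500.0], [11.0, 9800.0], [12.0, 9900.0]]
scaled = standardize(majority + minority)
X_major, X_minor = scaled[:8], scaled[8:]
X_minor += smote(X_minor, n_new=len(X_major) - len(X_minor), k=2)
print(len(X_major), len(X_minor))  # 8 8
```

Scaling comes first because SMOTE picks neighbours by Euclidean distance, which an unscaled large-valued column would otherwise dominate. In practice, oversampling is usually applied only to the training split, so that synthetic rows do not leak into evaluation data.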
Keywords ETL pipeline, machine learning, data transformation, anomaly detection, SMOTE, Isolation Forest
Published In Volume 6, Issue 6, November-December 2024
Published On 2024-12-30
DOI https://doi.org/10.36948/ijfmr.2024.v06i06.33504
Short DOI https://doi.org/g82gj2
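The abstract names an Isolation Forest for anomaly detection without giving its configuration or implementation. The sketch below follows the general Isolation Forest idea: random axis-aligned splits isolate unusual points in fewer steps, and the average depth E[h(x)] is turned into a score s = 2^(-E[h(x)] / c(ψ)), where ψ is the subsample size. The class `TinyIsolationForest`, its defaults (100 trees, subsample size 256), and the toy Gaussian data are hypothetical choices for illustration only, not the method or settings used in the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n
    points; used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree by random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    ranges = [(min(p[j] for p in points), max(p[j] for p in points))
              for j in range(len(points[0]))]
    candidates = [j for j, (lo, hi) in enumerate(ranges) if lo < hi]
    if not candidates:  # all remaining points are identical
        return {"size": len(points)}
    q = rng.choice(candidates)
    split = rng.uniform(*ranges[q])
    left = [p for p in points if p[q] < split]
    right = [p for p in points if p[q] >= split]
    return {"feature": q, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node):
    """Depth at which x reaches a leaf, plus c(size) for the unsplit leaf points."""
    depth = 0
    while "split" in node:
        node = node["left"] if x[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + c(node["size"])


class TinyIsolationForest:
    """Hypothetical minimal Isolation Forest; not the paper's implementation."""

    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two rows")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit per tree
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 suggest an anomaly."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c(self.psi))


# Usage: fit on 300 points around the origin, then score a central and a distant point
data_rng = random.Random(42)
normal = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
forest = TinyIsolationForest(n_trees=100, sample_size=128).fit(normal)
print(round(forest.score([0.0, 0.0]), 3), round(forest.score([8.0, 8.0]), 3))
```

Because the distant point lies beyond every split range, it falls into a small corner region after only a few splits and receives the higher score. Turning scores into anomaly labels still requires a threshold, or a contamination rate such as the `contamination` parameter of scikit-learn's `IsolationForest`.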
