Implementing Machine Learning for ETL Data Transformation and Anomaly Detection

Manohar Reddy Sokkula; Shiva Kumar Vuppala

doi:10.36948/ijfmr.2024.v06i06.33504

Author(s)	Manohar Reddy Sokkula, Shiva Kumar Vuppala
Country	India
Abstract	The ETL (Extract, Transform, and Load) process is a critical component of data processing. Traditional ETL processes lack the agility needed to cope with dynamic, evolving data ecosystems. They pose several data management challenges, including inefficiency in handling high-volume, high-velocity data, difficulty with schema mapping, and difficulty preserving data quality. This study implemented machine learning (ML) in ETL workflows by highlighting the role of ML in improving data transformation and anomaly detection, exploring methods for integrating ML into ETL pipelines, and analyzing the impact of ML on ETL pipelines through both practical and theoretical lenses. The credit card fraud dataset, comprising 284,807 rows and 31 columns, was downloaded from Kaggle. The most significant problem with this dataset was its severe class imbalance. The dataset was balanced using the Synthetic Minority Over-sampling Technique (SMOTE), and an Isolation Forest (IF) was used to detect anomalies. The findings showed that implementing ML in ETL pipelines solves the problem of feature scale disparity, improving the balance and accuracy of the model. The project highlights the benefits of ML-driven ETL transformation and anomaly detection over traditional workflows.
Keywords	ETL pipeline, machine learning, data transformation, anomaly detection, SMOTE, isolation forest
Published In	Volume 6, Issue 6, November-December 2024
Published On	2024-12-30
DOI	https://doi.org/10.36948/ijfmr.2024.v06i06.33504
Short DOI	https://doi.org/g82gj2
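
The abstract names three transform-stage steps: correcting feature scale disparity, balancing classes with SMOTE, and scoring anomalies with an Isolation Forest. The sketch below illustrates each step using only the Python standard library. It is not the authors' implementation: the toy two-column data, the function names (standardize, smote, fit_forest, anomaly_score), and all parameter values are assumptions made for illustration, and the order in which the paper applies these steps is not stated on this page.

```python
import math
import random

def standardize(rows):
    """Z-score each column so features share a comparable scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def smote(minority, n_new, k=3, rng=random):
    """SMOTE-style oversampling: interpolate between a minority row and
    one of its k nearest minority neighbours (needs >= 2 minority rows)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((r for r in minority if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def avg_path(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def grow(rows, depth, limit, rng):
    """Build one isolation tree from random feature/threshold splits."""
    if depth >= limit or len(rows) <= 1:
        return ("leaf", len(rows))
    f = rng.randrange(len(rows[0]))
    lo, hi = min(r[f] for r in rows), max(r[f] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < split]
    right = [r for r in rows if r[f] >= split]
    return ("node", f, split,
            grow(left, depth + 1, limit, rng),
            grow(right, depth + 1, limit, rng))

def path_length(x, node, depth=0):
    """Depth at which x lands, adjusted for unsplit points in the leaf."""
    if node[0] == "leaf":
        return depth + avg_path(node[1])
    _, f, split, left, right = node
    return path_length(x, left if x[f] < split else right, depth + 1)

def fit_forest(rows, n_trees=50, sample_size=64, seed=0):
    """Fit isolation trees on random subsamples (assumes >= 2 rows)."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    limit = math.ceil(math.log2(size))
    trees = [grow(rng.sample(rows, size), 0, limit, rng)
             for _ in range(n_trees)]
    return trees, size

def anomaly_score(x, forest):
    """Score in (0, 1]; values nearer 1 mean x is isolated more quickly."""
    trees, size = forest
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / avg_path(size))

# Toy stand-in for an imbalanced dataset: 200 "normal" rows, 5 "fraud" rows.
rng = random.Random(42)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
fraud = [[rng.gauss(8, 0.5), rng.gauss(8, 0.5)] for _ in range(5)]

scaled = standardize(normal + fraud)
scaled_normal, scaled_fraud = scaled[:200], scaled[200:]

# Balance the classes by synthesising the missing minority rows.
synthetic = smote(scaled_fraud, len(scaled_normal) - len(scaled_fraud), rng=rng)

# Fit the forest on the scaled, still imbalanced data and score two points.
forest = fit_forest(scaled)
fraud_score = anomaly_score(scaled_fraud[0], forest)
typical_score = anomaly_score([0.0, 0.0], forest)
```

In practice, library implementations such as scikit-learn's IsolationForest and imbalanced-learn's SMOTE would normally replace these hand-written versions. The sketch fits the forest on the imbalanced data because oversampling the minority class first would make those rows harder to isolate.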

E-ISSN 2582-2160

All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.

International Journal For Multidisciplinary Research

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
