
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Implementing Machine Learning for ETL Data Transformation and Anomaly Detection
Author(s) | Manohar Reddy Sokkula, Shiva Kumar Vuppala |
---|---|
Country | India |
Abstract | The ETL (Extract, Transform, and Load) process is a critical component of data processing. Traditional ETL processes lack the agility needed to cope with the dynamic and evolving nature of data ecosystems. They present several challenges for data management, including inefficient handling of high-volume, high-velocity data, difficult schema mapping, and poor preservation of data quality. The purpose of this study was to implement machine learning (ML) in ETL workflows by highlighting the role of ML in improving data transformation and anomaly detection, exploring methods for integrating ML into ETL pipelines, and analyzing the impact of ML on ETL pipelines through both practical and theoretical lenses. The credit card fraud dataset, comprising 284,807 rows and 31 columns, was downloaded from Kaggle. The most significant problem with this dataset was its severe class imbalance. The dataset was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). The Isolation Forest (IF) algorithm was used to detect anomalies in the dataset. The findings showed that implementing ML in ETL pipelines solves the problem of feature scale disparity, improving the balance and accuracy of the model. The project highlights the benefits of modern ML-driven ETL transformation and anomaly detection processes over traditional workflows. |
Keywords | ETL pipeline, machine learning, data transformation, anomaly detection, SMOTE, isolation forest |
Published In | Volume 6, Issue 6, November-December 2024 |
Published On | 2024-12-30 |
DOI | https://doi.org/10.36948/ijfmr.2024.v06i06.33504 |
Short DOI | https://doi.org/g82gj2 |
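
The abstract names three techniques: scaling features to remove scale disparity, SMOTE oversampling, and Isolation Forest anomaly detection. The sketch below illustrates each one in plain Python so the ideas can be inspected without extra dependencies. It is not the authors' implementation: all function names, parameters (`k`, `n_trees`, `sample_size`), and the toy data are assumptions made for illustration, and the abstract does not say how the steps were ordered or configured. A real pipeline would more likely rely on library implementations such as scikit-learn's `IsolationForest` and imbalanced-learn's `SMOTE`.

```python
import math
import random

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic minority points by interpolating between a
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    among n points; used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return ("node", feat, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(row, tree, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feat, split, left, right = tree
    return path_length(row, left if row[feat] < split else right, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1) per row; higher means easier to isolate,
    i.e. more anomalous. Assumes at least two rows."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [2 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]

if __name__ == "__main__":
    # Toy data (hypothetical): 200 ordinary rows and 3 extreme ones.
    rng = random.Random(42)
    ordinary = [[rng.gauss(50, 5), rng.gauss(0, 1)] for _ in range(200)]
    extreme = [[rng.gauss(500, 20), rng.gauss(8, 1)] for _ in range(3)]
    scaled = standardize(ordinary + extreme)
    scores = anomaly_scores(scaled)
    print("mean ordinary score:", sum(scores[:200]) / 200)
    print("extreme scores:", scores[200:])
    print("synthetic minority rows:", len(smote(scaled[200:], 20, k=2, rng=rng)))
```

Standardizing before scoring keeps a single large-valued column (for example a transaction amount) from dominating distance-based steps such as the nearest-neighbour search inside SMOTE.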
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
