International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
The Making of an Data Pipeline
Author(s) | Harsh Kaushik, Avnish Rai, Gaurav Kapasiya, Jai Prakash Bhati |
---|---|
Country | India |
Abstract | This paper details the development and implementation of a data engineering pipeline designed for the extraction, transformation, and loading (ETL) of data from a web-based directory. The project involves using asynchronous web scraping techniques to gather user details from a local business directory, transforming the data into a structured format, and loading it into a storage solution. The pipeline utilises Python, the HTTPX library for asynchronous HTTP requests, BeautifulSoup for HTML parsing, and Amazon S3 for data storage. By leveraging these technologies, the pipeline demonstrates an efficient approach to handling large-scale web data extraction and processing, significantly reducing the time required to gather and organise data from multiple web pages. This paper provides insights into the architecture, implementation, and performance of the ETL pipeline, highlighting the benefits and challenges of using asynchronous programming in data engineering. |
Keywords | ETL, Data engineering, Python, Async, Web Scraping, local.ch |
Field | Engineering |
Published In | Volume 6, Issue 3, May-June 2024 |
Published On | 2024-05-21 |
Cite This | The Making of an Data Pipeline - Harsh Kaushik, Avnish Rai, Gaurav Kapasiya, Jai Prakash Bhati - IJFMR Volume 6, Issue 3, May-June 2024. DOI 10.36948/ijfmr.2024.v06i03.20849 |
DOI | https://doi.org/10.36948/ijfmr.2024.v06i03.20849 |
Short DOI | https://doi.org/gtwmsm |
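
The abstract describes an extract, transform, load flow built on asynchronous requests. This page does not show the paper's code, so the following is only a minimal sketch of that flow under stated assumptions, not the authors' implementation. The CSS class names (`entry`, `name`, `phone`), the function names, and the JSON Lines output format are hypothetical. The standard-library `html.parser` stands in for BeautifulSoup so the sketch runs without third-party packages. HTTPX and boto3 are imported lazily and only used if the corresponding helper is called.

```python
import asyncio
import json
from html.parser import HTMLParser


class EntryParser(HTMLParser):
    """Collects 'name' and 'phone' text inside elements with class 'entry'.

    The markup is hypothetical; a real directory page will differ.
    """

    FIELDS = {"name", "phone"}

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "entry" in classes:
            self.records.append({})  # start a new record
        for cls in classes:
            if cls in self.FIELDS and self.records:
                self._field = cls

    def handle_data(self, data):
        if self._field and data.strip():
            self.records[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None  # an empty element must not capture later text


async def extract(urls, fetch, max_concurrency=5):
    """Extract: fetch all pages concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(url) for url in urls))


def transform(pages):
    """Transform: parse each HTML page into a flat list of record dicts."""
    records = []
    for html in pages:
        parser = EntryParser()
        parser.feed(html)
        records.extend(record for record in parser.records if record)
    return records


def to_json_lines(records):
    """Serialise records as UTF-8 JSON Lines, one record per line."""
    lines = (json.dumps(r, ensure_ascii=False, sort_keys=True) for r in records)
    return "\n".join(lines).encode("utf-8")


async def fetch_with_httpx(url):
    """Fetch one page with HTTPX (a client per call keeps the sketch short)."""
    import httpx  # third-party; imported lazily

    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.text


def load_to_s3(body, bucket, key):
    """Load: upload the serialised bytes to S3 (bucket and key are assumed)."""
    import boto3  # third-party; imported lazily

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)


async def run_pipeline(urls, fetch):
    """Run extract and transform, returning bytes ready for loading."""
    pages = await extract(urls, fetch)
    return to_json_lines(transform(pages))
```

Passing the fetcher as a parameter keeps the network out of the core logic: `run_pipeline(urls, fetch_with_httpx)` would hit real pages, while a stub coroutine that returns fixed HTML lets the same code be exercised offline. In a longer-running job, a single shared `httpx.AsyncClient` would usually be preferable to one client per request.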
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.