International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24



Scaling up Data Mining Algorithms for Big Data

Author(s) Pankras K. Kandengukila, Daudi Mashauri
Country Tanzania
Abstract The rapid development of science and technology and the continual replacement of digital equipment have ushered in today's era of big data. Automatically discovering and extracting hidden knowledge, in the form of patterns, from such data is known as data mining. However, the big data era has brought a series of challenges to data mining techniques, including excessive processing time, insufficient memory capacity, and excessive power consumption. The aim of this paper is to study how data mining algorithms can be scaled up for big data, using Random Forest and Naïve Bayes as case studies. The background and applications of data mining, big data, and cloud computing are briefly introduced, together with the basic principles of Random Forest, Naïve Bayes, and the MapReduce model in cloud computing. The feasibility of parallelizing Random Forest and Naïve Bayes is then studied. Two parallel algorithms based on MapReduce, one for Random Forest and one for Naïve Bayes, are developed and implemented on the Hadoop platform. Finally, the parallel algorithms are validated experimentally, and their execution efficiency is analyzed using results obtained on data sets of different sizes and on different numbers of clusters. The results show that the proposed methods perform well and can be applied to big data processing.
Keywords Data Mining, Big data, Cloud Computing, Random Forest, Naïve Bayes
Field Computer > Data / Information
Published In Volume 7, Issue 1, January-February 2025
Published On 2025-01-19
Cite This Scaling up Data Mining Algorithms for Big Data - Pankras K. Kandengukila, Daudi Mashauri - IJFMR Volume 7, Issue 1, January-February 2025. DOI 10.36948/ijfmr.2025.v07i01.34838
DOI https://doi.org/10.36948/ijfmr.2025.v07i01.34838
Short DOI https://doi.org/g82gwt
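
Illustrative sketch (not the authors' implementation): the abstract describes parallelizing Naïve Bayes with MapReduce. One common way to do this, assumed here, is to have each map task count class labels and feature values in its own data split, and to have the reduce step sum those partial counts. The pure-Python example below imitates that flow in a single process without Hadoop. The function names, the toy data, and the Laplace smoothing parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from functools import reduce


def map_counts(partition):
    """Map step: count labels and (label, feature index, value) triples in one split."""
    counts = Counter()
    for features, label in partition:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", label, i, value)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial count tables by summing per key."""
    return a + b


def train(partitions):
    """Run the map step on every split, then fold the partial results together."""
    return reduce(reduce_counts, map(map_counts, partitions), Counter())


def predict(counts, features, labels, values_per_feature, alpha=1.0):
    """Pick the label with the highest log-posterior, using Laplace smoothing."""
    total = sum(counts[("class", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        n_c = counts[("class", c)]
        score = math.log((n_c + alpha) / (total + alpha * len(labels)))
        for i, value in enumerate(features):
            seen = counts[("feat", c, i, value)]
            score += math.log((seen + alpha) / (n_c + alpha * values_per_feature[i]))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy data, split into two partitions as if stored on two nodes (hypothetical).
split_1 = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
split_2 = [(("sunny", "mild"), "no"), (("rainy", "hot"), "yes"), (("rainy", "mild"), "yes")]

model = train([split_1, split_2])
print(predict(model, ("rainy", "mild"), labels=["yes", "no"], values_per_feature=[2, 2]))
```

Because the counts are simple sums, training on the splits separately and merging gives the same table as training on all the records at once, which is the property that makes this kind of algorithm a natural fit for MapReduce.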
