ISSN 2063-5346
Clustering of Bigdata Using Genetic Algorithm in Hadoop MapReduce

Chandra Shekhar Gautam, Mr. Laxmi Narayan Soni, Dr. Prabhat Pandey
doi: 10.48047/ecb/2022.11.12.99

Abstract

Clustering big data is a common task in data mining and machine learning. The goal is to group similar data points so that patterns and relationships in the data become visible. However, clustering large datasets can be computationally expensive and time-consuming, which motivates the use of Hadoop MapReduce. Hadoop is a framework for the distributed processing of large datasets across clusters of computers. MapReduce is a programming model that simplifies this processing by splitting a dataset into smaller chunks and processing them in parallel across the cluster. One approach to clustering big data with Hadoop MapReduce is to use a genetic algorithm (GA), an optimization technique inspired by natural selection. A GA iteratively generates and evaluates candidate solutions and uses the best of them as the basis for the next generation of candidates. This paper introduces a technique that parallelizes GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach is presented to evaluate its performance gains relative to a sequential algorithm. The analysis is based on a large real-world dataset.
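
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hadoop: the map and reduce steps are simulated in plain Python, and every name below (`map_partial_sse`, `reduce_sse`, `ga_cluster`, the chromosome encoding as a list of k centroids, the sum-of-squared-errors fitness, and the mutation scale) is an assumption made for illustration. Each data chunk plays the role of an input split: the map step computes a partial fitness per chromosome on its chunk, and the reduce step sums those partial values.

```python
import random
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_partial_sse(chunk, population):
    """Map step (simulated): for one chunk, emit (chromosome index, partial SSE)."""
    for idx, centroids in enumerate(population):
        yield idx, sum(min(sq_dist(p, c) for c in centroids) for p in chunk)

def reduce_sse(pairs):
    """Reduce step (simulated): sum the partial SSEs per chromosome index."""
    totals = defaultdict(float)
    for idx, partial in pairs:
        totals[idx] += partial
    return totals

def evaluate(chunks, population):
    """Fitness of every chromosome: total SSE over all chunks (lower is better)."""
    pairs = (pair for chunk in chunks for pair in map_partial_sse(chunk, population))
    totals = reduce_sse(pairs)
    return [totals[i] for i in range(len(population))]

def next_generation(population, sse, rng, mutation_scale=0.5):
    """Keep the best chromosome, then fill up with mutated crossovers of the top half."""
    ranked = [c for _, c in sorted(zip(sse, population), key=lambda t: t[0])]
    elite = ranked[: max(2, len(ranked) // 2)]
    children = [ranked[0]]  # elitism: the best candidate survives unchanged
    while len(children) < len(population):
        a, b = rng.sample(elite, 2)
        cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
        child = a[:cut] + b[cut:]  # one-point crossover on the centroid list
        child = [tuple(x + rng.gauss(0, mutation_scale) for x in c) for c in child]
        children.append(child)
    return children

def ga_cluster(chunks, k, pop_size=20, generations=30, seed=0):
    """Run the GA; return (best centroids, best SSE, best SSE per generation)."""
    rng = random.Random(seed)
    points = [p for chunk in chunks for p in chunk]
    population = [rng.sample(points, k) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        sse = evaluate(chunks, population)
        history.append(min(sse))
        population = next_generation(population, sse, rng)
    sse = evaluate(chunks, population)
    best = min(range(pop_size), key=sse.__getitem__)
    return population[best], sse[best], history

# Synthetic two-blob data, split into four chunks standing in for input splits.
data_rng = random.Random(1)

def blob(cx, cy, n):
    return [(data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)) for _ in range(n)]

data = blob(0, 0, 100) + blob(5, 5, 100)
chunks = [data[i::4] for i in range(4)]
centroids, best_sse, history = ga_cluster(chunks, k=2)
print(centroids, best_sse)
```

Because the fitness is a sum over points, it decomposes across chunks, which is what makes the evaluation step a natural fit for map and reduce. In an actual Hadoop job the population would have to be shipped to every mapper each generation; how the paper handles that is not stated in the abstract.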
