European Chemical Bulletin

Clustering of Bigdata Using Genetic Algorithm in Hadoop MapReduce

Keywords:

Big Data, Clustering, Davies-Bouldin Index, Parallel Genetic Algorithm, Distributed processing, Hadoop MapReduce.

Chandra Shekhar Gautam, Mr. Laxmi Narayan Soni, Dr. Prabhat Pandey
DOI: 10.48047/ecb/2022.11.12.99

Abstract

Clustering big data is a common task in data mining and machine learning. The goal is to group similar data points so that patterns and relationships in the data can be identified. However, clustering large datasets is computationally expensive and time-consuming, which motivates the use of Hadoop MapReduce. Hadoop is a framework for the distributed processing of large datasets across clusters of computers. MapReduce is a programming model that simplifies the processing of large datasets by splitting them into smaller chunks and processing the chunks in parallel across the cluster. One approach to clustering big data with Hadoop MapReduce is to use a genetic algorithm (GA). A genetic algorithm is an optimization technique inspired by natural selection: it iteratively generates and evaluates candidate solutions and uses the best of them as the basis for the next generation of candidates. This paper introduces a technique that parallelizes GA-based clustering by extending Hadoop MapReduce. The performance gains of the proposed approach over a sequential algorithm are analyzed on a substantial real-world dataset.
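
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Hadoop API: the map and reduce steps are simulated in-process over a list of data chunks, and every name here (map_phase, reduce_phase, davies_bouldin, evolve) and every parameter value is a hypothetical choice made for illustration. Each chromosome is assumed to be a list of k candidate cluster centers (k of at least 2), and its fitness is a Davies-Bouldin style index computed around those candidate centers, where lower is better.

```python
import random
from collections import defaultdict
from math import dist


def map_phase(chunk, centers):
    """Mapper: for each point emit (nearest_center_index, (distance, 1))."""
    for point in chunk:
        idx = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
        yield idx, (dist(point, centers[idx]), 1)


def reduce_phase(pairs):
    """Reducer: sum distances and point counts per cluster index."""
    totals = defaultdict(lambda: [0.0, 0])
    for idx, (distance, count) in pairs:
        totals[idx][0] += distance
        totals[idx][1] += count
    return totals


def davies_bouldin(centers, chunks):
    """Fitness of one chromosome (lower is better).

    Assumption: scatter is measured around the candidate centers
    themselves rather than around recomputed cluster means.
    """
    k = len(centers)
    pairs = (kv for chunk in chunks for kv in map_phase(chunk, centers))
    totals = reduce_phase(pairs)
    if len(totals) < k:
        return float("inf")  # at least one cluster is empty
    scatter = [totals[i][0] / totals[i][1] for i in range(k)]
    score = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i == j:
                continue
            separation = dist(centers[i], centers[j])
            if separation == 0:
                return float("inf")  # duplicate centers
            worst = max(worst, (scatter[i] + scatter[j]) / separation)
        score += worst
    return score / k


def evolve(chunks, k, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Toy GA: truncation selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    points = [p for chunk in chunks for p in chunk]
    # Initial population: each chromosome is k distinct data points.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: davies_bouldin(c, chunks))
        parents = ranked[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k)  # one-point crossover position
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(k)
                child[i] = tuple(x + rng.gauss(0, 0.5) for x in child[i])
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: davies_bouldin(c, chunks))
```

In this sketch each chunk stands in for one input split handled by a mapper, so the fitness of a chromosome is the only part that touches the full dataset. That is the step a distributed version would be expected to parallelize, while selection, crossover, and mutation operate only on the small population of candidate centers.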

Issue

Volume 11, Issue 12 (2022)