
ISSN 2063-5346

“HINDI BHASA AND HINDI TEXT MINING USING STATE-OF-THE-ART MACHINE LEARNING METHODS”


Dr. Ajay Kumar Shukla, Pawan Kumar
doi: 10.48047/ecb/2023.12.si5.145

Abstract

Hindi Bhasa is an ancient language of India and is still widely used in many areas of the country. However, because the language has many linguistically diverse varieties, accurately extracting key information from Hindi-language texts remains a challenge, particularly for tasks such as text mining, sentiment analysis, and topic modeling. This paper presents a study on applying state-of-the-art machine learning methods to Hindi-language text mining. In particular, we explore how contextual preprocessing of the texts affects the accuracy of various learning algorithms, and how effective it is to combine multiple models. We apply five types of classification models (logistic regression, Naive Bayes, support vector machines, decision trees, and artificial neural networks) to a corpus of Hindi tweets and evaluate their accuracy. We train these models on both raw and preprocessed (transliterated) data and analyze the use of ensemble methods for combining multiple models. Our results on the tweet corpus show that models trained on preprocessed data perform better than models trained on unprocessed data, and that combining multiple models further improves accuracy. The proposed approach can be employed for text-mining applications in Hindi and may also be applicable to other languages.
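The abstract does not say which ensemble method was used. As an illustration only, the sketch below shows one common way to combine several classifiers, hard (majority) voting, using only the Python standard library. The function names `majority_vote` and `accuracy`, the label set, and all predictions are hypothetical toy values made up for this example; they are not the paper's data or results.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models by hard (majority) voting.

    per_model_predictions: list of lists, one inner list per model,
    all of the same length (one label per sample).
    Ties go to the label that appears first among the votes, i.e. the
    label predicted by the earliest model in the list.
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) regroups the predictions so each tuple holds every
    # model's vote for one sample.
    for votes in zip(*per_model_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, gold):
    """Fraction of samples whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy labels for five tweets (NOT results from the paper).
gold = ["pos", "neg", "pos", "neg", "pos"]

# Hypothetical outputs of the five model types named in the abstract.
predictions = {
    "logistic_regression": ["pos", "neg", "pos", "pos", "pos"],
    "naive_bayes":         ["pos", "pos", "pos", "neg", "neg"],
    "svm":                 ["neg", "neg", "pos", "neg", "pos"],
    "decision_tree":       ["pos", "neg", "neg", "neg", "pos"],
    "neural_network":      ["pos", "neg", "pos", "neg", "neg"],
}

ensemble = majority_vote(list(predictions.values()))

for name, preds in predictions.items():
    print(f"{name}: {accuracy(preds, gold):.2f}")
print(f"ensemble: {accuracy(ensemble, gold):.2f}")
```

In this toy setup each model errs on a different tweet, so the vote corrects every individual mistake. With real classifiers the gain depends on how correlated their errors are, which is why an ensemble should be evaluated on held-out data and not assumed to help.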
