European Chemical Bulletin

“HINDI BHASA AND HINDI TEXT MINING USING STATE-OF-THE-ART MACHINE LEARNING METHODS”

Keywords:

Hindi Bhasa, Machine Learning Methods, Text Classification, Analysis of Short Texts

Dr. Ajay Kumar Shukla, Pawan Kumar
doi: 10.48047/ecb/2023.12.si5.145

Abstract

Hindi Bhasa is an ancient language of India and is still widely used in many regions of the country. However, because the language has a large number of linguistically diverse varieties, accurately extracting key information from Hindi-language texts remains a challenge, particularly for tasks such as text mining, sentiment analysis, and topic modeling. This paper presents a study on applying state-of-the-art machine learning methods to Hindi-language text mining. In particular, we explore the effect of contextual preprocessing of the texts on the accuracy of various learning algorithms, as well as the effectiveness of combining multiple models. We apply five types of classification models (logistic regression, Naive Bayes, support vector machines, decision trees, and artificial neural networks) to a corpus of Hindi tweets and evaluate their accuracy. We train these models on both raw and preprocessed (transliterated) data and analyze ensemble methods for combining multiple models. Our results on the tweet corpus show that models trained on preprocessed data outperform models trained on unprocessed data, and that combining multiple models further improves accuracy. The proposed approach can be employed for text-mining applications in Hindi and may also be applicable to other languages.
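The pipeline outlined in the abstract (preprocess tweets, train several classifiers, combine them) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract names the five model families and a transliteration step but does not specify features, preprocessing details, or the ensemble rule. In this sketch, `preprocess` is a hypothetical normalizer (Unicode NFC, URL and @mention removal, punctuation stripping) rather than transliteration; multinomial Naive Bayes is kept, while a nearest-centroid cosine classifier and a multiclass perceptron are swapped in as simple stand-ins for the other models; and the ensemble is assumed to be hard majority voting. All class and function names and the toy tweets are hypothetical.

```python
import math
import re
import string
import unicodedata
from collections import Counter

# Characters stripped from token edges: ASCII punctuation plus the
# Devanagari danda (U+0964) and double danda (U+0965).
PUNCT = string.punctuation + "\u0964\u0965"


def preprocess(text):
    """Hypothetical tweet normalizer (not the paper's transliteration step)."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\S+", " ", text)  # drop @mentions
    # Split on whitespace instead of using \w+: Python's re does not treat
    # combining marks (Devanagari vowel signs, virama) as word characters.
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        known = [t for t in tokens if t in self.vocab]  # ignore unseen words

        def score(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in known)

        return max(self.classes, key=score)


def cosine(a, b):
    dot = sum(n * b[k] for k, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


class NearestCentroid:
    """Picks the class whose summed bag-of-words vector is closest by cosine."""

    def fit(self, docs, labels):
        self.centroids = {}
        for tokens, y in zip(docs, labels):
            self.centroids.setdefault(y, Counter()).update(tokens)
        return self

    def predict(self, tokens):
        query = Counter(tokens)
        return max(sorted(self.centroids), key=lambda c: cosine(query, self.centroids[c]))


class Perceptron:
    """Multiclass perceptron on bag-of-words counts (a simple linear model)."""

    def __init__(self, epochs=10):
        self.epochs = epochs

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.w = {c: Counter() for c in self.classes}
        for _ in range(self.epochs):
            for tokens, y in zip(docs, labels):
                pred = self.predict(tokens)
                if pred != y:  # shift weights toward the true class
                    for t in tokens:
                        self.w[y][t] += 1
                        self.w[pred][t] -= 1
        return self

    def predict(self, tokens):
        return max(self.classes, key=lambda c: sum(self.w[c][t] for t in tokens))


def majority_vote(models, tokens):
    """Hard voting; ties go to the alphabetically first label."""
    votes = Counter(m.predict(tokens) for m in models)
    return max(sorted(votes), key=lambda c: votes[c])


# Hypothetical toy tweets, not the paper's corpus.
TRAIN = [
    ("यह फिल्म बहुत अच्छी है", "pos"),
    ("@user सेवा अच्छी थी https://t.co/x", "pos"),
    ("मुझे यह गाना पसंद है।", "pos"),
    ("यह फिल्म बहुत खराब है", "neg"),
    ("सेवा खराब थी #निराश", "neg"),
    ("मुझे यह गाना पसंद नहीं है", "neg"),
]
docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
models = [m.fit(docs, labels) for m in (NaiveBayes(), NearestCentroid(), Perceptron())]
print(majority_vote(models, preprocess("खाना खराब था")))  # neg
```

Hard voting only needs each model's predicted label, which is why it can combine heterogeneous models like these; a soft-voting variant that averages class probabilities would require every model to expose comparable scores, which the perceptron here does not.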

Issue

Volume 12, Special Issue 5 (2023)
