European Chemical Bulletin

A Proposal of Framework Improving Sentiment Classifier using TF-IDF for the Twitter Dataset and the Tweet Length

Keywords:

Sentiment Analysis and Opinion Mining, Feature Extraction, TF-IDF, Twitter Dataset

Hoong-Cheng Soong, Ramesh Kumar Ayyasamy, Nur Syadhila Che Lah
doi: 10.48047/ecb/2023.12.si4.1485

Abstract

Sentiment Analysis and Opinion Mining are now used in many sectors to determine sentiment polarity towards an entity or its aspects. They can help forecast the success of events, products, organisations, persons and more, especially when the opinions are retrieved from prominent social media such as Facebook or Twitter. Nonetheless, many researchers focus on the classification techniques themselves and say little about the pre-processing steps that could improve accuracy, such as accounting for corpus size and tweet length, or about word embedding or text vectorisation before the task is passed to sentiment classifiers built on the myriad of machine learning and deep learning methods. Pre-processing methods, particularly Natural Language Processing (NLP) stages such as stopword removal, lemmatisation, stemming, and others, should be applied because they bear directly on Sentiment Analysis. TF-IDF stands for "Term Frequency-Inverse Document Frequency" and is essential in text retrieval: it assigns different weights to the words in a document so that the crucial ones are emphasised, which helps Sentiment Analysis during the word embedding or text vectorisation stage. Furthermore, the effects of tweet length and corpus size on the accuracy of Sentiment Analysis should be identified. Twitter has raised the maximum tweet length from 140 to 280 characters, which makes tweet length worth discussing for Sentiment Analysis on Twitter datasets. In short, this research proposes several options for the case where the corpus is small and, conversely, for the case where it is massive.
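As an illustration of the TF-IDF weighting and the tweet-length feature mentioned in the abstract, the following is a minimal sketch and not the authors' framework. It assumes the common textbook variant, tf × log(N / df), with no smoothing; the stopword list, the sample tweets, and the function names `tokenize` and `tf_idf` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "to", "of", "i"}


def tokenize(tweet):
    """Lowercase the tweet, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", tweet.lower()) if t not in STOPWORDS]


def tf_idf(tweets):
    """Return one {term: weight} dict per tweet.

    tf  = occurrences of the term in the tweet / number of tokens in the tweet
    idf = log(N / df), where df is the number of tweets containing the term
    """
    docs = [tokenize(t) for t in tweets]
    n_docs = len(docs)
    # Document frequency: count each term once per tweet.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against tweets with no tokens left
        weights.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights


# Hypothetical sample tweets, for illustration only.
tweets = [
    "I love this phone, the battery is great",
    "The battery is terrible and the phone is slow",
    "Great phone, great camera",
]

weights = tf_idf(tweets)

# Tweet length in characters, usable as an extra feature next to the weights.
lengths = [len(t) for t in tweets]
```

In this sketch a word that occurs in every tweet ("phone") receives weight zero, while a word confined to one tweet ("love") receives the highest weight in that tweet, which is the emphasis on distinctive words that the abstract describes. Library implementations often add smoothing and normalisation, so their numbers will differ from this plain variant.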

Issue

Volume 12, Special Issue 4 (2023)
