Publication:
Semantic analysis and topic modelling of Web-Scrapped COVID-19 tweet corpora through data mining methodologies

cris.sourceId: oai:repositorio.ucsc.cl:25022009/3305
dc.contributor.author: Gourisaria, Mahendra Kumar
dc.contributor.author: Chandra, Satish
dc.contributor.author: Das, Himansu
dc.contributor.author: Patra, Sudhansu Shekhar
dc.contributor.author: Sahni, Manoj
dc.contributor.author: Leon Castro, Ernesto
dc.contributor.author: Singh, Vijander
dc.contributor.author: Kumar, Sandeep
dc.date.accessioned: 2022-10-26T11:28:42Z
dc.date.accessioned: 2023-09-11T14:55:00Z
dc.date.available: 2022-10-26T11:28:42Z
dc.date.created: 2022-10-26T11:28:42Z
dc.date.issued: 2022
dc.description.abstract: The evolution of the coronavirus (COVID-19) disease took a toll on the social, healthcare, economic, and psychological prosperity of human beings. In the past couple of months, many organizations, individuals, and governments have adopted Twitter to convey their sentiments on COVID-19, the lockdown, the pandemic, and hashtags. This paper aims to analyze the psychological reactions and discourse of Twitter users related to COVID-19. In this experiment, Latent Dirichlet Allocation (LDA) has been used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and various classification techniques such as random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with stochastic gradient descent optimizer, and majority voting classifier have been adapted for analyzing the polarity of sentiment. The effectiveness of the aforesaid approaches along with LDA modeling has been tested, validated, and compared with several benchmark datasets and on a newly generated dataset for analysis. To achieve better results, a dual dataset approach has been incorporated to determine the frequency of positive and negative tweets and word clouds, which helps to identify the most effective model for analyzing the corpora. The experimental result shows that the BiLSTM approach outperforms the other approaches with an accuracy of 96.7%. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
dc.description.sponsorship: Facultad de Ciencias Económicas y Administrativas
dc.identifier.doi: 10.3390/healthcare10050881
dc.identifier.uri: https://repositorio.ucsc.cl/handle/25022009/8513
dc.language: eng
dc.publisher: Healthcare (Switzerland)
dc.rights: open access
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: BiLSTM
dc.subject: COVID-19 sentiment analysis
dc.subject: Latent Dirichlet Allocation (LDA)
dc.subject: Natural language processing
dc.subject: Topic modeling
dc.subject.ocde: Medical and health sciences::Health sciences
dc.title: Semantic analysis and topic modelling of Web-Scrapped COVID-19 tweet corpora through data mining methodologies
dc.type: article
dspace.entity.type: Publication
Files
Original bundle
Name: healthcare-10-00881-v2 (1).pdf
Size: 7.33 MB
Format: Adobe Portable Document Format