Semantic analysis and topic modelling of Web-Scrapped COVID-19 tweet corpora through data mining methodologies

Gourisaria, Mahendra Kumar; Chandra, Satish; Das, Himansu; Patra, Sudhansu Shekhar; Sahni, Manoj; Leon Castro, Ernesto; Singh, Vijander; Kumar, Sandeep

doi:10.3390/healthcare10050881

cris.sourceId	oai:repositorio.ucsc.cl:25022009/3305
dc.contributor.author	Gourisaria, Mahendra Kumar
dc.contributor.author	Chandra, Satish
dc.contributor.author	Das, Himansu
dc.contributor.author	Patra, Sudhansu Shekhar
dc.contributor.author	Sahni, Manoj
dc.contributor.author	Leon Castro, Ernesto
dc.contributor.author	Singh, Vijander
dc.contributor.author	Kumar, Sandeep
dc.date.accessioned	2022-10-26T11:28:42Z
dc.date.accessioned	2023-09-11T14:55:00Z
dc.date.available	2022-10-26T11:28:42Z
dc.date.created	2022-10-26T11:28:42Z
dc.date.issued	2022
dc.description.abstract	The evolution of the coronavirus (COVID-19) disease took a toll on the social, healthcare, economic, and psychological prosperity of human beings. In the past couple of months, many organizations, individuals, and governments have adopted Twitter to convey their sentiments on COVID-19, the lockdown, the pandemic, and hashtags. This paper aims to analyze the psychological reactions and discourse of Twitter users related to COVID-19. In this experiment, Latent Dirichlet Allocation (LDA) has been used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and various classification techniques, namely random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with a stochastic gradient descent optimizer, and a majority voting classifier, have been applied to analyze sentiment polarity. The effectiveness of these approaches, along with LDA modeling, has been tested, validated, and compared on several benchmark datasets and on a newly generated dataset. To achieve better results, a dual dataset approach has been incorporated to determine the frequency of positive and negative tweets and to generate word clouds, which helps to identify the most effective model for analyzing the corpora. The experimental results show that the BiLSTM approach outperforms the other approaches, with an accuracy of 96.7%. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
dc.description.sponsorship	Facultad de Ciencias Económicas y Administrativas
dc.identifier.doi	10.3390/healthcare10050881
dc.identifier.uri	https://repositorio.ucsc.cl/handle/25022009/8513
dc.language	eng
dc.publisher	Healthcare (Switzerland)
dc.rights	open access
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	BiLSTM
dc.subject	COVID-19 sentiment analysis
dc.subject	Latent Dirichlet Allocation (LDA)
dc.subject	Natural language processing
dc.subject	Topic modeling
dc.subject.ocde	Medical and health sciences::Health sciences
dc.title	Semantic analysis and topic modelling of Web-Scrapped COVID-19 tweet corpora through data mining methodologies
dc.type	article
dspace.entity.type	Publication
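The abstract lists a majority voting classifier among the sentiment-polarity methods but does not state whether hard or soft voting was used, which base classifiers were combined, or how ties were resolved. The sketch below assumes hard (plurality) voting over labels that base classifiers such as random forest, support vector machine and naive Bayes have already assigned to each tweet. The function name majority_vote and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard (plurality) voting over per-classifier label predictions.

    Hypothetical helper, not taken from the paper. predictions[i][j] is the
    label that base classifier i assigned to tweet j. Ties go to the label
    seen first in classifier order (an assumption; the paper does not say).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_tweets = len(predictions[0])
    if any(len(p) != n_tweets for p in predictions):
        raise ValueError("every classifier must label the same tweets")

    combined = []
    for j in range(n_tweets):
        # Count one vote per classifier for tweet j.
        votes = Counter(p[j] for p in predictions)
        # most_common orders equal counts by first occurrence.
        combined.append(votes.most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    preds = [
        ["positive", "negative", "positive"],  # e.g. random forest
        ["positive", "positive", "negative"],  # e.g. support vector machine
        ["negative", "negative", "positive"],  # e.g. naive Bayes
    ]
    print(majority_vote(preds))  # ['positive', 'negative', 'positive']
```

Hard voting needs only each model's predicted label; soft voting would instead average predicted class probabilities, which requires base models that expose them.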

Files

Original bundle

Name: healthcare-10-00881-v2 (1).pdf
Size: 7.33 MB
Format: Adobe Portable Document Format

Collections

Publicaciones Científicas