Speaker: Dr. Shi Chen, University of North Carolina Charlotte
Abstract: Infosurveillance (infoveillance) substantially complements existing case-based, genomic, and serosurveillance systems for health emergencies such as the current COVID-19 pandemic. Large volumes of social media data provide an almost instantaneous and accurate depiction of public discourse on COVID-19, especially public perceptions of pandemic risk, sentiments towards various interventions (e.g., non-pharmaceutical interventions, or NPIs, and vaccinations), and other intermingled societal issues. These insights are critical for inferring potential behavioral changes during the pandemic and predicting epidemic dynamics. In this study, we demonstrate several approaches to analyzing unstructured textual data and characterizing dynamic public opinion over the course of COVID-19, including regular expressions (regex), latent Dirichlet allocation (LDA), and the data-driven, deep-learning-based Bidirectional Encoder Representations from Transformers (BERT). We comprehensively sample COVID-19-related discussions on Twitter from early 2020 onward. We show that BERT performs best at identifying real-time public opinion on various aspects of COVID-19. This study highlights the importance of effective infosurveillance from large social media data during health emergencies for public health decision makers.
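As a rough illustration of the simplest of the approaches named in the abstract, regex-based tagging of posts by topic might look like the sketch below. The topic names, patterns, and sample posts are hypothetical and chosen only for illustration; the abstract does not specify the patterns or data used in the study.

```python
import re
from collections import Counter

# Hypothetical topic patterns (assumptions for illustration only;
# the study's actual regex rules are not given in the abstract).
TOPIC_PATTERNS = {
    "vaccination": re.compile(r"\b(vaccin\w*|vax\w*|booster\w*)\b", re.IGNORECASE),
    "npi": re.compile(
        r"\b(mask\w*|lockdown\w*|social\s+distanc\w*|quarantin\w*)\b",
        re.IGNORECASE,
    ),
    "risk": re.compile(r"\b(risk\w*|danger\w*|deadly|afraid|scared)\b", re.IGNORECASE),
}


def tag_topics(text):
    """Return the sorted list of topics whose pattern matches the text."""
    return sorted(
        topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)
    )


def count_topics(posts):
    """Count how many posts mention each topic at least once."""
    counts = Counter()
    for post in posts:
        counts.update(tag_topics(post))
    return counts


# Made-up sample posts, not real Twitter data.
sample_posts = [
    "Got my booster today, feeling fine!",
    "Masks and lockdowns again? Not sure about the risk.",
    "Lovely weather this weekend.",
]

topic_counts = count_topics(sample_posts)
print(topic_counts)
```

Keyword rules like these are transparent and fast, but they miss paraphrases and cannot judge sentiment or context, which is one plausible reason a learned model such as BERT could outperform them, as the abstract reports.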