Natural Language Processing of Tweets to Explore Mental Health
Link to Final Paper/GitHub Repository
https://github.com/aldenfelix/Twitter_NLP
Abstract
The onset of the COVID-19 pandemic was marked by sudden and drastic changes to daily life. These changes, combined with the worsening global health crisis, severely impacted the mental health of many people. This study seeks to understand how people's mental health was affected while also evaluating the effectiveness of Twitter (X) tweets as an alternative to traditional public surveys. To this end, I apply two natural language processing techniques, topic modeling and a dictionary method, to a sample of tweets from the Dallas-Fort Worth metroplex during three periods: before the pandemic, at the start of the pandemic, and a year after the start of the pandemic. My research provides insights on the effective use of tweets in natural language processing for future research to build upon.
Project Presentation
Methods & Data
Because of changes to the Twitter API, the ideal option of pulling data specifically for this study was not available. A search for publicly available datasets of relevant tweets also yielded no results. The data used in this study therefore come from one of my previous projects, in which Twitter data had been retrieved before the API changes. The dataset consists of about 480,000 tweets across all three periods.
This study uses two natural language processing methods: topic modeling and a dictionary method. I preprocessed the tweets by removing punctuation, numbers, stop words, and extra whitespace, converting all words to lowercase, and stemming terms. Topic modeling groups together the words that most commonly co-occur across tweets, which makes it possible to examine how the pandemic affected individuals. It is a powerful unsupervised machine learning technique that can serve as an alternative to traditional survey methods. Unlike a survey, it does not lead responses with questions, but it has the disadvantage of losing information about less frequently mentioned topics (Grimmer et al., 2022; Roberts et al., 2014).
The dictionary method, on the other hand, is useful for observing the intensity of mental health issues in my sample. It is a simple count, across all tweets, of the words that appear in a supplied dictionary. The ideal dictionary for this study would include only terms that capture discussions related to mental health. However, the simplicity of the dictionary method means that complex ideas such as mental health, whose vocabulary includes many terms also used in other topics, may not be well captured (Grimmer et al., 2022). The dictionary used in this study (Table 1 in the presentation and paper) was built from a literature review of previous mental health studies using Twitter data and an analysis of the terms used in this study's collected tweets.
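The preprocessing steps and the dictionary count described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the stop-word list, the three dictionary terms, the example tweets, and the `crude_stem` helper are all hypothetical (the study's actual dictionary is Table 1 in the paper), and the suffix stripper only stands in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical stop words and dictionary terms, for illustration only.
STOP_WORDS = {"i", "am", "so", "the", "a", "and", "about", "this", "is", "to", "of", "my"}
RAW_DICTIONARY = {"anxious", "lonely", "stress"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "stress" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Lowercase, drop punctuation and numbers, remove stop words, stem."""
    text = tweet.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only a-z letters and whitespace
    tokens = text.split()  # split() also collapses repeated whitespace
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


# Stem the dictionary with the same function so its terms match the stemmed tweets.
DICTIONARY = {crude_stem(term) for term in RAW_DICTIONARY}


def dictionary_count(tweets, dictionary_stems):
    """Count every occurrence of a dictionary term across all tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in preprocess(tweet):
            if token in dictionary_stems:
                counts[token] += 1
    return counts


# Made-up example tweets, not drawn from the study's dataset.
tweets = [
    "Feeling so anxious and stressed about 2020!!",
    "Lonely days... stress is real",
    "Great game tonight",
]
counts = dictionary_count(tweets, DICTIONARY)
print(counts)
```

Because the tweets are stemmed before counting, the dictionary has to pass through the same stemmer; otherwise a term such as "anxious" would never match its stemmed form in the tweets. Running the same count separately on each of the three periods would give the per-period intensities that can then be compared.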
References
Brooks, S. K., Webster, R. K., Smith, L. E., Woodland, L., Wessely, S., Greenberg, N., & Rubin, G. J. (2020). The psychological impact of quarantine and how to reduce it: rapid review of the evidence. The Lancet, 395(10227), 912-920.
Cowan, K. (2020). Survey results: Understanding people’s concerns about the mental health impacts of the COVID-19 pandemic. MQ: Transforming Mental Health and the Academy of Medical Sciences, 2020.
Edo-Osagie, O., De La Iglesia, B., Lake, I., & Edeghere, O. (2020). A scoping review of the use of Twitter for public health research. Computers in biology and medicine, 122, 103770. https://doi.org/10.1016/j.compbiomed.2020.103770
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press. Chapter 13.
Marshall, C., Lanyi, K., Green, R., Wilkins, G. C., Pearson, F., & Craig, D. (2022). Using Natural Language Processing to Explore Mental Health Insights From UK Tweets During the COVID-19 Pandemic: Infodemiology Study. JMIR infodemiology, 2(1), e32449. https://doi.org/10.2196/32449
Ortiz-Ospina, E. (2019). The rise of social media. OurWorldInData.org. https://ourworldindata.org/rise-of-social-media
Pedrosa, A. L., Bitencourt, L., Fróes, A. C. F., Cazumbá, M. L. B., Campos, R. G. B., de Brito, S. B. C. S., & Simões E Silva, A. C. (2020). Emotional, Behavioral, and Psychological Impact of the COVID-19 Pandemic. Frontiers in psychology, 11, 566212. https://doi.org/10.3389/fpsyg.2020.566212
Pradyumn, M., Kapoor, A., & Tabrizi, N. (2018). Big Data Analytics on Twitter: A Systematic Review of Applications and Methods. Big Data – BigData 2018, 326–333. https://doi.org/10.1007/978-3-319-94301-5_26
Qiu, J., Shen, B., Zhao, M., Wang, Z., Xie, B., & Xu, Y. (2020). A nationwide survey of psychological distress among Chinese people in the COVID-19 epidemic: implications and policy recommendations. General psychiatry, 33(2), e100213. https://doi.org/10.1136/gpsych-2020-100213
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58, 1064-1082.
Sengupta, S., Mugde, S., & Sharma, G. (2020). An Exploration of Impact of COVID 19 on mental health -Analysis of tweets using Natural Language Processing techniques. medRxiv. https://doi.org/10.1101/2020.07.30.20165571
Siddiqui, S., Alhamdi, H. W. S., & Alghamdi, H. A. (2022). Recent Chronology of COVID-19 Pandemic. Frontiers in public health, 10, 778037. https://doi.org/10.3389/fpubh.2022.778037
Zhang, J., Lu, H., Zeng, H., Zhang, S., Du, Q., Jiang, T., & Du, B. (2020). The differential psychological distress of populations affected by the COVID-19 pandemic. Brain, behavior, and immunity, 87, 49–50. https://doi.org/10.1016/j.bbi.2020.04.031