ECPR

A nine-language longitudinal analysis of migration-related content on three social media platforms, 2014-2019

European Politics
Media
Migration
Methods
Quantitative
Social Media
Comparative Perspective
Big Data
Yiyi Chen
Aalborg Universitet
Anamaria Dutceac Segesten
Lunds Universitet

Abstract

How does media coverage of migration change over time across nine different languages? Our study addresses this question in the context of the Syrian refugee crisis in Europe, and covers a period that includes the beginning of the crisis as well as its conclusion (in the form of the adoption by EU member states of stricter asylum policies restricting the Mediterranean route). We use a large dataset (31 million posts) of social media content (Twitter messages, YouTube comments and Reddit threads) in nine languages** to address two of the largest challenges of computational text analysis: longitudinal change and multilingual comparison. Our methodological innovation is motivated by the theoretical need to capture the shifting nature of online expressions of opinion, as well as their variation according to domestic political contexts.
As computational approaches have gained ground in political communication (Baden et al., 2022)*, computational linguistics methods have become widely embraced. Topic models in particular have enjoyed widespread adoption, especially “bag-of-words” formats such as LDA (Blei et al., 2003). However, LDA models suffer from significant shortcomings when applied to social media texts: they cannot account for the structural complexity or the evolution of social media content, which is known for its ephemeral nature. Refinements of LDA have included the Dynamic Topic Model (Blei & Lafferty, 2006) and the topics-over-time model (Wang & McCallum, 2006). The largest problem encountered when applying these longitudinal models is that they demand extensive computational power, which makes modeling large datasets very difficult. Another problem, encountered when dealing with topics of worldwide relevance such as migration, is that restricting data to the predominant English-language content leads to a skewed view of the phenomenon, one that leaves out perspectives from the countries directly affected by migration flows.
In the case of the Syrian refugee crisis, the languages of the European countries that received the most asylum seekers (Greece, Italy, and Spain) would be excluded from the analysis. To address this issue computationally, other studies have relied on automatic translation into English (see Lucas et al., 2015). However, this method does not scale to large multilingual corpora and is also financially costly. Alternative methods have been to fit topic models separately by language (Heidenreich et al., 2019), or to use multilingual word embeddings (Chan et al., 2020) or multilingual dictionaries (Maier et al., 2021). However, these approaches may not maintain comparability across languages. To address these shortcomings, our paper proposes a new approach that is partly inspired by the Geometry-Driven Longitudinal Topic Model developed by Wang et al. (2021), which has been tested successfully on social media data. This is a probabilistic model, which makes it a good match for the other solution we adopt to address linguistic diversity, namely the Polylingual Topic Model developed by DeSmet (2009) and tested on three languages and the topic of migration by Lind et al. (2021).

*Bibliographical information available upon request.
**English, Spanish, French, German, Italian, Dutch, Swedish, Danish, Polish.
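To make the “bag-of-words” and longitudinal, multilingual framing more concrete, the following is a minimal illustrative sketch and not the authors' pipeline. It assumes a hypothetical toy list of posts tagged with a language code and a year, and shows how word counts (with word order discarded) can be aggregated per language and time slice, which is the kind of input that LDA-style models typically consume. The sample posts, the `POSTS` name, and both function names are assumptions introduced here for illustration only.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy posts: (language code, year, text).
# These are invented examples, not drawn from the study's dataset.
POSTS = [
    ("en", 2015, "Refugees arrive at the border"),
    ("en", 2015, "The border is closed to refugees"),
    ("en", 2019, "New asylum policy adopted"),
    ("es", 2015, "Los refugiados llegan a la frontera"),
]

# Simple Unicode-aware word tokenizer (an assumption; real pipelines
# usually apply language-specific tokenization and stop-word removal).
TOKEN = re.compile(r"\w+", re.UNICODE)


def bag_of_words(text):
    """Lowercase, tokenize, and count words; word order is discarded."""
    return Counter(TOKEN.findall(text.lower()))


def slice_counts(posts):
    """Aggregate bag-of-words counts per (language, year) slice."""
    slices = defaultdict(Counter)
    for lang, year, text in posts:
        slices[(lang, year)].update(bag_of_words(text))
    return dict(slices)


if __name__ == "__main__":
    for key, counts in sorted(slice_counts(POSTS).items()):
        print(key, counts.most_common(3))
```

Keeping language and time as explicit keys is one simple way to preserve the two comparison axes the abstract emphasizes. A longitudinal or polylingual topic model would go further by linking topics across these slices rather than treating each slice independently.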