ECPR

Contribution (4): A Bridge Over the Language Gap: Employing Topic Modelling for Text Analyses Across Languages for Country Comparative Research

European Politics
Migration
Methods
Quantitative
Communication
Comparative Perspective
Big Data
Refugee
Fabienne Lind
University of Vienna
Jakob-Moritz Eberl
University of Vienna
Olga Eisele
University of Amsterdam
Tobias Heidenreich
WZB Berlin Social Science Center
Hajo Boomgaarden
University of Vienna

Abstract

Our analysis advances country-comparative research on the timely topic of migration in Europe. More precisely, our study adds to the understanding of which European migration topics are emphasised, and how they are emphasised differently or similarly, across European countries and languages. Methodologically, it speaks to the discussion about automated content analysis of multilingual text corpora. We focus here on topic modelling for comparative social science questions. When social scientists use topic modelling to discover topics in a text corpus, they usually do so for texts in a single language, mostly English. This practice reflects neither the available digitized multilingual text landscape nor the plethora of substantive comparative research interests. Recent contributions have addressed this shortcoming mostly by relying on (expensive) translation routines. To avoid these, we work with the Polylingual Topic Model proposed by Mimno et al. (2009), which derives lists of related clustered words in different languages without the use of translation. The model bridges the gap between languages by making use of document connections, i.e. online links between documents that are not direct translations of each other but cover the same topics (e.g. Wikipedia articles on the same topic in different languages). The main contribution of our approach is that we propose to connect documents by their publication date and a broader joint topical focus. Demonstrating the general applicability of this approach in political and communication science, we work with four exemplary types of political text data (news media coverage, social media communication of political actors, government press releases, and political speeches) in seven languages.
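To make the linking idea more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a corpus already filtered to one broad topical focus (e.g. migration), given as (date, language, tokens) records. It links documents across languages by publication date and then fits a simplified polylingual topic model with a plain collapsed Gibbs sampler and symmetric Dirichlet priors: each date-linked tuple shares one topic distribution across languages, while every language keeps its own topic-word counts. The function names (build_tuples, pltm_gibbs, top_words), the rule of concatenating same-language documents from one date, and the prior values are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def build_tuples(docs):
    """Link documents across languages by publication date (hypothetical rule).

    docs: iterable of (date, language, tokens). Documents are assumed to be
    pre-filtered to one broad topical focus (e.g. migration). Same-language
    documents from one date are concatenated (an assumption of this sketch),
    so each tuple holds at most one pseudo-document per language.
    """
    by_date = defaultdict(lambda: defaultdict(list))
    for date, lang, tokens in docs:
        by_date[date][lang].extend(tokens)
    return [dict(langs) for _, langs in sorted(by_date.items())]


def pltm_gibbs(tuples, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for a simplified polylingual topic model.

    Each tuple shares one topic distribution across its languages; each
    language has its own topic-word counts. Symmetric priors are assumed.
    """
    rng = random.Random(seed)
    langs = sorted({lang for t in tuples for lang in t})
    V = {lang: len({w for t in tuples for w in t.get(lang, [])}) for lang in langs}
    n_dk = [[0] * K for _ in tuples]  # tuple-topic counts, shared across languages
    n_kw = {lang: [defaultdict(int) for _ in range(K)] for lang in langs}  # topic-word counts per language
    n_k = {lang: [0] * K for lang in langs}  # topic totals per language
    z = []  # z[d][lang][i]: topic of token i in language lang of tuple d

    def update(d, lang, k, w, delta):
        n_dk[d][k] += delta
        n_kw[lang][k][w] += delta
        n_k[lang][k] += delta

    # Random initialisation of topic assignments.
    for d, t in enumerate(tuples):
        z.append({})
        for lang, tokens in t.items():
            z[d][lang] = [rng.randrange(K) for _ in tokens]
            for w, k in zip(tokens, z[d][lang]):
                update(d, lang, k, w, +1)

    for _ in range(iters):
        for d, t in enumerate(tuples):
            for lang, tokens in t.items():
                for i, w in enumerate(tokens):
                    update(d, lang, z[d][lang][i], w, -1)  # remove current token
                    # Shared tuple-level term times language-specific word term.
                    weights = [
                        (n_dk[d][k] + alpha)
                        * (n_kw[lang][k][w] + beta) / (n_k[lang][k] + V[lang] * beta)
                        for k in range(K)
                    ]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][lang][i] = k
                    update(d, lang, k, w, +1)
    return n_dk, n_kw


def top_words(n_kw, n=5):
    """Top-n words per topic and language: one aligned word list per language."""
    return {
        lang: [[w for w in sorted(c, key=c.get, reverse=True) if c[w] > 0][:n] for c in counts]
        for lang, counts in n_kw.items()
    }
```

In use, one would pass (date, language, tokens) records to build_tuples, fit pltm_gibbs(tuples, K=...) and inspect top_words(n_kw): topic k then comes with one word list per language, and the rows of n_dk give tuple-level topic shares that can be compared across countries and over time. Because date-linked documents are only loosely comparable, the date window and the number of topics are choices one would want to test rather than take as given.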