ECPR

The Analytical Capacities of Vector-Based Text Analysis Methods in Political Science

Comparative Politics
Parliaments
Political Methodology
Quantitative
Big Data
Christoph Deppe
Helmut-Schmidt-University/University of the Armed Forces Hamburg

Abstract

Recent methodological innovations in natural language processing promise greater analytical capacities, with potential applications throughout political science. The demand for methodological innovation in political science is apparent, since researchers now have access to more political text than ever before. This is especially visible in parliamentary research, since the plenary protocols of most parliaments are now digitally available in annotated corpora (Blaette 2017). Word-vector-based methods can offer political scientists a practical new way to analyze these vast amounts of political text with minimal resources, while providing a high level of language representation (Rodman 2020). These innovations thus have the potential to become powerful tools in political science. However, they should be implemented mindfully, and their results should not be taken at face value. The verification processes these novel methods undergo in computer science are not sufficient to validate them for applications in political science. Instead, they need to undergo critical methodological reflection and validation within political science for their specific applications. This paper therefore aims to contribute to the validation of word-vector-based methods for the analysis of textual data in political science, namely German-language plenary protocols, by comparing four generations of text analysis methods for measuring document distances. The following four generations of methods will be included in the analysis: first, a human-coded analysis to measure ideological distances; second, the word-frequency-based ideological scaling method Wordfish (Slapin and Proksch 2008); third, Word2Vec, which estimates vector representations of words from training data (Mikolov et al. 2013), together with Word Mover's Distance (WMD), which uses the word vectors generated by Word2Vec to calculate the distance between texts (Kusner et al. 2015); and fourth, the novel language representation model BERT (Devlin et al. 2018). The methods of generations one and two are included to provide reference points for evaluating the novel methods. First, the paper will discuss the advantages and drawbacks of the advanced vector-based methods Word2Vec, WMD and BERT. Second, all four generations of methods will be applied in a case study of the parliamentary debates on direct democracy in the state parliament of the German city-state of Hamburg between 1997 and 2018. The aim of the case study is to measure the positions and polarization of parties on the issue of direct democracy in Hamburg over time. Direct democracy was a highly contentious topic in Hamburg's recent political history and is therefore well suited for measuring party positions and polarization over time. Third, the results of the analyses will be compared with one another to evaluate the analytical added value of the novel methods relative to the known capacities of the established methods. This may serve as a starting point for evaluating whether these novel methods can indeed provide greater analytical capacities for the study of parliamentary debates.
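To illustrate how WMD turns word vectors into a distance between texts, the following is a minimal sketch of the measure as defined by Kusner et al. (2015). Each text is represented as a normalized bag of words. The distance is the minimum total cost of moving the word mass of one text onto the other, where the cost between two words is the Euclidean distance between their vectors. The sketch solves this transport problem exactly as a min-cost flow in plain Python, which is only practical for toy inputs.

Several details are assumptions for illustration, not the author's implementation:
- the function name word_movers_distance;
- the choice to drop words that have no vector;
- the two-dimensional toy vectors, which are hypothetical stand-ins for Word2Vec embeddings that would in practice be trained on the plenary protocols.

For real corpora, an established implementation such as gensim's KeyedVectors.wmdistance would be the more practical choice.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two word vectors (the WMD travel cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def word_movers_distance(doc_a, doc_b, vectors):
    """Exact WMD between two token lists, solved as a min-cost flow.

    Assumption of this sketch: tokens without a vector are dropped.
    Bellman-Ford-based successive shortest paths: toy-sized inputs only.
    """
    count_a = Counter(w for w in doc_a if w in vectors)
    count_b = Counter(w for w in doc_b if w in vectors)
    if not count_a or not count_b:
        raise ValueError("both documents need at least one known word")
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    words_a, words_b = list(count_a), list(count_b)
    # Scale both normalized bag-of-words histograms to the common integer
    # mass total_a * total_b, so that supplies and demands are exact integers.
    supply = [count_a[w] * total_b for w in words_a]
    demand = [count_b[w] * total_a for w in words_b]

    n, m = len(words_a), len(words_b)
    source, sink = 0, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, word_i in enumerate(words_a):
        add_edge(source, 1 + i, supply[i], 0.0)
        for j, word_j in enumerate(words_b):
            travel = euclidean(vectors[word_i], vectors[word_j])
            add_edge(1 + i, 1 + n + j, supply[i], travel)
    for j in range(m):
        add_edge(1 + n + j, sink, demand[j], 0.0)

    total_cost, remaining = 0.0, sum(supply)
    while remaining > 0:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for k, (to, cap, cost, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[to] - 1e-12:
                        dist[to] = dist[u] + cost
                        prev[to] = (u, k)
                        updated = True
            if not updated:
                break
        # Push as much word mass as the path's bottleneck capacity allows.
        push, node = remaining, sink
        while node != source:
            u, k = prev[node]
            push = min(push, graph[u][k][1])
            node = u
        node = sink
        while node != source:
            u, k = prev[node]
            edge = graph[u][k]
            edge[1] -= push
            graph[node][edge[3]][1] += push
            node = u
        total_cost += push * dist[sink]
        remaining -= push
    # Undo the scaling to obtain the distance under normalized word weights.
    return total_cost / (total_a * total_b)


# Hypothetical 2-D toy vectors; a real analysis would use trained Word2Vec vectors.
toy_vectors = {
    "volksentscheid": (0.0, 0.0),
    "referendum": (0.0, 1.0),
    "haushalt": (10.0, 0.0),
    "etat": (10.0, 1.0),
}
print(word_movers_distance(["volksentscheid", "haushalt"],
                           ["referendum", "etat"], toy_vectors))  # prints 1.0
```

In the usage example, each word of the first text moves to its near neighbour in the second text at a cost of 1.0, so the distance is 1.0. The far-away cross pairings are avoided because the flow minimizes total travel cost.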