ECPR

The Use of Machine Learning Methods in Political Science: an In-depth Literature Review

Methods
Mixed Methods
Big Data
Jef de Slegte
Vrije Universiteit Brussel
Bram Spruyt
Vrije Universiteit Brussel
Filip Van Droogenbroeck
Vrije Universiteit Brussel

Abstract

In the past decade, machine learning methods have grown in popularity in political science. The increase in data volume and data sources has motivated researchers to turn to these inductive, data-driven methods as an alternative to deductive classical statistics. Especially in the analysis of large text corpora, machine learning has contributed significantly to new insights in domains such as political communication, conflict and peace studies, and policy studies. Several review papers have proposed theoretical typologies for applying machine learning in political science and the social sciences in general. What is missing, however, is an overview of how and why machine learning methods are actually implemented in political science. The aim of this study is therefore to conduct an empirical analysis of the political science literature that uses machine learning as a research method, and to answer the following research questions: (1) in which research domains are machine learning methods most commonly used, (2) which machine learning methods are the most and least prevalent in political science, (3) are these methods implemented according to best practices, and (4) what are the opportunities for future research given the rapid evolution of machine learning? We applied the PRISMA framework for systematic reviews to select 339 articles (1990-2022) from Web of Science and Scopus in a structured manner, evaluated their relevance to the review based on a set of inclusion criteria, and created a dedicated database to store the key characteristics of the articles. We found that machine learning is most prevalent in political communication and in conflict and peace studies. For text-as-data, topic modelling is the most used method, while for tabular data support vector machines and random forests are the most prevalent.
Furthermore, our results indicate that the majority of studies could report more transparently on how machine learning models were optimized through hyper-parameter tuning. Researchers often rely on the existing applied literature when selecting a machine learning method, whereas conducting their own benchmarking exercise would be more beneficial for choosing the most suitable model. As a result, they miss the opportunity to fully harness the potential of these methods.
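
To make the last two points concrete, the following is a minimal sketch of what a small benchmarking exercise with transparent hyper-parameter reporting could look like. It is not taken from the reviewed studies: the data are synthetic, the function names (`make_toy_data`, `benchmark`, `CANDIDATES`) are hypothetical, and a k-nearest-neighbours classifier and a nearest-centroid classifier stand in for models such as support vector machines or random forests so that the example needs only the Python standard library.

```python
import random
from statistics import mean


def make_toy_data(n=120, seed=0):
    """Synthetic two-class data (illustrative only, not from any reviewed study)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label  # class 0 around (0, 0), class 1 around (2, 2)
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(rows)
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (k should be odd)."""
    nearest = sorted(train, key=lambda row: sq_dist(row[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def centroid_predict(train, point):
    """Assign the class whose training centroid is closest; no hyper-parameters."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


def cv_accuracy(data, predict, params, folds=5):
    """Mean accuracy over a simple interleaved k-fold split."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(predict(train, x, **params) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


def benchmark(data, candidates, folds=5):
    """Evaluate every model/configuration pair and keep a full log of what was tried."""
    log = []
    for name, predict, grid in candidates:
        for params in grid:
            log.append({
                "model": name,
                "params": params,
                "folds": folds,
                "cv_accuracy": cv_accuracy(data, predict, params, folds),
            })
    best = max(log, key=lambda record: record["cv_accuracy"])
    return best, log


# Candidate models with the hyper-parameter grid searched for each one.
CANDIDATES = [
    ("knn", knn_predict, [{"k": k} for k in (1, 3, 5, 7, 9)]),
    ("nearest_centroid", centroid_predict, [{}]),
]

if __name__ == "__main__":
    best, log = benchmark(make_toy_data(), CANDIDATES)
    for record in log:
        print(record)  # the complete log is what a paper would report
    print("selected:", best)
```

The design choice worth noting is that `benchmark` returns the complete log rather than only the winning configuration. Reporting that log (models compared, grids searched, number of folds, scores) is one way a study could document its tuning; a real analysis would also hold out a separate test set for the final estimate.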