ECPR

The best of both worlds - Improving human coding with automatic classification models

Party Manifestos
Political Parties
Methods
Pola Lehmann
WZB Berlin Social Science Center
Tobias Burst
WZB Berlin Social Science Center

Abstract

Human coding of large text corpora is an important but very labor-intensive and time-consuming pillar of political science research. Consequently, the rise and constant improvement of automated classification models presents a promising opportunity to alleviate this burden by enabling researchers to carry out large coding tasks in a fraction of the time and work typically required. However, the reliability, consistency and quality of automated coding remain a concern. Can the computer really take over from humans, and how comparable will the results be? This question is especially important for long-standing human coding projects like the Manifesto Project, which need to ensure reliable results not only at a single point in time but also over time. In our paper, we examine the performance of a shared-layer XLM-RoBERTa model on the fine-grained coding scheme of the Manifesto Project, utilizing the context of the sentence to be classified. Our results show that the model outperforms existing manifesto classification models while staying below the quality level of human coding. In light of this, we discuss the potential benefits and limitations of automated approaches for coding tasks, including potential areas for, and the extent of, their integration into the coding process.