A Neural Network Model for Algorithmic Positioning of Political Parties
Party Manifestos
Political Methodology
Political Parties
Big Data
Abstract
Ideological position along well-defined axes, such as the standard left-right axis, is one of the key variables used to describe political parties. Yet it is not directly observable. Political scientists who use it in their research therefore usually rely on standard data sets such as the Chapel Hill Expert Survey (CHES), the Manifesto Research Project (MARP), or Varieties of Democracy (V-DEM). Those, in turn, mostly rely on expert surveys; MARP, which instead calculates party position from the prevalence of manifesto themes associated a priori with the left and the right, is an exception. The expert survey approach, however, means that researchers are bound by the availability limitations of whatever data source they use. For instance, researchers relying on V-DEM can only use the economic left-right dimension of the policy space, while those using the Chapel Hill Expert Survey are restricted to 32 countries and to a five-year release cycle. Moreover, expert surveys inevitably introduce a risk of bias.
One alternative is to position parties automatically on the basis of their programs. Yet the leading methods in this area, the WordScore and WordFish algorithms, have failed to supplant expert surveys as the tool of choice. One reason for their limited performance is that both date back to the 2000s and rely on the bag-of-words approach to natural language processing, which was state of the art at the time. Since then, the field of algorithmic natural language processing has seen revolutionary advances due to the rise of transformer-based large language models. We propose a new method for algorithmic party positioning that builds upon those advances.
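As a brief illustration of the bag-of-words limitation, the sketch below shows how two opposite pledges collapse into the same representation once word order is discarded. The sentences and the helper `bag_of_words` are hypothetical examples of ours and are not taken from WordScore, WordFish, or any manifesto.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, discarding word order entirely."""
    return Counter(text.lower().split())


# Two hypothetical pledges with opposite meanings but the same words.
pledge_a = bag_of_words("the party will raise taxes not spending")
pledge_b = bag_of_words("the party will raise spending not taxes")

# The two representations are indistinguishable to any model
# that only sees word counts.
identical = pledge_a == pledge_b
```

Models that embed text in context can, in principle, distinguish such cases, which is the motivation for building on transformer-based language models.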
We present a machine learning model that predicts a political party's position on a variety of ideological axes, such as the general left-right axis, the economic left-right axis, and the GAL-TAN axis used by CHES, from the text of the party's electoral manifesto. The model's main component is a neural network trained on the Manifesto Corpus and the Chapel Hill Expert Survey to predict ideological positions from a high-dimensional vector embedding of the party manifesto. That embedding, in turn, is generated using an ensemble of large language models. We evaluate the model's performance on an independent validation set, using the mean squared error on the CHES scale as our evaluation metric. We demonstrate that our model constitutes a significant improvement upon standard tools such as MARP, WordFish, and WordScore.
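The pipeline described above (manifesto embedding, neural network regressor, mean squared error on a held-out set) can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the architecture used in the paper: the embeddings are random synthetic stand-ins rather than language model outputs, the targets are simulated on an assumed 0-10 scale, and the network size, learning rate, and split are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 300 "manifestos", 16-dim embeddings,
# 3 target axes (e.g. general left-right, economic left-right, GAL-TAN).
n_docs, dim, n_axes = 300, 16, 3
X = rng.normal(size=(n_docs, dim))
true_W = rng.normal(size=(dim, n_axes)) / np.sqrt(dim)
Y = 5 + 2 * np.tanh(X @ true_W)  # simulated positions on an assumed 0-10 scale

# Hold out an independent validation set.
X_train, Y_train = X[:240], Y[:240]
X_val, Y_val = X[240:], Y[240:]


def mse(pred, target):
    """Mean squared error over all parties and axes."""
    return float(np.mean((pred - target) ** 2))


def forward(params, X):
    """One hidden tanh layer followed by a linear output layer."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2


def train(X, Y, hidden=8, lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent on the MSE loss."""
    init = np.random.default_rng(seed)
    W1 = init.normal(size=(X.shape[1], hidden)) / np.sqrt(X.shape[1])
    b1 = np.zeros(hidden)
    W2 = np.zeros((hidden, Y.shape[1]))  # start at the mean-position predictor
    b2 = Y.mean(axis=0)
    for _ in range(epochs):
        H, pred = forward((W1, b1, W2, b2), X)
        G = 2 * (pred - Y) / Y.size       # gradient of MSE w.r.t. predictions
        GH = (G @ W2.T) * (1 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2


params = train(X_train, Y_train)
train_mse = mse(forward(params, X_train)[1], Y_train)
val_mse = mse(forward(params, X_val)[1], Y_val)

# Baseline for comparison: always predict the mean training position.
baseline_train_mse = mse(np.tile(Y_train.mean(axis=0), (len(Y_train), 1)), Y_train)
```

In a real application, `X` would hold the manifesto embeddings and `Y` the corresponding expert survey scores; the validation error would then be compared against the errors of the competing methods on the same parties.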