In recent years, quantitative studies have started to look at the natural
language content in parliamentary debates as a primary data source, arguing that
party classification and misclassification can be used as measures of
polarization between parties in parliamentary systems. The intuition is that
better-performing classifiers indicate more polarization between parties and
worse-performing classifiers indicate less: when a classifier cannot
distinguish between the parties, the parties are presumed to be more alike, and
vice versa. What has been
ignored in this exercise, however, is the effect of different pre-processing
decisions and model specification approaches on classification performance. Do
different ways of modeling language lead to different conclusions?
In this study, we show how pre-processing and model specification can influence
these measures. We utilize a richly annotated dataset of parliamentary speeches
in the Norwegian Storting from the 1998-1999 through the 2015-2016 sessions. In
addition to speech-level metadata, such as the member of parliament, their
party, and more, the speeches themselves are automatically annotated with
sentence boundaries, parts of speech, and lemmas. Using both institutional and
linguistic features, we are able to build both bare-bones language classifiers
and ones that take more of the complexity of the language into account.
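To make concrete what we mean by different specifications, the sketch below contrasts a bare-bones bag-of-words classifier with a richer one built on lemmatised input and n-grams. It is a minimal illustration assuming a scikit-learn setup; the vectoriser settings, the logistic regression model, and the helper name party_classification_score are illustrative placeholders rather than the exact pipeline evaluated in this study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def party_classification_score(texts, party_labels, rich=False):
    """Cross-validated macro-F1 of a party classifier under one specification.

    `texts` holds one string per speech (raw text for the bare-bones
    specification, lemmatised text for the richer one); `party_labels` holds
    the speaker's party for each speech. The vectoriser settings below stand
    in for the pre-processing choices under comparison; they are illustrative
    assumptions, not the paper's actual feature set.
    """
    vectoriser = TfidfVectorizer(
        lowercase=True,
        ngram_range=(1, 2) if rich else (1, 1),
        min_df=5,
    )
    model = make_pipeline(vectoriser, LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, texts, party_labels, cv=5, scoring="f1_macro")
    return scores.mean()


# Hypothetical usage (variable names are placeholders): if the two scores
# disagree markedly, the polarization read off the classifier depends on the
# specification rather than on the parties alone.
# bare = party_classification_score(raw_speeches, parties, rich=False)
# rich = party_classification_score(lemmatised_speeches, parties, rich=True)
```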
Our main hypothesis is that if the results are stable across specifications,
classification performance as a polarization measure is a promising approach.
Conversely, if the results vary substantially across specifications, we would need to
be careful when applying measures of polarization derived from party
classification of speech data: it will sometimes be unclear whether variation in
the precision of a classifier is caused by misspecified models, deviation from
the party line, or polarization between parties.