From Semantics to Substance: Refining Ideological Analysis in Legislative Texts
Parliaments
Political Methodology
Methods
Quantitative
Political Ideology
Abstract
Advances in Natural Language Processing (NLP) and the growing availability of alternative political texts, such as parliamentary speeches and social media data, have offered new opportunities to determine parties’ policy preferences and to locate parties and politicians within latent ideological spaces. Doing so is fundamental to understanding political dynamics and addressing key research questions in the field.
While early computational tools, such as Wordscores and Wordfish, laid the foundations for text-as-data approaches, they have been progressively replaced by word-embedding models such as Word2Vec, GloVe, and FastText. More recently, the advent and proliferation of transformer-based models have further revolutionized the field, with their estimates correlating highly with more traditional and widely used measures (e.g. Manifesto Project, Chapel Hill Expert Survey). Yet all these methods share a key theoretical assumption: semantic similarity in political language reflects ideological proximity. While political language is undoubtedly influenced by ideology and policy positioning, it is also shaped by numerous other factors, including rhetorical style, emotional and psychological appeals, audience targeting, and contextual influences. These factors can significantly affect the linguistic patterns of political actors, potentially biasing results and distorting actors' placement in latent ideological spaces. The risk is heightened when policy preferences are analyzed at a finer level of granularity or when the textual data available for analysis is limited.
This assumption therefore warrants critical scrutiny. Does semantic similarity reliably indicate ideological proximity? Does the relationship hold even when policy preferences are analyzed at a more granular level? It is necessary to evaluate whether state-of-the-art NLP techniques, particularly transformer-based models, can transcend stylistic differences in political language and accurately capture substantive ideological content. To test this, the paper uses two sources of political text from the Portuguese case, covering 2002 to 2024: parliamentary speeches and legislative bills. First, we assess parliamentary speeches, which are frequently used to study rhetorical features of political discourse and are known to be influenced by factors beyond ideological considerations. Second, we examine legislative bills, which, given their formal purpose and institutional context, are likely to be less influenced by rhetorical elements and more policy-oriented. We then compare the results against established benchmarks, specifically the Manifesto Project and roll-call vote analyses, to assess the extent to which semantic similarity can be equated with ideological positioning.
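The comparison described above can be illustrated with a minimal sketch. It is not the paper's pipeline: it assumes that each party has already been reduced to a single embedding vector and a single benchmark score, and it uses cosine similarity and a Spearman rank correlation as stand-ins for whatever measures the study actually applies. The function names, the party labels A, B, and C, and all numbers are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def similarity_vs_benchmark(embeddings, benchmark):
    """Correlate pairwise text similarity with pairwise benchmark distance.

    If semantic similarity tracks ideological proximity, party pairs that
    are far apart on the benchmark should have less similar texts, so the
    correlation should be strongly negative.
    """
    pairs = list(combinations(sorted(embeddings), 2))
    sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
    dists = [abs(benchmark[a] - benchmark[b]) for a, b in pairs]
    return spearman(sims, dists)


# Hypothetical toy inputs: one vector and one left-right score per party.
toy_embeddings = {"A": [1.0, 0.0], "B": [0.6, 0.8], "C": [0.0, 1.0]}
toy_benchmark = {"A": -1.0, "B": 0.2, "C": 1.0}
rho = similarity_vs_benchmark(toy_embeddings, toy_benchmark)
print(f"Spearman correlation between similarity and distance: {rho:.2f}")
```

Under this sketch, running the same function separately on speech-based and bill-based embeddings would show which text source aligns more closely with the benchmark, which is the kind of contrast the research design sets up.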
The study has several potential implications for legislative studies and beyond. First, it promotes critical reflection on a frequently overlooked yet foundational theoretical assumption underlying established methodologies. Second, it proposes a framework for refining the ideological placement of parties and politicians within specific policy domains, considering diverse political content. Third, the approach may offer valuable insights into the dynamics of polarization over extended periods, shedding light on long-term trends. Finally, it has the potential to enhance communication with voters by providing a clearer and more transparent understanding of where parties stand on various issues and of the extent of ideological divergence between them.