ECPR

Mapping the Media in Two Dimensions through Large Language Models and Machine Learning

Media
Internet
Political Ideology
Public Opinion
Technology
Voting Behaviour
Big Data
Influence
Lucas Paulo da Silva
Trinity College Dublin

Abstract

The "minimal media effects" literature argues that media outlets do not play a large role in influencing voting behaviour, primarily because voters already select ideologically like-minded media outlets. However, cross-pressured voters (CPVs) may lack these like-minded outlets in the first place. CPVs hold economically leftist and culturally conservative positions (left-conservatives) or economically rightist and culturally progressive positions (right-progressives). Based on the market and journalistic models of media ideology, I argue that media outlets that are more economically rightist are also more culturally conservative. Previous approaches to media ideology have been unable to test this relationship because they lack data on the two-dimensional ideological positions of a sufficiently large number of outlets. To test this relationship for the first time, I must take an innovative approach to data collection. I construct a novel corpus of 120,000 online news articles from 120 outlets in the UK, Ireland, the US, Canada, Australia, and New Zealand. This provides enough articles to measure the economic and cultural positions of each outlet and enough outlets to test the relationship between these positions. However, it is difficult to classify an article's ideological position because doing so involves high-level contextual reasoning. Furthermore, news articles are not usually explicit about ideological positions. Automated dictionary approaches struggle to accurately classify the ideology of such texts, especially in two distinct ideological dimensions. Human coding would be expensive and time-consuming for this project, even to obtain a smaller training set for supervised machine learning classification. Thus, I use an innovative approach that incorporates large language models such as OpenAI's GPT.
First, I use a Structural Topic Model to assign the 120,000 articles to economic or cultural "supertopics." Next, I use large language models to label 2,000 of the economic articles as leftist/centrist/rightist and 2,000 of the cultural articles as progressive/centrist/conservative. In early tests, I have compared 200 of these articles to expert-coded labels and found impressive similarity, despite the challenges of labelling ideologically subtle texts on two different ideological dimensions. This provides a labelled dataset of 4,000 articles. I use the embeddings of this labelled data to train, test, and validate a supervised machine learning model (Support Vector Machine). Then I calculate ideological scores for the remaining 116,000 articles using the Support Vector Machine's predicted probabilities for ideological classifications. Finally, I compute the mean economic and cultural positions for each outlet and test the relationship between them with regression. This design incorporates a variety of innovative methods to examine a relationship that has been difficult to test so far. These methods enable research that is consequential for academia and society, allowing us to better understand the political media landscape, media effects, and voting behaviour. If CPVs are constrained by the supply of media to select outlets with which they disagree on at least one of the two dimensions (economic or cultural), this exposure to divergent media outlets could significantly alter their voting calculations and voting behaviour.