ECPR

As bad as humans: Can machine learning text analysis be used for framing analysis?

Media
Methods
Communication
Big Data
Constantinos Djouvas
Cyprus University of Technology
Vasileios Manavopoulos
University of Cyprus
Fernando Mendez
University of Zurich

Abstract

In this paper we validate the suitability of state-of-the-art machine learning techniques for performing framing analysis. A core methodology in political communication research, framing analysis is one of the dominant approaches to the study of news media. As defined by Gamson and Modigliani (1989) in a widely cited work, a media frame is “a central organizing idea or story line that provides meaning to an unfolding strip of events”. Framing, as we use the term, is the way in which different aspects of a news story or a message are made salient. Traditionally, framing research is a labour-intensive process requiring multiple iterations of deductive and inductive approaches to frame building. Deductive approaches draw on a priori frames, while inductive approaches are more issue-specific and self-generating. Researchers typically blend the two to produce a very detailed codebook of frames and coding instructions, and coders frequently require extensive training over multiple rounds. Described as such, framing analysis is an area of social science research that could putatively benefit from some of the latest advances in computational text analysis. Not surprisingly, computational approaches to framing analysis already exist. While Walter and Ophir (2019) draw on topic models to inductively identify frames in US news channels, Kwak et al. (2020) leverage machine learning models, namely a pre-trained BERT model, to build a media frame classifier. Our approach is closer to the second of these and similarly leverages pre-trained transformer models. As with Kwak et al. (2020), we also take inspiration from an earlier computational approach to media frames, the Media Frames Corpus of Card et al. (2015), who noted the inherent difficulty of conducting framing analysis, its subjectivity and the typically low levels of inter-coder agreement among annotators.
This is our starting point in asking the following question: could a turn to machine learning text analysis yield results that are at least as bad as those of humans? While this sets the bar seemingly low, the gains would be non-trivial if computational methods could significantly reduce the human workload while achieving similar results. To validate our turn to machine learning we draw on two distinct types of corpora, news articles and tweets, that focus on a type of democratic innovation known as participatory budgeting. The collection of documents is in three international languages (English, French and Spanish) and comprises just under 30,000 news articles and over a quarter of a million tweets. We use computational methods to perform two critical tasks: (i) identify relevant documents and (ii) identify frames. Our procedures are validated against a manually coded test set of 600 news articles and 3,000 tweets. The results show that the proposed methodology would not only have saved significant human resources but, most importantly, vastly expanded the scope of the analysis.
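One way to make the "at least as bad as humans" benchmark concrete is to compare how well a classifier agrees with each human coder against how well the human coders agree with one another. The sketch below is illustrative only and is not taken from the paper: the choice of Cohen's kappa as the agreement measure, the function names, the frame labels and the toy data are all assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which the two annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def at_least_as_bad_as_humans(coder_1, coder_2, model):
    """Hypothetical criterion: the model's weakest agreement with either
    human coder is no lower than the agreement between the two coders."""
    human_kappa = cohen_kappa(coder_1, coder_2)
    model_kappa = min(cohen_kappa(model, coder_1), cohen_kappa(model, coder_2))
    return model_kappa >= human_kappa, human_kappa, model_kappa


# Toy frame labels (invented for illustration, not data from the study).
coder_1 = ["economic", "political", "economic", "civic", "political", "civic"]
coder_2 = ["economic", "political", "civic", "civic", "political", "economic"]
model = ["economic", "political", "economic", "civic", "political", "economic"]

passed, human_kappa, model_kappa = at_least_as_bad_as_humans(coder_1, coder_2, model)
print(f"human-human kappa: {human_kappa:.2f}")
print(f"model-human kappa: {model_kappa:.2f}")
print("meets the benchmark" if passed else "falls short of the benchmark")
```

Taking the minimum of the two model-human scores is a deliberately conservative choice. An analysis with more than two coders, or with multiple frames per document, would need a different agreement statistic.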