ECPR

The (real) need for a human touch: Testing a human-machine hybrid topic classification workflow on a New York Times corpus

Media
Methods
Quantitative
Communication
Big Data
Akos Mate
HUN-REN Centre for Social Sciences
Miklos Sebok
HUN-REN Centre for Social Sciences

Abstract

Classifying the items of ever-growing text databases has become an important goal for a number of research groups in comparative politics (see, for instance, the Comparative Agendas Project, CAP). Although trained human coders remain the gold standard for most policy topic labelling projects such as CAP, a growing number of use cases show that the initial effort of human coders can be successfully augmented with supervised machine learning (SML) classification. In this paper we investigate such a hybrid workflow, classifying the lead paragraphs of New York Times front-page articles from 1996 to 2006 according to the CAP policy categories. We find that combining human coding and validation with an ensemble SML approach can reduce the need for human coding while maintaining very high precision and offering modest to good recall. The modularity of this hybrid workflow allows for various setups that address the idiosyncratic resource bottlenecks a large-scale text classification project might face.
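The abstract does not specify how the ensemble's votes are combined. The sketch below assumes a simple agreement threshold (a hypothetical `min_agree` parameter): a document is machine-labelled only when at least that many classifiers vote for the same CAP topic code, and it is otherwise routed to human coders. It illustrates why such a rule can keep precision high while recall stays modest. The document IDs, topic codes, and votes are made-up toy data, not results from the paper. Here "recall" is the share of all documents the machine stage labels correctly, so documents sent to humans count as misses for the machine.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def ensemble_label(votes: Sequence[int], min_agree: int) -> Optional[int]:
    """Return the most-voted topic code if at least `min_agree` classifiers chose it, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def route(predictions: Dict[str, List[int]], min_agree: int) -> Tuple[Dict[str, int], List[str]]:
    """Split documents into machine-labelled ones and a queue for human coders (hypothetical rule)."""
    auto: Dict[str, int] = {}
    human_queue: List[str] = []
    for doc_id, votes in predictions.items():
        label = ensemble_label(votes, min_agree)
        if label is None:
            human_queue.append(doc_id)  # no sufficient agreement: send to a human
        else:
            auto[doc_id] = label
    return auto, human_queue


def precision_recall(auto: Dict[str, int], gold: Dict[str, int]) -> Tuple[float, float]:
    """Precision over machine-labelled docs; recall over all docs (abstentions count as misses)."""
    correct = sum(1 for doc_id, label in auto.items() if gold[doc_id] == label)
    precision = correct / len(auto) if auto else 0.0  # convention: 0.0 when nothing is auto-labelled
    recall = correct / len(gold)
    return precision, recall


# Toy data (hypothetical): votes of three classifiers per document, plus human gold labels.
PREDICTIONS = {
    "a1": [3, 3, 3],
    "a2": [16, 16, 3],
    "a3": [19, 19, 19],
    "a4": [3, 16, 19],
    "a5": [16, 16, 16],
}
GOLD = {"a1": 3, "a2": 3, "a3": 19, "a4": 3, "a5": 16}

if __name__ == "__main__":
    for k in (3, 2):
        auto, queue = route(PREDICTIONS, k)
        p, r = precision_recall(auto, GOLD)
        print(f"min_agree={k}: to humans={len(queue)}, precision={p:.2f}, recall={r:.2f}")
    # min_agree=3: to humans=2, precision=1.00, recall=0.60
    # min_agree=2: to humans=1, precision=0.75, recall=0.60
```

Lowering the threshold sends fewer documents to human coders but lets a majority-vote error (`a2`) through, which is the kind of workload-versus-precision trade-off a modular hybrid setup can be tuned around.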