Crowdsourcing arguments from debates and its implications in machine learning

Internet
Methods
Lab Experiments
Rafael Mestre
University of Southampton

Abstract

Machine learning and, in particular, natural language processing techniques have proven to be powerful tools for analyzing the influence of political discourse on news media, social networks and political debates. A rapidly growing field in this realm is argument mining, the automated extraction of arguments from discussions or deliberations. Despite fast-paced advances in natural language processing, the task of argument identification is still difficult to perform accurately, and it relies heavily on large quantities of data annotated by experts for claims, premises and similar labels. As democracies develop in a digital age, the general public’s understanding of arguments becomes ever more important, since democratic publics are constantly required to weigh opposing arguments from a variety of sources in political campaigns and debates.

Crowdsourcing has been used extensively by scholars to rapidly annotate large amounts of data or to produce datasets on which machine learning models can be trained. Recently, however, many scholars have referred to a “crisis of crowdsourcing”, having discovered that many annotators may use bots to automate the task or fail to engage with it seriously. To partly address this issue, many platforms now include quality settings intended to assure the researcher that annotators are engaging with the task. The usefulness of this methodology for identifying arguments in political debates is less clearly established: preliminary work has shown that crowd annotators may identify arguments that differ significantly from those identified by experts in argumentation. One possible reason for this discrepancy is the difficulty of the task, since crowdsourcing works best for small, simple tasks that do not require high intellectual engagement from the worker.

In this paper, we present a study that aims to unravel the disagreements that arise when crowd workers from two different sources identify arguments in presidential debates. We use a set of transcripts from the 2020 US presidential debates between Joe Biden and Donald Trump. We ask a general crowd of online annotators from a web platform to annotate the argumentative relation between pairs of sentences in the debate, namely ‘support’, ‘attack’ or ‘neither’. We also ask a smaller crowd of expert in-house annotators provided by the platform to do the same task; these annotators can also provide feedback on why they chose their answers. We test the hypothesis that the in-house annotators, who are hired by the platform to do annotation jobs of different kinds, will have more experience and provide annotations with higher agreement. We find, however, that both sets of annotators show similarly low levels of agreement, suggesting that argument annotation may be too complex for crowd annotation even when the annotators are highly experienced crowdworkers. Finally, we qualitatively review some of the answers and reasons provided and shed some light on which scenarios tend to be sources of disagreement and which ones usually attain higher agreement.
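The comparison between the two annotator pools rests on inter-annotator agreement over the three-way ‘support’/‘attack’/‘neither’ labels. As a minimal illustrative sketch (not the authors’ actual pipeline), the Python snippet below computes Krippendorff’s alpha for two hypothetical pools using NLTK’s agreement module; the annotator IDs, sentence-pair IDs and labels are invented placeholders.

```python
# Minimal sketch: agreement over argumentative-relation labels for two annotator pools.
# All data below is hypothetical; the study itself uses debate transcripts.
from nltk.metrics.agreement import AnnotationTask

# Each record is (annotator_id, sentence_pair_id, label),
# with labels drawn from {'support', 'attack', 'neither'}.
crowd_annotations = [
    ("crowd_1", "pair_01", "support"),
    ("crowd_2", "pair_01", "neither"),
    ("crowd_3", "pair_01", "support"),
    ("crowd_1", "pair_02", "attack"),
    ("crowd_2", "pair_02", "attack"),
    ("crowd_3", "pair_02", "neither"),
]

in_house_annotations = [
    ("inhouse_1", "pair_01", "support"),
    ("inhouse_2", "pair_01", "support"),
    ("inhouse_1", "pair_02", "neither"),
    ("inhouse_2", "pair_02", "attack"),
]

# Krippendorff's alpha handles multiple annotators and missing annotations,
# which makes it a common choice when comparing agreement across pools.
for name, data in [("general crowd", crowd_annotations),
                   ("in-house", in_house_annotations)]:
    task = AnnotationTask(data=data)
    print(f"{name} annotators: Krippendorff's alpha = {task.alpha():.3f}")
```

Comparing the two alpha values side by side is one simple way to check the hypothesis that in-house annotators agree more than the general crowd.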