ECPR

Using AI to extract conflict events from historical text sources: The case of the Veritable Annals of the Choson Dynasty, 1413-1865

Conflict
Methods
War
State Power
Big Data
Charles Miller
Australian National University
Dongwook Kim
Australian National University

Abstract

An extensive body of literature argues that violent conflict is a key driver of state formation and political development. However, conflict data are hard to gather for the pre-modern period and for non-Western contexts, which makes the argument difficult to test and advance. One such difficulty is the time and cost of extracting information on conflict from primary text sources, especially those written in non-European languages. In this paper, we propose to demonstrate and evaluate the ability of large language models (LLMs) to produce conflict event data at scale from pre-modern non-Western sources, using the Veritable Annals of the Choson Dynasty, 1413-1865 as our crucial case. Inscribed on UNESCO's Memory of the World Register, the Annals record daily events across the Choson Dynasty, which ruled the territory of modern-day North and South Korea from 1392 to 1897; the Annals themselves were compiled between 1413 and 1865. They therefore provide an excellent primary source for conflict events spanning more than four centuries. However, the size of this source, 419,615 entries, makes extracting data from it with human coders prohibitively expensive. We propose an AI-based solution to this problem. We employ several LLMs to extract data on conflict events from a random sample of the entries. We then validate their performance against ground-truth data produced by a native Korean-speaking research assistant. The models are OpenAI's commercial GPT-4, a pre-trained T5-based Korean-language LLM, and a version of the latter fine-tuned specifically to detect conflict events. We compare the accuracy and cost of each method. We then discuss the potential for applying this approach to historical contexts beyond Korea, especially in East and South Asia, and the contribution this would make to understanding the relationship between conflict and state development.
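The abstract does not specify which metrics are used to compare the models against the human-coded ground truth, or how costs are projected to the full corpus. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each sampled entry receives a binary label (1 if it records a conflict event, 0 otherwise). It scores model labels with standard accuracy, precision, recall and F1, and it extrapolates labelling cost linearly from the sample to all 419,615 entries. The function names `draw_sample`, `score` and `full_corpus_cost` are hypothetical.

```python
import random
from dataclasses import dataclass

# Corpus size reported in the abstract.
TOTAL_ENTRIES = 419_615


def draw_sample(entry_ids, k, seed=0):
    """Draw a simple random sample of k entry IDs without replacement.

    A fixed seed makes the sample reproducible (illustrative choice).
    """
    rng = random.Random(seed)
    return rng.sample(list(entry_ids), k)


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score(predicted, gold):
    """Compare binary model labels with human-coded gold labels.

    Assumption: 1 = entry records a conflict event, 0 = it does not.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sample")
    pairs = [(bool(p), bool(g)) for p, g in zip(predicted, gold)]
    tp = sum(1 for p, g in pairs if p and g)          # events found by both
    fp = sum(1 for p, g in pairs if p and not g)      # model-only events
    fn = sum(1 for p, g in pairs if g and not p)      # events the model missed
    correct = sum(1 for p, g in pairs if p == g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(correct / len(pairs), precision, recall, f1)


def full_corpus_cost(sample_cost, sample_size, total=TOTAL_ENTRIES):
    """Linearly extrapolate the cost of labelling the sample to the whole corpus.

    Assumes cost per entry is roughly constant across the Annals.
    """
    return sample_cost * total / sample_size
```

Linear extrapolation is the simplest projection. For API-priced models such as GPT-4, billing depends on token counts, so a token-weighted estimate may be more accurate if entry lengths vary widely.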