Identifying Social Class and Networks from US Legislative Journals using LLMs

Elites

USA

Methods

Presenter(s)

Ivan Fomichev

Universität Bern

Author(s)

Ivan Fomichev

Universität Bern

Workshop Using Large Language Models (LLMs) and Other AI Tools to Gather and Analyse Political Elite Networks

Abstract

Historical documents, such as nineteenth-century American legislative journals, contain vast amounts of structured information, yet they remain largely inaccessible for quantitative analysis because of non-machine-readable formats and optical character recognition (OCR) noise. This paper presents a scalable, Large Language Model (LLM)-based framework designed to extract structured roll-call data from large corpora of noisy, image-derived text. To overcome LLM context window constraints, the proposed pipeline employs a four-stage "hierarchical narrowing" strategy of indexing, classifying, assembling, and extracting, coupled with internal cross-referencing (matching extracted names to aggregate vote counts) for automated validation. Using the South Carolina House of Representatives (1858–1882) as a demonstration case, the pipeline achieves an extraction accuracy exceeding 95%. The resulting dataset enables the construction of legislative co-voting networks, allowing for the analysis of latent factional structures across the dramatic political ruptures of the antebellum, Civil War, and Reconstruction eras. Ultimately, this approach offers a highly generalizable template for recovering sparse, structured data from historical text at scale.
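The validation and network steps described above can be illustrated with a minimal sketch. It assumes a hypothetical record type, `RollCall`, holding the member names extracted for each side of a vote alongside the yea and nay totals the journal reports; the helper names `passes_count_check` and `co_voting_weights` are likewise illustrative and not taken from the paper. A vote is accepted only if the extracted name lists agree with the reported totals, and accepted votes are then turned into pairwise agreement counts. This is one common way of weighting co-voting edges; the abstract does not specify the weighting the author uses.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class RollCall:
    """Hypothetical record for one roll-call vote extracted by an LLM pipeline."""
    vote_id: str
    yeas: list[str]      # member names extracted as voting yea
    nays: list[str]      # member names extracted as voting nay
    reported_yeas: int   # aggregate yea total stated in the journal
    reported_nays: int   # aggregate nay total stated in the journal


def passes_count_check(rc: RollCall) -> bool:
    """Cross-reference extracted names against the reported totals.

    Accept the record only if each name list has exactly as many entries as
    the stated total, no name is repeated within a list, and no member
    appears on both sides.
    """
    return (
        len(rc.yeas) == rc.reported_yeas
        and len(rc.nays) == rc.reported_nays
        and len(set(rc.yeas)) == len(rc.yeas)
        and len(set(rc.nays)) == len(rc.nays)
        and not set(rc.yeas) & set(rc.nays)
    )


def co_voting_weights(roll_calls: list[RollCall]) -> Counter:
    """Count how often each pair of members voted on the same side.

    Only records that pass the count check contribute; failing records
    are skipped (in practice they might be queued for manual review).
    """
    weights: Counter = Counter()
    for rc in roll_calls:
        if not passes_count_check(rc):
            continue
        for side in (rc.yeas, rc.nays):
            # Sorting makes each pair key order-independent, e.g. ("Adams", "Brown").
            for a, b in combinations(sorted(side), 2):
                weights[(a, b)] += 1
    return weights


# Usage with made-up names and counts (illustrative only).
votes = [
    RollCall("v1", ["Adams", "Brown", "Clark"], ["Davis"], 3, 1),
    RollCall("v2", ["Adams", "Brown"], ["Clark", "Davis"], 2, 2),
    RollCall("v3", ["Adams"], ["Brown"], 2, 1),  # yea list is one name short: rejected
]
edges = co_voting_weights(votes)
print(edges[("Adams", "Brown")])  # 2
```

Counting only same-side agreements keeps the sketch simple. A fuller treatment might normalize each pair's count by the number of votes both members attended, or account for absences and paired votes, none of which this example handles.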
