ECPR

From RAGs to riches: collecting and processing elite formal networks from large-scale Chinese text through LLMs.

China
Elites
Government
Local Government
Quantitative
Big Data
Empirical
Zheng Zhang
Chinese University of Hong Kong
Pierre Landry
Chinese University of Hong Kong

Abstract

Artificial intelligence (AI) tools have been widely touted as a revolutionary breakthrough for social science research, particularly since the introduction of large language models (LLMs), which allow researchers to query and analyze large corpora of textual data. The rich historical accounts produced by the Chinese administrative state allow researchers to trace the mobility patterns of political elites and to identify their networks. To analyze such data efficiently, we propose an LLM workflow that streamlines data processing from printed materials to structured datasets suitable for statistical modeling and further LLM queries. Our engagement with LLMs centers on the Retrieval-Augmented Generation (RAG) framework, which lets us pre-empt widespread concerns about hallucinated data by grounding queries in prescribed data sources. Our framework also advances the principles of open, replicable, and transparent research through a set of mechanisms that demonstrate the quality of both the research design and the LLM outputs. We acknowledge the challenges of identifying elite networks that extant scholarship has raised. Using overlap rules to define ties may overestimate connections between elites, whereas coding the documented co-occurrence of individuals as evidence of connections may suffer from systematic selection bias. Mindful of these valid concerns, we draw a distinction between formal and informal networks. Although most existing work has sought to identify informal ties and highlight their significance, what overlap rules actually capture are formal elite networks, that is, connections shaped by institutional structure.
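
To illustrate how an overlap rule turns career records into formal ties, here is a minimal Python sketch under stated assumptions, not the project's actual coding scheme. Each career spell is a hypothetical record (person, institution, location, ordinal rank, first and last year in post). Two people are tied when they served in the same institution and location in at least one common year, and the invented parameter `max_rank_gap` shows how hierarchical rank could further condition a tie. The sample records are made up.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Spell:
    """One hypothetical career spell: a person holding a post for some years."""
    person: str
    institution: str
    location: str
    rank: int   # hypothetical ordinal rank, larger = more senior
    start: int  # first year in post (inclusive)
    end: int    # last year in post (inclusive)


def overlap_years(a: Spell, b: Spell) -> int:
    """Number of calendar years two spells share (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def formal_ties(spells: list[Spell], max_rank_gap: int = 1) -> dict[tuple[str, str], int]:
    """Link two people who served in the same institution and location in
    overlapping years and whose ranks differ by at most max_rank_gap.
    Values are shared years, summed over all qualifying spell pairs."""
    ties: dict[tuple[str, str], int] = {}
    for a, b in combinations(spells, 2):
        if a.person == b.person:
            continue  # a person is never tied to themselves
        if (a.institution, a.location) != (b.institution, b.location):
            continue  # the rule requires the same institution in the same place
        if abs(a.rank - b.rank) > max_rank_gap:
            continue  # hypothetical rank condition
        shared = overlap_years(a, b)
        if shared > 0:
            key = tuple(sorted((a.person, b.person)))
            ties[key] = ties.get(key, 0) + shared
    return ties


# Invented records for illustration only.
spells = [
    Spell("A", "Party Committee", "Baoding", 3, 1950, 1954),
    Spell("B", "Party Committee", "Baoding", 2, 1953, 1957),
    Spell("C", "Party Committee", "Baoding", 5, 1952, 1955),
    Spell("D", "People's Government", "Tianjin", 3, 1950, 1954),
]
print(formal_ties(spells))  # {('A', 'B'): 2}
```

Counting shared years rather than recording a binary tie keeps the duration of institutional co-presence available for later modeling. A stricter rule would reduce the risk of overestimating connections at the cost of coverage.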
Formal ties do not necessarily amount to the friendships or patron-client relations emphasized in the scholarship on factional politics, but they remain informative in predicting how elites interact and serve as a crucial anchor for evaluating less visible personalized connections. While recognizing the importance of both types of elite networks, our project focuses mainly on collecting and modeling a complete set of formal elite connections. We demonstrate our framework through a proof-of-concept empirical design covering administrative elites in Hebei Province and the centrally administered municipality of Tianjin since the formation of the local Communist Party in 1921. Our core personnel data are collected from the Materials on the organization of Party and government institutions until 1987. This complete coverage allows us to reconstruct reliable career paths for all elites appearing in the Materials and to model their mobility and networks without selection bias. We first show how factoids are retrieved from the Materials and organized for interaction with LLMs. We then evaluate the likelihood of formal connections between elites by time, location, institution, and hierarchical rank. We also develop a series of tests, such as distinguishing between homonyms through logical inference and evaluating the implied ranks of institutions and individuals. We verify the quality of LLM performance through various tests of data consistency and validity. Finally, we discuss what our results imply for scaling up analysis with RAG in a reliable and replicable manner while maintaining quality control.
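
The retrieval step at the heart of a RAG workflow can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes factoids have already been extracted into short text records, ranks them by token overlap with a question (a simple stand-in for the embedding search most RAG systems use), and builds a prompt that tells the model to answer only from the retrieved records and to cite them. The `Factoid` type, the function names, the prompt wording, and the sample records are all invented, and whitespace tokenization would not work on unsegmented Chinese text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factoid:
    """One hypothetical record extracted from the source materials."""
    fid: str
    text: str


def tokenize(text: str) -> set[str]:
    # Whitespace tokenization only suits this English illustration;
    # Chinese source text would need a word segmenter or embeddings.
    return set(text.lower().split())


def retrieve(query: str, factoids: list[Factoid], k: int = 2) -> list[Factoid]:
    """Return up to k factoids sharing the most tokens with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(f.text)), f) for f in factoids]
    # Drop records with no overlap, then rank by score (ties keep input order).
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:k]]


def build_prompt(query: str, retrieved: list[Factoid]) -> str:
    """Assemble a prompt that confines the model to the retrieved records."""
    context = "\n".join(f"[{f.fid}] {f.text}" for f in retrieved)
    return (
        "Answer using only the records below and cite their IDs. "
        "If the records do not contain the answer, reply 'not found'.\n\n"
        f"Records:\n{context}\n\nQuestion: {query}"
    )


# Invented records for illustration only.
factoids = [
    Factoid("f1", "Li Ming served as party secretary of Baoding prefecture 1950 1953"),
    Factoid("f2", "Wang Hua served as mayor of Tianjin 1949 1952"),
    Factoid("f3", "Li Ming served as deputy governor of Hebei 1954 1958"),
]
hits = retrieve("Which posts did Li Ming hold in Hebei", factoids)
print([f.fid for f in hits])  # ['f3', 'f1']
```

Because the model only sees records that carry IDs, every claim in its answer can be traced back to a specific source entry, which is what makes the outputs checkable.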