ECPR


Extracting Elite Career Data from Arabic CVs: An Open-Source LLM Approach

Comparative Politics
Elites
Methods
Empirical
Gilad Wenig
University of California, Los Angeles


Abstract

This paper demonstrates the use of open-source LLMs for extracting structured career information from Arabic CVs, enabling systematic analysis of political elite networks across time. We address key methodological questions about LLM selection, prompt engineering, and validation, all core concerns for computational elite network research. Studies of elite networks have traditionally relied on labor-intensive manual data collection. This burden is particularly acute in authoritarian states, where institutional opacity obscures the informal networks through which power often operates. Recent scholarship shows the potential for gathering elite data computationally (Lee & Shih 2023; González-Bustamante 2025), with the “promptbook” framework (Stuhler et al. 2025) offering a systematic approach to LLM-based information extraction (IE) from unstructured text. Yet most work examines English-language sources using proprietary models. Critical questions remain: Can open-source models match proprietary performance? How do prompting strategies affect extraction quality? We address these questions using 1,087 Kuwaiti political elite CVs from biographical directories spanning 1997–2012. Using the open-source Qwen3-4B-Instruct model, we compare IE quality across different prompting approaches, from minimal baselines to highly engineered prompts incorporating domain-specific rules for institution name standardization and inference logic. Enhanced prompts achieved a 97.5% success rate and reduced missing institutional affiliations by 50%, critical improvements for generating network data. Post-processing recovered all failures, achieving 100% coverage and extracting 6,401 career entries. While our current contribution centers on the extraction pipeline, these data enable several analytical pathways central to elite network research.
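To make the extraction steps described above more concrete, the following is a minimal Python sketch of how a baseline and an enhanced prompt might be built, and how raw model output might be parsed, standardized, and recovered in post-processing. It is not the paper's pipeline: the JSON schema, the prompt wording, the alias table, and all function names are hypothetical and chosen only for illustration, and the model call itself is omitted (a hand-written string stands in for the model's response).

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical alias table for institution name standardization (illustrative only).
INSTITUTION_ALIASES = {
    "وزارة الخارجية": "Ministry of Foreign Affairs",
    "ministry of foreign affairs": "Ministry of Foreign Affairs",
    "جامعة الكويت": "Kuwait University",
    "kuwait university": "Kuwait University",
}


@dataclass
class CareerEntry:
    position: str
    institution: Optional[str]
    start_year: Optional[int]
    end_year: Optional[int]


def build_prompt(cv_text: str, enhanced: bool) -> str:
    """Return a minimal baseline prompt, or one with added domain rules (assumed wording)."""
    base = (
        "Extract every career entry from the CV below as a JSON list of objects "
        "with keys: position, institution, start_year, end_year.\n"
    )
    rules = ""
    if enhanced:
        rules = (
            "Rules: write the full official institution name; "
            "if a position implies its institution, infer it; "
            "use null for unknown years.\n"
        )
    return f"{base}{rules}CV:\n{cv_text}"


def normalize_institution(name: object) -> Optional[str]:
    """Map known variants to one canonical name; empty or non-string values become None."""
    if not isinstance(name, str) or not name.strip():
        return None
    cleaned = name.strip()
    return INSTITUTION_ALIASES.get(cleaned.lower(), cleaned)


def to_year(value: object) -> Optional[int]:
    """Accept ints or numeric strings in a plausible range; anything else becomes None."""
    try:
        year = int(value)
    except (TypeError, ValueError):
        return None
    return year if 1900 <= year <= 2100 else None


def parse_entries(raw: str) -> Optional[List[CareerEntry]]:
    """Parse model output; on failure, retry on the outermost [...] span (post-processing)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if match is None:
            return None
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    if not isinstance(data, list):
        return None
    entries = []
    for item in data:
        # Skip anything that is not an object with a non-empty position.
        if not isinstance(item, dict) or not item.get("position"):
            continue
        entries.append(
            CareerEntry(
                position=str(item["position"]).strip(),
                institution=normalize_institution(item.get("institution")),
                start_year=to_year(item.get("start_year")),
                end_year=to_year(item.get("end_year")),
            )
        )
    return entries


# Hand-written stand-in for a model response that wraps the JSON in extra text.
raw_output = (
    'Here is the result:\n[{"position": "Ambassador", "institution": "وزارة الخارجية", '
    '"start_year": "1997", "end_year": 2003}, '
    '{"position": "Lecturer", "institution": "", "start_year": null, "end_year": null}]'
)
entries = parse_entries(raw_output)
missing_institutions = sum(e.institution is None for e in entries)
```

In this sketch, a response counts as a failure when `parse_entries` returns `None`, and `missing_institutions` tallies entries without an affiliation, which loosely mirrors the two quality measures reported above. How the paper actually defines and measures them is not specified in the abstract.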
The structured career histories support the construction of institutional co-membership networks across ministries, the military, business, academia, parliament, and the diplomatic corps, relationships especially relevant in authoritarian settings where informal ties often cross organizational boundaries. The data also allow for subsequent career sequence analysis to identify pathways to influence, and for linking institutional experience to legislative behavior such as voting or speech patterns. By producing high-quality structured data at scale, our approach provides the necessary empirical foundation for such analyses. This paper demonstrates how open-source models can support rigorous, replicable elite research in the Middle East and beyond. We discuss validation strategies and highlight methodological considerations for scholars applying LLM-based extraction to elite networks across diverse political contexts.

References

González-Bustamante, Bastián. 2025. “Machine Learning and Political Events: Application of a Semi-Supervised Approach to Produce a Dataset on Presidential Cabinets.” Social Science Computer Review 1–14.

Lee, Jonghyuk, and Victor C. Shih. 2023. “Machine-Learning Analysis of Leadership Formation in China to Parse the Roles of Loyalty and Institutional Norms.” Proceedings of the National Academy of Sciences 120(45): 1–7.

Stuhler, Oscar, Cat Dang Ton, and Etienne Ollion. 2025. “From Codebooks to Promptbooks: Extracting Information from Text with Generative Large Language Models.” Sociological Methods & Research 54(3): 794–898.