Managed Independence? Repeat Contracting, Evaluator Affiliation, and Positive Bias in UN Evaluations
Public Administration
UN
Knowledge
Quantitative
Abstract
International Organizations (IOs) rely on evaluations to learn from past program performance and to justify organizational authority to member states and other principals. In the UN system alone, roughly 750 evaluation reports are published annually, and most are now externally commissioned. This trend has given rise to a complex evaluation industry populated by large and mid-sized consulting firms, individual consultants, universities, and NGOs. As evaluation work has been outsourced, independence has come to depend not only on internal oversight but also on procurement choices, including how tenders are designed and which vendors are selected. While external evaluators are formally independent, many depend on recurring work from the same IOs, creating incentives to soften criticism.
This paper examines whether evaluator characteristics and contracting arrangements systematically shape the tone of externally commissioned evaluations across eight UN entities (UNDP, UNICEF, WFP, FAO, UNHCR, ILO, UN Women, and UNIDO). I argue that administrative reliance on repeat contractors can generate relational dependence that biases reporting in predictable ways. To capture these dynamics, I conceptualize outsourcing as a two-mode network linking evaluators and firms to IOs through authored reports and repeated contracts. I then test whether embeddedness in this network, measured by repeat-tie intensity, concentration of work among a small set of consultants, and cross-agency mobility, is associated with more positive evaluative language. I further test whether these dependence effects are moderated by evaluator affiliation, expecting weaker positive bias among university-affiliated evaluators than among commercial providers.
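To illustrate how these embeddedness measures could be operationalized, the sketch below computes repeat-tie intensity, consultant concentration, and cross-agency mobility from a contract-level table. The column names (evaluator, agency) and the input file are hypothetical stand-ins for the procurement and author records described below; this is a minimal sketch under those assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of the two-mode (evaluator x agency) network measures.
# Column names and the input file are hypothetical placeholders.
import pandas as pd

contracts = pd.read_csv("contracts.csv")  # one row per authored report

# Edge list of the two-mode network: report counts per evaluator-agency tie.
ties = contracts.groupby(["evaluator", "agency"]).size().rename("n").reset_index()

# Repeat-tie intensity: share of an evaluator's reports written for the
# agency that evaluator works for most often.
totals = ties.groupby("evaluator")["n"].sum()
repeat_tie = ties.groupby("evaluator")["n"].max() / totals

# Concentration of an agency's evaluation work among few consultants,
# here a Herfindahl-Hirschman index over evaluator shares.
def hhi(counts):
    shares = counts / counts.sum()
    return (shares ** 2).sum()

concentration = ties.groupby("agency")["n"].apply(hhi)

# Cross-agency mobility: number of distinct IOs an evaluator has worked for.
mobility = ties.groupby("evaluator")["agency"].nunique()

print(repeat_tie.head(), concentration.head(), mobility.head(), sep="\n")
```

Each measure then enters the evaluator-level or organization-level models as a predictor of report tone.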
Empirically, the study applies a fine-tuned BERT language model to measure sentence-level assessments and derive positivity scores for over 10,000 evaluation reports, building on Eckhard et al. (2023). These textual measures are merged with newly collected data on evaluator identities, organizational affiliations, and contract histories, assembled via text extraction and scraping of public procurement and author records. The design enables both evaluator-level and organization-level analyses of how contract dependence and administrative choices translate into systematic patterns in evaluative discourse.
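As a rough illustration of the sentence-level scoring step, the sketch below classifies sentences with a fine-tuned BERT model via the Hugging Face pipeline API and aggregates the predictions into a report-level positivity score. The model checkpoint, its label set, and the example text are assumptions for illustration; the paper's own classifier follows Eckhard et al. (2023).

```python
# Illustrative sketch of sentence-level positivity scoring with a
# fine-tuned BERT classifier. The checkpoint path and label names are
# hypothetical; the report text is a stand-in example.
from transformers import pipeline
import nltk

nltk.download("punkt", quiet=True)

classifier = pipeline(
    "text-classification",
    model="path/to/finetuned-evaluation-bert",  # hypothetical checkpoint
)

report_text = (
    "The programme achieved most of its intended outcomes. "
    "However, monitoring data were incomplete in several regions."
)

sentences = nltk.sent_tokenize(report_text)
preds = classifier(sentences)

# Positivity score: share of sentences the model labels positive.
# The "positive" label is an assumption about the fine-tuned label set.
positive = sum(p["label"].lower() == "positive" for p in preds)
positivity = positive / len(preds)
print(f"Positivity score: {positivity:.2f}")
```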
The paper contributes to research on the politics of administration in global governance by showing how procurement and staffing practices shape the information IOs produce about themselves. It also speaks to debates on legitimacy and self-legitimation by identifying a mechanism through which evaluation systems can generate systematically more positive performance signals under common contracting conditions.