Does Bias Get Too Personal? Auditing Effects of Personalization and Randomization on Politics-Related Web Searches in Switzerland
Voting
Internet
Mixed Methods
Technology
Big Data
Abstract
Search engines (SEs) are prone to algorithmic bias, which can be defined as a tendency to produce outputs that are systematically skewed toward the interests of specific social groups (Friedman & Nissenbaum, 1996) and can reinforce different types of stereotypes (Noble, 2018). Due to high trust in SEs’ outputs and their capacity to affect individual political preferences (Epstein & Robertson, 2015), bias can distort users’ perception of social reality and affect political decision-making. However, despite the growing interest in algorithmic bias in the context of politics-related information curation (e.g., Pradel, 2021), there is still a shortage of studies examining the factors that lead to biased outputs, in particular search personalization and randomization, two key features of SE performance.
We aim to address this gap using data from a large-scale project combining a survey and a systematic algorithm audit in Switzerland. Following the approach of Unkel and Haim (2021), we developed algorithmic personas to audit how SEs personalize information about popular votes in the three major national languages of Switzerland (i.e., German, French, and Italian). We focus on two popular initiatives regarding retirement, which were voted on in March 2024. Using a representative survey of Swiss internet users (N = 1,131), in which we asked about the search queries and websites respondents used to find information about each initiative, we constructed multiple algorithmic personas (N = 60). Each persona consisted of up to 80 queries and 16 websites, grouped based on respondents’ voting intentions and political preferences. During data collection, 20 virtual agents (i.e., Selenium-based scripts) per persona (except for a number of control personas) visited a set of websites or searched for them online to model a history of interactions that search engines can use to personalize their outputs. Then, each agent entered the queries associated with its persona and recorded the search outputs.
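The grouping of survey responses into personas can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the field names, deduplication, and the way the 80-query/16-website caps are applied are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Persona:
    """An algorithmic persona: pooled queries and websites of
    respondents sharing a voting intention (illustrative sketch)."""
    voting_intention: str                          # e.g. "yes", "no"
    queries: list = field(default_factory=list)    # capped at 80
    websites: list = field(default_factory=list)   # capped at 16

def build_personas(responses, max_queries=80, max_websites=16):
    """Group survey responses by voting intention, pool their reported
    queries and websites, deduplicate, and cap the list sizes."""
    grouped = defaultdict(lambda: (set(), set()))
    for r in responses:
        queries, websites = grouped[r["voting_intention"]]
        queries.update(r["queries"])
        websites.update(r["websites"])
    return [
        Persona(intent,
                sorted(queries)[:max_queries],
                sorted(websites)[:max_websites])
        for intent, (queries, websites) in grouped.items()
    ]

# Toy example with made-up survey answers
responses = [
    {"voting_intention": "yes", "queries": ["13. AHV-Rente"],
     "websites": ["admin.ch"]},
    {"voting_intention": "yes", "queries": ["rentenalter abstimmung"],
     "websites": ["srf.ch", "admin.ch"]},
    {"voting_intention": "no", "queries": ["renteninitiative argumente"],
     "websites": ["nzz.ch"]},
]
personas = build_personas(responses)  # one persona per voting intention
```

In practice the grouping would also condition on political preferences and language, but the principle is the same: each persona pools the search behavior of like-minded respondents.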
We audited two search engines, Google and Bing, and collected the first page of search results, since users rarely go beyond it (Urman & Makhortykh, 2023). Altogether, we deployed 2,400 agents (1,200 per engine) via Google Compute Engine from Swiss IPs and performed two rounds of data collection. To investigate the possible effects of randomization, all agents performed their routines simultaneously. The resulting dataset comprises over 300,000 pages, which we use to investigate source-level bias in the representation of the voted initiatives. Using manual labeling, we categorize all sources retrieved by the search engines to identify (1) whether search engine outputs are subject to personalization and (2) whether personalization results in skewed exposure to certain types of initiative-related sources. Additionally, we investigate to what degree potential search bias is affected by randomization and whether this factor should be taken into account in future research.
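One common way to quantify randomization across simultaneously deployed agents with identical personas is pairwise result overlap; values below 1 indicate that the engine returns different results for identical requests. The sketch below uses Jaccard similarity over result URLs as an illustrative measure; the metric choice and the toy data are assumptions for the example, not the study's reported analysis.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two collections of result URLs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_overlap(result_lists):
    """Mean Jaccard similarity over all pairs of agents that ran the
    same persona at the same time; 1.0 means identical outputs."""
    pairs = list(combinations(result_lists, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy example: three agents, same persona, slightly different results
agents = [
    ["a.ch", "b.ch", "c.ch"],
    ["a.ch", "b.ch", "d.ch"],
    ["a.ch", "b.ch", "c.ch"],
]
overlap = mean_pairwise_overlap(agents)  # 2/3 for this toy data
```

Because all agents search simultaneously from comparable infrastructure, any residual divergence of this kind can be attributed to randomization rather than to time or location effects.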