Large language models act as if they are part of a group
Abstract
An extensive audit of large language models reveals that numerous models mirror the ‘us versus them’ thinking seen in human behavior. These social prejudices are likely absorbed from biased content in the training data.
The impressive capabilities of large language models (LLMs), such as GPT-4 and Llama, have accelerated their integration into daily life. These models now power a wide range of applications, from search engines and e-mail agents to customer-support chatbots and telehealth platforms[1]. Their widespread adoption stems from their ability to answer complex prompts with coherent, human-like responses and to seamlessly integrate contextually relevant information. These capabilities arise from training on vast amounts of human-written text. The primary objective of such training is to predict the most probable sequence of words given a context, without regard for the truthfulness or correctness of the training text[2]. Consequently, we should not assume that LLMs are objective entities aligned with universal human values. Instead, they are likely to absorb human behaviors present in the training data, including harmful biases such as stereotypes and prejudices[3]. Writing in Nature Computational Science, Tiancheng Hu and colleagues[4] demonstrate that numerous LLMs exhibit social biases similar to those seen in humans, such as ingroup solidarity and outgroup hostility[5].