Large language models act as if they are part of a group

Germans Savcisens
Nature Computational Science
January 2, 2025

Abstract

An extensive audit of large language models reveals that numerous models mirror the ‘us versus them’ thinking seen in human behavior. These social prejudices are likely absorbed from biased content in the training data.

The impressive capabilities of large language models (LLMs), such as GPT-4 and Llama, have accelerated their integration into daily life. These models now power a wide range of applications, from search engines and e-mail agents to customer support chatbots and telehealth platforms [1]. Their widespread adoption stems from their ability to reply to complex prompts with coherent, human-like responses and to seamlessly integrate contextually relevant information. These capabilities arise from training on vast amounts of human-written text. The primary objective of such training is to predict the most probable sequence of words given a context, with no regard for the truthfulness or correctness of the training text [2]. Consequently, we should not assume that LLMs are objective entities aligned with universal human values. Instead, they are likely to absorb human behaviors present in the training data, including harmful biases such as stereotypes and prejudices [3]. Writing in Nature Computational Science, Tiancheng Hu and colleagues [4] demonstrate that numerous LLMs exhibit social biases similar to those seen in humans, such as ingroup solidarity and outgroup hostility [5].
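To make the training objective concrete, the sketch below illustrates next-token prediction with a cross-entropy loss, the standard pretraining recipe described above. It is a minimal toy example, not the code of Hu and colleagues: it assumes PyTorch, and the model, vocabulary size, and data are placeholder values chosen only for illustration. The point is that the loss rewards reproducing whatever text the model is fed, regardless of whether that text is accurate or unbiased.

```python
# Minimal sketch of the next-token prediction objective used in LLM pretraining.
# Toy placeholders throughout (vocabulary, model, and random "text"); this is an
# illustration of the idea, not the study's implementation.
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 100, 32, 8

# Toy "language model": an embedding followed by a linear head over the vocabulary.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

# A toy batch of token ids standing in for human-written training text.
tokens = torch.randint(0, vocab_size, (4, context_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token

logits = model(inputs)                            # (batch, context_len, vocab_size)
loss = nn.functional.cross_entropy(               # standard next-token objective
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()                                   # gradients push the model to mimic the text
print(f"next-token cross-entropy: {loss.item():.3f}")
```

Because the objective only measures how well the model mimics its corpus, any ‘us versus them’ patterns present in that corpus can be reproduced in the model's outputs.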
