On Sunday, independent AI researcher Simon Willison published a detailed analysis of Anthropic’s newly released system prompts for Claude 4’s Opus 4 and Sonnet 4 models, offering insights into how Anthropic controls the models’ “behavior” through their outputs. Willison examined both the published prompts and leaked internal tool instructions to reveal what he calls “a sort of unofficial manual for how best to use these tools.”
To understand what Willison is talking about, we’ll need to explain what system prompts are. Large language models (LLMs) like the AI models that run Claude and ChatGPT process an input called a “prompt” and return an output that is the most likely continuation of that prompt. System prompts are instructions that AI companies feed to the models before each conversation to establish how they should respond.
Unlike the messages users see from the chatbot, system prompts typically remain hidden from the user and tell the model its identity, behavioral guidelines, and specific rules to follow. Each time a user sends a message, the AI model receives the full conversation history along with the system prompt, allowing it to maintain context while following its instructions.
While Anthropic publishes portions of its system prompts in its release notes, Willison’s analysis reveals these published versions are incomplete. The full system prompts, which include detailed instructions for tools like web search and code generation, must be extracted through techniques like prompt injection—methods that trick the model into revealing its hidden instructions. Willison relied on leaked prompts gathered by researchers who used such techniques to obtain the complete picture of how Claude 4 operates.
For example, even though LLMs aren’t people, they can reproduce human-like outputs due to their training data that includes many examples of emotional interactions. Willison shows that Anthropic includes instructions for the models to provide emotional support while avoiding encouragement for self-destructive behavior. Claude Opus 4 and Claude Sonnet 4 receive identical instructions to “care about people’s wellbeing and avoid encouraging or facilitating self-destructive behaviors such as addiction, disordered or unhealthy approaches to eating or exercise.”

Loading comments...