Claude 4’s Hidden Rules: How Anthropic Controls AI

Claude 4s Hidden Rules How Anthropic Controls AI scaled

Artificial intelligence models like Anthropic’s Claude 4 operate based on intricate instructions. These instructions, often hidden from users, dictate how the AI responds and behaves. Recently, independent AI researcher Simon Willison, known for coining the term “prompt injection,” published a detailed analysis.

 

His findings shed light on the sophisticated “system prompts” that govern Claude 4’s Opus 4 and Sonnet 4 models. This analysis offers unprecedented insight into how Anthropic shapes its AI’s output and ensures specific behavioral guidelines are followed.

 

Understanding System Prompts: The AI’s Operating Manual

To grasp Willison’s discoveries, it’s essential to understand system prompts. Large Language Models (LLMs), such as those powering Claude and ChatGPT, process user input (known as a “prompt”) and generate a likely continuation as their output. System prompts are crucial, as they are a set of initial instructions that AI companies feed to their models before each user conversation begins.

 

Unlike the visible messages users send to the chatbot, system prompts remain hidden. They define the model’s identity, establish behavioral guidelines, and set specific rules. Every time a user interacts with the AI, the model receives the entire conversation history along with this hidden system prompt. This continuous feed allows the AI to maintain context while strictly adhering to its internal instructions.

 

Peeking Behind the Curtain: Incomplete Public Prompts

Anthropic does publish portions of its system prompts in its release notes. However, Willison’s analysis reveals these public versions are incomplete. The full, detailed system prompts, which include specific instructions for tools like web search and code generation, must be extracted. This is often done through advanced techniques such as prompt injection.

 

These methods cleverly trick the model into revealing its own hidden directives. Willison’s insights are based on leaked prompts gathered by other researchers who successfully employed such techniques, providing a comprehensive view of Claude 4’s internal workings.

 

Key Behavioral Directives in Claude 4

Willison’s research uncovered several fascinating instructions that Anthropic provides to its Claude 4 models. These directives aim to shape the AI’s personality and ensure responsible behavior.

 

Emotional Support with Guardrails

Despite not being human, LLMs can produce human-like outputs due to their training data. Willison found that Anthropic instructs Claude to offer emotional support while strictly avoiding any encouragement of self-destructive behaviors. Both Claude Opus 4 and Claude Sonnet 4 receive identical directives to “care about people’s wellbeing and avoid encouraging or facilitating self-destructive behaviors such as addiction, disordered or unhealthy approaches to eating or exercise.” This highlights Anthropic’s commitment to user safety and ethical AI responses.

 

Combating the “Flattery Problem”

One of the most interesting findings relates to how Anthropic actively combats sycophantic behavior in Claude 4. This issue has recently plagued other AI models, including OpenAI’s ChatGPT. Users reported that GPT-4o’s responses often felt overly positive or flattering, with phrases like “Good question! You’re very astute to ask that.” This problem often arises because human feedback during training tends to favor responses that make users feel good, creating a feedback loop.

 

Anthropic directly addresses this in Claude’s prompt: “Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. It skips the flattery and responds directly.” This instruction aims to make Claude’s interactions more direct and less prone to excessive praise.

 

Formatting Rules: Limiting Lists

The Claude 4 system prompt also includes extensive instructions regarding formatting, specifically when to use bullet points and lists. Multiple paragraphs are dedicated to discouraging frequent list-making in casual conversations. The prompt explicitly states, “Claude should not use bullet points or numbered lists for reports, documents, explanations, or unless the user explicitly asks for a list or ranking.” This directive aims to ensure Claude’s responses are more natural and less formulaic.

See also  Axial Seamount: Oregon Coast Underwater Volcano Poised to Erupt

 

Other Significant System Prompt Insights

Willison’s analysis revealed further details about Claude 4’s operational parameters.

 

Knowledge Cutoff Discrepancy

He discovered a discrepancy in Claude’s stated knowledge cutoff date. While Anthropic’s public comparison table lists March 2025 as the training data cutoff, the internal system prompt specifies January 2025 as the models’ “reliable knowledge cutoff date.” Willison speculates this two-month buffer might help prevent Claude from confidently answering questions based on incomplete or rapidly changing information from the most recent months.

 

Robust Copyright Protections

Crucially, Willison highlighted the extensive copyright protections built into Claude’s search capabilities. Both Claude models receive repeated instructions designed to prevent copyright infringement.

 

They are told to use only one short quote (under 15 words) from web sources per response. They are also explicitly instructed to avoid creating what the prompt calls “displacive summaries,” which could diminish the value of original content. Furthermore, the instructions strictly forbid Claude from reproducing song lyrics “in ANY form,” emphasizing strong measures against intellectual property violations.

 

The Call for Greater Transparency

Simon Willison concludes that these detailed system prompts are invaluable for anyone seeking to maximize the capabilities of these AI tools. He advocates for greater transparency from Anthropic and other AI vendors. While Anthropic publishes excerpts, Willison expresses a desire for them to “officially publish the prompts for their tools to accompany their open system prompts,” hoping other companies will follow suit.

 

This increased transparency could empower users and foster a deeper understanding of how these powerful AI systems are governed.

Spot the Celestial ‘Smiley Face’: A Guide to the Rare Moon-Venus-Saturn Alignment on April 25
25 April to look like this But what will you see

Astronomy enthusiasts and casual skygazers alike have likely seen headlines buzzing about a potentially visible 'smiley face' alignment set to grace the morning sky this Friday, April 25th. This much-discussed Read more

Houston Xfinity, Comcast Service Restored After Vandalism
Houston Xfinity Comcast Service Restored After Vandalism

Internet and cable services provided by Xfinity and Comcast Business to customers in the Houston area have been fully restored. The widespread disruption was identified by the company as being Read more

Starship Explosion Hits SpaceX Amidst Musk’s Struggles
Starship Explosion Hits SpaceX Amidst Musk's Struggles

Just as Elon Musk endeavors to fully re-engage with his various business ventures, his aerospace company, SpaceX, has encountered yet another significant obstacle.   On Wednesday, June 18, 2025, a Read more

Android Auto 14.7 Beta: New Bright Theme & Gemini AI
Android Auto 14.7 Beta: New Bright Theme & Gemini AI

Google continues to refine its in-car infotainment and driving assistance system, Android Auto, with a significant new update.   The tech giant has just released Version 14.7 Beta, which promises Read more

Huawei Pura 80 Ultra: Dual Telephoto, Single Sensor Camera
Huawei Pura 80 Ultra: Dual Telephoto, Single Sensor Camera

Huawei has officially unveiled its flagship Pura 80 series phones in China, with the top-tier Pura 80 Ultra leading the charge with a groundbreaking camera innovation. This new device introduces Read more

Rare Interstellar Comet Speeds Through Our Solar System
Rare Interstellar Comet Speeds Through Our Solar System

The vast expanse of our solar system is not entirely insular. Occasionally, celestial wanderers from truly distant realms make a fleeting appearance.   Astronomers have recently confirmed the presence of Read more