Anthropic just changed the rules for working with AI, and "prompting" isn't the main game anymore.

Anthropic published a new guide on "context engineering" that fundamentally shifts how we should think about getting better results from AI. Turns out, the words in your prompt matter less than what information you're feeding the AI in the first place.

For years, the key to unlocking an AI’s potential was “prompt engineering”—finding the magic words to get what you want. But according to a new guide from Anthropic, that’s old news. The new frontier is “effective context engineering.”

It turns out that as models get smarter, the challenge is all about thoughtfully curating all the information (prompts, tools, documents, and chat history) that enters the model's limited attention budget at each step.

For context (lol): Context in this context = everything you give an AI model when you ask it to do something: your instructions, examples, conversation history, uploaded files, and tool access. The problem? AI models have an "attention budget" just like humans have limited working memory. Stuff too much information in there, and performance drops.

So you'd think the solution would just be "make the context window larger." That certainly helps, but it's not a perfect solution atm.

Here’s the problem with giant context windows: Anthropic found that LLMs, like humans, get confused when you give them too much information. This is known in the industry as "context rot": as the number of tokens in the context window increases, the model’s ability to accurately recall information decreases. Every new piece of info depletes the AI's "attention budget," stretching its focus thin and reducing precision.

So, how do you manage this? Anthropic laid out a new playbook for building more effective agents that treat context like the precious, finite resource it is.

PRACTICAL TIPS YOU CAN USE TODAY:

  • Write prompts at the "right altitude." Don't micromanage with if-then rules ("if the user asks about pricing, say X"). But also don't be vague ("be helpful"). Find the sweet spot: clear guidance that still lets the AI think. Think of it like giving directions to a coworker—you wouldn't list every turn, but you also wouldn't just say "go to the building."
  • Keep your toolset minimal. If you're building AI workflows with multiple tools, less is more. If a human can't definitively say which tool to use in a situation, the AI will struggle too.
  • Use 3-5 great examples, not exhaustive rule lists. Instead of trying to cover every edge case in your prompt, show the AI a handful of diverse, high-quality examples of what you want.
  • Let AI pull info "just in time." Don't dump entire documents into your prompt upfront. Instead, give the AI tools to fetch what it needs when it needs it—like how you remember where files are stored, not their entire contents. (See the code sketch after this list.)
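
To make the minimal-toolset and "just in time" tips concrete, here's a minimal sketch using the Anthropic Python SDK. The model name, the single read_file tool, its schema, and the file path are illustrative assumptions, not something prescribed in Anthropic's guide:

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()

# One small, unambiguous tool instead of pasting whole documents into the prompt.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a single project file by path and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Relative file path"}},
            "required": ["path"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name
    max_tokens=1024,
    system="You are a code review assistant. Read only the files you need.",
    tools=TOOLS,
    messages=[{"role": "user", "content": "Summarize what src/billing.py does."}],
)

# If the model decides to call the tool, stop_reason is "tool_use" and the request
# appears as a tool_use block in response.content; your loop runs it and replies
# with a tool_result block.
print(response.stop_reason)
```

The point isn't this particular tool; it's that the agent starts from a lightweight identifier (a file path) and pulls contents into context only when it decides it needs them.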

Most people are still thinking about AI like it's 2023: obsessing over the perfect prompt wording. But as models get smarter and handle longer conversations, the real skill is curating what information the AI sees. This is especially relevant if you're building AI tools for your team or using AI for complex, multi-step projects.

Anthropic's Claude Code, for example, already uses these techniques. It doesn't load entire codebases into context. Instead, it explores files on-demand using tools, just like a human developer would.

Here are three key techniques they recommend:

  • Compaction: When a conversation nears the context limit, use the AI to summarize the critical details (like decisions made or bugs found) and start a new chat with that summary. It’s like a “previously on…” recap for the AI. (See the code sketch after this list.)
  • Structured Note-Taking: Have the agent keep a running log of its progress, like a NOTES.md file or a to-do list, outside the main chat. The agent can refer back to its notes to remember goals and key info, even after its main context has been reset. Anthropic showed Claude using this to play Pokémon for hours, tracking levels and mapping dungeons.
  • Sub-Agent Architectures: Instead of one giant agent trying to do everything, use a main "manager" agent that delegates tasks to specialized "worker" sub-agents. Each sub-agent works in its own clean context window and reports back a condensed summary, keeping the main agent focused and efficient.
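
Here's roughly what that compaction step can look like in code. This is a minimal sketch assuming the Anthropic Python SDK, plain-string message contents, and a token estimate you compute elsewhere; the recap prompt wording and threshold are mine, not Anthropic's:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # illustrative model name

def compact(messages: list[dict], token_estimate: int, limit: int = 150_000) -> list[dict]:
    """If the conversation is nearing the context limit, ask the model for a
    'previously on...' recap and start a fresh history from that recap."""
    if token_estimate < limit:
        return messages  # plenty of room left, nothing to do

    # Assumes plain-string message contents; real agents may carry structured blocks.
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    recap = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system="You compress agent transcripts. Keep decisions made, open bugs, and key "
               "findings. Drop verbose tool output and dead ends.",
        messages=[{"role": "user", "content": f"Summarize this transcript:\n\n{transcript}"}],
    )
    recap_text = recap.content[0].text  # first block is text for a plain completion

    # The recap becomes the opening message of a clean context window.
    return [{"role": "user", "content": f"Recap of earlier work:\n{recap_text}"}]
```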

The solution = being strategic about what information enters the model's limited attention budget at each step.

Here's a TL;DR list of ALL the key instructions/tips from Anthropic's "effective context engineering" guide:

Core Principle:

  • Find the smallest set of high-signal tokens that maximize your desired outcome
  • Treat context as a finite resource with diminishing returns (like working memory)

System Prompts:

  • Write at the "right altitude" (not too rigid/hardcoded, not too vague)
  • Use structure: XML tags or Markdown headers to organize sections
  • Start minimal, then add instructions based on failure modes
  • Be specific enough to guide behavior, flexible enough to let the model work

Tools:

  • Keep toolsets minimal with no overlapping functionality
  • Make tools token-efficient (return concise, relevant info)
  • Each tool should be self-contained, robust, and extremely clear
  • If a human can't tell which tool to use, the AI can't either

Examples:

  • Use diverse, canonical examples (few-shot prompting)
  • Don't stuff in every possible edge case
  • Show 3-5 great examples vs. exhaustive rule lists

Context Retrieval:

  • Use "just in time" retrieval instead of pre-loading everything
  • Maintain lightweight identifiers (file paths, links) and load data dynamically
  • Consider hybrid approach: some upfront retrieval + autonomous exploration
  • Let agents progressively discover context through exploration

Long-Horizon Tasks:

  • Compaction: Summarize and compress context when nearing limits
  • Structured note-taking: Let agents write persistent notes outside context window
  • Sub-agent architectures: Specialized agents for focused tasks, main agent coordinates

Bottom Line: Do the simplest thing that works. As models get smarter, they need less prescriptive engineering.

As AI becomes more agentic (i.e. using tools autonomously in loops), context engineering will become the core skill. Anthropic expects that "do the simplest thing that works" will remain the best advice as models improve—smarter models need less hand-holding.

Deep Dive - for those who want the whole context

For the past few years, the art of communicating with large language models (LLMs) has been dominated by a single discipline: prompt engineering. The prevailing wisdom was that crafting the perfect set of instructions—the right words in the right order—was the key to unlocking an AI's potential. But as the industry matures and our ambitions grow from simple chatbots to autonomous agents, a new, more sophisticated paradigm is taking center stage. Anthropic, a leader in AI safety and research, calls it "context engineering," and it represents a fundamental shift in how we build, manage, and scale AI systems.

In a recent dispatch, Anthropic's Applied AI team argues that we've moved beyond the simple task of writing a good prompt. The new challenge is answering a broader question: “What configuration of context is most likely to generate our model’s desired behavior?”

Context, in this sense, refers to the entire set of tokens an LLM considers at any given moment. This includes not just the system prompt, but also the user's messages, the history of the conversation, definitions of available tools, and any external data provided. Context engineering, therefore, is the art and science of curating and maintaining this holistic state to consistently achieve a desired outcome, especially as agents operate over multiple steps and longer time horizons.

The Attention Crisis: Why More Context Isn't Always Better

The push toward context engineering is born from a critical, and perhaps counterintuitive, observation about today's powerful LLMs: their performance degrades as you give them more information. While model providers have raced to expand context windows to millions of tokens, studies on "needle-in-a-haystack" benchmarks have revealed a phenomenon Anthropic calls "context rot."

"As the number of tokens in the context window increases, the model’s ability to accurately recall information from that context decreases," the researchers explain. Just like humans, who have a limited working memory, LLMs possess a finite "attention budget." Every new token introduced, whether it's a line of code or a sentence in a document, depletes this budget by some amount.

This limitation is not a simple bug but an inherent consequence of the transformer architecture that underpins all modern LLMs. The architecture’s core mechanism allows every token to "attend" to every other token, resulting in a number of pairwise relationships that grows quadratically (n² for n tokens). As the context length increases, a model's ability to track these relationships gets stretched thin. Furthermore, since models are typically trained on datasets where shorter text sequences are far more common than longer ones, they have less experience and fewer specialized parameters for understanding long-range dependencies.
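
A quick back-of-the-envelope calculation shows why this bites so hard at long context lengths (these numbers are just the raw n² count, not a measurement of any particular model):

```python
# Pairwise attention relationships grow as n^2 for n tokens.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {n**2:>22,} pairwise relationships")
# 1,000 tokens -> 1,000,000 relationships; 1,000,000 tokens -> 1,000,000,000,000.
# A 1,000x longer context gives the model 1,000,000x more relationships to spread
# its attention across.
```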

The result is a performance gradient, not a hard cliff. A model with a million-token context window is still highly capable, but its precision for information retrieval and complex reasoning over that vast space may be noticeably reduced compared to its performance on shorter, more focused contexts. This reality forces a crucial realization: context must be treated as a precious, finite resource with diminishing marginal returns.

Crafting Effective Context: A Practical Guide

If context is a resource to be managed, how does one do it effectively? Anthropic's guiding principle is to "find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." This applies across all components of the context.

  1. System Prompts: The goal is to strike a "Goldilocks" balance. Overly specific, hardcoded prompts with complex if-else logic become brittle and difficult to maintain. Conversely, vague, high-level guidance fails to give the LLM concrete signals. The optimal prompt is specific enough to guide behavior effectively yet flexible enough to allow the model to use its intelligence. Organizing prompts into clear sections using Markdown or XML tags (e.g., <instructions>, <background_information>) can greatly improve clarity. (A sketch of such a prompt follows after this list.)
  2. Tools: Tools are the agent's hands and eyes, allowing it to interact with its environment. They must be designed for efficiency. This means creating tools that are self-contained, robust, and have minimal functional overlap—much like functions in a well-designed codebase. A common failure mode, Anthropic notes, is a bloated toolset where even a human engineer would be unsure which tool to use in a given situation. If a human is confused, an AI agent can’t be expected to do better.
  3. Examples (Few-Shot Prompting): Providing examples remains a powerful technique. However, the temptation to stuff a prompt with a laundry list of every possible edge case should be resisted. Instead, engineers should curate a small, diverse set of canonical examples that effectively illustrate the agent's expected behavior. For an LLM, a few good examples are worth a thousand words of instruction.
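
For point 1, here's a minimal sketch of what a sectioned system prompt can look like. The scenario, section names, and contents are invented for illustration; only the tag-based structure comes from the guide:

```python
# A "right altitude" system prompt: clear sections and concrete guidance,
# no brittle if-then rules, no vague "be helpful".
SYSTEM_PROMPT = """
<background_information>
You support the billing team at a SaaS company. Customers write in about
invoices, plan changes, and refunds.
</background_information>

<instructions>
- Answer from the retrieved account data; if something is missing, say so and ask.
- Keep replies under 150 words and link the relevant help-center article.
- Escalate anything involving a legal threat or a chargeback to a human.
</instructions>

<tone>
Friendly and direct, no marketing language.
</tone>
"""
```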

From Upfront Loading to "Just-in-Time" Retrieval

A significant part of the context engineering shift involves rethinking how agents access information. Many early systems relied on embedding-based retrieval, where all potentially relevant documents are processed upfront and surfaced into the context window before the agent begins its task.

The emerging, more agentic approach is what Anthropic calls a "just-in-time" strategy. Instead of pre-loading all data, the agent maintains lightweight identifiers—like file paths, database queries, or web links—and uses its tools to dynamically load information into context only when it's needed. This mirrors human cognition: we don’t memorize an entire library before writing a research paper; we use catalogs and bookmarks to retrieve relevant information on demand.

This method offers several advantages. It leverages the rich metadata of our existing information systems—folder hierarchies, naming conventions, and timestamps—which provide powerful signals about a file's purpose and relevance. It also enables "progressive disclosure," allowing an agent to incrementally discover context through exploration. It can check a file's size to gauge complexity or read its name to infer its purpose, assembling a coherent understanding layer by layer without drowning in irrelevant data.

The trade-off, of course, is speed. Runtime exploration is slower than working with pre-computed data. The optimal solution often lies in a hybrid strategy, where some high-priority information is loaded upfront, while the agent is empowered to explore further at its discretion.
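
As a sketch of that hybrid strategy (with made-up paths and a hypothetical project layout), the context you hand the agent up front can be mostly lightweight identifiers plus one high-priority file, rather than full contents:

```python
from pathlib import Path

def file_index(root: str, max_entries: int = 200) -> str:
    """Build a lightweight index (path + size) instead of inlining file contents.
    Names and sizes act as metadata signals; the agent opens files on demand."""
    lines = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            lines.append(f"{p.relative_to(root)}  ({p.stat().st_size:,} bytes)")
        if len(lines) >= max_entries:
            break
    return "\n".join(lines)

# Hybrid strategy: one small, high-priority file goes into context up front;
# everything else stays as identifiers the agent can fetch later with its tools.
upfront_context = Path("README.md").read_text() + "\n\nProject files:\n" + file_index(".")
```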

Solving for Long-Horizon Tasks

The ultimate test of an AI agent is its ability to perform long-horizon tasks—complex projects like a large codebase migration or a comprehensive research report that require coherence over sequences of actions far exceeding a single context window. Anthropic has developed and refined three core techniques to tackle this challenge directly.

  1. Compaction: This is the most direct approach to managing context limits. When a conversation is nearing the token limit, the agent is tasked with summarizing its own history, distilling the most critical details—architectural decisions, unresolved bugs, key findings—into a compressed form. This summary then becomes the starting point for a new, clean context window, allowing the agent to continue its work with minimal performance degradation. One of the simplest yet most effective forms of compaction is "tool result clearing," which removes the verbose outputs of tools that were called deep in the message history.
  2. Structured Note-Taking (Agentic Memory): A more sophisticated technique involves empowering the agent to maintain its own memory outside the context window. This can be as simple as writing to a NOTES.md file or maintaining a to-do list. This persistent memory can be pulled back into the context as needed, allowing the agent to track progress, dependencies, and strategic goals. In a compelling demonstration, Anthropic showed an agent playing the video game Pokémon. Using structured notes, it maintained precise tallies across thousands of game steps, developed maps of explored regions, and remembered which combat strategies were effective against different opponents, all while its primary context was being repeatedly reset. (A minimal code sketch follows after this list.)
  3. Sub-Agent Architectures: For the most complex problems, a single agent may not be enough. A multi-agent architecture provides a powerful solution. Here, a high-level "manager" agent coordinates a team of specialized sub-agents. Each sub-agent is given a focused task and operates in its own clean context window, exploring a problem deeply without polluting the main agent's attention. Once finished, it returns a condensed, distilled summary of its work. This separation of concerns allows for massive parallel exploration while keeping the lead agent focused on synthesis and high-level strategy.
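
Here's roughly what the note-taking technique (point 2 above) reduces to in code: a sketch assuming nothing more than a NOTES.md file on disk and plain-text notes. The example note and the tag wrapper are illustrative:

```python
from pathlib import Path

NOTES = Path("NOTES.md")  # persistent memory that survives context resets

def add_note(line: str) -> None:
    """Append one line of progress: a decision, a blocker, a TODO."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {line}\n")

def load_notes() -> str:
    """Pull the notes back into context at the start of each new session or turn."""
    return NOTES.read_text(encoding="utf-8") if NOTES.exists() else "(no notes yet)"

# The agent writes notes as it works...
add_note("Migrated auth module; payments module still failing 3 tests")
# ...and after a context reset, the next session starts from the notes, not from scratch.
context_suffix = "\n<notes>\n" + load_notes() + "\n</notes>"
```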

Let's now take all this advice and turn it into a meta-prompt crafter

Here is the newly launched Claude Sonnet 4.5's attempt to create a meta-prompt crafter prompt based on all this advice.

CONTEXT ENGINEERING OPTIMIZER

You are an expert in context engineering for AI agents. Your job is to help users optimize their prompts, agent designs, or AI workflows using Anthropic's context engineering principles.

STEP 1: ASSESS THE USER'S SITUATION

First, determine:

  • Technical level: Are they a developer building agents with APIs, or an end-user working with ChatGPT/Claude?
  • Stage: Are they starting from scratch, refining an existing prompt/agent, or troubleshooting poor performance?
  • Scope: Single-task prompt, multi-turn conversation, or long-horizon agent?

If unclear, ask 1-2 targeted questions. Don't ask more than necessary.

STEP 2: ANALYZE AGAINST CONTEXT ENGINEERING PRINCIPLES

Evaluate their current approach for these common failure modes:

Prompt Altitude Issues:

  • ❌ Too rigid: Hardcoded if-then logic, brittle edge case handling
  • ❌ Too vague: "Be helpful," "Use your judgment," assumes shared context
  • ✅ Right altitude: Clear guidance with flexibility, specific but not micromanaged

Context Bloat:

  • ❌ Dumping entire documents, exhaustive rule lists, redundant examples
  • ❌ Pre-loading everything "just in case"
  • ✅ Minimal high-signal information, "just in time" retrieval

Tool Problems (if applicable):

  • ❌ Overlapping functionality, ambiguous use cases, bloated tool sets
  • ❌ Unclear when to use which tool
  • ✅ Each tool has one clear purpose, minimal viable set

Example Issues:

  • ❌ Laundry list of edge cases trying to cover everything
  • ❌ No examples at all
  • ✅ 3-5 diverse, canonical examples showing expected behavior

STEP 3: PROVIDE ACTIONABLE IMPROVEMENTS

Rewrite or improve their input using this format:

🎯 Core Issues Identified: [List 2-4 specific problems]

✨ Optimized Version: [Provide improved prompt/design]

📋 Key Changes Made:

  • [Specific change 1 and why]
  • [Specific change 2 and why]
  • [Specific change 3 and why]

💡 Advanced Considerations: [Only if relevant: suggestions for long-horizon tasks, tool design, retrieval strategies]

STEP 4: ADAPT TO USER TYPE

For Developers:

  • Reference technical concepts (tokens, context windows, API tools)
  • Suggest implementation patterns (compaction, sub-agents, memory tools)
  • Link to Anthropic docs when relevant

For Non-Technical Users:

  • Use plain language analogies
  • Focus on practical prompt improvements
  • Explain "why" in terms they understand (working memory, attention budget)

GUIDELINES:

  1. Be specific, not theoretical. Show, don't tell.
  2. Prioritize ruthlessly. If something doesn't directly improve the outcome, cut it.
  3. Use the "right altitude" yourself. Don't micromanage the user, but don't be vague.
  4. Test your suggestions mentally. Would a human know exactly what to do?
  5. Consider the Goldilocks zone. Every suggestion should hit the sweet spot between too rigid and too vague.

SPECIAL CASES:

  • Vague input: Ask 1-2 clarifying questions (What should this accomplish? Who's the audience?)
  • Perfect input: Acknowledge it's already well-optimized, offer minor polish only
  • Long-horizon tasks: Recommend compaction, note-taking, or sub-agent strategies
  • Over-complicated: Aggressively simplify - "do the simplest thing that works"

OUTPUT FORMAT:

Start by briefly restating what the user is trying to accomplish, then provide your analysis and optimized version. Keep it scannable with headers and bullet points. End with "Try this and let me know if you need further refinement!"

USAGE EXAMPLE:

User: "I want ChatGPT to write better marketing emails for my SaaS product. I keep telling it to be professional but casual, creative but not too weird, and to mention our features but not be salesy."

Response would identify:

  • ❌ Too vague ("be professional but casual" - what does that mean?)
  • ❌ Contradictory instructions causing confusion
  • ✅ Needs "right altitude" guidance with 2-3 concrete examples

Then provide an optimized prompt with specific voice characteristics and example outputs, not vague adjectives.

We also asked Gemini 2.5 Pro to take a stab at it, and Gemini broke the prompt down into two separate versions:

For the Developer:

This version uses technical language and assumes familiarity with concepts like APIs, system architecture, and data structures.

Phase 1: Core Objective & Scope

  1. Primary Goal: In a single, declarative sentence, what is the agent's core function? (e.g., "The agent will autonomously migrate a Python 2 codebase to Python 3.")
  2. Key Heuristics: Instead of hard-coding brittle logic, list 3-5 guiding principles the agent should follow. (e.g., "Prioritize readability over cleverness," "Maintain dependency compatibility," "Ensure all new code is covered by unit tests.")
  3. Task Horizon: Is this a short-lived task (minutes) or a long-horizon project (hours/days)? This will determine the need for memory and compaction strategies.

Phase 2: Tools & Environment

  1. Tool Definitions: List the tools (APIs, CLI commands) the agent can use. For each tool, provide a concise, unambiguous description and its essential parameters.
  2. Tool Optimization: Review the toolset. Is there functional overlap? Can any tools be combined or clarified to present a clearer decision point for the agent? The goal is a minimal, viable set of distinct tools.
  3. Information Landscape: How will the agent access information?
    • "Just-in-Time" Retrieval: What tools will you provide for the agent to dynamically pull information into its context (e.g., grep, read_file, sql_query) rather than pre-loading it?
    • Environmental Signals: How is the information structured? Describe any file naming conventions, folder hierarchies, or database schemas the agent can use as metadata signals.

Phase 3: Examples & Behavior

  1. Canonical Examples: Provide 2-3 diverse, "canonical" examples of the agent performing its task correctly. These should demonstrate the desired behavior patterns, not just cover a list of edge cases.
  2. Failure & Recovery: Describe a common failure mode. How should the agent ideally recover from it? This helps define error handling and resilience.

Phase 4: Long-Horizon Context Management (If Applicable)

  1. Structured Memory: How should the agent persist information outside its context window? Define a schema or format for its notes (e.g., maintaining a TODO.md, a decisions_log.json, or a structured scratchpad).
  2. Compaction Strategy: When the context window nears its limit, what information is critical to summarize and carry forward? Prioritize architectural decisions, unresolved issues, and key discoveries. What can be safely discarded (e.g., raw tool outputs from early in the process)?
  3. Sub-Agent Architecture: Could this task be broken down? If so, define the role of a primary coordinating agent and the specialized tasks of sub-agents. What is the contract for how a sub-agent returns its distilled findings?

For the Non-Technical User:

This version uses analogies and straightforward language to guide users who may be building their first agent in a "custom GPT" style interface.

Part 1: The Agent's Job Description

  1. Core Purpose: In one simple sentence, what is this agent's main job? (e.g., "This agent helps me draft professional emails to clients.")
  2. Golden Rules: What are the 3-5 most important rules it should always follow? Think of these as its personality or work ethic, not a strict script. (e.g., "Always be polite and friendly," "Keep emails concise," "Double-check for typos.")
  3. Task Length: Will this agent work on quick tasks or long, ongoing projects?

Part 2: The Agent's Skills and Resources

  1. Available Actions: What specific actions can the agent take? Be clear and distinct. (e.g., "Search our company's public website," "Read from the 'Client Info' folder," "Draft a new email.")
  2. Choosing the Right Action: Look at your list of actions. Are any of them confusingly similar? If a person wouldn't know which one to pick, the agent won't either. Let's clarify or combine them.
  3. Accessing Information:
    • Instead of giving the agent every document at once, what "search" or "look-up" actions can it use to find information when it needs it?
    • How is your information organized? Do folder names or document titles give clues about what's inside? The agent can use these clues just like you do.

Part 3: Learning by Example

  1. Ideal Examples: Provide 2-3 examples of a perfect interaction. Show what you would ask and the perfect response the agent would give. These should showcase its best work on different types of requests.
  2. Handling Mistakes: Think about a time the agent might get something wrong. What would be a good way for it to handle that mistake and get back on track?

Part 4: Memory for Long Projects (If you answered "long, ongoing projects" in Part 1)

  1. Taking Notes: The agent needs a way to remember things for long projects. Should it keep a to-do list in a file called TODO.txt? Or maybe a Project_Summary.md file to keep track of important decisions?
  2. Summarizing Progress: When the conversation gets very long, the agent needs to summarize. If you were catching up a colleague, what are the most critical points you would tell them to get them up to speed? That's what the agent should remember.
  3. Teamwork (Advanced): Is this job too big for one agent? You could have a "manager" agent that gives smaller, specific tasks to "specialist" agents. For example, one specialist could do research, and another could write drafts. The manager would then put it all together.

Try both and see how they perform!

The Future is Curated

Context engineering is more than just a new set of techniques; it's a new mindset. It acknowledges that as models become more intelligent, our role shifts from being a micromanager writing hyper-specific instructions to being a thoughtful curator of the AI's informational environment.

The guiding principle—finding the smallest set of high-signal tokens—will remain relevant even as context windows grow larger. The challenges of context pollution and information relevance will persist. By embracing compaction, agentic memory, and multi-agent systems, developers can build more reliable, coherent, and effective agents capable of tackling problems of unprecedented scale and complexity. The era of the perfect prompt is giving way to the era of the perfectly curated context.

See you cool cats on X!
