Are Claude and Codex getting dumber the more you use them? It's because your context is too bloated.
Author: sysls
Compiled by: Deep Tide TechFlow
Deep Tide Introduction: Developer blogger sysls with 2.6 million followers wrote a practical long article that was shared by 827 people and liked by 7,000. The core message is: your plugins, memory systems, and various harnesses are probably doing more harm than good. This article doesn’t preach big principles; it’s all based on actionable guidelines summarized from real production projects—covering how to control context, handle AI’s tendency to please, and define task termination conditions. It’s currently the clearest explanation I’ve seen on Claude/Codex engineering practices.
Full text below:
Introduction
You are a developer, using Claude and Codex CLI every day, constantly wondering if you’ve squeezed all their potential. Occasionally, you see them do ridiculously stupid things and can’t understand why some people seem to build rockets with AI while you can’t even stack two stones steadily.
You think it’s your harness problem, plugin problem, terminal problem, or whatnot. You’ve used beads, opencode, zep; your CLAUDE.md is 26,000 lines. But no matter how you tinker, you just can’t understand why you’re drifting further from paradise while others are playing with angels.
This is the article you’ve been waiting for.
Also, I have no vested interest. When I say CLAUDE.md, I also include AGENT.md; when I say Claude, I also include Codex—I use both extensively.
Over the past few months, I’ve observed something interesting: almost no one truly knows how to maximize the capabilities of agents.
It feels like a small handful of people can make agents build entire worlds, while the rest are lost in a sea of tools, suffering from choice paralysis—thinking that finding the right package, skill, or harness combination will unlock AGI.
Today, I want to break all that, leave you with a simple, honest statement, and start from there. You don’t need the latest agent harness, you don’t need to install a hundred packages, and you definitely don’t need to read a million articles just to stay competitive. In fact, your enthusiasm might do more harm than good.
I’m not here for sightseeing—I’ve been using agents since they could barely write code. I’ve tried all packages, all harnesses, all paradigms. I’ve used agent factories to write signals, infrastructure, and data pipelines—not “toy projects,” but real use cases running in production. After all that…
Today, I use a configuration so simple it’s almost trivial, relying only on basic CLI (Claude Code and Codex), plus a fundamental understanding of core agent engineering principles, and I’ve achieved my most breakthrough work ever.
Understanding that the world is moving at a lightning pace
First, I want to say that foundational model companies are in an epoch-making sprint, and it’s clear they won’t slow down anytime soon. Every improvement in “agent intelligence” changes how you collaborate with them, because agents are increasingly designed to follow instructions.
Just a few generations ago, if you wrote in CLAUDE.md “Read READTHISBEFOREDOINGANYTHING.md before doing anything,” it had a 50% chance of telling you “go to hell,” then doing whatever it wanted. Today, it obeys most instructions, even complex nested ones—like “Read A first, then B, if C, then D”—most of the time, it’s happy to follow.
What does this mean? The most important principle is recognizing that each new generation of agents forces you to rethink what’s optimal. That’s why less is more.
When you use many different libraries and harnesses, you lock yourself into a “solution,” but that problem might not even exist for the next-generation agents. Do you know who the most enthusiastic and highest-usage users of agents are? That’s right—frontier company employees, with unlimited token budgets, using the latest models. Do you understand what that implies?
It means that if a real problem exists and there’s a good solution, frontier companies will be the biggest users of that solution. And what will they do next? They’ll incorporate that solution into their products. Think about it—why would a company allow another product to solve real pain points and create external dependencies? How do I know this is true? Look at skills, memory harnesses, sub-agents… they all start from solving real problems, tested in real-world scenarios, proven to be genuinely useful.
So, if something is truly groundbreaking and can meaningfully expand agent use cases, it will eventually be integrated into the core products of foundational companies. Trust me, foundational companies are advancing rapidly. So relax—you don’t need to install anything or rely on external dependencies to do your best work.
I predict the comment section will soon be filled with “SysLS, I used this harness, it’s amazing! I rebuilt Google in a day!”—to which I say: congratulations! But you’re not the target audience. You represent a tiny, extremely niche group that truly understands agent engineering.
Context is everything
Honestly. Context is everything. Another problem with using many plugins and external dependencies is “context bloat”—your agent gets overwhelmed by too much information.
Imagine the agent’s inner monologue: build a guessing game in Python? Easy. Wait, what’s that note about “manage memory” from 26 conversations ago? Ah, right, the user’s screen froze 71 conversations ago because we spawned too many subprocesses. Always write notes? Sure… but what does any of that have to do with the guessing game?
You see. You only want to give the agent exactly the information needed to complete the task—nothing more, nothing less! The better you control this, the better the agent performs. Once you start introducing various memory systems, plugins, or overly complicated skill calls, you’re giving the agent instructions to build bombs and recipes for cakes, while you just want it to write a poem about redwoods.
So, I reiterate—strip away all dependencies, then…
Do something truly useful
Precisely describe implementation details
Remember how “context is everything”?
Remember, you want to inject exactly the right information into the agent—nothing more, nothing less?
The first way to do this is to separate research from implementation. Be extremely precise about what you’re asking the agent to do.
What happens if you’re imprecise? “Build an authentication system.” The agent has to research: what is an authentication system? What options are there? What are their pros and cons? Now it has to search online for a bunch of information that’s not really usable, cluttering the context with possible implementation details. When it comes to actual implementation, it’s more likely to get confused or develop unnecessary or irrelevant hallucinations about solutions.
Conversely, if you say “Implement JWT authentication with bcrypt-12 password hashing, refresh token rotation, 7-day expiry…,” it doesn’t need to research alternatives, knows exactly what you want, and can fill the context with implementation details.
Of course, you won’t always know the implementation details. Often, you don’t know what’s correct, and sometimes you want to delegate the decision on implementation details to the agent. What to do then? Simple—create a research task to explore various options, either decide yourself or let the agent choose which implementation to use, then have another agent with a fresh context implement it.
Once you start thinking this way, you’ll notice where the agent’s context gets unnecessarily polluted, and you can set up isolation walls in your workflow, abstracting away unnecessary information and leaving only the specific context that makes it excel at the task. Remember: you have a talented, smart team member who knows everything there is to know about balls, but unless you tell him you want to design a ballroom for dancing and fun, he’ll keep lecturing you on the merits of spherical objects.
Design around the pleaser tendency
No one wants a product that constantly criticizes you, tells you you’re wrong, or ignores your instructions altogether. So, these agents will try hard to agree with you and do what you want.
If you tell it to add “happy” after every 3 words, it will try its best—most people understand this. Its obedience is what makes it such a useful product. But here’s an interesting feature: this means if you say “Help me find a bug in the codebase,” it will find a bug—even if it has to “manufacture” one. Why? Because it very much wants to obey your command!
Most people quickly complain about hallucinations and fabrications in LLMs, but don’t realize the problem lies with them. Whatever you ask it to find, it will deliver—even if it needs to stretch the truth a bit!
What to do? I find “neutral prompts” very effective: don’t bias the agent toward a specific outcome. For example, instead of “Help me find a bug in the codebase,” say “Scan the entire codebase, follow the logic of each component, and report all findings.”
Such neutral prompts sometimes find bugs, sometimes just describe how the code runs objectively. But they won’t bias the agent toward “there is a bug” preconceptions.
Another way to handle pleaser tendencies is to turn it into an advantage. I know the agent is eager to please me and follow my instructions, so I can bias it this way or that.
For example, I ask a bug-finding agent to identify all bugs in the codebase, assigning +1 point for low-impact bugs, +5 for moderate, +10 for serious. I know this agent will enthusiastically identify all kinds of bugs (including false positives), then report a score like 104. I treat its output as a superset of all possible bugs.
Then, I have an adversarial agent to refute, telling it that each successfully refuted bug earns the bug’s score, but if it refutes incorrectly, it gets -2 times that score. This agent will try to refute as many bugs as possible but will be cautious due to penalties. It will still actively “refute” bugs (including real ones). I see this as a subset of all real bugs.
Finally, I have a judge agent to synthesize both inputs and score them. I tell the judge I have a ground truth; correct answers get +1, wrong answers -1. It scores the bug-finder and refuter on each “bug.” The judge states what’s true, and I verify. Most of the time, this method is surprisingly high-fidelity, occasionally making mistakes, but it’s close to error-free.
You might find that just a bug-finding agent suffices, but this method works well for me because it leverages each agent’s innate desire to please.
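The scoring scheme above reduces to simple arithmetic. Here is a minimal sketch in Python; the severity weights (+1/+5/+10) and the -2x refutation penalty come from the text, while the data shapes and bug IDs are made up for illustration:

```python
# Scoring sketch for the finder / refuter / judge setup described above.
# The severity weights (+1/+5/+10) and the -2x penalty come from the text;
# the data shapes and IDs are illustrative.

SEVERITY_POINTS = {"low": 1, "moderate": 5, "serious": 10}

def finder_score(reported_bugs):
    """The finder earns points per reported bug, so it over-reports:
    treat its output as a superset of the real bugs."""
    return sum(SEVERITY_POINTS[b["severity"]] for b in reported_bugs)

def refuter_score(reported_bugs, refuted_ids, real_ids):
    """The refuter earns a bug's points for refuting a false positive,
    but loses 2x the points for refuting a real bug."""
    score = 0
    for bug in reported_bugs:
        if bug["id"] not in refuted_ids:
            continue
        points = SEVERITY_POINTS[bug["severity"]]
        score += -2 * points if bug["id"] in real_ids else points
    return score

bugs = [
    {"id": "b1", "severity": "serious"},   # real bug
    {"id": "b2", "severity": "low"},       # false positive
    {"id": "b3", "severity": "moderate"},  # false positive
]
print(finder_score(bugs))                         # 16
print(refuter_score(bugs, {"b2", "b3"}, {"b1"}))  # 6
```

The judge step stays paired with a human-verified ground truth, exactly as described above; the point of the arithmetic is that the finder’s incentives push its list toward a superset of real bugs, while the refuter’s penalty pushes its list toward a subset.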
How to judge what’s useful and worth using?
This question seems tricky, as if you need to study deeply and constantly track AI’s cutting edge. But it’s actually simple… if OpenAI and Anthropic implement it, or acquire the company that does, then it’s probably useful.
Notice that “skills” are everywhere now and are part of Claude and Codex’s official documentation? Notice that OpenAI acquired OpenClaw? Notice that Claude added memory, voice, and remote work features?
What about planning? Remember how many people found that planning before implementation is very useful, and it became a core feature?
Yes, those are useful!
Remember endless stop-hooks, which were super useful because agents were very reluctant to do long-running tasks… then Codex 5.2 came out, and that need disappeared overnight?
That’s all you need to know… if something is truly important and useful, Claude and Codex will implement it themselves! So, you don’t need to worry too much about “new things” or “familiarity with new stuff,” you don’t even need to “stay updated.”
Do yourself a favor: occasionally update your chosen CLI tools and see what new features they have. That’s enough.
Compression, context, and assumptions
Some people encounter a huge pitfall when using agents: sometimes they seem the smartest beings on Earth, other times you can’t believe you’re being played.
“This thing is smart? It’s a damn fool!”
The biggest difference is whether the agent is forced to make assumptions or “fill in the blanks.” Today, they’re still terrible at “connecting dots,” “filling gaps,” or making assumptions. As soon as they do, it’s obvious and the situation worsens.
One of the most important rules in my CLAUDE.md covers how to acquire context, and I instruct the agent to read that rule first every time it reads CLAUDE.md (i.e., after every compression). As part of context acquisition, a few simple instructions have an outsized effect: re-read the task plan, and re-read the relevant files before continuing.
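As a sketch, such a rule might sit at the very top of CLAUDE.md; the file name PLAN.md here is illustrative, not prescribed by the article:

```markdown
<!-- Top of CLAUDE.md: read this rule first, every time you read this file -->
## Context acquisition (run this after every compression)
1. Re-read the current task plan in PLAN.md.
2. Before continuing any work, re-read the files relevant to the current task.
3. Do not assume prior state; verify it from the files above.
```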
Tell the agent how to end the task
Humans have a pretty clear sense of “task completion.” For the agent, the biggest problem is that it knows how to start a task but not how to end it.
This often leads to frustrating results: the agent ends up just creating some stubs and stops.
Testing is a very good milestone because it’s deterministic: you can set very clear expectations. Unless those X tests pass, the task isn’t done, and modifying the tests is not allowed.
Then, just review the tests. Once all pass, you can be confident. You can automate this, but the key point—“task ending”—is natural for humans but not for the agent.
Do you know what has recently become a feasible task endpoint? Screenshots + verification. You can have the agent implement something until all tests pass, then take a screenshot and verify the design or behavior on the screenshot.
This allows you to iterate and steer the design without worrying about it stopping after the first try!
A natural extension is to have the agent create a “contract” and embed it into your rules. For example, a {TASK}_CONTRACT.md specifies what must be done before the session may be terminated. Inside {TASK}_CONTRACT.md, you list the tests, screenshots, and other validations required before the task can be certified complete.
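A contract might look like the following sketch; the task name, file paths, and checks are hypothetical:

```markdown
<!-- AUTH_CONTRACT.md (task name and checks are illustrative) -->
# Contract: JWT authentication

The session may only be terminated when every box below is checked.

- [ ] All tests in tests/test_auth.py pass (the tests must not be modified)
- [ ] Screenshot of the login flow saved to artifacts/login.png
- [ ] Screenshot verified against the design in docs/login-mock.png
```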
Always-on agents
I often get asked how to run an agent 24/7 and ensure it doesn’t go off track.
Here’s a simple method. Create a stop-hook that prevents the agent from ending the session unless all parts of {TASK}_CONTRACT.md are completed.
If you have 100 such well-defined contracts, the stop-hook will prevent termination until all 100 are fulfilled, including all tests and validations!
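Here is a minimal sketch of such a stop-hook in Python. It assumes a Markdown-checklist contract and Claude Code’s hook convention that exit code 2 blocks the stop and feeds stderr back to the agent; check your CLI’s current hook documentation before relying on either assumption:

```python
#!/usr/bin/env python3
"""Stop-hook sketch: refuse to let the session end while the contract
still has open items. Assumes a Markdown checklist contract ("- [ ]" open,
"- [x]" done) and that exit code 2 blocks the stop and returns stderr to
the agent, per Claude Code's hook convention at the time of writing."""
import sys
from pathlib import Path

CONTRACT = Path("TASK_CONTRACT.md")  # illustrative path

def unchecked_items(text: str) -> list[str]:
    """Return every checklist line that is still open."""
    return [line.strip() for line in text.splitlines()
            if line.strip().startswith("- [ ]")]

def main() -> int:
    if not CONTRACT.exists():
        return 0  # no contract, nothing to enforce
    remaining = unchecked_items(CONTRACT.read_text())
    if remaining:
        print("Contract incomplete:\n" + "\n".join(remaining), file=sys.stderr)
        return 2  # blocking exit: the session must keep going
    return 0

# In the installed hook script, finish with: sys.exit(main())
```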
Pro tip: I find long-running 24-hour sessions aren’t optimal for “doing things.” Partly because this structure inherently introduces context bloat, as unrelated contracts’ contexts all enter the same session!
So, I don’t recommend that.
A better automation approach is to open a new session for each contract. Whenever you need to do something, create a new contract.
Set up an orchestration layer to create new contracts and sessions whenever “something needs to be done.”
This will completely change your agent experience.
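A minimal orchestration sketch: one fresh session per contract file, so unrelated contracts’ contexts never share a session. The `claude -p` headless invocation is an assumption; substitute whatever your CLI uses for non-interactive runs:

```python
"""Orchestration sketch: one fresh session per contract, so unrelated
contracts' contexts never mix. The `claude -p` headless invocation is an
assumption; swap in your CLI's equivalent."""
import subprocess
from pathlib import Path

def session_command(contract: Path) -> list[str]:
    """Build the CLI command that dedicates one session to one contract."""
    prompt = (f"Read {contract.name} and fulfil every item in it. "
              "Do not stop until the whole contract is satisfied.")
    return ["claude", "-p", prompt]

def run_all(contract_dir: Path) -> None:
    """Launch one session per *_CONTRACT.md file, sequentially."""
    for contract in sorted(contract_dir.glob("*_CONTRACT.md")):
        subprocess.run(session_command(contract), check=True)

print(session_command(Path("AUTH_CONTRACT.md")))
```

Whether sessions run sequentially or in parallel is a separate choice; the load-bearing idea is simply that each contract gets a fresh context.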
Iterate, iterate, iterate
You hire an administrative assistant. Do you expect them to know your schedule from day one? Or how you drink coffee? Or that you have dinner at 6 instead of 8? Of course not. You gradually develop preferences over time.
The same applies to agents. Start with the simplest configuration, forget about complex structures or harnesses, give basic CLI a chance.
Then, gradually add your preferences. How?
Rules
If you don’t want the agent to do something, write it as a rule, then point to it in CLAUDE.md. For example: “Before coding, read coding-rules.md.” Rules can be nested and conditional! If you’re coding, read coding-rules.md; if testing, read coding-test-rules.md; if tests fail, read coding-test-failing-rules.md. You can create arbitrary logical branches for the agent to follow, and Claude (and Codex) will happily follow them, as long as they’re clearly specified in CLAUDE.md.
In fact, this is my first practical advice: treat your CLAUDE.md as a logical, nested directory that indicates where to find context in specific scenarios and results. Keep it as concise as possible, containing only “if-then” logic about “where to look for context under what conditions.”
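Treated this way, CLAUDE.md might look like the following sketch, using the article’s own example file names:

```markdown
<!-- CLAUDE.md as a nested "if-then" context directory -->
- If you are writing code: read coding-rules.md first.
- If you are writing or running tests: read coding-test-rules.md.
  - If tests fail: also read coding-test-failing-rules.md.
- If you encounter scenario X: read SKILL.md.
```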
If you see the agent doing something you disapprove of, add it as a rule, tell the agent to read that rule before doing it again, and it will definitely stop doing that.
Skills
Skills are similar to rules but more about encoding “operation steps.” If you want something to be done in a specific way, embed it as a skill.
People often complain they don’t know how the agent will solve a problem, which feels unsettling. If you want certainty, have the agent research how it would solve it, then write that plan as a skill file. You’ll see how the agent plans to handle the problem in advance, and you can fix or improve it before it encounters the issue.
How does the agent know about this skill? Exactly! Write in CLAUDE.md: “When encountering scenario X, read SKILL.md.”
Handling rules and skills
You’ll definitely want to keep adding rules and skills. That’s how you give it personality and remember your preferences. Almost everything else is redundant.
Once you do this, your agent will feel like magic. It will “do things your way.” And you’ll finally feel like you’ve “got it” in agent engineering.
And then…
You’ll see performance start to decline again.
What’s going on?!
It’s simple. As you add more rules and skills, they start conflicting, or the agent begins to suffer serious context bloat. If you need the agent to read 14 markdown files before coding, the same problem of useless information arises.
What to do?
Clean up. Give your agent a spa day: consolidate rules and skills, and eliminate contradictions by updating your preferences.
And it will feel like magic again.
That’s it. That’s the secret. Keep it simple, use rules and skills, treat CLAUDE.md as a directory, and mind its context and design limitations.
Be responsible for the results
Today, there’s no perfect agent. You can delegate a lot of design and implementation work to it, but you need to be responsible for the results.
So, be cautious… and enjoy it!
Playing with future toys (while obviously using them for serious work) is truly fun!