Loop Engineering: When You Stop Prompting the Agent and Start Building the System That Does

The leverage moved. It is no longer the quality of one prompt you type, but the design of the system that generates and verifies prompts while you are not in the seat.

June 11, 2026

For about two years, getting useful work out of a coding agent meant sitting in the chair. You wrote a prompt, read what came back, judged it, and typed the next thing. You held the tool the whole time, one turn after another. That was the job, and people got good at it.

Sometime in mid-2026 the job changed names. The thing people now call loop engineering is the shift from being the person who prompts the agent to being the person who designs the system that prompts it. Instead of you typing the next instruction after every response, a small system finds the work, hands it out, checks the result, writes down what got done, and decides what to do next. You build that system once. Then you let it poke the agent instead of poking it yourself.

This post walks through what the term actually means, where it came from, who is pushing it, how the pieces fit together, and where it goes wrong. We will also be honest about the part most of the hype skips: a loop running unattended is a loop making mistakes unattended, and that changes what your job is rather than removing it.

The one-sentence version

Loop engineering is building a system that prompts your agent on a schedule and against a goal, instead of typing each prompt yourself. The leverage moves from the quality of a single prompt to the design of the system that generates and verifies prompts.

A useful way to picture it: a coding agent already runs an inner loop on every turn. It reasons about what to do, takes an action like editing a file or running a test, observes the result, then loops back and reasons again against the new state. That perceive, reason, act, observe cycle is the agentic loop, and it has been there all along.

The inner loop, on every turn: perceive, reason, act, observe, and back to perceive against the new state.
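
To make the cycle concrete, here is a minimal sketch of that inner loop. The `reason` and `act` callables are hypothetical stand-ins for the model call and the tool call, and the "stop" action is an assumption made for illustration, not how any particular agent signals that it is finished.

```python
from typing import Callable


def inner_loop(
    reason: Callable[[str], str],
    act: Callable[[str], str],
    initial_state: str,
    max_turns: int = 10,
) -> list[str]:
    """Run reason -> act -> observe until the model says 'stop' or turns run out."""
    state = initial_state            # perceive: what the agent currently sees
    history: list[str] = []
    for _ in range(max_turns):
        action = reason(state)       # reason: choose the next action from the state
        if action == "stop":
            break
        state = act(action)          # act, then observe: the result is the new state
        history.append(action)
    return history


# Toy stand-ins: keep editing until the observed state says the tests pass.
def toy_reason(state: str) -> str:
    return "stop" if state == "tests pass" else "edit and rerun tests"


results = iter(["1 test failing", "tests pass"])


def toy_act(action: str) -> str:
    return next(results)


print(inner_loop(toy_reason, toy_act, "2 tests failing"))
# prints ['edit and rerun tests', 'edit and rerun tests']
```

Everything in the rest of this post happens outside this function: deciding when to call it, with what prompt, and whether to believe the result.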

Loop engineering sits one floor above it. You are no longer steering each turn by hand. You are building an outer loop that runs on a schedule, spawns helpers, feeds itself work, and keeps going across many of those inner cycles without you in the seat for each one.

How we got here: prompt, then context, then loop

It helps to see loop engineering as the third layer in a stack that has been building for a few years. Each layer wraps the one inside it, and each one moves the point of leverage a little further from the raw model call.

The stack, from the outermost layer in:
Loop engineering - unit of work: a self-running cycle across many turns
Context engineering - unit of work: the conditions around one answer
Prompt engineering - unit of work: one turn you typed by hand
At the center: the raw model call

Prompt engineering came first. You optimized how you phrased a single instruction. The unit of work was one turn you typed by hand.

Context engineering came next. You stopped obsessing over the wording of the prompt and started managing everything else in the window: the docs, the history, the tool definitions, the files the model could see. The unit of work became the conditions around one answer.

Loop engineering is the layer on top. You optimize the system that decides what to prompt and when, and whether the result is acceptable. The unit of work is a self-running cycle across many turns.

None of the earlier layers go away. A loop is built out of prompts, and a sloppy prompt inside a loop just produces sloppy work faster. The loop still has to put the right files and tool definitions in front of the model on each turn - that is context engineering doing its job. What the new layer adds is the autonomous control structure wrapped around all of it.

Ralph: the dumb loop that started it

You cannot tell the story of loop engineering without telling the story of Ralph, because the fancy version is just Ralph with better tooling.

In July 2025, the engineer Geoffrey Huntley named a pattern after Ralph Wiggum, the Simpsons character who is not especially bright but never quits. A Ralph loop is a coding agent running inside an infinite shell loop. Each iteration reads the same prompt file, modifies the codebase on disk, and uses the file system instead of conversation history as its memory. Every iteration starts with a fresh, clean context window. State survives between iterations through the codebase itself, a TODO file, and git history.

Dumb things can work surprisingly well.

- the lesson of Ralph, as the technique spread through the agentic-coding community in mid-2025

The reason it works is the part worth keeping. Ralph stores state in the external file system and git, not in the context window. That one decision decouples the life of the loop from the limits of the model's memory.

Resets every iteration: the context window. Each new agent starts fresh with a clean window. The model forgets everything between runs. On its own, that would cap every job at the size of one context.

Persists across runs: the file system and git. The codebase, a TODO file, and git history do not reset. Each fresh agent inherits the previous agent's work - so jobs no single window could hold become possible.

That makes the pattern unusually good at long jobs that no single context window could hold: large refactors, library ports, legacy migrations. There is one more trick that makes Ralph more than a toy. Coding agents like to declare victory early. They stop the moment they decide the task is done, whether or not it actually is. Ralph implementations get around this by refusing to take the agent's word for it.

The seed of everything

When the agent tries to exit, a control script scans its output for a predefined completion signal - a literal COMPLETE token it has to emit. No token, and the script reloads the prompt and starts another round. The loop ends only when the script sees that signal, not when the agent simply stops. Do not trust the agent's self-assessment. Verify it from the outside.
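
A rough sketch of that control script follows. Ralph is usually described as a shell loop, so this Python version is our own illustration rather than Huntley's implementation. The `run_agent` callable, the file names, and the iteration cap are all assumptions, and the agent is faked so the example runs without any real CLI.

```python
import tempfile
from pathlib import Path
from typing import Callable

COMPLETION_TOKEN = "COMPLETE"  # the literal signal the agent has to emit


def ralph_loop(
    run_agent: Callable[[str], str],
    prompt_file: Path,
    max_iterations: int = 20,
) -> int:
    """Re-run the agent on the same prompt until its output contains the token.

    Returns the number of iterations used; raises if the cap is hit first.
    """
    for iteration in range(1, max_iterations + 1):
        prompt = prompt_file.read_text()   # the same prompt file every round
        output = run_agent(prompt)         # fresh call: no conversation history carried over
        if COMPLETION_TOKEN in output:     # the script decides, not the agent's exit
            return iteration
    raise RuntimeError(f"no {COMPLETION_TOKEN} after {max_iterations} iterations")


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    prompt_file = workdir / "PROMPT.md"
    todo_file = workdir / "TODO.md"
    prompt_file.write_text("Do the next item in TODO.md.")
    todo_file.write_text("port module a\nport module b\n")

    def fake_agent(prompt: str) -> str:
        """Hypothetical agent: does one TODO item per call, state lives on disk."""
        items = todo_file.read_text().splitlines()
        if not items:
            return "COMPLETE"
        todo_file.write_text("\n".join(items[1:]))
        return f"did: {items[0]}"

    print(ralph_loop(fake_agent, prompt_file))  # prints 3
```

The fake agent remembers nothing between calls. The only thing that tells it what is left is the TODO file, which is the whole point of the pattern.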

Who is talking about it

The Ralph pattern stayed a community technique through late 2025. The broader name showed up in June 2026, popularized by the Google engineer Addy Osmani in an essay called "Loop Engineering." He built on a line from Peter Steinberger, who argued that you should be designing loops that prompt your agents rather than prompting them by hand, and on comments from Boris Cherny, the Claude Code lead at Anthropic.

I no longer prompt the model directly. I write loops that run and prompt the model and figure out what to do - and writing those loops is now the job.

- the Boris Cherny framing, paraphrased

The thing to notice is what he is not saying. He is not saying coding got easier. He is saying the highest-value work moved. A well-designed loop multiplies a good engineer. A badly designed loop multiplies a bad decision just as fast, with less of you watching.

Keep some skepticism here. Much of the loudest writing comes from vendors, evangelists, and people selling consulting around it. Thoughtworks put the Ralph technique on its Technology Radar as Trial, not Adopt - the right level of caution for something this young. The splashy cost numbers, the ship-six-repos-overnight kind of claim, are mostly self-reported by the people most invested in the idea catching on. The pattern is real and worth understanding. The marketing around it is running ahead of the evidence.

The pieces of a working loop

A loop that holds together needs five things, plus one place to remember state. The names differ between tools, but the shape is the same everywhere.

Automation (heartbeat)

A recurring trigger that surfaces work without you asking. It runs on a schedule and does discovery and triage on its own. Without a heartbeat you do not have a loop, you have a single run.

Isolation (worktrees)

The moment you run more than one agent, files collide. A git worktree gives each agent its own directory on its own branch, sharing one repo history, so one agent's edits land in its own checkout instead of another's.
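
As a sketch of what that looks like in practice, the helper below builds the `git worktree add` call for one agent. The branch naming scheme, the sibling-directory layout, and the `main` base branch are assumptions for illustration; only the git command itself is standard.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """Build the `git worktree add` call that gives one agent its own branch and directory."""
    branch = f"agent/{task_id}"                        # hypothetical branch naming scheme
    path = repo.parent / f"{repo.name}-wt-{task_id}"   # sibling directory, outside the main checkout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, task_id: str, base: str = "main") -> None:
    """Create the worktree for real; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_add_command(repo, task_id, base), check=True)


# Printing the command is safe anywhere; create_worktree needs an actual repository.
print(worktree_add_command(Path("repo"), "fix-123"))
```

Note that a worktree separates checkouts, not permissions. If an agent must not be able to write outside its directory, that takes a sandbox on top.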

Captured knowledge (skills)

An agent starts every session cold and fills gaps with a confident guess. A skill is your intent written down outside the model: conventions, build steps, the thing you do not do anymore because of that one incident. Knowledge compounds across runs.

Connectors (MCP)

The wiring to your real tools, commonly on the Model Context Protocol. It is the difference between an agent that says "here is the fix" and a loop that opens the pull request, links the ticket, and pings the team once the build is green.

Sub-agents (maker / checker)

The model that wrote the code is far too generous grading its own homework. A second agent, with different instructions and sometimes a different model, catches what the first talked itself into. One explores, one implements, one verifies.
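
The maker / checker split can be sketched as a small retry loop. The `maker` and `checker` callables, the `Review` type, and the three-round limit are hypothetical; in a real loop each callable would wrap a separate agent with its own instructions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def make_and_check(
    maker: Callable[[str, str], str],       # (task, feedback) -> draft
    checker: Callable[[str, str], Review],  # (task, draft) -> verdict
    task: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first draft the checker approves, or None to escalate to a human."""
    feedback = ""
    for _ in range(max_rounds):
        draft = maker(task, feedback)
        review = checker(task, draft)        # the maker never grades its own work
        if review.approved:
            return draft
        feedback = review.feedback           # the checker's objections feed the next attempt
    return None


# Toy stand-ins: the checker insists on tests, the maker adds them once told.
def toy_maker(task: str, feedback: str) -> str:
    return "patch with tests" if feedback else "patch"


def toy_checker(task: str, draft: str) -> Review:
    if "tests" in draft:
        return Review(True)
    return Review(False, "add tests")


print(make_and_check(toy_maker, toy_checker, "fix flaky build"))
# prints patch with tests
```

Returning `None` instead of the last rejected draft is deliberate: a draft the checker never approved should reach a person, not a merge.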

Memory (state on disk)

A file or board outside any conversation holding what is done and what is next. The model forgets everything between runs, so state lives where the model does not. The agent forgets. The repo does not.
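
A memory file can be as plain as a JSON document with two lists. The sketch below assumes a hypothetical `loop_state.json` with `done` and `next` keys; a markdown TODO file or an issue board would serve the same purpose.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the loop's memory, or start empty on the very first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "next": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then swap it in, so a crash mid-write keeps the old state."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def mark_done(path: Path, item: str) -> dict:
    """Move one item from 'next' to 'done' and persist the result."""
    state = load_state(path)
    if item in state["next"]:
        state["next"].remove(item)
    if item not in state["done"]:
        state["done"].append(item)
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "loop_state.json"
    save_state(state_file, {"done": [], "next": ["triage builds", "update deps"]})
    mark_done(state_file, "triage builds")
    print(load_state(state_file))
    # prints {'done': ['triage builds'], 'next': ['update deps']}
```

Tomorrow's run starts from `load_state`, not from anything the model remembers.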

What one loop looks like end to end

Put the pieces together and a single thread turns into a small control panel. A sensible first loop looks like this.

1. An automation runs every weekday morning

The heartbeat fires against the repo, with no one asking it to. Its prompt calls a triage skill.

2. The triage skill reads the state of the world

It reads yesterday's failed builds, the open issues, and recent commits, then writes its findings into a memory file on disk.

3. Each worthwhile finding opens an isolated worktree

The thread spins up a worktree and sends a sub-agent to draft the fix on its own branch, in its own directory.

4. A second sub-agent reviews the draft

The checker grades the draft against the project's conventions and existing tests - the maker is not allowed to pass its own work.

5. If it passes, a connector ships it

The PR is opened and the ticket updated. Anything the loop cannot handle on its own lands in a queue for a human.

Tomorrow morning the heartbeat fires again, reads the memory file, and picks up where today stopped. You designed the system once - you did not prompt any individual step by hand.
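
Steps 2 through 5 can be sketched as one pass of an outer loop. The four callables are hypothetical placeholders for the triage skill, the drafting sub-agent, the checker, and the connector. The scheduling from step 1 and the worktree setup are left out to keep the shape visible.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunReport:
    shipped: list[str] = field(default_factory=list)
    needs_human: list[str] = field(default_factory=list)


def run_once(
    triage: Callable[[], list[str]],
    draft: Callable[[str], str],
    review: Callable[[str, str], bool],
    ship: Callable[[str, str], None],
) -> RunReport:
    """One heartbeat: triage, draft, review, then ship or escalate each finding."""
    report = RunReport()
    for finding in triage():                     # step 2: read the state of the world
        patch = draft(finding)                   # step 3: a sub-agent drafts the fix
        if review(finding, patch):               # step 4: a second agent checks it
            ship(finding, patch)                 # step 5: a connector opens the PR
            report.shipped.append(finding)
        else:
            report.needs_human.append(finding)   # anything else lands in the human queue
    return report


# Toy stand-ins so the sketch runs end to end.
opened_prs: list[str] = []
report = run_once(
    triage=lambda: ["failed build on main", "flaky login test"],
    draft=lambda finding: f"patch for {finding}",
    review=lambda finding, patch: "build" in finding,
    ship=lambda finding, patch: opened_prs.append(patch),
)
print(report.shipped)      # prints ['failed build on main']
print(report.needs_human)  # prints ['flaky login test']
```

The queue for a human is part of the design, not an error path. A first loop should expect to route a good share of its findings there.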

If this is your first loop, start much smaller than the full version. A single automation that triages failed builds into a markdown file every morning, with no automatic merging, already removes a recurring chore - and lets you watch how the loop behaves before you trust it with anything that writes to your repo.

The risks the loop does not solve

This is the part the breathless takes skip, so we will spend real time on it. A loop changes the work. It does not delete you from it. Three problems actually get sharper as the loop gets better.

Verification

A loop running unattended is a loop making mistakes unattended. Splitting verifier from maker makes "done" mean something - but done is a claim, not a proof. A human reading the merged changes stays in the loop no matter how good the checker gets.

Comprehension debt

The faster the loop ships code you did not write, the wider the gap between what lives in your repo and what you understand. A smooth loop just widens that gap faster - unless you read what it produced.

The quiet one

When the loop runs itself, it gets tempting to stop having an opinion and accept whatever it returns. Designing the loop is the cure when you do it with judgment, and the accelerant when you do it to avoid thinking.

Two people can build the exact same loop and get opposite outcomes. One moves faster on work they understand deeply. The other stops understanding the work at all. The loop cannot tell the difference. You can.

This is exactly the Day Two problem

Here is where the pattern stops being a developer curiosity and starts being an operations question, which is the part most of the coverage misses.

A loop is easy to stand up and hard to keep honest. Standing one up is a weekend project. Keeping it producing reliable work for the next year, without quietly accumulating risk, is a different discipline entirely. It needs verifiers you actually trust, sensible limits on how often it runs and how much it spends, captured knowledge that stays current, and the security plumbing to let an autonomous process touch real systems without becoming a liability.

70-90% of AI projects stall after the demo and never reach production. Standing a loop up takes a weekend - the easy part. Day Two is the stretch where it has to keep earning its keep.

That is the same gap that swallows most AI projects. They stall not because the model was wrong, but because no one built the operations, instrumentation, and governance to sustain it after go-live. A loop makes that gap concrete. The demo is the bash script that ran once and produced something impressive. Day Two is the loop still running in March, touching production, spending tokens, opening pull requests while you are asleep, and either earning its keep or compounding small errors into large ones.

This is the People, Process, Platform order, applied to autonomy.

Platform

Automations, worktrees, connectors, sub-agents, memory. The easy part, and increasingly it ships inside the tools.

Process

The verification design, the cost controls, the cadence, the boundaries within which the loop is allowed to act.

People

The engineers who stay accountable for what the loop ships and keep their judgment in the design and review path.

A loop built by someone who intends to remain the engineer is leverage. A loop built by someone trying to stop thinking is a slow-motion incident.

If you take one thing from all of this: build the loop like someone who intends to stay the engineer, not just the person who presses go.

Where to start

You do not need a platform to try this. You need a small, well-specified, mechanical task with a clear pass or fail condition. A file you have meant to clean up. A migration with a test suite that tells you when it is done.

Pick a pass/fail task

Small, mechanical, well-specified, with a condition that tells you objectively when it is done. A test suite is ideal.

Run it in a sandboxed worktree

One iteration by hand. Read the diff and the commit yourself before you run another. Watch how it behaves.

Cap the iterations

Put a hard limit on the loop so a runaway cannot burn through your budget while you are not looking.

Keep auto-merge off

No automatic merging until you have watched the loop behave for a while. Trust is earned one reviewed diff at a time.
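
Two of those habits, the objective pass/fail condition and the hard cap, fit in a few lines. In this sketch the check is any command whose exit status is 0 on success. The `step` callable and the default cap of five are assumptions, and the demo uses trivial Python one-liners in place of a real test suite.

```python
import subprocess
import sys
from typing import Callable, Sequence


def passes(check_cmd: Sequence[str]) -> bool:
    """The objective done-condition: the check command exits with status 0."""
    return subprocess.run(check_cmd, capture_output=True).returncode == 0


def capped_loop(
    step: Callable[[], None],
    check_cmd: Sequence[str],
    max_iterations: int = 5,
) -> bool:
    """Run `step` until the check passes or the cap is hit. Never merges anything."""
    for _ in range(max_iterations):
        if passes(check_cmd):
            return True
        step()                     # one agent iteration; review its diff before the next
    return passes(check_cmd)       # final verdict after the last allowed iteration


always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
always_fail = [sys.executable, "-c", "raise SystemExit(1)"]

print(passes(always_ok))                                          # prints True
print(capped_loop(lambda: None, always_fail, max_iterations=2))   # prints False
```

With a real project, `check_cmd` would be the test runner you already trust, for example `["pytest", "-q"]`. A `False` result means the loop stopped at its cap and the task goes back to you.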

Get those habits right on something small, and the larger version is the same shape with more pieces. Get them wrong, and a bigger loop just gets you to the wrong place faster. The technique is genuinely useful. The discipline around it is what decides whether it pays off.

The loop is the easy part. Day Two is the discipline.

Strongly builds autonomous systems the way you would want one touching production: verifiers you trust, cost controls, governance, and engineers who stay accountable for what ships. People, then process, then platform - with Day Two designed in from Day One.

References

  1. Addy Osmani, "Loop Engineering" (Substack). addyo.substack.com/p/loop-engineering
  2. Addy Osmani, "Loop Engineering" (personal blog). addyosmani.com/blog/loop-engineering
  3. Addy Osmani, "Agent Harness Engineering". addyosmani.com/blog/agent-harness-engineering
  4. Geoffrey Huntley, "Ralph Wiggum as a software engineer" (the canonical post). ghuntley.com/ralph
  5. Dex Horthy / HumanLayer, "A Brief History of Ralph". humanlayer.dev/blog/brief-history-of-ralph
  6. Dex Horthy / HumanLayer on the Ralph loop and the Research, Plan, Implement methodology (LinearB). linearb.io
  7. Thomas Wiegold, "The Ralph Loop: How Recursive AI Agents Actually Work". thomas-wiegold.com
  8. "Ralph Loop, The Agent Loop Pattern Where AI Tests and Fixes Itself". ice-ice-bear.github.io
  9. "From ReAct to Ralph Loop: A Continuous Iteration Paradigm for AI Agents," Alibaba Cloud. alibabacloud.com
  10. snarktank/ralph, an open-source Ralph implementation. github.com/snarktank/ralph
  11. vercel-labs/ralph-loop-agent, a Ralph loop for the AI SDK. github.com/vercel-labs/ralph-loop-agent
  12. "What Is Loop Engineering? The New Meta for AI Coding Agents," MindStudio. mindstudio.ai
  13. "Loop Engineering: Designing Systems That Prompt AI Agents," Lushbinary. lushbinary.com
  14. Anthropic, Claude Code documentation. docs.anthropic.com
  15. OpenAI, Codex. openai.com/codex
  16. Model Context Protocol, official documentation. modelcontextprotocol.io