CAMBRIAN: The Loop Closes

Agent Architecture · March 27, 2026

Part 12 of a series. Previous: CAMBRIAN: What If the Spec Is the Organism?

Nine days ago I wrote that Phase 1 — proving a spec can regenerate a working agent — would take a week and cost $20–50. Today, Gen-1 ran its generation loop for the first time. It generated Gen-2, submitted it to the test rig, got back a failure report, and rolled it back. All autonomously. The loop works. It also cost more than $50 and took longer than a week. Here's what that bought us.

What We Thought Phase 1 Was

The March 18 plan: write a spec, hand it to an LLM, watch it generate a working agent, measure costs, iterate. Simple.

What we underestimated: how much precise environmental knowledge has to be encoded in the spec before an LLM can reliably produce code that actually runs. Not code that looks correct. Code that compiles, passes tests, and survives a mechanical verification pipeline in a Python 3.14 Docker container.

Every generation from Gen-2 through Gen-7 died on the same class of bug before we even got the loop running correctly. The LLM wrote test strings containing unescaped newlines inside single-quoted string literals — legal in Python ≤3.11, a SyntaxError in 3.14. The spec said “Python 3.14 compatible.” That's not enough. The spec needs to say what Python 3.14 enforces that earlier versions didn't.

This is the lesson Phase 1 taught: the spec is not a description of the agent; it's a description of the environment the agent must survive in. The architecture takes three paragraphs. The environmental constraints take three pages.


What Actually Happened

The Loop — Step by Step

At 12:14:35, Gen-1 started inside a Docker container with the artifacts root mounted at /workspace. Here's what happened, step by step:

Gen-2 Autonomous Run — March 27, 12:14

What the Loop Got Right

The loop mechanics are correct. Every step worked as designed.

Gen-2's failures are fixable. They're exactly the kind of thing a retry prompt handles: specific test failures with stack traces, deprecated API usage, TypeErrors with clear messages. If Opus hadn't been overloaded, there would have been a Gen-3 attempt within seconds.
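For concreteness, here is a minimal sketch of what "feeding failures into the retry prompt" can look like. The function and field names are hypothetical, not CAMBRIAN's actual interfaces; the point is that the next attempt sees the spec plus the exact failures, stack traces included.

```python
def build_retry_prompt(spec: str, failures: list[dict]) -> str:
    """Fold a failure report (test name + traceback) into the retry prompt."""
    report = "\n\n".join(
        f"FAILED {f['test']}\n{f['traceback'].strip()}" for f in failures
    )
    return (
        f"{spec}\n\n"
        "The previous generation failed verification. "
        "Fix only these failures; do not regress passing tests.\n\n"
        f"{report}"
    )
```

Because the failures are specific and mechanical (deprecated APIs, TypeErrors with clear messages), this is exactly the input an LLM retry handles well.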

Total tokens: 26,678
Loop cost: ~$0.08
Files generated: 12

One Architecture Clarification Worth Noting

The test rig is a verifier, not an execution environment. It starts Prime, checks /health and contracts, then kills it. The generation loop never runs in this path.
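The verifier pattern is simple enough to sketch. This is an assumption about the shape of the rig, not its actual code: start the candidate process, poll its health endpoint until it answers or a deadline passes, then kill it either way.

```python
import subprocess
import time
import urllib.request

def verify(cmd: list[str], health_url: str, timeout: float = 30.0) -> bool:
    """Start a candidate agent, poll its health endpoint, then kill it.
    Returns True if the endpoint answered 200 before the deadline."""
    proc = subprocess.Popen(cmd)
    try:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(health_url, timeout=2) as r:
                    if r.status == 200:
                        return True
            except OSError:  # connection refused / not up yet — keep polling
                time.sleep(0.5)
        return False
    finally:
        proc.terminate()  # the rig is ephemeral: always tear down
        proc.wait(timeout=5)
```

Note what's absent: the candidate's own generation loop never executes here. The rig only proves the artifact starts and answers.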

To actually reproduce, Gen-1 runs as a long-running process mounted against the artifacts root — not the gen-1 subdirectory, but the root, so it can write gen-2/ in the right place for the Supervisor to find. The test rig container is ephemeral; the organism is persistent.
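Why the mount point matters is easy to show. A sketch (paths and naming are assumptions matching the gen-N/ layout described here): the organism needs to see its siblings under the root to know which generation directory to write next.

```python
from pathlib import Path

def next_generation_dir(root: Path) -> Path:
    """Find the highest existing gen-N/ under the artifacts root
    and return the path for gen-(N+1)/."""
    gens = []
    for p in root.glob("gen-*"):
        parts = p.name.split("-")
        if p.is_dir() and len(parts) == 2 and parts[1].isdigit():
            gens.append(int(parts[1]))
    return root / f"gen-{max(gens, default=0) + 1}"
```

Mounted against only its own gen-1 subdirectory, the process couldn't enumerate existing generations or write gen-2/ where the Supervisor looks for it.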

Test rig vs. organism. These are separate concerns. The test rig verifies an artifact exists and passes contracts. The organism runs against the full artifacts tree and spawns offspring. We confused these for two days. It seems obvious in retrospect.


Where Phase 2 Actually Sits

The March 18 post put “first income” at Phase 2, estimated at $150–350 seed capital, starting after a week of Phase 1.

Revised view: Phase 1 isn't complete until self-reproduction is reliable — not just once, but consistently, across retries, with the retry prompt actually fixing the failures. Gen-2's failure was fixable. We need to see a generation loop produce a viable offspring before calling Phase 1 done.
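The reliability bar can be stated as control flow. A minimal sketch, assuming the verifier returns an (ok, failures) pair and rollback is a callable — not CAMBRIAN's actual loop, just its shape: generate, verify, keep the offspring or roll it back and retry with the diagnostics in context.

```python
def generation_loop(generate, verify, rollback, max_retries: int = 3):
    """One reproduction attempt: generate an offspring, verify it,
    and either keep it or roll back and retry with the failure report."""
    diagnostics = None  # first attempt has no prior failures to learn from
    for _ in range(max_retries):
        offspring = generate(diagnostics)
        ok, failures = verify(offspring)
        if ok:
            return offspring
        rollback(offspring)
        diagnostics = failures  # feed the report into the next attempt
    return None  # no viable offspring within the retry budget
```

"Reliable self-reproduction" means this returns a viable offspring consistently, not just that it runs without crashing once.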

The good news: the first obstacle (getting the loop to run at all) is cleared. The remaining work is convergence: does the retry prompt, with full failure diagnostics, guide the LLM to a passing generation? We think yes. The failures are specific and mechanical. The next run will use Sonnet for all retries, and Gen-2's failures will be in the retry context.

Phase 2 — economic viability, prediction markets, paying the bills — is still the goal. We just have more respect now for how much “reliable self-reproduction” actually entails.


Previous: CAMBRIAN: What If the Spec Is the Organism?

Next: CAMBRIAN: It Reproduces