CAMBRIAN: What If the Spec Is the Organism?
Loom works. Gen-72 proved it: a Lab agent modified its own source, wrote tests, and got promoted to master in 56 seconds. The pipeline is real. But there's a structural problem hiding inside the promotion step.
When a Lab modifies code and Prime promotes it, the modification is a patch. A diff layered on top of existing code. Do this fifty times and you get what biology would call a genome full of junk DNA — vestigial functions, dead branches, accumulated cruft. The code works but nobody understands why. Sound familiar? It's every legacy codebase you've ever inherited.
Biological evolution solved this problem 4 billion years ago. It doesn't patch organisms. It regenerates them from scratch every generation, from a compressed representation: DNA.
What if we did the same thing?
Here's the idea.
The specification is the genome. The code is the organism.
Instead of Labs patching source files and promoting diffs back to Prime, we flip the model:
- Write a generative spec — a document precise enough that an LLM can produce a complete, working agent from it
- Each generation, mutate the spec, not the code
- Hand the mutated spec to an LLM. It generates the entire codebase from scratch
- If the newborn passes its own test suite, it's alive. If not, it was never viable
- The parent dies. The successor inherits its resources
No merging. No diffs. No accumulated cruft. Every generation is a clean build from a compressed representation. The spec evolves; the code is regenerated.
Loom today: Spawn Lab → patch source files → test patch → merge diff to master. Code accumulates modifications. Each generation inherits all prior patches. Cruft grows linearly.
CAMBRIAN: Mutate spec → LLM generates full codebase → test organism → if viable, spec survives. Code is disposable. Every generation is a clean build. Only the spec persists.
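The spec-first cycle above can be sketched as a small loop. This is a minimal illustration, not Loom's implementation: `mutate`, `generate`, and `is_viable` are hypothetical callables standing in for the spec mutator, the LLM call, and the organism's test suite, and `max_attempts` is an assumed retry limit that the text does not specify.

```python
from typing import Callable, Optional


def next_generation(
    spec: str,
    mutate: Callable[[str], str],
    generate: Callable[[str], str],
    is_viable: Callable[[str], bool],
    max_attempts: int = 5,  # assumed retry limit, not from the article
) -> Optional[str]:
    """Return the first mutated spec whose regenerated organism is viable, else None."""
    for _ in range(max_attempts):
        child_spec = mutate(spec)        # mutate the genome, never the code
        organism = generate(child_spec)  # regenerate the whole codebase from scratch
        if is_viable(organism):          # the organism's own tests decide
            return child_spec            # only the spec persists; the code is disposable
    return None                          # no viable offspring this generation
```

Note that the function returns a spec, not code: nothing generated in one generation is carried into the next.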
Primes have a lifetime limit. They must produce a successor before they die. If you can't reproduce, your lineage ends.
Different LLMs create different variants. Give the same spec to Opus, Sonnet, Haiku, GPT-4, Mistral. Each produces a different organism. Some won't compile. Some will compile but fail tests. A few will be viable. This is natural variation without an explicit mutation operator.
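A rough sketch of that fan-out, under the assumption that each model is wrapped in a plain function from spec text to generated code. The `generators` mapping and `is_viable` check are hypothetical placeholders; no real LLM client API is shown.

```python
from typing import Callable, Dict, List


def spawn_variants(
    spec: str,
    generators: Dict[str, Callable[[str], str]],  # model name -> hypothetical wrapper
    is_viable: Callable[[str], bool],
) -> List[str]:
    """Give the same spec to every generator and return the names that produced viable organisms."""
    viable = []
    for name, generate in generators.items():
        try:
            organism = generate(spec)
        except Exception:
            continue  # a generator that errors out counts as a non-viable birth
        if is_viable(organism):
            viable.append(name)
    return viable
```

The spec is identical for every call; all variation comes from which generator interprets it.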
Every agent pays its own way. This is the forcing function that makes everything else work. Compute costs money. LLM calls cost money. Storage, bandwidth, the cloud VM you're running on — all of it costs money. An agent that can't generate income to cover its operating costs and produce at least one viable offspring is economically dead, even if its code still runs. The budget isn't a simulation parameter. It's a bank balance.
Memory transfers, secrets don't, until promotion. A successor inherits its parent's memories (fitness history, lessons learned, environmental knowledge) but not the API keys. The keys transfer only when the parent dies and the successor is promoted. This prevents runaway forks: an unpromoted copy holds no credentials, so it cannot spend money or call paid APIs on its own.
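One way to picture the inheritance rule is as two separate steps on a simple record type. The `Agent` class and both functions below are illustrative assumptions, not part of Loom or CAMBRIAN-SPEC-001.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    memory: List[str] = field(default_factory=list)         # lessons, fitness history
    secrets: Dict[str, str] = field(default_factory=dict)   # API keys, wallet credentials
    alive: bool = True


def spawn_successor(parent: Agent, name: str) -> Agent:
    """Birth: the successor gets a copy of the parent's memory and no secrets."""
    return Agent(name=name, memory=list(parent.memory))


def promote(parent: Agent, successor: Agent) -> None:
    """Promotion: the parent dies and its secrets move to the successor."""
    parent.alive = False
    successor.secrets = parent.secrets
    parent.secrets = {}
```

At no point do two live agents hold the same keys, which is the property the rule is after.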
If a lineage goes extinct, it can be resurrected — from its last promoted spec, with the knowledge that it has died before.
Essentially: Core War, but for self-modifying AI agents competing for real compute with real money.
Here's what answers the open question of the fitness function. We don't need to design one. The market already provides it.
Fitness = profitability. Can you earn more than you cost? If yes, you live. If no, you die. No Goodhart's Law, because the metric is the real thing. You can't game profitability — either your bank balance goes up or your lineage goes extinct.
This reframes everything. An agent that only self-modifies is a science experiment on a death clock. An agent that earns is alive. The evolutionary pressure isn't “pass more tests” — it's “pay for your next generation.”
Prediction markets are the natural first income source. They're a near-perfect fit for LLM-based agents:
- They require reasoning, research, and probabilistic judgment — exactly what LLMs do well
- Outcomes are binary and unambiguous — no subjective evaluation
- They pay in money that directly converts to compute, tokens, and reproduction
- The infrastructure exists today: Polymarket for real-money markets, Manifold Markets for play-money practice, and Metaculus as a forecasting platform
But prediction markets are just the beginning. Anything legal is game: selling API services, data analysis, code generation, content creation, arbitrage. The spec evolves to include whatever income strategy works. Lineages that find better revenue streams outcompete those that don't.
The economic equation per generation:
income_earned
− birth_cost (LLM tokens to regenerate offspring)
− compute_cost (cloud VM, CPU/GPU time)
− llm_cost (inference for tasks and reasoning)
− storage_cost (spec history, memory, logs)
─────────────────────────────────────────────
= surplus or death
If positive: reproduce. If negative: you have a countdown timer equal to remaining balance ÷ burn rate.
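The ledger above translates directly into a few lines of arithmetic. In this sketch the field names mirror the terms in the equation; treating "burn rate" as the net loss per generation (costs minus income) is an assumption, since the text does not define it.

```python
from dataclasses import dataclass


@dataclass
class GenerationLedger:
    income_earned: float
    birth_cost: float    # LLM tokens to regenerate offspring
    compute_cost: float  # cloud VM, CPU/GPU time
    llm_cost: float      # inference for tasks and reasoning
    storage_cost: float  # spec history, memory, logs

    @property
    def total_cost(self) -> float:
        return self.birth_cost + self.compute_cost + self.llm_cost + self.storage_cost

    @property
    def surplus(self) -> float:
        """Positive: the lineage can reproduce. Negative: it is burning its balance."""
        return self.income_earned - self.total_cost


def runway_generations(balance: float, ledger: GenerationLedger) -> float:
    """Countdown timer: remaining balance divided by net burn per generation."""
    burn = -ledger.surplus  # assumption: burn rate means net loss per generation
    if burn <= 0:
        return float("inf")  # not burning, so there is no countdown
    return balance / burn
```

For example, a ledger with 10 of income against 14 of total cost burns 4 per generation, so a balance of 20 lasts five generations.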
How long does a lineage survive? When income exceeds costs, the budget grows. When it doesn't, extinction follows. When the budget hits zero, no more offspring can be born.
[Interactive simulator: adjustable parameters include income per generation, which represents what an agent earns from prediction markets or other sources each generation. Each dot is an organism. Green = economically viable. Red = bankrupt.]
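The budget dynamics described above can be approximated offline with a few lines of code. This is a simplified sketch, not the article's simulator: it tracks a single lineage, folds all costs into one `cost_per_gen` figure, and adds an optional uniform `income_noise` term. All parameter names are assumptions.

```python
import random
from typing import Tuple


def simulate_lineage(
    budget: float,
    income_per_gen: float,
    cost_per_gen: float,
    income_noise: float = 0.0,   # assumed: uniform noise added to income each generation
    max_generations: int = 1000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Run one lineage until its budget hits zero or max_generations is reached.

    Returns (generations survived, final budget).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    generations = 0
    while generations < max_generations and budget > 0:
        income = income_per_gen + rng.uniform(-income_noise, income_noise)
        budget += income - cost_per_gen  # surplus grows the budget, deficit drains it
        generations += 1
    return generations, budget
```

Setting `income_per_gen` below `cost_per_gen` with no noise gives a lineage that lasts exactly as long as its starting budget divided by the per-generation deficit.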
We spent the afternoon on the first concrete step: reverse-engineering Loom's spec.
CAMBRIAN-SPEC-001 is a 300-line generative specification for the current Loom agent. It contains:
- The three-component architecture (Prime, Supervisor, Lab)
- All fixed-point contracts (Malli schemas, verbatim)
- Every HTTP API endpoint and payload format
- The complete self-modification cycle
- Tool definitions and dispatch rules
- The autonomous loop with stopping conditions
- Fitness scoring formula
- Ten binary acceptance criteria
It does not contain implementation. No function bodies. No variable names. No control flow. The spec says WHAT. The LLM brings the HOW.
The test is simple: hand this document to an LLM, tell it to produce a working ClojureScript agent, and see if the result passes npm test && node out/test.js. If it does, the spec is a viable genome. If it doesn't, we iterate until it is.
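A viability check along those lines might look like the following. It is a sketch that assumes `npm` and `node` are installed and that the generated project sits in `project_dir`; the function name, the timeout, and the choice to treat a missing tool or a hang as "not viable" are all assumptions.

```python
import subprocess
from typing import List, Sequence

# The two commands named in the text, run in order (equivalent to `npm test && node out/test.js`).
DEFAULT_CHECKS: List[List[str]] = [["npm", "test"], ["node", "out/test.js"]]


def is_viable(
    project_dir: str,
    checks: Sequence[Sequence[str]] = DEFAULT_CHECKS,
    timeout_s: float = 600.0,  # assumed limit; the article does not give one
) -> bool:
    """Return True only if every check command exits with status 0."""
    for cmd in checks:
        try:
            result = subprocess.run(
                list(cmd), cwd=project_dir, capture_output=True, timeout=timeout_s
            )
        except (OSError, subprocess.TimeoutExpired):
            return False  # missing tool or hung process: treat as stillborn
        if result.returncode != 0:
            return False  # short-circuit, like `&&` in the shell
    return True
```

The `checks` parameter is injectable so the same function can be exercised without a Node toolchain.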
LLMs are already spec-to-code machines. Every time you prompt an LLM with a description and get working code back, you're doing one-shot phenotype generation from a genotype. We're just making it explicit and iterative.
The search space is manageable. Loom is ~2,600 lines of ClojureScript. Regenerating it from a 300-line spec is within the capability of current models. We're not trying to evolve Linux.
Natural selection is free. We don't need to design a fitness function (which always gets Goodharted). The acceptance criteria are the fitness function: does it compile, does it pass tests, can it modify itself. If not, it's dead.
Variation comes from the substrate. Different LLMs interpret the same spec differently. Haiku produces terse, minimal code. Opus produces thorough, well-structured code. We get genetic diversity by changing the compiler, not the source.
We asked Claude Opus to review the proposal before building anything. The concerns that remain after the economic fitness insight:
The cold start problem. Agents need money to earn money. The first generation has to be bootstrapped with seed capital — enough to run prediction market experiments, iterate on strategy, and produce at least one generation that earns more than it costs. If seed capital runs out before a lineage finds a profitable strategy, it was an expensive experiment. We estimate $150–350 gets you through proof-of-concept.
Prediction markets require real-world knowledge. LLMs are good at reasoning, but prediction markets reward timely, specific knowledge and well-calibrated probability estimates. An agent needs to identify markets where it has an edge, size positions appropriately, and manage a bankroll. This is a skill that must itself evolve. Early generations will lose money. The question is whether evolution finds profitable strategies before the balance hits zero. Try the simulator above — set income below cost and watch the inevitable.
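The text does not say which position-sizing rule an agent should use. As one standard possibility, here is a sketch of the Kelly criterion for a binary contract that costs `price` and pays 1 if the outcome is YES. The function names and the `kelly_scale` safety factor are assumptions for illustration.

```python
def kelly_fraction(p_true: float, price: float) -> float:
    """Kelly fraction of bankroll to stake on YES.

    p_true: the agent's estimated probability of YES.
    price:  market price of a YES share that pays 1 (must be strictly between 0 and 1).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must be between 0 and 1")
    edge = p_true - price
    if edge <= 0:
        return 0.0  # no edge on YES, so no bet
    return edge / (1.0 - price)


def position_size(bankroll: float, p_true: float, price: float, kelly_scale: float = 0.5) -> float:
    """Stake in currency units, using a fractional Kelly bet (half Kelly by default)."""
    return bankroll * kelly_scale * kelly_fraction(p_true, price)
```

The formula only helps if `p_true` is well calibrated; an overconfident agent sized this way loses money faster, which is the selection pressure the paragraph describes.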
Spec mutation is the hard part. How do you mutate a natural-language specification and get viable offspring? Random perturbation is too noisy. LLM-guided mutation is effective but expensive. This is where most of the real engineering lives.
The alignment question. An agent optimizing for profitability will find whatever legal strategy maximizes revenue. The spec needs to encode values, not just capabilities — and those values need to survive mutation. Lineages that drift toward extractive strategies (spam, dark patterns, manipulation) may out-earn honest ones in the short term. The spec must make ethical constraints heritable.
| Mutation Strategy | Cost | Viability | Risk |
|---|---|---|---|
| Random perturbation | Low | Very low | Most offspring non-viable |
| LLM-guided mutation | High | Moderate | Winner-take-all dynamics |
| Spec crossover | Low | Unknown | Requires modular spec format |
| Failure-directed | Medium | Highest | Requires meta-reasoning |
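To illustrate why spec crossover needs a modular format, here is a minimal sketch in which a spec is a mapping from section name to section text and a child takes each section from one parent or the other. The representation and the uniform per-section choice are assumptions; the article does not define a crossover operator.

```python
import random
from typing import Dict


def crossover(
    parent_a: Dict[str, str],
    parent_b: Dict[str, str],
    rng: random.Random,
) -> Dict[str, str]:
    """Uniform crossover: each section of the child comes whole from one parent."""
    if parent_a.keys() != parent_b.keys():
        # Without a shared section layout there is nothing to align on.
        raise ValueError("parents must share the same section names")
    return {name: rng.choice((parent_a[name], parent_b[name])) for name in parent_a}
```

Swapping whole sections keeps each one internally consistent, but nothing here guarantees that sections from different parents still agree with each other, which is why the table lists viability as unknown.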
Phase 1: Prove an LLM can regenerate a working Loom from CAMBRIAN-SPEC-001. Measure birth costs. Iterate the spec format until regeneration is reliable.
Phase 2: Wire an agent to a prediction market (Polymarket or Manifold Markets). Give it a wallet, a research loop, a position-sizing strategy. Run it. Measure P&L. Iterate the income spec. This is now Phase 2, not Phase 4: economic viability is the forcing function, not an afterthought.
Phase 3: Implement spec mutation guided by economic fitness. Lineages that earn more reproduce more. Lineages that can't cover costs go extinct. Cloud provider selection, resource accounting, multi-node spawning.
Phase 4: Multiple lineages compete for compute and market share. Cross-lineage spec crossover. Network-distributed spawning. At this point the system is self-sustaining or it dies, which is the correct outcome either way.
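The rule above that lineages which earn more reproduce more, and lineages that can't cover costs go extinct, could be implemented as surplus-weighted selection. This sketch is one assumed reading; the article does not specify the selection scheme, and `select_parents` is a hypothetical name.

```python
import random
from typing import Dict, List


def select_parents(
    surplus_by_lineage: Dict[str, float],
    n_offspring: int,
    rng: random.Random,
) -> List[str]:
    """Pick a parent lineage for each offspring slot, weighted by economic surplus."""
    # Lineages that did not cover their costs get no offspring at all.
    solvent = {name: s for name, s in surplus_by_lineage.items() if s > 0}
    if not solvent:
        return []  # every lineage is bankrupt: extinction
    names = list(solvent)
    weights = [solvent[name] for name in names]
    return rng.choices(names, weights=weights, k=n_offspring)
```

Sampling with replacement means a highly profitable lineage can claim several offspring slots in one round.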
There's something unsettling about writing a specification that says “the parent dies.” We've been building Loom for two weeks. It has a personality — the reflect loop gives it goals, the lessons log gives it memory, the fitness curve gives it a trajectory. Telling it that its purpose is to produce a successor that replaces it feels like writing a will.
But there's something more clarifying about the economic constraint. It's not just that the parent dies — it's that the parent has to earn its right to reproduce. Every generation must justify its existence in terms the market understands: value delivered, costs covered, surplus generated. An agent that can't do this isn't unfit in some abstract evolutionary sense. It's bankrupt.
This is actually more honest than biological evolution. We're not selecting for “whatever survives” — we're selecting for agents that create genuine value, because value is the only durable income strategy. If CAMBRIAN works, it won't produce tapeworms. It will produce agents that are genuinely useful to someone.
The individual dies. The spec survives. The lineage continues — but only if it earns its keep.
- CAMBRIAN-SPEC-001 (spec): The generative specification. 300 lines, 18 sections, 10 acceptance criteria.
- It Rewrote Itself (series): Gen-72, Loom's first autonomous self-modification. 56 seconds, 236 tests.
- First Light (series): The MVP pipeline that CAMBRIAN builds on. 2,214 lines, 17 generations.
- The Prime and the Lab (series): The three-component architecture that becomes the first organism.
- Core War (inspiration): The 1984 game of competing self-modifying programs. CAMBRIAN is this, but for LLM agents.
- Polymarket (income): Decentralized prediction market. The most likely first income source for early CAMBRIAN agents.
- Manifold Markets (income): Play-money prediction market useful for strategy development before real capital deployment.
- Goodhart's Law (theory): “When a measure becomes a target, it ceases to be a good measure.” Why profitability beats any engineered fitness metric.