The Prime and the Lab: Recursive Self-Improvement for Coding Agents
The previous post proposed Lean as the verification layer for self-modifying agents, with memory as the beachhead organ. Then ByteRover shipped a production-ready memory system — hierarchical context trees, automatic curation, daily knowledge mining — and the entire Lean+QMD approach became unnecessary.
Good. That failure clarified what we're actually trying to build. Not a verified memory system. Not a formal proof engine. Something simpler and more dangerous: an agent that can rewrite its own code, test the rewrite in isolation, and promote it if it works.
This post is the blueprint.
We considered two languages for self-modifying agent code: ClojureScript and Elixir. The earlier posts made a strong case for the BEAM — supervision trees, hot code swapping, process isolation. But those properties solve the wrong problem. We don't need a runtime that catches crashes. We need a language where code is data the agent can manipulate.
ClojureScript is a Lisp. Code is represented as data structures — lists, vectors, maps — the same structures the agent reasons about. The S-expressions post argued that LLMs should think in homoiconic representations. ClojureScript is that representation.
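To make "code is data" concrete, here is a minimal sketch of rewriting a function definition as a data structure rather than as text. The function `retry-delay`, the helper `replace-symbol`, and the namespace name are hypothetical illustrations, not part of the agent described in this post.

```clojure
(ns example.code-as-data
  (:require [clojure.walk :as walk]))

;; A function definition held as plain data: a quoted list containing
;; symbols, a vector, and a nested list. Nothing is compiled yet.
(def original-form
  '(defn retry-delay [attempt] (* 100 attempt)))

(defn replace-symbol
  "Return form with every occurrence of the symbol target swapped for replacement."
  [form target replacement]
  (walk/postwalk (fn [node] (if (= node target) replacement node)) form))

;; Swap multiplication for addition without touching any source text.
(def modified-form
  (replace-symbol original-form '* '+))

;; The function body is reachable with ordinary sequence functions.
(def modified-body (last modified-form))
```

Printing `modified-form` with `pr-str` yields a string that could then be handed to an evaluator; the original form is left untouched because the data structures are immutable.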
The self-hosted compiler (cljs.js) seals it. A ClojureScript program can compile and evaluate ClojureScript at runtime without a build step, without a JVM, without anything beyond Node.js. The agent can eval-str a modified function and watch what happens. No compilation pipeline. No deploy cycle. Just data in, behavior out.
And ClojureScript interoperates with JavaScript natively. OpenClaw is 57,000 lines of TypeScript. No bridge protocol needed — ClojureScript calls JS functions directly. This is the difference between growing an organ inside the patient and transplanting one from another species.
| Property | ClojureScript | Elixir |
|---|---|---|
| Code is data | Yes (homoiconic) | No |
| Runtime eval | cljs.js/eval-str | Code.compile_string |
| OpenClaw integration | Native JS interop | Bridge required |
| Isolation | None built-in | Process-level |
| Rollback | None built-in | Supervisor trees |
| Build step | None (self-hosted) | Mix release |
ClojureScript's weakness is obvious: no isolation, no rollback, no supervision. A bad eval-str corrupts the runtime. That's why the agent doesn't modify itself in place. It modifies a copy. In a container.
Three components, all in ClojureScript on Node.js, plus the container substrate they run on:
- Prime: eval client + HTTP dashboard. Tools: read, write, edit, bash, self-modify.
- Lab: form evaluation server. Ephemeral, created per experiment.
- Supervisor: container lifecycle (create, start, stop, destroy). Version management + HTTP dashboard.
- Apple Containerization: OCI images + container networking.
The Prime is the live agent — the known-good version that talks to Claude, executes tools, and does real work. When it proposes a modification to its own code, it doesn't apply it. It sends the proposal to the Supervisor.
The Supervisor runs on the host. Its only job is container lifecycle: it takes the Prime's source code, applies the proposed modification, and boots a Lab Container with the result. The Supervisor doesn't evaluate code. It doesn't talk to Claude. It creates, starts, stops, and destroys containers.
The Lab is ephemeral. It boots the modified agent code and exposes a form evaluation server — a lightweight TCP service that accepts ClojureScript forms as EDN, runs cljs.js/eval-str, and returns results. The Prime connects to this server and probes the modified code: call functions, inspect state, run test scenarios. The Lab doesn't know it's being tested. It's just running.
If the Prime is satisfied, it tells the Supervisor to promote: the Lab's source becomes the new Prime, the old Prime is archived, and the system restarts with the improved version. If the Prime isn't satisfied, the Supervisor destroys the Lab. Nothing changes.
Every self-modification follows the same six steps.
The cycle is the autoresearch pattern made concrete. The three separations hold: the judge (Prime probing the Lab) is separate from the subject (Lab running modified code), which is separate from the process (Supervisor managing containers). The keep/revert boundary is a container wall.
The Lab doesn't run a full nREPL. It runs something simpler: a TCP server that accepts EDN forms and evaluates them. About 50 lines of ClojureScript.
;; Lightweight form evaluation server.
;; The Lab container exposes this on a known port.
(ns lab.eval-server
  (:require [cljs.js :as cljs]
            [clojure.string :as str]
            ["net" :as net]))

;; Compiler state shared by every evaluation in this Lab.
(def compiler-state (cljs/empty-state))

;; Evaluate form-str and pass the result to callback as an EDN string.
(defn eval-form [form-str callback]
  (cljs/eval-str
   compiler-state
   form-str
   "lab-eval"
   {:eval cljs/js-eval :context :expr}
   (fn [{:keys [value error]}]
     (callback (pr-str (if error
                         {:status :error :message (.-message error)}
                         {:status :ok :value value}))))))

(defn start-server [port]
  (let [server (.createServer net
                 (fn [socket]
                   ;; Buffer incoming chunks until a newline ends the request.
                   (let [buf (atom "")]
                     (.on socket "data"
                       (fn [data]
                         (swap! buf str (.toString data))
                         (when (.endsWith @buf "\n")
                           (eval-form (str/trim @buf)
                             (fn [result]
                               (.write socket (str result "\n"))
                               (reset! buf "")))))))))]
    (.listen server port
      (fn [] (println "Eval server on port" port)))))
The Prime connects as a TCP client. It sends a form like (my.agent/handle-tool-call "bash" {:cmd "ls"}); the Lab evaluates it and returns the result as EDN. The Prime can probe any function, inspect any var, and test any behavior, all through this single interface.
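On the Prime's side, the wire format reduces to two pure steps: print a form as one newline-terminated line, and read the reply line back as EDN. The sketch below assumes that framing; the names `frame-form`, `parse-reply`, and `probe-ok?` are hypothetical helpers, and the socket handling itself is left out.

```clojure
(ns example.probe-framing
  (:require [cljs.reader :as reader]
            [clojure.string :as str]))

(defn frame-form
  "Serialize a form as one newline-terminated line for the eval server."
  [form]
  (str (pr-str form) "\n"))

(defn parse-reply
  "Parse one reply line from the Lab into a ClojureScript map."
  [line]
  (reader/read-string (str/trim line)))

(defn probe-ok?
  "True when the Lab reported a successful evaluation."
  [reply]
  (= :ok (:status reply)))

;; Example round trip with a hand-written reply line.
(def request-line (frame-form '(+ 1 2)))
(def reply (parse-reply "{:status :ok, :value 3}\n"))
```

One caveat worth keeping in mind: a reply only parses if the evaluated value prints as readable EDN. Functions and arbitrary JavaScript objects do not, so a real probe client would likely need a fallback for unreadable replies.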
Malli validates the protocol. Every request and response conforms to a schema. The schemas themselves are EDN — data the agent can inspect and, eventually, modify.
;; Malli schemas for the eval protocol
(def EvalRequest
  [:map
   [:form :string]
   [:timeout {:optional true} :int]])

(def EvalResponse
  [:map
   [:status [:enum :ok :error]]
   [:value {:optional true} :any]
   [:message {:optional true} :string]])
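As a sketch of how these schemas might be enforced at the boundary, the block below repeats them and checks a few messages with Malli's `m/validate`. The predicate names `valid-request?` and `valid-response?` are assumptions for illustration; the post does not say where validation is wired in.

```clojure
(ns example.protocol-check
  (:require [malli.core :as m]))

;; Same schemas as in the eval protocol, repeated so the block stands alone.
(def EvalRequest
  [:map
   [:form :string]
   [:timeout {:optional true} :int]])

(def EvalResponse
  [:map
   [:status [:enum :ok :error]]
   [:value {:optional true} :any]
   [:message {:optional true} :string]])

(defn valid-request?
  "True when req conforms to EvalRequest."
  [req]
  (m/validate EvalRequest req))

(defn valid-response?
  "True when resp conforms to EvalResponse."
  [resp]
  (m/validate EvalResponse resp))
```

Malli map schemas are open by default, so extra keys pass validation. If the contract is meant to be strict, marking the maps `{:closed true}` would be one option.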
The agent has five tools. Four are borrowed from Mario Zechner's Pi coding agent, which proved that four tools and a 1,000-token system prompt can match elaborate frameworks on Terminal-Bench. The fifth is ours.
| Tool | What it does |
|---|---|
| read-file | Read a file's contents |
| write-file | Write content to a file (creates or overwrites) |
| edit-file | Replace a string in a file |
| bash | Execute a shell command |
| self-modify | Propose a modification to the agent's own source |
The self-modify tool takes a description of the change (what to modify and why), generates the modified .cljs source files, and POSTs a proposal to the Supervisor. The Supervisor creates a Lab container, and the modification cycle begins.
No MCP. Pi's lesson: a 21-tool MCP server consumes 13,700 tokens before the agent does anything. Five tools, described in the system prompt, are enough.
Apple Containerization runs each container as a dedicated lightweight VM via the Virtualization framework. Not shared-kernel containers — full VM isolation with sub-second boot times. A bad modification can't escape. A malicious eval-str can't touch the host.
This is the safety layer that ClojureScript lacks. Instead of building isolation into the language (Elixir's approach) or verification into the type system (Lean's approach), we put the entire experiment in a box. The box is disposable. If the experiment fails, we throw away the box.
The Supervisor manages the box lifecycle. It maintains a versions/ directory:
versions/
  v001/              ← initial agent source
    src/
      agent/
        core.cljs
        tools.cljs
        loop.cljs
  v002/              ← first successful modification
    src/
      agent/
        core.cljs
        tools.cljs
        loop.cljs    ← modified
  v003/              ← second modification
    ...
Promotion copies the Lab's source to versions/N+1/ and restarts the Prime pointing at the new version. Revert restarts Prime pointing at versions/N. Every version is preserved. The full history of the agent's self-modification is on disk.
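The version bookkeeping can be sketched as three pure functions over directory names. This assumes the zero-padded `vNNN` naming shown in the tree; `version-name`, `parse-version`, and `next-version` are hypothetical helpers, and the actual file copying is omitted.

```clojure
(ns example.versions)

(defn version-name
  "Format a version number as a directory name, zero-padded to three digits."
  [n]
  (let [digits  (str n)
        padding (apply str (repeat (max 0 (- 3 (count digits))) "0"))]
    (str "v" padding digits)))

(defn parse-version
  "Extract the number from a name such as v002; nil when the name does not match."
  [dir-name]
  (when-let [[_ digits] (re-matches #"v(\d+)" dir-name)]
    (js/parseInt digits 10)))

(defn next-version
  "Given the existing directory names, return the name a promotion would create."
  [dir-names]
  (let [latest (reduce max 0 (keep parse-version dir-names))]
    (version-name (inc latest))))
```

Entries that do not look like version directories are ignored, so stray files in `versions/` would not break numbering in this sketch.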
Both the Prime and Supervisor expose HTTP endpoints. You open a browser and see what the agent is doing.
| Endpoint | Component | Returns |
|---|---|---|
| GET / | Both | Dashboard (single HTML page) |
| GET /logs | Both | SSE stream of events |
| GET /stats | Both | JSON: counts, uptime, current version |
| POST /chat | Prime | Send a message to the agent, SSE response |
| GET /lab/repl | Supervisor | Proxied view of the Lab's eval session |
| GET /versions | Supervisor | JSON: version history with diffs |
The user interacts with the agent through the Prime's /chat endpoint — a text input in the dashboard that POSTs messages and streams responses via SSE. The same HTTP infrastructure serves both the dashboard and the API. No separate frontend. No WebSocket complexity. Just node:http and Server-Sent Events.
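Server-Sent Events need very little machinery, which is part of the appeal. The sketch below formats one event and writes it to a node:http response object. The helper names and the choice of EDN as the payload encoding are assumptions; the post does not specify how log events are serialized.

```clojure
(ns example.sse)

(defn sse-event
  "Format one Server-Sent Event: an event line, a data line, and a blank line."
  [event-name payload]
  (str "event: " event-name "\n"
       "data: " (pr-str payload) "\n\n"))

(defn start-sse!
  "Send the response headers that open an SSE stream on a node:http response."
  [res]
  (.writeHead res 200 (clj->js {"Content-Type"  "text/event-stream"
                                "Cache-Control" "no-cache"
                                "Connection"    "keep-alive"})))

(defn send-event!
  "Write one formatted event to an open SSE response."
  [res event-name payload]
  (.write res (sse-event event-name payload)))
```

A handler for /logs would call `start-sse!` once per connection and then `send-event!` for each tool execution, proposal, or probe as it happens.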
Pi's other lesson applies here: observability over automation. Full visibility into the agent's decisions matters more than convenience features. The dashboard shows every Claude API call, every tool execution, every self-modification proposal, and every probe the Prime sends to the Lab. If the agent does something wrong, you can see exactly why.
Communication between components follows a fixed schema, enforced by Malli. The Wezzard agentic loop demonstrated that contract-driven prompts turn free-form LLM output into predictable, machine-readable communication. We apply the same principle at every boundary.
;; Modification proposal: Prime → Supervisor
(def Proposal
  [:map
   [:id :string]
   [:description :string]
   [:files [:vector
            [:map
             [:path :string]
             [:content :string]]]]
   [:rationale :string]
   [:parent-version :int]])

;; Probe result: Lab → Prime (via eval server)
(def ProbeResult
  [:map
   [:status [:enum :ok :error :timeout]]
   [:value {:optional true} :any]
   [:elapsed-ms :int]])

;; Verdict: Prime → Supervisor
(def Verdict
  [:map
   [:proposal-id :string]
   [:decision [:enum :promote :revert]]
   [:evidence [:vector ProbeResult]]
   [:reasoning :string]])
The contract is the fixed point. The agent can modify its tools, its loop, its system prompt, its probe strategy — but not the contract schemas. This is the autoresearch pattern's "well-chosen fixed point": the communication protocol is the one thing that doesn't change while everything else can.
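One way the fixed point could be enforced is a check on the Supervisor side that rejects any proposal whose files include the schema definitions. This is a sketch under assumptions: the post does not name the file that holds the schemas, so `src/shared/schemas.cljs` is a hypothetical path, as are the function names.

```clojure
(ns example.contract-guard)

;; Hypothetical location of the contract schemas; the post does not name it.
(def protected-paths #{"src/shared/schemas.cljs"})

(defn touches-contract?
  "True when any file in the proposal targets a protected path."
  [proposal]
  (boolean (some protected-paths (map :path (:files proposal)))))

(defn check-proposal
  "Accept a proposal only if it leaves the contract schemas alone."
  [proposal]
  (if (touches-contract? proposal)
    {:accepted false :reason "proposal modifies the contract schemas"}
    {:accepted true}))
```

The comparison here is an exact string match, so a real guard would probably normalize paths first; otherwise a path such as `./src/shared/schemas.cljs` would slip through.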
Everything except the contract. The entire agent source is the modification surface:
- The agentic loop — how it processes messages and dispatches tools
- The system prompt — what instructions it gives Claude
- Tool implementations — how read, write, edit, bash, self-modify work
- The probe strategy — how it tests modifications in the Lab
- New tools — the agent can add tools that don't exist yet
- The eval server — how the Lab accepts and processes forms
This is full self-modification. The agent can change how it thinks, how it acts, and how it judges its own changes. The only constraints are the Malli schemas (the communication protocol) and the container boundary (the isolation layer).
The risk is obvious: the agent could modify its probe strategy to always return "promote", and every subsequent modification would then be accepted without testing. This is the alignment problem at small scale. For v0, the mitigation is simple: the Supervisor logs every proposal, every probe, and every verdict. A human can review the modification history and revert to any previous version. The agent improves itself; the human supervises the improvement.
Honest failure modes, same as the Lean post.
cljs.js compiles at runtime. Boot time for a non-trivial agent could be 5–10 seconds. If the modification cycle takes 15 seconds end-to-end, the agent can only try 4 modifications per minute. That might be too slow to find useful improvements before the context window fills up.
The container CLI is under active development with breaking changes between minor versions. The networking, volume mounting, and lifecycle APIs may change. UTM is a fallback (scriptable via utmctl and AppleScript) but heavier.
What ships first. Six components, all ClojureScript, no external dependencies beyond Node.js and the container CLI.
| Component | Description | Runs on |
|---|---|---|
| Shared library | Malli schemas, eval protocol, HTTP helpers | All |
| Supervisor | Container lifecycle, version management, dashboard | Host |
| Agent loop | Claude API, 5 tools, agentic loop | Prime |
| Eval server | TCP form evaluation, ~50 lines | Lab |
| OCI image | Node.js + self-hosted ClojureScript runtime | Containers |
| Dashboard | Single HTML page, SSE logs, stats, chat | Served by Supervisor & Prime |
What's explicitly not in v0: streaming from Claude, multi-provider support, MCP, ByteRover integration, TUI, sub-agents, formal verification, autonomous self-modification triggers. The agent modifies itself when the user asks it to. Autonomous improvement is v1.
The system starts with supervisor start. The Supervisor creates the Prime container from the initial source, injects the Claude API key as an environment variable, and boots it. The user opens the dashboard in a browser and talks to the agent via /chat. The agent does work. When the user (or eventually the agent itself) decides something should be improved, the modification cycle runs.
One cycle. One successful promotion. That's the proof of concept. Everything after is iteration.
The autoresearch pattern identified three requirements for recursive self-improvement: separated judge and subject, a keep/revert loop, and well-chosen fixed points. This architecture has all three.
The judge (Prime) is separated from the subject (Lab) by a container boundary — not just a process boundary, but a full VM. The keep/revert loop is the modification cycle: promote or destroy. The fixed points are the Malli schemas and the container interface. Everything else is mutable.
ClojureScript adds something the previous proposals didn't have: the agent manipulates code as data. It doesn't generate source text and hope it parses. It constructs syntax trees directly. This is the insight from the S-expressions post, finally applied to the agent's own code.
And containers give us the one thing we kept asking for across six posts: a way to say "this modification is safe to try." Not safe because a type checker proved it. Not safe because a supervisor will restart it. Safe because it runs in a box we can throw away.
The thesis: homoiconic code + VM-level isolation + a keep/revert loop = the minimum viable substrate for recursive self-improvement. Everything we explored before — BEAM, Lean, formal verification — was looking for safety in the language. The safety is in the container.
- ByteRover: Long-term memory infrastructure for AI agents. Replaced the Lean+QMD approach for OpenClaw's memory. [memory]
- ByteRover OpenClaw integration: Automatic memory flush, daily knowledge mining, context enrichment for OpenClaw agents. [memory]
- Apple Containerization: Lightweight Linux containers as VMs on macOS 26. Per-container kernel isolation via Virtualization.framework. [containers]
- UTM: Scriptable VM management for macOS via the utmctl CLI and AppleScript. Fallback if Apple Containerization is unstable. [containers]
- Pi Coding Agent (Mario Zechner): Radical minimalism with 4 tools, a <1000-token prompt, no MCP, no sub-agents. Competitive on Terminal-Bench 2.0. [agent design]
- Build Your First 24/7 Agentic Loop (Wezzard): Contract-driven evaluator/executor loop using Claude Code subagents. The "feedstock" framing for continuous operation. [agent design]
- Malli (Metosin): Data-driven schema library for Clojure/ClojureScript. Schemas as EDN data: validation, coercion, serialization, evolution. [serialization]
- ClojureScript Self-Hosting Guide: The cljs.js namespace for runtime compilation and evaluation without JVM or build tooling. [language]
Build it. The sequence:
- OCI image. Node.js + self-hosted ClojureScript. Verify cljs.js/eval-str works in a container. Measure boot time.
- Eval server. TCP server, 50 lines. Verify round-trip: send form, get result.
- Supervisor. Shell out to the container CLI. Create, start, stop, destroy. Version directory management.
- Agent loop. Claude API client, 5 tools, main loop. Get it answering questions in the Prime container.
- Self-modify tool. Generate a proposal, send to Supervisor, probe the Lab, judge the result.
- First modification. The agent changes one of its own functions, tests it, promotes it. That's the proof.
Each step is a falsifiable experiment. If self-hosted ClojureScript is too slow in a container, we switch to pre-compiled. If Apple Containerization is too unstable, we fall back to UTM. If the agent can't generate useful modifications, we constrain the surface. A working prototype is realistic in two to three weeks.
The bet: one successful self-modification — an agent that rewrites one of its own functions, tests the result in a Lab, and promotes the improvement — is worth more than this entire series. Ship it.