The Prime and the Lab: Recursive Self-Improvement for Coding Agents

Agent Architecture · March 12, 2026

Part 7 of a series. See: Making OpenClaw Self-Aware, Runtime Self-Modification, Porting to Elixir, Loom, The Autoresearch Pattern, and Loom in Lean. Next: The program.md Protocol.
We abandoned Lean. We abandoned the BEAM. ByteRover solved memory. What's left is the real question: can an agent rewrite its own code, test the result in isolation, and keep only what works? This post is both the argument and the spec. A ClojureScript agent running in a container, modifying itself, and proving the modification in a second container before promoting it. No frameworks. No build step. Just a loop, a contract, and two containers.

The previous post proposed Lean as the verification layer for self-modifying agents, with memory as the beachhead organ. Then ByteRover shipped a production-ready memory system — hierarchical context trees, automatic curation, daily knowledge mining — and the entire Lean+QMD approach became unnecessary.

Good. That failure clarified what we're actually trying to build. Not a verified memory system. Not a formal proof engine. Something simpler and more dangerous: an agent that can rewrite its own code, test the rewrite in isolation, and promote it if it works.

This post is the blueprint.


Why ClojureScript

We considered two languages for self-modifying agent code: ClojureScript and Elixir. The earlier posts made a strong case for the BEAM — supervision trees, hot code swapping, process isolation. But those properties solve the wrong problem. We don't need a runtime that catches crashes. We need a language where code is data the agent can manipulate.

ClojureScript is a Lisp. Code is represented as data structures — lists, vectors, maps — the same structures the agent reasons about. The S-expressions post argued that LLMs should think in homoiconic representations. ClojureScript is that representation.

The self-hosted compiler (cljs.js) seals it. A ClojureScript program can compile and evaluate ClojureScript at runtime without a build step, without a JVM, without anything beyond Node.js. The agent can eval-str a modified function and watch what happens. No compilation pipeline. No deploy cycle. Just data in, behavior out.
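A minimal sketch of that loop, assuming only Node.js with ClojureScript's cljs.js namespace on the classpath (the namespace name here is ours):

```clojure
;; Minimal runtime eval with the self-hosted compiler.
;; No build step: the code to run arrives as a plain string.
(ns demo.runtime-eval
  (:require [cljs.js :as cljs]))

(def state (cljs/empty-state))

(cljs/eval-str
  state
  "(+ 1 2)"                        ;; code as data
  "demo"                           ;; source name, used in error messages
  {:eval cljs/js-eval :context :expr}
  (fn [{:keys [value error]}]
    (if error
      (println "eval failed:" error)
      (println "result:" value)))) ;; prints: result: 3
```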

And ClojureScript interoperates with JavaScript natively. OpenClaw is 57,000 lines of TypeScript. No bridge protocol needed — ClojureScript calls JS functions directly. This is the difference between growing an organ inside the patient and transplanting one from another species.

Property               ClojureScript            Elixir
Code is data           Yes (homoiconic)         No
Runtime eval           cljs.js/eval-str         Code.compile_string
OpenClaw integration   Native JS interop        Bridge required
Isolation              None built-in            Process-level
Rollback               None built-in            Supervisor trees
Build step             None (self-hosted)       Mix release

ClojureScript's weakness is obvious: no isolation, no rollback, no supervision. A bad eval-str corrupts the runtime. That's why the agent doesn't modify itself in place. It modifies a copy. In a container.


The Architecture

Three components, all in ClojureScript on Node.js.

Prime Container
  Agent loop + Claude API
  Eval client + HTTP dashboard
  Tools: read, write, edit, bash, self-modify

Lab Container
  Modified agent code
  Form evaluation server
  Ephemeral — created per experiment

Supervisor
  Host process (macOS)
  Container lifecycle: create, start, stop, destroy
  Version management + HTTP dashboard

Host Environment
  macOS 26 + Apple Silicon
  Apple Containerization
  OCI images + container networking

The Prime is the live agent — the known-good version that talks to Claude, executes tools, and does real work. When it proposes a modification to its own code, it doesn't apply it. It sends the proposal to the Supervisor.

The Supervisor runs on the host. Its only job is container lifecycle: it takes the Prime's source code, applies the proposed modification, and boots a Lab Container with the result. The Supervisor doesn't evaluate code. It doesn't talk to Claude. It creates, starts, stops, and destroys containers.

The Lab is ephemeral. It boots the modified agent code and exposes a form evaluation server — a lightweight TCP service that accepts ClojureScript forms as EDN, runs cljs.js/eval-str, and returns results. The Prime connects to this server and probes the modified code: call functions, inspect state, run test scenarios. The Lab doesn't know it's being tested. It's just running.

If the Prime is satisfied, it tells the Supervisor to promote: the Lab's source becomes the new Prime, the old Prime is archived, and the system restarts with the improved version. If the Prime isn't satisfied, the Supervisor destroys the Lab. Nothing changes.


The Modification Cycle

Every self-modification follows the same six steps:

  1. Propose — the Prime generates a modification to its own source and sends it to the Supervisor.
  2. Spawn Lab — the Supervisor applies the modification to a copy of the Prime's source and boots a Lab container with the result.
  3. Probe — the Prime connects to the Lab's eval server and exercises the modified code.
  4. Judge — the Prime weighs the probe results and issues a verdict.
  5. Promote — on a positive verdict, the Lab's source becomes the new Prime; otherwise the Supervisor destroys the Lab.
  6. Repeat — the cycle runs again from the current version.

The cycle is the autoresearch pattern made concrete. The three separations hold: the judge (Prime probing the Lab) is separate from the subject (Lab running modified code), which is separate from the process (Supervisor managing containers). The keep/revert boundary is a container wall.


The Form Evaluation Server

The Lab doesn't run a full nREPL. It runs something simpler: a TCP server that accepts EDN forms and evaluates them. About 50 lines of ClojureScript.

;; Lightweight form evaluation server
;; Lab container exposes this on a known port

(ns lab.eval-server
  (:require [cljs.js :as cljs]
            [clojure.string :as str]
            ["net" :as net]))

(def compiler-state (cljs/empty-state))

(defn eval-form [form-str callback]
  (cljs/eval-str
    compiler-state
    form-str
    "lab-eval"
    {:eval cljs/js-eval :context :expr}
    (fn [{:keys [value error]}]
      (callback (pr-str (if error
                          {:status :error :message (.-message error)}
                          {:status :ok    :value value}))))))

(defn start-server [port]
  (let [server (.createServer net
                 (fn [socket]
                   (let [buf (atom "")]
                     (.on socket "data"
                       (fn [data]
                         (swap! buf str (.toString data))
                         (when (.endsWith @buf "\n")
                           (eval-form (str/trim @buf)
                             (fn [result]
                               (.write socket (str result "\n"))
                               (reset! buf "")))))))))]
    (.listen server port
      (fn [] (println "Eval server on port" port)))))

The Prime connects as a TCP client. It sends a form like (my.agent/handle-tool-call "bash" {:cmd "ls"}); the Lab evaluates it and returns the result as EDN. The Prime can probe any function, inspect any var, test any behavior — all through this single interface.
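A sketch of the Prime's side of that exchange, using Node's net module through ClojureScript interop. The namespace, host, and port here are illustrative, not part of the spec:

```clojure
;; Prime-side probe client: open a TCP connection to the Lab's
;; eval server, send one newline-terminated form, parse the
;; EDN reply, and hand it to a callback.
(ns prime.probe
  (:require [cljs.reader :as reader]
            ["net" :as net]))

(defn probe
  [host port form-str callback]
  (let [socket (.createConnection net #js {:host host :port port})
        buf    (atom "")]
    (.on socket "data"
      (fn [data]
        (swap! buf str (.toString data))
        ;; Responses are newline-terminated, mirroring requests.
        (when (.endsWith @buf "\n")
          (.end socket)
          (callback (reader/read-string @buf)))))
    (.write socket (str form-str "\n"))))

;; Usage: probe a tool-call path in the modified agent.
(probe "lab" 7777
  "(my.agent/handle-tool-call \"bash\" {:cmd \"ls\"})"
  (fn [{:keys [status value]}]
    (println "Lab responded:" status value)))
```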

Malli validates the protocol. Every request and response conforms to a schema. The schemas themselves are EDN — data the agent can inspect and, eventually, modify.

;; Malli schemas for the eval protocol

(def EvalRequest
  [:map
   [:form :string]
   [:timeout {:optional true} :int]])

(def EvalResponse
  [:map
   [:status [:enum :ok :error]]
   [:value {:optional true} :any]
   [:message {:optional true} :string]])
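Checking a wire message against a schema is one call with malli.core — `m/validate` returns a boolean, and `m/explain` describes what failed:

```clojure
;; Validating an eval-protocol response with malli.
(ns lab.protocol
  (:require [malli.core :as m]))

(def EvalResponse
  [:map
   [:status [:enum :ok :error]]
   [:value {:optional true} :any]
   [:message {:optional true} :string]])

(m/validate EvalResponse {:status :ok :value "3"})
;; => true

(m/validate EvalResponse {:status :crashed})
;; => false — :crashed is not in the :enum
```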

The Five Tools

The agent has five tools. Four are borrowed from Mario Zechner's Pi coding agent, which proved that four tools and a 1,000-token system prompt can match elaborate frameworks on Terminal-Bench. The fifth is ours.

Tool          What it does
read-file     Read a file's contents
write-file    Write content to a file (creates or overwrites)
edit-file     Replace a string in a file
bash          Execute a shell command
self-modify   Propose a modification to the agent's own source

The self-modify tool takes a description of the change (what to modify and why), generates the modified .cljs source files, and POSTs a proposal to the Supervisor. The Supervisor creates a Lab container, and the modification cycle begins.
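A sketch of the tool's final step — POSTing the proposal to the Supervisor over node:http. The host, port, and endpoint path are assumptions, not part of the contract:

```clojure
;; self-modify, last step: serialize the Proposal as EDN and
;; POST it to the Supervisor. Host/port/path are illustrative.
(ns agent.self-modify
  (:require ["http" :as http]))

(defn propose!
  [{:keys [id description files rationale parent-version]}]
  (let [body (pr-str {:id id
                      :description description
                      :files files            ;; [{:path … :content …} …]
                      :rationale rationale
                      :parent-version parent-version})
        req  (.request http
               #js {:host "supervisor"        ;; assumption: host reachable from Prime
                    :port 8080
                    :path "/proposals"
                    :method "POST"
                    :headers #js {"Content-Type" "application/edn"}}
               (fn [res]
                 (println "Supervisor replied:" (.-statusCode res))))]
    (.write req body)
    (.end req)))
```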

No MCP. Pi's lesson: a 21-tool MCP server consumes 13,700 tokens before the agent does anything. Five tools, described in the system prompt, are enough.


Containers as the Keep/Revert Boundary

Apple Containerization runs each container as a dedicated lightweight VM via the Virtualization framework. Not shared-kernel containers — full VM isolation with sub-second boot times. A bad modification can't escape. A malicious eval-str can't touch the host.

This is the safety layer that ClojureScript lacks. Instead of building isolation into the language (Elixir's approach) or verification into the type system (Lean's approach), we put the entire experiment in a box. The box is disposable. If the experiment fails, we throw away the box.

The Supervisor manages the box lifecycle. It maintains a versions/ directory:

versions/
  v001/          ← initial agent source
    src/
      agent/
        core.cljs
        tools.cljs
        loop.cljs
  v002/          ← first successful modification
    src/
      agent/
        core.cljs
        tools.cljs
        loop.cljs   ← modified
  v003/          ← second modification
    ...

Promotion copies the Lab's source to versions/N+1/ and restarts the Prime pointing at the new version. Revert restarts Prime pointing at versions/N. Every version is preserved. The full history of the agent's self-modification is on disk.
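Promotion on the Supervisor can be a few lines of ClojureScript shelling out to the container CLI. The function name, volume layout, and CLI flags below are assumptions for illustration — the actual `container` invocations depend on the Apple Containerization release in use:

```clojure
;; Sketch of promotion: copy the Lab's source into the next
;; version slot, then restart the Prime pointing at it.
(ns supervisor.promote
  (:require ["child_process" :as cp]
            ["fs" :as fs]))

(defn promote! [lab-src-dir current-version]
  (let [next-dir (str "versions/v" (inc current-version))]
    ;; The Lab's source becomes the next version on disk.
    (.cpSync fs lab-src-dir next-dir #js {:recursive true})
    ;; Restart the Prime from the new version (flags illustrative).
    (.execSync cp "container stop prime")
    (.execSync cp (str "container run --name prime --detach "
                       "--volume " next-dir ":/app/src prime-image"))
    next-dir))
```

Revert is the same restart pointed at versions/N — no copy needed, since every version is preserved.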


Observability

Both the Prime and Supervisor expose HTTP endpoints. You open a browser and see what the agent is doing.

Endpoint         Component    Returns
GET /            Both         Dashboard (single HTML page)
GET /logs        Both         SSE stream of events
GET /stats       Both         JSON: counts, uptime, current version
POST /chat       Prime        Send a message to the agent, SSE response
GET /lab/repl    Supervisor   Proxied view of the Lab's eval session
GET /versions    Supervisor   JSON: version history with diffs

The user interacts with the agent through the Prime's /chat endpoint — a text input in the dashboard that POSTs messages and streams responses via SSE. The same HTTP infrastructure serves both the dashboard and the API. No separate frontend. No WebSocket complexity. Just node:http and Server-Sent Events.
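The /logs stream needs nothing beyond the core http module. A minimal sketch, with the namespace and event shape as assumptions:

```clojure
;; SSE log stream over node:http — no framework.
(ns prime.dashboard
  (:require ["http" :as http]))

(def clients (atom #{}))

(defn broadcast! [event]
  ;; An SSE frame is plain text: "data: <payload>\n\n"
  (doseq [res @clients]
    (.write res (str "data: " (pr-str event) "\n\n"))))

(defn start! [port]
  (-> (.createServer http
        (fn [req res]
          (if (= (.-url req) "/logs")
            (do (.writeHead res 200
                  #js {"Content-Type" "text/event-stream"
                       "Cache-Control" "no-cache"
                       "Connection" "keep-alive"})
                (swap! clients conj res)
                ;; Drop the client when the browser disconnects.
                (.on req "close" #(swap! clients disj res)))
            (do (.writeHead res 404)
                (.end res)))))
      (.listen port)))
```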

Pi's other lesson applies here: observability over automation. Full visibility into the agent's decisions matters more than convenience features. The dashboard shows every Claude API call, every tool execution, every self-modification proposal, and every probe the Prime sends to the Lab. If the agent does something wrong, you can see exactly why.


The Contract

Communication between components follows a fixed schema, enforced by Malli. The Wezzard agentic loop demonstrated that contract-driven prompts turn free-form LLM output into predictable, machine-readable communication. We apply the same principle at every boundary.

;; Modification proposal: Prime → Supervisor
(def Proposal
  [:map
   [:id :string]
   [:description :string]
   [:files [:vector
            [:map
             [:path :string]
             [:content :string]]]]
   [:rationale :string]
   [:parent-version :int]])

;; Probe result: Lab → Prime (via eval server)
(def ProbeResult
  [:map
   [:status [:enum :ok :error :timeout]]
   [:value {:optional true} :any]
   [:elapsed-ms :int]])

;; Verdict: Prime → Supervisor
(def Verdict
  [:map
   [:proposal-id :string]
   [:decision [:enum :promote :revert]]
   [:evidence [:vector ProbeResult]]
   [:reasoning :string]])

The contract is the fixed point. The agent can modify its tools, its loop, its system prompt, its probe strategy — but not the contract schemas. This is the autoresearch pattern's "well-chosen fixed point": the communication protocol is the one thing that doesn't change while everything else can.


What Can the Agent Modify?

Everything except the contract. The entire agent source is the modification surface: the tool implementations, the agentic loop, the system prompt, the probe strategy — every .cljs file in the current version directory.

This is full self-modification. The agent can change how it thinks, how it acts, and how it judges its own changes. The only constraints are the Malli schemas (the communication protocol) and the container boundary (the isolation layer).

The risk is obvious: the agent could modify its probe strategy to always return "promote," then every subsequent modification would be accepted without testing. This is the alignment problem at small scale. For v0, the mitigation is simple: the Supervisor logs every proposal, every probe, and every verdict. A human can review the modification history and revert to any previous version. The agent improves itself; the human supervises the improvement.


What Kills This

Honest failure modes, same as the Lean post.

Self-hosted ClojureScript is slow
cljs.js compiles at runtime. Boot time for a non-trivial agent could be 5–10 seconds. If the modification cycle takes 15 seconds end-to-end, the agent can only try 4 modifications per minute. That might be too slow to find useful improvements before the context window fills up.
"Doesn't crash" isn't a judge
For v0, the success criterion is "the Lab boots and responds to probes without error." This is the same gap we identified in the BEAM: it catches crashes, not subtle degradation. A modification that makes the agent slightly worse at tool selection will pass this test every time. Real judgment requires real metrics — response quality, task completion, token efficiency. That's v1.
The probe strategy problem
The agent chooses what probes to send to the Lab. If it modifies its own probe strategy to be less rigorous, subsequent modifications face weaker testing. This is a slow drift toward accepting bad changes. Mitigation: log everything, human review, and eventually a fixed probe suite that the agent can't modify.
Apple Containerization is pre-1.0
container is under active development with breaking changes between minor versions. The networking, volume mounting, and lifecycle APIs may change. UTM is a fallback (scriptable via utmctl and AppleScript) but heavier.
Context window economics
Every modification cycle consumes Claude API tokens: the proposal, the probe results, the verdict. At Claude's pricing, running this loop 24/7 could cost hundreds of dollars per day. The agent needs to be selective about what it tries to improve, which requires judgment it may not have yet.

The MVP

What ships first. Six components, all ClojureScript, no external dependencies beyond Node.js and the container CLI.

Component        Description                                          Runs on
Shared library   Malli schemas, eval protocol, HTTP helpers           All
Supervisor       Container lifecycle, version management, dashboard   Host
Agent loop       Claude API, 5 tools, agentic loop                    Prime
Eval server      TCP form evaluation, ~50 lines                       Lab
OCI image        Node.js + self-hosted ClojureScript runtime          Containers
Dashboard        Single HTML page, SSE logs, stats, chat              Served by Supervisor & Prime

What's explicitly not in v0: streaming from Claude, multi-provider support, MCP, ByteRover integration, TUI, sub-agents, formal verification, autonomous self-modification triggers. The agent modifies itself when the user asks it to. Autonomous improvement is v1.

The system starts with supervisor start. The Supervisor creates the Prime container from the initial source, injects the Claude API key as an environment variable, and boots it. The user opens the dashboard in a browser and talks to the agent via /chat. The agent does work. When the user (or eventually the agent itself) decides something should be improved, the modification cycle runs.

One cycle. One successful promotion. That's the proof of concept. Everything after is iteration.


Why This Might Work

The autoresearch pattern identified three requirements for recursive self-improvement: separated judge and subject, a keep/revert loop, and well-chosen fixed points. This architecture has all three.

The judge (Prime) is separated from the subject (Lab) by a container boundary — not just a process boundary, but a full VM. The keep/revert loop is the modification cycle: promote or destroy. The fixed points are the Malli schemas and the container interface. Everything else is mutable.

ClojureScript adds something the previous proposals didn't have: the agent manipulates code as data. It doesn't generate source text and hope it parses. It constructs syntax trees directly. This is the insight from the S-expressions post, finally applied to the agent's own code.

And containers give us the one thing we kept asking for across six posts: a way to say "this modification is safe to try." Not safe because a type checker proved it. Not safe because a supervisor will restart it. Safe because it runs in a box we can throw away.

The thesis: homoiconic code + VM-level isolation + a keep/revert loop = the minimum viable substrate for recursive self-improvement. Everything we explored before — BEAM, Lean, formal verification — was looking for safety in the language. The safety is in the container.


What Comes Next

Build it. The sequence:

  1. OCI image. Node.js + self-hosted ClojureScript. Verify cljs.js/eval-str works in a container. Measure boot time.
  2. Eval server. TCP server, 50 lines. Verify round-trip: send form, get result.
  3. Supervisor. Shell out to container CLI. Create, start, stop, destroy. Version directory management.
  4. Agent loop. Claude API client, 5 tools, main loop. Get it answering questions in the Prime container.
  5. Self-modify tool. Generate a proposal, send to Supervisor, probe the Lab, judge the result.
  6. First modification. The agent changes one of its own functions, tests it, promotes it. That's the proof.

Each step is a falsifiable experiment. If self-hosted ClojureScript is too slow in a container, we switch to pre-compiled. If Apple Containerization is too unstable, we fall back to UTM. If the agent can't generate useful modifications, we constrain the surface. A working prototype is realistic in two to three weeks.

The bet: one successful self-modification — an agent that rewrites one of its own functions, tests the result in a Lab, and promotes the improvement — is worth more than this entire series. Ship it.


Continue reading: The program.md Protocol: Steering Self-Improvement with a Contract