OpenAgentCompiler: YOLO agents behind a compile-time allow-list
If you are building more than a handful of agents that need different models, tight permissions, and no ambient runtime framework, this is the shape I landed on after hitting the usual walls myself. Agent frameworks keep re-inventing the same pile: tool registries, permission objects, per-agent adapters, hand-written JSON configs. OpenAgentCompiler takes the other direction — define the agent once in Python, compile it into a static OpenCode project, and let the existing runtime handle the rest. Code snippets below are trimmed to the parts that matter; the rest is on PyPI.
tldr
- Compile the agent once — including its permission surface — then run static OpenCode artifacts with no ambient framework. The compiler is build-time only; at runtime there’s nothing to call but `opencode run`.
- YOLO-first, but bounded by a build-time allow-list. `"*": "deny"` by default, then selective `allow` per action. Models run without interactive approval, but every tool call has to match a pattern that was decided at compile time.
- Bash as the universal tool surface; models, providers, and MCPs stay in shared config. Every tool is a Python script exposed through `uv run scripts/<name>.py`; providers (Anthropic, OpenAI, vLLM, z.ai) and MCP servers are declared once and referenced by name.
- Output is plain markdown + JSON + scripts; runtime is stock OpenCode, unmodified. `.opencode/agents/*.md` with frontmatter, `opencode.json` at the root, scripts copied next to them. Human-diffable artifacts all the way down.
Why a compiler and not a framework
The naive way to build a multi-agent system is to instantiate agents at runtime: a Python class per agent, a dictionary of tools, a wrapper around the model client, some decorator to enforce permissions. It works for three agents. It falls over at thirty.
What breaks is not the runtime, it is the state. Every agent has its own prompt, tool list, permissions, model, retry policy. If any of that is Python code rather than data, it is impossible to diff two agents, impossible to A/B a new model across the whole suite, and impossible to let another tool (like OpenCode’s built-in runtime) own the execution. The agent-definition language ends up being “whatever Python you wrote last week”.
The compiler flips this. The agent is data — a frozen dataclass built by a fluent API. Python is only used at compile time to produce that data. At runtime there is no framework: just markdown files, a JSON config, and a shell script that calls opencode run --agent <name>. Debugging a misbehaving agent means reading its compiled .md file, not tracing through builder code.
Build time: `AgentBuilder` → `AgentDefinition` → `compile_agent()` → `OpenCodeWriter` → disk
Run time: `opencode run --agent <name>`
Everything on the build-time line is your Python code. The run-time line is the stock OpenCode runtime, unmodified.
The core idea is simple: agent definitions should compile to static artifacts, and permission boundaries should be decided at build time, not improvised at runtime. Everything else in this post is a consequence of that.
YOLO-first, gated by a compiled allow-list
The phrase “YOLO agent” has a specific meaning here: the model is not prompted to confirm tool calls, not sandboxed behind a human-in-the-loop review, not rate-limited on write operations. It just runs. The framing comes from the creator of Pi agents — the core framework OpenClaw is built on — which advocates letting the model just rip without human confirmation in the loop. The twist here is that YOLO is bounded by a strict compile-time tool set rather than running against the full shell. The obvious failure mode if you don’t bound it is that a model hallucinates rm -rf ~ and ruins your afternoon — especially likely with weaker locally-hosted models, uncensored fine-tunes, or any other shed-science-level checkpoint where you can’t count on the model not emitting something destructive.
The compiler’s answer is to make the allow-list a build artifact, not a runtime check. Every tool a given agent can call is enumerated at compile time and emitted into the agent’s frontmatter as a permission: block. At runtime, OpenCode refuses anything outside that block. There is no “ask the user” path because there is no user in the loop — the only thing between the model and the shell is the pattern match.
The compiler’s most important output is not the prompt text. It is the permission surface. A generated system prompt can be wrong and the worst case is a confused agent; a wrong permission surface is how an agent runs rm -rf or calls an unapproved script against production data. The rest of the design — bash as the tool surface, multi-model switching, MCP integration — all gets its value from the fact that everything routes through this one compile-time boundary.
One operational consequence of that is worth naming: because the compiled output is plain markdown + JSON + scripts, it goes in git. A change to any agent’s permission surface shows up as a diff on the frontmatter YAML — a new bash pattern, an mcp: false flipped to a server-specific allow, a read: false changed to read: true. You can review that diff in a PR before it hits production. “Why does this agent now need rm?” is a question a reviewer can ask on the compiled artifact, not something you discover by watching the agent in prod. Most runtime agent frameworks cannot give you that, because the permission decisions do not live in a file that gets committed.
The compiler runs this function for every agent at build time:
```python
def _auto_tool_permissions(all_tools, defn, ...) -> dict[str, Any]:
    # Deny everything, then selectively re-enable.
    bash_perms: dict[str, str] = {"*": "deny"}
    for t in all_tools:
        for action in t.actions:
            bash_perms[action.command_pattern] = "allow"
    result: dict[str, Any] = {}
    result["bash"] = bash_perms
    result["read"] = False
    result["write"] = False
    result["edit"] = False
    result["task"] = False
    result["mcp"] = False
    # ...workspace, workflow, subagent, skill, MCP patterns slot in here
    return result
```
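Exercised with stand-in objects, the deny-then-allow accumulation looks like this (`SimpleNamespace` stubs in place of the compiler's real tool/action types):

```python
from types import SimpleNamespace

# Stand-ins for the compiler's tool/action objects, just to show the
# deny-then-allow accumulation; real runs pass real tool definitions.
tools = [
    SimpleNamespace(actions=[SimpleNamespace(
        command_pattern="uv run scripts/fetch_context.py *")]),
    SimpleNamespace(actions=[SimpleNamespace(
        command_pattern="uv run scripts/embedding_search.py *")]),
]

bash_perms = {"*": "deny"}
for t in tools:
    for action in t.actions:
        bash_perms[action.command_pattern] = "allow"

print(bash_perms["*"])  # deny
print(sum(v == "allow" for v in bash_perms.values()))  # 2
```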
The emitted frontmatter looks like this:
```yaml
permission:
  bash:
    "*": "deny"
    "uv run scripts/fetch_context.py *": "allow"
    "uv run scripts/embedding_search.py *": "allow"
  read: false
  write: false
  edit: false
  task: false
  mcp: false
```
The `"*": "deny"` entry is the catch-all: any command that does not hit an explicit `"allow"` pattern is denied, so a missing pattern always means denial rather than “undefined”. Built-in capabilities that look innocuous (`read`, `write`, `edit`) are turned off by default too — an agent that needs to read a file gets a `cat` pattern on bash, not a blanket `read: true`. `mcp: false` at the top blocks MCP tools globally, and per-server allow patterns re-enable them one at a time.
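As a toy model of that resolution (not OpenCode's implementation; plain `fnmatch` globbing with a most-specific-match-wins assumption):

```python
from fnmatch import fnmatch

def resolve(command: str, perms: dict[str, str]) -> str:
    # The longest matching pattern wins; "*" is the fallback, so anything
    # that only matches the wildcard resolves to its "deny".
    best_pattern, best_len = "*", -1
    for pattern in perms:
        if fnmatch(command, pattern) and len(pattern) > best_len:
            best_pattern, best_len = pattern, len(pattern)
    return perms[best_pattern]

perms = {
    "*": "deny",
    "uv run scripts/fetch_context.py *": "allow",
}
print(resolve("uv run scripts/fetch_context.py --limit 5", perms))  # allow
print(resolve("rm -rf ~", perms))                                   # deny
```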
One quirk worth calling out: the `tool:` section is informational only (OpenCode does not enforce it), but models that see `tool: read: false` in their frontmatter are still less likely to attempt `bash cat` workarounds than models that only see the enforced `permission:` block. So the compiler mirrors restrictions into both sections.
Bash as the universal tool surface
The second big design choice: there is no tool registry. There is bash, and bash calls Python scripts.
JSON-schema tools with typed arguments would have worked too, but every backend that might run an agent — OpenCode, a local model via vLLM, a cloud model via Anthropic’s API — has its own way of representing custom tools. Writing adapters for all of them is exactly the framework-ification the compiler is trying to avoid. Bash is a universal interface every backend already supports, and models have strong shell priors from pretraining, so the model can reuse the machinery it already has instead of learning a new DSL per agent.
There is a cost to this choice that’s worth naming where it lands. MoE models in particular leak pretraining shell habits the allow-list then denies — | head, timeout, 2>&1, &&/|| chains — racking up denied tool calls even on workloads where a structured interface would sail through. In practice this means every new MoE is worth a round of looking at which shell idioms it reaches for and either allow-listing them or translating them at the tool boundary. The cost lives entirely at the permission-pattern level; the design decision to use bash does not change.
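One way to translate idioms at the tool boundary is a pre-match normalizer that strips the harmless suffixes before the allow-list sees the command. A hypothetical sketch, not part of the compiler:

```python
import re

def normalize(cmd: str) -> str:
    # Strip idioms weaker models bolt on that a strict allow-list would deny.
    cmd = re.sub(r"\s*2>&1\s*$", "", cmd)           # trailing stderr redirect
    cmd = re.sub(r"\s*\|\s*head\b[^|]*$", "", cmd)  # trailing `| head -n N`
    return cmd.strip()

print(normalize("uv run scripts/fetch_context.py --limit 5 | head -20"))
# uv run scripts/fetch_context.py --limit 5
```

The alternative (allow-listing the idioms themselves) trades a wider permission surface for zero translation code; which side you pick per idiom is the per-model tuning the paragraph above describes.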
A tool is just a Python script with a ScriptTool subclass:
```python
# scripts/fetch_context.py
class FetchContextTool(ScriptTool):
    """Fetch cached project context for the given entity type."""

    class Input(BaseModel):
        entity_type: Literal["web_searches", "docs", "notes"]
        limit: int = 10

    async def run(self, inp: Input) -> dict:
        # _fetch is the script's own data-access helper (trimmed here).
        return {"results": await _fetch(inp.entity_type, inp.limit)}
```
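The post leaves out how the script becomes a CLI that `uv run` can invoke; presumably the library bundles an entry point. For intuition, here is a hand-rolled equivalent of the flag parsing, using stdlib dataclasses in place of Pydantic (all names hypothetical):

```python
import argparse
from dataclasses import MISSING, dataclass, fields

@dataclass
class Input:
    entity_type: str
    limit: int = 10

def parse_cli(model_cls, argv=None):
    # One --flag per field; a field is required iff it has no default.
    parser = argparse.ArgumentParser()
    for f in fields(model_cls):
        required = f.default is MISSING
        kwargs = {} if required else {"default": f.default}
        parser.add_argument(f"--{f.name}", type=f.type,
                            required=required, **kwargs)
    return model_cls(**vars(parser.parse_args(argv)))

print(parse_cli(Input, ["--entity_type", "docs", "--limit", "5"]))
# Input(entity_type='docs', limit=5)
```

This is what makes `uv run scripts/fetch_context.py --entity_type docs --limit 5` land as a typed `Input` inside the handler.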
Registering it with the compiler is one builder call:
```python
search = (
    ToolBuilder()
    .name("fetch-context")
    .description("Fetch cached project context")
    .from_script("scripts/fetch_context.py")
    .build()
)
```
from_script imports the module, finds the ScriptTool subclass, introspects the Pydantic input model for argument docs, and derives the command pattern automatically:
```python
def from_script(self, file_path: str) -> ToolBuilder:
    # module_name is derived from file_path (trimmed here).
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if obj is not _ScriptTool and issubclass(obj, _ScriptTool):
            handler_cls = obj
            break
    self._populate_from_handler(handler_cls, file_path)
    return self
```
There are two entry points that do the same underlying work. from_script("scripts/foo.py") finds and imports the handler class for you — convenient when the tool and the build script live in different packages. from_handler(FooTool, "scripts/foo.py") skips the import-by-path step and takes the class directly — handy when you already have it in scope (and slightly faster on large suites). The end-to-end example later in the post uses from_handler because the handler classes are already imported for type-checking; smaller setups can just use from_script.
At compile time, the tool produces an ActionDefinition:
```python
@dataclass(frozen=True, slots=True)
class ActionDefinition:
    command_pattern: str  # "uv run scripts/fetch_context.py *"
    description: str
    usage_example: str
```
That command_pattern is what gets emitted into the agent’s permission.bash block. The model sees the script’s description and example in the system prompt, invokes it as uv run scripts/fetch_context.py --entity_type docs --limit 5, and OpenCode’s pattern matcher lets it through because the compiled permission allows exactly that prefix.
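The derivation of that pattern follows a fixed convention visible in the output; a simplified sketch of what `from_script` presumably does with the path:

```python
from pathlib import Path

def derive_command_pattern(script_path: str) -> str:
    # "scripts/fetch_context.py" -> "uv run scripts/fetch_context.py *"
    # (assumed convention: scripts are copied into build/scripts/ and the
    # trailing "*" admits any flags)
    return f"uv run scripts/{Path(script_path).name} *"

print(derive_command_pattern("scripts/fetch_context.py"))
# uv run scripts/fetch_context.py *
```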
Adding a new tool is: write a Python script with a ScriptTool subclass, add one ToolBuilder().from_script(...) call to the agent definition, recompile. No adapter, no runtime registration, no glue.
A complete example: two agents, a shared skill, a subagent, and an MCP server
To ground the rest of the post, here is a full compile showing how tools, skills, and subagents fit together in practice. Tool descriptions and skill instructions get rendered into each agent’s prompt automatically.
from open_agent_compiler import (
ModelConfig, ProviderConfig, ProviderOptions,
compile_agent, OpenCodeWriter,
)
from open_agent_compiler.builders import (
AgentBuilder, ConfigBuilder, SkillBuilder,
SubagentBuilder, ToolBuilder, WorkflowStepBuilder,
)
# Handler classes live in your project. Each subclasses ScriptTool and
# declares a Pydantic Input model. The compiler introspects them.
from my_project.tools import FetchContextTool, EmbeddingSearchTool, WritePRTool
# ── Shared config: providers + one MCP server ──
config = (
ConfigBuilder()
.provider(ProviderConfig(
name="anthropic",
options=ProviderOptions(api_key="env:ANTHROPIC_API_KEY"),
models=(ModelConfig(name="sonnet", id="claude-sonnet-4-5-20250929"),),
))
.provider(ProviderConfig(
name="vllm",
options=ProviderOptions(api_key="env:VLLM_API_KEY", base_url="http://localhost:8082/v1"),
models=(ModelConfig(name="qwen35-27b", id="cyankiwi/Qwen3.5-27B-AWQ-BF16-INT8"),),
))
.default_model("anthropic/sonnet")
.compaction(auto=True, prune=True)
.mcp_server("web-search", command="https://api.searchapi.io/mcp?token=YOUR_TOKEN")
.build()
)
# ── Tools ── description gets rendered in the agent prompt.
fetch_context = (
ToolBuilder()
.name("fetch-context")
.description("Fetch cached project context by entity type (docs, notes, prior web searches).")
.from_handler(FetchContextTool, "scripts/fetch_context.py")
.build()
)
embedding_search = (
ToolBuilder()
.name("embedding-search")
.description("Semantic search across the project's embedded docs and code.")
.from_handler(EmbeddingSearchTool, "scripts/embedding_search.py")
.build()
)
write_pr = (
ToolBuilder()
.name("write-pr")
.description("Draft a PR description from a summary and a list of changed files.")
.from_handler(WritePRTool, "scripts/write_pr.py")
.build()
)
# ── Skills ── group tools with instructions. Rendered in the prompt as a section.
research_skill = (
SkillBuilder()
.name("research")
.description("Gather project context through cached lookups and semantic search.")
.instructions(
"Start with fetch-context for known entities (docs, notes, prior searches). "
"Fall back to embedding-search for open-ended topic queries. "
"Do not call raw curl — use these tools exclusively."
)
.tool(fetch_context)
.tool(embedding_search)
.build()
)
pr_skill = (
SkillBuilder()
.name("pr-drafting")
.description("Draft PR descriptions grounded in prior changes.")
.instructions(
"Use embedding-search first to find similar past PRs, "
"then call write-pr with a short summary + changed files."
)
.tool(embedding_search) # shared with research_skill
.tool(write_pr)
.build()
)
# ── Subagent ── standalone file, own permission block, invoked via Task tool.
deep_dive = (
SubagentBuilder()
.name("deep-dive-researcher")
.description("Exhaustive topic research. Returns a structured summary.")
.notes("Invoked when the primary decides a topic needs more than a shallow lookup.")
.build()
)
# ── Workflow steps ── each step renders as a numbered section in the prompt
# and declares which tools or subagents it uses (compiler double-checks those
# against the agent's skill/subagent attachments).
step_cached = (
WorkflowStepBuilder()
.id("1")
.name("Check cached context")
.instructions(
"Try fetch-context first for the user's topic. "
"If it returns a usable result, skip to step 3."
)
.todo("Check cache", "Look for cached docs or notes")
.use_tool("fetch-context")
.build()
)
step_search = (
WorkflowStepBuilder()
.id("2")
.name("Semantic search")
.instructions(
"On a cache miss, run embedding-search across the project. "
"If the query is broad or multi-topic, delegate to deep-dive-researcher "
"via the Task tool instead of running multiple searches yourself."
)
.todo("Semantic search", "Find relevant snippets or delegate")
.use_tool("embedding-search")
.subagent("deep-dive-researcher")
.build()
)
step_summarize = (
WorkflowStepBuilder()
.id("3")
.name("Summarise findings")
.instructions("Write a one-paragraph summary grounded in the retrieved context. Cite entity IDs.")
.todo("Summarise", "Produce final summary")
.mark_done("Summarise")
.build()
)
# ── Primaries ──
research_agent = (
AgentBuilder()
.name("research-analyst")
.description("Searches project context and summarises findings.")
.mode("primary")
.config(config)
.system_prompt(
"You are a research analyst. For simple queries use the tools directly; "
"for complex topics, delegate to deep-dive-researcher via the Task tool."
)
.skill(research_skill, instruction="Use this for any lookup or semantic search.")
.subagent(deep_dive)
.workflow_step(step_cached)
.workflow_step(step_search)
.workflow_step(step_summarize)
.build()
)
pr_writer = (
AgentBuilder()
.name("pr-writer")
.description("Drafts PR descriptions grounded in prior changes.")
.mode("primary")
.config(config)
.system_prompt("You are a PR writer. Find similar past PRs, then draft a description.")
.skill(pr_skill, instruction="Use this skill end-to-end for drafting.")
.build()
)
# ── Compile ──
writer = OpenCodeWriter(output_dir="build/")
for agent_def in [research_agent, pr_writer]:
    writer.write(compile_agent(agent_def, target="opencode"))
The emitted build/.opencode/agents/research-analyst.md — note the skill description and instructions are rendered into the prompt body automatically, and task: true appears because a subagent was attached:
---
description: Searches project context and summarises findings.
mode: primary
model: anthropic/sonnet
permission:
  bash:
    "*": "deny"
    "uv run scripts/fetch_context.py *": "allow"
    "uv run scripts/embedding_search.py *": "allow"
  read: false
  write: false
  edit: false
  task: true  # because of .subagent(deep_dive)
  mcp: false
  "web-search*": "deny"  # MCP denied explicitly until the agent opts in
---
You are a research analyst. For simple queries use the tools directly; for complex topics, delegate to deep-dive-researcher via the Task tool.
## Skills
### research — Gather project context through cached lookups and semantic search.
Use this for any lookup or semantic search.
Start with fetch-context for known entities (docs, notes, prior searches). Fall back to embedding-search for open-ended topic queries. Do not call raw curl — use these tools exclusively.
- **fetch-context** — Fetch cached project context by entity type (docs, notes, prior web searches).
`uv run scripts/fetch_context.py --entity_type <docs|notes|web_searches> --limit <N>`
- **embedding-search** — Semantic search across the project's embedded docs and code.
`uv run scripts/embedding_search.py --query <text>`
## Subagents
- **deep-dive-researcher** — Exhaustive topic research. Returns a structured summary. Invoked when the primary decides a topic needs more than a shallow lookup.
## Workflow
### Step 1 — Check cached context
Try fetch-context first for the user's topic. If it returns a usable result, skip to step 3.
Tools available in this step: `fetch-context`.
### Step 2 — Semantic search
On a cache miss, run embedding-search across the project. If the query is broad or multi-topic, delegate to deep-dive-researcher via the Task tool instead of running multiple searches yourself.
Tools available in this step: `embedding-search`. May delegate to: `deep-dive-researcher`.
### Step 3 — Summarise findings
Write a one-paragraph summary grounded in the retrieved context. Cite entity IDs.
Five things to notice:
- The `embedding_search` tool object is defined once and referenced by two skills and (transitively) three builders. Each emitted agent’s permission block is generated independently from its own tool list.
- Skills do real work in the prompt: the description + instructions + per-tool description block become a structured “Skills” section, so the model sees tool usage in context, not just as a bash allow-list.
- Workflow steps each render as their own numbered section in the prompt and declare which tools / subagents they use. The compiler validates that every `.use_tool(...)` and `.subagent(...)` on a step is actually attached to the agent — you can’t reference a tool in step 2 that the agent doesn’t have.
- The subagent is emitted as its own standalone `build/.opencode/agents/deep-dive-researcher.md` with its own permission block — never inlined into the primary’s prompt. The primary only gets a summary reference.
- `task: true` appears only on the research-analyst because a subagent was attached; the PR writer gets `task: false`. The MCP server declared in config is disabled per-agent by default (`mcp: false` + explicit `"web-search*": "deny"`); an agent that needs it would flip that pattern to `"allow"`.
Everything else in this post — multi-model presets, workflows, opencode_manager-spawned primary chains — extends the same shape.
Multi-model compilation and dynamic switching
Agents should not care which model runs them. Different workloads want different budgets — an orchestrator might burn Opus tokens, a classification subagent wants the cheapest fast model, a local experiment wants the vLLM endpoint. The ConfigBuilder declares providers once; agents reference them by name; the default is chosen per run.
```python
config = (
    ConfigBuilder()
    .provider(ProviderConfig(
        name="anthropic",
        options=ProviderOptions(api_key="env:ANTHROPIC_API_KEY"),
        models=(
            ModelConfig(name="sonnet", id="claude-sonnet-4-5-20250929",
                        options=ModelOptions(temperature=0.0)),
            ModelConfig(name="opus", id="claude-3-5-opus-20241022",
                        options=ModelOptions(temperature=0.5)),
        ),
    ))
    .provider(ProviderConfig(
        name="vllm",
        options=ProviderOptions(
            base_url="http://localhost:8082/v1",
            api_key="env:VLLM_API_KEY",
        ),
        models=(ModelConfig(name="qwen35-27b",
                            id="cyankiwi/Qwen3.5-27B-AWQ-BF16-INT8"),),
    ))
    .default_model("anthropic/sonnet")
    .compaction(auto=True, prune=True)
    .build()
)
```
Compiling with a different default produces a different opencode.json with the same agent markdown files. The comphy_slop_factory project uses a --preset flag on its compile script to pick which provider hierarchy gets emitted:
```shell
# Local iteration: same agents, vLLM provider.
python scripts/compile_agents.py --preset local --output build-local/

# Production: same agents, GLM-4.7 via z.ai.
python scripts/compile_agents.py --preset prod --output build-prod/
```
Same 16-agent pipeline, different provider hierarchy in the emitted config, no agent code changes. Per-agent sampling overrides (e.g. temperature=0.0 on an orchestrator, temperature=0.9 on a writer) survive the switch. My daily rotation is a local Qwen3.5-27B during iteration and GLM-4.7 for production — the Qwen benchmark post goes through why the 27B dense is my default local agent model despite faster MoEs existing.
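A preset flag like that reduces to a small table mapping preset names to config overrides. A hypothetical sketch (the model names follow the post; the override keys are illustrative):

```python
import argparse

# Hypothetical preset table; the provider/model names mirror the post,
# the override shape is illustrative.
PRESETS = {
    "local": {"default_model": "vllm/qwen35-27b"},
    "prod": {"default_model": "zai/glm-4.7"},
}

def parse(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--preset", choices=PRESETS, default="prod")
    parser.add_argument("--output", default="build/")
    args = parser.parse_args(argv)
    return args.output, PRESETS[args.preset]

print(parse(["--preset", "local", "--output", "build-local/"]))
# ('build-local/', {'default_model': 'vllm/qwen35-27b'})
```

The agent definitions never see the preset; only the emitted `opencode.json` changes.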
MCPs through the same compile step
MCP tools are structured function-calling rather than bash, so they do not have the shell-idiom problem described in the previous section — no | head to leak, no timeout to strip. But they do have a different surface-area problem: an MCP server can expose a dozen tools, and the compiler has no schema-level way to tell which subset a given agent should be allowed to touch. The strategy is blanket mcp: false at the top of every agent’s permission block, then per-server glob allow-lists for agents that need them.
Model Context Protocol servers — local stdio or remote HTTP — get declared on the same ConfigBuilder. Nothing special, same builder:
```python
config = (
    ConfigBuilder()
    .provider(provider_config)
    # Local stdio server: command + args + env.
    .mcp_server(
        "neocortex",
        command="python",
        args=["-m", "neocortex"],
        env={"NEOCORTEX_DB": "postgresql://localhost/research"},
    )
    # Remote HTTP with the token embedded in the URL — shortest form.
    .mcp_server(
        "web-search",
        command="https://api.example.com/mcp?token=YOUR_TOKEN",
    )
    # Remote HTTP with a separate Authorization header — cleaner when the
    # token lives in an env var or rotates.
    .mcp_server(
        "paid-search",
        url="https://api.example.com/mcp",
        headers={"Authorization": f"Bearer {os.getenv('SEARCH_API_KEY')}"},
    )
    .build()
)
```
Local stdio gets command + args + env. Remote HTTP supports two forms: a URL directly in command= (token embedded, shortest), or url= + headers= when you want the token in a header instead of the query string. The compiler dispatches on which keyword is present. Compiled into opencode.json:
```json
"mcp": {
  "neocortex": {
    "type": "local",
    "command": ["python", "-m", "neocortex"],
    "environment": {"NEOCORTEX_DB": "postgresql://..."}
  },
  "web-search": {
    "type": "remote",
    "url": "https://api.example.com/mcp?token=YOUR_TOKEN"
  },
  "paid-search": {
    "type": "remote",
    "url": "https://api.example.com/mcp",
    "headers": {"Authorization": "Bearer ..."}
  }
}
```
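The dispatch-on-keyword rule reduces to a few branches. An assumed reconstruction, not the compiler's actual code:

```python
def mcp_entry(name, command=None, args=None, env=None, url=None, headers=None):
    # url= wins: explicit remote with optional headers.
    if url is not None:
        entry = {"type": "remote", "url": url}
        if headers:
            entry["headers"] = headers
    # A command= that looks like a URL is the shorthand remote form.
    elif isinstance(command, str) and command.startswith(("http://", "https://")):
        entry = {"type": "remote", "url": command}
    # Otherwise it's a local stdio server.
    else:
        entry = {"type": "local", "command": [command, *(args or [])]}
        if env:
            entry["environment"] = env
    return {name: entry}

print(mcp_entry("neocortex", command="python", args=["-m", "neocortex"]))
# {'neocortex': {'type': 'local', 'command': ['python', '-m', 'neocortex']}}
```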
Agents still have to opt in. The blanket mcp: false set by the permission auto-generator blocks every MCP tool by default; each agent that needs a server re-enables it with a glob pattern:
```yaml
permission:
  mcp: false
  "neocortex*": allow
  "web-search*": allow
```
The memory server in the DeepMind hackathon project, for example, exposes a handful of read/write tools, most of which a given subagent has no business touching. In practice, default-deny catches a lot of accidental over-permissioning during development — you add an MCP server to the config, forget to allow-list it on an agent, and discover the missing permission by seeing the tool not fire rather than by seeing it misfire.
The build process
Compile is one script. From the coding_agent_suite project:
```python
def compile_all(config_overrides=None, output_dir=None) -> Path:
    build_dir = output_dir or DEFAULT_BUILD_DIR
    config = build_config(overrides=config_overrides)
    agents = [
        build_research_agent(config),
        build_plan_evaluator_agent(config),
        build_implementor_agent(config),
        build_review_orchestrator(config),
        build_fixer_agent(config),
        *get_research_subagents(config),
        *get_evaluator_subagents(config),
        *get_review_subagents(config),
    ]
    writer = OpenCodeWriter(output_dir=build_dir)
    for agent_def in agents:
        compiled = compile_agent(agent_def, target="opencode")
        writer.write(compiled)
    return build_dir
```
Running it:
python scripts/compile_agents.py --output build/
The output is a normal OpenCode project:
```
build/
  opencode.json              # providers, models, MCPs, compaction
  .opencode/agents/
    research-orchestrator.md # frontmatter + system prompt per agent
    research-analyst.md
    implementor.md
    ...
  scripts/
    fetch_context.py         # tool handler scripts, copied in
    embedding_search.py
    workspace_io.py          # bundled infrastructure
    subagent_todo.py
    opencode_manager.py
```
Runtime invocation is whatever OpenCode offers:
```shell
opencode run --agent pipeline/research-orchestrator "Find security vulnerabilities"

# Override the model without recompiling
opencode run --agent pipeline/research-orchestrator \
  --model anthropic/opus "Find security vulnerabilities"
```
Subagents are spawned by primaries through two paths: OpenCode’s built-in Task tool (for mode="subagent" children), or a bash call to opencode_manager.py run --agent <name> (for mode="primary" children that need their own permission set). The compiler chooses which path to enable per agent, based on the declared subagent modes, and only emits the corresponding permission pattern. A primary that only has Task-mode children gets task: true and nothing else; one that dispatches bash-mode children gets a narrow opencode_manager.py run*--agent*<sa_name>* pattern and task: false.
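That per-agent choice can be sketched as follows (assumed logic; the `mode` and `name` accesses are illustrative):

```python
def spawn_permissions(subagents: list[dict]) -> dict:
    # Task-mode children need only `task: true`; primary-mode children get a
    # narrow bash pattern for opencode_manager and leave `task` off.
    perms: dict = {"task": False, "bash": {}}
    for sa in subagents:
        if sa["mode"] == "subagent":
            perms["task"] = True
        else:
            pattern = f"opencode_manager.py run*--agent*{sa['name']}*"
            perms["bash"][pattern] = "allow"
    return perms

print(spawn_permissions([{"name": "deep-dive-researcher", "mode": "subagent"}]))
# {'task': True, 'bash': {}}
```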
Projects I’ve built on this
Four of my own projects run on the compiler — not a broad-adoption story, a “the shape flexes across workloads” story:
- coding_agent_suite (WIP) — OpenHands-inspired sandboxed platform for OpenCode ralf loops (a recover-and-advance-the-loop pattern for long multi-stage tasks with explicit state transitions). Each loop fetches a repo, builds a Docker pod, runs tests inside it. The compiler runs inside the pod to build the agent suite before each run.
- Personal Agent (Telegram) — smart-home + personal-ops agent with access to cameras, emails, smart devices, Garmin, screenshots, and a few chatbots. Ralf loops for longer tasks.
- google-deepmind-hackathon (initial implementation, unclear if it still runs) — multi-modal research with a NeoCortex MCP server as persistent memory across subagents, plus a testing suite that ingested from external links and Google Drives.
- comphy_slop_factory — slop-movie generator. Story → still frames → WAN-2.2 video → sound effects and spoken audio → 2-3 min output.
Nothing above is unique to OpenCode as a backend. The compiler’s target is a string — compile_agent(agent_def, target="opencode") — and the Writer interface is a protocol. A different backend would need a new compiler function and a new writer; the builder API, the tool definitions, and the permission model all stay the same.
One session per task
A note on the assumption this framework builds on, because it shapes several of the tradeoffs in the next section. Agents are designed to run one session per task rather than one long chat that queues everything. You spin up a fresh session for each task, route prior context in through tools or prebuilt prompts (which still cache nicely), and get real concurrent execution — 10 agents on 10 tasks at once, instead of a single long chat serialising the 10th behind the 9th.
This is fundamental to why the framework feels light despite shipping no runtime orchestration primitives. Independent sessions are the orchestration primitive, and anything above them is a shell script. It’s also why “no hot-reload” doesn’t bite: when every run starts a fresh session, “recompile and start the next session” is effectively hot-reload for anything the next task needs.
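Fan-out over independent sessions is then a dozen lines above the agent layer. A sketch assuming the `opencode run` CLI shape shown earlier:

```python
import asyncio

async def run_session(argv: list[str]) -> int:
    # One fresh OpenCode session per task; no shared state between them.
    proc = await asyncio.create_subprocess_exec(*argv)
    return await proc.wait()

async def fan_out(tasks: list[list[str]]) -> list[int]:
    # Independent sessions are the orchestration primitive: just gather them.
    return await asyncio.gather(*(run_session(t) for t in tasks))

# Real usage would build argv per task, e.g.
#   fan_out([["opencode", "run", "--agent", "research-analyst", p]
#            for p in prompts])
```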
What this does not do
With the session-per-task assumption set, here is what the design deliberately skips:
- No runtime orchestration primitives. Agents talk to each other through OpenCode’s existing Task tool or through bash calls to `opencode_manager.py`. There is no message bus, no queue, no shared state beyond the workspace files agents write by convention. Orchestration that needs more than that goes in a shell script above the agent layer. Side benefit: OpenCode’s web UI makes subagent spawns trivially observable — each Task-tool invocation is a visible new process, so debugging “why is this stuck” usually means looking at which subagent is still alive rather than parsing logs. Chains of primary agents spawned via bash calls to `opencode_manager.py` don’t show up in the UI the same way — each primary is its own OpenCode session — but the subagent layer is where most of the orchestration lives anyway.
- No hot-reload. Changing an agent means recompiling. At the scale I actually run — 150+ agents compiled across 10 provider variants, roughly 1500 markdown files — a full regen takes under half a second. The tradeoff (fully static artifacts a human can diff) is worth it, and the session-per-task philosophy above means hot-reload mostly doesn’t bite anyway: every new task starts a fresh session.
- No automatic retries, no backoff, no circuit breakers. The agent markdown ends where the system prompt ends. Reliability concerns live in the shell script that calls `opencode run`, not in the agent definition.
- No enforced MoE-safe prompting. The compiler will happily emit a 30-rule system prompt with five negations in it. Whether the downstream model can actually hold those rules is a model-architecture question, not a compiler one — covered in the MoE rule-binding post.
The principle throughout is to emit static artifacts and let other tools own the runtime. For most of my projects that means OpenCode’s web UI handles the spawned-agent / subagent / tool-call visualization — the compiler doesn’t ship a UI layer and doesn’t need to.
Current state
Alpha. 0.1.x on PyPI, MIT, Python 3.12+.
The API surface is small. Typical flow: ToolBuilder wraps your Python scripts, AgentBuilder assembles tools + subagents + prompt into an agent, ConfigBuilder declares providers and MCPs, then compile_agent() plus OpenCodeWriter emit the static project. SubagentBuilder, WorkflowStepBuilder, and SkillBuilder cover the more specialised cases.
The permission model is the most battle-tested part; the workflow generator is the newest and still changing.
```shell
pip install open-agent-compiler
# or
uv add open-agent-compiler
```
If you end up compiling agents on top of this and notice something that the design should handle but does not, the repo takes issues.