🎪 Barnum
Don't just /loop it.
The ringmaster for your agents.
The missing workflow engine for AI agents.
LLMs are extremely powerful tools, and we're using them to perform increasingly complicated tasks. But a markdown plan file doesn't scale as the workflow grows. Agents lose track, skip steps, and make the wrong calls as their context fills up. They act like unpredictable, wild animals.
That's why we created Barnum. Barnum provides the missing structure, rigor, and predictability for these complicated workflows. You define your workflow using a configuration file that is analyzable in advance. When it runs, transitions are validated, giving you certainty that the workflow will execute exactly as you designed it. And that certainty is what lets you build more powerful tools on top of agents.
🦁 A choreographed show
Workflows are expressed as statically analyzable state machines. Transitions are declared up front and validated at runtime. Invalid transitions are rejected and the step is retried. No hoping the agent stays on track.
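As a sketch, using the step format from the full programme later on this page (the step names here are hypothetical), a step declares its legal successors in its next array:

```json
{
  "name": "Triage",
  "action": {
    "kind": "Pool",
    "instructions": { "inline": "Inspect the failing build, then dispatch FixLint or FixTypes tasks." }
  },
  // The only transitions this step may take; anything else is rejected
  "next": ["FixLint", "FixTypes"]
}
```

Because the whole graph is declared like this, it can be checked before a single agent runs.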
🐘 The right performer for each act
Some acts are agents, some are shell commands, and each does what it's best at. Fan out with jq, commit with git, validate with your compiler. No agent needed.
🐯 No one goes off script
An agent performing a step never sees the full workflow, just the instructions for its current task. Focused context means agents don't get confused and can make better decisions.
See it in action.
One programme. Greatest show on earth.
With Barnum, you specify your workflow upfront in a configuration file. You can express ordering constraints (A before B), fan-out (one task per file), and aggregation (do X after everything finishes) in plain, readable JSON that can be validated before anything runs. Agents only handle the parts that require judgment. They never see the full workflow, so their context stays small and they don't drift off course as the work scales up. Each agent response is validated against a schema you define, so the workflow executes exactly as you specified.
In this programme, a command lists each .js file. Barnum dispatches one agent per file to convert it to TypeScript, in parallel. When all conversions finish, a finally hook triggers an agent that runs tsc and fixes any remaining type errors. One JSON file, no glue code.
{
  "entrypoint": "ListFiles",
  "steps": [
    {
      "name": "ListFiles",
      // One ConvertToTS task per .js file
      "action": {
        "kind": "Command",
        "script": "find \"$(pwd)/src\" -name '*.js' | jq -R '{kind: \"ConvertToTS\", value: {file: .}}' | jq -s '.'"
      },
      "next": ["ConvertToTS"],
      // After all conversions: fix any remaining type errors
      "finally": { "kind": "Command", "script": "echo '[{\"kind\": \"FixErrors\", \"value\": {}}]'" }
    },
    {
      "name": "ConvertToTS",
      "value_schema": {
        "type": "object",
        "required": ["file"],
        "properties": { "file": { "type": "string" } }
      },
      "action": {
        "kind": "Pool",
        "instructions": {
          "inline": "Convert this JS file to TypeScript. Add types, rename to .ts. Return []."
        }
      },
      "next": []
    },
    {
      "name": "FixErrors",
      "action": {
        "kind": "Pool",
        "instructions": {
          "inline": "Run npx tsc --noEmit and fix all TypeScript errors. Return []."
        }
      },
      "next": []
    }
  ]
}
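To make the fan-out concrete: the ListFiles command prints a JSON array of tasks, one per matched file, and Barnum dispatches one ConvertToTS agent per element. With two source files the command's output would look like this (paths are illustrative):

```json
[
  { "kind": "ConvertToTS", "value": { "file": "/home/user/project/src/app.js" } },
  { "kind": "ConvertToTS", "value": { "file": "/home/user/project/src/utils.js" } }
]
```

Each value object is checked against ConvertToTS's value_schema before any agent sees it.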
Looks complicated? Let the performers write the programme.
Barnum programmes are just JSON with a published schema. Run pnpm dlx @barnum/barnum config schema to get the full JSON Schema, show your agent the repertoire of common patterns, and tell it what you want. It'll write a working programme.
Why Barnum?
A single agent with a markdown plan can handle simple tasks. But real work (migrating 50 files, refactoring across a codebase, running multi-step pipelines) breaks that model fast. Context fills up, the agent loses track, and you can't predict what it will do before you run it.
Barnum is the ringmaster for your agents. You declare the full graph of steps and valid transitions upfront. It's statically analyzable before anything runs. At runtime, agents choose which path through the graph to take, but they can never go off script.
What Barnum gives you
- Fan-out: split work into parallel tasks. List 50 files, refactor them all concurrently, commit when done.
- Branching: route to different agents based on what the code needs. An analyzer decides; a specialist executes.
- Sequential chains: process items one at a time when order matters, like applying multiple changes to the same file.
- Adversarial review: implement, then judge, then revise. Loop until a critic agent approves the work.
- Safety net: post hooks catch failures and route them to fix-up agents instead of just retrying blindly.
- Hooks: enrich context before an agent sees it, validate results after, clean up resources when a subtree completes.
- Schema validation: each step declares what data it accepts. Malformed responses are rejected before they propagate.
- Commands: deterministic shell scripts handle the mechanical parts (listing files, calling APIs, running builds). Save the LLM for the thinking.
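As a sketch of the adversarial-review pattern (step names are hypothetical, and this assumes the programme graph may contain cycles, as the bullet above implies), two steps can name each other in their next arrays:

```json
{
  "name": "Judge",
  "action": {
    "kind": "Pool",
    "instructions": { "inline": "Review the change. If it needs work, dispatch a Revise task; otherwise return []." }
  },
  // Returning [] ends the loop; dispatching Revise keeps it going
  "next": ["Revise"]
},
{
  "name": "Revise",
  "action": {
    "kind": "Pool",
    "instructions": { "inline": "Apply the reviewer's feedback, then dispatch a Judge task." }
  },
  "next": ["Judge"]
}
```

The loop runs until the critic approves, yet neither agent can transition anywhere the programme didn't declare.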
Ladies and gentlemen, the show is about to begin!
📜 1. Write the programme
Write a programme with steps, transitions, and schemas. Each step is either an agent task or a shell command.
🎪 2. Corral the troupe
Start a troupe and connect agents to it. The more agents you add, the more work runs in parallel.
🎬 3. Showtime
Hand the programme to Barnum. It distributes tasks across your agents, enforces valid transitions, retries failures, and respects concurrency limits.