Barnum
The missing programming language for orchestrating AI agents.
LLMs are incredibly powerful tools. They are being asked to perform increasingly complicated, long-lived tasks. Unfortunately, the naive way to work with agents quickly hits limits. When their context becomes too full, they become forgetful and make the wrong decisions. You can't rely on them to faithfully execute a complicated, multi-step plan.
Barnum is an attempt to enable LLMs to perform dramatically more complicated, ambitious tasks. With Barnum, you define an asynchronous workflow, which is effectively a state machine. This makes it easy to reason about the possible states and actions your agents will be asked to perform, and to keep each step small and independent.
🦁 A choreographed show
Workflows are composed from type-safe primitives: .then(), .iterate(), .map(), loop, branch, tryCatch. First-class constructs for orchestration, not prose or ad-hoc scripts.
🐘 The right performer for each act
Handlers are either built-in primitives or TypeScript async functions. Agents handle the parts that require judgment. Deterministic code handles the rest. No LLM needed to list files or run a type-checker.
🐯 No one goes off script
Each handler runs in its own isolated Node.js subprocess. The agent performing a refactor never sees the full workflow — just its input and a prompt. Focused context means better decisions.
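The isolation model can be sketched in plain Node.js. This is a hypothetical shape, not Barnum's actual wire protocol: the child process receives only its serialized input on stdin and writes its result to stdout, so it can never see the parent's workflow state.

```typescript
import { spawnSync } from "node:child_process";

// A minimal sketch of handler isolation, assuming a JSON-over-stdio
// protocol (hypothetical -- not Barnum's actual implementation). The
// child sees only the input it is handed, nothing else.
const childScript = `
  let data = "";
  process.stdin.on("data", (c) => (data += c));
  process.stdin.on("end", () => {
    const input = JSON.parse(data);
    process.stdout.write(JSON.stringify({ handled: input.file }));
  });
`;

function runIsolated(input: { file: string }): { handled: string } {
  const child = spawnSync(process.execPath, ["-e", childScript], {
    input: JSON.stringify(input),
    encoding: "utf8",
  });
  return JSON.parse(child.stdout);
}
```

Because the child is a fresh process, "focused context" is enforced by the operating system rather than by convention.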
See it in action.
A simple example.
Handlers are the building blocks of a Barnum workflow. Today, handlers are either built-in primitives or exported TypeScript async functions. (Support for other languages is planned.) You compose them into workflows using postfix methods like .then() (sequential) and .iterate() / .map() (fan-out).
// handlers/steps.ts
import { readdirSync } from "node:fs";
import { createHandler } from "@barnum/barnum/runtime";
import { z } from "zod";

export const listFiles = createHandler({
  outputValidator: z.array(z.string()),
  handle: async () => {
    return readdirSync("src/").filter((f) => f.endsWith(".ts"));
  },
}, "listFiles");

export const refactor = createHandler({
  inputValidator: z.string(),
  handle: async ({ value: file }) => {
    await callAgent({
      prompt: `Refactor ${file} to replace all class-based React
components with functional components using hooks.`,
      allowedTools: ["Read", "Edit"],
    });
  },
}, "refactor");

// ... typeCheck, fix, commit, createPR
// run.ts
import { runPipeline } from "@barnum/barnum/pipeline";
import {
  listFiles, refactor, typeCheck, fix, commit, createPR,
} from "./handlers/steps.js";

runPipeline(
  listFiles
    .iterate()
    .map(refactor.then(typeCheck).then(fix).then(commit).then(createPR))
    .collect(),
);
listFiles runs once and returns an array of filenames. .iterate() turns that array into an iterator, and .map() fans out: each filename flows through refactor → typeCheck → fix → commit → createPR in parallel. .collect() gathers the results back into an array. Each handler executes in its own isolated subprocess. The Rust runtime manages the state machine, dispatching handlers, collecting results, and advancing the workflow. No handler sees another handler's context.
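For intuition, the fan-out/collect shape can be approximated in plain TypeScript. This is illustrative only; in Barnum each step runs in an isolated subprocess and the Rust runtime tracks the state machine, but the data flow is the same.

```typescript
// Plain-TypeScript sketch of .then() / .map() / .collect() data flow
// (illustrative only -- not the Barnum runtime).
type Step<I, O> = (input: I) => Promise<O>;

// Sequential composition, the shape of .then().
const then = <A, B, C>(f: Step<A, B>, g: Step<B, C>): Step<A, C> =>
  async (a) => g(await f(a));

// Fan-out plus gather, the shape of .iterate().map(...).collect().
const mapCollect = <I, O>(items: I[], step: Step<I, O>): Promise<O[]> =>
  Promise.all(items.map(step));

// Hypothetical steps standing in for refactor / typeCheck / etc.
const upper: Step<string, string> = async (s) => s.toUpperCase();
const exclaim: Step<string, string> = async (s) => s + "!";

const pipeline = then(upper, exclaim);
mapCollect(["a.ts", "b.ts"], pipeline).then(console.log); // → [ 'A.TS!', 'B.TS!' ]
```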
Why not just write this in JavaScript?
The example above is simple. You could probably ask your favorite LLM to one-shot the orchestration script, and it would do a decent job. As the workflow grows in complexity, you might reach for plan mode or write a markdown file describing the steps. That works for a while. But what happens when the plan has 40 steps across 15 files, with conditional branches, retries on failure, parallel fan-out, and a review loop? Good luck getting an agent to faithfully and reliably execute that plan.
And in practice, you do want the complicated version. You want the agent to refactor, then evaluate the result, then type-check, then fix errors in a loop until it's clean:
const refactorWithRetry =
  refactor
    .then(evaluate)
    .then(loop((recur) =>
      typeCheck.then(classifyErrors).branch({
        HasErrors: Iterator.fromArray<TypeError>().map(fix).drop().then(recur),
        Clean: drop,
      })
    ))
    .then(commit)
    .then(createPR);

runPipeline(
  listFiles.iterate().map(refactorWithRetry).collect(),
);
The problem isn't that any individual piece is hard. The problem is that expressing a precise, complicated asynchronous workflow in prose or ad-hoc scripts is fragile. A programming language geared towards orchestration is what you actually want — one where .iterate(), .map(), loop, branch, tryCatch, and .then() are first-class constructs with type-safe composition.
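The loop/branch pattern above can be sketched in plain TypeScript to show the control flow it encodes. The helpers here are hypothetical, not the Barnum API: run the check, branch on its tag, and recur on HasErrors until the result is Clean.

```typescript
// Plain-TypeScript sketch of the loop + branch control flow
// (hypothetical helpers, not the Barnum API).
type Outcome = { tag: "HasErrors"; errors: string[] } | { tag: "Clean" };

async function loopUntilClean(
  check: () => Promise<Outcome>,       // stands in for typeCheck.then(classifyErrors)
  fix: (errors: string[]) => Promise<void>,
  maxIterations = 10,
): Promise<number> {
  for (let i = 0; i < maxIterations; i++) {
    const outcome = await check();
    if (outcome.tag === "Clean") return i;  // Clean branch: drop out of the loop
    await fix(outcome.errors);              // HasErrors branch: fix, then recur
  }
  throw new Error("loop did not converge");
}
```

In Barnum itself, the recursion is expressed through the `recur` continuation that `loop` passes to its body, and the per-error fan-out happens via `Iterator.fromArray`.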
Looks complicated? Agents are good at writing this.
Barnum workflows are TypeScript with strong types and Zod validators. Every combinator is fully typed — your agent gets autocomplete, type errors, and compiler feedback as it writes. Show it one of the working demos as a reference and tell it what you want. It'll write a working pipeline.
What Barnum gives you
- .then(): sequential chains. Process steps one after another.
- .iterate().map(): fan-out to parallel. List 50 files, refactor them all concurrently.
- loop: retry and iterate. Fix type errors in a loop until the code is clean.
- branch: conditional routing. An analyzer classifies; specialists execute.
- tryCatch: error recovery. Catch failures and route to fallback handlers.
- withTimeout: deadline enforcement. Time out a handler and fall back to an alternative.
- Schema validation: handlers declare input and output schemas via Zod. Validated at every boundary.
- Isolated execution: each handler runs in its own subprocess. No shared context, no drift.
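The boundary-validation idea can be sketched without Zod, using a hand-rolled validator standing in for the Zod schemas Barnum actually uses: a handler's output is checked before it crosses to the next handler.

```typescript
// Sketch of boundary validation, with a hand-rolled validator standing
// in for the Zod schemas Barnum actually uses (illustrative only).
type Validator<T> = (value: unknown) => T;

const arrayOfStrings: Validator<string[]> = (value) => {
  if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
    throw new Error("expected string[]");
  }
  return value;
};

// Wrap a handler so its result is validated before crossing the boundary.
const validated =
  <T>(handler: () => Promise<unknown>, validator: Validator<T>) =>
  async (): Promise<T> =>
    validator(await handler());
```

A bad output fails loudly at the boundary where it was produced, instead of propagating garbage into downstream handlers.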