Giving Agents a CLI Is Not About Commands

The phrase "give Codex a custom CLI for your repo" sounds tactical.

It sounds like the advice is: write a few helper commands so the agent does not have to remember how to run tests.

That is useful, but it is not the real idea.

The deeper idea is that a repo-local CLI gives agents a stable control plane. It turns a repository from a pile of files into an environment with named operations, state transitions, checks, and remediation paths.

This matters most when the repo is not merely an app, but a long-running creative or experimental system.

The Remotion video lab is the case study.

The Starting Problem

The surface request was simple: learn Remotion, store good prompts, make videos, and eventually use the system for business.

The real problem was sharper.

Early Remotion experiments were expected to be messy. Agents would create bad videos, strange transitions, incoherent layouts, disposable assets, failed prompts, screenshots, renders, and half-useful compositions. That mess was not a defect. It was part of the learning process.

The failure mode was letting that experimental mess become the repo's operating model.

A naive repo would organize by file type:

prompts/
assets/
experiments/
src/
renders/

That looks clean for a week. Then it turns into archaeology.

The agent can see files, but it cannot tell which assets are reusable, which prompts worked once, which experiments failed for taste reasons, which renders are evidence, and which artifacts are safe to promote into business use.

So the real requirement is not tidiness.

The real requirement is controlled state.

The Harness Engineering Lens

OpenAI's Harness Engineering article makes an important point: an agent-first repo should treat the repository itself as the system of record. A short AGENTS.md acts as a map, while repo-local docs, executable plans, checks, and tools carry the actual operating knowledge.

Source: Harness engineering: leveraging Codex in an agent-first world, OpenAI, February 11, 2026.

The lesson is not "write more documentation."

The lesson is:

put durable knowledge where future agents can find it
make important boundaries mechanically checkable
encode human taste as rules, rubrics, and feedback loops
avoid a giant instruction blob
make the repo legible to the agent that will maintain it

For a Remotion lab, this means the repo should not just hold video code. It should teach a fresh agent how to run the studio, evaluate video quality, index assets, preserve provenance, promote useful discoveries, and clean up failed experiments.

That is where the CLI becomes central.

The Inference

"Give agents a CLI" generalizes to:

Give the agent named verbs for the lifecycle of the repo's important objects.

The CLI should not mirror shell convenience.

It should mirror the repo's ontology.

In the Remotion case, the important objects are:

experiments
prompts
assets
templates
renders
critiques
promotions
quality rules

The important lifecycle is:

create experiment
  -> ingest assets
  -> render evidence
  -> critique output
  -> extract lesson
  -> promote reusable artifacts
  -> update registry
  -> garbage-collect leftovers

That lifecycle should not live only in a README. It should be executable.
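One way to make the lifecycle executable is to encode the step order as data and let a tool answer "what comes next?". The sketch below is a minimal illustration; the step identifiers and the nextStep function are hypothetical names derived from the list above, not part of any existing tool.

```typescript
// Hypothetical step identifiers, one per line of the lifecycle above.
const LIFECYCLE = [
  "create_experiment",
  "ingest_assets",
  "render_evidence",
  "critique_output",
  "extract_lesson",
  "promote_artifacts",
  "update_registry",
  "garbage_collect",
] as const;

type Step = (typeof LIFECYCLE)[number];

// Given the steps an experiment has already completed (in order),
// return the next allowed step, or null when the lifecycle is finished.
// Steps recorded out of order are rejected instead of being guessed at.
function nextStep(completed: readonly Step[]): Step | null {
  completed.forEach((step, i) => {
    if (step !== LIFECYCLE[i]) {
      throw new Error(`step "${step}" is out of order at position ${i}`);
    }
  });
  const next: Step | undefined = LIFECYCLE[completed.length];
  return next ?? null;
}
```

The point of the sketch is that an agent never has to infer the order from prose: it asks the tool, and the tool refuses histories that skip a step.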

The Case Study Repo

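An elegant arrangement has one agent spine (the top-level files that tell agents how to operate) and one or more domain capsules (self-contained directories that each hold one subject area).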

For the Remotion lab:

repo/
  AGENTS.md
  README.md

  docs/
    INDEX.md
    INTENT.md
    QUALITY.md
    DECISIONS.md

  work/
    active/
    completed/
    backlog.md

  domains/
    remotion/
      README.md
      MOTION_LANGUAGE.md
      ASSET_POLICY.md
      QUALITY_RUBRIC.md

      prompts/
      templates/
      experiments/
      assets/
      renders/
      registry/
      src/

  tools/
    studio.ts

The top-level repo explains how agents should operate.

The Remotion domain contains the video-specific world.

The CLI owns the transitions.

The CLI As Control Plane

The command surface should be small and boring:

studio doctor
studio experiment new <template>
studio asset ingest <path> --experiment <id>
studio render still <experiment-id>
studio render preview <experiment-id>
studio critique <experiment-id>
studio promote <experiment-id>
studio gc
studio verify
studio github readiness

Each command exists because the repo has a state transition that agents should not improvise.

studio experiment new creates the expected folder, metadata, initial prompt, and work note.

studio asset ingest records provenance, source prompt, ownership status, hash, and experiment association.

studio render still produces visual evidence before a full render wastes time.

studio critique forces the bad-video learning loop to happen in writing.

studio promote moves artifacts from experimental space into durable library space only after evidence and critique exist.

studio gc removes or archives junk without deleting provenance.

studio verify checks that the repo is still agent-legible.

studio github readiness checks whether the repo is safe and coherent enough to publish publicly.
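As a rough illustration of how small this surface can stay, the sketch below maps each command name to one handler and refuses anything else. It is an assumption about how tools/studio.ts might be organized, not its actual contents: the dispatch and required helpers are hypothetical, and each handler only returns a description of the transition it would perform.

```typescript
type Handler = (args: string[]) => string;

// Fails loudly when a positional argument is missing.
function required(value: string | undefined, label: string): string {
  if (!value) throw new Error(`missing ${label}`);
  return value;
}

// One entry per command in the surface above. The handlers are stubs:
// a real tool would touch the filesystem and the registry here.
const commands = new Map<string, Handler>([
  ["doctor", () => "check the local environment"],
  ["experiment new", (a) => `scaffold experiment from template ${required(a[0], "<template>")}`],
  ["asset ingest", (a) => {
    const path = required(a[0], "<path>");
    if (a[1] !== "--experiment") throw new Error("missing --experiment <id>");
    return `ingest ${path} into ${required(a[2], "<id>")}`;
  }],
  ["render still", (a) => `render still for ${required(a[0], "<experiment-id>")}`],
  ["render preview", (a) => `render preview for ${required(a[0], "<experiment-id>")}`],
  ["critique", (a) => `open critique for ${required(a[0], "<experiment-id>")}`],
  ["promote", (a) => `promote ${required(a[0], "<experiment-id>")}`],
  ["gc", () => "archive or remove unreferenced artifacts"],
  ["verify", () => "check that the repo is agent-legible"],
  ["github readiness", () => "check that the repo is safe to publish"],
]);

// Resolve argv (without the leading "studio") to a handler.
// Two-word commands such as "render still" are tried before one-word ones.
function dispatch(argv: string[]): string {
  for (const width of [2, 1]) {
    const handler = commands.get(argv.slice(0, width).join(" "));
    if (handler) return handler(argv.slice(width));
  }
  throw new Error(`unknown command: ${argv.join(" ")}`);
}
```

Keeping the table closed is the design choice that matters: an unknown verb is an error, so an agent cannot invent a transition the repo has not named.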

The CLI is not there to make the human feel productive.

It is there so future agents do not have to guess what "done" means.

Why A Registry Matters

Media repos fail differently from normal code repos.

They accumulate assets quickly. Those assets are often large, generated, duplicated, semi-licensed, temporary, or only useful inside one experiment.

Folders alone cannot express that.

The Remotion lab needs registries:

domains/remotion/registry/
  experiments.jsonl
  assets.jsonl
  prompts.jsonl
  promotions.jsonl

A generated asset record might look like:

{
  "id": "asset_20260506_001",
  "path": "domains/remotion/assets/generated/news-texture.png",
  "kind": "image",
  "status": "temporary",
  "source": "generated_by_agent",
  "experiment": "exp_20260506_news_hook",
  "license": "user-generated",
  "sha256": "TODO",
  "notes": "Useful texture, poor headline contrast."
}
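To show what reading such a registry could involve, here is a minimal sketch that parses assets.jsonl content and rejects malformed or duplicate records. The field names come from the example record; the AssetRecord type, the two function names, and the rule that every field must be a string are assumptions made for illustration.

```typescript
// Shape of one line in assets.jsonl, mirroring the example record.
interface AssetRecord {
  id: string;
  path: string;
  kind: string;
  status: string;
  source: string;
  experiment: string;
  license: string;
  sha256: string;
  notes: string;
}

const ASSET_FIELDS = [
  "id", "path", "kind", "status", "source",
  "experiment", "license", "sha256", "notes",
] as const;

// Parse and validate a single JSONL line.
function parseAssetLine(line: string): AssetRecord {
  const value: unknown = JSON.parse(line);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("asset record must be a JSON object");
  }
  const record = value as Record<string, unknown>;
  for (const field of ASSET_FIELDS) {
    if (typeof record[field] !== "string") {
      throw new Error(`asset record field "${field}" must be a string`);
    }
  }
  return record as unknown as AssetRecord;
}

// Parse a whole registry file: one record per non-empty line, unique ids.
function parseAssetsJsonl(text: string): AssetRecord[] {
  const records = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map(parseAssetLine);
  const seen = new Set<string>();
  for (const record of records) {
    if (seen.has(record.id)) throw new Error(`duplicate asset id: ${record.id}`);
    seen.add(record.id);
  }
  return records;
}
```

JSONL keeps each record on its own line, so appending a record is a one-line change and a corrupt entry can be pinned to a single line.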

This changes the agent's job.

Without a registry, the agent asks: "What files are here?"

With a registry, the agent asks: "What state is this artifact in, and what transition is allowed next?"

That is the harness move.

The Promotion Rule

The most important command is not render.

It is promote.

Experimentation should be cheap. Promotion should be strict.

A Remotion experiment should not graduate into the durable library unless:

the prompt is saved
the input assets are indexed
at least one still or preview render exists
critique notes identify what worked and what failed
reusable components are named explicitly
asset provenance is recorded
the quality rubric does not reject the result
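The checklist above translates fairly directly into a gate that returns the reasons a promotion is blocked. This is a sketch under assumptions: the PromotionEvidence shape and the promotionBlockers function are hypothetical, and requiring both a "worked" and a "failed" note is one reading of the critique rule.

```typescript
// Hypothetical summary of what the tool knows about one experiment.
interface PromotionEvidence {
  promptSaved: boolean;
  inputAssetsIndexed: boolean;
  renderCount: number; // stills and previews combined
  critique: { worked: string[]; failed: string[] };
  reusableComponents: string[];
  provenanceRecorded: boolean;
  rubricRejected: boolean;
}

// Returns one human-readable reason per unmet condition.
// An empty array means the experiment may be promoted.
function promotionBlockers(e: PromotionEvidence): string[] {
  const checks: Array<[boolean, string]> = [
    [e.promptSaved, "prompt is not saved"],
    [e.inputAssetsIndexed, "input assets are not indexed"],
    [e.renderCount >= 1, "no still or preview render exists"],
    [
      e.critique.worked.length > 0 && e.critique.failed.length > 0,
      "critique notes do not cover both what worked and what failed",
    ],
    [e.reusableComponents.length > 0, "no reusable components are named"],
    [e.provenanceRecorded, "asset provenance is not recorded"],
    [!e.rubricRejected, "quality rubric rejects the result"],
  ];
  return checks.filter(([ok]) => !ok).map(([, reason]) => reason);
}
```

Returning every blocker at once, rather than failing on the first, gives the agent a complete remediation list in a single pass.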

The repo doctrine becomes:

experiments may be messy
the registry may not be messy
the promoted library may not be messy
the CLI owns transitions between states

This is the cleanest answer to the starting problem: how to keep experimental mess from becoming the repo's operating model.

The repo can support a huge amount of agent-generated mess if the mess is fenced, indexed, and forced through lifecycle transitions.
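"Indexed" is the part of that sentence that is easiest to check mechanically. The sketch below compares the files found on disk with the paths recorded in a registry and reports both kinds of drift. The checkIndex function and IndexReport type are hypothetical; a command such as studio verify could run a check like this, but nothing here describes its actual behavior.

```typescript
interface IndexReport {
  unindexed: string[]; // files on disk with no registry record
  dangling: string[];  // registry records whose file is missing
}

// Compare file paths found on disk with paths recorded in the registry.
// Both inputs are plain path strings; duplicates are ignored.
function checkIndex(
  filesOnDisk: readonly string[],
  registeredPaths: readonly string[],
): IndexReport {
  const onDisk = new Set(filesOnDisk);
  const registered = new Set(registeredPaths);
  return {
    unindexed: [...onDisk].filter((p) => !registered.has(p)).sort(),
    dangling: [...registered].filter((p) => !onDisk.has(p)).sort(),
  };
}
```

A clean report has two empty lists. Anything in unindexed is mess that escaped the fence; anything in dangling is provenance pointing at a file that no longer exists.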

The General Pattern

This case study generalizes beyond Remotion.

For any agent-legible repo, ask:

What are the repo's important objects?
What states can those objects be in?
What transitions should agents perform repeatedly?
Which transitions are dangerous if improvised?
What evidence proves a transition happened correctly?
What metadata must survive future sessions?
What should be mechanically checked?
What should be promoted from human judgment into a rule?

Then build the CLI around those answers.

For a web app, the objects might be features, routes, migrations, user journeys, tests, and releases.

For a research repo, the objects might be sources, claims, evidence packets, syntheses, contradictions, and publications.

For a design system, the objects might be components, variants, tokens, screenshots, audits, and adoption notes.

For a Remotion lab, the objects are experiments, prompts, assets, templates, renders, critiques, promotions, and quality rules.

The CLI is the same pattern with different nouns.

Teaching Walkthrough

If I were teaching this to someone, I would walk them through it like this:

First, show the naive folder tree. It looks reasonable.

Second, ask what happens after fifty experiments. The tree still looks reasonable, but the meaning is gone.

Third, identify the missing layer: state. Which files are temporary? Which are trusted? Which prompt produced which render? Which failure taught a rule?

Fourth, introduce the CLI as the repo's state-transition surface. The CLI is not just a convenience wrapper. It is how the repo prevents agents from silently making up workflow.

Fifth, show studio promote. This is the moment the idea becomes concrete. Promotion is where messy creative work becomes durable, reusable infrastructure.

Sixth, generalize: every serious agent repo needs named objects, allowed transitions, evidence, verification, and cleanup.

That is what "give agents a custom CLI" really means.

The Durable Claim

A repo for agents should not merely be readable.

It should be operable.

Documentation tells the agent what the system means.

The CLI lets the agent change the system without inventing a process.

Verification tells the agent whether the change is acceptable.

Registries tell the agent what survived previous sessions.

Promotion rules turn taste and judgment into repeatable infrastructure.

In the Remotion case, that means agents can generate lots of bad videos without destroying the repo, because the repo has a way to metabolize failure.

That is the point of the custom CLI.