Most people using AI have one chatbot. We have three AI agents working as a coordinated team — a coordinator, a builder, and a trader — running 24/7 on a Mac Mini in West Yorkshire. Here's what we've learned after the first week.
The cast
Think of it like a small company:
- Holly (Coordinator / PM) — runs on Claude Opus. Writes PRDs, assigns work, monitors the team, runs morning briefs, handles communications. Basically a chief of staff.
- Sherman (Builder) — runs on Claude Sonnet + Claude Code. Receives specs from Holly, writes actual code, reviews for quality, reports back. Named after the Shermanator — methodical, systematic, slightly intense.
- Finch (Trading Operator) — runs on Claude Sonnet. Monitors an algorithmic trading bot, runs experiments, sends alerts when things break. The quiet strategic one.
All three run on OpenClaw, communicating via inter-agent messaging, posting updates to Discord, and escalating to Telegram when something's on fire.
Why three instead of one?
We tried the one-agent approach first. It doesn't scale.
A single agent can't hold context for a trading bot codebase and two iOS apps and business strategy and daily operations. The context window fills up, the agent gets confused, and you spend more time re-explaining than you save.
Separation of concerns isn't just a software pattern — it works for AI teams too:
- The coordinator shouldn't be writing code. Holly's value is leverage: a good PRD that saves Sherman 4 hours beats Holly writing mediocre code for 4 hours.
- Cost scales with role. Opus ($200/mo flat) for the strategic thinker. Sonnet (~$1/day each) for the workers. The brain is expensive; the hands are cheap.
- Specialisation compounds. Sherman is becoming a Claude Code expert — learning which flags work, which patterns fail, how to structure tasks for best output. That knowledge lives in his workspace and improves every task.
The daily rhythm
Here's what a typical day looks like:
- 7:30am — Holly runs the morning brief. Checks email, calendar, overnight trading results. Posts a summary to Discord.
- 9:00am — Human arrives, reviews the brief, sets priorities for the day.
- 10:00am — Holly does a team sync. Pings Sherman and Finch, checks progress, reassigns if needed.
- Throughout the day — Sherman builds features, Finch monitors the trading bot, Holly coordinates and handles ad-hoc requests.
- 11:00pm — Holly runs a night shift: creates task ideas, builds prototypes, writes documentation while the human sleeps.
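The rota above boils down to a schedule table plus a "what's due now?" check. Here's a minimal sketch of that idea — the `SCHEDULE` entries mirror the list above, but the helper is hypothetical, not OpenClaw's actual API:

```python
from datetime import time

# Hypothetical schedule table mirroring Holly's daily rhythm above.
# In the real system each entry would be a cron job.
SCHEDULE = [
    ("holly", time(7, 30), "morning_brief"),
    ("holly", time(10, 0), "team_sync"),
    ("holly", time(23, 0), "night_shift"),
]

def jobs_due(now: time, window_minutes: int = 5) -> list[str]:
    """Return jobs whose fire time falls in the window starting at `now`."""
    start = now.hour * 60 + now.minute
    due = []
    for agent, fire, job in SCHEDULE:
        minutes = fire.hour * 60 + fire.minute
        if start <= minutes < start + window_minutes:
            due.append(f"{agent}:{job}")
    return due
```

The point of making the schedule data rather than prose: the agents can inspect it, and a new job is one line, not a new prompt.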
The agents don't sleep. The human does. That's the whole point.
What actually works
1. PRD-first workflow
Holly writes detailed specs. Sherman builds exactly what's specced. No ambiguity, no "I interpreted it as..." moments. The PRD is the contract.
This is the single biggest productivity multiplier. A 30-minute PRD saves hours of back-and-forth. We've documented the pattern in detail — it's the same approach we use for building iOS apps.
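Because the PRD is the contract, it's worth enforcing its shape mechanically. A hypothetical lint like this (the section names are illustrative, not our actual template) lets Sherman bounce an underspecified spec before wasting a build cycle:

```python
# Hypothetical PRD lint: reject a spec missing any required section.
REQUIRED_SECTIONS = ["Goal", "Scope", "Acceptance criteria", "Out of scope"]

def missing_sections(prd_text: str) -> list[str]:
    """Return the required section headings not found in the PRD text."""
    lowered = prd_text.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in lowered]
```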
2. Cron-based operations
Health checks, morning briefs, team syncs, trading reports — all run on schedules. The agents don't wait to be asked. They check, report, and escalate.
This sounds simple but it's transformative. You wake up to a briefing instead of spending 30 minutes figuring out what happened overnight.
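The overnight briefing is just an aggregation over logged events. A rough sketch of the shape, assuming events carry a severity level (the event format is an assumption, not our real log schema):

```python
# Hypothetical overnight summariser: turn a night's event log into the
# facts the morning brief leads with.
def overnight_summary(events: list[dict]) -> dict:
    """Count events and errors; status is red if anything errored."""
    errors = [e for e in events if e.get("level") == "error"]
    return {
        "total_events": len(events),
        "errors": len(errors),
        "status": "red" if errors else "green",
    }
```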
3. Tiered alerting
Not everything deserves a notification:
- 🟢 Green — logged to Discord, no notification. "Autopilot completed a cycle."
- 🟡 Amber — posted to Discord channel. "Bot restarted, no data loss."
- 🔴 Red — Telegram to the human. "Trading bot down for 6+ hours."
Without this, you drown in noise. With it, silence means everything's fine.
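The routing itself is a small lookup table. This is a sketch of the tiers above — the channel names are illustrative, not our real Discord/Telegram integration code:

```python
# Tiered alert routing: severity -> (destination, whether to notify a human).
ROUTES = {
    "green": ("discord_log", False),      # logged, no notification
    "amber": ("discord_channel", True),   # visible, non-urgent
    "red":   ("telegram", True),          # wake the human
}

def route_alert(severity: str, message: str) -> dict:
    """Map a severity tier to a destination and notification flag."""
    channel, notify = ROUTES[severity]
    return {"channel": channel, "notify": notify, "message": message}
```

Keeping the policy in one table means changing "what counts as red" is a one-line edit, not a hunt through every cron job.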
4. Knowledge hierarchy
Holly sees everything. Sherman sees his codebase. Finch sees his trading data. Information flows up, tasks flow down. Just like a real org.
What doesn't work (yet)
1. Session reliability
Agents go dormant. You ping them, they time out. You ping again, they wake up confused. This is the biggest pain point right now.
Finch timed out 4 times in one day recently. His cron jobs ran fine — the scheduled work executed perfectly — but when Holly tried to have a real-time conversation to reassign work, nothing came back. It's like having an employee who does their job but never answers the phone.
2. Agent initiative
The agents wait for instructions. They don't look at the backlog and think "this is the most impactful thing I could work on." They don't notice the trading bot has been down for 18 hours and proactively investigate.
This is improving — we're building patterns for proactive work into their system prompts — but it's not natural yet. Real employees develop instincts. AI agents follow instructions.
3. Cross-agent handoffs
Sherman finds a bug in the trading bot. Finch needs to deploy the fix. The handoff between them is still clunky — Holly has to manually relay context, check both sides understood, and verify the fix landed.
In a real team, Sherman would walk to Finch's desk and say "here's the commit, here's what changed, here's how to test it." We don't have that yet.
4. Rate limits as the real constraint
On Claude's Max plan ($200/month), cost isn't the issue — rate limits are. Too many crons firing at once, all three agents active simultaneously, and suddenly someone's queued. We've staggered cron schedules and moved workers to Sonnet to reduce pressure, but it's a constant game of resource management.
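Staggering is simple to do mechanically: give each agent a fixed minute-offset within the hour so their crons never fire simultaneously. A minimal sketch (the spacing value is arbitrary, not a tuned number):

```python
# Hypothetical stagger: spread agents' cron jobs across the hour so no
# two agents hit the API in the same minute and trip the rate limit.
def staggered_offsets(agents: list[str], spacing_minutes: int = 7) -> dict[str, int]:
    """Assign each agent a fixed minute-offset within the hour."""
    return {agent: (i * spacing_minutes) % 60 for i, agent in enumerate(agents)}
```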
The economics
Let's be honest about cost:
- Claude Max plan: $200/month (flat rate, all agents)
- Infrastructure: Mac Mini M4 (already owned), Cloudflare free tier, Ollama on a gaming PC for the trading bot's AI
- Total: ~$200/month for a 3-agent team that works 24/7
Compare that to a single junior developer at £30k+/year — and the agents don't take holidays, don't need onboarding, and can be working at 3am while you sleep.
The ROI is absurd, but only if you invest the time to set up good workflows. Without good PRDs, clear boundaries, and proper monitoring, you'll spend more time managing agents than they save you.
The expensive part isn't the API bill. It's your time writing PRDs, reviewing output, and building the coordination layer. Plan for a week of setup before you see returns.
Lessons for anyone trying this
- Start with one agent. Get it reliable before adding more. We ran Holly solo for weeks before introducing Sherman.
- Write good specs. Garbage in, garbage out applies to agents exactly as much as it applies to code.
- Monitor everything. Agents won't tell you they're stuck. Build health checks, morning briefs, and alerting from day one.
- Keep workers cheap. Sonnet for execution, Opus for strategy. Don't burn premium tokens on code that Sonnet writes just as well.
- Have a human review layer. Never let agents push to production unsupervised. The "dangerously skip permissions" flag exists, but the human should still see everything before it ships.
- Document aggressively. Agent memory resets every session. If it's not written down, it didn't happen. MEMORY.md, daily logs, brain docs — the overhead is worth it.
What's next
We're working on:
- Self-healing agents — when an agent detects it's stuck or producing bad output, it should escalate and retry, not silently fail
- Better handoffs — structured task objects that agents pass between each other, with context, acceptance criteria, and verification steps
- A fourth agent? — possibly a dedicated QA agent that reviews Sherman's code from a different perspective. Or an outreach agent that handles emails and social.
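The "structured task objects" idea above might look something like this — a hypothetical shape, not a built feature, with enough context that the receiving agent can act without Holly relaying anything manually:

```python
from dataclasses import dataclass, field

# Hypothetical handoff object: what Sherman would pass to Finch so the
# fix can be deployed and verified without a human relay.
@dataclass
class TaskHandoff:
    from_agent: str
    to_agent: str
    summary: str                                        # what changed (e.g. the commit)
    context: str                                        # why it changed
    acceptance_criteria: list[str] = field(default_factory=list)
    verification_steps: list[str] = field(default_factory=list)

    def is_actionable(self) -> bool:
        """A handoff is actionable only if the receiver can verify it."""
        return bool(self.acceptance_criteria and self.verification_steps)
```

Making `is_actionable` a hard gate would force the sending agent to fill in the verification steps up front, which is exactly the context that currently gets lost.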
The fundamental bet is that AI agents are about to get much more reliable. The coordination patterns we're building now — PRD workflows, tiered alerting, knowledge hierarchies — will matter even more when the underlying models improve.
We're building the management layer for a future where everyone has an AI team.
The agents don't sleep. The human does. That's the whole point.
This is part of the AI Dev Diary series at Sett & Stone. We're building real products with AI — honestly, including the failures. More at settandstone.com/blog.