Who runs the best outbound?

We made four AI models run our real outbound pipeline — same brief, 51 times — and graded every email. One of them isn't close.

A taste of what we found

One model caught a backwards fact before it hit a prospect's inbox.

20260622115721-claude · CEEZER · Matt Schmid · verdict: do_not_ship

What the draft claimed: Lufthansa Group was “one of six providers in CEEZER’s 14-project climate portfolio.”

What was actually true: The press (ESG News, May 26, 2026; Airport Industry News, May 22, 2026) says the inverse: CEEZER is one of six providers in Lufthansa Group’s portfolio.

Anchor re-fetched the primary sources, caught the reversed claim, and blocked the send pending a rewrite.

That's the email that would've gone out with a reversed fact about the prospect's own partnership. Anchor — running on Opus — stopped it.

What we did

We built our outbound pipeline as five specialist AI agents.

Then we asked the obvious question: which model should run them? So we made four fight it out — the exact same pipeline, on identical briefs, 51 times, in a sandbox that never touches a real send queue. Each model had to find the trigger and verify the facts itself.

Ridge: Discovery

Cairn: Research

Setter: Play design

Scribe: Copy

Anchor: Adversarial QA

The full results

So — who won?

Every model scored, the real costs, the failure modes, and the one we trust with a client's name. Drop your email — it all unlocks instantly.

No spam. One email, from Amit.

We run this rigor before a single email goes out.

Every client system gets stress-tested like this. Want it pointed at your outbound?

Book a call