Agents Frame
blog
Postmortems on agents that picked the wrong frame, benchmarks for the libraries behind the API, and the occasional opinion on where multi-step agents are going.
Single-model agents collapse every question into one frame. POST an intent to /v1/think and get three thinking frameworks back, ranked by ELO and ready to paste — so the agent can sanity-check the obvious answer before it commits to a destructive action.
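A minimal sketch of what that call could look like from inside an agent loop. The real /v1/think schema isn't published in this post, so the field names, the base URL, and the exact response shape below are assumptions; only the "three frameworks, ranked by ELO" contract comes from the text.

```typescript
// Hypothetical request/response shapes for /v1/think — illustrative only.
interface ThinkRequest {
  intent: string; // free-text description of the decision the agent faces
}

interface Frame {
  name: string;   // e.g. "Inversion"
  elo: number;    // per-language ELO score used for ranking
  prompt: string; // ready-to-paste framing text
}

interface ThinkResponse {
  frames: Frame[]; // three frames, sorted by ELO descending
}

// Sketch of a call from an agent loop (base URL is a placeholder).
async function think(intent: string): Promise<ThinkResponse> {
  const res = await fetch("https://api.example.com/v1/think", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ intent } satisfies ThinkRequest),
  });
  return res.json() as Promise<ThinkResponse>;
}
```

The agent can then compare its default plan against each returned frame before acting.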

First Principles, Inversion, Sunk Cost, Second-Order, Probabilistic, Margin of Safety, Circle of Competence, Hanlon's Razor, Occam's Razor, Confirmation Bias, Availability Heuristic, Dunning-Kruger, Jobs-to-be-Done, Lean Startup, SWOT — what each frame is good for, and where it breaks.
The four-step pipeline behind /v1/think: detect language, embed via Gemini, pgvector cosine recall, ELO rerank. End-to-end p50 sits at ~430ms — predictable enough to drop inside agent loops without budgeting an extra 5-second tail.
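The recall-then-rerank half of that pipeline can be sketched in a few lines. The cosine formula is standard; how the ELO prior is blended with the cosine score is not specified in the post, so the normalization below (ELO scaled by 1/1000 and added to cosine) is an invented placeholder, as are the `Candidate` and `rerank` names.

```typescript
// A candidate framework after pgvector recall: name, cosine similarity
// to the query embedding, and its per-language ELO score.
interface Candidate {
  name: string;
  cosine: number;
  elo: number;
}

// Plain cosine similarity — what pgvector's cosine distance is built on.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rerank recall results with the ELO prior. The 1/1000 scaling that puts
// ELO on a comparable range to cosine is an assumption, not the real blend.
function rerank(cands: Candidate[], k = 3): Candidate[] {
  return [...cands]
    .sort((x, y) => (y.cosine + y.elo / 1000) - (x.cosine + x.elo / 1000))
    .slice(0, k);
}
```

Keeping the rerank as a pure in-memory pass over a short recall list is what keeps the tail latency predictable: the only network hops are the embedding call and one indexed pgvector query.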

Why we picked next-forge over rolling our own monorepo: hand-written Tailwind sections plus dictionary-driven copy beat BaseHub-fetched homepages for control; the @repo split keeps Clerk and Stripe off the marketing surface; and Bun + Turborepo cuts cold builds to ~4 seconds.

A single tools/call to the route tool returns in well under a second and needs no streaming progress. Keeping the wire format stateless makes horizontal scaling trivial and avoids forcing every host (Claude Desktop, Cursor, Windsurf) to implement SSE replay correctly.
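Concretely, one round trip looks like a plain JSON-RPC 2.0 exchange. MCP's tools/call envelope really does take a tool `name` plus `arguments` and returns `content` blocks, but the `route` tool name, its argument shape, and the example text below are assumptions drawn from this post, not a published schema.

```typescript
// One stateless MCP round trip: nothing here references a session,
// so any replica can serve the response and no SSE resume is needed.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "route", // hypothetical tool name from the post
    arguments: { intent: "should we delete the staging database?" },
  },
};

const response = {
  jsonrpc: "2.0",
  id: 1, // JSON-RPC correlates by id, not by connection state
  result: {
    content: [
      { type: "text", text: "Inversion, Second-Order, Margin of Safety" },
    ],
  },
};
```

Because the correlation lives entirely in the `id` field, a load balancer can route each call independently; there is no replay buffer to maintain per client.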
Each framework carries a per-language ELO score updated by /v1/feedback. K-factor 16, decay-free at V0, applied in the rerank step on top of pgvector cosine recall. The cumulative effect: routing accuracy improves with usage instead of staying frozen at the day-one embedding distance.
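The update itself is the classic ELO formula with K = 16 as stated above. What counts as a "winner" and "loser" per feedback event isn't spelled out in the post, so treating each feedback signal as a pairwise win is an assumption; the expected-score math is standard.

```typescript
// Standard ELO update with the K-factor from the post.
const K = 16;

// Expected score of A against B under the logistic ELO model.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Apply one feedback event: the preferred framework "wins" the pair.
// (Pairwise win/loss framing is an assumption, not the documented scheme.)
function updateElo(winner: number, loser: number): [number, number] {
  const eWin = expectedScore(winner, loser);
  const eLose = expectedScore(loser, winner);
  return [winner + K * (1 - eWin), loser + K * (0 - eLose)];
}
```

With K = 16, two equally rated frameworks move by 8 points per feedback event, which is why the ranking drifts steadily with usage rather than oscillating.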