posts/ideas/engineering-harness-ai-bottleneck-approval.md

Post Idea: The Engineering Harness — How AI Changed Our Bottleneck from "Can We Build It?" to "Should We Ship It?"

Status: Idea
Date: 2026-05-22
Source: Operating Model (qwestly-docs/docs/Engineering/operating-model.md), OpenAI "Harness Engineering" article, Qwestly MCP ecosystem, Workspace Harness doc, Engineering Excellence talks

Hook

We documented our engineering operating model — a detailed spec for how we ship software: feature lifecycle, planning docs, EM signoff gates, PR discipline, testing expectations, Asana paper trails. It's thorough. It's correct. And we built it assuming humans would do most of the work.

Then we built AI agents that could read and write our docs, search our meetings, manage our Asana tasks, review our code, and ship features. And suddenly the bottleneck flipped — it's no longer "can we build this?" It's "should we ship this?"

The Before Picture

Our operating model identifies 8 problems we were correcting:

  • Multiple unrelated features in one PR
  • Stale branches not rebased
  • Authors not reviewing their own diff
  • Over-engineered and under-engineered solutions
  • Low test coverage
  • Critical decisions reviewed too late
  • Weak paper trail in Asana
  • Complex features taking weeks

Each of these is a human-scale problem — limited attention, context switching, forgetting to document, not wanting to open yet another tab.

The AI Layer We Built

We didn't replace the operating model. We wired every agent to MCP servers that tap directly into the services the model already described:

| MCP | What it connects to | What agents can do |
| --- | --- | --- |
| Qwestly MCP (qwestly-mcp-dev) | Engineering docs, 18 prompt templates, Asana formatting | Read our operating model, apply code review prompts, create PRs, write planning docs |
| Granola MCP (granola-local-mcp) | Meeting notes & transcripts | Pull context from past decisions, search discussions |
| Asana MCP | Task management, project tracking | Update tickets with PR links, maintain paper trail |
| Notes MCP (local-notes) | Local markdown engineering notes | Search, read, create, update engineering notes (fast, local, no API key) |
| Vercel (via MCP/CLI) | Deploy previews, logs | Check deployment status, verify builds |

Every agent — whether running in vim (Avante.nvim + ACP), Zed (native ACP), or the terminal (Claude Code → DeepSeek) — has access to all of these. The tools follow the agent, not the editor.
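
One way to picture "the tools follow the agent, not the editor" is a single shared registry that every front end resolves against. The sketch below is illustrative only. The server descriptions come from the table above, but the `MCP_SERVERS` layout, the `asana` and `vercel` keys, the front-end labels, and the `tools_for` helper are all hypothetical, not our actual configuration format.

```python
# Hypothetical sketch: one shared MCP registry, resolved the same way for every front end.
# Descriptions are taken from the table above; the data layout and key names are assumed.
MCP_SERVERS = {
    "qwestly-mcp-dev": "Engineering docs, 18 prompt templates, Asana formatting",
    "granola-local-mcp": "Meeting notes & transcripts",
    "asana": "Task management, project tracking",        # assumed key name
    "local-notes": "Local markdown engineering notes",
    "vercel": "Deploy previews, logs",                   # assumed key name
}

# Illustrative labels for the three setups mentioned in the text.
FRONT_ENDS = ["avante-nvim", "zed", "claude-code-terminal"]


def tools_for(front_end: str) -> list[str]:
    """Return the MCP servers available to an agent running in `front_end`.

    The front end is only validated, never used to filter: the registry is shared.
    """
    if front_end not in FRONT_ENDS:
        raise ValueError(f"unknown front end: {front_end}")
    return sorted(MCP_SERVERS)


def same_tools_everywhere() -> bool:
    """True when every front end resolves to an identical tool set."""
    tool_sets = {tuple(tools_for(fe)) for fe in FRONT_ENDS}
    return len(tool_sets) == 1
```

The design point is that switching editors changes nothing about what the agent can reach, because no tool is registered per editor.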

The Effect

The work shifts to the design plan phase

The human role has fundamentally changed. Instead of spending days coding, debugging, and wiring things together, the bulk of human effort now goes into the design plan — defining scope, architecture, APIs, data models, and constraints. Once the plan is solid, the agent executes.

Real example — Qwestly Career Agent:

  • ~2 days of human design planning (architecture, data model, prompt strategy, tool contracts, frontend integration points)
  • Agent implements the entire system: backend orchestration, frontend integration, tests, deployment — in a matter of hours
  • QA, paper trail, Asana linking all handled by the agent as part of the loop

The ratio flipped: planning is the heavy lift. Implementation is the easy part.

The agent has more diagnostic context than any human

A human debugging a bug opens a few tabs. An agent debugging the same bug:

  1. Starts from the workspace harness — sees the full multi-project layout (e.g. candidate/ frontend + api-python/ backend) in one view, not just the repo they happen to have open
  2. Reads server logs and client-side logs — both wired into the MCP toolchain, so it can correlate errors across the stack instantly
  3. Opens a browser to reproduce the bug visually, inspect DOM state, and verify fixes — same Chrome DevTools Protocol approach the OpenAI harness engineering team used
  4. Diagnoses and solves problems faster than we ever could manually because it doesn't context-switch between tools
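
Step 2 above, correlating errors across the stack, can be sketched roughly as follows. This assumes that server and client logs share a request identifier. The `LogEntry` shape and the `request_id` field are hypothetical and only illustrate the idea, not our real log format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    source: str      # "server" or "client"
    request_id: str  # assumed correlation key shared across the stack
    level: str       # e.g. "info", "error"
    message: str


def correlate_errors(entries: list[LogEntry]) -> dict[str, list[LogEntry]]:
    """Group error entries by request_id, keeping requests that failed on both sides."""
    by_request: dict[str, list[LogEntry]] = defaultdict(list)
    for entry in entries:
        if entry.level == "error":
            by_request[entry.request_id].append(entry)
    # Keep only requests where both the server and the client logged an error.
    return {
        request_id: group
        for request_id, group in by_request.items()
        if {e.source for e in group} >= {"server", "client"}
    }
```

A human does the same join by flipping between two log viewers. The agent gets it in one pass because both log sources sit behind the same toolchain.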

A typical cycle:

  1. Agent reads the operating model to know the process
  2. Agent searches Granola meetings for prior discussion context
  3. Agent checks Asana for ticket details and acceptance criteria
  4. Agent reads engineering docs for architecture context
  5. Agent generates a planning doc (via MCP prompt template)
  6. Agent implements across repos — frontend + backend + shared lib — with full code review prompt self-check
  7. Agent debugs end-to-end: server logs, client logs, browser verification
  8. Agent creates PR with description, links everything back in Asana
  9. Agent checks deploy status on merge
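
The nine steps above amount to an ordered pipeline that halts at the first failing step. A minimal sketch, assuming each step can be modeled as a function returning success or failure: the step names paraphrase the list, while `make_step` and `run_cycle` are hypothetical placeholders rather than real agent code.

```python
from typing import Callable

# Each step receives the shared context and returns True on success.
Step = Callable[[dict], bool]

# Step names paraphrase the nine-step cycle described above.
CYCLE_STEPS = [
    "read operating model",
    "search Granola meetings",
    "check Asana ticket",
    "read engineering docs",
    "generate planning doc",
    "implement across repos",
    "debug end-to-end",
    "create PR and link in Asana",
    "check deploy status",
]


def make_step(name: str, succeeds: bool = True) -> Step:
    """Build a placeholder step that records its name in the context log."""
    def run(ctx: dict) -> bool:
        ctx["log"].append(name)
        return succeeds
    return run


def run_cycle(steps: list[Step]) -> dict:
    """Run steps in order and stop at the first one that fails."""
    ctx: dict = {"log": [], "status": "done"}
    for step in steps:
        if not step(ctx):
            ctx["status"] = f"stopped at: {ctx['log'][-1]}"
            break
    return ctx
```

For example, `run_cycle([make_step(name) for name in CYCLE_STEPS])` runs all nine placeholders in order, and a failing planning step stops the cycle before any implementation happens.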

The new bottleneck: approval

Engineering speed is no longer the constraint. The team ships faster than anyone can reasonably review. The operating model's signoff gates (EM approval for architecture, DB changes, security, etc.) are now the critical path — not coding.

This is a good problem to have. It means the harness is working.
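
The flip can be expressed as simple arithmetic: a two-stage pipeline ships at the rate of its slower stage. The sketch below is a toy model with made-up parameters, not measurements from our team. The function name and every number used with it are assumptions for illustration.

```python
def weekly_throughput(
    build_days_per_feature: float,
    review_capacity_per_week: float,
    builders: int = 1,
    workdays_per_week: float = 5.0,
) -> tuple[float, str]:
    """Return (features shipped per week, name of the limiting stage).

    Toy model: shipped work is capped by the slower of building and approving.
    """
    build_rate = builders * workdays_per_week / build_days_per_feature
    shipped = min(build_rate, review_capacity_per_week)
    bottleneck = "approval" if review_capacity_per_week < build_rate else "implementation"
    return shipped, bottleneck


# Illustrative numbers only: a feature that takes 10 working days to build
# versus half a day, with reviewers able to approve 3 features per week.
before = weekly_throughput(build_days_per_feature=10, review_capacity_per_week=3)
after = weekly_throughput(build_days_per_feature=0.5, review_capacity_per_week=3)
```

With these assumed inputs, `before` is limited by implementation and `after` is limited by approval, even though review capacity never changed.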

Code quality went up, not down

Counterintuitively, agent-generated code is more consistent, for three reasons:

  • The 18 prompt templates (code-review, security-review, accessibility-audit, write-unit-tests, deslop, add-error-handling, etc.) get applied every time, not just when someone remembers
  • The operating model's standards (one PR = one feature, author review before reviewer, planning doc requirements) are mechanically enforceable by agents
  • The AGENTS.md and workspace harness give agents a map of the codebase, so they don't guess
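
To make "mechanically enforceable" concrete, here is a minimal sketch of the three standards named above as checks an agent could run before requesting review. The `PullRequest` shape and its field names are hypothetical, and treating "planning doc requirements" as "a planning doc must be linked" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    features: list[str]                      # features touched by this PR
    author_self_reviewed: bool               # author read their own diff first
    planning_doc_url: Optional[str] = None   # assumed: link to the planning doc


def violations(pr: PullRequest) -> list[str]:
    """Return the operating-model standards this PR fails, in a fixed order."""
    problems = []
    if len(pr.features) != 1:
        problems.append("one PR = one feature")
    if not pr.author_self_reviewed:
        problems.append("author review before reviewer")
    if not pr.planning_doc_url:
        problems.append("planning doc required")
    return problems
```

An empty list means the PR is ready for a human reviewer. Anything else goes back into the agent loop first.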

The paper trail is better

The operating model demands: "the Asana task should stay the async source of truth... link the plan-review PR, the implementation PR, and other artifacts." Agents do this automatically because it's in the MCP toolchain. No more "I'll add the link later."

The Engineering Harness Concept

This is the idea from the OpenAI "Harness Engineering" article (Feb 2026) that resonated: the practice of designing the environment agents work in, rather than writing code directly. Dom built a concrete implementation of this:

  • Workspace harness (workspace-harness.md) — opinionated multi-repo layout with scripts, AGENTS.md, and conventions
  • Qwestly MCP — 18 prompt templates + docs CRUD tools that make engineering knowledge actionable by agents
  • Granola + Asana MCPs — meeting context and project management wired directly into the agent loop
  • ACP setup: portable across Vim, Zed, Cursor, and the terminal, with no editor lock-in

The operating model documents what the process should be. The MCP ecosystem makes it executable by AI.

Key Tension Worth Exploring

The operating model was written before the AI tooling was this capable. Some of its assumptions are already outdated:

  • "Authors not reviewing their own diff" → agents can review their own diff as part of the loop
  • "Plan review in a draft PR" → agent can generate a plan, self-review, and open a ready-to-review PR
  • "EM signoff before implementation" → this becomes the bottleneck when implementation takes hours instead of days

Should the operating model be updated to factor in AI throughput? Or does it stay as a human-layer process while agents operate within it?

Potential Structure

  1. The operating model — what we wrote down as our ideal process
  2. The eight problems we were trying to fix
  3. The MCP layer: how we wired services into every agent
  4. The effect: from weeks to days, with the bottleneck flipped to approval
  5. The harness concept: designing the environment, not writing code
  6. Open question: do we update the operating model for agent throughput?

Tags

engineering ai operating-model harness-engineering mcp agents workflow devtools