Building an 8-layer agentic delegation system: orchestration, routing, and cost discipline across a multi-LLM stack

The problem this solves

Running multiple AI models across multiple providers without a routing framework produces predictable outcomes: premium model tokens burned on low-value tasks, no separation between the agent doing the work and the agent checking it, no cost discipline, no status tracking between agents, and no systematic handoff protocol.

This framework addresses each of those failures with a layered hierarchy, free-first routing, and a structured handoff protocol.

The 8-layer hierarchy

Each layer has a defined ownership scope, specific inputs, and specific outputs. Nothing crosses layers informally. Layer 0 is the human principal; Layers 1 through 8 are the eight agent layers.

Layer 0 — Principal. Human decision owner. All escalations terminate here.

Layer 1 — Executive Orchestrators. Set goals, define scope, assign workflows, approve escalations, accept final output on high-stakes tasks.

Layer 2 — Delegation Managers / Routers. Choose model and service. Decide cheapest viable route vs premium strategist. Choose Witsy vs Ollama vs paid API. Enforce cost and latency rules.

Layer 3 — Researchers / Extractors. Pull source data. Scrape or retrieve task inputs. Normalize into structured payloads for downstream layers.

Layer 4 — Scorers / Classifiers. First-pass issue scoring, clustering, categorization, priority assignment, evidence labeling.

Layer 5 — Writers / Generators. Create drafts. Produce reports. Generate structured outputs.

Layer 6 — Critics / QA. Challenge the draft. Verify fit against requirements. The critic is always a separate model from the writer — never self-review.

Layer 7 — Publishers / Executors. Save files. Update dashboards. Push outputs to storage. Hand off to implementation channels.

Layer 8 — Monitors / Reporters. Track status. Normalize metrics. Maintain workflow state.
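As a minimal sketch, the hierarchy can be written down as an enumeration, with the critic rule from Layer 6 enforced in code. The names `Layer` and `validate_assignments` and the model strings are hypothetical, chosen only for illustration; they are not part of the framework.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layer 0 is the human principal; Layers 1-8 are agent layers."""
    PRINCIPAL = 0
    EXECUTIVE_ORCHESTRATOR = 1
    DELEGATION_ROUTER = 2
    RESEARCHER = 3
    SCORER = 4
    WRITER = 5
    CRITIC = 6
    PUBLISHER = 7
    MONITOR = 8


def validate_assignments(assignments: dict[Layer, str]) -> None:
    """Raise ValueError if writer and critic share a model (no self-review)."""
    writer = assignments.get(Layer.WRITER)
    critic = assignments.get(Layer.CRITIC)
    if writer is not None and writer == critic:
        raise ValueError(f"critic must differ from writer, both are {writer!r}")


# Hypothetical model names; a distinct writer and critic pass the check.
validate_assignments({Layer.WRITER: "model-a", Layer.CRITIC: "model-b"})
```

Running the check when a workflow is assigned, rather than at review time, means a self-review configuration fails before any tokens are spent.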

Free-first routing logic

The default execution rule: use the cheapest viable route. Escalate to paid or premium models only when the task genuinely requires it.

Task type	Default route	Escalate when
Bulk extraction	Cheapest reliable worker	Extraction quality weak
Classification/scoring	Cheap structured worker	Ambiguity high
Strategic brief	Premium strategist	High-stakes scope
Draft writing	Strong generator	Quality weak
QA/critique	Separate critic model	Risk high
Local/private drafting	Ollama peer¹	Quality insufficient
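
The table above can be sketched as a lookup that a Layer 2 router might consult. This is an illustration under assumptions: the dictionary keys, the `Route` type, and the single `ESCALATION_TARGET` placeholder are hypothetical, and the table itself does not say which model an escalation should land on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    default: str        # "Default route" column
    escalate_when: str  # "Escalate when" column


# Rows copied from the routing table; the key names are hypothetical.
ROUTES = {
    "bulk_extraction": Route("cheapest reliable worker", "extraction quality weak"),
    "classification_scoring": Route("cheap structured worker", "ambiguity high"),
    "strategic_brief": Route("premium strategist", "high-stakes scope"),
    "draft_writing": Route("strong generator", "quality weak"),
    "qa_critique": Route("separate critic model", "risk high"),
    "local_private_drafting": Route("Ollama peer", "quality insufficient"),
}

# Placeholder only: the escalation destination is an assumption.
ESCALATION_TARGET = "paid or premium model"


def choose_route(task_type: str, escalate: bool = False) -> str:
    """Return the default route unless the caller has flagged an escalation."""
    route = ROUTES[task_type]  # KeyError for unknown task types is intentional
    return ESCALATION_TARGET if escalate else route.default
```

Deciding whether an escalation condition actually holds (for example, that extraction quality is weak) is left to the caller, since the table names the conditions but not how to measure them.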

The Witsy substrate

Witsy (local service, port 8090) functions as the primary delegation hub — routing tasks across an engine inventory spanning OpenRouter, Mistral, Google Gemini, Groq, Cerebras, OpenAI, Anthropic, DeepSeek, xAI, and Ollama.²

Ollama operates as a peer local inference layer, not a subordinate Witsy dependency.¹ Recommended use cases: local low-cost summarization, quick reasoning or coding helpers, preprocessing, private draft generation, and overflow work when central routes are constrained.
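
For the local summarization use case, a minimal sketch of a direct call to Ollama's HTTP API might look like the following. It assumes Ollama is listening on its default port (11434) and that a model named `llama3` has already been pulled; the model name, prompt wording, and function names are illustrative assumptions. Witsy's own API is not shown because the source does not describe it.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the install uses another port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_locally(text: str, model: str = "llama3", timeout_sec: int = 180) -> str:
    """Send the request to a running Ollama instance and return its text."""
    req = build_ollama_request(f"Summarize briefly:\n\n{text}", model)
    with urllib.request.urlopen(req, timeout=timeout_sec) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any inference runs.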

Inbox/outbox protocol

Cross-agent handoff uses a structured message schema with defined status states:

{
  "task_id": "uuid-or-timestamp",
  "agent_from": "source-agent",
  "agent_to": "target-agent",
  "instruction": "detailed task",
  "model": "preferred-model",
  "timeout_sec": 180,
  "priority": "normal"
}
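
One way to keep messages consistent with this schema is a small typed wrapper, sketched below. The field names come from the schema above; treating all seven as required, and the `TaskMessage` and `parse_message` names, are assumptions made for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

# Assumption: every field in the schema is mandatory on the wire.
REQUIRED = {"task_id", "agent_from", "agent_to", "instruction",
            "model", "timeout_sec", "priority"}


@dataclass
class TaskMessage:
    agent_from: str
    agent_to: str
    instruction: str
    model: str
    timeout_sec: int = 180
    priority: str = "normal"
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize for an agent's outbox."""
        return json.dumps(asdict(self))


def parse_message(raw: str) -> TaskMessage:
    """Parse an inbox message, rejecting any that lack a schema field."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return TaskMessage(**data)
```

A round trip such as `parse_message(msg.to_json())` should return an equal message, which gives a cheap sanity check for both sides of a handoff.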

Status states: received → running → complete on the normal path, with error and escalation needed as the failure states.
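
The five status states can be modelled as a small state machine so that agents cannot report impossible transitions. The state names come from the protocol; the transition table is an assumption, since the source lists the states but does not say which moves are legal.

```python
from enum import Enum


class Status(Enum):
    RECEIVED = "received"
    RUNNING = "running"
    COMPLETE = "complete"
    ERROR = "error"
    ESCALATION_NEEDED = "escalation needed"


# Assumed transitions: complete and escalation needed are terminal here.
ALLOWED = {
    Status.RECEIVED: {Status.RUNNING, Status.ERROR},
    Status.RUNNING: {Status.COMPLETE, Status.ERROR, Status.ESCALATION_NEEDED},
    Status.ERROR: {Status.ESCALATION_NEEDED},
    Status.COMPLETE: set(),
    Status.ESCALATION_NEEDED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

A Layer 8 monitor could apply `advance` to every status update it records, which surfaces protocol violations as soon as they happen.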

7 workflow types

Each workflow is a defined chain from orchestrator to publisher — templates, not ad hoc decisions.

Article generation: SERP/top-10 extractor → factor matrix scorer → brief writer → article drafter → critic/SEO QA → publisher.

GBP / local optimization: Local research extractor → profile auditor → landing-page matcher → review/proof strategist → critic/QA → implementer/tracker.

400-site audit: Bulk extractor → first-pass classifier → issue clusterer → strategist summarizer → critic/QA → report generator. Core rule: do not use premium models on every site first.

Marketing plan generation: Baseline aggregator → package-fit scorer → roadmap drafter → critic/QA → dashboard initializer.

Additional workflow templates: image generation, video generation, dashboard/reporting.
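
Because each workflow is a fixed chain, the four chains spelled out above can be stored as ordered templates, as in the sketch below. The stage names are taken from the text; the dictionary keys and the `handoffs` helper are hypothetical. The three additional templates are omitted because their stages are not specified.

```python
# Ordered stage lists copied from the workflow descriptions above.
WORKFLOWS = {
    "article_generation": (
        "SERP/top-10 extractor", "factor matrix scorer", "brief writer",
        "article drafter", "critic/SEO QA", "publisher",
    ),
    "gbp_local_optimization": (
        "local research extractor", "profile auditor", "landing-page matcher",
        "review/proof strategist", "critic/QA", "implementer/tracker",
    ),
    "site_audit": (
        "bulk extractor", "first-pass classifier", "issue clusterer",
        "strategist summarizer", "critic/QA", "report generator",
    ),
    "marketing_plan_generation": (
        "baseline aggregator", "package-fit scorer", "roadmap drafter",
        "critic/QA", "dashboard initializer",
    ),
}


def handoffs(workflow: str) -> list[tuple[str, str]]:
    """Return the (sender, receiver) pairs for each handoff in a workflow."""
    stages = WORKFLOWS[workflow]
    return list(zip(stages, stages[1:]))
```

Each pair returned by `handoffs` corresponds to one inbox/outbox message, so a six-stage workflow produces five handoffs.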

Cost discipline

Default to free-first
Reserve premium models for synthesis, executive QA, and genuinely hard reasoning
Keep writer and critic separate — never self-review
Do not burn premium tokens on bulk first-pass tasks
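
The last rule can be sketched as a two-pass loop: a cheap first pass scores every item, and only items above a threshold reach the premium model. The function name, the callables, and the 0.5 threshold are hypothetical placeholders rather than part of the framework.

```python
from typing import Callable, Iterable


def escalate_flagged(
    items: Iterable[str],
    cheap_pass: Callable[[str], float],
    premium_pass: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off; tune per workflow
) -> dict[str, str]:
    """Run the cheap scorer on everything, the premium model only on flagged items."""
    results: dict[str, str] = {}
    for item in items:
        score = cheap_pass(item)  # bulk first pass on the cheap route
        if score >= threshold:    # only flagged items cost premium tokens
            results[item] = premium_pass(item)
    return results
```

In the 400-site audit, for instance, `cheap_pass` would stand in for the first-pass classifier and `premium_pass` for the strategist summarizer, so premium spend scales with the number of flagged sites rather than with all 400.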

Sources

¹ Ollama. Local LLM Inference. ollama.com. Open-source tool for running large language models locally via HTTP API. Used as a peer inference layer within the Zeus multi-agent stack.
² Witsy. Multi-Provider AI Client. witsyai.com. Desktop AI client supporting multiple LLM providers via unified API routing. Configured at port 8090 within the Zeus infrastructure.