Agentic Architecture Overview

Foundational concepts for designing a system where an orchestrator agent receives a user request, figures out intent, and dispatches work to sub-agents or tool calls. No framework-specific opinions here — just the building blocks you'll need regardless of what you pick.

1. What Is an "Agent" in 2026?

The term "agent" is overloaded. In practice, an agent is a loop:

LLM call → parse output → execute tool(s) → feed results back → LLM call again

This loop is the atomic unit. Everything else — routing, sub-agents, memory, RAG — is layered on top. If there's no tool-calling loop, it's not an agent; it's a fancy completion.

Degrees of agency

Level	What it does	Example
Tool-augmented LLM	One-shot: LLM picks tools, executes them, returns result. No loop.	"Classify this email" → calls a classifier tool → done.
Single-step agent	LLM calls tools, gets results, generates final answer. One loop turn.	"What do we know about user X?" → query DB → format answer.
Multi-step agent	Can chain multiple tool calls, using prior results as context.	"Generate a Qwestly Card" → fetch LinkedIn data → analyze gaps → write sections → format PDF.
Orchestrator	An agent whose primary "tool" is delegating to other agents.	"Update my LinkedIn" → decides: needs data ingestion + content generation → spawns sub-agents.

For Qwestly, you'll want at least multi-step agents for individual tasks and an orchestrator agent to route between them.

2. Agent Topologies (How Agents Talk to Each Other)

A. Supervisor / Orchestrator (recommended for Qwestly)

User → [Orchestrator Agent] ──routes to──→ [Specialist Agent A]
                                     └───→ [Specialist Agent B]
                                     └───→ [Tool/API calls]

One "main" agent receives the request, classifies intent, and delegates.
Delegation can be: spawn a sub-agent, make a tool call, or answer from memory.
Pros: Clear control flow, easy to audit, easy to add new capabilities.
Cons: Orchestrator is a single point of design complexity; if it mis-classifies intent, the wrong agent fires.

B. Round-Robin / Debate

Agent A → Agent B → Agent C → (back to A until consensus)

Agents talk to each other in turns, converging on an answer.
Popularized by AutoGen and ChatDev.
Pros: Good for tasks requiring multiple perspectives (e.g., code generation + review).
Cons: Hard to control, expensive (many LLM calls), nondeterministic.
Not recommended for Qwestly's use case — you don't need debate, you need routing.

C. Hierarchical

Supervisor
  ├── Team Lead A
  │     ├── Worker A1
  │     └── Worker A2
  └── Team Lead B
        ├── Worker B1
        └── Worker B2

Multi-level delegation. A supervisor delegates to team leads, who delegate to workers.
Pros: Scales to very complex systems (think: "build a whole app").
Cons: Deep nesting = latency, complexity, debugging hell.
Maybe useful if Qwestly grows to have departments of agents (data ingestion team, content gen team, export team), but overkill for v1.

D. Tool-based (Agents as Functions)

No "sub-agents" at all — the orchestrator sees everything as a tool call.
generate_qwestly_card(user_id) is just another tool in the orchestrator's toolbox.
The tool itself may be backed by a simple script, not a full agent loop.
Pros: Simplest mental model. Easy to test. Reuses existing services.
Cons: If a tool needs sub-reasoning (e.g., "decide which card template to use"), that logic has to live somewhere — either in the tool implementation or in the orchestrator's prompt.

My recommendation for Qwestly: Start with Tool-based for most things, graduate to Supervisor with a few specialist agents when the single orchestrator's prompt gets too long or the routing logic becomes genuinely complex.

3. The Orchestrator Loop

Whether you use a framework or build from scratch, the core loop is:

1. Receive user message
2. System prompt + conversation history → LLM
3. LLM responds with either:
   a. A tool call (function-calling / tool-use API)
   b. A final text answer
4. If tool call: execute tool, append result to conversation, go to step 2
5. If final answer: return to user

Key design decisions

a. Structured outputs vs. raw text parsing

Function-calling API (OpenAI, Anthropic, Gemini all support it natively) is the gold standard. The LLM returns JSON-structured tool invocations. No parsing needed.
Raw text parsing (regex the model's output for tool calls) was necessary before function-calling existed. Don't do this in 2026.

b. Max iterations

Always set a limit (e.g., 10 tool calls per request). Otherwise a confused agent can loop infinitely burning tokens.

c. Error handling

What happens when a tool returns an error? The agent should see the error and decide: retry, try something else, or apologize. This is automatic in any good framework.

d. Streaming

Can the user see the agent's reasoning as it happens? This is important for UX — a blank "thinking..." state for 30 seconds feels broken. Streaming tokens + tool-call status messages is the standard.

4. Memory

Three kinds of memory matter for an agentic system:

Type	Where it lives	What it stores	Qwestly use case
Conversation memory	In-memory (session) or DB	Current chat history	"What did we just say?"
User profile memory	Database (Postgres, etc.)	Facts about the user	LinkedIn data, preferences, generated cards
Knowledge base	Vector DB (RAG)	Documents, reference material	Company policies, card templates, writing style guides

Conversation strategies

Sliding window: Keep last N messages. Simple, cheap. Fine for most Qwestly interactions.
Summarization: Periodically summarize old history into a single message. Extends context without losing everything.
Hybrid: Window for recent + summary for old. Best of both.

For Qwestly v1, sliding window is fine. Upgrade to hybrid if users have long sessions.

5. Tool-Use Patterns

Tools are how agents affect the world. Every tool needs:

Name — short, descriptive
Description — tells the LLM when to use it (this is critical — bad descriptions = wrong tool choices)
Input schema — JSON Schema of parameters
Implementation — the actual code that runs

Tool design principles

Granularity matters. Too coarse: one do_everything() tool that the LLM can't reason about. Too fine: 50 micro-tools the LLM can't navigate. Aim for ~5-15 tools per agent.
Descriptions are prompts. Write them like instructions: "Use this when the user asks to generate a Qwestly Card. Requires a user_id. This will create a draft card in the user's account."
Idempotency. Tool calls can be retried. Design tools that are safe to run twice.

For Qwestly, likely tools:

Tool	Description
`get_user_profile(user_id)`	Returns what we currently know about the user
`ingest_linkedin_profile(linkedin_url)`	Fetches + stores LinkedIn data for a user
`query_user_data(user_id, natural_query)`	RAG/DB query about what we know (e.g., "what schools did they attend?")
`suggest_about_section(user_id)`	Generates an improved LinkedIn About section
`generate_qwestly_card(user_id, format?)`	Creates or regenerates a Qwestly Card
`list_available_services()`	What can Qwestly do? Useful for onboarding

6. RAG (Retrieval-Augmented Generation)

RAG lets an agent answer questions about documents it wasn't trained on. The classic flow:

User question → embed query → search vector DB → retrieve top-k chunks → 
inject into LLM context → LLM answers with citations

When RAG helps Qwestly

Users asking "what do you know about me?" → retrieve from their profile data
Users asking "how does the Qwestly Card work?" → retrieve from your documentation
Generating personalized content → retrieve user's past cards, writing style, preferences

RAG is NOT always the answer

If the data is structured (DB rows, API responses), tool calls are simpler and more reliable than RAG.
Use RAG for unstructured text — documents, notes, past outputs, competitor profiles.
For Qwestly: profile Q&A might be a hybrid — structured data via tool + unstructured notes via RAG.

7. Multi-Agent Communication Patterns

When one agent needs to talk to another:

a. Orchestrator delegates (recommended)

Orchestrator: "Generate a Qwestly Card for user 42"
  → calls sub-agent CardAgent(42) as a tool
  → CardAgent returns result
  → Orchestrator wraps it in a friendly response

b. Shared bus / message queue

Agent A publishes {type: "card_requested", user_id: 42}
Agent B (subscribed to card_requests) picks it up, processes, publishes result
Agent A picks up result

More complex but scales better. Overkill for v1.

c. Agent-as-tool (simplest)

The orchestrator doesn't know it's calling an "agent." It just calls a tool. The tool implementation happens to be agentic internally. This is my recommendation for Qwestly v1 — each capability looks like a tool, but some tools internally use an LLM loop.

Tool: generate_qwestly_card(user_id)
  → internally: calls LLM with profile data → generates sections → 
    calls formatting tool → returns result

The orchestrator never knows there's a sub-agent. This is the cleanest abstraction.

8. Guardrails & Safety

Even for a career agent, you need:

Prompt injection resistance: Users shouldn't be able to trick the agent into running arbitrary queries. Validate tool inputs.
Scoped tool access: The orchestrator should only expose tools appropriate to the user's role/permissions.
Human-in-the-loop: Before the agent posts anything public (e.g., "update my LinkedIn"), require user confirmation.
Audit logging: Every tool call, every LLM response, every generation. Essential for debugging and trust.

9. Key Terms Glossary

Term	Definition
Agent	An LLM-powered loop that can use tools to accomplish tasks
Orchestrator	The top-level agent that routes requests to sub-agents or tools
Tool	A function the LLM can call (API, DB query, code execution)
Function-calling	Native LLM API support for structured tool invocation
RAG	Retrieval-Augmented Generation — injecting relevant documents into LLM context
MCP	Model Context Protocol — a standardized way to expose tools/data to LLMs
Vector DB	Database optimized for similarity search (Pinecone, Chroma, Qdrant, pgvector)
Embedding	A vector representation of text that captures semantic meaning
System Prompt	The persistent instructions prepended to every LLM call
Structured Output	LLM response constrained to a JSON schema (e.g., tool calls or typed responses)

Referenced by

index

_private/qwestly-docs/Features/qwestly-agent/architecture-overview.md