_private/qwestly-docs/Features/qwestly-agent/architecture-overview.md
Table of Contents
Agentic Architecture Overview
Foundational concepts for designing a system where an orchestrator agent receives a user request, figures out intent, and dispatches work to sub-agents or tool calls. No framework-specific opinions here โ just the building blocks you'll need regardless of what you pick.
1. What Is an "Agent" in 2026?
The term "agent" is overloaded. In practice, an agent is a loop:
LLM call โ parse output โ execute tool(s) โ feed results back โ LLM call again
This loop is the atomic unit. Everything else โ routing, sub-agents, memory, RAG โ is layered on top. If there's no tool-calling loop, it's not an agent; it's a fancy completion.
Degrees of agency
| Level | What it does | Example |
|---|---|---|
| Tool-augmented LLM | One-shot: LLM picks tools, executes them, returns result. No loop. | "Classify this email" โ calls a classifier tool โ done. |
| Single-step agent | LLM calls tools, gets results, generates final answer. One loop turn. | "What do we know about user X?" โ query DB โ format answer. |
| Multi-step agent | Can chain multiple tool calls, using prior results as context. | "Generate a Qwestly Card" โ fetch LinkedIn data โ analyze gaps โ write sections โ format PDF. |
| Orchestrator | An agent whose primary "tool" is delegating to other agents. | "Update my LinkedIn" โ decides: needs data ingestion + content generation โ spawns sub-agents. |
For Qwestly, you'll want at least multi-step agents for individual tasks and an orchestrator agent to route between them.
2. Agent Topologies (How Agents Talk to Each Other)
A. Supervisor / Orchestrator (recommended for Qwestly)
User โ [Orchestrator Agent] โโroutes toโโโ [Specialist Agent A]
โโโโโ [Specialist Agent B]
โโโโโ [Tool/API calls]
- One "main" agent receives the request, classifies intent, and delegates.
- Delegation can be: spawn a sub-agent, make a tool call, or answer from memory.
- Pros: Clear control flow, easy to audit, easy to add new capabilities.
- Cons: Orchestrator is a single point of design complexity; if it mis-classifies intent, the wrong agent fires.
B. Round-Robin / Debate
Agent A โ Agent B โ Agent C โ (back to A until consensus)
- Agents talk to each other in turns, converging on an answer.
- Popularized by AutoGen and ChatDev.
- Pros: Good for tasks requiring multiple perspectives (e.g., code generation + review).
- Cons: Hard to control, expensive (many LLM calls), nondeterministic.
- Not recommended for Qwestly's use case โ you don't need debate, you need routing.
C. Hierarchical
Supervisor
โโโ Team Lead A
โ โโโ Worker A1
โ โโโ Worker A2
โโโ Team Lead B
โโโ Worker B1
โโโ Worker B2
- Multi-level delegation. A supervisor delegates to team leads, who delegate to workers.
- Pros: Scales to very complex systems (think: "build a whole app").
- Cons: Deep nesting = latency, complexity, debugging hell.
- Maybe useful if Qwestly grows to have departments of agents (data ingestion team, content gen team, export team), but overkill for v1.
D. Tool-based (Agents as Functions)
- No "sub-agents" at all โ the orchestrator sees everything as a tool call.
generate_qwestly_card(user_id)is just another tool in the orchestrator's toolbox.- The tool itself may be backed by a simple script, not a full agent loop.
- Pros: Simplest mental model. Easy to test. Reuses existing services.
- Cons: If a tool needs sub-reasoning (e.g., "decide which card template to use"), that logic has to live somewhere โ either in the tool implementation or in the orchestrator's prompt.
My recommendation for Qwestly: Start with Tool-based for most things, graduate to Supervisor with a few specialist agents when the single orchestrator's prompt gets too long or the routing logic becomes genuinely complex.
3. The Orchestrator Loop
Whether you use a framework or build from scratch, the core loop is:
1. Receive user message
2. System prompt + conversation history โ LLM
3. LLM responds with either:
a. A tool call (function-calling / tool-use API)
b. A final text answer
4. If tool call: execute tool, append result to conversation, go to step 2
5. If final answer: return to user
Key design decisions
a. Structured outputs vs. raw text parsing
- Function-calling API (OpenAI, Anthropic, Gemini all support it natively) is the gold standard. The LLM returns JSON-structured tool invocations. No parsing needed.
- Raw text parsing (regex the model's output for tool calls) was necessary before function-calling existed. Don't do this in 2026.
b. Max iterations
- Always set a limit (e.g., 10 tool calls per request). Otherwise a confused agent can loop infinitely burning tokens.
c. Error handling
- What happens when a tool returns an error? The agent should see the error and decide: retry, try something else, or apologize. This is automatic in any good framework.
d. Streaming
- Can the user see the agent's reasoning as it happens? This is important for UX โ a blank "thinking..." state for 30 seconds feels broken. Streaming tokens + tool-call status messages is the standard.
4. Memory
Three kinds of memory matter for an agentic system:
| Type | Where it lives | What it stores | Qwestly use case |
|---|---|---|---|
| Conversation memory | In-memory (session) or DB | Current chat history | "What did we just say?" |
| User profile memory | Database (Postgres, etc.) | Facts about the user | LinkedIn data, preferences, generated cards |
| Knowledge base | Vector DB (RAG) | Documents, reference material | Company policies, card templates, writing style guides |
Conversation strategies
- Sliding window: Keep last N messages. Simple, cheap. Fine for most Qwestly interactions.
- Summarization: Periodically summarize old history into a single message. Extends context without losing everything.
- Hybrid: Window for recent + summary for old. Best of both.
For Qwestly v1, sliding window is fine. Upgrade to hybrid if users have long sessions.
5. Tool-Use Patterns
Tools are how agents affect the world. Every tool needs:
- Name โ short, descriptive
- Description โ tells the LLM when to use it (this is critical โ bad descriptions = wrong tool choices)
- Input schema โ JSON Schema of parameters
- Implementation โ the actual code that runs
Tool design principles
- Granularity matters. Too coarse: one
do_everything()tool that the LLM can't reason about. Too fine: 50 micro-tools the LLM can't navigate. Aim for ~5-15 tools per agent. - Descriptions are prompts. Write them like instructions:
"Use this when the user asks to generate a Qwestly Card. Requires a user_id. This will create a draft card in the user's account." - Idempotency. Tool calls can be retried. Design tools that are safe to run twice.
For Qwestly, likely tools:
| Tool | Description |
|---|---|
get_user_profile(user_id) |
Returns what we currently know about the user |
ingest_linkedin_profile(linkedin_url) |
Fetches + stores LinkedIn data for a user |
query_user_data(user_id, natural_query) |
RAG/DB query about what we know (e.g., "what schools did they attend?") |
suggest_about_section(user_id) |
Generates an improved LinkedIn About section |
generate_qwestly_card(user_id, format?) |
Creates or regenerates a Qwestly Card |
list_available_services() |
What can Qwestly do? Useful for onboarding |
6. RAG (Retrieval-Augmented Generation)
RAG lets an agent answer questions about documents it wasn't trained on. The classic flow:
User question โ embed query โ search vector DB โ retrieve top-k chunks โ
inject into LLM context โ LLM answers with citations
When RAG helps Qwestly
- Users asking "what do you know about me?" โ retrieve from their profile data
- Users asking "how does the Qwestly Card work?" โ retrieve from your documentation
- Generating personalized content โ retrieve user's past cards, writing style, preferences
RAG is NOT always the answer
- If the data is structured (DB rows, API responses), tool calls are simpler and more reliable than RAG.
- Use RAG for unstructured text โ documents, notes, past outputs, competitor profiles.
- For Qwestly: profile Q&A might be a hybrid โ structured data via tool + unstructured notes via RAG.
7. Multi-Agent Communication Patterns
When one agent needs to talk to another:
a. Orchestrator delegates (recommended)
Orchestrator: "Generate a Qwestly Card for user 42"
โ calls sub-agent CardAgent(42) as a tool
โ CardAgent returns result
โ Orchestrator wraps it in a friendly response
b. Shared bus / message queue
Agent A publishes {type: "card_requested", user_id: 42}
Agent B (subscribed to card_requests) picks it up, processes, publishes result
Agent A picks up result
More complex but scales better. Overkill for v1.
c. Agent-as-tool (simplest)
The orchestrator doesn't know it's calling an "agent." It just calls a tool. The tool implementation happens to be agentic internally. This is my recommendation for Qwestly v1 โ each capability looks like a tool, but some tools internally use an LLM loop.
Tool: generate_qwestly_card(user_id)
โ internally: calls LLM with profile data โ generates sections โ
calls formatting tool โ returns result
The orchestrator never knows there's a sub-agent. This is the cleanest abstraction.
8. Guardrails & Safety
Even for a career agent, you need:
- Prompt injection resistance: Users shouldn't be able to trick the agent into running arbitrary queries. Validate tool inputs.
- Scoped tool access: The orchestrator should only expose tools appropriate to the user's role/permissions.
- Human-in-the-loop: Before the agent posts anything public (e.g., "update my LinkedIn"), require user confirmation.
- Audit logging: Every tool call, every LLM response, every generation. Essential for debugging and trust.
9. Key Terms Glossary
| Term | Definition |
|---|---|
| Agent | An LLM-powered loop that can use tools to accomplish tasks |
| Orchestrator | The top-level agent that routes requests to sub-agents or tools |
| Tool | A function the LLM can call (API, DB query, code execution) |
| Function-calling | Native LLM API support for structured tool invocation |
| RAG | Retrieval-Augmented Generation โ injecting relevant documents into LLM context |
| MCP | Model Context Protocol โ a standardized way to expose tools/data to LLMs |
| Vector DB | Database optimized for similarity search (Pinecone, Chroma, Qdrant, pgvector) |
| Embedding | A vector representation of text that captures semantic meaning |
| System Prompt | The persistent instructions prepended to every LLM call |
| Structured Output | LLM response constrained to a JSON schema (e.g., tool calls or typed responses) |