_private/qwestly-docs/Features/qwestly-agent/framework-comparison.md
Table of Contents
Framework Comparison — Agentic Orchestration in 2026
A detailed, opinionated comparison of the top frameworks for building agentic systems. The goal is to help you pick the right foundation for Qwestly's orchestration layer.
Quick Decision Matrix
| Framework | Language | Best For | Learning Curve | Production Readiness | Qwestly Fit |
|---|---|---|---|---|---|
| LangGraph | Python/TS | Complex agent workflows, state machines | Steep | Very high (LangSmith, LangServe) | ⭐⭐⭐⭐⭐ |
| CrewAI | Python | Multi-agent role-playing, task delegation | Low | High | ⭐⭐⭐⭐ |
| OpenAI Agents SDK | Python | Single-agent tool-use, simplicity | Low | High | ⭐⭐⭐⭐ |
| Pydantic AI | Python | Type-safe agents, structured outputs | Medium | High | ⭐⭐⭐⭐⭐ |
| AutoGen (v0.4+) | Python | Multi-agent conversations, code gen | Medium-High | High (Microsoft) | ⭐⭐⭐ |
| Semantic Kernel | C#/Python | Enterprise .NET ecosystems | Medium | Very high (Microsoft) | ⭐⭐⭐ |
| Dify | Python (platform) | No-code/low-code agent builder | Very low | High | ⭐⭐⭐ |
| Agno (Phidata) | Python | Multi-modal agents, data pipelines | Medium | Medium-High | ⭐⭐⭐ |
| Haystack | Python | RAG-heavy pipelines | Medium | Very high | ⭐⭐⭐ (RAG only) |
| Vercel AI SDK | TypeScript | Next.js apps, streaming | Low | Very high | ⭐⭐⭐ (TS only) |
| Build from scratch | Any | Full control, minimal deps | N/A | Up to you | ⭐⭐⭐⭐ |
1. LangGraph (by LangChain)
Language: Python (mature), TypeScript (catching up) License: MIT Current: Part of LangChain ecosystem, but increasingly treated as the recommended path forward (LangChain as a library of primitives + LangGraph as the orchestration runtime).
What it is
LangGraph is an agent as a state machine framework. You define nodes (LLM calls, tool executions, human review) and edges (conditional routing between nodes). The runtime executes this graph, maintaining state across steps.
# Conceptual — not exact API
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", execute_tools)
graph.add_conditional_edges("llm", should_continue, ...)
graph.set_entry_point("llm")
app = graph.compile()
Strengths
- Most flexible of all frameworks. You can model any agent topology — supervisor, hierarchical, nested subgraphs, cycles.
- LangSmith is best-in-class for debugging/tracing agent runs. You see every LLM call, tool result, and state transition.
- LangServe makes deploying agents as APIs trivial.
- Massive ecosystem: integrations for every vector DB, every LLM provider, every tool.
- Subgraph support: you can compose graphs inside graphs (natural for orchestrator + sub-agents).
- Human-in-the-loop is a first-class concept: you can pause execution at any node and wait for human input.
Weaknesses
- Steep learning curve. The state machine model takes time to internalize. LangChain's documentation has historically been scattered (improving in 2025-26).
- Over-abstraction. Many layers of indirection. Debugging can require tracing through 5+ levels of abstraction.
- Package churn. LangChain has a history of breaking changes (mitigated by the 0.3+ stability push, but still a concern).
- Can be heavy. For simple tool-use, LangGraph is a lot of ceremony.
Sweet spot
Complex, stateful agent workflows where you need fine-grained control over the loop — exactly what Qwestly's orchestrator needs once it grows beyond trivial routing.
✅ Why for Qwestly
The subgraph model maps perfectly: a supervisor graph that routes to sub-graphs for LinkedIn ingestion, Card generation, etc. LangSmith tracing would be invaluable for debugging multi-agent interactions.
❌ Why not
If you just need "call a tool, return answer," LangGraph is overkill. You'd pay the complexity tax without using its strengths.
2. CrewAI
Language: Python License: MIT (with commercial tier) Current: Very active, popular for role-based agent scenarios.
What it is
CrewAI lets you define Agents (with roles, goals, backstories) and Tasks (with descriptions, expected outputs), then assemble them into Crews that execute collaboratively.
from crewai import Agent, Task, Crew
researcher = Agent(role="Data Researcher", goal="Find user info", ...)
writer = Agent(role="Content Writer", goal="Write LinkedIn sections", ...)
research_task = Task(description="Fetch LinkedIn data for user", agent=researcher)
write_task = Task(description="Write about section", agent=writer)
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()
Strengths
- Incredibly easy to get started. The role/backstory metaphor is intuitive. A working multi-agent system in 20 lines.
- Built-in delegation: agents can ask other agents for help.
- Tool integration: You can give agents custom tools (function-calling).
- Process types: Sequential, hierarchical, and soon more complex orchestration.
- Active community with many examples and templates.
Weaknesses
- Role/backstory is a prompt pattern, not architecture. The framework wraps everything in role-play prompts. This is effective but can feel like a gimmick for serious production systems.
- Limited control: You get the CrewAI loop or nothing. Customizing the decision logic is harder than LangGraph.
- Cost opacity: Each agent makes multiple LLM calls. It's easy to accidentally burn tokens.
- Debugging: Less mature than LangSmith. Tracing multi-agent conversations is harder.
Sweet spot
Rapid prototyping and scenarios with distinct agent personas. Great for demos and MVPs.
✅ Why for Qwestly
You could prototype Qwestly's flow in a day: a LinkedInIngestionAgent, a CardGeneratorAgent, an OrchestratorAgent. The role-playing could work well for "personality" in user-facing interactions.
❌ Why not
The lack of fine-grained control may frustrate as you scale. CrewAI is great for getting something working fast; whether it stays great as you add complexity depends on your needs.
3. OpenAI Agents SDK
Language: Python License: MIT (open source, by OpenAI) Current: Launched 2025, rapidly maturing. Successor to the experimental Swarm project.
What it is
A lightweight, official OpenAI SDK for building agentic systems. Core concepts: Agent (LLM + tools), Handoff (agent-to-agent transfer), Guardrails (input/output validation).
from agents import Agent, Runner, handoff
orchestrator = Agent(
name="Orchestrator",
instructions="Route to the right specialist",
handoffs=[
handoff(linkedin_agent),
handoff(card_gen_agent),
],
)
result = Runner.run(orchestrator, "Update my LinkedIn")
Strengths
- Officially maintained by OpenAI — works perfectly with their API (GPT-4o, o3, etc.).
- Simplest API of any framework. Three concepts (Agent, Handoff, Guardrail) cover 90% of use cases.
- Handoffs are elegant: one agent can pass control to another with full context.
- Guardrails as first-class concept: input guardrails (before agent runs) and output guardrails (before response goes to user).
- Streaming is built-in and works well.
- Traceability: Built-in tracing via OpenAI dashboard (similar to LangSmith but OpenAI-native).
Weaknesses
- OpenAI-locked. You can theoretically use it with other providers via custom adapters, but it's clearly designed for OpenAI. If you want flexibility (Anthropic, Gemini, open-source models), this is a limitation.
- Younger ecosystem. Fewer community integrations than LangChain/CrewAI. You'll write more custom code for things like vector DB integration.
- Simpler = less powerful. The handoff model is clean but can't express complex graph topologies. Fine for supervisor → worker, harder for nested workflows.
- Dependency on OpenAI. If OpenAI changes pricing, deprecates features, or has an outage, your system is affected. Not a dealbreaker but a risk.
Sweet spot
Single-agent tool-use and simple supervisor/worker patterns where you're already using OpenAI.
✅ Why for Qwestly
If you're already on OpenAI and want the simplest possible path to a working orchestrator + specialist agents, the Agents SDK is compelling. Handoffs map nicely to Qwestly's intent-routing pattern.
❌ Why not
If you want multi-provider flexibility (e.g., Claude for creative writing + GPT-4o for structured data), this locks you in. Also, complex workflows may outgrow the handoff model.
4. Pydantic AI
Language: Python License: MIT Current: Quietly becoming one of the most popular "not-LangChain" agent frameworks. By the makers of Pydantic.
What it is
A type-safe agent framework built on Pydantic's validation layer. Agents are Python classes with typed tools and structured outputs. The focus is on correctness and developer experience.
from pydantic_ai import Agent, RunContext
class CardOutput(BaseModel):
sections: list[CardSection]
summary: str
card_agent = Agent(
"openai:gpt-4o",
result_type=CardOutput,
system_prompt="Generate Qwestly Cards",
)
@card_agent.tool
async def get_user_profile(ctx: RunContext, user_id: str) -> UserProfile:
"""Fetch user profile data"""
return await db.get_user(user_id)
result = await card_agent.run("Make a card for user 42")
# result.data is a typed CardOutput — guaranteed by Pydantic
Strengths
- Type safety is the core philosophy. Every tool input, tool output, and agent result is validated by Pydantic. No runtime surprises.
- Best-in-class structured output. The
result_typeparameter ensures the LLM's final output matches your schema — with automatic retries if it doesn't. - Lightweight. No heavy abstractions. Just agents, tools, and dependencies.
- Multi-provider: OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama (local models) — all first-class.
- Excellent DX. Dependency injection, testability, mypy/pyright support.
- Logfire (by Pydantic) for observability — similar to LangSmith but Pydantic-native.
Weaknesses
- No built-in multi-agent orchestration. You build that yourself. No handoffs, no subgraphs, no crew. You can compose agents manually, but there's no framework for it.
- Smaller ecosystem. Fewer pre-built integrations. More "bring your own" for vector DB, memory, etc.
- Younger. Active development, but less battle-tested at scale than LangGraph.
- Python only. No TypeScript version.
Sweet spot
Type-safe, production-grade single agents and simple orchestrator patterns. Excellent when correctness matters (structured outputs, validated tool calls).
✅ Why for Qwestly
This is actually a strong contender. You can build an orchestrator as an Agent with tools (including tools that invoke other Pydantic AI agents internally). The type safety means card generation outputs are guaranteed to match your schema. Good multi-provider support (Claude for creative writing, GPT for structured data).
❌ Why not
If you want a framework that handles multi-agent orchestration out of the box, this isn't it. You'll design your own routing layer. Also, less community support means more problems to solve yourself.
5. AutoGen (v0.4+, by Microsoft)
Language: Python, .NET License: MIT Current: Major rewrite in 2025 (v0.4). Now more modular and event-driven.
What it is
Microsoft's framework for multi-agent conversations. Agents communicate by sending messages to each other (publish/subscribe model). The focus is on flexible agent interactions.
Strengths
- Flexible messaging model. Agents can have complex multi-turn conversations.
- Strong code generation agents (useful for "write a script to do X").
- Microsoft backing — heavy investment, active development.
- GroupChat pattern: multiple agents discuss a task with a moderator guiding the conversation.
Weaknesses
- v0.4 rewrite means many v0.2 examples and blog posts are obsolete. Documentation is still catching up.
- Event-driven model is powerful but complex. Harder to reason about than a linear graph.
- Code generation focus. Many features are optimized for coding agents, which may not map well to Qwestly's career-agent use case.
Sweet spot
Research scenarios, multi-agent debate, code generation.
Verdict for Qwestly
Probably not the best fit. The core value prop (agent conversations, code gen) doesn't align with Qwestly's needs (intent routing, structured data retrieval, content generation). The complexity isn't justified.
6. Semantic Kernel (by Microsoft)
Language: C# (primary), Python (secondary) License: MIT Current: Enterprise-focused, well-integrated with Azure.
What it is
Microsoft's lightweight SDK for integrating LLMs into traditional applications. Agents, plugins, memory, planners — all designed to fit into .NET enterprise architectures.
Strengths
- Best .NET support by far. If your stack is C#, this is the obvious choice.
- Strong enterprise features: telemetry, AAD auth, Azure integration.
- Planner: an auto-generated step-by-step plan for multi-step tasks.
- Plugin ecosystem: connectors for Office 365, SharePoint, Dynamics, etc.
Weaknesses
- Python support is second-class. The Python SDK lags behind the C# one. If you want Python, other frameworks are better.
- Less flexible than LangGraph for complex agent topologies.
- Smaller community compared to LangChain.
Verdict for Qwestly
If you're in a .NET shop and Qwestly's backend is C#, absolutely consider it. If you're going Python (which you said you're leaning toward), skip this — it's not competitive with Python-native frameworks.
7. Dify
Language: Platform (backend Python, frontend TS) License: Apache 2.0 (open source) Current: Very popular for no-code/low-code LLMOps.
What it is
An open-source LLM application platform with a visual editor. You build agents, RAG pipelines, and workflows through a drag-and-drop interface, with code extensions for custom logic.
Strengths
- Visual workflow builder — non-engineers can build and modify agent flows.
- All-in-one: RAG, agent, workflow, monitoring, all in one platform.
- Self-hostable (Docker).
- Excellent for rapid prototyping — you can have a working chatbot with RAG in an hour.
Weaknesses
- It's a platform, not a library. You build inside Dify, not in your own codebase. This creates coupling to Dify's deployment model.
- Custom code feels bolted-on. Complex logic requires writing Python code blocks, but they run in Dify's sandbox, which is restrictive.
- Not great for complex orchestration. The visual workflow model breaks down for nested agents, dynamic routing, etc.
- Scaling concerns. Self-hosted Dify can be resource-intensive. The cloud version has pricing per app.
Verdict for Qwestly
Great for prototyping a Qwestly chatbot quickly (you could build it in a day). But for production with custom agent logic, data pipelines, and complex orchestration, you'll likely hit Dify's ceiling. Consider it as an internal tool (e.g., customer support agent) rather than the core orchestration layer.
8. Agno (formerly Phidata)
Language: Python License: MPL-2.0 Current: Rebranded from Phidata in 2025. Focus on multi-modal agents.
What it is
A full-stack agent framework with emphasis on knowledge bases, multi-modal capabilities (images, audio, video), and data analysis.
Strengths
- Built-in knowledge bases with automatic embeddings.
- Multi-modal support: agents that can process images, PDFs, structured data.
- Beautiful agent UI included (playground for testing).
- Tool integrations for common data sources.
Weaknesses
- Fewer integrations than LangChain for specialized tools/vector DBs.
- Younger community. Less battle-tested.
- Multi-modal focus may not be relevant for Qwestly (unless you're processing images/PDFs heavily).
Verdict for Qwestly
Worth a look if Qwestly needs to process PDFs (existing resumes, LinkedIn PDFs) or images (profile photos, screenshots). Not the top choice for pure text-based orchestration.
9. Haystack (by deepset)
Language: Python License: Apache 2.0 Current: v2.x, focused on RAG pipelines with agent capabilities.
What it is
A framework for building search and RAG pipelines, with recent agent features added. Strongest when your primary need is retrieval.
Strengths
- Best RAG pipeline builder in the ecosystem. Document processing, chunking, embedding, hybrid search — all first-class.
- Production-grade: used by enterprises for search systems.
- Multi-provider: OpenAI, Cohere, Anthropic, local models via Hugging Face.
- Pipeline visualization (built-in tracing).
Weaknesses
- Agent features are newer and less mature than the RAG pipeline features.
- Multi-agent orchestration is not a core strength. Haystack is a RAG framework with agent extras, not an agent framework with RAG extras.
- Smaller agent community than LangChain/CrewAI.
Verdict for Qwestly
If RAG is a primary feature (users asking free-form questions about their own data), Haystack is excellent — but you'd likely pair it with a separate orchestration framework. As a sole framework for Qwestly's agent orchestration, it's not the right fit.
10. Vercel AI SDK
Language: TypeScript License: Apache 2.0 Current: Very active, dominant in the Next.js ecosystem.
What it is
A TypeScript SDK for building streaming AI interfaces. Includes tool-use, multi-step agents, and RAG — all designed for frontend-heavy architectures.
Strengths
- Best streaming of any framework. Built for UX where users see tokens as they're generated.
- First-class tool-use with type-safe tool definitions.
- Seamless Next.js integration (server components, edge runtime, etc.).
- Provider-agnostic: OpenAI, Anthropic, Google, Mistral, and open-source models via AI SDK providers.
Weaknesses
- TypeScript only. If you want Python, this isn't it.
- Frontend-focused. The agent loop runs on the server, but the SDK's design priorities are UI-centric. Backend-heavy architectures may find it awkward.
- Not designed for multi-agent orchestration. You'd build your own supervision layer.
Verdict for Qwestly
If Qwestly's frontend is Next.js and you want the agent loop to run close to the UI (edge functions, instant streaming), this is compelling. But you said you're leaning toward Python, and the orchestration logic is back-end work — so this is probably not the primary framework for your use case. Could be the frontend layer that talks to a Python orchestration backend.
11. Build from Scratch
Using: OpenAI / Anthropic / Gemini API directly + your own loop.
Pros
- No dependency on framework churn. The API surface of LLM providers changes slower than any framework.
- Full understanding of every line. When something breaks, you know exactly where.
- Minimal overhead. No abstractions you don't need.
- Exact fit: You build exactly what Qwestly needs, nothing more.
Cons
- You implement everything: tool-calling loop, retry logic, streaming, error handling, state management, conversation memory, tracing.
- No LangSmith / Logfire observability out of the box (but you can add OpenTelemetry).
- Lower velocity early on. Frameworks give you a head start on common patterns.
Recommended approach
Start with a lightweight framework (Pydantic AI or OpenAI Agents SDK), not from scratch. The "trivial" parts (tool-calling loop, error recovery, streaming) are actually not trivial to get right. Once you understand the patterns, you can always replace the framework later — the core logic (your tools, prompts, data models) is framework-agnostic.
Python vs. TypeScript — The Decision
Python
| Pro | Con |
|---|---|
| Dominant ecosystem for AI/ML | Async can be tricky (but asyncio is good now) |
| Every framework listed above is Python-first | Weaker type system than TS (but Pydantic + mypy close the gap) |
| Richer data science / NLP libraries | Some deployment platforms favor Node.js |
| Larger talent pool for AI engineers |
TypeScript
| Pro | Con |
|---|---|
| Excellent type system | Fewer AI-native frameworks |
| Better async model (Promises are ergonomic) | Most TS frameworks are wrappers around Python ones |
| Vercel AI SDK is excellent for streaming UI | Data science / NLP is less natural |
| Edge runtime deployment | Smaller AI engineering talent pool |
Verdict
Go Python. The ecosystem advantage is decisive for an AI-native startup. Every framework, every vector DB library, every embedding model has Python as a first-class citizen. TypeScript's advantages (type safety, async, edge runtime) are real but not enough to outweigh Python's ecosystem for this use case.
If you want TypeScript for the frontend and Python for the orchestration backend, that's a very common and good architecture.
My Framework Tier List for Qwestly
Tier 1 (Strongly Consider)
| Framework | Why |
|---|---|
| Pydantic AI | Type-safe, lightweight, multi-provider, excellent structured output. Build your own orchestrator on top. Best "I know what I'm doing" choice. |
| OpenAI Agents SDK | Simplest path if you're locked into OpenAI. Handoffs map well to Qwestly's routing needs. |
| LangGraph | Most powerful and flexible. Best for complex production systems. Pick this if you anticipate sophisticated workflows. |
Tier 2 (Worth Evaluating)
| Framework | Why |
|---|---|
| CrewAI | Fastest path to a working prototype. Evaluate whether the role-play model works for your team. |
| Dify | Good for internal tools and rapid prototyping. Not for the core agent runtime. |
Tier 3 (Skip for Qwestly)
| Framework | Why |
|---|---|
| AutoGen | Code-gen focus, complex model, doesn't align with Qwestly's needs. |
| Semantic Kernel | Only if you're on .NET. |
| Haystack | Great RAG, but not an orchestration framework. Use for RAG, not for agents. |
| Vercel AI SDK | If you go Python, this doesn't apply. If you go TS, it's worth a look for the frontend, not the backend. |
| Agno | Multi-modal focus not relevant. Smaller ecosystem. |