Table of Contents

Framework Comparison — Agentic Orchestration in 2026

A detailed, opinionated comparison of the top frameworks for building agentic systems. The goal is to help you pick the right foundation for Qwestly's orchestration layer.

Quick Decision Matrix

Framework	Language	Best For	Learning Curve	Production Readiness	Qwestly Fit
LangGraph	Python/TS	Complex agent workflows, state machines	Steep	Very high (LangSmith, LangServe)	⭐⭐⭐⭐⭐
CrewAI	Python	Multi-agent role-playing, task delegation	Low	High	⭐⭐⭐⭐
OpenAI Agents SDK	Python	Single-agent tool-use, simplicity	Low	High	⭐⭐⭐⭐
Pydantic AI	Python	Type-safe agents, structured outputs	Medium	High	⭐⭐⭐⭐⭐
AutoGen (v0.4+)	Python	Multi-agent conversations, code gen	Medium-High	High (Microsoft)	⭐⭐⭐
Semantic Kernel	C#/Python	Enterprise .NET ecosystems	Medium	Very high (Microsoft)	⭐⭐⭐
Dify	Python (platform)	No-code/low-code agent builder	Very low	High	⭐⭐⭐
Agno (Phidata)	Python	Multi-modal agents, data pipelines	Medium	Medium-High	⭐⭐⭐
Haystack	Python	RAG-heavy pipelines	Medium	Very high	⭐⭐⭐ (RAG only)
Vercel AI SDK	TypeScript	Next.js apps, streaming	Low	Very high	⭐⭐⭐ (TS only)
Build from scratch	Any	Full control, minimal deps	N/A	Up to you	⭐⭐⭐⭐

1. LangGraph (by LangChain)

Language: Python (mature), TypeScript (catching up) License: MIT Current: Part of LangChain ecosystem, but increasingly treated as the recommended path forward (LangChain as a library of primitives + LangGraph as the orchestration runtime).

What it is

LangGraph is an agent as a state machine framework. You define nodes (LLM calls, tool executions, human review) and edges (conditional routing between nodes). The runtime executes this graph, maintaining state across steps.

# Conceptual — not exact API
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", execute_tools)
graph.add_conditional_edges("llm", should_continue, ...)
graph.set_entry_point("llm")
app = graph.compile()

Strengths

Most flexible of all frameworks. You can model any agent topology — supervisor, hierarchical, nested subgraphs, cycles.
LangSmith is best-in-class for debugging/tracing agent runs. You see every LLM call, tool result, and state transition.
LangServe makes deploying agents as APIs trivial.
Massive ecosystem: integrations for every vector DB, every LLM provider, every tool.
Subgraph support: you can compose graphs inside graphs (natural for orchestrator + sub-agents).
Human-in-the-loop is a first-class concept: you can pause execution at any node and wait for human input.

Weaknesses

Steep learning curve. The state machine model takes time to internalize. LangChain's documentation has historically been scattered (improving in 2025-26).
Over-abstraction. Many layers of indirection. Debugging can require tracing through 5+ levels of abstraction.
Package churn. LangChain has a history of breaking changes (mitigated by the 0.3+ stability push, but still a concern).
Can be heavy. For simple tool-use, LangGraph is a lot of ceremony.

Sweet spot

Complex, stateful agent workflows where you need fine-grained control over the loop — exactly what Qwestly's orchestrator needs once it grows beyond trivial routing.

✅ Why for Qwestly

The subgraph model maps perfectly: a supervisor graph that routes to sub-graphs for LinkedIn ingestion, Card generation, etc. LangSmith tracing would be invaluable for debugging multi-agent interactions.

❌ Why not

If you just need "call a tool, return answer," LangGraph is overkill. You'd pay the complexity tax without using its strengths.

2. CrewAI

Language: Python License: MIT (with commercial tier) Current: Very active, popular for role-based agent scenarios.

What it is

CrewAI lets you define Agents (with roles, goals, backstories) and Tasks (with descriptions, expected outputs), then assemble them into Crews that execute collaboratively.

from crewai import Agent, Task, Crew

researcher = Agent(role="Data Researcher", goal="Find user info", ...)
writer = Agent(role="Content Writer", goal="Write LinkedIn sections", ...)

research_task = Task(description="Fetch LinkedIn data for user", agent=researcher)
write_task = Task(description="Write about section", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()

Strengths

Incredibly easy to get started. The role/backstory metaphor is intuitive. A working multi-agent system in 20 lines.
Built-in delegation: agents can ask other agents for help.
Tool integration: You can give agents custom tools (function-calling).
Process types: Sequential, hierarchical, and soon more complex orchestration.
Active community with many examples and templates.

Weaknesses

Role/backstory is a prompt pattern, not architecture. The framework wraps everything in role-play prompts. This is effective but can feel like a gimmick for serious production systems.
Limited control: You get the CrewAI loop or nothing. Customizing the decision logic is harder than LangGraph.
Cost opacity: Each agent makes multiple LLM calls. It's easy to accidentally burn tokens.
Debugging: Less mature than LangSmith. Tracing multi-agent conversations is harder.

Sweet spot

Rapid prototyping and scenarios with distinct agent personas. Great for demos and MVPs.

✅ Why for Qwestly

You could prototype Qwestly's flow in a day: a LinkedInIngestionAgent, a CardGeneratorAgent, an OrchestratorAgent. The role-playing could work well for "personality" in user-facing interactions.

❌ Why not

The lack of fine-grained control may frustrate as you scale. CrewAI is great for getting something working fast; whether it stays great as you add complexity depends on your needs.

3. OpenAI Agents SDK

Language: Python License: MIT (open source, by OpenAI) Current: Launched 2025, rapidly maturing. Successor to the experimental Swarm project.

What it is

A lightweight, official OpenAI SDK for building agentic systems. Core concepts: Agent (LLM + tools), Handoff (agent-to-agent transfer), Guardrails (input/output validation).

from agents import Agent, Runner, handoff

orchestrator = Agent(
    name="Orchestrator",
    instructions="Route to the right specialist",
    handoffs=[
        handoff(linkedin_agent),
        handoff(card_gen_agent),
    ],
)
result = Runner.run(orchestrator, "Update my LinkedIn")

Strengths

Officially maintained by OpenAI — works perfectly with their API (GPT-4o, o3, etc.).
Simplest API of any framework. Three concepts (Agent, Handoff, Guardrail) cover 90% of use cases.
Handoffs are elegant: one agent can pass control to another with full context.
Guardrails as first-class concept: input guardrails (before agent runs) and output guardrails (before response goes to user).
Streaming is built-in and works well.
Traceability: Built-in tracing via OpenAI dashboard (similar to LangSmith but OpenAI-native).

Weaknesses

OpenAI-locked. You can theoretically use it with other providers via custom adapters, but it's clearly designed for OpenAI. If you want flexibility (Anthropic, Gemini, open-source models), this is a limitation.
Younger ecosystem. Fewer community integrations than LangChain/CrewAI. You'll write more custom code for things like vector DB integration.
Simpler = less powerful. The handoff model is clean but can't express complex graph topologies. Fine for supervisor → worker, harder for nested workflows.
Dependency on OpenAI. If OpenAI changes pricing, deprecates features, or has an outage, your system is affected. Not a dealbreaker but a risk.

Sweet spot

Single-agent tool-use and simple supervisor/worker patterns where you're already using OpenAI.

✅ Why for Qwestly

If you're already on OpenAI and want the simplest possible path to a working orchestrator + specialist agents, the Agents SDK is compelling. Handoffs map nicely to Qwestly's intent-routing pattern.

❌ Why not

If you want multi-provider flexibility (e.g., Claude for creative writing + GPT-4o for structured data), this locks you in. Also, complex workflows may outgrow the handoff model.

4. Pydantic AI

Language: Python License: MIT Current: Quietly becoming one of the most popular "not-LangChain" agent frameworks. By the makers of Pydantic.

What it is

A type-safe agent framework built on Pydantic's validation layer. Agents are Python classes with typed tools and structured outputs. The focus is on correctness and developer experience.

from pydantic_ai import Agent, RunContext

class CardOutput(BaseModel):
    sections: list[CardSection]
    summary: str

card_agent = Agent(
    "openai:gpt-4o",
    result_type=CardOutput,
    system_prompt="Generate Qwestly Cards",
)

@card_agent.tool
async def get_user_profile(ctx: RunContext, user_id: str) -> UserProfile:
    """Fetch user profile data"""
    return await db.get_user(user_id)

result = await card_agent.run("Make a card for user 42")
# result.data is a typed CardOutput — guaranteed by Pydantic

Strengths

Type safety is the core philosophy. Every tool input, tool output, and agent result is validated by Pydantic. No runtime surprises.
Best-in-class structured output. The result_type parameter ensures the LLM's final output matches your schema — with automatic retries if it doesn't.
Lightweight. No heavy abstractions. Just agents, tools, and dependencies.
Multi-provider: OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama (local models) — all first-class.
Excellent DX. Dependency injection, testability, mypy/pyright support.
Logfire (by Pydantic) for observability — similar to LangSmith but Pydantic-native.

Weaknesses

No built-in multi-agent orchestration. You build that yourself. No handoffs, no subgraphs, no crew. You can compose agents manually, but there's no framework for it.
Smaller ecosystem. Fewer pre-built integrations. More "bring your own" for vector DB, memory, etc.
Younger. Active development, but less battle-tested at scale than LangGraph.
Python only. No TypeScript version.

Sweet spot

Type-safe, production-grade single agents and simple orchestrator patterns. Excellent when correctness matters (structured outputs, validated tool calls).

✅ Why for Qwestly

This is actually a strong contender. You can build an orchestrator as an Agent with tools (including tools that invoke other Pydantic AI agents internally). The type safety means card generation outputs are guaranteed to match your schema. Good multi-provider support (Claude for creative writing, GPT for structured data).

❌ Why not

If you want a framework that handles multi-agent orchestration out of the box, this isn't it. You'll design your own routing layer. Also, less community support means more problems to solve yourself.

5. AutoGen (v0.4+, by Microsoft)

Language: Python, .NET License: MIT Current: Major rewrite in 2025 (v0.4). Now more modular and event-driven.

What it is

Microsoft's framework for multi-agent conversations. Agents communicate by sending messages to each other (publish/subscribe model). The focus is on flexible agent interactions.

Strengths

Flexible messaging model. Agents can have complex multi-turn conversations.
Strong code generation agents (useful for "write a script to do X").
Microsoft backing — heavy investment, active development.
GroupChat pattern: multiple agents discuss a task with a moderator guiding the conversation.

Weaknesses

v0.4 rewrite means many v0.2 examples and blog posts are obsolete. Documentation is still catching up.
Event-driven model is powerful but complex. Harder to reason about than a linear graph.
Code generation focus. Many features are optimized for coding agents, which may not map well to Qwestly's career-agent use case.

Sweet spot

Research scenarios, multi-agent debate, code generation.

Verdict for Qwestly

Probably not the best fit. The core value prop (agent conversations, code gen) doesn't align with Qwestly's needs (intent routing, structured data retrieval, content generation). The complexity isn't justified.

6. Semantic Kernel (by Microsoft)

Language: C# (primary), Python (secondary) License: MIT Current: Enterprise-focused, well-integrated with Azure.

What it is

Microsoft's lightweight SDK for integrating LLMs into traditional applications. Agents, plugins, memory, planners — all designed to fit into .NET enterprise architectures.

Strengths

Best .NET support by far. If your stack is C#, this is the obvious choice.
Strong enterprise features: telemetry, AAD auth, Azure integration.
Planner: an auto-generated step-by-step plan for multi-step tasks.
Plugin ecosystem: connectors for Office 365, SharePoint, Dynamics, etc.

Weaknesses

Python support is second-class. The Python SDK lags behind the C# one. If you want Python, other frameworks are better.
Less flexible than LangGraph for complex agent topologies.
Smaller community compared to LangChain.

Verdict for Qwestly

If you're in a .NET shop and Qwestly's backend is C#, absolutely consider it. If you're going Python (which you said you're leaning toward), skip this — it's not competitive with Python-native frameworks.

7. Dify

Language: Platform (backend Python, frontend TS) License: Apache 2.0 (open source) Current: Very popular for no-code/low-code LLMOps.

What it is

An open-source LLM application platform with a visual editor. You build agents, RAG pipelines, and workflows through a drag-and-drop interface, with code extensions for custom logic.

Strengths

Visual workflow builder — non-engineers can build and modify agent flows.
All-in-one: RAG, agent, workflow, monitoring, all in one platform.
Self-hostable (Docker).
Excellent for rapid prototyping — you can have a working chatbot with RAG in an hour.

Weaknesses

It's a platform, not a library. You build inside Dify, not in your own codebase. This creates coupling to Dify's deployment model.
Custom code feels bolted-on. Complex logic requires writing Python code blocks, but they run in Dify's sandbox, which is restrictive.
Not great for complex orchestration. The visual workflow model breaks down for nested agents, dynamic routing, etc.
Scaling concerns. Self-hosted Dify can be resource-intensive. The cloud version has pricing per app.

Verdict for Qwestly

Great for prototyping a Qwestly chatbot quickly (you could build it in a day). But for production with custom agent logic, data pipelines, and complex orchestration, you'll likely hit Dify's ceiling. Consider it as an internal tool (e.g., customer support agent) rather than the core orchestration layer.

8. Agno (formerly Phidata)

Language: Python License: MPL-2.0 Current: Rebranded from Phidata in 2025. Focus on multi-modal agents.

What it is

A full-stack agent framework with emphasis on knowledge bases, multi-modal capabilities (images, audio, video), and data analysis.

Strengths

Built-in knowledge bases with automatic embeddings.
Multi-modal support: agents that can process images, PDFs, structured data.
Beautiful agent UI included (playground for testing).
Tool integrations for common data sources.

Weaknesses

Fewer integrations than LangChain for specialized tools/vector DBs.
Younger community. Less battle-tested.
Multi-modal focus may not be relevant for Qwestly (unless you're processing images/PDFs heavily).

Verdict for Qwestly

Worth a look if Qwestly needs to process PDFs (existing resumes, LinkedIn PDFs) or images (profile photos, screenshots). Not the top choice for pure text-based orchestration.

9. Haystack (by deepset)

Language: Python License: Apache 2.0 Current: v2.x, focused on RAG pipelines with agent capabilities.

What it is

A framework for building search and RAG pipelines, with recent agent features added. Strongest when your primary need is retrieval.

Strengths

Best RAG pipeline builder in the ecosystem. Document processing, chunking, embedding, hybrid search — all first-class.
Production-grade: used by enterprises for search systems.
Multi-provider: OpenAI, Cohere, Anthropic, local models via Hugging Face.
Pipeline visualization (built-in tracing).

Weaknesses

Agent features are newer and less mature than the RAG pipeline features.
Multi-agent orchestration is not a core strength. Haystack is a RAG framework with agent extras, not an agent framework with RAG extras.
Smaller agent community than LangChain/CrewAI.

Verdict for Qwestly

If RAG is a primary feature (users asking free-form questions about their own data), Haystack is excellent — but you'd likely pair it with a separate orchestration framework. As a sole framework for Qwestly's agent orchestration, it's not the right fit.

10. Vercel AI SDK

Language: TypeScript License: Apache 2.0 Current: Very active, dominant in the Next.js ecosystem.

What it is

A TypeScript SDK for building streaming AI interfaces. Includes tool-use, multi-step agents, and RAG — all designed for frontend-heavy architectures.

Strengths

Best streaming of any framework. Built for UX where users see tokens as they're generated.
First-class tool-use with type-safe tool definitions.
Seamless Next.js integration (server components, edge runtime, etc.).
Provider-agnostic: OpenAI, Anthropic, Google, Mistral, and open-source models via AI SDK providers.

Weaknesses

TypeScript only. If you want Python, this isn't it.
Frontend-focused. The agent loop runs on the server, but the SDK's design priorities are UI-centric. Backend-heavy architectures may find it awkward.
Not designed for multi-agent orchestration. You'd build your own supervision layer.

Verdict for Qwestly

If Qwestly's frontend is Next.js and you want the agent loop to run close to the UI (edge functions, instant streaming), this is compelling. But you said you're leaning toward Python, and the orchestration logic is back-end work — so this is probably not the primary framework for your use case. Could be the frontend layer that talks to a Python orchestration backend.

11. Build from Scratch

Using: OpenAI / Anthropic / Gemini API directly + your own loop.

Pros

No dependency on framework churn. The API surface of LLM providers changes slower than any framework.
Full understanding of every line. When something breaks, you know exactly where.
Minimal overhead. No abstractions you don't need.
Exact fit: You build exactly what Qwestly needs, nothing more.

Cons

You implement everything: tool-calling loop, retry logic, streaming, error handling, state management, conversation memory, tracing.
No LangSmith / Logfire observability out of the box (but you can add OpenTelemetry).
Lower velocity early on. Frameworks give you a head start on common patterns.

Recommended approach

Start with a lightweight framework (Pydantic AI or OpenAI Agents SDK), not from scratch. The "trivial" parts (tool-calling loop, error recovery, streaming) are actually not trivial to get right. Once you understand the patterns, you can always replace the framework later — the core logic (your tools, prompts, data models) is framework-agnostic.

Python vs. TypeScript — The Decision

Python

Pro	Con
Dominant ecosystem for AI/ML	Async can be tricky (but asyncio is good now)
Every framework listed above is Python-first	Weaker type system than TS (but Pydantic + mypy close the gap)
Richer data science / NLP libraries	Some deployment platforms favor Node.js
Larger talent pool for AI engineers

TypeScript

Pro	Con
Excellent type system	Fewer AI-native frameworks
Better async model (Promises are ergonomic)	Most TS frameworks are wrappers around Python ones
Vercel AI SDK is excellent for streaming UI	Data science / NLP is less natural
Edge runtime deployment	Smaller AI engineering talent pool

Verdict

Go Python. The ecosystem advantage is decisive for an AI-native startup. Every framework, every vector DB library, every embedding model has Python as a first-class citizen. TypeScript's advantages (type safety, async, edge runtime) are real but not enough to outweigh Python's ecosystem for this use case.

If you want TypeScript for the frontend and Python for the orchestration backend, that's a very common and good architecture.

My Framework Tier List for Qwestly

Tier 1 (Strongly Consider)

Framework	Why
Pydantic AI	Type-safe, lightweight, multi-provider, excellent structured output. Build your own orchestrator on top. Best "I know what I'm doing" choice.
OpenAI Agents SDK	Simplest path if you're locked into OpenAI. Handoffs map well to Qwestly's routing needs.
LangGraph	Most powerful and flexible. Best for complex production systems. Pick this if you anticipate sophisticated workflows.

Tier 2 (Worth Evaluating)

Framework	Why
CrewAI	Fastest path to a working prototype. Evaluate whether the role-play model works for your team.
Dify	Good for internal tools and rapid prototyping. Not for the core agent runtime.

Tier 3 (Skip for Qwestly)

Framework	Why
AutoGen	Code-gen focus, complex model, doesn't align with Qwestly's needs.
Semantic Kernel	Only if you're on .NET.
Haystack	Great RAG, but not an orchestration framework. Use for RAG, not for agents.
Vercel AI SDK	If you go Python, this doesn't apply. If you go TS, it's worth a look for the frontend, not the backend.
Agno	Multi-modal focus not relevant. Smaller ecosystem.

Referenced by

index

_private/qwestly-docs/Features/qwestly-agent/framework-comparison.md

Framework Comparison — Agentic Orchestration in 2026

Quick Decision Matrix

1. LangGraph (by LangChain)

What it is

Strengths

Weaknesses

Sweet spot

✅ Why for Qwestly

❌ Why not

2. CrewAI

What it is

Strengths

Weaknesses

Sweet spot

✅ Why for Qwestly

❌ Why not

3. OpenAI Agents SDK

What it is

Strengths

Weaknesses

Sweet spot

✅ Why for Qwestly

❌ Why not

4. Pydantic AI

What it is

Strengths

Weaknesses

Sweet spot

✅ Why for Qwestly

❌ Why not

5. AutoGen (v0.4+, by Microsoft)

What it is

Strengths

Weaknesses

Sweet spot

Verdict for Qwestly

6. Semantic Kernel (by Microsoft)

What it is

Strengths

Weaknesses

Verdict for Qwestly

7. Dify

What it is

Strengths

Weaknesses

Verdict for Qwestly

8. Agno (formerly Phidata)

What it is

Strengths

Weaknesses

Verdict for Qwestly

9. Haystack (by deepset)

What it is

Strengths

Weaknesses

Verdict for Qwestly

10. Vercel AI SDK

What it is

Strengths

Weaknesses

Verdict for Qwestly

11. Build from Scratch

Pros

Cons

Recommended approach

Python vs. TypeScript — The Decision

Python

TypeScript

Verdict

My Framework Tier List for Qwestly

Tier 1 (Strongly Consider)

Tier 2 (Worth Evaluating)

Tier 3 (Skip for Qwestly)

Referenced by