Prompts & Traces API
POST /api/prompts/invoke and GET /api/traces/{run_id} — LangSmith prompt execution and trace polling, living in api-python.
Overview
These endpoints let callers pull a versioned prompt from LangSmith Hub, invoke it with an overridable model (OpenAI or DeepSeek), and then poll for the result via the traces endpoint. They are the backend for generative features like interview evals and LinkedIn About generation.
Both endpoints require QWESTLY_SERVICE_API_KEY authentication via the service gateway.
POST /api/prompts/invoke
Pull a prompt from LangSmith by name (and optional commit hash) and invoke it with the given input. Supports synchronous and fire-and-forget async modes.
Auth
X-API-Key header with QWESTLY_SERVICE_API_KEY.
Request body
{
"prompt_name": "linkedin-about",
"prompt_version": "abc1234",
"input": { "profile": "...", "tone": "professional" },
"model": { "name": "deepseek:deepseek-v4-pro", "temperature": 0.7 },
"surface": "interview-eval",
"user_id": "auth0|abc123"
}
| Field | Type | Required | Description |
|---|---|---|---|
| prompt_name | string | yes | LangSmith Hub prompt name |
| prompt_version | string | no | Specific commit hash; omit for latest |
| input | object | no | Prompt input variables (default: {}) |
| model | object | no | Model override (see Model override) |
| surface | string | no | Surface label written to llm_traces for analytics |
| user_id | string | no | User ID written to llm_traces for attribution |
Model override
{ "name": "deepseek:deepseek-v4-pro", "temperature": 0.7 }
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Model identifier (see below) |
| temperature | float | no | Sampling temperature (provider default if omitted) |
Extra fields are passed through to the underlying chat model constructor. Model names follow a provider:model-name convention in which the provider prefix is optional:
| Name | Resolves to |
|---|---|
| gpt-5.4 | ChatOpenAI(model="gpt-5.4") |
| openai:gpt-5.4 | Same; the explicit openai: prefix is stripped |
| deepseek-v4-pro | ChatDeepSeek(model="deepseek-v4-pro") with reasoning disabled |
| deepseek:deepseek-v4-pro | Same; the explicit deepseek: prefix is stripped |
When model is omitted, the LangSmith Hub prompt's bundled model is used with environment-provided secrets.
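The naming rules in the table can be sketched as a small helper. This is an illustrative sketch, not the service's actual resolver: the function name split_model_name is hypothetical, and the rule for inferring a provider from an unprefixed name (names starting with "deepseek" go to DeepSeek, everything else to OpenAI) is an assumption drawn only from the four rows above.

```python
from typing import Tuple

# Providers named in this document; the real service may support others.
KNOWN_PROVIDERS = ("openai", "deepseek")


def split_model_name(name: str) -> Tuple[str, str]:
    """Return (provider, bare_model_name) for a model override name."""
    provider, sep, rest = name.partition(":")
    if sep and provider in KNOWN_PROVIDERS:
        # Explicit "provider:" prefix: strip it and keep the bare model name.
        return provider, rest
    # ASSUMPTION: with no prefix, the provider is inferred from the name itself.
    if name.startswith("deepseek"):
        return "deepseek", name
    return "openai", name
```

Under these assumptions, "openai:gpt-5.4" and "gpt-5.4" both resolve to the OpenAI provider with the bare name "gpt-5.4", matching the first two table rows.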
Query parameters
| Param | Type | Default | Description |
|---|---|---|---|
| output | string | (none) | If "json", rewrites text content blocks into parsed JSON blocks |
| async | bool | false | If true, fire-and-forget (see Async mode) |
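For callers, the request body, query parameters, and X-API-Key header fit together roughly as follows. This is a minimal client-side sketch using only the standard library: the helper name build_invoke_request and the base URL https://api.example.invalid are hypothetical, and serializing the boolean as the string "true" is an assumption based on the async=true notation used in this document. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_invoke_request(
    base_url: str,
    api_key: str,
    body: Dict[str, Any],
    output: Optional[str] = None,
    run_async: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/prompts/invoke request."""
    query: Dict[str, str] = {}
    if output is not None:
        query["output"] = output
    if run_async:
        query["async"] = "true"  # ASSUMPTION: booleans are sent as "true"
    url = base_url.rstrip("/") + "/api/prompts/invoke"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL and key, for illustration only.
req = build_invoke_request(
    "https://api.example.invalid",
    "test-key",
    {"prompt_name": "linkedin-about", "input": {"tone": "professional"}},
    output="json",
    run_async=True,
)
```

Passing req to urllib.request.urlopen would send it; in practice the key would come from the QWESTLY_SERVICE_API_KEY secret rather than a literal.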
Sync response (async=false, default)
{
"success": true,
"result": { "type": "text", "text": "Generated content..." },
"trace_id": "run-abc123..."
}
Async mode (async=true)
When async=true, the LLM call runs in a background thread. The endpoint returns immediately with a trace_id:
{ "success": true, "trace_id": "run-abc123..." }
The caller should poll GET /api/traces/{trace_id} until status is "completed" or "failed".
On dispatch, an llm_traces MongoDB record is written with:
- status: "running"
- prompt_used (name + version)
- surface and user_id (if provided)
A 10-second timeout applies while waiting for LangSmith to produce a trace ID. If the timeout is hit, a 504 is returned.
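The dispatch-then-wait behavior can be sketched with a background thread and a queue. This is a sketch of the general pattern under stated assumptions, not the service's code: dispatch_async, TraceIdTimeout, and the on_trace_id callback are hypothetical names, and how the real implementation learns the trace ID from LangSmith is not described in this document.

```python
import queue
import threading
from typing import Callable

TRACE_ID_TIMEOUT_SECONDS = 10.0  # the 10-second wait described above


class TraceIdTimeout(Exception):
    """No trace ID arrived in time; an HTTP layer would map this to a 504."""


def dispatch_async(
    run_llm: Callable[[Callable[[str], None]], None],
    timeout: float = TRACE_ID_TIMEOUT_SECONDS,
) -> str:
    """Start run_llm in a background thread and return its trace ID.

    run_llm receives an on_trace_id callback and is expected to call it
    as soon as the run has an ID, then keep working in the background.
    """
    trace_ids: "queue.Queue[str]" = queue.Queue()

    def worker() -> None:
        run_llm(trace_ids.put)

    # Daemon thread: the caller does not wait for the LLM call to finish.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return trace_ids.get(timeout=timeout)
    except queue.Empty:
        raise TraceIdTimeout(f"no trace ID within {timeout}s") from None
```

The queue decouples "the run has an ID" from "the run has finished", which is what lets the endpoint return immediately while the LLM call continues.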
Error responses
| Status | When |
|---|---|
| 401 | LangSmith authentication failed |
| 404 | Prompt not found in LangSmith Hub |
| 502 | LangSmith general error |
| 503 | LangSmith API key not configured |
| 504 | Trace ID not returned within 10s (async mode) |
| 500 | Unexpected error |
GET /api/traces/{run_id}
Poll a LangSmith trace by its run ID. Returns the current status and, when terminal, the extracted output.
Auth
X-API-Key header with QWESTLY_SERVICE_API_KEY.
Path parameter
| Param | Type | Description |
|---|---|---|
| run_id | string | LangSmith trace ID (from POST /api/prompts/invoke) |
Responses
Running (still in progress):
{ "success": true, "status": "running", "output": null, "error": null }
Completed:
{ "success": true, "status": "completed", "output": "Generated text content...", "error": null }
Failed:
{ "success": true, "status": "failed", "output": null, "error": "Error message from LangSmith" }
Not found (ingestion lag, or unknown ID — treat as "running"):
{ "success": true, "status": "running", "output": null, "error": null }
llm_traces write
When the trace first reaches a terminal state (completed or failed), the endpoint upserts an llm_traces MongoDB record with the output and status. This happens even if the caller didn't use async mode — any poll of a completed trace creates the record. DB failures are silently swallowed; they never break the trace response.
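The upsert-and-swallow behavior might look like the sketch below. It assumes a PyMongo-style collection object exposing update_one(filter, update, upsert=...); the function name record_terminal_trace and the choice of trace_id as the upsert key are assumptions, and the debug log line is an optional addition rather than something this document specifies.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Optional

logger = logging.getLogger(__name__)


def record_terminal_trace(
    collection: Any,
    trace_id: str,
    status: str,
    output: Any = None,
    error: Optional[str] = None,
) -> bool:
    """Upsert a terminal trace record; return False instead of raising."""
    now = datetime.now(timezone.utc)
    try:
        collection.update_one(
            {"trace_id": trace_id},  # ASSUMPTION: trace_id is the upsert key
            {
                "$set": {
                    "status": status,
                    "output": output,
                    "error": error,
                    "updated_at": now,
                },
                # Only set on first insert, so created_at survives later polls.
                "$setOnInsert": {"created_at": now},
            },
            upsert=True,
        )
        return True
    except Exception:
        # DB failures must never break the trace response.
        logger.debug("llm_traces upsert failed for %s", trace_id, exc_info=True)
        return False
```

Because the function reports failure through its return value, the endpoint can ignore the result and still return the trace payload to the caller.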
Error responses
| Status | When |
|---|---|
| 502 | LangSmith client error |
| 503 | LangSmith API key not configured |
| 504 | LangSmith poll timed out (30s) |
Polling flow (example)
POST /api/prompts/invoke?async=true → { success: true, trace_id: "run-xyz" }
GET /api/traces/run-xyz → { status: "running" }
GET /api/traces/run-xyz → { status: "running" }
GET /api/traces/run-xyz → { status: "completed", output: "..." }
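The flow above can be wrapped in a small client-side loop. This is a minimal sketch: poll_trace and its fetch_trace parameter are hypothetical names, and the 1-second interval and 120-second ceiling are illustrative defaults, not values taken from the service. fetch_trace stands in for whatever performs GET /api/traces/{run_id} and returns the parsed JSON body.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def poll_trace(
    fetch_trace: Callable[[str], Dict[str, Any]],
    trace_id: str,
    interval: float = 1.0,
    max_wait: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the trace is completed or failed, then return the payload."""
    deadline = time.monotonic() + max_wait
    while True:
        payload = fetch_trace(trace_id)
        # "running" also covers not-yet-ingested traces, so keep polling.
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"trace {trace_id} still running after {max_wait}s")
        sleep(interval)
```

A client-side ceiling matters here because an unknown trace ID is reported as "running" indefinitely; without max_wait the loop would never end for a mistyped ID.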
Database: llm_traces collection
Stored in the qwestly_internal MongoDB database. Records are written:
- On async dispatch: status: "running" with prompt_used, surface, user_id
- On sync completion: status: "completed" with full output
- On first terminal poll: status: "completed" or "failed" with output/error
| Field | Type | Description |
|---|---|---|
| trace_id | string | LangSmith run ID |
| status | string | "running", "completed", or "failed" |
| prompt_used | object | { name, version } from the request |
| surface | string | Surface label (e.g. "interview-eval") |
| user_id | string | Auth0 user ID |
| input | object | Prompt input variables |
| output | any | LLM result (when completed) |
| error | string | Error message (when failed) |
| metadata | object | Arbitrary metadata |
| created_at | datetime | Record creation time |
| updated_at | datetime | Last update time |
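For readers working with these records in Python, the field table could be expressed as typed dictionaries. This is an illustrative sketch only: the type names LlmTrace and PromptUsed and the helper new_running_trace are hypothetical, and which fields are optional at write time is an assumption inferred from the three write paths listed above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, TypedDict


class PromptUsed(TypedDict):
    name: str
    version: Optional[str]  # None when the request omitted prompt_version


class LlmTrace(TypedDict, total=False):
    trace_id: str
    status: str  # "running", "completed", or "failed"
    prompt_used: PromptUsed
    surface: str
    user_id: str
    input: Dict[str, Any]
    output: Any
    error: str
    metadata: Dict[str, Any]
    created_at: datetime
    updated_at: datetime


def new_running_trace(
    trace_id: str,
    prompt_name: str,
    prompt_version: Optional[str] = None,
    surface: Optional[str] = None,
    user_id: Optional[str] = None,
) -> LlmTrace:
    """Build the record shape written on async dispatch."""
    now = datetime.now(timezone.utc)
    record: LlmTrace = {
        "trace_id": trace_id,
        "status": "running",
        "prompt_used": {"name": prompt_name, "version": prompt_version},
        "created_at": now,
        "updated_at": now,
    }
    # surface and user_id are written only if the request provided them.
    if surface is not None:
        record["surface"] = surface
    if user_id is not None:
        record["user_id"] = user_id
    return record
```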
Document owner: Engineering
Last updated: June 2026