Prompts & Traces API

POST /api/prompts/invoke and GET /api/traces/{run_id} — LangSmith prompt execution and trace polling, living in api-python.

Overview

These endpoints let callers pull a versioned prompt from LangSmith Hub, invoke it with an optional model override (OpenAI or DeepSeek), and, in async mode, poll for the result via the traces endpoint. They are the backend for generative features such as interview evals and LinkedIn About generation.

Both endpoints require QWESTLY_SERVICE_API_KEY authentication via the service gateway.


POST /api/prompts/invoke

Pull a prompt from LangSmith by name (and optional commit hash) and invoke it with the given input. Supports synchronous and fire-and-forget async modes.

Auth

X-API-Key header with QWESTLY_SERVICE_API_KEY.

Request body

{
  "prompt_name": "linkedin-about",
  "prompt_version": "abc1234",
  "input": { "profile": "...", "tone": "professional" },
  "model": { "name": "deepseek:deepseek-v4-pro", "temperature": 0.7 },
  "surface": "interview-eval",
  "user_id": "auth0|abc123"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt_name | string | yes | LangSmith Hub prompt name |
| prompt_version | string | no | Specific commit hash; omit for latest |
| input | object | no | Prompt input variables (default: {}) |
| model | object | no | Model override (see Model override) |
| surface | string | no | Surface label written to llm_traces for analytics |
| user_id | string | no | User ID written to llm_traces for attribution |
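
As an illustration of the request shape above, here is a minimal client-side sketch that builds (but does not send) the request with the standard library. The function name `build_invoke_request` and the `base_url` value are assumptions made for the example, not part of the service.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import urlencode


def build_invoke_request(
    base_url: str,  # hypothetical host, e.g. "https://api.example.test"
    api_key: str,   # value of QWESTLY_SERVICE_API_KEY
    prompt_name: str,
    input_vars: Optional[dict[str, Any]] = None,
    prompt_version: Optional[str] = None,
    run_async: bool = False,
) -> urllib.request.Request:
    """Build, without sending, a POST /api/prompts/invoke request."""
    body: dict[str, Any] = {"prompt_name": prompt_name, "input": input_vars or {}}
    if prompt_version is not None:
        body["prompt_version"] = prompt_version  # omit to get the latest commit

    query = urlencode({"async": "true"}) if run_async else ""
    url = f"{base_url}/api/prompts/invoke" + (f"?{query}" if query else "")

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it.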

Model override

{ "name": "deepseek:deepseek-v4-pro", "temperature": 0.7 }

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Model identifier (see below) |
| temperature | float | no | Sampling temperature (provider default if omitted) |

Extra fields are passed through to the underlying chat model constructor. Model names use a provider:model-name convention:

| Name | Resolves to |
| --- | --- |
| gpt-5.4 | ChatOpenAI(model="gpt-5.4") |
| openai:gpt-5.4 | Same; the explicit openai: prefix is stripped |
| deepseek-v4-pro | ChatDeepSeek(model="deepseek-v4-pro") with reasoning disabled |
| deepseek:deepseek-v4-pro | Same; the explicit deepseek: prefix is stripped |

When model is omitted, the LangSmith Hub prompt's bundled model is used with environment-provided secrets.
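
The name resolution described above could be sketched roughly as follows. `resolve_model` is a hypothetical helper, and the rule for bare names (anything starting with "deepseek" goes to DeepSeek, everything else to OpenAI) is an assumption inferred from the examples, not a documented rule.

```python
KNOWN_PROVIDERS = ("openai", "deepseek")


def resolve_model(name: str) -> tuple[str, str]:
    """Return (provider, bare_model_name) for a model identifier."""
    provider, sep, rest = name.partition(":")
    if sep and provider in KNOWN_PROVIDERS:
        return provider, rest  # explicit prefix is stripped

    # Assumption: bare names are routed by their leading text.
    if name.startswith("deepseek"):
        return "deepseek", name
    return "openai", name
```

The provider would then select the chat model class (ChatOpenAI or ChatDeepSeek), with the bare name passed as `model`.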

Query parameters

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| output | string | | If "json", rewrites text content blocks into parsed JSON blocks |
| async | bool | false | If true, fire-and-forget (see Async mode) |
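
To make the output=json behaviour concrete, here is a small sketch of what rewriting a text block might look like. The shape of the parsed block (`{"type": "json", "json": ...}`) and the fallback for unparseable text are assumptions; the actual block format is not specified here.

```python
import json
from typing import Any


def rewrite_text_block(block: dict[str, Any]) -> dict[str, Any]:
    """Turn a text block holding JSON into a parsed block; leave others alone."""
    if block.get("type") != "text":
        return block
    try:
        parsed = json.loads(block["text"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return block  # assumption: unparseable text is returned unchanged
    # Assumption: parsed content is carried under a "json" key.
    return {"type": "json", "json": parsed}
```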

Sync response (async=false, default)

{
  "success": true,
  "result": { "type": "text", "text": "Generated content..." },
  "trace_id": "run-abc123..."
}

Async mode (async=true)

When async=true, the LLM call runs in a background thread. The endpoint returns immediately with a trace_id:

{ "success": true, "trace_id": "run-abc123..." }

The caller should then poll GET /api/traces/{run_id}, passing the returned trace_id as run_id, until status is "completed" or "failed".

On dispatch, an llm_traces MongoDB record is written with:

  • status: "running"
  • prompt_used (name + version)
  • surface and user_id (if provided)

A 10-second timeout applies while waiting for LangSmith to produce a trace ID. If the timeout is hit, a 504 is returned.
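
One way to picture the dispatch-and-wait step is a background thread that reports its trace ID through a queue, with the request handler waiting a bounded time for it. This is a sketch under assumptions: `dispatch_async` and the callback-style `run_llm` worker are hypothetical names, not the service's actual code.

```python
import queue
import threading
from typing import Callable, Optional


def dispatch_async(
    run_llm: Callable[[Callable[[str], None]], None],  # hypothetical worker
    timeout_s: float = 10.0,
) -> Optional[str]:
    """Start run_llm in a background thread and wait for its trace ID.

    run_llm receives a callback and is expected to call it with the trace ID
    as soon as one is known, then carry on with the LLM call.
    Returns None on timeout, which the endpoint would map to a 504.
    """
    trace_ids: "queue.Queue[str]" = queue.Queue(maxsize=1)
    thread = threading.Thread(target=run_llm, args=(trace_ids.put,), daemon=True)
    thread.start()
    try:
        return trace_ids.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

The thread keeps running after the trace ID is returned, which is what makes the call fire-and-forget from the caller's point of view.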

Error responses

| Status | When |
| --- | --- |
| 401 | LangSmith authentication failed |
| 404 | Prompt not found in LangSmith Hub |
| 502 | LangSmith general error |
| 503 | LangSmith API key not configured |
| 504 | Trace ID not returned within 10s (async mode) |
| 500 | Unexpected error |

GET /api/traces/{run_id}

Poll a LangSmith trace by its run ID. Returns the current status and, when terminal, the extracted output.

Auth

X-API-Key header with QWESTLY_SERVICE_API_KEY.

Path parameter

| Param | Type | Description |
| --- | --- | --- |
| run_id | string | LangSmith trace ID (from POST /api/prompts/invoke) |

Responses

Running (still in progress):

{ "success": true, "status": "running", "output": null, "error": null }

Completed:

{ "success": true, "status": "completed", "output": "Generated text content...", "error": null }

Failed:

{ "success": true, "status": "failed", "output": null, "error": "Error message from LangSmith" }

Not found (ingestion lag, or unknown ID — treat as "running"):

{ "success": true, "status": "running", "output": null, "error": null }

llm_traces write

When the trace first reaches a terminal state (completed or failed), the endpoint upserts an llm_traces MongoDB record with the output and status. This happens even if the caller didn't use async mode — any poll of a completed trace creates the record. DB failures are silently swallowed; they never break the trace response.
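
The write described above could look something like the sketch below. `record_terminal_trace` is a hypothetical helper; `collection` is assumed to be a pymongo-style collection exposing `update_one(filter, update, upsert=True)`. The debug-level log line is an addition for the example, since the source only says failures are swallowed, and the "first terminal poll only" check is not modelled.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Optional

logger = logging.getLogger(__name__)


def record_terminal_trace(
    collection: Any,  # pymongo-style collection (assumption)
    trace_id: str,
    status: str,      # "completed" or "failed"
    output: Any = None,
    error: Optional[str] = None,
) -> None:
    """Upsert the terminal state of a trace; never raise on DB failure."""
    now = datetime.now(timezone.utc)
    try:
        collection.update_one(
            {"trace_id": trace_id},
            {
                "$set": {"status": status, "output": output,
                         "error": error, "updated_at": now},
                "$setOnInsert": {"created_at": now},
            },
            upsert=True,
        )
    except Exception:
        # Swallow the failure so the trace response is unaffected.
        logger.debug("llm_traces upsert failed for %s", trace_id, exc_info=True)
```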

Error responses

| Status | When |
| --- | --- |
| 502 | LangSmith client error |
| 503 | LangSmith API key not configured |
| 504 | LangSmith poll timed out (30s) |

Polling flow (example)

POST /api/prompts/invoke?async=true  →  { success: true, trace_id: "run-xyz" }
GET  /api/traces/run-xyz             →  { status: "running" }
GET  /api/traces/run-xyz             →  { status: "running" }
GET  /api/traces/run-xyz             →  { status: "completed", output: "..." }
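
A client-side loop for this flow might look like the sketch below. `poll_trace` and the injected `fetch` callable (which would perform the GET and return the parsed JSON body) are hypothetical, and the interval and overall deadline are assumed values rather than recommendations from the service.

```python
import time
from typing import Any, Callable

TERMINAL = {"completed", "failed"}


def poll_trace(
    fetch: Callable[[str], dict[str, Any]],  # hypothetical: GET /api/traces/{run_id}
    run_id: str,
    interval_s: float = 2.0,    # assumed poll interval
    max_wait_s: float = 120.0,  # assumed overall deadline
) -> dict[str, Any]:
    """Poll until the trace is completed or failed, or the deadline passes."""
    deadline = time.monotonic() + max_wait_s
    while True:
        body = fetch(run_id)
        if body.get("status") in TERMINAL:
            return body
        # "running" also covers not-yet-ingested traces, so keep waiting.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"trace {run_id} still running after {max_wait_s}s")
        time.sleep(interval_s)
```

Because an unknown ID also reports "running", a client-side deadline like this is the only way for the caller to give up on a mistyped or lost trace ID.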

Database: llm_traces collection

Stored in the qwestly_internal MongoDB database. Records are written:

  1. On async dispatch: status: "running" with prompt_used, surface, user_id
  2. On sync completion: status: "completed" with full output
  3. On first terminal poll: status: "completed" or "failed" with output/error

| Field | Type | Description |
| --- | --- | --- |
| trace_id | string | LangSmith run ID |
| status | string | "running", "completed", or "failed" |
| prompt_used | object | { name, version } from the request |
| surface | string | Surface label (e.g. "interview-eval") |
| user_id | string | Auth0 user ID |
| input | object | Prompt input variables |
| output | any | LLM result (when completed) |
| error | string | Error message (when failed) |
| metadata | object | Arbitrary metadata |
| created_at | datetime | Record creation time |
| updated_at | datetime | Last update time |
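
For readers who prefer types to tables, the record shape above can be expressed as a TypedDict. This is an illustrative sketch only: the class names and the `new_running_trace` helper are hypothetical, and which fields are optional in practice is an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Optional, TypedDict


class PromptUsed(TypedDict):
    name: str
    version: Optional[str]  # commit hash, or None for latest


class LlmTrace(TypedDict, total=False):
    trace_id: str
    status: str  # "running", "completed", or "failed"
    prompt_used: PromptUsed
    surface: str
    user_id: str
    input: dict[str, Any]
    output: Any
    error: str
    metadata: dict[str, Any]
    created_at: datetime
    updated_at: datetime


def new_running_trace(
    trace_id: str,
    prompt_name: str,
    prompt_version: Optional[str] = None,
    surface: Optional[str] = None,
    user_id: Optional[str] = None,
) -> LlmTrace:
    """Build the record written on async dispatch."""
    now = datetime.now(timezone.utc)
    doc: LlmTrace = {
        "trace_id": trace_id,
        "status": "running",
        "prompt_used": {"name": prompt_name, "version": prompt_version},
        "created_at": now,
        "updated_at": now,
    }
    if surface is not None:
        doc["surface"] = surface
    if user_id is not None:
        doc["user_id"] = user_id
    return doc
```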

Document owner: Engineering
Last updated: June 2026