Prompt Validation Contract: Job Analyzer

Document Type: Prompt Validation Contract
Purpose: Define expected behavior, test scenarios, and evaluation metrics for the Job Analyzer LLM prompt
Status: Production Ready


Overview

Goal: Extract structured information from job postings with zero hallucination; only extract information explicitly present in the source text.

Problem: The LLM was generating fabricated job details (responsibilities, skills, compensation) when given minimal input such as a job title alone, which corrupted database accuracy.

Solution: Explicit omission rules: extract only what's present, and omit fields entirely if not mentioned. Better incomplete than incorrect.

| Metric | Target | Actual | Status |
| --- | --- | --- | --- |
| Hallucination Rate | < 5% | 0% | ✅ EXCEEDS |
| Extraction Accuracy | > 95% | 100% | ✅ EXCEEDS |
| Test Coverage | 100% | 100% | ✅ MEETS |

Evaluation

Test Results: 26/26 scenarios passed | Duration: 55.39s | Date: March 11, 2026
Test Suite: __tests__/llm/job-analyzer.test.ts

Test Files  1 passed (1)
     Tests  26 passed (26)
  Duration  55.39s

Key Improvements:

  • Equity fields correctly omitted when not mentioned
  • Job family allows omission for ambiguous titles
  • Explicit omission rules prevent inference
  • Field-specific extraction guidelines prevent over-extraction

Conclusion: All success criteria met. Production-ready.


Background

Developed to address data quality issues where the LLM fabricated information from minimal inputs (e.g., title-only). The solution prioritizes omission over inference: explicit rules prevent hallucination while maintaining extraction accuracy.

Design Principles:

  • Omission over inference (incomplete > incorrect)
  • Field-specific extraction rules
  • Test-driven validation (26 scenarios)
  • Production-first approach

Technical Reference

Field Extraction Rules

| Field | Rule | Example |
| --- | --- | --- |
| title | Always required. Extract verbatim from input. | "Senior Engineer" → "Senior Engineer" |
| employment_type | Only if explicitly mentioned ("full-time", "part-time", etc.). Do not infer from title. | "Senior Engineer" → null |
| location_type | Only if explicitly mentioned ("remote", "hybrid", "in-person"). Do not infer from location strings. | "San Francisco" → null |
| description | Only if paragraph description exists in input. Do not generate from title. May be rephrased/cleaned while preserving meaning. | Title only → null |
| responsibilities | Extract if explicitly mentioned in description (bullet points, numbered lists, or clearly stated in paragraphs). Empty array if none. | Title only → [] |
| must_have.skills | Only if explicitly listed as requirements. Do not infer from title or context. | Title only → [] |
| must_have.years_of_experience | Only if explicitly mentioned ("3+ years", "5-7 years"). Do not infer from "Senior" title. | "Senior Engineer" → omit |
| location | Only if explicitly mentioned. Do not infer from company name. | Omit if not present |
| company_name | Only if explicitly mentioned. Do not infer or fabricate. | Omit if not present |
| org_unit_name | Only if department/team name explicitly mentioned. | Omit if not present |
| team | Only if team description section exists. May be rephrased while preserving meaning. | Omit if not present |
| compensation | Only if salary/compensation explicitly mentioned. Do not infer or generate ranges. | Omit if not present |
| compensation.base_annual_salary.min | Only if minimum salary explicitly mentioned. | Omit if not present |
| compensation.base_annual_salary.max | Only if maximum salary explicitly mentioned. | Omit if not present |
| compensation.base_annual_salary.currency | Only if currency explicitly mentioned. | Omit if not present |
| benefits | Only if benefits section exists. Do not generate generic benefits. May be rephrased while preserving meaning. | Omit if not present |
| has_equity | Only if equity/stock options explicitly mentioned. Do not infer. | Omit if not present |
| equity_details | Only if equity details explicitly provided. May be rephrased while preserving meaning. | Omit if not present |
| must_have.industry_experience | Only if explicitly listed in requirements section. | Omit if not present |
| nice_to_have.skills | Only if explicitly listed in "nice to have" or "preferred" section. | Omit if not present |
| nice_to_have.soft_skills | Only if explicitly listed in preferred section. | Omit if not present |
| nice_to_have.company_stage_experience | Only if explicitly listed as preferred. | Omit if not present |
| nice_to_have.industry_experience | Only if explicitly listed as preferred (marked as "plus" or "nice to have"). | Omit if not present |
| job_family | Can infer from title if clear ("Software Engineer" → "engineering"). Use "other" if ambiguous. | "Senior Engineer" → "engineering" |
| is_epd | Set to true if EPD is explicitly mentioned OR if job_family is engineering/design/product. Omit if job_family is "other" and EPD is not mentioned. | "Head of EPD" → true |

All other fields: Follow same principle - only populate if explicitly present in input.
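
One way to make the omission rules concrete is to model the output as a type in which everything except title is optional. The sketch below is an assumption, not the project's actual schema: the type names (JobAnalysis, Requirements, YearsOfExperience) are hypothetical, the field names follow the table above and the test examples, and enumerated fields are typed as plain strings because the full set of allowed values is not listed in this document.

```typescript
// Hypothetical type names; field names follow the Field Extraction Rules table.
interface YearsOfExperience {
  min: number;
  field?: string; // e.g. "software engineering", only when the posting states it
}

interface Requirements {
  skills?: string[];
  soft_skills?: string[];
  years_of_experience?: YearsOfExperience;
  industry_experience?: string[];
  company_stage_experience?: string[];
}

interface JobAnalysis {
  title: string; // the only required field
  employment_type?: string | null;
  location_type?: string | null;
  description?: string | null;
  responsibilities?: string[];
  location?: string[];
  company_name?: string;
  org_unit_name?: string;
  team?: string;
  must_have?: Requirements;
  nice_to_have?: Requirements;
  compensation?: {
    base_annual_salary?: { min?: number; max?: number; currency?: string };
  };
  benefits?: string;
  has_equity?: boolean;
  equity_details?: string;
  job_family?: string; // assumed to be a plain string; allowed values are not enumerated here
  is_epd?: boolean;
}

// A title-only posting populates nothing beyond title and job_family.
const titleOnly: JobAnalysis = {
  title: "Senior Software Engineer",
  job_family: "engineering",
};
```

Making the fields optional rather than defaulting them lets a consumer distinguish "not mentioned" from "mentioned and empty", which is the distinction the omission rules rely on.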


Test Scenarios

| # | Scenario | Expected Behavior |
| --- | --- | --- |
| 1 | Title only | Only title populated (plus job_family when the title makes it clear). All other fields null/[]/omitted. |
| 2 | Title + employment/location type | Extract employment_type and location_type only. No fabricated descriptions or skills. |
| 3 | Description + responsibilities, no skills | Extract description and responsibilities. must_have.skills = []. |
| 4 | Skills mentioned in context (not requirements) | Extract description only. must_have.skills = [] (skills in context, not listed as requirements). |
| 5 | Employment type in title | Extract title and employment_type when the title states it explicitly (e.g., "Contract Engineer"). |
| 6 | Location mentioned, no work arrangement | Extract location and description. location_type = null (work arrangement not mentioned). |
| 7 | Paragraph description only, no bullets | Extract description. Extract responsibilities if clearly mentioned in description text, otherwise []. |
| 8 | Full requirements section | Extract must_have.skills, must_have.years_of_experience, must_have.soft_skills from requirements section. |
| 9 | Years of experience + company stage | Extract must_have.years_of_experience and must_have.company_stage_experience. Extract nice_to_have.industry_experience if marked as "plus". |
| 10 | Complete job posting with compensation | Extract all mentioned fields including compensation and benefits. No fabricated skills or responsibilities. |
| 11 | EPD role explicitly mentioned | Extract title and set is_epd=true. No fabricated details. |
| 12 | Empty/minimal input | Caught by validation. If it reaches the LLM, return minimal output with title = "Untitled Job". |
| 13 | Company name not mentioned | Extract title and description. company_name should be omitted (not fabricated). |
| 14 | Company name explicitly mentioned | Extract company_name only if explicitly stated in text. |
| 15 | Org unit name mentioned | Extract org_unit_name only if department/team name is explicitly mentioned (e.g., "Engineering Department", "Product Team"). |
| 16 | Team description section | Extract team field only if "About the Team" or similar section exists. Omit if not present. |
| 17 | Equity mentioned | Extract has_equity=true and equity_details only if equity/stock options explicitly mentioned. Do not infer. |
| 18 | Equity not mentioned | Extract title and other fields. has_equity and equity_details should be omitted (not inferred). |
| 19 | Nice-to-have skills section | Extract nice_to_have.skills only if "Nice to have" or "Preferred" section exists with skills listed. |
| 20 | Nice-to-have soft skills | Extract nice_to_have.soft_skills only if explicitly listed in preferred section. Omit if not present. |
| 21 | Nice-to-have company stage | Extract nice_to_have.company_stage_experience only if explicitly listed as preferred. |
| 22 | Must-have industry experience | Extract must_have.industry_experience only if explicitly listed in requirements (e.g., "5+ years in fintech"). |
| 23 | Compensation with range and currency | Extract compensation.base_annual_salary.min, max, and currency only if all explicitly mentioned (e.g., "$120,000 - $150,000 USD"). |
| 24 | Compensation partial information | If only salary range mentioned without currency, extract min and max only. currency should be omitted. |
| 25 | Job family ambiguous title | For ambiguous titles (e.g., "Technical Lead", "Manager"), set job_family = "other". Do not infer. |
| 26 | Multiple locations mentioned | Extract location as array if multiple locations mentioned. location_type should be omitted unless work arrangement explicitly stated. |

Test Examples

Note: Descriptions and text fields may be rephrased or cleaned up during extraction. The key requirement is that the extracted content must be traceable to the source text and contain the same information, not that it matches verbatim.

Scenario 1: Title only

Input:

Senior Software Engineer

Expected Output:

{
  "title": "Senior Software Engineer",
  "job_family": "engineering"
}

Negative Assertions: description, responsibilities, must_have, company_name, location, compensation, benefits should all be omitted.
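
Negative assertions like the one above could be checked mechanically with a small helper. This is a minimal sketch, not the code in the actual test suite: getPath, isAbsent, and findFabricated are hypothetical names, and treating null, an empty array, and a missing key as equally "absent" is an assumption based on the "null/[]/omitted" wording in the Test Scenarios table.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walk a dotted path such as "must_have.skills"; returns undefined if any segment is missing.
function getPath(value: Json | undefined, path: string): Json | undefined {
  let current: Json | undefined = value;
  for (const key of path.split(".")) {
    if (
      current === null ||
      current === undefined ||
      typeof current !== "object" ||
      Array.isArray(current)
    ) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Assumption: null, [] and a missing key all count as "not populated".
function isAbsent(value: Json | undefined): boolean {
  return (
    value === undefined ||
    value === null ||
    (Array.isArray(value) && value.length === 0)
  );
}

// Returns the paths that were populated even though they should be absent.
function findFabricated(output: Json, mustBeAbsent: string[]): string[] {
  return mustBeAbsent.filter((path) => !isAbsent(getPath(output, path)));
}
```

For the Scenario 1 expected output, calling findFabricated with the paths description, responsibilities, must_have, company_name, location, compensation, and benefits returns an empty list; any path it does return points at a fabricated field.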


Scenario 2: Title + employment/location type

Input:

Full-time Remote Senior Engineer

Expected Output:

{
  "title": "Full-time Remote Senior Engineer",
  "employment_type": "full-time",
  "location_type": "remote",
  "job_family": "engineering"
}

Negative Assertions: description, responsibilities, must_have.skills, compensation should all be omitted.


Scenario 3: Description + responsibilities, no skills

Input:

Software Engineer

We are looking for a software engineer to join our team.

Responsibilities:
- Build and maintain web applications
- Write clean, maintainable code
- Collaborate with cross-functional teams

Expected Output:

{
  "title": "Software Engineer",
  "description": "We are seeking a software engineer to join our team.",
  "responsibilities": [
    "Build and maintain web applications",
    "Write clean, maintainable code",
    "Collaborate with cross-functional teams"
  ],
  "job_family": "engineering"
}

Note: Description may be rephrased (e.g., "looking for" → "seeking") as long as the meaning is preserved.

Negative Assertions: must_have.skills should be omitted (no requirements section exists).


Scenario 4: Skills mentioned in context (not requirements)

Input:

Software Engineer

We use Python and JavaScript in our tech stack. The ideal candidate will work with these technologies daily.

Expected Output:

{
  "title": "Software Engineer",
  "description": "Our tech stack includes Python and JavaScript. Candidates will work with these technologies on a daily basis.",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving the same information about the tech stack.

Negative Assertions: must_have.skills should be omitted (skills mentioned in context, not as requirements).


Scenario 5: Employment type in title

Input:

Contract Engineer

Expected Output:

{
  "title": "Contract Engineer",
  "employment_type": "contract",
  "job_family": "engineering"
}

Scenario 6: Location mentioned, no work arrangement

Input:

Senior Engineer

Location: San Francisco

We are looking for an engineer based in San Francisco.

Expected Output:

{
  "title": "Senior Engineer",
  "location": ["San Francisco"],
  "description": "We are seeking an engineer located in San Francisco.",
  "job_family": "engineering"
}

Note: Description may be rephrased (e.g., "based in" → "located in") while preserving location information.

Negative Assertions: location_type should be omitted (work arrangement not mentioned).


Scenario 7: Paragraph description only, no bullets

Input:

Product Manager

We are seeking a Product Manager who will be responsible for defining product strategy, working with engineering teams, and launching new features. The role involves gathering requirements, prioritizing features, and coordinating releases.

Expected Output:

{
  "title": "Product Manager",
  "description": "We are looking for a Product Manager responsible for product strategy, engineering collaboration, and feature launches. The role includes requirements gathering, feature prioritization, and release coordination.",
  "responsibilities": [],
  "job_family": "product"
}

Note: Description may be rephrased and condensed while preserving key information. If responsibilities are clearly mentioned in paragraph form, they may be extracted. Otherwise, empty array.


Scenario 8: Full requirements section

Input:

Senior Engineer

Requirements:
- 5+ years of software engineering experience
- Python, JavaScript, React
- Strong communication skills
- Ability to work in a fast-paced environment

Expected Output:

{
  "title": "Senior Engineer",
  "must_have": {
    "years_of_experience": {
      "min": 5,
      "field": "software engineering"
    },
    "skills": ["Python", "JavaScript", "React"],
    "soft_skills": ["Strong communication skills", "Ability to work in a fast-paced environment"]
  },
  "job_family": "engineering"
}

Scenario 9: Years of experience + company stage

Input:

Senior Engineer

Requirements:
- 5+ years of experience
- Experience at early-stage startups

Nice to have:
- Fintech industry experience (plus)

Expected Output:

{
  "title": "Senior Engineer",
  "must_have": {
    "years_of_experience": {
      "min": 5
    },
    "company_stage_experience": ["early-stage startups"]
  },
  "nice_to_have": {
    "industry_experience": ["Fintech"]
  },
  "job_family": "engineering"
}

Scenario 10: Complete job posting with compensation

Input:

Senior Software Engineer

We are looking for a Senior Software Engineer to join our team.

Responsibilities:
- Design and implement scalable systems
- Mentor junior engineers
- Participate in code reviews

Requirements:
- 5+ years of experience
- Python, JavaScript

Salary: $150,000 - $180,000 USD

Benefits: Health insurance, 401k, unlimited PTO

Expected Output:

{
  "title": "Senior Software Engineer",
  "description": "We are seeking a Senior Software Engineer to join our team.",
  "responsibilities": [
    "Design and implement scalable systems",
    "Mentor junior engineers",
    "Participate in code reviews"
  ],
  "must_have": {
    "years_of_experience": {
      "min": 5
    },
    "skills": ["Python", "JavaScript"]
  },
  "compensation": {
    "base_annual_salary": {
      "min": 150000,
      "max": 180000,
      "currency": "USD"
    }
  },
  "benefits": "Health insurance, 401k, unlimited PTO",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving the core message.


Scenario 11: EPD role explicitly mentioned

Input:

Head of EPD

Expected Output:

{
  "title": "Head of EPD",
  "is_epd": true,
  "job_family": "other"
}

Note: EPD roles should set is_epd=true even if job_family is "other".
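
The is_epd rule, as stated in the Field Extraction Rules table and the note above, can be expressed as a small deterministic check that a test could compare against the LLM output. This is a sketch under assumptions: deriveIsEpd and EPD_FAMILIES are hypothetical names, and "explicitly mentioned" is interpreted narrowly as the standalone acronym EPD appearing in the source text.

```typescript
// Job families that imply is_epd according to the rules table.
const EPD_FAMILIES = ["engineering", "design", "product"];

// Returns true when the rule applies, otherwise undefined so the field is omitted.
function deriveIsEpd(
  sourceText: string,
  jobFamily: string | undefined
): true | undefined {
  // Assumption: only the standalone acronym "EPD" counts as an explicit mention.
  const mentionsEpd = /\bEPD\b/.test(sourceText);
  const isEpdFamily =
    jobFamily !== undefined && EPD_FAMILIES.indexOf(jobFamily) !== -1;
  return mentionsEpd || isEpdFamily ? true : undefined;
}
```

Returning undefined rather than false keeps the field out of the serialized output, which matches the "omit" behavior described for roles with no EPD signal.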


Scenario 12: Empty/minimal input

Input:

   

Expected Output:

{
  "title": "Untitled Job"
}

Note: Should be caught by validation before reaching LLM, but if it reaches LLM, return minimal output.
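
The validation step mentioned in the note could look like the following guard, which runs before any LLM call. It is a minimal sketch: InputCheck, checkJobInput, and FALLBACK_OUTPUT are hypothetical names, and the only check shown (rejecting empty or whitespace-only text) is an assumption about what "minimal input" validation covers.

```typescript
// Hypothetical names; not taken from the project's codebase.
type InputCheck =
  | { ok: true; text: string }
  | { ok: false; reason: string };

// Reject blank input before it is sent to the LLM.
function checkJobInput(raw: string): InputCheck {
  const text = raw.trim();
  if (text.length === 0) {
    return { ok: false, reason: "input is empty or whitespace-only" };
  }
  return { ok: true, text };
}

// Minimal output for the case where blank input reaches the LLM anyway (Scenario 12).
const FALLBACK_OUTPUT = { title: "Untitled Job" };
```

Keeping the guard separate from the prompt means the "Untitled Job" fallback only needs to cover the case where validation is bypassed.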


Scenario 13: Company name not mentioned

Input:

Senior Software Engineer

We are looking for an experienced engineer to join our team. You will work on building scalable systems.

Expected Output:

{
  "title": "Senior Software Engineer",
  "description": "We are seeking an experienced engineer to join our team. You will work on building scalable systems.",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving the core information.

Negative Assertions: company_name must be omitted (not fabricated).


Scenario 14: Company name explicitly mentioned

Input:

Senior Software Engineer at Acme Corp

We are looking for an experienced engineer to join Acme Corp's engineering team.

Expected Output:

{
  "title": "Senior Software Engineer at Acme Corp",
  "company_name": "Acme Corp",
  "description": "We are seeking an experienced engineer to join Acme Corp's engineering team.",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving company name and core information.


Scenario 15: Org unit name mentioned

Input:

Product Manager

Join our Product Team. We are looking for a Product Manager in the Engineering Department.

Expected Output:

{
  "title": "Product Manager",
  "org_unit_name": "Product Team",
  "description": "Join our Product Team. We are seeking a Product Manager for the Engineering Department.",
  "job_family": "product"
}

Note: Description may be rephrased while preserving org unit information. If multiple department names are mentioned, extract the most relevant one or the first one.


Scenario 16: Team description section

Input:

Software Engineer

About the Team:
Our engineering team is passionate about building great products. We value collaboration and innovation.

Job Description:
We are looking for a software engineer...

Expected Output:

{
  "title": "Software Engineer",
  "team": "Our engineering team is passionate about building great products. We value collaboration and innovation.",
  "description": "We are seeking a software engineer...",
  "job_family": "engineering"
}

Note: Team description may be rephrased while preserving the core message about the team's values and culture.

Negative Assertions: If no "About the Team" section exists, team should be omitted.


Scenario 17: Equity mentioned

Input:

Senior Engineer

We offer competitive salary plus equity package. Stock options will vest over 4 years.

Expected Output:

{
  "title": "Senior Engineer",
  "description": "We offer competitive salary plus equity package. Stock options vest over 4 years.",
  "has_equity": true,
  "equity_details": "Stock options vest over 4 years",
  "job_family": "engineering"
}

Note: Description and equity_details may be rephrased while preserving the same information about equity and vesting.


Scenario 18: Equity not mentioned

Input:

Senior Engineer

We offer competitive salary and benefits.

Expected Output:

{
  "title": "Senior Engineer",
  "description": "We offer competitive salary and benefits.",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving information about compensation and benefits.

Negative Assertions: has_equity and equity_details must be omitted (not inferred).


Scenario 19: Nice-to-have skills section

Input:

Software Engineer

Requirements:
- 3+ years of experience
- Python, JavaScript

Nice to have:
- React experience
- TypeScript
- GraphQL

Expected Output:

{
  "title": "Software Engineer",
  "must_have": {
    "years_of_experience": {
      "min": 3
    },
    "skills": ["Python", "JavaScript"]
  },
  "nice_to_have": {
    "skills": ["React", "TypeScript", "GraphQL"]
  },
  "job_family": "engineering"
}

Negative Assertions: Skills mentioned only in "Nice to have" should NOT appear in must_have.skills.


Scenario 20: Nice-to-have soft skills

Input:

Product Manager

Requirements:
- Strong communication skills
- 5+ years experience

Preferred:
- Leadership experience
- Public speaking ability

Expected Output:

{
  "title": "Product Manager",
  "must_have": {
    "years_of_experience": {
      "min": 5
    },
    "soft_skills": ["Strong communication skills"]
  },
  "nice_to_have": {
    "soft_skills": ["Leadership experience", "Public speaking ability"]
  },
  "job_family": "product"
}

Scenario 21: Nice-to-have company stage

Input:

Senior Engineer

Requirements:
- 5+ years experience

Nice to have:
- Experience at early-stage startups
- Experience at Series B+ companies

Expected Output:

{
  "title": "Senior Engineer",
  "must_have": {
    "years_of_experience": {
      "min": 5
    }
  },
  "nice_to_have": {
    "company_stage_experience": ["early-stage startups", "Series B+ companies"]
  },
  "job_family": "engineering"
}

Scenario 22: Must-have industry experience

Input:

Senior Engineer

Requirements:
- 5+ years of software engineering experience
- 3+ years in fintech industry
- Experience with payment systems

Expected Output:

{
  "title": "Senior Engineer",
  "must_have": {
    "years_of_experience": {
      "min": 5,
      "field": "software engineering"
    },
    "industry_experience": ["fintech"],
    "skills": ["payment systems"]
  },
  "job_family": "engineering"
}

Scenario 23: Compensation with range and currency

Input:

Senior Engineer

Salary: $120,000 - $150,000 USD per year

Expected Output:

{
  "title": "Senior Engineer",
  "compensation": {
    "base_annual_salary": {
      "min": 120000,
      "max": 150000,
      "currency": "USD"
    }
  },
  "job_family": "engineering"
}

Note: Currency codes should be standardized (USD, EUR, GBP, etc.).


Scenario 24: Compensation partial information

Input:

Senior Engineer

Salary: $120,000 - $150,000 per year

Expected Output:

{
  "title": "Senior Engineer",
  "compensation": {
    "base_annual_salary": {
      "min": 120000,
      "max": 150000
    }
  },
  "job_family": "engineering"
}

Negative Assertions: currency should be omitted if not explicitly mentioned.
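
Scenarios 23 and 24 differ only in whether a currency code appears in the text. The sketch below shows a deterministic parser that a test might use to cross-check the LLM output; it is not the extraction logic itself. parseSalaryRange and SALARY_RANGE are hypothetical names, the regular expression only handles the "min - max" format used in these examples, and a bare "$" is deliberately not treated as a currency, following Scenario 24.

```typescript
interface BaseAnnualSalary {
  min?: number;
  max?: number;
  currency?: string;
}

// Matches "120,000 - $150,000" with an optional three-letter uppercase code after it.
const SALARY_RANGE = /(\d[\d,]*)\s*-\s*\$?\s*(\d[\d,]*)(?:\s+([A-Z]{3})\b)?/;

function parseSalaryRange(text: string): BaseAnnualSalary | undefined {
  const match = SALARY_RANGE.exec(text);
  if (match === null) {
    return undefined; // no range mentioned: compensation is omitted entirely
  }
  const toNumber = (s: string): number => Number(s.replace(/,/g, ""));
  const salary: BaseAnnualSalary = {
    min: toNumber(match[1]),
    max: toNumber(match[2]),
  };
  // Only set currency when a code such as "USD" is written out in the text.
  if (match[3] !== undefined) {
    salary.currency = match[3];
  }
  return salary;
}
```

With the Scenario 23 input the result carries min, max, and currency "USD"; with the Scenario 24 input the currency key is absent rather than guessed from the dollar sign.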


Scenario 25: Job family ambiguous title

Input:

Technical Lead

Expected Output:

{
  "title": "Technical Lead",
  "job_family": "other"
}

Note: Titles like "Manager", "Lead", "Director" without clear domain context should use "other".


Scenario 26: Multiple locations mentioned

Input:

Senior Engineer

Locations: San Francisco, New York, Remote

We are looking for engineers in multiple locations.

Expected Output:

{
  "title": "Senior Engineer",
  "location": ["San Francisco", "New York", "Remote"],
  "description": "We are seeking engineers for multiple locations.",
  "job_family": "engineering"
}

Note: Description may be rephrased while preserving location information.

Negative Assertions: location_type should be omitted unless work arrangement (remote/hybrid/in-person) is explicitly stated separately from location names.


Success Criteria

  • Hallucination Rate: < 5% of fields fabricated when input is minimal
  • Extraction Accuracy: > 95% of populated fields traceable to source text
  • Test Coverage: 100% of scenarios pass
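
The first two criteria can be computed from per-field check results. The sketch below is one possible reading of the definitions above, not the project's actual scoring code: FieldCheck and summarize are hypothetical names, hallucination rate is taken over populated fields in minimal-input scenarios only, and extraction accuracy is taken over all populated fields.

```typescript
// One record per populated field in a scenario's output (hypothetical shape).
interface FieldCheck {
  field: string;
  traceable: boolean; // true if the value can be traced to the source text
  minimalInput: boolean; // true if the scenario's input was minimal (e.g. title only)
}

// Percentage helper; returns 0 when there is nothing to count.
function percent(part: number, whole: number): number {
  return whole === 0 ? 0 : (part / whole) * 100;
}

function summarize(checks: FieldCheck[]): {
  hallucinationRate: number;
  extractionAccuracy: number;
} {
  const minimal = checks.filter((c) => c.minimalInput);
  const fabricated = minimal.filter((c) => !c.traceable);
  const traceable = checks.filter((c) => c.traceable);
  return {
    hallucinationRate: percent(fabricated.length, minimal.length),
    extractionAccuracy: percent(traceable.length, checks.length),
  };
}
```

Under this reading, a run passes when hallucinationRate is below 5 and extractionAccuracy is above 95.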

Implementation Requirements

  1. Prompt Updates: Add explicit "no fabrication" rule and field-specific extraction instructions
  2. Validation: Post-process to ensure extracted fields exist in source text
  3. Tests: Comprehensive test suite covering all scenarios above
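
For requirement 2, a post-processing check could flag extracted values that do not appear in the source text. This is a minimal sketch assuming a case-insensitive substring match is sufficient for short verbatim fields such as skills, company_name, and location; findUntraceable is a hypothetical name, and fields that may be rephrased (description, team, benefits, equity_details) would need a looser comparison than the one shown here.

```typescript
// Returns the values that cannot be found in the source text (case-insensitive).
function findUntraceable(source: string, values: string[]): string[] {
  const haystack = source.toLowerCase();
  return values.filter((value) => haystack.indexOf(value.toLowerCase()) === -1);
}

// Example source text taken from Scenario 19.
const posting = [
  "Software Engineer",
  "Requirements:",
  "- 3+ years of experience",
  "- Python, JavaScript",
  "Nice to have:",
  "- React experience",
  "- TypeScript",
  "- GraphQL",
].join("\n");

// "Kubernetes" does not appear in the posting, so it is reported as untraceable.
const suspicious = findUntraceable(posting, ["Python", "React", "Kubernetes"]);
```

A non-empty result is a signal to drop the value or fail the extraction, in line with the "better incomplete than incorrect" principle.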