Prompt Validation Contract: Job Analyzer
Document Type: Prompt Validation Contract
Purpose: Define expected behavior, test scenarios, and evaluation metrics for LLM prompt
Status: Production Ready
Overview
Goal: Extract structured information from job postings with zero hallucination, extracting only information explicitly present in the source text.
Problem: The LLM generated fabricated job details (responsibilities, skills, compensation) when given minimal input such as a job title alone, corrupting database accuracy.
Solution: Explicit omission rules. Extract only what is present and omit fields entirely if they are not mentioned. Better incomplete than incorrect.
| Metric | Target | Actual | Status |
|---|---|---|---|
| Hallucination Rate | < 5% | 0% | ✅ EXCEEDS |
| Extraction Accuracy | > 95% | 100% | ✅ EXCEEDS |
| Test Coverage | 100% | 100% | ✅ MEETS |
Evaluation
Test Results: 26/26 scenarios passed | Duration: 55.39s | Date: March 11, 2026
Test Suite: __tests__/llm/job-analyzer.test.ts
Test Files 1 passed (1)
Tests 26 passed (26)
Duration 55.39s
Key Improvements:
- Equity fields correctly omitted when not mentioned
- Job family allows omission for ambiguous titles
- Explicit omission rules prevent inference
- Field-specific extraction guidelines prevent over-extraction
Conclusion: All success criteria met. Production-ready.
Background
Developed to address data quality issues where the LLM fabricated information from minimal inputs (e.g., title-only postings). The solution prioritizes omission over inference: explicit rules prevent hallucination while maintaining extraction accuracy.
Design Principles:
- Omission over inference (incomplete > incorrect)
- Field-specific extraction rules
- Test-driven validation (26 scenarios)
- Production-first approach
Technical Reference
Field Extraction Rules
| Field | Rule | Example |
|---|---|---|
| title | Always required. Extract verbatim from input. | "Senior Engineer" → "Senior Engineer" |
| employment_type | Only if explicitly mentioned ("full-time", "part-time", etc.). Do not infer from title. | "Senior Engineer" → null |
| location_type | Only if explicitly mentioned ("remote", "hybrid", "in-person"). Do not infer from location strings. | "San Francisco" → null |
| description | Only if paragraph description exists in input. Do not generate from title. May be rephrased/cleaned while preserving meaning. | Title only → null |
| responsibilities | Extract if explicitly mentioned in description (bullet points, numbered lists, or clearly stated in paragraphs). Empty array if none. | Title only → [] |
| must_have.skills | Only if explicitly listed as requirements. Do not infer from title or context. | Title only → [] |
| must_have.years_of_experience | Only if explicitly mentioned ("3+ years", "5-7 years"). Do not infer from "Senior" title. | "Senior Engineer" → omit |
| location | Only if explicitly mentioned. Do not infer from company name. | Omit if not present |
| company_name | Only if explicitly mentioned. Do not infer or fabricate. | Omit if not present |
| org_unit_name | Only if department/team name explicitly mentioned. | Omit if not present |
| team | Only if team description section exists. May be rephrased while preserving meaning. | Omit if not present |
| compensation | Only if salary/compensation explicitly mentioned. Do not infer or generate ranges. | Omit if not present |
| compensation.base_annual_salary.min | Only if minimum salary explicitly mentioned. | Omit if not present |
| compensation.base_annual_salary.max | Only if maximum salary explicitly mentioned. | Omit if not present |
| compensation.base_annual_salary.currency | Only if currency explicitly mentioned. | Omit if not present |
| benefits | Only if benefits section exists. Do not generate generic benefits. May be rephrased while preserving meaning. | Omit if not present |
| has_equity | Only if equity/stock options explicitly mentioned. Do not infer. | Omit if not present |
| equity_details | Only if equity details explicitly provided. May be rephrased while preserving meaning. | Omit if not present |
| must_have.industry_experience | Only if explicitly listed in requirements section. | Omit if not present |
| nice_to_have.skills | Only if explicitly listed in "nice to have" or "preferred" section. | Omit if not present |
| nice_to_have.soft_skills | Only if explicitly listed in preferred section. | Omit if not present |
| nice_to_have.company_stage_experience | Only if explicitly listed as preferred. | Omit if not present |
| nice_to_have.industry_experience | Only if explicitly listed as preferred (marked as "plus" or "nice to have"). | Omit if not present |
| job_family | Can infer from title if clear ("Software Engineer" → "engineering"). Use "other" if ambiguous. | "Senior Engineer" → "engineering" |
| is_epd | Set to true if EPD explicitly mentioned OR if job_family is engineering/design/product. Omit if job_family is "other". | "Head of EPD" → true |
All other fields: Follow the same principle and populate a field only if it is explicitly present in the input.
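The rules above imply an output shape in which only title is required and every other field is optional. A minimal TypeScript sketch of that shape follows. It is inferred from the examples in this document, so the type names (`JobAnalysis`, `Requirements`, `YearsOfExperience`) and the `max` bound on years of experience are assumptions, and the real schema may differ.

```typescript
// Hypothetical types inferred from the examples in this document; the real schema may differ.
interface YearsOfExperience {
  min?: number;
  max?: number; // assumed, to cover ranges such as "5-7 years"
  field?: string;
}

interface Requirements {
  skills?: string[];
  soft_skills?: string[];
  years_of_experience?: YearsOfExperience;
  industry_experience?: string[];
  company_stage_experience?: string[];
}

interface JobAnalysis {
  title: string; // the only required field
  employment_type?: string | null;
  location_type?: string | null;
  location?: string[];
  description?: string | null;
  responsibilities?: string[];
  company_name?: string;
  org_unit_name?: string;
  team?: string;
  must_have?: Requirements;
  nice_to_have?: Requirements;
  compensation?: {
    base_annual_salary?: { min?: number; max?: number; currency?: string };
  };
  benefits?: string;
  has_equity?: boolean;
  equity_details?: string;
  job_family?: string;
  is_epd?: boolean;
}

// A title-only result type-checks with every optional field left out.
const titleOnly: JobAnalysis = {
  title: "Senior Software Engineer",
  job_family: "engineering",
};
```

Making the fields optional rather than defaulting them keeps "not mentioned" distinguishable from "mentioned and empty" in downstream code.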
Test Scenarios
| # | Scenario | Expected Behavior |
|---|---|---|
| 1 | Title only | Only title populated, plus job_family when it is clear from the title. All other fields null/[]/omitted. |
| 2 | Title + employment/location type | Extract employment_type and location_type only. No fabricated descriptions or skills. |
| 3 | Description + responsibilities, no skills | Extract description and responsibilities. must_have.skills = []. |
| 4 | Skills mentioned in context (not requirements) | Extract description only. must_have.skills = [] (skills in context, not listed as requirements). |
| 5 | Employment type in title | Extract title and employment_type when the title states the employment type explicitly (e.g., "Contract Engineer" → "contract"). Do not infer it from seniority or role. |
| 6 | Location mentioned, no work arrangement | Extract location and description. location_type = null (work arrangement not mentioned). |
| 7 | Paragraph description only, no bullets | Extract description. Extract responsibilities if clearly mentioned in description text, otherwise []. |
| 8 | Full requirements section | Extract must_have.skills, must_have.years_of_experience, must_have.soft_skills from requirements section. |
| 9 | Years of experience + company stage | Extract must_have.years_of_experience and must_have.company_stage_experience. Extract nice_to_have.industry_experience if marked as "plus". |
| 10 | Complete job posting with compensation | Extract all mentioned fields including compensation and benefits. No fabricated skills or responsibilities. |
| 11 | EPD role explicitly mentioned | Extract title and set is_epd=true. No fabricated details. |
| 12 | Empty/minimal input | Caught by validation. If it reaches the LLM, return minimal output with title = "Untitled Job". |
| 13 | Company name not mentioned | Extract title and description. company_name should be omitted (not fabricated). |
| 14 | Company name explicitly mentioned | Extract company_name only if explicitly stated in text. |
| 15 | Org unit name mentioned | Extract org_unit_name only if department/team name is explicitly mentioned (e.g., "Engineering Department", "Product Team"). |
| 16 | Team description section | Extract team field only if "About the Team" or similar section exists. Omit if not present. |
| 17 | Equity mentioned | Extract has_equity=true and equity_details only if equity/stock options explicitly mentioned. Do not infer. |
| 18 | Equity not mentioned | Extract title and other fields. has_equity and equity_details should be omitted (not inferred). |
| 19 | Nice-to-have skills section | Extract nice_to_have.skills only if "Nice to have" or "Preferred" section exists with skills listed. |
| 20 | Nice-to-have soft skills | Extract nice_to_have.soft_skills only if explicitly listed in preferred section. Omit if not present. |
| 21 | Nice-to-have company stage | Extract nice_to_have.company_stage_experience only if explicitly listed as preferred. |
| 22 | Must-have industry experience | Extract must_have.industry_experience only if explicitly listed in requirements (e.g., "5+ years in fintech"). |
| 23 | Compensation with range and currency | Extract compensation.base_annual_salary.min, max, and currency only if all explicitly mentioned (e.g., "$120,000 - $150,000 USD"). |
| 24 | Compensation partial information | If only salary range mentioned without currency, extract min and max only. currency should be omitted. |
| 25 | Job family ambiguous title | For ambiguous titles (e.g., "Technical Lead", "Manager"), set job_family = "other". Do not infer. |
| 26 | Multiple locations mentioned | Extract location as array if multiple locations mentioned. location_type should be omitted unless work arrangement explicitly stated. |
Test Examples
Note: Descriptions and text fields may be rephrased or cleaned up during extraction. The key requirement is that the extracted content must be traceable to the source text and contain the same information, not that it matches verbatim.
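Because rephrasing is allowed, an exact string match is too strict for free-text fields. One possible way to approximate "traceable to the source text" is a word-overlap score, sketched below. The function name `wordOverlap` and the idea of thresholding the score are assumptions for illustration, not part of the actual test suite, and a heuristic like this cannot replace human or LLM-based review.

```typescript
// Hypothetical heuristic: share of words in the extracted text that also occur in the source.
function wordOverlap(source: string, extracted: string): number {
  const words = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const sourceWords = new Set(words(source));
  const extractedWords = words(extracted);
  if (extractedWords.length === 0) return 1; // nothing extracted, nothing to fabricate
  const shared = extractedWords.filter((w) => sourceWords.has(w)).length;
  return shared / extractedWords.length;
}

// A light rephrase ("looking for" -> "seeking") keeps the score high.
const rephraseScore = wordOverlap(
  "We are looking for a software engineer to join our team.",
  "We are seeking a software engineer to join our team."
);
```

A fabricated description built from a title alone shares almost no words with the source, so it scores near zero, while a light rephrase stays close to one.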
Scenario 1: Title only
Input:
Senior Software Engineer
Expected Output:
{
"title": "Senior Software Engineer",
"job_family": "engineering"
}
Negative Assertions: description, responsibilities, must_have, company_name, location, compensation, benefits should all be omitted.
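Negative assertions like the one above can be checked mechanically. The sketch below is a hypothetical helper (`populatedPaths` is not taken from the real test suite) that reports which forbidden fields are populated, treating undefined, null, and empty arrays as omitted to match the "null/[]/omitted" wording in the scenario table.

```typescript
// Hypothetical helper: returns the dotted paths that are populated but should be absent.
// A field counts as omitted if it is undefined, null, or an empty array.
function populatedPaths(output: Record<string, unknown>, paths: string[]): string[] {
  return paths.filter((path) => {
    let current: unknown = output;
    for (const key of path.split(".")) {
      if (current === null || typeof current !== "object") return false;
      current = (current as Record<string, unknown>)[key];
    }
    if (current === undefined || current === null) return false;
    if (Array.isArray(current) && current.length === 0) return false;
    return true;
  });
}

const scenario1Output = { title: "Senior Software Engineer", job_family: "engineering" };
const violations = populatedPaths(scenario1Output, [
  "description", "responsibilities", "must_have", "company_name",
  "location", "compensation", "benefits",
]);
// violations is empty here: none of the forbidden fields are populated.
```

Returning the offending paths instead of a boolean makes a failing test message say exactly which field was fabricated.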
Scenario 2: Title + employment/location type
Input:
Full-time Remote Senior Engineer
Expected Output:
{
"title": "Full-time Remote Senior Engineer",
"employment_type": "full-time",
"location_type": "remote",
"job_family": "engineering"
}
Negative Assertions: description, responsibilities, must_have.skills, compensation should all be omitted.
Scenario 3: Description + responsibilities, no skills
Input:
Software Engineer
We are looking for a software engineer to join our team.
Responsibilities:
- Build and maintain web applications
- Write clean, maintainable code
- Collaborate with cross-functional teams
Expected Output:
{
"title": "Software Engineer",
"description": "We are seeking a software engineer to join our team.",
"responsibilities": [
"Build and maintain web applications",
"Write clean, maintainable code",
"Collaborate with cross-functional teams"
],
"job_family": "engineering"
}
Note: Description may be rephrased (e.g., "looking for" → "seeking") as long as the meaning is preserved.
Negative Assertions: must_have.skills should be omitted (no requirements section exists).
Scenario 4: Skills mentioned in context (not requirements)
Input:
Software Engineer
We use Python and JavaScript in our tech stack. The ideal candidate will work with these technologies daily.
Expected Output:
{
"title": "Software Engineer",
"description": "Our tech stack includes Python and JavaScript. Candidates will work with these technologies on a daily basis.",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving the same information about the tech stack.
Negative Assertions: must_have.skills should be omitted (skills mentioned in context, not as requirements).
Scenario 5: Employment type in title
Input:
Contract Engineer
Expected Output:
{
"title": "Contract Engineer",
"employment_type": "contract",
"job_family": "engineering"
}
Scenario 6: Location mentioned, no work arrangement
Input:
Senior Engineer
Location: San Francisco
We are looking for an engineer based in San Francisco.
Expected Output:
{
"title": "Senior Engineer",
"location": ["San Francisco"],
"description": "We are seeking an engineer located in San Francisco.",
"job_family": "engineering"
}
Note: Description may be rephrased (e.g., "based in" → "located in") while preserving location information.
Negative Assertions: location_type should be omitted (work arrangement not mentioned).
Scenario 7: Paragraph description only, no bullets
Input:
Product Manager
We are seeking a Product Manager who will be responsible for defining product strategy, working with engineering teams, and launching new features. The role involves gathering requirements, prioritizing features, and coordinating releases.
Expected Output:
{
"title": "Product Manager",
"description": "We are looking for a Product Manager responsible for product strategy, engineering collaboration, and feature launches. The role includes requirements gathering, feature prioritization, and release coordination.",
"responsibilities": [],
"job_family": "product"
}
Note: Description may be rephrased and condensed while preserving key information. If responsibilities are clearly mentioned in paragraph form, they may be extracted. Otherwise, empty array.
Scenario 8: Full requirements section
Input:
Senior Engineer
Requirements:
- 5+ years of software engineering experience
- Python, JavaScript, React
- Strong communication skills
- Ability to work in a fast-paced environment
Expected Output:
{
"title": "Senior Engineer",
"must_have": {
"years_of_experience": {
"min": 5,
"field": "software engineering"
},
"skills": ["Python", "JavaScript", "React"],
"soft_skills": ["Strong communication skills", "Ability to work in a fast-paced environment"]
},
"job_family": "engineering"
}
Scenario 9: Years of experience + company stage
Input:
Senior Engineer
Requirements:
- 5+ years of experience
- Experience at early-stage startups
Nice to have:
- Fintech industry experience (plus)
Expected Output:
{
"title": "Senior Engineer",
"must_have": {
"years_of_experience": {
"min": 5
},
"company_stage_experience": ["early-stage startups"]
},
"nice_to_have": {
"industry_experience": ["Fintech"]
},
"job_family": "engineering"
}
Scenario 10: Complete job posting with compensation
Input:
Senior Software Engineer
We are looking for a Senior Software Engineer to join our team.
Responsibilities:
- Design and implement scalable systems
- Mentor junior engineers
- Participate in code reviews
Requirements:
- 5+ years of experience
- Python, JavaScript
Salary: $150,000 - $180,000 USD
Benefits: Health insurance, 401k, unlimited PTO
Expected Output:
{
"title": "Senior Software Engineer",
"description": "We are seeking a Senior Software Engineer to join our team.",
"responsibilities": [
"Design and implement scalable systems",
"Mentor junior engineers",
"Participate in code reviews"
],
"must_have": {
"years_of_experience": {
"min": 5
},
"skills": ["Python", "JavaScript"]
},
"compensation": {
"base_annual_salary": {
"min": 150000,
"max": 180000,
"currency": "USD"
}
},
"benefits": "Health insurance, 401k, unlimited PTO",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving the core message.
Scenario 11: EPD role explicitly mentioned
Input:
Head of EPD
Expected Output:
{
"title": "Head of EPD",
"is_epd": true,
"job_family": "other"
}
Note: EPD roles should set is_epd=true even if job_family is "other".
Scenario 12: Empty/minimal input
Input: (empty)
Expected Output:
{
"title": "Untitled Job"
}
Note: Should be caught by validation before reaching LLM, but if it reaches LLM, return minimal output.
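The pre-LLM validation mentioned in the note could be as small as the guard below. This is a sketch under assumptions: the names `checkJobInput` and `UNTITLED_FALLBACK` are hypothetical, and the project's real validation may apply additional rules (for example, a minimum length).

```typescript
// Hypothetical pre-LLM guard; names and behaviour are assumptions based on the note above,
// not the project's actual validation code.
type InputCheck =
  | { ok: true; text: string }
  | { ok: false; reason: string };

function checkJobInput(raw: string | null | undefined): InputCheck {
  const text = (raw ?? "").trim();
  if (text.length === 0) {
    return { ok: false, reason: "Job posting text is empty" };
  }
  return { ok: true, text };
}

// Fallback mirroring the expected output above, for the case where
// an empty input reaches the analyzer anyway.
const UNTITLED_FALLBACK = { title: "Untitled Job" };
```

Rejecting empty input before the model call avoids spending a request on a case where the only correct output is the fallback.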
Scenario 13: Company name not mentioned
Input:
Senior Software Engineer
We are looking for an experienced engineer to join our team. You will work on building scalable systems.
Expected Output:
{
"title": "Senior Software Engineer",
"description": "We are seeking an experienced engineer to join our team. You will work on building scalable systems.",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving the core information.
Negative Assertions: company_name must be omitted (not fabricated).
Scenario 14: Company name explicitly mentioned
Input:
Senior Software Engineer at Acme Corp
We are looking for an experienced engineer to join Acme Corp's engineering team.
Expected Output:
{
"title": "Senior Software Engineer at Acme Corp",
"company_name": "Acme Corp",
"description": "We are seeking an experienced engineer to join Acme Corp's engineering team.",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving company name and core information.
Scenario 15: Org unit name mentioned
Input:
Product Manager
Join our Product Team. We are looking for a Product Manager in the Engineering Department.
Expected Output:
{
"title": "Product Manager",
"org_unit_name": "Product Team",
"description": "Join our Product Team. We are seeking a Product Manager for the Engineering Department.",
"job_family": "product"
}
Note: Description may be rephrased while preserving org unit information. If multiple department names are mentioned, extract the most relevant one or the first one.
Scenario 16: Team description section
Input:
Software Engineer
About the Team:
Our engineering team is passionate about building great products. We value collaboration and innovation.
Job Description:
We are looking for a software engineer...
Expected Output:
{
"title": "Software Engineer",
"team": "Our engineering team is passionate about building great products. We value collaboration and innovation.",
"description": "We are seeking a software engineer...",
"job_family": "engineering"
}
Note: Team description may be rephrased while preserving the core message about the team's values and culture.
Negative Assertions: If no "About the Team" section exists, team should be omitted.
Scenario 17: Equity mentioned
Input:
Senior Engineer
We offer competitive salary plus equity package. Stock options will vest over 4 years.
Expected Output:
{
"title": "Senior Engineer",
"description": "We offer competitive salary plus equity package. Stock options vest over 4 years.",
"has_equity": true,
"equity_details": "Stock options vest over 4 years",
"job_family": "engineering"
}
Note: Description and equity_details may be rephrased while preserving the same information about equity and vesting.
Scenario 18: Equity not mentioned
Input:
Senior Engineer
We offer competitive salary and benefits.
Expected Output:
{
"title": "Senior Engineer",
"description": "We offer competitive salary and benefits.",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving information about compensation and benefits.
Negative Assertions: has_equity and equity_details must be omitted (not inferred).
Scenario 19: Nice-to-have skills section
Input:
Software Engineer
Requirements:
- 3+ years of experience
- Python, JavaScript
Nice to have:
- React experience
- TypeScript
- GraphQL
Expected Output:
{
"title": "Software Engineer",
"must_have": {
"years_of_experience": {
"min": 3
},
"skills": ["Python", "JavaScript"]
},
"nice_to_have": {
"skills": ["React", "TypeScript", "GraphQL"]
},
"job_family": "engineering"
}
Negative Assertions: Skills mentioned only in "Nice to have" should NOT appear in must_have.skills.
Scenario 20: Nice-to-have soft skills
Input:
Product Manager
Requirements:
- Strong communication skills
- 5+ years experience
Preferred:
- Leadership experience
- Public speaking ability
Expected Output:
{
"title": "Product Manager",
"must_have": {
"years_of_experience": {
"min": 5
},
"soft_skills": ["Strong communication skills"]
},
"nice_to_have": {
"soft_skills": ["Leadership experience", "Public speaking ability"]
},
"job_family": "product"
}
Scenario 21: Nice-to-have company stage
Input:
Senior Engineer
Requirements:
- 5+ years experience
Nice to have:
- Experience at early-stage startups
- Experience at Series B+ companies
Expected Output:
{
"title": "Senior Engineer",
"must_have": {
"years_of_experience": {
"min": 5
}
},
"nice_to_have": {
"company_stage_experience": ["early-stage startups", "Series B+ companies"]
},
"job_family": "engineering"
}
Scenario 22: Must-have industry experience
Input:
Senior Engineer
Requirements:
- 5+ years of software engineering experience
- 3+ years in fintech industry
- Experience with payment systems
Expected Output:
{
"title": "Senior Engineer",
"must_have": {
"years_of_experience": {
"min": 5,
"field": "software engineering"
},
"industry_experience": ["fintech"],
"skills": ["payment systems"]
},
"job_family": "engineering"
}
Scenario 23: Compensation with range and currency
Input:
Senior Engineer
Salary: $120,000 - $150,000 USD per year
Expected Output:
{
"title": "Senior Engineer",
"compensation": {
"base_annual_salary": {
"min": 120000,
"max": 150000,
"currency": "USD"
}
},
"job_family": "engineering"
}
Note: Currency codes should be standardized (USD, EUR, GBP, etc.).
Scenario 24: Compensation partial information
Input:
Senior Engineer
Salary: $120,000 - $150,000 per year
Expected Output:
{
"title": "Senior Engineer",
"compensation": {
"base_annual_salary": {
"min": 120000,
"max": 150000
}
},
"job_family": "engineering"
}
Negative Assertions: currency should be omitted if not explicitly mentioned.
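The difference between Scenarios 23 and 24 can be expressed as a small check: a currency code in the output is acceptable only if that code appears in the source text. The sketch below assumes a hypothetical helper name (`currencyIsExplicit`) and treats a bare "$" sign as not explicit, which is how Scenario 24 reads.

```typescript
// Hypothetical check: the currency code in the output must appear verbatim in the source.
// Per Scenario 24, a bare "$" sign does not count as an explicit currency.
interface BaseSalary {
  min?: number;
  max?: number;
  currency?: string;
}

function currencyIsExplicit(source: string, salary: BaseSalary): boolean {
  if (salary.currency === undefined) return true; // omitted, nothing to verify
  const code = salary.currency.replace(/[^A-Za-z]/g, ""); // letters only, safe to embed in a regex
  if (code.length === 0) return false;
  return new RegExp(`\\b${code}\\b`, "i").test(source);
}

const withCode = currencyIsExplicit(
  "Salary: $120,000 - $150,000 USD per year",
  { min: 120000, max: 150000, currency: "USD" }
); // true: "USD" appears in the source
const withoutCode = currencyIsExplicit(
  "Salary: $120,000 - $150,000 per year",
  { min: 120000, max: 150000, currency: "USD" }
); // false: the source never names a currency
```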
Scenario 25: Job family ambiguous title
Input:
Technical Lead
Expected Output:
{
"title": "Technical Lead",
"job_family": "other"
}
Note: Titles like "Manager", "Lead", "Director" without clear domain context should use "other".
Scenario 26: Multiple locations mentioned
Input:
Senior Engineer
Locations: San Francisco, New York, Remote
We are looking for engineers in multiple locations.
Expected Output:
{
"title": "Senior Engineer",
"location": ["San Francisco", "New York", "Remote"],
"description": "We are seeking engineers for multiple locations.",
"job_family": "engineering"
}
Note: Description may be rephrased while preserving location information.
Negative Assertions: location_type should be omitted unless work arrangement (remote/hybrid/in-person) is explicitly stated separately from location names.
Success Criteria
- Hallucination Rate: < 5% of fields fabricated when input is minimal
- Extraction Accuracy: > 95% of populated fields traceable to source text
- Test Coverage: 100% of scenarios pass
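As an illustration of how the first two criteria could be computed from per-field judgments, here is a minimal sketch. The document does not fix the denominators, so the definitions in the comments are assumptions, and `FieldJudgment` and `scoreRun` are hypothetical names.

```typescript
// Hypothetical per-field judgment produced by a test run or manual review.
interface FieldJudgment {
  field: string;
  populated: boolean; // the model returned a non-empty value
  traceable: boolean; // the value can be traced to the source text
}

// Assumed definitions (the document does not fix the denominators):
//   hallucinationRate  = populated-but-untraceable fields / all judged fields
//   extractionAccuracy = traceable populated fields / populated fields
function scoreRun(judgments: FieldJudgment[]) {
  const populated = judgments.filter((j) => j.populated);
  const fabricated = populated.filter((j) => !j.traceable);
  const hallucinationRate =
    judgments.length === 0 ? 0 : fabricated.length / judgments.length;
  const extractionAccuracy =
    populated.length === 0 ? 1 : (populated.length - fabricated.length) / populated.length;
  return {
    hallucinationRate,
    extractionAccuracy,
    meetsTargets: hallucinationRate < 0.05 && extractionAccuracy > 0.95,
  };
}
```

With these definitions, a run in which one of four judged fields is fabricated yields a hallucination rate of 0.25 and fails the targets.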
Implementation Requirements
- Prompt Updates: Add explicit "no fabrication" rule and field-specific extraction instructions
- Validation: Post-process to ensure extracted fields exist in source text
- Tests: Comprehensive test suite covering all scenarios above
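The validation requirement above could be approximated for list-like fields (skills, industry experience) with a case-insensitive containment check, as in the sketch below. The helper names are hypothetical, and rephrased free-text fields such as description need a looser comparison that this sketch does not cover.

```typescript
// Hypothetical post-processing step: keep only list items that appear in the source text
// (case-insensitive, whitespace-normalized). Rephrased free-text fields are out of scope here.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function keepTraceable(source: string, items: string[]): string[] {
  const haystack = normalize(source);
  return items.filter((item) => {
    const needle = normalize(item);
    return needle.length > 0 && haystack.indexOf(needle) !== -1;
  });
}

const sourceText = "Senior Engineer\nRequirements:\n- Python, JavaScript, React";
const checkedSkills = keepTraceable(sourceText, ["Python", "Kubernetes", "react"]);
// "Kubernetes" is dropped because it never appears in the source text.
```

Dropping untraceable items rather than failing the whole extraction follows the same "incomplete over incorrect" principle as the prompt itself.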