## Summary
Prompt engineering at this scale is primarily a failure-diagnosis discipline: the most expensive mistakes are prompts that appear to work in happy-path testing but silently fail on edge cases — broad keywords triggering false positives, AI ignoring explicit constraints, context fragmentation when assembling prompts from multiple files. The single most consequential architectural decision is whether to use one comprehensive prompt or split into category-specific prompts; LabelCheck tried both and the split approach failed due to AI context fragmentation, requiring a full rollback. Numeric constraints (word count, section minimums, character limits) reliably require both prompt enforcement and post-processing validation — the model will drift without the safety net. Decoupling extraction from synthesis, then caching extractions keyed to schema version and file mtime, eliminates re-extraction cost entirely during prompt iteration and is the highest-leverage infrastructure investment for any multi-stage LLM pipeline.
## TL;DR

### What we've learned
- Category-specific prompts looked like a 60-70% size win but were ultimately removed from LabelCheck because AI context fragmentation caused incomplete analysis; a single comprehensive prompt outperformed the split approach in practice.
- Numeric output constraints (word count, section counts, character limits) require both prompt instructions and post-processing validation — prompts alone drift, as ContentCommand discovered when a 1,500-word target produced 4,288 words.
- Extraction caching keyed to schema version + file mtime drops Haiku API calls to zero for prompt-only iterations in Meridian — the highest-ROI infrastructure change for iterative prompt development.
- Broad keywords in pattern matching cause false positives at production scale; "help," "now," and "today" are too common for intent detection and must be replaced with specific phrases.
- LLM-judged rubric evaluation (10 criteria, 0-10 scale) enables quantified prompt iteration; Meridian's judge call completed in 20.4 seconds and returned an 88/100 average on a 9-fragment legal topic.
### External insights
No external sources ingested yet for this topic.
## Common Failure Modes

### 1. Category-specific prompt assembly causes AI context fragmentation
In LabelCheck, category-specific prompts were built by assembling content from multiple markdown files at runtime. The approach showed a 60-70% prompt size reduction (30KB → 7-8KB per category) and 5-10 second performance gains in benchmarks. Then it was removed entirely.
The failure: when prompts are assembled from multiple files, the AI receives a fragmented context that causes incomplete analysis — sections reference concepts defined elsewhere in the original monolith but absent in the assembled version. The model doesn't error; it silently omits analysis.
The fix was a full rollback to a single comprehensive TypeScript prompt. This is a resolved contradiction in the evidence: the enhancement work [1] preceded the removal decision [2], and the removal was empirical, not theoretical.
Lesson: Benchmark prompt size and latency, but validate completeness of analysis output before shipping a split-prompt architecture.
### 2. Numeric output constraints drift without post-processing enforcement
ContentCommand hit this directly: a 1,500-word target produced 4,288 words, inflating the Surfer SEO score from a real 52 to a misleading 82. The prompt specified a word count range; the model ignored it under longer-context generation pressure.
The fix requires two layers:
```typescript
// Layer 1: Prompt constraint (necessary but not sufficient)
// "Generate content between 1,275 and 1,650 words (85-110% of the 1,500 target).
//  If you exceed 1,650 words, cut the lowest-value paragraphs. CRITICAL:
//  do not pad to hit minimums."

// Layer 2: Post-processing validation
function enforceWordCount(content: string, target: number): string {
  const words = content.trim().split(/\s+/);
  const min = Math.floor(target * 0.85);
  const max = Math.ceil(target * 1.10);
  if (words.length > max) {
    // Trim to the cap, or flag the draft for re-generation.
    return words.slice(0, max).join(' ');
  }
  if (words.length < min) {
    // Under-length drafts can't be fixed by trimming; flag for re-generation.
    throw new Error(`Draft is ${words.length} words; minimum is ${min}`);
  }
  return content;
}
```
The same pattern applies to context limits in Meridian: Haiku-based planning passes need a 40K character hard cap, writing passes need 80K — enforced in code, not just in the prompt.
[3]
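A code-level guard for those caps might look like the following sketch. The 40K/80K limits are the ones Meridian uses; the constant and function names here are hypothetical.

```typescript
// Hard character caps per pipeline pass (limits from Meridian; names illustrative).
const CONTEXT_CAPS = { planning: 40_000, writing: 80_000 } as const;

function capContext(text: string, pass: keyof typeof CONTEXT_CAPS): string {
  const cap = CONTEXT_CAPS[pass];
  if (text.length <= cap) return text;
  // Cut at the last newline before the cap so no line is split mid-way.
  const cut = text.lastIndexOf('\n', cap);
  return text.slice(0, cut > 0 ? cut : cap);
}
```

Enforcing the cap in code means an oversized registry degrades to a truncated (and loggable) input instead of silently degrading output quality.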
### 3. Broad keywords cause false positives in pattern matching
In AsymXray's call intent analysis, keywords like "help", "now", "today", and "right now" were triggering on greetings and casual conversation, not genuine urgency or intent signals. The model matched the word, not the semantic context.
Fix: replace single-word triggers with specific multi-word phrases that carry unambiguous intent signal. Phrase weighting was also increased from ×2 to ×3 to improve recall on genuine matches without widening the false-positive surface.
```python
# Before: too broad
URGENCY_KEYWORDS = ["now", "today", "help", "urgent"]

# After: specific phrases only
URGENCY_PHRASES = [
    "need this fixed today",
    "can't wait until",
    "emergency situation",
    "production is down",
]
```
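To illustrate the ×3 phrase weighting described above, a minimal scorer might look like this sketch. The weight and the scoring shape are illustrative, not AsymXray's actual implementation.

```typescript
// Phrase-weighted intent scoring sketch: multi-word phrases carry a x3 weight,
// mirroring the weighting change described above. Names are hypothetical.
const URGENCY_PHRASES = [
  "need this fixed today",
  "can't wait until",
  "emergency situation",
  "production is down",
];
const PHRASE_WEIGHT = 3;

function urgencyScore(transcript: string): number {
  const text = transcript.toLowerCase();
  return URGENCY_PHRASES.reduce(
    (score, phrase) => score + (text.includes(phrase) ? PHRASE_WEIGHT : 0),
    0,
  );
}
```

Because every trigger is a full phrase, a greeting like "how can I help you today" scores zero instead of matching on "help" and "today" individually.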
### 4. AI ignores soft constraint language; imperative commands are required
In LabelCheck, ambiguity detection was specified with language like "should check" and "consider flagging." It didn't trigger reliably. The fix was replacing soft language with explicit imperative commands:
```text
# Before (unreliable)
"You should check for ambiguity when the product category is unclear."

# After (reliable)
"STOP AND CHECK: Before proceeding, you MUST evaluate category confidence.
If confidence is below 85%, you MUST set ambiguity_detected: true.
This is CRITICAL — do not skip this step."
```
Consistent across LabelCheck: "MUST", "CRITICAL", "STOP AND CHECK" reliably trigger behavior that "should" and "consider" do not.
[5]
### 5. AI prioritizes functional signals over structural indicators in classification
In LabelCheck, the model was classifying products based on health claims and functional ingredients rather than the definitive regulatory indicator: whether the label contains a Supplement Facts panel or a Nutrition Facts panel. A product with a Nutrition Facts panel was being classified as a supplement because it contained functional ingredients.
The fix is explicit rule prioritization in the prompt:
"CLASSIFICATION RULE (highest priority, overrides all other signals):
1. If the label contains a 'Supplement Facts' panel → DIETARY SUPPLEMENT
2. If the label contains a 'Nutrition Facts' panel → CONVENTIONAL FOOD
Panel type is the definitive regulatory indicator per 21 CFR 101.
Do NOT override this based on ingredients or health claims."
### 6. JSON parsing fails silently when the model wraps output in markdown fences
ClientBrain's sentiment analysis was falling back to default values without error. Root cause: Claude Haiku was wrapping JSON responses in `` ```json `` fences, so `JSON.parse()` received the fence characters and threw a parse error that a silent fallback swallowed.
```typescript
// Fragile
const result = JSON.parse(response.content);

// Fix: strip fences before parsing
function extractJSON(raw: string): unknown {
  // Find the first { and the last } to handle nested content
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end === -1) throw new Error('No JSON object found');
  return JSON.parse(raw.slice(start, end + 1));
}
```
The indexOf('{') / lastIndexOf('}') approach handles nested objects better than regex fence-stripping, which breaks on JSON containing backtick characters.
[7]
### 7. Reference database gaps cause false positive compliance errors
LabelCheck's GRAS database was missing 50+ bioavailable vitamin and mineral synonyms — methylcobalamin, adenosylcobalamin, hydroxocobalamin, pyridoxal-5-phosphate (P-5-P), and chelated mineral forms. A fortified coffee product using methylcobalamin was flagged as non-compliant despite being fully legal.
This is a data problem masquerading as a prompt problem. The fix was expanding the synonym database, not changing the prompt. Before attributing false positives to prompt logic, audit the reference data the prompt is checking against.
[8]
### 8. Initial analysis misses claims that appear only in follow-up chat
In LabelCheck, the initial AI analysis scanned the label upload. When users asked follow-up questions in chat, they sometimes introduced marketing language not present in the original label — and the initial analysis pass had no visibility into it. Problematic terms in chat were going undetected.
Fix: extend the analysis scope to include follow-up chat content, or run a secondary detection pass on chat messages. The red flag marketing term list was also expanded from 4 terms to 20+ to improve recall.
[9]
## What Works

### Decoupling extraction from synthesis, then caching extractions
In Meridian, separating the extraction pass (reading source documents, pulling structured data) from the synthesis pass (writing the article) enables caching extractions keyed to schema version + file mtime. When you're iterating on the synthesis prompt, extraction API calls drop to zero. The fixture-based regression harness costs ~$0.20 and runs in ~15 minutes for 6 representative topics.
This is the highest-leverage infrastructure investment for any multi-stage LLM pipeline. The cost of prompt iteration without it is proportional to corpus size; with it, iteration cost is flat.
[10]
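The cache-keying scheme can be sketched as follows. The key's ingredients (schema version, source path, mtime) are the ones described above; the schema-version constant and function name are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Sketch: key an extraction cache entry to schema version + source file mtime.
// Pass the mtime in (e.g. fs.statSync(path).mtimeMs) so the function stays pure.
const SCHEMA_VERSION = "v3"; // bump when the extraction schema changes

function extractionCacheKey(sourcePath: string, mtimeMs: number): string {
  return createHash("sha256")
    .update(`${SCHEMA_VERSION}:${sourcePath}:${mtimeMs}`)
    .digest("hex");
}
```

Bumping the schema version invalidates every cached extraction automatically, while touching a single source file invalidates only that file's entry.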
### LLM-judged rubric evaluation for quantified prompt iteration
Meridian uses a 10-criteria rubric (0-10 scale per criterion) evaluated by a judge LLM call to score synthesis output. This replaces eyeballing diffs between prompt versions. A judge call on a 9-fragment legal topic completed in 20.4 seconds and returned an 88/100 average.
The rubric criteria should match the generation criteria — ContentCommand validated this separately: quality scoring that doesn't evaluate against the same standards used in generation produces misleading scores.
[11]
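Aggregating the judge's per-criterion scores into a /100 figure is trivial but worth pinning down; a sketch, with criterion handling and rounding as assumptions:

```typescript
// Sketch: collapse a 10-criterion rubric (0-10 per criterion) returned by a
// judge LLM into a /100 score. Criterion names are hypothetical.
type RubricScores = Record<string, number>; // criterion -> 0..10

function rubricTotal(scores: RubricScores): number {
  const values = Object.values(scores);
  if (values.length === 0) throw new Error("empty rubric");
  const sum = values.reduce((a, b) => a + b, 0);
  return Math.round((sum / (values.length * 10)) * 100);
}
```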
### Extracting prompts to external markdown files
In LabelCheck, extracting prompts from TypeScript to external .md files reduced analysis-prompts.ts from 360 lines to 32 lines (91% reduction) and a route handler from 2,087 to 871 lines (58% reduction). The prompts become editable by non-developers and diffable in git without TypeScript noise.
The tradeoff: prompts in external files are harder to type-check and easier to accidentally break with whitespace changes. Add a prompt-loading test that validates the file exists and parses to a non-empty string.
[12]
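The suggested prompt-loading guard can be as small as this sketch (function names are hypothetical):

```typescript
import { readFileSync } from "node:fs";

// Sketch: validate an externally loaded prompt, failing fast on empty content.
function validatePrompt(raw: string, path: string): string {
  const text = raw.trim();
  if (text.length === 0) throw new Error(`Prompt file empty: ${path}`);
  return text;
}

function loadPrompt(path: string): string {
  // readFileSync throws if the file is missing, so both failure modes are loud.
  return validatePrompt(readFileSync(path, "utf8"), path);
}
```

Run it once at startup (or in a unit test) so a whitespace-only prompt file breaks the build rather than production analyses.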
### RAG lite: cheap OCR pre-classification to filter regulatory documents
LabelCheck's full analysis prompt included ~50 regulatory documents. After adding a GPT-4o-mini OCR pre-classification step (cost: ~$0.0001 per analysis, detail: low), the relevant document set drops to 15-25 based on product category. This reduces prompt size by 60-70% without the context fragmentation risk of splitting the analysis prompt itself.
The key distinction: RAG lite filters the inputs to a single comprehensive prompt. It does not split the prompt. This is what the category-specific prompt approach got wrong.
[13]
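The filtering step itself can be sketched as a plain predicate over category-tagged documents; the types and tags below are illustrative, not LabelCheck's actual schema:

```typescript
// Sketch: after cheap pre-classification returns a product category, keep only
// regulatory documents tagged with that category (or tagged for all products).
interface RegDoc {
  id: string;
  categories: string[]; // e.g. ["all"] or ["supplement"]
  text: string;
}

function relevantDocs(docs: RegDoc[], category: string): RegDoc[] {
  return docs.filter(
    (d) => d.categories.includes("all") || d.categories.includes(category),
  );
}
```

The comprehensive prompt stays intact; only the document payload attached to it shrinks.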
### Explicit enumeration of acceptable vs. prohibited patterns
In both LabelCheck and ContentCommand, replacing general guidance with explicit categorized lists of acceptable and prohibited examples improved accuracy. For LabelCheck claims analysis: listing nutrient content claims, structure/function claims, and FDA-authorized health claims as acceptable (not just listing prohibited ones) reduced false positives on legitimate supplement language.
```text
# Less effective
"Flag any health claims that may be problematic."

# More effective
"ACCEPTABLE (do not flag):
- Nutrient content claims: 'High in Vitamin C', 'Good source of calcium'
- Structure/function WITH disclaimer: 'Supports immune health*'
- FDA-authorized health claims: 'May reduce risk of heart disease'
PROHIBITED (always flag):
- Disease claims: 'Treats diabetes', 'Cures cancer'
- Structure/function claims on conventional foods (prohibited entirely; no disclaimer makes them legal)"
```
### Feature flags with graceful fallbacks for prompt experimentation
LabelCheck used feature flags to gate category-specific prompts, falling back to the monolithic prompt when the flag was off or when forcedCategory wasn't explicitly provided. This meant the experimental path never broke production. When the category-specific approach was ultimately removed, the rollback was a flag flip, not a code revert under pressure.
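The fallback logic can be sketched like this; the flag, constant, and function names are all hypothetical, not LabelCheck's actual identifiers:

```typescript
// Sketch of flag-gated prompt selection with a graceful fallback.
const MONOLITHIC_PROMPT = "<full comprehensive analysis prompt>";

function buildCategoryPrompt(category: string): string {
  return `<category-specific prompt for ${category}>`;
}

function selectPrompt(
  flags: { categoryPrompts: boolean },
  forcedCategory?: string,
): string {
  // Experimental path only when the flag is on AND a category was forced.
  if (flags.categoryPrompts && forcedCategory) {
    return buildCategoryPrompt(forcedCategory);
  }
  return MONOLITHIC_PROMPT; // safe default: rollback is a flag flip
}
```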
## Gotchas and Edge Cases
AI-generated insights require post-processing filters for client-specific metric exclusions. In AsymXray, ROAS metrics were appearing in insights for lead-gen clients who don't run ROAS campaigns. The fix was a post-processing filter, not a prompt change — and the filter had to be extended to cover "Local Business" and "Awareness" objectives, not just "Lead Generation." Crazy Lenny's surfaced this when their Marketing Objective was set to "Local Business."
[16]
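The post-processing filter pattern looks roughly like this sketch; the objective names come from the fix above, while the insight shape and set contents are assumptions:

```typescript
// Sketch: strip ROAS mentions from AI-generated insights for objectives that
// don't run ROAS campaigns. Objective strings mirror the AsymXray fix.
const NO_ROAS_OBJECTIVES = new Set([
  "Lead Generation",
  "Local Business",
  "Awareness",
]);

function filterInsights(insights: string[], objective: string): string[] {
  if (!NO_ROAS_OBJECTIVES.has(objective)) return insights;
  return insights.filter((line) => !/\bROAS\b/i.test(line));
}
```

Because exclusion lives in a data structure rather than the prompt, extending it to a new objective is a one-line change.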
Structure/function claims are prohibited on conventional foods entirely, not just missing a disclaimer. The prompt originally flagged structure/function claims on conventional foods as "missing disclaimer." The correct flag is "prohibited." Supplements allow structure/function claims with the FDA disclaimer; conventional foods don't allow them at all under 21 CFR 101. This is a regulatory distinction that must be encoded explicitly — the model will default to the softer interpretation.
[17]
AI struggles to distinguish Statement of Identity from marketing taglines without explicit guidance. On complex packaging, the model was extracting taglines ("The Ultimate Recovery Formula") as the product name instead of the regulated Statement of Identity. Fix: provide contextual clues — font size hierarchy, net quantity placement, Nutrition/Supplement Facts panel location, manufacturer address, and barcode placement all help identify the Principal Display Panel per 21 CFR 101.1.
[18]
Supplement ingredients require extraction from two sources. LabelCheck found that extracting only from the Supplement Facts panel missed active ingredients listed solely in the ingredient list. The prompt must explicitly instruct extraction from both sources and deduplication.
[19]
Allergen detection requires absolute rules, not probabilistic ones. Prompts that say "flag if allergens may be present" produce false positives on hypothetical scenarios. The correct instruction: flag non_compliant only when allergens are confirmed present in the ingredient list, not inferred from manufacturing context.
[19]
Context overflow in Haiku-based pipelines is silent. In Meridian, the _index.md registry grew to 1.7MB before context overflow was diagnosed. The symptom was degraded output quality, not an error. Fix: compact registries to slug→alias lists instead of full YAML, and enforce hard character limits in code (40K for planning pass, 80K for writing pass).
[20]
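A slug-to-alias compaction can be sketched as follows; the registry shape is hypothetical, but the idea matches the fix above (drop everything except what the planning pass needs):

```typescript
// Sketch: compact full registry entries down to one "slug: aliases" line each.
interface RegistryEntry {
  slug: string;
  aliases: string[];
}

function compactRegistry(entries: RegistryEntry[]): string {
  return entries
    .map((e) => `${e.slug}: ${e.aliases.join(", ")}`)
    .join("\n");
}
```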
Markdown output formatting requires a client-side safety net. ContentCommand's key takeaways section was rendering as prose instead of bullet lists despite prompt instructions. The fix required both strengthened prompt language and a fixBulletLists() post-processing function. LLM variance on markdown syntax is real and persistent.
[21]
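A fixBulletLists-style safety net might look like the following sketch; the real ContentCommand implementation may differ:

```typescript
// Sketch: if a takeaways section arrived as one sentence per line instead of a
// markdown list, prefix bare lines with "- ". Existing bullets and headings
// pass through untouched.
function fixBulletLists(section: string): string {
  return section
    .split("\n")
    .map((line) => {
      const t = line.trim();
      if (t === "" || t.startsWith("-") || t.startsWith("#")) return line;
      return `- ${t}`;
    })
    .join("\n");
}
```

The function is idempotent, so it is safe to run on every response regardless of whether the model complied with the formatting instruction.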
## Where Docs Disagree With Practice
Category-specific prompts: documented as a best practice, failed in production. The standard advice for large prompt optimization is to split by category and load only the relevant subset. In LabelCheck, this produced measurable size and latency improvements in benchmarks but caused incomplete analysis in production due to context fragmentation. The monolithic prompt, despite being larger, produced more complete and consistent output. The benchmark metric (prompt size) was not the right proxy for the outcome metric (analysis completeness).
[2]
Soft constraint language in prompts: documented as sufficient, insufficient in practice. OpenAI and Anthropic documentation presents constraint language like "you should" and "please ensure" as effective. In LabelCheck's ambiguity detection, soft language consistently failed to trigger the behavior. "MUST", "CRITICAL", and "STOP AND CHECK" were required. This may be model-specific (Haiku vs. Sonnet) but was consistent enough across LabelCheck's use cases to treat as a general rule.
[22]
Allergen flagging: "flag potential allergens" produces false positives. The intuitive prompt instruction is to flag anything that could be an allergen concern. In practice, this causes the model to flag hypothetical cross-contamination scenarios, manufacturing environment risks, and "may contain" statements as non_compliant when they're actually compliant disclosures. The correct framing is confirmatory, not precautionary.
[19]
## Tool and Version Notes
- GPT-4o-mini with `detail: low` — Used in LabelCheck for OCR pre-classification at ~$0.0001 per call. Sufficient for category detection from label images; not sufficient for full compliance analysis.
- Claude Haiku (planning/extraction pass) — Context limit behavior: 40K characters for the planning pass, 80K for the writing pass in Meridian. Silent degradation above these limits, not hard errors. Observed in Meridian.
- Claude Haiku JSON output — Wraps responses in `` ```json `` markdown fences inconsistently. Always strip fences or use `indexOf('{')`/`lastIndexOf('}')` extraction. Observed in ClientBrain.
- Frase SERP API — Used in ContentCommand to derive target word count (110% of SERP average, clamped to 1,200-4,000 words) and LSI keyword lists. Enriches briefs with competitor-derived data rather than static targets.
- LLM-as-judge evaluation — Meridian's judge call on a 9-fragment topic: 20.4 seconds, 88/100 average. Viable for regression testing at ~$0.20 per 6-topic run. Not viable for per-request quality gating due to latency.
## Sources

Synthesized from 52 fragments: 49 git commits across AsymXray, ClientBrain, ContentCommand, Crazy Lenny's, LabelCheck, and Meridian; 0 external sources; 0 post-mortems. Date range: unknown to unknown.
1. LabelCheck 5356537 Enhance Category Specific Prompts With Comprehensi ↩
2. LabelCheck ac44953 Remove All Category Specific Prompt Code And Files ↩
3. ContentCommand b7d4323 Enforce Word Count Limits And Nlp Keyword Coverage; Meridian 2de9bdc Fix Context Overflow Compact Registries Truncate P ↩
4. AsymXray fd14dd1 Improve Call Intent Analysis Patterns; AsymXray 5d37955 Enhance Form Intent Analysis To Match Call Analysi ↩
5. LabelCheck b29c37a Force Ambiguity Detection And Improve Pdf Text Enc; LabelCheck 265f029 Fix Critical Supplement Analysis Issues ↩
6. LabelCheck 08012ca Fix Product Classification To Prioritize Panel Typ; LabelCheck 9ec6412 Add Critical Panel Type Validation For Dietary Sup ↩
7. ClientBrain f576f8c Fix Sentiment Json Parsing Strip Markdown Code Fen; ClientBrain 487b650 Fix Json Extraction Remove Redundant Key Issues Se ↩
8. LabelCheck 9ccf724 Fix Gras False Positives For Fortified Foods Add ↩
9. LabelCheck c358ae9 Enhance Ai Marketing Claims Detection Catch Prob ↩
10. Meridian 0808dfa Synthesizer Extraction Cache Decoupled Extractwrit; Meridian 1e629e4 Synthesis Regression Harness Frozen Fixtures Write ↩
11. Meridian 7a1b6c6 Llm Judged Synthesis Evaluation Rubric; ContentCommand 0ee4e18 Align Quality Scoring With Content Generation Stan ↩
12. LabelCheck dd0a023 Extract Prompts To External Markdown Files Priorit; LabelCheck 00a7061 Extract Ai Analysis Prompt To Separate Module Phas ↩
13. LabelCheck a8137f4 Extend Rag Lite To Images Reduce Regulatory Docume; LabelCheck ebeccb1 Complete Rag Lite Pre Classification Integration F ↩
14. LabelCheck 2cf3815 Improve Claims Analysis Distinguish Acceptable Fro; LabelCheck 139de4d Enhance Claims Analysis With Comprehensive Prohibi ↩
15. LabelCheck df24911 Integrate Category Specific Prompts With Feature F ↩
16. AsymXray cc69ed6 Remove Roas From Ai Insights For Lead Gen Clients; AsymXray c910369 Extend Roas Filtering To Local And Awareness Objec ↩
17. LabelCheck 9cd206a Fix Structurefunction Claims Validation For Conven ↩
18. LabelCheck e97e9ff Major Analysis Improvements Pdp Identification For; LabelCheck c7b61c8 Improve Ai Product Name Extraction Distinguish F ↩
19. LabelCheck 265f029 Fix Critical Supplement Analysis Issues ↩
20. Meridian 2de9bdc Fix Context Overflow Compact Registries Truncate P ↩
21. ContentCommand b376ac0 Key Takeaways Renders As Clean Bullet List Instead ↩
22. LabelCheck b29c37a Force Ambiguity Detection And Improve Pdf Text Enc ↩