LLM Integration (Anthropic)

43 fragments · Layer 3 Synthesized established · 43 evidence · updated 2025-01-31

Summary

The default max_tokens of 4096 is too small for any non-trivial Claude task — every project that hit truncation fixed it by raising to 16384, and JSON parsing breaks silently when the response is cut mid-structure. The most reliable architecture across projects is a two-tier model split: Haiku at temperature 0 for classification and planning, Sonnet for generation and writing. Tool use requires an explicit result-feeding loop or Claude can't chain actions. When latency is the primary constraint (sub-30-second user-facing responses), GPT-4o has outperformed Claude in practice — LabelCheck's migration from Claude 3.5 Sonnet → GPT-5 Mini → GPT-4o cut analysis time from 117+ seconds to 15–30 seconds.


TL;DR

What we've learned
- max_tokens: 4096 truncates real responses; set 16384 as the floor across all projects — this has burned Meridian, ContentCommand, and AsymXray independently.
- Haiku + Sonnet two-pass (plan cheap, write expensive) processes 5 documents in under 60 seconds with 3 concurrent workers in Meridian.
- Tool use loops need explicit result-feeding: Claude won't chain actions unless you feed each tool result back into the next API call.
- Claude's 200K context limit is reachable in production — ContentCommand's DataForSEO payloads hit 211K tokens and required pre-summarization.
- LabelCheck migrated away from Claude entirely for latency-sensitive analysis; GPT-4o at 15–30 seconds beat Claude 3.5 Sonnet at 117+ seconds.

External insights

No external sources ingested yet for this topic.


Common Failure Modes

max_tokens default truncates JSON mid-structure

Established failure mode across 4 projects (Meridian, ContentCommand, AsymXray, and implicitly LabelCheck). The Anthropic SDK default of max_tokens: 4096 is insufficient for responses that include structured JSON with substantive content. The failure mode is silent: the API returns a 200, but the response body ends mid-JSON, causing a parse error downstream.

SyntaxError: Unexpected end of JSON input

The fix is two-part: raise max_tokens to 16384 and add JSON repair logic for cases where truncation still occurs at edge-case lengths.

const response = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 16384,  // not 4096
  messages: [...]
});
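For the second part of the fix, repair logic varies by project; a minimal sketch of one approach is below. It closes unterminated strings, arrays, and objects, and the helper name is illustrative, not taken from the source commits.

```typescript
// Best-effort repair of JSON cut off mid-structure. A sketch only:
// it handles truncated strings and unclosed brackets, not every case
// (e.g. a value cut off right after a colon is still unrecoverable).
function repairTruncatedJson(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    let s = raw;
    const stack: string[] = []; // open brackets, in order
    let inString = false;
    let escaped = false;
    for (const ch of s) {
      if (escaped) { escaped = false; continue; }
      if (ch === "\\") { escaped = true; continue; }
      if (ch === '"') { inString = !inString; continue; }
      if (inString) continue;
      if (ch === "{" || ch === "[") stack.push(ch);
      if (ch === "}" || ch === "]") stack.pop();
    }
    if (inString) s += '"';          // close a truncated string
    s = s.replace(/[,:]\s*$/, "");    // drop a dangling comma/colon
    while (stack.length) {
      s += stack.pop() === "{" ? "}" : "]";
    }
    return JSON.parse(s);
  }
}
```

Treat repair as a fallback for edge-case lengths, not a substitute for raising max_tokens.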

Observed in Meridian (extraction and compiler passes), ContentCommand (brief generation), and AsymXray (brief generation).
[1]


Context window overflow from third-party API payloads

ContentCommand's DataForSEO competitive analysis responses exceeded Claude's 200K token context window — raw competitive data came in at 211K tokens, causing the API call to fail before Claude could process it.

The fix is pre-summarization: run a cheap pass to compress third-party payloads before they enter the main prompt. Don't assume external API responses are context-window-safe.
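A sketch of the guard; the chars/4 token estimate and the 150K threshold are assumptions for illustration, not values from the ContentCommand commit:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not the Anthropic tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Leave headroom under the 200K context window for the prompt itself.
const CONTEXT_BUDGET = 150_000;

function needsSummarization(payload: string): boolean {
  return estimateTokens(payload) > CONTEXT_BUDGET;
}

// In the pipeline (sketch):
// const data = needsSummarization(raw)
//   ? await summarizeWithHaiku(raw)  // cheap compression pass
//   : raw;
```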

Observed in ContentCommand when building the content brief generation pipeline.
[2]


Anthropic client lazy initialization silently fails

Observed in AsymXray: the Anthropic client was initialized lazily (constructed on first use rather than at module load), and when the API key was missing or misconfigured, the failure surfaced as a confusing runtime error during the first AI call rather than at startup.

The fix is eager initialization with an explicit API key check at startup:

if (!process.env.ANTHROPIC_API_KEY) {
  throw new Error("ANTHROPIC_API_KEY is required");
}
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

Seen in two projects (AsymXray, Eydn) — treat missing API key as a startup crash, not a per-request error.
[3]


Tool use without result-feeding loops stalls after first action

In Eydn's AI chat, Claude was given 9 tools (guest management, task creation, vendor tracking, budget, mood board, decision memory, etc.) but the initial implementation didn't feed tool results back into subsequent API calls. Claude would invoke a tool and then stop, unable to chain a second action.

The fix is an explicit loop — up to 5 iterations in Eydn's implementation — that feeds each tool_result block back as a user message:

const MAX_ITERATIONS = 5; // Eydn caps chained actions at 5
let iterations = 0;
while (response.stop_reason === "tool_use" && iterations < MAX_ITERATIONS) {
  // executeTools runs each tool_use block and returns tool_result blocks
  const toolResults = await executeTools(response.content);
  messages.push({ role: "assistant", content: response.content });
  messages.push({ role: "user", content: toolResults });
  response = await anthropic.messages.create({ model, messages, tools });
  iterations++;
}

Observed in Eydn when implementing the action-taking AI chat.
[4]


Frontend timeouts fire before long AI responses complete

AsymXray's brief generation hit a wall when max_tokens was raised from 2000 to 4096: the longer responses pushed past the default 60-second frontend timeout, returning a timeout error even though the API call eventually succeeded.

Fix: raise the frontend (Next.js route handler or fetch) timeout to 120 seconds when max_tokens is above ~3000. This is a two-knob problem — token limit and timeout must be raised together.
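On Vercel-hosted Next.js (App Router), the server-side knob is the `maxDuration` route segment config; this is a sketch with a hypothetical route path, and other hosts will have their own timeout setting:

```typescript
// app/api/brief/route.ts
// Route segment config: allow up to 120s before the platform times out.
export const maxDuration = 120;

export async function POST(request: Request): Promise<Response> {
  // ...call anthropic.messages.create({ max_tokens: 16384, ... }) here...
  return new Response("ok");
}
```

If the browser-side fetch also enforces a timeout, raise it in the same change (e.g. `AbortSignal.timeout(120_000)`), otherwise the second knob just moves the failure.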

Observed in AsymXray.
[5]


Claude latency unacceptable for synchronous user-facing analysis

LabelCheck's FDA label analysis with Claude 3.5 Sonnet took 117+ seconds — unusable for a user waiting on a compliance result. The project migrated to GPT-5 Mini (still ~117 seconds), then to GPT-4o (15–30 seconds, 4–6x faster).

This is a documented migration decision, not a bug. The lesson: for synchronous, user-facing document analysis where the user is watching a spinner, Claude's latency profile may not fit. GPT-4o is currently faster for this workload.

Observed in LabelCheck (Claude 3.5 Sonnet → GPT-5 Mini → GPT-4o migration).
[6]


What Works

Haiku for planning/classification, Sonnet for writing

Consistent across Meridian and AsymXray: use claude-haiku-4-5 for decisions that produce structured output (which files to create, which industry category, which intent class), and claude-sonnet-4-5 for prose generation. Haiku is fast enough that the planning pass doesn't dominate wall time.

In Meridian's two-pass compiler, this split processes 5 documents in under 60 seconds with 3 concurrent workers. Haiku classification over 58 clients ran in 85.5 seconds total.

// Planning pass — cheap and fast
const plan = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 16384,
  temperature: 0,  // deterministic for classification
  messages: [{ role: "user", content: planningPrompt }]
});

// Writing pass — quality matters
const content = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 16384,
  messages: [{ role: "user", content: writingPrompt(plan) }]
});

[7]


Temperature 0 for classification tasks

Haiku at temperature: 0 produces stable, repeatable classifications. In Meridian's industry classifier, 34 of 58 clients came back high-confidence, 6 medium, 18 low — the low-confidence bucket is the right signal for routing to manual override rather than retrying with a different prompt.

Pair temperature 0 with JSON schema validation in the prompt. Don't rely on Claude to self-correct schema drift; validate the output and reject/retry on schema failure.
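A minimal validate-and-reject sketch; the schema and field names here are illustrative, since the projects' actual schemas aren't in the fragments:

```typescript
interface Classification {
  industry: string;
  confidence: "high" | "medium" | "low";
}

// Reject anything that drifts from the expected shape; the caller
// retries or routes to manual override instead of trusting the output.
function parseClassification(raw: string): Classification | null {
  try {
    const c = JSON.parse(raw);
    if (typeof c?.industry !== "string") return null;
    if (!["high", "medium", "low"].includes(c?.confidence)) return null;
    return { industry: c.industry, confidence: c.confidence };
  } catch {
    return null;
  }
}
```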
[8]


Full user-data context injection for actionable chat responses

In Eydn, injecting the complete user state into the system prompt — up to 50 tasks, 20 vendors, 100 guests, full budget, seating chart, and uploaded documents — produced qualitatively better responses than injecting summaries. Claude can reference specific vendor names, flag budget conflicts, and suggest concrete next steps when it has the actual data.

The practical limit: Eydn's full context fits comfortably within Claude's 200K window. If your user data grows beyond ~150K tokens, you'll need selective injection or summarization.
[9]


Claude vision API for PDF document analysis

The @anthropic-ai/sdk document type (available from v0.67.0) handles PDFs with complex layouts: rotated text, colored backgrounds, poor contrast. In LabelCheck, this was used for FDA label extraction before the project migrated to OpenAI for latency reasons — the extraction quality was not the issue.

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 16384,
  messages: [{
    role: "user",
    content: [{
      type: "document",
      source: {
        type: "base64",
        media_type: "application/pdf",
        data: pdfBase64
      }
    }, {
      type: "text",
      text: "Extract all label text and structure..."
    }]
  }]
});

Enforce a 10MB max file size before sending — the API will reject larger files and the error message is not user-friendly.
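A sketch of that size check; base64 inflates the byte count by roughly 4/3, so measure the decoded size, not the string length (the helper name is illustrative):

```typescript
const MAX_PDF_BYTES = 10 * 1024 * 1024; // 10MB

function pdfWithinLimit(pdfBase64: string): boolean {
  // Base64 encodes 3 bytes per 4 characters; trailing '=' padding
  // characters carry no data.
  const padding = (pdfBase64.match(/=+$/)?.[0] ?? "").length;
  const bytes = (pdfBase64.length * 3) / 4 - padding;
  return bytes <= MAX_PDF_BYTES;
}
```

Run this before building the request so the user gets a clear error instead of the API's.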
[10]


Batching fragments for synthesis at scale

In Meridian's Layer 3 synthesis pipeline, large topics are batched at 20 fragments per batch. Sending all fragments in a single call risks both context overflow and degraded synthesis quality (Claude's attention degrades over very long contexts). The 20-fragment batch size was chosen empirically.
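The batching itself is a simple chunking step; 20 is Meridian's empirical value, and the helper name is illustrative:

```typescript
const BATCH_SIZE = 20; // chosen empirically in Meridian

// Split fragments into fixed-size batches; each batch becomes one
// synthesis call, and results are merged afterwards.
function batchFragments<T>(fragments: T[], size = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < fragments.length; i += size) {
    batches.push(fragments.slice(i, i + size));
  }
  return batches;
}
```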

[11]


Per-feature model configuration

Observed in AsymXray: rather than hardcoding a single model across all AI features, implement per-feature model configuration. This lets you tune cost vs. quality per use case without a code change, and makes it easy to A/B test model upgrades on individual features.
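One way to express this; the feature names and config shape are hypothetical, since AsymXray's actual configuration isn't in the fragments:

```typescript
interface ModelConfig {
  model: string;
  maxTokens: number;
  temperature?: number;
}

// Per-feature tuning lives in one place; swapping a model for one
// feature is a config edit, not a code change.
const MODEL_CONFIG: Record<string, ModelConfig> = {
  classification: { model: "claude-haiku-4-5", maxTokens: 16384, temperature: 0 },
  briefGeneration: { model: "claude-sonnet-4-5", maxTokens: 16384 },
};

function configFor(feature: string): ModelConfig {
  const cfg = MODEL_CONFIG[feature];
  if (!cfg) throw new Error(`No model config for feature: ${feature}`);
  return cfg;
}
```

Failing loudly on an unknown feature catches typos at development time rather than silently falling back to a default model.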

[12]


Gotchas and Edge Cases

Haiku is not lightweight for large-batch extraction

The assumption that Haiku = cheap = fast breaks down at scale. In Meridian, Haiku extraction with max_tokens: 16384 over large batches still required 16K tokens per call — the model is cheaper per token, but the token count is driven by the task, not the model. Budget accordingly.
[13]


Tool use with external APIs needs rate limiting and caching

In Eydn, web search was implemented as a Claude tool via the Tavily API. Without rate limiting, a single chat session could exhaust the Tavily quota. The production implementation caps at 10 searches per user per day with a 24-hour result cache. Any tool that calls an external paid API needs this treatment — Claude will call tools eagerly.
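A sketch of the two guards with in-memory state; Eydn's production version presumably persists these, but the limits match the ones described:

```typescript
const DAILY_SEARCH_LIMIT = 10;
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour result cache

const searchCounts = new Map<string, { day: string; count: number }>();
const resultCache = new Map<string, { at: number; results: string }>();

// Allow at most DAILY_SEARCH_LIMIT searches per user per calendar day.
function canSearch(userId: string, now = new Date()): boolean {
  const day = now.toISOString().slice(0, 10);
  const entry = searchCounts.get(userId);
  if (!entry || entry.day !== day) {
    searchCounts.set(userId, { day, count: 1 });
    return true;
  }
  if (entry.count >= DAILY_SEARCH_LIMIT) return false;
  entry.count++;
  return true;
}

// Serve a cached result if one exists and is still fresh.
function cachedResult(query: string, now = Date.now()): string | undefined {
  const hit = resultCache.get(query);
  if (hit && now - hit.at < CACHE_TTL_MS) return hit.results;
  return undefined;
}
```

The tool handler checks the cache first, then the rate limit, and only then calls the external API.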
[14]


Admin context bypass in AI chat

In Eydn, the AI chat system prompt included task/vendor context fetched from the database. An early bug allowed admin users to bypass the user-scoping on that fetch, meaning the AI could respond with data from other users' accounts. Any context injection that queries the database must enforce the same auth scoping as the rest of the app.
[15]


LLM classification confidence distribution requires a manual override path

In Meridian's industry classifier, 18 of 58 clients (31%) came back low-confidence. A classifier without a manual override path would silently misclassify nearly a third of inputs. Build the override UI before shipping classification features — it's not optional.
[8]


Claude Opus 4.6 1M context is real but not free

Claude Opus 4.6's 1M token context window is used in Meridian and ContentCommand for complex multi-file engineering tasks and knowledge synthesis. The context window works as advertised. The cost per call at 1M tokens is significant — use it for async/batch tasks, not synchronous user-facing features.
[16]


Speech-to-text input requires alias matching in registry-validated prompts

In Meridian's registry-enforced compiler, Claude validates planned knowledge paths against clients.yaml and topics.yaml. Voice input (speech-to-text) introduces transcription errors — "Eydn" becomes "Eden", "AsymXray" becomes "Asim Ray". The validation layer needs alias matching, not exact-string matching, or voice-driven workflows break constantly.
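A minimal sketch of alias-tolerant lookup; the alias table entries come from the examples above, and the normalization strategy is an assumption:

```typescript
// Known speech-to-text aliases mapped to canonical registry names.
const ALIASES: Record<string, string> = {
  "eden": "Eydn",
  "asim ray": "AsymXray",
};

// Resolve voice input against the registry: exact match (case-insensitive)
// first, then the alias table. Returns null when nothing matches.
function resolveRegistryName(input: string, registry: string[]): string | null {
  const norm = input.trim().toLowerCase();
  const exact = registry.find((r) => r.toLowerCase() === norm);
  if (exact) return exact;
  const aliased = ALIASES[norm];
  if (aliased && registry.includes(aliased)) return aliased;
  return null;
}
```

A null result should route to a confirmation prompt rather than a hard validation failure.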
[17]


Where Docs Disagree With Practice

LabelCheck: Claude → GPT migration for latency

The Anthropic docs don't address latency benchmarks relative to OpenAI. In practice at LabelCheck, Claude 3.5 Sonnet took 117+ seconds for FDA label analysis — the same task GPT-4o completes in 15–30 seconds. This is a 4–6x latency gap on a real production workload. The migration path was Claude 3.5 Sonnet → GPT-5 Mini (no improvement) → GPT-4o (fixed). For synchronous document analysis, Claude's latency is a real constraint that Anthropic's documentation doesn't surface.
[6]


max_tokens default is documented but the failure mode is not

Anthropic's docs note that max_tokens defaults to 4096 and can be raised. What they don't say is that JSON responses truncate silently — the API returns HTTP 200 with a stop_reason: "max_tokens" that's easy to miss if you're not explicitly checking it. Four projects hit this independently before it became a known pattern here.

if (response.stop_reason === "max_tokens") {
  // Response was truncated — JSON will be malformed
  throw new Error(`Response truncated at ${response.usage.output_tokens} tokens`);
}

[18]


Tool use docs show single-turn examples; production needs multi-turn loops

Anthropic's tool use documentation shows single tool invocations. Production use cases (Eydn's 9-tool chat with up to 5 chained actions) require a loop that feeds results back. The docs have examples of this but they're not prominent — the single-turn example is the one developers implement first and then have to retrofit.
[4]


Tool and Version Notes



Sources

Synthesized from 43 fragments: git commits across AsymXray, ContentCommand, Eydn, LabelCheck, and Meridian. No external sources ingested yet. Date range: unknown to unknown (commit timestamps not extracted).


  1. Meridian 5ac699f Fix Extraction Increase Maxtokels To 16384 Improve, Meridian cac78d4 Increase Compiler Maxtokens To 16384, ContentCommand 9b4782f Handle Truncated Json From Ai Increase Max Tokens, AsymXray ef33a7a Resolve Ai Brief Generation Issues
  2. ContentCommand b96d1b7 Summarize Competitive Data To Prevent Prompt Token
  3. AsymXray b03c187 Add Gsc Historical Sync Fix Ai Client Initializati, Eydn App 3707872 Fix Chat Error Handling And Add Anthropicapikey Ch
  4. Eydn App 2cb2d24 Add Tool Use To Ai Chat Eydn Can Now Take Actions
  5. AsymXray ef33a7a Resolve Ai Brief Generation Issues
  6. LabelCheck a20c510 Migrate From Anthropic Claude To Openai Gpt For Ai, LabelCheck d2c1be4 Switch From Gpt 5 Mini To Gpt 4O For Faster Analys
  7. Meridian 51746ed Parallel Two Pass Compiler Haiku Plans Sonnet Writ, Meridian 3695d97 Classify Clients By Industry Via Haiku With Manual, AsymXray caa7181 Implement Per Feature Ai Model Configuration
  8. Meridian 3695d97 Classify Clients By Industry Via Haiku With Manual
  9. Eydn App 4901b5c Give Ai Chat Full Access To All User Data
  10. LabelCheck 46a6820 Add Pdf Upload With Vision Based Text Extraction T, LabelCheck 513ad8f Add Pdf Upload Support For Initial Label Analysis
  11. Meridian 1480508 Phase 3 Layer 3 Synthesis Agent Scheduler Endpoint
  12. AsymXray caa7181 Implement Per Feature Ai Model Configuration
  13. Meridian 5ac699f Fix Extraction Increase Maxtokels To 16384 Improve
  14. Eydn App 97978c1 Add Web Search To Ai Chat Via Tavily Api
  15. Eydn App f3e1b1f Fix Ai Chat Admin Bypass Taskvendor Context App Aw
  16. Eydn App 876317f Add Persistent Eydn Memory And Expand Chat Context, Meridian b3a1b4d Initial Scaffold For Meridian Knowledge System
  17. Meridian c2c610c Registry Enforced Compiler Clientsyaml Topicsyaml
  18. Meridian 5ac699f Fix Extraction Increase Maxtokels To 16384 Improve, ContentCommand 9b4782f Handle Truncated Json From Ai Increase Max Tokens
  19. LabelCheck 46a6820 Add Pdf Upload With Vision Based Text Extraction T
  20. LabelCheck 2c1ad05 Fix Openai Api Parameter For Gpt 5 Compatibility
