quick-win-2-6.md
- wiki/engineering/performance-caching/labelcheck-354682d-extend-regulatory-cache-ttl-to-24-hours---quick-wi.md
- wiki/engineering/performance-caching/asymxray-97b88c3-add-performance-optimizations-for-production.md
- wiki/engineering/performance-caching/hazardos-3a4ddba-optimize-database-queries-for-analytics-and-servic.md
- wiki/engineering/performance-caching/hazardos-2b2ef7f-add-database-indexes-for-query-optimization.md
- wiki/engineering/performance-caching/meridian-7794133-receiver-sqlite-backed-job-store-fixes-cross-worke.md
- wiki/engineering/performance-caching/meridian-ae97fc9-switch-synthesis-queue-from-supabase-to-file-based.md
- wiki/engineering/performance-caching/asymxray-9e2a578-enforce-default-pagination-limit-in-repository-bas.md
- wiki/engineering/performance-caching/stride-v2-ee5ebc7-optimize-cicd-pipeline-for-faster-builds-and-bette.md
- wiki/engineering/performance-caching/asymxray-d9dee05-cap-emerging-trend-growth-rates-at-500-for-display.md
contradicting_sources: []
contradicting_count: 0
first_seen: "unknown"
last_updated: "2025-01-31"
hypothesis: false
rate_of_change: high
web_monitoring_frequency: weekly
fragment_count: 19
evolution_timeline: []
evolution_start: null
superseded_by: null
superseded_date: null
deprecation_notice: null
tags: []
Summary
The highest-leverage performance wins across projects have come from three sources in roughly this order: fixing unbounded queries (missing pagination, missing indexes), choosing the right cache TTL for data volatility, and parallelizing independent I/O-bound operations. A single missing database index or a default 1-hour TTL on static regulatory data can account for the majority of latency in a given endpoint — the fix is often a one-liner once the cause is identified. Shared-state bugs (cross-worker job lookups, race conditions in parallel pipelines) are the most common failure mode when moving from single-process to multi-process architectures, and they require explicit coordination mechanisms rather than hoping in-memory state is sufficient.
TL;DR
What we've learned
- Extending cache TTL from 1 hour to 24 hours on static regulatory data cut cache misses by 95% in LabelCheck — TTL selection is the single highest-leverage cache tuning decision.
- Missing database indexes caused 50-70% slower queries in Hazardos; adding composite indexes on frequently-queried column combinations fixed it without touching application code.
- In-memory job stores break silently under multi-worker deployments — Meridian's receiver hit cross-worker "job not found" errors until the store moved to SQLite.
- Unbounded queries (no pagination default) are a latency time bomb; AsymXray's repository base class now enforces a 100-record default limit.
- Parallelizing independent compliance checks (GRAS, NDI, Allergen) in LabelCheck cut post-processing from 6-7 seconds to 2-3 seconds with no correctness risk, because the checks write to independent fields.
External insights
No external sources ingested yet for this topic.
Common Failure Modes
In-memory job store breaks under multi-worker deployments
When a job is created by one worker process and a status poll arrives at a different worker, the second worker has no record of the job. The symptom is a "job not found" error that appears intermittently and is impossible to reproduce in single-process local dev.
Root cause: per-worker in-memory dicts are not shared across processes. Any deployment with more than one worker (Gunicorn, Uvicorn with multiple workers, Coolify scaling) will hit this.
Fix: replace the in-memory dict with a SQLite-backed store. The file is shared across workers on the same host, the overhead is negligible (~20 KB per hundred jobs), and it survives process restarts.
Observed in Meridian's receiver service — the cross-worker error was the direct trigger for the SQLite migration.
[1]
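A minimal sketch of such a store, assuming a simple put/get job API (class and column names are illustrative, not Meridian's actual schema):

```python
import json
import sqlite3


class SqliteJobStore:
    """Job store shared by every worker on one host via a common db file."""

    def __init__(self, path="jobs.db"):
        self.path = path
        con = sqlite3.connect(self.path)
        with con:  # commits on success
            con.execute(
                "CREATE TABLE IF NOT EXISTS jobs "
                "(id TEXT PRIMARY KEY, status TEXT, payload TEXT)"
            )
        con.close()

    def put(self, job_id, status, payload=None):
        # Upsert so status polls always see the latest state, whichever
        # worker wrote it (ON CONFLICT requires SQLite 3.24+)
        con = sqlite3.connect(self.path)
        with con:
            con.execute(
                "INSERT INTO jobs (id, status, payload) VALUES (?, ?, ?) "
                "ON CONFLICT(id) DO UPDATE SET "
                "status = excluded.status, payload = excluded.payload",
                (job_id, status, json.dumps(payload)),
            )
        con.close()

    def get(self, job_id):
        con = sqlite3.connect(self.path)
        row = con.execute(
            "SELECT status, payload FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        con.close()
        if row is None:
            return None  # genuinely unknown job, not a cross-worker artifact
        return {"status": row[0], "payload": json.loads(row[1])}
```

Opening a fresh connection per call keeps the sketch simple; SQLite's own locking handles concurrent writers on the same file.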
Unbounded queries returning full table scans
Without a default pagination limit, list endpoints will return every matching row as the dataset grows. This doesn't fail loudly — it degrades gradually, and the first sign is usually a slow dashboard that nobody can explain.
Root cause: ORM query builders don't add LIMIT unless you tell them to. A repository base class that omits a default limit means every subclass inherits the problem.
Fix: enforce a default limit at the repository base class level, not at the route handler level. Route handlers get overridden; base classes don't.
```python
class BaseRepository:
    DEFAULT_LIMIT = 100

    def find_many(self, filters, limit: int = DEFAULT_LIMIT, offset: int = 0):
        return self.db.query(...).filter(filters).limit(limit).offset(offset).all()
```
Observed in AsymXray — enforcing the 100-record default at the repository base class was the fix.
[2]
Cache TTL set too short for data volatility
A 1-hour TTL on regulatory documents that change monthly means the cache is effectively bypassed — every hour, the first request re-fetches the document and pays full latency. At moderate traffic, this produces near-zero cache hit rates.
Root cause: default TTLs are often chosen arbitrarily or copied from a different data type. Regulatory/reference data has a completely different update cadence than user-generated data.
Fix: match TTL to actual data volatility. For LabelCheck's regulatory documents (monthly/quarterly updates), extending from 1 hour to 24 hours cut cache misses from ~24/day to ~1/day — a 95% reduction.
The inverse applies: don't apply long TTLs to real-time metrics. AsymXray uses 30-second TTLs for monitoring metrics and 3-minute TTLs for dashboard APIs, which is appropriate for data that changes continuously.
[3]
Missing indexes on composite query patterns
Adding an index on a single column doesn't help much when queries filter on two columns together. A query like WHERE user_id = ? AND status = ? can only use a single-column user_id index to narrow by user, then must filter on status in memory, which is nearly as slow as no index at all when a single user_id matches many rows.
Root cause: indexes are added reactively (one column at a time) rather than by analyzing actual query patterns.
Fix: instrument slow queries first, then add composite indexes that match the actual filter + sort combinations in production queries. Observed in Hazardos — adding composite indexes on frequently-queried column combinations reduced query execution time by 50-70%.
[4]
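The effect of a composite index is easy to confirm with SQLite's query planner; this sketch uses illustrative table and column names, not Hazardos's actual schema:

```python
import sqlite3

# In-memory db purely for demonstrating the plan
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, status TEXT)"
)
# Composite index matching the actual filter combination
conn.execute("CREATE INDEX ix_orders_user_status ON orders (user_id, status)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = ? AND status = ?",
    (1, "shipped"),
).fetchall()
# The plan names ix_orders_user_status, i.e. both equality predicates
# are satisfied by the index rather than filtered in memory
print(plan)
```

Running EXPLAIN (or your database's equivalent) against the real production queries is the cheap way to verify an index is actually used before and after adding it.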
Sequential queries where parallel execution is safe
Analytics and dashboard endpoints frequently issue 3-5 independent queries in sequence — each one waits for the previous to complete even though none of them depend on each other's results. At 50-100ms per query, a 5-query sequence adds 250-500ms of pure waiting.
Root cause: the path of least resistance in synchronous code is sequential. Parallelism requires explicit asyncio.gather or Promise.all and a moment to verify there's no shared state.
Fix: identify independent I/O operations and run them concurrently. In Hazardos, consolidating sequential queries and running independent ones in parallel reduced round-trips by 60-90%. In LabelCheck, parallelizing GRAS, NDI, and Allergen compliance checks (which write to independent fields) cut post-processing from 6-7 seconds to 2-3 seconds.
```python
# Before: sequential
gras_result = await check_gras(ingredients)
ndi_result = await check_ndi(ingredients)
allergen_result = await check_allergens(ingredients)

# After: parallel (safe because checks write to independent fields)
gras_result, ndi_result, allergen_result = await asyncio.gather(
    check_gras(ingredients),
    check_ndi(ingredients),
    check_allergens(ingredients),
)
```
[5]
Re-running expensive analysis on already-processed data
Without result caching, every request for a call analysis or compliance check re-runs the full pipeline — LLM calls, regex matching, database lookups — even when the underlying data hasn't changed.
Root cause: no cache layer between the API endpoint and the analysis pipeline. The analysis functions are treated as pure compute rather than as expensive operations with cacheable outputs.
Fix: store analysis results on first completion and return the stored result on subsequent requests. In AsymXray, returning stored call analysis instead of re-analyzing was the fix. The key design decision is cache invalidation: what event should trigger a re-analysis? (Usually: the source transcript changes.)
[6]
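A minimal sketch of the store-on-first-completion pattern. Keying on a hash of the source content means a changed transcript naturally produces a new key and triggers re-analysis; the names are illustrative and a real deployment would back the dict with a table or Redis:

```python
import hashlib

# In-memory stand-in for a persistent result store
_analysis_cache = {}


def cache_key(transcript_text):
    # Content-addressed key: a changed transcript invalidates itself
    return hashlib.sha256(transcript_text.encode()).hexdigest()


def analyze(transcript_text, run_pipeline):
    key = cache_key(transcript_text)
    if key in _analysis_cache:
        return _analysis_cache[key]  # stored result, pipeline not re-run
    result = run_pipeline(transcript_text)  # the expensive part (LLM calls etc.)
    _analysis_cache[key] = result
    return result
```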
What Works
Redis with differentiated TTLs by data type
A single Redis TTL for all cached data is almost always wrong. AsymXray uses 30-second TTLs for monitoring metrics (changes every poll cycle) and 3-minute TTLs for dashboard API responses (changes less frequently). This produced a 60-80% cache hit rate and dropped API response time from 377ms to 200-250ms.
The pattern: categorize cached data by update frequency, assign TTLs accordingly, and treat the TTL as a first-class configuration value (not a magic number buried in code).
[7]
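One way to make the TTL a first-class configuration value is a mapping keyed by data category; this sketch assumes a redis-py style client exposing `setex(name, time, value)`, and the category names are illustrative:

```python
# TTLs as configuration rather than magic numbers at call sites;
# values mirror the cadences described above
TTL_SECONDS = {
    "monitoring_metrics": 30,      # changes every poll cycle
    "dashboard_api": 3 * 60,       # changes less frequently
    "regulatory_docs": 24 * 3600,  # monthly/quarterly update cadence
}


def cache_set(client, category, key, value):
    # client is assumed to expose redis-py's setex(name, time, value)
    client.setex(f"{category}:{key}", TTL_SECONDS[category], value)
```

Adding a new cached data type then forces an explicit decision about its update frequency instead of inheriting whatever TTL the last call site used.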
File-based storage for small, non-critical queues
For queues under a few hundred items that don't need distributed access, a file-based JSON queue with file locking is simpler than a database and has no external dependency. Meridian switched its synthesis queue from Supabase to a file-based queue (68 items) and eliminated the Supabase round-trip entirely.
The constraint: this only works when all workers share a filesystem (single-host deployments). For multi-host deployments, you need a real queue.
[8]
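A sketch of the pattern, assuming a Unix host (fcntl is not available on Windows) and a path shared by all workers; the function and file names are illustrative, not Meridian's actual implementation:

```python
import fcntl
import json
import os

QUEUE_PATH = "synthesis_queue.json"  # hypothetical; workers must share this filesystem


def _locked(path):
    # O_CREAT so the first push creates the queue file; flock gives
    # cross-process mutual exclusion on a single host
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    f = os.fdopen(fd, "r+")
    fcntl.flock(f, fcntl.LOCK_EX)
    return f  # lock is released when the file is closed


def push(item, path=QUEUE_PATH):
    with _locked(path) as f:
        raw = f.read()
        items = json.loads(raw) if raw else []
        items.append(item)
        f.seek(0)
        f.truncate()
        json.dump(items, f)


def pop(path=QUEUE_PATH):
    """Remove and return the oldest item, or None if the queue is empty."""
    with _locked(path) as f:
        raw = f.read()
        items = json.loads(raw) if raw else []
        item = items.pop(0) if items else None
        f.seek(0)
        f.truncate()
        json.dump(items, f)
        return item
```

Rewriting the whole file on every operation is O(n), which is exactly why this only makes sense for queues of at most a few hundred items.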
Performance regression testing as a CI gate
Stride v2 implemented comprehensive performance regression tests that run in CI. The value isn't catching regressions after they're deployed — it's catching them before merge, when the fix is cheap. Build artifact sharing in the same pipeline saves 5-10 minutes per CI run, which matters when the test suite is already slow.
The pattern: measure baseline performance metrics, encode them as test assertions, run them in CI. The hard part is choosing thresholds that are tight enough to catch real regressions but loose enough not to flake on CI variance.
[9]
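The shape of such a gate can be sketched as a plain test with an explicit baseline and headroom; the numbers and the workload here are hypothetical stand-ins, not Stride v2's actual thresholds:

```python
import statistics
import time

# Hypothetical values: baseline measured once, headroom tuned against CI variance
BASELINE_MS = 50.0
HEADROOM = 3.0


def median_latency_ms(fn, runs=5):
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # Median is less flaky than mean or max on noisy CI runners
    return statistics.median(samples)


def test_hot_path_latency():
    # Stand-in workload; a real gate would exercise the endpoint under test
    elapsed = median_latency_ms(lambda: sum(i * i for i in range(10_000)))
    assert elapsed < BASELINE_MS * HEADROOM, f"regression: {elapsed:.1f}ms"
```

Taking the median of several runs and multiplying the baseline by a headroom factor are the two knobs that trade regression sensitivity against CI flakiness.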
Database query performance monitoring before optimizing
Hazardos added database query performance monitoring before doing any optimization work. This is the correct order of operations — without instrumentation, you're guessing which queries are slow. With it, the 60-90% round-trip reduction was targeted at the actual bottlenecks.
The anti-pattern: adding indexes speculatively on columns that "seem important" rather than on columns that appear in slow query logs.
[10]
Cloudflare R2 for backup storage
Observed in Meridian: R2's free egress eliminates the cost concern that makes restore drills feel expensive. When restores are free, you actually run them. The practical implication is that nightly sync + restore runbook becomes a viable operational pattern rather than a theoretical one.
[11]
Gotchas and Edge Cases
Parallelization is only safe when operations are truly independent
LabelCheck's compliance checks are safe to parallelize because GRAS, NDI, and Allergen checks write to independent output fields. If two parallel operations write to the same field or read-modify-write shared state, parallelization introduces race conditions that are hard to reproduce and harder to debug. Before parallelizing, explicitly verify that operations have no shared mutable state.
[12]
Generic term matching in ingredient databases produces false positives
In LabelCheck's GRAS matching, short generic terms (e.g., "water", "salt") match too broadly when the matching algorithm doesn't account for specificity. The fix was two-pronged: filter out generic terms below a length threshold, and sort candidates by length descending so longer (more specific) names are evaluated first. After the fix, all 11 energy drink ingredients matched correctly where they had previously produced false positives.
[13]
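The two-pronged fix can be sketched as a length filter plus longest-first matching; the threshold value and function names here are hypothetical:

```python
MIN_TERM_LEN = 5  # hypothetical cutoff for "too generic to match on"


def match_ingredient(label_text, reference_names):
    """Return the most specific reference name found in the label, or None."""
    # Prong 1: drop short generic terms that match almost anything
    candidates = [n for n in reference_names if len(n) >= MIN_TERM_LEN]
    # Prong 2: longest first, so more specific names win over substrings
    haystack = label_text.lower()
    for name in sorted(candidates, key=len, reverse=True):
        if name.lower() in haystack:
            return name
    return None
```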
External data sources can return physically impossible values
Google Trends data fed into AsymXray produced growth rates of 241,250% — not a bug in the calculation, but a real value from the source that is meaningless for display. Without a cap, these values break chart scales and make legitimate trends unreadable. AsymXray now caps growth rates at -95% to 500% before display. The general pattern: any time you're displaying computed metrics from external sources, add bounds validation before rendering.
[14]
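The bounds validation itself is a one-line clamp, applied at the display boundary rather than in the calculation:

```python
def clamp_growth_rate(pct, lo=-95.0, hi=500.0):
    """Clamp a growth-rate percentage to display bounds before rendering."""
    return max(lo, min(hi, pct))
```

Clamping only at render time preserves the raw value for anyone debugging the source data, while keeping chart scales readable.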
Reference database size directly affects false positive rate
LabelCheck's "requires NDI" recommendation was firing incorrectly because the ODI database only contained 1,000 of the ~2,000+ known old dietary ingredients. Ingredients not in the database were flagged as potentially requiring NDI notification. Expanding the database from 1,000 to 2,193 ingredients (adding the UNPA consolidated list) cut false positives for that recommendation significantly. The lesson: accuracy problems in classification systems are often data coverage problems, not algorithm problems.
[15]
Rate limiting as a performance concern, not just a courtesy
Orbit ABM's 24-hour per-contact send throttle prevents email fatigue, but it also prevents the system from hammering downstream email infrastructure with duplicate sends during retry storms or misconfigured workflows. The throttle is implemented as a skip (not a queue), so it doesn't create backpressure — it just drops sends within the 24-hour window.
[16]
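A skip-based throttle of this shape can be sketched as follows; the class name and in-memory store are illustrative, not Orbit ABM's actual implementation:

```python
import time


class SendThrottle:
    """Skip (never queue) sends to a contact within a rolling window."""

    WINDOW_S = 24 * 3600

    def __init__(self):
        self._last_sent = {}  # contact_id -> epoch seconds of last send

    def should_send(self, contact_id, now=None):
        now = time.time() if now is None else now
        last = self._last_sent.get(contact_id)
        if last is not None and now - last < self.WINDOW_S:
            return False  # dropped, so no backpressure builds up
        self._last_sent[contact_id] = now
        return True
```

Returning False instead of enqueueing is the key design choice: a retry storm produces dropped duplicates rather than a backlog that replays later.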
Where Docs Disagree With Practice
No direct doc-vs-practice contradictions are present in the current evidence set. The patterns here are consistent with standard performance engineering guidance — the value is in the specific numbers (95% miss reduction, 60-90% round-trip reduction) and the specific failure modes (cross-worker in-memory state, generic term matching) that docs don't surface.
Tool and Version Notes
- Redis TTL configuration: Differentiated TTLs (30s for metrics, 3min for dashboards) observed working in AsymXray production. No version-specific behavior noted — this is a configuration pattern, not a version constraint.
- SQLite as a job store: ~20 KB per hundred jobs in Meridian. Appropriate for single-host deployments; not a substitute for a proper queue in distributed deployments.
- Cloudflare R2: Free egress is the operative characteristic for backup use cases. Confirmed in Meridian nightly sync setup.
- Supabase as a queue: Replaced in Meridian because the overhead of a remote database round-trip was disproportionate for a 68-item queue. Supabase is appropriate for persistent data, not for lightweight job coordination on a single host.
Related Topics
Sources
Synthesized from 19 fragments: 17 git commits across AsymXray, Hazardos, LabelCheck, Meridian, Orbit ABM, and Stride v2; 0 external sources; 0 post-mortems. Date range: unknown to unknown.
1. Meridian 7794133 Receiver Sqlite Backed Job Store Fixes Cross Worke
2. Asymxray 9E2A578 Enforce Default Pagination Limit In Repository Bas
3. Labelcheck 354682D Extend Regulatory Cache Ttl To 24 Hours Quick Wi; Asymxray 97B88C3 Add Performance Optimizations For Production
4. Hazardos 2B2Ef7F Add Database Indexes For Query Optimization
5. Labelcheck Cb951B6 Implement Parallel Post Processing Quick Win 2 6; Hazardos 3A4Ddba Optimize Database Queries For Analytics And Servic
6. Asymxray 0C321B4 Return Stored Call Analysis Instead Of Re Analyzin
7. Asymxray 97B88C3 Add Performance Optimizations For Production
8. Meridian Ae97Fc9 Switch Synthesis Queue From Supabase To File Based
9. Stride V2 Ee5Ebc7 Optimize Cicd Pipeline For Faster Builds And Bette; Stride V2 Cca2079 Implement Comprehensive Performance Regression Tes
10. Hazardos 02D26Ac Add Database Query Performance Monitoring
11. Meridian D1858Cc Add R2 Backup Tooling Nightly Sync Restore Runbook
12. Labelcheck Cb951B6 Implement Parallel Post Processing Quick Win 2 6
13. Labelcheck A41301F Fix Gras Ingredient Matching For Energy Drinks And
14. Asymxray D9Dee05 Cap Emerging Trend Growth Rates At 500 For Display
15. Labelcheck Edd48Fb Expand Old Dietary Ingredients Database With Unpa
16. Orbitabm 36D2F87 Add 24 Hour Per Contact Send Throttle