all-tests-now-passing.md
- wiki/engineering/testing/stride-v2-13e63ba-add-comprehensive-test-coverage-and-cicd-improveme.md
- wiki/engineering/testing/stride-v2-d1f0a26-adjust-coverage-thresholds-to-realistic-levels-for.md
- wiki/engineering/testing/eydn-app-dd9e292-optimize-ci-pipeline-shared-cache-job-dependencies.md
- wiki/engineering/testing/stride-v2-7656255-update-unit-tests-and-optimize-ci-workflow.md
- wiki/engineering/testing/contentcommand-9028983-add-comprehensive-test-suite-to-achieve-80-coverag.md
- wiki/engineering/testing/hazardos-eb2b4a1-add-comprehensive-api-test-suite.md
- wiki/engineering/testing/stride-v2-a0e2caf-add-comprehensive-validation-tests-for-server-acti.md
first_seen: "unknown"
last_updated: "2025-07-14"
hypothesis: false
fragment_count: 84
tags: []
Summary
Coverage thresholds set against the wrong baseline are the most common CI failure mode in this codebase — Next.js projects with server-side components, API routes, and DB operations can't hit 80% statement coverage uniformly, and trying to enforce it blocks CI rather than improving quality. The practical fix is tiered thresholds (global ~8%, components ~25%, lib ~40%) or removing coverage enforcement entirely until the project matures. The most reliable path to meaningful coverage is the batch approach: add 50–200 tests per commit targeting pure utilities and validation patterns first (deterministic, no mocks), then API routes with AAA + service-layer mocking, then components. E2E tests with Playwright require a production build (next start) and wait-on readiness checks — development server runs produce flaky results. CI costs can be cut 70–88% by combining shared npm cache, change-detection skipping, and running only Chromium on PRs.
TL;DR
What we've learned
- Coverage thresholds that don't account for server-side code cause CI failures; tiered or removed thresholds unblock development without sacrificing quality signal
- Validation-pattern tests (data structures, constants, business rules, pure utilities) are the fastest path to reliable coverage — no mocks, no flakiness
- E2E tests need a production build and wait-on before execution; dev server runs were unreliable in both projects where this was tested (Stride v2 and Eydn)
- Batch test additions (50–200 tests per commit) are the practical way to move coverage from single digits to 40%+; incremental per-feature additions maintain it
- CI pipeline costs are dominated by redundant npm ci calls and full browser matrix on every push — both are fixable with shared cache and branch-conditional matrix
External insights
No external sources ingested yet for this topic.
Common Failure Modes
Coverage thresholds set too high for Next.js projects with server-side code
The default instinct is to set Jest coverage thresholds at 80% globally. In Next.js App Router projects with server actions, API routes, and Prisma/Supabase calls, this is unreachable without mocking the entire infrastructure layer — which produces tests that verify mocks, not behavior. The result is CI that fails on coverage rather than on broken code.
In Stride v2, coverage thresholds were removed entirely to unblock CI during early development [1], then later replaced with tiered thresholds: global 8% statements, components 25%, lib 40% [2]. The tiered approach is better than removal — it still catches regressions in the testable layers while not penalizing the infrastructure-heavy layers.
Fix: Set thresholds per directory, not globally. Start from what you can actually achieve, not from what sounds good.
// jest.config.js
module.exports = {
  coverageThreshold: {
    // Jest applies `global` to files not matched by a path key below.
    global: { statements: 8 },
    './src/components': { statements: 25 },
    './src/lib': { statements: 40 },
  },
};
E2E tests fail because the server isn't ready
Playwright tests that fire immediately after next start or next dev hit a race condition — the server process exists but isn't accepting connections yet. Symptom: tests fail with connection refused or timeout on the first request, pass if you add a manual sleep.
Fix: Use wait-on to poll the server URL before Playwright runs.
# .github/workflows/e2e.yml (steps excerpt; assumes an earlier step ran the production build)
- name: Start server
  run: npx next start &
- name: Wait for server
  run: npx wait-on http://localhost:3000 --timeout 60000
- name: Run Playwright
  run: npx playwright test
Observed in two projects: Stride v2 [4] and Eydn [5]. Eydn additionally confirmed that next start (production build) is more reliable than next dev for Playwright, because the dev server's HMR activity can interfere with test timing.
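Playwright can also manage the server lifecycle itself through its webServer option. The following is a minimal sketch, not the configuration used in Stride v2 or Eydn; the npm script names (build, start), the port, and the timeout value are assumptions.

```javascript
// playwright.config.js (sketch; script names, port, and timeout are assumptions)
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: { baseURL: 'http://localhost:3000' },
  webServer: {
    // Build first, then serve the production build instead of `next dev`.
    command: 'npm run build && npm run start',
    // Playwright polls this URL and starts the tests only once it responds.
    url: 'http://localhost:3000',
    // Allow time for the build plus a cold start on a CI runner.
    timeout: 180 * 1000,
    // Reuse an already-running server locally, never in CI.
    reuseExistingServer: !process.env.CI,
  },
});
```

With webServer configured this way, the url check covers the same readiness condition as a separate wait-on step.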
ESM/Faker compatibility breaks tests in integration/examples paths
Jest's default transform config doesn't handle ESM modules from packages like @faker-js/faker when they appear in integration/ or examples/ subdirectories. The failure manifests as a syntax error on the ESM import, not a missing module error — which makes it look like a transform issue rather than a path exclusion issue.
Fix: Exclude the offending paths in Jest config rather than fighting the transform chain.
// jest.config.js
module.exports = {
  // Each entry is a regex pattern string matched against the full test path.
  testPathIgnorePatterns: [
    '/node_modules/',
    '/integration/examples/',
  ],
};
Observed in Stride v2 when the integration/examples/ path was excluded after repeated failures [6].
Auth validation order causes E2E test failures on invalid input
In test mode, API endpoints that check authentication before validating request data will reject malformed test requests with auth errors rather than validation errors. This makes tests that intentionally send invalid payloads (to verify 400 responses) fail with 401s instead, producing misleading failure messages.
LabelCheck hit this when building E2E test coverage for API routes — the fix was reordering middleware to validate request data before auth checks in test mode, which brought the suite from partial failures to 83/83 passing [7].
This is a test-mode-specific concern: production order (auth first) is correct for security. The test mode reordering is only for the validation error path.
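The ordering difference can be sketched as a plain function. This is an illustration only, not LabelCheck's middleware: handleRequest, validatePayload, the testMode flag, and the payload rule are all hypothetical names and assumptions.

```javascript
// Hypothetical sketch of check ordering; names and payload shape are assumptions.
function validatePayload(body) {
  // Toy rule for illustration: a non-empty string `name` is required.
  return body !== null && typeof body === 'object' &&
    typeof body.name === 'string' && body.name.length > 0;
}

function handleRequest({ body, user }, { testMode = false } = {}) {
  const authError = () => (user ? null : { status: 401, error: 'unauthenticated' });
  const validationError = () =>
    (validatePayload(body) ? null : { status: 400, error: 'invalid payload' });

  // Production checks auth first; test mode checks the payload first.
  const checks = testMode ? [validationError, authError] : [authError, validationError];
  for (const check of checks) {
    const failure = check();
    if (failure) return failure; // stop at the first failing check
  }
  return { status: 200 };
}
```

An unauthenticated request with a malformed body returns 400 in test mode and 401 otherwise, which is exactly the difference the validation-path tests depend on.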
Redundant CI steps double test execution time
Running test and test:coverage as separate CI steps executes the full test suite twice. test:coverage already runs all tests — the standalone test step is pure waste.
Observed in AsymXray: removing the redundant unit test run halved CI execution time [8]. The same pattern appeared in Stride v2 when refactoring from ~20 parallel jobs to ~6 on PRs [9].
Build artifacts in CI cause dependency resolution failures
Uploading and downloading build artifacts between CI jobs introduces version skew — the artifact was built against one node_modules state, but the downstream job may have a different cache hit. Local builds with Next.js cache are more reliable than artifact-passing.
Observed in Stride v2 [10]. The fix was building locally in each job that needs the build output rather than sharing artifacts. This trades some CI time for reliability.
Environment variable validation misses empty strings
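A check written as process.env.SOME_VAR === undefined (or a default applied with ??) catches an unset variable but not "". CI environments frequently set variables to empty strings as placeholders, which passes the undefined check but causes downstream failures when the value is actually used. A bare truthiness check, if (!process.env.SOME_VAR), does reject "" because an empty string is falsy in JavaScript, but it still accepts whitespace-only values.
Fix: Check for unset, empty, and whitespace-only values explicitly.
const value = process.env.SOME_VAR;
if (value === undefined || value.trim() === '') {
  throw new Error('SOME_VAR is required');
}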
Observed in Stride v2 during CI artifact and env var validation improvements [11].
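When several variables are required, reporting all of the bad ones in a single error saves CI round trips. A minimal sketch, assuming a hypothetical helper named requireEnv (not taken from the Stride v2 codebase):

```javascript
// Sketch: collect every unset, empty, or whitespace-only variable in one pass.
// `requireEnv` is a hypothetical helper name.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  // Return only the requested variables so callers do not read process.env directly.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

Calling it once at startup, for example requireEnv(['DATABASE_URL']) with whatever names the project needs, turns a late runtime failure into an immediate, named one.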
Mock files accumulate dead weight and obscure test intent
Test mock files grow as features are added and rarely shrink when features change. In LabelCheck, mocks.ts reached 286 lines — most of it unused after a migration from Jest unit tests to Playwright E2E tests. The dead mocks didn't cause failures, but they made the test setup opaque and slowed onboarding.
After cleanup: 77 lines, all tests still passing, 210 lines removed [12]. The signal: if you can't explain why a mock exists, it probably doesn't need to exist.
What Works
Validation-pattern tests as the baseline layer
Tests that verify data structures, constants, business rules, and pure utility functions require no mocking and produce zero flakiness. They're the fastest tests to write and the most reliable to maintain. In Stride v2, ~1,200 validation tests for server actions and utilities were added across 6 phases with near-zero maintenance burden [13]. In Hazardos, the same pattern anchored the first batch of 50 comprehensive tests (10,416 lines across 58 files) [14].
Start here before touching API routes or components. The coverage is real and the tests don't rot.
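As an illustration of the pattern, here is a sketch of a pure business rule with a case table. The status names and the transition rule are hypothetical, not taken from Stride v2 or Hazardos.

```javascript
// Hypothetical business rule: which job status transitions are allowed.
const TRANSITIONS = Object.freeze({
  draft: ['active', 'cancelled'],
  active: ['completed', 'cancelled'],
  completed: [],
  cancelled: [],
});

// Pure function: no I/O, no clock, no randomness, so no mocks are needed.
function canTransition(from, to) {
  const allowed = TRANSITIONS[from];
  return Array.isArray(allowed) && allowed.includes(to);
}

// Case table: [from, to, expected]. Includes an unknown status on purpose.
const transitionCases = [
  ['draft', 'active', true],
  ['active', 'completed', true],
  ['completed', 'active', false],
  ['cancelled', 'draft', false],
  ['unknown', 'active', false],
];
```

In a Jest file the table can be passed to test.each(transitionCases), with each row asserting expect(canTransition(from, to)).toBe(expected).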
AAA pattern with service-layer mocking for API route tests
API route tests that mock at the service layer (not the DB layer) are the right abstraction: they test the route's auth, validation, and response-shaping logic without requiring a live database, and they don't couple to ORM internals.
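// Arrange: declare the mock inside the factory. jest.mock() is hoisted above
// the imports, so a factory that closes over a const declared later can hit
// the temporal dead zone when the route module is first loaded.
jest.mock('@/services/job', () => ({
  getJob: jest.fn().mockResolvedValue({ id: '1', status: 'active' }),
}));
import { GET } from './route'; // hypothetical path to the route handler under test

it('returns the job', async () => {
  const req = new Request('http://localhost/api/jobs/1'); // hypothetical URL
  // Act
  const res = await GET(req, { params: { id: '1' } });
  // Assert
  expect(res.status).toBe(200);
  expect(await res.json()).toMatchObject({ id: '1' });
});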
Consistent across Hazardos (380+ passing API route tests across 85 test files) [15], LabelCheck [16], and ContentCommand (coverage 64% → 80.49%) [17].
Shared npm cache across CI jobs
Running npm ci in every parallel CI job is the single biggest source of wasted CI minutes. Caching node_modules in the install job and restoring it in downstream jobs reduces installs from N to 1.
In Eydn, this reduced npm ci calls from 4 to 1 [5]. In Stride v2, combining shared cache with change-detection skipping and single-browser PRs cut CI action minutes by 70–88% [18].
Batch test additions to establish baseline coverage
When a project has low coverage (< 15%), incremental per-feature test additions don't move the needle fast enough to be motivating or useful. The effective pattern is a dedicated batch commit: pick a domain (utilities, then services, then API routes, then components), write 50–200 tests, commit. Repeat across domains.
Hazardos went from 84 test cases (12%) to 1,157 (40%) through systematic batch additions across ~15 commits [19]. ContentCommand went from 64% to 80.49% in a single large commit covering 50+ test files [17]. Stride v2 added 182 tests in one commit to move from 3.59% to 6.56% [20].
Playwright for responsive and accessibility testing in CI
Playwright's multi-viewport support makes it practical to run responsive and accessibility checks as part of CI rather than as manual audits. In Eydn, 67 tests cover 21 public pages and 45 dashboard views across 3 viewports each, integrated into the CI pipeline [21].
The aria-hidden exclusion pattern is worth noting: dashboard tests required adding aria-hidden exclusions to avoid false positives from decorative elements [22].
Change-detection to skip tests on documentation-only commits
Running the full test suite on every push regardless of what changed is expensive and trains developers to ignore CI. Stride v2's CI workflow skips tests for documentation-only changes and uses path-based change detection to run only relevant test suites [18].
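The documentation-only check reduces to a predicate over the list of changed files. A minimal sketch follows; the path patterns are assumptions about repository layout, not the rules used in Stride v2's workflow.

```javascript
// Sketch: decide whether a change touches documentation only.
// The patterns are assumptions; adjust them to the repository layout.
const DOC_PATTERNS = [/\.mdx?$/i, /^docs\//, /^LICENSE$/];

function isDocsOnlyChange(changedFiles) {
  // An empty list means "unknown", so fall back to running the tests.
  if (changedFiles.length === 0) return false;
  return changedFiles.every((file) => DOC_PATTERNS.some((re) => re.test(file)));
}
```

The file list would typically come from something like git diff --name-only against the base branch, split on newlines. Treating an empty list as "run the tests" keeps a failed diff from silently skipping the suite.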
Gotchas and Edge Cases
Business logic tests require edge-case enumeration upfront
Tests for permissions, scheduling, and task generation look simple until you enumerate the cases: wildcard matching, week-of-month calculations, UTC vs. local time handling, seasonal schedule types. In Stride v2, comprehensive business logic tests required 117 test cases (39 for permissions, 21 for protect-action, 57 for task-generator) to cover the actual edge case space [23]. Underestimating this leads to tests that pass on the happy path but miss the cases that actually break in production.
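To show what enumerating cases up front looks like, here is a sketch for a week-of-month calculation. The definition used (nth seven-day block of the month, computed in UTC) and the function name are assumptions; Stride v2's task generator may define it differently.

```javascript
// Sketch: week-of-month as "which seven-day block of the month", in UTC.
// This definition is an assumption; other definitions exist.
function weekOfMonthUtc(date) {
  return Math.ceil(date.getUTCDate() / 7);
}

// Edge cases worth enumerating up front: [ISO timestamp, expected week].
const weekCases = [
  ['2025-07-01T00:00:00Z', 1], // first day of the month
  ['2025-07-07T23:59:59Z', 1], // last second of the first seven days
  ['2025-07-08T00:00:00Z', 2], // boundary into the second block
  ['2025-07-29T00:00:00Z', 5], // a fifth block exists only late in the month
  ['2025-06-30T23:30:00-05:00', 1], // local June 30 is already July 1 in UTC
];
```

The last row is the kind of case a happy-path test misses: the same instant falls in week 5 of June by local date and week 1 of July by UTC date.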
AI-powered feature tests need multiple input format coverage
Voice transcription and content generation endpoints typically accept both base64 JSON payloads and FormData — and the two paths have different validation and error behavior. Tests that only cover one format miss real failure modes. In Hazardos, AI voice transcription tests explicitly covered base64 JSON, FormData, context validation, and error handling as separate cases [24].
Third-party integration tests need connection validation coverage
Tests for HubSpot, Mailchimp, QuickBooks, and Stripe routes should include cases where the connection isn't configured — not just the happy path. In Hazardos, segment route tests covered HubSpot sync, Mailchimp sync, and connection validation as distinct test cases [25]. Missing the "not connected" case means the test suite doesn't catch the most common user-facing error.
Health score thresholds in tests need to track implementation changes
Tests that assert specific numeric thresholds (CPA warning at 1.2x target, critical at 1.6x target) break silently when the underlying calculation changes — they fail with a number mismatch, not a meaningful error message. In AsymXray, health score tests required two separate threshold adjustment commits as the calculation evolved [26]. Consider asserting relative relationships (warning < critical) rather than absolute values where the exact threshold is subject to product decisions.
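One way to assert relationships instead of absolute numbers is to derive test inputs from the same constant the implementation reads. The sketch below is hypothetical: cpaStatus, CPA_THRESHOLDS, and the use of >= at each boundary are assumptions, not AsymXray's health score code.

```javascript
// Hypothetical health-score helper; the multipliers mirror the text (1.2x, 1.6x)
// but live in one constant so tests can reference them instead of copying them.
const CPA_THRESHOLDS = Object.freeze({ warning: 1.2, critical: 1.6 });

function cpaStatus(cpa, target, thresholds = CPA_THRESHOLDS) {
  const ratio = cpa / target;
  if (ratio >= thresholds.critical) return 'critical';
  if (ratio >= thresholds.warning) return 'warning';
  return 'healthy';
}

// Cases derived from the thresholds, not hard-coded numbers: [cpa, expected].
function derivedCases(target, t = CPA_THRESHOLDS) {
  const mid = (t.warning + t.critical) / 2;
  return [
    [target * (t.warning - 0.1), 'healthy'],
    [target * mid, 'warning'],
    [target * (t.critical + 0.1), 'critical'],
  ];
}
```

If product changes the multipliers, the derived cases move with them, and only the ordering invariant (warning below critical) needs an explicit assertion.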
Placeholder tests create a false coverage signal
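Hazardos added placeholder tests for lower-priority API routes with intent to fill them in later [27]. Placeholders that pass (e.g., expect(true).toBe(true)) inflate test counts and pass rates, and any module they import is counted as executed in the coverage report, without providing any protection. it.todo() is the safer marker because Jest reports todo tests separately rather than as passes. Track placeholders explicitly: a // TODO: implement comment in a passing test is a coverage lie.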
Mock chaining issues surface in service module tests
In Hazardos iteration 3 batch tests, some service module tests (jobs, activity, approval, commission) had mock chaining issues that weren't caught before commit and required follow-up fixes [28]. Mock chains where one mock's return value feeds into another mock's input are fragile — prefer flat mocks where possible, and run the full suite locally before committing large batches.
Aspirational E2E tests need explicit skip markers
Writing E2E tests for features not yet implemented is a legitimate planning technique, but only if the tests are explicitly skipped with a comment explaining why. In Stride v2, volunteer portal E2E tests were written aspirationally, then skipped, then later rewritten to match the actual implementation [29]. Without the explicit skip, aspirational tests fail CI and get ignored.
Where Docs Disagree With Practice
Jest coverage thresholds: docs suggest uniform high targets, practice requires tiering
Jest examples and most testing guides show a single global coverage threshold (commonly 80%) enforced uniformly. In practice, Next.js App Router projects with server components, API routes, and ORM calls cannot hit 80% globally without mocking the infrastructure layer so heavily that the tests stop being useful. The practical approach is tiered thresholds by directory. Jest's coverageThreshold option accepts path and glob keys alongside global, but that form is rarely shown in guides and is easy to miss.
Observed in Stride v2 [2] and implied by the coverage removal in the same project [1].
Playwright docs recommend dev server for local testing; production build is more reliable in CI
Playwright's getting-started docs use next dev as the webServer command. In CI, next start (production build) is more reliable — the dev server's HMR and compilation activity introduces timing variability that causes intermittent test failures. Observed in Eydn [5].
Auth-first middleware ordering: correct for production, wrong for test-mode validation error paths
Standard security guidance (and most framework docs) says authenticate before doing anything else. In test mode, this ordering causes validation-error tests to receive 401s instead of 400s, making it impossible to verify that the validation layer works correctly. LabelCheck required test-mode-specific middleware reordering to achieve 100% E2E pass rate [7]. This isn't documented anywhere in Next.js or common testing guides.
Tool and Version Notes
- Jest + ESM (Faker): ESM packages in integration/examples/ paths require explicit testPathIgnorePatterns exclusion. Observed in Stride v2. No version pinned in evidence, but the ESM/CJS boundary issue is present in Jest < 30.
- Playwright: Multi-viewport responsive testing works well with page.setViewportSize() per test. aria-hidden exclusions needed for dashboard components with decorative elements. Observed in Eydn.
- wait-on: Required for E2E test reliability when starting Next.js server in CI. Use --timeout 60000 minimum; cold starts on CI runners can be slow.
- Claude Opus 4.5: All test commits across all six projects include Claude Opus 4.5 as co-author, indicating AI-assisted test generation is the standard workflow. Test quality appears high (92.4%+ pass rates on first run) but mock chaining issues do appear in service-layer tests and require human review.
- GitHub Actions: Full browser matrix (Chromium + Firefox + WebKit) should be gated to main/master branch only. PR runs on Chromium alone. This pattern cut CI minutes 70–88% in Stride v2 [18].
- JUnit reporter: AsymXray added JUnit reporter for integration test results in CI, enabling test result visualization in GitHub Actions [30]. Worth adding to any project using GitHub Actions.
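The branch-conditional browser matrix can be expressed as a small function that a Playwright config consumes. This is a sketch under assumptions: the branch list and the function name are hypothetical, and it relies on GITHUB_REF_NAME, which GitHub Actions sets to the short ref name (for pull requests, a value of the form 42/merge).

```javascript
// Sketch: pick Playwright browsers from the branch being built.
// GITHUB_REF_NAME is set by GitHub Actions; the branch list is an assumption.
const FULL_MATRIX_BRANCHES = ['main', 'master'];

function browserNames(env = process.env) {
  const branch = env.GITHUB_REF_NAME || '';
  // Full matrix on release branches, Chromium only everywhere else (PRs, local runs).
  return FULL_MATRIX_BRANCHES.includes(branch)
    ? ['chromium', 'firefox', 'webkit']
    : ['chromium'];
}
```

In playwright.config.js the result can be mapped into projects, for example browserNames().map((name) => ({ name, use: { browserName: name } })), so the same config file serves both PR and main-branch runs.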
Related Topics
Sources
Synthesized from 84 fragments: git commits across AsymXray, ContentCommand, Eydn, Hazardos, LabelCheck, and Stride v2. No external sources ingested. No post-mortems. Date range: unknown to unknown (fragments lack explicit dates).
Sources
1. Stride V2 9Cb9Fec Remove Coverage From Unit Tests To Avoid Threshold
2. Stride V2 D1F0A26 Adjust Coverage Thresholds To Realistic Levels For
3. Stride V2 D1F0A26 Adjust Coverage Thresholds To Realistic Levels For, Stride V2 9Cb9Fec Remove Coverage From Unit Tests To Avoid Threshold
4. Stride V2 D3400D4 Fix E2E Tests By Starting Server Before Running Te
5. Eydn App Dd9E292 Optimize Ci Pipeline Shared Cache Job Dependencies
6. Stride V2 F27E48B Remove Api Test Examples Step Excluded In Jest Con
7. Labelcheck D372B60 Fix E2E Test Protocol All Tests Now Passing
8. Asymxray 5Dfa385 Remove Redundant Unit Test Run In Ci
9. Stride V2 0B11F27 Refactor Cicd Pipeline To Reduce Duplication And F
10. Stride V2 D71A3F1 Remove Build Artifact Dependency Build Locally In
11. Stride V2 C40406F Improve Artifact Uploads And Env Var Validation
12. Labelcheck A21Ab82 Remove Superfluous Test Code Cleanup Unused Mock
13. Stride V2 A0E2Caf Add Comprehensive Validation Tests For Server Acti
14. Hazardos A9Fc5B4 Add First Batch Of 50 Comprehensive Tests
15. Hazardos A085704 Complete Remaining Api Route Test Implementations
16. Labelcheck 0B6C862 Add Api Route Tests And Testing Documentation
17. Contentcommand 9028983 Add Comprehensive Test Suite To Achieve 80 Coverag
18. Stride V2 7656255 Update Unit Tests And Optimize Ci Workflow
19. Hazardos 6572Db4 Comprehensive Documentation Update For Test Covera
20. Stride V2 13E63Ba Add Comprehensive Test Coverage And Cicd Improveme
21. Eydn App 4E6A789 Add Playwright Responsive Accessibility Tests To C
22. Eydn App 96837C2 Add Aria Hidden Exclusion To Dashboard Tests
23. Stride V2 147De95 Add Comprehensive Business Logic Tests For Permiss
24. Hazardos 71Bf0F5 Add Comprehensive Commissions Plans And Ai Voice T
25. Hazardos 0E5C6C0 Add Comprehensive Segments Route Tests, Hazardos 0Ca605C Add Performance Indexes And Stripe Service Test
26. Asymxray Bacc151 Update Health Score Tests To Match New Thresholds, Asymxray 766Bfca Fix Health Scores Tests Adjust Objective Priorit
27. Hazardos 77916Bf Add Lower Priority Api Route Test Placeholders
28. Hazardos 2Bf2694 Add Iteration 3 Batch Tests For Improved Coverage
29. Stride V2 B774B0C Skip Volunteer Portal E2E Tests Features Not Yet I, Stride V2 C752039 Rewrite Volunteer Portal E2E Tests For Actual Impl
30. Asymxray 81Ae420 Add Junit Reporter For Integration Test Results In