Summary
The most expensive debugging mistakes across our projects are structural, not logical: N+1 queries that look like single fetches, nested API responses that silently return undefined, and test suites that test CSS class names instead of behavior. The single most reliable debugging workflow is temporary, targeted logging inserted at each data transformation boundary — API response, parsed data, UI output — then removed once the root cause is confirmed. Test infrastructure debt compounds fast: fixing ESLint errors, mock ordering, and environment variable handling before writing new tests saves more time than any coverage expansion. Three categories of bugs recur across every project: query performance, API response shape mismatches, and CI environment-specific failures that have nothing to do with application logic.
TL;DR
What we've learned
- N+1 queries are the most consistently high-impact fix: 7 sequential queries → 3–4 parallel queries cut dashboard load times measurably in both Stride v2 and Hazardos, with 80–95% reductions in database round-trips
- Nested API response wrappers (data.data, result.data) silently return undefined — AsymXray hit this at least 3 times across PageSpeed and Pulse endpoints before a systematic fix
- Google's OAuth revoke endpoint revokes all tokens for a user/app pair, not just the service being disconnected — calling it when a user disconnects one Google service breaks every other connected service
- Jest mock ordering is load-bearing: clearAllMocks must run before mock configuration, not after, or tests bleed state into each other
- UI component tests that assert CSS class names or Radix UI internals break on every component library update; Stride v2 deleted 13 such test files in a single commit
External insights
No external sources ingested yet for this topic.
Common Failure Modes
1. Nested API response wrapper not unwrapped — data silently shows as zero or undefined
Consistent across projects: an API returns { data: { mobile: {...}, desktop: {...} } } and the consumer reads response.mobile instead of response.data.mobile. The result is not an error — it's undefined coerced to zero or an empty render. This is invisible until you log the raw response.
In AsymXray, PageSpeed scores displayed as 0 for weeks. The fix was extracting data.data.mobile and data.data.desktop rather than data.mobile. The same pattern appeared in the Pulse page endpoint (result.data unwrap) and the GBP performance API.
// Wrong — response.mobile is undefined
const mobile = response.mobile?.categories?.performance?.score;
// Correct — PageSpeed wraps in data.data
const mobile = response.data?.mobile?.categories?.performance?.score;
Diagnostic approach: log JSON.stringify(response, null, 2) at the fetch boundary before any destructuring. Don't assume the shape matches the docs.
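One way to make a wrapper mismatch fail loudly, instead of rendering as zero, is to read nested fields through a helper that throws on the first missing segment. This is a minimal sketch: `getPath` is a hypothetical helper, and the sample payload only imitates the wrapped PageSpeed shape described above.

```typescript
// Hypothetical helper: walk a fixed path into an API payload and throw
// (rather than return undefined) as soon as a segment is missing.
function getPath(payload: unknown, path: string[]): unknown {
  let current: unknown = payload;
  const walked: string[] = [];
  for (const key of path) {
    if (typeof current !== "object" || current === null || !(key in current)) {
      const where = walked.length > 0 ? walked.join(".") : "<root>";
      throw new Error(`Missing "${key}" after "${where}" in API response`);
    }
    current = (current as Record<string, unknown>)[key];
    walked.push(key);
  }
  return current;
}

// Assumed payload, modelled on the wrapped response shown above.
const response = {
  data: { mobile: { categories: { performance: { score: 0.91 } } } },
};

// Reads through the wrapper; a path without "data" would throw instead of yielding undefined.
const score = getPath(response, ["data", "mobile", "categories", "performance", "score"]);
```

The trade-off is that optional fields need their own handling; the helper suits fields that must exist for the UI to be correct.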
2. N+1 queries disguised as dashboard aggregations
The pattern: a dashboard loads a list of records, then fires one query per record to get counts or stats. At small scale it's invisible. At production scale it's the first thing to show up in slow query logs.
In Stride v2, getParticipantStats() ran 7 sequential queries. Replacing them with COUNT(DISTINCT), FILTER, and GROUP BY collapsed it to 4 parallel queries. getDashboardStats() went from 7 sequential to 3 parallel. In Hazardos, HubSpot bulk sync, commission summaries, and invoice stats were all sequential — parallelizing in batches of 10 cut database round-trips by 80–95%.
-- Before: one query per participant
SELECT COUNT(*) FROM sessions WHERE participant_id = $1;
SELECT COUNT(*) FROM sessions WHERE participant_id = $1 AND status = 'completed';
-- After: one aggregation query for all participants
SELECT
  participant_id,
  COUNT(DISTINCT id) AS total_sessions,
  COUNT(DISTINCT id) FILTER (WHERE status = 'completed') AS completed_sessions
FROM sessions
GROUP BY participant_id;
The fix is always the same: identify the loop, move the aggregation into the query, run sibling queries in Promise.all.
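Where the work cannot be folded into a single aggregation (for example, per-record sync calls), the batched parallelization mentioned for Hazardos can be sketched as follows. `runInBatches` is a hypothetical helper, not code from any of the projects, and the batch size is left to the caller.

```typescript
// Hypothetical helper: run async tasks in parallel, at most `batchSize` at a time.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Items within a batch run concurrently; batches run one after another.
    const batchResults = await Promise.all(batch.map((item) => task(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Results come back in input order. Because `Promise.all` rejects on the first failure, a caller that needs partial results would have to switch to `Promise.allSettled`.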
3. Jest mock ordering causes test state bleed
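Established failure mode across AsymXray, LabelCheck, and ContentCommand: when mocks are cleared or reset after mock configuration in beforeEach, the test no longer starts from the state the setup intended. Strictly, jest.clearAllMocks() only clears recorded calls and results, while jest.resetAllMocks() also removes configured return values and implementations; in both cases the safe order is to clear first and configure second. Tests pass in isolation but fail when run together.
// Broken: the clear runs after configuration (with resetAllMocks, the return value set above is lost)
beforeEach(() => {
  mockSupabase.from.mockReturnValue(mockChain);
  jest.clearAllMocks(); // runs too late; this should be the first line
});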
// Fixed — clear first, then configure
beforeEach(() => {
  jest.clearAllMocks();
  mockSupabase.from.mockReturnValue(mockChain);
});
Supabase query chains require mocking the full .from().select().eq().single() chain — partial mocks return undefined at the first unmocked call. ContentCommand fixed 17 test files to reach 654 passing / 0 failing.
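The full-chain requirement can be illustrated with a plain stub in which every step returns the same object. This is a sketch under assumptions: `makeQueryChain`, `QueryChain`, and `QueryResult` are hypothetical names, and the `{ data, error }` result shape is assumed to match what the Supabase client returns.

```typescript
// Assumed result shape for a finished query.
interface QueryResult<T> {
  data: T | null;
  error: { message: string } | null;
}

interface QueryChain<T> {
  calls: string[];
  from(table: string): QueryChain<T>;
  select(columns: string): QueryChain<T>;
  eq(column: string, value: unknown): QueryChain<T>;
  single(): Promise<QueryResult<T>>;
}

// Hypothetical stub: every chain step returns the same object, so no call in
// .from().select().eq().single() can come back undefined.
function makeQueryChain<T>(result: QueryResult<T>): QueryChain<T> {
  const chain: QueryChain<T> = {
    calls: [],
    from(table) {
      chain.calls.push(`from:${table}`);
      return chain;
    },
    select(columns) {
      chain.calls.push(`select:${columns}`);
      return chain;
    },
    eq(column, value) {
      chain.calls.push(`eq:${column}=${String(value)}`);
      return chain;
    },
    single() {
      chain.calls.push("single");
      return Promise.resolve(result);
    },
  };
  return chain;
}
```

In a Jest suite the same idea is usually expressed with `jest.fn().mockReturnThis()` for the intermediate steps and `jest.fn().mockResolvedValue(result)` for the terminal one.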
4. CI failures from environment-specific configuration, not logic errors
Consistent across projects: CI breaks on things that work locally because the runner environment differs in specific, non-obvious ways.
Three concrete instances:
- Stride v2: GitHub Actions runners lack IPv6 connectivity. Database connections using hostnames that resolve to IPv6 fail silently. Fix: force IPv4 resolution explicitly in the connection string. [4]
- Stride v2: Jest 30 renamed testPathPattern to testPathPatterns (plural). CI used Jest 30; local used Jest 29. Tests ran locally, failed in CI with a configuration error. [5]
- AsymXray: the CI workflow referenced the wrong package name for the Supabase CLI, so the install step failed silently. [6]
When CI fails on something that passes locally, check the environment before checking the code.
5. Google OAuth revoke endpoint destroys all tokens, not just the target service
Observed in AsymXray: calling Google's token revocation endpoint when a user disconnects one Google service (e.g., Google Ads) revokes the OAuth tokens for every Google service connected under that user/app combination — Search Console, Google Business Profile, everything.
The correct behavior when disconnecting an individual service: delete the service's credentials from your database and let the token expire naturally. Only call revoke if the user is fully disconnecting all Google services.
// Wrong — breaks all other Google services
await oauth2Client.revokeToken(token.access_token);
// Correct — remove from DB, let token expire
await db.from('google_tokens')
  .delete()
  .eq('user_id', userId)
  .eq('service', serviceName);
6. Health score / metric calculation returning wrong value due to simple averaging
Observed twice in AsymXray (health score showed ~38–42 instead of correct ~73). Root cause both times: using average(allScores) instead of weighted aggregation that accounts for objective type. A local SEO account has different channel weights (35% paid, 45% organic, 20% site) than an e-commerce account — flat averaging produces a number that's wrong in a way that looks plausible.
The debugging approach that worked: add logging at each intermediate calculation step, not just the final output. The discrepancy between logged intermediate values and expected values pinpoints which weight or multiplier is wrong.
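A weighted aggregation with per-step logging might look like the sketch below. The local SEO weights come from the text above; the function name, the `Channel` type, and the optional `log` callback are hypothetical, and weight tables for other objective types are not shown because the source does not give them.

```typescript
type Channel = "paid" | "organic" | "site";

// Weights for a local SEO account, as stated in the text above.
const LOCAL_SEO_WEIGHTS: Record<Channel, number> = { paid: 0.35, organic: 0.45, site: 0.2 };

function weightedHealthScore(
  scores: Record<Channel, number>,
  weights: Record<Channel, number>,
  log: (message: string) => void = () => {}
): number {
  let total = 0;
  let weightSum = 0;
  for (const channel of Object.keys(weights) as Channel[]) {
    const contribution = scores[channel] * weights[channel];
    // Log every intermediate term so a wrong weight or multiplier is visible.
    log(`${channel}: score=${scores[channel]} weight=${weights[channel]} contribution=${contribution}`);
    total += contribution;
    weightSum += weights[channel];
  }
  // Normalise in case the weights do not sum exactly to 1.
  return total / weightSum;
}
```

Passing `console.log` as the third argument during an investigation, and omitting it afterwards, keeps the debug output temporary.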
7. UI component tests that assert implementation details break on library updates
Established failure mode in Stride v2: tests that check for specific CSS class names (e.g., shadcn/ui internal classes) or Radix UI component internals break whenever the component library updates, even when the user-visible behavior is unchanged. Stride v2 accumulated 13 such test files and deleted them all in one commit, going from failing tests to 131 suites / 2479 passing.
The rule: test what the user sees and can interact with, not how the component is styled internally. getByRole, getByText, and user-event interactions survive library upgrades. querySelector('.radix-accordion-trigger') does not.
8. Production Next.js server fails on placeholder environment variables; dev server does not
Observed in Eydn: the production Next.js build validates environment variables at startup more strictly than the dev server. Placeholder values like NEXT_PUBLIC_API_KEY=placeholder that work fine in next dev cause the production server to fail to start entirely.
This surfaces in CI when the build step passes but the server startup check fails. Fix: use real values (or properly structured dummy values that pass validation) in CI environment configuration.
What Works
Temporary, targeted debug logging at data transformation boundaries
The pattern that resolves data discrepancy bugs fastest: insert console.log (or structured log) at three points — raw API response, after parsing/transformation, at the UI data boundary — then remove all of it once the root cause is confirmed. Don't log everything; log the shape at each handoff.
AsymXray used this pattern to resolve PageSpeed, budget pacing, GBP reviews, and health score bugs across multiple separate investigations. The commit pattern is consistent: debug: add logging for X → root cause found → remove debug logging — X fix confirmed.
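"Log the shape at each handoff" can be sketched as a helper that prints keys and types rather than full payloads. Both function names and the stage labels are hypothetical; the three stages mirror the boundaries named above.

```typescript
// Hypothetical helper: summarise the shape of a value (keys and types, not contents).
function describeShape(value: unknown, depth = 2): unknown {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    return value.length === 0
      ? "array(0)"
      : [`array(${value.length})`, describeShape(value[0], depth - 1)];
  }
  if (typeof value === "object") {
    if (depth <= 0) return "object";
    const shape: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      shape[key] = describeShape(inner, depth - 1);
    }
    return shape;
  }
  return typeof value;
}

// One call per boundary: raw API response, parsed data, data handed to the UI.
function logBoundary(stage: "api" | "parsed" | "ui", value: unknown): void {
  console.log(`[debug:${stage}]`, JSON.stringify(describeShape(value)));
}
```

Because only types are printed, the temporary logging is less likely to leak tokens or customer data if it is committed by accident.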
Parallel query execution with Promise.all for sibling data fetches
Any time two or more queries don't depend on each other's results, run them in parallel. This is the fix for N+1 patterns and also for connection tests, health checks, and dashboard aggregations. In AsymXray, 9+ data source connection tests were parallelized; in Hazardos, commission summaries dropped from sequential to 5 parallel queries.
// Before
const assignments = await checkAssignments(userId);
const accountManager = await checkAccountManager(userId);
// After
const [assignments, accountManager] = await Promise.all([
  checkAssignments(userId),
  checkAccountManager(userId),
]);
Explicit column selection instead of select('*')
Replacing select('*') with explicit column lists in Supabase queries does two things: reduces bandwidth (measurable on wide tables), and prevents accidental exposure of sensitive fields added to a table later. In AsymXray this was a systematic refactor across multiple query sites.
It also makes N+1 patterns more visible — when you're forced to name the columns you need, you notice when you're fetching the same columns repeatedly in a loop.
Graceful test skipping when external resources are unavailable
Integration tests that require a live database or real credentials should skip cleanly when those resources aren't present, rather than failing with a connection error. In AsymXray, adding fallback environment variables and explicit skip logic brought the test suite from failing to 853 passing / 34 properly skipped.
const hasCredentials = process.env.DATABASE_URL && process.env.DATABASE_URL !== 'placeholder';
const itWithDb = hasCredentials ? it : it.skip;
itWithDb('validates tokens against live database', async () => { ... });
Structured logging (Pino) over console.* in production code
LabelCheck replaced 92+ console.log/error calls across 56 files with Pino structured logging. The practical benefit: log levels, JSON output parseable by log aggregators, and no accidental credential logging. console.log in library modules is particularly dangerous because it fires in production with no filtering.
Gotchas and Edge Cases
CSP blocks console.log output in browser debugging contexts
Observed in AsymXray PageSpeed debugging: Content Security Policy can block console output in certain browser contexts, making console.log silently do nothing. When you're getting no console output and you're sure the code is running, try alert() or write output directly to the DOM. This is rare but wastes significant time when it happens.
Google access token expires_at is not a reliable health indicator
Observed in AsymXray: Google access tokens expire after one hour regardless of what expires_at says, but they can be silently renewed using refresh_token. A token health check that reads expires_at will report healthy tokens that are actually expired. The reliable check: verify refresh_token is present and non-null. If it is, the token can be renewed; if it isn't, the user needs to re-authenticate.
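A health check built on that rule could be sketched as below. The `StoredGoogleToken` shape and the `tokenHealth` name are hypothetical; only the field names `refresh_token` and `expires_at` come from the text.

```typescript
// Hypothetical stored-token record.
interface StoredGoogleToken {
  access_token: string | null;
  refresh_token: string | null;
  expires_at: string | null; // kept for display only; not used for health
}

type TokenHealth = "renewable" | "needs_reauth";

function tokenHealth(token: StoredGoogleToken): TokenHealth {
  // An expired access token is fine as long as a refresh token exists.
  const hasRefreshToken =
    typeof token.refresh_token === "string" && token.refresh_token.trim() !== "";
  return hasRefreshToken ? "renewable" : "needs_reauth";
}
```

The check deliberately ignores `expires_at`, so a stale access token with a valid refresh token is not reported as broken.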
Gravity Forms basic listing endpoint omits is_active
Observed in AsymXray: the Gravity Forms API /forms endpoint does not include is_active in the basic listing response — it's only present in the individual form detail endpoint. Code that reads form.is_active from a listing response gets undefined, which coerces to falsy, marking all forms as inactive. Fix: null-check and default to true, or fetch individual form details when is_active is needed.
Additionally: WordPress security plugins (WordFence and similar) can block the /forms listing endpoint entirely while allowing individual form access. Build a fallback that discovers forms from the entries endpoint when /forms returns 404.
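Both workarounds can be sketched in a few lines. The names are hypothetical, and two details are assumptions rather than facts from the source: that `is_active` may arrive as a boolean, a number, or a string such as "1", and that each entry carries a `form_id` field.

```typescript
// Assumed response shapes; only the fields used here are declared.
interface RawForm {
  id: string | number;
  is_active?: string | number | boolean | null;
}

interface RawEntry {
  form_id: string | number;
}

// Treat a missing is_active as "active": the listing endpoint simply omits it.
function isFormActive(form: RawForm): boolean {
  const flag = form.is_active;
  if (flag === undefined || flag === null) return true;
  return flag === true || flag === 1 || flag === "1";
}

// Fallback when the /forms listing is blocked: derive distinct form ids from entries.
function formIdsFromEntries(entries: readonly RawEntry[]): string[] {
  return Array.from(new Set(entries.map((entry) => String(entry.form_id))));
}
```

Forms discovered through the fallback have no title or status, so callers still need the individual form endpoint for details.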
Object.defineProperty fails on process.env in test environments
Observed in AsymXray: Object.defineProperty(process.env, 'NODE_ENV', { value: 'test' }) throws in Jest because process.env properties have non-configurable descriptors. Fix: use direct assignment (process.env.NODE_ENV = 'test') or jest.replaceProperty. This broke 5 integration tests before the fix.
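The direct-assignment fix is easier to keep tidy with a helper that restores the previous values afterwards. `withEnv` is a hypothetical helper; it takes the environment object as a parameter so the sketch stays self-contained, and in a test you would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Assign or delete one variable using plain property operations.
function setOrDelete(env: Env, key: string, value: string | undefined): void {
  if (value === undefined) {
    delete env[key];
  } else {
    env[key] = value;
  }
}

// Temporarily override variables by direct assignment, then restore them.
// Synchronous only: for an async callback the restore would run too early.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
  const previous: Env = {};
  for (const key of Object.keys(overrides)) {
    previous[key] = env[key];
    setOrDelete(env, key, overrides[key]);
  }
  try {
    return fn();
  } finally {
    for (const key of Object.keys(previous)) {
      setOrDelete(env, key, previous[key]);
    }
  }
}
```

Usage would look like `withEnv(process.env, { NODE_ENV: 'test' }, () => runCheck())`, where `runCheck` stands for whatever the test exercises.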
Google API client is not thread-safe
Observed in Sieve: the Google API client requires sequential pre-fetching of credentials and configuration before any threaded operations begin. Initializing it inside a thread or calling it concurrently from multiple threads causes segfaults on Windows (encoding-related) and undefined behavior elsewhere. Pre-fetch everything you need from the Google client before spawning threads.
Cache delete-before-insert is required for immediate consistency
Observed in AsymXray: a cache layer that inserts new entries without first deleting the existing entry will serve stale data until TTL expiry, even if the underlying data changed. This is particularly visible when objective configurations change — the old cached brief persists. Fix: always DELETE the cache entry before INSERT, not after.
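The effect can be reproduced with an in-memory model. This sketch assumes a store whose insert does not overwrite an existing key (comparable to an insert that is ignored on conflict); the source does not describe the actual cache table, so `InsertOnlyCache` and `refreshEntry` are purely illustrative.

```typescript
// Assumed store semantics (illustration only): insert() never overwrites.
class InsertOnlyCache<V> {
  private rows = new Map<string, V>();

  insert(key: string, value: V): boolean {
    if (this.rows.has(key)) return false; // existing entry wins, new value is dropped
    this.rows.set(key, value);
    return true;
  }

  delete(key: string): void {
    this.rows.delete(key);
  }

  get(key: string): V | undefined {
    return this.rows.get(key);
  }
}

// Delete first, then insert, so readers see the new value immediately.
function refreshEntry<V>(cache: InsertOnlyCache<V>, key: string, value: V): void {
  cache.delete(key);
  cache.insert(key, value);
}
```

In a real database the two statements would normally run in one transaction so that readers never observe the gap between delete and insert.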
brace-expansion@5 override breaks ESLint's minimatch
Observed in Eydn: adding a brace-expansion@5 override in package.json to resolve a security audit finding broke ESLint, which depends on minimatch expecting the v1/v2 expand() API. ESLint's lint step crashed in CI with a function signature error. Removing the override restored CI. Moderate-severity vulnerabilities in client-side export tooling (exceljs → archiver) don't pose runtime risk and don't justify breaking the lint pipeline.
date-fns timezone handling differs between test and runtime environments
Observed in AsymXray: DateRangeSelector tests failed not because the component was wrong, but because date-fns date construction behaves differently depending on the system timezone of the test runner vs. the browser runtime. Test assertions needed adjustment to account for this, not the component logic. When date tests fail in CI but pass locally, check whether the CI runner's timezone matches your local environment.
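One way to keep date assertions independent of the runner's timezone is to build and format test dates from UTC fields only. The sketch uses the built-in `Date` rather than date-fns, and the helper names are hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Build a calendar day at midnight UTC so the value is identical on every machine.
function utcDay(year: number, month: number, day: number): Date {
  return new Date(Date.UTC(year, month - 1, day));
}

// Format using UTC only; toISOString() never consults the local timezone.
function toIsoDay(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Inclusive range of `days` days ending on `end`, computed with UTC arithmetic.
function lastNDays(end: Date, days: number): { from: string; to: string } {
  const start = new Date(end.getTime() - (days - 1) * MS_PER_DAY);
  return { from: toIsoDay(start), to: toIsoDay(end) };
}
```

Expectations written against these strings do not shift when CI runs in UTC and a developer laptop does not. Setting the `TZ` environment variable for the test run is another common way to pin the runner's timezone.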
Where Docs Disagree With Practice
Google OAuth revoke endpoint scope
Docs describe revokeToken() as revoking a specific token. In practice (observed in AsymXray), it revokes all tokens issued to the application for that user — across every Google service the user has connected. The docs don't warn that this is a user-level revocation, not a token-level one.
Gravity Forms API field availability
Docs for the Gravity Forms REST API imply is_active is a standard form field. In practice, it's absent from the listing endpoint (/forms) and only present in individual form detail responses. Any code written against the docs that reads is_active from a listing will silently get undefined.
Next.js production vs. dev server environment variable validation
Next.js documentation doesn't clearly distinguish that the production server (next start) validates environment variables more strictly at startup than the dev server (next dev). Placeholder values that allow next dev to run will cause next start to fail. This matters for CI pipelines that test server startup.
API keys in URL query parameters
Several Google API docs show API keys as query parameters (?key=YOUR_KEY). In practice, this leaks credentials into server access logs, browser history, and Referer headers. The correct approach — using X-Goog-Api-Key header — is documented but not emphasized. Observed as a security finding in Eydn.
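Moving the key into the header can be sketched as a small request builder. `prepareGoogleRequest` and `PreparedRequest` are hypothetical names; the `X-Goog-Api-Key` header is the one named above.

```typescript
interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the URL from non-secret parameters and carry the API key in a header.
function prepareGoogleRequest(
  endpoint: string,
  apiKey: string,
  params: Record<string, string> = {}
): PreparedRequest {
  const url = new URL(endpoint);
  for (const [paramName, paramValue] of Object.entries(params)) {
    url.searchParams.set(paramName, paramValue);
  }
  // Defensive: drop a key that a caller may have left in the query string.
  url.searchParams.delete("key");
  return { url: url.toString(), headers: { "X-Goog-Api-Key": apiKey } };
}
```

The result can be passed to `fetch(prepared.url, { headers: prepared.headers })`, so the key stays out of the URL and therefore out of access logs and Referer headers.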
xlsx package security posture
The xlsx package is widely documented as the standard Node.js Excel library. It contains prototype pollution vulnerabilities that are not prominently disclosed in its npm listing. In AsymXray, it required replacement with exceljs. Check the npm audit output before adopting xlsx in any new project.
Tool and Version Notes
- Jest 30: renamed testPathPattern (singular) to testPathPatterns (plural) in configuration. Projects on Jest 29 locally but Jest 30 in CI will see configuration errors only in CI. Observed in Stride v2. [5]
- Next.js Edge Runtime: the crypto module API is incompatible with Edge Runtime. Any code that uses crypto must include a conditional check for the runtime environment before calling Edge-incompatible APIs. Observed in AsymXray E2E test startup. [26]
- Pino (structured logging): drop-in replacement for console.* in Node.js services. LabelCheck migrated 92+ call sites across 56 files. JSON output is parseable by Datadog, Logtail, and similar aggregators without additional configuration.
- Playwright CI: requires explicit webServer configuration and environment variables to be present before the test runner starts. Missing env vars cause exit code 1 failures across all browsers simultaneously; the symptom looks like a Playwright bug but is almost always a missing variable. Observed in Stride v2. [27]
- shadcn/ui + Radix UI: component internals (CSS class names, data attributes) change between minor versions. Any test suite asserting on these will break on upgrade. Stride v2 learned this the hard way with 13 deleted test files.
- exceljs: replacement for xlsx when prototype pollution is a concern. API is different enough to require a rewrite of export logic, not a drop-in swap.
Related Topics
- Jest Configuration
- Playwright E2E
- Supabase Query Patterns
- N+1 Query Optimization
- Google OAuth
- Environment Variables
- GitHub Actions
- Dependency Auditing
- Structured Logging
Sources
Synthesized from 75 fragments: git commits across AsymXray (majority, ~45 commits), Stride v2 (~15 commits), Hazardos (~6 commits), LabelCheck (~4 commits), ContentCommand (~2 commits), Eydn (~3 commits), Sieve (~1 commit). Client mentions include Adava Care and BluePoint ATM as AsymXray client accounts, not separate codebases. No external sources ingested yet for this topic.
Sources
1. Asymxray Eac20E9 Fix Pagespeed Data Parsing And Clean Up Debug Code, Asymxray A799D04 Pagespeed Data Not Displaying Extract Datadata F, Asymxray 9935C03 Pulse Page Data Loading Issues Unwrap Api Respon
2. Stride V2 6C07C24 Fix N1 Query Issues In Dashboard And Student Stats, Hazardos 7Ea762F Optimize Remaining N1 And Inefficient Query Patter, Asymxray Ca864F6 Implement Database And Cache Optimizations
3. Asymxray 6E1A9Cc Resolve Login Tracker Test Failures, Labelcheck 9531D45 Fix Jest Test Environment And Api Route Tests Ph, Contentcommand 1E72967 Fix All Failing Tests 654 Passing 0 Failing
4. Stride V2 Df9B0Ea Force Ipv4 Resolution For Database Connection
5. Stride V2 640Fb31 Fix Ci Failures Security Lint Jest 30 Compatibilit
6. Asymxray 281056A Correct Supabase Cli Package Name In Ci Workflow
7. Asymxray 70A0Abd Dont Revoke Google Tokens When Disconnecting Indiv
8. Asymxray 7540Ead Fix Health Score Calculation Bug, Asymxray 4C58035 Remove Debug Logging Health Score Calculation No
9. Stride V2 67E58A2 Remove Broken Ui Component Tests, Stride V2 36F3Cf1 Fix Flaky And Brittle Component Tests
10. Eydn App 28278E6 Fix 3 Security Vulnerabilities Ci Server Startup
11. Asymxray 53Cb205 Add Comprehensive Pagespeed Debug Logging Debug, Asymxray 4C58035 Remove Debug Logging Health Score Calculation No, Asymxray A47F06C Remove Debug Logging Budget Calculation Fix Conf
12. Asymxray Ca864F6 Implement Database And Cache Optimizations, Hazardos 7Ea762F Optimize Remaining N1 And Inefficient Query Patter
13. Asymxray Cbdf028 Replace Select With Explicit Column Lists
14. Asymxray 40A1Ca4 Resolve Integration Test Failures For Validate Tok, Asymxray 993E6Cb Resolve E2E Test Failures And Authentication Issue
15. Labelcheck D4C3170 Migrate To Structured Logging With Pino, Asymxray D357859 Add Structured Logging To Lib Modules
16. Asymxray Ffb33Bb Add Visual Debug Alerts For Pagespeed Refresh Re
17. Asymxray 5B82107 Improve Testing Diagnostics Accuracy And Remove Fa
18. Asymxray 63390E7 Handle Undefined Isactive In Gravity Forms Api Res, Asymxray 99Aec6C Handle Gravity Forms Forms 404 By Falling Back To
19. Asymxray B01B923 Resolve Integration Test Objectdefineproperty Erro
20. Sieve Cd7E52C Fix Windows Encoding Segfault In Threaded Content
21. Asymxray 81Ffed5 Improve Cache Handling And Add Debug Logging For A
22. Eydn App 3B28605 Fix Ci Lint Crash Remove Brace Expansion Override
23. Asymxray 747Cf75 Correct Daterangeselector Test Expectations
24. Asymxray 63390E7 Handle Undefined Isactive In Gravity Forms Api Res
25. Asymxray Cf5Ae74 Security Replace Vulnerable Xlsx With Secure Excel
26. Asymxray 4F24Ffb Resolve E2E Test Startup Issues
27. Stride V2 3389343 Resolve E2E Test Failures And Improve Test Reliabi