Postgres + pgvector

quick-win-3-50.md
- wiki/engineering/postgres-pgvector/labelcheck-4e301f8-implement-major-food-allergen-detection-database-f.md
- wiki/engineering/postgres-pgvector/eydn-app-573a4f8-add-missing-database-indexes-for-common-query-patt.md
- wiki/engineering/postgres-pgvector/orbitabm-0959f5d-prevent-duplicate-email-sends-to-the-same-contact.md
contradicting_sources: []
contradicting_count: 0
first_seen: "unknown"
last_updated: "2025-01-31"
hypothesis: false
rate_of_change: high
web_monitoring_frequency: weekly
fragment_count: 28
evolution_timeline: []
evolution_start: null
superseded_by: null
superseded_date: null
deprecation_notice: null
tags: []

Summary

Partial unique indexes with WHERE clauses silently break Supabase .upsert() — the onConflict target must be a non-partial index, which is safe for nullable columns because Postgres treats NULLs as distinct. PostgreSQL functions in Supabase need SECURITY DEFINER plus SET search_path = public to clear the "mutable search_path" security warning; without the explicit search_path, the function is flagged even if it otherwise works. The <=> cosine distance operator in pgvector requires the extensions schema to be in search_path, or queries silently fail to resolve the operator. For temporal vector search, weighting recency at 80% and similarity at 20% outperforms pure similarity scoring — but this requires a slower search path, so route by query intent rather than applying it universally.

TL;DR

What we've learned
- Supabase .upsert() with onConflict requires a plain (non-partial) unique index — partial indexes with WHERE clauses are silently ignored or error at runtime.
- Every PostgreSQL function in a Supabase project needs SECURITY DEFINER + SET search_path = public or the Supabase dashboard flags it as a security warning.
- Missing indexes are the single highest-leverage performance fix: three targeted indexes in LabelCheck moved queries from 80–450ms to 2–12ms (97% improvement each).
- pgvector's <=> operator requires extensions in search_path; forgetting this causes confusing operator-not-found errors rather than a clear pgvector error.
- Two-layer duplicate prevention (application check + database constraint) is the only reliable pattern — database constraints alone don't prevent race conditions, and application checks alone don't survive concurrent writes.

External insights

No external sources ingested yet for this topic.

Common Failure Modes

Supabase `.upsert()` silently fails with partial unique indexes

Supabase's .upsert({ onConflict: 'col_a, col_b' }) resolves the conflict target by looking for a unique index on exactly those columns. If the only matching index has a WHERE clause (e.g., WHERE deleted_at IS NULL), Postgres can't use it as the conflict target and the upsert either throws or inserts a duplicate instead of updating.

The fix is to drop the partial index and replace it with a non-partial unique index. This is safe for soft-delete columns because Postgres treats NULLs as distinct — two rows with deleted_at IS NULL don't conflict on a nullable unique column.

-- BROKEN: Supabase upsert can't target this
CREATE UNIQUE INDEX platforms_name_org_active_idx
  ON pe_platforms (name, org_id)
  WHERE deleted_at IS NULL;

-- FIXED: non-partial index; NULLs are distinct so soft-delete rows don't collide
CREATE UNIQUE INDEX platforms_name_org_idx
  ON pe_platforms (name, org_id);

Observed in Orbit ABM when the platforms migration started producing duplicate rows after a re-ingest.
^[1]

PostgreSQL function security warnings: "mutable search_path"

Supabase flags any function that doesn't pin its search_path as a security risk. The warning appears in the Supabase dashboard under Database → Functions and blocks some compliance checks. The root cause is that without an explicit search_path, a malicious or misconfigured schema earlier in the path could shadow public functions.

Fix: add SECURITY DEFINER and SET search_path = public to every function definition. For read-only functions, STABLE is the correct volatility; for pure computation with no DB reads, IMMUTABLE.

CREATE OR REPLACE FUNCTION public.my_function(arg text)
RETURNS text
LANGUAGE plpgsql
STABLE
SECURITY DEFINER
SET search_path = public
AS $$
BEGIN
  RETURN arg;
END;
$$;

Consistent across projects: hit this in AsymXray across at least 4 separate migration commits, each fixing a different batch of functions.
^[2]

pgvector `<=>` operator not found

When calling a pgvector similarity search function, queries fail with an operator resolution error if the extensions schema isn't in search_path. The <=> cosine distance operator lives in the extensions schema (where pgvector is installed in Supabase), not in public.

-- Fails if extensions not in search_path:
SELECT id, embedding <=> query_embedding AS distance
FROM documents
ORDER BY distance;

-- Fix: set search_path explicitly in the function
SET search_path = public, extensions;

Observed in ClientBrain when wiring the initial vector search function.
^[3]

Complex vector search queries hang without a timeout

Without a statement timeout, a slow or pathological vector search query (large embedding space, no HNSW index, high ef_search) will hold a connection open indefinitely. In ClientBrain this manifested as the search UI appearing to load forever with no error surfaced to the user.

Fix: set statement_timeout at the query level or connection level before executing vector search.

SET statement_timeout = '30s';
SELECT * FROM search_documents($1, $2, $3);

Observed in ClientBrain; 30 seconds was the chosen threshold for complex multi-source queries.
^[4]

Missing indexes on high-frequency columns

The most common performance failure across all projects: tables grow past a few thousand rows, sequential scans start showing up, and queries that were fast in development become slow in production. The pattern is always the same — the column is queried on every authenticated request, but no index was added at schema creation time.

Measured results from LabelCheck after adding 14 indexes across 8 tables:
- User history query: 450ms → 12ms (97% improvement) after (user_id, created_at) composite index
- Ingredient lookup: 120ms → 3–4ms (97% improvement) after LOWER(name) functional index
- Rate limit check: 80ms → 2–3ms (97% improvement) after (user_id, month) composite index

In Eydn, weddings.user_id was hit on every authenticated request with no index — added as part of a batch of 8 missing indexes.

^[5]

Migration fails referencing a table that doesn't exist

Observed in AsymXray: a migration referenced migration_log which had been removed in an earlier migration. The migration runner halted with a relation-not-found error, blocking all subsequent migrations.

Fix: before adding a constraint or index in a migration, verify the referenced table exists in the current schema state. When cleaning up, drop the reference before dropping the table.
^[6]

Upsert fails when duplicates already exist in the table

Adding a UNIQUE constraint to a column that already has duplicate values causes the migration to fail. In Orbit ABM, the pe_platforms table had accumulated duplicates before the deduplication constraint was introduced.

Fix: deduplicate before adding the constraint, in the same migration transaction.

-- Step 1: remove duplicates, keeping the oldest row
DELETE FROM pe_platforms
WHERE id NOT IN (
  SELECT MIN(id)
  FROM pe_platforms
  GROUP BY name, org_id
);

-- Step 2: now safe to add the constraint
ALTER TABLE pe_platforms
  ADD CONSTRAINT platforms_name_org_unique UNIQUE (name, org_id);

^[7]

What Works

Two-layer duplicate prevention

Application-level duplicate checks (query before insert) are vulnerable to race conditions under concurrent writes. Database-level constraints alone don't give you useful error messages or the ability to merge/flag duplicates. The pattern that holds: check at the application layer for UX (show "this already exists"), enforce at the database layer as a hard constraint.

In Orbit ABM, duplicate prevention uses domain matching for companies, email matching for contacts, and normalized name matching for markets/verticals. The database constraint is the safety net, not the primary check.
^[8]

Recency-aware vector search scoring for temporal queries

Pure cosine similarity returns the most semantically similar documents regardless of age. For queries with temporal intent ("what did we discuss last week", "recent decisions"), this produces stale results. In ClientBrain, routing temporal queries through a recency-weighted scorer (80% recency, 20% similarity) produced meaningfully better results than pure similarity.

The routing decision is made by running intent extraction (Claude Haiku with tool_use) in parallel with the embedding call — zero added latency since both run concurrently. Temporal intent → search_documents_v2 (recency-weighted, slower). Factual/relevance intent → search_documents_v1 (pure similarity, faster).

Single-project finding (ClientBrain), but the pattern is generalizable to any corpus with strong temporal structure.
^[9]

RLS role hierarchy for multi-tenant SaaS

Consistent pattern across AsymXray and LabelCheck: structure RLS policies around 4 roles with explicit grants at each level.

Role	Access
`admin`	Full read/write on all rows
`manager`	Broad read, limited write (own org)
`account_manager`	Read/write on assigned clients only
`service_role`	Bypasses RLS entirely (for API routes)

The service_role bypass is intentional and documented — it's how server-side API routes operate without impersonating a user. Don't fight it; instead, ensure all user-facing queries go through the authenticated client, not the service role.
^[10]

Functional and GIN indexes for non-standard query patterns

Three index types that pay off in specific domains:

LOWER() functional index for case-insensitive text matching — avoids full table scans on WHERE LOWER(name) = LOWER($1). Used in LabelCheck for ingredient lookup (120ms → 3ms).

GIN index on array columns for @> containment queries. Used in LabelCheck for allergen derivative matching against ingredient arrays.

Composite index ordered by selectivity — put the most selective column first. (user_id, created_at) outperforms (created_at, user_id) when filtering by user is the primary predicate.

^[11]

Hierarchical date fallback for incomplete records

When ingesting external data (ClickUp tasks, calendar events, CRM records), date fields are often partially populated. In ClientBrain, tasks were appearing on wrong dates because created_at was used as a fallback when due_date was null, but created_at reflects ingestion time, not task relevance.

Fix: use a priority hierarchy in SQL with COALESCE:

COALESCE(date_closed, due_date, date_created) AS source_date

This keeps tasks anchored to their most meaningful date. Single-project finding (ClientBrain), but applicable to any ingestion pipeline with heterogeneous date fields.
^[12]

Gotchas and Edge Cases

NULLs are distinct in unique indexes. Two rows with NULL in a unique column don't conflict — Postgres considers them distinct. This is the property that makes non-partial unique indexes safe for soft-delete patterns. It's also a footgun if you expect a unique constraint to prevent multiple NULLs.
^[1]

service_role bypasses RLS by design. Any query made with the Supabase service role key ignores all RLS policies. This is documented but easy to forget when debugging access issues — if a query works with service role but fails with the anon/authenticated role, the problem is RLS, not the query.
^[10]

Filter-only queries shouldn't invoke vector search. When a user applies filters but enters no text query, running a vector search with an empty or null embedding produces nonsense rankings. In ClientBrain, filter-only requests (browseDocuments) return rows sorted by date with no embedding call. The code path must explicitly branch on whether a text query is present.
^[13]

IMMUTABLE is stricter than it looks. A function marked IMMUTABLE must return the same result for the same inputs across all possible database states — no reads, no side effects, no calls to now() or random(). Postgres will allow you to declare a function IMMUTABLE even if it reads from a table; it won't enforce correctness. Mislabeling a function IMMUTABLE when it should be STABLE causes query planner bugs that are hard to reproduce.
^[14]

Deduplication before constraint addition must be in the same transaction. If you run deduplication in one migration and add the constraint in the next, a concurrent write between migrations can re-introduce duplicates before the constraint lands. Wrap both steps in a single migration.
^[7]

RFC Message-ID is the correct deduplication key for email ingestion. Email threads can be re-ingested from multiple sources (sent folder, received folder, shared mailbox). Using subject + sender + timestamp produces false duplicates for forwarded messages. RFC Message-ID is globally unique per message.
^[15]

Where Docs Disagree With Practice

Supabase docs show partial indexes in upsert examples. The Supabase documentation includes examples of onConflict targeting columns that happen to have partial unique indexes. In practice, if the only unique index on those columns is partial, the upsert fails or silently inserts a duplicate. The conflict target must resolve to a non-partial index. Docs don't call this out.
^[1]

SECURITY DEFINER alone is not sufficient for Supabase security compliance. Postgres documentation describes SECURITY DEFINER as the mechanism for elevating function privileges. Supabase's security scanner additionally requires SET search_path = public — a function with SECURITY DEFINER but no pinned search_path still triggers the "mutable search_path" warning. The two settings are independent requirements.
^[16]

pgvector operator resolution isn't automatic after CREATE EXTENSION. The pgvector docs imply that after installing the extension, operators like <=> are available in queries. In Supabase, the extension is installed in the extensions schema, not public. Queries in functions that don't include extensions in search_path fail to resolve <=> even though the extension is installed.
^[3]

Tool and Version Notes

Supabase + pgvector: The <=> operator lives in the extensions schema in Supabase-hosted Postgres. Self-hosted Postgres with pgvector installed in public doesn't have this issue. Behavior diverges between Supabase and self-hosted.

Supabase .upsert() onConflict: Tested against Supabase JS client v2.x. The partial index limitation is a Postgres constraint, not a Supabase client bug — any client sending an ON CONFLICT (col) clause will hit this if the index is partial.

PostgreSQL function volatility categories: IMMUTABLE > STABLE > VOLATILE in terms of optimization aggressiveness. Supabase's security scanner treats all three as valid but requires search_path pinning regardless of volatility. Use STABLE for functions that read from tables; IMMUTABLE only for pure computation.

GIN indexes: Available in Postgres 9.4+; no version concerns for current Supabase-hosted instances. Required for @> array containment queries — B-tree indexes don't support array operators.

statement_timeout: Session-level setting, not persistent. Must be set per-connection or per-query. In Supabase edge functions, set it at the start of the request handler, not in a migration.

Sources

Synthesized from 28 fragments: 28 git commits across AsymXray (8 commits), ClientBrain (9 commits), Orbit ABM (5 commits), LabelCheck (4 commits), Eydn (2 commits). No external sources ingested. Date range: unknown to unknown.

Summary

TL;DR

Common Failure Modes

Supabase .upsert() silently fails with partial unique indexes

PostgreSQL function security warnings: "mutable search_path"

pgvector <=> operator not found

Complex vector search queries hang without a timeout

Missing indexes on high-frequency columns

Migration fails referencing a table that doesn't exist

Upsert fails when duplicates already exist in the table

What Works

Two-layer duplicate prevention

Recency-aware vector search scoring for temporal queries

RLS role hierarchy for multi-tenant SaaS

Functional and GIN indexes for non-standard query patterns

Hierarchical date fallback for incomplete records

Gotchas and Edge Cases

Where Docs Disagree With Practice

Tool and Version Notes

Related Topics

Sources

Sources

Fragments (28)

Supabase `.upsert()` silently fails with partial unique indexes

pgvector `<=>` operator not found