The wedding app uses Claude as its AI backend to answer user questions about wedding planning. Because AI responses are user-facing and directly affect product quality, a structured weekly review process was established during beta to catch unhelpful, inappropriate, or off-brand responses before and after public launch.
This process covers three areas: systematic question testing, edge case validation, and iterative tone training.
AI features carry reputational risk. A single ridiculous or offensive response can undermine user trust and generate negative word-of-mouth — especially in a consumer product targeting an emotionally significant life event. The review process is designed to surface problems early, during beta testing, before they reach a broad audience.
"Anytime you have an AI feature, you want to really beat on it and make sure it's not going to do anything ridiculous." — Mark Hope
Each week, pull the 30 most common questions submitted by users and manually review the AI's responses, evaluating each for helpfulness, accuracy, and tone.
Flag any responses that are notably good (to reinforce) or notably bad (to correct).
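The flag-and-log step above could be tracked with a lightweight review log. A minimal sketch, assuming nothing about the app's actual tooling; the `ReviewEntry` structure and verdict labels are illustrative:

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class ReviewEntry:
    question: str
    response: str
    verdict: str   # "good" (reinforce), "bad" (correct), or "ok"
    note: str = ""

@dataclass
class WeeklyReviewLog:
    entries: list = field(default_factory=list)

    def flag(self, question, response, verdict, note=""):
        """Record one reviewed AI response with its verdict."""
        self.entries.append(ReviewEntry(question, response, verdict, note))

    def summary(self):
        """Count verdicts so the weekly review has a quick health check."""
        counts = Counter(e.verdict for e in self.entries)
        return {
            "reviewed": len(self.entries),
            "good": counts["good"],
            "bad": counts["bad"],
        }
```

Only the "good" and "bad" entries would need to be shared for prompt updates; the summary gives a week-over-week quality signal.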
Test the AI against non-standard or unusual wedding scenarios that fall outside the typical American wedding template, such as non-Western traditions, religious ceremonies, and other non-traditional formats.
Goal: Ensure the AI responds gracefully — neither refusing to engage nor giving absurd advice — when it encounters requests outside its training emphasis.
Future consideration: If enough edge case requests accumulate around a specific category (e.g., Asian wedding traditions, religious ceremonies), that category may be promoted to a first-class feature with dedicated prompting.
During Phase 1 QA, a low-contrast dark mode bug was identified (tan text on white background in the AI chat interface). While this is a UI issue rather than an AI issue, it affects the perceived quality of AI responses. Mark confirmed dark mode will not appear on the landing page, but it should be resolved in the app itself.
Claude's behavior is controlled in part by a system prompt file in the codebase that specifies its tone and response guidelines.
Process for refinement:
1. Karly tests the AI and logs responses she finds unhelpful, off-tone, or unexpectedly good
2. She shares flagged examples with Mark
3. Mark updates the system prompt / training file accordingly
This is an ongoing loop — not a one-time setup. The expectation is that the AI's quality improves iteratively through the beta period and beyond.
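The loop above works because the prompt lives in a versioned file: Karly's flagged examples become edits to that file, and the next release picks them up automatically. A minimal sketch of how the app side might assemble a request, assuming the Anthropic Messages API; the file path, model name, and helper names are illustrative, not the app's actual code:

```python
from pathlib import Path

def load_system_prompt(path):
    """Read the current tone/behavior instructions from the versioned
    prompt file that Mark updates after each review cycle."""
    return Path(path).read_text().strip()

def build_request(system_prompt, question, model="claude-model-placeholder"):
    """Assemble the payload the app would pass to the Messages API
    (e.g., anthropic.Anthropic().messages.create(**payload))."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": [{"role": "user", "content": question}],
    }
```

Because the system prompt is read at request time rather than hard-coded, a prompt-file update changes AI behavior without touching application logic.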
| Task | Owner |
|---|---|
| Test AI with 30 common questions + edge cases | Karly Oykhman |
| Log and share flagged responses (good and bad) | Karly Oykhman |
| Update Claude system prompt / tone training file | Mark Hope |
| Weekly review cadence during beta | Both |
| Phase | AI QA Activity |
|---|---|
| Phase 1 — Internal QA | Karly stress-tests AI with diverse and edge case questions; flags ridiculous responses to Mark |
| Phase 2 — Soft Launch / Beta | Weekly review of top-30 questions; edge case testing continues; tone training iterations based on real user queries |
| Phase 3 — Pre-Launch / Hard Launch | Ongoing monitoring; AI behavior should be stable and well-tuned by this point |