wiki/knowledge/ai-tools/wedding-app-claude-ai-quality-review.md Layer 2 article 656 words Updated: 2026-04-05
Tags: ai-feature, claude, qa, wedding-app, product

Wedding App Claude AI Quality Review Process

Overview

The wedding app uses Claude as its AI backend to answer user questions about wedding planning. Because AI responses are user-facing and directly affect product quality, a structured weekly review process was established during the [1] to catch unhelpful, inappropriate, or off-brand responses before and after public launch.

This process covers three areas: systematic question testing, edge case validation, and iterative tone training.


Why This Matters

AI features carry reputational risk. A single ridiculous or offensive response can undermine user trust and generate negative word-of-mouth — especially in a consumer product targeting an emotionally significant life event. The review process is designed to surface problems early, during beta testing, before they reach a broad audience.

"Anytime you have an AI feature, you want to really beat on it and make sure it's not going to do anything ridiculous." — Mark Hope


Weekly Review Protocol

1. Top-30 Question Audit

Each week, pull the 30 most common questions submitted by users and manually review the AI's responses. Evaluate each response for helpfulness, appropriateness, and alignment with the product's tone and brand.

Flag any responses that are notably good (to reinforce) or notably bad (to correct).
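The pull itself can be scripted so the reviewer starts from a ready-made sheet. A minimal sketch, assuming question logs are available as plain strings; the function names, log format, and fields here are hypothetical, not part of the app's actual codebase:

```python
from collections import Counter

def top_questions(question_log, n=30):
    """Return the n most frequently asked questions from a raw log.

    question_log: iterable of question strings (hypothetical log format).
    Questions are normalized so casing/whitespace variants count together.
    """
    counts = Counter(q.strip().lower() for q in question_log)
    return counts.most_common(n)

def build_review_sheet(top, responses):
    """Pair each top question with its AI response and an empty flag
    field for the manual reviewer to fill in ('good', 'bad', or '')."""
    return [
        {"question": q, "count": c, "response": responses.get(q, ""), "flag": ""}
        for q, c in top
    ]

# Toy example: two variants of the same question collapse into one entry.
log = ["How do I pick a venue?", "how do i pick a venue?", "What about flowers?"]
top = top_questions(log, n=2)
sheet = build_review_sheet(top, {"how do i pick a venue?": "Start with guest count..."})
```

The normalization step matters in practice: without it, trivially different phrasings of the same question would fragment the frequency counts and skew which questions make the top 30.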

2. Edge Case Testing

Test the AI against non-standard or unusual wedding scenarios that fall outside the typical American wedding template, such as culturally specific traditions or religious ceremonies.

Goal: Ensure the AI responds gracefully — neither refusing to engage nor giving absurd advice — when it encounters requests outside its training emphasis.

Future consideration: If enough edge case requests accumulate around a specific category (e.g., Asian wedding traditions, religious ceremonies), that category may be promoted to a first-class feature with dedicated prompting.
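The promotion rule above can be made concrete with a simple tally. A hypothetical sketch (the threshold, field names, and category labels are illustrative, not decided product policy):

```python
from collections import Counter

PROMOTION_THRESHOLD = 25  # hypothetical: edge-case requests needed to justify a feature

def categorize_edge_cases(flagged_requests):
    """Tally flagged edge-case requests by reviewer-assigned category.

    flagged_requests: list of dicts like {"question": ..., "category": ...}
    (an assumed log shape, filled in during the weekly review).
    """
    return Counter(r["category"] for r in flagged_requests)

def categories_to_promote(tally, threshold=PROMOTION_THRESHOLD):
    """Return categories with enough demand for dedicated prompting."""
    return [cat for cat, n in tally.items() if n >= threshold]

# Toy example: 30 requests in one category crosses the threshold.
requests = [{"question": "Tea ceremony timing?", "category": "asian-traditions"}] * 30
tally = categorize_edge_cases(requests)
```

Keeping the tally per category rather than per question is the point: individual edge-case questions are rare by definition, but a category of them can still be common enough to warrant first-class support.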

3. Dark Mode / UI Interaction Note

During Phase 1 QA, a low-contrast dark mode bug was identified (tan text on white background in the AI chat interface). While this is a UI issue rather than an AI issue, it affects the perceived quality of AI responses. Mark confirmed dark mode will not appear on the landing page, but it should be resolved in the app itself.


Tone Training

Claude's behavior is controlled in part by a system prompt file in the codebase that specifies the assistant's tone and response behavior.

Process for refinement:
1. Karly tests the AI and logs responses she finds unhelpful, off-tone, or unexpectedly good
2. She shares flagged examples with Mark
3. Mark updates the system prompt / training file accordingly

This is an ongoing loop — not a one-time setup. The expectation is that the AI's quality improves iteratively through the beta period and beyond.
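The loop above can be represented in code as a flag log that Karly fills out and Mark consumes when revising the prompt. A minimal sketch; the dataclass fields and file name are assumptions, and while the request shape follows the Anthropic Messages API convention of a top-level system field, this is not the app's actual integration code:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class FlaggedResponse:
    question: str
    response: str
    verdict: str   # e.g. "unhelpful", "off-tone", or "good"
    note: str = ""

def export_flags(flags, path="flags.json"):
    """Serialize the week's flagged examples for hand-off (Karly -> Mark)."""
    with open(path, "w") as f:
        json.dump([asdict(x) for x in flags], f, indent=2)

def build_request(system_prompt, user_question):
    """Compose a chat request applying the current system prompt, so a
    revised prompt can be re-tested against previously flagged questions."""
    return {
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_question}],
    }

flags = [FlaggedResponse("How many guests?", "As many as you like!", "off-tone",
                         "Too flippant for budget-conscious users.")]
req = build_request("Be warm, specific, and practical.", flags[0].question)
```

Replaying flagged questions against each prompt revision is what closes the loop: a tone fix counts as done only when the previously flagged question no longer produces a flagged response.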


Responsibilities

| Task | Owner |
|---|---|
| Test AI with 30 common questions + edge cases | Karly Oykhman |
| Log and share flagged responses (good and bad) | Karly Oykhman |
| Update Claude system prompt / tone training file | Mark Hope |
| Weekly review cadence during beta | Both |

Phase Integration

| Phase | AI QA Activity |
|---|---|
| Phase 1 — Internal QA | Karly stress-tests the AI with diverse and edge case questions; flags ridiculous responses to Mark |
| Phase 2 — Soft Launch / Beta | Weekly review of top-30 questions; edge case testing continues; tone training iterations based on real user queries |
| Phase 3 — Pre-Launch / Hard Launch | Ongoing monitoring; AI behavior should be stable and well-tuned by this point |