The wedding app uses Claude as its AI backend to answer user questions about wedding planning. Because AI responses are user-facing and directly affect product quality, a structured weekly review process was established during beta to catch unhelpful, inappropriate, or off-brand responses before and after public launch.
This process covers three areas: systematic question testing, edge case validation, and iterative tone training.
AI features carry reputational risk. A single ridiculous or offensive response can undermine user trust and generate negative word-of-mouth — especially in a consumer product targeting an emotionally significant life event. The review process is designed to surface problems early, during beta testing, before they reach a broad audience.
"Anytime you have an AI feature, you want to really beat on it and make sure it's not going to do anything ridiculous." — Mark Hope
Each week, pull the 30 most common questions submitted by users and manually review the AI's responses, evaluating each for helpfulness, accuracy, and tone.
Flag any responses that are notably good (to reinforce) or notably bad (to correct).
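The flag-and-log step above could be tracked with a lightweight review log. A minimal sketch, assuming nothing about the app's actual tooling; the `ReviewEntry` structure and verdict labels are illustrative:

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class ReviewEntry:
    question: str
    response: str
    verdict: str   # "good" (reinforce), "bad" (correct), or "ok"
    note: str = ""

@dataclass
class WeeklyReviewLog:
    entries: list = field(default_factory=list)

    def flag(self, question, response, verdict, note=""):
        """Record one reviewed AI response with its verdict."""
        self.entries.append(ReviewEntry(question, response, verdict, note))

    def summary(self):
        """Count verdicts so the weekly review has a quick health check."""
        counts = Counter(e.verdict for e in self.entries)
        return {
            "reviewed": len(self.entries),
            "good": counts["good"],
            "bad": counts["bad"],
        }
```

Only the "good" and "bad" entries would need to be shared for prompt updates; the summary gives a week-over-week quality signal.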
Test the AI against non-standard or unusual wedding scenarios that fall outside the typical American wedding template, such as non-Western traditions, religious ceremonies, and other non-traditional formats.
Goal: Ensure the AI responds gracefully — neither refusing to engage nor giving absurd advice — when it encounters requests outside its training emphasis.
Future consideration: If enough edge case requests accumulate around a specific category (e.g., Asian wedding traditions, religious ceremonies), that category may be promoted to a first-class feature with dedicated prompting.
During Phase 1 QA, a low-contrast dark mode bug was identified (tan text on white background in the AI chat interface). While this is a UI issue rather than an AI issue, it affects the perceived quality of AI responses. Mark confirmed dark mode will not appear on the landing page, but it should be resolved in the app itself.
Claude's behavior is controlled in part by a system prompt file in the codebase that specifies its tone and response guidelines.
Process for refinement:
1. Karly tests the AI and logs responses she finds unhelpful, off-tone, or unexpectedly good
2. She shares flagged examples with Mark
3. Mark updates the system prompt / training file accordingly
This is an ongoing loop — not a one-time setup. The expectation is that the AI's quality improves iteratively through the beta period and beyond.
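The loop above works because the prompt lives in a versioned file: Karly's flagged examples become edits to that file, and the next release picks them up automatically. A minimal sketch of how the app side might assemble a request, assuming the Anthropic Messages API; the file path, model name, and helper names are illustrative, not the app's actual code:

```python
from pathlib import Path

def load_system_prompt(path):
    """Read the current tone/behavior instructions from the versioned
    prompt file that Mark updates after each review cycle."""
    return Path(path).read_text().strip()

def build_request(system_prompt, question, model="claude-model-placeholder"):
    """Assemble the payload the app would pass to the Messages API
    (e.g., anthropic.Anthropic().messages.create(**payload))."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": [{"role": "user", "content": question}],
    }
```

Because the system prompt is read at request time rather than hard-coded, a prompt-file update changes AI behavior without touching application logic.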
| Task | Owner |
|---|---|
| Test AI with 30 common questions + edge cases | Karly Oykhman |
| Log and share flagged responses (good and bad) | Karly Oykhman |
| Update Claude system prompt / tone training file | Mark Hope |
| Weekly review cadence during beta | Both |
| Phase | AI QA Activity |
|---|---|
| Phase 1 — Internal QA | Karly stress-tests AI with diverse and edge case questions; flags ridiculous responses to Mark |
| Phase 2 — Soft Launch / Beta | Weekly review of top-30 questions; edge case testing continues; tone training iterations based on real user queries |
| Phase 3 — Pre-Launch / Hard Launch | Ongoing monitoring; AI behavior should be stable and well-tuned by this point |