---
title: HubSpot Database Cleanup & Categorization — 71k Contacts
type: article
created: '2026-04-05'
updated: '2026-04-05'
source_docs:
- raw/2026-02-18-aviaryai-weekly-call-123511089.md
tags:
- hubspot
- data-cleanup
- lead-generation
- clay
- apollo
- email-hygiene
- abm
- aviary-ai
layer: 2
client_source: null
industry_context: null
transferable: true
---

# HubSpot Database Cleanup & Categorization — 71k Contacts

## Overview

AviaryAI's HubSpot instance contains approximately 71,000 contacts accumulated over time, but the database is not usable for targeted outbound campaigns in its current state. A structured cleanup and categorization project is required before high-volume ABM email sends can begin. This work is a prerequisite for the [[wiki/clients/aviary-ai/projects/aws-ses-abm-email-automation|AWS SES ABM email automation]] initiative.

The cleanup will be executed by Sebastian Gant using Clay and Apollo.

---

## Problem Statement

The 71k-contact database has several compounding issues:

- **~20,000 irrelevant contacts** — Blessing (AviaryAI's CEO) imported a bulk dump of VC firm contacts for a prior initiative; these are not target accounts for current outreach
- **Broken/missing names** — Some contacts show only "name" as a placeholder, the result of a failed automation attempt by Aaron
- **Stale email addresses** — The source data was ~6 months old at import; many contacts have emails tied to prior employers or roles that no longer exist
- **Missing categorization** — Insurance companies and other financial institution types are untagged; only Credit Unions and Banks have partial tagging
- **Duplicates** — Deduplication has not been run

Without cleanup, sending campaigns risks high bounce rates, domain reputation damage, and wasted spend.

---

## Cleanup Scope

### 1. Deduplication
Remove duplicate contact records across the database.
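
As a rough illustration of this step, the sketch below deduplicates an exported contact list on normalized email address. It assumes contacts are plain dictionaries with an `email` key (HubSpot's default property name) and that the first record seen is the one to keep. Neither choice was specified in the call, and the actual pass may be done inside Clay or HubSpot's own duplicate tooling instead.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivially different spellings compare equal."""
    return email.strip().lower()


def dedupe_contacts(contacts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split contacts into (kept, duplicates), keeping the first record per email.

    Records without an email cannot be matched on this key, so they are kept
    and left for the name/email remediation steps.
    """
    seen: set[str] = set()
    kept: list[dict] = []
    duplicates: list[dict] = []
    for contact in contacts:
        key = normalize_email(contact.get("email") or "")
        if key and key in seen:
            duplicates.append(contact)
            continue
        if key:
            seen.add(key)
        kept.append(contact)
    return kept, duplicates
```

Returning the duplicates rather than silently dropping them leaves room to review or merge them before anything is deleted.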

### 2. Name Remediation
Identify contacts with missing or placeholder names and fill in the correct name where it can be resolved. Contacts with no resolvable name should be flagged or removed.
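
A minimal sketch of how the affected records might be identified is shown below. The placeholder list is an assumption (only the literal "name" placeholder was mentioned on the call), and `firstname` / `lastname` are HubSpot's default contact property names, assumed here to be the fields in use.

```python
# Assumed placeholder values; only "name" was reported, the rest are guesses.
PLACEHOLDER_NAMES = {"", "name", "first name", "last name", "unknown", "n/a"}


def needs_name_fix(contact: dict) -> bool:
    """Return True when either name field is empty or holds a known placeholder."""
    first = (contact.get("firstname") or "").strip().lower()
    last = (contact.get("lastname") or "").strip().lower()
    return first in PLACEHOLDER_NAMES or last in PLACEHOLDER_NAMES
```

Records flagged this way would then go through enrichment; those that still have no usable name afterwards are the candidates for removal.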

### 3. Categorization
Assign each contact to exactly one of the following categories, based on the type of its associated company:

| Category | Notes |
|---|---|
| **Credit Unions** | Already partially tagged; verify and complete |
| **Community Banks** | Already partially tagged; verify and complete |
| **Insurance Companies** | Currently untagged; needs full pass |
| **Other** | Anything not clearly categorizable; set aside for separate review |
| **VC / Non-Target** | ~20k VC contacts imported by Blessing; move out of active lists |
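
One way a first-pass tagger could look is sketched below. The keyword rules are purely hypothetical and were not discussed on the call. A company name alone cannot distinguish a community bank from a large national bank, so in practice this would only pre-sort records ahead of enrichment-based categorization in Clay.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_RULES = [
    ("VC / Non-Target", ("venture", "ventures", "capital partners")),
    ("Credit Unions", ("credit union",)),
    ("Insurance Companies", ("insurance", "assurance")),
    ("Community Banks", ("bank",)),
]


def categorize(company_name: Optional[str]) -> str:
    """Map a company name to one of the five categories, defaulting to 'Other'."""
    name = (company_name or "").lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in name for keyword in keywords):
            return category
    return "Other"
```

Anything that falls through to "Other" stays out of active segments until it has been reviewed, which matches the table above.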

### 4. Email Hygiene / Validation
Run a hygiene check against all contacts to:
- Validate that email addresses are deliverable
- Flag or remove addresses tied to former employers
- Reduce bounce rate before any sends go out

Tools: Clay, Apollo (AviaryAI has an existing Apollo account).
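
The sketch below shows a cheap local pre-filter that could run before any paid verification. It does not confirm deliverability; that still requires a verification service or the enrichment tools named above. The `current_company_domain` argument is a hypothetical field standing in for whatever current-employer domain the enrichment returns.

```python
import re
from typing import Optional

# Loose syntax check: something@something.tld with no spaces or extra "@".
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def hygiene_flags(email: Optional[str],
                  current_company_domain: Optional[str] = None) -> list[str]:
    """Return a list of hygiene flags for one contact's email (empty list = no flags)."""
    cleaned = (email or "").strip().lower()
    if not EMAIL_PATTERN.match(cleaned):
        return ["invalid_syntax"]
    flags: list[str] = []
    domain = cleaned.rsplit("@", 1)[1]
    # A mismatch suggests the address may belong to a former employer.
    if current_company_domain and domain != current_company_domain.strip().lower():
        flags.append("domain_mismatch")
    return flags
```

A `domain_mismatch` flag is only a hint, since personal addresses and parent-company domains will also trip it, so flagged records are better routed to review than deleted outright.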

---

## Why This Matters for Campaigns

The cleaned and categorized database directly enables:

- **Targeted Tier 2/3 bulk sends** via AWS SES — goal is 1,000+ emails/week at scale
- **Segmented campaigns** by institution type (Credit Unions vs. Community Banks vs. Insurance Companies)
- **A $1k/month learning budget** to test messaging and measure open/click rates at a volume large enough for statistically meaningful results (Aaron noted that manual sends of 50–100 emails/day produced sample sizes too small to draw conclusions)
- **Domain reputation protection** — high bounce rates from stale data would damage deliverability for all sends
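
To make the sample-size point concrete, the sketch below estimates how precisely an open rate can be measured at different send volumes, using the standard normal-approximation margin of error for a proportion. The 25% open rate is an illustrative assumption, not a measured AviaryAI figure, and the send counts are example values only.

```python
import math


def open_rate_margin_of_error(sample_size: int, open_rate: float, z: float = 1.96) -> float:
    """Half-width of the ~95% confidence interval for an observed open rate."""
    return z * math.sqrt(open_rate * (1 - open_rate) / sample_size)


ASSUMED_OPEN_RATE = 0.25  # illustrative assumption only

for sends in (100, 500, 1000):
    moe = open_rate_margin_of_error(sends, ASSUMED_OPEN_RATE)
    print(f"{sends:>5} sends: 25% open rate +/- {moe * 100:.1f} points")
```

Under these assumptions the uncertainty is roughly ±8.5 points at 100 sends and about ±2.7 points at 1,000, which is why comparing two message variants on small batches gives little usable signal.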

> "We want to get them out ASAP. We don't care if they're perfect... our numbers weren't at a statistically significant [level] to see if the open rate and the message was right." — Aaron Grossman

---

## Tooling

| Tool | Role |
|---|---|
| **Clay** | Enrichment, categorization, data matching |
| **Apollo** | Contact data cross-reference; AviaryAI has an existing account |
| **HubSpot** | Target system; cleaned data lives here |

---

## Action Items

- [ ] **Sebastian Gant** — Deduplicate contacts across the full 71k database
- [ ] **Sebastian Gant** — Fill missing/broken contact names; remove unresolvable records
- [ ] **Sebastian Gant** — Tag all contacts: Credit Unions, Community Banks, Insurance Companies, Other, VC/Non-Target
- [ ] **Sebastian Gant** — Run email hygiene/validation pass to reduce bounce risk
- [ ] **Mark Hope** — Align with Sebastian on $1k/month learning budget and KPIs before first sends

---

## Related

- [[wiki/clients/aviary-ai/aviary-ai|AviaryAI Client Overview]]
- [[wiki/clients/aviary-ai/projects/aws-ses-abm-email-automation|AWS SES ABM Email Automation]]
- [[wiki/clients/aviary-ai/meetings/2026-02-18-abm-email-strategy-hubspot-automation|Meeting: ABM Email Strategy & HubSpot Automation (2026-02-18)]]
- [[wiki/knowledge/lead-generation/abm-tiered-outreach|ABM Tiered Outreach Strategy]]