THE HARBOR STRATEGY.
Don't just publish. Engineer. Harbor replaces the "spray and pray" SEO model with a high-fidelity, agentic research protocol.
The 4-Stage Agentic Loop.
Most AI tools are just wrappers around a single prompt. Harbor is a multi-agent ecosystem that performs iterative research loops with real-time scraping, structured extraction, and intelligent deduplication.
Sitemap Index Intelligence
Our agents don't blindly crawl. They analyze sitemap indexes to identify 'money sitemaps' (products, collections) while ignoring overhead such as image sitemaps and foreign-language variants (/de/, /fr/, /es/).
- Detects if sitemap.xml is an index with sub-sitemaps
- AI selects 1-5 most relevant sitemaps from the index
- Filters out: image sitemaps, video sitemaps, admin pages
- Prioritizes: products, collections, categories, pages
- Fetches up to 300 URLs per selected sitemap
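The filtering and prioritization rules above can be sketched as a rule-based pre-pass that runs before any AI selection. This is a minimal illustration, not Harbor's actual code: the names `isSitemapIndex`, `rankSitemaps`, `EXCLUDE`, and `PRIORITY` are hypothetical, and the URL patterns are assumptions. The only standard detail relied on is that a sitemap index uses a `<sitemapindex>` root element.

```typescript
// Hypothetical pre-filter; patterns and names are assumptions for illustration.
const EXCLUDE: RegExp[] = [/image/i, /video/i, /admin/i, /\/(de|fr|es)\//i];
const PRIORITY: string[] = ["product", "collection", "categor", "page"];

// A sitemap index has <sitemapindex> as its root element.
function isSitemapIndex(xml: string): boolean {
  return /<sitemapindex[\s>]/i.test(xml);
}

// Drop excluded sitemaps, then order the rest by priority term (ties keep input order).
function rankSitemaps(sitemapUrls: string[], max = 5): string[] {
  const score = (url: string): number => {
    const idx = PRIORITY.findIndex((term) => url.toLowerCase().includes(term));
    return idx === -1 ? PRIORITY.length : idx; // lower is better
  };
  return sitemapUrls
    .filter((url) => !EXCLUDE.some((re) => re.test(url)))
    .map((url, i) => ({ url, i, s: score(url) }))
    .sort((a, b) => a.s - b.s || a.i - b.i)
    .slice(0, max)
    .map((entry) => entry.url);
}
```

A pre-pass like this keeps the candidate list short, so the model only has to choose among sitemaps that are already plausible.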
client.responses.create({ model: 'gpt-5-nano', text: { format: { type: 'json_schema', strict: true } } })
Smart URL Sampling
From a pool of thousands, Harbor shuffles and samples 300 URLs, then uses AI reasoning to pick the top 50 pages that determine your site's topical authority.
- Shuffles URLs randomly for variety (no alphabetical bias)
- Samples 300 URLs from the total pool
- AI selects top 50 most relevant for target keyword
- Homepage always included automatically
- Filters out: login, cart, checkout, admin, legal pages
- Classifies each URL by page type (product, collection, about)
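The shuffle-and-sample step can be sketched as follows. Sorting with a random comparator removes alphabetical order but does not make every ordering equally likely, so this sketch swaps in a Fisher-Yates shuffle. The names `shuffle`, `sampleUrls`, and `SKIP` are hypothetical, the exclusion pattern is an assumption, and so is the choice to count the homepage as one of the 300 slots.

```typescript
// Fisher-Yates shuffle: every ordering of the input is equally likely.
function shuffle<T>(items: readonly T[]): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Assumed pattern for login, cart, checkout, admin, and legal pages.
const SKIP = /\/(login|cart|checkout|admin|legal|privacy|terms)(\/|$|\?)/i;

// Returns at most `size` URLs; the homepage always takes the first slot.
function sampleUrls(urls: string[], homepage: string, size = 300): string[] {
  const candidates = urls.filter((u) => u !== homepage && !SKIP.test(u));
  return [homepage, ...shuffle(candidates).slice(0, size - 1)];
}
```

The resulting sample is what would then be passed to the model for the top-50 selection.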
const shuffledUrls = [...urls].sort(() => Math.random() - 0.5).slice(0, 300)
Parallel Entity Extraction
Using 5x parallel concurrency with Promise.allSettled(), we scrape and extract structured data (pricing, images, offerings) to build a temporary knowledge base for your content.
- 5x parallel concurrency (configurable)
- Promise.allSettled() for resilient batch processing
- Jina for standard pages, BrightData for protected sites
- Extracts: titles, headings, descriptions, pricing
- Extracts: images with alt text and context
- Extracts: offerings, contact info, internal links
- Each page scored 0-1 for relevance
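The batching pattern described above can be sketched like this. It is an illustration under stated assumptions: `processInBatches` and the `Outcome` type are hypothetical names, and the `worker` argument stands in for whatever scrape-and-extract function is actually used. `Promise.allSettled()` itself is standard JavaScript.

```typescript
// Hypothetical result shape: one entry per URL, whether it succeeded or failed.
type Outcome<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

async function processInBatches<T>(
  urls: string[],
  worker: (url: string) => Promise<T>,
  concurrency = 5,
): Promise<Outcome<T>[]> {
  const outcomes: Outcome<T>[] = [];
  for (let start = 0; start < urls.length; start += concurrency) {
    const batch = urls.slice(start, start + concurrency);
    // allSettled never rejects, so one failed page cannot abort the batch.
    const settled = await Promise.allSettled(batch.map((url) => worker(url)));
    settled.forEach((result, i) => {
      const url = batch[i] as string;
      if (result.status === "fulfilled") {
        outcomes.push({ url, ok: true, value: result.value });
      } else {
        outcomes.push({ url, ok: false, error: String(result.reason) });
      }
    });
  }
  return outcomes;
}
```

Keeping failures in the result list, rather than dropping them, makes it possible to retry protected pages through a different scraper later.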
await Promise.allSettled(batch.map(url => scrapeAndExtract(url)))
Contextual Synthesis
The OpenAI Responses API synthesizes the extracted entities into a master strategy, so your new content fits within your existing site graph and avoids keyword cannibalization.
- Sorts results by relevance score descending
- Generates 2-3 sentence summary focused on keyword
- Maps internal links from actual scraped URLs
- Queries existing titles to prevent duplication
- 4-layer anti-cannibalization enforcement
- Returns structured analysis with internalLinks array
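The sorting and link-mapping steps could look roughly like the sketch below. Only the `internalLinks` array name comes from the list above; the `ScrapedPage`, `InternalLink`, and `Analysis` shapes, the `toAnalysis` function, and the `maxLinks` limit are assumptions made for illustration.

```typescript
// Hypothetical shapes; only `internalLinks` is named in the text above.
interface ScrapedPage { url: string; title: string; relevanceScore: number }
interface InternalLink { url: string; anchorText: string; relevanceScore: number }
interface Analysis { summary: string; internalLinks: InternalLink[] }

function toAnalysis(pages: ScrapedPage[], summary: string, maxLinks = 5): Analysis {
  const internalLinks = [...pages] // copy so the caller's array is not reordered
    .sort((a, b) => b.relevanceScore - a.relevanceScore) // highest relevance first
    .slice(0, maxLinks)
    .map((p) => ({ url: p.url, anchorText: p.title, relevanceScore: p.relevanceScore }));
  return { summary, internalLinks };
}
```

Because the links are built only from pages that were actually scraped, every URL in the result is known to exist on the site.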
const synthesis = await client.responses.create({ model: 'gpt-5-nano', input: synthesisPrompt })
4-Layer Anti-Cannibalization
Unlike blind AI wrappers, Harbor implements database-level deduplication to prevent keyword cannibalization.
Domain-Scoped Query
Before generating any keyword, Harbor queries all previously generated titles from your specific domain hostname.
getAllPreviousSiteSeekerTitles({ sitemapUrl })
Status Filtering
Only completed, non-generating records are included. In-progress articles won't block new topics, but finished content creates a permanent exclusion zone.
status === 'completed' && siteSeeker.keywords
Prompt Injection
The AI receives an explicit list of existing titles with instructions to avoid identical, similar, or semantically overlapping topics.
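The first three layers (domain scoping, status filtering, prompt injection) can be sketched together. This is a simplified illustration: the `ArticleRecord` shape, the status values other than `'completed'`, and the `buildPreviousTitlesSection` function are hypothetical, and the instruction wording is an assumption.

```typescript
// Hypothetical record shape; only the 'completed' status appears in the text above.
interface ArticleRecord {
  title: string;
  hostname: string;
  status: "generating" | "completed" | "failed";
}

// Builds the block of existing titles to include in the system prompt.
function buildPreviousTitlesSection(records: ArticleRecord[], hostname: string): string {
  const titles = records
    .filter((r) => r.hostname === hostname && r.status === "completed")
    .map((r) => r.title);
  if (titles.length === 0) return ""; // nothing to exclude yet
  return [
    "Existing titles (avoid identical, similar, or semantically overlapping topics):",
    ...titles.map((t) => `- ${t}`),
  ].join("\n");
}
```

Returning an empty string for a new domain keeps the prompt free of an empty exclusion list.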
previousTitlesSection in systemPrompt
Semantic Distinctness
For pillar generation, the AI must create 15 distinct subniches with zero semantic overlap: no two pillars can cover similar ground.
NEVER repeat topics or create pillars that are semantically similar
Structured Data Extraction
Every scraped page is parsed into a consistent JSON schema, enabling intelligent content synthesis and internal linking.
| Field | Type | Description |
|---|---|---|
| title | string | Page title |
| headings | array | All H1/H2/H3 headings |
| descriptions | array | Meta and content descriptions |
| pricing | array | Items with name, price, currency |
| images | array | URLs with alt text and context |
| offerings | array | Products/services with descriptions |
| links | array | Internal links with anchor text |
| contactInfo | object | Email, phone, address |
| relevanceScore | number | 0-1 relevance to keyword |
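The field table maps naturally onto a typed record. The sketch below is one possible reading, not the actual schema: nested key names follow the sample record where it shows them, while the `links` and `contactInfo` shapes, the optional markers, and the `parsePage` helper are assumptions.

```typescript
// Assumed typing of the extraction record; optional fields are a guess.
interface PageExtraction {
  title: string;
  headings: string[];
  descriptions?: string[];
  pricing: { item: string; price: string; currency: string }[];
  images: { url: string; alt: string; context: string }[];
  offerings: { name: string; description: string }[];
  links?: { url: string; anchorText: string }[];
  contactInfo?: { email?: string; phone?: string; address?: string };
  relevanceScore: number; // 0-1 relevance to the keyword
}

// Parses a JSON string and checks the two fields every record needs.
function parsePage(json: string): PageExtraction {
  const data = JSON.parse(json) as Partial<PageExtraction>;
  if (typeof data.title !== "string") throw new Error("title must be a string");
  const score = data.relevanceScore;
  if (typeof score !== "number" || score < 0 || score > 1) {
    throw new Error("relevanceScore must be a number between 0 and 1");
  }
  return {
    ...data,
    title: data.title,
    headings: data.headings ?? [],
    pricing: data.pricing ?? [],
    images: data.images ?? [],
    offerings: data.offerings ?? [],
    relevanceScore: score,
  };
}
```

Validating the score range at parse time means later sorting by relevance never has to handle out-of-range values.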
{
  "title": "Premium Running Shoes",
  "headings": ["Features", "Sizing"],
  "pricing": [{
    "item": "Air Max Pro",
    "price": "189.99",
    "currency": "USD"
  }],
  "images": [{
    "url": "/shoes/air-max.jpg",
    "alt": "Air Max Pro side view",
    "context": "Product hero image"
  }],
  "offerings": [{
    "name": "Air Max Pro",
    "description": "Cushioned running"
  }],
  "relevanceScore": 0.92
}
SITEMAP-AWARE SYNTHESIS.

DOMAIN-SCOPED DEDUPLICATION
The agent queries all previously generated titles from your specific domain hostname before generating new topics.
SEMANTIC LINK MAPPING
Internal links are selected from your actual sitemap URLs, scored for relevance, and placed at semantically appropriate positions.
BRAND VOICE EXTRACTION
Autonomous extraction of your brand's unique voice across scraped pages to ensure tone consistency in generated content.

