
Investigation 01

AI SEO Audit

Building an end-to-end AI-powered SEO audit tool from scratch — and what the process revealed about product design, async workflows, and the real limits of AI automation.

AI Product Development | Active

End-to-End Audit Pipeline

Five processing stages from user input to delivered report — each stage handles a distinct responsibility in the audit workflow.

Input Wizard

Collects URL, business type, competitors, goals

Job Queue

pg-boss async processing with retry logic

AI Analysis

OpenAI model analyzes five audit sections with structured JSON output

Report Gen

Branded PDF with scores, recommendations, priorities

Delivery

Email notification with PDF link and dashboard access

1. Overview

SEO audits are time-consuming, repetitive, and inconsistently executed. A good audit touches local presence, on-page structure, schema markup, page speed, and competitive positioning — most of which follow a repeatable pattern. I wanted to test whether AI could handle the analysis layer reliably enough to be useful in a real workflow.

This was my first end-to-end full-stack AI SaaS build. I designed the product architecture, wrote the prompts, and built the entire app myself using Claude Code. The goal wasn't just to ship something — it was to understand what it actually takes to get AI to produce trustworthy, structured output inside a production system.

I designed the audit methodology, locked the technical stack, built the async job pipeline, and acted as the product decision-maker across every tradeoff.

2. Product Question

The core question: Can AI reliably generate structured, actionable SEO audit reports from minimal user inputs — and deliver them in a format a real client could use?

Secondary questions included:

  • Where does deterministic code end and AI judgment begin?
  • How do you prevent AI from hallucinating findings or skipping critical checks?
  • What's the minimum viable workflow for async AI tasks in a production app?
  • What quality bar makes this usable for real client projects?

3. Approach

I treated this as a product design problem first, and a coding problem second. Before writing any code, I documented a canonical audit structure based on how I'd run a manual SEO audit — covering GBP competitor analysis, on-page findings, schema markup, and rankability. That document became the source of truth the AI had to match: sections, table formats, and output structure were all locked to it. That single decision turned the product into a system implementing a defined methodology, rather than a flexible AI experiment.

The build used a job queue architecture (pg-boss) to run audits asynchronously: the user submits inputs through a minimal wizard, a worker process picks up the job, runs each audit section independently, and delivers the report via email and in-app viewer. Each section uses deterministic code for data extraction and OpenAI for qualitative analysis — keeping factual grounding in code while using the model only where judgment and synthesis add value.

4. What I Built

Five-Section Audit Structure

Each section runs independently — partial failures don't kill the whole report.

GBP Competitor Analysis

Compare against 3–5 local competitors

GBP Ranking Factors

Reviews, categories, attributes, photos

On-Page SEO

Title, meta, headings, content structure

Schema Markup

LocalBusiness, FAQPage, BreadcrumbList

Rankability Verdict

Final assessment with action items

Audit Pipeline

A five-section audit covering GBP competitor analysis, GBP ranking factors, on-page SEO, schema markup, and a rankability verdict. Each section runs independently so partial failures don't kill the whole report.
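The partial-failure behavior described above can be sketched with Promise.allSettled, which collects every section's outcome without letting one rejection abort the rest. This is an illustrative sketch, not the app's actual code; the section names and result shape are assumptions.

```typescript
// Sketch of per-section isolation: each audit section runs independently,
// and a failed section produces an error entry instead of killing the report.
type SectionResult =
  | { section: string; ok: true; data: unknown }
  | { section: string; ok: false; error: string };

async function runAudit(
  sections: Record<string, () => Promise<unknown>>
): Promise<SectionResult[]> {
  const names = Object.keys(sections);
  // allSettled never rejects: one failing section cannot abort the others.
  const settled = await Promise.allSettled(names.map((n) => sections[n]()));
  return settled.map((r, i) =>
    r.status === "fulfilled"
      ? { section: names[i], ok: true, data: r.value }
      : { section: names[i], ok: false, error: String(r.reason) }
  );
}
```

A report renderer can then emit the sections that succeeded and flag the failed ones, rather than discarding the entire audit.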

Prompt Architecture

Prompts were written section-by-section against the canonical audit structure. The LLM receives structured extracted data and produces structured JSON output — not free-form prose. This made outputs parseable, consistent, and easier to debug when something went wrong.
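What "structured JSON output, not free-form prose" implies on the receiving end can be sketched as a strict parse-and-validate step. The score/findings shape here is a hypothetical schema for illustration, not the app's real one.

```typescript
// Minimal sketch of turning a model's reply into a typed section report.
interface SectionReport {
  score: number;
  findings: string[];
}

function parseSectionReport(raw: string): SectionReport {
  // Models sometimes wrap JSON in a markdown fence; strip it defensively.
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "")
    .trim();
  const parsed = JSON.parse(cleaned); // throws on malformed JSON
  if (typeof parsed.score !== "number" || !Array.isArray(parsed.findings)) {
    throw new Error("Model output does not match the section schema");
  }
  return { score: parsed.score, findings: parsed.findings.map(String) };
}
```

Failing loudly at this boundary is what makes broken outputs show up as debuggable errors rather than silently wrong reports.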

The prompt itself became part of the implementation artifact: v1.0 was drafted first, then refined into v2.0 with Claude Opus 4.5, tightening requirements and turning the app into a more fully specified system.

Async Job System

Built on pg-boss (Postgres-backed queue, no Redis) with a separate worker process hosted on Railway. Audits run independently of the web app, timeout at 10 minutes, retry on transient failures, and emit email notifications on completion or failure.
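pg-boss handles retries natively through job options, but the semantics can be illustrated with a standalone helper. This hand-rolled version is a sketch of the behavior, not the app's implementation.

```typescript
// Illustrative retry-with-backoff for transient failures (network hiccups,
// rate limits): re-run the task up to `attempts` times, doubling the delay.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: 100ms, 200ms, 400ms, ...
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```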

Report Delivery

Reports render as HTML in-app and generate downloadable PDFs via Puppeteer. SendGrid handles email delivery with a summary of key findings and links to the full report. The product was designed from the start so that output would be client-usable and presentation-ready — not just a raw data dump.

Stack: Next.js 14 (App Router) · TypeScript · Supabase (Postgres) · Prisma · pg-boss · OpenAI · SendGrid · Puppeteer · Vercel + Railway

5. Key Decisions & Tradeoffs

Fixed Audit Structure Over Open-Ended Analysis

Decision: The product generates a fixed, repeatable five-section audit rather than open-ended SEO analysis. Sections, table formats, and output structure are locked to a canonical reference document.

Tradeoff: Less flexible, but outputs are consistent and auditable. The canonical PDF eliminated ambiguity across UI, report JSON, HTML, and PDF outputs — all four had to match the same source of truth.

No Automated Competitor Discovery

Decision: Users manually provide 3–5 competitor Google Business Profile URLs. No Google Maps scraping.

Tradeoff: More friction at input, but automated Maps scraping was considered both unreliable and a Terms of Service risk. Manual input produces cleaner, more trustworthy data and avoids building fragile scraping infrastructure.

Deterministic Extraction, AI Interpretation

Decision: Data fetching and parsing (HTML, schema, PageSpeed) is handled by code. AI only touches the interpretation and recommendation layer.

Tradeoff: More code to write upfront, but significantly easier to debug. Hallucination risk is confined to the interpretation layer, where deviations from the code-extracted facts are easiest to catch.
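The deterministic side can be illustrated with a small extractor that pulls JSON-LD @type values out of raw HTML using code alone, no model involved. This is a simplified sketch; a production version would use a real HTML parser rather than a regex.

```typescript
// Deterministic extraction sketch: find JSON-LD script blocks and collect
// their schema.org @type values (e.g. LocalBusiness, FAQPage).
function extractSchemaTypes(html: string): string[] {
  const types: string[] = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]);
      const nodes = Array.isArray(data) ? data : [data];
      for (const node of nodes) {
        if (node && typeof node["@type"] === "string") types.push(node["@type"]);
      }
    } catch {
      // Malformed JSON-LD is itself an audit finding; skip it here.
    }
  }
  return types;
}
```

Facts like these are grounded in code; the model only interprets them, which is what keeps the hallucination surface small.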

Separate Worker Process

Decision: The audit worker runs as a standalone Node process on Railway, separate from the Next.js app on Vercel.

Tradeoff: More infrastructure to manage, but necessary for reliability. Audits include multiple slow, failure-prone steps — page fetching, API calls, LLM analysis, PDF generation, email delivery. Moving this into a queue made progress tracking possible and kept long-running jobs out of serverless functions.

Minimal Wizard UX

Decision: A 3-step wizard collecting only the minimum required inputs: website URL, primary keyword, city/state, GBP search phrase, business type, and competitor GBP URLs.

Tradeoff: Power users get fewer knobs to turn, but the product feels operational and guided rather than like a technical SEO control panel, which was intentional. The constraint also forced clarity about what the audit actually needed.
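The minimum-inputs constraint can be sketched as a plain validation function. The field names mirror the inputs listed above but are assumptions, not the app's actual schema.

```typescript
// Illustrative validation of the wizard's required inputs.
interface WizardInput {
  websiteUrl: string;
  primaryKeyword: string;
  cityState: string;
  gbpSearchPhrase: string;
  businessType: string;
  competitorGbpUrls: string[];
}

function validateWizardInput(input: WizardInput): string[] {
  const errors: string[] = [];
  try {
    new URL(input.websiteUrl); // throws if not a well-formed URL
  } catch {
    errors.push("websiteUrl must be a valid URL");
  }
  if (!input.primaryKeyword.trim()) errors.push("primaryKeyword is required");
  if (!input.cityState.trim()) errors.push("cityState is required");
  if (!input.gbpSearchPhrase.trim()) errors.push("gbpSearchPhrase is required");
  if (!input.businessType.trim()) errors.push("businessType is required");
  if (input.competitorGbpUrls.length < 3 || input.competitorGbpUrls.length > 5) {
    errors.push("competitorGbpUrls must contain 3 to 5 entries");
  }
  return errors;
}
```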

6. What I Learned

The hardest part wasn't the AI — it was the plumbing. Getting async jobs to run reliably, handle partial failures gracefully, and surface useful errors took more iteration than any prompt work.

Prompt engineering for structured output is a different discipline than conversational prompting. When the AI needs to produce parseable JSON that maps to a specific schema, every ambiguity in the prompt shows up as a broken report. Precision matters more than creativity.

The canonical PDF decision was more important than it seemed at the time. By anchoring the product to a fixed methodology upfront, I avoided a class of problems that come from letting AI outputs drift across runs. That reframing — "this implements a defined audit methodology" rather than "this generates SEO recommendations" — shaped every downstream decision.

This project also clarified what "AI-native" means in practice: not replacing the audit entirely, but compressing hours of manual work into minutes while keeping a human in the loop for final review and delivery.

7. Outcome & Next Steps

The core pipeline architecture works end-to-end: inputs → async job → multi-section AI analysis → HTML report → PDF → email delivery. The architecture is production-capable, though two known issues remain in the backlog: the rankability section falls back to a generic response when SERP comparison data is unavailable, and the report doesn't always reflect the exact number of competitors selected during setup. Both are documented and scoped — not architectural, just unfinished.

The next phase is on two tracks:

Deepening the audit value: Planned additions include a domain authority and backlink health check (comparing DA scores against competitors via the Moz API), and a local map pack grid search visual — a geo-grid showing how the business ranks across different blocks of their city for the target keyword. That last feature alone is a meaningful pricing differentiator.

Moving toward a real product: The UI needs a full redesign before anything else — the current version was built to validate the pipeline, not to impress a client. Alongside that: custom domain migration to seo-audit.mola.design, Stripe integration for plan-based access, multilingual support (EN/FR/ES), and an interactive fix-tracking checklist inside each report so users can mark action items complete and have a reason to return.

The longer-term question this investigation is building toward: can this be lightweight enough to work as a mobile-first product, where a business owner runs their own audit from their phone before a discovery call?

8. Links & Resources

  • Codex Prompt v2.0 (Notion — see parent page)
  • Full Implementation Plan (Notion — see parent page)
  • GBP Audit Reference Doc (Google Doc)
  • Linear Project: AI SEO Audit backlog