Investigation 01
AI SEO Audit
Building an end-to-end AI-powered SEO audit tool from scratch — and what the process revealed about product design, async workflows, and the real limits of AI automation.
End-to-End Audit Pipeline
Five Processing Stages — from user input to delivered report, each stage handles a distinct responsibility in the audit workflow.
Input Collection
3-step wizard captures only what the audit needs: URL, keyword, location, and competitor GBP links
Async Job Queue
pg-boss processes audits independently of the web app, with retry logic and a 10-minute timeout
AI Analysis
OpenAI interprets extracted data across 5 audit sections, constrained to structured JSON output against a canonical methodology
Report Generation
Branded PDF with section scores, prioritized recommendations, and action items
Delivery
Email with key findings summary and links to the full HTML report and downloadable PDF
Why This Matters
SEO audits are valuable, but the way they're typically delivered is manual, inconsistent, and difficult to scale. Different auditors use different methods, quality varies from report to report, and clients have limited visibility into how thorough the process really is.
I turned that service workflow into a structured product. Instead of relying on open-ended AI generation, the system follows a defined audit methodology with fixed sections, consistent outputs, and a clear quality bar. AI is used for interpretation and recommendations, while deterministic code handles the parts that need to stay factually grounded. That boundary was a deliberate product decision to reduce hallucination, improve reliability, and make the system easier to debug.
The result is an asynchronous pipeline that takes a small set of user inputs, runs a full five-section audit, and delivers a branded PDF report by email. It was designed as a client-ready product workflow from the start, not just a proof of concept.
1. Overview
This was my first end-to-end full-stack AI SaaS build. I designed the product architecture, wrote the prompts, and built the entire app using Claude Code. The goal wasn't just to ship something — it was to understand what it actually takes to get AI to produce trustworthy, structured output inside a production system.
I designed the audit methodology, locked the technical stack, built the async job pipeline, and acted as the product decision-maker across every tradeoff.
2. Product Question
The core question: Can AI reliably generate structured, actionable SEO audit reports from minimal user inputs — and deliver them in a format a real client could use?
Secondary questions included:
- Where does deterministic code end and AI judgment begin?
- How do you prevent AI from hallucinating findings or skipping critical checks?
- What's the minimum viable workflow for async AI tasks in a production app?
- What quality bar makes this usable for real client projects?
3. Approach
I treated this as a product design problem first, and a coding problem second. Before writing any code, I documented a canonical audit structure based on how I'd run a manual SEO audit — covering GBP competitor analysis, on-page findings, schema markup, and rankability. That document became the source of truth the AI had to match: sections, table formats, and output structure were all locked to it. That single decision turned the product into a system implementing a defined methodology, rather than a flexible AI experiment.
The build used a job queue architecture (pg-boss) to run audits asynchronously: the user submits inputs through a minimal wizard, a worker process picks up the job, runs each audit section independently, and delivers the report via email and in-app viewer. Each section uses deterministic code for data extraction and OpenAI for qualitative analysis — keeping factual grounding in code while using the model only where judgment and synthesis add value.
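To make that handoff concrete, here is a minimal sketch of the submit side: a Next.js route handler that enqueues the job and returns immediately. The route path, queue name, payload shape, and option values are illustrative, not the project's actual code.

```ts
// app/api/audits/route.ts (illustrative path and queue name)
import { NextResponse } from 'next/server';
import PgBoss from 'pg-boss';

const boss = new PgBoss(process.env.DATABASE_URL!);
const started = boss.start(); // share one queue connection across requests

export async function POST(req: Request) {
  const input = await req.json(); // wizard payload: URL, keyword, location, competitor GBP URLs
  await started;
  const jobId = await boss.send('run-audit', input, {
    retryLimit: 2,        // retry transient failures
    expireInSeconds: 600, // the pipeline's 10-minute timeout
  });
  // 202 Accepted: the worker and the delivery email handle everything from here
  return NextResponse.json({ jobId }, { status: 202 });
}
```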
4. What I Built
Five-Section Audit Structure
Each section runs independently — partial failures don't kill the whole report.
GBP Competitor Analysis
Compare against 3–5 local competitors
GBP Ranking Factors
Reviews, categories, attributes, photos
On-Page SEO
Title, meta, headings, content structure
Schema Markup
LocalBusiness, FAQPage, BreadcrumbList
Rankability Verdict
Final assessment with action items
Independent Audit Sections That Fail Gracefully
The audit runs five sections — GBP competitor analysis, GBP ranking factors, on-page SEO, schema markup, and a rankability verdict. Each section executes independently, so if one fails (a page times out, an API call drops), the rest of the report still completes and delivers. This was a deliberate design choice: partial results are more useful than no results, and isolating failures makes them easier to diagnose and fix.
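The isolation is mechanically simple: run every section runner through Promise.allSettled, so a rejected section becomes a recorded failure rather than an exception that aborts the job. A minimal sketch, with illustrative types and the section runners injected:

```ts
type AuditInput = {
  url: string;
  keyword: string;
  location: string;
  competitorGbpUrls: string[];
};
type SectionRunner = (input: AuditInput) => Promise<unknown>;
type SectionResult =
  | { section: string; status: 'ok'; data: unknown }
  | { section: string; status: 'failed'; error: string };

// Run every section to completion, recording failures instead of throwing,
// so the report can still render whatever succeeded.
async function runAuditSections(
  input: AuditInput,
  runners: Record<string, SectionRunner>, // e.g. { 'on-page-seo': runOnPageSeo, ... }
): Promise<SectionResult[]> {
  const entries = Object.entries(runners);
  const settled = await Promise.allSettled(entries.map(([, run]) => run(input)));
  return settled.map((result, i) => {
    const [section] = entries[i];
    return result.status === 'fulfilled'
      ? { section, status: 'ok', data: result.value }
      : { section, status: 'failed', error: String(result.reason) };
  });
}
```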
Structured Output, Not Free-Form Prose
Prompts were written section-by-section against the canonical audit structure. The LLM receives structured extracted data and returns structured JSON — not paragraphs of recommendations. This made outputs parseable, consistent across runs, and easier to debug when something went wrong. Over time, the prompt evolved from an initial draft into a tighter specification, making the system feel more like a defined contract than a flexible experiment.
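The real prompts live in the project spec, but the mechanics look roughly like this: the model receives the deterministically extracted data and must respond in a pinned JSON schema, so the output either parses or fails loudly. The model name and schema fields below are illustrative, assuming the OpenAI Node SDK's structured-output support.

```ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Sketch: force the model to return JSON matching a fixed schema, so the
// worker parses and validates it instead of scraping prose. These fields
// are illustrative, not the project's actual report schema.
async function analyzeOnPage(extracted: unknown) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-2024-08-06', // any model with structured-output support
    messages: [
      { role: 'system', content: 'You are an SEO auditor. Follow the canonical audit methodology.' },
      { role: 'user', content: JSON.stringify(extracted) }, // output of deterministic extraction
    ],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'on_page_findings',
        strict: true,
        schema: {
          type: 'object',
          additionalProperties: false,
          required: ['score', 'findings'],
          properties: {
            score: { type: 'integer' },
            findings: {
              type: 'array',
              items: {
                type: 'object',
                additionalProperties: false,
                required: ['issue', 'severity', 'recommendation'],
                properties: {
                  issue: { type: 'string' },
                  severity: { type: 'string', enum: ['low', 'medium', 'high'] },
                  recommendation: { type: 'string' },
                },
              },
            },
          },
        },
      },
    },
  });
  return JSON.parse(completion.choices[0].message.content ?? '{}');
}
```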
Async Architecture for Long-Running Jobs
Audits involve multiple slow, failure-prone steps — page fetching, API calls, LLM analysis, PDF generation, and email delivery. Running this synchronously in a serverless function was not viable. The system uses a Postgres-backed job queue with a separate worker process that runs independently of the web app. Jobs retry on transient failures, time out safely, and keep the user experience non-blocking: submit your inputs, close the browser, and get an email when the report is ready.
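The worker itself stays small because retry limits and the 10-minute expiry travel with the job at enqueue time. A sketch of the standalone process (queue name and handler are illustrative; the single-job handler signature matches pg-boss v9, and newer versions deliver job batches):

```ts
import PgBoss from 'pg-boss';

// Hypothetical: the section pipeline sketched earlier.
declare function runFullAudit(input: unknown): Promise<void>;

async function main() {
  const boss = new PgBoss(process.env.DATABASE_URL!);
  boss.on('error', (err) => console.error('queue error:', err));
  await boss.start();

  await boss.work('run-audit', async (job) => {
    // Throwing here marks the job failed and lets pg-boss retry it.
    await runFullAudit(job.data);
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```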
Client-Ready Output From the Start
Reports render as HTML in-app and generate downloadable branded PDFs. Email delivery includes a summary of key findings with links to the full report. The product was designed from the start so the output could go directly to a client — without manual cleanup or reformatting.
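The PDF step is plain Puppeteer: load the same HTML the in-app viewer renders and print it to a PDF for download. A sketch, with illustrative options:

```ts
import puppeteer from 'puppeteer';

// Render the report's HTML (the same markup the in-app viewer shows)
// into a PDF buffer for storage and download.
async function renderReportPdf(reportHtml: string): Promise<Buffer> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(reportHtml, { waitUntil: 'networkidle0' });
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true, // keep brand colors and backgrounds
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
    });
    return Buffer.from(pdf); // newer Puppeteer versions return a Uint8Array
  } finally {
    await browser.close();
  }
}
```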
Tech stack: Next.js, TypeScript, Supabase, Prisma, pg-boss, OpenAI, SendGrid, Puppeteer, Vercel, Railway
5. Key Decisions & Tradeoffs
Fixed Audit Structure Over Open-Ended Analysis
Decision: The product generates a fixed, repeatable five-section audit rather than open-ended SEO analysis. Sections, table formats, and output structure are locked to a canonical reference document.
Tradeoff: Less flexible, but outputs are consistent and auditable. The canonical reference PDF eliminated ambiguity across the UI, report JSON, HTML, and PDF outputs: all four had to match the same source of truth.
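In code, that source of truth becomes one report type that every surface renders from. The field names below are illustrative; the shape is the point: one contract, four renderings.

```ts
type Severity = 'low' | 'medium' | 'high';

interface SectionReport {
  title: string;
  score: number; // 0-100 section score
  findings: { issue: string; severity: Severity; recommendation: string }[];
}

// One shape feeds the UI, the report JSON, the HTML report, and the PDF.
interface AuditReport {
  gbpCompetitorAnalysis: SectionReport;
  gbpRankingFactors: SectionReport;
  onPageSeo: SectionReport;
  schemaMarkup: SectionReport;
  rankabilityVerdict: SectionReport & { actionItems: string[] };
}
```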
No Automated Competitor Discovery
Decision: Users manually provide 3–5 competitor Google Business Profile URLs. No Google Maps scraping.
Tradeoff: More friction at input, but automated Maps scraping was considered both unreliable and a Terms of Service risk. Manual input produces cleaner, more trustworthy data and avoids building fragile scraping infrastructure.
Deterministic Extraction, AI Interpretation
Decision: Data fetching and parsing (HTML, schema, PageSpeed) is handled by code. AI only touches the interpretation and recommendation layer.
Tradeoff: More code to write upfront, but significantly easier to debug. Hallucination risk is contained to the sections where it's hardest to verify — and easier to catch.
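In practice the boundary looks like this: the extraction layer is plain parsing with no model in the loop, and its output is the only thing the AI ever sees. A minimal sketch of the pattern using cheerio (the project's actual extractors are more involved):

```ts
import * as cheerio from 'cheerio';

// Deterministic extraction: no model calls here. The AI receives this
// object, never the raw page, so factual fields stay grounded in code.
function extractOnPageData(html: string) {
  const $ = cheerio.load(html);
  return {
    title: $('title').first().text().trim(),
    metaDescription: $('meta[name="description"]').attr('content') ?? null,
    headings: $('h1, h2').map((_, el) => $(el).text().trim()).get(),
    // Raw JSON-LD blocks for the schema-markup section
    jsonLd: $('script[type="application/ld+json"]')
      .map((_, el) => {
        try {
          return JSON.parse($(el).text());
        } catch {
          return null; // malformed blocks get reported, not guessed at
        }
      })
      .get()
      .filter(Boolean),
  };
}
```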
Separate Worker Process
Decision: The audit worker runs as a standalone Node process on Railway, separate from the Next.js app on Vercel.
Tradeoff: More infrastructure to manage, but necessary for reliability. Audits include multiple slow, failure-prone steps — page fetching, API calls, LLM analysis, PDF generation, email delivery. Moving this into a queue made progress tracking possible and kept long-running jobs out of serverless functions.
Minimal Wizard UX
Decision: A 3-step wizard collecting only the minimum required inputs: website URL, primary keyword, city/state, GBP search phrase, business type, and competitor GBP URLs.
Tradeoff: Power users get fewer controls, but the product feels operational and guided rather than like a technical SEO control panel, which was intentional. The constraint also forced clarity about what the audit actually needed.
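Six inputs are also few enough to validate strictly at the boundary, which keeps every downstream stage's assumptions simple. A sketch using zod (not necessarily the project's validator; field names are illustrative):

```ts
import { z } from 'zod';

// The wizard's six inputs, validated before anything is enqueued.
const AuditInputSchema = z.object({
  websiteUrl: z.string().url(),
  primaryKeyword: z.string().min(2),
  cityState: z.string().min(2), // e.g. "Austin, TX"
  gbpSearchPhrase: z.string().min(2),
  businessType: z.string().min(2),
  competitorGbpUrls: z.array(z.string().url()).min(3).max(5),
});

type AuditInput = z.infer<typeof AuditInputSchema>;
```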
6. What I Learned
The hardest part wasn't the AI — it was the plumbing. Getting async jobs to run reliably, handle partial failures gracefully, and surface useful errors took more iteration than any prompt work.
Prompt engineering for structured output is a different discipline than conversational prompting. When the AI needs to produce parseable JSON that maps to a specific schema, every ambiguity in the prompt shows up as a broken report. Precision matters more than creativity.
The canonical PDF decision was more important than it seemed at the time. By anchoring the product to a fixed methodology upfront, I avoided a class of problems that come from letting AI outputs drift across runs. That reframing — "this implements a defined audit methodology" rather than "this generates SEO recommendations" — shaped every downstream decision.
This project also clarified what "AI-native" means in practice: not replacing the audit entirely, but compressing hours of manual work into minutes while keeping a human in the loop for final review and delivery.
7. Outcome & Next Steps
The core pipeline works end-to-end: user inputs → async job → multi-section AI analysis → HTML report → PDF generation → email delivery. This validated several of the core ideas behind the project:
- Structured AI output is workable when the model is guided by a fixed audit methodology rather than left to generate freely.
- The async orchestration and report delivery pattern works in practice: jobs run reliably, partial failures are contained, and the final output arrives as a usable client-facing document.
- System design mattered more than prompt cleverness. The decisions that had the biggest impact — the canonical audit structure, the deterministic/AI boundary, and the separate worker process — were architectural, not prompt-engineering tricks.
Two known issues remain in the backlog: the rankability section falls back to a generic response when SERP comparison data is unavailable, and the report does not always reflect the exact number of competitors selected during setup. Both are scoped and documented. They are implementation gaps, not architectural blockers.
The next phase is focused on increasing audit value and improving presentation. That includes expanding the analysis with domain authority, backlink health, and local map pack visibility, while redesigning the interface to better support a client-facing experience. The current version was built to validate the workflow and architecture first; the next version will focus on usability, trust, and presentation.
8. What This Proves
This investigation was not just about building an SEO audit tool. It was a test of whether I could take a manual service workflow and turn it into a structured, reliable product with AI in the right places.
What it showed:
- I can take a loosely defined service and turn it into a repeatable product with clear inputs, structured outputs, and a consistent quality bar.
- I know where AI adds value and where it needs constraints. In this system, AI handles interpretation and recommendations, while data extraction stays deterministic to reduce hallucination risk and improve debuggability.
- I can design for operational realities — retries, partial failures, async processing, report formatting, and delivery — not just the AI layer.
- I think in systems, user experience, and reliability, not just prompts.
9. Links & Resources
- Live App — The working audit tool, currently in development.
- Canonical Audit Methodology — The five-section audit structure adapted from an existing SEO framework. Defines the sections, table formats, and output structure the entire system is built against. (Available on request)
- Codex Prompt v2.0 — The full product specification covering architecture, data model, job flow, report structure, and UX requirements. Refined collaboratively with Claude Opus 4.5.
- Project Documentation — Architecture notes, report schema, and product strategy.