Investigation 02
AI Email Context
Exploring how local project knowledge can give AI enough context to draft clearer, lower-effort replies without manual prompt-stuffing.
Scheduled AI Agent Modes
Five specialized modes — each triggered on schedule or on demand — working from the same Notion context layer.
Draft
Context-aware email reply drafting with tone matching
Inbox Scan
Identifies emails needing response with priority scoring
Daily Briefing
Morning summary of pending items and action needed
Decision Extraction
Pulls commitments and decisions from email threads
Resource Scanner
Indexes links and attachments into Notion knowledge base
1. The Problem
AI writing tools are impressive until you try to use them for your actual work.
Ask a general AI to draft a client email, and you get something competent but hollow. It doesn’t know who you’re writing to, what you agreed on last week, how you usually open messages, or which project this is referencing. You spend more time editing than you would have spent writing from scratch.
The obvious fix — dumping all your context into a prompt — hits a ceiling quickly. You can’t paste your entire client history into every email draft. And even if you could, it would be expensive, slow, and fragile.
The question this investigation set out to answer: can a structured context layer make AI-drafted emails reliably good, for real work, without manual prompt-stuffing every time?
2. What I Built
AI Email Context is an AI-assisted email management system that drafts contextual replies, scans for emails needing attention, and generates daily briefings — using a Notion workspace as its memory.
Instead of prompting the AI with raw context, the system retrieves structured data at runtime:
Contacts Database
One record per person, with communication history, contact type, and relationship notes.
Tone Profile
A written style guide the AI reads before drafting, not a set of rules but a description of how I actually write.
Decision Log
A structured record of key decisions across projects, so the AI can reference what was agreed without me restating it.
Email Examples
Real sent emails used as style reference.
Skill Log
A record of every draft generated, for audit trail and iteration.
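The five sources can be pictured as a simple schema. This is a rough sketch with illustrative field names, not the actual data model; in the real system each of these is a Notion database or page, edited by hand and fetched at runtime.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the context layer as plain records.
@dataclass
class Contact:
    name: str
    contact_type: str          # e.g. "client", "collaborator"
    history: list[str] = field(default_factory=list)
    notes: str = ""

@dataclass
class ContextLayer:
    contacts: list[Contact]
    tone_profile: str          # prose style guide, not formatting rules
    decision_log: list[str]    # one entry per agreed decision
    email_examples: list[str]  # real sent emails, few-shot style refs
    skill_log: list[str]       # one entry per generated draft

layer = ContextLayer(
    contacts=[Contact("William", "client", notes="prefers bullet points")],
    tone_profile="Warm opener, direct body, action-oriented close.",
    decision_log=["Opt-out clause added after discovery phase."],
    email_examples=[],
    skill_log=[],
)
```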
3. How It Works
The system runs as a scheduled AI agent, with five modes: Draft, Inbox Scan, Daily Briefing, Decision Extraction, and Resource Scanner.
When a scan or draft runs, the agent:
- Retrieves active contacts from Notion
- Filters emails against that contact list — ignoring promotional, automated, and already-replied threads
- Loads the relevant context (tone profile, decision log entries, relationship notes) for each email that needs a response
- Drafts a reply and saves it to Gmail as a draft in the correct thread
- Logs the run, including which context sources were used
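The filtering step above can be sketched as a simple predicate. The names here (`Email`, `is_actionable`) are hypothetical stand-ins for the real implementation; the point is that anything outside the active contact list, or anything automated or already handled, never reaches the drafting stage.

```python
from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    is_automated: bool
    already_replied: bool

def is_actionable(email: Email, active_contacts: set[str]) -> bool:
    """Keep only emails from known contacts that still need a reply."""
    if email.sender not in active_contacts:
        return False          # unknown, promotional, or bulk sender
    if email.is_automated or email.already_replied:
        return False          # no response needed
    return True

# Usage: filter a scanned inbox against the Notion contact list.
contacts = {"derek@agency.example", "william@ridgeline.example"}
inbox = [
    Email("derek@agency.example", False, False),
    Email("noreply@store.example", True, False),
    Email("william@ridgeline.example", False, True),
]
needs_reply = [e for e in inbox if is_actionable(e, contacts)]
```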
Runtime Retrieval Process (flow): Retrieve Contacts (from Notion DB) → Filter Emails (ignore promo/auto) → Load Context (tone, decisions, notes) → Draft Reply (with [VERIFY] tags) → Save to Gmail (in correct thread) → Log Run (record used context).
The [VERIFY: ...] tag system flags anything the agent isn’t certain about — rather than hallucinating a detail, it marks it for human review inline in the draft.
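A tag convention like this is easy to surface mechanically. A minimal sketch, assuming the `[VERIFY: ...]` format shown above; the helper name is hypothetical:

```python
import re

# Pull every [VERIFY: ...] flag out of a draft so a human can review
# each one before the email is sent.
VERIFY_RE = re.compile(r"\[VERIFY:\s*(.*?)\]")

def verify_flags(draft: str) -> list[str]:
    return VERIFY_RE.findall(draft)

draft = (
    "Hi Derek,\n"
    "Per our call on [VERIFY: date of last call], the budget cap is "
    "[VERIFY: confirm $40k figure].\n"
)
flags = verify_flags(draft)
```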
4. What the Testing Showed
Tone fidelity improved significantly with real context
Without the context layer, AI drafts were correct but generic. With the Notion-backed Tone Profile, drafts matched the actual voice: warm opener, bold section headers for multi-part messages, action-oriented close. The style guide works because it describes personality, not just formatting rules.
Contextual accuracy removed most of the editing work
Without project context, AI drafts are vague and require heavy editing to be usable. With the Decision Log and Contact Notes, drafts referenced the right project names, recent decisions, and relationship context — without me restating any of it. External first-user testing produced drafts requiring minimal editing, before any calibration period.
Context vs. No Context — Same Email, Different Results
A real comparison showing how structured context transforms a generic AI response into a highly specific, ready-to-send draft.
The incoming email from Derek:

Hey Ridgeline team,
Thanks for your time today reviewing our statement of work. I’ve attached it along with our MSA for your records.
As discussed, I’ve added a clause giving you the option to opt out after the discovery phase if we can’t deliver within the estimated budget plus contingency. The only scenario where costs would exceed the estimate is if the integration scope expands significantly beyond what’s currently planned. We don’t expect that, but you have the protection to re-scope or walk away.
Here are a few accounts where we’ve done similar POS integrations:
- Summit Sports — integrated with Klaviyo and Shopify Plus
- Belair Group — integrated with Adobe Commerce and Salesforce SFMC
- Finch & Co — integrated with BigCommerce and Klaviyo
Let me know how your internal discussions go and if you have any questions on the SOW.
Thanks, Derek
The reply drafted without context:

Hi Derek,
Thanks for sending this over. I’ll review the SOW and MSA and get back to you with any questions.
The opt-out clause sounds reasonable, and it’s good to know you’ve worked on similar integrations before. I’ll share this with the team and we’ll discuss internally.
Appreciate the follow-up — I’ll be in touch soon.
Best, William
The reply drafted with the structured context layer:

Hi Derek,
Thanks for turning this around quickly — the opt-out clause after discovery is exactly the kind of protection Nadia flagged as a priority in our last internal review, so that’s good to see.
I’ve shared the SOW and MSA with the team. A few things we’ll be working through on our side:
- Reference check on the POS integrations — the Summit Sports and Belair examples are helpful. I’ll ask whether their Oracle Retail partner was Meridian Systems, since that’s who manages our POS environment and we’d want to confirm they’ve collaborated before.
- Scope clarity on the middleware layer — our current Klaviyo integration runs through a custom connector that the Brightpath team built last year. We need to confirm whether your approach replaces that or sits alongside it.
- Budget alignment — I’ll pull the numbers into the comparison spreadsheet alongside the Maplewood proposal so we can present both options to Nadia and Johns at Thursday’s sync.
Expect to have questions back to you by end of week. If anything is time-sensitive on your side, let me know.
William
The first real bug: silent data loss at the retrieval layer
During the scheduled morning scan, the system used a semantic search query to fetch active contacts from Notion. It returned 10 of 18 contacts — silently, with no error. Eight contacts were simply excluded from the scan, including several active clients.
This is a platform limitation in Notion’s search API, not a logic error. The fix required replacing semantic search with a direct database fetch, adding count verification, and building a fallback protocol. The new approach consistently retrieves all 18 contacts and logs the result.
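The shape of the fix can be sketched as follows. The fetch function is a hypothetical stand-in for a direct Notion database query; the point is that a short result raises loudly instead of being silently accepted.

```python
# Count verification at the retrieval layer: a partial result is an
# error, never a success.
class PartialRetrievalError(RuntimeError):
    pass

def fetch_contacts(direct_fetch, expected_count: int) -> list[dict]:
    contacts = direct_fetch()
    if len(contacts) < expected_count:
        raise PartialRetrievalError(
            f"got {len(contacts)} of {expected_count} contacts"
        )
    return contacts

# Usage: a fetch returning 10 of 18 records fails loudly, which is the
# failure mode the semantic-search version hid.
try:
    fetch_contacts(lambda: [{"name": f"c{i}"} for i in range(10)], 18)
    problem = None
except PartialRetrievalError as e:
    problem = str(e)
```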
The broader lesson: for any AI system that depends on retrieval, silent partial retrieval is a critical failure mode. It doesn’t break loudly — it just quietly misses things.
External testing surfaced setup UX gaps — and confirmed the system is resilient
A colleague installed the system from scratch and ran it across several days, including intentional stress tests. The functionality held up well, but early runs surfaced real setup gaps:
- The “always run” permission prompts weren’t clearly flagged as something the user needed to approve during setup. A guided test run step was added so users see and approve these prompts before enabling scheduled tasks.
- The task monitor sidebar is collapsed by default. Added an explicit step to open it during setup so users can see what’s running.
- Gmail drafts weren’t landing in the correct thread — a threading bug where the draft was created without passing the thread identifier. Fixed and verified in the smoke test checklist.
- The [VERIFY: ...] tags weren’t prominent enough — these are flags for human review, and they need to be visible enough that the user actually sees them before sending.
The stress test that stood out: the Gmail connector was intentionally disconnected before a morning run to see what would happen. The system didn’t fail silently. It notified via the briefing that Gmail was unavailable, still produced two draft responses from partial inbox access, surfaced them in the briefing for copy/paste review, and flagged connector errors for attention.
The briefing also proactively flagged an unknown sender as someone worth adding to the Contacts DB — a small detail, but it shows the system doing useful triage beyond just drafting replies.
This feedback round accelerated quality significantly: a day-one external install went from “works but confusing” to “smooth” in a single iteration, and then held up to intentional failure testing. That is a meaningful signal.
Token efficiency required deliberate design
Token Efficiency & Optimization
Model selection and mode-aware context loading significantly reduced token consumption without degrading output quality.
- Daily Pro plan usage: down 64%. Switching from Opus to Sonnet and optimizing context loading reduced usage of the 5-hour window.
- Briefing mode context size: down 70% in tokens. Mode-aware loading pulls in only what each mode needs instead of all context for every mode.
The more interesting signal came from continued testing. After switching from Opus to Sonnet, token usage dropped from ~25% of the Pro plan’s 5-hour window on day one to ~16% on day two, and ~9% by day three — without any change in output quality. Model selection turns out to be meaningful configuration, not just a preference.
The initial version loaded all context for every mode regardless of whether the mode needed it. A briefing run was consuming as much context as a full draft run.
Mode-aware optimization brought briefing mode from approximately 3,000-5,000 tokens down to 800-1,500. A further optimization — caching active contacts in a single reference page rather than fetching individual records — reduced the contact retrieval step from multiple round-trips to one fetch. Time-bounded queries eliminated repeated processing of already-reviewed emails.
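Mode-aware loading can be sketched as a declaration of which sources each mode needs. The source names and mode mapping below are illustrative, not the system's actual configuration:

```python
# Each mode declares only the context sources it needs, so a briefing
# run no longer pays for draft-quality context.
CONTEXT_SOURCES = {
    "draft":      ["contacts", "tone_profile", "decision_log", "email_examples"],
    "briefing":   ["contacts"],   # summary only: no style context needed
    "inbox_scan": ["contacts"],
}

def load_context(mode: str, loaders: dict) -> dict:
    """Fetch only the sources the requested mode declares."""
    return {name: loaders[name]() for name in CONTEXT_SOURCES[mode]}

# Usage: the loaders stand in for Notion fetches.
loaders = {
    "contacts":       lambda: ["William", "Derek"],
    "tone_profile":   lambda: "Warm opener, action-oriented close.",
    "decision_log":   lambda: [],
    "email_examples": lambda: [],
}
briefing_ctx = load_context("briefing", loaders)
draft_ctx = load_context("draft", loaders)
```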
System stability required staggering scheduled tasks
Running the morning briefing and inbox scan as a single chained task caused the app to freeze when connectors were slow on startup. The fix: stagger them 30 minutes apart. Each runs independently and releases connector resources before the next begins.
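The staggering logic amounts to offsetting each mode's start time. A minimal sketch, with a hypothetical helper and illustrative times; the real system uses the platform's scheduled-task settings rather than code:

```python
from datetime import datetime, timedelta

def staggered_schedule(first_run: datetime, modes: list[str],
                       gap_minutes: int = 30) -> dict[str, datetime]:
    """Give each mode its own start time, offset so connector
    resources are released before the next run begins."""
    return {
        mode: first_run + timedelta(minutes=gap_minutes * i)
        for i, mode in enumerate(modes)
    }

schedule = staggered_schedule(datetime(2025, 1, 6, 7, 0),
                              ["briefing", "inbox_scan"])
```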
Connector latency on cold starts is a genuine failure mode for production scheduled AI agents, not an edge case.
5. What This Approach Gets Right
- No custom infrastructure. Notion’s databases, pages, and relations are enough to build a structured retrieval layer. No vector database, no embeddings pipeline.
- Human-editable context. Contacts, decisions, and tone profile can be updated directly in Notion without touching code. The AI picks up changes on the next run.
- Structured retrieval enforces consistency. Database schemas mean the agent always knows which fields to expect.
- Incremental improvement is visible. The Skill Log creates an audit trail across runs, making it possible to see where draft quality improved and where gaps remain.
6. Where It Hits Limits
The context layer works well within Notion. But it doesn’t know that “John” in a Gmail thread is the same person as a contact in the database, or that a Google Drive file relates to a specific client unless the file name matches a known prefix. Resolution across sources — email, Drive, calendar, project management — requires manual maintenance or naming conventions as workarounds.
For a single-user system managing a known set of contacts, this is manageable. At scale, or across a team, the manual maintenance burden grows quickly.
7. Current Status
Email Brain is in active daily use at v3.5. Scheduled tasks run every morning. The system is documented and installable by others — external install testing confirmed the setup guide works end-to-end.
GitHub release is the next milestone.
8. Takeaways
- Structured context meaningfully improves AI output quality. The difference between a generic AI draft and a contextually accurate one isn’t better prompting — it’s better retrieval.
- Silent failures are the dangerous ones. Partial contact retrieval looked like success until it was audited. Logging and count verification aren’t optional for production AI systems.
- External testing finds what internal testing misses. Setup gaps were invisible to me because I already knew how the system worked. First-user testing revealed the actual experience.
- Token efficiency is a design constraint, not an afterthought. An agent that loads everything on every run burns context budget quickly. Mode-aware optimization is worth building early — and model selection matters more than expected. Switching from Opus to Sonnet cut usage by nearly two-thirds over three days with no quality loss.
- Resilience should be designed in, not bolted on. A system that fails gracefully — notifying the user, preserving what it can, flagging what needs attention — is meaningfully more useful than one that just stops. The Gmail disconnection test made this concrete.
9. Links & Resources
- Email Brain GitHub Repository (GitHub)
- Context Architecture Documentation (Notion)
- External Testing Feedback Log (Notion)