ai-agents · Signal 75/100

AI intel digest

You can't just one shot it — Mehedi Hassan, Granola


2026-05-10 · 25 min read · 5,054 words · 10 facts · 0 assumptions
Start here

Executive summary

Mehedi Hassan, a product engineer at Granola (mis-transcribed as "Cronulla" in the transcript), describes the gap between AI features working in a playground and working in production. He demonstrates Granola's meeting transcription and note-taking app, then explains how generic AI chat features fail real users. The core argument is that shipping production AI requires building internal observability tools, adapting desktop apps for web-based CI/CD, and creating tight feedback loops rather than trying to "one-shot" solutions with better prompts.

Sources mentioned

- Granola (mis-transcribed as "Cronulla" in the transcript): meeting notes app with real-time transcription and AI features. The speaker is a product engineer there.
- LLM providers (unnamed): shipped an overnight update that degraded web search results. Described as "the labs."
- CloudWatch: referenced as the complex alternative that non-engineers should not need to use for debugging.
- OpenTelemetry: mentioned as an alternative approach to custom tracing.
- AI SDK: referenced as something Granola wraps for their tracing implementation.
- Electron: desktop framework Granola uses; its main-process/renderer-process architecture is discussed.
- Tauri: evaluated as an Electron alternative; not shipped due to lack of performance gains.
- Cursor: AI coding tool Granola uses to automatically test PRs and upload screenshots.
- Figma: referenced negatively as insufficient for product validation compared to actual usage.

Verdict

This video carries unique signal for AI product engineers and technical leaders shipping LLM features to production. Unlike typical AI talks that focus on model capabilities or prompt engineering, this is an honest post-mortem of real production failures (cost explosions, provider instability, role mismatches) and the concrete infrastructure responses: custom tracing, Electron-to-web refactoring, Cursor automation. The speaker does not present theoretical frameworks; he describes exactly what broke, what it cost, and what Granola built to fix it. For anyone trying to close the "playground to production" gap, this is a rare look at the machinery of iteration rather than the magic of one-shot demos. Worth watching.

Count: 10 facts, 0 assumptions, 0 demonstrations (the demo shown was of the product, not of the claims about production failures; those are stated as experience, not demonstrated live).

Signal density: 75%. Most of the talk is concrete experience and specific technical decisions, with some filler around the product demo and the opening caveat about not going deep into LLMs.

What matters

Signal points

  1. Production AI features fail predictably: generic chatbots give wrong answers, web search is too slow and expensive, and one prompt cannot serve sales, engineering, and HR simultaneously.

  2. Web search via LLM tool calls cost Granola ~10p per chat and blew up context windows; an overnight provider update silently degraded results with no recourse.

  3. Granola built custom internal tracing with a UI for non-engineers (product, CX, and data teams), giving full visibility into tool calls, reasoning traces, and cost without CloudWatch complexity.

  4. Granola refactored their Electron app into a web-shell frontend, enabling PR preview links and parallel testing; Cursor now auto-tests PRs and uploads screenshots.

  5. The core thesis: the gap between "works in the playground" and "works in production" is closed by feedback-loop infrastructure, not by better one-shot prompts.

  6. Granola evaluated Tauri as an Electron replacement but abandoned it due to lack of meaningful performance gains.

Interpretation

Key ideas

1

The "one-shot" approach to AI features is seductive but insufficient for production

Why: Speaker demonstrates how a simple chat feature fails immediately in production with real users (wrong answers, slow search, misinterpreted queries)

Implication: Production AI requires systems for iteration and feedback, not just better initial prompts

2

Web search is not a simple tool call — it is a complex subsystem requiring dedicated engineering

Why: Speaker cites cost explosion (10p/chat), context window blowup, and provider instability as evidence that "billion-dollar companies who do web search" exist for a reason

Implication: Teams should treat search infrastructure as a first-class engineering problem, not an LLM add-on
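
To make the cost point concrete, here is a back-of-envelope sketch of how 10p per chat compounds at scale. The per-chat figure comes from the talk; the volume and the annualization are illustrative assumptions, not Granola's real numbers.

```typescript
// Back-of-envelope: what "10 pence per chat" means at scale.
// Only the per-chat cost is from the talk; the volumes are assumed.
const costPerChatGBP = 0.1; // ~10p per chat, as cited
const chatsPerDay = 1_000_000; // assumed: a product with millions of users

const dailyCostGBP = costPerChatGBP * chatsPerDay;
const annualCostGBP = dailyCostGBP * 365;

console.log("Daily: £" + dailyCostGBP); // Daily: £100000
console.log("Annual: £" + annualCostGBP); // Annual: £36500000
```

At these assumed volumes, a "one line of code" web-search tool call turns into a seven-figure annual line item, which is why the talk treats search as a first-class subsystem.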

3

A single prompt cannot serve diverse user roles with different information needs

Why: Sales wants deal focus, engineering wants action items/blockers/tickets, HR wants something else entirely

Implication: AI products need role-aware personalization or multiple specialized outputs, not generic responses

4

Internal observability must be designed for the whole organization, not just engineers

Why: Granola built a UI that product, data, and CX teams can use without CloudWatch queries; the founder himself traces agent loops

Implication: Democratizing AI debugging accelerates iteration and reduces dependency on engineering bottlenecks
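
The talk does not show Granola's tracing code, so the following is only a minimal sketch of the idea: wrap each tool call in an agent loop so that inputs, outputs, latency, and cost accumulate into one trace object that an internal UI could render for non-engineers. Every name here (`Trace`, `wrap`, `TraceEvent`) is hypothetical.

```typescript
// Hypothetical tracing wrapper: records every tool call in an agent loop
// so a run can be replayed "front to back" in an internal UI.
type TraceEvent = {
  tool: string;
  input: unknown;
  output: unknown;
  ms: number;
  costGBP: number;
};

class Trace {
  events: TraceEvent[] = [];

  // Wrap any async tool so each invocation is logged to the trace.
  wrap<I, O>(tool: string, fn: (input: I) => Promise<O>, costGBP = 0) {
    return async (input: I): Promise<O> => {
      const start = Date.now();
      const output = await fn(input);
      this.events.push({ tool, input, output, ms: Date.now() - start, costGBP });
      return output;
    };
  }

  // Per-run cost rollup, the kind of number a product or CX person
  // could read off a trace view without touching CloudWatch.
  totalCostGBP(): number {
    return this.events.reduce((sum, e) => sum + e.costGBP, 0);
  }
}

// Usage: a fake "web search" tool wrapped for tracing.
const trace = new Trace();
const search = trace.wrap("web_search", async (q: string) => `results for ${q}`, 0.02);

search("granola meeting notes").then(() => {
  console.log(trace.events.length, trace.totalCostGBP());
});
```

In practice Granola describes wrapping the AI SDK rather than raw functions, but the shape is the same: interception at the tool-call boundary, with the trace as the unit the whole organization debugs against.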

5

Desktop apps can be re-architected to gain web-app development velocity

Why: Granola abstracted Electron IPC APIs to fall back to web standards, making the renderer environment-agnostic

Implication: Teams shipping desktop AI features can adopt CI/CD preview links and parallel testing without abandoning their platform
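
The talk describes the refactor only at a high level (abstract the Electron IPC APIs, fall back to web standards), so this is a hedged sketch of what such a platform layer might look like. All names (`PlatformAPI`, `ElectronBridge`, `electronBridge`, the `settings:*` channels) are invented for illustration.

```typescript
// Hypothetical environment-agnostic platform layer: the renderer calls one
// API, which uses Electron IPC when a preload bridge is present and falls
// back to a web-standard implementation in a browser or CI preview build.
interface PlatformAPI {
  readSetting(key: string): Promise<string | null>;
  writeSetting(key: string, value: string): Promise<void>;
}

// Shape a preload script is assumed to expose on the global object.
type ElectronBridge = {
  invoke(channel: string, ...args: unknown[]): Promise<unknown>;
};

class ElectronPlatform implements PlatformAPI {
  constructor(private bridge: ElectronBridge) {}
  readSetting(key: string) {
    return this.bridge.invoke("settings:read", key) as Promise<string | null>;
  }
  async writeSetting(key: string, value: string) {
    await this.bridge.invoke("settings:write", key, value);
  }
}

class WebPlatform implements PlatformAPI {
  private store = new Map<string, string>(); // stand-in for localStorage
  async readSetting(key: string) { return this.store.get(key) ?? null; }
  async writeSetting(key: string, value: string) { this.store.set(key, value); }
}

// The renderer picks an implementation once; all other code is shared,
// which is what makes PR preview links for a desktop app possible.
function createPlatform(): PlatformAPI {
  const bridge = (globalThis as any).electronBridge as ElectronBridge | undefined;
  return bridge ? new ElectronPlatform(bridge) : new WebPlatform();
}
```

Once the renderer depends only on `PlatformAPI`, the same bundle can run inside the Electron shell and as a deployed web app, so CI can hand out preview links per PR.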

6

LLM self-verification can be integrated into the development workflow

Why: Cursor automatically tests PRs and uploads screenshots, which "speeds up the testing so much more"

Implication: AI-assisted QA is becoming a practical reality for product teams, not just a research curiosity

7

The core challenge of AI product engineering is closing the feedback loop, not prompting better

Why: Speaker frames the entire talk around making iteration feel "like playing a tennis game with LLM" rather than hoping a black box works

Implication: Infrastructure for rapid experimentation and observability matters more than any single technique

Evidence

Key facts

Granola is a meeting notes app with real-time transcription that combines system audio and microphone input

HIGH

Evidence: we're a meeting notes app where we sit on your doc... it has access to your system transcription system audio as well as your microphone audio, which means we have real-time transcription

Granola's chat feature allows users to ask questions about meetings and across shared context

HIGH

Evidence: You can ask questions about a meeting that you just had and across a bunch of different meetings or like shared context as well

Web search integration cost Granola approximately 10 pence per chat at one point

HIGH

Evidence: each chat could be costing you like 10 pence

An LLM provider shipped an overnight update that silently degraded Granola's web search results

HIGH

Evidence: we were using a model for a good amount of time. And then overnight, they shipped an update and for some reason web search degraded and it was completely out of our control

Granola built custom internal tracing tools with a UI designed for non-engineers (product, data, CX teams)

HIGH

Evidence: we started building our own tracing tools... the UI is built to like serve our our employees internally, not just like engineers, but also product, data, and like CX, and everyone

Granola's founder personally uses the tracing tool to follow agent loops front to back

HIGH

Evidence: our founder literally goes into like the details like following the agent loop completely front to back to figure out exactly what went wrong

Granola refactored their Electron app so the frontend runs as a web app, enabling PR preview links

HIGH

Evidence: we took our Electron app and we turned the front-end of the Electron app into a web shell. And this was deployed online. So, now our CI, whenever we open a PR, we get a preview link


Granola uses Cursor to automatically test PRs and upload screenshots

HIGH

Evidence: once we open a PR, Cursor goes and tests it, uploads a screenshot into our PRs

Granola evaluated Tauri as an Electron replacement but did not ship it due to lack of performance gains

HIGH

Evidence: We've tried Tauri before as well and we didn't really see massive performance gains, which we which is what we care about the most... haven't shipped it

The speaker has been coding since jQuery was popular and has seen React change frontend engineering

HIGH

Evidence: I've been, you know, coding since jQuery was cool. I've seen React kind of change front-end engineering

Memorable lines

Quotes

- "One-shotting is seductive. One line of code for web search. One prompt to serve every user. One deploy and you're done." — Video description, echoed in talk theme
- "The token usage and token cost can bubble up quite a lot... each chat could be costing you like 10 pence. Obviously, at scale when you have millions of users, this is not really feasible."
- "Overnight, they shipped an update and for some reason web search degraded and it was completely out of our control."
- "One prompt can't generally serve everyone... LLMs are stubborn and we need to figure out how to get inside them and make them work how we want it to work."
- "The answer isn't to one-shot better. It's about figuring out how you can make that feedback loop where it kind of feels like playing a tennis game with LLM."
- "Our founder literally goes into like the details like following the agent loop completely front to back to figure out exactly what went wrong."