ai-agents · Signal 78/100

AI intel digest

Agentic Search for Context Engineering — Leonie Monigatti, Elastic

Leonie Monigatti (Elastic) argues that context engineering is roughly 80% agentic search — the mechanics of how agents decide what to pull from files, databases, memory, and the web.

2026-05-08 · 23 min read · 4,549 words · 10 facts · 0 assumptions
Start here

Executive summary

Leonie Monigatti (Elastic) argues that context engineering is roughly 80% agentic search — the mechanics of how agents decide what to pull from files, databases, memory, and the web. She demonstrates four search interfaces (semantic search, general-purpose database queries via ESQL, shell-tool file navigation, and a custom CLI integration with Gina Grap), shows where each breaks, and offers a framework for curating a retrieval stack from "low floor" specialized tools and "high ceiling" general-purpose tools. The core shift from RAG to agentic RAG is replacing fixed retrieval pipelines with agent-controlled search tools.

Sources mentioned

Elastic / Elasticsearch — Monigatti's employer; context source for the database demos
LangChain — agent framework used throughout the demos; provides tool decorators, the shell tool, and skill-loading boilerplate
OpenAI (GPT-4o Nano, GPT-4o Mini) — LLMs used in the demos
Vercel — blog post "Testing if bash is all you need", cited on hybrid-agent accuracy
Gina Grap — custom CLI for semantic search, demonstrated as a shell-tool enhancement
LlamaIndex — mentioned for its "sam tools" as a semantic-search alternative
LightOn — mentioned for its "coal grap", based on multi-vector embeddings
ESQL (Elasticsearch Query Language) — pipe-based query language demonstrated as a general-purpose database interface
OpenCode — mentioned as using an "exec tool" equivalent to the shell/bash tool

Verdict

This video carries unique signal for practitioners building retrieval systems for agents. Unlike generic RAG tutorials, it focuses on the underexplored layer between context sources and the context window — the search-tool interface itself. The concrete demonstrations of failure modes (the wrong wildcard, semantic search returning Gemma for GDPA, synonym chaining in the shell) are more valuable than abstract architecture advice. The "low floor, high ceiling" framework and the Vercel hybrid-agent verification finding provide actionable mental models. Worth watching for anyone whose agents suffer from brittle retrieval, though the Elasticsearch-specific ESQL sections are less transferable than the general principles. Signal density is high for builder audiences, lower for researchers or pure theorists.

Count: 10 facts, 0 assumptions, 6 demonstrations (semantic-search failure, ESQL query generation, skill loading, shell-tool file navigation, Gina Grap integration, hybrid-agent accuracy citation). Signal density: 78%.

What matters

Signal points

1. The three failure modes in agentic search: the agent calls no tool, calls the wrong tool, or generates wrong parameters — and tool descriptions are the primary lever for fixing all three (a sketch of such a description follows this list)

2. Semantic search alone is brittle: searching "GDPA" returned results about "Gemma" due to tokenization similarity, demonstrating that vector search without keyword fallback fails on exact-match needs

3. General-purpose query tools (ESQL/SQL) raise the agent's ceiling but also its error rate; the ESQL wildcard example (% vs *) shows parameter complexity is a real failure mode

4. Shell tools are versatile but risky — no safeguards in LangChain, and agents compensate for lack of semantic search by brute-force synonym chaining

5. Agent skills with progressive disclosure solve the "band-aid documentation" problem: instead of patching system prompts per error, load structured skill docs on demand

6. Vercel's finding that hybrid agents (database + shell verification) outperformed single-tool agents suggests multi-tool verification patterns are underexplored

7. The "start general, log behavior, then specialize" workflow is a practical antidote to building the wrong tools upfront

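The fix lever in point 1 is concrete enough to sketch. Below is a minimal, hypothetical LangChain tool whose docstring encodes a purpose, a trigger condition, and an explicit zero-results contract; the search_sessions name, the retrieve stub, and the NO_RESULTS convention are illustrative, not from the talk:

```python
from langchain_core.tools import tool


def retrieve(query: str) -> list[str]:
    # Placeholder for the real retrieval call, e.g. a vector search
    # against the pre-chunked conference sessions in Elasticsearch.
    return []


@tool
def search_sessions(query: str) -> str:
    """Semantic search over conference session chunks.

    Use this for questions about what a session covers or which sessions
    discuss a topic. Do not use it for exact string lookups.
    Returns the top matching chunks, or NO_RESULTS if nothing scored;
    broaden or rephrase the query before giving up.
    """
    results = retrieve(query)
    return "\n\n".join(results) if results else "NO_RESULTS"
```

A docstring like this addresses all three failure modes at once: the first sentence tells the agent the tool exists, the trigger and anti-trigger lines steer it away from the wrong tool, and the return contract answers Monigatti's "is zero results a failure mode?" question explicitly.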

Interpretation

Key ideas

1. Context engineering is ~80% agentic search

Why: The critical step is not curation within the context window but the search tools that decide what enters it from diverse sources

Implication: Teams over-invest in prompt engineering and under-invest in retrieval tool design

2. The "low floor, high ceiling" framework for tool stacks

Why: Specialized tools (simple parameters, low error rate) give agents easy wins; general-purpose tools (shell, query languages) handle edge cases but require more iterations

Implication: Effective agents need both types, not a single silver bullet
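A sketch of what the pairing looks like in practice, using langgraph's prebuilt ReAct agent as an assumption (the workshop used LangChain; the model id and the stub tool bodies are illustrative):

```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent  # assumes langgraph is installed


@tool
def search_sessions(query: str) -> str:
    """Low floor: one parameter, low error rate, covers the common case."""
    return "stub: top chunks matching " + query


@tool
def run_esql(esql_query: str) -> str:
    """High ceiling: arbitrary ESQL against the index; powerful but easy to get wrong."""
    return "stub: query results"


# Neither tool alone is a silver bullet: the agent reaches for the easy
# specialized tool first and falls back to the general one at the edges.
agent = create_react_agent("openai:gpt-4o-mini", tools=[search_sessions, run_esql])
```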

3. Progressive disclosure via agent skills

Why: Loading full documentation into system prompts wastes context; skills inject minimal metadata upfront and load full docs only when triggered

Implication: This is a scalable pattern for equipping agents with large tool APIs without context bloat
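A framework-agnostic sketch of the pattern: only each skill's name and one-line trigger go into the system prompt, and a load_skill tool pulls the full document on demand. The skills/ directory layout and SKILL.md file name are assumptions:

```python
from pathlib import Path

from langchain_core.tools import tool

SKILLS_DIR = Path("skills")  # one subfolder per skill, each with a SKILL.md


def skill_index() -> str:
    """Build the minimal per-skill metadata injected into the system prompt."""
    lines = []
    for doc in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        trigger = doc.read_text().splitlines()[0]  # first line = trigger description
        lines.append(f"- {doc.parent.name}: {trigger}")
    return "Available skills:\n" + "\n".join(lines)


@tool
def load_skill(name: str) -> str:
    """Load the full documentation for a skill once its trigger condition is met."""
    doc = SKILLS_DIR / name / "SKILL.md"
    return doc.read_text() if doc.exists() else f"No skill named {name}."
```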

4. Shell tools "cheat" at semantic search by chaining synonyms

Why: When grep fails on semantic queries, agents compensate by running multiple keyword searches (regulate, compliance, GDPR, governance, etc.)

Implication: This works but is inefficient; dedicated semantic search tools (Gina Grap, etc.) are needed for non-exact retrieval
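Reconstructed for illustration, this is roughly the fan-out an agent falls back to when a single grep misses; the sessions/ path and the synonym list are hypothetical:

```python
import subprocess

# One semantic question becomes four keyword searches.
synonyms = ["GDPR", "regulate", "compliance", "governance"]
hits: dict[str, list[str]] = {}
for term in synonyms:
    # -r recurse, -i case-insensitive, -l print matching file names only
    out = subprocess.run(["grep", "-ril", term, "sessions/"],
                         capture_output=True, text=True)
    hits[term] = out.stdout.splitlines()
```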

5. Error handling in search tools enables agent self-correction

Why: Returning query errors to the agent (rather than crashing) lets it rewrite and retry

Implication: Robust tool design must treat error responses as part of the agent feedback loop, not as terminal failures
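A sketch of the pattern wrapped around an ESQL tool, assuming the official elasticsearch Python client (endpoint and index are illustrative): catching the API error and returning its message keeps the loop alive so the agent can rewrite the query.

```python
from elasticsearch import ApiError, Elasticsearch
from langchain_core.tools import tool

es = Elasticsearch("http://localhost:9200")  # illustrative endpoint


@tool
def run_esql(query: str) -> str:
    """Run an ESQL query. On a syntax error the error message is returned
    so the caller can correct the query and retry."""
    try:
        resp = es.esql.query(query=query)
        return str(resp["values"])
    except ApiError as exc:
        # Feed the failure back as an observation instead of raising:
        # an error string is recoverable, a crash is terminal.
        return f"Query failed: {exc}. Rewrite the ESQL and try again."
```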

6. Start general, then specialize based on logged behavior

Why: If you don't know query patterns yet, begin with general-purpose tools, log agent behavior, then extract specialized tools from recurring patterns

Implication: This reverses the common "build specialized tools first" approach and reduces premature optimization
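The logging half needs nothing framework-specific: a plain decorator that writes one JSONL record per tool call is enough, and mining that log for recurring query shapes tells you which specialized tools to extract. The file name and record fields here are assumptions:

```python
import functools
import json
import time


def logged(tool_fn):
    """Append every call to a JSONL log so recurring patterns can be mined later."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {"tool": tool_fn.__name__, "args": args,
                  "kwargs": kwargs, "ts": time.time()}
        with open("tool_calls.jsonl", "a") as f:
            f.write(json.dumps(record, default=str) + "\n")
        return tool_fn(*args, **kwargs)
    return wrapper
```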

Evidence

Key facts

Monigatti works at Elastic, the company behind Elasticsearch

HIGH

Evidence: My name is Leonie. I work at Elastic, the company behind Elastic Search.

The workshop uses LangChain as the agent framework

HIGH

Evidence: We're going to be using LangChain for this session just because it wraps a lot of the complexity

The demo uses GPT-4o Nano for simple tasks and switches to GPT-4o Mini for query generation tasks

HIGH

Evidence: I'm using GPT 4o Nano for this demo ... I'm switching from the GPT 4o Nano to the mini because I am now anticipating that writing search queries is a little bit more difficult

The conference session data used in demos is pre-chunked and stored in Elasticsearch

HIGH

Evidence: In the database I have the conference session already chunked

ESQL (Elasticsearch Query Language) is a pipe-based query language that differs from SQL — for example, it uses asterisk (*) not percentage (%) as wildcard

HIGH

Evidence: ESQL doesn't use the percentage sign as a wild card character. And in ESQL, you would use the asterisk
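For concreteness, the shape of the difference (the index and field names are assumptions):

```python
# SQL habit carried over: % is a literal character in ESQL, so this matches nothing.
bad = 'FROM conference_sessions | WHERE title LIKE "%agent%"'

# ESQL wildcard is *, with pipe-separated processing steps.
good = 'FROM conference_sessions | WHERE title LIKE "*agent*" | KEEP title | LIMIT 5'
```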

LangChain's shell tool has no safeguards by default and can delete files

HIGH

Evidence: Using the shell tool can be risky since giving your agent access to a terminal can make it delete files... Also in LangChain it doesn't have any safeguards by default
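Given that, a thin allowlist wrapper is a reasonable first safeguard. A sketch assuming langchain-community's ShellTool, with the read-only command list as an assumption:

```python
from langchain_community.tools import ShellTool  # assumes langchain-community is installed

READ_ONLY = {"ls", "cat", "grep", "find", "head", "wc"}

shell = ShellTool()


def guarded_shell(commands: str) -> str:
    """Run a shell command, rejecting anything outside a read-only allowlist."""
    binary = commands.strip().split()[0]
    if binary not in READ_ONLY:
        return f"Blocked: '{binary}' is not on the read-only allowlist."
    # Note: a real guard must also handle pipes, ';' and '&&' chaining.
    return shell.run({"commands": commands})
```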

Vercel published a blog post titled "Testing if bash is all you need" comparing agents with bash tool vs file search vs database tools

HIGH

Evidence: I think there was a very interesting blog post by Vercel I believe... I think it's called testing if bash is all you need is the title


The hybrid agent (bash tool + database tool) achieved highest accuracy in Vercel's testing because it verified database results with shell commands

HIGH

Evidence: the hybrid agent with the bash tool and the database tool was actually achieving the highest... accuracy because it first... using the database tool and then verifying the results with the shell tool

Monigatti has not personally used sub-agents for search queries

HIGH

Evidence: Unfortunately not. I have not played around with sub-agents yet

OpenCode (from the demo context) uses an "exec tool" equivalent to shell/bash tool

HIGH

Evidence: If you've played around with open cloud, it's called the exec tool

Memorable lines

Quotes

"Context engineering is about 80% agentic search because it's this little box right here." — Leonie Monigatti
"Doing good search is incredibly difficult and that's why we have many different techniques to do search." — Leonie Monigatti
"If you just start with a core purpose, if it works fine, great. But if you add more parameters or more tools and your agent is starting to struggle with calling the right tool, then maybe add some trigger condition." — Leonie Monigatti on tool descriptions
"Is returning zero search results actually a valid response or is it a failure mode?" — Leonie Monigatti
"For exact matches you probably still want to use grep. And for more semantic search or fuzzy queries use Gina Grap." — Leonie Monigatti
"The hybrid agent with the bash tool and the database tool was actually achieving the highest accuracy because it first... using the database tool and then verifying the results with the shell tool." — Leonie Monigatti citing Vercel