AI intel digest
Agentic Search for Context Engineering — Leonie Monigatti, Elastic
Leonie Monigatti (Elastic) argues that context engineering is roughly 80% agentic search — the mechanics of how agents decide what to pull from files, databases, memory, and the web.
Executive summary
Leonie Monigatti (Elastic) argues that context engineering is roughly 80% agentic search — the mechanics of how agents decide what to pull from files, databases, memory, and the web. She demonstrates four search interfaces (semantic search, general-purpose database querying via ESQL, shell-tool file navigation, and a custom CLI integration with "Gina Grap"), shows where each breaks, and offers a framework for curating a retrieval stack from "low floor" specialized tools and "high ceiling" general-purpose tools. The core shift from RAG to agentic RAG is replacing fixed retrieval pipelines with agent-controlled search tools.

Sources mentioned
- Elastic / Elasticsearch — Monigatti's employer; context source for the database demos
- LangChain — agent framework used throughout the demos; provides tool decorators, the shell tool, and skill-loading boilerplate
- OpenAI (GPT-4o Nano, GPT-4o Mini) — LLMs used in the demos
- Vercel — blog post "Testing if bash is all you need", cited on hybrid-agent accuracy
- "Gina Grap" — custom CLI for semantic search, demonstrated as a shell-tool enhancement
- LlamaIndex — mentioned as offering "sam tools" as a semantic search alternative
- LightOn — mentioned as offering "coal grap", based on multi-vector embeddings
- ESQL (Elasticsearch Query Language) — pipe-based query language demonstrated as a general-purpose database interface
- OpenCode — mentioned as using an "exec tool" equivalent to the shell/bash tool

Verdict
This video carries unique signal for practitioners building retrieval systems for agents. Unlike generic RAG tutorials, it focuses on the underexplored layer between context sources and the context window: the search-tool interface itself. The concrete demonstrations of failure modes (wrong wildcard, semantic search returning Gemma for "GDPA", synonym chaining in the shell) are more valuable than abstract architecture advice. The "low floor, high ceiling" framework and Vercel's hybrid-agent verification finding provide actionable mental models. Worth watching for anyone whose agents suffer from brittle retrieval, though the Elasticsearch-specific ESQL sections are less transferable than the general principles. Signal density is high for builder audiences, lower for researchers or pure theorists.

Count: 10 facts, 0 assumptions, 6 demonstrations (semantic search failure, ESQL query generation, skill loading, shell-tool file navigation, "Gina Grap" integration, hybrid-agent accuracy citation). Signal density: 78%.
Signal points
1. The three failure modes in agentic search — the agent calls no tool, calls the wrong tool, or generates wrong parameters — and tool descriptions are the primary lever for fixing all three.
2. Semantic search alone is brittle: searching "GDPA" returned results about "Gemma" due to tokenization similarity, showing that vector search without a keyword fallback fails on exact-match needs.
3. General-purpose query tools (ESQL/SQL) raise the agent's ceiling but also its error rate; the ESQL wildcard example (% vs. *) shows parameter complexity is a real failure mode.
4. Shell tools are versatile but risky — no safeguards in LangChain, and agents compensate for the lack of semantic search by brute-force synonym chaining.
5. Agent skills with progressive disclosure solve the "band-aid documentation" problem: instead of patching system prompts per error, load structured skill docs on demand.
6. Vercel's finding that hybrid agents (database query plus shell verification) outperformed single-tool agents suggests multi-tool verification patterns are underexplored.
7. The "start general, log behavior, then specialize" workflow is a practical antidote to building the wrong tools upfront.
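The hybrid pattern from point 6 can be sketched in a few lines: query a structured store first, then re-check the claimed hit against the raw source, the way Vercel's hybrid agent verified database results with shell commands. Everything below (the `sessions` schema, the file layout) is a hypothetical stand-in, not from the talk.

```python
import os
import sqlite3


def hybrid_lookup(db, docs_dir, keyword):
    """Database-first lookup, then a grep-style verification pass over the
    source file (the hybrid pattern from the Vercel finding)."""
    # Step 1: structured query; fast and precise, but the index may be stale.
    row = db.execute(
        "SELECT title, path FROM sessions WHERE title LIKE ?",
        (f"%{keyword}%",),
    ).fetchone()
    if row is None:
        return None
    title, path = row
    # Step 2: re-check the hit against the raw file, standing in for the
    # agent verifying database results with a shell `grep`.
    with open(os.path.join(docs_dir, path)) as f:
        verified = keyword.lower() in f.read().lower()
    return {"title": title, "verified": verified}
```

The verification step is what lifted accuracy in Vercel's testing: the database answers fast, and the shell-style check catches stale or wrong index entries before the agent trusts them.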
Key ideas
Context engineering is ~80% agentic search
Why: The critical step is not curation within the context window but the search tools that decide what enters it from diverse sources
Implication: Teams over-invest in prompt engineering and under-invest in retrieval tool design
The "low floor, high ceiling" framework for tool stacks
Why: Specialized tools (simple parameters, low error rate) give agents easy wins; general-purpose tools (shell, query languages) handle edge cases but require more iterations
Implication: Effective agents need both types, not a single silver bullet
Progressive disclosure via agent skills
Why: Loading full documentation into system prompts wastes context; skills inject minimal metadata upfront and load full docs only when triggered
Implication: This is a scalable pattern for equipping agents with large tool APIs without context bloat
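A minimal sketch of this idea, assuming a directory of skill files whose first line is a short description (the layout and function names are hypothetical, not LangChain's skill API): only the index goes into the system prompt; the full document loads on trigger.

```python
import os


def skill_index(skills_dir):
    """Build the lightweight listing injected into the system prompt: one name
    and a one-line description per skill, never the full document."""
    lines = []
    for name in sorted(os.listdir(skills_dir)):
        with open(os.path.join(skills_dir, name)) as f:
            # Convention assumed here: line 1 of a skill file is its short
            # description; everything after it is loaded only on demand.
            lines.append(f"- {name}: {f.readline().strip()}")
    return "\n".join(lines)


def load_skill(skills_dir, name):
    """Full skill document, loaded only when the agent triggers the skill."""
    with open(os.path.join(skills_dir, name)) as f:
        return f.read()
```

The context saving comes from the asymmetry: a hundred skills cost a hundred short lines upfront, while the multi-page documentation for any one of them is paid for only in the turns that use it.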
Shell tools "cheat" at semantic search by chaining synonyms
Why: When grep fails on semantic queries, agents compensate by running multiple keyword searches (regulate, compliance, GDPR, governance, etc.)
Implication: This works but is inefficient; dedicated semantic search tools (Gina Grap, etc.) are needed for non-exact retrieval
Error handling in search tools enables agent self-correction
Why: Returning query errors to the agent (rather than crashing) lets it rewrite and retry
Implication: Robust tool design must treat error responses as part of the agent feedback loop, not as terminal failures
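The feedback-loop idea above can be shown with a small wrapper, a sketch under the assumption that the tool returns plain text to the agent (the `QUERY_ERROR`/`NO_RESULTS` labels are my own, not from the talk):

```python
def safe_search_tool(run_query, query):
    """Wrap a query executor so failures come back to the agent as readable
    tool output it can react to, instead of crashing the agent loop."""
    try:
        hits = run_query(query)
    except Exception as exc:
        # Feed the error back as a normal response: the agent sees what broke
        # (e.g. a bad wildcard) and can rewrite the query on its next turn.
        return f"QUERY_ERROR: {exc}. Rewrite the query and try again."
    if not hits:
        # Make the empty case explicit so the agent can judge whether zero
        # results is a valid answer or a failure mode worth retrying.
        return "NO_RESULTS: 0 hits. Try broader or alternative keywords."
    return "\n".join(hits)
```

Note that the empty-result branch encodes Monigatti's question ("Is returning zero search results actually a valid response or is it a failure mode?") as an explicit signal rather than silence.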
Start general, then specialize based on logged behavior
Why: If you don't know query patterns yet, begin with general-purpose tools, log agent behavior, then extract specialized tools from recurring patterns
Implication: This reverses the common "build specialized tools first" approach and reduces premature optimization
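The log-then-specialize workflow presupposes some record of what the agent actually asks for. A deliberately crude sketch (the class, the first-keyword "shape" heuristic, and the threshold are all my own illustration):

```python
from collections import Counter


class ToolCallLog:
    """Log every tool call so recurring query patterns can later be promoted
    into specialized, low-floor tools."""

    def __init__(self):
        self.calls = Counter()

    def record(self, tool_name, query):
        # Reduce each query to a coarse shape (here: its first keyword), a
        # deliberately crude stand-in for real pattern mining over logs.
        words = query.lower().split()
        shape = words[0] if words else ""
        self.calls[(tool_name, shape)] += 1

    def specialization_candidates(self, min_count=3):
        """Patterns frequent enough to justify building a dedicated tool."""
        return [key for key, n in self.calls.items() if n >= min_count]
```

If the log shows the agent repeatedly hand-writing the same kind of general-purpose query, that recurring shape is the specification for the specialized tool worth building next.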
Key facts
Monigatti works at Elastic, the company behind Elasticsearch
Confidence: HIGH | Evidence: "My name is Leonie. I work at Elastic, the company behind Elastic Search."
The workshop uses LangChain as the agent framework
Confidence: HIGH | Evidence: "We're going to be using LangChain for this session just because it wraps a lot of the complexity"
The demo uses GPT-4o Nano for simple tasks and switches to GPT-4o Mini for query-generation tasks
Confidence: HIGH | Evidence: "I'm using GPT 4o Nano for this demo" ... "I'm switching from the GPT 4o Nano to the mini because I am now anticipating that writing search queries is a little bit more difficult"
The conference-session data used in demos is pre-chunked and stored in Elasticsearch
Confidence: HIGH | Evidence: "In the database I have the conference session already chunked"
ESQL (Elasticsearch Query Language) is a pipe-based query language that differs from SQL — for example, it uses the asterisk (*), not the percent sign (%), as its wildcard
Confidence: HIGH | Evidence: "ESQL doesn't use the percentage sign as a wild card character. And in ESQL, you would use the asterisk"
LangChain's shell tool has no safeguards by default and can delete files
Confidence: HIGH | Evidence: "Using the shell tool can be risky since giving your agent access to a terminal can make it delete files... Also in LangChain it doesn't have any safeguards by default"
Vercel published a blog post titled "Testing if bash is all you need" comparing agents with a bash tool vs. file search vs. database tools
Confidence: HIGH | Evidence: "I think there was a very interesting blog post by Vercel I believe... I think it's called testing if bash is all you need is the title"
The hybrid agent (bash tool + database tool) achieved the highest accuracy in Vercel's testing because it verified database results with shell commands
Confidence: HIGH | Evidence: "the hybrid agent with the bash tool and the database tool was actually achieving the highest... accuracy because it first... using the database tool and then verifying the results with the shell tool"
Monigatti has not personally used sub-agents for search queries
Confidence: HIGH | Evidence: "Unfortunately not. I have not played around with sub-agents yet"
OpenCode (from the demo context) uses an "exec tool" equivalent to the shell/bash tool
Confidence: HIGH | Evidence: "If you've played around with open cloud, it's called the exec tool"
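The ESQL wildcard fact is worth making concrete, since it is exactly the kind of wrong-parameter failure the talk describes. The ESQL line below follows the pipe-based syntax from the talk; the `sessions` index and `title` field are hypothetical, and the lint helper is my own sketch:

```python
# SQL habit: % matches any run of characters in a LIKE pattern.
sql_query = "SELECT title FROM sessions WHERE title LIKE '%search%'"

# ESQL is pipe-based and uses * (not %) as the LIKE wildcard, so a model
# that emits % out of SQL habit is generating a wrong parameter.
esql_query = 'FROM sessions | WHERE title LIKE "*search*" | LIMIT 10'


def wildcard_ok(query, dialect):
    """Tiny lint: catch the %-vs-* failure mode before executing the query."""
    wildcard = {"sql": "%", "esql": "*"}[dialect]
    return wildcard in query
```

A check like this is the pre-execution complement to the error-feedback pattern: cheap dialect lints catch known parameter mistakes before the query ever reaches the database.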
Quotes
“Context engineering is about 80% agentic search because it's this little box right here.” — Leonie Monigatti
“Doing good search is incredibly difficult and that's why we have many different techniques to do search.” — Leonie Monigatti
“If you just start with a core purpose, if it works fine, great. But if you add more parameters or more tools and your agent is starting to struggle with calling the right tool, then maybe add some trigger condition.” — Leonie Monigatti, on tool descriptions
“Is returning zero search results actually a valid response or is it a failure mode?” — Leonie Monigatti
“For exact matches you probably still want to use grep. And for more semantic search or fuzzy queries use Gina Grap.” — Leonie Monigatti
“The hybrid agent with the bash tool and the database tool was actually achieving the highest accuracy because it first... using the database tool and then verifying the results with the shell tool.” — Leonie Monigatti, citing Vercel
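The trigger-condition advice in the third quote can be illustrated without any framework. Here a keyword match stands in for the LLM's tool-selection step, and the tool specs (names, descriptions, trigger words) are hypothetical; the point is only that the description is the routing surface:

```python
def route(tools, request):
    """Naive illustration of why tool descriptions matter: the router (a
    keyword match here, standing in for the LLM) picks a tool by the trigger
    conditions attached to its description."""
    for tool in tools:
        if any(kw in request.lower() for kw in tool["triggers"]):
            return tool["name"]
    return None


# Hypothetical tool specs: each starts from a core purpose, with trigger
# conditions added only once the agent starts confusing tools.
TOOLS = [
    {
        "name": "semantic_search",
        "description": "Search conference sessions by meaning. "
                       "Use for fuzzy or conceptual queries.",
        "triggers": ["about", "similar", "topic"],
    },
    {
        "name": "exact_lookup",
        "description": "Look up a session by exact title. "
                       "Use only when the exact title is known.",
        "triggers": ["titled", "exact"],
    },
]
```

This mirrors the three failure modes from the signal points: a request matching no trigger yields no tool call, overlapping triggers yield the wrong tool, and the fix in both cases is sharpening the descriptions, not the model.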