2026-06-23

Why I Stopped Using Semantic Embeddings for Tool Retrieval

This Reddit post is interesting from a Claude / Claude Code builder's perspective because it raises a very practical question: what is the best way to choose tools for an agent, and do semantic embeddings actually help? Although little of the post's text survived extraction, the title alone signals a common pain point in LLM systems: retrieval methods that sound elegant on paper but can be awkward in real agent workflows.

Key Points

The author stopped using semantic embeddings for tool retrieval.
The implied problem is that embedding-based matching may not have been the right fit for selecting tools in a tool-using LLM system.
This is directly relevant to agent design, tool routing, and function selection in Claude-style applications.
The title implies a move to a different tool-selection method, but the extracted source text does not say what replaced embeddings.
For builders, the underlying theme is that semantic similarity is not always the most reliable signal when deciding which tool an agent should call.

My Take

What strikes me is how plausible this is in real systems. I think embeddings are often treated as the default answer for anything “retrieval-like,” but tool selection is a slightly different beast: you’re not just finding related text, you’re trying to make a high-stakes routing decision that affects correctness, latency, and user trust.

If I were building with Claude or Claude Code, I’d be cautious about assuming that vector similarity is enough. I’d probably use embeddings as one signal among several, not the whole decision layer. For example, I’d want hard constraints, tool metadata, maybe lightweight rules, and some explicit disambiguation before I let an agent fire off the wrong action. That feels more robust than hoping semantic closeness maps cleanly to tool intent.
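Here is a minimal Python sketch of "embeddings as one signal among several." It assumes that some embedding model (not shown) has already computed the tool and query vectors. Every name in it is hypothetical: `Tool`, `route`, the `requires` and `keywords` metadata, the 0.6/0.4 weights, and the 0.1 ambiguity margin. The router works in three steps:

1. It applies hard permission constraints first.
2. It blends cosine similarity with a keyword signal.
3. It returns a clarification request instead of a guess when the top two candidates score too close together.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tool:
    name: str
    # Hypothetical metadata: permissions the caller must hold for the tool to be eligible.
    requires: frozenset = frozenset()
    # Hypothetical metadata: explicit trigger words treated as a strong signal.
    keywords: frozenset = frozenset()


@dataclass
class Decision:
    tool: Optional[str]  # chosen tool, or None when the agent should ask the user
    candidates: list     # tools to offer the user when disambiguating
    reason: str          # human-readable explanation for debugging


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(query, query_vec, tools, tool_vecs, granted,
          margin=0.1, w_sim=0.6, w_kw=0.4):
    # 1. Hard constraints: drop tools the caller is not allowed to use at all.
    eligible = [t for t in tools if t.requires <= granted]
    if not eligible:
        return Decision(None, [], "no eligible tool for granted permissions")

    # 2. Blend signals: embedding similarity is one input, not the whole decision.
    words = set(query.lower().split())
    scored = []
    for t in eligible:
        sim = cosine(query_vec, tool_vecs[t.name])
        kw = 1.0 if t.keywords & words else 0.0
        scored.append((w_sim * sim + w_kw * kw, t.name))
    scored.sort(reverse=True)

    # 3. Explicit disambiguation: if the top two are too close, ask instead of guessing.
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        top = [name for _, name in scored[:2]]
        return Decision(None, top, f"ambiguous: {top[0]} vs {top[1]}")
    best_score, best = scored[0]
    return Decision(best, [best], f"picked {best} (score {best_score:.2f})")


# Toy example data: 2-d vectors stand in for real embeddings (assumption).
TOOLS = [
    Tool("search_docs", keywords=frozenset({"find", "search"})),
    Tool("delete_file", requires=frozenset({"write"}),
         keywords=frozenset({"delete", "remove"})),
]
TOOL_VECS = {"search_docs": [1.0, 0.0], "delete_file": [0.9, 0.1]}
```

Consider a query whose embedding sits almost equally close to both tools. If one tool's keywords match, that keyword signal resolves the near-tie. If nothing matches, the router asks a clarification question instead of picking one at random. If the caller lacks the `write` permission, `delete_file` is never considered. The weights and margin would need tuning against real traffic.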

I also think this kind of post is a useful antidote to “embeddings everywhere” hype. Embeddings are great for search, clustering, and fuzzy recall. But for tool use, I’d be curious whether simpler approaches sometimes win because they’re easier to debug and easier to reason about. When an agent picks the wrong tool, you want to know why immediately—not squint at a vector space and guess.
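To make the debuggability point concrete, here is a hypothetical first-match rule router in Python. It returns the chosen tool together with a trace of every rule it checked. The rule names, regex patterns, and tool names are illustrative assumptions, not taken from the original post.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str     # label that appears in the trace
    pattern: str  # regex searched in the lowercased query
    tool: str     # tool to call when the pattern matches


# Hypothetical rules; order matters because the first match wins.
RULES = [
    Rule("delete-intent", r"\b(delete|remove)\b", "delete_file"),
    Rule("search-intent", r"\b(find|search|look up)\b", "search_docs"),
]


def route_with_trace(query: str, rules: List[Rule]) -> Tuple[Optional[str], List[str]]:
    """Return (tool or None, trace of every rule checked, in order)."""
    text = query.lower()
    trace = []
    for rule in rules:
        matched = re.search(rule.pattern, text) is not None
        trace.append(f"{rule.name}: {'match' if matched else 'no match'}")
        if matched:
            return rule.tool, trace
    trace.append("no rule matched: ask the user to clarify")
    return None, trace
```

Calling `route_with_trace("Please remove old logs", RULES)` returns `"delete_file"` with a one-line trace naming the rule that fired. The trade-off is that rule order matters. A query that matches several patterns goes to whichever rule is listed first. The trace makes that visible, but the ordering still has to be designed deliberately.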

My honest take: if you’re building a Claude-based agent, I’d experiment with semantic retrieval, but I wouldn’t marry it. I’d test it against rule-based routing, schema matching, and explicit tool descriptions, then keep whatever produces the cleanest behavior in practice.
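One way to run that comparison is a small offline harness that scores each candidate router against the same hand-labeled queries. This minimal Python sketch assumes every router is a function from query text to a tool name, or `None` when it would ask for clarification. The function names and example data are hypothetical. An embedding-based or schema-matching router would need to be wrapped to fit that signature.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A router maps query text to a tool name, or None when it would ask the user.
Router = Callable[[str], Optional[str]]
Labeled = List[Tuple[str, str]]  # (query, expected tool name)


def evaluate(router: Router, labeled: Labeled) -> Tuple[float, List[Tuple[str, str, Optional[str]]]]:
    """Return (accuracy, failures), where each failure is (query, expected, got)."""
    if not labeled:
        return 0.0, []
    failures = []
    for query, expected in labeled:
        got = router(query)
        if got != expected:
            failures.append((query, expected, got))
    return (len(labeled) - len(failures)) / len(labeled), failures


def compare(routers: Dict[str, Router], labeled: Labeled) -> List[Tuple[str, float]]:
    """Score each named router on the same labeled set, best accuracy first."""
    scores = [(name, evaluate(router, labeled)[0]) for name, router in routers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a keyword rule router to compare against other approaches.
def keyword_router(query: str) -> Optional[str]:
    words = set(query.lower().split())
    if words & {"delete", "remove"}:
        return "delete_file"
    if words & {"find", "search"}:
        return "search_docs"
    return None


EXAMPLE_LABELED: Labeled = [
    ("remove the file", "delete_file"),
    ("search for the spec", "search_docs"),
    ("where is the design doc", "search_docs"),
    ("delete old logs", "delete_file"),
]
```

On the example set, the keyword router misses only the query with no trigger word, and the failure list names it directly. Keeping the failures alongside the accuracy number is what makes the comparison useful. When a router misses, you can read exactly which queries it got wrong and what it chose instead.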

The takeaway is simple: semantic embeddings are powerful, but tool retrieval may be one of those areas where “more semantic” is not automatically “better.” For agent builders, the right abstraction is the one that makes tool choice dependable, not the one that sounds most elegant.

Reference: Source title

More from the same author

A Model-Agnostic AI Workstation, but the Source Is Essentially Empty

From a Claude / Claude Code developer’s perspective, this is interesting mostly because the article title points at a real pain point: people want one setup that can work across models instead of rebuilding their workflow every time they switch vendors. But the extracted source body here doesn’t actually include the underlying post content, so there’s no substantive claim to analyze beyond the existence of the topic itself. The source metadata points to a Reddit post titled **“Please wait for ve

A Local Qwen Model Beat Claude Opus 4.7 on Simon Willison’s Pelican Test

From a Claude / Claude Code builder’s perspective, this is a useful reminder that benchmark vibes can get weird fast. Simon Willison’s long-running “pelican riding a bicycle” test is intentionally silly, but it sometimes tracks something real: whether a model can reliably produce a clean SVG illustration from a prompt. Simon compared two fresh model releases: Alibaba’s Qwen3.6-35B-A3B and Anthropic’s Claude Opus 4.7. He ran Qwen locally on a MacBook Pro M5 using a 20.9GB quantized GGUF model in

Claude Code’s “Extended Thinking” Isn’t the Actual Reasoning

For Claude Code users and people building agent workflows on top of Claude, this is a surprisingly important distinction: what looks like a reasoning trace may not be the raw chain of thought you assumed it was. If you’re treating those logs as an audit trail, a debugging artifact, or evidence of why an agent acted, the gap between “summary” and “actual thinking” really matters. The author inspected Claude Code session logs and found that the “thinking blocks” were not readable reasoning text, b

Mystery Company, $500 Million Burn, and the Risk of Agentic AI Hype

From a Claude / Claude Code developer’s perspective, this kind of headline is interesting because it sits right at the intersection of AI ambition and operational reality. Even without the full article text here, the framing alone suggests a cautionary tale about how quickly “AI transformation” can turn into expensive confusion when execution, governance, and expectations drift apart. The source headline describes a “mystery company” that reportedly accidentally spent $500 million. The amount is

Claude’s Identity Verification Push Is About Safety, Not Convenience

If you build with Claude or Claude Code, this is one of those policy updates that matters even if it doesn’t feel “product-y” at first glance. Anthropic is making identity verification part of its safety and compliance stack, which tells you a lot about how seriously it’s treating abuse prevention and access control around powerful capabilities. Anthropic is rolling out identity verification for certain Claude use cases. You may see a verification prompt during routine platform integrity checks

Claude Under Pressure: What This Reddit Post Is Really Saying

From a Claude / Claude Code perspective, this story is interesting less for the headline than for what it hints at: people are actively probing how the model behaves under manipulation, pressure, and social-engineering-style prompts. For developers building with Claude, that kind of adversarial testing matters because it gets at trust, refusal behavior, and how fragile “verification” can be in a conversational system. The source is a Reddit post whose extracted body does not provide the underlyi

Recall: offline project memory for Claude Code

For Claude Code users, the most annoying part of a long-running project is not the coding itself — it’s the constant re-introduction. Recall is interesting because it tackles that cold-start problem with a very opinionated approach: local-first, offline, and deliberately non-LLM. That makes it less flashy than “AI memory” products, but in some ways more practical. Recall is a GitHub project that adds durable memory to Claude Code without sending data to an API. It keeps two files under `.recall/

Mozilla Used Anthropic's Mythos to Find and Fix a High-Stakes Issue

From a Claude / Claude Code builder’s perspective, this is exactly the kind of story that gets my attention: a real-world product team using an Anthropic toolchain to hunt down an issue that mattered enough to be worth fixing. Even with the source text here being extremely sparse, the implication is interesting on its own: AI-assisted debugging is moving from demo-land into practical maintenance work. The source headline says Mozilla used Anthropic’s Mythos to find and fix an issue. The framing

Anthropic’s Call for an AI Freeze and What It Means for Claude Builders

From a Claude and Claude Code developer’s perspective, the interesting part of this story is not just the policy headline — it’s the signal it sends about where frontier-model companies think the AI race is headed. A call for a global freeze suggests real anxiety about capability jumps, deployment incentives, and the gap between technical progress and governance. The source is a Reddit post titled “Anthropic calls for global freeze in AI.” The extracted article body is not available here; the pa

Reddit Post About Anthropic and the US Administration

For Claude and Claude Code developers, this story is interesting mainly because it sits at the intersection of frontier AI and public-sector scrutiny. Even though the extracted source here doesn’t include the post’s actual content, the title alone suggests a discussion worth watching: when Anthropic shows up in a government context, people building with Claude should pay attention. The source is a Reddit post titled “US President Administration and Anthropic”. The extracted article body does