2026-06-01

Kepler’s Verifiable AI Stack Shows Where Claude Fits Best

For Claude and Claude Code developers, this Kepler story is interesting because it’s not “AI replaces analysts” hype. It’s a concrete example of using Claude as a reasoning layer inside a system that still insists on deterministic verification. That distinction matters a lot in finance, where being right isn’t enough; you also have to be able to prove why you’re right. It also reads like a fairly honest systems-design lesson: the model is useful, but only when the surrounding infrastructure is doing serious work.

Key Points

Kepler is building a financial research platform that aims to make AI answers auditable down to the exact filing, page, and line item.
The company says it has indexed:
- 26M+ SEC filings
- 50M+ public documents
- 1M+ private documents
- 14,000+ companies across 27 global markets
The founders, Vinoo Ganesh and John McRaven, previously worked at Palantir and interviewed 147 financial firms before starting Kepler.
The core customer problem was trust: firms wanted AI for research, but didn’t trust outputs they couldn’t audit.
Kepler’s approach is to pair Claude with deterministic infrastructure:
- Claude handles interpretation, decomposition, and reasoning
- deterministic systems handle computation, retrieval, and verification
The team says Claude was the model that most consistently held together long, multi-step plans without quietly dropping constraints.
Claude also stood out by asking humans to resolve ambiguity instead of guessing and continuing.
Kepler built:
- deterministic execution environments for provably correct operations
- a proprietary ontology for financial concepts and formulas
- recurring “skills” for common workflows
- strict access controls and security boundaries
The workflow is split across stages, with different models assigned to different jobs:
- Opus for complex reasoning and plan decomposition
- Sonnet for higher-throughput constrained stages
Kepler also trained specialized recall models, some using Claude as a foundation, and reports 94% accuracy on taxonomy mapping tasks versus 38–46% for other models.
Every prompt change, model upgrade, and context modification is tested against thousands of cases before it reaches production.
The system is built for provenance from the start, with full audit logging, siloed customer environments, and end-to-end traceability.
Kepler frames finance as the first product, but says the same pattern could apply to healthcare and legal workflows where verification matters.
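To make the regression-testing point concrete, here is a minimal sketch of a release gate. It assumes a candidate configuration (prompt, model version, context) can be wrapped as a plain function from question to answer. It is not Kepler’s harness: `EvalCase`, `gate_release`, the ACME cases, and the “every case must pass” threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One regression case: an input question and a reviewer-verified answer."""
    case_id: str
    question: str
    expected: str


def gate_release(
    run_pipeline: Callable[[str], str],  # hypothetical: candidate config wrapped as a function
    cases: list[EvalCase],
    min_pass_rate: float = 1.0,  # assumption: require every case to pass
) -> tuple[bool, list[str]]:
    """Run every case through the candidate; return (approved, failed_case_ids)."""
    failures = [c.case_id for c in cases if run_pipeline(c.question) != c.expected]
    # An empty case set counts as a 0% pass rate, so untested changes fail closed.
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate >= min_pass_rate, failures


# Usage with invented data: the candidate gets one of two cases wrong.
cases = [
    EvalCase("rev-1", "FY2024 revenue for ACME", "1000"),
    EvalCase("rev-2", "FY2023 revenue for ACME", "900"),
]
answers = {"FY2024 revenue for ACME": "1000", "FY2023 revenue for ACME": "950"}
approved, failed = gate_release(lambda q: answers[q], cases)
```

Here the candidate is blocked and `failed` names the case that regressed. Failing closed on an empty case set reflects the assumption that a change nobody tested should never ship by default.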

My Take

What strikes me is how unglamorous — and how sensible — this architecture is. The headline is Claude, but the real product idea is “don’t let the model be the source of truth.” I think that’s the right instinct for high-stakes domains, and honestly it’s the sort of thing more AI teams should admit upfront instead of pretending a single model call can replace an entire verification pipeline.

I also like the split between interpretation and computation. That’s where a lot of agentic systems get sloppy: they ask the model to both understand the task and be the calculator, which is exactly how you end up with confident nonsense. Kepler’s insistence on deterministic execution, provenance, and stage-by-stage evaluation feels much closer to real enterprise software than to demoware.
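A toy version of that split might look like the following. It assumes, hypothetically, that the model’s only output is a structured plan naming a metric, a company, and a period. A deterministic function then does the arithmetic from retrieved facts and carries each input’s filing, page, and line item through to the answer. `Fact`, `FORMULAS`, `execute_plan`, and the ACME figures are invented for illustration and are not Kepler’s ontology or API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Fact:
    """A retrieved number plus where it came from (filing, page, line item)."""
    value: Decimal
    filing_id: str
    page: int
    line_item: str


# Hypothetical mini-ontology: each metric lists its inputs and a deterministic formula.
FORMULAS = {
    "gross_margin": (
        ("revenue", "cost_of_revenue"),
        lambda revenue, cost: (revenue - cost) / revenue,
    ),
}


def execute_plan(plan: dict, facts: dict[tuple[str, str, str], Fact]) -> dict:
    """Compute a metric from a model-produced plan using only retrieved facts.

    The plan says *what* to compute; this function does the arithmetic and
    records the source of every input. Missing inputs raise instead of guessing.
    """
    inputs, formula = FORMULAS[plan["metric"]]  # unknown metrics raise KeyError
    used = []
    for concept in inputs:
        key = (plan["company"], plan["period"], concept)
        if key not in facts:
            raise KeyError(f"no verified fact for {key}")
        used.append(facts[key])
    value = formula(*(f.value for f in used))
    return {"value": value, "sources": [(f.filing_id, f.page, f.line_item) for f in used]}


# Usage with invented data; in a real system the plan would come from the model.
facts = {
    ("ACME", "FY2024", "revenue"): Fact(Decimal("1000"), "ACME-10K-2024", 42, "Total revenue"),
    ("ACME", "FY2024", "cost_of_revenue"): Fact(Decimal("600"), "ACME-10K-2024", 42, "Cost of revenue"),
}
plan = {"metric": "gross_margin", "company": "ACME", "period": "FY2024"}
result = execute_plan(plan, facts)
```

The part worth copying is the failure mode: if a required fact is missing, the executor raises rather than letting the model fill the gap, and every number in the answer arrives with its citation attached.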

What’s especially interesting to me is the claim that Claude does better at preserving long plans and surfacing ambiguity. I’d be curious whether that advantage holds broadly across other “must not hallucinate” workflows, or whether Kepler’s task design is amplifying Claude’s strengths. Either way, this is the kind of use case where a model that knows when to stop and ask a question is more valuable than one that just barrels ahead.
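One way to give a pipeline that “stop and ask” behavior structurally, rather than relying on the model alone, is to make ambiguity a first-class return value. The sketch below is hypothetical (the `SYNONYMS` table and `resolve_concept` are illustrative, not from the article): a term that maps to exactly one ontology concept resolves, while a term that maps to several, or to none, produces a question for a human instead of a guess.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resolved:
    concept: str


@dataclass(frozen=True)
class NeedsClarification:
    question: str
    options: tuple[str, ...]


# Hypothetical mapping from analyst phrasing to ontology concepts.
SYNONYMS = {
    "revenue": ("revenue",),
    "sales": ("revenue",),
    "margin": ("gross_margin", "operating_margin", "net_margin"),
}


def resolve_concept(term: str) -> Union[Resolved, NeedsClarification]:
    """Map a term to exactly one concept, or return a question for a human."""
    candidates = SYNONYMS.get(term.strip().lower(), ())
    if len(candidates) == 1:
        return Resolved(candidates[0])
    if not candidates:
        return NeedsClarification(f"'{term}' is not a known concept; which metric did you mean?", ())
    return NeedsClarification(f"'{term}' is ambiguous; which did you mean?", candidates)


# Usage: "Sales" resolves cleanly, while "margin" comes back as a question.
sales = resolve_concept("Sales")
margin = resolve_concept("margin")
```

Returning a distinct type forces the calling code to handle the ambiguous case explicitly, so a downstream stage cannot silently pick one reading and keep going.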

The one thing I’d caution against is overreading the “AI for finance” angle as if the model alone created the value. The article is pretty clear that the verification layer, ontology, retrieval, evaluation harness, and security model are doing a lot of the heavy lifting. That’s not a criticism — it’s the lesson. If I were building with Claude in a regulated workflow, I’d copy the system design more than the model choice: separate reasoning from execution, make provenance first-class, and evaluate relentlessly.

The takeaway is simple: in serious enterprise AI, Claude is most compelling when it’s not asked to be magic. It’s strongest as the reasoning layer inside a system that can prove every step.