PaPoo

cc-canary: a Local Drift Detector for Claude Code Sessions

For Claude Code users, this repository is interesting because it turns a vague feeling (“my agent seems to be getting worse”) into something measurable. I think that’s a genuinely useful direction: instead of relying on vibes, it mines the session logs Claude Code already writes locally and tries to surface drift, regressions, and behavioral shifts over time.

Key Points

My Take

What strikes me is how much this looks like an observability tool for agent behavior, not just a helper script. That feels important: once you start using Claude Code seriously, the hard problem is often not “can it do the task?” but “is it quietly getting less careful, more verbose, more brittle, or more shortcut-happy over time?” I think this repo is trying to answer that in a way that’s concrete enough to act on.

I also like the privacy stance. Local-only analysis of logs already on disk is exactly the sort of thing I’d prefer for developer workflow telemetry. If you’re going to inspect agent behavior, I’d much rather do it without shipping sensitive session data to another service. The fact that it works from existing Claude Code JSONL files makes it feel practical instead of aspirational.
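To make the "existing JSONL files" point concrete, here is a minimal Python sketch of what a local-only pass over those logs could look like. This is not cc-canary's code. The field names (`type`, `message.content`, `timestamp`, `tool_use`) and the `~/.claude/projects` location are my assumptions about the log layout, and `count_tool_calls_per_day` and `scan_log_dir` are hypothetical helpers, so check them against your own files before trusting the numbers.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_tool_calls_per_day(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks in assistant messages, keyed by YYYY-MM-DD.

    Assumed record shape (not confirmed by the source):
    {"type": "assistant", "timestamp": "...", "message": {"content": [...]}}
    """
    per_day = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        # The first 10 characters of an ISO 8601 timestamp are the date.
        day = str(record.get("timestamp", ""))[:10] or "unknown"
        per_day[day] += sum(
            1
            for block in content
            if isinstance(block, dict) and block.get("type") == "tool_use"
        )
    return per_day


def scan_log_dir(root: Path) -> Counter:
    """Aggregate counts over every *.jsonl file one level below root."""
    totals = Counter()
    for path in root.glob("*/*.jsonl"):
        with path.open(encoding="utf-8") as handle:
            totals.update(count_tool_calls_per_day(handle))
    return totals
```

Calling `scan_log_dir(Path.home() / ".claude" / "projects")` would read everything from disk and send nothing anywhere, which is the property I care about here. The parser is deliberately forgiving (bad lines are skipped) because session logs may be mid-write while you read them.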


That said, I’m a little skeptical of the more elaborate metric stack. Some of these signals, such as “reasoning loops,” “thinking redaction rate,” and “mean thinking length,” may be useful, but I think there’s a risk of fitting a narrative to noisy proxies. A drift dashboard can be helpful, but it can also make people feel like they have a scientific handle on model quality when they really just have an instrument panel full of heuristics. I’d be curious whether the composite health score and inflection detection stay meaningful across very different project styles.

The most compelling part, to me, is the combination of hard counts and forensic output. A markdown or HTML report that’s ready to paste into an issue or gist is actually useful for debugging a real Claude Code workflow problem. If I were using this, I’d try it on a few weeks of sessions, then compare the output against my own intuition: did the model start over-editing files, looping more, or acting less deliberate after a model switch or workflow change?
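If I wanted to check my intuition about a model switch without over-reading noise, I would start with something like the sketch below. It splits a per-day metric at a cutoff date and runs a simple permutation test on the difference in means. To be clear, this is my own illustration and not how cc-canary computes its health score or inflection points. The metric, the cutoff, and the helper names `split_at` and `permutation_p_value` are all hypothetical.

```python
import random
from statistics import mean


def split_at(daily: dict[str, float], cutoff: str) -> tuple[list[float], list[float]]:
    """Split a {YYYY-MM-DD: value} mapping into values before and from the cutoff.

    ISO dates compare correctly as plain strings.
    """
    ordered = sorted(daily.items())
    before = [value for day, value in ordered if day < cutoff]
    after = [value for day, value in ordered if day >= cutoff]
    return before, after


def permutation_p_value(before, after, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a random relabeling of days produces a mean gap
    at least as large as the observed one (two-sided)."""
    if not before or not after:
        raise ValueError("need at least one day on each side of the cutoff")
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[len(before):]) - mean(pooled[:len(before)]))
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

A small p-value only says the before/after gap is unlikely under random day labels. It does not say why the metric moved, and a change in the kind of work I was doing that week would show up the same way as a change in the model.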

Overall, this is a thoughtful, developer-centric attempt to make Claude Code behavior inspectable. It’s early, and some of the metrics may prove noisy, but the direction is strong: local, auditable, session-level analysis for spotting drift before it becomes a productivity tax.


Reference: GitHub - delta-hq/cc-canary
