Show HN: Caliper – pass@k reliability testing for Claude Code and Codex skills
By edonadei · 2026-06-28 · 2 points · 1 comment
https://github.com/edonadei/caliper
Skills for Claude Code and Codex are hard to test. By hard, I mean there's no standard way to do it. You evaluate the skill once on something and it looks like it works. You publish it. Then a new super model comes out (GLM 5.2 anyone?) and the skill quietly breaks for some…
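To see why a single run says so little, it helps to score many runs of the same task. The post doesn't describe how Caliper computes its numbers, so the sketch below is only an assumption: it uses the standard unbiased pass@k estimator from code-generation benchmarks (chance that at least one of k runs passes, given c passes out of n trials) next to pass^k (chance that all k runs pass). The function names `pass_at_k` and `pass_hat_k` are hypothetical, not Caliper's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k runs, drawn
    without replacement from n trials with c passes, succeeds."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """pass^k: probability that all k sampled runs pass.
    math.comb returns 0 when k > c, so this is 0.0 in that case."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical skill that passed 7 of 10 runs.
    print(pass_at_k(10, 7, 3))   # about 0.99
    print(pass_hat_k(10, 7, 3))  # about 0.29
```

With 7 passes out of 10, "works at least once in 3 tries" looks near-certain while "works every time in 3 tries" is under 30%. That gap is exactly what a one-off manual check hides.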