by michaelabrt
Studying the gap between what agents know and when they act on it.
# Add to your Claude Code skills

```shell
git clone https://github.com/michaelabrt/clarte
```

> [!IMPORTANT]
> This is an experimental research project, not a polished product. The findings are based on 700+ controlled sessions and 30+ experiments, but the real-world evaluation covers a small number of tasks. We’re sharing it early because the results are interesting enough to warrant wider testing. Contributions, replications and skepticism are welcome.
We ran 30+ experiments across 700+ agent sessions to find what measurably changes agent behavior.
First, we measured how agents spend their time across 170 sessions and 7,595 turns.
We assumed the fix was better information. So we built 15 context enrichments: instability metrics, facade maps, API surfaces, type-aware ordering, task-relevant weighting. Each benchmarked in isolation and combination.
Zero wins. Not one survived our combinatorial benchmark at realistic temperature. Three optimizations that individually showed -26%, -16% and -32% improvements combined to +63% overhead.
Then we found the placebo. A minimal context file - just the project language and test framework, two lines, zero analysis - performed identically to our full 2,000-token enrichment. The content was irrelevant. The file’s existence alone suppressed the agent’s exploration phase.
The real signal turned out to be first-edit timing. Strong correlation with session length across most tasks tested. Each delayed turn adds ~1.3 total turns. With context, agents start editing around turn 5. Without, turn 8. They find the right files on their own given enough time. They just lack the confidence to stop reading and start editing.
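A back-of-the-envelope reading of that correlation, as a sketch (the linear model is our assumption about how to apply the ~1.3 figure, not something measured directly):

```typescript
// Rough linear model: each turn of delayed first edit adds ~1.3 total turns.
// The baseline of turn 5 matches the with-context figure quoted above.
const TURNS_PER_DELAYED_TURN = 1.3;

function extraTurns(firstEditTurn: number, baselineTurn = 5): number {
  // No penalty if the agent starts editing at or before the baseline turn.
  return Math.max(0, firstEditTurn - baselineTurn) * TURNS_PER_DELAYED_TURN;
}

// Without context, first edit lands around turn 8: three delayed turns,
// roughly four extra turns of session length.
console.log(extraTurns(8));
```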
So we stopped injecting information. We started injecting confidence: instead of telling the agent what’s important, we tell it which files to edit.
For the full research story, see docs/research.md. All 30+ experiment writeups are in docs/experiments/.
Clarté is the experimental application of these findings. It parses your source code with tree-sitter, builds a weighted dependency graph from imports, call sites and git history, and on every prompt predicts which files need editing. The predictions go to a pre-flight agent that reads each target once and returns exact edit locations.
The full query pipeline runs in under 100ms. The Architecture section has the math.
```shell
npx @michaelabrt/clarte
```
Zero config. Works with Claude Code, Cursor, Copilot, Windsurf, Cline and OpenCode. TypeScript, Python, Go, Rust, Java.
```shell
npm install -g @michaelabrt/clarte --omit=optional
```
These are promising but based on limited evaluation. Treat them as directional, not definitive.
Real-world tests - 5 bug fixes in open-source repos (opaque prompts, Claude Sonnet, small n per task):
| Task | Repo | Without Clarté | With Clarté | n |
|------|------|----------------|-------------|---|
| JSX async context loss | Hono | wrong file, did not finish | correct file, 2 min to first edit | 2+2 |
| Form validator prototype pollution | Hono | did not finish | completed (18 turns) | 1+1 |
| SQLite simple-enum array | TypeORM | 47.7 turns | 16.3 turns (-66%) | 3+3 |
| WebSocket adapter shutdown | NestJS | 53 turns | 38 turns (-28%) | 7+7 |
| URL fragment stripping | Hono | completed, high variance | completed, 3x more consistent | 8+8 |
Baseline completed 3/5 within budget. With Clarté, 5/5. These are the controlled, reproducible runs from a larger iterative development process (hundreds of sessions across more tasks and repos). The 32 experiment writeups and 7 studies document the full research arc.
Fixture benchmarks (v0, context file only - no hooks or pre-flight):
| Metric | Without Context | With Context | Delta | Significance |
|--------|----------------|--------------|-------|--------------|
| Wall-clock time (median) | 130s | 98s | -25% | p<0.001, small effect |
| Turns (median) | 16 | 11.5 | -28% | p<0.001, medium effect |
| Input tokens (median) | 272K | 108K | -60% | p<0.001, large effect |
135 sessions (Claude Sonnet 4.6), 9 opaque tasks, statistical testing with Wilcoxon signed-rank, bootstrap CIs, Benjamini-Hochberg FDR correction and Cliff’s delta effect sizes. Methodology and full reports in the benchmark repo.
This project benefits from wider testing; if you’re interested, contributions, replications and skepticism are all welcome.
# Architecture

```mermaid
graph TD
    subgraph offline ["Build Phase · offline"]
        A[tree-sitter] --> B[Dependency Graph]
        C[git log] --> D[Change Coupling]
        B --> E["HITS · Betweenness · Communities"]
        D --> F[Bayesian EWMA Priors]
        E & D --> G[Logistic Fusion Training]
    end
    subgraph prompt ["Query Phase · per prompt · sub-100ms"]
        H[Task Prompt] --> I["① BM25F Seed Resolution"]
        I --> J["② LSA Seed Expansion"]
        J --> K["③ Katz Propagation"]
        K --> L["④ Score Fusion"]
        L --> M[Pre-flight Agent]
    end
    B -.-> I
    G -.-> L
    F -.-> K
    M --> N((Agent))
```
You submit a task: "fix the JWT session leak." Two problems need solving.
Lexical matching. The query tokens "JWT" and "session" should match files like auth/jwt.ts or session/manager.ts. Clarté runs true multi-field BM25F (Robertson et al. 2004) across three document fields: file path segments, exported symbol names and import statements, each with independent length normalization and field weights.
Path segments are weighted 2x higher than symbols. auth/middleware.ts tells you more about a session-handling bug than a function named validate. Import names get 0.5x because they signal consumption, not definition. The query is tokenized with camelCase splitting, compound-word preservation and domain-specific synonym expansion (auth → authentication, db → database). IDF is computed globally across the corpus.
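The tokenization step can be sketched as follows. This is illustrative only: the splitting rules are simplified, and the synonym table entries beyond the two examples above are assumptions, not Clarté's actual rules.

```typescript
// Illustrative query tokenizer: camelCase splitting plus a small
// domain-synonym table. Synonym entries are assumptions for this sketch.
const SYNONYMS: Record<string, string[]> = {
  auth: ["authentication"],
  db: ["database"],
};

function tokenize(query: string): string[] {
  const tokens = query
    // Split camelCase boundaries: "validateJWT" -> "validate JWT".
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
  // Expand synonyms while keeping each original token.
  return tokens.flatMap((t) => [t, ...(SYNONYMS[t] ?? [])]);
}

console.log(tokenize("fix authDb validateJWT leak"));
```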
Each query term's contribution is its IDF weighted by a saturated pseudo-term-frequency that blends all three fields before applying the k₁ = 1.2 saturation constant. Blending before saturating is what makes this true BM25F rather than per-field BM25+.
$$\text{score}(d, q) = \sum_{t \in q} \text{IDF}(t) \cdot \frac{\widetilde{tf}(t, d)}{\widetilde{tf}(t, d) + k_1}$$
$$\widetilde{tf}(t, d) = \sum_{f \in \lbrace \text{path, sym, imp} \rbrace} w_f \cdot \frac{tf_{f}(t, d)}{1 - b_f + b_f \cdot |d_f| \, / \, \overline{dl}_f}$$
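A direct transcription of the two formulas, as a sketch rather than Clarté's implementation. The field weights and b values mirror the parameter table below; b_imp is not given in the excerpt, so 0.5 here is a placeholder assumption, and corpus IDF is taken as a precomputed input.

```typescript
type Field = "path" | "sym" | "imp";

const K1 = 1.2;
const W: Record<Field, number> = { path: 2.0, sym: 1.0, imp: 0.5 };
// b_imp = 0.5 is an assumed placeholder; b_path and b_sym come from the table.
const B: Record<Field, number> = { path: 0.3, sym: 0.4, imp: 0.5 };
const FIELDS: Field[] = ["path", "sym", "imp"];

interface Doc {
  tf: Record<Field, Record<string, number>>; // per-field term counts
  len: Record<Field, number>;                // per-field lengths
}

function bm25f(
  doc: Doc,
  query: string[],
  idf: Record<string, number>,          // precomputed global IDF
  avgLen: Record<Field, number>,        // average field lengths over corpus
): number {
  let score = 0;
  for (const t of query) {
    // Weighted pseudo-term-frequency: blend all fields BEFORE saturation.
    let tfTilde = 0;
    for (const f of FIELDS) {
      const tf = doc.tf[f][t] ?? 0;
      const norm = 1 - B[f] + B[f] * (doc.len[f] / avgLen[f]);
      tfTilde += W[f] * (tf / norm);
    }
    // Single saturation per term, shared across fields (true BM25F).
    score += (idf[t] ?? 0) * (tfTilde / (tfTilde + K1));
  }
  return score;
}
```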
Three post-processing steps refine the candidate set: spreading activation propagates scores along import edges for 3 hops with 0.5^(hop-1) decay; test proxy scoring transfers test file scores to their source files at 0.6x (test paths encode what they cover); and an import ceiling caps re-export barrels at 0.5x the minimum direct-match score.
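The spreading-activation step can be sketched like this, assuming a simple per-seed breadth-first walk where a file h hops from a seed receives the seed's score scaled by 0.5^(h-1); the graph representation is an assumption for the sketch.

```typescript
// Propagate seed scores along import edges for up to 3 hops,
// decaying by 0.5^(hop-1). Contributions from multiple seeds add up.
function spreadActivation(
  seeds: Map<string, number>,           // file -> initial match score
  imports: Map<string, string[]>,       // file -> files it imports
  hops = 3,
  decay = 0.5,
): Map<string, number> {
  const out = new Map(seeds);
  for (const [seed, score] of seeds) {
    let frontier = [seed];
    const seen = new Set([seed]);       // visit each file once per seed
    for (let h = 1; h <= hops; h++) {
      const contribution = score * Math.pow(decay, h - 1);
      const next: string[] = [];
      for (const file of frontier) {
        for (const dep of imports.get(file) ?? []) {
          if (seen.has(dep)) continue;
          seen.add(dep);
          out.set(dep, (out.get(dep) ?? 0) + contribution);
          next.push(dep);
        }
      }
      frontier = next;
    }
  }
  return out;
}
```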
Conceptual matching. BM25F will never connect a bug report about "session tokens" to a file named SessionGuard.ts that exports validateJWT. No surface tokens overlap.
Latent Semantic Analysis bridges this gap. We build a file-symbol incidence matrix and compute a rank-32 approximation via randomized truncated SVD (Halko-Martinsson-Tropp algorithm). Files project into a 32-dimensional latent space where cosine similarity captures shared structural role rather than shared tokens.
The top BM25F seeds are averaged into a centroid vector. Non-seed files within cosine distance 0.3 enter the candidate pool at 0.4x discount, expanding the set with up to 5 conceptually related files. Activates only on codebases with 50+ files; below that, BM25F alone has sufficient coverage.
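The expansion step above can be sketched as follows, assuming the rank-32 latent vectors are already computed; the candidate's score here is represented simply as the 0.4x discount factor applied to an admitted file.

```typescript
// Cosine similarity between two latent vectors of equal dimension.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Average the seed vectors into a centroid, then admit the closest
// non-seed files within cosine distance 0.3, capped at 5, at 0.4x.
function expandSeeds(
  vectors: Map<string, number[]>,       // file -> latent (e.g. rank-32) vector
  seeds: Set<string>,
  maxDist = 0.3,
  discount = 0.4,
  limit = 5,
): Map<string, number> {
  const dim = vectors.get([...seeds][0])!.length;
  const centroid = new Array<number>(dim).fill(0);
  for (const s of seeds) {
    vectors.get(s)!.forEach((v, i) => (centroid[i] += v / seeds.size));
  }
  const admitted = [...vectors]
    .filter(([file]) => !seeds.has(file))
    .map(([file, v]) => [file, 1 - cosine(centroid, v)] as [string, number])
    .filter(([, dist]) => dist <= maxDist)
    .sort((x, y) => x[1] - y[1])        // closest first
    .slice(0, limit);
  return new Map(admitted.map(([file]) => [file, discount] as [string, number]));
}
```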
Sub-millisecond for typical codebases (1,000 files, 20 imports/file).
| Parameter | Value | Role |
|-----------|-------|------|
| k₁ | 1.2 | Saturation constant |
| w_path | 2.0 | Path field weight |
| w_sym | 1.0 | Symbol field weight |
| w_imp | 0.5 | Import field weight |
| b_path | 0.3 | Path length normalization |
| b_sym | 0.4 | Symbol length normalization |