by johunsang
Fast, AI-agent-native code search in Rust — hybrid BM25 + semantic, Tree-sitter AST chunking, dependency & impact analysis. Drop-in replacement for grep/cat/read/ls in Claude Code, Codex, Cursor, Aider, OpenHands.
# Add to your Claude Code skills
git clone https://github.com/johunsang/semble_rs
semble_rs is a Rust port and superset of MinishLab/semble built for AI coding agents. It returns the exact code chunks an agent needs, prints a token-cheap codebase tree instead of ls -R, and compresses 3 MB CI logs into 35 KB. One single binary, no daemon, no API keys, no GPU. Hybrid BM25 + Model2Vec static embeddings with code-aware reranking, plus a dependency graph, AST chunking, and a digest pipeline for build / test / CI output.
# Install Rust if needed, then:
git clone https://github.com/johunsang/semble_rs.git && cd semble_rs
cargo install --path .
The binary lands at ~/.cargo/bin/semble_rs. On first run, the default embedding model minishlab/potion-code-16M (~60 MB) is downloaded from HuggingFace.
# Map the codebase (replaces ls -R)
semble_rs tree ./my-project --symbols
# Find code by what it does (replaces grep + cat)
semble_rs search "how is auth handled" ./my-project --outline
# Compress build / CI output before reading it
cargo build 2>&1 | semble_rs digest
gh run view <id> --log-failed | semble_rs digest
For agent integration (Claude Code, Codex, Cursor), see Agent integration.
tree collapses ls -R by 4×–747×; --outline is -47% vs full output; digest reaches -98.9% on real GitHub Actions logs.
deps / impact show what a file imports and defines, and what is affected if you change it. Optional Graphviz --dot output.
digest auto-detects cargo, pnpm/npm/yarn/bun, tsc, pytest, go test, gradle, ruff, mypy, clang/gcc/cmake/make/swiftc, GitHub Actions.
semble_rs search "auth flow" ./my-project --outline # pass 1: structural overview
semble_rs search "loginWithEmail" ./my-project --compact # pass 2: matching lines
semble_rs search "save model" https://github.com/MinishLab/model2vec # git URL
path defaults to the current directory; git URLs are accepted (cloned shallow).
| Mode | Output | Token cost vs --compact | When to use |
| --- | --- | --- | --- |
| --outline | One signature line per chunk | -47% | First-pass structural scan |
| --group | Directory grouping + match lines capped at 3 (+N overflow) | -47% | Many match lines per chunk |
| --compact | Score + path + every matching line | baseline | Precision scan |
| --json --strip | Chunk bodies (comments stripped) | +800% | Tooling / pipeline integration |
| --json | Chunk bodies (raw) | +900% | Tooling / pipeline integration |
Recommended: --outline to overview → --compact to narrow → --json --strip only if the chunk body itself is needed.
find-related: given a file:line from a previous search result, returns chunks semantically similar to that location.
semble_rs find-related src/auth.rs 42 ./my-project
plan: when the agent doesn't know where to start, plan runs a small search and prints a recommended sequence of --outline / --group / --compact / deps / impact commands.
semble_rs plan "fix auth flow bug" ./my-project -k 5
plan is a guardrail, not an oracle: low-confidence candidates are leads, not facts. Skip it when the symbol or feature name is already known.
--model: all search-side commands accept --model <hf-repo-or-local-path> to override the default embedder. The SEMBLE_MODEL_PATH environment variable is also honoured.
semble_rs tree prints the codebase file tree using the same gitignore-aware index as search. It exists because ls -R on a real project explodes into tens or hundreds of thousands of tokens (.git/, target/, node_modules/ all included). Measured on real repos:
| Project | semble_rs tree | ls -R | Reduction |
| --- | --- | --- | --- |
| this repo (Rust + target/) | 533 B | 398,101 B | 747× |
| 6,693-file Python backend | 3,950 B | 254,066 B | 64× |
| 325-file ML training repo | 838 B | 7,522 B | 9× |
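The Reduction column follows directly from the two byte counts in each row. A small Python check, using only the figures from the table above (the rounding to the nearest integer is an assumption about how the column was produced):

```python
# Byte counts copied from the table above: (semble_rs tree, ls -R).
measurements = {
    "this repo (Rust + target/)": (533, 398_101),
    "6,693-file Python backend": (3_950, 254_066),
    "325-file ML training repo": (838, 7_522),
}


def reduction_factor(tree_bytes: int, ls_bytes: int) -> int:
    """How many times smaller the tree output is, rounded to the nearest integer."""
    return round(ls_bytes / tree_bytes)


factors = {name: reduction_factor(*sizes) for name, sizes in measurements.items()}
for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

The ratio shrinks on the smaller repo mostly because there is less ignored content (build output, vendored dependencies) for ls -R to walk into.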
semble_rs tree # current directory
semble_rs tree -d # directories only
semble_rs tree --max-depth 2 # cap depth
semble_rs tree --symbols # append top-level symbols per file
semble_rs tree --lang rust,python # filter by language
semble_rs digest collapses build / test / install / CI output. Errors, file:line:col, tracebacks, panic stacks, and failed-step bodies are always preserved — only progress lines collapse to counts.
cargo build 2>&1 | semble_rs digest
pnpm install 2>&1 | semble_rs digest
pytest 2>&1 | semble_rs digest
gh run view <id> --log-failed | semble_rs digest
Measured on 15 real-world fixtures:
| Fixture | Raw → digest | Savings |
| --- | --- | --- |
| cargo build (clean, 218 crates) | 7,611 B → 59 B | -99.2% |
| cargo test (45 passing) | 3,368 B → 369 B | -89.0% |
| pnpm install | 1,323 B → 349 B | -73.6% |
| tsc (13 errors, 5 codes) | 1,085 B → 648 B | -40.3% |
| pytest (4 failures) | 2,762 B → 2,330 B | -15.6% |
| GitHub Actions log (rust-lang/rust failed CI, real) | 3.3 MB → 35 KB | -98.9% ⭐ |
| go test (with panic + stack) | 1,034 B → 475 B | -54.1% |
| gradle test (2 failures) | 1,232 B → 522 B | -57.6% |
| ruff / mypy / clang / cmake / swift | varies | -3% to -30% |
| TOTAL (15 fixtures) | 3.33 MB → 43 KB | -98.7% |
Auto-detection covers cargo, pnpm/npm/yarn/bun, tsc, pytest, go test, gradle, ruff, mypy, clang/gcc/cmake/make/swiftc, GitHub Actions. Force a handler with --format <name>; inspect with --show-format.
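To make the idea of "progress lines collapse to counts" concrete, here is a minimal Python sketch of that behaviour. It is not the semble_rs implementation: the `PROGRESS` pattern, the `collapse_progress` name, the `[verb xN]` summary format, and the sample log are all assumptions chosen for illustration, loosely modelled on cargo's output.

```python
import re
from collections import Counter

# Assumption: progress lines start with a cargo-style verb such as "Compiling".
PROGRESS = re.compile(r"^\s*(Compiling|Downloading|Downloaded|Checking)\s")


def collapse_progress(log: str) -> str:
    """Keep every non-progress line verbatim; replace progress lines with per-verb counts."""
    counts = Counter()
    kept = []
    for line in log.splitlines():
        match = PROGRESS.match(line)
        if match:
            counts[match.group(1)] += 1
        else:
            kept.append(line)  # errors and file:line:col locations pass through untouched
    summary = [f"[{verb} x{n}]" for verb, n in counts.items()]
    return "\n".join(summary + kept)


# Made-up sample log, not real tool output.
sample = "\n".join([
    "   Compiling serde v1.0.0",
    "   Compiling regex v1.0.0",
    "error[E0425]: cannot find value `x` in this scope",
    " --> src/main.rs:3:5",
])
digest = collapse_progress(sample)
print(digest)
```

The point of the design is asymmetry: anything that looks like progress is lossy-compressed, while anything that might be a diagnostic is kept byte for byte, so the savings depend on how much of the log is progress noise.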
semble_rs deps src/auth.rs ./my-project # what this file imports / defines (flat)
semble_rs deps src/auth.rs ./my-project --tree # transitive imports as ASCII tree
semble_rs deps src/auth.rs ./my-project --tree --max-depth 3
semble_rs deps src/auth.rs ./my-project --dot | dot -Tpng > deps.png
semble_rs impact src/auth.rs ./my-project # who depends on this file (flat list)
semble_rs impact src/auth.rs ./my-project --tree # reverse-dependency tree
semble_rs impact src/auth.rs ./my-project --dot | dot -Tpng > impact.png
--tree (v0.9.1+) renders forward (deps) or reverse (impact) dependencies as an ASCII tree, with cycle detection (repeated nodes are marked (cycle)) and --max-depth N truncation (shown as …). The output needs no external tool and is agent-readable.
impact is intended to be run before edits to a shared module to avoid surprises.
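The tree rendering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: `render_tree`, `invert`, the box-drawing layout, and the example import graph are hypothetical, and the cycle rule used here (a node is a cycle if it already appears on the current path) is one plausible reading of "repeated nodes".

```python
def render_tree(graph, root, max_depth=None):
    """Render the edges reachable from root as an indented ASCII tree."""
    lines = [root]

    def walk(node, prefix, path, depth):
        children = graph.get(node, [])
        if max_depth is not None and depth >= max_depth and children:
            lines.append(f"{prefix}└── …")  # truncated by the depth cap
            return
        for i, child in enumerate(children):
            last = i == len(children) - 1
            branch = "└── " if last else "├── "
            if child in path:
                # Already on the current path: mark it and stop descending.
                lines.append(f"{prefix}{branch}{child} (cycle)")
                continue
            lines.append(f"{prefix}{branch}{child}")
            walk(child, prefix + ("    " if last else "│   "), path | {child}, depth + 1)

    walk(root, "", {root}, 0)
    return "\n".join(lines)


def invert(graph):
    """Turn 'file -> imports' edges into 'file -> importers' edges (the impact direction)."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return reverse


# Hypothetical import graph containing a cycle: auth -> db -> config -> auth.
graph = {
    "src/auth.rs": ["src/db.rs", "src/token.rs"],
    "src/db.rs": ["src/config.rs"],
    "src/config.rs": ["src/auth.rs"],
}
tree = render_tree(graph, "src/auth.rs")
print(tree)
print(render_tree(invert(graph), "src/auth.rs", max_depth=1))
```

Rendering the same graph through `invert` gives the reverse view, which is the question impact answers: which files are reached when walking importer edges from the file being changed.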
find-pattern: a thin wrapper around ast-grep for structural queries that semantic search can't express:
semble_rs find-pattern 'fn $name($$$)' . --lang rust --compact
Requires ast-grep installed (brew install ast-grep or cargo install ast-grep).
semble_rs encode exposes the embedding model as a CLI for scripting and debugging:
semble_rs encode "search result scoring" # one vector → JSON array
echo -e "auth\nlogin\ntoken" | semble_rs encode # stdin, one sentence per line
semble_rs encode "x" --model minishlab/potion-multilingual-128M
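One use for scripted embeddings is comparing two phrases directly. The sketch below computes cosine similarity between vectors parsed from JSON arrays. It assumes each encode call prints exactly one JSON array of floats, as the comment on the first command suggests; the three vectors are toy stand-ins, not real model output, and `cosine` is a name introduced here.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy stand-in vectors. Real ones could be captured with something like
#   subprocess.run(["semble_rs", "encode", "auth"], capture_output=True, text=True).stdout
# and parsed with json.loads, assuming stdout holds a single JSON array.
auth = json.loads("[1.0, 0.0, 1.0]")
login = json.loads("[1.0, 0.0, 1.0]")
token = json.loads("[0.0, 1.0, 0.0]")

print(cosine(auth, login))  # identical direction: close to 1
print(cosine(auth, token))  # orthogonal: 0
```

Checking a handful of such pairs is a quick way to sanity-test a custom --model before pointing search at it.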
Append a snippet like the following to your project-root CLAUDE.md or AGENTS.md. It works for Claude Code, Codex, Cursor (.cursorrules), Aider, and OpenHands.
## Code search and exploration
Use `semble_rs` instead of `ls -R`, `grep`, `cat`:
```bash
semble_rs tree . --symbols # codebase map (cheap)
semble_rs search "<feature or symbol>" . --outline # pass 1
semble_rs search "<feature or symbol>" . --compact # pass 2
semble_rs deps <file> . # what file imports / defines
semble_rs impact <file> . # files affected by changes
```
Compress noisy command output before reading it:
```bash
cargo build 2>&1 | semble_rs digest
pnpm install 2>&1 | semble_rs digest
gh run view <id> --log-