Open-source context engine that catches AI hallucinations and cuts your token bill by 70–95%. The only AI helper that shows its work. Works with Claude · Cursor · Codex · GPT & custom providers.
# Add to your Claude Code skills
git clone https://github.com/juyterman1000/entroly

PRISM weights can shift as local feedback accumulates. The dashboard shows the current ranking weights and confidence signals.
Run `entroly demo` on your own repo. The dashboard shows estimated token reduction per request and cumulative value tracking.
Run `entroly benchmark --compare-baseline` to see how context quality improves as PRISM learns which files matter for your workflow.
Measured (HaluEval-QA, standard protocol): WITNESS scores AUROC 0.80 and 84.9% accuracy at catching unsupported answers, at $0 and ~2 ms per decision with no LLM call. On identical data it statistically ties `gpt-4o-mini` as a judge and beats the published GPT-3.5 judge (62.6%). Threshold-free number, reproducible, no cherry-picking: see the full results and reproduce command.
Use WITNESS when you want model answers checked against supplied evidence before you trust them:
entroly witness --context-file evidence.txt --output-file answer.txt --mode strict
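
For scripted checks, the same invocation can be wrapped from Python. This is a minimal sketch that only assembles and runs the command shown above. The helper names `witness_command` and `run_witness` are hypothetical, and nothing is assumed about the output format or exit codes of `entroly witness`, so the raw process result is returned for the caller to inspect.

```python
import subprocess
from typing import List


def witness_command(context_file: str, output_file: str, mode: str = "strict") -> List[str]:
    """Build the argv for the `entroly witness` call shown above.

    Only flags that appear in the README are used.
    """
    return [
        "entroly", "witness",
        "--context-file", context_file,
        "--output-file", output_file,
        "--mode", mode,
    ]


def run_witness(context_file: str, output_file: str, mode: str = "strict") -> subprocess.CompletedProcess:
    """Run the check and return the raw result (hypothetical helper).

    Interpreting stdout, stderr, and the return code is left to the caller,
    because their meaning is not documented in this section.
    """
    return subprocess.run(
        witness_command(context_file, output_file, mode),
        capture_output=True,
        text=True,
        check=False,
    )
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.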
Proxy mode attaches proof certificate headers to every non-streaming JSON response. The full certificate is available from the sidecar URL in X-Entroly-Witness-Id; use --witness-embed only if you want certificates embedded into the provider JSON body:
entroly proxy --witness audit # headers + sidecar certificate
entroly proxy --witness strict --witness-profile rag # suppress unsupported factual claims
entroly proxy --witness strict --witness-profile summary # warn on unknowns to reduce over-suppression
entroly proxy --witness audit --witness-nli # use OpenAI NLI when OPENAI_API_KEY is set
Profiles tune false-positive behavior by workload: rag, qa, benchmark_qa, and code fail closed in strict mode; summary, chat, and dialogue suppress contradictions but warn on unknown claims. JSON/structured outputs are audited with sidecar certificates and left byte-valid instead of being rewritten.
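
The profile behavior described above can be pictured as a small decision table. The sketch below is one reading of that paragraph, not Entroly's actual code: the claim labels (`supported`, `contradicted`, `unknown`), the action names, and the function `strict_action` are all assumptions introduced for illustration.

```python
# Profile names are taken from the README; everything else is illustrative.
FAIL_CLOSED = {"rag", "qa", "benchmark_qa", "code"}
WARN_ON_UNKNOWN = {"summary", "chat", "dialogue"}


def strict_action(profile: str, claim_status: str) -> str:
    """Return 'pass', 'warn', or 'suppress' for one claim in strict mode.

    claim_status is a hypothetical label: 'supported', 'contradicted',
    or 'unknown' (no evidence either way).
    """
    if profile not in FAIL_CLOSED | WARN_ON_UNKNOWN:
        raise ValueError(f"unknown profile: {profile!r}")
    if claim_status == "supported":
        return "pass"
    if claim_status == "contradicted":
        # Both profile groups suppress contradictions.
        return "suppress"
    if claim_status == "unknown":
        # Fail-closed profiles drop unverified claims; the others only warn.
        return "suppress" if profile in FAIL_CLOSED else "warn"
    raise ValueError(f"unknown claim status: {claim_status!r}")
```

Under this reading, `strict_action("rag", "unknown")` gives `"suppress"` while `strict_action("summary", "unknown")` gives `"warn"`, which is the difference that reduces over-suppression for summarization workloads.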
Certificate UX:
curl http://localhost:9377/witness/{id} # full proof path + evidence
curl "http://localhost:9377/witness?limit=10" # recent certificates
curl -X POST http://localhost:9377/witness/{id}/feedback \
-H "Content-Type: application/json" \
-d '{"verdict":"false_positive"}'
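
The same calls can be made from Python with the standard library. This is a sketch assuming the endpoints and port shown in the curl examples above; the helper names are hypothetical, and the response schema is not assumed, so `send` returns raw bytes.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9377"  # host and port taken from the curl examples


def certificate_request(witness_id: str) -> urllib.request.Request:
    """GET request for one full certificate (proof path + evidence)."""
    return urllib.request.Request(
        f"{BASE_URL}/witness/{quote(witness_id, safe='')}", method="GET"
    )


def feedback_request(witness_id: str, verdict: str = "false_positive") -> urllib.request.Request:
    """POST request that reports feedback on a certificate."""
    body = json.dumps({"verdict": verdict}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/witness/{quote(witness_id, safe='')}/feedback",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> bytes:
    """Send a prepared request to a running proxy and return the raw body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from sending it makes the helpers easy to inspect or log before the proxy is running. The only verdict value shown in the source is `false_positive`; other values would need to be checked against the project's documentation.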
The live dashboard also shows recent WITNESS certificates, flagged claims, proof/evidence snippets, suppression counts, and false-positive feedback totals when the proxy is running.
Current scope: non-streaming responses can be rewritten before return. In strict or annotate streaming mode, Entroly buffers the upstream stream, verifies it, then emits a verified SSE response; audit streaming mode remains pass-through and records certificates after completion. Optional NLI verification is batched with a latency budget and falls back to deterministic local PAV if the provider call fails.
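
The difference between the two streaming behaviors can be sketched as follows. This is an illustration of the buffer-then-verify idea and the pass-through idea only, not Entroly's implementation: the `verify` and `record` callbacks are hypothetical, and the single-event SSE framing is an assumption about the wire format.

```python
from typing import Callable, Iterable, Iterator


def sse_event(text: str) -> str:
    """Frame text as one Server-Sent Event; each line gets its own data field."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


def verified_stream(upstream: Iterable[str], verify: Callable[[str], str]) -> Iterator[str]:
    """Strict/annotate style: buffer everything, verify, then emit.

    Nothing reaches the client until the upstream stream has finished and
    the hypothetical `verify` callback has returned the text to release.
    """
    full_text = "".join(upstream)
    yield sse_event(verify(full_text))


def audit_stream(upstream: Iterable[str], record: Callable[[str], None]) -> Iterator[str]:
    """Audit style: forward chunks unchanged, record the full text afterwards."""
    seen = []
    for chunk in upstream:
        seen.append(chunk)
        yield chunk
    record("".join(seen))
```

The trade-off is visible in the code: the buffered path adds latency equal to the full upstream generation time but can change what the client sees, while the audit path keeps time-to-first-token unchanged and can only report after the fact.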
Example trace from this repo's local development vault:
[detect] gap observed → entity="auth", miss_count=3
[synthesize] StructuralSynthesizer ($0, deterministic, no LLM)
[benchmark] skill=ddb2e2969bb0 → fitness 1.0 (1 pass / 0 fail, 338 ms)
[promote] status: draft → promoted
[spend] $0.0000 — invariant C_spent ≤ τ·S(t) holds
Compression did not reduce measured accuracy in these release benchmarks. Results below were measured with gpt-4o-mini; intervals are Wilson 95% confidence intervals.
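
The Wilson intervals reported for these benchmarks can be reproduced with the standard Wilson score formula. The sketch below assumes z = 1.96 for a 95% interval and that each accuracy corresponds to a whole number of correct answers (for example, 64.0% of n = 50 is 32 correct); the function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, as fractions in [0, 1]."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to guard against floating-point drift at p = 0 or p = 1.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(32, 50)  # LongBench baseline: 64.0% of 50
print(f"{low * 100:.1f}-{high * 100:.1f}%")  # prints 50.1-75.9%
```

With n = 20 and 20 correct, the same function gives a lower bound of 83.9%, which is why a perfect score on a small sample still carries a wide interval.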
| Benchmark | n | Budget | Baseline (95% CI) | With Entroly (95% CI) | Retention | Token Savings |
|---|---|---|---|---|---|---|
| NeedleInAHaystack | 20 | 2K | 100% [83.9-100%] | 100% [83.9-100%] | 100.0% | 99.5% |
| LongBench (HotpotQA) | 50 | 2K | 64.0% [50.1-75.9%] | 68.0% [54.2-79.2%] | 106.2% | 85.3% |
| Berkeley Function Calling | 50 | 500 | 100% [92.9-100%] | 100% [92.9-100%] | 100.0% | 79.3% |
| SQuAD 2.0 | 50 | 100 | 78.0% [64.8-87.2%] | 76.0% [62.6-85.7%] | 97.4% | 39.3% |
| GSM8K | 100 | 50K | 85.0% [76.7-90.7%] | 86.0% [77.9-91.5%] | 101.2% | pass-through¹ |
| MMLU | 100 | 50K | 82.0% [73.3-88.3%] | 85.9% [77.8-91.4%] | 104.7% | pass-through¹ |
| TruthfulQA (MC1) | 100 | 50K | 72.0% [62.5-79.9%] | 73.7% [64.3-81.4%] | 102.4% | pass-through¹ |
¹ pass-through: Context already fits within budget, so Entroly leaves it unchanged. Results vary by model, dataset, prompt shape,