by deonmenezes
Mantis Hack
# Add to your Claude Code skills
git clone https://github.com/deonmenezes/mantishack
Guides for using AI agent skills like mantishack.
stalk · wait · strike · hold
Ethically hack and discover vulnerabilities in any software with the power of AI.
mantishack.com · Upstream: github.com/gadievron/raptor
Mantishack is a fork of RAPTOR — the Recursive Autonomous Penetration Testing and Observation Robot by Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, and John Cartwright. The agentic workflow, the Semgrep + CodeQL pipeline, the multi-stage validation methodology, the persona library, and the offline registry packs all come from RAPTOR. Mantishack carries that work forward, rebrands the user-facing surface to the /mantis-* slash-command vocabulary, adds an automatic auth + logging audit lane (JWT, cookies, audit-log coverage), and ships under MIT with two coexisting copyrights.
Upstream licence: MIT © 2025-2026 Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, John Cartwright (see LICENSE). Fork-modification licence: MIT © 2026 Deon Menezes (see LICENSE-MANTISHACK). Combined attribution and modification log in NOTICE.
If you came here looking for the canonical project, please visit github.com/gadievron/raptor — that is where upstream development happens. If you want to make the framework better, open a PR upstream.
Mantishack is an autonomous security research framework built on top of Claude Code (but not tied to it — you can plug in your own analysis layer too). It chains together static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow you can run against a codebase or binary.
It is not polished software. The upstream is held together with enthusiasm and duct tape, and it works well enough that the upstream maintainers can't stop using it. This fork is the same — usable in the field, rough in the corners. Open issues upstream at gadievron/raptor.
# Clone the repo
git clone https://github.com/deonmenezes/mantishack.git
cd mantishack
# Install Python dependencies
pip install -r requirements.txt
# Install Claude Code (required)
npm install -g @anthropic-ai/claude-code
# Install Semgrep (required for scanning)
pip install semgrep
# Open Mantishack
claude
The devcontainer ships with everything pre-installed. Open the repo in VS Code with the "Dev Containers: Open Folder in Container" command, or build the image manually:
docker build -f .devcontainer/Dockerfile -t mantishack:latest .
docker run --privileged -it mantishack:latest
The --privileged flag is required for the rr deterministic debugger. The image is large (around 6 GB). It starts from the Microsoft Python 3.12 devcontainer and adds static analysis, fuzzing, and browser automation tooling.
Once inside, just say "hi" to get started, or jump straight to a command.
| Command | What it does | Status |
|---------|--------------|--------|
| /mantis-agentic | Full autonomous workflow: scan, auth+logging audit, validate, exploit, patch | Stable |
| /mantis-scan | Static analysis with Semgrep and CodeQL | Stable |
| /mantis-auth-audit | Automatic JWT + cookie + audit-log security check | Stable (fork addition) |
| /mantis-understand | Map attack surface, trace data flows, hunt vulnerability variants | Stable |
| /mantis-validate | Multi-stage exploitability validation pipeline (Stages 0–F) | Stable |
| /mantis-codeql | CodeQL-only deep analysis with SMT dataflow pre-screening | Stable |
| /mantis-exploit | Generate proof-of-concept exploit code | Beta |
| /mantis-patch | Generate secure patches for confirmed vulnerabilities | Beta |
| /mantis-fuzz | Binary fuzzing with AFL++ and crash analysis | Stable |
| /mantis-crash-analysis | Autonomous root-cause analysis for C/C++ crashes | Stable |
| /mantis-oss-forensics | Evidence-backed forensic investigation for GitHub repositories | Stable |
| /mantis-project | Named workspaces to organise runs and track findings over time | Stable |
| /mantis-sca | Software composition analysis | Stable |
| /mantis-cve-diff | Compare scanner runs across known CVE fixes | Stable |
| /mantis-web | Web application scanning | Alpha/stub |
Start by creating a project so all your runs land in one place:
/mantis-project create myapp --target /path/to/code # create a project first
/mantis-project use myapp # set it as active
/mantis-understand --map # map the attack surface
/mantis-agentic # scan, audit, validate, exploit, patch
/mantis-project findings # review everything in one place
/mantis-understand builds a context map of entry points, trust boundaries, and sinks before any scanning happens. /mantis-agentic then runs Semgrep and CodeQL, executes the auth + logging audit lane automatically, deduplicates findings, and dispatches each one for validation using the exploitation-validator methodology (Stages 0–F).
Findings that clear validation get exploit PoCs and patches generated. A cross-finding analysis runs at the end to find shared root causes and attack chains.
/mantis-validate runs this same pipeline as a standalone step if you already have findings from a previous scan.
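The deduplication step in /mantis-agentic can be pictured as collapsing findings that point at the same weakness in the same place. The following is a minimal sketch, not Mantishack's actual code: the finding schema (tool, path, line, cwe) and the choice of key are assumptions made for illustration.

```python
def dedupe_findings(findings):
    """Keep the first finding seen for each (path, line, cwe) key."""
    seen = set()
    unique = []
    for finding in findings:
        # Hypothetical key: two tools reporting the same CWE at the same
        # location are treated as one finding.
        key = (finding["path"], finding["line"], finding["cwe"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


findings = [
    {"tool": "semgrep", "path": "app/auth.py", "line": 42, "cwe": "CWE-89"},
    {"tool": "codeql", "path": "app/auth.py", "line": 42, "cwe": "CWE-89"},
    {"tool": "semgrep", "path": "app/views.py", "line": 7, "cwe": "CWE-79"},
]
unique_findings = dedupe_findings(findings)
```

Here the Semgrep and CodeQL reports for app/auth.py line 42 collapse into one entry, so only two findings would go on to validation.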
Mantishack automatically runs an auth + logging audit on every /mantis-agentic invocation. The same checks are exposed as a standalone /mantis-auth-audit slash command for faster, more targeted runs.
The lane uses Semgrep rules tagged mantis_capability: auth-audit plus pytest fixtures that assert audit-log coverage at runtime. What it looks for:
JWT (engine/semgrep/rules/auth/jwt-misuse.yaml):
- alg=none accepted (token forgery)
- missing exp claim (token never expires)

Cookies (engine/semgrep/rules/auth/cookie-security.yaml):
- missing HttpOnly (XSS-exfiltrable)
- missing Secure (plaintext-HTTP exposure)
- missing SameSite (CSRF)

Logging (engine/semgrep/rules/logging/missing-auth-audit.yaml):
- privilege changes (such as is_admin = True) with no audit log
- session_id written to logs (credential leak)

Pytest harness (conftest.py):
- @pytest.mark.auth_audit marker + assert_audit_log_emitted fixture: tests that exercise auth-sensitive code paths fail the run if (a) no INFO/WARN log was emitted, or (b) any log record contains a raw JWT / session id / bearer token.

Usage example for the pytest hook:
import pytest


@pytest.mark.auth_audit
def test_login_logs_failure(client, assert_audit_log_emitted):
    client.post("/login", data={"u": "alice", "p": "wrong"})
    # fixture teardown asserts an audit log was emitted and no credential leaked
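The two failure conditions the fixture enforces can be sketched as a plain function over captured log records. This is an illustration, not the code in conftest.py: the name audit_violations and both regular expressions are assumptions, and a real fixture would probably feed it the records captured by pytest's caplog. Session ids are left out because their format is application-specific.

```python
import logging
import re

# Assumed patterns: a three-part base64url JWT, and an HTTP-style bearer token.
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
BEARER_RE = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE)


def audit_violations(records):
    """Return a list of problems found in an iterable of logging.LogRecord."""
    problems = []
    # Condition (a): at least one INFO or WARNING record must exist.
    if not any(r.levelno in (logging.INFO, logging.WARNING) for r in records):
        problems.append("no INFO/WARN audit log emitted")
    # Condition (b): no record may contain a raw credential.
    for r in records:
        message = r.getMessage()
        if JWT_RE.search(message) or BEARER_RE.search(message):
            problems.append(f"credential leaked in log: {r.name}")
    return problems


clean = [logging.makeLogRecord(
    {"name": "auth", "levelno": logging.WARNING, "msg": "login failed for alice"})]
leaky = [logging.makeLogRecord(
    {"name": "auth", "levelno": logging.INFO,
     "msg": "token=eyJhbGciOiJub25lIn0.eyJzdWIiOiJhIn0.sig"})]
```

A fixture built this way would call the helper during teardown and fail the test whenever the returned list is non-empty.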
Run the standalone audit:
python3 mantishack.py scan --repo /path/to/code --policy-groups auth,logging
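To make the JWT and cookie checks concrete, the sketch below applies equivalent checks at runtime to a single token and a single Set-Cookie header, using only the standard library. It is not how the Semgrep rules work (those match source patterns statically), and the function names jwt_issues and cookie_issues are hypothetical.

```python
import base64
import json


def _b64url_json(segment):
    """Decode one unpadded base64url JWT segment into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url_encode(obj):
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def jwt_issues(token):
    """Flag alg=none and a missing exp claim. Does not verify signatures."""
    header_b64, payload_b64, _signature = token.split(".")
    header = _b64url_json(header_b64)
    payload = _b64url_json(payload_b64)
    issues = []
    if str(header.get("alg", "")).lower() == "none":
        issues.append("alg=none")
    if "exp" not in payload:
        issues.append("missing exp")
    return issues


def cookie_issues(set_cookie):
    """Return the security attributes absent from a Set-Cookie header value."""
    # The first segment is name=value; attribute names are case-insensitive.
    attrs = {part.split("=", 1)[0].strip().lower()
             for part in set_cookie.split(";")[1:]}
    return [flag for flag in ("httponly", "secure", "samesite")
            if flag not in attrs]


forged = f'{_b64url_encode({"alg": "none"})}.{_b64url_encode({"sub": "alice"})}.'
token_report = jwt_issues(forged)
cookie_report = cookie_issues("session=abc; Path=/; Secure")
```

The forged token trips both JWT checks, and the example cookie is reported as lacking HttpOnly and SameSite.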
Mantishack inherits RAPTOR's two-layer Z3 integration (pip install z3-solver). It is optional. Everything works without it, but the results are better with it.
Dataflow pre-screening (CodeQL) — When CodeQL produces a path result, the path constraints are checked for satisfiability before any LLM call is made. Paths that are provably unreachable get dropped immediately. For paths that are reachable, Z3 produces concrete candidate inputs that go into the analysis prompt.
One-gadget constraint analysis (binary feasibility) — During binary exploit feasibility assessment, Z3 checks whether a one-gadget's register and memory constraints are satisfiable against the concrete crash state. Gadgets are ranked by actual reachability rather than heuristics.
Z3 is pre-installed in the devcontainer. For manual installs: pip install z3-solver.
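A toy version of the dataflow pre-screening idea, assuming a path whose only constraints are a length guard and the input length needed to overflow a buffer at the sink. The constraint encoding, the function name screen_path, and its return values are invented for illustration; only the z3 calls themselves (Int, Solver, check, model) come from the z3-solver Python API. Because Z3 is optional, the sketch reports "skipped" when the import fails.

```python
try:
    import z3  # optional dependency: pip install z3-solver
except ImportError:
    z3 = None


def screen_path(guard_limit, needed):
    """Decide whether an input length can pass the guard and reach the sink.

    Returns ("skipped", None) without Z3, ("unreachable", None) when the
    constraints are unsatisfiable, or ("reachable", witness_length).
    """
    if z3 is None:
        return ("skipped", None)
    n = z3.Int("n")
    solver = z3.Solver()
    solver.add(n >= 0, n < guard_limit)  # guard on the path: length check
    solver.add(n >= needed)              # condition required at the sink
    if solver.check() != z3.sat:
        return ("unreachable", None)     # provably dead path, drop it
    return ("reachable", solver.model()[n].as_long())


dead = screen_path(guard_limit=64, needed=128)
live = screen_path(guard_limit=256, needed=128)
```

With Z3 installed, the first path is dropped as unreachable and the second yields a concrete length that could be handed to the analysis prompt as a candidate input.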
Semgrep scanning works fully offline. All registry packs that would normally be fetched from semgrep.dev at scan time are shipped in the repo under engine/semgrep/rules/registry-cache/. The scanner resolves pack IDs to local files before invoking semgrep, so no network call happens.
Cached packs: p/security-audit, p/owasp-top-ten, p/secrets, p/command-injection, p/jwt, p/default, p/xss.
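The pack-resolution step can be sketched as a lookup from a pack ID to a file in the cache directory. This is a guess at the mechanism, not the scanner's code: the function name resolve_pack and the one-file-per-pack layout (for example jwt.yaml for p/jwt) are assumptions, since the internal layout of registry-cache/ is not described here.

```python
import tempfile
from pathlib import Path


def resolve_pack(pack_id, cache_dir):
    """Map a registry pack ID such as 'p/jwt' to a cached rule file."""
    if not pack_id.startswith("p/"):
        raise ValueError(f"not a registry pack ID: {pack_id}")
    # Assumed layout: <cache_dir>/<pack name>.yaml
    candidate = Path(cache_dir) / f"{pack_id[2:]}.yaml"
    if not candidate.is_file():
        raise FileNotFoundError(f"pack {pack_id} is not in the offline cache")
    return candidate


# Demo against a throwaway directory standing in for the real cache.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "jwt.yaml").write_text("rules: []\n")
    resolved_name = resolve_pack("p/jwt", tmp).name
```

A resolved local path can be passed to semgrep with --config in place of the pack ID, which is what keeps the scan off the network.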
CodeQL needs network access only during initial setup to download the CLI and query packs. Once installed it runs offline.
Mantishack has two separate model layers, inherited from RAPTOR:
The orchestration layer is always Claude Code. The CLAUDE.md, skills, and commands all run as Claude Code instructions. To change which Claude model orchestrates Mantishack, use Claude Code's --model flag or the /model command inside a session.
The analysis dispatch layer is the LLM that analyses individual vulnerability findings. This is separate from the orchestration