by eadmin2
Iron-Man-style voice assistant + holographic HUD for Hermes Agent. Local Whisper STT, ElevenLabs voice, agent-summoned media panels, runs on your own hardware.
# Add to your Claude Code skills
git clone https://github.com/eadmin2/jarvis_ai
jarvis_ai is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by eadmin2. It has 71 GitHub stars.
jarvis_ai is primarily written in Python. It is open-source under eadmin2 on GitHub, so you can review or fork the full source.
A self-hosted, Iron-Man-style voice assistant and command center built on top of Hermes Agent (NousResearch's open-source autonomous agent). Talk to a real agent — one with persistent memory, terminal access, web search, file tools, and 80+ skills — through a glowing arc-reactor HUD in any browser on your LAN, or a push-to-talk client.
Everything runs on your own hardware. The only cloud calls are your LLM provider (via Hermes) and ElevenLabs for the voice. Speech-to-text is fully local (Whisper on CPU).
▶ Watch the demo on YouTube — live transcription, agent tool calls, holographic media panels, and the cinematic boot, all in real time.
Click the ring and speak. Your words transcribe live on screen while you talk. The transcript goes to Hermes Agent, which actually does things — reads and writes files, runs commands, searches the web, remembers you across sessions — and the reply streams back as speech, sentence by sentence, while the rest is still being generated. Typical round trip: 3–5 seconds.
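The sentence-by-sentence streaming described above can be illustrated with a small buffer that emits each sentence as soon as it is complete, so speech synthesis can start before the agent has finished its reply. This is a minimal sketch, not the project's implementation: `sentences_from_stream` is a hypothetical name, and the sentence boundary rule (`.`, `!`, or `?` followed by whitespace) is a simplifying assumption that will also split after abbreviations.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the remainder when the stream ends


if __name__ == "__main__":
    reply = ["Systems on", "line. Good mor", "ning. Anything", " else?"]
    for sentence in sentences_from_stream(reply):
        print(sentence)  # each sentence could be handed to TTS here
```

Each yielded sentence could be sent to the TTS provider immediately, which is what keeps the perceived latency low even for long replies.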
The HUD around the ring is a real control center:
- … hud_display); panels can fly into left/right thirds, and "clear the screen" sweeps them away
- … B: panels flicker in, ring spins up, "Systems online. Good morning."
- … large-v3-turbo at ~0.2 s, with automatic fallback to local Whisper when that machine is off

Browser HUD (any LAN device)    Host machine (tested on macOS / Apple Silicon)
── https/wss :443 ──────────┐   ┌──────────────────────────────────────┐
mic · speaker · panels      ├──►│  voice pipeline server (this repo)   │
                            │   │  STT: faster-whisper (local, free)   │   ┌─────────────────┐
Push-to-talk client         │   │  TTS: ElevenLabs Flash (streaming)   ├──►│  Hermes Agent   │
── ws :8765 ────────────────┘   │  HUD + auth + dashboard TLS proxy    │   │  API :8642 (lo) │
                                └──────────────────────────────────────┘   │  memory · tools │
                                                                           │  skills · cron  │
                                                                           └─────────────────┘
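The automatic fallback to local Whisper mentioned above could look roughly like the sketch below. It is not taken from the project: `transcribe_with_fallback`, `remote`, and `local` are hypothetical names, and it assumes the client for the GPU worker raises an `OSError` subclass (for example `ConnectionRefusedError` or `TimeoutError`) when that machine is off.

```python
from typing import Callable, Optional, Tuple

# A transcriber takes raw audio bytes and returns the recognized text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    remote: Optional[Transcriber],
    local: Transcriber,
) -> Tuple[str, str]:
    """Try the remote (GPU) transcriber first, then fall back to the local one.

    Returns (text, backend), where backend is "remote" or "local".
    """
    if remote is not None:
        try:
            return remote(audio), "remote"
        except OSError:
            # Assumption: an unreachable worker surfaces as an OSError
            # subclass (connection refused, timeout, and so on).
            pass
    return local(audio), "local"


if __name__ == "__main__":
    def worker_is_off(audio: bytes) -> str:
        raise ConnectionRefusedError("GPU worker unreachable")

    def local_whisper(audio: bytes) -> str:
        return "hello from local whisper"  # stand-in for a real model call

    print(transcribe_with_fallback(b"\x00\x01", worker_is_off, local_whisper))
```

Passing the transcribers in as plain callables keeps the fallback logic independent of any particular STT library.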
One brain, many faces: voice and typed chat share a single persistent Hermes session, so each knows what you said to the other — and memory survives every restart.
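To make the shared-session idea concrete, here is a toy model of one conversation history that several front ends append to and that is reloaded after a restart. In the real project the session lives inside Hermes Agent; the `SharedSession` class, its JSON file, and the field names below are purely illustrative assumptions.

```python
import json
from pathlib import Path
from typing import Dict, List


class SharedSession:
    """One conversation history shared by every front end and persisted to disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload earlier turns so the history survives a restart.
        self.turns: List[Dict[str, str]] = (
            json.loads(path.read_text()) if path.exists() else []
        )

    def add(self, source: str, role: str, text: str) -> None:
        """Record a turn from any front end ("voice", "chat", ...) and save it."""
        self.turns.append({"source": source, "role": role, "text": text})
        self.path.write_text(json.dumps(self.turns))

    def history(self) -> List[Dict[str, str]]:
        """Return a copy of all turns, regardless of which front end added them."""
        return list(self.turns)


if __name__ == "__main__":
    import tempfile

    store = Path(tempfile.mkdtemp()) / "session.json"
    SharedSession(store).add("voice", "user", "Remember that the code word is mango.")
    # A fresh object simulates a restart or a second front end such as typed chat.
    print(SharedSession(store).history())
```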
Full walkthrough in docs/SETUP.md. Short version:
# 1. Enable the Hermes Agent API server
cat >> ~/.hermes/.env <<EOF
API_SERVER_ENABLED=true
API_SERVER_KEY=$(python3 -c 'import secrets;print(secrets.token_urlsafe(32))')
JARVIS_HUD_TOKEN=$(python3 -c 'import secrets;print("jarvis-"+secrets.token_hex(3))')
ELEVENLABS_API_KEY=your-key-here
EOF
hermes gateway # or set up its LaunchAgent / service
# 2. This repo
git clone https://github.com/YOURNAME/jarvis-hermes-hud
cd jarvis-hermes-hud/server
python3 -m venv .venv
.venv/bin/pip install fastapi uvicorn requests pyyaml numpy anthropic \
RealtimeSTT faster-whisper silero-vad websockets psutil
cp config/server.example.yaml config/server.yaml # edit: your ElevenLabs voice_id etc.
scripts/make-certs.sh # self-signed TLS (browser mic needs it)
scripts/make-boot-audio.sh YourName # one-time boot greeting synthesis
# 3. Run
.venv/bin/python server.py
# open https://YOUR_HOST/hud/ → accept cert → enter your JARVIS_HUD_TOKEN → talk
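The two secrets written to ~/.hermes/.env in step 1 come from Python's `secrets` module. The sketch below reproduces those calls and adds a constant-time comparison of the kind a server might use when checking a presented HUD token. The function names are hypothetical, and how the server actually validates the token is an assumption, since it is not shown here.

```python
import hmac
import secrets


def make_api_key() -> str:
    """Same call as step 1: 32 random bytes, URL-safe base64 encoded."""
    return secrets.token_urlsafe(32)


def make_hud_token() -> str:
    """Same shape as JARVIS_HUD_TOKEN in step 1: 'jarvis-' plus 6 hex characters."""
    return "jarvis-" + secrets.token_hex(3)


def token_matches(presented: str, expected: str) -> bool:
    """Compare in constant time so response timing does not leak partial matches."""
    return hmac.compare_digest(presented.encode(), expected.encode())


if __name__ == "__main__":
    expected = make_hud_token()
    print(expected)                             # random on every run
    print(token_matches(expected, expected))    # True
    print(token_matches("jarvis-000000x", expected))  # False: different length
```

Note that `token_hex(3)` yields only 24 bits of randomness, which is convenient to type on a phone but is best kept to a trusted LAN.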
For auto-start on boot, see launchd/ (macOS) — the plists document two non-obvious macOS traps (external-drive TCC and log paths) that cost us an evening.
| Action | How |
|---|---|
| Talk | Click the ring (or Space) · speak · click again to send |
| Stop the agent | red ■ STOP button or Esc |
| Barge in | click the ring while it's speaking |
| Typed chat | input bar at the bottom (same conversation as voice) |
| Cinematic boot | press B |
| Kanban / dashboards | VIEWS panel → animated pop-up viewers |
| Phone | open the HUD → Add to Home Screen |
server/ FastAPI voice pipeline + HUD host (the core of this project)
server/hud/ single-file HUD (vanilla JS, no build step)
server/scripts/ start/stop/health/smoke + cert & boot-audio generators
client/ optional Windows/Linux push-to-talk Python client (wake word capable)
worker/ optional GPU sidecars: big-model STT server + stats agent for the Machines panel
hermes-plugin/ Hermes tool plugin: lets the agent summon/dismiss HUD media panels
launchd/ macOS auto-start templates with hard-won TCC + FD-limit notes
docs/ SETUP, ARCHITECTURE (protocols/endpoints), TROUBLESHOOTING
Built on Hermes Agent by Nous Research. HUD aesthetics inspired by jarvis-dashboard. STT by faster-whisper / RealtimeSTT. Voice by ElevenLabs.
MIT — see LICENSE. Use it, fork it, build your own Jarvis.