by siddsachar
Thoth - Personal AI Sovereignty. A local-first AI assistant with integrated tools, a personal knowledge graph, voice, vision, shell, browser automation, scheduled tasks, health tracking, and messaging channels. Run locally via Ollama or add opt-in cloud models. Your data stays on your machine.
# Add to your Claude Code skills

```shell
git clone https://github.com/siddsachar/Thoth
```

Thoth is a local-first AI assistant built for personal AI sovereignty — your models, your data, your rules. It combines a powerful ReAct agent with 25 integrated tools (70 sub-operations) — web search, email, calendar, file management, shell access, browser automation, vision, image generation, long-term memory with a personal knowledge graph, scheduled tasks, habit tracking, and more — plus a plugin system with a built-in marketplace and a multi-channel messaging framework (Telegram with full media support; more channels coming). Run everything locally via Ollama, or add opt-in cloud models (GPT, Claude, Gemini, and more) when you need frontier reasoning or don't have a GPU. Either way, your data — conversations, memories, documents, and history — stays on your machine.
Local models are already amazing; you'll be surprised what a 14B+ local model can do. And if you start with cloud models today, you can transition to fully local, fully private, fully free AI as local models get smarter and hardware gets cheaper, seamlessly and with no changes to your setup.
Governments are investing billions to keep AI infrastructure within their borders. Thoth applies the same principle to the individual — your compute, your data, your choice of model, accountable to no one but you.
🖥️ One-click install on Windows & macOS — download, run, done. No terminal, no Docker, no config files. Get it here.
In ancient Egyptian mythology, Thoth (𓁟) was the god of wisdom, writing, and knowledge — the divine scribe who recorded all human understanding. Like its namesake, this tool is built to gather, organize, and faithfully retrieve knowledge — while keeping everything under your control.
📖 Every feature below is documented in full technical detail in docs/ARCHITECTURE.md.
LangGraph-based autonomous agent with 25 tools / 70 sub-operations — the agent decides which tools to call, how many times, and in what order. Real-time token streaming with thinking model support (DeepSeek-R1, Qwen3, QwQ — collapsible reasoning bubbles). Smart context management via tiktoken: auto-summarization at 80% capacity, proportional tool-output shrinking, and dynamic tool budgets that adapt to available headroom. Destructive actions require explicit confirmation; orphaned tool calls are auto-repaired; recursive loops are caught with a wind-down warning at 75%.
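The context-management policy above can be sketched in a few lines. This is an illustrative reconstruction, not Thoth's actual code: the function names and the character-based "token" proxy are assumptions; only the 80% threshold and the proportional-shrinking idea come from the description.

```python
SUMMARIZE_THRESHOLD = 0.80  # auto-summarize when context reaches 80% of capacity

def needs_summarization(used_tokens: int, capacity: int) -> bool:
    """True once the conversation crosses 80% of the model's context window."""
    return used_tokens >= capacity * SUMMARIZE_THRESHOLD

def shrink_tool_outputs(outputs: list[str], budget: int) -> list[str]:
    """Proportionally truncate tool outputs so their combined size fits `budget`.

    Every output keeps the same fraction of its original length, so large
    outputs absorb most of the cut while small ones stay nearly intact.
    """
    total = sum(len(o) for o in outputs)
    if total <= budget:
        return outputs
    ratio = budget / total
    return [o[: max(1, int(len(o) * ratio))] for o in outputs]
```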
Thoth builds a personal knowledge graph — entities (person, place, event, preference, fact, project, organisation, concept, skill, media) linked by typed directional relations (Dad --[father_of]--> User), with alias resolution, auto-linking on save, memory decay, and background orphan repair. The agent can save, search, link, and explore memories through natural conversation. Graph-enhanced auto-recall retrieves semantically similar entities via FAISS and expands 1 hop in the NetworkX graph before every LLM call. An interactive Knowledge tab visualizes the full graph with search, entity-type filters, ego-graph toggle, and clickable detail cards. Background extraction produces structured triples with deterministic cross-category dedup.
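Graph-enhanced auto-recall boils down to two steps: find semantically similar seed entities, then pull in their immediate neighbors. A minimal sketch, assuming a plain keyword match and an adjacency dict as stand-ins for the FAISS vector search and NetworkX graph the description names:

```python
def recall(query: str,
           entities: dict[str, str],
           relations: dict[str, list[tuple[str, str]]]) -> set[str]:
    # 1. Seed match (stand-in for FAISS semantic similarity search).
    seeds = {name for name, desc in entities.items() if query.lower() in desc.lower()}
    # 2. Expand one hop along typed relations (stand-in for NetworkX neighbors).
    recalled = set(seeds)
    for seed in seeds:
        for _relation, neighbor in relations.get(seed, []):
            recalled.add(neighbor)
    return recalled
```

Expanding one hop means a query that matches "Dad" also surfaces the entities Dad is linked to, which is what lets relational context reach the LLM without an explicit graph query.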
Export the entire knowledge graph as an Obsidian-compatible markdown vault — one .md file per entity with YAML frontmatter, [[wiki-links]], and per-type indexes. Entities grouped by type (wiki/person/, wiki/project/, …); sparse entities roll up into index files. Live export on save/delete, full-text search, and conversation export. The agent has 5 sub-tools (wiki_search, wiki_read, wiki_rebuild, wiki_stats, wiki_export_conversation) to interact with the vault directly.
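The per-entity note format described above (YAML frontmatter plus `[[wiki-links]]`) can be sketched like this; the field names and section heading are illustrative, not Thoth's actual schema:

```python
def entity_to_markdown(name: str, etype: str, description: str, links: list[str]) -> str:
    """Render one entity as an Obsidian-compatible note."""
    frontmatter = "\n".join(["---", f"type: {etype}", f"title: {name}", "---"])
    body = description
    if links:
        # Related entities become [[wiki-links]], which Obsidian resolves to
        # the sibling notes produced by the same export.
        body += "\n\n## Related\n" + "\n".join(f"- [[{link}]]" for link in links)
    return f"{frontmatter}\n\n# {name}\n\n{body}\n"
```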
A background daemon that refines the knowledge graph during idle hours — merging duplicates (≥0.93 similarity), enriching thin descriptions from conversation context, and inferring missing relationships between co-occurring entities. Three-layer anti-contamination system prevents cross-entity fact-bleed: sentence-level excerpt filtering, deterministic post-enrichment validation, and hardened prompts. Configurable 1–5 AM window; all operations logged to a dream journal viewable in the Activity tab.
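The duplicate-merge step can be illustrated with a simple pairwise scan. The 0.93 threshold matches the description above, but `difflib` string similarity here is only a stand-in for whatever embedding-based similarity the daemon actually computes:

```python
import difflib

MERGE_THRESHOLD = 0.93  # pairs at or above this similarity are merge candidates

def find_merge_pairs(names: list[str]) -> list[tuple[str, str]]:
    """Return entity-name pairs similar enough to merge."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= MERGE_THRESHOLD:
                pairs.append((a, b))
    return pairs
```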
Uploaded documents are processed through a map-reduce LLM pipeline that extracts structured knowledge into the graph. Documents are split into windows, summarized, then reduced into a coherent article; core entities and relations are extracted with full source provenance. Supports PDF, DOCX, TXT, Markdown, HTML, and EPUB. Live progress pill in the status bar with phase indicator and stop button. Per-document cleanup removes vector store entries and all extracted entities.
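The map-reduce shape of the pipeline is easy to sketch with the LLM call abstracted away as a pluggable `summarize` callable; the fixed-size character windows here are an assumption standing in for however Thoth actually splits documents:

```python
from typing import Callable

def map_reduce(text: str, window_chars: int,
               summarize: Callable[[str], str]) -> str:
    # Map: split the document into windows and summarize each independently.
    windows = [text[i:i + window_chars] for i in range(0, len(text), window_chars)]
    partials = [summarize(w) for w in windows]
    # Reduce: summarize the concatenated partials into one coherent article,
    # from which entities and relations are then extracted.
    return summarize("\n".join(partials))
```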
Run fully local via Ollama (39 curated tool-calling models) or connect OpenAI / OpenRouter for 100+ cloud models (GPT, Claude, Gemini) — switchable per-thread and per-task from the GUI. First-launch wizard offers Local or Cloud paths; star favorites for quick access; cloud vision models are auto-detected. Privacy controls disable memory extraction and auto-recall for cloud threads. Smart context trimming reduces token usage and cloud API costs.
Toggle-based voice input with local faster-whisper STT (4 model sizes, CPU-only int8) — no cloud APIs. Neural TTS via Kokoro with 10 voices (US/British, male/female), streaming sentence-by-sentence with automatic mic gating during playback. Combine both for a fully hands-free conversational experience.
Full shell access with 3-tier safety — safe commands (ls, git status) auto-execute, moderate commands (rm, pip install) require confirmation, dangerous commands (shutdown, reboot, mkfs) are blocked outright. Persistent sessions per thread, inline terminal panel, command history saved to disk. Background tasks support per-task command prefix allowlists.
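The 3-tier policy amounts to classifying a command by its first word. A minimal sketch using only the example commands from the text (the real allowlists are surely longer):

```python
SAFE = {"ls", "pwd", "git"}                  # auto-execute
DANGEROUS = {"shutdown", "reboot", "mkfs"}   # blocked outright

def classify(command: str) -> str:
    """Return 'auto', 'confirm', or 'blocked' for a shell command."""
    parts = command.split()
    head = parts[0] if parts else ""
    if head in DANGEROUS:
        return "blocked"
    if head in SAFE:
        return "auto"
    # Everything else (rm, pip install, ...) is moderate: ask the user first.
    return "confirm"
```

Defaulting unknown commands to "confirm" rather than "auto" is the safe choice: only an explicit allowlist skips the prompt.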
Autonomous browsing in a visible Chromium window — navigate, click, type, scroll, and manage tabs through natural conversation. Accessibility-tree snapshots with numbered element references; per-thread tab isolation; persistent login profile; smart snapshot compression for context efficiency; crash recovery and automatic browser detection (Chrome → Edge → Playwright).
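Numbered element references work by flattening the accessibility tree into labeled lines the model can cite back ("click [2]"). A sketch under the assumption that each node carries a role and an accessible name; the node shape is illustrative:

```python
def number_snapshot(nodes: list[dict]) -> tuple[list[str], dict[int, dict]]:
    """Render interactive nodes as numbered lines and keep a ref -> node map."""
    lines, refs = [], {}
    for i, node in enumerate(nodes, start=1):
        refs[i] = node  # the agent's "[i]" resolves back to this node
        lines.append(f"[{i}] {node['role']} \"{node['name']}\"")
    return lines, refs
```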
Camera capture, screen capture, and workspace image file analysis via local or cloud vision models. Cloud models with vision capability (GPT-4o, Claude) are auto-detected. Images displayed inline in chat; configurable vision model selection.
Unified task engine powered by APScheduler with 7 schedule types (daily, weekly, weekdays, weekends, interval, cron, one-shot delay). Template variables ({{date}}, {{time}}, {{task_id}}), multi-step prompt chaining, channel delivery (Telegram/Email), per-task model override, and configurable background permissions. Monitoring/polling patterns let the agent self-disable when conditions are met. Home-screen dashboard with task tiles, activity monitor, and run history.
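Template expansion for scheduled prompts can be sketched as plain substitution. The three variables come from the description above; the date and time formats are assumptions:

```python
from datetime import datetime

def render_prompt(template: str, task_id: str, now: datetime) -> str:
    """Expand {{date}}, {{time}}, and {{task_id}} in a scheduled-task prompt."""
    values = {
        "{{date}}": now.strftime("%Y-%m-%d"),
        "{{time}}": now.strftime("%H:%M"),
        "{{task_id}}": task_id,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template
```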
A generic Channel ABC lets any messaging platform plug into Thoth — channels declare capabilities (photo, voice, documents, reactions, buttons) and the system auto-generates tools and settings UI for each one. Telegram is the first full-featured channel: inbound voice transcription (faster-whisper), photo analysis (Vision), document handling with text extraction (P