Turn your scattered AI coding sessions into a queryable knowledge graph. Multi-platform (Claude Code, ChatGPT, DeepSeek, Grok, Warp), W3C ontology, Wikidata entity linking, SPARQL.
```bash
# Add to your Claude Code skills
git clone https://github.com/robertoshimizu/session-graph
```
Developers use 5+ AI tools every day -- Claude Code, ChatGPT, Cursor, Copilot, Grok, DeepSeek, Warp. Each session is an isolated silo. Knowledge dies when the tab closes.
You have solved the same problem three times across different tools and cannot find any of them. You debugged a Supabase auth flow in Claude Code last Tuesday, discussed the same pattern in ChatGPT a month ago, and asked Grok about JWT refresh tokens somewhere in between. None of these tools talk to each other.
Existing solutions are single-platform and flat-file. They give you search over one tool's history, not structured relationships across all of them. A grep over session logs does not tell you that FastAPI uses Pydantic or that Neo4j is a type of graph database. It just gives you walls of text.
session-graph fixes this.
session-graph extracts structured knowledge triples -- (subject, predicate, object) -- from all your AI coding sessions, links entities to Wikidata for universal disambiguation, and loads everything into a SPARQL-queryable triplestore with full provenance back to the source conversation.
```
"What technologies have I used across all sessions?"  -->  SPARQL query  -->  structured answer
"How does FastAPI relate to Pydantic?"                -->  FastAPI --uses--> Pydantic
"What sessions discussed authentication?"             -->  3 sessions across Claude Code + DeepSeek
```
The key insight: a knowledge graph without relationships is just a tag cloud. The minimum viable extraction unit is (subject, predicate, object), not [topic1, topic2, topic3].
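The difference is concrete: triples form a graph you can traverse, while a topic list can only be searched. A minimal sketch with hypothetical data (not the project's actual API):

```python
from collections import defaultdict

# Hypothetical triples of the kind the extractor produces.
triples = [
    ("FastAPI", "uses", "Pydantic"),
    ("Neo4j", "isA", "graph database"),
    ("session-42", "mentions", "FastAPI"),
]

# Index by subject so relationships are traversable, not just greppable.
index = defaultdict(list)
for s, p, o in triples:
    index[s].append((p, o))

def relate(subject, obj):
    """Return the predicates linking subject to obj, if any."""
    return [p for p, o in index[subject] if o == obj]

print(relate("FastAPI", "Pydantic"))  # ['uses']
```

A tag cloud of `[FastAPI, Pydantic, Neo4j]` could never answer the `relate` question; the predicate is the payload.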
Entity aliases resolve via owl:sameAs: "k8s", "kubernetes", and "K8s" all resolve to Q22661306.

From real-world usage across 52 sessions:
| Metric | Value |
|--------|-------|
| Total triples in Fuseki | 1,334,432 |
| Sessions indexed | 607+ |
| Knowledge triples extracted | 47,868+ |
| Distinct entities | ~8,000+ |
| Wikidata-linked entities | 4,774 (~33%) |
| Curated predicates | 24 (with <1% relatedTo fallback) |
| Platforms supported | 4 (Claude Code, DeepSeek, Grok, Warp) |
| Entity linking precision | 7/7 (agentic ReAct linker) |
| Cost per 600 sessions | ~$0.60 (Vertex AI batch pricing) |
Real data from SPARQL — technologies, concepts, and session provenance linked across multiple Claude Code sessions:

Hub nodes (large blue) are highly connected technologies. Green nodes are concepts/outputs. Purple rectangles are session IDs with dashed provenance edges. The "W" badge indicates entities linked to Wikidata.
```
Scattered Sources         Adapter Layer                 Knowledge Graph
-----------------         -------------                 ---------------
Claude Code (.jsonl) --+
DeepSeek (.json zip) --+  triple_extraction.py
Grok (.json zip)     --+--> (LLM extracts s,p,o  -->    Apache Jena Fuseki
Warp (SQLite)        --+   from each assistant          (SPARQL endpoint)
ChatGPT (planned)    --+   message using 24                   |
Cursor (planned)     --+   curated predicates)                |
                               |                              v
                               v                        SPARQL Queries
                        link_entities.py                (14 local templates
                        (LangGraph ReAct                 + 6 Wikidata templates)
                         agent links to                       |
                         Wikidata QIDs)                       v
                                                        Claude Code Skill
                                                        (natural language -> SPARQL)
```
Real-time Loop (Claude Code):

```
Session pause → stop_hook.sh → RabbitMQ → pipeline-runner → Fuseki
(triple cache: 0 API calls for seen messages)
```
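The triple cache can be sketched as a content-hash lookup; the table name, keying scheme, and function names here are illustrative assumptions, not the project's actual implementation:

```python
import hashlib
import json
import sqlite3

# Hypothetical cache: key each assistant message by a content hash so
# re-processing a session makes zero LLM API calls for already-seen messages.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triple_cache (msg_hash TEXT PRIMARY KEY, triples TEXT)")

def extract_with_cache(message: str, llm_extract) -> list:
    key = hashlib.sha256(message.encode()).hexdigest()
    row = db.execute(
        "SELECT triples FROM triple_cache WHERE msg_hash = ?", (key,)
    ).fetchone()
    if row:                            # cache hit: no API call
        return json.loads(row[0])
    triples = llm_extract(message)     # cache miss: call the LLM once
    db.execute("INSERT INTO triple_cache VALUES (?, ?)", (key, json.dumps(triples)))
    db.commit()
    return triples

calls = []
def fake_llm(msg):
    calls.append(msg)
    return [["FastAPI", "uses", "Pydantic"]]

extract_with_cache("msg-1", fake_llm)
extract_with_cache("msg-1", fake_llm)  # second pass hits the cache
print(len(calls))  # 1
```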
```
1. SOURCE PARSING (per platform --> RDF Turtle)
   Each parser reads a platform-specific format and produces
   PROV-O + SIOC session structure plus knowledge triples.

2. TRIPLE EXTRACTION (LLM-powered)
   Each assistant message --> LLM --> top 10 (subject, predicate, object) triples
   24 curated predicates | capped at 10 triples/message (prioritizes architecture)
   Closed-world vocabulary (deviations fuzzy-matched) | retry on JSON truncation
```
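The fuzzy-matching step could look roughly like this, using a small assumed subset of the curated vocabulary (the real predicate list and matcher may differ):

```python
import difflib

# Illustrative subset of the 24 curated predicates (names assumed).
CURATED = ["uses", "dependsOn", "implements", "configures", "debuggedWith", "relatedTo"]

def normalize_predicate(raw: str, cutoff: float = 0.6) -> str:
    """Snap an off-vocabulary predicate onto the closed-world vocabulary,
    falling back to relatedTo when nothing is close enough."""
    match = difflib.get_close_matches(raw, CURATED, n=1, cutoff=cutoff)
    return match[0] if match else "relatedTo"

print(normalize_predicate("depends_on"))  # dependsOn
print(normalize_predicate("xyzzy"))       # relatedTo
```

Keeping the vocabulary closed is what makes the graph queryable: a SPARQL query for `dependsOn` would silently miss triples if LLM variants like `depends_on` leaked through.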
```
3. ENTITY FILTERING (two-level)
   Level 1: is_valid_entity() in triple_extraction.py -- rejects garbage at extraction
   Level 2: is_linkable_entity() in link_entities.py  -- pre-filters before Wikidata
   Catches: filenames (*.py), hex colors (#8776f6), CLI flags (--force),
            ICD codes (j458), snake_case identifiers, DOM selectors, etc.
   48 whitelisted short terms bypass filters (ai, api, llm, rdf, sql, etc.)
```
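An illustrative sketch of this kind of two-level filter; the patterns and whitelist below are assumptions, not the actual `is_valid_entity()`:

```python
import re

# Assumed subset of the 48 whitelisted short terms that bypass the length filter.
SHORT_WHITELIST = {"ai", "api", "llm", "rdf", "sql"}

# Hypothetical reject patterns mirroring the categories listed above.
REJECT = [
    re.compile(r".*\.\w{1,4}$"),         # filenames: utils.py, app.ts
    re.compile(r"^#[0-9a-f]{3,8}$"),     # hex colors: #8776f6
    re.compile(r"^--\w[\w-]*$"),         # CLI flags: --force
    re.compile(r"^[a-z]\d{2,3}$"),       # ICD-style codes: j458
    re.compile(r"^[a-z]+(_[a-z]+)+$"),   # snake_case identifiers
]

def is_valid_entity(label: str) -> bool:
    lowered = label.lower()
    if lowered in SHORT_WHITELIST:       # whitelist wins before any other rule
        return True
    if len(lowered) < 3:
        return False
    return not any(p.match(lowered) for p in REJECT)

print([e for e in ["FastAPI", "utils.py", "#8776f6", "--force", "api"]
       if is_valid_entity(e)])  # ['FastAPI', 'api']
```

Filtering before linking matters for cost as well as precision: every entity that survives is a candidate for a Wikidata lookup.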
```
4. ENTITY LINKING (context-aware, agentic)
   For each entity:
   +-- Normalize via entity_aliases.json (161 mappings: k8s --> kubernetes, etc.)
   +-- Frequency filter: --min-sessions 2 (default) -- only links entities
   |   appearing in 2+ sessions (~77% reduction)
   +-- Check SQLite cache
   +-- If miss --> LangGraph ReAct agent (LLM + Wikidata API tool)
   +-- Confidence threshold 0.7 --> owl:sameAs link
   +-- Entity dedup: same QID --> owl:sameAs between aliases
```
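The first two stages, alias normalization and the frequency filter, can be sketched like this, with an assumed subset of entity_aliases.json (the real file has 161 mappings):

```python
# Illustrative subset of entity_aliases.json (mappings assumed).
ALIASES = {"k8s": "kubernetes", "K8s": "kubernetes", "postgres": "postgresql"}

def normalize(label: str) -> str:
    """Resolve a surface form to its canonical label, trying exact then
    lowercase lookup, and falling back to the label itself."""
    return ALIASES.get(label, ALIASES.get(label.lower(), label))

def linkable(entity_sessions: dict, min_sessions: int = 2) -> list:
    """Keep only entities that appear in >= min_sessions distinct sessions
    (the --min-sessions frequency filter)."""
    return sorted(e for e, s in entity_sessions.items() if len(s) >= min_sessions)

seen = {"kubernetes": {"s1", "s2"}, "fastapi": {"s1"}}
print(linkable(seen))      # ['kubernetes']
print(normalize("k8s"))    # kubernetes
```

Normalizing before the frequency count is the important ordering: "k8s" in one session and "kubernetes" in another should count as two sessions for the same entity.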
```
5. LOAD  --> Apache Jena Fuseki (SPARQL endpoint)

6. QUERY --> SPARQL (via Claude Code skill or directly)
```
| Platform | Parser | Format | Status |
|----------|--------|--------|--------|
| Claude Code | jsonl_to_rdf.py | JSONL | Production |
| DeepSeek | deepseek_to_rdf.py | JSON zip export | Production |
| Grok | grok_to_rdf.py | JSON (MongoDB export) | Production |
| Warp | warp_to_rdf.py | SQLite | Production |
| ChatGPT | -- | JSON export | Planned |
| Cursor | -- | SQLite / Markdown | Planned |
| VS Code Copilot | -- | JSON | Planned |
All parsers produce the same RDF schema. Entities merge by label across platforms.
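A sketch of that common output shape as a tiny Turtle emitter; the prefixes, class choices, and property names are assumptions for illustration, not the exact schema the parsers emit:

```python
def message_to_turtle(session_id: str, msg_id: str, platform: str, text: str) -> str:
    """Emit a PROV-O + SIOC shaped Turtle fragment for one message.
    Illustrative only: real parsers share a richer schema."""
    return f"""\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix ex:   <http://example.org/sessions/> .

ex:{session_id} a sioc:Thread ;
    prov:wasAttributedTo ex:{platform} .

ex:{msg_id} a sioc:Post ;
    sioc:has_container ex:{session_id} ;
    sioc:content '''{text}''' .
"""

print(message_to_turtle("s1", "s1-m3", "claude_code", "FastAPI uses Pydantic"))
```

Because every parser emits the same shape, a single SPARQL query over `sioc:has_container` spans Claude Code, DeepSeek, Grok, and Warp sessions alike.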
```bash
git clone https://github.com/robertoshimizu/session-graph.git
cd session-graph
./setup.sh
```
The setup script checks prerequisites, creates .env with your LLM provider, installs Python dependencies, starts Docker services (Fuseki + RabbitMQ), and runs a smoke test — all interactively.
After setup: http://localhost:3030 (Fuseki SPARQL UI) and http://localhost:15672 (RabbitMQ, devkg/devkg).
```bash
# 1. Configure
cp .env.example .env
# Edit .env with your LLM provider API key (see Provider Support below)

# 2. Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Create output directories
mkdir -p output/claude output/deepseek output/grok output/warp logs

# 4. Start all services (Fuseki + RabbitMQ + pipeline-runner)
docker compose up -d
# Fuseki SPARQL UI: http://localhost:3030
# RabbitMQ Management UI: http://localhost:15672 (devkg/devkg)

# 5. Process a single session (manual)
python -m pipeline.jsonl_to_rdf path/to/session.jsonl output/claude/session.ttl

# 6. Link entities to Wikidata
PYTHONUNBUFFERED=1 python -m pipeline.link_entities \
  --input output/*.ttl --output output/wikidata_links.ttl

# 7. Load into Fuseki (--auth required for Docker Fuseki)
python -m pipeline.load_fuseki output/*.ttl --auth admin:admin

# 8. Query at http://localhost:3030
```
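Besides the Fuseki UI, you can query the endpoint from the standard library; the dataset path `/ds/sparql` and the query pattern below are assumptions (check the Fuseki UI for your actual dataset name and the schema's real predicates):

```python
import urllib.parse
import urllib.request

# Dataset name 'ds' is an assumption; substitute your own.
FUSEKI = "http://localhost:3030/ds/sparql"

# Illustrative query shape, not the project's real schema.
QUERY = """
SELECT ?tech (COUNT(DISTINCT ?session) AS ?sessions)
WHERE { ?session ?p ?tech . }
GROUP BY ?tech ORDER BY DESC(?sessions) LIMIT 10
"""

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL-over-POST request (form-encoded 'query' parameter);
    send it with urllib.request.urlopen(req) once Fuseki is running."""
    data = urllib.parse.urlencode({"query": query}).encode()
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )

req = build_request(FUSEKI, QUERY)
print(req.get_method())  # POST
```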
With Docker Compose running, every Claude Code session is automatically processed:
```
Claude Code session ends
  → stop_hook.sh publishes to RabbitMQ (~33ms, non-blocking)
  → pipeline-runner container picks up the job
  → Extracts triples, generates .ttl, uploads to Fuseki
  → Failed jobs go to dead-letter queue for inspection
```
Configure the hook in ~/.claude/settings.json:
```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {"type": "command", "command": "/path/to/hooks/stop_hook.sh", "timeout": 5}
        ]
      }
    ]
  }
}
```
Once automatic processing is running, it only captures new sessions going forward. But you likely have weeks or months of past Claude Code sessions already sitting on disk — and that's where most of the value is.
Claude Code stores every session as a .jsonl file under ~/.claude/projects/. Each project directory contains one file per session. A typical developer accumulates hundreds of sessions over a few months. Bulk processing lets you backfill all of them into the knowledge graph in one shot.
This is optional but highly recommended. The more sessions in the graph, the richer the connections — you'll find patterns and relationships you didn't know existed across your past work.