by harumiWeb
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.
# Add to your Claude Code skills
git clone https://github.com/harumiWeb/exstruct
ExStruct reads Excel workbooks into structured data and applies patch-based editing workflows through a shared core. It provides extraction APIs, a JSON-first editing CLI, and an MCP server for host-managed integrations, with options tuned for LLM/RAG preprocessing, reviewable edit flows, and local automation.
Detection heuristics, editing workflows, and output modes are adjustable for LLM/RAG pipelines and local automation.
- light: cells + table candidates + print areas + shapes/charts (best-effort via direct OOXML parsing)
- libreoffice: best-effort non-COM mode for .xlsx/.xlsm. When the LibreOffice runtime is available, it adds merged cells, shapes, connectors, and charts
- standard: Excel COM mode with text-bearing shapes + arrows, charts, SmartArt, and merged-cell ranges
- verbose: outputs all shapes with width/height and also emits cell hyperlinks
- formulas_map (formula string -> cell coordinates) via openpyxl/COM. It is enabled by default in verbose and can be controlled with include_formulas_map.
- Output formats: JSON (--pretty for formatting), YAML, and TOON (optional dependencies).
- Use exstruct.edit only when you need the same patch contract from Python.
- In verbose mode, or with include_cell_links=True, cell links are emitted in links.

pip install exstruct
Optional extras:
- YAML: pip install pyyaml
- TOON: pip install python-toon
- Rendering (PDF/PNG): pip install pypdfium2 pillow (mode=libreoffice is not supported)
- All extras: pip install exstruct[yaml,toon,render]

Platform note:
- python3-uno. ExStruct probes a compatible system Python automatically for mode=libreoffice; if your environment needs an explicit interpreter, set EXSTRUCT_LIBREOFFICE_PYTHON_PATH=/usr/bin/python3.
- --probe mode before selection. An incompatible EXSTRUCT_LIBREOFFICE_PYTHON_PATH fails fast instead of surfacing a delayed bridge SyntaxError during extraction.

exstruct input.xlsx > output.json # compact JSON to stdout by default
exstruct input.xlsx -o out.json --pretty # write pretty JSON to a file
exstruct input.xlsx --format yaml # YAML (requires pyyaml)
exstruct input.xlsx --format toon # TOON (requires python-toon)
exstruct input.xlsx --sheets-dir sheets/ # write one file per sheet
exstruct input.xlsx --auto-page-breaks-dir auto_areas/ # always shown; execution requires standard/verbose + Excel COM
exstruct input.xlsx --alpha-col # output column keys as A, B, ..., AA
exstruct input.xlsx --include-backend-metadata # include shape/chart backend metadata
exstruct input.xlsx --mode light # cells + table candidates + best-effort OOXML shapes/charts
exstruct input.xlsx --mode libreoffice # best-effort extraction of shapes/connectors/charts without COM
exstruct input.xlsx --pdf --image # PDF and PNGs (Excel COM required)
Auto page-break export is available from both the API and the CLI when Excel/COM is available. The CLI always exposes --auto-page-breaks-dir, but validates it at execution time.
mode=libreoffice rejects --pdf, --image, and --auto-page-breaks-dir early, and mode=light also rejects --auto-page-breaks-dir. Use standard or verbose with Excel COM for those features.
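The rejections described above can be summarized as a small lookup table. This is a minimal sketch, not ExStruct's actual validation code: `REJECTED_FLAGS` and `rejected_flags` are hypothetical names, and the table encodes only the combinations stated in the text. Runtime requirements such as Excel COM being present are not modeled.

```python
# Hypothetical illustration of the early mode/flag rejections described above.
# Only the combinations stated in the text are encoded here.
REJECTED_FLAGS = {
    "libreoffice": {"--pdf", "--image", "--auto-page-breaks-dir"},
    "light": {"--auto-page-breaks-dir"},
}


def rejected_flags(mode, flags):
    """Return the sorted list of flags that the given mode rejects early."""
    return sorted(REJECTED_FLAGS.get(mode, set()) & set(flags))


if __name__ == "__main__":
    print(rejected_flags("libreoffice", ["--pdf", "--alpha-col"]))
```

A wrapper script could call such a helper before invoking the CLI to report an unsupported combination up front.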
By default, the CLI keeps legacy 0-based numeric string column keys ("0", "1", ...). Use --alpha-col when you need Excel-style keys ("A", "B", ...).
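To make the two key styles concrete, the sketch below converts a 0-based numeric string key into an Excel-style column label. It illustrates the mapping only and is not taken from ExStruct; the function name `column_key_to_alpha` and the flat row dictionary in the usage example are assumptions.

```python
def column_key_to_alpha(key):
    """Convert a 0-based numeric string key ("0", "1", ...) to "A", "B", ..., "AA"."""
    index = int(key)
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # Excel labels behave like a 1-based bijective base-26 number
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


if __name__ == "__main__":
    # Hypothetical row shape: column key -> cell value.
    row = {"0": "id", "1": "name", "26": "notes"}
    print({column_key_to_alpha(k): v for k, v in row.items()})
```

Subtracting one before each division is what makes "25" map to Z and "26" roll over to AA instead of producing a zero digit.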
By default, serialized shape/chart output omits backend metadata (provenance, approximation_level, confidence) to reduce token usage. Use --include-backend-metadata or the corresponding Python/MCP option when you need it.
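If you already have output that includes backend metadata and want the lighter form for a prompt, a post-processing step along these lines could drop the three fields. This is a rough sketch, not ExStruct's serializer: `strip_backend_metadata` is a hypothetical helper, and the nested shape dictionary in the usage example is made up for illustration.

```python
# Field names taken from the text above; everything else is illustrative.
BACKEND_METADATA_KEYS = {"provenance", "approximation_level", "confidence"}


def strip_backend_metadata(node):
    """Recursively drop backend metadata keys from nested dicts and lists."""
    if isinstance(node, dict):
        return {
            key: strip_backend_metadata(value)
            for key, value in node.items()
            if key not in BACKEND_METADATA_KEYS
        }
    if isinstance(node, list):
        return [strip_backend_metadata(item) for item in node]
    return node


if __name__ == "__main__":
    # Hypothetical shape record; real ExStruct output may be structured differently.
    shape = {"text": "Start", "confidence": 0.5, "children": [{"id": 1, "provenance": "x"}]}
    print(strip_backend_metadata(shape))
```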
exstruct patch --input book.xlsx --ops ops.json --backend openpyxl
exstruct patch --input book.xlsx --ops - --dry-run --pretty < ops.json
exstruct make --output new.xlsx --ops ops.json --backend openpyxl
exstruct ops list
exstruct ops describe create_chart --pretty
exstruct validate --input book.xlsx --pretty
- patch and make print JSON PatchResult to stdout.
- ops list / ops describe expose the public patch-op schema.
- validate reports workbook readability (is_readable, warnings, errors).

Recommended edit flow:
1. Run exstruct patch --dry-run and inspect PatchResult, warnings, and diff.
2. Use --backend openpyxl when you want the dry run and the real apply to use the same engine.
3. With --backend auto, inspect PatchResult.engine; on Windows/Excel hosts the real apply may switch to COM.
4. Re-run without --dry-run only after the result is acceptable.

ExStruct also ships one repo-owned Skill for agents that should follow the editing CLI safely instead of rediscovering the workflow each time.
Canonical repo source:
.agents/skills/exstruct-cli/

You can install it with the following single command:
npx skills add harumiWeb/exstruct/.agents/skills --skill exstruct-cli
If your runtime cannot use npx skills add, place the same folder manually
into a local skill directory that discovers SKILL.md-based skills.
Use this Skill when the agent needs help choosing between patch, make,
validate, ops list, and ops describe, or when it should follow the safe
validate -> dry-run -> inspect -> apply -> verify workflow.
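For scripted use, the validate -> dry-run -> inspect -> apply sequence might be driven from Python roughly as follows. This is a sketch under assumptions, not part of ExStruct: the helper names are hypothetical, the JSON key names `is_readable` and `warnings` are inferred from the field names mentioned in this document, and the policy of refusing to apply when any warning is present is an illustrative choice. It uses only the CLI flags shown above.

```python
import json
import subprocess


def patch_command(workbook, ops_file, dry_run):
    """Build the argv for `exstruct patch`, pinning the openpyxl backend."""
    argv = ["exstruct", "patch", "--input", workbook, "--ops", ops_file,
            "--backend", "openpyxl"]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run_json(argv):
    """Run a command and parse its stdout as JSON (raises on non-zero exit)."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def safe_patch(workbook, ops_file):
    """Validate, dry-run, inspect, then apply. Key names are assumptions."""
    report = run_json(["exstruct", "validate", "--input", workbook])
    if not report.get("is_readable", False):  # assumed key name
        raise RuntimeError("workbook is not readable: %r" % report)

    preview = run_json(patch_command(workbook, ops_file, dry_run=True))
    if preview.get("warnings"):  # assumed key name; illustrative policy
        raise RuntimeError("dry run reported warnings: %r" % preview["warnings"])

    # Same backend as the dry run, so the real apply uses the same engine.
    return run_json(patch_command(workbook, ops_file, dry_run=False))
```

Pinning --backend openpyxl in both calls follows the recommendation to keep the dry run and the real apply on the same engine.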
Example prompt for agents:
Use $exstruct-cli to choose the right ExStruct editing CLI command, follow a safe validate/dry-run/inspect workflow, and explain any backend constraints for this workbook task.
MCP is the integration / compatibility layer around the same editing core. Use
it when you need host-managed path restrictions, transport mapping, artifact
mirroring, or approval-aware agent execution. For ordinary Python workbook
editing, openpyxl / xlwings are usually a better fit. For local shell or
agent workflows, prefer the editing CLI.
uvx (recommended)

You can run it directly without installation:
uvx --from 'exstruct[mcp]' exstruct-mcp --root C:\data --log-file C:\logs\exstruct-mcp.log --on-conflict rename
Benefits:
- No pip install required
- Pin a version: uvx --from 'exstruct[mcp]==0.4.4' exstruct-mcp

You can also install it with pip:
pip install exstruct[mcp]
exstruct-mcp --root C:\data --log-file C:\logs\exstruct-mcp.log --on-conflict rename
Available tools:
| Tool name | Description |
| ------------------------------- | -------------------------------------- |
| exstruct_extract | Extracts data from a workbook. |
| exstruct_capture_sheet_images | Captures sheet images. |
| exstruct_make | Creates a new workbook. |
| exstruct_patch | Applies editing patches to a workbook. |
| `exstruct_read_json_chunk