by harumiWeb
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.
# Add to your Claude Code skills
git clone https://github.com/harumiWeb/exstruct
ExStruct reads Excel workbooks into structured data and applies patch-based editing workflows through a shared core. It provides extraction APIs, a JSON-first editing CLI, and an MCP server for host-managed integrations, with options tuned for LLM/RAG preprocessing, reviewable edit flows, and local automation.
Detection heuristics, editing workflows, and output modes are adjustable for LLM/RAG pipelines and local automation.
| Use case | Recommended interface | Why |
| --- | --- | --- |
| Write direct Python Excel-editing code | openpyxl / xlwings | Usually the better fit for imperative Python editing. Reach for exstruct.edit only when you specifically want ExStruct's patch contract in Python. |
| Run local operator or AI-agent edit workflows | exstruct patch, make, ops, validate | Canonical operational interface; JSON-first and dry-run friendly. |
| Run sandboxed or host-managed integrations | exstruct-mcp / MCP tools | Integration / compatibility layer that owns PathPolicy, transport, and artifact behavior. |
Extraction keeps the existing top-level Python API (extract, process_excel,
ExStructEngine) and the legacy exstruct INPUT.xlsx ... CLI entrypoint.
- Extraction modes: light (cells + table candidates + print areas only), libreoffice (best-effort non-COM mode for .xlsx/.xlsm; adds merged cells, shapes, connectors, and charts when the LibreOffice runtime is available), standard (Excel COM mode with texted shapes + arrows, charts, SmartArt, and merged-cell ranges), verbose (all shapes with width/height plus cell hyperlinks).
- formulas_map (formula string -> cell coordinates) via openpyxl/COM. It is enabled by default in verbose and can be controlled with include_formulas_map.
- Output formats: JSON (--pretty for formatting), YAML, and TOON (optional dependencies).
- Backend metadata: provenance, approximation_level, and confidence are omitted from serialized output by default. Enable them with --include-backend-metadata or include_backend_metadata=True.
- Use exstruct.edit only when you need the same patch contract from Python.
- In verbose mode, or with include_cell_links=True, cell links are emitted in links.
- In standard / verbose, PDF and sheet images can be generated when Excel COM is available.

pip install exstruct
Optional extras:
- YAML: pip install pyyaml
- TOON: pip install python-toon
- Rendering: pip install pypdfium2 pillow (mode=libreoffice is not supported)
- All extras: pip install exstruct[yaml,toon,render]

Platform note:
- Where Excel COM is unavailable, use mode=libreoffice as the best-effort rich mode or mode=light for minimal extraction. .xls is not supported in mode=libreoffice.
- On Linux, install libreoffice and python3-uno. ExStruct probes a compatible system Python automatically for mode=libreoffice; if your environment needs an explicit interpreter, set EXSTRUCT_LIBREOFFICE_PYTHON_PATH=/usr/bin/python3.
- Candidate interpreters are checked in --probe mode before selection. An incompatible EXSTRUCT_LIBREOFFICE_PYTHON_PATH fails fast instead of surfacing a delayed bridge SyntaxError during extraction.
- CI smoke jobs run on ubuntu-24.04 and windows-2025. Linux installs libreoffice + python3-uno; Windows installs libreoffice-fresh and sets EXSTRUCT_LIBREOFFICE_PATH; both jobs run tests/core/test_libreoffice_smoke.py with RUN_LIBREOFFICE_SMOKE=1.

exstruct input.xlsx > output.json # compact JSON to stdout by default
exstruct input.xlsx -o out.json --pretty # write pretty JSON to a file
exstruct input.xlsx --format yaml # YAML (requires pyyaml)
exstruct input.xlsx --format toon # TOON (requires python-toon)
exstruct input.xlsx --sheets-dir sheets/ # write one file per sheet
exstruct input.xlsx --auto-page-breaks-dir auto_areas/ # always shown; execution requires standard/verbose + Excel COM
exstruct input.xlsx --alpha-col # output column keys as A, B, ..., AA
exstruct input.xlsx --include-backend-metadata # include shape/chart backend metadata
exstruct input.xlsx --mode light # cells + table candidates only
exstruct input.xlsx --mode libreoffice # best-effort extraction of shapes/connectors/charts without COM
exstruct input.xlsx --pdf --image # PDF and PNGs (Excel COM required)
Auto page-break export is available from both the API and the CLI when Excel/COM is available. The CLI always exposes --auto-page-breaks-dir, but validates it at execution time.
mode=libreoffice rejects --pdf, --image, and --auto-page-breaks-dir early, and mode=light also rejects --auto-page-breaks-dir. Use standard or verbose with Excel COM for those features.
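The rejection rules above can be written down as a small lookup table. This is a minimal sketch of the stated rules only, not ExStruct's own validation code; the names `REJECTED_BY_MODE` and `rejected_flags` are hypothetical.

```python
# Hypothetical sketch of the mode/flag rules described above.
# It mirrors the documented behaviour; it is not ExStruct's implementation.
REJECTED_BY_MODE = {
    "libreoffice": {"--pdf", "--image", "--auto-page-breaks-dir"},
    "light": {"--auto-page-breaks-dir"},
}


def rejected_flags(mode: str, flags: list[str]) -> list[str]:
    """Return the requested flags that the given mode rejects, in input order."""
    blocked = REJECTED_BY_MODE.get(mode, set())
    return [flag for flag in flags if flag in blocked]


print(rejected_flags("libreoffice", ["--pdf", "--pretty"]))
print(rejected_flags("standard", ["--pdf", "--image"]))
```

The sketch does not model the runtime requirement: standard and verbose accept these flags, but the features still need Excel COM when the command executes.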
By default, the CLI keeps legacy 0-based numeric string column keys ("0", "1", ...). Use --alpha-col when you need Excel-style keys ("A", "B", ...).
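To illustrate how the two key styles relate, here is a minimal sketch that maps a legacy 0-based numeric string key to an Excel-style column label. It assumes that key "0" corresponds to column A; the function name `column_key_to_alpha` is hypothetical, and ExStruct's own conversion may be implemented differently.

```python
def column_key_to_alpha(key: str) -> str:
    """Convert a 0-based numeric string key ("0", "26") to an Excel-style label."""
    index = int(key)
    if index < 0:
        raise ValueError("column index must be non-negative")
    label = ""
    n = index + 1  # Excel labels behave like a 1-based, bijective base-26 number
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


# Re-key one row of legacy output (the row contents here are made up).
legacy_row = {"0": "id", "1": "name", "26": "notes"}
alpha_row = {column_key_to_alpha(k): v for k, v in legacy_row.items()}
print(alpha_row)
```

For output produced directly by the CLI, passing --alpha-col is simpler; a helper like this is only useful when you already hold legacy-keyed JSON.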
By default, serialized shape/chart output omits backend metadata (provenance, approximation_level, confidence) to reduce token usage. Use --include-backend-metadata or the corresponding Python/MCP option when you need it.
Note: MCP exstruct_extract defaults to options.alpha_col=true, which differs from the CLI default (false).
exstruct patch --input book.xlsx --ops ops.json --backend openpyxl
exstruct patch --input book.xlsx --ops - --dry-run --pretty < ops.json
exstruct make --output new.xlsx --ops ops.json --backend openpyxl
exstruct ops list
exstruct ops describe create_chart --pretty
exstruct validate --input book.xlsx --pretty
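When scripting the patch command from Python, it can help to assemble the dry-run and the real invocation from one function so that both share the same input, ops file, and backend. The following is a minimal sketch that only builds argument lists from the flags shown above; the helper name `build_patch_argv` is hypothetical, and nothing is executed.

```python
import shlex


def build_patch_argv(input_path, ops_path, backend="openpyxl", dry_run=True):
    """Build an `exstruct patch` argument list using only the flags shown above."""
    argv = [
        "exstruct", "patch",
        "--input", input_path,
        "--ops", ops_path,
        "--backend", backend,
    ]
    if dry_run:
        # Preview the change and pretty-print the JSON result for review.
        argv += ["--dry-run", "--pretty"]
    return argv


dry = build_patch_argv("book.xlsx", "ops.json")
real = build_patch_argv("book.xlsx", "ops.json", dry_run=False)
print(shlex.join(dry))
print(shlex.join(real))
```

Either list could be handed to `subprocess.run(argv, capture_output=True, text=True)` on a machine where the exstruct CLI is installed.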
- patch and make print JSON PatchResult to stdout.
- ops list / ops describe expose the public patch-op schema.
- validate reports workbook readability (is_readable, warnings, errors).
- There is no exstruct extract subcommand and there are no interactive safety flags yet.

Recommended edit flow:
1. Run exstruct patch --dry-run and inspect PatchResult, warnings, and diff.
2. Pin --backend openpyxl when you want the dry run and the real apply to use the same engine.
3. If you keep --backend auto, inspect PatchResult.engine; on Windows/Excel hosts the real apply may switch to COM.
4. Re-run without --dry-run only after the result is acceptable.

ExStruct also ships one repo-owned Skill for agents that should follow the editing CLI safely instead of rediscovering the workflow each time.
Canonical repo source:
.agents/skills/exstruct-cli/
You can install it with the following single command:
npx skills add harumiWeb/exstruct/.agents/skills --skill exstruct-cli
That command should install exstruct-cli directly from this repository's