# Add to your Claude Code skills
git clone https://github.com/harumiWeb/exstruct
ExStruct — Excel Structured Extraction Engine
ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, smartart, merged cell ranges, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
Excel → Structured JSON: cells, shapes, charts, smartart, table candidates, print areas/views, and auto page-break areas per sheet.
Output modes: light (cells + table candidates + print areas; no COM, shapes/charts empty), standard (texted shapes + arrows, charts, smartart, merged cell ranges, print areas), verbose (all shapes with width/height, charts with size, merged cell ranges, print areas). Verbose also emits cell hyperlinks and colors_map. Size output is flag-controlled.
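To make the difference between the modes easier to scan, the descriptions above can be restated as a small lookup table. This is only an illustrative sketch: `MODE_FEATURES`, `modes_with`, and the feature labels are hypothetical names invented here, not ExStruct identifiers, and the table lists only what the text names for each mode (it does not assume that richer modes include everything from lighter ones).

```python
# Hypothetical lookup table restating the mode descriptions as data.
# The feature labels are invented for this sketch; they are not ExStruct API names.
MODE_FEATURES = {
    "light": {"cells", "table_candidates", "print_areas"},
    "standard": {
        "texted_shapes", "arrows", "charts", "smartart",
        "merged_cells", "print_areas",
    },
    "verbose": {
        "all_shapes", "charts", "merged_cells", "print_areas",
        "hyperlinks", "colors_map",
    },
}


def modes_with(feature):
    """Return the modes, lightest first, whose description lists `feature`."""
    # Dicts keep insertion order, so the result follows light -> standard -> verbose.
    return [mode for mode, features in MODE_FEATURES.items() if feature in features]


charts_modes = modes_with("charts")
colors_modes = modes_with("colors_map")
```

Under these assumptions, asking for `"charts"` points to the standard and verbose modes, while `"colors_map"` is listed only for verbose.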
Formula map extraction: emits formulas_map (formula string -> cell coordinates) via openpyxl/COM; enabled by default in verbose or via include_formulas_map.
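The shape of a formula map (formula string -> cell coordinates) can be sketched in plain Python. This is not ExStruct's implementation: `build_formulas_map` is a hypothetical helper, the input is a hand-written dict rather than a real workbook, and the `(row, column)` tuple format for coordinates is an assumption made for the example.

```python
from collections import defaultdict


def build_formulas_map(cells):
    """Group cell coordinates by formula string.

    `cells` maps (row, col) -> raw cell value. Strings starting with "="
    are treated as formulas; every other value is ignored.
    """
    formulas = defaultdict(list)
    for coord in sorted(cells):  # sorted so coordinates come out in row-major order
        value = cells[coord]
        if isinstance(value, str) and value.startswith("="):
            formulas[value].append(coord)
    return dict(formulas)


# Hand-written stand-in for one sheet's cells (assumed layout, not real ExStruct output).
sample_cells = {
    (1, 1): 10,
    (1, 2): 20,
    (1, 3): "=A1+B1",
    (2, 3): "=A1+B1",
    (3, 3): "=SUM(C1:C2)",
}
formulas_map = build_formulas_map(sample_cells)
```

Keying by formula string rather than by cell means that repeated formulas collapse into a single entry with several coordinates, which keeps the output compact when it is passed to an LLM.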
Auto page-break export (COM only): capture Excel-computed auto page breaks and write per-area JSON/YAML/TOON when requested (CLI option appears only when COM is available).