by Zipstack
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
# Add to your Claude Code skills
git clone https://github.com/Zipstack/unstract
Guides for using AI agent skills like Unstract.
Last scanned: 6/2/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-06-02T08:37:40.867Z",
"npmAuditRan": true,
"pipAuditRan": true
}

Unstract uses LLMs to extract structured JSON from documents: PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, then deploy it as an API or an ETL pipeline.
Built for teams in finance, insurance, healthcare, KYC/compliance, and much more.
| Task | Without Unstract | With Unstract |
|------|------------------|---------------|
| Schema definition | Write regex, build templates per vendor | Write a prompt once, handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | ./run-platform.sh or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |
⭐ If Unstract helps you, star this repo!
Prompt Studio: Define document extraction schemas with natural language.

API Deployment: Send a document over REST API, get JSON back.

ETL Pipeline: Pull documents from a folder, process them, load to your warehouse.

MCP Server: Connect to AI agents (Claude, etc.) via Model Context Protocol.

n8n Node: Drop into existing automation workflows.
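
To make the API Deployment flow concrete, here is a minimal dry-run sketch of uploading one document with curl. Everything specific in it is an assumption rather than the documented Unstract API: the `API_URL` placeholder, the bearer-token header, the `files` form field, and the `extract_cmd` helper name. Copy the real endpoint and key from your own deployment before using it.

```shell
# Sketch only: API_URL and API_KEY are placeholders, and the bearer header and
# "files" form field are assumptions. Copy the real values from your deployment.
API_URL="${API_URL:-https://unstract.example.com/your-deployment-endpoint}"
API_KEY="${API_KEY:-replace-me}"

# Print (do not run) the curl command that would upload one document.
extract_cmd() {
  printf "curl -sS -X POST '%s' -H 'Authorization: Bearer %s' -F 'files=@%s'\n" \
    "$API_URL" "$API_KEY" "$1"
}

extract_cmd invoice.pdf
```

Printing the command instead of running it lets you review the URL and headers first. Once the values are real, run the printed line and expect a JSON response body.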
# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh
That's it!
Log in with username unstract and password unstract.

# Pull and run entire Unstract platform with default env config.
./run-platform.sh
# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0
# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u
# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0
# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0
# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current
# Display the help information.
./run-platform.sh -h
# Only do setup of environment files.
./run-platform.sh -e
# Only do docker images pull with a specific version tag.
./run-platform.sh -p -v v0.1.0
# Only do docker images pull by building locally with a specific version tag.
./run-platform.sh -p -b -v v0.1.0
# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current
# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
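
If you call these from other scripts, the flag combinations above can be wrapped in a small helper that picks a mode by name. This is a minimal sketch: `platform_cmd` and its mode names are hypothetical, it uses only flag combinations listed above, and it prints the command instead of running it.

```shell
# platform_cmd MODE VERSION: print the run-platform.sh command for a named mode.
# The function and mode names are illustrative; the flags are the ones listed above.
platform_cmd() {
  mode="$1"
  version="$2"
  case "$mode" in
    run)     echo "./run-platform.sh -d -v $version" ;;  # pull and run, detached
    upgrade) echo "./run-platform.sh -u -v $version" ;;  # upgrade existing setup
    pull)    echo "./run-platform.sh -p -v $version" ;;  # pull images only
    build)   echo "./run-platform.sh -b -v $version" ;;  # build images locally
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

platform_cmd upgrade v0.2.0
```

Keeping the helper print-only means a typo in the mode name fails loudly before anything touches your containers.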
[!WARNING] The ENCRYPTION_KEY encrypts adapter credentials. Losing it makes existing adapters inaccessible!
Copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
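
One way to take that backup from the shell is sketched below. `backup_key` is a hypothetical helper, not part of Unstract, and it assumes the env file holds a plain `ENCRYPTION_KEY=...` line. The destination path is a placeholder for whatever secure location you use.

```shell
# backup_key ENV_FILE DEST: copy the ENCRYPTION_KEY line into DEST, readable only by you.
# Hypothetical helper; assumes a plain KEY=value line in the env file.
backup_key() {
  env_file="$1"
  dest="$2"
  # The subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    grep '^ENCRYPTION_KEY=' "$env_file" > "$dest" && chmod 600 "$dest"
  ) || { rm -f "$dest"; echo "no ENCRYPTION_KEY in $env_file" >&2; return 1; }
}

# Example (paths are placeholders):
# backup_key backend/.env "$HOME/unstract-encryption-key.bak"
```

The helper removes the destination file when no key is found, so an empty backup is never mistaken for a good one.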
┌────────────────────────────────────────────────────────────┐
│ Unstract │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│ Frontend │ Backend │ Worker │ Platform Service │
│ (React) │ (Django) │ (Celery) │ (FastAPI) │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│ Cache (Redis) │
├────────────────────────────────────────────────────────────┤
│ Message Queue (RabbitMQ) │
├────────────────────────────────────────────────────────────┤
│ Database (PostgreSQL) │
├────────────────────────────────────────────────────────────┤
│ LLM Adapters │ Vector DBs │ Text Extractors │
│ (OpenAI, etc.) │ (Qdrant, etc.) │ (LLMWhisperer) │
└────────────────────────────────────────────────────────────┘
Also see architecture.
| Category | Formats |
|----------|---------|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |

| Provider | Status | Provider | Status |
|----------|--------|----------|--------|
| OpenAI | ✅ | Azure OpenAI | ✅ |
| OpenAI Compatible | ✅ | Anthropic Claude | ✅ |
| AWS Bedrock | ✅ | Google Gemini | ✅ |
| Ollama (local) | ✅ | Mistral AI | ✅ |
| Anyscale | ✅ | | |

| Provider | Status | Provider | Status |
|----------|--------|----------|--------|
| Qdrant | ✅ | Pinecone | ✅ |
| Weaviate | ✅ | PostgreSQL | ✅ |
| Milvus | ✅ | | |

| Provider | Status |
|----------|--------|
| LLMWhisperer | ✅ |
| Unstructured.io | ✅ |
| LlamaIndex Parse | ✅ |
Sources: AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP
Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
Follow these steps to change the default username and password.
# Install pre-commit hooks
./dev-env-cli.sh -p
# Run pre-commit checks
./dev-env-cli.sh -r
Finance & Banking → | Insurance → | Healthcare → | Income Tax →
For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.