by bytedance
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
# Add to your Claude Code skills
git clone https://github.com/bytedance/UI-TARS-desktop
TARS is a multimodal AI agent stack that currently ships two projects: Agent TARS and UI-TARS-desktop.
Agent TARS is a general multimodal AI agent stack that brings the power of GUI agents and vision into your terminal, computer, browser, and product. It ships primarily with a CLI and a Web UI. It aims to provide a workflow closer to how a human completes a task, using cutting-edge multimodal LLMs and seamless integration with real-world MCP tools.
Example task: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline."
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
For more use cases, please check out #842.
# Launch with `npx`.
npx @agent-tars/cli@latest
# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g
# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
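The commands above pass the API key as a literal argument and assume a recent Node.js is already installed. As a rough sketch only, the wrapper below checks the Node.js major version against the stated requirement (>= 22) and reads the key from an environment variable. The helper names `node_major_ok` and `launch_agent_tars`, and the `TARS_API_KEY` and `DRY_RUN` variables, are hypothetical and are not part of Agent TARS. Only the `--provider`, `--model`, and `--apiKey` flags come from the commands shown above.

```sh
#!/bin/sh
# Sketch only: helper names, TARS_API_KEY, and DRY_RUN are assumptions,
# not features of the Agent TARS CLI.

# Succeeds when a version string such as "v22.3.0" has a major version >= 22.
node_major_ok() {
  version="${1#v}"          # strip the leading "v" printed by `node --version`
  major="${version%%.*}"    # keep everything before the first dot
  [ "$major" -ge 22 ] 2>/dev/null
}

# Launches agent-tars for a provider ($1) and model ($2), reading the key
# from TARS_API_KEY. With DRY_RUN=1 it only prints the command, key masked.
launch_agent_tars() {
  if [ -z "${TARS_API_KEY:-}" ]; then
    echo "TARS_API_KEY is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "agent-tars --provider $1 --model $2 --apiKey ***"
  else
    agent-tars --provider "$1" --model "$2" --apiKey "$TARS_API_KEY"
  fi
}

# Example usage (commented out so that sourcing this file has no side effects):
#   node_major_ok "$(node --version)" || { echo "Node.js >= 22 is required" >&2; exit 1; }
#   TARS_API_KEY=your-api-key
#   launch_agent_tars anthropic claude-3-7-sonnet-latest
```

Keeping the key in an environment variable avoids typing it into each command, and the dry-run mode lets you preview the exact invocation before anything is started.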
Visit the compre