by PathOnAIOrg
[NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
git clone https://github.com/PathOnAIOrg/LiteWebAgent
Disclaimer: Please note that LiteWebAgent is not affiliated with any for-profit company. This is a collaborative project from PathOnAI.org, an open-source AI research community, where Danqing Zhang (danqing.zhang.personal@gmail.com) is the main contributor and lead author of the NAACL paper. If anyone claims LiteWebAgent is affiliated with any for-profit company, please contact Danqing Zhang (danqing.zhang.personal@gmail.com) for verification.
VisualTreeSearch is a production-ready system for visualizing and understanding web agent test-time scaling. It builds upon LiteWebAgent to provide an intuitive framework for researchers and users to understand tree search execution in web agents.
From PyPI: https://pypi.org/project/litewebagent/
pip install litewebagent
Next, set up Playwright (a required step) by running:
playwright install chromium
Test the Playwright and Chromium installation by running this script:
python test_installation.py
Then create a .env file and add your API keys:
cp .env.example .env
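After copying the template, it can help to confirm that no key was left blank. The sketch below is a minimal, standard-library-only check and is not part of LiteWebAgent. It assumes the file uses plain `KEY=value` lines; the helper name `find_empty_keys` is hypothetical, and the key names it reports are whatever your own `.env.example` defines.

```python
from pathlib import Path


def find_empty_keys(env_text: str) -> list[str]:
    """Return the names of KEY=value entries whose value is empty."""
    empty = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat quoted-but-empty values ("" or '') as empty too.
        if not value.strip().strip("'\"").strip():
            empty.append(key.strip())
    return empty


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = find_empty_keys(env_path.read_text())
        print("Empty keys:", missing or "none")
    else:
        print("No .env file found; run `cp .env.example .env` first.")
```

The check only reads key names and whether a value is present, so it never prints the keys' values.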
You are ready to go! Try FunctionCallingAgent on google.com
python examples/google_test.py
Set up locally
First, set up a virtual environment and install the package in editable mode so that your code can import 'litewebagent':
python3.11 -m venv venv
. venv/bin/activate
pip3.11 install -e .
Then create a .env file and add your API keys:
cp .env.example .env
Test the Playwright and Chromium installation by running this script:
python3.11 test_installation.py
## easy case
python3.11 -m prompting_main --agent_type PromptAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
## more complicated case
python3.11 -m prompting_main --agent_type PromptAgent --starting_url https://www.amazon.com/ --goal 'add a bag of dog food to the cart.' --plan 'add a bag of dog food to the cart.' --log_folder log
## easy case
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
python3.11 -m function_calling_main --agent_type HighLevelPlanningAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
python3.11 -m function_calling_main --agent_type ContextAwarePlanningAgent --starting_url https://www.google.com --goal 'search dining table' --plan 'search dining table' --log_folder log
## more complicated case
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.amazon.com/ --goal 'add a bag of dog food to the cart.' --plan 'add a bag of dog food to the cart.' --log_folder log
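The commands above differ only in their flag values, so they can be generated from a small helper when you want to try several agent types on the same task. This is a sketch, not part of the repository: `build_agent_command` is a hypothetical name, and it uses only the module name and flags that appear in the commands above.

```python
import shlex


def build_agent_command(agent_type, starting_url, goal, plan,
                        log_folder="log", python_bin="python3.11"):
    """Build an argv list for `python -m function_calling_main`."""
    return [
        python_bin, "-m", "function_calling_main",
        "--agent_type", agent_type,
        "--starting_url", starting_url,
        "--goal", goal,
        "--plan", plan,
        "--log_folder", log_folder,
    ]


if __name__ == "__main__":
    agent_types = [
        "FunctionCallingAgent",
        "HighLevelPlanningAgent",
        "ContextAwarePlanningAgent",
    ]
    for agent_type in agent_types:
        cmd = build_agent_command(
            agent_type,
            "https://www.google.com",
            "search dining table",
            "search dining table",
        )
        # Print a copy-pasteable shell command; to execute it instead,
        # pass `cmd` to subprocess.run(cmd, check=True) from the repo root.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps goals that contain spaces or quotes intact when they are handed to `subprocess.run`.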
https://www.loom.com/share/1018bcc4e21c4a7eb517b60c2931ee3c
https://www.loom.com/share/aa48256478714d098faac740239c9013
https://www.loom.com/share/89f5fa69b8cb49c8b6a60368ddcba103
https://www.loom.com/share/8c59dc1a6f264641b6a448fb6b7b4a5c
We use axtree by default. Alternatively, you can pass a comma-separated string to --features listing the desired input feature types, for example interactive_elements or axtree,interactive_elements.
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --log_folder log
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --features interactive_elements --log_folder log
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.airbnb.com --goal 'set destination as San Francisco, then search the results' --plan '(1) enter the "San Francisco" as destination, (2) and click search' --features axtree,interactive_elements --log_folder log
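To illustrate how a comma-separated `--features` value maps to a list of feature types, here is a minimal sketch. The function `parse_features` is hypothetical and is not LiteWebAgent's own parser; it only mirrors the behavior described above, where axtree is used when nothing is specified.

```python
def parse_features(features=None, default=("axtree",)):
    """Split a comma-separated feature string into a clean list.

    Falls back to the default when the input is missing or contains
    no usable names.
    """
    if not features:
        return list(default)
    # Tolerate stray spaces and empty entries such as "axtree, ,foo".
    names = [name.strip() for name in features.split(",")]
    names = [name for name in names if name]
    return names or list(default)


if __name__ == "__main__":
    print(parse_features())
    print(parse_features("interactive_elements"))
    print(parse_features("axtree,interactive_elements"))
```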
First, tell Git to ignore future changes to state.json:
git update-index --skip-worktree state.json
Then run the load_state.py script and log into the websites to enable auto-login:
python3.11 load_state.py save
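Once a state file has been saved, you may want to see which sites it covers without exposing any secrets. The sketch below assumes that state.json follows Playwright's storage-state layout (a top-level `cookies` list and an `origins` list); this README does not confirm that format, and `summarize_state` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_state(state: dict) -> dict:
    """Count saved cookies per domain and list origins with stored data.

    Only domains and origins are reported; cookie values are never read.
    """
    cookies = state.get("cookies", [])
    per_domain = Counter(c.get("domain", "<unknown>") for c in cookies)
    origins = [o.get("origin", "<unknown>") for o in state.get("origins", [])]
    return {"cookies_per_domain": dict(per_domain), "origins": origins}


if __name__ == "__main__":
    state_path = Path("state.json")
    if state_path.exists():
        summary = summarize_state(json.loads(state_path.read_text()))
        print(json.dumps(summary, indent=2))
    else:
        print("state.json not found")
```

Because the file can hold live session cookies, keeping it out of commits (as the `git update-index --skip-worktree` step does) is worth the extra command.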
We integrated AWM (Agent Workflow Memory) into the LiteWebAgent framework. Follow these three steps to include induced workflows as memory for the web agent. We use 'add a bag of dog food to the cart' on the Amazon website as an example:
Step 1: Induce workflows from the Mind2Web dataset
python3.11 memory/mind2web_workflows_induction.py --websites amazon
Please note that you can induce workflows for multiple websites by passing a comma-separated list of website names to the --websites parameter:
python3.11 memory/mind2web_workflows_induction.py --websites amazon,aa
Step 2: Embed and store workflows in DB for retrieval
python3.11 memory/update_vector_store.py
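The idea behind Step 2 is that each workflow is turned into a vector so the one closest to a new goal can be looked up later. The sketch below shows that retrieval pattern in miniature. It is not the project's implementation: the word-count "embedding" is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the workflow descriptions are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, workflows: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k workflows most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(workflows,
                    key=lambda w: cosine(query_vec, embed(w)),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical workflow descriptions, for illustration only.
    stored = [
        "search for a product and add it to the cart",
        "book a flight and choose a seat",
    ]
    print(retrieve("add a bag of dog food to the cart", stored))
```

A real setup would replace `embed` with a learned embedding model and the sorted scan with an index lookup, but the ranking step works the same way.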
Step 3: Run function calling agent with memory
python3.11 -m function_calling_main --agent_type FunctionCallingAgent --starting_url https://www.amazon.com/ --goal 'add a bag of dog food to the cart.' --workflow_memory_website amazon
Start the Python backend server:
python3.11 -m api.server --port 5001
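To confirm the backend is listening before pointing a client at it, a plain TCP check is enough. This sketch uses only the standard library and assumes the server runs on the local machine at the port given above; it does not call any API route, since the routes are not documented here.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 5001,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    status = "up" if is_port_open() else "not reachable"
    print(f"Backend on port 5001 is {status}")
```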
| Paper | Agent |
|-------|-------|
| SoM (Set-of-Mark) Agent | PromptAgent |
| Mind2Web