Page-agent.js is a GUI agent that you drop into a webpage with a single script tag. It executes natural language commands like “fill out this form” without screenshots or multimodal models. Because it reads the DOM as text, it can add an AI copilot to your SaaS with a few lines of code and make legacy web apps accessible via voice or text.
Page-agent.js is an in-page GUI agent built by Alibaba that lets you control web interfaces with natural language. No browser extension, headless browser, or screenshot-based vision model is required. It operates on text extracted from the DOM, making it lightweight, auditable, and easy to integrate.

Customer Persona
Page-agent.js targets developers building SaaS products who want to add an AI layer to existing user interfaces. It suits product teams that need to ship AI copilots without rewriting backends. The tool appeals to anyone who wants to make legacy web apps accessible via voice or text controls.
Market Analysis
In-page AI automation is a growing niche, and the main alternatives are browser extensions, headless browsers, and vision-based models. Page-agent.js differentiates itself by requiring only a script tag, with no extension installation or backend changes. For connecting AI agents to external services, see our guide on connecting AI agents to Google Workspace. The script-tag approach positions Page-agent.js as a low-friction option for developers who prioritize quick integration and DOM-level control.
Project Link
https://github.com/alibaba/page-agent
How It Works
Page-agent.js works by injecting a script tag that loads the agent library. The library extracts the DOM as text and sends it to a configured LLM provider. Users can issue natural language commands through a floating UI or a dedicated interface. The agent parses the command, identifies the relevant DOM elements, and performs the requested actions.
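
To make the "DOM as text" step concrete, here is a minimal sketch of how a page could be flattened into indexed text lines and combined with a user command. It is not Page-agent.js's actual implementation: the `SimpleNode` shape, the `serializeInteractive` and `buildPrompt` helpers, and the output format are all assumptions chosen for illustration, and a plain object tree stands in for the real DOM so the example runs outside a browser.

```typescript
// Hypothetical minimal node shape (an assumption) standing in for real DOM elements.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// Tags treated as interactive in this sketch; a real agent may use a richer rule set.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and emit one indexed line per interactive element.
function serializeInteractive(root: SimpleNode): string[] {
  const lines: string[] = [];
  const visit = (node: SimpleNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = Object.entries(node.attrs ?? {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join("");
      lines.push(`[${lines.length}] <${node.tag}${attrs}>${node.text ?? ""}</${node.tag}>`);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// Combine the user's command with the textual page snapshot.
function buildPrompt(command: string, root: SimpleNode): string {
  const snapshot = serializeInteractive(root).join("\n");
  return `Task: ${command}\nInteractive elements:\n${snapshot}`;
}

const page: SimpleNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Email" },
    { tag: "input", attrs: { name: "email", type: "email" } },
    { tag: "button", text: "Submit" },
  ],
};

console.log(buildPrompt("fill out this form", page));
```

Indexing each element gives the model a short, stable handle to refer to, which is one plausible reason a text snapshot can be easier to audit than a screenshot.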

Integration starts by cloning the repository and adding the script tag to your webpage. Configure your LLM endpoint and any required authentication. The README provides step‑by‑step instructions for setting up the floating UI and connecting to your preferred LLM. Testing with a non‑critical page is recommended to ensure command reliability across different DOM structures.
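
As a rough illustration of the "configure your LLM endpoint and authentication" step, the sketch below builds a chat request for a provider that exposes an OpenAI-compatible `/chat/completions` endpoint. That compatibility is an assumption, as are the `LlmSettings` field names, the `buildChatRequest` helper, and the placeholder URL, key, and model; none of them are taken from the Page-agent.js README, which remains the reference for the real options.

```typescript
// Hypothetical settings shape; field names are assumptions, not documented options.
interface LlmSettings {
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a request in the OpenAI-compatible chat format.
function buildChatRequest(
  settings: LlmSettings,
  systemPrompt: string,
  userPrompt: string,
): ChatRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${settings.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${settings.apiKey}`,
      },
      body: JSON.stringify({
        model: settings.model,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userPrompt },
        ],
      }),
    },
  };
}

const chatRequest = buildChatRequest(
  { baseUrl: "https://llm.example.com/v1/", apiKey: "sk-placeholder", model: "example-model" },
  "You control a web page through a text snapshot of its DOM.",
  "fill out this form",
);
console.log(chatRequest.url); // → https://llm.example.com/v1/chat/completions
// In a browser or Node 18+, it could then be sent with fetch(chatRequest.url, chatRequest.init).
```

Anything shipped in client-side script is visible to users, so an API key placed directly in the page is exposed; pointing `baseUrl` at a proxy that holds the key server-side is one way to avoid that.
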

| Feature | Why it matters |
|---|---|
| Easy integration | Drop in one script tag; no extension or backend rewrite required |
| Text-based DOM manipulation | No screenshots or multimodal LLMs; works on a textual DOM representation |
| Bring your own LLM | Use your preferred LLM provider for privacy and cost control |
| Optional extension and MCP | Chrome extension for multi-page flows, plus an MCP server in beta |

Before wide deployment, test commands against the DOM structures your app actually serves, because DOM variance across clients can affect command reliability.
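
One way to limit the impact of DOM variance is to validate every action the model proposes against the elements actually present on the page before touching the DOM. The sketch below shows that guard under stated assumptions: the `AgentAction` shape and the JSON reply format are hypothetical and do not describe Page-agent.js's real wire format.

```typescript
// Hypothetical action shape for illustration only.
type AgentAction =
  | { kind: "click"; index: number }
  | { kind: "type"; index: number; text: string };

// Parse a model reply and reject anything that does not target a known element.
function parseAction(reply: string, elementCount: number): AgentAction {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    throw new Error("Reply is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Reply must be a JSON object");
  }
  const { kind, index, text } = data as Record<string, unknown>;
  // The index must refer to an element in the current snapshot.
  if (
    typeof index !== "number" ||
    !Number.isInteger(index) ||
    index < 0 ||
    index >= elementCount
  ) {
    throw new Error(`Element index out of range: ${String(index)}`);
  }
  if (kind === "click") return { kind, index };
  if (kind === "type" && typeof text === "string") return { kind, index, text };
  throw new Error(`Unsupported action: ${String(kind)}`);
}

// A snapshot with two elements accepts index 1 but would reject index 5.
console.log(parseAction('{"kind":"click","index":1}', 2));
```

Rejecting out-of-range or malformed actions turns a silent misclick into an explicit error that can be logged and retried.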
The Verdict
Page-agent.js delivers an in-page AI copilot with minimal integration overhead. The script‑tag model and text‑based DOM manipulation provide a lightweight, auditable approach. However, DOM variance across clients can affect command reliability, so thorough testing is essential. Some features may require paid LLM endpoints, and sending actions to a live DOM has security implications. Ensure compliance with your LLM provider’s terms and local regulations before deploying sensitive workloads.