A computer-use or browser-use tool is a capability that lets an AI agent operate software directly—usually a web browser, and sometimes the broader desktop—by clicking, typing, scrolling, and reading the visible screen instead of calling an API.
Not every task has a clean API. Many real workflows still live in web apps: logging into dashboards, filling forms, checking inventories, downloading reports, or copying data between systems.
A browser-use tool helps an LLM or agent handle those tasks by interacting with the same interface a person would use. In practice, teams reach for it when a workflow runs through a web interface that has no clean API.
It is especially useful for “long tail” business processes, but it is usually slower and less reliable than using a structured API when one exists.
Most computer-use tools expose a small set of actions such as:
- click(x, y) or click a named element
- type(text)
- scroll(direction)
- read the current page or screenshot
- open(url) or press_key(...)

The agent observes the interface, reasons about what to do next, and chooses one of those actions. In browser-focused tools, the environment is often a real browser with DOM access and/or screenshots. In broader desktop tools, the agent may rely on screenshots, accessibility trees, or other UI signals to navigate windows and apps.
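To make that action set concrete, here is a minimal sketch that models each action as a small Python dataclass and dispatches it to a stand-in environment. Every name here (`Click`, `TypeText`, `Scroll`, `Open`, `PressKey`, `Read`, `FakeBrowser`) is hypothetical and not any particular tool's API. The fake browser only records what it was asked to do, on the assumption that a real tool would drive an actual browser or desktop at that point.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical action types mirroring the action list above (not a real library's API).
@dataclass
class Click:
    x: Optional[int] = None
    y: Optional[int] = None
    element: Optional[str] = None  # e.g. a button label, used instead of coordinates


@dataclass
class TypeText:
    text: str


@dataclass
class Scroll:
    direction: str  # "up" or "down"


@dataclass
class Open:
    url: str


@dataclass
class PressKey:
    key: str


@dataclass
class Read:
    pass


Action = Union[Click, TypeText, Scroll, Open, PressKey, Read]


class FakeBrowser:
    """Stand-in environment: records actions instead of driving a real browser."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.typed = ""
        self.log: list[str] = []

    def execute(self, action: Action) -> str:
        if isinstance(action, Click):
            # Prefer a named element; fall back to screen coordinates.
            target = action.element or f"({action.x}, {action.y})"
            self.log.append(f"click {target}")
        elif isinstance(action, TypeText):
            self.typed += action.text
            self.log.append(f"type {action.text!r}")
        elif isinstance(action, Scroll):
            if action.direction not in ("up", "down"):
                raise ValueError(f"unknown scroll direction: {action.direction}")
            self.log.append(f"scroll {action.direction}")
        elif isinstance(action, Open):
            self.url = action.url
            self.log.append(f"open {action.url}")
        elif isinstance(action, PressKey):
            self.log.append(f"press {action.key}")
        elif isinstance(action, Read):
            # A real tool would return page text, a DOM snapshot, or a screenshot here.
            self.log.append("read")
            return f"page at {self.url}"
        else:
            raise TypeError(f"unsupported action: {action!r}")
        return "ok"


if __name__ == "__main__":
    browser = FakeBrowser()
    browser.execute(Open("https://example.com/login"))
    browser.execute(Click(element="Sign in"))
    browser.execute(TypeText("alice"))
    print(browser.log)
```

Keeping the action space small and explicitly typed makes it easy to validate whatever the model proposes before anything touches the real interface.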
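A common pattern is to observe the current page or screen, decide on the next action, carry it out, and then observe the result, repeating until the task is done.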
This “observe → act → observe” loop is what makes computer-use different from ordinary text generation. The model is not just answering a question; it is controlling software.
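The loop itself can be sketched in a few lines. In this sketch `ToyEnv` is a hypothetical two-page site, and `rule_policy` is a hard-coded rule standing in for the model's decision step. The `max_steps` budget is an added assumption, included so that an agent that never decides it is finished still stops.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    url: str
    visible_text: str


@dataclass
class Action:
    kind: str                    # e.g. "click", "scroll", "done"
    target: Optional[str] = None


class ToyEnv:
    """Hypothetical environment: clicking 'Reports' moves from the home page to a reports page."""

    def __init__(self) -> None:
        self.url = "https://example.com/home"

    def observe(self) -> Observation:
        if self.url.endswith("/reports"):
            return Observation(self.url, "Monthly report ready")
        return Observation(self.url, "Home | Reports")

    def act(self, action: Action) -> None:
        if action.kind == "click" and action.target == "Reports":
            self.url = "https://example.com/reports"


Policy = Callable[[str, Observation], Action]


def run_agent(goal: str, env: ToyEnv, policy: Policy, max_steps: int = 10) -> list[Action]:
    """Observe -> decide -> act, repeated until the policy returns 'done' or the budget runs out."""
    history: list[Action] = []
    for _ in range(max_steps):
        obs = env.observe()          # observe the current page
        action = policy(goal, obs)   # in a real agent, the model chooses here
        history.append(action)
        if action.kind == "done":
            break
        env.act(action)              # act, then loop back to observe the result
    return history


def rule_policy(goal: str, obs: Observation) -> Action:
    """Stand-in for an LLM: a fixed rule, for illustration only."""
    if "report ready" in obs.visible_text.lower():
        return Action("done")
    return Action("click", "Reports")


if __name__ == "__main__":
    for step in run_agent("open the monthly report", ToyEnv(), rule_policy):
        print(step)
```

The important structural point is that each decision is made against a fresh observation, so the agent can react to whatever the previous action actually did.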
User: “Download last month’s invoice from the vendor portal.”
Agent:
That whole flow may be done with browser actions rather than an API integration.
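As an illustration only, the invoice request could turn into a plan like the one below. The portal URL, the on-screen labels ("Billing", "Invoice …", "Download PDF"), and the credential placeholders are all invented for this sketch; a real agent would discover the actual labels by reading the page as it goes. The one genuinely computed step is working out which month "last month" refers to, including the January-to-December rollover.

```python
from datetime import date


def previous_month(today: date) -> tuple[int, int]:
    """Return (year, month) for the calendar month before `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def invoice_plan(today: date) -> list[tuple[str, str]]:
    """Hypothetical action sequence for "download last month's invoice".

    The URL and labels are made up for illustration; a real portal will differ.
    """
    year, month = previous_month(today)
    label = date(year, month, 1).strftime("%B %Y")  # month name + year, e.g. "December 2023"
    return [
        ("open", "https://portal.example.com/login"),  # hypothetical portal URL
        ("type", "<username>"),                         # placeholder, not a real credential
        ("type", "<password>"),
        ("press_key", "Enter"),
        ("click", "Billing"),
        ("click", f"Invoice {label}"),
        ("click", "Download PDF"),
    ]


if __name__ == "__main__":
    for action, arg in invoice_plan(date.today()):
        print(action, arg)
```

A scripted list like this is brittle if the portal's layout changes; the observe-then-act loop exists precisely so the agent can adjust each step to what it actually sees.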
In short: computer-use/browser-use tools are great for reaching systems through their interface, but they are not a magic replacement for well-designed integrations.