Enable AI agents to control your desktop (Windows, macOS, Linux), mobile (Android, iOS) and HMI devices
Join the AskUI Discord.
Traditional UI automation is fragile. Every time a button moves, a label changes, or a layout shifts, your scripts break. You're stuck maintaining brittle selectors, writing conditional logic for edge cases, and constantly updating tests.
AskUI Agents solve this by combining two powerful approaches:
- Vision-based automation - Find UI elements by what they look like or say, not by brittle XPath or CSS selectors
- AI-powered agents - Give high-level instructions and let AI figure out the steps

The AskUI Python SDK is a powerful automation framework that lets you and AI agents control your desktop, mobile, and HMI devices and automate tasks. It supports multiple AI models, runs on multiple platforms, and ships enterprise-ready features.
Whether you're automating desktop apps, testing mobile applications, or building RPA workflows, AskUI adapts to UI changes automatically, saving you hours of maintenance work.
- Multi-platform - Works on Windows, Linux, macOS, and Android
- Two modes - Single-step UI commands or agentic intent-based instructions
- Vision-first - Find elements by text, images, or natural language descriptions
- Model flexibility - Anthropic Claude, Google Gemini, AskUI models, or bring your own
- Extensible - Add custom tools and capabilities via Model Context Protocol (MCP)
- Caching - Cut costs by calling expensive AI APIs only when necessary and reusing earlier results
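The caching idea in the list above can be sketched in plain Python: memoize answers keyed on the query, so a repeated question never triggers a second model call. This is a conceptual illustration only, not the SDK's actual cache (see the Caching docs for that); `ask_model` is a stand-in for an expensive API call.

```python
from functools import lru_cache

CALLS = {"count": 0}  # track how often the "API" is actually hit

@lru_cache(maxsize=128)
def ask_model(question: str) -> str:
    # Stand-in for an expensive AI API call.
    CALLS["count"] += 1
    return f"answer to: {question}"

ask_model("What is my current account balance?")
ask_model("What is my current account balance?")  # served from cache
print(CALLS["count"])  # → 1, the model was only called once
```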
Run the script with `python <file path>`, e.g., `python test.py`, to verify that it works.
To let AI agents control your devices, you need a connection to an AI model (provider). We host some models ourselves and support several others, e.g., Anthropic, OpenRouter, and Hugging Face, out of the box. If you want to use a model or model provider that is not supported, you can easily plug in your own (see Custom Models).
For this example, we will use AskUI as the model provider to get started quickly.
Sign up at hub.askui.com to:
- Activate your free trial (no credit card required)
- Get your workspace ID and access token
Linux & macOS
```shell
export ASKUI_WORKSPACE_ID=<your-workspace-id-here>
export ASKUI_TOKEN=<your-token-here>
```

Windows PowerShell

```powershell
$env:ASKUI_WORKSPACE_ID="<your-workspace-id-here>"
$env:ASKUI_TOKEN="<your-token-here>"
```

```python
from askui import ComputerAgent

with ComputerAgent() as agent:
    # Complex multi-step instruction
    agent.act(
        "Open a browser, navigate to GitHub, search for 'askui vision-agent', "
        "and star the repository"
    )

    # Extract information from the screen
    balance = agent.get("What is my current account balance?")
    print(f"Balance: {balance}")

    # Combine both approaches
    agent.act("Find the login form")
    agent.type("user@example.com")
    agent.click("Next")
```

Add new capabilities to your agents:

```python
from askui import ComputerAgent
from askui.tools.store.computer import ComputerSaveScreenshotTool
from askui.tools.store.universal import PrintToConsoleTool

with ComputerAgent() as agent:
    agent.act(
        "Take a screenshot of the current screen and save it, then confirm",
        tools=[
            ComputerSaveScreenshotTool(base_dir="./screenshots"),
            PrintToConsoleTool(),
        ],
    )
```

Ready to build your first agent? Check out our documentation:
- Start Here - Overview and core concepts
- Setup - Installation and configuration
- Using Agents - Using the AskUI ComputerAgent and AndroidAgent
- System Prompts - How to write effective instructions
- Using Models - Using different models as backbone for act, get, and locate
- BYOM - Bring your own model by plugging in a custom model provider
- Caching - Optimize performance and costs
- Tools - Extend agent capabilities
- Reporting - Obtain agent logs as execution reports and summaries as test reports
- Observability - Monitor and debug agents
- Extracting Data - Extract structured data from screenshots and files
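For the Extracting Data workflow, the usual pattern is to ask the model for structured output and validate it before use. Below is a plain-Python sketch of that validation step, assuming the model returns JSON; the SDK's own structured-extraction support is documented under "Extracting Data".

```python
import json

# A model reply we want to turn into typed data (sketch only).
raw_reply = '{"balance": "1234.56", "currency": "EUR"}'

def parse_balance(reply: str) -> tuple[float, str]:
    data = json.loads(reply)
    # Validate and convert before trusting model output.
    return float(data["balance"]), data["currency"]

balance, currency = parse_balance(raw_reply)
print(balance, currency)  # → 1234.56 EUR
```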
Official documentation: docs.askui.com
```shell
pip install askui[all]
```

Requires Python >=3.10, <3.14.
You'll also need to install AskUI Agent OS for device control. See Setup Guide for detailed instructions.
This project is licensed under the MIT License - see the LICENSE file for details.