If you are looking for the MCP server, you can find it here.
If you are looking for the Python client example, read on.
This project provides a simple example of how to use the smooth_operator_agent_tools client library to interact with the Smooth Operator agent.
This example demonstrates:
- Initializing the SmoothOperatorClient.
- Starting the Smooth Operator background server.
- Opening an application (Windows Calculator).
- Using the keyboard to type input.
- Using the mouse with ScreenGrasp to click UI elements by description.
- Retrieving the UI automation tree of the focused window.
- (Optional) Using the OpenAI API (GPT-4o) to interpret the automation tree and determine the application's state (e.g., the result displayed in the calculator).
- (Optional, commented out) Taking a screenshot and using the OpenAI API to analyze it.
Prerequisites:
- A ScreenGrasp API key. Get a free key from https://screengrasp.com/api.html.
- (Optional) An OpenAI API key if you want to use GPT-4o integration for result verification. Get a key from https://platform.openai.com/api-keys.
Setup:
1. Clone the repository (if you haven't already):
    # Navigate to the parent directory if needed
    git clone <repository-url>
    cd smooth-operator/client-libs/example-python
2. Create a virtual environment (recommended):
    python -m venv .venv
    .venv\Scripts\activate
3. Install dependencies:
    pip install -r requirements.txt
4. Configure API keys:
    - Rename the .env.example file to .env.
    - Open the .env file and replace the placeholder values with your actual ScreenGrasp API key and (optionally) your OpenAI API key.
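As a rough illustration of the key configuration step, the sketch below parses .env-style text with the standard library and treats the OpenAI key as optional. The variable names SCREENGRASP_API_KEY and OPENAI_API_KEY and the helpers parse_env and load_keys are assumptions made for this sketch; check .env.example for the names the example actually uses.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_keys(env):
    """Return (screengrasp_key, openai_key); the OpenAI key may be None."""
    # Assumed variable names, not confirmed by the repository.
    screengrasp = env.get("SCREENGRASP_API_KEY", "")
    if not screengrasp:
        raise ValueError("a ScreenGrasp API key is required")
    openai = env.get("OPENAI_API_KEY") or None
    return screengrasp, openai


# Hypothetical .env contents with the optional OpenAI key left empty.
sample = """
# API keys
SCREENGRASP_API_KEY=sg-example
OPENAI_API_KEY=
"""

screengrasp_key, openai_key = load_keys(parse_env(sample))
print(screengrasp_key, openai_key)
```

An empty OpenAI value is mapped to None here so that later code can skip the GPT-4o step with a simple truthiness check.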
Make sure the Smooth Operator Agent is running in the background.
Execute the example script:
    python example.py
The script will:
- Start the Smooth Operator server connection.
- Open the Windows Calculator.
- Type "3+4".
- Click the "equals" button.
- Retrieve the calculator's UI state.
- Ask GPT-4o what result is displayed (only if an OpenAI key is provided).
- Print the result from OpenAI (if applicable).
- Wait for you to press Enter before exiting.
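The order of the steps above, including the optional GPT-4o step, can be outlined as follows. This is only a sketch of the control flow: FakeAgent, perform, run_example and ask_model are hypothetical names invented for illustration and are not the smooth_operator_agent_tools API.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in that only records steps; not the real client."""

    def __init__(self):
        self.steps = []

    async def perform(self, step):
        self.steps.append(step)
        await asyncio.sleep(0)  # the real example pauses here for the UI


async def run_example(agent, openai_key=None, ask_model=None):
    """Run the steps in order; skip the model query when no key is given."""
    for step in (
        "start server",
        "open calculator",
        'type "3+4"',
        'click "equals"',
        "read automation tree",
    ):
        await agent.perform(step)
    if not openai_key or ask_model is None:
        return None  # OpenAI integration is optional
    agent.steps.append("ask model")
    return ask_model("<automation tree text>")


agent = FakeAgent()
skipped = asyncio.run(run_example(agent))
# ask_model is a placeholder callable standing in for a GPT-4o request.
answered = asyncio.run(run_example(FakeAgent(), "sk-example", lambda tree: "7"))
print(skipped, answered)
```

Passing the model call in as a function keeps the optional branch easy to test without network access.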
Notes:
- The example includes pauses (asyncio.sleep) to allow time for applications to open and UI elements to update. You might need to adjust these timings based on your system's performance.
- The OpenAI integration is optional. If you don't provide an API key, that part of the example will be skipped.
- The code includes a commented-out section demonstrating how to use screenshots instead of the automation tree for analysis. Screenshots are generally less reliable and more costly in terms of API credits.
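If the fixed asyncio.sleep pauses prove too short or too long on your machine, one alternative is to poll for a condition with a timeout. The helper below is a minimal sketch of that idea and is not part of the example script; wait_until and the stand-in condition are hypothetical, and in practice the predicate would check something like whether the expected window has appeared.

```python
import asyncio
import time


async def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo():
    started = time.monotonic()

    def ready():
        # Stand-in condition: becomes true about 0.2 s after the start.
        return time.monotonic() - started >= 0.2

    ok = await wait_until(ready, timeout=2.0, interval=0.05)
    never = await wait_until(lambda: False, timeout=0.2, interval=0.05)
    return ok, never


ok, never = asyncio.run(demo())
print(ok, never)
```

Compared with a fixed pause, polling returns as soon as the condition holds and reports a timeout instead of silently continuing.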