agent-desktop is a new CLI tool, written natively in Rust, that lets AI agents interact with desktop applications through OS accessibility trees rather than visual pixel matching.
Key Points
- Exposes 53 distinct commands, covering mouse, keyboard, and window management, each returning structured, machine-readable JSON.
- Uses progressive skeleton traversal to reduce token usage by 78–96% when interacting with complex applications like Slack or VS Code.
- Ships a C-ABI library (libagent_desktop_ffi) that Python, Swift, Go, Ruby, and Node.js can call in-process, with no need to fork a child process.
- Assigns deterministic element references (e.g., @e1) to interactive UI components, allowing agents to perform precise actions like clicking or typing.
- Requires macOS 13.0+ and Accessibility permissions, with support for Windows and Linux currently in development.
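
As a rough illustration of two of the ideas above (deterministic element references and depth-limited, skeleton-style output), here is a minimal Python sketch. It is not agent-desktop's implementation or API: the `Node` class, `INTERACTIVE_ROLES`, `assign_refs`, `skeleton`, and the demo tree are all hypothetical names invented for this example, and the depth-first, pre-order numbering is only an assumption about how stable references could be produced.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical set of roles treated as interactive; a real accessibility
# API exposes many more roles than this.
INTERACTIVE_ROLES = {"button", "textfield", "checkbox", "link", "menuitem"}


@dataclass
class Node:
    """Simplified stand-in for one accessibility-tree element."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None  # filled in by assign_refs


def assign_refs(root: Node) -> Dict[str, Node]:
    """Walk the tree depth-first (pre-order) and number interactive nodes.

    The same tree always yields the same @e1, @e2, ... mapping.
    """
    refs: Dict[str, Node] = {}

    def visit(node: Node) -> None:
        if node.role in INTERACTIVE_ROLES:
            node.ref = f"@e{len(refs) + 1}"
            refs[node.ref] = node
        for child in node.children:
            visit(child)

    visit(root)
    return refs


def skeleton(node: Node, max_depth: int, depth: int = 0) -> dict:
    """Serialize the tree down to max_depth; deeper subtrees become a count."""
    entry: dict = {"role": node.role}
    if node.name:
        entry["name"] = node.name
    if node.ref:
        entry["ref"] = node.ref
    if node.children:
        if depth < max_depth:
            entry["children"] = [
                skeleton(child, max_depth, depth + 1) for child in node.children
            ]
        else:
            # Collapse the subtree: the agent can ask for it later if needed.
            entry["children_count"] = len(node.children)
    return entry


def build_demo_tree() -> Node:
    """Build a tiny made-up window tree for demonstration."""
    toolbar = Node("toolbar", children=[
        Node("button", "New message"),
        Node("textfield", "Search"),
    ])
    messages = Node("group", "Messages", children=[
        Node("text", "hello"),
        Node("link", "Open thread"),
    ])
    return Node("window", "Chat", children=[toolbar, messages])


if __name__ == "__main__":
    root = build_demo_tree()
    for ref, node in assign_refs(root).items():
        print(ref, node.role, node.name)
    print(json.dumps(skeleton(root, max_depth=1), indent=2))
```

In this simplified model, an agent would first read the shallow skeleton and then expand only the subtree it needs, so the JSON it consumes stays much smaller than a full dump of the tree.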