Self-Operating Computer
Using the same inputs and outputs as a human operator, this framework enables multimodal AI models to view the screen and decide on a series of mouse and keyboard actions to reach an objective.
Integration
Currently integrated with GPT-4-Vision as the default model.
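To make the observe-decide-act loop concrete, here is a minimal sketch of a single step. The gpt-4-vision-preview model name, the helper names, and the plain-text CLICK/TYPE/DONE action format are illustrative assumptions; the framework's actual prompts and parsing differ.

```python
import base64
import io

import pyautogui
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def screenshot_as_data_url() -> str:
    """Capture the screen and encode it as a base64 PNG data URL."""
    image = pyautogui.screenshot()
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode()


def decide_next_action(objective: str) -> str:
    """Show the model the current screen and ask for a single next action."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name for illustration
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Objective: {objective}. "
                         "Reply with exactly one action: "
                         "CLICK x,y | TYPE text | DONE."},
                {"type": "image_url",
                 "image_url": {"url": screenshot_as_data_url()}},
            ],
        }],
    )
    return response.choices[0].message.content


def execute(action: str) -> bool:
    """Carry out the model's reply; returns True once the objective is done."""
    if action.startswith("CLICK"):
        x, y = (int(v) for v in action.split()[1].split(","))
        pyautogui.click(x, y)
    elif action.startswith("TYPE"):
        pyautogui.write(action.split(" ", 1)[1])
    return action.startswith("DONE")
```

In practice, such a step would be repeated until the model reports that the objective is complete.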
Compatibility
Designed to work across operating systems and to support various multimodal models.
Future Plans
At HyperwriteAI, we are developing Agent-1-Vision, a multimodal model designed for operating software and computer interfaces, with more accurate click location predictions.
Agent-1-Vision Model API Access
We will soon be offering API access to our Agent-1-Vision model. If you're interested in gaining access to this API, sign up here:
Additional Thoughts
We recognize that some operating system functions may be executed more efficiently with hotkeys, such as focusing the browser address bar with Command + L, rather than by simulating a mouse click at the correct XY location.
We plan to make these improvements over time. However, it's important to note that many actions require the accurate selection of visual elements on the screen, necessitating precise XY mouse click locations.
A primary focus of this project is to refine the accuracy of determining these click locations. We believe this is essential for achieving a fully self-operating computer in the current technological landscape.
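For illustration, the two action styles compare like this using the pyautogui library; the coordinates below are placeholders, not real model predictions, and the OS check is a simplifying assumption.

```python
import platform

import pyautogui

# Hotkey style: focus the browser address bar directly.
# The modifier differs by OS: Command on macOS, Ctrl elsewhere.
modifier = "command" if platform.system() == "Darwin" else "ctrl"
pyautogui.hotkey(modifier, "l")

# XY-click style: requires an accurate location prediction first.
pyautogui.click(x=640, y=52)  # placeholder coordinates
```

The hotkey needs no visual grounding at all, which is why it can be more reliable than a predicted click for functions the OS exposes directly.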
Join the Discussion and Contribute on GitHub
We encourage contributions and discussion via the Self-Operating Computer GitHub page.
Our team is unable to provide custom support at this time.