This is On Screen. It can carry out native tasks on your phone so you do not have to handle every small step yourself.
I wanted a phone assistant that could do more than answer questions. On Screen is my first attempt at that: it looks at what is visible, understands the controls on the screen, and uses Android's accessibility APIs to move through native apps.
The first version is for people who want to test the idea, give feedback, and help figure out what would make a phone agent more reliable.
Most phone assistants still stop at advice. They can answer a question or explain where a setting might be, but they usually do not inspect the current Android screen and act inside the app for you.
On Screen is my attempt at an autonomous Android agent: something that can read the visible UI, reason about the next action, and use accessibility to tap, type, scroll, and navigate through real phone workflows.
The goal is not just guidance. The goal is a phone agent that can complete native Android tasks while still being easy to supervise, pause, or stop when needed.
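To make that read-and-act loop concrete, here is a minimal Kotlin sketch of the kind of AccessibilityService an agent like this builds on. It is not On Screen's actual code, and the class and helper names are made up, but rootInActiveWindow, AccessibilityNodeInfo, and dispatchGesture are the standard Android APIs for reading the visible UI and injecting taps.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path
import android.graphics.Rect
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Illustrative "read the screen, then act" loop; not On Screen's real implementation.
class ScreenAgentService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent) {
        // 1. Read: flatten the visible UI tree into short text descriptions
        //    that could be sent to a model as context.
        val root = rootInActiveWindow ?: return
        val elements = mutableListOf<String>()
        collectNodes(root, elements)
        // `elements` would then go into the model prompt for the next action.
    }

    private fun collectNodes(node: AccessibilityNodeInfo, out: MutableList<String>) {
        if (node.isVisibleToUser) {
            val bounds = Rect().also { node.getBoundsInScreen(it) }
            val label = node.text ?: node.contentDescription ?: ""
            out.add("${node.className} \"$label\" at $bounds clickable=${node.isClickable}")
        }
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { collectNodes(it, out) }
        }
    }

    // 2. Act: tap a screen coordinate chosen by the model.
    private fun tap(x: Float, y: Float) {
        val path = Path().apply { moveTo(x, y) }
        val gesture = GestureDescription.Builder()
            .addStroke(GestureDescription.StrokeDescription(path, 0, 50))
            .build()
        dispatchGesture(gesture, null, null)
    }

    override fun onInterrupt() = Unit
}
```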
Bring your own API key

The public app is mainly the cloud version for now. Paste your OpenAI API key in the app, optionally add a Brave Search key, then press play to launch the agent.
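To be clear about what "bring your own key" means: the key you paste is used from the device itself, in the Authorization header of requests to the provider, with no server of mine in between. Here is a rough sketch of that pattern using OkHttp and OpenAI's public chat completions endpoint for simplicity (the app itself talks to real-time model APIs); the function names, model name, and payload are illustrative, not what On Screen actually sends.

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody

// Illustrative bring-your-own-key call: the request goes straight from the
// phone to the provider, so only OpenAI sees the prompt and the key.
fun askModel(openAiKey: String, prompt: String): String? {
    val client = OkHttpClient()
    // Model name and payload are just examples.
    val body = """
        {"model": "gpt-4o-mini",
         "messages": [{"role": "user", "content": ${jsonString(prompt)}}]}
    """.trimIndent().toRequestBody("application/json".toMediaType())

    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .addHeader("Authorization", "Bearer $openAiKey")
        .post(body)
        .build()

    val response = client.newCall(request).execute()
    return response.use { if (it.isSuccessful) it.body?.string() else null }
}

// Tiny helper to quote the prompt as a JSON string; real code would use a JSON library.
fun jsonString(s: String) =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
```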
Agent memory is stored on your phone. I do not store your data on any server; there is no backend on my side. In cloud mode, the model provider still receives whatever you send to it.
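On the memory side, "stored on your phone" can be as simple as a file in the app's private storage. A minimal sketch under that assumption; the file name and format are illustrative, not the app's actual layout.

```kotlin
import android.content.Context
import java.io.File

// Illustrative on-device memory: notes the agent keeps between tasks are
// appended to a file in the app's private storage and never uploaded.
class LocalAgentMemory(context: Context) {
    private val file = File(context.filesDir, "agent_memory.jsonl")

    fun remember(note: String) {
        file.appendText(note.replace("\n", " ") + "\n")
    }

    fun recall(limit: Int = 20): List<String> =
        if (file.exists()) file.readLines().takeLast(limit) else emptyList()
}
```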
There are two directions: cloud and on-device. The demo version uses cloud real-time model APIs because that is the usable path today.
I also experimented with On Screen Offline: whisper.cpp for speech to text, Gemma as the vision-language model, and Kitten TTS for text to speech. The idea is exciting because privacy is better and there are no model API costs, but it is currently very slow and makes the phone heat up.
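For a sense of how those offline pieces fit together, here is a sketch of one turn of that pipeline. The three interfaces are hypothetical stand-ins for whisper.cpp, Gemma, and Kitten TTS bindings, not their real APIs; only the wiring is the point.

```kotlin
import android.graphics.Bitmap

// Hypothetical stand-ins for the on-device components; the real projects
// (whisper.cpp, Gemma, Kitten TTS) each need their own bindings.
interface SpeechToText { fun transcribe(audio: ShortArray): String }                        // whisper.cpp
interface VisionLanguageModel { fun nextAction(screen: Bitmap, request: String): String }   // Gemma
interface TextToSpeechEngine { fun speak(text: String) }                                    // Kitten TTS

// One turn of the fully offline loop: everything runs on the phone, which is
// why it is private and free of API costs, but also slow and hot right now.
fun offlineTurn(
    stt: SpeechToText,
    vlm: VisionLanguageModel,
    tts: TextToSpeechEngine,
    audio: ShortArray,
    screenshot: Bitmap,
) {
    val request = stt.transcribe(audio)               // speech -> text
    val action = vlm.nextAction(screenshot, request)  // screen + text -> next action
    tts.speak(action)                                 // narrate the chosen action
}
```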
For a future version, I want to explore smaller on-device models, mobile GUI grounding, supervised fine-tuning on mobile action data, and RL on AndroidWorld-style tasks.
Technical feedback would be especially useful. If you know Android, accessibility services, phone systems, on-device ML, GUI grounding, or agent training, tell me what could make this faster and more robust.
I spent about five days building and experimenting with this version. Your comments and ideas will decide whether version two should exist.