ProgramAT

This repository contains code for an AI-powered assistive technology platform, ProgramAT. ProgramAT equips blind or low-vision (BLV) users to create custom camera-based assistive technologies via natural language instructions. BLV users can test their camera-based ATs with input from their iPhone's camera. The system has 3 major components:

a mobile app, installable via TestFlight.
a server that runs the computation for the camera-based AT you build using the app. This interfaces with a GitHub repository, and runs the AI models necessary for your task.
ProgramAT GitHub repository. This is where your tools will live. It will be a fork of this repository for each individual user. We recommend forking closer to the event date, or keeping an eye on when your fork is behind the parent repository in order to sync.

What is the ProgramAT Mobile App?

This is a React Native app that facilitates AT creation, iteration, and testing of the created AT using your camera feed. You can also monitor the status of their creation. The app streams frames to a Python backend on your server, which executes pluggable tools — object detection, OCR, scene description, and more and returns spoken feedback.

Getting Started

Prerequisites

A GitHub account. Refer to screen reader friendly instructions by Jeff Bishop.
A Copilot subscription.
Github mobile, installed and logged in
Testflight, installed from Apple's app store
A computer with decent compute capacity, or ability to host a VM with decent compute capacity. For best performance, we recommend around 20G of available disk space wherever you plan to host. A GPU is not required.
A screen reader and an accessible web browser.
iPhone 12 or higher running iOS 26. Apple Intelligence or AI-specific features for processors are not required.
Python 3.11+ (for the backend server)
Node.js >= 20
Optional: React Native CLI development environment (setup guide). This is required if you want to run and install the app. You can skip this requirement if you are installing the app using our TestFlight link.
Optional: For iOS: Xcode and CocoaPods. You can skip if you are not building and running the app itself locally.

Installation

Fork this repository This is easiest to do from the Github website. First, select the button that says Fork. Once you have done so, it will take you to an interface where you can select the owner and name of the fork. By default, this is your username for owner, and ProgramAT-opensource for the repository name. We recommend leaving these defaults intact, but you can change them if you would like. Then, click the button that says Create Fork. After a few seconds, a copy of the repository will be made in your Github account.

Clone the repository

git clone https://github.com/your-username/ProgramAT-opensource.git
cd ProgramAT-opensource

Get the API keys See the API keys section for a detailed description.

Running the Mobile App Locally (Optional)

You only need this section if you want to make changes to the frontend and run the React Native app yourself. If you are using the TestFlight version of the app, you can skip this section.

This section requires an iPhone and XCode on Mac system.

Install React Native dependencies
```
cd ProgramATApp
npm install
```
Install iOS dependencies (iOS only)
```
cd ios
pod install
cd ..
```
Start the React Native development server
```
cd ProgramATApp
npx react-native start
```
This starts the React Native Metro development server.
Connect your phone Plug your iPhone into your computer while running the development server.

Depending on your setup, you may also need to:
- Trust the computer on your iPhone.
- Turn on Developer Mode on your iPhone.
- Open the app through Xcode or the React Native development workflow.

Notes:

If you are using the TestFlight version of the app, you do not need to run the React Native development server.
The backend server and the React Native development server are separate processes and must be started in different terminals or windows.
The backend handles tool execution and AI processing. The React Native development server only serves the mobile app frontend during local development.

API Keys

The backend reads configuration from backend/.env automatically when it starts, or from environment variables if you set them directly.

Use the table below as a quick reference for what each value does.

Variable	Required	Description
`LLM_MODEL`	Optional	Model name used by LiteLLM. Update this in `backend/.env` to change which provider/model is active (falls back to `GEMINI_MODEL` if present).
`GEMINI_API_KEY`	Optional	Provider API key for Google Gemini. Keep provider keys in `.env` as needed: `GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GROQ_API_KEY`, `MISTRAL_API_KEY`.
`GROQ_API_KEY`	Optional	Provider API key for Groq. Set `LLM_MODEL` to e.g. `groq/llama-3.3-70b-versatile`. Free tier available, no credit card required.
`MISTRAL_API_KEY`	Optional	Provider API key for Mistral AI. Set `LLM_MODEL` to e.g. `mistral/mistral-small-latest`. Free tier available.
`GOOGLE_APPLICATION_CREDENTIALS`	For OCR tools	Google Cloud Vision API credentials used by Live OCR
`GITHUB_TOKEN`	For GitHub features	GitHub personal access token with `repo` scope
`GITHUB_REPO`	Yes (to access your own tools)	Target repo in `owner/repo` format
`HOST` / `PORT`	Optional	Server bind address (default `0.0.0.0:8080`)

How to Get the Keys

The following keys are required for the application to run, but you can add the key of other providers and use the corresponding model for text parsing and image analysis.

GitHub Token

Go to any GitHub page: https://github.com/settings/tokens.
Click Tokens(classic) and click Generate new token.
Add a note, choose an expiration, and select scopes to be repo and workflow.
Click Generate token, copy it and save it properly. You won't be able to see it again.
Paste the token into backend/.env as GITHUB_TOKEN.

Gemini API Key

Go to Google AI Studio: https://aistudio.google.com/app/apikey
Sign in with your personal Google account.
Click Create API key.
Select an existing Google Cloud project or create a new one, then click Create API key.
Copy the generated key.
Paste the key into backend/.env as GEMINI_API_KEY.

Billing:

If you want to increase the rate limit, click Set up billing in Google AI Studio and choose other plans for your API key.

Groq API Key

Go to GroqCloud Console: https://console.groq.com/keys
Sign in or create a free account (no credit card required).
Click Create API Key, give it a name, and click Submit.
Copy the generated key immediately — you won't be able to see it again.
Paste the key into backend/.env as GROQ_API_KEY.
Set LLM_MODEL to a Groq-hosted model, e.g. groq/llama-3.3-70b-versatile.

Mistral API Key

Go to Mistral AI Console: https://console.mistral.ai/api-keys
Sign in or create a free account.
Click Create new key, give it a name, and click Create.
Copy the generated key immediately — you won't be able to see it again.
Paste the key into backend/.env as MISTRAL_API_KEY.
Set LLM_MODEL to a Mistral model, e.g. mistral/mistral-small-latest or mistral/mistral-medium-latest.

Google Application Credentials

This project uses Google Cloud Vision for higher-quality OCR in tools/live_ocr.py. If you want Cloud Vision for better OCR, create a service account and download a JSON key, then set GOOGLE_APPLICATION_CREDENTIALS to that file path. If you prefer not to use Cloud Vision, the tool will fall back to local Tesseract if installed (no Google credentials required).

Steps:

Go to Google Cloud Console: https://console.cloud.google.com
Create or select a project and enable the Cloud Vision API in Menu -> API & Services -> Enabled APIs & services.
Create a service account: Menu -> IAM & Admin -> Service Accounts -> Create service account. Fill name/ID and click Continue.
Create a key: for this service account, select Actions -> Manage Keys -> Add key -> Create new key -> choose JSON -> Create.
A JSON file will be downloaded. Copy the JSON file from the Downloads folder to the backend folder in the codespace. Rename the file to be credentials.json.
Set GOOGLE_APPLICATION_CREDENTIALS to the JSON file path.

Security notes:

Do not commit the JSON key to version control. Add credentials.json to .gitignore.
Restrict the service account to only the Vision API and grant the minimal needed permissions.

Running the Application

You can host the server from your personal machine for free, or use a hosting service like a GCP virtual machine or AWS virtual machine, which may have associated costs.

The difference between these two options, aside from cost, is that when running from your personal machine, your server will shut off whenever your machine does. This is potentially avoidable by using a paid hosting service.

We provide instructions here for hosting from your personal machine. If you would prefer to use a paid hosting service, follow the instructions there for setup.

Notice: You can skip steps 1-2 if ngrok is already installed and configured in your terminal, or if you are using a paid hosting service.

Install ngrok Download ngrok for your system from https://ngrok.com/download and make sure ngrok version works in your terminal.
Connect ngrok to your account Sign up or log in your account in the downloading page. Go to Getting started -> Your authtoken -> Copy to get your authtoken. Configure it in your terminal.
```
ngrok config add-authtoken YOUR_NGROK_AUTHTOKEN
```
Notice: You can skip steps 3-8 by filling the API keys into the prompt in COPILOT_SETUP_PROMPT.md and giving it to Copilot (or your coding agent) and let it set up the server and use ngrok for you. After it gives you the forwarding address, you can go to step 9 and type it in the mobile application.
Create the backend .env file
```
cd backend
cp .env.example .env
```
Fill in the values in backend/.env
- LLM_MODEL: model name used by LiteLLM (for example gemini-3-flash-preview or openai/gpt-4o). Change this to switch provider/model without modifying code.
- Provider API keys: GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY
- GITHUB_TOKEN
- GITHUB_REPO
- GOOGLE_APPLICATION_CREDENTIALS

Set up the backend

python3 -m venv .venv (python -m venv .venv for Windows powershell)
source .venv/bin/activate (.\.venv\Scripts\Activate.ps1 for Windows powershell)
pip install -r requirements.txt

Activate the virtual environment and start the backend server
```
python stream_server.py
```
The server listens on 0.0.0.0:8080 by default.
Open up another terminal or window and start the ngrok tunnel
```
ngrok http 8080
```
Copy your forwarding address If you are using ngrok, keep the terminal created in step 4 open. ngrok will print a forwarding address, which is the public address your app should connect to. Copy this address.

If you are using a paid hosting service, your VM should list a public IP address, copy this IP address.
Paste the forwarding address into the app Change the prefix of the forwarding address from https to wss because we are using a websocket. Open ProgramAT app on your mobile device, go to the server address field in Settings, paste the ngrok forwarding address, and tap Connect.

Troubleshooting

Installing Dependencies

If you are using Windows Powershell and encounter the problem "This error might have occurred since this system does not have Windows Long Path support enabled" when installing LiteLLM, you need to bypass the path length limit. Open PowerShell as Administrator and run:

New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force

Then install the dependencies again.

Android build

If you do a release build for Android, the server will not connect to ws addresses, only wss. If using the ngrok tunnelling, this should not be an issue as wss is default, but may not be default behavior from certain paid hosting services (e.g. a GCP VM).

Forked Repository Settings

Since you are working from a fork of this repository, GitHub-related features may need a few repo-level settings that are not always copied over from the upstream project.

Make sure Issues are enabled in Settings -> General -> Features.
Make sure the workflow Auto-assign to Copilot is visible in Actions. If it is missing, delete and re-upload .github/workflows/copilot-assignment.yml.
Go to Settings -> Secrets and variables -> Actions, click New repository secret, create a secret named COPILOT_PAT, and set its value to your GitHub personal access token.
If you use gh auth login, make sure that token has the required scopes, including repo, workflow, and admin:org.

Usage

Select a Tool — Navigate to the Tools tab, browse the available tools, and tap one to select it.
Run — The Tool Runner opens with a live camera preview. Tap Run to execute the tool on single frames, or Stream to process frames continuously. Results are spoken aloud via TTS.
Chat — After a tool run, tap Chat to ask follow-up questions about the result (powered by LiteLLM; change the active model with LLM_MODEL in backend/.env).
Development mode — Use the PRs tab to browse open pull requests, select one to load its tools, and send text updates to GitHub issues.
Tool creation — To instead create a new tool, from development mode, select the "Create New Issue Instead" button in the PRs tab. Then, type or dictate the tool you would like to make into the text box, then submit. If more information is needed, the app will ask for it: in this case, dictate or type an answer to the request and resubmit, it will be appended to your initial request. Once the request is complete, the app will tell you it has made a new issue successfully. Copilot will automatically be assigned and create a relevant pull request. From there, wait for it to generate, and then run it as described in the earlier usage steps!
- If you prefer to do this from desktop, you can also fill out the issue template titled "Visual Assistive Technology" on the Github website
Tool iteration — To update or modify a tool, in development mode, click a relevant PR, and instead of choosing open tools as you would to run a tool, select update issue. Then, type or dictate the change you would like to make and submit. Copilot will automatically be assigned and make the desired changes to the relevant tool.
- If you would prefer to do this from desktop, you can also simply leave a comment on the relevant pull request from the Github website. In this case, you will be responsible for appending @copilot to your comment to assign Copilot.

Supported Input Modes

Streaming mode — Tools process frames continuously and return real-time audio feedback
Single-frame mode — Capture one frame and get a detailed result
Real-time camera streaming at configurable FPS via react-native-vision-camera
Conversation mode — Ask follow-up questions about tool results via a Chat tab. This becomes available after a single-frame mode interaction.
Custom GPT-like tools — Tools flagged as Custom GPT use Gemini Live for streaming multimodal conversations instead of executing code per frame

Supported Feedback Modes

Text-to-Speech feedback — all tool results are spoken aloud automatically
Rich audio output — tools can return speech, beeps, haptic vibration, earcons, and more via the AudioOutputService
Speech-to-Text input — voice input for follow-up questions using @react-native-voice/voice and OS-level dictation

Usage Modes

Development Mode (GitHub Integration)

PR browser — List open pull requests, select one, and load its tools
Text input for issues — Create or update GitHub issues with AI-powered parsing (LiteLLM)
Multi-turn conversations — The server asks for missing fields until the issue is complete
Copilot session logs — View AI coding session summaries per PR

Production Mode

Tools are pulled from the main branch only
The PR browser tab is hidden; users go straight to the tool list

Review mode

Tools are pulled from the main repository server (as opposed to your self-hosted server), specifically PRs people have put up for review
You can test tools as you would in development mode, but cannot directly create or edit tools.
- If you would like a tool of yours to be visible in review mode, create it locally, then submit a PR!
You can review tools (approve if they work as intended, request changes otherwise)
- Note that for this mode to work properly, you should first become a contributor. These permissions are needed to leave reviews.

Built-In Tools

Tool	Description	Model
Object Recognition	Detects and announces objects using YOLO11 + COCO	YOLOv11
Live OCR	Reads visible text aloud in real time	Google Cloud Vision API
Scene Description	Generates a spoken description of the scene	LiteLLM Vision
Camera Aiming	Guides users to center an object for a well-framed photo	YOLOv11
Door Detection	Detects doors/doorways with clock-face navigation cues	YOLOWorld
Empty Seat Detection	Finds unoccupied chairs and gives directional guidance	YOLOv11
Clothing Recognition	Identifies the most prominent clothing item and its features	LiteLLM Vision

New tools can be added by placing a Python file in the tools/ directory. Each tool exposes a main(image, input_data) function and returns an audio-friendly string or dict.

Example tool ideas!

You can make a wide variety of tools using ProgramAT! For best performance, we recommend thinking about tools that are camera based, and, for the time being, are stateless (do not need to remember their previous responses/backreference previous frames) and do not require bringing in an outside dataset.

Want to bring in stateless capabilities? Or ability to use outside datasets? Implement these features and become a contributor to share them with the community!

To get your imagination started, here are some ideas for tools people have tried before!

Plug point detector (see it run!): Help find plug points/outlets in unfamiliar rooms.
Playing card reader (see it run!): Describe a hand of playing cards, and how they change over the course of a card game
Uber finder (see it run!): Help identify an Uber by describing the colors, makes, and models of car in frame.
Clothing description: Get a simple description of an article or articles of clothing
Mail sorter: Describe what important vs. junk mail means to you and get guidance on whether you should open a particular piece of mail.
Makeup checker: Take a photo and analyze if there are any issues with your makeup application.
Sock matcher: Determine if two socks out of the laundry match

Project Structure

ProgramAT-opensource/
├── ProgramATApp/              # React Native mobile app
│   ├── App.tsx               # Root component, WebSocket setup, state management
│   ├── TabNavigator.tsx      # Bottom tabs: PRs, Tools, Chat, Settings
│   ├── ToolSelector.tsx      # Lists available tools
│   ├── ToolRunner.tsx        # Runs selected tool against camera feed
│   ├── CameraView.tsx        # Camera capture & frame streaming
│   ├── Chat.tsx              # Follow-up conversation interface
│   ├── PRsAndText.tsx        # PR browser & text input (dev mode)
│   ├── Settings.tsx          # Mode switching, server selection, theme
│   ├── WebSocketService.ts   # WebSocket connection manager
│   ├── AudioOutputService.ts # Multi-modal audio output (TTS, beeps, haptics)
│   ├── TextToSpeechService.ts
│   ├── BeepService.ts
│   ├── config.ts             # Server URLs, feature flags, mode config
│   └── __tests__/            # Jest test suite
├── backend/                   # Python WebSocket server
│   ├── stream_server.py      # Main server — message routing, tool execution, GitHub integration
│   ├── gemini_live.py        # Gemini Live API session manager for Custom GPT tools
│   ├── gemini_summarizer.py  # Summarizes Copilot session logs
│   ├── copilot_db.py         # SQLite storage for Copilot session data
│   ├── module_manager.py     # Auto-installs missing Python packages at runtime
│   └── requirements.txt      # Python dependencies
└── tools/                     # Pluggable vision/AI tools (Python)
    ├── object_recognition.py
    ├── live_ocr.py
    ├── scene_description.py
    ├── camera_aiming.py
    ├── door_detection.py
    ├── empty_seat_detection.py
    └── clothing_recognition.py

Development

Running Tests

cd ProgramATApp
npm test

Adding a New Tool

Create a Python file in tools/ (e.g., tools/my_tool.py).
Implement a main(image, input_data) function that returns an audio-friendly string or dict.
The tool is automatically discovered and available in the app when loaded from the server.

See tools/MODEL_SETUP.md and the existing tools for examples.

Building for Production

For iOS:

Open ProgramATApp/ios/ProgramATApp.xcworkspace in Xcode
Select your signing team and provisioning profile
Build for Release

Technology Stack

Mobile App

React Native 0.82.1 — Bare (non-Expo) for direct native module access
TypeScript
react-native-vision-camera — Camera capture and frame streaming
@react-native-voice/voice — Speech-to-text for follow-up questions
react-native-tts — Text-to-speech for tool results
react-native-audio-api — Programmatic audio generation (beeps, tones)
AsyncStorage — Local persistence for settings and sessions

Backend

Python 3.11 with async websockets
LiteLLM (configurable providers) — Model runtime used for AI parsing, scene description, and clothing recognition. LiteLLM can route requests to provider backends such as Google Gemini, OpenAI, Anthropic, Groq, or Mistral; switch the active model via LLM_MODEL in backend/.env.
Google Cloud Vision API — OCR
Ultralytics (YOLOv11 / YOLOWorld) — Object detection
OpenCV / NumPy / Pillow — Image processing
PyGithub — GitHub issue and PR integration

Get involved!!

As an open source project, we welcome community feedback and contributions.

If you are interested in going beyond your fork and contributing to ProgramAT, please start by reading our Code of Conduct and Contribution Guidelines.

Once you have done so, you can request the permissions needed to become a contributor by following these instructions!

License

See LICENSE.

Contributions

Ellie Seehorn (PhD student at University of Michigan)
Yushan Wei (Undergraduate Student at University of Michigan)
Venkatesh Potluri (Assistant Professor, University of Michigan. Principal Investigator of the Intelligent Developer Experiences for Accessibility Lab)
Anhong Guo (Assistant Professor, University of Michigan and principal investigator of the Human AI Lab)

Found a problem? Please file an issue!

Name		Name	Last commit message	Last commit date
Latest commit History 244 Commits
.github		.github
ProgramATApp		ProgramATApp
backend		backend
tools		tools
.DS_Store		.DS_Store
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
COPILOT_SETUP_PROMPT.md		COPILOT_SETUP_PROMPT.md
DEV_PRODUCTION_MODES.md		DEV_PRODUCTION_MODES.md
LICENSE.md		LICENSE.md
MAINTAINERS.md		MAINTAINERS.md
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

ProgramAT

What is the ProgramAT Mobile App?

Getting Started

Prerequisites

Installation

Running the Mobile App Locally (Optional)

API Keys

How to Get the Keys

GitHub Token

Gemini API Key

Groq API Key

Mistral API Key

Google Application Credentials

Running the Application

Troubleshooting

Installing Dependencies

Android build

Forked Repository Settings

Usage

Supported Input Modes

Supported Feedback Modes

Usage Modes

Development Mode (GitHub Integration)

Production Mode

Review mode

Built-In Tools

Example tool ideas!

Project Structure

Development

Running Tests

Adding a New Tool

Building for Production

Technology Stack

Mobile App

Backend

Get involved!!

License

Contributions

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages