A minimal, functional MVP that converts AI output into safe, structured, interactive UI schemas. This is a core infrastructure primitive for AI-driven UI generation.
AUI (AI UI) is a monolith application that:
- Takes AI output (text or JSON) and converts it to UI intent schemas
- Renders interactive UI components (form, choice, confirmation)
- Handles user interactions and feeds them back to the AI workflow
- Uses Ollama Cloud for AI model inference (no local setup required)
```
┌─────────────┐      ┌─────────────┐      ┌──────────────┐
│  Frontend   │─────▶│   Backend   │─────▶│ Ollama Cloud │
│   (React)   │◀─────│  (FastAPI)  │◀─────│  (Mistral)   │
└─────────────┘      └─────────────┘      └──────────────┘
```
- Backend (FastAPI): `/api/hydrate` endpoint that converts AI output to UI schemas
- AI Layer (Ollama Cloud): cloud-hosted Mistral 7B quantized model (q4_0) for fast UI intent extraction
- Frontend (React + TypeScript + Tailwind): Dynamic UI renderer for form, choice, and confirmation UIs
- Python 3.10+
- Node.js 18+
- Docker and Docker Compose (for containerized setup)
- Ollama Cloud API key (get one at https://ollama.com)
- Clone and navigate to the project:

  ```bash
  cd aui
  ```

- Get your Ollama Cloud API key:

  - Sign up at https://ollama.com
  - Get your API key from the dashboard
  - Set it as an environment variable:

    ```bash
    export OLLAMA_API_KEY=your_api_key_here
    ```

  - Or create a `.env` file in the project root:

    ```
    OLLAMA_API_KEY=your_api_key_here
    ```

- Start all services:

  ```bash
  docker-compose up --build
  ```

  The docker-compose file will automatically use your `OLLAMA_API_KEY` environment variable.

- Verify the Ollama Cloud connection:

  ```bash
  curl http://localhost:8000/api/health
  ```

  This shows whether Ollama Cloud is connected and ready.

- Access the application:

  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - API Docs: http://localhost:8000/docs
  - Health Check: http://localhost:8000/api/health
Note: With Ollama Cloud, you don't need to pull models manually - they're available instantly. The default model is `mistral:7b-instruct-q4_0`, but you can switch models using the model switcher in the UI or by setting the `OLLAMA_MODEL` environment variable.
Stop and restart all services:

```bash
docker-compose down
docker-compose up --build
```

Restart without rebuilding (faster):

```bash
docker-compose restart
```

Restart a specific service:

```bash
docker-compose restart backend
docker-compose restart frontend
```

Stop all services (keeps containers):

```bash
docker-compose stop
```

Stop and remove containers:

```bash
docker-compose down
```

Stop and remove containers + volumes (clean slate):

```bash
docker-compose down -v
```

Logs for all services:

```bash
docker-compose logs -f
```

Logs for a specific service:

```bash
docker-compose logs -f backend
docker-compose logs -f frontend
```

List running containers:

```bash
docker-compose ps
```
- Navigate to the backend directory:

  ```bash
  cd backend
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set environment variables: create a `.env` file:

  ```
  OLLAMA_API_KEY=your_api_key_here
  OLLAMA_MODEL=mistral:7b-instruct-q4_0
  ```
  Model Selection:

  - Default: `mistral:7b-instruct-q4_0` (4-bit quantized, 3-5x faster)
  - Faster: `mistral:7b-instruct-q8_0` (8-bit, slightly larger but faster)
  - Very Fast: `llama3.2:1b` (1B params, excellent for structured tasks)
  - Balanced: `phi3:mini` (3.8B params, optimized for speed)

  Note: With Ollama Cloud, models are available instantly - no need to pull them manually.
- Run the backend:

  ```bash
  uvicorn main:app --reload
  ```

  The backend will be available at http://localhost:8000
- Navigate to the frontend directory:

  ```bash
  cd frontend
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env` file (optional, defaults to localhost:8000):

  ```
  VITE_API_BASE_URL=http://localhost:8000
  ```

- Run the frontend:

  ```bash
  npm run dev
  ```

  The frontend will be available at http://localhost:5173
Convert AI output to a UI schema.

Request:

```json
{
  "interaction_id": "optional-string",
  "ai_output": "I need your name and email address",
  "context": {}
}
```

Response:

```json
{
  "interaction_id": "uuid-string",
  "ui": {
    "ui_type": "form",
    "title": "Information Request",
    "fields": [
      {
        "key": "name",
        "label": "Name",
        "type": "text",
        "required": true
      },
      {
        "key": "email",
        "label": "Email",
        "type": "email",
        "required": true
      }
    ],
    "submit_label": "Submit"
  },
  "confidence": 0.85
}
```

Process a user interaction and get the next UI.

Request:

```json
{
  "interaction_id": "uuid-string",
  "event": {
    "type": "submit",
    "data": {
      "name": "John Doe",
      "email": "john@example.com"
    }
  }
}
```

Response:

```json
{
  "interaction_id": "uuid-string",
  "ui": {
    "ui_type": "confirmation",
    "title": "Success",
    "message": "Thank you for your submission!"
  },
  "confidence": 0.9
}
```

Form: collects user input with various field types:

- `text`: text input
- `number`: numeric input
- `email`: email input with validation
- `select`: dropdown selection

Choice: presents multiple options for user selection.

Confirmation: displays a message (typically after an action).
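The three UI shapes above can be sketched as plain Python types. This is a dependency-free sketch using dataclasses; the actual backend uses Pydantic models in `backend/schemas/ui_schemas.py`, which may differ in detail:

```python
"""Illustrative sketch of the form and confirmation UI schemas.
Field names follow the JSON examples above."""
from dataclasses import dataclass

VALID_FIELD_TYPES = {"text", "number", "email", "select"}

@dataclass
class FormField:
    key: str
    label: str
    type: str = "text"
    required: bool = False

    def __post_init__(self):
        # Reject field types the renderer doesn't support
        if self.type not in VALID_FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.type}")

@dataclass
class FormUI:
    title: str
    fields: list[FormField]
    ui_type: str = "form"
    submit_label: str = "Submit"

@dataclass
class ConfirmationUI:
    title: str
    message: str
    ui_type: str = "confirmation"

# The schema from the /api/hydrate response example above:
form = FormUI(
    title="Information Request",
    fields=[
        FormField(key="name", label="Name", type="text", required=True),
        FormField(key="email", label="Email", type="email", required=True),
    ],
)
```

Modeling the schemas as strict types is what lets the backend reject malformed AI output before it ever reaches the renderer.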
- Start the application (Docker or local)
- Open the frontend (http://localhost:3000 or http://localhost:5173)
- Initial UI: A form will appear asking for name and email
- Submit the form: Fill in the fields and click submit
- Next UI: A confirmation message will appear
- Try new output: Use the textarea at the bottom to test different AI outputs
- "Please select your preferred language: English, Spanish, or French"
- "I need your name, email, and phone number"
- "Choose an option: Option A, Option B, or Option C"
- "Your request has been processed successfully"
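The sample outputs above can be exercised end-to-end with a small client. A stdlib-only sketch, assuming the backend from this README is running on localhost:8000 (`build_hydrate_request` is an illustrative helper, not part of the project):

```python
"""Minimal client sketch for the /api/hydrate endpoint.
Request/response fields match the API examples in this README."""
import json
from urllib import request

API_BASE = "http://localhost:8000"  # assumption: default backend URL

def build_hydrate_request(ai_output: str, interaction_id=None) -> dict:
    """Assemble the request body shown in the API section."""
    body = {"ai_output": ai_output, "context": {}}
    if interaction_id is not None:
        body["interaction_id"] = interaction_id
    return body

def hydrate(ai_output: str) -> dict:
    """POST the AI output and return the UI schema response."""
    req = request.Request(
        f"{API_BASE}/api/hydrate",
        data=json.dumps(build_hydrate_request(ai_output)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = hydrate("I need your name and email address")
    print(result["ui"]["ui_type"])  # expected to be a form for this prompt
```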
```
aui/
├── backend/
│   ├── api/
│   │   └── routes.py            # API endpoints
│   ├── schemas/
│   │   ├── models.py            # Request/response models
│   │   └── ui_schemas.py        # UI type schemas
│   ├── ai/
│   │   ├── client.py            # Ollama client
│   │   └── prompt.py            # System prompts
│   ├── services/
│   │   └── hydration.py         # Core hydration logic
│   ├── main.py                  # FastAPI app
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── Renderer.tsx
│   │   │   └── renderer/
│   │   │       ├── FormRenderer.tsx
│   │   │       ├── ChoiceRenderer.tsx
│   │   │       └── ConfirmationRenderer.tsx
│   │   ├── types/
│   │   │   ├── api.ts
│   │   │   └── ui.ts
│   │   ├── services/
│   │   │   └── api.ts
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── package.json
│   └── Dockerfile
├── docker-compose.yml
└── README.md
```
Backend:

- `OLLAMA_API_KEY`: Ollama Cloud API key (required for cloud usage)
- `OLLAMA_BASE_URL`: Ollama Cloud endpoint URL (required when using cloud, e.g., https://api.ollama.ai)
- `OLLAMA_MODEL`: model name to use (default: `mistral:7b-instruct-q4_0`)

Frontend:

- `VITE_API_BASE_URL`: backend API URL (default: http://localhost:8000)
- No persistence: All state is in-memory
- No authentication: MVP scope excludes auth
- Confidence threshold: UI schemas with confidence < 0.6 fall back to a confirmation UI
- Strict validation: Pydantic models enforce schema correctness
- Error handling: Graceful fallbacks to confirmation UI on errors
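The confidence-threshold fallback described above can be sketched as follows (illustrative names; the real logic lives in `backend/services/hydration.py`):

```python
"""Sketch of the confidence fallback: schemas below the threshold
degrade to a safe confirmation UI instead of rendering uncertain output."""
CONFIDENCE_THRESHOLD = 0.6  # from the design notes above

def apply_fallback(ui: dict, confidence: float) -> dict:
    """Return the UI unchanged if confident enough, else a safe confirmation."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return ui
    return {
        "ui_type": "confirmation",
        "title": "Please confirm",
        "message": "The AI output could not be mapped to a UI with confidence.",
    }

# A low-confidence schema degrades to a confirmation UI:
low = apply_fallback({"ui_type": "form"}, 0.4)
print(low["ui_type"])  # confirmation
```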
If the backend can't connect to Ollama Cloud:
Common issues:

- Missing API key: ensure `OLLAMA_API_KEY` is set in your environment or `.env` file:

  ```bash
  # Check if set:
  echo $OLLAMA_API_KEY

  # Or in .env file:
  OLLAMA_API_KEY=your_api_key_here
  ```

- Invalid API key: verify your API key is correct at https://ollama.com

  - Check the health endpoint: `curl http://localhost:8000/api/health`
  - Look for authentication errors in backend logs

- Network issues: ensure your server can reach Ollama Cloud

  - Check internet connectivity
  - Verify no firewall is blocking outbound HTTPS connections

- Check backend logs:

  ```bash
  docker-compose logs -f backend
  ```

  Look for authentication or connection errors.
If you see "404 Not Found" errors when calling the API:
- Model not available: verify the model name is correct
  - Default: `mistral:7b-instruct-q4_0`
  - Check available models via the model switcher in the UI or the `/api/models` endpoint
- Verify model name: ensure `OLLAMA_MODEL` matches an available model in Ollama Cloud
If the frontend can't connect to the backend:

- Check backend CORS settings in `main.py`
- Verify `VITE_API_BASE_URL` matches the backend URL
- Ensure the backend is running on the expected port
If you get model errors:
- Check available models: use the model switcher in the UI or call the `/api/models` endpoint
- Verify model name: ensure the model exists in Ollama Cloud
- Switch model: use the model switcher or set the `OLLAMA_MODEL` environment variable
The system includes several performance optimizations:
- Quantized Model: default uses `mistral:7b-instruct-q4_0` (4-bit quantization, 3-5x faster than full precision)
- Optimized Parameters:
  - `num_predict: 512` - limits response tokens (JSON responses are typically <200 tokens)
  - `temperature: 0.1` - lower temperature for faster, more deterministic responses
  - `top_p: 0.9` - nucleus sampling for faster generation
  - `repeat_penalty: 1.1` - prevents repetition
- Auto-Warmup: model automatically warms up on backend startup
- Optimized Prompt: system prompt reduced from 56 to ~30 lines, with examples
- Retry Logic: single retry on timeout (model may still be loading)
- Reduced Timeout: 60 seconds (down from 180s); with optimizations, responses typically complete in 5-15s
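The generation parameters above roughly correspond to an Ollama `/api/generate` request body like the following sketch (illustrative; the backend's actual request construction lives in `backend/ai/client.py` and may differ):

```python
"""Sketch of an Ollama /api/generate request body using the
optimization parameters listed above."""
OLLAMA_OPTIONS = {
    "num_predict": 512,     # cap response tokens; JSON replies are short
    "temperature": 0.1,     # near-deterministic output for structured JSON
    "top_p": 0.9,           # nucleus sampling
    "repeat_penalty": 1.1,  # discourage repetition
}

def build_generate_payload(model: str, prompt: str) -> dict:
    """Assemble a non-streaming, JSON-mode generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "options": OLLAMA_OPTIONS,
    }
```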
- Before optimizations: 180s timeout, frequent failures
- After optimizations: 5-15s typical response time, <30s worst case
- Speed improvement: 10-20x faster with quantized model + optimizations
- Reliability: 95%+ success rate vs previous timeout failures
The default model is `mistral:7b-instruct-q4_0` (quantized). You can change it via an environment variable:
Fast Models (Recommended):
- `mistral:7b-instruct-q4_0` (default) - 4-bit, ~4GB, 3-5x faster, best balance
- `mistral:7b-instruct-q8_0` - 8-bit, slightly larger but faster than q4_0
- `llama3.2:1b` - 1B params, very fast, excellent for structured JSON tasks
- `phi3:mini` - 3.8B params, optimized for speed
Full Precision (Slower):
- `mistral:7b-instruct` - full 7B model, slower but potentially more accurate
To change the model:

```bash
# In docker-compose.yml or .env
OLLAMA_MODEL=mistral:7b-instruct-q4_0
```

Or use the model switcher in the UI to switch models dynamically. With Ollama Cloud, models are available instantly - no need to pull them.
If you see timeout errors or slow responses:
- Model automatically warms up: The backend automatically warms up the model on startup
- Check model is quantized: Ensure you're using a quantized model (q4_0 or q8_0)
- Check Ollama resources: Ensure Ollama has enough CPU/RAM allocated
- First request may be slower: Model loads into memory on first use (should be <10s with quantized model)
- Timeout is 60 seconds: With optimizations, responses should complete in 5-15s
- Try a smaller model: if timeouts persist, try `llama3.2:1b` for very fast responses
The backend is configured with:
- Connection timeout: 10 seconds
- Read timeout: 60 seconds (optimized from 180s)
- Write timeout: 30 seconds
- Auto-retry: Single retry on timeout
Tip: The model automatically warms up on backend startup, so the first request should be fast. If you restart Ollama, the backend will re-warmup on next startup.
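The single-retry-on-timeout behavior can be sketched as a small wrapper (illustrative; the real logic is in the backend's Ollama client, and `call_ollama` below is a stand-in for the actual HTTP call):

```python
"""Sketch of the timeout/retry policy described above: one retry on
timeout, with the timeout values mirroring the configured limits."""
CONNECT_TIMEOUT = 10.0  # seconds
READ_TIMEOUT = 60.0     # seconds, optimized from 180s
WRITE_TIMEOUT = 30.0    # seconds
MAX_RETRIES = 1         # single retry on timeout

def with_retry(call, retries: int = MAX_RETRIES):
    """Invoke call(), retrying on TimeoutError up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries; surface the timeout

# Demo: a call that times out once (model still loading), then succeeds.
attempts = 0
def call_ollama():
    global attempts
    attempts += 1
    if attempts == 1:
        raise TimeoutError("model still loading")
    return {"ok": True}

result = with_retry(call_ollama)  # succeeds on the second attempt
```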
- Only 3 UI types supported (form, choice, confirmation)
- No authentication
- No persistence
- No theming
- No plugin system
- Ollama Cloud API key required
- Additional UI types
- Persistence layer
- Authentication
- Multiple AI model support
- Enhanced error handling
- Automated testing
MIT
This is an MVP. For production use, consider:
- Adding authentication
- Implementing persistence
- Enhanced error handling
- Comprehensive testing
- Performance optimization