yipyap (Your Intuitive Platform for Yielding, Annotating, and Processing) is a web application, built with Python and SolidJS, for uploading, browsing, and managing image, audio, and video dataset directories. It supports captions, generates and caches thumbnails, runs various tagging and captioning models, and lets you edit dataset configuration and sample prompts.
The frontend of yipyap is built with SolidJS, a reactive JavaScript framework that emphasizes fine-grained reactivity and performance, using Vite as the build tool for fast development and optimized production builds. The application follows a component-based architecture with a central app context managing global state. The main entry point is /src/main.tsx, which sets up the app context and error boundaries, while the routing configuration is defined in /src/router.ts. The core application state management resides in /src/contexts/app.tsx, which handles theme management, locale/translation management, settings persistence, the notification system, and various feature flags and configurations.
Components are organized in feature-based directories under /src/components, with CSS modules or shared stylesheets for styling. Global styles are defined in /src/styles.css, while theme-specific styles are in /src/themes.css. All tests are centralized in the /src/test/__tests__/ directory, organized by functionality including component tests, context tests, utility tests, and internationalization tests.
- Browse directories with breadcrumbs
- View images with thumbnails and captions
- Search and sort files easily (TODO)
- Support for multiple caption formats
- View and edit image metadata
- Keyboard shortcuts
- Zoom and pan smoothly (Experimental toggle)
- Navigate with minimap (Experimental toggle)
- Drag and drop files to upload, with progress tracking
- Upload entire folders at once
- Perform batch operations
- Quick folder navigation
- Add captions and tags
- Generate captions automatically
- Auto-save your changes
- Beautiful tag colors that match your theme
- Edit multiple files at once (TODO)
- Available in multiple languages
- Right-to-left support
- Locale-aware formatting
- Browse and organize your image collection with an intuitive web interface
- Powerful search capabilities with tag filtering and smart suggestions
- Batch operations for moving, deleting, and organizing images efficiently
- Advanced image editing with cropping, rotation, and format conversion
- Smart captioning with multiple AI model support (JTP2, WDv3, Florence-2)
- Tagging system with autocomplete and color-coded tags
- Bounding box labeling with object detection models including Florence-2
- Thumbnail generation and preview optimization
- Responsive design that works on desktop and mobile devices
YipYap includes a sophisticated bounding box editor with support for multiple detection models:
- YOLO-based models for traditional object detection
- Watermark Detection for identifying watermarks in images
- Florence-2 models with conversational AI interface
Florence-2 is a powerful vision-language model that supports conversational prompting. The bounding box editor includes:
- Multiple model variants: Base, Large, Fine-tuned versions, and specialized models like PromptGen, SD3 Captioner, Flux Large, etc.
- Conversational interface: Ask questions about images using natural language
- Task-based detection: Object detection, dense captions, region proposals, and more
- Custom prompts: Override predefined tasks with your own questions
- Advanced generation settings: Control token limits, beam search, and sampling
- microsoft/Florence-2-base: Base model (0.23B parameters)
- microsoft/Florence-2-large: Large model (0.77B parameters)
- microsoft/Florence-2-base-ft: Fine-tuned base model
- microsoft/Florence-2-large-ft: Fine-tuned large model
- HuggingFaceM4/Florence-2-DocVQA: Document VQA specialized
- MiaoshouAI/Florence-2-base-PromptGen-v1.5: Prompt generation
- MiaoshouAI/Florence-2-large-PromptGen-v1.5: Large prompt generation
- thwri/CogFlorence-2.2-Large: Enhanced capabilities
- gokaygokay/Florence-2-SD3-Captioner: SD3 style captions
- gokaygokay/Florence-2-Flux-Large: Flux style captions
- NikshepShetty/Florence-2-pixelpros: Pixel-level understanding
You can also use custom Florence-2 models by providing:
- Local paths: Point to locally downloaded model files
- HuggingFace model IDs: Any compatible Florence-2 model from HuggingFace Hub
The system automatically detects Florence-2 models and provides the appropriate conversational interface.
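As a rough illustration of how a custom model identifier could be handled, the sketch below separates local paths from Hub IDs and applies a simple name-based check for Florence-2. Both function names are hypothetical, and the name heuristic is an assumption made for this example; it is not taken from yipyap's actual detection logic.

```python
from pathlib import Path


def classify_model_source(identifier: str) -> str:
    """Return "local" if the identifier is an existing directory, else "hub".

    Hypothetical helper: a Hub ID such as "microsoft/Florence-2-base" is
    treated as remote unless a directory with that relative path exists.
    """
    return "local" if Path(identifier).expanduser().is_dir() else "hub"


def looks_like_florence2(identifier: str) -> bool:
    """Name-based heuristic (an assumption, not yipyap's real check).

    Looks only at the last path or ID component, case-insensitively.
    """
    name = identifier.replace("\\", "/").rstrip("/").split("/")[-1].lower()
    return "florence-2" in name or "florence2" in name
```

A name check like this would match every model in the list above, but it would miss a Florence-2 checkpoint stored under an unrelated folder name, so a real implementation would more likely inspect the model's configuration.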
Requirement: Python >=3.9
- Download the latest release (in the right sidebar on GitHub; download yipyap-vx.y.z.zip, not the source code) and unzip it.
- In the decompressed yipyap folder, create a virtual environment and install the dependencies:
  - On Linux:
        python -m venv venv
        ./venv/bin/pip install -r requirements.txt
  - On Windows:
        python -m venv venv
        .\venv\Scripts\pip install -r requirements.txt
    Note for Windows users: if you encounter an error about libmagic not being found, run this additional command:
        .\venv\Scripts\pip install python-magic-bin
- Run the server:
  - On Linux:
        ROOT_DIR=/path/to/your/images ./venv/bin/uvicorn app.main:app
  - On Windows (PowerShell):
        $env:ROOT_DIR="C:\path\to\your\images"
        .\venv\Scripts\uvicorn app.main:app
- The application will be available at http://localhost:8000. Use --port 8000 to set the server port; for custom server configuration, refer to the uvicorn documentation.
- Navigate to http://localhost:8000 to start browsing the current working directory.
- Use the controls at the top to:
  - Search for files
  - Switch between grid and list views
  - Sort items by name, date, or size
- Click on images to view them in full size and edit captions.
- Navigate directories using the breadcrumb trail or directory links.
Requirements: Python and Node.js.
- Clone the repository:
      git clone https://github.com/rakki194/yipyap
      cd yipyap
- In the cloned yipyap folder, create a virtual environment and install the dependencies:
  - On Linux:
        python -m venv venv
        ./venv/bin/pip install -r requirements.txt
  - On Windows:
        python -m venv venv
        .\venv\Scripts\pip install -r requirements.txt
- Run the development servers:
  - On Linux:
        ROOT_DIR=/path/to/your/images ./venv/bin/uvicorn app.main:app
  - On Windows (PowerShell):
        $env:ROOT_DIR="C:\path\to\your\images"
        .\venv\Scripts\uvicorn app.main:app
This last step will:
- Install npm dependencies if needed
- Start the Vite dev server (port 1984), serving the frontend and proxying API calls to the backend
- Start the FastAPI backend (port 1985)
- Enable hot reload for both frontend and backend
You can now open your browser to http://localhost:1984
- ENVIRONMENT: Set to "development" or "production" (default: "development")
- RELOAD: Enable hot reload, "true" or "false" (default: "true" in development)
- ROOT_DIR: Root directory for images (default: current directory)
- DEV_PORT: HTTP port for the Vite server, serving the frontend and proxying the backend API (default: 1984)
- BACKEND_PORT: HTTP port for the backend API (default: DEV_PORT+1)
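To make the defaults above concrete, here is a minimal sketch of how these variables could be read in Python. The `Settings` class and `load_settings` function are hypothetical names invented for this example, and the RELOAD default outside development ("false") is an assumption, since only the development default is documented.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    environment: str
    reload: bool
    root_dir: str
    dev_port: int
    backend_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    environment = env.get("ENVIRONMENT", "development")
    # Documented: RELOAD defaults to "true" in development.
    # Assumed here: it defaults to "false" in any other environment.
    reload_default = "true" if environment == "development" else "false"
    reload = env.get("RELOAD", reload_default).lower() == "true"
    dev_port = int(env.get("DEV_PORT", "1984"))
    # BACKEND_PORT defaults to DEV_PORT + 1.
    backend_port = int(env.get("BACKEND_PORT", str(dev_port + 1)))
    root_dir = env.get("ROOT_DIR", os.getcwd())
    return Settings(environment, reload, root_dir, dev_port, backend_port)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise, for example `load_settings({"DEV_PORT": "3000"})` yields a backend port of 3001.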
yipyap/
├── app/ # Backend application
│ ├── __init__.py # Package initialization
│ ├── main.py # FastAPI application and routes
│ ├── image_handler.py # Image processing and directory scanning
│ ├── caption_handler.py # Caption file management
│ ├── utils.py # Utility functions
├── src/ # Frontend application
│ ├── components/ # Additional components
│ ├── composables/ # SolidJS composables (not hooks)
│ ├── contexts/ # SolidJS contexts
│ ├── i18n/ # Internationalization
│ ├── icons/ # Icon components
│ ├── pages/ # Application pages
│ ├── resources/ # Frontend data resources
│ ├── test/ # Test utilities and setup
│ ├── theme/ # Theme-related components
│ ├── utils/ # Utility functions
│ ├── directives.tsx # SolidJS directives
│ ├── main.tsx # Application entry point
│ ├── models.ts # Data models
│ ├── router.ts # Routing configuration
│ ├── styles.css # Global styles
│ ├── themes.css # Theme-specific styles
│ ├── types.d.ts # TypeScript declarations
│ └── utils.ts # Shared utilities
├── package.json # Frontend dependencies and scripts
├── tsconfig.json # TypeScript configuration
├── vite.config.ts # Vite configuration
- Frontend Architecture
  - Entry point in src/main.tsx with app context setup
  - Global state management in src/contexts/app.tsx
  - Component-based architecture with both capitalized and lowercase component directories
  - Composables for reusable reactive logic in src/composables/
  - Comprehensive i18n support in src/i18n/
- Testing Infrastructure
  - Centralized test utilities in src/test/
  - Test setup and configuration in src/test/setup.ts
  - Custom test hooks in src/test/test-hooks.ts
  - Test utilities in src/test/test-utils.ts
- Styling System
  - Global styles in src/styles.css
  - Theme-specific styles in src/themes.css
  - Theme components in src/theme/
  - Icon components in src/icons/
- Backend Integration
  - FastAPI routes in app/main.py
  - Image processing in app/image_handler.py
  - Caption management in app/caption_handler.py
This project is licensed under the MIT License. See the LICENSE.md file for details.
If you encounter any issues or have questions, feel free to open an issue on the GitHub repository.
The backend is built with FastAPI and provides a comprehensive API for image management and caption generation. It uses a layered architecture with the following components:
- FastAPI Application (app/main.py)
  - HTTP endpoint definitions
  - Request/response handling
  - Development/Production mode configuration
  - Static file serving
  - SPA support
- Data Access Layer (app/data_access.py)
  - SQLite-based caching system
  - File system operations
  - Image processing and thumbnail generation
  - Caption file management
- Caption Generation (app/caption_generation/)
  - Modular caption generator system
  - Support for multiple ML models
  - Async generation with error handling
  - Model configuration management
- Utility Layer (app/utils.py)
  - Path resolution and validation
  - Security checks
  - Helper functions
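The SQLite-based caching mentioned for the data access layer can be pictured with a small sketch like the one below. The table layout, the function names, and the use of the file's modification time as the invalidation key are all assumptions made for illustration; the actual schema in app/data_access.py may differ.

```python
import sqlite3
from typing import Optional


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnails ("
        "path TEXT PRIMARY KEY, mtime REAL NOT NULL, thumb_ref TEXT NOT NULL)"
    )
    return conn


def get_cached(conn: sqlite3.Connection, path: str, mtime: float) -> Optional[str]:
    """Return the cached thumbnail reference, or None if missing or stale."""
    row = conn.execute(
        "SELECT mtime, thumb_ref FROM thumbnails WHERE path = ?", (path,)
    ).fetchone()
    # A changed modification time means the source file was edited,
    # so the cached entry is treated as invalid.
    if row is None or row[0] != mtime:
        return None
    return row[1]


def put_cached(conn: sqlite3.Connection, path: str, mtime: float, thumb_ref: str) -> None:
    """Insert or overwrite the cache entry for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO thumbnails (path, mtime, thumb_ref) VALUES (?, ?, ?)",
        (path, mtime, thumb_ref),
    )
    conn.commit()
```

Keying invalidation on the modification time keeps lookups to a single indexed query, at the cost of missing edits that preserve the timestamp.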
The directory browsing system provides efficient access to image collections through pagination support and cache-aware responses. It leverages If-Modified-Since handling and asynchronous directory scanning to optimize performance when browsing large datasets. The system intelligently manages directory listings to provide fast access while minimizing server load.
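The If-Modified-Since handling described above can be sketched with the standard library alone. This is an illustrative example under assumptions, not the project's code: the function names are hypothetical, and the comparison uses the directory's modification time truncated to whole seconds because HTTP dates carry no sub-second precision.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(mtime: float) -> str:
    """Format a POSIX timestamp as an HTTP date for a Last-Modified header."""
    moment = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def is_not_modified(if_modified_since: Optional[str], mtime: float) -> bool:
    """Return True when the client's cached copy is still current (HTTP 304)."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Malformed header: ignore it and serve a full response.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return modified <= since
```

A request handler would call `is_not_modified` with the incoming header value and the scanned directory's modification time, returning an empty 304 response when it is true and a full listing with `last_modified_header(mtime)` otherwise.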
Image processing capabilities include automatic generation of thumbnails at 300x300 pixels and previews at 1024x1024 pixels, with WebP format optimization for reduced file sizes. The system handles color space management to ensure consistent image quality across different formats and display conditions. Security features protect against path traversal attacks while providing proper error handling and logging, with separate development and production modes for enhanced safety.
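One common way to guard against the path traversal attacks mentioned above is to resolve every requested path and confirm it still lies under the configured root. The sketch below shows that technique; `resolve_safe_path` is a hypothetical name, and the project's own checks in app/utils.py may be implemented differently.

```python
from pathlib import Path


def resolve_safe_path(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    root = root.resolve()
    # Strip leading separators so the request is always treated as relative.
    candidate = (root / user_path.lstrip("/\\")).resolve()
    # After resolution, ".." segments and symlinks have been collapsed,
    # so a simple ancestry check is enough.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes root: {user_path!r}")
    return candidate
```

With a root of `/data/images`, a request for `cats/1.png` resolves to `/data/images/cats/1.png`, while `../secrets.txt` raises `ValueError`.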
The caching and caption management systems work together to provide a robust media handling solution. Captions are supported in multiple formats, including plain-text .caption and .txt files and comma-separated .tags and .wd files, with automatic generation capabilities and priority-based ordering. The editor also supports .e621 files through a custom JSON editor. You can also edit your dataset's sample-prompts.txt with a custom GUI, and edit configuration files in a text editor that supports .toml, .yaml, .json, and even .ini. The SQLite-based metadata cache stores thumbnail references and directory listings, with intelligent cache invalidation to maintain data freshness. Batch operations are supported for efficient processing of multiple files, while the permission validation system ensures proper access control.
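As an illustration of the caption formats, the sketch below collects the caption files that sit next to an image and splits comma-separated tags. It assumes captions are stored as sidecar files sharing the image's base name, and the priority order in `CAPTION_EXTENSIONS` is an assumption chosen for the example; both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Assumed priority order; the real ordering used by yipyap may differ.
CAPTION_EXTENSIONS = [".caption", ".txt", ".tags", ".wd"]


def find_captions(image_path: Path) -> Dict[str, str]:
    """Return existing sidecar captions keyed by extension, in priority order."""
    captions: Dict[str, str] = {}
    for ext in CAPTION_EXTENSIONS:
        sidecar = image_path.with_suffix(ext)
        if sidecar.is_file():
            captions[ext] = sidecar.read_text(encoding="utf-8").strip()
    return captions


def parse_tags(text: str) -> List[str]:
    """Split a comma-separated .tags or .wd caption into clean tag strings."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]
```

Because dictionaries preserve insertion order, iterating over the result of `find_captions` visits captions from highest to lowest assumed priority.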
The testing infrastructure is centralized in the src/test directory and consists of:
- Test Setup (setup.ts)
  - Test environment configuration
  - Global test utilities and helpers
  - Mock data and fixtures
- Test Hooks (test-hooks.ts)
  - Custom test hooks for component testing
  - State management utilities for tests
  - Mock context providers
- Test Utilities (test-utils.ts)
  - Helper functions for testing
  - Common test patterns
  - Type definitions for testing
- Test Configuration (tsconfig.json)
  - TypeScript configuration specific to tests
  - Path mappings and compiler options
All tests should use these shared utilities to maintain consistency and reduce code duplication. The test infrastructure is designed to work seamlessly with the SolidJS testing utilities and supports both unit and integration tests.