yipyap

Introduction

Your Intuitive Platform for Yielding, Annotating, and Processing, or yipyap for short, is a web application built with Python and SolidJS for managing image, audio, and video dataset directories. It supports uploading and browsing datasets with captions, generating and caching thumbnails, running tagging and captioning models, and editing dataset configuration and sample prompts.

The frontend of yipyap is built with SolidJS, a reactive JavaScript framework that emphasizes fine-grained reactivity and performance, using Vite as the build tool for fast development and optimized production builds. The application follows a component-based architecture with a central app context managing global state. The main entry point is /src/main.tsx, which sets up the app context and error boundaries, while the routing configuration is defined in /src/router.ts. The core application state management resides in /src/contexts/app.tsx, which handles theme management, locale/translation management, settings persistence, notification system, and various feature flags and configurations.

Components are organized in feature-based directories under /src/components, with CSS modules or shared stylesheets for styling. Global styles are defined in /src/styles.css, while theme-specific styles are in /src/themes.css. All tests are centralized in the /src/test/__tests__/ directory, organized by functionality including component tests, context tests, utility tests, and internationalization tests.

Features

Core Features

Browse directories with breadcrumbs
View images with thumbnails and captions
Search and sort files easily (TODO)

Image Viewing

Support for multiple caption formats
View and edit image metadata
Keyboard shortcuts
Zoom and pan smoothly (Experimental toggle)
Navigate with minimap (Experimental toggle)

File Management

Drag and drop files to upload, with progress tracking
Upload entire folders at once
Perform batch operations
Quick folder navigation

Captions & Tags

Add captions and tags
Generate captions automatically
Auto-save your changes
Beautiful tag colors that match your theme
Edit multiple files at once (TODO)

Languages

Available in multiple languages
Right-to-left support
Locale-aware formatting

Browse and organize your image collection with an intuitive web interface
Powerful search capabilities with tag filtering and smart suggestions
Batch operations for moving, deleting, and organizing images efficiently
Advanced image editing with cropping, rotation, and format conversion
Smart captioning with multiple AI model support (JTP2, WDv3, Florence-2)
Tagging system with autocomplete and color-coded tags
Bounding box labeling with object detection models including Florence-2
Thumbnail generation and preview optimization
Responsive design that works on desktop and mobile devices

Object Detection and Bounding Box Support

yipyap includes a bounding box editor that supports multiple detection models:

Supported Models

YOLO-based models for traditional object detection
Watermark Detection for identifying watermarks in images
Florence-2 models with conversational AI interface

Florence-2 Integration

Florence-2 is a powerful vision-language model that supports conversational prompting. The bounding box editor includes:

Multiple model variants: Base, Large, Fine-tuned versions, and specialized models like PromptGen, SD3 Captioner, Flux Large, etc.
Conversational interface: Ask questions about images using natural language
Task-based detection: Object detection, dense captions, region proposals, and more
Custom prompts: Override predefined tasks with your own questions
Advanced generation settings: Control token limits, beam search, and sampling
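
As a rough illustration of how the task, custom prompt, and generation settings above might combine into one request, here is a minimal Python sketch. The task tokens are the ones documented on the Florence-2 model cards, and the setting names mirror common Hugging Face generate() keyword arguments. The names TASK_PROMPTS, GenerationSettings, and build_request, as well as the default values, are assumptions for illustration and not yipyap's actual code.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Task tokens as documented on the Florence-2 model cards.
TASK_PROMPTS = {
    "object_detection": "<OD>",
    "dense_captions": "<DENSE_REGION_CAPTION>",
    "region_proposals": "<REGION_PROPOSAL>",
}


@dataclass
class GenerationSettings:
    # Hypothetical defaults; field names mirror Hugging Face generate() kwargs.
    max_new_tokens: int = 1024  # token limit
    num_beams: int = 3          # beam search width
    do_sample: bool = False     # sampling on or off


def build_request(
    task: str,
    custom_prompt: Optional[str] = None,
    settings: Optional[GenerationSettings] = None,
) -> dict:
    """Return the prompt text and generation kwargs for one detection run."""
    # A non-empty custom prompt overrides the predefined task token.
    if custom_prompt and custom_prompt.strip():
        prompt = custom_prompt.strip()
    else:
        prompt = TASK_PROMPTS[task]
    return {
        "prompt": prompt,
        "generate_kwargs": asdict(settings or GenerationSettings()),
    }
```

Keeping the settings in a dataclass means the UI can send a partial override while the remaining values fall back to the defaults.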

Available Florence-2 Models

microsoft/Florence-2-base - Base model (0.23B parameters)
microsoft/Florence-2-large - Large model (0.77B parameters)
microsoft/Florence-2-base-ft - Fine-tuned base model
microsoft/Florence-2-large-ft - Fine-tuned large model
HuggingFaceM4/Florence-2-DocVQA - Document VQA specialized
MiaoshouAI/Florence-2-base-PromptGen-v1.5 - Prompt generation
MiaoshouAI/Florence-2-large-PromptGen-v1.5 - Large prompt generation
thwri/CogFlorence-2.2-Large - Enhanced capabilities
gokaygokay/Florence-2-SD3-Captioner - SD3 style captions
gokaygokay/Florence-2-Flux-Large - Flux style captions
NikshepShetty/Florence-2-pixelpros - Pixel-level understanding

Custom Model Support

You can also use custom Florence-2 models by providing:

Local paths: Point to locally downloaded model files
HuggingFace model IDs: Any compatible Florence-2 model from HuggingFace Hub

The system automatically detects Florence-2 models and provides the appropriate conversational interface.
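
The two ways of specifying a custom model can be told apart with a simple check. The sketch below is only an illustration: classify_model_spec and looks_like_florence2 are hypothetical helpers, the Hub ID pattern is a simplification, and the name-based Florence-2 check is an assumption (the real detection may inspect the model configuration instead).

```python
import re
from pathlib import Path

# Hub IDs have the form "owner/name"; this pattern is a simplification.
HUB_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_model_spec(spec: str) -> str:
    """Return "local" for an existing directory, "hub" for an owner/name ID."""
    if Path(spec).expanduser().is_dir():
        return "local"
    if HUB_ID.match(spec):
        return "hub"
    raise ValueError(f"not a local model directory or Hub ID: {spec!r}")


def looks_like_florence2(spec: str) -> bool:
    """Hypothetical heuristic: match "florence" in the final path component."""
    return "florence" in Path(spec).name.lower()
```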

Installation

Requirement: Python >=3.9

Download the latest release (in the right sidebar on GitHub; download yipyap-vx.y.z.zip, not the source code) and unzip it.
In the decompressed yipyap folder, create a virtual environment and install dependencies:
- On Linux:
```
python -m venv venv
./venv/bin/pip install -r requirements.txt
```
- On Windows:
```
python -m venv venv
.\venv\Scripts\pip install -r requirements.txt
```
Note for Windows Users: If you encounter an error about libmagic not being found, run this additional command:
```
.\venv\Scripts\pip install python-magic-bin
```

Run the server:

On Linux:

ROOT_DIR=/path/to/your/images ./venv/bin/uvicorn app.main:app

On Windows (PowerShell):

$env:ROOT_DIR="C:\path\to\your\images"
.\venv\Scripts\uvicorn app.main:app

The application will be available at http://localhost:8000.

Use --port to change the server port (for example, --port 8080). For other server options, refer to the uvicorn documentation.

Usage

Navigate to http://localhost:8000 to start browsing ROOT_DIR (the current working directory if ROOT_DIR is not set).
Use the controls at the top to:
- Search for files
- Switch between grid and list views
- Sort items by name, date, or size
Click on images to view them in full size and edit captions.
Navigate directories using the breadcrumb trail or directory links.

Development

Requirements: Python and Node.js.

Clone the repository:

git clone https://github.com/rakki194/yipyap
cd yipyap

In the cloned yipyap folder, create a virtual environment and install dependencies:

On Linux:

python -m venv venv
./venv/bin/pip install -r requirements.txt

On Windows:

python -m venv venv
.\venv\Scripts\pip install -r requirements.txt

Run the development servers:

On Linux:

ROOT_DIR=/path/to/your/images ./venv/bin/uvicorn app.main:app

On Windows (PowerShell):

$env:ROOT_DIR="C:\path\to\your\images"
.\venv\Scripts\uvicorn app.main:app

This last step will:

Install npm dependencies if needed
Start the Vite dev server (port 1984), which serves the frontend and proxies API calls to the backend
Start the FastAPI backend (port 1985)
Enable hot reload for both frontend and backend

You can now open your browser to http://localhost:1984

Environment Variables

ENVIRONMENT: Set to "development" or "production" (default: "development")
RELOAD: Enable hot reload, "true" or "false" (default: "true" in development)
ROOT_DIR: Root directory for images (default: current directory)
DEV_PORT: HTTP port for the Vite server, which serves the frontend and proxies the backend API (default: 1984)
BACKEND_PORT: HTTP port for the backend API (default: DEV_PORT+1)
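
To make the defaults concrete, here is a small Python sketch of how these variables could be read. It is not the project's actual loader: the Settings class and load_settings function are hypothetical, and the RELOAD default of "false" outside development is an assumption, since only the development default is stated above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    environment: str
    reload: bool
    root_dir: Path
    dev_port: int
    backend_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    environment = env.get("ENVIRONMENT", "development")
    # Assumption: hot reload is off by default outside development.
    reload_default = "true" if environment == "development" else "false"
    reload = env.get("RELOAD", reload_default).lower() == "true"
    dev_port = int(env.get("DEV_PORT", "1984"))
    # BACKEND_PORT defaults to DEV_PORT + 1.
    backend_port = int(env.get("BACKEND_PORT", str(dev_port + 1)))
    root_dir = Path(env.get("ROOT_DIR", os.getcwd()))
    return Settings(environment, reload, root_dir, dev_port, backend_port)
```

With no variables set, this yields development mode with ports 1984 and 1985; setting only DEV_PORT=3000 moves the backend to 3001.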

Developer Documentation

Project Structure

yipyap/
├── app/                     # Backend application
│   ├── __init__.py          # Package initialization
│   ├── main.py              # FastAPI application and routes
│   ├── image_handler.py     # Image processing and directory scanning
│   ├── caption_handler.py   # Caption file management
│   └── utils.py             # Utility functions
├── src/                     # Frontend application
│   ├── components/          # Additional components
│   ├── composables/         # SolidJS composables (not hooks)
│   ├── contexts/            # SolidJS contexts
│   ├── i18n/                # Internationalization
│   ├── icons/               # Icon components
│   ├── pages/               # Application pages
│   ├── resources/           # Frontend data resources
│   ├── test/                # Test utilities and setup
│   ├── theme/               # Theme-related components
│   ├── utils/               # Utility functions
│   ├── directives.tsx       # SolidJS directives
│   ├── main.tsx             # Application entry point
│   ├── models.ts            # Data models
│   ├── router.ts            # Routing configuration
│   ├── styles.css           # Global styles
│   ├── themes.css           # Theme-specific styles
│   ├── types.d.ts           # TypeScript declarations
│   └── utils.ts             # Shared utilities
├── package.json             # Frontend dependencies and scripts
├── tsconfig.json            # TypeScript configuration
└── vite.config.ts           # Vite configuration

Key Components

Frontend Architecture
- Entry point in src/main.tsx with app context setup
- Global state management in src/contexts/app.tsx
- Component-based architecture with both capitalized and lowercase component directories
- Composables for reusable reactive logic in src/composables/
- Comprehensive i18n support in src/i18n/

Testing Infrastructure
- Centralized test utilities in src/test/
- Test setup and configuration in src/test/setup.ts
- Custom test hooks in src/test/test-hooks.ts
- Test utilities in src/test/test-utils.ts

Styling System
- Global styles in src/styles.css
- Theme-specific styles in src/themes.css
- Theme components in src/theme/
- Icon components in src/icons/

Backend Integration
- FastAPI routes in app/main.py
- Image processing in app/image_handler.py
- Caption management in app/caption_handler.py

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgements

Getting Help

If you encounter any issues or have questions, feel free to open an issue on the GitHub repository.

Backend Architecture

The backend is built with FastAPI and provides a comprehensive API for image management and caption generation. It uses a layered architecture with the following components:

Core Components

FastAPI Application (app/main.py)
- HTTP endpoint definitions
- Request/response handling
- Development/Production mode configuration
- Static file serving
- SPA support

Data Access Layer (app/data_access.py)
- SQLite-based caching system
- File system operations
- Image processing and thumbnail generation
- Caption file management

Caption Generation (app/caption_generation/)
- Modular caption generator system
- Support for multiple ML models
- Async generation with error handling
- Model configuration management

Utility Layer (app/utils.py)
- Path resolution and validation
- Security checks
- Helper functions
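
The SQLite-based caching in the data access layer listed above can be pictured with a short Python sketch. This is a minimal illustration under assumptions, not the schema used in app/data_access.py: the thumbnails table, its columns, and the three helper functions are hypothetical, and invalidation is modeled as a simple modification-time comparison.

```python
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache database and create the (hypothetical) table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnails ("
        "image_path TEXT PRIMARY KEY, mtime REAL NOT NULL, webp BLOB NOT NULL)"
    )
    return conn


def put_thumbnail(conn: sqlite3.Connection, image_path: str, mtime: float, webp: bytes) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO thumbnails VALUES (?, ?, ?)",
            (image_path, mtime, webp),
        )


def get_thumbnail(conn: sqlite3.Connection, image_path: str, mtime: float) -> Optional[bytes]:
    row = conn.execute(
        "SELECT mtime, webp FROM thumbnails WHERE image_path = ?", (image_path,)
    ).fetchone()
    # A changed modification time means the cached entry is stale.
    if row is None or row[0] != mtime:
        return None
    return row[1]
```

A caller would pass the file's current modification time on every lookup, so an edited image misses the cache and gets a fresh thumbnail.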

Key Features

The directory browsing system paginates listings and returns cache-aware responses. It honors If-Modified-Since headers and scans directories asynchronously, so large datasets can be browsed without rescanning or resending listings that have not changed.
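
The If-Modified-Since check can be sketched with the Python standard library alone. The helper names http_date and not_modified are hypothetical, and the sketch assumes that a directory's modification time is what decides freshness; the actual backend may track more than that.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(mtime: float) -> str:
    """Format a POSIX timestamp as an HTTP date for a Last-Modified header."""
    moment = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def not_modified(if_modified_since: Optional[str], mtime: float) -> bool:
    """True when the client's copy is current and a 304 response is enough."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the fractional part.
    return int(mtime) <= since.timestamp()
```

A handler would call not_modified with the request header and the directory's modification time, returning status 304 with no body when it is true and a full listing with a Last-Modified header otherwise.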

Image processing capabilities include automatic generation of thumbnails at 300x300 pixels and previews at 1024x1024 pixels, with WebP format optimization for reduced file sizes. The system handles color space management to ensure consistent image quality across different formats and display conditions. Security features protect against path traversal attacks while providing proper error handling and logging, with separate development and production modes for enhanced safety.
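
Path traversal protection usually comes down to resolving the requested path and checking that it still lies under the root directory. The following Python sketch shows that general technique; resolve_under_root is a hypothetical helper and not necessarily how app/utils.py implements its checks.

```python
from pathlib import Path


def resolve_under_root(root: Path, requested: str) -> Path:
    """Resolve a client-supplied relative path and refuse anything outside root."""
    root = root.resolve()
    # Strip leading separators so the request cannot replace root outright,
    # then let resolve() collapse ".." segments and follow symlinks.
    candidate = (root / requested.lstrip("/\\")).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path escapes the root directory: {requested!r}")
    return candidate
```

Checking the resolved path rather than the raw string matters because a request such as a/../../secret contains no leading ".." yet still climbs out of the root.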

The caching and caption management systems work together. Captions are supported in multiple formats, including plain-text .caption and .txt files and comma-separated .tags and .wd files, with automatic generation and priority-based ordering. The editor also supports .e621 files through a custom JSON editor. You can also edit your dataset's sample-prompts.txt in a custom GUI, and edit .toml, .yaml, .json, or .ini configuration files in a text editor. The SQLite-based metadata cache stores thumbnail references and directory listings and invalidates stale entries to keep data fresh. Batch operations process multiple files at once, and a permission validation system enforces access control.
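
As an illustration of the caption formats, the Python sketch below collects the caption files that sit next to an image and parses the comma-separated ones into tag lists. Several details are assumptions: the priority order in CAPTION_SUFFIXES, the convention that a caption file shares the image's base name, and the function names themselves.

```python
from pathlib import Path

# Hypothetical priority order; the text above says only that ordering is priority-based.
CAPTION_SUFFIXES = [".caption", ".txt", ".tags", ".wd"]
TAG_SUFFIXES = {".tags", ".wd"}  # comma-separated formats


def parse_tags(text: str) -> list[str]:
    """Split comma-separated tags, trimming whitespace and dropping empties."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def load_captions(image_path: Path) -> list[tuple[str, object]]:
    """Return (suffix, content) pairs for each caption file found, in priority order."""
    captions = []
    for suffix in CAPTION_SUFFIXES:
        sidecar = image_path.with_suffix(suffix)  # assumed naming: same base name
        if not sidecar.is_file():
            continue
        text = sidecar.read_text(encoding="utf-8")
        content = parse_tags(text) if suffix in TAG_SUFFIXES else text.strip()
        captions.append((suffix, content))
    return captions
```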

Test Organization

The testing infrastructure is centralized in the src/test directory and consists of:

Test Setup (setup.ts)
- Test environment configuration
- Global test utilities and helpers
- Mock data and fixtures

Test Hooks (test-hooks.ts)
- Custom test hooks for component testing
- State management utilities for tests
- Mock context providers

Test Utilities (test-utils.ts)
- Helper functions for testing
- Common test patterns
- Type definitions for testing

Test Configuration (tsconfig.json)
- TypeScript configuration specific to tests
- Path mappings and compiler options

All tests should use these shared utilities to maintain consistency and reduce code duplication. The test infrastructure is designed to work seamlessly with the SolidJS testing utilities and supports both unit and integration tests.
