lmmtoolkit is a toolkit for Multi-Modal Learning
Updated Nov 21, 2023 - Python
A PyTorch implementation of "TextFuseNet: Scene Text Detection with Richer Fused Features".
To Fuse Semantic and Positional Clues with Cross-Attention for Scene Text Recognition
Multi-Modal Image Generation for News Stories
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
A lightweight neural network for controlling Stable Diffusion spatial information, tuned on Chinese
A fine-tuned version of the Stable Diffusion model, trained on a self-translated 10k DiffusionDB Chinese corpus and then "extended"
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions