
Image2text Prompt Generator

README Translation

Introduction

Prompt generator

Parses prompt descriptions from images; the descriptions can then be extended for a second round of image generation. Chinese input is supported via ChatGLM, which expands the prompt description.

✅ Models used in this project

All models are lazy-loaded: each one is downloaded and loaded only when it is used, so idle models do not occupy video memory.

🚩 This project runs standalone and is not integrated into automatic1111/webui, so it can be closed at any time to free video memory.

  • Online demo: hugging face demo
  • The image-to-text features require GPU deployment
  • Some models run on the CPU (translation, text-to-text) to prevent GPU memory overflow
  • Supports both stable diffusion and midjourney prompt generation styles
  • Uses ChatGLM-6B-int4 to save video memory

One-click package

Baidu cloud disk download

The ChatGLM model must be downloaded separately (download the int4 version) and placed under the program's models directory.

Starting the program

  • webui.bat: main features
  • webui_chat.bat: main features + ChatGLM chat interface
  • webui_imagetools.bat: image processing tools
  • webui_offline.bat: offline mode
    • Edit the model paths in settings.offline.toml
    • git clone the models into the models directory (they cannot be copied directly from the cache)
  • webui_venv.bat: if you installed the venv environment manually, start with this; it uses the default venv directory.
  • The first run automatically downloads the models; by default they go to .cache/huggingface in the user directory.


Updating the program

  cd image2text_prompt_generator
  git pull

Or download the zip archive from github and unpack it over the program directory.

Configuration and usage

Prompt optimization models

  • microsoft: generates a simple description (stable diffusion)
  • mj: generates a random description (midjourney)
  • gpt2 650k and gpt_neo_125M: generate more complex descriptions

img.png

Text-to-text

  • Chinese is translated to English
  • Chinese is expanded into a complex description via ChatGLM-6B-int4
  • The result is translated to English
  • The prompt is refined through the prompt optimization models

img.png

Image-to-text

  • clip for multi-person, complex scenes; high video memory usage (>8 GB)
  • blip for simple characters and scenes
  • wd14 for figures
  • Prompt generation automatically merges the blip or clip caption with the wd14 tags

img.png
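The merge in the last step can be sketched as follows; `merge_prompt` is a hypothetical helper for illustration, not the project's actual code. It appends wd14 tags to a blip/clip caption while skipping tags the caption already contains:

```python
def merge_prompt(caption: str, wd14_tags: list[str]) -> str:
    """Merge a blip/clip caption with wd14 tags, dropping duplicated tags."""
    caption_lower = caption.lower()
    # Keep only tags that do not already appear in the caption text.
    extra = [t for t in wd14_tags if t.lower() not in caption_lower]
    return ", ".join([caption.rstrip(". ")] + extra)
```

For example, `merge_prompt("a girl standing in the rain", ["1girl", "rain", "umbrella"])` keeps `1girl` and `umbrella` but drops the redundant `rain` tag.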

Image processing tools

  • Batch background removal
  • Face pasting (for refining clothes)
  • Head cropping
  • Batch rename (regex)
  • Tagging (clip + wd14 tagging and translation)

img.pngimg.png

ChatGLM generation

Hardware requirements

| Quantization level | Minimum GPU memory (inference) | Minimum GPU memory (efficient parameter fine-tuning) |
| --- | --- | --- |
| FP16 (no quantization) | 13 GB | 14 GB |
| INT8 | 8 GB | 9 GB |
| INT4 | 6 GB | 7 GB |

img.png

Browser plug-in

Based on the chatGPTBox project, with some of the prompt wording modified.

  • Start it with api.bat

  • Configure the chatGPTBox plugin's custom model URL to http://localhost:8000/chat/completions

  • Download the plugin from the release page

  • Modified plugin

Load the plug-in in the browser

img.png
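The custom-model endpoint above follows the OpenAI chat/completions request shape. A minimal client sketch (the payload field names are assumed to match the OpenAI schema, and the model name is an example; the network call is left commented out):

```python
def build_chat_request(prompt: str, model: str = "chatglm-6b-int4") -> dict:
    """Build an OpenAI-style chat/completions payload for the local API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


# Example (assumes api.bat is running locally):
# import requests
# resp = requests.post("http://localhost:8000/chat/completions",
#                      json=build_chat_request("a castle in the clouds"))
```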

Limitations

  • Without cuda support, clip is not recommended
  • With video memory below 6 GB, ChatGLM is not recommended
Configuration files

settings.toml

[server]
port = 7869 # port
host = '127.0.0.1' # change to "0.0.0.0" for LAN access
enable_queue = true # required for the chat feature; if it errors, turn off your proxy
queue_size = 10
show_api = false
debug = true

[chatglm]
model = "THUDM/chatglm-6b-int4" # THUDM/chatglm-6b-int4 THUDM/chatglm-6b-int8 THUDM/chatglm-6b

# local model
# model = "./models/chatglm-6b-int8" 

device = "cuda" # cpu mps cuda
enable_chat = false # whether to enable the chat feature
local_files_only = false # whether to use only local models

Offline models

See ChatGLM's instructions for loading the model locally. git clone the models into the models directory (do not copy them directly from the cache), then edit the model paths in settings-offline.toml.

  • On Windows it is best to use absolute paths, and they must not contain Chinese characters
  • On linux/mac, relative paths can be used
  • Model directory structure reference

img.png

settings-offline.toml

[generator]
enable = true # whether to enable the generator feature
device = "cuda" # cpu mps cuda
fix_sd_prompt = true # whether to fix sd prompts
# models
microsoft_model = "./Promptist"
gpt2_650k_model = "./gpt2-650k-stable-diffusion-prompt-generator"
gpt_neo_125m_model = "./StableDiffusion-Prompt-Generator-GPT-Neo-125M"
mj_model = "./text2image-prompt-generator"
local_files_only = true # whether to use only local models


[translate]
enable = true # whether to enable translation
device = "cuda" # cpu mps cuda
local_files_only = true # whether to use only local models
zh2en_model = "./models/opus-mt-zh-en"
en2zh_model = "./models/opus-mt-en-zh"

cache_dir = "./data/translate_cache" # translation cache directory

[chatglm]
# local model: https://github.com/THUDM/ChatGLM-6B#从本地加载模型
model = ".\\models\\chatglm-6b-int4" # ./chatglm-6b-int4 ./chatglm-6b-int8 ./chatglm-6b
## how to configure an absolute windows path
# model = "E:\\zhangsan\\models\\chatglm-6b-int4" 
device = "cuda" # cpu mps cuda
enable_chat = true # whether to enable the chat feature
local_files_only = true # whether to use only local models

Hugging Face cache configuration

To keep the C drive from filling up, the cache directory can be moved to another disk.

img.png
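One common way to relocate the cache is the HF_HOME environment variable recognized by the huggingface libraries (the path below is just an example; the screenshot may show a different method):

```shell
# Point the Hugging Face cache at another disk (example path)
export HF_HOME=/data/huggingface

# On Windows (cmd), the equivalent would be:
# setx HF_HOME "D:\huggingface"
```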

Manual installation

First, make sure your computer has Python 3.10. If it is not installed, go to the official site (https://www.python.org/downloads/) to download and install it. Next, download and unzip the tool's installation package. Open a command-line window (Windows users can press Win + R, type "cmd" in the run box, and press Enter) and change into the directory where the installation package was unzipped. Then enter the following commands to install the required dependencies:

git clone https://github.com/zhongpei/image2text_prompt_generator
cd image2text_prompt_generator

# create a virtual environment
python -m venv venv
# activate the environment (linux & mac)
source ./venv/bin/activate
# activate the environment (windows)
.\venv\Scripts\activate


# gpu acceleration
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

pip install --upgrade -r requirements.txt
  

This will automatically install the required Python dependencies. Once installed, you can start the tool by running:

# activate the environment (linux & mac)
source ./venv/bin/activate
# activate the environment (windows)
.\venv\Scripts\activate

# run the program
python app.py
    

This will launch the tool and open its home page in your browser. If the browser does not open automatically, enter http://localhost:7869/ manually. The tool is now installed and running; follow the documentation to start processing your image data.

Update information

  • v2.0 LangChain (local file question and answer)
  • v1.8 labeling tool
  • v1.7 translate local tag cache, translation cache, API
  • v1.6 picture tools
  • v1.5 add chatGLM model
  • v1.0 add webui

Plan

  • web
  • configuration file
  • image2text
    • clip
    • blip
    • wd14
  • text2text
    • ChatGLM
    • gpt2 650k
    • gpt_neo_125M
    • mj
  • cutout tools
    • background removal
    • head cropping
    • face masking
    • batch file renaming
    • load directory tags and translate
  • translate
    • f2m, f2f
    • WD14 tags translation local cache
    • translation cache
  • Labeling
    • clip + wd14 mixed batch image tagging
  • LangChain
    • index
    • question and answer