
Image2text Prompt Generator

README Translation

Introduction

Prompt generator

Parses prompt descriptions from images; the descriptions can then be extended for a second round of image generation. Chinese input is supported via ChatGLM, which expands the prompt description.

✅ Models used in this project

All models are lazy-loaded: each one is downloaded and loaded only when it is used, so idle models do not occupy video memory.

🚩 This project runs standalone and is not integrated into automatic1111/webui, so it can be closed at any time to free video memory.

  • Online demo: hugging face demo
  • The image-to-text features require GPU deployment
  • Some models run on the CPU (translation, text-to-text) to prevent GPU memory overflow
  • Supports both stable diffusion and midjourney prompt generation styles
  • Uses ChatGLM-6B-int4 to save video memory

One-click package

Baidu cloud disk download

The ChatGLM model must be downloaded separately (download the int4 version) and placed under the program's models directory.

Starting the program

  • webui.bat: main features
  • webui_chat.bat: main features + ChatGLM chat interface
  • webui_imagetools.bat: image processing tools
  • webui_offline.bat: offline mode
    • Edit the model paths in settings.offline.toml
    • git clone the models into the models directory (they cannot be copied directly from the cache)
  • webui_venv.bat: if you installed the venv environment manually, start with this; it uses the default venv directory.
  • The first run automatically downloads the models; by default they go to .cache/huggingface in the user directory.


Updating the program

  cd image2text_prompt_generator
  git pull

Or download the zip archive from github and unpack it over the program directory.

Configuration and usage

Prompt optimization models

  • microsoft: generates a simple description (stable diffusion)
  • mj: generates a random description (midjourney)
  • gpt2 650k and gpt_neo_125M: generate more complex descriptions

img.png

Text-to-text

  • Chinese is translated to English
  • Chinese is expanded into a complex description via ChatGLM-6B-int4
  • The result is translated to English
  • The prompt is refined through the prompt optimization models

img.png

Image-to-text

  • clip for multi-person, complex scenes; high video memory usage (>8 GB)
  • blip for simple characters and scenes
  • wd14 for figures
  • Prompt generation automatically merges the blip or clip caption with the wd14 tags

img.png
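The merge in the last step can be sketched as follows; `merge_prompt` is a hypothetical helper for illustration, not the project's actual code. It appends wd14 tags to a blip/clip caption while skipping tags the caption already contains:

```python
def merge_prompt(caption: str, wd14_tags: list[str]) -> str:
    """Merge a blip/clip caption with wd14 tags, dropping duplicated tags."""
    caption_lower = caption.lower()
    # Keep only tags that do not already appear in the caption text.
    extra = [t for t in wd14_tags if t.lower() not in caption_lower]
    return ", ".join([caption.rstrip(". ")] + extra)
```

For example, `merge_prompt("a girl standing in the rain", ["1girl", "rain", "umbrella"])` keeps `1girl` and `umbrella` but drops the redundant `rain` tag.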

Image processing tools

  • Batch background removal
  • Face pasting (for refining clothes)
  • Head cropping
  • Batch rename (regex)
  • Tagging (clip + wd14 tagging and translation)

img.pngimg.png

ChatGLM generation

Hardware requirements

| Quantization level | Minimum GPU memory (inference) | Minimum GPU memory (efficient parameter fine-tuning) |
| --- | --- | --- |
| FP16 (no quantization) | 13 GB | 14 GB |
| INT8 | 8 GB | 9 GB |
| INT4 | 6 GB | 7 GB |

img.png

Browser plug-in

Based on the chatGPTBox project, with some of the prompt wording modified.

  • Start it with api.bat

  • Configure the chatGPTBox plugin's custom model URL to http://localhost:8000/chat/completions

  • Download the plugin from the release page

  • Modified plugin

Load the plug-in in the browser

img.png
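The custom-model endpoint above follows the OpenAI chat/completions request shape. A minimal client sketch (the payload field names are assumed to match the OpenAI schema, and the model name is an example; the network call is left commented out):

```python
def build_chat_request(prompt: str, model: str = "chatglm-6b-int4") -> dict:
    """Build an OpenAI-style chat/completions payload for the local API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


# Example (assumes api.bat is running locally):
# import requests
# resp = requests.post("http://localhost:8000/chat/completions",
#                      json=build_chat_request("a castle in the clouds"))
```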

Limitations

  • Without cuda support, clip is not recommended
  • With video memory below 6 GB, ChatGLM is not recommended
Configuration files

settings.toml

[server]
port = 7869 # port
host = '127.0.0.1' # change to "0.0.0.0" for LAN access
enable_queue = true # required for the chat feature; if it errors, turn off your proxy
queue_size = 10
show_api = false
debug = true

[chatglm]
model = "THUDM/chatglm-6b-int4" # THUDM/chatglm-6b-int4 THUDM/chatglm-6b-int8 THUDM/chatglm-6b

# local model
# model = "./models/chatglm-6b-int8" 

device = "cuda" # cpu mps cuda
enable_chat = false # whether to enable the chat feature
local_files_only = false # whether to use only local models

Offline models

See ChatGLM's instructions for loading the model locally. git clone the models into the models directory (do not copy them directly from the cache), then edit the model paths in settings-offline.toml.

  • On Windows it is best to use absolute paths, and they must not contain Chinese characters
  • On linux/mac, relative paths can be used
  • Model directory structure reference

img.png

settings-offline.toml

[generator]
enable = true # whether to enable the generator feature
device = "cuda" # cpu mps cuda
fix_sd_prompt = true # whether to fix sd prompts
# models
microsoft_model = "./Promptist"
gpt2_650k_model = "./gpt2-650k-stable-diffusion-prompt-generator"
gpt_neo_125m_model = "./StableDiffusion-Prompt-Generator-GPT-Neo-125M"
mj_model = "./text2image-prompt-generator"
local_files_only = true # whether to use only local models


[translate]
enable = true # whether to enable translation
device = "cuda" # cpu mps cuda
local_files_only = true # whether to use only local models
zh2en_model = "./models/opus-mt-zh-en"
en2zh_model = "./models/opus-mt-en-zh"

cache_dir = "./data/translate_cache" # translation cache directory

[chatglm]
# local model: https://github.com/THUDM/ChatGLM-6B#从本地加载模型
model = ".\\models\\chatglm-6b-int4" # ./chatglm-6b-int4 ./chatglm-6b-int8 ./chatglm-6b
## how to configure an absolute windows path
# model = "E:\\zhangsan\\models\\chatglm-6b-int4" 
device = "cuda" # cpu mps cuda
enable_chat = true # whether to enable the chat feature
local_files_only = true # whether to use only local models

Hugging Face cache configuration

To keep the C drive from filling up, the cache directory can be moved to another disk.

img.png
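One common way to relocate the cache is the HF_HOME environment variable recognized by the huggingface libraries (the path below is just an example; the screenshot may show a different method):

```shell
# Point the Hugging Face cache at another disk (example path)
export HF_HOME=/data/huggingface

# On Windows (cmd), the equivalent would be:
# setx HF_HOME "D:\huggingface"
```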

Manual installation

First, make sure your computer has Python 3.10. If it is not installed, go to the official site (https://www.python.org/downloads/) to download and install it. Next, download and unzip the tool's installation package. Open a command-line window (Windows users can press Win + R, type "cmd" in the run box, and press Enter) and change into the directory where the installation package was unzipped. Then enter the following commands to install the required dependencies:

git clone https://github.com/zhongpei/image2text_prompt_generator
cd image2text_prompt_generator

# create a virtual environment
python -m venv venv
# activate the environment (linux & mac)
source ./venv/bin/activate
# activate the environment (windows)
.\venv\Scripts\activate


# gpu acceleration
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

pip install --upgrade -r requirements.txt
  

This will automatically install the required Python dependencies. Once installed, you can start the tool by running:

# activate the environment (linux & mac)
source ./venv/bin/activate
# activate the environment (windows)
.\venv\Scripts\activate

# run the program
python app.py
    

This will launch the tool and open its home page in your browser. If the browser does not open automatically, enter http://localhost:7869/ manually. The tool is now installed and running; follow the documentation to start processing your image data.

Update information

  • v2.0 LangChain (local file question and answer)
  • v1.8 labeling tool
  • v1.7 translate local tag cache, translation cache, API
  • v1.6 picture tools
  • v1.5 add chatGLM model
  • v1.0 add webui

Plan

  • web
  • configuration file
  • image2text
    • clip
    • blip
    • wd14
  • text2text
    • ChatGLM
    • gpt2 650k
    • gpt_neo_125M
    • mj
  • cutout tools
    • background removal
    • head cropping
    • face masking
    • batch file renaming
    • load directory tags and translate
  • translate
    • f2m, f2f
    • WD14 tags translation local cache
    • translation cache
  • Labeling
    • clip + wd14 mixed batch image tagging
  • LangChain
    • index
    • question and answer