This repository provides slim custom nodes to use FaceCLIP inside ComfyUI:
- FaceCLIP Encode (Image+Text): Produces FaceCLIP joint embeddings from an aligned face image + text prompt.
- FaceCLIP SDXL Generate: Uses the FaceCLIP encoder + fine-tuned SDXL UNet to synthesize identity-preserving images from a face image and prompt.
Underlying research: bytedance/FaceCLIP (Apache 2.0). This slim package vendors only configs and node logic; it downloads required weights on demand.
comfyui_faceclip/
__init__.py
faceclip_node.py
generation_node.py
configs/
face_clip_l_14_config.yaml
face_clip_g_14_config.yaml
asset/
0001_female.png # example face (optional)
tests/
test_encode.py
test_generate.py
requirements.txt
LICENSE
README.md
From your ComfyUI root:
cd custom_nodes
# Clone your fork or this slim repo
git clone https://github.com/<your_user>/FaceCLIP-ComfyUI.git FaceCLIP-ComfyUI
cd FaceCLIP-ComfyUI
pip install -r requirements.txt
The first run downloads large checkpoints (FaceCLIP encoder ~3GB, open_clip weights ~1.7GB + ~10GB, UNet weights). Ensure sufficient disk space and network bandwidth.
This slim node package expects the original FaceCLIP code (the core/ module providing core.face_clip.face_clip) to be importable. You have two options:
- Clone original repo alongside slim nodes:
cd custom_nodes
git clone https://github.com/bytedance/FaceCLIP.git FaceCLIP
git clone https://github.com/<your_user>/FaceCLIP-ComfyUI.git FaceCLIP_ComfyUI # rename to avoid dash
- Vendor the required core/ subtree into this slim package (copy the core/face_clip directory and any dependencies).
If you only clone the slim package, imports like from core.face_clip.face_clip import FaceCLIP_L_G_Wrapper will fail.
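With the sibling-clone layout, the slim package can extend sys.path at import time so core.face_clip resolves. Below is a minimal sketch of what such a guard in __init__.py could look like, assuming the original repo was cloned next to this package as FaceCLIP (the actual node code may handle this differently):

```python
# Illustrative sketch only: make a sibling clone of bytedance/FaceCLIP importable
# before the slim nodes attempt `from core.face_clip.face_clip import ...`.
import os
import sys

_THIS_DIR = os.path.dirname(os.path.abspath(__file__))                  # .../custom_nodes/FaceCLIP_ComfyUI
_FACECLIP_REPO = os.path.join(os.path.dirname(_THIS_DIR), "FaceCLIP")   # assumed sibling clone

if os.path.isdir(os.path.join(_FACECLIP_REPO, "core")) and _FACECLIP_REPO not in sys.path:
    sys.path.insert(0, _FACECLIP_REPO)

try:
    from core.face_clip.face_clip import FaceCLIP_L_G_Wrapper  # noqa: F401
except ImportError as err:
    raise ImportError(
        "FaceCLIP core module not found: clone bytedance/FaceCLIP next to this "
        "package or vendor core/face_clip into it."
    ) from err
```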
Inputs:
- image (IMAGE): (B,H,W,3) aligned/cropped face batch (see the loading sketch below).
- text (STRING): Prompt describing subject & scene.
- device: Must be cuda (CPU unsupported currently).
Outputs:
- faceclip_embeddings: Token embeddings combining L + bigG.
- faceclip_pooled: Pooled embedding.
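For reference, the IMAGE input follows the standard ComfyUI convention: a float32 tensor of shape (B,H,W,3) with values in [0,1]. A minimal loading sketch (the file name is the bundled example asset; the helper name is illustrative):

```python
import numpy as np
import torch
from PIL import Image

def load_face_batch(path: str, size: int = 512) -> torch.Tensor:
    """Load an already-cropped face photo as a (1, H, W, 3) float32 tensor in [0, 1]."""
    img = Image.open(path).convert("RGB").resize((size, size), Image.LANCZOS)
    arr = np.asarray(img, dtype=np.float32) / 255.0   # (H, W, 3), values 0..1
    return torch.from_numpy(arr).unsqueeze(0)         # add batch dim -> (1, H, W, 3)

face = load_face_batch("asset/0001_female.png")       # bundled example face
print(face.shape, face.dtype)                          # torch.Size([1, 512, 512, 3]) torch.float32
```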
Inputs:
- face_image: (B,H,W,3) aligned face.
- prompt: Text prompt.
- negative_prompt: Extra negatives appended to the built-in quality suppression list.
- num_images: Images per prompt.
- width, height: Output resolution (multiples of 8).
- seed: Random seed.
- steps: Diffusion steps (default 30).
- guidance: CFG scale (default 7.0).
- device: cuda only.
Output:
- images: Generated batch (N,H,W,3), float in [0,1].
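The generation node does not yet save files itself (see the roadmap below), so here is a hedged sketch for writing the returned batch to disk; the save_batch helper is illustrative, not part of the node API:

```python
import numpy as np
from PIL import Image

def save_batch(images, prefix: str = "faceclip_out") -> None:
    """Write an (N, H, W, 3) float batch in [0, 1] to PNG files (illustrative helper)."""
    arr = images.detach().cpu().numpy() if hasattr(images, "detach") else np.asarray(images)
    arr = (np.clip(arr, 0.0, 1.0) * 255.0).round().astype(np.uint8)
    for i, frame in enumerate(arr):
        Image.fromarray(frame).save(f"{prefix}_{i:03d}.png")
```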
Run inside the slim repo root (requires CUDA GPU):
python tests/test_encode.py
python tests/test_generate.py
If CUDA is missing, the tests skip gracefully.
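The skip behavior amounts to a guard at the top of each test, roughly like this illustrative sketch (not the exact test code):

```python
import sys
import torch

# Skip gracefully on machines without a CUDA device.
if not torch.cuda.is_available():
    print("CUDA not available; skipping FaceCLIP test.")
    sys.exit(0)

# ... the actual encode/generate test runs below on CUDA-capable machines ...
```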
- Load a face image (use crop/alignment externally; a simple crop sketch follows this list).
- Add FaceCLIP Encode (Image+Text) or, directly, FaceCLIP SDXL Generate.
- For generation, feed your face image and prompt, then adjust steps/guidance.
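Proper alignment normally uses a face detector or landmark model; as a minimal external stopgap, here is a hedged sketch that square-crops and resizes a portrait with PIL before it is loaded into ComfyUI (file names are placeholders):

```python
from PIL import Image

def square_crop(src: str, dst: str, size: int = 512) -> None:
    """Center-crop to a square and resize; a real pipeline would align on facial landmarks."""
    img = Image.open(src).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img.crop((left, top, left + side, top + side)).resize((size, size), Image.LANCZOS).save(dst)

square_crop("portrait.jpg", "face_cropped.png")   # placeholder file names
```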
- GPU with bfloat16 support (Ampere+ recommended) for SDXL node.
- VRAM: ≥16GB recommended for comfortable SDXL generation with FaceCLIP bigG branch.
- Disk: ≥20GB for checkpoints & open_clip weights.
- CUDA only (mixed precision & bfloat16 paths assumed).
- Multi-face batching not tuned; best with single face per prompt.
- No CPU fallback; implementing would require dtype & performance adjustments.
- Requires the original FaceCLIP repo or vendored core/ code to be present for Python imports.
Original research & base code: https://github.com/bytedance/FaceCLIP
License: Apache 2.0 (see LICENSE). The model weights are governed by their original licenses; ensure compliance.
- Add CPU or FP32 fallback.
- Integrate FLUX / FaceT5 variant.
- Provide face alignment preprocessing node.
- Add image saving option directly in generation node.
PRs welcome for these improvements.