Image captioning utility scripts for preparing image-caption datasets. Uses the LM Studio API with any vision model.
- Python 3.8+
- Install LM Studio
- Download any vision model from LM Studio, e.g. Gemma, Qwen-VL, etc.
- In `caption.py`, under `payload:model`, change `your-vision-model` to the model name you want to use.
- Go to the LM Studio developer section and enable the API.

- The default API URL is `http://localhost:1234`; the script connects to this URL.
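For reference, the request body `caption.py` sends is an OpenAI-compatible chat-completions payload, and the `model` field is the value you change. This is a minimal sketch of the payload shape, not the script's actual internals; the message structure follows the standard vision-message convention and is an assumption on my part:

```python
# Sketch of the chat-completions payload posted to LM Studio.
# Only "model" must be edited to match a model you have loaded;
# the base64 data URL is a placeholder here.
payload = {
    "model": "your-vision-model",  # change this to your model's name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write caption for this image under 100 words."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64,..."}},
            ],
        }
    ],
}
```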
- Put your image dataset in the `images` folder.
- Run the enumeration script to automatically rename the images to `000.jpg`, `001.jpg`, `002.jpg`, etc.
```
python enumerate.py
```
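If you want to see roughly what the enumeration step does, here is a minimal sketch. The zero-padding width, sort order, and two-pass rename are my assumptions; the actual `enumerate.py` may differ:

```python
import os

def enumerate_images(folder="images"):
    """Rename every file in `folder` to 000.<ext>, 001.<ext>, ... in sorted order."""
    names = sorted(os.listdir(folder))
    # Two passes so an existing file named e.g. 000.jpg can't be
    # overwritten while earlier files are still being renamed.
    pending = []
    for i, name in enumerate(names):
        ext = os.path.splitext(name)[1].lower()
        tmp_name = f"__tmp_{i:03d}{ext}"
        os.rename(os.path.join(folder, name), os.path.join(folder, tmp_name))
        pending.append((tmp_name, f"{i:03d}{ext}"))
    for tmp_name, final in pending:
        os.rename(os.path.join(folder, tmp_name), os.path.join(folder, final))
```

Call `enumerate_images()` on the folder holding your dataset; file extensions are preserved, only the base names change.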
- Run the `caption.py` script to auto-caption your images.
```
python caption.py
```
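Under the hood, captioning one image amounts to base64-encoding it and POSTing it to LM Studio's OpenAI-compatible endpoint. A rough sketch of that request follows; the function names and response handling are my assumptions, not the actual `caption.py`:

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:1234/v1/chat/completions"
PROMPT = ("Write caption for this image under 100 words. Focus on subject "
          "and environment. Avoid speculation. Don't be poetic be precise.")

def encode_image(path):
    """Return the image file as a base64 string for the data-URL payload."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def caption_image(path, model="your-vision-model"):
    """POST one image to LM Studio and return the generated caption text."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encode_image(path)}"}},
            ],
        }],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Looping `caption_image` over the enumerated files and appending each result to one text file gives the single-file output described below.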
- All captions will be saved in a single file, `result.txt`. A single file makes it easier to edit multiple captions at once; I use VS Code for this.
- You can write your own input prompt in `caption.py`.
- Default input prompt: `Write caption for this image under 100 words. Focus on subject and environment. Avoid speculation. Don't be poetic be precise.`
- After you are done editing the captions, run the `chop.py` script. This will split the captions into individual files.
```
python chop.py
```
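A minimal sketch of what the chop step might look like. It assumes (and this is an assumption, check your actual `result.txt`) that captions are separated by blank lines and appear in the same order as the enumerated images:

```python
import os

def chop(result_file="result.txt", out_dir="captions"):
    """Split result.txt into 000.txt, 001.txt, ... inside `out_dir`.

    Assumes one caption per paragraph (captions separated by blank lines).
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(result_file, encoding="utf-8") as f:
        captions = [c.strip() for c in f.read().split("\n\n") if c.strip()]
    for i, caption in enumerate(captions):
        out_path = os.path.join(out_dir, f"{i:03d}.txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(caption)
```

Because the caption files reuse the zero-padded numbering, each `NNN.txt` pairs with the matching `NNN.jpg` in the final dataset.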
- All captions will be saved in the `captions/` folder.
- Combine the `captions` and `images` folders to create the final dataset and you are good to go.
If this project saved you time and effort, consider supporting me on Ko-Fi.