# JavaVisionMind

A modular Spring Boot toolkit that bundles computer-vision and multimodal utilities such as detection, retrieval, and LLM integration.
- Overview
- Repository Layout
- Environment Setup
- Quick Start
- API Reference
- Resources
- Endpoint Flow Reference
- Roadmap
 
## Overview

JavaVisionMind is a collection of independent Spring Boot services that cover object detection, pose estimation, face recognition, person re-identification, text-based image retrieval, and large-language-model interactions. Each capability ships as a separate module so you can deploy only what you need.

## Repository Layout

| Module | Description |
|---|---|
| vision-mind-yolo-core | Core inference utilities for YOLOv11, FAST-SAM, pose estimation, and segmentation models. |
| vision-mind-yolo-app | REST facade that exposes the image-analysis capabilities from vision-mind-yolo-core. |
| vision-mind-ocr-core | PaddleOCR detector/recognizer/classifier pipeline reused by the OCR service. |
| vision-mind-ocr-app | REST wrapper that surfaces OCR results as JSON or annotated images. |
| vision-mind-ffe-app | Face feature extraction service including detection, alignment, similarity search, and index maintenance. |
| vision-mind-reid-app | Person re-identification workflows backed by Lucene for vector retrieval. |
| vision-mind-tbir-app | Text-Based Image Retrieval service built on CLIP embeddings plus Lucene vector search. |
| vision-mind-llm-core | Wrapper around OpenAI/Ollama-style chat endpoints that powers multimodal prompts. |
| vision-mind-common | Shared DTOs, math helpers, and image/vector utilities. |
| vision-mind-test-sth | Scratchpad used for integration experiments and manual verification. |

## Environment Setup

- Install JDK 17 and Maven 3.8+.
- Download the required model bundles and the OpenCV native runtime, then define the `VISION_MIND_PATH` environment variable so every module can locate weights and `.dll`/`.so` files. The model files are hosted on Alibaba Cloud Drive at https://www.alipan.com/s/ChvZFAKXUDp (extraction code: 7i5y).

  ```
  # Windows PowerShell
  setx VISION_MIND_PATH "F:\\TestSth\\JavaVisionMind\\resource"

  # Linux / macOS shell
  export VISION_MIND_PATH=/opt/JavaVisionMind/resource
  ```

  Expected structure:

  ```
  ${VISION_MIND_PATH}
  `-- lib
      `-- opencv
          |-- opencv_java490.dll   # Windows
          `-- libopencv_java490.so # Linux
  ```

- Verify the JVM can load `opencv_java490` for your OS (the services automatically pick the `.dll` or `.so`).
- Download `resource.7z` from the project release page and extract it to the repository root so that model files sit alongside the modules (for example `resource/yolo/model/yolo.onnx`).
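
The `.dll`/`.so` auto-pick mentioned above can be illustrated with a small sketch; the class and method names here are illustrative, not the project's actual loader code:

```java
import java.nio.file.Path;

// Illustrative sketch of how a loader might resolve the OpenCV 4.9.0 native
// library under VISION_MIND_PATH; names are assumptions, not project code.
public class OpenCvLoaderSketch {

    /** Picks the platform-specific file name for OpenCV 4.9.0. */
    static String pickLibraryName(String osName) {
        return osName.toLowerCase().contains("win")
                ? "opencv_java490.dll"
                : "libopencv_java490.so";
    }

    /** Builds the absolute path a service would hand to System.load(...). */
    static Path resolve(String visionMindPath, String osName) {
        return Path.of(visionMindPath, "lib", "opencv", pickLibraryName(osName));
    }

    public static void main(String[] args) {
        System.out.println(resolve("/opt/JavaVisionMind/resource",
                System.getProperty("os.name")));
    }
}
```
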

## Quick Start

Build all modules first:

```
mvn clean install -DskipTests
```

Then start the services you need:

- YOLO API: `mvn -pl vision-mind-yolo-app spring-boot:run`
- OCR service: `mvn -pl vision-mind-ocr-app spring-boot:run`
- Face feature service: `mvn -pl vision-mind-ffe-app spring-boot:run`
- Person re-identification: `mvn -pl vision-mind-reid-app spring-boot:run`
- Text-based image retrieval: `mvn -pl vision-mind-tbir-app spring-boot:run`
- LLM chat facade: `mvn -pl vision-mind-llm-core spring-boot:run`
Each service uses `/api` as the context root. Default ports can be overridden in the respective `application.properties`.

`vision-mind-ffe-app`, `vision-mind-reid-app`, and `vision-mind-tbir-app` expose a `vector.store.mode` switch:

- Set it to `lucene` (default) to persist vectors on disk, `memory` to use the embedded chroma store, or `elasticsearch` to back vectors with an external ES cluster.
- The Elasticsearch mode stores full-dimension embeddings; only the Lucene backend applies the ReID projection matrix.
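
For example, switching a module's backend is a one-line change in its properties file (the key and values come from the switch described above):

```properties
# application.properties (vision-mind-ffe-app / vision-mind-reid-app / vision-mind-tbir-app)
# lucene (default) | memory | elasticsearch
vector.store.mode=lucene
```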
 
The OCR stack combines `vision-mind-ocr-core` (engine orchestration) and `vision-mind-ocr-app` (REST facade), running the PaddleOCR models on ONNX runtimes with optional post-processing.
- Switch between the lite and ex detector/recognizer pairs with the `detectionLevel` flag (`lite` by default, `ex` for higher accuracy).
- Choose the reconstruction strategy via `plan`, or call `/detectWithSR` and `/detectWithLLM` for semantic or LLM-driven text refinement.
- Request JPEG overlays from `/detectI` or `/detectWithLLMI` to visualise polygons and fine-tuned spans.
- Ensure `VISION_MIND_PATH` points to the OCR ONNX bundle and dictionary so both engines initialise correctly.
```bash
curl -X POST http://localhost:17006/vision-mind-ocr/api/v1/ocr/detect \
  -H "Content-Type: application/json" \
  -d '{ "imgUrl": "https://example.com/receipt.jpg", "detectionLevel": "lite" }'
```

The response is wrapped in `HttpResult<List<OcrDetectionResult>>`, where each detection includes polygon coordinates, recognised text, and confidence.

## API Reference

The tables below outline the primary REST endpoints exposed by each runnable module. `HttpResult<T>` denotes the project-wide response wrapper containing `success`, `message`, and `data` fields.
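
Since every JSON endpoint shares that envelope, a client-side sketch of the wrapper may help; only the field names (`success`, `message`, `data`) come from the description above — the class shape is an assumption, not the project's actual source:

```java
// Minimal sketch of the project-wide response envelope. Only the three
// field names come from the README; everything else is illustrative.
public class HttpResult<T> {
    private final boolean success;
    private final String message;
    private final T data;

    public HttpResult(boolean success, String message, T data) {
        this.success = success;
        this.message = message;
        this.data = data;
    }

    public static <T> HttpResult<T> ok(T data) {
        return new HttpResult<>(true, "success", data);
    }

    public static <T> HttpResult<T> error(String message) {
        return new HttpResult<>(false, message, null);
    }

    public boolean isSuccess() { return success; }
    public String getMessage() { return message; }
    public T getData() { return data; }
}
```
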

### vision-mind-yolo-app

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/v1/img/detect | Run object detection within optional include/exclude polygons. | DetectionRequestWithArea JSON (imgUrl, threshold?, types?, detectionFrames?, blockingFrames?) | HttpResult&lt;List&lt;Box&gt;&gt; |
| POST | /api/v1/img/detectI | Same as above but returns the annotated image. | DetectionRequestWithArea | image/jpeg bytes |
| POST | /api/v1/img/detectFace | Detect faces in given regions. | DetectionRequestWithArea | HttpResult&lt;List&lt;Box&gt;&gt; |
| POST | /api/v1/img/detectFaceI | Face detection with inline visualization. | DetectionRequestWithArea | image/jpeg bytes |
| POST | /api/v1/img/pose | Human pose estimation. | DetectionRequestWithArea | HttpResult&lt;List&lt;BoxWithKeypoints&gt;&gt; |
| POST | /api/v1/img/poseI | Pose estimation with skeleton overlay. | DetectionRequestWithArea | image/jpeg bytes |
| POST | /api/v1/img/sam | FAST-SAM segmentation, returns bounding boxes. | DetectionRequest (imgUrl, threshold?, types?) | HttpResult&lt;List&lt;Box&gt;&gt; |
| POST | /api/v1/img/samI | FAST-SAM segmentation visualization. | DetectionRequest | image/jpeg bytes |
| POST | /api/v1/img/seg | YOLO segmentation output with masks. | DetectionRequestWithArea | HttpResult&lt;List&lt;SegDetection&gt;&gt; |
| POST | /api/v1/img/segI | Segmentation visualization. | DetectionRequestWithArea | image/jpeg bytes |

### vision-mind-ocr-app

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/v1/ocr/detect | Run PaddleOCR text detection/recognition with switchable lite (det/rec.onnx) or ex (det2/rec2.onnx) models across the full image. | OcrDetectionRequest (detectionLevel?, imgUrl) | HttpResult&lt;List&lt;OcrDetectionResult&gt;&gt; |
| POST | /api/v1/ocr/detectI | Same as above but streams the annotated image. | OcrDetectionRequest (detectionLevel?, imgUrl) | image/jpeg bytes |
| POST | /api/v1/ocr/detectWithSR | Applies the semantic reconstruction decoder to smooth noisy OCR output. | OcrDetectionRequest (detectionLevel?, plan?, imgUrl) | HttpResult&lt;String&gt; |
| POST | /api/v1/ocr/detectWithLLM | Feeds detections through the LLM prompt for higher-level reasoning. | OcrDetectionRequest (detectionLevel?, plan?, imgUrl) | HttpResult&lt;String&gt; |
| POST | /api/v1/ocr/detectWithLLMI | Returns an LLM-refined overlay image with polygon annotations. | OcrDetectionRequest (detectionLevel?, plan?, imgUrl) | image/jpeg bytes |

### vision-mind-ffe-app

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/v1/face/computeFaceVector | Detect faces and return embeddings without persisting. | InputWithUrl (imgUrl, groupId?, faceScoreThreshold?) | HttpResult&lt;FaceImage&gt; |
| POST | /api/v1/face/saveFaceVector | Persist an externally computed face vector. | Input4Save (imgUrl, groupId, id, embeds) | HttpResult&lt;Void&gt; |
| POST | /api/v1/face/computeAndSaveFaceVector | Detect faces, store high-quality embeddings, and return inserted items. | InputWithUrl | HttpResult&lt;List&lt;FaceInfo4Add&gt;&gt; |
| POST | /api/v1/face/deleteFace | Remove a stored face vector by document ID. | Input4Del (id) | HttpResult&lt;Void&gt; |
| POST | /api/v1/face/findMostSimilarFace | Search the index with a probe image. | Input4Search (imgUrl, groupId?, faceScoreThreshold?, confidenceThreshold?) | HttpResult&lt;List&lt;FaceInfo4Search&gt;&gt; |
| POST | /api/v1/face/findMostSimilarFaceI | Retrieve the best-match preview image. | Input4Search | image/jpeg bytes |
| POST | /api/v1/face/calculateSimilarity | Compare two image URLs using cosine similarity. | Input4Compare (imgUrl, imgUrl2) | HttpResult&lt;Double&gt; |
| POST | /api/v1/face/findSave | Search first; if nothing matches, insert the face into the index. | Input4Search | HttpResult&lt;FaceInfo4SearchAdd&gt; |

### vision-mind-reid-app

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/v1/reid/feature/single | Extract a single body feature vector. | JSON map { "imgUrl": "..." } | HttpResult&lt;Feature&gt; |
| POST | /api/v1/reid/feature/calculateSimilarity | Compare two person crops. | JSON map { "imgUrl1", "imgUrl2" } | HttpResult&lt;Float&gt; |
| POST | /api/v1/reid/feature/multi | Detect multiple persons and return vectors for each. | JSON map { "imgUrl": "..." } | HttpResult&lt;List&lt;Feature&gt;&gt; |
| POST | /api/v1/reid/store/single | Extract and store a feature with metadata. | JSON map { "imgUrl", "cameraId?", "humanId?" } | HttpResult&lt;Feature&gt; |
| POST | /api/v1/reid/search | Search the gallery by image. | JSON map { "imgUrl", "cameraId?", "topN", "threshold" } | HttpResult&lt;List&lt;Human&gt;&gt; |
| POST | /api/v1/reid/searchOrStore | Single-cover workflow: search first, otherwise insert. | JSON map { "imgUrl", "threshold" } | HttpResult&lt;Human&gt; |
| POST | /api/v1/reid/associateStore | Multi-cover workflow: always store the probe and link it to the match. | JSON map { "imgUrl", "threshold" } | HttpResult&lt;Human&gt; |

### vision-mind-tbir-app

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/v1/tbir/saveImg | Ingest an image: detect, augment, vectorize, and index. | SaveImageRequest (imgUrl, imgId?, cameraId?, groupId?, meta?, threshold?, types?) | HttpResult&lt;ImageSaveResult&gt; |
| POST | /api/v1/tbir/deleteImg | Remove an image and its variants from the index. | DeleteImageRequest (imgId) | HttpResult&lt;Void&gt; |
| POST | /api/v1/tbir/searchImg | Retrieve metadata by stored image ID. | SearchImageRequest (imgId) | HttpResult&lt;SearchResult&gt; |
| POST | /api/v1/tbir/searchImgI | Render bounding boxes for the search results of an image ID. | SearchImageRequest | image/jpeg bytes |
| POST | /api/v1/tbir/search | Text-to-image retrieval. | SearchRequest (query, cameraId?, groupId?, topN?) | HttpResult&lt;SearchResult&gt; |
| POST | /api/v1/tbir/searchI | Text-to-image retrieval with visualization. | SearchRequest | image/jpeg bytes |
| POST | /api/v1/tbir/imgSearch | Image-to-image search via multipart upload. | multipart/form-data (image, topN) | HttpResult&lt;SearchResult&gt; |
DTO quick reference:

- `SaveImageRequest` extends `DetectionRequestWithArea`, adding optional `imgId`, `cameraId`, `groupId`, and an arbitrary metadata map.
- `SearchResult` wraps a list of `HitImage` entries (image URL, boxes, score, metadata).
- `HitImage` retains matched sub-boxes for visualization endpoints.

### vision-mind-llm-core

| Method | Path | Description | Request Body | Response |
|---|---|---|---|---|
| POST | /api/translate | Prompt the configured LLM to translate Chinese text to English. | Message (message, optional img) | Plain text |
| POST | /api/chat | Free-form chat completion. | Message (message) | Plain text |
| POST | /api/chatWithImg | Multimodal chat using an image URL/base64 plus prompt. | Message (message, img) | Plain text |

## Resources

- `JavaVisionMind.postman_collection.json` (repository root) provides ready-to-run Postman/Apifox requests for every endpoint.
- Model configuration lives under each module's `src/main/resources/application*.properties` for per-service tuning.

## Roadmap

- LLaMA deployment support with streaming responses.
- Alternative in-memory vector backends alongside Lucene.
- YOLO video-stream processing pipeline resurrection in `vision-mind-yolo-core`.

## Endpoint Flow Reference

### YOLO flows

- Controller validates imgUrl and logs before delegating (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:45).
 - ImgAnalysisService.detectArea downloads the image into an OpenCV Mat (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:70).
 - analysis runs YOLOv11 inference, maps raw outputs to Box objects, and filters by requested class IDs (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:121).
 - Detections must overlap include polygons and avoid block polygons according to the configured ratios before they are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:82).
 - Remaining boxes are wrapped in HttpResult and returned (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:60).
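
The polygon gating in the steps above can be sketched as follows; for brevity the sketch treats frames as axis-aligned rectangles (the real service filters against polygons), and the class and threshold names are assumptions:

```java
import java.awt.Rectangle;
import java.util.List;

// Simplified sketch of area filtering: a detection is kept when it overlaps
// some include frame by at least minIncludeRatio and overlaps no blocking
// frame by more than maxBlockRatio. Real code works on polygons.
public class AreaFilterSketch {

    /** Fraction of the box's own area covered by the frame. */
    static double overlapRatio(Rectangle box, Rectangle frame) {
        Rectangle inter = box.intersection(frame);
        if (inter.isEmpty()) return 0.0;
        double interArea = (double) inter.width * inter.height;
        return interArea / ((double) box.width * box.height);
    }

    static boolean keep(Rectangle box,
                        List<Rectangle> includeFrames,
                        List<Rectangle> blockFrames,
                        double minIncludeRatio,
                        double maxBlockRatio) {
        boolean inIncluded = includeFrames.isEmpty()
                || includeFrames.stream().anyMatch(f -> overlapRatio(box, f) >= minIncludeRatio);
        boolean blocked = blockFrames.stream().anyMatch(f -> overlapRatio(box, f) > maxBlockRatio);
        return inIncluded && !blocked;
    }
}
```
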
 
- Controller repeats validation and timing (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:70).
 - detectAreaI renders the image as BufferedImage and reuses detectArea (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:110).
 - Include/block frames and boxes are drawn over the image before the controller streams JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:80).
 
- Controller checks the payload (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:99).
 - ImgAnalysisService.detectFace runs the face-trained YOLO model (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:213).
 - Polygon filtering is applied identically to generic detections (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:220).
 - Boxes are returned to the controller for response wrapping (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:112).
 
- Validation mirrors the JSON endpoint (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:118).
 - detectFaceI draws bounding boxes plus include/exclude frames and returns the annotated image (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:253).
 - Controller streams the JPEG bytes (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:128).
 
- Controller validates payload and logs (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:147).
 - poseArea invokes the YOLOv11 pose model and filters polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:148).
 - Filtered BoxWithKeypoints are returned (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:160).
 
- Controller handles validation (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:173).
 - poseAreaI reuses poseArea, draws skeleton overlays, and returns a BufferedImage (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:187).
 - Controller streams JPEG (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:183).
 
- Controller validates and passes through (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:197).
 - sam executes FastSAM segmentation and returns boxes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:279).
 
- Controller validates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:216).
 - samI draws FastSAM boxes onto the image and returns annotated bytes (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:284).
 
- Controller checks payload and delegates (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:260).
 - segArea runs segmentation and returns per-class polygons (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:294).
 
- Controller forwards to the service (vision-mind-yolo-app/src/main/java/com/yuqiangdede/yolo/controller/ImgAnalysisController.java:238).
 - segAreaI draws segmentation polygons on the original image and returns them (vision-mind-yolo-core/src/main/java/com/yuqiangdede/yolo/service/ImgAnalysisService.java:299).
 

### OCR flows

- Controller validates input, logs timing, and delegates to the service (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:30).
- OcrService.detect routes the request into the shared inference pipeline (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:93).
- runInference downloads the image, selects the light/heavy engine, executes PaddleOCR, and applies include/exclude polygons (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:115).
- Area-filtered detections are returned to the controller for wrapping (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:146).
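
The light/heavy engine selection can be sketched as a small lookup; the model file names come from the API table above, while the method and return shape are illustrative:

```java
// Sketch of detectionLevel-based engine selection: "lite" maps to the
// det/rec ONNX pair, "ex" to det2/rec2. Names illustrative, not project code.
public class OcrEngineSelectSketch {

    static String[] modelPair(String detectionLevel) {
        return "ex".equalsIgnoreCase(detectionLevel)
                ? new String[]{"det2.onnx", "rec2.onnx"}
                : new String[]{"det.onnx", "rec.onnx"};   // lite is the default
    }
}
```
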
 
- Controller invokes the overlay variant and prepares HTTP headers (vision-mind-ocr-app/src/main/java/com/yuqiangdede/ocr/controller/OcrController.java:47).
- detectWithOverlayBytes reuses detectWithOverlay and encodes the annotated image as JPEG (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:107).
- detectWithOverlay draws OCR polygons plus include/exclude frames prior to returning (vision-mind-ocr-core/src/main/java/com/yuqiangdede/ocr/service/OcrService.java:98).

### Face feature flows

- Controller validates imgUrl and logs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:60).
 - FaceService.computeFaceVector extracts faces and embeddings (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:142).
 - getFaceInfos strips base64 payloads before returning (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:154).
 
- Controller demands vector info (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:78).
 - saveFaceVector persists embeddings with FfeVectorStoreUtil.add (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:95).
 
- Controller validates payload (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:96).
 - computeAndSaveFaceVector filters faces by the requested threshold, stores qualifying embeddings, and returns the trimmed list (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:77).
 
- Controller checks document ID (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:118).
 - delete removes the Lucene record (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:105).
 
- Controller validates thresholds (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:135).
 - findMostSimilarFace runs extraction, filters by quality, and executes a Lucene top-1 search (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:116).
 
- Controller repeats validation (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:153).
 - The controller streams the top match image returned by the service (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:163).
 
- Controller ensures two URLs (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:186).
 - calculateSimilarity extracts both embeddings, normalizes them, and computes cosine similarity (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:177).
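
The comparison step reduces to L2-normalising both embeddings and taking their dot product; a minimal sketch (the class and method names are illustrative):

```java
// Minimal sketch of normalise-then-dot cosine similarity as used by the
// comparison endpoints; names are illustrative, not the project's API.
public class CosineSketch {

    static double[] l2Normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
        return out;
    }

    static double cosineSimilarity(double[] a, double[] b) {
        double[] na = l2Normalize(a), nb = l2Normalize(b);
        double dot = 0;
        for (int i = 0; i < na.length; i++) dot += na[i] * nb[i];
        return dot;
    }
}
```
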
 
- Controller validates the request (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/controller/FaceController.java:212).
 - findSave searches for each quality face, inserting any misses and returning both found and added items (vision-mind-ffe-app/src/main/java/com/yuqiangdede/ffe/service/FaceService.java:197).
 

### ReID flows

- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:23).
 - featureSingle embeds the probe and tags it with a UUID (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:75).
 
- Controller checks both URLs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:39).
 - calculateSimilarity embeds both probes and computes cosine similarity (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:82).
 
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:56).
 - featureMulti runs YOLO detection via ImgAnalysisService.detectArea, crops each person, embeds them, and returns the list (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:89).
 
- Controller enforces required IDs (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:72).
 - storeSingle embeds the probe, assigns a UUID, and stores using ReidVectorStoreUtil.add (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:109).
 
- Controller validates imgUrl, topN, and threshold (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:106).
 - search embeds the probe and queries Lucene for matching humans with optional camera scoping (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:117).
 
- Controller validates payload (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:125).
 - searchOrStore returns the best match or persists a new feature when none is found (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:123).
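
The search-first, insert-on-miss logic can be sketched with an in-memory store standing in for the Lucene backend; the store shape and threshold semantics are assumptions based on the description above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch of the searchOrStore flow: return the best match above the
// threshold, otherwise persist the probe as a new identity. An in-memory
// list stands in for the Lucene-backed vector store.
public class SearchOrStoreSketch {
    record Entry(String humanId, double[] embedding) {}

    final List<Entry> gallery = new ArrayList<>();

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    String searchOrStore(double[] probe, double threshold) {
        Entry best = null;
        double bestScore = -1;
        for (Entry e : gallery) {
            double s = cosine(probe, e.embedding());
            if (s > bestScore) { bestScore = s; best = e; }
        }
        if (best != null && bestScore >= threshold) return best.humanId();
        String id = UUID.randomUUID().toString();   // new identity on miss
        gallery.add(new Entry(id, probe));
        return id;
    }
}
```
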
 
- Controller validates request (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/controller/ReidController.java:142).
 - associateStore searches for an existing match and always persists the new embedding, linking it to the matched human if available (vision-mind-reid-app/src/main/java/com/yuqiangdede/reid/service/ReidService.java:138).
 

### TBIR flows

- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:46).
 - saveImg generates or reuses imgId, optionally collects YOLO/FastSAM detections, crops and augments regions, embeds both main and sub-images with CLIP, and persists embeddings with metadata (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:61).
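
The ingest fan-out (one main embedding plus one per cropped region, all sharing an imgId) can be sketched like this; the embed function and document shape are placeholders, not the project's CLIP/Lucene code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Function;

// Sketch of the saveImg fan-out: embed the full image plus each detected
// crop, tagging every vector with the shared imgId so deleteImg/searchImg
// can address the whole group. Embedding and indexing are placeholders.
public class TbirIngestSketch {
    record IndexedVector(String imgId, String variant, float[] embedding) {}

    static List<IndexedVector> ingest(String imgId,
                                      float[] fullImage,
                                      List<float[]> crops,
                                      Function<float[], float[]> embed) {
        String id = (imgId != null) ? imgId : UUID.randomUUID().toString();
        List<IndexedVector> docs = new ArrayList<>();
        docs.add(new IndexedVector(id, "main", embed.apply(fullImage)));
        for (int i = 0; i < crops.size(); i++) {
            docs.add(new IndexedVector(id, "sub-" + i, embed.apply(crops.get(i))));
        }
        return docs;
    }
}
```
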
 
- Controller checks imgId (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:66).
 - deleteImg validates the identifier, invokes the vector store deletion, and records execution time (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:167).
 
- Controller validates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:82).
 - searchImg collects Lucene hits by stored ID and merges them into HitImage DTOs (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:321).
 
- Controller validates payload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:98).
 - searchImgI reuses searchImg, downloads matched images, draws boxes, and returns buffered previews (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:331).
 
- Controller validates query text (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:124).
 - searchByText expands prompts via LLM, embeds each with CLIP, queries Lucene, merges hits through getFinalList, and returns ranked HitImage results (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:182).
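
The merge step (several prompt embeddings, one ranked result list) amounts to deduplicating hits by image ID at their best score; a sketch with placeholder types, where the Hit record stands in for the project's HitImage DTO:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of merging per-prompt Lucene hits into one ranked list: keep each
// image at its highest score across prompts, then sort descending.
public class TbirMergeSketch {
    record Hit(String imgId, double score) {}

    static List<Hit> merge(List<List<Hit>> perPromptHits) {
        Map<String, Double> best = new LinkedHashMap<>();
        for (List<Hit> hits : perPromptHits) {
            for (Hit h : hits) {
                best.merge(h.imgId(), h.score(), Math::max);
            }
        }
        List<Hit> merged = new ArrayList<>();
        best.forEach((id, score) -> merged.add(new Hit(id, score)));
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return merged;
    }
}
```
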
 
- Controller validates and delegates (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:143).
 - searchByTextI draws matched boxes on each result image for preview streaming (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:285).
 
- Controller accepts multipart upload (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/controller/TbirController.java:170).
 - imgSearch embeds the probe image, queries Lucene, and returns ranked matches (vision-mind-tbir-app/src/main/java/com/yuqiangdede/tbir/service/TbirService.java:302).
 

### LLM flows

- Controller applies a translation prompt wrapper and delegates (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:23).
 - LLMService.chat validates input and routes to OpenAI or Ollama, throwing if neither is configured (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
 
- Controller forwards the free-form prompt (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:39).
 - LLMService.chat handles provider selection as above (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:22).
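
The provider routing can be sketched as a simple fall-through check; the flag and method names (and the exception message) are illustrative, not the project's configuration keys:

```java
// Sketch of the provider selection described above: prefer OpenAI when
// configured, fall back to Ollama, and fail fast when neither is set.
public class LlmRouterSketch {

    static String pickProvider(boolean openAiConfigured, boolean ollamaConfigured) {
        if (openAiConfigured) return "openai";
        if (ollamaConfigured) return "ollama";
        throw new IllegalStateException("No LLM provider configured");
    }
}
```
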
 
- Controller validates text and optional image (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/controller/ChatController.java:50).
 - chatWithImg enforces payload completeness, injects a default system prompt if needed, and calls the configured OpenAI vision endpoint (vision-mind-llm-core/src/main/java/com/yuqiangdede/llm/service/LLMService.java:49).
 
Contributions and issue reports are welcome.
The following directions can extend the current toolkit and may serve as inspiration for upcoming releases:
- Multi-object tracking (MOT): integrate trackers such as DeepSORT or ByteTrack within `vision-mind-yolo-core` and pair them with detection outputs to provide cross-frame trajectories for security patrols or pedestrian-path analytics.
- Fine-grained attribute recognition: add attribute classifiers for pedestrians, faces, or vehicles (e.g., gender, clothing color, license-plate region) so that vector indexes can support richer filtering.
- Video structuring pipeline: build a batch video-ingestion service that runs detection, segmentation, and re-identification on key frames, then archives the structured results for large-scale video libraries or case investigations.
- Cross-camera association: combine the existing re-identification stack with spatiotemporal constraints to correlate identities across camera feeds and trigger rule-based alerts.
- Richer multimodal interactions: extend `vision-mind-llm-core` with image captioning, visual question answering (VQA), or prompt-template management to improve multimodal Q&A use cases.
- Model management & observability: provide unified model versioning, hot swapping, and inference performance dashboards to streamline operating multiple models in production.