Vision
Arbitrary-steps Image Super-resolution via Diffusion Inversion (CVPR 2025)
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Ultimate camera streaming application with support for RTSP, RTMP, HTTP-FLV, WebRTC, MSE, HLS, MP4, MJPEG, HomeKit, FFmpeg, etc.
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Implementation of YOLOv11 in C++17 using OpenCV and ONNX Runtime
Inference with the OpenCV DNN module for all YOLOv8 model types, YOLOv9 object detection models, and the full YOLO11 model series
Multi-Object Tracking with Ultralytics YOLO11
The YOLOv11 C++ TensorRT Project, implemented in C++ and optimized using NVIDIA TensorRT
NVR with realtime local object detection for IP cameras
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Witness the aha moment of VLM with less than $3.
Frontier Multimodal Foundation Models for Image and Video Understanding
Vision infrastructure to turn complex documents into RAG/LLM-ready data
🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options
This repository offers a TensorFlow-based anomaly detection system for cell images using adversarial autoencoders, capable of identifying anomalies even in contaminated datasets. Check out our code…
Train and test image anomaly detection models with Anomalib. Examples on a custom dataset
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
Anomaly detection on images using features from pretrained neural networks.
PyTorch implementation of "Sub-Image Anomaly Detection with Deep Pyramid Correspondences"
A Vision Transformer Network for Image Anomaly Detection and Localization
SimpleNet: A Simple Network for Image Anomaly Detection and Localization
YOLOv12: Attention-Centric Real-Time Object Detectors
YOLOv12 Inference Using CPP and ONNX Runtime
OCR & Document Extraction using vision models