Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Updated Jul 21, 2024 - Python
Transformers 3rd Edition
Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡
Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.
This package provides an implementation of the Vision Transformer (ViT) in TensorFlow.
Tutorials on machine learning, artificial intelligence, and data science, with mathematical explanations and reusable code (in Python and R)
The AI Enabled Sign Language System is a Streamlit app that detects, classifies, and translates Indian Sign Language (ISL) using custom-trained YOLOv8 and Vision Transformer (ViT) models. It supports real-time image capture, multi-language text translation, and text-to-speech conversion, enhancing accessibility and communication for ISL users.
This is a series of computer vision foundational projects that anyone diving into the field must tackle.
Implementation of the Swin Transformer network using TensorFlow
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Simplified PyTorch implementation of Vision Transformer (ViT) for small datasets like MNIST, FashionMNIST, SVHN and CIFAR10.
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Scenic: A Jax Library for Computer Vision Research and Beyond
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
An all-in-one toolkit for computer vision
MIST: A simple, scalable, and end-to-end framework for 3D medical imaging segmentation.
Omni Geoguessr AI: A Vision Transformer AI integrated with Geoguessr for automated geographic location prediction and gameplay using streetview panoramas.
Implementation of a V architecture with Vision Transformer for image segmentation tasks
This is a repository for SegFormer-pytorch-model, which can be used to train on your own image datasets for segmentation tasks.
The official implementation of the paper: "Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning"