vision-transformer

Here are 792 public repositories matching this topic...

dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

speech multimodal rag edge-ai vector-database vision-transformer llm-inference

Updated Jul 21, 2024
Python

Denis2054 / Transformers-for-NLP-and-Computer-Vision-3rd-Edition

Transformers 3rd Edition

Updated Jul 21, 2024
Jupyter Notebook

emcf / thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡

pdf web scrapers multimodal vision-transformer gpt-4 large-language-models gpt-4o

Updated Jul 21, 2024
Python

Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

image translation deep-learning neural-network gpu text machine-translation cuda transformer lstm seq2seq sequence-to-sequence tensor encoder-decoder attention-model transformer-encoder transformer-architecture vision-transformer

Updated Jul 20, 2024
C#

YShokrollahi / vit-transformers-tf

This package provides an implementation of the Vision Transformer (ViT) in TensorFlow.

computer-vision transformers transformer vit vision-transformer

Updated Jul 20, 2024
Python

SalvatoreRa / tutorial

Tutorials on machine learning, artificial intelligence, and data science, with mathematical explanations and reusable code (in Python and R)

python nlp data-science machine-learning natural-language-processing image bioinformatics tutorial r computer-vision deep-learning graph biology tutorials artificial-intelligence convolutional-neural-networks streamlit streamlit-webapp vision-transformer

Updated Jul 20, 2024
Jupyter Notebook

ItsNotRohit02 / AI-Enabled-Sign-Language-Communication-System

The AI-Enabled Sign Language System is a Streamlit app that detects, classifies, and translates Indian Sign Language (ISL) using custom-trained YOLOv8 and Vision Transformer (ViT) models. It supports real-time image capture, multi-language text translation, and text-to-speech conversion, improving accessibility and communication for ISL users.

python machine-learning text-to-speech deep-learning image-classification object-detection text-translation sign-language-recognition vision-transformer yolov8

Updated Jul 20, 2024
Jupyter Notebook

afondiel / computer-vision-challenge

This is a series of computer vision foundational projects that anyone diving into the field must tackle.

computer-vision image-processing image-classification image-generation image-detection computer-vision-algorithms computer-vision-tools computer-vision-opencv computer-vision-datasets vision-models vision-transformer computer-vision-python computer-vision-projects computer-vision-hello-world cv-challenge computer-vision-challenge

Updated Jul 19, 2024
Jupyter Notebook

YShokrollahi / swin-transformers

Implementation of the Swin Transformer network using TensorFlow

transformer vision-transformer swin-transformer

Updated Jul 19, 2024
Jupyter Notebook

SunzeY / AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

machine-learning deep-learning vision-and-language vision-language vision-transformer vision-language-model

Updated Jul 19, 2024
Jupyter Notebook

s-chh / PyTorch-Scratch-Vision-Transformer-ViT

Simplified PyTorch implementation of Vision Transformer (ViT) for small datasets such as MNIST, FashionMNIST, SVHN, and CIFAR10.

simple transformer scratch vit vision-transformer vit-mnist transformer-mnist pytorch-vit vit-scratch vit-fashionmnist vit-svhn transformer-cifar10 vit-cifar10 vit-cifar vit-simple

Updated Jul 19, 2024
Python

NVlabs / MambaVision

Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

deep-learning image-classification mamba visual-recognition self-attention hybrid-models vision-transformer foundation-models

Updated Jul 18, 2024
Python

google-research / scenic

Scenic: A JAX Library for Computer Vision Research and Beyond

research computer-vision deep-learning transformers attention jax vision-transformer

Updated Jul 18, 2024
Python

Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

nlp deep-learning transformer large-scale data-parallelism model-parallelism distributed-training self-supervised-learning oneflow pipeline-parallelism vision-transformer

Updated Jul 18, 2024
Python

alibaba / EasyCV

An all-in-one toolkit for computer vision

computer-vision transformers pytorch classification object-detection self-supervised-learning vision-transformer

Updated Jul 18, 2024
Python

mist-medical / MIST

MIST: A simple, scalable, and end-to-end framework for 3D medical imaging segmentation.

deep-learning pytorch medical-imaging convolutional-neural-networks image-segmentation unet vision-transformer attention-unet nnunet 3d-medical-imaging-segmentation unetr

Updated Jul 18, 2024
Python

Apsurt / omni-geo-ai

Omni Geoguessr AI: A Vision Transformer AI integrated with Geoguessr for automated geographic location prediction and gameplay using streetview panoramas.