pix2tex: Using a ViT to convert images of equations into LaTeX code.
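A minimal usage sketch for pix2tex, following the LatexOCR interface shown in its README; the image path is a placeholder:

```python
# Minimal pix2tex usage sketch: convert an equation image to LaTeX.
# Assumes `pip install pix2tex` has been run; equation.png is a placeholder.
from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()                # loads the pretrained ViT-based weights
img = Image.open("equation.png")  # image of a rendered equation
print(model(img))                 # prints the predicted LaTeX string
```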
🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks.
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
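A hedged NumPy sketch of the three-moment Adan update as described in the paper, not the repo's actual PyTorch optimizer class; the function name and hyperparameter values are illustrative assumptions:

```python
# Hedged NumPy sketch of one Adan step (three-moment Nesterov update).
# Names and hyperparameter values are illustrative, not the repo's exact API.
import numpy as np

def adan_step(theta, g, g_prev, m, v, n,
              lr=1e-3, b1=0.02, b2=0.08, b3=0.01, eps=1e-8, wd=0.0):
    # On the first step, pass g_prev = g so the difference term starts at zero.
    diff = g - g_prev
    m = (1 - b1) * m + b1 * g          # first moment of gradients
    v = (1 - b2) * v + b2 * diff       # moment of gradient differences
    u = g + (1 - b2) * diff            # Nesterov-style lookahead gradient
    n = (1 - b3) * n + b3 * u * u      # second moment of lookahead gradients
    step = lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)
    theta = (theta - step) / (1 + lr * wd)  # decoupled weight decay
    return theta, m, v, n
```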
Paddle large-scale classification tools supporting ArcFace, CosFace, PartialFC, and data parallel + model parallel training; models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, and CAE.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
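The paper's core idea is to treat an image as a sequence of flattened 16x16 patches; a minimal PyTorch sketch of the patch-embedding step, with illustrative dimensions:

```python
# Minimal sketch of ViT patch embedding: split a 224x224 image into 16x16
# patches and linearly project each one. Dimensions are illustrative.
import torch
import torch.nn as nn

patch, dim = 16, 768
# A stride-16 conv over 16x16 windows is the standard linear patch projection.
proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

x = torch.randn(1, 3, 224, 224)             # one RGB image
tokens = proj(x)                            # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 patch tokens
cls = torch.zeros(1, 1, dim)                # learnable class token in a real model
seq = torch.cat([cls, tokens], dim=1)       # (1, 197, 768) fed to the Transformer
```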
A practical application of the Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; it can also be tried on EMG, EOG, ECG, etc. Includes attention over the spatial dimension (channel attention) and the temporal dimension, plus a Python implementation of Common Spatial Pattern (CSP), an efficient feature-enhancement method.
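CSP, mentioned in the previous entry, reduces to a generalized eigenvalue problem on the two classes' average covariance matrices; a minimal NumPy/SciPy sketch of that textbook formulation (function name and shapes are assumptions, not this repo's API):

```python
# Minimal CSP sketch: spatial filters that maximize variance for one EEG class
# while minimizing it for the other (textbook generalized-eigenvalue form).
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_filters=4):
    """X1, X2: (trials, channels, samples) arrays for the two classes."""
    def avg_cov(X):
        return np.mean([np.cov(trial) for trial in X], axis=0)
    C1, C2 = avg_cov(X1), avg_cov(X2)
    # Solve C1 w = lambda (C1 + C2) w; eigenvalues near 1 favor class 1,
    # eigenvalues near 0 favor class 2.
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    picks = np.concatenate([order[:n_filters // 2], order[-n_filters // 2:]])
    return vecs[:, picks].T               # (n_filters, channels)
```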
A ViT based transformer applied on multi-channel time-series EEG data for motor imagery classification
Vision Transformer using TensorFlow 2.0
Implementation of transformer-based architectures in PyTorch.
Mimix: A Text Generation Tool and Pretrained Chinese Models
Simple and easy to understand PyTorch implementation of Vision Transformer (ViT) from scratch with detailed steps. Tested on small datasets: MNIST, FashionMNIST, SVHN, CIFAR10, and CIFAR100.
Reproduction of semantic segmentation using a masked autoencoder (MAE).
Training ImageNet / CIFAR models with state-of-the-art strategies and techniques such as ViT, knowledge distillation (KD), re-parameterization (Rep), etc.
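Knowledge distillation, listed above, trains the student against temperature-softened teacher logits; a minimal PyTorch sketch of the standard Hinton-style loss (temperature and weighting are illustrative):

```python
# Minimal knowledge-distillation loss sketch: KL divergence between
# temperature-softened teacher and student distributions, blended with the
# ordinary cross-entropy on hard labels. T and alpha are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                  # T^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```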