My project on a custom AI architecture. It combines cutting-edge machine learning techniques such as Flash-Attention, Group-Query-Attention, ZeRO-Infinity, BitNet, etc.
Updated Sep 20, 2024 - Python
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
[ECCV 2024 Oral] Official implementation of the paper "PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers"
An introduction to attention mechanisms and the vision transformer
State-of-the-art CLIP-like models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.
MIST: A simple, scalable, and end-to-end framework for 3D medical imaging segmentation.
[ICML 2024] Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Scripts and trained models from our paper: M. Ntrougkas, N. Gkalelis, V. Mezaris, "T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers", IEEE Access, 2024. DOI:10.1109/ACCESS.2024.3405788.
This repository contains implementations of prominent computer vision deep learning architectures. The focus is on simplifying these architectures while relying solely on the PyTorch library. The goal is to provide accessible and streamlined versions of key models in the field.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
RESTful API for vector similarity search. It uses the Python web framework FastAPI. This accelerates machine learning workflows that require vector similarity search using foundational models.
TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Adaptive Vision Transformer for efficient image classification, implementing dynamic token sparsification to reduce computational costs while maintaining accuracy.
Extract clean markdown from PDFs, URLs, Word docs, slides, videos, and more, ready for any LLM. ⚡
CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models