Stars
Improved sampling via learned diffusions (ICLR 2024) and an optimal control perspective on diffusion-based generative modeling (TMLR 2024)
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Official inference repo for FLUX.1 models
Utilities intended for use with Llama models.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
A llama3 implementation, one matrix multiplication at a time
Lightplane implements a highly memory-efficient differentiable radiance field renderer, and a module for unprojecting features from images to 3D grids.
A PyTorch-native library for large model training
Easily convert Common Crawl into a dataset of captions and documents: image/text, audio/text, video/text, ...
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization (a toy BPE training sketch follows this list).
Fast bare-bones BPE for modern tokenizer training
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks.
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Building blocks for foundation models.
A batched, offline-inference-oriented version of segment-anything
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
Fast Implementation of Generalised Geodesic Distance Transform for CPU (OpenMP) and GPU (CUDA)
Code for the paper "Hyperbolic Image-Text Representations", Desai et al, ICML 2023
Fast and memory-efficient exact attention
Learnable latent embeddings for joint behavioral and neural analysis - Official implementation of CEBRA
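
Several of the tokenizer entries above (the minimal BPE code, the bare-bones BPE trainer, and tiktoken) center on byte pair encoding: repeatedly merging the most frequent adjacent token pair into a new token. The sketch below is only an illustration of that core loop, not the API of any of the listed repositories; the `train_bpe` helper and its parameters are hypothetical.

```python
from collections import Counter

def train_bpe(text: str, num_merges: int) -> list[tuple[int, int]]:
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair of tokens.
    Illustrative only; not the API of minbpe, tiktoken, or the other repos above."""
    ids = list(text.encode("utf-8"))        # start from raw UTF-8 bytes (0..255)
    merges = []                             # learned merge rules, in order
    next_id = 256                           # new token ids start after the byte range
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count every adjacent token pair
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)    # most frequent pair becomes a new token
        merges.append(pair)
        merged, i = [], 0
        while i < len(ids):                 # replace every occurrence of the pair
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
        next_id += 1
    return merges

if __name__ == "__main__":
    print(train_bpe("low lower lowest", num_merges=5))
```

For tokenizing text against OpenAI models with an already-trained vocabulary, tiktoken exposes calls such as `tiktoken.get_encoding("cl100k_base").encode("hello world")`.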