📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Updated Mar 4, 2025
Light-field imaging application for plenoptic cameras
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
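A minimal sketch of the online-softmax trick that Flash-Attention-style prefill kernels such as FFPA build on: keys and values are processed in tiles while only a running max, normalizer, and value accumulator are kept, so the full score matrix is never materialized. Shapes, tile size, and variable names here are illustrative assumptions, not FFPA's actual kernel.

```python
import numpy as np

rng = np.random.default_rng(3)
seq_len, head_dim, tile = 64, 32, 16

q = rng.standard_normal(head_dim)             # one query vector
K = rng.standard_normal((seq_len, head_dim))  # keys
V = rng.standard_normal((seq_len, head_dim))  # values

m = -np.inf                 # running max of scores
l = 0.0                     # running softmax normalizer
acc = np.zeros(head_dim)    # running weighted sum of values

for start in range(0, seq_len, tile):
    s = K[start:start + tile] @ q / np.sqrt(head_dim)  # scores for this tile
    m_new = max(m, s.max())
    scale = np.exp(m - m_new)        # rescale previous accumulator to new max
    p = np.exp(s - m_new)
    l = l * scale + p.sum()
    acc = acc * scale + p @ V[start:start + tile]
    m = m_new

out_tiled = acc / l

# Reference: standard softmax attention computed over all positions at once.
s_full = K @ q / np.sqrt(head_dim)
p_full = np.exp(s_full - s_full.max())
p_full /= p_full.sum()
out_full = p_full @ V

print(np.allclose(out_tiled, out_full))  # → True
```

The tiled result matches the full-matrix softmax exactly (up to float error), which is what lets such kernels keep SRAM usage independent of sequence length.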
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
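The core idea behind compressing a KV-cache with a low-rank projection can be sketched with a truncated SVD: factor the cached keys as the product of a small per-token matrix and a shared up-projection, and store only the factors. Palu learns its projections rather than using SVD, so this is a hedged illustration of the storage trade-off, not the paper's method; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, head_dim, rank = 128, 64, 16

K = rng.standard_normal((seq_len, head_dim))  # one head's key cache

# Factor K ≈ A @ B with A: (seq_len, rank) per token, B: (rank, head_dim) shared.
U, S, Vt = np.linalg.svd(K, full_matrices=False)
A = U[:, :rank] * S[:rank]   # compressed per-token states (what gets cached)
B = Vt[:rank, :]             # shared up-projection applied at attention time

K_approx = A @ B

stored = A.size + B.size     # floats kept after compression
original = K.size
print(stored, original)      # compressed vs. original float count
```

With rank 16 on a 64-dim head, the per-token cache shrinks to a quarter of its size, at the cost of reconstruction error that the learned projections are designed to minimize.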
Light field geometry estimator for plenoptic cameras
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
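What "decode-stage" attention means, shown for the GQA case: a single new query token attends over the cached keys/values, with several query heads sharing each KV head. This is a NumPy sketch of the computation such CUDA kernels optimize; head counts, names, and shapes are illustrative assumptions, not the repository's API.

```python
import numpy as np

rng = np.random.default_rng(1)
n_q_heads, n_kv_heads, head_dim, cache_len = 8, 2, 64, 32
group = n_q_heads // n_kv_heads  # query heads per shared KV head

q = rng.standard_normal((n_q_heads, head_dim))                 # current token's queries
k_cache = rng.standard_normal((n_kv_heads, cache_len, head_dim))
v_cache = rng.standard_normal((n_kv_heads, cache_len, head_dim))

out = np.empty_like(q)
for h in range(n_q_heads):
    kv = h // group                                    # KV head for this query group
    scores = q[h] @ k_cache[kv].T / np.sqrt(head_dim)  # (cache_len,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                               # softmax over cached positions
    out[h] = probs @ v_cache[kv]                       # weighted sum of cached values

print(out.shape)
```

MHA is the special case `n_kv_heads == n_q_heads` and MQA the case `n_kv_heads == 1`; the decode stage is memory-bound because each step reads the whole cache to produce one token.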
🍺 CLI for quickly generating citations for websites and books
Provided is a Google Apps Script whose sole purpose is to help make MLA-style writing easier
Predicting the energy efficiency of buildings with machine learning algorithms
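A hedged sketch of that regression setup: fit a least-squares model mapping building features (surface area, wall area, glazing, and so on) to a heating load. The data here is synthetic and the feature names are stand-ins; the repository's actual dataset and model choice may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features = 200, 4

X = rng.standard_normal((n_samples, n_features))       # synthetic building features
true_w = np.array([2.0, -1.0, 0.5, 3.0])               # assumed ground-truth weights
y = X @ true_w + 0.1 * rng.standard_normal(n_samples)  # noisy "heating load" target

# Ordinary least squares via lstsq, with an explicit bias column.
Xb = np.hstack([X, np.ones((n_samples, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = Xb @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(float(rmse))  # close to the 0.1 noise level
```

Any regressor (trees, gradient boosting) slots into the same fit/predict shape; linear least squares is just the smallest self-contained example.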
An application that helps you create and manage citations for a research paper or other project. Named after Lt. Hawkeye, personal adjutant and bodyguard to Col. Mustang in the Fullmetal Alchemist manga and anime series.
An APA citation helper website (without ads!)
A Simple Toolkit for Managing Schoolwork
Code examples from the Graphics, Touch, Sound and USB book ported to the PIC32Mikromedia board