
LLM Large Language Model

Shuai YUAN edited this page Apr 30, 2024 · 13 revisions

Models

https://kipp.ly/transformer-taxonomy/

https://kipp.ly/transformer-inference-arithmetic/

Transformer

Position Encoding

multi-head

swin-transformer

transformer performance optimization

KV cache

Flash Decoding

llama

Frameworks

llama.cpp

llama2.c

Optimization


