https://kipp.ly/transformer-taxonomy/
https://kipp.ly/transformer-inference-arithmetic/
Transformer: A Novel Neural Network Architecture for Language Understanding
The Illustrated Transformer
The Transformer Model Explained in Detail (Most Complete Illustrated Version)
Transformer Explained in Detail (Attention Is All You Need)
The Annotated Transformer
Introduction to Attention Mechanism
The Transformer Attention Mechanism
Transformers Explained Visually - Overview of Functionality
Transformers Explained Visually - How it works, step-by-step
Transformers Explained Visually - Multi-head Attention, deep dive
Analyzing Transformer Models: Parameter Count, Compute, Intermediate Activations, and KV Cache
Why Are Today's LLMs All Decoder-only Architectures?
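The parameter-count analysis linked above can be sketched with a back-of-the-envelope formula: for a standard decoder block, the attention projections contribute about 4·h² parameters and the 4x MLP about 8·h², for roughly 12·h² per layer. A minimal sketch (function and variable names here are illustrative, not from any linked article):

```python
# Rough parameter count for a decoder-only Transformer:
# attention Q/K/V/output projections plus a 4x-width MLP,
# ignoring biases, layer norms, and positional parameters.

def decoder_params(hidden: int, layers: int, vocab: int) -> int:
    attn = 4 * hidden * hidden           # W_q, W_k, W_v, W_o
    mlp = 2 * hidden * (4 * hidden)      # up- and down-projection
    per_layer = attn + mlp               # ~12 * hidden^2
    return layers * per_layer + vocab * hidden  # plus token embeddings

# Example: a LLaMA-7B-like shape (hidden=4096, 32 layers, 32k vocab)
print(decoder_params(4096, 32, 32000))  # ~6.6e9 parameters
```

This estimate deliberately drops small terms (biases, norms), which is why such articles quote "12·h² per layer" as the leading-order count.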
https://github.com/NVIDIA/FasterTransformer
NVIDIA's BERT Inference Solution FasterTransformer Is Now Open Source
NVIDIA FasterTransformer Source Code Walkthrough
https://github.com/Dao-AILab/flash-attention
https://github.com/ggerganov/llama.cpp
llama.cpp Source Code Analysis: the CUDA Pipeline
Notes: A Brief Analysis of the Llama.cpp Code (Part 1): Parallelism and the KV Cache
Notes: A Brief Analysis of the Llama.cpp Code (Part 2)
Notes: A Brief Analysis of the Llama.cpp Code (Part 3): Compute Overhead
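The KV-cache sizing discussed in the llama.cpp notes above reduces to simple arithmetic: two tensors (K and V) per layer, each with batch × heads × seq_len × head_dim elements. A minimal sketch (names are illustrative, not taken from the llama.cpp codebase):

```python
# Back-of-the-envelope KV-cache memory for a decoder-only model:
# 2 tensors (K and V) per layer, each [batch, heads, seq_len, head_dim].

def kv_cache_bytes(batch: int, seq_len: int, layers: int,
                   heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:   # fp16 by default
    return 2 * layers * batch * heads * seq_len * head_dim * bytes_per_elem

# LLaMA-7B-like shape: 32 layers, 32 heads, head_dim 128, 2048-token context
size = kv_cache_bytes(batch=1, seq_len=2048, layers=32, heads=32, head_dim=128)
print(size / 2**30, "GiB")  # 1.0 GiB at fp16
```

The cache grows linearly with both batch size and context length, which is why the linked inference-arithmetic posts treat it as the dominant memory cost at long contexts.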
Tech Reading
Memory Tips
Linux Disk
Linux Network Tips
Linux Network Tools
Linux kernel
Linker & Loader
Advanced C and CPP Compiling Notes
Software Testing
Tech Tools
DevOps
Performance
Vim
C++
Lua
Python Tips
Parallel Programming
Embedded Linux
Devboard
FPGA
Android
nginx
Build Tools
MySQL
Image Processing
Point Cloud Processing
Robotics
Deep Learning
Math
Game