https://kipp.ly/transformer-taxonomy/
https://kipp.ly/transformer-inference-arithmetic/
Transformer: A Novel Neural Network Architecture for Language Understanding
The Illustrated Transformer
The Transformer Model Explained in Detail (Most Complete Illustrated Version)
Transformer Explained in Detail (Attention Is All You Need)
The Annotated Transformer
Introduction to Attention Mechanism
The Transformer Attention Mechanism
Transformers Explained Visually - Overview of Functionality
Transformers Explained Visually - How it works, step-by-step
Transformers Explained Visually - Multi-head Attention, deep dive
Analyzing Transformer Models: Parameter Count, Compute, Intermediate Activations, and KV Cache
Why Are Today's LLMs All Decoder-only Architectures?
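The parameter-count analysis linked above can be sketched with a back-of-the-envelope formula: for a standard decoder block, the attention projections contribute about 4·h² parameters and the 4x MLP about 8·h², for roughly 12·h² per layer. A minimal sketch (function and variable names here are illustrative, not from any linked article):

```python
# Rough parameter count for a decoder-only Transformer:
# attention Q/K/V/output projections plus a 4x-width MLP,
# ignoring biases, layer norms, and positional parameters.

def decoder_params(hidden: int, layers: int, vocab: int) -> int:
    attn = 4 * hidden * hidden           # W_q, W_k, W_v, W_o
    mlp = 2 * hidden * (4 * hidden)      # up- and down-projection
    per_layer = attn + mlp               # ~12 * hidden^2
    return layers * per_layer + vocab * hidden  # plus token embeddings

# Example: a LLaMA-7B-like shape (hidden=4096, 32 layers, 32k vocab)
print(decoder_params(4096, 32, 32000))  # ~6.6e9 parameters
```

This estimate deliberately drops small terms (biases, norms), which is why such articles quote "12·h² per layer" as the leading-order count.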
https://github.com/NVIDIA/FasterTransformer
NVIDIA's BERT Inference Solution FasterTransformer Is Now Open Source
NVIDIA FasterTransformer Source Code Walkthrough
https://github.com/Dao-AILab/flash-attention
https://github.com/ggerganov/llama.cpp
llama.cpp Source Code Analysis: the CUDA Pipeline
Notes: A Brief Analysis of the Llama.cpp Code (Part 1): Parallelism and the KV Cache
Notes: A Brief Analysis of the Llama.cpp Code (Part 2)
Notes: A Brief Analysis of the Llama.cpp Code (Part 3): Compute Overhead
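The KV-cache sizing discussed in the llama.cpp notes above reduces to simple arithmetic: two tensors (K and V) per layer, each with batch × heads × seq_len × head_dim elements. A minimal sketch (names are illustrative, not taken from the llama.cpp codebase):

```python
# Back-of-the-envelope KV-cache memory for a decoder-only model:
# 2 tensors (K and V) per layer, each [batch, heads, seq_len, head_dim].

def kv_cache_bytes(batch: int, seq_len: int, layers: int,
                   heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:   # fp16 by default
    return 2 * layers * batch * heads * seq_len * head_dim * bytes_per_elem

# LLaMA-7B-like shape: 32 layers, 32 heads, head_dim 128, 2048-token context
size = kv_cache_bytes(batch=1, seq_len=2048, layers=32, heads=32, head_dim=128)
print(size / 2**30, "GiB")  # 1.0 GiB at fp16
```

The cache grows linearly with both batch size and context length, which is why the linked inference-arithmetic posts treat it as the dominant memory cost at long contexts.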
Tech Reading
Memory Tips
Linux Disk
Linux Network Tips
Linux Network Tools
Linux kernel
Linker & Loader
Advanced C and CPP Compiling Notes
Software Testing
Tech Tools
DevOps
Performance
Vim
C++
Lua
Python Tips
Parallel Programming
Embedded Linux
Devboard
FPGA
Android
nginx
Build Tools
MySQL
Image Processing
Point Cloud Processing
Robotics
Deep Learning
Math
Game