# BEAST: B-Spline Encoded Action Sequence Tokenizer

**Reference:** Zhou et al., 2506.06072v2, Jun 10 2025

---

BEAST is a new way to compress a sequence of robot actions into a small number of tokens. Tokenization has proven effective in NLP, but applying it to robotics has not been straightforward.

> Different from tokenizers based on the vector quantization [13–15], it does not require additional tokenizer training.
> BEAST compresses action trajectories into fixed-length token sequences enabling efficient parallel
> decoding for faster token generation, requiring 4–8× fewer tokens than binning-based tokenization

Downsides of previous encoding methods (such as vector quantization or binning):
1. They need separately trained encoder-decoder networks, which adds complexity.
2. They produce token sequences of varying length, even for action sequences of the same length, which makes fast decoding difficult.
3. They leave discontinuities between chunks, so the resulting actions are not smooth.
        
A previous method, FAST, encodes actions using the discrete cosine transform (DCT) followed by byte pair encoding (this can be treated as a black box). Importantly, it produces variable-length token sequences, which makes parallel decoding more difficult.

BEAST is reported to be more computationally efficient and to require less training than previous approaches.
Late in the paper, BEAST combined with Florence-2 (a vision-language model used as the policy backbone) is compared against pi0 and pi0+FAST on LIBERO (a benchmark suite of tasks). It performs marginally worse than pi0 but beats pi0+FAST, despite being a much smaller model with no pretraining.


## Methodology

BEAST is built on B-splines, which are a way of constructing smooth curves from a small number of control points.
Comparison using clamped B-splines vs 

B-splines are inherently related to action chunking because both are piecewise functions: action chunking corresponds to a 0th-degree (piecewise-constant) B-spline.
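To make the link between B-splines and action chunking concrete, here is a minimal pure-Python sketch of the standard Cox-de Boor recursion. It is not code from the paper. The helper names (`clamped_knots`, `basis`, `evaluate`), the uniform clamped knot vector, and the control-point values are all assumptions chosen for illustration.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] (end knots repeated degree + 1 times)."""
    n_inner = n_ctrl - degree - 1
    inner = [(k + 1) / (n_inner + 1) for k in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i, p, u, knots):
    """Cox-de Boor recursion: value of the i-th degree-p basis function at u."""
    if p == 0:
        inside = knots[i] <= u < knots[i + 1]
        # Close the last non-empty interval so that u == 1 is covered.
        at_end = u == knots[-1] and knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]
        return 1.0 if inside or at_end else 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    left = (u - knots[i]) / left_den * basis(i, p - 1, u, knots) if left_den > 0 else 0.0
    right = (knots[i + p + 1] - u) / right_den * basis(i + 1, p - 1, u, knots) if right_den > 0 else 0.0
    return left + right


def evaluate(ctrl, degree, u):
    """Evaluate a 1-D B-spline with control points `ctrl` at u in [0, 1]."""
    knots = clamped_knots(len(ctrl), degree)
    return sum(c * basis(i, degree, u, knots) for i, c in enumerate(ctrl))


ctrl = [0.1, 0.4, -0.2, 0.3]  # hypothetical control points for one action dimension

# Degree 0: each control point is held for two of the eight timesteps,
# giving a piecewise-constant (zero-order hold) sequence.
held = [evaluate(ctrl, 0, t / 8) for t in range(8)]

# Degree 3, clamped: the curve starts and ends on the first and last control points.
start, end = evaluate(ctrl, 3, 0.0), evaluate(ctrl, 3, 1.0)
print(held, start, end)
```

With degree 0 the four control points simply repeat as a step function, while the clamped cubic passes through the first and last control points. That endpoint property is what allows consecutive chunks to be joined without a jump.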

## Experiments

1. What advantages does BEAST offer over commonly used binning-based tokenizers?

BEAST is significantly more compact and smooth.
- It uses 4–8× fewer tokens than binning methods.
- Instead of encoding every individual action step, BEAST encodes entire chunks of actions using B-spline control points.
- This results in smoother motion and more natural transitions between chunks, unlike binning, which can produce jerky or abrupt movements.

Summary: BEAST summarizes motion efficiently and gracefully, whereas binning treats every tiny detail separately.
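The token savings can be illustrated with simple counting. The sketch below assumes that binning emits one token per action dimension per timestep and that BEAST emits one token per control point per action dimension. The horizon, action dimension, and control-point counts are hypothetical values picked to land in the 4–8× range, not numbers taken from the paper.

```python
def tokens_binning(horizon, action_dim):
    """One token per timestep per action dimension (assumed binning scheme)."""
    return horizon * action_dim


def tokens_bspline(n_ctrl, action_dim):
    """One token per control point per action dimension (assumed BEAST scheme)."""
    return n_ctrl * action_dim


horizon, action_dim = 40, 7  # hypothetical chunk length and action dimensionality
for n_ctrl in (5, 10):       # hypothetical control-point counts
    ratio = tokens_binning(horizon, action_dim) / tokens_bspline(n_ctrl, action_dim)
    print(f"{n_ctrl} control points -> {ratio:.0f}x fewer tokens")
```

Under these assumptions the ratio is simply the horizon divided by the number of control points, independent of the action dimension.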

2. How does BEAST contribute to the performance on imitation learning benchmarks?

BEAST improves imitation learning performance.
- It matches or outperforms existing tokenizers on both simulated and real-world robot benchmarks.
- Its ability to encode longer and more coherent action sequences leads to better task success rates and model stability.
- Replacing other tokenizers with BEAST in existing models often leads to measurable improvements.

Summary: BEAST helps models learn how motion flows, not just what happens step by step.

3. How does BEAST affect the training and inference efficiency?

BEAST is faster and simpler to use.
- It requires no additional training (unlike methods based on vector quantization).
- BEAST directly uses mathematical B-spline fitting, avoiding the need for encoder-decoder tokenizers.
- It enables parallel decoding, which speeds up inference and makes the system more efficient.

Summary: BEAST reduces system complexity and speeds up inference without sacrificing performance.
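As a rough illustration of why no tokenizer training is needed, the sketch below fits control points to a trajectory in closed form, quantizes them, and decodes with a single weighted sum per timestep. It is a minimal pure-Python sketch, not the authors' implementation. The ridge-regularized least-squares fit, the 256 uniform bins over [-1, 1], and all function names are assumptions made for this example.

```python
import math


def clamped_knots(n_ctrl, degree):
    n_inner = n_ctrl - degree - 1
    inner = [(k + 1) / (n_inner + 1) for k in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the i-th degree-p basis function at u."""
    if p == 0:
        inside = knots[i] <= u < knots[i + 1]
        at_end = u == knots[-1] and knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]
        return 1.0 if inside or at_end else 0.0
    ld, rd = knots[i + p] - knots[i], knots[i + p + 1] - knots[i + 1]
    left = (u - knots[i]) / ld * basis(i, p - 1, u, knots) if ld > 0 else 0.0
    right = (knots[i + p + 1] - u) / rd * basis(i + 1, p - 1, u, knots) if rd > 0 else 0.0
    return left + right


def basis_matrix(n_steps, n_ctrl, degree):
    """Rows are timesteps, columns are basis functions; depends only on the shapes."""
    knots = clamped_knots(n_ctrl, degree)
    return [[basis(i, degree, t / (n_steps - 1), knots) for i in range(n_ctrl)]
            for t in range(n_steps)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def encode(actions, n_ctrl, degree=3, ridge=1e-6, n_bins=256, lo=-1.0, hi=1.0):
    """Closed-form ridge fit of control points, then uniform quantization (assumed)."""
    phi = basis_matrix(len(actions), n_ctrl, degree)
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(n_ctrl)] for i in range(n_ctrl)]
    rhs = [sum(row[i] * y for row, y in zip(phi, actions)) for i in range(n_ctrl)]
    ctrl = solve(gram, rhs)
    return [min(n_bins - 1, max(0, round((c - lo) / (hi - lo) * (n_bins - 1)))) for c in ctrl]


def decode(tokens, n_steps, degree=3, n_bins=256, lo=-1.0, hi=1.0):
    """Dequantize the control points and evaluate every timestep independently."""
    ctrl = [lo + t / (n_bins - 1) * (hi - lo) for t in tokens]
    phi = basis_matrix(n_steps, len(tokens), degree)
    return [sum(w * c for w, c in zip(row, ctrl)) for row in phi]


# Hypothetical 30-step trajectory for a single action dimension.
actions = [0.5 * math.sin(2 * math.pi * t / 29) for t in range(30)]
tokens = encode(actions, n_ctrl=8)
recon = decode(tokens, n_steps=len(actions))
max_err = max(abs(a - r) for a, r in zip(actions, recon))
print(tokens, max_err)
```

The token count depends only on `n_ctrl`, never on the trajectory length, so every sequence has the same number of tokens. Decoding has no step-to-step dependency, which is what makes it easy to parallelize.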

4. Does BEAST generalize to real-world scenarios?

Yes, BEAST generalizes well to real robots.
- It has been successfully tested on physical robots in tasks involving manipulation and object interaction.
- The motions it generates are smooth and continuous, making it suitable for real environments.
- BEAST performs well even in previously unseen settings and action types.



5. How do the design choices affect the performance of BEAST?

BEAST’s performance depends on spline configuration.
- Ablation studies show that the number of control points, the spline degree, and whether the spline is clamped all affect performance.
- Using too few control points can result in underfitting; too many can reduce efficiency.
- Spline degree controls how smooth the generated motion is and how well it generalizes.

Summary: BEAST is robust, but thoughtful tuning of its spline parameters yields the best results.