P&T Treuno 125M: a custom Transformer (hidden dimension 10,240, 72 layers, 80 attention heads) designed for a budget of ~125 million parameters. It uses Grouped Query Attention (GQA) and SwiGLU activations.
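As a rough way to check a configuration like this against a parameter budget, the sketch below counts the weights of a decoder-only Transformer with GQA attention and SwiGLU feed-forward blocks. It is not taken from the repository: the function name `transformer_param_count`, the bias-free projections, the two norm vectors per layer, and the tied embeddings are all assumptions. The description gives no KV-head count, feed-forward width, or vocabulary size, so the values used for those in the example call are hypothetical placeholders.

```python
def transformer_param_count(hidden, layers, heads, kv_heads, ffn_hidden,
                            vocab, tie_embeddings=True):
    """Estimate weights in a decoder-only Transformer with GQA and SwiGLU.

    Assumes bias-free linear layers and two norm weight vectors per layer
    (hypothetical choices, not confirmed by the source).
    """
    if hidden % heads != 0:
        raise ValueError("hidden must be divisible by heads")
    if heads % kv_heads != 0:
        raise ValueError("heads must be divisible by kv_heads")

    head_dim = hidden // heads

    # GQA: Q and output projections are hidden x hidden; K and V are
    # narrower because several query heads share each KV head.
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)

    # SwiGLU: gate and up projections (hidden x ffn_hidden) plus a
    # down projection (ffn_hidden x hidden).
    mlp = 3 * hidden * ffn_hidden

    # One norm weight vector before attention, one before the MLP.
    norms = 2 * hidden

    per_layer = attn + mlp + norms

    # Token embedding plus a final norm; add a separate output head
    # only when embeddings are not tied.
    total = layers * per_layer + vocab * hidden + hidden
    if not tie_embeddings:
        total += vocab * hidden
    return total


if __name__ == "__main__":
    # hidden, layers, heads come from the description; kv_heads,
    # ffn_hidden and vocab are hypothetical placeholders.
    n = transformer_param_count(hidden=10_240, layers=72, heads=80,
                                kv_heads=8, ffn_hidden=27_648, vocab=32_000)
    print(f"{n:,} parameters")
```

Because the attention and feed-forward terms both scale with `hidden` times another width, the hidden dimension and layer count dominate the total; reducing `kv_heads` only shrinks the K and V projections.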