
NTK RoPE scaling. #115

Closed
alkeryn opened this issue Jun 29, 2023 · 24 comments

@alkeryn

alkeryn commented Jun 29, 2023

According to this post, there is a method of RoPE scaling that results in less perplexity loss and allows a larger scaling factor:
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

The code can be found in this notebook:
https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37

and the change itself seems to be small:

# The method is just these three lines
max_position_embeddings = 16384
a = 8  # Alpha value
base = base * a ** (dim / (dim - 2))  # Base change formula

Maybe it would be nice to add that option to exllama as well; with this technique, finetuning for higher context may not even be necessary.
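
For illustration, here's a minimal self-contained sketch of how that base change feeds into the RoPE inverse frequencies (assuming a standard LLaMA-style rotary embedding; the function name and head_dim value are just illustrative):

import torch

def ntk_scaled_inv_freq(head_dim, alpha, base = 10000.0):
    # Base change formula: base' = base * alpha ** (dim / (dim - 2))
    if alpha != 1.0:
        base = base * alpha ** (head_dim / (head_dim - 2))
    # Standard RoPE inverse frequencies, built from the (possibly scaled) base
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype = torch.float32) / head_dim))

# Example: LLaMA head_dim = 128, alpha = 8 as in the notebook
inv_freq = ntk_scaled_inv_freq(128, alpha = 8)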

@Panchovix
Contributor

Panchovix commented Jun 29, 2023

This sounds pretty good! But I'm wondering how it would be implemented in exllama. compress_pos_emb is already a RoPE scaler.

There's

rotary_embedding_base

But it seems to be used for training purposes.

@alkeryn
Author

alkeryn commented Jun 29, 2023

@Panchovix Someone posted this code on 4chan; I haven't had time to verify it as I'm on the move, but maybe that's it.
https://boards.4chan.org/g/thread/94354163#p94356720

@Panchovix
Contributor

Panchovix commented Jun 29, 2023

@alkeryn Thanks! It seems to work.

a = 4  # Like compress_pos_emb: higher alpha means higher perplexity but allows more context
self.rotary_embedding_base = self.rotary_embedding_base * a ** (self.head_dim / (self.head_dim - 2))

max_seq_len should be set the same way as with SuperHOT models (via -l).
Maybe it can be set like this in model.py:

self.alpha_value = 1.0  # Like compress_pos_emb: higher alpha means higher perplexity but allows more context

And like this in model_init.py:

parser.add_argument("-a", "--alpha", type = float, help = "alpha for context size extension via embedding extension")

...

if args.alpha:
    model_config.alpha_value = args.alpha # not exactly like this, but with this logic
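
To make the wiring concrete, here is a rough sketch of how the alpha value could be hooked into the rotary base once the arguments are parsed (hypothetical class and method names loosely following the snippets above, not the actual ExLlama code):

import argparse

class ConfigSketch:
    def __init__(self):
        self.head_dim = 128                    # LLaMA head dimension
        self.rotary_embedding_base = 10000.0   # default RoPE base
        self.alpha_value = 1.0                 # 1.0 = no NTK scaling

    def apply_ntk_alpha(self):
        # Apply the base change once, before the sin/cos tables are built
        self.rotary_embedding_base *= self.alpha_value ** (self.head_dim / (self.head_dim - 2))

parser = argparse.ArgumentParser()
parser.add_argument("-a", "--alpha", type = float, default = 1.0,
                    help = "alpha for context size extension via embedding extension")
args = parser.parse_args()

model_config = ConfigSketch()
model_config.alpha_value = args.alpha
model_config.apply_ntk_alpha()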

@Panchovix
Contributor

Okay, I did an experimental PR to see if turbo wants to add it, or maybe to test it some other way.

#118

@turboderp
Owner

I'd like to see some results from finetuning before I go and add even more config options. If I built out ExLlama every time someone had an interesting idea on reddit it'd be an unmaintainable behemoth by now. It's already kind of unwieldy.

@laoda513

Okay, I did an experimental PR to see if turbo wants to add it, or maybe to test it some other way.

#118

So to use this feature, should we first tune the model with a LoRA or something similar?
Since exllama does not support training right now, should I first use AutoGPTQ LoRA?

@Panchovix
Contributor

Panchovix commented Jun 30, 2023

@laoda513 For NTK RoPE scaling, finetuning is not needed. But based on my tests, SuperHOT models work better with compress_pos_emb scaling and NTK (alpha) scaling combined.

For now, no loader supports NTK RoPE.

That PR adds experimental support for exllama only, at the moment.
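
For reference, a minimal sketch contrasting the two knobs being combined here (standard RoPE math; the helper name and arguments are just illustrative): compress_pos_emb linearly interpolates the position indices, while alpha changes the frequency base.

import torch

def rope_cos_sin(seq_len, head_dim, base = 10000.0, compress_pos_emb = 1.0, alpha = 1.0):
    # NTK-aware scaling: change the frequency base
    if alpha != 1.0:
        base = base * alpha ** (head_dim / (head_dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype = torch.float32) / head_dim))
    # Linear (SuperHOT-style) scaling: divide the position indices
    positions = torch.arange(seq_len, dtype = torch.float32) / compress_pos_emb
    freqs = torch.outer(positions, inv_freq)
    return torch.cos(freqs), torch.sin(freqs)

# Both knobs at 4, as in the SuperHOT tests in this thread
cos, sin = rope_cos_sin(8192, 128, compress_pos_emb = 4.0, alpha = 4.0)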

@alkeryn
Author

alkeryn commented Jun 30, 2023

@Panchovix I don't quite understand how it would work better with compress_pos_emb + NTK scaling combined, but that's interesting. So you set 4 for each?
Though I think once we have NTK finetunes, they'll probably outperform SuperHOT + RoPE scaling, or even the mix of both.
Still, being able to use any model at any context length without a finetune is already great!

@ottobunge

I have tested the change and get better results with compression at 4 and alpha at 4.

Using TheBloke_nous-hermes-13b-superhot-8k-GPTQ-4bit-128g: if I only have either compression or NTK RoPE enabled, it tells me it cannot find the secret messages I left embedded in the paper, but with alpha 4 and compression 4 it retrieves them correctly.

@alkeryn
Author

alkeryn commented Jun 30, 2023

@ottobunge Interesting, have you tried alpha 8 or more with no compression on a normal model?
It would still be interesting to see finetunes made for NTK.

@ottobunge

At 8k, on Neko Institute LLaMA 13B 4bit 32g with alpha 8 and compression 1, I get nonsense.

[image]

@ottobunge

Trying alpha 10, and then alpha 4 + compression 4, on this same model to see the differences.

@ottobunge

[image: Alpha 10]

@ottobunge

The failure mode is worse at compression 4 + alpha 4 on plain LLaMA.
This model is probably not great at the task xD

[image]

@alkeryn
Author

alkeryn commented Jun 30, 2023

@ottobunge That makes sense, since the model was trained for 8k RoPE.
But I was asking about alpha 8 on a non-8k-finetuned model with no compression.

@ottobunge

That would be this
#115 (comment)

@ottobunge

ottobunge commented Jun 30, 2023

I'm downloading a non-finetuned version, but on the finetuned one I can run no compression at alpha 10 and get good results.

In fact, it follows the formatting of the prompt better than compression 4 + alpha 4.

@ottobunge

TheBloke_airoboros-13B-gpt4-1.4-GPTQ, so a non-finetuned model, at alpha 10:
it got 3/4 of the passphrases, but in the wrong order.

The correct order is in the second image.

[image 1]
[image 2: correct order]

@ottobunge

This is the best answer I got.

If I shift the proportion more toward one or the other, it starts by misspelling milkshake as milshake, or fails altogether if I change the proportion too much, guessing cherry as the 4th, banana as the 3rd, and missing milkshake.

[image]

[image]

@Panchovix
Contributor

I have updated the PR.

Before, the alpha value wasn't being applied correctly (it stayed at 1.0). Now it is applied correctly, so just setting alpha for NTK RoPE scaling should be enough (without needing to set compress_pos_emb to the same value).

@ottobunge @alkeryn Can you guys test and see how it goes now? Results are WAY different, and IMO, better.

@Panchovix
Contributor

For tulu-30B-GPTQ (non-SuperHOT)

  • Perplexity at 2048 ctx (no compress_pos_emb, no alpha RoPE): 5.2153
  • Perplexity at 8192 ctx, compress_pos_emb = 4: 10.0813
  • Perplexity at 8192 ctx, alpha = 4: 5.3534
  • Perplexity at 8192 ctx, compress_pos_emb = 4, alpha = 4: 15.4406

For Tulu-30B-SuperHOT-8K-4bit-32g:

  • Perplexity at 2048 ctx (compress_pos_emb = 1, no alpha RoPE): 53.2788 (Basically, for <2048 ctx don't use SuperHOT models)
  • Perplexity at 8192 ctx, compress_pos_emb = 4: 5.8166
  • Perplexity at 8192 ctx, alpha = 4: 7.5073
  • Perplexity at 8192 ctx, compress_pos_emb = 4, alpha = 4: 6.0903

Basically, it seems that NTK RoPE scaling is better than we expected.

@laoda513

laoda513 commented Jul 1, 2023

How about the memory cost increase for inference and training? Is it linear? For example, 1x at 2k and 2x at 4k?

And I think this is very exciting and interesting! When I think more about it: if we can easily extend a model trained at 2k to 8k, does that mean we can extend a model trained at 512 to 2k? And I think this does not really extend the 'attention'; it just uses the same amount of attention over a longer context, right? Kind of like... a human reading quickly...

@Panchovix
Contributor

Panchovix commented Jul 1, 2023

How about the memory cost increase for inference and training? Is it linear? For example, 1x at 2k and 2x at 4k?

And I think this is very exciting and interesting! When I think more about it: if we can easily extend a model trained at 2k to 8k, does that mean we can extend a model trained at 512 to 2k? And I think this does not really extend the 'attention'; it just uses the same amount of attention over a longer context, right? Kind of like... a human reading quickly...

For training itself, sadly I'm not sure how it would be applied :(.

Also, thanks turbo for the PR merge!

Now NTK RoPE scaling can be used on exllama.

@alkeryn
Author

alkeryn commented Jul 7, 2023

Thank you everyone, I'm closing the issue! :)
