
Remove Falcon style ROPE #35
Merged: 4 commits, Sep 29, 2023
Conversation

@magician-blue (Contributor) commented Sep 27, 2023

All HF llama models use falcon-style RoPE, and we can convert them to the original llama-style RoPE with a permutation.
This pull request fixes the bug that appeared when converting an HF GQA model to gguf format.
I took the idea from it and fixed the similar bug in llama2.c's export.py.
Now I have successfully converted TinyLlama-1.1B-Chat to llama-style RoPE, so we can remove the falcon RoPE part.
I have uploaded the new export.py and llama2.mojo.

Details: run

python export.py tl-chat.bin --hf PY007/TinyLlama-1.1B-Chat-v0.2 --version 0

to convert the model.
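
Roughly, the conversion un-permutes the wq/wk projection weights from the HF (falcon-style, split-half) head layout back to the original llama (interleaved-pair) layout. A minimal sketch of that permutation, assuming PyTorch tensors and patterned after llama2.c's export script (the name and signature here are illustrative, not the exact code in this PR):

import torch

def permute_reverse(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # HF stores each head's rotary dims as two half-blocks; the original llama
    # layout expects adjacent pairs. Regrouping the rows once at export time
    # lets llama-style RoPE at inference reproduce the HF model exactly.
    return (w.view(n_heads, 2, dim1 // n_heads // 2, dim2)
             .transpose(1, 2)
             .reshape(dim1, dim2))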

@magician-blue (Contributor, Author)

I have updated the model on huggingface.

@tairov (Owner) commented Sep 27, 2023

Hi @magician-blue, so do you mean the tl-chat model on HF is not compatible with this repo anymore?

@magician-blue (Contributor, Author) commented Sep 27, 2023

> Hi @magician-blue, so do you mean the tl-chat model on HF is not compatible with this repo anymore?

@tairov We can still run it with our repo.

Change from

mojo llama2.mojo tl-chat.bin \
    -r falcon \
    -z tok_tl-chat.bin \
    -n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"

to

mojo llama2.mojo tl-chat.bin \
    -r llama \
    -z tok_tl-chat.bin \
    -n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"

@magician-blue (Contributor, Author) commented Sep 27, 2023

If we can convert all HF llama models (which use falcon-style RoPE) to llama-style RoPE, then we only need to implement one type of RoPE in our repo. This is what llama2.c and llama.cpp do.
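
For clarity on the two styles mentioned above, here is a rough per-head sketch (a hedged illustration, not this repo's implementation; x is one head's query or key vector of length head_dim, and freqs holds the head_dim // 2 rotation angles for the current position):

import numpy as np

def rope_llama(x: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    # Original llama style: rotate adjacent pairs (x[0], x[1]), (x[2], x[3]), ...
    pairs = x.reshape(-1, 2)
    out = np.empty_like(pairs)
    out[:, 0] = pairs[:, 0] * np.cos(freqs) - pairs[:, 1] * np.sin(freqs)
    out[:, 1] = pairs[:, 0] * np.sin(freqs) + pairs[:, 1] * np.cos(freqs)
    return out.reshape(-1)

def rope_falcon(x: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    # Falcon / HF style: rotate split halves (x[i], x[i + head_dim // 2])
    half = x.shape[0] // 2
    out = np.empty_like(x)
    out[:half] = x[:half] * np.cos(freqs) - x[half:] * np.sin(freqs)
    out[half:] = x[:half] * np.sin(freqs) + x[half:] * np.cos(freqs)
    return out

The two differ only in which pair of elements is rotated together, so permuting the rows of wq/wk once at export time makes them equivalent, which is why a single llama-style implementation in the repo is enough.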

@tairov (Owner) commented Sep 28, 2023

Looks cool. Could you share some details on where this convert.py file comes from? I see it has some dependencies. Perhaps we can remove it from the PR and keep only a link to a converted model in the README file, so the overall process will be simpler?

@magician-blue (Contributor, Author) commented Sep 28, 2023

The original convert file comes from llama2.c, and I modified some parts of it to support GQA.
I have already made a pull request to llama2.c, but it has not been merged yet.
We can wait for a while.
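
For context, the GQA-related change is roughly that the key projection of a GQA model has n_kv_heads heads rather than n_heads, so it must be un-permuted with its own head count and output dimension. A hedged sketch, reusing the illustrative permute_reverse above (variable names are hypothetical, not the exact diff):

# Hypothetical export step for a GQA checkpoint such as TinyLlama.
head_dim = dim // n_heads
wq = permute_reverse(hf_wq, n_heads, dim, dim)
# GQA: n_kv_heads < n_heads, so wk has shape (n_kv_heads * head_dim, dim), not (dim, dim)
wk = permute_reverse(hf_wk, n_kv_heads, n_kv_heads * head_dim, dim)
wv = hf_wv  # values are not rotated by RoPE, so no permutation is needed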

@magician-blue (Contributor, Author)
The next thing I will do is convert openllama3b (12 GB RAM), llama2-chat-7b (28 GB RAM), and vicuna-7b to test my converter and our llama2.mojo.
Besides, I'll look into the tokenizer parts of llama.cpp and llama2.c to find a way to remove the hardcoded parts of our tokenizer.

@tairov (Owner) commented Sep 28, 2023

In this case I guess convert.py is not needed in the repo.
And it's cool that you have plans to research support for other types of models.

@tairov (Owner) commented Sep 28, 2023

The model can be converted using the script from llama2.c.
And for llama2.mojo we have a URL in the README file.

tairov merged commit e37bb87 into tairov:master on Sep 29, 2023
@tairov (Owner) commented Sep 29, 2023

thank you!
