
More local model options #2

Open

bennmann opened this issue Mar 14, 2023 · 6 comments

bennmann commented Mar 14, 2023

Hello,

Which lines would one change to use model.generate with a local model on the same host?

I have a gaming GPU with 16GB of VRAM and have run local inference on bloomz-7B, RWKV 14B, and Pythia 12B.

I want to be able to change just a few lines to generate from a local model instead of hosting an Alpa-served version.

Thanks for your thoughts and consideration.

@yangkevin2
Owner

Hi,

The util function here https://github.com/yangkevin2/doc-story-generation/blob/main/story_generation/common/util.py#L927 interfaces with Alpa to get next-token logprobs. You could try changing it to use your local model instead. Just be aware that the quality of the generated text might be a lot worse with a much smaller model.
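
For illustration, a rough sketch of what that swap might look like, assuming a Hugging Face transformers causal LM running on the same host (the model name and helper function below are illustrative assumptions, not part of the repo):

```python
# Minimal sketch (not the repo's actual API): compute next-token logprobs
# with a local causal LM instead of querying an Alpa server.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-12b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-12b", torch_dtype=torch.float16, device_map="auto"
)

@torch.no_grad()
def local_next_token_logprobs(prompt: str) -> torch.Tensor:
    """Return log-probabilities over the vocabulary for the next token."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]      # logits at the last position
    return torch.log_softmax(logits, dim=-1)    # shape: (vocab_size,)
```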

Thanks,
Kevin

@bennmann
Author

Thank you! I created a branch and am now navigating my own personal dependency purgatory (AMD GPU, ROCm, accelerate, bitsandbytes 8-bit, etc.).
I will test these changes from the branch I just split, using the model https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b
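
For anyone following along, loading that model in 8-bit might look roughly like this, assuming transformers, accelerate, and bitsandbytes are installed (bitsandbytes on ROCm is not a given and may need a fork or custom build):

```python
# Rough sketch of an 8-bit load of the OpenAssistant model; assumes
# transformers + accelerate + bitsandbytes (ROCm compatibility not verified).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenAssistant/oasst-sft-1-pythia-12b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize weights to int8 to fit in 16GB VRAM
    device_map="auto",   # let accelerate place layers automatically
)
```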

Thanks for your guidance; I will remember you if it makes cool stories.

@bennmann
Author

Hi Kevin and anyone else,

There are also a lot of OpenAI calls in various functions. Maybe I can find them all and change them, but this has turned into more of a weekend project than an afternoon project, so replacing every OpenAI call with a local model.generate equivalent will take me a while.

I may return to this a little at a time; best guess is a few weeks to completion, since I've dug a little deeper over time. Anyone else, feel free to look at my branch and suggest model.generate equivalents for the OpenAI calls (a rough shim sketch is below).
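
As a starting point, a hypothetical drop-in shim might look something like this. The function name is an assumption, and it reuses the `model` and `tokenizer` loaded in the 8-bit example above, so existing call sites would only need to swap which completion function they call:

```python
# Hypothetical shim: mimic an OpenAI-style text completion with a local
# model.generate. Assumes `model` and `tokenizer` are loaded as shown above.
import torch

@torch.no_grad()
def local_completion(prompt: str, max_tokens: int = 256, temperature: float = 0.8) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        do_sample=temperature > 0,
        temperature=temperature,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the continuation, like an OpenAI completion response would.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```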

-Ben

@yangkevin2
Owner

Oh, yeah, if you don't want to use the GPT-3 API at all, you'll have to replace all of those. Sorry, I thought you meant just the Alpa stuff.

As an additional note, using local models on a 16GB GPU will also pretty seriously compromise the quality of the resulting outputs, especially the plan/outline generation. I'm not convinced that part would work at all without an instruction-tuned model (specifically text-davinci-002, since it supports a suffix context in addition to the prompt). And in our preliminary experiments using "smaller" 13B models for the main generation procedure, the story quality was quite a bit worse too.
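
For context, the suffix-context feature referred to here looks like this with the legacy (pre-1.0) openai Python client; the model fills in text between the prompt and the suffix, which most local models can't do out of the box (the example strings are illustrative only):

```python
# Legacy openai client (<1.0): text-davinci-002 insertion with prompt + suffix.
import openai

response = openai.Completion.create(
    model="text-davinci-002",
    prompt="The detective opened the door and ",
    suffix=" which explained everything she had suspected.",
    max_tokens=64,
)
print(response["choices"][0]["text"])  # the text generated between prompt and suffix
```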

@bennmann
Author

bennmann commented Mar 16, 2023 via email

@yangkevin2
Owner

Yeah, if you're willing to do a bit of manual cherry-picking / interaction, then the requirements on model quality definitely go down significantly. I haven't tested with the new LLaMA models, but I agree it's likely they'd work better than the ones we tried previously (e.g., GPT-13B non-instruction-tuned). Would be curious to hear how it goes if you do end up trying that out.

Glad you enjoyed the work!
