Is it possible to do tuning with exllama? #72

Closed
laoda513 opened this issue Jun 19, 2023 · 21 comments

@laoda513

I have tested this on 4x 2080 Ti. It works well, almost 10 t/s for LLaMA 65B, versus only 0.65 t/s with bnb 4-bit; really amazing. First of all, please accept my thanks and admiration.

If you need tests done on 20-series GPUs, maybe I can help.

And can this be used for tuning together with LoRA?

@turboderp
Owner

Thanks, that's very interesting. I always figured 44 GB should be just enough for 65B, but it's good to have it confirmed, and 10 t/s isn't half bad. I guess stacking a lot of 2080-Tis together is a viable alternative to using 3090s if you've got enough PCIe slots. Who knew! :)
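
A back-of-the-envelope sketch of that estimate (the per-weight overhead and context length below are assumptions, not measured figures):

```python
# Back-of-the-envelope VRAM estimate for LLaMA-65B at ~4 bits per weight.
# Rough numbers only; real usage adds activation buffers and CUDA overhead.

params = 65e9                     # parameter count
bits_per_weight = 4.15            # assumed: 4-bit GPTQ plus scales/zero-points
weights_gb = params * bits_per_weight / 8 / 1e9           # ~34 GB

# FP16 key/value cache: 2 tensors (K, V) * 2 bytes * layers * hidden * tokens
layers, hidden, ctx = 80, 8192, 2048                      # LLaMA-65B, 2048-token context
kv_cache_gb = 2 * 2 * layers * hidden * ctx / 1e9         # ~5 GB

print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_cache_gb:.0f} GB "
      f"= ~{weights_gb + kv_cache_gb:.0f} GB before overhead")
```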

As for tuning LoRAs, though, it's not supported at the moment. There is no back propagation, and even if there were it would take drastically more VRAM. It can use LoRAs now, though. Still experimental, and I haven't added the option to the UI yet, but it's there.
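
A minimal sketch of what applying a LoRA with ExLlama looked like around this time, going by the repo's example scripts; the module layout, class names, file paths, and attributes below are assumptions rather than a definitive API, so check the current examples:

```python
# Sketch of ExLlama's experimental LoRA support (inference only, no training).
# Module names, class names and paths are assumptions based on the era's example scripts.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

model_dir = "/models/llama-65b-4bit"      # hypothetical paths
lora_dir = "/loras/my-adapter"

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/llama-65b-4bit.safetensors"
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Attach a pre-trained LoRA; ExLlama only applies it during generation.
lora = ExLlamaLora(model, f"{lora_dir}/adapter_config.json", f"{lora_dir}/adapter_model.bin")
generator.lora = lora

print(generator.generate_simple("Hello, my name is", max_new_tokens=40))
```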

@laoda513
Author


Sorry, I forgot to mention that I modified the 2080 Tis to 22 GB, but I also tested with just two 22 GB cards, and that does indeed work.

In fact, I can use 8 cards to train a 65B model based on bnb 4-bit or GPTQ, but inference is too slow, so it has no practical value. exllama makes 65B inference practical, so I'm very excited.

For training LoRAs, I am just curious whether, if there were a back-propagation module, training would be much faster than with the traditional transformers library. That would be great.

@KaruroChori

I thought I read something strange in a different open issue: 22GB on a 2080 Ti?
Like, you can solder more onto it? Is that what you mean by "modified"?

@turboderp
Owner

@KaruroChori : The 2080-Ti indeed has unpopulated memory slots on the back, and it uses 11 x 8 Gbit chips that could be replaced with pin-compatible 16 Gbit chips instead. In theory you could fit either 44 or 48 GB on a 2080-Ti this way. I haven't read about anyone actually getting it to work, though. It would be a big deal, because the 2080-Ti is by no means slow, and very cheap, all things considered.

@laoda513 : Back propagation is tricky, for a number of reasons. And alpaca_lora_4bit already does a pretty good job at it. You're not relying on the cache during training, and all of what ExLlama does to be able to produce individual tokens quickly is largely irrelevant. I think it would be more productive to try to identify the bottlenecks in other (Q)LoRA training scripts and work from there.

@KaruroChori

Thank you for the explanation. I agree, if properly priced they would be a much better option compared to a P40. Mostly because they would avoid the trap of its fake fp16 "support". I found it out the hard way :/.
What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

@laoda513
Author

> @laoda513 : Back propagation is tricky, for a number of reasons. And alpaca_lora_4bit already does a pretty good job at it. You're not relying on the cache during training, and all of what ExLlama does to be able to produce individual tokens quickly is largely irrelevant. I think it would be more productive to try to identify the bottlenecks in other (Q)LoRA training scripts and work from there.

Thanks!

@laoda513
Author

laoda513 commented Jun 21, 2023

> Thank you for the explanation. I agree, if properly priced they would be a much better option compared to a P40. Mostly because they would avoid the trap of its fake fp16 "support". I found it out the hard way :/. What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

A 3090 is better, considering price, VRAM, and speed.

@EyeDeck
Contributor

EyeDeck commented Jun 21, 2023

> What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

It doesn't require any firmware or driver mods, Nvidia cards usually have multiple pre-set memory configurations that are supported, and which one it uses (or tries to use) is controlled by some soldered jumpers on the PCB. 2080 Tis have 22GB as one option, so if you do the appropriate memory chip upgrade, it just takes moving some resistors to get everything working. (Interestingly, I just found a current listing on ebay for some thusly modded ones.)

I've always wondered if 3090s have a strapping option for 48GB, it wouldn't surprise me at all if they do.

@ghost

ghost commented Jun 21, 2023

> I've always wondered if 3090s have a strapping option for 48GB, it wouldn't surprise me at all if they do.

It's supported. [source]

Glad I wasn't the only one thinking about 48GB 3090's :) 16GB 3060 Ti's are also possible.

@turboderp
Owner

Are the chips available on the open market, or would you have to scavenge them from another GPU?

@ghost

ghost commented Jun 21, 2023

They're like 10 quid per 16Gb GDDR6X chip and 3 per 16Gb Samsung GDDR6. At ~250 quid for the upgrade, it comes to around 750 per 48GB 3090.
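
Spelled out (assuming a 24-module 3090, with the donor-card price implied by the totals above rather than stated):

```python
# Rough cost of a hypothetical 24 GB -> 48 GB 3090 upgrade, per the figures above.
modules = 24                 # an RTX 3090 carries 24 x 8 Gbit (1 GB) GDDR6X modules
price_per_module_gbp = 10    # quoted ballpark for a 16 Gbit replacement chip
upgrade_gbp = modules * price_per_module_gbp      # ~240, the "~250 quid" above
donor_card_gbp = 750 - 250                        # used-3090 price implied by the ~750 total
print(upgrade_gbp, upgrade_gbp + donor_card_gbp)  # 240 740
```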

@ghost

ghost commented Jun 21, 2023

@turboderp sadly neither the 3090 Ti nor the 4090 has chips on both sides, sorry bro.

@EyeDeck
Contributor

EyeDeck commented Jun 21, 2023

GPU-Z does show 16 Gbit chip support on my 3090 FE:

[GPU-Z screenshot]

I can't find any evidence that it's as (relatively) easy on a 3090 though; at least, the same group of Chinese students that had success with 22GB 2080 Tis and (were even selling) 16GB 3070s, and even hacked together a 44GB 2080 Ti (by transplanting the GPU onto a Quadro RTX 8000 board and modding the driver), weren't able to get a 48GB 3090 to work. Not yet anyway, seems to still be a work in progress.

Sure would be nifty if someone nails down a working process, but I fear the cost of the parts ($10 * 24 for the VRAM) + labor when considering the specialized equipment and skill it requires to reliably swap VRAM chips, would end up pretty close to the cost of an entire used 3090.

references

https://linustechtips.com/topic/1489822-get-more-vram-for-less-money/
https://linustechtips.com/topic/1494972-thisis-an-rtx-3070-16gb/page/2/#comment-15919918
https://twitter.com/TCatTheLynx/with_replies

@ghost

ghost commented Jun 21, 2023

Thanks for the links @EyeDeck, because they're supposedly working on a 48GB 3090 Ti on Twitter. I guess by PCB transplant, like the 44GB one?

Also found 16Gbit GDDR6X (D8BZC) for $7.69 a pop, or ~$190 for 48GB, so maybe just under the threshold of being worth it after all, fingers crossed. Just gotta wait patiently for those awesome dudes.

@KaruroChori

> They're like 10 quid per 16Gb GDDR6X chip and 3 per 16Gb Samsung GDDR6. At ~250 quid for the upgrade, it comes to around 750 per 48GB 3090.

Considering how expensive the A6000 is (around £5k), that would be an extremely compelling option. I guess I will look into ultrasonic cleaners and reflow ovens over the weekend :D.

@ghost

ghost commented Jun 22, 2023

Apparently, 48GB 3090s won't happen until someone gets hold of Nvidia's vBIOS; 20GB 3080s will be a thing, and a 48GB 4090 is in the works with a PCB redesign. I still have hope someone will get hold of the unlocked vBIOS. Oh, and the 44GB 2080 Tis are a little unstable.

@jmoney7823956789378

They couldn't flash it with an A6000 BIOS? I thought they were nearly identical in die and VRAM arrangement.

@KaruroChori

> They couldn't flash it with an A6000 BIOS? I thought they were nearly identical in die and VRAM arrangement.

I am quite positive that would not work.
The silicon has been fused with two slightly different names from what I recall, so they are basically incompatible cards even if they are virtually identical.

@ghost

ghost commented Jun 22, 2023

Yeah, I'm sure that's where Nvidia cares most (turning 3090s into A6000s). Can't blame them; it would send shock waves through their entire enterprise market.

@turboderp
Owner

They sure are charging a premium for enterprise GPUs. Well, either that or consumer GPUs are loss leaders for the enterprise stuff and we should actually be grateful. ;)

@ghost

ghost commented Jun 22, 2023

We really should be grateful. :)

turboderp closed this as not planned on Jun 23, 2023.