Is it possible to do tuning with exllama? #72

Closed
laoda513 opened this issue Jun 19, 2023 · 21 comments

@laoda513

I have tested this on 4x 2080 Ti. It works well, almost 10 t/s for LLaMA 65B, versus only 0.65 t/s with bnb 4-bit; really amazing. First of all, please accept my thanks and admiration.

If you need tests done on 20-series GPUs, maybe I can help.

And can this be used for tuning together with LoRA?

@turboderp
Owner

Thanks, that's very interesting. I always figured 44 GB should be just enough for 65B, but it's good to have it confirmed, and 10 t/s isn't half bad. I guess stacking a lot of 2080-Tis together is a viable alternative to using 3090s if you've got enough PCIe slots. Who knew! :)
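
A back-of-the-envelope sketch of that estimate (the per-weight overhead and context length below are assumptions, not measured figures):

```python
# Back-of-the-envelope VRAM estimate for LLaMA-65B at ~4 bits per weight.
# Rough numbers only; real usage adds activation buffers and CUDA overhead.

params = 65e9                     # parameter count
bits_per_weight = 4.15            # assumed: 4-bit GPTQ plus scales/zero-points
weights_gb = params * bits_per_weight / 8 / 1e9           # ~34 GB

# FP16 key/value cache: 2 tensors (K, V) * 2 bytes * layers * hidden * tokens
layers, hidden, ctx = 80, 8192, 2048                      # LLaMA-65B, 2048-token context
kv_cache_gb = 2 * 2 * layers * hidden * ctx / 1e9         # ~5 GB

print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_cache_gb:.0f} GB "
      f"= ~{weights_gb + kv_cache_gb:.0f} GB before overhead")
```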

As for tuning LoRAs, though, it's not supported at the moment. There is no back propagation, and even if there were it would take drastically more VRAM. It can use LoRAs now, though. Still experimental, and I haven't added the option to the UI yet, but it's there.
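
A minimal sketch of what applying a LoRA with ExLlama looked like around this time, going by the repo's example scripts; the module layout, class names, file paths, and attributes below are assumptions rather than a definitive API, so check the current examples:

```python
# Sketch of ExLlama's experimental LoRA support (inference only, no training).
# Module names, class names and paths are assumptions based on the era's example scripts.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
from lora import ExLlamaLora

model_dir = "/models/llama-65b-4bit"      # hypothetical paths
lora_dir = "/loras/my-adapter"

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/llama-65b-4bit.safetensors"
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Attach a pre-trained LoRA; ExLlama only applies it during generation.
lora = ExLlamaLora(model, f"{lora_dir}/adapter_config.json", f"{lora_dir}/adapter_model.bin")
generator.lora = lora

print(generator.generate_simple("Hello, my name is", max_new_tokens=40))
```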

@laoda513
Author


Sorry, I forgot to mention that I modified the 2080 Tis to 22 GB, but I also tested with just two 22 GB cards, and that does indeed work.

In fact, I can use 8 cards to train a 65B model based on bnb 4-bit or GPTQ, but inference is too slow, so it has no practical value. exllama makes 65B inference practical, so I'm very excited.

For training LoRAs, I am just curious whether, if there were a back-propagation module, training would be much faster than with the traditional transformers library. That would be great.

@KaruroChori

I thought I read something strange in a different open issue: 22GB on a 2080 Ti?
Like, you can solder more onto it? Is that what you mean by "modified"?

@turboderp
Owner

@KaruroChori : The 2080-Ti indeed has unpopulated memory slots on the back, and it uses 11 x 8 Gbit chips that could be replaced with pin-compatible 16 Gbit chips instead. In theory you could fit either 44 or 48 GB on a 2080-Ti this way. I haven't read about anyone actually getting it to work, though. It would be a big deal, because the 2080-Ti is by no means slow, and very cheap, all things considered.

@laoda513 : Back propagation is tricky, for a number of reasons. And alpaca_lora_4bit already does a pretty good job at it. You're not relying on the cache during training, and all of what ExLlama does to be able to produce individual tokens quickly is largely irrelevant. I think it would be more productive to try to identify the bottlenecks in other (Q)LoRA training scripts and work from there.

@KaruroChori

Thank you for the explanation. I agree, if properly priced they would be a much better option compared to a P40. Mostly because they would avoid the trap of its fake fp16 "support". I found it out the hard way :/.
What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

@laoda513
Author

> @laoda513 : Back propagation is tricky, for a number of reasons. And alpaca_lora_4bit already does a pretty good job at it. You're not relying on the cache during training, and all of what ExLlama does to be able to produce individual tokens quickly is largely irrelevant. I think it would be more productive to try to identify the bottlenecks in other (Q)LoRA training scripts and work from there.

Thanks!

@laoda513
Author

laoda513 commented Jun 21, 2023

> Thank you for the explanation. I agree, if properly priced they would be a much better option compared to a P40. Mostly because they would avoid the trap of its fake fp16 "support". I found it out the hard way :/. What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

A 3090 is better, considering price, VRAM, and speed.

@EyeDeck
Contributor

EyeDeck commented Jun 21, 2023

> What strikes me most is how this is not prevented by nvidia; considering how locked-in their drivers have been lately, I just assumed there was no chance, but I guess there are "alternative sources" for custom firmware/drivers.

It doesn't require any firmware or driver mods, Nvidia cards usually have multiple pre-set memory configurations that are supported, and which one it uses (or tries to use) is controlled by some soldered jumpers on the PCB. 2080 Tis have 22GB as one option, so if you do the appropriate memory chip upgrade, it just takes moving some resistors to get everything working. (Interestingly, I just found a current listing on ebay for some thusly modded ones.)

I've always wondered if 3090s have a strapping option for 48GB, it wouldn't surprise me at all if they do.

@ghost

ghost commented Jun 21, 2023

> I've always wondered if 3090s have a strapping option for 48GB, it wouldn't surprise me at all if they do.

It's supported. [source]

Glad I wasn't the only one thinking about 48GB 3090's :) 16GB 3060 Ti's are also possible.

@turboderp
Owner

Are the chips available on the open market, or would you have to scavenge them from another GPU?

@ghost

ghost commented Jun 21, 2023

They're like 10 quid per 16Gb GDDR6X chip and 3 per 16Gb Samsung GDDR6. At ~250 quid for the upgrade, it comes to around 750 per 48GB 3090.
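
Spelled out (assuming a 24-module 3090, with the donor-card price implied by the totals above rather than stated):

```python
# Rough cost of a hypothetical 24 GB -> 48 GB 3090 upgrade, per the figures above.
modules = 24                 # an RTX 3090 carries 24 x 8 Gbit (1 GB) GDDR6X modules
price_per_module_gbp = 10    # quoted ballpark for a 16 Gbit replacement chip
upgrade_gbp = modules * price_per_module_gbp      # ~240, the "~250 quid" above
donor_card_gbp = 750 - 250                        # used-3090 price implied by the ~750 total
print(upgrade_gbp, upgrade_gbp + donor_card_gbp)  # 240 740
```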

@ghost

ghost commented Jun 21, 2023

@turboderp sadly neither the 3090 Ti nor the 4090 has chips on both sides, sorry bro.

@EyeDeck
Contributor

EyeDeck commented Jun 21, 2023

GPU-Z does show 16 Gbit chip support on my 3090 FE:

[GPU-Z screenshot]

I can't find any evidence that it's as (relatively) easy on a 3090 though; at least, the same group of Chinese students that had success with 22GB 2080 Tis and (were even selling) 16GB 3070s, and even hacked together a 44GB 2080 Ti (by transplanting the GPU onto a Quadro RTX 8000 board and modding the driver), weren't able to get a 48GB 3090 to work. Not yet anyway, seems to still be a work in progress.

Sure would be nifty if someone nails down a working process, but I fear the cost of the parts ($10 * 24 for the VRAM) + labor when considering the specialized equipment and skill it requires to reliably swap VRAM chips, would end up pretty close to the cost of an entire used 3090.

references

https://linustechtips.com/topic/1489822-get-more-vram-for-less-money/
https://linustechtips.com/topic/1494972-thisis-an-rtx-3070-16gb/page/2/#comment-15919918
https://twitter.com/TCatTheLynx/with_replies

@ghost

ghost commented Jun 21, 2023

Thanks for the links @EyeDeck, because they're supposedly working on a 48GB 3090 Ti on Twitter. I guess by PCB transplant, like the 44GB one?

Also found 16Gbit GDDR6X (D8BZC) for $7.69 a pop, or ~$190 for 48GB, so maybe just under the threshold of being worth it after all, fingers crossed. Just gotta wait patiently for those awesome dudes.

@KaruroChori

> They're like 10 quid per 16Gb GDDR6X chip and 3 per 16Gb Samsung GDDR6. At ~250 quid for the upgrade, it comes to around 750 per 48GB 3090.

Considering how expensive the A6000 is (around £5k), that would be an extremely compelling option. I guess I will look into ultrasonic cleaners and reflow ovens over the weekend :D.

@ghost

ghost commented Jun 22, 2023

Apparently, 48GB 3090s won't happen until someone gets hold of Nvidia's vBIOS; 20GB 3080s will be a thing, and a 48GB 4090 is in the works with a PCB redesign. I still have hope someone will get hold of the unlocked vBIOS. Oh, and the 44GB 2080 Tis are a little unstable.

@jmoney7823956789378

They couldn't flash it with an A6000 BIOS? I thought they were nearly identical in die and VRAM arrangement.

@KaruroChori

> They couldn't flash it with an A6000 BIOS? I thought they were nearly identical in die and VRAM arrangement.

I am quite positive that would not work.
The silicon has been fused with two slightly different names from what I recall, so they are basically incompatible cards even if they are virtually identical.

@ghost

ghost commented Jun 22, 2023

Yeah, I'm sure that's where Nvidia cares most (turning 3090s into A6000s). Can't blame them; it would send shock waves through their entire enterprise market.

@turboderp
Owner

They sure are charging a premium for enterprise GPUs. Well, either that or consumer GPUs are loss leaders for the enterprise stuff and we should actually be grateful. ;)

@ghost

ghost commented Jun 22, 2023

We really should be grateful. :)

turboderp closed this as not planned on Jun 23, 2023.