Any plans to support phi-2 finetuning? #85

Open
asmith26 opened this issue Jan 11, 2024 · 8 comments

@asmith26

Hi there,

Just thought I'd ask if there were any plans to support phi-2 finetuning, https://huggingface.co/microsoft/phi-2?

Many thanks for any help, and this amazing lib!

@danielhanchen
Contributor

@asmith26 On it! :)

@imrankh46


I tried fine-tuning phi-2, and the results were not good.
I was just using Hugging Face Transformers code, not Unsloth.

@danielhanchen
Contributor

@imrankh46 Yeah, smaller models sometimes can't follow instructions well. Could you give a rough example showing Phi not following instructions?

@imrankh46


Here is the code:

https://colab.research.google.com/drive/1a7rL3UzWfo5I7OPyVmTEnR6_tRqIOblg?usp=sharing

@cm2435

cm2435 commented Jan 16, 2024

@danielhanchen I'm also interested in contributing to this, let me know if you have space on your PR for another helping hand.

@danielhanchen
Contributor

@cm2435 Oh more than happy to collab if you're into it!! I actually took a look at Phi the other day:

  1. Phi seems to not use SwiGLU but just a plain f(gate) @ down, whereas SwiGLU is (f(gate) * up) @ down. This means the backprop for the Triton kernels has to be re-derived (a rough sketch follows right after this list).
  2. Phi uses gelu instead of swish, and I think the first step is to derive the gradient for gelu. Gelu is f(x) = 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0)))), maybe via Desmos or derived by hand (a sanity-check sketch is at the end of this comment).
  3. Phi uses a general LayerNorm, not an RMS LayerNorm, so again new kernels have to be written - although I think Triton's tutorials already have this done for us: https://triton-lang.org/main/getting-started/tutorials/05-layer-norm.html
  4. There is some dropout (0.1) after each attention and MLP module, I think. Hopefully this shouldn't be hard to implement.
  5. I don't know what partial_rotary_factor is in https://huggingface.co/microsoft/phi-2/blob/main/config.json. Need to investigate.
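
To make (1) concrete, here's a rough, shape-only sketch of the two MLP blocks in plain PyTorch. The layer names and sizes are just illustrative, not the actual Unsloth kernels:

```python
import torch
import torch.nn.functional as F

hidden, intermediate = 2560, 10240   # rough phi-2-like sizes, purely illustrative
x = torch.randn(2, hidden)

# Llama-style SwiGLU block: (swish(gate(x)) * up(x)) @ down
gate = torch.nn.Linear(hidden, intermediate, bias=False)
up   = torch.nn.Linear(hidden, intermediate, bias=False)
down = torch.nn.Linear(intermediate, hidden, bias=False)
swiglu_out = down(F.silu(gate(x)) * up(x))

# Phi-style MLP as described in (1): gelu(fc1(x)) @ fc2 -- no elementwise up-projection gate
fc1 = torch.nn.Linear(hidden, intermediate, bias=True)
fc2 = torch.nn.Linear(intermediate, hidden, bias=True)
phi_out = fc2(F.gelu(fc1(x), approximate="tanh"))
```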

In general Phi is possible; there are just a few blockers, especially (2) and (3). But again - if you want to take a crack at this @cm2435 - I'll be super grateful + I'll collab with you! :) Taking a stab at a step like (2), finding the derivative, might be the best first step :)
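
For (2), here's a minimal sketch of the hand-derived gradient for the tanh-approximation gelu, checked against torch.autograd. This is only a numerical sanity check for the math, not the Triton backward kernel itself:

```python
import math
import torch

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)

def gelu_tanh(x):
    inner = SQRT_2_OVER_PI * (x + 0.044715 * x.pow(3.0))
    return 0.5 * x * (1.0 + torch.tanh(inner))

def gelu_tanh_grad(x):
    # d/dx [0.5 * x * (1 + tanh(u))] with u = sqrt(2/pi) * (x + 0.044715 * x^3)
    inner = SQRT_2_OVER_PI * (x + 0.044715 * x.pow(3.0))
    t = torch.tanh(inner)
    d_inner = SQRT_2_OVER_PI * (1.0 + 3.0 * 0.044715 * x.pow(2.0))
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * d_inner

x = torch.randn(4096, dtype=torch.float64, requires_grad=True)
gelu_tanh(x).sum().backward()
assert torch.allclose(x.grad, gelu_tanh_grad(x.detach()), atol=1e-10)
```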

@cm2435

cm2435 commented Jan 18, 2024

@danielhanchen Started a fork and opened a staging PR to work off. I can start by adding some unit test coverage around the kernels if we need to test their accuracy, something simple like

assert torch.allclose(triton_out, torch_out)

or I can try to contribute some of the other steps you mentioned - what's your preference?
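
For example, a minimal version of that check might look like the sketch below; triton_fn / torch_fn are placeholders for whichever kernel and PyTorch reference we're comparing, and the tolerances would need tuning for fp16:

```python
import torch

def check_kernel(triton_fn, torch_fn, shape=(4, 1024, 2560), dtype=torch.float16):
    # compare a Triton kernel against a PyTorch reference on random inputs
    x = torch.randn(*shape, dtype=dtype, device="cuda")
    triton_out = triton_fn(x)
    torch_out  = torch_fn(x)
    # fp16 kernels won't match the reference bit-for-bit, so use loose-ish tolerances
    assert torch.allclose(triton_out, torch_out, atol=1e-2, rtol=1e-2)
```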

danielhanchen added the currently fixing (Am fixing now!) and on roadmap (Feature request on roadmap) labels on Jan 27, 2024
@danielhanchen
Contributor

@cm2435 Oh lmao, just noticed I never responded on this thread, whoops! Well, I responded on your other threads already I guess - sorry!!!
