
[Feature]: Control vectors #3451

Open
generalsvr opened this issue Mar 17, 2024 · 6 comments
@generalsvr

🚀 The feature, motivation and pitch

Add support for control vectors

See https://github.com/vgel/repeng and ggerganov/llama.cpp#5970
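The core idea behind control vectors (as used in repeng and the llama.cpp PR above) is simple: add a scaled direction vector to the hidden states at one or more layers during the forward pass. A minimal sketch, using numpy as a stand-in for the model's actual tensor library; the function name and shapes are illustrative, not vLLM API:

```python
import numpy as np

def apply_control_vector(hidden_states: np.ndarray,
                         control_vector: np.ndarray,
                         coeff: float) -> np.ndarray:
    """Add a scaled control vector to every token's hidden state.

    hidden_states:  (seq_len, hidden_dim) activations at one layer.
    control_vector: (hidden_dim,) steering direction, e.g. extracted
                    with repeng from contrastive prompt pairs.
    coeff:          steering strength; positive pushes the activations
                    toward the concept, negative pushes away from it.
    """
    # Broadcasting adds the same (hidden_dim,) vector to each token row.
    return hidden_states + coeff * control_vector

# Toy example: 3 tokens, hidden dim 4.
h = np.zeros((3, 4))
v = np.array([1.0, 0.0, -1.0, 0.0])
steered = apply_control_vector(h, v, coeff=2.0)
```

In a real integration this addition would run once per selected layer per forward pass, so the per-token overhead is a single vector add.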

Alternatives

No response

Additional context

No response

@justinphan3110

justinphan3110 commented Apr 13, 2024

@simon-mo @generalsvr I should be able to help with this. Let me know how to start.

For more context about control vectors: Representation Engineering: A Top-Down Approach to AI Transparency

@Kaiyang-Chen

We can achieve this by loading the control vectors when initializing the cache engine and applying the change in forward() of the specified QKVLinear layers, but such changes would have to be added for all models and all kinds of linear methods, which introduces extra complexity to the codebase. Do you have any hints on how we can abstract this logic and make the integration clean? @simon-mo

@sapountzis

Something additional to consider is specifying different control vectors (and coefficients) per request which then get stacked into a control matrix with one dimension equal to the batch size.

This can be useful when serving users that require different styles of responses at the same time.

Not sure about the impact on latency.
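The per-request idea above can be sketched as follows: stack each request's (possibly different) control vector, pre-scaled by its coefficient, into a (batch, hidden_dim) matrix, then broadcast it across the sequence dimension. This is a hedged illustration in numpy, not vLLM code; the function names are made up:

```python
import numpy as np

def build_control_matrix(vectors, coeffs):
    """Stack per-request control vectors into one control matrix.

    vectors: list of (hidden_dim,) arrays, one per request in the batch.
    coeffs:  list of floats, one steering strength per request.
    Returns a (batch, hidden_dim) matrix.
    """
    return np.stack([c * v for v, c in zip(vectors, coeffs)])

def apply_per_request(hidden_states, control_matrix):
    """Apply each request's control vector to its own rows.

    hidden_states:  (batch, seq_len, hidden_dim)
    control_matrix: (batch, hidden_dim)
    """
    # Insert a length-1 sequence axis so each request's vector
    # broadcasts over all of that request's tokens.
    return hidden_states + control_matrix[:, None, :]

# Two requests with different directions and strengths, hidden dim 2.
vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
coeffs = [0.5, -1.0]
cm = build_control_matrix(vecs, coeffs)   # shape (2, 2)
h = np.zeros((2, 3, 2))                   # (batch, seq_len, hidden_dim)
out = apply_per_request(h, cm)
```

Latency impact should be small since this is a single broadcast add per steered layer, though continuous batching would complicate which row maps to which request.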

@raywanb

raywanb commented Apr 25, 2024

currently working on an implementation by wrapping the decoder layer and changing the forward pass. lmk if you wanna collaborate on this
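The wrapping approach described here can be sketched roughly like this: a thin wrapper that delegates to the original decoder layer and adds the (pre-scaled) control vector to its output. Class and method names below are hypothetical, and DummyLayer stands in for a real decoder layer:

```python
import numpy as np

class ControlledDecoderLayer:
    """Wraps a decoder layer and steers its output with a control vector."""

    def __init__(self, layer, control_vector, coeff=1.0):
        self.layer = layer
        # Pre-scale once so the forward pass is a single add.
        self._delta = coeff * control_vector

    def forward(self, hidden_states):
        # Run the wrapped layer unchanged, then shift its output
        # along the control direction.
        return self.layer.forward(hidden_states) + self._delta

class DummyLayer:
    """Stand-in for a real decoder layer: just doubles its input."""
    def forward(self, hidden_states):
        return hidden_states * 2.0

wrapped = ControlledDecoderLayer(DummyLayer(),
                                 np.array([1.0, -1.0]),
                                 coeff=0.5)
out = wrapped.forward(np.ones((2, 2)))
```

One appeal of wrapping over editing each linear method is that it keeps the steering logic in one place, addressing the abstraction concern raised above.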

@DreamGenX

DreamGenX commented Apr 28, 2024

@raywanb something worth looking into would also be the technique presented here, which might be superior in some regards:

https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction

It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1

There's a discussion in the comments with the authors of the Representation Engineering paper.

@heraclex12

> @raywanb something worth looking into would also be the technique presented here, which might be superior in some regards:
>
> https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction
>
> It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1
>
> There's a discussion in the comments with the authors of the Representation Engineering paper.

It seems that the colab link doesn't work.
