issues Search Results · repo:luyug/GradCache language:Python
29 results
If I have the same encoder, that is $f=g$, how to use GradCache?
CSWellesSun
- Opened on Jul 25, 2024
- #33
First of all, thank you for developing GradCache and making it available for the community. It's been incredibly useful
for my work.
Currently, GradCache supports loss functions that do not require label ...
penguinwang96825
- Opened on Jul 19, 2024
- #32
I would like to implement the algorithm for grokfast, which is an exponentially weighted mean of past gradients added to
the current gradients, with GradCache. I've been able to use it without GradCache, ...
ben-walczak
- 2
- Opened on Jul 9, 2024
- #31
Hi, I use grad_cache to train my model, but it seems very slow. I want to know: is this normal? Does using grad cache
generally affect the training speed?
liuweie
- 6
- Opened on Jun 25, 2024
- #30
Hello, when reading the implementation, I noticed that in the forward-backward pass, you used a dot-product before
running the backward pass, specifically in the following line:
https://github.com/luyug/GradCache/blob/0c33638cb27c2519ad09c476824d550589a8ec38/src/grad_cache/grad_cache.py#L241 ...
ahmed-tabib
- Opened on Mar 19, 2024
- #29
Hi, it's great work!
We have three inputs designated as i1, i2, and i3, which are to be processed by the llama-7b. For input i1, I will
extract two hidden states at two distinct locations and label ...
MikeDean2367
- Opened on Mar 13, 2024
- #28
Hi Luyu, thank you for your nice work.
I have a question on Distributed Contrastive Loss:
https://github.com/luyug/GradCache/blob/33695437d104e50a961cd9beba18b55c85a6537a/src/grad_cache/loss.py#L30-L34 ...
x-zb
- 4
- Opened on Dec 26, 2023
- #25
Hello,
Suppose my model returns multiple outputs. How should the functional approach be modified to handle this?
Thanks.
Soumya-dutta
- 1
- Opened on Dec 26, 2023
- #24
I am trying to train an image-text contrastive learning model and I am using the functional approach. The number of grad
steps is 32 and the batch size per step is 32, which makes the total batch size ...
AshStuff
- 2
- Opened on Dec 21, 2023
- #23
Great work! I find it works well when X and Y each have their own encoder, but for some reason I have to use this setting: X and
Y have the same shape, X_i and Y_i are the positive pair, X_i and all Y_js are ...
lxx909546478
- Opened on Jun 14, 2023
- #22
