
Questions about the training and evaluation pipelines #16

Closed
Ji4chenLi opened this issue Jul 5, 2023 · 3 comments

Ji4chenLi commented Jul 5, 2023

Hi Yunfan,

Thank you so much for the great work! Since I'm trying to reproduce the results, I would like to ask some questions regarding the training and evaluation details.

  1. Can you provide the number of training epochs? (Training hyperparameters besides Table 6 #9)
  2. Let's look at Table 7, and denote the number of gradient steps by $N_{gs}$. Since you use learning-rate warm-up and cosine annealing, I assume the learning rate first increases linearly from 0 to 1e-4 for $N_{gs} \in [0, 7\mathrm{K}]$. Then for $N_{gs} \in [7\mathrm{K} + 2i \cdot 17\mathrm{K},\ 7\mathrm{K} + (2i + 1) \cdot 17\mathrm{K}]$ the learning rate decreases from 1e-4 to 0, and for $N_{gs} \in [7\mathrm{K} + (2i + 1) \cdot 17\mathrm{K},\ 7\mathrm{K} + (2i + 2) \cdot 17\mathrm{K}]$ it increases from 0 back to 1e-4. Am I right? (See the sketch after this list.)
  3. I notice that you fine-tune the last two layers and freeze all other layers of T5. Does that correspond to the following code?
        for n, p in self.policy.t5_prompt_encoder.named_parameters():
            # freeze everything, then unfreeze the last block and the final layer norm
            p.requires_grad = False
            if "t5.encoder.block.11.layer.1." in n or "final_layer_norm" in n:
                p.requires_grad = True
  4. When calculating the success rate (SR) for each task distribution and level, how many task instances did you sample? I assume the equation you used is $$SR = \frac{\text{number of successes}}{\text{number of total task instances}}$$
  5. Can you share your vectorized implementation for policy evaluation?
  6. When evaluating the performance of your method and the other baselines, how did you set the parameter hide_arm_rgb when making the env? Should we always set it to True?
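
To make my assumption in question 2 concrete, here is the schedule I have in mind as a small sketch (my reading of Table 7, not necessarily your implementation; I use linear ramps for brevity where the cosine shape would apply):

    def assumed_lr(step, peak=1e-4, warmup=7_000, period=17_000):
        # My assumed cyclical schedule: linear warm-up, then alternating
        # 17K-step decay/recovery phases between the peak LR and 0.
        if step < warmup:
            return peak * step / warmup  # warm-up: 0 -> 1e-4 over 7K steps
        phase, offset = divmod(step - warmup, period)
        frac = offset / period
        # even phases anneal from peak to 0, odd phases climb back to peak
        return peak * (1 - frac) if phase % 2 == 0 else peak * frac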

Thanks and regards,
Jiachen

yunfanjiang (Member) commented

Hi Jiachen,

Thanks for your interest in our project. To answer your questions:

  1. We trained for 10 epochs in total.

  2. The schedule you described is cyclical. We used a schedule that first increases linearly and then decreases monotonically. A similar implementation can be found here.

  3. We used "layer" to refer to a transformer layer (block).

  4. Yes, we computed the success rate for each task averaged over 100 instances.

  5. Our vectorized env implementation is based on this.

  6. Yes, we used that to emulate workspaces that are free of robot-arm occlusions. See the sketches below for 3 and 4–6.
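
To make those answers concrete: for 3, since "layer" means transformer block, unfreezing the last two blocks of a 12-block T5 encoder would look roughly like the sketch below (attribute paths follow the snippet in your question; an illustration, not our exact code):

    for n, p in self.policy.t5_prompt_encoder.named_parameters():
        p.requires_grad = False
        # unfreeze the last two transformer blocks (10 and 11) and the final layer norm
        if ("t5.encoder.block.10." in n
                or "t5.encoder.block.11." in n
                or "final_layer_norm" in n):
            p.requires_grad = True

For 4–6, a minimal (non-vectorized) evaluation loop could look like the following; make_env and the policy/info interfaces here are schematic placeholders rather than the exact VIMA-Bench API:

    NUM_INSTANCES = 100  # instances per task and evaluation level
    successes = 0
    for seed in range(NUM_INSTANCES):
        # hide_arm_rgb=True emulates a workspace free of robot-arm occlusions
        env = make_env(task_name, seed=seed, hide_arm_rgb=True)
        obs = env.reset()
        done, info = False, {}
        while not done:
            action = policy.act(obs)  # placeholder policy interface
            obs, reward, done, info = env.step(action)
        successes += int(info.get("success", False))
    success_rate = successes / NUM_INSTANCES  # SR = successes / total instances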

Feel free to let me know if you have further questions.

Ji4chenLi (Author) commented

Thank you so much for your response! Most of my questions are now addressed. My remaining questions are:

  1. What's the min_lr when performing the cosine annealing? Is it 1e-5, as here and in Chinchilla?
  2. Table 7 gives Warmup Steps = 7K and LR Cosine Annealing Steps = 17K. At which step does the learning rate reach min_lr: step 17K, or step 24K (7K + 17K)?

yunfanjiang (Member) commented
  1. We used min_lr = 1e-7.
  2. It decreased for 17K steps, i.e., it reaches min_lr at step 24K (7K warm-up + 17K annealing). See the sketch below.
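
Putting the numbers together, the full schedule could be sketched as follows (an illustration of the values above, not our exact training code; holding the LR at min_lr after step 24K is an assumption):

    import math

    PEAK_LR, MIN_LR = 1e-4, 1e-7
    WARMUP, ANNEAL = 7_000, 17_000  # Table 7: 7K warm-up, 17K annealing steps

    def lr_at(step):
        if step < WARMUP:
            # linear warm-up from 0 to the peak LR over 7K steps
            return PEAK_LR * step / WARMUP
        # cosine annealing from the peak LR down to min_lr over the next
        # 17K steps (i.e., min_lr is reached at step 24K), then held there
        t = min((step - WARMUP) / ANNEAL, 1.0)
        return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * t))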
