
[Q] Log image per epoch and have slider per epoch #3455

Open
azinonos opened this issue Apr 1, 2022 · 29 comments
Labels
a:app Area: Frontend/Backend c:media

Comments

@azinonos

azinonos commented Apr 1, 2022

Hello,

In my training loop I am logging an image generated from the data at every epoch. I am using PyTorch Lightning and Weights & Biases. The call I'm using looks like this:

fig = ...  # compute fig
self.logger.experiment.log({"heatmap": [wandb.Image(fig)]})

However, when I log it to wandb, I get a slider that shows not epochs but seemingly random steps. For example, the first step is 14, the second is 19, the third is 24, etc. So even though I can move the slider to see the progress, I never really know which epoch each image corresponds to.

[Screenshot: media panel slider showing steps 14, 19, 24, …]

Alternatively, I tried to log multiple images (which is an uglier way) by putting the current epoch number as part of the title:

fig = ...  # compute fig
self.logger.experiment.log({f"heatmap_e{self.current_epoch}": [wandb.Image(fig)]})

This logs multiple images keyed by epoch number, but displays them in a seemingly arbitrary order. If I use the sorting functionality, I still don't get the images in the correct order, just a different arbitrary arrangement.

[Screenshot: per-epoch heatmap images displayed out of order]

So to summarise:
Question/Issue 1) How can I log an image every epoch and have the slider show epochs instead of arbitrary steps?
Question/Issue 2) How can I sort multiple images in the correct order instead of arbitrarily?

@ramit-wandb
Contributor

Hi @azinonos,

Images (or for that matter any data logged through wandb.log) are ordered on the basis of step, an internal variable that lets us order the logged data meaningfully. step is incremented on each call to wandb.log.

To answer your question, there are two ways to manage the step number:

  1. Collect all your metrics and send them over in a single log call:

     wandb.log({
       'metric_1' : ...,
       'metric_2' : ...
     })

     instead of

     wandb.log({'metric_1' : ...})
     wandb.log({'metric_2' : ...})

  2. Manually manage your step value using wandb.log({...}, step=STEP). Please note that the value of step must increase monotonically, otherwise calls to log may be ignored (see the sketch below).
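For illustration, a minimal sketch of the second option, assuming a bare wandb run (make_figure and the project name are hypothetical):

import wandb

run = wandb.init(project="step-demo")  # hypothetical project name
for epoch in range(10):
    fig = make_figure(epoch)  # hypothetical helper returning a matplotlib figure
    # Pin this call to the epoch; step must never decrease across calls.
    run.log({"heatmap": wandb.Image(fig)}, step=epoch)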

As for your second question, the images are always ordered in the order in which they are logged to W&B. There is no way to rearrange this order currently.

Thanks,
Ramit

@azinonos
Author

azinonos commented Apr 5, 2022

Hi Ramit,

Thanks for the response. For the second solution, when you set a manual step with step=STEP, does this affect the step counter globally, or just this specific metric?

Regarding the logging order: I do log them monotonically in a specific order, but they end up displayed in a random order - that's the problem.

@ramit-wandb
Contributor

Hey @azinonos,

When you set a manual step, it only defines the step value for that particular call to wandb.log. Its effect does not extend beyond that.

Would it be possible for you to share a minimal script or colab link reproducing the issue where images are not in order? It will help me understand why these images are not ordered correctly for you.

Thanks,
Ramit

@azinonos
Author

azinonos commented Apr 6, 2022

Hi Ramit,
That's great (about the manual step), I'll try that. I'll set up a colab reproducing the issue in the next few days and send you a link, thanks.

@ramit-wandb
Contributor

Hi @azinonos,

Just wanted to follow up here, were you able to set up a reproduction of the error you were seeing with unordered images? It will help me figure out the underlying cause of the bug here.

Thanks,
Ramit

@azinonos
Author

azinonos commented Apr 20, 2022

Hi Ramit,

I've just set up a demo. If you run this, you can see that the logged figures appear in a jumbled order. If you try to sort them, they are sorted as if the integer were a string rather than a number (e.g. 1, 11, 12, 13, …, 2, 21).

https://colab.research.google.com/drive/1QobUjeysZE2DHurQoKTBnBd79CcrsegT?usp=sharing

Also, setting step=STEP does not seem to work for me; I get the warning below and wandb does not plot anything at all:
wandb: WARNING Step must only increase in log calls. Step 126 < 2727; dropping {'conf_matrix/covariance_embedding_heatmap': [<wandb.sdk.data_types.Image object at 0x7f9054f8b2d0>]}.

@ramit-wandb
Contributor

Hi @azinonos,

Thanks for the reproduction! I have requested access so that I can look into this. As for the warning you see, this is what I meant by step having to increase monotonically. If you have logged a value at step 2727, you can only log values at steps greater than or equal to 2727.
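To illustrate with the numbers from your warning (a hypothetical snippet, assuming a run is already initialised):

wandb.log({"loss": 0.1}, step=2727)                  # accepted
wandb.log({"heatmap": wandb.Image(fig)}, step=126)   # dropped: 126 < 2727
wandb.log({"heatmap": wandb.Image(fig)}, step=2728)  # accepted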

Thanks,
Ramit

@azinonos
Author

Hi @ramit-wandb, I've given you access to the Google Colab.

Regarding the warning: I thought the step value was local, in which case I could log a new monotonically increasing value (0, 1, 2, etc.) every time I log the image. Getting this warning means the step value is actually global: wandb has already logged in other parts of the program and incremented the counter to a high value (e.g. 2727).

If this is indeed the case, how could I use it to log this specific image no_epochs times without getting this warning?

@ramit-wandb
Contributor

Hey @azinonos,

I looked through your colab. It looks like you are logging each image with a different key (image_{i}), which is why there is no specific order to them. Logging them all under the same key, like image, should order them by timestamp.
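If you do want distinct keys, one possible workaround for the string-style sorting you saw is to zero-pad the epoch in the key so lexicographic order matches numeric order (a sketch, not a confirmed fix for the UI):

# Zero-pad the epoch so string sorting matches numeric order:
# heatmap_e0001, heatmap_e0002, ..., heatmap_e0011
self.logger.experiment.log(
    {f"heatmap_e{self.current_epoch:04d}": [wandb.Image(fig)]}
)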

Thanks,
Ramit

@exalate-issue-sync

Ramit Goolry commented:
Hi @azinonos,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

@azinonos
Author

Hi @ramit-wandb ,

If I log them under the same key, then I have the initial problem of a slider that shows steps rather than epochs. The reason I'm using different keys is so that I can put them in an order I control.

@ramit-wandb
Contributor

Ah, I see what you mean. Currently there is no way to swap step out for another metric, since we use step internally to track when a metric is logged. If you really need to see epoch under your media, for now I suggest making only one call to wandb.log per epoch, so that step = epoch for all steps. This way you will be able to see the epochs as needed.
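A minimal sketch of that one-log-per-epoch pattern (train_one_epoch and make_heatmap are hypothetical helpers):

num_epochs = 10
for epoch in range(num_epochs):
    train_loss = train_one_epoch()  # hypothetical helper
    fig = make_heatmap()            # hypothetical helper
    # Exactly one wandb.log call per epoch, so the internal step equals the epoch.
    wandb.log({
        "loss": train_loss,
        "heatmap": wandb.Image(fig),
    })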

Thanks,
Ramit

@exalate-issue-sync

Ramit Goolry commented:
Hi @azinonos,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

@exalate-issue-sync

Ramit Goolry commented:
Hi @azinonos, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

@azinonos
Author

Hi @ramit-wandb, sorry for the late response. To be honest, this hasn't really been resolved; the issues remain that a) you can't use a custom x-axis (e.g. epochs) for the media slider, and b) when you log images independently, you can't sort them in a specific order. These would be features for wandb to implement.

@ramit-wandb ramit-wandb reopened this May 31, 2022
@ramit-wandb
Contributor

Hi @azinonos,

I have created a feature request to allow for custom sliders for a given media object. I'll keep you updated on the status of its progress!

Thanks,
Ramit

@anthonyhu

Hey both! I'm having the same issue logging images with wandb (0.13.10) and pytorch lightning (1.8.6).

The "step" on the slider of the logged images are not aligned with the other metrics which use self.global_step. Like @azinonos mentioned, they are a bit arbitrary even when explicitly passing the step argument during logging as in self.logger.log_metrics({key: wandb.Image(img)}, step=self.global_step). They depend on an internal step count of the logger if I understood correctly from @ramit-wandb?

Would it be possible for the slider to display the step specified as an argument, instead of the internal step count from the logger?

@ramit-wandb
Contributor

@anthonyhu if you have defined the step value manually, that should override the internal step value in Lightning (see here).

Could you provide some more details about what you are seeing: specifically, how you are setting step and what appears in our UI (a link to your project would be great)?

Thanks,
Ramit

@anthonyhu

I think it might be more of a UI issue. When I try to modify the x-axis in the wandb project:
[Screenshot: selecting trainer/global_step as the x-axis in the wandb project]

It's correctly reflected on the scalar logs: (here at step 15)
[Screenshot: scalar chart with x-axis trainer/global_step, shown at step 15]

but it does not affect the wandb media. The slider remains "step" (hovering over it reads: "this increments every time you call wandb.log in your script") and not "trainer/global_step".

[Screenshot: media panel slider still labeled "step"]

Is there a way for the slider to become what I've selected for the x-axis?

@ramit-wandb
Contributor

I see what you mean. Could you share a link to your project? There is currently no way to change the axis of the media panel, though that is something we have roadmapped for the future.

trainer/global_step is different from Step, and so they can not be used interchangeably.

@anthonyhu

Thank you for the answer! I cannot share the project as it is a private one.

@ramit-wandb
Contributor

I can understand. In that case I will be bumping up the priority for this request in our internal system. I'll reach out once I have some updates for you regarding this.

Thanks,
Ramit

@anthonyhu

Thank you so much 🙏

@kptkin kptkin added c:media a:app Area: Frontend/Backend labels Mar 8, 2023
@pme0

pme0 commented Mar 23, 2023

(quoting @anthonyhu's comment above about the media slider not following the selected x-axis)

🙌

Very keen to have this feature as well!

When logging the loss every k training steps, with the actual step S = k, 2k, 3k, ..., I can change the x-axis variable to S for the loss in the web interface, but not for the media, which seems to be logged against wandb's internal step 1, 2, 3, .... The mismatch between the two makes it hard to find the media corresponding to a particular actual step.

I can't find a neat way around it because the actual step at which validation occurs and media is logged (at the end of each epoch) is never a round number that I can easily associate with wandb's internal step (dataset/batch sizes don't give a round number of steps per epoch).

@thawro

thawro commented Mar 29, 2023

A workaround for the PyTorch Lightning use case (see the example below):

  1. Add a dict attribute to the class that subclasses LightningModule.
  2. Whenever you would call self.logger.log_something(something), update that dict instead.
  3. Make only one logging call, with the dict attribute.

Example:

import pytorch_lightning as pl
from pytorch_lightning import LightningModule


class ExampleLitModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.metrics = {}  # collects everything to log for the current epoch

    def on_validation_epoch_end(self):  # the hook takes no extra arguments
        metrics = get_some_metrics()  # your code
        self.metrics.update(metrics)
        # Single log call per epoch, keyed to the epoch number
        self.logger.experiment.log(self.metrics, step=self.current_epoch)


class ExampleCallbackLogger(pl.Callback):
    def on_validation_epoch_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule):
        some_figure = get_some_figure()  # your code
        pl_module.metrics["some_figure"] = some_figure

Using this approach, all metrics and figures for an epoch are saved under the same step (the current epoch, since the log call passes step=self.current_epoch), so the media slider lines up with epochs.
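For completeness, a hypothetical wiring of the two pieces (the project name and max_epochs are illustrative):

from pytorch_lightning.loggers import WandbLogger

trainer = pl.Trainer(
    logger=WandbLogger(project="step-demo"),  # hypothetical project name
    callbacks=[ExampleCallbackLogger()],
    max_epochs=10,
)
trainer.fit(ExampleLitModule())  # plus your dataloaders/datamodule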

@anthonyhu

Thank you for the workaround!

@josiahls

josiahls commented Apr 5, 2023

@ramit-wandb I also have a similar issue related to logging images with a custom step. I have:

wandb_context.define_metric(f"{root_name}*", step_metric="global_step")
# ...
wandb_context.log({f"{root_name}/loss": loss.item(), "global_step": step})
# ... 
wandb_context.log(
    {
        f"{root_name}/target_heatmap": [
            wandb_context.Image(target_heatmap, caption="Target heatmap")
        ],
        "global_step": step,
    }
)

Both calls pass the exact same "global_step": step value, but only {root_name}/loss uses the global step in the UI. {root_name}/target_heatmap uses the internal step by the time it reaches the UI.

It's important to note that this is only a small excerpt: all of the non-image metrics track global_step, while all of the image metrics/logs track the internal step.

Version:

Name: wandb
Version: 0.14.0

