Configurable logging #45

Merged
merged 7 commits into victoresque:master on Apr 19, 2019

Conversation

khornlund
Contributor

Hi guys,

This PR implements configurable logging via YAML.

In particular, it allows for:

  • configurable log format
  • configurable handlers (eg. rotating file handlers)

I've removed the train_logger, since we can use different log levels (info/debug) and the verbosity config to control what is output. By logging to a file, we also avoid having to save the log state as part of a checkpoint.

Let me know what you think.

Cheers,
Karl

@SunQpark
Collaborator

SunQpark commented Apr 12, 2019

Hi, @khornlund this is my brief review for this PR.

Removing the train_logger was the right thing to do. The previous version of utils/logger.py wasn't doing anything useful, and I have also removed it from my fork for the next update.

Moving the model print into the BaseModel constructor was not a good change. print(model) is the official way of displaying the structure of an nn.Module, and the previous version of BaseModel was written to be used exactly like nn.Module. The BaseModel in this PR now has an additional verbose argument in the super().__init__() call. Similarly, the variable name verbosity should not be changed, since it is meant to align with the verbosity argument of the fit() function in Keras.

I'm not as sure about this as about the points above, but using logging levels as verbosity options also seems dangerous. What happens if some imported module uses debug-level logging to print a lot of information that isn't desired (and is possibly annoying) during model training? The proper use of logging levels, as I understand it, is to print information needed only while debugging (intermediate tensor shapes or values for sanity checks, in this case) at the debug level, and to ignore that information once debugging is done by raising the severity threshold. Therefore, training progress and loss should go to standard output (via the print function) or at most the INFO level.
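For illustration, a minimal sketch of that split using only standard logging calls (the values shown are hypothetical):

import logging

logging.basicConfig(level=logging.INFO)  # raising the threshold hides debug output
logger = logging.getLogger(__name__)

# debug level: details only needed while debugging, e.g. shape sanity checks
logger.debug('intermediate tensor shape: %s', (64, 1, 28, 28))

# info level: training progress that should stay visible after debugging
logger.info('epoch %d finished, loss: %.4f', 1, 0.6277)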

I don't think I fully understood what the YAML file is doing, but it seems that most of its content is about saving the log into the saved/ dir. However, I think saving output to a file can and should be done with redirection syntax, i.e. adding > log.txt after the command. I think it is better to have logging.py handle all of the configuration for logging than to have an explicit YAML file. What do you think about that?

Finally, the current default logging format seems to include too much information for most use cases.

I notice I've said only negative things, but I really appreciate you making this PR.

@khornlund
Contributor Author

Hi @SunQpark , thanks for the speedy response!

Removing the train_logger was the right thing to do. The previous version of utils/logger.py wasn't doing anything useful, and I have also removed it from my fork for the next update.

Great - it was clunky having two loggers and I don't think the saving/reloading worked anyway.

Moving the model print into the BaseModel constructor was not a good change. print(model) is the official way of displaying the structure of an nn.Module

I think it's really useful to log the output of str(BaseModel). For example, see this extract from my info.log file, which shows which model is being used for the training. You don't get this information saved anywhere if you just use print().

2019-04-12 12:45:49,325 - MnistModel - INFO - <init>: 
MnistModel(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)
Trainable parameters: 21840
2019-04-12 12:45:57,133 - Trainer - INFO -     epoch          : 1
2019-04-12 12:45:57,133 - Trainer - INFO -     loss           : 0.6276595388578861
2019-04-12 12:45:57,133 - Trainer - INFO -     my_metric      : 0.7997048493568044
2019-04-12 12:45:57,133 - Trainer - INFO -     my_metric2     : 0.935619604773189
2019-04-12 12:45:57,133 - Trainer - INFO -     val_loss       : 0.1181013416102592
2019-04-12 12:45:57,133 - Trainer - INFO -     val_my_metric  : 0.9625047492401216
2019-04-12 12:45:57,133 - Trainer - INFO -     val_my_metric2 : 0.9970079787234043
2019-04-12 12:45:57,135 - Trainer - INFO - Saving checkpoint: saved/Mnist_LeNet/0412_124550/checkpoint-epoch1.pth ...

Could you elaborate on what you don't like about this?

BaseModel was written to be used exactly like nn.Module. The BaseModel in this PR now has an additional verbose argument in the super().__init__() call.

Ok - perhaps there's a use case I'm not understanding where this would be a problem. It's an optional parameter, so it doesn't force the class to be used in a different way. Perhaps you would prefer the Model not to have a logger, with logging performed by the outer function/class?

the variable name verbosity should not be changed, since it is meant to align with the verbosity argument of the fit() function in Keras.

No problem, I'll change it so we're consistent with keras.

I'm not as sure about this as about the points above, but using logging levels as verbosity options also seems dangerous. What happens if some imported module uses debug-level logging to print a lot of information that isn't desired (and is possibly annoying) during model training? The proper use of logging levels, as I understand it, is to print information needed only while debugging (intermediate tensor shapes or values for sanity checks, in this case) at the debug level, and to ignore that information once debugging is done by raising the severity threshold. Therefore, training progress and loss should go to standard output (via the print function) or at most the INFO level.

The beauty of the logging module is that you can control the verbosity for each logger (usually you have a different logger instance per class or module). If you import a module that itself uses logging, you can set its verbosity independently of the verbosity of Model or Trainer. So you can, for example, set third-party imports to WARN while setting Trainer to DEBUG and Model to INFO.
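A minimal sketch of that per-logger control (the Trainer and MnistModel names come from the log extract above; the third-party name is a hypothetical placeholder):

import logging

# each named logger can have its own threshold
logging.getLogger('some_third_party_lib').setLevel(logging.WARNING)  # hypothetical import
logging.getLogger('Trainer').setLevel(logging.DEBUG)
logging.getLogger('MnistModel').setLevel(logging.INFO)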

You can also filter what is recorded by each Handler - for example, you could set the console handler to show only INFO level and above, regardless of the levels the various loggers are set to:

handlers:
    console:
        class: logging.StreamHandler
        level: INFO  # only INFO level and above will be shown in the console
        formatter: simple
        stream: ext://sys.stdout

I don't think I fully understood what the YAML file is doing, but it seems that most of its content is about saving the log into the saved/ dir.

The YAML config gives a clean way of defining logging formats and logging.Handlers. In my example, we use a StreamHandler to send logging output to stdout, and a RotatingFileHandler to send it to file. Each handler can be configured to use a different logging format - so maybe you want minimal information printed to the console, but more detailed logging to file so it's easier to revisit at a later date.
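For reference, a minimal sketch of how such a YAML config can be loaded (the file name here is a hypothetical placeholder; it relies on the PyYAML package):

import logging.config
import yaml

with open('logger_config.yaml') as f:  # hypothetical path to the YAML logging config
    logging.config.dictConfig(yaml.safe_load(f))

logging.getLogger('Trainer').info('logging configured from YAML')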

However, I think saving output to a file can and should be done with redirection syntax, i.e. adding > log.txt after the command.

This also works, but you get a lot of functionality for free by using the logging module.

For example, by using RotatingFileHandler, you can keep appending to a single log file until it reaches a specified size, at which point it backs up the current file and moves onto a new one. This keeps your file size manageable, and allows you to keep as much historical logging as you like (see maxBytes and backupCount).
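A minimal sketch of that rotation setup (the size and backup count are illustrative values, not the ones used in this PR):

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger('Trainer')
# roll over after ~10 MB, keeping up to 5 old files as backups
handler = RotatingFileHandler('info.log', maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setLevel(logging.INFO)
logger.addHandler(handler)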

Imagine trying out a new model architecture, and wanting to compare it to the results of a previous version from 2 weeks ago. You can just grep through the saved logs to find the name or datetime of a previous run.

The other advantage is that users are free to add whatever Handler they like - e.g. if someone wants training log output sent to a web page so they can check on their models while away from their desk, they can use a SocketHandler or implement their own Handler.
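For example, attaching a SocketHandler is only a couple of lines (the host here is just a placeholder):

import logging
from logging.handlers import SocketHandler, DEFAULT_TCP_LOGGING_PORT

# stream log records to a remote listener, e.g. one serving a status web page
logging.getLogger('Trainer').addHandler(SocketHandler('localhost', DEFAULT_TCP_LOGGING_PORT))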

This might be a lot of information, but I hope I'm making the case that by using logging instead of print() and shell redirection you get a lot for free, with little downside.

I think it is better to have logging.py handle all of the configuration for logging than to have an explicit YAML file. What do you think about that?

I think it's good where possible to separate configuration from code - it makes it easier to track what is a code change and what is a configuration change. Plus, I think yaml is easier to read.

Finally, the current default logging format seems to include too much information for most use cases.

Ok - perhaps we could have a minimal format for printing to the console, but record more information to file so it's easy to search?
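Something like the following sketch, where the detailed format mirrors the file entries shown earlier (the exact format strings are illustrative):

import logging

console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(message)s'))  # minimal console output

to_file = logging.FileHandler('info.log')
to_file.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))  # detailed file output

logger = logging.getLogger('Trainer')
logger.setLevel(logging.INFO)
logger.addHandler(console)
logger.addHandler(to_file)
logger.info('epoch 1 finished')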

Let me know what you think and I'll make the appropriate changes :)

@SunQpark
Collaborator

SunQpark commented Apr 15, 2019

Hi @khornlund, I'm sorry for underestimating what you suggested.
Now I understand that using the logging module can bring a lot of cool features to this project.

This project is designed to be usable by people unfamiliar with PyTorch (or Python). It can be very difficult for such people to read code written by others and work out which parts are important and which are not. That's why this project tries to hide most of its complex functionality in base classes, and why I think the use of seemingly difficult modules (like logging) should be kept minimal and subtle.

What I don't like about the current model printing is not that it requires an additional parameter (which it doesn't), but that it appears to. Someone who isn't familiar with subclassing nn.Module is likely to think that verbose belongs to nn.Module and try to use it elsewhere. Someone who doesn't understand the .yaml file is likely to think it is essential knowledge for a deep learning project with PyTorch. These may be extreme cases to worry about, but I think considering such extremes will eventually help make this a better project.

Let's discuss concrete things now.

  1. BaseModel

Based on my experiment, defining a logger instance in train.py before building the model and calling logger.info(model) gave almost the same result (setup_logger(model, verbose=2).info(model) for exactly the same output) as calling it inside the constructor. Your code seems to want a separate logger for each class, but I think having one global logger is a better pattern.
My plan for the next update was to add a new ConfigParser class to handle configuration and object initialization (currently get_instance). Having a unified logger would suit that plan better too.

  2. yaml file

perhaps we could have a minimal format for printing to the console, but record more information to file so it's easy to search?

I really like this idea. I'd like the console output to have a simple format (as before).
As you may have noticed, this project saves the model checkpoint and config file into saved/ModelName/<time_stamp>/, and I think the log output should be saved there too. The current location (saved/) seems problematic if we have multiple training instances running simultaneously. Maybe this requires specifying the logger path at runtime, but then we wouldn't have to include a pre-made 'saved' folder.

Having the configuration as a separate YAML file is OK, but I think it would be better hidden somewhere other than the project root directory, since it is not central to the project (in that it is not directly related to training a model with PyTorch). Maybe we can make a logger dir and put the YAML file there, together with the logger.py file.

Tell me if anything seems wrong or unclear. Please proceed as far as you're willing to and leave a comment, and I'll handle the rest (if you don't mind).

@SunQpark SunQpark mentioned this pull request Apr 15, 2019
@khornlund
Contributor Author

Hi @SunQpark, I've made some new changes - comments below.

This project is designed to be usable by people unfamiliar with PyTorch (or Python). It can be very difficult for such people to read code written by others and work out which parts are important and which are not. That's why this project tries to hide most of its complex functionality in base classes, and why I think the use of seemingly difficult modules (like logging) should be kept minimal and subtle.

This is a fair point - logging gives you lots of cool functionality but it does increase the complexity.

What I don't like about the current model printing is not that it requires an additional parameter (which it doesn't), but that it appears to. Someone who isn't familiar with subclassing nn.Module is likely to think that verbose belongs to nn.Module and try to use it elsewhere. Someone who doesn't understand the .yaml file is likely to think it is essential knowledge for a deep learning project with PyTorch. These may be extreme cases to worry about, but I think considering such extremes will eventually help make this a better project.

Ok, I've removed the logger from the Model and added loggers to train.py and test.py, so the logging can happen from there instead.

Your code seems to want a separate logger for each class, but I think having one global logger is a better pattern.

This could work - it might simplify things a bit, but you lose some flexibility going this route. Usually multiple loggers are used so you can control the verbosity of each.

I'd like the console output to have a simple format (as before)

I've implemented this - compare the console output to what is logged to file.

As you may have noticed, this project saves the model checkpoint and config file into saved/ModelName/<time_stamp>/, and I think the log output should be saved there too. The current location (saved/) seems problematic if we have multiple training instances running simultaneously. Maybe this requires specifying the logger path at runtime, but then we wouldn't have to include a pre-made 'saved' folder.

I've reworked how this pathing works to address this - see utils.saving. Now you just set a save_dir in the config, and you'll get output like:

saved/
    |----Mnist_LeNet/
           |----logs/
           |----<0415_214936>/
           |       |----checkpoints/
           |       |----runs/
           |----<0415_215103>/
           |       |----checkpoints/
           |       |----runs/

I think it makes sense to leave the logs outside the timestamped folders, because if you continue training at a different time it's better to have a contiguous record of the training across multiple sessions. What do you think?

Having the configuration as a separate YAML file is OK, but I think it would be better hidden somewhere other than the project root directory, since it is not central to the project (in that it is not directly related to training a model with PyTorch). Maybe we can make a logger dir and put the YAML file there, together with the logger.py file.

Ok, I don't have a good suggestion for this - I'll leave that to you to decide.

How do you feel about the changes? :)

@SunQpark
Collaborator

I'm working on this now.
I expected stderr messages to be captured by errors.log, but it seems they are not.
The file was empty even though I raised an exception via keyboard interrupt or assert False.

Is this normal? If so, what is errors.log for?

If not, how can I fix it?

@khornlund
Contributor Author

It will only capture things you log at a level >= logging.ERROR, e.g.

try:
    # stuff
    ...
except Exception as ex:
    self.logger.error('Caught exception: {}'.format(ex))

Right now nothing does that, so it won't capture anything. You could wrap the calls to train or test and log the caught errors, or just remove the error-logging handler (maybe that's the simplest).

SunQpark added a commit to SunQpark/pytorch-template that referenced this pull request Apr 19, 2019
@SunQpark SunQpark merged commit 300f38d into victoresque:master Apr 19, 2019
@SunQpark
Collaborator

Merged the PR into master with the following changes. (Commits are not referenced here because of my poor usage of git - sorry.)

  1. Added a ConfigParser class to handle configuration.

setup_logging runs inside the constructor of this class, and ConfigParser.get_logger returns a logger for each use.

  2. Log files are now saved alongside the TensorBoard log files.

Trained models are saved under saved/models/Name/timestamp, and log files under a matching Name/timestamp path. This way, log files don't clash when two models are trained with the same name.

  3. The logging config file has been changed from YAML to JSON.

YAML seemed like a good format, but it does not come with the Python installation. To avoid a dependency on the yaml package, I used the JSON format, which is already used in this project.

  4. Changes made without agreement have been reverted.
