Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump devicegraphs to YAML and XML #609

Merged
merged 7 commits into from
May 3, 2018
Merged

Dump devicegraphs to YAML and XML #609

merged 7 commits into from
May 3, 2018

Conversation

shundhammer
Copy link
Contributor

@shundhammer shundhammer commented Apr 18, 2018

https://trello.com/c/hvI17ITA/155-2-storageng-log-the-final-partitioning

This will dump device and action graphs to file at appropriate places during installation and in the installed system (when the partitioner is used).

During installation, it will create a directory storage-inst below /var/log/YaST2 for those files; in the installed system, it will create a directory storage and rotate any previous such directory to keep at most 3 old ones of them:

  • storage (for the current files)
  • storage-01 (for the last older ones)
  • storage-02 (for even older ones)
  • storage-03 (for even older ones)

For each program start (of the partitioner), those directories are rotated.

For the installation, it's always just storage-inst; any previous directory by that same name will be deleted and recreated. This can only happen anyway when somebody does an ssh installation and aborts and restarts YaST, so any old files there (in the inst-sys!) are irrelevant.

storage-inst in the inst-sys needs to be copied to the target (this will be a separate PR against yast-installation) and picked up by save_y2logs (which will collect everything in /var/log/YaST anyway.

If YaST is running as non-root, those directories are created below ~/.y2storage/ to avoid cluttering the user's home directory too much (but this affects mostly us YaST developers starting tests).

In each of those directories, files are created to record the devicegraphs (and, at strategic points, also the actions):

  • 01-probed.xml
  • 01-probed.yml
  • 02-proposed.xml
  • 02-proposed.yml
  • 03-actions.txt
  • 04-partitioner.xml
  • 04-partitioner.yml
  • 05-actions.txt
  • 06-committed.xml
  • 07-committed.yml

When the user changes proposal parameters or keeps entering and leaving the partitioner, there might be more of those; one pair of XML and YAML files each time, and whenever actions are shown to the user, also the actions.

@coveralls
Copy link

coveralls commented Apr 23, 2018

Coverage Status

Coverage increased (+0.003%) to 97.011% when pulling 7b80315 on shundhammer:huha-dump-yaml into 0a44f52 on yast:master.

it "Saves the actions in a plain text file" do
subject.save(filename)
lines = File.read(filename).split("\n")
expect(lines[0]).to start_with(Time.now.year.to_s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: There would be room to fail when the test is run on 31/12 at 23:59:59 :). Yes, very theoretical case, but maybe we could compare with a rexp:

expect(lines[0]).to match(/\d{4}-\d{2}-\d{2}.*/)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, and then I thought what the hell, if anybody really runs that unit test at new year's eve, we should drag him out of the office and tell him to get a life and get appropriately drunk as befits that day. ;-)

I prefer a simple test at this place. We should not overengineer even new year's eve. ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will happen, you will see :)

@@ -0,0 +1,327 @@
# encoding: utf-8

# Copyright (c) [2017] SUSE LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: 2018

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

require "singleton"
require "fileutils"
require "yast"
require "rspec/mocks"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/shundhammer/yast-storage-ng/blob/85bc187c990b1a8cc1c996153e34d2a87814f927/src/lib/y2storage/dump_manager.rb#L101

    def dump
    ....
      elsif dump_obj.is_a?(RSpec::Mocks::InstanceVerifyingDouble) ||
          dump_obj.is_a?(RSpec::Mocks::Double)
        log.warn("Not dumping #{dump_obj.class}")

That's the one thing we talked about in today's meeting: This is not exactly pretty, but it helps a lot to keep unit tests simple; otherwise a double or instance_double for Y2Storage::Devicegraph would need the appropriate save method every time.

# vs. installed system), and the directories are cleared (installation) or
# rotated (installed system) for each program invocation.
#
# @param dump_obj[Y2Storage::Devicegraph|Y2Storage::ActionsPresenter]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: add space between param name and type. Moreover, when the param admits different types, the types are separated by commas:

# @param dump_obj [Y2Storage::Devicegraph, Y2Storage::ActionsPresenter]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

# @param dump_obj[Y2Storage::Devicegraph|Y2Storage::ActionsPresenter]
# @param file_base_name [String|nil] File base name to use.
#
# @return file base name with numeric prefix actually used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, document the return type:

 # @return [String, nil] file base name with ...

And also I would document the possible exception:

@raise [ArgumentError] when ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

#
# @return [Boolean]
def installation?
Yast::Mode.installation || Yast::Mode.auto || Yast::Mode.update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: installation includes auto, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. Maybe. I just wanted to make sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked; yes, you are right.

Fixed.

# @return [String] directory name with full path
def dump_dir
dir = installation? ? "storage-inst" : "storage"
base_dir + "/" + dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: File.join(base_dir, dir) is more secure to avoid errors with slashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For one thing, this is all over the place, and I really don't want to change this everywhere; for another, what benefit does this provide compared to simple string operations? Linux is very robust about double slashes if they occur (as a matter of fact, I cleaned some places in my other PR today where old code copying the y2logs always had two or sometimes even three slashes where there should have been one).

I can understand the argument for multi-platform applications that need to take a Windows backslash as a path separator into account; but this is a non-issue for us, so let's please keep the code clear and easy to understand. All YaST code is full of those string operation anyway. And I really only see downsides in using yet another indirection for trivial operations like this (complexity, call overhead, readability).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack Overflow says even on Windows Ruby uses the forward slash, so this argument doesn't count even for multi-platform targets. The only issue really appears to be avoiding multiple slashes in case the path components to join already have some, and as I wrote, not only is Linux robust against this (it collapses them into one slash anyway), this code here already makes sure none of the components has a slash too many at those places (it's all constants that are under our control, not anything coming from user input).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine with me then ;)

#!/usr/bin/env rspec
# encoding: utf-8

# Copyright (c) 2016 SUSE LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: 2018

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. Copy&paste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

expect(marker?("storage", "01-probed.yml")).to be true
expect(marker?("storage", "03-staging.yml")).to be true
expect(marker?("storage", "04-committed.yml")).to be true
expect(marker?("storage-03", "marker-01")).to be true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this test. Why do all these files exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of populate_full earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that those files are not always cleaned up for every individual test; some depend on a previous state to do the right thing (log rotation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. I didn't realize. So, those tests depend on the order they are executed. Well, at least I will comment it at the beginning of this test suite because I think this is the only one case in storage-ng tests where the order is important. Only to make it clear and avoid headaches in future ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those log rotating things it should be obvious; the comments as well as the output of the individual tests already say it.

@@ -94,6 +95,9 @@ def refresh_value
def calculate_actions
actiongraph = current_graph.actiongraph
@actions = Y2Storage::ActionsPresenter.new(actiongraph)
Y2Storage::DumpManager.dump(current_graph, "partitioner")
Y2Storage::DumpManager.dump(@actions)
@actions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why the dump is performed here. IMHO, the strategic places would be:

  • in storage_manager.rb:
    • after probing
    • after committing
  • in inst_disk_proposal.rb
    • after initial proposal
    • after each guided setup
    • after each partitioner
  • in partitions_proposal.rb
    • after each partitioner

And maybe, all that places would be included with:

  • in storage_manager.rb:
    • after probing
    • after assigning staging=
    • after committing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even more, should not be enough with saving the probed and committed devicegraphs (and actions)? Well, I guess the idea is to have all steps, even when the changes are not committed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right; I wanted to be able to reproduce what happened after each step. I had this many times in bug reports about storage-old: Users do a lot of actions; they switch proposal settings back and forth and enter the expert partitioner and whatnot, and at some point they claim "it doesn't work as expected". It's very hard to figure out what the hell they did in all those steps, and in certain cases this turned out to be important.

I believe that storage-ng is a lot more robust and consistent against the user changing settings back and forth, but I'd rather be safe than sorry. And even from the sequence of those files you can instantly tell what a user did: Just accept one proposal, changed only one or two proposal settings (-> xx-proposal.yml), entered the partitioner (-> xx-partitioner.yml), maybe abandoned his changes and resort to the proposal after all (-> xx-proposal.yml), and, most importantly, what was actually committed.

As for why do it in calculate_actions: Because the actions are already calculated here, they are dumped; and to be very sure which version of the staging devicegraph it refers to, simply dump it one more time. They should even have the same timestamp (it's only seconds resolution). I wanted to avoid creating yet another actiongraph and ActionsPresenter, and, more importantly, I want to log exactly what we showed the user and not just an approximation.

Call me paranoid; just because I am paranoid that doesn't mean that they are not out there to get me. ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but in the case of summary_text, the problem I see is that the files will be saved every time the user accesses to the section "Installation Summary" in the Expert Partitioner. Is this desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, when the Partitioner is used in a running system, a last step with the summary of actions is shown, so also in this case the files will be saved. Probably, this dump could be moved to Dialogs::Main#run, so they are generated only after accepting the Partitioner. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the running system you might get one more, depending where the user clicks. The last step just before commit (which is different in the installed system) always dumps both the devicegraph and the actions; but this is specific to the partitioner in the installed system.

When the user clicks on the "summary" page (which he might or might not do), both graphs are dumped once more. So, in that case (user clicks on summary AND he is presented with the actions just prior to commit) we might get two identical files of each kind, but in what way should that be a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, so we might get some extra ones. This is not a problem with rotation at all since it's not individual files that are rotated, but the whole directory that contains those files - for each program call (as mentioned in the initial description of this PR).

In the course of the last 3 years, I had to debug countless problems where I had to fiddle that stuff out of all that completely irrelevant noise in the y2logs (more often than not not even knowing what program run was relevant since the y2logs accumulate all of them in the installed system). It turned out to be necessary to have all that information; some users do a lot of actions to get to the final result, and it is absolutely vital to be able to reproduce what they did.

Those dump files will only accumulate if a user does a lot of actions: Change proposal settings and regenerate the proposal, enter the expert partitioner and whatnot. If a user only accepts the proposal, it will only be 3 sets:

  • probed
  • proposal
  • committed

That's it.

When the user keeps clicking around, there is no way to find out what happened without those dump files; not even the macro recorded during installation is any useful since you never know exactly what he clicked in the tree or in the table and what happened then. This is why I want (and I insist!) in having this detailed information.

We will have to maintain that stuff for about 10 years. If we don't add this now, we will have to live with bug reports coming in during all those 10 years where we will have no way of knowing what happened.

So, if it turns out to be too many of those dump files, we can simply ignore them, just as we ignore 99.8% of the y2logs (just look at all the garbage in there). But the one information we don't record might be the one we sorely miss when a user (typically a paying L3 customer) has a problem.

So, without even having seen any of the results of this, why would we throw away information that we can easily keep -- and that will be essential in the future?

As of now, not even the actions are recorded in the y2log; I wonder how anybody wants to reproduce or track down any bug reports coming in like this. I don't believe in miracles; I don't believe in bugs mysteriously debugging and fixing themselves.

I want those dump files. And I want as many of them as I can get.

It's easy to simply ignore some of them. But it's impossible to retrieve them when they are not recorded.

Copy link
Contributor

@ancorgs ancorgs May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have had discussions in the recent past about logging too much irrelevant stuff and how it hurts us (more memory consumption on the installation, hard to find the useful stuff in the middle of all that information, etc.). I think it was even one of the topics for the YaST Team workshop and I believe there was a general agreement that we wanted to improve the signal/noise ratio.

f I understood the discussion properly, this is another step in the same direction.

We want to log a couple of devicegraphs (in two formats) and an actiongraph for every move the users do. Not only after probing, when the user changes the staging devicegraph and on commit (those are relevant points in time). But also for discarded executions of the partitioner, for partial results in the middle of the partitioner execution, discarded results of the guided setup, every summary screen that displays an information that has really not changed and so on.

This is not a veto for the PR (as said, I didn't actually reviewed it in depth). But I think that logging two devicegraph and one actiongraph every time the user visualizes something (even if that "something" is completely irrelevant for installation purposes or has not changed) is inconsistent with the recent discussions about avoiding noise in the logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the y2logs, you can simply discard the ones you don't want or need. The y2logs are just sequential text files where you'd have to write your own tools to filter out all the noise (and in the past, most of the team did exactly that).

If you prefer to see only the final result, well, fine, then just look at *probed.* and *commited.*. That's it (and that's also why they have special names that are easy to tell apart from the rest). For everything else, we record more of them which might or might not be relevant for the bug you are currently working on. And that's also why there is also a line in the y2log (which was snwint's idea) "Dumping devicegraph staging to 03-proposal" so you can easily find the corresponding place in the y2logs (or the other way round).

As I wrote above, if it turns out to be too much, we can easily throw out some of those places later - for SLE-15 SP1, for example. But please let's get some hands-on experience with it first so we can get an idea how useful it is.

Since the y2logs have degenerated to the point of uselessness with all the clutter (but omitting vital things, mostly for the sake of brevity in the Ruby code, I guess), we desperately need useful debugging tools. And this is one of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Fine for me. Let's go for it if you feel they might be useful.

My main concerns right now are log rotation on an installed system and memory consumption during installation (sadly we are in the point in which the approximately 40KiB used by every dump[1] makes a difference).

And just for the record, I have never missed information with the current level of logging while debugging partitioner or proposal bugs (and we got some reports for SLE15 and Leap15 already). I appreciate the fact of having the XML and yaml in separate files, since it's indeed more convenient than copying&pasting the XML from y2logs and then using the Ruby interpreter to make sense of the dump. But that's more about convenience than about missing information. So far, debugging what happened exactly has always[2] been possible.

[1] Rough estimation for a in a system with subvolumes, based on the files we have right now in test/data

[2] "always" means the last 5 months or so. The first versions of storage-ng really lacked proper logging, but that's pretty much fixed now.

@@ -77,6 +78,7 @@ def propose
result = calculate_proposal
return result if devices.nil? || devices.empty?
log.info("Proposed devicegraph:\n\n#{devices.to_str}\n")
DumpManager.dump(devices, "proposed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? In dialogs/proposal, the "proposal" and its actions are already saved. This "proposed" is the same devicegraph, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rely on wishful thinking, but I prefer to be on the safe side; see also my long comment above.

Copy link
Contributor

@joseivanlopez joseivanlopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants