Process: fix found position bug with wait_output delta_only #3638
base: main
Conversation
Fixes: 603bd27

If a buffer is length N and found message 0 is not at buffer[N-len(message0):N], the current code can result in missed messages on future calls to wait_output.

wait_output currently stores the total length of the buffer as the search-buffer offset, even when a keyword is found. Consider a buffer of 100 characters and a keyword of length 10 found at position 50. The old code records 100 as the last length, so if your next keyword is length 10 and found at position 70, it is lost if you call wait_output with delta_only again.

This commit fixes the issue by saving only the end position of the slice buffer[start:end] that contains the first keyword, and saving the length of the buffer if no keyword was found.

Signed-off-by: Matthew G McGovern <mamcgove@microsoft.com>
if found_pos >= 0:
    # note: find returns the start of the slice search_buffer[start:end],
    # so calculate the end, start + len(keyword), as the new offset
    self.log_buffer_offset = found_pos + len(keyword)
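The missed-message scenario from the description can be reproduced in a few lines. This is a standalone sketch with hypothetical names that mirrors the fixed offset logic, not LISA's actual implementation:

```python
def find_with_offset(buffer: str, keyword: str, offset: int) -> "tuple[int, int]":
    """Search buffer[offset:] for keyword; return (found_pos, new_offset).

    found_pos is -1 when the keyword is absent.
    """
    found_pos = buffer.find(keyword, offset)
    if found_pos >= 0:
        # Advance only to the end of the match, not to the end of the
        # buffer, so later keywords in the same region survive.
        return found_pos, found_pos + len(keyword)
    # No match: everything scanned so far can be skipped next time.
    return -1, len(buffer)

# A 100-character buffer: keyword_aa at position 50, keyword_bb at 70.
buffer = "x" * 50 + "keyword_aa" + "x" * 10 + "keyword_bb" + "x" * 20

# First search finds keyword_aa at 50; the new offset is 60, not 100.
pos1, off = find_with_offset(buffer, "keyword_aa", 0)
# A second delta-only search still sees keyword_bb at 70.
pos2, _ = find_with_offset(buffer, "keyword_bb", off)
```

With the old behavior (offset saved as `len(buffer)` = 100), the second search would start past the end and return -1, losing `keyword_bb`.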
It should be the length of the whole log_buffer. Searching multiple times with the same string is not a supported scenario.
I explained the bug to you earlier and you said it was a good point... what changed? This isn't about searching for the same string twice; it's about searching more than once. You will lose any log messages that follow the first match. There's no guarantee you'll get one line at a time, and delta_only means that ordering is important and there may be duplicate messages in the logs. So why assume there's nothing important in the rest of the buffer?
I'm not just digging around looking for random bugs to open; I ran into this bug while working on a test case.
I thought the bug was that self.log_buffer.getvalue() is called multiple times, so it doesn't actually present the same string. That may cause some problems with multiple threads.
But this place is by design. It's there to find a signal to take action on, not to find and match strings for other scenarios, because it's low performance to get the string out of a buffer. The use case is like: 1) start a process, 2) block waiting for the process to print "started", 3) execute the next step.
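The "wait for a signal, then act" pattern described here can be sketched with plain subprocess. This is a hypothetical helper for illustration, not LISA's API:

```python
import subprocess
import sys

def wait_for_signal(cmd: list, keyword: str) -> bool:
    """1) start a process, 2) block until it prints a line containing
    keyword, 3) return so the caller can execute the next step.

    Returns True if the signal was seen before the output ended.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    seen = False
    for line in proc.stdout:
        if keyword in line:
            seen = True
            break
    proc.communicate()  # drain the remaining output and reap the process
    return seen

# Start a toy process and block until it reports "started".
ok = wait_for_signal([sys.executable, "-c", "print('started')"], "started")
```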
I thought the bug was that self.log_buffer.getvalue() is called multiple times, so it doesn't actually present the same string. That may cause some problems with multiple threads.
Oh right, I guess there were two bugs. Even better!
Because it's low performance to get the string out of a buffer.
Using delta_only to look for an ordered series of messages will make a single pass through the buffer. I don't think this will be any slower than making one or more passes later. I'm also pretty sure this function previously searched the entire buffer on every call to wait_output, before delta_only was added...
But this place is by design. This is to find a signal to take action, not to find and match strings for other scenarios.
What if you have to check that events happened in a specific order AND take action afterwards...
dmesg = node.tools[Dmesg]
vf_info = node.nics.get_secondary_nic()
primary = node.execute_async("program_name")
# wait for start
primary.wait_output("started")
# toggle sriov
node.features[NetworkInterface].switch_sriov(enable=False)
primary.wait_output("hotplug: remove", delta_only=True)
node.features[NetworkInterface].switch_sriov(enable=True)
# wait for hotplug event
primary.wait_output("hotplug: add", delta_only=True)
dmesg.check_for_kernel_errors()  # check for dump immediately after hotplug
# start a secondary process (here's where things get interesting)
secondary = node.execute_async("program_name --process_type=secondary")
secondary.wait_output("started")
primary.wait_output(f"secondary process: VF started {vf_info.name}", delta_only=True)  # check info is sent to secondary
secondary.wait_output(f"received info from primary: VF {vf_info.name}", delta_only=True)  # check info is received by secondary
secondary.wait_output("sent primary info: ...", delta_only=True)  # check secondary sent message to primary after start
primary.wait_output("received secondary info:", delta_only=True)  # check second bit of info after secondary VF start
dmesg.check_for_kernel_errors()  # check for dump after secondary start
...  # send traffic from one nic to the other, do some test stuff.
primary_result = primary.wait_result(expected_exit_code=0 ...)
secondary_result = secondary.wait_result(expected_exit_code=0 ...)
for match in some_regex.finditer(primary_result.stdout):
    ...  # do output parsing for things like perf numbers and packet counts
    # where order does not matter
Execute, then wait for output. Switch on feature, wait for output. Check thing A happened in process A after that feature was enabled. Check thing B happened in process B after thing A happened.
Why is this bad?
Firstly, the example above doesn't really convince me it's needed. The waits on primary/secondary are not actually ordered, because the two processes run in parallel. Leaving only the last wait is enough:
primary.wait_output("received secondary info:", delta_only=True)
Anyway, I also discussed with Paul a better design to support both scenarios:
- Add an argument called offset_to_end. The default value is True, because in most scenarios a process shouldn't be waited on twice. If that's needed, set it to False. But I haven't seen any case that needs it yet.
- Use offset_to_end at L515. When it's True, set the offset to the length of the temp buffer. When it's False, follow the current implementation.
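The proposed offset_to_end argument could look roughly like this. This is a hypothetical sketch with simplified names; the real wait_output in LISA polls, takes a timeout, and has more state:

```python
import io

class Process:
    """Minimal stand-in to illustrate the proposed offset_to_end flag."""

    def __init__(self) -> None:
        self.log_buffer = io.StringIO()
        self.log_buffer_offset = 0

    def wait_output(self, keyword: str, delta_only: bool = False,
                    offset_to_end: bool = True) -> bool:
        content = self.log_buffer.getvalue()
        start = self.log_buffer_offset if delta_only else 0
        found_pos = content.find(keyword, start)
        if offset_to_end or found_pos < 0:
            # Default: skip to the end; most callers wait on a process once.
            self.log_buffer_offset = len(content)
        else:
            # Opt-in: stop right after the match so later keywords survive.
            self.log_buffer_offset = found_pos + len(keyword)
        return found_pos >= 0
```

With offset_to_end=False, two ordered keywords in the same buffer snapshot can both be found; with the default True, the second delta-only wait starts past them.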
ok
I just really think the bug here is that delta_only doesn't look at only the delta. What the function was 'for' wasn't clear to me from the name and from my expectation of what would be useful. I called this a 'bug fix' because I thought the delta_only flag would do what the name implies: look at the delta from the last thing we matched.
The problem is that the name doesn't accurately describe the behavior. So maybe we make this explicit for the caller:
class StartAt(Enum):
    EndOfLastMatch = "last_keyword"
    EndOfLastBuffer = "last_buffer"

wait_output(keyword: str, match_type: Optional[StartAt] = None)

# So then callers can use it like this:
process.wait_output(keyword="Some log line", match_type=StartAt.EndOfLastMatch)
ok, it can be changed like this, but it needs to save two positions, last_end and last_matched. It needs to support searching from the beginning too. But don't overuse it: the cost of this method is creating a string from the buffer, not the searching itself. So if a string is already available, searching it multiple times is much cheaper than calling getvalue() on the buffer multiple times. Use it for async signals only, not for logic.
So, it looks like,
class StartAt(Enum):
    Beginning = "beginning"
    LastMatch = "last_match"
    LastEnd = "last_end"

# Set default to the most useful value, `LastEnd`.
wait_output(keyword: str, start_at: Optional[StartAt] = StartAt.LastEnd)

# So then callers can use it like this:
process.wait_output(keyword="Some log line", start_at=StartAt.LastMatch)
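The two-position bookkeeping this design requires can be sketched as follows. Names other than StartAt are hypothetical; the text source stands in for pulling the string out of the log buffer:

```python
from enum import Enum

class StartAt(Enum):
    Beginning = "beginning"
    LastMatch = "last_match"
    LastEnd = "last_end"

class Searcher:
    """Illustrates saving both last_end and last_matched positions."""

    def __init__(self, text_source) -> None:
        self._text_source = text_source  # callable returning the full log text
        self._last_end = 0       # buffer length at the previous call
        self._last_matched = 0   # end position of the previous match

    def wait_output(self, keyword: str,
                    start_at: StartAt = StartAt.LastEnd) -> bool:
        content = self._text_source()
        start = {
            StartAt.Beginning: 0,
            StartAt.LastMatch: self._last_matched,
            StartAt.LastEnd: self._last_end,
        }[start_at]
        found_pos = content.find(keyword, start)
        if found_pos >= 0:
            self._last_matched = found_pos + len(keyword)
        self._last_end = len(content)
        return found_pos >= 0
```

The default LastEnd skips everything already seen, matching the "signal" use case; LastMatch resumes right after the previous hit, supporting ordered-sequence checks.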