Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#28179 win32 robustness #608

Merged
merged 8 commits into from Dec 20, 2018

Conversation

Labels
None yet
Projects
None yet
4 participants
@ahf
Copy link
Member

@ahf ahf commented Dec 20, 2018

I tried to run the test cases for process_t related code in a VM with limited amount of memory and where I also ran a "load heavy" task (compiled Tor with -jN in a while true loop) and made some discoveries:

  1. The exit handler would sometimes be called before we received the "Broken Pipe" event from ReadFileEx(). We now wait until we have received EOF state for both standard out and error before we check whether the process have terminated.
  2. We forgot to clean up the child process' end of the pipes after the process have been created.
  3. There is no need to log ordinary "end of process" conditions as "warning" log severity.
  4. We forgot to set reached_eof to true under certain pathological conditions. Together with (1) this could sometimes lead to an error in the slow tests where the process was terminated before all output from the child process had been received by Tor's event loop.
  5. Add some additional error checking around ReadFileEx() and WriteFileEx() which is recommended by the MSDN API documentation.

To run a test in PowerShell until it fails one can do:

while ($true) { .\test-slow.exe slow/process/... ; if ($LastExitCode -ne 0) { break : } }

ahf added 8 commits Dec 20, 2018
This prevents us from leaking the HANDLE for stdout, stderr, and stdin.

See: https://bugs.torproject.org/28179
…re closed.

This patch makes us delay checking for whether we have an exit code
value (via GetExitCodeProcess()) until both stdout and stderr have been
closed by the operating system either by the process itself or by
process cleanup after termination.

See: https://bugs.torproject.org/28179
This patch adds some additional error checking after calls to
ReadFileEx() and WriteFileEx(). I have not managed to get this code to
reach the branch where `error_code` is NOT `ERROR_SUCCESS`, but MSDN
says one should check for this condition so we do so just to be safe.

See: https://bugs.torproject.org/28179
This patch adds some missing calls to set `reached_eof` of our handles
when various error conditions happens or when we close our handle (which
happens at `process_terminate()`.

See: https://bugs.torproject.org/28179
Handle `ERROR_BROKEN_PIPE` from ReadFileEx() and WriteFileEx() in
process_win32_stdin_write_done() and
process_win32_handle_read_completion() instead of in the early handler.
This most importantmly makes sure that `reached_eof` is set to true when
these errors appears.

See: https://bugs.torproject.org/28179
This patch changes the CancelIoEx() example code to use CancelIo(),
which is available for older versions of Windows too. I still think the
kernel handles this nicely by sending broken pipes if either side
closes the pipe while I/O operations are pending.

See: https://bugs.torproject.org/28179
Let's not use log_warn() when a pipe is closed under what should be
considered normal conditions.

See: https://bugs.torproject.org/28179
@coveralls
Copy link

@coveralls coveralls commented Dec 20, 2018

Pull Request Test Coverage Report for Build 3351

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.001%) to 61.036%

Files with Coverage Reduction New Missed Lines %
src/feature/hs/hs_common.c 1 83.28%
Totals Coverage Status
Change from base Build 3349: -0.001%
Covered Lines: 43835
Relevant Lines: 71818

💛 - Coveralls

Copy link
Contributor

@nmathewson nmathewson left a comment

Branch looks okay -- just added one question.

* something to write and is thus usually not awaiting to finish any
* operations. If we WriteFileEx() to a file that has terminated we'll simply
* get an error from ReadFileEx() or its completion routine and move on with
* life. */
Copy link
Contributor

@nmathewson nmathewson Dec 20, 2018

Is there any possibility that these checks might prevent a dead process from being freed forever? I'm guessing no, but I feel like I should ask.

Copy link
Member Author

@ahf ahf Dec 20, 2018

I haven't been able to see that happen: even with processes that don't initialize properly (missing a .DLL for example) it seems like the console handles gets closed properly upon termination. I wanted to document this though such that it makes more sense for people looking at the code in the future.

* should check GetLastError() after a call to WriteFileEx() even though the
* `ret` return value was successful. If everything is good, GetLastError()
* returns `ERROR_SUCCESS` and nothing happens.
*
Copy link
Contributor

@nmathewson nmathewson Dec 20, 2018

Wow, what a horrible API they have! But I guess we can't do anything about that. (I wonder if it is intentional, or whether they had a bug and just documented it as the official behavior.)

Copy link
Member Author

@ahf ahf Dec 20, 2018

I agree. In the documentation they mostly talk about buffer size issues, but I think that is only happening if you do I/O where you have multiple async I/O calls out at the same time that works on the same overlapped structure.

@torproject-pusher torproject-pusher merged commit 7762088 into torproject:master Dec 20, 2018
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment