Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abciClient.BeginBlockSync should not hang on crashed server #1891

Merged
merged 4 commits into from Jul 11, 2018

Conversation

ethanfrey
Copy link
Contributor

@ethanfrey ethanfrey commented Jul 3, 2018

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG.md

Addressing issue #1890

Reproduced the errant behavior, and pushing to demo that.
Working on a solution.

@ethanfrey ethanfrey changed the title WIP: abciClient.BeginBlockSync should not hang on crashed server abciClient.BeginBlockSync should not hang on crashed server Jul 3, 2018
@codecov-io
Copy link

codecov-io commented Jul 3, 2018

Codecov Report

Merging #1891 into develop will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #1891      +/-   ##
===========================================
+ Coverage    60.86%   60.87%   +0.01%     
===========================================
  Files          217      217              
  Lines        16963    16960       -3     
===========================================
  Hits         10325    10325              
+ Misses        5745     5740       -5     
- Partials       893      895       +2
Impacted Files Coverage Δ
p2p/pex/errors.go 0% <0%> (-33.34%) ⬇️
p2p/pex/pex_reactor.go 72.4% <0%> (-1.63%) ⬇️
p2p/pex/addrbook.go 68.75% <0%> (-0.5%) ⬇️
p2p/peer.go 60.81% <0%> (ø) ⬆️
state/execution.go 71.03% <0%> (+0.07%) ⬆️
blockchain/pool.go 69.58% <0%> (+3.84%) ⬆️

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 🏎 🍰

cmn.Service, abcicli.Client) {
// some port between 20k and 30k
port := 20000 + cmn.RandInt32()%10000
addr := fmt.Sprintf("localhost:%d", port)
Copy link
Contributor

@melekes melekes Jul 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we use :0 as port here so that the server would pick an ephemeral port for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, and it would be better, but I didn't know how to read the port out, so we can pass the same number into the client.

I do bet this would cause a random 1 in 10000 failure on some dev machines as is, so not ideal.

Copy link
Contributor

@melekes melekes Jul 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the server should expose Addr() method, but this is for another PR. thanks

func (s *Server) Addr() net.Addr
Addr returns the server's network address, either a *TCPAddr or *UnixAddr.

@xla xla self-assigned this Jul 9, 2018
@xla xla added this to Ongoing in current iteration via automation Jul 9, 2018
current iteration automation moved this from Ongoing to Reviewed Jul 10, 2018
Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix frey - tested and indeed it seems to work. Also the reasoning makes sense since we clear the reqQueue but forget about anything thats already been read off the reqQueue and marked as willSend

I'm just wary of the sleep dependent tests - think there's any way we could eliminate the sleeps?

flush := c.FlushAsync()
// wait 20 ms for all events to travel socket, but
// no response yet from server
time.Sleep(20 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems non-deterministic. is there some way we can test this without depending on sleeps ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't think of any way without major changes, as we wait for another goroutine to process an event and there is no clean way to hook into some event system.

Good thing to keep in mine for a refactor of this code.

s.Stop()

// wait for the response from BeginBlock
fmt.Println("waiting for begin block")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get rid of the print statements please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my bad.

I'm away from coding machines for a few days. Can you just push that onto this branch? I think I gave maintainers push access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #1950

@melekes melekes merged commit f9ae773 into tendermint:develop Jul 11, 2018
current iteration automation moved this from Reviewed to Done Jul 11, 2018
melekes added a commit that referenced this pull request Jul 11, 2018
@melekes melekes mentioned this pull request Jul 11, 2018
4 tasks
NexZhu pushed a commit to daotl/go-acei that referenced this pull request Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants