Skip to content

Conversation

@zaibon
Copy link
Contributor

@zaibon zaibon commented Oct 30, 2019

fixes #353

@codecov
Copy link

codecov bot commented Oct 30, 2019

Codecov Report

Merging #354 into master will decrease coverage by 0.05%.
The diff coverage is 57.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
- Coverage   40.89%   40.83%   -0.06%     
==========================================
  Files          61       61              
  Lines        3771     3781      +10     
==========================================
+ Hits         1542     1544       +2     
- Misses       2028     2032       +4     
- Partials      201      205       +4
Impacted Files Coverage Δ
pkg/gedis/gedis.go 14.03% <ø> (ø) ⬆️
pkg/provision/remote_store.go 0% <0%> (ø) ⬆️
pkg/gedis/commands_provision.go 44.49% <100%> (-0.76%) ⬇️
pkg/provision/reservation.go 22.85% <45.45%> (+10.35%) ⬆️
pkg/provision/source.go 32.92% <69.23%> (+0.45%) ⬆️
pkg/flist/flist.go 54.78% <0%> (-1.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06ac4b4...5c50099. Read the comment docs.

Comment on lines 58 to 72
last := time.Now()
defer close(ch)

for {
time.Sleep(next.Sub(time.Now()))
since := next
next = time.Now().Add(s.maxSleep)
now := time.Now()
time.Sleep(next.Sub(now))
next = now.Add(s.maxSleep)

res, err := s.store.Poll(pkg.StrIdentifier(s.nodeID), all, since)
res, err := s.store.Poll(pkg.StrIdentifier(s.nodeID), all, last)
if err != nil && err != ErrPollEOS {
log.Error().Err(err).Msg("failed to get reservation")
time.Sleep(time.Second * 20)
continue
}

last = time.Now()
Copy link
Member

@muhamadazmy muhamadazmy Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really believe the logic here can/will cause some reservations to not get processed. I am not saying that the logic before is correct, it was also faulty.

The problem with the logic is that it tries to ask for reservation since the "last" time the client did a read. While the correct logic should be asking for reservation since the "last" time the server had any reservations.

Let's say all reservations on the server has timestamps (if we gonna use time stamp as a read cursor on the server, we should use high precision value (with nano second, i would say). The the client on first poll call with cursor value set to 0 (which mean since the beginning of time). The server will respond will all the reservations available for that node (may be paging also should be implemented, since this can be a lot) but on next call, instead of calculating (when i made the last call) You should use the timestamp of the last reservation you received. the server only should return the ones with a timestamp after this value. (it's like a cursor)

A better value for the cursor can be the index of the reservation in an array, so you can always ask to poll for the next available value for what u last received.

Analysis why this code is not good enough

In ideal world the code will work as below

next = 0
last = 0
--- first iteration
   now = 1 #current time
   sleep next - now == sleep -1 #no sleep
   next = now + MS = 1 + 10 == 11 
   poll (last) == poll(0) # assume this took 3 seconds to execute
   last = now() == 1 + 3 == 4
--- second iteration
   now = 1 + 3 == 4 # the previous time + execution time of the poll
   sleep 11 - 4 == 7 
   next = now + MS = 14
   poll(last) == poll(4) # takes another 3 seconds
   last = now() == 4+3 == 7
--- 

In real world though, there is network latency, and serialization latency, etc.. (for example the server serializing the data, to byte stream, and before the stream is even written to output, the data structure changes)

Possible race conditions can happens:

  • the time between the server start writing the poll response, until the client call the last = now() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep very right.
So what do you propose. That I use the timestamp of the last received reservation instead ?

I guess quite some improvement can be made server side to have a better logic about that. But for what we have now, just using last receive reservation timestamp should be good enough ?

Also, feel free to open an issue on https://github.com/threefoldtech/jumpscaleX_threebot/issues to propose a better way to poll new reservations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess quite some improvement can be made server side to have a better logic about that. But for what we have now, just using last receive reservation timestamp should be good enough ?

That should be good enough idd if you are sure that a new reservation can not possible get assigned the same timestamp (hence I said a nano-second precision should be used instead of seconds).

Also the server should take care not to return the reservation with that exact timestamp, but only the one that has timestamp > cursor value

@zaibon zaibon force-pushed the provision_source_timer branch from 30b8eeb to f857772 Compare November 1, 2019 11:12
@zaibon zaibon merged commit 378ee86 into master Nov 1, 2019
@zaibon zaibon deleted the provision_source_timer branch November 1, 2019 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

provisiond: new reservation are not received from testnet BCDB

3 participants