[neighsyncd] increase neighbor syncd restore timeout to 110 seconds (sonic-net#745)

* [neighsyncd] increase neighbor syncd restore timeout to 120 seconds

Neighbor syncd restores important information for teamd and BGP.
Our timeout should not be shorter than that of the downstream service.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [restore_neighbor] improve restore neighbor timeouts

Try to get the BGP timeout and use it as the neighbor restore timeout.
When it is unavailable, use the default of 110 seconds.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Set default values according to group discussion result

- restore_neighbors.py timeout at 110 seconds due to observed requirement
  of greater than 70 seconds.
- neighbor syncd timeout at 120 seconds (longer than 110 seconds).

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca committed Feb 7, 2019
1 parent b78cc8d commit d680ce2
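
The fallback described in the second commit ("try to get the bgp timeout ... when unavailable, use default 110 seconds") can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper name `resolve_restore_timeout`, the `bgp_settings` dict, and the `"bgp_timer"` key are all assumptions standing in for wherever the BGP timer is actually read from.

```python
DEF_TIME_OUT = 110  # default restore timeout from this commit, in seconds


def resolve_restore_timeout(bgp_settings, default=DEF_TIME_OUT):
    """Return the BGP timer as an int, or `default` if it is missing or invalid.

    `bgp_settings` is a hypothetical dict; the key name "bgp_timer" is an
    assumption made for this sketch.
    """
    try:
        value = int(bgp_settings["bgp_timer"])
    except (KeyError, TypeError, ValueError):
        # Missing key, no settings at all, or a non-numeric value.
        return default
    # A non-positive timer would make a restore loop exit immediately.
    return value if value > 0 else default
```

The result would then be passed as the `timeout` argument that this commit adds to `restore_update_kernel_neighbors`.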
Showing 2 changed files with 9 additions and 8 deletions.
4 changes: 2 additions & 2 deletions neighsyncd/neighsync.h
@@ -11,10 +11,10 @@

 /*
  * This is the timer value (in seconds) that the neighsyncd waits for restore_neighbors
- * service to finish, should be longer than the restore_neighbors timeout value (60)
+ * service to finish, should be longer than the restore_neighbors timeout value (110)
  * This should not happen, if happens, system is in a unknown state, we should exit.
  */
-#define RESTORE_NEIGH_WAIT_TIME_OUT 70
+#define RESTORE_NEIGH_WAIT_TIME_OUT 120

 namespace swss {
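
The header comment and the commit message together imply an ordering: the slowest observed restoration (about 70 seconds) should be below the restore_neighbors timeout (110), which in turn should be below the neighsyncd wait (120). A minimal sketch of that check, with the three values copied from this commit and a hypothetical helper name `timeouts_are_consistent`:

```python
OBSERVED_RESTORE_SECONDS = 70      # slowest restoration mentioned in the commit message
DEF_TIME_OUT = 110                 # restore_neighbors.py default timeout
RESTORE_NEIGH_WAIT_TIME_OUT = 120  # neighsyncd wait from neighsync.h


def timeouts_are_consistent(observed, restore_timeout, neighsyncd_wait):
    """True when observed < restore timeout < neighsyncd wait (all in seconds)."""
    return observed < restore_timeout < neighsyncd_wait


consistent = timeouts_are_consistent(
    OBSERVED_RESTORE_SECONDS, DEF_TIME_OUT, RESTORE_NEIGH_WAIT_TIME_OUT
)
```

With the previous values (60 and 70) the check fails against the 70-second observation, which matches the motivation given for raising both numbers.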

13 changes: 7 additions & 6 deletions neighsyncd/restore_neighbors.py
@@ -30,11 +30,12 @@
 logger.setLevel(logging.WARNING)
 logger.addHandler(logging.NullHandler())

-# timeout the restore process in 1 min if not finished
+# timeout the restore process in 110 seconds if not finished
 # This is mostly to wait for interfaces to be created and up after system warm-reboot
 # and this process is started by supervisord in swss docker.
-# It would be good to keep that time below routing reconciliation time-out.
-TIME_OUT = 60
+# There had been devices taking close to 70 seconds to complete restoration, setting
+# default timeout to 110 seconds.
+DEF_TIME_OUT = 110

 # every 5 seconds to check interfaces states
 CHECK_INTERVAL = 5
@@ -189,13 +190,13 @@ def set_statedb_neigh_restore_done():
 # Once all the entries are restored, this function is returned.
 # The interfaces' states were checked in a loop with an interval (CHECK_INTERVAL)
 # The function will timeout in case interfaces' states never meet the condition
-# after some time (TIME_OUT).
-def restore_update_kernel_neighbors(intf_neigh_map):
+# after some time (DEF_TIME_OUT).
+def restore_update_kernel_neighbors(intf_neigh_map, timeout=DEF_TIME_OUT):
     # create object for netlink calls to kernel
     ipclass = IPRoute()
     mtime = monotonic.time.time
     start_time = mtime()
-    while (mtime() - start_time) < TIME_OUT:
+    while (mtime() - start_time) < timeout:
         for intf, family_neigh_map in intf_neigh_map.items():
             # only try to restore to kernel when link is up
             if is_intf_oper_state_up(intf):
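
The loop in this hunk follows a common pattern: poll a condition every `CHECK_INTERVAL` seconds against a monotonic clock and give up once `timeout` has elapsed. The sketch below isolates that pattern. It is not the function from the diff: `wait_until` and its `condition`, `clock`, and `sleep` parameters are hypothetical, and it uses Python 3's `time.monotonic` rather than the `monotonic` package the original imports.

```python
import time

CHECK_INTERVAL = 5  # seconds between checks, as in restore_neighbors.py
DEF_TIME_OUT = 110  # default timeout from this commit, in seconds


def wait_until(condition, timeout=DEF_TIME_OUT, interval=CHECK_INTERVAL,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested without
    actually waiting.
    """
    start = clock()
    while (clock() - start) < timeout:
        if condition():
            return True
        sleep(interval)
    return False
```

A monotonic clock is used so that wall-clock adjustments during warm reboot cannot shorten or lengthen the wait. Making the timeout a parameter with a default, as the diff does, lets the caller supply a different value without touching the loop.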