Skip to content

vmac: fix creation / deletion VMAC issue due to race conditions #2388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

louis-6wind
Copy link
Contributor

Fix two race conditions issue where the VMAC interfaces are not created or deleted following the apparition or disparition of their link interface.

Due to a race condition, the VMAC interface sometimes remains undeleted
after Netlink reports the removal of its link interface.

The function cleanup_lost_interface() fails to delete the interface.
This failure occurs because the condition
'IF_BASE_IFP(vrrp->ifp) != ifp' in the following code segment never
matches:

> 		if (vrrp->ifp != ifp
> #ifdef _HAVE_VRRP_VMAC_
> 		    && IF_BASE_IFP(vrrp->ifp) != ifp && VRRP_CONFIGURED_IFP(vrrp) != ifp
> #endif

This non-matching situation arises because 'vrrp->ifp->base_ifp' is the
same as 'vrrp->ifp'.

During the initialization in if_get_by_ifname(), when a new interface_t
pointer is created, 'ifp->base_ifp' is set as follows:

> ifp->base_ifp = ifp;

netlink_if_link_populate() gets bypassed whenever
'ifp->base_ifp->ifindex' is different from the index of the actual link
interface. This bypass is inevitable when 'ifp' is newly allocated.
netlink_if_link_populate() normally populates the 'ifp' data including
the 'ifp->base_ifp'.

Fix the issue of VMAC deletion by adjusting the logic so that
netlink_if_link_populate() does not get skipped when the base_ifp of a
VMAC is the VMAC itself.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add #endif descriptions in netlink_if_link_populate() to clarify in
which #ifdef blocks the next commit is modifying the code. This is a
cosmetic change.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The VMAC interface sometimes does not reappear when its associated link
interface is quickly re-added after being deleted, a situation caused by
a race condition.

This problem manifests during the operations of cleanup_lost_interface()
This function checks for VMAC interfaces on top of the removed link
interface. It deletes these VMAC interfaces if they are present.
Subsequently, Netlink is invoked to refresh the information of all
interfaces, and this is followed by the cleaning of the link interface
data in memory.

The problem occurs when Netlink, queried indirectly by sub-functions of
cleanup_lost_interface(), detects that the link interface has
reappeared. Although the interface data in memory is updated
accordingly, cleanup_lost_interface() unconditionally clears this
refreshed information. As a result, the data regarding the link
interface is lost, preventing the re-creation of its associated
VMAC interfaces.

Fix the VMAC creation issue by adding a 'cleaning flag' that is set at
the start of the cleanup process. This flag says whether to proceed with
the interface data cleanup. If the interface is refreshed during the
Netlink polling, the flag is unset, thereby preventing the subsequent
clearing of the updated interface information.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
@louis-6wind
Copy link
Contributor Author

@pqarmitage Please review this

@pqarmitage
Copy link
Collaborator

@louis-6wind I was looking at these commits yesterday and trying to work out what was happening.

I think commit 984ac06 may have improved the situation, since it now doesn't delete any VMACs until it has worked through all the VRRP instances tracking a deleted interface, and only deletes the VMACs once the internal data structures have all been updated. I would be grateful for your thoughts as to whether commit 984ac06 affects any of your changes.

I am currently trying to work my way through all the outstanding pull requests, either to merge them, review them, or explain why they cannot be merged.

I note that you have been collaborating with Hari Sasank (@harisasank) and I am confused by pull requests #2565 and #2556, which appear to be addressing the same issue, although I note that #2565 has been closed now. The original commit (1960568) appears to have disappeared, and we now have three new commits (I haven't had time to look at these yet, since I was trying to work out what was happening between #2556 and #2565).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants