New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multicast routes don't always get their iif updated based on unicast routing changes #79

Open
whariki opened this Issue Jun 23, 2016 · 39 comments

Comments

5 participants
@whariki

whariki commented Jun 23, 2016

The network setup we have is:

S+Rn_.y = multicast source and receiver n, 
                    where .y is the least significant octet of the IPv4 address

Cisco_n = Cisco PIM Router
pimd_n = pimd PIM Router (pimd v2.3.2)

The RP is the router at the top of the picture.

              Cisco_1 --------------------  <BGP>
                 |                        |
S+R1_.165 -------| <ospf>                 |
                 |                     Cisco_2 ------ S+R2_.46
               Cisco_3                    |
                 |                        | <BGP>
S+R3_.117 -------| <ospf>                 |
                 |                        |
                 | <eth0>                 |
               pimd_4                     |
                 | <ndl0>                 |
                 |                        |
                 | <ospf>                 |
                 |                        |
                 | <ndl0>                 |
               pimd_5                     |
                 | <eth0>                 |
                 |                        |
                 | <ospf>                 |
S+R4_.101 -------|------- Cisco_6         |
                 |           |            |
                 |           | <ospf>     |
                 |           |            |
               pimd_7      Cisco_8        |
                 |           |            |
           <ospf>|        S+R5_.85        |
                 |                        |
               pimd_9                     |
                 |                        |
                 |<ospf>                  |
                 |                        |
              Cisco_10 --------------------

Reachability via OSPF is preferred.
The BGP link between Cisco_10 and Cisco_2 is preferred for all pimd_n routers.
The Cisco_1 router is the only router that prefers the direct BGP route to the S+R2_.46 network.

We have intermittent failure when the link between pimd routers is restored. i.e. If the link pimd_4 and pimd_5 has been down long enough for the unicast routing to converge when it is restored the multicast routing doesn't always converge (fails approx. 50% of the time). Specifically the multicast receiver S+R3_.117 might not see traffic from sources S+R4_.101 and / or S+R5_.85 and / or source S+R2_.46. The combination of which sources are lost from which receiver varies in the current setup.

Tracking through debug logs it appears that the change in the unicast routing is not always picked up.

The attached logs are from pimd_4 and pimd_5.

CoolSq_10.200.55.119_20160615-160927_pimd_4.txt
SqCool_10.200.55.106_20160615-161131_pimd_5.txt

Events of interest are on the pimd_4 router log (I've bolded the timestamps of the symptoms):

  • 23:13:58 (approx) link is restored
  • 23:13:59.034 : first report of multicast packet being received over the wrong iif after the link restored.
  • 23:13:59.232 : PIM neigbour restored with pimd_5
  • 23:13:59.471 : first join/prune received from pimd_5
  • 23:14:20.423 : it looks like the unicast_routing_timer has triggered and the various group routes are being tested. Both the sources .85 and .101 return a rpf of vif 1, but you can see by comparing the previous and next summaries that .85 has changed Incoming : I.. to Incoming : I.. while .101 remains Incoming : I..
  • 23:14:40 : it looks like the source .101 has the incoming interface updated correctly but there are still subsequent reports of the multcast coming in on the wrong interface 23:14:57.732 Wrong iif: src 10.200.55.101, dst 239.0.4.1, iif 1

It is as if the "change_interfaces" call on line 641 of timer.c isn't actually causing the change to be applied to the group, maybe it needs the force change flag set?

@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 23, 2016

Definitely something fishy going on there, well spotted! Not sure what's going on, but I had a few pull requests regarding PIM Join handling pending that now have been merged to the GIT master. In case you want to try it out ...

I'm setting up a CORE network now to see if I can reproduce your problem, can't make any promises though. So anything you can dig up or patch/try yourself is most welcome!

@masaraksh79

This comment has been minimized.

masaraksh79 commented Jun 23, 2016

Hey. Have now tried the code changes from git-master on the system that @whariki has mentioned and found out that unfortunately these didn't fix the issue.

@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 23, 2016

OK, thanks for the followup @masaraksh79, sorry it didn't help!

This is an awesome bug report, but I'll have to be frank with you guys. First of all I'm on vacation, second I'm maintaining pimd as a bit of a side project, or hobby. The whole project relies on features and bug fixes provided by the community. This repository at GitHub is more of a focal point for that development.

Anyway, I'll try to reproduce your problem here and start reading up on the RFC's to figure out where the problem lies, but I wouldn't rely on me for a solution to the problem you've found. It seems to me though, that you're at a very good point yourself to figure it out and propose a solution.

@whariki

This comment has been minimized.

whariki commented Jun 24, 2016

That's understood. Keen to contribute. Will keep working on it my end and
post any progress.
On 24/06/2016 10:49 am, "Joachim Nilsson" notifications@github.com wrote:

OK, thanks for the followup @masaraksh79 https://github.com/masaraksh79,
sorry it didn't help!

This is an awesome bug report, but I'll have to be frank with you guys.
First of all I'm on vacation, second I'm maintaining pimd as a bit of a
side project, or hobby. The whole project relies on features and bug fixes
provided by the community. This repository at GitHub is more of a focal
point for that development.

Anyway, I'll try to reproduce your problem here and start reading up on
the RFC's to figure out where the problem lies, but I wouldn't rely on me
for a solution to the problem you've found. It seems to me though, that
you're at a very good point yourself to figure it out and propose a
solution.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#79 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AEoj9fBTL2mzRn1UkUhIoWo0AHGgVgwjks5qOw1sgaJpZM4I8aT8
.

@masaraksh79

This comment has been minimized.

masaraksh79 commented Oct 21, 2016

@troglobit seems that we've not progressed much on this issue ourselves since June. It looks like our firm is willing to pay fixing the problem in this project, mostly because it own resource is constrained to other activities...

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Oct 27, 2016

Tested this a bit and at least in one test environment following helped.

diff --git a/timer.c b/timer.c
index de9a11e..e69129c 100644
--- a/timer.c
+++ b/timer.c
@@ -617,8 +617,8 @@ void age_routes(void)
                            }

                            /* iif info found */
-                           if ((srcentry_save.incoming != mrt_srcs->incoming) ||
-                               (srcentry_save.upstream != mrt_srcs->upstream)) {
+                           if ((srcentry_save.incoming != mrt_srcs->source->incoming) ||
+                               (srcentry_save.upstream != mrt_srcs->source->upstream)) {
                                /* Route change has occur */
                                update_src_iif = TRUE;
                                mrt_srcs->incoming = mrt_srcs->source->incoming;
@@ -643,7 +643,7 @@ void age_routes(void)
                                          mrt_srcs->joined_oifs,
                                          mrt_srcs->pruned_oifs,
                                          mrt_srcs->leaves,
-                                         mrt_srcs->asserted_oifs, 0);
+                                         mrt_srcs->asserted_oifs, MFC_UPDATE_FORCE);

                    check_spt_threshold(mrt_srcs);
@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 4, 2017

So, what do you guys think, should we merge @idismmxiv patch to the next release?

@troglobit troglobit added this to the 2.4.0 milestone Jun 4, 2017

troglobit added a commit that referenced this issue Jun 6, 2017

Fix #79: Use of wrong ptr in route change comparison
Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 6, 2017

Mika's (@idismmxiv) patch is now on a separate branch, ready for merge.
Pending approval from the community, i.e. you guys.

@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 8, 2017

Hey, @whariki and @masaraksh79 maybe you've moved on to other things, if not it would be really appreciated if you could weigh in on the patch by @idismmxiv before I merge it to the upcoming pimd v3.0.

Thank you in advance! :)
/Joachim

@masaraksh79

This comment has been minimized.

masaraksh79 commented Jun 9, 2017

@troglobit, sorry for not replying with feedback to the forum on @idismmxiv patch. Tried this one last year but the test system didn't show an improvement, moreover, it had created more "noise". I'll clarify the result detail with my QA and would attempt to provide a better comment. Bottom line - it didn't solve my issue.

@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 9, 2017

@masaraksh79 No problem, thank you for checking back in! 😃

OK, that means I'll hold off on merging the patch (speculatively) until we know more.

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Aug 17, 2017

@masaraksh79. Just want to check again, regarding my patch proposal. You said that you tested similar change year ago, but did you notice the another change I made (not MFC_UPDATE_FORCE flag usage):

-                           if ((srcentry_save.incoming != mrt_srcs->incoming) ||
-                               (srcentry_save.upstream != mrt_srcs->upstream)) {
+                           if ((srcentry_save.incoming != mrt_srcs->source->incoming) ||
+                               (srcentry_save.upstream != mrt_srcs->source->upstream)

It looks to me like there is clear defect as in line 618 and 619. In line 608 mrt_srcs->source is passed as param, but in comparison in lines 618 and 619 in made against mrt_srcs->incoming.

@masaraksh79

This comment has been minimized.

masaraksh79 commented Aug 17, 2017

The last test I did with this patch was in November last year. Unfortunately, did not have time to get back to it yet. Cannot say whether it has improved or made things worse but it clearly didn't solve our setup issues. Nevertheless, have no negative feedback about this code change. Cheers.

@masaraksh79

This comment has been minimized.

masaraksh79 commented Aug 17, 2017

Ah, BTW, a comment I've received from the customer was: issue was "significantly mitigated" by configuring the RP to be in vicinity of the network traffic multicast source (I recall they have been using devices with PIM version older than 2.3.2; definitely -- without the last patch)

@troglobit

This comment has been minimized.

Owner

troglobit commented Aug 20, 2017

@idismmxiv @zenvirta @masaraksh79 ... sounds to me like we can safely merge 0ba2966 to master for the 3.0 release then. Yet keeping this issue open?

@masaraksh79

This comment has been minimized.

masaraksh79 commented Aug 20, 2017

from my point of view, there is no issue with the patch. Would appreciate if you do keep the issue open, yes.

@troglobit troglobit closed this in 0ba2966 Aug 20, 2017

troglobit added a commit that referenced this issue Aug 20, 2017

Merge pull request #110 from troglobit/fix-mrt-iif-change
Fix #79: Use of wrong ptr in route change comparison

@troglobit troglobit reopened this Aug 20, 2017

@troglobit

This comment has been minimized.

Owner

troglobit commented Aug 20, 2017

Merged. Thank you for taking the time to answer :)

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Aug 24, 2017

Thanks masaraksh79. Looks like there are still some issues in change_interfaces function as parameters are not used as they should. I'll try to look into that some day.

@searlernz

This comment has been minimized.

searlernz commented Feb 8, 2018

I have been trying to investigate this further using 2.3.2 build and some additional debug logging in the age_routes function to see if there is anything suspicious in the region noted by @idismmxiv.
Still haven't found anything but I have the following information about a similar breakage:

  1. R2_46 cannot see multicast from either R4_101 or R3_117.
  2. pimd_5 is sending encapsulated null register messages to Cisco_10 (the RP) via pimd_4 and receiving register_stop messages via pimd_7 (asymmetric routing)
  3. Multicast packets from R4_101 are arriving at pimd_9 from pimd_7 but are not being sent to Cisco_10
  4. Cisco_10 is sending JOIN messages to pimd_9 for source R4_101
  5. pimd_9 is indicating that there is a join on the interface to Cisco_10 and that interface is outbound
  6. pimd_9 is indicating SPT CACHE SG for the Source 101 S, G flags
  7. pimd_9 is logging receipt of the JOIN messages and routing status indicates that the interface to pimd_7 is Input and that it has received prune on the interface to pimd_7
  8. Cisco_10 indicates that the RPF for source 101 is pimd_7 and the forwarding is to Cisco_2
  9. All of the pimd elements have traceroute for Cisco_10(RP) via Cisco_2
  10. pimd_5 and pimd_7 are exchanging ASSERTs about source R4_101 and pimd_7 is indicating asserted state (the loser).

Would unencapsulated multicast packets from pimd_7 be passing through the pim daemon or would they be expected to be handled entirely within the kernel? It looks like router should be passing packets from pimd_7 to Cisco_10 but it is dropping them for some reason.
NB. All of the above points referring to R4_101 also apply to R3_117 but I have a little less visibility into the packet flow for that source.
Attached syslog and pimd -r output for pimd_9 in the broken state (changed IP addresses to match the naming in the diagram for the issue)
pimd_9_syslog.txt
pimd_status.txt

Any ideas?

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Feb 8, 2018

There was interesting log entry in every 3.30minutes: "S,G entry changed (timeout) causing interface switch at iif 1".
I suppose you have added that entry to age_routes, but as it keeps coming over and over again, change does not make affect for some reason. Could you add diff or something to show what are your changes and in what places they are?

@searlernz

This comment has been minimized.

searlernz commented Feb 8, 2018

Changes here

timer.c.txt

@searlernz

This comment has been minimized.

searlernz commented Feb 11, 2018

I had a slight concern that the "S,G entry changed (timeout) ...." messages were caused by the link break/re-make actions taken to cause the issue but I can confirm that they are still occurring several days later with no network linking events on the system and the multicast routing still hasn't recovered. I suspect that they could be a fertile area for further investigation.

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Feb 12, 2018

I tried to reproduce the problem but as I do not have Cisco box for RP I had to use pimd for RP as well. Unfortunately there seems to be some issues in this kind of scenario in pimd RP functionality (but that is an another story).

Anyway by doing following <two changes, link break/remake looked like working.

  • First it still looks to me that patch #110 is mandatory.
  • Second, in timer.c function check_spt_threshold removes idle kernel cache entry but leaves? pimd state to previous state. So, I suggest that you try to remove line 260
    delete_single_kernel_cache(mrt, kc);

you can also use command "ip mroute show all" to see how does multicast routing table look like from kernel point of view. If that is very different compared to pimd view of active routes, then situation is not good.

@searlernz

This comment has been minimized.

searlernz commented Feb 12, 2018

Thanks for the update, that looks like a promising line of inquiry (and small enough that my fears of causing regressions elsewhere are reduced)

@searlernz

This comment has been minimized.

searlernz commented Feb 14, 2018

Unfortunately, the devices running pimd have the truncated busybox implementation of ip so they don't support the ip mroute show all command.
This is the output for cat /proc/net/ip_mr_cache (little-endian)

Group        Origin        Iif      Pkts    Bytes         Wrong    Oifs
010400EF 2EFBC80A 0          1      100            0           1:1  
010400EF 7537C80A  0      59447  5944700    59442
010400EF 6537C80A  0     118895 11889500  118884

Group 239.0.4.1 sources 10.200.55.101 and 10.200.55.117 (R3_117 and R4_101) have no Oif so I think that you are correct in suspecting a mismatch between the kernel and pimd views that is not healing.

Will try your proposed fixes next.
Thanks

@searlernz

This comment has been minimized.

searlernz commented Feb 14, 2018

Also worth noting that the Wrong count is going up with received packets and the pimd_7 device has Oif pointing at pimd_9 for the same S,G combinations so everything points at failing to update kernel cache correctly.

@searlernz

This comment has been minimized.

searlernz commented Feb 15, 2018

Initial testing showing promise. Not exactly fast re-convergence (a few minutes) but so far not managed to get broken forever.

@searlernz

This comment has been minimized.

searlernz commented Feb 15, 2018

I think that testing is now complete and successful for this specific use-case (I added a debug log at line 260 in place of the deletion and that path is definitely executed during testing so the change in behaviour appears to be necessary for the fix. The patch #110 log is not appearing during testing but I did some due diligence review of the code and agree that the code makes no sense at all without it. (It looks like a cut-paste and modify error based on line 412/413 so the pattern should be
save existing route (lines 646,647)
call set_incoming to determine new incoming/upstream (line 648)
test if set_incoming has changed things and then update (lines 662,663 - pre patch - this is checking the wrong field for changes)

NB Lines 666 and 667 almost certainly hide the logic error and cause the correct behaviour to occur on the next pass through age_routes so it is arguably not necessary to have the patch in order to fix this issue.

For our use-case of very, very stable sparse group membership - basically every element joins once
and then stays a member until it reboots, this fix appears to work and is only executed on network topology change and appears to have no side-effects. Not sure what regression tests would be needed to verify other use-cases where members/sources are more dynamic (the intent of the original code appears to be to clean up kernel caching for sources that are no longer in the group)

@troglobit

This comment has been minimized.

Owner

troglobit commented Feb 16, 2018

This is great news, thank you for taking the time to debug this @searlernz ! Do you think you could whip up a patch or a pull-request to summarize the findings? Most of us coder types find it much easier to discuss around a piece of code rather than line numbers ... :)

@searlernz

This comment has been minimized.

searlernz commented Feb 18, 2018

multicast_route_iif_update_fix.txt@@ -617,8 +616,8 @@ void age_routes(void)

The patch file contains three items (diff against 2.3.2) :

  1. #110 fix which has already been incorporated
--- a/timer.c
+++ b/timer.c
@@ -617,8 +616,8 @@ void age_routes(void)
 			    }
 
 			    /* iif info found */
-			    if ((srcentry_save.incoming != mrt_srcs->incoming) ||
-				(srcentry_save.upstream != mrt_srcs->upstream)) {
+			    if ((srcentry_save.incoming != mrt_srcs->source->incoming) ||
+				(srcentry_save.upstream != mrt_srcs->source->upstream)) {
 				/* Route change has occur */
 				update_src_iif = TRUE;
 				mrt_srcs->incoming = mrt_srcs->source->incoming;

@@ -643,7 +642,7 @@ void age_routes(void)
 					  mrt_srcs->joined_oifs,
 					  mrt_srcs->pruned_oifs,
 					  mrt_srcs->leaves,
-					  mrt_srcs->asserted_oifs, 0);
+					  mrt_srcs->asserted_oifs, MFC_UPDATE_FORCE);
 
 		    check_spt_threshold(mrt_srcs);

NB. I am certain that the change to the conditional (first change block) corrects a logic error but would not change the behaviour of pimd (possibly apart from deferring acting on a route change for 20 seconds). The second change block forces route update even if the route appears not to have changed so that seems like it is important.

  1. Fix to stop kernel cache routes getting out of synch with pimd routes with a debug log to confirm that the path is executed when attempting to reproduce issue
--- a/timer.c
+++ b/timer.c
@@ -257,10 +257,9 @@ static void check_spt_threshold(mrtentry_t *mrt)
 
 	status = k_get_sg_cnt(udp_socket, kc->source, kc->group, &kc->sg_count);
 	if (status || prev_bytecnt == kc->sg_count.bytecnt) {
-	    /* Either (for whatever reason) there is no such routing
-	     * entry, or that particular (S,G) was idle.  Delete the
-	     * routing entry from the kernel. */
-	    delete_single_kernel_cache(mrt, kc);
+	   logit(LOG_DEBUG, 0, "Bypassing kernel cache deletion for S,G route %s,%s - idle",
+	         inet_fmt(kc->source, s1, sizeof(s1)),
+	         inet_fmt(kc->group, s2, sizeof(s2)));
 	    continue;
 	}
  1. A small modification to the pimd -r output to display the RP candidate table (otherwise the RP information is only shown for active groups which makes failover troubleshooting a bit harder than it needs to be. NB. Not required for the fix but I found it helpful
--- a/debug.c
+++ b/debug.c
@@ -272,6 +272,7 @@ void fdump(int i __attribute__((unused)))
     if (fp) {
 	dump_vifs(fp);
 	dump_pim_mrt(fp);
+	dump_rp_set(fp);
 	fclose(fp);
     }
 }
@searlernz

This comment has been minimized.

searlernz commented Feb 22, 2018

Any idea when this change might be incorporated in a formal release?

@troglobit

This comment has been minimized.

Owner

troglobit commented Feb 22, 2018

@searlernz If possible, I'd like to first get some peer review of 2. from @idismmxiv, but a v2.4.0 release is long overdue.

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Feb 23, 2018

I comment only step 2 here as one is already part of master and 3 is just extra debugging,
As said earlier, given change seems to fix the problem described here. Also this looks fairly safe change.

Only thing I am wondering here is this status comparison in if statement. In what circumstances status can be non zero, and what would happen in such scenario? At least I don not know.

@troglobit

This comment has been minimized.

Owner

troglobit commented Feb 23, 2018

@idismmxiv status is non-zero only if we fail to read the SG stats from the kernel. Like the comment says, "Either (for whatever reason) there is no such routing entry, or that particular (S,G) was idle ..."

But if there is no such routing entry there's no need to delete it, right?

The latter case, when prev_bytecnt == kc->sg_count.bytecnt, there has been no activity (idle), and that was reason enough to remove it. Which seems bogus without some other cleanup mechanism to also signal neighbor routers that we no longer want the S,G ... but now I'm only speculating.

@idismmxiv

This comment has been minimized.

Contributor

idismmxiv commented Feb 23, 2018

@troglobit, you are right about no need to remove unexisting route.

But there might be several reasons why given mrt has been idle. Sender has not sent anything for 20 secs or route is broken temporarily. Receiver still might want to receive that stream if any data happens to become available. So I think its up to receiver to tell us when to quit forwarding.

I think following is what actually happened in this issue. Route was lost and it took over 20 sec to recover. That caused given MRT entry to be removed from kernel. When next packet entered to kernel, it caused cache__miss. Put incoming packet come from different interface than before, and it was handled in somehow unwanted manner and that caused multicast stream to never recover.

Now after proposed change, mrt entry is not removed from kernel and rerouted multicast packet does not cause cache_miss, but wrong_iif event, and that code path enables routing recovery.

So I expect there to be several different ways to fix this, but I do not know what would be the right one.

@troglobit

This comment has been minimized.

Owner

troglobit commented Feb 24, 2018

@idismmxiv Agreed, that explanation sounds very plausible. I guess we should start by applying this patch, to at least let routing recover.

troglobit added a commit that referenced this issue Feb 24, 2018

Mitigate #79: Don't delete MRT entry on idle activity
Signed-off-by: Joachim Nilsson <troglobit@gmail.com>
@troglobit

This comment has been minimized.

Owner

troglobit commented Feb 24, 2018

Patch 2. merged in 7ef29de, with extra comment referencing this issue.

Hopefully some brave soul steps up to investigate this in more detail and can propose a better fix, or address the long reconfiguration time.

troglobit added a commit that referenced this issue Feb 24, 2018

Ref. #79: Display RP candidate table in `pimd -r` output (debugging)
Signed-off-by: Joachim Nilsson <troglobit@gmail.com>

@troglobit troglobit removed this from the 3.0 milestone Feb 25, 2018

@masaraksh79

This comment has been minimized.

masaraksh79 commented Jun 13, 2018

@troglobit ,FYI, our customer has rolled out radio firmware with kernel route fix in 2.3.2 and they're quite satisfied of its performance on their larger system, from the feedback I've received today. Good stuff!

PS. @searlernz

@troglobit

This comment has been minimized.

Owner

troglobit commented Jun 13, 2018

That's great news, @masaraksh79, thank you so much for the update! 😃👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment