Multicast group not appearing on Rendezvous Point router #211

Open
stormshield-rlibaert opened this issue Dec 13, 2021 · 6 comments

@stormshield-rlibaert
Contributor

stormshield-rlibaert commented Dec 13, 2021

Hello,

We may have found a bug where pimd, acting as RP, does not learn about a new route.
We use pimd version 3.0-beta1.

We have the following topology:
[Image: topology diagram, pasted at 2021-12-13 11-25]

  • pc_1 is the receiver
  • pc_2 is the source
  • firewall_2 is the RP (Rendezvous Point) router
  • firewall_3 is the DR (Designated Router)

When we start sending traffic with iperf, pimctl show mrt shows that a new multicast route is created on the DR, but not on the RP. Similarly, when the receiver is started, a new route is shown on both firewall_1 and the RP. However, as the RP cannot match the requested multicast group, no data is received on pc_1.

We investigated and found that this commit may have introduced the bug: edb0aac
Related issue: #67

We tried the following patch: stormshield-rlibaert@cd3f7e6, and it seems to fix the issue.

We would like your advice on whether this is a correct way to fix the problem. If so, should we open a merge request?
Also, we are not sure whether it is related, but it seems that the "interval" field of spt-threshold is ignored. What do you think?

@troglobit
Owner

Hi,

you've stumbled into the last remaining blocker issue before 3.0 GA. I still have a few commits in my patch queue that I haven't pushed yet, and even some unstable fixes for what I believe is very close to your issue. I've been meaning to finalize this, and add a test case for it, but have not had the time. Hoping for Christmas break ...

A few questions:

  • Do you get this early after starting up the pimd instances, or does it happen after "a while"?
  • If it's the former, I believe you're seeing what I am, which seems to be lack of route updates on RP election changes
  • What multicast group(s) are you using for testing? The code you changed should only affect the reserved PIM-SSM group range 232.0.0.0/8
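
For reference, a quick way to check whether a test group falls inside that reserved range. This is only a minimal Python sketch, not pimd code: the helper name is_ssm_group and the sample addresses are made up for illustration.

```python
import ipaddress

# Reserved PIM-SSM group range mentioned above (232.0.0.0/8).
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(group: str) -> bool:
    """Return True if `group` is an IPv4 address inside 232.0.0.0/8.

    Hypothetical helper for illustration; raises ValueError if
    `group` is not a valid IP address string.
    """
    return ipaddress.ip_address(group) in SSM_RANGE


if __name__ == "__main__":
    # Sample groups chosen arbitrarily; replace with the ones used in testing.
    for g in ("232.1.2.3", "239.1.1.1", "225.1.2.3"):
        verdict = "inside" if is_ssm_group(g) else "outside"
        print(f"{g} is {verdict} {SSM_RANGE}")
```

If the groups used for testing land outside 232.0.0.0/8, a change that only touches the SSM path would not be expected to affect them.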

@stormshield-rlibaert
Contributor Author

stormshield-rlibaert commented Dec 14, 2021

Hi,

Thank you for your fast answer! 😃

Indeed, we get this as soon as pimd is started (actually after a few seconds, just the time to switch terminals and launch commands).
Our tests were performed in PIM-SM. I realize that this makes the fix kinda strange indeed. I'll go and take a look in the meantime.

@stormshield-rlibaert
Contributor Author

stormshield-rlibaert commented Dec 14, 2021

Looks like I was mistaken: the fix was not working on our side. For now, just reverting the suspected commit makes pimd work as we expect (for reference: stormshield-rlibaert@203b2a8). And it makes more sense too.

Edit: in our case find_routes returns mrt == NULL, so the call to switch_shortest_path must be done before checking for return, otherwise it doesn't work.

@troglobit
Owner

Yeah that's not right either, the SPT switchover should follow its established config. I've been sniffing around that same code myself.

The problem seems to be exactly what I've seen: when booting up, the routers try to elect an RP, and any PIM Joins sent before that has stabilized are prone to this problem. So my theory right now, if I remember correctly from when I was last debugging this a few months back, is that pimd doesn't resend all PIM Joins (and Leaves) after a new RP election. Once that works, we can have a closer look at reducing the convergence time.

@stormshield-rlibaert
Contributor Author

Hello Joachim,

I investigated the issue further and, as you expected, the proposed revert is not good. I found out that the RP did create the SG as I wanted, but it also started to send more registers than needed, exactly as described in #128.

After some research, I came up with another solution, which is basically a fix for the issue I just mentioned.
stormshield-rlibaert@099748b

This works pretty well for my test case. However, I am pretty certain that it has some unwanted side effects (PMBR, BorderBit, Null Register, ...). That being said, fixing the issue further would require much more intrusive changes. I was thinking about reworking the receive_pim_register function to make it more like what we can find in FRRouting.

@troglobit
Owner

Nice! Yeah I actually have a backlog of things I haven't pushed yet, some of it addressing issues you've mentioned here. Been busy with lots of other projects, however, so haven't got back to this in a while. I'd like to set up a test case for verifying convergence and such bits.

Please send a PR for stormshield-rlibaert/pimd@099748b; that looks much better than what we have now :)

Meanwhile, I'll reopen #128 so we can close it properly with your PR.
