Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(ABORT) RTC::RTCP::FeedbackRtpTransport::AddPacket() | failed assertion `baseSet': base not set #1336

Closed
ibc opened this issue Feb 20, 2024 · 4 comments · Fixed by #1339
Closed
Assignees
Labels
Milestone

Comments

@ibc
Copy link
Member

ibc commented Feb 20, 2024

  • Operating system: Linux Debian
  • mediasoup version:3.13.19

Issue description

Logs:

2024-02-20T07:08:16.181Z mediasoup:ERROR:Worker (stderr) (ABORT) RTC::RTCP::FeedbackRtpTransport::AddPacket() | failed assertion `baseSet': base not set
2024-02-20T07:08:16.569Z mediasoup:ERROR:Channel Producer Channel error: Error: read ECONNRESET
2024-02-20T07:08:16.574Z mediasoup-demo-server:WARN:Room _createDataConsumer() | failed:Error: request timeout at Timeout._onTimeout (/var/www/v3demo.mediasoup.org/node_modules/protoo-server/lib/Peer.js:156:14) at listOnTimeout (node:internal/timers:569:17) at process.processTimers (node:internal/timers:512:7)
2024-02-20T07:08:16.784Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:1695953, code:null, signal:SIGABRT]
2024-02-20T07:08:16.823Z mediasoup-demo-server:ERROR mediasoup Worker died, exiting  in 2 seconds... [pid:1695953]

Core dump:

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0774b3e440 (LWP 1695953))]
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f0774bcad9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f0774b7bf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f0774b66472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x000055b064e4bd6e in RTC::RTCP::FeedbackRtpTransportPacket::AddPacket(unsigned short, unsigned long, unsigned long) ()
#5  0x000055b064dd9ade in RTC::TransportCongestionControlServer::FillAndSendTransportCcFeedback() ()
#6  0x000055b0650dcf3e in uv.run_timers ()
#7  0x000055b0650e0c88 in uv_run ()
#8  0x000055b064c9a849 in DepLibUV::RunLoop() ()
#9  0x000055b064ca4d62 in Worker::Worker(Channel::ChannelSocket*) ()
#10 0x000055b064c998e4 in mediasoup_worker_run ()
#11 0x000055b064c98a2c in main ()

Clearly a regression in PR #1088 " Make transport-cc feedback work similarly to libwebrtc".

@ibc ibc added the bug label Feb 20, 2024
@ibc ibc added this to the v3 updates milestone Feb 20, 2024
@ibc ibc assigned ibc and jmillan Feb 20, 2024
@ibc
Copy link
Member Author

ibc commented Feb 20, 2024

CC @penguinol

@ibc
Copy link
Member Author

ibc commented Feb 20, 2024

I strongly fail to see how this crash can happen:

  • In TransportCongestionControlServer there is a timer that invokes its FillAndSendTransportCcFeedback().
  • FillAndSendTransportCcFeedback() is the only method that calls FeedbackRtpTransportPacket::SetBase() and FeedbackRtpTransportPacket::AddPacket() and it always calls the former first:
    	void TransportCongestionControlServer::FillAndSendTransportCcFeedback()
      {
      	MS_TRACE();
    
      	if (!this->transportWideSeqNumberReceived)
      	{
      		return;
      	}
    
      	auto it = this->mapPacketArrivalTimes.lower_bound(this->transportCcFeedbackWideSeqNumStart);
    
      	if (it == this->mapPacketArrivalTimes.end())
      	{
      		return;
      	}
    
      	// Set base sequence num and reference time.
      	this->transportCcFeedbackPacket->SetBase(this->transportCcFeedbackWideSeqNumStart, it->second);
    
      	for (; it != this->mapPacketArrivalTimes.end(); ++it)
      	{
      		auto result =
      		  this->transportCcFeedbackPacket->AddPacket(it->first, it->second, this->maxRtcpPacketLen);
  • So it's impossible that FeedbackRtpTransportPacket::AddPacket() is called before FeedbackRtpTransportPacket::SetBase() is called, so this->baseSet is always true when FeedbackRtpTransportPacket::AddPacket() and hence the assertion should not fail:
    MS_ASSERT(this->baseSet, "base not set");
  • And nowhere else this->baseSet in FeedbackRtpTransportPacket class is reset to false.

@ibc
Copy link
Member Author

ibc commented Feb 20, 2024

OPPPPS, HERE!

void TransportCongestionControlServer::TransportDisconnected()
	{
		MS_TRACE();

		switch (this->bweType)
		{
			case RTC::BweType::TRANSPORT_CC:
			{
				this->transportCcFeedbackSendPeriodicTimer->Stop();

				// Create a new feedback packet.
				this->transportCcFeedbackPacket.reset(new RTC::RTCP::FeedbackRtpTransportPacket(0u, 0u));

				break;
			}

			default:;
		}
	}

Here we are overriding this->transportCcFeedbackPacket with a fresh packet so its baseSet property is of course false. However we are also stopping the timer so still I don't understand how the issue can happen.

@ibc
Copy link
Member Author

ibc commented Feb 20, 2024

Oh... we are reseting the packet in many other places!!!!! here the bug!!!

				case RTC::RTCP::FeedbackRtpTransportPacket::AddPacketResult::MAX_SIZE_EXCEEDED:
				{
					// This should not happen.
					MS_WARN_DEV("transport-cc feedback packet is exceeded");

					// Create a new feedback packet.
					this->transportCcFeedbackPacket.reset(new RTC::RTCP::FeedbackRtpTransportPacket(
					  this->transportCcFeedbackSenderSsrc, this->transportCcFeedbackMediaSsrc));
				}

				case RTC::RTCP::FeedbackRtpTransportPacket::AddPacketResult::FATAL:
				{
					// Create a new feedback packet.
					this->transportCcFeedbackPacket.reset(new RTC::RTCP::FeedbackRtpTransportPacket(
					  this->transportCcFeedbackSenderSsrc, this->transportCcFeedbackMediaSsrc));

					// Use current packet count.
					// NOTE: Do not increment it since the previous ongoing feedback
					// packet was not sent.
					this->transportCcFeedbackPacket->SetFeedbackPacketCount(
					  this->transportCcFeedbackPacketCount);

					break;
				}
			}

ibc added a commit that referenced this issue Feb 20, 2024
- Fixes #1336

### Details

- Regression introduced in PR #1088, which has many issues.
- Basically there are cases in which `this->transportCcFeedbackPacket` is reset so it's a complete new packet and hence its `baseSet` is `false`.
- However in `TransportCongestionControlServer::FillAndSendTransportCcFeedback()` in which we just call `this->transportCcFeedbackPacket.SetBase()` once at the top, but then we enter a loop in which the packet maybe full so it's sent (or other things) and hence we **reset** the packet while in the loop. And then at the top of the loop `this->transportCcFeedbackPacket.AddPacket()` is called so the assertion fails because its `SetBase()` was not called on that fresh packet.
- Also add a missing `break` in a `case` block in that loop.
- Also set proper packet count in all cases in which we reset `this->transportCcFeedbackPacket`.

### TODO

This is not done yet. The issue is mainly identified but I don't know yet how to properly fix it without doing other wrong things. Basically I'm not sure if the whole logic is ok after PR #1088:

- As said before we only set the base of the packet once at the beginning of the loop, and such a `SetBase()` is called with `this->transportCcFeedbackWideSeqNumStart` and a computed `timestamp`.
- However, within the loop below, the packet may become full and hence `SendTransportCcFeedback()` is called, which updates `++this->transportCcFeedbackPacketCount` and `this->transportCcFeedbackWideSeqNumStart`, and **also** resets the packet (so missing base again).
- So in order to fix the crash, we must call `SetBase()` again, but which which values? Because the values given initially at the top of `TransportCongestionControlServer::FillAndSendTransportCcFeedback()` were the value of `this->transportCcFeedbackWideSeqNumStart` at that time and a timestamp. But that `this->transportCcFeedbackWideSeqNumStart` has now changed, so then what?
- Basically every time `this->transportCcFeedbackPacket.reset()` is called (and there are various cases in the whole `TransportCongestionControlServer.cpp` file) **we must** call ``SetBase()` on the fresh packet **with** proper values. And which are those proper values if we are within that loop?
@ibc ibc closed this as completed in #1339 Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

2 participants