Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race condition when new transports are getting created #47

Closed

Conversation

james-maloney
Copy link
Contributor

Every so often we would see "Transport with same transportId already exists". Tracked it down to when we a user joined a room with multiple existing consumers. There is a race condition where _setupTransport was getting called multiple times. This PR adds some state to protect against calling _setupTransport multiple times.

@ibc
Copy link
Member

ibc commented Oct 25, 2018

Thanks. This seems interesting. Let please discuss some things before accepting this PR:

What would happen in the following scenario?

  • Create a recvTransport.
  • Add consumer1 to it via consumer1.receive(recvTransport).
  • Since it's the first Consumer, _setupTransport() is called, which sets this._creatingTransport = true and emits @needcreatetransport, which is handled by Transport.js (who sends the createTransport request to mediasoup).
  • But that will take some time before getting the response.
  • In the meanwhile, add consumer2 via consumer2.receive(recvTransport).
  • It will find this._creatingTransport set to true so it won't call _setupTransport() and will go directly into this code:
const remoteSdp = this._remoteSdp.createOfferSdp(
	Array.from(this._consumerInfos.values()));
const offer = { type: 'offer', sdp: remoteSdp };

The problem here is that it's using this._remoteSdp which is NOT yet ready (it will be ready once we get the network response to createTransport, whitin the _setupTransport() method. So IMHO it would fail, am I wrong?

@ibc
Copy link
Member

ibc commented Oct 25, 2018

BTW, the bug you describe is real, sure. But it should not happen. Please, check Transport.js. It holds a this._commandQueue which ensures that some API calls are called sequentially (and never in parallel). So, for example, when consumer.receive(transport) is called, this happens:

  • consumer.receive(transport) calls to transport.addConsumer(this) (in Transport.js).
  • transport.addConsumer() does this:
return this._commandQueue.push('addConsumer', { consumer });
  • When the queue ends its current task (if any) it emits exec passing as argument addConsumer and the consumer itself.
  • This is executed in transport._execCommand(command, promiseHolder), which waits for the promise returned by this._execAddConsumer(consumer) to finish before it runs any other task.
  • So, if two calls to transport.addConsumer() happen at the same time, theoretically the second one should start once the first one ends, and the first one is supposed to create the mediasoup server-side transport, etc.
  • So the "Transport with same transportId already exists" error should not happen.

But it happens... so there is a bug somewhere.

@james-maloney
Copy link
Contributor Author

Regarding your first comment, you are correct. I think a better solution would be to keep track of the transport state in _setupTransport and if it's already setup just return the resolved promise.

something like this:

	_setupTransport()
	{
		if (this._pTransport !== null) {
			return this._pTransport;
		}

		logger.debug('_setupTransport()');

		this._pTransport = Promise.resolve()
			.then(() =>
			{
				// Get our local DTLS parameters.
				const transportLocalParameters = {};
				const sdp = this._pc.localDescription.sdp;
				const sdpObj = sdpTransform.parse(sdp);
				const dtlsParameters = sdpCommonUtils.extractDtlsParameters(sdpObj);

				// Let's decide that we'll be DTLS server (because we can).
				dtlsParameters.role = 'server';

				transportLocalParameters.dtlsParameters = dtlsParameters;

				// Provide the remote SDP handler with transport local parameters.
				this._remoteSdp.setTransportLocalParameters(transportLocalParameters);

				// We need transport remote parameters.
				return this.safeEmitAsPromise(
					'@needcreatetransport', transportLocalParameters);
			})
			.then((transportRemoteParameters) =>
			{
				// Provide the remote SDP handler with transport remote parameters.
				this._remoteSdp.setTransportRemoteParameters(transportRemoteParameters);
			});

		return this._pTransport;
	}

Regarding your second comment, that also seems correct but we do see the server error quite frequently (although it's quite hard to repro without production traffic). For some background we see the issue most frequently when we have a room with 3+ peers that are producing and 20-30+ peers who are only consuming.

@ibc
Copy link
Member

ibc commented Oct 25, 2018

Hi, as I said in my previous comment, checking any variable within _setupTransport() is not a valid solution. If two calls to handler.addConsumer() happen at the same time (before the first one completes), the second one would bypass the _setupTransport() code and directly move to the next promise in the chain, which uses the transport (but it wouldn't be setup yet!). So this is not about cosmetic stuff like this._transportCreating vs this._pTransport because both are the same solution that IMHO is not valid.

We'll check this issue. However, I've done some tests with the CommentQueue class and it seems to work fine. It guarantees that an enqueued task is just executed once the previous one finishes, so I've no idea yet regarding why the error happens. It just shouldn't.

To summarize: this should not happen so there is nothing to add in the handlers code. The CommandQueue in Transport.js is supposed to deal with this. If there is an error (I'm not saying that there is not), it's somewhere else.

@ibc
Copy link
Member

ibc commented Oct 25, 2018

OK, now I see what you mean with storing this._pTransport promise. Indeed it should work.

The problem here is that, again, this error is supposed to not happen at all since calls to handler methods are enqueued.

@james-maloney
Copy link
Contributor Author

Perfectly fine if you don't want to accept the PR but the promise change I just submitted will ensure that the second consumer gets a properly setup transport. In my above example I left out that _setupTransport always gets called, so if it's been previously called it returns the already resolved promise.

@ibc
Copy link
Member

ibc commented Oct 25, 2018

I know that this PR fixes the problem, but what it worries me is that it should not happen. Have you took a look to the CommandQueue stuff usage in Transport.js?

@james-maloney
Copy link
Contributor Author

I've briefly looked at CommandQueue and did not see anything obviously wrong. I'll spend some time today checking it out.

@havfo
Copy link

havfo commented Oct 25, 2018

In _handleCommand at line 71 CommandQueue. this._busy is set to false before the queue is shifted. Wouldn't this in rare cases make it possible for another command to swoop in and try to execute the command again before the queue is shifted?

@james-maloney
Copy link
Contributor Author

@havfo since there is only one thread that should not be possible. The event loop won't take up a new task until the end of that block is executed.

@ibc
Copy link
Member

ibc commented Oct 25, 2018

Yes, that line 71 cannot be wrong.

Guys, yesterday I did some tests with the CommandQueue class and wasn't be able to reproduce the issue. However the test may not be good enough. I'll try to share it tomorrow. If there is a but, may be we all can figure it out.

Thanks!

@ibc
Copy link
Member

ibc commented Oct 26, 2018

Hi, here the test app I've done. I cannot reproduce the issue here in any way. But I would like more eyes on this code. May be it's not good enough to reproduce what it happens in real life when using mediasoup-client:

https://github.com/ibc/CommandQueueTest

@ibc
Copy link
Member

ibc commented Oct 27, 2018

Let's please handle this issue in #48. I've some news in there.

@ibc
Copy link
Member

ibc commented Oct 30, 2018

Closing as the result of #48.

@ibc ibc closed this Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants