Error: Received unexpected response with no commands pending #183

Closed
rwaldron opened this Issue Jul 14, 2016 · 11 comments

Comments

Projects
None yet
4 participants
@rwaldron
Contributor

rwaldron commented Jul 14, 2016

@kevinmehall @johnnyman727 can either of you shed light on this error? I reviewed the commit that introduced it, but there is no explanation as to why this would ever happen. I'm suddenly and inexplicably experiencing hitting this error with a program that has:

  • Port A:
    • A repeating analog read on 7
    • A repeating I2C device read
  • Port B:
    • A repeating analog read on 7

When I eliminate the read on Port B, the issue seems to go away, but I'm not confident in the cause/effect. When I add logging to show the contents of the buffer when the exception is thrown, it is always a perfectly valid response. Here are samples from two runs that resulted in failure:

// A valid analog read response
Port:  A
Reply Buffer:  <Buffer 84 88 03>

// A valid I2C response
Port:  A
Reply Buffer:  <Buffer 84 50 54 90 7f cc b0 71 6c>

These are representative of all failures that I saw: always a valid response.


A bigger issue that this has brought to light is that the communication protocol is flawed in that there is no "pin identity" data in the response for analog reads.

@johnnyman727

This comment has been minimized.

Show comment
Hide comment
@johnnyman727

johnnyman727 Jul 15, 2016

Contributor

I can't think of anything off of the top of my head. Do all of the pin.analogRead and i2c.read calls provide a callback?

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

Contributor

johnnyman727 commented Jul 15, 2016

I can't think of anything off of the top of my head. Do all of the pin.analogRead and i2c.read calls provide a callback?

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 15, 2016

Contributor

Do all of the pin.analogRead and i2c.read calls provide a callback?

Yes, the read operation is recursive, ie. approximately something like this:

function continuousAnalogRead(pin, callback) {
  tessel.port.A.analogRead(pin, (error, data) {
    callback(data);
    // Once the previous read is complete, request another one.
    // This is not great—Ideally I should be able to tell SAMD21 port 
    // that I want this pin to report its value at some frequency 
    continuousAnalogRead(pin, callback);
  });
}
continuousAnalogRead(7, data => runningAverage(data));
Contributor

rwaldron commented Jul 15, 2016

Do all of the pin.analogRead and i2c.read calls provide a callback?

Yes, the read operation is recursive, ie. approximately something like this:

function continuousAnalogRead(pin, callback) {
  tessel.port.A.analogRead(pin, (error, data) {
    callback(data);
    // Once the previous read is complete, request another one.
    // This is not great—Ideally I should be able to tell SAMD21 port 
    // that I want this pin to report its value at some frequency 
    continuousAnalogRead(pin, callback);
  });
}
continuousAnalogRead(7, data => runningAverage(data));
@johnnyman727

This comment has been minimized.

Show comment
Hide comment
@johnnyman727

johnnyman727 Jul 16, 2016

Contributor

I strongly suspect it's the issue I put forth above:

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

I might suggest checking the size of the incoming packet based on the command type (ie whether it's an analog read or i2c rx). If it's ever less than the expected size, I'm guessing that's the issue. Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

Contributor

johnnyman727 commented Jul 16, 2016

I strongly suspect it's the issue I put forth above:

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

I might suggest checking the size of the incoming packet based on the command type (ie whether it's an analog read or i2c rx). If it's ever less than the expected size, I'm guessing that's the issue. Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 17, 2016

Contributor

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

It's possible that some aspect of this guess is applicable. The communication "protocol" as it stands is broken by design. There can't be a mix of variable length messages and fixed length messages without sentinel/signal values on both ends of the variable length message; data should be encoded so as to never conflict with signal bytes... I could go on, but there's no point because fixing this now would be extraordinarily hard.

I've spent the day with this issue and I have a fix, but I'm going to stop working and take it up again tomorrow.

Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

Yes, I'm aware—I don't use the tessel.port. ... apis throughout Tessel-IO, because I don't like that analogRead returns a percent and not the actual value (huge resolution loss)

Contributor

rwaldron commented Jul 17, 2016

Perhaps the socket in JS emits the readable event before the entire response from the MCU gets there? So then there would be multiple readable events and the replyQueue would get shifted more than once for that one response.

It's possible that some aspect of this guess is applicable. The communication "protocol" as it stands is broken by design. There can't be a mix of variable length messages and fixed length messages without sentinel/signal values on both ends of the variable length message; data should be encoded so as to never conflict with signal bytes... I could go on, but there's no point because fixing this now would be extraordinarily hard.

I've spent the day with this issue and I have a fix, but I'm going to stop working and take it up again tomorrow.

Analog read will look like [24, BYTE_1, BYTE_2] and I2C read will look like [24, NUM_BYTES_READ...].

Yes, I'm aware—I don't use the tessel.port. ... apis throughout Tessel-IO, because I don't like that analogRead returns a percent and not the actual value (huge resolution loss)

@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 17, 2016

Contributor

I've changed gears on this... I happened to notice something that was worth further attention and discovered that this is being caused by duplicate messages being sent from the SAMD. Every failure I've observed can be traced to this. This actually bring me back to the issue I mentioned earlier:

A bigger issue that this has brought to light is that the communication protocol is flawed in that there is no "pin identity" data in the response for analog reads.

There's no way to filter an duplicate message without knowing which pin the read came from:

replyBuf@readable <Buffer >
available@readable <Buffer 84 7c 04 84 7c 04>
replyBuf before while:  <Buffer 84 7c 04 84 7c 04>
while:  0
port:  A
byte:  132
rply:  <Buffer 84 7c 04 84 7c 04>
A7 0.9249023437499999
while:  1
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
FAILURE
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
/usr/lib/node/tessel-export.js:287
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
Contributor

rwaldron commented Jul 17, 2016

I've changed gears on this... I happened to notice something that was worth further attention and discovered that this is being caused by duplicate messages being sent from the SAMD. Every failure I've observed can be traced to this. This actually bring me back to the issue I mentioned earlier:

A bigger issue that this has brought to light is that the communication protocol is flawed in that there is no "pin identity" data in the response for analog reads.

There's no way to filter an duplicate message without knowing which pin the read came from:

replyBuf@readable <Buffer >
available@readable <Buffer 84 7c 04 84 7c 04>
replyBuf before while:  <Buffer 84 7c 04 84 7c 04>
while:  0
port:  A
byte:  132
rply:  <Buffer 84 7c 04 84 7c 04>
A7 0.9249023437499999
while:  1
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
FAILURE
port:  A
byte:  132
rply:  <Buffer 84 7c 04>
/usr/lib/node/tessel-export.js:287
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 17, 2016

Contributor

I'm going to try something drastic: I'm going to remove the error and dump the bytes when that path is reached. I will test it by doing continuous reads as many pins as possible along with an I2C peripheral.

Contributor

rwaldron commented Jul 17, 2016

I'm going to try something drastic: I'm going to remove the error and dump the bytes when that path is reached. I will test it by doing continuous reads as many pins as possible along with an I2C peripheral.

@kevinmehall

This comment has been minimized.

Show comment
Hide comment
@kevinmehall

kevinmehall Jul 17, 2016

Member

Each sent request should be paired up with exactly one response (except for asynchronous interrupts), so it identifies responses and their length by keeping the outstanding requests in a queue. So if you're receiving two responses without making two requests, you need to fix the firmware, not filter it in JS. The protocol is unfinished in plenty of other ways, but that's not one of them.

Member

kevinmehall commented Jul 17, 2016

Each sent request should be paired up with exactly one response (except for asynchronous interrupts), so it identifies responses and their length by keeping the outstanding requests in a queue. So if you're receiving two responses without making two requests, you need to fix the firmware, not filter it in JS. The protocol is unfinished in plenty of other ways, but that's not one of them.

@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 17, 2016

Contributor

you need to fix the firmware, not filter it in JS

Yes, I agree, I didn't mean that we should fix this by "filtering" in JS, I meant that it would be useful to filter for the purpose of debugging the problem. I'm not reading one pin and observing one response, because the issue won't reproduce. Since I need several pins to be read, it would be useful to be able track which pins are associated with which failures, and identity information would be really useful in tracking failure patterns. Identity information is not unreasonable, but the intentional omission of identity information does seem unreasonable.

Contributor

rwaldron commented Jul 17, 2016

you need to fix the firmware, not filter it in JS

Yes, I agree, I didn't mean that we should fix this by "filtering" in JS, I meant that it would be useful to filter for the purpose of debugging the problem. I'm not reading one pin and observing one response, because the issue won't reproduce. Since I need several pins to be read, it would be useful to be able track which pins are associated with which failures, and identity information would be really useful in tracking failure patterns. Identity information is not unreasonable, but the intentional omission of identity information does seem unreasonable.

@johnnyman727

This comment has been minimized.

Show comment
Hide comment
@johnnyman727

johnnyman727 Jul 18, 2016

Contributor

could go on, but there's no point because fixing this now would be extraordinarily hard.

@rwaldron what's the difficult part of this process? tessel-export.js is tied to the firmware version through t2 update so it's unlikely to cause compatibility errors.

Contributor

johnnyman727 commented Jul 18, 2016

could go on, but there's no point because fixing this now would be extraordinarily hard.

@rwaldron what's the difficult part of this process? tessel-export.js is tied to the firmware version through t2 update so it's unlikely to cause compatibility errors.

@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Jul 18, 2016

Contributor

so it's unlikely to cause compatibility errors.

I'm not concerned with compatibility errors, even though there would be direct impact on Tessel-IO, which by-passes the tessel.port.[A|B].pins.* APIs and talks directly with the port itself—but again, I'm not worried about resolving those issues expeditiously and would never let them be a "blocker".

Specifically, I think there should be no mixed handler mechanisms, ie. having some things handled by queued callbacks and some things handled by emitting an event is one too many handler mechanisms. Everything coming through the port should be handled by emitting an appropriate event. To prove out my thinking, I wrote a patch that moves handling for analog read responses out of the queued callback path and into an event emitting mechanism. It works, but it's only a partial migration at the moment. I'm going to pursue it when I have more time. Right now it's just a design exercise.

Other issues I encountered along the way:

  • The firmware is responding with duplicate messages
  • The firmware is responding with large swaths of zeroes

    dedicated specifically the queued callback is that there shouldn't be any mixed handler
Contributor

rwaldron commented Jul 18, 2016

so it's unlikely to cause compatibility errors.

I'm not concerned with compatibility errors, even though there would be direct impact on Tessel-IO, which by-passes the tessel.port.[A|B].pins.* APIs and talks directly with the port itself—but again, I'm not worried about resolving those issues expeditiously and would never let them be a "blocker".

Specifically, I think there should be no mixed handler mechanisms, ie. having some things handled by queued callbacks and some things handled by emitting an event is one too many handler mechanisms. Everything coming through the port should be handled by emitting an appropriate event. To prove out my thinking, I wrote a patch that moves handling for analog read responses out of the queued callback path and into an event emitting mechanism. It works, but it's only a partial migration at the moment. I'm going to pursue it when I have more time. Right now it's just a design exercise.

Other issues I encountered along the way:

  • The firmware is responding with duplicate messages
  • The firmware is responding with large swaths of zeroes

    dedicated specifically the queued callback is that there shouldn't be any mixed handler
@rwaldron

This comment has been minimized.

Show comment
Hide comment
@rwaldron

rwaldron Aug 1, 2016

Contributor

Further observation:

  • Appears only when making "read" requests to both ports
Contributor

rwaldron commented Aug 1, 2016

Further observation:

  • Appears only when making "read" requests to both ports

@reconbot reconbot added the bug label Aug 9, 2016

@johnnyman727 johnnyman727 closed this in #204 Sep 2, 2016

@HipsterBrown HipsterBrown referenced this issue in tessel/this-week-in-tessel Sep 9, 2016

Closed

TWIT September 2016 Tesselcamp Special Report #39

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment