Progressively slower transmission and potential buffer handling issue. #312

Open
jlewallen opened this issue May 23, 2021 · 5 comments
Labels
type: imperfection Perceived defect in any part of project

Comments

@jlewallen

I've been seeing a recurring issue when transmitting "large" amounts of data, in my case anywhere from 100 KB upward. The transmission gets progressively slower, with each write taking more time. At some point the M2M debug output starts printing "Slowing down", and from then on the odds that the transmission will succeed are pretty small.

Digging around led me to this now-closed issue: #118

Much of what I'm seeing now seems related to the problems discussed there. For example, when I add some of the same logging, I can see the call to write fail after several hundred thousand attempts.

My next step is to write a test case that reproduces the issue, as right now it's part of a large body of firmware.

I'm still wrapping my head around the internals, but I'm wondering whether this could be a similar buffer handling issue, or something else that others have seen. I'd appreciate any help I can get and will report back if I make any progress.

Thanks!
Jacob

@jlewallen
Author

Hey everyone. I slimmed down my test case, starting from the SimpleWebServer example and making a few modifications:

  1. I buffer all the HTTP headers into one write call.
  2. The server sends a Content-Length of 1MB and produces that data as a stream of 1400-byte blocks. Each block is filled with a single byte value that increments from one block to the next (1400 0's, then 1400 1's, etc.), so the local order of packets is easy to see.
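For anyone trying to reproduce this, here is a minimal, platform-neutral C++ sketch of the two modifications above. It is not the actual test code from this issue: `buildHeaders`, `fillBlock`, `streamBody`, `expectedByteAt`, and the specific header fields are hypothetical, `writeFn` is a stand-in for the client's write call on the device, and "1MB" is assumed to mean 1,048,576 bytes. Because the byte value wraps after 255, the pattern repeats every 256 blocks (358,400 bytes), so it only reveals local ordering, as described.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

constexpr std::size_t kBlockSize = 1400;  // payload block size used in the test

// Hypothetical helper: build every response header into one string so the
// whole header can go out in a single write call (modification 1).
std::string buildHeaders(std::size_t contentLength) {
    std::string h = "HTTP/1.1 200 OK\r\n";
    h += "Content-Type: application/octet-stream\r\n";  // assumed content type
    h += "Content-Length: " + std::to_string(contentLength) + "\r\n";
    h += "Connection: close\r\n";                        // assumed
    h += "\r\n";                                         // end of headers
    return h;
}

// Fill `block` with body block number `index`: every byte equals the low
// 8 bits of the index (0, 1, 2, ... wrapping after 255). Returns how many
// bytes belong to this block; the last block may be short, 0 means done.
std::size_t fillBlock(std::uint8_t* block, std::size_t index, std::size_t total) {
    assert(block != nullptr);
    std::size_t offset = index * kBlockSize;
    if (offset >= total) return 0;
    std::size_t n = (total - offset < kBlockSize) ? total - offset : kBlockSize;
    std::memset(block, static_cast<int>(index & 0xFF), n);
    return n;
}

// Stream the whole body through `writeFn` one block at a time
// (modification 2). Returns the total number of bytes writeFn reported.
template <typename WriteFn>
std::size_t streamBody(std::size_t total, WriteFn writeFn) {
    std::uint8_t block[kBlockSize];
    std::size_t sent = 0;
    for (std::size_t i = 0;; ++i) {
        std::size_t n = fillBlock(block, i, total);
        if (n == 0) break;
        sent += writeFn(block, n);
    }
    return sent;
}

// When inspecting a capture: the byte value expected at a given body offset.
std::uint8_t expectedByteAt(std::size_t offset) {
    return static_cast<std::uint8_t>((offset / kBlockSize) & 0xFF);
}
```

On a device, `writeFn` would wrap whatever write call the sketch already uses; on a desktop, a lambda that appends into a `std::string` is enough to check the pattern before comparing it against a Wireshark capture.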

I then ran curl on my desktop to download the data while taking a Wireshark capture.

Notably, the capture shows things don't go smoothly even from the beginning, haha. It starts with a missing TCP segment (one that should have appeared in the capture), then sputters through more of those and some out-of-order TCP segments, and eventually degrades into a flood of TCP retransmissions.

If I understand what's happening behind the scenes, that makes sense: those retransmissions are only possible if the data is still around, and that's a lot of data to keep around, which would explain my buffering and speed issues.

Jacob

winc1500-slow-1MB-2.pcapng.gz

@jlewallen
Author

In an effort to eliminate variables, I moved my WiFi closer to the hardware and reran the test. The first run went really well, and then the second went south on me. I've uploaded the pcap from that run as well, which I think may demonstrate the problem better: things look fairly smooth at first and then the problems become extremely visible.

I'm running these curl commands from a wired connection on the same router that serves the WiFi, by the way. Things don't get any better when both ends are on WiFi.

winc1500-1MB-closer-wifi-2.pcapng.gz

Jacob

@jlewallen
Author

I should note that the previous capture was taken while running the whole code/stack that originally exhibited the problem, and the problem appears harder to reproduce when I narrow the code down to my test case based on SimpleWebServer. I'm going to keep digging to see whether something I'm doing is exacerbating the problem in that case.

Jacob

@jlewallen
Author

This was my fault! An errant component was frequently toggling IRQs during network handling, and as soon as that was cleared away things started to behave much better. Sorry for the noise.

@per1234 per1234 added the conclusion: invalid Issue/PR not valid label May 25, 2021
@jlewallen
Author

I spoke too soon. This is still an issue for me, and I believe my prior work was just making my own code less of a factor. Moving the WiFi closer made testing easier but also made the issue less likely to surface; now that I've returned everything to its rightful position, it's back with a vengeance.

@per1234 per1234 reopened this May 25, 2021
@per1234 per1234 added type: imperfection Perceived defect in any part of project and removed conclusion: invalid Issue/PR not valid labels May 25, 2021