Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimum cmd wait delay - required? #159

Closed
arnoutdekimo opened this issue Nov 8, 2023 · 14 comments
Closed

Minimum cmd wait delay - required? #159

arnoutdekimo opened this issue Nov 8, 2023 · 14 comments
Assignees

Comments

@arnoutdekimo
Copy link

Hi,

in cell/src/u_cell_private.c there is the structure gUCellPrivateModuleList which describes the different supported modules and their parameters. One of these is commandDelayMs , which causes a 100ms delay before each command that is sent.

For my usecase, where I want to reach the cloud asap to avoid leaking power, I'd ideally loose as little time as possible going through setup procedures. Currently, the AT commands sent to check and set up the modem delay the boot quite a bit.
If you look at the UART bus, one can see that it is pretty idle because of this 100ms delay
image

The most important question I have about this, is whether this 100ms delay needed in the first place or not? (and where does it come from)

A second question could be if there are ways to safely "cut corners" - that's to say, currently I am assuming if I use the ubxlib functions to bring up my cellular connection - and I assume that this will always work correctly. If I start skipping AT commands (e.g. by assuming certain configurations are already done), I'd have to deviate from the library and also have less reliability So if possible, that's something I'd like to avoid. But hence the question if the commands cannot simply be sped up or not.

Thanks for your feedback

@arnoutdekimo
Copy link
Author

arnoutdekimo commented Nov 8, 2023

FYI, to make this a bit more concrete :
When patching the lib to avoid first going through AT#CFUN=4 (#158), my log files look like this when changing this delay to 10ms:

0:00:00:105 - U_CELL_PWR: powering on, module is already on.
0:00:00:105 - ATE0
0:00:00:115 -
0:00:00:155 - OK
0:00:00:155 - AT+CMEE=2
0:00:00:166 -
0:00:00:206 - OK
0:00:00:206 - AT+UDCONF=1,0
0:00:00:217 -
0:00:00:257 - OK
0:00:00:257 - ATI9
0:00:00:267 -
0:00:00:307 - 01.24,A01.10
0:00:00:307 -
0:00:00:307 - OK
0:00:00:307 - AT&C1
0:00:00:318 -
0:00:00:359 - OK
0:00:00:359 - AT&D0
0:00:00:369 -
0:00:00:410 - OK
0:00:00:410 - AT+UCGED=2
0:00:00:421 -
0:00:00:461 - OK
0:00:00:461 - AT&K3
0:00:00:471 -
0:00:00:522 - OK
0:00:00:522 - AT+UPSV=0
0:00:00:533 -
0:00:00:573 - OK
0:00:00:573 - AT+UPSMR=1
0:00:00:584 -
0:00:00:624 - OK
0:00:00:624 - AT+CPSMS?
0:00:00:635 -
0:00:00:675 - +CPSMS:0,,,"01100000","00000000"
0:00:00:676 -
0:00:00:676 - OK
0:00:00:676 - AT+CEDRXS?
0:00:00:688 -
0:00:00:728 - +CEDRXS:
0:00:00:728 -
0:00:00:728 - OK
0:00:00:728 - AT+UMNOPROF?
0:00:00:740 -
0:00:00:780 - +UMNOPROF: 90
0:00:00:780 -
0:00:00:780 - OK
0:00:00:781 - AT+UGPRF?
0:00:00:792 -
0:00:00:832 - +UGPRF: 2,0,""
0:00:00:832 -
0:00:00:832 -
0:00:00:832 -
0:00:00:833 - OK
0:00:00:833 - Opened device with return code 0.
0:00:00:833 - Bringing up the network...
0:00:00:833 - U_CELL_NET: preparing to register/connect...
0:00:00:834 - AT+CREG=2
0:00:00:845 -
0:00:00:885 - OK
0:00:00:885 - AT+CGREG=2
0:00:00:896 -
0:00:00:936 - OK
0:00:00:936 - AT+CEREG=4
0:00:00:947 -
0:00:00:987 - OK
0:00:00:987 - AT+CIMI
0:00:00:998 -
0:00:01:038 - 206010311019587
0:00:01:038 -
0:00:01:038 - OK
0:00:01:039 - U_CELL_NET: user-specified APN is "web.be".
0:00:01:039 - AT+CGDCONT=1,"IP","web.be"
0:00:01:053 -
0:00:01:093 - OK
0:00:01:093 - U_CELL_NET: setting automatic network selection mode...
0:00:01:093 - AT+COPS?
0:00:01:104 -
0:00:01:144 - +COPS: 0,0,"JIM",7
0:00:01:144 -
0:00:01:144 - OK
0:00:01:145 - AT+CFUN=1
0:00:01:156 -
0:00:01:196 - OK
0:00:01:196 - AT+CREG?
0:00:01:207 -
0:00:01:247 - +CREG: 2,1,"0C3D","02BF6B05",7
0:00:01:248 -
0:00:01:248 - OK
0:00:01:248 - 7: RegH
0:00:01:248 - AT+CGREG?
0:00:01:550 -
0:00:01:590 - +CGREG: 2,4
0:00:01:590 -
0:00:01:590 - OK
0:00:01:590 - 7: OoC
0:00:01:591 - AT+CEREG?
0:00:01:892 -
0:00:01:932 - +CEREG: 4,1,"C3D","2BF6B05",7,,,,
0:00:01:933 -
0:00:01:933 - OK
0:00:01:933 - 7: RegH
0:00:01:933 - AT+COPS=3,0
0:00:02:235 -
0:00:02:275 - OK
0:00:02:275 - AT+COPS?
0:00:02:286 -
0:00:02:326 - +COPS: 0,0,"JIM",7
0:00:02:326 -
0:00:02:326 - OK
0:00:02:327 - AT+CGATT?
0:00:02:338 -
0:00:02:378 - +CGATT: 1
0:00:02:378 -
0:00:02:378 - OK
0:00:02:378 - AT+CGACT?
0:00:02:390 -
0:00:02:430 - +CGACT: 1,1
0:00:02:430 -
0:00:02:430 - OK
0:00:02:430 - AT+UPSD=0,0,0
0:00:02:443 -
0:00:02:483 - OK
0:00:02:483 - AT+UPSD=0,100,1
0:00:02:495 -
0:00:02:535 - OK
0:00:02:535 - U_CELL_NET: connected after 1 second(s).
0:00:02:535 -

In case of the default time (100ms), in the exact same scenario, the time was double:

0:00:05:084 - OK
0:00:05:084 - U_CELL_NET: connected after 2 second(s).
0:00:05:084 -

@RobMeades
Copy link
Contributor

Hi again: you're busy :-).

It's a tricky one, that parameter. Frankly it is kind of a "fudge factor" that, as I think your second question points to, can result in occasional instability if the corner is cut too far. The integration of SARA-R4 into ubxlib began using development versions of the SARA-R4 module FW and the reason that parameter is so large in the SARA-R4 case is because it was definitely required at the time: odd things could happen if you hit that SARA-R4 FW too rapidly.

The recommended value can be found in the SARA-R4 AT commands manual, section 2.2.5.1:

To ensure the DCE can transmit the buffered URCs, the DTE should wait some time (the recommended value is at least 20 ms) after the reception of an AT command final result code or URC before issuing a new AT command. Otherwise, the collision of the URCs with the subsequent AT command is possible.

So, as you can see, 10 ms may work but then you need to know if it always works, where is the corner? As you can see from the presence of the struct, the answer is module dependent and, likely, dependent on the AT command that has just been completed also (the module likely has asynchronous processes within itself that need to complete and our internal product testing will only push the limits so far).

The only dependable answer would be to run a good solid set of regression tests of your product with your choice of "trimmed" value: that would test your AT commands, in your use cases, on your choice of module. I would be reluctant to change the parameter for SARA-R4 in ubxlib now, mostly because we don't have all possible flavours and FW versions of SARA-R4 in the test system, so we couldn't be sure that we haven't broken one of them. But to make it easier for you to work with the ubxlib code, we could bring all of the timings out to #defines, allowing you to set the value you know works for your build/use-case without having to modify the source code. Would that help?

@arnoutdekimo
Copy link
Author

Hi :)

Many thanks for the elaborate answer. It's indeed sort of what I was expecting.

I can indeed bring it down a bit and evaluate.
For now, I branched ubxlib and made some changes there, but if I want to upgrade to a newer one I indeed would have to merge things, and having a "supported" library way of changing this is indeed cleaner. Defines are one way, having a function that allows to "override" parameters would be another. (e.g. push them out of "spec", but at your own risk) Personally I am a bit more of a fan of having more functions than more defines, but it's a bit of a detail I guess.

Thanks again, and I'm off to keeping busy:)

@RobMeades
Copy link
Contributor

Hmm, yes, maybe a function would be better. Shouldn't be difficult to add: it is on the TODO list.

@RobMeades RobMeades self-assigned this Nov 8, 2023
@cturvey
Copy link

cturvey commented Nov 8, 2023

Generally a lot of this interaction should be able to self-pace as you always expect some OK/ERROR type response, and URC should be output interleaved at a line level, so there shouldn't be characters jumbled together.
The response timeouts can be command and vendor specific, there's a lot of carrier/tower interaction stuff that can realistically take multiple minutes. Sending new AT commands might abort them, it's apt to eat or error commands so recovery can get complicated.
How much of this needs configuring each time, and what gets retained in profiles? This gets really messy on cellular modems.
I've previously built systems which query the settings I need, confirm them, and then only send the configuration command when the state needs changing. Clearly that complicates the process, but it can flow faster if it's not waiting for the modem to write things into FLASH, EEPROM, or SIMs unnecessarily.
On other modems initial queries against the SIM can also take a lot of time as there's bring-up and authentication required there.

@RobMeades
Copy link
Contributor

APIs to permit the AT command delay to be overridden added in a12674d (along with appropriate warnings). Gonna close this one, please feel free to re-open if we need to discuss further.

@arnoutdekimo
Copy link
Author

Note, even though I took your code changes and lowered the AT delay timeout, I noticed that during my OTA download the transfers were -still- slow. (> 100ms delay between AT commands)

Apparently this comes from u_sock.c, where a define U_SOCK_RECEIVE_POLL_INTERVAL_MS is used to wait before polling for more data until the full buffer is filled. The maximum allowed by the device in a single AT command is 1024kB, so when asking for more, this causes a delay there as well.

I then tried to lower the requested buffer lower than that 1 kB, but still ran into 100ms delay.
Evetually by breakpointing in the .. delay function, I noticed that U_AT_CLIENT_URC_TIMEOUT_MS also exists, and is also hardcoded to 100ms, and which delays the sending of subsequent AT commands - and thus lowering the maximum throughput.

Note that in case of 115200 baud, the download utilization is about 1/3rd. But in case of higher baudrates, these delays would totally dominate the transfer.

image

@arnoutdekimo
Copy link
Author

FYI, what happens when lowering the URC timeout to 10ms

image

@arnoutdekimo
Copy link
Author

Trying to push it even further (sub 10ms) required:

  • Not asking for more than 1kB for a socket
  • Changing (U_AT_CLIENT_DEFAULT_DELAY_MS), U_AT_CLIENT_URC_TIMEOUT_MS ``U_AT_CLIENT_STREAM_READ_RETRY_DELAY_MS to 1 ms
  • Calling uCellAtCommandDelaySet to 1 ms

That gives:

image

Ideally I want to try to now change the baudrate and see what speed I can reach now too, but I understand this is a bit tricky.
Anyway, in my situation at least, ideally I could use a "stable, tried and tested" slower baudrate and timeouts for most of the stuff, but during a OTA, I'd like to change gear to get the file asap. Maybe staying at this baudrate is just still acceptable, but changing the delays for bulk transfers would definitely be interesting. For now, I have to redefine them at compile time, so they will be active all the time. (For this this seems to work fine though)

@RobMeades
Copy link
Contributor

Very interesting: the timeout in a URC has been 100 ms pretty much since the AT Client was written. In theory there shouldn't be an issue reducing it, any partially received URC will be processed completely the next time around, but it is also true that, at some point, the AT Client needs to throw uninteresting received stuff away so the danger is that a URC could find its way into the bin if there was a hole in that logic and it would be difficult to determine that is happening; there is a static, back-to-backed, test jig for the AT Client but I wouldn't want to rely on it to pick up timing oddities.

I will create a branch on which I can test a reduced U_AT_CLIENT_URC_TIMEOUT_MS and see how it looks.

@RobMeades
Copy link
Contributor

In that case what I would do is add functions to the AT Client to set/get U_AT_CLIENT_URC_TIMEOUT_MS and U_AT_CLIENT_STREAM_READ_RETRY_DELAY_MS at run-time, and then have a triplet of uAtClientCommandDelaysSet()/uCellAtCommandDelaysGet()/uCellAtCommandDelaysSetDefault() functions which take as parameters all three of U_AT_CLIENT_DEFAULT_DELAY_MS, U_AT_CLIENT_URC_TIMEOUT_MS and U_AT_CLIENT_STREAM_READ_RETRY_DELAY_MS, the last function being an easy way to get back to square one (I'd probably deprecate the current uCellAtCommandDelaySet() function).

I would add a "HERE BE DRAGONS" comment above uCellAtCommandDelaysSet() indicating that one reduces delays at one's own risk, e.g. in a brief/rare period where high performance/throughput is so important that it is worth taking chances, that one must understand what one is doing, be equipped with the relevant equipment to debug problems (e.g. a Saleae probe), must be able to test revised timeouts to death in one's own product scenario and that, in particular, there is a high risk that URCs emitted by the module may be lost when timeouts are reduced, for instance leading to ubxlib no longer tracking the state of the module accurately (for instance if registration URCs are lost).

Does that present the right balance of risk/responsibility?

@arnoutdekimo
Copy link
Author

Hi, sounds what I would use yes.
From a cellular module vendor perspective, I'd be nice to officially support something faster than 3kB/s as transfer rate (instead of saying it is possible, but if it does not work - it's the customer's risk). But anyway from a library perspective, the modification you suggest is probably the easiest quick win on this, and would be helpful to me

Thanks for the help

@RobMeades
Copy link
Contributor

I've performed four complete test sweeps (1 hour runs of everything across all of the platforms we have in the test system) overnight with U_AT_CLIENT_URC_TIMEOUT_MS set to 20 ms rather than 100 ms and they don't show any failures that I can put down to URC drops.

Now of course it is very difficult to be sure without looking through every single AT log but I would have expected some hint of a problem.

What I will do is implement the API and also bring U_AT_CLIENT_URC_TIMEOUT_MS down to 20 ms, and U_AT_CLIENT_DEFAULT_DELAY_MS down to 25 ms, since the latter I can do just for SARA-R422. I will test the lot from now until Monday and, if all still seems good, I will push the lot here. If there is a hint of something up with reducing either of the two timeouts I'll leave them where they are and just push the API change here.

@RobMeades
Copy link
Contributor

This is now done in 731256a. All the interfaces are added uCellAtCommandTimingGet(), uCellAtCommandTimingSet() and, to make things easy, uCellAtCommandTimingSetDefault(). The transmit delay times for SARA-R422 (and LARA-R6) are modified to be 20 ms.

I did try reducing the U_AT_CLIENT_URC_TIMEOUT_MS from 100 ms to 20 ms but I wasn't happy that I had seen stable behaviour across all module types with that setting so, rather than delay, I've just left the value as it is; you may of course experiment with using a better value for your use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants