uSockGetHostByName() fails if we use stubs for wifi. #36

Closed
eeFLis opened this issue Dec 20, 2021 · 47 comments

@eeFLis
Contributor

eeFLis commented Dec 20, 2021

Hello all
We are using cell sockets to resolve the address of a host. However, with the current master branch this fails. The cause seems to be the call to uWifiSockInit() at line 474 of u_sock.c. This function is also called when only cell sockets are used, and in that case the call fails with error code -5.

If I comment out the line, everything works as before.
Is there any other way I can work around this problem?

@RobMeades
Contributor

Hi there! -5 is -U_SOCK_EIO; a quick look at the code suggests that uWifiSockInit() is being called, which does a lock on uShortRangeLock() but uShortRangeInit() has not been called at this point and so the lock fails. I guess that the u_sock code should also call uShortRangeInit() if it is going to call uWifiSockInit(): @antevir?
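To make the failure mode concrete, here is a minimal C sketch of the ordering problem, assuming that the lock simply refuses to succeed until its layer has been initialised. All `sketch...` names are hypothetical stand-ins for uShortRangeInit(), uShortRangeLock() and uWifiSockInit(); only the -5 (-U_SOCK_EIO) return value comes from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

// Matches the -5 seen above; ubxlib's real U_SOCK_EIO lives in its own headers.
#define SKETCH_SOCK_EIO 5

static bool gSketchShortRangeInitialised = false;

// Hypothetical stand-in for uShortRangeInit().
int32_t sketchShortRangeInit(void)
{
    gSketchShortRangeInitialised = true;
    return 0;
}

// Hypothetical stand-in for uShortRangeLock(): fails if never initialised.
static int32_t sketchShortRangeLock(void)
{
    return gSketchShortRangeInitialised ? 0 : -1;
}

static void sketchShortRangeUnlock(void)
{
}

// Hypothetical stand-in for uWifiSockInit(): a lock failure becomes -U_SOCK_EIO.
int32_t sketchWifiSockInit(void)
{
    if (sketchShortRangeLock() != 0) {
        return -SKETCH_SOCK_EIO;
    }
    // ... socket-layer set-up would happen here ...
    sketchShortRangeUnlock();
    return 0;
}

// The fix suggested above: initialise the dependency before calling into it.
int32_t sketchSockInitWithWifi(void)
{
    int32_t errorCode = sketchShortRangeInit();
    if (errorCode == 0) {
        errorCode = sketchWifiSockInit();
    }
    return errorCode;
}
```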

@antevir
Contributor

antevir commented Dec 21, 2021

Yes, it sounds like uShortRangeInit() has not been called. However, this should happen when you call uNetworkInit().
@eeFLis could you maybe share the init code you are using?

@eeFLis
Contributor Author

eeFLis commented Dec 21, 2021

Hi
The error also occurs in the socket example provided in ubxlib/example/sockets/.
uShortRangeInit() is not called by uNetworkInit() because we are using wifi stubs.

@antevir
Contributor

antevir commented Dec 22, 2021

Ahh, I just saw that in the title now. Unfortunately there is no proper way of excluding wifi at the moment. We are planning to address this in the next quarter: see UPCOMING_CHANGES.md.
So I am afraid that if you want to remove wifi by mocking you will also need to mock u_wifi_sock.c.

@RobMeades
Contributor

RobMeades commented Dec 23, 2021

Apologies for this @eeFLis: we had originally thought the stub mechanism would work, but it is really not scalable, hence we are planning a different solution as Andreas describes. Are you able to work around the issue as Andreas suggests until we have a better solution in place?

@eeFLis
Contributor Author

eeFLis commented Jan 13, 2022

@RobMeades
Yes we can work around it until there is a better solution. Thank you.

@eeFLis
Contributor Author

eeFLis commented Jul 5, 2022

Hi Rob
Is there a way in release 1.0.0 to exclude unused APIs such as wifi, ble or gnss?
I thought it was planned for this version but I can't see any possibility.

@RobMeades
Contributor

Hi again, and apologies: this was something we wanted to get in for this release but re-jigging all the underlying things to introduce the device/network API, which paves the way for doing what you want, took longer than we thought. So the hooks are there, we need to get down and do the implementation now.

In fact, just a few hours ago we had a meeting about next priorities and this was highlighted. We are shooting for October; basically it comes after LARA-R6 support and I2C support, both of which are happening now.

Apologies again: what I might do, as soon as we have something, is push a preview branch of it here so that you can see if it works for you.

@eeFLis
Contributor Author

eeFLis commented Jul 7, 2022

Hi Rob

That would be great; this will help us to reduce the memory footprint. Thanks.
If there is a preview, we will test it.

@RobMeades
Copy link
Contributor

@eeFLis: just an update that this is being worked on but didn't make it into 1.1.0. We will let you know as soon as there is something to try.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 7, 2022

Hi Rob

Do you know when we will have something to try?
We are slowly running out of memory.

@RobMeades
Contributor

RobMeades commented Nov 7, 2022

Hi there: unfortunately the guy who was working on this hasn't. That said, as a side-effect of doing the CMUX work, we've ended up creating the bits of code needed to make the jump-tables that are required for this kind of "link-time" separation to work, so I can probably start looking at this myself from the start of next week.

That would suggest probably not this year but, maybe, just maybe...

Apologies again for the extreme delay on this, don't like having issues open for an entire year, though it is not the record breaker :-(.

@eeFLis
Contributor Author

eeFLis commented Nov 7, 2022

Hi Rob

Since the lib is growing constantly (which is great), in most cases not all components (gnss, wifi, ble, cell) are needed.
Therefore it would be useful to have this feature.

Many thanks in advance for your work.

@RobMeades
Contributor

Understood: the growth worries me a little actually, it might be that some of the things we are adding even within, for example, cellular, are not of general interest and the code size just becomes an overhead. Anyway, will try to at least allow you to remove the things that you are definitely not interested in.

@RobMeades
Copy link
Contributor

RobMeades commented Nov 11, 2022

Had a bit of a revelation yesterday and realised that fixing this problem is a lot easier than I thought. I have pushed a preview branch of the solution here:

https://github.com/u-blox/ubxlib/tree/preview_separation_rmea

On this preview branch you should be able to change the UBXLIB_FEATURES make/CMake variable that you pass to the common ubxlib.mk and ubxlib.cmake files to, for instance, "cell" instead of "cell short_range gnss" and it should automatically stub-out the not-needed calls.

FYI, the preview branch is arranged such that it includes the preview-fix we did for your issue #75. Please let me know if it does what you want.

Also FYI, this is a preview, we've not actually reviewed this change internally yet, though I anticipate no problems. Once we merge the change to master and push it here I will delete the preview branch.

@RobMeades
Contributor

RobMeades commented Nov 11, 2022

Actually, there's still a bug in that branch, the one you raised here originally, let me fix that and I'll update this issue when I've done it.

@RobMeades
Copy link
Contributor

Hopefully fixed now, branch updated.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 11, 2022

We use the STM32 Cube IDE, which does not have CMake integrated, but I think it should be possible to stub out the not-needed calls in the same way.

Is this only a temporary solution?
I thought you mentioned that the stub mechanism is not really scalable and that you are planning another solution?

@RobMeades
Copy link
Contributor

RobMeades commented Nov 11, 2022

That's what I had originally thought but it is only not scalable because of having to swap in and out the stub files for the N cases; the revelation I had last night was that each common module which calls down into a ble/cell/gnss/short_range/wifi thing (which is where the cross-linkage occurs) simply has to provide its own stub versions of those calls that are weakly-linked, and that return "not supported" or whatever, then we can leave the stub files always in place and remove the real implementations as we wish, leaving the stub to take over. A nice simple rule: you call it, you stub it, very little to go wrong and easy on the brain.
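A minimal sketch of the "you call it, you stub it" rule, assuming GCC or Clang and their `__attribute__((weak))` extension. The function names and the error value are hypothetical, not ubxlib's actual stub files: the common module carries a weak stub of the call it makes down into the wifi layer, and if an object file with the real (strong) definition is linked in, the linker uses that one instead.

```c
#include <stdint.h>

// Hypothetical "not supported" code for this sketch; ubxlib's real error
// codes live in its own headers and may have different values.
#define SKETCH_ERROR_NOT_SUPPORTED (-6)

// Weak stub provided by the CALLING module. The real, strong definition
// would live in the wifi layer; if that file is left out of the build,
// this stub is what gets linked.
__attribute__((weak)) int32_t sketchWifiSockInitStub(void)
{
    return SKETCH_ERROR_NOT_SUPPORTED;
}

// Caller in the common module: it does not need to know whether wifi
// was built in or not.
int32_t sketchSockInit(void)
{
    int32_t errorCode = sketchWifiSockInitStub();
    if (errorCode == SKETCH_ERROR_NOT_SUPPORTED) {
        // Wifi is not part of this build: treat that as "nothing to do".
        errorCode = 0;
    }
    return errorCode;
}
```

Note that the caller has to test for exactly the code the stub returns; if the two disagree, the "not built in" case is treated as a real error.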

I assume you're using the full Eclipse system? If you were just using a Makefile project we support that through the ubxlib.mk file but, anyway, all you should need to do is to add all of the stub files that have been introduced in the branch into your Eclipse-based build and then you can leave out all of the ble/gnss/short_range/wifi or whatever it is that you don't want and it should all link and work. I haven't yet been able to run this myself and won't be able to do so for a while so just let me know if I've missed anything.

The other approach, what I was preparing for, was to create interface types: for instance there would be one for the things that MQTT needed from cell/wifi, but then we'd need to define/create structures of jump-tables and populate them at some point, etc., all of which [I realised] is unnecessary overhead when weak linkage and the GCC linker march in to the rescue :-).

@RobMeades
Copy link
Contributor

Actually, let me push to the preview branch again: I've just changed some of the file names during review and so if you're manually adding them it is better to get that all right. Will comment back here when pushed...

@RobMeades
Copy link
Contributor

Right, please use this branch: https://github.com/u-blox/ubxlib/tree/preview_separation_use_this_one_rmea

I will delete the other one shortly.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 11, 2022

In u_device.c you check for U_ERROR_COMMON_NOT_IMPLEMENTED, but the stub functions return U_ERROR_COMMON_NOT_SUPPORTED. After changing this, it works for us (we use cell only).

@RobMeades
Copy link
Contributor

Ah, great, thanks for that, we will fix it on the version we merge to master. I will leave this issue open until the final version ends up here.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 12, 2022

I think it's not related to this change, but if uSockCreate() is the first function called after PSM, it returns the error code U_SOCK_ENOBUFS. This is because uAtClientUnlock() in uCellSockCreate() returns U_ERROR_COMMON_DEVICE_ERROR.

If we call uCellPwrIsAlive() before uSockCreate(), everything works fine. But as I understand it, this should not be necessary, right?

@RobMeades
Copy link
Contributor

Interesting: are you able to see what AT sequence causes the AT parser to get upset?

@RobMeades
Copy link
Contributor

RobMeades commented Nov 12, 2022

I mean, I guess it is that AT+USOCR is failing in some way; uCellPwrIsAlive() is going to bounce an AT off the module, just to make sure it is there, but you're right, that should make no difference at all. Just out of interest, are you using UART sleep as well as PSM?

EDIT: you're on R5 so you must be, it wouldn't go into "real" PSM otherwise.

@RobMeades
Copy link
Contributor

It might be interesting to see if you called something like uCellInfoGetManufacturerStr() at that same point, does it fail also, i.e. is this specifically sockets related or is it just that any AT command that does not retry, if called just after return from PSM, fails in this way?

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 12, 2022

uCellInfoGetManufacturerStr() works at the same point, but uCellInfoGetIccidStr() does not.
It seems that the problem only occurs with functions that wait for a specific response via uAtClientResponseStart().
From the debug prints, it seems that the command is sent before the module is awake.

```
AT+CCID
U_CELL_INFO: unable to read ICCID.

AT
AT

OK
ATE0
ATE0

OK
AT+CMEE=2

OK
AT+UDCONF=1,0

OK
ATI9

03.15,A00.01

OK
AT&C1

+UUPSMR: 0

OK
AT&D0

OK
AT&K3

OK
AT+UPSV=3

OK
AT+UPSMR=1

OK
AT+CPSMS?

+CPSMS: 1,,,"01000011","00001000"

OK
AT+UMNOPROF?

+UMNOPROF: 90

OK
AT+UPSD=0,0,0

OK
AT+UPSD=0,100,1

OK
AT+UPSDA=0,3

OK

+UUPSDA: 0,"IP"
AT+USOCR=17,PORT

+USOCR: 0

OK
U_SOCK: socket created, descriptor 2, network handle 0x2000cf88, socket handle 2.
U_SOCK: connecting socket to "IP:PORT"...
AT+USOCO=0,"IP",PORT

OK
U_SOCK: socket with descriptor 2, network handle 0x2000cf88, socket handle 2, is connected to address "IP:PORT".
```

@RobMeades
Copy link
Contributor

RobMeades commented Nov 12, 2022

Very interesting, thanks for that, there is definitely something going wrong here. Let me just get my head straight on some things:

  • is it correct that you have VINT connected to the MCU (so you can tell that the module is in deep sleep)?
  • before the start of the sequence above, the module has entered deep sleep due to 3GPP power saving.
  • while in deep sleep you send an AT command to do something, e.g. create a socket; this is the command at the origin of the AT sequence above.

What should happen is that, before sending the AT command, the AT client will call uCellPrivateWakeupCallback() which will call uCellPrivateIsDeepSleepActive() and, if power saving has been agreed with the network and VINT has gone low, deepSleepWakeUp() will be called, which will reconfigure the module: you can see that happening with the ATE0 etc. in your AT log.

But, somehow or other, this path is not detecting that the module is in deep sleep. Hmph.
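The decision chain described above can be sketched as follows; this is an illustration under stated assumptions, not ubxlib's code. The "power saving agreed with the network" flag and the VINT level are modelled as plain struct fields, waking the module is modelled by setting VINT high, and every `sketch...` name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model of what the real code reads from the module/pins.
typedef struct {
    bool powerSavingAgreed;   // 3GPP power saving agreed with the network.
    bool vintHigh;            // Level of the VINT pin seen by the MCU.
    int32_t reconfigureCount; // How often the wake-up/reconfigure path ran.
} sketchModule_t;

// Stand-in for uCellPrivateIsDeepSleepActive(): deep sleep only if power
// saving was agreed AND VINT has gone low.
static bool sketchIsDeepSleepActive(const sketchModule_t *pModule)
{
    return pModule->powerSavingAgreed && !pModule->vintHigh;
}

// Stand-in for deepSleepWakeUp(): wake the module and re-send its
// configuration (the ATE0, AT+CMEE=2, ... seen in the AT log).
static void sketchDeepSleepWakeUp(sketchModule_t *pModule)
{
    pModule->vintHigh = true;
    pModule->reconfigureCount++;
}

// Stand-in for uCellPrivateWakeupCallback(), called by the AT client before
// sending a command; returns true if a deep-sleep wake-up was performed.
bool sketchWakeupCallback(sketchModule_t *pModule)
{
    bool wokeFromDeepSleep = false;
    if (sketchIsDeepSleepActive(pModule)) {
        sketchDeepSleepWakeUp(pModule);
        wokeFromDeepSleep = true;
    }
    return wokeFromDeepSleep;
}
```

If the callback is never invoked in the first place, none of this runs and the AT command goes straight to a sleeping module, which matches the symptom in the log.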

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 12, 2022

Yes, we have VINT connected to the MCU. We use uCellPwrGetDeepSleepActive() to check whether the module has entered deep sleep, and we can also see from the power consumption that the module is in deep sleep.
While the module is in deep sleep, the MCU wakes up to transmit some data; for this a UDP socket is opened. This is what you see in the AT sequence.

Strangely enough, the command order with uCellInfoGetManufacturerStr() is correct:

```
AT
AT

OK
ATE0
ATE0

OK
AT+CMEE=2

OK
AT+UDCONF=1,0

OK
ATI9

03.15,A00.01

OK
AT&C1

OK
AT&D0

+UUPSMR: 0

OK
AT&K3

OK
AT+UPSV=3

OK
AT+UPSMR=1

OK
AT+CPSMS?

+CPSMS: 1,,,"01000011","00001000"

OK
AT+UMNOPROF?

+UMNOPROF: 90

OK
AT+UPSD=0,0,0

OK
AT+UPSD=0,100,1

OK
AT+UPSDA=0,3

OK

+UUPSDA: 0,"IP"
AT+CGMI

u-blox

OK
U_CELL_INFO: ID string, length 6 character(s), returned by AT+CGMI is "u-blox".
AT+USOCR=17,5684

+USOCR: 0

OK
U_SOCK: socket created, descriptor 2, network handle 0x2000cf88, socket handle 2.
U_SOCK: connecting socket to "IP:PORT"...
AT+USOCO=0,"IP",PORT

OK
U_SOCK: socket with descriptor 2, network handle 0x2000cf88, socket handle 2, is connected to address "IP:PORT".
```

@RobMeades
Copy link
Contributor

How weird! Would you be able to put some debug prints into uCellPrivateWakeupCallback() and uCellPrivateIsDeepSleepActive() to determine the route the code is following?

@RobMeades
Copy link
Contributor

...maybe in deepSleepWakeUp() also.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 12, 2022

Yes, I can do that. I will get back to you when I have first results.

@RobMeades
Copy link
Contributor

RobMeades commented Nov 12, 2022

One possibility, while it is in my mind, knowing that you are using STM32F4 and that power saving is very important to you: do you happen to be running FreeRTOS in tickless mode? The reason I ask is because, in our default port to STM32F4, we do not switch on tickless mode so that we can implement uPortTaskGetTickTimeMs() by incrementing a counter in the SysTick interrupt; if you are using FreeRTOS in tickless mode you would need to implement uPortTaskGetTickTimeMs() in some other way, or get FreeRTOS to correct gTickTimerRtosCount on return from MCU sleep.

The AT client will only call uCellPrivateWakeupCallback() if it believes that more than 6 seconds have passed since it was last active [this being the minimum time for any sort of sleep, UART sleep included, to take effect]; if, for some reason, uPortTaskGetTickTimeMs() were returning the wrong answer (e.g. because SysTick had stopped while the MCU was also sleeping) then it would not know that time has passed and so wouldn't know to do the waking-up bit.
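A minimal sketch of both halves of this: a millisecond counter driven by SysTick, a correction applied after tickless sleep, and the "more than 6 seconds since last activity" test. All names are hypothetical. In a real FreeRTOS port the correction would have to use the time actually slept; one possible place for it is alongside the vTaskStepTick() call in the port's tickless-idle code, though that is an assumption here rather than how ubxlib or your port does it.

```c
#include <stdbool.h>
#include <stdint.h>

// Minimum inactivity before the AT client assumes the module may have
// slept (6 seconds, per the comment above).
#define SKETCH_MIN_SLEEP_TIME_MS 6000

// Millisecond counter advanced by SysTick, in the spirit of the
// gTickTimerRtosCount mentioned above (name and type assumed). On a 32-bit
// MCU a real implementation would read a 64-bit value with interrupts masked.
static volatile int64_t gSketchTickMs = 0;

// Would be called from the SysTick interrupt, once per 1 ms tick.
void sketchSysTickHandler(void)
{
    gSketchTickMs++;
}

// Hypothetical hook to call on return from tickless idle: SysTick was
// stopped while the MCU slept, so add the time slept as measured by some
// other timer (e.g. an RTC or low-power timer).
void sketchPostSleepCorrection(int64_t sleptMs)
{
    gSketchTickMs += sleptMs;
}

// Stand-in for uPortTaskGetTickTimeMs().
int64_t sketchGetTickTimeMs(void)
{
    return gSketchTickMs;
}

// The AT client's check: only run the wake-up callback if enough time has
// passed since the last AT activity.
bool sketchShouldCallWakeup(int64_t lastActivityMs)
{
    return (sketchGetTickTimeMs() - lastActivityMs) > SKETCH_MIN_SLEEP_TIME_MS;
}
```

Without the correction, a long MCU sleep looks like almost no elapsed time, so the check never fires and the wake-up path is skipped.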

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 12, 2022

Wow, that was exactly the problem. Yes, we are using tickless idle mode. We now correct gTickTimerRtosCount and the problem no longer seems to occur.
Many thanks for your help.

@RobMeades
Copy link
Contributor

Phew, glad that did it for you.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 17, 2022

Hi
We have encountered another problem related to power saving.
If we configure the DTR pin to control power saving, everything works fine for some time.
But occasionally the module keeps CTS high and we get stuck in uPortUartWrite(). The module can only communicate again after a reset.

Have you ever observed this behavior?

we use SARA-R510S-01B-00
modem_version: 03.15
applications version: A00.01

[Image not preserved]

@RobMeades
Copy link
Contributor

RobMeades commented Nov 19, 2022

Ah, yes, this looks like an issue that I have seen with SARA-R5: basically what happens is that if you toggle the DTR line at the wrong time, just as the module is going into sleep, it may miss the edge and not wake-up. While the module is asleep the CTS line floats high and so you cannot send anything to it, hence you will end up stuck in uPortUartWrite(); in the STM32F4 UART driver, for other reasons (it just seems to get stuck on very rare occasions), we added a 30 second timeout on UART writes (668e93b), so it should eventually return to you and then the next command you send to the module should wake it up again, 'cos you'll be toggling DTR again and will have another chance. Are you sure that the module remains unresponsive, i.e. the only way out is reset, in your case?

There are a few ways forward:

  1. Don't use DTR (i.e. set this pin to -1 in the configuration you pass to ubxlib), instead just let the initial UART activity wake the module up, which ubxlib will do for you automatically.

  2. Carry on as you are for now, maybe reducing the guard timer in the STM32F4 UART uPortUartWrite() function (how much you can reduce it by will depend on how much data you ever send to the module in one go) and accept that commands will fail every so often.
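For option 2, a lower bound on the guard timer is roughly the time needed to clock the largest single write out of the UART. A small worked sketch, assuming 8N1 framing (10 bits per byte on the wire) and a caller-chosen margin to cover CTS being held off briefly in normal operation; the function name is hypothetical and not part of the ubxlib STM32F4 port.

```c
#include <stdint.h>

// Bits on the wire per byte with 8N1 framing: start + 8 data + stop.
#define SKETCH_UART_BITS_PER_BYTE 10

// Smallest sensible guard time, in milliseconds, for one UART write of
// maxWriteBytes at baudRate, plus marginMs chosen by the caller.
int32_t sketchUartGuardTimeMs(int32_t maxWriteBytes, int32_t baudRate,
                              int32_t marginMs)
{
    int64_t bits = (int64_t) maxWriteBytes * SKETCH_UART_BITS_PER_BYTE;
    // Transmission time, rounded up to a whole millisecond.
    int64_t txTimeMs = (bits * 1000 + baudRate - 1) / baudRate;
    return (int32_t) (txTimeMs + marginMs);
}
```

For example, a 1024-byte write at 115200 baud takes 89 ms on the wire (rounded up), so with a 1000 ms margin the sketch gives 1089 ms, far shorter than the 30 second timeout mentioned above.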

It is possible (not yet confirmed) that there will be a maintenance release for SARA-R5 early next year which will include a fix for this problem; you will understand that going through all of the necessary approvals required for a cellular module means that such releases are rare and take quite some effort/time, hence I can't promise timescales, but there is an intention to make such a release. With that you could return to using DTR.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 21, 2022

Yes, I think that is the issue we are seeing.
It seems that there are further issues related to DTR. This is what is written in the "Known bugs and limitations" section:
[u-blox ID 6980] When AT&D0 is set, a DTR transition during packet switched data mode
leads to a context deactivation instead of a no action as expected.

So we will not use DTR for now and hope that the bugs will be fixed soon.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 23, 2022

Hi
We have disabled DTR (set the pin to -1).
Now we have the problem that the module sometimes does not switch to PSM (VINT remains high), although the URC +UUPSMR: 1 was received.

Have you ever observed this behavior?

[Image: trace, not preserved]

@RobMeades
Copy link
Contributor

You'll need @philwareublox for this, rather than me, but I do know [we might have talked of this already] that the +UUPSMR URC and entry to deep sleep are not necessarily related. +UUPSMR means that the protocol stack has gone into a "suspended" state but the module may not enter deep sleep.

The next thing you're going to ask is "why not and when will the module enter deep sleep?". In your trace above it seems like 10ish seconds have passed and the module has not entered deep sleep by that point; @philwareublox: what kind of things might keep the module awake for that long?

@philwareublox
Copy link

philwareublox commented Nov 24, 2022 via email

@philwareublox
Copy link

Just a few more points:

If +UPSV is set to 0, you should not get +UUPSMR: 1.
If +UPSV is set to 2 or 3, you will only get +UUPSMR: 1, as the “not going to sleep” parameter doesn’t show the RTS/DTR reason for not going to sleep.

The only explanation I can think of for seeing +UUPSMR: 1 while VINT does not drop is that the module is still in +UPSV mode 2 or 3 and these lines are still being held to keep the module awake.

What +UPSV mode are you using?

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 24, 2022

We use +UPSV mode 1.
We see that the AT+UPSV=1,1300 command is sent during the wake-up procedure.

Could I send you the Saleae trace in a private message?

@philwareublox
Copy link

Please send it to phil.ware, using same format as Rob's email. Thanks.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 24, 2022

OK, thanks. You should have just received an email.

@RobMeades
Copy link
Contributor

The changes required to allow one or more of GNSS/Wifi/Ble/Cell to be left out of a build, the original question of this issue, are now pushed to master here, see commit bf8cd21. I will close this issue now and will delete the preview branch in a few weeks.
