Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deep sleep state #75

Closed
eeFLis opened this issue Oct 28, 2022 · 32 comments
Closed

deep sleep state #75

eeFLis opened this issue Oct 28, 2022 · 32 comments
Assignees

Comments

@eeFLis
Copy link
Contributor

eeFLis commented Oct 28, 2022

Hi Rob

is there a way to detect when the module is going to enter deep sleep right before vint = low?
I think the only possibility is to check for +UUPSMR: 1 URC. I have seen that you already do this in the "private" area.
is it possible to make the deepSleepState readable through the API as well?

@RobMeades
Copy link
Contributor

Hi there: I'll need to refresh my memory, and probably consult @philwareublox, the expert in these matters (who is out of the office this week) but I believe the answer is that there is no good way, which is why I kept that URC detection over in the private area.

I will get back to you...

@RobMeades RobMeades self-assigned this Oct 28, 2022
@RobMeades
Copy link
Contributor

Having consulted the oracle @philwareublox, you are quite correct that a callback for +UUPSMR: 1 would tell you when the module has entered 3GPP power saving, i.e. the protocol stack has entered that state where, as agreed with the network, it is not contactable and yet will be contactable at a known time in the future, hence it remains logically attached to the network.

If you have also enabled "UART power saving" then the physical HW of the module may then enter a very low power deep sleep state; it also may not, i.e. you may get +UUPSMR: 2 and no deep sleep will occur because some protocol entity or other inside the module is still running.

Consequently +UUPSMR: 1 may, or may not, be followed by VINT going low.

Given the above, would a callback on +UUPSMR: 1 still be useful to you? If so I would be happy to add it. It would only work for SARA-R5 I think, though that is probably fine for you, and I would need to explain all of the provisos above quite clearly, somehow!

@eeFLis
Copy link
Contributor Author

eeFLis commented Oct 31, 2022

hi Rob
thanks for your answer.

It is fine for us if its only work for SARA-R5. we use a SARA R510s module.

we rather think of a get function for the deepSleepState uCellPrivateDeepSleepState_t instead of a +UUPSMR: 1 callback.
it would also be helpful if there is an additional state in uCellPrivateDeepSleepState_t when deep sleep is prevented (+UUPSMR: 2).

What do you think about that?

@RobMeades
Copy link
Contributor

If that is sufficient it would be easier to implement; in order to decouple it from deep sleep ('cos deep sleep requires 3GPP power saving but 3GPP power saving does not necessarily mean that there will be deep sleep), and since we already have uCellPwrGetDeepSleepActive(), I would probably call it uCellPwrGet3gppPowerSavingState() and it would reflect the last reported state according to the three +UUPSMR numbers, plus I would have it also check VINT so that you wouldn't have to call uCellPwrGetDeepSleepActive() as well. The states would be something like:

typedef enum {
    U_CELL_PWR_3GPP_POWER_SAVING_STATE_UNKNOWN,
    U_CELL_PWR_3GPP_POWER_SAVING_STATE_UNAVAILABLE, /**< 3GPP power saving is either not switched on
                                                         or is not allowed by the network. */
    U_CELL_PWR_3GPP_POWER_SAVING_STATE_BLOCKED, /**< 3GPP power saving could be active but application
                                                     activity on the module is blocking it. */
    U_CELL_PWR_3GPP_POWER_SAVING_STATE_ACTIVE, /**< the cellular protocol stack has entered 3GPP power
                                                    saving. */
    U_CELL_PWR_3GPP_POWER_SAVING_STATE_ACTIVE_DEEP_SLEEP_ACTIVE, /***< the cellular protocol stack on the
                                                                       module has entered 3GPP power saving
                                                                       and the module HW has been able to
                                                                       take advantage of this and has entered
                                                                       deep sleep. */
    U_CELL_PWR_3GPP_MAX_NUM_POWER_SAVING_STATES
} uCellPwr3gppPowerSavingState_t;

Would that be OK?

@eeFLis
Copy link
Contributor Author

eeFLis commented Oct 31, 2022

that would be great.

since there is U_CELL_PWR_3GPP_POWER_SAVING_STATE_UNAVAILABLE would it be possible to have a additional state like U_CELL_PWR_3GPP_POWER_SAVING_PARMETERS_AGREED_BY_NETWORK? which indicates that the parameters have been agreed by the network.

@RobMeades
Copy link
Contributor

Yup, can do that, let me put something together and I'll post a preview branch here for you to look at.

@RobMeades
Copy link
Contributor

I have pushed a preview branch here:

https://github.com/u-blox/ubxlib/tree/preview_cell_3gpp_ps_state

Note that I've performed no testing on this as yet, all I know is that it passes compilation/static-analysis/doxygen/code-style checking. When you get a chance, please let me know if it does what you want and I will then start testing.

@eeFLis
Copy link
Contributor Author

eeFLis commented Oct 31, 2022

that was fast. thank you that is exactly what we need.

@RobMeades
Copy link
Contributor

Great, thanks for the swift feedback. Feel free to use the preview branch if you want to get moving quickly; I won't be able to get the testing/reviewing done until next week but I think it is pretty unlikely to change.

When I have update the master branch here with the final version I will let you know and will delete the preview branch some time after that.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 28, 2022

Hi

There seems to be a issue when uCellPwrGet3gppPowerSavingState is called multiple times when the module is in deep sleep (VINT==low).
the function returns U_CELL_PWR_3GPP_POWER_SAVING_STATE_ACTIVE_DEEP_SLEEP_ACTIVE in the first call which is correct.
However, all calls afterwards returns U_CELL_PWR_3GPP_POWER_SAVING_STATE_AGREED_BY_NETWORK .

this seems to be because in uCellPrivateIsDeepSleepActive() the deepSleepState is set to U_CELL_PRIVATE_DEEP_SLEEP_STATE_ASLEEP afterwards uCellPrivateIsDeepSleepActive() is not called again in uCellPwrGet3gppPowerSavingState() .

@RobMeades
Copy link
Contributor

RobMeades commented Nov 29, 2022

Ah, well spotted, so would this be the right fix do you think (see the additional "or" check on deepSleepState in the if() statement in the middle):

powerSavingState3gpp = U_CELL_PWR_3GPP_POWER_SAVING_STATE_AGREED_BY_NETWORK;
if (pInstance->pSleepContext->powerSaving3gppOnNotOffCereg) {
    if (pInstance->deepSleepBlockedBy >= 0) {
        powerSavingState3gpp = U_CELL_PWR_3GPP_POWER_SAVING_STATE_BLOCKED_BY_MODULE;
        if (pApplication != NULL) {
            *pApplication = pInstance->deepSleepBlockedBy;
        }
    } else {
        if ((pInstance->deepSleepState == U_CELL_PRIVATE_DEEP_SLEEP_STATE_PROTOCOL_STACK_ASLEEP) ||
            (pInstance->deepSleepState == U_CELL_PRIVATE_DEEP_SLEEP_STATE_ASLEEP)) {
            powerSavingState3gpp = U_CELL_PWR_3GPP_POWER_SAVING_STATE_ACTIVE;
            if (uCellPrivateIsDeepSleepActive(pInstance)) {
                powerSavingState3gpp = U_CELL_PWR_3GPP_POWER_SAVING_STATE_ACTIVE_DEEP_SLEEP_ACTIVE;
            }
        }
    }
} else {
    powerSavingState3gpp = U_CELL_PWR_3GPP_POWER_SAVING_STATE_BLOCKED_BY_NETWORK;
}

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 29, 2022

yes that should fix the issue. Thank you

@RobMeades
Copy link
Contributor

Great, I've updated the preview branch. Haven't managed to get this onto master yet as I've been re-writing the test system for a few weeks, will let you know when it gets there.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 30, 2022

btw

during the long time test we noticed that sometimes the at client seems to be confused what was already read from the RX buffer.
The debug output shows messages that were not sent at this time by the module (we have checked the UART communication) and the printed messages are identical with the messages from the past.

Have you observed this behavior on your test system or do you have an idea what could be causing it?

@RobMeades
Copy link
Contributor

That's interesting: our testing is limited to about 40 minutes of fairly intense AT activity, we don't have a long term test setup at the moment, hence it's not something I've seen. That said, the test system re-write is intended to allow us a lot more flexibility, so we could set such a thing up. Can you describe the kind of scenario?

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 30, 2022

the scenario is as follows:

  1. Wake up from psm.
  2. send data to a UDP socket.
  3. Wait for an response.
  4. wait until the module switches to psm (VINT == low) and sleep for about 60 Seconds
  5. start again at point 1

this works for an hour and more before the error occurs

@RobMeades
Copy link
Contributor

Great - looks like it is time I set that up as a separate long-term test. Will aim to do so in the next few weeks.

The buffering code in the AT Client is more complex than it needs to be in order to support the intercept functions, needed by short-range EDM mode and cellular chip-to-chip encryption, so there could be a problem in there somewhere.

@eeFLis
Copy link
Contributor Author

eeFLis commented Nov 30, 2022

is there a way around this when it is not needed? This is because the application will stop working if it thinks it is receiving data that does not really exist.

@RobMeades
Copy link
Contributor

Unfortunately not - the code is just written to include that functionality (see bufferFill()), taking it out would require re-writing the code, which would of course be simpler, but then you've got new AT Client buffer handling code to re-test, whereas the current stuff has been in place for a few years now and so is, at least in other respects, pretty well hardened.

We will work to track it down: if you gain any further insights please update this ticket, or maybe start another dedicated to the subject.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 6, 2022

Hi Rob

what we see when the issue occurs is that urcCallback reports a lot of readable data.
But definitely no data has been received between "URC checking done" and "data readable 565".
and the "readable data" are those already received at an earlier time.

U_AT_CLIENT_0-0: possible URC data readable 13, already buffered 0.
+CSCON: 1
U_AT_CLIENT_0-0: URC checking done.
U_AT_CLIENT_0-0: possible URC data readable 565, already buffered 0.

@RobMeades
Copy link
Contributor

RobMeades commented Dec 6, 2022

Interesting: depending on how adventurous you feel and how much static RAM you have free, there is a compilation switch U_CFG_AT_CLIENT_DETAILED_DEBUG in the AT Client which I used when debugging buffering behaviour for chip-to-chip encryption and left there in case of a future need.

When defined this causes log points that are already present in the code to become active and record the AT Client buffer state to a static logging buffer. The logging is started and stopped with uAtClientDetailedDebugOn()/uAtClientDetailedDebugOff() (without destroying the buffer in-between). You would then call uAtClientDetailedDebugPrint() when you wanted to print the buffer out [and empty it]; this is also called automatically on uAtClientDeinit().

I'm not yet done with the test system re-write, should be this week though, then I can try to reproduce the problem.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 7, 2022

thanks for the hint. I have seen this option, unfortunately we dont have enough memory for it.

@RobMeades
Copy link
Contributor

How much RAM might you have free? We currently set the number of entries in the logging array to 1000, which is likely overkill. Each entry is 16 int32_t's, so 64 bytes each, making it 64 kbytes total. If you had, say, 5 kbytes free you could set it to 75 entries and might capture something.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 7, 2022

thanks this way it fits

the issue seems to occur between place 3 and 4.
somewhere in the function uartReadNoStutter or can you recognize more from the log?

U_AT_CLIENT_0-0: URC checking done.
U_AT_CLIENT_0-0: possible URC data readable 0, already buffered 516.
U_AT_CLIENT_0-0:    0 100 ? @   251843: buffer 0x2000f2bc (2220)  ri 19  l 19 lb 19,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:    1 101 ? @   251843: buffer 0x2000f2bc (2220)  ri 19  l 0 lb 0,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:    2 102 ? @   251843: buffer 0x2000f2bc (2220)  ri 0  l 0 lb 0,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:    3   1   @   251843: buffer 0x2000f2bc (2220)  ri 0  l 0 lb 0,  pD 0x00000000 pDI 0x00000000 l 0 x 0 y 0 z 0 rl 0.
U_AT_CLIENT_0-0:    4   3   @   251843: buffer 0x2000f2bc (2220)  ri 0  l 0 lb 0,  pD 0x00000000 pDI 0x2000f2bc l 0 x 0 y 0 z 0 rl 0.
U_AT_CLIENT_0-0:    5   4   @   251853: buffer 0x2000f2bc (2220)  ri 0  l 0 lb 0,  pD 0x00000000 pDI 0x2000f2bc l 0 x 0 y 0 z 0 rl 1124.
U_AT_CLIENT_0-0:    6   5   @   251853: buffer 0x2000f2bc (2220)  ri 0  l 0 lb 1124,  pD 0x00000000 pDI 0x2000f2bc l 1124 x 1124 y 0 z 0 rl 1124.
U_AT_CLIENT_0-0:    7  16   @   251875: buffer 0x2000f2bc (2220)  ri 0  l 1124 lb 1124,  pD 0x00000000 pDI 0x2000f2bc l 1124 x 1124 y 0 z 0 rl 1124.
U_AT_CLIENT_0-0:    8 100 ? @   251875: buffer 0x2000f2bc (2220)  ri 0  l 1124 lb 1124,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:    9 100 ? @   251875: buffer 0x2000f2bc (2220)  ri 2  l 1124 lb 1124,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:   10 101 ? @   251875: buffer 0x2000f2bc (2220)  ri 2  l 1122 lb 1122,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:   11 102 ? @   251875: buffer 0x2000f2bc (2220)  ri 0  l 1122 lb 1122,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:   12 100 ? @   251875: buffer 0x2000f2bc (2220)  ri 6  l 1122 lb 1122,  pD 0x00000000 pDI 0x00000000 l -1 x -1 y -1 z -1 rl -1.
U_AT_CLIENT_0-0:   13 101 ? @   251875:U_AT_CLIENT_0-0: URC checking done.

@RobMeades
Copy link
Contributor

Oooo, interesting, let me refresh my memory as to what the log points mean and I'll post back here...

@RobMeades
Copy link
Contributor

RobMeades commented Dec 7, 2022

Hmmm, yes, you're right: between LOG_BUFFER_FILL(3) and LOG_BUFFER_FILL(4) readLength goes from 0 to 1124.

And, barring stack weirdness, the only way it can do that is if uartReadNoStutter() returned that value, and the only way it can do that is if uPortUartRead() for STM32F4 is returning 1124. So I think suspicion moves to the UART driver.

Are you using a debugger? If so, might it be possible to drop some diagnostic information into some variables down in the UART driver and then put a break-point on where the problem occurs and go look at them? If not, let me try to think up something else.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 7, 2022

Yes we use a debugger. We will do it as you suggested.

However, the issue occurs only rarely and sometimes first after hours.
I will report back when I have more information.

@RobMeades
Copy link
Contributor

Thanks, and sorry to encumber you with this, I really will get to it next week, but if you happen to find out something first then do let me know.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 7, 2022

no problem. We are grateful that you point us in the right direction.
I have more the feeling the problem is on our side as the problems seem to come from the porting layer and we use a STM32L4.
So I am sorry to encumber you with this.

@eeFLis
Copy link
Contributor Author

eeFLis commented Dec 12, 2022

I think we have found the problem

the error happened when we migrated the STM32F4 porting layer to STM32L4.
at LL_DMA_GetDataLength we didn't take care that the channel (“stream” in
STM32F4 lines) is shifted by one compared to the channel name.

#define LL_DMA_CHANNEL_1                  0x00000000U /*!< DMA Channel 1 */
#define LL_DMA_CHANNEL_2                  0x00000001U /*!< DMA Channel 2 */
#define LL_DMA_CHANNEL_3                  0x00000002U /*!< DMA Channel 3 */

since then the error has not appeared again
So I am sorry that I have encumbered you with this.

@RobMeades
Copy link
Contributor

Woohoo! Well found, the ST HAL is a bit of a nightmare for "out by one" problems. I really hope that is it; if not, please don't hesitate to raise the problem again. I won't feel encumbered :-).

@RobMeades
Copy link
Contributor

I believe the issues raised here are now sorted with commit 0c56afa being pushed to master here, will close this issue and delete the preview branch in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants