LwM2M server fallback #65745
Add API to find a Security instance ID with a given Short Server ID.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
React to the Disable executable, and add a callback that allows disabling a server for a period of time. Also add an API that finds the next server candidate based on priority and on the server not being disabled. Move all server-related functions into their own header.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
If server registration fails, allow fallback to a secondary server, or fallback to bootstrap. Also allow fallback to a different bootstrap server. Add API to tell the RD client when a server has been disabled by the executable command.

Changes to RD state machine:
* All retry logic should be handled in the NETWORK_ERROR state.
* New state SERVER_DISABLED.
* Internally disable servers that reject registration.
* Temporarily disable a server on network error.
* Clean up all "disable timers" on start.
* Select the server first, then find the security object for it.
* State functions return void; error handling is done using states.
* The DISCONNECT event only comes when the client is requested to stop.
* NETWORK_ERROR stops the engine. This is a generic error for all kinds of registration or network failures.
* BOOTSTRAP_REG_FAILURE also stops the engine. This is fatal, and we cannot recover.

Refactoring:
* Server selection logic is inside the server object.
* sm_handle_timeout_state() does not require the msg parameter, which was unused.
* When bootstrap fails, we should NOT fall back to registration. This is a fatal error, and it stops the engine and informs the application.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Changed one arrow on the image to get it below 1000px wide.
Tests for the bootstrap submitted in #66120
Not sure what tool you used, but it would be fine (and even encouraged) to use the SVG version of your diagram directly.
The PNG is drawn using https://app.diagrams.net/ (previously draw.io).
After the fallback refactoring of the LwM2M engine, some changes to the server object show up in hard-coded test values. Also, add an Endpoint wrapper class that ensures the registration state of the returned endpoint.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Properly document the actions that the application should take on certain events. This clarifies the events that indicate that the LwM2M engine is stopped. Add missing events to the state machine diagram and apply color coding to states.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Thanks for the suggestion. I was not even aware that we could use SVG directly. I have always drawn those diagrams in draw.io, which can produce SVG, but I just exported them as PNG. I have now replaced the diagram with an SVG version that can be opened in https://app.diagrams.net/
LGTM, nice work!
This PR is a list of changes that implement a fallback mechanism and various fixes to our LwM2M state machine.
* `priority` field, if we use LwM2M 1.1
* `disable` functionality to the server object
* `NETWORK_ERROR` state. All errors should lead to that state.

Problems with the current LwM2M engine
When I started to implement the bootstrap and fallback test cases from the LwM2M interoperability test set, the limitations of the current LwM2M engine became evident. They mainly stem from how the RD client is implemented.
Refactoring the Server Object
I implemented a few missing resources in the server object: priority and the disable functionality.
Disable works by keeping one `k_timepoint_t` per server instance. Once that timepoint has expired, the server is no longer disabled, so it is active.

The obviously missing API was then a function that goes through the list of servers and picks the next one based on their priority values and whether they are active (that is, not disabled). This works fine because Bootstrap servers, per the spec, do not have an LwM2M Server object instance, so I don't need to check that flag for them. I also noticed that the current RD client wrongly goes through the Security instances first and picks the first one from there; this order is wrong. We should first pick the Server instance, then find its Security object instance.
So there is now a new API for this.
Also, if the connection to a server fails and we want to fall back to a secondary server, I thought we could disable the current server for a short period, so that the previous API would find the next candidate. So this new API is also there.
A few other functions were added; please see `lwm2m_obj_server.h`.
Refactoring the RD-Client
If server registration fails, allow fallback to secondary server,
or fallback to bootstrap.
Also allow fallback to different bootstrap server.
Add API to tell the RD client when a server has been disabled by
the executable command.
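Changes to RD state machine:
* All retry logic should be handled in the NETWORK_ERROR state.
* New state SERVER_DISABLED.
* Internally disable servers that reject registration.
* Temporarily disable a server on network error.
* Clean up all "disable timers" on start.
* Select the server first, then find the security object for it.
* State functions return void; error handling is done using states.
* The DISCONNECT event only comes when the client is requested to stop.
* NETWORK_ERROR stops the engine. This is a generic error for all kinds
  of registration or network failures.
* BOOTSTRAP_REG_FAILURE also stops the engine. This is fatal, and we cannot
  recover.

Refactoring:
* Server selection logic is inside the server object.
* sm_handle_timeout_state() does not require the msg parameter, which was unused.
* When bootstrap fails, we should NOT fall back to registration.
  This is a fatal error, and it stops the engine and informs the application.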
Clarification of the events
As a result of these changes, I clarified the documentation to match what the code does.
Now it should be clear what the application should do on each event. Please see the changes in `lwm2m.rst` and `lwm2m_engine_state_machine.png`.
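In short, events that stop the engine:
* NETWORK_ERROR
* BOOTSTRAP_REG_FAILURE
* DISCONNECT, which only comes when the client is requested to stop with `lwm2m_rd_client_stop()`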
As seen from the diagram above, only those three events lead to the IDLE state. All other events are informational only; after connection failures, the client should recover on its own.
One new event was added, `SERVER_DISABLED`, along with one new state of the same name.