jhofstee edited this page Jan 24, 2020 · 51 revisions

1. Budgeting the data partition

Available

  • CCGX: 28MB or so
  • Venus GX 1st version: 100MB
  • Cerbo GX: 512MB

Usage

  • Log files: around 40 processes always run and log. As of v2.23, the maximum space that the log files of a single process can take has been reduced to 4 files of 25kB each, so 100kB. For all 40 processes together this amounts to 4,000kB. Details of the change are in commit 9ce14ef1e, which was backported to v2.23.
  • Firmware & settings file cache (mqtt-rpc)
  • Settings
  • Factory installed files (negligible from a size point of view)
  • VRM Logger backlog
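
As a rough check of the log-file figure above, the arithmetic can be written out as follows. This is only a sketch: the partition sizes are the approximate values listed under "Available", decimal units (1MB = 1000kB) are an assumption, and all names are illustrative.

```python
# Rough log-file budget for the data partition (illustrative names only).
PROCESSES = 40          # processes that always run and log
FILES_PER_PROCESS = 4   # log files kept per process since v2.23
FILE_SIZE_KB = 25       # maximum size of each log file

# Approximate data partition sizes from the "Available" list, in MB.
PARTITION_MB = {"CCGX": 28, "Venus GX 1st version": 100, "Cerbo GX": 512}


def log_budget_kb() -> int:
    """Worst-case space taken by all log files together, in kB."""
    return PROCESSES * FILES_PER_PROCESS * FILE_SIZE_KB


def log_share_percent(partition_mb: int) -> float:
    """Share of a partition taken by the logs, assuming 1MB = 1000kB."""
    return 100 * log_budget_kb() / (partition_mb * 1000)


if __name__ == "__main__":
    print(f"log files: {log_budget_kb()} kB")
    for device, size_mb in PARTITION_MB.items():
        print(f"{device}: {log_share_percent(size_mb):.1f}% of {size_mb}MB")
```

Under these assumptions the logs alone can take roughly a seventh of the CCGX data partition, which is why the other items in the list need to stay small there.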

2. Factory installed files on the data partition

# cat /data/venus/installer-version 
v2.11
Victron Energy

# cat /data/venus/serial-number     
HQ1825ZUT5T

# cat /data/venus/wpa-psk       
gt5nyede

# cat /data/venus/part-number
BPP900400100

In the same folder there is also one other file, which is auto generated:

# cat /data/venus/unique-id  
985dxxxxx3a1
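
For code that needs these values, a minimal sketch of reading them could look like the following. The file names come from the listings above; the function name, the return format and the handling of missing files are assumptions made for illustration.

```python
import os

# File names as shown in the listings above.
VENUS_FILES = ("installer-version", "serial-number", "wpa-psk",
               "part-number", "unique-id")


def read_venus_files(directory: str = "/data/venus") -> dict:
    """Return {file name: list of non-empty lines}; unreadable files map to None."""
    result = {}
    for name in VENUS_FILES:
        try:
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                # installer-version has two lines, so keep every line.
                result[name] = [line.strip() for line in f if line.strip()]
        except OSError:
            result[name] = None
    return result
```

The directory is a parameter so the function can also be tried against a copy of the files on a development machine.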

3. Writing files to data

Filesystems in Linux are typically asynchronous: data is reported as written once it is in the page cache, even though it is not yet on the storage itself. For user settings, keys that are generated once and then distributed, and the like, it is important that both the data and the metadata are on disk before the data is used, since an embedded device can be turned off or power cycled at any time. See Ensuring data reaches disk for details. In short, for a single file:

  1. create a new temp file (on the same file system!)
  2. write data to the temp file
  3. fsync() the temp file
  4. rename the temp file to the appropriate name
  5. fsync() the containing directory

Check the return codes of every step up to and including step 4, fsync() included. A failure of step 5 can be ignored: by that time the file is already updated in memory and there is no way to recover.
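
The five steps can be sketched in Python as follows. This is a minimal illustration, not code taken from Venus OS: the function name and the temp-file prefix are made up, and Python raises OSError where C code would check return codes.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` following the five steps listed above."""
    directory = os.path.dirname(os.path.abspath(path))

    # 1. Create the temp file in the same directory, hence the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)          # 2. write the data to the temp file
            tmp.flush()              #    push Python's buffer to the kernel
            os.fsync(tmp.fileno())   # 3. fsync() the temp file
        os.replace(tmp_path, path)   # 4. rename the temp file to its final name
    except BaseException:
        # A failure in steps 1-4 must not go unnoticed: remove the temp
        # file if it is still there and re-raise for the caller.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # 5. fsync() the containing directory; a failure here is ignored on purpose.
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        pass
```

Note that `tempfile.mkstemp` creates the file with mode 0600, so a caller that needs other permissions has to set them on the temp file before the rename.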

Alternatively, for files that can be (re)generated, make sure you can deal with them being corrupted.
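
For the second approach, a sketch of tolerating a corrupted file could look like this. The JSON format, the function name and the `regenerate` callback are assumptions chosen for illustration; the point is only that a missing, empty or garbled file leads to regeneration instead of a crash.

```python
import json


def load_or_regenerate(path, regenerate):
    """Return the parsed content of `path`, rebuilding the file if it is unusable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing, empty, truncated or garbled: regenerate the content.
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError.
        value = regenerate()
        with open(path, "w", encoding="utf-8") as f:
            json.dump(value, f)
        return value
```

This only suits data that is cheap to rebuild, such as a cache. It does not suit user settings or keys, which cannot be recreated once lost.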

4. Handling failures related to the data partition

In v2.30, various improvements have been added.

vrmlogger reads /run/data-partition-state, translates its content to a number, and sends it to VRM on boot and thereafter only when it differs from the previously submitted value.

If the data partition is not mounted (state == failed or state == failed-to-mount), an init script stops vrmlogger, since it can't run without the data partition anyway, and uses curl to send the dps to VRM itself.

While VRM normally checks all data against a device-authorisation-token, it always accepts dps transmissions.

Note that curl sends it as a DPS-TRANSMISSION (c=100), which causes it to be stored in the events table. vrmlogger sends it as a normal data transmission, in which case it is not stored in the events table but in the normal databases.

In VRM, this status is saved as the dataAttribute dps. Its possible values are:

State                 Description
0 - fine
1 - failed-once       Set on device reboot: a run-time read-only remount occurred and was stored in the u-boot var `data-failed-count`, and a second failure was not detected.
2 - recovered         Follows a 'failed-once', after 24 hours without failure.
3 - failed            Set on device reboot after a second run-time read-only remount, based on the u-boot var `data-failed-count`.
4 - failed-to-mount   Set if `/data` wasn't even mounted at boot. A tmpfs is then mounted for `/var/log`.
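
The translation to a number and the "send only on change" behaviour described earlier in this section can be sketched as follows. This is not vrmlogger's code: the names are hypothetical, and that the state file contains exactly the state names from the table (with "fine" for 0) is an assumption.

```python
# Numbers from the table above. That the file holds exactly these names,
# and "fine" for state 0, is an assumption made for this sketch.
DPS_VALUES = {
    "fine": 0,
    "failed-once": 1,
    "recovered": 2,
    "failed": 3,
    "failed-to-mount": 4,
}


def read_dps(path="/run/data-partition-state"):
    """Return the dps number for the state file, or None if unreadable or unknown."""
    try:
        with open(path, encoding="utf-8") as f:
            return DPS_VALUES.get(f.read().strip())
    except OSError:
        return None


class DpsReporter:
    """Calls send(value) for the first known value, then only when it changes."""

    def __init__(self, send):
        self._send = send    # hypothetical callback that submits the value
        self._last = None    # last value that was submitted

    def update(self, value):
        if value is not None and value != self._last:
            self._send(value)
            self._last = value
```

Usage would be along the lines of `reporter.update(read_dps())` once at boot and then periodically; repeated identical values are not sent again.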

Primary reporting is done with report-data-failure.sh, through which the state ends up in the eventLog MySQL table. vrmlogger also reports the state from /run/data-partition-state, but failed and failed-to-mount are not (reliably) sent by vrmlogger, because it won't operate properly with a malfunctioning /data. See report-data-failure.sh.

The test-data-partition.sh script contains more explanation of how the conclusions are reached.

To analyse the status in the field, there is a Grafana dashboard.

Note: there is no authoritative status of which Venus/GX devices are broken. As it stands, the curl script reports 'broken' events but no recovery, and vrmlogger doesn't report 'failed' and 'failed-to-mount'. So it's half here, half there. Changes in vrmlogger are underway to let it handle /data becoming read-only, after which the curl reporting is no longer needed.
