
fs: nvs: auto restore FS on writing while power down error. #8367

Closed
nvlsianpu opened this issue Jun 13, 2018 · 20 comments

@nvlsianpu (Collaborator) commented Jun 13, 2018

The scenario in which the last entry is corrupted by a power-down during a write is very likely.

Add functionality to auto-restore the NVS once such an issue is detected; obviously, the corrupted entry will be lost in the process.
cc @Laczen @lemrey @carlescufi

@Laczen (Collaborator) commented Jun 13, 2018

Hi @nvlsianpu, @lemrey, @carlescufi

I have been thinking about this, and I am not convinced that doing it directly in NVS is the best idea. The reworked NVS module works like this on initialization: it reads everything and, if it finds an error in the length (the start length differs from the end length), it stops and initializes NVS as read-only; init will in this case return -EROFS.

The user application can then read all valid variables, call reinit(), and store the variables again. It can even do more: by using direct flash access it can try to read the data that is not considered valid and check whether that data is in fact still usable (e.g. when the user application uses a CRC to validate the data). This is no longer possible if NVS does an auto restore.

I think that even in the case of an auto restore the init should return an error value so that the user application is aware that not all saves were done OK. This handling should anyhow be done in the user application.

@lemrey (Collaborator) commented Jun 14, 2018

Hi @Laczen,
I think that the scenario of a power loss is very common and a flash library should be able to handle that situation gracefully. As a user of a flash library, I expect it to recover from such situations automatically, with minimal user intervention, instead of giving up and basically telling me "fix it yourself".

That is very error-prone, because it requires users to write custom application logic, which:

  • will make it very hard and time-consuming to debug the individual cases where users report errors after performing a filesystem recovery by themselves
  • requires an understanding of the inner workings of NVS, which a user of the library (as opposed to its developers) might not have, or which might not be good enough :)

Your assumption that all data can be read into RAM, all sectors erased, and the data written back might not always hold, and it sounds to me like things could easily go wrong.

@Laczen (Collaborator) commented Jun 14, 2018

Hi @lemrey,

I completely understand your point, but I am looking at this problem from the data perspective. As a user application, I would like to be sure that all the data provided to me is correct. If an error occurs, NVS cannot provide correct data, and the user application is best placed to determine what to do.

I agree that minimal user intervention should be required, so I propose the following:
a. add a routine that reads the entry that generates the error
b. add a routine that recovers the filesystem without the last entry

This way it should be very simple to recover if you don't care about the lost data (call nvs_init() and, if it returns -EROFS, call nvs_recover()). You can do more if you have to be sure that the data is correct: call nvs_read_error(fs, id, data, len), which returns the id that caused the problem along with the invalid data; decide whether you can live with the previous value; and call nvs_recover() once you have decided.

@lemrey (Collaborator) commented Jun 14, 2018

There are usually a few ways to recover from such scenarios.

The most naive one I can think of is to "swap out" two sectors:

  • copy the last (source) sector to the erased (swap) sector
  • erase source sector
  • copy back to source from swap
  • erase swap

This is an expensive operation, and we could fail in between... this wouldn't work for nvs.

Another way is to use metadata. However, although we can detect a corrupt entry by its non-matching header and tail lengths, we cannot add to or modify any metadata that is already stored in flash.

I believe we would be in one of two situations:

  1. the len in the header points to a non-erased memory unit with a non-matching len field
    In this case, there was a power loss while writing the trailer (tail-len). We can assume that the data was written correctly, since the trailer is written last, however, the system can't walk backwards from that entry and we can't fix that (the tail-len is already written).

  2. the len in the header points to an erased memory unit
    The power loss occurred at an unknown moment during the write of the entry; we cannot assume the header was written correctly nor that any data was written. We could write something in the trailer to indicate that the entry is corrupt, however, we wouldn't be able to walk backwards from it. Alternatively, we could write the same length that we have in the header to be able to navigate backwards, but then we could not distinguish this entry from a valid one.

A possibility would be to extend the metadata with an additional field, let's call it "check" for the sake of the explanation. The check field would be placed beside the header, and its content would always be left erased when writing an entry. If, during boot, non-matching header and tail lengths are found, the check field is set to jump to the next sector, acting like a sector_end entry. During read operations, entries with a programmed check field would be skipped over; the check field would be used instead of the entry length in the header. During GC, these entries would be treated as deleted entries and not copied.

This way we would achieve the following:

  • only one write operation to recover from power loss
  • unaltered write amplification, i.e. writing one entry does not require more flash_write() driver API calls

This comes at the cost of one additional program unit of metadata per entry, which means:

  • more metadata overhead is stored in flash
  • more data to read back when looking up an entry (one additional program unit)

@lemrey (Collaborator) commented Jun 14, 2018

Hi @Laczen , I was writing a big comment and I didn't see your reply before I posted it :)
I agree that the data returned to the application should always be correct, so I am very happy with having a separate _recover() function that recovers the filesystem (without the last entry), and optionally an API to point to the last (corrupt) entry; it might be useful in some cases.

Did you have any idea already on how to recover the filesystem without too much trouble?

@aurel32 (Collaborator) commented Jun 14, 2018

We can assume that the data was written correctly, since the trailer is written last, however, the system can't walk backwards from that entry and we can't fix that (the tail-len is already written).

I don't think this is a correct assumption for the reworked version of the code. Depending on the write block size and the data length, the trailer might be written at the same time as the last chunk of data.

@Laczen (Collaborator) commented Jun 14, 2018

Hi @lemrey,

As you know, I'm trying to avoid doing rewrites in flash. Also, adding a check field does not help, because it only works for forward reading, and the reworked version only reads backwards.

I would recover the filesystem in the following way:

  1. Write a length at a free location that points back to the start of the last valid entry (this should always be possible, as space is left to store the sector end entry - except of course when the failure occurs exactly while writing the sector end).
  2. Call the garbage collector if this length was written in the sector end location.

When the system reboots, it can check the difference between the start and end lengths again and see whether it has been corrected (and ignore the error in that case). This only leaves the case where a failure occurs while writing the sector end; in this (rare) case the sectors should be copied one at a time.

@lemrey (Collaborator) commented Jun 14, 2018

Sorry @Laczen, perhaps I didn't explain myself well enough.
The check field would only be written once, if necessary, during recovery: there would be no rewrites.

When you say

Write length at a free location that points back to the last valid entry start

I agree with you that we would have space for that, but I am not sure how we would find that location during initialization. As I understand it, during initialization NVS reads entries walking forward, until the last entry is found and the write location (from which to walk backwards) is set.
If a corrupt entry is found, its length might have any value: where should the tail be placed?

edit: what do you mean by sector end location? I thought the sector end entry was a special entry that could be placed anywhere to close a sector and jump to the next.

@nvlsianpu (Collaborator, Author) commented Jun 14, 2018

@lemrey @Laczen A general comment on restoring a damaged entry: such an entry always needs to be treated as invalid, untrusted, and polluted. I really like an uncomplicated way of solving the error - a recovery GC and sector swap. This is in line with NVS being intended as the simplest, most lightweight FS in Zephyr.

@Laczen (Collaborator) commented Jun 14, 2018

Hi @lemrey,

Finding the location to write the end would be done by stepping back through the sector from the end (in steps of the write block size). As long as a location is empty it is skipped; the write would be done just after the first location that is not empty.
By the sector end location I mean the last location before the next sector start that has enough space to hold the sector end (aligned, 6 bytes).

Hi @nvlsianpu,

I agree it should be simple, and NVS cannot restore the data - although the user application might. It feels wrong to just ignore the last write completely, restore the fs (without the last write), and continue without making the user application aware.

@lemrey (Collaborator) commented Jun 14, 2018

@Laczen there might be no need to step through the sector, I think. We could just place the sector end at the very end of the sector. One advantage of writing the sector end as far as possible from the end of the sector (and close to the corrupt header) would be the possibility to write a new sector end if a previous sector end write (during recovery) failed - is that what you are thinking?

@Laczen (Collaborator) commented Jun 14, 2018

@lemrey, No, all that is needed to restore the fs is to insert a length that jumps over the bad data (in the reverse direction); all reads, gc, ... will work once this is done. There is no need to close a sector or do gc unless it is really required (when writing at the end of the sector). Really cool, isn't it :) Simple, effective, and all data is kept.
Only on startup will the error reappear, but it is easy to check whether it has been corrected: jump to the sector end and read back until you find a non-empty item (this has to be a length, or invalid data), then jump back until you reach the last valid item (the write location). If you can reach it, the error has been corrected; if you cannot (you end up below the write location), it has not.

@lemrey (Collaborator) commented Jun 15, 2018

@Laczen, I see, I think that should work!
I have a couple of more general comments, this might not be the best place for it but nevertheless:

  1. Have you considered moving the id to the end of an entry?
    It would save one read per entry every time an entry is read or jumped over.
  2. You could consider adding a Kconfig option to size the read and write buffers. I think these could really help to keep the number of API calls low, when desirable. I tried to run the nvs sample on a flash simulator with 4 pages of 1k each; by the end of the sample, NVS had made 8269 flash_read() calls, plus 3175 calls on reboot.

Laczen added a commit to Laczen/zephyr that referenced this issue Jun 18, 2018

subsys: fs/nvs: Correction for issue zephyrproject-rtos#8367
Modifications to automatic restore the fs after incomplete write due to
power cut. Changed _nvs_sector_is_used to reduce number of reads.

nvs_reinit(): now supports automatic restore of fs after power cut.

_nvs_sector_is_used: modified to a block style compare to reduce the
number of reads.

Signed-off-by: Laczen JMS <laczenjms@gmail.com>
@Laczen (Collaborator) commented Jun 18, 2018

@nvlsianpu, I have added the auto restore functionality. I have chosen the simple approach: copy the data that is surely correct.

@lemrey,

  1. id at the end: I haven't considered this, maybe in future...
  2. Configurable sizes for the read and write buffers could be done, but I see little advantage in normal use; I think the block size of 8 bytes is a good fit for minimizing reads/writes and stack usage. Making it larger will only be advantageous for large data blocks. And if it is configurable, how do we make sure that only sane values are used?

@lemrey (Collaborator) commented Jun 18, 2018

Hi @Laczen , I think it would be a nice feature since what is stored in flash is really up to the application; some applications might store larger entries. Perhaps the buffers could be sized similarly to how you do now when reading a sector.

@Laczen (Collaborator) commented Jul 12, 2018

@MaureenHelm, this issue is solved in PR #8141.

@nvlsianpu, could you close this issue? Thanks.

@MaureenHelm (Member) commented Jul 12, 2018

This issue shouldn't be closed until #8141 gets merged. Please update the PR description to note that it fixes this issue.
https://help.github.com/articles/closing-issues-using-keywords/

Laczen added a commit to Laczen/zephyr that referenced this issue Jul 24, 2018

subsys: fs/nvs: Correction for issue zephyrproject-rtos#8367

@aurel32 (Collaborator) commented Aug 28, 2018

Now that PR #8141 has been merged, I guess this bug can be closed. Correct?

@nvlsianpu (Collaborator, Author) commented Aug 28, 2018

AFAIK it was fixed by #8141.

nvlsianpu closed this Aug 28, 2018
