

# Enterprise LDPC Arch. for Soft-Error Detection and Noise Immunity Reduction

Presenter: Jeff Yang, PhD.

Algorithm & Technology team, Silicon Motion Inc.





#### Legal Notice and Disclaimer

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

The content of this document including, but not limited to, concepts, ideas, figures and architectures is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Silicon Motion Inc. and its affiliates. Silicon Motion Inc. assumes no responsibility or liability for any errors or inaccuracies that may appear in the informational content contained in this document.

Silicon Motion Inc. may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Silicon Motion, Inc., the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2022 Silicon Motion Inc. or its affiliates. All Rights Reserved.

Silicon Motion, the Silicon Motion logo, MonTitan, the MonTitan logo are trademarks or registered trademarks of Silicon Motion Inc.





#### SSD module-level noise sources

- Noise from the controller's external data-transfer buses: PCIe, DRAM, and Flash interfaces.
- Noise from NAND program/read/erase operations; all kinds of disturbance are becoming worse.
- Radiation-induced soft errors.
- Other noise.
- ECC is used to protect the data and provide noise immunity.
- Codewords seen by the ECC fall into correctable/uncorrectable and detectable/undetectable (mis-corrected) classes.
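The four outcome classes are easiest to see on a code small enough to check by hand. The Python sketch below deliberately swaps in a toy extended Hamming (8,4) SECDED code instead of LDPC, so it illustrates the classification only, not any real decoder; the function names `encode`, `decode`, `classify` and `flip` are hypothetical. One error is corrected, two errors are detected as uncorrectable, and three errors are silently mis-corrected into a different valid codeword, which is the undetectable case that outer checks have to catch.

```python
def encode(data):
    """Toy extended Hamming (8,4) SECDED encoder, used here instead of LDPC.

    Index 0 holds the overall parity bit; indices 1..7 follow the classic
    Hamming layout (parity at positions 1, 2, 4; data at 3, 5, 6, 7).
    """
    d3, d5, d6, d7 = data
    w = [0, d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d3, d5 ^ d6 ^ d7, d5, d6, d7]
    w[0] = sum(w[1:]) % 2  # make the total number of ones even
    return w


def decode(word):
    """Return (status, word) from the Hamming syndrome and overall parity."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    odd_parity = sum(w) % 2
    if syndrome == 0 and not odd_parity:
        return "ok", w
    if odd_parity:              # odd number of flips: assume exactly one
        w[syndrome] ^= 1        # syndrome 0 means the parity bit itself flipped
        return "corrected", w
    return "uncorrectable", w   # even parity, nonzero syndrome: double error


def classify(sent, received):
    """Compare the decoder's own verdict with the ground truth."""
    status, out = decode(received)
    if status == "uncorrectable":
        return "detected-uncorrectable"
    if out != sent:
        return "mis-corrected"  # decoder claims success on wrong data
    return "clean" if status == "ok" else "corrected"


def flip(word, positions):
    """Return a copy of word with the given bit positions inverted."""
    w = list(word)
    for p in positions:
        w[p] ^= 1
    return w


sent = encode([1, 0, 1, 1])
for errors in ([], [5], [2, 6], [1, 2, 4]):
    print(len(errors), "error(s):", classify(sent, flip(sent, errors)))
```

With these error patterns the loop reports clean, corrected, detected-uncorrectable and mis-corrected, in that order. A real LDPC code has a far lower mis-correction probability, but the same class exists, which is why CRC and E2E checks are layered on top of the decoder.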







#### Radiation-induced soft errors

- Cosmic rays generate neutrons with particularly high penetrating power that pass through buildings.
- Nuclear reactions caused by these neutrons can reverse internal logic states, producing a single-event upset (SEU).
- As VLSI process technology scales, the SEU rate becomes non-negligible.
- In SSD modules, the soft-error rate is becoming increasingly important.
- The controller's error-recovery flow should cover soft errors that occur in NAND Flash.



Ref: H. Iwashita, "Neutron-energy-dependent soft errors..", NTT Technical Review, Vol. 19, No. 6, June 2021.
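To put "non-negligible" into numbers, the sketch below converts a per-megabit SEU rate in FIT (failures per 10^9 device-hours) into expected upsets across a fleet of drives. The rate of 1000 FIT/Mbit, the 64 Mbit of sensitive storage and the fleet size are purely hypothetical placeholders, not measured data; real values depend on process node, altitude and the neutron spectrum.

```python
HOURS_PER_YEAR = 24 * 365
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def expected_upsets(fit_per_mbit, mbits, devices, years):
    """Expected number of SEUs across a fleet of devices.

    fit_per_mbit: assumed raw SEU rate per megabit of sensitive storage
    mbits:        sensitive storage per device (SRAM, registers, ...)
    """
    device_fit = fit_per_mbit * mbits
    device_hours = devices * years * HOURS_PER_YEAR
    return device_fit * device_hours / FIT_HOURS


def mtbf_hours(fit_per_mbit, mbits):
    """Mean time between upsets for a single device, in hours."""
    return FIT_HOURS / (fit_per_mbit * mbits)


# Hypothetical inputs, not measured data:
print(expected_upsets(fit_per_mbit=1000, mbits=64, devices=100_000, years=5))  # 280320.0
print(mtbf_hours(1000, 64))  # 15625.0
```

Under these assumed numbers each drive sees an upset only about every 1.8 years, yet the fleet as a whole sees hundreds of thousands of them, so the recovery flow has to treat soft errors as routine events.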





#### Nested LDPC protection to provide good noise immunity

- Read retry (re-read) is an efficient, standard error-recovery step, but it can only be applied when the error is detected.
- LDPC error-floor and error-detection capabilities therefore remain important.
- Media End-to-End (ME2E) information, using the LBA or PPA, protects against chunks that are correctable but were read from the wrong location.
- Randomizer and the number of ones/zeros.
- Decoder result:
  - LDPC decoding pass/fail: `dec_fail`.
  - CRC check pass/fail: `crc_fail`.
  - Media E2E compare: `me2e_fail`.
  - HE2E check is optional.
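A minimal Python sketch of how the three decoder flags might gate a read-retry loop follows. It assumes a hypothetical `read_and_decode(offset)` hook and a list of read-level offsets; neither is a real controller API. The point it illustrates is that a chunk which passes LDPC but fails the CRC or the ME2E compare is still rejected and re-read, which only works because the failure was detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DecodeResult:
    dec_fail: bool      # LDPC decoding did not converge
    crc_fail: bool      # CRC over the decoded payload mismatched
    me2e_fail: bool     # embedded LBA/PPA differs from the requested one
    payload: bytes = b""


def is_good(r: DecodeResult) -> bool:
    """Accept a chunk only if every detection layer passes."""
    return not (r.dec_fail or r.crc_fail or r.me2e_fail)


def read_with_retry(read_and_decode: Callable[[int], DecodeResult],
                    retry_offsets: Sequence[int]) -> Optional[DecodeResult]:
    """Re-read with each read-level offset until a chunk passes all checks.

    read_and_decode(offset) is a hypothetical hook that senses the page at the
    given read-level offset and runs the LDPC, CRC and ME2E checks.
    """
    for offset in retry_offsets:
        result = read_and_decode(offset)
        if is_good(result):
            return result
    return None  # retries exhausted: escalate to the next recovery step


# Simulated page: offset 0 decodes "successfully" but carries the wrong
# LBA/PPA, offset -1 fails LDPC, offset -2 is clean.
def fake_read(offset: int) -> DecodeResult:
    if offset == 0:
        return DecodeResult(dec_fail=False, crc_fail=False, me2e_fail=True)
    if offset == -1:
        return DecodeResult(dec_fail=True, crc_fail=True, me2e_fail=False)
    return DecodeResult(False, False, False, b"data")


result = read_with_retry(fake_read, [0, -1, -2, -3])
print(result)
```

Treating any single flag as a failure keeps the acceptance rule simple; an optional HE2E flag would slot into `is_good` the same way.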





#### LDPC parity integrity check

- The LDPC encoder contains a large amount of combinational logic and registers (as well as ROM).
- A single SEU in the encoder can produce completely wrong LDPC parity.
- The parity must be checked immediately so that only correct parity is programmed into NAND.
- If wrong parity is detected during the data DMA from the controller to NAND, skip the program command and re-encode.
- A standard CRC check between the controller and NAND would also be beneficial.
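The parity check can be sketched with a toy systematic code defined by a sparse parity-check matrix H = [A | I], standing in for a real quasi-cyclic LDPC matrix. `encode` can inject a simulated SEU into one parity bit, `parity_check_ok` recomputes every syndrome, and `encode_and_program` skips the hypothetical `program` callback and re-encodes when the check fails. For brevity the encoder and the checker read the same `PARITY_ROWS` table; in silicon the checker should be logic independent of the encoder so that one upset cannot corrupt both.

```python
# Toy systematic code with H = [A | I]: parity bit i is the XOR of the data
# bits listed in PARITY_ROWS[i]. A stand-in for a real QC-LDPC matrix.
PARITY_ROWS = [[0, 1, 3], [1, 2, 3], [0, 2, 3], [0, 1, 2]]


def encode(data, flip_parity_bit=None):
    """Systematic encoder; flip_parity_bit simulates an SEU in the encoder."""
    parity = [sum(data[j] for j in row) % 2 for row in PARITY_ROWS]
    if flip_parity_bit is not None:
        parity[flip_parity_bit] ^= 1
    return list(data) + parity


def parity_check_ok(codeword, n_data):
    """Every row of H must yield a zero syndrome for a valid codeword."""
    data, parity = codeword[:n_data], codeword[n_data:]
    return all((sum(data[j] for j in row) + parity[i]) % 2 == 0
               for i, row in enumerate(PARITY_ROWS))


def encode_and_program(data, program, faults=(), max_attempts=3):
    """Check parity before programming; on failure skip program and re-encode.

    faults[k] is the parity bit upset on attempt k (None or missing = clean).
    """
    for attempt in range(max_attempts):
        fault = faults[attempt] if attempt < len(faults) else None
        codeword = encode(data, fault)
        if parity_check_ok(codeword, len(data)):
            program(codeword)  # hypothetical NAND program callback
            return attempt + 1
    raise RuntimeError("encoder parity check kept failing")


programmed = []
attempts = encode_and_program([1, 0, 1, 1], programmed.append, faults=[2])
print(attempts, programmed)  # 2 [[1, 0, 1, 1, 0, 0, 1, 0]]
```

Here the first encoding is upset and rejected before any program command is issued, and only the second, clean codeword reaches the (simulated) NAND.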







#### LDPC decoder for soft-error immunity





#### Decoding path analysis



- Most soft errors occurring in the data path can be corrected by the LDPC decoder.
- The CRC check and the media E2E check both reconfirm the integrity of the payload.
- ECC on the output memory also protects the payload.
- A failure on a critical interface control signal may cause wrong decoding.
- Together, CRC, LDPC, ME2E, and HE2E provide very good error-detection capability.
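A rough way to see why the layered checks work: if a mis-corrected payload looks random to an r-bit CRC, the CRC misses it with probability about 2^-r, and each further independent check multiplies in its own miss probability. This is a common rule-of-thumb approximation that assumes independence between the checks, not a figure from the slides:

$$
P_{\text{undetected}} \approx P_{\text{mis}} \cdot \prod_{k} P_{\text{miss},k}, \qquad P_{\text{miss,CRC}} \approx 2^{-r}
$$

For a 32-bit CRC alone, $2^{-32} \approx 2.3 \times 10^{-10}$, so a hypothetical mis-correction rate of $10^{-6}$ would fall to about $2.3 \times 10^{-16}$ before ME2E and HE2E are even counted.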





#### Decoding process system failure



- The decoding process may fail to terminate, causing a hang.
- Sources of decoding-process hangs:
  - Decoder interface control signals.
  - Decoder internal control signals.
  - ROM code.
- A timeout monitor that triggers re-decoding can resolve the system hang.
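A watchdog of this kind can be sketched as a cycle budget around the decoder loop. The hooks `start_decoder` and `step` and the parameters `max_cycles` and `max_redecodes` are hypothetical; in hardware the budget would be a timer counting clock cycles rather than a Python loop. When the budget expires, the decoder is reset and the codeword is decoded again, which is expected to clear a transient upset in control logic or decoder state.

```python
class DecoderHang(Exception):
    """Raised when the decoder still hangs after all re-decode attempts."""


def decode_with_watchdog(start_decoder, step, max_cycles, max_redecodes=2):
    """Run a decoder under a cycle-budget watchdog.

    start_decoder(): hypothetical hook that resets the decoder and returns a
                     fresh state object.
    step(state):     advances one cycle; returns a result when decoding
                     finishes, or None while it is still running.
    Returns (result, number_of_redecodes_used).
    """
    for attempt in range(max_redecodes + 1):
        state = start_decoder()
        for _ in range(max_cycles):
            result = step(state)
            if result is not None:
                return result, attempt
        # Watchdog expired: treat it as a hang, reset, and re-decode.
    raise DecoderHang("decoder did not terminate after re-decoding")


# Simulation: the first run is upset and never terminates (transient SEU);
# any later run finishes after 5 cycles.
starts = 0


def start_decoder():
    global starts
    starts += 1
    return {"cycles": 0, "hung": starts == 1}


def step(state):
    state["cycles"] += 1
    if not state["hung"] and state["cycles"] >= 5:
        return "decoded"
    return None


print(decode_with_watchdog(start_decoder, step, max_cycles=100))  # ('decoded', 1)
```

Bounding the number of re-decodes matters: a persistent fault should surface as an error to the recovery flow instead of turning the watchdog itself into an endless loop.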





#### Noise (error bits) from NAND flash



- The hard-decoding threshold matters most for read performance, because fetching soft information consumes multiple READ commands (multiple tRs).
- A 4K LDPC code offers a better hard-decoding tradeoff than a 2K or 1K LDPC code.
- Chunks in the soft-decoding region should be moved into the hard-decoding region, either by reducing error bits or by enlarging the hard-decoding capability.
- For the most advanced QLC, experience shows that the soft-decoding region can be improved, but the long tail of the error-bit distribution becomes more serious.
- Even with the read method that yields the fewest error bits, a few worst-case chunks still remain.
- An enhanced decoding method is required to make SSDs more reliable.
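The read-performance argument can be made concrete with a simple expected-latency model: every chunk costs one tR, and the fraction that fails hard decoding also pays for the extra READ commands needed to fetch soft information. The function name, the 60 µs tR and the four extra sensings are hypothetical illustration values, not device data.

```python
def avg_read_latency_us(t_r_us, p_soft, soft_reads,
                        t_hard_dec_us=0.0, t_soft_dec_us=0.0):
    """Average per-chunk read latency in microseconds.

    t_r_us:      array sensing time (tR) per READ command
    p_soft:      fraction of chunks that fail hard decoding
    soft_reads:  extra READ commands needed to fetch soft information
    t_*_dec_us:  optional decoder time for each path
    """
    hard_path = t_r_us + t_hard_dec_us
    soft_extra = soft_reads * t_r_us + t_soft_dec_us
    return hard_path + p_soft * soft_extra


# Hypothetical inputs: tR = 60 us, 4 extra sensings for soft information.
for p_soft in (0.0, 0.01, 0.1):
    print(p_soft, round(avg_read_latency_us(60, p_soft, 4), 2))
```

Under these assumed numbers, moving 10% of chunks from the soft-decoding region back to the hard-decoding region cuts the average latency from about 84 µs to 60 µs, while every chunk left in the tail still costs five tRs.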



#### Reliability enhancement from advanced soft-decoding

*[Figure: word-line stack WL-0 (layer-0), WL-1 (layer-1), WL-2, ..., WL-(N-1)]*





- Identify the sensing locations that yield proper soft-decoding information from other pages.
- Even with an unknown channel state, predefined LLR values still provide correction capability.
- Successfully decoded chunks can be used to reconstruct more soft information.
- Post-processing is launched once the target iteration count is reached.
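The two LLR ideas, predefined values for an unknown channel state and soft information reconstructed from chunks that did decode, can be sketched as an LLR table keyed by sensing region. After a chunk decodes successfully, its corrected bits serve as ground truth, and each region's LLR is estimated from the observed bit counts. The region layout, the ±4.0 fallback magnitude and the smoothing constant are assumptions for illustration, not the presenter's method.

```python
import math
from collections import Counter

# Hypothetical predefined LLRs for an unknown channel state: the sign comes
# from the hard decision only, with a fixed magnitude (positive => bit 0).
DEFAULT_LLR = {0: +4.0, 1: -4.0}


def calibrate_llr(regions, decoded_bits, alpha=1.0):
    """Build a region -> LLR table from a chunk that decoded successfully.

    regions:      sensing-region index of each bit, from multiple reads
    decoded_bits: corrected bits of the decoded chunk, used as ground truth
    LLR(region) = ln(P(bit=0 | region) / P(bit=1 | region)), with add-alpha
    (Laplace) smoothing so empty counts do not give infinite LLRs.
    """
    zeros, ones = Counter(), Counter()
    for region, bit in zip(regions, decoded_bits):
        (ones if bit else zeros)[region] += 1
    return {r: math.log((zeros[r] + alpha) / (ones[r] + alpha))
            for r in sorted(set(regions))}


def to_llrs(regions, hard_bits, table=None):
    """Map bits to LLRs, falling back to DEFAULT_LLR for uncalibrated regions."""
    table = table or {}
    return [table.get(r, DEFAULT_LLR[b]) for r, b in zip(regions, hard_bits)]


# Toy data: four regions from three sensings; 0 and 3 are the confident
# extremes, 1 and 2 sit next to the hard-read threshold.
regions = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
decoded_bits = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
table = calibrate_llr(regions, decoded_bits)
print({r: round(v, 2) for r, v in table.items()})
print(to_llrs([3, 1, 7], [0, 1, 1], table))
```

The calibrated table gives near-threshold regions LLRs close to zero and the extremes larger magnitudes, while a region never seen in a decoded chunk (region 7 here) falls back to the predefined value.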





#### Summary

- The native hard-decoding correction capability and the error-floor performance of the LDPC code are key to SSD performance and reliability.
- SSD controllers with robust soft-error immunity are increasingly valued.
  - Beyond a good LDPC code, designers should consider an encoder parity check and a watchdog timer.
- Enhanced soft-decoding is required to address the long tail of the error-bit distribution.





### Thank you.

Visit our booth!

Jeff.Yang@siliconmotion.com

