Speaker：Yeong-Luh Ueng 2018/4/17

Presentation transcript:

A SHUFFLE-BASED ITERATIVE DEMODULATION AND DECODING SCHEME FOR LDPC CODED FLASH MEMORY
Li-Chung Lee, Wei-Min Lai, Mao-Ruei Li, Yeong-Luh Ueng
Dept. of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

Outline
Introduction
  NAND Flash
Preliminary
  LDPC coded modulation using a very sparse LDPC code
  Layer-based IDD (iterative demodulation and decoding) receiver
Proposed Shuffle-based IDD Receiver
  Design challenge
  Hardware-friendly structured interleaver
  Optimized memory bank interface
Simulation Results
Conclusion

This slide shows the outline of the talk, which has five parts. First, we introduce the structure and characteristics of NAND Flash. Then we review the IDD scheme for LDPC coded modulation and the previous work on the layer-based IDD receiver. Next, to improve hardware efficiency, we propose replacing the layer-based IDD receiver with a shuffle-based IDD receiver, and we resolve and optimize the issues this raises. Finally, we show the simulation results and conclude the talk.


Introduction to TLC Flash
Information is stored in floating-gate transistors.
  Single-Level Cell (SLC): 1 bit (2 states)
  Multi-Level Cell (MLC): 2 bits (4 states)
  Triple-Level Cell (TLC): 3 bits (8 states)
About TLC
  Higher density
  Degraded reliability and performance

Flash memory is a kind of non-volatile storage in which information is stored using floating-gate transistors. It offers small physical size, low power consumption, and high storage density, and has therefore become increasingly popular in recent years. Single-level, multi-level, and triple-level cells store 1, 2, and 3 bits per cell, respectively.

Flash Model
A threshold voltage (VRef) is used to read a Flash cell.
More threshold voltages are needed for TLC.
TLC memory cell: the reference voltages VRef0 to VRef6 separate the eight states 111, 110, 100, 101, 001, 011, 010, 000.

A threshold voltage is used to determine which value is stored in a Flash cell. For example, if the sensed voltage is greater than the threshold voltage, the cell is read as zero. More than one threshold level is needed to sense the data bits stored in a TLC cell.
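The threshold-based read described above can be sketched in a few lines of Python. This is only an illustration: the reference voltages are made-up, evenly spaced values, and the sketch assumes that the labels on the slide are listed from the lowest voltage window to the highest.

```python
from bisect import bisect_right

# Labels of the eight TLC states, assumed to run from the lowest
# voltage window to the highest, in the order shown on the slide.
TLC_LABELS = ["111", "110", "100", "101", "001", "011", "010", "000"]

# Hypothetical reference voltages VRef0..VRef6 in volts (not from the talk).
V_REFS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

def hard_read(v_sensed, v_refs=V_REFS):
    """Return the 3-bit label of the window that v_sensed falls into."""
    if len(v_refs) != len(TLC_LABELS) - 1 or list(v_refs) != sorted(v_refs):
        raise ValueError("need 7 increasing reference voltages")
    # Count how many reference voltages lie at or below the sensed voltage.
    level = bisect_right(v_refs, v_sensed)
    return TLC_LABELS[level]

print(hard_read(0.2))  # below VRef0
print(hard_read(2.2))  # between VRef3 and VRef4
print(hard_read(4.0))  # above VRef6
```

A single comparison is enough for SLC; the seven comparisons implied here are why a TLC read needs more threshold voltages.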

TLC Flash Model
Hard-decision / soft-decision memory sensing
  Using more than one threshold voltage
  Increased read latency
[Figure: SLC model under hard-decision sensing and under soft-decision sensing]
Modeled as PAM modulation using Gray mapping
  Modeled as 2-/4-/8-PAM modulation for SLC/MLC/TLC
[Figure: TLC model with level labels 111, 110, 100, 101, 001, 011, 010, 000]

There are two ways to sense data: hard-decision and soft-decision sensing. Conventionally, hard-decision sensing is used because its read latency is short. When data reliability degrades, however, the controller switches to soft-decision sensing to prolong the Flash lifetime. SLC, MLC, and TLC can be modeled as 2-PAM, 4-PAM, and 8-PAM modulation, respectively. Conventionally, Gray mapping is applied to MLC and TLC so that neighboring levels differ in only one bit. The storage density of TLC is three times that of SLC, but TLC data are much less reliable. More powerful error-correcting codes, such as LDPC codes, are therefore necessary for TLC Flash.
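The Gray property mentioned in the notes (neighboring levels differ in exactly one bit) is easy to check mechanically. The sketch below tests the label order listed on the Flash Model slide and, for contrast, a natural binary order; the helper names are illustrative only.

```python
def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_gray(labels):
    """True if every pair of neighboring labels differs in exactly one bit."""
    return all(hamming(a, b) == 1 for a, b in zip(labels, labels[1:]))

# Label order listed for the TLC cell on the Flash Model slide.
TLC_LABELS = ["111", "110", "100", "101", "001", "011", "010", "000"]
# Natural binary order 000, 001, ..., 111 for comparison.
NATURAL = [format(i, "03b") for i in range(8)]

print(is_gray(TLC_LABELS))
print(is_gray(NATURAL))
```

With a Gray mapping, misreading a cell as an adjacent level corrupts only one of its three bits.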


Preliminary: LDPC Coded Modulation
LDPC coded modulation schemes in [5][6]
  Non-Gray mapping
  Very sparse parity-check matrix
Advantages
  Reduced complexity
  Improved decoding throughput

Degree distribution (d_v = 2, 3, 4, 5-9)
  Gray mapping: 0.04, 0.2, 0.44, 0.32
  Non-Gray [5][6]: 0.9, 0.1

[Figure: conventional matrix and the matrix of [6]; white blocks are all-zero sub-matrices]

Conventionally, Gray mapping is applied to TLC. The authors of [5][6] proposed an LDPC coded modulation scheme based on a non-Gray mapping, and the resulting parity-check matrix is very sparse. In the figure, the white blocks are all-zero sub-matrices, and the matrix of [6] has more of them than the upper (conventional) matrix. This means that the complexity of the LDPC decoder can be decreased and the decoding throughput can be increased significantly.

[5] J.-H. Shy, “LDPC coded modulation and its applications to MLC flash memory,” NTHU Thesis, 2014.
[6] H.-C. Lee, J.-H. Shy, Y.-M. Chen, and Y.-L. Ueng, “LDPC coded modulation for TLC flash memory,” IEEE Information Theory Workshop (ITW), Nov. 2017.

Preliminary: LDPC Coded Modulation
Iterative demodulation and decoding (IDD) can enhance the error-rate performance. [4]
  Complex interface between demodulator and decoder
  Lower throughput

This figure shows an LDPC coded 8-PAM scheme together with the iterative demodulation and decoding (IDD) receiver. On the write side, the data are encoded by the LDPC encoder, the coded bits are permuted by an interleaver, and the 8-PAM modulator maps every three bits to one symbol, which is stored in the Flash. On the read side, the demodulator produces channel LLR values Lc from the symbols read out of the Flash, and these pass through the deinterleaver to the LDPC decoder. The decoder in turn produces extrinsic information Le from the channel LLRs and feeds it back to the demodulator to help the next round. A system in which soft information (log-likelihood ratio, LLR, messages) is passed back and forth between the demodulator and the decoder is called an IDD system; in a non-IDD system the decoder does not feed information back to the demodulator. As a result, the error-rate performance of the IDD receiver is expected to be better than that of the conventional non-IDD receiver. However, the IDD receiver has a high hardware complexity and area cost, and hence it is rarely adopted in practical systems.

[4] F. Schreckenbach, et al., “Optimization of symbol mappings for bit-interleaved coded modulation with iterative decoding,” IEEE Commun. Lett., pp. 593–595, 2003.
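To make the feedback loop concrete, the following Python sketch computes the demodulator's extrinsic bit LLRs for one 8-PAM symbol, using the decoder's Le values as a-priori information. It is a textbook-style illustration, not the receiver of the talk: the Gaussian read-noise model, the amplitude values, the Gray label order, and the sign convention LLR = log(P(bit=0)/P(bit=1)) are all assumptions made here.

```python
import math

# Assumed label order (lowest to highest level) and hypothetical,
# equally spaced 8-PAM amplitudes; neither is specified in the talk.
LABELS = ["111", "110", "100", "101", "001", "011", "010", "000"]
LEVELS = [-7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def demod_extrinsic(y, le, sigma):
    """Extrinsic LLRs for the three bits of one symbol.

    y     : sensed value of the cell
    le    : a-priori LLRs of the three bits, fed back by the decoder
    sigma : standard deviation of the assumed Gaussian read noise
    """
    out = []
    for i in range(3):
        zeros, ones = [], []
        for label, x in zip(LABELS, LEVELS):
            metric = -((y - x) ** 2) / (2.0 * sigma ** 2)
            for k in range(3):
                # Only the priors of the other two bits are used, so the
                # result for bit i is extrinsic with respect to le[i].
                if k != i and label[k] == "1":
                    metric -= le[k]
            (zeros if label[i] == "0" else ones).append(metric)
        out.append(log_sum_exp(zeros) - log_sum_exp(ones))
    return out

# First pass: no feedback yet. Later pass: decoder feedback available.
print(demod_extrinsic(-4.0, [0.0, 0.0, 0.0], sigma=1.0))
print(demod_extrinsic(-4.0, [-6.0, 0.0, 4.0], sigma=1.0))
```

In a non-IDD receiver only the first call (all-zero priors) would ever be made.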

Preliminary: Layer-based IDD Receiver [7]
The IDD receiver proposed in [7]
  Two-codeword schedule
  The Lc and Le memories are doubled

Layered decoding is commonly used for LDPC codes. The IDD receiver in [7] uses an LDPC decoder with a layered schedule: as the figure shows, it decodes row by row through the parity-check matrix and can compute the extrinsic information for the demodulator only after all rows have been processed. This data dependency leaves the demodulator idle until the layered LDPC decoder finishes the row decoding process and, in the same way, leaves the decoder idle while the demodulator works. To improve hardware efficiency, the authors of [7] proposed a two-codeword schedule in which the decoder and the demodulator process two different codewords at the same time. However, this architecture doubles the Lc and Le memory, because information for two codewords must be stored. In this work, we use a shuffle-based architecture to simplify the data dependency. Shuffled decoding proceeds block column by block column, so, as the figure shows, the extrinsic information can be returned to the demodulator as soon as the decoder finishes a single block column, without waiting for the other columns.

[7] M. R. Li, T. Y. Kuan, H. C. Lee and Y. L. Ueng, “An IDD receiver of LDPC coded modulation scheme for flash memory applications,” 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Jeju, 2016, pp. 289-292.


Proposed Shuffle-Based IDD Scheme
Advantages of the shuffle-based scheme
  One-codeword decoding
  Reduction of the memory requirement
Challenges
  Interface design
  Hardware idle

To reduce the large memory requirement of the two-codeword technique in [7], we propose a shuffle-based IDD scheme in which the layer-based LDPC decoder is replaced by a shuffle-based one, so that it is no longer necessary to process two codewords at the same time. A direct replacement, however, raises some challenges. The first is to design an efficient interface between the decoder and the demodulator. The other is to avoid hardware idle time, which would lower the decoding throughput.

Proposed Shuffle-Based IDD Receiver
[Figure: architecture of the proposed receiver, annotated "Challenge in LDPC decoder" and "Challenge in memory interface"]

This figure shows the proposed shuffle-based IDD receiver, in which the LDPC decoder is replaced by a decoder with a shuffled schedule. In the LDPC decoder, the C2V messages of the previous iteration are first recovered from the data stored in the Min register and the V2C-sign memory. The recovered C2V messages then follow two paths: they are added to the channel LLR values to calculate the APP values, and they are also stored in the FIFO register, where they wait to be used in the V2C calculation. After the APP calculation is finished, the V2C calculator computes the V2C messages and converts them to sign-magnitude form. The signs are written to the V2C-sign memory. The magnitudes go to the comparator, which finds the two smallest values and the index of the smallest; a barrel shifter then puts them in the order needed for the next computation before they are stored in the Min register. To exchange messages between the decoder and the demodulator, the Le subtractor unit subtracts the channel LLR from the APP value to obtain the extrinsic LLR values, which are stored in the Le memory. The demodulator uses them to calculate the channel LLRs for the next iteration. Two interface issues limit the receiver throughput. One is inside the decoder, where it causes idle time in the APP calculation. The other is the memory interface between the demodulator and the decoder.
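The C2V recovery step relies on the fact that, for min-sum decoding, a check node only needs to keep the two smallest V2C magnitudes, the index of the smallest, and the V2C signs. The Python sketch below illustrates that compression and recovery for a single check node. The function names and the normalization factor alpha = 0.75 are assumptions for illustration; the talk only states that normalized min-sum (NMS) is used.

```python
def compress_v2c(v2c):
    """Reduce the V2C messages of one check node to (min1, min2, index, signs)."""
    mags = [abs(v) for v in v2c]
    idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx]
    min2 = min(m for j, m in enumerate(mags) if j != idx)
    signs = [1 if v >= 0 else -1 for v in v2c]
    return min1, min2, idx, signs

def recover_c2v(min1, min2, min1_idx, signs, alpha=0.75):
    """Rebuild every C2V message of the check node from the compressed state."""
    total_sign = 1
    for s in signs:
        total_sign *= s
    c2v = []
    for j, s in enumerate(signs):
        # The edge that supplied the minimum receives the second minimum.
        mag = min2 if j == min1_idx else min1
        # total_sign * s is the product of the signs of all other edges.
        c2v.append(alpha * total_sign * s * mag)
    return c2v

v2c = [2.0, -0.5, 3.0, -1.5]          # hypothetical V2C messages
state = compress_v2c(v2c)
print(state)
print(recover_c2v(*state))
```

Storing only this compressed state is what allows the decoder to hold a Min register and a V2C-sign memory instead of every C2V message.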

Challenge: decoder
The degree of the jth block column is larger than that of the (j+1)th block column
  Idle issue
  Decreased throughput
Solution: arrange the block columns in order of increasing degree

In the shuffle-based scheme, the column-degree distribution of the parity-check matrix affects the decoding throughput. The V2C calculation cannot start until the APP values are ready. Now consider the case in which the degree of the jth block column is larger than that of the (j+1)th block column. When the C2V recovery unit and the APP adder are ready to output data to the V2C calculator for the (j+1)th block column, the V2C calculator is still busy processing the data for the jth block column, so they must idle for one clock cycle. If the decoding schedule alternates often between large and small block-column degrees, the idle time grows considerably and the decoding throughput decreases. To avoid this problem, we permute the block columns of the parity-check matrix so that their degrees are in increasing order. In the example schedule shown in the figure, the rearranged schedule needs only 17 clock cycles instead of the previous 18.
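The reordering can be illustrated with a deliberately simplified model in Python: count one idle cycle whenever a block column is followed by one of lower degree, then sort the columns by increasing degree. The degree list below is hypothetical and the one-cycle-per-drop model is only an approximation of the pipeline behavior described in the notes.

```python
def idle_cycles(degrees):
    """Count positions where a block column is followed by one of lower degree."""
    return sum(1 for a, b in zip(degrees, degrees[1:]) if a > b)

def reorder_increasing(degrees):
    """Processing order of block columns sorted by increasing degree (stable)."""
    return sorted(range(len(degrees)), key=lambda j: degrees[j])

degrees = [3, 2, 4, 2, 3, 2]               # hypothetical block-column degrees
order = reorder_increasing(degrees)
sorted_degrees = [degrees[j] for j in order]

print(idle_cycles(degrees))                # idle events before reordering
print(order)                               # new processing order
print(idle_cycles(sorted_degrees))         # idle events after reordering
```

Because the sort is stable, columns of equal degree keep their original relative order, which keeps the permutation easy to describe.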

Challenge: memory interface
Hardware idle: random interleaver
  Decreased throughput
Solution: a hardware-friendly structured interleaver

Having resolved the decoder idle issue, we now focus on the interface between the decoder and the demodulator. If a random interleaver is used, the bits corresponding to a single 8-PAM symbol are likely to belong to non-consecutive block columns. The channel LLR values that the demodulator has just computed are then not necessarily at the positions the decoder currently needs, and the extrinsic values that the decoder finishes first do not necessarily make up all the symbols the demodulator needs for its current computation. Both units must wait a long time for the information they need, which causes a large amount of hardware idle time and low decoding throughput, and also makes the interface between the decoder and the demodulator hard to design. To optimize this interface, we propose a hardware-friendly structured interleaver in which the bit positions of the symbols that the demodulator processes together (one circulant size of symbols) are concentrated in three adjacent block columns. With this method, the decoder only needs to compute and send the Le values for three consecutive block columns, rather than for all block columns, before the demodulator can proceed. As the figure shows, the shuffle-based IDD scheme can then realize one-codeword processing with minimal hardware idle time.
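One way to obtain the property described above is sketched below in Python. The layout is hypothetical and is not claimed to be the interleaver of the talk: block columns are grouped in triples, and the kth symbol of a group takes bit k from each of the three columns in that group. G (number of block columns) and Z (circulant size) are parameters of the sketch.

```python
def structured_interleaver(G, Z):
    """Map each symbol index to the three codeword bit positions it carries.

    Hypothetical layout: block columns are grouped as (3g, 3g+1, 3g+2) and
    symbol g*Z + k uses bit k of each block column in group g.
    """
    if G % 3 != 0:
        raise ValueError("this sketch assumes G is a multiple of 3")
    mapping = []
    for g in range(G // 3):
        for k in range(Z):
            mapping.append(tuple((3 * g + c) * Z + k for c in range(3)))
    return mapping

mapping = structured_interleaver(G=6, Z=4)   # small illustrative sizes
for symbol, bits in enumerate(mapping):
    columns = [p // 4 for p in bits]         # block column of each bit
    print(symbol, bits, columns)
```

With such a layout, all Z symbols of a group become available to the demodulator as soon as the decoder has finished the three block columns of that group.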

Proposed Optimized Memory Interface
Updating the demodulator LLRs L_{c,j~j+2} requires only the extrinsic messages L_{e,j~j+2}.
[Figure: conventional interface and proposed interface with a Z×3 FIFO buffer; annotation: saves 50% of the memory requirement]

The demodulator can update its LLR values as soon as the decoder provides the extrinsic LLR values for three consecutive block columns. The Le values for block columns 0 to j-1 and for block columns j+3 to G-1 do not need to be buffered. The memory bank that originally stored the extrinsic information can therefore be replaced by a buffer of three circulant sizes, which significantly reduces the memory needed to store the channel LLR values and the extrinsic information.
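A rough count shows where a saving of about one half could come from. The Python sketch below compares a full Le memory with a three-block-column buffer, under several assumptions that are not stated in the talk: the circulant size Z = 256 is hypothetical, Lc and Le entries are taken to have the same word width, and the Lc memory is left unchanged. The code length 18432 is the one listed in the comparison table.

```python
def interface_memory(n_bits, z, columns_buffered=3):
    """Entry counts for the Le storage and the fraction of Lc+Le entries saved.

    n_bits           : codeword length in bits
    z                : circulant (block-column) size, hypothetical here
    columns_buffered : block columns held by the proposed FIFO buffer
    """
    if n_bits % z != 0:
        raise ValueError("codeword length must be a multiple of z")
    g = n_bits // z                          # number of block columns G
    conventional_le = g * z                  # one Le entry per code bit
    proposed_le = columns_buffered * z       # Z x 3 FIFO buffer
    lc = n_bits                              # Lc memory, assumed unchanged
    saving = (conventional_le - proposed_le) / (lc + conventional_le)
    return g, conventional_le, proposed_le, saving

g, conv, prop, saving = interface_memory(n_bits=18432, z=256)
print(g, conv, prop, round(100 * saving, 1))
```

Under these assumptions the Le storage all but disappears, so the combined Lc and Le entry count drops by a little under half.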


BER Results
[Figure: BER curves, annotated with gaps of 0.08 dB and 0.05 dB]

This figure shows the BER performance when a 2-KByte code is used. The shuffle-based IDD receiver on its own (diamond markers) has about the same error rate as the layer-based IDD receiver (asterisk markers). With the proposed hardware-friendly structured interleaver, the decoding performance also improves, by almost 0.1 dB, so that it is better than the conventional non-IDD system and about 0.05 dB better than the layer-based IDD scheme.

Hardware Complexity
[Figure: Lc and Le memory usage of the three architectures]

This slide shows the reduction in Lc and Le memory usage at the interface between the demodulator and the decoder, with the two-codeword layer-based IDD receiver (leftmost) as the baseline. Switching to the one-codeword shuffle-based IDD receiver (middle) gives a 19.7% reduction in gate count. Going from two codewords to one appears to halve the storage, but only the memory depth changes, so the reduction is modest. The rightmost result is the optimized architecture: after the memory that stores the extrinsic information is replaced by a buffer of three circulant sizes, a 53.5% reduction in gate count relative to the baseline is achieved.

Comparison Results

Code: 8PSK + (18432, 16704)
Technology: 90nm
Algorithm: NMS
Max. iteration number: 15
Quantization (bits): 5, 6
Clock frequency (MHz): 166, 190

                                  Gray-based non-IDD   Layer-based IDD [7]   Shuffle-based IDD
Throughput (Mbps)                 679.9                1100                  1555
Gate count (K)                    1297                 1891                  1888
Area (mm^2)                       3.66                 5.33                  5.32
Hardware efficiency (Mbps/mm^2)   185.76               206.37                292.19

41.2% improvement

This is the comparison table of the hardware results. It shows that, compared with the non-IDD and layer-based IDD receivers, the architecture proposed in this work achieves a better hardware efficiency.
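The hardware-efficiency row is throughput divided by area, which can be recomputed from the other rows. The short Python check below does this with the throughput and area values listed in the table; the recomputed figures agree with the listed ones to within about 0.1 Mbps/mm^2, presumably because the listed areas are rounded.

```python
# (throughput in Mbps, area in mm^2, listed efficiency in Mbps/mm^2)
designs = {
    "Gray-based non-IDD": (679.9, 3.66, 185.76),
    "Layer-based IDD [7]": (1100.0, 5.33, 206.37),
    "Shuffle-based IDD": (1555.0, 5.32, 292.19),
}

def hardware_efficiency(throughput_mbps, area_mm2):
    """Throughput per unit silicon area."""
    return throughput_mbps / area_mm2

for name, (throughput, area, listed) in designs.items():
    recomputed = hardware_efficiency(throughput, area)
    print(f"{name}: recomputed {recomputed:.2f}, listed {listed:.2f}")

# Relative gain of the shuffle-based design over the layer-based one,
# using the listed efficiency values.
gain = designs["Shuffle-based IDD"][2] / designs["Layer-based IDD [7]"][2] - 1.0
print(f"gain over layer-based IDD: {100 * gain:.1f}%")
```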

Conclusion
Proposed an efficient shuffle-based IDD receiver for TLC applications
Hardware-friendly structured interleaver
  Labelling bits corresponding to a single 8-PAM symbol are distributed over three consecutive block columns
  Improves decoding throughput
  Decreases the design complexity of the memory interface
Optimized memory interface
  Uses a small buffer
  Reduces area cost

In this talk, we have presented a hardware-efficient shuffle-based IDD receiver. The layer-based IDD receiver uses a two-codeword schedule to improve hardware utilization; by replacing the layer-based LDPC decoder with a shuffle-based one, the proposed receiver does not need to double the memory size to store information for two codewords. Secondly, we have presented a hardware-friendly structured interleaver that makes the decoding process smoother, enhances the decoding throughput, and reduces the complexity of the hardware design. Finally, we have further optimized the memory interface between the decoder and the demodulator: using a buffer in place of the memory bank greatly reduces the memory requirement and the hardware area. Compared with the layer-based IDD receiver, the hardware efficiency is about 40% higher. Based on the simulation results, we think that the shuffle-based IDD receiver has great potential for use in next-generation storage and communication systems.