

# Latency analysis and memory requirements of coded 16-PAM scheme for GEPOF

Rubén Pérez-Aranda (rubenpda@kdpof.com), David Ortiz, Dunia Prieto

IEEE 802.3bv Task Force - January 2015

# Introduction



- This presentation analyzes the latency and memory requirements of the coded modulation scheme defined in [1]
- The encoder and the decoder are analyzed in separate sections, and overall conclusions are summarized at the end
- The reference clock considered throughout runs at a frequency equal to the PAM symbol rate (baud rate)
- Therefore, the reference clock frequency is $F_S = 325 \text{ MHz}$
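As a worked example of how the cycle counts quoted later map to time, the relation below converts a latency of $N$ reference-clock cycles into seconds. The symbols $T_S$, $N$ and $t_{lat}$ are notation introduced here for illustration and do not appear in the original slides.

$$
T_S = \frac{1}{F_S} = \frac{1}{325 \text{ MHz}} \approx 3.08 \text{ ns}, \qquad t_{lat} = N \cdot T_S = \frac{N}{F_S}
$$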



#### Coded 16-PAM encoder latency

# Coded 16-PAM - encoder latency



#### Encoder scheme:

- FIFO<sub>1</sub> and FIFO<sub>2</sub> are required to provide a constant data flow at both the input and the output of the encoder; these FIFOs are needed for rate matching
- The required size of these FIFOs determines the latency of the MLCC encoder, because the remaining blocks can be considered to add no latency
- The MLCC multiplexing pattern has been chosen to minimize the memory requirements of the FIFOs as well as the latency of both the encoder and the decoder



### Coded 16-PAM - encoder latency, FIFO<sub>1</sub>





### Coded 16-PAM - encoder latency, FIFO<sub>2</sub>





# Coded 16-PAM - encoder latency



- Encoder latency is very low: 90 cycles (= 0.27 us for $F_S = 325$ MHz)
- Memory requirements:
  - Size of FIFO<sub>1</sub> = 164 bits
  - Size of FIFO<sub>2</sub> = 135 bits
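The figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the helper name `cycles_to_us` and the constant names are hypothetical, while the numeric values are taken from the slide.

```python
F_S_HZ = 325e6  # reference clock = PAM symbol rate (from the introduction)


def cycles_to_us(cycles, f_hz=F_S_HZ):
    """Convert a latency in reference-clock cycles to microseconds."""
    return cycles / f_hz * 1e6


# Values quoted on the slide
ENCODER_LATENCY_CYCLES = 90
ENCODER_FIFO_BITS = {"FIFO1": 164, "FIFO2": 135}

encoder_latency_us = cycles_to_us(ENCODER_LATENCY_CYCLES)
encoder_memory_bits = sum(ENCODER_FIFO_BITS.values())

print(f"encoder latency: {encoder_latency_us:.3f} us")
print(f"encoder FIFO memory: {encoder_memory_bits} bits")
```

The latency evaluates to roughly 0.277 us, which the slide quotes as 0.27 us, and the two FIFOs together hold 299 bits, i.e. about 300 bits.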



### Coded 16-PAM decoder latency

### Coded 16-PAM - decoder latency



# Coded 16-PAM - decoder latency - BCH dec



- The output of the BCH decoder is assumed to be fed back to the second stage as soon as the error correction (i.e. Berlekamp + Chien) has been carried out, without waiting for the decoding failure check.
  - This strategy minimizes the FIFO<sub>1,2</sub> size without impacting the BER of either the 1<sup>st</sup> or the 2<sup>nd</sup> level
  - Chien's search calculates the roots of the Error Locator Polynomial (ELP) in parallel with the output, flipping the bits whose locations evaluate the ELP to zero
  - After all the ELP roots have been calculated, the BCH decoder compares the number of roots found during Chien's search against the ELP degree. If they differ, decoding failure is asserted, indicating that the error correction capability of the decoder has been exceeded
  - An 8-parallel Chien's search architecture is used to speed up this process and reduce the latency
- The latencies of every sub-block composing the BCH decoder have been verified as technically feasible in real IC implementations
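The root search and the failure check described above can be illustrated with a small software model. This is only a sketch under stated assumptions: it uses a toy field GF(2<sup>4</sup>) with primitive polynomial x<sup>4</sup> + x + 1 rather than the BCH code parameters of [1], it evaluates one position at a time instead of 8 in parallel, and every name in it (`elp_from_positions`, `chien_search`, etc.) is hypothetical.

```python
# Toy field GF(2^4); the real code in [1] uses a different, larger field (assumption).
M = 4
N = (1 << M) - 1          # 15 nonzero field elements = toy code length
PRIM_POLY = 0b10011       # x^4 + x + 1

# Exponent / logarithm tables for multiplication
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
val = 1
for i in range(N):
    EXP[i] = val
    LOG[val] = i
    val <<= 1
    if val & (1 << M):
        val ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two elements of GF(2^4)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def elp_from_positions(positions):
    """Build sigma(x) = prod(1 + alpha^p * x); coefficients lowest degree first."""
    sigma = [1]
    for p in positions:
        xp = EXP[p % N]
        nxt = sigma + [0]
        for k, c in enumerate(sigma):
            nxt[k + 1] ^= gf_mul(c, xp)
        sigma = nxt
    return sigma


def poly_eval(poly, x):
    """Evaluate a polynomial over GF(2^4) with Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = gf_mul(acc, x) ^ c
    return acc


def chien_search(sigma, n=N):
    """Return (bit positions to flip, decoding-failure flag)."""
    # Position j is in error when sigma(alpha^-j) == 0
    roots = [j for j in range(n) if poly_eval(sigma, EXP[(N - j) % N]) == 0]
    degree = max(k for k, c in enumerate(sigma) if c)
    failure = len(roots) != degree   # fewer roots than the degree: uncorrectable
    return roots, failure


print(chien_search(elp_from_positions([2, 7])))  # two correctable errors
print(chien_search([1, 0, 1]))                   # 1 + x^2: repeated root, failure
```

Note that the positions to flip are known as the search proceeds, whereas the failure flag is only valid once every position has been evaluated. This is why the corrected bits can be forwarded before the flag is available.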

# Coded 16-PAM - decoder latency - FIFOs



- FIFO<sub>1,2</sub> sits between the 2 decoding stages and stores the symbols received from the channel while they wait for the first-level decoding
  - FIFO<sub>1,2</sub> size (in number of symbols) is determined by the delay of the BCH decoder (in number of cycles) for error correction
- After ramp-up, FIFO<sub>1,2</sub> operates at a constant filling level, with the input rate equal to the output rate: 0.5 2D symbols / cycle in steady state
- FIFO<sub>1</sub> and FIFO<sub>2</sub> are in charge of rate matching at both the input and the output of the multi-stage decoder
- The processing delay required by the BCH decoding failure detection is absorbed by FIFO<sub>1</sub>.
- Therefore, extra storage is implemented in FIFO<sub>1</sub> to synchronize the decoded first-level information data with the end of the BCH decoder's error detection processing, which validates the failure flag
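The dependence of the FIFO<sub>1,2</sub> depth on the BCH decoder delay can be sketched as occupancy = rate × delay for a FIFO whose input and output rates are equal. In the snippet below the delay value is purely illustrative and not taken from the source, the function name is hypothetical, and the factor of two between 2D symbols and PAM symbols is inferred from the stated rate of 0.5 2D symbols per baud-rate cycle.

```python
def fifo_fill_level(rate_per_cycle, delay_cycles):
    """Steady-state occupancy of a FIFO with equal input and output rates
    whose output lags its input by delay_cycles."""
    return rate_per_cycle * delay_cycles


RATE_2D_PER_CYCLE = 0.5     # from the slide
BCH_DELAY_CYCLES = 1000     # hypothetical example value, NOT from the source

fill_2d = fifo_fill_level(RATE_2D_PER_CYCLE, BCH_DELAY_CYCLES)
# Inferred: 0.5 2D symbols per PAM-rate cycle means two PAM symbols per 2D symbol
fill_pam = 2 * fill_2d

print(f"{fill_2d:.0f} 2D symbols = {fill_pam:.0f} PAM symbols held in the FIFO")
```

Under this simplified model, every additional cycle of BCH correction delay adds one PAM symbol of storage, which is why reducing that delay directly shrinks the FIFO.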

#### Coded 16-PAM - decoder latency - FIFO<sub>1</sub>





#### Coded 16-PAM - decoder latency - FIFO<sub>2</sub>





# Coded 16-PAM - decoder latency



- Decoder latency: 1440 cycles (= 4.43 us for  $F_S = 325$  MHz)
- Memory requirements:
  - Size of FIFO<sub>1</sub> = 599 bits
  - Size of FIFO<sub>2</sub> = 493 bits
  - Size of FIFO<sub>1,2</sub> = 1193 symbols
    - Let's consider, for example, 8 bits per symbol, taking into account that the constellation expansion produced by THP has to be accommodated
  - Size of FIFO<sub>1,2</sub> = 9544 bits
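Because the 8 bits per symbol is given only as an example, it can help to see how the decoder FIFO memory scales with the symbol word width. The sketch below uses the FIFO sizes from the slide; the function name and the alternative word widths are hypothetical and serve only to illustrate the sensitivity.

```python
def decoder_fifo_bits(bits_per_symbol, fifo12_symbols=1193,
                      fifo1_bits=599, fifo2_bits=493):
    """Return (FIFO_1,2 size in bits, total decoder FIFO memory in bits)."""
    fifo12_bits = fifo12_symbols * bits_per_symbol
    return fifo12_bits, fifo1_bits + fifo2_bits + fifo12_bits


# Word width used as the example on the slide
fifo12_bits, total_bits = decoder_fifo_bits(8)
print(f"FIFO_1,2: {fifo12_bits} bits, total: {total_bits} bits")

# Hypothetical alternative word widths, for comparison only
for width in (6, 7, 9, 10):
    f12, total = decoder_fifo_bits(width)
    print(f"{width} bits/symbol -> FIFO_1,2 {f12} bits, total {total} bits")
```

With 8 bits per symbol, FIFO<sub>1,2</sub> accounts for about 90% of the decoder FIFO memory.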

### Conclusions



- Latency of coded 16-PAM encoder: 0.27 us
- Latency of coded 16-PAM decoder: 4.43 us
- Total latency of coded 16-PAM: 4.71 us
- Latency of the rest of the PHY (transmission structure, equalizers, etc.) is calculated to be < 1.3 us
- Then, PHY GMII-to-GMII latency: < 6 us
- Memory requirements of 16-PAM encoder: ~300 bits
- Memory requirements of 16-PAM decoder: ~10.6 kbits
  - only FIFOs are considered
  - internal memories of the BCH decoder are not considered, although their size is estimated to be one order of magnitude smaller
  - FIFO<sub>1,2</sub> is the most memory-demanding block of the decoder
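The total coding latency follows from the cycle counts quoted in the encoder and decoder sections. The symbol $t_{coding}$ is notation introduced here for illustration.

$$
t_{coding} = \frac{90 + 1440}{F_S} = \frac{1530}{325 \text{ MHz}} \approx 4.71 \text{ us}
$$

Adding the two individually rounded figures (0.27 us + 4.43 us) would give 4.70 us; the 4.71 us value comes from summing the cycle counts before converting.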

#### References



- [1] Rubén Pérez-Aranda et al., "High spectrally efficient coded 16-PAM scheme for GEPOF based on MLCC and BCH", 802.3bv TF, Interim Meeting, Jan 2015



#### Questions?
