# 200G/lane PAM4: Error Profile, Error Propagation, and Error Correction Considerations

Upen Reddy, Kareti

David, Nozadze

Cisco Systems Inc.

# Overview

200G/lane evaluations presented to the task force so far require a DFE tap1 coefficient greater than 0.7, even for medium-loss channels

### With high DFE tap1, investigate

1. Skip-level errors: errors caused by noise moving the signal past the adjacent level to the next threshold level
   - a) Do skip-level (two-bit) errors occur even with Gray-coded PAM4 modulation?
   - b) If yes, is there a significant difference in the error profile before and after RS-FEC (544,514,10)?
2. Impact on error profile and error propagation when multiple high-DFE-tap1 channels are concatenated
   - a) Is there any error multiplication from stage to stage, beyond the errors each stage introduces and their propagation through that stage?
   - b) Overall impact of multi-stage channel concatenation on channel performance
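To make the two-bit nature of skip-level errors concrete, the following minimal sketch counts the bit errors produced by each PAM4 level transition. The mapping `GRAY` (levels 0 to 3 mapped to 00, 01, 11, 10) is an assumed Gray mapping used for illustration, and the helper names are hypothetical.

```python
# Assumed Gray mapping of PAM4 levels to bit pairs (illustrative only).
GRAY = {0: 0b00, 1: 0b01, 2: 0b11, 3: 0b10}


def bit_errors(tx_level, rx_level):
    """Number of bit errors when tx_level is detected as rx_level."""
    return bin(GRAY[tx_level] ^ GRAY[rx_level]).count("1")


def classify(tx_level, rx_level):
    """Label an error by how many levels the detected symbol moved."""
    jump = abs(tx_level - rx_level)
    labels = {0: "no error", 1: "adjacent-level", 2: "skip-level"}
    return labels.get(jump, "three-level")


if __name__ == "__main__":
    for tx in range(4):
        for rx in range(4):
            if tx != rx:
                print(tx, rx, classify(tx, rx), bit_errors(tx, rx))
```

With this mapping, an adjacent-level error flips one bit, while an error that skips one level (0 to 2, or 1 to 3, in either direction) flips both bits.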

# Overview contd..

Task force presentations showed that MLSD performs better than DFE-based applications.

3. Error profiles and error propagation with MLSD
   - a) Performance of MLSD with 1-tap memory ($1 + \alpha D$) compared to DFE for the same tap1 value
   - b) MLSD error profiles for different $\alpha$ values at the same DER
4. Multipart link conditions that are suitable for different types of FEC strategies

# Definitions and Assumptions

- Analytical evaluations with "a" specified use the error propagation probability (a.k.a. "a") as per the figure, based on the target DER
  - Evaluations using "a" assume there are no skip-level errors, so that a <= 0.75 is satisfied
- Monte Carlo with "a": "a" is the probability that the next symbol is in error when the previous symbol is in error
- Monte Carlo with DFE: uses the DFE feedback loop to determine PAM4 levels when determining error propagation
- Monte Carlo with MLSD: uses $\alpha$ to find the appropriate levels and the confidence values of those levels, based on a trace-back length of 5 symbols
- Bit-by-bit evaluations: the bit stream is passed through a custom TX FFE, RX AFE, and the specified RX equalizer (DFE or MLSD)
  - Bit-by-bit and Monte Carlo evaluations with DFE or MLSD add appropriate Gaussian noise to create the target stress conditions

IEEE P802.3dj 200Gb/s, 400Gb/s, 800Gb/s, and 1.6Tb/s Interim Meeting, Jan 2023




# Skip-level errors

- Consider a sample channel that is stressful enough for evaluations, where the DFE tap1 coefficient is close to 1 or more
- COM evaluation of this channel shows a DFE tap1 value of ~1





# Skip-level errors

- When DFE tap1 is less than or equal to 1, the probability of skip-level (two-bit) errors is less than 1e-7 and therefore has no significant impact on error propagation



| DER     | Bit-by-bit: DFE tap1 | Bit-by-bit: # of two-bit errors | Bit-by-bit: prob. of two-bit errors | Bit-by-bit: CER | Monte Carlo: DFE tap1 | Monte Carlo: # of two-bit errors | Monte Carlo: prob. of two-bit errors | Monte Carlo: CER | Analytical: a | Analytical: CER |
|---------|----------------------|---------------------------------|-------------------------------------|-----------------|-----------------------|----------------------------------|--------------------------------------|------------------|---------------|-----------------|
| 4.0E-04 | 1.15                 | 2840                            | 2.84E-06                            | 6E-8*           | 1.15                  | 5274                             | 1.94E-06                             | 5E-8*            | 0.75          | 3E-08           |
| 4.0E-04 | 1.05                 | 119                             | 1.19E-07                            | 5E-8*           | 1.05                  | 324                              | 1.19E-07                             | 5E-8*            | 0.75          | 3E-08           |
| 4.0E-04 | 0.96                 | 10                              | 1.00E-08                            | 5E-8*           | 0.96                  | 22                               | 8.09E-09                             | 4E-8*            | 0.75          | 3E-08           |
| 4.0E-04 | 0.86                 | N/A                             | N/A                                 | 5E-8*           | 0.86                  | N/A                              | N/A                                  | 4E-8*            | 0.75          | 3E-08           |
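As a quick consistency check on the table, the probability of two-bit errors can be recomputed as the two-bit error count divided by the number of simulated PAM symbols (1.00E+09 for the bit-by-bit simulation and 2.72E+09 for Monte Carlo, as stated on the slide). This is only arithmetic on the reported values; the function and constant names are hypothetical.

```python
BIT_BY_BIT_SYMBOLS = 1.00e9   # PAM symbols in the bit-by-bit simulation (from the slide)
MONTE_CARLO_SYMBOLS = 2.72e9  # PAM symbols in the Monte Carlo run (from the slide)

# (DFE tap1, bit-by-bit two-bit error count, Monte Carlo two-bit error count)
ROWS = [
    (1.15, 2840, 5274),
    (1.05, 119, 324),
    (0.96, 10, 22),
]


def two_bit_error_prob(count, n_symbols):
    """Observed probability of a two-bit (skip-level) error per PAM symbol."""
    return count / n_symbols


if __name__ == "__main__":
    for tap1, bbb_count, mc_count in ROWS:
        bbb = two_bit_error_prob(bbb_count, BIT_BY_BIT_SYMBOLS)
        mc = two_bit_error_prob(mc_count, MONTE_CARLO_SYMBOLS)
        print(f"tap1={tap1}: bit-by-bit {bbb:.2E}, Monte Carlo {mc:.2E}")
```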

\* extrapolated

Number of PAM symbols simulated: bit-by-bit simulation 1.00E+09, Monte Carlo 2.72E+09

### Skip-level errors

- No significant deviation from the analytical evaluations, in which skip-level errors do not exist






### Concatenated sub-links

- Data is received and re-transmitted without error correction
- Two Monte Carlo models are simulated:
  - 1. At each RX, DER0=1e-4, a=0.75
  - 2. At each RX, DER0=1e-4, DFE tap1=1
- 5.44e9 symbols simulated
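The first model above (DER0 and "a" at each RX) can be sketched as a two-state Monte Carlo. This is a minimal illustration, not the authors' simulator: it assumes that a symbol following a correct symbol is in error with probability `der0`, and a symbol following an errored symbol is in error with probability `a`. The function names and the seed parameter are hypothetical.

```python
import random


def simulate_errors(n_symbols, der0, a, seed=0):
    """Return a list of booleans, True where a symbol is in error.

    Assumed two-state model: after a correct symbol the error probability
    is der0; after an errored symbol it is a (error propagation).
    """
    rng = random.Random(seed)
    flags, prev_err = [], False
    for _ in range(n_symbols):
        prev_err = rng.random() < (a if prev_err else der0)
        flags.append(prev_err)
    return flags


def burst_lengths(flags):
    """Lengths of runs of consecutive errored symbols."""
    bursts, run = [], 0
    for err in flags:
        if err:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts


if __name__ == "__main__":
    flags = simulate_errors(1_000_000, der0=1e-4, a=0.75)
    bursts = burst_lengths(flags)
    print("DER:", sum(flags) / len(flags))
    print("bursts:", len(bursts),
          "mean length:", sum(bursts) / max(len(bursts), 1))
```

In this assumed model the burst length is geometric, so the mean burst length tends toward 1/(1 - a), i.e. 4 symbols for a = 0.75.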



## Concatenated channels

- DER0 at each RX is close to the sum of the DER0s of the RXs before it, i.e., DER0s add up
- The error propagation factor and the burst error probability do not change
- No discernible error multiplication through each stage (sub-link)
- However, segmented FEC would be an appropriate solution where each sub-link is stressed to the limit of the FEC capability
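The additive behavior of DER0 across sub-links can be illustrated with a short calculation. The sketch below assumes that initial error events in different stages are independent and that a symbol already in error is counted only once; under those assumptions the accumulated value is 1 minus the product of the per-stage (1 - DER0) terms, which is close to the plain sum for small DER0. The function name is hypothetical.

```python
def accumulated_der0(per_stage_der0):
    """Cumulative probability that a symbol was hit by an initial error
    in any stage up to and including each stage (independence assumed)."""
    p_clean = 1.0
    totals = []
    for p in per_stage_der0:
        p_clean *= 1.0 - p
        totals.append(1.0 - p_clean)
    return totals


if __name__ == "__main__":
    for stage, der0 in enumerate(accumulated_der0([1e-4, 1e-4, 1e-4]), start=1):
        print(f"after stage {stage}: DER0 = {der0:.4e}")
```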







### MLSD vs. DFE

- For the same α (tap1) value and with the same stressed noise conditions applied to both, MLSD performs better than DFE, even with error propagation
- The benefit of MLSD over DFE depends on the value of $\alpha$ and on the DER
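The comparison above can be reproduced in miniature with a toy model: PAM4 symbols sent through a $1 + \alpha D$ channel with additive Gaussian noise, detected once by a 1-tap DFE and once by a 4-state Viterbi (MLSD) detector. This is a sketch under stated assumptions, not the simulator behind the slides: levels are taken as -3, -1, 1, 3, the symbol before the first one is assumed known, and the Viterbi detector traces back over the full sequence rather than the 5-symbol trace-back used in the presentation. All function names are hypothetical.

```python
import random

LEVELS = (-3, -1, 1, 3)  # assumed PAM4 levels


def slice_pam4(y):
    """Nearest PAM4 level to y."""
    return min(LEVELS, key=lambda v: abs(y - v))


def channel(symbols, alpha, sigma, rng, start=LEVELS[0]):
    """r_k = x_k + alpha * x_{k-1} + Gaussian noise (known start symbol)."""
    out, prev = [], start
    for x in symbols:
        out.append(x + alpha * prev + rng.gauss(0.0, sigma))
        prev = x
    return out


def dfe_detect(rx, alpha, start=LEVELS[0]):
    """1-tap DFE: subtract alpha times the previous decision, then slice."""
    decisions, prev_hat = [], start
    for r in rx:
        prev_hat = slice_pam4(r - alpha * prev_hat)
        decisions.append(prev_hat)
    return decisions


def mlsd_detect(rx, alpha, start=LEVELS[0]):
    """Viterbi detection; state = previous symbol, full-length trace-back."""
    inf = float("inf")
    metric = {s: (0.0 if s == start else inf) for s in LEVELS}
    history = []
    for r in rx:
        new_metric, back = {}, {}
        for cur in LEVELS:
            best_prev, best = start, inf
            for prev in LEVELS:
                m = metric[prev] + (r - (cur + alpha * prev)) ** 2
                if m < best:
                    best_prev, best = prev, m
            new_metric[cur], back[cur] = best, best_prev
        metric = new_metric
        history.append(back)
    state = min(metric, key=metric.get)
    decisions = []
    for back in reversed(history):
        decisions.append(state)
        state = back[state]
    decisions.reverse()
    return decisions


def count_errors(detected, sent):
    return sum(d != s for d, s in zip(detected, sent))


if __name__ == "__main__":
    rng = random.Random(0)
    tx = [rng.choice(LEVELS) for _ in range(20000)]
    for alpha in (0.7, 1.0):
        rx = channel(tx, alpha, 0.4, rng)
        print("alpha", alpha,
              "DFE errors", count_errors(dfe_detect(rx, alpha), tx),
              "MLSD errors", count_errors(mlsd_detect(rx, alpha), tx))
```

The noise level (0.4) and sequence length are arbitrary choices for the sketch and do not correspond to the stress conditions used in the evaluations.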







- When α = 1 or DFE tap1 = 1, error propagation due to DFE and MLSE is the same




# MLSD: for varying $\boldsymbol{\alpha}$

DER = 4.5e-4

[Figure: number of symbol errors in RS FEC, DER = 4.5e-4]

- For varying $\alpha$ (tap1) values and the same target DER, error propagation is worst when $\alpha = 1$
- For $\alpha = 1$, error propagation calculated analytically with a = 0.75 and with MLSE is the same
- When $\alpha = 1$ or DFE tap1 = 1, error propagation due to DFE and MLSE is the same
- When $\alpha \neq 1$, error propagation in MLSE is less than in DFE

# Error Correction Considerations

- Data presented so far includes
  - Gray-coded PAM4 modulation
  - RS(544,514,10) symbol error profiles to determine post-FEC CER/FLR
- But it does not include the impacts of
  - 1/(1+D) modulo-4 precoding (end-to-end or per segment)
  - Bit mux or symbol mux in the PMA
  - Codeword interleaving (2, 4, etc.)
- The worst-case link segment is very close to the analytical evaluations with a = 0.75.
  - See the blue line in "100+ Gb/s Ethernet FEC analysis", Cathy Liu
  - Without additional error mitigation strategies, a DER of 1e-4 does not meet Ethernet FLR requirements.
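For reference, the random-error baseline for the post-FEC codeword error ratio can be computed with a binomial tail. The sketch below is not the analytical method with "a" used in the slides: it assumes independent (non-bursty) PAM4 symbol errors, that each 10-bit FEC symbol is carried by 5 consecutive PAM4 symbols on one lane (no bit or symbol muxing), and that the decoder corrects up to t = 15 symbol errors in a 544-symbol codeword. Burst errors from error propagation would give a worse CER than this baseline. The function names are hypothetical.

```python
from math import comb


def fec_symbol_error_prob(der, pam4_per_fec_symbol=5):
    """P(a FEC symbol contains at least one errored PAM4 symbol),
    assuming independent PAM4 symbol errors."""
    return 1.0 - (1.0 - der) ** pam4_per_fec_symbol


def codeword_error_ratio(p_sym, n=544, t=15):
    """P(more than t of n FEC symbols are in error), assuming
    independent FEC symbol errors (binomial upper tail)."""
    return sum(
        comb(n, i) * p_sym**i * (1.0 - p_sym) ** (n - i)
        for i in range(t + 1, n + 1)
    )


if __name__ == "__main__":
    for der in (1e-4, 4e-4, 1e-3):
        p_sym = fec_symbol_error_prob(der)
        print(f"DER={der:.1e}  p_sym={p_sym:.3e}  "
              f"CER={codeword_error_ratio(p_sym):.3e}")
```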



Figure 2. FLR vs. SerDes detector DER0 with and without RS (544, 514, 15) FEC for random and burst error cases




Figure 3. FLR vs. SerDes detector SNR with and without RS (544, 514, 15) FEC for random and burst error cases

Table values (row and column labels not recoverable from the slide): 20.6, 18.52, 19.48, 18.41, 21.13, 18.69, 19.73, 18.58

## Error Correction Considerations cont.

- For a workable solution, consider the impacts of
  - 1/(1+D) modulo-4 precoding (end-to-end or per segment)
  - Bit mux or symbol mux in the PMA
  - Codeword interleaving (2, 4, etc.)

For example, see healey 100GEL 01 0318

- ran 3df 01a 2211 provided the impact of some of the bit mux and symbol mux options
- Determine which FEC and error mitigation approaches are needed on a per-link-segment basis or a full-link (end-to-end) basis, with a detailed error analysis after each segment

#### SNR required for target frame loss ratio

Evaluate performance of defined error correction schemes with 4:1 bit mux.



# Error Correction Considerations cont.

As noted in <u>kareti 3df 01a 2207</u>, an FEC stronger than KP4 FEC, i.e., RS-FEC(576,514,10) instead of RS-FEC(544,514,10), would not provide enough coding gain to offset the increase in channel loss caused by the higher data rate needed for the additional overhead
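The overhead side of this trade-off follows directly from the code parameters. The sketch below only does that arithmetic (correction capability t = (n - k) / 2, parity overhead, and line-rate increase relative to the 544-symbol code); it does not model coding gain or channel loss, which are the subject of the cited contribution. The function name is hypothetical.

```python
def rs_parameters(n, k, reference_n=544):
    """Basic parameters of an RS(n, k) code relative to a reference length."""
    return {
        "t": (n - k) // 2,                # correctable symbols per codeword
        "overhead": (n - k) / k,          # parity symbols per data symbol
        "rate_increase": n / reference_n - 1.0,  # extra line rate vs. reference
    }


if __name__ == "__main__":
    for n in (544, 576):
        p = rs_parameters(n, 514)
        print(f"RS({n},514): t={p['t']}, overhead={p['overhead']:.2%}, "
              f"line rate vs RS(544,514): +{p['rate_increase']:.2%}")
```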

End-to-end FEC:

- Need to make sure the capability of KP4 FEC is enough for the full link, where each link segment has a DER0 of $\sim$1e-4
  - MLSD-based equalization in each link segment seems suitable for this FEC strategy

Segmented FEC:

- When a worst-case link segment has a DER0 of ~1e-4, KP4 FEC would be suitable on a per-segment basis with appropriate error mitigation techniques such as precoding and codeword interleaving
- DFE-based equalization in each link segment may be sufficient for this approach

Concatenated FEC (an Inner FEC is combined with the Outer FEC in one or more segments of the link):

- The Inner FEC either completely corrects the errors introduced by its segment of the link, or the errors left over after the Inner FEC are within the capability of the Outer FEC
- The Inner code cannot correct the errors introduced by neighboring sub-links
- The coding gain of the Inner FEC is not the primary factor; what matters is that the profile of the errors remaining after the Inner FEC is within the capability of the Outer FEC





## Error Correction Considerations cont.

- Any system will have varying stress conditions, and some portion of the links/ports will have higher error profiles because of stress conditions such as loss and crosstalk.
- Similarly, different module types (DR, FR, LR, etc.) would have different error profiles, needing different error correction strategies.
- For worst-case links/ports, segmented FEC would be more suitable
- The best approach would be for the devices involved in a link to implement the following capabilities and to bypass some of them when they are not necessary, or to optimize link performance, power, and latency through the device
  - Termination and regeneration of PCS or FEC, and error mitigation techniques such as precoding and codeword interleaving

# Next Steps

- Find suitable bit mux or symbol mux options
- For the three FEC strategies addressed here, find the required (suitable) error mitigation techniques
- Define protocols to select suitable FEC strategies and error mitigation techniques for classes of channels in a high-connectivity system
- Define link training that accommodates the above considerations

## Conclusions

- 200G/lane channels analyzed so far show very high DFE tap1 values; the resulting skip-level errors do not play a significant role
- In a multipart link, even at high error propagation conditions in each sub-link, there is no discernible evidence of error multiplication from segment to segment, but the error profiles of the segments add up
- MLSD provides better performance compared to DFE, and its error profiles are better when α ≠ 1

- A segmented FEC strategy is suitable for worst-case links that have higher error profiles
- Other FEC strategies may be used to optimize overall link power and latency, keeping in mind the considerations listed in Slide 15
- Propose a flexible implementation of FEC, PCS, and error mitigation techniques in devices, with management at the protocol level to optimize link performance, power, and latency