

## Feasibility Of 100G-KR FEC

Zhongfeng Wang and Chung-Jue Chen, Broadcom Corp., USA

### Contributors

- Mark Gustlin, Cisco
- Howard Frazier, Broadcom
- Sudeep Bhoja, Broadcom
- John Wang, Broadcom
- Vasu Parthasarathy, Broadcom
- Hongtao Jiang, Broadcom
- Wenwei Pan, Broadcom



# **Supporters**

- Adee Ran, Intel
- Rich Mellitz, Intel



# Outline

- Introduction
- Coding Strategy
- Candidate FEC Goals
- Candidate FEC Codes Meeting the Target Goals
- A Real Case Study on Performance, Latency and Complexity
- Conclusion



# Introduction

- A previous talk [1] addressed some advantages of coding across physical lanes, e.g., low latency and high coding gain
- Encoding with slightly higher redundancy than KR FEC was discussed at the March IEEE meeting
- RS codes, being simple, achieve a good tradeoff between correcting random errors and burst errors

[1] Z. Wang, "FEC Options for 100G-KR", IEEE 802.3 100GCu March Meeting



# **Coding Strategy**

- Coding across physical lanes is our main focus
- Bypassing FEC decoding can minimize latency

### Alignment Option

- The standard lane alignment markers can be used to find the FEC block boundary. The alignment marker is always at the beginning of an FEC block. The alignment marker is not part of the FEC block in the following example.
- Another option is discussed in [1].
- Example RS(198,182), m=10
  - Every physical lane carries 1820/(4x65) = 7 x 65b PCS blocks + (1980-1820)/4 = 40b (5 bytes) of FEC parity bits.
  - Alignment markers appear every (16383 x 66 x 5) / (1980 / 4) = 10922 FEC blocks
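The arithmetic in the example above can be checked with a few lines of Python. This is only a sanity check of the numbers on the slide; the variable names are illustrative, and the factors 16383, 66, and 5 are copied from the slide's expression rather than derived here.

```python
# Sanity check of the RS(198,182), m=10 striping arithmetic over 4 physical lanes.
M_BITS = 10            # bits per RS symbol
N_SYM, K_SYM = 198, 182
LANES = 4              # physical lanes
PCS_BLOCK_BITS = 65    # bits per PCS block, as on the slide

coded_bits = N_SYM * M_BITS     # total bits in one FEC block
source_bits = K_SYM * M_BITS    # payload bits in one FEC block

pcs_blocks_per_lane = source_bits // (LANES * PCS_BLOCK_BITS)
parity_bits_per_lane = (coded_bits - source_bits) // LANES
fec_bits_per_lane = coded_bits // LANES

# Alignment-marker period, using the slide's expression (16383 x 66 x 5) / (1980 / 4).
am_period_fec_blocks, remainder = divmod(16383 * 66 * 5, fec_bits_per_lane)

print(pcs_blocks_per_lane, parity_bits_per_lane, am_period_fec_blocks, remainder)
# prints: 7 40 10922 0
```

The zero remainder shows that the marker period is a whole number of FEC blocks, which is what lets the marker always sit at a block boundary.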

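*Figure: FEC block striping across the four physical lanes (Lane 0 to Lane 3), with a bus width labeled 45 on each lane. The alignment markers repeat every 10922 FEC blocks. In the original figure the FEC block is shown in blue and the alignment markers in yellow.*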

| FEC parity | PCS block | PCS block | PCS block | PCS block | PCS block | PCS block | PCS block | AM | AM | AM | AM | AM |
|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|----|----|----|----|----|
| Parity0 40 bits | PCS LN24 | PCS LN20 | PCS LN16 | PCS LN12 | PCS LN8  | PCS LN4 | PCS LN0 | AM16 | AM12 | AM8  | AM4 | AM0 |
| Parity1 40 bits | PCS LN25 | PCS LN21 | PCS LN17 | PCS LN13 | PCS LN9  | PCS LN5 | PCS LN1 | AM17 | AM13 | AM9  | AM5 | AM1 |
| Parity2 40 bits | PCS LN26 | PCS LN22 | PCS LN18 | PCS LN14 | PCS LN10 | PCS LN6 | PCS LN2 | AM18 | AM14 | AM10 | AM6 | AM2 |
| Parity3 40 bits | PCS LN27 | PCS LN23 | PCS LN19 | PCS LN15 | PCS LN11 | PCS LN7 | PCS LN3 | AM19 | AM15 | AM11 | AM7 | AM3 |

[1] Mark Gustlin *"FEC Striping Options for 100 Gb/s Backplane and Copper Study Group"*, IEEE 802.3, Incline Village, May 2011

## **Candidate FEC Goals**

- Source data: multiple of 65 bits
- Latency (transmission + processing): < 100 ns
- Coding gain: > 5 dB @ 1e-15
- Hardware complexity: < 0.1 mm<sup>2</sup> (28 nm)



# **Reed-Solomon Codes**

#### • RS(n, k, t) defined over GF(2<sup>m</sup>)

- Source data: k symbols = k \* m bits
- Coded block: n symbols = n \* m bits
- Random error correcting capability: t errors
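For reference, the standard relations between these parameters for a conventional Reed-Solomon code can be written as below. The code-rate symbol $R$ is introduced here only for illustration and does not appear in the slides.

$$
t = \left\lfloor \frac{n - k}{2} \right\rfloor, \qquad R = \frac{k}{n}, \qquad n \le 2^{m} - 1
$$

For the RS(198,182), m=10 example used earlier, this gives $t = 8$ and $R = 182/198 \approx 0.92$. Since 198 is well below $2^{10} - 1 = 1023$, the code is a shortened one.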

### RS decoding steps [1]

- Syndrome Computation (SC): takes n/p cycles, where p denotes the level of parallelism in the design
- Key Equation Solver (KES): normally takes 2\*t cycles
- Chien Search and Forney (CSnF): takes n/p + (1 to 2) cycles
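The per-stage cycle counts above can be turned into a small estimator. The sketch below is a rough model only: it assumes the three stages run strictly one after another, rounds n/p up when it is not an integer, and exposes the "1 to 2" extra cycles as a hypothetical `csnf_extra` parameter. The function name and the choice p=18 in the example (the parallel level used in the case study later in this deck) are illustrative.

```python
import math

def rs_decode_cycles(n: int, t: int, p: int, csnf_extra: int = 2) -> dict:
    """Rough cycle counts per decoder stage for an RS code of length n
    correcting t symbol errors, processed p symbols per cycle."""
    sc = math.ceil(n / p)                  # Syndrome Computation: n/p cycles
    kes = 2 * t                            # Key Equation Solver: 2*t cycles
    csnf = math.ceil(n / p) + csnf_extra   # Chien Search and Forney: n/p + (1 to 2)
    return {"SC": sc, "KES": kes, "CSnF": csnf, "total": sc + kes + csnf}

print(rs_decode_cycles(n=198, t=8, p=18))
# prints: {'SC': 11, 'KES': 16, 'CSnF': 13, 'total': 40}
```

A real decoder may overlap or pipeline stages differently, so the total should be read as an estimate rather than a guaranteed figure.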

[1] B. Chen, X. Zhang, and Z. Wang, "Error correction for multi-level NAND flash memory using Reed-Solomon codes," IEEE SiPS'2008.



### **Candidate FEC Code-I**

#### • RS(198, 182, t=8), m=10

- Clocking requirement: 27.61 GHz
- Net coding gain: ~ 6.16 dB
- Burst error capability: max = 80 bits
- Source data = 65b x 28, coded data = 1980b
- Total latency: ~ 66 ns
- Details of this code are provided on a later slide.



### **Candidate FEC Code-II**

#### • RS(276, 260, t=8), m=10

- Clocking requirement: 26.94 GHz
- Net coding gain: ~ 6.10 dB
- Burst error capability: max = 80 bits
- Source data = 65b x 40, coded data = 2760b
- Total latency: ~ 82 ns



### **Candidate FEC Code-III**

#### • RS(280, 260, t=10), m=10

- Clocking requirement: 27.35 GHz
- Net coding gain: ~ 6.44 dB
- Burst error capability: max = 100 bits
- Source data = 65b x 40, coded data = 2800b
- Total latency: ~ 92 ns
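The three candidates can be cross-checked against the stated goals with a short script. This is a sketch for bookkeeping only: the coding gain and latency values are copied from the slides, the class and field names are illustrative, and the area goal is not checked because the slides give no per-code area figures.

```python
from dataclasses import dataclass

PCS_BLOCK_BITS = 65      # goal: source data is a multiple of 65 bits
MAX_LATENCY_NS = 100.0   # goal: latency < 100 ns
MIN_GAIN_DB = 5.0        # goal: coding gain > 5 dB

@dataclass(frozen=True)
class Candidate:
    name: str
    n: int             # coded symbols
    k: int             # source symbols
    m: int             # bits per symbol
    gain_db: float     # net coding gain quoted on the slide
    latency_ns: float  # total latency quoted on the slide

    @property
    def t(self) -> int:
        return (self.n - self.k) // 2   # correctable symbol errors

    @property
    def source_bits(self) -> int:
        return self.k * self.m

    @property
    def coded_bits(self) -> int:
        return self.n * self.m

    @property
    def max_burst_bits(self) -> int:
        return self.t * self.m          # best case: burst aligned to t symbols

    def meets_goals(self) -> bool:
        return (self.source_bits % PCS_BLOCK_BITS == 0
                and self.latency_ns < MAX_LATENCY_NS
                and self.gain_db > MIN_GAIN_DB)

CANDIDATES = [
    Candidate("Code-I", 198, 182, 10, 6.16, 66),
    Candidate("Code-II", 276, 260, 10, 6.10, 82),
    Candidate("Code-III", 280, 260, 10, 6.44, 92),
]

for c in CANDIDATES:
    print(f"{c.name}: t={c.t}, source={c.source_bits}b, coded={c.coded_bits}b, "
          f"max burst={c.max_burst_bits}b, meets goals={c.meets_goals()}")
```

The derived values of t, source bits, coded bits, and maximum burst length reproduce the figures listed for each candidate, and all three satisfy the data-size, latency, and coding-gain goals.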



## **A Real Case Study**

#### • RS(198, 182, t=8), m=10

- Bus width = 180 bits
- Clock frequency: ~ 600 MHz

#### Decoder Architecture

- Compute the syndromes as data arrives, parallel level = 18
- Take 2\*t = 16 cycles to solve the Key Equation
- Use the same parallel level (18) for Chien Search & Forney

#### • Decoder Complexity

Synthesized Area (relative to Fire code over Virtual Lane): < 1x



## A Real Case Study (II)

#### Latency

Timing

- Overall latency ~ 66 ns
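As a rough consistency check, the quoted latency can be reproduced from the decoder parameters given earlier (n = 198, t = 8, parallel level 18, clock of about 600 MHz). The calculation assumes the three decoding stages run back to back with no other overhead, which is an assumption and not something the slides state. The symbols $N_{\text{cycles}}$ and $T$ are introduced here for illustration.

$$
N_{\text{cycles}} = \underbrace{\frac{198}{18}}_{\text{SC}} + \underbrace{2 \cdot 8}_{\text{KES}} + \underbrace{\frac{198}{18} + 2}_{\text{CSnF}} = 11 + 16 + 13 = 40,
\qquad
T \approx \frac{40}{600\ \text{MHz}} \approx 66.7\ \text{ns}
$$

With one extra CSnF cycle instead of two, the total is 39 cycles, or about 65 ns, so the two cases bracket the ~ 66 ns figure.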



#### Multi-code interleaving options

- Can linearly increase the tolerance to burst errors and DFE error propagation
- With 2 codes interleaved, the overall latency is less than 88 ns
- With 4 codes interleaved, the overall latency is less than 120 ns
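The linear growth in burst tolerance can be illustrated with a simple model. The sketch below assumes symbol-wise round-robin interleaving of identical codes and treats the coded symbols as one serial stream, ignoring how they are striped across physical lanes. The slides do not specify the interleaving scheme, so both assumptions and the function name are hypothetical.

```python
def burst_tolerance_bits(t: int, m: int, depth: int) -> tuple:
    """Burst lengths correctable by `depth` symbol-interleaved RS codes,
    each correcting t symbols of m bits.

    Returns (guaranteed, best_case):
      guaranteed: longest burst correctable at any bit alignment
      best_case:  longest burst correctable when aligned to symbol boundaries
    """
    total_symbols = depth * t                 # consecutive symbols the set can absorb
    guaranteed = (total_symbols - 1) * m + 1  # spans at most total_symbols symbols
    best_case = total_symbols * m
    return guaranteed, best_case

for depth in (1, 2, 4):
    print(depth, burst_tolerance_bits(t=8, m=10, depth=depth))
# prints:
# 1 (71, 80)
# 2 (151, 160)
# 4 (311, 320)
```

Under this model, the best case for a single RS(198,182) code is 80 bits and the 4-way interleaved case falls between 311 and 320 bits, in line with the 80-bit and ~ 310-bit figures quoted in the comparison slide.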



### **Comparison with KR Fire code**

#### Net Coding Gain

RS(198,182) ~ 6.16 dB, Fire code ~ 2.3 dB

#### Burst Error across Lanes

- RS(198,182) = 80 bits, Fire code = 11 bits
- 4x interleaved RS ~ 310 bits, Virtual Lane Fire code ~ 220 bits

#### Latency

- RS(198,182) ~ 66 ns, Virtual Lane Fire code ~ 420 ns
- 4x interleaved RS ~ 120 ns

#### Complexity

- The area is less than 1x that of the Virtual Lane Fire code
- The absolute area is very small in 28 nm (< 0.1 mm<sup>2</sup>)

#### Clocking

- The RS(198,182) code requires a ~ 6% higher clock rate than the Fire code.



## Conclusion

- FEC codes with low complexity, significant coding gain, and low latency are technically feasible for 100GBASE-KR4 systems.

