# **Delay Constraints in 40GBASE-T**

IEEE 802.3: 40GBASE-T Task Force

Peter Wu, William Lo Marvell Semiconductor

# Supporters

# **Outline**

- Review the delay constraint (i.e., latency limit) for 10GBASE-T
  - 25600 BT (i.e., 8 LDPC frames, 2.56 µs)
    - Defined in Clause 55.11
    - The sum of the TX and RX path delays (PHY medium delay not included)
  - Main latency contributors
    - Frequency-domain echo/NEXT cancellation and the LDPC decoder
- 40GBASE-T
  - Analysis of latency numbers for 40GBASE-T transceivers
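The relation between the 25600 BT limit, 2.56 µs and 8 LDPC frames can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one bit time is the reciprocal of the 10 Gb/s data rate; the function and constant names are illustrative only.

```python
# Convert a delay budget in bit times (BT) to nanoseconds and LDPC frames.
# Assumption: 1 BT = 1 / data rate, i.e. 0.1 ns at 10 Gb/s.

def bit_times_to_ns(bit_times: int, rate_gbps: float) -> float:
    """Return the duration of `bit_times` bits at `rate_gbps` Gb/s, in ns."""
    return bit_times / rate_gbps

LIMIT_BT = 25600      # delay constraint quoted in the outline
FRAMES_IN_LIMIT = 8   # the outline equates the limit to 8 LDPC frames

limit_ns = bit_times_to_ns(LIMIT_BT, 10.0)
frame_ns = limit_ns / FRAMES_IN_LIMIT

print(f"limit = {limit_ns:.0f} ns, one LDPC frame = {frame_ns:.0f} ns")
```

Under these assumptions one LDPC frame lasts 320 ns, which is the unit behind the "LDPC frames" columns in the latency tables.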

### Latency at 10GBASE-T transceiver

- The main latency contributors are shown in ns and in LDPC frames
- Latency is traded off against power in the echo/NEXT canceller (ENX) and some other blocks

| Blocks                                    | Latency (ns)              | LDPC frames | Note                                                                     |
|-------------------------------------------|---------------------------|-------------|--------------------------------------------------------------------------|
| LDPC decoder                              | > 640                     | > 2         | RX                                                                       |
| Echo/NEXT canceller                       | 600 ~ 1200                | 2 ~ 4       | TX or RX side; frequency-domain filtering; trades off power and latency  |
| Others:<br>FIFO, pipelines,<br>PCS, etc.  | mid to high<br>hundreds   | 1 ~ 2       | TX and RX                                                                |
| Total latency                             | 2000 ~ 2500               | 6 ~ 8       | Marginally meets the 8-frame latency limit                               |

### Why DFT FIR for filtering with many taps?

#### **DFT FIR:**



|                                                                          | ECHO  | NEXT  | FEXT | FF EQ | Total FIR |
|--------------------------------------------------------------------------|-------|-------|------|-------|-----------|
| FIR length                                                               | 500   | 300   | 100  | 80    |           |
| Block size (net samples per block)                                       | 524   | 724   | 156  | 176   |           |
| FFT size N                                                               | 1024  | 1024  | 256  | 256   |           |
| log2(N)                                                                  | 10    | 10    | 8    | 8     |           |
| Real operations/sample for FIR                                           | 500   | 300   | 100  | 80    | 7120      |
| Total operations/block for DFT FIR:<br>(4\*(N/2)\*log2(N)\*2 + 4\*N)/2   | 22528 | 22528 | 4608 | 4608  |           |
| Real operations/sample for FFT                                           | 43    | 31    | 30   | 26    | 1005      |
| Approx. savings                                                          | 91%   | 90%   | 70%  | 67%   | 86%       |
| Gain                                                                     | 11x   | 10x   | 3x   | 3x    | 7x        |

<sup>\*</sup>Reference: http://www.ieee802.org/3/10GBT/public/sep03/kasturia\_1\_0903.pdf
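The per-sample figures in the table can be reproduced from the FIR lengths, the FFT sizes and the operations-per-block formula. The sketch below assumes that each block yields N minus the FIR length new samples (which matches the block-size row) and that a port uses 4 echo, 12 NEXT, 12 FEXT and 4 FF EQ filters. Those filter counts are an assumption, not stated in the table, chosen because they reproduce the totals of 7120 and 1005.

```python
import math

# (name, FIR taps, FFT size N, filters per port)
# Taps and FFT sizes come from the table; the per-port filter counts
# (4, 12, 12, 4) are an assumption that reproduces the "Total" column.
FILTERS = [
    ("ECHO", 500, 1024, 4),
    ("NEXT", 300, 1024, 12),
    ("FEXT", 100, 256, 12),
    ("FF EQ", 80, 256, 4),
]

def dft_fir_ops_per_block(n: int) -> int:
    """Real operations per block: (4*(N/2)*log2(N)*2 + 4*N) / 2."""
    log2n = int(math.log2(n))
    return (4 * (n // 2) * log2n * 2 + 4 * n) // 2

def dft_fir_ops_per_sample(taps: int, n: int) -> float:
    """Operations per output sample, with N - taps new samples per block."""
    return dft_fir_ops_per_block(n) / (n - taps)

total_fir = 0.0
total_fft = 0.0
for name, taps, n, count in FILTERS:
    per_sample = dft_fir_ops_per_sample(taps, n)
    saving = 1.0 - per_sample / taps
    total_fir += taps * count
    total_fft += per_sample * count
    print(f"{name:5s} FIR={taps:4d}  DFT FIR={per_sample:5.1f}  saving={saving:.0%}")

print(f"Total FIR={total_fir:.0f}  DFT FIR={total_fft:.0f}  "
      f"saving={1 - total_fft / total_fir:.0%}")
```

A time-domain FIR costs one real operation per tap per sample here, so the saving is simply one minus the ratio of the two per-sample counts.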

### Echo/NEXT cancellation – DFT FIR

#### At 10GBASE-T:

- Total crosstalk-cancellation taps per 10G port: ~1000\*4 (echo) + 300\*12 (NEXT) + 128\*12 (FEXT) ≈ 9000 taps!
- DFT FIR gives about 90% power savings compared with time-domain FIRs
- A time-domain FIR implementation is not an option

### Cons:

- Higher latency
- Algorithmic latency (FFT size/2) plus implementation latency
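To give a feel for the algorithmic term alone, the sketch below converts FFT size/2 samples into time and LDPC frames. It assumes the 10GBASE-T symbol rate of 800 MBd (not stated on this slide) and a 320 ns LDPC frame (2.56 µs / 8 frames). Implementation latency is not modelled.

```python
SYMBOL_RATE_MBAUD = 800.0  # assumed 10GBASE-T symbol rate, not given on the slide
FRAME_NS = 320.0           # one LDPC frame: 2.56 us / 8 frames

def algorithmic_latency_ns(fft_size: int, symbol_rate_mbaud: float) -> float:
    """Latency of FFT size / 2 samples, in ns (algorithmic part only)."""
    sample_period_ns = 1000.0 / symbol_rate_mbaud
    return (fft_size / 2) * sample_period_ns

for fft_size in (256, 1024):
    ns = algorithmic_latency_ns(fft_size, SYMBOL_RATE_MBAUD)
    print(f"N={fft_size}: {ns:.0f} ns = {ns / FRAME_NS:.1f} LDPC frames")
```

With a 1024-point FFT this comes to about 2 LDPC frames, which is consistent with the lower end of the 2 ~ 4 frame range quoted for the echo/NEXT canceller.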

### Extrapolation of Latency at 40GBASE-T

- 30 m cable at 4x the symbol rate:
  - Echo tap requirement $\approx N_{\text{10GBASE-T}} \times \frac{30}{100} \times 4 = 1.2\,N_{\text{10GBASE-T}}$
  - Still needs to be implemented with DFT FIRs
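The scaling above is simple enough to spell out. The sketch below applies it to two example tap counts and, as an assumption about how the later table was derived, scales the 2 ~ 4 frame canceller latency by the same factor. The function name and default arguments are illustrative only.

```python
def scale_taps(taps_10g: float, reach_m: float = 30.0, rate_ratio: float = 4.0) -> float:
    """Scale a 10GBASE-T tap count (100 m reach) to a shorter cable
    running at `rate_ratio` times the symbol rate."""
    return taps_10g * (reach_m / 100.0) * rate_ratio

factor = scale_taps(1.0)
print(f"scale factor = {factor:.2f}")

for taps in (500, 1000):
    print(f"{taps} taps at 10GBASE-T -> {scale_taps(taps):.0f} taps at 40GBASE-T")

# Assumption: canceller latency in LDPC frames grows in proportion to tap count.
low, high = 2 * factor, 4 * factor
print(f"canceller latency: {low:.1f} ~ {high:.1f} LDPC frames")
```

The shorter cable reduces the echo span to 30% of the 100 m case, but the 4x symbol rate packs four times as many samples into that span, giving a net 1.2x increase.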

## Latency at 40GBASE-T transceiver

| Blocks                                    | 10GBASE-T (LDPC frames) | 40GBASE-T (LDPC frames)                  |
|-------------------------------------------|-------------------------|------------------------------------------|
| LDPC decoder                              | > 2                     | > 2*                                     |
| Echo/NEXT canceller                       | 2 ~ 4                   | 2.4 ~ 4.8 (higher because of more taps)  |
| Others:<br>FIFO, pipelines,<br>PCS, etc.  | 1 ~ 2                   | 1 ~ 2*                                   |
| Total latency                             | 6 ~ 8                   | 6 ~ 8*                                   |

<sup>\*</sup> Moving from 65 nm to 16 nm may not deliver a 4x speed-up; a more parallel architecture may increase latency.

## Conclusions:

- Keep the latency constraint no lower than 25600 BT.
- Challenges to meet it:
  - Echo cancellation will take longer
  - More parallelism will take longer
- Let PHY vendors find the best trade-off between power and latency under this constraint.

# Thank you