

# PAM4 digital receiver performance and feasibility

Vasu Parthasarathy Jan 2012

www.broadcom.com

### **Supporters and Contributors**



- Howard Frazier, Broadcom
- Will Bliss, Broadcom
- Kent Lusted, Intel
- Rich Mellitz, Intel
- Sanjay Kasturia, Inphi
- Hamid Rategh, Inphi
- Adee Ran, Intel
- Matt Brown, Applied Micro





- Explore PAM4 performance on channels submitted to IEEE 802.3ap as well as on more recent submissions
- These channels are generally accepted as difficult for most line codes
- Evaluate tradeoffs between complexity and performance for a digital, ADC-based receiver architecture
- Demonstrate technical feasibility of the architecture for supporting 100 Gbps operation



### **Simulation Setup**



#### Digital Receiver Architecture

3-tap FFE (1 pre, 1 main, 1 post), peaking filter (~8 dB boost at Nyquist), ADC, 32-tap FFE, 2-tap DFE, and FEC with 5 dB coding gain at an increased line rate of 27.5 Gbps (accounts for FEC overhead)

#### Simulation Parameters

- Tx launch = 1 Vppd
- $T_{rise/fall} = 20\ \mathrm{ps}$
- Tx RJ = 0.37 ps rms
- Tx DJ = 0.05 UI peak-to-peak
- Rx RJ = 0.37 ps rms
- AWGN PSD = -154 dBm/Hz double-sided
- Package model: S-parameters from current 10GBASE-KR production package (corresponds to a small chip package model)
- SNR target = 24.0 dB (corresponds to BER = $10^{-12}$)
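The link between the 24.0 dB SNR target and an error rate near $10^{-12}$ can be checked with the textbook symbol-error-rate expression for M-level PAM in additive Gaussian noise. The sketch below is only an illustration: it assumes that SNR means average symbol power over noise variance at the slicer, and the function names are hypothetical rather than taken from the simulation itself.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_ser(snr_db, levels=4):
    """Symbol error rate of equally spaced M-PAM in AWGN.

    Assumption: snr_db is the average symbol power divided by the
    noise variance at the slicer input, expressed in dB.
    """
    snr = 10.0 ** (snr_db / 10.0)
    arg = math.sqrt(3.0 * snr / (levels ** 2 - 1))
    return 2.0 * (1.0 - 1.0 / levels) * q_function(arg)


if __name__ == "__main__":
    for snr_db in (20.0, 22.0, 24.0):
        print(f"SNR = {snr_db:4.1f} dB -> PAM4 SER ~ {pam_ser(snr_db):.2e}")
```

Under these assumptions the PAM4 symbol error rate at 24.0 dB comes out on the order of $10^{-12}$, which is consistent with the target quoted above.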

### Channel 1: Molex<sup>1</sup> channel (crosstalk scaled by 8 dB)





<sup>1</sup> http://www.ieee802.org/3/ap/public/channel\_model/oganessyan\_m1\_0306.zip

### Channel 2: TE Connectivity<sup>2</sup> channel with Nelco4000-6





<sup>2</sup> http://www.ieee802.org/3/100GCU/public/ChannelData/TEC\_11\_0428/TEC\_STRADAWhisper42p8in\_Nelco6\_Channel\_IEEE802\_3\_100GbCu\_04282011.zip

### Channel 3: Emerson<sup>3</sup> channel





<sup>3</sup> http://www.ieee802.org/3/100GCU/public/ChannelData/emerson\_11\_0928/meier\_01\_1011.pdf (Thru\_S06-P20-10-EF\_S14-P23-04-GH\_NNN.s4p)

### **Transmitter Feasibility**



- Transmitters have been built with 10 taps of de-emphasis for NRZ designs at 10 Gbps<sup>4</sup>
- The literature reports a 5-tap de-emphasis PAM4 transmitter at 20 Gbps<sup>5,6</sup>
- High-precision DACs have been fabricated at rates around 24 Gbps (12 Gsamples/s)<sup>7</sup>
- A PAM4 transmitter with 3-tap de-emphasis should be feasible in current technology at reasonable power

<sup>4</sup> D. Crivelli et al., "Architecture and Experimental Evaluation of a 10Gb/s MLSD based Transceiver for Optical Multimode applications", *Proceedings of ICC*, May 2008

<sup>5</sup> Z. Gao et al., "A 10 Gb/s Wire-line Transceiver with Half Rate Period Calibration CDR", *Proceedings of IEEE ISCAS*, May 2009

<sup>6</sup> A. Amirkhany et al., "A 24 Gb/s Software Programmable Analog Multi-Tone Transmitter", *IEEE Journal of Solid-State Circuits*, April 2008

<sup>7</sup> Y. M. Greshishchev et al., "A 56GS/s 6b DAC in 65nm CMOS with 256×6b memory", *Proceedings of the IEEE ISSCC*, April 2011

### Existing 10.3125GS/s 6bit ADC



- 10.3125GS/s ADC
- 4X time interleaving (2.5G subADCs)
- 6 bit ADC → ENOB ≈ 5bit
- 65nm CMOS process
- Power = 330mW
- ISSCC 2009

- 28nm provides ~ 50% power saving
- 13.5G ADC requires ~ 30% more power
- ENOB=6 requires ~ 2x more power
  - → 28nm 7bit 13.5G ADC power ~ 430mW



### Existing 40GS/s 6bit ADC



- 40GS/s ADC
- 16X time interleaving (2.5G subADCs)
- 6 bit ADC → ENOB ≈ 5bit
- 65nm CMOS process
- Power = 1500mW
- ISSCC 2010



- 28nm provides ~ 50% power saving
- 13.5G ADC requires ~ 66% less power
- ENOB=6 requires ~ 2x more power
  - → 28nm 7bit 13.5G ADC power ~ 500mW

[Greshishchev, ISSCC 2010, 21.7: 6b 40 GS/s ADC]

### Existing 63GS/s 8bit ADC



- 63GS/s ADC
- 320X time interleaving
- 8 bit ADC → ENOB ≈ 6bit
- 40nm CMOS process
- Power = 1250mW
- OFC 2010



- 28nm provides ~ 30% power saving
- 13.5G ADC requires ~ 76% less power
  - → 28nm 8bit 13.5G ADC power ~ 190mW

http://www.fujitsu.com/downloads/MICRO/fme/dataconverters/OFC-2010-56Gss-ADC-Enabling-100GbE.pdf

### ADC Feasibility



- 10-50G ADCs with 5-6 bit ENOB have been successfully implemented and presented at major conferences
- 7bit 13.5G ADC power can be in the 190-500mW range, depending on the architecture and circuit implementation
- Further ADC improvements are possible with architectural choices tailored to PAM4
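The 190-500mW range follows from scaling each published ADC to a 13.5 GS/s, 28nm design point. The sketch below redoes that arithmetic under one simplifying assumption that is not stated on the slides: power scales linearly with sample rate. The process-node savings and the 2x cost of an extra effective bit are the figures quoted for each reference design; the function and dictionary names are hypothetical.

```python
TARGET_RATE_GSPS = 13.5  # sample rate of the ADC being estimated


def scaled_adc_power_mw(power_mw, rate_gsps, node_saving, enob_factor=1.0,
                        target_rate_gsps=TARGET_RATE_GSPS):
    """Scale a published ADC power figure to the target design point.

    Assumption: power is proportional to sample rate. node_saving is the
    fractional saving from moving to 28nm, and enob_factor is the power
    multiplier for the extra effective bit (2.0) or 1.0 if none is needed.
    """
    rate_factor = target_rate_gsps / rate_gsps
    return power_mw * (1.0 - node_saving) * rate_factor * enob_factor


# Published reference points and scaling factors quoted in this deck.
REFERENCE_ADCS = {
    "10.3125 GS/s, 65nm": dict(power_mw=330.0, rate_gsps=10.3125,
                               node_saving=0.5, enob_factor=2.0),
    "40 GS/s, 65nm": dict(power_mw=1500.0, rate_gsps=40.0,
                          node_saving=0.5, enob_factor=2.0),
    "63 GS/s, 40nm": dict(power_mw=1250.0, rate_gsps=63.0,
                          node_saving=0.3, enob_factor=1.0),
}

if __name__ == "__main__":
    for name, params in REFERENCE_ADCS.items():
        print(f"{name:>20s}: ~{scaled_adc_power_mw(**params):.0f} mW at 13.5 GS/s")
```

With linear rate scaling the three estimates land close to the quoted 430mW, 500mW and 190mW. For the 63 GS/s part, linear scaling implies a slightly larger reduction than the ~76% quoted, so that factor should be read as approximate.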

### Equalizer (FFE) Feasibility



- Synthesized a parallelized 32 tap FFE with 40nm std cell TSMC library (effective bit-rate is around 26 Gbps)
- Builds on a Fast FFE implementation<sup>8</sup>
- Production-part-type synthesis with 20% timing margin to the worst PVT corner (to estimate feasibility, area and power)
- Straightforward Fast FFE implementation; further optimizations are possible in tap widths and adders for smaller area, power and latency
- POWER (Synopsys DC estimated pre-layout, static + dynamic): around twice that of a 10 tap KR FFE implementation at 10.5 Gbps
- Process node change to 28nm/20nm would further reduce the FFE power by at least 30%

<sup>8</sup> Richard Blahut, "Fast Algorithms for Digital Signal Processing", Addison-Wesley, 1985
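As an illustration of how a parallelized "fast" FFE can save arithmetic, the sketch below implements a 2-parallel fast FIR structure of the kind covered in the fast-algorithms literature. It is an assumption that the synthesized design uses something comparable; the slides do not specify the decomposition, and the function names are hypothetical. The input and the taps are split into even and odd phases, and three half-length sub-filters replace the four that a direct 2-parallel form needs, which is about 25% fewer multiplies per output.

```python
import random


def convolve(a, b):
    """Direct-form linear convolution, used as sub-filter and as reference."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(x, h):
    """2-parallel fast FIR: same output as convolve(x, h), three sub-filters."""
    n_out = len(x) + len(h) - 1
    # Zero-pad to even lengths so both polyphase branches have equal size.
    x = list(x) + [0.0] * (len(x) % 2)
    h = list(h) + [0.0] * (len(h) % 2)
    x0, x1 = x[0::2], x[1::2]          # even / odd input phases
    h0, h1 = h[0::2], h[1::2]          # even / odd tap phases
    a = convolve(h0, x0)                                   # H0 * X0
    b = convolve(h1, x1)                                   # H1 * X1
    c = convolve([p + q for p, q in zip(h0, h1)],
                 [p + q for p, q in zip(x0, x1)])          # (H0+H1) * (X0+X1)
    k = len(a)
    y = [0.0] * (2 * k + 1)
    for i in range(k):
        y[2 * i] += a[i]                # even outputs: H0*X0 ...
        y[2 * i + 2] += b[i]            # ... plus H1*X1 delayed by one pair
        y[2 * i + 1] = c[i] - a[i] - b[i]   # odd outputs: H0*X1 + H1*X0
    return y[:n_out]


if __name__ == "__main__":
    random.seed(0)
    symbols = [random.choice((-3.0, -1.0, 1.0, 3.0)) for _ in range(64)]
    taps = [random.uniform(-0.2, 0.2) for _ in range(32)]   # illustrative 32-tap FFE
    diff = max(abs(p - q) for p, q in
               zip(fast_fir_2parallel(symbols, taps), convolve(symbols, taps)))
    print(f"max difference vs direct form: {diff:.2e}")
```

The same decomposition can be applied recursively for higher parallelism, at the cost of extra pre- and post-adders, which is one place where the tap-width and adder optimizations mentioned above would apply.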

### Equalizer (DFE) Feasibility



- Synthesized a 2 tap look-ahead<sup>9</sup> PAM4 DFE with 40nm std cell library (effective bit-rate is around 26 Gbps)
- Production part type synthesis with 20% timing margin to worst PVT corner (to estimate feasibility, area and power)
- Straightforward implementation used; further optimizations are possible in the look-ahead structure for lower area, power and latency
- Note that some amount of duty-cycle distortion (DCD) can be cancelled with a look-ahead DFE architecture
- POWER (Synopsys DC estimated pre-layout, static + dynamic): similar to a 4 tap NRZ DFE at KR rates of 10.5 Gbps
- Process node change to 28nm/20nm would further reduce the DFE power by at least 30%

<sup>9</sup> Keshab K. Parhi, "Design of Multigigabit Multiplexer-Loop-Based Decision Feedback Equalizers", *IEEE Transactions On Very Large Scale Integration (VLSI) systems*, Vol. 13, No.4, April 2005
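The look-ahead idea can be sketched in a few lines: for a 2-tap PAM4 DFE there are 4 × 4 = 16 possible pairs of previous decisions, so the slicer output can be precomputed for every pair outside the feedback loop, leaving only a multiplexer selection in the critical path. The code below is a behavioural illustration with hypothetical names, PAM4 levels assumed to be ±1 and ±3, and an assumed initial decision history. It is not the synthesized structure.

```python
import itertools
import random

LEVELS = (-3, -1, 1, 3)   # assumed PAM4 slicer levels


def slice_pam4(v):
    """Nearest-level PAM4 decision."""
    return min(LEVELS, key=lambda s: abs(v - s))


def dfe_direct(samples, taps, init=(LEVELS[0], LEVELS[0])):
    """Conventional DFE: feedback subtraction sits inside the decision loop."""
    b1, b2 = taps
    d1, d2 = init
    out = []
    for v in samples:
        d = slice_pam4(v - b1 * d1 - b2 * d2)
        out.append(d)
        d2, d1 = d1, d
    return out


def dfe_lookahead(samples, taps, init=(LEVELS[0], LEVELS[0])):
    """Look-ahead DFE: speculate on all 16 histories, then select."""
    b1, b2 = taps
    histories = list(itertools.product(LEVELS, repeat=2))
    # Stage 1 (no feedback, freely pipelined): one decision per history.
    speculative = [
        {hist: slice_pam4(v - b1 * hist[0] - b2 * hist[1]) for hist in histories}
        for v in samples
    ]
    # Stage 2 (the only loop): a 16:1 selection driven by past decisions.
    d1, d2 = init
    out = []
    for table in speculative:
        d = table[(d1, d2)]
        out.append(d)
        d2, d1 = d1, d
    return out


if __name__ == "__main__":
    random.seed(1)
    rx = [random.uniform(-4.0, 4.0) for _ in range(1000)]
    same = dfe_direct(rx, (0.3, 0.1)) == dfe_lookahead(rx, (0.3, 0.1))
    print("look-ahead matches direct DFE:", same)
```

Both functions produce identical decisions by construction; the difference is only where the arithmetic sits relative to the feedback loop, which is what sets the achievable clock rate in hardware.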

### Other blocks



- PGA adaptation blocks typically run at low speed (highly sub-sampled line-rate clock)
- LMS adaptation for stationary channels also typically runs at very low speed
- Timing recovery algorithms<sup>10</sup> for PAM4 are relatively simple to implement
- FEC block codes that provide 5 dB coding gain are readily available and have been presented at IEEE meetings<sup>11, 12</sup>
- These codes have been analyzed in detail and shown to be low in power and area irrespective of the choice of line code

<sup>10</sup> K. H. Mueller and M. S. Muller, "Timing Recovery in Digital Synchronous Data Receivers", *IEEE Transactions on Communications*, vol. COM-24, pp. 516-531, May 1976

<sup>11</sup> S. Bhoja et al., "Precoding proposal for PAM4 modulation", *IEEE Chicago meeting*, Sept. 2011

<sup>12</sup> Z. Wang and C. J. Chen, "Feasibility of 100G-KR FEC", *IEEE Lake Tahoe meeting*, May 2011
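A baud-rate timing error detector in the Mueller and Muller style needs only one sample and one decision per symbol, which is part of why timing recovery is simple for PAM4. The sketch below uses one common form of the detector; the sign convention and the function name are assumptions, and a real loop would follow it with a loop filter and a phase adjustment. With uncorrelated symbols, the average of this error is proportional to the difference between the first post-cursor and the first pre-cursor of the sampled pulse.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)   # assumed PAM4 levels


def mueller_muller_error(samples, decisions):
    """Per-symbol timing error: e[n] = y[n]*d[n-1] - y[n-1]*d[n]."""
    return [
        samples[n] * decisions[n - 1] - samples[n - 1] * decisions[n]
        for n in range(1, len(samples))
    ]


if __name__ == "__main__":
    random.seed(3)
    sym = [random.choice(LEVELS) for _ in range(5000)]
    # Illustrative late-sampling case: residual post-cursor of 0.2, no pre-cursor.
    rx = [sym[i] + (0.2 * sym[i - 1] if i else 0.0) for i in range(len(sym))]
    err = mueller_muller_error(rx, sym)
    print("mean timing error:", sum(err) / len(err))
```

When the samples equal the decisions exactly, every error term is zero, so the detector adds no jitter of its own at the ideal sampling phase.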

### Prior work



Digital receiver performance over the KR-compliant installed base<sup>13</sup> of channels

[Figure: SNR margin (dB)]

- Coverage explored here with the digital architecture on an installed base of KR-compliant channels accumulated over the last few years
- Full coverage of the installed base is feasible

<sup>13</sup> H. Frazier et al., "Feasibility of 100 Gb/s operation on installed backplane channels", *IEEE Lake Tahoe meeting*, May 2011





- Demonstrated that it is technically feasible to use PAM4 as the line code/modulation technique
- Examined the performance of a digital PAM4 receiver architecture over some channels submitted to IEEE
- All of the major blocks required for an implementation are technically and economically feasible using current technology