

### **C2M Receiver Architecture**

Ali Ghiasi, Ghiasi Quantum LLC

IEEE 802.3ck Task Force Meeting, Bangkok

November 12, 2018

### Background



- Back channel training has been proposed for C2M in <u>sun\_3ck\_01a\_0918.pdf</u> as a lower power SerDes interface
- C2M implementation trade-offs are given in <u>slavick\_3ck\_02\_0918.pdf</u>, suggesting that a back channel will reduce the complexity of the C2M receiver
- This contribution will show that a self-contained C2M receiver that doesn't require back channel training is simpler, more robust, and low power!

# **C2M Link Options**



During the Sept. interim, <u>slavick\_3ck\_02\_0918.pdf</u> presented 6 receiver architectures, from simple to continuous protocol adaptation

- Majority voted for Slavick option B, where the receiver is self-contained and does not require back channel training
- This contribution will show that Slavick option B is simple, more robust, and low power!

|                                   | A: Low loss<br>C2M | B: Rx does it<br>all | C: Regs at<br>startup              | D: Regs<br>continuously   | E: Startup<br>Protocol | F: Continuous<br>Protocol |
|-----------------------------------|--------------------|----------------------|------------------------------------|---------------------------|------------------------|---------------------------|
| Reach                             | Short              | Medium               | Medium                             | Medium                    | Medium                 | Medium                    |
| Module Electrical Rx              | Simple             | Complex              | Simple                             | Simple                    | Simple                 | Simple                    |
| Host Electrical Tx FFE            | Fixed              | Fixed                | Adaptive                           | Adaptive                  | Adaptive               | Adaptive                  |
| Module Electrical Rx<br>Input Eye | HCB based          | HCB based            | Set at startup<br>VT               | Updated over<br>VT shifts | Set at<br>startup VT   | Updated over<br>VT shifts |
| Host Compliance                   | Same as past       | Similar to past      | KR/CR style                        | KR/CR style               | KR/CR style            | KR/CR style               |
| Management<br>involvement         | Low                | Low                  | Low -> High<br>(burst at startup?) | High                      | Low                    | Low                       |
| LinkUp time                       | Shortest           | Short                | Short->Long                        | Short->Long               | Medium                 | Medium                    |

#### Straw poll #3:

If we go with 16 dB, where should equalization be added?

- (A) Fixed TX FFE and more complex RX (slavick\_3ck... option B)
- (B) Adaptive TX with some kind of link training (slavick\_3ck... option C/D/E/F)
- (C) More information needed

Pick one

A: 39, B: 11, C: 16

# Example of Low Power FFE Suitable for 100G AUI

Momtaz analog FFE implementation is a 40 GBd 7-tap T/2-spaced FFE with 2 pre-cursor taps and a power of just 80 mW in 65 nm CMOS, based on a clever design that uses transconductance amplifiers instead of delay lines

- The implementation uses innovative passive-active delay elements that are process invariant
- Baseline FFE for 100GEL is 5 taps, T-spaced, with no pre-cursor
- Momtaz FFE with 20 GHz BW would not need to increase the BW by more than 30%
- The delay T can be increased from 12.5 ps to 18.8 ps by adjusting the transconductance amplifier
- With a sufficiently fast 16 nm process, most of the inductors would be eliminated
- The estimated power of the above circuit in 16 nm CMOS would be ~40 mW
- Momtaz implementation uses inductors and may not be suitable for high port count ASICs
- Area = 0.75 mm<sup>2</sup>
- Power/(datarate·delay) = 21.6 μW
- The estimated 7-tap FFE with 2 pre-cursor taps to support PAM4 in 16 nm CMOS would be about 60 mW.
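As a rough illustration of what an FFE with pre-cursor and post-cursor taps computes, the sketch below applies a symbol-spaced FIR filter to sampled data. The function name `ffe`, the three tap weights, and the test pulse are illustrative assumptions; they are not values from Momtaz or from the 100GEL baseline.

```python
def ffe(samples, taps, n_pre):
    """Symbol-spaced FFE: y[n] = sum_k taps[k] * x[n + n_pre - k].

    taps[0 .. n_pre-1] are pre-cursor taps, taps[n_pre] is the main
    cursor, and the remaining taps are post-cursor taps.
    Samples outside the input record are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(taps):
            idx = n + n_pre - k
            if 0 <= idx < len(samples):
                acc += c * samples[idx]
        out.append(acc)
    return out


# Hypothetical example: 1 pre-cursor tap, main cursor, 1 post-cursor tap.
pulse = [0, 0, 1, 0, 0]
equalized = ffe(pulse, taps=[-0.1, 1.0, -0.2], n_pre=1)
print(equalized)
```

Feeding a single unit pulse through the filter returns the tap weights centered on the main cursor, which is a quick way to check the pre-cursor/post-cursor ordering.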



Fig. 1. *M*-tap FFE block diagram, with pre-cursor and post-cursor taps labeled.

Afshin Momtaz, An 80 mW 40 Gb/s 7-Tap T/2-Spaced Feed-Forward Equalizer in 65 nm CMOS, IEEE Journal of Solid-State Circuits, Vol. 45, No. 3, March 2010.




# Example of Low Power FFE Suitable for 100G AUI

- Mammei analog FFE implementation is a 10-25 GBd 7-tap FFE with no restriction on pre-cursor taps and a power of 90 mW in 28 nm LP CMOS
  - Delay elements are created with transconductance amplifiers, similar to Momtaz
  - But Mammei uses a transimpedance amplifier to sum the currents instead of the inductors used by Momtaz
  - Mammei FFE, similar to Momtaz, has a BW of ~20 GHz and for 53.1 GBd operation would not need to increase the BW by more than ~30%
  - The delay is adjustable from 30-75 ps and can easily be reduced to 18.8 ps
  - The estimated power of the above circuit in 16 nm CMOS would be ~58 mW
  - Mammei compact FFE implementation is suitable for high density ASIC integration
  - Area = 0.085 mm<sup>2</sup>
  - Power/(datarate·delay) = 20 μW
- The estimated 7-tap FFE for PAM4 in 16 nm CMOS based on the Mammei design would be about 88 mW.



### Example of Low Power FFE Suitable for 100G AUI

Boesch analog FFE implementation is a 20 GBd 5-tap FFE with no restriction on pre-cursor taps and a power of 20 mW in 40 nm CMOS

- Delay elements are created with transconductance amplifiers, similar to Momtaz
- But Boesch uses inverters for low power and a transimpedance amplifier to sum the currents instead of the inductors used by Momtaz
- The delay was optimized for 25 ps, and 53.1 GBd operation would require reducing the delay to 18.8 ps
- The estimated power of the above circuit in 16 nm CMOS would be ~10.5 mW
- Boesch compact FFE implementation is suitable for high density ASIC integration
- Area = 0.003 mm<sup>2</sup>
- Power/(datarate·delay) = 4 μW
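The 18.8 ps delay target follows from the symbol period at the PAM4 signaling rate. A small sketch of that arithmetic, assuming 106.25 Gb/s PAM4 at 2 bits per symbol (53.125 GBd); the helper name `unit_interval_ps` is hypothetical.

```python
def unit_interval_ps(baud_gbd):
    """Symbol period (unit interval) in picoseconds for a rate in GBd."""
    return 1e3 / baud_gbd


# Assumption: 106.25 Gb/s PAM4 carries 2 bits per symbol.
pam4_baud_gbd = 106.25 / 2

t_spaced_delay_ps = unit_interval_ps(pam4_baud_gbd)   # T-spaced tap delay
half_t_at_40gbd_ps = unit_interval_ps(40.0) / 2       # T/2 delay at 40 GBd
half_t_at_20gbd_ps = unit_interval_ps(20.0) / 2       # T/2 delay at 20 GBd

print(round(t_spaced_delay_ps, 1), half_t_at_40gbd_ps, half_t_at_20gbd_ps)
```

Under these assumptions the T-spaced delay at 53.125 GBd is about 18.8 ps, while T/2 spacing gives 12.5 ps at 40 GBd and 25 ps at 20 GBd.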

The estimated 5-tap FFE for PAM4 in 16 nm CMOS based on Boesch would be <20 mW!

Ryan Boesch, A 0.003 mm2 5.2 mW/tap 20 GBd Inductor-less 5-Tap Analog RX-FFE, Symposium on VLSI Circuits, 2016.

Figure: 5-tap analog RX-FFE block diagram (input $v_i$, four delay elements, coefficients of 5 bits + sign, summing circuit).

### Adding Analog Low Power FFE EQ to sun\_3ck\_01a\_0918

- Power for a non-DAC TX implementation should be based on a conventional current summing implementation [5\*] instead of scaling down higher power DAC implementations
- Asymmetric balanced EQ power is about the same as analog FFE if one excludes the Mux/De-mux, LT/PCS logic, and channel estimator power required for asymmetric balanced EQ operation!

| Architecture | Balanced EQ (1. Asymmetric,<br>2. Symmetric) | 3. Analog DFE \*\* | 4. ADC Based | 5. Analog FFE |
|---|---|---|---|---|
| Equalization | TX: FIR (2/4 taps for asymmetric structure, 2/11 taps for symmetric structure) | TX: FIR (2/4)<br>RX: CTLE, with DFE taps | TX: FIR (2/4)<br>RX: CTLE, 6-bit ADC, 8 postcursor digital FFE | TX: FIR (2/4)<br>RX: CTLE, analog 5-7 tap FFE |
| TX Power\* (mW) | 196<br>224 (symmetric structure) | 196<br>\*157 mW | 196<br>\*157 mW | 157 mW<br>(by scaling TX of [5] from 64 Gb/s to 112 Gb/s) |
| RX Power (mW) | 239<br>(by scaling [6]) | 436<br>(by scaling [3], 2 DFE tail tap power is very low) | 498<br>(310 by scaling [5] front end for 13.6 dB channel;<br>108 for FFE by scaling FIR of [7] for 6b input;<br>80 for PLL, deserializer and CDR) | 220 mW<br>(by scaling [6] to 112G)<br>+60 mW for 7-tap FFE<br>Total RX power = 280 mW |
| Relative total Power (mW) | 0 (435 as baseline for asymmetric)<br>28 (463 for symmetric) | <b>197</b><br>(total 632) | <b>259</b><br>(total 694) | +2 mW<br>(total power 437 mW) |
| Power Difference for 2x400G Module C2M at 106.25G (mW) | <b>0</b> for asymmetric (Total 3480)<br><b>224</b> for symmetric (Total 3704) | <b>1,576</b> (Total <b>5956</b>)<br>1269 mW (Total 4744 mW) | 2,072 (Total 5552)<br>1760 mW (Total 5240 mW) | +16 mW<br>(total 3480) |
| Projection with 30% reduction (mW)\*\*\* | <b>0</b> for asymmetric (Total 305)<br>19 for symmetric (Total 324) | 137 (total 442)<br>110 (total 415) | 181 (total 486)<br>154 (total 459) mW | 0 Analog FFE (Total 305 mW) |

For the list of Sun references, please see http://www.ieee802.org/3/ck/public/18\_09/sun\_3ck\_01a\_0918.pdf.
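The per-lane and per-module figures in the comparison can be cross-checked with simple arithmetic. The sketch below is one way to do that, assuming a 2x400G module uses 8 lanes at 106.25G and that the asymmetric balanced EQ (196 mW TX + 239 mW RX) is the baseline; the names `summarize`, `LANES`, and `BASELINE_MW` are hypothetical. Individual table entries may differ slightly from these results because of rounding or transcription.

```python
LANES = 8                  # assumption: 2x400G module = 8 lanes at 106.25G
BASELINE_MW = 196 + 239    # asymmetric balanced EQ, TX + RX per lane


def summarize(tx_mw, rx_mw):
    """Per-lane and per-module power relative to the asymmetric baseline."""
    total = tx_mw + rx_mw
    delta = total - BASELINE_MW
    return {
        "total": total,                    # per-lane TX + RX (mW)
        "relative": delta,                 # per-lane delta vs baseline (mW)
        "module_total": total * LANES,     # all lanes (mW)
        "module_relative": delta * LANES,  # all lanes, delta vs baseline (mW)
        "projected_total": total * 0.7,    # per lane with 30% reduction (mW)
    }


cases = {
    "Analog DFE, TX 196 mW": (196, 436),
    "Analog DFE, TX 157 mW": (157, 436),
    "ADC based, TX 196 mW": (196, 498),
    "ADC based, TX 157 mW": (157, 498),
    "Analog FFE, TX 157 mW": (157, 220 + 60),
}

for name, (tx, rx) in cases.items():
    print(name, summarize(tx, rx))
```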


### A reliable method is necessary to train the TX FFE

- Analog CTLE or CTLE/5T FFE receivers may only use simple threshold detectors to determine eye opening and/or spectrum shaping, as shown in Figure 4.7.2
  - A simple low power threshold detector or HP/LP filter cannot provide sufficient information to adjust the transmit FFE to a new optimum setting
  - There is no guarantee that the adaptation will converge to the optimum setting rather than a local minimum
  - There is no guarantee that the link will stay up during TX adaptation
- The power and complexity of a more sophisticated monitoring scheme, such as a channel estimator that can provide more deterministic TX convergence, must be considered!
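The local-minimum concern can be illustrated with a toy example: a greedy adaptation that only compares a coarse cost metric at neighboring TX FFE settings can stop at a setting that is not the best one. The cost values below are invented purely for illustration and are not derived from any receiver, channel, or the cited equalizer; `greedy_adapt` and `toy_cost` are hypothetical names.

```python
def greedy_adapt(cost, start):
    """Step to the cheaper neighboring setting until neither neighbor improves."""
    k = start
    while True:
        neighbors = [j for j in (k - 1, k + 1) if 0 <= j < len(cost)]
        best = min(neighbors, key=lambda j: cost[j])
        if cost[best] >= cost[k]:
            return k          # no neighbor is better: adaptation stops here
        k = best


# Invented cost per TX FFE setting index (lower is better).
toy_cost = [9, 7, 5, 6, 8, 7, 4, 2, 1, 3]

stuck_at = greedy_adapt(toy_cost, start=0)
global_best = toy_cost.index(min(toy_cost))
print(stuck_at, global_best)
```

Starting from setting 0, the greedy search stops at setting 2 even though setting 8 has the lowest cost; starting from setting 9 it does reach setting 8. The outcome depends on the starting point, which is the behavior a channel estimator is meant to avoid.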



Figure 4.7.2: Equalizer architecture.

J. Lee, A 20 Gb/s Adaptive Equalizer in 0.13 μm CMOS Technology, 4.7, ISSCC 2006.


### Complexity of adding Link Training (LT) to Optical Modules

- An optical link consists of 2-4 segments, where each segment must be trained
- LT on backplane or CR links is point-to-point LT at start-up only
- C2M links are segmented and would require continuous adaptation through slow, unpredictable I2C
- A low power CTLE RX does not have DSP capabilities that can guide the TX FFE to the optimum setting in a few steps


- There is no guarantee that the TX FFE will not get stuck in a local minimum or, even worse, that the link will not drop
- A 4-segment link with 8 LT engines needs to work seamlessly, as shown in the diagram below, just to bring up an optical link
- A module CDR implementing back channel LT would require full Mux/De-mux with AN/PCS logic, ruling out serial CDR implementations and non-CMOS implementations
- An optical module with back channel LT will be significantly more complex to qualify, manage, and diagnose.



# Summary



- The proposed balanced asymmetric implementation using a long TX FFE with back channel dramatically increases link complexity, makes it difficult to guarantee that the link will not go down during continuous training, and may not save power
  - The proposed balanced asymmetric scheme is not only more complex but may actually not be lower power, as it requires full mux/de-mux with AN/PCS implemented in the module PMA
  - Unless the receiver implements a channel estimator, there is a possibility that LT will get stuck in a local minimum or, worse, that the link may fail during operation
  - The balanced asymmetric proposal does not address high crosstalk channels where a DFE may be required
- Analog CTLE with 5T RX FFE offers lower power and lower latency, without requiring full mux/demux and AN/PCS and without complex host-module dependencies, and can support up to 16 dB host channels (see ghiasi\_3ck\_03\_1118).