# Complexity Study for Multi-Gig Automotive PHYs

Michael Leung, John Lin, Peter Wu, Brett McClellan, Wyant Chan

Marvell Semiconductor



### Supporter

Tzahi Madgar – Velans Semiconductor

# Summary of presentation

- A framework for DSP complexity comparison is provided in relative terms.
- PAM8 delivers lower relative complexity and power in DSP implementation than PAM2, PAM3, PAM4 and PAM5 systems.
- For analog design, low-baud-rate PAM8 RX/TX paths are no more complex than the high-baud-rate paths of lower-order PAM schemes.

# Implementation Complexity Study Outline

- Objectives
  - Study the implementation complexity (area/power) of a Multi-GBASE-T1 PHY as a function of PAM level.
  - Focus on 2.5G; results can be generalized to other speeds.
- Assumptions
  - Covers only complexity from implementation aspect
  - PAM2, 3, 4, 5, 8 systems are considered
  - System SNR level is sufficient to support these PAM systems in 2.5G
- Methodology
  - Identify main components in a generic PHY
  - DSP implementation complexity for each PAM scheme is estimated using scaling factors relative to the 1000BASE-T1 counterparts.
  - A detailed study of the Echo Canceller is used to establish the scaling factors.
  - Analog components are discussed in general terms only.

# **BASE-T1 PHY Components**



- Major Digital Components
  - Echo Cancellation (EC) / Decision Feedback Equalizer (DFE)
  - DSP Tx/Rx Path, including Feed Forward Equalizer (FFE)
  - PCS, including Forward Error Correction (FEC)
- Major Analog Components
  - Analog TX/RX path
  - PLL & Clock Distribution
  - Host Interface Serdes (for SGMII or equivalent) or I/O (for RGMII)

# **DSP** Components Complexity

- For a given DSP function implemented with similar DSP algorithms/architecture (e.g. direct baud-rate implementation):
  - Power dissipated is proportional to Area x Clock Frequency.
- Relative Area = Datapath Factor x Technology Factor x System Factor
  - Datapath Factor: bitwidth of the inputs to the DSP block.
    - For large bitwidths, the factor is proportional to the bitwidth.
    - For small bitwidths (e.g. PAM3 or PAM4 inputs), the factor depends on the implementation.
  - Technology Factor: area increases as clock rate increases.
    - The factor depends on the IC silicon technology (hockey-stick curve).
  - System Factor: scales with what is needed to meet the system requirements.
    - Number of echo taps, filter coefficient bitwidth.
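The area and power model above can be expressed as a small helper. This is a minimal sketch, assuming each factor is already known as a dimensionless ratio to a baseline block; the class and method names are hypothetical, and the example numbers are illustrative placeholders, not synthesis results.

```python
from dataclasses import dataclass


@dataclass
class DspBlockFactors:
    """Hypothetical container for the three relative-area factors of one DSP block."""
    datapath: float    # input-bitwidth factor relative to the baseline block
    technology: float  # area penalty for the target clock in a given process
    system: float      # e.g. relative number of taps or coefficient bitwidth

    def relative_area(self) -> float:
        # Relative Area = Datapath Factor x Technology Factor x System Factor
        return self.datapath * self.technology * self.system

    def relative_power(self, relative_clock: float) -> float:
        # Power is proportional to Area x Clock Frequency
        return self.relative_area() * relative_clock


# Illustrative placeholder values only
block = DspBlockFactors(datapath=1.5, technology=1.2, system=2.0)
print(round(block.relative_area(), 2))
print(round(block.relative_power(1.5), 2))
```

Keeping the clock as a separate argument mirrors the deck's point that power carries one extra factor of clock rate beyond area.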

# Complexity for Echo Canceller (EC)



- Datapath Factor: depends on the number of PAM levels.
- Technology Factor: depends on the IC silicon technology and the baud rate.
- System Factor: proportional to the number of taps and the bitwidths of the adaptive filter coefficients.

# Complexity for EC vs PAM type

- Determining these scaling factors is not trivial.
  - Input bitwidths follow from 2, 3, 4, 5 and 8 levels, but the relative complexity depends on implementation details.
  - The area penalty with increasing clock rate is IC-technology dependent.
  - System factor assumptions:
    - The number of taps is proportional to the baud rate (which decreases with the number of PAM levels), so the EC covers far-end reflections of corner-case channels with imperfect termination.
    - Coefficient bitwidths are constant.
- To determine and validate these scaling factors, realistic logic synthesis is carried out in a current state-of-the-art technology (X-nm) and a next-generation silicon technology (Y-nm).

# EC Datapath Synthesis Experiment

- The PAM3 1000BASE-T1 EC is synthesized as the baseline for comparison.
- To understand the different factors affecting EC datapath complexity, the EC datapath is modified in steps, synthesized and compared.
  - The RTL is changed to each PAM scheme (PAM2, 3, 4, 5 and 8) and the EC datapath is synthesized.
  - Further synthesis is done at the appropriate baud frequency to study the clock-rate penalty:
    - PAM2 3.75 x 750 MHz
    - PAM3 2.5 x 750 MHz
    - PAM4 1.875 x 750 MHz
    - PAM5 1.625 x 750 MHz
    - PAM8 1.25 x 750 MHz
  - Final synthesis is done with the number of filter taps extended in proportion to the baud rate, so that the EC covers the same total cable span. This final synthesis closely approximates a true 2.5G EC design.
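The clock and tap scaling used in these synthesis steps is simple arithmetic on the baud-rate multipliers. A minimal sketch follows; the function names and the baseline tap count are hypothetical (the study does not state its tap count), and the tap rule assumes the echo span is fixed in time, so the number of taps grows with symbol rate.

```python
import math

# 1000BASE-T1 PAM3 symbol rate used as the reference clock
BASE_BAUD_MHZ = 750.0

# 2.5G baud-rate multipliers from the list above
BAUD_FACTOR = {"PAM2": 3.75, "PAM3": 2.5, "PAM4": 1.875, "PAM5": 1.625, "PAM8": 1.25}


def synthesis_clock_mhz(pam: str) -> float:
    """Clock needed for a direct baud-rate EC (one sample per symbol)."""
    return BAUD_FACTOR[pam] * BASE_BAUD_MHZ


def scaled_ec_taps(pam: str, baseline_taps: int) -> int:
    """Taps needed to span the same echo delay as the 1000BASE-T1 baseline EC.

    The echo span is fixed in time (cable round trip), so
    taps = span_seconds * baud, i.e. proportional to the baud multiplier.
    """
    return math.ceil(BAUD_FACTOR[pam] * baseline_taps)


BASELINE_TAPS = 100  # hypothetical baseline tap count, not from the study
for pam in BAUD_FACTOR:
    print(pam, synthesis_clock_mhz(pam), "MHz,", scaled_ec_taps(pam, BASELINE_TAPS), "taps")
```

For PAM4 this product gives 1406.25 MHz, slightly above the 1400 MHz listed in the clock-rate synthesis table.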

# EC Datapath PAM Arithmetic



- The 'multiplication' logic implemented for each PAM scheme is labelled PAM2\*, PAM3\*, PAM4\*, PAM5\* and PAM8\*.
- In addition, the PAM arithmetic from Farjadrad\_3ch\_01a\_0518 is implemented:
  - PAM4' scheme from Farjadrad\_3ch\_01a\_0518 (labelled PAM4'), without dynamic offset correction.
  - PAM4' scheme with a separate summer that dynamically sums all adapted weights to preserve true arithmetic accuracy (labelled PAM4' + summer).
  - PAM8' scheme similar to Farjadrad\_3ch\_01a\_0518 (labelled PAM8'), without correction.
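One way to see why a summer of all adapted weights can be needed: for M a power of two, write each PAM-M symbol as x = 2u − (M − 1) with an unsigned level index u in 0..M−1. Then Σ x·w = 2 Σ u·w − (M − 1) Σ w. The first sum needs only shift-and-add of the bits of u, while the second is an offset that changes as the weights adapt. This sketch illustrates that identity under an assumed encoding; it is not necessarily the exact PAM4'/PAM8' logic of Farjadrad\_3ch\_01a\_0518. Function names are hypothetical, and integer weights stand in for fixed-point coefficients.

```python
import random


def ec_direct(symbols, weights):
    """Reference echo estimate: sum of symbol x weight over the tap delay line."""
    return sum(x * w for x, w in zip(symbols, weights))


def ec_unipolar(indices, weights, m):
    """Same estimate from unsigned level indices u_k (symbol = 2*u_k - (m - 1)).

    Each u_k * w_k is built only from shifts and adds of the bits of u_k;
    the signed result is recovered with one offset built from the sum of weights.
    """
    partial = 0
    for u, w in zip(indices, weights):
        bit = 0
        while (u >> bit) != 0:
            if (u >> bit) & 1:
                partial += w << bit  # adds w * 2**bit
            bit += 1
    offset = (m - 1) * sum(weights)  # requires a summer over all adapted weights
    return 2 * partial - offset


random.seed(0)
for m in (2, 4, 8):
    idx = [random.randrange(m) for _ in range(64)]
    wts = [random.randint(-512, 511) for _ in range(64)]
    syms = [2 * u - (m - 1) for u in idx]
    assert ec_unipolar(idx, wts, m) == ec_direct(syms, wts)
print("unipolar decomposition matches direct EC sum")
```

Under this reading, dropping the offset term leaves an error of (M − 1) Σ w that drifts as the coefficients adapt, which is the kind of error a dedicated weight summer would remove.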

# EC Datapath PAM Arithmetic (Cont)

Synthesis area of each PAM EC relative to the PAM3 EC baseline (same 750 MHz clock, same number of taps):

| PAM scheme | X-nm | Y-nm | Average |
|------------|------|------|---------|
| PAM2       | 0.99 | 0.99 | 0.99    |
| PAM3       | 1.00 | 1.00 | 1.00    |
| PAM4       | 1.56 | 1.41 | 1.49    |
| PAM5       | 1.61 | 1.58 | 1.59    |
| PAM8       | 1.81 | 1.69 | 1.75    |

- PAM8\* EC complexity is not significantly higher than PAM5\*/PAM4\* EC complexity.
- PAM4' EC with offset correction applied (PAM4' + summer) has higher complexity than the PAM4\* EC.

| PAM scheme     | X-nm | Y-nm | Average |
|----------------|------|------|---------|
| PAM4'          | 1.37 | 1.16 | 1.27    |
| PAM4*          | 1.56 | 1.41 | 1.49    |
| PAM4' + summer | 1.70 | 1.66 | 1.68    |
| PAM8'          | 1.53 | 1.44 | 1.48    |
| PAM8*          | 1.81 | 1.69 | 1.75    |

# EC Complexity vs Clk Rate (Technology Factor)

- EC datapath area re-synthesized at the corresponding required baud rate:
  - PAM2 3.75 x 750 MHz
  - PAM3 2.5 x 750 MHz
  - PAM4 1.875 x 750 MHz
  - PAM5 1.625 x 750 MHz
  - PAM8 1.25 x 750 MHz
- Area is relative to the same PAM scheme's EC synthesized at **750 MHz**.

| PAM scheme | Freq (MHz) | X-nm          | Y-nm          | Notes                                         |
|------------|------------|---------------|---------------|-----------------------------------------------|
| PAM2       | 2812.5     | Failed Timing | Failed Timing | Frequency too high for sample rate processing |
| PAM3       | 1875       | 2.04          | 1.42          |                                               |
| PAM4       | 1400       | 1.60          | 1.55          |                                               |
| PAM5       | 1218.75    | 1.21          | 1.09          |                                               |
| PAM8       | 937.5      | 1.07          | 1.01          |                                               |

- The higher clock rate required by lower-order PAM ECs increases their area.
  - The increase for the PAM8 EC is small, as its clock rate is only 25% higher (1.25 x 750 MHz).

# EC Taps Factors & Overall Scaling

- Final EC datapaths are modified with the number of filter taps increased in proportion to the baud rate, then resynthesized.
- Final area/power comparison relative to the PAM3 1000BASE-T1 baseline **in the same IC technology** (at 750 MHz):
  - Power = Area x Clk Freq

| PAM scheme | Relative Area (X-nm) | Relative Area (Y-nm) | Relative Power (X-nm) | Relative Power (Y-nm) |
|------------|----------------------|----------------------|-----------------------|-----------------------|
| 1GBT1 PAM3 | 1.00                 | 1.00                 | 1.00                  | 1.00                  |
| 2.5G PAM2  | N/A                  | N/A                  | N/A                   | N/A                   |
| 2.5G PAM3  | 4.80                 | 3.69                 | 12.00                 | 9.23                  |
| 2.5G PAM4  | 5.03                 | 3.91                 | 9.44                  | 7.33                  |
| 2.5G PAM5  | 3.21                 | 2.90                 | 5.22                  | 4.71                  |
| 2.5G PAM8  | 2.40                 | 2.15                 | 3.00                  | 2.69                  |

- The PAM8 EC has significantly lower area and much lower power than the EC of any other PAM scheme.
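The Relative Power columns follow from the Relative Area columns by the stated rule Power = Area x Clk Freq, with the clock scaled by each scheme's baud-rate multiplier. The short check below recomputes them from the tabulated areas. Because those areas are rounded to two decimals, a recomputed power can differ in the last digit (PAM4 X-nm gives 9.43 versus 9.44 in the table).

```python
# 2.5G baud-rate multipliers relative to the 750 MHz baseline
BAUD_FACTOR = {"PAM3": 2.5, "PAM4": 1.875, "PAM5": 1.625, "PAM8": 1.25}

# Relative EC area from the table above (PAM2 omitted: failed timing)
REL_AREA = {
    "X-nm": {"PAM3": 4.80, "PAM4": 5.03, "PAM5": 3.21, "PAM8": 2.40},
    "Y-nm": {"PAM3": 3.69, "PAM4": 3.91, "PAM5": 2.90, "PAM8": 2.15},
}


def relative_power(tech: str, pam: str) -> float:
    # Power = Area x Clk Freq, both relative to the 1000BASE-T1 PAM3 EC
    return REL_AREA[tech][pam] * BAUD_FACTOR[pam]


for tech, areas in REL_AREA.items():
    powers = {pam: round(relative_power(tech, pam), 2) for pam in areas}
    best = min(powers, key=powers.get)
    print(tech, powers, "lowest:", best)
```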

# Complexity for Other DSP Blocks



- DFE design complexity scaling should be similar to that of the EC:
  - The DFE filter input is a PAM symbol, as for the EC.
  - The number of DFE taps is approximately proportional to baud rate.
- Digital RX/TX Path
  - Datapath width should be greater than the system ENOB; the assumption used is ENOB + 2 bits.
  - Technology factor is similar to the EC factor, assuming the same clock rate as the EC.
  - System factor is assumed constant:
    - FFE/Tx\_filter taps and coefficient bitwidths are assumed constant.
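Under these assumptions, the RX/TX path factor is the technology factor times the datapath width (ENOB + 2), normalized to the 1G PAM3 width, and its power factor multiplies in the baud rate. The sketch below applies that rule to the technology factors and ENOB values listed in the Y-nm Complexity Summary table; the ENOB values are the deck's stated assumptions, and the dictionary keys and `GUARD_BITS` name are labels chosen here.

```python
# Y-nm summary-table parameters (ENOB values are assumptions made for analysis)
BAUD_FACTOR = {"1G-PAM3": 1.0, "PAM8": 1.25, "PAM5": 1.625, "PAM4": 1.875, "PAM3": 2.5, "PAM2": 3.75}
TECH_FACTOR = {"1G-PAM3": 1.0, "PAM8": 1.01, "PAM5": 1.13, "PAM4": 1.25, "PAM3": 1.55, "PAM2": 2.1}
ENOB = {"1G-PAM3": 6.5, "PAM8": 8.0, "PAM5": 7.25, "PAM4": 7.0, "PAM3": 6.5, "PAM2": 6.0}

GUARD_BITS = 2  # datapath width = ENOB + 2, as assumed above


def rxtx_area_factor(scheme: str) -> float:
    width = ENOB[scheme] + GUARD_BITS
    base_width = ENOB["1G-PAM3"] + GUARD_BITS
    # Tech x RX/TX datapath, normalized to the 1G PAM3 datapath width
    return TECH_FACTOR[scheme] * width / base_width


def rxtx_power_factor(scheme: str) -> float:
    # Same logic clocked at the scheme's baud rate
    return rxtx_area_factor(scheme) * BAUD_FACTOR[scheme]


for s in BAUD_FACTOR:
    print(f"{s}: area {rxtx_area_factor(s):.2f}, power {rxtx_power_factor(s):.2f}")
```

The higher ENOB of PAM8 widens its datapath, but its near-unity technology factor and low baud rate more than offset that.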

## **Complexity for Non-DSP Blocks**



- PCS: dominated by the FEC decoder block.
- FEC (Reed-Solomon decoder)
  - The main system requirement on the RS code is to cover EM bursts.
    - EM burst duration in a 2.5G shielded system is reduced compared to 1000BASE-T1.
    - The RS code is assumed unchanged for 2.5G.
  - RS complexity at 2.5G is thus independent of the PAM scheme.
  - Complexity scales with data rate (2.5x).
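The reasoning behind keeping the RS code unchanged can be written out. For a Reed-Solomon code with m-bit symbols that corrects t symbol errors, any single burst of up to (t − 1)m + 1 bits is guaranteed correctable. That length is fixed by the code, so the burst duration it covers shrinks as the line rate rises, while the decoder throughput (and hence its complexity) grows with it. Here t, m and the coded bit rate R are generic symbols, not values quoted from this study.

$$
T_{\text{burst}} = \frac{(t-1)\,m + 1}{R}, \qquad
\frac{T_{\text{burst}}^{(2.5\text{G})}}{T_{\text{burst}}^{(1\text{G})}} = \frac{R_{1\text{G}}}{R_{2.5\text{G}}} = \frac{1}{2.5} = 0.4
$$

The unchanged code therefore covers bursts only 40% as long in time, which is acceptable because the EM burst duration in a 2.5G shielded system is shorter.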

# **Complexity Summary Digital DSP Blocks**

| Y-nm Tech                                      | Parameters            | 1G - PAM3 | 2.5G - PAM8 | 2.5G - PAM5 | 2.5G - PAM4 | 2.5G - PAM3 | 2.5G - PAM2 |
|------------------------------------------------|-----------------------|-----------|-------------|-------------|-------------|-------------|-------------|
|                                                | Data rate             | 1         | 2.5         | 2.5         | 2.5         | 2.5         | 2.5         |
|                                                | Baud rate             | 1         | 1.25        | 1.625       | 1.875       | 2.5         | 3.75        |
|                                                | Technology factor**   | 1         | 1.01        | 1.13        | 1.25        | 1.55        | 2.1         |
|                                                | EC datapath factor    | 1         | 1.75        | 1.56        | 1.49        | 1           | 0.99        |
|                                                | ENOB**                | 6.5       | 8           | 7.25        | 7           | 6.5         | 6           |
|                                                | RX/TX Datapath factor | 8.5       | 10          | 9.25        | 9           | 8.5         | 8           |
| Factor equation                                |                       |           |             |             |             |             |             |
|                                                | Digital Area          | factor    | factor      | factor      | factor      | factor      | factor      |
| Baud (taps) x Tech x EC datapath               | Echo / DFE            | 1.00      | 2.21        | 2.86        | 3.49        | 3.88        | 7.80        |
| Tech x RX/TX datapath (normalize)              | RX/TX Path            | 1.00      | 1.19        | 1.23        | 1.32        | 1.55        | 1.98        |
| Data rate                                      | PCS (RS)              | 1.00      | 2.50        | 2.50        | 2.50        | 2.50        | 2.50        |
|                                                | Digital Power         | factor    | factor      | factor      | factor      | factor      | factor      |
| Baud (taps) x Tech x EC datapath x Baud (clk)  | Echo / DFE            | 1.00      | 2.76        | 4.65        | 6.55        | 9.69        | 29.24       |
| Tech x RX/TX datapath (normalize) x Baud (clk) | RX/TX Path            | 1.00      | 1.49        | 2.00        | 2.48        | 3.88        | 7.41        |
| Data rate                                      | PCS (RS)              | 1.00      | 2.50        | 2.50        | 2.50        | 2.50        | 2.50        |

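- \*\* Technology factor: curve-fitted from the synthesis results in Y-nm.
- \*\* ENOB: the assumed values are for analysis purposes only.
- The PAM8 scheme has the lowest relative area and power factors of all 2.5G schemes on the Echo/DFE and RX/TX blocks; the PCS (RS) factor is the same for every 2.5G scheme.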
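The factor equations in the first column can be re-evaluated directly from the parameter rows. The sketch below does this for the Echo/DFE and PCS rows using the tabulated Y-nm parameters; baud rate enters the Echo/DFE power twice, once for the tap count and once for the clock. It reproduces the table to within rounding of the last digit. The dictionary keys are labels chosen here for convenience.

```python
# Y-nm summary-table parameters
BAUD_FACTOR = {"1G-PAM3": 1.0, "PAM8": 1.25, "PAM5": 1.625, "PAM4": 1.875, "PAM3": 2.5, "PAM2": 3.75}
TECH_FACTOR = {"1G-PAM3": 1.0, "PAM8": 1.01, "PAM5": 1.13, "PAM4": 1.25, "PAM3": 1.55, "PAM2": 2.1}
EC_DATAPATH = {"1G-PAM3": 1.0, "PAM8": 1.75, "PAM5": 1.56, "PAM4": 1.49, "PAM3": 1.0, "PAM2": 0.99}
DATA_RATE = {"1G-PAM3": 1.0, "PAM8": 2.5, "PAM5": 2.5, "PAM4": 2.5, "PAM3": 2.5, "PAM2": 2.5}


def echo_dfe_area(s: str) -> float:
    # Baud (taps) x Tech x EC datapath
    return BAUD_FACTOR[s] * TECH_FACTOR[s] * EC_DATAPATH[s]


def echo_dfe_power(s: str) -> float:
    # ... x Baud (clk)
    return echo_dfe_area(s) * BAUD_FACTOR[s]


def pcs_factor(s: str) -> float:
    # RS decoder area and power both scale with data rate
    return DATA_RATE[s]


for s in BAUD_FACTOR:
    print(f"{s}: Echo/DFE area {echo_dfe_area(s):.2f}, "
          f"power {echo_dfe_power(s):.2f}, PCS {pcs_factor(s):.2f}")
```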

# Generalization to >2.5G DSP Blocks


- At 2.5G in the current technology (X-nm), the scaling factors for lower-order PAM schemes are even higher, because the penalty from their higher clock rate is larger.
- For PHYs above 2.5G, the DSP complexity advantage of higher-order PAM schemes will grow further: the higher baud rate of lower-order schemes pushes them further up the hockey-stick technology-factor curve.
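A short calculation makes the second point concrete. If the technology factor is held at 1, the EC power of PAM3 relative to PAM8 is (2.5/1.25)² x 1.00/1.75 ≈ 2.29 at any data rate, because both schemes' baud rates scale by the same amount. Any widening of the gap above 2.5G therefore has to come from the technology factor, and PAM3, clocked at twice PAM8's rate, climbs the hockey-stick curve first. This sketch assumes the baud rate scales linearly with data rate for a fixed scheme; the rate points are illustrative and the function names are hypothetical.

```python
# 2.5G baud multipliers (x 750 MHz) and EC datapath factors from the Y-nm summary table
BAUD_AT_2G5 = {"PAM3": 2.5, "PAM8": 1.25}
EC_DATAPATH = {"PAM3": 1.00, "PAM8": 1.75}
BASE_BAUD_MHZ = 750.0


def baud_multiplier(pam: str, rate_multiple: float) -> float:
    # Assumption: for a fixed PAM scheme and line code, baud scales linearly with data rate
    return BAUD_AT_2G5[pam] * rate_multiple / 2.5


def ec_power_without_tech(pam: str, rate_multiple: float) -> float:
    # taps ~ baud and clock ~ baud; the technology factor is deliberately set to 1
    b = baud_multiplier(pam, rate_multiple)
    return b * b * EC_DATAPATH[pam]


for rate in (2.5, 5.0, 10.0):  # data rate as a multiple of 1 Gb/s
    ratio = ec_power_without_tech("PAM3", rate) / ec_power_without_tech("PAM8", rate)
    print(f"{rate}G: PAM3 {baud_multiplier('PAM3', rate) * BASE_BAUD_MHZ:.0f} MHz, "
          f"PAM8 {baud_multiplier('PAM8', rate) * BASE_BAUD_MHZ:.0f} MHz, "
          f"EC power ratio (tech = 1) {ratio:.2f}")
```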

# Complexity for Analog TX/RX Blocks



- RX/TX Analog Path
  - Higher Sample Rate -> Higher Signal Bandwidth -> Power Increase
  - Low PAM system -> Lower ENOB Requirement -> Power Decrease
  - Thus, the overall power tradeoff between low PAM/high baud and high PAM/low baud is relatively flat.



- PLL / Clock Distribution
  - Clock logic and distribution power increases with baud rate.
  - Low PAM system -> Relaxed Jitter Requirement -> Lower Power
  - Thus, the overall power tradeoff between low PAM/high baud and high PAM/low baud is also relatively flat.
- Serdes/IO
  - Power proportional to Data Rate, independent of PAM scheme

# Conclusion

- A framework for DSP complexity comparison is provided in relative terms.
- PAM8 delivers a lower DSP implementation complexity and power compared to PAM2, PAM3, PAM4 and PAM5 systems.
  - Demonstrated in both current and next-gen technology with realistic synthesized designs in 2.5G
- Across all major DSP blocks in a PHY, the PAM8 scheme has either significantly lower or similar complexity, because:
  - DSP power is proportional to baud rate.
  - In addition, the number of echo taps is proportional to baud rate.
  - PAM8 arithmetic complexity is comparable to that of other PAM schemes.
- For analog design, low-baud-rate PAM8 RX/TX paths are no more complex than the high-baud-rate paths of lower-order PAM schemes.
- Generalizing beyond 2.5G, the DSP complexity of a higher PAM system (lower Baud) is lower, as long as system SNR is sufficient.

# Thank You