
RE: Loop Bandwidth for 64B66B




Rick,

Thanks for making the effort...I am almost with you....

Starting from HARI, passing through the quad SerDes, deskew, 8B10B decode,
64B66B encode, and up to the entrance of the gearbox, I have understood your
proposal.

But you lost me on the Clock Multiplier Unit. Can you explain your version
of the CMU, including what are the clock inputs to the gearbox and to the
serializer?

This is how I understand it. The gearbox will need a 644.53 MHz clock input
(in addition to the 312.5 MHz reference clock input), so the CMU has to
achieve a 33/16 multiplier. The phase comparator inputs will then be
312.5/16 MHz and 644.53/33 MHz, with the PLL locked to 644.53 MHz. In that
case, wouldn't the "1/20 loop bandwidth" rule of thumb lead to a loop
bandwidth of 312.5/16/20 = 0.98 MHz, because in effect our reference
frequency is now 312.5/16 MHz?
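
To make the arithmetic explicit, here is a minimal Python sketch of the
divided-reference case. It assumes the 1/20 rule of thumb is applied to the
phase-comparator frequency rather than to the undivided reference; the
function name and its parameters are hypothetical, chosen only for
illustration.

```python
from fractions import Fraction

def divided_reference_pll(f_ref_mhz, ref_div, fb_div, bw_ratio=20):
    """Return (f_vco, f_compare, loop_bw) in MHz for a PLL that compares
    f_ref/ref_div against f_vco/fb_div.

    Assumption: loop bandwidth = comparison frequency / bw_ratio.
    """
    f_ref = Fraction(f_ref_mhz)
    f_compare = f_ref / ref_div      # frequency seen by the phase comparator
    f_vco = f_compare * fb_div       # locked VCO output frequency
    loop_bw = f_compare / bw_ratio   # rule-of-thumb loop bandwidth
    return f_vco, f_compare, loop_bw

# 312.5 MHz reference, divide by 16, feedback divide by 33
f_vco, f_cmp, bw = divided_reference_pll(Fraction(625, 2), ref_div=16, fb_div=33)
print(float(f_vco), float(f_cmp), float(bw))
```

Exact fractions are used so that 644.53125 MHz is not rounded on the way
through the calculation.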

If we instead build the CMU so that the PLL is locked to 33 times the
312.5 MHz reference frequency, we are asking the VCO to produce 10.3 GHz,
which we can then divide by 16 and feed to the gearbox. Isn't this pushing
CMOS too far? If I had the ability to integrate the SerDes into this
solution, this 10.3 GHz VCO would be desirable because the serializer will
need 10.3 GHz anyway. Unfortunately, early implementations will require that
we separate the serializer from the rest of the functions.
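
For comparison, a short sketch of this direct-multiply alternative. It
assumes the VCO runs at 33x the undivided reference and that a divide-by-16
produces the gearbox clock; the function name is hypothetical and the 1/20
bandwidth figure is only the rule of thumb discussed in this thread.

```python
from fractions import Fraction

F_REF_MHZ = Fraction(625, 2)  # 312.5 MHz reference

def direct_multiply(f_ref, mult, post_div):
    """Return (f_vco, f_divided) for a PLL locked to mult * f_ref,
    followed by a post-divider."""
    f_vco = f_ref * mult
    return f_vco, f_vco / post_div

f_vco, f_gearbox = direct_multiply(F_REF_MHZ, mult=33, post_div=16)
rule_of_thumb_bw = F_REF_MHZ / 20  # comparison frequency is the full reference

print(float(f_vco), float(f_gearbox), float(rule_of_thumb_bw))
```

Both architectures deliver the same gearbox clock; what differs is the VCO
frequency and the frequency at which the phase comparator operates.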

Regards,
Vipul

=======

> -----Original Message-----
> From: owner-stds-802-3-hssg-64b66b@xxxxxxxx
> [mailto:owner-stds-802-3-hssg-64b66b@xxxxxxxx]On Behalf Of Rick Walker
> Sent: Wednesday, December 29, 1999 2:36 PM
> To: stds-802-3-hssg-64B66B@xxxxxxxx
> Subject: Re: Loop Bandwidth for 64B66B
>
>
>
>
> Hi Vipul,
>
> > It seems to me that the use of 64B66B code may adversely affect the
> > design of a Clock Multiplier Unit in a Serial PMD, though I have not
> > quantified it to decide the magnitude of the problem.  This is not a
> > flaw in the 64B66B code, but rather an unfortunate consequence of the
> > way frequency multiplication works out.  I think this problem can also
> > occur if we use other low-overhead codes.
>
> Yes.  This is a tricky bit.  As you point out, I also think it is
> inherent in any low overhead code.  If we only add a small overhead,
> then the clock multiplication ratio is necessarily a rational ratio
> close to 1, and involving big numbers.
>
> > To simplify, let's take a specific example.  Suppose a Serial PMD's
> > transmit path is designed such that data received at the HARI
> > interface is decoded to 8B, then again encoded using 64B66B for
> > optical transmission.  Width of SerDes data path is 8. So the output
> > of the 64B66B encoder must find a way to ship out data in chunks of 8
> > bits.  This requires a clock frequency conversion from f_in to f_out =
> > f_in * 66/8 = f_in * 33/4.
>
> I agree.
>
> > A typical Clock Multiplier Unit implementation will have a phase
> > detector, comparing the phases of f_in/4 and f_out/33, where f_out is
> > the output of a VCO.  In our example, f_in will be 156.25 MHz, and
> > f_out will be 1.289 GHz.  The input to 8:1 SerDes will be this 1.289
> > GHz clock, and 8-wide data.
>
> I would do it a bit differently.
>
> The HARI 1:10 demux generates 10 bit words at a rate of 312.5 MW/s.
> This is the same byte rate after 10:8 decoding.
>
> The ratio of the recovered HARI word rate and the 66/64 serial rate is
> then 33:1.
>
> I would run the final mux as a 16:1 mux, but preface it with a digital
> "gearbox" that does a 66:33:16 conversion.  This operation takes two
> blocks of 16 bits from the first 33 bit word, and saves the extra bit
> for the next cycle.  It then concatenates the saved bit with 15 bits of
> the second input word to produce a third output word.  Another 16 bits
> is then stripped off, leaving 2 bits to be saved for the next input
> word...  and so on.  This approach trades off circuit complexity for
> low latency.
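
The bit shuffling described above can be sketched in software as follows.
This is only an illustrative model, not the proposed hardware: it assumes
MSB-first bit ordering, models the saved bits as an integer accumulator, and
the function name is hypothetical.

```python
def gearbox_33_to_16(words33):
    """Yield 16-bit words from an iterable of 33-bit words (MSB first).

    Leftover bits (1, then 2, then 3, ...) are carried into the next
    input word, as in the description above.
    """
    acc = 0    # leftover bits carried between input words
    nbits = 0  # number of valid bits currently held in acc
    for w in words33:
        if not 0 <= w < (1 << 33):
            raise ValueError("input word does not fit in 33 bits")
        acc = (acc << 33) | w
        nbits += 33
        while nbits >= 16:
            nbits -= 16
            yield (acc >> nbits) & 0xFFFF  # strip off the top 16 bits
            acc &= (1 << nbits) - 1        # keep only the leftover bits

# 16 input words of 33 bits carry 528 bits, i.e. exactly 33 output words.
words_in = [(i * 0x123456789 + 1) & ((1 << 33) - 1) for i in range(16)]
words_out = list(gearbox_33_to_16(words_in))
print(len(words_in), len(words_out))
```

After 16 input words the accumulated leftover reaches 16 bits and is emitted
as an extra output word, so the pattern repeats every 528 bits.
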
>
> Perhaps a simpler implementation uses a dual port memory of size = 66
> bytes.
>
> Since 66*8 is divisible by both 16 and 33, then the transfers become
> synchronous.  The phase-locked input source writes 8 encoded 66-bit
> blocks at 312.5 MHz, and the TX data path reads them out as 33 16-bit
> words at a rate of 644.53125 MHz.
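
A quick check of why a 66-byte buffer makes the two sides synchronous. The
sketch assumes the write side moves 33 bits (half of a 66-bit block) per
312.5 MHz cycle, which is one reading of the 66:33:16 description; the
constant names are hypothetical.

```python
from fractions import Fraction
from math import gcd

BUFFER_BITS = 66 * 8                  # 528 bits = eight 66-bit blocks
WRITE_WIDTH, READ_WIDTH = 33, 16      # bits per write / per read (assumed)
F_WRITE_MHZ = Fraction(625, 2)        # 312.5 MHz
F_READ_MHZ = F_WRITE_MHZ * 33 / 16    # 644.53125 MHz

# The buffer size is the least common multiple of the two widths.
lcm_bits = WRITE_WIDTH * READ_WIDTH // gcd(WRITE_WIDTH, READ_WIDTH)

writes_per_frame = BUFFER_BITS // WRITE_WIDTH  # accesses to fill the buffer
reads_per_frame = BUFFER_BITS // READ_WIDTH    # accesses to drain the buffer
fill_time_us = writes_per_frame / F_WRITE_MHZ
drain_time_us = reads_per_frame / F_READ_MHZ

print(lcm_bits, writes_per_frame, reads_per_frame,
      float(fill_time_us), float(drain_time_us))
```

Under that assumption the fill and drain times are identical, so the read
and write pointers keep a fixed relationship once the PLL is locked.
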
>
> > It is the integer 33 that is at the heart of this problem.  One can
> > argue that this makes for a "stiff" PLL - the VCO, whose job is to put
> > out f_out, is being refreshed at a rate of f_out/33 - resulting in an
> > unusually low loop bandwidth.
>
> Let's figure this out.  The reference frequency is 312.5 MHz, and we
> are locking our PLL to 33x the reference frequency.  The general rule
> of thumb is that the PLL loop BW can be about 1/20th of the reference
> frequency.  In this case, the loop BW can be 312.5/20 = 15.625 MHz.
>
> My experience says that a loop BW of 2 MHz is adequate to completely
> dominate the 1/f noise of a bipolar ring oscillator.  Other designers
> could conceivably use higher-Q LC oscillators with even less BW
> requirements.
>
> So, I disagree with your feeling that the loop BW is "unusually low".
>
> From my experience, we have about 8x more BW than is strictly required.
>
> > This may be generally regarded as a Bad
> > Thing because:
> >
> > 1. It increases the probability of VCO drift, making it difficult to
> > meet the frequency tolerance specification.
>
> A 2 MHz BW is wide enough to tame the 1/f noise of the worst VCO that
> anyone is likely to use.  We can have a 15 MHz BW if we use a classical
> linear loop, and could effectively have 150 MHz small signal BW if we
> use a bang-bang loop.
>
> I see no problem here.
>
> > 2. Alternatively, it forces the use of a large capacitor in the low
> > pass filter that will reside at the output of the phase detector.
> > Large capacitors have to be external.  That increases noise
> > susceptibility.
>
> This is likely to be true for a pure bipolar implementation due to the
> low Rout of bipolar devices.  In a CMOS or SiGe/CMOS implementation, the
> capacitor can be put on-chip due to the high charge-pump impedance.
>
> > 3. It increases low-frequency jitter.  Phase noise in VCO output at
> > all frequencies above the loop bandwidth will reach SerDes.
>
> I don't think so.  The low frequency jitter will very nicely track
> to the incoming reference within the 4-5 MHz loop BW.
>
> > 4. It increases lock acquisition time of a PLL.
>
> I would imagine using some kind of a frequency aided loop to address the
> startup issue.  In any case, I don't think even up to a millisecond
> power-on delay is significant for this application.
>
> > My question: is this a big problem, or is this a small issue,
> > routinely handled using careful design practices?  If a
> > non-proprietary solution is known, what is it?
>
> Let me know if my reasoning makes sense to you.  I'll be happy to try to
> hash it out in more detail if anything seems to be unclear.
>
> Short term, I agree that it is a bit of a hassle that no commercial
> parts exist to implement this circuit directly.  However, I believe
> that prototypes can be relatively easily made with FPGA devices using
> off-the-shelf 16:1 serializers.  The FPGA does all the tricky bits,
> barrel shifting at a relatively low clock rate.  The 16:1 clock gen is
> phase locked with a 33:1 divider to the recovered HARI word clock.
>
> This is what we are looking into for a demonstration vehicle.
>
> Long term, I imagine a single BiCMOS chip doing everything: HARI RX/TX
> + coding/decoding + 10.3Gb/s RX/TX.
>
> Best regards,
> --
> Rick Walker
>