Re: Hari Debate, Response to Richard Taborek's Note to Dan Dove, dated 12/16/99 08:50:38 PM
Al and fellow Stripers,
First of all, you DO realize that my response is going to be long-winded. So
I'll warn you about it right up front :-).
Secondly, I'd like to set the record straight with my personal and commercial
intentions with respect to this issue. My new company will be developing 10 Gbps
Physical layer components. As an architect, I don't care which way the Striping
issue is decided. I will direct my engineering staff to develop solutions
commensurate with the committee decision. I strongly support a direction to make
a quick decision and to use protocol and PMD independence as a cornerstone in
guiding that decision. The three industry efforts affected by this decision are
10 GbE, 10 GFC and InfiniBand. I firmly believe that the industry will suffer as
a whole if multiple Striping methodologies are employed.
P.S. I'm renaming Byte-Striping to Column-Striping since it's more befitting the
actual striping methodology.
That being said, put on your helmets :-)
Rich
widmer@xxxxxxxxxx wrote:
> 
> To Richard Taborek and Hari Interest Group
> 
> Richard,
> 
> I feel uncomfortable that I have to express again disagreement with some of
> your statements. I had hoped that we could together present the significant
> issues with their pros and cons in a factual and fair way so not every
> voting member would have to study all the details.
I and many other IEEE 802.3, FC and InfiniBand supporters have tried to
assemble a 10 Gbps data transport architecture (Hari) primarily intended for
intra-cabinet use by multiple protocols including 10 Gigabit Ethernet, 10 Gigabit
Fibre Channel and InfiniBand. I believe that I have done my part to present the
issues and sub-issues in a factual manner. Your disagreement with my statements
is your prerogative. The purpose of this reflector is to present and then discuss
the issues related to the development of IEEE P802.3ae. I have no problem with
getting into the nitty gritty details. Upgrading to bigger hard disks is good
for the economy.  
 
> In the above referenced note you make reference to a presentation by Mark
> Ritter at
> http://grouper.ieee.org/groups/802/3/10G_study/public/nov99/ritter_1_1199.pdf
> pages 15 and 16, and then you proceed to some conclusions adverse to word
> striping. There are elementary flaws in your argumentation: The Figure 3 of
> your note does not match the referenced proposal.
On this point, I have reviewed my comments and believe that they match the
reference proposal. The point I made in Figure 3 is that the Word-Striping
proposal for 10 GbE convolutes the translation of information from the MAC
through the PCS, whereas Column-Striping preserves the coding quite well.
> Regardless of the
> specifics, Mark Ritter presented just an example to illustrate how a
> maximum number of commas can be provided within the Ethernet and FC
> constraints for people who see this as a necessity, which in fact it is
> not. The example also shows that word-striping does not require any ordered
> set specifically dedicated to synchronization. The comma can be part of
> many delimiters such as SOF, EOF, etc. as is common practice in the Fibre
> Channel format, so synchronization can be acquired using any ordered set,
> i.e. with word striping you get two functions wrapped into a single word.
My counter argument is that 8B/10B synchronization for Hari requires that
code-groups containing commas, not necessarily ordered-sets containing
commas, be present in the code-group stream. The current
Word-Striping proposal DICTATES the following synchronization rules for 10
Gigabit Ethernet according to the .../ritter_1_1199.pdf proposal:
a) code-groups containing commas shall only be present in multi-code-group
ordered-sets;
b) ordered-sets shall be comprised of four code-groups (word);
c) code-groups containing commas shall be located in only the last code-group
position of a word;
d) code-groups containing commas may occur within the data portion of an
Ethernet Packet (in the preamble).
8B/10B synchronization rules for Column-Striping are much simpler and
require only that a column of code-groups containing commas be transmitted every
other column (i.e. KRKR, where K is a code-group containing a comma).
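To make the KRKR rule concrete, here is a minimal sketch in Python. It is
illustrative only: 'K' stands for a code-group containing a comma, 'R' for the
other code-group of the KR pattern, and the names LANES, idle_columns and
commas_per_lane are hypothetical, not part of any Hari proposal.

```python
LANES = 4  # assumed four-lane Hari configuration


def idle_columns(n_columns):
    """Return n_columns Idle columns, alternating K and R columns (KRKR...)."""
    return [["K" if i % 2 == 0 else "R"] * LANES for i in range(n_columns)]


def commas_per_lane(columns):
    """Count the comma-bearing code-groups seen on each lane."""
    return [sum(1 for col in columns if col[lane] == "K") for lane in range(LANES)]


cols = idle_columns(3)
print(cols)                   # three columns: K, R, K
print(commas_per_lane(cols))  # every lane sees the same number of commas
```

Because a K column places a comma in every lane at once, each lane receives
the same comma count regardless of how many Idle columns are sent.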
It is traditional and deemed appropriate to not transmit any 8B/10B special
code-groups during the transmission of an Ethernet Packet. Therefore, the only
slot available for special code-group transmission is during the Inter Packet
Gap (IPG). Note that with a proposed 12-byte IPG for 10 GbE, Word-Striping rules
dictate that both the Start AND End of Packet delimiters be encoded as
4-code-group ordered sets in order to guarantee that at least one code-group
containing a comma is present in each lane during the IPG to satisfy link
synchronization requirements.  This causes the generation of convoluted
delimiters and the presence of special code-groups within Ethernet Packets. This
is clearly illustrated in .../ritter_1_1199.pdf on page 15. 
You assert that: "with word striping you get two functions wrapped into a single
word." I assume that the two functions you allude to are synchronization and
delimiter support. Note that commas for synchronization must be present in each
lane during the IPG and that 10 GbE proposed delimiters need only be one
code-group in length. Therefore, according to the current proposals it
takes fewer total code-groups (8 maximum) to terminate an Ethernet Packet and
provide synchronization for Column-Striping, whereas it takes 4-5 words
(16-20 code-groups) to do the same for Word-Striping.
In the Striping Evaluation Criteria I presented in my 'Hari Byte vs. Word
Striping' note of 11/26/99, I did not count Link Synchronization as one of the
criteria. I now believe that it is appropriate to do so and propose that the
advantage be assigned to Column-Striping.
 
> You find fault with "for Word-Striping the Ethernet Start-of-Packet does
> not occur in lane 0". I am mystified why anybody should care, as long as
> the words are delivered in the proper format and sequence at the
> destination, which they do. Please explain the reason for your objection.
> Words are rotated from lane to lane without regard to content and I see no
> disadvantage from this procedure. The rotation of SOF from lane to lane has
> the beneficial effect of rotating commas from lane to lane.
Quite simply, Column-Striping facilitates the support of this simple rule,
whereas meeting this rule is difficult, if not impossible for Word-Striping.
Bottom line is that ULP requirements of this type are simple to meet with
Column-Striping. For Ethernet the SOP is proposed to double as a Lane ID,
with SOP identifying Lane 0. I won't stoop so low as to claim this quality as an
advantage for Column-Striping. However, I can't possibly count it as a
disadvantage :-)
As far as rotating commas from lane to lane goes, I believe I've addressed this
issue above. In review, this is an advantage based on a disadvantage for
Word-Striping. 
> You also refer to "the relative complexity of determining the last byte of
> a packet" (with Word-Striping). We do not propose any changes to the 10 Gb
> Ethernet EOF format, certainly not at the MAC interface. So there is no
> difference here.
It is improper to argue that: "We do not propose any changes to the 10 Gb
Ethernet EOF format, certainly not at the MAC interface." Striping
granularity is a PCS attribute and should not be reflected across OSI layers.
The MAC is part of the OSI Data Link layer, whereas the PCS is a sublayer of the
Physical layer. More specifically, an "EOF" entity is not defined for 10 Gigabit
Ethernet. For 1 GbE, an End-of-Frame Delimiter (EFD) is defined at the
Reconciliation sublayer and is defined as the de-assertion of the TX_EN and
RX_DV signals. Since the Reconciliation Layer is the direct interface to the
MAC, it is inappropriate to associate an 8B/10B code group and/or "EOF format"
with the MAC interface. 
Given that you ARE proposing a specific 10 Gigabit Ethernet End-of-Packet
format, I maintain that you are significantly increasing "the relative
complexity of determining the last byte of a packet". I have previously
addressed this issue
in my 'Hari Byte vs. Word Striping' note of 11/26/99 under evaluation criteria
#10, Preservation of Ordered-Sets. I'll reintroduce this issue and discussion
here:
"10) Preservation of Ordered-Sets: Some protocols heavily utilize
ordered-sets. Most notable is Fibre Channel. As a result, it is
advantageous to preserve those ordered-set definitions to enable the
smoothest migration to 10 Gbps products. However, this requirement
conflicts with a desire to specify protocol independent PMDs, especially
since significant differences exist between protocols at 1 Gbps and
migration to 10 Gbps with the fewest protocol changes is desirable for
each.
Column-Striping and other objectives specified in the proposed Hari Coding
Objectives presentation, Rich Taborek:
http://grouper.ieee.org/groups/802/3/10G_study/public/nov99/taborek_1_1199.pdf
suggest minor changes to 1 GbE coding to simultaneously arrive at a 10
GbE and protocol independent PMD. InfiniBand coding objectives are in
line with the latter proposal.
The proposed Word Striping on Multiple Serial Lanes, Mark Ritter, IBM:
http://grouper.ieee.org/groups/802/3/10G_study/public/nov99/ritter_1_1199.pdf
suggests a significant departure from 1 GbE coding to 10 GbE. In
addition, this proposal drastically violates Fibre Channel ordered-set
rules in its 10 GbE proposed encodings by not observing the
/K28.5/Dxx.y/Dxx.y/Dxx.y/ format. Proposed 10 GbE mappings include all
of the following formats. Some of the mappings have apparent
inconsistencies. All violate FC ordered-set rules:
/Dxx.y/Dxx.y/Dxx.y/K28.5/ - Idle Comma position reversed
/K27.7/Dxx.y/Dxx.y/K28.5/ - Start of Packet, Comma during Ethernet Preamble
/K29.7/K28.0/K28.0/K28.5/ - End of Packet in byte 0
/Dxx.y/K29.7/K28.0/K28.5/ - End of Packet in byte 1
/Dxx.y/Dxx.y/K29.7/K28.5/ - End of Packet in byte 2
/Dxx.y/Dxx.y/Dxx.y/K29.7/ - End of Packet in byte 3"
Note that for Word-Striping case 'End of Packet in byte 3', the single EOP
/K29.7/ delimiter is both preceded by data code-groups and followed by the
same, since the Idle Word which likely follows in the same lane is encoded as
follows: /Dxx.y/Dxx.y/Dxx.y/K28.5/. Therefore, a single bit error can be
recognized as an erroneous EOP. This scenario is in clear violation of Gigabit
Ethernet error robustness rules. The general rule applied there was: "No pattern
of three or fewer errors can cause an undetected packet error." This is the
basis I used for my assertion concerning "the relative complexity of
determining the last byte of a packet" with Word-Striping.
> I think you have misread a chart of the Ritter presentation on page 15. At
> the top of the page is a representation of words as they appear at the Hari
> parallel input or output, arranged vertically in byte order 0 through 3, as
> marked on the left of the diagram. The earliest word is on the left side.
> The diagram on the bottom shows the same words in the same configuration as
> they might be modified inside the Hari to generate a maximum number of
> commas as deemed necessary by some folks (the 4 horizontal rows represent
> bytes as indicated on the left, not lanes). For me, a high number of commas
> is just desirable, not necessary. I have more to say on this below. The
> serialized words on 4 lanes are shown on page 16 (in non-staggered format,
> for simplicity). We do not show the order of bytes transmitted on the lanes
> as your Figure 3 infers, it is irrelevant anyway for the circuits outside
> the SerDes and framing domain.
I have carefully interpreted your comments above and attempted to reconcile them
with my note of 12/16/99, and I'm at a loss to find a discrepancy except for the
fact that you DON'T show the order of bytes transmitted on the lanes
and I DO. I have no choice but to ignore this comment until you point out
my misinterpretation.
> From objections voiced at meetings and in off-line conversations, I sense
> that some people project incorrectly certain conditions and requirements
> from the byte-striped approach to the word-striped approach without
> adequate study and make wrong inferences. The notion that the start of
> frame should be aligned with lane #0 is one of the false assumptions which
> surfaces again and again. It has misled prominent supporters of the byte
> striped approach to totally false conclusions about the operation of skips
> and insertions which is handled at a higher level where the underlying
> number of lanes is not even visible or detectable, therefore, the
> assignment of a word to a particular lane can not have any relevancy at
> all. To give the word striped approach a fair evaluation, you cannot apply
> the operating rules of byte striping, there are basic differences. For
> those who do not have the time or inclination to study detailed
> presentations on the subject, I will try to explain the process of
> staggered word striping here in compact form for the case of four lanes:
I lost this battle in the Hari meetings and have been assimilated. I had
proposed a Lane ID. The winning proposal used the SOP in lane 0 to identify lane
0. I'm sorry that this doesn't seamlessly work for Word-Striping. Column-Striping is a
very flexible concept and doesn't seem to have a problem supporting any function
thrown at it.
> The major differences between byte and word striping are the deskewing
> technique and the way byte or word synchronization is enabled via commas on
> each of the lanes.
> 
> Transmitter End:
> 1) In a first step, any minor reformatting required is performed (e.g. for
> the Idle).
Agreed. This is independent of Striping methodology.
> 2) A first word is then encoded for transmission on lane 0 using a starting
> disparity which equals the ending disparity of the previous word
> transmitted on lane 0. The ending disparity is stored in a latch for use by
> the next word on the same lane.
No such requirement exists for Column-Striping. No Byte/Word rule exists.
Each XGMII word is encoded as a column (i.e. across the four lanes). 
 
> 3) The serialization of the said first word is started on lane zero as soon
> as encoding is complete.
This is a Word-Striping disadvantage in terms of minimized latency. There is no
reason to wait to encode more than a single code-group per lane in order to
serialize and transmit.
> 4) Exactly the same steps are applied to the second word, except that it is
> transmitted on lane 1 and serialization starts one byte interval later.
This assumes that words on each lane are delayed by a code-group transmit time
equal to the lane number. This "word skewing" further complicates Word-Striping
initialization and further increases overall data latency.
> 5) The steps are repeated for the third and fourth word for transmission on
> lanes 2 and 3 respectively.
> 6) The 5th word fits behind the first word on lane 0 and the process
> continues as described above.
Column-Striping - Transmitter End:
1) Encode a 4-byte Word received from the XGMII to 4 Hari lanes for immediate
serialization and transmission. What could be simpler?
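
The difference between the two transmit mappings can be sketched in a few
lines of Python. This is a simplified illustration only: it ignores 8B/10B
encoding, running disparity and the one-byte lane stagger described for
Word-Striping, and the function names column_stripe and word_stripe are
hypothetical.

```python
LANES = 4
WORD = 4  # bytes per XGMII word


def column_stripe(data):
    """Byte i of each 4-byte word goes to lane i, so each word forms one column."""
    lanes = [[] for _ in range(LANES)]
    for i, b in enumerate(data):
        lanes[i % LANES].append(b)
    return lanes


def word_stripe(data):
    """Each whole 4-byte word goes to a single lane; words rotate across lanes."""
    lanes = [[] for _ in range(LANES)]
    for w in range(0, len(data), WORD):
        lanes[(w // WORD) % LANES].extend(data[w:w + WORD])
    return lanes


data = list(range(16))  # four words: bytes 0..15
print(column_stripe(data))  # lane 0 carries bytes 0, 4, 8, 12
print(word_stripe(data))    # lane 0 carries bytes 0, 1, 2, 3
```

With Column-Striping the first byte of every word lands on lane 0, which is
why a one-code-group delimiter such as SOP can stay pinned to a known lane.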
 
> Receiver End:
> Clock and data recovery can be identical for word and byte striping.
This is the most important similarity between the two Striping methodologies. In
all cases each lane must clock its 320 psec (bit) duration serial data at a
1.5625 GHz clock rate. In addition, each Hari lane is likely to be out of phase
with respect to each other lane requiring multiple 1.5625 GHz clock phases in
order to clock all lanes reliably. Furthermore, dynamic skew and/or lane jitter
complicate reliable bit processing. Bit processing CDR logic is the highest
speed, highest power and most demanding logic within Hari.
> However, for word striping, each lane assembles the deserialized words
> using derivatives of its respective receiver clock, not a shared clock
> phase. Up to this point, there is no interaction and alignment required
> among the four lanes. The separate words are then multiplexed into a shared
> register using a set of four 78 MHz clocks derived from any one of the 4
> input clocks by a hardwired arbitrary selection. The phase of each of the
> four clocks is offset by a quarter cycle (3.2 ns) from the next preceding
> clock. The shared register may be the first word cell of the FIFO, if rate
> adjustments are required via skip word removals or insertions. Clocking out
> of the register or FIFO is always done by a single shared 78 MHz clock
> which may differ from the input rate in the case of a FIFO. The large 12.8
> ns interval associated with each of the 4 inputs to the shared buffer
> allows hardwired skew elimination for peak to peak skews among the lanes in
> excess of 6 ns. For details of the relationship between the clocks and the
> skewed data transitions see the diagrams at:
> http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/word_staggered.pdf
Past CDR logic, the next highest speed, highest power and most demanding logic
within Hari is that of the deserializer. Proponents of Word-Striping claim as an
advantage the ability to reuse existing SerDes designs. Engineers familiar with
CDR and SerDes circuitry fully appreciate the complexity and difficulty of this
critical logic and its effect on link performance if not properly designed or
utilized at 1 Gbps data rates. It is important to note that Hari proposes
roughly a 3X increase in line rate, a 4X increase in number of signals, and a 2X
increase in the number of CDR/SerDes circuits employed in a link in order to
control overall link jitter. This amounts to a 24X increase in power
consumption, complexity, pick your own term. This 24X increase is not to be
underestimated. Hari proposes to significantly alter the basic architecture of a
link by partitioning jitter domains at the expense of a 24X penalty if
traditional CDR/SerDes designs are utilized.
Perhaps the most important advantage of Column-Striping is the freedom to
optimize Hari CDR/SerDes designs and the resultant minimization of the 24X Hari
penalty. This is afforded primarily by migrating CDR/SerDes path logic into
parallel path logic following a thorough examination of required link
functionality.
Let's be realistic about data rates. A clock rate of 78 MHz corresponds to the
movement of (40-bit) encoded Words at a rate of 2.5 Gbps. In order to move 10
Gbps of data through your design at a 78 MHz clock rate, you would need to
employ 128-bit busses carrying 4 words. Surely you don't propose doing 128-bit
compares on 4 ordered-sets at a time, each of which may contain a function which
may need to be executed sequentially according to the order of transmission?
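The arithmetic behind these figures can be checked with a short Python
sketch. The 3.125 Gbaud lane rate and the 40-bit encoded word come from this
thread; the 32 data bits per word assume 8B/10B coding of four bytes, and the
variable names are illustrative only.

```python
LINE_RATE_BAUD = 3_125_000_000   # per-lane line rate
ENCODED_WORD_BITS = 40           # four 10-bit code-groups
DATA_WORD_BITS = 32              # four data bytes before 8B/10B encoding
TARGET_DATA_RATE = 10_000_000_000

# One encoded word per clock on a single lane gives the "78 MHz" word clock.
word_clock_hz = LINE_RATE_BAUD // ENCODED_WORD_BITS       # 78_125_000

# Data moved by one 32-bit word per word clock.
data_rate_bps = word_clock_hz * DATA_WORD_BITS            # 2_500_000_000

# Bus width needed to carry 10 Gbps at that same clock.
bus_bits = TARGET_DATA_RATE // word_clock_hz              # 128

print(word_clock_hz, data_rate_bps, bus_bits, bus_bits // DATA_WORD_BITS)
```

The last value printed is the number of 32-bit words that must be handled in
parallel per clock, which is the "4 words" referred to above.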
Column-Striping - Receiver End:
1) Perform bit level CDR on each lane individually and forward bits to the
Deserializer @ 1.5625 GHz
2) Perform Deserialization on each lane individually and forward parallel data
to parallel logic for further processing @ a maximum rate of 312 MHz per 10-bit
value.
3) Parallel logic running at an implementation dependent rate performs
alignment, synchronization, deskew, error control processing, clock tolerance
compensation, retiming, code-group interpretation, etc.  
> Once word sync has been acquired on all lanes, no more commas are required
> for as long as synchronization is maintained. Skew drift (within the 6 ns
> specifications) does not affect operation, in contrast to byte striping,
> which may require adjustments to compensate for drift in delays.
Skew compensation tolerance is a function of the deskew pattern defined within
the IPG. Assuming that you agree that one cannot deskew regardless of Striping
methodology in the absence of an IPG (i.e. random data stream), Word-Striping
itself FIXES the maximum skew compensation tolerance at 1/2 the encoded Word
length (20-bits). Skew compensation tolerance for Column-Striping is dictated by
the deskew pattern. For 10 GbE, the proposed Hari KR pattern inherently provides
a 20-bit skew compensation tolerance in a striping methodology independent
manner.
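As a quick check on the 20-bit figure, the following Python sketch converts it
to time at the 3.125 Gbaud lane rate used throughout this thread. The variable
names are illustrative only.

```python
LINE_RATE_BAUD = 3_125_000_000
ENCODED_WORD_BITS = 40

bit_period_ps = 10**12 // LINE_RATE_BAUD                   # 320 ps per bit
skew_tolerance_bits = ENCODED_WORD_BITS // 2               # half an encoded word
skew_tolerance_ps = skew_tolerance_bits * bit_period_ps    # 6400 ps

print(bit_period_ps, skew_tolerance_bits, skew_tolerance_ps)
```

The result, 6.4 ns, lines up with the "in excess of 6 ns" skew figure quoted
earlier for the word-striped receiver.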
> While the transport structure is word oriented, the content can be
> formatted in any desired fashion using control characters. The only rule
> is that the format must not introduce any misaligned commas. Isolated
> spurious commas which can be generated by transmission errors are filtered
> out, e.g. by a two-bit reversible counter stepped up by aligned commas
> and down by misaligned commas enabling resynchronization only in the `00'
> state. Well designed circuits should maintain synchronism over periods of
> months. Operation in this mode is the ultimate in protocol independence
> and more difficult to achieve with byte striped operation.
What's a "transport structure"? 8B/10B is a transmission code and is byte
oriented. Even the title of the patent, your patent, says this (US04486739 -
Byte oriented DC balanced (0,4) 8B/10B partitioned block transmission code). The
choice of striping granularity (transport structure?) beyond a byte is strictly
artificial.
The rule you state about misaligned commas is applicable to adjacent code-groups
and is independent of Striping methodology.
The functionality of the "two-bit reversible counter" you mention is embodied in
what's referred to as the Synchronization State Machine. For Gigabit Ethernet,
this functionality is described in clause 36 of the 802.3z standard. The same
functionality is applicable to each Hari lane.
All lanes need to achieve synchronization independently by reliably detecting
and aligning to comma boundaries. All lanes require separate synchronization
state machines regardless of Striping methodology.
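For readers unfamiliar with the comma filter being discussed, here is a
minimal Python sketch of a two-bit reversible counter as described in the
quoted text. It is an assumption-laden illustration, not the clause 36
Synchronization State Machine: the class name CommaFilter, the saturation at
3, and the choice to permit realignment only when a misaligned comma arrives
in state 00 are my reading of the quoted description.

```python
class CommaFilter:
    """Per-lane two-bit up/down counter that filters spurious commas."""

    def __init__(self):
        self.count = 0  # state 00: resynchronization is permitted

    def observe(self, aligned):
        """Feed one detected comma; return True if realignment is permitted."""
        if aligned:
            # Aligned commas step the counter up, saturating at binary 11.
            self.count = min(self.count + 1, 3)
            return False
        if self.count == 0:
            # Misaligned comma in state 00: allow the lane to realign.
            return True
        # Otherwise treat the misaligned comma as spurious and step down.
        self.count -= 1
        return False


lane = CommaFilter()
for _ in range(3):
    lane.observe(True)          # confidence builds up to state 11
print(lane.observe(False))      # a single spurious comma is ignored
```

One such filter per lane is needed for either striping methodology, which is
the point made above.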
You say all of this motherhood and apple pie stuff which is INDEPENDENT OF
STRIPING METHODOLOGY and then make the fallacious statement: "Operation in this
mode is the ultimate in protocol independence and more difficult to achieve with
byte striped operation." I have no problem with the first part about the
protocol independence. I have a major problem with the second part. Please
justify this claim and its relation to the preceding text if any. Otherwise,
please retract it.
> A misalignment would most likely be the consequence of a failure in the
> clock recovery loop. However, the byte striped approach has an additional
> failure mode in the skew control loop which must be active continuously
> to correct skew drift over time. Recovery from a failure generally will
> be associated with a longer traffic interruption for the byte striped version.
Column-Striping allows complete freedom in CDR/SerDes implementation to best
address the requirements of 10 GbE link architecture. I have no idea why anyone
would implement a skew compensation function which does not work. This seems to be
what you're describing above. Yes, skew will drift over time, so will frequency.
What does this have to do with Striping methodology? Column-Striping handles
skew drift through the simple allocation of de-skew buffers big enough to handle
the maximum amount of skew between all lanes. The maximum amount of skew
obviously includes skew drift. The de-skew buffers operate in a manner analogous
to de-skew buffers for a standard parallel bus. After all, Hari is in essence a
parallel bus with each signal being a serial encoded transmission link. 
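
The de-skew buffer idea can be sketched in Python as follows. This is a
simplified illustration under stated assumptions: skew is measured in whole
code-groups rather than bits, each character stands for one code-group ('K'
being a comma-bearing one), the skew is assumed small enough that the first K
seen on each lane belongs to the same transmitted column, and the function
name deskew is hypothetical.

```python
def deskew(lanes, max_skew):
    """Realign lanes on the first 'K' found within max_skew positions."""
    offsets = []
    for lane in lanes:
        window = lane[:max_skew + 1]
        if "K" not in window:
            raise ValueError("no comma within the deskew window")
        offsets.append(window.index("K"))
    # Trim every lane to the length common to all after dropping the skew.
    length = min(len(lane) - off for lane, off in zip(lanes, offsets))
    return [lane[off:off + length] for lane, off in zip(lanes, offsets)]


# Lanes 1 and 2 arrive one and two code-groups late ('?' is unknown fill).
skewed = ["KRKRabcd", "?KRKRabc", "??KRKRab", "KRKRabcd"]
print(deskew(skewed, max_skew=2))
```

Sizing max_skew to cover the worst-case lane-to-lane skew, drift included, is
what the buffer depth amounts to in this model.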
 
> The reasons why we like nevertheless a decent density of commas can be
> summarized as follows:
> 1) The 3.125 Gbaud lanes may not operate as flawlessly as similar lower
> rate systems.
> 2) If a loss of word synchronization on any of the lanes occurs because of
> clock circuit degradation or in the event of excessive externally induced
> noise, we would like to recover synchronization as quickly as possible so
> no more than a handful of packets per sync slip are lost until the failing
> hardware can be replaced or bypassed.
> 3) The commas provide useful diagnostic information and can easily be
> provided in a compatible way with the known protocols.
I agree with all of the above. However, it is more motherhood and apple pie
stuff which is independent of Striping methodology.
 
> To repeat again, word-striping does not constrain formatting in any other
> way than limiting where commas can be placed (it must be an agreed uniform
> byte position within a word: 0, 1, 2, or 3). There are no hard requirements
> for comma density except for initialization. Even during initialization, it
> is not required that commas be present on all lanes at the same time. All
> lanes acquire word alignment independently of each other.
Column-Striping places no constraints on the positional placement of commas
within lanes as does Word-Striping. The requirement for data to be arranged in
columns comes from the common sense notion of simple translation from wide
parallel busses, such as the XGMII and parallel logic on the 'other' side of
Hari to Hari's four serial lanes. Comma columns for synchronization and Skip
columns for clock tolerance compensation fall out of this translation. There is
no apparent benefit to NOT aligning Commas and Skips at the Transmitter in order
to simplify synchronization, clock tolerance compensation and deskew at the
Receiver.
> Clearly, we can multiplex a Fibre Channel stream unchanged over a
> word-striped transport system.
But you ARE proposing significant changes to the Fibre Channel stream including
breaking all Fibre Channel ordered-set rules. I've presented this in my 'Hari Byte
vs. Word Striping' note of 11/26/99.
> Moreover, we can freely choose the number of lanes, 5, 4, 2, or 1 and
> convert easily from one number of lanes to another without any change except
> adjustments in the disparities.
Neither 10 Gigabit Ethernet, nor any gigabit or multi-gigabit standard that I'm
aware of is interested in choosing a number of lanes other than 1, 4 or 12.
Notwithstanding, Column-Striping is clearly independent of the number of lanes
and enables the maintenance of a much lower IPG size relative to Word-Striping
as the number of lanes is decreased or increased.
Column-Striping requires NO adjustments in the disparities regardless of the
number of lanes supported, for any reason.
> Likewise, an Ethernet format as described by H. Frazier is compatible with
> the exception of the Idle (which already is a modification of 1Gb version).
> It should not be difficult to reach agreement on a common Idle format. Note
> that the Idles would not even have to be exactly alike. Also, if you need a
> special Skip word, it can readily be defined with sufficient Hamming distance
> from other delimiters.
Hari and its coding proposed for 10 GbE in Kauai in November is 100% compatible
with the Ethernet format as described by Mr. Frazier. The supporters of the Hari
proposal already agree on the Ethernet Idle format. Those supporters include
proponents of Hari for InfiniBand as well as Fibre Channel.
Hari architecture differentiates between the Idles issued by the MAC during IPG
and the code-groups generated by the PCS during MAC Idle. PCS code-groups, KR,
arranged in columns provide the synchronization (K) and skip (R) functions. 10
GbE delimiters are simpler still, one code-group in length, and are exactly the
same as their Gigabit Ethernet counterparts. One of the tasks I was chartered
with during Hari development was to establish Hari coding objectives. Common
Idle formats at the encoded data level are not required. Protocol and PMD
independence are Hari requirements. Synchronization and Skip are two other Hari
requirements. Synchronization and Skip can be supported in a protocol and PMD
independent manner with Column-Striping.  
 
> We have the opportunity to study some improvements in word formats over
> existing practice as seen on the serial links without affecting anything at
> the established interfaces. We have made some limited suggestions in this
> direction (to simplify comma detection and disparity adjustments) and I am
> disappointed that you have used these optional changes in your
> argumentation against word striping. It is a totally separate issue and of
> secondary importance. Since these changes are visible only in the
> serialized formats, they should be of little concern to the protocol
> architects. If a majority does not want to consider improvements, such
> efforts can be abandoned.
A key requirement of Hari is PMD independence. Since Word-Striping requires
disparity adjustments to support the Serial PMD, and Column-Striping does not,
then I feel compelled to point this disadvantage out. This is exactly what I did
in assembling my Striping evaluation criteria. I believe that I have a
responsibility to my Hari co-proposers and the P802.3ae Task Force to justify
all elements of the Hari proposal.     
 
> The format for word striping is independent of the number of lanes. In this
> context, one might ask whether the rate of 3.125 Gbaud per lane is not too
> aggressive for a number of reasons. Some competent and experienced people
> from several companies think so. A word striped Hari with 5 lanes provides
> attractive features:
> 1) Implementation in a less aggressive technology which may be important
> because the macros have to be implemented not just in small transceiver
> related chips but also in large protocol chips and should not dictate the
> technology selection for the large chips.
> 2) The transmission rate at 2.5 Gbaud matches the Infiniband rate. A lot of
> standardization work related to the physical link specifications and the
> circuit designs can be shared.
> 3) A very low density of commas will provide reliable word synchronization
> information on all five lanes, because only commas which are modulo five
> words apart can show up on the same lane as the previous comma.
> 4) If several independent 1.250 Gbaud Ethernet links are trunked together,
> there are more design options because 2 links perfectly fit into a single
> lane. Conversion from 5 lanes to 4, 2, or a single lane is straightforward
> and not circuit intensive.
Please feel free to propose a 5-lane Serial 10 GMII if you like, just don't call
it Hari. I support Hari per the existing 4-lane proposal set aired in Kauai in
November. I and my company believe that, while the rate of 3.125 Gbaud per lane
is aggressive, it is appropriate in light of commonly available technology and
timely extensions.
As an orthogonal aside, it's funny to see you worrying about 3.125 Gbaud per
lane and not worrying about a 12.5 Gbaud Serial PMD based on 4 simply
multiplexed 3.125 Gbaud lanes :-)
 
> Synchronization: The word striped solution provides enough commas on all
> lanes with normal ordered sets. The 4-lane byte striped version requires a
> dedicated four-word sequence for synchronization, continuous monitoring of
> skew, and adjustment of skew parameters. The four-word sequence, if carried
> over to the media, is not suitable for lane numbers other than 4 and must
> be translated.
I have no idea how the above paragraph is related to its preface:
Synchronization. It mixes synchronization, skew and lane numbers other than 4
(irrelevant to 10 GbE). Only commas are used for synchronization. A 40-bit
pattern is required to perform a 20-bit deskew. Different 40-bit patterns are
proposed for Column- and Word-Striping. The Word-Striping pattern is:
a) inflexible insofar as skew granularity handling;
b) wasteful in terms of overhead since deskew is basically an initialization
process as is synchronization;
c) orthogonal in nature to the data format conveyed to it by the MAC.
I have no clue as to what is implied by the last sentence as it is clear that
Column-Striping as proposed is extendable to any number of lanes with no
protocol changes. All I can think of is that you have some specific PMD in mind
and failed to mention it. Please explain?    
> In contrast to word striping, the sequence is wholly dedicated to
> synchronization and skew adjustment. It cannot be used for anything
> else, it wastes bandwidth in the interframe or packet gap.
I believe that you're missing the whole point of Hari's KR sequence and the
separation of "church and state". Think of Synchronization as "church" and Skip
as "state". It's that simple. The Hari proposal EXACTLY meets historical
Ethernet objectives. You are completely mistaken about wasting bandwidth in the
IPG. Ethernet IPG is architected NOT to convey any information, but only to meet
Link functional requirements such as synchronization, clock tolerance
compensation, deskew, etc. and to provide a "breath of air" between packets to
packet processors. Ethernet architecture specifies MAC Control frames for
handling ULP information such as flow control, trunking, etc.
It is becoming clear to me that you are trying to turn Ethernet into something
that it doesn't want to be.
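
As a rough illustration of why clock tolerance compensation is an IPG
function, the sketch below estimates how far two free-running clocks drift
apart over one frame. The +/-100 ppm tolerance per end (200 ppm worst-case
difference) and the 1518-byte maximum frame are assumptions brought in for
the arithmetic; they are not stated in this note, and the function names are
hypothetical.

```python
def slip_bytes(frame_bytes: int, ppm_difference: float) -> float:
    """Bytes of drift accumulated between two clocks over one frame."""
    return frame_bytes * ppm_difference / 1_000_000


def bytes_between_skips(ppm_difference: float, skip_bytes: int = 1) -> float:
    """Byte times that pass before `skip_bytes` of drift has built up."""
    return skip_bytes * 1_000_000 / ppm_difference


if __name__ == "__main__":
    WORST_CASE_PPM = 200   # assumed: +100 ppm at one end, -100 ppm at the other
    MAX_FRAME = 1518       # assumed: maximum Ethernet frame size in bytes
    print(slip_bytes(MAX_FRAME, WORST_CASE_PPM))   # a fraction of one byte
    print(bytes_between_skips(WORST_CASE_PPM))     # 5000.0
```

Under these assumptions the drift over a single maximum-length frame is well
under one byte, so an occasional Skip inserted or deleted in the IPG is
enough to absorb it.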
> Word striping gives more freedom to allocate this transport capacity
> for functions such as:
> 1) Flow control
> 2) Lane Identification in trunking mode
> 3) Packet length for scramblers on the media side of Hari
> 4) Comma character for coded scrambled data if the scrambler is on the MAC
> side. For such a configuration, a known simple synchronization technique
> overwrites a SONET framing character with a comma character for
> transmission over the coded segment and then restores the original format
> at the receiving end. For the case of 5 lanes, a single comma an even
> number of words apart will provide successive synchronization on all 5
> lanes. It is more difficult to do this with the byte striped format in
> spite of its vaunted protocol independence because of the more complex and
> larger synchronization pattern.
All I have to say is WOW! Now I'm REALLY glad I'm a Column-Striping proponent!
 
> Complexity: You have addressed this item in several communications (e.g. to
> Mark Ritter 12/12/99 03:54:11 AM), but I have the impression that you
> equate complexity with number of circuits or silicon area. The circuit area
> for the two approaches is comparable and not different enough to tilt the
> selection one way or the other. The circuit count is only one of several
> contributing factors to complexity. I would certainly rate generally any
> control loop as more complex than multiplexers and register to register
> transfers at rates which can readily be handled by standard logic circuits.
> Custom circuits and analog circuits should also be classified as more
> complex than standard logic. Circuits of this type are more costly to
> develop for high volume manufacturability. There is also more effort
> involved to develop suitable production tests and such tests usually
> require more circuit overhead than standard logic circuits. The byte
> striped solution requires a circuit macro to align the 4 lanes first at the
> baud interval and then at the byte interval. This is by its nature a
> control loop which operates on inputs from other control loops (Clock
> recovery) and almost certainly includes some high speed custom circuits. I
> think most designers would classify it as a complex circuit. Traditional
> circuits performing the identical or comparable function at lower signaling
> rates are familiar power hogs. Things may be different now, but it
> certainly would help understanding your position if you could point to some
> applicable literature references, I would not dare to insist on an exact
> circuit count or part number! Instead, I would prefer an answer to why Hari
> should be saddled with this extra, complex circuit macro when we can
> clearly do without it by adopting word striping.
You again seem to be choosing implementations which start out by not working,
whereas I would choose a different starting point and a top-down architecture
which considers all link elements and functions. Once again, perhaps the most
important advantage of Column-Striping is the freedom to optimize Hari
CDR/SerDes designs and the resultant minimization of the 24X Hari penalty. This
is afforded primarily by migrating CDR/SerDes path logic into parallel path
logic.
 
> In a previous note, Mark Ritter referred to the deskew macro as extra
> complexity and you have dismissed this reminder in your note of 12/12/99
> 03:54:11 AM as a mere "emphatic assertion". Granted, you have never
> emphatically denied the need for such a control loop, but you have managed
> to consistently ignore its existence and leave it out of any evaluation.
> Can you make the case that deskewing in all its aspects at the stated rate
> with byte striping is simpler than the extra multiplexers and register to
> register transfers at 78 MHz required for word striping?
Since both Striping methodologies have to process data at 10 Gbps, and deskew
requirements for each are the same (<20 bits maximum skew), and deskew
processing is implementation dependent (control loops between parallel logic
and the deserializer are but one implementation alternative afforded by
Column-Striping), I believe that of utmost importance is the meeting of
requirements while best addressing all other system requirements such as power
consumption. Unfortunately, you seem to tie Word-Striping to a desire to utilize
existing power hungry SerDes/CDR designs. I have no such desires, and therefore,
must conclude that the deskew process for Column-Striping may be implemented at
the same rate as for Word-Striping. In the end, same rate deskew processing
combined with simpler, lower power SerDes/CDR designs enabled by Column-Striping
better meet 10 GbE objectives.
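
The 78 MHz figure quoted above is consistent with the 3.125 Gbaud lane rate
divided down to a 40-bit parallel width (four 10-bit code-groups). The sketch
below works that out for a few datapath widths; the widths chosen and the
function name are illustrative assumptions, not part of either proposal.

```python
LANE_BAUD = 3_125_000_000   # 3.125 Gbaud per lane, from the text
CODE_GROUP_BITS = 10        # one 8B/10B code-group


def parallel_clock_hz(lane_baud: int, width_bits: int) -> float:
    """Clock rate of a parallel datapath `width_bits` wide fed by one lane."""
    return lane_baud / width_bits


if __name__ == "__main__":
    for code_groups in (1, 2, 4):
        width = code_groups * CODE_GROUP_BITS
        print(width, parallel_clock_hz(LANE_BAUD, width) / 1e6)
    # 10 312.5
    # 20 156.25
    # 40 78.125
```

Nothing in this arithmetic depends on the Striping methodology: the same
parallel widths, and hence the same clock rates, are available to either one.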
> In the same note you disagree with the statement "Byte striping makes link
> deskew much more complex and requires a unique initialization sequence to
> deskew fully" and ask for proof to the contrary. The proof is that word
> striped systems without this skew alignment circuit operate in mainstream
> products of major companies.
Let me get this straight. You said that Column-Striping is more complex
(simplified) and your proof is referencing a Word-Striping system. So what? What
system is this, and how does that prove that Word-Striping is superior to
Column-Striping? In what way is it superior? 
This is the crux of the whole Striping methodology discussion. I maintain that
Word-Striping is an inferior element of a multi-lane serial link architecture
and have established evaluation criteria to compare Striping methodologies.
Let's stick to the criteria. Your proof does nothing to help evaluate Striping
methodologies. 
> From your own presentations, it is evident that byte striping relies on a
> delicately crafted multiple word sequence for initialization and, contrary
> to your assertions in an other note, word striping can recover in normal
> traffic from a loss of synchronization. Since deskew timing is hard wired
> for the word striped approach, no adjustments are needed beyond word
> synchronization, independent for each lane.
I don't exactly view the proposed Hari Idle KR sequence as a "delicately crafted
multiple word sequence". In fact, it's simpler than the 2 code-group Gigabit
Ethernet Idle sequence which had two flavors to adjust for disparity. No
disparity adjustments are required for the Column-Striped Hari 10 GbE Idle.
Ever. For any reason. The same cannot be said about Word-Striping. In addition,
the KR Idle sequence IS the initialization sequence.   
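
Deskew at code-group granularity can be pictured with the following sketch.
It is deliberately generic: the marker symbol "A", the list-of-strings lane
model and the two-code-group skew limit (20 bits at 10 bits per code-group)
are assumptions for illustration. The code does not reproduce the actual Hari
KR Idle pattern or any particular implementation.

```python
def deskew(lanes, marker="A", max_skew=2):
    """Align all lanes on the first occurrence of `marker` in each lane.

    lanes    -- list of lists of code-group names, one list per lane
    marker   -- hypothetical alignment code-group present once per lane
    max_skew -- largest tolerated lane-to-lane offset, in code-groups
    """
    # Position of the marker on each lane (raises ValueError if missing).
    offsets = [lane.index(marker) for lane in lanes]
    skew = max(offsets) - min(offsets)
    if skew > max_skew:
        raise ValueError(f"skew of {skew} code-groups exceeds {max_skew}")
    # Trim every lane to the common length available after its marker.
    length = min(len(lane) - off for lane, off in zip(lanes, offsets))
    return [lane[off:off + length] for lane, off in zip(lanes, offsets)]


if __name__ == "__main__":
    received = [
        ["I", "A", "a0", "a1", "a2"],
        ["I", "I", "A", "b0", "b1"],   # one code-group late
        ["A", "c0", "c1", "c2", "c3"], # one code-group early
        ["I", "A", "d0", "d1", "d2"],
    ]
    for column in zip(*deskew(received)):
        print(column)
```

After alignment, each tuple printed is one column across the four lanes,
starting with the marker column.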
> Power Dissipation: In several notes you have claimed an advantage for byte
> striping which is not justified for the following reasons:
> 1) The byte striped version requires a skew control loop as referenced
> above which includes custom circuits for the inter-lane skew alignment
> which unlikely operates with negligible power. There is no such circuit
> required for word striping and I see no accounting for this difference in
> your notes.
This is merely an implementation enabled by Column-Striping which is not
possible with Word-Striping. Other implementation alternatives exist. Since
these implementations are sensitive in nature, I'd rather focus on the
architectures which enable the best implementations, especially in terms of
power consumption. The key point is that Hari ballpark power consumption is 24X
GbE with traditional SerDes/CDR designs which Word-Striping proponents are
trying to preserve, for reasons beyond my comprehension.
> 2) You have correctly identified extra registers and multiplexers for the
> word striped approach and attributed extra power to these circuits without
> accounting for the differences in switching rates. I have shown in a
> previous note that the power in this area can be expected to be comparable
> based on a well known equation. Neither you nor anyone else has claimed
> that the equation does not apply to the situation or is used incorrectly.
> You simply ignored it and repeated without any new justification your claim
> of an advantage.
I agree that the power in the parallel logic is low and that the difference
between the two Striping methodologies is probably negligible since like
implementations based on the two methodologies can operate at the same clock
rates. Do you dispute this in any way? The advantage for Column-Striping lies
primarily with the high traditional SerDes/CDR power consumption for
Word-Striping.   
> The best case you can make is that perhaps the difference in power is not
> significant compared to the power used in clock recovery and off-chip
> drivers. However, a significant power penalty could accrue from the byte
> striped deskew circuit. Since word striped Hari does not have a comparable
> circuit, it is up to the advocates of byte striping to present the case that
> their deskewing technique uses negligible power, unless we all assent to a
> privileged status for the byte striped proposal.
One only becomes privileged (for a moment) by opening up to new ideas. Just
concentrate on that SerDes/CDR since that's the power hog in a Hari system.
 
> Protocol Independence: You repeat this vague claim over and over again, but
> not once have you shown why word striping would be any worse. Outside the
> Idle portion, both approaches have equally loose constraints, and for the
> Idle portion, byte striping is clearly more restrictive which can be
> harmful as explained for a scrambling example above. Protocol compatibility
> with Infiniband is of secondary importance because of the different data
> rates and other differences.
So you disagree that any consideration be given to InfiniBand as a common ~10
Gbps protocol! This is clearly against the goals of the Hari work which strived
for protocol and PMD independence as primary goals. My 10G market view has 10
GbE being dominant in the LAN and extending to the MAN/WAN and Fibre Channel
being dominant in the SAN (Storage). However, something must serve all of this
data. Enter servers with higher performance busses than PCI. This is a primary
InfiniBand application. I view all 10G markets as complementary. Highest volumes
and lowest costs will be realized by the use of common technologies. I'm sorry
that you view InfiniBand as "of secondary importance".
I've said this before and I'll say it again: Essentially the same Idle formats
have been proposed for Hari for 10 GbE and InfiniBand. I haven't finished my 10
GFC Hari proposal yet (spending way too much time on Striping methodologies),
but the only additional and separate requirement that I can garner is the
support of ordered-sets within a 10 GFC IFG. These can easily be added to the
Hari IPG/IFG function suite. I won't comment on the scrambling example since I
believe that it's too far off in left field to go chase.
 
> Mapping for 10 Gb Fibre Channel and Ethernet: I find it surprising and
> strange that you assert an advantage here. Please recall the many hours and
> days you and others put into the effort to define a suitable Idle/Skip
> structure for byte-striping and then reflect how much simpler it would have
> been within the word striped constraints.
Please recall that the proposed (Mr. Frazier) 10 GbE Idle pattern was KR columns
prior to the Hari group getting together. Once all the issues, protocols and
PMDs were placed on the table, it was evident to the Hari group that we could
easily achieve protocol and PMD independence with Column-Striping. ...AND there
would be NO changes required to the 10 GbE Idle pattern to boot!
 
> PMD Dependencies: Please reveal to us for which type of PMD word striping
> is worse and why. I have explained in a previous note why the KKKK/RRRR
> sequence is not suitable for the 12.5 Gbaud PMD.
I don't want to go off into a PMD tangent in this note, so I'll focus on the
alternative you present only. That is, a LAN PHY Serial PMD @ 12.5 Gbaud.
Personally, I don't like the 12.5 Gbaud requirement because it is based on the
high 8B/10B overhead without consideration of widely available 10 Gbaud
components. I'd like to see this PMD operating @ ~10 Gbaud and still
transporting exactly 10 Gbps of data per its P802.3ae objective. This means that
8B/10B Hari coding is stripped off. Therefore, no 8B/10B code-group sequences
are relevant. 
In the case that support for this PMD exists, I would propose to replace the IPG
for the PMD medium. I agree that KKKK/RRRR is an EMI ugly pattern.
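
The rate argument in this reply comes down to coding overhead. As a sketch,
using only the 10 Gbps payload objective and the 8B/10B ratio mentioned above
(function and parameter names are illustrative, and the 1/1 case is an
idealized stand-in for a code with negligible overhead):

```python
def required_baud(payload_bps: int, data_bits: int, line_bits: int) -> int:
    """Line rate needed to carry payload_bps with a data_bits/line_bits code."""
    return payload_bps * line_bits // data_bits


def overhead_percent(data_bits: int, line_bits: int) -> float:
    """Extra line bits as a percentage of the payload bits."""
    return 100 * (line_bits - data_bits) / data_bits


if __name__ == "__main__":
    PAYLOAD_BPS = 10_000_000_000               # 10 Gbps of data
    print(required_baud(PAYLOAD_BPS, 8, 10))   # 12500000000
    print(overhead_percent(8, 10))             # 25.0
    print(required_baud(PAYLOAD_BPS, 1, 1))    # 10000000000
```

Carrying 8B/10B onto the medium is what pushes the serial rate to 12.5 Gbaud;
stripping it off brings the rate back toward ~10 Gbaud for the same payload.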
> You also published another note on related issues on 12/16/99 05:07:52 AM.
> I disagree with many of the facts and conclusions presented there as well
> but refrain from a point by point refutation. I am confident that readers
> familiar with both byte and word striping can figure out the reasons for my
> disagreement themselves, otherwise, I can be reached at the phone below
> most of the time.
> 
> Respectfully,
> 
> Albert Widmer            Tel. 914 945-2047          Email:
> widmer@xxxxxxxxxx
> IBM T.J. Watson Research Center
> Yorktown Heights, NY 10598-0218
-- 
Merry Christmas and Happy New Millennium,
Rich
------------------------------------------------------------- 
Richard Taborek Sr.         Tel: 408-330-0488 or 408-370-9233       
Chief Technology Officer                   Cell: 408-832-3957
nSerial Corporation             Email: rtaborek@xxxxxxxxxxxxx  
2500-5 Augustine Dr.           Alt email: rtaborek@xxxxxxxxxx 
Santa Clara, CA 95054