Re: Does Ten-Gigabit Ethernet need fault tolerance? (nonredundant NICs)

Joe,

You use the term "rostering algorithm".  Does this mean that using P802.3ad
would not be a simple binary decision circuit built into the chip?  Very few fault
tolerance systems have more than a binary structure; those that do are ternary, with
simple "lockstep" hardware logic.

Over the years, I have learned that the closer you get to the level that is
being "protected", the faster and more reliable the fault tolerance is.  In this case,
it is the optical transport that is being "protected".  I am all for the use of link
aggregation for existing 802.3 interfaces, primarily because fault tolerance
technology does not exist for them otherwise.  Simple hardware fault tolerance
technology does exist for 10 Gb/s interfaces today.

10GbE will most likely not be implemented over BLSR rings in its early stages of
deployment.  This is because of the massive amount of fiber transport facilities
that are being deployed today.  I do think that any WAN implementation will use the
2km interface directly into what is called a "lite LTE".  This is an LTE that has
line/segment SONET/SDH OAM&P functionality, without the TDM multiplexing of a
standard LTE.  The 10GbE will have path overhead functionality only.  This type of
interface will need very simple fiber maintenance functionality, the kind that is
resolved by a simple binary hardware solution.

The 40km implementation will be used over metropolitan, leased fiber systems.  These
will be, for the most part, diverse path 1+1 systems.  This kind of deployment will
need very robust, tightly coupled fault tolerance functionality.  Because fiber
breaks, fiber degradation, and other fiber related issues cannot be controlled,
the ability to switch to an alternate receiver with minimal loss of data traffic will
be paramount.  I have a hard time believing that any upper layer functionality can
accomplish this with 100% reliability.
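
To make the "simple binary decision" concrete, here is a minimal sketch of a 1+1
receiver selector.  It is only an illustration under stated assumptions, not any
proposed 10GbE or P802.3ad mechanism: the names Receiver, signal_ok, and
select_receiver are hypothetical, and signal_ok stands in for whatever
loss-of-signal or degrade indication the hardware would actually provide.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """One fiber receive path in a 1+1 pair (hypothetical model)."""
    name: str
    signal_ok: bool  # assumed health flag: True while the optical signal is good


def select_receiver(active: Receiver, standby: Receiver) -> Receiver:
    """Binary 1+1 selection.

    Stay on the active receiver unless it has failed AND the standby is
    healthy.  There is no third state and no negotiation with upper layers.
    """
    if not active.signal_ok and standby.signal_ok:
        return standby
    return active


working = Receiver("working", signal_ok=True)
protect = Receiver("protect", signal_ok=True)

# Both paths healthy: remain on the working receiver.
chosen = select_receiver(working, protect)

# Fiber break on the working path: switch to the protect receiver.
working.signal_ok = False
chosen_after_break = select_receiver(working, protect)
```

The point of the sketch is that the decision depends only on two local status bits,
which is why it can sit in hardware next to the optics rather than in an upper layer.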

Thank you,
Roy Bynum
MCI WorldCom


Joe Gwinn wrote:

> Roy,
>
> At 9:12 PM 99/7/24, Roy Bynum wrote:
> >
> >Does RTFC allow a minimally trained individual to implement fault tolerance
> >simply by plugging two fiber T/R pairs into the 10GbE interface, with fault
> >tolerance left unimplemented if a second T/R pair, parallel to the first, is not
> >plugged in?  This will be the simplest and most common implementation process.
>
> Yes, this will work, by design.  The rostering algorithm will just treat
> the missing path as broken, and press on.  There is no problem with parts
> of the segment having non-redundant NICs, although those NICs will be cut
> out of the segment if those NICs or their links fail.
>
> Joe
>
> >Joe Gwinn wrote:
> >
> >> Those people reading the RTFC Principles of Operation may wish to keep a
> >> few points in mind:
> >>
> >> RTFC assumes smart NICs and dumb hubs.  However, for 10GbE we probably
> >> would want dumber NICs and smarter hubs, both to allow a single NIC design
> >> to be used for both FT and non-FT systems, and to allow the hub to assign
> >> NodeIDs, rather than having to depend on humans to manually set these in
> >> strictly increasing numerical order on the NIC cards.  Letting users out of
> >> handling such a critical step is probably essential.  So, don't feel that
> >> the partitioning set forth in the RTFC Principles of Operation is in any
> >> way required; I would expect GbE to repartition the rostering algorithm
> >> between NICs and hubs.
> >>
> >> The NICs could be designed such that up to four standard-issue NIC cards
> >> could be installed side-by-side in a single PCI bus, strapped to
> >> communicate with one another via the PCI bus they share, together
> >> implementing a RTFC quad node.  Or, one could implement dual-ring NIC
> >> cards, with two dual cards side by side to implement a quad redundant
> >> system.  My guess would be that most commercial users will find that dual
> >> suffices, while there are military applications for both dual and quad
> >> redundant systems.  Basically, dual redundant suffices for ordinary fault
> >> tolerance, but quad redundant is often needed for battle damage tolerance.
> >>
> >> Joe
> >>
> >> The basic technical document, the RTFC Principles of Operation, is on the
> >> GbE website as:
> >>
> >> http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_1_0699.pdf
> >>
> >> http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_2_0699.pdf
> >>
> >> The first document is the text, and the second contains the drawings.