Re: Does Ten-Gigabit Ethernet need fault tolerance?
Rich,
Actually most fault tolerance in mainframes is accomplished by transaction processing.
This is not instantaneous.  Some systems have multiple nodal concurrent processing.  The
fail over process occurs within the completion time of two transactions.  These are
somewhat expensive because the systems are duplicated.  This is commonly done in clustered
systems.
Some systems have lock-step hardware fail over.  These are VERY expensive and the fail
over is within a two instruction completion cycle.  I have worked on the FT SPARC
systems as well as other FT systems.
The technology and mechanisms to do this type of fault tolerance are expensive.  They might
apply to building FT routers/switches that have the same reliability as current voice
switches (MTBF of 200+ years).  (Nodal failures account for about 90% of the reliability
issues in the Internet.  Transport reliability is handled by SONET/SDH.)  I have some
doubts that they would apply to transport link services.
Thank you,
Roy Bynum
MCI WorldCom
Rich Taborek wrote:
> Roy,
>
> I apologize for not including any FT restoration times in my note. However, my point
> is that restoration time is typically zero in a mainframe channel environment.
> Furthermore, I see very little difference today between the directions mainframe
> channel networks are taking and where Ethernet already is. The basic components that
> are required to construct a FT network are already present in Ethernet.
>
> Best Regards,
> Rich
>
> --
>
> Roy Bynum wrote:
>
> > Rich,
> >
> > I hate to burst your bubble.  I would not call a traffic restoration of minutes
> > "fault tolerance".  If you will look at the functionality of an L3 only traffic
> > restoration service you will find it to be unable to do better in complex
> > > architectures.  Even on the OC48 POS systems that use the SONET overhead fault
> > > detection to trigger an L3 restoration event, the restoration time still greatly
> > > exceeds that of an L1 (SONET/SDH) only restoration process.  Even fault
> > > restoration within ATM takes far longer than an L1 only restoration process.  This
> > > kind of traffic restoration is NOT fault tolerance.  At present, only the tightly
> > > coupled remote error indication process of SONET/SDH, an L1 only process, can be
> > > considered fault tolerance, and even then it is greatly affected by the
> > > implementation architecture.
> >
> > Thank you,
> > Roy Bynum
> > MCI WorldCom
> >
> > Rich Taborek wrote:
> >
> > > Joe,
> > >
> > > I heartily agree with Mick's point. Fault Tolerance in Ethernet, if implemented
> > > > at all, is generally implemented at L3 or above (that is, above the MAC). I believe this to
> > > be appropriate for Ethernet networks. It seems to me that FT requirements are
> > > orthogonal to data transport requirements which are the purview of Ethernet.
> > > Here are some examples of networks which support FT:
> > >
> > > > 1) WANs implementing SONET FT at level 1: Very, very expensive but capable of
> > > > reliably transporting any type of data upon protocol conversion. FT at
> > > > level 1. FT at other levels is essentially ancillary.
> > >
> > > 2) Mainframe channels: Fairly fault tolerant individually, and virtually
> > > fault-free considering multi-path configurations and multiple hosts controlled
> > > by a single operations point. I would put these in the very expensive category.
> > > FT at levels 1, 2 and 3 and above.
> > >
> > > 3) Commodity servers, Ethernet and FT extensions such as Link Aggregation and
> > > Rapid Reconfiguration: Fault tolerant as a system, expensive only in comparison
> > > to non-fault tolerant systems (i.e. dirt cheap compared to other FT
> > > alternatives). FT at levels 3 and above.
> > >
> > > I also agree that sufficient failure detection is already built into Ethernet.
> > >
> > > I'd put my money at the threshold of door #3.
> > >
> > > Best Regards,
> > > Rich
> > >
> > > --
> > >
> > > "Mick Seaman" wrote:
> > >
> > > > What needs to be built in is the detection of failure. What we don't need to
> > > > do is to build everything into the MAC. I suggest you look at the fault
> > > > tolerant capabilities provided by P802.3ad and the work on Rapid
> > > > Reconfiguration starting in 802.1.
> > > >
> > > > Both these (will) provide a degree of fault tolerance based on using
> > > > protocols that are independent of MAC details to allow network nodes to
> > > > precalculate their response to a low level indication of failure. There is
> > > > really no need to build these protocols into the MAC.
> > > >
> > > > Mick
> > > >
> > > > -----Original Message-----
> > > > From: owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx
> > > > [mailto:owner-stds-802-3-hssg@xxxxxxxxxxxxxxxxxx]On Behalf Of Joe Gwinn
> > > > Sent: Friday, July 16, 1999 3:15 PM
> > > > To: stds-802-3-hssg@xxxxxxxx
> > > > Subject: Does Ten-Gigabit Ethernet need fault tolerance?
> > > >
> > > > The purpose of this note is to present a case for inclusion of fault
> > > > tolerance in 10GbE, and to offer a suitable proven technology for
> > > > consideration.  However, no salesman will call.
> > > >
> > > > Why Fault Tolerance?  Ten-Gigabit Ethernet is going to be a relatively
> > > > expensive, high-performance technology intended for major backbones,
> > > > perhaps even nibbling at the bottom end of the wide-area network (WAN)
> > > > market.  In such applications, high availability is very much desired; loss
> > > > of such a backbone or WAN is much too disruptive (and therefore expensive)
> > > > to be much tolerated, and this kind of a market will gladly pay a
> > > > reasonable premium to achieve the needed fault tolerance.
> > > >
> > > > Why add Fault Tolerance now?  Because it's easiest (and thus cheapest) if
> > > > done from the start, and because having FT built in and therefore becoming
> > > > ubiquitous will be a competitive discriminator, neutralizing one of the
> > > > remaining claimed advantages of ATM.
> > > >
> > > > Isn't Fault Tolerance difficult?  In hub-and-spoke (logical star, physical
> > > > > loop) topologies such as GbE and 10GbE, it's not hard to achieve both fault
> > > > tolerance (FT) and military-level damage tolerance (DT).  In networks of
> > > > unrestricted topology, it's a lot harder.  The presence of bridges does not
> > > > affect this conclusion.
> > > >
> > > > How do I know that FT is so easily achieved?  Because it's already been
> > > > done, may be bought commercially, and is in use on one military system and
> > > > is proposed for others.  The FT/DT technology mentioned here was developed
> > > > > on a US Navy project, and is publicly available without intellectual
> > > > > property restrictions.  Why was the technology made public?  To encourage
> > > > > its adoption and use in COTS products, so that defense contractors can buy
> > > > > FT/DT LANs from catalogs, rather than having to develop them again and
> > > > again, at great risk and expense.
> > > >
> > > > What is the difference between Fault Tolerance and Damage Tolerance?  In
> > > > fault tolerance, faults are rare and do not correlate in either time or
> > > > place. The classic example is the random failure of hardware components.
> > > > (Small acts of damage, such as somebody tripping over a wire or breaking a
> > > > connector somewhere, are treated as faults here because they are also rare
> > > > and uncorrelated.) In damage tolerance, the individual faults are sharply
> > > > correlated in time and place, and are often massive in number. The classic
> > > > military example is a weapon strike. In the commercial world, a major power
> > > > failure is a good example. Damage tolerance is considered much harder to
> > > > accomplish than fault tolerance. If you have damage tolerance, you also
> > > > have fault tolerance, but fault tolerance does not by itself confer damage
> > > > tolerance.
> > > >
> > > > How is this Damage Tolerance achieved?  All changes in LAN segment topology
> > > > (the loss or gain of nodes (NICs), hubs, or fibers) are detected in MAC
> > > > hardware by the many link receivers, which report both loss and acquisition
> > > > of modulated light. This surveillance occurs all the time on all links, and
> > > > is independent of data traffic. Any change in topology provokes the
> > > > hardware into "rostering mode", the automatic exploration of the segment
> > > > using a flood of special "roster" packets to find the best path, where
> > > > "best" is defined as that path which includes the maximum number of nodes
> > > > (NICs).
> > > >
> > > > Just how fault tolerant and damage tolerant is this scheme?  A segment will
> > > > work properly with any number of nodes and hubs, if sufficient fibers
> > > > survive to connect them together, and will automatically configure itself
> > > > into a working segment within a millisecond of the last fault. If the
> > > > number of broken fibers is less than the number of hubs, all surviving
> > > > nodes will remain accessible, regardless of the fault pattern. If the
> > > > number of fiber breaks is equal to or greater than the number of hubs,
> > > > there is a simple equation to predict the probability of loss of access to
> > > > a typical node due to loss of hubs and/or fibers, given only the number of
> > > > > hubs and the probability of any fiber breaking: Pnd[p,r] = (2p(1-p))^r,
> > > > > where p is the probability of fiber breakage and r is the number of
> > > > > surviving hubs (which ranges from zero to four in a quad system). This
> > > > > equation is accurate to within 1% for fiber breakage probabilities of 33% or
> > > > > less, and applies for any number of hubs.
> > > >
> > > > The simplicity of this equation is a consequence of the simplicity of this
> > > > protocol, which is currently implemented in standard-issue FPGAs (not
> > > > ASICs), and works without software intervention.  It can also be
> > > > implemented in firmware.
> > > >
> > > > To give a numerical example, in a 33-node 4-hub segment, loss of 42 fibers
> > > > (16% of the segment's 264 fibers) would lead to only 0.5% of the nodes
> > > > becoming inaccessible, on average. Said another way, after 42 fiber breaks,
> > > > there are only five chances out of a thousand that a node will become
> > > > inaccessible. This is very heavy damage, with one fiber in six broken. To
> > > > take a more likely example, with three broken fibers, all nodes will be
> > > > accessible, and with four broken fibers, there is less than one chance in a
> > > > million that a node will become inaccessible. Recovery takes two ring tour
> > > > times plus settling time (electrical plus mechanical), typically less than
> > > > one millisecond in ship-size networks, measured from the last fault.
> > > > Chattering and/or intermittent faults can be handled by a number of
> > > > mechanisms, including delaying node entry by up to one second. Few current
> > > > LAN technologies approach this degree of resilience, or speed of recovery.
> > > >
> > > > In commercial systems and some military systems, a dual-ring solution is
> > > > > sufficient.  Solutions up to quad-ring are commercially available and are
> > > > > needed for some military systems.  However, the ability to support up to quad
> > > > redundant systems should be provided in 10GbE, for two reasons.  First,
> > > > quad is needed for the military market, which may be economically
> > > > significant in the early years of 10GbE.  Second, quad provides a clear
> > > > growth path and a way to reassure non-military customers that their most
> > > > stringent problems can be solved: One can ask them if their needs really
> > > > exceed those of warships duelling with supersonic missiles.
> > > >
> > > > > The basic technical document, the RTFC Principles of Operation, is on the
> > > > > GbE website as
> > > > > "http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_1_0699.pdf"
> > > > > and
> > > > > "http://grouper.ieee.org/groups/802/3/10G_study/public/email_attach/gwinn_2_0699.pdf".
> > > > > I was a member of the team that developed the technology, and am the author of
> > > > > these documents.
> > > >
> > > > Although these documents assume RTFC, a form of distributed shared memory,
> > > > the basic rostering technology can easily be adapted for Gigabit and
> > > > Ten-Gigabit Ethernet as well.  For nontechnical reasons, RTFC originally
> > > > favored smart nodes connected via dumb hubs.  However, the overall design
> > > > can be somewhat simplified if one goes the other way, to dumb nodes and
> > > > smart hubs.  This also allows the same dumb nodes to be used in both non-FT
> > > > and FT networks, increasing node production volume, and does not force
> > > > users to throw nodes away to upgrade to FT.
> > > >
> > > > I therefore would submit that 10GbE would greatly benefit from fault
> > > > tolerance, and also that it's very easily achieved if included in the
> > > > original design of 10GbE.
> > > >
> > > > Joe Gwinn
>
> -------------------------------------------------------------
> Richard Taborek Sr.    Tel: 650 210 8800 x101 or 408 370 9233
> Principal Architect         Fax: 650 940 1898 or 408 374 3645
> Transcendata, Inc.           Email: rtaborek@xxxxxxxxxxxxxxxx
> 1029 Corporation Way              http://www.transcendata.com
> Palo Alto, CA 94303-4305    Alt email: rtaborek@xxxxxxxxxxxxx