[RPRWG] [OAM-AH] Flows

The question of which OAM flows the RPR layer should support was discussed in the OAM Ad-Hoc during the Ottawa meeting. The group's consensus was that the Echo flow should be supported, but opinions were divided on the CC, RDI and Activation/Deactivation flows.

We decided to ask for input from the whole group before we make changes to the draft. So, as per Mike's email, anyone who has an opinion on this issue is invited to send me an email. Next Sunday (5/26) I will send an email with the list of Ad-Hoc participants to all the interested people. From then on, all OAM Ad-Hoc related emails should be sent to that list only (of course, anyone who wants to be added to the list later on can do so by sending me an email). I suggest we start with emails, and then set up a conference call to reach the final decision of the Ad-Hoc.

To help you decide whether you want to be part of the OAM Ad-Hoc, and as a starting point, here is my opinion on this issue (it reflects what has been accepted into the draft as of today):

RPR defines a "network technology optimized for the use in the MAN" (from draft 0.2). As such, it has to provide tools that enable fast, cost-effective maintenance and fault localization in networks that are large (in number of stations and in ring circumference) and that may be shared by different managing entities. This differs from the LAN case, in which the network is restricted to a limited geographical area managed by a single entity.

An important feature of OAM is the ability to discover faults quickly (at least before the user complains) and to indicate where the fault is: which Station and which layer. Each layer should therefore have its own OAM mechanisms, to allow segregation and "agnostic" behavior.

One such mechanism is Echo, basically defined as follows: the source sends a frame to a destination, the destination loops the Echo back, and the source expects to receive the looped frame. The source waits for T sec, and if no response arrives it declares a failure. Echo is a very good tool during configuration and fault localization, but it is not a continuous fault monitoring tool. We could of course define a "Continuous Echo" mode in which the Echo frame is sent continuously, but this method would not cover all faults.

As an example, let us assume that we have a ring and that at some point in time a configuration change in a Station, or a fault, makes that Station's address identical to the address of another Station. The Echo will still succeed, but it may be looped by the wrong Station, and no fault indication will be raised (note that lower layers will not see this fault).
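The Echo behavior and its blind spot in the duplicate-address case can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Station` class, the `echo` function and the one-direction ring walk are hypothetical and are not taken from the draft. The `name` field stands for the physical identity of a Station, which the Echo source cannot see.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    name: str       # physical identity, visible only to this simulation
    address: str    # configured MAC address (may be wrong)
    up: bool = True # False models a Station that does not answer


def echo(ring: List[Station], src_index: int, dst_address: str) -> Optional[str]:
    """Send an Echo downstream from ring[src_index].

    The first Station whose configured address matches loops the frame
    back. Returns the name of the Station that answered, or None when
    nobody answers (the source would time out after T sec).
    """
    n = len(ring)
    for hop in range(1, n):
        station = ring[(src_index + hop) % n]
        if station.address == dst_address:
            return station.name if station.up else None
    return None


ring = [Station("A", "a"), Station("B", "b"),
        Station("C", "c"), Station("D", "d")]
healthy = echo(ring, 0, "d")        # looped by D, as intended

ring[1].address = "d"               # B is misconfigured with D's address
duplicate = echo(ring, 0, "d")      # looped by B; the source still sees success
```

In both cases the source receives a looped frame, so from its point of view the Echo succeeded, even though in the second case the wrong Station answered.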

Let us now look at how CC operates: each Station sends a frame to every other Station once every T sec (continuously). The destination Station expects these frames, and if it does not receive them within n x T sec it raises a fault indication. In the example above, if a Station is stealing the CC frames, the destination Station will not receive them, and a fault indication will be raised. Now the network manager knows which Stations are affected, and the failure location.
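The sink side of CC described above can be sketched as a per-source timer check. The class and method names are hypothetical, and the default of n = 3 is an assumption chosen only for illustration; the text above leaves both T and n open.

```python
from typing import Dict, Iterable, List


class CCSink:
    """Tracks CC frames expected from a set of source Stations."""

    def __init__(self, expected_sources: Iterable[str],
                 period_s: float = 1.0, n: int = 3) -> None:
        # A source is declared faulty after n x T sec without a CC frame.
        self.timeout_s = n * period_s
        # Assumption: monitoring starts at time 0 for every source.
        self.last_seen: Dict[str, float] = {s: 0.0 for s in expected_sources}

    def on_cc_frame(self, source: str, now: float) -> None:
        """Record the arrival time of a CC frame from a monitored source."""
        if source in self.last_seen:
            self.last_seen[source] = now

    def faulty_sources(self, now: float) -> List[str]:
        """Return the sources whose CC frames have been missing too long."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)


sink = CCSink(["A", "B"], period_s=1.0, n=3)
for t in (1.0, 2.0, 3.0, 4.0):
    sink.on_cc_frame("A", t)   # A keeps sending every second
sink.on_cc_frame("B", 1.0)     # B's frames stop (or are stolen) after t = 1
alarms = sink.faulty_sources(now=4.5)
```

Because the check is kept per source, the alarm names the affected Station pair, which is the localization information the paragraph above refers to.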

RDI is useful in single-sided failures (for example, a Station that is misconfigured in one ringlet only). In this case only one Station will discover the fault; the other side will still receive the CC and operate normally. So the RDI is the vehicle for indicating that a failure has been detected by the other Station, without the need to correlate faults through management systems (which may belong to different service providers).
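A minimal sketch of the single-sided case follows. It assumes, purely for illustration, that RDI is carried as a flag in the CC frames sent in the reverse direction; the draft may define a different encoding, and the `CCFrame` layout and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CCFrame:
    source: str
    rdi: bool  # set when the sender has stopped receiving CC from the addressee


def build_cc_frame(source: str, peer_cc_lost: bool) -> CCFrame:
    """Build the next CC frame, signalling a locally detected defect via RDI."""
    return CCFrame(source=source, rdi=peer_cc_lost)


def deliver(frame: CCFrame, path_ok: bool) -> Optional[CCFrame]:
    """Model one direction of the path: the frame arrives only if the path works."""
    return frame if path_ok else None


# Single-sided failure: A -> B is broken, B -> A still works.
a_to_b = deliver(build_cc_frame("A", peer_cc_lost=False), path_ok=False)
b_lost_cc_from_a = a_to_b is None            # B detects the fault locally

b_to_a = deliver(build_cc_frame("B", peer_cc_lost=b_lost_cc_from_a), path_ok=True)
a_sees_remote_defect = b_to_a is not None and b_to_a.rdi
```

Station A keeps receiving CC from B, so without the RDI flag it would have no indication that its own frames are not arriving.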

The example above is a simple one. I suppose that the topology mechanism may eventually find that something is wrong, but the CC immediately adds very valuable information, and in my opinion the task of the topology discovery mechanism is exactly what its name declares: to discover the topology, not to discover faults and fault locations. Note also that the fairness keep-alive will not discover this type of fault.

It has been claimed that RPR is a MAC, that it is not connection oriented, and that CC is better suited to ATM and MPLS. My opinion is that CC does not require connection-oriented operation; it only verifies connectivity in our shared medium between any pair of Stations. In other words, since RPR is a shared medium without a physical connection between non-adjacent Stations, the CC can be viewed as a heartbeat between any pair of Stations.

Regarding the CC timeouts, this is something that has to be discussed. My opinion is one CC per second; at this rate it can be implemented either in the MAC hardware or in the MAC software (for a 128-station ring, each Station has to send 127 CC frames and monitor 127, every second). I also think that this has to be an optional mechanism, since service providers may prefer to save bandwidth at the cost of lower availability figures. The bandwidth required for a 128-station ring with CC enabled between all the Stations is ~1.5 Mbps per ringlet (0.15% for a 1 Gbps ring).
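The arithmetic behind these figures can be explored with a short calculation. The traffic model and the frame size are assumptions, since neither is stated above: the sketch assumes every Station sends its CC frames to all other Stations on the same ringlet, so a frame addressed to a Station h hops away crosses h links, and by symmetry each link carries 1 + 2 + ... + (N - 1) frames per period. The 24-byte frame size is a hypothetical value.

```python
def cc_frames_per_station(stations: int) -> int:
    """CC frames each Station sends (and monitors) per period."""
    return stations - 1


def per_link_cc_load_bps(stations: int, frame_bytes: int,
                         period_s: float = 1.0) -> float:
    """Per-link CC load on one ringlet under the single-ringlet model.

    Assumption: each frame is stripped at its destination, so a frame
    sent h hops downstream occupies exactly h links.
    """
    frames_per_link = sum(range(1, stations))  # 1 + 2 + ... + (N - 1)
    return frames_per_link * frame_bytes * 8 / period_s


n_stations = 128
frame_bytes = 24  # hypothetical CC frame size, not taken from the draft

load_bps = per_link_cc_load_bps(n_stations, frame_bytes)
share_of_1g = load_bps / 1e9

print(cc_frames_per_station(n_stations))  # 127
print(load_bps)                            # 1560576.0
print(f"{share_of_1g:.2%}")               # 0.16%
```

Under these assumptions the load is about 1.56 Mbps per ringlet, the same order as the figure quoted above; a different frame size or a shortest-path split across both ringlets would change the result.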

Regarding the Activation/Deactivation flow, it allows a Station to start CC without the need to coordinate the operation through management. It is especially useful when the Stations are "owned" by different management entities, but it also saves having to coordinate the activation and deactivation of the CC sink side with the CC source side in order to avoid unwanted alarms during CC configuration. It is my opinion that this flow will be handled by the MAC software, and the timers were set accordingly. It is also optional, and the process allows supporting Stations to interoperate gracefully with non-supporting Stations.