
RE: [EFM] OAM - Faye's seven points



Geoff,
 
I am fairly sure your points are valid.  Here are some points
based on my past experience (which, I do recognize, is not
the only scenario!).
 
The assumptions I stated 'to avoid sending a technician' are: 
 
1. The CPE is cheap, so it would be quite expensive to
dedicate an alternative dial-up management interface to
each CPE.  The only way any management command from the
carrier can reach the CPE is through the headend.
 
2. When a CPE is in trouble, it is not necessarily just the EFM
link that is bad.  The cause can be hung software or a subscriber
line that has gone bad.  (This means a reset can reset either
the CPE or the subscriber line, or both.)
 
3. If the first reset CPE command times out, do we fall back to
the 'link keep-alive' stage to determine whether the link is
good or bad?  Note that if the 'reset' command is issued by the
EMS to the headend or to the CPE directly, the EMS usually does
retry.  It then becomes a design issue for the headend to handle
the proxied commands correctly so that they do not congest its
management links to the CPEs.
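
The fallback in point 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: `send_reset` and `link_alive` are hypothetical callables standing in for the real reset command and the 'link keep-alive' check, and the three outcome names are invented for the example.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    # Hypothetical outcome names, chosen for illustration only.
    RESET_CONFIRMED = "reset confirmed"
    CPE_UNRESPONSIVE = "link good, CPE not responding"
    LINK_DOWN = "link bad"


def reset_then_check_link(
    send_reset: Callable[[], bool],
    link_alive: Callable[[], bool],
) -> Outcome:
    """Issue one reset; if it times out, use keep-alive to classify the fault.

    send_reset returns True if the reset was confirmed before its timeout.
    link_alive returns True if the keep-alive check says the link is good.
    """
    if send_reset():
        return Outcome.RESET_CONFIRMED
    # The reset timed out: fall back to the keep-alive stage.
    if link_alive():
        return Outcome.CPE_UNRESPONSIVE
    return Outcome.LINK_DOWN
```

Separating "link bad" from "link good, CPE not responding" is what would let the caller decide whether retrying the reset is worthwhile at all.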
 
In other words, there are two management segments:
 
One from EMS to head-end
and one from head-end to CPE(s)
 
The latter is closer to device management (like ILMI in
ATM) than legacy network management. 
 
These are my assumptions; please correct me if I am
wrong.
 
-faye
-----Original Message-----
From: Geoff Thompson [mailto:gthompso@xxxxxxxxxxxxxxxxxx]
Sent: Thursday, September 20, 2001 9:47 AM
To: Faye Ly
Cc: stds-802-3-efm
Subject: RE: [EFM] OAM - Faye's seven points

Faye

At 04:54 PM 9/18/01 -0700, Faye Ly wrote:
Geoff,
 
Some OAM traffic is more critical than others.  For example -
 
An OAM command like 'reset' (in our case, reset CPE) should not be retried.

Actually, I don't agree. Resets should be confirmed by the entity being reset. If a confirmation is not received in a "reasonable" amount of time then the protocol should try some set number of times before giving up and assuming that communication has been lost with the entity.

We certainly don't want to reset the CPE a couple of times just because the network is slow.

Agreed, but that doesn't negate what I said above; rather, it says that your reset retry protocol should have a reasonable amount of time between retries.
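
A minimal sketch of the retry behaviour described above: a confirmed reset, a bounded number of attempts, and a pause between them. The function name, the `send_reset` callable, and the default values (3 attempts, 5 seconds apart) are assumptions made for illustration; nothing in this thread specifies them.

```python
import time
from typing import Callable


def reset_with_retries(
    send_reset: Callable[[], bool],
    max_attempts: int = 3,          # assumed value, not from the thread
    retry_interval_s: float = 5.0,  # assumed value, not from the thread
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try a reset until it is confirmed or the attempts run out.

    send_reset must return True only if the far end confirmed the reset
    within a "reasonable" time. Returns False when every attempt went
    unconfirmed, i.e. communication is assumed lost.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if send_reset():
            return True
        # Wait before the next try so a slow network does not cause
        # back-to-back resets; no wait is needed after the last attempt.
        if attempt < max_attempts:
            sleep(retry_interval_s)
    return False
```

The `sleep` parameter is injectable only so the spacing can be exercised without real delays.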

Giving up means sending a technician to the field to actually toggle the power button on the CPE.

If this is the case then the equipment vendor will have done a bad job of systems design. Hopefully there will be enough information elsewhere in the system to help figure out if the cable has gone open or hopelessly noisy or the far end power has gone down or any one of a number of other real live faults, as opposed to a far-end microprocessor program counter going off into the weeds. That is the result of poor design.

This is very expensive. 

Agreed

The whole reason for requesting a dedicated OAM channel/IPG/whatever is to guarantee that no actual human needs to be sent to the field.   Maybe this is not doable, but we ought to try our best.

I do not believe that there is any correlation between the need to send/not send a technician to the field and the presence of a separate OAM channel.


On a side note -
 
Can you please clarify the statement "P2P PHYs do not drop packets"?

P2P PHYs don't drop packets any more (or any less) than any other piece of pipe.
There are only 2 places for bits to go in a setup that consists of:
________                    _________
P2P PHY |_____MEDIUM_______| P2P PHY
________|                  |_________

1) Go where they are supposed to.
2) Not go there, in which case you can't communicate with anything in the far end.

When the link is reestablished (or gone around) then there are already plenty of counters to look at in the existing MAC management to count the lost packets.


This is good.  I don't need to keep all those dropped packets/bytes
error counters then.  Thanks.
 
-faye

Geoff