
RE: [RPRWG] D2.2 and new IPS Sequence # processing



Jason:

 

I got tied up in a few things, so I haven’t been able to spend as much time on this as I would have liked. I still have not received any e-mails from the IPS/Topology AdHoc, so I am still in the dark regarding any progress being made. I am assuming I was never added to that list.

 

Anyhow, some observations regarding IPS stability and the sequence checks…

 

I modified my code based on D2.2 to allow dynamic changes to the sequence number check algorithm used.

 

Using the current D2.2 check on sequence #, the ring never seems to converge (as per my original comments). Part of this was because I was running my state machines event-driven rather than storing packet info and processing it later, when all the other checks occur. Storing that info can be quite burdensome for the S/W. Also, there are some really bad startup conditions to overcome.

 

I tried the sliding window approach (Case 2 below), which is essentially your original Check1/Check2 in a more optimized form. If the received sequence # is >= the stored one (modulo the sequence range), I process the packet. This works well, subject to some caveats described below.
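
As a standalone illustration of the window test, here is a minimal sketch. It assumes a 6-bit sequence number, which is what the 0x20 test in Case 2 implies; the actual value of RPR_802_17_IPS_SEQNUM_MASK is not shown in this message, and SEQ_MASK, SEQ_HALF and seq_in_forward_window are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the IPS sequence number is 6 bits wide (values 0..63). */
#define SEQ_MASK 0x3Fu
#define SEQ_HALF 0x20u

/*
 * Returns true when `received` equals `stored` or is up to 31 steps
 * ahead of it in modulo-64 arithmetic. A difference of 32..63 has
 * bit 5 set and is treated as "behind", so the packet is rejected.
 */
static bool seq_in_forward_window(uint8_t received, uint8_t stored)
{
    uint8_t delta = (uint8_t)((received - stored) & SEQ_MASK);

    return (delta & SEQ_HALF) == 0;
}
```

With these assumptions, a received value of 0 against a stored value of 63 is accepted (one step forward across the wrap), while 63 against 0 is rejected.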

 

Case 1 below simply processes all packets regardless of sequence number. This proves to be fairly unstable. It worked better than the plain Yes/No check in the current specification, but suffered from instability and severe protocol flapping when entering/exiting wrap modes (note: I did not try this in a steered mode). I would see several hundred transitions/packet generations happening from a single event (and that was when it stabilized fast). This would seem to indicate we do need some type of sequence checking to enforce “newness” rules.

 

The sliding window method of sequence number checking worked well until the S/W on a previously connected node is restarted and its starting sequence # falls in the wrong half of the sequence range. At this point, the protocol stalemates, just like the current D2.2 bad startup conditions. There are other ways this can happen, but this is the easiest to describe.
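
To make the restart case concrete, the sketch below counts how many times a restarted sender would have to advance its sequence # before a receiver using the window test accepts it again. It assumes a 6-bit sequence number and a sender that only advances its sequence # when it has something new to report; both are assumptions, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 6-bit sequence number, as implied by the 0x20 test. */
#define SEQ_MASK 0x3Fu
#define SEQ_HALF 0x20u

static bool seq_in_forward_window(uint8_t received, uint8_t stored)
{
    return (((received - stored) & SEQ_MASK) & SEQ_HALF) == 0;
}

/*
 * Number of sequence increments the restarted sender needs before the
 * receiver, still holding `stored`, accepts one of its packets.
 * Zero means the first packet after restart is accepted.
 */
static unsigned seq_steps_until_accepted(uint8_t stored, uint8_t restart_seq)
{
    unsigned steps = 0;
    uint8_t seq = (uint8_t)(restart_seq & SEQ_MASK);

    while (!seq_in_forward_window(seq, stored)) {
        seq = (uint8_t)((seq + 1u) & SEQ_MASK);
        steps++;
    }
    return steps;
}
```

For example, a receiver holding 10 rejects a sender that restarts at 0 until the sender has advanced 10 times, whereas a receiver holding 40 accepts the restarted sender immediately. If nothing prompts the sender to advance, the first case never resolves.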

 

What I would recommend (and I will try to verify the concept in the near future) would be to store the last “bad” sequence number received. If 3 (or some configurable constant) consecutive sequence numbers are received which move forward from the stored “bad” sequence #, reset the stored sequence # to the new “bad” value. This will force a resynchronization while still allowing the windowed approach to sequence checking.
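
One possible reading of this proposal is sketched below; it has not been verified against a ring. The assumptions are: a 6-bit sequence number, a repeat of the same “bad” value counts as moving forward, the first rejected packet only seeds the stored “bad” value, and any accepted packet clears the tracking. The struct, names and threshold constant are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MASK             0x3Fu  /* assumption: 6-bit sequence number */
#define SEQ_HALF             0x20u
#define SEQ_RESYNC_THRESHOLD 3      /* the configurable constant */

struct seq_tracker {
    uint8_t current;   /* last accepted sequence number */
    uint8_t last_bad;  /* most recent rejected sequence number */
    uint8_t bad_run;   /* consecutive rejects moving forward from last_bad */
    bool    have_bad;  /* true once last_bad holds a valid value */
};

static bool seq_forward(uint8_t received, uint8_t stored)
{
    return (((received - stored) & SEQ_MASK) & SEQ_HALF) == 0;
}

/* Returns true when the packet carrying `seq` should be processed. */
static bool seq_accept(struct seq_tracker *t, uint8_t seq)
{
    seq = (uint8_t)(seq & SEQ_MASK);

    if (seq_forward(seq, t->current)) {
        /* Normal windowed acceptance; forget any earlier rejects. */
        t->current = seq;
        t->have_bad = false;
        t->bad_run = 0;
        return true;
    }

    /* Rejected: extend the run only if it moves forward from last_bad. */
    if (t->have_bad && seq_forward(seq, t->last_bad))
        t->bad_run++;
    else
        t->bad_run = 0;
    t->last_bad = seq;
    t->have_bad = true;

    if (t->bad_run >= SEQ_RESYNC_THRESHOLD) {
        /* Forced resynchronization: adopt the sender's numbering. */
        t->current = seq;
        t->have_bad = false;
        t->bad_run = 0;
        return true;
    }
    return false;
}
```

Under this reading, a receiver holding 10 that sees 0, 0, 1, 1 from a restarted neighbor rejects the first three packets and resynchronizes on the fourth. A single stale straggler followed by a valid packet does not trigger a resynchronization.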

 

Additionally, I believe there needs to be a value meaning “sequence number currently not set”. Certain state changes (signal fail, topology change, etc.) could then set the stored sequence number to that value, which would allow “any” new sequence number to be automatically declared valid.
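
A minimal sketch of the “not set” idea follows. It assumes a 6-bit sequence number stored in an 8-bit field, so that 0xFF can never collide with a real value; SEQ_UNSET, seq_invalidate and seq_check are hypothetical names, not part of D2.2.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MASK  0x3Fu  /* assumption: 6-bit sequence number */
#define SEQ_HALF  0x20u
#define SEQ_UNSET 0xFFu  /* outside the 6-bit range, so never a real value */

/* Call on events such as signal fail or topology change. */
static void seq_invalidate(uint8_t *stored)
{
    *stored = SEQ_UNSET;
}

/* Returns true when the packet should be processed. */
static bool seq_check(uint8_t *stored, uint8_t received)
{
    received = (uint8_t)(received & SEQ_MASK);

    /* While unset, any sequence number is accepted and adopted. */
    if (*stored == SEQ_UNSET ||
        (((received - *stored) & SEQ_MASK) & SEQ_HALF) == 0) {
        *stored = received;
        return true;
    }
    return false;
}
```

After seq_invalidate(), the next packet from that node is accepted whatever its sequence # is, and the windowed check resumes from there.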

 

        switch (RPR802_17IPSCheckAndReserveInNodeArray (ring_unit, node_num, ringlet, mac_addr))
        {
            case RPR_802_17_EXISTING_ENTRY:
                switch (rpr_802_17_sequence_check_type)
                {
                    case 1:
                        /* Skip the sequence check in this mode */
                        ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
                        process_data = TRUE;
                        break;
                    case 2:
                        /* Try a sliding window sequence number */
                        if (((sequence +
                             ~(ring_unit->node_info [ringlet][node_num]->current_sequence_num &
                               RPR_802_17_IPS_SEQNUM_MASK) + 1) & 0x20) == 0)
                        {
                            ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
                            process_data = TRUE;
                        }
                        else
                        {
                            printk (KERN_WARNING "IPS: Node: %lu, Ringlet: %u - Sequence check failed. Recv: %u, stored: %u.\n",
                                node_num,
                                ringlet,
                                sequence,
                                ring_unit->node_info [ringlet][node_num]->current_sequence_num);
                        }/*IF*/
                        break;
                    case 0:
                    default:
                        /* The official D2.2 specification check */
                        if (sequence != ring_unit->node_info [ringlet][node_num]->current_sequence_num)
                        {
                            ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
                            process_data = TRUE;
                        }/*IF*/
                        break;
                }/*SWITCH*/
                break;
            case RPR_802_17_NEW_ENTRY:
                ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
                process_data = TRUE;
                break;
            case RPR_802_17_REPLACED_ENTRY:
                ring_unit->node_info [ringlet][node_num]->current_sequence_num = sequence;
                process_data = TRUE;
                break;
            case RPR_802_17_ALLOC_FAILURE:
                bad_packet = TRUE;
                printk (KERN_WARNING "IPS: Node: %lu, Ringlet: %u - Unable to locate an empty node.\n",
                    node_num,
                    ringlet);
                break;
            default:
                /* All enums are accounted for...must be a program error */
                printk (KERN_WARNING "IPS: Program error. Invalid enumerated value received for type RPR_802_17_ALLOC_RESULT.\n");
                break;
        }/*SWITCH*/

 

Hope this information helps,

 

Regards,

 

Michael Allen

 

 

-----Original Message-----
From: owner-stds-802-17@xxxxxxxxxxxxxxxxxx [mailto:owner-stds-802-17@xxxxxxxxxxxxxxxxxx]On Behalf Of Jason Fan
Sent: Tuesday, April 22, 2003 1:07 PM
To: Michael Allen; stds-802-17@xxxxxxxx
Subject: RE: [RPRWG] D2.2 and new IPS Sequence # processing

 

Hi Michael,

 

Thanks for your inputs. There is work going on in the PAH related to the protection state machine to simplify and clarify its presentation. Jim and I will make sure that you are added to the PAH mailing list. The normal PAH meeting time is 9:30 am on Tuesdays, and call-in information is sent weekly by Jim.

 

I'm glad that you are implementing and testing the topology and protection portions of the standard. This will be very important to determine problems in the state machines as currently defined. Your inputs at the PAH will be quite valuable.

 

In terms of your questions, the intent of the protection state machine is that all checks for a given state must be performed upon any trigger that causes entry to the state machine (until a check passes). This will be made clear in upcoming versions of the draft. The information that needs to be stored for handling WTR expiration is neighbor station information that enables a given station to determine what action to take when its WTR timer expires. Only messages received from the short path neighbor station are relevant, so information in messages from other stations on the ring doesn't need to be separately stored.

 

Since you have a test implementation of the protection state machine, it would be great if you would bring up issues that you've seen with the definition of the sequence number check to the PAH, and also via comment.

 

-- Jason

-----Original Message-----
From: Michael Allen [mailto:michael_allen@xxxxxxxxxxxxxxx]
Sent: Tuesday, April 22, 2003 10:16 AM
To: Jason Fan; stds-802-17@xxxxxxxx
Subject: RE: [RPRWG] D2.2 and new IPS Sequence # processing

Hi Jason:

 

If I follow your logic to completion, this would mean I have to cache *ALL* the IPS messages from *ALL* nodes and then when any event occurs (like WTR), I would have to reprocess all those messages. This is because some of the tests are based on packets from neighbors, and some are tests against non-neighbors (state 28 for example).

 

In addition to the points I made below about sequence #’s, there can be problems with startup. When a neighbor is instantiated, its stored sequence # is often 0 (structures are normally zero-initialized). The first neighbor message will probably carry a zero sequence # too. Wouldn’t this cause that message to be ignored?
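
The startup concern can be reproduced with a few lines. The sketch below models the D2.2 check as it is described in this thread (process a packet only when its sequence # differs from the stored one); struct neighbor_entry and d22_seq_check are illustrative names, not taken from the draft.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-neighbor record; zero-initialized on creation. */
struct neighbor_entry {
    uint8_t current_sequence_num;
};

/*
 * D2.2-style check as described in this thread: a packet is processed
 * only when its sequence number differs from the stored one.
 */
static bool d22_seq_check(struct neighbor_entry *n, uint8_t sequence)
{
    if (sequence != n->current_sequence_num) {
        n->current_sequence_num = sequence;
        return true;
    }
    return false;
}
```

With a zero-initialized entry, a first message carrying sequence # 0 compares equal to the stored value and is dropped. Accepting the first message for a newly created entry unconditionally would be one way to sidestep this.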

 

As a side note…I defeated the sequence check, and the state machine seems to work much more reliably – I actually end up with totally idle nodes rather than stuck WTRs.

 

Regards,

 

Michael

 

-----Original Message-----
From: Jason Fan [mailto:Jason@xxxxxxxxxxxx]
Sent: Tuesday, April 22, 2003 10:02 AM
To: Michael Allen; stds-802-17@xxxxxxxx
Subject: RE: [RPRWG] D2.2 and new IPS Sequence # processing

 

Hi Michael,

 

The intent of line 36 is that if a TP frame is received from a neighbor that meets the conditions of line 36 prior to the WTR expiring, the relevant information in the TP frame will still be available for the purposes of the check when WTR expires. At that point the transition will occur into the IDLE state. The expiration of WTR is a trigger for doing state machine processing just as when a new TP frame is received.

 

-- Jason

-----Original Message-----
From: Michael Allen [mailto:michael_allen@xxxxxxxxxxxxxxx]
Sent: Monday, April 21, 2003 5:15 PM
To: stds-802-17@xxxxxxxx
Subject: [RPRWG] D2.2 and new IPS Sequence # processing

I have been implementing the D2.2 IPS state machine and have run into what I believe is an issue with the new sequence # check. If I interpret the check correctly, it looks like the first packet received with a new sequence # is processed and then all later ones are suppressed (until the sequence # changes again).

 

It seems like this can cause a problem when attempting to unwrap a link (on a wrapping ring), since a WTR node would require state 36 to fire to get out of the WTR and the neighbor is the same as the original. The problem is when the packet defined in state 36 arrives, but the WTR timer has not expired. Once the packet has been processed, and the timer later expires, it is not possible to get into state 36.

 

Also, state 40 is the only state that formally copies neighbor addresses. Since the whole state machine banks on the neighbor MAC addresses being updated at the right time, this should be in the state tables.

 

It seems as though ALL ClearXxxSideEdgeStatus() operations should remove a wrap. That way, reception of IPS with STEER mode can switch the ring from wrapping to steered immediately, and any current wrap conditions will get removed. Only the wrap operation should be conditional on all nodes being able to wrap.

 

Comments are welcome,

 

Regards,

 

Michael Allen