Date: Wed, 22 May 96 11:32:32 +0100 From: "Trevor Warwick INF-SP" Organization: Madge Networks To: p8021@hepnrc.hep.net (802-1), 100271.522@compuserve.com (Tony Jeffree) Subject: P802.1d/D11 comment Bug in example spanning tree code in ISO 10038 ---------------------------------------------- There is one serious bug in the example "C" code in the standard. We can argue all day about exactly how closely implementors were intended to follow this code, but I think it is now a fact of life that many people used the code almost verbatim. Therefore, it is important that the bug is fixed. The bug allows the propagation of stale information by a non-root bridge. It arises because the only mechanism in the code that checks for the root's message being too old is message_age_timer_expiry(), which is called from a single "for all ports" loop in the tick() function. This allows the following sequence of events to occur: 1) Receive BPDU from good root, on (root) port 2, with Message Age = 15, Max Age = 15. This BPDU is stored, and is valid, because it has not yet been timed out. 2) Hold timer prevents transmission of BPDU on (designated) port 1. 3) Tick() routine runs. The single loop in this routine will allow the transmission of a BPDU on port 1, using the stored value of Message Age, since the message_age_timer_expired() routine has not yet run for port 2. 4) The BPDU is transmitted on port 1, probably with Message Age = 16, Max Age = 15. There is another sequence as follows: 1) Receive BPDU from good root, on (root) port 2, with Message Age = 15, Max Age = 15. This BPDU is stored, and is valid, because it has not yet been timed out. 2) Receive BPDU from a worse new root or worse new designated bridge on (designated) port 1. 3) Use reply() procedure to send back BPDU. 4) The BPDU is transmitted on port 1, probably with Message Age = 16, Max Age = 15. There is nothing in the code that will stop this happening again, next time round. I have seen a situation with a circular configuration of four two-port bridges, where there was a valid spanning tree message from the true root circulating clockwise, and an invalid message from an old root circulating anti-clockwise with a continually incrementing Message Age field. All ports were forwarding... Architecturally, I am unsure of the correct way to go about fixing this. The same problem occurs in the textual description in 4.6 onwards, because there is no formal statement about exactly how the timers are supposed to operate. I would actually prefer to see 4.6 to 4.8 replaced with a formal state table, but I accept that there may not be much interest in doing this after five years. Pragmatically, there are two "C" code changes that can be made to acheive a simple, though rather inelegant, fix. Firstly, the loop in tick() is split in two, so that all message_age_timer_expired() routines are called first. Secondly, a check is added to transmit_config() to ensure that a message is never transmitted with a Message Age greater than Max Age. These changes can be used to fix broken implementations today while a better solution is found. /*======================================================================*/ /* section of transmit_config() */ config_bpdu[port_no].max_age = bridge_info.max_age; /* (4.6.1.3.2(7)) */ config_bpdu[port_no].hello_time = bridge_info.hello_time; config_bpdu[port_no].forward_delay = bridge_info.forward_delay; config_bpdu[port_no].topology_change_acknowledgment = port_info[port_no].topology_change_acknowledge; /* (4.6.1.3.2(8)) */ /* ** This line removed */ /* port_info[port_no].topology_change_acknowledge = FALSE; */ /* TW Fix */ /* (4.6.1.3.2(8)) */ config_bpdu[port_no].topology_change = bridge_info.topology_change; /* (4.6.1.3.2(9)) */ /* ** This clause added */ /* ** TW Fix ** We have to be sure that we're not about to propagate a stale message */ if (config_bpdu[port_no].message_age < bridge_info.max_age) { send_config_bpdu(port_no, &config_bpdu[port_no]); port_info[port_no].topology_change_acknowledge = FALSE; /* (4.6.1.3.2(8)) */ } /*======================================================================*/ /* section of tick() */ /* ** TW Fix ** Processing of message_age_timer moved out of loop, as ** this is part of the fix to avoid propagating stale ** information. It ensures that all received messages have been ** aged out before we do any further timer-based processing. */ for (port_no = 1; port_no <= No_of_ports; port_no++) { if (message_age_timer_expired(port_no)) { message_age_timer_expiry(port_no); } } for (port_no = 1; port_no <= No_of_ports; port_no++) { if (forward_delay_timer_expired(port_no)) { forward_delay_timer_expiry(port_no); } /* if (message_age_timer_expired(port_no)) ** { ** message_age_timer_expiry(port_no); ** } */ if (hold_timer_expired(port_no)) { hold_timer_expiry(port_no); } } /*======================================================================*/