
Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective



----- Original Message -----
Sent: Monday, May 17, 2004 2:18 PM
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective


Jeff,

Let me see if I understand what you're trying to say. Differentiated
Services (DiffServ - DS), as an example upper layer protocol (ULP),
applies DS code points (DSCPs) to its traffic. These DSCPs are tags that
classify different types of traffic. Since ULPs see layer 2 as
a simple pipe, they neither expect nor desire any kind of support for
these various DSCPs. In other words, if congestion occurs and packets
must be dropped in the layer 2 "pipe", it would make no difference to
DS what kind of traffic was dropped. It could be done totally randomly.
 
Jeff Warren wrote:
 

YES - an L3 classifier does classify traffic into "profiles", or traffic classes. The fields a classifier examines can vary. A systems provider (i.e., the manufacturer of said L3 device) may or may not offer the ability to match on the same set of fields as some other systems provider. Additionally, one customer, perhaps a Service Provider (SP), may choose very different rules for determining DSCP values than another SP. But the fact remains: an initial DSCP value is calculated for an IP packet at the ingress of a DS domain. Along the L3 route this DSCP value can change; a router can change it in a way that causes the packet to have a higher probability of being dropped at a subsequent hop, for example.
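To make that concrete, here's a minimal Python sketch of the idea. The match rules (port 5060, the 10.1.* block) and the profile test are made up purely for illustration; they are not from the DS RFCs or any real product.

    # A minimal sketch: set the DSCP at the ingress of a DS domain, and remark
    # out-of-profile traffic later. The match rules (port 5060, the 10.1.*
    # block) and profile test are made up for illustration only.
    from dataclasses import dataclass

    EF, AF11, AF12, BE = 0b101110, 0b001010, 0b001100, 0b000000

    @dataclass
    class Packet:
        src_ip: str
        dst_port: int
        dscp: int = BE

    def ingress_classify(pkt):
        """Initial DSCP marking at the edge of the DS domain."""
        if pkt.dst_port == 5060:              # e.g. signaling -> Expedited Forwarding
            pkt.dscp = EF
        elif pkt.src_ip.startswith("10.1."):  # e.g. a contracted customer -> AF11
            pkt.dscp = AF11
        else:
            pkt.dscp = BE
        return pkt

    def remark_out_of_profile(pkt, in_profile):
        """A downstream router may raise the drop precedence (AF11 -> AF12),
        making the packet likelier to be dropped at a later hop."""
        if not in_profile and pkt.dscp == AF11:
            pkt.dscp = AF12
        return pkt

    p = remark_out_of_profile(ingress_classify(Packet("10.1.2.3", 5001)), False)
    print(p.dscp)                             # 12, i.e. AF12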

 

It is my belief that DS as a standard does not have any explicit dependencies on L2 technologies; however, in real life, implementations of DS are developed by engineers with a full understanding of the underlying hardware's capabilities. For example, running DS over an L2 interface such as SONET or ATM might rely on the hardware capabilities of a network processor, if an NP is used to implement that interface type. If the interface is GE, this might be a MAC controller from one of the popular Ethernet silicon providers, etc. Each of these underlying L2 hardware implementations may have different "queuing" capabilities; some may rely on on-chip RAM, others on off-chip memory, etc. The behavior of these hardware capabilities can and does come into play with regard to the flexibility (e.g., granularity) of "configuring" how DS "works" for a given interface. The interface scheduling parameters, such as minimum and maximum BW, maximum latency, buffer sizes, etc., could be different in the above example for the GE, SONET, and ATM interfaces. Buffers may be statically allocated at run time or dynamically allocated during the execution of a configuration command or a change in the presence of long-term congestion, etc.

 

When a customer goes to the expense of acquiring DS-capable products and then spends even more on the people needed to properly configure (traffic engineer) a network, it should be done in a way where L2 link BWs are optimized for the expected loads. Then congestion is minimized; however, L2 congestion can never be completely eliminated, even with over-provisioning. The key is that when it does occur, let a wire-speed implementation of DS (one worth purchasing) deal with the congestion by altering drop probabilities, i.e., changing DSCP values in a consistent manner across the DS domain.

 


However, should the layer 2 provide some level of support, it would
be advantageous to DS if there was some type of mapping between
DSCPs and layer 2 priority tags. These priority tags could then be
used to separate the traffic into priority queues. Now, when buffer
congestion occurs and packets must be dropped, the lower priority
packets can be dropped before the higher ones. As long as the
mapping is done right, the packets that DS thinks are the most
important are least likely to be dropped.
 
Jeff Warren wrote:

In a previous e-mail I mentioned that some vendors already provide a proprietary means of mapping DSCP values to 802.1 priorities on egress ports. These proprietary implementations can also look at received L2 tags and use them in the same manner as DSCP values.
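A rough sketch of what such a DSCP-to-802.1-priority mapping could look like, in Python. All of the table values and queue names below are illustrative only, not from 802.1ad or any vendor implementation:

    # A rough sketch of a DSCP-to-802.1-priority mapping feeding per-port
    # transmit queues; all table values here are illustrative, not normative.
    EF, AF11, BE = 46, 10, 0

    DSCP_TO_PCP = {EF: 6, AF11: 4, BE: 0}              # 802.1 user priority, 0..7
    PCP_TO_QUEUE = {6: "high", 4: "medium", 0: "low"}  # per-port transmit queues

    def enqueue(queues, dscp, frame):
        pcp = DSCP_TO_PCP.get(dscp, 0)                 # unmapped codepoints -> best effort
        queues[PCP_TO_QUEUE.get(pcp, "low")].append(frame)

    def drop_on_congestion(queues):
        """Shed from the lowest-priority non-empty queue first, so the traffic
        DS marked as most important is the last to be dropped."""
        for name in ("low", "medium", "high"):
            if queues[name]:
                queues[name].pop(0)
                return

    q = {"high": [], "medium": [], "low": []}
    enqueue(q, EF, b"voice")
    enqueue(q, BE, b"bulk")
    drop_on_congestion(q)      # the best-effort frame goes first
    print(q)                   # {'high': [b'voice'], 'medium': [], 'low': []}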

 

According to David Martin, it sounds like 802.1ad is "standardizing" this mapping as we type. This new 802.1ad standard, combined with L2 Ethernet backplanes within modular systems, sounds like a good match. The link between blades in a modular chassis is an L2 link with the potential to duplicate DS capabilities using 802.1 priority markers.

 

Now the challenge is for CM to sort out which DS capabilities (e.g., Random Early Detection (RED) and Weighted RED (WRED)) to reproduce in L2 Ethernet controllers, such that the combined L3 and L2 links behave consistently.
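For reference, a minimal WRED-style drop decision looks roughly like the sketch below. The thresholds and maximum probabilities are made-up example numbers, not values from any RFC or product:

    import random

    # Illustrative WRED parameters per drop precedence (green/yellow/red); the
    # thresholds and maximum probabilities below are made up, not from any
    # RFC or product.
    WRED = {
        "green":  {"min_th": 80, "max_th": 120, "max_p": 0.02},
        "yellow": {"min_th": 60, "max_th": 100, "max_p": 0.05},
        "red":    {"min_th": 40, "max_th": 80,  "max_p": 0.10},
    }

    def wred_drop(avg_queue_len, precedence):
        """Classic RED curve: no drops below min_th, a linear ramp up to max_p
        at max_th, and forced drops beyond that. Higher drop precedence
        (packets DS already remarked) starts dropping sooner."""
        p = WRED[precedence]
        if avg_queue_len < p["min_th"]:
            return False
        if avg_queue_len >= p["max_th"]:
            return True
        ramp = (avg_queue_len - p["min_th"]) / (p["max_th"] - p["min_th"])
        return random.random() < ramp * p["max_p"]

    print(wred_drop(50, "green"), wred_drop(90, "red"))  # False True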



Assuming I'm close on the above, why then is it so hard to take this
one more step? If Ethernet can make advances so that the highest
priority traffic is even less likely to have its packets dropped in the face
of congestion, further aiding DS (and other ULPs) in maintaining the flow
of the traffic they think is most important, why is that bad? Indeed, why
isn't that justification for this project?
 
Jeff Warren wrote:

It's not a bad thing. I think it's a matter of duplicating the L3 ULP capabilities across the L2 link, NOT merely coming close to matching them. Remember, the expensive part is the human cost of doing the traffic engineering; you wouldn't want to do this for the L3 links and then allow the L2 links to alter the expected behavior. I'm sure there are compromises that can be made in an attempt to minimize the complexity of CM; after all, it's Ethernet, and it's supposed to be plug and play, not a complex set of knobs to twist like configuring DS.



The whole approach of creating priorities within 802.1 (P802.1Q)
and now assigning a mapping between DSCPs and 802.1 priorities
so that everybody does it the same way (isn't this being done in P802.1ad?)
is considered very worthwhile by most people. It seems to me that
it may be equally worthwhile to extend this further into layer 2, so
that a mechanism exists to communicate, not only within a bridge
but between bridges, when buffer congestion is occurring for these
different priorities, so that steps can be taken to preserve the highest
priority traffic.
 
Jeff Warren wrote:
 

Yes - do this the way DS does it: change drop probabilities; now at L2, change the 802.1 markers. You hit on the hard part: defining an L3-to-L2 mechanism to communicate the expected behavior in light of congestion.

 

 

 

Other comments from a seasoned DS implementer:

Steve wrote:

If CM is allowed to drop packets in switches, you're relying on TCP to discover the loss, which won't happen until a timeout, which means that the end-to-end application latency will be high. If you use (more granular than 802.3x) flow control, you can design a TCP stack in such a way that it discovers that the send buffer is filling up, and it can then inform the application to perform some rate control/prioritization.

This is not an attempt to "improve DiffServ" so much as to work around a design choice in TCP (the retransmission timer) that is not optimized for networks with really short RTTs. 802.3 can't redesign the TCP protocol, but it can dictate changes to Ethernet flow control that would allow a TCP implementer to react to congestion more quickly than waiting for a loss event to trigger a timeout.
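To put a rough number on "really short RTTs": RFC 2988 computes the retransmission timeout as SRTT + 4*RTTVAR and rounds anything below 1 second up to 1 second, so on a backplane-scale RTT the timeout dwarfs the path delay by several orders of magnitude. A quick back-of-the-envelope sketch, using a made-up example RTT:

    # RFC 2988: RTO = SRTT + 4*RTTVAR, rounded up to at least 1 second.
    # The RTT value below is a made-up example for a chassis backplane.
    def rto(srtt_s, rttvar_s, min_rto_s=1.0):
        return max(srtt_s + 4 * rttvar_s, min_rto_s)

    backplane_rtt = 20e-6                    # ~20 microseconds across a chassis
    timeout = rto(backplane_rtt, backplane_rtt / 2)
    print(timeout, timeout / backplane_rtt)  # 1.0 s, i.e. ~50,000 round trips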

 

Here are a couple of comments to this:

  1. TCP will detect a loss if it gets 3 duplicate ACKs, such as would happen if the receiver discovered a loss (because it sees a later packet with a discontinuous sequence number).  This can be triggered much more quickly than a timeout, but the receiver will only send the duplicate ACKs when it finally sees a packet from the sender that was sent after the packet that was lost.  This is only likely to happen if the switch implements some form of random discard (e.g., RED) when the buffer starts to fill, so that a large burst of packets from the sender isn't all dropped at once.
  2. If the switch implemented ECN (RFC 3168), then the receiver could notify the sender to back off more quickly than the duplicate ACK mechanism.  ECN as defined in RFC 3168 happens at the IP layer by manipulating the two bits adjacent to the DSCP bits (see the sketch below).  In theory you could add ECN to Ethernet if you can find a free bit in the header somewhere: the MAC at the next IP node (router or end-host) could check whether this MAC-layer ECN bit was set and then notify the IP layer, which could choose to mark the IP ECN bits.
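A minimal sketch of that IP-layer bit layout (RFC 3168: six DSCP bits plus two ECN bits in the old TOS octet), with the marking step a congested node would perform:

    # RFC 3168 splits the old IP TOS octet into a 6-bit DSCP and a 2-bit ECN
    # field; a congested node may mark ECN-capable packets CE instead of
    # dropping them.
    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

    def dscp(tos_byte):
        return tos_byte >> 2                 # upper six bits

    def ecn(tos_byte):
        return tos_byte & 0b11               # lower two bits

    def mark_ce_if_capable(tos_byte):
        """On congestion, set CE only if the sender declared ECN capability;
        Not-ECT traffic would have to be dropped instead."""
        if ecn(tos_byte) in (ECT0, ECT1):
            return (tos_byte & ~0b11) | CE
        return tos_byte

    tos = (46 << 2) | ECT0                   # an EF-marked, ECN-capable packet
    marked = mark_ce_if_capable(tos)
    print(dscp(marked), ecn(marked) == CE)   # 46 True  (DSCP untouched, CE set)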

NOTE: Both of these scenarios should be simulated before CM tries to stick something complicated into the MAC layer.

 

Cheers,

 

    - Jeff (over-and-out, see you next week in LB)

 


Thanks,
Ben

Jeff Warren wrote:

Hi Gary,
 
I agree we can't ignore "how does what we do at L2 impact the ULPs". Yikes - there are many ULPs, not just DS.
 
I agree that "we leave the discard decisions up to the ULPs".
 
I can't tell you exactly how CM will impact DS until I know what CM does. I am consulting with one of the original DS authors, Steve Blake; he and I used to work together a few companies ago when we were both at IBM, back when DS was standardized. We discussed this point this morning, and the general feeling was that a FDX point-to-point (PTP) L2 Ethernet link should not be making implicit or explicit selective packet drop decisions, and that doing so would negatively impact the operation of DS.
 
If what CM does is channelize traffic across a PTP link (high or low BW, short or long distance, inside a chassis across an Ethernet backplane or externally across a physical copper or optical link) into 'N' classes (CoS transmission buffers), then when one class is somehow determined to be using too much BW and that class is turned off for some period of time, there is a high likelihood that packets in that class will be dropped. When this happens you've just impacted DS's ability to do its drop probability calculations properly. More to come on this as we better understand what CM does.
 
NOTE: 802.1p has already defined an L2 "marker" that is used as a form of L2 rate control. 802.1p is used by intermediate L2 switches (between router hops, i.e., DS hops) to provide intermediate L2 class-of-service prioritization. Most switch/router products that are worth purchasing use this feature along with DSCP (L3) to prioritize traffic across their available ingress and egress ports.
 
 
Regards,
 
   - Jeff
----- Original Message -----
Sent: Thursday, May 13, 2004 11:48 AM
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective

Jeff,
 
I am perfectly fine with NOT having such an objective if you think we can get away with it. I'm not sure we can ignore how what we propose to do at L2 will affect the operation of the upper layers.
 
On the subject of DiffServ, I would simply propose that, within the limited scope of the high-speed, short-range L2 interconnects we are trying to enhance, we leave the discard decisions up to the upper layers while maintaining the desired latency and traffic differentiation qualities within layer 2. We have shown through simulations that layer 2 rate control mechanisms can eliminate (or significantly reduce) frame discards within layer 2, effectively pushing the discard decision up to L3. We have also shown that rate control combined with prioritization in layer 2 can maintain excellent latency and traffic differentiation qualities.
 
Maybe you can explain to us how this is likely to affect the operation of DiffServ. I haven't dug deep enough into DiffServ to know if it is counting on L2 devices to discard. My intuition is telling me it is likely to improve the operation of DiffServ, as well as other upper layer protocols of interest.
 
Gary
 
 
 
 
 -----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG [mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG] On Behalf Of Jeff Warren
Sent: Wednesday, May 12, 2004 7:58 PM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective

Gary,
 
You mentioned improved operation of DiffServ as a goal for CM. DS is a collection of RFCs; here's the basic set:
  • RFC2474 "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers"
  • RFC2475 "An Architecture for Differentiated Services"
  • RFC2597 "Assured Forwarding PHB Group"
  • RFC2598 "An Expedited Forwarding PHB" 

What did you have in mind for supporting ULPs such as DS? This L3 protocol's purpose in life is to decide drop probabilities for individual packets on a hop-by-hop basis. For example, Assured Forwarding has four classes, each with three levels of drop precedence (green, yellow, red). Then there's Expedited Forwarding in the highest priority queue, plus best effort, etc.
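For reference, the codepoint values behind those colors, as defined in RFC 2597 (four AF classes, three drop precedences each) and RFC 2598 (EF), written out as a quick Python table:

    # Codepoints from RFC 2597 (four AF classes, three drop precedences each)
    # and RFC 2598 (EF); drop precedences 1/2/3 are the green/yellow/red above.
    AF = {f"AF{c}{p}": (c << 3) | (p << 1)   # DSCP bit layout: class, precedence, 0
          for c in (1, 2, 3, 4) for p in (1, 2, 3)}
    EF, BEST_EFFORT = 0b101110, 0b000000     # 46 and 0

    print(AF["AF11"], AF["AF13"], AF["AF41"], EF)  # 10 14 34 46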

 

I've been wondering how an "Ethernet" standard is going to assist an L3 protocol such as DS by classifying MAC traffic. Would you propose duplication of, or cooperation with, the DS protocols? Maybe you let DS do its thing and present a CM-enabled Ethernet egress port with remarked packets that were fortunate enough to have passed across the device's fabric without being dropped. This CM-enabled Ethernet port would then align the offered MAC load to the queues it has available; it does this, for example, by inspecting DSCP values and comparing them to a predefined buffer table. And so on.
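Purely to make that hand-waving concrete, a sketch of the kind of DSCP-to-buffer-table lookup described above; the queue names, DSCP ranges, and buffer budgets are entirely hypothetical:

    # A sketch of the "predefined buffer table" idea: the egress port inspects
    # the (possibly remarked) DSCP and picks a queue with its own buffer
    # budget. Queue names, DSCP ranges, and budgets are hypothetical.
    BUFFER_TABLE = {
        range(46, 47): ("expedited",   64),   # EF
        range(8, 40):  ("assured",    256),   # AF classes
        range(0, 8):   ("best_effort", 512),
    }

    def select_queue(dscp_value):
        for dscp_range, (queue, budget) in BUFFER_TABLE.items():
            if dscp_value in dscp_range:
                return queue, budget
        return "best_effort", 512

    print(select_queue(46), select_queue(10))  # ('expedited', 64) ('assured', 256)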

 

The lights haven't gone on for me yet; I don't see the value in CM supporting DS, because switch manufacturers have already figured this one out and implemented this concept of multiple egress queues working at line rate. Plus, these implementations require global knowledge of networking policies to configure them properly, and, more importantly, to standardize them we're talking about linking the behavior of ingress and egress ports and knowledge of system-wide (e.g., switch or router) buffer management capabilities, all very much vendor-specific capabilities that would be very difficult to get everyone to agree on.

 

Regards,

 

   - Jeff Warren     

 

 

 

----- Original Message -----
Sent: Wednesday, May 12, 2004 5:44 PM
Subject: [8023-CMSG] Proposed Upper Layer Compatibility Objective

My A.R. from last meeting (thank you Ben).

Here's a first shot at an Upper Layer Compatibility objective:

The objective is:
"To define 802.3 congestion control support that, at a minimum, will do nothing to degrade the operation of existing upper layer protocols and flow/congestion control mechanisms, but has the explicit goal of
facilitating the improved operation of some existing and emerging protocols, over 802.3 full-duplex link technology."


If we can narrow the scope and still make it meaningful, I'm all for it.

I have attached RFC3168 (on ECN) as a reference. It contains a very good overview of congestion control at the TCP and IP layers. I would also consider this and DiffServ as examples of existing ULPs we would want to do our part to improve the operation of. IMO, what we can do at the 802.3 level to better support these will also provide the support we need for improved operation of some emerging protocols such as iSCSI and RDMA.

Gary <<rfc3168.txt>>


--
-----------------------------------------
Benjamin Brown
178 Bear Hill Road
Chichester, NH 03258
603-491-0296 - Cell
603-798-4115 - Office
benjamin-dot-brown-at-ieee-dot-org
(Will this cut down on my spam???)
-----------------------------------------