
Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective



This is a second retransmission of a message I sent to the reflector
last Friday. I'm not sure why it is bouncing. This time I converted the
message to plain text. Hopefully it'll get through.

-----Original Message-----
From: McAlpine, Gary L
Sent: Friday, May 14, 2004 9:57 AM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: RE: [8023-CMSG] Proposed Upper Layer Compatibility Objective


Jeff,

Let me try to come at this from the other direction.

I think what we are trying to enable is a new class of 802.3 L2 subnet
with properties more like a system interconnect than a LAN (actually
something between a system interconnect and a LAN). The desire is for
the new class to have properties that are important to interconnects
utilized in blade system backplanes (for server systems, storage
systems, and communications systems), SANs, and parallel processing
cluster interconnects (for the rest of this email I will refer to this
new class as "CI" for cluster interconnect).

The CI should be able to operate under heavy loading conditions with an
extremely low frame loss rate and efficient BW utilization (increased
reliability, minimized error recovery, high throughput). It must also be
able to provide low latency and low latency variation under heavy loading
conditions.

The primary transport and network layers for utilizing Ethernet in this
space are TCP & IP. There are various developments around the industry
aimed at reducing the latencies and overheads in server stacks, which
will enable server nodes (in the near future) to consume a 10G link
while executing the associated applications (at least in benchmark
applications, which are important selling tools).

I think the primary target link technology for the CI is 10G full duplex
Ethernet. Although support for 1G will be required, I really don't see a
need to include Fast Ethernet. In addition, probably 99% of the
application space for the CI could be addressed with a link length
limitation of 100 meters or less. Also, probably 99% of CIs could be
constructed with no more than 2 to 3 stages of switching and 3 to 4
hops. (Of course, the same could probably be said for a large percentage
of non-client LAN subnets within the typical datacenter).

We know that, with the appropriate rate control mechanisms, we can
control the frame loss and latency characteristics under heavy loading
conditions within the CI class L2 subnets. We also know there is some
throughput tradeoff, depending on the rate control mechanisms employed
(we are testing mechanisms of varying complexity, with some showing
almost no throughput degradation). In addition, we know we can use
prioritization and differentiation to get really good latency & jitter
characteristics on high priority traffic (unfortunately, prioritization
and differentiation aren't always an option).

We have a couple of basic choices for supporting the low latency &
jitter requirements of some applications and traffic: 1) if the
applications and/or the system are implemented to take advantage of
prioritization, the latency & jitter sensitive traffic can be
transported at the appropriate priority for its requirements; 2) if
prioritization is not an option, we can use rate control to constrain
the latency of all the traffic to meet the requirements of the latency
sensitive traffic.
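
As a rough, hypothetical sketch of choice 1 (not anything the task
force has defined), here is a toy strict-priority egress scheduler in
Python; the class count, frame names, and "0 = highest" convention are
just assumptions for the illustration:

import heapq
import itertools

class StrictPriorityScheduler:
    """Toy egress scheduler with N priority classes (0 = highest)."""

    def __init__(self, num_classes=8):
        self.num_classes = num_classes
        self._heap = []                  # entries are (priority, seq, frame)
        self._seq = itertools.count()    # preserves FIFO order within a class

    def enqueue(self, frame, priority):
        assert 0 <= priority < self.num_classes
        heapq.heappush(self._heap, (priority, next(self._seq), frame))

    def dequeue(self):
        """Return the next frame to transmit, or None if the port is idle."""
        if not self._heap:
            return None
        _, _, frame = heapq.heappop(self._heap)
        return frame

# Latency-sensitive cluster traffic at priority 0 jumps ahead of bulk data.
sched = StrictPriorityScheduler()
sched.enqueue("bulk-frame-1", priority=6)
sched.enqueue("rpc-frame-1", priority=0)
assert sched.dequeue() == "rpc-frame-1"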

To utilize prioritization of traffic through sockets over TCP/IP, we
need some form of traffic differentiation service that can be utilized
from applications. This is why it may be important to ensure that what
we do in L2, to meet the CI requirements, will operate in harmony with
DiffServ (or something else). I don't see this as much of a problem
within a CI because one wouldn't put a router in the middle of the CI
subnet. From the CI subnet's point of view, it's all L2 switches and
endpoints. Where things get sticky is where the CI subnet connects into
a larger network through a router or bridge. And, since there may be a
significant overlap between what we define as CI space and what is high
speed datacenter LAN space, one might be inclined to wonder if
datacenter LANs could also benefit from the mechanisms we define to
enable the CI space.
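
As a concrete (and purely illustrative) example of a differentiation
service usable from applications, a socket can already request a DSCP
via the standard IP_TOS option; the Python/Linux call and the EF
codepoint below are just assumptions for the sketch:

import socket

EF_DSCP = 46                      # Expedited Forwarding codepoint (RFC 2598)
tos = EF_DSCP << 2                # DSCP occupies the upper 6 bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
# connect() and send() as usual; outgoing packets now carry the EF codepoint,
# which a CI switch could map to an 802.1p priority for L2 differentiation.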

So I guess the question is, if we utilize rate control and
prioritization to control latencies and frame discards in datacenter
LAN L2 subnets, how will this affect the operation of the ULPs
(negatively or positively)? Rate control will quickly exert
backpressure (within the confines of the L2 subnet) to the upper layers
in response to congestion. If the upper layers are at an endpoint,
traffic should simply back up into the endpoint buffers. If the upper
layer is the IP layer in a router, or IP functionality in an L2+
switch, how will this affect the operation?
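
To make the endpoint case concrete, here is a small hypothetical
sketch (mine, not a simulation result) of backpressure at an endpoint:
a bounded transmit queue whose producer simply blocks when L2 rate
control slows the drain, so no frame is discarded inside the subnet:

import queue
import threading
import time

tx_queue = queue.Queue(maxsize=32)        # endpoint transmit buffer

def upper_layer_send(payload):
    tx_queue.put(payload)                 # blocks (backpressure) when the buffer is full

def rate_controlled_drain(frames_per_sec=1000):
    interval = 1.0 / frames_per_sec       # pacing imposed by L2 rate control
    while True:
        frame = tx_queue.get()            # next frame for the link
        time.sleep(interval)              # stand-in for the paced transmission of frame

threading.Thread(target=rate_controlled_drain, daemon=True).start()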

Thanks,
Gary
-----Original Message-----
From: Jeff Warren [mailto:IEEE@nc.rr.com]
Sent: Thursday, May 13, 2004 11:06 AM
To: McAlpine, Gary L; STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective


Hi Gary,

I agree we can't ignore "how does what we do at L2 impact the ULPs".
Yikes - there are many ULPs, not just DS.

I agree that "we leave the discard decisions up to the ULPs".

I can't tell you exactly how CM will impact DS until I know what CM
does. I am consulting with one of the original DS authors, Steve Blake;
he and I used to work together a few companies ago, when we were both
at IBM and DS was being standardized. We discussed this point this
morning, and the general feeling was that a FDX point-to-point (PTP) L2
Ethernet link should not be making implicit or explicit selective
packet drop decisions, and that doing so would negatively impact the
operation of DS.

If what CM does is to channelize traffic across a PTP link (high or
low BW, short or long distance, inside a chassis across an Ethernet
backplane or externally across a physical copper or optical link) into
'N' Classes (CoS transmission buffers), then when one Class is somehow
determined to be using too much BW and that Class is "turned off" for
some period of time, there is a high likelihood that packets in that
Class will be dropped. When this happens, you've just impaired DS's
ability to do its drop probability calculations properly. More to come
on this as we better understand what CM does.
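
A small hypothetical sketch of that failure mode (buffer depth and
names invented): a per-class FIFO that is gated off fills up, and
further arrivals in that Class are dropped at L2, outside DS's
drop-precedence logic:

from collections import deque

class GatedClassQueue:
    """One CoS transmission buffer that CM can turn off for a while."""

    def __init__(self, depth=16):
        self.buf = deque()
        self.depth = depth
        self.enabled = True      # set to False when CM "turns off" this Class
        self.drops = 0

    def offer(self, frame):
        if len(self.buf) >= self.depth:
            self.drops += 1      # L2 drop, invisible to the DS drop calculation
            return False
        self.buf.append(frame)
        return True

    def service(self):
        if self.enabled and self.buf:
            return self.buf.popleft()
        return None              # Class gated off: nothing is transmitted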

NOTE: 802.1p has already defined an L2 "marker" that is used as a form of
L2 rate control. 802.1p is used by intermediate L2 switches (between
router hops; DS Hops) to provide intermediate L2 class of service
prioritization. Most switch/router products that are worth purchasing
use this feature along with DSCP (L3) to prioritize traffic across their
available ingress & egress ports.
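
For illustration only (the table values are a common vendor default,
not anything mandated by 802.1p or DiffServ), a switch typically maps
DSCP codepoints to 802.1p priority code points (PCP) so intermediate
L2 hops can prioritize without parsing IP headers:

DSCP_TO_PCP = {
    46: 5,   # EF   -> high priority (latency-sensitive traffic)
    34: 4,   # AF41
    26: 3,   # AF31
    18: 2,   # AF21
    10: 1,   # AF11
    0:  0,   # best effort
}

def pcp_for_dscp(dscp, default_pcp=0):
    """Return the 802.1p priority to place in the VLAN tag for this DSCP."""
    return DSCP_TO_PCP.get(dscp, default_pcp)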


Regards,

   - Jeff
----- Original Message -----
From: McAlpine, Gary L
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Sent: Thursday, May 13, 2004 11:48 AM
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective


Jeff,

I am perfectly fine with NOT having such an objective if you think we
can get away with it. I'm not sure we can ignore how what we propose to
do at L2 will affect the operation of the upper layers.

On the subject of DiffServ, I would simply propose that, within the
limited scope of the high speed, short range, L2 interconnects we are
trying to enhance, we leave the discard decisions up to the upper layers
while maintaining the desired latency and traffic differentiation
qualities within layer 2. We have shown through simulations that layer 2
rate control mechanisms can eliminate (or significantly reduce) frame
discards within layer 2 (effectively pushing the discard decision up to
L3). We have also shown that rate control combined with prioritization
in layer 2 can maintain excellent latency and traffic differentiation
qualities.

Maybe you can explain to us how this is likely to affect the operation
of DiffServ. I haven't dug deep enough into DiffServ to know if it is
counting on L2 devices to discard. My intuition is telling me it is
likely to improve the operation of DiffServ, as well as other upper
layer protocols of interest.

Gary




 -----Original Message-----
From: owner-stds-802-3-cm@LISTSERV.IEEE.ORG
[mailto:owner-stds-802-3-cm@LISTSERV.IEEE.ORG] On Behalf Of Jeff Warren
Sent: Wednesday, May 12, 2004 7:58 PM
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Subject: Re: [8023-CMSG] Proposed Upper Layer Compatibility Objective


Gary,

You mentioned improved operation of DiffServ as a goal for CM. DS is a
collection of RFCs; here's the basic set of DS RFCs:
RFC2474 "Definition of the Differentiated Services Field (DS Field) in
the IPv4 and IPv6 Headers"
RFC2475 "An Architecture for Differentiated Services"
RFC2597 "Assured Forwarding PHB Group"
RFC2598 "An Expedited Forwarding PHB"
What did you have in mind for supporting ULPs such as DS? This L3
protocol's purpose in life is to decide drop probabilities for
individual packets on a hop-by-hop basis. For example, assured
forwarding has three levels of drop precedence (red, yellow, green).
Then there's expedited forwarding in the highest priority queue, plus
best effort, etc.
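
For readers who haven't dug into AF, here is a rough, hypothetical
WRED-style sketch of what "deciding drop probabilities" looks like
within one AF class; the thresholds and probabilities are invented,
not taken from RFC 2597:

import random

DROP_PROFILE = {            # (min_fill, max_fill, max_drop_probability)
    "green":  (0.8, 1.0, 0.05),
    "yellow": (0.5, 0.9, 0.30),
    "red":    (0.3, 0.7, 0.80),
}

def should_drop(queue_fill, precedence):
    """queue_fill is the queue occupancy as a fraction of its capacity."""
    lo, hi, max_p = DROP_PROFILE[precedence]
    if queue_fill < lo:
        return False
    if queue_fill >= hi:
        return True
    p = max_p * (queue_fill - lo) / (hi - lo)   # linear ramp between thresholds
    return random.random() < p

# Red-marked packets start dropping while green ones in the same class
# still get through.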

I've been wondering how an "Ethernet" standard is going to assist an
L3 protocol such as DS by classifying MAC traffic. Would you propose
duplication of, or cooperation with, the DS protocols? Maybe you let DS
do its thing and present a CM enabled Ethernet egress port with
remarked packets that are fortunate enough to have passed across the
device's fabric w/o being dropped. This CM enabled Ethernet port would
then align the offered MAC load to the queues it has available, for
example by inspecting DSCP values and comparing them to a predefined
buffer table. And so on.

The lights haven't gone off for me yet. I don't see the value in CM
supporting DS, because switch manufacturers have already figured this
one out and implemented the concept of multiple egress queues working
at line rate. Plus, these implementations require global knowledge of
networking policies to configure them properly; more importantly, to
standardize them we're talking about linking the behavior of ingress
and egress ports and knowledge of system wide (e.g. switch or router)
buffer management capabilities, all very much vendor specific
capabilities that would be very difficult to get everyone to agree on.

Regards,

   - Jeff Warren



----- Original Message -----
From: McAlpine, Gary L
To: STDS-802-3-CM@LISTSERV.IEEE.ORG
Sent: Wednesday, May 12, 2004 5:44 PM
Subject: [8023-CMSG] Proposed Upper Layer Compatibility Objective


My A.R. from last meeting (thank you Ben).
Here's a first shot at an Upper Layer Compatibility objective:
The objective is:
"To define 802.3 congestion control support that, at a minimum, will do
nothing to degrade the operation of existing upper layer protocols and
flow/congestion control mechanisms, but has the explicit goal of
facilitating the improved operation of some existing and emerging
protocols, over 802.3 full-duplex link technology."

If we can narrow the scope and still make it meaningful, I'm all for it.

I have attached RFC3168 (on ECN) as a reference. It contains a very good
overview of congestion control at the TCP and IP layers. I would also
consider this and DiffServ as examples of existing ULPs whose operation
we would want to do our part to improve. IMO, what we can do at the
802.3 level to better support these will also provide the support we
need for improved operation of some emerging protocols such as iSCSI and
RDMA.
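
For anyone who doesn't want to read all of RFC3168, the ECN mechanics
it defines boil down to two bits in the IP TOS/Traffic Class byte.
This little sketch (mine, just for illustration) shows the codepoints
and the mark-instead-of-drop step an ECN-aware router performs, which
is exactly the behavior our L2 work should leave undisturbed:

NOT_ECT = 0b00   # endpoint is not ECN-capable
ECT_1   = 0b01   # ECN-capable transport
ECT_0   = 0b10   # ECN-capable transport
CE      = 0b11   # congestion experienced

def mark_congestion(tos_byte):
    """Return the TOS byte after a congested, ECN-aware hop processes it."""
    ecn = tos_byte & 0b11
    if ecn in (ECT_0, ECT_1):
        return (tos_byte & 0xFC) | CE   # mark instead of dropping
    return tos_byte                     # Not-ECT: the router would drop instead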
Gary <<rfc3168.txt>>