Re: TCP Framing (considered helpful?)
Replies in text below (between [Huff] and [/Huff] ).
.
.
.
John L. Hufferd
Senior Technical Staff Member (STSM)
IBM/SSG San Jose Ca
(408) 256-0403, Tie: 276-0403, eFax: (408) 904-4688
Internet address: hufferd@us.ibm.com
Stephen Bailey <steph@cs.uchicago.edu>@ece.cmu.edu on 05/21/2001 06:50:41 AM
Sent by: owner-ips@ece.cmu.edu
To: ips@ece.cmu.edu
cc:
Subject: Re: TCP Framing (considered helpful?)
John,
> I think we must depend on Markers to insure that everything can operate at
> top speed, and at the lowest cost.
A key question is whether markers actually ensure that everything
operates at `top speed, and at the lowest cost'.
Matt thinks so. I (and, presumably those who wrote the framing
document) think not.
[Huff] I do not think you can say that. I also support framing (Warp) as
a much more elegant solution, but I find it inappropriate to depend on it
actually happening, and being made available in the various OSs, in the time
frame we need. I do believe that over time it will be made available, and that
it is a better approach for all TCP/IP applications that can use it. [/Huff]
My issue is not even with `lowest cost'. I don't believe markers will
allow you to run at top speed. Specifically:
1) I doubt the feasibility of implementing the control required for
an eddy buffer (where you store data you can't place) at 10G.
Admittedly, the validity of this claim can't really be assessed
without actually working the implementation, so for 99% of the
list participants (myself included) this is a `yes it is, no it
isn't' point.
[Huff] I believe this has had much more work done on it than you
think. I have personally stepped through the proposals from
several vendors that are working on this option for their HW HBAs.
Usually, because of the iSCSI PDU headers, the data/commands
can be placed directly into the SCSI Host buffers, almost every
time. Only when the PDU headers arrive slightly out of order
(due to normal routing) are the packets unable to be placed
directly into the Host buffer. And that requires some, but only
a small amount, of buffering space.
It is the packet drops that occur on PDU headers, and resultant
error retries, that cause the need for large amounts of "on
HBA/chip" buffering.
So by using Markers, these HW iSCSI HBAs can limit the amount of
buffering on the chip/HBAs. [/Huff]
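To make the marker argument concrete, here is a minimal sketch of how a receiver could resynchronize after losing a segment that carried a PDU header. The layout is an assumption for illustration only: a 4-byte marker sits at every multiple of a fixed interval in the TCP stream and holds the distance to the next PDU header. The interval value and the names next_marker_offset and resync_point are hypothetical and are not taken from any iSCSI draft.

```python
from typing import Optional


def next_marker_offset(stream_offset: int, interval: int) -> int:
    """Smallest marker position >= stream_offset.

    Assumes (hypothetically) that markers sit at multiples of `interval`.
    """
    return -(-stream_offset // interval) * interval


def resync_point(segment_start: int, segment: bytes, interval: int,
                 marker_size: int = 4) -> Optional[int]:
    """Absolute stream offset of the next PDU header, or None.

    Reads the first complete marker inside `segment`. The marker is assumed
    to be a big-endian count of bytes from the end of the marker to the
    start of the next PDU header.
    """
    pos = next_marker_offset(segment_start, interval)
    rel = pos - segment_start
    if rel + marker_size > len(segment):
        return None  # no complete marker in this segment; keep buffering
    pointer = int.from_bytes(segment[rel:rel + marker_size], "big")
    return pos + marker_size + pointer
```

With something like this, the data that has to be held on the HBA after a dropped header is bounded by roughly one marker interval, rather than by everything that arrives until the retransmission.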
2) an eddy buffer solution requires some substantial speed-up in
both the NIC data path, and MOST IMPORTANTLY: the host bus. In
order to unload the eddy buffer while still handling incoming
traffic at line rate, clearly the host bus bandwidth must be >
line rate.
[Huff] This is not an effect of an eddy buffer solution; it is a
fact that every TCP/IP NIC has to deal with, especially at the
new speeds. Our current PCI bus will not support 10 Gigabit, and
PCI-X will not support it either; even PCI-DDR does not fully support
the full data rate. So the NIC needs to rely on the TCP/IP window
management. The only other thing you can do is drop the packets,
which clearly makes the problem worse. [/Huff]
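A rough back-of-the-envelope check of the bus numbers, as a sketch only. The widths and clock rates below are assumed nominal values (64-bit PCI at 66 MHz, 64-bit PCI-X at 133 MHz, and a double-data-rate variant of the latter standing in for "PCI-DDR"), and the results are theoretical peaks for a shared bus, not sustained throughput.

```python
def peak_gbps(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bus bandwidth in Gbit/s (ignores all protocol overhead)."""
    return width_bits * clock_mhz * transfers_per_clock / 1000.0


# Assumed nominal bus parameters; these are illustrative, not measured.
BUSES = {
    "PCI 64-bit/66 MHz": peak_gbps(64, 66),
    "PCI-X 64-bit/133 MHz": peak_gbps(64, 133),
    "PCI-X DDR (assumed)": peak_gbps(64, 133, transfers_per_clock=2),
}

LINE_RATE_GBPS = 10.0                    # one direction of a 10G link
FULL_DUPLEX_GBPS = 2 * LINE_RATE_GBPS    # both directions share the host bus

if __name__ == "__main__":
    for name, gbps in BUSES.items():
        print(f"{name}: {gbps:.3f} Gbit/s peak, "
              f"covers one direction: {gbps >= LINE_RATE_GBPS}, "
              f"covers full duplex: {gbps >= FULL_DUPLEX_GBPS}")
```

Under these assumptions, only the double-data-rate case exceeds one direction of a 10G link, and none of them covers both directions at once, which is consistent with leaning on TCP window management instead of the bus.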
I know of at least one general purpose framed solution operating at
10G which has been available for >3 years (SGI's GSN/ST/XIO NIC). I'm
sure there are others.
I can't imagine there's any argument that a framed solution would be
voted `most likely to run fast and be cheap'. Every storage network
and cluster interconnect has been designed that way since antiquity.
The key tradeoff involves the OS vendors, and I'm wondering why we're
speaking for them. The question IS, how much more work is it to
introduce TCP framing over and above what is required to insert iSCSI
into their network framework. My experience from writing NIC and
storage drivers for many commercial UNIX-family OSes is:
1) it's an easy and well defined process to insert a new SCSI
transport driver into the SCSI stack.
2) it's a hard and poorly defined process to insert ANYTHING into the
network stack.
[Huff] I think you are making my point. This is the problem with SW
Stacks. That is why I believe that it will take a very long time for
the various vendors to include such changes into their "bet your business"
TCP/IP SW Stacks. The point that Matt and I have been trying to make
is that most OS vendors are NOT creating the iSCSI HW HBAs (NICs).
These iSCSI HW HBAs (NICs) have the TCP/IP completely on the HBA, and
they have added the iSCSI processing also so that they can steer the
packets directly into the appropriate SCSI Host buffers. Adding either
Markers or Framing into the iSCSI HW HBAs is not a big problem. It is
only a problem of getting Framing into Host TCP/IP Stacks in a timely way.
[/Huff]
Networking has historically been a user-mode activity. Architected
services are only provided to user mode programs. Kernel clients have
been few and far between and so are handled on a case-by-case basis.
Take NFS, for example. Every OS has hacks to make NFS run fast, but they
are not stable interfaces for general purpose use.
Even Solaris' SysV-derived STREAMS stack, which is intended precisely
to provide flexible, crisp interfaces for kernel network clients, does
not document the relevant (IP stack) intermodule interfaces.
I know that there are more and more kernel network clients, but they
are coming either on fluid platforms (e.g. linux), in which case the
argument of `it'll take too long to get OS support' doesn't apply, or
they are vendor-supplied, in which case a performance iSCSI solution
in ANY form may take a while, and the choice of framing or markers
isn't going to make a difference.
[Huff] I think you are saying something I agree with and something I
do not agree with. I agree that software changes to TCP/IP in the
various "Bet your Business" OSs will take some time. However, it is
not true that new iSCSI device drivers will take very long. Two types
are being created today by Cisco, IBM, Intel, etc.: iSCSI DDs that
make calls to normal TCP/IP stacks, and DDs that are being written
by the iSCSI HW HBA vendors. These do not require
the OS vendor to do anything special. This is happening NOW
(check with Cisco, Intel, and IBM (me?)). The last thing we want
is for a dependence on a TCP/IP change to get in the way of our
momentum. [/Huff]
I can't say squat about the architecture of Winsock, but the fact that
there is a Microsoft author of the framing proposal who seems very
serious about supporting framing and RDMA as quickly as possible
suggests that framing support should be available on Windows very
soon.
[Huff] My following statements are not meant as a criticism of Microsoft.
However, they, like all producers of key, complicated new software, do
not bring it to the general market as quickly as HW vendors would
like.
I believe that Microsoft's heart is in the right place on this issue,
and that they will do the right thing with framing, over time.
But it is not clear in what release that will be shipped, nor in what support
pack it will be included. Also, it is not clear how the support
will be handled for current Win2k, WinNT, etc.
This is why I think we should have Framing a Must implement
and an Optional to use. It is the easiest thing for SW to
create, and brings the needed cost reduction to iSCSI HW and
it is completely under our (iSCSI protocol) control.
[/Huff]
Steph