RE: A Transport Protocol Without ACK

Doug,

I agree that FCP offers some instructive ideas. I would like to decouple the allocation of initial credits from the login process (as per a previous message) and allow the target to allocate them truly dynamically on a per-initiator basis.

On the general BB credit model, the only real issue I have is that in FC the credits are for frames (in FC often 2K bytes), not bytes. I agree that commands can just be stored by the target like any other data, but there is a big difference in size between a command frame and a user data frame. We don't need byte-level granularity, but keeping the "credit unit" to something like 512 bytes (or smaller) would allow for more efficient target memory management at modest controller complexity. Note you could still send things like 2K byte payloads; you just end up using four 512-byte credits rather than a single frame credit. It was the coupling of 1 credit per frame, and then the need for large frames for efficient bus utilization, that got us into trouble.

In FC, one objection to making a lot more smaller credits is the number of primitive tokens you would have to send (since each mapped to one credit), but here we need control packets anyway, so we can free up and use credits in bunches rather than individually (similar to what is done in FCP).

In theory an initiator could send down multiple commands and then start sending down data sort of randomly between the commands, creating potential starvation issues. But no initiator that I know of does anything like that. I've never seen one that will send down some write data, jump to another command and send data, and then go back to the first command (maybe someone else has?). As long as the amount of data you can send with credits is smaller than the TCP window size, you should never get starvation as far as I can tell (am I missing something?).
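As a rough illustration of the credit accounting described above, here is a minimal Python sketch. It is only an assumption about how a target might track 512-byte credit units per initiator; the names `CreditPool`, `try_send`, and `release` are hypothetical and do not come from FC, FCP, or any iSCSI draft.

```python
CREDIT_UNIT = 512  # bytes per credit (the unit size suggested in the message)


class CreditPool:
    """Hypothetical target-side pool of credits granted to one initiator."""

    def __init__(self, total_credits):
        self.available = total_credits

    @staticmethod
    def credits_needed(payload_bytes):
        # Round up: a partially filled unit still occupies a whole credit.
        return (payload_bytes + CREDIT_UNIT - 1) // CREDIT_UNIT

    def try_send(self, payload_bytes):
        """Consume credits for one payload; return False if it has to wait."""
        need = self.credits_needed(payload_bytes)
        if need > self.available:
            return False
        self.available -= need
        return True

    def release(self, credits):
        """Return a batch of credits, e.g. as carried by one control packet."""
        self.available += credits


pool = CreditPool(total_credits=8)
print(CreditPool.credits_needed(2048))  # a 2K payload costs four 512-byte credits
print(pool.try_send(2048), pool.available)
print(pool.try_send(64), pool.available)    # a small command frame costs one credit
print(pool.try_send(2048), pool.available)  # refused: not enough credits left
pool.release(5)                             # credits freed in a bunch
print(pool.try_send(2048), pool.available)
```

The point of the small unit is visible in the third call: a short command frame consumes one credit instead of tying up a whole 2K frame buffer.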
Jim

-----Original Message-----
From: Douglas Otis [mailto:dotis@sanlight.net]
Sent: Wednesday, September 20, 2000 12:17 AM
To: Jim McGrath; 'Randall Stewart'
Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
Subject: RE: A Transport Protocol Without ACK

Jim,

I understand a desire to stick with what works. Regardless of the IP transport, the traffic to each LUN will require some other flow control mechanism. Yes, we could re-invent the wheel as it applies to SCSI, but if you examine FC Class 3 FCP, you will see an appropriate flow control mechanism in place. It uses Buffer-to-Buffer credit tokens generated by comma frame delimiters. These frame delimiters are defined within the FC-encapsulation documentation. Simple, direct and easy. See http://search.ietf.org/internet-drafts/draft-otis-fc-sctp-ip-00.txt. To facilitate processing frame delimiters within software, both could be presented before the frame rather than as shown in the rough draft. Perhaps even a Null CRC option could be added if one trusts the SCTP checksum for software implementations.

Once flow control is in place, there is little need for extending command tags, CRN, or anything associated with the LUN, as these structures are then independent of transport bandwidth. I doubt there is a great benefit in having more than 256 commands pending on a nexus. Expanding any field only makes converting to a drive interface stateful, difficult and far less reliable. With the FCP flow control mechanism, T10 does not need to redefine SCSI for initiators that overwhelm the target. The target would have adequate control of resources.

Should IP-SCSI be driven by controller design? Caching, volume management, reservations, and nearly every feature offered by a controller is significantly reduced in value should the controller be placed next to the drive.
If you are in a facility 35 miles from a location holding drives, you may find 50 miles of fiber traversed, creating some 800+ microseconds of round-trip time simply due to the speed of light. You may shudder to think about any NIC buffer. Where would you want the controller and where would you want the drive? The controller must remain on the client side of the network. As such, drive design should steer the IP-SCSI standard. At least with FCP, the drive manufacturers have already spoken. Those making controllers will just have to make more of them and develop controller locking protocols should this controller be part of a remote cluster.

If you examine FCP documentation, you will find that you can send data with the command as an option. You can also send the response at the end of data as an option. Every vital feature used to justify tossing FCP structures becomes moot. Should just an 8M byte FIFO buffer be placed between an IP agent and a FC agent, as much as 65 milliseconds of latency can be created. Merely this additional latency will greatly facilitate rate differences between these two agents. FCP flow control and burst limits could easily finish the task.

You speak of TCP as a proven technology, but TCP is not being suggested for IP-SCSI. TCP with some other mechanism is used to solve ills created by a persistent single byte stream. This is not proven technology, nor likely to function properly without major tweaking. At least if you wish to have a hand at creating a suitable API for multi-object-streams far and away more suitable for SCSI, now is the time. Perhaps either Randall Stewart's U-SCTP or a stale frame timer should be added to prevent overlapping retry mechanisms if this protocol is used as a bridge to FC.

As far as the configuration effort, convert these requirements into LDAP structures. This would allow a single database to manage all aspects of configuration. Stuffing this information across the transport only weakens security.
That is a bad idea, and it makes deciding who manages difficult. Networks will always have a means to identify equipment in some binary fashion, and LDAP and DHCP servers combine this information into meaningful structures with meaningful names. All values required for the various transport layers would be derived from these standard servers. As far as what to do with Stream 0: revision negotiations, FC-domain mapping, and SRC-DST filtering done in purely binary form would be the best means of getting equipment to accept commands without a high overhead. The equipment does not care what the binary number represents. As far as a clever means of doing remote DNS, SCTP has that covered. Again, this information comes from an LDAP server accessed by the driver and not the SAM interface or the SCSI transport layer.

Yes, there are many options within FC that should be excluded. If FCP structures can be used, perhaps while holding one's nose, they should be. There are far too many benefits for doing so, and too few benefits for not. In the end, a better product would have a common set of structures to speak SAN. If you wish to make round wheels out of square blocks, don't let me stop you. I think I see a set of wheels already.

Doug

> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> Jim McGrath
> Sent: Tuesday, September 19, 2000 4:05 PM
> To: 'Randall Stewart'; Jim McGrath
> Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
> Subject: RE: A Transport Protocol Without ACK
>
> Actually the burden of proof issue is why I suggest we look at some things that are actually being used today (since you don't have to guess how they behave).
> That is one of TCP's great strengths, and a bit of a weakness for SCTP (no offense to SCTP supporters, but it certainly does not have a big and long "track record" yet, and so I can understand the concerns others may have as to whether things would work out as well in practice as they do in proposal).
>
> Jim
>
> PS Personally, I'm a big believer in copying stuff that works, making the minimum amount of required changes, and then doing a rapid but controlled deployment (I've been involved in a lot of those sorts of things in ATA and SCSI). Having been involved in both these sorts of endeavors and the opposite (big, clean sheet of paper efforts, like 1394 (no offense to 1394/Firewire supporters, but I was working on it a decade ago)), I know how easy it is to underestimate the work required by the latter, and to be turned off by the "inelegance" of the former. For me, life has become too short - I'm willing to accept inelegance as the price for speed of deployment.
>
> -----Original Message-----
> From: Randall Stewart [mailto:rrs@cisco.com]
> Sent: Tuesday, September 19, 2000 4:38 AM
> To: Jim McGrath
> Cc: 'Y P Cheng'; 'Ips@Ece. Cmu. Edu'
> Subject: Re: A Transport Protocol Without ACK
>
> Jim:
>
> Any transport protocol proposal is ok. As long as it can be seen and reviewed. So far I have seen only two, TCP and SCTP.
>
> Oh, a little side note, any transport protocol proposed MUST be able to show TCP like behavior in the face of congestion. And I think, IMHO, that this means that if it is NOT using RFC2581 procedures it MUST show that it does backoff and share with TCP. It also has a HEAVY burden of proof to show this facility, at least in my mind and I would think in the IESG's mind as well...
>
> R
>
> Jim McGrath wrote:
>
> > I would expand your search to include non standard protocols (i.e. proprietary ones) as well if they offered something and were adequately understood by the outside world.
> > We do that in storage quite a lot - indeed, some standard protocols are direct descendants of what were once proprietary protocols (e.g. ATA, the most widely used desktop disk interface, and ESCON, a dominant mainframe class interface (both of which originated from IBM proprietary technologies)).
> >
> > Jim
> >
> > -----Original Message-----
> > From: Y P Cheng [mailto:ycheng@advansys.com]
> > Sent: Monday, September 18, 2000 5:52 PM
> > To: 'Ips@Ece. Cmu. Edu'
> > Subject: RE: A Transport Protocol Without ACK
> >
> > From: randall@stewart.chicago.il.us
> > > I see no viable transport protocol here and I don't see this conversation of any use unless you get exact details AND point to a internet draft that defines EXACTLY how it works (or possibly some other standards document).
> >
> > Both I2O and VI are transport protocols which define the format of a request to a transport service provider, i.e. an adapter card. I2O is used but not limited to deliver SCSI requests and VI is used for any payload including IP packets. VI is mapped into FC with the device headers between the FC header and data payload. VI can certainly be used for delivery of SCSI requests too. Both protocols require the service provider to have reliable delivery and reception. VI defines different QoS.
> >
> > > > I don't claim any credit about this transport layer protocol. Every fibre channel and Infiniband adapter designer knows about this protocol -- although there is no standard. I am sure the TCP accelerator card is doing the same. This protocol is a great alternative to the use of TCP/IP and should be incorporated into iSCSI.
> > >
> > > No it is not. You are not offering an alternative yet..
> >
> > I did not imply iSCSI should use I2O or VI. In fact, the purpose of iSCSI is to map SCSI requests into IP packets as well as to define the delivery.
> > It seems to me that the working group has set its mind on TCP/IP and believes this is the only solution. The consensus seems to be that any other solution that addresses flow control and congestion would end up like TCP/IP. I am simply pointing out that if we keep an iSCSI request as a single atomic transaction, without separating it into the TCP/IP-stream-oriented Writes and Reads that each deal with a single DU, then the deadlock problem goes away. While the working group thinks we should take advantage of the flow control and congestion management of TCP/IP, there are alternatives known as BB-credit and EE-credit management. The fibre channel adapters perform reliable delivery, lost packet detection, and retransmission without TCP/IP.
> >
> > Randall, you are right, I did not spend time to provide the working group a draft defining such a transaction-oriented protocol. All I have provided is the idea that, besides TCP/IP, the designers of SCSI and fibre channel adapters have solved the head-of-queue blocking, congestion, and retransmission problems. The transaction-oriented WRITE-REQUEST and READ-RESPONSE, in my humble opinion, allow us to implement iSCSI more simply than WRITE and READ stream requests. The performance cost of requiring ACKs on every DU with size greater than MTU on a network with long latency is very high. Defining a greater ACK granularity is an attempt to solve this performance problem. If we do wish to ACK on every DU, then, on a long latency network, we must have a method to stream the PDUs to ensure the performance. The method should not consume a large amount of memory space. One should never ignore the TCP/IP memory-to-memory copy overhead when the backbone will be running at OC-192 speed in the near future. Finally, please don't ever ask two NIC cards to synchronize with each other.
> > It is really hard to do, as those of us in the business of designing NIC cards can testify.
> >
> > Y.P. Cheng, CTO, ConnectCom Solutions Corp.
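
To put a rough number on the per-DU ACK cost raised in Y.P. Cheng's quoted message, the following Python sketch computes the usual bound for a sender that must wait for an ACK before reusing a slot: at most `outstanding * du_bytes / rtt` bytes per second. Everything here is an assumption for illustration: the function name `max_throughput` is hypothetical, the 64 KiB DU size is arbitrary, the 800 microsecond round trip is borrowed from the 50-mile fiber figure earlier in the thread, and the link cap of 1.2e9 bytes/s only approximates an OC-192 line rate of about 10 Gbit/s.

```python
def max_throughput(du_bytes, rtt_seconds, outstanding=1, link_bytes_per_s=None):
    """Upper bound (bytes/s) when each DU is ACKed before its slot is reused."""
    rate = outstanding * du_bytes / rtt_seconds
    if link_bytes_per_s is not None:
        # The link itself caps whatever the ACK window would allow.
        rate = min(rate, link_bytes_per_s)
    return rate


RTT = 800e-6           # seconds, assumed round trip over ~50 miles of fiber
DU = 64 * 1024         # bytes, assumed data unit size
OC192_APPROX = 1.2e9   # bytes/s, rough stand-in for ~10 Gbit/s

for n in (1, 4, 16):
    mbps = max_throughput(DU, RTT, outstanding=n, link_bytes_per_s=OC192_APPROX) / 1e6
    print(f"{n:2d} outstanding DU(s): at most {mbps:.1f} MB/s")
```

Under these assumptions, a single outstanding DU leaves most of such a link idle, which is the motivation for either streaming several PDUs or acknowledging at a coarser granularity.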