RE: iSCSI: Flow Control

At 04:56 PM 9/21/00 -0700, Jim McGrath wrote:
>While memory may be getting cheaper, latency and transfer rates are getting
>higher. We have gone from 25 m parallel SCSI buses to transcontinental
>TCP/IP connections; from 1 MB/s to 100 MB/s (and greater) transfer rates.
>These combine to make the maximum amount of data in flight that keeps the
>connection full grow much faster than memory cost is declining.
>(Exponential growth rates apply to both memory cost and transmission
>speed; distance also appears to be growing very fast, although perhaps not
>exponentially).
>
>So while your argument works if you keep the fabric size the same and
>increase the transfer rate (as it has been with the ATA interface - buffer
>costs have declined over the years), it does not work if the fabric keeps on
>growing as well.
>
>If a fabric introduces 1 ms (two orders of magnitude less than the worst
>cases I have heard) at Gbit speed, then we need 100 Kbytes of buffer space
>for a connection. We don't have enough buffer to reserve this for all
>possible connections we could get (Fibre Channel designs could not reserve 4
>KByte for a smaller number of potential connections until recently).

Something to think about w.r.t. this problem:

RDMA semantics:

Pros:
- The sender only targets memory that it knows is available to use, and thus
  does not inject more data than the receiver can accept. This mitigates the
  overflow problem.
- End-to-end ULP ACKs provide an implicit credit scheme for the associated
  target resources.

Cons:
- One must "slice" up the target resources among a set of senders, which can
  create scalability problems depending upon the resources required per
  session. This is where SEND semantics have an advantage - one can use
  statistical access to absorb bursts with minimal buffer overflow reserves
  and combine this with the idea described below.
- RDMA support requires additional buffer access / tracking logic within the
  endnode to track the impacted memory. The semantics are not difficult to
  implement, but they add cost to the implementation. Note: SEND semantics
  have DMA chain costs as well, so the actual delta will vary depending upon
  the amount of resources one can effectively map / register at a given time.
- For small messages, RDMA does not always provide a cost/benefit advantage,
  which is why most implementations support both SEND and RDMA semantics.

>Jim
>
>PS if we actually are starting to need windows greater than 64 KBytes, is
>this a problem? My understanding is that deployed TCP/IP products do not
>easily support extremely large windows. This argues for spreading a single
>SCSI command across multiple TCP/IP connections for pipelining to overcome
>latency, not for bandwidth.

Large window support is not difficult to implement and is supported in many
endnodes. However, memory even in large endnodes is still limited and
subject to oversubscription, so if a link cannot replenish its buffers
quickly enough, it drops the incoming packet and the transport's
retransmission / congestion management takes over and adjusts the injection
rate.

The question is whether one would like to implement a WRED (weighted random
early detection - used today in routing elements) type of system within an
endnode (server, storage, etc.) whereby it would drop inbound packets when
resources are tight, based on some criteria of the inbound packet (IP
address, QoS, TCP port, etc.). This would allow the endnode to control which
services should have priority when the workload approaches / exceeds the
available buffer resources. It would also allow one to vary the amount of
"emergency" reserve buffers discussed by others without having to
communicate any of this end-to-end or specify it within the architecture
beyond the interface and drop-value interpretation.
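To make the WRED-style endnode idea concrete, here is a minimal sketch of such a drop decision. The traffic classes, thresholds, and drop probabilities are invented for illustration and appear in no iSCSI specification; a real endnode would derive the class from the packet's IP address, QoS marking, TCP port, and so on:

```python
import random

# Hypothetical WRED-style drop profiles for an endnode (illustrative only).
# Each class gets (min_thresh, max_thresh, max_drop_prob): below min_thresh
# nothing is dropped; between the thresholds the drop probability rises
# linearly up to max_drop_prob; at or above max_thresh everything is dropped.
DROP_PROFILES = {
    "high":   (0.80, 0.95, 0.10),  # e.g. latency-sensitive sessions
    "normal": (0.60, 0.90, 0.50),
    "low":    (0.40, 0.80, 1.00),  # first candidates when buffers are tight
}

def should_drop(occupancy: float, traffic_class: str) -> bool:
    """Decide whether to drop an inbound packet, given the current buffer
    occupancy (0.0 - 1.0) and the packet's classification."""
    min_t, max_t, max_p = DROP_PROFILES[traffic_class]
    if occupancy < min_t:
        return False
    if occupancy >= max_t:
        return True
    drop_prob = max_p * (occupancy - min_t) / (max_t - min_t)
    return random.random() < drop_prob
```

With buffers 70% full, a "low" class packet in this sketch is dropped with probability 0.75, while a "high" class packet is never dropped - which is exactly the "which services keep priority under pressure" behavior described above, enforced entirely inside the endnode.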
I believe there is value in creating policy interfaces to communicate
whether a given connection has any special policies associated with it; one
such policy could be the connection's position in the drop priority list
when circumstances warrant it. The actual policy would be outside of iSCSI
(see the previous e-mail discussions about QoS and policy from this summer
for other areas where a policy interface would have benefit), keeping iSCSI
opaque to the upper layer / application requirements.

Mike
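The per-connection policy interface described in the message could be pictured as follows. This is only a sketch; the names `ConnectionPolicy`, `PolicyTable`, and `drop_priority` are invented for illustration, and the policy values themselves would be supplied by a management / QoS layer outside of iSCSI:

```python
from dataclasses import dataclass

@dataclass
class ConnectionPolicy:
    # Hypothetical policy record attached to a connection at setup time.
    # Higher drop_priority means the connection is sacrificed earlier
    # when buffer resources run short.
    drop_priority: int = 0

class PolicyTable:
    """Endnode-local table consulted when resources are tight; iSCSI
    itself stays opaque to what the policy means."""

    def __init__(self) -> None:
        self._policies: dict[str, ConnectionPolicy] = {}

    def register(self, conn_id: str, policy: ConnectionPolicy) -> None:
        self._policies[conn_id] = policy

    def drop_order(self) -> list[str]:
        # Connections ordered from first-to-drop to last-to-drop.
        return sorted(self._policies,
                      key=lambda c: self._policies[c].drop_priority,
                      reverse=True)
```

For example, registering a bulk-transfer connection with `drop_priority=5` and an interactive one with `drop_priority=1` makes the bulk connection the first candidate for packet drops, without any of this being communicated end-to-end.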