Re: Connection Consensus Progress

Sorry this is a little late; I haven't had a chance to send email in a couple of days.

> (B) Should iSCSI have a session abstraction that
> binds multiple TCP connections into one
> iSCSI connection?

You already know this, but I'd say no.

> R1) Parallel transfers to/from and failover support for
> tape devices. In contrast to disks, multiple SCSI
> connections to the same tape do not work (e.g.,
> blocks can be written in the wrong order).

I'd like to hear from a tape guru who believes that this a) is important and b) will work. My limited experience with tape is that neither is the case. The tape drivers I have dug into use only a single SCSI command at a time and rely on read-ahead and write-behind buffering in the device to keep performance up. Assuming that is the case, the performance portion of R1) is subsumed by R2) (parallelism for a single SCSI data transfer across multiple links), and the failover support is equivalent to R4). Plus:

> R1) and R2) are beyond the capabilities of existing SCSI-
> based systems (note that a parallel bus is a single link).

iSCSI is hard enough as it is; I don't see the point of making it harder just to provide a capability whose wide applicability has not yet been proven.

> R2) Obtaining parallelism for a single SCSI command
> across multiple transport connections using
> different physical links.

As I have mentioned before, I believe that physical link speeds will increase at a more than adequate rate, so even if this feature is designed in, it will not be widely used. We have already seen a huge acceleration in the rate at which faster links are arriving, and iSCSI (plus hardware TCP or equivalent) will only increase that rate.

I also think multiple adapters/connections per session will be incapable of delivering better performance in common circumstances. One reason is that in order to get good throughput on a link, you need the operation to be large enough to a) mask fixed processing latencies and b) keep sufficient credit outstanding on each link to mask the latency of returning additional credit. If you are using N links, your minimum optimal SCSI operation may be up to N times as large. The full N-times-as-large case only occurs when there is ONLY a single SCSI op outstanding at a time (the tape case), because none of the network latencies will be masked by previous and subsequent operations. If, in the typical case, there are multiple outstanding operations, the minimum optimal SCSI operation will not be N times as large, but it will still need to be larger than in the single-link case because of whatever critical-path overhead comes from processing N times as many credit flows.
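To put rough numbers on the credit argument, here's a quick back-of-the-envelope program. The numbers are assumptions for illustration only (a ~1 Gb/s link carrying ~100 MB/s of payload, a 200 us round trip), not measurements from any particular product; the point is just that the data a lone outstanding command must supply scales with the number of links it is striped across.

    #include <stdio.h>

    /* Back-of-the-envelope only: assumed numbers, not measurements.
     * To keep a link busy you need at least bandwidth x RTT bytes in
     * flight on it; stripe one command over N links and that single
     * command has to supply N times that much. */
    int main(void)
    {
        const double link_bw  = 100e6;          /* ~1 Gb/s payload, bytes/s */
        const double rtt      = 200e-6;         /* 200 us round trip, short link */
        const double per_link = link_bw * rtt;  /* bandwidth-delay product */

        for (int nlinks = 1; nlinks <= 4; nlinks++)
            printf("%d link(s): need >= %.0f KB from a lone command\n",
                   nlinks, nlinks * per_link / 1024.0);
        return 0;
    }

With those assumed numbers, a lone command needs roughly 20 KB in flight to keep one short link busy and roughly 80 KB to keep four busy, which is already more than many file systems will hand down in a single operation.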
My experience with current FC targets and various OS initiators is that the size of single SCSI operations from a typical file system is already on the small side for a single short gigabit link. The typical operation size is usually somewhat immutable for a particular OS; it's usually wedded to fundamental memory-management design decisions. We've been on the wrong side of the `if only the OS would give me bigger operations, we could really kick ass' argument enough times that hoping for that seems like a fool's game. OS initiators ARE capable of generating lots of concurrent transfer demand, but it's usually with more outstanding commands rather than fewer, larger ones. See R5) below.

iSCSI is intended to work on networks with larger latencies (i.e., physically bigger networks) than the current batch of storage technologies, so the link latency effects will become even more pronounced than is commonly expected now. We have seen substantial overall performance degradation on FC running at 40 km [contrary to the Pittsburgh meeting minutes, Finisar makes FC transceivers that go 40+ km, and maybe other companies do too], even with a large pool of link credits, because of inadequate transfer demand to mask the link latency.
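The same arithmetic gives a feel for why the 40 km case hurts. The constants below are assumptions for illustration, not measurements: roughly 5 us/km of propagation each way in fiber, ~100 MB/s of payload on 1 Gb/s FC, 2 KB full-size frames.

    #include <stdio.h>

    /* Assumed constants for illustration: ~5 us/km of propagation each way
     * in fiber, ~100 MB/s of payload on 1 Gb/s FC, 2 KB full-size frames. */
    int main(void)
    {
        const double km        = 40.0;
        const double rtt       = 2.0 * km * 5e-6;  /* ~400 us round trip */
        const double link_bw   = 100e6;            /* bytes/s */
        const double in_flight = link_bw * rtt;    /* bytes needed in flight */
        const double frame     = 2048.0;

        printf("RTT ~%.0f us: ~%.0f KB (~%.0f frames of BB credit) to fill the pipe\n",
               rtt * 1e6, in_flight / 1024.0, in_flight / frame);
        return 0;
    }

Roughly 400 us of round trip means something like 40 KB, about twenty full-size frames of buffer-to-buffer credit, has to be in flight just to keep one such link full; offer less transfer demand than that and the link simply idles, which is consistent with the degradation described above.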
Finally, the `iSCSI is hard enough without tackling additional capabilities of unproven merit' argument applies here too.

> R3) Obtaining parallelism for a single SCSI command
> across multiple transport connections using the
> same physical links.
>
> R3) needs more explanation, as TCP is known to be able
> to saturate Gigabit Ethernet, given enough data to
> transfer. Is the argument for R3) that for the
> transfer sizes likely to be seen in iSCSI, TCP
> spends enough of its time in slow start and the
> like that multiple TCP connections gain performance?

My hunch is that doing this is horribly poor network citizenship. If there is a way to get more performance out of a single end-to-end connection, it is the transport's (TCP's) responsibility to get it. Running multiple connections to end-run TCP's congestion avoidance algorithms has the potential either to slow everybody down or to make the network unstable (which will certainly slow everybody down too). For that reason, I would suggest that iSCSI should categorically prohibit this behavior. If you want to live by the sword (operate well on a general network), you have to die by the sword (put up with the inefficiencies required to keep the network healthy).

> R4) Optimize failure handling, so that a single TCP
> connection loss doesn't immediately translate
> into a SCSI error visible to higher level
> (time-consuming) recovery logic.

This seems like a straw man, for several reasons.

First, this requirement suggests that the SCSI layer is not well adapted to handle errors. A major part of any SCSI layer is devoted to error handling. However, SCSI layers usually assume that the low-level driver will make allowances for media-specific conditions. The big problem with non-fatal FC conditions causing fatal SCSI errors was inadequate FC-layer engineering. Early FC drivers badly abused the hospitality of the upper SCSI layers. For example, an event like a LIP (or any other link-level event) typically had some finite duration and was directly detectable by the driver, so a stupid driver would detect the link failure and immediately return the SCSI operation with a retriable error code. The retry would come back to the FC driver, which would observe that the link was still down and fail the operation retriably again. This would burn through the retry count instantly and result in a hard error. More subtle was when a LIP caused other nodes to LIP themselves, at some substantial interval later, often to work around implementation bugs (can you say Tachyon?). This would lead to many link up/down transitions in a short period of time. This is not a hard problem to solve, but many early driver writers did not contemplate how horrible it was going to be out there on the loop. One very large company even went so far as to say that FC-AL could never be implemented reliably and that the only solution was to make all of their FC fabric, just because they got surprised by the LIP storms.

A connection drop in iSCSI is essentially a `media' event, and an iSCSI driver should not immediately fail subsequent operations to the addressed target without attempting to reestablish the connection first. We make this same assumption in SST. In fact, SST goes so far as to specify that blowing away a connection, by either end, is a perfectly acceptable and expected error recovery strategy for some infrequent non-nominal conditions.
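To illustrate the kind of driver behavior I mean, here is a minimal sketch. The names, the 30-second window, and the stubs are all hypothetical; this is not any real driver or iSCSI API, only the shape of the decision: hold the command and try to reconnect instead of bouncing a retriable error straight back at the SCSI layer.

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Sketch only: hypothetical names, no real driver or iSCSI API.
     * Treat a dropped transport connection as a media event, not as an
     * instant, retriable SCSI failure. */

    struct scsi_cmd { int tag; };
    struct conn     { bool up; time_t went_down; };

    #define RECONNECT_WINDOW 30                 /* seconds to keep trying */

    static bool try_reconnect(struct conn *c)   /* stub: pretend it failed */
    {
        (void)c;
        return false;
    }

    static void hold_for_resend(struct scsi_cmd *cmd)  /* keep it in the driver */
    {
        printf("cmd %d held; will resend once the connection is back\n", cmd->tag);
    }

    static void fail_retriable(struct scsi_cmd *cmd)   /* hand back to SCSI */
    {
        printf("cmd %d failed up to SCSI after the reconnect window expired\n",
               cmd->tag);
    }

    /* Called when a command cannot be delivered because the connection dropped.
     * The naive driver fails the command retriably at once; the SCSI layer
     * retries, hits the same dead link, and burns its whole retry count in
     * microseconds.  Instead: hold the command, try to reestablish, and only
     * surface an error after a real interval has passed. */
    static void handle_transport_drop(struct conn *c, struct scsi_cmd *cmd)
    {
        if (c->up || try_reconnect(c)) {
            c->up = true;
            hold_for_resend(cmd);
            return;
        }
        if (time(NULL) - c->went_down < RECONNECT_WINDOW) {
            hold_for_resend(cmd);          /* reconnect still in progress */
            return;
        }
        fail_retriable(cmd);               /* genuinely gone: let SCSI recover */
    }

    int main(void)
    {
        struct conn c = { .up = false, .went_down = time(NULL) };
        struct scsi_cmd cmd = { .tag = 1 };
        handle_transport_drop(&c, &cmd);   /* held: still inside the window */
        c.went_down -= RECONNECT_WINDOW + 1;
        handle_transport_drop(&c, &cmd);   /* now failed up to the SCSI layer */
        return 0;
    }

The window length and the bookkeeping are obviously implementation details; the point is only that the retry decision is made against time, not against how fast the SCSI layer can bounce the command back at a dead link.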
Second, I do not believe multiple connections will work effectively to handle errors that cannot be handled with appropriate connection-failure recovery strategies. There are actually two cases. The first is a single interface with multiple connections (which I have already suggested should be outlawed, in response to R3). In this case, when one connection fails, so will the other. The second is multiple interfaces, each with a single connection. In this case, the broken connection must be discovered before any form of recovery can occur for the transfers on it. Having multiple open connections does not shorten the critical path for that recovery, so supporting multiple connections per iSCSI session cannot satisfy this requirement.

> R5) Obtaining parallelism between multiple SCSI commands
> across multiple transport connections using
> different physical links.

I do not see that this offers anything that cannot be achieved with multiple iSCSI sessions using different physical links. The only thing it seems to offer, potentially, is link aggregation in the case where all commands are sent to the target with the ordered queue attribute instead of simple queue. I've never seen this happen. Has anybody else? Disk drivers use simple queue when they don't care about order, and some form of synchronous behavior (unqueued, or just sending one command at a time) when they do. If the commands are simple queue, it doesn't matter whether they're sent in a single session or in multiple sessions. The tape case is discussed under R1).

> Those against should check that none of R1-R4 are important enough
> to be requirements.

I have also argued that, in some cases, multiple connections per iSCSI session would not even be capable of satisfying the requirements effectively.

Don't get me wrong, I'm not arguing that link aggregation is a bad thing. It would be great if somehow (magically) it just worked. It would be a nice selling point for iSCSI whether or not it was actually widely used. I AM arguing that any straightforward proposal is unlikely to deliver on the promise, for reasons that are beyond the control of the iSCSI standard. And more complexity in the standard will slow down its deployment.

Steph