Re: Notes of 06/21 meeting

At 12:04 AM 6/29/00 -0700, Costa Sapuntzakis wrote:
>It was pointed out that the command reference number
>as spec'ed was not long-lived enough to provide error
>recovery. However, the task tag could be used to do
>error recovery.

This is why it should be at least a 32-bit value and possibly a 48-bit
value. It should also be carried on all commands to simplify the
problems described later: multiple TCP op support; the ability to deal
with overflows (only receive what is within the window of supported ops
and let a SACK-like error recovery deal with anything lost); simpler
hardware (the value is always present and can be used to retain ordering
without much overhead); simpler mirroring, since one can immediately
forward the ops in the order the initiator wanted without stalls; etc.

>--------------
>
>There was then a discussion about whether the command
>reference number should be per LU or per session.

Use a large value per session and then it does not matter.

>There was a lot of talk about whether we want
>to support multiple TCP connections/session.
>
>John Hufferd pointed out that SCSI load balancers already exist
>that take advantage of multiple sessions (multiple SCSI busses)
>to stripe commands to a target. He argued that multiple
>TCP connections are unnecessary. He also argued that no applications
>make effective use of the SCSI ORDERED attribute, because the
>interfaces are not there.

A very simple implementation can be built with multiple TCP connections
per session. With the command reference number always sent on each
operation, the start / stop problem is mitigated because one is
receiving / processing the operations in the order they were received.
In addition, one can develop the hooks that separate specs provide for
arbitration policies, QoS, etc. to deal with different link bandwidth
and other attributes.

>However, they have to stop and wait for ordered commands.
>One application where stop and wait hurts is tape (where
>all writes are ordered), so some tape applications write
>self-describing blocks to tape which can be written in any order.
>
>Remote asynchronous mirroring can also be done with ordered
>writes. Hufferd argued that remote asynchronous mirroring must
>be solved at a higher layer and is being solved today.

This is not that difficult to do with what I described above.

>Most of those arguing for multiple TCP connections said that
> - it isn't that hard
> - it would make iSCSI better than other SCSI transports
> - it would make high-perf apps easier to write

Add in
 - Multi-path support is much easier to implement
 - Higher performance can be achieved
 - Implementations are fairly simple - minimal state
 - Application-transparent ability to take advantage of / recover
   from hot-plug / removal of fabric components

>-------------
>Deadlock:
>
>Luciano pointed out that it is possible to run out of
>buffers and deadlock with multiple TCP connections.
>
>The source of the problem is
> 1) receiving too many out-of-order commands
> 2) receiving too much unsolicited (immediate)
>    data
>
>The solution to 1) is to either
> - limit the number of out-of-order commands that
>   are read from each TCP pipe to 1 (requires NIC
>   to know that command is out-of-order) and then
>   stop reading from the connections (deskewing)
> - have a windowing mechanism on the command
>   ordering queue in target
> - have a separate TCP pipe for emergency
>   recovery commands
> - Nuspeed aborts command with SCSI status TASK QUEUE FULL
>
>The consensus seems to have resulted in windowing
>being adopted.

The NIC does not have to track this per se. If the NIC has the SGL for
the target buffer, it can perform the DMA. If the SGL does not exist,
then it can drop the message without issuing a TCP ACK (the issue is
whether one wants to slow this down at the TCP level or allow it to
complete but have the NIC still drop the buffers w.r.t.
the DMA targeting - preference is to complete from the TCP point of view
but drop the DMA operation). The operation target and buffers are
locally posted, so the rate can be controlled quite easily. The
windowing proposal will work well as a control point for SGL posting to
individual commands - again with minimal if any complexity. If the
command reference number is always present, life can be further
simplified.

>The consensus solution to 2) was to allow the
>target to drop immediate data and request it be
>retransmitted via ready-to-transmit (RTT).
>
>--------------
>
>Should task management commands be ordered with respect to tasks?
>
>Those against feared that ordering task management commands
>would prevent their timely delivery.
>
>Those for feared that not ordering task management commands
>would lead to surprising behaviors (like ABORT TASK SET
>overtaking and not aborting all previously issued tasks).
>
>----------------
>
>Can a single iSCSI TCP connection use multiple paths in the network
>simultaneously?
>
>Answer: Most networks keep a flow on one path to help ensure
>minimal re-ordering, so no in that case. Of course, this being IP,
>people could design a network that sprays packets of a flow across
>multiple paths and it would still work...

Most of us would prefer not to have a single connection flow through
different paths - the complexity to the hardware for what is nominally
a rare event would be increased. A well-behaved environment is possible
to implement, but then one is asking for IP to do this and creating
additional specification work.

Mike
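
For illustration only, here is a minimal sketch of the windowing idea
discussed above: the target accepts a command only if its reference
number falls inside a window starting at the next expected number, holds
out-of-order arrivals, and releases commands strictly in order. Anything
outside the window is dropped and left to error recovery. The 32-bit
wrapping number, the fixed window size, and the names CommandWindow and
receive are all assumptions made for this sketch; none of them come from
the meeting notes or from any spec.

```python
from dataclasses import dataclass, field

SEQ_BITS = 32                 # assumed width of the command reference number
SEQ_MOD = 1 << SEQ_BITS       # reference numbers wrap modulo 2**32


@dataclass
class CommandWindow:
    """Hypothetical target-side ordering window, shared by every TCP
    connection in one session."""
    window: int                               # max commands held past next_expected
    next_expected: int = 0                    # next reference number to deliver
    pending: dict = field(default_factory=dict)

    def receive(self, ref, command):
        """Accept one command that arrived on any connection.

        Returns the list of commands that are now deliverable in order
        (possibly empty), or None if `ref` is outside the window and the
        command was dropped (stale, duplicate, or too far ahead).
        """
        # Distance from the next expected number, computed modulo 2**32
        # so the comparison still works across wrap-around.
        offset = (ref - self.next_expected) % SEQ_MOD
        if offset >= self.window:
            return None
        self.pending[ref] = command
        delivered = []
        # Release every command that is now contiguous with next_expected.
        while self.next_expected in self.pending:
            delivered.append(self.pending.pop(self.next_expected))
            self.next_expected = (self.next_expected + 1) % SEQ_MOD
        return delivered
```

With window=4, a command arriving one ahead of the expected number is
held until the missing one shows up, at which point both are released
together in issue order. Buffer use is bounded by the window size, which
is the property that addresses the out-of-order deadlock case.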