RE: iSCSI: Flow Control

Matt,

I will try to explain below.

Somesh

> -----Original Message-----
> From: Matt Wakeley [mailto:matt_wakeley@agilent.com]
> Sent: Saturday, October 14, 2000 7:18 PM
> To: IPS Reflector
> Subject: Re: iSCSI: Flow Control
>
> Somesh,
>
> I still don't understand what you are trying to solve.
>
> With the iSCSI session wide command credit method, there is a portion of
> the iSCSI layer that sits right below the SCSI layer. It receives the
> commands from the SCSI layer and passes the results of each I/O from each
> NIC back to the SCSI layer. The MaxCmdRn indicates how many commands the
> target (as a whole) can "buffer". The iSCSI layer will "scatter" the
> commands to the NICs until it has used up the MaxCmdRn buffers. Each NIC,
> once iSCSI has posted a command to it, will attempt to send the command as
> long as the TCP window is open. Practically every message sent from the
> target to the initiator contains the new MaxCmdRn. Each initiator NIC that
> receives a message passes this (new) value to the common iSCSI. This value
> does NOT have to be sent to every other NIC, since once a command is
> posted to a NIC, it is committed to send it.

What you describe is a good model for the initiator side (even though
there could be some implementation optimizations).

As the iSCSI host driver (IHD) receives commands from the SCSI layer, it
has to check the following before it can post a command:

Check whether there is space in the host queue for each NIC (i.e. the
host memory that has been designated for posting commands to a NIC; it may
be limited by the NIC or by host memory). There may be models where there
is no such limit. This is also the point at which (Mike's comment) the
scatter is done according to some algorithm, and it is independent of the
flow control model.
In the session-wide flow control model: The IHD has to perform the
additional check of whether MaxCmdRn is being exceeded.

In a connection-wide model: No such check has to be performed, as the NIC
should be able to handle that on its own.

NOTE: There is a cost to performing each of these checks in SMP servers if
multiple processors are involved: taking a lock, and the variable moving
from cache to cache.

Now, in cases where the command cannot be posted to the NIC queue, it must
be left in another queue in the host, which is processed when the blocking
condition is removed. The condition is removed when a command status is
received and the host is interrupted. (An RTT could also remove it, but
that is useless if the model assumes interrupting the host - you really
don't want to interrupt the host on RTT.)

In a connection-wide model: The interrupt processing routine checks the
NIC's command posting queue (or equivalent status) and, if it had been
full, knows to check the common queue for more commands. If not, it knows
there is nothing to do for command posting.

In a session-wide model: Update the global location of MaxCmdRn (take a
lock, release the lock, and thrash the cache if multiple CPUs are active).
Then the routine always has to check whether there are commands waiting to
be posted (again by checking a variable, taking locks, etc.). If there are,
post those commands, repeating the algorithm that was used when the upper
layer posted a command.

NOTE: If we feel that the SCSI layer will generate commands faster than the
session-wide credit allows, then the session-wide credit will cause extra
processing. It is much more straightforward to be able to post from the top
half than to have to try to post from the top half and then actually post
from the bottom half. If there is a significant credit shortage, the
outbound command queues will go through starvation at times.

> Each Target NIC will have a pool of buffers to receive asynchronous
> (non DATA) iSCSI messages.
> As each (small) command message is received, it is placed into one of
> these buffers, processed by common iSCSI and the CDB is passed to the SCSI
> layer which stores it into its command buffer. The message buffer is then
> given back to the NIC for further messages.

The question is how much credit you are going to hand out to the remote
side. If there are N buffers posted per card and M cards, will you make a
credit of N available (underutilization) or N * M (which assumes that the
sender will send evenly, and is risky if there is sudden congestion on one
or more connections)?

Also, the same discussion of the system cost of calculating and using a
centralized value of MaxCmdRn applies if arrays have multiple processors.

> "GUPTA,SOMESH (HP-Cupertino,ex1)" wrote:
>
> > Yes I am trying to describe the synchronization pts and software
> > intervention caused by a session wide flow control model
>
> But I still don't understand the "problem" that the credit per connection
> solves over the credit per session model.
>
> In your description, the initiator still "scatters" the commands to the
> NICs, then the NICs have the burden of trying to figure out if they can
> send the command or not. Furthermore, if some NICs have open TCP windows,
> but don't have command credit, the command can't be sent.

Look at it as an opportunity to differentiate and streamline performance
rather than as a burden. It would definitely be a feature for multi-port
NICs where all the ports used for a session are on the same NIC. It saves
host CPU cycles, thereby improving the attractiveness of the solution :-)

> In the iSCSI session wide credit model, the initiator will not post
> commands to any NIC if it doesn't have credit. Any commands posted to a
> NIC will be sent as long as its TCP window is open.
>
> > 1. Post a large enough number at each NIC. OK. The window opens up
> > (indicated through a new MaxCmdRn received on one connection).
> > This value now must be communicated to the other connections, so that
> > they too are no longer flow controlled. Or the new value must be
> > received on each connection.
>
> As I indicated above, the goal is to not overflow the SCSI command buffer,
> so the command is not discarded causing a lot of error recovery. A command
> CDB is only 16 bytes. It does not make sense to allocate 16 byte buffers
> to NICs for command reception. As I indicated above, the NIC receives the
> message, the iSCSI layer strips out the CDB and hands it to SCSI, then
> reposts the message buffer to the NIC.
>
> > Also since you have posted a large enough number at each NIC,
> > you are really not having any benefit at all from the session-wide
> > value - what is the advantage?
>
> Having a session wide MaxCmdRn allows the initiator to stop sending SCSI
> commands, while still enabling non command messages to be sent. They are
> received by each NIC and passed to iSCSI for processing, but since they
> are not passed up to SCSI, nothing is overflowed.

Again, there is no benefit over what connection-wide flow control would
provide, so that is a tie. In terms of being flow controlled by the TCP
window, the ability to scatter commands across the connections
appropriately, not overflowing, or letting data/status packets continue
flowing, there is no difference.

> > 2. Have the NICs grab them from a pool through an atomic bus
> > transaction. That has got to be tougher to implement than it
> > looks, and the bus performance issues due to the need to maintain
> > ordering etc?
>
> As indicated above, each NIC passes the iSCSI messages to a central iSCSI
> message processor that sends the appropriate SCSI messages to SCSI.
>
> -Matt
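
The initiator-side posting checks discussed in this message can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not code from any iSCSI draft or driver: the names
Nic, Initiator, waiting and shared_checks are hypothetical, MaxCmdRn is
assumed to be the highest command number the target will currently accept,
the scatter algorithm is simply "first NIC with space", and per-connection
credit (which the NIC would enforce itself in the connection-wide model)
is not modelled. The shared_checks counter only marks the places where an
SMP host would have to take a lock on session-wide state.

```python
from collections import deque


class Nic:
    """One connection's host-side posting queue (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth          # slots available for posted commands
        self.posted = deque()

    def has_space(self):
        return len(self.posted) < self.depth


class Initiator:
    """Initiator-side posting logic under either flow-control model."""

    def __init__(self, nics, max_cmd_rn=None):
        self.nics = nics
        # None selects the connection-wide model: no shared credit in the host.
        self.max_cmd_rn = max_cmd_rn
        self.next_cmd_rn = 1
        self.waiting = deque()      # commands that could not be posted yet
        self.shared_checks = 0      # accesses to session-wide state

    def _credit_ok(self):
        if self.max_cmd_rn is None:
            return True             # assumption: the NIC enforces its own credit
        self.shared_checks += 1     # would need a lock on an SMP host
        return self.next_cmd_rn <= self.max_cmd_rn

    def _post_waiting(self):
        while self.waiting:
            # Assumed scatter algorithm: first NIC with a free slot.
            nic = next((n for n in self.nics if n.has_space()), None)
            if nic is None or not self._credit_ok():
                return
            nic.posted.append((self.next_cmd_rn, self.waiting.popleft()))
            self.next_cmd_rn += 1

    def submit(self, cdb):
        """Top half: the SCSI layer hands down a command."""
        self.waiting.append(cdb)
        self._post_waiting()

    def on_status(self, nic, new_max_cmd_rn=None):
        """Bottom half: a command status arrived on `nic`."""
        was_full = not nic.has_space()
        nic.posted.popleft()
        if self.max_cmd_rn is not None:
            # Session-wide: update the shared value, then always re-check.
            self.shared_checks += 1
            if new_max_cmd_rn is not None:
                self.max_cmd_rn = max(self.max_cmd_rn, new_max_cmd_rn)
            self._post_waiting()
        elif was_full:
            # Connection-wide: only look at the common queue if this
            # NIC's posting queue had been full.
            self._post_waiting()
```

Constructing Initiator([Nic(4), Nic(4)], max_cmd_rn=3) gives the
session-wide behaviour, and Initiator([Nic(2), Nic(2)]) the
connection-wide one. Comparing shared_checks after the same sequence of
submit and on_status calls gives a rough feel for the extra
synchronization points argued about above.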