RE: iSCSI: more on StatRN

Doug:

Let me rephrase the question so that you can understand: for most commands,
the SCSI error handling mechanism will kick in before the iSCSI stat_rn
mechanism. For the remaining few, there are other mechanisms which make
stat_rn redundant.

Please make a case for stat_rn,

Thanks,
Prasenjit

Prasenjit Sarkar
Research Staff Member
IBM Almaden Research
San Jose


"Douglas Otis" <dotis@sanlight.net>@ece.cmu.edu on 10/20/2000 11:25:26 AM

Sent by: owner-ips@ece.cmu.edu

To: Prasenjit Sarkar/Almaden/IBM@IBMUS
cc: "Randall R. Stewart" <randall@stewart.chicago.il.us>, <ips@ece.cmu.edu>
Subject: RE: iSCSI: more on StatRN

Prasenjit,

I was not suggesting a change to iSCSI by adding a means of detecting
failure in a deterministic fashion. It would be simpler not to involve
the SCSI layer in order to invoke a response. The use of StatRN is a less
deterministic means of detecting a connection failure using large differences
between StatRN and ExpStatRN. Once a recovery mechanism is in place, a
starting point for a replay of status would use a reported ExpStatRN, so I
expect. How long does the target wait for an initiator to discover a problem
and allow reconnection? As this involves a substantial amount of resources,
these timeouts should be defined. The OS should not be depended upon to
indicate a TCP failure.

Doug

As the only cause for long delays in responses can be failed connections and
received responses free up resources, we felt that scoreboarding responses at
the initiator could be accomplished by simple bitmaps and there is no need to
flow-control responses. Status acknowledgment is done by the initiator through
ExpStatRN (Expected Status RN) and a large difference between StatRN and
ExpStatRN indicates a failed connection.
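The check described in the paragraph above, where a large gap between StatRN and ExpStatRN is taken as a sign of a failed connection, might be sketched roughly as follows. This is only an illustration: the class name, the `max_gap` threshold, and the decision to ignore counter wrap-around are assumptions made for the sketch and do not come from the draft or from any message in this thread.

```python
from dataclasses import dataclass


@dataclass
class StatusTracker:
    """Target-side view of status numbering on one connection (hypothetical)."""
    stat_rn: int = 0      # number the next status response will carry
    exp_stat_rn: int = 0  # highest ExpStatRN reported by the initiator
    max_gap: int = 32     # assumed threshold; the thread does not define one

    def send_status(self) -> int:
        """Assign the next StatRN to an outgoing status response."""
        assigned = self.stat_rn
        self.stat_rn += 1
        return assigned

    def on_exp_stat_rn(self, value: int) -> None:
        """Record an acknowledgment; stale (lower) values are ignored."""
        if value > self.exp_stat_rn:
            self.exp_stat_rn = value

    def unacked(self) -> int:
        """Status responses sent but not yet acknowledged (no wrap handling)."""
        return self.stat_rn - self.exp_stat_rn

    def looks_failed(self) -> bool:
        """Heuristic only: a large gap suggests the connection has failed."""
        return self.unacked() > self.max_gap

    def replay_from(self) -> int:
        """Starting point for a status replay after reconnection."""
        return self.exp_stat_rn
```

As Doug notes, this kind of check is not deterministic: the gap only grows while the target keeps producing status, so an idle connection would still need a timeout or heartbeat to be detected.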
>
> I'm aware tape timeouts could exceed 3 minutes or an hour, but tape commands
> are highly causal and there are existing SCSI mechanisms to deal with tape
> error recovery. Also, there are ways in SCSI to report intermediate status,
> so you really don't need a heartbeat mechanism.
>
> I'm really trying to understand the motivation for stat_rn,
>
> Prasenjit
>
>
> Prasenjit Sarkar
> Research Staff Member
> IBM Almaden Research
> San Jose
>
>
> "Douglas Otis" <dotis@sanlight.net>@ece.cmu.edu on 10/20/2000 09:36:55 AM
>
> Sent by: owner-ips@ece.cmu.edu
>
>
> To: Prasenjit Sarkar/Almaden/IBM@IBMUS, "Randall R. Stewart"
> <randall@stewart.chicago.il.us>
> cc: <ips@ece.cmu.edu>
> Subject: RE: iSCSI: more on StatRN
>
>
>
> Prasenjit,
>
> The timeouts for streaming devices do range beyond 3 minutes. There are
> also system parameters that affect the rate at which TCP times out. In
> other words, do not expect either to time out before the other. With
> timeouts in the 10 minute range, a heartbeat would be desired in the range
> of tens of seconds if there are no other communications. This should allow
> a reasonably quick response to a network failure after several successive
> failed responses. In iSCSI speak, it could be an iSCSI version of Echo
> (ping). SCTP has Heartbeat detection.
>
> Doug
>
> > The ballpark figure for SCSI varies but by 3 minutes you can rest
> > assured that SCSI will give up on a command, and will have probably
> > issued a lun/target reset.
> >
> > I've other arguments against the stat_rn mechanism, but I'll wait till
> > this is resolved,
> >
> > Prasenjit
> >
> > Prasenjit Sarkar
> > Research Staff Member
> > IBM Almaden Research
> > San Jose
> >
> >
> > "Randall R.
Stewart" <randall@stewart.chicago.il.us>@ece.cmu.edu on 10/20/2000 06:01:55 AM
> >
> > Sent by: owner-ips@ece.cmu.edu
> >
> >
> > To: Prasenjit Sarkar/Almaden/IBM@IBMUS
> > cc: ips@ece.cmu.edu
> > Subject: Re: iSCSI: more on StatRN
> >
> >
> >
> > Prasenjit:
> >
> > Being a transportish geek I don't know what the "failure" time is
> > on SCSI... can you give a ball-park figure?
> >
> > Another thought on this issue, is if SCSI retransmits, when
> > it times out (I think it does??), this just adds more
> > to the queue of things in TCP that are attempting to be sent.
> >
> > On the TCP failure side, in most cases that I have seen
> > a TCP connection fail, I have always seen it around 3 minutes
> > or more before the failure was reported...
> >
> > R
> >
> > Prasenjit Sarkar/Almaden/IBM wrote:
> > >
> > > If the time TCP takes to give up on a connection is more than the time
> > > SCSI takes to give up on a command, the stat_rn mechanism would not be
> > > useful.
> > >
> > > While I know the values for certain operating systems, I would like to
> > > hear from people who can assert confidently that the TCP fail connection
> > > time < SCSI command failure time.
> > >
> > > Prasenjit
> > >
> > > Prasenjit Sarkar
> > > Research Staff Member
> > > IBM Almaden Research
> > > San Jose
> > >
> > > "Mallikarjun C." <cbm@rose.hp.com>@ece.cmu.edu on 10/19/2000 07:40:16 PM
> > >
> > > Please respond to cbm@rose.hp.com
> > >
> > > Sent by: owner-ips@ece.cmu.edu
> > >
> > > To: ips@ece.cmu.edu
> > > cc:
> > > Subject: Re: iSCSI: Question on StatRN usage
> > >
> > > Julian,
> > >
> > > Thanks for the clarifications, I am pleased to understand that
> > > there's no overloading of any reference #s - the usage of new
> > > term "DataRN" in your new draft makes it a lot clearer.
> > >
> > > Some comments.
> > >
> > > >Mallikarjun and Prasenjit,
> > > >
> > > >Sorry for the confusion.
> > > >
> > > >The text is confusing and I have corrected it in the new text.
StatRN is mandatory (it is the only way we have to ACK status and is not related to
> > > >ordering).
> > >
> > > Even though StatRN itself may not be used by an initiator for ordering
> > > (unless it wants to order completions, for whatever reason), StatRNs
> > > themselves are in a monotonically increasing order. It is helpful to
> > > state this explicitly.
> > >
> > >
> > > >As for the data the intent was to use StatRN to just number data packets
> > > >for a given command (start with whatever you want) and have them acked with
> > > >a NOP with the same task tag (this is important for input data for which we
> > > >have no other way of acking them). Those numbers are not related to the
> > > >Status numbers. No ordering or recovery is required up to command restart.
> > > >I assume that numbers will not wrap unless a target sends more blocks than
> > > >bytes (and it can!) but even then
> > > >no harm is done.
> > > >At recovery the restarted command will be followed by a NOP with the same
> > > >initiator tag indicating what is the
> > > >block expected. The initiator does not have to do any scoreboarding,
> > > >only keep the counters.
> > > >The target can free early resources and iSCSI can recover even long reads.
> > > >For writes evidently R2T does the job but it means that write data can be
> > > >recovered only with R2T.
> > >
> > > This implies that in case an iSCSI implementation is counting the # of
> > > bytes transferred in/out during a task, it shall not assume an error if
> > > the count is less than the expected transfer size - if the retry bit
> > > was set (This is especially true for writes, where the initiator doesn't
> > > know from which point target starts issuing R2Ts). I would suggest adding
> > > this comment as well to enable better interoperability.
> > >
> > > >Should we overload on CmdRN/ExpCmdRN to shorten recovery?
I don't see a need.
> > >
> > > NO, I don't see either. My concern was that overloading these RNs for
> > > data would become a scalability bottleneck, when a session spans multiple
> > > NICs. I am glad that it's not what was intended.
> > >
> > > Comments on your next email:
> > > >The NOP message PDUs are not associated with a task, are meant for
> > > >immediate delivery, and their only purpose is synchronizing the ordering
> > > >registers of the target and initiator.
> > >
> > > I would like to point out that NOP PDUs are indeed associated with a task!
> > > They are associated with a task whose read data they are ack'ing (given
> > > that the DataRN is only task-unique). Also, I would like to point out
> > > that the current definition of NOP payload does not have Initiator Task Tag
> > > - it needs to be added.
> > >
> > > Thanks.
> > > --
> > > Mallikarjun
> > > M/S 5601
> > > Networked Storage Architecture
> > > HP Storage Organization
> > > Hewlett-Packard, Roseville.
> > > cbm@rose.hp.com
> > >
> > > phone: (916) 785-5621
> > > fax: (916) 785-2875
> >
> > --
> > Randall R. Stewart
> > randall@stewart.chicago.il.us or rrs@cisco.com
> > 815-342-5222 (cell) 815-477-2127 (work)
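The data numbering scheme discussed in the quoted messages (per-task data numbers, acknowledged by a NOP carrying the same task tag, with the initiator keeping only counters) could look roughly like the sketch below. The class, the method names, and the dictionary standing in for a NOP payload are hypothetical; the thread itself notes that the NOP payload did not yet carry an Initiator Task Tag.

```python
class ReadProgress:
    """Initiator-side counters for read data, one per task tag (hypothetical)."""

    def __init__(self):
        # initiator task tag -> next data number expected for that task
        self.expected = {}

    def on_data(self, task_tag, data_rn):
        """Accept a data packet only if it carries the next expected number.

        The first packet seen for a task sets the starting number, since the
        numbering may "start with whatever you want".
        """
        exp = self.expected.get(task_tag, data_rn)
        if data_rn != exp:
            return False  # gap or duplicate; the counter is left unchanged
        self.expected[task_tag] = data_rn + 1
        return True

    def nop_ack(self, task_tag):
        """Build an illustrative NOP acknowledgment for one task.

        The same structure could follow a restarted command to tell the
        target which block is expected next.
        """
        return {"task_tag": task_tag, "exp_data_rn": self.expected[task_tag]}
```

No scoreboard or bitmap is kept here, only one counter per outstanding task, which matches the stated intent that the initiator "only keep the counters".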
Last updated: Tue Sep 04 01:06:36 2001