RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Somesh,

Can you give us a reference for those rates? Where do they come from?

Regards,
Julo

"Somesh Gupta" <someshg@yahoo.com> on 04/04/2001 23:02:06

Please respond to someshg@yahoo.com

To:   Julian Satran/Haifa/IBM@IBMIL, someshg@yahoo.com
cc:   ips@ece.cmu.edu
Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Assuming that the packet corruption escape rate is 1 in 10 billion,
we have (roughly, assuming 1K byte per packet) 1 escaped packet every
10 trillion bytes of data transfer.

Seems to me that if I had to transfer 1 MByte for having to recover
at the command level rather than at a more granular level, that does
not pose much of an additional burden (1 MB out of 10 trillion bytes).

Also, assuming each i/o is 1 MByte in size, you would have to do
recovery for 1 in every 10 million transactions.

I don't know how realistic the 1 in 10 billion packet corruption
escape rate is, but I am using the number from past discussions.

Somesh

> -----Original Message-----
> From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> Sent: Wednesday, April 04, 2001 11:56 AM
> To: someshg@yahoo.com
> Cc: ips@ece.cmu.edu
> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>
> What are the numbers you are looking at:
>
> 1 per 10 sec, 1/10h or 1/10y?
>
> Julo
>
> "Somesh Gupta" <someshg@yahoo.com> on 04/04/2001 20:15:53
>
> Please respond to someshg@yahoo.com
>
> To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
> cc:
> Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>
> > -----Original Message-----
> > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > julian_satran@il.ibm.com
> > Sent: Wednesday, April 04, 2001 7:32 AM
> > To: ips@ece.cmu.edu
> > Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >
> > SNACK is here for two reasons - Status retry (which is cheap) and Data
> > retry as a side benefit.
>
> Unless there is clear benefit (i.e. the event is frequent enough
> to justify recovery at this level), the entire mechanism should be
> dropped - it is neither cheap nor free. If it is relatively
> infrequent, the recovery at the command level should be a sufficient
> mechanism.
>
> > CRC errors are not that rare (although we don't have real data, the
> > simulations with file systems seem to indicate that numbers could be
> > as high as 0.0002%). A restart of link is expensive (slow start), and
> > even if they are far lower, for many applications a slow start is a
> > painful event.
>
> Intuitively, it seems that the combination of link level CRC, TCP
> checksum, and good hardware (ECC, parity etc) should lead to a
> much lower level of errors caught by the iSCSI CRC algorithm. We have
> to separate error detection (i.e. what if I have bad hardware or
> some vendor makes bad/buggy intermediate system) from recovery
> mechanisms (not based on hardware being bad or buggy - market forces
> will wean out the vendor) which should not be based on assumptions
> of bugs in hardware/software of specific implementations.
>
> > Removing them from the spec is not a path we should take lightly.
>
> I would phrase it the other way.
> We should not keep adding things
> unless there is very clear proof that the additional feature is
> beneficial and does not have negative side effects (and there is
> some consensus on adding it)
>
> > Julo
> >
> > "Jon Hall" <jhall@emc.com> on 02/04/2001 16:13:35
> >
> > Please respond to "Jon Hall" <jhall@emc.com>
> >
> > To:   ips@ece.cmu.edu
> > cc:
> > Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> >
> > I agree with Somesh. And would go farther -- the complexity
> > that results from retaining enough target-side state to respond
> > to a SACK/SNACK request is non-trivial and needs clear justification.
> > Intuitively, a CRC that discovers an error in an iSCSI pdu header
> > (that the TCP cksum missed) seems like it should be a rare event.
> >
> > What is the frequency of this event? IMO the answer to this
> > question should be written into the protocol spec -- assuming
> > that it substantiates the benefit of SACK/SNACK. Otherwise, the
> > SACK/SNACK pdu should be removed.
> >
> > -Jon
> >
> > julian_satran@il.ibm.com writes:
> > >
> > >Somesh,
> > >
> > >As I stated earlier - the DataSN was created to detect missing data
> > >PDUs. SNACK is needed to recover missing StatusSN and missing dataSN
> > >is only a bonus if the target wants to support it. It is a trivial
> > >mechanism and I think it should stay.
> > >
> > >Julo
> > >
> > >"Somesh Gupta" <someshg@yahoo.com> on 31/03/2001 02:25:52
> > >
> > >Please respond to someshg@yahoo.com
> > >
> > >To:   Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
> > >cc:
> > >Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> > >
> > >Sorry to have been missing for a while. Hope you will
> > >appreciate my being back in action :-). It was a fairly
> > >clear consensus in Orlando that applications broke up
> > >their transfers into reasonably small chunks i.e. they
> > >did not have very long running transfers.
> > >
> > >Therefore the consensus was that a command level recovery
> > >mechanism was sufficient instead of an ack/sack for each
> > >data PDU.
> > >
> > >The SACK mechanism was a post Orlando invention. Without
> > >an ack mechanism (for every data PDU), the SACK mechanism
> > >just imposes additional burden on either end of the session,
> > >without really much benefit.
> > >
> > >The benefit of having SACK is of saving bandwidth in case
> > >the data part of the data PDU failed an integrity check
> > >(but passed TCP checksum). This is a rare enough case that
> > >as a percentage, the bandwidth loss from retransmitting
> > >all the data associated with a read or write command is
> > >very very small.
> > >
> > >In addition, it avoids the complexity of restarting
> > >something from the middle, as compared to from the beginning.
> > >
> > >To me it seems that there is significant simplicity (from
> > >implementation, reliability and recovery process) from
> > >having smaller data transfer per command.
> > >
> > >I would really like to get rid of the SACK command.
> > >
> > >Somesh
> > >
> > >> -----Original Message-----
> > >> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu] On Behalf Of
> > >> julian_satran@il.ibm.com
> > >> Sent: Wednesday, March 28, 2001 6:57 AM
> > >> To: ips@ece.cmu.edu
> > >> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> > >>
> > >> Mallikarjun,
> > >>
> > >> Last summer I thought that recovery within a connection should be
> > >> left to TCP. It is simple and could be made available through IPsec
> > >> (if no new option of any form can be added).
> > >>
> > >> Two things killed this:
> > >>
> > >> - The requirement to have a data encapsulation that can pass through
> > >>   application proxies (like a storage router)
> > >> - The "NO WAY" message we got from IESG-Security on a CRC only IPSec
> > >>   header
> > >>
> > >> As for the ACK - I am very much in favor of it (it is a no brainer)
> > >> and implementations are in fact allowed to drop even unacked data.
> > >>
> > >> I am bound by the Orlando meeting decision to drop it. Except the
> > >> regular "oppose everything" crowd the two vocal opponents were
> > >> Somesh Gupta and Matt Wakeley.
> > >>
> > >> David may want or not to re-open the issue - I am not going to ask
> > >> for it.
> > >>
> > >> Regards,
> > >> Julo
> > >>
> > >> "Mallikarjun C." <cbm@rose.hp.com> on 28/03/2001 00:45:02
> > >>
> > >> Please respond to cbm@rose.hp.com
> > >>
> > >> To:   Black_David@emc.com
> > >> cc:   Julian Satran/Haifa/IBM@IBMIL, cbm@rose.hp.com,
> > >>       someshg@yahoo.com, steph@cs.uchicago.edu,
> > >>       John Hufferd/San Jose/IBM@IBMUS,
> > >>       ldalleore@snapserver.com, venkat@rhapsodynetworks.com
> > >> Subject:  RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> > >>
> > >> David and Julian,
> > >>
> > >> I appreciate both your views, and should I say that they're
> > >> along predicted lines :-)
> > >>
> > >> - David's right in saying that the situation is akin to FC's.
> > >>   However, I would like to point out that FC is an unreliable
> > >>   transport, and hence is forced to pick up a lot of the transport
> > >>   baggage (at least in FCP-2, as I understand), in addition
> > >>   to being a SCSI encapsulation layer. Unfortunately, even with
> > >>   TCP being the "reliable" transport, iSCSI is going along the
> > >>   same lines - ie. transport baggage + SCSI encapsulation. My
> > >>   point is - if this is indeed a necessary evil, why don't we
> > >>   complete iSCSI's transport functionality by data-ACKs?
> > >>
> > >> - If data SACK is introduced mostly to make up for TCP's
> > >>   shortcomings, we're making its usage (and implementation)
> > >>   drastically less appealing since the only way error recovery
> > >>   algorithms can *rely* on data SACK is when replay is supported
> > >>   (or, "ReplaySupport=yes" in my proposal), which is extremely
> > >>   expensive. IOW, we're defining data SACK in the draft and not
> > >>   providing any incentives to implement and use it!
> > >>
> > >> - I submit that since iSCSI is being hailed as the ideal SCSI
> > >>   Transport protocol in its definition so far (and I believe,
> > >>   rightly so - mandating command ordering, bi-di support, SCSI CRN
> > >>   support to name a few examples), the perfectly SCSI-legal R/W
> > >>   interactions that break in other transports *do not* have to
> > >>   break in iSCSI.
> > >>
> > >> - A last idea (may seem radical at this point) in regards to iSCSI
> > >>   being a "full transport". This provides us an opportunity to "cast
> > >>   off" the transport baggage in future when we truly move to a
> > >>   "reliable" transport (perhaps TCP with CRCs/SCTP ?) - if we do a
> > >>   good job of keeping the encapsulation stuff separate from the
> > >>   transport stuff. (Julian, I heard from Randy that ideas similar
> > >>   to this were explored in your Haifa meeting. And yes, he recalls
> > >>   they were given up since TCP was supposed to be reliable and
> > >>   granularity of recovery was deemed one I/O.)
> > >>
> > >> With that said, may I request David (with his co-chair hat on, :-))
> > >> to add some binding comments/observations on this discussion?
> > >>
> > >> If we decide to leave data SACKs as unattractive to implement, the
> > >> draft should in the least add a statement like - "Note that
> > >> satisfying all possible data SACK requests for a task with an
> > >> unacknowledged status implies implementing the I/O replay buffer on
> > >> the part of targets."
> > >> --
> > >> Mallikarjun
> > >>
> > >> Mallikarjun Chadalapaka
> > >> Networked Storage Architecture
> > >> Network Storage Solutions Organization
> > >> MS 5668 Hewlett-Packard, Roseville.
> > >> cbm@rose.hp.com
> > >>
> > >> >I think Julian's basically right -- I would point
> > >> >out that any case of write after read that breaks
> > >> >over iSCSI will also break over Fibre Channel.
> > >> >On FC, the scenario starts with a frame CRC failure
> > >> >on read data at the Initiator, so applications
> > >> >have to cope and typically do so by enforcing
> > >> >ordering at the app rather than using SCSI task
> > >> >ordering.
> > >> >
> > >> >While SCSI has clever tools like ACA and task
> > >> >ordering that appear to allow dependent operations
> > >> >to be sent to the target concurrently, in practice
> > >> >they don't work and/or aren't used (funny thing,
> > >> >those two reinforce each other ;-) ). Hence
> > >> >a minimal approach to them is in order:
> > >> >- Make sure the result will interoperate.
> > >> >- Make sure T10 doesn't ding us for leaving something
> > >> >  completely out.
> > >> >- Don't specify anything not needed for the above.
> > >> >
> > >> >My 0.02,
> > >> >--David
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: julian_satran@il.ibm.com [SMTP:julian_satran@il.ibm.com]
> > >> >> Sent: Tuesday, March 27, 2001 9:23 AM
> > >> >> To:   cbm@rose.hp.com
> > >> >> Cc:   someshg@yahoo.com; steph@cs.uchicago.edu; hufferd@us.ibm.com;
> > >> >>       cbm@rose.hp.com; ldalleore@snapserver.com; Venkat Rangan;
> > >> >>       Black_David@emc.com
> > >> >> Subject:  Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
> > >> >>
> > >> >> Mallikarjun,
> > >> >>
> > >> >> I commiserate with you at the lack of ack for data but the
> > >> >> Orlando meeting stated - no. Recall that I kept the number only
> > >> >> as a mechanism to detect missing packets.
> > >> >>
> > >> >> You can achieve the effect you want by keeping around data for a
> > >> >> while (you determine how long and then discard).
> > >> >>
> > >> >> If a SACK comes and you can recover - fine. If not you either
> > >> >> reaccess the media (if you know how) or reject
> > >> >> and let the initiator retry.
> > >> >>
> > >> >> You should not worry about R/W conflicts as programs bound to
> > >> >> have such conflicts either:
> > >> >>
> > >> >> 1) can live with them or
> > >> >> 2) protect themselves through some locks and rely on
> > >> >>    "operation-end-status" to keep results deterministic.
> > >> >>
> > >> >> Regards,
> > >> >> Julo
> > >> >>
> > >> >> "Mallikarjun C." <cbm@rose.hp.com> on 27/03/2001 03:34:16
> > >> >>
> > >> >> Please respond to cbm@rose.hp.com
> > >> >>
> > >> >> To:   cbm@rose.hp.com, someshg@yahoo.com, steph@cs.uchicago.edu,
> > >> >>       Julian Satran/Haifa/IBM@IBMIL,
> > >> >>       John Hufferd/San Jose/IBM@IBMUS
> > >> >> cc:   Black_David@emc.com
> > >> >> Subject:  iSCSI ERT: data SACK/replay buffer/"semi-transport"
> > >> >>
> > >> >> Hi Error Recovery Team,
> > >> >>
> > >> >> iSCSI can discard PDUs because of digest errors and request
> > >> >> retransmissions using the iSCSI data SACK. To deal with such
> > >> >> an eventuality, targets that want to support data SACK have
> > >> >> the following options:
> > >> >>
> > >> >> (A) maintain a complete "replay" buffer for the entire I/O since
> > >> >>     a SACK could come anytime before the status is ack'ed by the
> > >> >>     initiator. [ simple, but extremely expensive in memory
> > >> >>     resources]
> > >> >>
> > >> >> (B) (re-introduce data-ACKs into the draft, and) implement
> > >> >>     data-ACKs. Thus enables keeping only those I/O buffers that
> > >> >>     haven't been ack'ed by the initiator. IOW, become a real
> > >> >>     full transport! [ everyone disliked it earlier...]
> > >> >>
> > >> >> (C) re-access the medium for data retransmission requests. Now
> > >> >>     there are 3 sub-cases in this to handle the changed data on
> > >> >>     the medium in a write-after-read scenario. (SEE NOTE.1 at
> > >> >>     the bottom on how it is legal.)
> > >> >>     (1) On seeing any write, stall till status is ack'ed for all
> > >> >>         the previous reads (basically drain the pipe). [simple,
> > >> >>         but incurs an additional roundtrip delay for all writes].
> > >> >>     (2) A variation of the above, keep an eye only on the prior
> > >> >>         overlapping reads. [more BW efficient, but complicated
> > >> >>         to resolve the block dependencies in a stream of reads
> > >> >>         followed by writes]
> > >> >>     (3) Document the caveat and leave it up to the applications
> > >> >>         to avoid this case since this leads to data integrity
> > >> >>         issues. [pushing to apps since the transport can't get
> > >> >>         it right!]
> > >> >>
> > >> >> My first preference is (B), followed by (A), and I suggest we
> > >> >> not go to (C) at all with its inherent dangers.
> > >> >>
> > >> >> Doing (B) naturally completes the transport job that iSCSI has
> > >> >> taken on itself in view of TCP's claimed unreliable checksum.
> > >> >> That is the right thing to do architecturally instead of being a
> > >> >> "semi-transport"!
> > >> >>
> > >> >> Comments?
> > >> >> --
> > >> >> Mallikarjun
> > >> >>
> > >> >> Mallikarjun Chadalapaka
> > >> >> Networked Storage Architecture
> > >> >> Network Storage Solutions Organization
> > >> >> MS 5668 Hewlett-Packard, Roseville.
> > >> >> cbm@rose.hp.com
> > >> >>
> > >> >> ________________________________________________________________
> > >> >> Note.1: A Read followed by a Write (to the same blocks) is
> > >> >>         perfectly legal if SCSI sets the ORDERED task attribute
> > >> >>         on both the commands AND sets the NACA bit to one to
> > >> >>         indicate that Write shall be executed only if the Read
> > >> >>         did not fail (result in a Check Condition).
> > >> >>
> > >> >>         In the current case, since Read completed just fine from
> > >> >>         SCSI's point of view, SCSI is moving on to execute Write.
> > >> >>         Those read buffers had been freed up since iSCSI received
> > >> >>         an ACK at the TCP level, and since iSCSI has no other way
> > >> >>         to have the data ack'ed!
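
The back-of-the-envelope estimate in Somesh's reply at the top of this
message can be checked with a few lines of Python. This is only a sketch
under the assumptions stated in that reply (1 escaped packet in 10 billion,
roughly 1K bytes per packet, 1 MByte per I/O); none of these figures are
measured data, and the constant and function names are made up for
illustration.

```python
# Assumed inputs, taken from the estimate in the thread (not measurements).
PACKETS_PER_ESCAPE = 10_000_000_000  # 1 corrupted packet escapes per 10 billion
PACKET_BYTES = 1_000                 # "roughly 1K byte per packet"
IO_BYTES = 1_000_000                 # "each i/o is 1 MByte in size"


def bytes_between_escapes(packets_per_escape=PACKETS_PER_ESCAPE,
                          packet_bytes=PACKET_BYTES):
    """Bytes transferred, on average, per undetected-by-TCP corruption."""
    return packets_per_escape * packet_bytes


def ios_between_escapes(io_bytes=IO_BYTES):
    """Number of I/Os, on average, per command-level recovery."""
    return bytes_between_escapes() // io_bytes


def replay_overhead(io_bytes=IO_BYTES):
    """Fraction of extra bytes sent if the whole command is replayed."""
    return io_bytes / bytes_between_escapes()


if __name__ == "__main__":
    print(f"bytes between escapes: {bytes_between_escapes():,}")
    print(f"I/Os between escapes:  {ios_between_escapes():,}")
    print(f"replay overhead:       {replay_overhead():.1e}")
```

Under these assumptions the figures match the ones quoted: about 10
trillion bytes and 10 million 1-MByte I/Os per escaped packet. Changing
PACKETS_PER_ESCAPE shows how sensitive the conclusion is to the escape
rate, which is the number Julo asks a reference for.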