Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Correct - as to whether it happens when the Sun goes nova, I think it is
far more frequent, and for critical applications (business or life) people
might be paying to avoid small glitches several times a day.

Julo

"Jon Hall" <jhall@emc.com> on 05/04/2001 15:55:11

Please respond to "Jon Hall" <jhall@emc.com>

To: ips@ece.cmu.edu
cc:
Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Julian,

I don't understand. Are you saying that an "expensive" target
will implement specific error recovery mechanisms for very
rare events? Or are you saying that this case is not a rare
event?

If the former, there is a problem of completeness (e.g., should
there be recovery procedures for when the sun goes nova :-).
If the latter, this would be very interesting and useful to
know about...

-Jon

julian_satran@il.ibm.com writes:
>
>Jon,
>
>Inexpensive implementations are always free to do away with recovery. That
>is true for targets too.
>But by not specifying the mechanism for the more expensive ones, we make
>them non-interoperable.
>
>Julo
>
>"Jon Hall" <jhall@emc.com> on 04/04/2001 22:55:40
>
>Please respond to "Jon Hall" <jhall@emc.com>
>
>To: ips@ece.cmu.edu
>cc:
>Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>
>
>But CRC errors are not really the issue. It is the
>singular case of a TCP cksum failing to detect what a
>CRC succeeds in detecting, and this occurring to a TCP
>segment containing an iSCSI hdr with a StatSN.
>
>Is there a reason to believe that iSCSI StatSNs will be
>lost at a higher rate than is currently documented for TCP
>cksum failure? Or, is the problem a loss of one TCP segment
>in tens (possibly hundreds) of millions of segments, where
>the bad segment may contain a StatSN but probably doesn't
>because it is a data pdu? If the latter, why does a SCSI-level
>timeout and retry (on the initiator) not suffice?
>[Note, an initiator timeout/retry does not require a connection
>to be closed.]
>
>I realize that I am being annoyingly repetitious, but it is
>not an idle question. For some targets, retained rsp status
>is not cheap (and retained rsp data is not tractable at all).
>
>IMO there appears to be no real need for SNACK. And, more
>radically, there appears to be no need for StatSNs.
>
>Maybe, as Somesh said, this is a dead horse but why include
>something in the spec which suggests a need for target-side
>complexity, while not solving a clear and compelling
>requirement?
>
>-Jon
>
>julian_satran@il.ibm.com writes:
>>
>>SNACK is here for two reasons - Status retry (which is cheap) and Data
>>retry as a side benefit.
>>CRC errors are not that rare (although we don't have real data, the
>>simulations with file systems seem to indicate that numbers could be as
>>high as 0.0002%). A restart of the link is expensive (slow start), and
>>even if the rates are far lower, for many applications a slow start is
>>a painful event.
>>
>>Removing them from the spec is not a path we should take lightly.
>>
>>Julo
>>
>>"Jon Hall" <jhall@emc.com> on 02/04/2001 16:13:35
>>
>>Please respond to "Jon Hall" <jhall@emc.com>
>>
>>To: ips@ece.cmu.edu
>>cc:
>>Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>
>>
>>I agree with Somesh. And would go farther -- the complexity
>>that results from retaining enough target-side state to respond
>>to a SACK/SNACK request is non-trivial and needs clear justification.
>>Intuitively, a CRC that discovers an error in an iSCSI pdu header
>>(that the TCP cksum missed) seems like it should be a rare event.
>>
>>What is the frequency of this event? IMO the answer to this
>>question should be written into the protocol spec -- assuming
>>that it substantiates the benefit of SACK/SNACK. Otherwise, the
>>SACK/SNACK pdu should be removed.
>>
>>-Jon
>>
>>julian_satran@il.ibm.com writes:
>>>
>>>Somesh,
>>>
>>>As I stated earlier - the DataSN was created to detect missing data PDUs.
>>>SNACK is needed to recover missing StatusSN, and missing DataSN is only a
>>>bonus if the target wants to support it. It is a trivial mechanism and I
>>>think it should stay.
>>>
>>>Julo
>>>
>>>"Somesh Gupta" <someshg@yahoo.com> on 31/03/2001 02:25:52
>>>
>>>Please respond to someshg@yahoo.com
>>>
>>>To: Julian Satran/Haifa/IBM@IBMIL, ips@ece.cmu.edu
>>>cc:
>>>Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>>
>>>
>>>Sorry to have been missing for a while. Hope you will
>>>appreciate my being back in action :-). It was a fairly
>>>clear consensus in Orlando that applications broke up
>>>their transfers into reasonably small chunks, i.e. they
>>>did not have very long running transfers.
>>>
>>>Therefore the consensus was that a command level recovery
>>>mechanism was sufficient instead of an ack/sack for each
>>>data PDU.
>>>
>>>The SACK mechanism was a post-Orlando invention. Without
>>>an ack mechanism (for every data PDU), the SACK mechanism
>>>just imposes additional burden on either end of the session,
>>>without really much benefit.
>>>
>>>The benefit of having SACK is of saving bandwidth in case
>>>the data part of the data PDU failed an integrity check
>>>(but passed TCP checksum). This is a rare enough case that,
>>>as a percentage, the bandwidth loss from retransmitting
>>>all the data associated with a read or write command is
>>>very very small.
>>>
>>>In addition, it avoids the complexity of restarting
>>>something from the middle, as compared to from the beginning.
>>>
>>>To me it seems that there is significant simplicity (from
>>>implementation, reliability and recovery process) from
>>>having smaller data transfer per command.
>>>
>>>I would really like to get rid of the SACK command.
>>>
>>>Somesh
>>>
>>>> -----Original Message-----
>>>> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
>>>> julian_satran@il.ibm.com
>>>> Sent: Wednesday, March 28, 2001 6:57 AM
>>>> To: ips@ece.cmu.edu
>>>> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>>>
>>>>
>>>> Mallikarjun,
>>>>
>>>> Last summer I thought that recovery within a connection should be left
>>>> to TCP. It is simple and could be made available through IPsec (if no
>>>> new option of any form can be added).
>>>>
>>>> Two things killed this:
>>>>
>>>>    - The requirement to have a data encapsulation that can pass through
>>>>      application proxies (like a storage router)
>>>>    - The "NO WAY" message we got from IESG-Security on a CRC-only IPSec
>>>>      header
>>>>
>>>>
>>>> As for the ACK - I am very much in favor of it (it is a no-brainer) and
>>>> implementations are in fact allowed to drop even unacked data.
>>>>
>>>> I am bound by the Orlando meeting decision to drop it. Except for the
>>>> regular "oppose everything" crowd, the two vocal opponents were Somesh
>>>> Gupta and Matt Wakeley.
>>>>
>>>> David may or may not want to re-open the issue - I am not going to ask
>>>> for it.
>>>>
>>>> Regards,
>>>> Julo
>>>>
>>>> "Mallikarjun C." <cbm@rose.hp.com> on 28/03/2001 00:45:02
>>>>
>>>> Please respond to cbm@rose.hp.com
>>>>
>>>> To: Black_David@emc.com
>>>> cc: Julian Satran/Haifa/IBM@IBMIL, cbm@rose.hp.com, someshg@yahoo.com,
>>>>     steph@cs.uchicago.edu, John Hufferd/San Jose/IBM@IBMUS,
>>>>     ldalleore@snapserver.com, venkat@rhapsodynetworks.com
>>>> Subject: RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>>>
>>>>
>>>> David and Julian,
>>>>
>>>> I appreciate both your views, and should I say that they're
>>>> along predicted lines :-)
>>>>
>>>> - David's right in saying that the situation is akin to FC's.
>>>>   However, I would like to point out that FC is an unreliable
>>>>   transport, and hence is forced to pick up a lot of the transport
>>>>   baggage (at least in FCP-2, as I understand), in addition
>>>>   to being a SCSI encapsulation layer. Unfortunately, even with
>>>>   TCP being the "reliable" transport, iSCSI is going along the
>>>>   same lines - ie. transport baggage + SCSI encapsulation. My
>>>>   point is - if this is indeed a necessary evil, why don't we
>>>>   complete iSCSI's transport functionality by data-ACKs?
>>>>
>>>> - If data SACK is introduced mostly to make up for TCP's shortcomings,
>>>>   we're making its usage (and implementation) drastically less appealing
>>>>   since the only way error recovery algorithms can *rely* on data SACK
>>>>   is when replay is supported (or, "ReplaySupport=yes" in my proposal),
>>>>   which is extremely expensive. IOW, we're defining data SACK in the
>>>>   draft and not providing any incentives to implement and use it!
>>>>
>>>> - I submit that since iSCSI is being hailed as the ideal SCSI Transport
>>>>   protocol in its definition so far (and I believe, rightly so -
>>>>   mandating command ordering, bi-di support, SCSI CRN support to name
>>>>   a few examples), the perfectly SCSI-legal R/W interactions that break
>>>>   in other transports *do not* have to break in iSCSI.
>>>>
>>>> - A last idea (may seem radical at this point) in regards to iSCSI
>>>>   being a "full transport". This provides us an opportunity to "cast
>>>>   off" the transport baggage in future when we truly move to a
>>>>   "reliable" transport (perhaps TCP with CRCs/SCTP ?) - if we do a good
>>>>   job of keeping the encapsulation stuff separate from the transport
>>>>   stuff. (Julian, I heard from Randy that ideas similar to this were
>>>>   explored in your Haifa meeting. And yes, he recalls they were given
>>>>   up since TCP was supposed to be reliable and granularity of recovery
>>>>   was deemed one I/O.)
>>>>
>>>> With that said, may I request David (with his co-chair hat on, :-))
>>>> to add some binding comments/observations on this discussion?
>>>>
>>>> If we decide to leave data SACKs as unattractive to implement, the draft
>>>> should in the least add a statement like - "Note that satisfying all
>>>> possible data SACK requests for a task with an unacknowledged status
>>>> implies implementing the I/O replay buffer on the part of targets."
>>>> --
>>>> Mallikarjun
>>>>
>>>>
>>>> Mallikarjun Chadalapaka
>>>> Networked Storage Architecture
>>>> Network Storage Solutions Organization
>>>> MS 5668 Hewlett-Packard, Roseville.
>>>> cbm@rose.hp.com
>>>>
>>>>
>>>> >I think Julian's basically right -- I would point
>>>> >out that any case of write after read that breaks
>>>> >over iSCSI will also break over Fibre Channel.
>>>> >On FC, the scenario starts with a frame CRC failure
>>>> >on read data at the Initiator, so applications
>>>> >have to cope and typically do so by enforcing
>>>> >ordering at the app rather than using SCSI task
>>>> >ordering.
>>>> >
>>>> >While SCSI has clever tools like ACA and task
>>>> >ordering that appear to allow dependent operations
>>>> >to be sent to the target concurrently, in practice
>>>> >they don't work and/or aren't used (funny thing,
>>>> >those two reinforce each other ;-) ). Hence
>>>> >a minimal approach to them is in order:
>>>> >- Make sure the result will interoperate.
>>>> >- Make sure T10 doesn't ding us for leaving something
>>>> >  completely out.
>>>> >- Don't specify anything not needed for the above.
>>>> >
>>>> >My 0.02,
>>>> >--David
>>>> >
>>>> >> -----Original Message-----
>>>> >> From: julian_satran@il.ibm.com [SMTP:julian_satran@il.ibm.com]
>>>> >> Sent: Tuesday, March 27, 2001 9:23 AM
>>>> >> To: cbm@rose.hp.com
>>>> >> Cc: someshg@yahoo.com; steph@cs.uchicago.edu; hufferd@us.ibm.com;
>>>> >> cbm@rose.hp.com; ldalleore@snapserver.com; Venkat Rangan;
>>>> >> Black_David@emc.com
>>>> >> Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>>> >>
>>>> >>
>>>> >> Mallikarjun,
>>>> >>
>>>> >> I commiserate with you at the lack of ack for data but the Orlando
>>>> >> meeting stated - no. Recall that I kept the number only as a
>>>> >> mechanism to detect missing packets.
>>>> >>
>>>> >> You can achieve the effect you want by keeping around data for a
>>>> >> while (you determine how long and then discard).
>>>> >>
>>>> >> If a SACK comes and you can recover - fine. If not you either
>>>> >> reaccess the media (if you know how) or reject
>>>> >> and let the initiator retry.
>>>> >>
>>>> >> You should not worry about R/W conflicts as programs bound to have
>>>> >> such conflicts either:
>>>> >>
>>>> >> 1) can live with them or
>>>> >> 2) protect themselves through some locks and rely on
>>>> >>    "operation-end-status" to keep results deterministic.
>>>> >>
>>>> >> Regards,
>>>> >> Julo
>>>> >>
>>>> >>
>>>> >> "Mallikarjun C." <cbm@rose.hp.com> on 27/03/2001 03:34:16
>>>> >>
>>>> >> Please respond to cbm@rose.hp.com
>>>> >>
>>>> >> To: cbm@rose.hp.com, someshg@yahoo.com, steph@cs.uchicago.edu,
>>>> >>     Julian Satran/Haifa/IBM@IBMIL, John Hufferd/San Jose/IBM@IBMUS
>>>> >> cc: Black_David@emc.com
>>>> >> Subject: iSCSI ERT: data SACK/replay buffer/"semi-transport"
>>>> >>
>>>> >>
>>>> >> Hi Error Recovery Team,
>>>> >>
>>>> >> iSCSI can discard PDUs because of digest errors and request
>>>> >> retransmissions using the iSCSI data SACK.
>>>> >> To deal with such an eventuality, targets that want to support
>>>> >> data SACK have the following options:
>>>> >>
>>>> >> (A) maintain a complete "replay" buffer for the entire I/O since
>>>> >>     a SACK could come anytime before the status is ack'ed by the
>>>> >>     initiator. [simple, but extremely expensive in memory resources]
>>>> >>
>>>> >> (B) (re-introduce data-ACKs into the draft, and) implement data-ACKs.
>>>> >>     This enables keeping only those I/O buffers that haven't been
>>>> >>     ack'ed by the initiator. IOW, become a real full transport!
>>>> >>     [everyone disliked it earlier...]
>>>> >>
>>>> >> (C) re-access the medium for data retransmission requests. Now there
>>>> >>     are 3 sub-cases in this to handle the changed data on the medium
>>>> >>     in a write-after-read scenario. (SEE NOTE.1 at the bottom on how
>>>> >>     it is legal.)
>>>> >>       (1) On seeing any write, stall till status is ack'ed for all
>>>> >>           the previous reads (basically drain the pipe). [simple,
>>>> >>           but incurs an additional roundtrip delay for all writes]
>>>> >>       (2) A variation of the above, keep an eye only on the prior
>>>> >>           overlapping reads. [more BW efficient, but complicated to
>>>> >>           resolve the block dependencies in a stream of reads
>>>> >>           followed by writes]
>>>> >>       (3) Document the caveat and leave it up to the applications
>>>> >>           to avoid this case since this leads to data integrity
>>>> >>           issues. [pushing to apps since the transport can't get
>>>> >>           it right!]
>>>> >>
>>>> >> My first preference is (B), followed by (A), and I suggest we not go
>>>> >> to (C) at all with its inherent dangers.
>>>> >>
>>>> >> Doing (B) naturally completes the transport job that iSCSI has taken
>>>> >> on itself in view of TCP's claimed unreliable checksum. That is the
>>>> >> right thing to do architecturally instead of being a
>>>> >> "semi-transport"!
>>>> >>
>>>> >> Comments?
>>>> >> --
>>>> >> Mallikarjun
>>>> >>
>>>> >>
>>>> >> Mallikarjun Chadalapaka
>>>> >> Networked Storage Architecture
>>>> >> Network Storage Solutions Organization
>>>> >> MS 5668 Hewlett-Packard, Roseville.
>>>> >> cbm@rose.hp.com
>>>> >>
>>>> >>
>>>> >> ______________________________________________________________________
>>>> >> Note.1: A Read followed by a Write (to the same blocks) is perfectly
>>>> >>         legal if SCSI sets the ORDERED task attribute on both the
>>>> >>         commands AND sets the NACA bit to one to indicate that Write
>>>> >>         shall be executed only if the Read did not fail (result in a
>>>> >>         Check Condition).
>>>> >>
>>>> >>         In the current case, since Read completed just fine from
>>>> >>         SCSI's point of view, SCSI is moving on to execute Write.
>>>> >>         Those read buffers had been freed up since iSCSI received an
>>>> >>         ACK at the TCP level, and since iSCSI has no other way to
>>>> >>         have the data ack'ed!
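The memory trade-off between options (A) and (B) in the original post can be sketched in a few lines of Python. This is a minimal, hypothetical model and not anything taken from an iSCSI draft: the class ReplayBuffer, its method names, the 8 KiB PDU size, and the "ack every N PDUs" policy are all assumptions made for illustration. It only shows that a target which may free buffers on data-ACKs holds far less data than one that must keep everything until the status is acknowledged, and that a SACK for already-freed data cannot be served from memory.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Hypothetical per-task replay buffer on a target (illustration only)."""

    def __init__(self):
        # DataSN -> payload for every data PDU sent but not yet released.
        self._pdus = OrderedDict()

    def sent(self, data_sn, payload):
        """Record a data PDU at the moment the target transmits it."""
        self._pdus[data_sn] = payload

    def data_ack(self, up_to_sn):
        """Option (B): release every PDU with DataSN <= up_to_sn."""
        for sn in [sn for sn in self._pdus if sn <= up_to_sn]:
            del self._pdus[sn]

    def status_ack(self):
        """Option (A): nothing is released until the status is acknowledged."""
        self._pdus.clear()

    def sack(self, first_sn, count):
        """Return the requested PDUs, or None if any was already released.

        None models the case where the target would have to re-read the
        medium (option C) or reject the request and let the initiator retry.
        """
        wanted = range(first_sn, first_sn + count)
        if any(sn not in self._pdus for sn in wanted):
            return None
        return [self._pdus[sn] for sn in wanted]

    def held_bytes(self):
        """Total payload bytes currently retained for replay."""
        return sum(len(p) for p in self._pdus.values())


def run(ack_every):
    """Send eight 8 KiB PDUs; ack_every=None models option (A)."""
    buf, peak = ReplayBuffer(), 0
    for sn in range(8):
        buf.sent(sn, bytes(8192))
        peak = max(peak, buf.held_bytes())
        if ack_every and (sn + 1) % ack_every == 0:
            buf.data_ack(sn)
    return buf, peak


if __name__ == "__main__":
    _, peak_a = run(None)   # option (A): hold everything until status ack
    _, peak_b = run(2)      # option (B): data-ACK after every second PDU
    print(peak_a, peak_b)
```

In this toy run the option (A) buffer peaks at 65536 bytes (the whole transfer), while the option (B) buffer with an ACK every two PDUs peaks at 16384 bytes. The price, as the thread notes, is the extra ACK traffic and the additional protocol machinery on both ends.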