Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

At 09:12 AM 4/3/2001 -0400, Stephen Bailey wrote:

> > The Stone and Partridge paper is mostly not applicable to an iSCSI
> > environment. The principal failure mechanisms were major software
> > bugs in the driver stack of PC-oriented machines.

People make mistakes in all implementations. Examining other, similar
packet-processing technology for mistakes is applicable to any effort,
and one should perform a risk assessment of the probability that those
mistakes will be repeated here. The fact that the mistakes were in
PC-oriented machines is basically irrelevant, and storage is not immune
to similar mistakes (I have seen storage implementations that were just
as poor in quality as those in any other segment of the industry).

>I'm in complete agreement with Bob.
>
>I haven't seen a good analysis of TCP checksum escapes which resulted
>from intermediary manipulation (I haven't read the papers, but
>hopefully soon), but my hunch is that it's incredibly rare.
>
>An endpoint-precipitated TCP checksum `escape' would also escape a CRC
>or any other similar integrity check. That is why I think all this
>additional integrity checking (on iSCSI headers & data) is an
>incredible amount of extra work (not just in computing the CRCs, but
>also in designing the SACK mechanism and recovery for digest failures)
>for no real gain.

I agree that some of the recovery is overkill, but I disagree that error
detection is as well. At a minimum, one needs a strong end-to-end error
detection mechanism. Many believe a 16-bit checksum is not adequate to
protect their data, and given the importance of this data to our
customers, most feel the specification must define such a mechanism
(some feel strongly that this mechanism should NOT be optional). Whether
we need two CRCs, etc. is a separate debate, but the mechanism needs to
be there, and most of us will require that it be used in any product or
solution delivered to the customer.

>The real loss is that it's immensely slowing time-to-market for iSCSI
>(both in the front end specification and the back end implementation).

A fast-TTM solution that is not of the highest quality (i.e., one that
does not prevent silent data corruption) will lead to customer distrust
and a repeat of the FC adoption rate: only 10 years later has FC really
started to penetrate customer solutions.

>A straw-man proposal (very unpopular given where we are, I know) would
>be to specify iSCSI without additional integrity checks (other than
>what you can get through security mechanisms, which is probably not
>visible to iSCSI anyway), and if that `fails' (I'm sure it won't), we
>can put an integrity shim between iSCSI and the transport.
>
>One example of how to do this would be Julian's TAF. Another would be
>the WARP RDMA layer.

If another layer is put in place that provides data integrity, then it
is redundant to do this at the iSCSI layer as well. This is one place
where an option can be used: one negotiates the underlying framing
mechanism (e.g. WARP), and if it is present, iSCSI does not activate the
CRC services. If it is not present, iSCSI does activate them, thereby
ensuring that end-to-end data integrity is always present in the
solution.

>We don't have to specify how to do this now

If this is to be supported, then it should be specified now (this can be
done rather opaquely by just setting a "transport services" attribute
for strong end-to-end data integrity protection).

>, and the point is that
>it's hard to do so, because we really don't know what problem we're
>solving with it. We're OK as long as we have a way to address it in
>the future without completely chucking what already exists.
>
>The other point to remember is that iSCSI still has to make the
>ID->Proposed->Draft->Internet traversal, and anybody that thinks it's
>going to do that on the first try is kidding themselves. It's more
>important to get SOMETHING out there that exposes the implementation
>holes than to design a cathedral on paper.

Nothing is perfect the first time out, but with a tightening economy and
customers demanding quality from the get-go, the trade-off between
quality / reliability and TTM is not something people should rush to
make. The market is not what it used to be, when good enough was all
right; customers expect more today, and with good cause.

Mike
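
To make the 16-bit checksum concern above concrete, here is a minimal sketch of one well-known weakness: the one's-complement sum used by TCP does not depend on the order of the 16-bit words, so a swap of two words goes unnoticed, while a CRC catches it. The payload, the helper names, and the use of `zlib.crc32` (the common CRC-32, standing in for whatever digest iSCSI ends up specifying) are assumptions for illustration only.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_first_two_words(data: bytes) -> bytes:
    """Hypothetical corruption: exchange the first two 16-bit words."""
    return data[2:4] + data[0:2] + data[4:]


payload = b"iSCSI data block"  # illustrative payload, not a real PDU
corrupted = swap_first_two_words(payload)

# The checksum is order-independent, so the swap is invisible to it.
print("payload changed:   ", payload != corrupted)
print("checksum unchanged:",
      internet_checksum(payload) == internet_checksum(corrupted))
# The swap alters a span of only 32 bits, which a 32-bit CRC detects.
print("CRC-32 differs:    ", zlib.crc32(payload) != zlib.crc32(corrupted))
```

This shows only one class of error that the checksum misses. It says nothing about how often such errors occur in practice, which is the point under debate in this thread.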