RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"

OK, if you go at it long enough you are punished with my two cents.

We of course need some real numbers on the probability of an error that the CRC detects but the TCP checksum does not. Since we do not have that information, I can only use some of the numbers that have been kicking around on this thread. A billion is NOT a large number, especially when we are talking about 10 Gigabit links (vendors will be sampling 10 Gigabit/sec HBAs next year, some shipping them in general availability (GA) that same year, and the rest in 2003; and yes, I also have information on a company that is currently developing 100 Gigabit links). When I looked at some of the numbers, I found that they implied a link would see a failure about every twenty minutes; others worked out to about 200 minutes, etc.

I have one war story that might apply. Years ago, when we were first thinking about putting small disk drives into our large storage controllers, we had folks calculate the Mean Time To Failure (MTTF) of the various desktop hard drives. Some individual MTTF numbers sounded large for any given drive. But when we computed the number of drives we would have in a large installation, it turned out that we would have a drive failure at least every day. (Thankfully, the drives that were actually used had significantly better MTTF numbers.) So the point is that, in a large installation, these large numbers sometimes come back to bite you in ways you had not considered at first.

OK, back to the thread. I see sites all the time with 10s to 100s of tape units. In many cases this will mean a tape unit failure that causes a critical backup job to fail, somewhere on the computing room floor, about every 2, 20, or 200 minutes. This is a major impact on a computing center that must process hundreds of backups each day. Therefore, those of you who think you are talking about very rare events should at least compute the rates at 10 Gigabit/second, factor in the number of paths, etc. that might exist in an enterprise installation, and then state how often a computing center will see such an event. Since many of these jobs run at night under unattended operations, this can be a significant issue. If it is probable that at least one failure will occur per night, then you can be all but certain that when a disaster does occur, the site will not have a valid backup for some portion of its data.

OK, I am not saying who is right or wrong here, just that some of the numbers I have heard on this thread are not that impressive when looked at across 10 Gigabit/sec links and many paths (let alone the future 100 Gigabit/sec links). (Oh, by the way, remember that a 10 Gigabit link is really a 20 Gigabit link when you factor in full duplex.) So it might be useful for the Rare Event folks to do the calculations on their numbers and tell us what they mean in terms of minutes between failures on 10 Gigabit links; a rough sketch of that calculation follows below my signature. Then the rest of us can form our own picture based on how many links we will probably have in our installations.

. . .

John L. Hufferd
Senior Technical Staff Member (STSM)
IBM/SSG, San Jose, CA
(408) 256-0403, Tie: 276-0403, eFax: (408) 904-4688
Internet address: hufferd@us.ibm.com
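
Since the post asks readers to run this arithmetic themselves, here is a minimal sketch in Python of one way to do it. The per-bit undetected-error probabilities and the 100-link machine-room size used below are illustrative assumptions, not figures taken from this thread or from any measurement.

    # Rough sketch: mean minutes between errors that slip past the TCP checksum,
    # for an assumed per-bit undetected-error probability and an assumed link speed.
    # The probabilities tried below are placeholders, not measured data.

    def minutes_between_undetected_errors(link_gbps, p_undetected_per_bit):
        """Mean minutes between undetected errors on a single link."""
        bits_per_second = link_gbps * 1e9
        errors_per_second = bits_per_second * p_undetected_per_bit
        return 1.0 / errors_per_second / 60.0

    # A 10 Gigabit/sec link carries 20 Gigabit/sec when both directions of
    # full duplex are counted, as noted in the post above.
    for p in (1e-12, 1e-13, 1e-14):
        per_link = minutes_between_undetected_errors(20.0, p)
        print(f"p = {p:g}: one link ~{per_link:7.1f} min between events, "
              f"a 100-link floor ~{per_link / 100.0:6.2f} min")

With these placeholder probabilities the per-link interval works out to roughly 1 to 80 minutes, the same order of magnitude as the "every 2, 20, or 200 minutes" figures cited above; the point of the sketch is only that multiplying by link speed and by the number of links in an installation is what turns a "rare" event into a routine one.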