Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

Jon,

You said:

    ...I've worked in this context (though it's been some years now). It
    was true (at one time) that tape had a tractability limit, e.g., a
    tape backup of a terabyte was out of the question. Has that changed?

Not knowing how far back you go, or what technology you were using: yes,
I guess it has changed. Terabytes are a common backup size, even larger.
The current high-end tapes are really quite good, and their speed
continues to increase. They are currently tracking disk technology,
maybe one half to at most one generation behind in head technology, etc.
The disks are getting larger and larger, and tape technology has had to
track close to disk; the new LTO tapes continue this trend. Nightly
backups of the terabytes of data used by a computing center's servers
are the rule.

And by the way, tapes continue to increase in speed. Of course they are
not increasing at 10x, but understand that backup applications such as
TSM (aka ADSM) write many tapes in parallel. The databases of the backup
applications and the tape libraries have solved a lot of problems that
we used to have to deal with.

. . .
John L. Hufferd
Senior Technical Staff Member (STSM)
IBM/SSG San Jose Ca
(408) 256-0403, Tie: 276-0403, eFax: (408) 904-4688
Internet address: hufferd@us.ibm.com

"Jon Hall" <jhall@emc.com>@ece.cmu.edu on 04/09/2001 09:02:10 AM
Sent by: owner-ips@ece.cmu.edu
To: ips@ece.cmu.edu
cc:
Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport"

"John Hufferd" writes:
>OK, if you go at it long enough you are punished with my two cents. :-)
>We of course need some real numbers on the probability of a
>CRC-detected error that TCP does not detect.
>
>Given the fact that we do not have that information, I could only use
>some of the numbers that have been kicking around on this thread.
>
>A Billion is NOT a large number, especially when we are talking about
>10 Gigabit links (vendors are sampling 10 Gigabit/sec HBAs next year,
>some shipping them in general availability (GA) that year, and the rest
>in 2003. And yes, I also have information on a company that is
>currently developing 100 Gigabit links.) So when I looked at some of
>the numbers, I found that it meant that a link would see a failure
>about every twenty minutes; some went to 200 minutes, etc.

This is exactly why it's necessary to understand the flow. In a tape
context, is it OK to assume that the data flowing from the target to the
initiator is responses to cmds, and that only a part of that is iSCSI
headers with StatSNs? If that's right, then run the numbers against the
flow (don't use my numbers, they are riddled with guesses).

>I have one war story that might apply. Years ago, when we were first
>thinking about putting the small disk drives in our large storage
>controllers, we had folks calculate the Mean Time to Failure (MTF) of
>the various desktop HDs. Some individual MTF numbers sounded large for
>any given drive. But then we computed the number of drives we would
>have in a large installation, and it turned out that we would have a
>drive failure at least every day. (Thankfully, we had significantly
>better MTF numbers in the drives that were actually used.) So the point
>is that sometimes these large numbers come back to bite you in ways you
>had not considered at first, when you think about it in a large
>installation.
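(A minimal sketch of the arithmetic behind that war story, since the same
scaling argument keeps coming up in this thread: assuming independent
failures, the fleet-wide failure rate is just the per-drive rate times
the number of drives. The per-drive MTBF and drive counts below are
assumed, illustrative values, not figures from the thread.)

    # Back-of-envelope: how a large-sounding per-drive MTBF turns into
    # "a failure at least every day" across a whole installation.
    # Assumes independent, exponentially distributed failures.

    HOURS_PER_DAY = 24

    def fleet_mtbf_hours(per_drive_mtbf_hours, num_drives):
        """Mean time between failures seen anywhere in the installation."""
        return per_drive_mtbf_hours / num_drives

    per_drive = 300_000.0  # assumed per-drive MTBF in hours (~34 years)
    for drives in (1_000, 5_000, 10_000):
        mtbf = fleet_mtbf_hours(per_drive, drives)
        print(f"{drives:>6} drives -> a failure roughly every "
              f"{mtbf / HOURS_PER_DAY:.1f} days ({mtbf:.0f} hours)")

(With 10,000 drives at an assumed 300,000-hour MTBF, the machine room as
a whole sees a failure roughly every 30 hours, i.e., about daily.)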
>
>OK, back to the thread.
>
>Now I see sites all the time with 10s to 100s of tape units. In many
>cases this will mean that there will be a tape unit failure that causes
>a critical backup job to fail, somewhere on the computing room floor,
>about every 2, 20, or 200 min. This is a major impact on a computing
>center that must process hundreds of backups each day.

Exactly. I've worked in this context (though it's been some years now).
It was true (at one time) that tape had a tractability limit, e.g., a
tape backup of a terabyte was out of the question. Has that changed?

>Therefore, those of you who think you are talking about very rare
>events should at least compute the 10 Gigabit/second rates, and then
>the number of paths, etc. that might be in an enterprise installation,
>and then state how often a computing center will see such an event.
>When many of these things are done at night with unattended operations,
>these can be a significant issue. If it is probable that even one
>failure will occur per night, then you can be certain that when a
>disaster does occur, there will not be a valid backup for some of the
>data.
>
>OK, I am not saying who's right or wrong here, just that some of the
>numbers I have heard on this thread are not that impressive when looked
>at with 10 Gigabit/sec links and many paths. (Let alone the future 100
>Gigabit/sec links.) (Oh, by the way, remember a 10 Gigabit link is
>really a 20 Gigabit link when you factor in full duplex.)
>
>So it might be useful for the Rare Event folks to do the calculations
>on their numbers and tell us what they mean in terms of minutes between
>failures on 10 Gigabit links. Then the rest of us can compute our own
>picture of how many links we will probably have in our installation.

But why does the fact that we may someday run at 10 gig change the
question? Is there some reason to believe that at 10 gig the nature of a
tape flow has changed? You could certainly have more flows, but the
number of packets per flow will not increase. The speed of tape access
won't change. You could do more tapes simultaneously, but you still have
the tractability of handling large numbers of tapes.

As an aside, is a "Rare Event" person like a flat-earth person? (I want
to get my role right :-).

-Jon
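(For reference, the "minutes between failures" calculation being asked
for is easy to sketch. The per-segment escape probability below is an
assumption, one error in a billion TCP segments slipping past the
checksum, chosen only because it roughly reproduces the "failure about
every twenty minutes" figure quoted earlier; substitute whatever
per-segment or per-bit number the thread settles on.)

    # Back-of-envelope: minutes between errors that escape the TCP
    # checksum on links running flat out. The escape probability is an
    # assumed placeholder, not a measured value.

    SEGMENT_BITS = 1500 * 8  # assume full-size Ethernet frames

    def minutes_between_escapes(line_rate_bps, p_escape_per_segment,
                                num_links=1):
        """Expected minutes between undetected errors across num_links."""
        segments_per_sec = line_rate_bps / SEGMENT_BITS
        escapes_per_sec = segments_per_sec * p_escape_per_segment * num_links
        return 1.0 / (escapes_per_sec * 60.0)

    for links in (1, 10, 100):
        mins = minutes_between_escapes(10e9, p_escape_per_segment=1e-9,
                                       num_links=links)
        print(f"{links:>3} x 10 Gbit/s links: an escape roughly every "
              f"{mins:.1f} minutes")

(At 10 Gbit/s a single link moves about a billion full-size segments
every twenty minutes, which is why "a billion" is not a large number
here; counting both directions of a full-duplex link, or moving to 100
Gbit/s, shrinks the interval proportionally.)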