Synchronization problem and TCP big window problem

Hello,

This mail is about three problems outlined in the recent discussions:

a) the loss of synchronization in the TCP byte stream;
b) when a packet is lost or arrives out of order, the extra delay in command completion on the initiator side, which blocks the initiator because its command window is closed;
c) the amount of TCP dedicated storage needed for a fully TCP-compliant implementation on a fast link with a long round trip time.

About c)
--------

The problem is that the amount of memory needed is unbounded. The faster the link and the farther the target is from the initiator, the more TCP dedicated storage is needed. Calculations posted by several people showed how big this memory can be, and we don't know where this is heading, since links keep getting faster.

This TCP dedicated memory is needed to cope with out-of-order or lost datagrams. For good performance one wants to use SACK. If a datagram is lost, the receive side has to store all of the byte stream arriving through the TCP pipe until the send side retransmits. This byte stream, even if acknowledged with SACK, has to be held in a temporary TCP dedicated buffer, because iSCSI can't process it: iSCSI has lost synchronization because of the missing datagram.

The amount of memory needed depends on the RTT and on the link speed, so it can be very big and will keep growing in the future. Hence a large memory buffer will be needed just to handle error cases.

Proposal to solve these problems
================================

Add a "pad" command to iSCSI. This pad command is only one byte: the opcode.

How does it work?
-----------------

At login time the initiator and the target agree on two synchronization periods (SPEs), one for each direction.
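
To put rough numbers on the memory concern in c): the data in flight during one round trip is about link speed times RTT (the bandwidth-delay product), which is roughly what a receiver must be prepared to buffer after a loss. The following is only a sketch in Python; the function name, link speeds and RTT are illustrative assumptions, not figures taken from the earlier discussion.

```python
def in_flight_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes sent during one round trip."""
    return link_bits_per_s * rtt_s / 8


# Illustrative values only (assumed, not measured):
# 1 and 10 Gbit/s links with a 100 ms round trip time.
for gbps in (1, 10):
    mbytes = in_flight_bytes(gbps * 1e9, 0.100) / 1e6
    print(f"{gbps} Gbit/s, 100 ms RTT: about {mbytes:.1f} Mbytes in flight")
```

With these assumed values, a tenfold increase in link speed means a tenfold increase in the buffer, which is the growth the SPE is meant to cap.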

A synchronization period is the number of bytes that separates two synchronization points (SPOs). At each SPO the sender guarantees that an iSCSI header begins, inserting pad commands before it if necessary. The SPE value is implementation dependent and could be chosen based on the memory capacity of the receiver. The shorter the SPE, the less memory the receiver needs to handle lost or out-of-order datagrams.

On the receiver side, when a hole occurs in the TCP data stream (a lost datagram), the receiver continues to SACK the incoming data stream and stores it in a TCP dedicated buffer up to the next SPO. From that SPO on, it can again interpret the data stream and process it, and it stops copying into the TCP dedicated buffer. Processing means, for example: for WRITE data, copy the data to disk; for READ data, copy it into the host reception buffer; for a command completion, do the cleanup; and so on. When the receiver gets the missing datagram, it empties the TCP dedicated buffer.

For example, if the receiver can store up to 5 Mbytes of TCP dedicated memory per TCP connection, it could choose an SPE of 5 Mbytes. On a bad quality line, if its dedicated memory gets full (because further holes appeared in the data stream after the re-synchronization and the first holes have not yet been filled in by the sender), it drops everything new it receives until it gets the missing datagrams.

Advantages of this proposal
===========================

1) Reduces the memory needed on the receive side while maintaining good performance
2) Caps the memory needed for TCP even with long RTT and increasing bandwidth
3) Allows a synchronization check every SPE
4) Negligible loss of bandwidth (padding)

Regards,
Pierre
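
As an illustration of the rule described above, here is a minimal Python sketch of how a sender could decide how many one-byte pad commands to emit, and where a receiver could resume interpreting the stream after a hole. It is not part of any iSCSI draft: the function names are hypothetical, SPOs are assumed to sit at stream offsets that are multiples of the SPE, and a PDU (header plus data) is assumed never to be longer than one SPE.

```python
def pad_bytes_before_pdu(offset: int, pdu_len: int, spe: int) -> int:
    """Pad commands to send before a PDU that would start at `offset`.

    If the PDU would run across the next SPO, pad up to that SPO so the
    PDU header starts exactly on it; otherwise no padding is needed.
    """
    if pdu_len > spe:
        raise ValueError("assumption violated: PDU longer than one SPE")
    next_spo = (offset // spe + 1) * spe  # first SPO strictly after offset
    if offset + pdu_len > next_spo:
        return next_spo - offset
    return 0


def resume_offset(hole_end: int, spe: int) -> int:
    """First SPO at or after the end of a hole in the byte stream.

    Bytes between the hole and this offset stay in the TCP dedicated
    buffer; from this offset on, iSCSI can interpret the stream again.
    """
    return -(-hole_end // spe) * spe  # ceiling division, then scale


# Example with an assumed SPE of 1000 bytes.
print(pad_bytes_before_pdu(offset=900, pdu_len=248, spe=1000))
print(resume_offset(hole_end=1500, spe=1000))
```

In the example, the 248-byte PDU at offset 900 would cross the SPO at 1000, so 100 pad bytes are sent first. A receiver whose hole ends at offset 1500 resumes at the SPO at 2000.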

Last updated: Tue Sep 04 01:08:11 2001