Most efficient way to transmit a file over TCP - C#

I'm currently transmitting files by gzipping them and then converting them to a Base64 string. It works well enough, but I'd like to make it more efficient if possible, as I'm sure this is not the best way to do it given the roughly 33% size increase Base64 adds.
The two other options I'm considering are directly reading and writing the bytes, or serializing the object and sending it.
What would be the best way to do this in terms of space? (I'm trying to keep the size of the transfer as small as possible.) The files are relatively small, around 100 KB. I'd appreciate any insight.

If you don't want to send the length first, you could use this method - after you have acquired the NetworkStream object from the connection - to read all data from the stream. Again, there is no need for Base64 in your case, so this solution reads a raw byte array received from the sending side via the NetworkStream.
public static byte[] ReadFully(Stream input)
{
    // Read in 16 KB chunks until the sender closes its end of the
    // connection, accumulating everything into a MemoryStream.
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
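If you do send the length first, the transfer can stay in raw bytes with no Base64 step at all. Here is a minimal sketch of that alternative, assuming a connected NetworkStream on each side (the SendCompressed, ReceiveCompressed and ReadExactly names are mine, not from the question; requires System.IO, System.IO.Compression and System.Net.Sockets):
public static void SendCompressed(NetworkStream stream, byte[] fileBytes)
{
    byte[] compressed;
    using (var ms = new MemoryStream())
    {
        // Close the GZipStream before ToArray so the final block is flushed.
        using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            gzip.Write(fileBytes, 0, fileBytes.Length);
        compressed = ms.ToArray();
    }
    stream.Write(BitConverter.GetBytes(compressed.Length), 0, 4); // 4-byte length prefix
    stream.Write(compressed, 0, compressed.Length);
}

public static byte[] ReceiveCompressed(NetworkStream stream)
{
    byte[] header = ReadExactly(stream, 4);
    int length = BitConverter.ToInt32(header, 0);
    return ReadExactly(stream, length); // decompress afterwards with GZipStream
}

static byte[] ReadExactly(Stream stream, int count)
{
    // Read may return fewer bytes than requested, so loop until done.
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int n = stream.Read(buffer, offset, count - offset);
        if (n == 0) throw new EndOfStreamException();
        offset += n;
    }
    return buffer;
}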

Related

Why is my DeflateStream not receiving data correctly over TCP?

I have a TcpClient class on a client and server setup on my local machine. I have been using the NetworkStream to facilitate communications back and forth between the two successfully.
Moving forward, I am trying to implement compression in the communications. I've tried GZipStream and DeflateStream, and have decided to focus on DeflateStream. However, the connection now hangs without reading data.
I have tried four different implementations that have all failed because the server side does not read the incoming data and the connection times out. I will focus on the two implementations I have tried most recently and which, to my knowledge, should work.
The client request boils down to this. There are two separate implementations, one with StreamWriter and one without.
textToSend = ENQUIRY + START_OF_TEXT + textToSend + END_OF_TEXT;

// Send XML Request
byte[] request = Encoding.UTF8.GetBytes(textToSend);
using (DeflateStream streamOut = new DeflateStream(netStream, CompressionMode.Compress, true))
{
    //using (StreamWriter sw = new StreamWriter(streamOut))
    //{
    //    sw.Write(textToSend);
    //    sw.Flush();
    streamOut.Write(request, 0, request.Length);
    streamOut.Flush();
    //}
}
The server receives the request and I do:
1.) a quick read of the first character, then, if it matches what I expect,
2.) I continue reading the rest.
The first read works correctly, and if I want to read the whole stream it is all there. However, I only want to read the first character and evaluate it, then continue in my LongReadStream method.
When I try to continue reading the stream there is no data to be read. I am guessing that the data is being lost during the first read, but I'm not sure how to determine that. All this code works correctly when I use a plain NetworkStream.
Here is the server side code.
private void ProcessRequests()
{
    // This method reads the first byte of data correctly, and if I want to
    // I can read the entire request here. However, I want to leave
    // all that data until I want it below in my LongReadStream method.
    if (QuickReadStream(_netStream, receiveBuffer, 1) != ENQUIRY)
    {
        // Invalid request, close connection
        clientIsFinished = true;
        _client.Client.Disconnect(true);
        _client.Close();
        return;
    }

    while (!clientIsFinished) // Keep reading text until client sends END_TRANSMISSION
    {
        // Inside this method there is no data and the connection times out waiting for data
        receiveText = LongReadStream(_netStream, _client);

        // Continue talking with client...
    }

    _client.Client.Shutdown(SocketShutdown.Both);
    _client.Client.Disconnect(true);
    _client.Close();
}
private string LongReadStream(NetworkStream stream, TcpClient c)
{
    bool foundEOT = false;
    StringBuilder sbFullText = new StringBuilder();
    int readLength, totalBytesRead = 0;
    string currentReadText;
    c.ReceiveBufferSize = DEFAULT_BUFFERSIZE * 100;
    byte[] bigReadBuffer = new byte[c.ReceiveBufferSize];
    while (!foundEOT)
    {
        using (var decompressStream = new DeflateStream(stream, CompressionMode.Decompress, true))
        {
            //using (StreamReader sr = new StreamReader(decompressStream))
            //{
            //    currentReadText = sr.ReadToEnd();
            //}
            readLength = decompressStream.Read(bigReadBuffer, 0, c.ReceiveBufferSize);
            currentReadText = Encoding.UTF8.GetString(bigReadBuffer, 0, readLength);
            totalBytesRead += readLength;
        }
        sbFullText.Append(currentReadText);
        if (currentReadText.EndsWith(END_OF_TEXT))
        {
            foundEOT = true;
            sbFullText.Length = sbFullText.Length - 1;
        }
        else
        {
            sbFullText.Append(currentReadText);
        }
        // Validate data code removed for simplicity
    }
    c.ReceiveBufferSize = DEFAULT_BUFFERSIZE;
    c.ReceiveTimeout = timeOutMilliseconds;
    return sbFullText.ToString();
}
private string QuickReadStream(NetworkStream stream, byte[] receiveBuffer, int receiveBufferSize)
{
    using (DeflateStream zippy = new DeflateStream(stream, CompressionMode.Decompress, true))
    {
        int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
        var returnValue = Encoding.UTF8.GetString(receiveBuffer, 0, bytesIn);
        return returnValue;
    }
}
EDIT
NetworkStream has an underlying Socket property, which has an Available property. MSDN says this about the Available property:
Gets the amount of data that has been received from the network and is available to be read.
Before the call below, Available is 77. After reading 1 byte, the value is 0.
//receiveBufferSize = 1
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
There doesn't seem to be any documentation about DeflateStream consuming the whole underlying stream, and I don't know why it would do such a thing when there are explicit calls to read specific numbers of bytes.
Does anyone know why this happens, or if there is a way to preserve the underlying data for a future read? Based on this 'feature', and a previous article I read stating that a DeflateStream must be closed to finish sending (Flush won't work), it seems DeflateStreams may be limited in their use for networking, especially if one wishes to counter DoS attacks by testing incoming data before accepting a full stream.
The basic flaw I can see in your code is a possible misunderstanding of how the network stream and compression work together.
I think your code might work if you kept working with one DeflateStream. However, you use one in your quick read and then create another one.
I will try to explain my reasoning with an example. Assume you have 8 bytes of original data to be sent over the network in compressed form. Now assume, for the sake of argument, that each byte (8 bits) of original data is compressed to 6 bits. Now let's see what your code does to this.
From the network stream, you can't read less than 1 byte. You can't take 1 bit only. You take 1 byte, 2 bytes, or any number of bytes, but not bits.
But if you want to receive just 1 byte of the original data, you need to read the first whole byte of compressed data. However, only 6 bits of that byte represent the first byte of uncompressed data; the last 2 bits belong to the second byte of original data.
Now if you cut the stream there, what is left is 5 bytes in the network stream that make no sense on their own and can't be decompressed.
The deflate algorithm is more complex than that, so it makes perfect sense that it does not allow you to stop reading from the NetworkStream at one point and continue with a new DeflateStream from the middle. There is decompression context that must be present in order to restore the data to its original form. Once you dispose of the first DeflateStream in your quick read, this context is gone and you can't continue.
So, to resolve your issue, try creating only one DeflateStream, passing it to your functions, and disposing of it only when you are done.
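A minimal sketch of that suggestion, assuming QuickReadStream and LongReadStream are changed to take any Stream and read from it directly (this is my illustration, not code from the answer):
// Create the decompressing stream once per connection and reuse it for
// every read, so the deflate context survives between calls.
using (var inflater = new DeflateStream(_netStream, CompressionMode.Decompress, true))
{
    if (QuickReadStream(inflater, receiveBuffer, 1) != ENQUIRY)
        return;

    while (!clientIsFinished)
    {
        receiveText = LongReadStream(inflater, _client);
        // Continue talking with client...
    }
}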
This is broken in many ways.
You are assuming that a Read call will return exactly the number of bytes you want. It might deliver everything in one-byte chunks instead.
DeflateStream has an internal buffer. It can't be any other way: input bytes do not correspond 1:1 to output bytes, so there must be some internal buffering. You must use one such stream, not a new one per read.
The same issue applies to UTF-8: UTF-8 encoded strings cannot be split at arbitrary byte boundaries, so sometimes your Unicode data will be garbled.
Don't touch ReceiveBufferSize; it does not help in any way.
You cannot reliably flush a deflate stream, I think, because the output might end at a partial-byte position. You should probably devise a message framing format in which you prepend the compressed length as an uncompressed integer, then send the compressed deflate data after the length. That is decodable in a reliable way.
Fixing these issues is not easy.
Since you seem to control both client and server, you should discard all of this and not devise your own network protocol. Use a higher-level mechanism such as web services, HTTP, or protobuf. Anything is better than what you have there.
Basically, there are a few things wrong with the code I posted above. First, when I read data I'm not doing anything to make sure the data is ALL read in. As per the Microsoft documentation:
The Read operation reads as much data as is available, up to the number of bytes specified by the size parameter.
In my case I was not making sure my reads would get all the data I expected.
This can be accomplished simply with this code.
byte[] data = new byte[packageSize];
bytesRead = _netStream.Read(data, 0, packageSize);
while (bytesRead < packageSize)
    bytesRead += _netStream.Read(data, bytesRead, packageSize - bytesRead);
On top of this problem, I had a fundamental issue with using DeflateStream - namely, I should not use DeflateStream to write to the underlying NetworkStream. The correct approach is to first use the DeflateStream to compress the data into a byte array, then send that byte array using the NetworkStream directly.
Using this approach correctly compresses the data over the network and properly reads it on the other end.
You may point out that I must know the size of the data, and that is true. Every call has an 8-byte 'header' that includes the size of the compressed data and the size of the data when uncompressed, although I think the second was ultimately not needed.
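The post doesn't show the sending side, so here is a sketch of what it would look like under that framing (my reconstruction, not code from the post; uncompressedData and _netStream stand in for the post's own variables):
byte[] compressedData;
using (var buffer = new MemoryStream())
{
    // Close the DeflateStream before reading the buffer so the final
    // compressed block is flushed - Flush() alone is not enough.
    using (var deflate = new DeflateStream(buffer, CompressionMode.Compress))
        deflate.Write(uncompressedData, 0, uncompressedData.Length);
    compressedData = buffer.ToArray();
}

// 8-byte header: compressed size, then uncompressed size.
_netStream.Write(BitConverter.GetBytes(compressedData.Length), 0, 4);
_netStream.Write(BitConverter.GetBytes(uncompressedData.Length), 0, 4);
_netStream.Write(compressedData, 0, compressedData.Length);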
The code for reading the header is here. Note that the variable packageSize serves two purposes.
int packageSize = streamIn.Read(sizeOfDataInBytes, 0, 4);
while (packageSize != 4)
{
    packageSize += streamIn.Read(sizeOfDataInBytes, packageSize, 4 - packageSize);
}
packageSize = BitConverter.ToInt32(sizeOfDataInBytes, 0);
With this information I can correctly use the code I showed you first to get the contents fully.
Once I have the full compressed byte array I can get the incoming data like so:
var output = new MemoryStream();
using (var stream = new MemoryStream(bufferIn))
{
    using (var decompress = new DeflateStream(stream, CompressionMode.Decompress))
    {
        decompress.CopyTo(output);
    }
}
output.Position = 0;
var unCompressedArray = output.ToArray();
output.Close();
output.Dispose();
return Encoding.UTF8.GetString(unCompressedArray);

TCP download directly to storage

I've written a program that was initially intended for very basic text communication over the internet using the .NET TcpClient class in C#. I decided to try setting up a procedure to read a file from one computer, break it up into smaller pieces, send each piece to the receiving computer, and have it reassembled and saved there. Essentially, a file transfer.
I then realized that all the data I'm transferring goes into the memory of the receiving computer and only reaches storage in the next step. I am now wondering: is this the best way to do it? If data can be transferred and immediately written to the storage location where it's headed (bypassing the RAM step), is that how a program like Google Chrome would handle downloads? Or are there usually important reasons for the data to be stored in memory first?
By the way, for clarity, let's agree that "storage" means something like a hard drive and "memory" refers to RAM. Thanks.
The way it is usually done is to open a FileStream, read data into a byte[] from the TcpClient, and write the number of bytes read from the NetworkStream to the FileStream.
Here is a pseudo-example:
TcpClient tcp; // assumed to be already connected
FileStream fileStream = File.Open("WHERE_TO_SAVE", FileMode.Create, FileAccess.Write);
NetworkStream tcpStream = tcp.GetStream();
byte[] buffer = new byte[8192];
int bytesRead;
// Only one 8 KB buffer is ever held in RAM; each chunk is written
// straight to disk as it arrives.
while ((bytesRead = tcpStream.Read(buffer, 0, buffer.Length)) > 0)
{
    fileStream.Write(buffer, 0, bytesRead);
}
tcpStream.Dispose();
fileStream.Dispose();

Is it safe to read massive "count" of bytes from Stream then copy them to a new array?

I'm sorry about the confusing title, I honestly had no clue what to call it! I was tossing up whether this belongs on one of the security/crypto exchange sites, but as this is predominantly a programming question I will post it here. Feel free to move it!
I have an AES crypto stream, and as AES pads the original data to a multiple of the block size, the resulting encrypted data is almost always a different size from the original unencrypted data. When decrypting, you need to know how many bytes to read from the crypto stream (how many unencrypted bytes there are). I was originally planning on sending the original, unencrypted data length in the packet, but then I thought of another way: if I just read 4096 bytes from the CryptoStream and store how many bytes were actually read, I can then copy the correct number of bytes to a new array and use that.
Is it safe to do that? My code is the following:
using (ICryptoTransform crypt = AES.CreateDecryptor())
{
    using (MemoryStream memStrm = new MemoryStream(data))
    {
        using (CryptoStream cryptStrm = new CryptoStream(memStrm, crypt, CryptoStreamMode.Read))
        {
            byte[] bytes = new byte[size];
            int read = cryptStrm.Read(bytes, 0, 4096);
            byte[] temp = new byte[read];
            Array.Copy(bytes, temp, read);
            return temp;
        }
    }
}
By safe I mean, will it always produce correct decrypted data?
Why are you jumping through so many hoops? MemoryStream, CryptoStream, temporary arrays...
return crypt.TransformFinalBlock(data, 0, data.Length);
To make your crypto secure, you should also use a random IV for each encryption, stored alongside the ciphertext. And adding a MAC (such as HMAC-SHA-256) in an encrypt-then-MAC construction prevents a number of active attacks, including padding oracles.
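A minimal sketch of what that looks like, assuming separate encryption and MAC keys are already shared (the method name and the IV || ciphertext || tag layout are my illustrative choices, not from this answer):
static byte[] EncryptThenMac(byte[] plaintext, byte[] encKey, byte[] macKey)
{
    using (var aes = Aes.Create())
    {
        aes.Key = encKey;
        aes.GenerateIV(); // fresh random IV for every message

        byte[] ciphertext;
        using (var encryptor = aes.CreateEncryptor())
            ciphertext = encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);

        using (var hmac = new HMACSHA256(macKey))
        using (var ms = new MemoryStream())
        {
            ms.Write(aes.IV, 0, aes.IV.Length);
            ms.Write(ciphertext, 0, ciphertext.Length);

            // The MAC covers the IV and the ciphertext, so tampering with
            // either is detected before any decryption is attempted.
            byte[] tag = hmac.ComputeHash(ms.ToArray());
            ms.Write(tag, 0, tag.Length);
            return ms.ToArray(); // IV || ciphertext || tag
        }
    }
}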

Find Length of Stream object in WCF Client?

I have a WCF service which uploads a document using the Stream class.
Now, after this, I want to get the size of the document (the length of the stream) to update the FileSize attribute.
But when I do this, WCF throws an exception saying:
Document Upload Exception: System.NotSupportedException: Specified method is not supported.
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.get_Length()
at eDMRMService.DocumentHandling.UploadDocument(UploadDocumentRequest request)
Can anyone help me solve this?
Now, after this, I want to get the size of the document (the length of the stream) to update the FileSize attribute.
No, don't do that. If you are writing a file, then just write the file. At the simplest:
using (var file = File.Create(path)) {
    source.CopyTo(file);
}
or before 4.0:
using (var file = File.Create(path)) {
    byte[] buffer = new byte[8192];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0) {
        file.Write(buffer, 0, read);
    }
}
(which does not need to know the length in advance)
Note that some WCF options (full message security, etc.) require the entire message to be validated before processing, so they can never truly stream; if the size is huge, I suggest you instead use an API where the client splits the file and sends it in pieces (which you then reassemble at the server).
If the stream doesn't support seeking, you cannot find its length using Stream.Length.
The alternative is to copy the stream to a byte array and track the cumulative length. This involves processing the whole stream first; if you don't want that, you should add a stream-length parameter to your WCF service interface.
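One way to carry that length is a streamed MessageContract where the size travels as a header; a sketch under that assumption (the type and member names are illustrative, echoing the UploadDocumentRequest from the stack trace):
using System.IO;
using System.ServiceModel;

// The length rides in a message header, so the server knows the file size
// up front while the body stays a streamed payload. WCF streaming requires
// the Stream to be the only body member, which is exactly why the length
// has to travel as a header.
[MessageContract]
public class UploadDocumentRequest
{
    [MessageHeader]
    public long FileSize;

    [MessageBodyMember]
    public Stream DocumentData;
}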

C# Reading 'Zip' files with FileStream

I have written a program that establishes a network connection with a remote computer using TcpClient. I am using it to transfer files in 100 KB chunks to a remote .NET application, which in turn writes them to the hard drive. All file transfers work fine except ZIP files - it is curious to note that the reassembled file is always 98 KB. Is there some dark secret to ZIP files that prevents them from being handled in this manner? Again, all other file transfers work fine: image, xls, txt, chm, exe, etc.
Confused
Well, you haven't shown any code, so it's kind of tricky to say exactly what's wrong.
The usual mistake is to assume that Stream.Read reads all the data you ask for, instead of realising that it might read less, with the amount actually read being the return value.
In other words, the code shouldn't be:
byte[] buffer = new byte[input.Length];
input.Read(buffer, 0, buffer.Length);
output.Write(buffer, 0, buffer.Length);
but something like:
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
    output.Write(buffer, 0, bytesRead);
}
But that's just a guess. If you could post some code, we'd have a better chance of figuring it out.
The actual code would be helpful.
Are you using BinaryReader / BinaryWriter (i.e. data-based rather than text-based)?
You could try using a hex file compare (e.g. Beyond Compare) to compare the original and copy and see if that gives you any clues.
It might be that you are overwriting (instead of appending to) the existing file with each chunk received? Therefore the file's final size will be <= the size of one chunk.
But without any code, it's difficult to tell the reason for the problem.
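If the overwrite guess is right, the fix is as simple as opening the file in append mode; a hypothetical illustration (destinationPath, chunk and bytesReceived are stand-ins, not names from the question):
// FileMode.Append extends the existing file, so each received chunk
// lands after the previous one instead of replacing it.
using (var fs = new FileStream(destinationPath, FileMode.Append, FileAccess.Write))
{
    fs.Write(chunk, 0, bytesReceived);
}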
