Trying to understand data transfer in a byte[] array in C#

First, I would like to say that I tried to research what I'm about to ask, but just couldn't find what I was looking for. That said, I thought I would ask some of you crazy smart people to explain this in layman's terms for me as best as possible.
My issue is that I have a perfectly good piece of copy-pasted code to download a file using FtpWebRequest. I'll paste the code below:
public static void DownloadFiles()
{
    FtpWebRequest request = (FtpWebRequest)WebRequest.Create("ftp://****/test.zip");
    request.Credentials = new NetworkCredential("****", "*****");
    request.UseBinary = true;
    request.Method = WebRequestMethods.Ftp.DownloadFile;

    FtpWebResponse response = (FtpWebResponse)request.GetResponse();
    Stream responseStream = response.GetResponseStream();
    FileStream writer = new FileStream("***/test.zip", FileMode.Create);

    int bufferSize = 2048;
    int readCount;
    byte[] buffer = new byte[2048];

    readCount = responseStream.Read(buffer, 0, bufferSize);
    while (readCount > 0)
    {
        writer.Write(buffer, 0, readCount);
        readCount = responseStream.Read(buffer, 0, bufferSize);
    }

    responseStream.Close();
    response.Close();
    writer.Close();
}
Like I said, this works perfectly and downloads the zip file without any issues. What I'm trying to understand, because I think copy-pasted code is a horrible idea, is how this actually functions. The only part that I do not understand is what is actually happening between these points:
int bufferSize = 2048;
int readCount;
byte[] buffer = new byte[2048];

readCount = responseStream.Read(buffer, 0, bufferSize);
while (readCount > 0)
{
    writer.Write(buffer, 0, readCount);
    readCount = responseStream.Read(buffer, 0, bufferSize);
}
If someone would be so kind as to explain what is actually happening in the buffer, why we set it to 2048, and what the while loop is actually doing, it would be very appreciated. Just as a side note, I have gone through the code and put message boxes and other debug output in to try and understand what's going on, but without success. Thank you in advance, and I apologize if this seems very elementary.

Data is read from the stream in 2 KB chunks and written to the file:
int bufferSize = 2048;
int readCount;
byte[] buffer = new byte[2048]; // a buffer is created

// Bytes are read from the stream into the buffer. readCount contains the
// number of bytes actually read; it can be less than 2048 for the last chunk.
// E.g. if there are 3000 bytes of data, the first Read will read 2048 bytes
// and the second will read 952 bytes.
readCount = responseStream.Read(buffer, 0, bufferSize);
while (readCount > 0) // as long as we have read some data
{
    writer.Write(buffer, 0, readCount); // write that many bytes from the buffer to the file
    readCount = responseStream.Read(buffer, 0, bufferSize); // then read the next chunk
}

The code reads the contents of the responseStream into the buffer array and writes it out to the writer. The while loop is necessary because when you call Read() on a stream, there's no guarantee it will read enough bytes to fill the buffer. The stream only guarantees that, if there's anything left to read (in other words, you haven't reached the end of the stream), it will read at least one byte. Whatever it does, it will return the actual number of bytes read. So you have to keep calling Read() until it returns 0, meaning that you've reached the end of the stream. The size of the buffer is arbitrary: you could use a buffer of length 1, but there can be performance benefits to using something around 2-4 KB (due to typical sizes of network packets and disk sectors).
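Since the buffer size only affects how the data is chunked, not the result, the whole loop can also be expressed with Stream.CopyTo (available from .NET 4 on), which runs the same read/write cycle internally. Here is a minimal sketch with using blocks for cleanup; the URL, credentials, and path are placeholders:

public static void DownloadFile()
{
    FtpWebRequest request = (FtpWebRequest)WebRequest.Create("ftp://example.com/test.zip"); // placeholder URL
    request.Credentials = new NetworkCredential("user", "password"); // placeholder credentials
    request.UseBinary = true;
    request.Method = WebRequestMethods.Ftp.DownloadFile;

    using (FtpWebResponse response = (FtpWebResponse)request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (FileStream writer = new FileStream(@"C:\temp\test.zip", FileMode.Create)) // placeholder path
    {
        // Reads up to 4096 bytes at a time until Read() returns 0,
        // writing each chunk to the file; the same loop as above.
        responseStream.CopyTo(writer, 4096);
    }
}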

Related

HttpListenerResponse OutputStream Write is slow

I'm working on an HTTP service, reading a file from disk and writing it to the output stream.
The code that's taking a long time looks like this:
using (FileStream fileStream = fileInfo.OpenRead())
{
    context.Response.ContentType = "application/octet-stream";
    context.Response.ContentLength64 = fileStream.Length;

    byte[] buffer = new byte[10 * 1024];
    int bytesRead = 0;
    while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        context.Response.OutputStream.Write(buffer, 0, bytesRead);
    }
    context.Response.Close();
}
I tried changing the buffer size and came to the conclusion that a larger buffer size gives fewer writes in the loop and a shorter transfer time. I also measured the time for each write operation to complete: it takes ~40 ms per write.
Is there any way to write the data faster? If I read all the bytes first and write them to the output stream in one call, the transfer is instant. However, the files can be large and I don't want to read a whole file into memory.
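Following the asker's own observation that fewer, larger writes are faster, here is a sketch of the same handler with CopyTo and a larger buffer; the 256 KB size is an assumption to benchmark, not a known optimum:

using (FileStream fileStream = fileInfo.OpenRead())
{
    context.Response.ContentType = "application/octet-stream";
    context.Response.ContentLength64 = fileStream.Length;

    // CopyTo runs the same read/write loop internally; a larger buffer
    // means fewer OutputStream.Write calls per file.
    fileStream.CopyTo(context.Response.OutputStream, 256 * 1024);
    context.Response.Close();
}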

Convert a VERY LARGE binary file into a Base64String incrementally

I need help converting a VERY LARGE binary file (a ZIP file) to a Base64String and back again. The files are too large to be loaded into memory all at once (they throw OutOfMemoryExceptions); otherwise this would be a simple task. I do not want to process the contents of the ZIP file individually, I want to process the entire ZIP file.
The problem:
I can convert the entire ZIP file (test sizes vary from 1 MB to 800 MB at present) to Base64String, but when I convert it back, it is corrupted. The new ZIP file is the correct size, it is recognized as a ZIP file by Windows and WinRAR/7-Zip, etc., and I can even look inside the ZIP file and see the contents with the correct sizes/properties, but when I attempt to extract from the ZIP file, I get: "Error: 0x80004005" which is a general error code.
I am not sure where or why the corruption is happening. I have done some investigating, and I have noticed the following:
If you have a large text file, you can convert it to Base64String incrementally without issue. If calling Convert.ToBase64String on the entire file yielded: "abcdefghijklmnopqrstuvwx", then calling it on the file in two pieces would yield: "abcdefghijkl" and "mnopqrstuvwx".
Unfortunately, if the file is binary, the result is different. While the entire file might yield: "abcdefghijklmnopqrstuvwx", trying to process it in two pieces would yield something like: "oiweh87yakgb" and "kyckshfguywp".
Is there a way to incrementally base 64 encode a binary file while avoiding this corruption?
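For context on why the chunk boundaries matter (and why the asker's MultipleOfThree buffer below is the key): Base64 encodes each 3-byte group as 4 output characters, so per-chunk encodings concatenate to the whole-file encoding only when every chunk except the last is a multiple of 3 bytes long. A minimal self-contained sketch:

// Base64 maps each 3-byte group to 4 characters, so fragments concatenate
// cleanly only when every chunk except the last is a multiple of 3 bytes.
byte[] data = { 1, 2, 3, 4, 5, 6 };

string whole   = Convert.ToBase64String(data);       // "AQIDBAUG"
string chunk3a = Convert.ToBase64String(data, 0, 3); // "AQID"
string chunk3b = Convert.ToBase64String(data, 3, 3); // "BAUG"
Console.WriteLine(whole == chunk3a + chunk3b);       // True

string chunk4a = Convert.ToBase64String(data, 0, 4); // "AQIDBA==" - padding appears mid-stream
string chunk4b = Convert.ToBase64String(data, 4, 2); // "BQY="
Console.WriteLine(whole == chunk4a + chunk4b);       // False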
My code:
private void ConvertLargeFile()
{
    FileStream inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read);
    byte[] buffer = new byte[MultipleOfThree];
    int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
    while (bytesRead > 0)
    {
        byte[] secondaryBuffer = new byte[buffer.Length];
        int secondaryBufferBytesRead = bytesRead;
        Array.Copy(buffer, secondaryBuffer, buffer.Length);
        bool isFinalChunk = false;
        Array.Clear(buffer, 0, buffer.Length);
        bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
        {
            isFinalChunk = true;
            buffer = new byte[secondaryBufferBytesRead];
            Array.Copy(secondaryBuffer, buffer, buffer.Length);
        }
        String base64String = Convert.ToBase64String(isFinalChunk ? buffer : secondaryBuffer);
        File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
    }
    inputStream.Dispose();
}
The decoding is more of the same. I use the size of the base64String variable above (which varies depending on the original buffer size that I test with) as the buffer size for decoding. Then, instead of Convert.ToBase64String(), I call Convert.FromBase64String() and write to a different file name/path.
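A sketch of the decode loop the asker describes; the file paths and chunk size are illustrative. ReadBlock is used so each chunk is a full multiple of 4 Base64 characters (every 4 characters decode back to 3 bytes):

// Chunked decode: the chunk size must be a multiple of 4 characters.
// ReadBlock fills the buffer fully except on the final chunk.
int decodeChunkSize = 4096; // multiple of 4
using (var reader = new StreamReader("C:\\Users\\test\\Desktop\\Base64Zip"))
using (var output = new FileStream("C:\\Users\\test\\Desktop\\restored.zip", FileMode.Create))
{
    char[] chars = new char[decodeChunkSize];
    int charsRead;
    while ((charsRead = reader.ReadBlock(chars, 0, chars.Length)) > 0)
    {
        byte[] decoded = Convert.FromBase64String(new string(chars, 0, charsRead));
        output.Write(decoded, 0, decoded.Length);
    }
}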
EDIT:
In my haste to reduce the code (I refactored it into a new project, separate from other processing, to eliminate code that isn't central to the issue) I introduced a bug. The Base64 conversion should be performed on secondaryBuffer for all iterations save the last (identified by isFinalChunk), when buffer should be used. I have corrected the code above.
EDIT #2:
Thank you all for your comments/feedback. After correcting the bug (see the above edit), I re-tested my code, and it is actually working now. I intend to test and implement rene's solution, as it appears to be the best, but I thought I should let everyone know of my discovery as well.
Based on the code shown in the blog post by Wiktor Zychla, the following code works. The same solution is indicated in the Remarks section of Convert.ToBase64String, as pointed out by Ivan Stoev.
// using System.Security.Cryptography
private void ConvertLargeFile()
{
    // encode
    var filein = @"C:\Users\test\Desktop\my.zip";
    var fileout = @"C:\Users\test\Desktop\Base64Zip";
    using (FileStream fs = File.Open(fileout, FileMode.Create))
    using (var cs = new CryptoStream(fs, new ToBase64Transform(),
                                     CryptoStreamMode.Write))
    using (var fi = File.Open(filein, FileMode.Open))
    {
        fi.CopyTo(cs);
    }
    // the zip file is now stored in base64zip

    // and decode
    using (FileStream f64 = File.Open(fileout, FileMode.Open))
    using (var cs = new CryptoStream(f64, new FromBase64Transform(),
                                     CryptoStreamMode.Read))
    using (var fo = File.Open(filein + ".orig", FileMode.Create))
    {
        cs.CopyTo(fo);
    }
    // the original file is in my.zip.orig
    // use the command-line tool
    //   fc my.zip my.zip.orig
    // to verify that the start file and the encoded-and-decoded file
    // are the same
}
The code uses standard classes from the System.Security.Cryptography namespace: a CryptoStream together with ToBase64Transform and its counterpart FromBase64Transform.
You can avoid using a secondary buffer by passing offset and length to Convert.ToBase64String, like this:
private void ConvertLargeFile()
{
    using (var inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[MultipleOfThree];
        int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            String base64String = Convert.ToBase64String(buffer, 0, bytesRead);
            File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
            bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        }
    }
}
The above should work, but I think Rene's answer is actually the better solution.
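As a side note on this version's design: File.AppendAllText opens and closes the destination file on every chunk. A sketch of the same loop holding a single StreamWriter open instead (same assumed paths and MultipleOfThree constant as above):

using (var inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read))
using (var writer = new StreamWriter("C:\\Users\\test\\Desktop\\Base64Zip", false)) // false = overwrite
{
    byte[] buffer = new byte[MultipleOfThree];
    int bytesRead;
    while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // one open file handle for the whole loop
        writer.Write(Convert.ToBase64String(buffer, 0, bytesRead));
    }
}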
Use this code:
public void ConvertLargeFile(string source, string destination)
{
    using (FileStream inputStream = new FileStream(source, FileMode.Open, FileAccess.Read))
    {
        int buffer_size = 30000; // or any multiple of 3
        byte[] buffer = new byte[buffer_size];
        int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            byte[] buffer2 = buffer;
            if (bytesRead < buffer_size)
            {
                buffer2 = new byte[bytesRead];
                Buffer.BlockCopy(buffer, 0, buffer2, 0, bytesRead);
            }
            string base64String = System.Convert.ToBase64String(buffer2);
            File.AppendAllText(destination, base64String);
            bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        }
    }
}

FileStream returns Length = 0

I'm trying to read a local file and upload it to an FTP server. When I read an image file, everything is OK, but when I read a doc or docx file, the FileStream returns Length = 0. Here is my code:
I checked with some other files; it appears that it only works with images and returns 0 for any other file.
if (!ftpClient.FileExists(fileName))
{
    try
    {
        ftpClient.ValidateCertificate += (control, e) => { e.Accept = true; };
        const int BUFFER_SIZE = 64 * 1024; // 64 KB buffer
        byte[] buffer = new byte[BUFFER_SIZE];
        using (Stream readStream = new FileStream(tempFilePath, FileMode.Open, FileAccess.Read))
        using (Stream writeStream = ftpClient.OpenWrite(fileName))
        {
            while (readStream.Position < readStream.Length)
            {
                buffer.Initialize();
                int bytesRead = readStream.Read(buffer, 0, BUFFER_SIZE);
                writeStream.Write(buffer, 0, bytesRead);
            }
            readStream.Flush();
            readStream.Close();
            writeStream.Flush();
            writeStream.Close();
            DeleteTempFile(tempFilePath);
            return true;
        }
    }
    catch (Exception ex)
    {
        return false;
    }
}
I couldn't find what's wrong with it. Could you please help me?
While this doesn't answer your specific question, you don't actually need to know the length of your stream. Just keep reading until you hit a zero-length read. A zero-byte read is guaranteed to indicate the end of any stream.
Return Value
Type: System.Int32
The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
while (true)
{
    int bytesRead = readStream.Read(buffer, 0, BUFFER_SIZE);
    if (bytesRead == 0)
    {
        break;
    }
    writeStream.Write(buffer, 0, bytesRead);
}
Alternatively:
readStream.CopyTo(writeStream);
is probably the most concise method of stating your goal...
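In context, a sketch of the upload body rewritten that way (assuming the same ftpClient, fileName, and tempFilePath as in the question):

using (Stream readStream = new FileStream(tempFilePath, FileMode.Open, FileAccess.Read))
using (Stream writeStream = ftpClient.OpenWrite(fileName))
{
    // CopyTo loops until Read() returns 0, so the stream's Length is never consulted
    readStream.CopyTo(writeStream);
}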
It was just a silly mistake: I have two file uploads and I saved the other one, which created a zero-length file. As it turns out, the code works fine.
Thanks, everyone.

how to buffer an input stream until it is complete

I'm implementing a WCF service that accepts image streams. However, I'm currently getting an exception when I run it, because it tries to get the length of the stream before the stream is complete. What I'd like to do is buffer the stream until it's complete, but I can't find any examples of how to do this.
Can anyone help?
My code so far:
public String uploadUserImage(Stream stream)
{
    Stream fs = stream;
    BinaryReader br = new BinaryReader(fs);
    Byte[] bytes = br.ReadBytes((Int32)fs.Length); // this causes exception
    File.WriteAllBytes(filepath, bytes);
}
Rather than trying to fetch the length, you should read from the stream until it returns that it's "done". In .NET 4, this is really easy:

// Assuming we *really* want to read it into memory first...
MemoryStream memoryStream = new MemoryStream();
stream.CopyTo(memoryStream);
memoryStream.Position = 0;
File.WriteAllBytes(filepath, memoryStream.ToArray()); // WriteAllBytes takes a byte[]
In .NET 3.5 there's no CopyTo method, but you can write something similar yourself:
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
However, now that we've got something to copy a stream, why bother reading it all into memory first? Let's just write it straight to a file:
using (FileStream output = File.OpenWrite(filepath))
{
    CopyStream(stream, output); // Or stream.CopyTo(output);
}
I'm not sure what you are returning (or not returning), but something like this might work for you:
public String uploadUserImage(Stream stream) {
    const int KB = 1024;
    Byte[] bytes = new Byte[KB];
    StringBuilder sb = new StringBuilder();
    using (BinaryReader br = new BinaryReader(stream)) {
        int len;
        do {
            len = br.Read(bytes, 0, KB);
            // only decode the bytes actually read, not the whole buffer
            string readData = Encoding.UTF8.GetString(bytes, 0, len);
            sb.Append(readData);
        } while (len == KB);
    }
    //File.WriteAllBytes(filepath, bytes);
    return sb.ToString();
}
A string can hold up to 2 GB, I believe.
Try this:
using (FileStream fs = File.Create(filepath))
{
    stream.CopyTo(fs);
}
Jon Skeet's answer for .NET 3.5 and below, using a buffered read, is actually done incorrectly.
The buffer isn't cleared between reads, which can result in issues on any read that returns less than 8192 bytes: for example, if the second read read 192 bytes, the last 8000 bytes from the first read would still be in the buffer, which would then be returned to the stream.
With my code below, you supply it a Stream and it will return an IEnumerable<byte[]>.
Using this, you can foreach over it and write each chunk to a MemoryStream, then use .GetBuffer() to end up with a compiled, merged byte[].
private IEnumerable<byte[]> ReadFullStream(Stream stream) {
    while (true) {
        // since this is created every loop, its buffer is cleared
        byte[] buffer = new byte[8192];
        // read up to 8192 bytes into the buffer
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0) { // if we read nothing, the stream is finished
            break;
        }
        // if we read LESS than 8192 bytes, resize the buffer to remove everything
        // after what was read; otherwise you will have null bytes (0x00) at the
        // end of your buffer
        if (bytesRead < buffer.Length) {
            Array.Resize(ref buffer, bytesRead);
        }
        yield return buffer; // yield return the buffer data
    } // loop here until we reach a read == 0 (end of stream)
}
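A usage sketch for the helper above (the variable names are illustrative). Note that MemoryStream.ToArray() copies exactly the bytes written, whereas GetBuffer() returns the underlying buffer, which can be larger than the data:

byte[] merged;
using (var ms = new MemoryStream())
{
    foreach (byte[] chunk in ReadFullStream(stream))
    {
        ms.Write(chunk, 0, chunk.Length);
    }
    merged = ms.ToArray(); // exact-length copy of everything read
}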

Why does FileStream feed FtpWebRequest but not MemoryStream?

I'm using the msdn example here:
http://msdn.microsoft.com/en-us/library/system.net.ftpwebrequest.aspx
I changed the FileStream to a MemoryStream and it doesn't read the bytes;
when I change it back to a FileStream, it works fine.
Any clue?
Thanks
CompressMemoryStream();
Stream requestStream = _request.EndGetRequestStream(ar);

const int bufferLength = 2048;
byte[] buffer = new byte[bufferLength];
int count = 0;
int readBytes = 0;
do
{
    // MemoryStream _compressedOutStream
    // is created/filled by CompressMemoryStream()
    readBytes = _compressedOutStream.Read(buffer, 0, bufferLength);
    requestStream.Write(buffer, 0, readBytes);
    count += readBytes;
}
while (readBytes != 0);
requestStream.Close();

state.Request.BeginGetResponse(
    new AsyncCallback(EndGetResponseCallback),
    state
);
What's the value of readBytes on the first iteration through the loop?
My first guess would be that you made the same mistake that I often make: writing to a stream, then forgetting to rewind it to the beginning before starting to read back from it. If that's the case, then readBytes will be zero on the first (and only) loop iteration, because you're at the end of the stream -- there's nothing to read.
Try setting stream.Position = 0 before you start reading.
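A minimal sketch of that fix applied to the code above (assuming CompressMemoryStream() leaves _compressedOutStream positioned at its end after writing):

CompressMemoryStream();
_compressedOutStream.Position = 0; // rewind; otherwise the first Read() returns 0
Stream requestStream = _request.EndGetRequestStream(ar);
// ... same read/write loop as above ...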
