C# Reading 'Zip' files with FileStream

I have written a program that establishes a network connection with a remote computer using TcpClient. I am using it to transfer files in 100k chunks to a remote .NET application, which in turn writes them to the hard drive. All file transfers work fine except ZIP files - it is curious to note that the reassembled file is always 98K. Is there some dark secret to ZIP files that prevents them from being handled in this manner? Again, all other file transfers work fine: image, xls, txt, chm, exe, etc.
Confused

Well, you haven't shown any code so it's kinda tricky to say exactly what's wrong.
The usual mistake is to assume that Stream.Read reads all the data you ask it to instead of realising that it might read less, but that the amount it actually read is the return value.
In other words, the code shouldn't be:
byte[] buffer = new byte[input.Length];
input.Read(buffer, 0, buffer.Length);
output.Write(buffer, 0, buffer.Length);
but something like:
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
    output.Write(buffer, 0, bytesRead);
}
But that's just a guess. If you could post some code, we'd have a better chance of figuring it out.

The actual code would be helpful.
Are you using BinaryReader / BinaryWriter?
(i.e. data-based rather than text-based).
You could try using a hex file compare (e.g. Beyond Compare) to compare the original and copy and see if that gives you any clues.

It might be that you are overwriting the existing file (instead of appending to it) with each chunk received. In that case, the file's final size will be <= the size of one chunk.
But without any code, it's difficult to tell the reason for the problem.
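If overwriting is indeed the culprit, a minimal sketch of the fix (illustrative names only; destinationPath, chunk and chunkLength stand in for whatever the receiving code already uses) is to open the destination with FileMode.Append for each chunk, so new data is added to the end instead of replacing what is already there:
// Illustrative sketch only: append each received chunk instead of recreating the file.
using (var output = new FileStream(destinationPath, FileMode.Append, FileAccess.Write))
{
    output.Write(chunk, 0, chunkLength); // chunkLength = bytes actually read from the network
}
Alternatively, open the file once with FileMode.Create before the receive loop and keep that single stream open until the whole transfer is finished.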

Related

Most efficient way to transmit file over TCP

I'm currently transmitting files by gzipping them and then converting them to a Base64 string. It's working well enough, however I'd like to make it more efficient if possible, as I'm sure this is not the best way to do it given the 33% size increase from Base64.
The two other options I'm considering are directly reading and writing bytes, or serializing the object and sending it.
What would be the best way to do this in terms of space? (I'm trying to keep the size of the file as small as possible.) The files are relatively small, around 100 KB. I'd appreciate any insight.
If you don't want to send the length first, you could use this method - after you have acquired the NetworkStream object from the connection - to read all the data from the stream. Again, there is no need for Base64 in your case, so this solution simply reads the byte array it receives from the sending side via the NetworkStream.
public static byte[] ReadFully(Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
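For completeness, the sending side (not shown in the question) could just write the raw file bytes to the same NetworkStream, with no gzip-plus-Base64 step at all. A rough sketch, assuming an already connected TcpClient named client and a source path filePath (both illustrative):
// Sketch of the sending side: stream the file as raw bytes over the socket.
// Needs System.IO and System.Net.Sockets.
using (NetworkStream netStream = client.GetStream())
using (FileStream file = File.OpenRead(filePath))
{
    byte[] buffer = new byte[16 * 1024];
    int read;
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        netStream.Write(buffer, 0, read);
    }
}
Closing the connection when the file is done is what makes the ReadFully loop above return.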

TCP download directly to storage

I've written a program that was initially intended for very basic text communication over the internet using the .NET TcpClient class in C#. I decided to try setting up a procedure to read a file from one computer, break it up into smaller pieces which are each sent to the receiving computer, and have it reassembled and saved there. Essentially a file transfer.
I then realized that all the data I'm transferring is going into the memory of the receiving computer and then onto the storage in the next step. I am now wondering, is this the best way to do it? If data can be transferred and immediately written to the storage location where it's headed (bypassing the RAM step), is this the way a program like Google Chrome would handle downloads? Or are there usually important reasons for the data to be stored in memory first?
By the way, for clarity, let's all agree that "storage" would be like a hard drive and "memory" refers to RAM. Thanks.
The way it is usually done is: you open a FileStream, read data into a byte[] from the TcpClient, and write the number of bytes read from the NetworkStream to the FileStream.
Here is a pseudo example:
TcpClient tcp; // assumed to be already connected
// FileMode.Create so the destination file is created (or truncated) before writing
FileStream fileStream = File.Open("WHERE_TO_SAVE", FileMode.Create, FileAccess.Write);
NetworkStream tcpStream = tcp.GetStream();
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = tcpStream.Read(buffer, 0, buffer.Length)) > 0)
{
    fileStream.Write(buffer, 0, bytesRead); // write only the bytes actually read
}
tcpStream.Dispose();
fileStream.Dispose();

Find Length of Stream object in WCF Client?

I have a WCF service which uploads the document using the Stream class.
Now, after this, I want to get the size of the document (the length of the stream) to update the FileSize file attribute.
But on doing this, WCF throws an exception saying:
Document Upload Exception: System.NotSupportedException: Specified method is not supported.
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.get_Length()
at eDMRMService.DocumentHandling.UploadDocument(UploadDocumentRequest request)
Can anyone help me in solving this?
Now, after this, I want to get the size of the document (the length of the stream) to update the FileSize file attribute.
No, don't do that. If you are writing a file, then just write the file. At the simplest:
using (var file = File.Create(path)) {
    source.CopyTo(file);
}
or before 4.0:
using (var file = File.Create(path)) {
    byte[] buffer = new byte[8192];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0) {
        file.Write(buffer, 0, read);
    }
}
(which does not need to know the length in advance)
Note that some WCF options (full message security, etc.) require the entire message to be validated before processing, so they can never truly stream. If the size is huge, I suggest you instead use an API where the client splits the file and sends it in pieces (which you then reassemble at the server).
If the stream doesn't support seeking, you cannot find its length using Stream.Length.
The alternative is to copy the stream to a byte array and find its cumulative length. This involves processing the whole stream first; if you don't want that, you should add a stream-length parameter to your WCF service interface.
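As a hedged illustration of that alternative, you can count the bytes as you copy the incoming stream to disk and use the running total for the FileSize attribute afterwards (source and path as in the snippets above):
// Illustrative sketch: copy the WCF request stream to a file, counting bytes as we go.
long totalBytes = 0;
using (var file = File.Create(path))
{
    byte[] buffer = new byte[8192];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        file.Write(buffer, 0, read);
        totalBytes += read;
    }
}
// totalBytes now holds the document size and can be used to update FileSize.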

Joining Binary files that have been split via download

I am trying to join a number of binary files that were split during download. The requirement stemmed from the project http://asproxy.sourceforge.net/. In this project the author allows you to download files by providing a URL.
The problem comes in where my server does not have enough memory to keep a file larger than 20 MB in memory. So to solve this problem I modified the code to not download files larger than 10 MB; if the file is larger, it only lets the user download the first 10 MB. The user must then continue the download and hopefully get the second 10 MB. Now I have got all this working, except when the user needs to join the files they downloaded I end up with corrupt files; as far as I can tell, something is either being added or removed during the download.
I am currently joining the files together by reading all the files and then writing them to one file. This should work since I am reading and writing in bytes. The code I used to join the files is listed here: http://www.geekpedia.com/tutorial201_Splitting-and-joining-files-using-C.html
I do not have the exact code with me at the moment; as soon as I am home I will post the exact code if anyone is willing to help out.
Please let me know if I am missing anything, or if there is a better way to do this, i.e. what could I use as an alternative to a MemoryStream. The source code for the original project which I made changes to can be found here: http://asproxy.sourceforge.net/download.html (it should be noted I am using version 5.0). The file I modified is called WebDataCore.cs, and I modified line 606 to only read until 10 MB of data had been loaded, then continue execution.
Let me know if there is anything I missed.
Thanks
You shouldn't split for memory reasons... the reason to split is usually to avoid having to re-download everything in case of failure. If memory is an issue, you are doing it wrong... you shouldn't be buffering in memory, for example.
The easiest way to download a file is simply:
using (WebClient client = new WebClient()) {
    client.DownloadFile(remoteUrl, localPath);
}
Re your split/join code - again, the problem is that you are buffering everything in memory; File.ReadAllBytes is a bad thing unless you know you have small files. What you should have is something like:
byte[] buffer = new byte[8192]; // why not...
int read;
while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
{
    outStream.Write(buffer, 0, read);
}
This uses a moderate buffer to pump data between the two as a stream, which is a lot more efficient. The loop says:
try to read some data (at most, the buffer size)
(this will read at least 1 byte, or we have reached the end of the stream)
if we read something, write that many bytes from the buffer to the output
In the end I found that by using an FTP request I was able to get around the memory issue and the file is saved correctly.
Thanks for all the help
That example loads each entire chunk into memory; instead you could do something like this:
int bufSize = 1024 * 32;
byte[] buffer = new byte[bufSize];
using (FileStream outputFile = new FileStream(OutputFileName, FileMode.OpenOrCreate,
    FileAccess.Write, FileShare.None, bufSize))
{
    foreach (string inputFileName in inputFiles)
    {
        // Open each chunk for reading and pump it into the output file.
        using (FileStream inputFile = new FileStream(inputFileName, FileMode.Open,
            FileAccess.Read, FileShare.Read, buffer.Length))
        {
            int bytesRead = 0;
            while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) != 0)
            {
                outputFile.Write(buffer, 0, bytesRead);
            }
        }
    }
}
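One detail the snippet above leaves open is how inputFiles gets built; the chunks have to be joined in the order they were downloaded. Assuming (purely for illustration) that the chunks were saved as "file.part01", "file.part02", ... with zero-padded numbers in a single folder, something like this would do:
// Assumption: chunk file names sort lexically into download order (e.g. zero-padded part numbers).
string[] inputFiles = Directory.GetFiles(chunkFolder, "file.part*");
Array.Sort(inputFiles, StringComparer.OrdinalIgnoreCase);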

Copy a file without using the Windows file cache

Anybody know of a way to copy a file from path A to path B and suppressing the Windows file system cache?
Typical use is copying a large file from a USB drive, or server to your local machine. Windows seems to swap everything out if the file is really big, e.g. 2GiB.
I'd prefer an example in C#, but I'm guessing this would require a Win32 call of some sort, if it's possible at all.
In C# I have found something like this to work; it can be changed to copy directly to the destination file:
public static byte[] ReadAllBytesUnbuffered(string filePath)
{
    const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
    var fileInfo = new FileInfo(filePath);
    long fileLength = fileInfo.Length;
    int bufferSize = (int)Math.Min(fileLength, int.MaxValue / 2);
    // Round the buffer size up to the next multiple of 1024 for the unbuffered read.
    bufferSize += ((bufferSize + 1023) & ~1023) - bufferSize;
    using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.None,
        bufferSize, FileFlagNoBuffering | FileOptions.SequentialScan))
    {
        long length = stream.Length;
        if (length > 0x7fffffffL)
        {
            throw new IOException("File too long (over 2 GB).");
        }
        int offset = 0;
        int count = (int)length;
        var buffer = new byte[count];
        while (count > 0)
        {
            int bytesRead = stream.Read(buffer, offset, count);
            if (bytesRead == 0)
            {
                throw new EndOfStreamException("Read beyond end of file (EOF).");
            }
            offset += bytesRead;
            count -= bytesRead;
        }
        return buffer;
    }
}
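As the answer says, the same idea can be turned into a direct unbuffered copy instead of reading everything into memory. A rough sketch (not production code: the method name is made up, and the buffer is kept at a multiple of 4 KB because FILE_FLAG_NO_BUFFERING expects sector-aligned transfer sizes) might look like this:
// Sketch: unbuffered sequential read of the source, write-through write of the destination,
// so neither file lingers in the Windows file cache.
public static void CopyFileUnbuffered(string sourcePath, string destPath)
{
    const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
    byte[] buffer = new byte[1024 * 1024]; // 1 MiB, a multiple of the sector size
    using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
        FileShare.Read, buffer.Length, FileFlagNoBuffering | FileOptions.SequentialScan))
    using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write,
        FileShare.None, buffer.Length, FileOptions.WriteThrough))
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }
}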
Even more important, there are FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING.
MSDN has a nice article on them both: http://support.microsoft.com/kb/99794
I am not sure if this helps, but take a look at Increased Performance Using FILE_FLAG_SEQUENTIAL_SCAN.
SUMMARY
There is a flag for CreateFile() called FILE_FLAG_SEQUENTIAL_SCAN which will direct the Cache Manager to access the file sequentially.
Anyone reading potentially large files with sequential access can specify this flag for increased performance. This flag is useful if you are reading files that are "mostly" sequential, but you occasionally skip over small ranges of bytes.
If you don't mind using a tool, ESEUTIL worked great for me.
You can check out this blog entry comparing buffered and non-buffered IO functions, and where to get ESEUTIL.
Copying some text from the TechNet blog:
So looking at the definition of buffered I/O above, we can see where the perceived performance problems lie - in the file system cache overhead. Unbuffered I/O (or a raw file copy) is preferred when attempting to copy a large file from one location to another when we do not intend to access the source file after the copy is complete. This will avoid the file system cache overhead and prevent the file system cache from being effectively flushed by the large file data. Many applications accomplish this by calling CreateFile() to create an empty destination file, then using the ReadFile() and WriteFile() functions to transfer the data.
CreateFile() - The CreateFile function creates or opens a file, file stream, directory, physical disk, volume, console buffer, tape drive, communications resource, mailslot, or named pipe. The function returns a handle that can be used to access an object.
ReadFile() - The ReadFile function reads data from a file, and starts at the position that the file pointer indicates. You can use this function for both synchronous and asynchronous operations.
WriteFile() - The WriteFile function writes data to a file at the position specified by the file pointer. This function is designed for both synchronous and asynchronous operation.
For copying files around the network that are very large, my copy utility of choice is ESEUTIL which is one of the database utilities provided with Exchange.
Eseutil is a correct answer; also, since Windows 7 / Server 2008 R2, you can use the /J switch in Xcopy, which has the same effect.
I understand this question was asked 11 years ago; nowadays there is Robocopy, which is sort of a replacement for Xcopy.
You need to check the /J option:
/J :: copy using unbuffered I/O (recommended for large files)
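For example (illustrative paths):
robocopy \\server\share C:\LocalCopy bigfile.vhdx /J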
