Joining binary files that have been split during download - C#

I am trying to join a number of binary files that were split during download. The requirement stemmed from the project http://asproxy.sourceforge.net/. In this project the author allows you to download files by providing a URL.
The problem is that my server does not have enough memory to keep a file larger than 20 MB in memory. To work around this I modified the code to not download files larger than 10 MB; if the file is larger, the user downloads the first 10 MB, then continues the download and hopefully gets the next 10 MB. I have all of this working, except that when the user joins the files they downloaded I end up with corrupt files; as far as I can tell, something is either being added or removed during the download.
I currently join the files by reading all the files and then writing them out to one file. This should work, since I am reading and writing in bytes. The code I used to join the files is listed here: http://www.geekpedia.com/tutorial201_Splitting-and-joining-files-using-C.html
I do not have the exact code with me at the moment; as soon as I am home I will post it if anyone is willing to help out.
Please let me know if I am missing anything, or if there is a better way to do this, i.e. what could I use as an alternative to a MemoryStream? The source code for the original project which I modified can be found here: http://asproxy.sourceforge.net/download.html (note that I am using version 5.0). The file I modified is WebDataCore.cs, and I changed line 606 to only read until 10 MB of data has been loaded and then continue execution.
Let me know if there is anything I missed.
Thanks

You shouldn't split for memory reasons... the reason to split is usually to avoid having to re-download everything in case of failure. If memory is an issue, you are doing it wrong... you shouldn't be buffering in memory, for example.
The easiest way to download a file is simply:
using (WebClient client = new WebClient()) {
    client.DownloadFile(remoteUrl, localPath);
}
Re your split/join code - again, the problem is that you are buffering everything in memory; File.ReadAllBytes is a bad thing unless you know you have small files. What you should have is something like:
byte[] buffer = new byte[8192]; // why not...
int read;
while((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
{
    outStream.Write(buffer, 0, read);
}
This uses a moderate buffer to pump data between the two as a stream. A lot more efficient. The loop says:
try to read some data (at most, the buffer size)
(this will read at least 1 byte, unless we have reached the end of the stream)
if we read something, write that many bytes from the buffer to the output
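Put together, a minimal join sketch using that loop might look like this (the part-file names and output path are hypothetical placeholders, not the asproxy code):
// Hypothetical part-file names; substitute however the downloaded pieces were saved.
string[] parts = { "bigfile.part1", "bigfile.part2", "bigfile.part3" };
byte[] buffer = new byte[8192];
using (FileStream outStream = File.Create("bigfile.joined"))
{
    foreach (string part in parts)
    {
        using (FileStream inStream = File.OpenRead(part))
        {
            int read;
            while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Write only the bytes actually read on this pass.
                outStream.Write(buffer, 0, read);
            }
        }
    }
}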

In the end I found that by using an FTP request I was able to get around the memory issue, and the file is saved correctly.
Thanks for all the help
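For reference, a minimal sketch of what the FTP route can look like, streaming the response straight to disk so nothing larger than the buffer sits in memory (the URL, credentials and paths below are placeholders, not the actual code used):
// Placeholder URL, credentials and local path.
FtpWebRequest request = (FtpWebRequest)WebRequest.Create("ftp://example.com/bigfile.bin");
request.Method = WebRequestMethods.Ftp.DownloadFile;
request.Credentials = new NetworkCredential("user", "password");
using (FtpWebResponse response = (FtpWebResponse)request.GetResponse())
using (Stream ftpStream = response.GetResponseStream())
using (FileStream fileStream = File.Create(@"C:\temp\bigfile.bin"))
{
    byte[] buffer = new byte[8192];
    int read;
    while ((read = ftpStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Only one buffer's worth of data is held in memory at a time.
        fileStream.Write(buffer, 0, read);
    }
}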

That example loads each entire chunk into memory; instead you could do something like this:
int bufSize = 1024 * 32;
byte[] buffer = new byte[bufSize];
using (FileStream outputFile = new FileStream(OutputFileName, FileMode.OpenOrCreate,
    FileAccess.Write, FileShare.None, bufSize))
{
    foreach (string inputFileName in inputFiles)
    {
        // Open each part for reading (not Append/Write, which would fail on Read).
        using (FileStream inputFile = new FileStream(inputFileName, FileMode.Open,
            FileAccess.Read, FileShare.Read, buffer.Length))
        {
            int bytesRead;
            while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) != 0)
            {
                outputFile.Write(buffer, 0, bytesRead);
            }
        }
    }
}
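One more thing worth checking when the joined file comes out corrupt is the order of inputFiles. As a hedged sketch, assuming a hypothetical naming scheme of video.avi.part1, video.avi.part2, ..., the parts can be sorted numerically so that part10 does not sort before part2:
// Hypothetical part naming and folder; requires System.Linq.
List<string> inputFiles = Directory.GetFiles(@"C:\downloads", "video.avi.part*")
    .OrderBy(path =>
    {
        // Take the digits after the final "part" and sort numerically.
        string suffix = path.Substring(path.LastIndexOf("part") + "part".Length);
        return int.Parse(suffix);
    })
    .ToList();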

Related

Zipping the same file using C# produces output zips of different sizes on the server and Windows 10

There is one DLL, and if we zip it on Windows 10 and Windows Server 2012 using the same code, it produces different sizes. The size difference is exactly 5 bytes. The C# code is:
private static void Zip()
{
    var fileInfo = new FileInfo(@"C:\USB\adammigrate.dll");
    using (Stream stream = File.Open(@"C:\USB\adammigrate.dll", FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var zipArchive = ZipFile.Open(@"C:\USB\adammigrate1.zip", ZipArchiveMode.Create))
    {
        var entry = zipArchive.CreateEntry("adammigrate.dll", CompressionLevel.NoCompression);
        entry.LastWriteTime = fileInfo.LastWriteTime.ToUniversalTime();
        using (Stream stream2 = entry.Open())
        {
            var buffer = new byte[BufferSize];
            int numBytesRead;
            while ((numBytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                stream2.Write(buffer, 0, numBytesRead);
            }
        }
    }
}
The size on disk is the same, but the actual size differs by 5 bytes. Please find attached the zip info for both zip files.
The zip format is not guaranteed to have a single representation; it can be different sizes depending on specifics of the implementation (i.e. multiple encoding options are available, and it may or may not choose the absolute best one). The only important question is: does it unzip back to the original content? If it fails to unzip, then it is interesting. But being different sizes by itself doesn't mean anything.
If you want an absolute guarantee of getting the exact same zip output, you'll have to find an implementation that offers output stability as a documented feature; the implementation you're using clearly doesn't offer that. Usually zip tools want to retain the ability to quietly improve their choices between versions for your benefit (and sometimes a change intended to make things better in general has the side effect of making a particular case worse).
If you were anticipating always getting the exact same bytes back: then zip is probably not an appropriate file format for you, unless you only compare the sizes and contents of the internal payloads (i.e. what they would be once decompressed), not the zip file itself.
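If you want to verify that both archives still round-trip to the same payload, here is a minimal sketch comparing the decompressed entry contents (the helper method and paths are my own illustration, not part of the question):
// Requires System.IO.Compression, System.Linq and System.Security.Cryptography.
static bool EntriesMatch(string zipPathA, string zipPathB, string entryName)
{
    using (var archiveA = ZipFile.OpenRead(zipPathA))
    using (var archiveB = ZipFile.OpenRead(zipPathB))
    using (var streamA = archiveA.GetEntry(entryName).Open())
    using (var streamB = archiveB.GetEntry(entryName).Open())
    using (var sha = SHA256.Create())
    {
        // Hash the decompressed payloads; if these match, both archives round-trip to the same content.
        byte[] hashA = sha.ComputeHash(streamA);
        byte[] hashB = sha.ComputeHash(streamB);
        return hashA.SequenceEqual(hashB);
    }
}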

Copy (multiple) files to multiple locations

Using C# (.NET 4.5) I want to copy a set of files to multiple locations (e.g. the contents of a folder to 2 USB drives attached to the computer).
Is there a more efficient way of doing that than just using foreach loops and File.Copy?
Working towards a (possible) solution.
My first thought was some kind of multi-threaded approach. After some reading and research I discovered that just blindly setting up some kind of parallel and/or async process is not a good idea when it comes to IO (as per Why is Parallel.ForEach much faster then AsParallel().ForAll() even though MSDN suggests otherwise?).
The bottleneck is the disk, especially if it's a traditional drive, as it can only read/write synchronously. That got me thinking, what if I read it once then output it in multiple locations? After all, in my USB drive scenario I'm dealing with multiple (output) disks.
I'm having trouble figuring out how to do that though. One idea I saw (Copy same file from multiple threads to multiple destinations) was to just read all the bytes of each file into memory, then loop through the destinations and write out the bytes to each location before moving on to the next file. It seems that's a bad idea if the files might be large. Some of the files I'll be copying will be videos and could be 1 GB (or more). I can't imagine it's a good idea to load a 1 GB file into memory just to copy it to another disk?
So, allowing flexibility for larger files, the closest I've gotten is below (based off How to copy one file to many locations simultaneously). The problem with this code is that I've still not got a single read and multi-write happening. It's currently multi-read and multi-write. Is there a way to further optimise this code? Could I read chunks into memory then write that chunk to each destination before moving onto the next chunk (like the idea above but chunked files instead of whole)?
files.ForEach(fileDetail =>
    Parallel.ForEach(fileDetail.DestinationPaths, new ParallelOptions(),
        destinationPath =>
        {
            using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var destination = new FileStream(destinationPath, FileMode.Create))
            {
                var buffer = new byte[1024];
                int read;
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.Write(buffer, 0, read);
                }
            }
        }));
I thought I'd post my current solution for anyone else who comes across this question.
If anyone discovers a more efficient/quicker way to do this then please let me know!
My code seems to copy files a bit quicker than just running the copy synchronously but it's still not as fast as I'd like (nor as fast as I've seen some other programs do it). I should note that performance may vary depending on .NET version and your system (I'm using Win 10 with .NET 4.5.2 on a 13" MBP with 2.9GHz i5 (5287U - 2 core / 4 thread) + 16GB RAM). I've not even figured out the best combination of method (e.g. FileStream.Write, FileStream.WriteAsync, BinaryWriter.Write) and buffer size yet.
foreach (var fileDetail in files)
{
    foreach (var destinationPath in fileDetail.DestinationPaths)
        Directory.CreateDirectory(Path.GetDirectoryName(destinationPath));

    // Set up progress
    FileCopyEntryProgress progress = new FileCopyEntryProgress(fileDetail);

    // Set up the source and outputs
    using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan))
    using (var outputs = new CompositeDisposable(fileDetail.DestinationPaths.Select(p => new FileStream(p, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))))
    {
        // Set up the copy operation
        var buffer = new byte[bufferSize];
        int read;

        // Read the file
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy to each drive
            await Task.WhenAll(outputs.Select(async destination => await ((FileStream)destination).WriteAsync(buffer, 0, read)));

            // Report progress
            if (onDriveCopyFile != null)
            {
                progress.BytesCopied = read;
                progress.TotalBytesCopied += read;
                onDriveCopyFile.Report(progress);
            }
        }
    }

    if (ct.IsCancellationRequested)
        break;
}
I'm using CompositeDisposable from Reactive Extensions (https://github.com/Reactive-Extensions/Rx.NET).
IO operations in general should be treated as asynchronous, since the hardware operations run outside your code. You can introduce async/await for the read/write operations so that execution can continue while the hardware does its work:
while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    await destination.WriteAsync(buffer, 0, read);
}
You also must mark your lambda delegate as async to make this work:
async destinationPath =>
...
And you should await the resulting tasks all the way. You may find more information here:
Parallel foreach with asynchronous lambda
Nesting await in Parallel.ForEach
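As a rough sketch of where those threads lead (my own combination, assumed to run inside an async method; this is not the single-read optimisation, it only shows how to actually await the per-destination work instead of handing async lambdas to Parallel.ForEach):
// files, fileDetail.SourcePath and fileDetail.DestinationPaths are the same as in the question's code.
foreach (var fileDetail in files)
{
    await Task.WhenAll(fileDetail.DestinationPaths.Select(async destinationPath =>
    {
        using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open,
            FileAccess.Read, FileShare.Read, 81920, useAsync: true))
        using (var destination = new FileStream(destinationPath, FileMode.Create,
            FileAccess.Write, FileShare.None, 81920, useAsync: true))
        {
            // CopyToAsync pumps the stream in chunks and yields while the IO is in flight.
            await source.CopyToAsync(destination);
        }
    }));
}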

From WCF streaming service directly to disk

I have a WCF service that sends me streams (usually large ones). As the client application, my role is to get a stream over WCF and save it to disk. I've written some code, but it seems to first load the stream into RAM and then write it to disk from there. I want to get the stream and write it directly to disk without filling the RAM with huge files. What is a good way of doing this? Here is what I have so far:
Stream sourceStream = SsClient.GetFile(FolderId, Helper.GetISession());
using (var targetStream = new FileStream(thisComputerPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None))
{
    // read from the input stream in 65000-byte chunks
    const int bufferLen = 65000;
    var buffer = new byte[bufferLen];
    int count;
    while ((count = sourceStream.Read(buffer, 0, bufferLen)) > 0)
    {
        // save to output stream
        targetStream.Write(buffer, 0, count);
    }
    targetStream.Close();
    sourceStream.Close();
}
I hope I explained my problem clearly enough; excuse my English, by the way.
I don't mind using RAM for buffering purposes; I just don't want it filled with 1-2 GB of stream data each time, since that would be hard on a client computer that only has 2 GB of RAM.
Did you check the following posts?
How to Save a Stream
and
Writing large stream to a file
Let us know in case of any queries on these implementations.
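For reference, the pattern from those posts boils down to something like the sketch below: a buffered copy straight from the WCF response stream to a FileStream, using the identifiers from the question (the 64 KB buffer size is an arbitrary choice, and Stream.CopyTo assumes .NET 4.0+):
Stream sourceStream = SsClient.GetFile(FolderId, Helper.GetISession());
using (sourceStream)
using (var targetStream = new FileStream(thisComputerPath, FileMode.Create, FileAccess.Write, FileShare.None))
{
    // CopyTo pumps fixed-size chunks, so only one buffer's worth of data is in memory at a time.
    sourceStream.CopyTo(targetStream, 64 * 1024);
}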

How to copy one Stream object's values to a second Stream object in ASP.NET

In my project the user can upload files of up to 1 GB. I want to copy the uploaded file's stream data to a second stream.
If I use something like this:
int i;
while ((i = fuVideo.FileContent.ReadByte()) != -1)
{
    strm.WriteByte((byte)i);
}
then it takes a very long time.
If I try to do this with a byte array, I would need to specify the array size as a long, which is not valid.
If someone has a better idea of how to do this, please let me know.
--
Hi Khepri, thanks for your response. I tried Stream.CopyTo, but it takes a very long time to copy one stream object to the second.
I tried with an 8.02 MB file and it took 3 to 4 minutes.
The code I added is:
Stream fs = fuVideo.FileContent; //fileInf.OpenRead();
Stream strm = ftp.GetRequestStream();
fs.CopyTo(strm);
If I am doing something wrong, please let me know.
Is this .NET 4.0?
If so, Stream.CopyTo is probably your best bet.
If not, and to give credit where credit is due, see the answer in this SO thread. If you're not on .NET 4.0, make sure to read the comments in that thread, as there are some alternative solutions (async stream reading/writing) that may be worth investigating if performance is at an absolute premium, which may be your case.
EDIT:
Based on the update, are you trying to copy the file to another remote destination? (Just guessing based on GetRequestStream().) The time is going to be the actual transfer of the file content to the destination. So in this case, when you do fs.CopyTo(strm), it has to move those bytes from the source stream to the remote server. That's where the time is coming from. You're literally doing a file upload of a huge file. CopyTo will block your processing until it completes.
I'd recommend looking at spinning this kind of processing off to another task, or at the least looking at the asynchronous option I listed. You can't really avoid this taking a long time; you're constrained by file size and available upload bandwidth.
I verified that when working locally CopyTo is sub-second. I tested with a half-gig file, and a quick Stopwatch returned a processing time of 800 milliseconds.
If you are not on .NET 4.0, use this:
static void CopyTo(Stream fromStream, Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = fromStream.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
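If blocking the request thread is also a concern, here is a hedged sketch of the asynchronous route mentioned above, assuming .NET 4.5+ (where CopyToAsync exists) and a surrounding method that can be marked async; on .NET 4.0 the same shape can be built from Begin/End reads and writes:
// Same streams as in the question; the upload is still bound by bandwidth,
// but the thread is released while the copy is in flight.
Stream fs = fuVideo.FileContent;
Stream strm = ftp.GetRequestStream();
await fs.CopyToAsync(strm, 81920);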

C# Reading 'Zip' files with FileStream

I have written a program that establishes a network connection with a remote computer using TcpClient. I am using it to transfer files in 100 KB chunks to a remote .NET application, which in turn writes them to the hard drive. All file transfers work fine except when it comes to ZIP files: the reassembled file is always 98 KB. Is there some dark secret to ZIP files that prevents them from being handled in this manner? Again, all other file transfers work fine: image, xls, txt, chm, exe, etc.
Confused
Well, you haven't shown any code so it's kinda tricky to say exactly what's wrong.
The usual mistake is to assume that Stream.Read reads all the data you ask it to instead of realising that it might read less, but that the amount it actually read is the return value.
In other words, the code shouldn't be:
byte[] buffer = new byte[input.Length];
input.Read(buffer, 0, buffer.Length);
output.Write(buffer, 0, buffer.Length);
but something like:
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ( (bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
    output.Write(buffer, 0, bytesRead);
}
But that's just a guess. If you could post some code, we'd have a better chance of figuring it out.
The actual code would be helpful.
Are you using BinaryReader / BinaryWriter?
(i.e. data based rather than text based).
You could try using a hex file compare (e.g. Beyond Compare) to compare the original and copy and see if that gives you any clues.
It might be that you are overwriting (instead of appending to) the existing file with each chunk received? Therefore the file's final size will be <= the size of one chunk.
But without any code, it's difficult to tell the reason for the problem.
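If the overwrite theory is the cause, a minimal sketch of appending each received chunk instead of recreating the file (the method name, path and chunk parameters are placeholders, since the receiving code wasn't posted):
// Append each received chunk to the target file rather than overwriting it.
static void AppendChunk(string outputPath, byte[] chunk, int count)
{
    using (var output = new FileStream(outputPath, FileMode.Append, FileAccess.Write, FileShare.None))
    {
        output.Write(chunk, 0, count);
    }
}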
