In my application, the user selects a big file (>100 MB) on their drive. I want the program to take the selected file and chop it up into archived parts that are 100 MB or less. How can this be done? What libraries and file format should I use? Could you give me some sample code? After the first 100 MB archived part is created, I am going to upload it to a server, then upload the next 100 MB part, and so on until the upload is finished. After that, from another computer, I will download all these archived parts and then join them back into the original file. Is this possible with the 7-Zip libraries, for example? Thanks!
UPDATE: From the first answer, I think I'm going to use SevenZipSharp, and I believe I now understand how to split a file into 100 MB archived parts, but I still have two questions:
Is it possible to create the first 100 MB archived part and upload it before creating the next 100 MB part?
How do you extract a file with SevenZipSharp from multiple split archives?
UPDATE #2: I was just playing around with the 7-Zip GUI and creating multi-volume/split archives, and I found that selecting the first one and extracting from it will extract the whole file from all of the split archives. This leads me to believe that paths to the subsequent parts are included in the first one (or that the parts are simply consumed consecutively). However, I'm not sure whether this would work directly from the console, but I will try that now and see if it solves question #2 from the first update.
Take a look at SevenZipSharp; you can use this to create your split 7z files, do whatever you want to upload them, then extract them on the server side.
To split the archive, look at the SevenZipCompressor.CustomParameters member, passing in "v100m". (You can find more parameters in the 7-zip.chm help file that ships with 7-Zip.)
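A rough, untested sketch of what that might look like - note that the exact way SevenZipSharp accepts the volume parameter varies by version (newer builds expose a VolumeSize property instead), and the file names here are just placeholders:

```csharp
using SevenZip;

class SplitCompressExample
{
    static void Main()
    {
        // Split "bigfile.dat" into 100 MB 7z volumes (backup.7z.001, backup.7z.002, ...).
        var compressor = new SevenZipCompressor();

        // Volume size switch as described above; depending on your SevenZipSharp
        // version you may need compressor.VolumeSize = 100 * 1024 * 1024; instead.
        compressor.CustomParameters.Add("v", "100m");

        compressor.CompressFiles("backup.7z", "bigfile.dat");
    }
}
```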
You can split the data into 100MB "packets" first, and then pass each packet into the compressor in turn, pretending that they are just separate files.
However, this sort of compression is usually stream-based. As long as the library you are using will do its I/O via a Stream-derived class, it would be pretty simple to implement your own Stream that "packetises" the data any way you like on the fly - as data is passed into your Write() method you write it to a file. When you exceed 100MB in that file, you simply close that file and open a new one, and continue writing.
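If you take the custom-Stream route, a minimal sketch might look something like this (write-only, no seeking; the part size and the naming scheme are arbitrary choices for illustration):

```csharp
using System;
using System.IO;

// Write-only stream that rolls over to a new part file whenever the
// current part reaches the size limit.
class SplittingStream : Stream
{
    private readonly string _baseName;
    private readonly long _partSize;
    private FileStream _current;
    private int _partIndex;
    private long _writtenInPart;

    public SplittingStream(string baseName, long partSize)
    {
        _baseName = baseName;
        _partSize = partSize;
        OpenNextPart();
    }

    private void OpenNextPart()
    {
        if (_current != null)
            _current.Dispose();   // finished part - this is where you could kick off its upload
        _partIndex++;
        _writtenInPart = 0;
        _current = File.Create(string.Format("{0}.part{1:D3}", _baseName, _partIndex));
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (_writtenInPart >= _partSize)
                OpenNextPart();

            int toWrite = (int)Math.Min(count, _partSize - _writtenInPart);
            _current.Write(buffer, offset, toWrite);
            offset += toWrite;
            count -= toWrite;
            _writtenInPart += toWrite;
        }
    }

    public override void Flush() { _current.Flush(); }

    // This stream is write-only and forward-only.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _current != null)
            _current.Dispose();
        base.Dispose(disposing);
    }
}
```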
Either of these approaches would allow you to easily upload one "packet" while continuing to compress the next.
Edit: Just to be clear, decompression is just the reverse sequence of the above, so once you've got the compression code working, decompression will be easy.
As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages', each containing a few hundred (or even thousands) of files that all need to be kept together with the other files in their package. (They are versioned together, if you will.)
Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file, rather than asking them to download thousands of individual ones, which could also leave files out of sync with others in the same package.
Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.
We've looked into leaving the zip packages alone and replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us much of a performance boost. (A little, but not enough that it's worth it.)
I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how they would be written to the zip package - only the data itself, not the zip header info, etc.
My thinking is that, if this is possible, during our build step we would first look for any data file that doesn't have a compressed cache associated with it; if the cache is missing, we would compress that file and write the result to the cache.
Next we would simply append all of the caches together in a file stream, adding the appropriate zip headers needed for the files.
This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is, which is very fast since it is a straight write to disk. And if a data file changes, its cache is destroyed, so on the next build pass it would be recreated.
However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?
Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.
Now that you've read the appnote, what you need to do is use the local header, data, and central header from each individual zip file. Write each local header and its data as-is, sequentially, to the new zip file, saving the central header and the offset of the local header in the new file as you go. Then, at the end of the new file, note the current offset, write a new central directory using the central headers you saved (updating each one's local header offset appropriately), and finish with a new end-of-central-directory record pointing at the offset where the central directory starts.
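A rough illustration of that procedure in C# - this is only a sketch, not the tool mentioned in the update below; it assumes the input zips have no archive comment and no Zip64 records, and the field offsets come from the appnote:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Merge several small zip files into one by copying local headers + data
// verbatim and rebuilding the central directory, per the PKZip appnote.
class ZipMerger
{
    static void Merge(string[] inputZips, string outputZip)
    {
        var centralParts = new List<byte[]>();
        var baseOffsets = new List<long>();

        using (var output = File.Create(outputZip))
        {
            foreach (string path in inputZips)
            {
                byte[] data = File.ReadAllBytes(path);

                // End-of-central-directory record: 22 bytes with signature 0x06054b50,
                // assumed to sit at the very end (i.e. no archive comment).
                int eocd = data.Length - 22;
                if (BitConverter.ToUInt32(data, eocd) != 0x06054b50)
                    throw new InvalidDataException("Unexpected zip layout: " + path);

                int cdSize = BitConverter.ToInt32(data, eocd + 12);
                int cdOffset = BitConverter.ToInt32(data, eocd + 16);

                // Everything before the central directory is local headers + data;
                // copy it verbatim and remember where it landed in the output.
                baseOffsets.Add(output.Position);
                output.Write(data, 0, cdOffset);

                // Keep this archive's central directory headers for later.
                byte[] cd = new byte[cdSize];
                Array.Copy(data, cdOffset, cd, 0, cdSize);
                centralParts.Add(cd);
            }

            // Write the combined central directory, patching each entry's
            // "relative offset of local header" field (byte 42 of the 46-byte
            // fixed-size central header).
            long cdStart = output.Position;
            int totalEntries = 0;
            for (int i = 0; i < centralParts.Count; i++)
            {
                byte[] cd = centralParts[i];
                int pos = 0;
                while (pos < cd.Length && BitConverter.ToUInt32(cd, pos) == 0x02014b50)
                {
                    int nameLen = BitConverter.ToUInt16(cd, pos + 28);
                    int extraLen = BitConverter.ToUInt16(cd, pos + 30);
                    int commentLen = BitConverter.ToUInt16(cd, pos + 32);
                    uint oldOffset = BitConverter.ToUInt32(cd, pos + 42);
                    byte[] patched = BitConverter.GetBytes((uint)(oldOffset + baseOffsets[i]));
                    Array.Copy(patched, 0, cd, pos + 42, 4);
                    pos += 46 + nameLen + extraLen + commentLen;
                    totalEntries++;
                }
                output.Write(cd, 0, cd.Length);
            }
            long cdLength = output.Position - cdStart;

            // New end-of-central-directory record for the merged archive.
            var writer = new BinaryWriter(output);
            writer.Write(0x06054b50u);           // signature
            writer.Write((ushort)0);             // number of this disk
            writer.Write((ushort)0);             // disk where the central directory starts
            writer.Write((ushort)totalEntries);  // entries on this disk
            writer.Write((ushort)totalEntries);  // total entries
            writer.Write((uint)cdLength);        // central directory size
            writer.Write((uint)cdStart);         // central directory offset
            writer.Write((ushort)0);             // comment length
            writer.Flush();
        }
    }
}
```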
Update:
I decided this was a useful enough thing to write. You can get it here.
You could zip each file beforehand, and then "zip" the results together with no compression at the end to quickly aggregate them into a distributable package. It won't be as efficient as compressing all the data at once, but it should be faster to make modifications.
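For illustration, with .NET 4.5's built-in ZipArchive (or the equivalent calls in DotNetZip/SharpZipLib), the "aggregate without recompressing" step could look roughly like this - the cache directory layout and names are assumptions for the sketch:

```csharp
using System.IO;
using System.IO.Compression;   // also reference System.IO.Compression.FileSystem

class PackageBuilder
{
    // Gather the pre-compressed per-file .zip caches into one package,
    // storing them as-is (CompressionLevel.NoCompression) so nothing is recompressed.
    static void BuildPackage(string cacheDirectory, string packagePath)
    {
        using (var package = ZipFile.Open(packagePath, ZipArchiveMode.Create))
        {
            foreach (string cachedZip in Directory.GetFiles(cacheDirectory, "*.zip"))
            {
                package.CreateEntryFromFile(cachedZip,
                                            Path.GetFileName(cachedZip),
                                            CompressionLevel.NoCompression);
            }
        }
    }
}
```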
I cannot seem to locate an actual exe that implements this type of functionality. It appears that most existing tools I've tried that can merge/update an archive will reprocess (re-compress) the data stream, as you have already observed.
However, it seems that what you describe can be done if you (or someone) wants to write it. If you take a look at this link to the ZIP file format specification, you can get an overview of the structure you would have to parse and process. It looks like you can fairly quickly go from entry to entry, gathering up the files of interest and discarding the rest, then merging in your new/updated files. You would still need to rebuild a new central directory (see section 4.3.6 of the linked document) within your new destination archive.
After a little more digging, the DotNetZip Library forum has a message asking about the same type of functionality, which also gives a description much like the one above. It also links to this document, which seems to indicate that support for this may be added to the DotNetZip library for you to experiment with further.
Almost all file transfer software (NetSupport, Radmin, PcAnywhere, etc.), as well as the different code I've used in my own application, slows down when you send a lot of small files (< 1 KB each), such as a game folder that contains a huge number of files.
For example, on a LAN (Ethernet, Cat5 cables), if I send a single file, say a video, the transfer rate is between 2 MB/s and 9 MB/s,
but when I send a game folder containing a lot of files, the transfer rate drops to about 300-800 KB/s.
I assume this is because of the way each file is sent:
Send file info [file_path, file_size].
Send file bytes [loop until end of file].
End transfer [ensure it was received completely].
But when you use the regular Windows copy-paste onto a shared folder on the network, the transfer rate for a folder is always fast, like sending a single file.
So I'm trying to develop a file transfer application using a WCF service (C# 4.0) that would use the maximum speed available on the LAN, and I'm thinking of this approach:
Get all files from the folder.
if (FileSize < 1 MB)
{
    Create an additional thread to send;
    SendFile(FilePath);
}
else
{
    Wait for the large file to be sent. // fileSize > 1 MB
}

void SendFile(string path) // a regular single-file send
{
    SendFileInfo;
    Open a socket and wait for the server application to connect;
    SendFileBytes;
    Dispose;
}
But I'm unsure about using more than one socket for a file transfer, because that will use more ports and more time (the delay of listening and accepting).
So is this a good approach?
I need an explanation of whether it's possible, how to do it, and whether there is a protocol better suited to this than TCP.
Thanks in advance.
It should be noted that you won't ever achieve 100% LAN speed usage - I'm hoping you're not expecting that - there are too many factors there.
In response to your comment as well: you can't reach the same level the OS uses to transfer files, because you're a lot further away from the bare metal than Windows is. I believe file copying in Windows is only a layer or two above the drivers themselves (possibly even within the filesystem driver) - in a WCF service you're a lot further away!
The simplest thing for you to do will be to package multiple files into archives and transmit them that way, then at the receiving end unpack the complete package into the target folder. Sure, some of those files might already be compressed and so won't benefit, but in general you should see a big improvement. For rock-solid compression in which you can preserve directory structure, I'd consider using SharpZipLib.
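For example, SharpZipLib's FastZip helper makes the pack/unpack step a couple of calls - the paths here are placeholders and error handling is omitted:

```csharp
using ICSharpCode.SharpZipLib.Zip;

class TransferPackaging
{
    static void Main()
    {
        var fastZip = new FastZip();

        // Sender: pack the whole folder (recursively, no file filter) into one archive.
        fastZip.CreateZip(@"C:\temp\outgoing.zip", @"C:\data\GameFolder", true, null);

        // ... transmit outgoing.zip via the WCF service ...

        // Receiver: unpack into the target folder, preserving directory structure.
        fastZip.ExtractZip(@"C:\temp\incoming.zip", @"C:\target\GameFolder", null);
    }
}
```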
A system that uses compression intelligently (probably medium level: low CPU usage, but one that works well on 'compressible' files) might match or possibly outperform OS copying. Windows doesn't use this method because it's hopeless for fault tolerance. In the OS, a transfer halted halfway through a file will still leave any successfully copied files in place. If the transfer itself is compressed and interrupted, everything is lost and has to be started again.
Beyond that, you can consider the following:
Get it working using compression by default first, before trying any enhancements. In some cases (depending on the size and number of files) you may be able to simply compress the whole folder and transmit it in one go. Beyond a certain size, however, this might take too long, so you'll want to create a series of smaller zips.
Write the compressed file to a temporary location on disk as it's being received; don't buffer the whole thing in memory. Delete the file once you've unpacked it into the target folder.
Consider adding the ability to mark certain file types as being sent 'naked', i.e. uncompressed. That way you can exclude .zip, .avi, etc. files from the compression process. That said, a folder with a million 1 KB zip files will clearly benefit from being packed into one single archive, so perhaps give yourself the ability to set a minimum size below which such files will still be packed into a compressed archive (or perhaps a file count/size-on-disk ratio for the folder itself, including sub-folders).
Beyond this advice you will need to play around to get the best results.
Perhaps an easy solution would be to gather all the files together into one big stream (like zipping them, but just appending, to keep it fast) and send that single stream. This would give more speed, but it will use some CPU on both devices, and you'll need a good way to separate the files in the stream again.
Using more ports would, from what I know, only be a disadvantage, since more distinct streams would collide with each other and the speed would go down.
All,
I'm making a training kit whose content is provided as 2 VOB files that I need the software to automatically merge into 1. We'll be getting up to 10-15 VOB files from this vendor, and our requirement is to end up with a single file.
Is merging these files as easy as opening byte streams and combining them?
Thanks!
If the specifications of the files match, it should be possible to use the header from the first file and copy the remaining files, minus their headers, into one file. But the specifications need to match exactly on everything from the encoding type and parameters to the number of audio channels.
If so, then all you need to do is read all the files and skip the first xxx bytes of every file except the first one.
It won't work if the VOB files are encrypted (DVD encryption).
Note: this is a job specialized tools do well. They are optimized and (more or less) bug-free, so if you can, use them (e.g. from the command line).
No, it is not a simple merge. Otherwise the old DOS command type 1.VOB 2.VOB > Final.VOB would have done the job.
Unless this is a learning exercise, just use any VOB merging tool to merge the two.
A lot of this is probably going to depend on whether the VOB files have the same resolution and bit rate, as well as whether a lot of other encoding parameters are the same. If they use exactly the same encoding parameters, simply concatenating the files will probably work. My experience with DVDs is that files from a DVD work fine when this is done. However, my first guess is that this wouldn't work if there were any format differences between the files.
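If you do want to try the naive concatenation described above, the byte-level part is trivial - a sketch (this just appends raw bytes and makes no attempt to fix up headers, so it only stands a chance when the encoding parameters match):

```csharp
using System.IO;

class VobConcat
{
    // Naively append the raw bytes of each input VOB, in order, to one output file.
    static void Concatenate(string[] inputPaths, string outputPath)
    {
        using (var output = File.Create(outputPath))
        {
            foreach (string path in inputPaths)
            {
                using (var input = File.OpenRead(path))
                {
                    input.CopyTo(output);   // Stream.CopyTo requires .NET 4.0+
                }
            }
        }
    }
}
```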
I have hundreds of zipped CSV files. This is great because they take very little space, but when it's time to use them, I have to free some space on my HD and unzip them before I can process them. I was wondering whether it's possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start decompressing it, and process the data as I go.
So there would be no need for extra space on my drive. Any ideas or suggestions?
Yes. Zip is a streamed format, which means you can use the data as you decompress it rather than having to decompress everything first.
With .NET's System.IO.Compression classes you can apply the same kind of compression used in zip files (Deflate and GZip) to any stream you like, but if you want to work with actual zip-format files you'll need a third-party library such as SharpZipLib.
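A rough sketch of that with SharpZipLib's ZipInputStream - the archive path and the line-by-line processing are placeholders:

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

class StreamingUnzip
{
    static void Main()
    {
        using (var zipStream = new ZipInputStream(File.OpenRead(@"C:\data\csv-archive.zip")))
        {
            ZipEntry entry;
            while ((entry = zipStream.GetNextEntry()) != null)
            {
                if (!entry.IsFile || !entry.Name.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
                    continue;

                // ZipInputStream exposes the current entry's decompressed data directly,
                // so each CSV can be read line by line without extracting it to disk.
                // (The reader is deliberately not disposed - that would close zipStream.)
                var reader = new StreamReader(zipStream);
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // process one CSV line here
                }
            }
        }
    }
}
```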
A better solution might be to keep the files decompressed on the drive, but turn on compression at the file-system level. This way you'll just be reading CSV files, and the OS will take care of making sure they don't take up too much space.
Anyhoo, to answer your question, maybe the GZipStream class can help you.
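If the files were compressed individually as .gz rather than into .zip archives, GZipStream gives you exactly this kind of on-the-fly decompression - a small sketch, with the file name as a placeholder:

```csharp
using System.IO;
using System.IO.Compression;

class GzipCsvReader
{
    static void Main()
    {
        // Decompress while reading; nothing extra is written to disk.
        using (var file = File.OpenRead(@"C:\data\records.csv.gz"))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // process one CSV line here
            }
        }
    }
}
```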
SharpZipLib allows stream-based decompression - see this related question - each entry provides the usual stream-based Read methods, so you can process it like you would any stream.
I'm not sure about zip files, but you could use the GZ format with GZipStream (it works like any other input stream). Unfortunately, the entire System.IO.Compression namespace is only two classes (the other does DEFLATE).
EDIT: There's a class called ZipPackage. I'm not sure whether it will let you do streaming decompression, but it might be worth looking into.
Also, take a look at #ziplib.
My server doesn't allow upload/download of big files. On the other hand, I've built a bootstrapper that needs to upload/download big files.
How can I split a big file into smaller subfiles and merge them back later on?
An existing C# library would be great, but I'm happy to hear suggestions on how to program this myself, or even to use a utility.
** Windows platform **
On Unix, you can use the split command to break the file apart, and then use cat to concatenate the pieces back together.
split -b 1024M bigfile.tar.gz bigfile
This will create oodles of files like bigfileaa, bigfileab, etc.
So then ftp all the little beasties to the destination and do the cat:
cat bigfile* > bigfile.tar.gz
On Windows, you might have an option in your zip application to break an archive apart and re-merge it on the other end. In fact, a quick search for "zip split" turns up several such options.
On Windows you can easily split it with WinRAR, or you can do it by hand.
Every zip program I've ever used has this ability.
7-Zip is my current favorite on Windows. It has a nice command-line version, too.
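For example, splitting into 100 MB volumes and later extracting them from the command line is just (file names are placeholders):
7z a -v100m parts.7z bigfile.dat
7z x parts.7z.001
Extracting from the first volume picks up the remaining .002, .003, ... parts automatically.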
You can make a split and join program with a handful of lines each. Just read some fixed amount (512KB, 4MB, whatever) from a file and write it out to a new file. Repeat this (and change the filename you write to) until you reach the end of the file.
Another program needs to read from these files and write their contents (one after another) to a target file.
Pretty easy, really, and if you want to get some programming experience it would be a good exercise.
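A sketch of such a pair in C# - the chunk size, the ".001"-style part naming, and the paths are arbitrary choices:

```csharp
using System;
using System.IO;

class SplitJoin
{
    // Split the source file into numbered parts of at most chunkSize bytes each.
    static void Split(string sourcePath, long chunkSize)
    {
        byte[] buffer = new byte[64 * 1024];
        using (var input = File.OpenRead(sourcePath))
        {
            int part = 0;
            while (input.Position < input.Length)
            {
                part++;
                using (var output = File.Create(sourcePath + "." + part.ToString("D3")))
                {
                    long writtenToPart = 0;
                    int read;
                    while (writtenToPart < chunkSize &&
                           (read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, chunkSize - writtenToPart))) > 0)
                    {
                        output.Write(buffer, 0, read);
                        writtenToPart += read;
                    }
                }
            }
        }
    }

    // Join the parts back into a single file, in the order given.
    static void Join(string targetPath, string[] partPaths)
    {
        using (var output = File.Create(targetPath))
        {
            foreach (string partPath in partPaths)
            {
                using (var input = File.OpenRead(partPath))
                {
                    input.CopyTo(output);
                }
            }
        }
    }
}
```

The only real pitfall is joining the parts in the right order, so sort the part file names before calling Join.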
Or, you can write a small application to meet your needs: just read bytes and write them back out, and you can easily split the big file into smaller ones.