I've got a project which requires a fairly complicated process and I want to make sure I know the best way to do this. I'm using ASP.net C# with Adobe Flex 3. The app server is Mosso (cloud server) and the file storage server is Amazon S3. The existing site can be viewed at NoiseTrade.com
I need to do this:
1. Allow users to upload MP3 files to an album "widget"
2. After the user has uploaded their album/widget, I need to automatically zip the MP3s (for other users to download) and upload the zip along with the MP3 tracks to Amazon S3
I actually have this working already (using client-side processing in Flex) but this no longer works because of Adobe's Flash 10 "security" update. So now I need to implement this server-side.
The way I am thinking of doing this is:
1. Store the MP3s in a temporary folder on the app server
2. When the artist "publishes", create a zip of the files in that folder using a C# library
3. Start the Amazon S3 upload process (zip and MP3s) and email the user when it is finished (as well as deleting the temporary folder) - see the rough sketch below
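Roughly, I picture the publish step looking something like this. It's only a sketch of what I have in mind, not working code: DotNetZip and the AWS SDK's TransferUtility are assumptions on my part, and the folder, bucket, and key names are placeholders.

```csharp
using System.IO;
using Ionic.Zip;                 // DotNetZip - an assumption on my part
using Amazon.S3;
using Amazon.S3.Transfer;

public static void PublishAlbum(string tempFolder, string bucket, string albumKey)
{
    // Zip everything the artist uploaded into a sibling file,
    // so the zip itself doesn't end up inside the archive.
    string zipPath = tempFolder.TrimEnd('\\') + ".zip";
    using (var zip = new ZipFile())
    {
        zip.AddDirectory(tempFolder);
        zip.Save(zipPath);
    }

    // Push the zip and the individual tracks up to S3.
    using (var s3 = new AmazonS3Client())
    using (var transfer = new TransferUtility(s3))
    {
        transfer.Upload(zipPath, bucket, albumKey + "/album.zip");
        foreach (var mp3 in Directory.GetFiles(tempFolder, "*.mp3"))
            transfer.Upload(mp3, bucket, albumKey + "/" + Path.GetFileName(mp3));
    }

    // ...email the artist and delete tempFolder here.
}
```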
The major problem I see with this approach is that if a user deletes or adds a track later on, I'll have to update the zip file but the temporary files will no longer exist.
I'm at a loss at the best way to do this and would appreciate any advice you might have.
Thanks!
The bit about updating the zip but not having the temporary files if the user adds or removes a track leads me to suspect that you want to build zips containing multiple tracks, possibly complete albums. If this is incorrect and you're just putting a single mp3 into each zip, then StingyJack is right and you'll probably end up making the file (slightly) larger rather than smaller by zipping it.
If my interpretation is correct, then you're in luck. Command-line zip tools frequently have flags which can be used to add files to or delete files from an existing zip archive. You have not stated which library or other method you're using to do the zipping, but I expect that it probably has this capability as well.
MP3s are already compressed. Why bother zipping them?
I would say it is not necessary to zip an already-compressed file format; you are only going to get about a five percent reduction in file size, give or take a little. MP3s don't really zip up well because, by their nature, they have already compressed most of the data they can.
DotNetZip can zip up files from C#/ASP.NET. I concur with the prior posters regarding the compressibility of MP3s. DotNetZip will automatically skip compression on MP3 files and just store them, for exactly this reason. It may still be worthwhile to use a zip as a packaging/archive container, aside from the compression.
If you change the zip file later (user adds a track), you could grab the .zip file from S3, and just update it. DotNetZip can update zip files, too. But in this case you would have to pay for the transfer cost into and out of S3.
DotNetZip can do all of this with in-memory handling of the zips - though that may not be feasible for large archives with lots of MP3s and lots of concurrent users.
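To give a sense of the API, updating an existing album zip looks roughly like this (a sketch only - the path and entry name are placeholders):

```csharp
using Ionic.Zip;

// Re-open an existing album zip and add or remove a track.
public static void UpdateAlbumZip(string zipPath, string trackToAdd, string trackToRemove)
{
    using (var zip = ZipFile.Read(zipPath))
    {
        if (trackToRemove != null)
            zip.RemoveEntry(trackToRemove);   // name of the entry inside the zip
        if (trackToAdd != null)
            zip.AddFile(trackToAdd, "");      // .mp3 entries are stored, not compressed
        zip.Save();                           // rewrites the archive in place
    }
}
```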
I've found tons of information about how to create and upload a zip file to Azure storage, but I'm trying to see if there's a good way to add content to an existing zip file in blob storage that doesn't involve downloading the whole blob, rebuilding in memory, then uploading again.
My use case involves zipping several hundred, up to several million, items into an archive to be stored in Azure blob storage for later download. I have code that will handle splitting so I don't wind up with a single several-GB size file, but I still run into memory management issues when dealing with large quantities of files. I'm trying to address this by creating the zip file in blob storage, and adding subsequent files to it one by one. I recognize this will incur cost for the additional writes, and that's fine.
I know how to use Append Blobs and Block Blobs, and I have working code to create the zip file and upload, but I can't seem to find out if there's a way to do this. Anyone managed to accomplish this, or able to confirm that this is not possible?
Since you're dealing with zip files, the only way to add new files to an existing zip file is to download the blob, add the new file to that zip, and then re-upload the blob.
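A sketch of that download/modify/re-upload cycle, assuming the Azure.Storage.Blobs client and System.IO.Compression (the connection string, container, blob, and file names are placeholders):

```csharp
using System.IO;
using System.IO.Compression;
using Azure.Storage.Blobs;

public static void AddFileToZipBlob(string connectionString, string containerName,
                                    string blobName, string fileToAdd)
{
    var blob = new BlobClient(connectionString, containerName, blobName);

    using (var ms = new MemoryStream())
    {
        blob.DownloadTo(ms);   // pull the whole existing zip down

        using (var archive = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(Path.GetFileName(fileToAdd));
            using (Stream entryStream = entry.Open())
            using (FileStream source = File.OpenRead(fileToAdd))
            {
                source.CopyTo(entryStream);
            }
        }   // disposing the archive writes the updated central directory

        ms.Position = 0;
        blob.Upload(ms, overwrite: true);   // push the updated zip back up
    }
}
```

The whole zip has to fit in memory (or be spooled to a temp file), which is exactly the limitation the question is trying to avoid.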
I have around 1GB of data which my client wants to download from my portal as a zip file, but on the backend the files are served from AWS S3.
Currently I am downloading all the files to a memory stream and zipping them, which takes a lot of time; at times it times out, and the client cannot tell whether the request is still processing since there is no download progress in the browser.
So is there a better solution for downloading a large amount of data from S3 as a zip?
Thanks
You could run some code on an Amazon EC2 instance that downloads the data from Amazon S3 (very quick if in the same region), zips it, then puts the zip back into S3.
The user can then download it directly from S3.
If you want to get fancy, they could download via a Pre-Signed URL and you could have a lifecycle rule that deletes it after a day or two.
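Generating the pre-signed URL from C# is a few lines with the AWS SDK for .NET; a sketch, with the bucket, key, and expiry as placeholders:

```csharp
using System;
using Amazon.S3;
using Amazon.S3.Model;

public static string GetDownloadUrl(IAmazonS3 s3, string bucket, string key)
{
    var request = new GetPreSignedUrlRequest
    {
        BucketName = bucket,
        Key = key,
        Expires = DateTime.UtcNow.AddDays(2)   // line up with the lifecycle rule
    };
    return s3.GetPreSignedURL(request);
}
```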
OK, so I have little experience with S3 myself, but in general cases like this call for asynchronous processing. As in - when your user clicks "download", you initiate a background process that downloads and zips the files in some temporary location. During this, your client sees something like "preparing download, please wait". Ideally with a progress bar so that he can see that the process isn't stalled. When it's done, the download starts for real and without any timeouts since you already have the full ZIP file in a temp location.
Alternatively, see if you can streamline the whole process. Right now it sounds like you're downloading all the files to memory, creating the ZIP file in memory, and only then starting to send the first bytes to the client. You can do better. There are libraries out there that allow zipping "on the fly": while you're still downloading files from S3 on one end, the other end is already streaming the ZIP file out to the client. This way you don't need to keep everything in memory either.
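As a sketch of the on-the-fly approach, using System.IO.Compression's ZipArchive over the output stream together with the AWS SDK for .NET (the bucket and key names are placeholders):

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Amazon.S3;

public static async Task StreamZipAsync(IAmazonS3 s3, string bucket,
                                        string[] keys, Stream output)
{
    // ZipArchiveMode.Create writes entries sequentially, so it can target a
    // non-seekable response stream directly.
    using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (string key in keys)
        {
            ZipArchiveEntry entry = archive.CreateEntry(key, CompressionLevel.Fastest);
            using (Stream entryStream = entry.Open())
            using (var s3Object = await s3.GetObjectAsync(bucket, key))
            {
                // Each object is compressed as it is copied, so nothing has
                // to be buffered in full.
                await s3Object.ResponseStream.CopyToAsync(entryStream);
            }
        }
    }
}
```

Called from the download endpoint with the response's output stream as the last argument, the client starts receiving bytes almost immediately, which also helps with the timeout problem.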
As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages' with a few hundred (or even thousands of) files in each which are all dependent on being kept with the other files in the package. (They are versioned together if you will.)
Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file rather than asking them to download thousands of individual ones which could also lead to them being out of sync with others in the same package.
Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.
We've looked into leaving the zip packages alone, then replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us that much of a performance boost. (A little, but not enough that it's worth it.)
I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how they would be written to the zip package, but only the data itself, not the zip header info, etc.
My thinking is that if that is possible, during our build step we would first look for any data file that doesn't have a compressed cache associated with it, and if it doesn't, we would compress that file and write the result to the cache.
Next we would simply append all of the caches together in a file stream, adding any appropriate zip header needed for the files.
This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is which is very fast since it is a straight write-to-disk. And if a data file changes, its cache is destroyed, so next build-pass it would be recreated.
However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?
Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.
Now that you've read the appnote, what you need to do is use the local header, data, and central header from each individual zip file, write the local header and data as is sequentially to the new zip file, and save the central header and the offsets of the local headers in the new file. Then at the end of the new file save the current offset, write a new central directory using the central headers you saved, updating the offsets appropriately, and ending with a new end of central directory record with the offset of the start of the central directory.
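To make that concrete, here is a rough C# sketch of the merge step under some simplifying assumptions that are mine, not part of the appnote: each input zip holds exactly one entry, and there is no zip64, no archive comment, and no encryption. It follows the layout described above (local header + data, then the central directory headers, then a new end-of-central-directory record), but it is not production code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ZipMerge
{
    // Merge several single-entry zip files into one archive without
    // recompressing anything. Assumes no zip64 and no archive comments.
    public static void Merge(IEnumerable<string> inputZips, string outputZip)
    {
        var centralHeaders = new List<byte[]>();
        var newLocalOffsets = new List<uint>();

        using (FileStream output = File.Create(outputZip))
        {
            foreach (string path in inputZips)
            {
                byte[] data = File.ReadAllBytes(path);

                // With no archive comment, the end-of-central-directory
                // record is the last 22 bytes of the file.
                int eocd = data.Length - 22;
                uint cdSize = BitConverter.ToUInt32(data, eocd + 12);
                uint cdOffset = BitConverter.ToUInt32(data, eocd + 16);

                // Everything before the central directory is the local
                // header plus compressed data; copy it through unchanged.
                newLocalOffsets.Add((uint)output.Position);
                output.Write(data, 0, (int)cdOffset);

                // Keep this entry's central directory header for later.
                byte[] header = new byte[cdSize];
                Array.Copy(data, (int)cdOffset, header, 0, (int)cdSize);
                centralHeaders.Add(header);
            }

            // Write the new central directory, patching each header's
            // "relative offset of local header" field (byte 42).
            long cdStart = output.Position;
            for (int i = 0; i < centralHeaders.Count; i++)
            {
                BitConverter.GetBytes(newLocalOffsets[i]).CopyTo(centralHeaders[i], 42);
                output.Write(centralHeaders[i], 0, centralHeaders[i].Length);
            }
            long cdLength = output.Position - cdStart;

            // New end-of-central-directory record.
            byte[] eocdRec = new byte[22];
            BitConverter.GetBytes(0x06054b50u).CopyTo(eocdRec, 0);                   // signature
            BitConverter.GetBytes((ushort)centralHeaders.Count).CopyTo(eocdRec, 8);  // entries on this disk
            BitConverter.GetBytes((ushort)centralHeaders.Count).CopyTo(eocdRec, 10); // entries total
            BitConverter.GetBytes((uint)cdLength).CopyTo(eocdRec, 12);               // central dir size
            BitConverter.GetBytes((uint)cdStart).CopyTo(eocdRec, 16);                // central dir offset
            output.Write(eocdRec, 0, 22);
        }
    }
}
```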
Update:
I decided this was a useful enough thing to write. You can get it here.
You could zip each file beforehand, and then "zip" them together with no compression at the end to quickly aggregate them into a distributable package. It won't be as efficient as compressing all the data at once, but it should be faster to make modifications.
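With System.IO.Compression, that aggregation step might look something like this (a sketch; the paths are placeholders, and CompressionLevel.NoCompression is what keeps the final pass fast):

```csharp
using System.IO;
using System.IO.Compression;

public static void BuildPackage(string[] preZippedFiles, string packagePath)
{
    using (ZipArchive package = ZipFile.Open(packagePath, ZipArchiveMode.Create))
    {
        foreach (string file in preZippedFiles)
        {
            // Store only - the inner .zip files are already compressed,
            // so this is essentially a straight copy to disk.
            package.CreateEntryFromFile(file, Path.GetFileName(file),
                                        CompressionLevel.NoCompression);
        }
    }
}
```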
I cannot seem to locate an actual exe that implements this type of functionality. Most of the existing tools I've tried that have the ability to merge/update will reprocess (recompress) the data stream, as you have already noted.
However, it seems what you describe can be done if you or someone else wants to write it. If you take a look at this link for the ZIP file format specification, you can get an overview of the structure you would have to parse out and process. It looks like you can go file by file fairly quickly, gathering up the entries of interest and discarding the rest, then merging in your new/updated files. You would still need to rebuild a new central directory (refer to section 4.3.6 of the above linked document) within your new destination archive.
After a little more digging, the DotNetZip Library forum has a message asking about the same type of functionality which also gives a description just like I described above. It also links to this document which seems to indicate that support for that may be added to the DotNetZip library for you to further experiment with.
My users upload their files in my web application, and the application saves each file on the server that has the most free space. The result is that a user's folder may contain multiple files that are stored across multiple servers.
Now I want to give my users the option to download their whole folder as a ZIP file.
Can somebody suggest an appropriate namespace, or give guidance on the best practice for achieving this functionality?
The application is written in C# .NET. So far I have looked at
1) http://msdn.microsoft.com/en-us/library/system.io.packaging.aspx
together with
2) http://msdn.microsoft.com/en-us/library/system.io.packaging.zippackage.aspx
Am I looking in the right direction?
Let's say a user has his files spread over 3 servers, named server1, server3 and server6.
Point 1: you must have those locations recorded somewhere, right?
Now suppose server7 needs to pack those files into one zip and stream it to the client.
You will need to download all those files to server7 and then pack them into a zip archive.
To download the files you can use WebRequest. Hope this is clear.
YOU WILL NEED TO DOWNLOAD THE FILES TO A SINGLE LOCATION BEFORE ZIPPING THEM; there is no shortcut around this.
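A sketch of what that would look like on "server7", using HttpClient rather than WebRequest for brevity (the URLs, temp path, and file naming are placeholders for however the files are actually addressed):

```csharp
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

public static async Task<string> BuildUserZipAsync(string[] fileUrls, string workRoot)
{
    // Pull every file from its home server into one temp folder first.
    string tempDir = Path.Combine(workRoot, Path.GetRandomFileName());
    Directory.CreateDirectory(tempDir);

    using (var http = new HttpClient())
    {
        foreach (string url in fileUrls)
        {
            byte[] bytes = await http.GetByteArrayAsync(url);
            File.WriteAllBytes(Path.Combine(tempDir, Path.GetFileName(url)), bytes);
        }
    }

    // Then zip the folder and clean up.
    string zipPath = Path.Combine(workRoot, Path.GetRandomFileName() + ".zip");
    ZipFile.CreateFromDirectory(tempDir, zipPath);
    Directory.Delete(tempDir, recursive: true);
    return zipPath;
}
```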
Suppose I have a list of MP3 files on my server, and I want a user to be able to download multiple files (whichever ones he wants). For this, what I want is to create the zip file dynamically, writing it to the output stream as it is built, using the DotNetZip/IonicZip libraries.
However, that is not a perfect solution if the zip file gets large, because in that scenario the server does not support resumable downloads. To work around this, I need to handle the zip file structure internally and provide resume support myself.
So, is there any open source library I can use to stream resumable, dynamically created zip files directly to the output stream? Or, failing that, I would be happy if someone could explain the structure of a zip file, especially the header content and the data content.
Once a download has started, you should not alter the ZIP file anymore, because then a resume would just result in a broken ZIP file. So make sure your dynamically created ZIP file stays available!
The issue of providing resume functionality was solved in this article for .NET 1.1, and it is still valid and functional.
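The core of that approach is just honoring the HTTP Range header against a zip that is already complete on disk. This is not the article's code, only a minimal sketch of the same idea as a classic ASP.NET handler (the file path and handler registration are placeholders, and only the open-ended "bytes=N-" ranges that download managers send on resume are handled):

```csharp
using System.IO;
using System.Web;

public class ZipDownloadHandler : IHttpHandler
{
    public bool IsReusable { get { return false; } }

    public void ProcessRequest(HttpContext context)
    {
        string path = context.Server.MapPath("~/App_Data/album.zip"); // placeholder
        long length = new FileInfo(path).Length;
        long start = 0;

        string range = context.Request.Headers["Range"];   // e.g. "bytes=500000-"
        if (!string.IsNullOrEmpty(range) && range.StartsWith("bytes="))
        {
            start = long.Parse(range.Substring(6).Split('-')[0]);
            context.Response.StatusCode = 206;              // Partial Content
            context.Response.AddHeader("Content-Range",
                string.Format("bytes {0}-{1}/{2}", start, length - 1, length));
        }

        context.Response.AddHeader("Accept-Ranges", "bytes");
        context.Response.ContentType = "application/zip";
        context.Response.AddHeader("Content-Length", (length - start).ToString());
        context.Response.TransmitFile(path, start, length - start);
    }
}
```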