I have a C# console app where I want to upload multiple files (roughly ~20k files in a single run, each file less than 5 MB, not multi-part) to an S3 bucket. One way is to issue a PutObjectRequest in a foreach loop, but I don't think that is the most efficient way of doing it.
Is there any better way of uploading multiple files to S3?
All the files are on a local hard disk, and I have to change each file name before uploading.
You cannot upload multiple files in a single request; however, you can easily upload multiple files in parallel.
The simplest way would be to use the Task Parallel Library.
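For example, here is a rough sketch of a parallel upload loop, assuming the AWSSDK.S3 package on .NET 6+; the bucket name, source folder, and rename rule are placeholders:

```csharp
// A minimal sketch, assuming the AWSSDK.S3 NuGet package and .NET 6+ top-level statements.
// The bucket name, source folder, and rename rule are placeholders.
using Amazon.S3;
using Amazon.S3.Model;

var s3 = new AmazonS3Client();
var files = Directory.GetFiles(@"C:\data\to-upload");

// Cap concurrency so ~20k small uploads don't exhaust connections or threads.
var options = new ParallelOptions { MaxDegreeOfParallelism = 16 };

await Parallel.ForEachAsync(files, options, async (path, ct) =>
{
    var request = new PutObjectRequest
    {
        BucketName = "my-bucket",                   // placeholder
        Key = "renamed-" + Path.GetFileName(path),  // apply your rename rule here
        FilePath = path
    };
    await s3.PutObjectAsync(request, ct);
});
```

Tuning MaxDegreeOfParallelism up or down is usually all you need; each upload is small, so the limiting factor is the number of concurrent requests rather than bandwidth per file.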
Currently we merge some output files through C#, because we used to keep these chunks on a drive on a server. Now we are going to move these files directly from Snowflake to the S3 bucket, so it would be better to merge them on the S3 bucket itself. We know that AWS has a feature called Multipart Upload, but we don't know whether we could upload these files from Snowflake to S3 using that functionality.
At the moment we are exploring options. Most of what we found suggests creating a Lambda function to merge the files that are already in the S3 bucket, but the examples we found are mostly in Python and our app is on .NET. We also found AWS Glue Crawler, but we are not very sure about going with that option. Multipart Upload could be a good fit, but we lack experience with this type of implementation, so any help or example is welcome.
AWS Glue Crawler would be perfect in this situation.
Use a crawler to get the schema
Use a Glue ETL job to merge the files and write them back to S3
Make sure to turn on the job bookmark (it will skip the previously merged files)
Example: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html
I've found tons of information about how to create and upload a zip file to Azure storage, but I'm trying to see if there's a good way to add content to an existing zip file in blob storage that doesn't involve downloading the whole blob, rebuilding in memory, then uploading again.
My use case involves zipping several hundred, up to several million, items into an archive to be stored in Azure blob storage for later download. I have code that will handle splitting so I don't wind up with a single several-GB size file, but I still run into memory management issues when dealing with large quantities of files. I'm trying to address this by creating the zip file in blob storage, and adding subsequent files to it one by one. I recognize this will incur cost for the additional writes, and that's fine.
I know how to use Append Blobs and Block Blobs, and I have working code to create the zip file and upload, but I can't seem to find out if there's a way to do this. Anyone managed to accomplish this, or able to confirm that this is not possible?
Since you're dealing with zip files, the only way to add new files to an existing zip is to download the blob, add the new file to that zip, and then re-upload the blob.
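As an illustration, here is a minimal sketch of that download/update/re-upload cycle, assuming the Azure.Storage.Blobs package and System.IO.Compression; the connection string, container, blob, and file names are placeholders:

```csharp
// A minimal sketch, assuming the Azure.Storage.Blobs package; the connection string,
// container, blob, and file names are placeholders.
using System.IO.Compression;
using Azure.Storage.Blobs;

var connectionString = "<storage-connection-string>";   // placeholder
var blob = new BlobClient(connectionString, "archives", "bundle.zip");

// 1. Download the existing zip into a local stream.
using var ms = new MemoryStream();
await blob.DownloadToAsync(ms);

// 2. Add the new entry (Update mode rewrites the zip's central directory in the stream).
using (var zip = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
{
    zip.CreateEntryFromFile(@"C:\data\new-item.json", "new-item.json");
}

// 3. Re-upload, overwriting the old blob.
ms.Position = 0;
await blob.UploadAsync(ms, overwrite: true);
```

For very large archives you would want to spill the intermediate stream to a temp file instead of a MemoryStream, but the download-update-upload shape stays the same.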
I have around 1 GB of data which my client wants to download from my portal as a zip file, but in the backend the files are served from AWS S3.
Currently I am downloading all the files to a memory stream and zipping them, which takes a lot of time; at times it times out, and the client is not sure whether the request is still processing, since I don't show download progress in the browser.
So is there a good way to download this much data from S3 as a zip?
Thanks
You could run some code on an Amazon EC2 instance that downloads the data from Amazon S3 (very quick if in the same region), zips it, then puts the zip back into S3.
The user can then download it directly from S3.
If you want to get fancy, they could download via a Pre-Signed URL and you could have a lifecycle rule that deletes it after a day or two.
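For that last step, here is a minimal sketch of generating a pre-signed URL with the AWS SDK for .NET; the bucket, key, and expiry below are placeholders:

```csharp
// A minimal sketch of handing the finished zip back via a pre-signed URL,
// assuming the AWSSDK.S3 package; bucket and key names are placeholders.
using Amazon.S3;
using Amazon.S3.Model;

var s3 = new AmazonS3Client();

var url = s3.GetPreSignedURL(new GetPreSignedUrlRequest
{
    BucketName = "my-bucket",               // placeholder
    Key = "exports/archive.zip",            // the zip produced on the EC2 instance
    Expires = DateTime.UtcNow.AddHours(24)  // link stops working after a day
});

// Hand `url` to the browser; the user downloads directly from S3.
```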
OK, so I have little experience with S3 myself, but in general, cases like this call for asynchronous processing. As in - when your user clicks "download", you initiate a background process that downloads and zips the files in some temporary location. During this, your client sees something like "preparing download, please wait". Ideally with a progress bar so that they can see that the process isn't stalled. When it's done, the download starts for real and without any timeouts, since you already have the full ZIP file in a temp location.
Alternatively, see if you can streamline the whole process. Right now it sounds like you're downloading all the files to memory, creating the ZIP file in memory, and only then starting to output the first bytes to the client. You can do better. There are libraries out there that allow zipping "on-the-fly": while you're still downloading the files from S3 on one end, the other end is already streaming the ZIP file out to the client. This way you don't need to keep everything in memory either.
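For instance, here is a rough sketch of the streaming approach using the built-in ZipArchive in an ASP.NET Core action; the bucket and key list are placeholders, and depending on your server settings you may need to allow synchronous I/O or buffer through an intermediate stream, since ZipArchive does some synchronous writes:

```csharp
// A rough sketch of zipping "on-the-fly", assuming ASP.NET Core and the AWSSDK.S3
// package; the bucket name and key list are placeholders. Only one object is in
// flight at a time, so memory use stays low.
using System.IO.Compression;
using Amazon.S3;
using Microsoft.AspNetCore.Mvc;

public class ExportController : Controller
{
    private readonly IAmazonS3 _s3;
    public ExportController(IAmazonS3 s3) => _s3 = s3;

    [HttpGet("download")]
    public async Task Download()
    {
        Response.ContentType = "application/zip";
        Response.Headers["Content-Disposition"] = "attachment; filename=export.zip";

        var keys = new[] { "files/a.pdf", "files/b.pdf" };   // placeholder list

        // ZipArchive in Create mode can write straight to the response body as it goes.
        using var zip = new ZipArchive(Response.Body, ZipArchiveMode.Create, leaveOpen: true);
        foreach (var key in keys)
        {
            var entry = zip.CreateEntry(key, CompressionLevel.Fastest);
            using var entryStream = entry.Open();
            using var s3Object = await _s3.GetObjectAsync("my-bucket", key);
            await s3Object.ResponseStream.CopyToAsync(entryStream);
        }
    }
}
```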
I want to download a single file from a remote zip file that is in the cloud. The zip file is too large for me to download as a whole, so I have decided to look for a way to download only the single file (XML) that I need within the archive. I have tried and tested WebClient and WebRequest, but they download the whole zip file (and with a file this large those techniques usually fail anyway). I'm eyeing SharpZipLib, but I don't know how to use it. Is it the right library to use, or are there others I can get and test? Thank you so much.
I've got a project which requires a fairly complicated process and I want to make sure I know the best way to do this. I'm using ASP.net C# with Adobe Flex 3. The app server is Mosso (cloud server) and the file storage server is Amazon S3. The existing site can be viewed at NoiseTrade.com
I need to do this:
Allow users to upload MP3 files to an album "widget"
After the user has uploaded their album/widget, I need to automatically zip the mp3s (for other users to download) and upload the zip along with the mp3 tracks to Amazon S3
I actually have this working already (using client side processing in Flex) but this no longer works because of Adobe's flash 10 "security" update. So now I need to implement this server-side.
The way I am thinking of doing this is:
Store the mp3 in a temporary folder on the app server
When the artist "publishes", create a zip of the files in that folder using a C# library
Start the Amazon S3 upload process (zip and mp3s) and email the user when it is finished (as well as deleting the temporary folder)
The major problem I see with this approach is that if a user deletes or adds a track later on, I'll have to update the zip file, but the temporary files will no longer exist.
I'm at a loss at the best way to do this and would appreciate any advice you might have.
Thanks!
The bit about updating the zip but not having the temporary files if the user adds or removes a track leads me to suspect that you want to build zips containing multiple tracks, possibly complete albums. If this is incorrect and you're just putting a single mp3 into each zip, then StingyJack is right and you'll probably end up making the file (slightly) larger rather than smaller by zipping it.
If my interpretation is correct, then you're in luck. Command-line zip tools frequently have flags which can be used to add files to or delete files from an existing zip archive. You have not stated which library or other method you're using to do the zipping, but I expect that it probably has this capability as well.
MP3s are already compressed. Why bother zipping them?
I would say it is not necessary to zip an already-compressed file format; you are only going to get about a five percent reduction in file size, give or take. MP3s don't really zip up well because, by their nature, they have already compressed most of the available data.
DotNetZip can zip up files from C#/ASP.NET. I concur with the prior posters regarding the compressibility of MP3s. DotNetZip will automatically skip compression on MP3s and just store the file, for exactly this reason. It can still be worthwhile to use a zip as a packaging/archive container, aside from the compression.
If you change the zip file later (user adds a track), you could grab the .zip file from S3, and just update it. DotNetZip can update zip files, too. But in this case you would have to pay for the transfer cost into and out of S3.
DotNetZip can do all of this with in-memory handling of the zips - though that may not be feasible for large archives with lots of MP3s and lots of concurrent users.
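For illustration, here is a minimal sketch of updating an existing archive with DotNetZip after pulling it back from S3; the paths and entry names are placeholders:

```csharp
// A rough sketch of updating an existing archive with DotNetZip (Ionic.Zip);
// all paths and entry names are placeholders.
using Ionic.Zip;

using (var zip = ZipFile.Read(@"C:\temp\album.zip"))
{
    zip.AddFile(@"C:\temp\new-track.mp3", "");   // "" = put it at the archive root
    zip.RemoveEntry("old-track.mp3");            // drop a track the artist deleted
    zip.Save();                                  // rewrites album.zip in place
}
```

The updated file would then be uploaded back to S3, which is where the transfer cost mentioned above comes in.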