I have logic that downloads a group of files as a zip. The issue is that there is no progress indicator, so the user does not know how far along the download is.
This zip file doesn't exist beforehand: the user selects the files they want to download, and then I use the SharpZipLib NuGet package to create a zip
and stream it to the response.
It seems I need to set the Content-Length header for the browser to show a total-size progress indicator. The issue I'm having is that
this value apparently has to be exact; if it's too low or too high by even 1 byte, the file does not get downloaded properly. I can get an approximate
final size by adding all the file sizes together and setting the compression level to none, but I don't see a way to calculate the final zip size exactly.
I hoped I could just overestimate the final size a bit and the browser would allow that, but that doesn't work: the file isn't downloaded properly, so you can't open it.
Here are some possible solutions I've come up with, but they have their own issues.
1 - I can create the zip on the server first and then stream it; knowing the exact size, I can set the Content-Length. The issue with this
is that the user has to wait for all the files to be streamed to the web server and the zip to be created before I can start streaming it to them. While this is going on, the user won't even see the download as having started. This also uses more memory on the web server, as it has to hold the entire zip file in memory.
2 - I can come up with my own progress UI: I use the combined file sizes to get a rough estimate of the final size, and as the files are streamed I push updates to the user via SignalR indicating the progress.
3 - I show the user the total file size before the download begins, so they at least have a way to judge for themselves how far along it is. But the browser itself shows no total, so if they forget the figure, the browser's download progress gives no indication of how far along it is.
These all have their own drawbacks. Is there a better way to do this, ideally so it's all handled by the browser?
Below is my ZipFilesToResponse method. It uses some objects that aren't shown here for simplicity's sake. It also streams the files from Azure Blob Storage.
    public void ZipFilesToResponse(HttpResponseBase response, IEnumerable<Tuple<string,string>> filePathNames, string zipFileName)
    {
        using (var zipOutputStream = new ZipOutputStream(response.OutputStream))
        {
            zipOutputStream.SetLevel(0); // 0 = store only, up to 9 = best compression
            response.BufferOutput = false;
            response.AddHeader("Content-Disposition", "attachment; filename=" + zipFileName);
            response.ContentType = "application/octet-stream";

            // Look up every blob's size once, keyed by blob path (Item1).
            Dictionary<string,long> sizeDictionary = new Dictionary<string, long>();
            long totalSize = 0;
            foreach (var file in filePathNames)
            {
                long size = GetBlobProperties(file.Item1).Length;
                totalSize += size;
                sizeDictionary.Add(file.Item1, size);
            }

            // The download breaks if we don't send the exact content length,
            // and that isn't necessarily the total length of the contents.
            // I don't see a simple way to get it right without downloading every file to the server first,
            // so for now we won't include a content length.
            //response.AddHeader("Content-Length", totalSize.ToString());

            foreach (var file in filePathNames)
            {
                long size = sizeDictionary[file.Item1];
                var entry = new ZipEntry(file.Item2) // Item2 is the name inside the zip
                {
                    DateTime = DateTime.Now,
                    Size = size
                };
                zipOutputStream.PutNextEntry(entry);
                // Stream the blob straight into the current zip entry.
                Container.GetBlockBlobReference(file.Item1).DownloadToStream(zipOutputStream);
                response.Flush();
                if (!response.IsClientConnected)
                {
                    break;
                }
            }
            zipOutputStream.Finish();
            zipOutputStream.Close();
        }
        response.End();
    }
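For the store-only case described above, the final size can in principle be worked out up front, because the ZIP format's fixed record sizes are known: 30 bytes per local header, 46 per central directory header, 22 for the end-of-central-directory record, plus the entry names and the data itself. The following is only a rough sketch under loud assumptions: no compression, no Zip64 records, no extra fields or comments, and names written as UTF-8. Whether SharpZipLib emits data descriptors or Zip64 extras when writing to a non-seekable stream is something I have not confirmed, so the `usesDataDescriptor` switch and the class and method names are hypothetical, and the result should be compared against a real archive before it goes into a Content-Length header.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StoredZipSize
{
    // Fixed record sizes from the ZIP format (assumes no Zip64, extra fields or comments).
    const int LocalHeader = 30;
    const int CentralHeader = 46;
    const int EndOfCentralDirectory = 22;
    const int DataDescriptor = 16; // includes the optional 4-byte signature

    // entries: name inside the zip -> uncompressed size in bytes.
    public static long Compute(IEnumerable<KeyValuePair<string, long>> entries, bool usesDataDescriptor)
    {
        long total = EndOfCentralDirectory;
        foreach (var entry in entries)
        {
            int nameBytes = Encoding.UTF8.GetByteCount(entry.Key);
            total += LocalHeader + nameBytes + entry.Value; // local header + stored data
            if (usesDataDescriptor) total += DataDescriptor; // assumption: one per entry
            total += CentralHeader + nameBytes;              // central directory record
        }
        return total;
    }

    static void Main()
    {
        var entries = new List<KeyValuePair<string, long>>
        {
            new KeyValuePair<string, long>("a.txt", 12),
            new KeyValuePair<string, long>("docs/b.bin", 1000)
        };
        Console.WriteLine(Compute(entries, false));
        Console.WriteLine(Compute(entries, true));
    }
}
```

If the computed value matches the length of an archive actually produced with the same settings, it could be sent as Content-Length before streaming starts; if it does not match, the difference shows which optional records the library is writing.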
There are a few Stackoverflow questions about this but none of them really offer a solution.
The Scenario - I'm creating an Outlook AddIn in VS2013. The user selects emails, hits my AddIn button and the emails are sent to a webservice to be stored in a database and linked to a client. Anyone, in any location will be able to open the email to view it.
Currently I am using the MailItem.SaveAs(filePath) function, then using File.ReadAllBytes(filePath) to create a byte array that can be sent to the webservice.
I delete the file as soon as I create the byte[]:
    for (int x = 0; x < Emails.Count; x++)
    {
        //TODO: RMc - is there a better way than saving to disk? unable to convert MailItem directly to stream...!
        Guid g = Guid.NewGuid();
        string filePath = "c:\\temp\\" + g.ToString() + ".msg";
        Emails.ElementAt(x).SaveAs(filePath);
        byte[] by = File.ReadAllBytes(filePath);
        File.Delete(filePath); // no longer needed now we have the byte array
        // this is where I create a list of objects with all required data to send to the web service
    }
Writing the file to disk is slow, and it creates a *.msg file that may never actually get used if no one wants to view it. So I would like to be able to save the MailItem object to a byte array directly; then I could save that to the database and only create the *.msg file if a user requires it.
The MailItem object appears to be dynamic, and so I think this is the problem.
Can anyone offer a solution or an alternative way of achieving what I have described?
Messages in the Outlook Object Model and Extended MAPI are not streamable. You can save a message in a particular format, but it will not be an exact copy. MSG format will be the closest you can get.
Use MailItem.SaveAs(..., olMsg) to save a message as an MSG file.
I am using MailKit/MimeKit 1.2.7 (latest NuGet version).
I am using ImapClient to receive emails that can have diverse attachments (images, text files, binary files, etc).
MimeMessage's Attachments property helps me access all these attachments, unless the emails are sent with Apple Mail and contain images. It seems that Apple Mail does not attach images with Content-Disposition "attachment" (read here ... comment from Jeffrey Stedfast at the very bottom).
Embedded images are not listed in the Attachments collection.
What are my options? Do I really have to traverse the body parts one by one and see what's inside? Or is there an easier solution?
The Working with Messages document lists a few ways of examining the MIME parts within a message, but another simple option might be to use the BodyParts property on the MimeMessage.
To start, let's take a look at how the MimeMessage.Attachments property works:
    public IEnumerable<MimeEntity> Attachments {
        get { return BodyParts.Where (x => x.IsAttachment); }
    }
As you've already noted, the reason that this property doesn't return the attachments you are looking for is because they do not have Content-Disposition: attachment which is what the MimeEntity.IsAttachment property is checking for.
An alternate rule might be to check for a filename parameter.
var attachments = message.BodyParts.Where (x => x.ContentDisposition != null && x.ContentDisposition.FileName != null).ToList ();
Or maybe you could say you just want all images:
var images = message.BodyParts.OfType<MimePart> ().Where (x => x.ContentType.IsMimeType ("image", "*")).ToList ();
Hope that gives you some ideas on how to get the items you want.
I have a WCF web service that saves files to a folder (about 200,000 small files).
After that, I need to move them to another server.
The solution I found was to zip them and then move them.
When I adopted this solution, I tested it with 20,000 files: zipping 20,000 files took only about 2 minutes, and moving the zip is really fast.
But in production, zipping 200,000 files takes more than 2 hours.
Here is my code to zip the folder:
    using (ZipFile zipFile = new ZipFile())
    {
        zipFile.UseZip64WhenSaving = Zip64Option.Always;
        zipFile.CompressionLevel = CompressionLevel.None;
        zipFile.AddDirectory(this.SourceDirectory.FullName, string.Empty);
        zipFile.Save(DestinationCurrentFileInfo.FullName);
    }
I want to modify the WCF webservice, so that instead of saving to a folder, it saves to the zip.
I use the following code to test:
    var listAes = Directory.EnumerateFiles(myFolder, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(".aes")).Select(f => new FileInfo(f));
    foreach (var additionFile in listAes)
    {
        using (var zip = ZipFile.Read(nameOfExistingZip))
        {
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
            zip.AddFile(additionFile.FullName);
            zip.Save();
        }
        file.WriteLine("Delay for adding a file : " + sw.Elapsed.TotalMilliseconds);
        sw.Restart();
    }
The first file added to the zip takes only 5 ms, but the 10,000th file takes 800 ms.
Is there a way to optimize this? Or do you have other suggestions?
EDIT
The example shown above is only a test; in the WCF web service, I'll have different requests sending files that I need to add to the zip file.
As WCF is stateless, I will have a new instance of my class with each call, so how can I keep the zip file open to add more files?
I've looked at your code and immediately spotted problems. The problem with a lot of software developers nowadays is that they don't understand how stuff works, which makes it impossible to reason about it. In this particular case you don't seem to know how ZIP files work; therefore I would suggest you first read up on how they work and attempt to break down what happens under the hood.
Reasoning
Now that we're all on the same page on how they work, let's start the reasoning by breaking down how this works using your source code; we'll continue from there on forward:
    var listAes = Directory.EnumerateFiles(myFolder, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(".aes")).Select(f => new FileInfo(f));
    foreach (var additionFile in listAes)
    {
        // (1)
        using (var zip = ZipFile.Read(nameOfExistingZip))
        {
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
            // (2)
            zip.AddFile(additionFile.FullName);
            // (3)
            zip.Save();
        }
        file.WriteLine("Delay for adding a file : " + sw.Elapsed.TotalMilliseconds);
        sw.Restart();
    }
(1) opens a ZIP file. You're doing this for every file you attempt to add
(2) Adds a single file to the ZIP file
(3) Saves the complete ZIP file
On my computer this takes about an hour.
Now, not all of the file format details are relevant. We're looking for stuff that will get increasingly worse in your program.
Skimming over the file format specification, you'll notice that compression is based on Deflate which doesn't require information on the other files that are compressed. Moving on, we'll notice how the 'file table' is stored in the ZIP file:
You'll notice here that there's a 'central directory' which stores the files in the ZIP file. It's basically stored as a 'list'. So, using this information we can reason on what the trivial way is to update that when implementing steps (1-3) in this order:
Open the zip file, read the central directory
Append data for the (new) compressed file, store the pointer along with the filename in the new central directory.
Re-write the central directory.
Think about it for a moment: for file #1 you need 1 write operation; for file #2, you need to read (1 item), append (in memory) and write (2 items); for file #3, you need to read (2 items), append (in memory) and write (3 items). And so on. This basically means that your performance will go down the drain as you add more files. You've already observed this; now you know why.
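To put rough numbers on that growth, here is a minimal sketch that counts how many central directory entries get written, assuming every save rewrites the whole directory as described above. The class and method names are made up for illustration, and the count ignores the reads and the file data itself.

```csharp
using System;

static class CentralDirectoryCost
{
    // The archive is re-saved after every added file: save number k rewrites
    // a central directory of k entries, so the total is 1 + 2 + ... + n.
    public static long EntriesWrittenOneAtATime(long n)
    {
        return n * (n + 1) / 2;
    }

    // All files are added first and the archive is saved once.
    public static long EntriesWrittenSingleSave(long n)
    {
        return n;
    }

    static void Main()
    {
        long n = 200000;
        Console.WriteLine(EntriesWrittenOneAtATime(n));
        Console.WriteLine(EntriesWrittenSingleSave(n));
    }
}
```

For 200,000 files the first figure is on the order of 20 billion entry writes against 200,000 for a single save, which is the quadratic behaviour the timings hint at.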
A possible solution
Adding all files at once, the simplest fix, might not work in your use case. Another solution is to implement a merge that basically merges 2 files together every time. This is more convenient if you don't have all files available when you start the compression process.
Basically the algorithm then becomes:
(1) Add a few files (say, 16). You can toy with this number. Store them in, say, 'file16.zip'.
(2) Add more files. When you hit 16 files again, merge the two files of 16 items into a single file of 32 items.
(3) Merge files until you cannot merge anymore. Basically, every time you have two files of N items, you create a new file of 2*N items.
(4) Goto (2).
Again, we can reason about it. The first 16 files aren't a problem, we've already established that.
We can also reason about what will happen in our program. Because we're merging 2 files into 1 file, we don't have to do as many reads and writes. In fact, if you reason about it, you'll see that a file of 32 entries takes 1 merge, 64 takes 3 merges, 128 takes 7 merges, 256 takes 15 merges... hey, wait, we know this sequence: the file sizes follow 2^N, so an entry is only rewritten once per doubling. For 200,000 files in batches of 16 that comes to roughly 12,500 merges in total, but each entry is copied at most about log2(12,500) ≈ 14 times, which is much better than rewriting the whole central directory for every one of the 200,000 files.
Hacking in the ZIP file
Yet another solution that might come to mind is to overallocate the central directory, creating slack space for future entries. However, this probably requires you to hack into the ZIP code and create your own ZIP file writer. The idea is that you overallocate the central directory to 200K entries before you get started, so that you can simply append in place.
Again, we can reason about it: adding a file now means appending the file data and updating some headers. It won't be as fast as the original solution because you'll need random disk IO, but it'll probably work fast enough.
I haven't worked this out, but it doesn't seem overly complicated to me.
The easiest solution is the most practical
What we haven't discussed so far is the easiest possible solution: one approach that comes to mind is to simply add all files at once, which we can again reason about.
Implementation is quite easy, because now we don't have to do any fancy things; we can simply use the ZIP handler (I use ionic) as-is:
    static void Main()
    {
        // Start from a clean slate; ignore the error if the file doesn't exist.
        try { File.Delete(@"c:\tmp\test.zip"); }
        catch { }

        var sw = Stopwatch.StartNew();
        using (var zip = new ZipFile(@"c:\tmp\test.zip"))
        {
            zip.UseZip64WhenSaving = Zip64Option.Always;
            for (int i = 0; i < 200000; ++i)
            {
                string filename = "foo" + i.ToString() + ".txt";
                byte[] contents = Encoding.UTF8.GetBytes("Hello world!");
                zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
                zip.AddEntry(filename, contents);
            }
            // Save once, after all entries have been added.
            zip.Save();
        }
        Console.WriteLine("Elapsed: {0:0.00}s", sw.Elapsed.TotalSeconds);
        Console.ReadLine();
    }
Whop; that finishes in 4.5 seconds. Much better.
I can see that you just want to group the 200,000 files into one big single file, without compression (like a tar archive).
Two ideas to explore:
Experiment with file formats other than Zip, as it may not be the fastest. Tar (tape archive) comes to mind (with natural speed advantages due to its simplicity); it even has an append mode, which is exactly what you are after to ensure O(1) operations. SharpCompress is a library that will allow you to work with this format (and others).
If you have control over your remote server, you could implement your own file format. The simplest I can think of would be to zip each new file separately (to store the file metadata such as name, date, etc. in the file content itself), and then to append each such zipped file to a single raw bytes file. You would just need to store the byte offsets (separated by columns in another txt file) to allow the remote server to split the huge file into the 200,000 zipped files, and then unzip each of them to get the metadata. I guess this is also roughly what tar does behind the scenes :).
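A minimal sketch of that append-plus-offsets idea, using only System.IO, might look like the following. To keep it short it skips the per-file zipping step and appends the raw bytes, and the comma-separated "offset,length,name" index line, the class name and the method names are all my own assumptions rather than any existing format.

```csharp
using System;
using System.IO;
using System.Text;

static class AppendOnlyContainer
{
    // Appends one payload to the end of the data file and records
    // "offset,length,name" in the index file. Returns the payload's offset.
    public static long Append(string dataPath, string indexPath, string name, byte[] payload)
    {
        long offset;
        using (var data = new FileStream(dataPath, FileMode.OpenOrCreate, FileAccess.Write))
        {
            offset = data.Seek(0, SeekOrigin.End);
            data.Write(payload, 0, payload.Length);
        }
        // The name goes last so that a comma inside it cannot shift the numeric fields.
        File.AppendAllText(indexPath, offset + "," + payload.Length + "," + name + Environment.NewLine);
        return offset;
    }

    // Reads one payload back using an offset and length taken from the index.
    public static byte[] Read(string dataPath, long offset, int length)
    {
        var buffer = new byte[length];
        using (var data = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
        {
            data.Seek(offset, SeekOrigin.Begin);
            int done = 0;
            while (done < length)
            {
                int n = data.Read(buffer, done, length - done);
                if (n == 0) throw new EndOfStreamException();
                done += n;
            }
        }
        return buffer;
    }

    static void Main()
    {
        string dataPath = Path.GetTempFileName();
        string indexPath = Path.GetTempFileName();
        Append(dataPath, indexPath, "foo0.txt", Encoding.UTF8.GetBytes("Hello world!"));
        long second = Append(dataPath, indexPath, "foo1.txt", Encoding.UTF8.GetBytes("Second file"));
        Console.WriteLine(Encoding.UTF8.GetString(Read(dataPath, second, 11)));
        File.Delete(dataPath);
        File.Delete(indexPath);
    }
}
```

Each append only touches the end of the two files, so its cost does not depend on how many files are already stored, which is the property the one-zip-per-save approach lacks.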
Have you tried zipping to a MemoryStream rather than to a file, only flushing to a file when you are done for the day? Of course for back-up purposes your WCF service would have to keep a copy of the received individual files until you are sure they have been "committed" to the remote server.
If you do need compression, 7-Zip (and fiddling with the options) is well worth a try.
You are opening the file repeatedly. Why not loop through and add them all to one zip, then save it?
    var listAes = Directory.EnumerateFiles(myFolder, "*.*", SearchOption.AllDirectories)
        .Where(s => s.EndsWith(".aes"))
        .Select(f => new FileInfo(f));
    using (var zip = ZipFile.Read(nameOfExistingZip))
    {
        foreach (var additionFile in listAes)
        {
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
            zip.AddFile(additionFile.FullName);
        }
        zip.Save();
    }
If the files aren't all available right away, you could at least batch them together. So if you're expecting 200k files, but you only have received 10 so far, don't open the zip, add one, then close it. Wait for a few more to come in and add them in batches.
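One way to picture that batching is a small buffer that collects incoming items and hands them over in groups. The sketch below is library-free and entirely hypothetical (`BatchBuffer`, the batch size and the flush callback are names I made up); in the real service the callback is where the zip would be opened, the batch added, and the archive saved once. It is not thread-safe as written, so concurrent WCF calls would need a lock around Add and Flush.

```csharp
using System;
using System.Collections.Generic;

sealed class BatchBuffer<T>
{
    private readonly int batchSize;
    private readonly Action<IList<T>> flushAction;
    private readonly List<T> pending = new List<T>();

    public BatchBuffer(int batchSize, Action<IList<T>> flushAction)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");
        if (flushAction == null) throw new ArgumentNullException("flushAction");
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Queues one item and flushes automatically once a full batch has accumulated.
    public void Add(T item)
    {
        pending.Add(item);
        if (pending.Count >= batchSize) Flush();
    }

    // Hands the pending items (as a copy) to the callback; call once more at the end for the remainder.
    public void Flush()
    {
        if (pending.Count == 0) return;
        flushAction(pending.ToArray());
        pending.Clear();
    }
}

static class BatchDemo
{
    static void Main()
    {
        // Assumption: here the callback only prints; a real one would open the zip,
        // add every path in the batch, and save once.
        var buffer = new BatchBuffer<string>(3, batch => Console.WriteLine("Saving " + batch.Count + " files"));
        for (int i = 0; i < 7; i++) buffer.Add("file" + i + ".aes");
        buffer.Flush();
    }
}
```

With a batch size of 3 and 7 incoming files, the callback runs three times (3, 3 and 1 files), so the archive is opened and saved three times instead of seven.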
If you are OK with the performance of 10 * 20,000 files, can't you simply partition your large ZIP into 10 "small" ZIP files? For simplicity, create a new ZIP file every minute and put a timestamp in the name.
You can zip all the files using .Net TPL (Task Parallel Library) like this:
    while(0 != (read = sourceStream.Read(bufferRead, 0, sliceBytes)))
    {
        tasks[taskCounter] = Task.Factory.StartNew(() =>
            CompressStreamP(bufferRead, read, taskCounter, ref listOfMemStream, eventSignal)); // Line 1
        eventSignal.WaitOne(-1); // Line 2
        taskCounter++; // Line 3
        bufferRead = new byte[sliceBytes]; // Line 4
    }
    Task.WaitAll(tasks); // Line 6
There is a compiled library and source code here:
http://www.codeproject.com/Articles/49264/Parallel-fast-compression-unleashing-the-power-of
I am developing a website in which clients upload document files like doc, docx, htm, html, txt, pdf, etc. I want to retrieve the last modified date of an uploaded file. I have created a handler (.ashx) that does the job of saving the files.
Following is the code:
    HttpPostedFile file = context.Request.Files[i];
    string fileName = file.FileName;
    file.SaveAs(Path.Combine(uploadPath, fileName));
As you can see, it's very simple to save the file using the file.SaveAs() method. But the HttpPostedFile class does not expose any property to retrieve the last modified date of the file.
So can anyone tell me how to retrieve last modified date of file before saving it to hard disk?
Today you can access this information on the client side using the HTML5 File API:
    // fileInput is an HTMLInputElement: <input type="file" multiple id="myfileinput">
    var fileInput = document.getElementById("myfileinput");
    // files is a FileList object (similar to NodeList)
    var files = fileInput.files;
    for (var i = 0; i < files.length; i++) {
        // lastModified is milliseconds since the Unix epoch; lastModifiedDate is deprecated
        alert(files[i].name + " has a last modified date of " + new Date(files[i].lastModified));
    }
Source and more information
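If the server needs the value, one option is to post it alongside the file and parse it in the handler. The File API exposes the timestamp as `lastModified`, in milliseconds since the Unix epoch. The sketch below shows only the parsing step; the idea of sending it as a form field, and the helper name `ParseLastModified`, are assumptions of mine. Since the value comes from the client, it should be treated as untrusted input.

```csharp
using System;
using System.Globalization;

static class ClientTimestamp
{
    static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Converts a client-supplied "milliseconds since 1970-01-01 UTC" string to a UTC DateTime.
    // Returns null when the value is missing, not a number, or out of DateTime's range.
    public static DateTime? ParseLastModified(string rawMilliseconds)
    {
        long ms;
        if (!long.TryParse(rawMilliseconds, NumberStyles.Integer, CultureInfo.InvariantCulture, out ms))
            return null;
        try
        {
            return UnixEpoch.AddMilliseconds(ms);
        }
        catch (ArgumentOutOfRangeException)
        {
            return null;
        }
    }

    static void Main()
    {
        // In the handler the string would come from a posted form field (hypothetical name),
        // e.g. context.Request.Form["lastModified"].
        DateTime? parsed = ParseLastModified("1000");
        Console.WriteLine(parsed.HasValue ? parsed.Value.ToString("o") : "invalid");
    }
}
```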
You can't do this. An HTTP post request does not contain this information about an uploaded file.
Rau,
You can only get the date once it's on the server. If you're ok with this, then try:
    string strLastModified =
        System.IO.File.GetLastWriteTime(Server.MapPath("myFile.txt")).ToString("D");
the further caveat here being that this datetime will be the date at which it was saved on the server and not the datetime of the original file.
It is not possible, until you save the file to disk.
You typically cannot get the last modified date because the date is not stored in the file.
The Operating System actually stores file attributes like Created, Accessed, and Last Modified. See Where are “last modified date” and “last accessed date” saved?
(I say typically because certain file types like images may have EXIF tag data like the date/time the photo was taken.)