System.IO.Compression.ZipArchive keeps file locked after dispose? - c#

I have a class that takes data from several sources and writes them to a ZIP file. I've benchmarked the class to check whether using CompressionLevel.Optimal would be much slower than CompressionLevel.Fastest. But the benchmark throws an exception at a different iteration and with a different CompressionLevel value each time I run it.
I started removing the methods that add the file content step by step until I ended up with the code below (inside the for-loop), which does basically nothing besides creating an empty zip file and deleting it.
Simplified code:
var o = @"e:\test.zip";
var result = new FileInfo(o);
for (var i = 0; i < 1_000_000; i++)
{
    // Alternate approach
    // using (var archive = ZipFile.Open(o, ZipArchiveMode.Create))
    using (var archive = new ZipArchive(result.OpenWrite(), ZipArchiveMode.Create, false, Encoding.UTF8))
    {
    }
    result.Delete();
}
The loop runs about 100 to 15k iterations on my PC and then throws an IOException when trying to delete the file, saying that the file (result) is locked.
So... did I miss something about how to use System.IO.Compression.ZipArchive? There is no Close method on ZipArchive, and using should dispose/close the archive... I've tried different .NET Framework versions: 4.6, 4.6.1, 4.7 and 4.7.2.
EDIT 1:
The result.Delete() is not part of the code that is benchmarked
EDIT 2:
Also tried playing around with Thread.Sleep(5/10/20) after the using block (hence the result.Delete(), to check whether the lock persists), but up to 20 ms the file is still locked at some point. Didn't try values higher than 20 ms.
EDIT 3:
Can't reproduce the problem at home. At work I tried a dozen times and the loop never hit 20k iterations; at home I tried once and it completed.
EDIT 4:
jdweng (see comments) was right. Thanks! It's somehow related to my "e:" partition on a local HDD. The same code runs fine on my "c:" partition on a local SSD and also on a network share.

In my experience, files may not be consistently unlocked by the time the stream's Dispose method returns. My best guess is that this is due to the file system doing some operation asynchronously. The best solution I have found is to retry the delete operation multiple times, i.e. something like this:
public static void DeleteRetrying(this FileInfo self, int delayMs = 100, int numberOfAttempts = 3)
{
    for (int i = 0; i < numberOfAttempts - 1; i++)
    {
        try
        {
            self.Delete();
            return; // deleted successfully, no further attempts needed
        }
        catch (IOException)
        {
            // Consider making the method async and
            // replace this with Task.Delay
            Thread.Sleep(delayMs);
        }
    }
    // Final attempt, let the exception propagate
    self.Delete();
}
This is not an ideal solution, and I would love it if someone could provide a better one. But it might be good enough for testing, where the impact of a non-deleted file would be manageable.
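At the call site in the benchmark loop above, only the delete changes — a minimal usage sketch, assuming the extension method is in scope (the retry count of 5 is an arbitrary choice for illustration):
using (var archive = new ZipArchive(result.OpenWrite(), ZipArchiveMode.Create, false, Encoding.UTF8))
{
}
// retry for up to ~5 x 100 ms before letting the IOException propagate
result.DeleteRetrying(delayMs: 100, numberOfAttempts: 5);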

Related

Creating StreamWriter instances crashes the application with RPC error

I've stumbled upon an extremely weird error. When using FileStream in the first using block, the application iterates through the loop and prints out "Done"; however, it then exits with error code 5. Try/catch doesn't help either.
This seems to be an extremely fragile error state, because if I fiddle with the file names (for example C:\TFS\file1.xml.No.xml -> C:\TFS\file1.xml.N.xml) then it works fine.
If I use var tw = File.CreateText in the first using then the application exits with code 1073741845. I've managed to reduce the problem significantly to just the few lines of code below as a reproducible example.
Perhaps someone can explain why in the world this behaves so weirdly? I'm also interested in why I am not able to recover from this error state. I've tried [HandleProcessCorruptedStateExceptions] and [SecurityCritical] with no effect.
static void Main(string[] args)
{
    var ds = new DataSet();
    for (int i = 0; i <= 2; i++)
    {
        using (var fs = new FileStream(@"C:\TFS\file1.xml.No.xml", FileMode.Create))
        {
        }
        using (var tw = File.CreateText(@"C:\TFS\file1.xml"))
        {
            ds.WriteXml(tw);
        }
        Console.WriteLine($"Pass {i} done.");
    }
    Console.WriteLine("Done");
    Console.ReadLine();
}
Using .NET Framework 4.7 Console Application project.
EDIT:
If I put Thread.Sleep(2000) in each using statement, I then encounter this error after the 2nd pass - it prints Pass 1 done. and Pass 2 done. before exiting with code 5 (0x5), so the frequency of writing does not seem to be responsible for this behaviour.
Upon further tinkering with this small sample, I can reproduce the issue without using DataSet at all, just by creating StreamWriter instances. The example below should produce the following output before exiting abruptly:
TW1 created.
TW1 flushed.
TW2 created.
TW2 flushed.
Pass 0 done.
static void Main(string[] args)
{
    for (int i = 0; i <= 2; i++)
    {
        var tw1 = File.CreateText(@"C:\TFS\file1.xml.No.xml");
        Console.WriteLine("TW1 created.");
        tw1.Flush();
        Console.WriteLine("TW1 flushed.");
        Thread.Sleep(2000);
        var tw2 = File.CreateText(@"C:\TFS\file1.xml");
        Console.WriteLine("TW2 created.");
        tw2.Flush();
        Console.WriteLine("TW2 flushed.");
        Thread.Sleep(2000);
        Console.WriteLine($"Pass {i} done.");
    }
    Console.WriteLine("Done");
    Console.ReadLine();
}
EDIT 2:
So it appears for us this issue was caused by Kaspersky Endpoint Security for Windows v11.
The process exit code does not mean that much; you'd rather see the debugger stop and tell you about an unhandled exception. But sure, this isn't healthy. This is an anti-malware induced problem: they don't like XML files. It is often a problem on a programmer's machine; they also don't like executable files appearing out of seemingly nowhere, created by a process that uses process interop the way the IDE does to run msbuild. Strong malware signals. So the first thing you want to do is temporarily turn it off to see if that solves the problem.
It surely will; the next thing to do is switch to something a bit less aggressive. The anti-malware solution provided by the OS never gets in the way like that. If you use Avast or anything else that has a "deep scan" mode, then uninstall it asap.
And worry a bit about what your users might use: getting an IOException from FileStream is quite normal, so a try/catch is pretty much required. In general you don't want to overwrite a file or delete a directory that you created milliseconds ago; luckily that is rarely a sensible thing to do anyway.
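As an illustration of that last point, here is a minimal sketch of retrying a FileStream open when another process (such as an on-access scanner) still holds the file. This is not code from the question or the answer; the helper name, retry count and delay are made up for the example:
// Hypothetical helper: retry opening a file that a scanner may briefly hold.
static FileStream OpenWithRetry(string path, int attempts = 5, int delayMs = 200)
{
    for (int i = 0; ; i++)
    {
        try
        {
            return new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None);
        }
        catch (IOException) when (i < attempts - 1)
        {
            Thread.Sleep(delayMs); // give the other process time to release its handle
        }
    }
}
On the last allowed attempt the exception filter is false, so the IOException propagates to the caller instead of being swallowed.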

Last batch never uploads to Solr when uploading batches of data from json file stream

This might be a long shot, but I might as well try here. There is a block of C# code that is rebuilding a Solr core. The steps are as follows:
Delete all the existing documents
Get the core entities
Split the entities into batches of 1000
Spin off threads to perform the next set of processes:
Serialize each batch to JSON and write the JSON to a file on the server hosting the core
Send a command to the core to upload that file using System.Net.WebClient: solrurl/corename/update/json?stream.file=myfile.json&stream.contentType=application/json;charset=utf-8
Delete the file. I've also tried deleting the files after all the batches are done, as well as not deleting the files at all
After all batches are done it commits. I've also tried committing after each batch is done.
My problem is that the last batch will not upload if it's much smaller than the batch size. It flows through as if the command was called, but nothing happens. It throws no exceptions and I see no errors in the Solr logs. My questions are: why, and how can I ensure the last batch always gets uploaded? We think it's a timing issue, but we've added Thread.Sleep(30000) in many parts of the code to test that theory and it still happens.
The only time it doesn't happen is:
if the batch is full or almost full
if we don't run it in multiple threads
if we put a breakpoint at the File.Delete line on the last batch and wait for 30 seconds or so, then continue
Here is the code for writing the file and calling the update command. This is called for each batch.
private const string
    FileUpdateCommand = "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8",
    SolrFilesDir = @"\\MYSERVER\SolrFiles",
    SolrFileNameFormat = SolrFilesDir + @"\{0}-{1}.json",
    _solrUrl = "http://MYSERVER:8983/solr/",
    CoreName = "MyCore";
public void UpdateCoreByFile(List<CoreModel> items)
{
    if (items.Count == 0)
        return;
    var settings = new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc };
    var dir = new DirectoryInfo(SolrFilesDir);
    if (!dir.Exists)
        dir.Create();
    var filename = string.Format(SolrFileNameFormat, Guid.NewGuid(), CoreName);
    using (var sw = new StreamWriter(filename))
    {
        sw.Write(JsonConvert.SerializeObject(items, settings));
    }
    var file = HttpUtility.UrlEncode(filename);
    var command = string.Format(FileUpdateCommand, file, CoreName);
    using (var client = _clientFactory.GetClient()) // System.Net.WebClient
    {
        client.DownloadData(new Uri(_solrUrl + command));
    }
    //Thread.Sleep(30000); // doesn't work if I add this
    File.Delete(filename); // works here if I add a breakpoint and wait 30 sec or so
}
I'm just trying to figure out why this is happening and how to address it. I hope this makes sense, and I have provided enough information and code. Thanks for any help.
Since changing the size of the data set and adding a breakpoint "fixes" it, this is most certainly a race condition. Since you haven't added the code that actually indexes the content, it's impossible to say what the issue really is, but my guess is that the last commit happens before all the threads have finished, and only works when all threads are done (if you sleep the threads, the issue will still be there, since all threads sleep for the same time).
The easy fix: use commitWithin instead, and never issue explicit commits. The commitWithin parameter makes sure that the documents become available in the index within the given time frame (given in milliseconds). To make sure that the documents you submit become available within ten seconds, append &commitWithin=10000 to your URL.
If there are already documents pending a commit, the documents added will be committed before the ten seconds have elapsed; but even if there's just one last document submitted as the last batch, it will never take more than ten seconds before it becomes visible (.. and there will be no documents left forever in a non-committed limbo).
That way you won't have to keep your threads synchronized or issue a final commit, as long as you wait until all threads have finished before exiting your application (if it's an application that actually terminates).
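Applied to the UpdateCoreByFile method from the question, that only means adding the parameter to the format string. A small sketch, reusing the question's constants; the constant name FileUpdateCommandWithCommit and the ten-second window are just for illustration:
// Same format string as FileUpdateCommand above, with commitWithin added so Solr commits on its own.
private const string FileUpdateCommandWithCommit =
    "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8&commitWithin=10000";

// ... later, inside UpdateCoreByFile, exactly as before:
var command = string.Format(FileUpdateCommandWithCommit, file, CoreName);
using (var client = _clientFactory.GetClient()) // System.Net.WebClient
{
    client.DownloadData(new Uri(_solrUrl + command));
}
With that in place the explicit commit at the end of the rebuild can simply be dropped.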

Using Directory.Delete() and Directory.CreateDirectory() to overwrite a folder

In my WebApi action method, I want to create/overwrite a folder using this code:
string myDir = "...";
if (Directory.Exists(myDir))
{
    Directory.Delete(myDir, true);
}
Directory.CreateDirectory(myDir);
// 1 - Check the dir
Debug.WriteLine("Double check if the Dir is created: " + Directory.Exists(myDir));
// Some other stuff here...
// 2 - Check the dir again
Debug.WriteLine("Check again if the Dir still exists: " + Directory.Exists(myDir));
Issue
Strangely, sometimes right after creating the directory, the directory does not exist!
Sometimes when checking the dir for the first time (at comment 1), Directory.Exists() returns true, other times false. The same happens when checking the dir for the second time (at comment 2).
Notes
No part of this code throws any exception.
I can only reproduce this when the website is published on the server (Windows Server 2008).
It happens when accessing the same folder.
Questions
Is this a concurrency issue / race condition?
Doesn't WebApi or the Operating System handle the concurrency?
Is this the correct way to overwrite a folder?
Should I lock files manually when we have many API requests to the same file?
Or in General:
What's the reason for this strange behavior?
UPDATE:
Using DirectoryInfo and Refresh() instead of Directory does not solve the problem.
It only happens when the recursive option of Delete is true (and the directory is not empty).
Many filesystem operations are not synchronous on some filesystems (in the case of Windows: NTFS). Take, for example, the RemoveDirectory call (which is invoked by Directory.Delete at some point):
The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed.
As you can see, it will not really delete the directory until all handles to it are closed, yet Directory.Delete completes just fine. In your case it is most likely a similar concurrency problem: the directory is not really created yet while you are executing Directory.Exists.
So, just periodically check what you need and don't consider filesystem calls in .NET to be synchronous. You can also use FileSystemWatcher in some cases to avoid polling.
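For example, a minimal polling sketch along those lines (not code from the original answer; the helper name, timeout and poll interval are arbitrary) that waits until a freshly created directory actually becomes visible:
// Hypothetical helper: poll until the directory is visible or a timeout expires.
static bool WaitForDirectory(string path, int timeoutMs = 2000, int pollMs = 50)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    while (sw.ElapsedMilliseconds < timeoutMs)
    {
        if (Directory.Exists(path))
            return true;
        Thread.Sleep(pollMs); // the create may not be visible immediately
    }
    return Directory.Exists(path);
}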
EDIT: I was thinking how to reproduce it, and here is the code:
internal class Program {
    private static void Main(string[] args) {
        const string path = "G:\\test_dir";
        while (true) {
            if (Directory.Exists(path))
                Directory.Delete(path);
            Directory.CreateDirectory(path);
            if (!Directory.Exists(path))
                throw new Exception("Confirmed");
        }
    }
}
You can see that if all filesystem calls were synchronous (in .NET), this code should run without problems. Now, before running that code, create an empty directory at the specified path (preferably don't use an SSD for that) and open it with Windows Explorer. Then run the code. For me it either throws Confirmed (which exactly reproduces your issue) or throws on Directory.Delete saying that the directory does not exist (almost the same case). It does this 100% of the time for me.
Here is another piece of code which, when run on my machine, confirms that it's certainly possible for File.Exists to return true directly after a File.Delete call:
internal class Program {
    private static void Main(string[] args) {
        while (true) {
            const string path = @"G:\test_dir\test.txt";
            if (File.Exists(path))
                File.Delete(path);
            if (File.Exists(path))
                throw new Exception("Confirmed");
            File.Create(path).Dispose();
        }
    }
}
To do this, I opened the G:\test_dir folder and, during execution of this code, tried to open the constantly appearing and disappearing test.txt file. After a couple of tries, the Confirmed exception was thrown (while I didn't create or delete that file myself, and after the exception is thrown, the file is no longer present on the filesystem). So race conditions are possible in multiple cases, and my answer stands.
I wrote myself a little C# method for synchronous folder deletion using Directory.Delete(). Feel free to copy:
private bool DeleteDirectorySync(string directory, int timeoutInMilliseconds = 5000)
{
    if (!Directory.Exists(directory))
    {
        return true;
    }
    // watch the parent directory for the deletion of the target directory
    var watcher = new FileSystemWatcher
    {
        Path = Path.Combine(directory, ".."),
        NotifyFilter = NotifyFilters.DirectoryName,
        Filter = Path.GetFileName(directory), // match the directory name, not the full path
    };
    var task = Task.Run(() => watcher.WaitForChanged(WatcherChangeTypes.Deleted, timeoutInMilliseconds));
    // we must not start deleting before the watcher is running
    while (task.Status != TaskStatus.Running)
    {
        Thread.Sleep(100);
    }
    try
    {
        Directory.Delete(directory, true);
    }
    catch
    {
        return false;
    }
    return !task.Result.TimedOut;
}
Note that reading task.Result blocks the thread until the task has finished, while keeping this thread's CPU load idle. That is the point where the method becomes synchronous.
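Applied to the overwrite scenario from the question, usage might look like this (a sketch only; error handling is kept minimal and myDir is the variable from the question's code):
// Delete and recreate, waiting for the delete to actually complete first.
if (!DeleteDirectorySync(myDir))
{
    // the delete did not complete within the timeout; handle or log as appropriate
}
Directory.CreateDirectory(myDir);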
Sounds like a race condition to me. Not sure why - you did not provide enough details - but what you can do is wrap everything in a lock() statement and see if the problem goes away. This is certainly not a production-ready solution; it is only a quick way to check. If it is indeed a race condition, you need to rethink your approach to rewriting folders. Maybe create a "GUID" folder and, when done, update the DB with the most recent GUID to point to the most recent folder?..
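A minimal sketch of that GUID-folder idea (the method name is made up and the "DB update" is represented by a callback): every write goes into a fresh, uniquely named folder, so nothing ever deletes or overwrites a folder another request may still be reading.
// Write into a fresh folder named by a GUID, then record it as the current one.
string CreateNewVersionFolder(string baseDir, Action<string> recordCurrentFolder)
{
    string versionDir = Path.Combine(baseDir, Guid.NewGuid().ToString("N"));
    Directory.CreateDirectory(versionDir);

    // ... write this request's files into versionDir ...

    recordCurrentFolder(versionDir); // e.g. update the DB row that points at the latest folder
    return versionDir;
}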

How to delay the StreamWriter loop?

I'm trying to use this newbie-handy console "outputer". When I apply the loop to it, it either goes nuts producing huge amounts of lines in the output file, or doesn't work at all if I try to use a delay. Extremely simplified version of my issue:
C#:
using (StreamWriter writer = new StreamWriter("output.txt")) // path added so the sample compiles; StreamWriter has no parameterless constructor
{
    Console.SetOut(writer);
    while (true)
    {
        Console.WriteLine("1");
    }
}
I have tried almost every possible way to make a 1-second delay (timers, delays, actions, sleeps). Every option works fine until I apply the StreamWriter. Using the delay inside the loop does nothing: tons of lines of "1" in the output file. Using the delay outside the loop keeps the output file empty. Try/catch says nothing. Guys, where is the trouble? Maybe StreamWriter is not delay-compatible?
Your file is only guaranteed to be flushed and closed when the using block ends. Before that, it is up to your operating system's and disk's caching mechanisms whether anything is actually written to the file.
Your loop never ends. Your file might never get written. See to it that your loop can be terminated normally and the using block can close the file.
You can use Thread.Sleep and specify the timespan you want to sleep inside your loop. You also need to terminate the loop at some point, for example:
using (StreamWriter writer = new StreamWriter("c:\\temp\\temp.txt"))
{
    Console.SetOut(writer);
    for (int i = 0; i < 10; i++)
    {
        Thread.Sleep(TimeSpan.FromSeconds(1));
        Console.WriteLine("{0}: {1}", DateTime.Now.ToShortTimeString(), i);
    }
}

Check If File Is In Use By Other Instances of Executable Run

Before I go into too much detail: my program is written in Visual Studio 2010 using C# and .NET 4.0.
I wrote a program that generates a separate log file for each run. The log file is named after the time, accurate up to the millisecond (for example, 20130726103042375.log). The program will also generate a master log file for the day if it does not already exist (for example, 20130726_Master.log).
At the end of each run, I want to append the log file to the master log file. Is there a way to check if I can append successfully, and retry after sleeping for a second or so?
Basically, I have 1 executable, and multiple users (let's say there are 5 users).
All 5 users will access and run this executable at the same time. Since it's nearly impossible for all users to start at the exact same time (down to the millisecond), there will be no problem generating individual log files.
However, the issue comes in when I attempt to merge those log files into the master log file. Though it is unlikely, I think the program will crash if multiple users are appending to the same master log file.
The method I use is
File.AppendAllText(masterLogFile, File.ReadAllText(individualLogFile));
I have looked into the lock object, but I think it doesn't work in my case, as there are multiple instances running instead of multiple threads in one instance.
Another approach I looked into is try/catch, something like this:
try
{
stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
}
catch {}
But I don't think this solves the problem, because the status of the masterLogFile can change in that brief millisecond.
So my overall question is: is there a way to append to masterLogFile if it's not in use, and retry after a short timeout if it is? Or is there an alternative way to create the masterLogFile?
Thank you in advance, and sorry for the long message. I want to make sure I get my message across and explain what I've tried or looked into, so we are not wasting anyone's time.
Please let me know if there's anymore information I can provide to help you help me.
Your try/catch is the way to do this. If the call to File.Open succeeds, then you can write to the file. The idea is to keep the file open. I would suggest something like:
bool openSuccessful = false;
int attempts = 0;
const int maxAttempts = 10; // give up eventually instead of retrying forever
while (!openSuccessful && attempts < maxAttempts)
{
    attempts++;
    try
    {
        using (var writer = new StreamWriter(masterLogFile, true)) // append
        {
            // successfully opened file
            openSuccessful = true;
            try
            {
                foreach (var line in File.ReadLines(individualLogFile))
                {
                    writer.WriteLine(line);
                }
            }
            catch (IOException)
            {
                // something unexpected happened while writing.
                // handle the error and exit the loop.
                break;
            }
        }
    }
    catch (IOException)
    {
        // couldn't open the file.
        // If the exception is because it's opened in another process,
        // then delay and retry.
        // Otherwise let the exception propagate.
        Thread.Sleep(1000);
    }
}
if (!openSuccessful)
{
    // notify of error
}
So if you fail to open the file, you sleep and try again.
See my blog post, File.Exists is only a snapshot, for a little more detail.
I would do something along the lines of this, as I think it incurs the least overhead. Try/catch is going to generate a stack trace (which could take a whole second) if an exception is thrown. There has to be a better way to do this atomically still. If I find one I'll post it.
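The snippet this answer refers to is not included above. As one exception-free way to coordinate the appends across the five processes, a named system mutex can serialize access to the master log; a minimal sketch (the mutex name is made up, and masterLogFile/individualLogFile are the variables from the question):
// Hypothetical: serialize appends to the master log across processes with a named mutex.
using (var mutex = new Mutex(false, @"Global\MasterLogAppendMutex"))
{
    mutex.WaitOne(); // blocks (instead of throwing) while another instance holds the log
    try
    {
        File.AppendAllText(masterLogFile, File.ReadAllText(individualLogFile));
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}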
