FileStream using block not disposing of file properly when using CopyToAsync - C#

I have a situation where I need to asynchronously move a small list of files to another location on the network. I have the following method to do this, but it occasionally throws an IOException ("cannot access the file x because it is being used by another process") when trying to delete the source file. I expected the using block to take care of disposing the FileStreams for me, so I am not sure what is going on.
public static async Task MoveFileAsync(string sourceFile, string destinationFile)
{
    using (var sourceStream = new FileStream(sourceFile, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
    using (var destinationStream = new FileStream(destinationFile, FileMode.CreateNew, FileAccess.Write, FileShare.None, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
    {
        await sourceStream.CopyToAsync(destinationStream);
    }
    File.Delete(sourceFile);
}
I tried doing this with a File.Move in a Parallel.ForEach loop but found the above method was much quicker in my tests. Any pointers on what might be going on would be greatly appreciated.
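For reference, the Parallel.ForEach/File.Move alternative mentioned above might have looked roughly like this (a sketch; the `FileMover` class name and parameter names are hypothetical, not from the question):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FileMover
{
    // Sketch of the synchronous Parallel.ForEach alternative the question
    // mentions; "FileMover" and the parameter names are placeholders.
    public static void MoveFiles(string[] files, string destinationFolder)
    {
        Parallel.ForEach(files, file =>
        {
            var destination = Path.Combine(destinationFolder, Path.GetFileName(file));
            File.Move(file, destination); // each move blocks a thread-pool thread
        });
    }
}
```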

Related

why are the StreamReader functions, ReadLineAsync and ReadToEndAsync so slow on a file?

Does anyone know why ReadLineAsync and ReadToEndAsync are so much slower than their synchronous counterparts, ReadLine and ReadToEnd? I could understand the slowness if I was awaiting multiple calls, but that's not the case.
I'm using a Release build, and I'm not starting it with debugging.
I tested it on a 420MB CSV file, containing only the following line repeated:
1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
[etc...]
I tested it with the following program (results are in comments):
static void Main(string[] args)
{
    var sw = new Stopwatch();

    sw.Restart();
    One_ReadToEnd();
    Console.WriteLine($"One_ReadToEnd: {sw.Elapsed}"); // One_ReadToEnd: 00:00:06.1749275

    sw.Restart();
    One_ReadToEndAsync().GetAwaiter().GetResult();
    Console.WriteLine($"One_ReadToEndAsync: {sw.Elapsed}"); // One_ReadToEndAsync: 00:00:23.3265661

    sw.Restart();
    Many_ReadLine();
    Console.WriteLine($"Many_ReadLine: {sw.Elapsed}"); // Many_ReadLine: 00:00:05.9391718

    sw.Restart();
    Many_ReadLineAsync().GetAwaiter().GetResult();
    Console.WriteLine($"Many_ReadLineAsync: {sw.Elapsed}"); // Many_ReadLineAsync: 00:00:31.4988402
}
const string path = @"C:\Temp\test.csv";
static void One_ReadToEnd()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))
    {
        sr.ReadToEnd();
        sr.Close();
    }
}

static async Task One_ReadToEndAsync()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))
    {
        await sr.ReadToEndAsync();
        sr.Close();
    }
}

static void Many_ReadLine()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))
    {
        while (!sr.EndOfStream)
            sr.ReadLine();
        sr.Close();
    }
}

static async Task Many_ReadLineAsync()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))
    {
        while (!sr.EndOfStream)
            await sr.ReadLineAsync();
        sr.Close();
    }
}
Looking at your code, it's not apples-to-apples.
sync:
using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))
and async:
using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))
See the difference? (No, not the 4096 - that's the default buffer size.) It's FileOptions.Asynchronous. This does not make the StreamReader asynchronous, but opens the file in async mode - i.e. the file can be read or written asynchronously ("overlapped" in Windows-speak).
Normally this shouldn't make a difference, but who knows what layers of code are in there, so try without the FileStream options and see if that changes things.
the docs say:
using (StreamReader reader = File.OpenText("existingfile.txt"))
{
Console.WriteLine("Opened file.");
result = await reader.ReadToEndAsync();
Console.WriteLine("Contains: " + result);
}
in the ReadToEndAsync documentation, so it's not necessary to open the file overlapped to get your app running asynchronously. Or you can try opening the file in overlapped mode in your synchronous version.
Edit: I had a quick look at the source. The logic seems the same between the sync and async C# code, even though it accesses the streams via properties and is littered with GetAwaiter-type calls, but the async section has this comment. I can't find the referenced bug, but maybe this is causing the massive slowdown in accessing the file.
// Access to instance fields of MarshalByRefObject-derived types requires special JIT helpers that check
// if the instance operated on is remote. This is optimised for fields on this but if a method is Async
// and is thus lifted to a state machine type, access will be slow.
// As a workaround, we either cache instance fields in locals or use properties to access such fields.
// See Dev11 bug #370300 for more info.
This seems to be the GitHub issue "StreamReader.ReadLineAsync performance can be improved".
In the issue's thread there are benchmarks done with BenchmarkDotNet with a comment from a Microsoft dev below the results:
My observations:
.NET 6 is 12-13% faster on average. If IO was the limiting factor, I would expect the difference to be larger for async method.
Sync implementation is as twice as fast as async. This is expected, as async File IO has some non trivial overhead compared to sync. The main benefit of async File IO is improved scalability, not performance.
I had a quick look at the implementation and there is definitely place for improvement. I am going to change the issue title (there is no such thing as Stream.ReadLineAsync) and it's 2 (not 6) times slower and make it up-for-grabs.
The work to improve this was included in the System.IO work planned for .NET 7, but as of Jan 2023, with .NET 7.0.100 released, the issue is still open and the async versions still seem to be 2x slower than the non-async ones.

GZipStream compression not working

I'm trying to read in a file and compress it using GZipStream, like this:
using (var outStream = new MemoryStream())
{
    using (var fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
        {
            fileStream.CopyTo(gzipStream);
            Debug.WriteLine(
                "Compressed from {0} to {1} bytes",
                fileStream.Length,
                outStream.Length);
            // "outStream" is utilised here (persisted to a NoSql database).
        }
    }
}
The problem is that outStream.Length always shows 10 bytes. What am I doing wrong?
I've tried calling gzipStream.Close() after the fileStream.CopyTo line (as suggested in other forums) but this seems to close outStream too, so the subsequent code that uses it falls over.
MSDN says: The write operation might not occur immediately but is buffered until the buffer size is reached or until the Flush or Close method is called.
In other words, the fact that all the Write operations are done doesn't mean the data is already in the MemoryStream. You have to do gzipStream.Flush() or close the gzipStream first.
Example:
using (var outStream = new MemoryStream())
{
    using (var fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        // leaveOpen: true stops the GZipStream from closing outStream when disposed
        using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress, true))
        {
            fileStream.CopyTo(gzipStream);
        }
        Debug.WriteLine(
            "Compressed from {0} to {1} bytes",
            fileStream.Length,
            outStream.Length);
        // "outStream" is utilised here (persisted to a NoSql database).
    }
}
Also, ideally, put it outside of the FileStream as well - you want to close files as soon as you can, rather than waiting for some other processing to finish.
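Rearranged that way, a sketch might look like the following (the `CompressFile` wrapper name is hypothetical; `leaveOpen: true` keeps the MemoryStream usable after the GZipStream is disposed, since by default GZipStream closes the stream it wraps):

```csharp
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class Compressor
{
    // Hypothetical wrapper: compresses a file and returns the gzip bytes.
    public static byte[] CompressFile(string filename)
    {
        using (var outStream = new MemoryStream())
        {
            long originalLength;
            using (var fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read))
            using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress, leaveOpen: true))
            {
                originalLength = fileStream.Length;
                fileStream.CopyTo(gzipStream);
            } // the file and the gzip stream are both closed here

            Debug.WriteLine("Compressed from {0} to {1} bytes", originalLength, outStream.Length);
            // "outStream" is utilised here (persisted to a NoSql database).
            return outStream.ToArray();
        }
    }
}
```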

Read and Write to File at the same time

For an application that uses a file as a sort of global storage for device reservations in a firm, I need a way to read and write to a file (or lock a file, read from it, write to it, and unlock it). A little code snippet will show what I mean:
FileStream input = new FileStream("storage.bin", FileMode.Open);
// read the file
input.Close();
//!!!!!
// Here is the critical section: between reading and writing, there shouldn't
// be a way for another process to access and lock the file, but there is the
// chance, because the input stream is closed.
//!!!!!
FileStream output = new FileStream("storage.bin", FileMode.Create);
// write data to file
output.Close();
This should end up as something like this:
LockFile("storage.bin");
//read from it...
//OVERwrite it....
UnlockFile("storage.bin");
The method should be absolutely safe, since the program will run on 2000 devices at the same time.
Simply holding a FileStream open with exclusive (not shared) access will prevent other processes from accessing the file. This is the default when opening a file for read/write access.
You can 'overwrite' a file that you currently hold open by truncating it.
So:
using (var file = File.Open("storage.bin", FileMode.Open))
{
    // read from the file
    file.SetLength(0); // truncate the file
    // write to the file
}
the method should be absolutely safe, since the program should run on 2000 devices at the same time
Depending on how often you're writing to the file, this could become a chokepoint. You probably want to test this to see how scalable it is.
In addition, if one of the processes tries to operate on the file at the same time as another one, an IOException will be thrown. There isn't really a way to 'wait' on a file, so you probably want to coordinate file access in a more orderly fashion.
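One common way to serialise access across processes on the same machine is a named Mutex. A minimal sketch (the mutex name and the `ReservationFile` helper are hypothetical, and note this does NOT coordinate access across the network - 2000 devices hitting a network share would still need a different mechanism):

```csharp
using System;
using System.IO;
using System.Threading;

public static class ReservationFile
{
    // Hypothetical helper. A named mutex serialises access across processes
    // on ONE machine; it does not coordinate access across the network.
    private static readonly Mutex FileMutex = new Mutex(false, "StorageBinMutex");

    public static void WithExclusiveAccess(string path, Action<FileStream> action)
    {
        FileMutex.WaitOne(); // blocks until no other process holds the mutex
        try
        {
            using (var file = new FileStream(path, FileMode.OpenOrCreate,
                                             FileAccess.ReadWrite, FileShare.None))
            {
                action(file); // read, truncate, and rewrite inside the callback
            }
        }
        finally
        {
            FileMutex.ReleaseMutex();
        }
    }
}
```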
You need a single stream, opened for both reading and writing.
FileStream fileStream = new FileStream(
    @"c:\words.txt", FileMode.OpenOrCreate,
    FileAccess.ReadWrite, FileShare.None);
Alternatively you can also try
static void Main(string[] args)
{
    var text = File.ReadAllText(@"C:\words.txt");
    File.WriteAllText(@"C:\words.txt", text + "DERP");
}
As per http://msdn.microsoft.com/en-us/library/system.io.fileshare(v=vs.71).aspx
FileStream s2 = new FileStream(name, FileMode.Open, FileAccess.Read, FileShare.None);
You need to pass a FileShare enumeration value of None to one of the FileStream constructor overloads:
fs = new FileStream(@"C:\Users\Juan Luis\Desktop\corte.txt", FileMode.Open,
    FileAccess.ReadWrite, FileShare.None);
I ended up writing this helper class to do this:
public static class FileHelper
{
    public static void ReplaceFileContents(string fileName, Func<string, string> replacementFunction)
    {
        using (FileStream fileStream = new FileStream(
            fileName, FileMode.OpenOrCreate,
            FileAccess.ReadWrite, FileShare.None))
        {
            StreamReader streamReader = new StreamReader(fileStream);
            string currentContents = streamReader.ReadToEnd();
            var newContents = replacementFunction(currentContents);
            fileStream.SetLength(0);
            StreamWriter writer = new StreamWriter(fileStream);
            writer.Write(newContents);
            writer.Close();
        }
    }
}
This allows you to pass a function that takes the existing contents and generates the new contents, and it ensures the file is not read or modified by anything else while the change is happening.
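For example, a usage sketch (it relies on the FileHelper class above; the file path and the counter logic are just an illustration):

```csharp
// Uses the FileHelper class defined above; path and logic are illustrative.
FileHelper.ReplaceFileContents(@"C:\temp\counter.txt", current =>
{
    // Increment a counter stored as text; an empty (new) file starts at 0.
    int value = string.IsNullOrEmpty(current) ? 0 : int.Parse(current);
    return (value + 1).ToString();
});
```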
You are likely looking for FileStream.Lock and FileStream.Unlock
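Note that FileStream.Lock takes a byte range (position, length) rather than locking the whole handle. A sketch, assuming the file is opened with a share mode that still permits other opens (the `RangeLockExample` name is hypothetical; byte-range locking is Windows-centric and may throw PlatformNotSupportedException elsewhere):

```csharp
using System;
using System.IO;

public static class RangeLockExample
{
    // Sketch: other processes can still open the file, but reads/writes that
    // touch the locked byte range fail with an IOException until Unlock.
    public static void UpdateUnderRangeLock(string path)
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate,
                                           FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            long length = Math.Max(stream.Length, 1);
            stream.Lock(0, length);
            try
            {
                // read and rewrite the locked range here
            }
            finally
            {
                stream.Unlock(0, length);
            }
        }
    }
}
```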
I think you just need to use the FileShare.None flag in the overloaded Open method.
file = File.Open("storage.bin", FileMode.Open, FileShare.None);

Async FileStream Writes "NUL" into file

I am using this code to write asynchronously to a file
public static void AsyncWrite(string file, string text)
{
    try
    {
        byte[] data = Encoding.Unicode.GetBytes(text);
        using (FileStream fs = new FileStream(file, FileMode.Create,
            FileAccess.Write, FileShare.Read, 1, true))
            fs.BeginWrite(data, 0, data.Length, null, null);
    }
    catch
    {
    }
}
For some reason, from time to time, rather than writing the text into the file as expected, Notepad++ shows the file filled with NUL characters.
BeginWrite is asynchronous, so it may well happen that the stream is closed by the using statement while the write is still in progress.
I'd not use using when doing asynchronous writing. Instead I'd create a proper callback method and close the stream there. This would also give you the chance to call EndWrite as recommended.
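That might look roughly like this (a sketch; the callback calls EndWrite to complete the write and surface any error, and only then disposes the stream):

```csharp
using System;
using System.IO;
using System.Text;

public static class AsyncWriter
{
    // Sketch: keep the stream alive until the callback runs, then call
    // EndWrite and dispose the stream there instead of via "using".
    public static void AsyncWrite(string file, string text)
    {
        byte[] data = Encoding.Unicode.GetBytes(text);
        var fs = new FileStream(file, FileMode.Create, FileAccess.Write,
                                FileShare.Read, 4096, useAsync: true);
        fs.BeginWrite(data, 0, data.Length, ar =>
        {
            var stream = (FileStream)ar.AsyncState;
            try
            {
                stream.EndWrite(ar); // completes the write and surfaces any error
            }
            finally
            {
                stream.Dispose(); // close only after the write has finished
            }
        }, fs);
    }
}
```

On current .NET, `await fs.WriteAsync(...)` inside a using block is the simpler equivalent and avoids the Begin/End pattern entirely.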

What's the least invasive way to read a locked file in C# (perhaps in unsafe mode)?

I need to read a Windows file that may be locked, but I don't want to create any kind of lock that would prevent other processes from writing to the file.
In addition, even if the file is locked for exclusive use, I'd like to see what's inside.
Although this isn't my exact use case, consider how to read a SQL/Exchange log or database file while it's in use and mounted. I don't want to cause corruption but I still want to see the insides of the file and read it.
You can do it without copying the file; see this article:
The trick is to use FileShare.ReadWrite (from the article):
private void LoadFile()
{
    try
    {
        using (FileStream fileStream = new FileStream(
            "logs/myapp.log",
            FileMode.Open,
            FileAccess.Read,
            FileShare.ReadWrite))
        {
            using (StreamReader streamReader = new StreamReader(fileStream))
            {
                this.textBoxLogs.Text = streamReader.ReadToEnd();
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error loading log file: " + ex.Message);
    }
}
The accepted answer is not correct. If the file is really locked, you cannot just change the file share. That would only work if the lock had been set with a compatible FileShare option, which is not necessarily the case. In fact, you can test @CaffGeek's solution pretty easily by opening the file without FileShare.ReadWrite and then trying to open it with that flag: you will get an exception saying the file is being used by another process.
Code:
string content;
var filePath = "e:\\test.txt";

// Lock the file exclusively
var r = File.Open(filePath, FileMode.Open, FileAccess.Write, FileShare.Write);

// CaffGeek's solution
using (FileStream fileStream = new FileStream(
    filePath,
    FileMode.Open,
    FileAccess.Read,
    FileShare.ReadWrite))
{
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        content = streamReader.ReadToEnd();
    }
}
As you can see, it crashes. The result is the same with any FileStream method, such as File.Open: it will crash whatever you pass for FileShare during the open stage.
//OPEN FOR WRITE with exclusive
var r = File.Open(filePath, FileMode.Open, FileAccess.Write, FileShare.Write);
//OPEN FOR READ with file share that allow read and write
var x = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite); //Crash
Copying the file is not an option either. You can try it yourself by opening the file exclusively and then trying to copy the file in Windows Explorer or by code:
var filePath = "e:\\test.txt";
var filePathCopy = "e:\\test.txt.bck";

// Lock the file
var r = File.Open(filePath, FileMode.Open, FileAccess.Write, FileShare.Write);

File.Copy(filePath, filePathCopy);
var x = File.Open(filePathCopy, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (var reader = new StreamReader(x))
{
    content = reader.ReadToEnd();
}
r.Close();
File.Delete(filePathCopy);
This code crashes when you hit the File.Copy line. The exception is the same as before: the file is being used by another process.
You need to kill the process that holds the lock on the file if you want to read it, OR, if you have the source code of the program that is locking the file, change it to use FileShare.ReadWrite instead of just FileShare.Write.
You can probably create a copy and read that, even if the file is locked.
Or maybe use a StreamReader on a FileStream, depending on how SQL opened the file:
new FileStream(@"c:\myfile.ext", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
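Wrapped in a StreamReader, that might look like this sketch (the `LockedFileReader` name is hypothetical; whether it succeeds still depends on the share mode the writing process used when it opened the file):

```csharp
using System.IO;

public static class LockedFileReader
{
    // Sketch: open with FileShare.ReadWrite so we neither require nor take
    // an exclusive lock; this can still fail if the writer opened the file
    // with share flags that forbid concurrent readers.
    public static string ReadSharedFile(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open,
                                           FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```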
