Write to an open FileStream using reactive programming - C#

I am writing a small logger and I want to open the log file once, keep writing reactively as log messages arrive, and dispose of everything on program termination.
I am not sure how I can keep the FileStream open and reactively write the messages as they arrive.
I would like to update the design from my previous solution where I had a ConcurrentQueue acting as a buffer, and a loop inside the using statements that consumed the queue.
Specifically, I want to take advantage of both the using statement construct, so I don't have to explicitly close the stream and writer, and the reactive, loopless programming style. Currently I only know how to use one of these at a time: either the using/loop combination, or the explicit-stream-close/reactive combination.
Here's my code:
BufferBlock<LogEntry> _buffer = new BufferBlock<LogEntry>();
// CONSTRUCTOR
public DefaultLogger(string folder)
{
_filePath = Path.Combine(folder, $"{DateTime.Now:yyyy.MM.dd}.log");
_cancellation = new CancellationTokenSource();
var observable = _buffer.AsObservable();
using (var stream = File.Create(_filePath))
using (var writer = new StreamWriter(stream))
using (var subscription = observable.Subscribe(entry =>
writer.Write(GetFormattedString(entry))))
{
while (!_cancellation.IsCancellationRequested)
{
// what do I do here?
}
}
}

You need to use Observable.Using. It's designed to create an IDisposable resource that gets disposed when the sequence ends.
Try something like this:
IDisposable subscription =
Observable.Using(() => File.Create(_filePath),
stream => Observable.Using(() => new StreamWriter(stream),
writer => _buffer.AsObservable().Select(entry => new { entry, writer })))
.Subscribe(x => x.writer.Write(GetFormattedString(x.entry)));
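The question also mentions disposing everything on program termination; a minimal sketch of one way to do that, assuming the logger stores the subscription in a _subscription field and implements IDisposable (neither is shown in the original):
public void Dispose()
{
    // Completing the BufferBlock ends the observable sequence, which makes
    // Observable.Using dispose the StreamWriter and FileStream it created.
    _buffer.Complete();

    // Also drop the subscription in case the sequence never completed.
    _subscription?.Dispose();
}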

Related

Dequeue a collection to write to disk until no more items left in collection

So, I have a list of file shares.
I then need to obtain ALL folders in these file shares. This is all the "easy" stuff I have done.
Ultimately after some logic, I am adding an object into a collection.
While all of this is happening in the background using async/await and Tasks, I want another thread/task to spin up so it can keep going through the collection and writing data to disk.
Now, for each folder, I obtain security information about it. There will be at LEAST one item, and for each folder I add each of those items into a collection.
I want to keep writing to disk in the background until there are no more folders to iterate through and the job is complete.
I was thinking of using a BlockingCollection; however, this code smells and ultimately never closes the file because of the while (true) statement.
private static BlockingCollection<DirectorySecurityInformation> AllSecurityItemsToWrite = new BlockingCollection<DirectorySecurityInformation>();
if (sharesResults.Count > 0)
{
WriteCSVHeader();
// setup a background task which will dequeue items to write.
var csvBGTask = Task.Run(async () =>
{
using (var sw = new StreamWriter(FileName, true))
{
sw.AutoFlush = true;
while (true)
{
var dsi = AllSecurityItemsToWrite.Take();
await sw.WriteLineAsync("... blah blah blah...");
await sw.FlushAsync();
}
}
});
allTasks.Add(csvBGTask);
}
foreach(var currentShare in AllShares)
{
var dirs = Directory.EnumerateDirectories(currentShare.FullName, "*", SearchOption.AllDirectories);
foreach (var currentDir in dirs)
{
    // Spin up a task in the background to run the security analysis
    // and add the result to the AllSecurityItemsToWrite collection
}
}
This is at its simplest but core example.
Any ideas? I just want to keep adding items from the background tasks and have another task dequeue and write to disk until there are no more shares to go through (sharesResults).
I recommend using Channel<T>.
Channel<DirectorySecurityInformation> ch =
Channel.CreateUnbounded<DirectorySecurityInformation>();
Write
var w = ch.Writer;
foreach(var dsi in DSIs)
w.TryWrite(dsi);
w.TryComplete();
Read
public async Task ReadTask()
{
    var r = ch.Reader;
    using (var sw = new StreamWriter(filename, true))
    {
        // Completes once the writer calls TryComplete() and the channel is drained.
        await foreach (var dsi in r.ReadAllAsync())
            await sw.WriteLineAsync(dsi.ToString());
    }
}
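One possible way to wire the two sides together, inside an async method (a sketch; producerTasks is a hypothetical list of the folder-analysis tasks):
// Start the reader first so it drains the channel while the producers write.
var readerTask = ReadTask();

// ... producers call ch.Writer.TryWrite(dsi) as items become available ...

await Task.WhenAll(producerTasks); // hypothetical collection of producer tasks
ch.Writer.TryComplete();           // signal that no more items will arrive
await readerTask;                  // ReadAllAsync ends and the file is closed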
while (true)
{
var dsi = AllSecurityItemsToWrite.Take();
//...
}
Instead of using the Take method, it's generally more convenient to consume a BlockingCollection<T> with the GetConsumingEnumerable method:
foreach (var dsi in AllSecurityItemsToWrite.GetConsumingEnumerable())
{
//...
}
This way the loop will stop automatically when the CompleteAdding method is called, and the collection is empty.
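For completeness, the producer side of that pattern could look roughly like this, inside an async method (analysisTasks is a hypothetical list holding only the folder-analysis tasks, kept separate from csvBGTask):
// Once every folder-analysis task has finished, signal completion so that
// GetConsumingEnumerable stops yielding after the queue has been drained.
await Task.WhenAll(analysisTasks);
AllSecurityItemsToWrite.CompleteAdding();
await csvBGTask; // the consumer's using block then flushes and closes the file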
But I agree with shingo that the BlockingCollection<T> is not the correct tool in this case, because your workers are running in an asynchronous context. A Channel<T> should be preferable, because it can be consumed without blocking a thread.

Proper way to use DisposeAsync on C# streams

I'm writing a method which asynchronously writes separate lines of text to a file. If it's cancelled, it deletes the created file and jumps out of the loop.
This is the simplified code, which works fine... and I marked two points that I'm not sure how they are being handled. I want the code to not block the thread in any case.
public async Task<IErrorResult> WriteToFileAsync(string filePath,
CancellationToken cancellationToken)
{
cancellationToken.ThrowIfCancellationRequested();
using var stream = new FileStream(filePath, FileMode.Create);
using var writer = new StreamWriter(stream, Encoding.UTF8);
foreach (var line in Lines)
{
if (cancellationToken.IsCancellationRequested)
{
//
// [1] close, delete and throw if cancelled
//
writer.Close();
stream.Close();
if (File.Exists(filePath))
File.Delete(filePath);
throw new OperationCanceledException();
}
// write to the stream
await writer.WriteLineAsync(line.ToString());
}
//
// [2] flush and let them dispose
//
await writer.FlushAsync();
await stream.FlushAsync();
// await stream.DisposeAsync(); ??????
return null;
}
1. I'm calling Close() on the FileStream and StreamWriter, and I think it will run synchronously and block the thread. How can I improve this? I don't want to wait for the buffer to be flushed to the file just to delete the file afterwards.
2. I suppose the Dispose method, and not DisposeAsync, will be called at the end of the using scope (is this assumption correct?).
Dispose blocks the thread, so to mitigate that I'm flushing first with FlushAsync so that Dispose has less to do (to what extent is this true?).
I could also remove the using statements and call DisposeAsync manually in these two places instead, but that would hurt readability.
If I open the FileStream with useAsync = true, would it automatically call DisposeAsync when the using block ends?
Any explanation, or a variation of the above code that performs better, is appreciated.
As you have it, the using statement will call Dispose(), not DisposeAsync().
C# 8 brought a new await using syntax, but for some reason it's not mentioned in the What's new in C# 8.0 article.
But it's mentioned elsewhere.
await using var stream = new FileStream(filePath, FileMode.Create);
await using var writer = new StreamWriter(stream, Encoding.UTF8);
But also note that this will only work if:
you're using .NET Core 3.0+, since that's when IAsyncDisposable was introduced, or
you install the Microsoft.Bcl.AsyncInterfaces NuGet package, although that only adds the interfaces and doesn't include the versions of the Stream types (FileStream, StreamWriter, etc.) that use it.
Even in the Announcing .NET Core 3.0 article, IAsyncDisposable is only mentioned in passing and never expanded on.
On another note, you don't need to do this (I see why now):
writer.Close();
stream.Close();
Since the documentation for Close says:
This method calls Dispose, specifying true to release all resources. You do not have to specifically call the Close method. Instead, ensure that every Stream object is properly disposed.
Since you're using using, Dispose() (or DisposeAsync()) will be called automatically and Close won't do anything that's not already happening.
So if you do need to specifically close the file, but want to do it asynchronously, just call DisposeAsync() instead. It does the same thing.
await writer.DisposeAsync();
public async Task<IErrorResult> WriteToFileAsync(string filePath,
CancellationToken cancellationToken)
{
cancellationToken.ThrowIfCancellationRequested();
await using var stream = new FileStream(filePath, FileMode.Create);
await using var writer = new StreamWriter(stream, Encoding.UTF8);
foreach (var line in Lines)
{
if (cancellationToken.IsCancellationRequested)
{
// not possible to discard, FlushAsync is covered in DisposeAsync
await writer.DisposeAsync(); // use DisposeAsync instead of Close to not block
if (File.Exists(filePath))
File.Delete(filePath);
throw new OperationCanceledException();
}
// write to the stream
await writer.WriteLineAsync(line.ToString());
}
// FlushAsync is covered in DisposeAsync
return null;
}
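A minimal usage sketch, just to show the shape of a call (the caller and file path here are hypothetical):
// Cancel automatically after 30 seconds; WriteToFileAsync deletes any partially
// written file before throwing OperationCanceledException.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
try
{
    var error = await fileWriter.WriteToFileAsync(@"C:\temp\output.txt", cts.Token);
}
catch (OperationCanceledException)
{
    // The operation was cancelled and the partial file has already been removed.
}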

c# streamWriter WriteLineAsync without await

I want to write lines to a StreamWriter asynchronously, but I don't want to await each call.
for(int i= 0 ;i<1000;i++)
{
sw.WriteLineAsync(i.ToString());
}
But I got an error saying that WriteLineAsync was invoked while a previous write was still in progress.
What can I do to fix that?
I want to close this StreamWriter after this loop.
How can I make sure I don't close it before all the data has been written? Or, when I close the stream, will all the data sent with WriteLineAsync be written before the stream closes?
If you want to write the lines asynchronously, you should use await. Otherwise the file might get corrupted, and you might run into stream errors like "the stream is already in use". In short, you should synchronize the write operations.
So, here is an example:
private async Task WriteToFileAsAsync()
{
string file = @"sample.txt";
using (FileStream stream = new FileStream(file, FileMode.Create, FileAccess.ReadWrite))
{
using (StreamWriter streamWriter = new StreamWriter(stream))
{
for (int i = 0; i < 1000; i++)
{
await streamWriter.WriteLineAsync(i.ToString());
}
}
}
}
Also, the using blocks take care of closing and disposing the StreamWriter for you.
EDIT
If you want to perform the write action separately from the main thread, don't use async methods; just create a separate Task and let it run on another thread.
private void WriteToFile()
{
string file = @"sample.txt";
using (FileStream stream = new FileStream(file, FileMode.Create, FileAccess.ReadWrite))
{
using (StreamWriter streamWriter = new StreamWriter(stream))
{
for (int i = 0; i < 1000; i++)
{
streamWriter.WriteLine(i.ToString());
}
}
}
}
Then call it like this:
Task.Factory.StartNew(WriteToFile);
In your particular context there are two main issues in doing what you would like to do.
Technically speaking, if you call an async method, you need to await it sooner or later; you can collect the task and await it later on. However, the WriteLineAsync method is not atomic, so calling it while performing other operations on the stream can corrupt the stream itself.
If you don't want to await, then don't call the Async method.
If you want the stream to be closed after use, wrap it in a using statement.
using (StreamWriter writer = new StreamWriter("temp.txt"))
{
for (int i = 0; i < 1000; i++)
{
writer.WriteLine(i.ToString());
}
}
If you want it to be asynchronous, then you have to await the Async version; otherwise you risk corrupting the file.

Downloading multiple files fast and efficiently (async)

I have a lot of files to download, so I am trying to use the power of the new async features, as below.
var streamTasks = urls.Select(async url => (await WebRequest.CreateHttp(url).GetResponseAsync()).GetResponseStream()).ToList();
var streams = await Task.WhenAll(streamTasks);
foreach (var stream in streams)
{
using (var fileStream = new FileStream("blabla", FileMode.Create))
{
await stream.CopyToAsync(fileStream);
}
}
What I am afraid of is that this code will cause big memory usage: if there are 1000 files of 2 MB each, will this code load 1000 * 2 MB of streams into memory?
I may be missing something, or I may be totally right. If I am not missing something, is it better to await every request and consume each stream one at a time?
Both options could be problematic. Downloading only one at a time doesn't scale and takes time, while downloading all files at once could be too much of a load (also, there is no need to wait for all of them to download before you process them).
I prefer to always cap such operations at a configurable size. A simple way to do so is to use an AsyncLock (which utilizes SemaphoreSlim). A more robust way is to use TPL Dataflow with a MaxDegreeOfParallelism:
var block = new ActionBlock<string>(async url =>
{
    // Download the response and copy it straight to a file on disk.
    var stream = (await WebRequest.CreateHttp(url).GetResponseAsync()).GetResponseStream();
    using (var fileStream = new FileStream("blabla", FileMode.Create))
    {
        await stream.CopyToAsync(fileStream);
    }
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });
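Feeding the block and waiting for it to drain could then look like this (a sketch using the same urls collection as in the question):
foreach (var url in urls)
    block.Post(url);    // queue each download

block.Complete();       // tell the block that no more items are coming
await block.Completion; // completes once every queued download has been written to disk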
Your code will load the stream into memory whether you use async or not. Doing async work handles the I/O part by returning to the caller until your ResponseStream returns.
The choice you have to make doesn't concern async, but rather how your program handles reading a big stream input.
If I were you, I would think about how to split the workload into chunks. You might read the ResponseStreams in parallel and save each stream to a different destination (possibly a file), then release it from memory.
This is my own answer, building on the chunking idea from Yuval Itzchakov, and here is my implementation. Please provide feedback on it.
foreach (var chunk in urls.Batch(5))
{
var streamTasks = chunk
.Select(async url => await WebRequest.CreateHttp(url).GetResponseAsync())
.Select(async response => (await response).GetResponseStream());
var streams = await Task.WhenAll(streamTasks);
foreach (var stream in streams)
{
using (var fileStream = new FileStream("blabla", FileMode.Create))
{
await stream.CopyToAsync(fileStream);
}
}
}
Batch is an extension method, which is simply the following:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}

StreamWriter Creates Zero-Byte File

I have a Task that reads strings from a blocking collection and is supposed to write them out to a file. Trouble is, while the file is created, the size of the file is 0 bytes after the task completes.
While debugging, I see that non-empty lines are retrieved from the blocking collection, and the stream writer is wrapped in a using block.
For debugging, I threw in a flush that should not be required and wrote the lines to the console. There are 100 non-empty lines of text read from the blocking collection.
// Stuff is placed in writeQueue from a different task
BlockingCollection<string> writeQueue = new BlockingCollection<string>();
Task writer = Task.Factory.StartNew(() =>
{
try
{
while (true)
{
using (FileStream fsOut = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
using (BufferedStream bsOut = new BufferedStream(fsOut))
using (StreamWriter sw = new StreamWriter(bsOut))
{
string line = writeQueue.Take();
Console.WriteLine(line); // Stuff is written to the console
sw.WriteLine(line);
sw.Flush(); // Just in case, makes no difference
}
}
}
catch (InvalidOperationException)
{
// We're done.
}
});
Stepping through in the debugger, I see that the program terminates in an orderly manner. There are no unhandled exceptions.
What might be going wrong here?
You are re-creating the file on every iteration of the loop. Change FileMode.Create to FileMode.Append and it will keep the values you wrote previously.
Also, using exceptions to detect when you should stop is really bad practice. If this is a producer-consumer solution, you can easily do better by having the producer set a thread-safe flag signaling that it has finished its work and will not produce anything else.
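A more conventional alternative to a flag, echoing the GetConsumingEnumerable pattern from the earlier question above, is to keep the using block outside the loop and have the producer call CompleteAdding; a sketch (not the asker's exact code):
Task writer = Task.Run(() =>
{
    using (var fsOut = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
    using (var sw = new StreamWriter(fsOut))
    {
        // Ends cleanly once CompleteAdding() has been called and the queue is drained,
        // so the using blocks flush and close the file exactly once.
        foreach (string line in writeQueue.GetConsumingEnumerable())
            sw.WriteLine(line);
    }
});

// Producer side, once everything has been queued:
writeQueue.CompleteAdding();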
