How to guarantee writes to a file in multithreaded code when exceptions occur? - c#

This is a simplified example:
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Thread(() => Method1()).Start();
            new Thread(() => Method2()).Start();
            Console.Read();
        }

        private static void Method1()
        {
            using (StreamWriter sw = new StreamWriter(@"h:\data.txt"))
            {
                int i = 100000000;
                while (true)
                {
                    Thread.Sleep(100);
                    sw.WriteLine(i);
                    i++;
                }
            }
        }

        private static void Method2()
        {
            Thread.Sleep(6000);
            throw null;
        }
    }
}
StreamWriter doesn't write the data into the file if the exception occurs too early and in another thread. The file data.txt is empty at the time the exception occurs.
I played with this situation a little and found a bunch of workarounds:
If I increase the sleep interval for the exception's thread (and decrease the interval between writes to the file), the file will be filled with data. That is not an option, because I don't know when an exception will occur.
As a consequence of the previous workaround, I can decrease the buffer size of the StreamWriter. But that doesn't seem to work if I set it too small - for example, this code
FileStream fs = new FileStream(@"h:\data.txt", FileMode.Create);
using (StreamWriter sw = new StreamWriter(fs, Encoding.Default, 10))
doesn't work, because the first write to the file happens only when about 385 integers are already waiting in the buffer.
The file will be filled if I close the writer before the exception occurs. But that is not a good option either - I have to write to the file 1 to 10 times per second, and it is not a good idea to open and close the writer that frequently, is it?
I can catch the exception like this
private static void Method2()
{
    try
    {
        Thread.Sleep(6000);
        throw null;
    }
    catch
    {
        Console.WriteLine("Exception!");
    }
}
and all will be OK - no application termination, and the file will be filled batch by batch. But that is not an option either - I can't control when and where exceptions occur. I try to use try-catch everywhere, but I can miss something.
So the situation is: the StreamWriter's buffer is not full, an exception has occurred in another thread and is not caught, so the application will be terminated. How can I avoid losing this data and make sure it gets written to the file?

As I understand your situation, you are assuming that there is a bug somewhere and the process might be terminated at any time, and you want to save as much data as possible.
You should be able to call Flush on the StreamWriter. This will push the data to the OS. If your process terminates, the data will eventually be written by the OS.
In case you cannot convince StreamWriter to actually flush for some reason, you can use a FileStream and write to that (pseudocode: fileStream.Write(Encoding.GetBytes(myString))). You can then flush the FileStream or use a buffer size of 1.
Of course it's best if you prevent the process from being terminated in the first place. This is usually straightforward with Task as opposed to using raw Threads.
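For the question's example, a minimal sketch of Method1 with an explicit Flush after each write could look like this (flushing 10 times per second is cheap at this data volume; treat it as a sketch of the suggestion above, not tested code):
private static void Method1()
{
    using (StreamWriter sw = new StreamWriter(@"h:\data.txt"))
    {
        int i = 100000000;
        while (true)
        {
            Thread.Sleep(100);
            sw.WriteLine(i);
            sw.Flush(); // push the line from the StreamWriter's buffer to the OS immediately
            i++;
        }
    }
}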

Flushing the stream will ensure that all of its content is pushed into its underlying file.
That will take care of ensuring all of the data is saved after you complete an operation, and a subsequent exception will not make your application lose data.

Related

How to periodically flush c# FileStream to the disk?

Context:
I am implementing a logging mechanism for a Web API project that writes serialized objects to a file from multiple methods, which in turn is read by an external process (nxLog, to be more accurate). The application is hosted on IIS and uses 18 worker processes. The app pool is recycled once a day. The expected load on the services that will incorporate the logging methods is 10,000 req/s. In short, this is a classic producer/consumer problem with multiple producers (the methods that produce logs) and one consumer (the external process that reads from the log files). Update: Each process uses multiple threads as well.
I used BlockingCollection to store data (and solve the race condition) and a long running task that writes the data from the collection to the disk.
To write to the disk I am using a StreamWriter and a FileStream.
Because the write frequency is almost constant (as I said, 10,000 writes per second), I decided to keep the streams open for the entire lifetime of the application pool and periodically write logs to the disk. I rely on the App Pool recycle and my DI framework to dispose my logger daily. Also note that this class will be a singleton, because I didn't want to have more than one thread dedicated to writing from my thread pool.
Apparently the FileStream object will not write to the disk until it is disposed. Now I don't want the FileStream to wait an entire day until it writes to the disk. The memory required to hold all those serialized objects would be tremendous, not to mention that any crash of the application or the server would cause data loss or a corrupted file.
Now my question:
How can I have the underlying streams (FileStream and StreamWriter) write to the disk periodically without disposing them? My initial assumption was that it would write to the disk once the FileStream exceeds its buffer size, which is 4K by default.
UPDATE: The inconsistencies mentioned in the answer have been fixed.
Code:
public class EventLogger : IDisposable, ILogger
{
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private FileStream _fs;
    private StreamWriter _sw;
    private string _logFilePath;

    public EventLogger()
    {
        OpenFile();
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    private void OpenFile()
    {
        _fs?.Dispose();
        _sw?.Dispose();
        _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs);
    }

    public void Dispose()
    {
        _queue?.CompleteAdding();
        _consumerTask?.Wait();
        _sw?.Dispose();
        _fs?.Dispose();
        _queue?.Dispose();
    }

    public void Log(List<string> list)
    {
        try
        {
            _queue.TryAdd(list, 100);
        }
        catch (Exception e)
        {
            LogError(LogLevel.Error, e);
        }
    }

    private void Write()
    {
        foreach (List<string> items in _queue.GetConsumingEnumerable())
        {
            items.ForEach(item =>
            {
                _sw?.WriteLine(item);
            });
        }
    }
}
There are a few "inconsistencies" with your question.
The application is hosted on IIS and uses 18 worker processes

_logFilePath = $"D:\Log\log{DateTime.Now.ToString(yyyyMMdd)}{System.Diagnostic.Process.GetCurrentProcess().Id}.txt";

writes serialized objects to a file from multiple methods
Putting all of this together, you seem to have a single threaded situation as opposed to a multi-threaded one. And since there is a separate log per process, there is no contention problem or need for synchronization. What I mean to say is, I don't see why the BlockingCollection is needed at all. It's possible that you forgot to mention that there are multiple threads within your web process. I will make that assumption here.
Another problem is that your code does not compile:
the class name is Logger but the EventLogger function looks like a constructor.
some more incorrect syntax with strings, etc.
Putting all that aside, if you really have a contention situation and want to write to the same log via multiple threads or processes, your class seems to have most of what you need. I have modified your class to do a few more things. The chief items to note are below:
Fixed all the syntax errors, making assumptions where needed
Added a timer which calls Flush periodically. This needs a lock object so that it does not interrupt a write operation
Used an explicit buffer size in the StreamWriter constructor. You should heuristically determine what size works best for you. Also, you should disable AutoFlush from StreamWriter so you can have your writes hit the buffer instead of the file, providing better performance.
Below is the code with the changes
public class EventLogger : IDisposable, ILogger {
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private FileStream _fs;
    private StreamWriter _sw;
    private System.Timers.Timer _timer;
    private object streamLock = new object();

    private const int MAX_BUFFER = 16 * 1024;      // 16K
    private const int FLUSH_INTERVAL = 10 * 1000;  // 10 seconds

    public EventLogger() {
        OpenFile();
        SetupFlushTimer(); // start the periodic flush
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    void SetupFlushTimer() {
        _timer = new System.Timers.Timer(FLUSH_INTERVAL);
        _timer.AutoReset = true;
        _timer.Elapsed += TimedFlush;
        _timer.Start();
    }

    void TimedFlush(Object source, System.Timers.ElapsedEventArgs e) {
        lock (streamLock) {
            _sw?.Flush();
        }
    }

    private void OpenFile() {
        _fs?.Dispose();
        _sw?.Dispose();
        var _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs, Encoding.Default, MAX_BUFFER); // TODO: use the correct encoding here
        _sw.AutoFlush = false;
    }

    public void Dispose() {
        _timer.Elapsed -= TimedFlush;
        _timer.Dispose();
        _queue?.CompleteAdding();
        _consumerTask?.Wait();
        _sw?.Dispose();
        _fs?.Dispose();
        _queue?.Dispose();
    }

    public void Log(List<string> list) {
        try {
            _queue.TryAdd(list, 100);
        } catch (Exception e) {
            LogError(LogLevel.Error, e);
        }
    }

    private void Write() {
        foreach (List<string> items in _queue.GetConsumingEnumerable()) {
            lock (streamLock) {
                items.ForEach(item => {
                    _sw?.WriteLine(item);
                });
            }
        }
    }
}
EDIT:
There are 4 factors controlling the performance of this mechanism, and it is important to understand their relationship. The example below will hopefully make it clear.
Let's say
average size of List<string> is 50 Bytes
Calls/sec is 10,000
MAX_BUFFER is 1024 * 1024 Bytes (1 Meg)
You are producing 500,000 bytes of data per second, so a 1 Meg buffer can hold only about 2 seconds' worth of data. In other words, even if FLUSH_INTERVAL is set to 10 seconds, the buffer will AutoFlush every 2 seconds (on average) when it runs out of buffer space.
Also remember that increasing the MAX_BUFFER blindly will not help, since the actual flush operation will take longer due to the bigger buffer size.
The main thing to understand is that when there is a difference between the incoming data rate (into your EventLogger class) and the outgoing data rate (to the disk), you will either need an infinitely sized buffer (assuming a continuously running process) or you will have to slow down your average incoming rate to match the average outgoing rate.
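As a rough sanity check, here is the same arithmetic in code, using the assumed numbers from the example above:
const int avgEntryBytes = 50;        // assumed average size of one entry
const int callsPerSecond = 10000;    // assumed incoming rate
const int maxBuffer = 1024 * 1024;   // 1 Meg buffer

double bytesPerSecond = avgEntryBytes * callsPerSecond;        // 500,000 B/s
double secondsUntilFlush = (double)maxBuffer / bytesPerSecond; // ~2.1 s, well under FLUSH_INTERVAL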
Maybe my answer won't address your concrete concern, but I believe that your scenario could be a good use case for memory-mapped files.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
This could be very interesting because you'll be able to do logging from different processes (i.e. IIS worker processes) without locking issues. See MemoryMappedFile.OpenExisting method.
Also, you can log to a non-persistent shared memory-mapped file and, using a task scheduler or a Windows service, you can take pending logs to their final destination using a persistable memory-mapped file.
I see a lot of potential in using this approach because of your multi-/inter-process scenario.
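To make the idea concrete, here is a minimal sketch of writing to a persisted memory-mapped file that another process can open by name. The file path, map name and fixed 1 MB capacity are assumptions for illustration; a real logger would also need its own framing and offset management so that concurrent writers don't overwrite each other:
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfLogSketch
{
    static void Main()
    {
        // Create (or open) a persisted map backed by a file on disk.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   @"D:\Log\shared.log", FileMode.OpenOrCreate, "SharedLogMap", 1024 * 1024))
        using (var stream = mmf.CreateViewStream())
        using (var writer = new StreamWriter(stream, Encoding.UTF8))
        {
            writer.WriteLine("log entry from process " +
                System.Diagnostics.Process.GetCurrentProcess().Id);
        }

        // Another worker process on the same machine could attach to the same map by name:
        // using (var other = MemoryMappedFile.OpenExisting("SharedLogMap")) { ... }
    }
}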
Approach #2
If you don't want to re-invent the wheel, I would go for a reliable message queue like MSMQ (very basic, but still useful in your scenario) or RabbitMQ. Enqueue logs in persistent queues, and a background process may consume these log queues to write logs to the file system.
This way, you can create log files once, twice a day, or whenever you want, and you're not tied to the file system when logging actions within your system.
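As a rough sketch of the enqueue side of this approach (the private queue path is an assumption, and it needs a reference to System.Messaging):
using System.Messaging;

public static class LogQueue
{
    private const string QueuePath = @".\Private$\AppLogQueue"; // assumed queue name

    public static void Enqueue(string serializedEntry)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            // Recoverable = true persists the message to disk so it survives restarts.
            queue.Send(new Message(serializedEntry) { Recoverable = true });
        }
    }
}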
Use the FileStream.Flush() method - you might do this after each call to .Write. It will clear the buffers for the stream and cause any buffered data to be written to the file.
https://msdn.microsoft.com/en-us/library/2bw4h516(v=vs.110).aspx

C# The process cannot access file 'XYZ' because it is being used by another process

I have been fighting with this problem for the last couple of days; it works fine on my dev machine, but on the client it shows this error.
Now this is the code that I have that seems to be showing the error so any help or guidance would be amazing, thank you in advance.
private void document()
{
    StreamWriter sWrite = new StreamWriter("C:\\Demo\\index.html");
    //LOTS OF SWRITE LINES HERE
    sWrite.Close();
    System.Diagnostics.Process.Start("C:\\Demo\\index.html");
}
So I have no idea why it keeps telling me the file is already being used by another process when I run this method twice.
Some of it depends on the exact behavior. This could be for a few reasons: it could be, for example, due to an exception. The following code will produce the exception you've described.
for (int i = 0; i < 10; i++)
{
    const string path = @"[path].xml";
    try
    {
        // After the first exception, this call will start throwing
        // an exception to the effect that the file is in use
        StreamWriter sWrite = new StreamWriter(path, true);
        // The first time I run this, an exception will be raised
        throw new Exception();
        // Close will never get called and now I'll get an exception saying that the file is still in use
        // when I try to open it again. That's because the file lock was never released due to the exception
        sWrite.Close();
    }
    catch (Exception e)
    {
    }
    //LOTS OF SWRITE LINES HERE
    Process.Start(path);
}
A "using" block will fix this because it's equivalent to:
try
{
    //...
}
finally
{
    stream.Dispose();
}
In the context of your code, if you're doing a whole bunch of line writes it actually does make sense to consider if (and when) you want to call Flush at some point. The question is whether the write should be "all or none" - i.e. if an exception occurs, do you want the previous lines to still be written? If not, just use a "using" block - it'll call "Flush" once at the end in the "Dispose." Otherwise, you can call "Flush" earlier. For example:
using (StreamWriter sw = new StreamWriter(...))
{
    sw.WriteLine("your content");
    // A bunch of writes

    // Commit everything we've written so far to disc
    // ONLY do this if you could stop writing at this point and have the file be in a valid state.
    sw.Flush();

    sw.WriteLine("more content");
    // More writes
} // Now the using calls Dispose(), which calls Flush() again
A big possible bug is if you're doing this on multiple threads (especially if you're doing a lot of writes). If one thread calls your method and starts writing to the file, and then another thread calls it too and tries to start writing to the file as well, the second thread's call will fail because the first thread's still using the file. If this is the case, you'll need to use some kind of lock to make sure that the threads "take turns" writing to the file.
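If that is the case, a minimal sketch of the document() method guarded by a static lock might look like this (the lock field name is an assumption; this only coordinates threads within a single process):
private static readonly object _indexFileLock = new object();

private void document()
{
    lock (_indexFileLock)
    {
        using (StreamWriter sWrite = new StreamWriter("C:\\Demo\\index.html"))
        {
            //LOTS OF SWRITE LINES HERE
        }
    } // the file handle is released before anything else touches the file
    System.Diagnostics.Process.Start("C:\\Demo\\index.html");
}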
Here is what you can do, for example, before trying to open the file with Process.Start:
var path = @"C:\Demo\index.html";
using (FileStream fs = new FileStream(path, FileMode.Append, FileAccess.Write))
using (StreamWriter sw = new StreamWriter(fs))
{
    sw.WriteLine("Your contents to be written go here");
}
System.Diagnostics.Process.Start(path);

Nonblocking io using BinaryWriter to write to usblp0

I'm writing a program in C# (Mono) to print to a fiscal printer (ESC/POS), and it works okay. The problem is that when I print, the program hangs until the buffer I have built is cleared. So, as you can imagine, if I print a couple of images the buffer gets bigger and the program hangs for a while. This is not desirable. I have tested two approaches.
One way:
BinaryWriter outBuffer;
this.outBuffer = new BinaryWriter(new FileStream(this.portName, System.IO.FileMode.Open));
// ... append bytes to buffer ...
IAsyncResult asyncResult = null;
asyncResult = outBuffer.BaseStream.BeginWrite(buffer, offset, count, null, null);
asyncResult.AsyncWaitHandle.WaitOne(100);
outBuffer.BaseStream.EndWrite(asyncResult); // Last step of the write.
if (!asyncResult.IsCompleted) // Make sure the write really completed.
{
    throw new IOException("Write to printer failed.");
}
The second way:
BinaryWriter outBuffer;
this.outBuffer = new BinaryWriter(new FileStream(this.portName, System.IO.FileMode.Open));
// ... append bytes to buffer ...
outBuffer.Write(buffer, 0, buffer.Length);
Neither method allows the program to continue execution. For example, if it starts to print and the paper runs out, it will hang until the printer resumes printing, which is not what I want.
Thanks in advance for your time and patience.
The problem is that you're making the program wait for the write to complete. If you want it to happen asynchronously, then you need to provide a callback method that will be called when the write is done. For example:
asyncResult = outBuffer.BaseStream.BeginWrite(buffer, offset, count, WriteCallback, outBuffer);

private void WriteCallback(IAsyncResult ar)
{
    var buff = (BinaryWriter)ar.AsyncState;
    // The following will throw an exception if there was an error
    buff.BaseStream.EndWrite(ar);
    // Do whatever you need to do to notify the program that the write completed.
}
That's one way to do it. You should read up on the Asynchronous Programming Model for other options, and pick the one that best suits your needs.
You can also use the Task Parallel Library, which might be a better fit.
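For example, a minimal Task-based sketch (assuming .NET 4.5+ or a Mono version with async/await support; portName stands in for your device path, e.g. /dev/usb/lp0):
public static async Task SendToPrinterAsync(string portName, byte[] buffer)
{
    using (var stream = new FileStream(portName, FileMode.Open, FileAccess.Write,
                                       FileShare.None, bufferSize: 4096, useAsync: true))
    {
        // The caller keeps running; execution resumes here once the device has accepted the data.
        await stream.WriteAsync(buffer, 0, buffer.Length);
    }
}
The caller would do something like var printTask = SendToPrinterAsync(this.portName, bytes); and decide later whether to await it, attach a continuation, or time it out, instead of blocking while the printer drains its buffer.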

How to handle large numbers of concurrent disk write requests as efficiently as possible

Say the method below is being called several thousand times by different threads in a .net 4 application. What’s the best way to handle this situation? Understand that the disk is the bottleneck here but I’d like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
public void WriteFile(string FileName, MemoryStream Data)
{
    try
    {
        using (FileStream DiskFile = File.OpenWrite(FileName))
        {
            Data.WriteTo(DiskFile);
            DiskFile.Flush();
            DiskFile.Close();
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}
If you want to return quickly and don't really care that the operation is synchronous, you could create some kind of in-memory queue where you put write requests; as long as the queue is not full, you can return from the method quickly. Another thread is then responsible for dispatching the queue and writing the files. If WriteFile is called and the queue is full, you will have to wait until you can enqueue, and execution becomes synchronous again - but that way you get a big buffer, so if the file-write requests are not steady but come in spikes (with pauses between the spikes), such a change can be seen as an improvement in your performance.
UPDATE:
Made a little picture for you. Notice that the bottleneck always exists; all you can possibly do is optimize requests by using a queue. Notice that the queue has limits, so when it is filled up you cannot instantly queue more files - you have to wait for free space in that buffer too. But for the situation shown in the picture (3 bucket requests), it's obvious you can quickly put the buckets into the queue and return, while in the first case you have to do it one by one and block execution.
Notice that you never need to run many IO threads at once, since they will all be using the same bottleneck and you will just be wasting memory if you try to parallelize this heavily. I believe 2-10 threads at most will easily take all available IO bandwidth and will limit application memory usage too.
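To make that concrete, here is a minimal sketch of the queued approach built around BlockingCollection (the class and member names are illustrative, the bounded capacity of 100 is an arbitrary assumption, and the usual System, System.IO, System.Collections.Concurrent and System.Threading.Tasks usings are assumed):
public class QueuedFileWriter : IDisposable
{
    private class WriteRequest
    {
        public string FileName;
        public MemoryStream Data;
    }

    // Bounded queue: callers return quickly until the queue fills up.
    private readonly BlockingCollection<WriteRequest> _queue =
        new BlockingCollection<WriteRequest>(100);
    private readonly Task _consumer;

    public QueuedFileWriter()
    {
        // One long-running consumer drains the queue and does the actual disk IO.
        _consumer = Task.Factory.StartNew(DrainQueue, TaskCreationOptions.LongRunning);
    }

    public void WriteFile(string fileName, MemoryStream data)
    {
        // Blocks only when the queue is full, at which point callers become synchronous again.
        _queue.Add(new WriteRequest { FileName = fileName, Data = data });
    }

    private void DrainQueue()
    {
        foreach (var request in _queue.GetConsumingEnumerable())
        {
            using (FileStream diskFile = File.OpenWrite(request.FileName))
            {
                request.Data.WriteTo(diskFile);
            }
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the consumer finish what is already queued
        _consumer.Wait();
        _queue.Dispose();
    }
}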
Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task:
private void WriteFileAsynchronously(string FileName, MemoryStream Data)
{
    Task.Factory.StartNew(() => WriteFileSynchronously(FileName, Data));
}

private void WriteFileSynchronously(string FileName, MemoryStream Data)
{
    try
    {
        using (FileStream DiskFile = File.OpenWrite(FileName))
        {
            Data.WriteTo(DiskFile);
            DiskFile.Flush();
            DiskFile.Close();
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}
The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, with a separate thread servicing that queue, works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quickly. Not to mention that your data, which can be several megabytes per file, is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.
This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
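A rough sketch of that record format, assuming a BinaryWriter over the single append-only log file (the names here are illustrative, not the code from that project):
// Appends one record: (record number, target file name, payload length, payload).
private static long _recordNumber;

public static void AppendRecord(BinaryWriter log, string fileName, byte[] data)
{
    log.Write(++_recordNumber); // record number
    log.Write(fileName);        // file name (length-prefixed string)
    log.Write(data.Length);     // payload length
    log.Write(data);            // payload bytes
    log.Flush();                // hand the record to the OS so it is not stuck in the writer's buffer
}
The second thread then reads records back in the same order (ReadInt64, ReadString, ReadInt32, ReadBytes) and copies each payload into its final file.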
Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
foreach (var file in filesArray)
{
    try
    {
        System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
        {
            // fileName and data would be taken from the current file
            WriteFileSynchronous(fileName, data);
        });
        updateThread.Start();
    }
    catch (Exception ex)
    {
        string errMsg = ex.Message;
        Exception innerEx = ex.InnerException;
        while (innerEx != null)
        {
            errMsg += "\n" + innerEx.Message;
            innerEx = innerEx.InnerException;
        }
        errorMessages.Add(errMsg);
    }
}

File Locking (Read/Write) in ASP.NET Application

I have two ASP.NET web applications. One is responsible for processing some info and writing it to a log file, and the other application is responsible for reading the log file and displaying the information based on user requests.
Here's my code for the Writer
public static void WriteLog(String PathToLogFile, String Message)
{
    Mutex FileLock = new Mutex(false, "LogFileMutex");
    try
    {
        FileLock.WaitOne();
        using (StreamWriter sw = File.AppendText(PathToLogFile))
        {
            sw.WriteLine(Message);
            sw.Close();
        }
    }
    catch (Exception ex)
    {
        LogUtil.WriteToSystemLog(ex);
    }
    finally
    {
        FileLock.ReleaseMutex();
    }
}
And here's my code for the Reader :
private String ReadLog(String PathToLogFile)
{
    FileStream fs = new FileStream(
        PathToLogFile, FileMode.Open,
        FileAccess.Read, FileShare.ReadWrite);
    StreamReader Reader = new StreamReader(fs);
    return Reader.ReadToEnd();
}
My question: is the above code enough to prevent locking issues in a web garden environment?
EDIT 1 : Dirty read is okay.
EDIT 2 : Creating Mutex with new Mutex(false, "LogFileMutex"), closing StreamWriter
Sounds like you're trying to implement a basic queue. Why not use a queue that gives you guaranteed availability? You could drop the messages into MSMQ, then implement a Windows service which reads from the queue and pushes the messages to the DB. If writing to the DB fails, you simply leave the message on the queue (although you will want to handle poison messages, so if it fails because the data is bad you don't end up in an infinite loop).
This will get rid of all locking concerns and give you guaranteed delivery to your reader...
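A minimal sketch of the consuming Windows service's read loop, assuming System.Messaging, a queue path chosen for illustration, and hypothetical SaveToDatabase and keepRunning members:
using (var queue = new MessageQueue(@".\Private$\LogQueue"))
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    while (keepRunning) // hypothetical service-stop flag
    {
        try
        {
            Message message = queue.Receive(TimeSpan.FromSeconds(5));
            SaveToDatabase((string)message.Body); // hypothetical helper
        }
        catch (MessageQueueException)
        {
            // Timed out with no message: loop around and check the stop flag again.
        }
    }
}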
You should also be disposing of your mutex, as it derives from WaitHandle, and WaitHandle implements IDisposable:
using (Mutex FileLock = new Mutex(true, "LogFileMutex"))
{
    // ...
}
Also, perhaps consider a more unique name (a GUID, perhaps) than "LogFileMutex", since another unrelated process could possibly use the same name inadvertently.
Doing this in a web-based environment, you are going to have a lot of issues with file locks - can you change this to use a database instead?
Most hosting solutions allow SQL databases of up to 250 MB.
Not only will a database help with the locking issues, it will also let you purge older data more easily; after a while, that log read is going to get really slow.
No, it won't. First, you're creating a brand new mutex with every call, so multiple threads are going to access the writing critical section. Second, you don't even use the mutex in the reading critical section, so one thread could be attempting to read the file while another is attempting to write. Also, you're not closing the stream in the ReadLog method, so once the first read request comes through, your app won't be able to write any log entries anyway until garbage collection comes along and closes the stream for you... which could take a while.
