How to periodically flush c# FileStream to the disk? - c#

Context:
I am implementing a logging mechanism for a Web API project that writes serialized objects to a file from multiple methods; the file is in turn read by an external process (nxLog, to be more accurate). The application is hosted on IIS and uses 18 worker processes. The app pool is recycled once a day. The expected load on the services that will incorporate the logging methods is 10,000 req/s. In short, this is a classic producer/consumer problem with multiple producers (the methods that produce logs) and one consumer (the external process that reads from the log files). Update: Each process uses multiple threads as well.
I used BlockingCollection to store data (and solve the race condition) and a long running task that writes the data from the collection to the disk.
To write to the disk I am using a StreamWriter and a FileStream.
Because the write frequency is almost constant (as I said, 10,000 writes per second), I decided to keep the streams open for the entire lifetime of the application pool and periodically write logs to the disk. I rely on the app pool recycle and my DI framework to dispose my logger daily. Also note that this class will be a singleton, because I didn't want to have more than one thread dedicated to writing from my thread pool.
Apparently the FileStream object will not write to the disk until it is disposed. Now I don't want the FileStream to wait for an entire day until it writes to the disk. The memory required to hold all those serialized objects would be tremendous, not to mention that any crash of the application or the server would cause data loss or a corrupted file.
Now my question:
How can I have the underlying streams (FileStream and StreamWriter) write to the disk periodically without disposing them? My initial assumption was that it would write to the disk once the FileStream exceeds its buffer size, which is 4K by default.
UPDATE: The inconsistencies mentioned in the answer have been fixed.
Code:
public class EventLogger: IDisposable, ILogger
{
private readonly BlockingCollection<List<string>> _queue;
private readonly Task _consumerTask;
private FileStream _fs;
private StreamWriter _sw;
private string _logFilePath;
public EventLogger()
{
OpenFile();
_queue = new BlockingCollection<List<string>>(50);
_consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}
private void OpenFile()
{
_fs?.Dispose();
_sw?.Dispose();
_logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
_fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
_sw = new StreamWriter(_fs);
}
public void Dispose()
{
_queue?.CompleteAdding();
_consumerTask?.Wait();
_sw?.Dispose();
_fs?.Dispose();
_queue?.Dispose();
}
public void Log(List<string> list)
{
try
{
_queue.TryAdd(list, 100);
}
catch (Exception e)
{
LogError(LogLevel.Error, e);
}
}
private void Write()
{
foreach (List<string> items in _queue.GetConsumingEnumerable())
{
items.ForEach(item =>
{
_sw?.WriteLine(item);
});
}
}
}

There are a few "inconsistencies" with your question.

"The application is hosted on IIS and uses 18 worker processes"

_logFilePath = $"D:\Log\log{DateTime.Now.ToString(yyyyMMdd)}{System.Diagnostic.Process.GetCurrentProcess().Id}.txt";

"writes serialized objects to a file from multiple methods"

Putting all of this together, you seem to have a single-threaded situation as opposed to a multi-threaded one. And since there is a separate log file per process, there is no contention problem or need for synchronization. What I mean to say is, I don't see why the BlockingCollection is needed at all. It's possible that you forgot to mention that there are multiple threads within your web process; I will make that assumption here.
Another problem is that your code does not compile:
the class name is Logger but the EventLogger function looks like a constructor
some more incorrect syntax with strings, etc.
Putting all that aside, if you really have a contention situation and want to write to the same log via multiple threads or processes, your class seems to have most of what you need. I have modified your class to do some more things. The chief items to note are below:
Fixed all the syntax errors, making assumptions where necessary
Added a timer which calls Flush periodically (it is created and started from the constructor). This needs a lock object so that the flush does not interrupt an in-progress write operation
Used an explicit buffer size in the StreamWriter constructor. You should heuristically determine what size works best for you. Also, you should disable AutoFlush from StreamWriter so you can have your writes hit the buffer instead of the file, providing better performance.
Below is the code with the changes
public class EventLogger : IDisposable, ILogger {
private readonly BlockingCollection<List<string>> _queue;
private readonly Task _consumerTask;
private FileStream _fs;
private StreamWriter _sw;
private System.Timers.Timer _timer;
private object streamLock = new object();
private const int MAX_BUFFER = 16 * 1024; // 16K
private const int FLUSH_INTERVAL = 10 * 1000; // 10 seconds
public EventLogger() {
OpenFile();
SetupFlushTimer();
_queue = new BlockingCollection<List<string>>(50);
_consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}
void SetupFlushTimer() {
_timer = new System.Timers.Timer(FLUSH_INTERVAL);
_timer.AutoReset = true;
_timer.Elapsed += TimedFlush;
_timer.Start();
}
void TimedFlush(Object source, System.Timers.ElapsedEventArgs e) {
lock (streamLock) {
_sw?.Flush();
}
}
private void OpenFile() {
_fs?.Dispose();
_sw?.Dispose();
var _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
_fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
_sw = new StreamWriter(_fs, Encoding.Default, MAX_BUFFER); // TODO: use the correct encoding here
_sw.AutoFlush = false;
}
public void Dispose() {
_timer.Elapsed -= TimedFlush;
_timer.Dispose();
_queue?.CompleteAdding();
_consumerTask?.Wait();
_sw?.Dispose();
_fs?.Dispose();
_queue?.Dispose();
}
public void Log(List<string> list) {
try {
_queue.TryAdd(list, 100);
} catch (Exception e) {
LogError(LogLevel.Error, e);
}
}
private void Write() {
foreach (List<string> items in _queue.GetConsumingEnumerable()) {
lock (streamLock) {
items.ForEach(item => {
_sw?.WriteLine(item);
});
}
}
}
}
EDIT:
There are 4 factors controlling the performance of this mechanism, and it is important to understand their relationship. The example below will hopefully make it clear.
Let's say
average size of List<string> is 50 Bytes
Calls/sec is 10,000
MAX_BUFFER is 1024 * 1024 Bytes (1 Meg)
You are producing 500,000 bytes of data per second, so a 1 MB buffer can hold only 2 seconds' worth of data. i.e. even if FLUSH_INTERVAL is set to 10 seconds, the buffer will auto-flush every 2 seconds (on average) when it runs out of buffer space.
Also remember that increasing MAX_BUFFER blindly will not help, since the actual flush operation will take longer due to the bigger buffer size.
The main thing to understand is that when there is a difference between the incoming data rate (to your EventLogger class) and the outgoing data rate (to the disk), you will either need an infinitely sized buffer (assuming a continuously running process) or you will have to slow down your average incoming rate to match the average outgoing rate.
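To put numbers on it, here is a tiny back-of-the-envelope sketch (the figures are the assumptions from the example above, not measurements):
// Illustrative numbers only - assumptions from the example above, not measurements.
const int avgRecordBytes = 50;            // average size of one log entry
const int callsPerSecond = 10000;         // incoming rate
const int maxBufferBytes = 1024 * 1024;   // MAX_BUFFER (1 MB)
const double flushIntervalSeconds = 10;   // FLUSH_INTERVAL
double bytesPerSecond = avgRecordBytes * callsPerSecond;        // 500,000 B/s
double secondsToFillBuffer = maxBufferBytes / bytesPerSecond;   // ~2 s
double effectiveFlushSeconds = Math.Min(flushIntervalSeconds, secondsToFillBuffer);
Console.WriteLine("Data hits the disk roughly every {0:F1} s", effectiveFlushSeconds);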

Maybe my answer won't address your concrete concern, but I believe that your scenario could be a good use case for memory-mapped files.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
This could be very interesting because you'll be able to do logging from different processes (i.e. IIS worker processes) without locking issues. See MemoryMappedFile.OpenExisting method.
Also, you can log to a non-persistent shared memory-mapped file and, using a task scheduler or a Windows service, you can take pending logs to their final destination using a persistable memory-mapped file.
I see a lot of potential on using this approach because of your multi/inter-process scenario.
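For illustration, a minimal sketch of the persisted variant (the path, the "AppLog" map name and the 10 MB capacity are arbitrary assumptions; real code would still need its own offset tracking or framing so concurrent writers don't overwrite each other):
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfLogSketch
{
    static void Main()
    {
        // Create (or open) a memory-mapped file backed by a real file on disk.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   @"D:\Log\shared.log", FileMode.OpenOrCreate, "AppLog", 10 * 1024 * 1024))
        using (var view = mmf.CreateViewStream())
        {
            byte[] line = Encoding.UTF8.GetBytes("log entry from worker process" + Environment.NewLine);
            view.Write(line, 0, line.Length);
            view.Flush(); // pushes the view's pages back towards the backing file
        }

        // Another process on the same machine can attach to the same mapping by name
        // (while the creating process keeps it open):
        // using (var mmf2 = MemoryMappedFile.OpenExisting("AppLog")) { ... }
    }
}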
Approach #2
If you don't want to re-invent the wheel, I would go for a reliable message queue like MSMQ (very basic, but still useful in your scenario) or RabbitMQ. Enqueue logs in persistent queues, and a background process may consume these log queues to write logs to the file system.
This way, you can create log files once, twice a day, or whenever you want, and you're not tied to the file system when logging actions within your system.
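For example, with MSMQ the producer side could be as small as this sketch (the queue path is an assumption; add a reference to System.Messaging.dll):
using System.Messaging;

string queuePath = @".\Private$\AppLogQueue"; // assumed private queue name
if (!MessageQueue.Exists(queuePath))
    MessageQueue.Create(queuePath);
using (var queue = new MessageQueue(queuePath))
{
    // The body can be any serializable object; a background consumer process
    // calls queue.Receive() and writes the entries to the log files.
    queue.Send("serialized log entry", "log");
}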

Use the FileStream.Flush() method - you might do this after each call to .Write. It clears the buffers for the stream and causes any buffered data to be written to the file.
https://msdn.microsoft.com/en-us/library/2bw4h516(v=vs.110).aspx
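For example (a sketch; the path is a placeholder):
using (var fs = new FileStream(@"D:\Log\log.txt", FileMode.Append, FileAccess.Write))
using (var sw = new StreamWriter(fs))
{
    sw.WriteLine("some log line");
    sw.Flush();     // StreamWriter buffer -> FileStream -> OS file cache
    fs.Flush(true); // optional: also ask the OS to write its cache through to the physical disk
}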

Related

How to guaranteed write into file in multithreading with exceptions?

This is a simplified example
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
new Thread(() => Method1()).Start();
new Thread(() => Method2()).Start();
Console.Read();
}
private static void Method1()
{
using (StreamWriter sw = new StreamWriter(@"h:\data.txt"))
{
int i = 100000000;
while (true)
{
Thread.Sleep(100);
sw.WriteLine(i);
i++;
}
}
}
private static void Method2()
{
Thread.Sleep(6000);
throw null;
}
}
}
StreamWriter doesn't write the data into the file if an exception occurs too early and in another thread. The file data.txt is empty at the time the exception occurs.
I played with this situation a little and found a bunch of workarounds:
If I increase the sleep interval for the exception's thread (and decrease the interval between writes to the file), the file will be filled with data. This is not an option because I don't know when an exception will occur.
As a consequence of the previous workaround, I can decrease the buffer size of the StreamWriter. But it doesn't seem to work if I set it too small - for example, this code
FileStream fs = new FileStream(@"h:\data.txt", FileMode.Create);
using (StreamWriter sw = new StreamWriter(fs, Encoding.Default, 10))
doesn't work, because the first write operation occurs only when about 385 integers are waiting in the buffer to be written to the file.
The file will be filled if I close the writer before the exception occurs. But that is not a good option either - I have to write to the file 1 to 10 times per second, and it is not a good idea to open and close the writer that frequently, is it?
I can catch the exception like this
private static void Method2()
{
try
{
Thread.Sleep(6000);
throw null;
}
catch
{
Console.WriteLine("Exception!");
}
}
and all will be OK - no application termination, and the file will be filled pack by pack. But that is not an option either - I can't control when and where exceptions occur. I try to use try-catch everywhere, but I may miss something.
So the situation is: the StreamWriter's buffer is not full, an exception occurred in another thread and is not caught, so the application will be terminated. How can I avoid losing this data and write it to the file?
As I understand your situation you are assuming that there is a bug somewhere and the process might be terminated at any time. You want to save as much data as possible.
You should be able to call Flush on the StreamWriter. This will push the data to the OS. If your process terminates the data will eventually be written by the OS.
In case you cannot convince StreamWriter to actually flush for some reason you can use a FileStream and write to that (pseudocode: fileStream.Write(Encoding.GetBytes(myString))). You can then flush the FileStream or use a buffer size of 1.
Of course it's best if you prevent the process from being terminated in the first place. This is usually straightforward with Task, as opposed to using raw Threads.
Flushing the stream will ensure that all of its content is pushed into its underlying file.
That will take care of ensuring all of the data is saved after you complete an operation, and that a subsequent exception will not make your application lose data.
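Applied to Method1 from the question, that is a one-line change (sketch):
private static void Method1()
{
    using (StreamWriter sw = new StreamWriter(@"h:\data.txt"))
    {
        int i = 100000000;
        while (true)
        {
            Thread.Sleep(100);
            sw.WriteLine(i);
            sw.Flush(); // hand the line to the OS right away, so an unhandled exception elsewhere no longer loses it
            i++;
        }
    }
}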

Static class with so many requests at the same time

I created a static IP class and a static log class.
The IP class finds out the user's IP address, and the log class logs it into a text file.
Everything works just fine, but I wonder what happens if many requests come in at the same time?
I mean, both classes are static, and as I understand it static classes can cause problems in that situation.
How can I manage them?
Here is my IP class:
public static class IP
{
public static string IP()
{
System.Web.HttpContext context = System.Web.HttpContext.Current;
string ipAddress = context.Request.ServerVariables["HTTP_X_FORWARDED_FOR"];
if (!string.IsNullOrEmpty(ipAddress))
{
string[] addresses = ipAddress.Split(',');
if (addresses.Length != 0)
{
return addresses[0];
}
}
return context.Request.ServerVariables["REMOTE_ADDR"];
}
}
And here is part of my log class which writes into the text file:
private static void WriteLine(string message)
{
string filePath = FilePath();
CreateFile(filePath);
try
{
using (StreamWriter log = File.AppendText(filePath))
log.WriteLine(message);
}
catch (Exception)
{
//If can not access to file do nothing
//throw;
}
}
You aren't going to run into contention problems due to your classes being static. Your IP.IP() method is pure (i.e. it does not change the state of anything) and contains no locks, so there is no chance of any contention there.
You do potentially have problems in WriteLine due to the fact that you are probably writing your log file on the same thread as you are doing your work. That means the file write is acting as a lock since only one write can occur at any one time.
What you want is to log to a queue and then to write that queue on a separate thread; that is a classic producer-consumer pattern.
Alternatively you could avoid reinventing the wheel and use an existing logging framework that will handle these things for you like log4net
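A minimal sketch of that producer/consumer idea (the class name, path and capacity are made up for illustration):
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class AsyncLog
{
    private static readonly BlockingCollection<string> _lines =
        new BlockingCollection<string>(10000); // bounded, so memory cannot grow without limit

    static AsyncLog()
    {
        // A single long-running consumer owns the file, so there is no write contention.
        Task.Factory.StartNew(() =>
        {
            using (StreamWriter writer = File.AppendText(@"C:\Logs\app.log"))
            {
                writer.AutoFlush = true; // or flush on a timer instead
                foreach (string line in _lines.GetConsumingEnumerable())
                    writer.WriteLine(line);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called from request threads; returns immediately.
    public static void WriteLine(string message)
    {
        _lines.TryAdd(message);
    }
}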
StreamWriter has a default 4 KB buffer, which can be modified if needed, as defined by:
public StreamWriter(
Stream stream,
Encoding encoding,
int bufferSize
)
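For example (a sketch; the path and the 64 KB figure are only illustrations):
var fs = new FileStream(@"C:\Logs\visitors.log", FileMode.Append, FileAccess.Write, FileShare.Read);
var sw = new StreamWriter(fs, Encoding.UTF8, 64 * 1024); // explicit 64 KB buffer instead of the default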
More than likely, your computer (including disk access) is a lot faster than your internet access.
It will work fine, because you don't have any public variables that are kept in memory and changed each time the class is accessed.
So as each method ends, its local variables go out of scope.
Even while they are in memory, they will not be affected by how many users access the class at the same time, and there will be no mess.

What is the maximum number of simultaneous I/O operations in .net 4.5?

What is the maximum number of files I can write simultaneously in a multithreaded scenario, using a static method, i.e.:
public static void WriteToXmlFile<T>(string filePath, T objectToWrite, bool append = false) where T : new()
{
TextWriter writer = null;
try
{
var serializer = new XmlSerializer(typeof(T));
writer = new StreamWriter(filePath, append);
serializer.Serialize(writer, objectToWrite);
}
finally
{
if (writer != null)
writer.Close();
}
}
This static method will be accessed from multiple tasks running simultaneously. Is there any default limit on the number of files I can write at the same time, or any way I can modify such a limit?
The bottleneck here is going to be in hardware, not in software. The head on your disk drive can only be in one place at any one time. It isn't going to work any faster (and it is likely to work slower, due to repeatedly seeking to new blocks) if there are multiple threads telling it to do stuff at the same time.
The degree of parallelization that would be helpful is going to be equal to the number of physical drives that you have. If you have 4 disk drives, you can benefit from writing out 4 files at once.
And it's of course worth noting that there is no need for more than one thread here, ever, regardless of whether you want to perform parallel writes or not. You can use asynchrony rather than multiple threads to achieve parallelism, if you actually have multiple physical disk drives.
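A sketch of what that looks like with async I/O in .NET 4.5 (the method name is made up; path and data are placeholders):
using System.IO;
using System.Threading.Tasks;

public static async Task WriteToFileAsync(string filePath, byte[] data)
{
    // useAsync: true asks the OS for overlapped I/O, so no thread sits blocked during the write.
    using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.Write,
                                   FileShare.None, 4096, useAsync: true))
    {
        await fs.WriteAsync(data, 0, data.Length);
    }
}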

Implementing stop and restart in file stream transfer - how to? C# .NET

I'm looking for texts or advice on implementing stop and restart in file stream transfer.
The goal is for the application to use a single read source, and output to multiple write sources, and be able to restart from a recorded position if a transfer fails.
The application is being written in C# .NET.
Pseudocode:
while (reader.Read())
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
I need to be able to implement stop or pause, which could work like so. To stop, Continue is set to false:
while (reader.Read() && Continue)
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
Clearly at this stage I need to record the number of bytes read, and the number of bytes written to each write source.
My questions are:
If I were to only record the read bytes, and use this for restarts, one or more writers could have written while others have not. Simply restarting using a measure of read progress might corrupt the written data. So I need to use a 'written bytes per writer' record as my new start position. How can I be sure that the bytes were written (I may not have the ability to read the file from the write source to read the file length)?
Can anyone advise, or point me in the right direction to a text on this kind of issue?
Use a thread synchronization event.
(pseudocode):
ManualResetEvent _canReadEvent = new ManualResetEvent(true);
public void WriterThreadFunc()
{
while (_canReadEvent.WaitOne() && reader.Read())
{
foreach(writer in writers)
{
writer.WriteToStream();
}
}
}
public void Pause()
{
_canReadEvent.Reset();
}
public void Continue()
{
_canReadEvent.Set();
}
The good thing is that the writer thread won't consume any CPU while it's paused, and it will continue as soon as it's signaled (as opposed to using a flag and Thread.Sleep()).
The other note is that the event check should be the first operand in the while condition; otherwise reader.Read() would still consume data from the stream before the thread blocks, and that data would be skipped because the event would prevent the while body from executing.

How to handle large numbers of concurrent disk write requests as efficiently as possible

Say the method below is being called several thousand times by different threads in a .net 4 application. What’s the best way to handle this situation? Understand that the disk is the bottleneck here but I’d like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
public void WriteFile(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
If you want to return quickly and don't really care whether the operation is synchronous, you could create some kind of in-memory queue where you put write requests; while the queue is not filled up, you can return from the method quickly. Another thread is then responsible for dispatching the queue and writing the files. If WriteFile is called while the queue is full, you will have to wait until you can enqueue, and execution becomes synchronous again - but that way you have a big buffer, so if the stream of file write requests is not steady but spiky instead (with pauses between bursts of write calls), such a change can be seen as an improvement in your performance.
UPDATE:
I made a little picture for you. Notice that the bottleneck always exists; all you can possibly do is optimize requests by using a queue. Notice that the queue has limits, so when it's filled up you cannot instantly queue files into it - you have to wait until there is free space in that buffer, too. But for the situation presented in the picture (3 bucket requests), it's obvious you can quickly put the buckets into the queue and return, while in the first case you have to do that one by one and block execution.
Notice that you never need to run many I/O threads at once, since they will all be hitting the same bottleneck, and you will just be wasting memory if you try to parallelize this heavily. I believe 2-10 threads tops will easily take all the available I/O bandwidth and will limit application memory usage too.
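A rough sketch of such a queue (the type and member names are invented for illustration; the capacity of 100 requests is arbitrary):
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class QueuedFileWriter
{
    private readonly BlockingCollection<Tuple<string, MemoryStream>> _requests =
        new BlockingCollection<Tuple<string, MemoryStream>>(100); // the bounded buffer

    public QueuedFileWriter()
    {
        // One long-running consumer drains the queue and does the actual disk I/O.
        Task.Factory.StartNew(() =>
        {
            foreach (var request in _requests.GetConsumingEnumerable())
            {
                using (FileStream file = File.OpenWrite(request.Item1))
                    request.Item2.WriteTo(file);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Returns quickly while there is room; blocks (back-pressure) only when the queue is full.
    public void WriteFile(string fileName, MemoryStream data)
    {
        _requests.Add(Tuple.Create(fileName, data));
    }
}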
Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task:
private void WriteFileAsynchronously(string FileName, MemoryStream Data)
{
Task.Factory.StartNew(() => WriteFileSynchronously(FileName, Data));
}
private void WriteFileSynchronously(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, with a separate thread servicing that queue, works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quickly. Not to mention that your data, which can be several megabytes per file, is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.
This works better if the log file and the final files are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
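A sketch of what the record framing on the writer side could look like (BinaryWriter and the exact field order are my assumptions, not the original code):
// Appends one record to the single shared log: [record number][file name][payload length][payload].
private static void AppendRecord(BinaryWriter log, long recordNumber, string fileName, byte[] data)
{
    log.Write(recordNumber); // 8 bytes
    log.Write(fileName);     // length-prefixed string
    log.Write(data.Length);  // 4 bytes
    log.Write(data);         // the file contents themselves
    log.Flush();
}
The copying thread reads the same fields back with BinaryReader (ReadInt64, ReadString, ReadInt32, ReadBytes) and writes each payload out to its final file.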
Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
foreach (var file in filesArray)
{
try
{
System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
{
WriteFileSynchronous(file.FileName, file.Data); // assuming each item carries its file name and data
});
updateThread.Start();
}
catch (Exception ex)
{
string errMsg = ex.Message;
Exception innerEx = ex.InnerException;
while (innerEx != null)
{
errMsg += "\n" + innerEx.Message;
innerEx = innerEx.InnerException;
}
errorMessages.Add(errMsg);
}
}
