How to optimize writing to multiple files in the background - c#

My program needs to write very often messages to several files. As it is very time consuming, I need to optimise it. Below, you can find an extract from my program where I try to write async to file in the background. It seems to work, but I am not sure if it is the best practice as I do not dispose tasks (this part is commented). I do not do it because I do not want my program to wait for those tasks completion. Simply, I want my message to be written to few files in the background as quickly as possible. As those files could be accessed by several threads, I added lock.
I use static methods because these methods are used everywhere in my code and I do not want to instantiate this class, just to write one line of message to file, everywhere (maybe that's wrong).
================== Class ==============================================
namespace test
{
public static class MessageLOG
{
private static string debugFileName = Settings.DebugLOGfilename;
private static string logFileName = Settings.LOGfilename;
private static object DebuglockOn = new object();
private static object LoglockOn = new object();
private static StreamWriter DebugSW;
private static StreamWriter LogSW;
private static void DebugFile(string message)
{
uint linesCount = 0;
string _filename = debugFileName;
if(DebugSW == null && !string.IsNullOrEmpty(_filename))
DebugSW = new StreamWriter(_filename);
if(DebugSW != null)
{
lock(DebuglockOn)
{
DebugSW.WriteLine(message);
linesCount++;
if (linesCount > 10)
{
DebugSW.Flush();
linesCount = 0;
}
}
}
}
private static void LogFile(string message)
{
uint linesCount = 0;
string _filename = logFileName;
if(LogSW == null && !string.IsNullOrEmpty(_filename))
LogSW = new StreamWriter(_filename);
if(LogSW != null)
{
lock(LoglockOn)
{
LogSW.WriteLine(string.Format("{0} ({1}): {2}", DateTime.Now.ToShortDateString(), DateTime.Now.ToShortTimeString(), message));
linesCount++;
if (linesCount > 10)
{
LogSW.Flush();
linesCount = 0;
}
}
}
public static void LogUpdate(string message)
{
ThreadPool.QueueUserWorkItem(new WaitCallback( (x) => LogFile(message)));
ThreadPool.QueueUserWorkItem(new WaitCallback( (x) => DebugFile(message)));
ThreadPool.QueueUserWorkItem(new WaitCallback( (x) => Debug.WriteLine(message)));
}
//This method will be called when the main thread is being closed
public static void CloseAllStreams()
{
if (DebugSW != null)
{
DebugSW.Flush();
DebugSW.Close();
}
if (LogSW != null)
{
LogSW.Flush();
LogSW.Close();
}
}
=============== main window ===========
void MainWIndow()
{
... some code ....
MessageLog.LogUpdate("Message text");
... code cont ....
MessageLog.CloseAllStreams();
}

You should re-think your design. Your locks should not be local variables in your method. This is redundant because each method call creates a new object and locks to it. This will not force synchronization across multiple threads (https://msdn.microsoft.com/en-us/library/c5kehkcz(v=vs.80).aspx). Since your methods are static, the locks need to be static variables and you should have a different lock per file. You can use ThreadPool.QueueUserWorkItem (https://msdn.microsoft.com/en-us/library/kbf0f1ct(v=vs.110).aspx) instead of Tasks. ThreadPool is an internal .NET class that re-uses threads to run async operations. This is perfect for your use case because you don't need control over each thread. You just need some async operation to execute and finish on its own.
A better approach would be to create a logger class that runs on its own thread. You can have a queue and enqueue messages from multiple threads and then have the LoggerThread handle writing to the file. This will ensure that only one thread is ever writing to the file. This will also maintain logging order if you use a FIFO queue. You will no longer need to lock writing to the file, but you will need to lock your queue. You can use the .NET Monitor (https://msdn.microsoft.com/en-us/library/system.threading.monitor(v=vs.110).aspx) class to block the LoggerThread until a message is queued (look at methods Enter/Wait/Pulse). To optimize it even more, you can now keep a stream open to the file and push data to it as it gets queued. Since only one thread ever accesses the file, this will be OK. Just remember to close the stream to the file when you are done. You can also set up a timer that goes off once in a while to flush the content. Keeping the stream open is not always recommended, especially if you anticipate other applications attempting to lock the file. However, in this case, it might be OK. This will be a design decision you need to make that fits best with your application.

You´re opening a new stream and committing writings for each write action : very poor performance.
My recommendation is to use only one StreamWriter for each file, this instance must be a class field and you need to still using the lock to ensure is thread safe.
Also this would require that you don't use the using statement in each write method.
Also periodically , maybe every X number of writes, you could make a Stream.Flush to commit writings on disk. This Flush must be protected by the lock.

Related

System.Threading.ThreadStateException (maybe a raise condition)

First, I would like to describe the general structure of the classes/methods involved in my problem.
I have a class which should start a thread cyclically. This thread deals with a function which writes the log entries into a log file. I realized this with a timer (System.Threading.Timer). Further there is a ThreadHandler class which keeps all threads in a list. These threads are controlled with the standard functions of System.Threading.Thread by the name of the thread. Now to the code that is affected by my problem:
In the constructor of my Log class (LogWriter) I call the method InitializeLoggerThread():
private void InitializeLoggerThread()
{
LoggerLoggingThread = new System.Threading.Thread(new System.Threading.ThreadStart(WriteLog));
LoggerLoggingThread.Name = LoggerLogginThreadName;
ObjectObserver.ThreadHandler.AddThread(LoggerLoggingThread); // Obersver class from which all objects can be accessed
}
The Timer itself will be starte das
public void StartLogging()
{
this.LoggerTimer = new System.Threading.Timer(LoggerCallback, null, 1000, LoggerInterval);
}
Furthermore the Log class contains the implementation oft he timer:
private const int LoggerInterval = 5000;
private System.Threading.Thread LoggerLoggingThread;
private static void LoggerCallback(object state)
{
if ((BufferCount > 0))
{
ObjectObserver.ThreadHandler.StartThread(LoggerLogginThreadName);
}
}
The ThreadHandler will strart the thread with te following function:
public void StartThread(string threadName)
{
lock (Locker)
{
if (GetThread(threadName).ThreadState == ThreadState.Stopped || GetThread(threadName).ThreadState == ThreadState.Unstarted)
{
GetThread(threadName).Start();
}
}
}
I have already checked the parameters etc.. Everything is correct in this case. Basically, when debugging, it seems that the threads all try to start the logger thread at the same time.
The thread will be calles by its name with the following function:
public Thread GetThread(string threadName)
{
foreach (Thread thread in Threads)
{
if (thread.Name == threadName)
{
return thread;
}
}
return null;
}
Now to my question: What is the error in my construct that I get
System.Threading.ThreadStateException at StartThread(...)
after the first execution a multiple attempt of the execution?
If desired, I can provide a copy&paste code of all important functions for debugging.
Thread passes through a set of states (taken from here):
Once thread is completed you cannot start it again. Your timer function tries to start Stopped thread though:
if (GetThread(threadName).ThreadState == ThreadState.Stopped ...
GetThread(threadName).Start();
There are awesome logging frameworks in .NET, such as NLog and log4net, please don't try to reinvent the wheel, most likely these frameworks already can do what you need and they are doing that much more efficiently.

Suggestions to improve Task responsiveness?

I have an application which downloads a single file in 4 different segments (each segment is a different long running task) and serializes the progress at periodic intervals.
When the serialize takes place on a 2 core machine the process will block for 3-10 seconds and then complete. Once this blocking behavior has happened the first time it will never happen again. The 2nd to (n) calls to serialize execute immediately without a hitch. It appears that during the first call blocking the framework did some Task/Threading optimization that prevents the blocking behavior form occurring again.
Does anyone have any insight into this optimization and if it is possible to do this type of optimization at initialization to avoid the blocking behavior all together?
Here is some pseudo code to help describe the situation:
class Download
{
private DownloadSegment[] _downloadSegments = new DownloadSegment[4];
public void StartDownload()
{
for (int i = 0; i < 4; i++)
{
_downloadSegments[i] = new DownloadSegment();
_downloadSegments[i].Start();
}
}
public void Serialize()
{
try
{
//enter lock so the code does not serilize the progress while
//writing new downloaded info at the same time
foreach(DownloadSegment segment in _downloadSegments)
{
Monitor.Enter(segment.SyncRoot);
}
//code to serialize the progress
}
finally
{
foreach (DownloadSegment segment in _downloadSegments)
{
Monitor.Exit(segment.SyncRoot);
}
}
}
}
class DownloadSegment
{
public object SyncRoot = new object();
public void Start()
{
Task downloadTask = new Task(
() =>
{
//download code
lock (SyncRoot)
{
//wirte to disk code
}
},
new CancellationToken(),
TaskCreationOptions.LongRunning);
downloadTask.Start(TaskScheduler.Default);
}
}
Thank you for your suggestions usr. It turns out it was the filestream write causing the problem.
It turns out that when you use FileStream.SetLength() to allocate file space for a large file and then seek to a section deep inside that file, once you start writing at that position in the file the sparse file space will be "backfilled" and this can be slow. The locks I have in the code simply expose the problem by making the unblocked segments wait on the ones that are busy.
I will be posting a separate question to see if there is a more efficient way to write/create these files.

Do I need to use locking in the following c# code?

In the following code I am using two threads to share sane resource in this example it's a queue so do I need to use lock while en-queueing or dequeuing if yes then why because program seems to work fine.
class Program
{
static Queue<string> sQ = new Queue<string>();
static void Main(string[] args)
{
Thread prodThread = new Thread(ProduceData);
Thread consumeThread = new Thread(ConsumeData);
prodThread.Start();
consumeThread.Start();
Console.ReadLine();
}
private static void ProduceData()
{
for (int i = 0; i < 100; i++)
{
sQ.Enqueue(i.ToString());
}
}
private static void ConsumeData()
{
while (true)
{
if (sQ.Count > 0)
{
string s = sQ.Dequeue();
Console.WriteLine("DEQUEUE::::" + s);
}
}
}
}
Yes you do, System.Collections.Generic.Queue<T> is not thread safe for being written to and read from at the same time. You either need to lock on the same object before enquing or dequing or if you are using .NET 4/4.5 use the System.Collections.Concurrent.ConcurrentQueue<T> class instead and use the TryDequeue method.
The reason your current implementation has not caused you a problem so far, is due to the Thread.Sleep(500) call (not something you should be using in production code) which means that the prodThread doesn't write to the queue while the consumeThread reads from it since the read operation takes less than 500ms. If you remove the Thread.Sleep odds are it will throw an exception at some point.

Threads interaction (data from one thread to another) c#

I need to pass information from thread of scanning data to recording information thread(write to xml file).
It should looks something like this:
Application.Run() - complete
Scanning thread - complete
Writing to xlm thread - ???
UI update thread - I think I did it
And what i got now:
private void StartButtonClick(object sender, EventArgs e)
{
if (FolderPathTextBox.Text == String.Empty || !Directory.Exists(FolderPathTextBox.Text)) return;
{
var nodeDrive = new TreeNode(FolderPathTextBox.Text);
FolderCatalogTreeView.Nodes.Add(nodeDrive);
nodeDrive.Expand();
var t1 = new Thread(() => AddDirectories(nodeDrive));
t1.Start();
}
}
private void AddDirectories(TreeNode node)
{
string strPath = node.FullPath;
var dirInfo = new DirectoryInfo(strPath);
DirectoryInfo[] arrayDirInfo;
FileInfo[] arrayFileInfo;
try
{
arrayDirInfo = dirInfo.GetDirectories();
arrayFileInfo = dirInfo.GetFiles();
}
catch
{
return;
}
//Write data to xml file
foreach (FileInfo fileInfo in arrayFileInfo)
{
WriteXmlFolders(null, fileInfo);
}
foreach (DirectoryInfo directoryInfo in arrayDirInfo)
{
WriteXmlFolders(directoryInfo, null);
}
foreach (TreeNode nodeFil in arrayFileInfo.Select(file => new TreeNode(file.Name)))
{
FolderCatalogTreeView.Invoke(new ThreadStart(delegate { node.Nodes.Add(nodeFil); }));
}
foreach (TreeNode nodeDir in arrayDirInfo.Select(dir => new TreeNode(dir.Name)))
{
FolderCatalogTreeView.Invoke(new ThreadStart(delegate
{node.Nodes.Add(nodeDir);
}));
StatusLabel.BeginInvoke(new MethodInvoker(delegate
{
//UI update...some code here
}));
AddDirectories(nodeDir);
}
}
private void WriteXmlFolders(DirectoryInfo dir, FileInfo file)
{//writing information into the file...some code here}
How to pass data from AddDirectories(recursive method) thread to WriteXmlFolders thread?
Here is a generic mechanism how one thread generates data that another thread consumes. No matter what approach (read: ready made classes) you would use the internal principle stays the same. The main players are (note that there are many locking classes available in System.Threading namespace that could be used but these are the most appropriate for this scenario:
AutoResetEvent - this allows a thread to go into sleep mode (without consuming resources) until another thread will wake it up. The 'auto' part means that once the thread wakes up, the class is reset so the next Wait() call will again put it in sleep, without the need to reset anything.
ReaderWriterLock or ReaderWriterLockSlim (recommended to use the second if you are using .NET 4) - this allows just one thread to lock for writing data but multiple threads can read the data. In this particular case there is only one reading thread but the approach would not be different if there were many.
// The mechanism for waking up the second thread once data is available
AutoResetEvent _dataAvailable = new AutoResetEvent();
// The mechanism for making sure that the data object is not overwritten while it is being read.
ReaderWriterLockSlim _readWriteLock = new ReaderWriterLockSlim();
// The object that contains the data (note that you might use a collection or something similar but anything works
object _data = null;
void FirstThread()
{
while (true)
{
// do something to calculate the data, but do not store it in _data
// create a lock so that the _data field can be safely updated.
_readWriteLock.EnterWriteLock();
try
{
// assign the data (add into the collection etc.)
_data = ...;
// notify the other thread that data is available
_dataAvailable.Set();
}
finally
{
// release the lock on data
_readWriteLock.ExitWriteLock();
}
}
}
void SecondThread()
{
while (true)
{
object local; // this will hold the data received from the other thread
// wait for the other thread to provide data
_dataAvailable.Wait();
// create a lock so that the _data field can be safely read
_readWriteLock.EnterReadLock();
try
{
// read the data (add into the collection etc.)
local = _data.Read();
}
finally
{
// release the lock on data
_readWriteLock.ExitReadLock();
}
// now do something with the data
}
}
In .NET 4 it is possible to avoid using ReadWriteLock and use one of the concurrency-safe collections such as ConcurrentQueue which will internally make sure that reading/writing is thread safe. The AutoResetEvent is still needed though.
.NET 4 provides a mechanism that could be used to avoid the need of even AutoResetEvent - BlockingCollection - this class provides methods for a thread to sleep until data is available. MSDN page contains example code on how to use it.
In case you use it as the answer
Take a look at a producer consumer.
BlockingCollection Class
How to: Implement Various Producer-Consumer Patterns

Multi-threaded application interaction with logger thread

Here I am again with questions about multi-threading and an exercise of my Concurrent Programming class.
I have a multi-threaded server - implemented using .NET Asynchronous Programming Model - with GET (download) and PUT (upload) file services. This part is done and tested.
It happens that the statement of the problem says this server must have logging activity with the minimum impact on the server response time, and it should be supported by a low priority thread - logger thread - created for this effect. All logging messages shall be passed by the threads that produce them to this logger thread, using a communication mechanism that may not lock the thread that invokes it (besides the necessary locking to ensure mutual exclusion) and assuming that some logging messages may be ignored.
Here is my current solution, please help validating if this stands as a solution to the stated problem:
using System;
using System.IO;
using System.Threading;
// Multi-threaded Logger
public class Logger {
// textwriter to use as logging output
protected readonly TextWriter _output;
// logger thread
protected Thread _loggerThread;
// logger thread wait timeout
protected int _timeOut = 500; //500ms
// amount of log requests attended
protected volatile int reqNr = 0;
// logging queue
protected readonly object[] _queue;
protected struct LogObj {
public DateTime _start;
public string _msg;
public LogObj(string msg) {
_start = DateTime.Now;
_msg = msg;
}
public LogObj(DateTime start, string msg) {
_start = start;
_msg = msg;
}
public override string ToString() {
return String.Format("{0}: {1}", _start, _msg);
}
}
public Logger(int dimension,TextWriter output) {
/// initialize queue with parameterized dimension
this._queue = new object[dimension];
// initialize logging output
this._output = output;
// initialize logger thread
Start();
}
public Logger() {
// initialize queue with 10 positions
this._queue = new object[10];
// initialize logging output to use console output
this._output = Console.Out;
// initialize logger thread
Start();
}
public void Log(string msg) {
lock (this) {
for (int i = 0; i < _queue.Length; i++) {
// seek for the first available position on queue
if (_queue[i] == null) {
// insert pending log into queue position
_queue[i] = new LogObj(DateTime.Now, msg);
// notify logger thread for a pending log on the queue
Monitor.Pulse(this);
break;
}
// if there aren't any available positions on logging queue, this
// log is not considered and the thread returns
}
}
}
public void GetLog() {
lock (this) {
while(true) {
for (int i = 0; i < _queue.Length; i++) {
// seek all occupied positions on queue (those who have logs)
if (_queue[i] != null) {
// log
LogObj obj = (LogObj)_queue[i];
// makes this position available
_queue[i] = null;
// print log into output stream
_output.WriteLine(String.Format("[Thread #{0} | {1}ms] {2}",
Thread.CurrentThread.ManagedThreadId,
DateTime.Now.Subtract(obj._start).TotalMilliseconds,
obj.ToString()));
}
}
// after printing all pending log's (or if there aren't any pending log's),
// the thread waits until another log arrives
//Monitor.Wait(this, _timeOut);
Monitor.Wait(this);
}
}
}
// Starts logger thread activity
public void Start() {
// Create the thread object, passing in the Logger.Start method
// via a ThreadStart delegate. This does not start the thread.
_loggerThread = new Thread(this.GetLog);
_loggerThread.Priority = ThreadPriority.Lowest;
_loggerThread.Start();
}
// Stops logger thread activity
public void Stop() {
_loggerThread.Abort();
_loggerThread = null;
}
// Increments number of attended log requests
public void IncReq() { reqNr++; }
}
Basically, here are the main points of this code:
Start a low priority thread that loops the logging queue and prints pending logs to the output. After this, the thread is suspended till new log arrives;
When a log arrives, the logger thread is awaken and does it's work.
Is this solution thread-safe? I have been reading Producers-Consumers problem and solution algorithm, but in this problem although I have multiple producers, I only have one reader.
It seems it should be working. Producers-Consumers shouldn't change greatly in case of single consumer. Little nitpicks:
acquiring lock may be an expensive operation (as #Vitaliy Lipchinsky says). I'd recommend to benchmark your logger against naive 'write-through' logger and logger using interlocked operations. Another alternative would be exchanging existing queue with empty one in GetLog and leaving critical section immediately. This way none of producers won't be blocked by long operations in consumers.
make LogObj reference type (class). There's no point in making it struct since you are boxing it anyway. or else make _queue field to be of type LogObj[] (that's better anyway).
make your thread background so that it won't prevent closing your program if Stop won't be called.
Flush your TextWriter. Or else you are risking to lose even those records that managed to fit queue (10 items is a bit small IMHO)
Implement IDisposable and/or finalizer. Your logger owns thread and text writer and those should be freed (and flushed - see above).
While it appears to be thread-safe, I don't believe it is particularly optimal. I would suggest a solution along these lines
NOTE: just read the other responses. What follows is a fairly optimal, optimistic locking solution based on your own. Major differences is locking on an internal class, minimizing 'critical sections', and providing graceful thread termination. If you want to avoid locking altogether, then you can try some of that volatile "non-locking" linked list stuff as #Vitaliy Lipchinsky suggests.
using System.Collections.Generic;
using System.Linq;
using System.Threading;
...
public class Logger
{
// BEST PRACTICE: private synchronization object.
// lock on _syncRoot - you should have one for each critical
// section - to avoid locking on public 'this' instance
private readonly object _syncRoot = new object ();
// synchronization device for stopping our log thread.
// initialized to unsignaled state - when set to signaled
// we stop!
private readonly AutoResetEvent _isStopping =
new AutoResetEvent (false);
// use a Queue<>, cleaner and less error prone than
// manipulating an array. btw, check your indexing
// on your array queue, while starvation will not
// occur in your full pass, ordering is not preserved
private readonly Queue<LogObj> _queue = new Queue<LogObj>();
...
public void Log (string message)
{
// you want to lock ONLY when absolutely necessary
// which in this case is accessing the ONE resource
// of _queue.
lock (_syncRoot)
{
_queue.Enqueue (new LogObj (DateTime.Now, message));
}
}
public void GetLog ()
{
// while not stopping
//
// NOTE: _loggerThread is polling. to increase poll
// interval, increase wait period. for a more event
// driven approach, consider using another
// AutoResetEvent at end of loop, and signal it
// from Log() method above
for (; !_isStopping.WaitOne(1); )
{
List<LogObj> logs = null;
// again lock ONLY when you need to. because our log
// operations may be time-intensive, we do not want
// to block pessimistically. what we really want is
// to dequeue all available messages and release the
// shared resource.
lock (_syncRoot)
{
// copy messages for local scope processing!
//
// NOTE: .Net3.5 extension method. if not available
// logs = new List<LogObj> (_queue);
logs = _queue.ToList ();
// clear the queue for new messages
_queue.Clear ();
// release!
}
foreach (LogObj log in logs)
{
// do your thang
...
}
}
}
}
...
public void Stop ()
{
// graceful thread termination. give threads a chance!
_isStopping.Set ();
_loggerThread.Join (100);
if (_loggerThread.IsAlive)
{
_loggerThread.Abort ();
}
_loggerThread = null;
}
Actually, you ARE introducing locking here. You have locking while pushing a log entry to the queue (Log method): if 10 threads simultaneously pushed 10 items into queue and woke up the Logger thread, then 11th thread will wait until the logger thread log all items...
If you want something really scalable - implement lock-free queue (example is below). With lock-free queue synchronization mechanism will be really straightaway (you can even use single wait handle for notifications).
If you won't manage to find lock-free queue implementation in the web, here is an idea how to do this:
Use linked list for an implementation. Each node in linked list contains a value and a volatile reference to the next node. therefore for operations enqueue and dequeue you can use Interlocked.CompareExchange method. I hope, the idea is clear. If not - let me know and I'll provide more details.
I'm just doing a thought experiment here, since I don't have time to actually try code right now, but I think you can do this without locks at all if you're creative.
Have your logging class contain a method that allocates a queue and a semaphore each time it's called (and another that deallocates the queue and semaphore when the thread is done). The threads that want to do logging will call this method when they start. When they want to log, they push the message onto their own queue and set the semaphore. The logger thread has a big loop that runs through the queues and checks the associated semaphores. If the semaphore associated with the queue is greater than zero, then the queue gets popped off and the semaphore decremented.
Because you're not attempting to pop things off the queue until after the semaphore is set, and you're not setting the semaphore until after you've pushed things onto the queue, I think this will be safe. According to the MSDN documentation for the queue class, if you are enumerating the queue and another thread modifies the collection, an exception is thrown. Catch that exception and you should be good.

Categories