Writing to file in a thread safe manner - c#

Writing a StringBuilder to a file asynchronously. This code takes control of a file, writes a stream to it and releases it. It deals with requests from asynchronous operations, which may come in at any time.
The FilePath is set per class instance (so the lock Object is per instance), but there is potential for conflict since these classes may share FilePaths. That sort of conflict, as well as all other types from outside the class instance, would be dealt with by retries.
Is this code suitable for its purpose? Is there a better way to handle this that means less (or no) reliance on the catch and retry mechanic?
Also, how do I avoid catching exceptions that have occurred for other reasons?
public string Filepath { get; set; }
private Object locker = new Object();

public async Task WriteToFile(StringBuilder text)
{
    int timeOut = 100;
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    while (true)
    {
        try
        {
            //Wait for resource to be free
            lock (locker)
            {
                using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
                using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
                {
                    writer.Write(text.ToString());
                }
            }
            break;
        }
        catch
        {
            //File not available, conflict with other class instances or application
        }
        if (stopwatch.ElapsedMilliseconds > timeOut)
        {
            //Give up.
            break;
        }
        //Wait and Retry
        await Task.Delay(5);
    }
    stopwatch.Stop();
}

How you approach this is going to depend a lot on how frequently you're writing. If you're writing a relatively small amount of text fairly infrequently, then just use a static lock and be done with it. That might be your best bet in any case because the disk drive can only satisfy one request at a time. Assuming that all of your output files are on the same drive (perhaps not a fair assumption, but bear with me), there's not going to be much difference between locking at the application level and the lock that's done at the OS level.
So if you declare locker as:
static object locker = new object();
You'll be assured that there are no conflicts with other threads in your program.
If you want this thing to be bulletproof (or at least reasonably so), you can't get away from catching exceptions. Bad things can happen. You must handle exceptions in some way. What you do in the face of error is something else entirely. You'll probably want to retry a few times if the file is locked. If you get a bad path or filename error or disk full or any of a number of other errors, you probably want to kill the program. Again, that's up to you. But you can't avoid exception handling unless you're okay with the program crashing on error.
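As a rough sketch of "retry when the file is busy, fail on everything else", the retry loop can catch only IOException and skip the subclasses that indicate a bad path. The names RetryingFileWriter, TryAppendAsync and IsProbablyTransient are made up for this illustration, and the code assumes C# 6 or later for the exception filter. A full disk also surfaces as a plain IOException, so it would still be retried here until the attempts run out.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class RetryingFileWriter
{
    // One lock for the whole process (static), as suggested above.
    private static readonly object Locker = new object();

    // Returns true if the text was written, false if the file stayed busy
    // for every attempt. Exceptions that are not treated as transient
    // (bad path, access denied, ...) propagate to the caller.
    public static async Task<bool> TryAppendAsync(
        string path, string text, int maxAttempts = 5, int delayMs = 5)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                lock (Locker)
                {
                    File.AppendAllText(path, text, Encoding.Unicode);
                }
                return true;
            }
            catch (IOException ex) when (IsProbablyTransient(ex))
            {
                // Most likely a sharing violation: another process or
                // stream has the file open. Fall through and retry.
            }

            if (attempt < maxAttempts)
            {
                await Task.Delay(delayMs);
            }
        }
        return false;
    }

    // Assumption for this sketch: path-related IOException subclasses
    // will never succeed on retry, so they are not treated as transient.
    private static bool IsProbablyTransient(IOException ex)
    {
        return !(ex is DirectoryNotFoundException
              || ex is DriveNotFoundException
              || ex is FileNotFoundException
              || ex is PathTooLongException);
    }
}
```

A caller would check the returned bool and decide what to do when the file never became available, instead of silently giving up.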
By the way, you can replace all of this code:
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
writer.Write(text.ToString());
}
With a single call:
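File.AppendAllText(Filepath, text.ToString(), Encoding.Unicode);
File.AppendAllText has been available since .NET 2.0. Pass the encoding explicitly as shown, because the overload without it writes UTF-8 rather than the UTF-16 that your StreamWriter uses. See File.AppendAllText.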
One other way you could handle this is to have the threads write their messages to a queue, and have a dedicated thread that services that queue. You'd have a BlockingCollection of messages and associated file paths. For example:
class LogMessage
{
public string Filepath { get; set; }
public string Text { get; set; }
}
BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>();
Your threads write data to that queue:
_logMessages.Add(new LogMessage { Filepath = "foo.log", Text = "this is a test" });
You start a long-running background task that does nothing but service that queue:
foreach (var msg in _logMessages.GetConsumingEnumerable())
{
// of course you'll want your exception handling in here
File.AppendAllText(msg.Filepath, msg.Text);
}
Your potential risk here is that threads create messages too fast, causing the queue to grow without bound because the consumer can't keep up. Whether that's a real risk in your application is something only you can say. If you think it might be a risk, you can put a maximum size (number of entries) on the queue so that if the queue size exceeds that value, producers will wait until there is room in the queue before they can add.
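Putting the pieces together, a bounded version of the queue might look like the sketch below. The class name QueuedFileLogger, the default limit of 1000 entries and the decision to swallow IOException in the consumer are assumptions made for the example, not part of the answer above.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class LogMessage
{
    public string Filepath { get; set; }
    public string Text { get; set; }
}

public sealed class QueuedFileLogger : IDisposable
{
    private readonly BlockingCollection<LogMessage> _logMessages;
    private readonly Task _consumer;

    public QueuedFileLogger(int maxQueuedMessages = 1000)
    {
        // Bounded capacity: Add() blocks producers while the queue is full.
        _logMessages = new BlockingCollection<LogMessage>(maxQueuedMessages);
        _consumer = Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
    }

    // Called from any thread; only enqueues, never touches the disk.
    public void Write(string filepath, string text)
    {
        _logMessages.Add(new LogMessage { Filepath = filepath, Text = text });
    }

    // Runs on the single dedicated consumer, so writes never overlap
    // within this process.
    private void Consume()
    {
        foreach (var msg in _logMessages.GetConsumingEnumerable())
        {
            try
            {
                File.AppendAllText(msg.Filepath, msg.Text);
            }
            catch (IOException)
            {
                // Placeholder: a real logger would retry or report this.
            }
        }
    }

    // Stops accepting messages, drains what is queued, then cleans up.
    public void Dispose()
    {
        _logMessages.CompleteAdding();
        _consumer.Wait();
        _logMessages.Dispose();
    }
}
```

Disposing the logger at shutdown matters here: without CompleteAdding the consumer never leaves its loop, and messages still in the queue could be lost when the process exits.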

You could also use ReaderWriterLock; it is considered a more 'appropriate' way to control thread safety when dealing with read/write operations.
To debug my web apps (when remote debugging fails) I use the following ('debug.txt' ends up in the \bin folder on the server):
public static class LoggingExtensions
{
static ReaderWriterLock locker = new ReaderWriterLock();
public static void WriteDebug(string text)
{
try
{
locker.AcquireWriterLock(int.MaxValue);
System.IO.File.AppendAllLines(Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", ""), "debug.txt"), new[] { text });
}
finally
{
locker.ReleaseWriterLock();
}
}
}
Hope this saves you some time.

Related

Overwrite instance of BinaryWriter C#

I have a class that needs to keep an instance of a BinaryWriter open over several write function calls (data is packet based). It also has to create a new file once it has written a certain amount of data/packets.
Normally I would just close the BinaryWriter and reinstantiate it with a new file path, but the overhead associated with that operation is too great for my application. I tried closing the writer in a separate thread, but that interferes with the new instance I create later.
My last-ditch attempt was not to close the writer (and stream) at all, and simply create a new instance of it every time I'd written the required packets. This seems to work, and doesn't cause any memory leaks, but I'd really like to know what goes on if you do this.
Here is my (simplified) code to illustrate:
class Writer
{
    BinaryWriter binWriter;
    readonly string filepath;
    long bytesWritten;
    int filesWritten;
    const long maxFileSize = 10000000000; // 10E9 bytes, too large for an int

    public Writer(string filepath)
    {
        this.filepath = filepath;
        binWriter = new BinaryWriter(File.Open(filepath, FileMode.Create));
        bytesWritten = 0;
        filesWritten = 0;
    }

    public void WritePacket(byte[] packet)
    {
        if (bytesWritten < maxFileSize)
        {
            binWriter.Write(packet);
            bytesWritten += packet.Length;
        }
        else
        {
            // this is where I'd normally call Dispose(), but the overhead
            // is too high, and disposing the stream in a separate thread
            // interferes with the new one
            // what actually happens here? it's the only thing I've found to
            // work...
            filesWritten++;
            binWriter = new BinaryWriter(File.Open(filepath + filesWritten, FileMode.Create));
        }
    }
}
It feels bad, but this is the only solution that works so far. Any insight would be great!

System.IO.IOException: The process cannot access the file '.txt' because it is being used by another process

I am using the following code to log errors of a web application.
using (StreamWriter myStream = new StreamWriter(sLogFilePath, true))
{
myStream.WriteLine(string.Format("{0, -45}{1, -25}{2, -10}{3}", guid, DateTime.Now, StringEnum.GetStringValue(enumMsg), sText));
}
Sometimes, the following exception 'System.IO.IOException: The process cannot access the file '.txt' because it is being used by another process.' is thrown.
I think this is caused by multiple instances of the web app at the same time. Can you help me fix this problem, please ?
EDIT: I have to add that for every method I log like this:
Date - Method X started.
Date - Exception.Message (table not found or other errors)
Date - Method X stopped.
and when this error appears, only this is logged:
Date - System.IO.IOException: The process cannot access the file '.txt' because it is being used by another process.
Sadly Windows does not allow waiting on a file lock. In order to get around this all your applications will have to create a lock that all the processes involved can check.
The use of this code will only prevent threads within a single process from accessing the file:
/* Suitable for a single process but fails with multiple processes */
private static object lockObj = new Object();
lock (lockObj)
{
using (StreamWriter myStream = new StreamWriter(sLogFilePath, true))
{
myStream.WriteLine(string.Format("{0, -45}{1, -25}{2, -10}{3}", guid, DateTime.Now, StringEnum.GetStringValue(enumMsg), sText));
}
}
In order to lock across multiple processes, a Mutex is required. This gives the lock a name that other processes can check for. It works like this:
/* Suitable for multiple processes on the same machine but fails for
multiple processes on multiple machines */
// false: do not take ownership in the constructor; WaitOne() below acquires it
using (Mutex myMutex = new Mutex(false, "Some name that is unlikely to clash with other mutexes"))
{
    myMutex.WaitOne();
    try
    {
        using (StreamWriter myStream = new StreamWriter(sLogFilePath, true))
        {
            myStream.WriteLine(string.Format("{0, -45}{1, -25}{2, -10}{3}", guid, DateTime.Now, StringEnum.GetStringValue(enumMsg), sText));
        }
    }
    finally
    {
        myMutex.ReleaseMutex();
    }
}
I don't think Mutexes can be accessed from remote machines, so if you have a file on a file share and you are trying to write to it from processes on multiple machines, then you are probably better off writing a server component on the machine that hosts the file to mediate between the processes.
Your web server will run requests in multiple threads. If two or more requests have to log exceptions at the same time, this will lead to the exception you see.
You could either lock the section as James proposed, or you could use a logging framework that will handle multithreading issues for you, for example log4net or NLog.
Assuming you need each thread to eventually write to the log, you could lock the critical section
private static object fileLock = new Object();
...
lock (fileLock)
{
using (StreamWriter myStream = new StreamWriter(sLogFilePath, true))
{
myStream.WriteLine(string.Format("{0, -45}{1, -25}{2, -10}{3}", guid, DateTime.Now, StringEnum.GetStringValue(enumMsg), sText));
}
}
This means only 1 thread at any given time can be writing to the file, other threads are blocked until the current thread has exited the critical section (at which point the file lock will have been removed).
One thing to note here is that lock works per process; therefore, if your site is running in the context of a web farm/garden, you would need to look at a system-wide locking mechanism, i.e. Mutexes.
I've added this code to my class:
public static bool IsFileLocked(FileInfo file)
{
FileStream stream = null;
try
{
stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
}
catch
{
return true;
}
finally
{
if (stream != null)
{
stream.Close();
}
}
return false;
}
and now my LogToFile method is like this:
while (IsFileLocked(fi))
{
}
using (StreamWriter myStream = new StreamWriter(sLogFilePath, true))
{
if (displayTime == true)
myStream.WriteLine(string.Format("{0, -45}{1, -25}{2, -10}{3}", guid, DateTime.Now, StringEnum.GetStringValue(enumMsg), sText));
else
myStream.WriteLine(string.Format("{0, -70}{1, -10}{2} ", guid, StringEnum.GetStringValue(enumMsg), sText));
}
I hope this will work.

Memory Mapped File gets deleted from memory

For some reason, when I read from a memory mapped file a couple of times, it just gets randomly deleted from memory and I don't know what's going on. Is the kernel or GC deleting it from memory? If they are, how do I prevent them from doing so?
I am serializing an object to JSON and writing it to memory.
After a few successful reads, the next read throws FileNotFoundException: Unable to find the specified file.
private const String Protocol = @"Global\";
Code to write to memory mapped file:
public static Boolean WriteToMemoryFile<T>(List<T> data)
{
try
{
if (data == null)
{
throw new ArgumentNullException("data", "Data cannot be null");
}
var mapName = typeof(T).FullName.ToLower();
var mutexName = Protocol + typeof(T).FullName.ToLower();
var serializedData = JsonConvert.SerializeObject(data);
var capacity = serializedData.Length + 1;
var mmf = MemoryMappedFile.CreateOrOpen(mapName, capacity);
var isMutexCreated = false;
var mutex = new Mutex(true, mutexName, out isMutexCreated);
if (!isMutexCreated)
{
var isMutexOpen = false;
do
{
isMutexOpen = mutex.WaitOne();
}
while (!isMutexOpen);
var streamWriter = new StreamWriter(mmf.CreateViewStream());
streamWriter.WriteLine(serializedData);
streamWriter.Close();
mutex.ReleaseMutex();
}
else
{
var streamWriter = new StreamWriter(mmf.CreateViewStream());
streamWriter.WriteLine(serializedData);
streamWriter.Close();
mutex.ReleaseMutex();
}
return true;
}
catch (Exception ex)
{
return false;
}
}
Code to read from memory mapped file:
public static List<T> ReadFromMemoryFile<T>()
{
try
{
var mapName = typeof(T).FullName.ToLower();
var mutexName = Protocol + typeof(T).FullName.ToLower();
var mmf = MemoryMappedFile.OpenExisting(mapName);
var mutex = Mutex.OpenExisting(mutexName);
var isMutexOpen = false;
do
{
isMutexOpen = mutex.WaitOne();
}
while (!isMutexOpen);
var streamReader = new StreamReader(mmf.CreateViewStream());
var serializedData = streamReader.ReadLine();
streamReader.Close();
mutex.ReleaseMutex();
var data = JsonConvert.DeserializeObject<List<T>>(serializedData);
mmf.Dispose();
return data;
}
catch (Exception ex)
{
return default(List<T>);
}
}
The process that created the memory mapped file must keep a reference to it for as long as you want it to live. Using CreateOrOpen is a bit tricky for exactly this reason - you don't know whether disposing the memory mapped file is going to destroy it or not.
You can easily see this at work by adding an explicit mmf.Dispose() to your WriteToMemoryFile method - it will close the file completely. The Dispose method is called from the finalizer of the mmf instance some time after all the references to it drop out of scope.
Or, to make it even more obvious that GC is the culprit, you can try invoking GC explicitly:
WriteToMemoryFile("Hi");
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
ReadFromMemoryFile().Dump(); // Nope, the value is lost now
Note that I changed your methods slightly to work with simple strings; you really want to produce the simplest possible code that reproduces the behaviour you observe. Even just having to get JsonConverter is an unnecessary complication, and might cause people to not even try running your code :)
And as a side note, you want to check for AbandonedMutexException when you're doing Mutex.WaitOne - it's not a failure, it means you took over the mutex. Most applications handle this wrong, leading to issues with deadlocks as well as mutex ownership and lifetime :) In other words, treat AbandonedMutexException as success. Oh, and it's a good idea to put stuff like Mutex.ReleaseMutex in a finally clause, to make sure it actually happens, even if you get an exception. A dead thread or process doesn't matter (that will just cause one of the other contenders to get AbandonedMutexException), but if you just get an exception that you "handle" with your return false;, the mutex will not be released until you close all your applications and start again fresh :)
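A minimal sketch of that advice, with AbandonedMutexException treated as a successful acquire and the release in a finally block, could look like this. NamedMutexGuard and Run are hypothetical names chosen for the example, and the mutex name is whatever the cooperating processes agree on.

```csharp
using System;
using System.Threading;

public static class NamedMutexGuard
{
    // Runs the action while holding the named, system-wide mutex.
    public static void Run(string mutexName, Action action)
    {
        // false: do not request ownership here; WaitOne() acquires it.
        using (var mutex = new Mutex(false, mutexName))
        {
            try
            {
                mutex.WaitOne();
            }
            catch (AbandonedMutexException)
            {
                // The previous owner exited without releasing.
                // This thread owns the mutex now, so carry on; the data
                // the mutex protects may be in an inconsistent state.
            }

            try
            {
                action();
            }
            finally
            {
                // Always release, even when the action throws.
                mutex.ReleaseMutex();
            }
        }
    }
}
```

With a helper like this, the read and write methods would pass their stream work as the action, and an exception inside it would no longer leave the mutex held.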
Clearly, the problem is that the MMF loses its context, as explained by Luaan. But still nobody has explained how to keep it alive:
The 'Write to MMF' code must run on a separate async thread.
The 'Read from MMF' code signals once it has finished reading. The notification can be a flag in a file, for example.
The async thread running the 'Write to MMF' code therefore keeps running until the second part has read the MMF. This creates the context within which the memory mapped file is valid.

Can this code have bottleneck or be resource-intensive?

It's code that will execute 4 threads in 15-min intervals. The last time that I ran it, the first 15-minutes were copied fast (20 files in 6 minutes), but the 2nd 15-minutes are much slower. It's something sporadic and I want to make certain that, if there's any bottleneck, it's in a bandwidth limitation with the remote server.
EDIT: I'm monitoring the last run and the 15:00 and :45 copied in under 8 minutes each. The :15 hasn't finished and neither has :30, and both began at least 10 minutes before :45.
Here's my code:
static void Main(string[] args)
{
Timer t0 = new Timer((s) =>
{
Class myClass0 = new Class();
myClass0.DownloadFilesByPeriod(taskRunDateTime, 0, cts0.Token);
Copy0Done.Set();
}, null, TimeSpan.FromMinutes(20), TimeSpan.FromMilliseconds(-1));
Timer t1 = new Timer((s) =>
{
Class myClass1 = new Class();
myClass1.DownloadFilesByPeriod(taskRunDateTime, 1, cts1.Token);
Copy1Done.Set();
}, null, TimeSpan.FromMinutes(35), TimeSpan.FromMilliseconds(-1));
Timer t2 = new Timer((s) =>
{
Class myClass2 = new Class();
myClass2.DownloadFilesByPeriod(taskRunDateTime, 2, cts2.Token);
Copy2Done.Set();
}, null, TimeSpan.FromMinutes(50), TimeSpan.FromMilliseconds(-1));
Timer t3 = new Timer((s) =>
{
Class myClass3 = new Class();
myClass3.DownloadFilesByPeriod(taskRunDateTime, 3, cts3.Token);
Copy3Done.Set();
}, null, TimeSpan.FromMinutes(65), TimeSpan.FromMilliseconds(-1));
}
public struct FilesStruct
{
public string RemoteFilePath;
public string LocalFilePath;
}
public void DownloadFilesByPeriod(DateTime TaskRunDateTime, int Period, CancellationToken token)
{
FilesStruct[] Array = GetAllFiles(TaskRunDateTime, Period);
//Array has 20 files for the specific period.
using (Session session = new Session())
{
// Connect
session.Open(sessionOptions);
TransferOperationResult transferResult;
foreach (FilesStruct u in Array)
{
if (session.FileExists(u.RemoteFilePath)) //File exists remotely
{
if (!File.Exists(u.LocalFilePath)) //File does not exist locally
{
transferResult = session.GetFiles(u.RemoteFilePath, u.LocalFilePath);
transferResult.Check();
foreach (TransferEventArgs transfer in transferResult.Transfers)
{
//Log that File has been transferred
}
}
else
{
using (StreamWriter w = File.AppendText(Logger._LogName))
{
//Log that File exists locally
}
}
}
else
{
using (StreamWriter w = File.AppendText(Logger._LogName))
{
//Log that File does not exist remotely
}
}
if (token.IsCancellationRequested)
{
break;
}
}
}
}
Something is not quite right here. First thing is, you're setting 4 timers to run parallel. If you think about it, there is no need. You don't need 4 threads running parallel all the time. You just need to initiate tasks at specific intervals. So how many timers do you need? ONE.
The second problem is why TimeSpan.FromMilliseconds(-1)? What is the purpose of that? I can't figure out why you put that in there, but I wouldn't.
The third problem, not related to multi-programming, but I should point out anyway, is that you create a new instance of Class each time, which is unnecessary. It would be necessary if, in your class, you need to set constructors and your logic access different methods or fields of the class in some order. In your case, all you want to do is to call the method. So you don't need a new instance of the class every time. You just need to make the method you're calling static.
Here is what I would do:
Store the files you need to download in an array / List<>. Can't you spot out that you're doing the same thing every time? Why write 4 different versions of code for that? This is unnecessary. Store items in an array, then just change the index in the call!
Setup the timer at perhaps 5 seconds interval. When it reaches the 20 min/ 35 min/ etc. mark, spawn a new thread to do the task. That way a new task can start even if the previous one is not finished.
Wait for all threads to complete (terminate). When they do, check if they throw exceptions, and handle them / log them if necessary.
After everything is done, terminate the program.
For step 2, you have the option to use the new async keyword if you're using .NET 4.5. But it won't make a noticeable difference if you use threads manually.
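The steps above can be sketched roughly as follows. Note that this swaps the 5-second polling timer for one Task.Delay per offset, which is a different mechanism with the same effect of starting each period's work at its mark without waiting for the previous one. PeriodScheduler, RunAllAsync and the work delegate are hypothetical names, and the sketch assumes .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodScheduler
{
    // Starts work(period, token) once per offset (measured from now) and
    // completes when every run has finished. Returns the exceptions thrown
    // by failed runs so the caller can log them.
    public static async Task<IReadOnlyList<Exception>> RunAllAsync(
        IReadOnlyList<TimeSpan> offsets,
        Action<int, CancellationToken> work,
        CancellationToken token)
    {
        var runs = new List<Task>();
        for (int i = 0; i < offsets.Count; i++)
        {
            // One piece of code for every period; only the index changes.
            runs.Add(RunOneAsync(offsets[i], i, work, token));
        }

        try
        {
            await Task.WhenAll(runs);
        }
        catch (Exception)
        {
            // WhenAll rethrows only the first failure; every failed
            // run is inspected individually below.
        }

        return runs.Where(t => t.IsFaulted)
                   .SelectMany(t => t.Exception.InnerExceptions)
                   .ToList();
    }

    private static async Task RunOneAsync(
        TimeSpan offset, int period,
        Action<int, CancellationToken> work, CancellationToken token)
    {
        await Task.Delay(offset, token);                  // wait for the mark
        await Task.Run(() => work(period, token), token); // run on a pool thread
    }
}
```

In the program above, the offsets would be 20, 35, 50 and 65 minutes and the work delegate would call DownloadFilesByPeriod for the given period.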
And why is it so slow...why don't you check your system status using task manager? Is the CPU high and running or is the network throughput occupied by something else or what? You can easily tell the answer yourself from there.
The problem was the sftp client.
The purpose of the console application was to loop through a List<> and download the files. I tried WinSCP and, even though it did the job, it was very slow. I also tested SharpSSH and it was even slower than WinSCP.
I finally ended up using SSH.NET which, at least in my particular case, was much faster than both WinSCP and SharpSSH. I think the problem with WinSCP is that there was no evident way of disconnecting after I was done. With SSH.NET I could connect/disconnect after every file download was made, something I couldn't do with WinSCP.

Performance issues using task factory in C#

I am working on a logging system for a web application which logs a sequence of events in a dictionary object before sending it to my logging object using Task.Factory.StartNew(() => iLogEventSave()). The logger seemed to work fine, but in some instances some events were not being saved properly so I used the lock() statement to correct the issue. This seemed to do the trick, but the application's performance has dramatically decreased by doing this. How can I have the UI/Page render without having to wait for the Tasks to finish their job?
Below is the code
private static readonly object Locker = new object();
public void iLogEventSave(object state)
{
XmlDocument doc = new XmlDocument();
IDictionary<string, string> EventDetails = (IDictionary<string, string>)state;
string logFile = "";
if(ConfigurationManager.AppSettings["Log_File_Path"].ToString() =="")
{
logFile = HttpRuntime.AppDomainAppPath + "Logs\\" + DateTime.Now.ToString("yyyy_MM_dd") + ".txt";
}
else
{
logFile = ConfigurationManager.AppSettings["Log_File_Path"].ToString() + DateTime.Now.ToString("yyyy_MM_dd") + ".txt";
}
lock (Locker)
{
if (File.Exists(logFile))
{
doc.Load(logFile);
}
else
{
var root = doc.CreateElement("Log");
doc.AppendChild(root);
}
var el = (XmlElement)doc.DocumentElement.AppendChild(doc.CreateElement("Event"));
foreach (KeyValuePair<string, string> item in EventDetails)
{
XmlElement Desc = doc.CreateElement("Details");
Desc.SetAttribute(item.Key.ToString(), item.Value);
el.AppendChild(Desc);
}
doc.Save(logFile);
}
}
If your log did not save several events while being executed asynchronously, you have an unhandled error that you did not address. Considering that you're using a file, I'm going to go out on a limb and say that it failed because two threads were competing for access to the same log file and the first thread to grab it locked the other one out. This is why your lock now works: it prevents other threads from trying to grab the file at the same time.
But logging to a file means that you've effectively restricted yourself to one thread at a time and dealing with the entire file as it grows. You have to load more and more, append more and more, and locking the thread means that the more threads are waiting to log the events, the higher your overhead. All this could certainly add up to a decrease in performance.
May I recommend using a database table to log events? File I/O is very expensive, resource and time-wise. Databases have less overhead and far better throughput by comparison in these very scenarios.
