I have three separate pieces of code running in separate threads.
Thread task 1: Reading data from a device and writing it into a ConcurrentDictionary.
Thread task 2: Writing the data in the ConcurrentDictionary to disk as a separate file.
I have read many posts on the forum saying that ConcurrentDictionary is safe to use from separate threads. I've also read that there are situations where locking is needed, which left me with several questions.
Is locking needed for a ConcurrentDictionary? In which cases is locking required? How should I do it if locking is required? What problems does using it in the following way cause?
Thread code 1: Data comes in every second.
public void FillModuleBuffer(byte[] buffer, string IpPort)
{
if (!CommunicationDictionary.dataLogList.ContainsKey(IpPort))
{
CommunicationDictionary.dataLogList.TryAdd(IpPort, buffer);
}
}
Thread code 2: The code below works in a timer. The timer duration is 200 ms.
if (CommunicationDictionary.dataLogList.ContainsKey(IpPort))
{
using (stream = File.Open(LogFilename, FileMode.Append, FileAccess.Write))
{
using (BinaryWriter writer = new BinaryWriter(stream))
{
writer.Write(CommunicationDictionary.dataLogList[IpPort]);
writer.Flush();
writer.Close();
CommunicationDictionary.dataLogList.TryRemove(IpPort,out _);
}
}
}
Note: the code has been simplified for clarity.
Note 2: I used a Dictionary before this and ran into a very different problem. After running for 2-3 hours, I got an error saying the index was outside the bounds of the array, even though there was no data in the Dictionary.
The example code should be more or less thread-safe, but it shows a misunderstanding of how the concurrent dictionary should be used. For example:
if (!CommunicationDictionary.dataLogList.ContainsKey(IpPort))
{
CommunicationDictionary.dataLogList.TryAdd(IpPort, buffer);
}
This happens to work because only one thread adds to the dictionary, but since these are separate statements, the dictionary may change between them. If you look at the documentation for TryAdd, you can see that it returns false if the key is already present, so there is no need for the ContainsKey call. ConcurrentDictionary has quite a few methods whose purpose is to do multiple things in one call, so that the entire operation is atomic.
The same goes for the reading thread. All accesses to the ConcurrentDictionary should be replaced with a single call to TryRemove:
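To illustrate the combined operations mentioned above, here is a minimal sketch. The BufferStore class and its method names are hypothetical and not part of the question's code; only TryAdd, AddOrUpdate and TryRemove are actual ConcurrentDictionary methods.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical wrapper, for illustration only.
public static class BufferStore
{
    private static readonly ConcurrentDictionary<string, byte[]> Buffers =
        new ConcurrentDictionary<string, byte[]>();

    // Adds the buffer only if the key is absent; returns false otherwise.
    // No ContainsKey check is needed because TryAdd is atomic.
    public static bool AddIfAbsent(string ipPort, byte[] buffer)
    {
        return Buffers.TryAdd(ipPort, buffer);
    }

    // Adds the buffer, or replaces the existing one, in a single call.
    public static void AddOrReplace(string ipPort, byte[] buffer)
    {
        Buffers.AddOrUpdate(ipPort, buffer, (key, old) => buffer);
    }

    // Removes the buffer and hands it to the caller in one atomic step.
    public static bool TryTake(string ipPort, out byte[] buffer)
    {
        return Buffers.TryRemove(ipPort, out buffer);
    }
}
```

Which of these fits depends on whether a newer buffer should overwrite an unsaved one (AddOrReplace) or be dropped (AddIfAbsent); the question's code does the latter.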
if (CommunicationDictionary.dataLogList.TryRemove(IpPort,out var data))
{
using (stream = File.Open(LogFilename, FileMode.Append, FileAccess.Write))
{
using (BinaryWriter writer = new BinaryWriter(stream))
{
writer.Write(data);
writer.Flush();
writer.Close();
}
}
}
Note that this will save some data chunks and throw away others, with no hard guarantee about which chunks are saved. This might be the intended behavior, but it is more common to use a queue, which ensures that all data is saved. A typical implementation wraps a ConcurrentQueue in a BlockingCollection, with one or more producing threads and one consuming thread. This avoids the need for a separate timer.
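A minimal sketch of that producer/consumer arrangement follows. The QueuedFileLogger class and its members are hypothetical names chosen for illustration; it assumes every chunk should be appended to one log file in arrival order.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical logger, for illustration only.
public sealed class QueuedFileLogger : IDisposable
{
    // BlockingCollection uses a ConcurrentQueue by default; it is passed
    // explicitly here to make the FIFO ordering obvious.
    private readonly BlockingCollection<byte[]> _queue =
        new BlockingCollection<byte[]>(new ConcurrentQueue<byte[]>());
    private readonly Task _consumer;

    public QueuedFileLogger(string logFilename)
    {
        // A single long-running consumer; no timer is required because
        // GetConsumingEnumerable blocks until data is available.
        _consumer = Task.Factory.StartNew(() =>
        {
            using (var stream = File.Open(logFilename, FileMode.Append, FileAccess.Write))
            {
                foreach (var chunk in _queue.GetConsumingEnumerable())
                {
                    stream.Write(chunk, 0, chunk.Length);
                    stream.Flush();
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called by any producer thread.
    public void Enqueue(byte[] buffer)
    {
        _queue.Add(buffer);
    }

    // Signals that no more data will arrive, then waits for the
    // consumer to drain the queue and close the file.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

The device-reading thread would call Enqueue once per received buffer, and the owner would call Dispose on shutdown so that pending chunks are written before the file is closed.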
So I have 16 threads that simultaneously run this method:
private void Work()
{
int currentByte;
char currentChar;
try
{
while (true)
{
position++;
currentByte = file.ReadByte();
currentChar = Convert.ToChar(currentByte);
entries.Add(new Entry(currentChar));
}
}
catch (Exception) { }
}
And then I have one more thread running this method:
private void ManageThreads()
{
bool done;
for(; ; )
{
done = !threads.Any(x => x.IsAlive == true);//Check if each thread is dead before continuing
if (done)
break;
else
Thread.Sleep(100);
}
PrintData();
}
Here is the problem: the PrintData method just prints everything in the 'entries' list to a text file. This text file is different every time the program is run even with the same input file. I am a bit of a noob when it comes to multi-threaded applications so feel free to dish out the criticism.
In general, unless a type explicitly calls out thread safety in its documentation, you should assume it is not thread-safe*. Streams in .NET have no such section and should be treated as non-thread-safe: use appropriate synchronization (i.e. locks) that guarantees each stream is accessed from one thread at a time.
With file streams there is another concern: the OS-level file object may be updated from other threads. FileStream tries to mitigate this by checking whether its internal state matches the OS state; see the remarks section for FileStream on MSDN.
If you want a thread-safe stream, you can try the Synchronized method, as shown in C#, is there such a thing as a "thread-safe" stream?.
Note that the code in the post will produce random results whether the stream is thread-safe or not. Thread safety of a stream only guarantees that all bytes show up in the output. With a non-thread-safe stream there are no guarantees at all: some bytes may show up multiple times, some may be skipped, and any other behavior (crashes, partial reads, ...) is possible.
* Thread-safe as in "the internal state of the instance will be consistent whether it is called from one thread or multiple". It does not mean that calling arbitrary methods from different threads will lead to useful behavior.
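As a rough sketch of the synchronization described above: the SharedStreamReader class below is hypothetical, and it assumes the goal is for the entries to end up in stream order. To get that, the read and the list insertion must happen under the same lock; locking the read alone would not be enough.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical reader, for illustration only.
public sealed class SharedStreamReader
{
    private readonly object _sync = new object();
    private readonly Stream _stream;
    private readonly List<char> _entries = new List<char>();

    public SharedStreamReader(Stream stream)
    {
        _stream = stream;
    }

    // Worker loop: the read and the Add happen under the same lock, so
    // entries end up in stream order regardless of thread scheduling.
    private void Work()
    {
        while (true)
        {
            lock (_sync)
            {
                int currentByte = _stream.ReadByte();
                if (currentByte < 0) return; // end of stream, no exception needed
                _entries.Add(Convert.ToChar(currentByte));
            }
        }
    }

    public List<char> ReadAll(int threadCount)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var t = new Thread(Work);
            threads.Add(t);
            t.Start();
        }
        // Join replaces the polling loop over IsAlive.
        foreach (var t in threads) t.Join();
        return _entries;
    }
}
```

Because all the work happens inside the lock, the threads run one at a time and give no speed-up over a single thread; the sketch only shows what it takes to make the output deterministic.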
I'm writing a Stringbuilder to file
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
writer.Write(text.ToString());
}
This is equivalent (I think):
File.AppendAllText(Filepath, text.ToString());
Obviously, in a multi-threaded environment these statements on their own would cause write failures when they collide.
I've put a lock on this code, but that isn't ideal, as it's too expensive and may exacerbate this bottleneck. Is there some other way of making one thread's file access block another's? I've been told "blocking, not locking". I thought lock did block, but they must be hinting at a cheaper way of preventing simultaneous use of the file system.
How do I block execution in a less time-expensive manner?
You can't have multiple threads write to the same file simultaneously, so there is no such "bottleneck". A lock makes perfect sense for this scenario. If you are concerned about it being expensive, just add the writes to a queue and let a single thread manage writing them to the file.
Pseudo code
public static readonly Object logsLock = new Object();
// any thread
lock(logsLock)
{
logs.Add(stringBuilderText);
}
// dedicated thread to writing
lock(logsLock)
{
// ideally, this should be a "get in, get out" situation,
// where you only need to make a copy of the logs, then exit the lock,
// then write them, then lock the logsLock again, and remove only the logs
// you successfully wrote to the file, then exit the lock again.
logs.ForEach(writeLogToFile);
}
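The "get in, get out" idea from the comments above can be sketched as follows. This is a variation rather than the exact procedure described: instead of copying and later removing the written entries, it swaps the whole list under the lock. The LogBuffer class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical buffer, for illustration only.
public sealed class LogBuffer
{
    private readonly object _logsLock = new object();
    private List<string> _logs = new List<string>();

    // Any thread: the lock is held only for the duration of List.Add.
    public void Add(string text)
    {
        lock (_logsLock)
        {
            _logs.Add(text);
        }
    }

    // Dedicated writer thread: swap the list under the lock, then do the
    // slow file I/O outside it so producers are never blocked by the disk.
    // Returns the number of entries written.
    public int Flush(string path)
    {
        List<string> pending;
        lock (_logsLock)
        {
            pending = _logs;
            _logs = new List<string>();
        }
        if (pending.Count > 0)
        {
            File.AppendAllLines(path, pending);
        }
        return pending.Count;
    }
}
```

One trade-off of the swap: if the write throws, the pending entries are lost unless the caller re-adds them, whereas the copy-then-remove approach keeps them in the list until they have been written.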
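You can lock the stream using the FileStream.Lock method, which prevents other processes from reading from or writing to the locked range of the file.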
http://msdn.microsoft.com/en-us/library/system.io.filestream.lock.aspx
I have a process that needs to read and write to a file. The application has a specific order to its reads and writes, and I want to preserve this order. What I would like to do is implement something that lets the first operation start and makes the second operation wait until the first is done, with a first-come, first-served kind of queue for access to the file. From what I have read, file locking seems like it might be what I am looking for, but I have not been able to find a very good example. Can anyone provide one?
Currently I am using a TextReader/Writer with .Synchronized but this is not doing what I hoped it would.
Sorry if this is a very basic question, threading gives me a headache :S
It should be as simple as this:
public static readonly object LockObj = new object();
public void AnOperation()
{
lock (LockObj)
{
using (var fs = File.Open("yourfile.bin", FileMode.Open)) // File.Open requires a FileMode
{
// do something with file
}
}
}
public void SomeOperation()
{
lock (LockObj)
{
using (var fs = File.Open("yourfile.bin", FileMode.Open)) // File.Open requires a FileMode
{
// do something else with file
}
}
}
Basically, define a lock object, then whenever you need to do something with your file, make sure you get a lock using the C# lock keyword. On reaching the lock statement, execution will block indefinitely until a lock has been obtained.
There are other constructs you can use for locking, but I find the lock keyword to be the most straightforward.
If you're using a current version of the .Net Framework, you can benefit from Task.ContinueWith.
If your units of work are logically always, "read some, then write some", the following expresses that intent succinctly and should scale:
string path = "file.dat";
// Start a reader task
var task = Task.Factory.StartNew(() => ReadFromFile(path));
// Continue with a writer task
task.ContinueWith(tt => WriteToFile(path));
// We're guaranteed that the read will occur before the write
// and that the write will occur once the read completes.
// We also can check the antecedent task's result (tt.Result in our
// example) for any special error logic we need.
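For a version that compiles on its own, here is a small sketch. The bodies of ReadFromFile and WriteToFile are hypothetical stand-ins (the answer does not define them), and the transform parameter is an assumption added to show the read result flowing into the write.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class ReadThenWrite
{
    // Stand-in for the answer's ReadFromFile.
    private static string ReadFromFile(string path)
    {
        return File.Exists(path) ? File.ReadAllText(path) : string.Empty;
    }

    // Stand-in for the answer's WriteToFile.
    private static void WriteToFile(string path, string text)
    {
        File.WriteAllText(path, text);
    }

    public static Task Run(string path, Func<string, string> transform)
    {
        Task<string> read = Task.Factory.StartNew(() => ReadFromFile(path));
        // OnlyOnRanToCompletion: skip the write if the read threw.
        return read.ContinueWith(
            tt => WriteToFile(path, transform(tt.Result)),
            TaskContinuationOptions.OnlyOnRanToCompletion);
    }
}
```

A caller can wait on the returned task, for example ReadThenWrite.Run(path, s => s.ToUpper()).Wait(), or chain further continuations onto it to keep later operations in order.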
Say the method below is being called several thousand times by different threads in a .NET 4 application. What's the best way to handle this situation? I understand that the disk is the bottleneck here, but I'd like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
public void WriteFile(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
If you want to return quickly and don't mind the operation no longer being synchronous, you could create an in-memory queue for write requests. As long as the queue is not full, the method can return quickly. Another thread is responsible for draining the queue and writing the files. If WriteFile is called while the queue is full, the caller has to wait until there is room, and execution becomes synchronous again. With a large enough buffer, though, this is an improvement when write requests arrive in bursts (spikes separated by pauses) rather than at a steady rate.
UPDATE:
I made a little picture for you. Notice that the bottleneck always exists; all you can do is smooth out the requests by using a queue. The queue has a limit, so when it fills up you cannot instantly queue more files and have to wait until there is free space in the buffer. But in the situation shown in the picture (3 bucket requests) you can quickly put the buckets into the queue and return, whereas in the first case you have to handle them one by one and block execution.
Notice that you never need to run many I/O threads at once, since they all share the same bottleneck and you would just waste memory by parallelizing heavily. I believe 2 to 10 threads at most will easily use all the available I/O bandwidth, and will limit the application's memory usage too.
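A bounded queue of this kind could look roughly like the sketch below. The BoundedFileWriter class, its capacity and workerCount parameters, and the use of byte[] instead of MemoryStream are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical writer, for illustration only.
public sealed class BoundedFileWriter : IDisposable
{
    private readonly BlockingCollection<KeyValuePair<string, byte[]>> _requests;
    private readonly Task[] _workers;

    public BoundedFileWriter(int capacity, int workerCount)
    {
        // boundedCapacity makes Add block once 'capacity' requests are pending.
        _requests = new BlockingCollection<KeyValuePair<string, byte[]>>(capacity);
        _workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = Task.Factory.StartNew(() => Drain(), TaskCreationOptions.LongRunning);
        }
    }

    // Returns immediately while the queue has room; blocks when it is full.
    public void WriteFile(string fileName, byte[] data)
    {
        _requests.Add(new KeyValuePair<string, byte[]>(fileName, data));
    }

    // Each worker takes requests until CompleteAdding is called and the queue is empty.
    private void Drain()
    {
        foreach (var request in _requests.GetConsumingEnumerable())
        {
            try
            {
                File.WriteAllBytes(request.Key, request.Value);
            }
            catch (IOException e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    // Stops accepting requests and waits for the pending ones to be written.
    public void Dispose()
    {
        _requests.CompleteAdding();
        Task.WaitAll(_workers);
        _requests.Dispose();
    }
}
```

With more than one worker, two queued requests for the same file name may be written concurrently or out of order, so this only suits the case where each request targets a different file.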
Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task:
private void WriteFileAsynchronously(string FileName, MemoryStream Data)
{
Task.Factory.StartNew(() => WriteFileSynchronously(FileName, Data));
}
private void WriteFileSynchronously(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, and a separate thread servicing that queue works great ... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quick. Not to mention that your data that can be several megabytes for each file is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.
This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
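The record layout described above (record number, file name, length, then the data) might be sketched like this. The SpoolFile class and its method names are hypothetical, the exact binary encoding is an assumption, and the sketch assumes a seekable stream that already holds complete records.

```csharp
using System;
using System.IO;

// Hypothetical record format, for illustration only.
public static class SpoolFile
{
    // Appends one record: record number, file name, length, then the data.
    // The BinaryWriter is deliberately not disposed, because disposing it
    // would close the underlying stream.
    public static void AppendRecord(Stream spool, int recordNumber, string fileName, byte[] data)
    {
        var writer = new BinaryWriter(spool);
        writer.Write(recordNumber);
        writer.Write(fileName); // length-prefixed string
        writer.Write(data.Length);
        writer.Write(data);
        writer.Flush();
    }

    // Reads the next record header, then copies the payload to the final file.
    // Returns false when there are no more records.
    public static bool TryUnpackNext(Stream spool, string targetDirectory, out int recordNumber)
    {
        recordNumber = -1;
        if (spool.Position >= spool.Length) return false;
        var reader = new BinaryReader(spool);
        recordNumber = reader.ReadInt32();
        string fileName = reader.ReadString();
        int length = reader.ReadInt32();
        byte[] data = reader.ReadBytes(length);
        File.WriteAllBytes(Path.Combine(targetDirectory, fileName), data);
        return true;
    }
}
```

When a second thread reads the file while the first is still appending, it would also have to cope with a record that has only been partially written; the sketch leaves that out.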
Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
foreach (var file in filesArray)
{
try
{
System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
{
WriteFileSynchronous(fileName, data);
});
updateThread.Start();
}
catch (Exception ex)
{
string errMsg = ex.Message;
Exception innerEx = ex.InnerException;
while (innerEx != null)
{
errMsg += "\n" + innerEx.Message;
innerEx = innerEx.InnerException;
}
errorMessages.Add(errMsg);
}
}
I have two ASP.NET web applications. One is responsible for processing some info and writing to a log file, and the other is responsible for reading the log file and displaying the information based on user requests.
Here's my code for the Writer
public static void WriteLog(String PathToLogFile, String Message)
{
Mutex FileLock = new Mutex(false, "LogFileMutex");
try
{
FileLock.WaitOne();
using (StreamWriter sw = File.AppendText(PathToLogFile))
{
sw.WriteLine(Message);
sw.Close();
}
}
catch (Exception ex)
{
LogUtil.WriteToSystemLog(ex);
}
finally
{
FileLock.ReleaseMutex();
}
}
And here's my code for the Reader :
private String ReadLog(String PathToLogFile)
{
FileStream fs = new FileStream(
PathToLogFile, FileMode.Open,
FileAccess.Read, FileShare.ReadWrite);
StreamReader Reader = new StreamReader(fs);
return Reader.ReadToEnd();
}
My question: is the above code enough to prevent locking in a web garden environment?
EDIT 1 : Dirty read is okay.
EDIT 2 : Creating Mutex with new Mutex(false, "LogFileMutex"), closing StreamWriter
Sounds like you're trying to implement a basic queue. Why not use a queue that gives you guaranteed availability? You could drop the messages into MSMQ, then implement a Windows service that reads from the queue and pushes the messages to the DB. If writing to the DB fails, you simply leave the message on the queue (although you will want to handle poison messages, so that a failure caused by bad data doesn't put you in an infinite loop).
This will get rid of all locking concerns and give you guaranteed delivery to your reader...
You should also be disposing of your mutex, as it derives from WaitHandle, and WaitHandle implements IDisposable:
// false: do not request initial ownership; acquire it with WaitOne() instead
using (Mutex FileLock = new Mutex(false, "LogFileMutex"))
{
// ...
}
Also, consider a more distinctive name (a GUID, perhaps) than "LogFileMutex", since another unrelated process could inadvertently use the same name.
Doing this in a web-based environment, you are going to have a lot of issues with file locks. Can you change this to use a database instead?
Most hosting solutions allow SQL databases of up to 250 MB.
Not only will a database help with the locking issues, it will also let you purge older data more easily. After a while, that log read is going to get really slow.
No, it won't. First, you're creating a brand new mutex with every call, so multiple threads are going to access the writing critical section. Second, you don't even use the mutex in the reading critical section, so one thread could be attempting to read the file while another is attempting to write. Also, you're not closing the stream in the ReadLog method, so once the first read request comes through, your app won't be able to write any log entries until garbage collection comes along and closes the stream for you... which could take a while.