Thread-safe generic List: blocking enumeration - C#

First I would like to state that I have looked through probably a hundred Google and Stack Overflow results related to my question. I cannot find anything that answers my specific question.
I converted a DataTable into a List. I have multiple threads that enumerate the List with foreach. However, once every 5 minutes a master thread needs to refresh that List with the latest data. When this occurs, I need to block the other threads from reading the List until the master thread has fully updated it.
All the articles and questions I have found only lock access around a single Add. I know I can take a lock for the update, but that lock also needs to be synchronized with all the other threads that enumerate. I do not want to update the List while other threads are in the middle of their own enumerations.
How can I write a lock that is honored both by the foreach statements and by my update function?
Thanks
EDIT: I want to block the consumers/observers only while the producer thread is producing. I do not want consumers/observers blocking each other.

Put all code that populates and enumerates the list in a lock block:
lock (theList)
{
    Update(theList);
}
lock (theList)
{
    foreach (var thingy in theList)
    {
        DoStuff(thingy);
    }
}
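Note that a single lock like this also makes the consumers block each other, which the edit rules out. A ReaderWriterLockSlim gives the semantics asked for: any number of consumers can enumerate concurrently, while the producer's refresh takes an exclusive write lock. A minimal sketch, with illustrative names (RefreshableList is not from the question):
using System;
using System.Collections.Generic;
using System.Threading;

class RefreshableList<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private List<T> _items = new List<T>();

    // Producer: runs every 5 minutes; blocks until all current readers have left.
    public void Refresh(List<T> fresh)
    {
        _lock.EnterWriteLock();
        try { _items = fresh; }
        finally { _lock.ExitWriteLock(); }
    }

    // Consumers: many can enumerate at once; none can start during a refresh.
    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock();
        try
        {
            foreach (T item in _items)
                action(item);
        }
        finally { _lock.ExitReadLock(); }
    }
}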

At first I thought you could try a BlockingCollection, which implements the producer/consumer pattern.
What you actually need, though, is a way to set up a rendezvous between your reading threads and your master thread.
I think this can be done with monitors (specifically the Monitor.Wait and Monitor.Pulse methods). Try it and tell us what you find.
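A rough sketch of such a rendezvous, assuming the refresh itself is the slow part (all of the names here are illustrative, not from the question):
using System;
using System.Collections.Generic;
using System.Threading;

class MonitoredList<T>
{
    private readonly object _sync = new object();
    private bool _updating;
    private List<T> _items = new List<T>();

    // Master thread: flag the refresh, build the new list outside the lock,
    // then swap it in and wake any readers that were waiting.
    public void Refresh(Func<List<T>> produce)
    {
        lock (_sync) { _updating = true; }
        List<T> fresh = produce();       // the slow part
        lock (_sync)
        {
            _items = fresh;
            _updating = false;
            Monitor.PulseAll(_sync);     // release the rendezvous
        }
    }

    // Readers: wait out any refresh, grab the current reference, then
    // enumerate outside the lock so they never block each other.
    public void ForEach(Action<T> action)
    {
        List<T> snapshot;
        lock (_sync)
        {
            while (_updating)
                Monitor.Wait(_sync);
            snapshot = _items;
        }
        foreach (T item in snapshot)
            action(item);
    }
}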

Related

List.RemoveRange thread safety

I have a generic List that is added to and removed from by two different threads.
Thread 1:
int batchToBeProcessed = this.list.Count;
// Do some processing on the first batchToBeProcessed items
this.list.RemoveRange(0, batchToBeProcessed);
Thread 2:
lock (this.list)
{
    this.list.Add(item);
}
Is RemoveRange thread safe in the following scenario?
Say the list has 10 items that are being processed in thread 1, and while RemoveRange is executing, thread 2 adds 1 item. What would happen?
Edit: I do not want to use a ConcurrentBag, since I need the ordering.
The answer to your question is no, it is not safe to have multiple threads modifying the list concurrently. You must synchronize access to the list using a lock or a similar synchronization primitive. List<T> is safe for multiple concurrent readers, or for a single writer, but you cannot modify the list on one thread while some other thread is accessing it at all.
As to what will happen if you call Add while another thread is executing RemoveRange: it's likely that the code will throw an exception, but it's also possible that no exception is thrown and your list becomes corrupted, or that it happens to work just fine. Which of those happens depends on timing, and there is no guarantee that it will work.
To my knowledge, the .NET Framework does not have a concurrent data structure that gives you all the functionality of List<T>.
Is RemoveRange thread safe in the following scenario?
No. None of the instance methods of the List<T> class are thread safe.
Say the list has 10 items that are being processed in thread 1, and while RemoveRange is executing, thread 2 adds 1 item. What would happen?
The result is undefined: the new item may or may not be added to the end of the list, the old items may not all be removed, the internal state may be corrupted, ...
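Since the ordering requirement rules out ConcurrentBag, one option is to keep the List<T> and guard every access with a single shared lock, copying the batch out before processing it. A minimal sketch (the class and field names are illustrative):
using System.Collections.Generic;

class BatchProcessor
{
    private readonly object _sync = new object();
    private readonly List<string> _list = new List<string>();

    // Thread 2: the producer.
    public void Produce(string item)
    {
        lock (_sync) { _list.Add(item); }
    }

    // Thread 1: snapshot the current batch, process it outside the lock,
    // then remove exactly the items that were snapshotted. Items added
    // during processing sit after index batch.Length - 1, so order is kept.
    public void ProcessBatch()
    {
        string[] batch;
        lock (_sync) { batch = _list.ToArray(); }

        foreach (string item in batch)
        {
            // Do some processing
        }

        lock (_sync) { _list.RemoveRange(0, batch.Length); }
    }
}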

Check how many threads are running/exit threads

In C# I'm creating my threads like this:
void LaunchThread(string url, string search, string regexstring)
{
    new Thread(delegate()
    {
        Scrape(url, search, regexstring, false);
    }).Start();
}
and I use an int variable to track how many threads are currently running, but I have a feeling it can be a little wonky at times and not be accurate (because of when the check happens relative to threads starting and exiting).
I have two questions:
Is there a variable that could tell me how many threads are currently running?
Is there a way to close/exit all threads midway, without waiting for them to complete?
Thanks a lot, SO. I'm new to C# and multitasking as a whole.
If you're interested in only the threads you've started, keeping a counter like you describe should work fine. Just make sure that you use proper locking (or the Interlocked methods) when you read or change it.
If the work you're doing in the worker threads involves a loop, you can add logic to the loop that checks a variable to see whether it should keep running. You can set this variable from another thread (again, with proper locking); a sketch of both ideas follows.
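A minimal sketch of both suggestions, reusing the Scrape call from the question; the counter uses Interlocked and the stop signal is a volatile flag (the class and member names are illustrative):
using System.Threading;

class ScrapeController
{
    private int _runningThreads;          // changed only via Interlocked
    private volatile bool _stopRequested; // workers check this between iterations

    public int RunningThreads
    {
        get { return Thread.VolatileRead(ref _runningThreads); }
    }

    public void StopAll()
    {
        _stopRequested = true; // workers exit at their next check; no Abort needed
    }

    public void LaunchThread(string url, string search, string regexstring)
    {
        Interlocked.Increment(ref _runningThreads);
        new Thread(delegate()
        {
            try
            {
                while (!_stopRequested)
                {
                    // One unit of scraping work per iteration, e.g.
                    // Scrape(url, search, regexstring, false);
                    Thread.Sleep(100); // stand-in for real work in this sketch
                }
            }
            finally
            {
                Interlocked.Decrement(ref _runningThreads);
            }
        }).Start();
    }
}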
Keep a collection of the threads you spawn, such as a List<Thread>. Then you can read its Count property to check the number of threads, and you can even count only the live threads using threadCollection.Count(t => t.IsAlive) (LINQ).
I assume you're using HttpWebRequest. Check out these two threads:
Terminate loopless thread instantly without Abort or Suspend
Killing HttpWebRequest object using Thread.Abort

Is it safe to put TryDequeue in a while loop?

I have not used a concurrent queue before.
Is it OK to use TryDequeue as below, in a while loop? Could this not get stuck forever?
var cq = new ConcurrentQueue<string>();
cq.Enqueue("test");
string retValue;
while (!cq.TryDequeue(out retValue))
{
    // Maybe sleep?
}
// Do rest of code
It's safe in the sense that the loop won't end until it has actually pulled an item out, and that it will eventually end if the queue has an item to be taken. If the queue is emptied by another thread and no more items are added, then of course the loop will not end.
Beyond all of that, what you have is a busy loop. This should virtually always be avoided. Either you constantly poll the queue asking for more items, wasting CPU time and effort in the process, or you sleep and therefore don't actually use the item in the queue as soon as it is added (and even then you still waste some time/effort on context switches just to poll the queue).
What you should be doing instead, if you find yourself wanting to "wait until there is an item for me to take", is use a BlockingCollection. It is specifically designed to wrap various types of concurrent collections and block until there is an item available to take. It lets you change your code to queue.Take(): easier to write, semantically stating what you're doing, clearly correct, noticeably more efficient, and completely safe.
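A minimal sketch of that change, using the same single-item example as the question:
using System.Collections.Concurrent;

var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());
queue.Add("test");

// Take blocks until an item is available: no polling, no sleeping.
string retValue = queue.Take();
// Do rest of code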
Yes, it is safe as per the documentation, but it is not a recommended design.
It might get stuck forever if the queue was empty at the first TryDequeue call and no other thread pushes data into the queue after that point (you could break out of the while after N attempts or after a timeout, though).
ConcurrentQueue offers an IsEmpty property to check whether there are items in the queue. Checking it is much more efficient than looping over a TryDequeue call (particularly if the queue is usually empty).
What you might want to do is:
while (cq.IsEmpty)
{
    // Maybe sleep / wait / ...
}
if (cq.TryDequeue(out retValue))
{
    ...
}
EDIT:
If this last call returns false, another of your threads dequeued the item. If you don't have other consumer threads, this is safe; if you do, you should use while (!cq.TryDequeue(...)) instead.

Comparison of Join and WaitAll

For waiting on multiple threads, can anyone compare the pros and cons of using WaitHandle.WaitAll and Thread.Join?
WaitHandle.WaitAll has a 64-handle limit, so that is obviously a huge limitation. On the other hand, it is a convenient way to wait for many signals in a single call. Thread.Join does not require creating any additional WaitHandle instances, and since it can be called individually on each thread, the 64-handle limit does not apply.
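For example, joining a whole collection of threads needs no extra handles at all (a fragment, assuming a List<Thread> named threads that you populated as you started them):
foreach (Thread t in threads)
    t.Join(); // blocks until that thread finishes; no 64-handle limit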
Personally, I have never used WaitHandle.WaitAll. I prefer a more scalable pattern when I want to wait on multiple signals. You can create a counting mechanism that counts up or down, and once a specific value is reached you signal a single shared event. The CountdownEvent class conveniently packages all of this into a single class.
var finished = new CountdownEvent(1);
for (int i = 0; i < NUM_WORK_ITEMS; i++)
{
    finished.AddCount();
    SpawnAsynchronousOperation(
        () =>
        {
            try
            {
                // Place logic to run in parallel here.
            }
            finally
            {
                finished.Signal();
            }
        });
}
finished.Signal();
finished.Wait();
Update:
The reason why you want to signal the event from the main thread is subtle. Basically, you want to treat the main thread as if it were just another work item. After all, it is running concurrently along with the real work items.
Consider for a moment what might happen if we did not treat the main thread as a work item. It goes through one iteration of the for loop and adds a count to our event (via AddCount), indicating that we have one pending work item, right? Let's say SpawnAsynchronousOperation completes and gets the work item queued on another thread. Now imagine the main thread gets preempted before swinging around to the next iteration of the loop. The thread executing the work item gets its fair share of the CPU, starts humming along, and actually completes the work item. The Signal call in the work item runs and decrements our pending work item count to zero, which changes the state of the CountdownEvent to signalled. In the meantime the main thread wakes up, goes through all remaining iterations of the loop, and hits the Wait call; but since the event got prematurely signalled, it passes on by even though there are still pending work items.
Again, avoiding this subtle race condition is easy when you treat the main thread as a work item. That is why the CountdownEvent is initialized with one count and the Signal method is called before the Wait.
I like @Brian's answer as a comparison of the two mechanisms.
If you are on .NET 4, it would be worthwhile exploring the Task Parallel Library, which achieves task parallelism via System.Threading.Tasks and lets you manage tasks across multiple threads at a higher level of abstraction. The signalling you asked about in this question to manage thread interactions is hidden or much simplified, and you can concentrate on properly defining what each Task consists of and how to coordinate them.
This may seem off-topic, but as Microsoft themselves say in the MSDN docs: "In the .NET Framework 4, tasks are the preferred API for writing multi-threaded, asynchronous, and parallel code."
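For example, the manual signalling in the CountdownEvent answer above collapses to a single call; a sketch, with the work item bodies left as placeholders:
using System.Threading.Tasks;

const int NUM_WORK_ITEMS = 10; // same constant as the CountdownEvent example

var tasks = new Task[NUM_WORK_ITEMS];
for (int i = 0; i < NUM_WORK_ITEMS; i++)
{
    tasks[i] = Task.Factory.StartNew(() =>
    {
        // Place logic to run in parallel here.
    });
}
Task.WaitAll(tasks); // no 64-handle limit, no manual counting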
The WaitAll mechanism involves kernel-mode objects. I don't think the same is true for the Join mechanism, so I would prefer Join, given the opportunity.
Technically, though, the two are not equivalent: IIRC, Join can only operate on one thread at a time, whereas WaitAll can wait for the signalling of multiple kernel objects in one call.

List with non-null elements ends up containing null. A synchronization issue?

First of all, sorry about the title -- I couldn't figure out one that was short and clear enough.
Here's the issue: I have a list List<MyClass> list to which I always add newly-created instances of MyClass, like this: list.Add(new MyClass()). I don't add elements any other way.
However, I then iterate over the list with foreach and find that there are some null entries. That is, the following code:
foreach (MyClass entry in list)
    if (entry == null)
        throw new Exception("null entry!");
will sometimes throw an exception.
I should point out that the list.Add(new MyClass()) calls are performed from different threads running concurrently. The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all. Though I still find it strange that it ends up containing null entries, instead of just not offering any guarantees on ordering.
Can you think of any other reason?
Also, I don't care in which order the items are added, and I don't want the calling threads to block waiting to add their items. If synchronization is truly the issue, can you recommend a simple way to call the Add method asynchronously, i.e., create a delegate that takes care of that while my thread keeps running its code? I know I can create a delegate for Add and call BeginInvoke on it. Does that seem appropriate?
Thanks.
EDIT: A simple solution based on Kevin's suggestion:
public class AsynchronousList<T> : List<T>
{
    private AddDelegate addDelegate;

    public delegate void AddDelegate(T item);

    public AsynchronousList()
    {
        addDelegate = new AddDelegate(this.AddBlocking);
    }

    public void AddAsynchronous(T item)
    {
        addDelegate.BeginInvoke(item, null, null);
    }

    private void AddBlocking(T item)
    {
        lock (this)
        {
            Add(item);
        }
    }
}
I only need to control Add operations and I just need this for debugging (it won't be in the final product), so I just wanted a quick fix.
Thanks everyone for your answers.
List<T> can only support multiple concurrent readers. If you are going to use multiple threads to add to the list, you'll need to lock the object first. There is really no way around this, because without a lock you can still have someone reading from the list while another thread updates it (or multiple threads trying to update it concurrently).
http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
Your best bet is probably to encapsulate the list in another object, and have that object handle the locking and unlocking of the internal list. That way you can make your new object's Add method asynchronous and let the calling objects go on their merry way. Any time you read from it, though, you'll most likely still have to wait on other threads finishing their updates.
The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all.
That's basically it. We are specifically told it's not thread-safe, so we shouldn't be surprised that concurrent access results in contract-breaking behaviour.
As to why this specific problem occurs, we can but speculate, since List<>'s private implementation is, well, private (I know we have Reflector and Shared Source, but in principle it is private). Suppose the implementation involves an array and a 'last populated index'. Suppose also that 'add an item' looks like this:
Ensure the array is big enough for another item
last populated index <- last populated index + 1
array[last populated index] = incoming item
Now suppose there are two threads calling Add. If the interleaved sequence of operations ends up like this:
Thread A : last populated index <- last populated index + 1
Thread B : last populated index <- last populated index + 1
Thread A : array[last populated index] = incoming item
Thread B : array[last populated index] = incoming item
then not only will there be a null in the array, but also the item that thread A was trying to add won't be in the array at all!
Now, I don't know for sure how List<> does its stuff internally. I have half a memory that it works like ArrayList, which internally uses this scheme; but in fact it doesn't matter. I suspect that any list mechanism that expects to be run non-concurrently can be made to break by concurrent access and a sufficiently 'unlucky' interleaving of operations. If we want thread-safety from an API that doesn't provide it, we have to do some work ourselves; or at least, we shouldn't be surprised if the API sometimes breaks its contract when we don't.
For your requirement of
I don't want the calling threads to block waiting to add their item
my first thought is a multiple-producer-single-consumer queue, wherein the threads wanting to add items are the producers, which dispatch items to the queue asynchronously, and a single consumer takes items off the queue and adds them to the list with appropriate locking; a sketch follows. My second thought is that this feels heavier than the situation warrants, so I'll let it mull for a bit.
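A rough sketch of that multiple-producer-single-consumer idea, using BlockingCollection (.NET 4) as the queue; MyClass is from the question, the other names are illustrative:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class ListWriter
{
    private readonly BlockingCollection<MyClass> _pending = new BlockingCollection<MyClass>();
    private readonly List<MyClass> _list = new List<MyClass>();

    public ListWriter()
    {
        // The single consumer: the only thread that ever touches _list,
        // so the list itself needs no locking for writes.
        new Thread(() =>
        {
            foreach (MyClass item in _pending.GetConsumingEnumerable())
                _list.Add(item);
        }) { IsBackground = true }.Start();
    }

    // Producers call this from any thread; it never blocks on the list.
    public void AddAsync(MyClass item)
    {
        _pending.Add(item);
    }
}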
If you're using .NET Framework 4, you might check out the new concurrent collections. When it comes to threading, it's better not to try to be clever, as it's extremely easy to get wrong. Synchronization can impact performance, but getting threading wrong can also result in strange, infrequent errors that are a royal pain to track down.
If you're still using Framework 2.0 or 3.5 for this project, I recommend simply wrapping your calls to the list in a lock statement. If you're concerned about the performance of Add (are you performing some long-running operation using the list somewhere else?), then you can always make a copy of the list within a lock and use that copy for the long-running operation outside the lock, as sketched below. Simply blocking on the Adds themselves shouldn't be a performance issue unless you have a very large number of threads. If that's the case, you can try the multiple-producer-single-consumer queue that AakashM recommended.
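A minimal sketch of that copy-under-lock pattern; listLock and Process are illustrative names, not from the question:
List<MyClass> snapshot;
lock (listLock)
{
    // Cheap shallow copy taken under the lock; Adds are blocked only briefly.
    snapshot = new List<MyClass>(list);
}

// The long-running work happens on the private copy, outside the lock.
foreach (MyClass entry in snapshot)
    Process(entry);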
