I have some code which processes a number of response objects from my database in parallel (using AsParallel()). Each response has many components. The responses may share the same components. I do some modifications to the component data and save it to the db, so I need to prevent multiple threads working on the same component object at the same time.
I use locks to achieve this. I have a ConcurrentDictionary<int, object> to hold all the necessary lock objects. Like this:
private static ConcurrentDictionary<int, object> compLocks = new ConcurrentDictionary<int, object>();
var compIds = db.components.Select(c => c.component_id).ToList();
foreach (var compId in compIds)
{
    compLocks[compId] = new object();
}
Then later on I do this:
responses.AsParallel().ForAll(r =>
{
    // ... do some time consuming stuff with web services ...

    // this is a *just in case* addition,
    // in case a new component was added to
    // the db since the dictionary was constructed
    // NOTE: it did not have any effect, and I'm no longer
    // using it, as @Henk pointed out it is not thread-safe.
    //if (compLocks[c.component_id] == null)
    //{
    //    compLocks[c.component_id] = new object();
    //}

    componentList.AsParallel().ForAll(c =>
    {
        lock (compLocks[c.component_id])
        {
            // ... do some processing, save the db records ...
        }
    });
});
This seems to run fine, but towards the end of program execution (it runs for several hours, as there is a lot of data) I get the following exception:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Concurrent.ConcurrentDictionary`2.get_Item(TKey key)
I am sure that the ConcurrentDictionary is being populated with every possible component ID.
I have 3 questions:
How is this exception even possible, and how do I fix it?
Do I need a ConcurrentDictionary for this?
Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Post-Answer Edit
To make the cause of all this clear: .AsParallel() doesn't immediately enumerate the collection of responses. It is lazily evaluated, meaning new responses (and therefore new components) can be added to the collection at run-time (from other processes). Enforcing a snapshot with .ToList() before the .AsParallel() fixed the problem.
The reason my code for adding component IDs to compLocks at run-time didn't remedy this problem is that it is not thread-safe.
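For clarity, the fix amounts to materialising the query before parallelising it, something like this (variable names taken from the code above):
// Snapshot the responses first so that rows added to the database while
// the job is running cannot be picked up mid-enumeration.
var responseSnapshot = responses.ToList();

responseSnapshot.AsParallel().ForAll(r =>
{
    // ... process as before ...
});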
1) How is this exception even possible?
Apparently it is possible, but not from the posted code alone. It would happen if data is added to the db while the query is running (would it be an option to capture the responses with a .ToList() beforehand?).
2) Do I need a ConcurrentDictionary for this?
Not with a fixed list, but when the solution involves add-when-missing then yes, you need a Concurrent collection.
3) Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Not totally sure. The locking looks OK, but you will still process duplicate components multiple times, just not at the same time (see the sketch below for one way around that).
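One way to avoid reprocessing duplicates entirely, assuming a component only needs to be handled once per run (an assumption about the data model), is to record which IDs have been claimed:
// Tracks which component IDs have already been claimed this run.
// TryAdd returns false if another thread got there first.
private static ConcurrentDictionary<int, bool> processedComps =
    new ConcurrentDictionary<int, bool>();

componentList.AsParallel().ForAll(c =>
{
    if (!processedComps.TryAdd(c.component_id, true))
        return; // this component is already being (or has been) processed

    lock (compLocks[c.component_id])
    {
        // ... do some processing, save the db records ...
    }
});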
Reaction to the edit:
if (compLocks[c.component_id] == null)
{
    compLocks[c.component_id] = new object();
}
this is not thread-safe. It is now possible that multiple lock objects are created for 1 component_id value. You need to use one of the GetOrAdd() methods.
But I would not expect this to give the exception you're getting, so it's probably not the direct problem.
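For completeness, the GetOrAdd() version would look something like this:
// GetOrAdd returns the existing lock object if one is present, otherwise
// it atomically publishes a new one. The factory may run more than once
// under contention, but only one object is ever stored and returned.
object compLock = compLocks.GetOrAdd(c.component_id, id => new object());
lock (compLock)
{
    // ... do some processing, save the db records ...
}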
I would start by replacing:
lock (compLocks[c.component_id])
{
...
}
by:
object compLock;
if (!compLocks.TryGetValue(c.component_id, out compLock)) Debug.Assert(false);
lock(compLock)
{
...
}
Then set it running and go and get a coffee. When the assert fails you'll be able to debug and get a better idea of what's happening.
As for your questions:
1. How is this exception even possible?
Without seeing the rest of your code, impossible to say.
2. Do I need a ConcurrentDictionary for this?
If you initialize the dictionary once from a single thread, then subsequently only ever read from the dictionary, then it doesn't need to be a ConcurrentDictionary.
3. Is my understanding of how locking works correct in this instance / is there a better way of doing this?
Again, difficult to say without seeing more code, but I don't see anything obviously wrong with the small sample of code you've posted. But threading is hard and it's quite possible there are race conditions elsewhere in your code.
Related
I have a List that I am adding values to at a fixed interval, from a background thread.
var point = GetPoint(presentValue);
DataSource[itemIndex].Add(point);
In an event handler I then read values from that List; to be exact, I search for the value closest to my target. I create a local copy of the list to work with, but sometimes I get the exception
"Destination array not long enough" when creating this copy.
I've figured out that this must mean the List was changed while the new List was being created, so it's got something to do with the code above. After a bit of research I found out about thread safety and the lock keyword, which I then tried to use. I tried locking on the list itself, on the list's SyncRoot and on a custom sync object, but the error still occurred.
lock (SyncHelper.TrendDataPointLock)
{
    var point = GetPoint(presentValue);
    DataSource[itemIndex].Add(point);
}
and
lock (SyncHelper.TrendDataPointLock)
{
    points = new List<DataPoint>(ActualPoints);
}
I know that I'm not fully familiar with the aspects of thread safety, but after looking at many different approaches I still can't seem to make this work.
1: Any advice on how to fix my error
2: Do I need a lock statement around every access of that list in order to be sure that a thread will wait until the other lock is released?
3: If not 2, does locking on the list itself make every thread block, whether or not it also has a lock statement around its list access? In that case, locking around just the Add statement "should" fix my problem.
EDIT:
DataSource is a Dictionary<int, List<DataPoint>>
ActualPoints is a reference to the list DataSource[itemIndex]
The only places where I edit this list are in the code above, and when I clear the list.
The points variable is only there for accessing certain indexes to find the closest value to my target, but the index is always lower than points.Count; to be exact, I binary-search through the list, so I'm starting in the middle. The application only crashes when accessing ActualPoints to create the points list, so everything after that shouldn't make a difference.
Try a collection that is already thread-safe. Check out Thread-Safe Collections.
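For example, a sketch using ConcurrentQueue (the names DataPoint, itemIndex and GetPoint are taken from the question; the reshaping of DataSource is an assumption):
// ConcurrentQueue supports lock-free appends, and ToArray() produces a
// consistent snapshot, so no explicit locking is needed on either side.
var dataSource = new ConcurrentDictionary<int, ConcurrentQueue<DataPoint>>();

// Writer thread:
var queue = dataSource.GetOrAdd(itemIndex, _ => new ConcurrentQueue<DataPoint>());
queue.Enqueue(GetPoint(presentValue));

// Reader (the event handler): snapshot first, then binary-search the copy.
List<DataPoint> points = dataSource[itemIndex].ToArray().ToList();
If you keep the plain List instead, the lock approach only works when every read and write of that list (including Clear) takes the same lock object; a single unlocked access is enough to keep the exception alive.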
I am tasked with writing a system to process result files created by a different process (which I have no control over) and am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling a foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?" or is this going to cause everything to go sideways on me.
I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot towards assuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        //Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item => {
            //based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;

    public MyActionClass(Thing item)
    {
        _thing = item;
    }

    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }

    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations, or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But, as has been mentioned, nothing shared can be seen here; that doesn't mean everything in the actual code you use is as good as it seems in this sample.
Nor does it mean that nothing will be changed by you or a coworker that makes some state both shared and mutable (in Thing, for example), at which point you start getting difficult-to-reproduce crashes at best, or plain wrong behaviour that can go undetected for a long time at worst.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can reasonably be expressed through immutable objects. And even that accidental "make it shared and mutable" change may happen to an immutable design as well, though it is much less likely.
It should at least be considered as a possible option/alternative.
About the EDIT
If I change the value of a property of the item I'm working with while
inside the Parallel.Foreach (in this case, a string value that gets
written to a database inside the loop), will that have any affect on
the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
a string value that gets written to a database inside the loop - depending on the data access technology used and how you use it, you may be in trouble, because most of them are not designed for a multithreaded environment (EF's DbContext, for example). And obviously do not forget that dealing with concurrent access in the database is not always easy, though that is a bit away from our original theme.
Would it be better to create a new instance of Thing inside the loop to store the item I'm working with - if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not the Parallel.ForEach) making changes to those objects while they are being persisted, then you already have bigger problems than Parallel.ForEach.
Objects should always have an observably consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist who-knows-what), and if they are used by many threads, they should already be thread-safe: there should be no way to put them into an inconsistent state.
And if they want to be persisted by external code, such objects should probably provide:
Either SyncRoot property to synchronize property reading code.
Or some current-state snapshot DTO that is created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} } (see the sketch after this list).
Or something more exotic.
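A minimal sketch of that snapshot idea, with all names (Thing, ThingSnapshot, the properties) invented for illustration:
public class ThingSnapshot
{
    public string Property1 { get; private set; }
    public int Property2 { get; private set; }

    public ThingSnapshot(string p1, int p2)
    {
        Property1 = p1;
        Property2 = p2;
    }
}

public class Thing
{
    private readonly object _stateLock = new object();
    private string _property1;
    private int _property2;

    public void Update(string p1, int p2)
    {
        lock (_stateLock)
        {
            // Both fields change under one lock, so a concurrent
            // GetCurrentData() can never observe a half-updated state.
            _property1 = p1;
            _property2 = p2;
        }
    }

    public ThingSnapshot GetCurrentData()
    {
        lock (_stateLock)
        {
            return new ThingSnapshot(_property1, _property2);
        }
    }
}
The persisting code then works off the snapshot, never off the live object.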
I use a System.Runtime.Caching.MemoryCache to hold items which never expire. However, at times I need the ability to clear the entire cache. How do I do that?
I asked a similar question here concerning whether I could enumerate the cache, but that is a bad idea as it needs to be synchronised during enumeration.
I've tried using .Trim(100) but that doesn't work at all.
I've tried getting a list of all the keys via Linq, but then I'm back where I started because evicting items one-by-one can easily lead to race conditions.
I thought to store all the keys, and then issue a .Remove(key) for each one, but there is an implied race condition there too, so I'd need to lock access to the list of keys, and things get messy again.
I then thought that I should be able to call .Dispose() on the entire cache, but I'm not sure if this is the best approach, due to the way it's implemented.
Using ChangeMonitors is not an option for my design, and is unnecessarily complex for such a trivial requirement.
So, how do I completely clear the cache?
I was struggling with this at first. MemoryCache.Default.Trim(100) does not work (as discussed). Trim is a best-effort attempt: if you call Trim(100) it removes the least-recently used items, but it does not guarantee that all items are removed.
Trim returns the count of items removed, and most people expect it to remove all items.
This code removes all items from MemoryCache for me in my xUnit tests with MemoryCache.Default. MemoryCache.Default is the default Region.
foreach (var element in MemoryCache.Default)
{
    MemoryCache.Default.Remove(element.Key);
}
You should not call dispose on the Default member of the MemoryCache if you want to be able to use it anymore:
The state of the cache is set to indicate that the cache is disposed.
Any attempt to call public caching methods that change the state of
the cache, such as methods that add, remove, or retrieve cache
entries, might cause unexpected behavior. For example, if you call the
Set method after the cache is disposed, a no-op error occurs. If you
attempt to retrieve items from the cache, the Get method will always
return Nothing.
http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.dispose.aspx
About the Trim, it's supposed to work:
The Trim property first removes entries that have exceeded either an absolute or sliding expiration. Any callbacks that are registered
for items that are removed will be passed a removed reason of Expired.
If removing expired entries is insufficient to reach the specified trim percentage, additional entries will be removed from the cache
based on a least-recently used (LRU) algorithm until the requested
trim percentage is reached.
But two other users reported on the same page that it doesn't work, so I guess you are stuck with Remove(): http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.trim.aspx
Update
However, I see no mention of it being a singleton or otherwise unsafe to have multiple instances, so you should be able to overwrite your reference.
But if you need to free the memory from the Default instance, you will have to clear it manually or destroy it permanently via Dispose (rendering it unusable).
Based on your question, you could make your own singleton-imposing class that returns a MemoryCache it may internally dispose at will, that being the nature of a cache. :-)
Here's what I had made for something I was working on...
public void Flush()
{
    List<string> cacheKeys = MemoryCache.Default.Select(kvp => kvp.Key).ToList();
    foreach (string cacheKey in cacheKeys)
    {
        MemoryCache.Default.Remove(cacheKey);
    }
}
I know this is an old question but the best option I've come across is to
Dispose the existing MemoryCache and create a new MemoryCache object.
https://stackoverflow.com/a/4183319/880642
The answer doesn't really provide the code to do this in a thread-safe way, but it can be achieved using Interlocked.Exchange:
var oldCache = Interlocked.Exchange(ref _existingCache, new MemoryCache("newCacheName"));
oldCache.Dispose();
This will swap the existing cache with a new one and allow you to safely call Dispose on the original cache. This avoids needing to enumerate the items in the cache and race conditions caused by disposing a cache while it is in use.
Edit
Here's how I use it in practice accounting for DI
public class CustomCacheProvider : ICustomCacheProvider
{
    private IMemoryCache _internalCache;
    private readonly ICacheFactory _cacheFactory;

    public CustomCacheProvider(ICacheFactory cacheFactory)
    {
        _cacheFactory = cacheFactory;
        _internalCache = _cacheFactory.CreateInstance();
    }

    public void Set(string key, object item, MemoryCacheEntryOptions policy)
    {
        _internalCache.Set(key, item, policy);
    }

    public object Get(string key)
    {
        return _internalCache.Get(key);
    }

    // other methods omitted for brevity

    public void Dispose()
    {
        _internalCache?.Dispose();
    }

    public void EmptyCache()
    {
        var oldCache = Interlocked.Exchange(ref _internalCache, _cacheFactory.CreateInstance());
        oldCache.Dispose();
    }
}
The key is controlling access to the internal cache using another singleton which has the ability to create new cache instances using a factory (or manually if you prefer).
@stefan's answer details the principle; here's how I'd do it.
One should synchronise access to the cache whilst recreating it, to avoid the race condition of client code accessing the cache after it is disposed, but before it is recreated.
To avoid this synchronisation, do this in your adapter class (which wraps the MemoryCache):
public void clearCache() {
    var oldCache = TheCache;
    TheCache = new MemoryCache("NewCacheName", ...);
    oldCache.Dispose();
    GC.Collect();
}
This way, TheCache is always in a non-disposed state, and no synchronisation is needed.
I ran into this problem too. .Dispose() did something quite different than what I expected.
Instead, I added a static field to my controller class. I did not use the default cache, to get around this behavior, but created a private one (if you want to call it that). So my implementation looked a bit like this:
public class MyController : Controller
{
    static MemoryCache s_cache = new MemoryCache("myCache");

    public ActionResult Index()
    {
        if (conditionThatInvalidatesCache)
        {
            s_cache = new MemoryCache("myCache");
        }
        String s = s_cache["key"] as String;
        if (s == null)
        {
            //do work
            //add to s_cache["key"]
        }
        //do whatever next
    }
}
Check out this post, and specifically, the answer that Thomas F. Abraham posted.
It has a solution that enables you to clear the entire cache or a named subset.
The key thing here is:
// Cache objects are obligated to remove entry upon change notification.
base.OnChanged(null);
I've implemented this myself, and everything seems to work just fine.
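For reference, here is a stripped-down sketch of that idea (the linked post also supports clearing named subsets; this version clears everything, and the class name is illustrative):
public class SignaledChangeMonitor : ChangeMonitor
{
    // One shared event; every cache entry's monitor subscribes to it.
    private static event EventHandler Signaled;

    private readonly string _uniqueId = Guid.NewGuid().ToString();

    public SignaledChangeMonitor()
    {
        Signaled += OnSignalRaised;
        base.InitializationComplete();
    }

    public override string UniqueId
    {
        get { return _uniqueId; }
    }

    // Call this to evict every entry registered with one of these monitors.
    public static void Signal()
    {
        EventHandler handler = Signaled;
        if (handler != null)
        {
            handler(null, EventArgs.Empty);
        }
    }

    protected override void Dispose(bool disposing)
    {
        Signaled -= OnSignalRaised;
    }

    private void OnSignalRaised(object sender, EventArgs e)
    {
        // Cache objects are obligated to remove entry upon change notification.
        base.OnChanged(null);
    }
}
Each insert then attaches a monitor, and one call clears them all:
var policy = new CacheItemPolicy();
policy.ChangeMonitors.Add(new SignaledChangeMonitor());
MemoryCache.Default.Set("key", value, policy);
// ... later ...
SignaledChangeMonitor.Signal(); // evicts all entries added this way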
I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
    using (sync.ReadLock())
        return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: no work for you. Cons: way more work for the client, work they might not be willing or able to do. Risks of deadlocks, etc., now become the client's problem, not your problem. A sketch of this option follows.
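A sketch of option 3, assuming the sync field from the question is a ReaderWriterLockSlim (the member names are illustrative):
// Expose the lock and the raw values; clients must hold the lock while reading.
public ReaderWriterLockSlim SyncLock
{
    get { return sync; }
}

public ICollection<MyClass> UnsafeValues
{
    get { return cache.Values; }
}

// Client code:
// obj.SyncLock.EnterReadLock();
// try
// {
//     foreach (MyClass item in obj.UnsafeValues) { /* use item */ }
// }
// finally
// {
//     obj.SyncLock.ExitReadLock();
// }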
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind throwing an InvalidOperationException if the collection is modified while you are enumerating it, you could just return cache.Values since it's readonly (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
    get {
        using (sync.ReadLock()) {
            foreach (MyClass value in cache.Values)
                yield return value;
        }
    }
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
    Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could lock the code:
IEnumerator enumerator = obj.Values.GetEnumerator();
MyClass first = null;
if (enumerator.MoveNext())
    first = enumerator.Current;
// The enumerator is never disposed, so with the yield-based Values above
// the read lock is never released.
(end 280Z28)
Consider one more possibility: expose only the ICollection interface, so that Values() can return your own implementation. That implementation would hold just a reference to Dictionary.Values and always take the read lock when accessing items. A sketch of that idea follows.
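A minimal sketch of such a wrapper, assuming the cache dictionary and a ReaderWriterLockSlim from the question (only the read members do real work; mutators throw):
public class LockedValueCollection : ICollection<MyClass>
{
    private readonly ICollection<MyClass> _values; // reference to cache.Values
    private readonly ReaderWriterLockSlim _lock;

    public LockedValueCollection(ICollection<MyClass> values, ReaderWriterLockSlim lck)
    {
        _values = values;
        _lock = lck;
    }

    public int Count
    {
        get
        {
            _lock.EnterReadLock();
            try { return _values.Count; }
            finally { _lock.ExitReadLock(); }
        }
    }

    public bool Contains(MyClass item)
    {
        _lock.EnterReadLock();
        try { return _values.Contains(item); }
        finally { _lock.ExitReadLock(); }
    }

    public void CopyTo(MyClass[] array, int arrayIndex)
    {
        _lock.EnterReadLock();
        try { _values.CopyTo(array, arrayIndex); }
        finally { _lock.ExitReadLock(); }
    }

    public bool IsReadOnly { get { return true; } }

    // Enumeration still copies under the lock; handing out a live enumerator
    // would mean holding the read lock for the whole enumeration, with the
    // problems described above.
    public IEnumerator<MyClass> GetEnumerator()
    {
        _lock.EnterReadLock();
        try { return _values.ToList().GetEnumerator(); }
        finally { _lock.ExitReadLock(); }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Add(MyClass item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(MyClass item) { throw new NotSupportedException(); }
}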
What I have?
An object that is saved in a static variable and called whenever needed
This object interfaces with another application.
I have two collections (Generic Lists) in this object
Logs
And
"Data That Is PreFeteched" to be used later
Problem is when more than one person is trying to use this object (the object interfaces with another application) modifying the collection leads to exceptions or loss of data
Exception in case of loops or using the Find function of the Generic List
What I am trying to do is removed the prefetched data or logs from time to time. I can do this initially when any function in the object is called, but if the collection is modified when two people (or threads) are trying to call the same function at once leads to exceptions or loss of data
Loss of data happens in a case like this:
List<Data> AlreadySavedData
{
    get
    {
        //Rough syntax (corrected here so it compiles) - in the actual application it is correct
        _alreadySavedData = _alreadySavedData.FindAll(
            delegate(Data d) { return d.CreatedOn.Date == DateTime.Now.Date; });
        return _alreadySavedData;
    }
}
I thought that by doing the above I could at least limit my collection of "data that is pre-fetched" or Logs from day to day. But when the collection is accessed or modified at the same time, one call to AlreadySavedData can sometimes overwrite a parallel call which might have just modified (added to) the collection, leading to loss of data.
Any help will be appreciated
If you must have multiple threads using the collection, you will need to provide synchronization. The easiest way is to do this:
protected static readonly object _syncRoot = new object();
protected static volatile List<Data> _alreadySavedData = new List<Data>();

List<Data> AlreadySavedData
{
    get
    {
        // Lock on a dedicated object, not on _alreadySavedData itself: the
        // assignment below swaps that reference, which would silently break
        // locking for any thread still waiting on the old list object.
        lock (_syncRoot)
        {
            _alreadySavedData = _alreadySavedData.FindAll(
                delegate(Data d) { return d.CreatedOn.Date == DateTime.Now.Date; });
            return _alreadySavedData;
        }
    }
}
You will need to do this anywhere the static collection is being altered or used (see the sketch below). Dealing with concurrency in multi-threaded applications is problematic at best.
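For instance, adding to the collection must take the same lock (AddData is a hypothetical method, using the _syncRoot field from the snippet above):
// Every mutation must go through the same _syncRoot lock as the getter.
public void AddData(Data d)
{
    lock (_syncRoot)
    {
        _alreadySavedData.Add(d);
    }
}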
For a complete rant+guide+suggestions on the subject see this article:
How to correctly implement multi-threading in C#