Removing multiple items from ConcurrentDictionary based on conditional key - c#

Let's say that I have a ConcurrentDictionary:
var dict = new ConcurrentDictionary<string, someObject>();
dict.TryAdd("0_someA_someB_someC", obj0);
dict.TryAdd("1_someA_someB_someC", obj1);
dict.TryAdd("2_someA_someB_someC", obj2);
dict.TryAdd("3_someA_someB_someC", obj3);
The <number>_ in the keys is incremented and being a dictionary, there is no guarantee that the elements are in order.
Now, imagine I wanted to remove all items from the dictionary that have the number less than 2. I have no idea what the keys will look like, only that they will be prefixed with a number as above.
How can I remove all elements from the dictionary whose key starts with a value less than 2?
For example, the resulting dict after this process will look like this:
dict.TryAdd("2_someA_someB_someC", obj2);
dict.TryAdd("3_someA_someB_someC", obj3);

Presuming the keys always have this format, you can use LINQ:
var keysToRemove = dict.Keys.Where(key => int.Parse(key.Remove(key.IndexOf('_'))) < 2).ToList();
keysToRemove.ForEach(key => dict.TryRemove(key, out someObject obj));
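String.Remove strips everything from the first _ onward, and the remaining leading part, the number, is parsed. Only the keys whose number is lower than 2 are selected.
This list is then used to remove the items from the dictionary. Each TryRemove call is thread-safe on its own, but the filter-then-remove sequence as a whole is not atomic; you need a lock only if no other thread may add or remove matching keys in between.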

Parse the number that comes before the first underscore (Tip: IndexOf and Substring)
Convert it to integer (Tip: int.TryParse)
Compare number to the value (2 in this case)
Filter the keys applying this method, store them in a collection. Iterate over the collection and call TryRemove method to remove entries associated with the key.
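The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name PrefixRemoval and the helper names TryGetPrefix and RemoveBelow are made up for illustration, and keys that do not start with "<int>_" are simply left in place.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class PrefixRemoval
{
    // Returns true and sets 'number' when the key starts with "<int>_".
    public static bool TryGetPrefix(string key, out int number)
    {
        number = 0;
        int underscore = key.IndexOf('_');
        return underscore > 0
            && int.TryParse(key.Substring(0, underscore), out number);
    }

    // Removes every entry whose numeric prefix is below 'threshold'.
    // Returns the number of entries that were actually removed.
    public static int RemoveBelow<TValue>(
        ConcurrentDictionary<string, TValue> dict, int threshold)
    {
        // Collect the matching keys first, then remove them one by one.
        var keysToRemove = dict.Keys
            .Where(k => TryGetPrefix(k, out int n) && n < threshold)
            .ToList();

        int removed = 0;
        foreach (string key in keysToRemove)
        {
            // TryRemove returns false if another thread removed the key already.
            if (dict.TryRemove(key, out _))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

Calling PrefixRemoval.RemoveBelow(dict, 2) on the dictionary from the question would leave only the keys prefixed with 2 and 3. Using int.TryParse instead of int.Parse means a malformed key is skipped rather than throwing.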

You would need to iterate across the dictionary collecting the keys that match the criteria and then iterate across that list of keys, deleting each from the dictionary. A foreach across a dictionary returns items with Key and Value properties, so you can examine the Key property to decide whether or not to delete. With a plain Dictionary you cannot delete in the same loop on .NET Framework, because modifying the collection during enumeration throws an InvalidOperationException; ConcurrentDictionary's enumerator tolerates removals while it is being enumerated.

You could use Split to split the key into an array on the _ character, convert the first item in the resulting array into an int (note this will throw if the key doesn't start with an int), and if it's less than 2, remove it from the dictionary:
foreach (var item in dict.Where(kvp => int.Parse(kvp.Key.Split('_')[0]) < 2))
{
    SomeObject tempObject;
    dict.TryRemove(item.Key, out tempObject);
}

Related

Is it possible, with a dictionary, to specify part of the key, and allow the rest to be anything that the data type allows?

I have a list of people each with two sets of genes. The genes are designated by a string of the first few alphabet characters. A capital letter means it is a dominant allele, a lower case means recessive.
The first character of each set specifies the eye colour, and a combination of different alleles allows for different eye colours.
Is it possible to look up the combination of alleles, i.e. Ab, Ac or cA, and return the value for the existence of A.
My code so far is:
Dictionary<string, string> EyeColours = new Dictionary<string, string>
{ {"A","Blue"}, { "aa", "DarkBlue" }, { "bb", "Hazel" }, { "cc", "Gray" }, { "dd", "Amethyst" } };
and an example of the gene sets would be
{"AabAC", "aBAAd"}
I want the value "Blue" to be returned if the either of the first two chars is A, is there an efficient way to do this, or do I have to just brute force it?
Have a dictionary with all the different values as keys and the colour of the eyes as the value. Split your gene to either be a capital letter or your 2 lower case letters (co dominance). Then all you need is a simple System.Linq query:
string gene = <process it>;
List<string> result = dictionary
    .Where(kvp => kvp.Key.IndexOf(gene) >= 0)
    .Select(kvp => kvp.Value)
    .ToList();
There is no efficient mechanism to look up all keys based on a substring. While making multiple lookups can work, it's also not very efficient.
One way to handle this efficiently would be to introduce an additional dictionary, which will contain the letter as a key, and map it to multiple keys in the other dictionary, all of which contain that letter.
Then, the lookup can be thought of as follows:
Find the keys which contain the alleles in the first dictionary
Lookup each key in the second dictionary to find the genes
This, of course, will require to add items in both dictionaries.
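One way the two-dictionary idea could look is sketched below. The class name IndexedEyeColours and its members are hypothetical; the sketch assumes the secondary dictionary maps each single letter to the keys of the colour dictionary that contain it, and that every Add goes through this wrapper so both dictionaries stay in sync.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper that keeps a colour dictionary and a letter index in sync.
public sealed class IndexedEyeColours
{
    // Primary dictionary: allele combination -> eye colour.
    private readonly Dictionary<string, string> colours =
        new Dictionary<string, string>();

    // Secondary index: letter -> keys of 'colours' that contain that letter.
    private readonly Dictionary<char, List<string>> keysByLetter =
        new Dictionary<char, List<string>>();

    public void Add(string alleles, string colour)
    {
        colours.Add(alleles, colour);

        // Index each distinct letter once, so "aa" is listed once under 'a'.
        foreach (char letter in new HashSet<char>(alleles))
        {
            if (!keysByLetter.TryGetValue(letter, out List<string> keys))
            {
                keys = new List<string>();
                keysByLetter[letter] = keys;
            }
            keys.Add(alleles);
        }
    }

    // Returns the colours of all keys containing 'letter' (empty if none).
    public List<string> ColoursContaining(char letter)
    {
        var result = new List<string>();
        if (keysByLetter.TryGetValue(letter, out List<string> keys))
        {
            foreach (string key in keys)
            {
                result.Add(colours[key]);
            }
        }
        return result;
    }
}
```

The cost is paid on insertion; a lookup by letter then needs one hash lookup in the index plus one per matching key, instead of a scan over every key.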
Dictionary has a ContainsKey(key) method:
From your examples:
{"AabAC", "aBAAd"}
You can pull your substrings out, I am assuming "A" or "Aa". Then perform:
EyeColours.ContainsKey("A");
If the first one is successful, there is no need to do the second. If not, try with the second set.
Maybe have some handling to insert a new set if both fail.
From https://msdn.microsoft.com/en-us/library/kw5aaea4(v=vs.110).aspx
"This method approaches an O(1) operation."
Or you can also try Dictionary.TryGetValue(key, out value)
From https://msdn.microsoft.com/en-us/library/bb347013(v=vs.110).aspx
"This method approaches an O(1) operation."
Both are efficient, and both are built into dictionary.
string eyeColorSet1 = gene.Substring(0, 1);
and
string eyeColorSet2 = gene.Substring(0, 2);
To get your leading alleles to check is not difficult or expensive either.
I would put all of this into its own method named
string GetEyeColour(string geneSequence)
or something to that effect. Get your test substrings, search the dictionary, add new if that's something you want to handle.
Given the efficiency of the dictionary methods, keeping only 1 collection and checking it for 2 values is going to be easier to manage and maintain than 2 collections. This also holds true if you are going to check for 3 values, ie "A", "a", or "Aa", from your example set.
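A GetEyeColour method along these lines might look like the sketch below. It rests on assumptions not settled in the thread: each of the first two characters is tried on its own (so a dominant "A" in either position wins), then the leading pair is tried (for recessive keys such as "aa"), and null is returned when nothing matches. The class name EyeColourLookup is made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for the lookup; the dictionary content is from the question.
public static class EyeColourLookup
{
    public static readonly Dictionary<string, string> EyeColours =
        new Dictionary<string, string>
        {
            { "A", "Blue" }, { "aa", "DarkBlue" }, { "bb", "Hazel" },
            { "cc", "Gray" }, { "dd", "Amethyst" }
        };

    // Returns the colour for the leading alleles, or null if none is known.
    public static string GetEyeColour(string geneSequence)
    {
        if (string.IsNullOrEmpty(geneSequence))
        {
            return null;
        }

        // Assumption: a single-letter key (dominant allele) in either of the
        // first two positions decides the colour.
        int lead = Math.Min(2, geneSequence.Length);
        for (int i = 0; i < lead; i++)
        {
            if (EyeColours.TryGetValue(geneSequence[i].ToString(), out string colour))
            {
                return colour;
            }
        }

        // Otherwise try the leading pair, e.g. "aa".
        if (lead == 2 &&
            EyeColours.TryGetValue(geneSequence.Substring(0, 2), out string pairColour))
        {
            return pairColour;
        }

        return null;
    }
}
```

With the question's data, "AabAC" resolves to "Blue", while "aBAAd" resolves to null because neither "a", "B" nor "aB" is a key; a caller could then fall back to the second gene set.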

How to find elements present in all three lists (most efficiently)

I use C# and I have three List<int> (say of equal size n and of distinct elements). My goal is to find elements present in all three. So I could iterate through first one and check if item is in the other two. That would be O(n^2). I could sort the other two lists first and then check for item in them using binary search. That would be O(nlogn) (without sorting).
Or I could construct two dictionaries Dictionary<int, byte>, where the key would be my list's item and then checking for an item would be O(1) and the total O(n). But how about the price of constructing dictionary? Can anyone tell how much does that cost?
Also perhaps there is even more efficient algorithm?
Using a HashSet is fairly simple, and I think it will be your best bet for performance.
HashSet<T> hset = new HashSet<T>(list1);
hset.IntersectWith(list2);
hset.IntersectWith(list3);
return hset.ToList(); // skip the ToList() if you don't explicitly need a List
You could use only one Dictionary<int, byte> where value of byte could be 1 or 2. For first list you just do the insert with value equal to 1, and for second list you do TryGetValue and based on result do either insert or update with value equal to 2. For third list you check if value is 2.
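The single-dictionary idea can be sketched as below. The names ThreeWayIntersection and Intersect are illustrative. Note one deliberate choice: items of the second list are only marked when they were already seen in the first list, since inserting them with the value 2 would wrongly count elements that are missing from the first list. The sketch assumes distinct elements per list, as stated in the question.

```csharp
using System.Collections.Generic;

// Hypothetical helper; assumes each list contains distinct elements.
public static class ThreeWayIntersection
{
    public static List<int> Intersect(
        List<int> first, List<int> second, List<int> third)
    {
        // Value 1 = seen in the first list, 2 = seen in the first and second.
        var seen = new Dictionary<int, byte>(first.Count);

        foreach (int item in first)
        {
            seen[item] = 1;
        }

        foreach (int item in second)
        {
            // Only promote items that the first list already contributed.
            if (seen.ContainsKey(item))
            {
                seen[item] = 2;
            }
        }

        var result = new List<int>();
        foreach (int item in third)
        {
            if (seen.TryGetValue(item, out byte mark) && mark == 2)
            {
                result.Add(item);
            }
        }
        return result;
    }
}
```

Each list is traversed once with O(1) expected work per element, so the whole pass is O(n) expected time with one dictionary of at most n entries.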

What is the quickest way to compare a C# Dictionary to a 'gold standard' Dictionary for equality?

I have a known-good Dictionary, and at run time I need to create a new Dictionary and run a check to see if it has the same key-value pairs as the known-good Dictionary (potentially inserted in different orders), and take one path if it does and another if it doesn't. I don't necessarily need to serialize the entire known-good Dictionary (I could use a hash, for example), but I need some on-disk data that has enough information about the known-good Dictionary to allow for comparison, if not for recreation. What is the quickest way to do this? I can use a SortedDictionary, but the amount of time required to initialize and add values counts in the speed of this task.
Concrete example:
Consider a Dictionary<String,List<String>> that looks something like this (in no particular order, obviously):
{ {"key1", {"value1", "value2"} }, {"key2", {"value3", "value4"} } }
I create that Dictionary once and save some form of information about it on disk (a full serialization, a hash, whatever). Then, at runtime, I do the following:
Dictionary<String,List<String>> dict1 = new Dictionary<String,List<String>> ();
Dictionary<String,List<String>> dict2 = new Dictionary<String,List<String>> ();
Dictionary<String,List<String>> dict3 = new Dictionary<String,List<String>> ();
String key11 = "key1";
String key12 = "key1";
String key13 = "key1";
String key21 = "key2";
String key22 = "key2";
String key23 = "key2";
List<String> value11 = new List<String> {"value1", "value2"};
List<String> value12 = new List<String> {"value1", "value2"};
List<String> value13 = new List<String> {"value1", "value2"};
List<String> value21 = new List<String> {"value3", "value4"};
List<String> value22 = new List<String> {"value3", "value4"};
List<String> value23 = new List<String> {"value3", "value5"};
dict1.Add(key11, value11);
dict1.Add(key21, value21);
dict2.Add(key22, value22);
dict2.Add(key12, value12);
dict3.Add(key13, value13);
dict3.Add(key23, value23);
dict1.compare(fileName); //Should return true
dict2.compare(fileName); //Should return true
dict3.compare(fileName); //Should return false
Again, if the overall time from startup to the return from compare() is quicker, I can change this code to use a SortedDictionary (or anything else) instead, but I can't guarantee ordering and I need some consistent comparison. compare() could load a serialization and iterate through the dictionaries, it could serialize the in-memory dictionary and compare the serialization to the file name, or it could do any number of other things.
Solution one: use set equality.
If the dictionaries are of different sizes, you know they are unequal.
If they are of the same size then build a mutable hash set of keys from one dictionary. Remove from it all the keys from the other dictionary. If you attempted to remove a key that wasn't there, then the key sets are unequal and you know which key was the problem.
Alternatively, build two hash sets and take their intersection; the resulting intersection should be the size of the original sets.
This takes O(n) time and O(n) space.
Once you know that the key sets are equal then go through all the keys one at a time, fetch the values, and do comparison of the values. Since the values are sequences, use SequenceEqual. This takes O(n) time and O(1) space.
Solution two: sort the keys
Again, if the dictionaries are of different size, you know they are unequal.
If they are of the same size, sort both sets of keys and do a SequenceEquals on them; if the sequences of keys are unequal then the dictionaries are unequal.
This takes O(n lg n) time and O(n) space.
If that succeeds, then again, go through the keys one at a time and compare the values.
Solution three:
Again, check the dictionaries to see if they are the same size.
If they are, then iterate over the keys of one dictionary and check to see if the key exists in the other dictionary. If it does not, then they are not equal. If it does, then check the corresponding values for equality.
This is O(n) in time and O(1) in space.
How to choose amongst these possible solutions? It depends on what the likely failure mode is, and whether you need to know what the missing or extra key is. If the likely failure mode is a bad key then it might be more performant to choose a solution that concentrates on finding the bad key first, and only checking for bad values if all the keys turn out to be OK. If the likely failure mode is a bad value, then the third solution is probably best, since it prioritizes checking values early.
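As an illustration, the third solution could be sketched like this. The names DictionaryComparer and AreEqual are hypothetical, and the sketch assumes that no value list is null and that the order of strings inside each list matters.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper implementing the size check plus per-key lookup.
public static class DictionaryComparer
{
    public static bool AreEqual(
        Dictionary<string, List<string>> expected,
        Dictionary<string, List<string>> actual)
    {
        // Different sizes can never be equal.
        if (expected.Count != actual.Count)
        {
            return false;
        }

        foreach (KeyValuePair<string, List<string>> pair in expected)
        {
            // A key missing from the other dictionary means inequality.
            if (!actual.TryGetValue(pair.Key, out List<string> other))
            {
                return false;
            }

            // Compare the value lists element by element, in order.
            if (!pair.Value.SequenceEqual(other))
            {
                return false;
            }
        }
        return true;
    }
}
```

Because both dictionaries have the same number of keys and every key of one is found in the other, no separate check for extra keys is needed.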
Due to my comments on the accepted answer, here's a stricter check.
goodDictionary.Keys.All(k =>
{
    List<string> otherVal;
    if (!testDictionary.TryGetValue(k, out otherVal))
    {
        return false;
    }
    return goodDictionary[k].SequenceEqual(otherVal);
})
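If you already have serialisation, then take the hash (I recommend SHA-1) of each serialised dictionary and then compare them. This only works if the serialisation is deterministic, for example with the keys written in sorted order, because equal dictionaries may have been filled in different orders.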
I don't think there is a magic bullet here; you just need to do a lookup for each key pair:
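public bool IsDictionaryAMatch(Dictionary<string, List<string>> dictionaryToCheck)
{
    // Dictionaries with different key counts can never match
    if (dictionaryToCheck.Count != goodDictionary.Count)
        return false;
    foreach (var kvp in dictionaryToCheck)
    {
        // Do the keys match? (Dictionary has no Exists method, so use TryGetValue)
        List<string> goodValues;
        if (!goodDictionary.TryGetValue(kvp.Key, out goodValues))
            return false;
        foreach (var valueElement in kvp.Value)
        {
            // Is each value present in the known-good list?
            if (!goodValues.Exists(x => x == valueElement))
                return false;
        }
    }
    return true;
}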
Well, at some point you need to compare that each key has the same value, but before that you can do quick things, like checking to see how many keys each dictionary has, then checking that the list of keys match. Those should be fairly quick, and if either of those tests fail you can abort the more expensive testing.
After that, you might be able to build separate lists of keys and then fire off a Parallel LINQ (PLINQ) query to compare the actual values.

How to find point between two keys in sorted dictionary

I have a sorted dictionary that contains measured data points as key/value pairs. To determine the value for a non-measured data point I want to extrapolate the value between two known keys using a linear interpolation of their corresponding values. I understand how to calculate the non-measured data point once I have the two key/value pairs it lies between. What I don't know is how to find out which keys it lies between. Is there a more elegant way than a "for" loop (I'm thinking function/LINQ query) to figure out which two keys my data point lies between?
Something like this would work:
dic.Keys.Zip(dic.Keys.Skip(1),
(a, b) => new { a, b })
.Where(x => x.a <= datapoint && x.b >= datapoint)
.FirstOrDefault();
This traverses the keys, relying on the fact that they are ordered, and compares each pair of adjacent keys. Since LINQ is lazy, the traversal stops once the first match is found.
The standard C# answers are all O(N) complexity.
Sometimes you just need a small subset in a rather large sorted collection. (so you're not iterating all the keys)
The standard C# collections won't help you here. One solution is the following:
http://www.itu.dk/research/c5/ Use the IntervalHeap in the C5 collections library. This class supports a GetRange() method and will look up the start key with O(log N) complexity and iterate the range with O(N) complexity. This will definitely be useful for big datasets if performance is critical, e.g. spatial partitioning in gaming.
Possibly you're asking about the following:
myDictionary.Keys.Where(w => w > start && w < end)
A regular loop should be OK here:
IEnumerable<double> keys = ...; //ordered sequence of keys
double interpolatedKey = ...;
// I'm considering here that keys collection doesn't contain interpolatedKey
double? lowerFoundKey = null;
double? upperFoundKey = null;
foreach (double key in keys)
{
    if (key > interpolatedKey)
    {
        upperFoundKey = key;
        break;
    }
    else
        lowerFoundKey = key;
}
You can do it in C# with LINQ with shorter but less efficient code:
double lowerFoundKey = keys.LastOrDefault(k => k < interpolatedKey);
double upperFoundKey = keys.FirstOrDefault(k => k > interpolatedKey);
To do this efficiently with LINQ you would need a method like F#'s windowed with a window size of 2, which returns an IEnumerable of adjacent pairs in the keys collection. Since LINQ lacks such a function, a regular foreach loop should be OK.
I don't think there is a function on SortedDictionary that lets you find elements around the one you need faster than iterating elements. (+1 to BrokenGlass solution)
To be able to find items faster you need to switch to some other structure. For example, SortedList provides similar functionality but lets you index its Keys collection, so you can use binary search to find the range.
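A sketch of the SortedList approach follows. The names Interpolation and Interpolate are illustrative, and the sketch assumes double keys and values, at least one data point, and a query that lies within the measured range (otherwise it throws).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: binary search over SortedList keys, then linear interpolation.
public static class Interpolation
{
    public static double Interpolate(SortedList<double, double> points, double x)
    {
        IList<double> keys = points.Keys;
        if (keys.Count == 0 || double.IsNaN(x) ||
            x < keys[0] || x > keys[keys.Count - 1])
        {
            throw new ArgumentOutOfRangeException(nameof(x));
        }

        // Binary search for the index of the first key that is >= x.
        int lo = 0;
        int hi = keys.Count - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid] < x)
            {
                lo = mid + 1;
            }
            else
            {
                hi = mid;
            }
        }

        // Exact hit on a measured point: no interpolation needed.
        if (keys[lo] == x)
        {
            return points.Values[lo];
        }

        // Here lo >= 1, so keys[lo - 1] < x < keys[lo].
        double x0 = keys[lo - 1];
        double x1 = keys[lo];
        double y0 = points.Values[lo - 1];
        double y1 = points.Values[lo];
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
    }
}
```

Indexing Keys and Values of a SortedList is O(1), so the search is O(log N) rather than the O(N) scan that the SortedDictionary answers need.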

how to fetch keys of dictionary in C#

I'm using a Dictionary<char, ulong> where the char is an object number and the ulong is an offset.
Now I need to access the keys of the Dictionary. The keys are arbitrary values that I can't predict.
I am using VS 2005.
I am new to C#, so please provide some code.
As Matt says, the Keys property is your friend. This returns a KeyCollection, but usually you just want to iterate over it:
foreach (char key in dictionary.Keys)
{
    // Whatever
}
Note that the order in which the keys are returned is not guaranteed. In many cases it will actually be insertion order, but you must not rely on that.
I'm slightly concerned that you talk about the keys being random numbers when it looks like they're characters. Have you definitely chosen the right types here?
One more tip - if you will sometimes need the values as well as the keys, you can iterate over the KeyValuePairs in the dictionary too:
foreach (KeyValuePair<char, ulong> pair in dictionary)
{
    char c = pair.Key;
    ulong x = pair.Value;
    ...
}
Dictionary provides a Keys property that contains all of the keys in the dictionary.
