I am trying to build a data structure with multiple string keys. To do this, I tried to create a Dictionary with a string[] key. But ContainsKey does not seem to work as I expect:
Dictionary<string[], int> aaa = new Dictionary<string[], int>();
int aaaCount = 0;
aaa.Add(new string[] { string1, string2 }, aaaCount++);
if (!aaa.ContainsKey(new string[] { string1, string2 }))
{
aaa.Add(new string[] { string1, string2 }, aaaCount++);
}
I see that there are two entries in aaa after the execution of the code above while I was expecting only one. Is this the expected behaviour? How can I ensure that there are no duplicate entries in the Dictionary?
Note: I tried the same with a list as well (List<string[]>) and the result is the same: the Contains method does not really work with string[].
If you want to use string[] as TKey, you should pass an IEqualityComparer<string[]> to the constructor of Dictionary. Otherwise Dictionary uses the default comparison for TKey, and in the case of string[] it just compares references, since string[] is a reference type. You have to implement IEqualityComparer<string[]> yourself. It can be done in the following way:
(The implementation is quite naive; I provide it just as a starting point.)
public class StringArrayComparer : IEqualityComparer<string[]>
{
public bool Equals(string[] left, string[] right)
{
if (ReferenceEquals(left, right))
{
return true;
}
if ((left == null) || (right == null))
{
return false;
}
return left.SequenceEqual(right);
}
public int GetHashCode(string[] obj)
{
return obj.Aggregate(17, (res, item) => unchecked(res * 23 + item.GetHashCode()));
}
}
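As a minimal usage sketch, assuming a comparer like the one above: passing it to the Dictionary constructor makes ContainsKey compare array contents. The comparer is re-declared in compact form so the snippet compiles on its own. The `Demo` class and its `CountAfterDuplicateAdd` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact re-declaration of the comparer so this snippet is self-contained.
public class StringArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] left, string[] right)
    {
        if (ReferenceEquals(left, right)) return true;
        if (left == null || right == null) return false;
        return left.SequenceEqual(right);
    }

    public int GetHashCode(string[] obj)
    {
        // Assumes a non-null array; null elements contribute 0 to the hash.
        return obj.Aggregate(17, (res, item) =>
            unchecked(res * 23 + (item == null ? 0 : item.GetHashCode())));
    }
}

// Hypothetical demo class, not part of the original question.
public static class Demo
{
    // Tries to add the same two-string key twice and returns the final entry count.
    public static int CountAfterDuplicateAdd(string string1, string string2)
    {
        var aaa = new Dictionary<string[], int>(new StringArrayComparer());
        int aaaCount = 0;
        aaa.Add(new[] { string1, string2 }, aaaCount++);
        if (!aaa.ContainsKey(new[] { string1, string2 }))
        {
            aaa.Add(new[] { string1, string2 }, aaaCount++);
        }
        return aaa.Count;
    }
}
```

With the comparer in place, the second array is recognised as an existing key, so the guarded Add is skipped.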
You need to create an IEqualityComparer<string[]> and pass it to the dictionary's constructor.
This tells the dictionary how to compare keys.
By default, it compares them by reference.
An array is a reference type, so you are checking reference (identity) equality, not equality based on the values within the array. When you create a new array with the same values, the two arrays are still distinct objects, so ContainsKey returns false.
Using an array as a Dictionary key is a bit... odd. What are you trying to map here? There is probably a better way to do it.
If your application supports it, you may be better off combining the string array into a single string.
We have numerous cases where two pieces of information uniquely identify a record in a collection. In these cases, we join the two strings using a value that should never appear in either string (e.g. Char(1)).
Since it is usually a class instance that is being added, we let the class generate the key, so that the code adding to the collection only has to check a single property (e.g. CollectionKey).
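A rough sketch of that composite-key idea, under the assumption that the character with code 1 never occurs in either string. The `Record` and `RecordStore` types and the `FirstCode`/`SecondCode` property names are hypothetical; only `CollectionKey` is taken from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; names are illustrative only.
public class Record
{
    // Separator assumed never to occur in either code.
    private const char Separator = (char)1;

    public string FirstCode { get; set; }
    public string SecondCode { get; set; }

    // The class builds its own key, so callers only check this one property.
    public string CollectionKey
    {
        get { return FirstCode + Separator + SecondCode; }
    }
}

public static class RecordStore
{
    // Adds each record once, keyed by CollectionKey; later duplicates are skipped.
    public static Dictionary<string, Record> Index(IEnumerable<Record> records)
    {
        var byKey = new Dictionary<string, Record>();
        foreach (var record in records)
        {
            if (!byKey.ContainsKey(record.CollectionKey))
            {
                byKey.Add(record.CollectionKey, record);
            }
        }
        return byKey;
    }
}
```

Because string keys compare by value, no custom comparer is needed here. The trade-off is that the scheme breaks if the separator can ever appear in the data.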
Before marking this as duplicate because of its title please consider the following short program:
static void Main()
{
var expected = new List<long[]> { new[] { Convert.ToInt64(1), Convert.ToInt64(999999) } };
var actual = DoSomething();
if (!actual.SequenceEqual(expected)) throw new Exception();
}
static IEnumerable<long[]> DoSomething()
{
yield return new[] { Convert.ToInt64(1), Convert.ToInt64(999999) };
}
I have a method which returns a sequence of arrays of type long. To test it I wrote some test code similar to the one in Main.
However, I get the exception, and I don't know why. Shouldn't the expected sequence be comparable to the one actually returned, or did I miss something?
To me it looks as if both the method's result and expected contain exactly one element, an array of type long. Don't they?
EDIT: So how do I avoid the exception, that is, compare the elements within the enumeration by value?
The actual problem is that you're comparing two long[] instances. Enumerable.SequenceEqual will use an ObjectEqualityComparer<Int64[]> (you can see that by examining EqualityComparer<long[]>.Default, which is what Enumerable.SequenceEqual uses internally). That comparer compares the references of the two arrays, which obviously aren't the same, rather than the values stored inside them.
To get around this, you could write a custom EqualityComparer<long[]>:
static void Main()
{
var expected = new List<long[]>
{ new[] { Convert.ToInt64(1), Convert.ToInt64(999999) } };
var actual = DoSomething();
if (!actual.SequenceEqual(expected, new LongArrayComparer()))
throw new Exception();
}
public class LongArrayComparer : EqualityComparer<long[]>
{
public override bool Equals(long[] first, long[] second)
{
return first.SequenceEqual(second);
}
// GetHashCode implementation courtesy of @JonSkeet
// from http://stackoverflow.com/questions/7244699/gethashcode-on-byte-array
public override int GetHashCode(long[] arr)
{
unchecked
{
if (arr == null)
{
return 0;
}
int hash = 17;
foreach (long element in arr)
{
hash = hash * 31 + element.GetHashCode();
}
return hash;
}
}
}
No, your sequences are not equal!
Let's remove the sequence part and just look at the first element of each:
var firstExpected = new[] { Convert.ToInt64(1), Convert.ToInt64(999999) };
var firstActual = new[] { Convert.ToInt64(1), Convert.ToInt64(999999) };
Console.WriteLine(firstExpected == firstActual); // writes "false"
The code above compares two separate arrays for equality. The == operator does not check the contents of the arrays; it checks whether the references are equal.
Your code using SequenceEqual is, essentially, doing the same thing. It compares the references of each pair of elements in the enumerables.
SequenceEqual tests whether the elements within the sequences are equal. The elements within the enumerations are of type long[], so we actually compare two different arrays (containing the same elements, however) against each other, which is done by comparing their references instead of their actual values.
So what we actually check here is expected[0] == actual[0] instead of expected[0].SequenceEqual(actual[0]).
This obviously returns false, as the two arrays are different references.
If we flatten the hierarchy using SelectMany we get what we want:
if (!actual.SelectMany(x => x).SequenceEqual(expected.SelectMany(x => x))) throw new Exception();
EDIT:
Based on this approach I found another elegant way to check that all the elements from expected are also contained in actual:
if (!expected.All(x => actual.Any(y => y.SequenceEqual(x)))) throw new Exception();
This checks whether, for every sub-list within expected, there is a list within actual that is sequentially identical to it. This seems much smarter to me, as we do not need a custom EqualityComparer or a weird hash code implementation.
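Note that flattening with SelectMany ignores where one array ends and the next begins, and the All/Any check ignores order and count. If those matter, one possible sketch that still avoids a custom comparer is to compare the arrays pairwise with Zip. The `NestedSequences` class and `NestedEqual` method are hypothetical names, not part of the original answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, shown only as an illustration.
public static class NestedSequences
{
    // True when both sequences hold the same number of arrays and each
    // pair of arrays at the same position has the same contents in order.
    public static bool NestedEqual(IEnumerable<long[]> expected, IEnumerable<long[]> actual)
    {
        var e = expected.ToList();
        var a = actual.ToList();
        return e.Count == a.Count
            && e.Zip(a, (x, y) => x.SequenceEqual(y)).All(equal => equal);
    }
}
```

The explicit Count check is needed because Zip stops at the shorter sequence. The sketch assumes none of the inner arrays is null.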
I have a table of identifier pairs, and I use it to go through CSV files looking for matches. I'm collecting the unidentified pairs in a List and sending them to an output box for later addition. I would like the output to contain each unique pair only once. The class is declared as follows:
public class Unmatched:IComparable<Unmatched>
{
public string first_code { get; set; }
public string second_code { get; set; }
public int CompareTo(Unmatched other)
{
if (this.first_code == other.first_code)
{
return this.second_code.CompareTo(other.second_code);
}
return other.first_code.CompareTo(this.first_code);
}
}
One note on the above code: it sorts in reverse alphabetical order. To get alphabetical order, use this line instead:
return this.first_code.CompareTo(other.first_code);
Here is the code that adds it. This is directly after the comparison against the datatable elements
unmatched.Add(new Unmatched()
{ first_code = fields[clients[global_index].first_match_column]
, second_code = fields[clients[global_index].second_match_column] });
I would like to remove duplicates from the list, i.e. pairs where both the first code and the second code are equal. For example:
PTC,138A
PTC,138A
PTC,138A
MA9,5A
MA9,5A
MA9,5A
MA63,138A
MA63,138A
MA59,87BM
MA59,87BM
Should become:
PTC, 138A
MA9, 5A
MA63, 138A
MA59, 87BM
I have tried adding my own Equals and GetHashCode as outlined here:
http://www.morgantechspace.com/2014/01/Use-of-Distinct-with-Custom-Class-objects-in-C-Sharp.html
The SE links I have tried are here:
How would I distinct my list of key/value pairs
Get list of distinct values in List<T> in c#
Get a list of distinct values in List
All of them return a list that still has all the pairs. Here is the current code (Yes, I know there are two distinct lines, neither appears to be working) that outputs the list:
parser.Close();
List<Unmatched> noDupes = unmatched.Distinct().ToList();
noDupes.Sort();
noDupes.Select(x => x.first_code).Distinct();
foreach (var pair in noDupes)
{
txtUnmatchedList.AppendText(pair.first_code + "," + pair.second_code + Environment.NewLine);
}
Here is the Equals/GetHashCode code as requested:
public bool Equals(Unmatched notmatched)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(notmatched, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, notmatched)) return true;
//Check whether both codes are equal.
return first_code.Equals(notmatched.first_code) && second_code.Equals(notmatched.second_code);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for first_code if it is not null.
int hashfirst_code = first_code == null ? 0 : first_code.GetHashCode();
//Get hash code for second_code (throws if second_code is null).
int hashsecond_code = second_code.GetHashCode();
//Combine the two hash codes.
return hashfirst_code ^ hashsecond_code;
}
I have also looked at a couple of answers that are using queries and Tuples, which I honestly don't understand. Can someone point me to a source or answer that will explain the how (And why) of getting distinct pairs out of a custom list?
(Side question: can you declare a class as both IComparable and IEquatable?)
The problem is you are not implementing IEquatable<Unmatched>.
public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
EqualityComparer<T>.Default uses the Equals(T) method only if you implement IEquatable<T>. You are not doing this, so it will instead use Object.Equals(object) which uses reference equality.
The overload of Distinct you are calling uses EqualityComparer<T>.Default to compare different elements of the sequence for equality. As the documentation states, the returned comparer uses your implementation of GetHashCode to find potentially-equal elements. It then uses the Equals(T) method to check for equality, or Object.Equals(Object) if you have not implemented IEquatable<T>.
You have an Equals(Unmatched) method, but it will not be used since you are not implementing IEquatable<Unmatched>. Instead, the default Object.Equals method is used which uses reference equality.
Note your current Equals method is not overriding Object.Equals since that takes an Object parameter, and you would need to specify the override modifier.
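Putting those points together, a minimal sketch of the class with IEquatable<Unmatched> implemented and Object.Equals overridden might look like the following. The IComparable part is left out to keep it short, and the `UnmatchedDemo` helper is a hypothetical name added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Unmatched : IEquatable<Unmatched>
{
    public string first_code { get; set; }
    public string second_code { get; set; }

    // Picked up by EqualityComparer<Unmatched>.Default because the
    // class declares IEquatable<Unmatched>.
    public bool Equals(Unmatched other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        return first_code == other.first_code && second_code == other.second_code;
    }

    // Keeps Object.Equals consistent with the typed overload.
    public override bool Equals(object obj)
    {
        return Equals(obj as Unmatched);
    }

    // Equal objects must produce equal hash codes.
    public override int GetHashCode()
    {
        int hashFirst = first_code == null ? 0 : first_code.GetHashCode();
        int hashSecond = second_code == null ? 0 : second_code.GetHashCode();
        return hashFirst ^ hashSecond;
    }
}

// Hypothetical helper, not part of the original question.
public static class UnmatchedDemo
{
    public static List<Unmatched> WithoutDuplicates(IEnumerable<Unmatched> unmatched)
    {
        return unmatched.Distinct().ToList();
    }
}
```

On the side question: a class can list both interfaces, e.g. `IComparable<Unmatched>, IEquatable<Unmatched>`, as in the declaration shown above.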
For an example on using Distinct see here.
You have to implement the IEqualityComparer<TSource> and not IComparable<TSource>.
I was wondering if it is possible to have a dictionary where the key is an array of strings, and then search the dictionary by comparing a search term with the array?
EG:
my array has 7 words in it
That array is the key. I want to search the dictionary for any key/value where the key might contain the word 'has'. Is this possible?
No, that basically won't work - even with a custom equality comparer. It sounds like what you really want is a dictionary of individual words, where each individual entry has multiple values. You can create that pretty easily using ToLookup, if you've got the input data as a sequence already.
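A small sketch of the ToLookup idea, assuming the input is available as a sequence of (string[] words, int value) pairs. The `WordIndex` class, the `Build` method and the choice of int values are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, shown only as an illustration.
public static class WordIndex
{
    // Maps each individual word to the values of every entry
    // whose word array contains that word.
    public static ILookup<string, int> Build(IEnumerable<KeyValuePair<string[], int>> entries)
    {
        return entries
            // Distinct() so a word repeated within one array is indexed once.
            .SelectMany(entry => entry.Key.Distinct()
                .Select(word => new { Word = word, entry.Value }))
            .ToLookup(pair => pair.Word, pair => pair.Value);
    }
}
```

Indexing the result with a word such as "has" then yields all matching values, and a word that never occurs yields an empty sequence rather than an exception.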
Maybe a Dictionary with a custom comparer could be the way to go (http://msdn.microsoft.com/en-us/library/ms132072.aspx), but in your sample (array contains a word), more than one key could match. So a Dictionary is probably not the best storage to choose, because a lookup returns only one value.
I would use LINQ to transform the collection of word arrays (the keys) and their values into a dictionary whose key is a single word and whose value is an array of all the values whose original key contains that word.
public class WordsWithValue
{
public string[] Words { get; set; }
public object Value { get; set; }
}
public IDictionary<string, object[]> GetValuesForWord(IEnumerable<WordsWithValue> wordsWithValues)
{
return wordsWithValues.SelectMany(wwv => wwv.Words.Select(word => Tuple.Create(word, wwv.Value)))
.GroupBy(tuple => tuple.Item1, tuple => tuple.Item2, (word, values) => Tuple.Create(word, values.ToArray()))
.ToDictionary(tuple => tuple.Item1, tuple => tuple.Item2);
}
You can of course split this into a few more methods to make it clearer. Another option is to use anonymous types instead of the Tuples I used here, to get more sensible names than Item1 and Item2.
I am trying to write a program where a Dictionary is indexed by a List. (Trust me, I do, and yes, there are other options, but I like indexing by list.) Here is a minimal working example (actually not working: only the last line is a problem):
using System;
using System.Collections.Generic;
namespace test
{
class Program
{
static void Main(string[] args)
{
Dictionary<List<String>, int> h = new Dictionary<List<string>,int>();
List<String> w = new List<string> {"a"};
h.Add(w, 1);
w = new List<string>{"b"};
h.Add(w,2);
w = new List<string>{"a"};
int value = 0;
h.TryGetValue(w, out value);
Console.WriteLine(value+" "+h[w]);
}
}
}
If one debugs this program, one will clearly see that there are two elements in h, but these elements are still not accessible via the correct index, h[w]. Am I wrong or is there something weird going on?
The problem with your app stems from the fact that:
new List<String> { "a" } != new List<String> { "a" }
Equality for lists checks to see if the two references refer to the same instance. In this case, they don't. You've instead created two Lists with the same elements...which doesn't make them equal.
You can fix the problem by creating a custom Equality Comparer:
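// Requires: using System.Collections.Generic; using System.Linq;
public class ListEqualityComparer<T> : IEqualityComparer<List<T>>
{
public bool Equals(List<T> list1, List<T> list2)
{
// Same instance (or both null) counts as equal.
if (ReferenceEquals(list1, list2)) return true;
if (list1 == null || list2 == null) return false;
return list1.SequenceEqual(list2);
}
public int GetHashCode(List<T> list)
{
// Assumes the list elements themselves are not null.
if(list != null && list.Count > 0)
{
var hashcode = list[0].GetHashCode();
for(var i = 1; i < list.Count; i++)
hashcode ^= list[i].GetHashCode();
return hashcode;
}
return 0;
}
}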
And then passing that to the Dictionary constructor:
Dictionary<List<String>, int> h =
new Dictionary<List<string>,int>(new ListEqualityComparer<String>());
The problem is indexing by List: what you are indexing by isn't the data in the list but, essentially, the reference to the List (i.e. the memory address where this List is located).
You created one list at one memory location, then created a totally different list at a different memory location (i.e. when you created a new instance). The two lists are different even though they contain the same data, which means you can add as many as you want to the dictionary.
One solution, rather than indexing by List, would be to index by String, using a comma-separated string containing all the data in your list as the key.
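A minimal sketch of that string-key approach, assuming no list item ever contains a comma. The `ListKeyDemo` class and its `ToKey` and `Lookup` methods are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, shown only as an illustration.
public static class ListKeyDemo
{
    // Assumes no item contains a comma; otherwise two different
    // lists could produce the same key.
    public static string ToKey(IEnumerable<string> items)
    {
        return string.Join(",", items);
    }

    // Mirrors the question's example, but keyed by the joined string.
    public static int Lookup()
    {
        var h = new Dictionary<string, int>();
        h.Add(ToKey(new List<string> { "a" }), 1);
        h.Add(ToKey(new List<string> { "b" }), 2);

        // A fresh list with the same contents produces the same string key.
        int value;
        h.TryGetValue(ToKey(new List<string> { "a" }), out value);
        return value;
    }
}
```

Strings compare by value, so the lookup succeeds without a custom comparer.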
This won't ever work for you, because List<T>'s Equals and GetHashCode methods don't consider the contents of the list. If you want to use a collection of objects as a key, you'll need to implement your own collection type that overrides Equals in such a way as to check the equality of the objects in the collection (perhaps using Enumerable.SequenceEqual.)
The Dictionary class uses the key type's default equality to look for the specified key. For List<T> that is reference comparison, which is why the lists are treated as different even if they contain the same items.
I think that this problem can be sorted using reflection (a technology which I'm not too sure about).
My code is receiving some code objects that have been serialised to XML at runtime. When I receive it and deserialise it one field is causing me some hassle.
There is a field that can contain a combination of the following data classes (simplified for clarity):
class KeyValuePairType
{
public string Key;
public string Value;
}
class KeyValueListPair
{
public string Key;
public string[] Value;
}
I receive these into my code as an object[] and I need to determine at runtime what exactly this contains so that I can call an interface on a local object that requires
KeyValuePairType[] and KeyValueListPair[] as parameters e.g.
public void DoSomeWork(KeyValuePairType[] data1, KeyValueListPair[] data2)
I have the following cases to cope with:
object[] contains:
nothing, in which case I call DoSomeWork(null, null);
an array of KeyValuePairType only, in which case I call DoSomeWork(KeyValuePairType[], null);
an array of KeyValueListPair only, in which case I call DoSomeWork(null, KeyValueListPair[]);
or an array of each, in which case I call DoSomeWork(KeyValuePairType[], KeyValueListPair[]);
Any ideas are welcome.
Thank you
It turns out that the object array contains a random sequence of discrete objects. Initially I was led to believe that it might be a sequence of discrete objects and arrays of those objects.
As it is, the LINQ statements will cover all eventualities.
Can I say a big thank you to those that answered. I have given a +1 to those answering with the LINQ statements.
Assuming you've got LINQ available to you...
public void Foo(object[] values)
{
var pairs = values.OfType<KeyValuePairType>().ToArray();
var lists = values.OfType<KeyValueListPair>().ToArray();
pairs = pairs.Length == 0 ? null : pairs;
lists = lists.Length == 0 ? null : lists;
DoSomeWork(pairs, lists);
}
You can do this using LINQ in C# 3, like this:
void HandleThings(params object[] values) {
var pairTypes = values.OfType<KeyValuePairType>().ToArray();
var listPairs = values.OfType<KeyValueListPair>().ToArray();
DoSomeWork(pairTypes.Any() ? pairTypes : null, listPairs.Any() ? listPairs : null);
}
You can make it a tiny bit faster by replacing .Any() with .Length > 0, at the cost of brevity.
How about this:
object[] objects = GetObjects();
var pairs = objects.OfType<KeyValuePairType[]>().FirstOrDefault();
var lists = objects.OfType<KeyValueListPair[]>().FirstOrDefault();
DoSomeWork(pairs, lists);
It depends, somewhat, on how the individual elements are being handed to you, but in general, the is and as keywords should work fine for checking individual objects in the object[] array, and assigning them to the appropriate output.
After making sure object[] is not null and has a length of at least 1, you can just call GetType on the objects in the array.
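As a rough sketch of the is/as approach, assuming the array holds discrete KeyValuePairType and KeyValueListPair objects. The two data classes are re-declared so the snippet stands alone, and the `Splitter` class with its `Split` method is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class KeyValuePairType { public string Key; public string Value; }
public class KeyValueListPair { public string Key; public string[] Value; }

// Hypothetical helper, shown only as an illustration.
public static class Splitter
{
    // Sorts the objects into two typed arrays; an output is null when
    // no object of that type was found.
    public static void Split(object[] values,
        out KeyValuePairType[] pairs, out KeyValueListPair[] lists)
    {
        var pairList = new List<KeyValuePairType>();
        var listList = new List<KeyValueListPair>();
        if (values != null)
        {
            foreach (object value in values)
            {
                // 'as' yields null when the object is not of the requested type.
                var pair = value as KeyValuePairType;
                if (pair != null) { pairList.Add(pair); continue; }

                // 'is' only tests the type; the cast afterwards is then safe.
                if (value is KeyValueListPair) listList.Add((KeyValueListPair)value);
            }
        }
        pairs = pairList.Count == 0 ? null : pairList.ToArray();
        lists = listList.Count == 0 ? null : listList.ToArray();
    }
}
```

The two outputs can then be passed straight to DoSomeWork, which covers the null cases listed in the question.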
objectArray[0].GetType().FullName
will return either
"Namespace.KeyValuePairType"
or
"Namespace.KeyValueListPair"