How to count distinct trees in a list of trees - c#

I have a tree structure as follows:
public class TAGNode
{
public string Val;
public string Type = "";
private List<TAGNode> childs;
public IList<TAGNode> Childs
{
get { return childs.AsReadOnly(); }
}
public TAGNode AddChild(string val)
{
TAGNode tree = new TAGNode(val);
tree.Parent = this;
childs.Add(tree);
return tree;
}
public override bool Equals(object obj)
{
var t = obj as TAGNode;
bool eq = Val == t.Val && childs.Count == t.Childs.Count;
if (eq)
{
for (int i = 0; i < childs.Count; i++)
{
eq &= childs[i].Equals(t.childs[i]);
}
}
return eq;
}
}
I have a list of such trees which can contain repeated trees, by repeated I mean they have the same structure with the same labels. Now I want to select distinct trees from this list. I tried
etrees = new List<TAGNode>();
TAGNode test1 = new TAGNode("S");
test1.AddChild("A").AddChild("B");
test1.AddChild("C");
TAGNode test2 = new TAGNode("S");
test2.AddChild("A").AddChild("B");
test2.AddChild("C");
TAGNode test3 = new TAGNode("S");
test3.AddChild("A");
test3.AddChild("B");
etrees.Add(test1);
etrees.Add(test2);
etrees.Add(test3);
var results = etrees.Distinct();
label1.Text = results.Count() + " unique trees";
This returns the count of all the trees (3) while I expect 2 distinct trees! I think maybe I should implement a suitable Equals function for it, but as I tested it doesn't care what Equals returns!

I think maybe I should implement a suitable Equals function for it
Correct.
but as I tested it doesn't care what Equals returns!
Because you have to implement a matching GetHashCode! It doesn't need to include all the items used inside the Equals, in your case Val could be sufficient. Remember, all you need is to return one and the same hash code for the potentially equal items. The items with different hash codes are considered non equal and never checked with Equals.
So something like this should work:
public bool Equals(TAGNode other)
{
if ((object)this == (object)other) return true;
if ((object)other == null) return false;
return Val == other.Val && childs.SequenceEqual(other.childs);
}
public override bool Equals(object obj) => Equals(obj as TAGNode);
public override int GetHashCode() => Val?.GetHashCode() ?? 0;
Once you do that, you can also "mark" your TAGNode as IEquatable<TAGNode>, to let the default equality comparer directly call the Equals(TAGNode other) overload.
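Putting those pieces together, a minimal self-contained sketch could look like the following. The constructor is an assumption (the question does not show one), and the Parent and Type members are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TAGNode : IEquatable<TAGNode>
{
    public string Val;
    private readonly List<TAGNode> childs = new List<TAGNode>();

    // Assumed constructor; the question does not show one.
    public TAGNode(string val) { Val = val; }

    public TAGNode AddChild(string val)
    {
        var child = new TAGNode(val);
        childs.Add(child);
        return child;
    }

    // Structural equality: same label and pairwise-equal children, in order.
    public bool Equals(TAGNode other)
    {
        if (ReferenceEquals(this, other)) return true;
        if (ReferenceEquals(other, null)) return false;
        return Val == other.Val && childs.SequenceEqual(other.childs);
    }

    public override bool Equals(object obj) => Equals(obj as TAGNode);

    // Deliberately coarse: equal trees share a root label, so they share a hash.
    public override int GetHashCode() => Val?.GetHashCode() ?? 0;
}

public static class Program
{
    public static void Main()
    {
        var test1 = new TAGNode("S");
        test1.AddChild("A").AddChild("B");
        test1.AddChild("C");

        var test2 = new TAGNode("S");
        test2.AddChild("A").AddChild("B");
        test2.AddChild("C");

        var test3 = new TAGNode("S");
        test3.AddChild("A");
        test3.AddChild("B");

        var etrees = new List<TAGNode> { test1, test2, test3 };
        Console.WriteLine(etrees.Distinct().Count() + " unique trees"); // 2 unique trees
    }
}
```

Hashing only Val means every tree with the same root label lands in the same bucket, so Distinct falls back to Equals for those. That is correct, just slower on large lists with many identical roots.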

see https://msdn.microsoft.com/en-us/library/bb348436(v=vs.100).aspx
If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable generic interface in the class. The following code example shows how to implement this interface in a custom data type and provide GetHashCode and Equals methods.
You need to implement IEquatable<TAGNode> for TAGNode, together with matching Equals and GetHashCode methods.

Try the following for GetHashCode. I updated the method below to make it more robust, because I was afraid the original answer might not create unique hash values.
private int GetHashCode(TAGNode node)
{
string hash = node.Val;
foreach(TAGNode child in node.childs)
{
hash += GetHashStr(child);
}
return hash.GetHashCode();
}
private string GetHashStr(TAGNode node)
{
string hash = node.Val;
foreach (TAGNode child in node.childs)
{
hash += ":" + GetHashStr(child);
}
return hash;
}

Related

Proper way to write GetHashCode() when Equality Comparer is based on OR operation?

I'm trying to write an Equality Comparer for a simple class with 3 fields, like so:
public class NumberClass
{
public int A { get; set; }
public int B { get; set; }
public int C { get; set; }
}
My condition for two objects of NumberClass to be equal is if Obj1.A == Obj2.A || Obj1.B == Obj2.B (in other words, OR), Obj1 and Obj2 being instances of NumberClass.
I can easily write the Equals() of my comparer as follows, but I don't know what to do with my GetHashCode() method.
public bool Equals(NumberClass x, NumberClass y)
{
if (x.A == y.A || x.B == y.B)
return true;
else
return false;
}
public int GetHashCode(NumberClass obj)
{
return ???
}
If my condition for equality was AND instead of OR, I could write my GetHashCode() as follows, taken from this SO answer.
public int GetHashCode(NumberClass obj)
{
unchecked
{
int hash = 17;
if (obj != null)
{
hash = hash * 23 + obj.A.GetHashCode();
hash = hash * 23 + obj.B.GetHashCode();
}
return hash;
}
}
But that obviously wouldn't work for OR since only one of A or B being equal is sufficient for my equality condition to be true.
One workaround I could think of is always returning the same value in GetHashCode() which would be sufficient for LINQ operations such as Distinct(), but I feel like there should be another way as that has its own shortcomings.
What's the proper way to handle this situation?
P.S.
For testing, imagine my Main() is the following:
static void Main(string[] args)
{
List<NumberClass> list = new List<NumberClass>();
list.Add(new NumberClass { A = 1, B = 2, C = 3 });
list.Add(new NumberClass { A = 1, B = 22, C = 33 });
var distinct = list.Distinct(new NumberComparer());
Console.ReadKey();
}
I expect distinct to contain only the first element in the list.
There is no solution for your situation. Your objects violate assumptions that are necessary for an equality comparer to work, for example, it assumes that equality is going to be transitive, but that's not true of your implementation of equality.
You simply won't be able to use any hash-based algorithms so long as you have "fuzzy" equality like that.
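To make the transitivity problem concrete, here is a small sketch using the comparer from the question, with the constant hash code the asker mentions as a workaround. The values of x, y and z are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class NumberClass
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
}

// The question's OR-based comparer, with a constant hash so Equals is always consulted.
public class NumberComparer : IEqualityComparer<NumberClass>
{
    public bool Equals(NumberClass x, NumberClass y) => x.A == y.A || x.B == y.B;
    public int GetHashCode(NumberClass obj) => 0;
}

public static class Program
{
    public static void Main()
    {
        var cmp = new NumberComparer();
        var x = new NumberClass { A = 1, B = 2 };
        var y = new NumberClass { A = 1, B = 3 };
        var z = new NumberClass { A = 9, B = 3 };

        Console.WriteLine(cmp.Equals(x, y)); // True  (same A)
        Console.WriteLine(cmp.Equals(y, z)); // True  (same B)
        Console.WriteLine(cmp.Equals(x, z)); // False (not transitive)

        // The result of Distinct now depends on the input order.
        Console.WriteLine(new[] { x, z, y }.Distinct(cmp).Count()); // 2
        Console.WriteLine(new[] { y, x, z }.Distinct(cmp).Count()); // 1
    }
}
```

Even with the constant hash code, the same three objects give a different number of "distinct" items depending on their order, which is why no GetHashCode can rescue this definition of equality.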

Avoiding duplicates in a HashSet of custom types in C#

I have the following custom class deriving from Tuple:
public class CustomTuple : Tuple<List<string>, DateTime?>
{
public CustomTuple(IEnumerable<string> strings, DateTime? time)
: base(strings.OrderBy(x => x).ToList(), time)
{
}
}
and a HashSet<CustomTuple>. The problem is that when I add items to the set, they are not recognised as duplicates. i.e. this outputs 2, but it should output 1:
void Main()
{
HashSet<CustomTuple> set = new HashSet<CustomTuple>();
var a = new CustomTuple(new List<string>(), new DateTime?());
var b = new CustomTuple(new List<string>(), new DateTime?());
set.Add(a);
set.Add(b);
Console.Write(set.Count); // Outputs 2
}
How can I override the Equals and GetHashCode methods to cause this code to output a set count of 1?
You should override the virtual GetHashCode and Equals methods defined in the System.Object class.
Please remember that:
If two objects are logically "equal" then they MUST have the same hash code!
If two objects have the same hash code, they are not required to be equal.
Also, I've noticed an architectural problem in your code:
List is a mutable type, but overriding Equals and GetHashCode usually makes your class behave logically like a value type. Having a mutable Item1 in a class that behaves like a value type is very dangerous. I suggest replacing your List with a ReadOnlyCollection. You would then need a method that checks whether two ReadOnlyCollections are equal.
For the GetHashCode() method, compose a string from all the string items found in Item1, append the hash code of the DateTime, and finally call string's own GetHashCode() on the concatenated result. So normally you would have:
public override int GetHashCode () {
return (GetHashCodeForList (Item1) + (Item2 ?? DateTime.MinValue).GetHashCode ()).GetHashCode ();
}
And the GetHashCodeForList method would be something like this:
private string GetHashCodeForList (IEnumerable <string> lst) {
if (lst == null) return string.Empty;
StringBuilder sb = new StringBuilder ();
foreach (var item in lst) {
sb.Append (item);
}
return sb.ToString ();
}
Final note: You could cache the GetHashCode result, since it is relatively expensive to compute and your entire class would become immutable (if you replace List with a read-only collection).
A HashSet<T> will first call GetHashCode, so you need to work on that first. For an implementation, see this answer: https://stackoverflow.com/a/263416/1250301
So a simple, naive, implementation might look like this:
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + this.Item2.GetHashCode();
foreach (var s in this.Item1)
{
hash = hash * 23 + s.GetHashCode();
}
return hash;
}
}
However, if your lists are long, then this might not be efficient enough. So you'll have to decide where to compromise depending on how tolerant you are of collisions.
If the result of GetHashCode for two items are the same, then, and only then, will it call Equals. An implementation of Equals is going to need to compare the items in the list. Something like this:
public override bool Equals(object o1)
{
var o = o1 as CustomTuple;
if (o == null)
{
return false;
}
if (Item2 != o.Item2)
{
return false;
}
if (Item1.Count() != o.Item1.Count())
{
return false;
}
for (int i=0; i < Item1.Count(); i++)
{
if (Item1[i] != o.Item1[i])
{
return false;
}
}
return true;
}
Note that we check the date (Item2) first, because that's cheap. If the date isn't the same, we don't bother with anything else. Next we check the Count on both collections (Item1). If they don't match, there's no point iterating the collections. Then we loop through both collections and compare each item. Once we find one that doesn't match, we return false because there is no point continuing to look.
As pointed out in George's answer, you also have the problem that your list is mutable, which will cause problems with your HashSet, for example:
var a = new CustomTuple(new List<string>() {"foo"} , new DateTime?());
var b = new CustomTuple(new List<string>(), new DateTime?());
set.Add(a);
set.Add(b);
// Hashset now has two entries
b.Item1.Add("foo");
// Hashset still has two entries, but they are now identical.
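A related failure mode is that the set can lose track of an element whose hash code changes after insertion. The sketch below uses a hypothetical Key class (not from the question) with the same kind of list-based equality:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type with structural equality over a mutable list.
public class Key
{
    public List<string> Items { get; } = new List<string>();

    public override bool Equals(object obj) =>
        obj is Key other && Items.SequenceEqual(other.Items);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (var s in Items) hash = hash * 23 + s.GetHashCode();
            return hash;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var set = new HashSet<Key>();
        var key = new Key();
        set.Add(key);
        Console.WriteLine(set.Contains(key)); // True

        // Mutating the list changes the hash code the set would look up.
        key.Items.Add("foo");

        // Almost certainly False now: the set stored the element under its old hash.
        Console.WriteLine(set.Contains(key));
    }
}
```

The object is still physically in the set, but lookups compute the new hash and search the wrong place, so it is effectively orphaned.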
To solve that, you need to force your IEnumerable<string> to be readonly. You could do something like:
public class CustomTuple : Tuple<IReadOnlyList<string>, DateTime?>
{
public CustomTuple(IEnumerable<string> strings, DateTime? time)
: base(strings.OrderBy(x => x).ToList().AsReadOnly(), time)
{
}
public override bool Equals(object o1)
{
// as above
}
public override int GetHashCode()
{
// as above
}
}
This is what I went for, which outputs 1 as desired:
private class CustomTuple : Tuple<List<string>, DateTime?>
{
public CustomTuple(IEnumerable<string> strings, DateTime? time)
: base(strings.OrderBy(x => x).ToList(), time)
{
}
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
{
return false;
}
var that = (CustomTuple) obj;
if (Item1 == null && that.Item1 != null || Item1 != null && that.Item1 == null) return false;
if (Item2 == null && that.Item2 != null || Item2 != null && that.Item2 == null) return false;
if (!Item2.Equals(that.Item2)) return false;
if (that.Item1.Count != Item1.Count) return false;
for (int i = 0; i < Item1.Count; i++)
{
if (!Item1[i].Equals(that.Item1[i])) return false;
}
return true;
}
public override int GetHashCode()
{
int hash = 17;
hash = hash*23 + Item2.GetHashCode();
return Item1.Aggregate(hash, (current, s) => current*23 + s.GetHashCode());
}
}

Implementing correct GetHashCode

I have the following class
public class ResourceInfo
{
public string Id { get; set; }
public string Url { get; set; }
}
which contains information about some resource.
Now I need to be able to check whether two such resources are equal according to the following logic (I've implemented the IEquatable interface):
public class ResourceInfo : IEquatable<ResourceInfo>
{
public string Id { get; set; }
public string Url { get; set; }
public bool Equals(ResourceInfo other)
{
if (other == null)
return false;
// Try to match by Id
if (!string.IsNullOrEmpty(Id) && !string.IsNullOrEmpty(other.Id))
{
return string.Equals(Id, other.Id, StringComparison.InvariantCultureIgnoreCase);
}
// Match by Url if we can't match by Id
return string.Equals(Url, other.Url, StringComparison.InvariantCultureIgnoreCase);
}
}
Usage: oneResource.Equals(otherResource). And everything is just fine. But some time has passed, and now I need to use this equality comparison in a LINQ query.
As a result, I need to implement a separate equality comparer, which looks like this:
class ResourceInfoEqualityComparer : IEqualityComparer<ResourceInfo>
{
public bool Equals(ResourceInfo x, ResourceInfo y)
{
if (x == null || y == null)
return object.Equals(x, y);
return x.Equals(y);
}
public int GetHashCode(ResourceInfo obj)
{
if (obj == null)
return 0;
return obj.GetHashCode();
}
}
This seems OK: it does some null checking and then uses the class's own equality logic. But then I need to implement the GetHashCode method in the ResourceInfo class, and that is where I have a problem.
I don't know how to do this correctly without changing the class itself.
At first glance, the following example could work:
public override int GetHashCode()
{
// Try to get hashcode from Id
if(!string.IsNullOrEmpty(Id))
return Id.GetHashCode();
// Try to get hashcode from url
if(!string.IsNullOrEmpty(Url))
return Url.GetHashCode();
// Return zero
return 0;
}
But this implementation is not very good.
GetHashCode should match the Equals method: if two objects are equal, then they should have the same hash code, right? But my Equals method looks at both objects when comparing them. Here is a use case where you can see the problem:
var resInfo1 = new ResourceInfo()
{
Id = null,
Url = "http://res.com/id1"
};
var resInfo2 = new ResourceInfo()
{
Id = "id1",
Url = "http://res.com/id1"
};
So what happens when we invoke the Equals method? They will be equal: Equals tries to match them by Id and cannot (one Id is null), then it falls back to matching by Url, and there the values are the same. As intended.
resInfo1.Equals(resInfo2) -> true
But then, if they are equal, they should have the same hash codes:
var hash1 = resInfo1.GetHashCode(); // -263327347
var hash2 = resInfo2.GetHashCode(); // 1511443452
hash1 == hash2 -> false
In short, the problem is that the Equals method decides which field to use for the comparison by looking at two different objects, while the GetHashCode method has access to only one object.
Is there a way to implement this correctly, or do I just have to change my class to avoid such situations?
Many thanks.
Your approach to equality fundamentally breaks the specifications in Object.Equals.
In particular, consider:
var x = new ResourceInfo { Id = null, Url = "a" };
var y = new ResourceInfo { Id = "yz", Url = "a" };
var z = new ResourceInfo { Id = "yz", Url = "b" };
Here, x.Equals(y) would be true, and y.Equals(z) would be true - but x.Equals(z) would be false. That is specifically prohibited in the documentation:
If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.
You'll need to redesign, basically.
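One possible direction for such a redesign, sketched under the assumption that Url alone can serve as the identity of a resource (which may not hold for your data), is a comparer that only ever looks at a single field. ResourceUrlComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ResourceInfo
{
    public string Id { get; set; }
    public string Url { get; set; }
}

// Hypothetical comparer: identity is the Url alone, compared case-insensitively.
public class ResourceUrlComparer : IEqualityComparer<ResourceInfo>
{
    private static readonly StringComparer Cmp = StringComparer.InvariantCultureIgnoreCase;

    public bool Equals(ResourceInfo x, ResourceInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Cmp.Equals(x.Url, y.Url);
    }

    // Uses the same comparer as Equals, so equal items always hash alike.
    public int GetHashCode(ResourceInfo obj) =>
        obj?.Url == null ? 0 : Cmp.GetHashCode(obj.Url);
}

public static class Program
{
    public static void Main()
    {
        var list = new List<ResourceInfo>
        {
            new ResourceInfo { Id = null,  Url = "http://res.com/id1" },
            new ResourceInfo { Id = "id1", Url = "HTTP://RES.COM/ID1" },
            new ResourceInfo { Id = "id2", Url = "http://res.com/id2" },
        };
        Console.WriteLine(list.Distinct(new ResourceUrlComparer()).Count()); // 2
    }
}
```

Because Equals and GetHashCode both depend on exactly one value per object, the relation is transitive and the hash contract holds.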

how to implement override of GetHashCode() with logic of overriden Equals()

I have some classes as below, and I have implemented the Equals(Object) method for almost all of them. But I don't know how to write GetHashCode(). Since I use these data types as the value type in a Dictionary collection, I think I should override GetHashCode().
1. I don't know how to implement GetHashCode() with the logic of Equals(Object).
2. There are some derived classes. If I override GetHashCode() and Equals(Object) for the base class (Param), is it still necessary to override them for the child classes?
class Param
{
...
public Int16 id { get; set; }
public String name { get; set; }
...
public override bool Equals(object obj)
{
if ( obj is Param){
Param p = (Param)(obj);
if (id > 0 && p.id > 0)
return (id == p.id);
else if (name != String.Empty && p.name != String.Empty)
return (name.Equals(p.name));
else
return object.ReferenceEquals(this, obj);
}
return false;
}
}
class Item
{
public String it_code { get; set; }
public Dictionary<String, Param> paramAr { get; set; }
...
public override bool Equals(Object obj)
{
Item im = new Item();
if (obj is Item)
im = (Item)obj;
else
return false;
if (this.it_code != String.Empty && im.it_code != String.Empty)
if (this.it_code.Equals(im.it_code))
return true;
bool reParams = true;
foreach ( KeyValuePair<String,Param> kvp in paramAr ){
if (kvp.Value != im.paramAr[kvp.Key]) {
reParams = false;
break;
}
}
return reParams;
}
}
class Order
{
public String or_code { get; set; }
public List <Item> items { get; set; }
...
public override bool Equals( Object obj ){
Order o = new Order();
if (obj is Order)
o = (Order)obj;
else
return false;
if (this.or_code != String.Empty && o.or_code != String.Empty)
if (this.or_code.Equals(o.or_code))
return true;
bool flag = true;
foreach( Item i in items){
if (!o.items.Contains(i)) {
flag = false;
break;
}
}
return flag;
}
}
EDIT:
i get this warning:
Warning : 'Item' overrides Object.Equals(object o) but does not
override Object.GetHashCode()
Firstly, as I think you understand, wherever you implement Equals you MUST also implement GetHashCode. The implementation of GetHashCode must reflect the behaviour of the Equals implementation but it doesn't usually use it.
See http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx - especially the "Notes to Implementers"
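So if you take your example of the Param implementation of Equals, you're considering both the values of id and name to affect equality. So both of these must contribute to the GetHashCode implementation.
An example of how you could implement GetHashCode for Param would be along the lines of the following (note you may need to make it resilient to a null name field). Be aware that this hash is only consistent with an Equals that requires both id and name to match; with the either-or fallback in your Equals, two objects with the same id but different names would compare equal yet produce different hash codes.
public override int GetHashCode()
{
return id.GetHashCode() ^ name.GetHashCode();
}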
See Eric Lippert's blog post on guidelines for GetHashCode - http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
As for whether you need to re-implement GetHashCode in subclasses - Yes if you also override Equals - as per the first (and main) point - the implementation of the two must be consistent - if two items are considered equal by Equals then they must return the same value from GetHashCode.
Side note:
As a performance improvement to your code (to avoid multiple casts), this:
if ( obj is Param){
Param p = (Param)(obj);
can be replaced with:
Param p = obj as Param;
if (p != null) ...
I prefer Josh Bloch's approach.
Here's the example for the Param class.
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + id.GetHashCode();
hash = hash * 23 + name.GetHashCode();
return hash;
}
}
Also, check this link out : .net - best algorithm for GetHashCode
Properties used for the hashcode computation should be immutable as well.
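As a rough sketch of that last point, the Param class could expose get-only properties so that the values feeding the hash cannot change after construction. Note the assumptions here: equality requires both Id and Name to match (which is stricter than the fallback logic in the question), and the constructor is my own addition.

```csharp
using System;
using System.Collections.Generic;

public sealed class Param : IEquatable<Param>
{
    // Get-only properties: the inputs to GetHashCode are fixed at construction.
    public short Id { get; }
    public string Name { get; }

    // Assumed constructor; not part of the original class.
    public Param(short id, string name) { Id = id; Name = name; }

    public bool Equals(Param other) =>
        !ReferenceEquals(other, null) && Id == other.Id && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Param);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + Id.GetHashCode();
            hash = hash * 23 + (Name?.GetHashCode() ?? 0); // tolerate a null name
            return hash;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var lookup = new Dictionary<Param, string>();
        lookup[new Param(1, "width")] = "first";

        Console.WriteLine(lookup.ContainsKey(new Param(1, "width")));  // True
        Console.WriteLine(lookup.ContainsKey(new Param(1, "height"))); // False
    }
}
```

A separate but equal instance finds the entry because both Equals and GetHashCode are driven by the same two immutable values.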

add unique Elements to List/HashSet of User defined class (i.e. List<Operand> )

I want to add unique elements to a List of a user-defined class (i.e. List<Operand>):
public class Operand: IEnumerable, IEnumerator
{
public String opr;
public String state;
}
I want to implement all the routines necessary to use it in a List.
(Note: class Operand: IEnumerable, IEnumerator)
Then I create some Operand objects:
Operand tem1=new Operand("eax","undef");
Operand tem2=new Operand("ebx","undef");
Operand tem3=new Operand("ecx","undef");
Operand tem4=new Operand("eax","undef");
Operand tem5=new Operand("eax","undef");
I want to add these five elements (tem1 to tem5) to a List or HashSet.
If an element is a duplicate, I want to update the state of the existing element to "def" (i.e. defined).
How do I do this?
Please help.
In your Operand class, override the GetHashCode method and the Equals method. Make sure that equal operands return the same hash code (and, as far as possible, that different operands return different ones), and that Equals returns true when two operands have the same property values and false otherwise.
public override bool Equals(object obj)
{
if(obj is Operand)
{
Operand op = obj as Operand;
if (this.opr == op.opr && this.state == op.state)
return true;
}
return false;
}
public override int GetHashCode()
{
int hash = 13;
hash = (hash * 7) + opr.GetHashCode();
hash = (hash * 7) + state.GetHashCode();
return hash;
}
After implementing these methods, you can check for a duplicate in the List or HashSet by using the Contains method. If you find a duplicate, update the existing record instead of inserting a new one.
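The "update instead of insert" step could be sketched as follows. This assumes that two operands count as duplicates when their opr matches, regardless of state (otherwise an operand already updated to "def" would no longer match a new "undef" one), so it keys a Dictionary on the name. The constructor and the Deduplicate helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class Operand
{
    public string Opr;
    public string State;

    // Assumed constructor, matching the calls shown in the question.
    public Operand(string opr, string state) { Opr = opr; State = state; }
}

public static class Program
{
    // Keeps the first operand seen for each name; later duplicates mark it as "def".
    public static Dictionary<string, Operand> Deduplicate(IEnumerable<Operand> inputs)
    {
        var byName = new Dictionary<string, Operand>();
        foreach (var op in inputs)
        {
            Operand existing;
            if (byName.TryGetValue(op.Opr, out existing))
                existing.State = "def";   // duplicate: update the stored element
            else
                byName.Add(op.Opr, op);   // first occurrence: store it
        }
        return byName;
    }

    public static void Main()
    {
        var result = Deduplicate(new[]
        {
            new Operand("eax", "undef"), new Operand("ebx", "undef"),
            new Operand("ecx", "undef"), new Operand("eax", "undef"),
            new Operand("eax", "undef"),
        });

        // Three entries remain; eax ends up "def", ebx and ecx stay "undef".
        foreach (var op in result.Values)
            Console.WriteLine(op.Opr + " " + op.State);
    }
}
```

A Dictionary is used rather than a HashSet because the task needs to reach the stored element in order to modify it, and TryGetValue gives that in one lookup.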
Put this code in an Operand.java file:
public class Operand
{
public String opr;
public String state;
}
Now, in another file, bean.java, write this code:
import java.util.ArrayList;
import java.util.List;
public class bean {
List<Operand> mylist = new ArrayList<Operand>();
String[] opr = new String[]{"eax","ecx","eax","eax"};
String[] state = new String[]{"def","def","def","def"};
Operand[] o = new Operand[4];
public bean() {
for (int i = 0; i <= 3; i++) {
o[i] = new Operand();
// Assign the public fields directly; Operand defines no setters.
o[i].opr = opr[i];
o[i].state = state[i];
mylist.add(o[i]);
}
}
}
I think it will work for your problem; try it out.
