C# User class. GetHashCode implementation - c#

I have a simple class with only public string properties.
public class SimpleClass
{
public string Field1 {get; set;}
public string Field2 {get; set;}
public string Field3 {get; set;}
public List<SimpleClass> Children {get; set;}
public bool Equals(SimpleClass simple)
{
if (simple == null)
{
return false;
}
return IsFieldsAreEquals(simple) && IsChildrenAreEquals(simple);
}
public override int GetHashCode()
{
return RuntimeHelpers.GetHashCode(this); //Bad idea!
}
}
This code doesn't return the same value for equal instances. But the class has no readonly fields from which to compute a hash.
How can I generate a correct hash in GetHashCode() if all my properties are mutable?

The contract for GetHashCode requires (emphasis mine):
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.
So basically, you should compute it based on all the used fields in Equals, even though they're mutable. However, the documentation also notes:
If you do choose to override GetHashCode for a mutable reference type, your documentation should make it clear that users of your type should not modify object values while the object is stored in a hash table.
If only some of your properties were mutable, you could potentially override GetHashCode to compute it based only on the immutable ones - but in this case everything is mutable, so you'd basically end up returning a constant, making it awful to be in a hash-based collection.
So I'd suggest one of three options:
Use the mutable fields, and document it carefully.
Abandon overriding equality/hashing operations
Abandon it being mutable
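As a rough illustration of the first option, here is a minimal sketch. HashedSimpleClass is a hypothetical stand-in for SimpleClass, and Children is left out to keep it short; that is still valid, because the contract only requires that equal objects produce equal hash codes. It assumes a runtime where HashCode.Combine is available.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of SimpleClass; Children is left out to keep the sketch short.
public class HashedSimpleClass : IEquatable<HashedSimpleClass>
{
    public string Field1 { get; set; }
    public string Field2 { get; set; }
    public string Field3 { get; set; }

    public bool Equals(HashedSimpleClass other)
    {
        return other != null
            && Field1 == other.Field1
            && Field2 == other.Field2
            && Field3 == other.Field3;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as HashedSimpleClass);
    }

    // Built from the same mutable fields that Equals compares.
    public override int GetHashCode()
    {
        return HashCode.Combine(Field1, Field2, Field3);
    }
}

public static class HashDemo
{
    public static void Main()
    {
        var a = new HashedSimpleClass { Field1 = "x", Field2 = "y", Field3 = "z" };
        var b = new HashedSimpleClass { Field1 = "x", Field2 = "y", Field3 = "z" };
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True

        var set = new HashSet<HashedSimpleClass> { a };
        a.Field1 = "changed"; // mutating while stored: the documented hazard
        // Very likely False, because the set remembers the old hash code.
        Console.WriteLine(set.Contains(a));
    }
}
```

The last lines show why the documentation warning matters: once the object sits in a hash-based collection, changing a field that feeds the hash can make it unfindable.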

Related

GetHashCode() - immutable values?

As far as I know, the method GetHashCode() should use only readonly/immutable properties. But if I change, for example, the Id property that GetHashCode() uses, I get a new hash code. So why should it be immutable? I would see a problem if the hash code did not change, but it does change.
class Program
{
public class Point
{
public int Id { get; set; }
public override bool Equals(object obj)
{
return obj is Point point &&
Id == point.Id;
}
public override int GetHashCode()
{
return HashCode.Combine(Id);
}
}
static void Main(string[] args)
{
Point point = new Point();
point.Id = 5;
var r1 = point.GetHashCode(); //467047723
point.Id = 10;
var r2 = point.GetHashCode(); //1141379410
}
}
GetHashCode() is there for mainly one reason: retrieval of an object from a hash table. You are right that it is desirable that the hash code should be computed only from immutable fields, but think about the reason for this. Since the hashcode is used to retrieve an object from a hashtable it will lead to errors when the hashcode changes while the object is stored in the hashtable.
To put it more generally: the value returned by GetHashCode must stay stable as long as a structure depends on that hash code staying stable. So for your example it means you can change the Id property as long as the object is not currently used in any such structure.
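One way to respect that rule with a mutable key is to take the object out of the structure before changing it and put it back afterwards. The sketch below reuses the Point class from the question; the dictionary and its string values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Point
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        return obj is Point point && Id == point.Id;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(Id);
    }
}

public static class SafeMutationDemo
{
    public static void Main()
    {
        var point = new Point { Id = 5 };
        var lookup = new Dictionary<Point, string> { [point] = "first" };

        // Remove the key while its hash code still matches the stored one,
        // mutate it outside the dictionary, then add it back.
        lookup.Remove(point);
        point.Id = 10;
        lookup[point] = "first";

        Console.WriteLine(lookup.ContainsKey(point)); // True
    }
}
```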
Exactly because of this: if it's not immutable, the hash code changes every time.
A hash code is a numeric value that is used to identify an object
during equality testing. It can also serve as an index for an object
in a collection.
So if it changes every time, you can't use it for its purpose.

More informative comparison of objects in C#

In my C# testing, I often want to compare two objects of the same type (typically an expected object against the actual object), but I want to allow for some flexibility. For example, there may be timestamp fields that I know can't be equal or some fields that I just want to ignore when comparing the objects.
Most importantly, I want to provide an informative message that describes where the two object properties' values differ in order that I can quickly identify what the problem is. For example, a message that says "Source property Name value Fred does not match target property Name value Freda".
The standard Equals and Comparer methods just seem to return ints or Booleans which don't provide enough information for me. At the moment, my object comparison methods return a custom type that has two fields (a boolean and a message), but my thinking is that there must be a more standard way to do this. These days, perhaps a Tuple might be the way to go, but I would welcome suggestions.
"Comparison" might not be the word for what you're trying to do. That word already has a common meaning in this context. We compare objects for equality, which returns a boolean - they are equal or they are not. Or we compare them to see which is greater. That returns an int which can indicate that one or the other is greater, or that they are equal. This is helpful when sorting objects.
What you're trying to do is determine specific differences between objects. I wouldn't try to write something generic that handles different types of objects unless you intend for them to be extremely simple. That gets really complicated as you get into properties that return additional complex objects or collections or collections of complex objects. It's not impossible, just rarely worth the effort compared to just writing a method that compares the particular type you want to compare.
Here are a few interfaces and classes that could make the task a little easier and more consistent. But to be honest it's hard to tell what to do with this. And again, it gets complicated if you're dealing with nested complex properties. What happens if two properties both contain lists of some other object, and all the items in those lists are the same except one on each side that has a differing property? Or what if they're all different? In that case how would you describe the "inequality" of the parent objects? It might be useful to know that they are or are not equal, but less so to somehow describe the difference.
public interface IInstanceComparer<T>
{
IEnumerable<PropertyDifference> GetDifferences(T left, T right);
}
public abstract class InstanceComparer<T> : IInstanceComparer<T>
{
public IEnumerable<PropertyDifference> GetDifferences(T left, T right)
{
var result = new List<PropertyDifference>();
PopulateDifferences(left, right, result);
return result;
}
public abstract void PopulateDifferences(T left, T right,
List<PropertyDifference> differences);
}
public class PropertyDifference
{
public PropertyDifference(string propertyName, string leftValue,
string rightValue)
{
PropertyName = propertyName;
LeftValue = leftValue;
RightValue = rightValue;
}
public string PropertyName { get; }
public string LeftValue { get; }
public string RightValue { get; }
}
public class Animal
{
public string Name { get; }
public int NumberOfLimbs { get; }
public DateTime Created { get; }
}
public class AnimalDifferenceComparer : InstanceComparer<Animal>
{
public override void PopulateDifferences(Animal left, Animal right,
List<PropertyDifference> differences)
{
if(left.Name != right.Name)
differences.Add(new PropertyDifference("Name", left.Name, right.Name));
if(left.NumberOfLimbs!=right.NumberOfLimbs)
differences.Add(new PropertyDifference("NumberOfLimbs",
left.NumberOfLimbs.ToString(),
right.NumberOfLimbs.ToString()));
}
}
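For what it's worth, here is a small self-contained sketch of how differences like these could be turned into the kind of message the question asks for. PropertyDifference is repeated so the snippet compiles on its own. The Animal constructor, the DifferenceReport class and its Describe method are assumptions added for illustration, not part of the classes above.

```csharp
using System;
using System.Collections.Generic;

// Repeated from above so the sketch compiles on its own.
public class PropertyDifference
{
    public PropertyDifference(string propertyName, string leftValue, string rightValue)
    {
        PropertyName = propertyName;
        LeftValue = leftValue;
        RightValue = rightValue;
    }
    public string PropertyName { get; }
    public string LeftValue { get; }
    public string RightValue { get; }
}

// Assumption: Animal gets a constructor so its get-only properties can be set.
public class Animal
{
    public Animal(string name, int numberOfLimbs)
    {
        Name = name;
        NumberOfLimbs = numberOfLimbs;
    }
    public string Name { get; }
    public int NumberOfLimbs { get; }
}

public static class DifferenceReport
{
    // Same checks as AnimalDifferenceComparer.PopulateDifferences above.
    public static List<PropertyDifference> Compare(Animal left, Animal right)
    {
        var differences = new List<PropertyDifference>();
        if (left.Name != right.Name)
            differences.Add(new PropertyDifference("Name", left.Name, right.Name));
        if (left.NumberOfLimbs != right.NumberOfLimbs)
            differences.Add(new PropertyDifference("NumberOfLimbs",
                left.NumberOfLimbs.ToString(), right.NumberOfLimbs.ToString()));
        return differences;
    }

    // Hypothetical formatter producing the message style from the question.
    public static string Describe(PropertyDifference d)
    {
        return $"Source property {d.PropertyName} value {d.LeftValue} " +
               $"does not match target property {d.PropertyName} value {d.RightValue}";
    }

    public static void Main()
    {
        var expected = new Animal("Fred", 4);
        var actual = new Animal("Freda", 4);
        foreach (var difference in Compare(expected, actual))
            Console.WriteLine(Describe(difference));
        // Source property Name value Fred does not match target property Name value Freda
    }
}
```

In a test, an empty list means the objects match, and a non-empty list can be joined into the assertion failure message.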
You could use extension methods to do this. For example:
public static class Extensions
{
// Replace <type> with the type under test.
public static void CompareWithExpected(this <type> value, <type> expected)
{
Assert.AreEqual(expected.Property1, value.Property1, "Property1 did not match expected");
Assert.AreEqual(expected.Property2, value.Property2, "Property2 did not match expected");
}
}
Then this can be used as follows:
public void TestMethod()
{
// Arrange
...
// Act
...
// Assert
value.CompareWithExpected(expected);
}
You could have any number of these extension methods allowing you the flexibility to check only certain values etc.
This also means you do not need to pollute your types with what is essentially test code.

Using structs instead of classes for simple types

In C#, if I use a struct like the one shown below and do an equality comparison, the values of the struct's fields are compared, and I get true if all the fields have the same value. This is the default behaviour.
struct PersonStruct
{
public PersonStruct(string n,int a)
{
Name = n;Age = a;
}
public string Name { get; set; }
public int Age { get; set; }
}
var p1 = new PersonStruct("Jags", 1);
var p2 = new PersonStruct("Jags", 1);
Console.WriteLine(p1.Equals(p2)); //Return True
With a class, the same comparison returns false, because a class is a reference type.
class PersonClass
{
public PersonClass(string n, int a)
{
Name = n; Age = a;
}
public string Name { get; set; }
public int Age { get; set; }
}
var pc1 = new PersonClass("Jags", 1);
var pc2 = new PersonClass("Jags", 1);
Console.WriteLine(pc1.Equals(pc2));//Returns False
I understand the above concept. My question is: considering the above scenario, is it a good idea to use structs in such simple cases instead of a class? I have commonly seen people implement classes in such cases (e.g. simple DTOs) and do all the extra work to implement equality (such as IEquatable and an overridden Equals method).
Is my understanding correct, or am I missing something here?
You should avoid the default implementation of equality for structs. If your structs contain reference type fields (as PersonStruct does) then reflection is used to compare corresponding fields for equality, which is relatively slow. You should also implement IEquatable<T> for your structs since calling the object.Equals(object) method will cause boxing for both the source and argument struct. This will be avoided if the call can be resolved to IEquatable<PersonStruct>.
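A minimal sketch of what that could look like for PersonStruct follows. Making the struct readonly and adding the == and != operators are choices made for this example rather than requirements, and HashCode.Combine is assumed to be available.

```csharp
using System;

public readonly struct PersonStruct : IEquatable<PersonStruct>
{
    public PersonStruct(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public string Name { get; }
    public int Age { get; }

    // Strongly typed comparison: no reflection and no boxing of the argument.
    public bool Equals(PersonStruct other)
    {
        return Name == other.Name && Age == other.Age;
    }

    // Keep object.Equals consistent with the typed overload.
    public override bool Equals(object obj)
    {
        return obj is PersonStruct other && Equals(other);
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(Name, Age);
    }

    public static bool operator ==(PersonStruct left, PersonStruct right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(PersonStruct left, PersonStruct right)
    {
        return !left.Equals(right);
    }
}

public static class StructEqualityDemo
{
    public static void Main()
    {
        var p1 = new PersonStruct("Jags", 1);
        var p2 = new PersonStruct("Jags", 1);
        Console.WriteLine(p1.Equals(p2)); // True, resolved to Equals(PersonStruct)
        Console.WriteLine(p1 == p2);      // True
    }
}
```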
There is a whole article about this in MSDN.
✓ CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
X AVOID defining a struct unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (int, double, etc.).
It has an instance size under 16 bytes.
It is immutable.
It will not have to be boxed frequently.
In all other cases, you should define your types as classes.
Related:
When do you use a struct instead of a class?

ReadonlyCollection, are the objects immutable?

I'm trying to use ReadOnlyCollection to make objects immutable; I want the properties of the objects to be immutable.
public ReadOnlyCollection<FooObject> MyReadOnlyList
{
get
{
return new ReadOnlyCollection<FooObject>(_myDataList);
}
}
But I'm a little confused.
I tried to change a property of the objects in MyReadOnlyList using a foreach and ... I can change the property value. Is that correct? I understood that ReadOnlyCollection adds a level that makes the objects immutable.
The fact that ReadOnlyCollection is immutable means that the collection cannot be modified, i.e. no objects can be added to or removed from the collection. This does not mean that the objects it contains are immutable.
This article by Eric Lippert, explains how different kinds of immutability work. Basically, a ReadOnlyCollection is an immutable facade which can read the underlying collection (_myDataList), but cannot modify it. However, you can still change the underlying collection since you have a reference to _myDataList by doing something like _myDataList[0] = null.
Furthermore, the objects returned by ReadOnlyCollection are the same ones returned by _myDataList, i.e. this._myDataList.First() == this.MyReadOnlyList.First() (with LINQ). This means that if an object in _myDataList is mutable, then so is the object in MyReadOnlyList.
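The following sketch shows both effects. The Name property on FooObject and the variable names are assumptions made for the example, since the question does not show that class.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Assumption: a mutable element type with a single settable property.
public class FooObject
{
    public string Name { get; set; }
}

public static class ReadOnlyDemo
{
    public static void Main()
    {
        var data = new List<FooObject> { new FooObject { Name = "original" } };
        var readOnly = new ReadOnlyCollection<FooObject>(data);

        // The wrapper hands out the very same object instances,
        // so a mutable element can still be changed through it.
        readOnly[0].Name = "changed";
        Console.WriteLine(data[0].Name); // changed

        // The wrapper is only a view: changes to the underlying list show through.
        data.Add(new FooObject { Name = "second" });
        Console.WriteLine(readOnly.Count); // 2
    }
}
```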
If you want the objects to be immutable, you should design them accordingly. For instance, you might use:
public struct Point
{
public Point(int x, int y)
{
this.X = x;
this.Y = y;
}
// In C#6, the "private set;" can be removed
public int X { get; private set; }
public int Y { get; private set; }
}
instead of:
public struct Point
{
public int X { get; set; }
public int Y { get; set; }
}
Edit: in this case, as noted by Ian Goldby, neither struct allows you to modify properties of the elements in the collection. This happens because structs are value types and when you access an element the collection returns a copy of the value. You can only modify the properties of a Point type if it is a class, which would mean that references to the actual objects are returned, instead of copies of their values.
I tried to change the property of the object in to MyReadOnlyList
using a foreach and ... I can change value property, is it correct? I
understood ReadOnlyCollection set an add level to make the object
immutable.
Using a ReadOnlyCollection does not make any guarantees as for the object that is stored in the collection. All it guarantees is that the collection cannot be modified once it has been created. If an element is retrieved from it, and it has mutable properties, it can very well be modified.
If you want to make your FooObject an immutable one, then simply do so:
public class FooObject
{
public FooObject(string someString, int someInt)
{
SomeString = someString;
SomeInt = someInt;
}
public string SomeString { get; }
public int SomeInt { get; }
}
What is immutable is the collection itself, not the objects. C# does not make objects immutable for you; ReadOnlyCollection<T> only wraps the list, as in your case.
You can still create objects that behave as immutable if their properties have no accessible setter. Strictly speaking, they are not immutable, because any class member with at least the same accessibility as the setter can still mutate them.
// Case 1
public class A
{
public string Name { get; private set; }
public void DoStuff()
{
Name = "Whatever";
}
}
// Case 2
public class A
{
// This property will be settable unless the code accessing it
// lives outside the assembly where A is contained...
public string Name { get; internal set; }
}
// Case 3
public class A
{
// This property will be settable in derived classes...
public string Name { get; protected set; }
}
// Case 4: readonly fields are the closest you get to an immutable object
public class A
{
public readonly string Text = "Hello world";
}
As I said before, reference types are always mutable by definition and they can behave as immutable under certain conditions playing with member accessibility.
Finally, structs are commonly designed to be immutable, but they're value types and they shouldn't be used just because they can represent immutable data. See this Q&A to learn more about why structs are immutable: Why are C# structs immutable?

C# Primitive Types or Complex Types as Method Signatures?

What are the pros and cons of using Primitive Types or Complex Types?
When should you use primitive types over complex types and vice versa?
For example:
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public int Age { get; set; }
public int IQ { get; set; }
}
public void FooPrimitiveTypes (string firstName, string lastName, int age, int IQ)
{
}
public void FooComplexTypes(Person person)
{
}
Passing each property separately (the first method) is generally done when you are dealing with disjoint values. It is also sometimes used on constructors. Bad practice.
Passing the object (the second method) is preferred when the values are related.
Why the first is bad practice: suppose you needed to add height. I'd much rather update one class by adding another property than update 50 methods.
Does Foo conceptually deal with a Person? Does all (or at least most) of Person get used by Foo, or is it just using a few bits of information that happen to be in Person? Is Foo likely to ever deal with something that's not a Person? If Foo is InsertPersonIntoDB(), then it's probably best to deal with Person.
If Foo is PrintName(), then maybe PrintName(string FirstName, string LastName) is more appropriate (or alternatively, you might define a Name class instead and say that a person has a Name).
If you find yourself creating half initialized temporary Person objects just to pass to Foo, then you probably want to break down the parameters.
Something to note is that primitives are passed by value. An object reference is also passed by value, but the copied reference still points to the same object, so the effect is much like pass by reference. Depending on what you are doing, this difference can matter: in the first case, modifications to the primitive parameters do not affect the variables in the calling scope, whereas modifying the object passed in does affect the object in the calling scope.
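A short sketch of that difference, using a cut-down Person with only Age; the method names are made up for the example.

```csharp
using System;

public class Person
{
    public int Age { get; set; }
}

public static class PassingDemo
{
    // Receives a copy of the int; the caller's variable is untouched.
    public static void IncrementCopy(int age)
    {
        age++;
    }

    // Receives a copy of the reference; both copies point at the same object.
    public static void IncrementAge(Person person)
    {
        person.Age++;
    }

    // Reassigning the parameter only changes the local copy of the reference.
    public static void Reassign(Person person)
    {
        person = new Person { Age = 99 };
    }

    public static void Main()
    {
        int age = 30;
        IncrementCopy(age);
        Console.WriteLine(age); // 30

        var someone = new Person { Age = 30 };
        IncrementAge(someone);
        Console.WriteLine(someone.Age); // 31

        Reassign(someone);
        Console.WriteLine(someone.Age); // 31
    }
}
```

The Reassign case is why this is not true pass by reference: the caller's variable still refers to the original object.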
