Why is Dictionary.ContainsKey() & ToString() causing GC Alloc?


There's not much more I can offer in terms of details, other than the fact that there are around 1k instances, each running its own ContainsKey() and ToString() pretty often.
Location is just my personal replacement for Unity's Vector3 to fit my needs:
[Serializable] public struct Location
{
public double X;
public double Y;
public double Z;
public Location(double x, double y, double z) : this()
{
X = x;
Y = y;
Z = z;
}
public override string ToString()
{
return String.Format("{0}, {1}, {2}", X, Y, Z);
}
}
(I know I'm breaking a rule of some sort with structs... I'm just not sure how to achieve my needs another way at the moment.)
Here's a screenshot of the Profiler running:
As you can see, for most of the timeline it's stable. Then, all of a sudden, after my instances reach around 1k in quantity (they start around 100-250), the CPU and memory go wild due to what seem to be GC allocations. I've been looking for what I can clean up, but the only things I see causing ANY GC alloc are when I run:
if (_dictionary.ContainsKey(key)) {...}
and when renaming Unity GameObjects with:
part.name = "Part: " + part.Location.ToString();
If it just comes down to the unavoidable cost of the lookup, are there any alternatives to Dictionary that may be slower but cause less GC alloc? And is there a more efficient way to override the ToString() method?
Addition: my Dictionary's key is my own Location struct, and its value is a class instance.

Turning comments into an answer...
Your ToString() method is always going to create a new string, so that's no surprise. However, you're also using string concatenation, so you're creating two new strings. You could reduce this to one by just inlining the ToString() method. For example, using C# 6 interpolated strings for brevity:
var location = part.Location;
part.name = $"Part: {location.X}, {location.Y}, {location.Z}";
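A quick way to confirm that the inlined version is a drop-in replacement is to build both names for the same location and compare them. This is a minimal sketch: `PartNames` is a hypothetical helper, and `Location` is repeated so the block compiles on its own.

```csharp
using System;

public struct Location
{
    public double X;
    public double Y;
    public double Z;

    public Location(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Same format string as the question's ToString override.
    public override string ToString() => String.Format("{0}, {1}, {2}", X, Y, Z);
}

public static class PartNames
{
    // Original: ToString() builds one string, then "+" builds a second one.
    public static string Concatenated(Location loc) => "Part: " + loc.ToString();

    // Inlined: the final name is formatted directly, without the
    // intermediate ToString() result.
    public static string Inlined(Location loc) => $"Part: {loc.X}, {loc.Y}, {loc.Z}";
}
```

Both methods format the doubles with the current culture, so they produce identical text.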
For the dictionary aspect, there are two issues:
You're not overriding Equals and GetHashCode, which may mean that the value is being boxed in order to call the implementations in ValueType. I'm not 100% sure on this; the rules on boxing can be complicated.
You're not implementing IEquatable<T>, so it's very likely that any Equals calls will be boxing.
You can fix both of these easily:
[Serializable] public struct Location : IEquatable<Location>
{
public double X;
public double Y;
public double Z;
public Location(double x, double y, double z) : this()
{
X = x;
Y = y;
Z = z;
}
public override string ToString() => $"{X}, {Y}, {Z}";
public override bool Equals(object obj) =>
obj is Location loc && Equals(loc);
public bool Equals(Location other) =>
X == other.X && Y == other.Y && Z == other.Z;
public override int GetHashCode()
{
// Replace with whatever implementation you want
int hash = 17;
hash = hash * 23 + X.GetHashCode();
hash = hash * 23 + Y.GetHashCode();
hash = hash * 23 + Z.GetHashCode();
return hash;
}
}
(That's using C# 7 syntax, but I'd expect that to still be okay if you're using a modern version of Unity with VS2017. If you're using an older version, you should be able to implement the same methods in a slightly more long-winded fashion.)
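If you would rather not (or cannot) change the struct, the dictionary can instead be given its own strongly typed `IEqualityComparer<Location>`, so key comparisons and hashing never go through the object-based overloads. A minimal sketch: `LocationComparer` and `PartLookup` are hypothetical names, the values are plain strings instead of your class instances, and the hash mixing mirrors the one above.

```csharp
using System;
using System.Collections.Generic;

[Serializable]
public struct Location
{
    public double X, Y, Z;
    public Location(double x, double y, double z) { X = x; Y = y; Z = z; }
}

// Strongly typed comparer: both methods take Location directly,
// so the dictionary does not need to box the key.
public sealed class LocationComparer : IEqualityComparer<Location>
{
    public static readonly LocationComparer Instance = new LocationComparer();

    public bool Equals(Location a, Location b) =>
        a.X == b.X && a.Y == b.Y && a.Z == b.Z;

    public int GetHashCode(Location loc)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + loc.X.GetHashCode();
            hash = hash * 23 + loc.Y.GetHashCode();
            hash = hash * 23 + loc.Z.GetHashCode();
            return hash;
        }
    }
}

public static class PartLookup
{
    // Hypothetical factory: a dictionary that uses the comparer above.
    public static Dictionary<Location, string> Create() =>
        new Dictionary<Location, string>(LocationComparer.Instance);
}
```

If the body of `if (_dictionary.ContainsKey(key)) {...}` then reads `_dictionary[key]`, a single `TryGetValue` call does the same work with one lookup instead of two.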

Related

Why does List<T>.Sort() where T:IComparable<T> produce a different order than List<T>.Sort(IComparer<T>)?

List<T>.Sort() where T:IComparable<T> produces a different order than List<T>.Sort(IComparer<T>), with equal inputs.
Note the comparer is not really correct - it does not return 0 for equality.
Both orders are valid (because the difference lies in the equal elements), but I'm wondering where the difference arises from, having looked through the source, and if it is possible to alter the IComparable.CompareTo to match the behavior of the IComparer.Compare when passed into the list.
The reason I'm looking at this is because IComparable is far faster than using an IComparer (which I'm guessing is why there are two different implementations in .NET), and I was hoping to improve the performance of an open source library by switching the IComparer to IComparable.
A full example is at: https://dotnetfiddle.net/b7s6my
Here's the class/comparer:
public class Point : IComparable<Point>
{
public int X { get; }
public int Y { get; }
public int UID { get; set; } //Set by outside code
public Point(int x, int y)
{
X = x;
Y = y;
}
public int CompareTo(Point b)
{
return PointComparer.Default.Compare(this, b);
}
}
public class PointComparer : IComparer<Point>
{
public readonly static PointComparer Default = new PointComparer();
public int Compare(Point a, Point b)
{
if (a.Y == b.Y)
{
return (a.X < b.X) ? -1 : 1;
}
return (a.Y > b.Y) ? -1 : 1;
}
}
Note the comparer is not mine (so I can't really change it) - and changing the sort order causes the surrounding code to fail.

As mentioned in the comments, the problem is with IComparer<Point>: for two equal objects a and b (ones that have the same X and Y) it returns Compare(a, b) = 1 and Compare(b, a) = 1.
However, the question remains why the two sorts still produce different results.
Checking the source of ArraySortHelper (see Sweeper's comment) shows two versions of the quicksort implementation: one for an explicit IComparer and one for the implicit IComparable<T> comparison.
The algorithms are mostly the same; however, the function PickPivotAndPartition is a bit different. One version is PickPivotAndPartition(Span<T> keys), the other is PickPivotAndPartition(Span<T> keys, Comparison<T> comparer).
In the first function there's this line:
while (... && GreaterThan(ref pivot, ref leftRef = ref Unsafe.Add(ref leftRef, 1))) ;
And in the second function, the equivalent line looks like:
while (comparer(keys[++left], pivot) < 0) ;
So that looks to be the point: the line in the first function can be read as Compare(pivot, left) > 0, while the line in the second can be read as Compare(left, pivot) < 0. When Compare(left, pivot) = 1 and Compare(pivot, left) = 1, the condition in the first function is true, while in the second it is false.
This means the two implementations can select different array slices and hence produce different output.
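The asymmetry is easy to reproduce in isolation. This sketch repeats the question's comparer logic (dropping `UID` and `CompareTo`) and mirrors the two loop conditions quoted above in simplified form; `AsymmetryDemo` and its methods are hypothetical names, not part of ArraySortHelper.

```csharp
using System;
using System.Collections.Generic;

public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }
}

// Same logic as the PointComparer in the question: it never returns 0.
public class PointComparer : IComparer<Point>
{
    public static readonly PointComparer Default = new PointComparer();

    public int Compare(Point a, Point b)
    {
        if (a.Y == b.Y)
        {
            return (a.X < b.X) ? -1 : 1;
        }
        return (a.Y > b.Y) ? -1 : 1;
    }
}

public static class AsymmetryDemo
{
    // Returns (Compare(a, b), Compare(b, a)).
    public static (int, int) CompareBothWays(Point a, Point b) =>
        (PointComparer.Default.Compare(a, b), PointComparer.Default.Compare(b, a));

    // Simplified IComparable<T> path: GreaterThan(pivot, left), i.e. Compare(pivot, left) > 0.
    public static bool ImplicitPathContinues(Point left, Point pivot) =>
        PointComparer.Default.Compare(pivot, left) > 0;

    // Simplified IComparer<T> path: comparer(left, pivot) < 0.
    public static bool ExplicitPathContinues(Point left, Point pivot) =>
        PointComparer.Default.Compare(left, pivot) < 0;
}
```

For two points with identical coordinates both comparisons return 1, so the first loop keeps advancing while the second stops.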

Validate object/struct without failing

Assume we have a huge list of numeric Cartesian coordinates, e.g. (5;3), (1;-9), etc. To represent a point in OOP, I created a struct/object (C#):
public struct Point
{
public int X { get; }
public int Y { get; }
public Point(int x, int y)
{
// Check if x,y falls within certain boundaries (ex. -1000, 1000)
X = x;
Y = y;
}
}
I might be using the struct wrong. I guess normally you would not use a constructor, but that is not the point.
Suppose I want to add a list of 1000 points and there is no guarantee that these coordinates fall within the boundaries. If a point is not valid, I simply want to move on to the next one without failing and inform the user about it. As for the object, I would think that Point should be responsible for its own instantiation and validation, but I am not sure how to deal with that in this particular case. Having the caller check x and y beforehand would be the simplest approach, but it does not feel right because the caller would have to handle logic that should reside in Point.
What would be the most appropriate approach to validate and handle incorrect coordinates without failing and without violating SRP?

You can't do this in the constructor; the constructor either runs successfully or it doesn't. If it doesn't, it's because an exception was thrown, so much for failing silently. You could catch exceptions, but that would basically mean using exceptions as a control-flow mechanism, and that is a big no-no. Don't do that!
One solution similar to what you are thinking is to use a static factory method:
public struct Point
{
public static bool TryCreatePoint(int x, int y, Bounds bounds, out Point point)
{
if (bounds.Contains(x, y)) // assumes Bounds exposes a Contains(x, y) check
{
point = new Point(x, y);
return true;
}
point = default(Point);
return false;
}
//...
}
The code adding points to the list should then act on whether creation succeeded.
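Putting the factory method to work, a loader can keep the valid points and report the rest without any exceptions. A minimal sketch: the answer never defines `Bounds`, so the `Bounds` type with its `Contains` method is an assumption, as is the hypothetical `PointLoader` helper.

```csharp
using System;
using System.Collections.Generic;

// Assumed bounds type (not defined in the answer).
public readonly struct Bounds
{
    public int Min { get; }
    public int Max { get; }
    public Bounds(int min, int max) { Min = min; Max = max; }
    public bool Contains(int x, int y) => x >= Min && x <= Max && y >= Min && y <= Max;
}

public readonly struct Point
{
    public int X { get; }
    public int Y { get; }

    // Private, so callers have to go through TryCreatePoint.
    private Point(int x, int y) { X = x; Y = y; }

    public static bool TryCreatePoint(int x, int y, Bounds bounds, out Point point)
    {
        if (bounds.Contains(x, y))
        {
            point = new Point(x, y);
            return true;
        }
        point = default(Point);
        return false;
    }
}

public static class PointLoader
{
    // Keeps the valid points and records a message for every rejected pair.
    public static List<Point> Load(IEnumerable<(int X, int Y)> input, Bounds bounds, List<string> rejected)
    {
        var points = new List<Point>();
        foreach (var (x, y) in input)
        {
            if (Point.TryCreatePoint(x, y, bounds, out var p))
                points.Add(p);
            else
                rejected.Add($"({x}; {y}) is outside [{bounds.Min}, {bounds.Max}]");
        }
        return points;
    }
}
```

Making the constructor private is a design choice; with a public constructor the factory still works, but nothing stops callers from bypassing it.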
Fun fact: if you are using C# 7, the code could look a lot cleaner:
public static (bool Successful, Point NewPoint) TryCreatePoint(int x, int y, Bounds bounds)
{
if (bounds.Contains(x, y)) // assumes Bounds exposes a Contains(x, y) check
return (true, new Point(x, y));
return (false, default(Point));
}

I can think of three options:
Have the constructor throw an exception that you catch. This is not really great if you are expecting a lot of failures.
Have an IsValid property on the struct that you can use to filter it out once created.
Have the thing loading the data take responsibility for validating the data as well. This would be my preferred option. You say "it does not feel right because caller would have to handle logic that should reside in Point", but I would argue that the responsibility for checking that loaded data is correct lies with the thing loading the data, not with the data type. Now that you no longer expect anything invalid to be passed, you could also have the constructor throw an ArgumentOutOfRangeException for invalid inputs as a belt-and-braces approach.
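The second option might look something like this. A sketch only: the -1000..1000 limits come from the question, while `IsValid` and the hypothetical `PointFilter.Split` helper are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

public readonly struct Point
{
    // Limits taken from the question's example.
    public const int Min = -1000;
    public const int Max = 1000;

    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // The struct only reports validity; it never throws.
    public bool IsValid => X >= Min && X <= Max && Y >= Min && Y <= Max;
}

public static class PointFilter
{
    // Separates already-created points into valid ones and ones to report.
    public static (List<Point> Valid, List<Point> Invalid) Split(IEnumerable<Point> points)
    {
        var byValidity = points.ToLookup(p => p.IsValid);
        return (byValidity[true].ToList(), byValidity[false].ToList());
    }
}
```

The rule lives in Point, while deciding what to do with invalid points stays with the caller.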

What you want to do is simply not possible: an instance is either fully created or not created at all. Once the constructor has been called, the only way to avoid creating an instance is to throw an exception.
So you have these two options:
Extract a Validate method that returns a bool and can be called by the caller of your struct.
public struct Point
{
public int X { get; }
public int Y { get; }
public Point(int x, int y)
{
X = x;
Y = y;
}
public bool Validate() { return -1000 <= X && X <= 1000 && -1000 <= Y && Y <= 1000; }
}
Of course you could do the same using a property.
Throw an exception in the constructor
public Point(int x, int y)
{
if (x > 1000) throw new ArgumentException("Value must not be greater than 1000", nameof(x));
// ...
}
However, the best solution IMHO is to validate the input before you even think about creating a point, that is, to check the arguments passed to the constructor beforehand:
if(...)
p = new Point(x, y);
else
...

To be honest, Point shouldn't check boundaries; the caller should. A point is valid anywhere in the range its X and Y can represent (int.MinValue to int.MaxValue), so (-1000000, 2000000) is a valid point. The problem is that this point isn't valid for YOUR application, so YOUR application (the caller), the one using Point, should contain that logic, not the Point constructor.

Structs in C# are funny, so I'll add another "funny" way to check:
struct Point
{
int _x;
public int X
{
get { return _x; }
set { _x = value; ForceValidate(); }
} // simple getter & setter for X
int _y;
public int Y
{
get { return _y; }
set { _y = value; ForceValidate(); }
} // simple getter & setter for Y
void ForceValidate()
{
const int MAX = 1000;
const int MIN = -1000;
if(this.X >= MIN && this.X <= MAX && this.Y >= MIN && this.Y <= MAX)
{
return;
}
this = default(Point); // Yes, you can reassign "this" in structs using C#
}
}
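One side effect of this approach is worth seeing in action: a single out-of-range assignment resets the whole struct, including a coordinate that was valid. A minimal sketch repeating the struct above, with typed constants so it compiles:

```csharp
public struct Point
{
    int _x;
    public int X
    {
        get { return _x; }
        set { _x = value; ForceValidate(); }
    }

    int _y;
    public int Y
    {
        get { return _y; }
        set { _y = value; ForceValidate(); }
    }

    void ForceValidate()
    {
        const int MAX = 1000;
        const int MIN = -1000;
        if (X >= MIN && X <= MAX && Y >= MIN && Y <= MAX)
        {
            return;
        }
        // Resets both coordinates, not only the one that was just set.
        this = default(Point);
    }
}
```

After `p.X = 5; p.Y = 2000;`, both `p.X` and `p.Y` are 0, and the caller gets no signal that anything was rejected.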

C# performant alternatives to HashSet and Dictionary that do not use GetHashCode

I'm looking for built-in alternatives of HashSet and Dictionary objects that have better performance than lists but do not use the internal GetHashCode method. I need this because for the class I have written, there is no way of writing a GetHashCode method that fulfills the usual contract with Equals other than
public override int GetHashCode() { return 0; } // or return any other constant value
which would turn HashSet and Dictionary into ordinary lists (performance-wise).
So what I need is a set implementation and a mapping implementation. Any suggestions?
EDIT:
My class is a tolerance-based 3-dimensional vector class:
public class Vector
{
private const double TOL = 1E-10; // const members are implicitly static
private double x, y, z;
public Vector(double x, double y, double z)
{
this.x = x; this.y = y; this.z = z;
}
public override bool Equals(object o)
{
Vector other = o as Vector;
if (other == null)
return false;
return ((Math.Abs(x - other.x) <= TOL) &&
(Math.Abs(y - other.y) <= TOL) &&
(Math.Abs(z - other.z) <= TOL));
}
}
Note that my Equals method is not transitive. However, in my use case I can make it "locally" transitive because at some point, I will know all vectors that I need to put into my set / mapping key set, and I also know that they will come in clusters. So when I have collected all vectors, I will choose one representative per cluster and replace all original vectors by the representative. Then Equals will be transitive among the elements of my set / mapping key set.
When I have my set or mapping, I will collect vectors from another source (for the sake of this question let's assume I'll ask a user to type in a vector). These can be any possible vector. Those will never be added to the set/mapping, but I will need to know if they are contained in the set / key set of the mapping (regarding tolerance), and I will need to know their value from the mapping.

You need a data structure that supports sorting, binary search and fast insertion. Unfortunately, there is no such collection in the .NET Framework. The SortedDictionary doesn't support binary search, while the SortedList suffers from O(n) insertion for unsorted data. So you'll have to look for a third-party library. A good candidate seems to be the TreeDictionary of the C5 library. It is a red-black tree implementation that offers the important method RangeFromTo. Here is an incomplete implementation of a Dictionary that has Vectors as keys, backed internally by a C5.TreeDictionary:
public class VectorDictionary<TValue>
{
private readonly C5.TreeDictionary<double, (Vector, TValue)> _tree =
new C5.TreeDictionary<double, (Vector, TValue)>();
public bool TryGetKeyValue(Vector key, out (Vector, TValue) pair)
{
double xyz = key.X + key.Y + key.Z;
// Hoping that not all vectors are crowded in the same diagonal line
var range = _tree.RangeFromTo(xyz - Vector.TOL * 3, xyz + Vector.TOL * 3);
var equalPairs = range.Where(e => e.Value.Item1.Equals(key));
// Selecting a vector from many "equal" vectors is tricky.
// Some may be more equal than others. :-) Lets return the first for now.
var selectedPair = equalPairs.FirstOrDefault().Value;
pair = selectedPair;
return selectedPair.Item1 != null;
}
public Vector GetExisting(Vector key)
{
return TryGetKeyValue(key, out var pair) ? pair.Item1 : default;
}
public bool Contains(Vector key) => TryGetKeyValue(key, out var _);
public bool Add(Vector key, TValue value)
{
if (Contains(key)) return false;
_tree.Add(key.X + key.Y + key.Z, (key, value));
return true;
}
public TValue this[Vector key]
{
get => TryGetKeyValue(key, out var pair) ? pair.Item2 : default;
set => _tree.Add(key.X + key.Y + key.Z, (key, value));
}
public int Count => _tree.Count;
}
Usage example:
var dictionary = new VectorDictionary<int>();
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.5 * 1E-10, 0, 0), 1)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.6 * 1E-10, 0, 0), 2)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(1.6 * 1E-10, 0, 0), 3)}");
Console.WriteLine($"dictionary.Count: {dictionary.Count}");
Console.WriteLine($"dictionary.Contains: {dictionary.Contains(new Vector(2.5 * 1E-10, 0, 0))}");
Console.WriteLine($"dictionary.GetValue: {dictionary[new Vector(2.5 * 1E-10, 0, 0)]}");
Output:
Added: True
Added: False
Added: True
dictionary.Count: 2
dictionary.Contains: True
dictionary.GetValue: 3

You can get a reasonably good hashcode implementation in your case. Remember that the most important rule for a hash code is the following:
Two equal vectors must return the same value
This does not mean that two different vectors cannot return the same value; in some cases they obviously have to, since the number of hash values is limited while the number of distinct vectors, for all practical purposes, isn't.
With that in mind, simply compute your hash code from the vector's coordinates truncated to the tolerance's significant digits minus one. All equal vectors will give you the same hash, and a small minority of non-equal vectors that differ in the last decimal won't... you can live with that.
UPDATE: Changed rounded to truncated. Rounding is not the right choice.

Is it better to put multiple numbers in multiple int vars or in one int[] array?

I often find myself passing input like coordinates (X, Y), and I was wondering which approach is better.
If I store 3 ints in one array, I cut the code to a third, but are there more reasons to prefer an array over multiple variables?
Example to clarify:
int[] coord = new int[2];
coord[0] = 3;
coord[1] = 2;
or
int x = 3;
int y = 2;

I'd say that if the coordinates are so tightly coupled that you always pass both of them together (which I believe to be true), you can create a struct to encapsulate them.
public struct Coords
{
private int x;
private int y;
public Coords(int x, int y)
{
this.x = x;
this.y = y;
}
public int X
{
get { return x; }
}
public int Y
{
get { return y; }
}
}
In such scenario you can pass it like this:
var c = new Coords(1, 2);
MyMethod(c);
You have an optimization tag attached to your question, but if the problem is not critical to your application's performance, I'd go with readability/design over nanoseconds.

It kinda depends on what you're using the values for.
If you're holding a ton of values, like in a game, you should make them easily readable for yourself and other coders, and make clear what each value means.
You shouldn't hold many values in one array, say HP, MP, speed, rotation, height, width, ..., without making clear what they are.
Instead, you should write HP = 100; MP = 80; ...
In cases like this, there's almost always a 'player' class, though.
That class contains player.hitpoints, player.magicpoints, player.speed, ...
But for coordinates, what I think programmers use is an array with x, y (and sometime z) coordinates.

Equals and hashcode for vertices

I have a couple of vertices which I want to put into a Hashtable. Vertices which are really close to each other are considered as the same vertex. My C# vertex class looks like this:
public class Vertex3D
{
protected double _x, _y, _z;
public static readonly double EPSILON = 1e-10;
public virtual double x
{
get { return _x;}
set { _x = value; }
}
public virtual double y
{
get { return _y; }
set { _y = value; }
}
public virtual double z
{
get { return _z; }
set { _z = value; }
}
public Vertex3D(double p1, double p2, double p3)
{
this._x = p1;
this._y = p2;
this._z = p3;
}
public override bool Equals(object obj)
{
var other = obj as Vertex3D;
if (other == null)
{
return false;
}
double diffx = this.x - other.x;
double diffy = this.y - other.y;
double diffz = this.z - other.z;
bool eqx = diffx > -EPSILON && diffx < EPSILON;
bool eqy = diffy > -EPSILON && diffy < EPSILON;
bool eqz = diffz > -EPSILON && diffz < EPSILON;
return eqx && eqy && eqz;
}
public override int GetHashCode()
{
return this.x.GetHashCode() ^ this.y.GetHashCode() ^ this.z.GetHashCode();
}
public override string ToString()
{
return "Vertex:" + " " + x + " " + y + " " + z;
}
}
Now let's say I put the following two vertices into a dictionary (a Dictionary is a hashtable that doesn't allow null keys):
Dictionary<Vertex3D, Vertex3D> vertexList = new Dictionary<Vertex3D, Vertex3D>();
Vertex3D v0 = new Vertex3D(0.000000000000000037842417475065449, -1, 0.00000000000000011646698526992202);
Vertex3D v1 = new Vertex3D(0, -1, 0);
vertexList.Add(v0, v0);
vertexList.Add(v1, v1);
The problem is that my implementation of equals and hashcode is faulty. The above two vertices are considered as equal because the distance to each other is smaller than EPSILON. BUT they don't return the same hashcode.
How do I implement equals and hashcode correctly?

Hashtables require equivalence classes, but your Equals() is not transitive. Therefore you cannot use a hashtable for this purpose. (If, for example, you allowed nearby objects to compare equal by rounding to lattice points, you would have transitivity and equivalence classes. But then there would still be arbitrarily close points, down to the precision of your representation, that fall on opposite sides of a threshold and thus in different equivalence classes.)
There are other data structures, such as octrees, that are designed to accelerate finding nearby points. I suggest you use one of those.

Generally, mutable-thing references should only be considered equivalent if they both refer to the same object. Only references to immutable things should use any other definition of equality. It would be helpful if Object included virtual functions to test for equivalence in the scenario where two references are held by separate objects, neither of which will expose its reference to anything that might mutate it. Unfortunately, even though the effectively-immutable-instance-of-mutable-type pattern is very common (nearly all immutable collections, for example, use one or more mutable-type objects such as arrays to hold their data) there's no standard pattern for equivalence testing with it.
If you want to store vertices in a dictionary using Object.Equals for equality testing, it should be an immutable type. Alternatively, you could define a custom IEqualityComparer<T> for use with the dictionary, but you should be aware that Dictionary should only be used to find perfect matches. If you want to be able to find any point that's within EPSILON of a given point, you should use a dictionary that maps rounded values to lists of precise values (values should be rounded to a power of two that's at least twice as great as epsilon). If adding or subtracting EPSILON from some or all of the coordinates in a point would cause it to be rounded differently, the point should be included in the dictionary, rounded every such possible way.
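The last suggestion can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: `VertexGrid`, `Add` and `TryFind` are hypothetical names, coordinates are passed as plain doubles instead of a `Vertex3D`, and the cell size is taken as the smallest power of two at least twice EPSILON.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical grid: maps rounded cell coordinates to lists of precise points.
public sealed class VertexGrid<TValue>
{
    private readonly struct Entry
    {
        public readonly double X, Y, Z;
        public readonly TValue Value;

        public Entry(double x, double y, double z, TValue value)
        {
            X = x; Y = y; Z = z; Value = value;
        }
    }

    private readonly double _epsilon;
    private readonly double _cell; // power of two, at least 2 * epsilon
    private readonly Dictionary<(long, long, long), List<Entry>> _cells =
        new Dictionary<(long, long, long), List<Entry>>();

    public VertexGrid(double epsilon)
    {
        _epsilon = epsilon;
        _cell = Math.Pow(2, Math.Ceiling(Math.Log(2 * epsilon, 2)));
    }

    private long CellOf(double c) => (long)Math.Floor(c / _cell);

    // Stores the point under every cell that a coordinate shifted by +/- epsilon
    // can round to: at most two per axis, so at most eight keys.
    public void Add(double x, double y, double z, TValue value)
    {
        var entry = new Entry(x, y, z, value);
        double[] offsets = { -_epsilon, _epsilon };
        var keys = new HashSet<(long, long, long)>();
        foreach (var dx in offsets)
            foreach (var dy in offsets)
                foreach (var dz in offsets)
                    keys.Add((CellOf(x + dx), CellOf(y + dy), CellOf(z + dz)));

        foreach (var key in keys)
        {
            if (!_cells.TryGetValue(key, out var list))
            {
                list = new List<Entry>();
                _cells[key] = list;
            }
            list.Add(entry);
        }
    }

    // Looks only in the query's own cell, then applies the same strict
    // tolerance test as the question's Equals. Returns the first match.
    public bool TryFind(double x, double y, double z, out TValue value)
    {
        if (_cells.TryGetValue((CellOf(x), CellOf(y), CellOf(z)), out var list))
        {
            foreach (var e in list)
            {
                if (Math.Abs(e.X - x) < _epsilon &&
                    Math.Abs(e.Y - y) < _epsilon &&
                    Math.Abs(e.Z - z) < _epsilon)
                {
                    value = e.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

Lookups touch a single cell, at the price of storing each point under up to eight keys. If several stored points lie within EPSILON of the query, this sketch simply returns the first one it finds.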
