Why is my OrderBy running forever with this comparator? - c#

I have a class,
public class NullsAreLast : IComparer<int?>
{
    public int Compare(int? x, int? y)
    {
        if (y == null)
            return -1;
        else if (x == null)
            return 1;
        else
            return (int)x - (int)y;
    }
}
It should be self-explanatory how it is supposed to work: null values sort after all non-null values.
Whenever I run
arr.OrderBy(i => i, new NullsAreLast())
with at least two null values in arr it runs forever! Any idea why?

Keep in mind that a sorting algorithm may compare the same two values several times over the process of ordering the whole sequence. Because of this, it's very important to be aware of all three possible results: less than, greater than, and equal.
This is (mostly) fine for the integer comparison at the end (the subtraction). There are some rare edge cases: with floating point numbers, small differences truncate to 0 when cast to int, and even with integers the subtraction can overflow when x and y are far apart (for example int.MinValue and 1), which flips the sign of the result. Calling .CompareTo() avoids all of this and is the preferred practice anyway, but subtraction is usually good enough for small values. However, the null checks here are a real problem.
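To make the overflow edge case concrete, here is a minimal sketch. SubtractionOverflowDemo is a hypothetical helper class, and unchecked is written out explicitly to mirror C#'s default behavior for non-constant integer arithmetic.

```csharp
using System;

public static class SubtractionOverflowDemo
{
    // Subtraction-based comparison, as in the question.
    // unchecked makes the wrap-around explicit (C#'s default for non-constant expressions).
    public static int CompareBySubtraction(int x, int y) => unchecked(x - y);

    // CompareTo returns a correctly signed result for every pair of ints.
    public static int CompareByCompareTo(int x, int y) => x.CompareTo(y);
}
```

For x = int.MinValue and y = 1, x is clearly the smaller value, yet unchecked(x - y) wraps around to int.MaxValue, which a sort reads as "greater than". CompareTo returns a negative number for the same pair.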
Think about what happens as a list is nearly finished sorting. You have two null values that have both made their way to the end of the list; the algorithm just needs to verify they are in the correct position. Because both x and y are null, your function should return 0. They are equivalent (for this purpose, at least). Instead, the code always returns -1. The y value will always be less than the x value, and so the algorithm will always believe it still needs to swap them. It swaps, and tries to do the same thing again. And again. And again. And again. It can never finish.
Try this instead:
public class NullsAreLast : IComparer<int?>
{
    public int Compare(int? x, int? y)
    {
        if (!y.HasValue)
        {
            if (!x.HasValue) return 0;
            return -1;
        }
        if (!x.HasValue) return 1;
        return x.Value.CompareTo(y.Value);
    }
}
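A quick usage check, assuming an array like the one in the question. The corrected comparer is repeated so the example compiles on its own, and NullsLastDemo (a hypothetical name) also shows an alternative that avoids a custom comparer by sorting on "is null" first and then on the value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Corrected comparer from the answer: nulls sort last, two nulls compare equal.
public class NullsAreLast : IComparer<int?>
{
    public int Compare(int? x, int? y)
    {
        if (!y.HasValue)
        {
            if (!x.HasValue) return 0; // both null: equal
            return -1;                 // x has a value, y is null: x first
        }
        if (!x.HasValue) return 1;     // x is null, y has a value: y first
        return x.Value.CompareTo(y.Value);
    }
}

public static class NullsLastDemo
{
    // Uses the custom comparer; terminates even with several nulls.
    public static int?[] WithComparer(int?[] values) =>
        values.OrderBy(i => i, new NullsAreLast()).ToArray();

    // Alternative: false sorts before true, so non-null values come first,
    // then ThenBy orders by the value with the default comparer.
    public static int?[] WithKeys(int?[] values) =>
        values.OrderBy(i => !i.HasValue).ThenBy(i => i).ToArray();
}
```

With input { 3, null, 1, null, 2 }, both methods should produce { 1, 2, 3, null, null } and terminate normally. The key-based version relies on false sorting before true.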

The minus operation at the end of your Compare method isn't a robust way to compare. You need to handle exactly three possibilities: x is bigger, y is bigger, or they are the same.
From MSDN:
Compares two objects and returns a value indicating whether one is less than, equal to, or greater than the other.
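With this code, suppose x was 1000 and y was 15. Your result would be 985. Only the sign of the result is used by the sort, so 985 does mean "greater than", but subtraction can overflow when the values are far apart, so it is cleaner to return -1, 0 or 1 explicitly.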
Given your code and method name, I'm going to guess what you meant is this:
public class NullsAreLast : IComparer<int?>
{
    public int Compare(int? x, int? y)
    {
        if (x == null && y == null)
            return 0;                         // two nulls are the same
        else if (y == null)
            return -1;                        // values sort before null
        else if (x == null)
            return 1;                         // null sorts after values
        else
        {
            if (x.Value == y.Value) return 0; // same
            if (x.Value < y.Value) return -1; // y was bigger
            return 1;                         // x was bigger
        }
    }
}
You could even smash it into a horrible one-liner:
return (x==null&&y==null)?0:(y==null?-1:(x==null?1:(x==y?0:(x<y?-1:1))));

Related

Why does List<T>.Sort() where T:IComparable<T> produce a different order than List<T>.Sort(IComparer<T>)?

List<T>.Sort() where T:IComparable<T> produces a different order than List<T>.Sort(IComparer<T>), with equal inputs.
Note the comparer is not really correct - it does not return 0 for equality.
Both orders are valid (because the difference lies in the equal elements), but I'm wondering where the difference arises from, having looked through the source, and if it is possible to alter the IComparable.CompareTo to match the behavior of the IComparer.Compare when passed into the list.
The reason I'm looking at this is because IComparable is far faster than using an IComparer (which I'm guessing is why there are two different implementations in .NET), and I was hoping to improve the performance of an open source library by switching the IComparer to IComparable.
A full example is at: https://dotnetfiddle.net/b7s6my
Here's the class/comparer:
public class Point : IComparable<Point>
{
    public int X { get; }
    public int Y { get; }
    public int UID { get; set; } //Set by outside code
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
    public int CompareTo(Point b)
    {
        return PointComparer.Default.Compare(this, b);
    }
}
public class PointComparer : IComparer<Point>
{
    public readonly static PointComparer Default = new PointComparer();
    public int Compare(Point a, Point b)
    {
        if (a.Y == b.Y)
        {
            return (a.X < b.X) ? -1 : 1;
        }
        return (a.Y > b.Y) ? -1 : 1;
    }
}
Note the comparer is not mine (so I can't really change it) - and changing the sort order causes the surrounding code to fail.
As mentioned in the comments, the problem is with IComparer<Point>: for two equal objects a and b (ones that have the same X and Y), it returns Compare(a, b) = 1 and Compare(b, a) = 1.
However, the question remains why the two sorts still differ.
Checking the source of ArraySortHelper (see @Sweeper's comment) shows two versions of the quicksort implementation (one for an explicit IComparer and one for the implicit IComparable).
The algorithms are mostly the same; however, the function PickPivotAndPartition is a bit different. One is PickPivotAndPartition(Span<T> keys), the other is PickPivotAndPartition(Span<T> keys, Comparison<T> comparer).
In the first function there's this line:
while (... && GreaterThan(ref pivot, ref leftRef = ref Unsafe.Add(ref leftRef, 1))) ;
And in the second function, the similar line looks like:
while (comparer(keys[++left], pivot) < 0) ;
So that seems to be the point: the line in the first function can be read as Compare(pivot, left) > 0, while the line in the second can be read as Compare(left, pivot) < 0. So when Compare(left, pivot) = 1 and Compare(pivot, left) = 1, the condition in the first function is true, while in the second it is false.
This means that the two implementations can partition the array into different slices and hence produce different output.
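The asymmetry can be reproduced outside the sort. This sketch copies Point and PointComparer from the question (dropping UID and the IComparable<Point> implementation); the two methods of the hypothetical PartitionConditionDemo class are a simplified reading of the loop conditions quoted above, not the actual ArraySortHelper code.

```csharp
using System;
using System.Collections.Generic;

// Point and PointComparer as in the question (UID and IComparable<Point> omitted).
public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }
}

public class PointComparer : IComparer<Point>
{
    public static readonly PointComparer Default = new PointComparer();

    // Never returns 0, so two points with the same X and Y each compare as greater.
    public int Compare(Point a, Point b)
    {
        if (a.Y == b.Y)
            return (a.X < b.X) ? -1 : 1;
        return (a.Y > b.Y) ? -1 : 1;
    }
}

public static class PartitionConditionDemo
{
    // Simplified condition of the IComparable-based loop: GreaterThan(pivot, left).
    public static bool ImplicitLoopContinues(Point pivot, Point left) =>
        PointComparer.Default.Compare(pivot, left) > 0;

    // Simplified condition of the Comparison<T>-based loop: comparer(left, pivot) < 0.
    public static bool ExplicitLoopContinues(Point pivot, Point left) =>
        PointComparer.Default.Compare(left, pivot) < 0;
}
```

For two points with the same X and Y, ImplicitLoopContinues returns true while ExplicitLoopContinues returns false, which is the divergence described above.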

SortedSet with element duplication - can't remove element

I'm working on an implementation of the A-star algorithm in C# in Unity.
I need to evaluate a collection of Node objects:
class Node
{
    public Cell cell;
    public Node previous;
    public int f;
    public int h;
    public Node(Cell cell, Node previous = null, int f = 0, int h = 0)
    {
        this.cell = cell;
        this.previous = previous;
        this.f = f;
        this.h = h;
    }
}
I have a SortedSet which stores several Node objects, sorted by the h property. However, I need to be able to store two nodes with the same h value. So I've implemented a specific IComparer that sorts by the h property and reports equality only when two nodes represent the exact same cell.
class ByHCost : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        int result = n1.h.CompareTo(n2.h);
        result = (result == 0) ? 1 : result;
        result = (n1.cell == n2.cell) ? 0 : result;
        return result;
    }
}
My problem: I have a hard time removing things from my SortedSet (I named it openSet). Here is an example:
At some point in the algorithm, I need to remove a node from the set based on some criteria (NB: I use the isCell127 variable to focus my debugging on a single cell):
int removedNodesNb = openSet.RemoveWhere((Node n) => {
    bool isSame = n.cell == candidateNode.cell;
    bool hasWorseCost = n.f > candidateNode.f;
    if(isCell127)
    {
        Debug.Log(isSame && hasWorseCost); // the predicate matches exactly once and Debug.Log prints true
    }
    return isSame && hasWorseCost;
});
if(isCell127)
{
    Debug.Log($"removed {removedNodesNb}"); // 0 nodes were removed
}
Here, the removeWhere method seems to find a match, but doesn't remove the node.
I tried another way:
Node worseNode = openSet.SingleOrDefault(n => {
    bool isSame = n.cell == candidateNode.cell;
    bool hasWorseCost = n.f > candidateNode.f;
    return isSame && hasWorseCost;
});
if(isCell127)
{
    Debug.Log($"does worseNode exists ? {worseNode != null}"); // Debug returns true, it does exist.
}
if(worseNode != null)
{
    if(isCell127)
    {
        Debug.Log($"openSet length {openSet.Count}"); // 10
    }
    openSet.Remove(worseNode);
    if(isCell127)
    {
        Debug.Log($"openSet length {openSet.Count}"); // 10 - It should have been 9.
    }
}
I think the problem is related to my rather unusual IComparer, but I can't figure out what exactly the problem is.
Also, I would like to know whether there is a significant performance improvement from using a self-sorting SortedSet instead of a manually sorted List, especially in the A-star use case.
If I write out your tests, you have:
1) n1.h < n2.h and n1.cell == n2.cell -> final result = 0
2) n1.h > n2.h and n1.cell == n2.cell -> final result = 0
3) n1.h == n2.h and n1.cell != n2.cell -> final result = 1
4) n1.h < n2.h and n1.cell != n2.cell -> final result = -1
5) n1.h > n2.h and n1.cell != n2.cell -> final result = 1
When the h values are equal (test number 3), you always return the same result, 1. That's no good: you need another test, on the cell, to settle the position, because otherwise test 3 is confused with another test that gives the same result (test number 5).
I could test with a sample, but I am pretty sure this breaks the sort.
Once you fix the test, I suggest using LINQ with a list; it gives the best performance.
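The clash between tests 3 and 5 can be shown directly: two nodes with the same h but different cells each compare as "greater" than the other, violating symmetry. This sketch copies Node and ByHCost from the question and assumes a minimal hypothetical Cell class for which == is reference equality (the real Unity type may behave differently).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Unity Cell type; == is reference equality here.
public class Cell { }

public class Node
{
    public Cell cell;
    public Node previous;
    public int f;
    public int h;
    public Node(Cell cell, Node previous = null, int f = 0, int h = 0)
    {
        this.cell = cell;
        this.previous = previous;
        this.f = f;
        this.h = h;
    }
}

// The comparer from the question, unchanged.
public class ByHCost : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        int result = n1.h.CompareTo(n2.h);
        result = (result == 0) ? 1 : result;        // equal h: always "greater"
        result = (n1.cell == n2.cell) ? 0 : result; // same cell: always "equal"
        return result;
    }
}
```

For two nodes with h = 5 on different cells, Compare(a, b) and Compare(b, a) both return 1, so neither can be consistently placed before the other.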
I'll answer my own question because I now have a fairly complete answer.
Comparison
The comparison of the IComparer interface needs to follow some rules. As @frenchy said, my own comparison was broken. Here are the mathematical fundamentals of a comparison that I had totally forgotten (I found them here):
1) A.CompareTo(A) must return zero.
2) If A.CompareTo(B) returns zero, then B.CompareTo(A) must return zero.
3) If A.CompareTo(B) returns zero and B.CompareTo(C) returns zero, then A.CompareTo(C) must return zero.
4) If A.CompareTo(B) returns a value other than zero, then B.CompareTo(A) must return a value of the opposite sign.
5) If A.CompareTo(B) returns a value x not equal to zero, and B.CompareTo(C) returns a value y of the same sign as x, then A.CompareTo(C) must return a value of the same sign as x and y.
6) By definition, any object compares greater than (or follows) null, and two null references compare equal to each other.
In my case, rule 4) (symmetry) was broken.
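Rules 1, 2 and 4 above can be checked mechanically on a sample of values, which is a cheap way to catch a broken comparer before it reaches a SortedSet. ComparerContract.FindViolation is a hypothetical helper, not a .NET API, and it does not check the transitivity rules 3 and 5, which need triples of values.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerContract
{
    // Returns a description of the first violation of rules 1, 2 or 4
    // found among the sample items, or null if none is found.
    public static string FindViolation<T>(IComparer<T> comparer, IReadOnlyList<T> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            // Rule 1: every item compares equal to itself.
            if (comparer.Compare(items[i], items[i]) != 0)
                return $"rule 1: item {i} does not compare equal to itself";

            for (int j = i + 1; j < items.Count; j++)
            {
                // Rules 2 and 4: swapping the arguments must negate the sign
                // (0 stays 0, positive becomes negative and vice versa).
                int ab = Math.Sign(comparer.Compare(items[i], items[j]));
                int ba = Math.Sign(comparer.Compare(items[j], items[i]));
                if (ab != -ba)
                    return $"rules 2/4: items {i} and {j} give signs {ab} and {ba}";
            }
        }
        return null;
    }
}
```

Comparer<int>.Default passes, while a comparer built with Comparer<int>.Create that returns 1 for equal values is reported under rule 1.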
I needed to store multiple nodes with the same h property, but also to sort by that h property. So I needed to avoid equality when the h properties are the same.
Instead of returning a default value when the h comparison leads to 0 (which broke the 4th rule), I decided to refine the comparison with a value unique to each node instance, so that it never leads to 0. This implementation is probably not the best, and there may be a better way to get a unique value, but here is what I did.
private class Node
{
    private static int globalIncrement = 0;
    public Cell cell;
    public Node previous;
    public int f;
    public int h;
    public int uid;
    public Node(Cell cell, Node previous = null, int f = 0, int h = 0)
    {
        Node.globalIncrement++;
        this.cell = cell;
        this.previous = previous;
        this.f = f;
        this.h = h;
        this.uid = Node.globalIncrement;
    }
}
private class ByHCost : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        if(n1.cell == n2.cell)
        {
            return 0;
        }
        int result = n1.h.CompareTo(n2.h);
        result = (result == 0) ? n1.uid.CompareTo(n2.uid) : result; // Here is the additional comparison which never leads to 0. Depending on the use case and number of objects, it would be better to use another system of unique values.
        return result;
    }
}
RemoveWhere method
RemoveWhere uses a predicate to search the collection, so I didn't think it cared about the comparison. But RemoveWhere internally uses the Remove method, which does care about the comparison. So even if RemoveWhere has found an element, if your comparison is inconsistent, the removal silently fails. That's a pretty weird implementation, no?
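One caveat and an alternative sketch (not the author's code). Because the comparer above returns 0 for any two nodes on the same cell before it looks at h, it can still break rule 5: with A and C on one cell and B on another, h values 1, 2 and 3 give A < B and B < C but A == C. If the comparer orders only by h and then uid, it becomes a strict total order, and the same-cell condition moves into the RemoveWhere predicate. Cell is again a hypothetical stand-in with reference equality, previous is omitted, and the fields are readonly because a node's sort key must not change while the node is in the set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Unity Cell type; == is reference equality.
public class Cell { }

public class Node
{
    private static int globalIncrement = 0;
    // Readonly: in particular h and uid (the sort key) must not change
    // while the node is stored in the SortedSet.
    public readonly Cell cell;
    public readonly int f;
    public readonly int h;
    public readonly int uid;

    public Node(Cell cell, int f = 0, int h = 0)
    {
        this.cell = cell;
        this.f = f;
        this.h = h;
        uid = ++globalIncrement; // unique per instance (not thread-safe)
    }
}

// Strict total order: by h, then by uid. Returns 0 only for the same instance.
public class ByHThenUid : IComparer<Node>
{
    public int Compare(Node n1, Node n2)
    {
        int result = n1.h.CompareTo(n2.h);
        return result != 0 ? result : n1.uid.CompareTo(n2.uid);
    }
}
```

Removing the worse node for a given cell with RemoveWhere now reports 1 and the count drops, because Remove can locate the node by its (h, uid) key.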

Array sorting by two parameters

I'm having a little difficulty with Array.Sort. I have a class with two fields: one is a random string, the other one is a random number. If I sort by one parameter, it works fine. But I would like to sort by two parameters: first by the SUM of the numbers (from low to high), and THEN, if these numbers are equal, by the random string assigned to them (from low to high).
Can you give me some hints and tips on how I can "merge" these two kinds of sort?
Array.Sort(Phonebook, delegate(PBook user1, PBook user2)
    { return user1.Sum().CompareTo(user2.Sum()); });
Console.WriteLine("ORDER");
foreach (PBook user in Phonebook)
{
    Console.WriteLine(user.name);
}
That's how I order it with one parameter.
I think this is what you are after:
sourcearray.OrderBy(a=> a.sum).ThenBy(a => a.random)
Here is the general algorithm that you'll use for comparing multiple fields in a CompareTo method:
public int Compare(MyClass first, MyClass second)
{
    int firstComparison = first.FirstValue.CompareTo(second.FirstValue);
    if (firstComparison != 0)
    {
        return firstComparison;
    }
    else
    {
        return first.SecondValue.CompareTo(second.SecondValue);
    }
}
However, LINQ does make the syntax for doing this much easier, allowing you to only write:
Phonebook = Phonebook.OrderBy(book=> book.Sum())
.ThenBy(book => book.OtherProperty)
.ToArray();
You can do this in-place by using a custom IComparer<PBook>. The following should order your array as per your original code, but if two sums are equal it should fall back on the random string (which I've called RandomString):
public class PBookComparer : IComparer<PBook>
{
    public int Compare(PBook x, PBook y)
    {
        // Sort null items to the top; you can drop this
        // if you don't care about null items.
        if (x == null)
            return y == null ? 0 : -1;
        else if (y == null)
            return 1;
        // Comparison of sums.
        var sumCompare = x.Sum().CompareTo(y.Sum());
        if (sumCompare != 0)
            return sumCompare;
        // Sums are the same; return comparison of strings
        return String.Compare(x.RandomString, y.RandomString);
    }
}
You call this as
Array.Sort(Phonebook, new PBookComparer());
You could just do this inline but it gets a bit hard to follow:
Array.Sort(Phonebook, (x, y) => {
    int sc = x.Sum().CompareTo(y.Sum());
    return sc != 0 ? sc : string.Compare(x.RandomString, y.RandomString); });
... Actually, that isn't too bad, although I have dropped the null checks.
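A self-contained sketch combining the approaches above, assuming a hypothetical PBook with name, RandomString and an int[] of numbers (the question only shows name and Sum()). It swaps in string.CompareOrdinal and StringComparer.Ordinal for String.Compare, which is culture-sensitive; use the culture-aware comparison if that ordering is what you want.

```csharp
using System;
using System.Linq;

// Hypothetical PBook: the question only shows a name field and a Sum() method.
public class PBook
{
    public string name;
    public string RandomString;
    public int[] numbers;

    public PBook(string name, string randomString, params int[] numbers)
    {
        this.name = name;
        RandomString = randomString;
        this.numbers = numbers;
    }

    public int Sum() => numbers.Sum();
}

public static class PhonebookSorting
{
    // In-place sort: by sum ascending, then by RandomString (ordinal).
    public static void SortInPlace(PBook[] phonebook) =>
        Array.Sort(phonebook, (x, y) =>
        {
            int sc = x.Sum().CompareTo(y.Sum());
            return sc != 0 ? sc : string.CompareOrdinal(x.RandomString, y.RandomString);
        });

    // LINQ version: returns a new array and leaves the input untouched.
    public static PBook[] SortWithLinq(PBook[] phonebook) =>
        phonebook.OrderBy(b => b.Sum())
                 .ThenBy(b => b.RandomString, StringComparer.Ordinal)
                 .ToArray();
}
```

Note that Array.Sort is not stable, so entries with equal sums and equal strings may come out in any order, whereas OrderBy/ThenBy is stable.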

Compare two lists that contain a lot of objects (part 2)

Referring to the question that I previously asked:
Compare two lists that contain a lot of objects
It is impressive to see how fast that comparison is made by implementing the IEqualityComparer interface: example here
As I mentioned in my other question, this comparison helps me back up a source folder to a destination folder. Now I want to sync two folders, so I need to compare the dates of the files. Whenever I do something like:
public class MyFileComparer2 : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile s, MyFile d)
    {
        return
            s.compareName.Equals(d.compareName) &&
            s.size == d.size &&
            s.deepness == d.deepness &&
            s.dateModified.Date <= d.dateModified.Date; // This line does not work.
            // I also tried comparing the strings by converting it to a string and it does
            // not work. It does not give me an error but it does not seem to include the files
            // where s.dateModified.Date < d.dateModified.Date
    }
    public int GetHashCode(MyFile a)
    {
        int rt = (a.compareName.GetHashCode() * 251 + a.size.GetHashCode() * 251 + a.deepness.GetHashCode() + a.dateModified.Date.GetHashCode());
        return rt;
    }
}
It would be nice if I could do something similar using greater-than-or-equal comparisons. I also tried using the Ticks property and it does not work. Maybe I am doing something wrong. I believe it is not possible to compare things with a less-than-or-equal operator when implementing this interface. Moreover, I don't understand how this class works; I just know it is impressive how fast it iterates through the whole list.
Your whole approach is fundamentally flawed because your IEqualityComparer.Equals method is not symmetric. This means Equals(file1, file2) does not equal Equals(file2, file1) because of the way you are using the less than operator.
The documentation:
IEqualityComparer.Equals Method
clearly states:
Notes to Implementers
The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.
Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
Instead, you need to use the IComparable interface, or IEqualityComparer in combination with separate date comparisons. If you do not, things might seem to work for a while, but you will regret it later.
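A sketch of the second option above: keep equality on the identifying fields only, so it stays reflexive, symmetric and transitive, and do the date comparison separately. MyFile's field types, FolderSync and FilesToCopy are assumptions for illustration, and HashCode.Combine requires .NET Core 2.1 or later (or the Microsoft.Bcl.HashCode package).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical MyFile with the fields used in the question (types assumed).
public class MyFile
{
    public string compareName;
    public long size;
    public int deepness;
    public DateTime dateModified;
}

// Equality on the identifying fields only: reflexive, symmetric and transitive.
public class MyFileKeyComparer : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile s, MyFile d)
    {
        if (ReferenceEquals(s, d)) return true;
        if (s is null || d is null) return false;
        return s.compareName == d.compareName && s.size == d.size && s.deepness == d.deepness;
    }

    // Same fields as Equals, so equal files always get equal hash codes.
    public int GetHashCode(MyFile a) => HashCode.Combine(a.compareName, a.size, a.deepness);
}

public static class FolderSync
{
    // Source files with no matching destination file, or whose match is older.
    public static List<MyFile> FilesToCopy(IEnumerable<MyFile> source, IEnumerable<MyFile> destination)
    {
        var byKey = new Dictionary<MyFile, MyFile>(new MyFileKeyComparer());
        foreach (var destFile in destination)
            byKey[destFile] = destFile; // if several share a key, the last one wins

        return source
            .Where(s => !byKey.TryGetValue(s, out var existing) || existing.dateModified < s.dateModified)
            .ToList();
    }
}
```

Files missing from the destination, or present but older there, are returned. The date check lives outside the comparer, so a hash-based lookup can never skip it.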
Since the DateTime objects are different in the case when one DateTime is less than the other, you get different hashcodes for the objects s and d and the Equals method is not called. In order for your comparison of the dates to work, you should remove the date part from the GetHashCode method:
public int GetHashCode(MyFile a)
{
    int rt = ((a.compareName.GetHashCode() * 251 + a.size.GetHashCode())
        * 251 + a.deepness.GetHashCode()) * 251;
    return rt;
}
Your GetHashCode has a problem:
public int GetHashCode(MyFile a)
{
    int rt = (((a.compareName.GetHashCode() * 251)
        + a.size.GetHashCode() * 251)
        + a.deepness.GetHashCode() * 251)
        + a.dateModified.Date.GetHashCode();
    return rt;
}
I changed the date part because I also needed the time, so I use the Ticks property instead, and I removed dateModified from the hash code. It works great. Here is how I modified my program:
public class MyFileComparer2 : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile s, MyFile d)
    {
        return
            s.compareName.Equals(d.compareName) &&
            s.size == d.size &&
            s.deepness == d.deepness &&
            //s.dateModified.Date <= d.dateModified.Date &&
            s.dateModified.Ticks >= d.dateModified.Ticks
            ;
    }
    public int GetHashCode(MyFile a)
    {
        int rt = (((a.compareName.GetHashCode() * 251)
            + a.size.GetHashCode() * 251)
            + a.deepness.GetHashCode() * 251)
            //+ a.dateModified.Ticks.GetHashCode();
            ;
        return rt;
    }
}
I still don't know how this hash code function works. The nice thing is that it works great.

IEqualityComparer for Value Objects

I have an immutable Value Object, IPathwayModule, whose value is defined by:
(int) Block;
(Entity) Module, identified by (string) ModuleId;
(enum) Status; and
(entity) Class, identified by (string) ClassId - which may be null.
Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        int hx = GetHashCode(x);
        int hy = GetHashCode(y);
        return hx == hy;
    }
    public int GetHashCode(IPathwayModule obj)
    {
        int h;
        if (obj.Class != null)
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
        }
        else
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
        }
        return h;
    }
}
IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.
I suppose my questions are:
Am I using the interface correctly in this case?
Are there cases where I might not see the desired behaviour?
Is there any way to improve the robustness, performance?
Are there any good practices that I am not following?
Don't implement Equals in terms of the hash function's results; it's too fragile. Instead, compare the value of each field. Something like:
return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...
Also, hash function results aren't really amenable to addition. Try using the ^ operator instead.
return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...
You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...
The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.
GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int. The int itself is a good hash code. The same for enum values. Your GetHashCode could be then implemented like this:
public int GetHashCode(IPathwayModule obj)
{
    unchecked {
        int h = obj.Block + obj.Module.ModuleId.GetHashCode() + (int) obj.Status;
        if (obj.Class != null)
            h += obj.Class.ClassId.GetHashCode();
        return h;
    }
}
The 'unchecked' block makes sure that overflow in the arithmetic operations wraps around silently, even if the project is compiled with overflow checking enabled.
You shouldn't use GetHashCode() as the main way of comparing objects. Compare them field by field.
There could be multiple objects with the same hash code (this is called 'hash code collisions').
Also, be careful when adding together multiple integer values, since in a checked context this can cause an OverflowException. Use 'exclusive or' (^) to combine hash codes, or wrap the code in an 'unchecked' block.
You should implement better versions of Equals and GetHashCode.
For instance, the hash code of enums is simply their numerical value.
In other words, with these two enums:
public enum A { x, y, z }
public enum B { k, l, m }
Then with your implementation, the following value type:
public struct AB {
    public A A;
    public B B;
}
the following two values would be considered equal:
AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };
I'm assuming you don't want that.
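The collision above is easy to verify, and in this example it affects XOR just like addition, because both operations give the same result when the values are swapped between fields. This sketch reuses the A, B and AB definitions; HashCombineDemo is a hypothetical helper, and the multiply-and-add variant (seed 17, factor 31) is a common convention rather than anything required by .NET.

```csharp
using System;

public enum A { x, y, z }
public enum B { k, l, m }

public struct AB
{
    public A A;
    public B B;
}

public static class HashCombineDemo
{
    // Commutative combinations: swapping values between fields gives the same hash.
    public static int Added(AB v) => (int)v.A + (int)v.B;
    public static int Xored(AB v) => (int)v.A ^ (int)v.B;

    // Multiply-and-add: the factor makes the result depend on field order.
    public static int MultiplyAdd(AB v)
    {
        unchecked
        {
            int h = 17;
            h = h * 31 + (int)v.A;
            h = h * 31 + (int)v.B;
            return h;
        }
    }
}
```

Multiply-and-add gives ab1 and ab2 different hash codes here; it still cannot rule out collisions in general, which is why Equals must compare the fields.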
Also, passing the value types as interfaces will box them; this could have performance implications, although probably not significant ones. You might consider making the IEqualityComparer implementation take your value types directly.
Assuming that two objects are equal because their hash code is equal is wrong. You need to compare all members individually.
It is probably better to use ^ rather than + to combine the hash codes.
If I understand you well, you'd like to hear some comments on your code. Here're my remarks:
1) GetHashCode results should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions.
2) You compare hash codes. That's good, but only do this if the underlying object overrides GetHashCode. If not, use the properties and their hash codes and combine them.
3) Hash codes are important; they make a quick comparison possible. But if hash codes are equal, the objects can still be different. This happens rarely, but you'll need to compare the fields of your object when the hash codes are equal.
4) You say your value types are immutable, but you reference objects (.Class) which are not immutable.
5) Always optimize the comparison by making a reference comparison the first test. If the references are unequal, the referenced objects are unequal, and then the structs are unequal.
Point 5 depends on whether you want the objects that you reference in your value type to compare as not equal when they are not the same reference.
EDIT: you compare many strings. String comparison is optimized in C#. As others suggested, you can use == with them in your comparison. For GetHashCode, use XOR (^) as others suggested as well.
Thanks to all who responded. I have aggregated the feedback from everyone who responded and my improved IEqualityComparer now looks like:
private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        if (x == y) return true;
        if (x == null || y == null) return false;
        if ((x.Class == null) ^ (y.Class == null)) return false;
        if (x.Class == null) //and implicitly y.Class == null
        {
            return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
        }
        return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
    }
    public int GetHashCode(IPathwayModule obj)
    {
        unchecked {
            int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
            if (obj.Class != null)
            {
                h ^= obj.Class.ClassId.GetHashCode();
            }
            return h;
        }
    }
}
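For comparison, a variant of the final comparer using HashCode.Combine (.NET Core 2.1 or later), which combines values in an order-sensitive way and accepts null values, so no special case for a missing Class is needed. The Status, Module, ClassEntity and PathwayModule types are minimal hypothetical stand-ins for the real ones.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal versions of the question's types.
public enum Status { Planned, Completed }

public sealed class Module
{
    public Module(string moduleId) { ModuleId = moduleId; }
    public string ModuleId { get; }
}

public sealed class ClassEntity
{
    public ClassEntity(string classId) { ClassId = classId; }
    public string ClassId { get; }
}

public interface IPathwayModule
{
    int Block { get; }
    Module Module { get; }
    Status Status { get; }
    ClassEntity Class { get; } // may be null
}

public sealed class PathwayModule : IPathwayModule
{
    public PathwayModule(int block, Module module, Status status, ClassEntity cls)
    {
        Block = block; Module = module; Status = status; Class = cls;
    }
    public int Block { get; }
    public Module Module { get; }
    public Status Status { get; }
    public ClassEntity Class { get; }
}

public sealed class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // Assumes ClassId is never null when Class is set, so a null Class
        // only matches another null Class.
        return x.Block == y.Block
            && x.Status == y.Status
            && x.Module.ModuleId == y.Module.ModuleId
            && x.Class?.ClassId == y.Class?.ClassId;
    }

    // Uses the same fields as Equals, so equal modules get equal hash codes.
    public int GetHashCode(IPathwayModule obj) =>
        HashCode.Combine(obj.Block, obj.Module.ModuleId, obj.Status, obj.Class?.ClassId);
}
```

Two instances with the same values collapse to one entry in a HashSet, while a module without a Class stays distinct from one with a Class.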
