Compare two lists of nodes with total match - c#

I created two lists of objects, but I cannot get a total match by value. The lists are:
var inputNodes = new List<node>()
{
new node() { nodeName= "D100", DataLength = 1 },
new node() { nodeName= "D101", DataLength = 1 },
new node() { nodeName= "D102", DataLength = 1 },
new node() { nodeName= "D103", DataLength = 1 },
new node() { nodeName= "D104", DataLength = 1 },
new node() { nodeName= "D105", DataLength = 1 },
new node() { nodeName = "D106", DataLength = 1 }
};
var inputNodes2 = new List<node>()
{
new node() { nodeName= "D100", DataLength = 1 },
new node() { nodeName= "D101", DataLength = 1 },
new node() { nodeName= "D102", DataLength = 1 },
new node() { nodeName= "D103", DataLength = 1 },
new node() { nodeName= "D104", DataLength = 1 },
new node() { nodeName= "D105", DataLength = 1 },
new node() { nodeName= "D106", DataLength = 1 }
};
I tried the check var isEqual = inputNodes.SequenceEqual(inputNodes2)
It returns false, and I don't want to use a loop or the list's Select function.
Any ideas?

It seems to me that you are not familiar with the concept of equality, and how you can change the definition of equality to your definition. Hence I'll explain default equality and how to write an equality comparer that holds your idea of equality.
By default equality of objects is reference equality: two objects are equal if they refer to the same object:
Node A = new Node {...}
Node X = A;
Node Y = A;
Objects X and Y refer to the same object, and thus:
Assert(X == Y);
IEqualityComparer<Node> nodeComparer = EqualityComparer<Node>.Default;
Assert(nodeComparer.Equals(X, Y));
However, in your case inputNodes[0] and inputNodes2[0] do not refer to the same object. Hence they are not equal Nodes, and thus SequenceEqual will return false.
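A minimal sketch of this default behaviour, assuming a plain Node class with the two properties from the question. The class body and the ReferenceEqualityDemo helper are assumptions made for illustration, not the asker's code:

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the class from the question; it does not override Equals.
public class Node
{
    public string NodeName { get; set; }
    public int DataLength { get; set; }
}

// Hypothetical helper, only here to show the two cases side by side.
public static class ReferenceEqualityDemo
{
    // Two lists holding different instances with identical property values.
    public static bool DistinctInstancesAreSequenceEqual()
    {
        var first = new List<Node> { new Node { NodeName = "D100", DataLength = 1 } };
        var second = new List<Node> { new Node { NodeName = "D100", DataLength = 1 } };
        return first.SequenceEqual(second); // default comparer: reference equality
    }

    // Two lists holding the very same instance.
    public static bool SharedInstanceIsSequenceEqual()
    {
        var shared = new Node { NodeName = "D100", DataLength = 1 };
        var first = new List<Node> { shared };
        var second = new List<Node> { shared };
        return first.SequenceEqual(second);
    }
}
```

The first method reports false even though the property values agree; the second reports true because both lists point at one object.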
You don't want the standard equality comparison; you want a special one. According to your definition, two Nodes are equal if the properties of the Nodes are equal. This definition of equality is called "value equality", in contrast to "reference equality".
Because you don't want to use the default reference equality, you'll have to write the equality comparer yourself. The easiest way to do this is to derive a class from EqualityComparer<Node>.
public class NodeComparer : EqualityComparer<Node>
{
public static IEqualityComparer<Node> ValueComparer { get; } = new NodeComparer();
public override bool Equals(Node x, Node y) {... TODO: implement}
public override int GetHashCode(Node x) {... TODO: implement}
}
Usage will be as follows:
IEnumerable<Node> inputNodes1 = ...
IEnumerable<Node> inputNodes2 = ...
IEqualityComparer<Node> nodeComparer = NodeComparer.ValueComparer;
bool equalInputNodes = inputNodes1.SequenceEqual(inputNodes2, nodeComparer);
Equals
The definition depends on YOUR definition of equality. You can use any definition you need. In your case, you chose a straightforward "compare by value":
public override bool Equals(Node x, Node y)
{
// The following statements are almost always the same for every equality
if (x == null) return y == null; // true if both null
if (y == null) return false; // because x not null
if (Object.ReferenceEquals(x, y)) return true; // because same object
if (x.GetType() != y.GetType()) return false; // different types
On some occasions these statements might be different. For example, if you want to create a string comparer where a null string equals an empty string:
string x = null;
string y = String.Empty;
IEqualityComparer<string> stringComparer = MyStringComparer.EmptyEqualsNull;
Assert(stringComparer.Equals(x, y));
Or, if you think that Teachers are Persons, then in some cases you might not want to check the type when you compare a Teacher with a Person.
But all in all, most comparers will use these four initial lines.
Continuing your equality:
return x.NodeName == y.NodeName
&& x.DataLength == y.DataLength;
}
To be prepared for the future, consider the following:
private static readonly IEqualityComparer<string> nodeNameComparer = StringComparer.Ordinal;
and in your Equals method:
return nodeNameComparer.Equals(x.NodeName, y.NodeName)
&& x.DataLength == y.DataLength;
So if in future you want to do a case insensitive string comparison, you only have to change the static declaration of your nodeNameComparer:
private static readonly IEqualityComparer<string> nodeNameComparer = StringComparer.OrdinalIgnoreCase;
GetHashCode
GetHashCode is meant to be a fast way to separate most unequal objects. This is useful if, for example, your Node has two hundred properties and you know that if two Nodes have an equal value for property Id, all other properties will very likely be equal as well.
Note that I use "very likely". It is not guaranteed that if X has the same hash code as Y, X will equal Y. But you can be certain:
if X has a different hash code than Y, then they will not be equal.
The only requirement for GetHashCode is that if X equals Y, then MyComparer.GetHashCode(X) equals MyComparer.GetHashCode(Y).
If X is not equal to Y, then you don't know whether their hash codes will be different, although it would be nice if they were, because the code will be more efficient.
GetHashCode is meant to be fast. It doesn't have to check everything; it is handy if it separates most elements, but it does not have to be a complete equality check.
How about this one:
public override int GetHashCode(Node x)
{
if (x == null) return 874283; // just a number
// for HashCode only use the NodeName:
return x.NodeName.GetHashCode();
}
Or, if you use a string comparer in method Equals for NodeName:
private static readonly IEqualityComparer<string> nodeNameComparer = StringComparer.OrdinalIgnoreCase;
// this comparer is used in Equals
public override int GetHashCode(Node x)
{
if (x == null) return 874283; // just a number
return nodeNameComparer.GetHashCode(x.NodeName);
}
So if in the future you change the comparison method for the NodeName to CurrentCulture, both Equals and GetHashCode will use the proper comparer.
Node a = new Node {NodeName = "X", DataLength = 1 };
Node b = new Node {NodeName = "X", DataLength = 1 };
Node c = new Node {NodeName = "X", DataLength = 2 };
Node d = new Node {NodeName = "Y", DataLength = 1 };
It is easy to see that b equals a, while c and d are different from a.
Although c is different, the comparer will return the same hash code for c as for a.
So GetHashCode is not enough for exact equality, but a good GetHashCode will separate most different objects.
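Putting the pieces above together, the whole comparer might look like the sketch below. The Node class body is an assumption (the question only shows its two properties), and StringComparer.Ordinal is one possible choice for the name comparer:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the class being compared.
public class Node
{
    public string NodeName { get; set; }
    public int DataLength { get; set; }
}

public class NodeComparer : EqualityComparer<Node>
{
    public static IEqualityComparer<Node> ValueComparer { get; } = new NodeComparer();

    // Swap for StringComparer.OrdinalIgnoreCase to ignore case in names;
    // Equals and GetHashCode both go through this one field.
    private static readonly IEqualityComparer<string> nodeNameComparer = StringComparer.Ordinal;

    public override bool Equals(Node x, Node y)
    {
        if (x == null) return y == null;               // true if both null
        if (y == null) return false;                   // because x is not null
        if (ReferenceEquals(x, y)) return true;        // same object
        if (x.GetType() != y.GetType()) return false;  // different types
        return nodeNameComparer.Equals(x.NodeName, y.NodeName)
            && x.DataLength == y.DataLength;
    }

    public override int GetHashCode(Node x)
    {
        if (x == null) return 874283; // just a number
        // Hash only the name, using the same comparer as Equals.
        return x.NodeName == null ? 0 : nodeNameComparer.GetHashCode(x.NodeName);
    }
}
```

With this in place, inputNodes.SequenceEqual(inputNodes2, NodeComparer.ValueComparer) compares the two lists element by element using value equality.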

Use an IEqualityComparer like the one below.
class NodeComparer : IEqualityComparer<node>
{
public bool Equals(node? x, node? y)
{
if(x == null && y == null){
return true;
}
if(x == null || y == null)
{
return false;
}
return string.Equals(x.nodeName, y.nodeName) && x.DataLength == y.DataLength;
}
public int GetHashCode([DisallowNull] node obj)
{
return obj.nodeName.GetHashCode() * obj.DataLength.GetHashCode();
}
}
and then use it in SequenceEqual:
inputNodes.SequenceEqual(inputNodes2, new NodeComparer());
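One thing to watch in the hash above: int.GetHashCode() returns the value itself, so the product is 0 for every node whose DataLength is 0. That is still a legal hash, just a weak one. A possible variant, assuming a runtime that provides System.HashCode (for example .NET Core 2.1 or later), is sketched below; the Node class and the comparer name are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the class from the question.
public class Node
{
    public string NodeName { get; set; }
    public int DataLength { get; set; }
}

public sealed class CombinedHashNodeComparer : IEqualityComparer<Node>
{
    public bool Equals(Node x, Node y)
    {
        if (ReferenceEquals(x, y)) return true;      // same instance, or both null
        if (x is null || y is null) return false;    // exactly one is null
        return string.Equals(x.NodeName, y.NodeName) && x.DataLength == y.DataLength;
    }

    public int GetHashCode(Node obj)
    {
        // HashCode.Combine mixes both values and tolerates a null NodeName.
        return HashCode.Combine(obj.NodeName, obj.DataLength);
    }
}
```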

Related

How to use a primitive list or List as a key of a dictionary in C#

I'm trying to use an int array as key in C# and the behaviour I'm seeing is unexpected (for me).
var result = new Dictionary<int[], int>();
result[new [] {1, 1}] = 100;
result[new [] {1, 1}] = 200;
Assert.AreEqual(1, result.Count); // fails: Count is 2
It seems the same with List too.
var result = new Dictionary<List<int>, int>();
result[new List<int> { 1, 1 }] = 100;
result[new List<int> { 1, 1 }] = 200;
Assert.AreEqual(1, result.Count); // fails: Count is 2
I'm expecting the Dictionary to use Equals to decide if a Key is present in the map. This doesn't seem to be the case.
Can someone explain why and how I can get this sort of behaviour to work?
.NET lists and arrays do not have a built-in equality comparison, so you need to provide your own:
class ArrayEqComparer : IEqualityComparer<int[]> {
public static readonly IEqualityComparer<int[]> Instance =
new ArrayEqComparer();
public bool Equals(int[] b1, int[] b2) {
if (b2 == null && b1 == null)
return true;
else if (b1 == null || b2 == null)
return false;
return b1.SequenceEqual(b2);
}
public int GetHashCode(int[] a) {
return a.Aggregate(37, (p, v) => 31*v + p);
}
}
Now you can construct your dictionary as follows:
var result = new Dictionary<int[],int>(ArrayEqComparer.Instance);
The Dictionary class accepts a custom equality comparer for its keys. Implement IEqualityComparer<IList<int>> by providing a GetHashCode(IList<int> obj) that returns the xor (the ^ operator) of all list elements (0 ^ first ^ second...) and an Equals(IList<int> x, IList<int> y) that uses System.Linq.Enumerable.SequenceEqual. Then pass an instance of that comparer to the Dictionary constructor.
You are passing a new object (array or list) as a key. As a new object it has a different reference, so it is accepted as a new key.
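A minimal sketch of that description, assuming the keys are lists of int. The names IntListXorComparer and ListKeyDemo are made up for this example:

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class IntListXorComparer : IEqualityComparer<IList<int>>
{
    public bool Equals(IList<int> x, IList<int> y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x == null || y == null) return false;  // exactly one is null
        return x.SequenceEqual(y);                 // same elements in the same order
    }

    public int GetHashCode(IList<int> obj)
    {
        // 0 ^ first ^ second ... : cheap, but ignores order and cancels repeated values.
        return obj.Aggregate(0, (hash, item) => hash ^ item.GetHashCode());
    }
}

public static class ListKeyDemo
{
    public static int CountAfterTwoEqualKeys()
    {
        var result = new Dictionary<IList<int>, int>(new IntListXorComparer());
        result[new[] { 1, 1 }] = 100;
        result[new List<int> { 1, 1 }] = 200; // same contents, so this overwrites the entry
        return result.Count;
    }
}
```

Because int[] and List<int> both implement IList<int>, one comparer covers both kinds of key. The xor hash is valid but collides easily ({1, 1} and {2, 2} both hash to 0), so a multiply-and-add hash may be preferable for large dictionaries.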

Compare a set of three strings with another

I am making a list of unique "set of 3 strings" from some data, in a way that if the 3 strings come together they become a set, and I can only have unique sets in my list.
A,B,C
B,C,D
D,E,F and so on
And I keep adding sets to the list if they do not exist in the list already, so that if I encounter these three strings together {A,B,C} I won't put it in the list again. So I have 2 questions. And the answer to the second one actually depends on the answer to the first one.
How to store this set of 3 string, use List or array or concatenate them or anything else? (I may add it to a Dictionary to record their count as well but that's for later)
How to compare a set of 3 strings with another, irrespective of their order, obviously depending on the structure used? I want to know a proper solution to this rather than doing everything naively!
I am using C# by the way.
Either an array or a list is your best bet for storing the data, since as wentimo mentioned in a comment, concatenating them means that you are losing data that you may need. To steal his example, "ab" "cd" "ef" concatenated together is the same as "abcd" "e" and "f" concatenated, but they shouldn't be treated as equivalent sets.
To compare them, I would order the list alphabetically, then compare each value in order. That takes care of the fact that the order of the values doesn't matter.
A sketch might look like this (note that it sorts both lists in place):
static bool Compare(List<string> a, List<string> b)
{
a.Sort();
b.Sort();
if(a.Count == b.Count)
{
for(int i = 0; i < a.Count; i++)
{
if(a[i] != b[i])
{
return false;
}
}
return true;
}
else
{
return false;
}
}
Update
Now that you have stated in a comment that performance is an important consideration, since you may have millions of these sets to compare, and that you won't have duplicate elements in a set, here is a more optimized version of my code. Note that I no longer have to sort the two lists, which will save quite a bit of time when executing this function.
static bool Compare(List<string> a, List<string> b)
{
if(a.Count == b.Count)
{
for(int i = 0; i < a.Count; i++)
{
if(!b.Contains(a[i]))
{
return false;
}
}
return true;
}
else
{
return false;
}
}
DrewJordan's approach of using a hashtable is still probably better than my approach, since it only has to sort each set of three and can then do the comparison against your existing sets much faster than my approach can.
Probably the best way is to use a HashSet, if you don't need to have duplicate elements in your sets. It sounds like each set of 3 has 3 unique elements; if that is actually the case, I would combine a HashSet approach with the concatenation that you already worked out, i.e. order the elements, combine them with some separator, and then add the concatenated elements to a HashSet, which will prevent duplicates from ever occurring in the first place.
If your sets of three could have duplicate elements, then Kevin's approach is what you're going to have to do for each. You might get some better performance from using a list of HashSets for each set of three, but with only three elements the overhead of creating a hash for each element of potentially millions of sets seems like it would perform worse than just iterating over them once.
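The "order, join with a separator, add to a HashSet" idea could be sketched as follows. The TripletKeys name and the choice of the unit-separator character are assumptions; the separator only works if it never occurs inside the strings themselves:

```csharp
using System;
using System.Collections.Generic;

public static class TripletKeys
{
    // Assumed separator: a control character that is not expected in the data.
    private const string Separator = "\u001F";

    // Builds an order-independent key for three strings.
    public static string MakeKey(string a, string b, string c)
    {
        var parts = new[] { a, b, c };
        Array.Sort(parts, StringComparer.Ordinal); // same order for any permutation
        return string.Join(Separator, parts);
    }

    // Returns true when the triplet was not in the set yet.
    public static bool AddIfNew(HashSet<string> seen, string a, string b, string c)
    {
        return seen.Add(MakeKey(a, b, c));
    }
}
```

Calling AddIfNew with "A", "B", "C" and later with "C", "A", "B" adds only one entry, since both produce the same key.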
Here is a simple string wrapper for you:
/// The wrapper for three strings
public class StringTriplet
{
private List<string> Store;
// accessors to three source strings:
public string A { get; private set; }
public string B { get; private set; }
public string C { get; private set; }
// constructor (needed to fill the internal storage)
public StringTriplet(string a, string b, string c)
{
this.Store = new List<string>();
this.Store.Add(a);
this.Store.Add(b);
this.Store.Add(c);
// sort is required, because later we don't want to compare every string with every other one
this.Store.Sort();
this.A = a;
this.B = b;
this.C = c;
}
// additional method. you could add IComparable declaration to the entire class, but it is not necessary in your task...
public int CompareTo(StringTriplet obj)
{
if (null == obj)
return -1;
int cmp;
cmp = this.Store.Count.CompareTo(obj.Store.Count);
if (0 != cmp)
return cmp;
for (int i = 0; i < this.Store.Count; i++)
{
// CompareOrdinal treats two nulls as equal, so a triplet containing null still equals itself
cmp = string.CompareOrdinal(this.Store[i], obj.Store[i]);
if ( 0 != cmp )
return cmp;
}
return 0;
}
// additional method. it is a good practice : override both 'Equals' and 'GetHashCode'. See below..
override public bool Equals(object obj)
{
if (! (obj is StringTriplet))
return false;
var t = obj as StringTriplet;
return ( 0 == this.CompareTo(t));
}
// necessary method . it will be implicitly used on adding values to the HashSet
public override int GetHashCode()
{
int res = 0;
for (int i = 0; i < this.Store.Count; i++)
res = res ^ (null == this.Store[i] ? 0 : this.Store[i].GetHashCode()) ^ i;
return res;
}
}
Now you could just create hashset and add values:
var t = new HashSet<StringTriplet> ();
t.Add (new StringTriplet ("a", "b", "c"));
t.Add (new StringTriplet ("a", "b1", "c"));
t.Add (new StringTriplet ("a", "b", "c")); // dup
t.Add (new StringTriplet ("a", "c", "b")); // dup
t.Add (new StringTriplet ("1", "2", "3"));
t.Add (new StringTriplet ("1", "2", "4"));
t.Add (new StringTriplet ("3", "2", "1"));
foreach (var s in t) {
Console.WriteLine (s.A + " " + s.B + " " + s.C);
}
return 0;
You can inherit from List<String> and override Equals() and GetHashCode() methods:
public class StringList : List<String>
{
public override bool Equals(object obj)
{
StringList other = obj as StringList;
if (other == null) return false;
// the Count check keeps Equals symmetric: {A} must not equal {A, B}
return this.Count == other.Count && this.All(x => other.Contains(x));
}
public override int GetHashCode()
{
unchecked
{
int hash = 19;
foreach (String s in this)
{
hash = hash + s.GetHashCode() * 31;
}
return hash;
}
}
}
Now, you can use HashSet<StringList> to store only unique sets
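If duplicates inside one set are not needed, a different built-in route is the set comparer that HashSet<T> ships with, which avoids subclassing List<String>. This swaps in another technique rather than reproducing the class above; the UniqueSets helper name is an assumption:

```csharp
using System.Collections.Generic;

public static class UniqueSets
{
    // A set of sets in which two inner sets count as equal when they hold the
    // same elements, regardless of insertion order.
    public static HashSet<HashSet<string>> Create()
    {
        return new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
    }
}
```

Adding new HashSet<string> { "A", "B", "C" } and then new HashSet<string> { "C", "B", "A" } leaves a single entry in the outer set.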

LINQ (or something else) to compare a pair of values from two lists (in any order)?

Basically, I have two IEnumerable<FooClass>s where each FooClass instance contains 2 properties: FirstName, LastName.
The instances in the two enumerables are NOT the same. Instead, I need to check against the properties on each of the instances. I'm not sure of the most efficient way to do this, but basically I need to make sure that both lists contain similar data (not the same instance, but the same values on the properties). I don't have access to the FooClass itself to modify it.
I should say that the FooClass is a type of Attribute class, which has access to the Attribute.Match() method, so I don't need to check each property individually.
Based on the comments, I've updated the question to be more specific and changed it slightly... This is what I have so far:
public void Foo()
{
var info = typeof(MyClass);
var attributes = info.GetCustomAttributes(typeof(FooAttribute), false) as IEnumerable<FooAttribute>;
var validateAttributeList = new Collection<FooAttribute>
{
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(float))
};
//Make sure that each item in validateAttributeList is contained in
//the attributes list (additional items in the attributes list don't matter).
//I know I can use the Attribute.Match(obj) to compare.
}
Enumerable.SequenceEqual will tell you if the two sequences are identical.
If FooClass has an overridden Equals method that compares the FirstName and LastName, then you should be able to write:
bool equal = List1.SequenceEqual(List2);
If FooClass doesn't have an overridden Equals method, then you need to create an IEqualityComparer<FooClass>:
class FooComparer: IEqualityComparer<FooClass>
{
public bool Equals(FooClass f1, FooClass f2)
{
return (f1.FirstName == f2.FirstName) && (f1.LastName == f2.LastName);
}
public int GetHashCode(FooClass f)
{
return f.FirstName.GetHashCode() ^ f.LastName.GetHashCode();
}
}
and then you write:
var comparer = new FooComparer();
bool identical = List1.SequenceEqual(List2, comparer);
You can do it this way:
Define a custom IEqualityComparer<FooAttribute> :
class FooAttributeComparer : IEqualityComparer<FooAttribute>
{
public bool Equals(FooAttribute x, FooAttribute y)
{
return x.Match(y);
}
public int GetHashCode(FooAttribute obj)
{
return 0;
// This makes lookups complexity O(n) but it could be reasonable for small lists
// or if you're not sure about GetHashCode() implementation to do.
// If you want more speed you could return e.g. :
// return obj.Field1.GetHashCode() ^ (17 * obj.Field2.GetHashCode());
}
}
Define an extension method to compare lists in any order and having the same number of equal elements:
public static bool ListContentIsEqualInAnyOrder<T>(
this IEnumerable<T> list1, IEnumerable<T> list2, IEqualityComparer<T> comparer)
{
var lookup1 = list1.ToLookup(x => x, comparer);
var lookup2 = list2.ToLookup(x => x, comparer);
if (lookup1.Count != lookup2.Count)
return false;
return lookup1.All(el1 => lookup2.Contains(el1.Key) &&
lookup2[el1.Key].Count() == el1.Count());
}
Usage example:
static void Main(string[] args)
{
List<FooAttribute> attrs = new List<FooAttribute>
{
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(bool), typeof(float)),
new FooAttribute(typeof(uint), typeof(string)),
};
List<FooAttribute> attrs2 = new List<FooAttribute>
{
new FooAttribute(typeof(uint), typeof(string)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(bool), typeof(float)),
};
// this returns true
var listEqual1 = attrs.ListContentIsEqualInAnyOrder(attrs2, new FooAttributeComparer());
// this returns false
attrs2.RemoveAt(1);
var listEqual2 = attrs.ListContentIsEqualInAnyOrder(attrs2, new FooAttributeComparer());
}
Assuming that
The lists both fit in memory and are unsorted
Case doesn't matter
Names don't contain the character "!"
Names do not contain duplicates:
then
var setA = new HashSet<String>(
firstEnumerable.Select(i => i.FirstName.ToUpper() + "!" + i.LastName.ToUpper()));
var setB = new HashSet<String>(
secondEnumerable.Select(i => i.FirstName.ToUpper() + "!" + i.LastName.ToUpper()));
return setA.SetEquals(setB);
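For the narrower check stated in the question's code comment (every item of validateAttributeList must appear in attributes, extra items ignored), a minimal sketch using Attribute.Match could look like this. The FooAttribute below is a hypothetical stand-in; only its constructor shape is taken from the question, and it relies on the default Match, which behaves like Attribute.Equals and compares field values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's attribute class.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class FooAttribute : Attribute
{
    public Type First { get; }
    public Type Second { get; }

    public FooAttribute(Type first, Type second)
    {
        First = first;
        Second = second;
    }
}

public static class AttributeChecks
{
    // True when every expected attribute matches at least one actual attribute.
    public static bool ContainsAll(
        IEnumerable<FooAttribute> actual, IEnumerable<FooAttribute> expected)
    {
        var actualList = actual.ToList(); // enumerate the source only once
        return expected.All(e => actualList.Any(a => a.Match(e)));
    }
}
```

This is quadratic in the list sizes, which should be acceptable for the handful of attributes usually found on a type.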

Equality of two structs in C#

I am looking for a way to test equality between two instances of this struct.
public struct Serie<T>
{
T[] X;
double[] Y;
public Serie(T[] x, double[] y)
{
X = x;
Y = y;
}
public override bool Equals(object obj)
{
return obj is Serie<T> && this == (Serie<T>)obj;
}
public static bool operator ==(Serie<T> s1, Serie<T> s2)
{
return s1.X == s2.X && s1.Y == s2.Y;
}
public static bool operator !=(Serie<T> s1, Serie<T> s2)
{
return !(s1 == s2);
}
}
This doesn't work. What am I missing?
double[] xa = { 2, 3 };
double[] ya = { 1, 2 };
double[] xb = { 2, 3 };
double[] yb = { 1, 2 };
Serie<double> A = new Serie<double>(xa, ya);
Serie<double> B = new Serie<double>(xb, yb);
Assert.AreEqual(A, B);
You're comparing the array references rather than their contents. ya and yb refer to different arrays. If you want to check the contents of the arrays, you'll have to do so explicitly.
I don't think there's anything built into the framework to do that for you, I'm afraid. Something like this should work though:
public static bool ArraysEqual<T>(T[] first, T[] second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (first == null || second == null)
{
return false;
}
if (first.Length != second.Length)
{
return false;
}
IEqualityComparer<T> comparer = EqualityComparer<T>.Default;
for (int i = 0; i < first.Length; i++)
{
if (!comparer.Equals(first[i], second[i]))
{
return false;
}
}
return true;
}
As an aside, your structs are sort of mutable in that the array contents can be changed after the struct is created. Do you really need this to be a struct?
EDIT: As Nick mentioned in the comments, you should really override GetHashCode as well. Again, you'll need to get the contents from the arrays (and again, this will cause problems if the arrays get changed afterwards). Similar utility method:
public static int GetHashCode<T>(T[] array)
{
if (array == null)
{
return 0;
}
IEqualityComparer<T> comparer = EqualityComparer<T>.Default;
int hash = 17;
foreach (T item in array)
{
hash = hash * 31 + comparer.GetHashCode(item);
}
return hash;
}
"I don't think there's anything built into the framework to do that for you, I'm afraid"
In .NET 4.0, there is:
StructuralComparisons.StructuralEqualityComparer.Equals(firstArray, secondArray);
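A small sketch of how that call might be wrapped, including the matching structural hash code. The StructuralArrayEquality class and its method names are made up for this example:

```csharp
using System.Collections;

public static class StructuralArrayEquality
{
    // Compares the arrays element by element instead of by reference.
    public static bool ContentsEqual(double[] first, double[] second)
    {
        return StructuralComparisons.StructuralEqualityComparer.Equals(first, second);
    }

    // Hash code derived from the elements, consistent with ContentsEqual.
    public static int ContentsHash(double[] array)
    {
        return StructuralComparisons.StructuralEqualityComparer.GetHashCode(array);
    }
}
```

The structural comparer works on object, so value-type elements are boxed; for large arrays a typed loop or SequenceEqual may be cheaper.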
You should compare the contents of the Array in your Equality logic ...
Also, it is recommended that you implement IEquatable<T> interface on your struct, as this prevents boxing/unboxing issues in some cases.
http://blogs.msdn.com/jaredpar/archive/2009/01/15/if-you-implement-iequatable-t-you-still-must-override-object-s-equals-and-gethashcode.aspx
The part s1.Y == s2.Y tests if they are 2 references to the same array instance, not if the contents are equal. So despite the title, this question is actually about equality between array(-reference)s.
Some additional advice: Since you are overloading the equality operators, you should design Serie<> as immutable, and because of the embedded arrays I would make it a class instead of a struct.
Using == on arrays performs reference equality; it does not compare the contents of their elements. This basically means that a1 == a2 will only return true if both are the exact same instance, which isn't what you want, I think.
You need to modify your operator == to compare the contents of the x array, not its reference value.
If you're using .NET 3.5 (with LINQ) you can do:
public static bool operator ==(Serie<T> s1, Serie<T> s2)
{
return ((s1.X == null && s2.X == null) || (s1.X != null && s2.X != null && s1.X.SequenceEqual( s2.X )))
&& s1.Y == s2.Y;
}
If you need to do a deep comparison (beyond references), you can supply SequenceEqual with a custom IEqualityComparer for the type T.
You probably should also consider implementing the IEquatable<T> interface for your struct. It will help your code work better with LINQ and other parts of the .NET framework that perform object comparisons.
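Combining the suggestions from the answers (compare array contents, implement IEquatable<T>, override GetHashCode), one possible shape for the struct is sketched below. The private helper names are assumptions, and both arrays are compared by content here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public readonly struct Serie<T> : IEquatable<Serie<T>>
{
    private readonly T[] x;
    private readonly double[] y;

    public Serie(T[] x, double[] y)
    {
        this.x = x;
        this.y = y;
    }

    // Typed Equals avoids boxing when two Serie<T> values are compared.
    public bool Equals(Serie<T> other)
    {
        return ArraysEqual(x, other.x) && ArraysEqual(y, other.y);
    }

    public override bool Equals(object obj) => obj is Serie<T> other && Equals(other);

    public override int GetHashCode()
    {
        unchecked { return HashOf(x) * 31 + HashOf(y); }
    }

    public static bool operator ==(Serie<T> left, Serie<T> right) => left.Equals(right);
    public static bool operator !=(Serie<T> left, Serie<T> right) => !left.Equals(right);

    private static bool ArraysEqual<TItem>(TItem[] first, TItem[] second)
    {
        if (ReferenceEquals(first, second)) return true;   // same array, or both null
        if (first == null || second == null) return false; // exactly one is null
        return first.SequenceEqual(second);                // element-by-element
    }

    private static int HashOf<TItem>(TItem[] array)
    {
        if (array == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (TItem item in array)
                hash = hash * 31 + EqualityComparer<TItem>.Default.GetHashCode(item);
            return hash;
        }
    }
}
```

The arrays can still be modified by whoever holds a reference to them, so the hash code of a stored Serie<T> can change; copying the arrays in the constructor would be one way to guard against that.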
You can create a private accessor for your struct and use CollectionAssert:
[TestMethod()]
public void SerieConstructorTest()
{
double[] xa = { 2, 3 };
double[] ya = { 1, 2 };
double[] xb = { 2, 3 };
double[] yb = { 1, 2 };
var A = new Serie_Accessor<double>(xa, ya);
var B = new Serie_Accessor<double>(xb, yb);
CollectionAssert.AreEqual(A.X, B.X);
CollectionAssert.AreEqual(A.Y, B.Y);
}
This code works fine.
References:
CollectionAssert.AreEqual Method
How to create private accessor

Decorate-Sort-Undecorate, how to sort an alphabetic field in descending order

I've got a large set of data for which computing the sort key is fairly expensive. What I'd like to do is use the DSU pattern where I take the rows and compute a sort key. An example:
Qty Name Supplier
Row 1: 50 Widgets IBM
Row 2: 48 Thingies Dell
Row 3: 99 Googaws IBM
To sort by Quantity and Supplier I could have the sort keys: 0050 IBM, 0048 Dell, 0099 IBM. The numbers are right-aligned and the text is left-aligned, everything is padded as needed.
If I need to sort by the Quantity in descending order I can just subtract the value from a constant (say, 10000) to build the sort keys: 9950 IBM, 9952 Dell, 9901 IBM.
How do I quickly/cheaply build a descending key for the alphabetic fields in C#?
[My data is all 8-bit ASCII w/ISO 8859 extension characters.]
Note: In Perl, this could be done by bit-complementing the strings:
$subkey = $string ^ ( "\xFF" x length $string );
Porting this solution straight into C# doesn't work:
subkey = encoding.GetString(encoding.GetBytes(stringval).
Select(x => (byte)(x ^ 0xff)).ToArray());
I suspect because of the differences in the way that strings are handled in C#/Perl. Maybe Perl is sorting in ASCII order and C# is trying to be smart?
Here's a sample piece of code that tries to accomplish this:
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
List<List<string>> sample = new List<List<string>>() {
new List<string>() { "", "apple", "table" },
new List<string>() { "", "apple", "chair" },
new List<string>() { "", "apple", "davenport" },
new List<string>() { "", "orange", "sofa" },
new List<string>() { "", "peach", "bed" },
};
foreach(List<string> line in sample)
{
StringBuilder sb = new StringBuilder();
string key1 = line[1].PadRight(10, ' ');
string key2 = line[2].PadRight(10, ' ');
// Comment the next line to sort desc, desc
key2 = encoding.GetString(encoding.GetBytes(key2).
Select(x => (byte)(x ^ 0xff)).ToArray());
sb.Append(key2);
sb.Append(key1);
line[0] = sb.ToString();
}
List<List<string>> output = sample.OrderBy(p => p[0]).ToList();
return;
You can get to where you want, although I'll admit I don't know whether there's a better overall way.
The problem you have with the straight translation of the Perl method is that .NET simply will not allow you to be so laissez-faire with encoding. However, if as you say your data is all printable ASCII (i.e. consists of characters with Unicode code points in the range 32..127; note that there is no such thing as '8-bit ASCII'), then you can do this:
key2 = encoding.GetString(encoding.GetBytes(key2).
Select(x => (byte)(32+95-(x-32))).ToArray());
In this expression I have been explicit about what I'm doing:
Take x (which I assume to be in 32..127)
Map the range to 0..95 to make it zero-based
Reverse by subtracting from 95
Add 32 to map back to the printable range
It's not very nice but it does work.
Just write an IComparer that works as a chain of comparers.
In case of equality at a given stage, it should pass evaluation on to the next key part. If the result is less than or greater than, just return it.
You need something like this:
int comparision = 0;
for (int i = 0; i < n; i++)
{
comparision = a[i].CompareTo(b[i]) * comparisionSign[i];
if( comparision != 0 )
return comparision;
}
return comparision;
Or even simpler, you can go with:
list.OrderBy(i=>i.ID).ThenBy(i=>i.Name).ThenByDescending(i=>i.Supplier);
The first call returns an IOrderedEnumerable<>, which can then be sorted by additional fields.
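Applied to the rows from the question, a mixed ascending/descending sort without any precomputed key might look like this sketch. The Row class and the RowSorting helper are assumptions, and StringComparer.Ordinal is chosen so that the text order is plain code-point order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed row shape, based on the Qty / Name / Supplier columns in the question.
public class Row
{
    public int Qty { get; set; }
    public string Name { get; set; }
    public string Supplier { get; set; }
}

public static class RowSorting
{
    // Supplier descending (ordinal), then Qty ascending within each supplier.
    public static List<Row> BySupplierDescThenQty(IEnumerable<Row> rows)
    {
        return rows
            .OrderByDescending(r => r.Supplier, StringComparer.Ordinal)
            .ThenBy(r => r.Qty)
            .ToList();
    }
}
```

LINQ evaluates each key selector once per element before sorting, so an expensive key is not recomputed on every comparison.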
Answering my own question (but not satisfactorily). To construct a descending alphabetic key I used this code and then appended this subkey to the sort key for the object:
if ( reverse )
subkey = encoding.GetString(encoding.GetBytes(subkey)
.Select(x => (byte)(0x80 - x)).ToArray());
rowobj.sortKey.Append(subkey);
Once I had the keys built, I couldn't just do this:
rowobjList.Sort();
Because the default comparator isn't in ASCII order (which my 0x80 - x trick relies on). So then I had to write an IComparable<RowObject> that used the Ordinal sorting:
public int CompareTo(RowObject other)
{
return String.Compare(this.sortKey, other.sortKey,
StringComparison.Ordinal);
}
This seems to work. I'm a little dissatisfied because it feels clunky in C# with the encoding/decoding of the string.
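A variant that skips the encoding round trip is to complement the characters directly and sort the keys ordinally. This is only a sketch under stated assumptions: every character is at most U+00FF (as with the ISO 8859 data described in the question), no data character sorts below the space used for padding, and the DescendingKeys name is made up:

```csharp
using System;

public static class DescendingKeys
{
    // Pads the value to a fixed width and complements each character, so that an
    // ascending ordinal sort of the keys gives a descending sort of the values.
    public static string Invert(string value, int width)
    {
        if (value.Length > width)
            throw new ArgumentException("Value is longer than the key width.", nameof(value));

        char[] chars = value.PadRight(width, ' ').ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (chars[i] > 0xFF)
                throw new ArgumentException("Only characters up to U+00FF are supported.", nameof(value));
            chars[i] = (char)(0xFF - chars[i]); // 0x00 <-> 0xFF, 'a' (0x61) -> 0x9E, ...
        }
        return new string(chars);
    }
}
```

The keys must be compared with an ordinal comparer, for example rows.OrderBy(r => DescendingKeys.Invert(r.Supplier, 10), StringComparer.Ordinal), because a culture-aware comparison does not follow code-point order. The fixed-width padding matters: without it, "ab" would still sort before "abc" after complementing.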
If a key computation is expensive, why compute a key at all? String comparison by itself is not free; it is actually an expensive loop through the characters and is not going to perform any better than a custom comparison loop.
In this test the custom comparison sort performs about 3 times better than DSU.
Note that the DSU key computation is not measured in this test; it is precomputed.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace DSUPatternTest
{
[TestClass]
public class DSUPatternPerformanceTest
{
public class Row
{
public int Qty;
public string Name;
public string Supplier;
public string PrecomputedKey;
public void ComputeKey()
{
// Do not need StringBuilder here, String.Concat does better job internally.
PrecomputedKey =
Qty.ToString().PadLeft(4, '0') + " "
+ Name.PadRight(12, ' ') + " "
+ Supplier.PadRight(12, ' ');
}
public bool Equals(Row other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return other.Qty == Qty && Equals(other.Name, Name) && Equals(other.Supplier, Supplier);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof (Row)) return false;
return Equals((Row) obj);
}
public override int GetHashCode()
{
unchecked
{
int result = Qty;
result = (result*397) ^ (Name != null ? Name.GetHashCode() : 0);
result = (result*397) ^ (Supplier != null ? Supplier.GetHashCode() : 0);
return result;
}
}
}
public class RowComparer : IComparer<Row>
{
public int Compare(Row x, Row y)
{
int comparision;
comparision = x.Qty.CompareTo(y.Qty);
if (comparision != 0) return comparision;
comparision = x.Name.CompareTo(y.Name);
if (comparision != 0) return comparision;
comparision = x.Supplier.CompareTo(y.Supplier);
return comparision;
}
}
[TestMethod]
public void CustomLoopIsFaster()
{
var random = new Random();
var rows = Enumerable.Range(0, 5000).Select(i =>
new Row
{
Qty = (int) (random.NextDouble()*9999),
Name = random.Next().ToString(),
Supplier = random.Next().ToString()
}).ToList();
foreach (var row in rows)
{
row.ComputeKey();
}
var dsuSw = Stopwatch.StartNew();
var sortedByDSU = rows.OrderBy(i => i.PrecomputedKey).ToList();
var dsuTime = dsuSw.ElapsedMilliseconds;
var customSw = Stopwatch.StartNew();
var sortedByCustom = rows.OrderBy(i => i, new RowComparer()).ToList();
var customTime = customSw.ElapsedMilliseconds;
Trace.WriteLine(dsuTime);
Trace.WriteLine(customTime);
CollectionAssert.AreEqual(sortedByDSU, sortedByCustom);
Assert.IsTrue(dsuTime > customTime * 2.5);
}
}
}
If you need to build a sorter dynamically you can use something like this:
var comparerChain = new ComparerChain<Row>()
.By(r => r.Qty, false)
.By(r => r.Name, false)
.By(r => r.Supplier, false);
var sortedByCustom = rows.OrderBy(i => i, comparerChain).ToList();
Here is a sample implementation of comparer chain builder:
public class ComparerChain<T> : IComparer<T>
{
private List<PropComparer<T>> Comparers = new List<PropComparer<T>>();
public int Compare(T x, T y)
{
foreach (var comparer in Comparers)
{
var result = comparer._f(x, y);
if (result != 0)
return result;
}
return 0;
}
public ComparerChain<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
{
Comparers.Add(PropComparer<T>.By(property, descending));
return this;
}
}
public class PropComparer<T>
{
public Func<T, T, int> _f;
public static PropComparer<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
{
Func<T, T, int> ascendingCompare = (a, b) => property(a).CompareTo(property(b));
Func<T, T, int> descendingCompare = (a, b) => property(b).CompareTo(property(a));
return new PropComparer<T>(descending ? descendingCompare : ascendingCompare);
}
public PropComparer(Func<T, T, int> f)
{
_f = f;
}
}
It works a little slower, maybe because of the delegate calls used for property binding.
