Compare a set of three strings with another - c#

I am making a list of unique "set of 3 strings" from some data, in a way that if the 3 strings come together they become a set, and I can only have unique sets in my list.
A,B,C
B,C,D
D,E,F and so on
And I keep adding sets to the list if they do not exist in the list already, so that if I encounter these three strings together {A,B,C} I wont put it in the list again. So I have 2 questions. And the answer to second one actually depends on the answer of the first one.
How to store this set of 3 string, use List or array or concatenate them or anything else? (I may add it to a Dictionary to record their count as well but that's for later)
How to compare a set of 3 strings with another, irrespective of their order, obviously depending on the structure used? I want to know a proper solution to this rather than doing everything naively!
I am using C# by the way.

Either an array or a list is your best bet for storing the data, since as wentimo mentioned in a comment, concatenating them means that you are losing data that you may need. To steal his example, "ab" "cd "ef" concatenated together is the same as "abcd" "e" and "f" concatenated, but shouldn't be treated as equivalent sets.
To compare them, I would order the list alphabetically, then compare each value in order. That takes care of the fact that the order of the values doesn't matter.
A pseudocode example might look like this:
Compare(List<string> a, List<string> b)
{
a.Sort();
b.Sort();
if(a.Length == b.Length)
{
for(int i = 0; i < a.Length; i++)
{
if(a[i] != b[i])
{
return false;
}
}
return true;
}
else
{
return false;
}
}
Update
Now that you stated in a comment that performance is an imporatant consideration since you may have millions of these sets to compare and that you won't have duplicate elements in a set, here is a more optimized version of my code, note that I no longer have to sort the two lists, which will save quite a bit of time in executing this function.
Compare(List<string> a, List<string> b)
{
if(a.Length == b.Length)
{
for(int i = 0; i < a.Length; i++)
{
if(!b.Contains(a[i]))
{
return false;
}
}
return true;
}
else
{
return false;
}
}
DrewJordan's approach of using a hashtable is still probably than my approach, since it just has to sort each set of three and then can do the comparison to your existing sets much faster than my approach can.

Probably the best way is to use a HashSet, if you don't need to have duplicate elements in your sets. It sounds like each set of 3 has 3 unique elements; if that is actually the case, I would combine a HashSet approach with the concatenation that you already worked out, i.e. order the elements, combine with some separator, and then add the concatenated elements to a HashSet which will prevent duplicates from ever occuring in the first place.
If your sets of three could have duplicate elements, then Kevin's approach is what you're going to have to do for each. You might get some better performance from using a list of HashSets for each set of three, but with only three elements the overhead of creating a hash for each element of potentially millions of sets seems like it would perform worse then just iterating over them once.

here is a simple string-wrapper for you:
/// The wrapper for three strings
public class StringTriplet
{
private List<string> Store;
// accessors to three source strings:
public string A { get; private set; }
public string B { get; private set; }
public string C { get; private set; }
// constructor (need to feel internal storage)
public StringTriplet(string a, string b, string c)
{
this.Store = new List<string>();
this.Store.Add(a);
this.Store.Add(b);
this.Store.Add(c);
// sort is reqiured, cause later we don't want to compare all strings each other
this.Store.Sort();
this.A = a;
this.B = b;
this.C = c;
}
// additional method. you could add IComparable declaration to the entire class, but it is not necessary in your task...
public int CompareTo(StringTriplet obj)
{
if (null == obj)
return -1;
int cmp;
cmp = this.Store.Count.CompareTo(obj.Store.Count);
if (0 != cmp)
return cmp;
for (int i = 0; i < this.Store.Count; i++)
{
if (null == this.Store[i])
return 1;
cmp = this.Store[i].CompareTo(obj.Store[i]);
if ( 0 != cmp )
return cmp;
}
return 0;
}
// additional method. it is a good practice : override both 'Equals' and 'GetHashCode'. See below..
override public bool Equals(object obj)
{
if (! (obj is StringTriplet))
return false;
var t = obj as StringTriplet;
return ( 0 == this.CompareTo(t));
}
// necessary method . it will be implicitly used on adding values to the HashSet
public override int GetHashCode()
{
int res = 0;
for (int i = 0; i < this.Store.Count; i++)
res = res ^ (null == this.Store[i] ? 0 : this.Store[i].GetHashCode()) ^ i;
return res;
}
}
Now you could just create hashset and add values:
var t = new HashSet<StringTriplet> ();
t.Add (new StringTriplet ("a", "b", "c"));
t.Add (new StringTriplet ("a", "b1", "c"));
t.Add (new StringTriplet ("a", "b", "c")); // dup
t.Add (new StringTriplet ("a", "c", "b")); // dup
t.Add (new StringTriplet ("1", "2", "3"));
t.Add (new StringTriplet ("1", "2", "4"));
t.Add (new StringTriplet ("3", "2", "1"));
foreach (var s in t) {
Console.WriteLine (s.A + " " + s.B + " " + s.C);
}
return 0;

You can inherit from List<String> and override Equals() and GetHashCode() methods:
public class StringList : List<String>
{
public override bool Equals(object obj)
{
StringList other = obj as StringList;
if (other == null) return false;
return this.All(x => other.Contains(x));
}
public override int GetHashCode()
{
unchecked
{
int hash = 19;
foreach (String s in this)
{
hash = hash + s.GetHashCode() * 31;
}
return hash;
}
}
}
Now, you can use HashSet<StringList> to store only unique sets

Related

Merging two arrays in C# [duplicate]

This question already has answers here:
How do I concatenate two arrays in C#?
(23 answers)
Joining two lists together
(15 answers)
how to sort a string array by alphabet?
(3 answers)
Sort a list alphabetically
(5 answers)
Closed 4 years ago.
In my code I have 2 arrays and I want to merge the both using right sequence. and save value to 3rd array. I tried a lot but could not find perfect solution.
public void Toolchange_T7()
{
for (int T7 = 0; T7 < sModulename_listofsafetysensor.Length; T7++)
{
if (sModulename_listofsafetysensor[T7] != null && sModulename_listofsafetysensor[T7].Contains("IR") && sModulename_listofsafetysensor[T7].Contains("FS"))
{
sElement_toolchanger[iET7] = sModulename_listofsafetysensor[T7];
iET7++;
}
}
for (int T7 = 0; T7 < sDesignation_toolchanger_t7.Length; T7++)
{
if (sDesignation_toolchanger_t7[T7] != null && sDesignation_toolchanger_t7[T7].Contains("IR") && sDesignation_toolchanger_t7[T7].Contains("FW"))
{
sDesignation_toolchanger[iMT7] = sDesignation_toolchanger_t7[T7];
iMT7++;
}
}
}
sElement_toolchanger contains:
++ST010+IR001+FW001
++ST010+IR002+FW001
++ST010+IR006+FW001
sDesignation_toolchanger contains:
++ST010+IR001.FS001
++ST010+IR001.FS002
++ST010+IR002.FS001
++ST010+IR002.FS002
++ST010+IR006.FS001
++ST010+IR006.FS002
My desired output is:
++ST010+IR001+FW001
++ST010+IR001.FS001
++ST010+IR001.FS002
++ST010+IR002+FW001
++ST010+IR002.FS001
++ST010+IR002.FS002
++ST010+IR006+FW001
++ST010+IR006.FS001
++ST010+IR006.FS002
It will be very helpful if some one know perfect solution
using System.Collections;
var mergedAndSorted = list1.Union(list2).OrderBy(o => o);
Simplest would be to:
Convert the arrays to lists:
var List1 = new List<string>(myArray1);
var List2 = new List<string>(myArray2);
Merge the two lists together:
List1.AddRange(List2);
and sort them.
List1.Sort();
According to what you said in the comments, here is a small function that will take one item from the first array, then two from the second array and so on to make a third one.
This code could be improved...
static void Main(string[] args)
{
string[] t1 = new string[] { "a", "b", "c" };
string[] t2 = new string[] { "a1", "a2", "b1", "b2", "c1", "c2" };
List<string> merged = Merge(t1.ToList(), t2.ToList());
foreach (string item in merged)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static List<T> Merge<T>(List<T> first, List<T> second)
{
List<T> ret = new List<T>();
for (int indexFirst = 0, indexSecond = 0;
indexFirst < first.Count && indexSecond < second.Count;
indexFirst++, indexSecond+= 2)
{
ret.Add(first[indexFirst]);
ret.Add(second[indexSecond]);
ret.Add(second[indexSecond + 1]);
}
return ret;
}
An example here
From your given sample Output, it doesn't look to be alphabetically sorted. In that scenario, you would need to use a Custom Comparer along with your OrderBy Clause after taking the Union.
For example, (since your sorting algorithm is unknown, am assuming one here for showing the example)
var first = new[]{"++ST010+IR001+FW001","++ST010+IR002+FW001","++ST010+IR006+FW001"};
var second = new[]{"++ST010+IR001.FS001",
"++ST010+IR001.FS002",
"++ST010+IR002.FS001",
"++ST010+IR002.FS002",
"++ST010+IR006.FS001",
"++ST010+IR006.FS002"};
var customComparer = new CustomComparer();
var result = first.Union(second).OrderBy(x=>x,customComparer);
Where your custom comparer is defined as
public class CustomComparer : IComparer<string>
{
int IComparer<string>.Compare(string x, string y)
{
var items1 = x.ToCharArray();
var items2 = y.ToCharArray();
var temp = items1.Zip(items2,(a,b)=> new{Item1 = a, Item2=b});
var difference = temp.FirstOrDefault(item=>item.Item1!=item.Item2);
if(difference!=null)
{
if(difference.Item1=='.' && difference.Item2=='+')
return 1;
if(difference.Item1=='+' && difference.Item2=='.')
return -1;
}
return string.Compare(x,y);
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
Output
Here's a solution if the two input arrays are already sorted. Since I suspect your comparison function is non-straightforward, I include a separate comparison function (not very efficient, but it will do). The comparison function splits the string into two (the alpha part and the numeric part) and then compares them so that the numeric part is more "important" in the sort.
Note that it doesn't do any sorting - it relies on the inputs being sorted. It just picks the lower valued item from the current position in one of the two arrays for transfer to the output array. The result is O(N).
The code makes a single pass through the two arrays (in one loop). When I run this, the output is AA00, AA01, AB01, AB02, AC02, AA03, AA04. I'm sure there are opportunities to make this code cleaner, I just popped it off. However, it should give you ideas to continue:
public class TwoArrays
{
private string[] _array1 = {"AA01", "AB01", "AB02", "AC02"};
private string[] _array2 = {"AA00", "AA03", "AA04"};
public void TryItOut()
{
var result = ConcatenateSorted(_array1, _array2);
var concatenated = string.Join(", ", result);
}
private int Compare(string a, string b)
{
if (a == b)
{
return 0;
}
string a1 = a.Substring(0, 2);
string a2 = a.Substring(2, 2);
string b1 = b.Substring(0, 2);
string b2 = b.Substring(2, 2);
return string.Compare(a2 + a1, b2 + b1);
}
private string[] ConcatenateSorted(string[] a, string[] b)
{
string[] ret = new string[a.Length + b.Length];
int aIndex = 0, bIndex = 0, retIndex = 0;
while (true) //do this forever, until time to "break" and return
{
if (aIndex >= a.Length && bIndex >= b.Length)
{
return ret;
}
if (aIndex >= a.Length)
{
ret[retIndex++] = b[bIndex++];
continue;
}
if (bIndex >= b.Length)
{
ret[retIndex++] = a[aIndex++];
continue;
}
if (Compare(a[aIndex], b[bIndex]) > 0)
{
ret[retIndex++] = b[bIndex++];
}
else
{
ret[retIndex++] = a[aIndex++];
}
}
}
}
To get it to work with your data, you need to change the input arrays and provide your own comparison function (which you could pass in as a delegate).

algorithm for selecting N random elements from a List<T> in C# [duplicate]

This question already has answers here:
Randomize a List<T>
(28 answers)
Closed 6 years ago.
I need a quick algorithm to select 4 random elements from a generic list. For example, I'd like to get 4 random elements from a List and then based on some calculations if elements found not valid then it should again select next 4 random elements from the list.
You could do it like this
public static class Extensions
{
public static Dictionary<int, T> GetRandomElements<T>(this IList<T> list, int quantity)
{
var result = new Dictionary<int, T>();
if (list == null)
return result;
Random rnd = new Random(DateTime.Now.Millisecond);
for (int i = 0; i < quantity; i++)
{
int idx = rnd.Next(0, list.Count);
result.Add(idx, list[idx]);
}
return result;
}
}
Then use the extension method like this:
List<string> list = new List<string>() { "a", "b", "c", "d", "e", "f", "g", "h" };
Dictionary<int, string> randomElements = list.GetRandomElements(3);
foreach (KeyValuePair<int, string> elem in randomElements)
{
Console.WriteLine($"index in original list: {elem.Key} value: {elem.Value}");
}
something like that:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);
list.Add(5);
int n = 4;
var rand = new Random();
var randomObjects = new List<int>();
for (int i = 0; i<n; i++)
{
var index = rand.Next(list.Count);
randomObjects.Add(list[index]);
}
}
}
You can store indexes in some list to get non-repeated indexes:
List<T> GetRandomElements<T>(List<T> allElements, int randomCount = 4)
{
if (allElements.Count < randomCount)
{
return allElements;
}
List<int> indexes = new List<int>();
// use HashSet if performance is very critical and you need a lot of indexes
//HashSet<int> indexes = new HashSet<int>();
List<T> elements = new List<T>();
Random random = new Random();
while (indexes.Count < randomCount)
{
int index = random.Next(allElements.Count);
if (!indexes.Contains(index))
{
indexes.Add(index);
elements.Add(allElements[index]);
}
}
return elements;
}
Then you can do some calculation and call this method:
void Main(String[] args)
{
do
{
List<int> elements = GetRandomelements(yourElements);
//do some calculations
} while (some condition); // while result is not right
}
Suppose that the length of the List is N. Now suppose that you will put these 4 numbers in another List called out. Then you can loop through the List and the probability of the element you are on being chosen is
(4 - (out.Count)) / (N - currentIndex)
funcion (list)
(
loop i=0 i < 4
index = (int) length(list)*random(0 -> 1)
element[i] = list[index]
return element
)
while(check == false)
(
elements = funcion (list)
Do some calculation which returns check == false /true
)
This is the pseudo code, but i think you should of come up with this yourself.
Hope it helps:)
All the answers up to now have one fundamental flaw; you are asking for an algorithm that will generate a random combination of n elements and this combination, following some logic rules, will be valid or not. If its not, a new combination should be produced. Obviously, this new combination should be one that has never been produced before. All the proposed algorithms do not enforce this. If for example out of 1000000 possible combinations, only one is valid, you might waste a whole lot of resources until that particular unique combination is produced.
So, how to solve this? Well, the answer is simple, create all possible unique solutions, and then simply produce them in a random order. Caveat: I will suppose that the input stream has no repeating elements, if it does, then some combinations will not be unique.
First of all, lets write ourselves a handy immutable stack:
class ImmutableStack<T> : IEnumerable<T>
{
public static readonly ImmutableStack<T> Empty = new ImmutableStack<T>();
private readonly T head;
private readonly ImmutableStack<T> tail;
public int Count { get; }
private ImmutableStack()
{
Count = 0;
}
private ImmutableStack(T head, ImmutableStack<T> tail)
{
this.head = head;
this.tail = tail;
Count = tail.Count + 1;
}
public T Peek()
{
if (this == Empty)
throw new InvalidOperationException("Can not peek a empty stack.");
return head;
}
public ImmutableStack<T> Pop()
{
if (this == Empty)
throw new InvalidOperationException("Can not pop a empty stack.");
return tail;
}
public ImmutableStack<T> Push(T item) => new ImmutableStack<T>(item, this);
public IEnumerator<T> GetEnumerator()
{
var current = this;
while (current != Empty)
{
yield return current.head;
current = current.tail;
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
This will make our life easier while producing all combinations by recursion. Next, let's get the signature of our main method right:
public static IEnumerable<IEnumerable<T>> GetAllPossibleCombinationsInRandomOrder<T>(
IEnumerable<T> data, int combinationLength)
Ok, that looks about right. Now let's implement this thing:
var allCombinations = GetAllPossibleCombinations(data, combinationLength).ToArray();
var rnd = new Random();
var producedIndexes = new HashSet<int>();
while (producedIndexes.Count < allCombinations.Length)
{
while (true)
{
var index = rnd.Next(allCombinations.Length);
if (!producedIndexes.Contains(index))
{
producedIndexes.Add(index);
yield return allCombinations[index];
break;
}
}
}
Ok, all we are doing here is producing random indexees, checking we haven't produced it yet (we use a HashSet<int> for this), and returning the combination at that index.
Simple, now we only need to take care of GetAllPossibleCombinations(data, combinationLength).
Thats easy, we'll use recursion. Our bail out condition is when our current combination is the specified length. Another caveat: I'm omitting argument validation throughout the whole code, things like checking for null or if the specified length is not bigger than the input length, etc. should be taken care of.
Just for the fun, I'll be using some minor C#7 syntax here: nested functions.
public static IEnumerable<IEnumerable<T>> GetAllPossibleCombinations<T>(
IEnumerable<T> stream, int length)
{
return getAllCombinations(stream, ImmutableStack<T>.Empty);
IEnumerable<IEnumerable<T>> getAllCombinations<T>(IEnumerable<T> currentData, ImmutableStack<T> combination)
{
if (combination.Count == length)
yield return combination;
foreach (var d in currentData)
{
var newCombination = combination.Push(d);
foreach (var c in getAllCombinations(currentData.Except(new[] { d }), newCombination))
{
yield return c;
}
}
}
}
And there we go, now we can use this:
var data = "abc";
var random = GetAllPossibleCombinationsInRandomOrder(data, 2);
foreach (var r in random)
{
Console.WriteLine(string.Join("", r));
}
And sure enough, the output is:
bc
cb
ab
ac
ba
ca

Pass string value created in one function as parameter to function

I have a function called ValidColumns that, when finished will contain a string of id values joined by the pipe character "|."
private bool ValidColumns(string[] curGlobalAttr, int LineCount)
{
//test to see if there is more than 1 empty sku mod field in the imported file
string SkuMod = GetValue(curGlobalAttr, (int)GlobalAttrCols.SKU);
if(SkuMod =="")
ids += string.Join("|", OptionId);
}
What I want to do is take the ids string and pass it as a reference into another function to check to see if it contains duplicate values:
protected bool CheckForDuplicates(ref string ids)
{
bool NoDupes = true;
string[] idSet = ids.Split('|');
for (int i = 1; i <= idSet.Length - 1; i++)
{
if (idSet[i].ToString() == idSet[i - 1].ToString()) { NoDupes = false; }
}
return NoDupes;
}
But I am not sure how to do it properly? It seems so easy but I feel like I'm making this out to be much harder than it needs to be.
if (idSet[i].ToString() == idSet[i - 1].ToString())
You're only checking each value against the previous value. That would work if the values were sorted, but an easier method would just be to get a distinct list and check the lengths:
return (idSet.Length == idSet.Distinct().Count());
The second part of D Stanley's answer is what you want
public bool CheckForDuplicates(string value)
{
string[] split = value.Split('|');
return split.Length != split.Distinct().ToArray().Length;
}
You need a nested loop to do this which will look at each item and compare it to another item. This also will eliminate checking it against itself so it does not always return false.
In this we will track what we have already checked and only compare the new items against items after that slowly shrinking the second loop down in size to eliminate duplicate checks.
bool noDupes = true;
int currentItem = 0;
int counter = 0;
string[] idSet = ids.Split('|');
while(currentItem < idSet.Count)
{
counter = currentItem + 1;
while(counter < idSet.Count)
{
if(idSet[currentItem].ToUpper() == idSet[counter].ToUpper())
{
noDupes = false;
return noDupes;
}
counter ++;
}
currentItem ++;
}
return noDupes;
Edit:
It was pointed out an option I posted as an answer would always return false so I removed that option and adjusted this option to be more robust and easier to step through logically.
Hope this helps :)

Case insensitive group on multiple columns

Is there anyway to do a LINQ2SQL query doing something similar to this:
var result = source.GroupBy(a => new { a.Column1, a.Column2 });
or
var result = from s in source
group s by new { s.Column1, s.Column2 } into c
select new { Column1 = c.Key.Column1, Column2 = c.Key.Column2 };
but with ignoring the case of the contents of the grouped columns?
You can pass StringComparer.InvariantCultureIgnoreCase to the GroupBy extension method.
var result = source.GroupBy(a => new { a.Column1, a.Column2 },
StringComparer.InvariantCultureIgnoreCase);
Or you can use ToUpperInvariant on each field as suggested by Hamlet Hakobyan on comment. I recommend ToUpperInvariant or ToUpper rather than ToLower or ToLowerInvariant because it is optimized for programmatic comparison purpose.
I couldn't get NaveenBhat's solution to work, getting a compile error:
The type arguments for method
'System.Linq.Enumerable.GroupBy(System.Collections.Generic.IEnumerable,
System.Func,
System.Collections.Generic.IEqualityComparer)' cannot be
inferred from the usage. Try specifying the type arguments explicitly.
To make it work, I found it easiest and clearest to define a new class to store my key columns (GroupKey), then a separate class that implements IEqualityComparer (KeyComparer). I can then call
var result= source.GroupBy(r => new GroupKey(r), new KeyComparer());
The KeyComparer class does compare the strings with the InvariantCultureIgnoreCase comparer, so kudos to NaveenBhat for pointing me in the right direction.
Simplified versions of my classes:
private class GroupKey
{
public string Column1{ get; set; }
public string Column2{ get; set; }
public GroupKey(SourceObject r) {
this.Column1 = r.Column1;
this.Column2 = r.Column2;
}
}
private class KeyComparer: IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
if (!x.Column1.Equals(y.Column1,StringComparer.InvariantCultureIgnoreCase) return false;
if (!x.Column2.Equals(y.Column2,StringComparer.InvariantCultureIgnoreCase) return false;
return true;
//my actual code is more complex than this, more columns to compare
//and handles null strings, but you get the idea.
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
return 0.GetHashCode() ; // forces calling Equals
//Note, it would be more efficient to do something like
//string hcode = Column1.ToLower() + Column2.ToLower();
//return hcode.GetHashCode();
//but my object is more complex than this simplified example
}
}
I had the same issue grouping by the values of DataRow objects from a Table, but I just used .ToString() on the DataRow object to get past the compiler issue, e.g.
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"].ToString(),
StringComparer.InvariantCultureIgnoreCase)
instead of
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"],
StringComparer.InvariantCultureIgnoreCase)
I've expanded on Bill B's answer to make things a little more dynamic and to avoid hardcoding the column properties in the GroupKey and IQualityComparer<>.
private class GroupKey
{
public List<string> Columns { get; } = new List<string>();
public GroupKey(params string[] columns)
{
foreach (var column in columns)
{
// Using 'ToUpperInvariant()' if user calls Distinct() after
// the grouping, matching strings with a different case will
// be dropped and not duplicated
Columns.Add(column.ToUpperInvariant());
}
}
}
private class KeyComparer : IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
for (var i = 0; i < x.Columns.Count; i++)
{
if (!x.Columns[i].Equals(y.Columns[i], StringComparison.OrdinalIgnoreCase)) return false;
}
return true;
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
var hashcode = obj.Columns[0].GetHashCode();
for (var i = 1; i < obj.Columns.Count; i++)
{
var column = obj.Columns[i];
// *397 is normally generated by ReSharper to create more unique hash values
// So I added it here
// (do keep in mind that multiplying each hash code by the same prime is more prone to hash collisions than using a different prime initially)
hashcode = (hashcode * 397) ^ (column != null ? column.GetHashCode() : 0);
}
return hashcode;
}
}
Usage:
var result = source.GroupBy(r => new GroupKey(r.Column1, r.Column2, r.Column3), new KeyComparer());
This way, you can pass any number of columns into the GroupKey constructor.

Decorate-Sort-Undecorate, how to sort an alphabetic field in descending order

I've got a large set of data for which computing the sort key is fairly expensive. What I'd like to do is use the DSU pattern where I take the rows and compute a sort key. An example:
Qty Name Supplier
Row 1: 50 Widgets IBM
Row 2: 48 Thingies Dell
Row 3: 99 Googaws IBM
To sort by Quantity and Supplier I could have the sort keys: 0050 IBM, 0048 Dell, 0099 IBM. The numbers are right-aligned and the text is left-aligned, everything is padded as needed.
If I need to sort by the Quanty in descending order I can just subtract the value from a constant (say, 10000) to build the sort keys: 9950 IBM, 9952 Dell, 9901 IBM.
How do I quickly/cheaply build a descending key for the alphabetic fields in C#?
[My data is all 8-bit ASCII w/ISO 8859 extension characters.]
Note: In Perl, this could be done by bit-complementing the strings:
$subkey = $string ^ ( "\xFF" x length $string );
Porting this solution straight into C# doesn't work:
subkey = encoding.GetString(encoding.GetBytes(stringval).
Select(x => (byte)(x ^ 0xff)).ToArray());
I suspect because of the differences in the way that strings are handled in C#/Perl. Maybe Perl is sorting in ASCII order and C# is trying to be smart?
Here's a sample piece of code that tries to accomplish this:
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
List<List<string>> sample = new List<List<string>>() {
new List<string>() { "", "apple", "table" },
new List<string>() { "", "apple", "chair" },
new List<string>() { "", "apple", "davenport" },
new List<string>() { "", "orange", "sofa" },
new List<string>() { "", "peach", "bed" },
};
foreach(List<string> line in sample)
{
StringBuilder sb = new StringBuilder();
string key1 = line[1].PadRight(10, ' ');
string key2 = line[2].PadRight(10, ' ');
// Comment the next line to sort desc, desc
key2 = encoding.GetString(encoding.GetBytes(key2).
Select(x => (byte)(x ^ 0xff)).ToArray());
sb.Append(key2);
sb.Append(key1);
line[0] = sb.ToString();
}
List<List<string>> output = sample.OrderBy(p => p[0]).ToList();
return;
You can get to where you want, although I'll admit I don't know whether there's a better overall way.
The problem you have with the straight translation of the Perl method is that .NET simply will not allow you to be so laissez-faire with encoding. However, if as you say your data is all printable ASCII (ie consists of characters with Unicode codepoints in the range 32..127) - note that there is no such thing as '8-bit ASCII' - then you can do this:
key2 = encoding.GetString(encoding.GetBytes(key2).
Select(x => (byte)(32+95-(x-32))).ToArray());
In this expression I have been explicit about what I'm doing:
Take x (which I assume to be in 32..127)
Map the range to 0..95 to make it zero-based
Reverse by subtracting from 95
Add 32 to map back to the printable range
It's not very nice but it does work.
Just write an IComparer that would work as a chain of comparators.
In case of equality on each stage, it should pass eveluation to the next key part. If it's less then, or greater then, just return.
You need something like this:
int comparision = 0;
foreach(i = 0; i < n; i++)
{
comparision = a[i].CompareTo(b[i]) * comparisionSign[i];
if( comparision != 0 )
return comparision;
}
return comparision;
Or even simpler, you can go with:
list.OrderBy(i=>i.ID).ThenBy(i=>i.Name).ThenByDescending(i=>i.Supplier);
The first call return IOrderedEnumerable<>, the which can sort by additional fields.
Answering my own question (but not satisfactorily). To construct a descending alphabetic key I used this code and then appended this subkey to the search key for the object:
if ( reverse )
subkey = encoding.GetString(encoding.GetBytes(subkey)
.Select(x => (byte)(0x80 - x)).ToArray());
rowobj.sortKey.Append(subkey);
Once I had the keys built, I couldn't just do this:
rowobjList.Sort();
Because the default comparator isn't in ASCII order (which my 0x80 - x trick relies on). So then I had to write an IComparable<RowObject> that used the Ordinal sorting:
public int CompareTo(RowObject other)
{
return String.Compare(this.sortKey, other.sortKey,
StringComparison.Ordinal);
}
This seems to work. I'm a little dissatisfied because it feels clunky in C# with the encoding/decoding of the string.
If a key computation is expensive, why compute a key at all? String comparision by itself is not free, it's actually expensive loop through the characters and is not going to perform any better then a custom comparision loop.
In this test custom comparision sort performs about 3 times better then DSU.
Note that DSU key computation is not measured in this test, it's precomputed.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace DSUPatternTest
{
[TestClass]
public class DSUPatternPerformanceTest
{
public class Row
{
public int Qty;
public string Name;
public string Supplier;
public string PrecomputedKey;
public void ComputeKey()
{
// Do not need StringBuilder here, String.Concat does better job internally.
PrecomputedKey =
Qty.ToString().PadLeft(4, '0') + " "
+ Name.PadRight(12, ' ') + " "
+ Supplier.PadRight(12, ' ');
}
public bool Equals(Row other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return other.Qty == Qty && Equals(other.Name, Name) && Equals(other.Supplier, Supplier);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof (Row)) return false;
return Equals((Row) obj);
}
public override int GetHashCode()
{
unchecked
{
int result = Qty;
result = (result*397) ^ (Name != null ? Name.GetHashCode() : 0);
result = (result*397) ^ (Supplier != null ? Supplier.GetHashCode() : 0);
return result;
}
}
}
public class RowComparer : IComparer<Row>
{
public int Compare(Row x, Row y)
{
int comparision;
comparision = x.Qty.CompareTo(y.Qty);
if (comparision != 0) return comparision;
comparision = x.Name.CompareTo(y.Name);
if (comparision != 0) return comparision;
comparision = x.Supplier.CompareTo(y.Supplier);
return comparision;
}
}
[TestMethod]
public void CustomLoopIsFaster()
{
var random = new Random();
var rows = Enumerable.Range(0, 5000).Select(i =>
new Row
{
Qty = (int) (random.NextDouble()*9999),
Name = random.Next().ToString(),
Supplier = random.Next().ToString()
}).ToList();
foreach (var row in rows)
{
row.ComputeKey();
}
var dsuSw = Stopwatch.StartNew();
var sortedByDSU = rows.OrderBy(i => i.PrecomputedKey).ToList();
var dsuTime = dsuSw.ElapsedMilliseconds;
var customSw = Stopwatch.StartNew();
var sortedByCustom = rows.OrderBy(i => i, new RowComparer()).ToList();
var customTime = customSw.ElapsedMilliseconds;
Trace.WriteLine(dsuTime);
Trace.WriteLine(customTime);
CollectionAssert.AreEqual(sortedByDSU, sortedByCustom);
Assert.IsTrue(dsuTime > customTime * 2.5);
}
}
}
If you need to build a sorter dynamically you can use something like this:
var comparerChain = new ComparerChain<Row>()
.By(r => r.Qty, false)
.By(r => r.Name, false)
.By(r => r.Supplier, false);
var sortedByCustom = rows.OrderBy(i => i, comparerChain).ToList();
Here is a sample implementation of comparer chain builder:
public class ComparerChain<T> : IComparer<T>
{
private List<PropComparer<T>> Comparers = new List<PropComparer<T>>();
public int Compare(T x, T y)
{
foreach (var comparer in Comparers)
{
var result = comparer._f(x, y);
if (result != 0)
return result;
}
return 0;
}
public ComparerChain<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
{
Comparers.Add(PropComparer<T>.By(property, descending));
return this;
}
}
public class PropComparer<T>
{
public Func<T, T, int> _f;
public static PropComparer<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
{
Func<T, T, int> ascendingCompare = (a, b) => property(a).CompareTo(property(b));
Func<T, T, int> descendingCompare = (a, b) => property(b).CompareTo(property(a));
return new PropComparer<T>(descending ? descendingCompare : ascendingCompare);
}
public PropComparer(Func<T, T, int> f)
{
_f = f;
}
}
It works a little bit slower, maybe because of property binging delegate calls.

Categories