Specific sort with Comparer by string contains and alphabetically - c#

I want to write a Comparer that returns a list sorted by string containment and in alphabetical order.
This is my List:
AA
AA MEN
ADIDAS
ADIDAS MEN
2KC
ISANA MEN
Here is my comparer:
public class MyComparer : IComparer<string>
{
    int IComparer<string>.Compare(string x, string y)
    {
        if (x == y)
        {
            return 0;
        }
        if (x.Contains(y))
        {
            return -1;
        }
        else
        {
            return 1;
        }
    }
}
This comparer sorts my list like this:
2KC
ADIDAS MEN
AA MEN
AA
ISANA MEN
ADIDAS
That part is correct, because I want to check the "longest" brands first (this avoids errors when finding a brand in a product name), but I also want the result sorted alphabetically. So my list should look like this:
2KC
ADIDAS MEN
AA MEN
AA
ADIDAS
ISANA MEN
I tried string.Sort(), but it does not work correctly.

You have two problems. The first is that your comparer violates the rules for IComparer.Compare. The rules say that if A.CompareTo(B) returns 1, then B.CompareTo(A) must return -1. Your comparer doesn't do that. For example, this code:
string a = "abc";
string b = "def";
// Compare is implemented explicitly, so it must be called through the interface.
IComparer<string> comp = new MyComparer();
int rsltab = comp.Compare(a, b); // returns 1
int rsltba = comp.Compare(b, a); // returns 1
That's plainly wrong.
Whatever else you do in your comparer, you must ensure that:
If Compare(x, y) returns -1, then Compare(y, x) returns 1.
If Compare(y, x) returns -1, then Compare(x, y) returns 1.
If Compare(x, y) returns 0, then Compare(y, x) returns 0.
See the Notes to Implementers section of the documentation for IComparable.CompareTo for more information about the rules for comparisons. I'm kind of surprised that this information isn't repeated in the documentation for IComparer<T>.
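One way to catch this kind of mistake early is to test a comparer against a handful of sample values. The following is only a sketch: ComparerChecks and IsAntisymmetric are hypothetical names, not part of the framework, and the check covers only the sign rule described above (not transitivity).

```csharp
using System;
using System.Collections.Generic;

public static class ComparerChecks
{
    // Hypothetical helper: returns true when, for every pair of samples,
    // Compare(a, b) and Compare(b, a) have opposite signs (or are both 0).
    public static bool IsAntisymmetric<T>(IComparer<T> comparer, IList<T> samples)
    {
        foreach (T a in samples)
        {
            foreach (T b in samples)
            {
                int ab = Math.Sign(comparer.Compare(a, b));
                int ba = Math.Sign(comparer.Compare(b, a));
                if (ab != -ba)
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Running a check like this over "abc" and "def" with the comparer from the question would report the violation, because both calls return 1.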
The other problem is that I don't think you really want to test to see if x contains y. Maybe you want to check to see if x starts with y. That is:
if (x.StartsWith(y)) return -1;
But that, too, could give you some strange results. For example, what if one company's name is just a prefix of the first word in another's? For example:
Art
Artificial Flavor Company
Do you really want Artificial Flavor Company to sort before Art, even though the companies aren't related?
I suspect that what you really want is to parse the string and see if the first word of the longer string is equal to the shorter string. So given "ADIDAS" and "ADIDAS MEN", you would parse the longer string to get the first word, and compare that against the shorter string. That's not a perfect solution, but it will be much more effective than what you currently have.
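A minimal sketch of that idea follows. It rests on assumptions that are not in the question: brands are grouped by their first word (split on a space), groups are ordered with an ordinal comparison, and within a group the longer name comes first. BrandComparer and FirstWord are hypothetical names. Note that this gives a consistent ordering, but not exactly the list shown in the question, which is not alphabetical across groups.

```csharp
using System;
using System.Collections.Generic;

public class BrandComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // 1. Order groups alphabetically by first word (ordinal comparison).
        int byFirstWord = string.CompareOrdinal(FirstWord(x), FirstWord(y));
        if (byFirstWord != 0) return byFirstWord;

        // 2. Same first word: the longer name sorts first (note y before x).
        int byLength = y.Length.CompareTo(x.Length);
        if (byLength != 0) return byLength;

        // 3. Same length: fall back to an ordinal comparison of the whole string.
        return string.CompareOrdinal(x, y);
    }

    // Assumption: the "brand" is everything before the first space.
    private static string FirstWord(string s)
    {
        int space = s.IndexOf(' ');
        return space < 0 ? s : s.Substring(0, space);
    }
}
```

With this comparer, "ADIDAS MEN" sorts before "ADIDAS" and "AA MEN" before "AA", while "Art" and "Artificial Flavor Company" are treated as unrelated brands. Every branch returns opposite signs when the arguments are swapped, so the comparer obeys the rules above.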


How to combine items in List<string> to make new items efficiently

I have a case where I have the name of an object, and a bunch of file names. I need to match the correct file name with the object. The file name can contain numbers and words, separated by either hyphen(-) or underscore(_). I have no control of either file name or object name. For example:
10-11-12_001_002_003_13001_13002_this_is_an_example.svg
The object name in this case is just a string representing a number:
10001
I need to return true or false if the file name is a match for the object name. The different segments of the file name can match on their own, or any combination of two segments. In the example above, it should be true for the following cases (not every true case, just examples):
10001
10002
10003
11001
11002
11003
12001
12002
12003
13001
13002
And, we should return false for this case (among others):
13003
What I've come up with so far is this:
public bool IsMatch(string filename, string objectname)
{
    var namesegments = GetNameSegments(filename);
    var match = namesegments.Contains(objectname);
    return match;
}
public static List<string> GetNameSegments(string filename)
{
    var segments = filename.Split('_', '-').ToList();
    var newSegments = new List<string>();
    foreach (var segment in segments)
    {
        foreach (var segment2 in segments)
        {
            if (segment == segment2)
                continue;
            var newToken = segment + segment2;
            newSegments.Add(newToken);
        }
    }
    return segments.Concat(newSegments).ToList();
}
One or two segments combined can make a match, and that is enough. Three or more segments combined should not be considered.
This does work so far, but is there a better way to do it, perhaps without nesting foreach loops?
First: don't change debugged, working, sufficiently efficient code for no reason. Your solution looks good.
However, we can make some improvements to your solution.
public static List<string> GetNameSegments(string filename)
Making the output a list puts restrictions on the implementation that are not required by the caller. It should be IEnumerable<String>. Particularly since the caller in this case only cares about the first match.
var segments = filename.Split('_', '-').ToList();
Why ToList? A list is array-backed. You've already got an array in hand. Just use the array.
Since there is no longer a need to build up a list, we can transform your two-loop solution into an iterator block:
public static IEnumerable<string> GetNameSegments(string filename)
{
    var segments = filename.Split('_', '-');
    foreach (var segment in segments)
        yield return segment;
    foreach (var s1 in segments)
        foreach (var s2 in segments)
            if (s1 != s2)
                yield return s1 + s2;
}
Much nicer. Alternatively we could notice that this has the structure of a query and simply return the query:
public static IEnumerable<string> GetNameSegments(string filename)
{
    var q1 = filename.Split('_', '-');
    var q2 = from s1 in q1
             from s2 in q1
             where s1 != s2
             select s1 + s2;
    return q1.Concat(q2);
}
Again, much nicer in this form.
Now let's talk about efficiency. As is often the case, we can achieve greater efficiency at a cost of increased complication. This code looks like it should be plenty fast enough. Your example has nine segments. Let's suppose that nine or ten is typical. Our solutions thus far consider the ten or so singletons first, and then the hundred or so combinations. That's nothing; this code is probably fine. But what if we had thousands of segments and were considering millions of possibilities?
In that case we should restructure the algorithm. One possibility would be this general solution:
public bool IsMatch(HashSet<string> segments, string name)
{
    if (segments.Contains(name))
        return true;
    var q = from s1 in segments
            where name.StartsWith(s1)
            let s2 = name.Substring(s1.Length)
            where s1 != s2
            where segments.Contains(s2)
            select 1; // Dummy. All we care about is if there is one.
    return q.Any();
}
Your original solution is quadratic in the number of segments. This one is linear; we rely on the constant order contains operation. (This assumes of course that string operations are constant time because strings are short. If that's not true then we have a whole other kettle of fish to fry.)
How else could we extract wins in the asymptotic case?
If we happened to have the property that the collection was not a hash set but rather a sorted list then we could do even better; we could binary search the list to find the start and end of the range of possible prefix matches, and then pour the list into a hashset to do the suffix matches. That's still linear, but could have a smaller constant factor.
If we happened to know that the target string was small compared to the number of segments, we could attack the problem from the other end. Generate all possible combinations of partitions of the target string and check if both halves are in the segment set. The problem with this solution is that it is quadratic in memory usage in the size of the string. So what we'd want to do there is construct a special hash on character sequences and use that to populate the hash table, rather than the standard string hash. I'm sure you can see how the solution would go from there; I shan't spell out the details.
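The "attack it from the other end" idea can be sketched as follows. This is only the straightforward version: it allocates two substrings per split point, which is exactly the memory cost mentioned above, and it does not include the special character-sequence hash. SegmentMatcher and IsMatchBySplitting are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentMatcher
{
    // Hypothetical helper: true if name is a single segment, or the
    // concatenation of two different segments from the set.
    public static bool IsMatchBySplitting(HashSet<string> segments, string name)
    {
        if (segments.Contains(name))
            return true;

        // Try every way of cutting the name into a non-empty prefix and suffix.
        for (int cut = 1; cut < name.Length; cut++)
        {
            string prefix = name.Substring(0, cut);
            string suffix = name.Substring(cut);
            if (prefix != suffix && segments.Contains(prefix) && segments.Contains(suffix))
                return true;
        }
        return false;
    }
}
```

The loop runs once per character of the target string, so the work depends on the length of the name rather than on the number of segments.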
Efficiency is very much dependent on the business problem that you're attempting to solve. Without knowing the full context/usage it's difficult to define the most efficient solution. What works for one situation won't always work for others.
I would always advocate to write working code and then solve any performance issues later down the line (or throw more tin at the problem as it's usually cheaper!) If you're having specific performance issues then please do tell us more...
I'm going to go out on a limb here and say (hope) that you're only going to be matching the filename against the object name once per execution. If that's the case I reckon this approach will be just about the fastest. In a circumstance where you're matching a single filename against multiple object names then the obvious choice is to build up an index of sorts and match against that as you were already doing, although I'd consider different types of collection depending on your expected execution/usage.
public static bool IsMatch(string filename, string objectName)
{
    var segments = filename.Split('-', '_');
    for (int i = 0; i < segments.Length; i++)
    {
        if (string.Equals(segments[i], objectName)) return true;
        for (int ii = 0; ii < segments.Length; ii++)
        {
            if (ii == i) continue;
            if (string.Equals($"{segments[i]}{segments[ii]}", objectName)) return true;
        }
    }
    return false;
}
If you are willing to use the MoreLINQ NuGet package then this may be worth considering:
public static HashSet<string> GetNameSegments(string filename)
{
    var segments = filename.Split(new char[] {'_', '-'}, StringSplitOptions.RemoveEmptyEntries).ToList();
    var matches = segments
        .Cartesian(segments, (x, y) => x == y ? null : x + y)
        .Where(z => z != null)
        .Concat(segments);
    return new HashSet<string>(matches);
}
StringSplitOptions.RemoveEmptyEntries handles adjacent separators (e.g. --). Cartesian is roughly equivalent to your existing nested foreach loops. The Where is to remove null entries (i.e. if x == y). Concat is the same as your existing Concat. The use of HashSet allows for your Contains calls (in IsMatch) to be faster.

Custom string Comparison in C#

I want to implement a custom string IComparer in C# and apply it to a ComboBox.
Actual Results
If I set the ComboBox's Sorted property to true, the output is :
A
AA
AAA
B
BB
BBB
Wanted Results
The wanted behavior of the sorting algorithm is the following (financial developers will understand why :) ) :
AAA
AA
A
BBB
BB
B
Question
Is it possible to do it? Are sorting algorithms needed here?
PS: I don't need a complete answer with code, I just need an idea of how it might be done.
EDIT
This is about credit ratings. I've omitted something in my question. The ratings have to be sorted in this order :
XXX
XX+
XX
XX-
X+
X
X-
with X in ('A','B','C') and 'A' > 'B' > 'C'
Here's a mostly implemented version:
using System.Collections.Generic;
using System.Linq; // for Zip

public class MyComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        //todo null checks on input
        var pairs = x.Zip(y, (a, b) => new { x = a, y = b });
        foreach (var pair in pairs)
        {
            int value = pair.x.CompareTo(pair.y);
            if (value != 0)
                return value;
        }
        //if we got here then either they are the same,
        //or one starts with the other
        return y.Length.CompareTo(x.Length); //note x and y are reversed here
    }
}
So this uses Zip to get the pairs of chars from each corresponding string until one ends, returning the appropriate value if they aren't equal. If it makes it past that, then one string starts with the other. For a traditional string comparison we'd just compare the lengths in the same order as the input parameters. Since we're essentially reversing the order based on length, note that the x and y are swapped on the last line. That reverses the comparison logic.
Assuming this is for credit ratings, normally this is done by having a "sort order" column on the CreditRating class that you could use to sort the list before assigning it as the data source of the drop-down.
But, a quick workaround (based on the limited possible values) would be to sort by the first letter ascending, then by the length of the string descending:
if (left[0] != right[0])
    return left[0].CompareTo(right[0]);
else
    return right.Length - left.Length;
Another workaround if you want more control over the order is to create a list of possible values in the "right" order and then use that to sort the list:
public class MyComparer : IComparer<string>
{
    // Ratings listed from lowest to highest.
    private static readonly string[] Ratings = new [] {
        "C","CC","CCC-","CCC","CCC+",
        "B-","B","B+","BB-","BB","BB+","BBB-","BBB","BBB+",
        "A-","A","A+","AA-","AA","AA+","AAA"};
    // reverse the order so that any strings not found will be put at the end.
    public int Compare(string left, string right)
    {
        return Array.IndexOf(Ratings, right).CompareTo(Array.IndexOf(Ratings, left));
    }
}
Write the IComparer so that it takes strings but compares per character,
if A[0] == B[0] go to the next character.
if B[1] == null or A[1] < B[1], return A < B.
if A[1] == null or B[1] < A[1], return B < A.
if equal...continue as needed
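The ordering from the EDIT in the question (XXX, XX+, XX, XX-, X+, X, X-) can also be expressed as three successive comparisons. The sketch below assumes every rating is a non-null, non-empty string made of one repeated letter with an optional trailing '+' or '-'; RatingComparer, LetterCount and ModifierRank are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class RatingComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // 1. Letter ascending: 'A' ratings before 'B' before 'C'.
        int byLetter = x[0].CompareTo(y[0]);
        if (byLetter != 0) return byLetter;

        // 2. More letters first: AAA before AA before A (note y before x).
        int byCount = LetterCount(y).CompareTo(LetterCount(x));
        if (byCount != 0) return byCount;

        // 3. Modifier: '+' first, then no modifier, then '-'.
        return ModifierRank(x).CompareTo(ModifierRank(y));
    }

    // Number of characters once a trailing '+' or '-' is removed.
    private static int LetterCount(string rating)
    {
        return rating.TrimEnd('+', '-').Length;
    }

    private static int ModifierRank(string rating)
    {
        char last = rating[rating.Length - 1];
        if (last == '+') return 0;
        if (last == '-') return 2;
        return 1;
    }
}
```

Unlike the lookup-table approach, this does not need to list every rating up front, but it also will not reject strings that are not valid ratings.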

Listview item sort

I have several columns in a ListView, but I can't come up with a logical sorting method that sorts items both alphabetically and numerically. For numerical values,
I'd like a column's content such as:
111
13
442
23
214
to be:
13
23
111
214
442
My current sorting class looks like this:
class ItemSorter : IComparer
{
    public int Compare(object a, object b)
    {
        return string.Compare(((ListViewItem)a).Text, ((ListViewItem)b).Text);
    }
}
Parse your strings to numbers before doing the comparison, in which case you can simply return the difference of the two numbers as the result of the Compare method (provided the subtraction cannot overflow).
As it sounds like you still want to sort both alphabetical and numerical values, this would have to be a combined, hybrid approach with the above - such that numbers are sorted against numbers, and alphabetical values with alphabetical. You'd just need to choose which takes precedence, such that either numerical or alphabetical values always come first - necessary to maintain a stable and reflexive sort. (For example, if a is a number, and b is a non-number, return 1. If a is a non-number, and b is a number, return -1. Else, they must be of equal types, and then you can defer to the type-specific sorting.)
As ziesemer said, numbers and non-numbers need to be handled separately. Here is some sample code; I hope it helps.
class ItemSorter : IComparer
{
    public int Compare(object a, object b)
    {
        string textA = ((ListViewItem)a).Text;
        string textB = ((ListViewItem)b).Text;
        int resultA, resultB;
        bool markA = int.TryParse(textA, out resultA);
        bool markB = int.TryParse(textB, out resultB);
        // Both are numbers: compare them numerically.
        if (markA && markB)
        {
            return resultA.CompareTo(resultB);
        }
        // a can be converted to a number,
        // b can't.
        if (markA && !markB)
        {
            return 1;
        }
        // b can be converted to a number,
        // a can't.
        if (!markA && markB)
        {
            return -1;
        }
        // Neither is a number: fall back to a string comparison.
        return string.Compare(textA, textB);
    }
}

c# - BinarySearch StringList with wildcard

I have a sorted StringList and wanted to replace
foreach (string line3 in CardBase.cardList)
    if (line3.ToLower().IndexOf((cardName + Config.EditionShortToLong(edition)).ToLower()) >= 0)
    {
        return true;
    }
with a binary search, since the cardList is rather large (~18k) and this search takes up around 80% of the time.
So I found the List.BinarySearch method, but my problem is that the lines in the cardList look like this:
Brindle_Boar_(Magic_2012).c1p247924.prod
But I have no way to generate the c1p... part, which is a problem because List.BinarySearch only finds exact matches.
How do I modify List.BinarySearch so that it finds a match if only a part of the string matches?
e. g.
searching for Brindle_Boar_(Magic_2012) should return the position of Brindle_Boar_(Magic_2012).c1p247924.prod
List.BinarySearch will return the bitwise complement of the index of the next item larger than the request if an exact match is not found.
So, you can do it like this (assuming you'll never get an exact match):
var key = (cardName + Config.EditionShortToLong(edition)).ToLower();
var list = CardBase.cardList;
var index = ~list.BinarySearch(key);
return index != list.Count && list[index].StartsWith(key);
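A slightly more defensive variant is sketched below. It assumes the list was sorted with StringComparer.OrdinalIgnoreCase (the search must use the same comparer as the sort, or the result is meaningless), and it also handles the case where an exact match is found. PrefixSearch and IndexOfPrefix are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Returns the index of an entry that starts with prefix, or -1 if none does.
    // Assumption: sortedList was sorted with StringComparer.OrdinalIgnoreCase.
    public static int IndexOfPrefix(List<string> sortedList, string prefix)
    {
        int index = sortedList.BinarySearch(prefix, StringComparer.OrdinalIgnoreCase);
        if (index >= 0)
            return index; // exact match

        // No exact match: the complement is the index of the next larger item,
        // which is where any entry starting with prefix would have to be.
        index = ~index;
        if (index < sortedList.Count &&
            sortedList[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return index;
        }
        return -1;
    }
}
```

Using a case-insensitive ordinal comparer avoids the ToLower() calls and keeps the prefix test consistent with the sort order.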
BinarySearch() has an overload that takes an IComparer<T> as its second parameter. Implement a custom comparer and return 0 when you have a match within the string; you can use the same IndexOf() method there.
Edit:
Does a binary search make sense in your scenario? How do you determine that a certain item is "less" or "greater" than another item? Right now you only provide what would constitute a match. Only if you can answer this question, binary search applies in the first place.
You can take a look at the C5 Generic Collection Library (you can install it via NuGet also).
Use the SortedArray(T) type for your collection. It provides a handful of methods that could prove useful. You can even query for ranges of items very efficiently.
var data = new SortedArray<string>();
// query for the first string greater than or equal to "Brindle_Boar_(Magic_2012)" and check if it starts
// with "Brindle_Boar_(Magic_2012)"
var a = data.RangeFrom("Brindle_Boar_(Magic_2012)").FirstOrDefault();
bool found = a != null && a.StartsWith("Brindle_Boar_(Magic_2012)");
// query for the first 5 items that start with "Brindle_Boar"
var b = data.RangeFrom("Brindle_Boar").Where(s => s.StartsWith("Brindle_Boar")).Take(5);
// query for all items that start with "Brindle_Boar" (provided only ASCII chars)
var c = data.RangeFromTo("Brindle_Boar", "Brindle_Boar~").ToList();
// query for all items that start with "Brindle_Boar", iterates until first non-match
var d = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar"));
The RangeFrom... methods perform a binary search to find the first element greater than or equal to your argument, and return an iterator from that position.

IEnumerable<IEnumerable<int>> - no duplicate IEnumerable<int>s

I'm trying to find a solution to this problem:
Given an IEnumerable<IEnumerable<int>>, I need a method/algorithm that returns the input, but when several IEnumerable<int> have the same elements, only one per group is returned.
ex.
IEnumerable<IEnumerable<int>> seqs = new[]
{
new[]{2,3,4}, // #0
new[]{1,2,4}, // #1 - equals #3
new[]{3,1,4}, // #2
new[]{4,1,2} // #3 - equals #1
};
"foreach seq in seqs" .. yields {#0,#1,#2} or {#0,#2,#3}
Should I go with ..
.. some clever IEqualityComparer
.. some clever LINQ combination I haven't figured out - groupby, sequenceequal ..?
.. some seq->HashSet stuff
.. what not. Anything will help
I'll be able to solve it with good old-fashioned programming, but inspiration is always appreciated.
Here's a slightly simpler version of digEmAll's answer:
var result = seqs.Select(x => new HashSet<int>(x))
.Distinct(HashSet<int>.CreateSetComparer());
Given that you want to treat the elements as sets, you should have them that way to start with, IMO.
Of course this won't help if you want to maintain order within the sequences that are returned and you just don't mind which of the equal sets is returned... the above code will return an IEnumerable<HashSet<int>> which will no longer have any ordering within each sequence. (The order in which the sets are returned isn't guaranteed either, although it would be odd for them not to be returned on a first-seen-first-returned basis.)
It feels unlikely that this wouldn't be enough, but if you could give more details of what you really need to achieve, that would make it easier to help.
As noted in comments, this will also assume that there are no duplicates within each original source array... or at least, that they're irrelevant, so you're happy to treat { 1 } and { 1, 1, 1, 1 } as equal.
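As a self-contained sketch, the set-based approach can be wrapped in a small helper and run against the sample from the question. DistinctSetsDemo and DistinctSets are hypothetical names; the only framework calls used are the HashSet<int> constructor, HashSet<int>.CreateSetComparer() and LINQ's Select, Distinct and ToList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSetsDemo
{
    // Converts each inner sequence to a set and keeps one set per group
    // of sequences that contain the same elements.
    public static List<HashSet<int>> DistinctSets(IEnumerable<IEnumerable<int>> seqs)
    {
        return seqs.Select(x => new HashSet<int>(x))
                   .Distinct(HashSet<int>.CreateSetComparer())
                   .ToList();
    }
}
```

For the four sequences in the question this leaves three sets, because {1,2,4} and {4,1,2} collapse into one. Likewise { 1 } and { 1, 1, 1, 1 } collapse into a single set, which is the duplicate-handling caveat mentioned above.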
Use the correct collection type for the job. What you really want is ISet<IEnumerable<int>> with an equality comparer that will ignore the ordering of the IEnumerables.
EDITED:
You can get what you want by building your own IEqualityComparer<IEnumerable<int>> e.g.:
public class MyEqualityComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(el1 => el1).SequenceEqual(y.OrderBy(el2 => el2));
    }
    public int GetHashCode(IEnumerable<int> elements)
    {
        int hash = 0;
        foreach (var el in elements)
        {
            hash = hash ^ el.GetHashCode();
        }
        return hash;
    }
}
Usage:
var values = seqs.Distinct(new MyEqualityComparer()).ToList();
N.B.
this solution is slightly different from the one given by Jon Skeet.
His answer considers sublists as sets, so basically two lists like [1,2] and [1,1,1,2,2] are equal.
This solution doesn't, i.e.:
[1,2,1,1] is equal to [2,1,1,1] but not to [2,2,1,1], hence basically the two lists have to contain the same elements and in the same number of occurrences.
