Replacing CompareTo with LINQ for array elements - c#

I am working on a project which uses posts to represent a fence. Each fence has exactly two posts that implement IComparable and are ordered in each fence. To override CompareTo on Fence, I need to compare post 0 between this and the other fence; if that result is 0, then I need to compare post 1 between this and the other fence. I wrote a simple for loop to perform this logic, which I've included below. However, ReSharper is giving me a warning that I should replace the for loop with LINQ. Is there an easy way to replace the for loop with LINQ?
public int CompareTo(Fence other)
{
    for (int i = 0; i < Posts.Length; i++)
    {
        int c = Posts[i].CompareTo(other.Posts[i]);
        if (c != 0)
            return c;
    }
    return 0;
}

Since a Fence has exactly two Posts, this can be reduced to:
public int CompareTo(Fence other)
{
    int c = Posts[0].CompareTo(other.Posts[0]);
    if (c == 0)
        c = Posts[1].CompareTo(other.Posts[1]);
    return c;
}
Note that you can (and probably should) replace the Posts array with two members, Post0 and Post1.
Note that this could give you a completely different ordering than:
int c = Posts[1].CompareTo(other.Posts[1]);
if (c == 0)
    c = Posts[0].CompareTo(other.Posts[0]);
which, presumably, is just as valid. (I.e., if this fence's Posts[0] is less than the other's, but its Posts[1] is greater, is the Fence greater or less than the other?)

If ReSharper suggests it, you can simply press Alt+Enter, then Enter, and see what happens. I guess it will produce something like:
public int CompareTo(Fence other)
{
    return Posts.Select((p, i) => p.CompareTo(other.Posts[i]))
                .FirstOrDefault(c => c != 0);
}
This projects each Post to its comparison result against the respective Post of the other Fence (p is the Post loop variable, i is the index). FirstOrDefault looks for the first non-zero comparison result, or returns 0 if all results are 0.
So this does exactly what your loop does (note that LINQ uses deferred execution, so once the first non-zero comparison occurs, no further Posts are compared).
Note that this code is error-prone, as juharr commented: you should first null-check other and check that the two Posts arrays have the same length.
(I assume your classes' implementations ensure that Posts is not null and that the arrays don't contain null elements.)
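The checks mentioned above could be folded in roughly as follows. This is only a sketch: the Post class and its Position property are stand-ins invented for illustration, and throwing on a length mismatch is one possible policy, not something the question specifies. Zip is used here instead of the indexed Select so that the two arrays are paired positionally.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the question's Post type; only IComparable<Post> is assumed.
public sealed class Post : IComparable<Post>
{
    public int Position { get; }
    public Post(int position) { Position = position; }

    public int CompareTo(Post other) =>
        other is null ? 1 : Position.CompareTo(other.Position);
}

public sealed class Fence : IComparable<Fence>
{
    public Post[] Posts { get; }
    public Fence(Post first, Post second) { Posts = new[] { first, second }; }

    public int CompareTo(Fence other)
    {
        // By convention, any instance sorts after null.
        if (other is null) return 1;

        // Assumption: fences with different post counts are a programming error.
        if (Posts.Length != other.Posts.Length)
            throw new ArgumentException("Fences must have the same number of posts.", nameof(other));

        // Zip pairs the posts positionally; FirstOrDefault returns the first
        // non-zero comparison, or 0 (default(int)) when every pair compares equal.
        return Posts.Zip(other.Posts, (mine, theirs) => mine.CompareTo(theirs))
                    .FirstOrDefault(c => c != 0);
    }
}
```

Because Zip is lazy, the comparison still stops at the first pair of posts that differ.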


merging k sorted lists does not work with negative values C#

The question for the problem is:
You are given an array of k linked-lists lists, each linked-list is sorted in ascending order.
Merge all the linked-lists into one sorted linked-list and return it.
A successful working example with inputs and outputs is:
Input: lists = [[1,4,5],[1,3,4],[2,6]]
Output: [1,1,2,3,4,4,5,6]
Explanation: The linked-lists are:
[
1->4->5,
1->3->4,
2->6
]
merging them into one sorted list:
1->1->2->3->4->4->5->6
The issue with my code is that it does not work for negative values; however, it works fine for positive values.
E.g. input -> [[2],[1],[-1]], output -> [1,2]
public class Solution {
    public ListNode MergeKLists(ListNode[] lists) {
        if (lists.Length == 0) return null;
        var newlist = new ListNode();
        var result = newlist;
        for (int i = 0; i < lists.Length; i++)
        {
            if (lists[i] != null)
                newlist = MergeTwoLists(lists[i], newlist);
        }
        return result.next;
    }
    private ListNode MergeTwoLists(ListNode l1, ListNode l2) {
        if (l1 == null)
            return l2;
        if (l2 == null)
            return l1;
        if (l1.val <= l2.val)
        {
            l1.next = MergeTwoLists(l1.next, l2);
            return l1;
        }
        else
        {
            l2.next = MergeTwoLists(l1, l2.next);
            return l2;
        }
    }
}
Fundamentally, the problem with your code is that you initialize the algorithm with a non-empty list. I.e. you set newlist to a single ListNode object. You haven't provided a complete code example, but presumably this is a class with at least two members, val and next. In C#, both of these members will initially have their default values, which means you start out with the list [0], before you've even started merging anything.
Note also that while you are modifying the newlist variable in the merging loop, what you return is result.next. This appears to be an attempt to skip the erroneously included 0 value that you put in the newlist list in the first place. But in your negative-valued example, it causes you to skip both the 0 value and the -1 value that the merge correctly placed before it.
In your example [[2],[1],[-1]], this means that when the loop for merging is done, you have newlist referring to the list [-1, 0, 1, 2]. But result points to the second element of that list (the original 0-valued node), giving you [0, 1, 2]. Then what you return is the next node of that, which produces [1,2].
The fact is, the MergeTwoLists() method you show already handles empty, i.e. null-valued, lists. It's not clear what motivated you to initialize your algorithm with a non-empty list, nor what motivated you to keep a second variable to reference that same node. Your problem probably would've been easier for you to notice if you hadn't done the latter, and of course, the whole bug is caused by the former.
You should just remove both aspects from the code you already have. Initialize newlist to null instead of creating a new node for it, and get rid of the result variable altogether:
public ListNode MergeKLists(ListNode[] lists) {
    // "var" cannot infer a type from null, so the type is spelled out
    ListNode newlist = null;
    for (int i = 0; i < lists.Length; i++)
    {
        if (lists[i] != null) {
            newlist = MergeTwoLists(lists[i], newlist);
        }
    }
    return newlist;
}
Note: you also don't need the check for lists.Length == 0. The loop will be skipped if lists.Length is 0, and with the fixed version just returning null as the empty list in that case, it works just as well without it.
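For reference, here is a self-contained sketch of the corrected approach that can be run against the failing input. The ListNode definition is an assumption (the question does not show it); it is modelled on the val/next members the code uses.

```csharp
// Assumed shape of the node type; the question only shows its val and next members.
public class ListNode
{
    public int val;
    public ListNode next;
    public ListNode(int val = 0, ListNode next = null) { this.val = val; this.next = next; }
}

public class Solution
{
    public ListNode MergeKLists(ListNode[] lists)
    {
        ListNode merged = null; // null represents the empty list
        for (int i = 0; i < lists.Length; i++)
        {
            if (lists[i] != null)
                merged = MergeTwoLists(lists[i], merged);
        }
        return merged;
    }

    // Unchanged from the question: recursively merges two sorted lists.
    private ListNode MergeTwoLists(ListNode l1, ListNode l2)
    {
        if (l1 == null) return l2;
        if (l2 == null) return l1;
        if (l1.val <= l2.val)
        {
            l1.next = MergeTwoLists(l1.next, l2);
            return l1;
        }
        l2.next = MergeTwoLists(l1, l2.next);
        return l2;
    }
}
```

With the input [[2],[1],[-1]] this yields -1 -> 1 -> 2, since no placeholder 0 node is ever introduced.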
On a more general note: most of the time, bugs are fixed by changing code, or even removing it. If you get into the habit of fixing bugs by adding code, more often than not you just wind up with a new bug added to the one you already had.
I will admit, that rule is not a hard-and-fast, 100% reliable one. But it's served me very well over the years.

C# nth-child logic test

I've been working on my own headless browser implementation and I feel like I am making a mess of my nth-child selector logic. Given an element and its 0-based position in its group of siblings, is there a simple, one-line expression to see if that element belongs in the result set?
public bool Evaluate(HTMLElement element)
{
if (element.parentNode == element.ownerDocument)
return false;
List<Element> children = element.Parent.Children
.Where(e => e is Element)
.Cast<Element>()
.ToList();
int index = children.IndexOf(element);
bool result = (an + b test here);
return result;
}
Currently I have a convoluted set of branching logic based on tests for 0 values for (a) and (b) and I suspect I am making it more complicated than it needs to be.
If I'm understanding correctly, you need to determine whether an n exists such that index = a*n + b for some fixed a, b.
bool result = (a == 0) ? b == index : (Math.Abs(index - b) % Math.Abs(a)) == 0;
If a is 0, then index must be b. Otherwise, a must evenly divide the difference between index and b.
Naturally, if a negative value for a is not allowed you can skip the Math.Abs(a) call.
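One caveat: the expression above accepts any integer n, including negative ones. In CSS, :nth-child(an+b) only counts n = 0, 1, 2, ..., and sibling positions are 1-based. A minimal sketch that adds the non-negativity check, assuming index has already been converted to the 1-based position (the helper name MatchesAnPlusB is made up for illustration):

```csharp
public static class NthChild
{
    // Returns true when index == a*n + b for some integer n >= 0.
    // Assumption: index is the element's 1-based position among its siblings.
    public static bool MatchesAnPlusB(int a, int b, int index)
    {
        if (a == 0)
            return index == b;

        int diff = index - b;
        // diff must be an exact multiple of a, and the quotient (n) must not be negative.
        return diff % a == 0 && diff / a >= 0;
    }
}
```

For example, with a = 2 and b = 5 the position 1 would pass the divisibility test alone (n = -2) but is rejected here, while a = -1 and b = 3 accepts positions 1 through 3 only.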

Regex to find 'good enough' sequences

I'm looking to implement some algorithm to help me match imperfect sequences.
Say I have a stored sequence of ABBABABBA and I want to find something that 'looks like' that in a large stream of characters.
If I give my algorithm an allowance of 2 wildcards (differences), how can I use Regex to match something like the following, where ( and ) mark the differences:
A(A)BABAB(A)A
or
(B)BBA(A)ABBA
My dilemma is that I am looking to find these potential target matches (with imperfections) in a big string of characters.
So in something like:
ABBDBABDBCBDBABDB(A(A)BABAB(A)A)DBDBABDBCBDBAB
ADBDBABDBDBDBCBDBABCBDBABCBDBABCBDBABABBBDBABABBCD
DBABCBDABDBABCBCBDBABABDABDBABCBDBABABDDABCBDBABAB
I must be able to search for these 'near enough' matches.
Where brackets denote: (The Good enough Match with the (Differences))
Edit: To be more formal in this example, A match of Length N can be accepted if N-2 characters are the same as the original (2 Differences)
I've used Regex before, but only to find perfect sequences - not for something that 'looks like' one.
Hope this is clear enough to get some advice on.
Thanks for reading and any help!
You could use LINQ to be nice and expressive.
In order to use this make sure you have a using System.Linq at the top of your code.
Assuming that
source is the stored target pattern
test is the string to test.
Then you can do
public static bool IsValid(string source, string test)
{
    return test != null
        && source != null
        && test.Length == source.Length
        && test.Where((x, i) => source[i] != x).Count() <= 2;
}
There is also a shortcut version that exits false the moment it fails, saving iterating the rest of the string.
public static bool IsValid(string source, string test) 
{
  return test != null  
         && source != null 
          && test.Length == source.Length 
          && !test.Where((x,i) => source[i] != x).Skip(2).Any();
}
As requested in comments, a little explanation of how this works
In C# a string can be treated as a sequence of characters, which means that the LINQ methods can be used on it.
test.Where((x,i) => source[i] != x)
This uses the overload of Where that passes both the element and its index: for each character in test, x is the character and i is its index. If the character at position i in source is not equal to x, then x is output into the result.
Skip(2)
This skips the first 2 results.
Any()
This returns true if there are any results left, or false if not. Because LINQ defers execution, evaluation stops as soon as a third difference is found rather than examining the rest of the string.
The entire test is then negated by prefixing it with '!' because we want to know whether there are no more results.
Now, to match a substring, you need to try every starting position, similar to how a regex backtracks:
public static IEnumerable<int> GetMatches(string source, string test)
{
    // Starting positions 0 .. test.Length - source.Length, inclusive
    return from i in Enumerable.Range(0, Math.Max(0, test.Length - source.Length + 1))
           where IsValid(source, test.Skip(i).Take(source.Length))
           select i;
}
public static bool IsValid(string source, IEnumerable<char> test)
{
    // True when the window differs from source in at most 2 positions
    return !test.Where((x, i) => source[i] != x).Skip(2).Any();
}
UPDATE Explained
Enumerable.Range(0, Math.Max(0, test.Length - source.Length + 1))
This creates the sequence of numbers from 0 to test.Length - source.Length inclusive (or an empty sequence if test is shorter than source). There is no need to check later starting positions, because once the remaining length is shorter than source the answer is invalid.
from i in ....
Basically, iterate over the collection, assigning i to be the current value each time.
where IsValid(source, test.Skip(i).Take(source.Length))
Filter the results to include only the positions where there is a match in test starting at index i (hence the Skip) and going on for source.Length chars (hence the Take).
select i
Return i.
This returns an enumerable over the indexes in test where there is a match; you could extract them with
GetMatches(source,test).Select(i =>
new string(test.Skip(i).Take(source.Length).ToArray()));
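The same sliding-window idea can also be written with plain indexing, which avoids re-enumerating the string from the start for every window. This is a sketch rather than a drop-in replacement; the class and method names (NearMatch, FindNearMatches) and the maxDifferences parameter are made up for illustration.

```csharp
using System.Collections.Generic;

public static class NearMatch
{
    // Yields every start index in text where the window of source.Length characters
    // differs from source in at most maxDifferences positions.
    public static IEnumerable<int> FindNearMatches(string source, string text, int maxDifferences)
    {
        for (int start = 0; start + source.Length <= text.Length; start++)
        {
            int differences = 0;
            // Stop scanning this window as soon as the allowance is exceeded.
            for (int i = 0; i < source.Length && differences <= maxDifferences; i++)
            {
                if (text[start + i] != source[i])
                    differences++;
            }
            if (differences <= maxDifferences)
                yield return start;
        }
    }
}
```

Searching for ABBABABBA in XXAABABABAAXX with an allowance of 2 reports only index 2, where the window AABABABAA differs from the pattern in its second and eighth characters.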
I don't think this can be done with regexes (if it can, I'm unfamiliar with the syntax). However, you can use the dynamic programming algorithm for Levenshtein distance.
Edit: If you don't need to handle letters that have switched positions, a much easier approach is to just compare each pair of characters from the two strings, and just count the number of differences.
I can't think how you'd do it with regex but it should be pretty simple to code.
I'd probably just split the strings up and compare them character by character. If you get a difference count it and move to the next character. If you exceed 2 differences then move on to the next full string.
I don't think there's a good regular expression to handle this case. (Or at least, there isn't one that won't take up a good three lines of text and cause multiple bullets in your feet.) However, that doesn't mean you can't solve this problem.
Depending on how large your strings are (I'm assuming they won't be millions of characters each), I don't see anything stopping you from using a single loop to compare individual characters in order, while keeping a tally of differences:
// tolerance: limit of discrepancies you'll allow (e.g. 7)
static bool CheckStrings(string stringA, string stringB, int tolerance)
{
    if (stringA.Length != stringB.Length) return false; // lengths must match
    int differences = 0; // Count of discrepancies you've detected
    for (int i = 0; i < stringA.Length; i++)
    {
        if (stringA[i] != stringB[i])
        {
            differences++;
            if (differences > tolerance)
            {
                return false;
            }
        }
    }
    return true;
}
Most of the time, don't be concerned about your strings being too long to put into a loop. Behind-the-scenes, any code that assesses every character of a string will loop in some form or another. Until you literally have millions of characters to deal with, a loop should do the trick just fine.
I'll bypass the 'regex' part and focus on:
Is there a better way than doing nested loops to wildcard every position?
It sounds like there's a programmatic way that might help you. See this post about iterating over two IEnumerables. By iterating over both strings at the same time, you can complete the task in O(n) time. Even better, if you know your tolerance(maximum of 2 errors), you can sometimes finish faster than O(n).
Here's a simple example that I wrote up. It probably needs tweaking for your own case, but it might be a good starting point.
static void imperfectMatch(String original, String testCase, int tolerance)
{
int mistakes = 0;
if (original.Length == testCase.Length)
{
using (CharEnumerator enumerator1 = original.GetEnumerator())
using (CharEnumerator enumerator2 = testCase.GetEnumerator())
{
while (enumerator1.MoveNext() && enumerator2.MoveNext())
{
if (mistakes >= tolerance)
break;
if (enumerator1.Current != enumerator2.Current)
mistakes++;
}
}
}
else
mistakes = -1;
Console.WriteLine(String.Format("Original String: {0}", original));
Console.WriteLine(String.Format("Test Case String: {0}", testCase));
Console.WriteLine(String.Format("Number of errors: {0}", mistakes));
Console.WriteLine();
}
Does any combination of A, B, ( and ) work?
bool isMatch = Regex.IsMatch(inputString, "^[AB()]+$");
For sufficiently small patterns (ABCD), you could generate a regexp:
..CD|.B.D|.BC.|A..D|A.C.|AB..
You could also code a custom comparison loop
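Generating that alternation by hand gets tedious quickly, so it could be built programmatically. The following is a rough sketch, not a tuned implementation; the names FuzzyRegex, Build and Combinations are invented for this example. It enumerates every choice of wildcard positions and replaces those characters with '.'.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FuzzyRegex
{
    // Yields every ascending choice of `count` positions from start..length-1.
    private static IEnumerable<int[]> Combinations(int length, int count, int start = 0)
    {
        if (count == 0)
        {
            yield return new int[0];
            yield break;
        }
        for (int i = start; i <= length - count; i++)
            foreach (var rest in Combinations(length, count - 1, i + 1))
                yield return new[] { i }.Concat(rest).ToArray();
    }

    // Builds an alternation in which each branch has `wildcards` positions replaced by '.'.
    public static string Build(string pattern, int wildcards)
    {
        var alternatives = Combinations(pattern.Length, wildcards)
            .Select(positions => string.Concat(pattern.Select((ch, i) =>
                positions.Contains(i) ? "." : Regex.Escape(ch.ToString()))));
        return string.Join("|", alternatives);
    }
}
```

FuzzyRegex.Build("ABCD", 2) reproduces the six-branch pattern shown above. The number of branches is the binomial coefficient C(n, k), so the 9-character pattern from the question with 2 wildcards already needs 36 branches, which is why this only suits small patterns.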

Is there a LINQ extension (or a sensible/efficient set of LINQ extensions) that determines whether a collection has at least 'x' elements?

I have code that needs to know that a collection should not be empty or contain only one item.
In general, I want an extension of the form:
bool collectionHasAtLeast2Items = collection.AtLeast(2);
I can write an extension easily, enumerating over the collection and incrementing a counter until I hit the requested size or run out of elements, but is there something already in the LINQ framework that would do this? My thoughts (in order of what came to me) are:
bool collectionHasAtLeast2Items = collection.Take(2).Count() == 2; or
bool collectionHasAtLeast2Items = collection.Take(2).ToList().Count == 2;
Which would seem to work, though the behaviour of taking more elements than the collection contains is not defined (in the documentation) Enumerable.Take Method , however, it seems to do what one would expect.
It's not the most efficient solution, either enumerating once to take the elements, then enumerating again to count them, which is unnecessary, or enumerating once to take the elements, then constructing a list in order to get the count property which isn't enumerator-y, as I don't actually want the list.
It's not pretty as I always have to make two assertions, first taking 'x', then checking that I actually received 'x', and it depends upon undocumented behaviour.
Or perhaps I could use:
bool collectionHasAtLeast2Items = collection.ElementAtOrDefault(2) != null;
However, that's not semantically-clear. Maybe the best is to wrap that with a method-name that means what I want. I'm assuming that this will be efficient, I haven't reflected on the code.
Some other thoughts are using Last(), but I explicitly don't want to enumerate through the whole collection.
Or maybe Skip(2).Any(), again not semantically completely obvious, but better than ElementAtOrDefault(2) != null, though I would think they produce the same result?
Any thoughts?
public static bool AtLeast<T>(this IEnumerable<T> source, int count)
{
// Optimization for ICollection<T>
var genericCollection = source as ICollection<T>;
if (genericCollection != null)
return genericCollection.Count >= count;
// Optimization for ICollection
var collection = source as ICollection;
if (collection != null)
return collection.Count >= count;
// General case
using (var en = source.GetEnumerator())
{
int n = 0;
while (n < count && en.MoveNext()) n++;
return n == count;
}
}
You can use Count() >= 2 if your sequence implements ICollection.
Behind the scenes, the Enumerable.Count() extension method checks whether the sequence implements ICollection. If it does, the Count property is returned, so performance should be O(1).
Thus ((IEnumerable<T>)((ICollection)sequence)).Count() >= x should also be O(1).
You could use Count, but if performance is an issue, you will be better off with Take.
bool atLeastX = collection.Take(x).Count() == x;
Since Take (I believe) uses deferred execution, it will only go through the collection once.
abatishchev mentioned that Count is O(1) with ICollection, so you could do something like this and get the best of both worlds.
IEnumerable<int> col;
// set col
int x;
// set x
bool atLeastX;
if (col is ICollection<int>)
{
atLeastX = col.Count() >= x;
}
else
{
atLeastX = col.Take(x).Count() == x;
}
You could also use Skip/Any, in fact I bet it would be even faster than Take/Count.
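A minimal sketch of the Skip/Any variant, wrapped in an extension method so the intent is readable at the call site. The name HasAtLeast is made up here (chosen to avoid clashing with the AtLeast extension shown earlier). Note the count - 1: skipping x - 1 elements and finding one more means there are at least x.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableCountExtensions
{
    // True when source contains at least `count` elements.
    // Enumerates at most `count` elements, so it is safe on long or infinite sequences.
    public static bool HasAtLeast<T>(this IEnumerable<T> source, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (count <= 0) return true; // every sequence has at least zero elements
        return source.Skip(count - 1).Any();
    }
}
```

Usage would look like collection.HasAtLeast(2). Unlike ElementAtOrDefault(...) != null, this also works for sequences of value types and for sequences that legitimately contain null.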

Test whether two IEnumerable<T> have the same values with the same frequencies

I have two multisets, both IEnumerables, and I want to compare them.
string[] names1 = { "tom", "dick", "harry" };
string[] names2 = { "tom", "dick", "harry", "harry"};
string[] names3 = { "tom", "dick", "harry", "sally" };
string[] names4 = { "dick", "harry", "tom" };
Want names1 == names4 to return true (and self == self returns true obviously)
But all other combos return false.
What is the most efficient way? These can be large sets of complex objects.
I looked at doing:
var a = names1.OrderBy<MyCustomType, string>(v => v.Name);
var b = names4.OrderBy<MyCustomType, string>(v => v.Name);
return a == b;
First sort as you have already done, and then use Enumerable.SequenceEqual. You can use the first overload if your type implements IEquatable<MyCustomType> or overrides Equals; otherwise you will have to use the second form and provide your own IEqualityComparer<MyCustomType>.
So if your type does implement equality, just do:
return a.SequenceEqual(b);
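Applied to the string arrays from the question, the sort-then-compare approach could look like this. It is a sketch only: the MultisetDemo and SameItems names are invented, and the ordinal string comparer is an assumption about how the names should be compared.

```csharp
using System;
using System.Linq;

public static class MultisetDemo
{
    // Sorts both sequences with the same comparer, then compares them element by element.
    public static bool SameItems(string[] first, string[] second) =>
        first.OrderBy(s => s, StringComparer.Ordinal)
             .SequenceEqual(second.OrderBy(s => s, StringComparer.Ordinal), StringComparer.Ordinal);
}
```

With the question's data, SameItems(names1, names4) is true, while comparing names1 with names2 or names3 is false because the sorted sequences differ in length.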
Here's another option that is faster, safer, and requires no sorting:
public static bool UnsortedSequencesEqual<T>(
this IEnumerable<T> first,
IEnumerable<T> second)
{
return UnsortedSequencesEqual(first, second, null);
}
public static bool UnsortedSequencesEqual<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var counts = new Dictionary<T, int>(comparer);
foreach (var i in first) {
int c;
if (counts.TryGetValue(i, out c))
counts[i] = c + 1;
else
counts[i] = 1;
}
foreach (var i in second) {
int c;
if (!counts.TryGetValue(i, out c))
return false;
if (c == 1)
counts.Remove(i);
else
counts[i] = c - 1;
}
return counts.Count == 0;
}
The most efficient way would depend on the datatypes. A reasonably efficient O(N) solution that's very short is the following:
var list1Groups=list1.ToLookup(i=>i);
var list2Groups=list2.ToLookup(i=>i);
return list1Groups.Count == list2Groups.Count
&& list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());
The items are required to have a valid Equals and GetHashCode implementation.
If you want a faster solution, cdhowie's solution is comparably fast at 10000 elements, and pulls ahead by a factor of 5 for large collections of simple objects - probably due to better memory efficiency.
Finally, if you're really interested in performance, I'd definitely try the Sort-then-SequenceEqual approach. Although it has worse complexity, that's just a log N factor, and those can definitely be drowned out by differences in the constant for all practical data set sizes - and you might be able to sort in-place, use arrays or even incrementally sort (which can be linear). Even at 4 billion elements, the log-base-2 is just 32; that's a relevant performance difference, but the difference in constant factor could conceivably be larger. For example, if you're dealing with arrays of ints and don't mind modifying the collection order, the following is faster than either option even for 10000000 items (twice that and I get an OutOfMemory on 32-bit):
Array.Sort(list1);
Array.Sort(list2);
return list1.SequenceEqual(list2);
YMMV depending on machine, data-type, lunar cycle, and the other usual factors influencing microbenchmarks.
You could use a binary search tree to ensure that the data is sorted. That would make each insertion an O(log N) operation. Then you can run through each tree one item at a time and break as soon as you find a not-equal condition. This would also give you the added benefit of being able to first compare the size of the two trees, since duplicates would be filtered out. I'm assuming these are treated as sets, whereby {"harry", "harry"} == {"harry"}.
If you are counting duplicates, then do a quicksort or a mergesort first, that would then make your comparison operation an O(N) operation. You could of course compare the size first, as two enums cannot be equal if the sizes are different. Since the data is sorted, the first non-equal condition you encounter would render the entire operation as "not-equal".
@cdhowie's answer is great, but here's a nice trick that makes it even better for types that declare .Count by comparing that value prior to decomposing parameters to IEnumerable. Just add this to your code in addition to his solution:
public static bool UnsortedSequencesEqual<T>(this IReadOnlyList<T> first, IReadOnlyList<T> second, IEqualityComparer<T> comparer = null)
{
if (first.Count != second.Count)
{
return false;
}
return UnsortedSequencesEqual((IEnumerable<T>)first, (IEnumerable<T>)second, comparer);
}
