Consider the following simple code with LINQ OrderBy and ThenBy:
static void Main()
{
    var arr1 = new[] { "Alpha", "Bravo", "Charlie", };
    var coStr = Comparer<string>.Create((x, y) =>
    {
        Console.WriteLine($"Strings: {x} versus {y}");
        return string.CompareOrdinal(x, y);
    });
    arr1.OrderBy(x => x, coStr).ToList();
    Console.WriteLine("--");
    var arr2 = new[]
    {
        new { P = "Alpha", Q = 7, },
        new { P = "Bravo", Q = 9, },
        new { P = "Charlie", Q = 13, },
    };
    var coInt = Comparer<int>.Create((x, y) =>
    {
        Console.WriteLine($"Ints: {x} versus {y}");
        return x.CompareTo(y);
    });
    arr2.OrderBy(x => x.P, coStr).ThenBy(x => x.Q, coInt).ToList();
}
This simply uses some comparers that write out to the console what they compare.
On my hardware and version of the Framework (.NET 4.6.2), this is the output:
Strings: Bravo versus Alpha
Strings: Bravo versus Bravo
Strings: Bravo versus Charlie
Strings: Bravo versus Bravo
--
Strings: Bravo versus Alpha
Strings: Bravo versus Bravo
Ints: 9 versus 9
Strings: Bravo versus Charlie
Strings: Bravo versus Bravo
Ints: 9 versus 9
My question is: Why would they compare an item from the query to itself?
In the first case, before the -- separator, they do four comparisons. Two of them compare an entry to itself ("Strings: Bravo versus Bravo"). Why?
In the second case, there should not ever be a need for resorting to comparing the Q properties (integers); for there are no duplicates (wrt. ordinal comparison) in the P values, so no tie-breaking from ThenBy should be needed ever. Still we see "Ints: 9 versus 9" twice. Why use the ThenBy comparer with identical arguments?
Note: Any comparer has to return 0 upon comparing something to itself. So unless the algorithm just wants to check if we implemented a comparer correctly (which it will never be able to do fully anyway), what is going on?
Be aware: There are no duplicates in the elements yielded by the queries in my examples.
I saw the same issue with another example with more entries yielded from the query. Above I just give a small example. This happens with an even number of elements yielded, as well.
In the reference source of the QuickSort method used by OrderBy you can see these two lines:
while (i < map.Length && CompareKeys(x, map[i]) > 0) i++;
while (j >= 0 && CompareKeys(x, map[j]) < 0) j--;
These while loops run until they find an element that is no longer "greater" (resp. "less") than the one x points to. So they will break when the identical element is compared.
I can't prove it mathematically, but I suspect that avoiding comparisons of identical elements would make the algorithm more complicated and introduce overhead that would hurt performance more than this single extra comparison.
(Note that your comparer should be implemented cleverly enough to return 0 quickly for identical elements.)
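For instance, a comparer that short-circuits the self-comparison case might look like this (a sketch based on the question's coStr; whether the extra check pays off depends on how expensive the real comparison is):
var coStr = Comparer<string>.Create((x, y) =>
{
    // Comparing a reference to itself (or two nulls) is always a tie.
    if (ReferenceEquals(x, y)) return 0;

    Console.WriteLine($"Strings: {x} versus {y}");
    return string.CompareOrdinal(x, y);
});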
In the first case, before the -- separator, they do four comparisons. Two of them compare an entry to itself ("Strings: Bravo versus Bravo"). Why?
Efficiency. Sure, it would be possible to check first that an object isn't being compared with itself, but that means doing an extra operation on every comparison to avoid a case that comes up relatively rarely and which in most cases is pretty cheap (most comparers are). That would be a net loss.
(Incidentally, I did experiment with just such a change to the algorithm, and when measured it really was an efficiency loss with common comparisons such as the default int comparer).
In the second case, there should not ever be a need for resorting to comparing the Q properties (integers); for there are no duplicates (wrt. ordinal comparison) in the P values, so no tie-breaking from ThenBy should be needed ever. Still we see "Ints: 9 versus 9" twice. Why use the ThenBy comparer with identical arguments?
Who is to say there are no duplicates? The internal comparison was given two things (not necessarily reference types, so short-circuiting on reference identity isn't always an option) and has two rules to follow. The first rule needed a tie-break so the tie-break was done.
The code is designed to work for cases where there can be equivalent values for the first comparison.
If it's known that there won't be equivalent values for the OrderBy, then it's up to the person who knows that to leave out the unnecessary ThenBy; they are the only one in a position to know it.
Ok, let's see the possibilities here:
T is a value type
In order to check if it's comparing an item against itself, it first needs to check if both items are the same one. How would you do that?
You could call Equals first and then CompareTo if the items are not the same. Do you really want to do that? The cost of Equals is going to be roughly the same as the cost of the comparison itself, so you'd be doubling the cost of the ordering, and for what? OrderBy simply compares all items, period.
T is a reference type
C# doesn't let you overload based only on generic constraints, so you'd need to check at runtime whether T is a reference type and then call a specific implementation that changes the behavior described above. Do you want to incur that cost in every case? Of course not.
If the comparison is expensive, then implement a reference-equality optimization in your comparison logic to avoid paying pointless costs when comparing an item to itself, but that choice must be yours. I'm pretty sure string.CompareTo does precisely that.
I hope this makes my answer clearer; sorry for the previous short answer, my reasoning wasn't that obvious.
In simple terms, in case 1:
var coStr = Comparer<string>.Create((x, y) =>
{
    Console.WriteLine($"Strings: {x} versus {y}");
    return string.CompareOrdinal(x, y);
});
we are just logging every comparison; there is no condition that skips the Console.WriteLine when the result is 0, so it runs regardless of the outcome of the comparison. If you change your code like below:
var coStr = Comparer<string>.Create((x, y) =>
{
    if (x != y)
        Console.WriteLine($"Strings: {x} versus {y}");
    return string.CompareOrdinal(x, y);
});
Your output will be like
Strings: Bravo versus Alpha
Strings: Bravo versus Charlie
The same goes for the second statement: the sort looks at the result of both comparers, so when the string comparison returns 0 it falls back to the int comparison and uses that result. Hope this clears your doubt :)
Please look at this block of code:
List<int> list = new List<int>();
list.Add(1);
list.Add(6);
list.Add(2);
list.Add(3);
list.Add(8);
list.Sort((x, y) => x.CompareTo(y));
int[] arr = list.ToArray();
foreach (var i in arr)
{
    Console.WriteLine(i);
}
I know that when executed, the code above will print the list in ascending order. If I switch the positions of x and y in "x.CompareTo(y)", then the list will be sorted in descending order instead. I know what CompareTo() does, but how does it work here with x and y to decide the sort order? What do x and y represent here?
The Sort method's signature is public void Sort(Comparison<T> comparison), and if you look at the declaration of Comparison, it is public delegate int Comparison<in T>(T x, T y).
Just see the comments for Comparison.
//
// Summary:
// Represents the method that compares two objects of the same type.
//
// Parameters:
// x:
// The first object to compare.
//
// y:
// The second object to compare.
//
// Type parameters:
// T:
// The type of the objects to compare.
//
// Returns:
// A signed integer that indicates the relative values of x and y, as shown in the
// following table.
// Value – Meaning
// Less than 0 – x is less than y.
// 0 – x equals y.
// Greater than 0 – x is greater than y.
So it runs that comparison for all the elements to establish the order, and if you swap x and y it will simply reverse that order.
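For example, with the list from the question, swapping the arguments flips the sort to descending:
// Ascending: x.CompareTo(y)  ->  1, 2, 3, 6, 8
list.Sort((x, y) => x.CompareTo(y));

// Descending: y.CompareTo(x) ->  8, 6, 3, 2, 1
list.Sort((x, y) => y.CompareTo(x));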
In simple terms, (x, y) => x.CompareTo(y) is like a one line version of a method - it goes by various names, usually "lambda" or "delegate". You can think of it as a method that is stripped down as much as possible. Here's the full method, I'll talk about stripping it down shortly:
int MyComparer(int x, int y)
{
    return x.CompareTo(y);
}
The list will perform a sort and will call this method every time it needs to decide which of the two ints it is currently considering is larger than the other. How it does the sort is irrelevant (probably something quicksort-ish), but ultimately it will repeatedly want to know whether x is larger than, the same as, or smaller than y.
You could take that method I wrote up there and call sort with it instead:
yourList.Sort(MyComparer);
So C# has this notion of being able to pass methods around in the same way that it passes variable data around. Typically when you pass a method you're saying "here's some logic you should call when you need to know that thing you want to know" - like a custom sort algorithm that you can vary by varying the logic you pass in.
In stripping MyComparer down: well, we know this list is full of ints, so we know that x and y are ints and we can throw the "int" away. We can get rid of the method name too, and because it's just one line we can ditch the curly brackets. The return keyword is unnecessary too, because by definition the single line has to evaluate to something returnable, and it is returned implicitly. The => separates the parameters from the body logic:
(method_parameters) => logic_that_produces_a_value
So that long winded method can be boiled down to
(x,y) => x.CompareTo(y)
x and y are whatever type the list holds; the result is a negative number, 0, or a positive number, which the list will use to determine how x and y relate sorting-wise. They could be called anything; just like when you declare a method MyMethod(int left, int right) and choose to call the parameters "left" and "right", here we do the same because this is just declaring named parameters - you could just as easily write (left,right) => left.CompareTo(right) and it'd work the same.
You'll see it a lot in c#, because c# devs are quite focused on representing logic in a compact fashion. Often the most useful way of making something flexible and powerful is to implement some of the logic but then leave it possible for another dev to finish the logic off. Microsoft implemented sorting into list, but they leave it up to us to decide "which is greater or lesser" so we can supply logic that influences sort order. They just mandated "any time we want to know which way round two things are we'll ask you" - which means they don't need to have SortAscending and SortDescending - we just flip the logic and declare that 10 is less than 9 if we want 10 to come first.
LINQ relies on "supply me some logic" heavily - you might be looking for all ints less than 3 in your list:
myList.Where(item => item < 3)
LINQ will loop over the list calling your supplied logic (lambda) here on every item and delivering only those where your lambda is true. item is one of the list entries, in this case an int. if the list held Person objects, item would be a Person. To that end I'd recommend calling it p or person to make it clearer what it is. This will become important when you start using more complex lambdas like
string[] wantedNames = new [] { "John", "Luke" };
people.Where(p => wantedNames.Any(n => p.Name == n));
This declares multiple names you're looking for then searches the list of people for all those whose name is in the list of wanted names. For every person p, LINQ asks "wantedNames, do Any of your elements return true for this supplied logic" and then it loops over the wantedNames calling the logic you supplied (n, the item from wantedNames is equal to the person name?).
Here you've used two different methods that take custom logic that LINQ calls repeatedly - Where is a method that takes a delegate, and so is Any; the Any operates on the list of names, the Where operates on the list of people so keeping it clear which is a Person p and which is a name string n using variable names is better than using bland names like a and b.
For more info on the Sort method, see the docs for List<T>.Sort(Comparison<T>):
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.sort?view=net-5.0#System_Collections_Generic_List_1_Sort_System_Comparison__0__
I am looking at this code
var numbers = Enumerable.Range(0, 20);
var parallelResult = numbers.AsParallel().AsOrdered()
.Where(i => i % 2 == 0).AsSequential();
foreach (int i in parallelResult.Take(5))
Console.WriteLine(i);
The AsSequential() is supposed to make the resulting array sorted. Actually it is sorted after its execution, but if I remove the call to AsSequential(), it is still sorted (since AsOrdered() is called).
What is the difference between the two?
AsSequential is just meant to stop any further parallel execution - hence the name. I'm not sure where you got the idea that it's "supposed to make the resulting array sorted". The documentation is pretty clear:
Converts a ParallelQuery into an IEnumerable to force sequential evaluation of the query.
As you say, AsOrdered ensures ordering (for that particular sequence).
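To make the difference concrete, here is a small sketch; without AsOrdered() the parallel Where is free to yield matches in whatever order the partitions complete, so the order in the comment is just one possible outcome:
var numbers = Enumerable.Range(0, 20);

var unordered = numbers.AsParallel()          // no AsOrdered() here
                       .Where(i => i % 2 == 0);

foreach (int i in unordered.Take(5))
    Console.WriteLine(i);                     // e.g. 8, 0, 2, 10, 4 -- not guaranteed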
I know that this was asked over a year ago, but here are my two cents.
In the example shown, I think AsSequential is used so that the next query operator (in this case the Take operator) is executed sequentially.
However, the Take operator prevents a query from being parallelized unless the source elements are in their original indexing position, which is why the result is still sorted even when you remove the AsSequential operator.
int a, b, n;
...
(a, b) = (2, 3);
// 'a' is now 2 and 'b' is now 3
This sort of thing would be really helpful in C#. In this example 'a' and 'b' aren't encapsulated together the way the X and Y of a position might be. Does this exist in some form?
Below is a less trivial example.
(a, b) = n == 4 ? (2, 3) : (3, n % 2 == 0 ? 1 : 2);
Adam Maras shows in the comments that:
var result = n == 4 ? Tuple.Create(2, 3) : Tuple.Create(3, n % 2 == 0 ? 1 : 2);
Sort of works for the example above; however, as he then points out, it creates a new tuple instead of changing the specified variables.
Eric Lippert asks for use cases, therefore perhaps:
(a, b, c) = (c, a, b); // swap or reorder on one line
(x, y) = move((x, y), dist, heading);
byte (a, b, c, d, e) = (5, 4, 1, 3, 2);
graphics.(PreferredBackBufferWidth, PreferredBackBufferHeight) = 400;
notallama also has use cases, they are in his answer below.
We have considered supporting a syntactic sugar for tuples but it did not make the bar for C# 4.0. It is unlikely to make the bar for C# 5.0; the C# 5.0 team is pretty busy with getting async/await working right. We will consider it for hypothetical future versions of the language.
If you have a really solid usage case that is compelling, that would help us prioritize the feature.
use case:
it'd be really nice for working with IObservables, since those have only one type parameter. You basically want to subscribe with arbitrary delegates, but you're forced to use Action<T>, so if you want multiple parameters you have to either use tuples or create custom classes for packing and unpacking parameters.
example from a game:
public IObservable<Tuple<GameObject, DamageInfo>> Damaged ...
void RegisterHitEffects() {
    (from damaged in Damaged
     where damaged.Item2.amount > threshold
     select damaged.Item1)
        .Subscribe(DoParticleEffect)
        .AddToDisposables();
}
becomes:
void RegisterHitEffects() {
    (from (gameObject, damage) in Damaged
     where damage.amount > threshold
     select gameObject)
        .Subscribe(DoParticleEffect)
        .AddToDisposables();
}
which I think is cleaner.
Also, presumably IAsyncResult will have similar issues when you want to pass several values. Sometimes it's cumbersome to create classes just to shuffle a bit of data around, but using tuples as they are now reduces code clarity. If they're used in the same function, anonymous types fit the bill nicely, but they don't work if you need to pass data between functions.
Also, it'd be nice if the sugar worked for generic parameters too, so:
IEnumerator<(int, int)>
would desugar to
IEnumerator<Tuple<int,int>>
The behavior you're looking for can be found in languages that have support or syntactic sugar for tuples. C# is not among these languages; while you can use the Tuple<...> classes to achieve similar behavior, it comes out very verbose (not as clean as what you're looking for).
Deconstruction was introduced in C# 7.0:
https://blogs.msdn.microsoft.com/dotnet/2016/08/24/whats-new-in-csharp-7-0/#user-content-deconstruction
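With that feature the code from the question works directly; a quick sketch using C# 7.0 tuple syntax:
int n = 4;

// Declare and deconstruct in one step:
var (a, b) = n == 4 ? (2, 3) : (3, n % 2 == 0 ? 1 : 2);

// Deconstructing assignment into existing variables, e.g. the swap use case:
(a, b) = (b, a);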
The closest structure I can think of is the Tuple class in version 4.0 of the framework.
As others already wrote, C# 4 Tuples are a nice addition, but nothing really compelling to use as long as there aren't any unpacking mechanisms. What I really demand of any type I use is clarity about what it describes, on both sides of the function protocol (i.e. the caller and callee sides)... like
Complex SolvePQ(double p, double q)
{
    ...
    return new Complex(real, imag);
}
...
var solution = SolvePQ(...);
Console.WriteLine("{0} + {1}i", solution.Real, solution.Imaginary);
This is obvious and clear at both caller and callee side. However this
Tuple<double, double> SolvePQ(double p, double q)
{
    ...
    return Tuple.Create(real, imag);
}
...
var solution = SolvePQ(...);
Console.WriteLine("{0} + {1}i", solution.Item1, solution.Item2);
Doesn't leave the slightest clue about what that solution actually is (ok, the string and the method name make it pretty obvious) at the call site. Item1 and Item2 are of the same type, which renders tooltips useless. The only way to know for certain is to "reverse engineer" your way back through SolvePQ.
Obviously, this is far-fetched, and everyone doing serious numerical work should have a Complex type (like the one in the BCL).
But every time you get split results and you want to give those results distinct names for the sake of readability, you need tuple unpacking. The rewritten last two lines would be:
var (real, imaginary) = SolvePQ(...); // or var real, imaginary = SolvePQ(...);
Console.WriteLine("{0} + {1}i", real, imaginary);
This leaves no room for confusion, except for getting used to the syntax.
Creating a set of Unpack<T1, T2>(this Tuple<T1, T2>, out T1, out T2) extension methods would be a more idiomatic C# way of doing this.
Your example would then become
int a, b, n;
...
Tuple.Create(2, 3).Unpack(out a, out b);
// 'a' is now 2 and 'b' is now 3
which is no more complex than your proposal, and a lot clearer.
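For illustration, such an Unpack extension might look like this (a minimal sketch; Unpack is the name proposed above, not an existing BCL method):
public static class TupleExtensions
{
    // Copies the tuple's components into the supplied out parameters.
    public static void Unpack<T1, T2>(this Tuple<T1, T2> tuple, out T1 item1, out T2 item2)
    {
        item1 = tuple.Item1;
        item2 = tuple.Item2;
    }
}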
I can't figure out how to remove duplicate entries from an array of structs.
I have this struct:
public struct stAppInfo
{
    public string sTitle;
    public string sRelativePath;
    public string sCmdLine;
    public bool bFindInstalled;
    public string sFindTitle;
    public string sFindVersion;
    public bool bChecked;
}
I have changed the stAppInfo struct to a class here, thanks to Jon Skeet.
The code is like this: (short version)
stAppInfo[] appInfo = new stAppInfo[listView1.Items.Count];
int i = 0;
foreach (ListViewItem item in listView1.Items)
{
    appInfo[i].sTitle = item.Text;
    appInfo[i].sRelativePath = item.SubItems[1].Text;
    appInfo[i].sCmdLine = item.SubItems[2].Text;
    appInfo[i].bFindInstalled = (item.SubItems[3].Text.Equals("Sí")) ? true : false;
    appInfo[i].sFindTitle = item.SubItems[4].Text;
    appInfo[i].sFindVersion = item.SubItems[5].Text;
    appInfo[i].bChecked = (item.SubItems[6].Text.Equals("Sí")) ? true : false;
    i++;
}
I need the appInfo array to be unique in the sTitle and sRelativePath members; the other members can be duplicates.
EDIT:
Thanks to all for the answers, but this application is "portable": I just need the .exe file and I don't want to ship additional files such as referenced *.dll files. So please, no external references; this app is intended to be used from a pen drive.
All data comes from a *.ini file. What I do is (pseudocode):
ReadFile()
FillDataFromFileInAppInfoArray()
DeleteDuplicates()
FillListViewControl()
When I want to save that data into a file I have these options:
Using the ListView data
Using the appInfo array (is this faster?)
Any other?
EDIT2:
Big thanks to Jon Skeet and Michael Hays. Thanks for your time, guys!!
Firstly, please don't use mutable structs. They're a bad idea in all kinds of ways.
Secondly, please don't use public fields. Fields should be an implementation detail - use properties.
Thirdly, it's not at all clear to me that this should be a struct. It looks rather large, and not particularly "a single value".
Fourthly, please follow the .NET naming conventions so your code fits in with all the rest of the code written in .NET.
Fifthly, you can't remove items from an array, as arrays are created with a fixed size... but you can create a new array with only unique elements.
LINQ to Objects will let you do that already using GroupBy as shown by Albin, but a slightly neater (in my view) approach is to use DistinctBy from MoreLINQ:
var unique = appInfo.DistinctBy(x => new { x.sTitle, x.sRelativePath })
.ToArray();
This is generally more efficient than GroupBy, and also more elegant in my view.
Personally I generally prefer using List<T> over arrays, but the above will create an array for you.
Note that with this code there can still be two items with the same title, and there can still be two items with the same relative path - there just can't be two items with the same relative path and title. If there are duplicate items, DistinctBy will always yield the first such item from the input sequence.
EDIT: Just to satisfy Michael, you don't actually need to create an array to start with, or create an array afterwards if you don't need it:
var query = listView1.Items
    .Cast<ListViewItem>()
    .Select(item => new stAppInfo
    {
        sTitle = item.Text,
        sRelativePath = item.SubItems[1].Text,
        bFindInstalled = item.SubItems[3].Text == "Sí",
        sFindTitle = item.SubItems[4].Text,
        sFindVersion = item.SubItems[5].Text,
        bChecked = item.SubItems[6].Text == "Sí"
    })
    .DistinctBy(x => new { x.sTitle, x.sRelativePath });
That will give you an IEnumerable<stAppInfo> which is lazily streamed. Note that if you iterate over it more than once, however, it will iterate over listView1.Items the same number of times, performing the same uniqueness comparisons each time.
I prefer this approach over Michael's as it makes the "distinct by" columns very clear in semantic meaning, and removes the repetition of the code used to extract those columns from a ListViewItem. Yes, it involves building more objects, but I prefer clarity over efficiency until benchmarking has proved that the more efficient code is actually required.
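Since the question's edit rules out external DLLs, note that a DistinctBy-style extension is small enough to drop into the project directly. Here is a minimal sketch of the idea (an illustrative implementation, not MoreLINQ's actual source):
public static class EnumerableExtensions
{
    // Yields the first element for each distinct key, preserving input order.
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (seenKeys.Add(keySelector(element)))
                yield return element;
        }
    }
}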
What you need is a Set. It ensures that the items entered into it are unique (based on some qualifier which you will set up). Here is how it is done:
First, change your struct to a class. There is really no getting around that.
Second, provide an implementation of IEqualityComparer<stAppInfo>. It may be a hassle, but it is the thing that makes your set work (which we'll see in a moment):
public class AppInfoComparer : IEqualityComparer<stAppInfo>
{
    public bool Equals(stAppInfo x, stAppInfo y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Equals(x.sTitle, y.sTitle)
            && Equals(x.sRelativePath, y.sRelativePath);
    }

    // this part is a pain, but this one is already written
    // specifically for your question.
    public int GetHashCode(stAppInfo obj) {
        unchecked {
            return ((obj.sTitle != null
                    ? obj.sTitle.GetHashCode() : 0) * 397)
                ^ (obj.sRelativePath != null
                    ? obj.sRelativePath.GetHashCode() : 0);
        }
    }
}
Then, when it is time to make your set, do this:
var appInfoSet = new HashSet<stAppInfo>(new AppInfoComparer());
foreach (ListViewItem item in listView1.Items)
{
    var newItem = new stAppInfo {
        sTitle = item.Text,
        sRelativePath = item.SubItems[1].Text,
        sCmdLine = item.SubItems[2].Text,
        bFindInstalled = (item.SubItems[3].Text.Equals("Sí")) ? true : false,
        sFindTitle = item.SubItems[4].Text,
        sFindVersion = item.SubItems[5].Text,
        bChecked = (item.SubItems[6].Text.Equals("Sí")) ? true : false
    };
    appInfoSet.Add(newItem);
}
appInfoSet now contains a collection of stAppInfo objects with unique Title/Path combinations, as per your requirement. If you must have an array, do this:
stAppInfo[] appInfo = appInfoSet.ToArray();
Note: I chose this implementation because it looks like the way you are already doing things. It has an easy-to-read foreach loop (though I do not need the counter variable). It does not involve LINQ (which can be troublesome if you aren't familiar with it). It requires no external libraries outside of what the .NET Framework provides. And finally, it gives you an array just like you've asked. As for reading the data in from an INI file, hopefully you can see that the only thing that will change is your foreach loop.
Update
Hash codes can be a pain. You might have been wondering why you need to compute them at all. After all, couldn't you just compare the values of the title and relative path on each insert? Well sure, of course you could, and that's exactly how another kind of set, called SortedSet, works. SortedSet makes you implement IComparer<T> in the same way that I implemented IEqualityComparer<T> above.
So, in this case, AppInfoComparer would look like this:
private class AppInfoComparer : IComparer<stAppInfo>
{
    // return -1 if x < y, 1 if x > y, or 0 if they are equal
    public int Compare(stAppInfo x, stAppInfo y)
    {
        var comparison = x.sTitle.CompareTo(y.sTitle);
        if (comparison != 0) return comparison;
        return x.sRelativePath.CompareTo(y.sRelativePath);
    }
}
And then the only other change you need to make is to use SortedSet instead of HashSet:
var appInfoSet = new SortedSet<stAppInfo>(new AppInfoComparer());
It's so much easier, in fact, that you are probably wondering what the catch is. The reason most people choose HashSet over SortedSet is performance. But you should balance that with how much you actually care, since you'll be maintaining that code. I personally use a tool called ReSharper, which is available for Visual Studio, and it computes these hash functions for me, because I think computing them is a pain, too.
(I'll talk about the complexity of the two approaches, but if you already know it, or are not interested, feel free to skip it.)
SortedSet has a complexity of O(log n): each time you enter a new item, it effectively goes to the halfway point of your set and compares. If it doesn't find your entry, it goes to the halfway point between its last guess and the group to the left or right of that guess, quickly whittling down the places for your element to hide. For a million entries, this takes about 20 attempts. Not bad at all. But if you've chosen a good hashing function, HashSet can do the same job, on average, in one comparison, which is O(1). And before you decide that 20 is not a big deal compared to 1 (after all, computers are pretty quick), remember that you had to insert those million items, so while HashSet took about a million comparisons to build the set up, SortedSet took around twenty million.
But there is a price: HashSet breaks down (very badly) if you choose a poor hashing function. If lots of items end up with the same hash code, they collide in the HashSet, which then has to try again and again; colliding items retrace each other's steps, and you will be waiting a long time. Building the full set of a million entries then degrades towards a million times a million attempts -- HashSet has devolved into O(n^2).
What's important with those big-O notations (which is what O(1), O(log n), and O(n^2) are, in fact) is how quickly the number in parentheses grows as you increase n. Slow growth or no growth is best; quick growth is sometimes unavoidable. For a dozen or even a hundred items the difference may be negligible, but if you can get into the habit of writing the efficient version as easily as the alternative, it's worth conditioning yourself to do so, since problems are cheapest to correct closest to the point where you created them.
Use LINQ2Objects, group by the things that should be unique and then select the first item in each group.
var noDupes = appInfo.GroupBy(
x => new { x.sTitle, x.sRelativePath })
.Select(g => g.First()).ToArray();
!!! An array of structs (value type) + sorting or any kind of search ==> a lot of unboxing operations.
I would suggest sticking with the recommendations of Jon and Henk: make it a class and use a generic List<T>.
Use LINQ GroupBy or DistinctBy; for me it is much simpler to use the built-in GroupBy, but it is also interesting to take a look at another popular library, perhaps it gives you some insights.
BTW, also take a look at the LambdaComparer; it will make your life easier each time you need this kind of in-place sorting/searching, etc...
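For reference, a lambda-based equality comparer along those lines might look like this (a minimal sketch, not the actual LambdaComparer library code, and it assumes stAppInfo has been changed to a class as recommended above):
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> equalsFunc;
    private readonly Func<T, int> hashFunc;

    public LambdaEqualityComparer(Func<T, T, bool> equalsFunc, Func<T, int> hashFunc)
    {
        this.equalsFunc = equalsFunc;
        this.hashFunc = hashFunc;
    }

    public bool Equals(T x, T y) { return equalsFunc(x, y); }
    public int GetHashCode(T obj) { return hashFunc(obj); }
}
Usage, keyed on the same two members as the other answers:
var comparer = new LambdaEqualityComparer<stAppInfo>(
    (a, b) => a.sTitle == b.sTitle && a.sRelativePath == b.sRelativePath,
    a => (a.sTitle ?? "").GetHashCode() ^ (a.sRelativePath ?? "").GetHashCode());
var uniqueSet = new HashSet<stAppInfo>(comparer);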
I have two Lists of words. I want to count the words larger than 3 characters that exist in both lists. Using C#, how would you solve it?
I would use
var count = Enumerable.Intersect(listA, listB).Count(word => word.Length > 3);
assuming listA and listB are of type IEnumerable<String>.
Assuming list 1 has length N and list 2 has length M:
I would first filter, since this is a cheap operation O(N+M), and then do the intersection, a relatively expensive operation based on the current implementation. The cost of the Intersect call is complicated and is fundamentally driven by the behaviour of the hash function:
If the hash function is poor then the performance can degrade to O(N*M) (as every string in one list is checked against every string in the other).
If the hash function is good then the cost of a lookup is simply O(1); thus there is a cost of M lookups against the hash and a cost of N to construct the hash, so the whole thing is also O(N+M) in time, but with an additional O(N) cost in space.
The construction of the backing set will be the killer in performance terms.
If you knew that both lists were, say, sorted already, then you could write your own intersection check with constant space overhead and O(N+M) worst-case running time, not to mention excellent memory locality (see the sketch at the end of this answer).
This leaves you with:
int count = Enumerable.Intersect(
list1.Where(word => word.Length > 3),
list2.Where(word => word.Length > 3)).Count();
Incidentally the Enumerable.Intersect method's performance behaviour may change considerably depending on the order of the arguments.
In most cases making the smaller of the two the first argument will produce faster, more memory efficient code and the first argument is used to construct a backing temporary (hash based) Set. This of course is coding to a (hidden) implementation detail so should be considered only if this is shown to be an issue after performance analysis and highlighted as a micro optimization if so.
The second filter, on list2, is fundamentally unnecessary for correctness (since the Intersect will remove such entries anyway), so it is quite possible that the following is faster:
int count = Enumerable.Intersect(
list1.Where(word => word.Length > 3),
list2).Count();
However, filtering by length is very cheap compared to calculating hash codes for long strings. Which version is better will only be found through benchmarking with inputs appropriate to your usage.
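As an aside, the hand-rolled check over two already-sorted lists mentioned earlier might look something like this (a sketch that assumes both lists are sorted ascending by ordinal comparison and contain no duplicates):
static int CountSharedLongWords(List<string> sortedA, List<string> sortedB)
{
    int i = 0, j = 0, count = 0;
    while (i < sortedA.Count && j < sortedB.Count)
    {
        int cmp = string.CompareOrdinal(sortedA[i], sortedB[j]);
        if (cmp < 0) i++;                 // advance the list holding the smaller word
        else if (cmp > 0) j++;
        else
        {
            if (sortedA[i].Length > 3) count++;   // word appears in both lists
            i++;
            j++;
        }
    }
    return count;
}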
List<string> t = new List<string>();
List<string> b = new List<string>();
...
Console.WriteLine(t.Count(x => x.Length > 3 && b.Contains(x)));