I've been using LINQ and Lambda Expressions for a while, but I'm still not completely comfortable with every aspect of the feature.
So, while I was working on a project recently, I needed to get a distinct list of objects based on some property, and I ran across this code. It works, and I'm fine with that, but I'd like to understand the grouping mechanism. I don't like simply plugging code in and running away from the problem if I can help it.
Anyway, the code is:
var listDistinct = list.GroupBy(
    i => i.value1,
    (key, group) => group.First()
).ToList();
In the code sample above, you're first calling GroupBy and passing it a lambda expression telling it to group by the property value1. The second part of the code is what's causing the confusion.
I understand that key refers to value1 in the (key, group) lambda, but I'm still not wrapping my head around everything that's taking place.
What does the expression
list.GroupBy(
    i => i.value1,
    (key, group) => group.First())
do?
This creates a query which, when executed, analyzes the sequence list to produce a sequence of groups, and then projects the sequence of groups into a new sequence. In this case, the projection is to take the first item out of each group.
The first lambda chooses the "key" upon which the groups are constructed. In this case, all items in the list which have the same value1 property are put in a group. The value that they share becomes the "key" of the group.
The second lambda projects from the sequence of keyed groups; it's as though you'd done a select on the sequence of groups. The net effect of this query is to choose a set of elements from the list such that each element of the resulting sequence has a different value of the value1 property.
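As a concrete illustration, here is a minimal sketch (the Item class and its value1/Name properties are hypothetical, just to make the snippet self-contained):
class Item
{
    public int value1 { get; set; }
    public string Name { get; set; }
}

var list = new List<Item>
{
    new Item { value1 = 1, Name = "a" },
    new Item { value1 = 1, Name = "b" },
    new Item { value1 = 2, Name = "c" }
};

// key selector: elements sharing the same value1 land in one group;
// result selector: each group is projected to its first element
var listDistinct = list.GroupBy(
    i => i.value1,
    (key, group) => group.First()
).ToList();

// listDistinct now contains the "a" item (key 1) and the "c" item (key 2)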
The documentation is here:
http://msdn.microsoft.com/en-us/library/bb549393.aspx
If the documentation is not clear, I am happy to pass along criticisms to the documentation manager.
This code uses group as the formal parameter of a lambda. Isn't group a reserved keyword?
No, group is a contextual keyword. LINQ was added to C# 3.0, so there might have already been existing programs using group as an identifier. These programs would be broken when recompiled if group was made a reserved keyword. Instead, group is a keyword only in the context of a query expression. Outside of a query expression it is an ordinary identifier.
If you want to call attention to the fact that it is an ordinary identifier, or if you want to use the identifier group inside a query expression, you can tell the compiler "treat this as an identifier, not a keyword" by prefacing it with @. Were I writing the code above I would say
list.GroupBy(
    i => i.value1,
    (key, @group) => @group.First())
to make it clear.
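For example, a small sketch showing both sides of this (the code below is just an illustration, not from the original answer):
// Outside a query expression 'group' is an ordinary identifier, so this compiles fine:
var group = new List<int> { 1, 2, 3 };
Console.WriteLine(group.Count);

// The @ prefix always means "treat this as an identifier, not a keyword",
// which even works for fully reserved words:
var @int = 42;
Console.WriteLine(@int);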
Are there other contextual keywords in C#?
Yes. I've documented them all here:
http://ericlippert.com/2009/05/11/reserved-and-contextual-keywords/
I would like to simplify this to a list of ints and show how to get the distinct values of such a list using GroupBy:
var list = new[] {1, 2, 3, 1, 2, 2, 3};
If you call GroupBy with x => x, you will get 3 groups (an IEnumerable<IGrouping<int, int>>, which you can read as a sequence of sequences):
{{1,1},{2,2,2},{3,3}}
The keys of the groups are 1, 2 and 3. Then, calling group.First() returns the first item of each group:
{1,1} -> 1
{2,2,2} -> 2
{3,3} -> 3
So the final result is {1, 2, 3}.
Your case is analogous to this.
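Put together as runnable code, the same idea looks like this (a minimal sketch):
var list = new[] { 1, 2, 3, 1, 2, 2, 3 };

// group by the value itself, then take the first element of each group
var distinct = list
    .GroupBy(x => x, (key, group) => group.First())
    .ToList();

Console.WriteLine(string.Join(", ", distinct)); // 1, 2, 3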
It uses this overload of the Enumerable.GroupBy method:
public static IEnumerable<TResult> GroupBy<TSource, TKey, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TKey, IEnumerable<TSource>, TResult> resultSelector
)
which, as stated on MSDN:
Groups the elements of a sequence according to a specified key selector function and creates a result value from each group and its key.
So, unlike other overloads which return a bunch of groups (i.e. IEnumerable<IGrouping<TK, TS>>), this overload allows you to project each group to a single instance of a TResult of your choice.
Note that you could get the same result using the basic GroupBy overload and a Select:
var listDistinct = list
    .GroupBy(i => i.value1)
    .Select(g => g.First())
    .ToList();
(key, group) => group.First()
It's just taking the First() element within each group.
Within that lambda expression, key is the key that was used to create that group (value1 in your example) and group is an IEnumerable<T> with all elements that have that key.
The self-describing example below should help you understand the grouping:
class Item
{
    public int Value { get; set; }
    public string Text { get; set; }
}

static class Program
{
    static void Main()
    {
        // Create some items
        var item1 = new Item { Value = 0, Text = "a" };
        var item2 = new Item { Value = 0, Text = "b" };
        var item3 = new Item { Value = 1, Text = "c" };
        var item4 = new Item { Value = 1, Text = "d" };
        var item5 = new Item { Value = 2, Text = "e" };

        // Add items to the list
        var itemList = new List<Item>(new[] { item1, item2, item3, item4, item5 });

        // Split items into groups by their Value.
        // Result contains three groups.
        // Each group has a distinct groupedItems.Key --> {0, 1, 2}
        // Each key contains a collection of the remaining elements: {0 --> a, b} {1 --> c, d} {2 --> e}
        var groupedItemsByValue = from item in itemList
                                  group item by item.Value
                                  into groupedItems
                                  select groupedItems;

        // Take the first element from each group: {0 --> a} {1 --> c} {2 --> e}
        var firstTextsOfEachGroup = from groupedItems in groupedItemsByValue
                                    select groupedItems.First();

        // The final result
        var distinctTexts = firstTextsOfEachGroup.ToList(); // Contains the items where Text is: a, c, e
    }
}
It's equivalent to:
var listDistinct = (
    from i in list
    group i by i.value1 into g
    select g.First()
).ToList();
The part i => i.value1 in your original code is the key selector. In the query syntax it corresponds to the key in group i by i.value1.
The part (key, group) => group.First() in the original code is the result selector delegate. In the query syntax it is written more declaratively as the select clause; here g plays the role of group in the original code.
I currently have what I believe is a lambda function in C# (I'm fairly new to coding and haven't used a lambda function before, so go easy on me). It groups duplicate strings (from FilteredList), counts the number of occurrences, and stores that value in Count. I only want the most used word from the list, which I've managed to get with the "groups.OrderBy()... etc" line, but I'm pretty sure I've made this very complicated for myself and very inefficient, especially by adding the dictionary and the key/value pairs.
var groups =
    from s in FilteredList
    group s by s into g
    // orderby g descending
    select new
    {
        Stuff = g.Key,
        Count = g.Count()
    };

groups = groups.OrderBy(g => g.Count).Reverse().Take(1);
var dictionary = groups.ToDictionary(g => g.Stuff, g => g.Count);

foreach (KeyValuePair<string, int> kvp in dictionary)
{
    Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, kvp.Value);
}
Would someone please either help me through this and explain a little bit of it to me, or at least point me in the direction of some learning materials that may help me better understand it?
For extra info: The FilteredList comes from a large piece of external text, read into a List of strings (split by delimiters), minus a list of string stop words.
Also, if this is not a lambda function or I've got any of the info in here incorrect, please kindly correct me so I can fix the question to be more relevant & help me find an answer.
Thanks in advance.
Yes, I think you have overcomplicated it somewhat. Assuming your list of words is like:
var words = new[] { "what's", "the", "most", "most", "most", "mentioned", "word", "word" };
You can get the most mentioned word with:
words.GroupBy(w => w).OrderByDescending(g => g.Count()).First().Key;
Of course, you'd probably want to assign it to a variable, and presentationally you might want to break it into multiple lines:
var mostFrequentWord = words
    .GroupBy(w => w)                   // make a list of sublists of words, like a dictionary of word:list<word>
    .OrderByDescending(g => g.Count()) // order by sublist count descending
    .First()                           // take the first list:sublist
    .Key;                              // take the word
The GroupBy produces a collection of IGroupings, which is like a Dictionary<string, List<string>>. It maps each word (the key of the dictionary) to a list of all the occurrences of that word. In my example data, the IGrouping with the Key of "most" will be mapped to a List<string> of {"most","most","most"}, which has the highest count of elements at 3. If we OrderByDescending the groupings based on the Count() of each of the lists and then take the First, we'll get the IGrouping with a Key of "most", so all we need to do to retrieve the actual word is pull the Key out.
If the word is just one of the properties of a larger object, then you can .GroupBy(o => o.Word). If you want some other element from the IGrouping, such as its first or last, you can take that instead of the Key, but bear in mind that the element you end up taking might be different each time unless you enforce an ordering of the list inside the grouping.
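For instance, a hedged sketch with a hypothetical Post type that has Word and Date properties (not part of the original question):
var topGroup = posts
    .GroupBy(p => p.Word)                     // group the objects by their Word property
    .OrderByDescending(g => g.Count())
    .First();

string word = topGroup.Key;                   // the most frequent word
var earliest = topGroup.OrderBy(p => p.Date)  // enforce an ordering inside the group
                       .First();              // before taking an element from it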
If you want to make this more efficient then you can install MoreLinq and use MaxBy; getting the Max word By the count of the lists means you can avoid a sort operation. You could also avoid LINQ and use a dictionary:
string[] words = new[] { "what", "is", "the", "most", "most", "most", "mentioned", "word", "word" };

var maxK = "";
var maxV = -1;
var d = new Dictionary<string, int>();

foreach (var w in words)
{
    if (!d.ContainsKey(w))
        d[w] = 0;
    d[w]++;
    if (d[w] > maxV)
    {
        maxK = w;
        maxV = d[w];
    }
}

Console.WriteLine(maxK);
This keeps a dictionary that counts words as it goes, and will be more efficient than the LINQ route as it needs only a single pass of the word list, plus the associated dictionary lookups in contrast to "convert wordlist to list of sublists, sort list of sublists by sublist count, take first list item"
This should work:
var mostPopular = groups
    .GroupBy(item => new { item.Stuff, item.Count })
    .Select(g => g.OrderByDescending(x => x.Count).FirstOrDefault())
    .ToList();
OrderByDescending along with .First() combines your usage of OrderBy, Reverse() and Take.
The first part is a LINQ query that reads the groups from FilteredList.
var groups =
    from s in FilteredList
    group s by s into g
    // orderby g descending
    select new
    {
        Stuff = g.Key,
        Count = g.Count()
    };
The lambda usage starts where the => operator appears. Basically, it means the expression is going to be evaluated at run time, producing an object of that type/shape.
Example on your code:
groups = groups.OrderBy(g => g.Count).Reverse().Take(1);
Reading this, 'g' represents an element of 'groups', which has a 'Count' property. Because the result is a sequence, 'Reverse' can be applied and 'Take' retrieves the first element only.
As for documentation, it's best to search within Stack Overflow; please check these links:
C# Lambda expressions: Why should I use them? - StackOverflow
Lambda Expressions in C# - external
Using a Lambda Expression Over a List in C# - external
Second step: if the data is coming from an external source and there are no performance issues, you can leave the code as it is and refactor later. A more detailed analysis of the data would need to be made to ensure another algorithm works.
In .NET / C# I have input data of type IEnumerable<T> with T having some properties I want to use for lookups.
How can I build a two-level (maybe three-level) lookup using LanguageExt without producing hard-to-read code like this:
var items = Range(1, 1000)
    .Map(i => new
    {
        Number = i,
        Section = (byte)(i % 10),
        Text = $"Number is {i}"
    }); // just some test data

HashMap<byte, HashMap<int, string>> lookup =
    toHashMap(
        from item in items
        group item.Text by (item.Section, item.Number) into gInner
        group gInner by gInner.Key.Section into gOuter
        select (gOuter.Key, toHashMap(gOuter.Map(_ => (_.Key.Number, _.Head()))))
    );
Expected output: lookup hashmap with Section as outer key, Number as inner key and Text as value.
I prefer solutions using LINQ syntax (maybe making it easier to combine this with transforming / filtering / ordering ...).
Language-ext has built-in extension methods for dealing with nested HashMap and Map types (up to four nested levels deep).
So, to look up a value:
HashMap<int, HashMap<string, DateTime>> lookup = ...;
var value = lookup.Find(123, "Hello");
You can also add:
lookup = lookup.AddOrUpdate(123, "Hello", value);
There's also Remove, MapRemoveT, MapT, FilterT, FilterRemoveT, Exists, ForAll, SetItemT, TrySetItemT, and FoldT.
So, to answer your specific question:
var map = items.Fold(
HashMap<byte, HashMap<int, string>>(),
(s, item) => s.AddOrUpdate(item.Section, item.Number, item.Text));
If you do this a lot, then you could generalise it into an extension of Seq<(A, B, C)>
public static HashMap<A, HashMap<B, C>> ToHashMap<A, B, C>(this Seq<(A, B, C)> items) =>
items.Fold(
HashMap<A, HashMap<B, C>>(),
(s, item) => s.AddOrUpdate(item.Item1, item.Item2, item.Item3));
I had the same requirements, that's why I added these to language-ext :)
I'm new to C#.
I have the following struct.
struct Foo
{
string key;
Bar values;
}
I have two lists of Foo, L1 and L2, of equal size; both contain the same set of keys.
I have to merge the corresponding Foo instances in L1 and L2.
Foo Merge(Foo f1, Foo f2)
{
    // merge f1 and f2
    return result;
}
I wrote the following to achieve this.
resultList = L1.Join(L2,
    f1 => f1.key,
    f2 => f2.key,
    (f1, f2) => Merge(f1, f2)).ToList();
My problem is that my key is not unique. I have n elements in L1 with the same key (say "key1"), which also appear somewhere in L2. So, the above join statement selects n matching entries from L2 for each "key1" from L1 and I get n*n elements with key "key1" in the result, where I want only n. (So, this is a kind of cross product for that set of elements.)
I want to use Join and still select an element from L1 with "key1" and force LINQ to use the first available 'unused' "key1" element from L2. Is this possible? Is Join a bad idea here?
(Also, I want to preserve the order of the keys as in L1. I tried to handle all elements with such keys before the join and removed those entries from L1 and L2. This disturbed the order of the keys and it looked ugly.)
I'm looking for a solution without any explicit for loops.
From your comment on ElectricRouge's answer, you could do something like:
var z = list1.Join(list2.GroupBy(m => m.Id),
    m => m.Id,
    g => g.Key,
    (l1, l2) => new { l1, l2 });
This gives you each element of list1 paired with the corresponding group of matching elements from list2.
Not sure it's really readable.
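If the goal is to actually merge the pairs, a follow-up along these lines could work (a sketch only, reusing the asker's Merge method and taking the first element of each group):
var merged = z.Select(p => Merge(p.l1, p.l2.First())).ToList();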
I need to find the corresponding entries in two lists and do some operation on them. That is my preliminary requirement.
For this, you can do something like this:
var z = S1.Select(i => i.Key).ToList(); // make a list of all keys in S1
List<Foo> result = new List<Foo>();
foreach (var key in z) // compare with S2 using the keys in z
{
    var x = S2.First(i => i.Key == key);
    result.Add(x);
}
Is this what you are looking for?
I want to use Join and still select an element from L1 with "key1" and force the Linq to use the first available 'unused' "key1" element from L2. Is this possible?
When combining elements from the two lists you want to pick the first element in the second list having the same key as the element in the first list. (Previously, I interpreted your question differently; a solution to that different problem is available in the edit history of this answer.)
For quick access to the desired values in the second list a dictionary is created providing lookup from keys to the desired value from the second list:
var dictionary2 = list2
.GroupBy(foo => foo.Key)
.ToDictionary(group => group.Key, group => group.First());
The use of First expresses the requirement that you want to pick the first element in the second list having the same key.
The merged list is now created by using projection over the first list:
var mergedList = list1.Select(
    foo => Merge(
        foo,
        dictionary2[foo.Key]
    )
);
When you iterate mergedList with foreach, or call ToList() on it, the desired result will be computed.
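For example (names taken from the answer above; this is just the final materialization step):
// forces the query to run, producing one merged Foo per element of list1
List<Foo> resultList = mergedList.ToList();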
You could use Union to remove the duplicated keys.
Documentation at http://msdn.microsoft.com/en-us/library/bb341731.aspx
List<int> list1 = new List<int> { 1, 12, 12, 5};
List<int> list2 = new List<int> { 12, 5, 7, 9, 1 };
List<int> ulist = list1.Union(list2).ToList();
Example taken from: how to merge 2 List<T> with removing duplicate values in C#
Or you can use Concat to merge the lists while keeping all keys (including duplicates).
See the documentation here: http://msdn.microsoft.com/en-us/library/bb302894(v=vs.110).aspx
var MyCombinedList = Collection1.Concat(Collection2)
.Concat(Collection3)
.ToList();
Example taken from same question : Merge two (or more) lists into one, in C# .NET
Finally I adapted Raphaël's answer as below.
public class FooComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo o1, Foo o2)
    {
        return o1.key == o2.key;
    }

    public int GetHashCode(Foo obj)
    {
        return obj.key.GetHashCode();
    }
}
resultList = L1.Join(L2.Select(m => m).Distinct(new FooComparer()).ToList(),
    f1 => f1.key,
    f2 => f2.key,
    (f1, f2) => Merge(f1, f2)).ToList();
Short explanation:
L2.Select(m => m).Distinct(new FooComparer()).ToList()
creates a new list by removing the duplicate keys from L2. Join L1 with this new list to get the required result.
Assuming I have the following string array:
string[] str = new string[] { "max", "min", "avg", "max", "avg", "min" };
Is it possible to use LINQ to get a list of indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].
.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new {i, s})
.Where(t => t.s == "avg")
.Select(t => t.i)
.ToList()
The result will be a list containing 2 and 4.
Documentation here
You can do it like this:
str.Select((v,i) => new {Index = i, Value = v}) // Pair up values and indexes
.Where(p => p.Value == "avg") // Do the filtering
.Select(p => p.Index); // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.
You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
                      .Where(x => x.s == "avg")
                      .Select(x => x.index)
                      .ToList();
If you just want to find the first/last index, there are also the built-in methods Array.IndexOf and Array.LastIndexOf (List<T> has matching IndexOf/LastIndexOf methods if you're working with a list):
int firstIndex = Array.IndexOf(str, "avg");
int lastIndex = Array.LastIndexOf(str, "avg");
(or you can use the overloads that take a start index to specify the start position)
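If you need every matching index without LINQ, the start-index overload can be used in a loop (a small sketch):
var indexes = new List<int>();
int idx = Array.IndexOf(str, "avg");
while (idx >= 0)
{
    indexes.Add(idx);
    idx = Array.IndexOf(str, "avg", idx + 1);
}
// indexes now contains 2 and 4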
First off, your code doesn't actually iterate over the list twice, it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Length)
                       .Where(i => str[i] == "avg")
                       .ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "Whenever someone asks me for an item I'll ask my input sequence for an item; if the function says it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them, what happens is: ToList asks Where for its first item, Where asks Select for its first item, and Select asks the list for its first item. The list then provides its first item. Select transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item, runs its function, which determines that it's true, and so spits out 0 to ToList, which adds it to the list. That whole thing then happens again for each remaining item. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first to you, it's actually pretty easy for the computer to do all of this. It's not actually as performance intensive as it may seem at first.
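A small sketch that makes the single pass visible (the logging is added purely for illustration and assumes the str array from the question):
var pipeline = str
    .Select((s, i) => { Console.WriteLine($"Select saw {s}"); return new { i, s }; })
    .Where(t => { Console.WriteLine($"Where saw {t.s}"); return t.s == "avg"; })
    .Select(t => t.i);

// Nothing has printed yet: the pipeline is only a description of the work.
Console.WriteLine("Calling ToList...");
var result = pipeline.ToList(); // each element now flows through Select and Where exactly once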
While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
    if (source == null)
        throw new ArgumentNullException("source");

    int i = 0;
    foreach (T item in source)
    {
        if (object.Equals(itemToFind, item))
        {
            yield return i;
        }
        i++;
    }
}
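Assuming the helper lives in a static class (or is turned into an extension method with a this modifier), usage might look like:
var avgIndexes = Indexes(str, "avg").ToList(); // { 2, 4 }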
You need a combined select-and-where operator; compared to the accepted answer this will be cheaper, since it won't require intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(
    this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
    int index = -1;
    foreach (var s in source)
    {
        checked { ++index; }
        if (filter(s))
            yield return selector(s, index);
    }
}
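With the question's array, a call might look like this (a small usage sketch):
List<int> avgIndexes = str.SelectWhere(s => s == "avg", (s, i) => i).ToList(); // { 2, 4 }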
I have 2 list objects; one is just a list of ints, the other is a list of objects, but the objects have an ID property.
What I want to do is sort the list of objects by ID in the same sort order as the list of ints.
I've been playing around for a while now trying to get it working, so far with no joy.
Here is what I have so far...
//**************************
//*** Randomize the list ***
//**************************
if (Session["SearchResultsOrder"] != null)
{
    // save the session as an int list
    List<int> IDList = new List<int>((List<int>)Session["SearchResultsOrder"]);

    // the saved list session exists, make sure the list is ordered by this
    foreach (var i in IDList)
    {
        SearchData.ReturnedSearchedMembers.OrderBy(x => x.ID == i);
    }
}
else
{
    // before any sorts, randomize the results - this mixes it up a bit as before it would order the results by member registration date
    List<Member> RandomList = new List<Member>(SearchData.ReturnedSearchedMembers);
    SearchData.ReturnedSearchedMembers = GloballyAvailableMethods.RandomizeGenericList<Member>(RandomList, RandomList.Count).ToList();

    // save the order of these results so they can be restored back during postback
    List<int> SearchResultsOrder = new List<int>();
    SearchData.ReturnedSearchedMembers.ForEach(x => SearchResultsOrder.Add(x.ID));
    Session["SearchResultsOrder"] = SearchResultsOrder;
}
The whole point of this is so when a user searches for members, initially they display in a random order, then if they click page 2, they remain in that order and the next 20 results display.
I have been reading about the IComparer I can use as a parameter in the LINQ OrderBy clause, but I can't find any simple examples.
I’m hoping for an elegant, very simple LINQ style solution, well I can always hope.
Any help is most appreciated.
Another LINQ-approach:
var orderedByIDList = from i in ids
join o in objectsWithIDs
on i equals o.ID
select o;
One way of doing it:
List<int> order = ....;
List<Item> items = ....;
Dictionary<int,Item> d = items.ToDictionary(x => x.ID);
List<Item> ordered = order.Select(i => d[i]).ToList();
Not an answer to this exact question, but if you have two arrays, there is an overload of Array.Sort that takes the array to sort, and an array to use as the 'key'
https://msdn.microsoft.com/en-us/library/85y6y2d3.aspx
Array.Sort Method (Array, Array)
Sorts a pair of one-dimensional Array objects (one contains the keys and the other contains the corresponding items) based on the keys in the first Array using the IComparable implementation of each key.
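A quick sketch of that overload with hypothetical sample data:
int[] keys = { 3, 1, 2 };
string[] items = { "c", "a", "b" };

Array.Sort(keys, items); // sorts both arrays by the keys
// keys  is now { 1, 2, 3 }
// items is now { "a", "b", "c" }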
Join is the best candidate if you want to match on the exact integer (if no match is found you get an empty sequence). If you want to merely get the sort order of the other list (and provided the number of elements in both lists are equal), you can use Zip.
var result = objects.Zip(ints, (o, i) => new { o, i})
.OrderBy(x => x.i)
.Select(x => x.o);
Pretty readable.
Here is an extension method which encapsulates Simon D.'s response for lists of any type.
public static IEnumerable<TResult> SortBy<TResult, TKey>(this IEnumerable<TResult> sortItems,
IEnumerable<TKey> sortKeys,
Func<TResult, TKey> matchFunc)
{
return sortKeys.Join(sortItems,
k => k,
matchFunc,
(k, i) => i);
}
Usage is something like:
var sorted = toSort.SortBy(sortKeys, i => i.Key);
One possible solution:
myList = myList.OrderBy(x => Ids.IndexOf(x.Id)).ToList();
Note: use this if you're working with in-memory lists; it doesn't work for the IQueryable type, as IQueryable does not contain a definition for IndexOf.
docs = docs.OrderBy(d => docsIds.IndexOf(d.Id)).ToList();