How to handle duplicate keys while joining two lists?

I'm new to C#.
I have the following struct.
struct Foo
{
string key;
Bar values;
}
I have two lists of Foo, L1 and L2, of equal size; both contain the same set of keys.
I have to merge the corresponding Foo instances in L1 and L2.
Foo Merge(Foo f1, Foo f2)
{
// merge f1 and f2.
return result;
}
I wrote the following to achieve this.
resultList = L1.Join(L2, f1 => f1.key, f2 => f2.key, (f1, f2) => Merge(f1, f2)).ToList();
My problem is that my key is not unique. I have n elements in L1 with the same key (say "key1"), which also appear somewhere in L2. So the above join statement selects n matching entries from L2 for each "key1" from L1, and I get n*n elements with key "key1" in the result where I want only n. (So this is a kind of cross product for that set of elements.)
I want to use Join and still select an element from L1 with "key1" and force the Linq to use the first available 'unused' "key1" element from L2. Is this possible? Is join a bad idea here?
(Also, I want to preserve the order of the keys as in L1. I tried handling all elements with such keys before the join and removing those entries from L1 and L2. This disturbed the order of the keys and it looked ugly.)
I'm looking for a solution without any explicit for loops.

From your comment on ElectricRouge's answer, you could do something like
var z = list1.Join(list2.GroupBy(m => m.Id),
m => m.Id,
g => g.Key,
(l1, l2) => new{l1, l2});
This would give you each element of list1 paired with the group of elements in list2 that share its key.
I'm not sure it's really readable.

I need to find the corresponding entries in two lists and do some operation on them. That is my preliminary requirement.
For this you can do something like this.
var keys = S1.Select(i => i.Key).ToList(); // make a list of all keys in S1
List<Foo> result = new List<Foo>();
foreach (var key in keys) // compare with S2 using the keys from S1
{
// collect the entries in S2 that have this key
var matches = S2.Where(i => i.Key == key);
result.AddRange(matches);
}
Is this what you are looking for?

I want to use Join and still select an element from L1 with "key1" and force the Linq to use the first available 'unused' "key1" element from L2. Is this possible?
When combining elements from the two lists you want to pick the first element in the second list having the same key as the element in the first list. (Previously, I interpreted your question differently, and a solution to that different problem is available in the edit history of this answer.)
For quick access to the desired values in the second list a dictionary is created providing lookup from keys to the desired value from the second list:
var dictionary2 = list2
.GroupBy(foo => foo.Key)
.ToDictionary(group => group.Key, group => group.First());
The use of First expresses the requirement that you want to pick the first element in the second list having the same key.
The merged list is now created by using projection over the first list:
var mergedList = list1.Select(
foo => Merge(
foo,
dictionary2[foo.Key]
)
);
The desired result is computed when you iterate mergedList with foreach or call ToList() on it.
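The dictionary approach above pairs every duplicate in the first list with the same first element of the second list. If each duplicate should instead consume the next 'unused' match, as the question asks, one possible sketch keeps a queue per key. Everything named here is an assumption for illustration: Foo is a stand-in with an int Value in place of the unspecified Bar, Merge simply adds the values, and MergeInOrder is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Foo; Value replaces the unspecified Bar.
public struct Foo
{
    public string Key;
    public int Value;
}

public static class PairwiseMerge
{
    // Hypothetical merge: keeps the key and adds the two values.
    public static Foo Merge(Foo f1, Foo f2) =>
        new Foo { Key = f1.Key, Value = f1.Value + f2.Value };

    public static List<Foo> MergeInOrder(List<Foo> list1, List<Foo> list2)
    {
        // One queue per key, holding list2's elements in their original order.
        var unused = list2
            .GroupBy(foo => foo.Key)
            .ToDictionary(group => group.Key, group => new Queue<Foo>(group));

        // Walk list1 in order; each element consumes the next unused match.
        // Dequeue throws if list2 runs out of elements for a key.
        return list1
            .Select(foo => Merge(foo, unused[foo.Key].Dequeue()))
            .ToList();
    }
}
```

The result has exactly one entry per element of list1, in list1's order, with no explicit loop. The projection has a side effect (Dequeue), so the sketch calls ToList() immediately rather than leaving the query lazy.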

You could use Union to remove the duplicated keys.
Documentation at http://msdn.microsoft.com/en-us/library/bb341731.aspx
List<int> list1 = new List<int> { 1, 12, 12, 5};
List<int> list2 = new List<int> { 12, 5, 7, 9, 1 };
List<int> ulist = list1.Union(list2).ToList();
Example taken from : how to merge 2 List<T> with removing duplicate values in C#
Or you can use Concat to merge several lists (keeping all keys).
See the documentation here: http://msdn.microsoft.com/en-us/library/bb302894(v=vs.110).aspx
var MyCombinedList = Collection1.Concat(Collection2)
.Concat(Collection3)
.ToList();
Example taken from same question : Merge two (or more) lists into one, in C# .NET

Finally I adapted Raphaël's answer as below.
public class FooComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo o1, Foo o2)
{
return o1.key == o2.key;
}
public int GetHashCode(Foo obj)
{
return obj.key.GetHashCode();
}
}
resultList = L1.Join(L2.Select(m => m).Distinct(new FooComparer()).ToList(), f1 => f1.key, f2 => f2.key, (f1, f2) => Merge(f1, f2)).ToList();
Short explanation:
L2.Select(m => m).Distinct(new FooComparer()).ToList()
creates a new list by removing the duplicate keys from L2. Join L1 with this new list to get the required result.

Related

How to combine different values of IEnumerable? [duplicate]

Struggling with the concept of this: I have a method that returns an IEnumerable like so:
public IEnumerable<IFruit> FruitStall(IEnumerable<IFruit> fruits)...
and each IFruit is as follows
public decimal Amount { get; }
public string Code { get; }
and my FruitStall method should return one IFruit per code, with the sum of the amounts of all fruits that have that code.
e.g.
{APL10, APL20, APL50} => {APL80}
or
{APL10, APL20, ORG50} => {APL30, ORG50}
Can anyone point me in the right direction of this? Not too sure how to go about this.
I was looping through the IEnumerable with a
foreach (var item in fruits)
{
fruits.Code
}
but I'm unsure where to go from there.

When you said:
{APL10, APL20, APL50} => {APL80}
or
{APL10, APL20, ORG50} => {APL30, ORG50}
If you meant
I have this: {new Fruit("APL",10), new Fruit("APL", 20), new Fruit("ORG",50)} and I want to generate a list like {new Fruit("APL",30), new Fruit("ORG",50)}
I would say:
You need to have some container that can hold all the different codes and map them to a sum of Amounts. For this we often use a dictionary:
var d = new Dictionary<string,decimal>();
foreach(var f in fruits){
if(!d.ContainsKey(f.Code))
d[f.Code] = f.Amount;
else
d[f.Code] += f.Amount;
}
At the end of this operation your dictionary will contain a unique set of fruit codes, each mapped to the sum of its amounts. You can turn it back into a list of fruit by enumerating the dictionary and creating a Fruit for each entry, in a similar way to how you created the list initially.
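As a rough sketch of the whole round trip (list to dictionary and back to a list), assuming a hypothetical concrete Fruit class with a (code, amount) constructor; the question only shows the two properties, so the class and the SumByCode helper name are illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical concrete fruit type; the question only shows Amount and Code.
public class Fruit
{
    public Fruit(string code, decimal amount) { Code = code; Amount = amount; }
    public string Code { get; }
    public decimal Amount { get; }
}

public static class FruitTotals
{
    public static List<Fruit> SumByCode(IEnumerable<Fruit> fruits)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var f in fruits)
        {
            // TryGetValue leaves 'current' at 0 when the code has not been seen yet.
            totals.TryGetValue(f.Code, out var current);
            totals[f.Code] = current + f.Amount;
        }

        // One Fruit per distinct code, carrying the summed amount.
        return totals.Select(pair => new Fruit(pair.Key, pair.Value)).ToList();
    }
}
```

Calling FruitTotals.SumByCode with APL 10, APL 20 and ORG 50 yields one APL entry with amount 30 and one ORG entry with amount 50.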
Once you get your head round that, you can take a look at using LINQ, and do something like:
var summedFruits = fruits
.GroupBy(f => f.Code)
.Select(g => new Fruit(g.Key, g.Sum(f => f.Amount)));
(This assumes your Fruit has a constructor that takes a Code and an Amount). When you GroupBy in LINQ you get an output that is like a List of Lists. The original input list is broken up into some number of lists; where everything in each list has the same value for what you declared was the key (I said to group by code)
So your original representation:
{APL10, APL20, ORG50}
Would end up looking like:
Key = APL, List = {APL10, APL20}
Key = ORG, List = {ORG50}
Your "one list of three things" has become "a list of (a list of two APL ) and (a list of one ORG)"
If you then run a select on them you can create a new Fruit that uses the Key as the code and sums up the amount in each list
Key = APL, List = {APL10, APL20}, Sum = 30
Key = ORG, List = {ORG50}, Sum = 50
In this code:
var summedFruits = fruits
.GroupBy(f => f.Code)
.Select(g => new Fruit(g.Key, g.Sum(f2=> f2.Amount)));
f is a fruit, one of the items in the original list of 3 fruits. We group by the fruit's code. g is the result of the grouping operation: a "list of fruit with a common code" whose Key is the code (APL or ORG). Because g is a list, you can call Sum on it. Every item inside g is a fruit, which is why I switch back to f (written f2 here) as a reminder that it is an individual fruit whose Amount we are summing. For the first list, APL, the sum is 30. At the end of the operation a new sequence results, one with two elements (an APL and an ORG) whose Amounts are the sums for each code.

var res = fruits.GroupBy(x => x.Code).ToDictionary(g => g.Key, g => g.Sum(x => x.Amount));

Lambda - user id in list existing in a list

list1 contains userid and username
list2 contains userids
Need to display the list1 where its userid is included in list2.
string userids = "user1,user2,user3";
var list2 = userids.Split(',').Select(userid => userid.Trim()).ToList();
list1 = list1.Any(x => x.UserID)... //Got stuck here

Better use HashSet<T> for search:
string userids = "user1,user2,user3";
var userIdSet = new HashSet<string>(userids.Split(',').Select(userid => userid.Trim()));
list1 = list1.Where(x => userIdSet.Contains(x.UserID)).ToList();

Another way is Enumerable.Join which is more efficient if the lists are pretty large:
var containedUsers = from x1 in list1
join x2 in list2 on x1.UserID equals x2
select x1;
list1 = containedUsers.ToList();
I assume that the user IDs in list2 are unique (otherwise use Distinct). If they are not, the join might produce duplicate items in list1.
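To make the duplicate concern concrete, here is a small sketch that applies Distinct to the id list before joining. The User class and the ContainedUsers helper are hypothetical; the question only says that list1 holds a user id and a user name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type matching the question's description of list1.
public class User
{
    public string UserID { get; set; }
    public string UserName { get; set; }
}

public static class UserFilter
{
    public static List<User> ContainedUsers(List<User> list1, List<string> list2)
    {
        // Distinct() guards against a repeated id in list2 matching a user twice.
        var query = from x1 in list1
                    join x2 in list2.Distinct() on x1.UserID equals x2
                    select x1;
        return query.ToList();
    }
}
```

With list2 containing "user1" twice, a plain join would return that user twice; with Distinct in place each matching user appears once.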

It's easy to get stuck on this. You need to check that list2 contains the UserID of the item you picked.
found = list1.Where(x => list2.Contains(x.UserID));

Method Any returns bool, it
Determines whether any element of a sequence satisfies a condition
Return Value
Type: System.Boolean
true if any elements in the source sequence pass the test in the specified predicate; otherwise, false.
Method Where
Filters a sequence of values based on a predicate.
So you can use Any inside Where to keep only the items whose UserID is contained in list2.
list1 = list1.Where(l1 => list2.Any(l2 => l2 == l1.UserID)).ToList();
References: Enumerable.Any(Of TSource) Method, Enumerable.Where(Of TSource) Method

Linq statement to select common elements between two collections

I'm trying to implement a search function, and I want to select all elements that are common in variable A and B and remove the rest.
My code looks like this:
A.ForEach(y =>
{
temp = temp.Where(x => x.Id== y.Id);
});
The problem is if A has some values that temp doesn't contain, I'll get an empty temp.
I hope I'm clear enough, but just to make sure:
If A contains 6, 10 and
temp contains 10, 7. I want to just have 10.
What's the correct join or other LINQ statement for this? I can't use Intersect since the two variables are from different tables.

You would want to use a Join.
A.Join(B, a => a.Id, b => b.Id, (a,b) => new { A = a, B = b });
This will result in an enumerable with the rows where A and B are joined and an anonymous type of:
public class AnonymousType {
AType A;
BType B;
}
Join information from C# Joins/Where with Linq and Lambda

You can try this solution; it works fine for me. It returns the elements shared by id between two sets, IEnumerable SetA and IEnumerable SetB:
IEnumerable<MyClassTypeA> SetA;
IEnumerable<MyClassTypeB> SetB;
Dictionary<Id, MyClassTypeA> entriesOfSetA= SetA.ToDictionary(x=>x.id);
var result= SetB.Where(x=> entriesOfSetA.ContainsKey(x.id));

Can you explain this lambda grouping function?

I've been using LINQ and Lambda Expressions for a while, but I'm still not completely comfortable with every aspect of the feature.
So, while I was working on a project recently I needed to get a distinct list of objects based off of some property, and I ran across this code. It works, and I'm fine with that, but I'd like to understand the grouping mechanism. I don't like simply plugging code in and running away from the problem if I can help it.
Anyways the code is:
var listDistinct
=list.GroupBy(
i => i.value1,
(key, group) => group.First()
).ToList();
In the code sample above, you're first calling GroupBy and passing it a lambda expression telling it to group by the property value1. The second section of the code is causing the confusion.
I understand that key is referencing value1 in the (key, group) statement, but I'm still not wrapping my head around everything that's taking place.

What does the expression
list.GroupBy(
i => i.value1,
(key, group) => group.First())
do?
This creates a query which, when executed, analyzes the sequence list to produce a sequence of groups, and then projects the sequence of groups into a new sequence. In this case, the projection is to take the first item out of each group.
The first lambda chooses the "key" upon which the groups are constructed. In this case, all items in the list which have the same value1 property are put in a group. The value that they share becomes the "key" of the group.
The second lambda projects from the sequence of keyed groups; it's as though you'd done a select on the sequence of groups. The net effect of this query is to choose a set of elements from the list such that each element of the resulting sequence has a different value of the value1 property.
The documentation is here:
http://msdn.microsoft.com/en-us/library/bb549393.aspx
If the documentation is not clear, I am happy to pass along criticisms to the documentation manager.
This code uses group as the formal parameter of a lambda. Isn't group a reserved keyword?
No, group is a contextual keyword. LINQ was added to C# 3.0, so there might have already been existing programs using group as an identifier. These programs would be broken when recompiled if group was made a reserved keyword. Instead, group is a keyword only in the context of a query expression. Outside of a query expression it is an ordinary identifier.
If you want to call attention to the fact that it is an ordinary identifier, or if you want to use the identifier group inside a query expression, you can tell the compiler "treat this as an identifier, not a keyword" by prefacing it with @. Were I writing the code above I would say
list.GroupBy(
i => i.value1,
(key, @group) => @group.First())
to make it clear.
Are there other contextual keywords in C#?
Yes. I've documented them all here:
http://ericlippert.com/2009/05/11/reserved-and-contextual-keywords/

To simplify, consider a list of int and how to get its distinct values using GroupBy:
var list = new[] {1, 2, 3, 1, 2, 2, 3};
If you call GroupBy with x => x, you will get 3 groups. Each group is an IGrouping<int, int>, which is itself an IEnumerable<int>, so the result behaves like:
IEnumerable<IEnumerable<int>>
{{1,1},{2,2,2},{3,3}}
The keys of the groups are 1, 2, 3. Calling group.First() then takes the first item of each group:
{1,1}: -> 1.
{2,2,2}: -> 2
{3,3} -> 3
So the final result is : {1, 2, 3}
Your case is similar to this.
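The walkthrough above can be written out as a short sketch. The class and method names (DistinctByGrouping, FirstOfEachGroup) are made up for illustration; only GroupBy, Select, First and ToList come from LINQ.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctByGrouping
{
    public static List<int> FirstOfEachGroup(IEnumerable<int> source)
    {
        return source
            .GroupBy(x => x)                 // for the sample input: {1,1}, {2,2,2}, {3,3}
            .Select(group => group.First())  // first element of each group
            .ToList();
    }
}
```

For the sample array {1, 2, 3, 1, 2, 2, 3} this returns the values 1, 2, 3, since groups are produced in the order in which their keys first appear.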

It uses this overload of the Enumerable.GroupBy method:
public static IEnumerable<TResult> GroupBy<TSource, TKey, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TKey, IEnumerable<TSource>, TResult> resultSelector
)
which, as stated on MSDN:
Groups the elements of a sequence according to a specified key selector function and creates a result value from each group and its key.
So, unlike other overloads which return a bunch of groups (i.e. IEnumerable<IGrouping<TK, TS>>), this overload allows you to project each group to a single instance of a TResult of your choice.
Note that you could get the same result using the basic GroupBy overload and a Select:
var listDistinct = list
.GroupBy(i => i.value1)
.Select(g => g.First())
.ToList();

(key, group) => group.First()
It's just taking the First() element within each group.
Within that lambda expression, key is the key that was used to create the group (value1 in your example) and group is an IEnumerable<T> with all the elements that have that key.

The below self describing example should help you understand the grouping:
class Item
{
public int Value { get; set; }
public string Text { get; set; }
}
static class Program
{
static void Main()
{
// Create some items
var item1 = new Item {Value = 0, Text = "a"};
var item2 = new Item {Value = 0, Text = "b"};
var item3 = new Item {Value = 1, Text = "c"};
var item4 = new Item {Value = 1, Text = "d"};
var item5 = new Item {Value = 2, Text = "e"};
// Add items to the list
var itemList = new List<Item>(new[] {item1, item2, item3, item4, item5});
// Split items into groups by their Value
// Result contains three groups.
// Each group has a distinct groupedItems.Key --> {0, 1, 2}
// Each key contains a collection of remaining elements: {0 --> a, b} {1 --> c, d} {2 --> e}
var groupedItemsByValue = from item in itemList
group item by item.Value
into groupedItems
select groupedItems;
// Take first element from each group: {0 --> a} {1 --> c} {2 --> e}
var firstTextsOfEachGroup = from groupedItems in groupedItemsByValue
select groupedItems.First();
// The final result
var distinctTexts = firstTextsOfEachGroup.ToList(); // Contains items where Text is: a, c, e
}
}

It's equivalent to
var listDistinct=(
from i in list
group i by i.value1 into g
select g.First()
).ToList();
The part i => i.value1 in your original code is the key selector. In this code, it is simply i.value1 in the group ... by ... syntax, which groups elements by key.
The part (key, group) => group.First() in the original code is the result selector delegate. In this code, it is written in the query syntax of from ... select. Here g is group in the original code.

Sort one list by another

I have 2 list objects: one is just a list of ints, the other is a list of objects, but the objects have an ID property.
What I want to do is sort the list of objects by ID in the same order as the list of ints.
I've been playing around for a while now trying to get it working, so far no joy.
Here is what i have so far...
//**************************
//*** Randomize the list ***
//**************************
if (Session["SearchResultsOrder"] != null)
{
// save the session as a int list
List<int> IDList = new List<int>((List<int>)Session["SearchResultsOrder"]);
// the saved list session exists, make sure the list is ordered by this
foreach(var i in IDList)
{
SearchData.ReturnedSearchedMembers.OrderBy(x => x.ID == i);
}
}
else
{
// before any sorts randomize the results - this mixes it up a bit as before it would order the results by member registration date
List<Member> RandomList = new List<Member>(SearchData.ReturnedSearchedMembers);
SearchData.ReturnedSearchedMembers = GloballyAvailableMethods.RandomizeGenericList<Member>(RandomList, RandomList.Count).ToList();
// save the order of these results so they can be restored back during postback
List<int> SearchResultsOrder = new List<int>();
SearchData.ReturnedSearchedMembers.ForEach(x => SearchResultsOrder.Add(x.ID));
Session["SearchResultsOrder"] = SearchResultsOrder;
}
The whole point of this is so when a user searches for members, initially they display in a random order, then if they click page 2, they remain in that order and the next 20 results display.
I have been reading about the IComparer I can use as a parameter in the Linq.OrderBy clause, but I can't find any simple examples.
I'm hoping for an elegant, very simple LINQ-style solution; well, I can always hope.
Any help is most appreciated.

Another LINQ-approach:
var orderedByIDList = from i in ids
join o in objectsWithIDs
on i equals o.ID
select o;

One way of doing it:
List<int> order = ....;
List<Item> items = ....;
Dictionary<int,Item> d = items.ToDictionary(x => x.ID);
List<Item> ordered = order.Select(i => d[i]).ToList();

Not an answer to this exact question, but if you have two arrays, there is an overload of Array.Sort that takes the array to sort, and an array to use as the 'key'
https://msdn.microsoft.com/en-us/library/85y6y2d3.aspx
Array.Sort Method (Array, Array)
Sorts a pair of one-dimensional Array objects (one contains the keys
and the other contains the corresponding items) based on the keys in
the first Array using the IComparable implementation of each key.

Join is the best candidate if you want to match on the exact integer (if no match is found you get an empty sequence). If you merely want to apply the sort order of the other list (and provided the number of elements in both lists is equal), you can use Zip.
var result = objects.Zip(ints, (o, i) => new { o, i})
.OrderBy(x => x.i)
.Select(x => x.o);
Pretty readable.

Here is an extension method which encapsulates Simon D.'s response for lists of any type.
public static IEnumerable<TResult> SortBy<TResult, TKey>(this IEnumerable<TResult> sortItems,
IEnumerable<TKey> sortKeys,
Func<TResult, TKey> matchFunc)
{
return sortKeys.Join(sortItems,
k => k,
matchFunc,
(k, i) => i);
}
Usage is something like:
var sorted = toSort.SortBy(sortKeys, i => i.Key);

One possible solution:
myList = myList.OrderBy(x => Ids.IndexOf(x.Id)).ToList();
Note: use this only if you are working with in-memory lists. It doesn't work for IQueryable, as IQueryable does not contain a definition for IndexOf.
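IndexOf scans the id list once per element, which can get slow for long lists. A possible variation, sketched here under the assumption that the ids are unique, builds the id-to-position lookup once up front. Doc and SortByIds are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with an integer Id.
public class Doc
{
    public int Id { get; set; }
}

public static class OrderByOtherList
{
    public static List<Doc> SortByIds(IEnumerable<Doc> docs, IList<int> ids)
    {
        // Map each id to its position once, instead of calling IndexOf per element.
        // Assumes ids are unique; ToDictionary throws on a repeated id.
        var position = ids
            .Select((id, index) => new { id, index })
            .ToDictionary(p => p.id, p => p.index);

        // Items whose Id is missing from ids sort last. That is a choice made for
        // this sketch; with IndexOf they would get -1 and sort first.
        return docs
            .OrderBy(d => position.TryGetValue(d.Id, out var i) ? i : int.MaxValue)
            .ToList();
    }
}
```

Like the IndexOf version, this only applies to in-memory collections.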

docs = docs.OrderBy(d => docsIds.IndexOf(d.Id)).ToList();
