Overlay/Join two collections with Linq - c#

I have the following scenario:
List 1 has 20 items of type TItem, List 2 has 5 items of the same type. List 1 already contains the items from List 2 but in a different state. I want to overwrite the 5 items in List 1 with the items from List 2.
I thought a join might work, but I want to overwrite the items in List 1, not join them together and have duplicates.
There is a unique key, of type int, that can be used to find which items to overwrite in List 1.

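You could use the built-in LINQ .Except(), but it wants an IEqualityComparer, so use a "fluent" version of .Except() that takes a key selector instead. Note that such an Except(second, keySelector) overload is not part of System.Linq; you have to write it yourself as an extension method (on .NET 6 and later, Enumerable.ExceptBy serves the same purpose).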
Assuming an object with an integer key as you indicated:
public class Item
{
    public int Key { get; set; }
    public int Value { get; set; }
    public override string ToString()
    {
        return String.Format("{{{0}:{1}}}", Key, Value);
    }
}
The original list of objects can be merged with the changed one as follows:
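IEnumerable<Item> original = new[] { 1, 2, 3, 4, 5 }.Select(x => new Item
{
    Key = x,
    Value = x
});
IEnumerable<Item> changed = new[] { 2, 3, 5 }.Select(x => new Item
{
    Key = x,
    Value = x * x
});
// Except(second, keySelector) is a custom extension method, not part of System.Linq.
IEnumerable<Item> result = original.Except(changed, x => x.Key).Concat(changed);
// IEnumerable<T> has no ForEach method, so use a plain foreach loop.
foreach (Item item in result)
{
    Console.WriteLine(item);
}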
output:
{1:1}
{4:4}
{2:4}
{3:9}
{5:25}
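The Except(changed, x => x.Key) call above relies on an extension method that is not shown and is not part of System.Linq. A minimal sketch of such a helper, assuming you only need to drop the items whose key appears in the second sequence (the class name KeyedEnumerableExtensions is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedEnumerableExtensions
{
    // Hypothetical helper (not part of System.Linq): yields the items of 'first'
    // whose key does not occur among the keys of 'second'.
    // Unlike Enumerable.Except/ExceptBy, duplicates within 'first' are kept.
    public static IEnumerable<T> Except<T, TKey>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, TKey> keySelector)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        // The keys of 'second' are collected once, when this method is called,
        // so each membership test below is O(1) on average.
        var keysToRemove = new HashSet<TKey>(second.Select(keySelector));

        // 'first' itself is still enumerated lazily.
        return first.Where(item => !keysToRemove.Contains(keySelector(item)));
    }
}
```

On .NET 6 and later, the built-in equivalent would be original.ExceptBy(changed.Select(c => c.Key), x => x.Key).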

LINQ isn't used to perform actual modifications to the underlying data sources; it's strictly a query language. You could, of course, do an outer join on List2 from List1 and select List2's entity if it's not null and List1's entity if it is, but that is going to give you an IEnumerable<> of the results; it won't actually modify the collection. You could do a ToList() on the result and assign it to List1, but that would change the reference; I don't know if that would affect the rest of your application.
Taking your question literally, in that you want to REPLACE the items in List1 with those from List2 if they exist, then you'll have to do that manually in a for loop over List1, checking for the existence of a corresponding entry in List2 and replacing the List1 entry by index with that from List2.
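The manual approach described above can be sketched as follows. This is a minimal sketch, assuming the key can be read with a selector function; it swaps the repeated scan of List2 for a Dictionary lookup, and the names InPlaceReplacement and ReplaceByKey are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InPlaceReplacement
{
    // Hypothetical helper: overwrites, by index, every element of 'target' whose key
    // also occurs in 'replacements'. The List<T> instance itself is not replaced.
    public static void ReplaceByKey<T, TKey>(
        List<T> target,
        IEnumerable<T> replacements,
        Func<T, TKey> keySelector)
    {
        // Index the replacements by key. ToDictionary throws if two replacements
        // share a key, or if a key is null.
        Dictionary<TKey, T> byKey = replacements.ToDictionary(keySelector);

        for (int i = 0; i < target.Count; i++)
        {
            if (byKey.TryGetValue(keySelector(target[i]), out T replacement))
            {
                target[i] = replacement;
            }
        }
    }
}
```

For the question's case this would be something like InPlaceReplacement.ReplaceByKey(list1, list2, item => item.Key). Because the list is modified through its indexer, any other code holding a reference to List1 sees the new items.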

As Adam says, LINQ is about querying. However, you can create a new collection in the right way using Enumerable.Union. You'd need to create an appropriate IEqualityComparer though - it would be nice to have UnionBy. (Another one for MoreLINQ perhaps?)
Basically:
var list3 = list2.Union(list1, keyComparer);
where keyComparer is an IEqualityComparer<TItem> implementation that compares items by their keys. MiscUtil contains a ProjectionEqualityComparer which would make this slightly easier.
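A minimal key-based comparer for the Union call above might look like this. It is only a sketch: KeyEqualityComparer is a hypothetical name and is not MiscUtil's ProjectionEqualityComparer, whose exact API is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two items are considered equal when the keys
// produced by 'keySelector' are equal.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(T x, T y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(T obj)
    {
        return obj == null ? 0 : _keyComparer.GetHashCode(_keySelector(obj));
    }
}
```

With it, the call becomes list2.Union(list1, new KeyEqualityComparer<TItem, int>(item => item.Key)). Union yields the items of list2 first, followed by the items of list1 whose key has not been seen yet, so the overwritten items end up at the front.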
Alternatively, you could use DistinctBy from MoreLINQ after concatenation:
var list3 = list2.Concat(list1).DistinctBy(item => item.Key);

Here's a solution with GroupJoin.
List<string> source = new List<string>() { "1", "22", "333" };
List<string> modifications = new List<string>() { "4", "555" };
//alternate implementation
//List<string> result = source.GroupJoin(
//    modifications,
//    s => s.Length,
//    m => m.Length,
//    (s, g) => g.Any() ? g.First() : s
//).ToList();
List<string> result =
(
    from s in source
    join m in modifications
        on s.Length equals m.Length into g
    select g.Any() ? g.First() : s
).ToList();
foreach (string s in result)
    Console.WriteLine(s);
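Hmm, how about a reusable extension method while I'm at it? Note that it only replaces matching items: an item in otherSource whose key matches nothing in source is not added. Also, .NET 6 added a built-in Enumerable.UnionBy with set-union semantics, so a different name may avoid confusion.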
public static IEnumerable<T> UnionBy<T, U>
(
    this IEnumerable<T> source,
    IEnumerable<T> otherSource,
    Func<T, U> selector
)
{
    return source.GroupJoin(
        otherSource,
        selector,
        selector,
        (s, g) => g.Any() ? g.First() : s
    );
}
Which is called by:
List<string> result = source
    .UnionBy(modifications, s => s.Length)
    .ToList();
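Because GroupJoin is driven by the outer sequence, the UnionBy above silently drops a modification whose key matches nothing in source. If you also want such items appended, one possible sketch follows; OverlayBy and OverlayExtensions are hypothetical names, and appending unmatched items at the end is an assumed policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlayExtensions
{
    // Hypothetical alternative to the GroupJoin-based UnionBy: items of 'source'
    // whose key matches a replacement are swapped for it, and replacements whose
    // key matched nothing in 'source' are appended at the end.
    public static IEnumerable<T> OverlayBy<T, TKey>(
        this IEnumerable<T> source,
        IEnumerable<T> replacements,
        Func<T, TKey> keySelector)
    {
        // Index replacements by key; as with g.First(), the first one wins.
        var byKey = new Dictionary<TKey, T>();
        var keyOrder = new List<TKey>();
        foreach (T item in replacements)
        {
            TKey key = keySelector(item);
            if (!byKey.ContainsKey(key))
            {
                byKey.Add(key, item);
                keyOrder.Add(key);
            }
        }

        var usedKeys = new HashSet<TKey>();
        foreach (T item in source)
        {
            TKey key = keySelector(item);
            if (byKey.TryGetValue(key, out T replacement))
            {
                usedKeys.Add(key);
                yield return replacement;
            }
            else
            {
                yield return item;
            }
        }

        // Append unmatched replacements in their original order.
        foreach (TKey key in keyOrder)
        {
            if (!usedKeys.Contains(key))
            {
                yield return byKey[key];
            }
        }
    }
}
```

With the strings above plus an extra "7777" in modifications, source.OverlayBy(modifications, s => s.Length) yields "4", "22", "555", "7777", whereas UnionBy would drop "7777".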

Related

Filter a list of address objects by a list of string postcodes [duplicate]

I have a list of parameters like this:
public class parameter
{
    public string name { get; set; }
    public string paramtype { get; set; }
    public string source { get; set; }
}
IEnumerable<parameter> parameters;
And an array of strings I want to check it against:
string[] myStrings = new string[] { "one", "two"};
I want to iterate over the parameter list and check whether the source property is equal to any of the strings in myStrings. I can do this with nested foreach loops, but I would like to learn a nicer way: I have been playing around with LINQ and like the extension methods on Enumerable such as Where, so nested foreach loops just feel wrong. Is there a more elegant, preferred LINQ/lambda/delegate way to do this?
Thanks
You could use a nested Any() for this check, which is available on any IEnumerable<T>:
bool hasMatch = myStrings.Any(x => parameters.Any(y => y.source == x));
On larger collections it is faster to project parameters to source and then use Intersect, which internally uses a hash set. Instead of O(n*m) for the first approach (the equivalent of two nested loops), you can do the check in O(n + m):
bool hasMatch = parameters.Select(x => x.source)
                          .Intersect(myStrings)
                          .Any();
Also as a side comment you should capitalize your class names and property names to conform with the C# style guidelines.
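If you need the matching parameters themselves rather than a yes/no answer, the same hashing idea can drive a filter. A minimal sketch, using a PascalCase version of the question's class as suggested above; ParameterFilter and WithSourceIn are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's class, renamed to PascalCase as suggested above.
public class Parameter
{
    public string Name { get; set; }
    public string ParamType { get; set; }
    public string Source { get; set; }
}

public static class ParameterFilter
{
    // Returns the parameters whose Source is one of 'wanted'.
    // The HashSet is built once (O(m)); each lookup is then O(1) on average,
    // so filtering n parameters is O(n + m) rather than O(n * m).
    public static List<Parameter> WithSourceIn(IEnumerable<Parameter> parameters, IEnumerable<string> wanted)
    {
        var wantedSet = new HashSet<string>(wanted);
        return parameters.Where(p => wantedSet.Contains(p.Source)).ToList();
    }
}
```

Unlike a join, this returns each matching parameter once even if myStrings contains the same string twice.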
Here is a sample that checks whether any element of one list also appears in another list:
List<int> nums1 = new List<int> { 2, 4, 6, 8, 10 };
List<int> nums2 = new List<int> { 1, 3, 6, 9, 12 };
if (nums1.Any(x => nums2.Any(y => y == x)))
{
    Console.WriteLine("There are equal elements");
}
else
{
    Console.WriteLine("No Match Found!");
}
If both lists are large, the nested lambda approach takes a long time, because every element of one list is compared with every element of the other. A LINQ join, which builds a hash lookup of the inner sequence, is a better way to fetch the matching parameters:
var items = (from x in parameters
             join y in myStrings on x.source equals y
             select x)
            .ToList();
list1.Select(l1 => l1.Id).Intersect(list2.Select(l2 => l2.Id)).ToList();
var list1 = await _service1.GetAll();
var list2 = await _service2.GetAll();
// Create a list of Ids from list1
var list1_Ids = list1.Select(l => l.Id).ToList();
// filter list2 according to list1 Ids
// (a new variable name is needed: redeclaring list2 would not compile)
var filteredList2 = list2.Where(l => list1_Ids.Contains(l.Id)).ToList();

How do I sort a List<Type> by List<int>?

In my C# MVC project I have a list of items that I want to sort in the order given by another list:
var FruitTypes = new List<Fruit> {
    new Fruit { Id = 1, Name = "Banana" },
    new Fruit { Id = 2, Name = "Apple" },
    new Fruit { Id = 3, Name = "Orange" },
    new Fruit { Id = 4, Name = "Plum" },
    new Fruit { Id = 5, Name = "Pear" },
};
var SortValues = new List<int> { 5, 4, 3, 1, 2 };
Currently my list is shown in the default order of FruitTypes.
How can I sort the Fruit list by SortValues?
It's unclear if you are sorting by the indexes in SortValues or whether SortValues contains corresponding Id values that should be joined.
In the first case:
First you have to Zip your two lists together, then you can sort the composite type that Zip generates, then select the Fruit back out.
IEnumerable<Fruit> sortedFruitTypes = FruitTypes
    .Zip(SortValues, (ft, idx) => new { ft, idx })
    .OrderBy(x => x.idx)
    .Select(x => x.ft);
However, this is simply sorting the first list by the ordering indicated in SortValues, not joining the ids.
In the second case, a simple join will suffice:
IEnumerable<Fruit> sortedFruitTypes = SortValues
    .Join(FruitTypes, sv => sv, ft => ft.Id, (_, ft) => ft);
This works because Enumerable.Join preserves the order of the outer ("left-hand") sequence, here SortValues.
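To make the difference between the two readings concrete, here is a self-contained sketch of both; the Fruit class mirrors the question, and FruitSorting with its method names is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Fruit
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class FruitSorting
{
    // Case 1: ranks[i] is the sort position of fruits[i]; Zip pairs them by index.
    public static List<Fruit> ByPositionRank(IEnumerable<Fruit> fruits, IEnumerable<int> ranks)
    {
        return fruits
            .Zip(ranks, (fruit, rank) => new { fruit, rank })
            .OrderBy(x => x.rank)
            .Select(x => x.fruit)
            .ToList();
    }

    // Case 2: idOrder lists Ids in the desired order; Join keeps the order
    // of its outer sequence (idOrder).
    public static List<Fruit> ByIdOrder(IEnumerable<Fruit> fruits, IEnumerable<int> idOrder)
    {
        return idOrder
            .Join(fruits, id => id, fruit => fruit.Id, (id, fruit) => fruit)
            .ToList();
    }
}
```

With the question's data the two readings give different orders: by position the result is Plum, Pear, Orange, Apple, Banana, while by Id it is Pear, Plum, Orange, Banana, Apple.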
While there is almost certainly a more LINQ-y way, if you tend towards verbosity, you could accomplish this with an iterator function. For example:
public IEnumerable<Fruit> SortFruits(IEnumerable<Fruit> unordered, IEnumerable<int> sortValues)
{
    foreach (var value in sortValues)
        yield return unordered.Single(f => f.Id == value);
}
I like that it's explicit about what it's doing. You may consider throwing an exception when the number of items in each list is different, or maybe you just don't return an item if there is no sort value for it. You'll have to decide what the behaviour should be for "missing" values in either collection (as written, Single throws if a sort value has no matching fruit). I think that having to handle these scenarios is a good reason to put it all in a single method this way, instead of a longer LINQ query.
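As one way to handle the "missing values" question, here is a sketch that skips sort values with no matching item instead of throwing as Single does. The skip policy is an assumption, the generic names KeyOrdering and OrderByKeys are hypothetical, and the repeated Single scan is replaced by a Dictionary built once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyOrdering
{
    // Hypothetical generic variant of SortFruits: yields the items in the order
    // given by 'keyOrder' and skips keys that have no matching item.
    public static IEnumerable<T> OrderByKeys<T, TKey>(
        IEnumerable<T> items,
        Func<T, TKey> keySelector,
        IEnumerable<TKey> keyOrder)
    {
        // Built once, on first enumeration (this is an iterator method).
        // ToDictionary throws if two items share a key.
        Dictionary<TKey, T> byKey = items.ToDictionary(keySelector);

        foreach (TKey key in keyOrder)
        {
            if (byKey.TryGetValue(key, out T item))
            {
                yield return item;
            }
        }
    }
}
```

For the question this would be called as KeyOrdering.OrderByKeys(FruitTypes, f => f.Id, SortValues).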
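Time complexity: O(n*m), because FirstOrDefault scans FruitTypes (m items) once for each of the n sort values.
Declare a list of fruits to store the result.
Iterate through each value in SortValues.
Use LINQ FirstOrDefault to get the fruit whose Id matches the sort value (it returns null if there is no match).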
List<int> SortValues = new List<int> { 5, 4, 3, 1, 2 };
List<Fruit> result = new List<Fruit>();
foreach (var element in SortValues)
{
    Fruit f = FruitTypes.FirstOrDefault(fruitElement => fruitElement.Id == element);
    result.Add(f);
}

How do I access things returned by GroupBy

Ridiculously simple question that for the life of me I can't figure out. How do I 'get' at the values returned by GroupBy?
Take the simple example below. I want to print out the first value that occurs more than once. Looking at the output in the watch window, it sort of suggests that list3[0][0] might get at "one". But it gives me an error.
Note, I'm looking for the general solution, i.e. understanding what GroupBy returns.
Also, I would like to use the watch window to help me figure out for myself how I would access variables (as I find much of the MSDN reference incomprehensible). Is this possible?
var list1 = new List<String>() {
    "one", "two", "three", "one", "two"};
var list3 = list1
    .GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .ToList();
Console.WriteLine("list3[0][0]=" + list3[0][0]); //error
While the VS debugger shows you an "index" number because the underlying type is a collection, each group is exposed as an IGrouping<TKey, TElement>, which does not have an indexer. If you just want the first item in the first group, do:
Console.WriteLine("list3[0][0] =" + list3.First().First());
If you want to see all of the items, you can loop through the groupings:
int gi = 0;
foreach (var g in list3)
{
    int ii = 0; // the item index restarts for every group
    foreach (string i in g)
    {
        Console.WriteLine("list3[{0}][{1}] = {2}", gi, ii, i);
        ii++;
    }
    gi++;
}
You are looking for the .Key property, as GroupBy returns an IEnumerable containing IGrouping elements.
If you look at the documentation of GroupBy you'll see it returns an IEnumerable<IGrouping<TKey, TSource>>.
IGrouping<TKey, TElement> has a single property, Key, and itself inherits IEnumerable<TElement>.
So you can enumerate over the sequence returned from a call to GroupBy, and each element will have a Key property (which is whatever you grouped by); you can also enumerate each element itself (which yields the items grouped together).
Hopefully this demonstrates a bit clearer. Given a class:
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
And a list:
var people = new List<Person>{
    new Person{Name="Jamie",Age=35},
    new Person{Name="Bob",Age=45},
    new Person{Name="Fred",Age=35},
};
Grouping and enumerating as follows:
var groupedByAge = people.GroupBy(x => x.Age);
foreach (var item in groupedByAge)
{
    Console.WriteLine("Age:{0}", item.Key);
    foreach (var person in item)
    {
        Console.WriteLine("{0}", person.Name);
    }
}
Gives this output:
Age:35
Jamie
Fred
Age:45
Bob
Live example: http://rextester.com/OWPR50756
GroupBy returns an IEnumerable<IGrouping<TKey, TSource>>, where each IGrouping<TKey, TElement> object contains a key and a sequence of objects. It is not a multidimensional array, so it cannot be accessed by index with [][].
To access the first element, try this:
static void Main(string[] args)
{
    var list1 = new List<String>() {
        "one", "two", "three", "one", "two"};
    var list3 = list1
        .GroupBy(x => x)
        .Where(x => x.Count() > 1)
        .ToList();
    Console.WriteLine("list3[0][0]=" + list3[0].ToList()[0].ToString());
    //OR Console.WriteLine("list3[0][0]=" + list3[0].First());
}
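If you really want [i][j]-style access, one option is to materialize each group into a List<T>; alternatively, ToLookup returns an ILookup<TKey, TElement>, whose indexer takes a key rather than a position. A small sketch of both (GroupAccess and its method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupAccess
{
    // Materializing each IGrouping with ToList() gives a List<List<string>>,
    // which does support [i][j] indexing.
    public static List<List<string>> RepeatedGroups(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w)
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }

    // ToLookup returns an ILookup whose indexer takes a key, not a position;
    // a missing key yields an empty sequence rather than throwing.
    public static int CountOf(IEnumerable<string> words, string word)
    {
        return words.ToLookup(w => w)[word].Count();
    }
}
```

GroupBy yields groups in the order in which their keys first appear in the source, which is why RepeatedGroups(list1)[0][0] is "one" for the question's list.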

How to concatenate two IEnumerable<T> into a new IEnumerable<T>?

I have two instances of IEnumerable<T> (with the same T). I want a new instance of IEnumerable<T> which is the concatenation of both.
Is there a built-in method in .NET to do that or do I have to write it myself?
Yes, LINQ to Objects supports this with Enumerable.Concat:
var together = first.Concat(second);
NB: Should first or second be null, you would receive an ArgumentNullException. To avoid this and treat nulls as you would an empty set, use the null-coalescing operator like so:
var together = (first ?? Enumerable.Empty<string>()).Concat(second ?? Enumerable.Empty<string>()); // replace <string> with the appropriate type
The Concat method will return an object which implements IEnumerable<T> by returning an object (call it Cat) whose enumerator will attempt to use the two passed-in enumerable items (call them A and B) in sequence. If the passed-in enumerables represent sequences which will not change during the lifetime of Cat, and which can be read from without side-effects, then Cat may be used directly. Otherwise, it may be a good idea to call ToList() on Cat and use the resulting List<T> (which will represent a snapshot of the contents of A and B).
Some enumerables take a snapshot when enumeration begins, and will return data from that snapshot if the collection is modified during enumeration. If B is such an enumerable, then any change to B which occurs before Cat has reached the end of A will show up in Cat's enumeration, but changes which occur after that will not. Such semantics are likely to be confusing; taking a snapshot of Cat can avoid such issues.
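A small sketch of the difference between using Cat directly and taking a snapshot with ToList(); the class name ConcatSnapshotDemo is hypothetical, and List<T> is used for A and B.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatSnapshotDemo
{
    // Returns how 'cat' and its snapshot render after B is modified post-Concat.
    public static (string Live, string Snapshot) Run()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 3 };

        IEnumerable<int> cat = a.Concat(b);   // lazy: nothing is copied yet
        List<int> snapshot = cat.ToList();    // copies the current contents of a and b

        b.Add(4);                             // modify a source after Concat was called

        return (string.Join(",", cat), string.Join(",", snapshot));
    }
}
```

Here the live sequence renders as "1,2,3,4" because it re-reads b, while the snapshot stays "1,2,3".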
You can use the code below for your solution:
public void Linq94()
{
    int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
    int[] numbersB = { 1, 3, 5, 7, 8 };
    var allNumbers = numbersA.Concat(numbersB);
    Console.WriteLine("All numbers from both arrays:");
    foreach (var n in allNumbers)
    {
        Console.WriteLine(n);
    }
}
I know this is a relatively old post, but if you want to concatenate multiple IEnumerables, I use the following:
var joinedSel = new[] { first, second, third }.Where(x => x != null).SelectMany(x => x);
This skips any null IEnumerables and allows for multiple concatenations.
Based off of craig1231's answer, I've created some extension methods...
public static IEnumerable<T> JoinLists<T>(this IEnumerable<T> list1, IEnumerable<T> list2)
{
    // SelectMany never returns null, so no "?? Enumerable.Empty<T>()" fallback is needed.
    return new[] { list1, list2 }.Where(x => x != null).SelectMany(x => x);
}
public static IEnumerable<T> JoinLists<T>(this IEnumerable<T> list1, IEnumerable<T> list2, IEnumerable<T> list3)
{
    return new[] { list1, list2, list3 }.Where(x => x != null).SelectMany(x => x);
}
public static IEnumerable<T> JoinMany<T>(params IEnumerable<T>[] array)
{
    // Guard against a null params array, e.g. JoinMany<int>(null).
    if (array == null) return Enumerable.Empty<T>();
    return array.Where(x => x != null).SelectMany(x => x);
}
// The answer that I was looking for when searching
public void Answer()
{
    IEnumerable<YourClass> first = this.GetFirstIEnumerableList();
    // Assign to empty list so we can use later
    IEnumerable<YourClass> second = new List<YourClass>();
    if (IwantToUseSecondList)
    {
        second = this.GetSecondIEnumerableList();
    }
    IEnumerable<YourClass> concatenatedList = first.Concat(second);
}

How to get duplicate items from a list using LINQ? [duplicate]

This question already has answers here:
C# LINQ find duplicates in List
(13 answers)
Closed 3 years ago.
I have a List<string> like:
List<String> list = new List<String>{"6","1","2","4","6","5","1"};
I need to get the duplicate items in the list into a new list. Now I'm using a nested for loop to do this.
The resulting list will contain {"6","1"}.
Is there a way to do this using LINQ or lambda expressions?
var duplicates = list.GroupBy(s => s)
                     .SelectMany(grp => grp.Skip(1));
Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
Here is one way to do it:
List<String> duplicates = list.GroupBy(x => x)
                              .Where(g => g.Count() > 1)
                              .Select(g => g.Key)
                              .ToList();
The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.
Here's another option:
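var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
var set = new HashSet<string>();
// Materialize once: the query mutates 'set', so enumerating it a second time
// would report every element as a duplicate.
var duplicates = list.Where(x => !set.Add(x)).ToList();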
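A variation on the HashSet idea reports each duplicated value only once (giving {"6","1"} for the question's list) and, because it is an iterator that creates its own sets on each enumeration, can safely be enumerated more than once. This is a sketch; DuplicateFinder.DistinctDuplicates is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFinder
{
    // Yields each duplicated value once, at the point where its second occurrence
    // is seen. Each enumeration starts with fresh sets, so re-enumerating is safe.
    public static IEnumerable<T> DistinctDuplicates<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        var reported = new HashSet<T>();
        foreach (T item in source)
        {
            // seen.Add returns false for a repeat; reported.Add is true only the first time.
            if (!seen.Add(item) && reported.Add(item))
            {
                yield return item;
            }
        }
    }
}
```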
I know it's not the answer to the original question, but you may find yourself here with this problem.
If you want all of the duplicate items in your results, the following works.
var duplicates = list
.GroupBy( x => x ) // group matching items
.Where( g => g.Skip(1).Any() ) // where the group contains more than one item
.SelectMany( g => g ); // re-expand the groups with more than one item
In my situation I need all duplicates so that I can mark them in the UI as being errors.
I wrote this extension method based on @Lee's response to the OP. Note that a default parameter is used (requiring C# 4.0); in C# 3.0 an overload would suffice.
/// <summary>
/// Returns the duplicates in the collection (each repeated value only once when distinct is true).
/// </summary>
/// <typeparam name="T">The element type of the collection.</typeparam>
/// <param name="source">The source collection to check for duplicates.</param>
/// <param name="distinct">Specify <b>true</b> to return each duplicated value only once.</param>
/// <returns>The duplicates found in the source collection.</returns>
/// <remarks>This is an extension method on <see cref="IEnumerable{T}"/>.</remarks>
public static IEnumerable<T> Duplicates<T>
    (this IEnumerable<T> source, bool distinct = true)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    // select the elements that are repeated
    IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));
    // distinct?
    if (distinct)
    {
        // deferred execution helps us here
        result = result.Distinct();
    }
    return result;
}
List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };
var q = from s in list
        group s by s into g
        where g.Count() > 1
        select g.First();
foreach (var item in q)
{
    Console.WriteLine(item);
}
Hope this will help.
int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
foreach (var d in duplicates)
    Console.WriteLine(d);
I was trying to solve the same problem with a list of objects and had trouble turning the list of groups back into a flat list. So I ended up looping through the groups and adding every item from each group that has duplicates to a new List:
public List<MediaFileInfo> GetDuplicatePictures()
{
    List<MediaFileInfo> dupes = new List<MediaFileInfo>();
    var grpDupes = from f in _fileRepo
                   group f by f.Length into grps
                   where grps.Count() > 1
                   select grps;
    foreach (var item in grpDupes)
    {
        foreach (var thing in item)
        {
            dupes.Add(thing);
        }
    }
    return dupes;
}
Most of the solutions above perform a GroupBy, so even if only the first duplicate is needed, every element of the collection is enumerated at least once.
The following extension method stops enumerating as soon as a duplicate has been found, and continues only if a next duplicate is requested.
As usual in LINQ, there are two overloads: one with an IEqualityComparer and one without it.
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    // Because this is an iterator method, this check runs on first enumeration, not at the call.
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;
    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        {   // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        {   // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}
Usage:
IEnumerable<MyClass> myObjects = ...
// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();
// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);
// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
    .Where(duplicate => duplicate.Name == "MyName")
    .Take(5);
For all these LINQ statements, the collection is only enumerated until the requested items are found; the rest of the sequence is never read.
IMHO that is an efficiency boost to consider.
