Intersection of a List of Lists - C#

I have a list of lists of the following type:
public class FilteredVM
{
public int ID { get; set; }
public string Name { get; set; }
public string Number { get; set; }
}
List<List<FilteredVM>> groupedExpressionResults = new List<List<FilteredVM>>();
I would like to intersect the lists within this list based on their IDs. What's the best way to tackle this?

Here's an optimized extension method:
public static HashSet<T> IntersectAll<T>(this IEnumerable<IEnumerable<T>> series, IEqualityComparer<T> equalityComparer = null)
{
if (series == null)
throw new ArgumentNullException("series");
HashSet<T> set = null;
foreach (var values in series)
{
if (set == null)
set = new HashSet<T>(values, equalityComparer ?? EqualityComparer<T>.Default);
else
set.IntersectWith(values);
}
return set ?? new HashSet<T>();
}
Use this with the following comparer:
public class FilteredVMComparer : IEqualityComparer<FilteredVM>
{
public static readonly FilteredVMComparer Instance = new FilteredVMComparer();
private FilteredVMComparer()
{
}
public bool Equals(FilteredVM x, FilteredVM y)
{
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.ID == y.ID;
}
public int GetHashCode(FilteredVM obj)
{
return obj.ID;
}
}
Use it like this:
groupedExpressionResults.IntersectAll(FilteredVMComparer.Instance)
You could just write
groupedExpressionResults.Aggregate((a, b) => a.Intersect(b, FilteredVMComparer.Instance).ToList())
but that would be wasteful, because Intersect builds a new set (and ToList a new list) for every pair of lists.
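If only the ID matters, the intersection can also be computed on a single HashSet<int> of IDs that is narrowed in place, so no set of FilteredVM objects is built per pair of lists. The following is a minimal sketch under that assumption; IdIntersection and IntersectByID are hypothetical names, and the method returns one FilteredVM per surviving ID, taken from the first inner list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FilteredVM
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Number { get; set; }
}

// Hypothetical helper: intersects a List<List<FilteredVM>> by ID using one HashSet<int>.
public static class IdIntersection
{
    public static List<FilteredVM> IntersectByID(List<List<FilteredVM>> lists)
    {
        if (lists == null) throw new ArgumentNullException(nameof(lists));
        if (lists.Count == 0) return new List<FilteredVM>();

        // Start with the IDs of the first list and keep only those present in every other list.
        var ids = new HashSet<int>(lists[0].Select(x => x.ID));
        for (int i = 1; i < lists.Count; i++)
            ids.IntersectWith(lists[i].Select(x => x.ID));

        // Return one item per surviving ID, in the order of the first list.
        var seen = new HashSet<int>();
        return lists[0].Where(x => ids.Contains(x.ID) && seen.Add(x.ID)).ToList();
    }
}
```

With the question's variable this would be called as IdIntersection.IntersectByID(groupedExpressionResults).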

Intersect only treats two elements as equal if their type says they are. FilteredVM does not override Equals and GetHashCode, so the default reference equality is used and two different instances with the same ID never match; overriding those methods is the most complete fix.
If you only want the elements that appear in both lists, the following solution is enough.
Assuming list1 and list2 are both List<FilteredVM>, the simplest way is:
var intersectByIDs = list1.Where(elem => list2.Any(elem2 => elem2.ID == elem.ID));
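The Any call inside Where scans list2 once for every element of list1, so the cost grows with the product of the two list sizes. A common alternative, sketched here as an assumption rather than as part of the answer above, is to put the IDs of list2 into a HashSet<int> first; TwoListIntersect and ByID are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FilteredVM
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Number { get; set; }
}

public static class TwoListIntersect
{
    // Same elements as list1.Where(elem => list2.Any(elem2 => elem2.ID == elem.ID)),
    // but each lookup is a hash-set probe instead of a scan of list2.
    public static List<FilteredVM> ByID(List<FilteredVM> list1, List<FilteredVM> list2)
    {
        var idsInList2 = new HashSet<int>(list2.Select(e => e.ID));
        return list1.Where(e => idsInList2.Contains(e.ID)).ToList();
    }
}
```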

If you are a fan of one-liner solutions you can use this:
List<FilteredVM> result = groupedExpressionResults.Aggregate((x, y) => x.Where(xi => y.Select(yi => yi.ID).Contains(xi.ID)).ToList());
And if you just want the IDs you can just add .Select(x => x.ID), like this:
var ids = groupedExpressionResults.Aggregate((x, y) => x.Where(xi => y.Select(yi => yi.ID).Contains(xi.ID)).ToList()).Select(x => x.ID);

Related

C# comparing two large lists of items by a specific property

I have two large lists of items whose class looks like this (both lists are of the same type):
public class Items
{
public string ItemID { get; set; }
public int QuantitySold { get; set; }
}
var oldList = new List<Items>(); // oldList
var newList = new List<Items>(); // new list
The old list contains items from the database, and the new list contains items fetched from an API.
Both lists can be very large, with 10,000+ items each (20,000 total).
I need to compare the items in newList against the items in oldList, find the items that have the same ItemID but a different QuantitySold, and store those in a third list called differentQuantityItems.
I could simply use a nested foreach over both lists and compare the values, but since both lists are large, the performance of the nested loop is terrible and I can't use it.
Can someone help me out with this?
@YamamotoTetsua I'm already using an IEqualityComparer to get the desired result, but it doesn't give the results I'm expecting. Here is why. My first IEqualityComparer looks like this:
public class MissingItemComparer : IEqualityComparer<SearchedUserItems>
{
public static readonly IEqualityComparer<SearchedUserItems> Instance = new MissingItemComparer();
public bool Equals(SearchedUserItems x, SearchedUserItems y)
{
return x.ItemID == y.ItemID;
}
public int GetHashCode(SearchedUserItems x)
{
return x.ItemID.GetHashCode();
}
}
Using this IEqualityComparer gives me the items from newList that are not present in my database:
var missingItems= newItems.Except(competitor.SearchedUserItems.ToList(), MissingItemComparer.Instance).ToList();
This list now contains the items that are new from the API and not present in my DB.
The second IEqualityComparer is based on QuantitySold differing between the old and new lists:
public class ItemsComparer : IEqualityComparer<SearchedUserItems>
{
public static readonly IEqualityComparer<SearchedUserItems> Instance = new ItemsComparer();
public bool Equals(SearchedUserItems x, SearchedUserItems y)
{
return (x.QuantitySold == y.QuantitySold);
}
public int GetHashCode(SearchedUserItems x)
{
return x.ItemID.GetHashCode();
}
}
Usage example:
var differentQuantityItems = newItems.Except(competitor.SearchedUserItems.ToList(), ItemsComparer.Instance).ToList();
The issue with these two equality comparers is that the first one returns, for example, these missing ItemIDs:
123124124
123124421
512095902
They are indeed missing from my oldList. However, the second IEqualityComparer also returns these items as differentQuantity items. Their quantity is indeed different, but they aren't present in oldList, so they shouldn't be included in the second list.
This is a perfect candidate for LINQ Join:
var differentQuantityItems =
(from newItem in newList
join oldItem in oldList on newItem.ItemID equals oldItem.ItemID
where newItem.QuantitySold != oldItem.QuantitySold
select newItem).ToList();
This returns all new items that have a corresponding old item with a different QuantitySold. If you also want to include the new items that have no corresponding old item, use a left outer join:
var differentQuantityItems =
(from newItem in newList
join oldItem in oldList on newItem.ItemID equals oldItem.ItemID into oldItems
from oldItem in oldItems.DefaultIfEmpty()
where oldItem == null || newItem.QuantitySold != oldItem.QuantitySold
select newItem).ToList();
In both cases, join operator is used to quickly correlate the items with the same ItemID. Then you can compare QuantitySold or any other properties.
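In LINQ to Objects, Enumerable.Join builds a hash lookup over the inner sequence (oldList in the queries above) and then streams the outer one, which is why it avoids the nested-loop cost. The same idea can be written out with an explicit Dictionary. This is a sketch that assumes ItemID is unique within oldList (ToDictionary throws on duplicate keys); QuantityDiff, DifferentQuantity and the includeMissing flag are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Items
{
    public string ItemID { get; set; }
    public int QuantitySold { get; set; }
}

public static class QuantityDiff
{
    // includeMissing = false mirrors the inner join; true mirrors the left outer join.
    // Assumes ItemID is unique within oldList.
    public static List<Items> DifferentQuantity(List<Items> oldList, List<Items> newList, bool includeMissing)
    {
        var oldById = oldList.ToDictionary(i => i.ItemID);
        var result = new List<Items>();
        foreach (var newItem in newList)
        {
            if (oldById.TryGetValue(newItem.ItemID, out var oldItem))
            {
                if (oldItem.QuantitySold != newItem.QuantitySold)
                    result.Add(newItem);      // same ItemID, changed quantity
            }
            else if (includeMissing)
            {
                result.Add(newItem);          // ItemID not present in oldList
            }
        }
        return result;
    }
}
```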
This code runs in less than a second, even if there are no matches at all (and also when everything matches).
It returns all items that exist in both lists (i.e. have the same ItemID) but have a different QuantitySold.
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp5
{
class Program
{
public class Items
{
public string ItemID { get; set; }
public int QuantitySold { get; set; }
}
static void Main(string[] args)
{
// Sample data
var oldList = new List<Items>();
oldList.AddRange(Enumerable.Range(0, 20000).Select(z => new Items() { ItemID = z.ToString(), QuantitySold = 4 }));
var newList = new List<Items>();
newList.AddRange(Enumerable.Range(0, 20000).Select(z => new Items() { ItemID = z.ToString(), QuantitySold = 5 }));
var results = oldList.Join(newList,
left => left.ItemID,
right => right.ItemID,
(left, right) => new { left, right })
.Where(z => z.left.QuantitySold != z.right.QuantitySold).Select(z => z.left);
Console.WriteLine(results.Count());
Console.ReadLine();
}
}
}
The use of z.left means only one of the items will be returned - if you want both the old and the new, instead use:
var results = oldList.Join(newList,
left => left.ItemID,
right => right.ItemID,
(left, right) => new { left, right })
.Where(z => z.left.QuantitySold != z.right.QuantitySold)
.Select(z => new[] { z.left, z.right })
.SelectMany(z => z);
From a big-O complexity point of view, comparing the lists in a nested for loop is O(n*m), where n is the size of the list from the DB and m is the size of the list fetched from the API.
To improve performance, you can sort the two lists, which costs O(n log(n) + m log(m)), and then find the new or changed items in a single O(n + m) pass. The overall complexity of the algorithm is therefore O(n log(n) + m log(m)).
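The sort-then-scan approach described above can be sketched as a classic sort-merge: sort both lists by ItemID, then advance two indices in step. This sketch assumes ItemID is unique within each list and uses ordinal string comparison; SortMerge and DifferentQuantity are hypothetical names, and the result comes back ordered by ItemID rather than in the original order of newList.

```csharp
using System;
using System.Collections.Generic;

public class Items
{
    public string ItemID { get; set; }
    public int QuantitySold { get; set; }
}

public static class SortMerge
{
    // Returns new items whose ItemID also exists in oldList but with a different QuantitySold.
    public static List<Items> DifferentQuantity(List<Items> oldList, List<Items> newList)
    {
        // Sort copies so the caller's lists keep their order: O(n log n + m log m).
        var olds = new List<Items>(oldList);
        var news = new List<Items>(newList);
        Comparison<Items> byId = (a, b) => string.CompareOrdinal(a.ItemID, b.ItemID);
        olds.Sort(byId);
        news.Sort(byId);

        // Single merge pass over both sorted lists: O(n + m).
        var result = new List<Items>();
        int i = 0, j = 0;
        while (i < olds.Count && j < news.Count)
        {
            int cmp = string.CompareOrdinal(olds[i].ItemID, news[j].ItemID);
            if (cmp < 0) i++;            // ItemID only in the old list
            else if (cmp > 0) j++;       // ItemID only in the new list
            else
            {
                if (olds[i].QuantitySold != news[j].QuantitySold)
                    result.Add(news[j]);
                i++;
                j++;
            }
        }
        return result;
    }
}
```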
You can consider using Except with a custom IEqualityComparer, something like the following. Note that it returns the changed items and also the items from newList that have no match in oldList:
var oldList = new List<Item>(); // oldList
var newList = new List<Item>(); // new list
var distinctList = newList.Except(oldList,new ItemEqualityComparer()).ToList();
class ItemEqualityComparer : IEqualityComparer<Item>
{
public bool Equals(Item i1, Item i2)
{
// Equal only when both ItemID and QuantitySold match, so Except keeps changed and brand-new items.
return i1.ItemID == i2.ItemID && i1.QuantitySold == i2.QuantitySold;
}
public int GetHashCode(Item item)
{
return item.ItemID.GetHashCode();
}
}
public class Item
{
public string ItemID { get; set; }
public int QuantitySold { get; set; }
}

Get distinct list values

I have a C# application in which I'd like to get, from a List of Project objects, another List that contains only distinct objects.
I tried this:
List<Project> model = notre_admin.Get_List_Project_By_Expert(u.Id_user);
if (model != null) model = model.Distinct().ToList();
The list model still contains 4 identical Project objects.
What is the reason of this? How can i fix it?
You need to define "identical" here. I'm guessing you mean "have the same contents", but that is not the default definition for classes: the default definition is "are the same instance".
If you want "identical" to mean "have the same contents", you have two options:
write a custom comparer (IEqualityComparer<Project>) and supply that as a parameter to Distinct
override Equals and GetHashCode on Project
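The first option can be sketched as follows. The question does not show the members of Project, so the Id and Name properties here are assumptions, and ProjectIdComparer is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Project; the real class is not shown in the question.
public class Project
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Treats two projects as "identical" when their Id values match.
public sealed class ProjectIdComparer : IEqualityComparer<Project>
{
    public bool Equals(Project x, Project y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    // Must agree with Equals: equal Ids give equal hash codes.
    public int GetHashCode(Project obj) => obj == null ? 0 : obj.Id.GetHashCode();
}
```

It would then be passed to Distinct: model = model.Distinct(new ProjectIdComparer()).ToList();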
There are also custom methods like DistinctBy available in many places, which are useful if identity can be determined by a single property (typically Id). DistinctBy was not part of the BCL until .NET 6, which added Enumerable.DistinctBy. For example:
if (model != null) model = model.DistinctBy(x => x.Id).ToList();
With, for example:
public static IEnumerable<TItem>
DistinctBy<TItem, TValue>(this IEnumerable<TItem> items,
Func<TItem, TValue> selector)
{
var uniques = new HashSet<TValue>();
foreach(var item in items)
{
if(uniques.Add(selector(item))) yield return item;
}
}
var newList =
(
from x in model
select new {Id_user= x.Id_user}
).Distinct();
Or you can write it like this:
var list1 = model.DistinctBy(x=> x.Id_user);
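The anonymous-type query works because C# anonymous types override Equals and GetHashCode to compare all of their properties, so Distinct removes projections with equal values even though each one is a separate object. A minimal sketch of that behavior; AnonymousDistinctDemo and CountDistinctUserIds are hypothetical names. Note that the result of such a query contains anonymous objects, not Project instances.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnonymousDistinctDemo
{
    // Each id becomes a new anonymous object; Distinct still collapses equal values.
    public static int CountDistinctUserIds(IEnumerable<int> userIds) =>
        userIds.Select(id => new { Id_user = id }).Distinct().Count();
}
```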
How do you define identical? You should override Equals in Project with this definition (if you override Equals also override GetHashCode). For example:
public class Project
{
public int ProjectID { get; set; }
public override bool Equals(object obj)
{
var p2 = obj as Project;
if (p2 == null) return false;
return this.ProjectID == p2.ProjectID;
}
public override int GetHashCode()
{
return ProjectID;
}
}
Otherwise you are just checking reference equality.
The objects' references aren't equal. If you want to compare the entire objects rather than just a property, you have to implement IEqualityComparer<T> or IEquatable<T>.
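A sketch of the IEquatable<T> route, reusing the ProjectID property from the previous answer. Distinct() without a comparer uses EqualityComparer<Project>.Default, which calls IEquatable<Project>.Equals when the type implements it, together with the GetHashCode override.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Project : IEquatable<Project>
{
    public int ProjectID { get; set; }

    // Typed equality, picked up by EqualityComparer<Project>.Default (and therefore Distinct()).
    public bool Equals(Project other) => other != null && ProjectID == other.ProjectID;

    // Keep the object overload consistent with the typed one.
    public override bool Equals(object obj) => Equals(obj as Project);

    // Equal ProjectIDs must produce equal hash codes.
    public override int GetHashCode() => ProjectID.GetHashCode();
}
```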
Check this example: you need to either supply a comparer or override Equals() (together with GetHashCode()).
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main( string[] args )
{
List<Item> items = new List<Item>();
items.Add( new Item( "A" ) );
items.Add( new Item( "A" ) );
items.Add( new Item( "B" ) );
items.Add( new Item( "C" ) );
items = items.Distinct().ToList();
}
}
public class Item
{
string Name { get; set; }
public Item( string name )
{
Name = name;
}
public override bool Equals( object obj )
{
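// Not equal (rather than a NullReferenceException) when obj is null or not an Item.
var other = obj as Item;
return other != null && Name == other.Name;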
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
Here's an answer to basically the same question that will help.
Explanation:
The Distinct() method checks reference equality for reference types. This means it is looking for literally the same object duplicated, not different objects which contain the same values.
Credit to @Rex M.
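To see the reference-equality behavior directly, here is a small sketch with a hypothetical PlainItem class that overrides neither Equals nor GetHashCode: two instances with the same contents both survive Distinct(), while the same instance added twice is reduced to one.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with no Equals/GetHashCode overrides: equality is by reference.
public class PlainItem
{
    public string Name { get; set; }
}

public static class ReferenceEqualityDemo
{
    public static (int twoInstances, int sameInstanceTwice) Run()
    {
        var a1 = new PlainItem { Name = "A" };
        var a2 = new PlainItem { Name = "A" };   // same contents, different object

        int twoInstances = new List<PlainItem> { a1, a2 }.Distinct().Count();       // both kept
        int sameInstanceTwice = new List<PlainItem> { a1, a1 }.Distinct().Count();  // duplicate reference removed
        return (twoInstances, sameInstanceTwice);
    }
}
```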
Isn't it simpler to use one of the approaches shown below?
You can group your domain objects by some key and select FirstOrDefault() from each group, as below.
A more interesting option is to create a comparer adapter that takes your domain object and derives from it another value that a comparer can work with out of the box. Based on that comparer, you can create custom LINQ extensions, as in the sample below. Hope it helps!
[TestMethod]
public void CustomDistinctTest()
{
// Generate some sample of domain objects
var listOfDomainObjects = Enumerable
.Range(10, 10)
.SelectMany(x =>
Enumerable
.Range(15, 10)
.Select(y => new SomeClass { SomeText = x.ToString(), SomeInt = x + y }))
.ToList();
var uniqueStringsByUsingGroupBy = listOfDomainObjects
.GroupBy(x => x.SomeText)
.Select(x => x.FirstOrDefault())
.ToList();
var uniqueStringsByCustomExtension = listOfDomainObjects.DistinctBy(x => x.SomeText).ToList();
var uniqueIntsByCustomExtension = listOfDomainObjects.DistinctBy(x => x.SomeInt).ToList();
var uniqueStrings = listOfDomainObjects
.Distinct(new EqualityComparerAdapter<SomeClass, string>(x => x.SomeText))
.OrderBy(x=>x.SomeText)
.ToList();
var uniqueInts = listOfDomainObjects
.Distinct(new EqualityComparerAdapter<SomeClass, int>(x => x.SomeInt))
.OrderBy(x => x.SomeInt)
.ToList();
}
Custom comparer adapter:
public class EqualityComparerAdapter<T, V> : EqualityComparer<T>
where V : IEquatable<V>
{
private Func<T, V> _valueAdapter;
public EqualityComparerAdapter(Func<T, V> valueAdapter)
{
_valueAdapter = valueAdapter;
}
public override bool Equals(T x, T y)
{
return _valueAdapter(x).Equals(_valueAdapter(y));
}
public override int GetHashCode(T obj)
{
return _valueAdapter(obj).GetHashCode();
}
}
Custom linq extension (definition of DistinctBy extension method):
// Embed this class in a specific custom namespace
public static class DistByExt
{
public static IEnumerable<T> DistinctBy<T,V>(this IEnumerable<T> enumerator,Func<T,V> valueAdapter)
where V : IEquatable<V>
{
return enumerator.Distinct(new EqualityComparerAdapter<T, V>(valueAdapter));
}
}
Definition of domain object used in test case:
public class SomeClass
{
public string SomeText { get; set; }
public int SomeInt { get; set; }
}
List<ViewClReceive> passData = (List<ViewClReceive>)TempData["passData_Select_BankName_List"];
passData = passData?.DistinctBy(b=>b.BankNm).ToList();
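This works as long as a DistinctBy extension method is available, for example Enumerable.DistinctBy in .NET 6 or later, or one of the implementations shown above.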

Don't bind duplicate items to checked list box

I've got a list of objects that hold team details for salesmen.
The list has several entries with the same team name but a different salesman.
The teamDetails class has the following attributes:
string teamName;
string region;
int teamSales;
string salesmanFullName;
string salesmanAddress;
The user can find all the teams with sales over a certain value. These teams are then added to a checked list box.
This is how I'm populating the checked list box:
var viewList = from toSearch in GlobalVariables.allSalesmenList
where toSearch.teamSales > Convert.ToInt32(txtSalesSearch.Text)
select toSearch;
SearchCheckedListBox.DataSource = viewList.ToList();
SearchCheckedListBox.DisplayMember = "teamName";
The problem is that a team name is shown more than once if the team has more than one salesman.
How can I prevent the checked list box from showing repeated values?
Try to use distinct with comparer:
var viewList = from toSearch in GlobalVariables.allSalesmenList
where toSearch.teamSales > Convert.ToInt32(txtSalesSearch.Text)
select toSearch;
SearchCheckedListBox.DataSource = viewList.Distinct(new TeamComparer()).ToList();
SearchCheckedListBox.DisplayMember = "teamName";
Comparer code:
public class TeamComparer : IEqualityComparer<teamDetails>
{
public bool Equals(teamDetails x, teamDetails y)
{
if (x.teamName == y.teamName) return true;
return false;
}
public int GetHashCode(teamDetails obj)
{
if (Object.ReferenceEquals(obj, null)) return 0;
return obj.teamName.GetHashCode();
}
}
You can simply use this:
SearchCheckedListBox.DataSource = viewList.GroupBy(x => x.teamName)
.Select(g => g.First())
.ToList();
If you are planning to use the same trick more than once, you can write an extension method:
public static IEnumerable<T> DistinctBy<T, S>(this IEnumerable<T> list, Func<T, S> selector)
{
return list.GroupBy(selector).Select(g => g.First());
}
Then the code would be:
SearchCheckedListBox.DataSource = viewList.DistinctBy(x => x.teamName).ToList();
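If the checked list box only needs to show the names, another option is to project to the team name before calling Distinct(): string already compares by value, so no comparer is needed, and a list of strings can be bound to DataSource without setting DisplayMember. The sketch assumes teamDetails exposes its members as public properties; TeamFilter and TeamNamesOver are hypothetical names. The trade-off is that the checked items are then plain strings rather than teamDetails objects.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the question lists these members but not their declarations.
public class teamDetails
{
    public string teamName { get; set; }
    public string region { get; set; }
    public int teamSales { get; set; }
    public string salesmanFullName { get; set; }
    public string salesmanAddress { get; set; }
}

public static class TeamFilter
{
    // Names of teams with sales above the threshold, each name listed once.
    public static List<string> TeamNamesOver(IEnumerable<teamDetails> all, int threshold) =>
        all.Where(t => t.teamSales > threshold)
           .Select(t => t.teamName)
           .Distinct()
           .ToList();
}
```

Usage would then be SearchCheckedListBox.DataSource = TeamFilter.TeamNamesOver(GlobalVariables.allSalesmenList, Convert.ToInt32(txtSalesSearch.Text));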

Grouping by IEnumerable<string> does not work at all

I'm not really sure why grouping by IEnumerable<string> does not work. I provide a custom IEqualityComparer, of course.
public class StringCollectionEqualityComparer : EqualityComparer<IEnumerable<string>>
{
public override bool Equals(IEnumerable<string> x, IEnumerable<string> y)
{
if (Object.Equals(x, y) == true)
return true;
if (x == null) return y == null;
if (y == null) return x == null;
return x.SequenceEqual(y, StringComparer.OrdinalIgnoreCase);
}
public override int GetHashCode(IEnumerable<string> obj)
{
return obj.OrderBy(value => value, StringComparer.OrdinalIgnoreCase).Aggregate(0, (hashCode, value) => value == null ? hashCode : hashCode ^ value.GetHashCode() + 33);
}
}
class A
{
public IEnumerable<string> StringCollection { get; set; }
}
IEnumerable<A> collection = // collection of A
var grouping = collection.GroupBy(obj => obj.StringCollection, StringCollectionEqualityComparer.Default).ToList();
(ToList() is there to force evaluation. I have breakpoints in StringCollectionEqualityComparer, but unfortunately they are never hit.)
When I group the collection in this dumb way, it actually works:
var grouping = collection.GroupBy(obj => String.Join("|", obj.StringCollection));
Obviously, though, that is not something I want to use.
By "not working" I mean the results are not the ones I expect (with the dumb way, the results are correct).
StringCollectionEqualityComparer.Default is just another way of writing EqualityComparer<IEnumerable<string>>.Default: Default is a static property inherited from that base class, so it returns the default comparer for IEnumerable<string> (reference equality), not your comparer. That is why your breakpoints are never hit. Create an instance of StringCollectionEqualityComparer instead, simply with new StringCollectionEqualityComparer().
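A sketch of the comparer used through its own instance. Besides the Default issue, the original GetHashCode hashes with value.GetHashCode() (case-sensitive) while Equals compares with StringComparer.OrdinalIgnoreCase, so two sequences that differ only in case can be Equal yet hash differently and end up in different groups. This version hashes with the same comparer and in sequence order, matching SequenceEqual. The static Instance field is an addition for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class StringCollectionEqualityComparer : EqualityComparer<IEnumerable<string>>
{
    // Shared instance of THIS comparer; the inherited Default property would not return it.
    public static readonly StringCollectionEqualityComparer Instance = new StringCollectionEqualityComparer();

    public override bool Equals(IEnumerable<string> x, IEnumerable<string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y, StringComparer.OrdinalIgnoreCase);
    }

    public override int GetHashCode(IEnumerable<string> obj)
    {
        if (obj == null) return 0;
        // Same case-insensitive comparer and same order as Equals, so Equal sequences hash alike.
        int hash = 17;
        foreach (var value in obj)
            hash = unchecked(hash * 31 + (value == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(value)));
        return hash;
    }
}
```

The grouping would then read collection.GroupBy(obj => obj.StringCollection, StringCollectionEqualityComparer.Instance).ToList().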

C# Linq, Searching for same items in two lists

We have the following setup:
We have an array of objects, each holding a string (XML-ish but not normalized), and a list/array of id strings.
We need to find out whether any string from that id list is also present in one of the objects.
Here is a setup we have tried:
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public class Wrapper
{
public string MyProperty { get; set; }
}
class Program
{
static void Main(string[] args)
{
List<Wrapper> wrappers = new List<Wrapper>()
{
new Wrapper{ MyProperty = "<flkds,dlsklkdlsqkdkqslkdlqk><id>3</id><sqjldkjlfdskjlkfjsdklfj>"},
new Wrapper{ MyProperty = "<flkds,dlsklkdlsqkdkqslkdlqk><id>2</id><sqjldkjlfdskjlkfjsdklfj>"}
};
string[] ids = { "<id>0</id>", "<id>1</id>", "<id>2</id>" };
var props = wrappers.Select(w => w.MyProperty);
var intersect = props.Intersect(ids, new MyEquilityTester());
Debugger.Break();
}
}
class MyEquilityTester: IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return x.Contains(y);
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
Edit:
We expect intersect.Any() to return true, because wrappers has an object whose property contains <id>2</id>, but intersect is empty.
If we are using the wrong method, please say so. It should work as fast as possible; a simple true when a match is found will do!
For your case, you could write your IEqualityComparer like this:
class MyEquilityTester: IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return x.Contains(y) || y.Contains(x);
}
public int GetHashCode(string obj)
{
return 0;
}
}
and it will find
<flkds,dlsklkdlsqkdkqslkdlqk><id>2</id><sqjldkjlfdskjlkfjsdklfj>
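This works because GetHashCode always returns 0, so Intersect has to compare every candidate pair with Equals, and the x.Contains(y) || y.Contains(x) check matches no matter which string is passed as x. The price is that every lookup becomes a linear scan.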
Another not-so-hacky solution is to use a Where in combination with Any
IEnumerable<String> intersect = props.Where(p => ids.Any (i => p.Contains(i)));
Or replace the Where with another Any if you don't care about the actual items and only want true or false:
bool intersect = props.Any(p => ids.Any (i => p.Contains(i)));
wrappers.Where(w=>ids.Any(i=>w.MyProperty.Contains(i)))
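If the ids always appear literally as <id>...</id> tags, another option is to extract the tags once per string and look them up in a HashSet<string>, so each wrapper costs one regex scan instead of one Contains call per id. This is a sketch under that assumption (no whitespace or attributes inside the tag); IdLookup and AnyWrapperHasId are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Wrapper
{
    public string MyProperty { get; set; }
}

public static class IdLookup
{
    // Assumption: ids appear exactly as <id>...</id>, matching the strings in the ids array.
    private static readonly Regex IdTag = new Regex("<id>.*?</id>", RegexOptions.Compiled);

    // True as soon as any wrapper contains one of the requested id tags.
    public static bool AnyWrapperHasId(IEnumerable<Wrapper> wrappers, IEnumerable<string> ids)
    {
        var wanted = new HashSet<string>(ids);
        return wrappers.Any(w => w.MyProperty != null &&
            IdTag.Matches(w.MyProperty).Cast<Match>().Any(m => wanted.Contains(m.Value)));
    }
}
```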
