Find common items across multiple lists in C#

I have two generic lists:
List<string> TestList1 = new List<string>();
List<string> TestList2 = new List<string>();
TestList1.Add("1");
TestList1.Add("2");
TestList1.Add("3");
TestList2.Add("3");
TestList2.Add("4");
TestList2.Add("5");
What is the fastest way to find common items across these lists?

Assuming you use a version of .NET that has LINQ, you can use the Intersect extension method:
var CommonList = TestList1.Intersect(TestList2);
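To make the behaviour concrete, here is a minimal sketch that wraps the call in a helper. The class and method names (IntersectExample, Common) are made up for illustration and are not part of LINQ.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IntersectExample
{
    // Returns the distinct items that appear in both lists,
    // compared with the default equality comparer.
    public static List<string> Common(List<string> first, List<string> second)
    {
        // Intersect is lazy; ToList() runs the query and materializes the result.
        return first.Intersect(second).ToList();
    }
}
```

With the two lists from the question, IntersectExample.Common(TestList1, TestList2) yields a list holding the single item "3".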

If you have lists of objects and want to get the values of some property that they have in common, use:
var commons = TestList1.Select(s1 => s1.SomeProperty).Intersect(TestList2.Select(s2 => s2.SomeProperty)).ToList();
Note: SomeProperty stands for whichever property you want to compare on.

Assuming you have LINQ available, I don't know if it's the fastest, but a clean way would be something like:
var distinctStrings = TestList1.Union(TestList2); // originally followed by .Distinct(), which is redundant
Update: never mind my answer. Union returns the items found in either list, and I've just learnt that Intersect is what returns the common ones.
According to a comment, Union already applies a Distinct, which makes sense now that I think about it.

You can do this by counting occurrences of all items in all lists - those items whose occurrence count is equal to the number of lists, are common to all lists:
static List<T> FindCommon<T>(IEnumerable<List<T>> lists)
{
    Dictionary<T, int> map = new Dictionary<T, int>();
    int listCount = 0; // number of lists
    foreach (IEnumerable<T> list in lists)
    {
        listCount++;
        // Visit each distinct item once per list, so duplicates within
        // a single list cannot inflate the count
        foreach (T item in new HashSet<T>(list))
        {
            // Item encountered, increment count
            int currCount;
            if (!map.TryGetValue(item, out currCount))
                currCount = 0;
            currCount++;
            map[item] = currCount;
        }
    }
    List<T> result = new List<T>();
    foreach (KeyValuePair<T, int> kvp in map)
    {
        // Items whose count equals the number of lists are common to all the lists
        if (kvp.Value == listCount)
            result.Add(kvp.Key);
    }
    return result;
}

Sort both arrays, then walk them from the start in step, comparing the current items; items that are equal are common to both.
Using a hash is even faster: put the first array in a hash set, then check for every item of the second array whether it is already in the set.
I don't know how Intersect and Union are implemented. Measure their running time if you care about performance. Of course, they are better suited if you want clean code.
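The sort-and-walk idea can be sketched as follows. This is only an illustration for int lists; the names SortedIntersection and Intersect are hypothetical, and the method sorts copies so the callers' lists stay untouched.

```csharp
using System.Collections.Generic;

public static class SortedIntersection
{
    // Intersects two lists by sorting copies and walking both in step.
    public static List<int> Intersect(List<int> a, List<int> b)
    {
        var x = new List<int>(a);
        var y = new List<int>(b);
        x.Sort();
        y.Sort();

        var result = new List<int>();
        int i = 0, j = 0;
        while (i < x.Count && j < y.Count)
        {
            if (x[i] < y[j])
            {
                i++;            // x is behind, advance it
            }
            else if (x[i] > y[j])
            {
                j++;            // y is behind, advance it
            }
            else
            {
                // Equal: record the value once, skipping repeats of it.
                if (result.Count == 0 || result[result.Count - 1] != x[i])
                    result.Add(x[i]);
                i++;
                j++;
            }
        }
        return result;
    }
}
```

The two sorts cost O(n log n) and the walk is linear, whereas the hash-based variant is expected linear time overall, which is why the hash approach is usually the faster of the two.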

Use the Intersect method:
IEnumerable<string> result = TestList1.Intersect(TestList2);

Using HashSet for fast lookup. Here is the solution:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
    public static void Main()
    {
        List<int> list1 = new List<int> { 1, 2, 3, 4, 5, 6 };
        List<int> list2 = new List<int> { 1, 2, 3 };
        List<int> list3 = new List<int> { 1, 2 };
        var lists = new IEnumerable<int>[] { list1, list2, list3 };
        var commons = GetCommonItems(lists);
        Console.WriteLine("Common integers:");
        foreach (var c in commons)
            Console.WriteLine(c);
    }

    static IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
    {
        // Seed the set with the first list, then keep only what every other list also contains
        HashSet<T> hs = new HashSet<T>(lists.First());
        for (int i = 1; i < lists.Length; i++)
            hs.IntersectWith(lists[i]);
        return hs;
    }
}

Following the lead of @logicnp on counting the number of lists containing each member, once you have your list of lists, it's pretty much one line of code:
List<int> l1, l2, l3, cmn;
List<List<int>> all;
l1 = new List<int>() { 1, 2, 3, 4, 5 };
l2 = new List<int>() { 1, 2, 3, 4 };
l3 = new List<int>() { 1, 2, 3 };
all = new List<List<int>>() { l1, l2, l3 };
cmn = all.SelectMany(x => x).Distinct()
    .Where(x => all.Select(y => (y.Contains(x) ? 1 : 0))
        .Sum() == all.Count).ToList();
Or, if you prefer:
public static List<T> FindCommon<T>(IEnumerable<List<T>> Lists)
{
    return Lists.SelectMany(x => x).Distinct()
        .Where(x => Lists.Select(y => (y.Contains(x) ? 1 : 0))
            .Sum() == Lists.Count()).ToList();
}

Related

Dictionary with a list of integers as key where the order of the values in the list should matter

Hi, I'm trying to use a Dictionary with a list of integers as the key, where the order of the values in the list should matter. So if I have something like this:
List<int> list1 = new List<int>();
list1.Add(1);
list1.Add(2);
list1.Add(3);
List<int> list2 = new List<int>();
list2.Add(1);
list2.Add(2);
list2.Add(3);
Dictionary<List<int>,int> dictionary = new Dictionary<List<int>,int>();
dictionary.Add(list2 , 100);
I want to be able to access the value in the dictionary with list1 as well as list2. So in terms of functionality I want it to work similarly to
Enumerable.SequenceEqual(list1, list2)
You should build your Dictionary using the constructor accepting an IEqualityComparer.
In the implementation of that IEqualityComparer you should use SequenceEqual instead of / in addition to standard equality.
Example (on C# PlayGround):
public class EnumerableComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        return Object.ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        unchecked
        {
            return obj.Where(e => e != null).Select(e => e.GetHashCode()).Aggregate(17, (a, b) => 23 * a + b);
        }
    }
}

var key1 = new List<int> { 1, 3, 5 };
var key2 = new List<int> { 2, 4, 6 };
var key3 = new List<int> { 1, 3, 5 };
var comparer = new EnumerableComparer<int>();
var dictionary = new Dictionary<List<int>, string>(comparer);
dictionary.Add(key1, "hello");
dictionary.Add(key2, "world");
Console.WriteLine(dictionary[key3]); // prints "hello"
Info: A quick search shows that the Dictionary constructor mentioned above has been available since at least .NET Framework 2.0. It's available in all versions up to the most current one (at the time of writing: .NET 6).
Update (2022-03-25): renamed IEnumerableComparer to EnumerableComparer
The easiest solution would be to just serialize the list of ints into a string.
For example:
Dictionary<string,int> dictionary = new Dictionary<string,int>();
dictionary.Add(string.Join(",", list2), 100);
A less hacky approach is to define your own class that encapsulates the list and properly implements GetHashCode() and Equals(). Important: be careful with such a class, since lists are mutable. If you insert something into the list after adding the item to the dictionary, you won't get the expected behavior.
For example:
List<int> list1 = new List<int>();
list1.Add(1);
list1.Add(2);
list1.Add(3);
MyListWrapper wrapper1 = new MyListWrapper(list1);
List<int> list2 = new List<int>();
list2.Add(1);
list2.Add(2);
list2.Add(3);
MyListWrapper wrapper2 = new MyListWrapper(list2);
Dictionary<MyListWrapper,int> dictionary = new Dictionary<MyListWrapper,int>();
dictionary.Add(wrapper1, 100);
Assert.AreEqual(100, dictionary[wrapper2]); // this will work
list1.Add(5);
list2.Add(5);
Assert.AreEqual(100, dictionary[wrapper2]); // this will NOT work
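The answer above names MyListWrapper without showing it. The following is one possible sketch, not the answerer's actual class. It deliberately differs in one respect: it copies the items in the constructor, so later changes to the source list cannot affect the key. A wrapper that merely holds a reference to the list would instead show the failure described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical implementation: only the name MyListWrapper comes from the
// answer; everything else here is an assumption made for illustration.
public sealed class MyListWrapper : IEquatable<MyListWrapper>
{
    private readonly int[] _items;

    public MyListWrapper(IEnumerable<int> items)
    {
        // Copy the items so later changes to the source list cannot alter the key.
        _items = items.ToArray();
    }

    // Two wrappers are equal when they hold the same values in the same order.
    public bool Equals(MyListWrapper other)
    {
        return other != null && _items.SequenceEqual(other._items);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as MyListWrapper);
    }

    // Order-sensitive hash, so equal sequences always hash alike.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (int item in _items)
                hash = hash * 23 + item;
            return hash;
        }
    }
}
```

The cost of the copy is paid once per key, and in exchange the dictionary entry stays reachable no matter what happens to the original list afterwards.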

How to add List<T> items dynamically to IEnumerable<T>

Code
public static void Main()
{
    List<int> list1 = new List<int> { 1, 2, 3, 4, 5, 6 };
    List<int> list2 = new List<int> { 1, 2, 3 };
    List<int> list3 = new List<int> { 1, 2 };
    var lists = new IEnumerable<int>[] { list1, list2, list3 };
    var commons = GetCommonItems(lists);
    Console.WriteLine("Common integers:");
    foreach (var c in commons)
        Console.WriteLine(c);
}

static IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
{
    HashSet<T> hs = new HashSet<T>(lists.First());
    for (int i = 1; i < lists.Length; i++)
        hs.IntersectWith(lists[i]);
    return hs;
}
The sample shows only "list1", "list2" and "list3", but I may have more than 50 lists, each generated inside a foreach loop. How can I programmatically add each list to the IEnumerable lists so that their data can be compared?
I tried many approaches, such as converting to a list, Add, Append and Concat, but nothing worked.
Is there a better way to compare N lists?
Output of the code: 1 2
You can create a list of lists and add lists to that list dynamically. Something like this:
var lists = new List<List<int>>();
lists.Add(new List<int> {1, 2, 3, 4, 5, 6 });
lists.Add(new List<int> {1, 2, 3 });
lists.Add(new List<int> {1, 2 });
foreach (var list in listSources) // listSources stands for wherever your generated lists come from
    lists.Add(list);
var commons = GetCommonItems(lists);
To find intersections you can use this solution for example: Intersection of multiple lists with IEnumerable.Intersect() (actually looks like that's what you are using already).
Also make sure to change the signature of the GetCommonItems method:
static IEnumerable<T> GetCommonItems<T>(List<List<T>> lists)
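As a rough sketch of how the pieces might fit together when the lists are produced in a loop: BuildLists below is a hypothetical stand-in for whatever code generates the real lists, and this GetCommonItems variant takes IEnumerable<IEnumerable<T>> so that a List<List<int>> can be passed directly.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CommonItems
{
    // Intersects any number of sequences; returns an empty set when there are none.
    public static HashSet<T> GetCommonItems<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> result = null;
        foreach (IEnumerable<T> list in lists)
        {
            if (result == null)
                result = new HashSet<T>(list);  // seed with the first sequence
            else
                result.IntersectWith(list);     // keep only items also in this one
        }
        return result ?? new HashSet<T>();
    }

    // Hypothetical stand-in for the loop that produces the real lists:
    // yields {1,2}, {1,2,3}, {1,2,3,4}, and so on.
    public static List<List<int>> BuildLists(int count)
    {
        var lists = new List<List<int>>();
        for (int i = 1; i <= count; i++)
            lists.Add(Enumerable.Range(1, i + 1).ToList());
        return lists;
    }
}
```

Calling CommonItems.GetCommonItems<int>(CommonItems.BuildLists(50)) intersects all 50 generated lists and leaves the set {1, 2}, since those are the only values present in the shortest list.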
What you could do is allow the GetCommonItems method to accept a variable number of parameters using the params keyword. This way, you avoid needing to create a new collection of lists.
It goes without saying, however, that if the number of lists in your source is variable as well, this could be trickier to use.
I've also amended the GetCommonItems method to work like the code from https://stackoverflow.com/a/1676684/9945524
public static void Main()
{
    List<int> list1 = new List<int> { 1, 2, 3, 4, 5, 6 };
    List<int> list2 = new List<int> { 1, 2, 3 };
    List<int> list3 = new List<int> { 1, 2 };
    var commons = GetCommonItems(list1, list2, list3); // pass the lists here
    Console.WriteLine("Common integers:");
    foreach (var c in commons)
        Console.WriteLine(c);
}

static IEnumerable<T> GetCommonItems<T>(params List<T>[] lists)
{
    return lists.Skip(1).Aggregate(
        new HashSet<T>(lists.First()),
        (hs, lst) =>
        {
            hs.IntersectWith(lst);
            return hs;
        }
    );
}
Alternate solution using your existing Main method.
EDIT: changed the type of lists to IEnumerable<IEnumerable<T>> as per comment in this answer.
public static void Main()
{
    List<int> list1 = new List<int> { 1, 2, 3, 4, 5, 6 };
    List<int> list2 = new List<int> { 1, 2, 3 };
    List<int> list3 = new List<int> { 1, 2 };
    var lists = new List<List<int>> { list1, list2, list3 };
    var commons = GetCommonItems(lists);
    Console.WriteLine("Common integers:");
    foreach (var c in commons)
        Console.WriteLine(c);
}

static IEnumerable<T> GetCommonItems<T>(IEnumerable<IEnumerable<T>> enumerables)
{
    return enumerables.Skip(1).Aggregate(
        new HashSet<T>(enumerables.First()),
        (hs, lst) =>
        {
            hs.IntersectWith(lst);
            return hs;
        }
    );
}
IEnumerable<T> is read-only: it offers no way to add items, so build a concrete collection such as List<T> (which implements IEnumerable<T>) and pass or return that, depending on your needs.
If I understand correctly, you want to get the common items of N lists. I would use LINQ for this.
My proposal:
1. Make one list that contains all of the items:
var lists = new List<List<int>>();
var allElements = new List<int>();
foreach (var list in lists)
    allElements.AddRange(list.Distinct()); // count each value at most once per list
2. Take the items that repeat across every list:
var commons = allElements.GroupBy(x => x).Where(g => g.Count() == lists.Count).Select(g => g.Key).ToList();

Swapping elements between 2 hashsets c#

If I have 2 hashsets of size 5, how can I take x items from the first hashset and swap them with x items from the second one ?
For example :
HashSet 1 has elements = {a, b, c, d, e}
HashSet 2 has elements = {r, s, t, u, w}
After the swap I would like to obtain:
HashSet 1 = {r, s, c, d, e}
HashSet 2 = {a, b, t, u, w}
I don't need a specific order.
Perhaps:
HashSet<string> hash1 = new HashSet<string>() { "A1", "B1", "C1", "D1" };
HashSet<string> hash2 = new HashSet<string>() { "A2", "B2", "C2", "D2" };
var firstThreeInOne = hash1.Take(3).ToList();
var firstThreeInTwo = hash2.Take(3).ToList();
foreach (string str in firstThreeInOne)
    hash1.Remove(str);
foreach (string str in firstThreeInTwo)
    hash2.Remove(str);
foreach (string str in firstThreeInTwo)
    hash1.Add(str);
foreach (string str in firstThreeInOne)
    hash2.Add(str);
Test:
Console.WriteLine(string.Join(",", hash1)); // C2,B2,A2,D1
Console.WriteLine(string.Join(",", hash2)); // C1,B1,A1,D2
But note that a HashSet does not guarantee insertion order. It is simply not an ordered collection.
MSDN mentions that explicitly:
The HashSet<T> class provides high-performance set operations. A set
is a collection that contains no duplicate elements, and whose
elements are in no particular order.
You can't, unless you implement a hash set with ordering.
The CLR's HashSet class has no ordering, so you can't get the "first N" elements.
As DarkFalcon and others said, a HashSet by definition has no order, and therefore there are no first x elements in it.
But as with every collection, you can get elements in some kind of order with the ElementAt method.
Be aware that you cannot know which of the elements will be counted as the first ones...
void Main()
{
    var hashSet1 = new HashSet<int>();
    hashSet1.Add(1);
    hashSet1.Add(2);
    hashSet1.Add(3);
    hashSet1.Add(4);
    hashSet1.Add(5);
    var hashSet2 = new HashSet<int>();
    hashSet2.Add(6);
    hashSet2.Add(7);
    hashSet2.Add(8);
    hashSet2.Add(9);
    hashSet2.Add(0);
    SwapHashSets(hashSet1, hashSet2, 3);
}

private List<int> GetXValuesFromHashSet(HashSet<int> hashSet, int count)
{
    var list = new List<int>();
    for (var i = 0; i < count; i++)
    {
        list.Add(hashSet.ElementAt(i));
    }
    return list;
}

private void SwapHashSets(HashSet<int> hashSet1, HashSet<int> hashSet2, int count)
{
    // Collect the values first; a set cannot be modified while it is being enumerated
    var list1 = GetXValuesFromHashSet(hashSet1, count);
    var list2 = GetXValuesFromHashSet(hashSet2, count);
    foreach (var value in list1)
    {
        hashSet1.Remove(value);
    }
    foreach (var value in list2)
    {
        hashSet2.Remove(value);
    }
    foreach (var value in list1)
    {
        hashSet2.Add(value);
    }
    foreach (var value in list2)
    {
        hashSet1.Add(value);
    }
}
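The same swap can be written generically with the set operations that HashSet<T> already provides. This is a sketch under the assumption that the two sets are disjoint, as in the question; if they share items, the sets can end up with different sizes. The names HashSetSwap and Swap are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetSwap
{
    // Moves `count` arbitrary items from each set into the other.
    // Which items are picked is unspecified, because HashSet<T> has no order.
    public static void Swap<T>(HashSet<T> first, HashSet<T> second, int count)
    {
        if (count < 0 || count > first.Count || count > second.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Snapshot the picks first: a set cannot be modified while it is enumerated.
        List<T> fromFirst = first.Take(count).ToList();
        List<T> fromSecond = second.Take(count).ToList();

        first.ExceptWith(fromFirst);
        second.ExceptWith(fromSecond);
        first.UnionWith(fromSecond);
        second.UnionWith(fromFirst);
    }
}
```

Using Take(count) enumerates each set once, whereas calling ElementAt(i) in a loop restarts the enumeration for every index.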

Compare size (Count) of many lists

I was wondering if I can compare the sizes of many lists in an elegant and fast way.
Basically, this is my problem: I need to assert that 6 lists have the same size. The usual way is something like (warning: ugly code):
if (list1.Count == list2.Count && list1.Count == list3.Count && .....) {
//ok, so here they have same size.
}
Some Jedi alternatives here?
The All() method should do the trick:
http://msdn.microsoft.com/en-us/library/bb548541.aspx
Code should look like this, I think:
(new[] { list1, list2, list3, list4, list5, list6 })
    .All(list => list.Count == list1.Count);
Using Enumerable.All you can check that all lists match the same criteria:
var allLists = new[] { list1, list2, list3 };
bool result = allLists.All(l => l.Count == allLists[0].Count);
Or as a one-liner, but you would then need to refer to a particular list:
bool result = (new[] { list1, list2, list3 }).All(l => l.Count == list1.Count);
How about with LINQ:
bool allSameSize = new[] { list1, list2, list3, list4, list5, list6 }
    .Select(list => list.Count)
    .Distinct()
    .Take(2) // Optimization, not strictly necessary
    .Count() == 1;
This idea works for any kind of sequence (not just lists), and will quick-reject as soon as two distinct counts are found.
On another note, is there any reason that the lists aren't part of a "list of lists" collection?
If you make this kind of comparison at just one place, then it is probably not worth trying to make it shorter (especially if it impacts the performance).
However, if you compare list lengths at more than one place, it is perhaps worthwhile putting it in a function then reusing it many times:
static bool SameLength<T>(params IList<T>[] lists) {
    int len = -1;
    foreach (var list in lists) {
        int list_len = list.Count;
        if (len >= 0 && len != list_len)
            return false;
        len = list_len;
    }
    return true;
}

static void Main(string[] args) {
    // All of these lists have same length (2):
    var list1 = new List<int> { 1, 2 };
    var list2 = new List<int> { 3, 4 };
    var list3 = new List<int> { 5, 6 };
    var list4 = new List<int> { 7, 8 };
    var list5 = new List<int> { 9, 10 };
    var list6 = new List<int> { 11, 12 };
    if (SameLength(list1, list2, list3, list4, list5, list6)) {
        // Executed.
    }
    // But this one is different (length 3):
    var list7 = new List<int> { 11, 22, 33 };
    if (SameLength(list1, list2, list3, list7, list4, list5, list6)) {
        // Not executed.
    }
}
--- EDIT ---
Based on Dean Barnes' idea, you could even do this for extra-short implementation:
static bool SameLength<T>(params IList<T>[] lists) {
    return lists.All(list => list.Count == lists[0].Count);
}
var lists = new [] { list1, list2, list3 ... };
bool diffLengths = lists.Select(list => list.Count).Distinct().Skip(1).Any();
Or
bool sameLen = new HashSet<int>(lists.Select(list => list.Count)).Count <= 1;

LINQ intersect, multiple lists, some empty

I'm trying to find an intersect with LINQ.
Sample:
List<int> int1 = new List<int>() { 1,2 };
List<int> int2 = new List<int>();
List<int> int3 = new List<int>() { 1 };
List<int> int4 = new List<int>() { 1, 2 };
List<int> int5 = new List<int>() { 1 };
I want this to return 1, as it exists in all the non-empty lists. If I run:
var intResult= int1
.Intersect(int2)
.Intersect(int3)
.Intersect(int4)
.Intersect(int5).ToList();
It returns nothing, as 1 obviously isn't in the int2 list. How do I get this to work regardless of whether a list is empty?
Use the above example or:
List<int> int1 = new List<int>() { 1,2 };
List<int> int2 = new List<int>();
List<int> int3 = new List<int>();
List<int> int4 = new List<int>();
List<int> int5 = new List<int>();
How do I return 1 and 2 in this case? I don't know ahead of time whether the lists are populated.
If you need it in a single step, the simplest solution is to filter out empty lists:
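public static IEnumerable<T> IntersectNonEmpty<T>(this IEnumerable<IEnumerable<T>> lists)
{
    var nonEmptyLists = lists.Where(l => l.Any());
    // DefaultIfEmpty supplies an empty sequence when every list is empty,
    // so Aggregate does not throw on an empty source
    return nonEmptyLists
        .DefaultIfEmpty(Enumerable.Empty<T>())
        .Aggregate((l1, l2) => l1.Intersect(l2));
}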
You can then use it on a collection of lists or other IEnumerables:
IEnumerable<int>[] lists = new[] { l1, l2, l3, l4, l5 };
var intersect = lists.IntersectNonEmpty();
You may prefer a regular static method:
public static IEnumerable<T> IntersectNonEmpty<T>(params IEnumerable<T>[] lists)
{
    return lists.IntersectNonEmpty();
}
var intersect = ListsExtensionMethods.IntersectNonEmpty(l1, l2, l3, l4, l5);
You could write an extension method to define that behaviour. Something like
static class MyExtensions
{
    public static IEnumerable<T> IntersectAllIfEmpty<T>(this IEnumerable<T> list, IEnumerable<T> other)
    {
        if (other.Any())
            return list.Intersect(other);
        else
            return list;
    }
}
So the code below would print 1.
List<int> list1 = new List<int>() { 1, 2 };
List<int> list2 = new List<int>();
List<int> list3 = new List<int>() { 1 };
foreach (int i in list1.IntersectAllIfEmpty(list2).IntersectAllIfEmpty(list3))
    Console.WriteLine(i);
Update:
Anon brings up a good point in the comments to the question. The above function will result in an empty set if list itself is empty, which should be desirable. This means if the first list in the method chain or the result set of any intersection is empty, the final result will be empty.
To allow for an empty first list but not for empty result sets, you could take a different approach. This is a method which is not an extension method, but rather takes a params array of IEnumerables and first filters out the empty sets and then attempts to intersect the rest.
public static IEnumerable<T> IntersectAllIfEmpty<T>(params IEnumerable<T>[] lists)
{
    IEnumerable<T> results = null;
    // Drop the empty lists, then intersect whatever remains
    lists = lists.Where(l => l.Any()).ToArray();
    if (lists.Length > 0)
    {
        results = lists[0];
        for (int i = 1; i < lists.Length; i++)
            results = results.Intersect(lists[i]);
    }
    else
    {
        results = new T[0];
    }
    return results;
}
You would use it like this
List<int> list0 = new List<int>();
List<int> list1 = new List<int>() { 1, 2 };
List<int> list2 = new List<int>() { 1 };
List<int> list3 = new List<int>() { 1,2,3 };
foreach (int i in IntersectAllIfEmpty(list0, list1, list2, list3))
{
    Console.WriteLine(i);
}
