Items Common to Most Lists - c#

Given a list of lists (let's say 5 lists, to have a real number with which to work), I can find items that are common to all 5 lists with relative ease (see Intersection of multiple lists with IEnumerable.Intersect()) using a variation of the following code:
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());
Now let's say that intersection ends up containing 0 items. It's quite possible that there are some objects that are common to 4/5 lists. How would I go about finding them in the most efficient way?
I know I could just run through all the combinations of 4 lists and save all the results, but that method doesn't scale very well (this will eventually have to be done on approx. 40 lists).
If no item is common to 4 lists, then the search would be repeated looking for items common to 3/5 lists, etc. Visually, this could be represented by lists of grid points and we're searching for the points that have the most overlap.
Any ideas?
EDIT:
Maybe it would be better to look at each point and keep track of how many times it appears in each list, then create a list of the points with the highest occurrence?

You can select all numbers (points) from all lists and group them by value. Then sort the groups by size (i.e. the number of lists in which the point appears, assuming no list contains the same point twice) and select the most common item:
var mostCommon = listOfLists.SelectMany(l => l)
.GroupBy(i => i)
.OrderByDescending(g => g.Count())
.Select(g => g.Key)
.First();
// outputs 3
Instead of taking only first item, you can take several top items by replacing First() with Take(N).
Returning items with number of lists (ordered by number of lists):
var mostCommonItems = from l in listOfLists
from i in l
group i by i into g
orderby g.Count() descending
select new {
Item = g.Key,
NumberOfLists = g.Count()
};
Usage (each element is a strongly typed anonymous object):
var topItem = mostCommonItems.First();
var item = topItem.Item;
var listsCount = topItem.NumberOfLists;
// the loop variable needs its own name, because item is already declared above
foreach (var entry in mostCommonItems.Take(3))
{
    // iterate over top three items
}
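One caveat with the grouping approach above: g.Count() counts occurrences, so it only equals the number of lists when no list contains the same value twice. The following is a minimal sketch, assuming that duplicates inside a single list should count once. The names `MostCommon` and `InMostLists` are made up for illustration. It returns every item tied for the highest list count, which covers the "4 of 5, otherwise 3 of 5" fallback in a single pass:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MostCommon
{
    // Hypothetical helper: returns all items that appear in the largest
    // number of lists, together with that number.
    public static (List<T> Items, int ListCount) InMostLists<T>(IEnumerable<IEnumerable<T>> lists)
    {
        var counts = lists
            .SelectMany(l => l.Distinct())   // count each value once per list
            .GroupBy(x => x)
            .Select(g => new { Item = g.Key, Count = g.Count() })
            .ToList();

        if (counts.Count == 0)
            return (new List<T>(), 0);

        int max = counts.Max(c => c.Count);
        var items = counts.Where(c => c.Count == max).Select(c => c.Item).ToList();
        return (items, max);
    }

    public static void Main()
    {
        var lists = new List<List<int>>
        {
            new List<int> { 1, 2, 3, 3 },
            new List<int> { 2, 3, 4 },
            new List<int> { 4, 5, 5 },
        };
        var (items, n) = InMostLists<int>(lists);
        Console.WriteLine($"{string.Join(", ", items)} appear in {n} lists");
        // prints "2, 3, 4 appear in 2 lists"
    }
}
```

Each element is visited once, so the work grows with the total number of elements rather than with the number of list combinations.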

You can first combine all the lists, then find the mode of the combined list by counting occurrences in a dictionary. This makes it pretty fast:
/// <summary>
/// Gets the element that occurs most frequently in the collection.
/// </summary>
/// <param name="list"></param>
/// <returns>Returns the element that occurs most frequently in the collection.
/// If several elements are tied for the highest count, the first one
/// encountered while enumerating the counts is returned.</returns>
public static T Mode<T>(this IEnumerable<T> list)
{
// Initialize the return value
T mode = default(T);
// Test for a null reference and an empty list
if (list != null && list.Count() > 0)
{
// Store the number of occurrences for each element
Dictionary<T, int> counts = new Dictionary<T, int>();
// Add one to the count for each occurrence of an element
foreach (T element in list)
{
if (counts.ContainsKey(element))
counts[element]++;
else
counts.Add(element, 1);
}
// Loop through the counts of each element and find the
// element that occurred most often
int max = 0;
foreach (KeyValuePair<T, int> count in counts)
{
if (count.Value > max)
{
// Update the mode
mode = count.Key;
max = count.Value;
}
}
}
return mode;
}

Related

How to create new list from list of list where elements are in new list are in alternative order? [duplicate]

This question already has answers here:
Interleaving multiple (more than 2) irregular lists using LINQ
Suppose I have a list of lists. I want to create a new list from it in which the elements are interleaved, as in the example below.
Inputs:-
List<List<int>> l = new List<List<int>>();
List<int> a = new List<int>();
a.Add(1);
a.Add(2);
a.Add(3);
a.Add(4);
List<int> b = new List<int>();
b.Add(11);
b.Add(12);
b.Add(13);
b.Add(14);
b.Add(15);
b.Add(16);
b.Add(17);
b.Add(18);
l.Add(a);
l.Add(b);
Output(list):-
1
11
2
12
3
13
4
14
15
16
The output list must not contain more than 10 elements.
I am currently doing this using a foreach inside a while loop, but I want to know how I can do it using LINQ.
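int index = 0;
List<int> o = new List<int>();
// stop at 10 items, or when every list has run out of elements
while (o.Count < 10 && l.Any(x => index < x.Count))
{
    foreach (List<int> x in l)
    {
        // skip lists that have no element at this position
        if (o.Count < 10 && index < x.Count)
            o.Add(x[index]);
    }
    index++;
}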
Thanks.
Use the SelectMany and Select overloads that receive the item's index; the indexes are used to apply the desired ordering. SelectMany flattens the nested collections into a single sequence. Finally, apply Take to retrieve only the desired number of items:
var result = l.SelectMany((nested, index) =>
nested.Select((item, nestedIndex) => (index, nestedIndex, item)))
.OrderBy(i => i.nestedIndex)
.ThenBy(i => i.index)
.Select(i => i.item)
.Take(10);
Or in query syntax:
var result = (from c in l.Select((nestedCollection, index) => (nestedCollection, index))
from i in c.nestedCollection.Select((item, index) => (item, index))
orderby i.index, c.index
select i.item).Take(10);
If the project uses C# 6.0 or earlier, where tuple syntax is not available, project into an anonymous type instead:
var result = l.SelectMany((nested, index) =>
nested.Select((item, nestedIndex) => new {index, nestedIndex, item}))
.OrderBy(i => i.nestedIndex)
.ThenBy(i => i.index)
.Select(i => i.item)
.Take(10);
To explain why Zip alone is not enough: Zip is equivalent to joining the second collection to the first, where the join key is the index. Therefore only items of the first collection that have a match in the second will appear in the result.
The next option to consider is a left join, which returns all items of the first collection, each with its match in the second (if one exists). What the OP describes is the functionality of a full outer join: get all items of both collections and match them up when possible.
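The difference can be seen side by side in a small sketch. The class and method names (`ZipDemo`, `ZipOnly`, `Interleave`) are invented for illustration; the lists are the ones from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipDemo
{
    // Zip pairs items by position and stops at the end of the shorter list
    public static List<int> ZipOnly(List<int> a, List<int> b) =>
        a.Zip(b, (x, y) => new[] { x, y }).SelectMany(pair => pair).ToList();

    // Ordering by position keeps the unmatched tail of the longer list
    public static List<int> Interleave(int max, params List<int>[] lists) =>
        lists.SelectMany(list => list.Select((item, pos) => new { item, pos }))
             .OrderBy(x => x.pos)   // OrderBy is stable, so list order is kept within a position
             .Select(x => x.item)
             .Take(max)
             .ToList();

    public static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4 };
        var b = new List<int> { 11, 12, 13, 14, 15, 16, 17, 18 };

        Console.WriteLine(string.Join(", ", ZipOnly(a, b)));
        // 1, 11, 2, 12, 3, 13, 4, 14

        Console.WriteLine(string.Join(", ", Interleave(10, a, b)));
        // 1, 11, 2, 12, 3, 13, 4, 14, 15, 16
    }
}
```

Zip drops 15 and 16 because the first list has no partner for them, while the index-ordered version keeps them.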
I know you asked for LINQ, but I often feel that LINQ is a hammer, and as soon as a developer finds it, every problem is a nail. I wouldn't have done this one with LINQ, from a readability/maintainability point of view, because I think something like this is simpler, easier to understand and more self-documenting:
List<int> r = new List<int>(10);
for (int i = 0; i < 10; i++)
{
    // the r.Count checks keep the result at 10 items at most
    if (i < a.Count && r.Count < 10)
        r.Add(a[i]);
    if (i < b.Count && r.Count < 10)
        r.Add(b[i]);
}
You don't need to stop the loop early if a and b collectively only have, e.g., 8 items, but you could by extending the test of the for loop.
I also think this approach may be more performant than LINQ because it's doing a lot less.
If your mandate to use LINQ is academic (this is homework that must use LINQ) then go ahead, but if it's a normal everyday system that some other poor sucker will have to maintain one day, I implore you to consider whether this is a good application for LINQ.
This handles two or more inner lists. It returns an IEnumerable<int> via yield, so you have to call .ToList() on the result to turn it into a list. LINQ's Any is used as the stop criterion.
It will throw if any of the lists is null. Add checks to your liking.
static IEnumerable<int> FlattenZip (List<List<int>> ienum, int maxLength = int.MaxValue)
{
    int index = 0;
    int yielded = 0;
    // continue while more items are wanted and some list still has an element at this index
    while (yielded < maxLength && ienum.Any (list => index < list.Count))
    {
        foreach (var l in ienum)
        {
            if (index >= l.Count)
                continue; // this list is exhausted
            // this list is big enough, we will take one out
            yield return l[index];
            yielded++;
            if (yielded >= maxLength)
                yield break; // we are done
        }
        index++; // checked all lists, advancing index
    }
}
public static void Main ()
{
// other testcases to consider:
// in total too few elements
// one list empty (but not null)
// too many lists (11 for 10 elements)
var l1 = new List<int> { 1, 2, 3, 4 };
var l2 = new List<int> { 11, 12, 13, 14, 15, 16 };
var l3 = new List<int> { 21, 22, 23, 24, 25, 26 };
var l = new List<List<int>> { l1, l2, l3 };
var zipped = FlattenZip (l, 10);
Console.WriteLine (string.Join (", ", zipped));
Console.ReadLine ();
}

Get every nth element or last

I'm hitting a brick wall with this, and I just can't seem to wrap my head around it.
Given a List of objects, how can I get every third element starting from the end (so the third to last, sixth to last, etc.), but if it reaches the front of the list and there are only 1 or 2 elements left, also return the first element?
I'm essentially trying to simulate drawing three cards from the stock and checking for valid moves in a game of patience, but for some reason I'm struggling with this one concept.
EDIT:
So far I've looked into using a standard for loop with a larger step. That leads me to the second requirement, which is to get the first element if there are fewer than three left on the final iteration.
I've tried other suggestions on Stack Overflow for getting every nth element from a list, but none of them covers the second requirement.
I'm not entirely sure what code I could post that wouldn't be a simple for loop, as my problem is the logic for the code, not the code itself.
For example:
Given the list
1,2,3,4,5,6,7,8,9,10
i would like to get a list with
8, 5, 2, 1
as the return.
pseudocode:
List<object> filtered = new List<object>();
List<object> reversedList = Enumerable.Reverse(myList).ToList();
if (reversedList.Count % 3 != 0)
{
    return reversedList.Last();
}
else
{
    for (int i = 3; i < reversedList.Count; i = i + 3)
    {
        filtered.Add(reversedList[i]);
    }
    if (!filtered.Contains(reversedList.Last()))
    {
        filtered.Add(reversedList.Last());
    }
}
Try using this code -
List<int> list = new List<int>();
List<int> resultList = new List<int>();
int count = 1;
for (;count<=20;count++) {
list.Add(count);
}
for (count=list.Count-3;count>=0;count-=3)
{
Debug.Log(list[count]);
resultList.Add(list[count]);
}
if(list.Count % 3 > 0)
{
Debug.Log(list[0]);
resultList.Add(list[0]);
}
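I had to try doing it with LINQ.
I'm not sure it lives up to your requirements, but it works with your example. Note that OrderByDescending sorts by value, so it only matches "from the end" because the example list is already in ascending order; Enumerable.Reverse would handle a list in arbitrary order.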
var list = Enumerable.Range(1, 10).ToList();
//Start with reversing the order.
var result = list.OrderByDescending(x => x)
//Run a select overload with index so we can use position
.Select((number, index) => new { number, index })
//Only include items that are in the right intervals OR is the last item
.Where(x => ((x.index + 1) % 3 == 0) || x.index == list.Count() - 1)
//Select only the number to get rid of the index.
.Select(x => x.number)
.ToList();
Assert.AreEqual(8, result[0]);
Assert.AreEqual(5, result[1]);
Assert.AreEqual(2, result[2]);
Assert.AreEqual(1, result[3]);
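For lists that are not sorted by value, a position-based version may be easier to reason about. This is only a sketch of the same rule (step back n positions at a time, then add the first element if a partial group is left at the front); `EveryNthFromEnd` is a hypothetical helper name, not an existing API:

```csharp
using System;
using System.Collections.Generic;

public static class NthFromEnd
{
    // Hypothetical helper: walks backwards in steps of n, then falls back to
    // the first element when fewer than n items are left over at the front.
    public static List<T> EveryNthFromEnd<T>(IReadOnlyList<T> source, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        var result = new List<T>();
        for (int i = source.Count - n; i >= 0; i -= n)
            result.Add(source[i]);

        // a remainder means the front of the list holds an incomplete group
        if (source.Count % n != 0)
            result.Add(source[0]);

        return result;
    }

    public static void Main()
    {
        var cards = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(string.Join(", ", EveryNthFromEnd(cards, 3)));
        // 8, 5, 2, 1
    }
}
```

With only one or two cards in the list the loop does not run at all and just the first card is returned; an empty list yields an empty result.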

Check if list contains another list. C#

EDIT: the comment in ContainsAllItems explains it best.
I'm sorry for asking; I know this was asked before, but I just did not get it.
OK, so I want to check if a list contains all the items in another list WITHOUT overlapping, and compare the items by their string name field (called itemname, which is public).
public class Item
{
public string itemname;
}
So basically: I have a class (let's say class A) with a list of items, and a function that takes class A's list of items and compares it to another list (let's call it B), comparing by the itemname field rather than the whole item.
Most importantly, could you explain in detail what it does?
This is how the class looks as of now:
public class SomeClass
{
public List<Item> myItems = new List<Item>();
public bool ContainsAllItems(List<Item> B)
{
//Make a function that compares the two lists by itemname and only returns true if the myItems list contains ALL items in list B.
//Also could you explain how it works.
}
}
I haven't checked the perf on this, but LINQ does have the Except operator.
var x = new int[] {4,5};
var y = new int[] {1 ,2 ,3 ,4 ,5};
y.Except(x).Any(); //true, not all items from y are in x
x.Except(y).Any(); // false, all items from x are in y
This isn't exactly what you asked for, but performance-wise you should definitely use HashSet's IsProperSubsetOf. It can do what you want in orders of magnitude less time. Note that IsProperSubsetOf returns false when both sets contain exactly the same names; use IsSubsetOf if that case should count as a match:
HashSet<string> a = new HashSet<string>(list1.Select(x => x.itemname));
HashSet<string> b = new HashSet<string>(list2.Select(x => x.itemname));
a.IsProperSubsetOf(b);
Explanation: HashSet uses the item's GetHashCode value and Equals method in an efficient way to compare items. That means that when it internally goes through the values in b it doesn't have to compare it to all other items in a. It uses the hash code (and an internal hash function) to check whether it already has that value or doesn't.
Because it does only a single check for every item (each check is O(1)) it's much faster than checking all items in a which would take O(n) (for each item in b that is).
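A set-based check ignores how often a name occurs. If "without overlapping" in the question means that every item in B must be matched by its own, separate item in myItems (an assumption; the question does not spell this out), the names can be counted instead. This sketch fills in the ContainsAllItems method from the question under that assumption and expects itemname never to be null:

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string itemname;
}

public class SomeClass
{
    public List<Item> myItems = new List<Item>();

    // True when every item in b can be matched by name to a separate item in myItems.
    public bool ContainsAllItems(List<Item> b)
    {
        // Count how many times each name occurs in myItems
        var available = new Dictionary<string, int>();
        foreach (var item in myItems)
        {
            available.TryGetValue(item.itemname, out int count); // count is 0 if the name is new
            available[item.itemname] = count + 1;
        }

        // Use up one occurrence for every item in b
        foreach (var item in b)
        {
            if (!available.TryGetValue(item.itemname, out int left) || left == 0)
                return false; // name missing, or all of its occurrences already matched
            available[item.itemname] = left - 1;
        }
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new SomeClass();
        a.myItems.Add(new Item { itemname = "sword" });
        a.myItems.Add(new Item { itemname = "shield" });

        var oneSword = new List<Item> { new Item { itemname = "sword" } };
        var twoSwords = new List<Item>
        {
            new Item { itemname = "sword" },
            new Item { itemname = "sword" }
        };

        Console.WriteLine(a.ContainsAllItems(oneSword));  // True
        Console.WriteLine(a.ContainsAllItems(twoSwords)); // False
    }
}
```

Both loops do one dictionary lookup per item, so the check stays linear in the combined size of the two lists.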
B.All(itB => myItems.Select(itA => itA.itemname).Contains(itB.itemname))
This runs in O(N^2) time, but it's cool that you can do it in just one rather unreadable line.
Here is another way. It shows both how to find the items that are missing from the other list and how to find the items that are in both.
var a = new List<int> { 1, 2, 3, 4, 5 };
var b = new List<int> { 1, 2, 3 };
//Exists in list a but not in b
var c = (from i
in a
let found = b.Any(j => j == i)
where !found select i)
.ToList();
//Exists in both lists
var d = (from i
in a
let found = b.Any(j => j == i)
where found select i)
.ToList();

c# implementation get range of values and union of these ranges

I have a situation that is well explained in this question:
Range intersection / union
I need a C# implementation (a collection, maybe) that takes a list of ranges (of ints) and computes their union.
Then I need to iterate through all the ints in this collection (including the numbers between the range bounds).
Is there a library or implementation available so that I don't have to write everything myself?
You might take a look at this implementation and see if it will fit your needs.
Combine ranges with Range.Coalesce:
var range1 = Range.Create(0, 5, "Range 1");
var range2 = Range.Create(11, 41, "Range 2");
var range3 = Range.Create(34, 50, "Range 3");
var ranges = new List<Range> { range1, range2, range3 };
var unioned = Range.Coalesce(ranges);
Iterate over ranges with .Iterate:
foreach (var range in unioned)
{
foreach (int i in range.Iterate(x => x + 1))
{
Debug.WriteLine(i);
}
}
The simplest thing that comes to my mind is to use Enumerable.Range and then combine the resulting IEnumerables with the standard LINQ operators. Note that the second argument of Enumerable.Range is a count, not an end value. Something like:
var list = Enumerable.Range(1, 5)        // 1..5
    .Concat(Enumerable.Range(7, 11))     // 7..17
    .Concat(Enumerable.Range(13, 22));   // 13..34
foreach (var number in list)
{
    // Do something
}
You can of course use Union and Intersect as well. You can also put your ranges in a List<IEnumerable<int>> or something similar and then iterate over it to produce a single sequence of the elements:
var ranges = new List<IEnumerable<int>>
{
    Enumerable.Range(1, 5),   // 1..5
    Enumerable.Range(7, 11),  // 7..17
    Enumerable.Range(10, 22)  // 10..31
};
var unionOfRanges = Enumerable.Empty<int>();
foreach (var range in ranges)
    unionOfRanges = unionOfRanges.Union(range);
foreach (var item in unionOfRanges)
{
    // Do something
}
The following is a plain LINQ implementation:
var r1 = Enumerable.Range(1,10);
var r2 = Enumerable.Range(20,5);
var r3 = Enumerable.Range(-5,10);
// Union already removes duplicates, so no extra Distinct() is needed
var union = r1.Union(r2).Union(r3);
foreach(var n in union.OrderBy(n=>n))
Console.WriteLine(n);
System.Collections.Generic.HashSet has just the thing:
UnionWith( IEnumerable<T> other ). Modifies the current HashSet object to contain all elements that are present in itself, the specified collection, or both.
IntersectWith( IEnumerable<T> other ). Modifies the current HashSet object to contain only elements that are present in that object and in the specified collection.
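As a rough illustration of the HashSet approach, the sketch below expands each range into the set with UnionWith and sorts only when iterating. The `RangeUnion` class and its tuple-based signature are assumptions made for the example, and every range is assumed to be inclusive with Start <= End:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeUnion
{
    // Expands inclusive (Start, End) ranges into one set of ints
    public static HashSet<int> Union(IEnumerable<(int Start, int End)> ranges)
    {
        var set = new HashSet<int>();
        foreach (var (start, end) in ranges)
        {
            // the second argument of Enumerable.Range is a count, not an end value
            set.UnionWith(Enumerable.Range(start, end - start + 1));
        }
        return set;
    }

    public static void Main()
    {
        var set = Union(new[] { (0, 5), (11, 41), (34, 50) });
        Console.WriteLine(set.Count); // 46

        // HashSet has no defined order, so sort when order matters
        foreach (int i in set.OrderBy(x => x))
            Console.Write(i + " ");
    }
}
```

This stores every individual number, so it suits small ranges; for very wide ranges, keeping only the range bounds is cheaper.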
The data structure you are looking for is called an "interval tree".
You can find different implementations on the net.
For example here's one: http://www.emilstefanov.net/Projects/RangeSearchTree.aspx
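If the ranges only need to be unioned once rather than queried repeatedly, a full interval tree may be more than is needed. A simpler, different technique is to sort the ranges by start and merge neighbours that overlap or touch. The sketch below shows that sort-and-merge approach (it is not an interval tree); the `RangeMerge.Coalesce` name and the inclusive tuple representation are assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeMerge
{
    // Merges overlapping or directly adjacent inclusive ranges.
    public static List<(int Start, int End)> Coalesce(IEnumerable<(int Start, int End)> ranges)
    {
        var merged = new List<(int Start, int End)>();
        foreach (var r in ranges.OrderBy(x => x.Start))
        {
            int last = merged.Count - 1;
            // long arithmetic avoids overflow when End is int.MaxValue
            if (last >= 0 && r.Start <= (long)merged[last].End + 1)
                merged[last] = (merged[last].Start, Math.Max(merged[last].End, r.End));
            else
                merged.Add(r);
        }
        return merged;
    }

    public static void Main()
    {
        var merged = Coalesce(new[] { (0, 5), (34, 50), (11, 41) });
        foreach (var (start, end) in merged)
            Console.WriteLine($"{start}..{end}");
        // 0..5
        // 11..50
    }
}
```

Iterating all ints is then a nested loop over each merged range, and memory use depends on the number of ranges, not on how many numbers they cover.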

How to get duplicate items from a list using LINQ? [duplicate]

This question already has answers here:
C# LINQ find duplicates in List
I have a List<string> like:
List<String> list = new List<String>{"6","1","2","4","6","5","1"};
I need to get the duplicate items in the list into a new list. At the moment I'm using a nested for loop to do this.
The resulting list will contain {"6","1"}.
Is there a way to do this using LINQ or lambda expressions?
var duplicates = list.GroupBy(s => s)
.SelectMany(grp => grp.Skip(1));
Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
Here is one way to do it:
List<String> duplicates = list.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.
Here's another option:
var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));
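This works because HashSet.Add returns false when the value is already in the set. One thing to watch for: the query is deferred and the set keeps its contents, so enumerating the same query a second time reports every element as a duplicate. A small sketch of the pitfall and one way around it (the `HashSetDuplicates.Find` helper is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetDuplicates
{
    // Uses a fresh set per call and materializes the result once
    public static List<T> Find<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        return source.Where(x => !seen.Add(x)).ToList();
    }

    public static void Main()
    {
        var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
        var set = new HashSet<string>();

        // Add returns false when the value is already in the set
        var lazy = list.Where(x => !set.Add(x));

        Console.WriteLine(string.Join(", ", lazy)); // 6, 1
        // second pass: the set is already full, so everything looks like a duplicate
        Console.WriteLine(string.Join(", ", lazy)); // 6, 1, 2, 4, 6, 5, 1

        Console.WriteLine(string.Join(", ", Find(list))); // 6, 1
    }
}
```

A value that occurs three times is returned twice by this approach; apply Distinct to the result if each duplicate should be listed once.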
I know it's not the answer to the original question, but you may find yourself here with this problem.
If you want all of the duplicate items in your results, the following works.
var duplicates = list
.GroupBy( x => x ) // group matching items
.Where( g => g.Skip(1).Any() ) // where the group contains more than one item
.SelectMany( g => g ); // re-expand the groups with more than one item
In my situation I need all duplicates so that I can mark them in the UI as being errors.
I wrote this extension method based on @Lee's response to the OP. Note that a default parameter is used (requiring C# 4.0). However, an overloaded method in C# 3.0 would suffice.
/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable<T></remarks>
public static IEnumerable<T> Duplicates<T>
(this IEnumerable<T> source, bool distinct = true)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
// select the elements that are repeated
IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));
// distinct?
if (distinct == true)
{
// deferred execution helps us here
result = result.Distinct();
}
return result;
}
List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };
var q = from s in list
group s by s into g
where g.Count() > 1
select g.First();
foreach (var item in q)
{
Console.WriteLine(item);
}
Hope this will help
int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
foreach (var d in duplicates)
Console.WriteLine(d);
I was trying to solve the same with a list of objects and was having issues because I was trying to repack the list of groups into the original list. So I came up with looping through the groups to repack the original List with items that have duplicates.
public List<MediaFileInfo> GetDuplicatePictures()
{
List<MediaFileInfo> dupes = new List<MediaFileInfo>();
var grpDupes = from f in _fileRepo
group f by f.Length into grps
where grps.Count() >1
select grps;
foreach (var item in grpDupes)
{
foreach (var thing in item)
{
dupes.Add(thing);
}
}
return dupes;
}
Most of the solutions so far perform a GroupBy, so even if I only need the first duplicate, all elements of the collection are enumerated at least once.
The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.
As always in LINQ there are two versions, one with IEqualityComparer and one without it.
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;
    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        { // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        { // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}
Usage:
IEnumerable<MyClass> myObjects = ...
// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();
// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);
// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
.Where(duplicate => duplicate.Name == "MyName")
.Take(5);
For all these LINQ statements the collection is only enumerated until the requested items are found. The rest of the sequence is never read.
IMHO that is an efficiency boost to consider.
