Get Non-Distinct elements from an IEnumerable - c#

I have a class called Item. Item has an identifier property called ItemCode which is a string. I would like to get a list of all non-distinct Items in a list of Items.
Example:
List<Item> itemList = new List<Item>()
{
new Item("code1", "description1"),
new Item("code2", "description2"),
new Item("code2", "description3"),
};
I want a list containing the bottom two entries.
If I use
var distinctItems = itemList.Distinct();
I get the list of distinct items, which is great, but I want almost the opposite of that. I could subtract the distinct list from the original list, but that wouldn't contain ALL repeats, just one instance of each.
I've had a play and can't figure out an elegant solution. Any pointers or help would be much appreciated. Thanks!
I have .NET 3.5, so LINQ is available.

My take:
var distinctItems =
    from list in itemList
    group list by list.ItemCode into grouped
    where grouped.Count() > 1
    select grouped;

as an extension method:
public static IEnumerable<T> NonDistinct<T, TKey> (this IEnumerable<T> source, Func<T, TKey> keySelector)
{
return source.GroupBy(keySelector).Where(g => g.Count() > 1).SelectMany(r => r);
}

You might want to try the group by operator. The idea is to group the items by ItemCode and take the groups with more than one member, something like:
var grouped = from i in itemList
              group i by i.ItemCode into g
              select new { Code = g.Key, Items = g };
var result = from g in grouped
             where g.Items.Count() > 1
             select g;

I'd suggest writing a custom extension method, something like this:
static class RepeatedExtension
{
    public static IEnumerable<T> Repeated<T>(this IEnumerable<T> source)
    {
        var distinct = new Dictionary<T, int>();
        foreach (var item in source)
        {
            if (!distinct.ContainsKey(item))
                distinct.Add(item, 1);
            else
            {
                if (distinct[item]++ == 1) // only yield items on first repeated occurrence
                    yield return item;
            }
        }
    }
}
You also need to override the Equals() and GetHashCode() methods in your Item class so that items are compared by their code; the Dictionary relies on both.
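A minimal sketch of the override mentioned above. The question shows the constructor call but not the class, so the shape of Item here (two string properties set by a two-argument constructor) is an assumption. GetHashCode is overridden alongside Equals because Dictionary locates keys by hash code before it calls Equals.

```csharp
using System;

// Assumed shape of the question's Item class; only ItemCode takes part in equality.
public class Item
{
    public string ItemCode { get; private set; }
    public string Description { get; private set; }

    public Item(string itemCode, string description)
    {
        ItemCode = itemCode;
        Description = description;
    }

    // Two items are equal when their codes match, whatever the description.
    public override bool Equals(object obj)
    {
        Item other = obj as Item;
        return other != null && string.Equals(ItemCode, other.ItemCode, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal items have to produce equal hash codes.
    public override int GetHashCode()
    {
        return ItemCode == null ? 0 : ItemCode.GetHashCode();
    }
}
```

With these overrides, Repeated() treats the two "code2" items as the same key and yields the second of them.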

Related

How to return a specific item in Distinct using EqualityComparer in C#

I have defined a CustomListComparer which compares List<int> A and List<int> B and considers them equal if the union of the two lists equals at least one of them.
var distinctLists = MyLists.Distinct(new CustomListComparer()).ToList();
public bool Equals(Frame other)
{
var union = CustomList.Union(other.CustomList).ToList();
return union.SequenceEqual(CustomList) ||
union.SequenceEqual(other.CustomList);
}
For example, the below lists are equal:
ListA = {1,2,3}
ListB = {1,2,3,4}
And the below lists are NOT:
ListA = {1,5}
ListB = {1,2,3,4}
Now all this works fine. But here is my question: which one of the lists (A or B) gets into distinctLists? Do I have any say in that? Or is it all handled by the compiler itself?
What I mean is: say the EqualityComparer considers both lists equal and Distinct adds one of them to distinctLists. Which one does it add? I want the list with more items to be added.
Distinct always adds the first element it sees, so the result depends on the order of the sequence you pass in.
The source is fairly simple:
static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer) {
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource element in source)
if (set.Add(element)) yield return element;
}
If you need to return the list with more elements, you need to roll your own. It is worth noting that Distinct is lazy, but the behaviour you're asking for needs an eager implementation.
static class MyDistinctExtensions
{
public static IEnumerable<T> DistinctMaxElements<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer) where T : ICollection
{
Dictionary<T, List<T>> dictionary = new Dictionary<T, List<T>>(comparer);
foreach (var item in source)
{
List<T> list;
if (!dictionary.TryGetValue(item, out list))
{
list = new List<T>();
dictionary.Add(item, list);
}
list.Add(item);
}
foreach (var list in dictionary.Values)
{
yield return list.Select(x => new { List = x, Count = x.Count })
.OrderByDescending(x => x.Count)
.First().List;
}
}
}
Updated the answer with a naive implementation; not tested though.
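The first-seen behaviour described in this answer can be checked with a built-in comparer. This is a small sketch, not part of the original answer: `StringComparer.OrdinalIgnoreCase` stands in for the custom comparer, and the `DistinctOrderDemo` class and its method names are made up for illustration. The second method shows a possible shortcut that relies on that same behaviour: sort so the preferred element comes first, then call Distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctOrderDemo
{
    // Distinct yields the first element it encounters from each set of equal elements.
    public static List<string> KeepFirst(IEnumerable<string> words)
    {
        return words.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    }

    // Sorting by a preference (largest first) before Distinct makes the preferred element the survivor.
    public static List<T> DistinctPreferring<T, TKey>(
        IEnumerable<T> source,
        Func<T, TKey> preference,
        IEqualityComparer<T> comparer)
    {
        return source.OrderByDescending(preference).Distinct(comparer).ToList();
    }
}
```

For the question's lists the equivalent call would be MyLists.OrderByDescending(l => l.Count).Distinct(new CustomListComparer()), assuming the comparer's GetHashCode returns the same value for lists it considers equal.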
Instead of Distinct you can use GroupBy with the MaxBy method:
var distinctLists = MyLists.GroupBy(x => x, new CustomListComparer())
.Select(g => g.MaxBy(x => x.Count))
.ToList();
This will group the lists using your comparer and select the list with the most items from each group.
MaxBy is quite useful in this situation, you can find it in MoreLINQ library.
Edit: Using pure LINQ:
var distinctLists = MyLists.GroupBy(x => x, new CustomListComparer())
.Select(g => g.First(x => x.Count == g.Max(l => l.Count)))
.ToList();

Trying GroupBy within List in c#

In the case below, I want to count how many times each employee is repeated. For example, if the list has EmpA 25 times, I would like to get that count. I am trying GroupBy but not getting results. I could skip through the records and count them, but there are a lot of records.
So in the example below, lineEmpNrs is the list and I want the results grouped by employee ID.
Please suggest.
public static IEnumerable<string> ReadLines(StreamReader input)
{
string line;
while ( (line = input.ReadLine()) != null)
yield return line;
}
private taMibMsftEmpDetails BuildLine(string EmpId, string EmpName, String ExpnsDate)
{
taMibMsftEmpDetails empSlNr = new taMibMsftEmpDetails();
empSlNr.EmployeeId = EmpId;
empSlNr.EmployeeName = EmpName;
empSlNr.ExpenseDate = ExpnsDate;
return empSlNr;
}
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
foreach (string line in ReadLines(HeaderFile))
{
headerFields = line.Split(',');
lineEmpNrs.Add(BuildLine(headerFields[1],headerFields[2],headerFields[3]));
}
You can define the following delegate, which you will use to select the grouping key from list elements. It matches any method that accepts one argument and returns a value (the key):
public delegate TResult Func<T, TResult>(T arg);
And the following generic method, which converts any list to a dictionary of grouped items:
public static Dictionary<TKey, List<T>> ToDictionary<T, TKey>(
List<T> source, Func<T, TKey> keySelector)
{
Dictionary<TKey, List<T>> result = new Dictionary<TKey, List<T>>();
foreach (T item in source)
{
TKey key = keySelector(item);
if (!result.ContainsKey(key))
result[key] = new List<T>();
result[key].Add(item);
}
return result;
}
Now you will be able to group any list into dictionary by any property of list items:
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
// we are grouping by EmployeeId here
Func<taMibMsftEmpDetails, int> keySelector =
delegate(taMibMsftEmpDetails emp) { return emp.EmployeeId; };
Dictionary<int, List<taMibMsftEmpDetails>> groupedEmployees =
ToDictionary(lineEmpNrs, keySelector);
GroupBy should work if you use it like this:
var foo = lineEmpNrs.GroupBy(e => e.Id);
And if you'd want to get an enumerable with all the employees of the specified ID:
var list = lineEmpNrs.Where(e => e.Id == 1); // Or whatever employee ID you want to match
Combining the two should get you the results you're after.
If you wanted to see how many records there were with each employee, you can use GroupBy as:
foreach (var g in lineEmpNrs.GroupBy(e => e.Id))
{
Console.WriteLine("{0} records with Id '{1}'", g.Count(), g.Key);
}
To simply find out how many records there are for a specified Id, however, it may be simpler to use Where instead:
Console.WriteLine("{0} records with Id '{1}'", lineEmpNrs.Where(e => e.Id == id).Count(), id);

Use Group By in order to remove duplicates

I am looking for a simple way of removing duplicates without having to implement IComparable, override GetHashCode, etc.
I think this can be achieved with LINQ. I have the class:
class Person
{
    public string Name;
    public int Age;
}
I have a list of about 500 people: List<Person> someList = new List<Person>();
Now I want to remove people with the same name, and if there is a duplicate I want to keep the person with the greater age. In other words, if I have the list:
Name----Age---
Tom, 24 |
Alicia, 22 |
Alicia, 12 |
I would like to end up with:
Name----Age---
Tom, 24 |
Alicia, 22 |
How can I do this with a query? My list is not that long, so I don't want to create a hash set or implement the IComparable interface. It would be nice if I could do this with a LINQ query.
I think this can be done with the GroupBy extension method by doing something like:
var people = // the list of Person
people.GroupBy(x => x.Name).Where(x => x.Count() > 1)
... // select the person that has the greatest age...
people
.GroupBy(p => p.Name)
.Select(g => g.OrderByDescending(p => p.Age).First())
This will work across different Linq providers. If this is just Linq2Objects, and speed is important (usually, it isn't) consider using one of the many MaxBy extensions found on the web (here's Skeet's) and replacing
g.OrderByDescending(p => p.Age).First()
with
g.MaxBy(p => p.Age)
This can be trivially easy so long as you first create a helper function MaxBy that selects the item from a sequence whose selected key is largest. Unfortunately the Max function in LINQ won't work here, as we want the item from the sequence, not the selected value.
var distinctPeople = people.GroupBy(person => person.Name)
.Select(group => group.MaxBy(person => person.Age));
And then the implementation of MaxBy:
public static TSource MaxBy<TSource, TKey>(this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IComparer<TKey> comparer = null)
{
comparer = comparer ?? Comparer<TKey>.Default;
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
throw new ArgumentException("Source must have at least one item");
var maxItem = iterator.Current;
var maxKey = keySelector(maxItem);
while (iterator.MoveNext())
{
var nextKey = keySelector(iterator.Current);
if (comparer.Compare(nextKey, maxKey) > 0)
{
maxItem = iterator.Current;
maxKey = nextKey;
}
}
return maxItem;
}
}
Note that while you can achieve the same result by sorting the sequence and then taking the first item, doing so is less efficient in general than doing just one pass with a max function.
I prefer to keep it simple:
var retPeople = new List<Person>();
foreach (var p in people)
{
    if (!retPeople.Contains(p))
    {
        retPeople.Add(p);
    }
}
This requires Person to implement IEquatable<Person> (or override Equals), because Contains compares items with Equals.
I got rid of my last answer because I realized it was too slow and too complicated. Here is a solution that makes a little more sense:
var peoplewithLargestAgeByName =
    from p in people
    orderby p.Age descending
    group p by p.Name into peopleByName
    select peopleByName.First();
This is the same solution that @spender contributed, but in LINQ query syntax.

How to get duplicate items from a list using LINQ? [duplicate]

I have a List<string> like:
List<String> list = new List<String>{"6","1","2","4","6","5","1"};
I need to get the duplicate items in the list into a new list. Right now I'm using a nested for loop to do this.
The resulting list should contain {"6","1"}.
Is there a way to do this using LINQ or lambda expressions?
var duplicates = list.GroupBy(s => s)
                     .SelectMany(grp => grp.Skip(1));
Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
Here is one way to do it:
List<String> duplicates = list.GroupBy(x => x)
                              .Where(g => g.Count() > 1)
                              .Select(g => g.Key)
                              .ToList();
The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.
Here's another option:
var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));
I know it's not the answer to the original question, but you may find yourself here with this problem.
If you want all of the duplicate items in your results, the following works.
var duplicates = list
.GroupBy( x => x ) // group matching items
.Where( g => g.Skip(1).Any() ) // where the group contains more than one item
.SelectMany( g => g ); // re-expand the groups with more than one item
In my situation I need all duplicates so that I can mark them in the UI as being errors.
I wrote this extension method based on @Lee's response to the OP. Note that a default parameter is used (requiring C# 4.0); in C# 3.0, an overloaded method would do instead.
/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable<T></remarks>
public static IEnumerable<T> Duplicates<T>
(this IEnumerable<T> source, bool distinct = true)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
// select the elements that are repeated
IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));
// distinct?
if (distinct == true)
{
// deferred execution helps us here
result = result.Distinct();
}
return result;
}
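For C# 3.0, which lacks optional parameters, the overload approach mentioned above might look like the following sketch. The class name `DuplicateExtensions` is an assumption; the one-argument overload simply forwards `true`, matching the default value used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // Overload standing in for the default parameter value (distinct = true).
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source)
    {
        return source.Duplicates(true);
    }

    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source, bool distinct)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        // Skip(1) drops the first member of each group, leaving only the repeats.
        IEnumerable<T> result = source.GroupBy(a => a).SelectMany(g => g.Skip(1));
        return distinct ? result.Distinct() : result;
    }
}
```

Calling Duplicates(false) keeps every repeat, so an element that occurs three times appears twice in the result.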
List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };
var q = from s in list
group s by s into g
where g.Count() > 1
select g.First();
foreach (var item in q)
{
Console.WriteLine(item);
}
Hope this will help
int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
foreach (var d in duplicates)
Console.WriteLine(d);
I was trying to solve the same with a list of objects and was having issues because I was trying to repack the list of groups into the original list. So I came up with looping through the groups to repack the original List with items that have duplicates.
public List<MediaFileInfo> GetDuplicatePictures()
{
List<MediaFileInfo> dupes = new List<MediaFileInfo>();
var grpDupes = from f in _fileRepo
group f by f.Length into grps
where grps.Count() >1
select grps;
foreach (var item in grpDupes)
{
foreach (var thing in item)
{
dupes.Add(thing);
}
}
return dupes;
}
All the solutions mentioned so far perform a GroupBy. Even if I only need the first duplicate, all elements of the collection are enumerated at least once.
The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.
As always in LINQ there are two versions, one with IEqualityComparer and one without it.
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;
    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        { // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        { // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}
Usage:
IEnumerable<MyClass> myObjects = ...
// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();
// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);
// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
.Where(duplicate => duplicate.Name == "MyName")
.Take(5);
For all these LINQ statements the collection is only enumerated until the requested items are found. The rest of the sequence is not evaluated.
IMHO that is an efficiency boost to consider.

Overlay/Join two collections with Linq

I have the following scenario:
List 1 has 20 items of type TItem, List 2 has 5 items of the same type. List 1 already contains the items from List 2 but in a different state. I want to overwrite the 5 items in List 1 with the items from List 2.
I thought a join might work, but I want to overwrite the items in List 1, not join them together and have duplicates.
There is a unique key of type int that can be used to find which items to overwrite in List 1.
You could use the built-in LINQ .Except(), but it wants an IEqualityComparer, so use a version of .Except() that takes a key selector instead (a custom extension method, not part of System.Linq).
Assuming an object with an integer key as you indicated:
public class Item
{
public int Key { get; set; }
public int Value { get; set; }
public override string ToString()
{
return String.Format("{{{0}:{1}}}", Key, Value);
}
}
The original list of objects can be merged with the changed one as follows:
IEnumerable<Item> original = new[] { 1, 2, 3, 4, 5 }.Select(x => new Item
{
Key = x,
Value = x
});
IEnumerable<Item> changed = new[] { 2, 3, 5 }.Select(x => new Item
{
Key = x,
Value = x * x
});
IEnumerable<Item> result = original.Except(changed, x => x.Key).Concat(changed);
result.ForEach(Console.WriteLine);
output:
{1:1}
{4:4}
{2:4}
{3:9}
{5:25}
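The call Except(changed, x => x.Key) used above is not one of the overloads that ship with System.Linq, so a helper along these lines is needed. This is only a sketch under that assumption: the name `ExceptByKey` and the class `KeyedExceptExtensions` are hypothetical, and unlike the built-in Except it does not remove duplicates from the first sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedExceptExtensions
{
    // Yields the elements of first whose key does not occur among the keys of second.
    public static IEnumerable<T> ExceptByKey<T, TKey>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, TKey> keySelector)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (keySelector == null) throw new ArgumentNullException("keySelector");

        // The keys to exclude are collected once, up front; the filter itself stays lazy.
        var excludedKeys = new HashSet<TKey>(second.Select(keySelector));
        return first.Where(item => !excludedKeys.Contains(keySelector(item)));
    }
}
```

With it, original.ExceptByKey(changed, x => x.Key).Concat(changed) produces the keys in the order 1, 4, 2, 3, 5, as in the output above. A plain foreach over the result can take the place of the ForEach call, which is likewise not a built-in method on IEnumerable.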
LINQ isn't used to perform actual modifications to the underlying data sources; it's strictly a query language. You could, of course, do an outer join on List2 from List1 and select List2's entity if it's not null and List1's entity if it is, but that is going to give you an IEnumerable<> of the results; it won't actually modify the collection. You could do a ToList() on the result and assign it to List1, but that would change the reference; I don't know if that would affect the rest of your application.
Taking your question literally, in that you want to REPLACE the items in List1 with those from List2 if they exist, then you'll have to do that manually in a for loop over List1, checking for the existence of a corresponding entry in List2 and replacing the List1 entry by index with that from List2.
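The manual loop described above could look like this sketch. `ReplaceByKey`, `ListOverlay` and the parameter names are hypothetical. The replacements are indexed in a dictionary first so that each lookup does not rescan List2; if a key repeats in the replacement list, the last entry wins.

```csharp
using System;
using System.Collections.Generic;

public static class ListOverlay
{
    // Overwrites, in place, every element of target that has a same-keyed element in replacements.
    public static void ReplaceByKey<T, TKey>(
        IList<T> target,
        IEnumerable<T> replacements,
        Func<T, TKey> keySelector)
    {
        var byKey = new Dictionary<TKey, T>();
        foreach (T replacement in replacements)
        {
            byKey[keySelector(replacement)] = replacement; // a later duplicate key overwrites an earlier one
        }

        for (int i = 0; i < target.Count; i++)
        {
            T replacement;
            if (byKey.TryGetValue(keySelector(target[i]), out replacement))
            {
                target[i] = replacement; // same index, so the order and the list reference are preserved
            }
        }
    }
}
```

Because the list is modified through its indexer, any other code holding a reference to List1 sees the updated items.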
As Adam says, LINQ is about querying. However, you can create a new collection in the right way using Enumerable.Union. You'd need to create an appropriate IEqualityComparer though - it would be nice to have UnionBy. (Another one for MoreLINQ perhaps?)
Basically:
var list3 = list2.Union(list1, keyComparer);
Where keyComparer would be an implementation to compare the two keys. MiscUtil contains a ProjectionEqualityComparer which would make this slightly easier.
Alternatively, you could use DistinctBy from MoreLINQ after concatenation:
var list3 = list2.Concat(list1).DistinctBy(item => item.Key);
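A hand-rolled keyComparer for the Union call above might look like the sketch below. `KeyEqualityComparer` is a hypothetical name (it is not the MiscUtil class), and it treats two items as equal when their projected keys are equal. The `Overlay` helper is likewise only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares items by a projected key instead of by the items themselves.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        this.keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return keyComparer.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = keySelector(obj);
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

public static class OverlayExample
{
    // Union yields the elements of updated first, so on a key clash the updated item is kept.
    public static List<T> Overlay<T, TKey>(
        IEnumerable<T> updated,
        IEnumerable<T> original,
        Func<T, TKey> keySelector)
    {
        return updated.Union(original, new KeyEqualityComparer<T, TKey>(keySelector)).ToList();
    }
}
```

Note that the result starts with the updated items followed by the untouched originals, so the original ordering of List 1 is not preserved.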
Here's a solution with GroupJoin.
List<string> source = new List<string>() { "1", "22", "333" };
List<string> modifications = new List<string>() { "4", "555"};
//alternate implementation
//List<string> result = source.GroupJoin(
// modifications,
// s => s.Length,
// m => m.Length,
// (s, g) => g.Any() ? g.First() : s
//).ToList();
List<string> result =
(
from s in source
join m in modifications
on s.Length equals m.Length into g
select g.Any() ? g.First() : s
).ToList();
foreach (string s in result)
Console.WriteLine(s);
Hmm, how about a re-usable extension method while I'm at it:
public static IEnumerable<T> UnionBy<T, U>
(
this IEnumerable<T> source,
IEnumerable<T> otherSource,
Func<T, U> selector
)
{
return source.GroupJoin(
otherSource,
selector,
selector,
(s, g) => g.Any() ? g.First() : s
);
}
Which is called by:
List<string> result = source
.UnionBy(modifications, s => s.Length)
.ToList();
