Trying GroupBy within List in c#

In the case below, I want to count how many times each employee appears in the list. For example, if the list contains EmpA 25 times, I would like to get that count. I am trying GroupBy but not getting results. I could walk through the records and count them by hand, but there are a lot of records.
In the example below, lineEmpNrs is the list, and I want the results grouped by employee ID.
Please suggest.
public static IEnumerable<string> ReadLines(StreamReader input)
{
string line;
while ( (line = input.ReadLine()) != null)
yield return line;
}
private taMibMsftEmpDetails BuildLine(string EmpId, string EmpName, String ExpnsDate)
{
taMibMsftEmpDetails empSlNr = new taMibMsftEmpDetails();
empSlNr.EmployeeId = EmpId;
empSlNr.EmployeeName = EmpName;
empSlNr.ExpenseDate = ExpnsDate;
return empSlNr;
}
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
foreach (string line in ReadLines(HeaderFile))
{
headerFields = line.Split(',');
lineEmpNrs.Add(BuildLine(headerFields[1],headerFields[2],headerFields[3]));
}

You can define the following delegate and use it to select the grouping key from the list elements. It matches any method that accepts one argument and returns a value (the key):
public delegate TResult Func<T, TResult>(T arg);
Then add the following generic method, which converts any list into a dictionary of grouped items:
public static Dictionary<TKey, List<T>> ToDictionary<T, TKey>(
List<T> source, Func<T, TKey> keySelector)
{
Dictionary<TKey, List<T>> result = new Dictionary<TKey, List<T>>();
foreach (T item in source)
{
TKey key = keySelector(item);
if (!result.ContainsKey(key))
result[key] = new List<T>();
result[key].Add(item);
}
return result;
}
Now you can group any list into a dictionary by any property of its items:
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
// we are grouping by EmployeeId here (a string, as assigned in BuildLine)
Func<taMibMsftEmpDetails, string> keySelector =
delegate(taMibMsftEmpDetails emp) { return emp.EmployeeId; };
Dictionary<string, List<taMibMsftEmpDetails>> groupedEmployees =
ToDictionary(lineEmpNrs, keySelector);
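Where LINQ is available (.NET 3.5 or later), the built-in ToLookup method gives much the same grouping without a custom helper. A minimal sketch, assuming a simplified Employee class that stands in for taMibMsftEmpDetails (Employee and GroupingSketch are hypothetical names, not part of the original code):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for taMibMsftEmpDetails.
public class Employee
{
    public string EmployeeId { get; set; }
    public string EmployeeName { get; set; }
}

public static class GroupingSketch
{
    // Groups employees by EmployeeId. lookup[id] yields every record with
    // that id, or an empty sequence if the id does not occur at all.
    public static ILookup<string, Employee> ByEmployeeId(IEnumerable<Employee> employees)
    {
        return employees.ToLookup(e => e.EmployeeId);
    }
}
```

Unlike a Dictionary, indexing the lookup with a missing key does not throw, so lookup["EmpA"].Count() can be used directly as an occurrence count.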

GroupBy should work if you use it like this:
var foo = lineEmpNrs.GroupBy(e => e.EmployeeId);
And if you want an enumerable with all the records for a specific employee ID:
var list = lineEmpNrs.Where(e => e.EmployeeId == "EmpA"); // Or whatever employee ID you want to match
Combining the two should get you the results you're after.

If you want to see how many records there are for each employee, you can use GroupBy:
foreach (var g in lineEmpNrs.GroupBy(e => e.EmployeeId))
{
Console.WriteLine("{0} records with Id '{1}'", g.Count(), g.Key);
}
To find out how many records there are for one specified Id, however, it may be simpler to use Where instead:
Console.WriteLine("{0} records with Id '{1}'", lineEmpNrs.Where(e => e.EmployeeId == id).Count(), id);
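If the counts are needed later rather than printed straight away, they can be collected into a dictionary. The following is a small sketch, not taken from the answers above; CountSketch and CountByKey are hypothetical names, and it assumes that no key is null (ToDictionary rejects null keys):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountSketch
{
    // Maps each distinct key to the number of items that produce it.
    // Assumption: keySelector never returns null.
    public static Dictionary<TKey, int> CountByKey<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        return source.GroupBy(keySelector)
                     .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With the list from the question, this would be called as CountSketch.CountByKey(lineEmpNrs, e => e.EmployeeId), after which counts["EmpA"] holds the number of EmpA records.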

Related

How to Except<> specifying another key? Or a faster way to find the differences between two huge List<>?

I have a list of AE_AlignedPartners items in the db, which I retrieve with:
List<AE_AlignedPartners> ae_alignedPartners_olds = ctx.AE_AlignedPartners.AsNoTracking().ToList();
Then I get and deserialize a new list (of the same object type) from JSON:
List<AE_AlignedPartners> ae_alignedPartners_news = GetJSONPartnersList();
Then I get the intersection of the two:
var IDSIntersections = (from itemNew in ae_alignedPartners_news
join itemOld in ae_alignedPartners_olds on itemNew.ObjectID equals itemOld.ObjectID
select itemNew).Select(p => p.ObjectID).ToList();
Now, based on this intersection, I need to create two new lists: the added items (ae_alignedPartners_news minus the intersection) and the deleted ones (ae_alignedPartners_olds minus the intersection). Here's the code:
// to create
IList<AE_AlignedPartners> ae_alignedPartners_toCreate = ae_alignedPartners_news.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();
// to delete
IList<AE_AlignedPartners> ae_alignedPartners_toDelete = ae_alignedPartners_olds.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();
But with many records (~100k) it takes too much time.
Is there some kind of Except<> that lets me specify which key to compare? In my case it's not p.ID (which is the primary key in the DB), but p.ObjectID.
Or is there any other faster way?

There is an Except overload that you can use with a custom comparer:
class PartnerComparer : IEqualityComparer<AE_AlignedPartners>
{
// Partners are equal if their ObjectIDs are equal.
public bool Equals(AE_AlignedPartners x, AE_AlignedPartners y)
{
// Check whether the partners' ObjectIDs are equal.
return x.ObjectID == y.ObjectID;
}
public int GetHashCode(AE_AlignedPartners ap) {
return ap.ObjectID.GetHashCode();
}
}
var comparer = new PartnerComparer();
// Items only in the new list must be created; items only in the old list must be deleted.
var creates = ae_alignedPartners_news.Except(ae_alignedPartners_olds, comparer);
var deletes = ae_alignedPartners_olds.Except(ae_alignedPartners_news, comparer);
This should give you a reasonable boost in performance.
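The slowness in the question most likely comes from calling Contains on a List for every item, which scans the list each time. A hash set of keys avoids that. Below is a minimal sketch of a key-based difference; KeyDiffSketch and ExceptByKey are hypothetical names introduced here, not framework methods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyDiffSketch
{
    // Returns the items of 'first' whose key does not occur among the keys
    // of 'second'. Unlike Enumerable.Except, duplicates in 'first' are kept.
    public static IEnumerable<T> ExceptByKey<T, TKey>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, TKey> keySelector)
    {
        // Build the key set once; each Contains call is then a hash lookup.
        var secondKeys = new HashSet<TKey>(second.Select(keySelector));
        return first.Where(item => !secondKeys.Contains(keySelector(item)));
    }
}
```

Applied to the question, the two lists would be news.ExceptByKey(olds, p => p.ObjectID) for the items to create and olds.ExceptByKey(news, p => p.ObjectID) for the items to delete.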

You don't need an inner join; you need a full outer join on the primary key. LINQ has no built-in full outer join, but it is easy to extend IEnumerable with one.
From the Stack Overflow question "LINQ full outer join", I took the solution that uses deferred execution. This solution only works if the key selectors produce unique keys.
public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> sequenceA,
IEnumerable<TB> sequenceB,
Func<TA, TKey> keyASelector,
Func<TB, TKey> keyBSelector,
Func<TKey, TA, TB, TResult> resultSelector,
IEqualityComparer<TKey> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<TKey>.Default;
// create two lookup tables:
var aLookup = sequenceA.ToLookup(keyASelector, comparer);
var bLookup = sequenceB.ToLookup(keyBSelector, comparer);
// all used keys:
var aKeys = aLookup.Select(p => p.Key);
var bKeys = bLookup.Select(p => p.Key);
var allUsedKeys = aKeys.Concat(bKeys).Distinct(comparer);
// for every used key:
// get the value from A with this key, or default if it is not a key used by A
// and the value from B with this key, or default if it is not a key used by B
// pass the key and the fetched values to resultSelector
foreach (TKey key in allUsedKeys)
{
TA fetchedA = aLookup[key].FirstOrDefault();
TB fetchedB = bLookup[key].FirstOrDefault();
TResult result = resultSelector(key, fetchedA, fetchedB);
yield return result;
}
}
I use this function to sort the results into three kinds (in the call below, A is the old list and B is the new list):
Values in A but not in B: (old, null) => must be removed
Values in B but not in A: (null, new) => must be added
Values in A and in B: (old, new) => need further inspection to see if an update is needed
IEnumerable<AlignedPartners> olds = ...
IEnumerable<AlignedPartners> news = ...
var joinResult = olds.FullOuterJoin(news, // join old and new
oldItem => oldItem.Id, // from every old take the Id
newItem => newItem.Id, // from every new take the Id
(key, oldItem, newItem) => new // when they match make one new object
{ // containing the following properties
OldItem = oldItem,
NewItem = newItem,
});
Note: until now nothing has been enumerated!
foreach (var joinedItem in joinResult)
{
if (joinedItem.OldItem == null)
{
// we won't have both items null, so we know NewItem is not null
AddItem(joinedItem.NewItem);
}
else if (joinedItem.NewItem == null)
{ // old not null, new equals null
DeleteItem(joinedItem.OldItem);
}
else
{ // both old and new not null, if desired: check if update needed
if (!comparer.Equals(joinedItem.OldItem, joinedItem.NewItem))
{ // changed
UpdateItems(joinedItem.OldItem, joinedItem.NewItem);
}
}
}

How to return a specific item in Distinct using EqualityComparer in C#

I have defined a CustomListComparer which compares List<int> A and List<int> B and, if the union of the two lists equals at least one of the lists, considers them equal.
var distinctLists = MyLists.Distinct(new CustomListComparer()).ToList();
public bool Equals(Frame other)
{
var union = CustomList.Union(other.CustomList).ToList();
return union.SequenceEqual(CustomList) ||
union.SequenceEqual(other.CustomList);
}
For example, the below lists are equal:
ListA = {1,2,3}
ListB = {1,2,3,4}
And the below lists are NOT:
ListA = {1,5}
ListB = {1,2,3,4}
Now all this works fine. But here is my question: which one of the lists (A or B) ends up in distinctLists? Do I have any say in that, or is it all handled by the compiler itself?
What I mean is: say the EqualityComparer considers both lists equal and one of them is added to distinctLists. Which one is added? I want the list with more items to be added.

Distinct always keeps the first element it sees, so the result depends on the order of the sequence you pass in.
The source is fairly simple:
static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer) {
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource element in source)
if (set.Add(element)) yield return element;
}
If you need to return the list with more elements, you need to roll your own. It is worth noting that Distinct is lazy, but the behaviour you're asking for needs an eager implementation.
static class MyDistinctExtensions
{
public static IEnumerable<T> DistinctMaxElements<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer) where T : ICollection
{
Dictionary<T, List<T>> dictionary = new Dictionary<T, List<T>>(comparer);
foreach (var item in source)
{
List<T> list;
if (!dictionary.TryGetValue(item, out list))
{
list = new List<T>();
dictionary.Add(item, list);
}
list.Add(item);
}
foreach (var list in dictionary.Values)
{
yield return list.Select(x => new { List = x, Count = x.Count })
.OrderByDescending(x => x.Count)
.First().List;
}
}
}
Updated the answer with naive implementation, not tested though.
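Because Distinct keeps the first element it sees, another option is to sort the input so that the preferred element comes first. The sketch below assumes that first-wins behaviour (an implementation detail, as shown in the source above) and uses a hypothetical SubsetComparer that mimics the rule from the question; neither name comes from the original post:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are "equal" when one of them contains
// every element of the other (the rule described in the question).
public sealed class SubsetComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        if (x == null || y == null) return x == y;
        return x.All(v => y.Contains(v)) || y.All(v => x.Contains(v));
    }

    // A constant hash forces Distinct to call Equals for every candidate pair.
    public int GetHashCode(List<int> obj) { return 0; }
}

public static class DistinctSketch
{
    // Sorting largest-first means the largest list of each group of "equal"
    // lists is the first one Distinct sees, and therefore the one it keeps.
    public static List<List<int>> DistinctPreferLargest(
        IEnumerable<List<int>> lists, IEqualityComparer<List<int>> comparer)
    {
        return lists.OrderByDescending(l => l.Count)
                    .Distinct(comparer)
                    .ToList();
    }
}
```

The sort makes this eager as well, and because the comparison rule is not transitive, the outcome can still depend on the input order in less tidy cases.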

Instead of Distinct you can use GroupBy with the MaxBy method:
var distinctLists = MyLists.GroupBy(x => x, new CustomListComparer())
.Select(g => g.MaxBy(x => x.Count))
.ToList();
This groups the lists using your comparer and selects the list with the most items from each group.
MaxBy is quite useful in this situation; you can find it in the MoreLINQ library.
Edit: Using pure LINQ:
var distinctLists = MyLists.GroupBy(x => x, new CustomListComparer())
.Select(g => g.First(x => x.Count == g.Max(l => l.Count)))
.ToList();

LINQ: Collapsing a series of strings into a set of "ranges"

I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
"aa009",
"ba023","ba024","ba025",
"bb025",
"ca002","ca003",
"cb004",
...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.

So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
return data.Select(item => new
{
Prefix = item.Substring(0, 2),
Number = int.Parse(item.Substring(2))
})
.GroupBy(item => item.Prefix)
.SelectMany(group => group.OrderBy(item => item.Number)
.GroupWhile((prev, current) =>
prev.Number + 1 == current.Number)
.Select(range =>
RangeAsString(group.Key,
range.First().Number,
range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
And then the simple helper method to convert each range into a string:
private static string RangeAsString(string prefix, int start, int end)
{
if (start == end)
return prefix + start;
else
return string.Format("{0}{1}-{0}{2}", prefix, start, end);
}
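Note that RangeAsString above rebuilds each code from the parsed integer, so "aa002" would come out as "aa2". The following is a variant sketch that keeps the original strings as range endpoints so the zero padding survives. It assumes a two-character prefix followed by a fixed-width, zero-padded number (so that ordinal string order matches numeric order); RangeSketch and Collapse are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeSketch
{
    // Collapses codes such as "aa002","aa003","aa005" into "aa002-aa003,aa005".
    public static string Collapse(IEnumerable<string> codes)
    {
        var parts = new List<string>();
        string first = null, last = null;

        foreach (var code in codes.OrderBy(c => c, StringComparer.Ordinal))
        {
            if (last != null && IsNext(last, code))
            {
                last = code;            // extend the current range
                continue;
            }
            if (first != null) parts.Add(Format(first, last));
            first = last = code;        // start a new range
        }
        if (first != null) parts.Add(Format(first, last));

        return string.Join(",", parts);
    }

    // True when both codes share a prefix and cur's number is prev's plus one.
    private static bool IsNext(string prev, string cur)
    {
        return prev.Substring(0, 2) == cur.Substring(0, 2)
            && int.Parse(prev.Substring(2)) + 1 == int.Parse(cur.Substring(2));
    }

    private static string Format(string first, string last)
    {
        return first == last ? first : first + "-" + last;
    }
}
```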

Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);

IEnumerable.Except() between different classes with a common field

Is it possible to use Except() on two lists that hold two different classes with a common field? I have List<User1> and List<User2> collections. They have different properties apart from the Id column, and I want to find the records that differ between them using this Id column. I'm trying to use List<>.Except() but I'm getting this error:
The type arguments for method 'System.Linq.Enumerable.Except<TSource>(System.Collections.Generic.IEnumerable<TSource>, System.Collections.Generic.IEnumerable<TSource>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
Here's what I'm trying:
List<User1> list1 = List1();
List<User2> list2 = List2();
var listdiff = list1.Except(list2.Select(row => row.Id));
What am I doing wrong?

List1 contains instances of User1 and List2 contains instances of User2.
What type of instance should be produced by list1.Except(list2.Select(row => row.Id))?
In other words if type inference was not available, what would you replace var with?
If User1 and User2 inherit from the same ancestor (with ID), use List<User> instead.
Otherwise:
var list2Lookup = list2.ToLookup(user => user.Id);
var listdiff = list1.Where(user => !list2Lookup.Contains(user.Id));

Not Except, but the correct results and similar performance:
// assumes that the Id property is an Int32
var tempKeys = new HashSet<int>(list2.Select(x => x.Id));
var listdiff = list1.Where(x => tempKeys.Add(x.Id));
And, of course, you can wrap it all up in your own re-usable extension method:
var listdiff = list1.Except(list2, x => x.Id, y => y.Id);
// ...
public static class EnumerableExtensions
{
public static IEnumerable<TFirst> Except<TFirst, TSecond, TKey>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TKey> firstKeySelector,
Func<TSecond, TKey> secondKeySelector)
{
// argument null checking etc omitted for brevity
var keys = new HashSet<TKey>(second.Select(secondKeySelector));
return first.Where(x => keys.Add(firstKeySelector(x)));
}
}

Briefly: treat the lists as sequences of object and use the dynamic feature introduced with C# 4.0 (.NET 4.0).
Example:
var listDiff = list1
.AsEnumerable<object>()
.Except(list2
.AsEnumerable<object>()
.Select(row => ((dynamic)row).ID));

If you just want the Ids in list1 that are not in list2, you can do:
var idsInList1NotInList2 = list1.Select(user1 => user1.Id)
.Except(list2.Select(user2 => user2.Id));
If you need the associated User1 objects too, here's one way (assuming Ids are unique for a User1 object):
// Create lookup from Id to the associated User1 object
var user1sById = list1.ToDictionary(user1 => user1.Id);
// Find Ids from the lookup that are not present for User2s from list2
// and then retrieve their associated User1s from the lookup
var user1sNotInList2 = user1sById.Keys
.Except(list2.Select(user2 => user2.Id))
.Select(key => user1sById[key]);
EDIT: vc74's take on this idea is slightly better; it doesn't require uniqueness.

public static IEnumerable<TSource> Except<TSource, CSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> TSelector, IEnumerable<CSource> csource, Func<CSource, TKey> CSelector)
{
bool EqualFlag = false;
foreach (var s in source)
{
EqualFlag = false;
foreach (var c in csource)
{
var svalue = TSelector(s);
var cvalue = CSelector(c);
if (svalue != null)
{
if (svalue.Equals(cvalue))
{
EqualFlag = true;
break;
}
}
else if (svalue == null && cvalue == null)
{
EqualFlag = true;
break;
}
}
if (EqualFlag)
continue;
else
{
yield return s;
}
}
}

Try
list1.Where(user1 => !list2.Any(user2 => user2.Id.Equals(user1.Id)));
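The nested Any above rescans list2 for every element of list1. A GroupJoin expresses the same "no match on the other side" test while letting LINQ build the key lookup once. This is a minimal sketch, assuming simplified User1 and User2 classes with an int Id (the class shapes and the AntiJoinSketch name are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal versions of the two classes from the question.
public class User1 { public int Id { get; set; } public string Name { get; set; } }
public class User2 { public int Id { get; set; } public string Email { get; set; } }

public static class AntiJoinSketch
{
    // Returns every User1 whose Id has no matching User2.
    public static List<User1> OnlyInFirst(IEnumerable<User1> list1, IEnumerable<User2> list2)
    {
        return list1
            .GroupJoin(list2,
                       u1 => u1.Id,
                       u2 => u2.Id,
                       (u1, matches) => new { User = u1, HasMatch = matches.Any() })
            .Where(x => !x.HasMatch)
            .Select(x => x.User)
            .ToList();
    }
}
```

This keeps duplicates from list1 and does not require the Ids to be unique on either side.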

Get Non-Distinct elements from an IEnumerable

I have a class called Item. Item has an identifier property called ItemCode which is a string. I would like to get a list of all non-distinct Items in a list of Items.
Example:
List<Item> itemList = new List<Item>()
{
new Item("code1", "description1"),
new Item("code2", "description2"),
new Item("code2", "description3"),
};
I want a list containing the bottom two entries.
If I use
var distinctItems = itemList.Distinct();
I get the list of distinct items, which is great, but I want almost the opposite of that. I could subtract the distinct list from the original list, but that wouldn't contain ALL repeats, just one instance of each.
I've had a play and can't figure out an elegant solution. Any pointers or help would be much appreciated. Thanks!
I have 3.5, so LINQ is available.

My take:
var distinctItems =
from list in itemList
group list by list.ItemCode into grouped
where grouped.Count() > 1
select grouped;

as an extension method:
public static IEnumerable<T> NonDistinct<T, TKey> (this IEnumerable<T> source, Func<T, TKey> keySelector)
{
return source.GroupBy(keySelector).Where(g => g.Count() > 1).SelectMany(r => r);
}

You might want to try the group by operator. The idea is to group the items by ItemCode and take the groups with more than one member, something like:
var grouped = from i in itemList
group i by i.ItemCode into g
select new { Code = g.Key, Items = g };
var result = from g in grouped
where g.Items.Count() > 1
select g;

I'd suggest writing a custom extension method, something like this:
static class RepeatedExtension
{
public static IEnumerable<T> Repeated<T>(this IEnumerable<T> source)
{
var distinct = new Dictionary<T, int>();
foreach (var item in source)
{
if (!distinct.ContainsKey(item))
distinct.Add(item, 1);
else
{
if (distinct[item]++ == 1) // only yield items on first repeated occurrence
yield return item;
}
}
}
}
You also need to override the Equals() and GetHashCode() methods in your Item class, so that the Dictionary compares and hashes items by their code.
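For reference, such an override might look like the sketch below. The constructor and property shapes are assumptions based on the new Item("code1", "description1") calls in the question, not the asker's actual class:

```csharp
using System;

// Hypothetical Item class: equality and hashing depend only on ItemCode.
public sealed class Item : IEquatable<Item>
{
    public string ItemCode { get; private set; }
    public string Description { get; private set; }

    public Item(string itemCode, string description)
    {
        ItemCode = itemCode;
        Description = description;
    }

    public bool Equals(Item other)
    {
        return other != null && ItemCode == other.ItemCode;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Item);
    }

    // Must agree with Equals: equal items have to produce equal hash codes.
    public override int GetHashCode()
    {
        return ItemCode == null ? 0 : ItemCode.GetHashCode();
    }
}
```

With this in place, hash-based collections such as Dictionary and HashSet, and LINQ operators like Distinct, treat two items with the same code as the same item even when their descriptions differ.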
