Where LINQ on HashSet vs. List - c#

I need to count the elements of a list/set that have a property with a given value. The list is huge and I need the best performance possible. Should I use a list or a set (given that the elements are unique)? Is there a faster way?
int counter = myList.Where(x => x.A == myValue || x.B == myValue).Count();
This already runs inside AsParallel().ForAll() over another huge list. And no, I can't change that.
Edit
I already saw this question and it definitely does not solve my problem; I am interested in the differences in (P)LINQ queries.

If you are walking a collection in its entirety, walking the entire list is likely to yield better performance than walking the entire set because of the way list elements are allocated in memory (assuming that you are using List<T>, not a linked list).
If you are performing thousands of such queries on the same data in myList, you could get performance improvements by building three look-up tables - on x.A, x.B, and on the common value when x.A == x.B:
var countByA = myList
    .GroupBy(x => x.A)
    .ToDictionary(g => g.Key, g => g.Count());
var countByB = myList
    .GroupBy(x => x.B)
    .ToDictionary(g => g.Key, g => g.Count());
var countByAandB = myList
    .Where(x => x.A == x.B)
    .GroupBy(x => x.A)
    .ToDictionary(g => g.Key, g => g.Count());
Now your query can be converted to three look-ups using the inclusion-exclusion principle:
countByA.TryGetValue(myValue, out var counterA);
countByB.TryGetValue(myValue, out var counterB);
countByAandB.TryGetValue(myValue, out var counterAandB);
int counter = counterA + counterB - counterAandB;
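To see the inclusion-exclusion look-ups agree with the direct scan, here is a minimal runnable sketch. The sample data and the CountMatches helper are assumptions made up for illustration; the real element type is not shown in the question, so named tuples stand in for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: each element has two int properties, A and B.
var myList = new List<(int A, int B)>
{
    (1, 2), (1, 1), (3, 1), (2, 2), (4, 5),
};

// Build the three look-up tables once, up front.
var countByA = myList.GroupBy(x => x.A).ToDictionary(g => g.Key, g => g.Count());
var countByB = myList.GroupBy(x => x.B).ToDictionary(g => g.Key, g => g.Count());
var countByAandB = myList
    .Where(x => x.A == x.B)
    .GroupBy(x => x.A)
    .ToDictionary(g => g.Key, g => g.Count());

// Hypothetical helper: three dictionary look-ups per query.
// A missing key leaves the out variable at its default of 0.
int CountMatches(int myValue)
{
    countByA.TryGetValue(myValue, out var counterA);
    countByB.TryGetValue(myValue, out var counterB);
    countByAandB.TryGetValue(myValue, out var counterAandB);
    return counterA + counterB - counterAandB;
}

// Cross-check against the direct scan from the question.
foreach (var myValue in new[] { 1, 2, 3, 9 })
{
    int direct = myList.Count(x => x.A == myValue || x.B == myValue);
    Console.WriteLine($"{myValue}: lookup={CountMatches(myValue)}, direct={direct}");
}
```

The tables only pay off when the same myList is queried many times; for a single query the plain scan does less work.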

Related

C# Linq copying values into model from different sources

So I start with a List called list and first convert it to a dictionary for faster lookups.
Dictionary<Guid, Int32> dictionaryFromList = list
    .ToDictionary(x => x.Item1, y => y.Item2);
Now I'm loading all the other info into another Dictionary from the cache:
Dictionary<Guid, Model> modelDictionary = _cache
    .Where(x => dictionaryFromList.ContainsKey(x.Id))
    .Select(y => new {y.Id, y })
    .ToDictionary(t => t.Id, t => t.y);
Now I need two different things:
1. I need to insert more data into some of the Models in the modelDictionary
2. I need to insert the Int32 from the dictionaryFromList into the modelDictionary
My approach for 1. was the following:
HashSet<Guid> toLoadIds = new HashSet<Guid>(modelDictionary
    .Where(x => !x.Value.IsLoaded)
    .Select(x => x.Key));
context.myTable
    .Where(x => toLoadIds.Contains(x.Id))
    .Select(x => new {x.value1, x.value2, x.value3, x.value4, x.value5, x.value6 });
As far as I know I have selected the values now, but how do I get them into the right model in the modelDictionary?
For 2. I tried this:
dictionaryFromList.Select(y => modelDictionary[y.Key].myValue = y.Value);
But it seems like nothing is working properly :(
The previous solution for 2. was the following, when modelDictionary was still a List:
modelDictionary.ForEach(x => x.myValue = dictionaryFromList[x.AddressId]);
When you write:
dictionaryFromList.Select(y => modelDictionary[y.Key].myValue = y.Value);
the execution is deferred until someone runs through the returned select iterator. But you do not pick it up, so no-one does that.
You could do:
dictionaryFromList.Select(y => modelDictionary[y.Key].myValue = y.Value)
    .LastOrDefault();
where the final call will force the iteration of the select iterator. But I find that ugly. Why not simply use foreach?
foreach (var y in dictionaryFromList)
{
    modelDictionary[y.Key].myValue = y.Value;
}
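The deferred execution described above can be observed directly. This is a small sketch with made-up dictionaries (source and target are hypothetical stand-ins for dictionaryFromList and modelDictionary, holding plain ints instead of models):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two dictionaries in the question.
var source = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var target = new Dictionary<string, int> { ["a"] = 0, ["b"] = 0 };

// Building the query runs nothing: the lambda has not been called yet.
var query = source.Select(pair => target[pair.Key] = pair.Value);
Console.WriteLine(target["a"]); // prints 0, the assignment has not happened

// A plain foreach performs the assignments immediately and states the intent.
foreach (var entry in source)
{
    target[entry.Key] = entry.Value;
}
Console.WriteLine(target["a"]); // prints 1
```

Select is meant for projecting values, so using it only for side effects tends to hide bugs like this one.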

c# and LINQ - convert IGrouping to List

I have the following code written to find common objects in a list of objects
https://dotnetfiddle.net/gCgNBf
..............................
var query = setOfPersons
    .SelectMany(l => l.Select(l1 => l1))
    .GroupBy(p => p.Id)
    .Where(g => g.Count() == setOfPersons.Count);
After that, I need to convert "query" to a list of "Person" objects (List<Person>) to do something else.
I tried using "ToList()", but it says:
"cannot convert IGrouping to a List".
Can someone help me fix it?
Looking at your code it seems that what you are trying to achieve is to get the list of people that exist in each list. If so, you can use the following query:
var query = setOfPersons
    .SelectMany(l => l.Select(l1 => l1))
    .GroupBy(p => p.Id)
    .Where(g => g.Count() == setOfPersons.Count)
    .Select(x => x.First()) // Select the first person from the grouping - they all are identical
    .ToList();
Console.WriteLine("These people appear in all sets:");
foreach (var a in query)
{
    Console.WriteLine("Id: {0} Name: {1}", a.Id, a.Name);
}
Here you select just a single item from each grouping, because they all are identical.
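A self-contained sketch of the same query on made-up data. The Person class from the fiddle is not shown here, so (Id, Name) tuples are used as a stand-in, and the sample values are invented. Note the assumption that an Id appears at most once in each inner list; otherwise the count comparison would be thrown off.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: three lists of (Id, Name) tuples standing in for Person objects.
var setOfPersons = new List<List<(int Id, string Name)>>
{
    new List<(int Id, string Name)> { (1, "Ann"), (2, "Bob"), (3, "Cid") },
    new List<(int Id, string Name)> { (2, "Bob"), (3, "Cid"), (4, "Dee") },
    new List<(int Id, string Name)> { (3, "Cid"), (2, "Bob") },
};

var common = setOfPersons
    .SelectMany(list => list)                     // flatten all lists into one sequence
    .GroupBy(p => p.Id)                           // one group per distinct Id
    .Where(g => g.Count() == setOfPersons.Count)  // keep Ids present in every list
    .Select(g => g.First())                       // one representative per group
    .ToList();

foreach (var person in common)
{
    Console.WriteLine("Id: {0} Name: {1}", person.Id, person.Name);
}
```

Only Ids 2 and 3 occur in all three lists, so those are the two entries left in common.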

Split list into duplicate and non-duplicate lists

So far I have this:
List<Item> duplicates = items.GroupBy(x => x.Id)
    .SelectMany(g => g.Skip(1)).ToList();
List<Item> nonDuplicates = items.GroupBy(x => x.Id)
    .Select(x => x.First()).ToList();
Is there a more efficient way to do this (i.e. one select)?
Example input:
Id    Value   (Value added for some perspective)
--    -----
1     12
1     909
1231  0
1     577
Example Output:
duplicates -> {1, 909}, {1, 577}
non-duplicates -> {1, 12}, {1231, 0}
If you really want to avoid doing the actual grouping more than once, and thus avoid iterating the source sequence more than once, you can group the items, materialize that query into a list, and then grab the info that you want from that list.
var query = items.GroupBy(x => x.Id)
    .ToList();
var duplicates = query.SelectMany(group => group.Skip(1));
var nonDuplicates = query.Select(group => group.First());
Having said that, grouping items isn't a particularly expensive operation, so this may not actually be a particularly huge win. Odds are reasonably high that your existing code is "good enough".
I'd be mostly interested in doing this if I wasn't confident that the source sequence would return the same items if iterated multiple times, or if it's, say, an IQueryable that needs to do a round trip to the database to get the items. In those cases this is a change worth implementing.
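Here is a runnable sketch of the group-once approach using the example input from the question. The Item class is not shown, so (Id, Value) tuples are assumed as a stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The example input from the question, as (Id, Value) tuples.
var items = new List<(int Id, int Value)>
{
    (1, 12), (1, 909), (1231, 0), (1, 577),
};

// Group once and materialize, so the source is iterated a single time.
var groups = items.GroupBy(x => x.Id).ToList();

// First item of every group, then everything after the first.
var nonDuplicates = groups.Select(g => g.First()).ToList();
var duplicates = groups.SelectMany(g => g.Skip(1)).ToList();

Console.WriteLine("non-duplicates: " + string.Join(", ", nonDuplicates));
Console.WriteLine("duplicates: " + string.Join(", ", duplicates));
```

GroupBy keeps groups in order of first appearance and keeps items within a group in source order, which is why the first item with Id 1 ends up among the non-duplicates.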
Get the first one for each Id, then use Except to get the others.
List<Item> nonDupes = items.GroupBy(x => x.Id).Select(x => x.First()).ToList();
List<Item> dupes = items.Except(nonDupes).ToList();
This is, however, assuming that Equals hasn't been overridden to be simply the Id.
EDIT: And here's a fiddle: http://dotnetfiddle.net/4GaPK4
var result = items.GroupBy(x => x.Id)
    .Select(g => new {
        // Count() is a method call, and Where needs a predicate.
        // Note: every item of a repeated Id lands in Dups, including the first one.
        Dups = g.Where(item => g.Count() > 1),
        NonDups = g.Where(item => g.Count() == 1), })
    .ToList();
Not in one query, but an arguably more efficient approach is to use DistinctBy from MoreLinq:
var nonDuplicates = items.DistinctBy(i => i.Id);
var duplicates = items.Except(nonDuplicates);

GroupBy and OrderBy

I'm trying to do a GroupBy and then OrderBy to a list I have. Here is my code so far:
reportList.GroupBy(x => x.Type).ToDictionary(y=>y.Key, z=>z.OrderBy(a=>a.Lost));
With the help of the last question I asked on LINQ, I think the ToDictionary is probably unneeded, but without it I don't know how to access the inner value.
To be clear, I need to GroupBy the Type property and want the inner groups I get to be ordered by the Lost property (an integer). I want to know if there is a better, more efficient way, or at least one better than what I've done.
An explanation and not just an answer would be very much appreciated.
Yes, there is a better approach. Do not use random names (x, y, z, a) for variables:
reportList.GroupBy(r => r.Type)
    .ToDictionary(g => g.Key, g => g.OrderBy(r => r.Lost));
You can even use long names to make the code more descriptive (depending on the context in which you are creating the query):
reportList.GroupBy(report => report.Type)
    .ToDictionary(group => group.Key,
                  group => group.OrderBy(report => report.Lost));
Your code basically does the following:
1. Groups the elements by type
2. Converts the GroupBy result into a dictionary whose values are IEnumerables coming from a call to OrderBy
As far as correctness goes, the code is perfectly fine IMO, but it can maybe be improved in terms of efficiency (depending on your needs).
In fact, with your code the values of your dictionary are lazily evaluated, so the OrderBy sort runs again each time you enumerate them.
You could probably perform it once and store the result in this way:
var dict = reportList
    .GroupBy(x => x.Type)
    .ToDictionary(y => y.Key, z => z.OrderBy(a => a.Lost).ToList());
// note the ToList call
or in this way:
var dict = reportList.OrderBy(a => a.Lost)
    .GroupBy(x => x.Type)
    .ToDictionary(y => y.Key, z => z);
// here we order then we group,
// since GroupBy guarantees to preserve the original order
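The difference between the lazy and the materialized dictionary values can be made visible by counting how often the sort key is read. This is only a sketch: the sample data and the keyCalls counter are made up for illustration, and tuples stand in for the report type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for reportList.
var reportList = new List<(string Type, int Lost)>
{
    ("A", 3), ("B", 7), ("A", 1), ("B", 2), ("A", 2),
};

int keyCalls = 0; // incremented every time the sort key is read

// Lazy: the dictionary stores unevaluated OrderBy queries.
var lazy = reportList
    .GroupBy(r => r.Type)
    .ToDictionary(g => g.Key, g => g.OrderBy(r => { keyCalls++; return r.Lost; }));

var firstPass = lazy["A"].ToList();
int lazyAfterFirst = keyCalls;
var secondPass = lazy["A"].ToList();
int lazyAfterSecond = keyCalls; // grows again: group "A" was sorted a second time

// Eager: ToList() sorts each group once while the dictionary is built.
keyCalls = 0;
var eager = reportList
    .GroupBy(r => r.Type)
    .ToDictionary(g => g.Key, g => g.OrderBy(r => { keyCalls++; return r.Lost; }).ToList());
int eagerAfterBuild = keyCalls;
var copy = eager["A"].ToList(); // copies the stored list, no re-sort
int eagerAfterRead = keyCalls;  // unchanged

Console.WriteLine($"lazy: {lazyAfterFirst} then {lazyAfterSecond} key reads");
Console.WriteLine($"eager: {eagerAfterBuild} then {eagerAfterRead} key reads");
```

If each group is only enumerated once, the lazy version does no extra work, so the ToList call mainly matters when the values are read repeatedly.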
Looks fine to me. If you use an anonymous type instead of a Dictionary, you could probably improve the readability of the code that uses the results of this query.
reportList.GroupBy(r => r.Type)
    .Select(g => new { Type = g.Key, Reports = g.OrderBy(r => r.Lost) });

Counting grouped data with Linq to Sql

I have a database of documents in an array, each with an owner and a document type, and I'm trying to get a list of the 5 most common document types for a specific user.
var docTypes = _documentRepository.GetAll()
    .Where(x => x.Owner.Id == LoggedInUser.Id)
    .GroupBy(x => x.DocumentType.Id);
This returns all the documents belonging to a specific owner, grouped as I need them. I now need a way to extract the ids of the most common document types. I'm not too familiar with Linq to Sql, so any help would be great.
This orders the groups by count descending and then takes the top 5 of them. You could adapt it to another number, or take out the Take() completely if it's not needed in your case:
var mostCommon = docTypes.OrderByDescending(x => x.Count()).Take(5);
To select just the top document keys:
var mostCommonDocTypes = docTypes.OrderByDescending(x => x.Count())
    .Select(x => x.Key)
    .Take(5);
You can of course also chain this directly onto your original query; it is only separated here for clarity.
Using Select, you can project each grouping to its Key (the Id) and the number of items in the grouping.
var docTypes = _documentRepository.GetAll()
    .Where(x => x.Owner.Id == LoggedInUser.Id)
    .GroupBy(x => x.DocumentType.Id)
    .Select(groupingById =>
        new
        {
            Id = groupingById.Key,
            Count = groupingById.Count(),
        })
    .OrderByDescending(x => x.Count);
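Putting the pieces together, here is a sketch that runs in memory with LINQ to Objects. The documents list and loggedInUserId are hypothetical stand-ins for the repository and LoggedInUser; a LINQ to SQL provider translates the query to SQL instead, so treat this only as an illustration of the shape of the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the repository: (OwnerId, DocumentTypeId) pairs.
var documents = new List<(int OwnerId, int DocumentTypeId)>
{
    (1, 10), (1, 20), (1, 10), (2, 30), (1, 30), (1, 10), (1, 20),
};
int loggedInUserId = 1;

var mostCommon = documents
    .Where(d => d.OwnerId == loggedInUserId)             // only this user's documents
    .GroupBy(d => d.DocumentTypeId)                      // one group per document type
    .Select(g => new { Id = g.Key, Count = g.Count() })  // type id plus its frequency
    .OrderByDescending(x => x.Count)                     // most common first
    .Take(5)                                             // at most five types
    .ToList();

foreach (var entry in mostCommon)
{
    Console.WriteLine($"type {entry.Id}: {entry.Count} document(s)");
}
```

Take(5) simply returns fewer entries when the user has fewer than five document types, as in this sample.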
