Split list into duplicate and non-duplicate lists - c#

So far I have this:
List<Item> duplicates = items.GroupBy(x => x.Id)
.SelectMany(g => g.Skip(1)).ToList();
List<Item> nonDuplicates = items.GroupBy(x => x.Id)
.Select(x => x.First()).ToList();
Is there a more efficient way to do this (i.e. one select)?
Example input:
Id      Value (added for some perspective)
--      -----
1       12
1       909
1231    0
1       577
Example Output:
duplicates -> {1, 909}, {1, 577}
non-duplicates -> {1, 12}, {1231, 0}

If you really want to avoid doing the actual grouping more than once, and thus avoid iterating the source sequence more than once, you can group the items, materialize that query into a list, and then grab the info that you want from that list.
var query = items.GroupBy(x => x.Id)
.ToList();
var duplicates = query.SelectMany(group => group.Skip(1));
var nonDuplicates = query.Select(group => group.First());
Having said that, grouping isn't a particularly expensive operation, so this may not be a huge win. Odds are reasonably high that your existing code is "good enough".
I'd mostly be interested in doing this if I wasn't confident that the source sequence would return the same items when iterated multiple times, or if it's, say, an IQueryable that needs a round trip to the database to get the items. In those cases this change is worth implementing.
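To make the "iterate the source only once" point concrete, here is a small runnable sketch on the question's sample data. The Items() iterator and its enumerations counter are hypothetical, added only to count passes over the source, and a value tuple stands in for the asker's Item class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;

// Hypothetical source: the question's sample rows, exposed through an iterator
// that counts how many times it is enumerated. A value tuple stands in for Item.
IEnumerable<(int Id, int Value)> Items()
{
    enumerations++;
    yield return (1, 12);
    yield return (1, 909);
    yield return (1231, 0);
    yield return (1, 577);
}

// Group once and materialize the groups: this is the only pass over the source.
var groups = Items().GroupBy(x => x.Id).ToList();

// Both results are built from the in-memory groups, not from the source.
var duplicates = groups.SelectMany(g => g.Skip(1)).ToList();
var nonDuplicates = groups.Select(g => g.First()).ToList();

Console.WriteLine($"Source enumerated {enumerations} time(s)"); // Source enumerated 1 time(s)
Console.WriteLine(string.Join(", ", duplicates));               // (1, 909), (1, 577)
Console.WriteLine(string.Join(", ", nonDuplicates));            // (1, 12), (1231, 0)
```

With the original two-query version, the counter would read 2, because each GroupBy enumerates the source again.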

Get the first one for each Id, then use Except to get the others.
List<Item> nonDupes = items.GroupBy(x => x.Id).Select(x => x.First()).ToList();
List<Item> dupes = items.Except(nonDupes).ToList();
This does, however, assume that Equals hasn't been overridden to compare only the Id.
EDIT: And here's a fiddle: http://dotnetfiddle.net/4GaPK4
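A runnable sketch of this approach on the question's sample data, with a value tuple standing in for Item. Because value tuples compare by all of their fields, it also illustrates the Equals caveat: Except drops anything equal to a kept item (and de-duplicates its own output), so an exact repeat of a first item is lost. The "repeated" list is made-up data for that case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<(int Id, int Value)> { (1, 12), (1, 909), (1231, 0), (1, 577) };

// First item per Id.
List<(int Id, int Value)> nonDupes = items.GroupBy(x => x.Id).Select(g => g.First()).ToList();

// Except uses default equality; for a value tuple that means Id AND Value.
List<(int Id, int Value)> dupes = items.Except(nonDupes).ToList();
Console.WriteLine(string.Join(", ", dupes)); // (1, 909), (1, 577)

// Caveat case (made-up data): the second (1, 12) equals a kept item, so Except removes it.
var repeated = new List<(int Id, int Value)> { (1, 12), (1, 12), (2, 5) };
var repeatedFirsts = repeated.GroupBy(x => x.Id).Select(g => g.First()).ToList();
var repeatedDupes = repeated.Except(repeatedFirsts).ToList();
Console.WriteLine(repeatedDupes.Count); // 0, whereas GroupBy + Skip(1) would report one duplicate
```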

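var result = items.GroupBy(x => x.Id)
    .Select(g => new {
        NonDup = g.First(),        // first item for this Id
        Dups = g.Skip(1).ToList()  // every later item with the same Id
    })
    .ToList();
var nonDuplicates = result.Select(r => r.NonDup).ToList();
var duplicates = result.SelectMany(r => r.Dups).ToList();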

Not in one query, but an arguably more efficient approach is to use DistinctBy from MoreLinq:
var nonDuplicates = items.DistinctBy(i => i.Id);
var duplicates = items.Except(nonDuplicates);
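On .NET 6 or later, System.Linq ships its own Enumerable.DistinctBy, so the same idea works without MoreLinq. A minimal sketch on the question's sample data (value tuple standing in for Item); as with any Except-based split, equality here is the tuple's field-by-field equality:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<(int Id, int Value)> { (1, 12), (1, 909), (1231, 0), (1, 577) };

// Built-in DistinctBy (.NET 6+) keeps the first item seen for each key.
var nonDuplicates = items.DistinctBy(i => i.Id).ToList();

// Everything not kept above is a duplicate (subject to the Except/Equals caveat).
var duplicates = items.Except(nonDuplicates).ToList();

Console.WriteLine(string.Join(", ", nonDuplicates)); // (1, 12), (1231, 0)
Console.WriteLine(string.Join(", ", duplicates));    // (1, 909), (1, 577)
```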

Related

C# Linq union multiple properties to one list

Basically I have an object with 2 different properties, both nullable ints (int?), and I want to get one list with all the values from both properties. As of now I have a couple of LINQ queries to do this for me, but I am wondering if this could be simplified somehow:
var componentsWithDynamicApis = result
.Components
.Where(c => c.DynamicApiChoicesId.HasValue ||
c.DynamicApiSubmissionsId.HasValue);
var choiceApis = componentsWithDynamicApis
.Select(c => c.DynamicApiChoicesId.Value);
var submissionApis = componentsWithDynamicApis
.Select(c => c.DynamicApiSubmissionsId.Value);
var dynamicApiIds = choiceApis
.Union(submissionApis)
.Distinct();
Not every component will have both Choices and Submissions.
By simplify, I assume you want to combine into fewer statements. You can also simplify in terms of execution by reducing the number of times you iterate the collection (the current code does it 3 times).
One way is to use a generator function (assuming the type of items in your result.Components collection is Component):
IEnumerable<int> GetIds(IEnumerable<Component> components)
{
foreach (var component in components)
{
if (component.DynamicApiChoicesId.HasValue) yield return component.DynamicApiChoicesId.Value;
if (component.DynamicApiSubmissionsId.HasValue) yield return component.DynamicApiSubmissionsId.Value;
}
}
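A quick way to try the generator: the sketch below uses made-up components, with a value tuple of two int? fields standing in for the Component type, and applies Distinct to the yielded IDs. The whole collection is walked only once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the tuple stands in for the asker's Component type.
var components = new List<(int? DynamicApiChoicesId, int? DynamicApiSubmissionsId)>
{
    (1, 2),
    (3, null),
    (null, 2),
    (null, null)
};

// Single pass over the components, yielding only the IDs that are present.
IEnumerable<int> GetIds(IEnumerable<(int? DynamicApiChoicesId, int? DynamicApiSubmissionsId)> items)
{
    foreach (var c in items)
    {
        if (c.DynamicApiChoicesId.HasValue) yield return c.DynamicApiChoicesId.Value;
        if (c.DynamicApiSubmissionsId.HasValue) yield return c.DynamicApiSubmissionsId.Value;
    }
}

var dynamicApiIds = GetIds(components).Distinct().ToList();
Console.WriteLine(string.Join(", ", dynamicApiIds)); // 1, 2, 3
```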
Another option is to use SelectMany. The trick there is to create a temporary enumerable holding the appropriate values of DynamicApiChoicesId and DynamicApiSubmissionsId. I can't think of a one-liner for this, but here is one option:
var dynamicApiIds = result
.Components
.SelectMany(c => {
var temp = new List<int>();
if (c.DynamicApiChoicesId.HasValue) temp.Add(c.DynamicApiChoicesId.Value);
if (c.DynamicApiSubmissionsId.HasValue) temp.Add(c.DynamicApiSubmissionsId.Value);
return temp;
})
.Distinct();
@Eldar's answer gave me an idea for an improvement on option #2:
var dynamicApiIds = result
.Components
.SelectMany(c => new[] { c.DynamicApiChoicesId, c.DynamicApiSubmissionsId })
.Where(c => c.HasValue)
.Select(c => c.Value)
.Distinct();
Similar to some of the other answers, but I think this covers all your bases with a very minimal amount of code.
var dynamicApiIds = result.Components
.SelectMany(c => new[] { c.DynamicApiChoicesId, c.DynamicApiSubmissionsId}) // combine
.OfType<int>() // remove nulls
.Distinct();
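Why OfType<int>() removes the nulls: OfType tests each element as an object, a non-null int? boxes to a plain int, and a null int? boxes to a null reference, which is never an int. A sketch on made-up data, again with a tuple standing in for Component:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the tuple stands in for the asker's Component type.
var components = new List<(int? DynamicApiChoicesId, int? DynamicApiSubmissionsId)>
{
    (1, 2),
    (3, null),
    (null, 2),
    (null, null)
};

var dynamicApiIds = components
    .SelectMany(c => new[] { c.DynamicApiChoicesId, c.DynamicApiSubmissionsId }) // int? values
    .OfType<int>()   // null int? values are dropped, the rest become int
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", dynamicApiIds)); // 1, 2, 3
```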
To map each element in the source list onto more than one element on the destination list, you can use SelectMany.
var combined = componentsWithDynamicApis
.SelectMany(x => new[] { x.DynamicApiChoicesId.Value, x.DynamicApiSubmissionsId.Value })
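.Distinct();
Note that this assumes every component has both IDs: componentsWithDynamicApis only guarantees at least one of them, and .Value throws an InvalidOperationException on a nullable that has no value.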
I have not tested it, but you can use SelectMany and filter out the null values, like below:
var dynamicApiIds = result
    .Components
    .Select(r => new[] { r.DynamicApiChoicesId, r.DynamicApiSubmissionsId })
    .SelectMany(r => r.Where(p => p != null).Cast<int>())
    .Distinct();

LINQ returns List<{int,double}> two values after .selected but I need List<int> with one value only

I have this assignment. I need to create a method which works with JSON data in this form:
On input N, what are the top N movies? The score of a movie is its average rating.
So I have a JSON file with 5 million movie ratings inside. Each row looks like this:
{ Reviewer:1, Movie:1535440, Grade:1, Date:'2005-08-18'},
{ Reviewer:1, Movie:1666666, Grade:2, Date:'2006-09-20'},
{ Reviewer:2, Movie:1535440, Grade:3, Date:'2008-05-10'},
{ Reviewer:3, Movie:1535440, Grade:5, Date:'2008-05-11'},
This file is deserialized and then stored as an IEnumerable. Then I wanted to create a method that returns a List<int>, where each int is a MovieId. The movies in the list are ordered descending, and the number of "top" movies is specified as a parameter of the method.
My method looks like this:
public List<int> GetSpecificAmountOfBestMovies(int amountOfMovies)
{
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.ToList();
var moviesSortedList = new List<int>();
foreach (var movie in moviesAndAverageGradeSortedList)
{
var key = movie.Key;
moviesSortedList.Add(key);
}
return moviesSortedList;
}
So moviesAndAverageGradeSortedList is a List of an anonymous type { int Key, double Average } because of the .Select call. I cannot return it from this method, which returns List<int>, because I want only the movie IDs, not their average grades.
So I created a new List<int> and a foreach loop that goes through moviesAndAverageGradeSortedList and saves only the Keys.
I think this solution is not right, because the foreach loop could be very slow when I pass a big number as the parameter. Does somebody know how I can get the Keys (movie IDs) from the first list and avoid creating another List<int> and the foreach loop?
I will be thankful for any solution.
You can avoid creating the second list by adding another .Select after the ordering. To make it all a bit cleaner, you could also write:
return _deserializator.RatingCollection()
.GroupBy(i => i.Movie)
.OrderByDescending(g => g.Average(i => i.Grade))
.Select(g => g.Key)
.Take(amountOfMovies)
.ToList();
Note that this won't really improve performance much (if at all), because even in your original implementation the second list is built only from the first n items. The expensive operations are grouping and ordering by each group's average, and those have to run over all items in the JSON file, regardless of how many items you want to return.
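A runnable version of that query against the four sample rows from the question. The in-memory list replaces _deserializator.RatingCollection(), and a tuple without the Date field stands in for the deserialized rating type (both are assumptions for the sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The four sample rows from the question (Date omitted).
var ratings = new List<(int Reviewer, int Movie, int Grade)>
{
    (1, 1535440, 1),
    (1, 1666666, 2),
    (2, 1535440, 3),
    (3, 1535440, 5)
};

List<int> GetSpecificAmountOfBestMovies(int amountOfMovies) =>
    ratings
        .GroupBy(r => r.Movie)
        .OrderByDescending(g => g.Average(r => r.Grade)) // score = average grade per movie
        .Select(g => g.Key)                               // keep only the movie ID
        .Take(amountOfMovies)
        .ToList();

// Movie 1535440 averages (1 + 3 + 5) / 3 = 3.0; movie 1666666 averages 2.0.
Console.WriteLine(string.Join(", ", GetSpecificAmountOfBestMovies(2))); // 1535440, 1666666
```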
You could add another Select after you have ordered the list by average:
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.Select(s=> s.Key)
.ToList();

How to do this in Linq C#

So far, I have this:
var v = Directory.EnumerateFiles(_strConfigurationFolder)
.GroupBy(x => GetReportName(Path.GetFileNameWithoutExtension(x)));
Configuration folder will contain pairs of files:
abc.json
abc-input.json
def.json
def-input.json
The GetReportName() method strips off the "-input" and title-cases the file name, so you end up with a grouping of:
Abc
    abc.json
    abc-input.json
Def
    def.json
    def-input.json
I have a ReportItem class that has a constructor (Name, str1, str2). I want to extend the Linq to create the ReportItems in a single statement, so really something like:
var v = Directory.EnumerateFiles(_strConfigurationFolder)
.GroupBy(x => GetReportName(Path.GetFileNameWithoutExtension(x)))
.Select(x => new ReportItem(x.Key, x[0], x[1]));
Obviously the last line doesn't work, because the grouping doesn't support array indexing like that. The item should be constructed as "Abc", "abc.json", "abc-input.json", etc.
If you know that each group of interest contains exactly two items, use First() to get the item at index 0, and Last() to get the item at index 1:
var v = Directory.EnumerateFiles(_strConfigurationFolder)
.GroupBy(x => GetReportName(Path.GetFileNameWithoutExtension(x)))
.Where(g => g.Count() == 2) // Make sure we have exactly two items
.Select(x => new ReportItem(x.Key, x.First(), x.Last()));
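One thing to watch with First()/Last(): as far as I know, .NET does not guarantee the order in which Directory.EnumerateFiles returns files, so the "-input" file is not necessarily second in its group. A sketch that sorts within each group before picking the two files; the file list and GetReportName below are hypothetical stand-ins for the folder and the asker's helper, and a tuple replaces ReportItem:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for Directory.EnumerateFiles(_strConfigurationFolder), deliberately unordered.
var files = new[] { "def-input.json", "abc.json", "def.json", "abc-input.json", "ghi.json" };

// Hypothetical stand-in for GetReportName: strip "-input", capitalize the first letter.
string GetReportName(string name)
{
    if (name.EndsWith("-input", StringComparison.Ordinal))
        name = name.Substring(0, name.Length - "-input".Length);
    return char.ToUpperInvariant(name[0]) + name.Substring(1);
}

var reports = files
    .GroupBy(f => GetReportName(Path.GetFileNameWithoutExtension(f)))
    .Where(g => g.Count() == 2) // skip incomplete pairs such as ghi.json
    .Select(g =>
    {
        // false sorts before true, so the non-"-input" file comes first.
        var ordered = g.OrderBy(f => Path.GetFileNameWithoutExtension(f)
                                         .EndsWith("-input", StringComparison.Ordinal))
                       .ToList();
        return (Name: g.Key, Main: ordered[0], Input: ordered[1]);
    })
    .ToList();

foreach (var r in reports) Console.WriteLine(r); // (Def, def.json, def-input.json), then (Abc, abc.json, abc-input.json)
```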
var v = Directory.EnumerateFiles(_strConfigurationFolder)
    .GroupBy(x => GetReportName(Path.GetFileNameWithoutExtension(x)))
    .Select(x => new ReportItem(x.Key, x.FirstOrDefault(), x.Skip(1).FirstOrDefault()));
But are you sure there will be exactly two items in each group? Maybe it would make sense for ReportItem to accept an IEnumerable<string> instead of just two strings?

Counting grouped data with Linq to Sql

I have a database of documents in an array, each with an owner and a document type, and I'm trying to get a list of the 5 most common document types for a specific user.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id);
This returns all the documents belonging to a specific owner, grouped as I need them. I now need a way to extract the ids of the most common document types. I'm not too familiar with Linq to Sql, so any help would be great.
This orders the groups by count, descending, and then takes the top 5 of them. You can adapt it to another number, or remove the Take() completely if it's not needed in your case:
var mostCommon = docTypes.OrderByDescending( x => x.Count()).Take(5);
To just select the top document keys:
var mostCommonDocTypes = docTypes.OrderByDescending( x => x.Count())
.Select( x=> x.Key)
.Take(5);
You can also of course combine this with your original query by appending/chaining it, just separated for clarity in this answer.
Using Select, you can get the grouping's Key (the Id) and the number of items in each grouping.
var docTypes = _documentRepository.GetAll()
.Where(x => x.Owner.Id == LoggedInUser.Id)
.GroupBy(x => x.DocumentType.Id)
.Select(groupingById=>
new
{
Id = groupingById.Key,
Count = groupingById.Count(),
})
.OrderByDescending(x => x.Count);
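The same query shape can be tried in memory. In the sketch below, the documents list, OwnerId, DocumentTypeId and loggedInUserId are made-up stand-ins for _documentRepository.GetAll(), x.Owner.Id, x.DocumentType.Id and LoggedInUser.Id; with LINQ to SQL the provider should translate these operators into a single SQL query instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (OwnerId, DocumentTypeId) rows.
var documents = new List<(int OwnerId, int DocumentTypeId)>
{
    (1, 10), (1, 20), (1, 10), (1, 30), (1, 10), (1, 20), (2, 10), (2, 40)
};
int loggedInUserId = 1;

var topTypes = documents
    .Where(d => d.OwnerId == loggedInUserId)
    .GroupBy(d => d.DocumentTypeId)
    .Select(g => new { Id = g.Key, Count = g.Count() }) // one row per document type
    .OrderByDescending(x => x.Count)                    // most common first
    .Take(5)
    .ToList();

foreach (var t in topTypes) Console.WriteLine($"{t.Id}: {t.Count}"); // 10: 3, 20: 2, 30: 1
```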

How can I order a Dictionary in C#?

Edit: Thanks Jason, the fact that it was a dictionary isn't that important. I just wanted the code to have a low runtime. Is that LINQ method fast? Also, I know this is off topic, but what does n => n mean?
I have a list of numbers and I want to make another list with the numbers that appear most at the beginning and the least at the end.
So what I did was go through the list and check whether the number x was in the dictionary. If it wasn't, I added x as a key with the value one. If it was, I increased its value by one.
Now I want to order the dictionary so that I can make a list with the ones that appear the most at the beginning and the least at the end.
How can I do that in C#?
P.S. Runtime is very important.
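A sketch of the counting approach described in the question, on a made-up list: one pass to count (TryGetValue leaves the count at 0 for a key that isn't there yet), then a sort over the distinct values only. For n numbers with k distinct values, that is roughly O(n + k log k):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 4, 1, 4, 2, 4, 1 }; // hypothetical input

// Count occurrences in a single pass.
var counts = new Dictionary<int, int>();
foreach (var x in numbers)
{
    counts.TryGetValue(x, out var c); // c is 0 when x is not yet a key
    counts[x] = c + 1;
}

// Most frequent number first, least frequent last.
var ordered = counts.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();

Console.WriteLine(string.Join(", ", ordered)); // 4, 1, 2
```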
So it sounds like you have a Dictionary<int, int> where each key is an integer from your list and the corresponding value is the number of times that integer appeared. You want to order the keys by their counts, most frequent first. Then you can say:
// dict is Dictionary<int, int>
var ordered = dict.Keys.OrderByDescending(k => dict[k]).ToList();
Now, it sounds like you started with a List<int> which are the values that you want to count and order by count. You can do this very quickly in LINQ like so:
// list is IEnumerable<int> (e.g., List<int>)
var ordered = list.GroupBy(n => n)
.OrderByDescending(g => g.Count())
.Select(g => g.Key)
.ToList();
Or in query syntax
var ordered = (from n in list
group n by n into g
orderby g.Count() descending
select g.Key).ToList();
Now, if you need to have the intermediate dictionary you can say
var dict = list.GroupBy(n => n)
.ToDictionary(g => g.Key, g => g.Count());
var ordered = dict.Keys.OrderByDescending(k => dict[k]).ToList();
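Use the GroupBy extension method on IEnumerable<T> to group the numbers and extract the count of each. This creates the dictionary from the list and orders it in one statement.
var ordered = list.GroupBy(l => l)
    .OrderByDescending(g => g.Count())
    .ToDictionary(g => g.Key, g => g.Count());
Be aware that Dictionary<TKey, TValue> does not guarantee an enumeration order, so if the order matters, end with .Select(g => g.Key).ToList() instead of ToDictionary.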
You may also consider using SortedDictionary.
It keeps its items sorted by key as they are inserted (by key, not by value, so on its own it won't order by count).
List<KeyValuePair<int, int>> listEquivalent =
    new List<KeyValuePair<int, int>>(dictionary);
listEquivalent.Sort((first, second) =>
{
    // Compare second to first so the highest count comes first.
    return second.Value.CompareTo(first.Value);
});
Something like that maybe?
edit: Thanks Jason for the notice on my omission
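A self-contained sketch of the copy-and-sort idea with concrete types, assuming a Dictionary<int, int> of counts (the sample counts are made up). Comparing second to first gives descending order by count; note that List<T>.Sort is not stable, so ties may come out in any order:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical counts: number -> occurrences.
var dictionary = new Dictionary<int, int> { [4] = 3, [1] = 2, [2] = 1, [7] = 5 };

// List<T> has a constructor that copies any IEnumerable<T>, including a dictionary's pairs.
var listEquivalent = new List<KeyValuePair<int, int>>(dictionary);

// Descending by count.
listEquivalent.Sort((first, second) => second.Value.CompareTo(first.Value));

foreach (var kv in listEquivalent) Console.WriteLine($"{kv.Key}: {kv.Value}"); // 7: 5, 4: 3, 1: 2, 2: 1
```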
