Selecting unique values of different columns using LINQ

I have a table (orders, for example) that has multiple columns:
products  categories  subcategories
--------------------------------------
prod1     cat1        sub1
prod1     cat2        sub2
prod2     cat3        sub6
prod1     cat1        sub1
prod5     cat2        sub8
prod2     cat1        sub1
prod1     cat7        sub3
prod8     cat2        sub2
prod2     cat3        sub1
I can write three separate queries to get the distinct values:
var prod = (from p in _context.orders select p.products).ToList().Distinct();
and similarly for the other columns.
But I need to get the distinct values of each column in a single query, with a result that looks like this:
products  categories  subcategories
--------------------------------------
prod1     cat1        sub1
prod2     cat2        sub2
prod5     cat3        sub6
prod8     cat7        sub8
                      sub3
My class for the unique fields looks like this:
public class UniqueProductFields
{
public IEnumerable<string> Products { get; set; }
public IEnumerable<string> Categories { get; set; }
public IEnumerable<string> Subcategories { get; set; }
}
I'm not sure how to do this efficiently without writing three methods. The table is in the database (hence the need for optimization).
Thanks!

Is it an absolutely unchangeable requirement to use Linq? Why do you need it to be returned in a single query?
Suggestion: use SQL. It can be done in a single query, but you won't like the query. I'm assuming SQL Server (it can be done differently on other DBMSes).
WITH V AS (
SELECT DISTINCT
V.*
FROM
Orders O
CROSS APPLY (
VALUES (1, O.Products), (2, O.Categories), (3, O.Subcategories)
) V (Which, Value)
),
Nums AS (
SELECT
Num = Row_Number() OVER (PARTITION BY V.Which ORDER BY V.Value),
V.Which,
V.Value
FROM
V
)
SELECT
Products = P.[1],
Categories = P.[2],
Subcategories = P.[3]
FROM
Nums N
PIVOT (Max(N.Value) FOR N.Which IN ([1], [2], [3])) P
;
See this working at db<>fiddle
Output:
Products Categories Subcategories
-------- ---------- -------------
prod1    cat1       sub1
prod2    cat2       sub2
prod5    cat3       sub3
prod8    cat7       sub6
null     null       sub8
If you are bound and determined to use LINQ, well, I can't help you with the query-style syntax; I only know the method (fluent) syntax, but here's a stab at that. Unfortunately, I don't think this will do you much good, because I had to use some pretty funky constructs to make it work. It uses essentially the same technique as the SQL query above, except that LINQ has no equivalent of PIVOT and there's no natural row object other than a custom class.
using System;
using System.Collections.Generic;
using System.Linq;

public class Program {
    public static void Main() {
        var data = new List<Order> {
            new Order("prod1", "cat1", "sub1"),
            new Order("prod1", "cat2", "sub2"),
            new Order("prod2", "cat3", "sub6"),
            new Order("prod1", "cat1", "sub1"),
            new Order("prod5", "cat2", "sub8"),
            new Order("prod2", "cat1", "sub1"),
            new Order("prod1", "cat7", "sub3"),
            new Order("prod8", "cat2", "sub2"),
            new Order("prod2", "cat3", "sub1")
        };
        // "Unpivot": turn every order into (column number, value) pairs,
        // drop duplicate pairs, then collect the values of each column into its own list.
        var items = data
            .SelectMany(o => new List<KeyValuePair<int, string>> {
                new KeyValuePair<int, string>(1, o.Products),
                new KeyValuePair<int, string>(2, o.Categories),
                new KeyValuePair<int, string>(3, o.Subcategories)
            })
            .Distinct()
            .GroupBy(d => d.Key)
            .Select(g => g.Select(d => d.Value).ToList())
            .ToList();
        // The longest column determines how many output rows are needed.
        int max = items.Max(l => l.Count);
        // "Pivot" back: row i holds the i-th distinct value of each column, or null if that column is shorter.
        Enumerable
            .Range(0, max)
            .Select(i => new {
                p = items[0].ItemAtOrDefault(i, null),
                c = items[1].ItemAtOrDefault(i, null),
                s = items[2].ItemAtOrDefault(i, null)
            })
            .ToList()
            .ForEach(row => Console.WriteLine($"p: {row.p}, c: {row.c}, s: {row.s}"));
    }
}

public static class ListExtensions {
    // Returns list[index], or defaultValue when index is past the end of the list.
    public static T ItemAtOrDefault<T>(this List<T> list, int index, T defaultValue)
        => index >= list.Count ? defaultValue : list[index];
}

public class Order {
    public Order(string products, string categories, string subcategories) {
        Products = products;
        Categories = categories;
        Subcategories = subcategories;
    }
    public string Products { get; set; }
    public string Categories { get; set; }
    public string Subcategories { get; set; }
}
Alternatively, we could swap this:
.Select(i => new {
p = items[0].ItemAtOrDefault(i, null),
c = items[1].ItemAtOrDefault(i, null),
s = items[2].ItemAtOrDefault(i, null)
})
for this:
.Select(i => new Order(
items[0].ItemAtOrDefault(i, null),
items[1].ItemAtOrDefault(i, null),
items[2].ItemAtOrDefault(i, null)
))
Then use the Order properties (Products, Categories, Subcategories) in the output line instead of p, c and s.

As far as I know, you won't be able to do it in a single query. Before thinking about how you would do it in C#, think about how you would do it in SQL; I might be wrong, but to me it looks like you'll be writing three queries anyway.
If you notice some performance issues and this is your actual code:
var prod = (from p in _context.orders select p.products).ToList().Distinct();
You may want to start by removing the .ToList() call, because it retrieves all records into memory, and only after that is Distinct() applied.
That's because your query expression (from p in ...) returns an IQueryable, and calling .ToList() on it forces the SQL query formed so far to run and brings the results into memory; from that point on you are working with an in-memory IEnumerable.
The key concept here is deferred execution.
See: https://www.c-sharpcorner.com/UploadFile/rahul4_saxena/ienumerable-vs-iqueryable/
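A minimal sketch of that advice applied to the question's UniqueProductFields class. It assumes a hypothetical OrderRow entity whose lowercase property names mirror _context.orders, and it uses an in-memory list exposed through AsQueryable() as a stand-in for the database. With a real LINQ provider such as Entity Framework, each Select(...).Distinct() should be translated into its own SELECT DISTINCT, so only unique values leave the server. Note that this is still three queries, just issued from one method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UniqueProductFields
{
    public IEnumerable<string> Products { get; set; }
    public IEnumerable<string> Categories { get; set; }
    public IEnumerable<string> Subcategories { get; set; }
}

// Hypothetical entity standing in for a row of the orders table.
public class OrderRow
{
    public string products { get; set; }
    public string categories { get; set; }
    public string subcategories { get; set; }
}

public static class UniqueFieldsQuery
{
    // Distinct() is applied to the IQueryable *before* ToList(), so a database
    // provider can deduplicate on the server instead of in memory.
    public static UniqueProductFields Load(IQueryable<OrderRow> orders) =>
        new UniqueProductFields
        {
            Products = orders.Select(o => o.products).Distinct().ToList(),
            Categories = orders.Select(o => o.categories).Distinct().ToList(),
            Subcategories = orders.Select(o => o.subcategories).Distinct().ToList()
        };
}
```

SQL gives no ordering guarantee for DISTINCT, so add an OrderBy(...) before each ToList() if the lists should come back sorted.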

Related

c# - list of objects - group by - get distinct values by key - lambda / linq

I'm trying to get all keys that have identical values.
Data:
public class CustItems
{
public string CustID { get; set; }
public string ItemID { get; set; }
}
List<CustItems> custItems = new List<CustItems>();
// GetData => fill list
custItems.Add(new CustItems { CustID = "1", ItemID = "1" });
Number of items: 50,000
Number of customers: 2,000
The base list contains two fields, which express which customer can buy which item:
CustID  ItemID
1       1
1       2
2       2
3       2
4       1
5       1
5       2
1       3
4       3
5       3
I'm trying to find out which items can be bought by the same set of customers.
According to the demo data:
item 1 can be bought by customers 1, 4, 5
item 2 can be bought by customers 1, 2, 3, 5
item 3 can be bought by customers 1, 4, 5
so items 1 and 3 can be bought by the same customers.
I couldn't figure out how to solve this efficiently using lambda or LINQ syntax.
Any hint is much appreciated. Thanks a lot!
P.S.
I started with something like:
var groupedList = from c in custItems
group c by c.ItemID into grp
select new
{
ID = grp.Key,
CustList = grp.Select(g => g.CustID).ToList()
};
This gives each item (key) its CustList of customers, but I couldn't find a good way to determine which keys (items) have identical values (CustLists).

Since your CustID and ItemID are strings (not very optimal performance-wise), I came up with the following LINQ solution:
var res = custItems
.GroupBy(s => s.ItemID)
.Select(g => new { ItemId = g.Key, Customers = g.Select(i => i.CustID).OrderBy(c => c).Aggregate((c0, c1) => $"{c0},{c1}") })
.GroupBy(g => g.Customers)
.Select(g => new { Customers = g.Key.Split(',').ToList(), Items = g.Select(i => i.ItemId).ToList() })
.ToList();
First, group your list by ItemID to find all the customers that buy each individual item.
Then create an anonymous type containing the ItemID and that item's set of CustIDs, converted into a single value that can be used for further grouping. I've used string concatenation here; it's the first spot for improvement.
Next, group the results by those CustID sets.
Then turn each CustID set back into a list of IDs and store it in an anonymous type containing the list of CustIDs and the list of ItemIDs that this set of customers buys.
Finally, convert everything into a list for structured browsing.
Again, combining and splitting the customer IDs (the second and fourth steps) is what can be optimised.
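A minimal sketch of that optimisation: instead of joining the customer IDs into a comma-separated string and splitting it again, the sorted customer list itself becomes the grouping key, compared element by element by a hypothetical SequenceComparer. It runs as in-memory LINQ to Objects, and SameCustomers.Find is an illustrative name; it assumes .NET Core 2.1 or later for System.HashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustItems
{
    public string CustID { get; set; }
    public string ItemID { get; set; }
}

// Hypothetical comparer: two customer lists are equal when they contain
// the same IDs in the same order (callers pass canonically sorted lists).
public sealed class SequenceComparer : IEqualityComparer<IReadOnlyList<string>>
{
    public bool Equals(IReadOnlyList<string> x, IReadOnlyList<string> y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));

    public int GetHashCode(IReadOnlyList<string> list)
    {
        var hash = new HashCode();
        foreach (var id in list) hash.Add(id);
        return hash.ToHashCode();
    }
}

public static class SameCustomers
{
    public static List<(List<string> Customers, List<string> Items)> Find(IEnumerable<CustItems> custItems) =>
        custItems
            .GroupBy(ci => ci.ItemID)
            .Select(g => new
            {
                ItemId = g.Key,
                // Distinct + ordinal sort gives each item a canonical customer list.
                Customers = (IReadOnlyList<string>)g.Select(ci => ci.CustID)
                                                    .Distinct()
                                                    .OrderBy(c => c, StringComparer.Ordinal)
                                                    .ToList()
            })
            // Items whose customer lists are equal end up in the same group.
            .GroupBy(x => x.Customers, x => x.ItemId, new SequenceComparer())
            .Select(g => (Customers: g.Key.ToList(), Items: g.ToList()))
            .ToList();
}
```

With the demo data above, customers 1, 4, 5 come back together with items 1 and 3, and customers 1, 2, 3, 5 with item 2.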

Get results matching a list of Ids using Linq

I am trying to get the list of gameIds that satisfy all the genreIds included in a List<int>.
The tables (partial):
editorial_list:
game_id
content
game_genres (game can belong to several genres):
id
game_id
genre_id
I need the list of IDs of the games that have an entry in the game_genres table for every one of the genre_ids.
For example:
The list of genre_ids includes genres 2 and 3.
If game_id = 14 exists in the game_genres table for both genre_id = 2 and genre_id = 3, it will be included in the final results.
Here is my code:
// get the list of game_id's that have an editorial
var editorialList = (from ee in db.editorials where ee.is_enabled == true select new {
game_id = ee.game_id
}).ToList();
// Produce a list of the games in the editorials and the genre IDs they belong to
var gameAndGenres = (from el in editorialList join gg in db.game_genres
on el.game_id equals gg.game_id
select new {
game_id = el.game_id,
genre_id = gg.genre_id
}
);
var res = gameAndGenres.Where(
x => x.genres.Contains(x.genre_id)) == genres.Count; // stuck here
The end result should be a unique list of game_ids, where each game in the list belongs to all the genres listed in the genres List<int>.
I created several steps to help me understand the query, but it might be able to be solved in one line, I just wasn't able to solve it.
Update: This is a new code that I'm trying.
var res = gameAndGenres.GroupBy(x => x.game_id)
.Select(g => new {
game_id = g.Key,
genreIds = g.Select( c => c.genre_id)
});
var res2 = res.Where(x => genres.Intersect(x.genreIds).Count()
== genres.Count()).ToList();

The relation between Game and Genre is a many-to-many: a Game can belong to zero or more genres and a Genre can have zero or more Games.
You want (the IDs of) all games that belong to all Genres in your game-genres table.
For example, if your list game-genres contains only records with references to genre 2 and genre 3 then you want all games that belong to genres 2 and 3.
Note that a Genre may exist that is not owned by any 'Game'. In that case there is no record in the game-genres table with a reference to this Genre.
Below is an example of your Entity Framework classes. The actual names of the classes and properties may vary, but you'll get the idea.
public class Game
{
    public int GameId {get; set;}
    public virtual ICollection<Genre> Genres {get; set;}
    // ... other properties
}
public class Genre
{
    public int GenreId {get; set;}
    public virtual ICollection<Game> Games {get; set;}
    // ...
}
public class MyDbContext : DbContext
{
    public DbSet<Game> Games {get; set;}
    public DbSet<Genre> Genres {get; set;}
    // ...
}
The entity framework model builder will detect that there is a many-to-many relation between Games and Genres and will automatically add a table like your 'game-genres'.
The nice thing is that if a certain Genre does not belong to any Game, it won't be in your game_genres table. The same holds the other way round: if a genre appears in your game_genres table, then there is at least one game that belongs to that genre.
So you don't want all genres, you only want genres that are used by at least one Game.
IQueryable<Genre> usedGenres = dbContext.Genres
    .Where(genre => genre.Games.Any());
Now you want only those Games that belong to EVERY genre in usedGenres
= every game where every element of usedGenres is in the collection Game.Genres.
To check if a genre is in the collection of Game.Genres, we only have to compare the GenreId of Game.Genres with the GenreId of usedGenreIds
IQueryable<int> usedGenreIds = usedGenres
    .Select(genre => genre.GenreId);
IQueryable<Game> gamesWithAllUsedGenres = dbContext.Games
    .Where(game => usedGenreIds.All(genreId => game.Genres.Select(genre => genre.GenreId).Contains(genreId)));
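The same All/Contains filter can be tried without a database. The sketch below is an in-memory approximation that assumes plain List<T> navigation collections instead of EF-managed ones and omits the reverse Genre.Games collection; AllGenresQuery.GameIdsWithAll is a hypothetical helper, and requiredGenreIds can be either the usedGenreIds from above or the question's own List<int>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the EF entities (only what the filter needs).
public class Game
{
    public int GameId { get; set; }
    public List<Genre> Genres { get; set; } = new List<Genre>();
}

public class Genre
{
    public int GenreId { get; set; }
}

public static class AllGenresQuery
{
    // Keeps only the games whose Genres collection contains every required genre id.
    public static List<int> GameIdsWithAll(IEnumerable<Game> games, IEnumerable<int> requiredGenreIds)
    {
        var required = requiredGenreIds.ToList();
        return games
            .Where(game => required.All(id => game.Genres.Select(g => g.GenreId).Contains(id)))
            .Select(game => game.GameId)
            .ToList();
    }
}
```

For example, with required genres 2 and 3, a game tagged with genres 2, 3 and 5 is kept, while a game tagged only with genre 2 is dropped.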

Something like this:
var gamesIds = db.editorials
.Where(e => db.game_genres.Select(gg => gg.genre_id).Distinct().All(genId => db.game_genres.Any(gg => gg.game_id == e.game_id && gg.genre_id == genId)))
.Select(e => e.game_id)
.ToList();
This selects the game_id from editorials where the game has an entry for every distinct genre_id in the game_genres table.
If you want all games having all genres in a list instead of all in the table:
List<int> genreIds = new List<int>() {1,2,3};
var gamesIds = db.editorials
.Where(e => genreIds.All(genId => db.game_genres.Any(gg => gg.game_id == e.game_id && gg.genre_id == genId)))
.Select(e => e.game_id)
.ToList();

You can achieve it by grouping rows from your tables.
Using method (lambda) syntax:
var res = db.game_genres.GroupBy(gg => gg.game_id)
.Where(g => g.Count() == db.genres.Count())
.Select(x => x.Key).ToList();
or using query syntax:
var res = (from gg in db.game_genres
group gg by gg.game_id into g
where g.Count() == db.genres.Count()
select g.Key).ToList();
Such grouping produces one record (called a group) per game_id. The Key of each group is the game_id used in the "group by" clause. Each group is a collection of rows with the same game_id, so you can treat it like any other collection of database entities: here you filter it with a where clause and call Count() on the collection.
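A sketch of the same grouping idea adapted to the question's List<int> of genre IDs rather than the whole genres table, run on an in-memory stand-in for game_genres (the GameGenre class and GroupCountQuery helper are hypothetical). It filters to the requested genres first and counts distinct genre_id values per game, so a duplicated (game_id, genre_id) row cannot inflate the count; it assumes genreIds itself contains no duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the game_genres table (id column omitted).
public class GameGenre
{
    public int game_id { get; set; }
    public int genre_id { get; set; }
}

public static class GroupCountQuery
{
    public static List<int> GameIdsWithAll(IEnumerable<GameGenre> gameGenres, IReadOnlyCollection<int> genreIds) =>
        gameGenres
            .Where(gg => genreIds.Contains(gg.genre_id))   // keep only the requested genres
            .GroupBy(gg => gg.game_id)                      // one group per game
            // A game qualifies when it covers every requested genre exactly once after Distinct().
            .Where(g => g.Select(gg => gg.genre_id).Distinct().Count() == genreIds.Count)
            .Select(g => g.Key)
            .ToList();
}
```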

seeking elegant linq solution

I have a model like this:
public class Post
{
    public int PostId { get; set; }
    public List<Category> Categories { get; set; }
}
Posts have at least 1 category, but can also have many categories.
I have a List<Post> that contains Posts (some with the same PostId), and each entry in the list contains exactly one category (Categories.Count == 1 for each).
I want to create a new List with only distinct Posts (distinct PostId), with the Categories list populated with each category in the original List having the same PostId.
Basically, find each Post in the original list, and populate the Categories field by adding each of their First (and only) entry in their Categories field together.
Is there a nice solution for this in linq?
Category is just an enum.
I have tried using various nested foreach and for loops, and it works, but it is just gross. I know there is a clean way to do it.
Example:
Categories = { PostId = 1, Category = Shopping }, { PostId = 1, Category = Pizza }, { PostId = 2, Category = Laundry }
after sequence desired output to be:
Categories = { PostId = 1, Categories = Shopping, Pizza }, { PostId = 2, Categories = Laundry }
Order does not matter for the category list

Given that you will have only one category per post (as stated in the second paragraph), you can try
var result = aPosts
.GroupBy(item => item.PostId, item => item.Categories[0])
.Select(group => new Post() { PostId = group.Key, Categories = new List<Category>(group) })
.ToList();
Note that having a public Post constructor that accepts both PostId and Categories would simplify any of these solutions (if you add it, also declare a parameterless constructor wherever you still use new Post() { ... }):
public Post(int postId, IEnumerable<Category> categories)
{
PostId = postId;
Categories = new List<Category>(categories);
}
Would allow the following:
var result = aPosts
.GroupBy(item => item.PostId, item => item.Categories[0])
.Select(group => new Post(group.Key, group))
.ToList();
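A self-contained sketch of the GroupBy overload with an element selector used above. The Category enum is reduced to the three values from the question's example and Categories is auto-initialized; both are assumptions, and PostMerger.Merge is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Category { Shopping, Pizza, Laundry }

public class Post
{
    public int PostId { get; set; }
    public List<Category> Categories { get; set; } = new List<Category>();
}

public static class PostMerger
{
    // Assumes every input post carries exactly one category (Categories[0]).
    public static List<Post> Merge(IEnumerable<Post> posts) =>
        posts
            .GroupBy(p => p.PostId, p => p.Categories[0]) // key: PostId, element: its single category
            .Select(g => new Post { PostId = g.Key, Categories = g.ToList() })
            .ToList();
}
```

Applied to the example from the question, post 1 ends up with Shopping and Pizza, and post 2 with Laundry.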

Something like the following:
var result = yourlist.GroupBy(l=>l.PostId)
.Select(x=>new Post{ PostId =x.Key, Categories =x.SelectMany(y=>y.Categories).ToList()})
.ToList();

With query syntax:
var result = from o in posts
             group o by o.PostId into gr
             select new Post
             {
                 PostId = gr.Key,
                 Categories = gr.SelectMany(c => c.Categories).ToList()
             };

All the other solutions given would work. But if a post might have more than one category in its Categories list and you need only the first one of each post, you can use the following.
var posts =
postList.GroupBy(p => p.PostId)
.Select(
g =>
new Post
{
PostId = g.Key,
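// Category is an enum (a value type), so FirstOrDefault() would never return null;
// Take(1) takes the first category of each post and skips posts whose list is empty.
Categories =
g.SelectMany(p => p.Categories.Take(1)).ToList()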
});
Also, make sure you initialize your Categories property (e.g. in the constructor of the Post class) before using the LINQ given in the answers; otherwise you might get a NullReferenceException.

Group By Query with Entity Framework

In my application I have Movements associated with a category.
I want a list of the most frequent categories.
My objects are:
Category: catId, catName
Movement: Movid, movDate, movMount, catId
I think this calls for a "Group By" query (grouping by catId and taking the most frequent ones).
(I'm using Entity Framework 6 in C#.)
Thank you very much in advance!

IMPORTANT: Entity Framework 7 (now renamed to Entity Framework Core 1.0) does not yet support GroupBy() for translation to GROUP BY in generated SQL. Any grouping logic will run on the client side, which could cause a lot of data to be loaded.
https://blogs.msdn.microsoft.com/dotnet/2016/05/16/announcing-entity-framework-core-rc2

Group the movements by category and select catId and the count.
Join this result with Category to get the name, then sort the results by count in descending order.
var groupedCategories = context.Movements.GroupBy(m => m.catId).Select(g => new { CatId = g.Key, Count = g.Count() });
var frequentCategories = groupedCategories.Join(context.Categories, g => g.CatId, c => c.catId, (g, c) => new { catId = c.catId, catName = c.catName, Count = g.Count }).OrderByDescending(r => r.Count);
foreach (var category in frequentCategories)
{
// category.catId, category.catName and category.Count
}
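The group, count, join and sort steps can be sketched on in-memory lists. The Category and Movement classes below keep only the properties the query needs, and CategoryFrequency.MostFrequent is a hypothetical helper. Against an EF6 context the same method chain on context.Movements should be composed into a single SQL query, but this sketch only exercises LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed in-memory stand-ins for the question's entities.
public class Category { public int catId { get; set; } public string catName { get; set; } }
public class Movement { public int movId { get; set; } public int catId { get; set; } }

public static class CategoryFrequency
{
    public static List<(string CatName, int Count)> MostFrequent(IEnumerable<Movement> movements, IEnumerable<Category> categories) =>
        movements
            .GroupBy(m => m.catId)                                   // one group per category id
            .Select(g => new { CatId = g.Key, Count = g.Count() })    // movements per category
            .Join(categories, g => g.CatId, c => c.catId, (g, c) => new { c.catName, g.Count })
            .OrderByDescending(r => r.Count)                          // most frequent first
            .Select(r => (CatName: r.catName, Count: r.Count))
            .ToList();
}
```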

I hope this helps:
var query = dbContext.Category.Select(u => new
{
Cat = u,
MovementCount = u.Movement.Count()
})
.ToList()
.OrderByDescending(u => u.MovementCount)
.Select(u => u.Cat)
.ToList();

I resolved the problem!
I used Raja's proposed solution (thanks a lot!).
It returns a collection composed of "Category" and "Count"; I changed it a bit to return a list of Categories.
var groupedCategories = model.Movement.GroupBy(m => m.catId).Select(
g => new {catId= g.Key, Count = g.Count() });
var freqCategories= groupedCategories.Join(model.Category,
g => g.catId,
c => c.catId,
(g, c) => new {category = c, count = g.Count}).OrderByDescending(ca => ca.count).Select(fc => fc.category).ToList ();

You just need to use the navigation property on Category: it contains all the related Movement entities (I call it Movements in the following query). You can write your query like this, with a minimum of database round-trips.
class Cat
{
public Guid catId { get; set; }
public string catName { get; set; }
public IEnumerable<Movement> Movements { get; set; }
public int MovementsCount { get { return Movements.Count(); } }
}
var Categories = category.Select(u => new Cat()
{
    catId = u.catId,
    catName = u.catName,
    Movements = u.Movements.AsEnumerable()
}).ToList();
var CategoriesIncludeCount = Categories.OrderByDescending(u => u.MovementsCount).ToList();

Ordering collection within group partitions using LINQ

I have simple type Question:
public class Question
{
public string[] Tags { get; set; }
public DateTime Created { get; set; }
}
Given a list of questions, I need to filter them by a list of tags (called filters). The questions that have the most tags matched by the filters list should be placed higher in the result collection. I wrote this expression for that:
public IList<Question> GetSimiliar(IList<Question> all, string[] filters)
{
var questions = all.Select(
x => new
{
MatchedTags = x.Tags
.Count(tag => filters.Contains(tag)),
Question = x
})
.Where(x => x.MatchedTags > 0)
.OrderByDescending(x => x.MatchedTags)
.Select(x => x.Question);
return questions.ToList();
}
Now I need a support for such situation, where I have more than one question with the same quantity of matched tags. Such questions should be further sorted by creation date (from newest to oldest).
Example of what I want:
filter: tags = [a,b,c]
collection of questions to be filtered:
q1 { tags = [a], created = 1939 }
q2 { tags = [b], created = 1945 }
q3 { tags = [a,b,c], created = 1800 }
q4 { tags = [a,b], created = 2012 }
q5 { tags = [z], created = 1999 }
result - the sorted collection:
q3
q4
q2
q1
How can I do that using LINQ?

Now I need a support for such situation, where I have more than one question with the same quantity of matched tags. Such questions should be further sorted by creation date (from newest to oldest).
Use ThenBy or ThenByDescending to further sort your query. Use these methods to break ties in prior ordering.
.OrderByDescending(x => x.MatchedTags)
.ThenByDescending(x => x.Question.Created)
.Select(x => x.Question);
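For completeness, here is the question's GetSimiliar method with the tie-breaker added, as a self-contained sketch; the SimilarQuestions class name is an assumption, and the Question type is copied from the question. With the sample data from the question it should return q3, q4, q2, q1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Question
{
    public string[] Tags { get; set; }
    public DateTime Created { get; set; }
}

public static class SimilarQuestions
{
    public static IList<Question> GetSimiliar(IList<Question> all, string[] filters)
    {
        return all
            .Select(x => new { MatchedTags = x.Tags.Count(tag => filters.Contains(tag)), Question = x })
            .Where(x => x.MatchedTags > 0)               // drop questions with no matching tag
            .OrderByDescending(x => x.MatchedTags)       // most matched tags first
            .ThenByDescending(x => x.Question.Created)   // ties: newest first
            .Select(x => x.Question)
            .ToList();
    }
}
```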

The 101 Linq Samples page has a nested grouping example. This sample uses group by to partition a list of each customer's orders, first by year, and then by month:
public void Linq43()
{
List<Customer> customers = GetCustomerList();
var customerOrderGroups =
from c in customers
select
new
{
c.CompanyName,
YearGroups =
from o in c.Orders
group o by o.OrderDate.Year into yg
select
new
{
Year = yg.Key,
MonthGroups =
from o in yg
group o by o.OrderDate.Month into mg
select new { Month = mg.Key, Orders = mg }
}
};
ObjectDumper.Write(customerOrderGroups, 3);
}
