Flatten Linq query of nested groupings with combined key in ASP.net - c#

In my database there are player objects, each with a nationality and points.
I need a way to nest my groupings. Because the groupings are supplied by the client, grouping by a single anonymous key does not seem to be an option here.
So I have to nest them like this:
Players.Where(p => p.Created <= /*some date*/)
.GroupBy(p => p.Nationality) // group them by their nationality
.Select(arg => new {
arg.Key,
Elements = arg.GroupBy(p => p.Points > 0) // group them by the ones with points and the ones without
})
. // here i need to flatten them by also combining the key(s) of the groupings to put them into a dictionary
.ToDictionary(/*...*/);
In the end, the dictionary should contain string keys such as ["USA|true"], ["USA|false"] or ["GER|true"], each mapped to its respective elements.
I guess SelectMany is the key, but I don't see where to start in order to achieve this.

What about this solution:
public class Player
{
public string Nationality {get;set;}
public int Points {get;set;}
public double otherProp {get;set;}
//new field is added
public string groupings {get;set;}
}
var groups = new List<Func<Player, string>>();
groups.Add(x => x.Nationality);
groups.Add(x => (x.Points > 0).ToString().ToLower());
Players.ForEach(x =>
groups.ForEach(y => x.groupings = x.groupings + (x.groupings == null ? "" : "|") + y(x))
);
var answer = Players.GroupBy(x => x.groupings).ToDictionary(x => x.Key, x => x.ToList());

Answering your concrete question.
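As you mentioned, SelectMany is the key. It goes right after the Select. Note that concatenating the bool key yields "True"/"False", so the keys come out as "USA|True" rather than "USA|true" unless you lower-case them: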
.Select(...)
.SelectMany(g1 => g1.Elements.Select(g2 => new {
Key = g1.Key + "|" + g2.Key, Elements = g2.ToList() }))
.ToDictionary(g => g.Key, g => g.Elements);
It can also replace the Select (i.e. start right after the first GroupBy):
.GroupBy(p => p.Nationality)
.SelectMany(g1 => g1.GroupBy(p => p.Points > 0).Select(g2 => new {
Key = g1.Key + "|" + g2.Key, Elements = g2.ToList() }))
.ToDictionary(g => g.Key, g => g.Elements);
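A minimal, self-contained sketch of the second variant, run against in-memory data. The sample players and their anonymous-type shape are assumptions for illustration only; the real Players source is not shown in the question. The Boolean key is lower-cased explicitly so the keys match the "USA|true" form the question asks for. This runs as LINQ to Objects and may not translate as-is in a database provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the Players source.
var players = new[]
{
    new { Name = "Ann",  Nationality = "USA", Points = 3 },
    new { Name = "Bob",  Nationality = "USA", Points = 0 },
    new { Name = "Carl", Nationality = "GER", Points = 7 },
    new { Name = "Dora", Nationality = "USA", Points = 5 },
};

var byCombinedKey = players
    .GroupBy(p => p.Nationality)
    .SelectMany(g1 => g1
        .GroupBy(p => p.Points > 0)
        .Select(g2 => new
        {
            // bool.ToString() yields "True"/"False", so lower-case it to get "USA|true".
            Key = g1.Key + "|" + g2.Key.ToString().ToLowerInvariant(),
            Elements = g2.ToList()
        }))
    .ToDictionary(g => g.Key, g => g.Elements);

foreach (var pair in byCombinedKey)
{
    Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value.Select(p => p.Name))}");
}
```

Each inner group becomes one dictionary entry, so a nationality with no zero-point players simply has no "|false" key.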

Related

LINQ to SQL - order by, group by and order by each group with skip and take

This is an extension of already answered question by Jon Skeet that you can find here.
The desired result is following:
A 100
A 80
B 80
B 50
B 40
C 70
C 30
considering you have following class:
public class Student
{
public string Name { get; set; }
public int Grade { get; set; }
}
to get to the result (in ideal scenario) can be done with Jon Skeet's answer:
var query = grades.GroupBy(student => student.Name)
.Select(group =>
new { Name = group.Key,
Students = group.OrderByDescending(x => x.Grade) })
.OrderBy(group => group.Students.FirstOrDefault().Grade);
However, in my case I have to support paging in my query as well. This means performing SelectMany() and then Skip() and Take(). But to use Skip() you have to apply OrderBy() first. This is where my ordering breaks again, as I need to preserve the order I get after SelectMany().
How to achieve this?
var query = grades.GroupBy(student => student.Name)
.Select(group =>
new { Name = group.Key,
Students = group.OrderByDescending(x => x.Grade) })
.OrderBy(group => group.Students.FirstOrDefault().Grade)
.SelectMany(s => s.Students)
.OrderBy(/* something magical that doesn't break ordering */)
.Skip(skip)
.Take(take);
I know I could manually sort the records again once my query is materialised, but I would like to avoid this and do all of it in one SQL query translated from LINQ.
You can take another approach using Max instead of ordering each group and taking the first value. After that you can order by max grade, name (in case two students have the same max grade) and grade:
var query = c.Customers
.GroupBy(s => s.Name, (k, g) => g
.Select(s => new { MaxGrade = g.Max(s2 => s2.Grade), Student = s }))
.SelectMany(s => s)
.OrderBy(s => s.MaxGrade)
.ThenBy(s => s.Student.Name)
.ThenByDescending(s => s.Student.Grade)
.Select(s => s.Student)
.Skip(toSkip)
.Take(toTake)
.ToList();
All these methods are supported by EF6 so you should get your desired result.
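The Max-based ordering can be tried on in-memory data as a rough sketch. The sample grades, the page window (toSkip, toTake) and the anonymous-type shape are assumptions. One deliberate difference: the sketch sorts by the group maximum in descending order so that the full sequence matches the desired listing in the question (A 100 first), whereas the answer above sorts ascending as the question's own query does.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows matching the desired listing, deliberately shuffled.
var grades = new[]
{
    new { Name = "B", Grade = 50 },
    new { Name = "A", Grade = 80 },
    new { Name = "C", Grade = 30 },
    new { Name = "B", Grade = 80 },
    new { Name = "A", Grade = 100 },
    new { Name = "C", Grade = 70 },
    new { Name = "B", Grade = 40 },
};

int toSkip = 2, toTake = 3; // assumed page window

var page = grades
    .GroupBy(s => s.Name)
    // Attach each group's maximum grade to every row of that group.
    .SelectMany(g => g.Select(s => new { MaxGrade = g.Max(x => x.Grade), Student = s }))
    .OrderByDescending(x => x.MaxGrade)      // group with the best grade first (assumption)
    .ThenBy(x => x.Student.Name)             // tie-breaker between groups with equal maxima
    .ThenByDescending(x => x.Student.Grade)  // best grade first inside each group
    .Select(x => x.Student)
    .Skip(toSkip)
    .Take(toTake)
    .ToList();

foreach (var s in page)
{
    Console.WriteLine($"{s.Name} {s.Grade}");
}
```

Because every row carries its group's maximum, a single flat sort is total and stable under paging, so no second OrderBy is needed after flattening.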
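Just re-index your list results and remove the index before returning. Note that the indexed Select overload is not translated by LINQ to SQL or Entity Framework, so this only works once the sequence is in memory: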
var query = grades.GroupBy(student => student.Name)
.Select(group =>
new { Name = group.Key,
Students = group.OrderByDescending(x => x.Grade)
})
.OrderBy(group => group.Students.FirstOrDefault().Grade)
.SelectMany(s => s.Students)
.Select((obj,index) => new {obj,index})
.OrderBy(newindex => newindex.index)
.Skip(skip).Take(take)
.Select(final=> final.obj);

Finding the most specific matching item

User input will be like 'BY1 2PX', which will be split and stored in a list like the one below:
var items = new List<string> { "BY1 2PX", "BY12", "BY1", "BY" };
I have a source list of products:
public class Product
{
public string Name {get;set;}
public string Id {get;set;}
}
Below is a sample product list. There is no guarantee on ordering; it could be in any order.
var products = new List<Product>{
new Product { Id = "1", Name = "BY1 2PX" },
new Product { Id = "2", Name = "BY12" },
new Product { Id = "3", Name = "BY1" },
new Product { Id = "4", Name = "BY" },
new Product { Id = "5", Name = "AA2 B2X" },
//...etc
};
My output should fetch Id 1, because it is the most specific match. If Id = 1 were not there, it should fetch Id = 2, and so on. Could anyone help me write a LINQ query? I have tried something like the following; is this fine?
var result = items.Select(x => products.FirstOrDefault(p =>
string.Equals(p.Name.Trim(), x, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault();
Well, you can use a dictionary with its fast lookups (this assumes product names are unique, since ToDictionary throws on duplicate keys):
var productsDict = products.ToDictionary(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsDict.ContainsKey(i));
Product result = key != null ? productsDict[key] : null;
Or, as Tim suggested, if multiple elements can share the same name, you can use a Lookup:
var productsLookup = products.ToLookup(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsLookup.Contains(i));
// A lookup returns a sequence per key, so take the first product with that name.
Product result = key != null ? productsLookup[key].First() : null;
If you want to select the best-matching product, you need to select from the product list, not the string list. You could use the following LINQ approach, which uses List.FindIndex:
Product bestProduct = products
.Select(p => new {
Product = p,
Index = items.FindIndex(s => String.Equals(p.Name, s, StringComparison.OrdinalIgnoreCase))
})
.Where(x => x.Index != -1)
.OrderBy(x => x.Index) // ensures the best-match logic
.Select(x => x.Product)
.FirstOrDefault();
The Where ensures that you won't get an arbitrary product if there is no matching one.
Update:
A more efficient solution is this query:
Product bestProduct = items
.Select(item => products.FirstOrDefault(p => String.Equals(p.Name, item, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault(p => p != null); // ensures the best-match logic
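The "walk the candidates in order of specificity" idea can be sketched end to end as follows. The products are modelled as hypothetical (Id, Name) tuples rather than the Product class, and the sample values (including the stray whitespace and lower-case name) are assumptions chosen to exercise the trimmed, case-insensitive comparison from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Candidates ordered from most to least specific, as in the question.
var items = new List<string> { "BY1 2PX", "BY12", "BY1", "BY" };

// Hypothetical products, deliberately unordered and without a "BY1 2PX" entry.
var products = new List<(string Id, string Name)>
{
    ("4", "BY"),
    ("5", "AA2 B2X"),
    ("2", "by12 "),
    ("3", "BY1"),
};

// Build the lookup once; Trim plus a case-insensitive comparer mirror the question's comparison.
var byName = products.ToLookup(p => p.Name.Trim(), StringComparer.OrdinalIgnoreCase);

// The first candidate that has at least one product is the most specific match.
var bestKey = items.FirstOrDefault(i => byName.Contains(i));
var bestId = bestKey != null ? byName[bestKey].First().Id : null;

Console.WriteLine(bestId ?? "no match");
```

Building the lookup once avoids rescanning the product list for every candidate, which matters only when the product list is large.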
You can try to measure how closely words resemble each other using the Levenshtein distance algorithm, which is commonly used for "Did you mean 'word'?" suggestions on search websites.
An implementation can be found here:
https://stackoverflow.com/a/9453762/1372750
Once you have the distance, you can tell which word or phrase is most "like" the searched one.
This finds, for each product, the most specific (i.e. longest) match in items, and returns the product with the longest match, regardless of the order of either collection:
var result = products
.Select(p => new
{
Product = p,
MostSpecific = items.Where(item => p.Name.Contains(item))
.OrderByDescending(match => match.Length)
.FirstOrDefault()
})
.Where(x => x.MostSpecific != null)
.OrderByDescending(x => x.MostSpecific.Length)
.Select(x => x.Product)
.FirstOrDefault();

Can this query about finding missing keys be improved? (either SQL or LINQ)

I am developing an ASP.NET MVC website and am looking for a way to improve this routine. It can be improved either at the LINQ level or at the SQL Server level. Ideally, it would run as a single query call.
Here are the tables involved and some example data:
We have no constraint requiring every Key to have a value for each LanguageId, and indeed the business logic does not allow such a constraint. However, at the application level, we want to warn the admin that a key is missing one or more language values. So I have this class and query:
public class LocalizationKeyWithMissingCodes
{
public string Key { get; set; }
public IEnumerable<string> MissingCodes { get; set; }
}
This method gets the Key list, as well as any missing codes (for example, if we have the en + jp + ch language codes and the key only has values for en + ch, the list will contain jp):
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var languageIdList = languageList.Select(q => q.Id);
var languageIdDictionary = languageList.ToDictionary(q => q.Id);
var keyList = this.GetActive()
.Select(q => q.Key)
.Distinct();
var result = new List<LocalizationKeyWithMissingCodes>();
foreach (var key in keyList)
{
// Get missing codes
var existingCodes = this.Get(q => q.Active && q.Key == key)
.Select(q => q.LanguageId);
// ToList to make sure it is processed at application
var missingLangId = languageList.Where(q => !existingCodes.Contains(q.Id))
.ToList();
result.Add(new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = missingLangId
.Select(q => languageIdDictionary[q.Id].Code),
});
}
result = result.OrderByDescending(q => q.MissingCodes.Count() > 0)
.ThenBy(q => q.Key)
.ToList();
return result;
}
I think my current solution is not good, because it makes a query call for each key. Is there a way to improve it, either by making it faster or by packing it into a single query call?
EDIT: This is the final query of the answer:
public IQueryable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var localizationList = this.GetActive();
return localizationList
.GroupBy(q => q.Key, (key, items) => new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = languageList
.GroupJoin(
items,
lang => lang.Id,
loc => loc.LanguageId,
(lang, loc) => loc.Any() ? null : lang)
.Where(q => q != null)
.Select(q => q.Code)
}).OrderByDescending(q => q.MissingCodes.Count() > 0) // Show the missing keys on the top
.ThenBy(q => q.Key);
}
Another possibility, using LINQ:
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes(
List<Language> languages,
List<Localization> localizations)
{
return localizations
.GroupBy(x => x.Key, (key, items) => new LocalizationKeyWithMissingCodes
{
Key = key,
MissingCodes = languages
.GroupJoin( // check if there is one or more match for each language
items,
x => x.Id,
y => y.LanguageId,
(x, ys) => ys.Any() ? null : x)
.Where(x => x != null) // eliminate all languages with a match
.Select(x => x.Code) // grab the code
})
.Where(x => x.MissingCodes.Any()); // eliminate all complete keys
}
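To see what the GroupBy plus GroupJoin combination produces, here is a small in-memory sketch. The language and localization rows are hypothetical (the real tables are not shown in the source), and anonymous types stand in for the Language, Localization and LocalizationKeyWithMissingCodes classes. Unlike the method above, it keeps complete keys and sorts incomplete ones first, as the question's original code does.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows: three active languages, two localization keys.
var languages = new[]
{
    new { Id = 1, Code = "en" },
    new { Id = 2, Code = "jp" },
    new { Id = 3, Code = "ch" },
};
var localizations = new[]
{
    new { Key = "Hello", LanguageId = 1 },
    new { Key = "Hello", LanguageId = 3 },
    new { Key = "Bye",   LanguageId = 1 },
    new { Key = "Bye",   LanguageId = 2 },
    new { Key = "Bye",   LanguageId = 3 },
};

var report = localizations
    .GroupBy(x => x.Key, (key, rows) => new
    {
        Key = key,
        MissingCodes = languages
            // Pair every language with this key's rows for that language.
            .GroupJoin(rows, lang => lang.Id, loc => loc.LanguageId,
                       (lang, matches) => new { lang.Code, HasValue = matches.Any() })
            .Where(m => !m.HasValue)   // keep languages with no row for this key
            .Select(m => m.Code)
            .ToList()                  // materialize so the sort below does not re-run the join
    })
    .OrderByDescending(r => r.MissingCodes.Count > 0) // incomplete keys first
    .ThenBy(r => r.Key)
    .ToList();

foreach (var row in report)
{
    Console.WriteLine($"{row.Key}: {string.Join(", ", row.MissingCodes)}");
}
```

Projecting a HasValue flag instead of returning null for matched languages avoids the extra null filter, at the cost of one more anonymous type.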
Here is the SQL logic to identify the keys that are missing "complete" language assignments:
SELECT
    combos.[Key],
    combos.LanguageId
FROM
(
    SELECT
        loc.[Key],
        lang.LanguageId
    FROM
        Language lang
    FULL OUTER JOIN
        Localization loc
        ON (1 = 1)
    WHERE
        lang.Active = 1
) combos
LEFT JOIN
    Localization loc
    ON (loc.[Key] = combos.[Key])
    AND (loc.LanguageId = combos.LanguageId)
WHERE
    loc.[Key] IS NULL;
To see all keys (instead of filtering):
SELECT
    combos.[Key],
    combos.LanguageId,
    CASE WHEN loc.[Key] IS NULL THEN 1 ELSE 0 END AS Flagged
FROM
(
    SELECT
        loc.[Key],
        lang.LanguageId
    FROM
        Language lang
    FULL OUTER JOIN
        Localization loc
        ON (1 = 1)
    WHERE
        lang.Active = 1
) combos
LEFT JOIN
    Localization loc
    ON (loc.[Key] = combos.[Key])
    AND (loc.LanguageId = combos.LanguageId);
Your code seems to be doing a lot of database queries and materialization.
In terms of LINQ, the single query would look like this.
We take the Cartesian product of the language and localization tables to get all combinations of (key, code), and then subtract the (key, code) tuples that actually exist in the relationship. This gives us the (key, code) combinations that don't exist.
var result = context.Languages.Join(context.Localizations, lang => true,
loc => true, (lang, loc) => new { Key = loc.Key, Code = lang.Code })
.Except(context.Languages.Join(context.Localizations, lang => lang.Id,
loc => loc.LanguageId, (lang, loc) => new { Key = loc.Key, Code = lang.Code }))
.GroupBy(r => r.Key).Select(r => new LocalizationKeyWithMissingCodes
{
Key = r.Key,
MissingCodes = r.Select(kc => kc.Code).ToList()
})
.ToList()
.OrderByDescending(lkmc => lkmc.MissingCodes.Count())
.ThenBy(lkmc => lkmc.Key).ToList();
P.S. I typed this LINQ query on the go, so let me know if it has syntax issues.
The gist of the query is that we take a Cartesian product and subtract the matching rows.

Join an array of string with the result of an existing linq statement

As a follow up to my last question here:
Filtering a list of HtmlElements based on a list of partial ids
I need to take this statement:
doc.All.Cast<HtmlElement>()
.Where(x => x.Id != null)
.Where(x => ids
.Any(id => x.Id.Contains(id))).ToList();
and join it with an array of strings called fields. Assume the array and the list have the same number of elements and line up correctly. I tried using Zip() but thought I might need an additional LINQ statement to make it work.
Assuming that fieldList[0] and IdList[0] correspond to each other, you can do the following:
var IdList = doc.All.Cast<HtmlElement>()
.Where(x => x.Id != null)
.Where(x => ids
.Any(id => x.Id.Contains(id))).ToList();
var resultList = fieldList
.Select( (item, index) => new { Field = item, Id = IdList[index] })
.ToDictionary(x => x.Id, x => x.Field);
As you have mentioned already, you can use Enumerable.Join:
var joined = from id in fields
join ele in elements on id equals ele.Id
select new { Element = ele, ID = id };
var dict = joined.ToDictionary(x => x.ID, x => x.Element);
I've presumed that you want to join them via ID. I've also presumed that the string[] contains only unique IDs; otherwise you need to use Distinct.
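Since the question mentions Zip(), here is a rough sketch of the positional pairing it performs. Plain strings stand in for the HtmlElement ids, and both arrays are hypothetical; the only assumption carried over from the question is that the two sequences line up index by index.

```csharp
using System;
using System.Linq;

// Stand-ins for the matched element ids and the parallel field names (both hypothetical).
var elementIds = new[] { "ctl00_firstName", "ctl00_lastName", "ctl00_email" };
var fields = new[] { "First name", "Last name", "Email" };

// Zip pairs items by position and stops at the end of the shorter sequence.
var fieldById = elementIds
    .Zip(fields, (id, field) => new { Id = id, Field = field })
    .ToDictionary(x => x.Id, x => x.Field);

Console.WriteLine(fieldById["ctl00_lastName"]);
```

Unlike indexing with IdList[index], Zip does not throw when the lengths differ; it silently drops the unpaired tail, so a length check beforehand may be worthwhile.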

Grouping a list of list using linq

I have these classes:
public class TaskDetails
{
public string EmployeeName {get; set;}
public decimal EmployeeHours {get; set;}
}
public class Tasks
{
public string TaskName {get; set;}
public List<TaskDetails> TaskList {get; set;}
}
I have a function that returns a List<Tasks>. I need to create a new list that groups by EmployeeName and sums the EmployeeHours irrespective of the TaskName. That is, I need the total hours of each employee. How can I get that?
P.S. As for what I have done so far: I have stared at the code for a long time and tried rubber-duck problem solving, to no avail. I can get the results using a foreach and a Dictionary<string, decimal>: if the key does not exist, add it with the value; if it exists, add the decimal value to the existing one. But that feels like too much here. I feel there is a ForEach - GroupBy - Sum combination which I am missing.
Any pointers on how to do it would be very helpful.
var results = tasks.SelectMany(x => x.TaskList)
.GroupBy(x => x.EmployeeName)
.ToDictionary(g => g.Key, g => g.Sum(x => x.EmployeeHours));
Gives you Dictionary<string, decimal>.
To get a list just replace ToDictionary with Select/ToList chain:
var results = tasks.SelectMany(x => x.TaskList)
.GroupBy(x => x.EmployeeName)
.Select(g => new {
EmployeeName = g.Key,
Sum = g.Sum(x => x.EmployeeHours)
}).ToList();
A SelectMany would help, I think.
It will "flatten" the lists of TaskDetails of all your Tasks elements into a single IEnumerable<TaskDetails>:
var result = listOfTasks.SelectMany(x => x.TaskList)
.GroupBy(m => m.EmployeeName)
.Select(m => new {
empName = m.Key,
hours = m.Sum(x => x.EmployeeHours)
});
var emplWithHours = allTasks
.SelectMany(t => t.TaskList)
.GroupBy(empl => empl.EmployeeName)
.Select(empl => new
{
EmployeeName = empl.Key,
TotalHours = empl.Sum(hour => hour.EmployeeHours)
}).ToDictionary(i => i.EmployeeName, i => i.TotalHours);
Also, note that a member cannot have the same name as its enclosing type. If both the class and its list property are named Tasks, you get a compile-time error:
Error 1 'Tasks': member names cannot be the same as their enclosing type
I would have named the class Task, since it represents a single task.
I would do it this way:
var query =
(
from t in tasks
from td in t.TaskList
group td.EmployeeHours by td.EmployeeName into ghs
select new
{
EmployeeName = ghs.Key,
EmployeeHours = ghs.Sum(),
}
).ToDictionary(x => x.EmployeeName, x => x.EmployeeHours);
A slightly more succinct query would be this:
var query =
(
from t in tasks
from td in t.TaskList
group td.EmployeeHours by td.EmployeeName
).ToDictionary(x => x.Key, x => x.Sum());
There are pros and cons to each. I think the first is more explicit, but the second a little neater.
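For a quick check of the SelectMany - GroupBy - Sum combination, here is a self-contained sketch. The sample tasks are hypothetical, and anonymous types stand in for the Tasks and TaskDetails classes while keeping the same property names (TaskName, TaskList, EmployeeName, EmployeeHours).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data shaped like List<Tasks>: each task carries its own detail list.
var tasks = new[]
{
    new { TaskName = "Design", TaskList = new[]
    {
        new { EmployeeName = "Ann", EmployeeHours = 2.5m },
        new { EmployeeName = "Bob", EmployeeHours = 4m },
    } },
    new { TaskName = "Build", TaskList = new[]
    {
        new { EmployeeName = "Ann", EmployeeHours = 1.5m },
    } },
};

Dictionary<string, decimal> totalHours = tasks
    .SelectMany(t => t.TaskList)      // flatten all details across tasks
    .GroupBy(td => td.EmployeeName)   // one group per employee
    .ToDictionary(g => g.Key, g => g.Sum(td => td.EmployeeHours));

foreach (var pair in totalHours)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

This is the same dictionary the manual foreach would build, with the "add or accumulate" logic handled by GroupBy and Sum.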
