I am having trouble understanding how to perform an order by in a LINQ to Entities call so that the data comes back in the desired order. The database used is PostgreSQL. The order by in Postgres is:
SELECT
*
FROM
part
ORDER BY
split_part(partnumber, '-',1)::int
, split_part(partnumber, '-',2)::int
Partnumber is a string field formatted as 2-3 numeric segments separated by '-', e.g.:
1-000235
10-100364
9-123456
etc.
I would want the sorted result to return:
1-000235
9-123456
10-100364
I have a test VB.Net app I am trying to figure out how to do this:
Using ctx As New EFWeb.MaverickEntities
Dim myparts = ctx.parts.
OrderBy(Function(e) e.partnumber).
ToList()
For Each pt As part In myparts
Console.WriteLine("{0} - {1}", pt.partnumber, pt.description)
Next
End Using
I tried CInt(e.partnumber.Split("-")(0)) to force sorting on the first segment of the partnumber, but it errored out because the compiler did not like the array reference on the result of the Split() call.
If anybody knows of a good current reference for LINQ to Entities ... that would be appreciated.
You didn't share your Linq code. Anyway I would get the data to client side and then do the ordering. In C#:
var result = ctx.Parts.AsEnumerable()
.Select(p => new {p, pnSplit = p.PartNumber.Split('-')})
.OrderBy(x => int.Parse(x.pnSplit[0]))
.ThenBy(x => int.Parse(x.pnSplit[1]))
.Select(x => x.p);
In VB it should be:
Dim result = ctx.Parts.AsEnumerable().
Select(Function(p) New With {p, .pnSplit = p.PartNumber.Split("-"c)}).
OrderBy(Function(x) Integer.Parse(x.pnSplit(0))).
ThenBy(Function(x) Integer.Parse(x.pnSplit(1))).
Select(Function(x) x.p)
Note the integer.Parse. Otherwise it would be alphabetic sort.
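To see the effect of the numeric parse without a database, here is a minimal sketch that runs the same ordering over an in-memory list. The PartNumberSort class and SortNumerically method are names made up for this illustration, and it assumes every part number has at least two numeric segments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartNumberSort
{
    // Orders part numbers by their first two segments, compared as integers.
    // Assumes each value has at least two numeric segments separated by '-'.
    public static List<string> SortNumerically(IEnumerable<string> partNumbers)
    {
        return partNumbers
            .Select(pn => new { pn, seg = pn.Split('-') })
            .OrderBy(x => int.Parse(x.seg[0]))
            .ThenBy(x => int.Parse(x.seg[1]))
            .Select(x => x.pn)
            .ToList();
    }

    static void Main()
    {
        var sample = new[] { "1-000235", "10-100364", "9-123456" };
        Console.WriteLine(string.Join(", ", SortNumerically(sample)));
        // prints: 1-000235, 9-123456, 10-100364
    }
}
```

A plain string sort of the same sample would put 10-100364 before 9-123456, which is the behavior the question is trying to avoid.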
I have a list of data retrieved from SQL and stored in a class. I want to now aggregate the data using LINQ in C# rather than querying the database again on a different dataset.
Example data I have is above.
Date, Period, Price, Vol and I am trying to create a histogram using this data. I tried to use Linq code below but seem to be getting a 0 sum.
Period needs to be a where clause based on a variable
Volume needs to be aggregated for the price ranges
Price needs to be a bucket and grouped on this column
I don't want a range. Just a number for each bucket.
Example output I want is (not real data just as example):
Bucket SumVol
18000 50
18100 30
18200 20
I attempted the following LINQ query, but my SUM seems to be empty. I still need to add my where clause, but for some reason the data is not aggregating.
var ranges = new[] { 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000 };
var priceGroups = eod.GroupBy(x => ranges.FirstOrDefault(r => r > x.price))
.Select(g => new { Price = g.Key, Sum = g.Sum(s => s.vol)})
.ToList();
var grouped = ranges.Select(r => new
{
Price = r,
Sum = priceGroups.Where(g => g.Price > r || g.Price == 0).Sum(g => g.Sum)
});
First things first... There seems to be nothing wrong with your priceGroups list. I've run that on my end and, as far as I can understand your purpose, it seems to be grabbing the expected values from your dataset.
var ranges = new[] { 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000 };
var priceGroups = eod.GroupBy(x => ranges.FirstOrDefault(r => r > x.price))
.Select(g => new { Price = g.Key, Sum = g.Sum(s => s.vol) })
.ToList();
Now, I assume your intent with the grouped list was to obtain yet another anonymous type list, much like you did with your priceGroups list, which is also an anonymous type list... List<'a> in C#.
var grouped = ranges.Select(r => new
{
Price = r,
Sum = priceGroups.Where(g => g.Price > r || g.Price == 0).Sum(g => g.Sum)
});
For starters, you are missing the ToList() method call at the end of it. However, that's not the main issue here, as you could still work with an IEnumerable<'a> just as well for most purposes.
As I see it, the core problem is in how you assign the anonymous property Sum. Why are you filtering for g.Price > r || g.Price == 0?
There is no element with Price equal to zero in your priceGroups list. Its prices are a subset of ranges, and there is no zero there. You are then comparing every value in ranges against that subset in priceGroups, and consolidating the Sums of every element in priceGroups that has a Price higher than the range being evaluated. In other words, the property Sum in your grouped list is a sum of sums.
Keep in mind that priceGroups is already an aggregated list. It seems to me you are trying to aggregate it again when you call the Sum() method after a Where() clause like you are doing. That doesn't make much sense.
What you want (I believe) is for the Sum property in the grouped list to be the same as the Sum property in the priceGroups list when the range being evaluated matches the Price being evaluated. Furthermore, where there are no matches, you want your grouped list Sum to be zero, as that means the range being evaluated was not in the original dataset. You can achieve that with the following instead:
Sum = priceGroups.FirstOrDefault(g => g.Price == r)?.Sum ?? 0
You said your Sum was "empty" in your post, but that's not the behavior I saw on my end. Try the above and, if still not behaving as you would expect, share a small dataset for which you know the expected output with me and I can try to help you further.
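For reference, this is roughly how the whole thing fits together, including the period filter that was still missing. It is only a sketch: the EodRow class, its int-typed members and the sample numbers are assumptions, since the real class and data were not posted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the class holding the rows retrieved from SQL.
class EodRow
{
    public int Period { get; set; }
    public int Price { get; set; }
    public int Vol { get; set; }
}

static class Histogram
{
    // Returns one (bucket, summed volume) pair per entry in ranges for the given period.
    public static List<KeyValuePair<int, int>> Build(IEnumerable<EodRow> eod, int[] ranges, int period)
    {
        // Each row is keyed by the first range value above its price (0 if none is).
        var priceGroups = eod
            .Where(x => x.Period == period)
            .GroupBy(x => ranges.FirstOrDefault(r => r > x.Price))
            .Select(g => new { Price = g.Key, Sum = g.Sum(s => s.Vol) })
            .ToList();

        // Look up the already aggregated sum per bucket; buckets with no rows get 0.
        return ranges
            .Select(r => new KeyValuePair<int, int>(
                r, priceGroups.FirstOrDefault(g => g.Price == r)?.Sum ?? 0))
            .ToList();
    }

    static void Main()
    {
        var eod = new List<EodRow>
        {
            new EodRow { Period = 1, Price = 17950, Vol = 20 },
            new EodRow { Period = 1, Price = 17990, Vol = 30 },
            new EodRow { Period = 1, Price = 18050, Vol = 30 },
            new EodRow { Period = 2, Price = 18150, Vol = 99 }, // other period, filtered out
        };
        foreach (var bucket in Build(eod, new[] { 18000, 18100, 18200 }, 1))
            Console.WriteLine(bucket.Key + ": " + bucket.Value);
        // prints 18000: 50, then 18100: 30, then 18200: 0
    }
}
```

Note that rows priced at or above the last range value are keyed to 0 by FirstOrDefault and therefore do not show up in any bucket here.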
Using LINQ on the data you already retrieved, instead of querying the DB again, is a good idea, mainly because it saves you a new call to your DB. As long as the DB is not updated very frequently (so the data does not change quickly), you can use the retrieved data to calculate everything with LINQ.
Say you have columns AppleType, CreationDate and want to order each group of AppleType by CreationDate. Furthermore, you want to create a new column which explicitly ranks the order of the CreationDate per AppleType.
So, the resulting DataSet would have three columns, AppleType, CreationDate, OrderIntroduced.
Is there a LINQ way of doing this? Would I have to actually go through the data programmatically (but not via LINQ), create an array, convert that to a column and add it to the DataSet? I hope there is a LINQ way of doing this. Please use LINQ non-method syntax if possible.
So are the values actually appearing in the right order? If so, it's easy - but you do need to use method syntax, as the query expression syntax doesn't support the relevant overload:
var queryWithIndex = queryWithoutIndex.Select((x, index) => new
{
x.AppleType,
x.CreationDate,
OrderIntroduced = index + 1,
});
(That's assuming you want OrderIntroduced starting at 1.)
I don't know offhand how you'd then put that back into a DataSet - but do you really need it in a DataSet as opposed to in the strongly-typed sequence?
EDIT: Okay, the requirements are still unclear, but I think you want something like:
var query = dataSource.GroupBy(x => x.AppleType)
.SelectMany(g => g.OrderBy(x => x.CreationDate)
.Select((x, index ) => new {
x.AppleType,
x.CreationDate,
OrderIntroduced = index + 1 }));
Note: The GroupBy and SelectMany calls here can be put in query expression syntax, but I believe it would make it more messy in this case. It's worth being comfortable with both forms.
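Here is a small in-memory sketch of the grouped version, so the per-type numbering can be checked by hand. The Apple class, the AppleRanking helper and the sample dates are invented for the illustration, and it returns tuples rather than anonymous types so the result can leave the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the real data source was not shown.
class Apple
{
    public string AppleType { get; set; }
    public DateTime CreationDate { get; set; }
}

static class AppleRanking
{
    // Numbers the rows 1, 2, 3... by CreationDate within each AppleType.
    public static List<(string AppleType, DateTime CreationDate, int OrderIntroduced)> Rank(
        IEnumerable<Apple> apples)
    {
        return apples
            .GroupBy(a => a.AppleType)
            .SelectMany(g => g
                .OrderBy(a => a.CreationDate)
                .Select((a, index) => (a.AppleType, a.CreationDate, OrderIntroduced: index + 1)))
            .ToList();
    }

    static void Main()
    {
        var apples = new List<Apple>
        {
            new Apple { AppleType = "Fuji", CreationDate = new DateTime(2020, 3, 1) },
            new Apple { AppleType = "Gala", CreationDate = new DateTime(2020, 1, 1) },
            new Apple { AppleType = "Fuji", CreationDate = new DateTime(2020, 1, 15) },
        };
        foreach (var r in Rank(apples))
            Console.WriteLine(r.AppleType + " month=" + r.CreationDate.Month + " rank=" + r.OrderIntroduced);
        // prints: Fuji month=1 rank=1, Fuji month=3 rank=2, Gala month=1 rank=1 (one per line)
    }
}
```

The index restarts for every group because the indexed Select runs inside the SelectMany, once per group.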
If you want a pure Linq to Entities/SQL solution you can do something like this:
Modified to handle duplicate CreationDate's
var query = from a in context.AppleGroup
orderby a.CreationDate
select new
{
AppleType = a.AppleType,
CreationDate = a.CreationDate,
OrderIntroduced = (from b in context.AppleGroup
where b.AppleType == a.AppleType && b.CreationDate < a.CreationDate // rank within the same AppleType
select b).Count() + 1
};
I need to add a literal value to a query. My attempt
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
In the above example, I get an error:
"Local sequence cannot be used in LINQ to SQL implementation
of query operators except the Contains() operator."
If I am using Entity Framework 4 for example, what could I add to the Union statement to always include the "seed" ID?
I am trying to produce SQL code like the following:
select distinct ID
from product
union
select 0 as ID
So later I can join the list to itself so I can find all values where the next highest value is not present (finding the lowest available ID in the set).
Edit: Original Linq Query to find lowest available ID
var skuQuery = Context.Products
.Where(p => p.sku > skuSeedStart &&
p.sku < skuSeedEnd)
.Select(p => p.sku).Distinct();
var lowestSkuAvailableList =
(from p1 in skuQuery
from p2 in skuQuery.Where(a => a == p1 + 1).DefaultIfEmpty()
where p2 == 0 // zero is default for long where it would be null
select p1).ToList();
var Answer = (lowestSkuAvailableList.Count == 0
? skuSeedStart :
lowestSkuAvailableList.Min()) + 1;
This code creates two SKU sets offset by one, then selects the SKU where the next highest doesn't exist. Afterward, it selects the minimum of that (lowest SKU where next highest is available).
For this to work, the seed must be in the set joined together.
Your problem is that your query is being turned entirely into a LINQ-to-SQL query, when what you need is a LINQ-to-SQL query with local manipulation on top of it.
The solution is to tell the compiler that you want to use LINQ-to-Objects after processing the query (in other words, change the extension method resolution to look at IEnumerable<T>, not IQueryable<T>). The easiest way to do this is to tack AsEnumerable() onto the end of your query, like so:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().AsEnumerable().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
Up front: not answering exactly the question you asked, but solving your problem in a different way.
How about this:
var a = Products.Select(p => p.sku).Distinct().ToList();
a.Add(0);
a.Dump(); // LinqPad's way of showing the values
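Once the SKUs are in a local list, the "lowest available ID" search from the question can also be done in memory. The following is only a sketch of that idea, not the poster's query: SkuFinder and LowestAvailable are hypothetical names, and the skuSeedStart/skuSeedEnd range filter is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SkuFinder
{
    // Returns the lowest value above the seed that is not already taken.
    public static long LowestAvailable(IEnumerable<long> existingSkus, long seed)
    {
        // Adding the seed locally plays the role of the "union select 0" in SQL.
        var taken = new HashSet<long>(existingSkus) { seed };

        // The lowest taken value (from the seed upward) whose successor is free, plus one.
        return taken.Where(s => s >= seed && !taken.Contains(s + 1)).Min() + 1;
    }

    static void Main()
    {
        Console.WriteLine(LowestAvailable(new long[] { 1, 2, 3, 5 }, 0)); // prints 4
        Console.WriteLine(LowestAvailable(new long[0], 0));               // prints 1
    }
}
```

Because the seed is always in the set, the Where clause can never come back empty, so Min() is safe here.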
You could create a database table for storing constant values and pass a query on this table to the Union operator.
For example, let's imagine table "Defaults" with fields "Name" and "Value" with only one record ("SKU", 0).
Then you can rewrite your expression like this:
var zero = context.Defaults.Where(_=>_.Name == "SKU").Select(_=>_.Value);
var result = context.Products.Select(p => p.sku).Distinct().Union(zero).ToList();
What is the best way to get the Max value from a LINQ query that may return no rows? If I just do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
I get an error when the query returns no rows. I could do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter _
Order By MyCounter Descending).FirstOrDefault
but that feels a little obtuse for such a simple request. Am I missing a better way to do it?
UPDATE: Here's the back story: I'm trying to retrieve the next eligibility counter from a child table (legacy system, don't get me started...). The first eligibility row for each patient is always 1, the second is 2, etc. (obviously this is not the primary key of the child table). So, I'm selecting the max existing counter value for a patient, and then adding 1 to it to create a new row. When there are no existing child values, I need the query to return 0 (so adding 1 will give me a counter value of 1). Note that I don't want to rely on the raw count of child rows, in case the legacy app introduces gaps in the counter values (possible). My bad for trying to make the question too generic.
Since DefaultIfEmpty isn't implemented in LINQ to SQL, I did a search on the error it returned and found a fascinating article that deals with null sets in aggregate functions. To summarize what I found, you can get around this limitation by casting to a nullable within your select. My VB is a little rusty, but I think it'd go something like this:
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select CType(y.MyCounter, Integer?)).Max
Or in C#:
var x = (from y in context.MyTable
where y.MyField == value
select (int?)y.MyCounter).Max();
I just had a similar problem, but I was using LINQ extension methods on a list rather than query syntax. The casting to a Nullable trick works there as well:
int max = list.Max(i => (int?)i.MyCounter) ?? 0;
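A quick LINQ to Objects check of the nullable trick, as a sketch: the Row class and the NullableMax helper are made up for the example. Max over a nullable selector returns null for an empty sequence instead of throwing, and the ?? operator then supplies the 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with the counter field from the question.
class Row
{
    public int MyCounter { get; set; }
}

static class NullableMax
{
    // Max over int? yields null for an empty sequence; ?? turns that into 0.
    public static int MaxCounterOrZero(IEnumerable<Row> rows)
    {
        return rows.Max(r => (int?)r.MyCounter) ?? 0;
    }

    static void Main()
    {
        var empty = new List<Row>();
        var some = new List<Row> { new Row { MyCounter = 3 }, new Row { MyCounter = 7 } };

        Console.WriteLine(MaxCounterOrZero(empty)); // prints 0
        Console.WriteLine(MaxCounterOrZero(some));  // prints 7
    }
}
```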
Sounds like a case for DefaultIfEmpty (untested code follows):
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).DefaultIfEmpty.Max
Think about what you're asking!
The max of {1, 2, 3, -1, -2, -3} is obviously 3. The max of {2} is obviously 2. But what is the max of the empty set { }? Obviously that is a meaningless question. The max of the empty set is simply not defined. Attempting to get an answer is a mathematical error. The max of any set must itself be an element in that set. The empty set has no elements, so claiming that some particular number is the max of that set without being in that set is a mathematical contradiction.
Just as it is correct behavior for the computer to throw an exception when the programmer asks it to divide by zero, so it is correct behavior for the computer to throw an exception when the programmer asks it to take the max of the empty set. Division by zero, taking the max of the empty set, wiggering the spacklerorke, and riding the flying unicorn to Neverland are all meaningless, impossible, undefined.
Now, what is it that you actually want to do?
You could always add Double.MinValue to the sequence. This would ensure that there is at least one element and Max would return it only if it is actually the minimum. To determine which option is more efficient (Concat, FirstOrDefault or Take(1)), you should perform adequate benchmarking.
double x = context.MyTable
.Where(y => y.MyField == value)
.Select(y => y.MyCounter)
.Concat(new double[]{Double.MinValue})
.Max();
int max = list.Any() ? list.Max(i => i.MyCounter) : 0;
If the list has any elements (ie. not empty), it will take the max of the MyCounter field, else will return 0.
Since .NET 3.5 you can use DefaultIfEmpty(), passing the default value as an argument. Something like one of the following:
int max = (from e in context.Table where e.Year == year select e.RecordNumber).DefaultIfEmpty(0).Max();
DateTime maxDate = (from e in context.Table where e.Year == year select e.StartDate ?? DateTime.MinValue).DefaultIfEmpty(DateTime.MinValue).Max();
The first one is allowed when you query a NOT NULL column, and the second one is the way I used it to query a NULLABLE column. If you use DefaultIfEmpty() without arguments, the default value will be the default for the type of your output, as you can see in the Default Values Table.
The resulting SELECT will not be so elegant but it's acceptable.
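An in-memory sketch of how DefaultIfEmpty(0) behaves (LINQ to Objects only, so it says nothing about the generated SQL; the DefaultIfEmptyDemo class is a name invented here). The default is used only when the sequence is empty, so a sequence of negative values still returns its real maximum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DefaultIfEmptyDemo
{
    // Returns the maximum, or 0 when there are no values at all.
    public static int MaxOrZero(IEnumerable<int> values)
    {
        return values.DefaultIfEmpty(0).Max();
    }

    static void Main()
    {
        Console.WriteLine(MaxOrZero(new int[0]));          // prints 0
        Console.WriteLine(MaxOrZero(new[] { 4, 9, 2 }));   // prints 9
        Console.WriteLine(MaxOrZero(new[] { -5, -8 }));    // prints -5, the 0 is not mixed in
    }
}
```

This is the difference from concatenating a sentinel value onto the sequence: a concatenated 0 would win over -5, while the DefaultIfEmpty default never competes with real rows.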
Hope it helps.
I think the issue is what you want to happen when the query has no results. If this is an exceptional case, then I would wrap the query in a try/catch block and handle the exception that the standard query generates. If it's OK for the query to return no results, then you need to figure out what you want the result to be in that case. It may be that @David's answer (or something similar) will work. That is, if the MAX will always be positive, then it may be enough to insert a known "bad" value into the list that will only be selected if there are no results. Generally, I would expect a query that is retrieving a maximum to have some data to work on, and I would go the try/catch route, as otherwise you are always forced to check whether the value you obtained is correct or not. I'd rather that the non-exceptional case was just able to use the obtained value.
Try
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
... continue working with x ...
Catch ex As InvalidOperationException ' Max over an empty result surfaces as InvalidOperationException, not SqlException
... do error processing ...
End Try
Another possibility would be grouping, similar to how you might approach it in raw SQL:
from y in context.MyTable
group y.MyCounter by y.MyField into GrpByMyField
where GrpByMyField.Key == value
select GrpByMyField.Max()
The only thing is (testing again in LINQPad) switching to the VB LINQ flavor gives syntax errors on the grouping clause. I'm sure the conceptual equivalent is easy enough to find, I just don't know how to reflect it in VB.
The generated SQL would be something along the lines of:
SELECT [t1].[MaxValue]
FROM (
SELECT MAX([t0].[MyCounter]) AS [MaxValue], [t0].[MyField]
FROM [MyTable] AS [t0]
GROUP BY [t0].[MyField]
) AS [t1]
WHERE [t1].[MyField] = @p0
The nested SELECT looks icky, like the query execution would retrieve all rows then select the matching one from the retrieved set... the question is whether or not SQL Server optimizes the query into something comparable to applying the where clause to the inner SELECT. I'm looking into that now...
I'm not well-versed in interpreting execution plans in SQL Server, but it looks like when the WHERE clause is on the outer SELECT, the number of actual rows resulting in that step is all rows in the table, versus only the matching rows when the WHERE clause is on the inner SELECT. That said, it looks like only 1% cost is shifted to the following step when all rows are considered, and either way only one row ever comes back from the SQL Server so maybe it's not that big of a difference in the grand scheme of things.
A little late, but I had the same concern...
Rephrasing your code from the original post, you want the max of the set S defined by
(From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter)
Taking into account your last comment
Suffice to say that I know I want 0
when there are no records to select
from, which definitely has an impact
on the eventual solution
I can rephrase your problem as: You want the max of {0 + S}.
And it looks like the proposed solution with concat is semantically the right one :-)
var max = new[] { 0 }
.Concat(from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.Max();
Why Not something more direct like:
Dim x = context.MyTable.Max(Function(DataItem) DataItem.MyField = Value)
One interesting difference that seems worth noting is that while FirstOrDefault and Take(1) generate the same SQL (according to LINQPad, anyway), FirstOrDefault returns a value--the default--when there are no matching rows and Take(1) returns no results... at least in LINQPad.
Just to let everyone who is using LINQ to Entities know: the methods above will not work...
If you try to do something like
var max = new[] { 0 }
.Concat(from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.Max();
It will throw an exception:
System.NotSupportedException: The LINQ expression node type 'NewArrayInit' is not supported in LINQ to Entities..
I would suggest just doing
var max = (from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.OrderByDescending(x => x).FirstOrDefault();
And the FirstOrDefault will return 0 if your list is empty.
// Cast inside the selector so Max works on decimal? and yields null for no rows
decimal Max = context.MyTable.Select(e => (decimal?)e.MyCounter).Max() ?? 0;
For Entity Framework and Linq to SQL we can achieve this by defining an extension method which modifies an Expression passed to IQueryable<T>.Max(...) method:
static class Extensions
{
public static TResult MaxOrDefault<T, TResult>(this IQueryable<T> source,
Expression<Func<T, TResult>> selector)
where TResult : struct
{
UnaryExpression castedBody = Expression.Convert(selector.Body, typeof(TResult?));
Expression<Func<T, TResult?>> lambda = Expression.Lambda<Func<T,TResult?>>(castedBody, selector.Parameters);
return source.Max(lambda) ?? default(TResult);
}
}
Usage:
int maxId = dbContextInstance.Employees.MaxOrDefault(employee => employee.Id);
// maxId is equal to 0 if there are no records in the Employees table
The generated query is identical and works just like a normal call to the IQueryable<T>.Max(...) method, but if there are no records it returns the default value of type TResult instead of throwing an exception.
I've knocked up a MaxOrDefault extension method. There's not much to it but its presence in Intellisense is a useful reminder that Max on an empty sequence will cause an exception. Additionally, the method allows the default to be specified if required.
public static TResult MaxOrDefault<TSource, TResult>(this
IQueryable<TSource> source, Expression<Func<TSource, TResult?>> selector,
TResult defaultValue = default (TResult)) where TResult : struct
{
return source.Max(selector) ?? defaultValue;
}
I just had a similar problem, my unit tests passed using Max() but failed when run against a live database.
My solution was to separate the query from the logic being performed, not join them in one query.
I needed a solution to work in unit tests using Linq-objects (in Linq-objects Max() works with nulls) and Linq-sql when executing in a live environment.
(I mock the Select() in my tests)
var requiredDataQuery = _dataRepo.Select(x => new { x.NullableDate1, x.NullableDate2 });
var requiredData = requiredDataQuery.ToList(); // run the query first, then aggregate in memory
var maxDate1 = requiredData.Max(x => x.NullableDate1);
var maxDate2 = requiredData.Max(x => x.NullableDate2);
Less efficient? Probably.
Do I care, as long as my app doesn't fall over next time? Nope.