Max or Default? - c#

What is the best way to get the Max value from a LINQ query that may return no rows? If I just do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
I get an error when the query returns no rows. I could do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter _
Order By MyCounter Descending).FirstOrDefault
but that feels a little obtuse for such a simple request. Am I missing a better way to do it?
UPDATE: Here's the back story: I'm trying to retrieve the next eligibility counter from a child table (legacy system, don't get me started...). The first eligibility row for each patient is always 1, the second is 2, etc. (obviously this is not the primary key of the child table). So, I'm selecting the max existing counter value for a patient, and then adding 1 to it to create a new row. When there are no existing child values, I need the query to return 0 (so adding 1 will give me a counter value of 1). Note that I don't want to rely on the raw count of child rows, in case the legacy app introduces gaps in the counter values (possible). My bad for trying to make the question too generic.

Since DefaultIfEmpty isn't implemented in LINQ to SQL, I did a search on the error it returned and found a fascinating article that deals with null sets in aggregate functions. To summarize what I found, you can get around this limitation by casting to a nullable within your select. My VB is a little rusty, but I think it'd go something like this:
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select CType(y.MyCounter, Integer?)).Max
Or in C#:
var x = (from y in context.MyTable
where y.MyField == value
select (int?)y.MyCounter).Max();

I just had a similar problem, but I was using LINQ extension methods on a list rather than query syntax. The casting to a Nullable trick works there as well:
int max = list.Max(i => (int?)i.MyCounter) ?? 0;
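For anyone who wants to see the difference in isolation, here is a minimal LINQ to Objects sketch. The lists and variable names are made up for illustration, and it only covers in-memory behavior, not what a database provider translates the query to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory data standing in for the query results.
var empty = new List<int>();
var counters = new List<int> { 1, 2, 5 };

// Max over int? returns null for an empty sequence instead of throwing,
// so the null-coalescing operator can supply the fallback.
int maxOfEmpty = empty.Max(i => (int?)i) ?? 0;
int maxOfCounters = counters.Max(i => (int?)i) ?? 0;

// Without the cast, Max throws InvalidOperationException on an empty sequence.
bool threw = false;
try { empty.Max(); }
catch (InvalidOperationException) { threw = true; }

Console.WriteLine($"{maxOfEmpty} {maxOfCounters} {threw}"); // 0 5 True
```

Adding 1 to maxOfEmpty then gives the first counter value of 1 that the question asks for.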

Sounds like a case for DefaultIfEmpty (untested code follows):
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).DefaultIfEmpty.Max

Think about what you're asking!
The max of {1, 2, 3, -1, -2, -3} is obviously 3. The max of {2} is obviously 2. But what is the max of the empty set { }? Obviously that is a meaningless question. The max of the empty set is simply not defined. Attempting to get an answer is a mathematical error. The max of any set must itself be an element in that set. The empty set has no elements, so claiming that some particular number is the max of that set without being in that set is a mathematical contradiction.
Just as it is correct behavior for the computer to throw an exception when the programmer asks it to divide by zero, so it is correct behavior for the computer to throw an exception when the programmer asks it to take the max of the empty set. Division by zero, taking the max of the empty set, wiggering the spacklerorke, and riding the flying unicorn to Neverland are all meaningless, impossible, undefined.
Now, what is it that you actually want to do?

You could always add Double.MinValue to the sequence. This would ensure that there is at least one element and Max would return it only if it is actually the minimum. To determine which option is more efficient (Concat, FirstOrDefault or Take(1)), you should perform adequate benchmarking.
double x = context.MyTable
.Where(y => y.MyField == value)
.Select(y => y.MyCounter)
.Concat(new double[]{Double.MinValue})
.Max();

int max = list.Any() ? list.Max(i => i.MyCounter) : 0;
If the list has any elements (i.e. it is not empty), this takes the max of the MyCounter field; otherwise it returns 0.

Since .NET 3.5 you can use DefaultIfEmpty(), passing the default value as an argument. Something like one of the following:
int max = (from e in context.Table where e.Year == year select e.RecordNumber).DefaultIfEmpty(0).Max();
DateTime maxDate = (from e in context.Table where e.Year == year select e.StartDate ?? DateTime.MinValue).DefaultIfEmpty(DateTime.MinValue).Max();
The first one is allowed when you query a NOT NULL column, and the second one is the way I used it to query a NULLABLE column. If you use DefaultIfEmpty() without arguments, the default value will be the one defined for the type of your output, as you can see in the Default Values Table.
The resulting SELECT will not be so elegant but it's acceptable.
Hope it helps.
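A rough in-memory sketch of the same two shapes, in case it helps. The arrays are hypothetical stand-ins for the RecordNumber and StartDate columns, and this runs under LINQ to Objects, so it says nothing about the SQL a provider would generate.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for a NOT NULL column and a NULLABLE column.
int[] recordNumbers = Array.Empty<int>();
DateTime?[] startDates = { null, new DateTime(2020, 1, 15), new DateTime(2019, 6, 1) };
DateTime?[] noDates = Array.Empty<DateTime?>();

// NOT NULL column: the fallback is used only when the sequence is empty.
int maxRecord = recordNumbers.DefaultIfEmpty(0).Max();

// NULLABLE column: replace nulls first, then guard against the empty case.
DateTime maxDate = startDates.Select(d => d ?? DateTime.MinValue)
                             .DefaultIfEmpty(DateTime.MinValue)
                             .Max();
DateTime maxOfNone = noDates.Select(d => d ?? DateTime.MinValue)
                            .DefaultIfEmpty(DateTime.MinValue)
                            .Max();

Console.WriteLine($"{maxRecord} {maxDate:yyyy-MM-dd} {maxOfNone == DateTime.MinValue}");
```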

I think the issue is what you want to happen when the query has no results. If this is an exceptional case, then I would wrap the query in a try/catch block and handle the exception that the standard query generates. If it's OK for the query to return no results, then you need to figure out what you want the result to be in that case. It may be that @David's answer (or something similar) will work. That is, if the MAX will always be positive, then it may be enough to insert a known "bad" value into the list that will only be selected if there are no results. Generally, I would expect a query that is retrieving a maximum to have some data to work on, and I would go the try/catch route, as otherwise you are always forced to check whether the value you obtained is correct. I'd rather that the non-exceptional case was just able to use the obtained value.
Try
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
' ... continue working with x ...
Catch ex As SqlException
' ... do error processing ...
End Try

Another possibility would be grouping, similar to how you might approach it in raw SQL:
from y in context.MyTable
group y.MyCounter by y.MyField into GrpByMyField
where GrpByMyField.Key == value
select GrpByMyField.Max()
The only thing is (testing again in LINQPad) switching to the VB LINQ flavor gives syntax errors on the grouping clause. I'm sure the conceptual equivalent is easy enough to find, I just don't know how to reflect it in VB.
The generated SQL would be something along the lines of:
SELECT [t1].[MaxValue]
FROM (
SELECT MAX([t0].[MyCounter]) AS [MaxValue], [t0].[MyField]
FROM [MyTable] AS [t0]
GROUP BY [t0].[MyField]
) AS [t1]
WHERE [t1].[MyField] = @p0
The nested SELECT looks icky, like the query execution would retrieve all rows then select the matching one from the retrieved set... the question is whether or not SQL Server optimizes the query into something comparable to applying the where clause to the inner SELECT. I'm looking into that now...
I'm not well-versed in interpreting execution plans in SQL Server, but it looks like when the WHERE clause is on the outer SELECT, the number of actual rows resulting in that step is all rows in the table, versus only the matching rows when the WHERE clause is on the inner SELECT. That said, it looks like only 1% cost is shifted to the following step when all rows are considered, and either way only one row ever comes back from the SQL Server so maybe it's not that big of a difference in the grand scheme of things.

A little late, but I had the same concern...
Rephrasing your code from the original post, you want the max of the set S defined by
(From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter)
Taking into account your last comment:
"Suffice to say that I know I want 0 when there are no records to select from, which definitely has an impact on the eventual solution"
I can rephrase your problem as: You want the max of {0 + S}.
And it looks like the proposed solution with concat is semantically the right one :-)
var max = new[] { 0 }
.Concat(from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.Max();
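One thing worth checking before picking the seed approach: prepending 0 and using DefaultIfEmpty(0) agree on empty and positive inputs but differ when every value is negative. A small in-memory sketch, with hypothetical helper names MaxWithSeed and MaxOrZero:

```csharp
using System;
using System.Linq;

int[] empty = Array.Empty<int>();
int[] positives = { 1, 2, 3 };
int[] negatives = { -5, -2 };

// Max of {0} united with S: the seed always takes part in the comparison.
int MaxWithSeed(int[] s) => new[] { 0 }.Concat(s).Max();

// Max of S, with 0 used only when S is empty.
int MaxOrZero(int[] s) => s.DefaultIfEmpty(0).Max();

Console.WriteLine($"{MaxWithSeed(empty)} {MaxWithSeed(positives)} {MaxWithSeed(negatives)}"); // 0 3 0
Console.WriteLine($"{MaxOrZero(empty)} {MaxOrZero(positives)} {MaxOrZero(negatives)}");       // 0 3 -2
```

For counters that start at 1, as in the question, the two give the same answer.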

Why not something more direct, like:
Dim x = context.MyTable.Max(Function(DataItem) DataItem.MyField = Value)

One interesting difference that seems worth noting is that while FirstOrDefault and Take(1) generate the same SQL (according to LINQPad, anyway), FirstOrDefault returns a value--the default--when there are no matching rows and Take(1) returns no results... at least in LINQPad.

Just to let everyone who is using LINQ to Entities know: the methods above will not work...
If you try to do something like
var max = new[] { 0 }
.Concat(from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.Max();
It will throw an exception:
System.NotSupportedException: The LINQ expression node type 'NewArrayInit' is not supported in LINQ to Entities.
I would suggest just doing
var max = (from y in context.MyTable
where y.MyField == value
select y.MyCounter)
.OrderByDescending(x => x).FirstOrDefault();
And the FirstOrDefault will return 0 if your list is empty.

decimal Max = context.MyTable.Select(e => (decimal?)e.MyCounter).Max() ?? 0;

For Entity Framework and Linq to SQL we can achieve this by defining an extension method which modifies an Expression passed to IQueryable<T>.Max(...) method:
using System;
using System.Linq;
using System.Linq.Expressions;
static class Extensions
{
public static TResult MaxOrDefault<T, TResult>(this IQueryable<T> source,
Expression<Func<T, TResult>> selector)
where TResult : struct
{
UnaryExpression castedBody = Expression.Convert(selector.Body, typeof(TResult?));
Expression<Func<T, TResult?>> lambda = Expression.Lambda<Func<T,TResult?>>(castedBody, selector.Parameters);
return source.Max(lambda) ?? default(TResult);
}
}
Usage:
int maxId = dbContextInstance.Employees.MaxOrDefault(employee => employee.Id);
// maxId is equal to 0 if there are no records in the Employees table
The generated query is identical: it works just like a normal call to IQueryable<T>.Max(...), but if there are no records it returns the default value of TResult instead of throwing an exception.

I've knocked up a MaxOrDefault extension method. There's not much to it but its presence in Intellisense is a useful reminder that Max on an empty sequence will cause an exception. Additionally, the method allows the default to be specified if required.
public static TResult MaxOrDefault<TSource, TResult>(this
IQueryable<TSource> source, Expression<Func<TSource, TResult?>> selector,
TResult defaultValue = default (TResult)) where TResult : struct
{
return source.Max(selector) ?? defaultValue;
}

I just had a similar problem, my unit tests passed using Max() but failed when run against a live database.
My solution was to separate the query from the logic being performed, not join them in one query.
I needed a solution to work in unit tests using Linq-objects (in Linq-objects Max() works with nulls) and Linq-sql when executing in a live environment.
(I mock the Select() in my tests)
var requiredDataQuery = _dataRepo.Select(x => new { x.NullableDate1, x.NullableDate2 });
var requiredData = requiredDataQuery.ToList(); // run the query, then aggregate in memory
var maxDate1 = requiredData.Max(x => x.NullableDate1);
var maxDate2 = requiredData.Max(x => x.NullableDate2);
Less efficient? Probably.
Do I care, as long as my app doesn't fall over next time? Nope.

Related

How to convert a LINQ statement from VB to C#

I have the following LINQ statement in VB:
Dim query As List(Of RepresentativeCase)
query = (From w In Me.RepresentativeCases
Where w.Active = True
Group w By w.PublicSubType, w.LawArea Into g = Group
Where g.Count(Function(x) x.PublicSubType.Distinct.Count) >= 100
Select g.First()
).ToList()
I am trying to convert into C# and having trouble. I got the following but it doesn't compile:
List<RepresentativeCase> query = (
from w in this.RepresentativeCases
where w.Active == true
group w by new { w.PublicSubType, w.LawArea} into g
let PublicSubType = g.Key.PublicSubType
let LawArea = g.Key.LawArea
where g.Count(x => x.PublicSubType.Distinct().Count()) > 100 // breaks here
select g.First()).ToList();
The where g.Count(x => x.PublicSubType.Distinct().Count()) > 100 line won't compile. Additionally, I am confused by the Into g = Group in the VB sample - not clear what it does.
What am I missing?
This:
Where g.Count(Function(x) x.PublicSubType.Distinct.Count) >= 100
would only compile with Option Strict Off. The outer Count call requires a delegate that returns a Boolean, but the inner Count call is returning an Integer. That means that that Integer must be implicitly converted to a Boolean.
The question you should have been asking was how to fix that VB to compile with Option Strict On and, to work that out, you should be considering what the code is actually doing. What's actually happening is that a count of zero will be converted to False and any other count will be converted to True. That means that what that code is actually doing is determining whether there are any items in the PublicSubType list. What's the proper way to do that? With the Any method:
Where g.Count(Function(x) x.PublicSubType.Any()) >= 100
Note that the Distinct call is pointless because it might change the number of items but not whether there are any items or not.
It should be pretty obvious how to convert that to C# but, so that I am answering the specific question as asked:
List<RepresentativeCase> query = (
from w in this.RepresentativeCases
where w.Active == true
group w by new { w.PublicSubType, w.LawArea} into g
let PublicSubType = g.Key.PublicSubType
let LawArea = g.Key.LawArea
where g.Count(x => x.PublicSubType.Any()) >= 100
select g.First()).ToList();
In this line:
Where g.Count(Function(x) x.PublicSubType.Distinct.Count) >= 100
The outer .Count() method is expecting a lambda that returns a Boolean, where the result tells .Count() whether or not to include this item as part of the final result. But you have this:
Function(x) x.PublicSubType.Distinct.Count
The result of the above function is an Integer, not a Boolean. Also, you should write it (probably) like this:
Function(x) x.PublicSubType.Distinct().Count()
It is still legal to omit parentheses when calling methods in VB.Net, but this is for backwards compatibility with old code and is not recommended in other situations.
I'm not sure what the VB code is doing here, but with a sane configuration that would not compile in VB, either. What you have is not sane, especially as VB has some weird ideas about what Integer values can map to Boolean true and false.

How can I make Sum() return 0 instead of 'null'?

I'm trying to use LINQ-to-entities to query my DB, where I have 3 tables: Room, Conference, and Participant. Each room has many conferences, and each conference has many participants. For each room, I'm trying to get a count of its conferences, and a sum of all of the participants for all of the room's conferences. Here's my query:
var roomsData = context.Rooms
.GroupJoin(
context.Conferences
.GroupJoin(
context.Participants,
conf => conf.Id,
part => part.ConferenceId,
(conf, parts) => new { Conference = conf, ParticipantCount = parts.Count() }
),
rm => rm.Id,
data => data.Conference.RoomId,
(rm, confData) => new {
Room = rm,
ConferenceCount = confData.Count(),
ParticipantCount = confData.Sum(cd => cd.ParticipantCount)
}
);
When I try and turn this into a list, I get the error:
The cast to value type 'System.Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type.
I can fix this by changing the Sum line to:
ParticipantCount = confData.Count() == 0 ? 0 : confData.Sum(cd => cd.ParticipantCount)
But the trouble is that this seems to generate a more complex query and add 100ms onto the query time. Is there a better way for me to tell EF that when it is summing ParticipantCount, an empty list for confData should just mean zero, rather than throwing an exception? The annoying thing is that this error only happens with EF; if I create an empty in-memory List<int> and call Sum() on that, it gives me zero, rather than throwing an exception!
You may use the null coalescing operator ?? as:
confData.Sum(cd => (int?)cd.ParticipantCount) ?? 0
I made it work by changing the Sum line to:
ParticipantCount = (int?)confData.Sum(cd => cd.ParticipantCount)
Confusingly, it seems that even though IntelliSense tells me that the int overload for Sum() is getting used, at runtime it is actually using the int? overload because the confData list might be empty. If I explicitly tell it the return type is int? it returns null for the empty list entries, and I can later null-coalesce the nulls to zero.
Use Enumerable.DefaultIfEmpty:
ParticipantCount = confData.DefaultIfEmpty().Sum(cd => cd.ParticipantCount)
Instead of trying to get EF to generate a SQL query that returns 0 instead of null, you can change the value as you process the query results on the client side, like this:
var results = from r in roomsData.AsEnumerable()
select new
{
r.Room,
r.ConferenceCount,
ParticipantCount = r.ParticipantCount ?? 0
};
The AsEnumerable() forces the SQL query to be evaluated and the subsequent query operators are client-side LINQ-to-Objects.

Why is this output variable in my LINQ expression NOT problematic?

Given the following code:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue);
outValue = 3;
//enumerating over someEnumerable here shows ints from 0 to 99
I am able to see a "snapshot" of the out parameter for each iteration. Why does this work correctly instead of me seeing 100 3's (deferred execution) or 100 99's (access to modified closure)?
First you define a query, strings that knows how to generate a sequence of strings, when queried. Each time a value is asked for it will generate a new number and convert it to a string.
Then you declare a variable, outValue, and assign 0 to it.
Then you define a new query, someEnumerable, that knows how to, when asked for a value, get the next value from the query strings, try to parse the value and, if the value can be parsed, yields the value of outValue. Once again, we have defined a query that can do this, we have not actually done any of this.
You then set outValue to 3.
Then you ask someEnumerable for its first value, which means you are asking the implementation of Select for its value. To compute that value it will ask the Where for its first value. The Where will ask strings. (We'll skip a few steps now.) The Where will get a "0". It will call the predicate on "0", specifically calling int.TryParse. A side effect of this is that outValue will be set to 0. TryParse returns true, so the item is yielded. Select then maps that value (the string "0") into a new value using its selector. The selector ignores the value and yields the value of outValue at that point in time, which is 0. Our foreach loop now does whatever with 0.
Now we ask someEnumerable for its second value, on the next iteration of the loop. It asks Select for a value, Select asks Where, Where asks strings, strings yields "1", Where calls the predicate, setting outValue to 1 as a side effect, and Select yields the current value of outValue, which is 1. The foreach loop now does whatever with 1.
So the key point here is that, because Where and Select defer execution and do their work only when a value is actually requested, the side effect of the Where predicate happens immediately before each projection in the Select. If you didn't defer execution, and instead performed all of the TryParse calls before any of the projections in Select, then you would see the same number for every item: 99, the last value parsed. We can simulate this easily enough by materializing the results of the Where into a collection; the Select then yields 99 over and over (provided nothing reassigns outValue after the ToList call runs):
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.ToList()//eagerly evaluate the query up to this point
.Select(s => outValue);
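To make the two behaviors easy to compare, here is a small self-contained sketch that runs both pipelines to completion and collects the results. The names lazy and eager are just for illustration, and the outValue = 3 assignment from the question is left out so it does not interfere.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;

// Lazy: each item passes through Where and then Select before the next
// item is parsed, so Select sees the value TryParse just wrote.
List<int> lazy = strings.Where(s => int.TryParse(s, out outValue))
                        .Select(s => outValue)
                        .ToList();

// Eager: the inner ToList runs every TryParse first, so outValue is
// already 99 by the time Select starts projecting.
List<int> eager = strings.Where(s => int.TryParse(s, out outValue))
                         .ToList()
                         .Select(s => outValue)
                         .ToList();

Console.WriteLine(string.Join(",", lazy.Take(3)));     // 0,1,2
Console.WriteLine(string.Join(",", eager.Distinct())); // 99
```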
Having said all of that, the query that you have is not particularly good design. Whenever possible you should avoid queries that have side effects (such as your Where). The fact that the query both causes side effects, and observes the side effects that it creates, makes following all of this rather hard. The preferable design would be to rely on purely functional methods that aren't causing side effects. In this context the simplest way to do that is to create a method that tries to parse a string and returns an int?:
public static int? TryParse(string rawValue)
{
int output;
if (int.TryParse(rawValue, out output))
return output;
else
return null;
}
This allows us to write:
var someEnumerable = from s in strings
let n = TryParse(s)
where n != null
select n.Value;
Here there are no observable side effects in the query, nor is the query observing any external side effects. It makes the whole query far easier to reason about.
Because when you enumerate the value changes one at a time and changes the value of the variable on the fly. Due to the nature of LINQ the select for the first iteration is executed before the where for the second iteration. Basically this variable turns into a foreach loop variable of a kind.
This is what deferred execution buys us. Previous methods do not have to execute fully before the next method in the chain starts. One value moves through all the methods before the second goes in. This is very useful with methods like First or Take which stop the iteration early. Exceptions to the rule are methods that need to aggregate or sort like OrderBy (they need to look at all elements before finding out which is first). If you add an OrderBy before the Select the behavior will probably break.
Of course I wouldn't depend on this behavior in production code.
I don't see what seems odd to you.
If you write a loop over this enumerable like this:
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}
you see each value in turn, because LINQ enumerates the Where and Select lazily and yields one value at a time. If you add ToArray:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
then in the loop you will see 99s.
Edit
The code below will print 99s:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
//outValue = 3;
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}

LINQ - Using Select - understanding select

I find LINQ a little difficult to wrap my head around. I like the concept and believe it has lots of potential. But, after writing so much SQL the syntax is just not easy for me to swallow.
A. What is the deal with multiple ways to select?
I see that I am able to create a context and perform a Select() using a method.
context.Table.Select(lambda expression);
ok...Why would I use this? How does it compare to (or does it) this type of select?
var returnVal = from o in context.Table
orderby o.Column
select o;
B. Please explain the variable nature of
**from X** in context.Table
Why do we stick a seemingly arbitrarily named variable here? Shouldn't this be a known type of type <Table>?
So...
var returnVal = context.Table.Select(o => o);
and
var returnVal = from o in context.Table
select o;
are the same. In the second case, C# just has nice syntactic sugar to give you something closer to normal SQL syntax. Notice I removed the orderby from your second query. If you wanted that in there, then the first one would become:
var returnVal = context.Table.OrderBy(o => o.Column).Select(o => o);
As for your last question... we're not sticking an arbitrarily named variable here. We're giving a name to each row so that we can reference it later on in the statement. It is implicitly typed because the system knows what type Table contains.
In response to your comment, I wanted to add one more thought. You mentioned things getting nasty with the normal method calls. It really can. Here's a simple example where it's immediately much cleaner (at least, if you're used to SQL syntax) in the LINQ syntax:
var returnVal = context.Table.OrderBy(o => o.Column1)
.ThenBy(o => o.Column2)
.ThenBy(o => o.Column3)
.Select(o => o);
versus
var returnVal = from o in context.Table
orderby o.Column1, o.Column2, o.Column3
select o;
A: this is the same. The compiler transforms the query expression to method calls. Exactly the same.
B: The x is the same as in foreach(var X in context.Table). You define a name for an individual element of the table/sequence.
In B, X's type is implicit. You could just as easily do something like:
from Row x in context.Table
and it would be the same. In A, there isn't any difference between using a lambda and the equivalent full-LINQ syntax, except that you would never do .Select(x => x). It's for transforming items. Say you had a list of integers, .Select(x => x * x) would return the square of each of them.
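If you want to convince yourself that the two forms are equivalent, a quick in-memory check like the following should do. The rows array and its Column1/Column2 properties are invented for the example; against a real context the same comparison would apply to the results.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for context.Table.
var rows = new[]
{
    new { Column1 = 2, Column2 = "b" },
    new { Column1 = 1, Column2 = "z" },
    new { Column1 = 1, Column2 = "a" },
};

// Method syntax: explicit OrderBy / ThenBy / Select calls.
var viaMethods = rows.OrderBy(o => o.Column1)
                     .ThenBy(o => o.Column2)
                     .Select(o => o.Column2);

// Query syntax: the compiler rewrites this into the same method calls.
var viaQuery = from o in rows
               orderby o.Column1, o.Column2
               select o.Column2;

bool same = viaMethods.SequenceEqual(viaQuery);
Console.WriteLine($"{string.Join(",", viaQuery)} {same}"); // a,z,b True
```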

Linq Union: How to add a literal value to the query?

I need to add a literal value to a query. My attempt
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
In the above example, I get an error:
"Local sequence cannot be used in LINQ to SQL implementation
of query operators except the Contains() operator."
If I am using Entity Framework 4 for example, what could I add to the Union statement to always include the "seed" ID?
I am trying to produce SQL code like the following:
select distinct ID
from product
union
select 0 as ID
So later I can join the list to itself so I can find all values where the next highest value is not present (finding the lowest available ID in the set).
Edit: Original Linq Query to find lowest available ID
var skuQuery = Context.Products
.Where(p => p.sku > skuSeedStart &&
p.sku < skuSeedEnd)
.Select(p => p.sku).Distinct();
var lowestSkuAvailableList =
(from p1 in skuQuery
from p2 in skuQuery.Where(a => a == p1 + 1).DefaultIfEmpty()
where p2 == 0 // zero is default for long where it would be null
select p1).ToList();
var Answer = (lowestSkuAvailableList.Count == 0
? skuSeedStart :
lowestSkuAvailableList.Min()) + 1;
This code creates two SKU sets offset by one, then selects the SKU where the next highest doesn't exist. Afterward, it selects the minimum of that (lowest SKU where next highest is available).
For this to work, the seed must be in the set joined together.
Your problem is that your query is being turned entirely into a LINQ-to-SQL query, when what you need is a LINQ-to-SQL query with local manipulation on top of it.
The solution is to tell the compiler that you want to use LINQ-to-Objects after processing the query (in other words, change the extension method resolution to look at IEnumerable<T>, not IQueryable<T>). The easiest way to do this is to tack AsEnumerable() onto the end of your query, like so:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().AsEnumerable().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
Up front: not answering exactly the question you asked, but solving your problem in a different way.
How about this:
var a = Products.Select(p => p.sku).Distinct().ToList();
a.Add(0);
a.Dump(); // LinqPad's way of showing the values
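Once the SKUs are in memory with the seed added, the "lowest available ID" step from the question can also be done locally. This is only a sketch under that assumption: the skus list is a hypothetical stand-in for the distinct values returned by the database, and a HashSet replaces the self-join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

long skuSeedStart = 0;                        // hypothetical seed value
var skus = new List<long> { 1, 2, 3, 5, 6 };  // stand-in for the distinct SKUs from the database

// Add the seed locally, after the database part of the query has run.
var withSeed = new HashSet<long>(skus) { skuSeedStart };

// The lowest available SKU is one more than the smallest value
// whose successor is missing from the set.
long lowestAvailable = withSeed.Where(s => !withSeed.Contains(s + 1)).Min() + 1;

Console.WriteLine(lowestAvailable); // 4
```

With no SKUs at all, the set holds only the seed, so the result is the seed plus one.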
You could create a database table for storing constant values and pass a query over this table to the Union operator.
For example, let's imagine table "Defaults" with fields "Name" and "Value" with only one record ("SKU", 0).
Then you can rewrite your expression like this:
var zero = context.Defaults.Where(_=>_.Name == "SKU").Select(_=>_.Value);
var result = context.Products.Select(p => p.sku).Distinct().Union(zero).ToList();
