Given the following code:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue);
outValue = 3;
//enumerating over someEnumerable here shows ints from 0 to 99
I am able to see a "snapshot" of the out parameter for each iteration. Why does this work correctly instead of me seeing 100 3's (deferred execution) or 100 99's (access to modified closure)?
First you define a query, strings, that knows how to generate a sequence of strings when queried. Each time a value is asked for, it will generate a new number and convert it to a string.
Then you declare a variable, outValue, and assign 0 to it.
Then you define a new query, someEnumerable, that knows how to, when asked for a value, get the next value from the query strings, try to parse it and, if it can be parsed, yield the value of outValue. Once again, we have defined a query that can do this; we have not actually done any of it yet.
You then set outValue to 3.
Then you ask someEnumerable for its first value; that is, you are asking the implementation of Select for its first value. To compute that value it will ask the Where for its first value. The Where will ask strings. (We'll skip a few steps now.) The Where will get a "0". It will call the predicate on "0", specifically calling int.TryParse. A side effect of this is that outValue will be set to 0. TryParse returns true, so the item is yielded. Select then maps that value (the string "0") into a new value using its selector. The selector ignores the value and yields the value of outValue at that point in time, which is 0. Our foreach loop now does whatever with 0.
Now we ask someEnumerable for its second value, on the next iteration of the loop. It asks Select for a value, Select asks Where, Where asks strings, strings yields "1", Where calls the predicate, setting outValue to 1 as a side effect, and Select yields the current value of outValue, which is 1. The foreach loop now does whatever with 1.
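You can watch this interleaving directly by adding tracing side effects to each stage; a minimal sketch (the WriteLine calls are mine, purely for illustration):
var traced = Enumerable.Range(0, 3)
    .Where(i => { Console.WriteLine("Where sees " + i); return true; })
    .Select(i => { Console.WriteLine("Select sees " + i); return i; });
foreach (var item in traced) { }
// Prints: Where sees 0, Select sees 0, Where sees 1, Select sees 1, Where sees 2, Select sees 2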
So the key point here is that, because Where and Select defer execution, performing their work only at the moment values are requested, the side effect of the Where predicate ends up running immediately before each projection in the Select. If you didn't defer execution, and instead performed all of the TryParse calls before any of the projections in Select, then you would see 99 for each value. We can actually simulate this easily enough: materialize the results of the Where into a collection, and the results of the Select become 99 repeated over and over:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
    .ToList() // eagerly evaluate the query up to this point
    .Select(s => outValue);
Having said all of that, the query that you have is not a particularly good design. Whenever possible you should avoid queries that have side effects (such as your Where). The fact that the query both causes side effects and observes the side effects that it creates makes all of this rather hard to follow. The preferable design is to rely on purely functional methods that don't cause side effects. In this context the simplest way to do that is to create a method that tries to parse a string and returns an int?:
public static int? TryParse(string rawValue)
{
int output;
if (int.TryParse(rawValue, out output))
return output;
else
return null;
}
This allows us to write:
var someEnumerable = from s in strings
let n = TryParse(s)
where n != null
select n.Value;
Here there are no observable side effects in the query, nor is the query observing any external side effects. It makes the whole query far easier to reason about.
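As a quick sanity check, enumerating this rewritten query (built on the same strings sequence) produces 0 through 99 every time you iterate it, since nothing outside the query is being mutated:
foreach (var n in someEnumerable)
    Console.WriteLine(n); // prints 0 through 99 on every enumeration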
Because when you enumerate, the values are produced one at a time, and each one changes the value of the variable on the fly. Due to the nature of LINQ, the Select for the first iteration is executed before the Where for the second iteration. The variable essentially becomes a kind of foreach loop variable.
This is what deferred execution buys us. Earlier methods do not have to execute fully before the next method in the chain starts: one value moves through all the methods before the second one goes in. This is very useful with methods like First or Take, which stop the iteration early. Exceptions to the rule are methods that need to aggregate or sort, like OrderBy (they need to look at all elements before finding out which comes first). If you add an OrderBy before the Select, the behavior will probably break.
Of course I wouldn't depend on this behavior in production code.
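To make the early-stop behavior concrete, here is a hedged sketch built on the question's query (assuming strings and outValue as declared there); Take(3) stops pulling values, so int.TryParse runs only three times:
var firstThree = strings
    .Where(s => int.TryParse(s, out outValue))
    .Select(s => outValue)
    .Take(3)
    .ToList(); // { 0, 1, 2 }; TryParse was called exactly three times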
I don't understand what seems odd to you.
If you write a loop over this enumerable like this:
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}
You will see the values change as you go, because LINQ enumerates each Where and Select lazily, yielding one value at a time. But if you add ToArray:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
Then in the loop you will see 99s.
Edit
The code below will print 99s:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
.Select(s => outValue).ToArray();
//outValue = 3;
foreach (var i in someEnumerable)
{
Console.WriteLine(outValue);
}
I have two results from a LINQ query, with which I want to do some string operations and concatenate.
Result 1, which is the names of the checked checkboxes from group 1, obtained by:
var selectedCarPosts = grpBox1MCar.Controls.OfType<CheckBox>()
.Where(c => c.Checked).OrderBy(c => c.Name).Select(c => c.Name);
which yields Result 1:
NearMainGate
NearMP5WestGate
Result 2, which is the names of the checked checkboxes from group 2, obtained by:
var selectedDtTypeCars = gbDataTypeMCars.Controls.OfType<CheckBox>()
.Where(c => c.Checked).OrderBy(c => c.Name).Select(c => c.Name);
which yields Result 2:
WindDir
WindVel
From both results I would like to get a concatenated list as follows (Result 3):
C.NearMainGate
C.NearMP5WestGate
C.WindDirNearMainGate
C.WindDirNearMP5WestGate
C.WindVelNearMainGate
C.WindVelNearMP5WestGate
These form columns in a dynamic sql query later.
I have the following code to accomplish this step by step:
var s1 = selectedCarPosts.Select(s => "C." + s); //the first 2 items in Result3
//Now to get the rest, by inserting Result2 string in each of the first 2 items of Result3
IEnumerable<string> selCarPostsArrWithC = new string[]{};
IEnumerable<string> s2 = new string[]{};
foreach (var type in selectedDtTypeCars)
{
selCarPostsArrWithC = s1.Select(s => s.Insert(2, type));//C.WindDirNearMainGate C.WindDirNearMP5WestGate in FIRST iteration and so on
s2 = s2.Concat(selCarPostsArrWithC);// as soon as the SECOND iteration starts, the previous s2 list is overwritten with the subsequent result in selCarPostsArrWithC
}
The problem is that during debugging I noticed that, as soon as I press F10 just after the foreach line, before actually reaching the foreach block, the previous values in s2 are already overwritten with the subsequent result in selCarPostsArrWithC. Explained below.
For the first iteration s2 has this result:
[0] "C.WindDirNearMainGate"
[1] "C.WindDirNearMP5WestGate"
At the beginning of the second iteration, before entering the foreach block, s2 has somehow already been reset to new values with WindVel:
[0] "C.WindVelNearMainGate"
[1] "C.WindVelNearMP5WestGate"
Could anyone please explain what I am doing wrong? How can I accomplish Result 3 for the IEnumerable list?
Enumerable.Select doesn't do anything important when you call it, it merely sets things up so that the requested work will be performed later as desired.
So when you write
selCarPostsArrWithC = s1.Select(s => s.Insert(2, type));
this doesn't call string.Insert yet. string.Insert is only called when you (or your debugger) later start iterating over selCarPostsArrWithC.
Normally, that doesn't matter, except for performance if you iterate over the enumerable multiple times. However, here, because string.Insert is called later than you expect, the arguments that you pass to it are also evaluated later than you expect. You only have a single type variable, and that variable already holds the next value by the time it gets read.
In general, you can either solve this by creating a new variable per iteration, that captures the value of type as seen during that iteration:
foreach (var type in selectedDtTypeCars)
{
var type_ = type;
selCarPostsArrWithC = s1.Select(s => s.Insert(2, type_));
s2 = s2.Concat(selCarPostsArrWithC);
}
(Nowadays, C# already does this behind the scenes for foreach, but you may need to write it out like this if using an older compiler.)
Or, alternatively, perform all of the evaluations directly inside the loop body:
foreach (var type in selectedDtTypeCars)
{
selCarPostsArrWithC = s1.Select(s => s.Insert(2, type)).ToList();
s2 = s2.Concat(selCarPostsArrWithC);
}
Although in that case, it would be better to just make s2 a List<string>, and call its AddRange method.
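A sketch of that last variant, reusing the names from the question; AddRange enumerates the Select immediately, which also eliminates the capture problem:
var s2 = new List<string>();
foreach (var type in selectedDtTypeCars)
{
    s2.AddRange(s1.Select(s => s.Insert(2, type)));
}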
Basically you have a list of prefixes and a list of suffixes. (Note: I skimmed your code, and I couldn't find where you got the C. that was prepended to each of the values.)
For every prefix, you need every suffix.
This is a job for SelectMany().
I simplified your code, since you gave quite a bit of it. Here's what I came up with:
var location = new[]
{
"NearMainGate",
"NearMP5WestGate"
};
var modifier = new[]
{
"WindDir",
"WindVel"
};
var lambdaResult = modifier.SelectMany(s => location.Select(l => string.Format("{0}{1}{2}", "C.", s, l)));
var queryResult =
from m in modifier
from l in location
select string.Format("{0}{1}{2}", "C.", m, l);
Note that there are two solutions: LINQ query syntax and LINQ lambda syntax.
In this situation, I think the query syntax is cleaner and easier to read, but to each their own.
And here's a fiddle with proof of functionality: https://dotnetfiddle.net/Z61RsI
Assuming I have the following string array:
string[] str = new string[] {"max", "min", "avg", "max", "avg", "min"}
Is it possible to use LINQ to get a list of the indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].
.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new {i, s})
.Where(t => t.s == "avg")
.Select(t => t.i)
.ToList()
The result will be a list containing 2 and 4.
You can do it like this:
str.Select((v,i) => new {Index = i, Value = v}) // Pair up values and indexes
.Where(p => p.Value == "avg") // Do the filtering
.Select(p => p.Index); // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.
You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
.Where(x => x.s == "avg")
.Select(x => x.index)
.ToList();
If you just want to find the first or last index, you also have the built-in methods Array.IndexOf and Array.LastIndexOf:
int firstIndex = Array.IndexOf(str, "avg");
int lastIndex = Array.LastIndexOf(str, "avg");
(or you can use the overloads that take a start index to specify the start position)
First off, your code doesn't actually iterate over the list twice; it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Length)
.Where(i => str[i] == "avg")
.ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item, I'll first ask my input sequence for an item, then use the method I was given to convert it into something else, and then give that item to whoever is using me." Where, more or less, says, "Whenever someone asks me for an item, I'll ask my input sequence for an item; if the function says it's good I'll pass it on; if not, I'll keep asking for items until I get one that passes."
So when you chain them, what happens is: ToList asks Where for its first item, Where asks Select for its first item, and Select asks the list for its first item. The list provides its first item. Select then transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item, runs its function, determines that it's true, and spits out 0 to ToList, which adds it to the list. That whole thing then repeats for each remaining item. This means that Select will end up asking for each item from the list exactly once, feeding each of its results directly to Where, which feeds the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this all seems complicated at first, it's actually pretty easy for the computer to do. It's not as performance-intensive as it may seem.
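If it helps to see it in code, here is a rough sketch of how a Where-like operator streams items one at a time (illustrative only, not the actual framework source):
static IEnumerable<T> WhereLike<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)  // pull one item from the upstream sequence
    {
        if (predicate(item))
            yield return item;    // hand it downstream immediately, then pause here
    }
}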
While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
if (source == null)
throw new ArgumentNullException("source");
int i = 0;
foreach (T item in source)
{
if (object.Equals(itemToFind, item))
{
yield return i;
}
i++;
}
}
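Usage with the array from the question (assuming the method is accessible, e.g. declared in the same class):
var result = Indexes(str, "avg").ToList(); // contains 2 and 4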
You need a combined select-and-where operator; compared to the accepted answer this will be cheaper, since it won't require intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (var s in source)
{
checked{ ++index; }
if (filter(s))
yield return selector(s, index);
}
}
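A usage sketch against the question's array, assuming the extension method above is in scope:
var result = str.SelectWhere(s => s == "avg", (s, i) => i).ToList(); // 2, 4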
I return a List from a LINQ query, and afterwards I have to fill in its values with a for loop.
The problem is that this is too slow.
var formentries = (from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
}).ToList();
and then I use the for loop in this way:
for(var x=0; x<formentries.Count(); x++)
{
var values = (from e in entryvalues
where e.FormEntryID.Equals(formentries.ElementAt(x).FormEntryID)
select e).ToList<FormEntryValue>();
formentries.ElementAt(x).Values = values;
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
But it is definitely too slow.
Is there a way to make it faster?
it is definitely too slow. Is there a way to make it faster?
Maybe. Maybe not. But that's not the right question to ask. The right question is:
Why is it so slow?
It is a lot easier to figure out the answer to the first question if you have an answer to the second question! If the answer to the second question is "because the database is in Tokyo and I'm in Rome, and the fact that the packets move no faster than speed of light is the cause of my unacceptable slowdown", then the way you make it faster is you move to Japan; no amount of fixing the query is going to change the speed of light.
To figure out why it is so slow, get a profiler. Run the code through the profiler and use that to identify where you are spending most of your time. Then see if you can speed up that part.
From what I can see, you are iterating through formentries two more times for no reason: once when you populate the values, and once when you convert to a dictionary.
If entryvalues is database-driven, i.e. you get them from the database, then move the value-field population into the first query.
If it's not, then you do not need to invoke ToList() on the first query; just do the loop and then the dictionary creation.
var formentries = from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
};
var formEntryDictionary = new Dictionary<int, FormEntry>();
foreach (var formEntry in formentries)
{
    formEntry.Values = GetValuesForFormEntry(formEntry, entryvalues);
    formEntryDictionary.Add(formEntry.FormEntryID, formEntry);
}
return formEntryDictionary;
And the values preparation:
private IList<FormEntryValue> GetValuesForFormEntry(FormEntry formEntry, IEnumerable<FormEntryValue> entryValues)
{
return (from e in entryValues
where e.FormEntryID.Equals(formEntry.FormEntryID)
select e).ToList<FormEntryValue>();
}
You can change the private method to accept only the entry ID instead of the whole formEntry if you wish.
It's slow because it's O(N*M), where N is formentries.Count and M is entryvalues.Count. Even in a simple test with only 1,000 elements, where my type had only an int id field, the original was more than 20 times slower; with 10,000 elements in the list it was over 1,600 times slower than the code below!
Assuming your entryvalues is a local list and not hitting a database (just .ToList() it into a new variable somewhere if that's the case), and assuming FormEntryID is unique within formentries (which it seems to be, given the .ToDictionary call), try this instead:
var entryValuesLookup = entryvalues.ToLookup(entry => entry.FormEntryID);
for (var x = 0; x < formentries.Count; x++)
{
    formentries[x].Values = entryValuesLookup[formentries[x].FormEntryID].ToList();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
It should go a long way to making it at least scale better.
Changes: .Count instead of .Count(), just because it's better not to call an extension method when you don't need to, and a lookup to find the values rather than running a Where over all of entryvalues for every x in the for loop, which effectively removes the M from the big O.
If this isn't entirely correct, I'm sure you can change whatever is missing to suit your case. As an aside, you should really consider using consistent casing for your variable names: formEntries is that little bit easier to read than formentries.
There are some reasons why this might be slow related to the way you use formentries.
The formentries List<T> above has a Count property, but you are calling the enumerable Count() extension method instead. That extension may or may not detect that it is operating on a collection type with a Count property it can defer to, instead of walking the enumeration to compute the count.
Similarly, the formentries.ElementAt(x) expression is used twice; if ElementAt has not been optimized to detect that it is working with a collection like a list that can jump to an item by index, then LINQ will have to redundantly walk the list to get to the xth item.
The above evaluation may miss the real problem, which you'll only really know if you profile. However, you can avoid the above while making your code significantly easier to read if you switch how you iterate the collection of formentries as follows:
foreach(var fe in formentries)
{
fe.Values = entryvalues
.Where(e => e.FormEntryID.Equals(fe.FormEntryID))
.ToList<FormEntryValue>();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
You may have resorted to the for(var x=...) ... ElementAt(x) approach because you thought you could not modify properties on the object referenced by the foreach loop variable fe.
That said, another point that could be an issue is if formentries has multiple items with the same FormEntryID. This would result in the same work being done multiple times inside the loop. While the top query appears to be against a database, you can still do joins with data in linq-to-object land. Happy optimizing/profiling/coding - let us know what works for you.
I have written the following C# code:
_locationsByRegion = new Dictionary<string, IEnumerable<string>>();
foreach (string regionId in regionIds)
{
IEnumerable<string> locationIds = Locations
.Where(location => location.regionId.ToUpper() == regionId.ToUpper())
.Select(location => location.LocationId); //If I cast to an array here, it works.
_locationsByRegion.Add(regionId, locationIds);
}
This code is meant to create a dictionary with my "region ids" as keys and lists of "location ids" as values.
However, what actually happens is that I get a dictionary with the "region ids" as keys, but the value for each key is identical: it is the list of locations for the last region id in regionIds!
It looks like this is a product of how lambda expressions are evaluated. I can get the correct result by casting the list of location ids to an array, but this feels like a kludge.
What is a good practice for handling this situation?
You're using LINQ. You need to perform an eager operation to make it actually execute the .Select. ToList() is a good operator for that, and since List<T> implements IEnumerable<T>, the result can be assigned to your IEnumerable<string> directly.
LINQ uses lazy evaluation by default; ToList and other eager operations force the select to occur. Until you apply one of these operators, the action is not performed. It is a bit like executing SQL in ADO.NET: the statement "Select * from users" doesn't actually run the query until you do something more with it. The ToList makes the select execute.
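Applied to the loop in the question, that means materializing inside each iteration. A hedged sketch, reusing the question's names:
IEnumerable<string> locationIds = Locations
    .Where(location => location.regionId.ToUpper() == regionId.ToUpper())
    .Select(location => location.LocationId)
    .ToList(); // evaluates now, while this iteration's regionId is current
_locationsByRegion.Add(regionId, locationIds);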
You're closing over the variable, not the value.
Make a local copy of the variable so you capture the current value from the foreach loop instead:
_locationsByRegion = new Dictionary<string, IEnumerable<string>>();
foreach (string regionId in regionIds)
{
var regionToUpper = regionId.ToUpper();
IEnumerable<string> locationIds = Locations
.Where(location => location.regionId.ToUpper() == regionToUpper)
.Select(location => location.LocationId); //If I cast to an array here, it works.
_locationsByRegion.Add(regionId, locationIds);
}
Then read this:
http://msdn.microsoft.com/en-us/vcsharp/hh264182
edit - Forcing an eager evaluation would also work, as others have suggested, but eager evaluation can end up being slower when you never enumerate all of the results.
Call ToList() or ToArray() after the Select(...). That way the entire collection is evaluated right there.
Actually the question is about building a lookup, which can be achieved more simply with a standard LINQ group join:
var query = from regionId in regionIds
join location in Locations
on regionId.ToLower() equals location.regionId.ToLower() into g
select new { RegionID = regionId,
Locations = g.Select(location => location.LocationId) };
In this case all the locations will be downloaded at once and grouped in memory. The query will not be executed until you try to access the results, or until you convert it to a dictionary:
var locationsByRegion = query.ToDictionary(x => x.RegionID, x => x.Locations);
What is the best way to get the Max value from a LINQ query that may return no rows? If I just do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
I get an error when the query returns no rows. I could do
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter _
Order By MyCounter Descending).FirstOrDefault
but that feels a little obtuse for such a simple request. Am I missing a better way to do it?
UPDATE: Here's the back story: I'm trying to retrieve the next eligibility counter from a child table (legacy system, don't get me started...). The first eligibility row for each patient is always 1, the second is 2, etc. (obviously this is not the primary key of the child table). So, I'm selecting the max existing counter value for a patient, and then adding 1 to it to create a new row. When there are no existing child values, I need the query to return 0 (so adding 1 will give me a counter value of 1). Note that I don't want to rely on the raw count of child rows, in case the legacy app introduces gaps in the counter values (possible). My bad for trying to make the question too generic.
Since DefaultIfEmpty isn't implemented in LINQ to SQL, I did a search on the error it returned and found a fascinating article that deals with handling empty sets in aggregate functions. To summarize what I found: you can get around this limitation by casting to a nullable type within your select. My VB is a little rusty, but I think it'd go something like this:
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select CType(y.MyCounter, Integer?)).Max
Or in C#:
var x = (from y in context.MyTable
where y.MyField == value
select (int?)y.MyCounter).Max();
I just had a similar problem, but I was using LINQ extension methods on a list rather than query syntax. The casting to a Nullable trick works there as well:
int max = list.Max(i => (int?)i.MyCounter) ?? 0;
Sounds like a case for DefaultIfEmpty (untested code follows):
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).DefaultIfEmpty.Max
Think about what you're asking!
The max of {1, 2, 3, -1, -2, -3} is obviously 3. The max of {2} is obviously 2. But what is the max of the empty set { }? Obviously that is a meaningless question. The max of the empty set is simply not defined. Attempting to get an answer is a mathematical error. The max of any set must itself be an element in that set. The empty set has no elements, so claiming that some particular number is the max of that set without being in that set is a mathematical contradiction.
Just as it is correct behavior for the computer to throw an exception when the programmer asks it to divide by zero, so it is correct behavior for the computer to throw an exception when the programmer asks it to take the max of the empty set. Division by zero, taking the max of the empty set, wiggering the spacklerorke, and riding the flying unicorn to Neverland are all meaningless, impossible, undefined.
Now, what is it that you actually want to do?
You could always add Double.MinValue to the sequence. This ensures that there is at least one element, and Max will only return Double.MinValue if the original sequence was empty. To determine which option is more efficient (Concat, FirstOrDefault or Take(1)), you should perform adequate benchmarking.
double x = context.MyTable
.Where(y => y.MyField == value)
.Select(y => y.MyCounter)
.Concat(new double[]{Double.MinValue})
.Max();
int max = list.Any() ? list.Max(i => i.MyCounter) : 0;
If the list has any elements (i.e. it is not empty), it will take the max of the MyCounter field; otherwise it will return 0.
Since .NET 3.5 you can use DefaultIfEmpty(), passing the default value as an argument. Something like one of the following:
int max = (from e in context.Table where e.Year == year select e.RecordNumber).DefaultIfEmpty(0).Max();
DateTime maxDate = (from e in context.Table where e.Year == year select e.StartDate ?? DateTime.MinValue).DefaultIfEmpty(DateTime.MinValue).Max();
The first one is allowed when you query a NOT NULL column, and the second one is the way I used it to query a NULLABLE column. If you use DefaultIfEmpty() without arguments, the default value will be the one defined for your output type, as you can see in the Default Values Table.
The resulting SELECT will not be so elegant but it's acceptable.
Hope it helps.
I think the issue is what you want to happen when the query has no results. If this is an exceptional case, then I would wrap the query in a try/catch block and handle the exception that the standard query generates. If it's OK for the query to return no results, then you need to figure out what you want the result to be in that case. It may be that @David's answer (or something similar) will work: if the MAX will always be positive, it may be enough to insert a known "bad" value into the list that will only be selected if there are no results. Generally, I would expect a query that retrieves a maximum to have some data to work on, so I would go the try/catch route; otherwise you are always forced to check whether the value you obtained is correct or not. I'd rather the non-exceptional case was just able to use the obtained value.
Try
Dim x = (From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter).Max
... continue working with x ...
Catch ex As SqlException
... do error processing ...
End Try
Another possibility would be grouping, similar to how you might approach it in raw SQL:
from y in context.MyTable
group y.MyCounter by y.MyField into GrpByMyField
where GrpByMyField.Key == value
select GrpByMyField.Max()
The only thing is that (testing again in LINQPad) switching to the VB LINQ flavor gives syntax errors on the grouping clause. I'm sure the conceptual equivalent is easy enough to find; I just don't know how to express it in VB.
The generated SQL would be something along the lines of:
SELECT [t1].[MaxValue]
FROM (
SELECT MAX([t0].[MyCounter]) AS [MaxValue], [t0].[MyField]
FROM [MyTable] AS [t0]
GROUP BY [t0].[MyField]
) AS [t1]
WHERE [t1].[MyField] = @p0
The nested SELECT looks icky, as if the query execution would retrieve all rows and then select the matching one from the retrieved set... the question is whether SQL Server optimizes the query into something comparable to applying the WHERE clause to the inner SELECT. I'm looking into that now...
I'm not well-versed in interpreting execution plans in SQL Server, but it looks like when the WHERE clause is on the outer SELECT, the number of actual rows resulting from that step is all rows in the table, versus only the matching rows when the WHERE clause is on the inner SELECT. That said, it looks like only 1% of the cost is shifted to the following step when all rows are considered, and either way only one row ever comes back from SQL Server, so maybe it's not that big of a difference in the grand scheme of things.
A little late, but I had the same concern...
Rephrasing your code from the original post, you want the max of the set S defined by
(From y In context.MyTable _
Where y.MyField = value _
Select y.MyCounter)
Taking into account your last comment:
Suffice to say that I know I want 0 when there are no records to select from, which definitely has an impact on the eventual solution
I can rephrase your problem as: you want the max of {0} ∪ S.
And it looks like the proposed solution with Concat is semantically the right one :-)
var max = new[] { 0 }
    .Concat(from y in context.MyTable
            where y.MyField == value
            select y.MyCounter)
    .Max();
Why not something more direct, like:
Dim x = context.MyTable.Where(Function(item) item.MyField = value).Max(Function(item) item.MyCounter)
One interesting difference that seems worth noting: while FirstOrDefault and Take(1) generate the same SQL (according to LINQPad, anyway), FirstOrDefault returns a value (the default) when there are no matching rows, whereas Take(1) returns no results... at least in LINQPad.
Just to let everyone know who is using LINQ to Entities: the methods above will not work...
If you try to do something like
var max = new[] { 0 }
    .Concat(from y in context.MyTable
            where y.MyField == value
            select y.MyCounter)
    .Max();
It will throw an exception:
System.NotSupportedException: The LINQ expression node type 'NewArrayInit' is not supported in LINQ to Entities.
I would suggest just doing
(from y in context.MyTable
 where y.MyField == value
 select y.MyCounter)
.OrderByDescending(x => x).FirstOrDefault();
And the FirstOrDefault will return 0 if your list is empty.
decimal Max = (decimal?)(context.MyTable.Select(e => e.MyCounter).Max()) ?? 0;
For Entity Framework and Linq to SQL we can achieve this by defining an extension method which modifies an Expression passed to IQueryable<T>.Max(...) method:
static class Extensions
{
public static TResult MaxOrDefault<T, TResult>(this IQueryable<T> source,
Expression<Func<T, TResult>> selector)
where TResult : struct
{
UnaryExpression castedBody = Expression.Convert(selector.Body, typeof(TResult?));
Expression<Func<T, TResult?>> lambda = Expression.Lambda<Func<T,TResult?>>(castedBody, selector.Parameters);
return source.Max(lambda) ?? default(TResult);
}
}
Usage:
int maxId = dbContextInstance.Employees.MaxOrDefault(employee => employee.Id);
// maxId is equal to 0 if there are no records in the Employees table
The generated query is identical; it works just like a normal call to the IQueryable<T>.Max(...) method, but if there are no records it returns the default value of TResult instead of throwing an exception.
I've knocked up a MaxOrDefault extension method. There's not much to it but its presence in Intellisense is a useful reminder that Max on an empty sequence will cause an exception. Additionally, the method allows the default to be specified if required.
public static TResult MaxOrDefault<TSource, TResult>(this
IQueryable<TSource> source, Expression<Func<TSource, TResult?>> selector,
TResult defaultValue = default(TResult)) where TResult : struct
{
return source.Max(selector) ?? defaultValue;
}
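A usage sketch (names borrowed from the question's table; the nullable cast in the selector drives type inference, so TResult becomes int):
int max = context.MyTable
    .Where(y => y.MyField == value)
    .MaxOrDefault(y => (int?)y.MyCounter); // 0 when no rows match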
I just had a similar problem: my unit tests passed using Max(), but the code failed when run against a live database.
My solution was to separate the query from the logic being performed, rather than joining them in one query.
I needed a solution that works in unit tests using LINQ to Objects (where Max() works with nulls) and against LINQ to SQL when executing in a live environment.
(I mock the Select() in my tests.)
var requiredDataQuery = _dataRepo.Select(x => new { x.NullableDate1, x.NullableDate2 });
var dates = requiredDataQuery.ToList(); // materialize once, then aggregate in memory
var maxDate1 = dates.Max(x => x.NullableDate1);
var maxDate2 = dates.Max(x => x.NullableDate2);
Less efficient? Probably.
Do I care, as long as my app doesn't fall over next time? Nope.