Linq Sum() precision

In my project I have been using Linq's Sum() a lot. It's powered by NHibernate on MySQL. In my Session Factory I have explicitly asked NHibernate to deal with exactly 8 decimal places when it comes to decimals:
public class DecimalsConvention : IPropertyConvention
{
    public void Apply(IPropertyInstance instance)
    {
        if (instance.Type.GetUnderlyingSystemType() == typeof(decimal))
        {
            instance.Scale(8);
            instance.Precision(20);
        }
    }
}
However, I found out that .Sum() rounds the result to 5 decimal places:
var opasSum = opasForThisIp.Sum(x => x.Amount); // Amount is a decimal
In the above statement opasSum equals 2.46914, while it should be 2.46913578 (calculated directly on MySQL). opasForThisIp is of type IQueryable<OutgoingPaymentAssembly>.
I need all the Linq calculations to handle 8 decimal places when it comes to decimals.
Any ideas of how to fix this?
Edit 1: I have found that var opasSum = Enumerable.Sum(opasForThisIp, opa => opa.Amount); produces the correct result. However, the question remains: why does .Sum() round the result, and how can we fix it?
Edit 2: The produced SQL seems to be problematic:
select cast(sum(outgoingpa0_.Amount) as DECIMAL(19,5)) as col_0_0_
from `OutgoingPaymentAssembly` outgoingpa0_
where outgoingpa0_.IncomingPayment_id=?p0
and (outgoingpa0_.OutgoingPaymentTransaction_id is not null);
?p0 = 24 [Type: UInt64 (0)]
Edit 3: var opasSum = opasForThisIp.ToList().Sum(x => x.Amount); also produces the correct result.
Edit 4: Converting the IQueryable<OutgoingPaymentAssembly> to an IList<OutgoingPaymentAssembly> made the original query, var opasSum = opasForThisIp.Sum(x => x.Amount);, work.

Because your collection is an IQueryable, Sum() is not executed in .NET: the query provider translates it to SQL, and in that translation x.Amount is cast to a lower-precision type (the DECIMAL(19,5) in the SQL you posted).
There are several workarounds. The easiest is to change the type of your collection to IList, or to call ToList() on it, which forces the query to run as LINQ to Objects.
var opasSum = opasForThisIp.ToList().Sum(x => x.Amount);
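To see why the in-memory route keeps all 8 decimal places, here is a minimal sketch that needs no database. The two amounts are made-up values chosen to reproduce the figures from the question, and Math.Round only approximates what the SQL cast to DECIMAL(19,5) does.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical amounts standing in for the Amount column.
decimal[] amounts = { 1.23456789m, 1.23456789m };

// Enumerable.Sum adds System.Decimal values in process; no SQL cast is involved.
decimal inMemorySum = amounts.Sum(x => x);

// Rough stand-in for CAST(SUM(...) AS DECIMAL(19,5)) in the generated SQL.
decimal roundedLikeTheCast = Math.Round(inMemorySum, 5, MidpointRounding.AwayFromZero);

Console.WriteLine(inMemorySum.ToString(CultureInfo.InvariantCulture));        // 2.46913578
Console.WriteLine(roundedLikeTheCast.ToString(CultureInfo.InvariantCulture)); // 2.46914
```

Keep in mind that ToList() loads every matching row into memory before summing, so this trades database work for data transfer.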
Note:
If you don't want to lose deferred execution by moving away from the IQueryable, you could try casting the Amount to a decimal inside of the linq query.
From MSDN decimal and numeric (Transact-SQL):
In Transact-SQL statements, a constant with a decimal point is
automatically converted into a numeric data value, using the minimum
precision and scale necessary. For example, the constant 12.345 is
converted into a numeric value with a precision of 5 and a scale of 3.
Edit (to include a great explanation of the different .NET collection types):
Taken from the answer to this SO question.
IQueryable is intended to allow a query provider (for example, an
ORM like LINQ to SQL or the Entity Framework) to use the expressions
contained in a query to translate the request into another format. In
other words, LINQ-to-SQL looks at the properties on the entities that
you're using along with the comparisons you're making and actually
creates a SQL statement to express (hopefully) an equivalent request.
IEnumerable is more generic than IQueryable (though all
instances of IQueryable implement IEnumerable) and only defines
a sequence. However, there are extension methods available within the
Enumerable class that define some query-type operators on that
interface and use ordinary code to evaluate these conditions.
List is just an output format, and while it implements
IEnumerable, is not directly related to querying.
In other words, when you're using IQueryable, you're defining an
expression that gets translated into something else. Even though
you're writing code, that code never gets executed, it only gets
inspected and turned into something else, like an actual SQL query.
Because of this, only certain things are valid within these
expressions. For instance, you cannot call an ordinary function that
you define from within these expressions, since LINQ-to-SQL doesn't
know how to turn your call into a SQL statement. Most of these
restrictions are only evaluated at runtime, unfortunately.
When you use IEnumerable for querying, you're using
LINQ-to-Objects, which means you are writing the actual code that is
used for evaluating your query or transforming the results, so there
are, in general, no restrictions on what you can do. You can call
other functions from within these expressions freely.
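The difference between code that is executed and code that is only inspected can be sketched with a single lambda assigned to two different types. The variable names are illustrative only.

```csharp
using System;
using System.Linq.Expressions;

// A delegate is compiled code: all you can do with it is invoke it.
Func<int, bool> asDelegate = x => x > 5;

// An expression tree is data describing the same lambda: a provider can walk it.
Expression<Func<int, bool>> asExpression = x => x > 5;

var body = (BinaryExpression)asExpression.Body;
Console.WriteLine(body.NodeType);  // GreaterThan
Console.WriteLine(body.Left);      // x
Console.WriteLine(body.Right);     // 5

// The delegate simply runs.
Console.WriteLine(asDelegate(7));  // True
```

A query provider does the same kind of inspection on a larger scale and emits SQL for the node types it recognizes.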

Related

C# LINQ to Entities does not recognize the method 'System.String method, and this method cannot be translated into a store expression

It is just an example of my situation
var query =
    Employee.Join(department,
        emp => emp.depId,
        dep => dep.Id,
        (emp, dep) =>
            new EmployeeModel { Name = emp.Name, Total = GetTotal(emp) });
public string GetTotal(Employee emp)
{
    // dynamically decide which column to pass to the stored procedure, that is
    // why I am passing the whole object
    string total = sp(emp.column1, emp.column2); // here I pass the parameters to the SP
    return total;
}
And I get an exception here, I do not know how to solve it.
Any help would be greatly appreciated.
Thanks

Of course this is going to happen. LINQ to Entities doesn't know what GetTotal() is. And given that it would call a stored procedure for each row returned, I suggest you move the join itself into a stored procedure and forget LINQ for a while here.

You have to understand the difference between IEnumerable and IQueryable.
IEnumerable
An object that implements IEnumerable represents an enumerable sequence. It holds everything needed to ask for the first element of the sequence and, as long as you've got an element, to ask for the next one.
At its lowest level this is done by calling GetEnumerator and repeatedly calling MoveNext / Current, until there are no more elements.
Higher-level constructs such as foreach, and the IEnumerable LINQ methods that don't return IEnumerable<TResult> (ToList / ToDictionary / Count / FirstOrDefault / Any / etc.), all use GetEnumerator / MoveNext / Current deep inside.
IQueryable
An object that implements IQueryable, doesn't represent the enumerable sequence itself, it represents the potential to get an Enumerable sequence.
For this, an IQueryable has an Expression and a Provider.
The Expression represents what data must be fetched in some generic format. The Provider knows who will provide the data (quite often a database management system) and what language is used to communicate with this DBMS (usually SQL).
As long as you concatenate LINQ methods that return IQueryable<...>, you are only changing the Expression. The DBMS is not contacted.
When you start enumerating the sequence, either directly by calling GetEnumerator / MoveNext, or indirectly by using foreach, or calling a LINQ method that returns anything but IQueryable<...>, then the Expression is sent to the Provider who will translate the Expression into SQL and fetch the data. The fetched data is represented as an enumerable sequence, which your code can access using MoveNext / Current repeatedly.
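The deferred behaviour described above can be observed without a database. In this sketch an in-memory iterator (a hypothetical stand-in for the DBMS) counts how often it is actually read; AsQueryable() wraps it in the built-in in-memory provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;

// Stand-in for a data source; the body only runs when someone enumerates it.
IEnumerable<int> Source()
{
    enumerations++;
    yield return 1;
    yield return 2;
    yield return 3;
}

// Chaining Where/Select only extends query.Expression; nothing is read yet.
IQueryable<int> query = Source().AsQueryable()
    .Where(n => n > 1)
    .Select(n => n * 10);
int beforeEnumeration = enumerations;   // still 0

// ToList does not return an IQueryable, so it triggers execution.
List<int> results = query.ToList();     // 20, 30

Console.WriteLine($"{beforeEnumeration} -> {enumerations}");
```

With a database-backed provider, the point at which the counter changes is the point at which the SQL would be sent.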
The problem is, that your provider doesn't know how to translate the call to GetTotal(...) into SQL. In fact, although the guys who wrote Entity Framework did a marvellous job, there are several LINQ methods that your Provider can't handle. See Supported and Unsupported LINQ Methods (LINQ to Entities).
So you can't call your own methods when using IQueryable. Luckily there are several solutions for this.
AsEnumerable
Your DBMS is extremely optimized to select the data in the query. One of the slower parts of a database query is the transfer of the selected data from the DBMS to your local process. Hence it is wise to limit the amount of data being transferred.
If your DBMS doesn't need the output of your method, consider using AsEnumerable(). This will transfer your selected data to your local process in a smart way.
Depending on the Provider, it can fetch the data "per page", instead of fetching all data, so if you end your LINQ with Take(3).ToList(), it will not have fetched all 1000 Customers from your database.
The result of the fetched data is represented as an IEnumerable<...> to you. The LINQ statements after that are executed on your local machine, so you can call your local methods.
Alas, as you use GetTotal in your Join, you can't use this method. If you used AsEnumerable before the Join, you would transfer all Employees and all Departments to your local process, which would then have to do the join.
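For cases where AsEnumerable() does fit, the split between the translated part and the local part of a query might look like the following sketch. The data is an in-memory stand-in for a table, and IsInteresting is a hypothetical local method that no provider could translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in; with a real provider the IQueryable part becomes SQL.
IQueryable<int> customerIds = Enumerable.Range(1, 1000).AsQueryable();

// Hypothetical local method with no SQL equivalent.
bool IsInteresting(int id) => id % 7 == 0;

List<int> firstThree = customerIds
    .Where(id => id > 100)   // still IQueryable: becomes part of the expression
    .AsEnumerable()          // from here on: LINQ to Objects
    .Where(IsInteresting)    // ordinary delegate, executed locally
    .Take(3)
    .ToList();               // 105, 112, 119

Console.WriteLine(string.Join(", ", firstThree));
```

Putting the translatable filter before AsEnumerable() keeps the amount of transferred data as small as possible.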
Convert GetTotal
You use GetTotal in the resultSelector parameter of your Join. This parameter has the following generic format:
System.Linq.Expressions.Expression<Func<TOuter,TInner,TResult>> resultSelector
Or in your case:
Expression<Func<Employee, Department, EmployeeModel>>
So you'll have to change your GetTotal such that it creates this expression.
Expression<Func<Employee, Department, EmployeeModel>> EmployeeModelExpression =
    (employee, department) => new EmployeeModel
    {
        Name = employee.Name,
        Total = ???
    };
Alas you forgot to tell us how you calculate the Total in SP.
You should only use fairly simple calculations, like:
Total = employee.Column1 + employee.Column2,
Total = (employee.Column1 < 50) ? employee.Column1 : employee.Column2,
Total = employee.Column1 ?? employee.Column2

How do I calculate a checksum on all columns in a row using LINQ and Entity Framework?

The query I am trying to execute is similar to this:
var checksum = from i in db.Items
where i.Id == id
select SqlFunctions.Checksum("*");
However, this returns the checksum value of the string "*" rather than evaluating the wildcard. Is there a way to calculate the checksum of all the columns instead?
Update:
var checksum = db.Database.SqlQuery<int?>("SELECT CHECKSUM(*) FROM [Catalog].[Item] WHERE Id = @p0", id);
This gives me the result I want but seems dirty. Is there a way to do this without inline SQL?

This can be done with the SqlFunctions class. This class allows LINQ to Entities code to include methods that are easily converted to SQL.
First of all, regarding your current edit: using inline SQL is not 'dirty' and is totally fine in most (if not all) cases. ORMs don't provide everything, especially where no good object-column mapping exists. However, since you're using Entity Framework you might as well get acquainted with the SqlFunctions static methods.
In this case there are a lot of overloads for performing a checksum, but the arguments must all be of the same type. Since you didn't post what types your columns are or how many you have, I don't want to recommend the wrong overload in an example for you to use.
Here are your options:
SqlFunctions.Checksum():
bool?
byte[]
DateTime?
DateTimeOffset?
Decimal?
double?
Guid?
TimeSpan?
String
All of the above have overloads to allow up to 3 parameters (of the same type).
SqlFunctions.AggregateChecksum():
IEnumerable<int>
IEnumerable<int?>
If you take a look at the documentation for these functions you'll see that the parameters that you're passing are VALUES, not column names. So you should be using them inside of a Select() clause. This is why when you passed "*" to the operation it checksummed the string containing a single asterisk instead of all columns. Also, keep in mind that these functions cannot be called directly, and must only be used within a Linq-To-Entities query.
Let's assume your columns named "ItemName" & "Description" are both strings, and you also want your id, which is an int:
var checksum = db.Items.Where(i => i.Id == id)
.Select(i => SqlFunctions.Checksum(i.Id.ToString(), i.ItemName, i.Description));
Unfortunately, as you see in the above example we had to cast our int to a string. There are no overloads that allow for different typed parameters for computing a checksum, nor are there any options that allow for more than 3 parameters in the checksum function; however, as I mentioned above sometimes you need to do an inline SQL command and this is OK.

Reusable functions for use with Linq-to-Entities

I have some stats code that I want to use in various places to calculate success / failure percentages of schedule Results. I recently found a bug in the code, caused by the fact that it was replicated in each LINQ statement, so I decided it would be better to have common code for this. The problem, of course, is that a normal function, when the query is executed on SQL Server, throws a NotSupportedException because the function doesn't exist in SQL Server.
How can I write a reusable stats code that gets executed on SQL server or is this not possible?
Here is the code I have written for Result
public class Result
{
public double CalculateSuccessRatePercentage()
{
return this.ExecutedCount == 0 ? 100 : ((this.ExecutedCount - this.FailedCount) * 100.0 / this.ExecutedCount);
}
public double CalculateCoveragePercentage()
{
return this.PresentCount == 0 ? 0 : (this.ExecutedCount * 100.0 / this.PresentCount);
}
}
And it is used like so (results is IQueryable, and throws the exception):
schedule.SuccessRatePercentage = (int)Math.Ceiling(results.Average(r => r.CalculateSuccessRatePercentage()));
schedule.CoveragePercentage = (int)Math.Ceiling(results.Average(r => r.CalculateCoveragePercentage()));
or like this (which works, because we do this on a single result)
retSchedule.SuccessRatePercentage = (byte)Math.Ceiling(result.CalculateSuccessRatePercentage());
retSchedule.CoveragePercentage = (byte)Math.Ceiling(result.CalculateCoveragePercentage());
Edit
As per @Fred's answer I now have the following code, which works for an IQueryable
schedule.SuccessRatePercentage = (int)Math.Ceiling(scheduleResults.Average(ScheduleResult.CalculateSuccessRatePercentageExpression()));
schedule.CoveragePercentage = (int)Math.Ceiling(scheduleResults.Average(ScheduleResult.CalculateCoveragePercentageExpression()));
The only problem, albeit a minor one, is that this code will not work for individual results i.e.
retSchedule.SuccessRatePercentage = (byte)Math.Ceiling(/* How do I use it here for result */);

You can't pass functions to SQL - you would need to declare the function on the actual SQL database and then call that from your code.
What you could do/try is this:
Expression<Func<Result, double>> CalculateCoveragePercentage()
{
return r => r.PresentCount == 0 ? 0 : (r.ExecutedCount * 100.0 / r.PresentCount);
}
It needs to be interpreted instead of executed so that EF can translate it to SQL. The problem is, I've only heard of this being possible when it's passed directly into a where clause.
Since you are able to do these calculations when you apply them directly inside of your LINQ query, I'm inclined to think that it should also be possible to declare those calculations as Expression<Func<..., ...>> and then pass them in.
The only way to know for sure is to try (unless you feel like looking into EF's ExpressionBuilder).
UPDATE:
I should have mentioned that, if this would work, you need to pass this expression into a Select statement:
// Assuming you have Results declared as a DbSet or IDbSet, such as:
DbSet<Result> Results;
// You could do something like this (just to illustrate that
// it would be interpreted rather than executed):
List<double> allCoveragePercentages = Results.Select(CalculateCoveragePercentage())
    .ToList();
UPDATE #2:
In order for this to work with individual results (or in any case whatsoever), you need to pass it into a clause that accepts the expression. Examples are Select, Where, Average (apparently): anything that does not return results.
Off the top of my head (I'm sure I'm missing a few):
List: ToArray, ToDictionary, ToList, ToLookup
Single result: First, FirstOrDefault, Single, SingleOrDefault, Last, LastOrDefault
Computation: Count, Sum, Max, Min
Since the above clauses return results, they (as far as I know) mostly accept at most a predicate (a function that can only return 'true' or 'false'). Sum, Max and Min are exceptions: like Average, they have overloads that take a selector expression.
You may have coincidentally got it right with your .Average(CalculateCoveragePercentage())
So if you were to get a single result with .FirstOrDefault(), you would pass in your expression inside of a select clause right before that: .Select(CalculateCoveragePercentage()).FirstOrDefault(). That is, if you don't need the actual entity but just the calculation. Be aware though that this particular example will return 0 if there were no Result results. You may or may not want this behavior.
Of course, if you already have your result (it's not an IQueryable anymore) then you can simply do:
var coveragePercentage = CalculateCoveragePercentage().Compile().Invoke(result);
But that would kind of defeat the purpose of the expression - for this situation you should just add a method to your Result class that calculates the CoveragePercentage of a given instance.
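As a rough illustration of using one expression both ways, the sketch below runs against in-memory data. Tuples are a hypothetical stand-in for the Result entity, and AsQueryable() stands in for the EF query; whether a given provider can translate the same expression is something you would still have to verify.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Tuples stand in for Result (assumed shape: ExecutedCount, PresentCount).
Expression<Func<(int Executed, int Present), double>> coverage =
    r => r.Present == 0 ? 0 : r.Executed * 100.0 / r.Present;

var results = new[] { (Executed: 5, Present: 10), (Executed: 3, Present: 4) }.AsQueryable();

// As an expression: passed to a Queryable operator.
double average = results.Average(coverage);             // (50 + 75) / 2 = 62.5

// As a delegate: compile once, then reuse it for single instances.
Func<(int Executed, int Present), double> coverageFn = coverage.Compile();
double single = coverageFn((Executed: 1, Present: 4));  // 25

Console.WriteLine(average);
Console.WriteLine(single);
```

Compile() is comparatively expensive, so caching the compiled delegate (for example in a static field) is usually preferable to calling Compile() on every use.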

Passing query data from LINQ to method in same query

I was able to create a LINQ statement that I thought was strange and wanted to see if anyone else had experience with it.
I've simplified it to this:
var x = db.Test
.Where(a => a.Field1 == Utils.CreateHash(Preferences.getValue(a.Field2)))
.FirstOrDefault();
Now how does this translate to database code? Wouldn't LINQ need to do a double query for every single row, i.e. for row a:
1) Query a.Field2
2) Return value to run Utils.CreateHash(Preferences.getValue(a.Field2))
3) Take that value from step 2 and compare it against a.Field1
4) Repeat 1-3 until I've gone through all the rows or returned a matching row
Wouldn't this be extremely inefficient? Or is LINQ smart enough to run this in a better way? Note, I haven't actually run this code so another possibility is a runtime error. Why wouldn't LINQ be smart enough to detect a conflict then and not let me compile it?

The query as is will not work, since you have a call to Utils.CreateHash in the lambda that you are trying to execute on the DB. In that context the method cannot be executed, since there simply is no equivalent on the DB side, hence the query will fail.
In general, the ability of third-party LINQ IQueryable providers (e.g. LINQ to SQL, LINQ to Entities) to access in-memory constructs such as methods or classes is very limited. As a rule of thumb, at most accessing primitive values or collections of primitives will work.
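What the provider actually receives can be sketched as follows. Convert.ToBase64String and Encoding.UTF8.GetBytes are used here purely as stand-ins for Utils.CreateHash and Preferences.getValue, which are not shown in the question.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// The lambda is never run here; it is captured as an expression tree.
Expression<Func<string, bool>> predicate =
    a => a == Convert.ToBase64String(Encoding.UTF8.GetBytes(a));

var comparison = (BinaryExpression)predicate.Body;
var call = (MethodCallExpression)comparison.Right;

// The provider sees a method-call node and must know a SQL equivalent for it.
Console.WriteLine(comparison.NodeType);            // Equal
Console.WriteLine(call.Method.DeclaringType.Name); // Convert
Console.WriteLine(call.Method.Name);               // ToBase64String
```

A provider that has no mapping for such a node typically rejects the query at runtime, which is why the compiler cannot catch the problem.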

Just a quick addition...
A good way to see how this works is to write a custom LINQ provider yourself (an extreme case, I agree, but the most instructive :), or to go through the source code of an open-source one (e.g. http://relinq.codeplex.com/).
Basically (I'm simplifying things here a bit), a LINQ provider can only 'map' to the Db (supported SQL, functions) what it 'knows' about.
That is, it has a standard set of constructs it can work with. Anything beyond that, including your custom methods (unless they reduce to constants etc.), cannot be resolved on the 'Db/SQL side'.
With your own 'custom' LINQ provider (not the case here) you could add a specific extension call, e.g. .MyCalc(), which would be properly resolved and translated into its SQL equivalent, and then you'd be able to use it.
Other than that, if I recall correctly, the provider may leave that part as an expression to be resolved once the Db 'fetch' returns, or complain about it in certain cases.
LINQ providers are based on IQueryable, and you can take a look at the extension methods provided there to see which SQL equivalents are supported.
Hope this helps.
EDIT: whether things 'work' or not doesn't matter. It still doesn't mean the code would execute in the Db context, i.e. it would be unacceptable performance-wise in most cases. IQueryable works with expressions (as you can see if you look at the interface), and LINQ is usually executed when you invoke or enumerate the query. At that point some of the expressions may evaluate to a constant value that can be worked into the SQL, but not in your case.
The best way to check is to look at the SQL generated by the query (possibly via this question, I think: Translate LINQ to sql statement).

No.
The LINQ provider will run a single SELECT query that selects both fields, then execute your lambda expression with the two values for each returned row.

How to call a method in the where clause of a LINQ query on a IQueryable object

I have an IQueryable of MyType obtained via EF 4.1.
I am applying filters via LINQ in the form of where clauses, one of which filters based on distance from a given zip code.
MyType has a ZipCode property and I need to call a method which computes the distance between the MyType zip codes and my given zip code.
I have tried the following, which compiles, but throws an error at runtime.
myTypes = myTypes.Where(x => GetDistance(x.Zip, givenZip) < 10);
How can I accomplish this?
EDIT
My Distance method returns a double that represents the distance in miles
public double Distance(Position position1, Position position2)
{
}
Position is a struct containing doubles for lat and long

This should work in LINQ to Objects as long as GetDistance() returns a number that can be compared with 10. It will not work with LINQ to Entities, since that will try to map your method to a SQL equivalent, and of course there is none.
As a crude workaround you could use AsEnumerable(), but that would materialize all your types, so it is not recommended if your table is large:
myTypes = myTypes.AsEnumerable()
.Where(x => GetDistance(x.Zip, givenZip) < 10);
Another way would be to map Zip codes to geographic locations in the database and use those locations directly with the soon to be supported spatial data types - this is probably the best approach in my opinion, but not production-ready. Of course if you are restricted to just SQL Server you could just use a store query directly to use geo-locations - but that would work around EF.

This will throw an error because the runtime tries to convert your expression tree into SQL. The function 'GetDistance' cannot be converted.
Have a look at Model Defined Functions. They allow you to define a custom function in your edmx which you can execute when building queries.

Assuming:
List<myType> myTypes;
try:
myTypes = myTypes.Where(x => GetDistance(x.Zip, givenZip) < 10).ToList();
