Trim function within a Where Clause - c#

Say you have a Collection of items
IList<string> items = new List<string>();
And you want to search each element against some search term: term. If you performed a Trim() within the Where clause as follows, would the trim operation occur for each element in the sequence, or would it be compiled once and then used to check against the elements?
items.Where(o => o.Contains(term.Trim())).ToList();
This question was based on a LINQ to SQL statement but I have simplified it. (Unsure if compiling to SQL would make any difference).

If it's in memory, then yes, it is called every time. A simple test:
public static class BLah
{
    public static string DoTrim(this string item)
    {
        Console.WriteLine("called");
        return item.Trim();
    }
}

string term = " a ";   // example search term
IList<string> items = new List<string> { "a", "b", "c" };
items.Where(o => o.Contains(term.DoTrim())).ToList();
Prints out 'called' 3 times.
However, when executing on the database, it's completely up to the ORM to decide how to generate the SQL, which may or may not call TRIM() as part of the query. The only way to know is to test it and see what SQL it generates.
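One way to see for yourself what LINQ to SQL sends is to attach a TextWriter to DataContext.Log. A minimal sketch, assuming a hypothetical WarehouseDataContext generated from the database used in the example below:
using System;
using System.IO;
using System.Linq;

class LogSqlDemo
{
    static void Main()
    {
        string term = "a b ";
        var log = new StringWriter();

        using (var db = new WarehouseDataContext())   // hypothetical generated DataContext
        {
            db.Log = log;                             // LINQ to SQL writes the generated SQL here
            var matches = db.Warehouses
                            .Where(w => w.AdminComment.Contains(term.Trim()))
                            .ToList();
        }

        Console.WriteLine(log);                       // inspect the generated statement
    }
}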
For LINQ to SQL, it does not run the trim multiple times. For example:
string term = "a b ";
Warehouses.Where(w => w.AdminComment.Contains(term.DoTrim()));
Produces
-- Region Parameters
DECLARE @p0 NVarChar(1000) = '%a b%'
-- EndRegion
SELECT [t0].[Id], [t0].[Name], [t0].[AdminComment], [t0].[AddressId]
FROM [Warehouse] AS [t0]
WHERE [t0].[AdminComment] LIKE @p0

It's worth noting here the difference between object-based LINQ, which uses Func<…>, and expression-based LINQ, which uses Expression<Func<…>>.
In object-based LINQ, o => o.Contains(term.Trim()) is used as a Func<string, bool>. We can reason about it as such.
Considered that way, it is a delegate with a captured variable term on which the method Trim() is called, and that is exactly what happens.
There are three reasons why Trim() must be called every time.
Considering just this delegate, we do not know that Trim() is pure and hence would always return the same value for the same object.
Considering just this delegate, we do not know that the object term refers to is immutable, so even if we did know Trim() was pure we could not assume it returns the same value every time.
Considering just this delegate, we do not know that the captured variable isn't changed between calls (see the sketch after this list).
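The third point is easy to demonstrate. A minimal sketch (hypothetical names) showing that mutating the captured variable between calls changes the result, which is why Trim() has to run on every evaluation:
using System;
using System.Collections.Generic;
using System.Linq;

class CapturedVariableDemo
{
    static void Main()
    {
        var items = new List<string> { "abc", "xyz" };
        string term = " abc ";

        Func<string, bool> predicate = o => o.Contains(term.Trim());

        Console.WriteLine(items.Count(predicate)); // 1: only "abc" matches

        term = " xyz ";                            // the captured variable changed...
        Console.WriteLine(items.Count(predicate)); // 1: ...so Trim() must run again, now matching "xyz"
    }
}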
All in all, a delegate can be reasoned about much like a method if we treat the captured variables as mutable fields. If you looked at:
class SomeClass
{
    public string term;

    public bool Predicate(string o)
    {
        return o.Contains(term.Trim());
    }
}
You wouldn't expect the call to Trim() to be cached for reasons analogous to the three reasons above.
Now, with expression-based LINQ, such as LINQ to SQL or EF, it's a bit different.
In this case o => o.Contains(term.Trim()) is used as an Expression<Func<string, bool>>, and there's a lot more variety in what a provider might do with that.
The provider might have special knowledge to know that term.Trim() could only ever have the one value, and re-write the expression to o => o.Contains("/* result of single call to term.Trim() goes here */")
The provider might have no idea what to do with Trim() and throw an exception.
The provider might not do anything special in how it turns term.Trim() into SQL, but the database processing that SQL in turn might itself realise it needs only to be calculated once, and so do so.
None of the above might happen, with it being essentially the same as the object-based case.
Expression handling is a lot more flexible in terms of what may or may not happen to optimise it.
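To make the distinction concrete, here is a minimal sketch (hypothetical names) showing that the same lambda can be stored either as an opaque delegate or as an expression tree that a provider can inspect and rewrite:
using System;
using System.Linq.Expressions;

class FuncVsExpressionDemo
{
    static void Main()
    {
        string term = " foo ";

        // Object-based LINQ sees only a compiled delegate: a black box it can only invoke.
        Func<string, bool> func = o => o.Contains(term.Trim());

        // Expression-based LINQ sees a tree it can inspect, rewrite, or translate to SQL.
        Expression<Func<string, bool>> expr = o => o.Contains(term.Trim());

        Console.WriteLine(func("foobar"));      // True
        Console.WriteLine(expr);                // o => o.Contains(value(...).term.Trim())
        Console.WriteLine(expr.Body.NodeType);  // Call
    }
}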

That is not an L2S IQueryable call. If it were a proper L2S call, then it would be converted to:
field LIKE @p0
where @p0 is set to '%term%'
For example, with Northwind sample database:
var term = "USA";
var items = db.Customers
.Where(c => c.Address.Contains( term.Trim() ))
.ToList();
would produce (thanks to LINQPad):
-- Region Parameters
DECLARE @p0 NVarChar(1000) SET @p0 = '%USA%'
-- EndRegion
SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [Customers] AS [t0]
WHERE [t0].[Address] LIKE @p0
You would probably want to write it like this instead:
var term = "USA".Trim();
var items = Customers
.Where(c => c.Address.Contains( term ))
.ToList();
Generated SQL would be the same.

Why should the result of term.Trim() be cached at all? There is no runtime magic that would imply that the result of this operation never changes, and therefore the operation is executed for every single element. To improve this you may cache the result yourself:
var test = term.Trim();
items.Where(o => o.Contains(test)).ToList();

Related

Simulate Entity Framework's .Last() when using SQL Server

SQL Server is able to handle EF's .First() by translating it to TOP(1). But Entity Framework's .Last() throws an exception; SQL Server does not recognize such a function, for obvious reasons.
I used to work around it by sorting in descending order and taking the first matching row:
var v = db.Table.OrderByDescending(t => t.ID).FirstOrDefault(t => t.ClientNumber == ClientNumberDetected);
This does it with a single query, but it sorts the whole table (millions of rows) before filtering...
Do I have good reason to think there will be speed issues if I abuse this technique?
I thought of something similar... but it requires two queries:
int maxID_of_Client = db.Table.Where(t => t.ClientNumber == ClientNumberDetected).Max(t => t.ID);
var v = db.Table.First(t => t.ID == maxID_of_Client);
It consists of retrieving the client's max ID, then using that ID to retrieve the client's last row.
It doesn't seem faster to query twice...
There must be a way to optimize this and use a single query without sorting millions of rows.
Unless there is something I don't understand, I'm probably not the first to think about this problem, and I want to solve it for good!
Thanks in advance.
The assumption driving this question is that result sets with no ordering clause come back from your DB in any predictable order at all.
In reality, result sets that come back from SQL have no implicit ordering and none should be assumed.
Therefore, the result of
db.Table.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is actually indeterminate.
Whether you're taking first or last, without ordering it's all meaningless anyway.
Now, what goes to SQL when you add an ordering clause to your LINQ? It will be something similar to...
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue
or, in the descending/last case
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue DESC
From SQL's POV, there's no significant difference here and your DB will be optimized for this sort of query. The index can be scanned in either direction, and the cost of each query above is the same.
TL;DR :
db.Table.OrderByDescending(t => t.ID)
.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is just fine.

Reusable functions for use with Linq-to-Entities

I have some stats code that I want to use in various places to calculate success/failure percentages of schedule Results. I recently found a bug in that code, and it was due to the fact that the logic was replicated in each LINQ statement, so I decided it would be better to have common code for this. The problem, of course, is that a normal function throws a NotSupportedException when executed against SQL Server, because the function doesn't exist in SQL Server.
How can I write reusable stats code that gets executed on SQL Server, or is this not possible?
Here is the code I have written for Result
public class Result
{
    public double CalculateSuccessRatePercentage()
    {
        return this.ExecutedCount == 0 ? 100 : ((this.ExecutedCount - this.FailedCount) * 100.0 / this.ExecutedCount);
    }

    public double CalculateCoveragePercentage()
    {
        return this.PresentCount == 0 ? 0 : (this.ExecutedCount * 100.0 / this.PresentCount);
    }
}
And it is used like so (results is an IQueryable, and this throws the exception):
schedule.SuccessRatePercentage = (int)Math.Ceiling(results.Average(r => r.CalculateSuccessRatePercentage()));
schedule.CoveragePercentage = (int)Math.Ceiling(results.Average(r => r.CalculateCoveragePercentage()));
or like this (which works, because we do this on a single result)
retSchedule.SuccessRatePercentage = (byte)Math.Ceiling(result.CalculateSuccessRatePercentage());
retSchedule.CoveragePercentage = (byte)Math.Ceiling(result.CalculateCoveragePercentage());
Edit
As per @Fred's answer I now have the following code, which works for an IQueryable
schedule.SuccessRatePercentage = (int)Math.Ceiling(scheduleResults.Average(ScheduleResult.CalculateSuccessRatePercentageExpression()));
schedule.CoveragePercentage = (int)Math.Ceiling(scheduleResults.Average(ScheduleResult.CalculateCoveragePercentageExpression()));
The only problem, albeit a minor one, is that this code will not work for individual results i.e.
retSchedule.SuccessRatePercentage = (byte)Math.Ceiling(/* How do I use it here for result */);
You can't pass functions to SQL - you would need to declare the function on the actual SQL database and then call that from your code.
What you could do/try is this:
Expression<Func<Result, double>> CalculateCoveragePercentage()
{
    return r => r.PresentCount == 0 ? 0 : (r.ExecutedCount * 100.0 / r.PresentCount);
}
It needs to be interpreted instead of executed so that EF can translate it to SQL. The problem is, I've only heard of this being possible when it's passed directly into a where clause.
Since you are able to do these calculations when you apply them directly inside of your LINQ query, I'm inclined to think that it should also be possible to declare those calculations as Expression<Func<..., ...>> and then pass them in.
The only way to know for sure is to try (unless you feel like looking into EF's ExpressionBuilder)
UPDATE:
I should have mentioned that, for this to work, you need to pass the expression into a Select statement:
// Assuming you have Results declared as a DbSet or IDbSet, such as:
DbSet<Result> Results;

// You could do something like this (just to illustrate that
// it would be interpreted rather than executed):
List<double> allCoveragePercentages = Results.Select(CalculateCoveragePercentage())
                                             .ToList();
UPDATE #2:
In order for this to work with individual results (or in any case whatsoever), you need to pass it into a clause that accepts the expression. Examples are Select, Where, Average (apparently): anything that does not return results.
Off the top of my head (I'm sure I'm missing a few):
List: ToArray, ToDictionary, ToList, ToLookup
Single result: First, FirstOrDefault, Single, SingleOrDefault, Last, LastOrDefault
Computation: Count, Sum, Max, Min
Since the above clauses return results, they (as far as I know) only accept predicates (a function that can only return 'true' or 'false').
You may have coincidentally got it right with your .Average(CalculateCoveragePercentage)
So if you were to get a single result with .FirstOrDefault(), you would pass your expression into a Select clause right before that: .Select(CalculateCoveragePercentage()).FirstOrDefault(). That is, if you don't need the actual entity but just the calculation. Be aware though that this particular example will return 0 if there were no Result rows. You may or may not want this behavior.
Of course, if you already have your result (it's not an IQueryable anymore) then you can simply do:
var coveragePercentage = CalculateCoveragePercentage().Compile().Invoke(result);
But that would kind of defeat the purpose of the expression - for this situation you should just add a method to your Result class that calculates the CoveragePercentage of a given instance.
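Pulling the pieces of this answer together, here is a minimal sketch of how the whole pattern could look, assuming a hypothetical ScheduleResult shaped like the Result class in the question; the expression is the single source of truth, and the instance method for single, already-materialised results simply reuses its compiled form:
using System;
using System.Linq;
using System.Linq.Expressions;

public class ScheduleResult
{
    public int PresentCount { get; set; }
    public int ExecutedCount { get; set; }
    public int FailedCount { get; set; }

    // Expression form: LINQ to Entities can translate this to SQL.
    public static Expression<Func<ScheduleResult, double>> CalculateCoveragePercentageExpression()
        => r => r.PresentCount == 0 ? 0 : (r.ExecutedCount * 100.0 / r.PresentCount);

    // Compile once and cache, so in-memory callers don't recompile on every use.
    private static readonly Func<ScheduleResult, double> CoverageDelegate =
        CalculateCoveragePercentageExpression().Compile();

    // Instance method for a single result that is already in memory.
    public double CalculateCoveragePercentage() => CoverageDelegate(this);
}

public static class Usage
{
    public static void Example(IQueryable<ScheduleResult> results, ScheduleResult single)
    {
        // Runs on the database: the expression itself is passed, not a compiled delegate.
        double average = results.Average(ScheduleResult.CalculateCoveragePercentageExpression());

        // Also runs on the database: project first, then take one value.
        double first = results
            .Select(ScheduleResult.CalculateCoveragePercentageExpression())
            .FirstOrDefault();

        // Already materialised: use the instance method.
        double one = single.CalculateCoveragePercentage();

        Console.WriteLine($"{average} {first} {one}");
    }
}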

LINQ: When to use Compiled Queries?

I'd like some expert advice on this. I've used compiled queries before, but for this particular case, i'm not sure whether it's appropriate.
It's a search form where the query changes and is dependent on what is being searched on.
static Func<DBContext, int, IQueryable<Foo>> Search = CompiledQuery.Compile(
    (DBContext db, int ID) =>
        db.Person
          .Where(w => w.LocationID == ID)
          .Select(s =>
              new Foo
              {
                  Name = s.PersonName,
                  Age = s.Age,
                  Location = s.LocationName,
                  Kin = s.Kin
              }));
Now if someone fills in the search box, I want to extend the query by adding another Where clause:
var query = Search(context, 123);
query = query.Where(w => w.Name.Contains(searchString));
So my question is, is it returning all the results where LocationID == 123, then checking the results for a searchString match? Or is it actually extending the compiled query?
If it's the former (which I suspect it is), should I scrap the CompiledQuery and just create a method that extends the query and then returns it as a list?
Also, what are the best practices for CompiledQuery usage and is there a guideline of when they should be used?
Note: I'm using the above in an ASP.NET website with Linq to SQL. Not sure if that makes any difference.
Thanks
The problem is that the compiled query is set in stone; it knows what SQL it will run against the database. The extra lambda expression is applied lazily, however, and cannot modify the compiled query while it is being run at run time. The bad news is that it will return all of the records from the database and then query those records in memory to refine them further.
If you want to compile the query then I would suggest writing two queries with different signatures.
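To illustrate that suggestion, here is a minimal sketch of two compiled queries with different signatures, reusing the question's hypothetical DBContext, Person and Foo types, so that neither query has to be composed over after compilation:
using System;
using System.Data.Linq;
using System.Linq;

static class SearchQueries
{
    // Location filter only.
    public static readonly Func<DBContext, int, IQueryable<Foo>> ByLocation =
        CompiledQuery.Compile((DBContext db, int id) =>
            db.Person
              .Where(p => p.LocationID == id)
              .Select(p => new Foo { Name = p.PersonName, Age = p.Age,
                                     Location = p.LocationName, Kin = p.Kin }));

    // Location filter plus search term.
    public static readonly Func<DBContext, int, string, IQueryable<Foo>> ByLocationAndName =
        CompiledQuery.Compile((DBContext db, int id, string term) =>
            db.Person
              .Where(p => p.LocationID == id && p.PersonName.Contains(term))
              .Select(p => new Foo { Name = p.PersonName, Age = p.Age,
                                     Location = p.LocationName, Kin = p.Kin }));
}

// Callers pick whichever signature matches what the user filled in:
// var results = string.IsNullOrEmpty(searchString)
//     ? SearchQueries.ByLocation(context, 123)
//     : SearchQueries.ByLocationAndName(context, 123, searchString);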
As far as I know, it is good practice to compile your query once; that is the whole point of a pre-compiled query (and that's why your pre-compiled query is static): it saves the time it takes to compile that query into SQL. If you extend that pre-compiled query, it has to be compiled again, and you lose those gains.
Querying on the result (your query variable) is no longer LINQ to SQL.
Just include your additional condition in your compiled query.
DB.Person.Where(w => w.LocationID == ID
    && (searchString == "" || w.Name.Contains(searchString)))
If I am right, then you need a dynamic where clause in LINQ. For that I would suggest going this way:
IEnumerable list;
if (condition1)
{
    list = /* base LINQ statement */;
}
if (condition2)
{
    list = from f in list where con1 == con && con2 == con select f;
}
if (condition3)
{
    list = from f in list where con1 == con && con2 == con select f;
}
I hope you get what I mean.

LINQ: adding where clause only when a value is not null

I know a typical way is like this:
IQueryable<Staffs> query = from staff in dataContext.Staffs select staff;
if (name1 != null)
{
    query = from staff in query where staff.name == name1 select staff;
}
However, from a program we took over from other developers, we saw code like this:
IQueryable<Staffs> query = from staff in dataContext.Staffs select staff;
query = from staff in query where name1 == null || staff.name == name1 select staff;
If this is a normal SQL statement, I would definitely say that the 2nd one is a bad practice. Because it adds a meaningless where clause to the query when name1 is null.
But I am new to LINQ, so I am not sure if LINQ is different?
You can write it like:
IQueryable<Staffs> query = from staff in dataContext.Staffs select staff;
query = from staff in query where name1 != null && staff.name == name1 select staff;
This way the second part of the condition will not be evaluated if the first condition evaluates to false.
Update:
if you write
IQueryable<Staffs> query = from staff in dataContext.Staffs select staff;
query = from staff in query where name1 == null || staff.name == name1 select staff;
and name1 is null, the second part of the condition will not be evaluated, since an OR only requires one operand to be true.
please see this link for further detail
Often this sort of thing feels smoother to write using the fluent syntax, rather than the query syntax.
e.g.
IQueryable<Staffs> query = dataContext.Staffs;
if(name1 != null)
{
query = query.Where(x => x.name == name1);
}
So if name1 is null, you just don't do any Where() call. If you have multiple different filters, all of which may or may not be required, and perhaps various different sort orders, I find this becomes a lot more manageable.
Edit for alex: OK, I was answering the question about adding a where clause only when a value is not null. In response to the other part of the question, I tried this out with Entity Framework 4 to see what SQL that LINQ produced. You do this by casting query to an ObjectQuery and calling .ToTraceString(). The results were that the WHERE clause came out as follows:
WHERE @p__linq__0 IS NULL OR [Extent1].[name] = @p__linq__1
So, yes, it's the classic bad SQL: if you have an index on the name column, don't expect it to be used.
Edit #2: Tried this again using LINQ to SQL rather than Entity Framework, with rather different results. This time, trying the query with name1 being null resulted in no WHERE clause at all, as you'd hope; trying it with name1 being "a" resulted in a simple WHERE [t0].[name] = @p0 with @p0 sent as "a". Entity Framework does not seem to optimize this way. That's a bit worrying.
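For reference, here is a small sketch of the ToTraceString technique mentioned above, for the EF 4 / ObjectContext era (the helper name is made up; LINQ to SQL would use DataContext.Log instead):
using System;
using System.Data.Objects;
using System.Linq;

static class QueryInspector
{
    // Dumps the SQL an EF 4 query would send, if the IQueryable is actually an ObjectQuery.
    public static void DumpSql<T>(IQueryable<T> query)
    {
        var objectQuery = query as ObjectQuery;
        if (objectQuery != null)
        {
            Console.WriteLine(objectQuery.ToTraceString());
        }
    }
}
Calling QueryInspector.DumpSql(query) just before materialising the query prints the generated statement.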
The best way to do this is to create yourself an extension method that will take in a conditional statement and a where expression. If the condition is true then it will use the where expression else it will not use it. This can dramatically clean up your code, eliminating the need for if statements.
public static class LinqExtensions
{
    public static IQueryable<T> WhereIf<T>(this IQueryable<T> query, bool condition, Expression<Func<T, bool>> whereClause)
    {
        if (condition)
        {
            return query.Where(whereClause);
        }
        return query;
    }
}
Now you can write your code like this:
IQueryable<Staffs> query = dataContext.Staffs.AsQueryable().WhereIf(name1 != null, x => x.Name == name1);
So I tried the .Where(..., x => ...) extension method listed here as an answer, but it doesn't work against Entity Framework, as LINQ to Entities doesn't know how to translate it into T-SQL.
So here's my solution getting my Func on:
Expression<Func<SomeEfPoco, bool>> columnBeingFilteredPredicate = x => true; // Default expression to just say yes
if (!string.IsNullOrWhiteSpace(someColumnBeingFilteredValue))
{
columnBeingFilteredPredicate = x => x.someColumnBeingFiltered == someColumnBeingFilteredValue;
}
_context.SomeEfPocos.Where(x => ..... &&
                                ..... &&
                                .....)
                    .Where(columnBeingFilteredPredicate);
someColumnBeingFilteredValue in my case is a string parameter on the encapsulating method with a default value of NULL.
LINQ is different in some other cases (not in this case).
LINQ is a way to get data the "faster way", with as little and as clear code as possible. There are many benefits to LINQ:
Makes it easier to transform data into objects. I'm sure you've heard the term "Impedance Mismatch" being used quite often, meaning that LINQ reduces the amount of work you must do to translate between object-oriented code and data paradigms such as hierarchical, flat-file, messages, relational, and more. It doesn't eliminate the "Impedance Mismatch" because you must still reason about your data in its native form, but the bridge from here to there is (IMO) much shorter.
A common syntax for all data. Once you learn query syntax, you can use it with any LINQ provider. I think this is a much better development paradigm than the Tower of Babel that has grown over the years with data access technologies. Of course, each LINQ provider has unique nuances that are necessary, but the basic approach and query syntax is the same.
Strongly typed code. The C# (or VB.NET) query syntax is part of the language and you code with C# types, which are translated into something a provider understands. This means that you gain the productivity of having your compiler find errors earlier in the development lifecycle than elsewhere. Granted, many errors in stored proc syntax will generate errors when you save, but LINQ is more general than SQL Server. You have to think of all the other types of data sources that generate runtime errors because their queries are formed with strings or some other loosely typed mechanism.
Provider integration. Pulling together data sources is very easy. For example, you can use LINQ to Objects, LINQ to SQL, and LINQ to XML together for some very sophisticated scenarios. I think it's very elegant.
Reduction in work. Before LINQ, I spent a lot of time building DALs, but now my DataContext is the DAL. I've used OPFs too, but now I have LINQ that ships with multiple providers in the box and many other 3rd party providers, giving me the benefits from my previous points. I can set up a LINQ to SQL DataContext in a minute (as fast as my computer and IDE can keep up).
Performance in the general case doesn't become an issue. SQL Server optimizes queries quite well these days, just like stored procs. Of course, there are still cases where stored procs are necessary for performance reasons. For example, I've found it smarter to use a stored proc when I had multiple interactions between tables with additional logic inside of a transaction. The communications overhead of trying to do the same task in code, in addition to getting the DTC involved in a distributed transaction made the choice for a stored proc more compelling. However, for a query that executes in a single statement, LINQ is my preferred choice because even if there was a small performance gain from a stored proc, the benefits in previous points (IMO) carry more weight.
Built-in security. One reason I preferred stored procs before LINQ was that they forced the use of parameters, helping to reduce SQL injection attacks. LINQ to SQL already parameterizes input, which is just as secure.
LINQ is declarative. A lot of attention is paid to working with LINQ to XML or LINQ to SQL, but LINQ to Objects is incredibly powerful. A typical example of LINQ to Objects is reading items from a string[]. However, that's just a small example. If you think about all of the IEnumerable collections (you can also query IEnumerable) that you work with every day, the opportunities are plentiful. i.e. Searching an ASP.NET ListBox control for selected items, performing set operations (such as Union) on two collections, or iterating through a List and running a lambda in a ForEach of each item. Once you begin to think in LINQ, which is declarative in nature, you can find many of your tasks to be simpler and more intuitive than the imperative techniques you use today.
I could probably go on, but I'd better stop there. Hopefully, this will provide a more positive view of how you could be more productive with LINQ and perhaps see it as a useful technology from a broader perspective.
I've seen this pattern in standard SQL, and it seems useful if you have several parameters that may be NULL. For example:
SELECT * FROM People WHERE ( @FirstName IS NULL OR FirstName = @FirstName )
AND ( @LastName IS NULL OR LastName = @LastName )
If you see this in LINQ, it's possible they just blindly translated their old SQL-queries.
I like to use Expressions.
e.g.
Expression<Func<Persons, bool>> expresionFinal = c => c.Active == true;
if (DateBirth.HasValue)
{
    Expression<Func<Persons, bool>> expresionDate = c => (EntityFunctions.TruncateTime(c.DateBirth) == DateBirth);
    expresionFinal = PredicateBuilder.And(expresionFinal, expresionDate);
}
IQueryable<Persons> query = dataContext.Persons;
query = query.Where(expresionFinal);
For EF Core I broke it up like this:
IQueryable<Partners> recs = contextApi.Partners;
if (status != -1)
{
recs = recs.Where(i => i.Status == status);
}
recs = recs.OrderBy(i => i.Status).ThenBy(i => i.CompanyName);
foreach (var rec in recs)
{
}
I had to be explicit with my typing instead of relying on var.
I like the idea of an extension method:
public static IQueryable<T> WhereIf<T>(this IQueryable<T> query, bool condition, Expression<Func<T, bool>> whereClause)
=> condition ? query.Where(whereClause) : query;
No, I don't strongly agree with you.
Here you just gave simple logic:
if(name1 != null)
// do your stuff
But what will happen if you do something else with name1 when it has a null value?
OK, now consider this situation.
This example shows how to handle possible null values in source collections.
An object collection such as an IEnumerable<T> can contain elements whose value is null.
If a source collection is null or contains an element whose value is null,
and your query does not handle null values, a NullReferenceException will be thrown when you execute the query.
This could probably be an issue...
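A minimal sketch of that point (hypothetical names): guard against null elements in the source so the predicate itself doesn't throw a NullReferenceException.
using System;
using System.Collections.Generic;
using System.Linq;

class NullElementDemo
{
    static void Main()
    {
        var names = new List<string> { "Alice", null, "Bob" };
        string name1 = "Bo";

        // Without the null check, calling Contains on a null element would throw.
        var matches = names.Where(n => n != null && n.Contains(name1)).ToList();

        Console.WriteLine(matches.Count); // 1
    }
}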
I use the extension method below. It's less flexible than the WhereIf extension from the other answers, but it's shorter to use.
public static IQueryable<T1> FilterBy<T1, T2>(this IQueryable<T1> query, T2 expectedValue, Expression<Func<T1, T2>> propertyAccessor)
{
    if (propertyAccessor == null) throw new ArgumentNullException(nameof(propertyAccessor));
    if (expectedValue == null) return query;

    var equalExpr = Expression.Equal(propertyAccessor.Body, Expression.Constant(expectedValue, typeof(T2)));
    var lambda = Expression.Lambda<Func<T1, bool>>(equalExpr, propertyAccessor.Parameters);
    return query.Where(lambda);
}
It can be used like:
var query = dataContext.Staffs.FilterBy(name, s => s.Name);

Linq to SQL - what's better?

db.Albums.FirstOrDefault(x => x.OrderId == orderId)
or
db.Albums.FirstOrDefault(x => x.OrderId.Equals(orderId))
I'm going to try to convince you that:
The two methods you proposed give the same performance.
There are at least two non-performance related reasons you should prefer ==.
There is another separate improvement that you can make to your code to reduce the possibility of errors.
To see that the performance will be the same, look at the SQL generated in each case. This test program shows you how you can view the generated SQL:
int orderId = 4;
TextWriter textWriter = new StringWriter();
using (var dc = new DataClasses1DataContext())
{
    dc.Log = textWriter;
    Order o1 = dc.Orders.FirstOrDefault(x => x.OrderId == orderId);
    Order o2 = dc.Orders.FirstOrDefault(x => x.OrderId.Equals(orderId));
}
string log = textWriter.ToString();
The SQL sent in each case is the same, as you can see by inspecting the log:
SELECT TOP (1) [t0].[OrderId], [t0].[CustomerID], [t0].[Date], [t0].[Description]
FROM [dbo].[Order] AS [t0]
WHERE [t0].[OrderId] = @p0
SELECT TOP (1) [t0].[OrderId], [t0].[CustomerID], [t0].[Date], [t0].[Description]
FROM [dbo].[Order] AS [t0]
WHERE [t0].[OrderId] = @p0
Regarding whether to use == or Equals, firstly I'd suggest using == for readability. This is the idiomatic way to compare two integers in C#.
Secondly, with == you will get a compile-time error if you compare objects of different (incompatible) types. I assume that in your case orderId has type int, but let's assume that someone else wrote this code and accidentally made an error, comparing against a variable order of type Order instead of an int. Now let's compare what would happen in each case:
Order order = new Order { OrderId = 4 };

x.OrderId.Equals(order)  // This compiles, but you get an exception at runtime:
                         // Could not format node 'Value' for execution as SQL.

x.OrderId == order       // Compile error: Operator '==' cannot be applied to
                         // operands of type 'int' and 'Order'
It is better to get compile time errors than runtime errors, so prefer to use == in this case.
Finally, if you only expect one result you should prefer to use SingleOrDefault instead of FirstOrDefault as the former will throw an exception if there are two matching objects found instead of just returning the first. This extra check will cost a tiny amount in performance but again allows you to catch errors earlier. If performance is a critical issue for you, instead of removing these safety checks you should consider fetching multiple objects from the database at once, not one object at a time.
So in summary I recommend that you use this:
Album album = db.Albums.SingleOrDefault(x => x.OrderId == orderId);
They will both be equivalent from a performance perspective. I tend to prefer == over .Equals() for readability, but the beauty of L2S is that you can use either one, depending on what type of object you have.
(And I'm assuming your second statement is on the orderId, and not the order object)
In most situations you should get the same result. However, there is a difference.
The Equals method determines whether two object instances are the same.
The == operator determines whether two objects have the same value.
In this case, I use the == operator because it's more readable.
It's almost the same. If you want to check only the value, then you should use
==
If you want to check the value as well as whether they are the same instance, use
Equals
But in both cases the resulting time is almost the same.
