Linq to Entities performance difference between Expression/Func - c#

I was just testing a simple query that I'm accessing in different ways, but the speed of each can vary by up to 2 seconds. I was hoping someone could clarify why this is the case. My project is in its very early stages, so I thought I'd make sure I'm doing it right before it gets too big.
Admittedly, my testing style isn't perfect, but I think it's good enough for this.
I'm using a generic Repository and UnitOfWork, and I hit the DB (SQL Express on my local machine) 10,000 times in a while loop. The table only has 64 records. Tests are run in Release mode.
[TestMethod]
public void MyTestMethod()
{
    using (var u = new UnitOfWork())
    {
        TestA(u);
        TestB(u);
    }
}
TestA (Func):
public void TestA(UnitOfWork u)
{
    Stopwatch s = Stopwatch.StartNew();
    s.Start();
    var x = 0;
    var repo = u.Repository<MyEntity>();
    var code = "ABCD".First().ToString();
    while (x < 10000)
    {
        var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
        x++;
    }
    s.Stop();
    Console.WriteLine("TESTA: " + s.Elapsed);
}
TestB (Expression):
public void TestB(UnitOfWork u)
{
    Stopwatch s = Stopwatch.StartNew();
    s.Start();
    var x = 0;
    var repo = u.Repository<MyEntity>();
    var code = "ABCD".First().ToString();
    while (x < 10000)
    {
        var testCase = repo.First(w => w.Code == code && w.CodeOrder == 0).Name;
        x++;
    }
    s.Stop();
    Console.WriteLine("TESTB: " + s.Elapsed);
}
Even though I'm using the calls First() and Single(), they're not the built-in LINQ calls. They're part of my repository.
Single() func (IEnumerable):
public TEntity Single(Func<TEntity, bool> predicate)
{
    return dbSet.FirstOrDefault(predicate);
}
First() expression (IQueryable):
public TEntity First(Expression<Func<TEntity, bool>> predicate)
{
    return dbSet.FirstOrDefault(predicate);
}
Output:
Test Name: MyTestMethod
Test Outcome: Passed
Result StandardOutput:
TESTA: 00:00:02.4798818
TESTB: 00:00:03.4212112

First() with an Expression<Func<...>> parameter is an extension method on IQueryable<T> and is used by query providers, like LINQ to Entities. The expression tree you provide is translated into a proper SQL query, which is sent to the database, and only the necessary rows are returned to your application.
First() with a Func<...> parameter is an extension method on IEnumerable<T> and is used by LINQ to Objects, which means all the records from the database are fetched into application memory, and then the element is searched for in memory, which is implemented as a linear search.
You should definitely use the one from IQueryable<T>, because it will be more efficient (the database is optimized to perform queries).
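To make the distinction concrete, here is a minimal sketch (the repository shape and the method names FirstTranslated/FirstInMemory are illustrative, assuming an EF6-style DbSet<TEntity>) showing how the parameter type alone decides whether FirstOrDefault binds to Queryable or Enumerable, and therefore whether the filter runs in SQL or in memory:

using System;
using System.Data.Entity; // EF6 is assumed here, since the question uses LINQ to Entities
using System.Linq;
using System.Linq.Expressions;

public class Repository<TEntity> where TEntity : class
{
    private readonly DbSet<TEntity> dbSet;

    public Repository(DbContext context)
    {
        dbSet = context.Set<TEntity>();
    }

    // Expression<Func<...>> keeps the predicate as an expression tree, so this call
    // binds to Queryable.FirstOrDefault and EF translates it into a SQL WHERE clause:
    // only the matching row comes back from the database.
    public TEntity FirstTranslated(Expression<Func<TEntity, bool>> predicate)
    {
        return dbSet.FirstOrDefault(predicate);
    }

    // Func<...> is a compiled delegate, so this call binds to Enumerable.FirstOrDefault:
    // the whole table is enumerated client-side and filtered in memory.
    public TEntity FirstInMemory(Func<TEntity, bool> predicate)
    {
        return dbSet.FirstOrDefault(predicate);
    }
}

Both bodies look identical, which is exactly why the question's Single (Func) and First (Expression) overloads behave so differently under load.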

This is not an answer, but just trying to make sure that the test results are more reliable.
Try writing your tests like this:
public long TestA()
{
    using (var u = new UnitOfWork())
    {
        var s = Stopwatch.StartNew();
        var x = 0;
        var repo = u.Repository<MyEntity>();
        var code = "ABCD".First().ToString();
        while (x < 10000)
        {
            var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
            x++;
        }
        s.Stop();
        return s.ElapsedMilliseconds;
    }
}
(Obviously TestB is just a minor variant.)
And then your test method becomes:
[TestMethod]
public void MyTestMethod()
{
    // Warm-up runs so JIT compilation and EF model initialization don't skew the timings.
    var dummyA = TestA();
    var dummyB = TestB();
    var realA = 0L;
    var realB = 0L;
    for (var i = 0; i < 10; i++)
    {
        realA += TestA();
        realB += TestB();
    }
    Console.WriteLine("TESTA: " + realA.ToString());
    Console.WriteLine("TESTB: " + realB.ToString());
}
Now your results are likely to be more accurate. Let us know the timings now.
Now try changing your tests like this:
public int TestA()
{
    var gc0 = GC.CollectionCount(0);
    using (var u = new UnitOfWork())
    {
        var s = Stopwatch.StartNew();
        var x = 0;
        var repo = u.Repository<MyEntity>();
        var code = "ABCD".First().ToString();
        while (x < 10000)
        {
            var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
            x++;
        }
        s.Stop();
    }
    return GC.CollectionCount(0) - gc0;
}
This should determine how many generation 0 garbage collections are being performed. That might indicate that the performance issues are with your tests and not with the SQL.

I will list some tests you might want to try to help you narrow down the differences between the operations.
Check the actual SQL code
Turn on the debug log for the queries or check the SQL Server Express logs. This is important since the EF engine should optimize the statements, and you can see what is really being sent to the DB.
As you said, the First operation should be faster, since there are optimized SQL operators for that. Single should be slower, since it has to validate all the values, and it scales with the number of rows.
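One way to turn on that log in EF6 is the Database.Log hook; a minimal sketch (the MyContext class is hypothetical, and the MyEntity shape is guessed from the question):

using System;
using System.Data.Entity;
using System.Linq;

public class MyEntity
{
    public int Id { get; set; }
    public string Code { get; set; }
    public int CodeOrder { get; set; }
    public string Name { get; set; }
}

public class MyContext : DbContext
{
    public DbSet<MyEntity> MyEntities { get; set; }
}

class Program
{
    static void Main()
    {
        using (var context = new MyContext())
        {
            // Every command EF sends to the database is written to the console,
            // including parameter values and execution times.
            context.Database.Log = Console.Write;

            var name = context.MyEntities
                .FirstOrDefault(e => e.Code == "A" && e.CodeOrder == 0)?.Name;
        }
    }
}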
Use the real SQL on the database for a reference test
Once you have the real SQL, you can also check the difference in elapsed time on the database directly. Implement the same test on the DB, in a stored procedure maybe, and see what happens.
Try the built-in LINQ for comparison
I don't know if you already did it for the test, but try using the native LINQ methods for comparison.
I ran many tests here using LINQ and there was no difference between the two statements you presented, so it could actually be the expressions. (I used SQL Server Compact Edition, by the way.)
Also, just for the sake of saying it, remember to create indexes for columns involved in heavy operations ;)
EF 6.1 has this feature built-in now.
[Index]
public string MyProperty { get; set; }
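For reference, a slightly fuller code-first sketch (the index name, the MaxLength value, and the choice of columns are illustrative, not from the question) of how the EF 6.1 IndexAttribute could cover both columns used in the filter:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class MyEntity
{
    public int Id { get; set; }

    // SQL Server cannot index nvarchar(max), so cap the length before indexing the string column.
    [MaxLength(10)]
    [Index("IX_MyEntity_Code_CodeOrder", Order = 1)]
    public string Code { get; set; }

    [Index("IX_MyEntity_Code_CodeOrder", Order = 2)]
    public int CodeOrder { get; set; }

    public string Name { get; set; }
}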
Let me know if it was helpful.

Related

Obtaining entities from DbSet from a list of matching objects

I'm using Entity Framework Core 6 and I want to find a series of entities in a DbSet. The entities I want to obtain are the ones that match some properties in a list of input objects.
I've tried something like this:
public IEnumerable<MyEntity> FindEntities(IEnumerable<MyEntityDtos> entries)
{
return dbContext.MyDbSet.Where(r => entries.Any(e => e.Prop1 == r.Prop1 && e.Prop2 == r.Prop2));
}
But I get the classic EF Core exception saying that my LINQ cannot be translated to a database query (the problem in particular is the entries.Any(...) call).
I know I can just loop over the list of entries and obtain the entities one by one from the DbSet, but that is very slow. I was wondering if there is a more efficient way to do this in EF Core that I don't know about.
I think this should work:
public IEnumerable<MyEntity> FindEntities(IEnumerable<MyEntityDtos> entries)
{
    var props1 = entries.Select(x => x.Prop1).ToArray();
    var props2 = entries.Select(x => x.Prop2).ToArray();
    return dbContext.MyDbSet.Where(r => props1.Contains(r.Prop1) && props2.Contains(r.Prop2));
}
In the end, I've done this:
public static IEnumerable<MyEntity> GetRangeByKey(this DbSet<MyEntity> dbSet, IEnumerable<MyEntity> toFind)
{
    // Deduplicate the keys, then materialize them into an array so they can be sliced into chunks.
    var keys = new HashSet<string>(toFind.Select(e => e.Id)).ToArray();
    IEnumerable<MyEntity> result = null;
    for (int i = 0; i < keys.Length; i += 1000)
    {
        var keyChunk = keys[i..Math.Min(i + 1000, keys.Length)];
        var res = dbSet.Where(x => keyChunk.Any(k => x.Id == k));
        if (result == null)
        {
            result = res;
        }
        else
        {
            result = result.Concat(res);
        }
    }
    return result;
}
Basically I get the keys to find in a HashSet and use them to perform a Where query, which is translated to a SQL IN clause, which is quite fast. I do it in chunks because there's a maximum number of values you can put in an IN clause before the DB engine refuses it.
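For reference, a minimal sketch of the underlying idea (the extension class and method names are illustrative): a Contains call over a local collection is what EF Core translates into the IN clause, so when the key count is small a single un-chunked query is enough:

using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public static class LookupExtensions
{
    // EF Core translates keys.Contains(e.Id) into WHERE Id IN (...).
    public static List<MyEntity> GetByKeys(this DbContext dbContext, IEnumerable<string> ids)
    {
        var keys = ids.Distinct().ToList();
        return dbContext.Set<MyEntity>()
            .Where(e => keys.Contains(e.Id))
            .ToList();
    }
}

The chunked method above applies the same translation once per batch of up to 1000 keys.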

Mongodb AsQueryable() Performance

I have code like this where I want to query MongoDB using LINQ.
I get an AsQueryable from the MongoDB collection.
public IEnumerable<IVideo> GetVideos()
{
    var collection = database.GetCollection<IVideo>("Videos");
    return collection.AsQueryable();
}
I call it like so,
var finalList = Filter2(Filter1(GetVideos())).Skip(2).Take(30);
foreach (var v in finalList)
{
    ....
}
Functions with the queries.
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
    return list.Where(q => q.Categorized);
}
public IEnumerable<IVideo> Filter2(IEnumerable<IVideo> list)
{
    var query = from d in list
                where d.File == "string1" || d.File == "string2"
                select d;
    return query;
}
My code works fine. I have it hosted in IIS with around 50,000 records, and the queries are a bit more complex than the example. My worker process spikes to 17% and takes a few seconds to execute when the foreach is called. This is ridiculously high for such a small amount of data.
I have a couple of questions.
Is the query being executed by .NET or by MongoDB? If it is executed by MongoDB, why is my worker process taking such a hit?
What steps can I take to improve the execution time of the query and reduce the server load?
Thanks
You're downloading all entries client-side by accident
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
    var list = list.Where(q=>q.Categorized)
}
Typing the filters as IEnumerable<IVideo> means the Where clauses run client-side with LINQ to Objects: the full, unfiltered collection is pulled down from MongoDB and filtered in memory. Change the filter methods to accept and return IQueryable.
EDIT:
The code you posted:
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
    var list = list.Where(q=>q.Categorized)
}
Does not compile.
Your code should look like this (note that GetVideos also needs to return IQueryable<IVideo> rather than IEnumerable<IVideo> so the whole chain stays server-side):
public IQueryable<IVideo> Filter1(IQueryable<IVideo> qVideos)
{
    return qVideos.Where(q => q.Categorized);
}
public IQueryable<IVideo> Filter2(IQueryable<IVideo> qVideos)
{
    return qVideos
        .Where(e => e.File == "string1" || e.File == "string2");
}
public void DoSomething()
{
    // This is the query; in debug mode you can inspect the actual query generated under a property called 'DebugView'.
    var qVideos = Filter2(Filter1(GetVideos()))
        .Skip(1)
        .Take(30);
    // This runs the actual query and loads the results client-side.
    var videos = qVideos.ToList();
    // Now iterate.
    foreach (var video in videos)
    {
    }
}

making Linq sum better

I want to find all the physical card effects from the
List<CardEffect> opponenteffect
This code block is what I came up with:
int netPhysicalDamage = oponenteffect.FindAll(
    ce => ce.type == CardEffect.Type.physical
).Sum(ce => ce.amount);
Is it possible to make this a single LINQ function call?
Edit: "Single Linq function call" was not a good idea to make this better
Edit2: The question was not very good. I was unfamiliar with linq and I thought was doing something wrong with this part of code. Thanks for you attention.
The conditional operator is your friend in this case:
int result = oponenteffect.Sum(ce => ce.type == CardEffect.Type.physical ? ce.amount : 0);
You should really be using a Where and a Sum. So you should do this:
int netPhysicalDamage = oponenteffect.Where(
    ce => ce.type == CardEffect.Type.physical
).Sum(ce => ce.amount);
To make my answer 'valid' for your question (because that's what SO voters will pick up on), I am assuming that by "single call" you actually mean it will only process the query once. In this case it will, as Where returns an IEnumerable and doesn't execute until the result is required by a subsequent call. So in this case Sum is the only 'call'.
Also, you edited your question to suggest you do not require it to be a single function anyway.
Let's look at this code:
var ams = 0.0;
var bms = 0.0;
for (var i = 0; i < 10000; i++)
{
    var sw = Stopwatch.StartNew();
    var a = Enumerable.Range(0, 10000).Where(x => x % 2 == 0).Sum();
    sw.Stop();
    ams += sw.Elapsed.TotalMilliseconds;

    sw = Stopwatch.StartNew();
    var b = Enumerable.Range(0, 10000).Sum(x => x % 2 == 0 ? x : 0);
    sw.Stop();
    bms += sw.Elapsed.TotalMilliseconds;
}
Console.WriteLine(ams);
Console.WriteLine(bms);
The results I get are similar to this:
853.603399999998
1268.61419999997
This means that the two separate LINQ calls (Where followed by Sum) are "better" than the single .Sum(...) call with the conditional.

Slowness when chaining LINQ queries

I am chaining LINQ queries as shown below. I am trying to find the cause of the slowness of query.ToList(); the SQL queries are fast (milliseconds), but the code takes a minute. The reason for chaining is to reuse the repository function.
Are there any obvious reasons for the slowness here?
How could I optimize this?
How can I check the actual SQL query executed when running query.ToList()?
//Client
var query = _service.GetResultsByStatus(status, bType, tType);
var result = query.ToList(); // takes a long time to execute

//Service function
public IEnumerable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
    IEnumerable<CustomResult> result = null;
    result = repo.GetResults(bType).Where(item => item.tStatus == status && (tType == null || item.tType == tType))
        .Select(item => new CustomResult {
            A = item.A,
            B = item.B,
        });
    return result;
}

// Repository Function (reused in many places)
public IEnumerable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}
Your .Where(item => item.tStatus == status && (tType == null || item.tType == tType)) and the .Select are being done "locally" on your PC... Tons of useless rows and columns are returned by the SQL server only to be "filtered" on your PC.
public IEnumerable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}
Change it to
public IQueryable<my_model> GetResults(string bType) {
Normally IEnumerable<> means "downstream LINQ will be executed locally", while IQueryable<> means "downstream LINQ will be executed on the server". In this case the Where and the Select are "downstream" from the conversion of the query into an IEnumerable<>. Note that while it is possible (and easy) to convert an IQueryable<> to an IEnumerable<>, the opposite normally isn't possible. AsQueryable<> creates a "fake" IQueryable<> that is executed locally and is mainly useful in unit tests.
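A sketch of how the whole chain might look once the repository exposes IQueryable (this reuses the question's own snippets; the surrounding classes and DbContext wiring are assumed):

// Repository function: returning IQueryable keeps the query composable server-side.
public IQueryable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}

// Service function: the Where and the Select are now composed into the same SQL statement.
public IQueryable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
    return repo.GetResults(bType)
        .Where(item => item.tStatus == status && (tType == null || item.tType == tType))
        .Select(item => new CustomResult {
            A = item.A,
            B = item.B,
        });
}

// Client: the SQL executes only here, and only the projected columns of the matching rows come back.
var result = _service.GetResultsByStatus(status, bType, tType).ToList();

As for seeing the actual SQL, if this is EF6 then assigning dbContext.Database.Log = Console.Write before calling ToList() is one way to print the generated query.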

Can an IQueryable be reused with different values of the referenced local variables?

The intention of the following code is to find an available ID.
My thinking when I wrote it was that each time query.Any() runs, the query runs again with the newly incremented value of the local id variable. This grew from my knowledge that id doesn't get evaluated until the query is executed.
From the results of my experiment, I can see this isn't how it works, and I'd like to know how to rewrite the code to accomplish its stated goal while keeping the style of LINQ to EF.
I know how to rewrite it in the trivial way; my goal is to better understand deferred execution of LINQ expressions and their execution context.
int id = 4700;
var query = from c in Advertisers
            where c.ID == id
            select c;
int loopCount = 0;
while (query.Any())
{
    if (++loopCount == 5)
    {
        Console.WriteLine("Cannot find a safe id.");
        break;
    }
    Console.WriteLine("Already a record with " + id);
    id++;
}
Console.WriteLine("The available id is " + id);
From the code above, the problem is that id is updated but your IQueryable object is not. You would need to change the value of id in the where clause of the query. The easiest way I know of is to encapsulate the initial query and then change whatever I need.
For example:
using (var context = new ModelContainer())
{
    IQueryable<Advertiser> queryAdvertisers =
        from c in context.Advertisers
        select c;

    for (int i = 0; i < 100; i++)
    {
        if (queryAdvertisers.Where(a => a.ID == i).Any())
        {
            Console.WriteLine("ID:{0} already exists.", i);
        }
        else
        {
            Console.WriteLine("ID:{0} does not exist.", i);
        }
    }
}
Normally I would refactor queryAdvertisers out into a method or class.
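For instance, a small sketch of that refactoring (the method names are illustrative), which keeps the LINQ to Entities style while making the changing value an explicit parameter, so each probe builds a fresh query:

// Builds a fresh IQueryable for a given id, so the current value is always
// captured at the moment the query executes.
private static IQueryable<Advertiser> AdvertisersWithId(ModelContainer context, int id)
{
    return context.Advertisers.Where(a => a.ID == id);
}

// Probes ids starting at startId until one is free.
private static int FindAvailableId(ModelContainer context, int startId)
{
    var id = startId;
    while (AdvertisersWithId(context, id).Any())
    {
        id++;
    }
    return id;
}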
