MongoDB AsQueryable() performance - C#

I have code like this where I want to query MongoDB using LINQ.
I get an AsQueryable from the MongoDB collection:
public IEnumerable<IVideo> GetVideos()
{
var collection = database.GetCollection<IVideo>("Videos");
return collection.AsQueryable();
}
I call it like so,
var finalList = Filter2(Filter1(GetVideos())).Skip(2).Take(30);
foreach(var v in finalList)
{
....
}
These are the functions containing the queries:
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
return list.Where(q => q.Categorized);
}
public IEnumerable<IVideo> Filter2(IEnumerable<IVideo> list)
{
var query = from d in list
where d.File == "string1" || d.File == "string2"
select d;
return query;
}
My code works fine. I have the code hosted in IIS with around 50,000 records, and the queries are a bit more complex than the example. My worker process spikes to 17% and takes a few seconds to execute when the foreach runs. This is ridiculously high for such a small amount of data.
I have a couple of questions.
Is the query being executed by .NET or by MongoDB? If it is executed by MongoDB, why is my worker process taking such a hit?
What steps can I take to improve the execution time of the query and to reduce the server load?
Thanks

You're downloading all entries client-side by accident
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
var list = list.Where(q=>q.Categorized)
}
Passing the queryable around as IEnumerable<T> means the LINQ-to-Objects operators execute it client-side: the whole collection is fetched and then filtered in memory. Change the filter methods to accept and return IQueryable<T> so the filters are composed into the MongoDB query itself.
EDIT:
The code you posted:
public IEnumerable<IVideo> Filter1(IEnumerable<IVideo> list)
{
var list = list.Where(q=>q.Categorized)
}
Does not compile.
Your code should look like this:
public IQueryable<IVideo> Filter1(IQueryable<IVideo> qVideos)
{
return qVideos.Where(q => q.Categorized);
}
public IQueryable<IVideo> Filter2(IQueryable<IVideo> qVideos)
{
return qVideos
.Where(e => e.File == "string1" || e.File == "string2");
}
public void DoSomething()
{
// This is the query, in debug mode you can inspect the actual query generated under a property called 'DebugView'
var qVideos = Filter2(Filter1(GetVideos()))
.Skip(1)
.Take(30);
// This runs the actual query and loads the results client side.
var videos = qVideos.ToList();
// now iterate the results
foreach (var video in videos)
{
}
}
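For the chain above to stay server-side all the way, GetVideos itself must also expose IQueryable<IVideo> rather than IEnumerable<IVideo>. A minimal sketch, assuming the MongoDB .NET driver, whose AsQueryable() returns a queryable implementing IQueryable<IVideo>:

```csharp
public IQueryable<IVideo> GetVideos()
{
    var collection = database.GetCollection<IVideo>("Videos");
    // Keeping the static type IQueryable<IVideo> lets the later
    // Where/Skip/Take calls be composed into a single MongoDB query
    // instead of being applied in memory.
    return collection.AsQueryable();
}
```

With this signature, the Skip/Take paging from the original call site is also pushed to the server, so only 30 documents ever cross the wire.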

Related

Obtaining entities from DbSet from a list of matching objects

I'm using Entity Framework Core 6 and I want to find a series of entities in a DbSet. The entities I want to obtain are the ones that match some properties in a list of input objects.
I've tried something like this:
public IEnumerable<MyEntity> FindEntities(IEnumerable<MyEntityDtos> entries)
{
return dbContext.MyDbSet.Where(r => entries.Any(e => e.Prop1 == r.Prop1 && e.Prop2 == r.Prop2));
}
But I get the classic EF Core exception saying that my LINQ cannot be translated to a database query (the problem in particular is the entries.Any(...) instruction)
I know I can just loop over the list of entries and obtain the entities one by one from the DbSet, but that is very slow, I was wondering if there was a more efficient way to do this in EF Core that I don't know about.
I think this should work:
public IEnumerable<MyEntity> FindEntities(IEnumerable<MyEntityDtos> entries)
{
var props1=entries.Select(x=>x.Prop1).ToArray();
var props2=entries.Select(x=>x.Prop2).ToArray();
return dbContext.MyDbSet.Where(r => props1.Contains(r.Prop1) && props2.Contains(r.Prop2));
}
In the end, I've done this:
public static IEnumerable<MyEntity> GetRangeByKey(this DbSet<MyEntity> dbSet, IEnumerable<MyEntity> toFind)
{
    var keys = toFind.Select(e => e.Id).Distinct().ToArray();
    IEnumerable<MyEntity> result = null;
    for (int i = 0; i < keys.Length; i += 1000)
    {
        var keyChunk = keys[i..Math.Min(i + 1000, keys.Length)];
        var res = dbSet.Where(x => keyChunk.Contains(x.Id));
        result = result == null ? res : result.Concat(res);
    }
    return result ?? Enumerable.Empty<MyEntity>();
}
Basically I collect the distinct keys to find and use them in a Where query with Contains, which is translated to a SQL IN clause, which is quite fast. I do it in chunks because there's a maximum number of values you can put in an IN clause before the DB engine refuses it.

Slowness when chaining LINQ queries

I am chaining LINQ queries as shown below. I am trying to find the cause of the slowness of query.ToList();. The SQL queries themselves are fast (milliseconds), but the code takes a minute. The reason for chaining is to reuse the repository function.
Is there any obvious reasons for slowness here?
How could I optimize this ?
How can I check the actual SQL query executed when running query.ToList();?
//Client
var query = _service.GetResultsByStatus(status, bType, tType);
var result = query.ToList(); //takes a long time to execute
//Service function
public IEnumerable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
IEnumerable<CustomResult> result = null;
result = repo.GetResults(bType).Where(item => item.tStatus == status && (tType == null || item.tType == tType))
.Select(item => new CustomResult {
A = item.A,
B = item.B,
});
return result;
}
// Repository Function (reused in many places)
public IEnumerable<my_model> GetResults(string bType) {
return from p in dbContext.my_model()
where p.bType.Equals(bType)
select p;
}
Your .Where(item => item.tStatus == status && (tType == null || item.tType == tType)) and the .Select are being executed "locally" on your PC: tons of unnecessary rows and columns are returned by the SQL server only to be "filtered" on your PC.
public IEnumerable<my_model> GetResults(string bType) {
return from p in dbContext.my_model()
where p.bType.Equals(bType)
select p;
}
Change it to
public IQueryable<my_model> GetResults(string bType) {
Normally IEnumerable<> means "downstream LINQ will be executed locally", while IQueryable<> means "downstream LINQ will be executed on the server". In this case the Where and the Select are "downstream" of the conversion of the query to an IEnumerable<>. Note that while it is possible (and easy) to convert an IQueryable<> to an IEnumerable<>, the opposite normally isn't possible: AsQueryable() creates a "fake" IQueryable<> that still executes locally and is mainly useful in unit tests.
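Carrying that change through the rest of the chain from the question, a sketch where every layer keeps the static type IQueryable<> so the Where and Select compose into one SQL statement (names taken from the question's code):

```csharp
// Repository: unchanged body, IQueryable return type.
public IQueryable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}

// Service: also returns IQueryable, so ToList() on the client
// triggers a single SQL query containing all filters and the projection.
public IQueryable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
    return repo.GetResults(bType)
        .Where(item => item.tStatus == status && (tType == null || item.tType == tType))
        .Select(item => new CustomResult {
            A = item.A,
            B = item.B,
        });
}
```

The client call site stays exactly as posted; only the declared return types change.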

Linq to Entities performance difference between Expression/Func

I was just testing a simple query that I'm accessing in different ways, but the speed of each can vary by up to 2 seconds. I was hoping someone could clarify why this is the case. My project is in its very early stages, so I thought I'd make sure I'm doing it right before it gets too big.
Admittedly, my testing style isn't perfect, but I think it's good enough for this.
I'm using a generic Repository and UnitOfWork, and I hit the DB (SQL Express on my local machine) 10,000 times in this while loop. The table only has 64 records. Tests are run in Release mode.
[TestMethod]
public void MyTestMethod()
{
using (var u = new UnitOfWork())
{
TestA(u);
TestB(u);
}
}
TestA (Func):
public void TestA(UnitOfWork u)
{
Stopwatch s = Stopwatch.StartNew();
s.Start();
var x = 0;
var repo = u.Repository<MyEntity>();
var code = "ABCD".First().ToString();
while (x < 10000)
{
var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
x++;
}
s.Stop();
Console.WriteLine("TESTA: " + s.Elapsed);
}
TestB (Expression):
public void TestB(UnitOfWork u)
{
Stopwatch s = Stopwatch.StartNew();
s.Start();
var x = 0;
var repo = u.Repository<MyEntity>();
var code = "ABCD".First().ToString();
while (x < 10000)
{
var testCase = repo.First(w => w.Code == code && w.CodeOrder == 0).Name;
x++;
}
s.Stop();
Console.WriteLine("TESTB: " + s.Elapsed);
}
Even though I'm using the calls First() and Single(), they're not the built-in LINQ calls. They're part of my repository.
Single() - Func (IEnumerable):
public TEntity Single(Func<TEntity, bool> predicate)
{
return dbSet.FirstOrDefault(predicate);
}
First() - Expression (IQueryable):
public TEntity First(Expression<Func<TEntity, bool>> predicate)
{
return dbSet.FirstOrDefault(predicate);
}
Output:
Test Name: MyTestMethod
Test Outcome: Passed
Result StandardOutput:
TESTA: 00:00:02.4798818
TESTB: 00:00:03.4212112
First() with an Expression<Func<...>> parameter is an extension method on IQueryable<T> and is used by query providers, like LINQ to Entities. The expression tree you provide is transformed into a proper SQL query, which is sent to the DB, and only the necessary rows are returned to your application.
First() with a Func<...> parameter is an extension method on IEnumerable<T> and is used by LINQ to Objects, which means all the records are fetched from the database into application memory, and then the element is searched for in memory with a linear scan.
You should definitely use the one from IQueryable<T>, because it will be more efficient (as database is optimized to perform queries).
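Overload resolution is what decides which path you get: a lambda converts to an Expression<Func<...>> only when the parameter is declared as one, while a Func<...> parameter receives a compiled delegate, which silently pushes the call onto the IEnumerable<T> extensions. The repository methods from the question, annotated with which FirstOrDefault each one binds to:

```csharp
// Binds to Queryable.FirstOrDefault: the expression tree is translated
// to SQL, so only the matching row is returned from the database.
public TEntity First(Expression<Func<TEntity, bool>> predicate)
{
    return dbSet.FirstOrDefault(predicate);
}

// Binds to Enumerable.FirstOrDefault: the compiled delegate cannot be
// translated, so every row is materialized and scanned client-side.
public TEntity Single(Func<TEntity, bool> predicate)
{
    return dbSet.FirstOrDefault(predicate);
}
```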
This is not an answer, but just trying to make sure that the test results are more reliable.
Try writing your tests like this:
public long TestA()
{
using (var u = new UnitOfWork())
{
var s = Stopwatch.StartNew();
var x = 0;
var repo = u.Repository<MyEntity>();
var code = "ABCD".First().ToString();
while (x < 10000)
{
var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
x++;
}
s.Stop();
return s.ElapsedMilliseconds;
}
}
(Obviously TestB is just a minor variant.)
And then your test method becomes:
[TestMethod]
public void MyTestMethod()
{
var dummyA = TestA();
var dummyB = TestB();
var realA = 0L;
var realB = 0L;
for (var i = 0; i < 10; i++)
{
realA += TestA();
realB += TestB();
}
Console.WriteLine("TESTA: " + realA.ToString());
Console.WriteLine("TESTB: " + realB.ToString());
}
Now your results are likely to be more accurate. Let us know the timings now.
Now try changing your tests like this:
public int TestA()
{
var gc0 = GC.CollectionCount(0);
using (var u = new UnitOfWork())
{
var s = Stopwatch.StartNew();
var x = 0;
var repo = u.Repository<MyEntity>();
var code = "ABCD".First().ToString();
while (x < 10000)
{
var testCase = repo.Single(w => w.Code == code && w.CodeOrder == 0).Name;
x++;
}
s.Stop();
}
return GC.CollectionCount(0) - gc0;
}
This should determine how many generation 0 garbage collections are being performed. That might indicate that the performance issues are with your tests and not with the SQL.
I will list some tests you might wanna try to help you narrow the differences between the operations.
Check the actual SQL code
Turn on the debug log for the queries or check the SQL Server Express logs. This matters because the EF engine optimizes the statements, and you can see what is really being sent to the DB.
As you said, the First operation should be faster, since there are optimized SQL operators for it. Single should be slower, since it has to validate that only one row matches, and that cost scales with the number of rows.
Use the real SQL on the database for a reference test
Once you have the real SQL you can also check the elapsed time on the database directly. Implement the same test as the C# one on the DB, maybe as a Stored Procedure, and see what happens.
Try the built-in LINQ for comparison
I don't know if you already did it for the test, but try to use the native LINQ operators for comparison.
I ran many tests here using LINQ and there were no differences between the two statements you presented, so it actually could be the Expressions. (I used SQL Server CE, btw.)
Also, just for the sake of saying it, remember to create indexes for the columns involved in heavy operations ;)
EF 6.1 has this feature built-in now.
[Index]
public String MyProperty{ get; set; }
Let me know if it was helpful.

LINQ combine select and update query

I have a LINQ database operation that looks like this:
using (SomeDC TheDC = new SomeDC())
{
var SomeData = (from .... select x).ToList();
if (SomeData.Count() > 0)
{
foreach (.... in SomeData) { x.SomeProp = NewProp; }
TheDC.SubmitChanges();
}
}
As you can see, I'm reading a list, then I'm updating this list, and finally I'm writing it back to the DB. Is there a way to combine this operation in just one query?
Is there a way to combine this operation in just one query?
Not in Linq - you could execute one SQL statement directly if there's a pattern to the updates, but querying and updating are two separate methods.
No, you should keep using a simple loop to manipulate your objects. There is no method for updating in LINQ. As the name suggests (Language Integrated Query), LINQ is for querying, not for updating.
You could manually generate the SQL for your update operation and then execute it via the SqlQuery method, or you can execute Stored Procedures with Entity Framework if you wish.
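A sketch of that direct-SQL route for the LINQ to SQL DataContext shown in the question (ExecuteCommand is the DataContext equivalent; the table name, column names, and FilterValue here are placeholders, not taken from the original code):

```csharp
using (SomeDC TheDC = new SomeDC())
{
    // One round trip: the database applies the update set-wise and
    // no entities are materialized client-side.
    TheDC.ExecuteCommand(
        "UPDATE SomeTable SET SomeProp = {0} WHERE SomeFilter = {1}",
        NewProp, FilterValue);
}
```

The {0}/{1} placeholders are parameterized by ExecuteCommand, so the values are not concatenated into the SQL string.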
There isn't really a single select-and-update statement, but you could shorten your code and drop the ToList() call.
Additionally, the check of the count is unnecessary, as the foreach simply wouldn't iterate over an empty sequence.
using (SomeDC TheDC = new SomeDC())
{
var SomeData = (from .... select x);
foreach (.... in SomeData) { x.SomeProp = NewProp; }
TheDC.SubmitChanges();
}
Alternatively if you are desperate to just do it in a single execute you can roll your own extension method to help.
public static class DatabaseHelper {
public static void Apply<T>(this IEnumerable<T> collection, Action<T> action) {
if (collection == null) { // Sanity check
return;
}
foreach (var item in collection) {
action(item);
}
}
}
Then your code would be
using (SomeDC TheDC = new SomeDC()) {
(from ... select x).Apply(x => x.SomeProp = NewProp);
TheDC.SubmitChanges();
}

Using Entity Framework, which method is more efficient?

I have some code that changes a value on some rows in my database inside a loop. I'm just wondering what the most efficient way of filtering my data first is. I'll give an example:-
With the class:-
public class myObj
{
int id {get;set;}
string product {get; set;}
string parent{get;set;}
bool received {get;set;}
}
And the DbContext:-
public class myCont:DbContext
{
public DbSet<myObj> myObjs {get;set;}
}
Is it better to do this:-
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
myObj ob = data.myObjs.Where(o => o.parent == "number1");
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
Or:-
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
foreach(int i in list)
{
data.myObjs.First(o => o.parent == "number1" && o.id == i && o.received != true).received = true;
}
Or is there no difference?
Not sure how you get to compile your code example above.
In your myObj object, the received property is an int, yet you are evaluating it against a bool, which should cause the line o.received != true to produce the error Cannot apply operator '!=' to operands of type 'int' and 'bool'.
To Check the SQL
Once the code compiles, use SQL Profiler to see what SQL is generated.
That will show you the constructed SQL statements.
Benchmarking
The below is a very crude description of only one possible way you can benchmark your code execution.
Wrap your code into a method, for example:
public void TestingOperationOneWay()
{
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
myObj ob = data.myObjs.Where(o => o.parent == "number1");
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
}
And:
public void TestingOperationAnotherWay()
{
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
foreach(int i in list)
{
data.myObjs.First(o => o.parent == "number1" && o.id == i && o.received != true).received = true;
}
}
Create a method which iterates x times over each method using the Stopwatch, similar to this:
private static TimeSpan ExecuteOneWayTest(int iterations)
{
var stopwatch = Stopwatch.StartNew();
for (var i = 1; i < iterations; i++)
{
TestingOperationOneWay();
}
stopwatch.Stop();
return stopwatch.Elapsed;
}
Evaluate the results similar to this:
static void RunTests()
{
const int iterations = 100000000;
var timespanRun1 = ExecuteOneWayTest(iterations);
var timespanRun2 = ExecuteAnotherWayTest(iterations);
// Evaluate Results....
}
In the case of a choice between your two queries, I agree that they would both execute similarly, and benchmarking is an appropriate response. However, there are some things you can do to optimize. For example, you could use the AsEnumerable method to force evaluation with the IEnumerable Where versus the IQueryable Where (the difference between translating into SQL and executing against the data source, or handling the where within the object hierarchy). Since you appear to be manipulating only properties (and not entity relationships), you could do this:
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
var ob = data.myObjs.Where(o => o.parent == "number1").AsEnumerable();
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
Doing so would avoid the penalty of hitting the database for each record (possibly avoiding network latency), but would increase your memory footprint. Here's an associated link further explaining this idea. It really depends on where you can absorb the performance cost.
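If the per-record round trips are the real cost, another option in the same spirit is to fetch all affected rows in a single query with Contains (which LINQ to Entities translates to a SQL IN clause) and then flip the flag in memory. A sketch reusing the names from the question; SaveChanges assumes an Entity Framework DbContext:

```csharp
int[] list; /* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
// One SELECT with WHERE parent = 'number1' AND received = 0 AND id IN (...)
var rows = data.myObjs
    .Where(o => o.parent == "number1" && !o.received && list.Contains(o.id))
    .ToList();
foreach (var row in rows)
{
    row.received = true;
}
data.SaveChanges(); // persists all the updates in one call
```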
