Large database linq query to sql server takes forever - c#

Background
So, I am using a React frontend and a .NET Core 3.1 backend for a web app where I display a view with a list of data. The list is often several thousand rows long; in this case it's around 7,500. We virtualize it to prevent sluggishness. Along with the data, every row has a column showing the latest log change someone made on that row. The logs and the rest of the data for every row come from two different applications with their own databases. The log data, which consists of the name and the date the log entry was made, is also rendered for every row.
The problem
When you route to the page, a useEffect fires that fetches the rows from one of the databases. When I get the response, I extract all of the ids from the data and then post that list to the other endpoint to request the latest log for every id. This endpoint queries the logging database. The number of ids I am passing to the endpoint is about 7,200+. It won't always be this many, but sometimes it is.
Troubleshooting
This is the query that is giving me trouble in the log endpoint
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    var results = await context.Set<LogEvent>()
        .Where(x => logIds.Ids.Contains(x.Id))
        .ToListAsync(); // 55,600 entities

    results = results
        .GroupBy(x => x.ContextId)
        .Select(x => x.OrderByDescending(p => p.CreationDate).First())
        .ToList(); // 7,500 entities

    var transformed = results.Select(MapEntityToLogEvent).ToList();

    return Ok(transformed);
}
The first db query takes around 25 seconds (!) and returns around 56,000 entities.
The second LINQ statement takes about 2 seconds and returns around 7,500 entities, and the mapping takes around 1 second.
The database is SQL Server, and there are three indexes, one of which is on Id; the other two are irrelevant for this question.
I have tried different queries and AsNoTracking, but to no avail.
Obviously this is horrible. Do you know of a way to optimize this query?

There are two ways to improve your query:
Pure EF Core
We can rewrite the LINQ query so that it is fully translatable to SQL and avoid bringing unnecessary records to the client side. Note that your GroupBy would also work as written with EF Core 6:
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    var filtered = context.Set<LogEvent>()
        .Where(x => logIds.Ids.Contains(x.Id));

    var results =
        from d in filtered.Select(d => new { d.ContextId }).Distinct()
        from r in filtered
            .Where(r => r.ContextId == d.ContextId)
            .OrderByDescending(r => r.CreationDate)
            .Take(1)
        select r;

    // execute the query first, then run the mapping client-side
    var transformed = (await results.ToListAsync())
        .Select(MapEntityToLogEvent)
        .ToList();

    return Ok(transformed);
}
Using a third-party extension
With linq2db.EntityFrameworkCore we can use the full power of SQL and build the most efficient query for this case.
A big list of ids can quickly be copied into a temporary table and used in the resulting query.
Retrieving only the latest record per ContextId can be done efficiently with the window function ROW_NUMBER.
Disclaimer: I'm the maintainer of this library.
// helper class for creating the temporary table
class IdsTable
{
    public int Id { get; set; }
}

public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    using var db = context.CreateLinqToDBConnection();

    TempTable<IdsTable>? idsTable = null;
    var results = context.Set<LogEvent>().AsQueryable();

    try
    {
        // avoid using a temporary table for a small number of ids
        if (logIds.Ids.Count() < 20)
        {
            results = results.Where(x => logIds.Ids.Contains(x.Id));
        }
        else
        {
            // initialize the temporary table
            idsTable = await db.CreateTempTableAsync(
                logIds.Ids.Select(id => new IdsTable { Id = id }),
                tableName: "temporaryIds");

            // filter via join
            results =
                from t in idsTable
                join r in results on t.Id equals r.Id
                select r;
        }

        // select the latest log per ContextId
        results =
            from r in results
            select new
            {
                r,
                rn = Sql.Ext.RowNumber().Over()
                    .PartitionBy(r.ContextId)
                    .OrderByDesc(r.CreationDate)
                    .ToValue()
            } into s
            where s.rn == 1
            select s.r;

        var transformed = await results
            .Select(MapEntityToLogEvent)
            .ToListAsyncLinqToDB(); // we have to use our extension because of a name collision with EF Core's ToListAsync

        return Ok(transformed);
    }
    finally
    {
        // drop the temporary table if it was used
        idsTable?.Dispose();
    }
}
Warning
Also note that the log count will grow over time, so you should limit the result set by date, and probably also cap the number of retrieved records.
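As a hypothetical sketch of such a limit (the 30-day cutoff and the 1,000-row cap are illustrative numbers, not from the question), the filter can be composed onto the same queryable before it executes:

```csharp
// assumption: "results" is the IQueryable<LogEvent> built in the answer above
var cutoff = DateTime.UtcNow.AddDays(-30);

var limited = results
    .Where(r => r.CreationDate >= cutoff) // ignore old log entries
    .Take(1000);                          // hard cap on the rows returned
```

Because both calls compose onto the IQueryable, the cutoff and the cap end up in the generated SQL rather than being applied in memory.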

Related

Send multiple queries to EF / SQL database

.NET 4.7.6 and Entity Framework 3.1.16
Is it possible to send multiple queries via EF or any other method when querying the database?
The problem I have is that I have to extract a lot of data from multiple tables with multiple queries, for example (I have omitted unnecessary info):
var companyValues = GetCompanyValues(companyId); // DB query
var allMemberScores = GetAllMemberScores(companyId); // DB query
var currentPeriodMemberScores = GetAllMemberScores(companyId, timespan); // DB query
var previousPeriodMemberScores = GetAllMemberScores(companyId, previousTimespan); // DB query

var currentDepartments = allMemberScores.Select(x => x.Department).Distinct();
if (currentDepartments?.Any() == true)
{
    foreach (var department in currentDepartments)
    {
        var current = currentPeriodMemberScores.Where(x => x.Department.Equals(department));
        var previous = previousPeriodMemberScores.Where(x => x.Department.Equals(department));
        var combinedCompanyValueScore = allMemberScores
            .Where(x => x.Department.Equals(department))
            .SelectMany(x => x.CompanyValueScores)
            .Average(x => x.Score);

        foreach (var companyValue in companyValues)
        {
            int? value = GetCompanyValueDifference(current, previous, companyValue); // DB query
        }

        int responseRate = GetResponseRate(surveysQuery, current); // DB query
    }
}
but this will send a lot of requests depending on the number of departments and company values, both of which are unbounded; could be 10, could be 20+, etc.
This is an inherited project, so there is existing data, and redesigning the database is on the back seat.
I have previously used Elasticsearch in another project, where I could define a search request that takes in multiple queries, give each query an id, and then pull the response for each of those ids from the returned object.
Is there anything like this that I could utilise? So could I build up a set of queries and then process all the results at once?
Thanks in advance :)
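One common way to cut wall-clock time for independent queries like the first four above is to run them concurrently, each on its own context instance (a DbContext is not thread-safe). This is only a hedged sketch: the factory, context, and entity names are illustrative, and IDbContextFactory requires EF Core 5+ (on EF Core 3.1 you would construct contexts yourself):

```csharp
// run the independent lookups in parallel, one DbContext per query;
// "contextFactory", "CompanyValues", and "MemberScores" are made-up names
var companyValuesTask = Task.Run(() =>
{
    using var ctx = contextFactory.CreateDbContext();
    return ctx.CompanyValues.Where(v => v.CompanyId == companyId).ToList();
});

var allScoresTask = Task.Run(() =>
{
    using var ctx = contextFactory.CreateDbContext();
    return ctx.MemberScores.Where(s => s.CompanyId == companyId).ToList();
});

await Task.WhenAll(companyValuesTask, allScoresTask);

var companyValues = companyValuesTask.Result;
var allMemberScores = allScoresTask.Result;
```

This does not reduce the number of queries, only overlaps their latency; the per-department queries inside the loop are better replaced by a single grouped query where possible.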

Why is Entity Framework having performance issues when calculating a sum

I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId)
{
    var portfolio = DbContext.Portfolios.FirstOrDefault(x => x.Id.Equals(portfolioId));
    if (portfolio == null) return 0m;

    return portfolio.Items
        .Where(i => i.Status == ItemStatus.Listed
                 && _activateStatuses.Contains(i.Category.Status))
        .Sum(i => i.Amount);
}
So I want to fetch the value for all my items that have a certain status of which their parent has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT i.id, i.amount
FROM items i
INNER JOIN categories c ON c.id = i.category_id
WHERE i.portfolio_id = @portfolioId
  AND i.status = 'listed'
  AND c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item>
{
    public ItemConfiguration()
    {
        ToTable("items");
        ...
        HasRequired(p => p.Portfolio);
    }
}

public class CategoryConfiguration : EntityTypeConfiguration<Category>
{
    public CategoryConfiguration()
    {
        ToTable("categories");
        ...
        HasMany(c => c.Products).WithRequired(p => p.Category);
    }
}
EDIT based on comments:
I didn't think it was important, but _activeStatuses is a list of enums:
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) which I map to an enum used in the application. And that is probably why EF cannot translate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items
    .Where(i => i.Status == ItemStatusConstants.Listed &&
                _activeStatuses.Contains(i.Category.Status))
    .Select(i => i.Amount)
    .Sum();

// Fast query
var value = DbContext.Portfolios
    .Where(p => p.Id.Equals(portfolioId))
    .SelectMany(p => p.Items.Where(i =>
        i.Status == ItemStatusConstants.Listed &&
        _activeStatuses.Contains(i.Category.Status)))
    .Select(i => i.Amount)
    .Sum();
The first version runs a LOT of small SQL queries, whereas the second combines everything into one bigger query. I'd expect even the first version to run a single query to get the portfolio value.
Calling portfolio.Items lazy loads the collection in Items and then executes the subsequent Where and Sum calls in memory. See also the Loading Related Entities article.
You need to compose the query directly on the DbContext so the Sum expression can be evaluated on the database server:
var portfolioValue = DbContext.Portfolios
    .Where(x => x.Id.Equals(portfolioId))
    .SelectMany(x => x.Items
        .Where(i => i.Status == ItemStatus.Listed &&
                    _activateStatuses.Contains(i.Category.Status))
        .Select(i => i.Amount))
    .Sum();
You also have to use the appropriate element type for the _activateStatuses instance, as the contained values must match the type persisted in the database. If the database persists string values, then you need to pass a list of strings:
var _activateStatuses = new string[] {"Active", "etc"};
You could use a LINQ expression to convert the enums to their string representations.
Notes
I would recommend you turn off lazy loading on your DbContext type. As soon as you do, you will start to catch issues like this at run time via exceptions and can then write more performant code.
I did not include error checking for when no portfolio is found, but you could extend this code accordingly.
Yep, CategoryStatusMapper.MapToEnum cannot be translated to SQL, forcing the Where to run in .NET. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required:
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
so that the Contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and the whole predicate can be translated to SQL.
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };

How to Performance Test This and Suggestions to Make Faster?

I seem to have written some very slow code, which gets slower when EF Core is involved.
Basically, I have a list of items that store attributes as a JSON string in the database, since I am storing many different kinds of items with different attributes.
I then have another table that contains the display order for each attribute, so when I send the items to the client I order the attributes based on that.
It is kinda slow, doing 700 records in about 18-30 seconds (from where I start my timer, not the whole block of code).
var itemDtos = new List<InventoryItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId);
var inventorySpecifications = dbContext.InventoryCategorySpecifications
    .Where(x => x.InventoryCategoryId == categoryId)
    .Select(x => x.InventorySpecification);

Stopwatch a = new Stopwatch();
a.Start();

foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);
    var specDtos = new List<SpecDto>();

    foreach (var inventorySpecification in inventorySpecifications.OrderBy(x => x.DisplayOrder))
    {
        if (specs.ContainsKey(inventorySpecification.JsonKey))
        {
            var value = specs.GetValue(inventorySpecification.JsonKey);
            var newSpecDto = new SpecDto()
            {
                Key = inventorySpecification.JsonKey,
                Value = value.ToString()
            };
            specDtos.Add(newSpecDto);
        }
    }

    var dto = new InventoryItemDto()
    {
        // create dto
    };

    itemDtos.Add(dto);
}
Now it goes crazy slow when I add some more columns to the EF query that I need info from.
In the // create dto area I access information from other tables:
var dto = new InventoryItemDto()
{
    // access brand columns
    // access company columns
    // access branch columns
    // access country columns
    // access state columns
};
Accessing these columns in the loop takes 6 minutes to process 700 rows.
I don't understand why it is so slow; it's the only change I really made, and I made sure to eager load everything.
It almost makes me think eager loading is not working, but I don't know how to verify whether it is.
var inventoryItems = dbContext.InventoryItems
    .Include(x => x.Branch).ThenInclude(x => x.Company)
    .Include(x => x.Branch).ThenInclude(x => x.Country)
    .Include(x => x.Branch).ThenInclude(x => x.State)
    .Include(x => x.Brand)
    .Where(x => x.InventoryCategoryId == categoryId)
    .ToList();
so I thought that because of this the speed would not be much different from the original 18-30 seconds.
I would like to speed up the original code too, but I am not really sure how to get rid of the nested foreach loops that are probably slowing it down.
First, loops inside loops are a very bad thing; you should refactor that into a single loop. This should not be a problem, because inventorySpecifications is declared outside the loop.
Second, the line
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
should end with ToList(), because its enumeration happens within the inner foreach, which means the query runs once for each of the inventoryItems.
That should save you a good amount of time.
I'm no expert, but this part of your second foreach raises a red flag: inventorySpecifications.OrderBy(x => x.DisplayOrder). Because this is called inside another foreach, the OrderBy runs every time you iterate over inventoryItems.
Before your first foreach loop, try this: var orderedInventorySpecs = inventorySpecifications.OrderBy(x => x.DisplayOrder); then use foreach (var inventorySpec in orderedInventorySpecs) and see if it makes a difference.
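Put together (hoisting both the materialization and the ordering out of the loop), the inner part of the question's code might look like the following sketch; the class and property names are as in the question, and the assumption is that item.Attributes holds the JSON string:

```csharp
// materialize and order the specs once, before iterating the items
var orderedSpecs = inventorySpecifications
    .OrderBy(x => x.DisplayOrder)
    .ToList(); // single DB round trip

foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);

    // purely in-memory from here on; no per-iteration queries
    var specDtos = orderedSpecs
        .Where(s => specs.ContainsKey(s.JsonKey))
        .Select(s => new SpecDto
        {
            Key = s.JsonKey,
            Value = specs.GetValue(s.JsonKey).ToString()
        })
        .ToList();
}
```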
To help you better understand what EF is running behind the scenes, add some logging to expose the SQL being run; this can be extremely helpful in determining whether your queries hit the DB too often. As a very general rule, you want to hit the DB as few times as possible and retrieve only the information you need via .Select() to reduce what is returned. The docs for the logging are: http://learn.microsoft.com/en-us/ef/core/miscellaneous/logging
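For EF Core 3.1, that wiring might look like the sketch below (on EF Core 5+ there is also the simpler optionsBuilder.LogTo(Console.WriteLine)); the context name is illustrative, and the console provider comes from the Microsoft.Extensions.Logging.Console package:

```csharp
public class AppDbContext : DbContext
{
    // a single shared factory, so one is not created per context instance
    private static readonly ILoggerFactory ConsoleLoggerFactory =
        LoggerFactory.Create(builder => builder.AddConsole());

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseLoggerFactory(ConsoleLoggerFactory) // log generated SQL to the console
            .EnableSensitiveDataLogging();          // include parameter values (dev only)
    }
}
```

With this in place, every command EF sends to the database is written to the console, making N+1 patterns immediately visible.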
I obviously cannot test this, and I am a little unsure where your specDtos go once you have them, but I assume they become part of the InventoryItemDto?
var inventoryItems = dbContext.InventoryItems
    .Where(x => x.InventoryCategoryId == categoryId)
    .Select(x => new InventoryItemDto()
    {
        Attributes = x.Attributes,
        //.....
        // access brand columns
        // access company columns
        // access branch columns
        // access country columns
        // access state columns
    })
    .ToList();

var inventorySpecifications = dbContext.InventoryCategorySpecifications
    .Where(x => x.InventoryCategoryId == categoryId)
    .OrderBy(x => x.DisplayOrder)
    .Select(x => x.InventorySpecification)
    .ToList();

foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);

    // Assuming the specs become part of an inventory item?
    item.Specs = inventorySpecifications
        .Where(x => specs.ContainsKey(x.JsonKey))
        .Select(x => new SpecDto { Key = x.JsonKey, Value = specs.GetValue(x.JsonKey).ToString() })
        .ToList();
}
The first call to the DB for inventoryItems should produce one SQL query that pulls all the information needed to construct your InventoryItemDto at once, and thus hits the DB only once. The second pulls the specs and applies OrderBy() before materialising, which means the ordering runs as part of the SQL query rather than in memory. Both results are materialised via .ToList(), which causes EF to pull them into memory in one go.
Finally, the loop goes over your constructed inventoryItems, parses the JSON and then filters the specs based on that. I am unsure where you were using the specDtos, so I assumed they were part of the model. I would recommend checking the performance of the JSON work you are doing, as that could be contributing to your slowdown.
A more integrated approach to using JSON as part of your EF models can be seen in this answer: https://stackoverflow.com/a/51613611/621524 — however, you will still be unable to use those properties to offload execution to SQL, as accessing properties defined only in code causes queries to fragment and run in several parts.

How to search records from a single table with multiple parameters using LINQ?

I am trying to search for record(s) in a table by applying multiple search parameters, as per the snap below.
Here, using the various parameters from the snap, I want to filter the records. The user could enter any combination of parameters to search for a record.
I tried something like the code below, which works for a single condition but fails for any combination of search parameters.
public List<Search> SearchStudents(students search)
{
    var records = from stud in db.students
                  where stud.enrollmentNumber == search.enrollmentNumber
                     || stud.enrollmentDate == search.enrollmentDate
                     || stud.enrollmentType == search.enrollmentType
                     || stud.className == search.className
                  select new Search()
                  {
                      enrollmentNumber = stud.enrollmentNumber,
                      enrollmentDate = stud.enrollmentDate,
                      enrollmentType = stud.enrollmentType,
                      Name = stud.Name,
                      className = stud.className,
                      Description = stud.Description
                  };

    return records.ToList();
}
But this is not working properly; it returns the same results whatever parameters I pass.
For example, the table has 20 records, and the enrollment number is a unique field in the DB, so when I pass an enrollment number like "2018-0001" it should return only a single record, yet it returns all records.
Can someone guide me with this?
Without further explanation in your question about how this isn't working, the best we can do is guess. However, one very plausible reason is that you're including parameters you don't want to be filtering on.
Because you're using ORs in your statement, if any of those other properties are defaulted in the database, you're going to return those records. What you need to do is conditionally include the pieces of the WHERE clause only for the properties you actually want to search on. Unfortunately, that is not possible with the "SQL syntax" version of LINQ, so you will need to convert your query to method syntax. (Good news: it's slightly more performant as well, as the query syntax usually has to be converted to method syntax anyway.)
Because of deferred execution, your query will not be sent to the database until you call .ToList() or something similar to actually start processing the results. This allows you to chain method calls together, even across completely separate C# statements. This is what you'll want to do:
public List<Search> SearchStudents(students search)
{
    var query = db.students.AsQueryable();

    if (!string.IsNullOrWhiteSpace(search.enrollmentNumber))
    {
        query = query.Where(s => s.enrollmentNumber == search.enrollmentNumber);
    }
    if (search.enrollmentDate != DateTime.MinValue)
    {
        query = query.Where(s => s.enrollmentDate == search.enrollmentDate);
    }
    if (!string.IsNullOrWhiteSpace(search.enrollmentType))
    {
        query = query.Where(s => s.enrollmentType == search.enrollmentType);
    }
    if (!string.IsNullOrWhiteSpace(search.className))
    {
        query = query.Where(s => s.className == search.className);
    }

    return query.Select(stud => new Search
        {
            enrollmentNumber = stud.enrollmentNumber,
            enrollmentDate = stud.enrollmentDate,
            enrollmentType = stud.enrollmentType,
            Name = stud.Name,
            className = stud.className,
            Description = stud.Description
        })
        .ToList();
}
You may need to adjust the if statements to accommodate data types that differ from what the property names suggest, but this way a filter is only added when a value has actually been provided.
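The repeated if blocks can also be folded into a small extension method. This WhereIf helper is a common community pattern, not part of LINQ itself:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableExtensions
{
    // applies the predicate only when the condition holds
    public static IQueryable<T> WhereIf<T>(
        this IQueryable<T> source,
        bool condition,
        Expression<Func<T, bool>> predicate)
        => condition ? source.Where(predicate) : source;
}

// usage (mirrors the answer above):
// var query = db.students.AsQueryable()
//     .WhereIf(!string.IsNullOrWhiteSpace(search.enrollmentNumber),
//              s => s.enrollmentNumber == search.enrollmentNumber)
//     .WhereIf(search.enrollmentDate != DateTime.MinValue,
//              s => s.enrollmentDate == search.enrollmentDate);
```

Because the predicate stays an Expression, every applied filter is still translated into the SQL WHERE clause.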

Better loading performance with EF code first and MVC 4

I am trying to make better (= faster) response in my MVC 4 project and mainly in Web Api part. I added MiniProfiler to see where is problem with slow loading but I can't figure out.
duration (ms) / from start (ms) / query time (ms)
http://www.url.com:80/api/day?city=param (example)  1396.1  +0.0    1 sql 173.8
logging                                                9.3  +520.9
EF query                                            4051.5  +530.2  2 sql 169.6
Then, when I tried the same url again, I got these numbers:
http://www.url.com:80/api/day?city=param (example)   245.6  +0.0    1 sql 50.6
logging                                                8.6  +19.6
EF query                                               7.7  +28.3
But when I tried it again 2 minutes later, I got big numbers again, like in the first example.
Same with loading the Home Index:
http://www.blanskomenu.amchosting.cz:80/             333.0  +0.0
Controller: HomeController.Index                      71.0  +286.8
Find: Index                                          100.4  +387.8
Render: Index                                       2468.1  +494.6
This is my method for the Web API in the first example:
[OutputCache(CacheProfile = "Cache1Hour", VaryByParam = "city")]
public IEnumerable<RestaurantDayMealsView> GetDay(string city)
{
    var profiler = MiniProfiler.Current;

    using (profiler.Step("logging"))
    {
        var logFile = new LogFile(System.Web.HttpContext.Current.Server.MapPath("~/Logs/"), DateTime.Today);
        logFile.Write(String.Format("{0},api/daymenu,{1}", DateTime.Now, city));
    }

    using (profiler.Step("EF query"))
    {
        var meals = repo.GetAllDayMealsForCity(city);
        if (meals == null)
        {
            throw new HttpResponseException(Request.CreateResponse(HttpStatusCode.NotFound));
        }
        return meals;
    }
}
and my repository method:
public IEnumerable<RestaurantDayMealsView> GetAllDayMealsForCity(string city)
{
    return db.Restaurants
        .Include(rest => rest.Meals)
        .Where(rest => rest.City.Name == city)
        .OrderBy(r => r.Order)
        .AsEnumerable()
        .Select(r => new RestaurantDayMealsView()
        {
            Id = r.Id,
            Name = r.Name,
            Meals = r.Meals.Where(meal => meal.Date == DateTime.Today).ToList(),
            IsPropagated = r.IsPropagated
        })
        .Where(r => r.Meals.Count > 0);
}
For my Home Index I have just this in my controller:
public ActionResult Index()
{
return View();
}
So my questions are:
Why is rendering of Index taking so long? I have just the default website, so I think there is no problem with CSS and other things.
What is taking so long in the "EF query" step when it is not the query itself? How can I fix these problems?
I was looking at these links: SO list and ASP.NET MVC Overview - performance, and I tried some tricks and read about others, but nothing helped me much. Is it possible that the problem is with the hosting? Or somewhere else? Thanks
It looks like you've got an N+1 query issue in your repository method. Using Include is only optimized if you don't modify the collection (i.e. use something like Where on it). When you do that, EF will re-fetch the records from the database. You need to cast Meals to a List first, and then run your Where clause. That essentially freezes the pre-selected results for Meals and filters them in memory instead of at the database.
Meals = r.Meals.ToList().Where(meal => meal.Date == DateTime.Today).ToList(),
1.
In your Repository.GetAllDayMealsForCity() method:
return db.Restaurants
    .Include(rest => rest.Meals)
    .Where(rest => rest.City.Name == city)
    .OrderBy(r => r.Order)
    .AsEnumerable() // <-- switches to in-memory execution before the projection
    .Select(r => new RestaurantDayMealsView()
    {
        Id = r.Id,
        Name = r.Name,
        Meals = r.Meals.Where(meal => meal.Date == DateTime.Today).ToList(),
        IsPropagated = r.IsPropagated
    })
    .Where(r => r.Meals.Count > 0);
You call AsEnumerable() before projecting the results with the Select method. Remember that AsEnumerable() switches the rest of the query to in-memory (LINQ to Objects) execution, and because you call it before Select, the database query is not limited to the columns needed by RestaurantDayMealsView (the projection is done on in-memory objects, not in the data store).
Also, your last Where could be moved before the AsEnumerable() call.
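A hedged sketch of the repository method with the filter pushed into SQL and only the needed columns fetched; because classic EF6 cannot translate ToList() inside a projection, it projects to an anonymous type first and materialises the meal lists after AsEnumerable():

```csharp
public IEnumerable<RestaurantDayMealsView> GetAllDayMealsForCity(string city)
{
    var today = DateTime.Today;

    return db.Restaurants
        // filter out restaurants with no meals today at the database
        .Where(rest => rest.City.Name == city
                    && rest.Meals.Any(meal => meal.Date == today))
        .OrderBy(r => r.Order)
        .Select(r => new
        {
            r.Id,
            r.Name,
            Meals = r.Meals.Where(meal => meal.Date == today),
            r.IsPropagated
        })
        .AsEnumerable() // switch to in-memory only for the final mapping
        .Select(x => new RestaurantDayMealsView
        {
            Id = x.Id,
            Name = x.Name,
            Meals = x.Meals.ToList(),
            IsPropagated = x.IsPropagated
        });
}
```

This keeps the Include unnecessary as well, since the projection itself tells EF which related rows to fetch.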
2.
The significant difference in your profiling results between the first and second hit is most likely caching: after the first request, SQL Server keeps the fetched data pages and the compiled execution plan in memory, and EF caches its compiled query translation, so subsequent identical requests are served much faster until those caches go cold.
