How to Performance Test This and Suggestions to Make Faster? - c#

I seem to have written a very slow piece of code that gets even slower when EF Core is involved.
Basically I have a list of items that store their attributes as a JSON string in the database, since I am storing many different items with different attributes.
I then have another table that contains the display order for each attribute, so when I send the items to the client I order them based on that display order.
It is quite slow: about 18-30 seconds for 700 records (measured from where I start my timer, not the whole block of code).
var inventoryItemDtos = new List<InventoryItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId);
var inventorySpecifications = dbContext.InventoryCategorySpecifications
    .Where(x => x.InventoryCategoryId == categoryId)
    .Select(x => x.InventorySpecification);

Stopwatch a = new Stopwatch();
a.Start();

foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);
    var specDtos = new List<SpecDto>();
    foreach (var inventorySpecification in inventorySpecifications.OrderBy(x => x.DisplayOrder))
    {
        if (specs.ContainsKey(inventorySpecification.JsonKey))
        {
            var value = specs.GetValue(inventorySpecification.JsonKey);
            var newSpecDto = new SpecDto()
            {
                Key = inventorySpecification.JsonKey,
                Value = value.ToString()
            };
            specDtos.Add(newSpecDto);
        }
    }
    var dto = new InventoryItemDto()
    {
        // create dto
    };
    inventoryItemDtos.Add(dto);
}
Now it gets dramatically slower when I ask EF for some more columns that I need info from.
In the // create dto area I access information from other tables:
var dto = new InventoryItemDto()
{
    // access brand columns
    // access company columns
    // access branch columns
    // access country columns
    // access state columns
};
Accessing these columns in the loop takes 6 minutes to process 700 rows.
I don't understand why it is so slow; it's the only change I really made, and I made sure to eager load everything.
It almost makes me think eager loading is not working, but I don't know how to verify whether it is.
var inventoryItems = dbContext.InventoryItems
    .Include(x => x.Branch).ThenInclude(x => x.Company)
    .Include(x => x.Branch).ThenInclude(x => x.Country)
    .Include(x => x.Branch).ThenInclude(x => x.State)
    .Include(x => x.Brand)
    .Where(x => x.InventoryCategoryId == categoryId)
    .ToList();
So I thought that, having done this, the speed would not be much different from the original 18-30 seconds.
I would also like to speed up the original code, but I am not really sure how to get rid of the nested foreach loops that are probably slowing it down.

First, a loop inside a loop is a very bad thing; you should refactor it into a single loop. That should not be a problem here, because inventorySpecifications is declared outside the loop.
Second, the line
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
should end with ToList(), because its enumeration happens within the inner foreach, which means the query runs once for every item in inventoryItems.
That should save you a good amount of time.
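In code, that change is just the following, using the names from the question:

var inventorySpecifications = dbContext.InventoryCategorySpecifications
    .Where(x => x.InventoryCategoryId == categoryId)
    .Select(x => x.InventorySpecification)
    .ToList(); // materialise once; the inner foreach now iterates an in-memory list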

I'm no expert, but this part of your second foreach raises a red flag: inventorySpecifications.OrderBy(x => x.DisplayOrder). Because this is called inside another foreach, the .OrderBy runs every time you iterate over inventoryItems.
Before your first foreach loop, try this: var orderedInventorySpecs = inventorySpecifications.OrderBy(x => x.DisplayOrder); then use foreach (var inventorySpec in orderedInventorySpecs) and see if it makes a difference.

To help you better understand what EF is running behind the scenes, add some logging to expose the SQL being run; this might help you see how and where your queries are going wrong. It can be extremely helpful in determining whether your queries hit the DB too often. As a very general rule you want to hit the DB as few times as possible and retrieve only the information you need, via .Select(), to reduce what is returned. The docs for the logging are: http://learn.microsoft.com/en-us/ef/core/miscellaneous/logging
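If you are on EF Core 5 or later, a minimal way to surface the generated SQL is LogTo (earlier versions wire up an ILoggerFactory instead); this is a sketch, not your exact context:

// Inside your DbContext; logs every generated SQL statement to the console.
// Requires using Microsoft.Extensions.Logging; for LogLevel.
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    => optionsBuilder
        .LogTo(Console.WriteLine, LogLevel.Information)
        .EnableSensitiveDataLogging(); // shows parameter values; development only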
I obviously cannot test this, and I am a little unsure where your specDtos go once you have them, but I assume they become part of the InventoryItemDto?
var inventoryItems = dbContext.InventoryItems
    .Where(x => x.InventoryCategoryId == categoryId)
    .Select(x => new InventoryItemDto()
    {
        Attributes = x.Attributes,
        //.....
        // access brand columns
        // access company columns
        // access branch columns
        // access country columns
        // access state columns
    }).ToList();

var inventorySpecifications = dbContext.InventoryCategorySpecifications
    .Where(x => x.InventoryCategoryId == categoryId)
    .OrderBy(x => x.DisplayOrder)
    .Select(x => x.InventorySpecification)
    .ToList();

foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);
    // Assuming the specs become part of an inventory item?
    item.Specs = inventorySpecifications
        .Where(x => specs.ContainsKey(x.JsonKey))
        .Select(x => new SpecDto() { Key = x.JsonKey, Value = specs.GetValue(x.JsonKey).ToString() })
        .ToList();
}
The first call to the DB for inventoryItems should produce one SQL query that pulls all the information you need at once to construct your InventoryItemDto, and thus only hits the DB once. It then pulls the specs out and applies OrderBy() before materialising, which means the OrderBy runs as part of the SQL query rather than in memory. Both results are materialised via .ToList(), which causes EF to pull them into memory in one go.
Finally the loop goes over your constructed inventoryItems, parses the JSON, and then filters the specs based on that. I am unsure of where you were using the specDtos, so I assumed they were part of the model. I would recommend checking the performance of the JSON work you are doing, as that could be contributing to your slowdown.
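A quick way to measure just the JSON cost, reusing the names from the snippet above:

// Time the parsing alone to see how much it contributes to the slowdown.
var sw = System.Diagnostics.Stopwatch.StartNew();
foreach (var item in inventoryItems)
{
    var specs = JObject.Parse(item.Attributes);
}
sw.Stop();
Console.WriteLine($"JObject.Parse over {inventoryItems.Count} items took {sw.ElapsedMilliseconds} ms");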
A more integrated approach to using JSON as part of your EF models can be seen in this answer: https://stackoverflow.com/a/51613611/621524 however you will still be unable to use those properties to offload execution to SQL, as accessing properties that are defined in code causes queries to fragment and run in several parts.
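For illustration only, the linked answer's idea is roughly along these lines; this is a sketch assuming Newtonsoft.Json and the Attributes column from the question, not the exact code from that answer:

using System.ComponentModel.DataAnnotations.Schema;
using Newtonsoft.Json.Linq;

public class InventoryItem
{
    public int Id { get; set; }

    // Raw JSON column as stored in the database.
    public string Attributes { get; set; }

    // Parsed view of the JSON. EF cannot translate this property to SQL,
    // so any Where/Select against it runs in memory after materialisation.
    [NotMapped]
    public JObject AttributeData =>
        string.IsNullOrEmpty(Attributes) ? new JObject() : JObject.Parse(Attributes);
}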

Related

How to make a linq-query with multiple Contains()/Any() on possibly empty lists?

I am trying to query a database view based on earlier user choices, which are stored in temporary lists of objects.
What I want to achieve is for a record to be added to reportViewList if the stated value exists in one of the lists; but if, for example, clientList is empty, the query should skip that condition and add all clients in the selected date range.
The first condition is a time range, and this works fine. I understand why my current solution does not work, but I cannot seem to wrap my head around how to fix it. The example below works when both a client and a product are chosen; when the lists are empty, reportViewList is obviously also empty.
I have played with the idea of adding all the records in the date range and then removing the ones that do not fit, but that would be an inefficient solution.
Any help or feedback is much appreciated.
List<ReportView> reportViewList = new List<ReportView>();
using (var dbc = new localContext())
{
    reportViewList = dbc.ReportViews.AsEnumerable()
        .Where(x => x.OrderDateTime >= from && x.OrderDateTime <= to)
        .Where(y => clientList.Any(x2 => x2.Id == y.ClientId))
        .Where(z => productList.Any(x3 => x3.Id == z.ProductId))
        .ToList();
}
You should not call AsEnumerable() before you have added everything to your query. Calling AsEnumerable() here causes your complete data set to be loaded into memory and then filtered in your application.
Without AsEnumerable(), and before calling ToList() (better: ToListAsync()), you are working with an IQueryable<ReportView>. You can easily compose it and just call ToList() on the final query.
Entity Framework will then examine your IQueryable<ReportView> and generate a SQL expression from it.
For your problem, you just need to check whether the user has selected any filters and only add them to the query if they are present.
using var dbc = new localContext();

// You could also write IQueryable<ReportView> reportViewQuery = dbc.ReportViews;
// but I prefer it this way, as it is a little safer when refactoring.
var reportViewQuery = dbc.ReportViews.AsQueryable();

// Assuming from and to are nullable and are null if the user has not selected them.
if (from.HasValue)
    reportViewQuery = reportViewQuery.Where(r => r.OrderDateTime >= from);
if (to.HasValue)
    reportViewQuery = reportViewQuery.Where(r => r.OrderDateTime <= to);

if (clientList is not null && clientList.Any())
{
    var clientIds = clientList.Select(c => c.Id).ToHashSet();
    reportViewQuery = reportViewQuery.Where(r => clientIds.Contains(r.ClientId));
}

if (productList is not null && productList.Any())
{
    var productIds = productList.Select(p => p.Id).ToHashSet();
    reportViewQuery = reportViewQuery.Where(r => productIds.Contains(r.ProductId));
}

// You can also use ToList() if you absolutely must, but it will block the current thread.
var reportViews = await reportViewQuery.ToListAsync();
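If you build queries like this often, a small helper extension keeps the call sites tidy. This is a common convenience pattern, not part of EF itself; a sketch:

using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableExtensions
{
    // Applies the predicate only when the condition holds.
    public static IQueryable<T> WhereIf<T>(
        this IQueryable<T> source, bool condition, Expression<Func<T, bool>> predicate)
        => condition ? source.Where(predicate) : source;
}

// Usage:
var reportViews = await dbc.ReportViews
    .WhereIf(from.HasValue, r => r.OrderDateTime >= from)
    .WhereIf(to.HasValue, r => r.OrderDateTime <= to)
    .ToListAsync();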

EF returning null, but entity exists

I'm trying to load some user info in code, but EF returns null.
foreach (var user in allAutoUsers)
{
Wallet wallet = db.CabinetUsers
.Find(user.IdCabinetUser)?
.Wallets
.FirstOrDefault(x => x.TypeCurrency == currency);
}
The user variable reports 1 wallet, but when I try to get it in the code above it returns null.
Are there ways to solve this problem?
Read up on LINQ expressions rather than relying on Find. You may be running into issues if the relationships between your entities are not declared as virtual, which would prevent EF from lazy loading them.
The issue with using .Find() is that it returns the entity if it exists, but accessing any related property that the DbContext isn't already aware of requires a lazy-load call. This is easily missed if lazy loading is disabled or the member isn't virtual, and it can be a performance issue when lazy loading is enabled.
Instead, Linq can allow you to query through the object graph directly to get what you want:
foreach (var user in allAutoUsers)
{
Wallet wallet = db.CabinetUsers
.Where(x => x.IdCabinetUser == user.IdCabinetUser)
.SelectMany(x => x.Wallets)
.SingleOrDefault(x => x.TypeCurrency == currency);
// do stuff with wallet...
}
This assumes that there will be only one wallet for the specified currency per user. When expecting 0 or 1, use SingleOrDefault. Use FirstOrDefault only when you are expecting 0 or many, want the "first" one, and have specified an OrderBy clause to make the first item predictable.
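For example, if you did expect several wallets per currency and wanted the newest, you would make "first" deterministic with an explicit order; CreatedDate here is a hypothetical column:

Wallet wallet = db.CabinetUsers
    .Where(x => x.IdCabinetUser == user.IdCabinetUser)
    .SelectMany(x => x.Wallets)
    .Where(w => w.TypeCurrency == currency)
    .OrderByDescending(w => w.CreatedDate) // hypothetical column; pick a real tie-breaker
    .FirstOrDefault();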
Either way, the loop results in a query per user. To accomplish it with one query for all users:
var userIds = allAutoUsers.Select(x => x.IdCabinetUser).ToList();
var userWallets = db.CabinetUsers
    .Where(x => userIds.Contains(x.IdCabinetUser))
    .Select(x => new
    {
        x.IdCabinetUser,
        Wallet = x.Wallets.SingleOrDefault(w => w.TypeCurrency == currency)
    }).ToList();
From this I would consider expanding the wallet projection with a Select for just the details in the Wallet you actually care about rather than a reference to the entire wallet entity. This has the benefit of speeding up the query, reducing memory use, and avoiding potential lazy-load calls tripping things up if Wallet references other entities that get touched later on.
For example if you only need the IdWallet, WalletName, TypeCurrency, and Balance:
// replace this line from above...
Wallet = x.Wallets.SingleOrDefault(w => w.TypeCurrency == currency)
// with ...
Wallet = x.Wallets
    .Where(w => w.TypeCurrency == currency)
    .Select(w => new
    {
        w.IdWallet,
        w.WalletName,
        w.TypeCurrency,
        w.Balance
    })
    .SingleOrDefault() // ...
From there you can foreach to your heart's content without extra queries:
foreach (var userWallet in userWallets)
{
// do stuff with userWallet.Wallet and userWallet.IdCabinetUser.
}
If you want to return the wallet details to a calling method or view, then you cannot use an anonymous type for that (new { }). Instead, define a simple class for the data you want to return and Select into that, i.e. new WalletDTO { IdWallet = w.IdWallet, /* ... */ }. Even if you use entities, it is recommended to reduce them into DTOs or ViewModels rather than returning entities. Entities should not "live" longer than the DbContext that spawned them, otherwise you get all kinds of nasty behaviour popping up, like ObjectDisposedException and serialization exceptions.
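A sketch of that DTO approach, with property types assumed from the fields used above:

public class WalletDTO
{
    public int IdWallet { get; set; }
    public string WalletName { get; set; }
    public string TypeCurrency { get; set; }
    public decimal Balance { get; set; } // type assumed
}

// ... then inside the Select:
Wallet = x.Wallets
    .Where(w => w.TypeCurrency == currency)
    .Select(w => new WalletDTO
    {
        IdWallet = w.IdWallet,
        WalletName = w.WalletName,
        TypeCurrency = w.TypeCurrency,
        Balance = w.Balance
    })
    .SingleOrDefault()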

Replacing Include() calls to Select()

I'm trying to eliminate the use of the Include() calls in this IQueryable definition:
return ctx.timeDomainDataPoints.AsNoTracking()
    .Include(dp => dp.timeData)
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasGroup))
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasUnit))
    .Where(dp => dp.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(dp => dp.Source == 235235)
    .Where(dp => dp.timeData.time >= start && dp.timeData.time <= end)
    .OrderByDescending(dp => dp.timeData.time);
I have been having issues with the database where the run times are far too long, and the primary cause is that the Include() calls are pulling everything.
This is evident when viewing the table returned by the resultant SQL query, which shows lots of unnecessary information being returned.
One of the things that you learn, I guess.
The database has a large collection of data points, each of which has many recorded values.
Each recorded value is mapped to a record kind, which may have a record alias.
I have tried creating a Select() as an alternative, but I just can't figure out how to construct the right Select while keeping the entity hierarchy correctly loaded, i.e. without the related entities being loaded through unnecessary calls to the DB.
Does anyone have alternate solutions that may jump-start me on solving this problem?
I'll add more detail if needed.
You are right. One of the slower parts of a database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to limit this.
Every TimeDomainDataPoint has a primary key. All RecordValues of this TimeDomainDataPoint have a foreign key TimeDomainDataPointId with a value equal to this primary key.
So if the TimeDomainDataPoint with Id 4 has a thousand RecordValues, then every one of those RecordValues will have a foreign key with value 4. It would be a waste to transfer that value 1001 times when you only need it once.
When querying data, always use Select and select only the properties you actually plan to use. Only use Include if you plan to update the fetched included items.
The following will be much faster:
var result = dbContext.timeDomainDataPoints
    // first limit the datapoints you want to select
    .Where(datapoint => datapoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(datapoint => datapoint.Source == 235235)
    .Where(datapoint => datapoint.timeData.time >= start
                     && datapoint.timeData.time <= end)
    .OrderByDescending(datapoint => datapoint.timeData.time)
    // then select only the properties you actually plan to use
    .Select(dataPoint => new
    {
        Id = dataPoint.Id,
        RecordValues = dataPoint.RecordValues
            .Where(recordValue => ...) // if you don't want all RecordValues
            .Select(recordValue => new
            {
                // again: select only the properties you actually plan to use:
                Id = recordValue.Id,
                // not needed, you know the value: DataPointId = recordValue.DataPointId,
                RecordKinds = recordValue.RecordKinds
                    .Where(recordKind => ...) // if you don't want all recordKinds
                    .Select(recordKind => new
                    {
                        ... // only the properties you really need!
                    })
                    .ToList(),
                ...
            })
            .ToList(),
        TimeData = dataPoint.TimeData.Select(...),
        ...
    });
Possible improvement
The part:
.Where(datapoint => datapoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
is used to fetch only datapoints that have recordValues with a non-null RecordAlias. If you are selecting the RecordAlias anyway, consider doing this Where after your Select:
.Select(...)
.Where(dataPoint => dataPoint.RecordValues
    .Any(recordValue => recordValue.RecordKind.RecordAlias != null));
I'm not really sure whether this is faster. If your database management system internally first creates a complete table with all columns of all joined tables and then throws away the columns that are not selected, then it won't make a difference. However, if it only creates a table with the columns it actually uses, then the internal table will be smaller. This could be faster.
Your problem is the hierarchy of joins in your query. To reduce it, create a separate query that gets the results from the related table, as follows:
var items = ctx.timeDomainDataPoints.AsNoTracking()
    .Include(dp => dp.timeData)
    .Include(dp => dp.RecordValues);
var ids = items.SelectMany(item => item.RecordValues).Select(i => i.Id);
and in another request to the db:
var otherItems = ctx.RecordAlias.AsNoTracking()
    .Select(dp => dp.RecordAlias)
    .Where(s => ids.Contains(s.RecordKindId))
    .SelectMany(s => s.RecordAliasGroup);
With this approach your query does not have nested joins.

Compare two List<POCO> to find differences being case insensitive

I have two collections:
private void ProcessCollectionsData(List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers) { ... }
There are properties called UserIdentifier (String) and UserId (Int32). This is how I'm comparing them, which is producing wrong results and heavy performance bottlenecks:
LogMessage(new LogEntry(" - Generating delta for new users...", true));
Task.WaitAll(Task.Run(() =>
{
newUsers = databaseUsers.Any() ? importedUsers.Where(x => !databaseUsers.Select(y => y.UserIdentifier.ToLower())
.ToList()
.Contains(x.UserIdentifier.ToLower()))
.ToList()
: importedUsers;
duplicates = newUsers.OrderByDescending(o1 => o1.UserId)
.GroupBy(s => s.UserIdentifier, StringComparer.InvariantCultureIgnoreCase)
.Where(y => y.Count() > 1);
foreach (var item in duplicates)
{
newUsers.RemoveAll(s => string.Equals(s.UserIdentifier, item.Key, StringComparison.OrdinalIgnoreCase));
newUsers.Add(item.First());
}
}));
LogMessage(new LogEntry(String.Format(" - Done. New users to be imported: {0}", newUsers.Count)));
The data in importedUsers comes from a CSV and can contain duplicates, including duplicates that differ only in the case of the UserIdentifier field. databaseUsers is empty the first time around. After the first run, the import dumps around 100,000 users into the database, so on the second and subsequent runs databaseUsers is loaded with 100,000 existing users and importedUsers also brings in data in the range of 99,990 to 100,100 (for example). This requires me to generate delta collections so that I know which users to mark deleted, which to add (new), and which remaining (common) ones need to be updated.
Can anyone suggest a faster way to do this?
I can see I'm making a mistake where I'm assigning to the newUsers collection by using ToLower().
Correction to the above statement: the resulting newUsers collection retains case information as desired. So performance is the real issue here now.
I think the crux of your performance problem is here:
!databaseUsers.Select(y => y.UserIdentifier.ToLower()).ToList()
Given databaseUsers can contain 100,000 users, you certainly don't want that entire lowered list rebuilt for every imported user, which is what happens when it sits inside the Where. Getting rid of the Select / ToList calls and letting Any short-circuit should make a difference:
importedUsers.Where(x => !databaseUsers.Any(y =>
    y.UserIdentifier.ToLower() == x.UserIdentifier.ToLower())).ToList()
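Since both lists are already in memory, another option is to drop the nested scan entirely with a case-insensitive HashSet, turning the O(n·m) comparison into two linear passes; a sketch using the property names from the question:

// Build a case-insensitive set of known identifiers once: O(n).
var knownIds = new HashSet<string>(
    databaseUsers.Select(u => u.UserIdentifier),
    StringComparer.OrdinalIgnoreCase);

// Single pass over the imports instead of a scan per element: O(m).
newUsers = importedUsers
    .Where(u => !knownIds.Contains(u.UserIdentifier))
    .ToList();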

Check if list of entities already exist in Database using linq to entities

I have a list of entities coming from an external source. I need to compare it to what I already have, and only add the ones that don't exist. Pseudo code below.
var newVersions = item.Versions
.Where(s => db.ExistingVersions
.Select(t=>t.versionID)
.DoesNotContains(s.versionID));
That obviously doesn't work, and I'm not sure how to fix it. I don't want to use a for loop because I believe that would mean I would have hundreds of database hits just to check the versions on each item. I am loading multiple items, each item has as many as 100 versions.
If there's nothing more to the question than I think, then it shouldn't be complicated.
Assuming that VersionID is a unique identifier, you can do this:
var existingVersions = db.ExistingVersions.Select(x => x.VersionID).ToList();
mind you, for Contains it would be better to:
var existingVersions = new HashSet<int>(db.ExistingVersions.Select(x => x.VersionID).ToList()); // assuming VersionID is an int
[EDIT]: Per Magnus's comment, you can drop the ToList from the above code snippet.
and then:
var newVersions = item.Versions.Where(x => !existingVersions.Contains(x.VersionID));
This is probably the most performant, because when calling the database you select only the VersionID. The other option involves writing a custom IEqualityComparer<T> and using Except, but you'd have to pull everything from the DB, which may be more costly.
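For completeness, the Except variant would look something like this; ItemVersion is a hypothetical name for your version entity type, and note this pulls all existing versions into memory first:

using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer matching versions by VersionID only.
class VersionIdComparer : IEqualityComparer<ItemVersion>
{
    public bool Equals(ItemVersion a, ItemVersion b) => a.VersionID == b.VersionID;
    public int GetHashCode(ItemVersion v) => v.VersionID.GetHashCode();
}

var newVersions = item.Versions
    .Except(db.ExistingVersions.AsEnumerable(), new VersionIdComparer())
    .ToList();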
You could try something like this:
// in memory: get list of potential version ids
var potentialIds = item.Versions.Select( o => o.versionID ).ToList();
// hit database ( once ) : get existing version ids
var existingIds = db.ExistingVersions
.Where( o => potentialIds.Contains( o.versionID ) )
.Select( o => o.versionID )
.ToList();
// in memory: filter potential objects
var newVersions = item.Versions
.Where( o => !existingIds.Contains( o.versionID ) )
.ToList();
// database inserts:
foreach( var newVersion in newVersions )
{
...
One thing to bear in mind is that this is not thread-safe: if something else is adding ExistingVersion rows at the same time, you may try to insert a record that was added after you checked the database.
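If concurrent inserts are a real possibility, one common mitigation (assuming a unique index on versionID, and that the new versions are the same entity type as ExistingVersions) is to let the database reject duplicates and handle the failure:

try
{
    db.ExistingVersions.AddRange(newVersions); // newVersions from the filtering step above
    db.SaveChanges();
}
catch (DbUpdateException)
{
    // A concurrent writer inserted one of these versions first;
    // re-query the existing IDs and retry the remaining inserts.
}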
