We are using EF 6.0 and .NET 4.5 with a code-first approach. Our database has around 170 entities (tables), and the main table holds around 150,000 records.
On first load, Entity Framework takes around 25 seconds.
I am trying to improve this time, as it is too slow and gets slower as the number of records increases.
I have tried generating native images and using pre-generated interactive views, but I couldn't achieve any significant improvement.
Can anyone please help me with this?
Thanks.
You can consider Entity Framework pre-generated mapping views. You can use EF Power Tools to create pre-generated views.
Using pre-generated views moves the cost of view generation from model
loading (run time) to compile time. While this improves startup
performance at runtime, you will still experience the pain of view
generation while you are developing. There are several additional
tricks that can help reduce the cost of view generation, both at
compile time and run time.
You can refer to this to learn more: Entity Framework Pre-Generated Mapping Views
You can use caching in Entity Framework to improve the performance of your app.
There are three types of caching:
1. Object caching – the ObjectStateManager built into an ObjectContext
instance keeps track in memory of the objects that have been
retrieved using that instance. This is also known as first-level
cache.
2. Query Plan Caching - reusing the generated store command when a
query is executed more than once.
3. Metadata caching - sharing the metadata for a model across different
connections to the same model.
You can refer to this article to read more: Performance Considerations for EF 6
I recently had a simple query that runs super quickly in SSMS but was taking way, way too long to run through Entity Framework in my C# program.
This page has been extremely helpful when troubleshooting EF performance problems in general:
https://www.simple-talk.com/dotnet/net-tools/entity-framework-performance-and-what-you-can-do-about-it/
..but in this case, nothing helped. So in the end, I did this:
List<UpcPrintingProductModel> products = new List<UpcPrintingProductModel>();
var sql = "select top 75 number, desc1, upccode "
        + "from MailOrderManager..STOCK s "
        + "where s.number like @puid + '%' ";
var connstring = ConfigurationManager.ConnectionStrings["MailOrderManagerContext"].ToString();
using (var connection = new SqlConnection(connstring))
using (var command = new SqlCommand(sql, connection)) {
    connection.Open();
    command.Parameters.AddWithValue("@puid", productNumber);
    using (SqlDataReader reader = command.ExecuteReader()) {
        while (reader.Read()) {
            var product = new UpcPrintingProductModel() {
                ProductNumber = Convert.ToString(reader["number"]),
                Description = Convert.ToString(reader["desc1"]),
                Upc = Convert.ToString(reader["upccode"])
            };
            products.Add(product);
        }
    }
}
(For this particular query, I just completely bypassed the EF altogether, and used the old standby: System.Data.SqlClient.)
You can wrinkle your nose in disgust; I certainly did - but it didn't actually take that long to write, and it executes almost instantly.
You can also work around this issue by asynchronously "warming up" your DbContexts at application start.
protected void Application_Start()
{
    // your code.

    // Warming up.
    Start(() =>
    {
        using (var dbContext = new SomeDbContext())
        {
            // Any request to the db in the current dbContext.
            var response1 = dbContext.Addresses.Count();
        }
    });
}

private void Start(Action a)
{
    a.BeginInvoke(null, null);
}
I also recommend using settings such as these (if they fit your app):
dbContext.Configuration.AutoDetectChangesEnabled = false;
dbContext.Configuration.LazyLoadingEnabled = false;
dbContext.Configuration.ProxyCreationEnabled = false;
Skip the initialization/validation step (i.e. Database.SetInitializer<SomeDbContext>(null);).
Use .AsNoTracking() on GET queries.
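Taken together, a read-only query path might look like this. This is only a sketch; SomeDbContext, Addresses, and IsActive are placeholder names, not part of any real model:

```csharp
// Sketch: a read-only query with change tracking, lazy loading and
// proxy creation disabled. All type and member names are placeholders.
using (var dbContext = new SomeDbContext())
{
    dbContext.Configuration.AutoDetectChangesEnabled = false;
    dbContext.Configuration.LazyLoadingEnabled = false;
    dbContext.Configuration.ProxyCreationEnabled = false;

    // AsNoTracking() keeps the results out of the ObjectStateManager,
    // which avoids per-entity tracking overhead on large result sets.
    var addresses = dbContext.Addresses
                             .AsNoTracking()
                             .Where(a => a.IsActive)
                             .ToList();
}
```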
For additional information you can read:
https://msdn.microsoft.com/en-in/data/hh949853.aspx
https://www.fusonic.net/en/blog/3-steps-for-fast-entityframework-6.1-code-first-startup-performance/
https://www.fusonic.net/en/blog/ef-cache-deployment/
https://msdn.microsoft.com/en-us/library/dn469601(v=vs.113).aspx
https://blog.3d-logic.com/2013/12/14/using-pre-generated-views-without-having-to-pre-generate-views-ef6/
In some cases EF does not use query plan caching, for example if you use the Contains, Any, or All methods, or use constants in your query. You can try NihFix.EfQueryCacheOptimizer. It converts your query expression so that EF can use the cache.
I have 200k rows in my table, and I need to filter it and then show the results in a datatable. When I try to do that, the SQL itself runs fast, but getting the row count or calling ToList() takes a long time, even though only 15 rows remain after the filter; it is not a huge amount of data.
public static List<Books> GetBooks()
{
    List<Books> bookList = new List<Books>();
    var v = from a in ctx.Books select a;
    int allBooksCount = v.Count(); // I need the count of all books before filtering, but it is so slow (my first problem)
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList(); // ToList() is also so slow (my second problem)
}
There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?
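To illustrate the distinction, here is a hypothetical sketch (ctx.Books is assumed to be a DbSet<Books>; the difference is in where Count() executes, not in the syntax):

```csharp
// IQueryable: Count() is translated to a server-side SELECT COUNT(*),
// so no rows are transferred to the client.
IQueryable<Books> queryable = ctx.Books;
int fastCount = queryable.Count();

// IEnumerable: AsEnumerable() forces the rest of the pipeline to run
// in memory, so every row is materialized first, then counted.
IEnumerable<Books> enumerable = ctx.Books.AsEnumerable();
int slowCount = enumerable.Count();
```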
First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility of a long-running DbContext instance tracking too many entities bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
    var v = from a in context.Books select a;
    int allBooksCount = v.Count();
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList();
}
Using a fresh DbContext ensures queries are not sifting through in-memory entities after running a query to find tracked instances to return. This also ensures we are running against IQueryable off the Books DbSet within the context. We can only guess what "ctx" in your code actually represents.
Next: look at a profiler for MySQL, or have your database log the SQL statements, to capture exactly what EF is requesting. Check that the Count and ToList each trigger just one query against the database, and then run those exact statements against the database yourself. If more than these two queries are being run, something odd is happening behind the scenes that you need to investigate, such as your example not really representing what your real code is doing. You could be tripping client-side evaluation (if using EF Core 2) or lazy loading. The next thing I would look at is the execution plan for these queries, for hints such as missing indexes. (My DB experience is primarily SQL Server, so I cannot advise on tools for MySQL.)
I would log the actual SQL queries here. You can then use DESCRIBE to look at how many rows it hits. There are various tools that can further analyse the queries if DESCRIBE isn't sufficient. This way you can see whether it's the queries or the (lack of) indices that is the problem. Next step has to be guided by that.
I am currently using EF Extensions. One thing I don't understand: it's supposed to help with performance, but placing a million-plus records into a List variable is a memory issue in itself.
So if I want to update a million records without holding everything in memory, how can this be done efficiently?
Should we use a for loop and update in batches of, say, 10,000? Does EF Extensions' BulkUpdate have any native functionality to support this?
Example:
var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == "Electronics"); // this creates an IQueryable

await productUpdate.ForEachAsync(c => c.ProductBrand = "ABC Company");

_dbContext.BulkUpdateAsync(productUpdate.ToList());
Resource:
https://entityframework-extensions.net/bulk-update
This is actually something that EF is not made for. EF's database interactions start from the entity object and flow from there. EF cannot generate a partial UPDATE (i.e., one that doesn't overwrite everything) if the entity wasn't change-tracked (and therefore loaded), and similarly it cannot DELETE records based on a condition instead of a key.
There is no EF equivalent (without loading all of those records) for conditional update/delete logic such as
UPDATE People
SET FirstName = 'Bob'
WHERE FirstName = 'Robert'
or
DELETE FROM People
WHERE FirstName = 'Robert'
Doing this using the EF approach will require you to load all of these entities just to send them back (with an update or delete) to the database, and that's a waste of bandwidth and performance as you've already found.
The best solution I've found here is to bypass EF's LINQ-friendly methods and instead execute the raw SQL yourself. This can still be done using an EF context.
using (var ctx = new MyContext())
{
    string updateCommand = "UPDATE People SET FirstName = 'Bob' WHERE FirstName = 'Robert'";
    int noOfRowsUpdated = ctx.Database.ExecuteSqlCommand(updateCommand);

    string deleteCommand = "DELETE FROM People WHERE FirstName = 'Robert'";
    int noOfRowsDeleted = ctx.Database.ExecuteSqlCommand(deleteCommand);
}
More information here. Of course don't forget to protect against SQL injection where relevant.
The specific syntax to run raw SQL may vary per version of EF/EF Core but as far as I'm aware all versions allow you to execute raw SQL.
I can't comment on the performance of EF Extensions or BulkUpdate specifically, and I'm not going to buy it from them.
Based on their documentation, they don't seem to have the methods with the right signatures to allow for conditional update/delete logic.
BulkUpdate doesn't seem to allow you to input the logical condition (the WHERE in your UPDATE command) that would allow you to optimize this.
BulkDelete still has a BatchSize setting, which suggests that they are still handling the records one at a time (well, per batch I guess), and not using a single DELETE query with a condition (WHERE clause).
Based on your intended code in the question, EF Extensions isn't really giving you what you need. It's more performant and cheaper to simply execute raw SQL on the database, as this bypasses EF's need to load its entities.
Update
I might stand corrected: there is some support for conditional update logic, as seen here. However, it is unclear to me why the example still loads everything into memory, and what the purpose of that conditional WHERE logic is if you've already loaded it all into memory (why not use in-memory LINQ then?).
However, even if this works without loading the entities, it's still:
more limited (only equality checks are allowed, compared to SQL allowing any boolean condition that is valid SQL),
relatively complex (I don't like their syntax, maybe that's subjective)
and more costly (still a paid library)
compared to rolling your own raw SQL query. I would still suggest rolling your own raw SQL here, but that's just my opinion.
I found the "proper" EF Extensions way to do a bulk update with a query-like condition:
var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == "Electronics")
    .UpdateFromQuery(x => new Product { ProductBrand = "ABC Company" });
This should result in a proper SQL UPDATE ... SET ... WHERE, without the need to load entities first, as per the documentation:
Why UpdateFromQuery is faster than SaveChanges, BulkSaveChanges, and BulkUpdate?
UpdateFromQuery executes a statement directly in SQL such as UPDATE [TableName] SET [SetColumnsAndValues] WHERE [Key].
Other operations normally require one or multiple database round-trips which makes the performance slower.
You can check the working syntax on this dotnet fiddle example, adapted from their example of BulkUpdate.
Other considerations
No mention of batch operations for this, unfortunately.
Before doing a big update like this, it might be worth considering deactivating indexes you may have on this column, and rebuild them afterward. This is especially useful if you have many of them.
Be careful about the condition in the Where: if EF can't translate it to SQL, it will be evaluated client side, meaning the "usual" terrible round trip of load, change in memory, update.
Would an Entity Framework LINQ-to-Entities query return all records (even 10 million rows) from a database, or would there be any limitation on retrieval record size?
Entity Framework and LINQ don't have any limitations on how many rows they can fetch. A problem you might face is running your server out of memory, since you're trying to retrieve that amount of data at once.
You should consider using something like Dapper as Valkyriee mentioned in the comments, or at least disable proxy if you still want to use Entity Framework:
using (var db = new MyDbContext())
{
    db.Configuration.ProxyCreationEnabled = false;
    var data = db.Users.ToList(); // suppose you have 10 million users
}
...just be aware of what disabling proxy will cause. I'd still recommend using Dapper for this purpose.
Normally, fetching 10 million records from the database in one shot is not good practice. You can use EF's recommended pagination functionality instead.
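A minimal paging sketch. This assumes a Users DbSet with an Id key (placeholder names); note that EF requires an OrderBy before Skip/Take can be translated:

```csharp
// Fetch one page of users instead of all 10 million at once.
// Users and Id are assumed names; OrderBy is mandatory before Skip/Take.
int pageSize = 1000;
int pageIndex = 0; // zero-based page number

var page = db.Users
             .OrderBy(u => u.Id)
             .Skip(pageIndex * pageSize)
             .Take(pageSize)
             .ToList();
```

Looping over pageIndex until a page comes back smaller than pageSize processes the whole table with bounded memory.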
I am looking for a cache system that works with Entity Framework 6+. Either to find a suitable library or implement it myself.
I have already investigated these two open source cache providers:
Second Level Cache for Entity Framework 6.1
https://efcache.codeplex.com/
EFSecondLevelCache
https://github.com/VahidN/EFSecondLevelCache/
They seem like solid products, although I am looking for a way to specify the maximum cache age within the query. I think that would be the best way to specify the oldest data I would be able to accept (since this is very dependent on business logic, of course).
This would be the best way to call it, I think:
using (var db = new MyEF6Entities()) {
    var path = db.Configuration.Where(c => c.name == "Path")
                 .AllowFromCache(TimeSpan.FromMinutes(60));
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .AllowFromCache(TimeSpan.FromSeconds(30));
}
Is there a way to make an API like this?
EDIT: I have now tried EF+ as suggested by @JonathanMagnan, but as an example the code below never gets a cache hit. I know this because it is very slow; I also see in the MySQL administrator that a connection is made with that specific query, and if I change data in the database the change shows immediately, whereas I would expect a delay of up to 60 minutes.
public String GetClientConfig(string host, Guid deviceId) {
    using (var db = new DbEntities(host)) {
        var clientConfig = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60)).ToList();
        return string.Join("\n", clientConfig.Select(b => b.Name + "=" + b.Value));
    }
}
EDIT 2: I realized now that caching works; however, it still creates a connection to the database because I create a new context instance. There is, however, a problem! I make EF requests in a web service, and depending on the Host: header of the request, different databases are queried. EF Plus does not seem to handle this, so if I call http://host1/ClientConfig I later get the same (and therefore incorrect) results from http://host2/ClientConfig. Is there a way to tag the cache with the host name?
Disclaimer: I'm the owner of the project Entity Framework Plus (EF+)
I cannot speak for the other libraries you have already investigated, but EF+ Query Cache allows you to cache data for a specific amount of time.
Wiki: EF+ Query Cache
Example
using (var db = new MyEF6Entities()) {
    var path = db.Configuration.Where(c => c.name == "Path")
                 .FromCache(DateTime.Now.AddMinutes(60));
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddSeconds(30));
}
You can also use cache "tags" to cache your data multiple times if some logic requires data live, and other logic doesn't need it.
Example
using (var db = new MyEF6Entities()) {
    var recentUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddHours(1), "recent");
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddSeconds(30), "online");
}
To my knowledge, I don't think any other cache library supports specifying the maximum age of the cached data per query.
EDIT: Answer subquestion
Thanks Jonathan. How does EF+ work with using different databases for
example, and different logged in users for example. Will it handle
this automatically or do I need to give it some hints ?
It depends on each feature. For example, Query Cache works with all databases provider since only LINQ is used under the hood. You don't need to specify anything, it already handles everything.
Some other features like Query Future don't work yet on all providers. It works on popular providers like SQL Server, SQL Azure, and MySQL.
You can see the requirements section under each feature. We normally say "All supported" if it supports all providers.
EDIT: Answer subquestion
I change the database in the MyEF6Entities constructor so question is
if EF+ will take the database name into account or if this could mess
up the cache?
Starting from v1.3.34, the connection string is now part of the key. So that will support this scenario.
The key is composed of:
A cache prefix
The connection string
All cache tags
All parameter names & values
Is there a way to tag the cache with the host name
You can use Cache Tags: EF+ Query Cache Tag & ExpireTag
Example:
var host1 = "http://host1/ClientConfig";
var host2 = "http://host2/ClientConfig";

var clientHost1 = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60), host1).ToList();
var clientHost2 = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60), host2).ToList();
All tags are part of the cache key so both queries will be cached in a different cache entry.
EDIT: Answer subquestion
I set the ConnectionString in the MyEF6Entities() constructor and it
still seem to confuse data from different databases
Can you generate the cache key and check if the connection string is included?
var query = b.Users.Where(u => u.IsOnline);
var cacheKey = QueryCacheManager.GetCacheKey(query, new string[0]);
It works if I do, for example, var clientHost =
db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60),
hostName).ToList(); but I shouldn't need to do this, since the
connection strings are different, right?
Yes, you are right. You shouldn't need it if the connection string changes with the host.
I need to insert around 2500 rows using EF Code First.
My original code looked something like this:
foreach (var item in listOfItemsToBeAdded)
{
    // biz logic
    context.MyStuff.Add(item);
}
This took a very long time. It was around 2.2 seconds for each DbSet.Add() call, which equates to around 90 minutes.
I refactored the code to this:
var tempItemList = new List<MyStuff>();
foreach (var item in listOfItemsToBeAdded)
{
    // biz logic
    tempItemList.Add(item);
}
context.MyStuff.ToList().AddRange(tempItemList);
This only takes around 4 seconds to run. However, the .ToList() call queries all the items currently in the table, which is completely unnecessary and could be dangerous, or even more time-consuming, in the long run. One workaround would be to do something like context.MyStuff.Where(x => x.ID == *empty guid*).AddRange(tempItemList), because then I know nothing will ever be returned.
But I'm curious whether anyone else knows of an efficient way to do a bulk insert using EF Code First.
Validation is normally a very expensive part of EF; I saw great performance improvements by disabling it with:
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
I believe I found that in a similar SO question--perhaps it was this answer
Another answer on that question rightly points out that if you really need bulk insert performance you should look at using System.Data.SqlClient.SqlBulkCopy. The choice between EF and ADO.NET for this issue really revolves around your priorities.
I have a crazy idea, but I think it will help you.
After adding every 100 items, call SaveChanges. I have a feeling that change tracking in EF performs very badly with huge data sets.
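That idea can be sketched as follows. This is a rough outline, not a tested implementation; MyContext, MyStuff, and listOfItemsToBeAdded are the names from the question, and recreating the context per batch additionally keeps the change tracker small:

```csharp
// Sketch: save in batches of 100 and recreate the context after each
// batch so the change tracker never accumulates thousands of entities.
const int batchSize = 100;
var context = new MyContext();
context.Configuration.AutoDetectChangesEnabled = false;

for (int i = 0; i < listOfItemsToBeAdded.Count; i++)
{
    context.MyStuff.Add(listOfItemsToBeAdded[i]);

    if ((i + 1) % batchSize == 0)
    {
        context.SaveChanges();
        context.Dispose();
        context = new MyContext();
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}

context.SaveChanges(); // flush the final partial batch
context.Dispose();
```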
I would recommend this article on how to do bulk inserts using EF.
Entity Framework and slow bulk INSERTs
He explores these areas and compares performance:
Default EF (57 minutes to complete adding 30,000 records)
Replacing with ADO.NET Code (25 seconds for those same 30,000)
Context Bloat- Keep the active Context Graph small by using a new context for each Unit of Work (same 30,000 inserts take 33 seconds)
Large Lists - Turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)
Batching (down to 16 seconds)
DbTable.AddRange() (performance is in the 12-second range)
As STW pointed out, the DetectChanges method, which is called every time you call Add, is very expensive.
Common solutions are:
Use AddRange instead of Add
Set AutoDetectChangesEnabled to false
Split SaveChanges into multiple batches
See: Improve Entity Framework Add Performance
It's important to note that using AddRange doesn't perform a bulk insert; it simply invokes the DetectChanges method once (after all entities are added), which greatly improves performance.
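A sketch of that change, using the names from the question (MyStuff and listOfItemsToBeAdded are the OP's placeholders):

```csharp
// Collect the entities first, then hand them to EF in one call.
// AddRange triggers DetectChanges once for the whole batch,
// instead of once per entity as Add does. Available in EF6.
var tempItemList = new List<MyStuff>();
foreach (var item in listOfItemsToBeAdded)
{
    // biz logic
    tempItemList.Add(item);
}

context.MyStuff.AddRange(tempItemList);
context.SaveChanges();
```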
But I'm curious if anyone else knows of an efficient way to to a bulk
insert using EF Code First
There are third-party libraries that support bulk insert:
See: Entity Framework Bulk Insert library
Disclaimer: I'm the owner of Entity Framework Extensions
This library allows you to perform all bulk operations you need for your scenarios:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
operation.ColumnPrimaryKeyExpression =
customer => customer.Code;
});
EF is not really usable for batch/bulk operations (I think ORMs in general are not).
The particular reason this runs so slowly is EF's change tracker. Virtually every call to the EF API results in an internal call to DetectChanges(), including DbSet.Add(). When you add 2,500 records, this function gets called 2,500 times, and each call gets slower the more data you have already added. So disabling change tracking in EF should help a lot:
dataContext.Configuration.AutoDetectChangesEnabled = false;
A better solution would be to split your big bulk operation into 2,500 smaller transactions, each running with its own data context. You could use MSMQ, or some other mechanism for reliable messaging, to initiate each of these smaller transactions.
But if your system is built around a lot of bulk operations, I would suggest finding a solution other than EF for your data access layer.
While this is a bit late, and the answers and comments posted above are very useful, I will leave this here in the hope that it proves useful for people who had the same problem I did and come to this post for answers. This post still ranks high on Google (at the time of posting this answer) if you search for a way to bulk-insert records using Entity Framework.
I had a similar problem using Entity Framework Code First in an MVC 5 application. A user submitted a form that caused tens of thousands of records to be inserted into a table, and had to wait for more than two and a half minutes while 60,000 records were inserted.
After much googling, I stumbled upon BulkInsert-EF6, which is also available as a NuGet package. Reworking the OP's code:
var tempItemList = new List<MyStuff>();
foreach (var item in listOfItemsToBeAdded)
{
    // biz logic
    tempItemList.Add(item);
}
using (var transaction = context.Transaction())
{
    try
    {
        context.BulkInsert(tempItemList);
        transaction.Commit();
    }
    catch (Exception ex)
    {
        // Handle exception
        transaction.Rollback();
    }
}
My code went from taking >2 minutes to <1 second for 60,000 records.
Although this is a late reply, I'm posting it because I suffered the same pain.
I've created a new GitHub project just for this; as of now, it supports bulk insert/update/delete for SQL Server transparently, using SqlBulkCopy:
https://github.com/MHanafy/EntityExtensions
There are other goodies as well, and hopefully it will be extended to do more down the track.
Using it is as simple as
var insertsAndupdates = new List<object>();
var deletes = new List<object>();
context.BulkUpdate(insertsAndupdates, deletes);
Hope it helps!
EF6 beta 1 has an AddRange function that may suit your purpose:
INSERTing many rows with Entity Framework 6 beta 1
EF6 will be released "this year" (2013)
public static void BulkInsert(IList list, string tableName)
{
    var conn = (SqlConnection)Db.Connection;
    if (conn.State != ConnectionState.Open) conn.Open();

    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.BatchSize = list.Count;
        bulkCopy.DestinationTableName = tableName;
        var table = ListToDataTable(list);
        bulkCopy.WriteToServer(table);
    }
}

public static DataTable ListToDataTable(IList list)
{
    var dt = new DataTable();
    if (list.Count <= 0) return dt;

    var properties = list[0].GetType().GetProperties();
    foreach (var pi in properties)
    {
        dt.Columns.Add(pi.Name, Nullable.GetUnderlyingType(pi.PropertyType) ?? pi.PropertyType);
    }
    foreach (var item in list)
    {
        DataRow row = dt.NewRow();
        properties.ToList().ForEach(p => row[p.Name] = p.GetValue(item, null) ?? DBNull.Value);
        dt.Rows.Add(row);
    }
    return dt;
}