I am looking for a cache system that works with Entity Framework 6+, either by finding a suitable library or by implementing one myself.
I have already investigated these two open source cache providers:
Second Level Cache for Entity Framework 6.1
https://efcache.codeplex.com/
EFSecondLevelCache
https://github.com/VahidN/EFSecondLevelCache/
They seem like solid products, although I am looking for a way to specify the maximum cache time within the query itself. I think that would be the best way to specify the oldest data I would be able to accept (since this is very dependent on business logic, of course).
I think this would be the best way to call it:
using (var db = new MyEF6Entities()) {
    var path = db.Configuration.Where(c => c.Name == "Path")
                 .AllowFromCache(TimeSpan.FromMinutes(60));
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .AllowFromCache(TimeSpan.FromSeconds(30));
}
Is there a way to make an API like this?
EDIT: I have now tried EF+ as suggested by @JonathanMagnan, but, as an example, the code below never gets a cache hit. I know this because it is very slow; I also see in the MySQL administrator that a connection is made with the specific query, and if I change data in the database the change shows immediately, whereas I would expect a delay of 60 minutes.
public string GetClientConfig(string host, Guid deviceId) {
    using (var db = new DbEntities(host)) {
        var clientConfig = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60)).ToList();
        return string.Join("\n", clientConfig.Select(b => b.Name + "=" + b.Value));
    }
}
EDIT 2: I realized now that caching works; however, it still creates a connection to the database because I create a new context instance. There is, however, a problem! I make EF requests in a web service. Depending on the web service's Host: header, different databases are queried. EF Plus does not seem to handle this, so if I use http://host1/ClientConfig I later get the same (and thus incorrect) results from http://host2/ClientConfig. Is there a way to tag the cache with the host name?
Disclaimer: I'm the owner of the project Entity Framework Plus (EF+)
I cannot speak for the other libraries you have already investigated, but EF+ Query Cache lets you cache data for a specific amount of time.
Wiki: EF+ Query Cache
Example
using (var db = new MyEF6Entities()) {
    var path = db.Configuration.Where(c => c.Name == "Path")
                 .FromCache(DateTime.Now.AddMinutes(60));
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddSeconds(30));
}
You can also use cache "tags" to cache your data multiple times if some logic requires live data and other logic doesn't.
Example
using (var db = new MyEF6Entities()) {
    var recentUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddHours(1), "recent");
    var onlineUserList = db.Users.Where(u => u.IsOnline)
                           .FromCache(DateTime.Now.AddSeconds(30), "online");
}
To my knowledge, no cache library directly supports "give me data as long as it was cached within the last X seconds", the way your proposed AllowFromCache does.
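That said, for a fixed maximum age the absolute-expiration model is equivalent: the first caller populates the cache, and every caller inside the window reads the cached copy. A minimal sketch of such a wrapper (the AllowFromCache name comes from your question, not from the library; the FromCache overload taking an expiration is the one used in the examples above):

using System;
using System.Collections.Generic;
using System.Linq;
using Z.EntityFramework.Plus; // EF+ Query Cache

public static class QueryCacheExtensions
{
    // "Accept data at most maxAge old" maps onto "cache this result with
    // an absolute expiration of now + maxAge": the first caller hits the
    // database, and every caller inside the window gets the cached copy,
    // so no returned data is ever older than maxAge.
    public static IEnumerable<T> AllowFromCache<T>(this IQueryable<T> query, TimeSpan maxAge)
        where T : class
    {
        return query.FromCache(DateTime.Now.Add(maxAge));
    }
}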
EDIT: Answer subquestion
Thanks Jonathan. How does EF+ work with different databases, for example, or different logged-in users? Will it handle this automatically or do I need to give it some hints?
It depends on each feature. For example, Query Cache works with all database providers, since only LINQ is used under the hood. You don't need to specify anything; it already handles everything.
Some other features, like Query Future, don't yet work on all providers. They work on popular providers like SQL Server, SQL Azure, and MySQL.
You can see the requirements section under each feature. We normally say "All supported" if it supports all providers.
EDIT: Answer subquestion
I change the database in the MyEF6Entities constructor, so the question is: will EF+ take the database name into account, or could this mess up the cache?
Starting from v1.3.34, the connection string is now part of the key. So that will support this scenario.
The key is composed of:
A cache prefix
The connection string
All cache tags
All parameter names & values
Is there a way to tag the cache with the host name?
You can use Cache Tags: EF+ Query Cache Tag & ExpireTag
Example:
var host1 = "http://host1/ClientConfig";
var host2 = "http://host2/ClientConfig";
var clientHost1 = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60), host1).ToList();
var clientHost2 = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60), host2).ToList();
All tags are part of the cache key, so the two queries are cached in different cache entries.
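Applied to your GetClientConfig method, the Host header value can double as the tag (a sketch reusing your DbEntities context):

public string GetClientConfig(string host, Guid deviceId)
{
    using (var db = new DbEntities(host))
    {
        // The host tag becomes part of the cache key, so host1 and host2
        // each get their own cache entry.
        var clientConfig = db.ClientConfig
                             .FromCache(DateTime.Now.AddMinutes(60), host)
                             .ToList();
        return string.Join("\n", clientConfig.Select(b => b.Name + "=" + b.Value));
    }
}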
EDIT: Answer subquestion
I set the ConnectionString in the MyEF6Entities() constructor and it
still seem to confuse data from different databases
Can you generate the cache key and check if the connection string is included?
var query = db.Users.Where(u => u.IsOnline);
var cacheKey = QueryCacheManager.GetCacheKey(query, new string[0]);
It works if I do, for example, var clientHost = db.ClientConfig.FromCache(DateTime.Now.AddMinutes(60), hostName).ToList(); but I shouldn't need to do this since the connection strings are different, right?
Yes, you are right. You shouldn't need it if the connection string changes with the host.
Related
We are using EF 6.0 and .NET 4.5 with a code-first approach; our database has around 170 entities (tables), and the main table holds around 150,000 records.
On first load, Entity Framework takes around 25 seconds.
I am trying to improve this time, as it is too slow, and it becomes slower as the number of records increases.
I have tried generating native images and using pre-generated interactive views, but I couldn't achieve any significant improvement.
Can anyone please help me on this?
Thanks.
You can consider Entity Framework pre-generated mapping views. You can use EF Power Tools to pre-generate the views.
Using pre-generated views moves the cost of view generation from model
loading (run time) to compile time. While this improves startup
performance at runtime, you will still experience the pain of view
generation while you are developing. There are several additional
tricks that can help reduce the cost of view generation, both at
compile time and run time.
You can refer to this to learn more about it: Entity Framework Pre-Generated Mapping Views
You can use caching in Entity Framework to improve the performance of your app.
There are 3 types of caching.
1. Object caching – the ObjectStateManager built into an ObjectContext instance keeps track in memory of the objects that have been retrieved using that instance. This is also known as the first-level cache (see the sketch below).
2. Query plan caching – reusing the generated store command when a query is executed more than once.
3. Metadata caching – sharing the metadata for a model across different connections to the same model.
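As a quick illustration of the first-level cache (the context and entity names here are assumed for the example), DbSet.Find checks the objects the context is already tracking before it queries the database:

using (var db = new MyEF6Entities())
{
    var first = db.Users.Find(42);  // issues a SELECT against the database
    var second = db.Users.Find(42); // served from the ObjectStateManager; no SQL sent
}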
You can refer to this article to read more about it: Performance Considerations for EF 6
I recently had a simple query that runs super quick in SSMS but was taking way, way too long to run using Entity Framework in my C# program.
This page has been extremely helpful for troubleshooting EF performance problems in general:
https://www.simple-talk.com/dotnet/net-tools/entity-framework-performance-and-what-you-can-do-about-it/
...but in this case, nothing helped. So in the end, I did this:
List<UpcPrintingProductModel> products = new List<UpcPrintingProductModel>();
var sql = "select top 75 number, desc1, upccode "
        + "from MailOrderManager..STOCK s "
        + "where s.number like @puid + '%' ";
var connstring = ConfigurationManager.ConnectionStrings["MailOrderManagerContext"].ToString();
using (var connection = new SqlConnection(connstring))
using (var command = new SqlCommand(sql, connection)) {
    connection.Open();
    command.Parameters.AddWithValue("@puid", productNumber);
    using (SqlDataReader reader = command.ExecuteReader()) {
        while (reader.Read()) {
            var product = new UpcPrintingProductModel() {
                ProductNumber = Convert.ToString(reader["number"]),
                Description = Convert.ToString(reader["desc1"]),
                Upc = Convert.ToString(reader["upccode"])
            };
            products.Add(product);
        }
    }
}
(For this particular query, I just completely bypassed the EF altogether, and used the old standby: System.Data.SqlClient.)
You can wrinkle your nose in disgust; I certainly did - but it didn't actually take that long to write, and it executes almost instantly.
You can also work around this issue by asynchronously "warming up" your DbContext at application start:
protected void Application_Start()
{
    // your code.

    // Warming up.
    Start(() =>
    {
        using (var dbContext = new SomeDbContext())
        {
            // Any request to the db in the current dbContext.
            var response1 = dbContext.Addresses.Count();
        }
    });
}

private void Start(Action a)
{
    a.BeginInvoke(null, null);
}
I also recommend using settings such as the following (if they fit your app):
dbContext.Configuration.AutoDetectChangesEnabled = false;
dbContext.Configuration.LazyLoadingEnabled = false;
dbContext.Configuration.ProxyCreationEnabled = false;
Skip the database initializer (i.e. Database.SetInitializer<SomeDbContext>(null);).
Use .AsNoTracking() on read-only (GET) queries.
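For example (entity names assumed):

// AsNoTracking skips the change-tracker bookkeeping, which speeds up
// read-only queries and reduces memory pressure.
var onlineUsers = dbContext.Users
                           .AsNoTracking()
                           .Where(u => u.IsOnline)
                           .ToList();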
For additional information you can read:
https://msdn.microsoft.com/en-in/data/hh949853.aspx
https://www.fusonic.net/en/blog/3-steps-for-fast-entityframework-6.1-code-first-startup-performance/
https://www.fusonic.net/en/blog/ef-cache-deployment/
https://msdn.microsoft.com/en-us/library/dn469601(v=vs.113).aspx
https://blog.3d-logic.com/2013/12/14/using-pre-generated-views-without-having-to-pre-generate-views-ef6/
In some cases EF does not use query plan caching, for example if you use the Contains, Any, or All methods, or use constants in your query. You can try NihFix.EfQueryCacheOptimizer. It converts your query expression into a form that allows EF to use the cache.
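To illustrate the Contains case (context and entity names assumed): EF inlines the list values into the generated SQL as literals, so every distinct list produces different SQL text and a separate plan:

// Generates WHERE Id IN (1, 2, 3) with the values inlined as constants,
// so the cached plan cannot be reused for a different idList.
var idList = new List<int> { 1, 2, 3 };
var users = db.Users.Where(u => idList.Contains(u.Id)).ToList();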
I had the following:
List<Message> unreadMessages = this.context.Messages
.Where( x =>
x.AncestorMessage.MessageID == ancestorMessageID &&
x.Read == false &&
x.SentTo.Id == userID ).ToList();
foreach(var unreadMessage in unreadMessages)
{
unreadMessage.Read = true;
}
this.context.SaveChanges();
But there must be a way of doing this without two SQL queries: one for selecting the items and one for updating the list.
How do I do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach that does not require any other tools/packages and can be performed asynchronously as well:
var sql = "UPDATE TABLE x SET FIELDA = #fieldA WHERE FIELDB = #fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is, well, breaking the nice linqy paradigm and having to handcode your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
A number of open-source NuGet packages have been available for a while now that offer specific extensions to EF. Some of them provide a nice "linqy" way to issue a single UPDATE SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
x => x.Read == false && x.SentTo.Id == userID,
x => new Message { Read = true });
It is also available on github
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
.For(context, context.Messages)
.Where(x => x.Read == false && x.SentTo.Id == userID)
.Update(x => x.Read, x => x.Read = true);
It is also available on github
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, then applying the update. So really, I don't think you need to be worried about improving this.
Further, the reason why it's broken into two steps like this in LINQ is precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in memory objects than you have to. Only then do you alter objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.
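A sketch of that approach, reusing the fields from the question (the table and column names here are assumptions):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "UPDATE Messages SET [Read] = 1 " +
    "WHERE [Read] = 0 AND SentToId = @userId AND AncestorMessageId = @ancestorId",
    connection))
{
    command.Parameters.AddWithValue("@userId", userID);
    command.Parameters.AddWithValue("@ancestorId", ancestorMessageID);
    connection.Open();
    int rowsAffected = command.ExecuteNonQuery(); // one set-based round-trip
}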
Please see the following situation:
I have a CSV file from which I import a couple of fields (not all) into SQL Server, using Entity Framework with the Unit of Work and Repository design patterns.
var newGenericArticle = new GenericArticle
{
GlnCode = data[2],
Description = data[5],
VendorId = data[4],
ItemNumber = data[1],
ItemUOM = data[3],
VendorName = data[12]
};
var unitOfWork = new UnitOfWork(new AppServerContext());
unitOfWork.GenericArticlesRepository.Insert(newGenericArticle);
unitOfWork.Commit();
Now, the only way to uniquely identify a record is by checking 4 fields: GlnCode, Description, VendorId, and ItemNumber.
So, before I can insert a record, I need to check whether or not it exists:
var unitOfWork = new UnitOfWork(new AppServerContext());
// If the article is already existing, update the vendor name.
if (unitOfWork.GenericArticlesRepository.GetAllByFilter(
x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
x.Description.Equals(newGenericArticle.Description) &&
x.VendorId.Equals(newGenericArticle.VendorId) &&
x.ItemNumber.Equals(newGenericArticle.ItemNumber)).Any())
{
var foundArticle = unitOfWork.GenericArticlesRepository.GetByFilter(
x => x.GlnCode.Equals(newGenericArticle.GlnCode) &&
x.Description.Equals(newGenericArticle.Description) &&
x.VendorId.Equals(newGenericArticle.VendorId) &&
x.ItemNumber.Equals(newGenericArticle.ItemNumber));
foundArticle.VendorName = newGenericArticle.VendorName;
unitOfWork.GenericArticlesRepository.Update(foundArticle);
}
If it exists, I need to update it, as you can see in the code above.
Now, you need to know that I'm importing around 1,500,000 records, so quite a lot.
And it's the filter that causes the CPU to reach almost 100%.
The `GetAllByFilter` method is quite simple and does the following:
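// Note: this evaluates the queryable up to three times (Any(), then
// Where(predicate).Any(), then Where(predicate)); each evaluation can
// trigger a separate database round-trip.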
return !Entities.Any() ? null : !Entities.Where(predicate).Any() ? null : Entities.Where(predicate).AsQueryable();
where predicate is an Expression<Func<TEntity, bool>>.
Is there anything that I can do to make sure that the server's CPU doesn't reach 100%?
Note: I'm using SQL Server 2012
Kind regards
Wrong tool for the task. You should never process a million-plus records one at a time. Insert the records into a staging table using bulk insert, clean them (if need be), and then use a stored proc to do the processing in a set-based way, or use the tool designed for this: SSIS.
I've found another solution which wasn't proposed here, so I'll be answering my own question.
I will have a temp table into which I import all the data, and after the import I'll execute a stored procedure that runs a MERGE command to populate the destination table. I believe this is the most performant approach.
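A sketch of that flow with SqlBulkCopy (the staging table and procedure names are assumptions for illustration):

using System.Data;
using System.Data.SqlClient;

// Bulk-load the parsed CSV rows into a staging table, then let a stored
// procedure MERGE them into the destination table in one set-based pass.
public static void ImportArticles(string connectionString, DataTable stagingRows)
{
    using (var bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = "dbo.GenericArticles_Staging"; // assumed name
        bulkCopy.WriteToServer(stagingRows);
    }

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.MergeGenericArticles", connection)) // assumed name
    {
        command.CommandType = CommandType.StoredProcedure;
        connection.Open();
        command.ExecuteNonQuery();
    }
}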
Have you indexed those four fields in your database? That is the first thing I would do.
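With EF 6.1+ code first, a composite index over the four lookup fields can be declared with IndexAttribute. A sketch (the string lengths are assumptions; SQL Server cannot index unbounded nvarchar(max) columns):

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class GenericArticle
{
    public int Id { get; set; }

    // The order argument sets each column's position in the composite index.
    [Index("IX_GenericArticle_Lookup", 1)]
    [StringLength(50)]   // indexed string columns need a bounded length
    public string GlnCode { get; set; }

    [Index("IX_GenericArticle_Lookup", 2)]
    [StringLength(200)]
    public string Description { get; set; }

    [Index("IX_GenericArticle_Lookup", 3)]
    [StringLength(50)]
    public string VendorId { get; set; }

    [Index("IX_GenericArticle_Lookup", 4)]
    [StringLength(50)]
    public string ItemNumber { get; set; }

    public string VendorName { get; set; }
}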
Ok, I would recommend trying the following:
Improving bulk insert performance in Entity framework
To summarize,
Do not call SaveChanges() after every insert or update. Instead, call it every 1,000-2,000 records so that the inserts/updates are sent to the database in batches.
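A sketch of that batching pattern (entity and collection names assumed):

int count = 0;
foreach (var article in articles)
{
    context.GenericArticles.Add(article);

    // Flush to the database in batches instead of once per record.
    if (++count % 1000 == 0)
    {
        context.SaveChanges();
    }
}
context.SaveChanges(); // flush the remainder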
Also, optionally change the following parameters on your context:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
I'm using BatchDelete found on the answer to this question: EF Code First Delete Batch From IQueryable<T>?
The method seems to waste too much time building the delete clause from the IQueryable. Specifically, deleting 20,000 elements using the IQueryable below takes almost two minutes.
context.DeleteBatch(context.SomeTable.Where(x => idList.Contains(x.Id)));
All the time is spent on this line:
var sql = clause.ToString();
The line is part of this method, available on the original question linked above but pasted here for convenience:
private static string GetClause<T>(DbContext context, IQueryable<T> clause) where T : class
{
const string Snippet = "FROM [dbo].[";
var sql = clause.ToString();
var sqlFirstPart = sql.Substring(sql.IndexOf(Snippet, System.StringComparison.OrdinalIgnoreCase));
sqlFirstPart = sqlFirstPart.Replace("AS [Extent1]", string.Empty);
sqlFirstPart = sqlFirstPart.Replace("[Extent1].", string.Empty);
return sqlFirstPart;
}
I imagine making context.SomeTable.Where(x => idList.Contains(x.Id)) into a compiled query could help, but AFAIK you can't compile queries while using DbContext on EF 5. In theory they should be cached, but I see no sign of improvement on a second execution of the same BatchDelete.
Is there a way to make this faster? I would like to avoid manually building the SQL delete statement.
The IQueryable isn't cached, and each time you evaluate it you go out to SQL. Running ToList() or ToArray() on it will evaluate it once, and then you can work with the list as the cached version.
If you want to preserve your interfaces, use ToList().AsQueryable(), and this will pass in a cached version.
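For example, with the question's query:

// One round-trip to SQL; everything after ToList() runs as
// LINQ-to-Objects against the in-memory list.
var cached = context.SomeTable
                    .Where(x => idList.Contains(x.Id))
                    .ToList()
                    .AsQueryable();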
Related post.
How do I cache an IQueryable object?
It seems there is no way to cache the IQueryable in this case, because the query contains a list of ids to check against, and the list changes on every call.
The only way I found to avoid the two-minute delay in building the query every time I had to mass-delete objects was to use ExecuteSqlCommand, as below:
var list = string.Join("','", ids.Select(x => x.ToString()));
var qry = string.Format("DELETE FROM SomeTable WHERE Id IN ('{0}')", list);
context.Database.ExecuteSqlCommand(qry);
I'll mark this as the answer for now. If any other technique is suggested that doesn't rely on ExecuteSqlCommand, I'll gladly change the answer.
There is an EF pattern that works OK.
It uses a projection to return only the keys from the DB (projected values are not attached to the context, so this is pretty quick).
Then you build key-only stub POCOs, attach them to the context, and light the fuse.
Basically:
var deleteMagazine = context.Set<DeadMeat>()
                            .Where(t => t.IhateYou)
                            .Select(t => t.THEKEY)
                            .ToList();

// Now instantiate a dummy POCO with only the key set for each value.
var count = 0;
foreach (var bullet in deleteMagazine)
{
    var stub = new DeadMeat { THEKEY = bullet };
    context.Set<DeadMeat>().Attach(stub);
    context.Set<DeadMeat>().Remove(stub);

    // Consider saving changes every ~1000 records; trial different
    // batch sizes for performance.
    if (++count % 1000 == 0)
        context.SaveChanges();
}

// Shoot anyone still moving.
context.SaveChanges();
Check SQL Server Profiler to see the generated DELETE statements.
I am often comparing data in tables in different databases. These databases do not have the same schema. In TSQL, I can reference them with the DB>user>table structure (DB1.dbo.Stores, DB2.dbo.OtherPlaces) to pull the data for comparison. I like the idea of LINQPad quite a bit, but I just can't seem to easily pull data from two different data contexts within the same set of statements.
I've seen people suggest simply changing the connection string to pull the data from the other source into the current schema but, as I mentioned, this will not do. Did I just skip a page in the FAQ? This seems a fairly routine procedure to be unavailable to me.
In the "easy" world, I'd love to be able to simply reference the typed datacontext that LINQPad creates. Then I could simply:
DB1DataContext db1 = new DB1DataContext();
DB2DataContext db2 = new DB2DataContext();
And work from there.
Update: it's now possible to do cross-database SQL Server queries in LINQPad (from LINQPad v4.31, with a LINQPad Premium license). To use this feature, hold down the Control key while dragging databases from the Schema Explorer to the query window.
It's also possible to query linked servers (that you've linked by calling sp_add_linkedserver). To do this:
Add a new LINQ to SQL connection.
Choose Specify New or Existing Database and choose the primary database you want to query.
Click the Include Additional Databases checkbox and pick the linked server(s) from the list.
Keep in mind that you can always create another context on your own.
public FooEntities GetFooContext()
{
var entityBuilder = new EntityConnectionStringBuilder
{
Provider = "Devart.Data.Oracle",
ProviderConnectionString = "User Id=foo;Password=foo;Data Source=Foo.World;Connect Mode=Default;Direct=false",
Metadata = #"D:\FooModel.csdl|D:\FooModel.ssdl|D:\FooModel.msl"
};
return new FooEntities(entityBuilder.ToString());
}
You can instantiate as many contexts as you like against disparate SQL instances and execute pseudo cross-database joins, copy data, etc. Note that joins across contexts are performed locally, so you must call ToList(), ToArray(), etc. to execute the queries against their respective data sources individually before joining. In other words, if you "inner" join 10 rows from DB1.TABLE1 with 20 rows from DB2.TABLE2, both sets (all 30 rows) must be pulled into memory on your local machine before LINQ performs the join and returns the related/intersecting set (at most 20 rows in this example).
//EF6 context not selected in Linqpad Connection dropdown
var remoteContext = new YourContext();
remoteContext.Database.Connection.ConnectionString = "Server=[SERVER];Database="
+ "[DATABASE];Trusted_Connection=false;User ID=[SQLAUTHUSERID];Password="
+ "[SQLAUTHPASSWORD];Encrypt=True;";
remoteContext.Database.Connection.Open();
var DB1 = new Repository(remoteContext);
//EF6 connection to remote database
var remote = DB1.GetAll<Table1>()
.Where(x=>x.Id==123)
//note...depending on the default Linqpad connection you may get
//"EntityWrapperWithoutRelationships" results for
//results that include a complex type. you can use a Select() projection
//to specify only simple type columns
.Select(x=>new { x.Col1, x.Col2 /* etc... */ })
.Take(1)
.ToList().Dump(); // you must execute query by calling ToList(), ToArray(),
// etc before joining
//Linq-to-SQL default connection selected in Linqpad Connection dropdown
Table2.Where(x=>x.Id == 123)
.ToList() // you must execute query by calling ToList(), ToArray(),
// etc before joining
.Join(remote, a=> a.d, b=> (short?)b.Id, (a,b)=>new{b.Col1, b.Col2, a.Col1})
.Dump();
remoteContext.Database.Connection.Close();
remoteContext = null;
I do not think you are able to do this. See this LinqPad request.
However, you could build multiple dbml files in a separate dll and reference them in LinqPad.
Drag-and-drop approach: hold down the Ctrl key while dragging additional databases
from the Schema Explorer to the query editor.
Use case:
//Access Northwind
var ID = new Guid("107cc232-0319-4cbe-b137-184c82ac6e12");
LotsOfData.Where(d => d.Id == ID).Dump();
//Access Northwind_v2
this.NORTHWIND_V2.LotsOfData.Where(d => d.Id == ID).Dump();
Multiple databases are as far as I know only available in the "paid" version of LinqPad (what I wrote applies to LinqPad 6 Premium).
For more details, see this answer in StackOverflow (section "Multiple database support").