I have a reasonably large .edmx generated from a database, and I have recently been working on performance to improve my application. I have read a number of articles in a variety of places, some here, some not:
this one on disabling auto detect of changes http://msdn.microsoft.com/en-us/data/jj556205.aspx
this one on improving performance when deleting: DbContext is very slow when adding and deleting
this one (which I think is pretty good) http://www.codeproject.com/Articles/38922/Performance-and-the-Entity-Framework
I am already using myentities.tablename.MergeOption = MergeOption.NoTracking, I am using compiled queries, I have pregenerated my views using EdmGen, I have reduced the data I am fetching, etc. And, of course, I have gained performance in leaps and bounds, so a page that was loading in 54 seconds now takes 16.1 seconds. However, I have to get it to 3 seconds, so I am still looking for the next improvement.
So the research is all well and good, and as a result I have upgraded to the latest Entity Framework, regenerated my .edmx from the database, etc., and tried a variety of things, but I simply cannot find myEntities.Configuration.AutoDetectChangesEnabled in order to set it to false. I must be missing a simple trick: how do I get my .edmx-generated context to have this option?
My environment: .NET 4.0.3, Visual Studio 2010, the latest version of Entity Framework, MVC 4.0. All I need is for somebody to say, "Aha, you need to go and do this..."
Currently, if I delete 1,000 records from one of my larger tables (134 million rows), it takes nearly 10 minutes to SaveChanges. From what I have read, AutoDetectChangesEnabled is what I need to alter, but it doesn't exist in my classes. Where is it, and what must I do to get it?
Any help appreciated; I am trying to solve this one quickly.
Regards, Julian
Right, I eventually found this item on Stack Overflow, Get DbContext for Entities, which describes what is needed to change your database-first .edmx into a version that exposes .Configuration.AutoDetectChangesEnabled. That was great, and I was able to progress. However, it did not get me the solution I was looking for, as saving deletes still took an inordinate amount of time.
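For anyone else landing here, the wrapping it describes boils down to something like the sketch below, assuming "MyEntities" is the ObjectContext generated from the .edmx (a placeholder name); it needs EF 4.1 or later for the DbContext wrapper:

```csharp
using System.Data.Entity;

// A minimal sketch, assuming "MyEntities" is the .edmx-generated ObjectContext.
public static class ContextWrapperSketch
{
    public static void QueryWithoutAutoDetect()
    {
        using (var objectContext = new MyEntities())
        // The second argument (false) tells the DbContext it does not own, and
        // should not dispose, the wrapped ObjectContext.
        using (var dbContext = new DbContext(objectContext, false))
        {
            dbContext.Configuration.AutoDetectChangesEnabled = false;
            // ... work through dbContext.Set<SomeEntity>() from here on ...
        }
    }
}
```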
So the moral is: yes, apply all of the performance tricks:
pregenerate your views,
set AutoDetectChangesEnabled = false,
use compiled queries,
use smart connection strings,
create fake objects instead of fetching the data first,
etc.
You can probably, in most cases, get acceptable performance, but if you really need to do things quickly you will need to drop down to T-SQL and do it by hand.
Regards, Julian
AutoDetectChanges sits in DbContext.Configuration.AutoDetectChangesEnabled. For deletion, what you can also try is to get the list of IDs you want to delete, create fake objects that have only those IDs set, attach those objects, and then delete them.
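A minimal sketch of that stub-entity delete might look like this; the context and entity types here are placeholders, not the asker's real model:

```csharp
using System.Collections.Generic;
using System.Data.Entity;

// Placeholder model: a stub entity and context standing in for the real ones.
public class Order { public int Id { get; set; } }
public class ShopContext : DbContext { public DbSet<Order> Orders { get; set; } }

public static class StubDeleteSketch
{
    public static void DeleteByIds(IEnumerable<int> idsToDelete)
    {
        using (var context = new ShopContext())
        {
            // Turning change detection off stops Attach/Remove from rescanning
            // every tracked entity on each call.
            context.Configuration.AutoDetectChangesEnabled = false;

            foreach (var id in idsToDelete)
            {
                // Only the key is set - no SELECT is issued to load the row first.
                var stub = new Order { Id = id };
                context.Orders.Attach(stub);
                context.Orders.Remove(stub);
            }

            // Still one DELETE statement per row, so it helps, but it will not
            // match a hand-written set-based DELETE.
            context.SaveChanges();
        }
    }
}
```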
However, we have also recently had a similar problem, and we are currently deleting with ADO.NET (or there is a method on DbContext through which you can push raw SQL). In general EF works great for our app, but in 2-3 places we need performance, as the number of records is huge. Unfortunately we had to use ADO.NET in those places; it's just many times faster when you work with mass data.
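The "method on DbContext where you can push SQL" is Database.ExecuteSqlCommand; a rough sketch, with an invented table and column, might be:

```csharp
using System;
using System.Data.Entity;

// Placeholder context; the table and column in the SQL are invented.
public class LogContext : DbContext { }

public static class RawSqlDeleteSketch
{
    public static int DeleteOldRows(DateTime cutoff)
    {
        using (var context = new LogContext())
        {
            // One set-based DELETE executed on the server; this is usually where
            // the order-of-magnitude win over per-entity deletes comes from.
            return context.Database.ExecuteSqlCommand(
                "DELETE FROM dbo.AuditLog WHERE CreatedOn < {0}", cutoff);
        }
    }
}
```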
I'm building a web app that will have 30-35 tables in one database. The thing is, I want to split the app into 3 different front ends (different teams want different things): 3 different projects.
App1 might use 15-20 tables, App2 might use 10, App3 might use 15.
I was planning on making a project called Models that has a DbContext with all the tables in the database and using that in the web app projects. If I need to add to or update the database, I can just update that one Models project.
A colleague mentioned that you should only include what you need, so I should make 3 separate DbContexts, one for each web project, or there will be a performance hit for including unnecessary tables.
To answer the question in the title: no, I haven't seen any performance hit with extremely large DbContexts. In one project I've worked on, where the DbContext was defined with close to a thousand DbSets, the configuration time (the time taken to perform the calls to OnConfiguring and OnModelCreating) was around 2 seconds, and every single entity was configured through the Fluent API; so you could say that the hit is negligible (if there's one at all) for only 35 entities.
That said, whether you use one DbContext or several depends on how you will use them. If there's a clear separation of data where you can clearly say "this table will only be used here" and you will not end up with repeated DbSets, you could keep them separated.
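For illustration only, "keeping them separated" might look roughly like the sketch below; every context and entity name here is invented, and the shared Customer set shows the kind of repeated DbSet you would want to avoid:

```csharp
using System.Data.Entity;

// Invented entities shared by both sketch contexts.
public class Customer { public int Id { get; set; } }
public class Order    { public int Id { get; set; } }
public class Invoice  { public int Id { get; set; } }

// Two contexts over the same database, each exposing only what its front end needs.
public class OrdersContext : DbContext
{
    public OrdersContext() : base("name=AppDatabase") { }   // shared connection string

    public DbSet<Order> Orders { get; set; }
    public DbSet<Customer> Customers { get; set; }
}

public class ReportingContext : DbContext
{
    public ReportingContext() : base("name=AppDatabase") { }

    public DbSet<Invoice> Invoices { get; set; }
    public DbSet<Customer> Customers { get; set; }   // same table mapped twice -
                                                     // the repetition worth avoiding
}
```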
A colleague mentioned [...] there will be a performance hit for including unnecessary tables
When colleagues say things like that, you tell them to either back such claims with evidence or to shut up. Seriously, there's enough cargo cult programming in the world already. It's the same as colleagues insisting that you use String.Empty because it's faster than "", because they read that on a blog once. Hint: it isn't.
It's very healthy to apply criticism to every claim you hear, especially if that claim is not grounded in any reality whatsoever.
Yes, loading a type with more properties will require more disk I/O and more CPU cycles. This will be extremely negligible though. You will not notice this on the grand scale of things.*
It becomes quite a different story if you're using an EDMX though, as loading and parsing that 5 MB of metadata will literally add seconds to the loading time of your application.*
*: yes, I'm looking for sources for both those claims at the moment.
I think it's not a problem from a performance perspective, but I definitely see a challenge from a maintenance perspective.
I experienced a similar situation where we had one EDMX-based data model shared across different capabilities; however, each capability was focused on just a specific set of tables.
With this, the problem we started facing was that changing any table specific to one capability required us to touch the single shared data model, which also led to unnecessary merge conflicts during check-ins.
I'm using Entity Framework 6 on a SQL Server database to query an existing database (database first, so there's an EDMX in my project).
I've noticed that the first time I request an entity, it can take up to thirty seconds for the query to be executed. Subsequent queries to the same object then get completed in a matter of milliseconds. The actual SQL being executed is very fast so it's not a slow query.
I've found that Entity Framework generates views in the background and that this is the most likely culprit. What I haven't found, however, is a good solution for it. There's a NuGet package that can handle the view generation (EFInteractiveViews), but it hasn't been updated since 2014 and I can hardly find any information on how to use it.
What options do I have nowadays? I've tried initializing Entity Framework in Application_Start by running a few queries, but this doesn't seem to help much. It's also quite difficult to perform the real queries there, because most of them use data from the current user, who is not yet logged on at Application_Start, so they can't easily be run in advance.
I've thought about creating an .ashx file that constantly polls the application by calling the API to keep it alive. I've also set the application pool to "AlwaysRunning" so that EF doesn't restart when the app pool is recycled.
Does anyone have any tips or ideas on how I can resolve this or things I can try?
Thanks a lot in advance. I've spent the better part of two days already searching for a viable solution.
There are many practices to speed up Entity Framework; I will mention some of them:
Turn off lazy loading (open the EDMX file, right-click anywhere => Properties => set Lazy Loading Enabled to false).
Use AsNoTracking().ToList(), and when you want to update, use Attach and set the object's state to EntityState.Modified (see the sketch after this list).
Use indexes on your tables.
Use paging; do not load all the data at once.
Split your EDMX into several smaller ones and include only the entities you need on your page (this will affect performance in a good way).
If you want to load related objects, "be eager and not lazy": use Include. You may need a using for System.Data.Entity to get the lambda Include overloads.
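Here is the sketch mentioned in tip 2; MyDbContext and Person are placeholder types standing in for your own context and entity:

```csharp
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Placeholder model for the sketch.
public class Person { public int Id { get; set; } public string Name { get; set; } }
public class MyDbContext : DbContext { public DbSet<Person> People { get; set; } }

public static class NoTrackingSketch
{
    // Read-only query: AsNoTracking skips change tracking, which is noticeably
    // cheaper for large result sets.
    public static List<Person> LoadPeople()
    {
        using (var context = new MyDbContext())
        {
            return context.People.AsNoTracking().ToList();
        }
    }

    // Update a detached object: attach it and mark it Modified so SaveChanges
    // issues an UPDATE without re-reading the row first.
    public static void SaveEdited(Person edited)
    {
        using (var context = new MyDbContext())
        {
            context.People.Attach(edited);
            context.Entry(edited).State = EntityState.Modified;
            context.SaveChanges();
        }
    }
}
```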
Example of splitting your EDMX:
If you have the following objects for a rent-a-car app: Country, City, Person, Car, Rent, Gender, Engine, Manufacturer, etc.
Now:
If you are working on a screen to manage (CRUD) Person, you don't need Car, Rent or Manufacturer, so create ManagePerson.edmx containing (Country, City, Person, Gender).
If you are working on managing (CRUD) Car, you don't need (Person, City, Gender, Rent), so create ManageCar.edmx containing (Car, Manufacturer, Country, Engine).
Entity Framework must first compile and translate your LINQ queries into SQL, but after this it caches them. The first hit to a query is always going to take a long time; as you mention, after that the query runs very quickly.
When I first used EF it was constantly an issue brought up by testers, but when the system went live and was used frequently (and queries were cached) it wasn't an issue.
See Hadi Hassan's answer for general speed-up tips.
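If you want to pay that first-hit cost before a real user arrives, a rough warm-up at Application_Start can look like the sketch below; "MyEntities" and "Countries" are placeholders for your generated context and some small table, and the query deliberately does not depend on the logged-on user:

```csharp
using System.Linq;

// A rough warm-up sketch for Application_Start; context and set names are placeholders.
public static class EfWarmUp
{
    public static void Run()
    {
        using (var context = new MyEntities())
        {
            // Forces model/metadata initialization without touching user data.
            context.Database.Initialize(force: false);

            // Any trivial query will do; the result is thrown away. It just pays
            // the first-query cost before a real user hits the site.
            var warmed = context.Countries.Any();
        }
    }
}
```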
I'm using .NET 4.5.1 with EF 6.0.2 and db-first.
The use case is something like this:
Roughly 50k entities are loaded
A set of these entities are displayed for the user, others are required for displaying the items correctly
The user may perform heavy actions on the entities, meaning the user chooses to perform one action which cascades to actually affect potentially hundreds of entities.
The changes are saved back to database.
The question, then, is what is the best way to handle this? So far I've come up with 2 different solutions, but don't really like either:
Create a DbContext at step 1, keep it around during the whole process, and finally save the changes. The reason I don't necessarily like this is that the process might take hours, and as far as I know, DbContexts should not be kept alive that long.
Create a DbContext at step 1 and discard it right after. At step 4, create a new DbContext, attach the modified entities to it and save changes. The big problem I see with this approach is: how do I figure out which entities have actually been changed? Do I need to build a change tracker of my own to be able to do this?
So is there a better alternative for handling this, or should I use one of the solutions above (perhaps with some changes)?
I would go with option number 1 - use a DbContext for the entire process.
The problem I have is with the assertion that the process might take hours. I don't think this is something you want to do. Imagine what happens when your user has been editing the data for 3 hours and then faces a power blackout before clicking the final save. You'll have users running after you with pitchforks.
You're also facing a lot of concurrency issues - what if two users perform the same lengthy process at once? Handling collisions after a few hours of work is going to be a problem, especially if you tell users changes they've made hours ago can't be saved. Pitchforks again.
So I think you should go with a third option: save incremental changes during the editing process, so the user's work isn't lost if something bad happens, and so that you can handle collisions if two users are updating the data at the same time.
You would probably want to keep the incremental changes in a separate place, not your main tables, because the business change hasn't been finalized yet.
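To make the idea concrete, here is a very rough sketch of saving incremental changes to a separate staging table with short-lived contexts; every type and property here (StagingContext, PendingEdit, the serialized payload) is hypothetical and only shows the shape of the approach:

```csharp
using System;
using System.Data.Entity;

// Hypothetical staging entity: one row per edit the user makes.
public class PendingEdit
{
    public int Id { get; set; }
    public Guid SessionId { get; set; }     // groups all edits of one long-running session
    public string EntityName { get; set; }  // which entity was touched
    public int EntityId { get; set; }
    public string PayloadJson { get; set; } // the changed values, serialized
    public DateTime SavedAtUtc { get; set; }
}

public class StagingContext : DbContext
{
    public DbSet<PendingEdit> PendingEdits { get; set; }
}

public static class IncrementalSave
{
    public static void Record(Guid sessionId, string entityName, int entityId, string payloadJson)
    {
        // A short-lived context per edit: nothing is lost if the power goes out later.
        using (var context = new StagingContext())
        {
            context.PendingEdits.Add(new PendingEdit
            {
                SessionId = sessionId,
                EntityName = entityName,
                EntityId = entityId,
                PayloadJson = payloadJson,
                SavedAtUtc = DateTime.UtcNow
            });
            context.SaveChanges();
        }
        // When the user finally commits, replay the session's PendingEdits against
        // the real tables in one transaction, handling collisions at that point.
    }
}
```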
and as far as I know, DbContexts should not be preserved for this long.
Huh?
There is nothing about a DbContext that says it cannot be preserved. You may get problems with other people having already edited the item, but that is an inherent architectural problem; generally, neither optimistic nor pessimistic locking is advisable in a "multi-hour edit marathon".
The only sensible approach if you have editing spanning hours is to use your own change tracker and apply proper logic when changes collide, and/or use a logical locking mechanism (a flag in the database).
I am developing an application with Fluent NHibernate / NHibernate 3 / SQLite. I have run into a very specific problem with which I need help.
I have a product database and a batch database. Products number around 100k, but batches are at around the 11-million mark as of now. When given a product, I need to fill a combobox with its batches. As I do not want to load all the batches at once because of memory constraints, I load them directly from the database when the product is selected. The problem is that SQLite (or maybe the combination of SQLite and NHibernate) is a little slow for this: it normally takes 3+ seconds to retrieve the batches for a particular product. Although that might not seem like a slow scenario, I want to know whether I can improve this time. I need sub-second results to make order entry a smooth experience.
The details:
New products and batches are imported periodically (bi-monthly).
Nothing in the already persisted products or batches ever changes (no updates).
Storing products is not an issue. Batches are the main culprit.
Product IDs are longs.
Batch IDs are strings.
Batches contain 3 fields: rate, mrp (both decimal) and expiry (DateTime).
The requirements:
The data has to be stored in a file based solution. I cannot use a client-server approach.
Storage time is not important. Search & retrieval time is.
I am open to storing the batch database using any other persistence model.
I am open to using anything like Lucene, a NoSQL database (like Redis), or an OODB, provided it is based on a single-storage-file implementation.
Please suggest what I can use for fast object retrieval.
Thanks.
You need to profile, or at least narrow things down, to find out where those 3+ seconds are spent.
Is it the database fetching?
Try running the same queries in a SQLite browser. Do the queries take 3+ seconds there too? Then you might need to do something with the database, like adding some good indexes.
Is it the filling of the combobox?
What if you only fill the first value in the combobox and throw away the others? Does that speed things up? If so, you might try BeginUpdate and EndUpdate (there's a small sketch at the end of this answer).
Are the 3+ seconds spent elsewhere? If so, find out where.
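For the combobox point above, a tiny sketch (assuming a WinForms ComboBox) of the BeginUpdate/EndUpdate idea:

```csharp
using System.Windows.Forms;

// Suspend redrawing while the items are added, so thousands of Add calls
// don't each trigger a repaint.
public static class ComboBoxFill
{
    public static void Fill(ComboBox comboBox, object[] batches)
    {
        comboBox.BeginUpdate();
        try
        {
            comboBox.Items.Clear();
            comboBox.Items.AddRange(batches);
        }
        finally
        {
            comboBox.EndUpdate();
        }
    }
}
```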
This may seem like a silly question, but I figured I'd double-check before proceeding to alternatives or other optimizations: is there an index (or, hopefully, a primary key) on the Batch Id column in your Batch table? Without indexes, those kinds of searches will be painfully slow.
For fast object retrieval, a key/value store is definitely a viable alternative. I'm not sure I would necessarily recommend Redis in this situation, since your batches database may be a little too large to fit into memory; although Redis also persists to disk, it's generally better suited to a dataset that fits entirely into memory.
My personal favourite would be MongoDB, but overall the best thing to do would be to take your batches data, load it into a couple of different NoSQL databases, see what kind of read performance you're getting, and pick the one that suits the data best. Mongo is quite fast and easy to work with, and you could probably ditch the NHibernate layer for such a simple data structure.
There is a daemon that needs to run locally, but depending on the size of the database it will be a single file (or a few files if it has to allocate more space). Again, ensure there is an index on your batch id column to ensure quick lookups.
3 seconds to load ~100 records from the database? That is slow. You should examine the generated SQL and create an index that will improve the query's performance.
In particular, the ProductId column in the Batches table should be indexed.
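To make that concrete, here is a rough NHibernate sketch; the entity mapping, table and column names are guesses based on the fields described in the question, not the real model:

```csharp
using System;
using System.Collections.Generic;
using NHibernate;

// Assumed shape of the Batch entity (string Id, long ProductId, per the question).
public class Batch
{
    public virtual string Id { get; set; }
    public virtual long ProductId { get; set; }
    public virtual decimal Rate { get; set; }
    public virtual decimal Mrp { get; set; }
    public virtual DateTime Expiry { get; set; }
}

public static class BatchLookup
{
    // Run once after each bi-monthly import; product lookups then use the index.
    public static void EnsureProductIdIndex(ISession session)
    {
        session.CreateSQLQuery(
                "CREATE INDEX IF NOT EXISTS IX_Batch_ProductId ON Batch (ProductId)")
            .ExecuteUpdate();
    }

    // Fetch only the batches for the selected product, nothing else.
    public static IList<Batch> BatchesForProduct(ISession session, long productId)
    {
        return session.QueryOver<Batch>()
            .Where(b => b.ProductId == productId)
            .List();
    }
}
```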
I am working on a sometimes-connected CRUD application that will be primarily used by teams (2-4) of social workers and nurses to track patient information in the form of a plan. The application is a re-envisioning of an ASP.NET app that was created before my time. There are approximately 200 tables across 4 databases. The web app version relied heavily on stored procedures, but since this version is a WinForms app that will be pointing to a local database, I see no reason to continue with them. Also of note, I had planned to use merge replication to handle the syncing portion, and there seem to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally planned to use LINQ to SQL, but I have read tidbits stating it doesn't work in a sometimes-connected setting. I have therefore been trying to read about and experiment with numerous solutions: SubSonic, NHibernate, Entity Framework. This is a relatively simple application, and due to a "looming" version 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What I am asking here is for anyone with experience using any of these technologies (or one I didn't list) to lend me your hard-earned wisdom. What, in your opinion, is the best approach for me to pursue? Any other insights on creating this kind of app? I am really struggling with the DAL portion of this program.
Thank you!
If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one-sided updates. It can't magically determine that your new record and my new record are actually the same, nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that claim to be new but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key being entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar (a rough sketch of what such a weighted match might look like follows these suggestions).
The user interface for resolving conflicts and matching up "new" records with the originals needs to be easy to operate. I use something that looks similar to the classic three-way merge that many source control systems use: Record A, Record B, Merged Record. Users can default the Merged Record to A or B by clicking a header button, and can select each field by clicking it as well. Finally, the Merged Record's fields are open for editing, because sometimes you need to take parts of the address (say) from both A and B.
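For illustration, a weighted match score could look something like the sketch below; the fields and weights are entirely made up and would need tuning against your real data:

```csharp
using System;

// Hypothetical record shape for scoring probable duplicates.
public class PatientRecord
{
    public string LastName { get; set; }
    public DateTime? DateOfBirth { get; set; }
    public string Postcode { get; set; }
    public string PhoneNumber { get; set; }
}

public static class RecordMatcher
{
    // Returns a score between 0 and 1; above some threshold (say 0.8) the pair
    // is shown to the user as a probable duplicate rather than auto-merged.
    public static double MatchScore(PatientRecord a, PatientRecord b)
    {
        double score = 0;
        if (string.Equals(a.LastName, b.LastName, StringComparison.OrdinalIgnoreCase))
            score += 0.4;
        if (a.DateOfBirth.HasValue && a.DateOfBirth == b.DateOfBirth)
            score += 0.3;
        if (string.Equals(a.Postcode, b.Postcode, StringComparison.OrdinalIgnoreCase))
            score += 0.2;
        if (!string.IsNullOrEmpty(a.PhoneNumber) && a.PhoneNumber == b.PhoneNumber)
            score += 0.1;
        return score;
    }
}
```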
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.
If you can install a database system locally, go for something you feel familiar with. The greatest problem, I think, will be the syncing and merging part. You must think through several possibilities, for example: you changed something that someone else deleted on the server. Who decides?
I've never used the Sync Framework myself, just read an article, but it may give you a solid foundation to build on. Whichever way you go with data access, the solution for the business logic will probably have a much wider impact...
There is a sample app called IssueVision that Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
I found the link in an old thread on joelonsoftware.com: http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple of 3G cellular cards would work tomorrow, and your app would need no changes, barring large pages/graphics.
Or an Excel spreadsheet used in the field, with DTS or SSIS to import the data into the application, while a "better" solution is created.
Good luck!
If by SPs you mean stored procedures, I'm not sure I understand your reasoning for trying to move away from them, considering that they're fast, proven, and already written for you (i.e. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible, not the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.