Entity Framework can't keep up with current amount of traffic - c#

I have an ASP.NET project running on a web server that receives random amounts of traffic and needs to write this information to a SQL database as soon as it arrives. It needs to handle up to 2000-3000 messages a second at times, and other times just a few a second.
The programmers above me are set on using Entity Framework for the safety it provides, but I can't keep up with the surge of messages, as they need to hit the database fast and can't be queued. The best I've gotten is about 1200 messages a second with Entity Framework, calling SaveChanges after each request, which I suspect is not how Entity Framework should be used. I know bulk insert is far more effective, but it isn't an option because the requirements given to me don't allow holding on to the messages. If I do a direct SQL insert I can keep up with the message load, but my management says no because of the loss of type safety.
Any suggestions on how I can make Entity Framework keep up with the load, or any other frameworks that provide the safety and backing Entity Framework has, that I could bring to management? I've heard Dapper is the other good contender, but I have no experience with it to justify it for an enterprise solution.
I've tried researching all the Microsoft documentation on Entity Framework as well as the entityframework.net documentation. I tried setting AutoDetectChangesEnabled to false. Everything I read just points to bulk insert. I've also tried stripping out other tables and using a staging table to see if I could make it faster.
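For context, a minimal sketch of the per-message insert pattern described above (EF6-style; MessagingContext, Messages and IncomingMessage are hypothetical names). Each message gets its own short-lived context so the change tracker never grows, with change detection and validation turned off since only one entity is ever added:

public async Task SaveMessageAsync(IncomingMessage msg)
{
    using (var db = new MessagingContext())
    {
        db.Configuration.AutoDetectChangesEnabled = false; // only one entity, so skip change scans
        db.Configuration.ValidateOnSaveEnabled = false;    // skip per-entity validation
        db.Messages.Add(msg);
        await db.SaveChangesAsync();                       // still one round trip per message
    }
}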

Related

Potential conflict between transferring data and '.ValueGeneratedOnAdd()'

I apologize if this is duplicative; I could find nothing directly pertaining.
The difficulty involves EF Core (v 3.1.8, if it matters), but is not specific or restricted thereto. I am doing code first, creating a number of entities, but the key point is that I am getting my initial data set from an app that I am trying to replace. My new app has a number of structural differences in every corresponding entity, but the data in the old app is still critical, so I will be transferring it to my new database. (Old db is hosted by MS SQL 2008; new db is hosted by MS SQL 2019, if it matters).
Most of the key fields are GUIDs, and the problem is that in EF Core, at the point in the future when I want to use the new app to do more data entry, I will also want the database to choose the GUID. In EF Core Fluent API parlance, that would be, for example:
modelBuilder.Entity("ReplaceOldApp.Models.Address", b =>
{
    b.Property<Guid>("AddressID")
        .ValueGeneratedOnAdd()
        .HasColumnType("uniqueidentifier");
});
However, if I inform EF Core that I want the database to create the key, then it will create the tables such that when I try to transfer the data from the old database (whether using EF or some other means), the new database will ignore the old GUID and create a new, unrelated one. (Or at least, that's what I think will happen. I'm not ready to try it yet.) If that happens, then all of the data from, say, the old Person entity will no longer be related to its corresponding entities (such as the Address entity implied above) in the new database, because all records will have shiny new GUIDs. I will have all the information, and no way to actually use it.
Obviously I can tell EF Core to inform the database that it will not be creating the GUIDs, and I can then read, unmunge and transfer the data from the old database to the new without fear of data loss (God willing). But then going forward, for any new data entry, the GUIDs will not be automatically genned. I can of course then mod my IEntityTypeConfiguration Fluent API classes for the various entities and do a second migration, re-genning the affected tables, but I'm worried that EF Core will decide that it needs to DROP the tables to accommodate such a change. (Again, I do not know for sure because I have not tried it: sorry.)
So my question is: How would you approach such a situation? Should I ignore EF and do something clever with MS SQL Studio? Should I do two migrations with a transfer in-between? Should I tell the database, even though it has been told to gen the keys, somehow to accept the old keys without changing things, perhaps via LINQ?
============== Edit:
I'm sure SSIS would work to transfer the data from the old to the new database, but the learning curve appears daunting, and I am only trying to solve one problem, not gain a new career. PowerShell ditto, although it may be a bit more of a hacker's tool, and as such, knowledge of it might help with tweaking or with solving a diverse set of one-time SQL Server headaches. However, again, as would you, I prefer to use what I know, or failing that, to learn (or learn more about) a tool which promises to serve me consistently into the future.
With the very welcome new (to me) information about IDENTITY_INSERT, and information gained from Linq To Sql and identity_insert, I believe I should not use LINQ to SQL because it may assume that IDENTITY_INSERT is OFF and simply filter out the crucial GUID, failing therefore to provide it to the target server. Rather, it seems I can use C# to produce a series of generated SQL statements, and then run each one on the target server inside a TransactionScope(). Because each such insert will thereby run 'in the same connection', the state of IDENTITY_INSERT will be preserved for that entire insert transaction, and (creek don't rise) it should work.
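In rough outline, the approach I have in mind looks like this (a sketch only, assuming plain ADO.NET with System.Transactions and System.Data.SqlClient; the dbo.Address table and the generatedInsertStatements collection are placeholders). Everything runs on one open connection inside one TransactionScope, so the IDENTITY_INSERT setting applies to every generated statement:

using (var scope = new TransactionScope())
using (var conn = new SqlConnection(targetConnectionString))
{
    conn.Open();                                            // one connection for the whole load
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SET IDENTITY_INSERT dbo.Address ON";
        cmd.ExecuteNonQuery();

        foreach (var sql in generatedInsertStatements)      // INSERTs that carry the original key values
        {
            cmd.CommandText = sql;
            cmd.ExecuteNonQuery();
        }

        cmd.CommandText = "SET IDENTITY_INSERT dbo.Address OFF";
        cmd.ExecuteNonQuery();
    }
    scope.Complete();                                       // commit all inserts, or none on failure
}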
Again, I appreciate your answer, Randy in Marin. It has, it seems, led me to an approach that will work within the potential constraints of my context (EF Core), while allowing me to preserve the crucial existing IDENTITY information. Peace.
Not being an EF programmer, I don't know if there is an option for identity insert that you can enable for a migration. You might search the term to see if it comes up.
Our team supports database migrations. We can do it a number of ways. I would not even consider EF because it's not designed for data migrations - or for database design. (And because we tend to use what we know.)
This is not the way I would do it, but it might be better than SSIS if you have not used SSIS. If the tables are in the same database or in databases on the same server, you can use T-SQL to load each table one at a time. Even if not on the same server, a linked server would allow a distributed transaction. (I avoid linked servers like the plague, but for a one time thing like a migration I would tolerate it. I would rather restore a copy of the source database to the destination server to use as a source. Distributed transactions gone wrong have forced me to reboot critical servers.)
Each table can have a 4 part name. If the server part (e.g., using a linked server name) is not present, the local instance is used. If the database part is not present, the current database is used. This is the format I assume for the "src_table" and "dst_table".
[myserver\myinstance].[mydatabase].[myschema].[mytable]
Each table is loaded with T-SQL as follows:
TRUNCATE TABLE dst_table
SET IDENTITY_INSERT dst_table ON
INSERT dst_table (...) SELECT ... FROM src_table
SET IDENTITY_INSERT dst_table OFF -- must be turned off - only 1 table can have this ON
If there are foreign keys, some tables (e.g., lookup/definition tables) would need to be loaded first.
If the table does not have an IDENTITY column (EF code creates all values), you don't use the IDENTITY_INSERT stuff. It will fail if you use it and there is not an identity column. It will fail if you don't use it and try to insert into an identity column.
If there is a lot of data in a table, the transaction might be too big or slow. Inserting in batches might be called for.
If it were something to run on a schedule, I would likely create an SSIS package to do the load.
If I wanted to try something new, I would use PowerShell and the DBATools module cmdlets to see if extracting to CSV and importing the CSV would be efficient. The import cmdlet has a column mapping parameter, among many others. PowerShell could be used to do the transformation, but I think this crosses over into SSIS territory.
I have dealt with migrations where the GUIDs and IDs no longer related after the move. Using queries joining the new data to the old data, we were able to fix the related values. It's likely more work to fix it after than to plan for it to be correct from the start.

Entity Framework takes about 30 seconds on first query

I'm using Entity Framework 6 on a SQL Server database to query an existing database (database first, so there's an EDMX in my project).
I've noticed that the first time I request an entity, it can take up to thirty seconds for the query to be executed. Subsequent queries to the same object then get completed in a matter of milliseconds. The actual SQL being executed is very fast so it's not a slow query.
I've found that Entity Framework generates views on the background and that this is the most likely culprit. What I haven't found, however, is a good solution for this. There's a NuGet package that can handle the View Generation (EFInteractiveViews), but it hasn't been updated since 2014 and I hardly seem to find any information on how to use it.
What options do I have nowadays? I've tried initializing Entity Framework in Application_Start by doing a few queries, but this doesn't seem to help much. It's also quite difficult to perform the real queries in Application_Start, because most queries use data from the current user, who is not yet logged on at that point, so it's hard to run them in advance.
I've thought about creating an ashx file that constantly polls the application by calling the API to keep it alive. I've also set the Application Pool to "AlwaysRunning" so that EF doesn't restart when the app pool is recycled.
Does anyone have any tips or ideas on how I can resolve this or things I can try?
Thanks a lot in advance. I've spent the better part of two days already searching for a viable solution.
There are many practices to speed up Entity Framework; I will mention some of them:
Turn off lazy loading (EDMX => open the file, right-click anywhere => Properties => set Lazy Loading Enabled to false)
Use AsNoTracking().ToList(), and when you want to update, use Attach and set the object state to EntityState.Modified (see the sketch after this list)
Use Indexes on your table
Use Paging, do not load all the data at once
Split your EDMX into several smaller ones and only include the entities you need on your page (this will improve performance)
If you want to load related objects, "be eager and not lazy": use Include. You might need a using System.Data.Entity directive to get the lambda Include overloads
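A quick sketch of the no-tracking read plus attach-and-update pattern from the list above (EF6-style; the People set, Person entity and IsActive property are made-up names):

using System.Data.Entity;               // for AsNoTracking and the lambda Include overloads

// Read without change tracking
var people = context.People
    .AsNoTracking()
    .Where(p => p.IsActive)
    .ToList();

// Later, update a detached entity without re-querying it
context.People.Attach(person);
context.Entry(person).State = EntityState.Modified;
context.SaveChanges();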
Example of splitting your EDMX:
If you have the following objects for a rent-a-car app: Country, City, Person, Car, Rent, Gender, Engine, Manufacturer, etc.
Now
If you are working on a screen to manage (CRUD) Person, you don't need Car, Rent or Manufacturer, so create a ManagePerson.edmx containing (Country, City, Person, Gender)
If you are working on managing (CRUD) Car, you don't need (Person, City, Gender, Rent), so you can create a ManageCar.edmx containing (Car, Manufacturer, Country, Engine)
Entity Framework must first compile and translate your LINQ queries into SQL, but after this it caches them. The first hit to a query is always going to take a long time, but, as you mention, after that the query will run very quickly.
When I first used EF it was constantly an issue brought up by testers, but when the system went live and was used frequently (and queries were cached) it wasn't an issue.
See Hadi Hassan's answer for general speed-up tips.
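If a warm-up helps in your case, a minimal sketch would be to touch the model once in Application_Start with a throwaway query that doesn't depend on the current user (MyEntities and SomeLookupTable are placeholder names; this assumes a DbContext-based EDMX context):

protected void Application_Start()
{
    // ... existing startup code ...

    // Pay the one-time model/view compilation cost before the first real request.
    using (var ctx = new MyEntities())
    {
        ctx.Database.Connection.Open();            // also pays the initial login cost
        ctx.SomeLookupTable.FirstOrDefault();      // any trivial query forces view generation
    }
}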

Entity Framework VS pure Ado.Net

EF is such widely used stuff, but I don't really understand how I should use it. I have met a lot of issues with EF on different projects with different approaches, so some questions have come together in my head, and the answers lead me to use pure ADO.NET with stored procedures.
So the questions are:
How to deal with EF in n-tier application?
For example, we have some DAL with EF. I saw a lot of articles and projects that used the repository and unit of work patterns as some kind of abstraction over EF. I think such an approach kills most of the benefits that increase development speed and leads to a few things:
remapping EF query results to some DTO, which kills performance (call some select to get table data - first loop; second loop - map the results to some composite type generated by EF; next, filter the mapped data using LINQ; and at last, map it to some DTO). This remapping to DTOs kills one of EF's biggest benefits;
or
strong coupling between EF (and its version) and the app. It becomes something like a 2-tier app: DAL and presentation with BLL, or DAL with BLL and presentation. I guess that's not best practice. And the same loading process applies as in the previous point, except for the mapping, so again a performance issue is raised. We could try to use EF as the DAL without any abstraction over it, but we will get similar issues in some other way.
Should I use one context per app/thread/atomic operation? The one-context-per-app/thread approach may slightly increase performance and make navigation properties easier to use, but we meet other problems: keeping that context up to date and the ever-growing amount of data loaded into it. I'm also not sure about concurrency with one DbContext per app/thread. Using a context per operation leads us back to remapping EF results to our DTOs, so you see that we are again pushed back to question no. 1.
Could we try to use EF + stored procedures only? Again we have the issues from the previous questions. What is the reason to use EF if the biggest part of its functionality will not be used?
So, yes, EF is great to start a project. It is so convenient when we have a few screens and CRUD operations.
But what next?
All this text is just unsorted thoughts. I know that pure ADO.NET will lead to other kinds of challenges.
So, what is your opinion about this topic?
By following the naming conventions, you will find it's called ADO.NET Entity Framework, which means that Entity Framework sits on top of ADO.NET, so it can't be faster. At best they perform in equal time, but let's look at what EF provides:
You no longer get stuck writing queries without any clue whether what you're writing is going to compile or not.
It lets you rely on C# (or your favorite .NET language) to write the data constraints you wish to enforce on user input, directly inside your model classes.
Finally: EF and LINQ give you a lot of power in maintaining your applications later.
There are three different models with the Entity Framework: Model First, Database First and Code First; get to know each of them.
As for the point about remapping killing performance: it's because on the first run EF loads metadata into memory, and that takes time as it builds an in-memory representation of the model from the EDMX file.
ADO.NET is an object-oriented framework that allows you to interact with database systems (SQL Server, Oracle, etc.).
Entity Framework is a technique for manipulating data in databases that replaces the usual collection of hand-written queries (INSERT into a table, SELECT * FROM, and the like).
It is used with LINQ.
Entity Framework is not efficient in any case, as with most tools or toolboxes designed to achieve 'faster' results.
Access to the database should be viewed as a separate tier, with stored procedures as the interface. There is no reason for any application to have more than the absolutely required CRUD operations; less is more. Stored procedures are easy to write, secure and maintain, and are de facto the fastest way. It's easy to write tools to generate the desired POCO and DbContext code from stored procedures.
A well-designed application should have a limited number of connection strings to the database, and none of them should be the almighty god account. Use schemas to support connection rights.
Lazy loading is a false promise added to solve a problem that should never exist, introduced with ORMs and their plug-and-play features. Data should only be read when needed, and developers should be responsible for implementing that logic based on the application context.
If your application logic has a problem maintaining state, no tool will help. It will, in fact, make it worse by covering up the real problem until it's too late.
Database first is the only solution for a well-designed application. Civilization realized a long time ago the importance of solid aqueducts and sewer systems. High-level code can and will be replaced at any time, but the data stays. Rewriting an entire application is a matter of days if the database is well designed.
Applications are just glorified database access. Still true in most cases.
This is my conclusion after many years of debugging business applications through code produced by many different tools and toolboxes. The faster results advertised do not come close to covering the amount of time and energy wasted later trying to clean up the mess. Performance issues are rarely, if ever, caused by high demand; they are the sum of all the 'features' added through unsuitable tools.
ADO.NET provides consistent access to data sources such as SQL Server and XML, and to data sources exposed through OLE DB and ODBC. Data-sharing consumer applications can use ADO.NET to connect to these data sources and retrieve, handle, and update the data that they contain.
Entity Framework 6 (EF6) is a tried and tested object-relational mapper (O/RM) for .NET with many years of feature development and stabilization. An ORM like EF has the following advantages:
An ORM lets developers focus on the business logic of the application, thereby facilitating a huge reduction in code.
It eliminates the need for repetitive SQL code and provides many benefits to development speed.
It prevents you writing manual SQL queries; and many more.
In an n-tier application, it depends on the amount of data your application is handling and your database is managing. To my knowledge, DTOs don't kill performance. They are data containers for moving data between layers, are only used to pass data, and do not contain any business logic. They are mostly used in service classes. See DTO.
One DBContext is always a best practice.
There is no such combination of EF + SPs (stored procedures) as far as I know. If you wish to use an ORM and stored procedures at the same time, try micro-ORMs like Dapper, BLToolkit, etc. They were built for that purpose and are a heck of a lot faster than EF. Here is a good article on Dapper ORM.
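For illustration, calling a stored procedure through Dapper looks roughly like this (the Order type, the sp_GetOrdersByCustomer procedure and connectionString are made-up; assumes using Dapper, System.Data and System.Data.SqlClient):

using (var conn = new SqlConnection(connectionString))
{
    var orders = conn.Query<Order>(
        "sp_GetOrdersByCustomer",                  // stored procedure name
        new { CustomerId = 42 },                   // parameters mapped by name
        commandType: CommandType.StoredProcedure)
        .ToList();
}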
Here is a related thread on a similar topic: What is the difference between an orm and ADO.net?

Poor performance for POCO Generator

I am working on a legacy application, and we have poor performance with Entity Framework (4.0.0) and massive inserts.
When I tried the POCO Generator (T4), the issue got worse: SaveChanges took three times longer. This is huge; if you have any idea why I have this issue, I am interested.
I don't have any performance metrics for the different generators, but the bottleneck should not be in your context anyway. You should know that EF will generate one SQL statement per insert, update and delete, and if you didn't explicitly open the connection first, it will log on to and log off from SQL Server once per SQL statement.
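As a rough sketch of what "open the connection first" means for an EF 4.x ObjectContext-based model (MyEntities, MyEntitySet and rowsToInsert are placeholder names):

using (var ctx = new MyEntities())
{
    ctx.Connection.Open();                 // one login for the whole batch instead of one per statement
    foreach (var row in rowsToInsert)
        ctx.MyEntitySet.AddObject(row);
    ctx.SaveChanges();                     // still one INSERT per entity, but on the already-open connection
}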
Also, the context must maintain states and relationships, so performance degrades as your context gets larger and larger. SaveChanges must first figure out what has happened in the context, which is probably why POCO Generator vs Entity Objects end up with different execution times. As for it being 3 times longer, more details will be needed to figure that out.
PS: if you are stuck with the legacy code, you should look into using bulk copy alongside EF.
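A minimal SqlBulkCopy sketch alongside EF might look like this (dbo.Messages, its columns and the messages collection are hypothetical; assumes System.Data and System.Data.SqlClient):

var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("CreatedOn", typeof(DateTime));
foreach (var m in messages)
    table.Rows.Add(m.Name, m.CreatedOn);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Messages";
    bulk.WriteToServer(table);             // one bulk operation instead of one INSERT per row
}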

Best approach to incremently update application data

I have been working for a couple of years on an application that I update using a back-end database. The whole key is that everything is cached on the client, so that it never requires a network connection to operate, but when it does have a connection it always picks up the latest updates. Every application update ships with the latest version of the database, and I want it to download only the minimum amount of data when the database has been updated.
I currently use a table with a timestamp to check for updates. It looks something like this.
ID - Name - Description - Severity - LastUpdated
0 - test.exe - KnownVirus - Critical - 2009-09-11 13:38
1 - test2.exe - Firewall - None - 2009-09-12 14:38
This approach was fine for what I previously needed, but I am looking to expand more functions of the application to use this type of dynamic approach. All the data is currently stored as XML, but I do not want to store complete XML files in the database, only transmit changed data.
So how would you go about allowing a fairly simple approach to storing dynamic content (text/XML/JSON/XAML) in a database and having the client only download new updates? I was thinking of having logic that can handle XML inserted directly:
ID - Data - Revision
15 - XXX - 15
XXX would be something like <Content><File>Test.dll</File><Description>New DLL to load.</Description></Content> and would be inserted into the cache, but this would obviously be complicated, as I would need to load them in sequence.
Another approach that has been mentioned was to base it on something similar to source control: store the version in the root of the file and calculate the delta to figure out the minimal amount of data that needs to be sent to the client.
Anyone got any suggestions on how to approach this with no risk of data corruption? I would also like to expand with features that allow me to revert possibly bad revisions and replace them with new working ones.
It really depends on the tools you are using and the architecture you already have. Is there already a server with some logic and a data access layer?
Dynamic approaches might get complicated, slow and limit the number of solutions. Why do you need a dynamic structure? Would it be feasible to just add data by using a name-value pair approach in a relational database? Static and uniform data structures are much easier to handle.
Before going into detail, you should consider the different scenarios.
Items can be added
Items can be changed
Items can be removed (I assume)
Adding is not a big problem. The client needs to remember the last revision number it got from the server, and you write a query which gets everything since then.
Changing is basically the same. You should take care with the identification of items: you need an unchangeable surrogate key, which seems to be the ID you already have. (GUIDs may be useful here.)
Removing is tricky. You need to either flag items as deleted instead of actually removing them, or keep a list of removed IDs with the revision number at which they were removed.
Storing the data in the client: consider using a relational database like SQLite in the client. (It doesn't need installation; it just stores to a file. Firefox, for instance, stores quite a lot in SQLite databases.) When using the same database engine on the server, you can probably reuse some code. It is also transaction based, which helps keep things consistent (rollback in case of an error during synchronization).
XML - if you really need it - can be stored just as a string in the database.
When using an abstraction layer or ORM that supports SQLite (eg. NHibernate), you may also reuse some code even when there is another database used by the server. Note that the learning curve for such an ORM might be rather steep. If you don't know anything like this, it could be too much.
You don't need to force reuse of code in the client and server.
Synchronization itself shouldn't be very complicated. You have a revision number in the client and a last revision on the server. You get all new, changed and deleted items since then in the client and apply them to the local store. Update the local revision number. Commit. Done.
I would never update only a part of a revision, because then you can't really know what changed since the last synchronization. Because you do differential updates, it is essential to have a well-defined state of the client.
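A minimal sketch of the pull step under these assumptions (the ContentItems table, its columns and the LoadLocalRevision/ApplyToLocalStore/SaveLocalRevision helpers are all hypothetical): the client sends its last known revision and the server returns everything newer, including tombstones for deleted rows.

long lastRevision = LoadLocalRevision();     // last revision stored in the client database

using (var conn = new SqlConnection(serverConnectionString))
{
    conn.Open();
    var cmd = new SqlCommand(
        "SELECT ID, Data, Revision, IsDeleted FROM ContentItems " +
        "WHERE Revision > @rev ORDER BY Revision", conn);
    cmd.Parameters.AddWithValue("@rev", lastRevision);

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            ApplyToLocalStore(reader);       // insert/update the row, or delete it if IsDeleted is set
            lastRevision = reader.GetInt64(2);
        }
    }
}

SaveLocalRevision(lastRevision);             // persist only after all rows were applied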
I would go with a solution using Sync Framework.
Quote from Microsoft:
Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline access for applications, services and devices. Developers can build synchronization ecosystems that integrate any application, any data from any store, using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data sources to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.
I have just built an application pretty much exactly as you described. I built it on top of the Microsoft Sync Framework that DjSol mentioned.
I use a C# front-end application with a SqlCe database, and a SQL Server 2005 database at the other end.
The following articles were extremely useful for me:
Tutorial: Synchronizing SQL Server and SQL Server Compact
Walkthrough: Creating a Sync service
Step by step N-tier configuration of Sync services for ADO.NET 2.0
How to Sync schema changed database using sync framework?
You don't say what your back-end database is, but if it's SQL Server you can use SqlCE (SQL Server Compact Edition) as the client DB and then use RDA or merge replication to update the client DB as desired. This will handle all your requirements for sure; there is no need to reinvent the wheel for such a common requirement.
