I am designing a new set of projects including a WCF service that must handle as many as 50 requests per minute.
This will be a Microsoft stack using .NET 4.0 and C#.
Each request will validate the data and if it passes, retrieve data via a stored proc on a SQL Server 2008 server.
The response should be returned within 5 seconds of the request, if possible.
Both the request and the response XML are under 3K each and are fairly simple.
I plan to set up a load-balancer to handle the requests but I need to know if EF will be fast enough to pull this off or if I need to go with something else.
Note that none of this is built yet so I have the freedom to build something from scratch.
Entity Framework is relatively fast (see Performance Considerations for Entity Framework 4, 5, and 6). However, if ALL you're doing is invoking stored procedures, Dapper or some other micro-ORM will be much faster. If you need to do more complex O/RM tasks, like LINQ queries against the database, LINQ to SQL is generally faster than EF6, but EF6 supports more concepts, such as code-first, that LINQ to SQL was never meant to do.
I don't think your O/RM will be your bottleneck, no matter what way you go about it: more likely the stored procedure (or no indexes, if you go the O/RM query route and don't figure out what indexes you need beforehand) will be your performance bottleneck.
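For illustration, calling a stored procedure with Dapper looks roughly like this; the proc name, DTO shape, and connection string are assumptions for the sketch, not something from your system:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Dapper;

public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerRepository
{
    // Dapper maps each row of the proc's result set onto a CustomerDto by column name.
    public static IEnumerable<CustomerDto> GetCustomersByRegion(string connectionString, int regionId)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            return conn.Query<CustomerDto>(
                "dbo.usp_GetCustomersByRegion",          // hypothetical proc name
                new { RegionId = regionId },             // proc parameters
                commandType: CommandType.StoredProcedure);
        }
    }
}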
Related
We are at the beginning of a new project which will replace a legacy one. The legacy project is written in .NET Framework 4.0 (SOA with WCF) + SQL Server. The connection to SQL is made with ADO.NET + stored procedures. There is a structural mistake in having most of the logic in the stored procedures, and on top of that, it is a monolith.
The new project will be made with .NET 6 APIs and in some cases it will have SQL Server as well, for operational data.
So, looking at the new product, the question was raised: should we move from ADO.NET to EF? This is tempting since it reduces the development effort, but performance is a concern.
Taking a look at the technical must-haves:
Get the product to be as fast as possible (performance is a concern)
The new project is expected to live at least for the next 15 years
Operations are executed against tables with 30 to 50 million records
We must be able to run operations against the regular database, but also against the readonly one (AlwaysOn)
We must be able to perform some resiliency policies such as retries in case of deadlocks (see the sketch after this list)
We don't have much room for changes if we choose one path and somewhere along the way we realize we should have gone with the other option
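To give a feel for the retry requirement, if you did go with EF, a minimal sketch of EF Core's built-in execution strategy could look like the following; the context name, retry numbers, and the deadlock error code are illustrative assumptions:

using System;
using Microsoft.EntityFrameworkCore;

// In Program.cs / service registration: configure the context with retry-on-failure.
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(connectionString, sqlOptions =>
    {
        // Retries transient failures; 1205 (deadlock victim) is added explicitly.
        sqlOptions.EnableRetryOnFailure(
            maxRetryCount: 3,
            maxRetryDelay: TimeSpan.FromSeconds(5),
            errorNumbersToAdd: new[] { 1205 });
    }));

// The read-only AlwaysOn replica is usually addressed at the connection-string level,
// e.g. a second connection string with "ApplicationIntent=ReadOnly".

With raw ADO.NET you would typically get the same effect by wrapping calls in a retry policy (for example with Polly).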
Quite honestly, IMHO, based on our tech requirements I feel we should move forward with ADO.NET + stored procedures (without any business logic) + some sort of package that translates the SQL results into my objects quickly, but I'd like to give EF a shot, at least at this stage of the process where we are investigating possibilities.
I'd like to gather opinions if possible, especially if there is someone out there who went with EF with requirements similar to ours, or someone who didn't go with EF, or who had to change from EF to ADO.NET somewhere along the way.
Thanks.
The only thing in your requirements that could support using ADO.NET over EF is
Get the product to be as fast as possible (performance is a concern)
Which is a nonsense requirement, as you can always write more code and make things more complex to make things marginally faster. You need a real performance requirement so you can measure different approaches.
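If it helps, measuring the two approaches against your real workload does not have to be elaborate (a proper harness such as BenchmarkDotNet is better, but even a stopwatch gives you a number to argue about). A rough sketch, where the table, model, and connection string are placeholders:

using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;

const string connStr = "Server=.;Database=Shop;Trusted_Connection=True;TrustServerCertificate=True";

var sw = Stopwatch.StartNew();

// EF Core version (Customer/AppDbContext below stand in for your real model).
using (var db = new AppDbContext(connStr))
{
    var viaEf = db.Customers.Where(c => c.RegionId == 5).ToList();
}
Console.WriteLine($"EF Core: {sw.ElapsedMilliseconds} ms");

sw.Restart();

// ADO.NET version of the same query, mapped by hand.
using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.Customers WHERE RegionId = @r", conn))
{
    cmd.Parameters.AddWithValue("@r", 5);
    conn.Open();
    using var reader = cmd.ExecuteReader();
    while (reader.Read()) { var row = (reader.GetInt32(0), reader.GetString(1)); }
}
Console.WriteLine($"ADO.NET: {sw.ElapsedMilliseconds} ms");

public class Customer { public int Id { get; set; } public string Name { get; set; } public int RegionId { get; set; } }

public class AppDbContext : DbContext
{
    private readonly string _connStr;
    public AppDbContext(string connStr) => _connStr = connStr;
    public DbSet<Customer> Customers => Set<Customer>();
    protected override void OnConfiguring(DbContextOptionsBuilder options) => options.UseSqlServer(_connStr);
}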
Is there a way to send a correlation ID from C# code to SQL Server at the command level?
For instance, using x-correlation-id is an accepted way to track a request through all parts of the system. We are looking for a way to pass this string value to stored procedure calls in SQL Server.
I spent some time reading through documents and posts but I was not able to find anything useful.
Can someone please let me know if there is a way to do this? The goal is to be able to track a specific call through all services (which we can do now) and DB calls (which we cannot, and are looking for a solution).
I know this answer is one year late, but in case somebody has the same question:
Since EF Core 2.2, MS provides a new method called TagWith() with which you can attach your own annotation to the EF query sent to SQL Server. In this way, you can easily track the SQL query with the same correlation id generated in your C# code.
https://learn.microsoft.com/en-us/ef/core/querying/tags
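A minimal sketch of what that looks like; the DbContext, DbSet, and the way the correlation id is obtained are placeholders:

// Assumes EF Core 2.2+ and that this runs inside an async method.
// TagWith() emits the text as a comment above the generated SQL,
// so the correlation id shows up in SQL Server's query text and traces.
var correlationId = httpContext.Request.Headers["x-correlation-id"].ToString();

var orders = await dbContext.Orders
    .TagWith($"x-correlation-id: {correlationId}")
    .Where(o => o.CustomerId == customerId)
    .ToListAsync();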
Unfortunately, this new feature is not available in EF 6. But we are not the only ones in this situation. If you just need a simple solution, you could check the thread here and the MS documents.
If you need a more stable solution, you could check this NuGet plugin for EF 6 as well.
To pass your correlation id to SQL Server you have two options:
explicitly pass it as a parameter to your queries & stored procedures.
This is annoying as it requires work to change all your db calls to have a parameter like @correlationId, and often doesn't make sense having that parameter for simple data-retrieval queries. Perhaps you decide to only pass it for data-modification operations.
But on the positive side it's really obvious where the correlation info comes from (i.e. nobody reading the code will be confused) and doesn't require any additional db calls.
If all your data-modification is done using stored procs I think this is a good way to go.
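For this first option the plumbing is just an extra parameter on each call; a small sketch, where the proc, parameter names, and variables are made up:

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.usp_UpdateOrder", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@OrderId", orderId);
    cmd.Parameters.AddWithValue("@CorrelationId", correlationId); // flows in explicitly
    conn.Open();
    cmd.ExecuteNonQuery();
}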
use SQL Server's SESSION_CONTEXT(), which is a way you can set session state on a connection that can be retrieved from within stored procs etc.
You can find a way to inject it into your db layer (e.g. this) so the session context is always set on a connection before executing your real db calls. Then within your procs/queries you get the correlation id from the SESSION_CONTEXT and write to wherever you want to store it (e.g. some log table or as a column on tables being modified)
This can be good as you don't need to change each of your queries or procs to have the @correlationId parameter.
But it's often not so transparent how the session context is magically set. Also you need to be sure it's always set correctly which can be difficult with ORMs and connection pooling and other architectural complexities.
If you're not already using stored procs for all data modification, and you can get this working with your db access layer, and you don't mind the cost of the extra db calls, this is a good option.
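A sketch of this SESSION_CONTEXT route, assuming SQL Server 2016+ where SESSION_CONTEXT is available; the proc name and variables are made up:

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Set the correlation id once on the connection, before the real call(s).
    using (var set = new SqlCommand(
        "EXEC sys.sp_set_session_context @key = N'correlation_id', @value = @cid", conn))
    {
        set.Parameters.AddWithValue("@cid", correlationId);
        set.ExecuteNonQuery();
    }

    using (var cmd = new SqlCommand("dbo.usp_UpdateOrder", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@OrderId", orderId);
        cmd.ExecuteNonQuery();
    }
}

Inside the proc you would read it back with something like SELECT CAST(SESSION_CONTEXT(N'correlation_id') AS nvarchar(64)).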
I wish this was easier.
Another option is to not pass it to SQL Server at all, but instead log all your SQL calls from the tier that makes the call and include the correlation id in those logs. That's how Application Insights and .NET seem to do it by default: logging SQL calls as a dependency along with the SQL statement and the correlation id.
I have a big database which was created by Entity Framework Core. This database stores roughly 5 million records. To improve query speed I'd like to aggregate the data of the previous days.
In this case I would like to execute a SQL command once every day at 00:00 and aggregate yesterday's data.
In the past I created stored procs which were executed by a database job in MSSQL. But those databases were created manually, and now I'd like to get similar functionality using Entity Framework.
I read that there shouldn't be any logic in the database. So how could I do this instead? (The article where I got the base information is: Can you create sql views / stored procedure using Entity Framework 4.1 Code first approach)
So I'm searching for a good solution to execute an "aggregation" function every day and store the aggregated data in the database.
You use the method you used before! It's ideally solved by SQL Agent and a proc; almost anything else will have more issues and worse performance.
If you really wanted to do it differently, then you need two parts:
a scheduler: this will most likely be the OS one, but it has nowhere near as many features as SQL Agent.
the actual program: a .NET app using EF will do this, but EF is not required; simple ADO will work, as will any other library.
The only reason you'd choose this route is if you had further requirements that SQL would be inappropriate for, so you needed a more general language.
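If you do take that second route, a minimal sketch of the program half could look like the following (EF Core, with made-up context, table, and column names; the aggregation itself still runs inside SQL Server, EF is only the transport):

using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class DailyAggregationJob
{
    private readonly AppDbContext _db;   // your existing EF Core context

    public DailyAggregationJob(AppDbContext db) => _db = db;

    public async Task RunAsync()
    {
        var from = DateTime.Today.AddDays(-1);
        var to = DateTime.Today;

        // Interpolated values are sent as parameters, not concatenated into the SQL.
        await _db.Database.ExecuteSqlInterpolatedAsync($@"
            INSERT INTO DailyAggregates (Day, Total)
            SELECT CAST(CreatedAt AS date), COUNT(*)
            FROM Measurements
            WHERE CreatedAt >= {from} AND CreatedAt < {to}
            GROUP BY CAST(CreatedAt AS date);");
    }
}

Scheduling is then whatever the host offers, for example Windows Task Scheduler or cron firing a small console app, or a timed background service.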
I'm wondering if there's a performance penalty when doing the following vs using plain old ado.net DataReader and DataTable:
using (DBEntities dbEntities = new DBEntities())
{
    ObjectResult<tblCustomers> customers =
        dbEntities.ExecuteStoreQuery<tblCustomers>("SELECT name, id FROM tblCustomers");
}
I would also like to run sprocs using dbEntity.
I mention this because I'm developing a highly performance-sensitive application but would still like to use the Entity Framework.
Furthermore, can anyone point me to recent performance tests of LINQ to Entities compiled queries on .NET 4.0?
EDIT
If I go with ADO.NET I plan on mapping the results I get from each row into a .NET object manually. So it's Entity Framework store query/sproc vs. ADO.NET + manually creating and populating a .NET object.
Yes, of course - this is a higher-level approach than plain ADO.NET / SQL.
You send in a SQL query and get back a list of tblCustomers objects. Somewhere along the line, a mapping from the database's row/column to the object will happen, and this does take some time.
On the other hand - if you want to do the same thing yourself, you will have to pay a performance penalty, too - or you just use the old-style row/column to do your work (not recommended!).
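For comparison, the "do it yourself" version of the same query is roughly the following; the connection string, column order, and the exact shape of tblCustomers are assumptions:

var customers = new List<tblCustomers>();

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT name, id FROM tblCustomers", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Manual row-to-object mapping: this is the work EF does for you.
            customers.Add(new tblCustomers
            {
                name = reader.GetString(0),
                id = reader.GetInt32(1)
            });
        }
    }
}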
It's the classic "convenience vs. performance" trade-off - what is more important to you? Being able to program with nice C# objects and their properties and be very productive as a programmer - or a few nanoseconds on the SELECT from your database? It's your pick....
(EDIT: I made it a community wiki as it is more suited to a collaborative format.)
There are a plethora of ways to access SQL Server and other databases from .NET. All have their pros and cons and it will never be a simple question of which is "best" - the answer will always be "it depends".
However, I am looking for a comparison at a high level of the different approaches and frameworks in the context of different levels of systems. For example, I would imagine that for a quick-and-dirty Web 2.0 application the answer would be very different from an in-house Enterprise-level CRUD application.
I am aware that there are numerous questions on Stack Overflow dealing with subsets of this question, but I think it would be useful to try to build a summary comparison. I will endeavour to update the question with corrections and clarifications as we go.
So far, this is my understanding at a high level - but I am sure it is wrong...
I am primarily focusing on the Microsoft approaches to keep this focused.
ADO.NET Entity Framework
Database agnostic
Good because it allows swapping backends in and out
Bad because it can hit performance and database vendors are not too happy about it
Seems to be MS's preferred route for the future
Complicated to learn (though, see 267357)
It is accessed through LINQ to Entities so provides ORM, thus allowing abstraction in your code
LINQ to SQL
Uncertain future (see Is LINQ to SQL truly dead?)
Easy to learn (?)
Only works with MS SQL Server
See also Pros and cons of LINQ
"Standard" ADO.NET
No ORM
No abstraction so you are back to "roll your own" and play with dynamically generated SQL
Direct access, allows potentially better performance
This ties in to the age-old debate of whether to focus on objects or relational data, to which the answer of course is "it depends on where the bulk of the work is", and since that is an unanswerable question hopefully we don't have to go into that too much.
IMHO, if your application is primarily manipulating large amounts of data, it does not make sense to abstract it too much into objects in the front-end code; you are better off using stored procedures and dynamic SQL to do as much of the work as possible on the back-end. Whereas, if you primarily have user interaction which causes database interaction at the level of tens or hundreds of rows, then ORM makes complete sense. So, I guess my argument for good old-fashioned ADO.NET would be the case where you manipulate and modify large datasets, in which case you will benefit from the direct access to the backend.
Another case, of course, is where you have to access a legacy database that is already guarded by stored procedures.
ASP.NET Data Source Controls
Are these something altogether different or just a layer over standard ADO.NET?
Would you really use these if you had a DAL or if you implemented LINQ or Entities?
NHibernate
Seems to be a very powerful ORM?
Open source
Some other relevant links;
NHibernate or LINQ to SQL
Entity Framework vs LINQ to SQL
I think LINQ to SQL is good for projects targeted for SQL Server.
ADO.NET Entity Framework is better if we are targeting different databases. Currently I think a lot of providers are available for the ADO.NET Entity Framework: providers for PostgreSQL, MySQL, esql, Oracle and many others (check http://blogs.msdn.com/adonet/default.aspx).
I don't want to use standard ADO.NET anymore because it's a waste of time. I always go for ORM.
Having worked on 20+ different C#/ASP.NET projects, I always end up using NHibernate. I often start with a completely different stack - ADO.NET, ActiveRecord, hand-rolled weirdness. There are numerous reasons why NHibernate can work in a wide range of situations, but the absolute stand-out for me is the saving in time, especially when linked to code generation. You can change the data model and the entities get rebuilt, but most/all of the other code doesn't need to be changed.
MS does have a nasty habit of pushing technologies in this area that parallel existing open source, and then dropping them when they don't take off. Does anyone remember ObjectSpaces?
Added for new technologies:
With Microsoft SQL Server out for Linux in beta right now, I think it's OK not to be database agnostic. The .NET Core and MS-SQL route allows you to run on Linux servers like Ubuntu entirely, with no Windows dependencies.
As such, imo, a very good flow is to not use a full ORM framework or data controls and leverage the power of SSDT Visual Studio Projects (Sql Server Data Tools) and a Micro ORM.
In Visual Studio you can create a SQL Server project as a legit Visual Studio project. Doing so allows you to create the entire database via table designers or raw query editing right inside Visual Studio.
Secondly, you get SSDT's Schema Compare tool which you can use to compare your database project to a live database in Microsoft Sql Server and update it. You can sync your Visual Studio Project with the server causing updates in your project to go out to the server. Or you can sync the server with your project causing your source code to update. Via this route you can easily pick up changes the DBA made in maintenance last night and push out your new development changes for a new feature easily with a simple tool.
Using that same tool you can generate the migration script without actually running it; if you need to pass that off to an operations department and submit a change order, it works for that flow too.
Now, for writing code against your MS-SQL database, I recommend PetaPoco.
PetaPoco works perfectly in line with the above SSDT solution. It comes with T4 text templates you can use to generate all your data entity classes, and it generates the bulk of the data-layer classes for you.
The catch is, you have to write queries yourself, which isn't a bad thing.
So you end up with something like this:
var people = db.Fetch<Person>("SELECT * FROM People WHERE UserName LIKE @0", "%bob%");
PetaPoco automatically handles parameterizing @0 for you, and it also has the handy Sql class for building queries.
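For instance, a sketch of that Sql builder; the connection string name and the Person class are assumptions:

var db = new PetaPoco.Database("MyConnectionStringName");

// Sql.Builder composes the query and keeps the values as parameters.
var sql = PetaPoco.Sql.Builder
    .Select("*")
    .From("People")
    .Where("UserName LIKE @0", "%bob%")
    .OrderBy("UserName");

var people = db.Fetch<Person>(sql);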
Furthermore, PetaPoco is an order of magnitude faster than EF6 and 8+ times faster than EF7.
So in total, this solution involves using SSDT for schema management and PetaPoco for code integration, gaining high maintainability, customizability, and very good performance.
The only downside to this approach is that you're tying yourself hard to Microsoft SQL Server. However, IMO, Microsoft SQL Server is one of the best RDBMSs out there.
It's got DBMail, Jobs, CLR object capabilities, and on and on. Plus the integration between Visual Studio and MS-SQL server is phenomenal and you don't get any of that if you choose a different RDBMS.
I must say that I never used NHibernate because of the immense time needed to get started... time wasted on the XML setup.
I recently did a web application in MVC2, where I chose the ADO.NET Entity Framework, and I use LINQ all the time.
I must say, I was impressed with the speed! Our site had around 35,000 unique visitors per day and around 60 GB of bandwidth per day (I radically reduced this 60 GB figure by hosting all static files on Amazon S3; great .NET wrapper they have, I must say).
I will always go this way. It's easy to start (just add a new data item, choose tables and that's it! For every change in the database we just need to refresh the model, which is done automatically in just two clicks) and it's fun to use. LINQ rules!