I'm working on a fairly high performance application, and I know database connections are usually one of the more expensive operations. I have a task that runs pretty frequently, and in the course of business it has to select data from Table1 and Table2. I have two options:
Keep making two Entity Framework queries as I do now: select from Table1 and select from Table2 in separate LINQ queries (my current approach).
Create a stored procedure that returns both result sets in one call, using multiple result sets.
I'd imagine the cost to SQL Server is the same: the same IO is being performed. I'm curious if anyone can speak to the performance bump that may exist in a "hot" codepath where milliseconds matter.
and I know database connections are usually one of the more expensive operations
Unless you turn off connection pooling, obtaining a connection is pretty cheap as long as there are connections already established in the pool and available to use. It also really shouldn't matter here anyway.
When it comes to two queries (whether EF or not) vs one query with two result sets (and using NextResult on the data reader) then you will gain a little, but really not much. Since there's no need to re-establish a connection either way, there's only a very small reduction in the overhead of one over the other, that will be dwarfed by the amount of actual data if the results are large enough for you to care much about this impact. (Less overhead again if you could union the two resultsets, but then you could do that with EF too anyway).
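For illustration, here is a minimal ADO.NET sketch of the "one command, two result sets" approach described above, using SqlDataReader.NextResult. The table names, columns, and connection string are placeholders, not anything from the original post.

using System.Data.SqlClient;

public static class TwoResultSetsExample
{
    // Reads two result sets returned by a single command, advancing with NextResult.
    // "Table1"/"Table2" and their columns are placeholder names.
    public static void ReadBoth(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, Name FROM Table1; SELECT Id, Name FROM Table2;", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    var id = reader.GetInt32(0);      // Table1 row
                    var name = reader.GetString(1);
                }

                if (reader.NextResult())              // advance to the second result set
                {
                    while (reader.Read())
                    {
                        var id = reader.GetInt32(0);  // Table2 row
                        var name = reader.GetString(1);
                    }
                }
            }
        }
    }
}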
If you mean the bytes going to and fro over the connection after it's been established, then you should be able to send slightly less to the database (but we're talking a handful of bytes) and about the same coming back, assuming that your query is only obtaining what is actually needed. That is, you do something like from t in Table1Repository select new {t.ID, t.Name} if you need IDs and names rather than pulling back complete entities for each row.
EntityFramework does a whole bunch of things, and doing anything costs, so taking on more of the work yourself should mean you can be tighter. However, as well as introducing new scope for error over the tried and tested, you also introduce new scope for doing things less efficiently than EF does.
Any seeking of commonality between different pieces of database-handling code gets you further and further along the sort of path that ends up with you producing your own version of EntityFramework, but with the efficiency of all of it being up to you. Any attempt to streamline a particular query brings you in the opposite direction of having masses of similar, but not identical, code with slightly different bugs and performance hits.
In all, you are likely better off taking the EF approach first, and if a particular query proves particularly troublesome when it comes to performance then first see if you can improve it while remaining with EF (optimise the linq, use AsNoTracking when appropriate and so on) and if it is still a hotspot then try to hand-roll with ADO for just that part and measure. Until then, saying "yes, it would be slightly faster to use two resultsets with ADO.NET" isn't terribly useful, because just what that "slightly" is depends.
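For example (a sketch only, with a placeholder entity type standing in for whatever Table1 maps to), a read-only EF query can often be tightened by projecting just the needed columns and skipping change tracking with AsNoTracking:

using System.Linq;
using System.Data.Entity;   // EF6; EF Core has an equivalent AsNoTracking extension

// Placeholder entity shape, purely for illustration.
public class Table1Row
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public static class ReadOnlyQueryExample
{
    // Accepts the set as IQueryable so the sketch stays context-agnostic.
    public static object LoadActiveNames(IQueryable<Table1Row> table1)
    {
        return table1
            .AsNoTracking()                        // read-only: skip change tracking
            .Where(t => t.IsActive)                // filter in the database
            .Select(t => new { t.Id, t.Name })     // pull back only the columns needed
            .ToList();                             // materialize once
    }
}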
If the query is a simple read from Table1 and Table2, then LINQ queries should give similar performance to executing a stored procedure (plain SQL). But if the query runs across different databases, then plain SQL is usually better, since you can UNION the result sets and combine the data from all the databases.
In MySQL, the EXPLAIN statement can be used to see how a query will be executed and assess its performance. See this link:
http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
Another useful technique is to check the SQL generated for your LINQ query in the output window of Microsoft Visual Studio. You can execute that query directly in a SQL editor and check its performance.
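If it helps, EF itself can also surface the generated SQL. In EF6 the Database.Log hook writes each command as it executes (EF Core offers ToQueryString() instead); the entity type below is a placeholder:

using System;
using System.Data.Entity;   // EF6
using System.Linq;

// Placeholder entity, purely for illustration.
public class Table1Row
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SqlLoggingExample
{
    public static void DumpGeneratedSql(DbContext context)
    {
        // Every command EF sends is now written to the console as it executes.
        context.Database.Log = Console.Write;

        // The SELECT that EF generates for this query shows up in the console.
        var names = context.Set<Table1Row>()
            .Where(t => t.Id > 0)
            .Select(t => t.Name)
            .ToList();
    }
}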
EDIT
Actually this question should be more general: how should I modify a database query when LINQ with IQueryable gives errors?
The correct answer, as far as I understand, is to get as much of the query as possible done at the database level, because in this particular case my complicated query simply cannot be translated from LINQ to SQL.
So I just wrote a raw SQL query with the FromSqlRaw() method and the errors went away. Moreover, I wrote the query so that it filters rows instead of taking all entries (as the ToList() approach did), so I have fewer doubts about performance (though I did not measure it).
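For reference, a minimal sketch of that approach (the entity, table, and column names below are placeholders, not the poster's actual schema): EF Core's FromSqlRaw runs the raw SQL but still lets you compose a projection on top, and the filtering stays in SQL rather than happening after ToList():

using System.Linq;
using Microsoft.EntityFrameworkCore;

// Placeholder entity; "dbo.Persons" and its columns are assumptions.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class RawSqlExample
{
    public static object FilteredNames(DbSet<Person> persons, string name)
    {
        return persons
            .FromSqlRaw("SELECT Id, Name FROM dbo.Persons WHERE Name = {0}", name)
            .Select(p => new { p.Id, p.Name })   // composed on top of the raw SQL
            .ToList();
    }
}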
I need some help understanding how to use LINQ when converting a List to an IQueryable.
What I had:
Three tables in the database, with IQueryable-based queries against one of them.
What I need:
A LINQ query that combines data from the three tables and gives me, for each element of one table, a specific resulting column of data, with the ability to filter by that column.
What I tried:
Extending the IQueryable-based query. But I ran into problems converting a List to an IQueryable: the AsQueryable() method gives errors.
What I achieved:
I rewrote the queries with List-based logic in LINQ and it gives me what I need. But I do not understand:
Is this practice good?
Why do I so often have to call ToList() to avoid errors?
Is the speed of my solution worse than an IQueryable-based approach?
Here is fiddle with my exercises: https://dotnetfiddle.net/BAKi6r
What I need is what I get in the listF variable.
I completely replaced the CreateAsync method with the Create method for List. Is that OK?
I also tried to use hardcoded Lists with the CreateAsync method (items2moq, items3moq), but combined with the filtered List-based query they give the error "The provider for the source IQueryable doesn't implement IAsyncQueryProvider". I also got an "Argument types do not match" error when I used IQueryable for NamesIQ instead of a List for NamesList. What exactly is the source of these errors?
Why do I so often have to call ToList() to avoid errors?
I often think about LINQ queries in three "levels":
IQueryable - these are designed to translate a LINQ query into an equivalent database query (or whatever data source you're using). Many LINQ and non-LINQ operations just can't be translated into an SQL (or other) equivalent, so this layer will throw an error. Even operations that seem simple (like splitting a string) are difficult, if not impossible, to do in SQL.
IEnumerable - in this layer, LINQ queries are run in memory, so there's much more flexibility for custom operations. To get from the IQueryable layer to the IEnumerable layer, the AsEnumerable() call is the most straightforward. That separates the part of the query that gets raw data from the part that can create custom objects, do more complex filtering and aggregations, etc. Note that IEnumerable still uses "deferred execution", meaning that at this stage the query is just a query - the results don't actually get computed until you enumerate it, either with a foreach loop or by advancing to the next layer:
List/Array/etc. This is where queries are executed and turned into concrete collections. Some of the benefits of this layer are serializability (you can't "serialize" an enumerator) and eager-loading (as opposed to deferred execution described above).
So you're probably getting an error because you have some part of your query that can't be translated by the underlying Queryable provider, and using ToList is a convenient way to materialize the raw data into a list, which allows you to do more complex operations. Note that AsEnumerable() would do the same thing but would maintain deferred execution.
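To make the layering concrete, here is a sketch with a placeholder entity type: the first part stays in the IQueryable layer (translated to SQL), AsEnumerable() switches to the in-memory layer for the part SQL can't express, and ToList() finally executes everything.

using System.Linq;

// Placeholder entity; the layering is what matters here.
public class PersonRow
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public bool IsActive { get; set; }
}

public static class LayeredQueryExample
{
    public static object Run(IQueryable<PersonRow> people)
    {
        return people
            .Where(p => p.IsActive)                                // IQueryable: translated to SQL
            .Select(p => new { p.Id, p.FirstName, p.LastName })    // still SQL: only needed columns
            .AsEnumerable()                                        // switch to in-memory, still deferred
            .Select(p => new { p.Id, Display = Format(p.FirstName, p.LastName) })   // not SQL-translatable
            .ToList();                                             // execute and materialize
    }

    private static string Format(string first, string last) =>
        string.IsNullOrWhiteSpace(first) ? last : first + " " + last.ToUpperInvariant();
}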
Is this practice good?
It can be, but you might easily be getting more data than you need by doing filtering at the list level rather than at the database level. My general practice is to get as much of the query done at the database level, and only move to the enumerable/list level when there's no known way to translate the rest of the query to SQL.
Is the speed of my solution worse than an IQueryable-based approach?
The only way to know is to try it both ways and measure the difference. But it's a pretty safe bet that if you get more raw data than you need and filter in memory, you'll have worse performance.
Currently I'm redesigning an existing program which uses one big master lookup table containing values for many different categories (C#, .NET Core 3.0 & EF).
Many of these values rarely change, and I would normally put them in a C# enum.
Some examples: Language, Sex, ReceiptStatus, RiskType, RelationType, SignatureStatus, CommunicationType, PartKind, LegalStatute, ...
The list goes on and on and currently has 143 different categories, each having its own values with two translations per value.
My company wants the values to be in the database, so a non-programmer can change them when they have to.
However, it doesn't feel right at all. I would love to split the table up, but creating 143 tables seems like overkill. If it were only 5-10 lookup tables it would have been fine.
Any advice? Stick to one lookup table? That feels wrong to my eyes. Multiple tables?
Or convince my company we should just use C# enums, which work perfectly fine but rule out the possibility of a non-programmer editing them?
Based on your inclination to use enums, I'm going to assume that these lookup values do not change often.
Buckle up because a lot of hard-fought knowledge about maintainability is embedded in the analysis below. Let me break down the approaches you are considering:
Pure enums: This is the least flexible approach because it closes a lot of doors. As you said, changing values requires a developer and a deployment. What's your strategy if you eventually have other tables that need to relate to one of your many, many values? To me this is far too restrictive, especially since with either of the other approaches you could create a .t4 template that generates enums based on the data. Then if the data changes, you just re-generate. I do this a lot.
One giant lookup table: Not as flexible as it may seem! This trades complexity, the single responsibility principle, and referential integrity against repetition/table spam, and is probably an expression of the Big Ball of Mud anti-pattern. You could add a column to this table that controls where a given value can be used, and that will allow you to have sane drop-down lists, but that isn't as good as referential integrity. If other tables need to relate to a lookup, you have to relate against this entire table, which is much less clear. You will have to be careful to enforce your own layer of referential integrity since the database can't help you. Finally, and this is a big deal, if any of your 143 values has or will ever have extra complexity and could really benefit from an additional column, cognitive load begins to escalate. If five of the 143 need their own columns, you now have to hold all five columns in your mind to understand any one of them... That is agony. Here's a thought experiment for you if I'm not getting my point across: why not build your entire project as one giant table?
143 tables: The most flexible approach, and all things considered, the easiest to maintain by a massive margin. It does not close any doors; down the road you can still create a UI for editing any value you want. If you want to relate other tables to a lookup value, that relationship will be easy to understand because you can relate to LegalStatus instead of GiantEverythingTable, and enjoy the benefits of referential integrity, never having to worry about corrupting your own data. You can also script table and index creation with something like NimbleText (a great tool and a hidden gem). There will be a huge number of tables, which is itself a minor maintenance problem, but it's one that doesn't actually break anything and doesn't lead to cognitive load. This is an acceptable trade-off. I would go this way and generate enums using t4.
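As a rough illustration of the "generate enums from the data" idea mentioned above (this is plain C# rather than actual T4 syntax, and the LegalStatute table's Id/Code columns are assumptions), a generator can read a lookup table and emit enum source:

using System.Data.SqlClient;
using System.Text;

public static class EnumGenerator
{
    // Reads a lookup table and produces C# enum source text; a T4 template
    // would do essentially the same thing at design/build time.
    public static string GenerateEnumSource(string connectionString)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public enum LegalStatute");
        sb.AppendLine("{");

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, Code FROM LegalStatute ORDER BY Id", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    sb.AppendLine($"    {reader.GetString(1)} = {reader.GetInt32(0)},");
                }
            }
        }

        sb.AppendLine("}");
        return sb.ToString();   // write this out as a generated .cs file
    }
}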
The thing about most software projects of any size is that you may look at my objections and say they don't apply, and you might be right. But if this thing is going to be in active development, you have to ask: are you sure? Do you really know what's going to happen in a year?
When considering trade-offs, I've learned to assign a lot of weight to the most flexible/simple decision. Maintainability problems are what kill software projects. They are the enemy.
Hope that helps!
This question already has answers here:
LINQ to Objects and improved perf with an Index?
(4 answers)
Closed 3 years ago.
I have worked with FoxPro databases, which use the Rushmore optimization technology, and I wanted to know if there is any similar optimization technology for LINQ.
I am not looking for this in LINQ-to-SQL, because Rushmore was actually assimilated into SQL Server and is responsible for part of its index-related speed.
I want to know, for LINQ-to-Objects, whether there is something similar to Rushmore or to the index-related performance optimizations in SQL Server.
This question is not really a duplicate because (1) Rushmore automatically optimized your expressions (whereas with i4o the optimization is done manually), (2) it had a bitmapped component that allowed multiple indexes to be quickly combined in expressions (with good performance), and (3) the technology works for tables that can't fit in memory (which would be a plus in this case).
There is no query optimizer and no indexes in LINQ-to-Objects. You can use the ToDictionary, ToLookup, and ToHashSet extension methods to create "indexes" over in-memory collections, and you can create sorted collections of objects.
You can then manually write queries and procedural code using these optimized collections to replicate what a query optimizer would otherwise do.
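For example, a couple of hand-built "indexes" over an in-memory collection might look like this (the Order type and its properties are hypothetical):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, for illustration only.
public class Order
{
    public int Id { get; set; }
    public string CustomerId { get; set; }
    public decimal Total { get; set; }
}

public static class InMemoryIndexExample
{
    public static void Run(List<Order> orders)
    {
        // Unique-key "index": O(1) lookup by Id instead of a linear scan per query.
        Dictionary<int, Order> byId = orders.ToDictionary(o => o.Id);

        // Non-unique "index": all orders for a customer in one lookup.
        ILookup<string, Order> byCustomer = orders.ToLookup(o => o.CustomerId);

        Order single = byId[42];                                    // vs. orders.First(o => o.Id == 42)
        decimal customerTotal = byCustomer["ACME"].Sum(o => o.Total);
        Console.WriteLine($"{single.Id}: {customerTotal}");
    }
}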
(I am just thinking out loud; this would be a mess as a comment.)
Rushmore optimization was basically about choosing the correct indexes, and/or using no index at all and doing bitmap indexing on the fly. While it is a nice technique, I think databases got their speed from their own different tricks besides the indexes themselves; LATERAL in ANSI SQL, range indexes in PostgreSQL, and sharding are a few to name. If you use LINQ against a particular backend, you would be utilizing that backend's capabilities (including Rushmore, if you are using LINQ to VFP).
For LINQ-to-Objects there isn't such a thing AFAIK, but as a developer you can take responsibility for writing queries as optimally as possible, and given that L2O is an in-memory thing, you might not need such optimization as much as you do with a database. Even with Rushmore, we took on the responsibility of trying alternative ways of querying to find the best performance.
(You have the question tagged as "linq"; I would hope Joseph Albahari, author of ".NET xx in a Nutshell", sees it and provides a detailed answer.)
In a nutshell, my project is receiving data faster than it can process it and write it to a database (EF6 to SQL Server 2016), and I'm not sure what the best-practice approach is (ditch EF? Offload to the database via Service Broker? Something else?). Write events are not being handled fast enough, so they result in cascading event logjams and fatal memory crashes.
The write events are (I want them to be) low-priority, and I'm using async tasks for them. The write events involve a lot of data and a lot of relationships, and EF is just not handling them efficiently (I'm using AddRange, but EF is just sending everything in many single inserts, which I've read is its regular behavior).
I've tried paring back the relationships, and I've moved more processing over to the database, and I've tried using a batched "Delayed"Queue (an observable queue implementation that triggers an "empty me" event when a threshold is met), so that the inbound write events can be handled very quickly (just dump the request in the queue and move on), but this didn't get me anywhere (not surprising, I suppose, since I've basically added a message queue on top of the built-in message queue?).
Please correct me if I'm wrong, but it seems to me that EF is not the right tool for something as write-heavy and relationship-heavy as what I have (I know there are bulk-write extensions...). So, in an effort to resolve this sensibly, would it make sense to bypass EF and do my own bulk-write queries, or is this an appropriate use for Service Broker? With Service Broker, I could just send a dataset in one sproc, which just adds the dataset to the queue, frees the frontend to move on, and the database can handle and build the relationships whenever. Are these solutions sensible or best practice, or am I barking up the wrong tree (or putting lipstick-on-a-pig maybe)?
Thank you.
Please correct me if I'm wrong, but it seems to me that EF is not the right tool for something as write-heavy and relationship-heavy as what I have
You are right.
By default, as you said, Entity Framework performs one database round-trip for every record to save, which is insanely slow.
Disclaimer: I'm the owner of Entity Framework Extensions
(The library is not free)
This library allows you to improve Entity Framework performance.
I'm not sure if our library can help you, but it is worth a try if you save multiple entities at once.
For example, BulkSaveChanges works exactly like SaveChanges but is way faster because it dramatically reduces the number of database round-trips required.
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(endItems);
context.BulkInsert(endItems);
context.BulkUpdate(endItems);
// Customize Primary Key
context.BulkMerge(endItems, operation => {
    operation.ColumnPrimaryKeyExpression = endItem => endItem.Code;
});
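For comparison, the hand-rolled "do my own bulk-write queries" route the question mentions usually means staging rows in a DataTable and handing them to SqlBulkCopy. A minimal sketch, with placeholder table and column names rather than the poster's actual schema:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class BulkInsertExample
{
    // "dbo.Readings", "Value", and "TakenAt" are placeholder names.
    public static void WriteBatch(string connectionString,
        IEnumerable<(double Value, DateTime TakenAt)> batch)
    {
        var table = new DataTable();
        table.Columns.Add("Value", typeof(double));
        table.Columns.Add("TakenAt", typeof(DateTime));

        foreach (var (value, takenAt) in batch)
            table.Rows.Add(value, takenAt);

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Readings";
            bulk.WriteToServer(table);   // one bulk operation instead of per-row INSERTs
        }
    }
}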
What is the best practice for keeping historical data in the database? For example, let's say we have transaction tables like AttendanceTrans and SalaryTrans in a payroll solution. Every month we have to insert hundreds or thousands of new records into these tables, so all the past and current data sits in the same table.
Another approach would be to keep AttendanceHistory and SalaryHistory tables, so that at the end of every period (month) we empty the Trans tables after copying the data to the respective History tables.
When considering factors like performance, ease of report generation, ease of coding and maintenance, what would be the optimum solution?
Note: the RDBMS is SQL Server 2008 and the programming environment is .NET (C#).
In general you should keep all the data in the same table. SQL Server is great at handling large volumes of data and it will make your life a lot easier (reporting, querying, coding, maintenance) if it's all in one place. If your data is indexed appropriately then you'll be just fine with thousands of new records per month.
In my opinion, the best solution in SQL Server is CDC (Change Data Capture). It is very simple to use, and you can control the volume of historical data by changing the schedule of the cleanup job.
I think this is the best way for performance because CDC gets changes from the transaction log (it does not use triggers on the table), but you need to use the Full recovery model for your database.