I'm writing a WPF client app, using Linq to Sql with Sql Compact edition.
The db is relatively small (3MB) and read-only.
Bottom line is that The performance are not as good as I hoped them to be, and I'm looking for tips and practical ways to increase that.
More facts:
The schema contains around a dozen of entities with extensive relations between them.
Profiling the app found out that the query is being run quite fast but building the c# Entities is the the process that take the most time (could be up to 8 seconds).
Mostly I believe because we have used LoadWith, and the DataContext got no choice but to build the objects graph in memory.
I can provide additional information, if needed.
EDIT:
As I mentioned the db is read-only so DataContext is not tracking changes.
We are making use of static queries on reoccurring queries. The problem is when the application is initializing and we prefetch many objects to memory to be served as cache.
Thanks for your help.
Ariel
Well, you might find that making use of lazy loading (rather than eager loading) might help increase the performance (i.e. avoid using LoadWith) since the entities won't need memory allocated for the relationship chains (or deep loading of the object graph) and instead they will be populated on demand.
However, you'll need to be focused in your design to support this (otherwise you will simply move the performance bottleneck to become overly "chatty" with regard to SQL statements being executed against the SQL CE database.
The DataContext can also start to bloat (memory) as it tracks changes. You might need to consider your approach to how you use Data Contexts (for instance, you can attach them to new contexts provided the original context has been disposed).
A very simple solution is to use staticly declared compiled linq queries. This is of course not that practical, but it will improve performance as the expression trees will only need to be built once, during compile-time, instead of being dynamically created every time the query is called for execution.
This might help:
http://msmvps.com/blogs/omar/archive/2008/10/27/solving-common-problems-with-compiled-queries-in-linq-to-sql-for-high-demand-asp-net-websites.aspx
Related
I have a problem. My LINQ to SQL queries are pushing data to the database at ~1000 rows per second. But this is much too slow for me. The objects are not complicated. CPU usage is <10% and bandwidth is not the bottleneck too.
10% is on client, on server is 0% or max 1% generally not working at all, not traversing indexes etc.
Why 1000/s are slow, i need something around 20000/s - 200000/s to solve my problem in other way i will get more data than i can calculate.
I dont using transaction but LINQ using, when i post for example milion objects new objects to DataContext and run SubmitChanges() then this is inserting in LINQ internal transaction.
I dont use parallel LINQ, i dont have many selects, mostly in this scenario i'm inserting objects and want use all resources i have not only 5% od cpu and 10kb/s of network!
when i post for example milion objects
Forget it. Linq2sql is not intended for such large batch updates/inserts.
The problem is that Linq2sql will execute a separate insert (or update) statement for each insert (update). This kind of behaviour is not suitable with such large numbers.
For inserts you should look into SqlBulkCopy because it is a lot faster (and really order of magnitudes faster).
Some performance optimization can be achived with LINQ-to-SQL using first off precompiled queries. A large part of the cost is compiling the actual query.
http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
http://msdn.microsoft.com/en-us/library/bb399335.aspx
Also you can disable object tracking which may give you milliseconds of improvement. This is done on the datacontext right after you instantiate it.
I also encountered this problem before. The solution I used is Entity Framework. There is a tutorial here. One traditional way is to use LINQ-To-Entity, which has similar syntax and seamless integration of C# objects. This way gave me 10x acceleration in my impression. But a more efficient (in magnitude) way is to write SQL statement by yourself, and then use ExecuteStoreQuery function to fetch the results. It requires you to write SQL rather than LINQ statements, but the returned results can still be read by C# easily.
I was wondering how does the Entity Framework 4 is compared to native Ado.Net and SPs ?
what i would be missing if i used normal Ado.Net ?
does it worth leaving EF4 ?
In a nutshell, EF is an object-relational mapper (ORM), and ADO.Net is raw power. An ORM allows you to trade some runtime performance for ease of maintenance. You gain the ability to write code in a more declarative manner, expressing what you want out of the database instead of exactly how to go about getting it. As a result, changes to the database structure can be accounted for in the mappings rather than in every single part of your application that needed to touch the particular table that changed.
What you would be missing if you use ADO.Net is developer productivity. Describing each database operation in detail to ADO.Net is time consuming, error-prone, and not much fun.
I don't think I would ever want to "leave" an ORM and go back to raw ADO.Net except in situations in which extreme performance is required, such as importing large amounts of data, in which case you might be better off writing an SSIS package anyway.
EF is not suited for "crunching" large amounts of data: statistical or financial data with lot of abstract entities for example. Otherwise it's fine. Anyway, unless you suffering from perfomance issues - it's fine too. Also nothing stops you from using both concepts at the same time.
EF feels more natural, but if you are a hardcore sql user, it might feel weak and odd at first. But I like doing everything on the c# side, less maintenance issues, less headaches, less magic strings.
Anyways, for performance issues, unless you are doing mass inserts, updates, you won't see any difference.
If you use normal ADO.Net, without some kind of OR/M wrapped around it, you would still be working on records, not classes with behaviours and methods on them. You would need an additional biz layer tied to the record.
An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by desiring database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore what the database can handle in far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
in many data-in/data-out cases, no.
in some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc), then LINQ and it's friends are excellent choices - they reduce the pain (and effort and risk) from the majority of queries. But there are a number of scenarios where it doesn't work so well:
large-scale set-based operations: I can do a wide-ranging query in TSQL without the need to drag that data over the network in one large query, and then (even worse) update each record individually (since in many cases the ORM tools will choose individual UPDATE/INSERT/DELETE operations etc). Not only is this slow, it increases the chances of data drift. So to counter that you might add a transaction - but a long-lived transaction (while you suck a glut of data over the network) is bad
simply: there are a lot of queries where hand-tuning it achieves things that the ORMs simply can't; I had a scenario recently where a relatively basic LINQ query was performing badly. I hand tuned it (using some ROW_NUMBER() etc) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some query syntax options, and even if you do - would lead to bad queries. Yet which can be expressed very elegantly in TSQL: example: Linq to Sql: select query with a custom order by
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (lets say, as it makes sense) with the simplest, most reusable kind of query and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You already went a step to maintainability, but you may fall back to SQL to tune some queries.
For me, the deciding factor between writing a giant sql query or a bunch of simple queries and then do everything in the code is usually performance. The latter is preferred but if it goes way too slow, I'll do the former (Sql is optimized for data processing after all).
The reason because I prefer the latter is, that in general my team is more comfortable with code then sql queries. I like sql a lot but if a giant sql query means that I'm only one who can debug/understand it in a reasonable amount of time, that's not a good thing. Another reason is also that with a giant query, you will usually program some business logic in it. If I have a business layer, I prefer too have as much of my business logic there as possible.
Off course, you could decide to stuff all your business logic in stored procedures. Your program is then nothing more then a GUI interface to the API of your database. It depends on the requirements of your project and if your team can handle this.
That said, you give Linq as an alternative technology. I have noticed in my team that thanks to my experience with SQL, I'm very comfortable with Linq while my colleagues are not. The problem on a deeper level is procedural vs set based thinking. Linq is comparable to sql. If you are not comfortable with SQL, chances are you won't be with Linq.
Few questions on ORM mappers like nhibernate (for .net/c# environment).
When queries are run against a sqlserver database, does it use parameter sizes internally?
paramaters.Add("#column1", SqlDataType.Int, 4)
Do they all use reflection at runtime? i.e. hand coding is always a tad faster?
does it support temp tables, table variables?
ORM World is powerfull and full featured one, I think today obstacles are peoples theirself, this to say that using ORM's needs to take changes in mind, in the way of thinkng applications, architectures and patterns.
To answer you questions:
Query execution depends on more factors, it's possible to fully customize the engine/environment to take advantages of various features like, deffered execution, future queries (queries executed in a future moment), multiple queries, and last but not least, session management involves in this area:
Deferred execution is based on concepts like lazy-loading and lazy execution, this means that queries are executed against the database just qhen you perform some actions, like accessing to methods of the NHibernate Session like ToList(), another example of deferred execution is with using of LinqToNhibernate, where just when you access to certain objects, queries are executed
Future queries are as I said beforre queries executed in a future moment, Ayende speaks well about that
Multiple queries are queries that can be "packed" together and executed in one time avoiding multiple roundtrips to the DB, and this could be a very cool feature
Session Management, this is another chapter to mention... but take in mind that if you manage well your session, or better, let NHibernate engine to manage well the session, sometime it's not needed to go to the DN to obtain data
in all the cases, tools like NHibernate generates queries for you, and parametrized queries are managed well with parameters even depending on the underlying DB engine and conseguently DB Dialect you choose!
It's clear that frameworks like NHibernate, most of the time, use reflection at runtime, but it's needed to mention the multiple Reflection optimization are used, see for example Dynamic Proxies... It's clear that somethime, or maybe all the time direct code could be faster, but just in the unit, in the big picture this could involve in more mistakes and bottlenecks
Talking about NHibernate, or better saying, It's usefull to understand what you mean when talk about Temp Tables and temp data.. In terms as are, NHibernate, as I know, doesn't support natively Temp Tables, in the sense of Runtime tables, but this could be done because NHibernate permit to create object mapping at runtime, so a mechanism of Temp data could be implemented using that api
I hope I provided an usefull answer!
...and oops, sorry for my bad english!
nhiberate uses an optimised form of reflection that creates proxy objects on startup that perform better than plain reflection, as it only incurs a one time cost. You have the option of disabling this feature as well which has nhibernate behave in a more typical manner, with the permanent use of reflection.
This feature is set with the following key:
<add key="hibernate.use_reflection_optimizer" value="true" />
Nhibernate can be used with variable table names. See this SO thread for a good solution.
NHibernate and SubSonic, LinqToSql, EF and I think most of the other ones use parameterized sql.
Most ORM's use some sort of reflection, there are some that generate all the code and SQL Query for you at design time so they don't have to use reflection code it might work a bit faster but it makes you domain a real mess and you have to use their application to regenerate all your code.
I am almost sure that they all don't support that but most have a way that you can use SP's and Views to do this.
You can check this out NHibernate Screencast Series http://www.summerofnhibernate.com/
Currently I am using NetTiers to generate my data access layer and service layer. I have been using NetTiers for over 2 years and have found it to be very useful. At some point I need to look at LINQ so my questions are...
Has anyone else gone from NetTiers to LINQ To SQL?
Was this switch over a good or bad thing?
Is there anything that I should be aware of?
Would you recommend this switch?
Basically I would welcome any thoughts
.
No
See #1
You should beware of standard abstraction overhead. Also it's very SQL Server based in it's current state.
Are you using SQL Server, then maybe. If you are using LINQ for other things right now like over XML data (great), Object data, Datasets, then yes you should could switch to have a uniform data syntax for all of them. Like lagerdalek mentioned if it ain't broke don't fix it.
From the quick look at .netTiers Application Framework, I'd say if you already have an investment with that solution it seems to give you much more than a simple Data Access Layer and you should stick with it.
From my experience LINQ to SQL is a good solution for small-medium sized projects. It is an ORM which is a great way to enhance productivity. It also should give you another layer of abstraction that will allow you to change out the layer underneath for something else. The designer in Visual Studio (and I belive VS Express also) is very easy and simple to use. It gives you the common drag-drop and property-based editing of the object mappings.
# Jason Jackson - The Designer does let you add properties by hand, however you need to specify the attributes for that property, but you do this once, it might take 3 minutes longer than the initial dragging of the table into the designer, however it is only necessary once per change in the database itself. This is not too different from other ORMs, however you are correct that they could make this much easier, and find only those properties that have changed, or even implement some kind of refactoring tool for such needs.
Resources:
Why use LINQ to SQL?
Scott Guthrie on LINQ to SQL
10 Tips to Improve your LINQ to SQL Application Performance
LINQ To SQL and Visual Studio 2008 Performance Update
Performance Comparisons LINQ to SQL / ADO / C#
LINQ to SQL 5 Minute Overview
Note that Parallel LINQ is being developed to allow for much greater performance on multi-core machines.
I tried to use Linq to SQL on a small project, thinking that I wanted something I could generate quickly. I ran into a lot of problems in the designer. For example, anytime you need to add a column to a table you basically have to remove and re-add the table definition in the designer. If you have set any properties on the table then you have to re-set those properties. For me this really slowed down the development process.
LINQ to SQL itself is nice. I really like the extensibility. If they can improve the designer I might try it again. I think that the framework would benefit from a little more functionality aimed at a disconnected model like web development.
Check out Scott Guthrie's LINQ to SQL series of blog posts for some great examples of how to use it.
NetTiers is very good for generating a heavy and robust DAL, and we use it internally for core libraries and frameworks.
As I see it, LINQ (in all its incarnations, but specifically as I think you're asking to SQL) is fantastic for quick data access, and we generally use it for more agile cases.
Both technologies are quite inflexible to change without regeneration of the code or dbml layer.
That being said, used properly LINQ 2 SQL is quite a robust solution, and you might even start using it for future development due to it's ease of use, but I wouldn't throw away your current DAL for it - if it aint broke ...
My experience tells me that using by using linq you can get things done faster, however the actual actions to the database are slower.
So... if you have a small database, i'll say go for it. If not, i would wait for some improvements before changing
I'm using LINQ to SQL on fairly large project right now (about 150 tables) and it is working out very well for me. The last ORM I used was IBatis and it worked well but took alot of legwork to get your mappings done. LINQ to SQL performs very well for me and so far has proved to be very easy to use out of the box. There are definately some differences you have to overcome in transition, but I would recommend it's use.
Side note, I have never used or read about NetTiers so I won't discount it's effectiveness, but LINQ to SQL in general has proven to be an extremely viable ORM.
Our team used to use NetTiers and found it to be useful. BUT... the more we used it, the more we found headaches and pain points with it. For example, anytime you make a change to the database, you need to re-generate the DAL with CodeSmith which involved:
re-generating thousands of lines of code in 3 separate projects
re-generating hundreds of stored procedures
Maybe there are other ways of doing it, but this is what we had to do. The re-gen of the source code was ok, scary, but ok. The real issue came with the stored procedures. It didn't clean any unused stored procedures so if you removed a table from your schema and re-gened your DAL, the stored procedures for that table did not get removed. Also, this became quite a headache for database change scripts where we had to compare the old database structure to the new one and create a change script to update client installations. This script could run into the tens of thousands of lines of sql code and if there was an issue executing it, which there invariably was, it was quite a pain to resolve it.
Then the light came on, NHibernate as an ORM. It certainly has a ramp-up time to it but it is well worth it. There is a ton of support for it so if there's something you need done, more than likely it's been done before. It is extremely flexible and allows you to control every aspect of it and then some. It is also becoming easier and easier to use. Fluent Nhibernate is up and coming as a great way to get rid of the xml mapping files that are needed and NHibernate Profiler provides an excellent interface to see what's going on behind the scenes to increase efficiency and remove redundancy.
Moving from NetTiers to NHibernate has been painful, but in a good way. It has forced us to move into a better architecture and re-evaluate functional needs. NetTiers provided tons of data access code, get this entity by its id, get this other entity by its foreign key, get a tlist and vlist of this and that, but most of it was unnecessary and unused. NHibernate with a generic repository and custom repositories only where needed reduced tons of unused code and really increased readability and reliability.