I'm making an application that will analyze real-time data that has been stored in a SQL CE database. When I test the application as it is built now, with LINQ to SQL, I get slow results and I need to rethink how to do this.
To save me some time: can I trust that L2S is just as fast as the 'old' SqlCe methods were? I like L2S and would prefer to stay with it, and if your experience says it's as fast as any other db connection, I can rest assured that I wouldn't gain performance by rewriting the L2S code as old-style SQL statements.
The bottlenecks when using SqlCE don't stem from the SQL generated by LINQ to SQL. Remember, CE is an in-process db and therefore has its limitations. For example, LEFT OUTER JOINs are a DISASTER regardless of what you use to query it. Inserts and updates aren't bad, but then again, if you'll be doing a high volume of either one of those, you'll suffer some serious performance issues. My point is, the slowness isn't because of LINQ to SQL. I've benchmarked it in the past (I don't know if I still have that code), and from what I remember, LINQ to SQL wasn't slower than querying directly with ADO.NET. The performance issues are due to the constraints of CE itself.
If you are using SQL CE, this video from last year's PDC is very informative. The ideas we have about how to optimize queries for full-blown SQL Server do not always apply, and sometimes hurt performance on SQL CE.
I would recommend you watch it, as the presenter explains the differences and does benchmarks to show the results. Here you can find a link to his blog.
I have a performance discrepancy between an EF query run through the web application and the Profiler-generated T-SQL run directly in a SQL query window.
Following is my EF query that executes through the web application:
IEnumerable<application> _entityList = context.applications
.Include(context.indb_generalInfo.EntitySet.Name)
.Include(context.setup_budget.EntitySet.Name)
.Include(context.setup_committee.EntitySet.Name)
.Include(context.setup_fund.EntitySet.Name)
.Include(context.setup_appStatus.EntitySet.Name)
.Include(context.appSancAdvices.EntitySet.Name)
.Where(e => e.indb_generalInfo != null);
if (isIFL != null)
_entityList = _entityList.Where(e => e.app_isIFL == isIFL);
int _entityCount = _entityList.Count(); // hits the database server at this line
Tracing the above EF query in SQL Profiler reveals that it took around 221'095 ms to execute (the applications table has 30,000+ records, indb_generalInfo 11,000+ and appSancAdvices 30,000+).
However, when I copy the T-SQL from Profiler and run it directly in a query window, it takes only around 4'000 ms.
Why is it so?
The venom in this query is in the first words: IEnumerable<application>. If you replace that with var (i.e. IQueryable), the query is translated into SQL up to and including the last Count(). This will take considerably less time, because the amount of transported data is reduced to almost nothing.
Further, as bobek already mentioned, you don't need the Includes, as you're only counting context.applications items.
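A minimal sketch of the combined fix, reusing the entity and property names from the question (the Includes are dropped because eagerly loaded rows are irrelevant to a count):

var _entityList = context.applications
    .Where(e => e.indb_generalInfo != null);

if (isIFL != null)
    _entityList = _entityList.Where(e => e.app_isIFL == isIFL);

// Because _entityList is still an IQueryable, the Count() is translated into
// a single SELECT COUNT(*) on the server instead of materializing all rows.
int _entityCount = _entityList.Count();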
Apart from that, you will always notice some overhead when using an ORM like Entity Framework.
That's because EF needs to translate your code into T-SQL first, which is costly as well. Look at this link: http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/ It shows how to compile your LINQ queries, which should help with speed. Also, do you really need that many tables for this query? Maybe you can find a way to filter things down and only pull out what you need.
EF definitely has a cost in terms of performance, but it also provides the flexibility to use stored procedures for complex T-SQL. In my opinion, though, that should be your last resort.
In case you are interested in performance and EF:
http://msdn.microsoft.com/en-us/data/hh949853.aspx
However...
EF Query in SQL Profiler it reveals that it took around 221'095 ms to execute.
then..
copy the T-SQL from Profiler and run it directly from Query window
Where the SQL came from is irrelevant.
Q1 took x milliseconds, based on SQL Profiler info.
The exact same query Q1' takes less, again based on SQL Profiler. This means the source of the SQL isn't the issue; it implies environmental factors are involved.
The most obvious explanation: SQL Server has buffered many data pages and can serve the second, identical request much faster.
I have a problem. My LINQ to SQL queries are pushing data to the database at ~1000 rows per second, but this is much too slow for me. The objects are not complicated. CPU usage is <10% and bandwidth is not the bottleneck either.
The 10% is on the client; the server sits at 0%, or at most 1%, generally doing no work at all, not traversing indexes, etc.
Why is 1000/s slow? I need something around 20000/s - 200000/s to solve my problem; otherwise I will receive more data than I can process.
I don't use a transaction, but LINQ does: when I post, for example, a million new objects to the DataContext and run SubmitChanges(), the inserts happen inside LINQ's internal transaction.
I don't use parallel LINQ and I don't have many selects; in this scenario I'm mostly inserting objects, and I want to use all the resources I have, not only 5% of the CPU and 10 kb/s of the network!
when I post, for example, a million new objects
Forget it. Linq2sql is not intended for such large batch updates/inserts.
The problem is that Linq2sql will execute a separate INSERT (or UPDATE) statement for each insert (update). This kind of behaviour is not suitable for such large numbers.
For inserts you should look into SqlBulkCopy, because it is a lot faster (really, orders of magnitude faster).
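To make that concrete, here is a minimal SqlBulkCopy sketch; the table name, columns and connectionString are illustrative assumptions, not taken from the question:

using System;
using System.Data;
using System.Data.SqlClient;

// Build the rows in memory first (cheap)...
DataTable table = new DataTable();
table.Columns.Add("Value", typeof(double));
table.Columns.Add("Timestamp", typeof(DateTime));
for (int i = 0; i < 1000000; i++)
    table.Rows.Add(i * 0.5, DateTime.UtcNow);

// ...then stream them to the server in a single bulk operation,
// instead of a million individual INSERT statements.
using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Measurements";
    bulk.BatchSize = 10000; // commit in batches rather than one giant transaction
    bulk.WriteToServer(table);
}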
Some performance optimization can be achieved with LINQ to SQL by using precompiled queries; a large part of the cost is compiling the actual query.
http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
http://msdn.microsoft.com/en-us/library/bb399335.aspx
You can also disable object tracking, which may shave off some milliseconds. This is done on the DataContext right after you instantiate it.
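A hedged sketch of both techniques; MyDataContext, Order and CustomerId are illustrative names, not from the question:

using System;
using System.Data.Linq;
using System.Linq;

// Compile the query once; reusing the delegate skips the per-call translation cost.
static readonly Func<MyDataContext, int, IQueryable<Order>> OrdersByCustomer =
    CompiledQuery.Compile((MyDataContext db, int customerId) =>
        db.Orders.Where(o => o.CustomerId == customerId));

using (MyDataContext db = new MyDataContext(connectionString))
{
    db.ObjectTrackingEnabled = false; // read-only context: skip change tracking
    var orders = OrdersByCustomer(db, 42).ToList();
}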
I also encountered this problem before. The solution I used is Entity Framework. There is a tutorial here. One traditional way is to use LINQ to Entities, which has similar syntax and seamless integration with C# objects. This way gave me roughly a 10x speedup in my experience. But a more efficient (by orders of magnitude) way is to write the SQL statement yourself, and then use the ExecuteStoreQuery function to fetch the results. It requires you to write SQL rather than LINQ statements, but the returned results can still be read by C# easily.
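For example (the SQL text, the ApplicationSummary class and the parameter are assumptions for illustration):

// ObjectContext.ExecuteStoreQuery maps the raw SQL result onto a plain C# class.
var rows = context.ExecuteStoreQuery<ApplicationSummary>(
    "SELECT app_id, app_name FROM applications WHERE app_isIFL = {0}", isIFL)
    .ToList();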
I use stored procedures to fetch data from and apply modifications to the database.
If I use LINQ for the same job, will there be any performance issue? I mean, will I get better performance or not?
I saw that LINQ syntax is a bit complicated, so is there any tool that can help me generate LINQ syntax?
thanks
There is a slight performance impact to using Linq2Sql vs. calling stored procedures: for one, the stored procedure can be compiled and its plan cached so subsequent calls are quicker, and you have more control and options in T-SQL land vs. having Linq2Sql generate the statements for you.
However, unless your site is getting heavy traffic, there's no reason not to use it if you are looking for an easy way to get your site set up.
In general, LINQ will not beat the performance of a stored procedure. After all, Linq2Sql generates SQL, and that will never be better than a well-written stored procedure.
Linq2sql, if not used properly, can harm performance. That is often caused by developers who do not understand the deferred loading principle, which can lead to a LOT of queries.
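A sketch of the classic trap, using illustrative Customer/Order LINQ-to-SQL entities:

// Without eager loading, each access to customer.Orders fires a separate query:
// 1 query for the customers plus N queries for the orders.
foreach (Customer customer in db.Customers)
    Console.WriteLine(customer.Orders.Count);

// The fix, on a fresh DataContext: load the orders together with the customers.
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(c => c.Orders);
freshDb.LoadOptions = options; // must be set before the first query on the context
foreach (Customer customer in freshDb.Customers)
    Console.WriteLine(customer.Orders.Count); // no extra round trips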
Where it WILL create performance issues is with bulk operations. That is an area where you do not want to use Linq2Sql; you need SqlBulkCopy or stored procedures instead.
Nevertheless, I use it a lot and am happy in general.
LinqPad is a free program and can be a way to learn LINQ. It comes loaded with 500 examples.
An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by a desire for database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore that the database can handle far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
In many data-in/data-out cases, no.
In some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc.), then LINQ and its friends are excellent choices; they reduce the pain (and effort and risk) of the majority of queries. But there are a number of scenarios where it doesn't work so well:
Large-scale set-based operations: in TSQL I can perform a wide-ranging operation without the need to drag the data over the network in one large query and then (even worse) update each record individually (in many cases the ORM tools will choose individual UPDATE/INSERT/DELETE operations, etc., as the sketch after this list shows). Not only is this slow, it increases the chances of data drift. To counter that you might add a transaction, but a long-lived transaction (while you suck a glut of data over the network) is bad.
Simply: there are a lot of queries where hand-tuning achieves things that the ORMs simply can't. I had a scenario recently where a relatively basic LINQ query was performing badly; I hand-tuned it (using some ROW_NUMBER() etc.) and the IO stats went down to only 5% of what they were with the generated query.
There are some queries that are exceptionally difficult to express in some query syntax options, and even if you do, they lead to bad queries; yet they can be expressed very elegantly in TSQL. Example: Linq to Sql: select query with a custom order by.
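A sketch of the first point (Products, Price and CategoryId are assumed names), comparing what an ORM typically emits with one set-based statement:

// ORM style: drag every matching row to the client, mutate it, and let
// SubmitChanges() emit one UPDATE statement per object.
foreach (Product p in db.Products.Where(x => x.CategoryId == 5))
    p.Price *= 1.1m;
db.SubmitChanges();

// Set-based style: a single statement executed on the server, no data
// dragged over the network.
db.ExecuteCommand("UPDATE Products SET Price = Price * 1.1 WHERE CategoryId = {0}", 5);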
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: load the least amount of data you need, in the smallest number of queries.
Maintainability: load as much as possible (let's say, as much as makes sense) with the simplest, most reusable kind of query, and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change this situation much. Even if you use NHibernate's HQL, you still weigh performance against maintainability. You have already taken a step toward maintainability, but you may fall back to SQL to tune some queries.
For me, the deciding factor between writing one giant SQL query or a bunch of simple queries and then doing everything in code is usually performance. The latter is preferred, but if it ends up way too slow, I'll do the former (SQL is optimized for data processing, after all).
The reason I prefer the latter is that, in general, my team is more comfortable with code than with SQL queries. I like SQL a lot, but if a giant SQL query means I'm the only one who can debug/understand it in a reasonable amount of time, that's not a good thing. Another reason is that a giant query usually ends up containing some business logic. If I have a business layer, I prefer to have as much of my business logic there as possible.
Of course, you could decide to put all your business logic in stored procedures; your program is then nothing more than a GUI interface to the API of your database. It depends on the requirements of your project and whether your team can handle this.
That said, you give LINQ as an alternative technology. I have noticed in my team that, thanks to my experience with SQL, I'm very comfortable with LINQ while my colleagues are not. The problem on a deeper level is procedural vs. set-based thinking. LINQ is comparable to SQL here: if you are not comfortable with SQL, chances are you won't be with LINQ either.
I made a fairly large social network type website and used nothing but inline SQL Statements to access the database (I was new to the language so back off!)
Are there any performance issues when doing it this way as opposed to using a massive XSD DataSet file to handle all the queries? Or is this just bad design?
Thanks!
When you reach real DB performance issues it won't really matter (performance wise) whether you're using stored procedures or direct SQL statements.
Your best bet in that situation is to avoid hitting the DB in the first place. In other words, it would be better to plan and architect a good caching mechanism, because that will make all the difference when it really comes to serious traffic.
Stored procedures or inline code: again, performance-wise (I'm not talking about maintainability, security, ...), it simply doesn't matter that much anymore.
I think the maintenance issue/cost will have much more impact; it will be much greater than the performance impact (if there is any performance impact at all).
If it is SQL Server, changing your inline SQL statements into parametrised calls to sp_ExecuteSQL should yield a very significant performance improvement, and would probably be an easier refactor than moving to, say, stored procedures instead of inline code.
IME, ultimately, stored procedures that return multiple recordsets (i.e. that do several pieces of work/logic rather than just replacing single inline queries) yield more performance improvement.
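A hedged illustration of a parameterised call (table and variable names are assumed); ADO.NET sends a parameterised SqlCommand to SQL Server as sp_executesql, so the plan is cached and reused across different parameter values:

using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(
    "SELECT UserId, Username FROM Users WHERE Username = @name", conn))
{
    cmd.Parameters.AddWithValue("@name", userName); // value varies, plan is reused
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader["UserId"]);
    }
}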
Strictly speaking, SQL performance will be better if you use inline SQL statements rather than an XSD DataSet. This assumes that, at a minimum, you use parameterized queries (although SQL Server 2005 even optimizes non-parameterized queries).
The statement that inline sql cannot be optimized by the database engine isn't true. Stored procedures are optimized identically to inline sql with SQL Server.
But whether it makes sense from a design perspective is another story entirely. And whether you are over-optimizing too early is another question.
Stay away from generated typed datasets. The performance isn't that good. If you need the type of functionality, look at LINQ; otherwise just grab the Enterprise Library and go direct.
With large amounts of data, a DB is better in terms of maintenance and performance. Having stored procedures, for example, makes it possible to delegate the work and query optimizations to a professional DBA, with no recompiles needed. With XSD/XML data files, you have to regenerate your code for each change.
Also consider concurrency: more users, more web requests, more concurrency... you can end up with locks on the data file. You could implement advanced caching mechanisms to gain more performance, but that's extra maintenance and complexity too.