We (my team) are about to start development of a mission-critical project. One of its sub-systems is a Windows service, which will be the backbone of the entire system and has to respond to mission-critical standards.
This service will contain many lists to minimize database interaction and to gain performance. I am estimating the average size of a list to be 250,00 under normal circumstances.
Is it a good idea to use LINQ to query data from these queues, or should I follow my original plan of creating an indexed list?
The indexed list is a custom implementation of IDictionary that will work like an index and will have many performance-oriented features such as MRU, index rebuilding, etc.
Before rolling your own solution, you might want to look at i4o, an indexed LINQ to Objects provider.
Otherwise, it sounds like applying your own indexing may well be worthwhile - but do performance tests first. "Mission critical" is often about reliability more than performance - if LINQ to Objects performs well enough, why not use it?
Even if you do end up writing your own collections, you should consider making them "LINQ queryable" in some fashion (which will depend on the exact nature of the collections)... it's nice to be able to write LINQ queries, even if it's not against vanilla LINQ to Objects.
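To make the "do performance tests first" advice concrete, here is a minimal sketch of the two approaches being compared: a linear scan (roughly what an unindexed LINQ to Objects `Where` does) versus a hash-based index built once up front. It is written in Python for brevity; the `Order` type, the field names and the collection size are all made up for illustration, and the same idea maps onto a `Dictionary<TKey, List<T>>` in C#.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:  # hypothetical record type, for illustration only
    order_id: int
    customer: str
    amount: float


def linear_lookup(orders, customer):
    # Visits every element on every query: O(n) per lookup.
    return [o for o in orders if o.customer == customer]


def build_index(orders):
    # One O(n) pass to build; afterwards each lookup is a single hash probe.
    index = defaultdict(list)
    for o in orders:
        index[o.customer].append(o)
    return index


# Arbitrary size, chosen only to have something to query.
orders = [Order(i, f"cust{i % 1000}", float(i)) for i in range(10_000)]
index = build_index(orders)

scan_result = linear_lookup(orders, "cust42")
index_result = index.get("cust42", [])
```

Both lookups return the same orders. The index only pays off if the same key is queried often enough to amortize the build cost and the upkeep on every insert or delete, which is exactly what a measurement on real data should tell you.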
Related
An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by desiring database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore what the database can handle in far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
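A small sketch of the "grab everything, then throw most of it away" pattern versus letting the server do the selection. It uses Python's built-in `sqlite3` with an in-memory database purely so that it runs anywhere; the `orders` table and its columns are invented for the example, and the row counts are much smaller than the million-row case described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (id, region, total) VALUES (?, ?, ?)",
    [(i, "EU" if i % 100 == 0 else "US", float(i)) for i in range(1, 10_001)],
)

# Anti-pattern: pull every row over the wire, then discard most in application code.
all_rows = conn.execute("SELECT id, region, total FROM orders").fetchall()
client_side = [row for row in all_rows if row[1] == "EU"]

# Set-based: the database selects only the interesting rows.
server_side = conn.execute(
    "SELECT id, region, total FROM orders WHERE region = ? ORDER BY id", ("EU",)
).fetchall()
```

The two results are identical, but the first version transfers 10,000 rows to keep 100 of them. Against a real server the difference shows up as network traffic and memory, not just CPU.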
In many data-in/data-out cases, no.
In some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc), then LINQ and its friends are excellent choices - they reduce the pain (and effort and risk) from the majority of queries. But there are a number of scenarios where it doesn't work so well:
large-scale set-based operations: I can do a wide-ranging update in TSQL in one statement, without the need to drag that data over the network in one large query and then (even worse) update each record individually (since in many cases the ORM tools will choose individual UPDATE/INSERT/DELETE operations etc). Not only is this slow, it increases the chances of data drift. So to counter that you might add a transaction - but a long-lived transaction (while you suck a glut of data over the network) is bad.
simply: there are a lot of queries where hand-tuning achieves things that the ORMs simply can't; I had a scenario recently where a relatively basic LINQ query was performing badly. I hand-tuned it (using some ROW_NUMBER() etc) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some query syntax options and that, even if you manage it, lead to bad queries, yet can be expressed very elegantly in TSQL. Example: Linq to Sql: select query with a custom order by
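The first point in the list above (one set-based statement versus one UPDATE per record) can be sketched like this. It is an illustration only: it uses SQLite through Python's standard `sqlite3` module rather than SQL Server, and the `products` table and function names are invented. The row-by-row function imitates what a naive load-modify-save loop amounts to; it is not a claim about what any particular ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (id, category, price) VALUES (?, ?, ?)",
    [(i, "book" if i % 2 == 0 else "toy", 10.0) for i in range(1, 1001)],
)


def raise_prices_row_by_row(conn):
    # Load the rows, change them in application code, write each one back.
    rows = conn.execute(
        "SELECT id, price FROM products WHERE category = 'book'"
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?", (price * 1.5, product_id)
        )
    return len(rows)  # number of UPDATE statements issued


def raise_prices_set_based(conn):
    # One statement; the data never leaves the database.
    cur = conn.execute("UPDATE products SET price = price * 1.5 WHERE category = 'toy'")
    return cur.rowcount  # number of rows changed by that single statement


statements_row_by_row = raise_prices_row_by_row(conn)
rows_touched_set_based = raise_prices_set_based(conn)
conn.commit()
```

Both functions change 500 rows, but the first needs one SELECT plus 500 UPDATE round trips, and the rows can change underneath you between the read and the writes unless you hold a transaction open for the whole loop.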
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (let's say, as much as makes sense) with the simplest, most reusable kind of query and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You already went a step to maintainability, but you may fall back to SQL to tune some queries.
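To illustrate the trade-off between the two interests, here is a rough sketch that answers the same question twice: once with small reusable queries (one per parent row) and once with a single join. It uses Python's standard `sqlite3` module with an in-memory database; the `authors`/`books` schema and the function names are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(i, f"author{i}") for i in range(1, 51)])
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(i, (i % 50) + 1, f"book{i}") for i in range(1, 201)],
)


def titles_simple_queries(conn):
    # Maintainable: two tiny reusable queries, but 1 + N round trips.
    queries = 1
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ?", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = sorted(title for (title,) in rows)
    return result, queries


def titles_single_join(conn):
    # Performance-oriented: one round trip, grouping done in memory afterwards.
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
    ).fetchall()
    grouped = {}
    for name, title in rows:
        grouped.setdefault(name, []).append(title)
    return {name: sorted(titles) for name, titles in grouped.items()}, 1


simple_result, simple_queries = titles_simple_queries(conn)
join_result, join_queries = titles_single_join(conn)
```

The results are the same; the first version issues 51 queries and the second issues one. Note that the inner join would silently drop authors without books, which is the kind of detail that makes the "faster" query a little harder to maintain.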
For me, the deciding factor between writing a giant sql query or a bunch of simple queries and then do everything in the code is usually performance. The latter is preferred but if it goes way too slow, I'll do the former (Sql is optimized for data processing after all).
The reason I prefer the latter is that, in general, my team is more comfortable with code than with SQL queries. I like SQL a lot, but if a giant SQL query means that I'm the only one who can debug/understand it in a reasonable amount of time, that's not a good thing. Another reason is that with a giant query, you will usually program some business logic into it. If I have a business layer, I prefer to have as much of my business logic there as possible.
Of course, you could decide to put all your business logic in stored procedures. Your program is then nothing more than a GUI on top of the API of your database. It depends on the requirements of your project and on whether your team can handle this.
That said, you give Linq as an alternative technology. I have noticed in my team that thanks to my experience with SQL, I'm very comfortable with Linq while my colleagues are not. The problem on a deeper level is procedural vs set based thinking. Linq is comparable to sql. If you are not comfortable with SQL, chances are you won't be with Linq.
I would like to know what the best method is for developing a multi-user C# app using SQL Server 2005 as the database. This is what I have in mind:
using NHibernate or Telerik's OpenAccess ORM.
LINQ
using wrappers. All data from the tables is loaded into corresponding objects (at startup), and from that point on only delete and update transactions affect the database.
...
I've looked at ORM tools, but in my opinion they generate a lot of code and I do not know if it's necessary.
What is the best solution having in mind future changes in the application?
If I choose the 3rd option, how can I ensure that only one user modifies a row in a table (how can I lock a table row which is under modification)?
Any suggestions or reading material will help!
Thanks!
There are hundreds of ways to solve this, but don't discount ORM. Microsoft's Entity Framework is getting better with every revision. The framework 4.0 bits are pretty good and play extremely well with LINQ.
As for generated code vs your own, try something like Entity Spaces... You have complete control over how the code gets generated and the data access layer is extremely powerful and flexible (not to mention very easy to use). It also plays nicely with LINQ.
I have written a lot of data access code over the years. In the beginning, the ORM tools were rough around the edges and left a lot to be desired. These tools have gone through many iterations since and have become indispensable in my opinion. I can't imagine writing routine after routine that does the same basic CRUD. I did that for years and spent lots of time correcting hardcoded SQL and vow to avoid it at all costs from here on out.
As for concurrency / locking issues, that's a question unto itself. There are many ways to provide locking (the major categories being optimistic and pessimistic). Each has its pros and cons.
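For the optimistic category, one common approach is a version column that every UPDATE checks and increments. The following is only a sketch of that idea, using SQLite via Python's standard `sqlite3` module so it runs without a server; the `accounts` table, the `version` column and the helper names are invented for illustration. (SQL Server offers a `rowversion` column type that can play the role of the hand-maintained counter used here.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts (id, owner, version) VALUES (1, 'alice', 1)")
conn.commit()


def read_account(conn, account_id):
    return conn.execute(
        "SELECT id, owner, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def save_owner(conn, account_id, new_owner, expected_version):
    # The UPDATE only matches if nobody has changed the row since it was read.
    cur = conn.execute(
        "UPDATE accounts SET owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_owner, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means the caller holds stale data


# Two users read the same row; both see version 1.
_, _, version_seen_by_a = read_account(conn, 1)
_, _, version_seen_by_b = read_account(conn, 1)

first_save = save_owner(conn, 1, "bob", version_seen_by_a)     # matches version 1
second_save = save_owner(conn, 1, "carol", version_seen_by_b)  # version is now 2
```

The second save is rejected instead of silently overwriting the first, and the application decides whether to reload, merge or report a conflict. Pessimistic locking avoids the conflict up front by blocking the second user, at the cost of holding locks while a person is thinking.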
If it's multiuser do NOT do #3. The purpose of a DBMS is to handle the multi-user aspects for you. Everything from transactions to access rights is built right in. Going down the path of mimicking that in your code will be difficult to get right. In the past some "engines" like Borland's BDE and MS Access did this. The end result is that you end up dealing with little things like data corruption and consistency errors.
Never mind that as your database grows, the application is going to take longer and longer to start.
We typically stay away from ORM tools for a number of reasons, mostly feature / benefit / security concerns. Of course, we are extremely well versed in SQL and can take advantage of the specific features a given db server can offer, which most ORMs can't do. We also tend to tweak the queries based on performance metrics after product release, which would force a recompile of an app for most ORMs. By staying away from this, we can let production DBAs do their job. That may or may not be a concern of yours.
That said a lot of dev teams both like and successfully use the ones you spoke about. I would say to skip Linq-to-SQL in favor of Entity Framework if you're going that route. Linq-to-SQL has all but been replaced by EF.
Save yourself a load of effort and time and use an ORM. In terms of helping you decide which one, there is loads of information/opinion on the web (and StackOverflow!) about which one to use but that'll depend on what your application requirements are (which you haven't described).
I like Linq-to-SQL for small/mid sized apps. It's quick and easy and almost efficient. For bigger apps it'll depend on what types of data transformations and design you have in mind but Linq-to-Entities or nHibernate are probably the most appropriate.
I want to start working on a big project. I have been researching the performance of LINQ to EF and NHibernate, and I want to use one of them as the ORM in my project. My question is: which of these two ORMs will give me better performance? I will use SQL Server 2008 as the database and C# as the programming language.
Neither one will have "better performance."
When analyzing performance, you need to look at the limiting factor. The limiting factor in this case will not be the ORM you choose, but rather how you use that tool, how you write your queries, and how you optimize the database backend.
Therefore, the "fastest" ORM will be the one which you can use correctly, coupled with the database server you best understand.
The ORM itself does have a certain amount of overhead, so the "fastest", in terms of sheer performance, is to use none at all. However, this favors the computer's time over your development time, which is typically not a good trade-off. ORMs can save large amounts of your development time, while imposing only a small overhead when used correctly.
Typically when people experience performance problems when using an ORM it is because they are using the ORM incorrectly, rather than because they picked the "wrong" ORM.
We're currently using Fluent NHibernate on one of our projects (with web services, so that adds additional time lag) and as far as I can see, data access is pretty much instantaneous (from a human perspective).
Maybe someone can provide answer with concrete numbers though.
Since these two ORMs are somewhat different, it'd be better to decide on which one to use with regard to your specific needs, rather than performance (which, like I said, shouldn't be a big deal).
Here's a nice benchmark. As you can see results depend on whether you are doing SELECT, UPDATE, DELETE.
I have a question for anyone who has experience with i4o or PLINQ. I have a big object collection (about 400K objects) that I need to query. The logic is very simple and straightforward. For example, given a collection of Person objects, I need to find the persons that match on the same firstName, lastName, datebirth, or the first initial of firstName/lastName, etc. With LINQ to Objects this is simply a time-consuming process.
I am wondering if i4o (http://www.codeplex.com/i4o) or PLINQ can help improve the query performance. Which one is better? And is there any other approach out there?
Thanks!
With 400k objects, I wonder whether a database (either in-process or out-of-process) wouldn't be a more appropriate answer. This then abstracts the index creation process. In particular, any database will support multiple different indexes over different column(s), making the queries cited all very supportable without having to code specifically for each (just let the query optimizer worry about it).
Working with it in-memory may be valid, but you might (with vanilla .NET) have to do a lot more manual index management. By the sounds of it, i4o would certainly be worth investigating, but I don't have any existing comparison data.
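As a rough idea of what "manual index management" looks like for the matching described in the question, here is a sketch that builds two hash indexes over the same collection, one per kind of match. It is Python rather than .NET, it is not how i4o works internally, and the `Person` fields and sample data are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:  # hypothetical shape of the objects in the question
    first_name: str
    last_name: str
    birth_date: date


def build_index(people, key):
    # Generic helper: group people by whatever key function is supplied.
    index = defaultdict(list)
    for p in people:
        index[key(p)].append(p)
    return index


people = [
    Person("Ann", "Lee", date(1980, 1, 1)),
    Person("Ann", "Lee", date(1980, 1, 1)),
    Person("Adam", "Long", date(1975, 5, 20)),
    Person("Bob", "Ray", date(1990, 3, 15)),
]

# One index per matching rule; each costs a single pass over the data.
by_full_key = build_index(people, lambda p: (p.first_name, p.last_name, p.birth_date))
# Assumes names are non-empty; an empty string would raise IndexError here.
by_initials = build_index(people, lambda p: (p.first_name[0], p.last_name[0]))

exact_duplicates = [group for group in by_full_key.values() if len(group) > 1]
same_initials = by_initials[("A", "L")]
```

Every additional matching rule means another index to build and to keep in sync when the collection changes, which is the bookkeeping a database (or an indexing library) would take off your hands.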
i4o: is meant to speed up querying with LINQ by using indexes, as relational databases have long done.
PLINQ: is meant to use extra CPU cores to process the query in parallel.
If performance is your target, depending on your hardware, I say go with i4o; it will make a hell of an improvement.
I haven't used i4o but I have used PLINQ.
Without knowing the specifics of the query you're trying to improve, it's hard to say which (if any) will help.
PLINQ allows for multiprocessing of queries, where it's applicable. There are times, however, when parallel processing won't help.
i4o looks like it helps with indexing, which will speed up some calls, but not others.
Bottom line is, it depends on the query being run.
I've been taking a look at some different products for .NET which propose to speed up development time by providing a way for business objects to map seamlessly to an automatically generated database. I've never had a problem writing a data access layer, but I'm wondering if this type of product will really save the time it claims. I also worry that I will be giving up too much control over the database and make it harder to track down any data-level problems. Do these types of products make it better or worse in the already tough case that the database and business object structure must change?
For example:
Object Relation Mapping from Dev Express
In essence, is it worth it? Will I save "THAT" much time, effort, and future bugs?
I have used SubSonic and EntitySpaces. Once you get the hang of them, I believe they can save you time, but as the complexity of your app and the volume of data grow, you may outgrow these tools. You start to lose time trying to figure out if something like a performance issue is related to the ORM or to your code. So, to answer your question, I think it depends. I tend to agree with Eric on this: high-volume enterprise apps are not a good place for general-purpose ORMs, but in standard-fare smaller CRUD-type apps, you might see some saved time.
I've found iBatis from the Apache group to be an excellent solution to this problem. My team is currently using iBatis to map all of our calls from Java to our MySQL backend. It's been a huge benefit as it's easy to manage all of our SQL queries and procedures because they're all located in XML files, not in our code. Separating SQL from your code, no matter what the language, is a great help.
Additionally, iBatis allows you to write your own data mappers to map data to and from your objects to the DB. We wanted this flexibility, as opposed to a Hibernate type solution that does everything for you, but also (IMO) limits your ability to perform complex queries.
There is a .NET version of iBatis as well.
I've recently set up ActiveRecord from the Castle Project for an app. It was pretty easy to get going. After creating a new app with it, I even used MyGeneration to script out class files for a legacy app that ActiveRecord could use in a pretty short time. It uses NHibernate to interact with the database, but takes away all the xml mapping that comes with NHibernate. The nice thing is though, if necessary, you already have NHibernate in your project, you can use its full power if you have some special cases. I'd suggest taking a look at it.
There are lots of choices of ORMs. Linq to Sql, nHibernate. For pure object databases there is db4o.
It depends on the application, but for a high volume enterprise application, I would not go this route. You need more control of your data.
I was discussing this with a friend over the weekend and it seems like the gains you make on ease of storage are lost if you need to be able to query the database outside of the application. My understanding is that these databases work by storing your object data in a de-normalized fashion. This makes it fast to retrieve entire sets of objects, but if you need to select data from a perspective that doesn't match your object model, the odbms might have a hard time getting at the particular data you want.