I have a question for anyone who has experience with i4o or PLINQ. I have a large object collection (about 400K items) that I need to query. The logic is very simple and straightforward. For example, given a collection of Person objects, I need to find the persons that match on firstName, lastName, dateOfBirth, or the first initial of firstName/lastName, etc. It is just a time-consuming process using LINQ to Objects.
I am wondering if i4o (http://www.codeplex.com/i4o)
or PLINQ can help improve the query performance. Which one is better? And is there any other approach out there?
Thanks!
With 400k objects, I wonder whether a database (either in-process or out-of-process) wouldn't be a more appropriate answer. This then abstracts the index creation process. In particular, any database will support multiple different indexes over different column(s), making the queries cited all very supportable without having to code specifically for each (just let the query optimizer worry about it).
Working with it in-memory may be valid, but you might (with vanilla .NET) have to do a lot more manual index management. By the sounds of it, i4o would certainly be worth investigating, but I don't have any existing comparison data.
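To make the "manual index management" idea concrete, here is a minimal sketch, assuming a hypothetical Person class along the lines of the question, of building a one-off in-memory index with plain LINQ to Objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person class, for illustration only.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime DateOfBirth { get; set; }
}

public static class PersonIndexDemo
{
    public static void Run(List<Person> people) // assume ~400K items loaded elsewhere
    {
        // Plain LINQ to Objects scans all 400K items on every call:
        var scanned = people.Where(p => p.LastName == "Smith").ToList();

        // "Manual index": build a lookup once (O(n)); after that, each query
        // by last name is a hash lookup plus the matching items.
        ILookup<string, Person> byLastName = people.ToLookup(p => p.LastName);
        var indexed = byLastName["Smith"].ToList();

        Console.WriteLine("{0} scanned, {1} indexed", scanned.Count, indexed.Count);
    }
}
```

The catch, as noted above, is that every extra search key (first name, date of birth, first initial, ...) means another lookup to build and keep in sync by hand.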
i4o: is meant to speed up querying with LINQ by using indexes, as in the old relational-database days.
PLINQ: is meant to use extra CPU cores to process the query in parallel.
If performance is your target, then depending on your hardware I'd say go with i4o; it will make a huge improvement.
I haven't used i4o but I have used PLINQ.
Without knowing the specifics of the query you're trying to improve, it's hard to say which (if any) will help.
PLINQ allows for multiprocessing of queries, where it's applicable. There are times, however, when parallel processing won't help.
i4o looks like it helps with indexing, which will speed up some calls, but not others.
Bottom line is, it depends on the query being run.
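For comparison, a PLINQ version of this kind of query is just an AsParallel() away. This is a sketch reusing the hypothetical Person type from above; whether it actually helps depends on the cost of the predicate and the number of cores:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PersonQueries
{
    public static List<Person> FindMatches(List<Person> people)
    {
        // AsParallel() partitions the scan across the available cores. That helps
        // CPU-bound predicates over large collections, but it still touches every
        // item -- unlike an index, it does not reduce the amount of work done.
        return people
            .AsParallel()
            .Where(p => p.LastName == "Smith" && p.FirstName.StartsWith("J"))
            .ToList();
    }
}
```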
I have a problem. My LINQ to SQL queries are pushing data to the database at ~1000 rows per second, but this is much too slow for me. The objects are not complicated. CPU usage is <10% and bandwidth is not the bottleneck either.
The 10% is on the client; the server sits at 0%, or 1% at most, generally doing no work at all, not traversing indexes, etc.
Why is 1000/s slow? I need somewhere around 20,000/s to 200,000/s to solve my problem; otherwise I will receive more data than I can process.
I don't use a transaction myself, but LINQ does: when I post, for example, a million new objects to the DataContext and call SubmitChanges(), the inserts run inside LINQ's internal transaction.
I don't use parallel LINQ, and I don't have many selects; in this scenario I'm mostly inserting objects, and I want to use all the resources I have, not just 5% of the CPU and 10 KB/s of the network!
"when I post, for example, a million new objects"
Forget it. Linq2sql is not intended for such large batch updates/inserts.
The problem is that Linq2sql will execute a separate insert (or update) statement for each insert (update). This kind of behaviour is not suitable for such large numbers.
For inserts you should look into SqlBulkCopy, because it is a lot faster (really, orders of magnitude faster).
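As a rough sketch of what that looks like (the entity, table, and column names here are made up; map them to your real schema), you stage the rows in a DataTable and hand the whole thing to SqlBulkCopy in one shot:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical entity and table names, for illustration only.
public class Measurement
{
    public int SensorId { get; set; }
    public double Value { get; set; }
    public DateTime TakenAt { get; set; }
}

public static class BulkLoader
{
    public static void BulkInsert(IEnumerable<Measurement> rows, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("SensorId", typeof(int));
        table.Columns.Add("Value", typeof(double));
        table.Columns.Add("TakenAt", typeof(DateTime));

        foreach (var m in rows)
            table.Rows.Add(m.SensorId, m.Value, m.TakenAt);

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Measurements";
            bulk.BatchSize = 10000;        // tune for your environment
            bulk.WriteToServer(table);     // one bulk operation instead of one INSERT per row
        }
    }
}
```

This is the approach that gets you from ~1000 rows/s into the range the question is asking for, because it avoids the per-row INSERT round-trips entirely.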
Some performance optimization can be achieved with LINQ to SQL by using, first off, precompiled queries. A large part of the cost is compiling the actual query.
http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
http://msdn.microsoft.com/en-us/library/bb399335.aspx
Also, you can disable object tracking, which may give you milliseconds of improvement. This is done on the DataContext right after you instantiate it.
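A minimal sketch of both suggestions together; the MyDataContext and Customer names are placeholders, not a real mapping:

```csharp
using System;
using System.Data.Linq;
using System.Linq;

public static class QueryCache
{
    // Precompiled query: the LINQ expression is translated to SQL once,
    // not on every execution.
    public static readonly Func<MyDataContext, string, IQueryable<Customer>> CustomersByCity =
        CompiledQuery.Compile((MyDataContext db, string city) =>
            db.Customers.Where(c => c.City == city));

    public static void Example(string connectionString)
    {
        using (var db = new MyDataContext(connectionString))
        {
            // Read-only work: turn off change tracking right after creating the context.
            db.ObjectTrackingEnabled = false;

            var londoners = CustomersByCity(db, "London").ToList();
        }
    }
}
```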
I also encountered this problem before. The solution I used is Entity Framework. There is a tutorial here. One traditional way is to use LINQ to Entities, which has similar syntax and seamless integration with C# objects. This way gave me roughly a 10x speedup in my experience. But an even more efficient way (by an order of magnitude) is to write the SQL statement yourself and then use the ExecuteStoreQuery function to fetch the results. It requires you to write SQL rather than LINQ statements, but the returned results can still be read by C# easily.
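If you go the hand-written SQL route, the pattern looks roughly like this; the MyEntities context, the dbo.People table, and the PersonDto type are placeholders for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PeopleQueries
{
    // PersonDto is a plain class whose property names match the selected columns.
    public static List<PersonDto> LoadSmiths()
    {
        using (var ctx = new MyEntities())        // EF ObjectContext generated from the model
        {
            // Hand-written SQL, materialized into plain C# objects by EF.
            return ctx.ExecuteStoreQuery<PersonDto>(
                "SELECT Id, FirstName, LastName FROM dbo.People WHERE LastName = {0}",
                "Smith").ToList();
        }
    }
}
```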
An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by a desire for database independence (at the expense of performance). Otherwise, it's a waste of money and resources to ignore what the database can handle in far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
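To make the contrast concrete, here is a small sketch against a hypothetical LINQ to SQL/EF context (the db.Orders table and Total column are made-up names):

```csharp
// Anti-pattern: drag the whole table to the client, then filter in memory.
var everything = db.Orders.ToList();                                  // e.g. 1,000,000 rows over the wire
var interesting = everything.Where(o => o.Total > 1000m).ToList();    // keeps 10,000, discards the rest

// Set-based: the filter is translated to SQL and runs on the server,
// so only the interesting rows ever leave the database.
var interesting2 = db.Orders.Where(o => o.Total > 1000m).ToList();
```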
In many data-in/data-out cases, no.
In some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc.), then LINQ and its friends are excellent choices - they reduce the pain (and effort and risk) of the majority of queries. But there are a number of scenarios where it doesn't work so well:
large-scale set-based operations: I can do a wide-ranging query in TSQL without the need to drag that data over the network in one large query, and then (even worse) update each record individually (since in many cases the ORM tools will choose individual UPDATE/INSERT/DELETE operations etc). Not only is this slow, it increases the chances of data drift. So to counter that you might add a transaction - but a long-lived transaction (while you suck a glut of data over the network) is bad
simply: there are a lot of queries where hand-tuning achieves things that the ORMs simply can't. I had a scenario recently where a relatively basic LINQ query was performing badly; I hand-tuned it (using some ROW_NUMBER() etc.) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some of the query syntax options, and even if you manage it, they would lead to bad queries. Yet they can be expressed very elegantly in TSQL; example: Linq to Sql: select query with a custom order by
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (let's say, as much as makes sense) with the simplest, most reusable kind of query, and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You have already taken a step toward maintainability, but you may fall back to SQL to tune some queries.
For me, the deciding factor between writing a giant SQL query or a bunch of simple queries and then doing everything in the code is usually performance. The latter is preferred, but if it goes way too slow, I'll do the former (SQL is optimized for data processing, after all).
The reason I prefer the latter is that, in general, my team is more comfortable with code than with SQL queries. I like SQL a lot, but if a giant SQL query means that I'm the only one who can debug/understand it in a reasonable amount of time, that's not a good thing. Another reason is that with a giant query you will usually program some business logic into it. If I have a business layer, I prefer to have as much of my business logic there as possible.
Of course, you could decide to stuff all your business logic into stored procedures. Your program is then nothing more than a GUI interface to the API of your database. It depends on the requirements of your project and whether your team can handle this.
That said, you give Linq as an alternative technology. I have noticed in my team that thanks to my experience with SQL, I'm very comfortable with Linq while my colleagues are not. The problem on a deeper level is procedural vs set based thinking. Linq is comparable to sql. If you are not comfortable with SQL, chances are you won't be with Linq.
We (my team) are about to start development of a mission-critical project; one of the sub-systems of this project is a Windows service. This service will be the backbone of the entire system and has to respond to mission-critical standards.
This service will contain many lists in order to minimize database interaction and to gain performance; I estimate the average size of a list to be 250,00 under normal circumstances.
Is it a good idea to use LINQ to query data from these queues, or should I follow my original plan of creating an indexed list?
Indexed List is a custom implementation of IDictionary, which will work like an index and will have many performance-oriented features such as MRU, index rebuilding, etc.
Before rolling your own solution, you might want to look at i4o, an indexed LINQ to Objects provider.
Otherwise, it sounds like applying your own indexing may well be worthwhile - but do performance tests first. "Mission critical" is often about reliability more than performance - if LINQ to Objects performs well enough, why not use it?
Even if you do end up writing your own collections, you should consider making them "LINQ queryable" in some fashion (which will depend on the exact nature of the collections)... it's nice to be able to write LINQ queries, even if it's not against vanilla LINQ to Objects.
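As an illustration of that last point (this is a hand-rolled sketch, not the i4o API), a custom collection only needs to implement IEnumerable<T> to stay fully LINQ-queryable while serving keyed lookups from its own index:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Sketch of a custom indexed collection that is still "LINQ queryable":
// because it implements IEnumerable<TItem>, every LINQ to Objects operator
// works on it, while lookups by key go through the internal dictionary index.
public class IndexedList<TKey, TItem> : IEnumerable<TItem>
{
    private readonly List<TItem> _items = new List<TItem>();
    private readonly Dictionary<TKey, List<TItem>> _index = new Dictionary<TKey, List<TItem>>();
    private readonly Func<TItem, TKey> _keySelector;

    public IndexedList(Func<TItem, TKey> keySelector)
    {
        _keySelector = keySelector;
    }

    public void Add(TItem item)
    {
        _items.Add(item);
        List<TItem> bucket;
        if (!_index.TryGetValue(_keySelector(item), out bucket))
        {
            bucket = new List<TItem>();
            _index[_keySelector(item)] = bucket;
        }
        bucket.Add(item);
    }

    // Keyed lookup: a hash probe instead of a full scan.
    public IEnumerable<TItem> WithKey(TKey key)
    {
        List<TItem> bucket;
        return _index.TryGetValue(key, out bucket) ? bucket : Enumerable.Empty<TItem>();
    }

    public IEnumerator<TItem> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Usage then combines both worlds: `list.WithKey(key).Where(...)` hits the index for the keyed part and runs ordinary LINQ to Objects over the rest.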
I want to start working on a big project and have been researching performance issues with LINQ to Entities (Entity Framework) and NHibernate. I want to use one of them as the ORM in my project. Now my question is: which of these two ORMs will give me better performance? I will use SQL Server 2008 as the database and C# as the programming language.
Neither one will have "better performance."
When analyzing performance, you need to look at the limiting factor. The limiting factor in this case will not be the ORM you choose, but rather how you use that tool, how you write your queries, and how you optimize the database backend.
Therefore, the "fastest" ORM will be the one which you can use correctly, coupled with the database server you best understand.
The ORM itself does have a certain amount of overhead, so the "fastest", in terms of sheer performance, is to use none at all. However, this favors the computer's time over your development time, which is typically not a good trade-off. ORMs can save large amounts of your development time while imposing only a small overhead when used correctly.
Typically when people experience performance problems when using an ORM it is because they are using the ORM incorrectly, rather than because they picked the "wrong" ORM.
We're currently using Fluent NHibernate on one of our projects (with web services, so that adds additional time lag) and, as far as I can see, data access is pretty much instantaneous (from a human perspective).
Maybe someone can provide answer with concrete numbers though.
Since these two ORMs are somewhat different, it'd be better to decide on which one to use with regard to your specific needs, rather than performance (which, like I said, shouldn't be a big deal).
Here's a nice benchmark. As you can see, the results depend on whether you are doing SELECT, UPDATE, or DELETE.
Certainly there is some type of context switching, marshaling, serializing, etc. that takes place when you choose to write a stored procedure in .NET vs. T-SQL.
Are there any hard facts about this overhead as it relates to making a decision about using .NET vs. T-SQL for stored procedures?
What factors do you use in order to make the decision?
For me, developing in C# is 1000% faster than developing in T-SQL due to my learning curve, but when does this gain not offset the performance hit (if there even is one)?
I'd say it depends.
Some things you will find are better using CLR procedures:
Using native .NET objects -- file system, network, etc.
Using features not offered by T-SQL, or not as good as .NET's, e.g. regular expressions (see the sketch after this list)
For yet others you will find T-SQL procedures are better, either in terms of ease of use or raw execution speed:
Using set based logic (where abc in def or abc not in ghi)
CRUD operations
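As a sketch of that first category, this is the canonical kind of CLR function people write to get regular expressions into T-SQL (deployment via CREATE ASSEMBLY/CREATE FUNCTION or a SQL Server project is assumed, and the function name is just an example):

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Exposes .NET regular expressions to T-SQL, which has no native regex support.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull)
            return SqlBoolean.Null;

        return Regex.IsMatch(input.Value, pattern.Value, RegexOptions.None);
    }
}
```

Once deployed, it can be called from T-SQL like any scalar function, e.g. `WHERE dbo.RegexIsMatch(SomeColumn, N'^\d{5}$') = 1`.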
The overhead is irrelevant in relation to switching, marshaling, etc, unless...your data is not already in the database.
That said, I tend to write almost no stored procedures and do as much as possible with T-SQL -- wrapped in an ORM.
Really, that is the answer here. Look into Entity Framework, NHibernate, LINQ to SQL, etc. Get out of writing stored procs (I can hear the downvotes now), and do it with C#.
EDIT: Added more info
Stored procs are hard to test (they are inherently data-bound), they offer no speed benefits over dynamic SQL, and they really are not any more secure. That said, for things like sorting and filtering, a relational database is more efficient than C# will be.
But if you have to use stored procs (and there are times when you do), use T-SQL as much as possible. If you have a substantial amount of logic that needs to be performed and is not set-based, then break out C#.
The one time I did this was to add some extra date processing to my database. The company I was at used a 5-4-4 week-to-month fiscal calendar. That was almost impossible to do in SQL. So I used C# for that one.
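To give a flavour of why that's so much easier in C#, here is a minimal sketch of a 5-4-4 mapping; the fiscal-year start date and the exact week-boundary rules are assumptions for illustration (real 5-4-4 calendars also have occasional 53-week years, which this ignores):

```csharp
using System;

public static class FiscalCalendar
{
    // In a 5-4-4 calendar each quarter is 13 weeks, split into fiscal "months"
    // of 5, 4 and 4 weeks. Maps a calendar date to a fiscal month (1-12),
    // given the first day of the fiscal year.
    public static int FiscalMonth(DateTime date, DateTime fiscalYearStart)
    {
        int week = (int)(date - fiscalYearStart).TotalDays / 7;   // 0-based week of the fiscal year
        int quarter = week / 13;                                   // 0..3
        int weekInQuarter = week % 13;                             // 0..12

        int monthInQuarter =
            weekInQuarter < 5 ? 0 :    // weeks 0-4  -> first month (5 weeks)
            weekInQuarter < 9 ? 1 :    // weeks 5-8  -> second month (4 weeks)
                                2;     // weeks 9-12 -> third month (4 weeks)

        return quarter * 3 + monthInQuarter + 1;
    }
}
```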
It's sort of a religious question, but there can be substantial differences depending on what you're doing, your schema, and how many records you're doing it with. Testing is really the best way to find out for your circumstances.
In general, stored procedures will scale better; but that may not be true in your case, or it may not be an important consideration for you.