Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
EDIT
Actually this question should be more general: how do you rework a database query when LINQ with IQueryable gives errors?
As far as I understand, the correct answer is to do as much of the query as possible at the database level, because in this particular case my complicated query simply cannot be translated from LINQ to SQL.
So I wrote a raw SQL query with the FromSqlRaw() method and the errors went away. Moreover, I wrote the query so that it filters at the database instead of fetching all entries, as opposed to the ToList() approach, so I have fewer doubts about performance (though I did not measure it).
I need some help understanding how to use LINQ when converting a List to an IQueryable.
What I had:
Three tables in the database, with IQueryable-based queries against one of them.
What I need:
To create a LINQ query that combines data from the three tables and returns a specific computed column for every element of one table, with the ability to filter by that column.
What I try:
Extending the IQueryable-based query. But I ran into problems converting a List to an IQueryable: the AsQueryable() method gives errors.
What I achieve:
I rewrote the queries with List-based logic in LINQ, and it gives me what I need. But I do not understand:
Is this practice good?
Why must I so often call ToList() to avoid errors?
Is my solution slower than the IQueryable-based approach?
Here is fiddle with my exercises: https://dotnetfiddle.net/BAKi6r
What I need, I get in the listF variable.
I completely replaced the CreateAsync method with the Create method for List. Is that good?
I also tried to use hardcoded lists with the CreateAsync method (items2moq, items3moq), but together with the filtered List-based query they give the error "The provider for the source IQueryable doesn't implement IAsyncQueryProvider". I also got an "Argument types do not match" error when I used IQueryable for NamesIQ instead of List for NamesList. What exactly is the source of these errors?
Why must I so often call ToList() to avoid errors?
I often think about LINQ queries in three "levels":
IQueryable - these are designed to translate a LINQ query into an equivalent database query (or a query against whatever data source you're using). Many LINQ and non-LINQ operations simply can't be translated into an SQL equivalent, so this layer throws an error for them. Even operations that seem simple (like splitting a string) are difficult if not impossible to do in SQL.
IEnumerable - in this layer, LINQ queries are done in memory, so there's much more flexibility for custom operations. To get from the IQueryable layer to the IEnumerable layer, the AsEnumerable() call is the most straightforward. That separates the part of the query that gets raw data from the part that can create custom objects, do more complex filtering and aggregations, etc. Note that IEnumerable still uses "deferred execution", meaning that at this stage the query is just a query - the results don't actually get computed until you enumerate it, either with a foreach loop or by advancing to the next layer:
List/Array/etc. This is where queries are executed and turned into concrete collections. Some of the benefits of this layer are serializability (you can't "serialize" an enumerator) and eager loading (as opposed to the deferred execution described above).
So you're probably getting an error because you have some part of your query that can't be translated by the underlying Queryable provider, and using ToList is a convenient way to materialize the raw data into a list, which allows you to do more complex operations. Note that AsEnumerable() would do the same thing but would maintain deferred execution.
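The three levels can be sketched in a small self-contained example. Here an in-memory AsQueryable() stands in for a real EF DbSet, and the Person type and data are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Person { public int Id; public string Name; }

class Demo
{
    static void Main()
    {
        // Stand-in for a DbSet: in EF this would be an IQueryable backed by SQL.
        IQueryable<Person> people = new List<Person>
        {
            new Person { Id = 1, Name = "Ann Smith" },
            new Person { Id = 2, Name = "Bob Jones" },
        }.AsQueryable();

        var query = people
            .Where(p => p.Id > 0)             // IQueryable level: translatable to SQL
            .AsEnumerable()                   // switch to in-memory; still deferred
            .Where(p => p.Name.Split(' ').Length == 2); // hard/impossible in SQL

        List<Person> result = query.ToList(); // List level: executes eagerly
        Console.WriteLine(result.Count);      // 2
    }
}
```

Against a real provider, everything before AsEnumerable() would run on the server and everything after it in memory.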
Is this practice good?
It can be, but you might easily be getting more data than you need by filtering at the list level rather than at the database level. My general practice is to do as much of the query as possible at the database level, and only move to the enumerable/list level when there's no known way to translate the rest of the query to SQL.
Is the speed of my solution worse than IQueryable-based approach?
The only way to know is to try it both ways and measure the difference. But it's a pretty safe bet that if you get more raw data than you need and filter in memory that you'll have worse performance.
Related
This question already has answers here:
LINQ to Objects and improved perf with an Index?
(4 answers)
Closed 3 years ago.
I have worked with FoxPro databases, which uses the Rushmore optimization technology and I wanted to know if there is any optimization technology for LINQ.
I am not looking for this in LINQ-to-SQL, because Rushmore was actually assimilated into SQL Server, and is responsible for part of its index-related speed.
I want to know, for LINQ-to-Objects, whether there is something similar to Rushmore or to the index-related performance optimizations in SQL Server.
This question is not really a duplicate because (1) Rushmore automatically optimized your expressions (whereas with I4O it is done manually), (2) there was a bitmapped component that allowed multiple indexes to be quickly combined in expressions (with good performance), and (3) the technology works for tables that can't fit in memory (which would be a plus in this case).
There is no query optimizer and no indexes in LINQ-to-Objects. You can use the ToDictionary, ToLookup, and ToHashSet extension methods to create "indexes" over in-memory collections, and you can create sorted collections of objects.
You can then manually write queries and procedural code using these optimized collections to replicate what a query optimizer would otherwise do.
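As a sketch of the "manual index" idea (the Order type and data here are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Order { public int Id; public string Customer; public decimal Total; }

class IndexDemo
{
    static void Main()
    {
        var orders = new List<Order>
        {
            new Order { Id = 1, Customer = "alice", Total = 10m },
            new Order { Id = 2, Customer = "bob",   Total = 20m },
            new Order { Id = 3, Customer = "alice", Total = 5m },
        };

        // Build a one-time "index" keyed by customer; subsequent lookups are
        // hash-based instead of scanning the whole list per query.
        ILookup<string, Order> byCustomer = orders.ToLookup(o => o.Customer);

        decimal aliceTotal = byCustomer["alice"].Sum(o => o.Total);
        Console.WriteLine(aliceTotal); // 15
    }
}
```

Building the lookup costs one pass over the data, so it pays off when you query the same key many times.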
(I am just thinking out loud and would be a mess as a comment)
Rushmore optimization was basically about choosing the correct indexes, and/or using no index at all and doing bitmap indexing on the fly. While it is a nice technique, I think databases got up to speed via their own tricks beyond the indexes themselves: LATERAL in ANSI SQL, range indexes in PostgreSQL, and shards, to name a few. If you use LINQ against a particular backend, you would be utilizing that backend's capabilities (including Rushmore, if you are using LINQ to VFP).
For LINQ to Objects, there isn't such a thing AFAIK, but as a developer you could take on the responsibility of writing it as optimized as possible; and since LINQ to Objects is an in-memory thing, you might not need it as much as you do with a database. Even with Rushmore, we took on the responsibility of trying alternative ways of querying for the best performance.
(You have the question tagged as "linq"; I would hope Joseph Albahari, author of ".NET xx in a Nutshell", would see it and provide an answer in detail.)
I need to define methods in my core interface that return lists. My project heavily relies on the use of async/await so I need to define my core references/interfaces as asynchronous as possible. I also use EF7 for my data-access layer. I currently use IAsyncEnumerable everywhere.
I am currently deciding whether to keep using IAsyncEnumerable or to revert to Task<IEnumerable<T>>. IAsyncEnumerable seems promising at this point; EF7 is using it as well. The trouble is, I don't know and can't figure out how to use it. There is almost nothing on the website that tells anyone how to use Ix.NET. There's a ToAsyncEnumerable extension that I can use on IEnumerable objects, but this wouldn't do anything asynchronously (or would it?). Another drawback: given the signature below,
IAsyncEnumerable<Person> GetPersons();
Because this isn't a function that returns a Task, I can't use async/await inside the function body.
On the other hand, my gut tells me that I should stick with Task<IEnumerable<T>>. This of course has its problems as well. EF does not have an extension method that returns this type; it has ToArrayAsync and ToListAsync extension methods, but these require you to call await inside the method because Task<T> isn't covariant. This is potentially a problem because it creates an extra operation that could be avoided if I simply returned the Task object.
My question is: should I keep using IAsyncEnumerable (preferred), or should I change everything back to Task<IEnumerable<T>> (not preferred)? I'm open to other suggestions as well.
I would go with IAsyncEnumerable. It allows you to keep your operations both asynchronous and lazy.
Without it you need to return Task<IEnumerable<T>>, which means you're loading all the results into memory. In many cases that means querying and holding more memory than needed.
The classic case is a query on which the user calls Any. With Task<IEnumerable<T>> it will load all the results into memory first; with IAsyncEnumerable, loading one result is enough.
Also relevant: with Task<IEnumerable<T>> you need to hold the entire result set in memory at the same time, while with IAsyncEnumerable you can "stream" the results a few at a time.
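A minimal sketch of the difference (requires C# 8 async streams; Task.Yield and the method names here are placeholders standing in for real per-item asynchronous work):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class StreamDemo
{
    // Streams items one at a time; the consumer can stop early.
    static async IAsyncEnumerable<int> GetNumbersAsync()
    {
        for (int i = 1; i <= 1000; i++)
        {
            await Task.Yield(); // stand-in for an async fetch per item/page
            yield return i;
        }
    }

    // Materializes everything before returning anything.
    static async Task<IEnumerable<int>> GetNumbersEagerAsync()
    {
        await Task.Yield();
        return Enumerable.Range(1, 1000).ToList();
    }

    static async Task Main()
    {
        // With IAsyncEnumerable, an "Any"-style check consumes only one item.
        await foreach (var n in GetNumbersAsync())
        {
            Console.WriteLine($"first item: {n}");
            break; // stop after the first result; no further fetches happen
        }

        // With Task<IEnumerable<T>>, all 1000 items are in memory before use.
        var all = await GetNumbersEagerAsync();
        Console.WriteLine(all.First());
    }
}
```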
Also, that's the direction the ecosystem is heading: it was added by Reactive Extensions, it is the subject of a new library suggested by Stephen Toub just this week, and it will probably be supported natively in the next version of C#.
You should just use Task<IEnumerable<T>> return types. The reason is simply that you don't want to lazily run a new query against the database for every object you want to read, so just let EF run the query at once, and then pass that collection on.
Of course you could then wrap the list in an async enumerable, but why bother? Once you have the data in memory, there's no reason to artificially delay access to it.
I'm working on a fairly high performance application, and I know database connections are usually one of the more expensive operations. I have a task that runs pretty frequently, and in the course of business it has to select data from Table1 and Table2. I have two options:
Keep making two Entity Framework queries like I am right now: select from Table1 and select from Table2 in LINQ queries (what I'm currently doing).
Create a stored procedure that returns both result sets in one call, using multiple result sets.
I'd imagine the cost to SQL Server is the same: the same IO is being performed. I'm curious if anyone can speak to the performance bump that may exist in a "hot" codepath where milliseconds matter.
and I know database connections are usually one of the more expensive operations
Unless you turn off connection pooling, then as long as there are connections already established in the pool and available to use, obtaining a connection is pretty cheap. It also really shouldn't matter here anyway.
When it comes to two queries (whether EF or not) vs one query with two result sets (and using NextResult on the data reader) then you will gain a little, but really not much. Since there's no need to re-establish a connection either way, there's only a very small reduction in the overhead of one over the other, that will be dwarfed by the amount of actual data if the results are large enough for you to care much about this impact. (Less overhead again if you could union the two resultsets, but then you could do that with EF too anyway).
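For reference, the "one query, two result sets" option looks roughly like this in raw ADO.NET. This is a sketch: the connection string, table names, and column types are placeholders, not taken from the question.

```csharp
using System;
using System.Data.SqlClient; // Microsoft.Data.SqlClient in newer projects

class TwoResultSets
{
    static void Run(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        conn.Open();

        // One round trip; two result sets in a single batch.
        using var cmd = new SqlCommand(
            "SELECT Id, Name FROM Table1; SELECT Id, Amount FROM Table2;", conn);
        using var reader = cmd.ExecuteReader();

        while (reader.Read())
        {
            // rows from Table1
            Console.WriteLine($"{reader.GetInt32(0)} {reader.GetString(1)}");
        }

        if (reader.NextResult()) // advance to the second result set
        {
            while (reader.Read())
            {
                // rows from Table2
                Console.WriteLine($"{reader.GetInt32(0)} {reader.GetDecimal(1)}");
            }
        }
    }
}
```

This is the NextResult pattern the answer above refers to; it saves one round trip versus two separate commands, which is the "little, but really not much" gain being described.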
If you mean the bytes going to and fro over the connection after it's been established, then you should be able to send slightly less to the database (but we're talking a handful of bytes) and receive about the same back, assuming that your query obtains only what is actually needed. That is, you do something like from t in Table1Repository select new {t.ID, t.Name} if you need IDs and names, rather than pulling back complete entities for each row.
EntityFramework does a whole bunch of things, and doing anything costs, so taking on more of the work yourself should mean you can be tighter. However, as well as introducing new scope for error over the tried and tested, you also introduce new scope for doing things less efficiently than EF does.
Any seeking of commonality between different pieces of database-handling code gets you further and further along the sort of path that ends up with you producing your own version of EntityFramework, but with the efficiency of all of it being up to you. Any attempt to streamline a particular query brings you in the opposite direction of having masses of similar, but not identical, code with slightly different bugs and performance hits.
In all, you are likely better off taking the EF approach first, and if a particular query proves particularly troublesome when it comes to performance then first see if you can improve it while remaining with EF (optimise the linq, use AsNoTracking when appropriate and so on) and if it is still a hotspot then try to hand-roll with ADO for just that part and measure. Until then, saying "yes, it would be slightly faster to use two resultsets with ADO.NET" isn't terribly useful, because just what that "slightly" is depends.
If the query is a simple read from Table1 and Table2, then LINQ queries should give performance similar to executing the stored procedure (plain SQL). But if the query runs across different databases, then plain SQL is better, since you can UNION the result sets and get the data from all databases.
In MySQL, the EXPLAIN statement can be used to inspect the performance of a query. See this link:
http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
Another useful tool is to check the SQL generated for your LINQ query in the Output window of Microsoft Visual Studio. You can execute this query directly in a SQL editor and check its performance.
Written on MSDN:
Returns the input typed as IEnumerable<T>.
I do not understand.
Help me to understand this method.
There are three implementations of AsEnumerable.
DataTableExtensions.AsEnumerable
Extends a DataTable to give it an IEnumerable interface so you can use Linq against the DataTable.
Enumerable.AsEnumerable<TSource> and ParallelEnumerable.AsEnumerable<TSource>
The AsEnumerable<TSource>(IEnumerable<TSource>) method has no effect other than to change the compile-time type of source from a type that implements IEnumerable<T> to IEnumerable<T> itself.
AsEnumerable<TSource>(IEnumerable<TSource>) can be used to choose between query implementations when a sequence implements IEnumerable<T> but also has a different set of public query methods available. For example, given a generic class Table that implements IEnumerable<T> and has its own methods such as Where, Select, and SelectMany, a call to Where would invoke the public Where method of Table. A Table type that represents a database table could have a Where method that takes the predicate argument as an expression tree and converts the tree to SQL for remote execution. If remote execution is not desired, for example because the predicate invokes a local method, the AsEnumerable<TSource> method can be used to hide the custom methods and instead make the standard query operators available.
In other words.
If I have an
IQueryable<X> sequence = ...;
from a Linq Provider, like Entity Framework, and I do,
sequence.Where(x => SomeUnusualPredicate(x));
that query would be composed and run on the server. It will fail at runtime because Entity Framework doesn't know how to convert SomeUnusualPredicate into SQL.
If I want that to run the statement with Linq to Objects instead, I do,
sequence.AsEnumerable().Where(x => SomeUnusualPredicate(x));
now the server will return all the data and the Enumerable.Where from Linq to Objects will be used instead of the Query Provider's implementation.
It won't matter that Entity Framework doesn't know how to interpret SomeUnusualPredicate, my function will be used directly. (However, this may be an inefficient approach since all rows will be returned from the server.)
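A runnable LINQ-to-Objects illustration of the same switch (the predicate here is an arbitrary local method standing in for SomeUnusualPredicate, and the integer sequence stands in for a provider-backed query):

```csharp
using System;
using System.Linq;

class Demo
{
    // A local method that no query provider could translate into SQL.
    static bool SomeUnusualPredicate(int x) => x % 7 == 3;

    static void Main()
    {
        IQueryable<int> sequence = Enumerable.Range(1, 20).AsQueryable();

        // After AsEnumerable(), Enumerable.Where (LINQ to Objects) runs the
        // predicate in memory instead of the query provider translating it.
        var matches = sequence.AsEnumerable()
                              .Where(x => SomeUnusualPredicate(x))
                              .ToList();

        Console.WriteLine(string.Join(",", matches)); // 3,10,17
    }
}
```

With a real database provider, the cost of this convenience is that every row before the AsEnumerable() boundary is fetched from the server.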
Apologies for what may very well be a stupid question.
I'm just intrigued as to (a) will indexing improve performance (b) how will it improve performance and (c) why will it improve performance?
Also, if this does improve performance, would this be the case across the board for LINQ to SQL, LINQ to Entities, LINQ to Objects, etc, etc.
Again, if this is a really stupid question, I do apologise.
It will improve the performance if, and only if, your LINQ query causes an SQL statement to be executed which can make use of the index - since, when using Linq-To-Sql, your LINQ query is translated to an SQL statement (hence the name).
For example, the following query would obviously benefit from an index on the LastName column of the Customers table.
var results = from c in db.Customers
              where c.LastName == "Smith"
              select c;
Like everything SQL, it depends.
If you have a small table it doesn't really matter (today's value for "small" is < 3000 rows).
If your query is going to return more than 30% of the rows in a table, then it will probably be quicker without an index.
However, if you want to select one or two particular rows from a large table, then indexing some of the columns you use in the where clause (the search arguments you pass to LINQ) will speed things up considerably.
Also, if you frequently join tables, then the join predicates (the columns used in the "ON" clause of the joined-to table) should be indexed. This can reduce the response time on some queries from hours to seconds.
It could improve performance if you index correctly. LINQ just generates a SQL statement behind the scenes... if the statement makes use of your index, then it will improve performance; if it doesn't, then it won't.
In order to improve performance, indexing should be used smartly. E.g. you should index the key columns you are querying by or joining by.
http://www.sql-server-performance.com/tips/optimizing_indexes_general_p1.aspx
Indexing would typically improve select request performance, but slightly increase insert and update times, since the index has to be re-calculated.
LINQ to SQL generates a SQL statement; you can see this statement in utilities like LINQPad. When your query runs in SQL Server, the indexes will definitely improve performance.
Will indexing improve performance? - Yes.
How will it improve performance? - Read about indexes.
Why will it improve performance? - I really don't have an answer to this question.
Also, if this does improve performance, would this be the case across the board for LINQ to SQL, LINQ to Entities, LINQ to Objects, etc.? - The answer to this comes down to: what is the relation of a SQL table to an object?