Entity Framework - how can I optimize “Contains” statement?

In our current application we have performance issues with some of our queries. Usually we have something like:
List<int> idList = some data here…;
var query = from a in someTable where idList.Contains(a.Id) select a;
While this is acceptable for simple queries, it becomes a bottleneck when idList has more items (some queries check about 700 IDs, for example).
Is there any way to use something other than Contains? We are thinking of inserting the IDs into a temporary table first and then executing a join instead of Contains, but it seems Entity Framework does not support such operations (creating temporary tables in code) :(
What else can we try?

I suggest using LINQPad; it offers a "Transform to SQL" option that lets you see your query in SQL syntax.
There is a chance that what you have is already the optimal solution (if you're not into messy stuff).
You might try holding idList as a sorted array and replacing the Contains call with a binary search (you can implement your own extension method).

You can try this:
var query = someTable.Where(a => idList.Any(b => b == a.Id));

If you don't mind having a physical table you could use a semi-temporary table. The basic idea is:
1. Create a physical table with a "query id" column
2. Generate a unique ID (not random, but unique)
3. Insert data into the table, tagging the records with the query ID
4. Pass the query ID to the main query, using it to join to the link table
5. Once the query is complete, delete the temporary records
At worst if something goes wrong you will have orphaned records in the link table (which is why you use a unique query ID).
It's not the cleanest solution but it will be faster than using Contains if you have a lot of values to check against.
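The steps above can be sketched roughly as follows. This is only an illustration of the flow, not a database implementation: the in-memory lists someTable and linkTable stand in for the real tables, and the names FindByIds, QueryId and ItemId are hypothetical. A simple counter plays the role of the unique (not random) query ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the database tables (hypothetical names).
var someTable = Enumerable.Range(1, 1000)
    .Select(i => (Id: i, Name: "Row " + i))
    .ToList();
var linkTable = new List<(long QueryId, int ItemId)>();
long lastQueryId = 0;

// Runs one lookup: tag the ids, join on the tag, then clean up.
List<(int Id, string Name)> FindByIds(IEnumerable<int> idList)
{
    long queryId = ++lastQueryId; // unique, not random
    linkTable.AddRange(idList.Select(id => (QueryId: queryId, ItemId: id)));
    try
    {
        return (from a in someTable
                join l in linkTable on a.Id equals l.ItemId
                where l.QueryId == queryId
                select a).ToList();
    }
    finally
    {
        // Delete this query's link rows even if the join throws.
        linkTable.RemoveAll(l => l.QueryId == queryId);
    }
}

var found = FindByIds(new[] { 3, 500, 999, 5000 });
Console.WriteLine(string.Join(", ", found.Select(r => r.Id)));
```

Against a real database, the insert, the join and the delete would each be a SQL statement, and the cleanup in the finally block is what keeps orphaned link rows to a minimum.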

When Entity Framework starts being a performance bottleneck, generally it's time to write actual SQL.
So what you could do, for example, is build a table-valued function that takes a table-valued parameter (your list of IDs). The function would just return the result of your JOIN.
The table-valued function feature requires EF5, so it might not be an option if you're really stuck with EF4.

The idea is to refactor your queries to get rid of idList.
For example, suppose you need to return the list of orders placed by male users aged 18-25 from France. If you filter the Users table by age, sex and country to build an idList of users, you end up with 700+ IDs. Instead, join the Orders table with Users and apply the filters to the Users table. That way you don't have two requests (one for the IDs and one for the orders), and it works much faster because the database can use indexes while joining the tables.
Makes sense?
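As a minimal sketch of that refactoring, here is the single-query shape written with LINQ. The arrays users and orders are in-memory stand-ins for the real tables, and the column names (Age, Sex, Country, UserId) are assumptions made up for the example; with Entity Framework the same query expression would run against the context's entity sets instead.

```csharp
using System;
using System.Linq;

// In-memory stand-ins for the Users and Orders tables (hypothetical schema).
var users = new[]
{
    (Id: 1, Age: 22, Sex: "M", Country: "France"),
    (Id: 2, Age: 31, Sex: "M", Country: "France"),
    (Id: 3, Age: 19, Sex: "F", Country: "France"),
    (Id: 4, Age: 24, Sex: "M", Country: "Spain"),
    (Id: 5, Age: 18, Sex: "M", Country: "France"),
};
var orders = new[]
{
    (OrderId: 100, UserId: 1),
    (OrderId: 101, UserId: 2),
    (OrderId: 102, UserId: 5),
    (OrderId: 103, UserId: 1),
    (OrderId: 104, UserId: 4),
};

// One query: join Orders to Users and filter on the user columns,
// instead of building an id list first and calling Contains.
var orderIds = (from o in orders
                join u in users on o.UserId equals u.Id
                where u.Sex == "M" && u.Age >= 18 && u.Age <= 25 && u.Country == "France"
                select o.OrderId).ToList();

Console.WriteLine(string.Join(", ", orderIds));
```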

Related

How To can Select**DB Table

If I have a database in each table where the ID field and its appropriate function in any field do not take the administrator behavior so that tables and field contents can be fetched until the serial number is unified without duplicate values
Appropriate in this context using except.
Is there code that can fetch the tables, either in SQL or in Entity Framework?
Eexcept_Admin_except_List
List<int> tempIdList = answeripE.Select(q => q.ID).ToList();
var quslist = db.Qustion.Where(q => !tempIdList.Contains(q.ID));
Thanks to "daryal" for "Get All Except from SQL database using Entity Framework".
I need to do this without asking for each table and querying it. And also request SQL from the database as a whole without exception such as
select*
IDfield
FROM
MSDB_Table T
WHERE
T.id == MaxBy(T.OrderBy(x => x.id);
can replace "where TABLE1.id 'OR' Table2.id" decode all the tables and give a result.
All I'm looking forward to is that I can query one database on a whole, get it on a list without the use of tables or a composite key because it serves me in analyzing a set of data converted to other data formats, for example when representing a database in the form of JSON There are a lot of them on more than one platform and in a single database and to avoid the repetition of the data I need to do this or a comprehensive query may be compared or to investigate or like Solver Tool in Excel, so far did not get the answer to show me the first step is because it does not exist originally or because it is not possible?

If you want Entity Framework to retrieve all columns except a subset of them, the best way to do that is either via a stored procedure or a view. With a view you can query it using LINQ and add your predicates in code, but in a stored procedure you will have to write it and feed your predicate conditions into it...so it sounds like a view would be better for you.
Old example, but should guide you through the process:
https://www.mssqltips.com/sqlservertip/1990/how-to-use-sql-server-views-with-the-entity-framework/

Will LINQ query code in a C# UWP project run slower or faster depending on the order of its parts?

I have this general question to know whether SQLite db query using Linq may be faster or slower depending on order of parts of it.
Let's say I have a DB table with multiple user data.
If I write something like this:
var Query = DB_List.Where(TableName => TableName.UserId == UserId &&
TableName.SomeValue == SomeValue);
Will it be faster than:
var Query = DB_List.Where(TableName => TableName.SomeValue == SomeValue &&
TableName.UserId == UserId);
considering that there may be thousands of user IDs?
Personally I think it may be better to first identify the records for the current UserId and then check them for the value we are looking for, but maybe it does not make a difference.

The question is about SQLite itself rather than about UWP. The only thing that happens on the UWP side is that the LINQ query is translated into a SQL query that goes to the database. In this case it will be something like:
SELECT * FROM MyTable WHERE SomeValue = @someValue AND UserId = @userId
The order in which the WHERE clause conditions are listed does not matter. Before executing the query, SQLite prepares an execution plan that attempts to choose the most performant order of evaluation. Here it will first and foremost prioritize indexed columns, so if you want to improve performance, this is where you should start.
Creating an index in SQLite is similar to any other database:
CREATE INDEX idxUserId ON MyTable (UserId);
You can even include multiple columns:
CREATE INDEX idxUserIdSomeValue ON MyTable (UserId, SomeValue);
When you set up an index, the DB essentially maintains a sorted ordering of the indexed values, which allows for very fast binary search and suits your scenario well. Also note that the primary key of every table is automatically indexed.
For more information on SQLite Indices, see this great article by Jason Feinstein.

How to know how many persistent objects were deleted using Session.Delete(query);

We are refactoring a project from plain MySQL queries to the usage of NHibernate.
In the MySQL connector there is the ExecuteNonQuery function that returns the rows affected. So
int RowsDeleted = ExecuteNonQuery("DELETE FROM `table` WHERE ...");
would show me how many rows were effectively deleted.
How can I achieve the same with NHibernate? As far as I can see, it is not possible with Session.Delete(query);.
My current workaround is to first load all of the objects that are about to be deleted and delete them one by one, incrementing a counter on each delete. But I assume that will cost performance.

If you don't mind that NHibernate will create delete statements for each row, and maybe additional statements for orphans and/or other relationships, you can use session.Delete.
For better performance I would recommend doing batch deletes (see the example below).
session.Delete
If you delete many objects with session.Delete, NHibernate makes sure that integrity is preserved; it will load everything into the session if needed anyway. So there is no real reason to count your objects or to have a method that retrieves the number of deleted objects, because you can simply run a query before the delete to determine the number of objects that will be affected.
The following statements delete all entities of type Post by id.
The select statement queries the database only for the ids, so it is actually very performant.
var idList = session.Query<Post>().Select(p => p.Id).ToList<int>();
session.Delete(string.Format("from Post where Id in ({0})", string.Join(",", idList.ToArray())));
The number of objects deleted will be equal to the number of ids in the list.
In terms of the queries NHibernate fires against your database, this is actually the same as querying with Query<T>, looping over the result and deleting the entities one by one.
Batch delete
You can use session.CreateSQLQuery to run native SQL commands. It also allows you to pass input parameters and read back results.
The following statement would simply delete everything from the table, as you would expect:
session.CreateSQLQuery(@"Delete from MyTableName").ExecuteUpdate();
To retrieve the number of rows deleted, we'll use the normal T-SQL @@ROWCOUNT variable and output it via select. To read the selected row count, we declare the result column on the created query via AddScalar, and UniqueResult simply returns the integer:
var rowsAffected = session.CreateSQLQuery(@"
Delete from MyTableName;
Select @@ROWCOUNT as NumberOfRows")
.AddScalar("NumberOfRows", NHibernateUtil.Int32)
.UniqueResult();
To pass input variables, use .SetParameter(<name>, <value>):
var rowsAffected = session.CreateSQLQuery(@"
DELETE from MyTableName where ColumnName = :val;
select @@ROWCOUNT NumberOfRows;")
.AddScalar("NumberOfRows", NHibernateUtil.Int32)
.SetParameter("val", 1)
.UniqueResult();
I'm not so comfortable with MySQL; the example I wrote is for MSSQL. I think in MySQL the @@ROWCOUNT equivalent would be SELECT ROW_COUNT();?

How to read the result of SELECT * from joined tables with duplicate column names in .NET

I am a PHP/MySQL developer, slowly venturing into the realm of C#/SQL Server and I am having a problem in C# when it comes to reading an SQL Server query that joins two tables.
Given the two tables:
TableA:
int:id
VARCHAR(50):name
int:b_id
TableB:
int:id
VARCHAR(50):name
And given the query
SELECT * FROM TableA,TableB WHERE TableA.b_id = TableB.id;
Now in C# I normally read query data in the following fashion:
SqlDataReader data_reader= sql_command.ExecuteReader();
data_reader["Field"];
Except in this case I need to differentiate between TableA's name column and TableB's name column.
In PHP I would simply ask for the field "TableA.name" or "TableB.name" accordingly, but when I try something like
data_reader["TableB.name"];
in C#, my code errors out.
How can I fix this? And how can I read a query on multiple tables in C#?

The result set only sees the returned data/column names, not the underlying table. Change your query to something like
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
Then you can refer to the fields like this:
data_reader["Name_TA"];

To those posting that it is wrong to use "SELECT *", I strongly disagree with you. There are many real world cases where a SELECT * is necessary. Your absolute statements about its "wrong" use may be leading someone astray from what is a legitimate solution.
The problem here does not lie with the use of SELECT *, but with a constraint in ADO.NET.
As the OP points out, in PHP you can index a data row via the "TABLE.COLUMN" syntax, which is also how raw SQL handles column name conflicts:
SELECT table1.ID, table2.ID FROM table1, table2;
Why DataReader is not implemented this way, I do not know...
That said, one solution is to build your SQL statement dynamically:
1. Query the schema of the tables you're selecting from
2. Build your SELECT clause by iterating through the column names in the schema
In this way you can build a query like the following without having to know which columns currently exist in the schema of the tables you're selecting from:
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
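A minimal sketch of the second step, assuming the column names have already been read from the schema. The helper name BuildAliasedSelect and the Table_Column alias pattern are made up for this example, and the column lists are hard-coded to the asker's two tables. Identifiers are wrapped in square brackets, SQL Server style.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds "[Table].[Col] AS [Table_Col]" for every column of every table.
// Names are wrapped in [] with any ] doubled, as SQL Server expects.
string BuildAliasedSelect(IDictionary<string, string[]> columnsByTable)
{
    string Quote(string name) => "[" + name.Replace("]", "]]") + "]";
    var parts = columnsByTable.SelectMany(t =>
        t.Value.Select(c => Quote(t.Key) + "." + Quote(c) + " AS " + Quote(t.Key + "_" + c)));
    return "SELECT " + string.Join(", ", parts);
}

// Column lists are hard-coded here; in practice they would come from the schema.
var schema = new SortedDictionary<string, string[]>
{
    ["TableA"] = new[] { "id", "name", "b_id" },
    ["TableB"] = new[] { "id", "name" },
};

string selectClause = BuildAliasedSelect(schema);
Console.WriteLine(selectClause);
```

With aliases generated this way, the reader can be indexed by the unambiguous name, for example data_reader["TableB_name"]. The FROM and WHERE clauses still have to be appended to the generated SELECT list.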

You could try reading the values by index (a zero-based number) rather than by key.
string name = data_reader.GetString(4);
The numbers follow the order of the columns in the result set, so you may have to experiment to see how they correspond.

Welcome to the real world. In the real world, we don't use "SELECT *". Specify which columns you want, from which tables, and with which alias, if required.

Although it is better to use a column list to remove duplicate columns, if for any reason you want SELECT *, then just use
rdr["duplicate_column_name"]
This returns the value of the first column with that name. That is enough when both columns hold the same value, as the join key columns of an inner join do; it does not help when the duplicated columns hold different values.

Ideally, you should never have duplicate column names across a database schema. So, if you can, rename the columns in your schema so that the names do not conflict.
That rule exists for this very situation. Once you've done your join, the result is just a new recordset, and the table names generally do not go with it.

Best way of acquiring information from several database tables

I have a medical database that keeps different types of data on patients: examinations, lab results, x-rays... each type of record exists in a separate table. I need to present this data on one table to show the patient's history with a particular clinic.
My question: what is the best way to do it? Should I do a SELECT from each table where the patient ID matches, order them by date, and then keep them in some artificial list-like structure (ordered by date)? Or is there a better way of doing this?
I'm using WPF and SQL Server 2008 for this app.

As others have said, JOIN is the way you'd normally do this. However, if there are multiple rows in one table for a patient then there's a chance you'll get data in some columns repeated across multiple rows, which often you don't want. In that case it's sometimes easier to use UNION or UNION ALL.
Let's say you have two tables, examinations and xrays, each with a PatientID, a Date and some extra details. You could combine them like this:
SELECT PatientID, ExamDate [Date], ExamResults [Details]
FROM examinations
WHERE PatientID = @patient
UNION ALL
SELECT PatientID, XrayDate [Date], XrayComments [Details]
FROM xrays
WHERE PatientID = @patient
Now you have one big result set with PatientID, Date and Details columns. I've found this handy for "merging" multiple tables with similar, but not identical, data.
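If the rows are merged on the client instead, the same idea can be sketched with LINQ: Concat plays the role of UNION ALL and OrderBy produces the date-ordered history. The arrays below are in-memory stand-ins for the two tables, and the sample rows and the Source label are invented for the example.

```csharp
using System;
using System.Linq;

// In-memory stand-ins for the examinations and xrays tables (invented sample data).
var examinations = new[]
{
    (PatientId: 7, ExamDate: new DateTime(2010, 3, 1), ExamResults: "Blood pressure normal"),
    (PatientId: 7, ExamDate: new DateTime(2010, 5, 20), ExamResults: "Follow-up, no change"),
    (PatientId: 9, ExamDate: new DateTime(2010, 4, 2), ExamResults: "Sprained ankle"),
};
var xrays = new[]
{
    (PatientId: 7, XrayDate: new DateTime(2010, 4, 15), XrayComments: "Chest clear"),
};

int patient = 7;

// Project both sources onto the same shape, append them, then sort by date.
var history = examinations
    .Where(e => e.PatientId == patient)
    .Select(e => (Date: e.ExamDate, Source: "Examination", Details: e.ExamResults))
    .Concat(xrays
        .Where(x => x.PatientId == patient)
        .Select(x => (Date: x.XrayDate, Source: "X-ray", Details: x.XrayComments)))
    .OrderBy(h => h.Date)
    .ToList();

foreach (var h in history)
    Console.WriteLine($"{h.Date:yyyy-MM-dd} {h.Source}: {h.Details}");
```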

If this is something you're going to be doing often, I'd be tempted to create a denormalized view on all of patient data (join the appropriate tables) and index the appropriate column(s) in the view. Then use the appropriate method (stored procedure, etc) to retrieve the data for a passed-in patientID.

Use a JOIN to get data from several tables.

You can use a join (can't remember which type exactly) to get all the records from each table for a specific patient. The way this works depends on your database design.

I'd do it with separate SELECT statements, since a simple JOIN probably won't do due to the fact that some tables might have more than 1 row for the patient.
So I would retrieve multiple result sets into a simple DataSet, add a DataRelation, cache the object and query it down the line (by date, by exam type, subsets, ...)
The main point is that you have all the data handy, even cached if needed, in a structure which is easily queried and filtered.
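A minimal sketch of that approach, with the tables filled by hand instead of from SQL Server. The table and column names and the sample rows are assumptions made up for the example; in the real app each table would be loaded from its own SELECT.

```csharp
using System;
using System.Data;

var ds = new DataSet("PatientHistory");

// Parent table: one row per patient.
var patients = ds.Tables.Add("Patients");
patients.Columns.Add("PatientID", typeof(int));
patients.Columns.Add("Name", typeof(string));

// Child table: any number of rows per patient.
var exams = ds.Tables.Add("Examinations");
exams.Columns.Add("PatientID", typeof(int));
exams.Columns.Add("ExamDate", typeof(DateTime));
exams.Columns.Add("ExamResults", typeof(string));

patients.Rows.Add(7, "Jane Doe");
exams.Rows.Add(7, new DateTime(2010, 5, 20), "Follow-up");
exams.Rows.Add(7, new DateTime(2010, 3, 1), "Initial visit");

// Link parent and child tables on PatientID.
var relation = ds.Relations.Add("Patient_Examinations",
    patients.Columns["PatientID"], exams.Columns["PatientID"]);

// Navigate from a patient row to its examination rows.
DataRow patientRow = patients.Rows[0];
DataRow[] patientExams = patientRow.GetChildRows(relation);

// Filter and sort the cached rows without another round trip to the database.
DataRow[] sorted = exams.Select("PatientID = 7", "ExamDate ASC");

Console.WriteLine(patientExams.Length);
Console.WriteLine(sorted[0]["ExamResults"]);
```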
