I'm doing an EF query with a list that contains many elements, and I seem to be running into the same issue listed here (where SQL Server throws an error because there are too many elements in the IN statement). I was wondering if there is another way to do it.
Would ExecuteStoreQuery or ExecuteStoreCommand work if the command set up a temporary table?
Thanks
Yep, the best way to select from a large list of keys is to use a temp table.
http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/
If you're using MS SQL Server and C#, then SqlBulkCopy is the fastest way to get your list of keys into the server.
public void bulkCopy(String tmpTableName, DataTable table)
{
    // 'connection' is assumed to be an open ADO.NET connection held by the class
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy((SqlConnection)connection))
    {
        bulkCopy.DestinationTableName = tmpTableName;
        bulkCopy.WriteToServer(table);
    }
}
Then have a stored procedure to match to the temp table by key.
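To sketch how the pieces fit together (names here are assumptions: #tempKeys, the Orders table, and the keysTable DataTable; the temp table must be created, filled, and queried on the same open connection, since local temp tables are per-session):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    // Local temp tables live for the lifetime of the session/connection
    using (var create = new SqlCommand("CREATE TABLE #tempKeys (KeyId int NOT NULL PRIMARY KEY)", conn))
        create.ExecuteNonQuery();

    bulkCopy("#tempKeys", keysTable); // the helper above, pointed at this same connection

    // Match against the real table by key (this SELECT could equally live in a stored procedure)
    using (var select = new SqlCommand("SELECT o.* FROM dbo.Orders o JOIN #tempKeys k ON k.KeyId = o.Id", conn))
    using (var reader = select.ExecuteReader())
    {
        while (reader.Read()) { /* materialize rows */ }
    }
}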
If that list data comes from a database query, do not call .ToList() on your source query. Instead, pass the IQueryable into the second query so EF composes everything into a single SQL statement:
var statuses = context.Statuses.Where(o => o.IsActive).Select(o => o.Id);
var data = context.Orders.Where(o => statuses.Contains(o.StatusId));
Alternatively you can use something like this, assuming a SearchValues helper table whose entity exposes SessionId and Id columns:
var sessionId = Guid.NewGuid();

// Stage the IDs in the helper table
foreach (var s in statusList)
    context.SearchValues.AddObject(new SearchValue { SessionId = sessionId, Id = s });
context.SaveChanges();

var statuses = context.SearchValues.Where(o => o.SessionId == sessionId).Select(o => o.Id);
var data = context.Orders.Where(o => statuses.Contains(o.StatusId));

// Clean up this session's staged rows when done
foreach (var v in context.SearchValues.Where(o => o.SessionId == sessionId))
    context.SearchValues.DeleteObject(v);
context.SaveChanges();
You might want to do the insert and delete using SQL queries (and not EF context operations) for optimal performance.
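For instance, with the ObjectContext API mentioned in the question (a sketch; the SearchValues schema above is an assumption):

// Stage the IDs with raw SQL instead of tracked entities
foreach (var s in statusList)
    context.ExecuteStoreCommand("INSERT INTO SearchValues (SessionId, Id) VALUES ({0}, {1})", sessionId, s);

// ... run the Orders query shown above ...

context.ExecuteStoreCommand("DELETE FROM SearchValues WHERE SessionId = {0}", sessionId);

For very large lists, a single multi-row INSERT or the SqlBulkCopy approach above will be faster still than one INSERT per value.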
I need to export Excel report data into the company DB, but my code just reads and inserts without checking for duplicates. I tried AddOrUpdate() but I couldn't make it work.
Any ideas on how to go through the data reader results and filter out already existing IDs so they are not inserted again?
DataView ImportarDatosSites(string filename)
{
    string conexion = string.Format("Provider=Microsoft.ACE.OLEDB.12.0; Data Source={0}; Extended Properties='Excel 8.0;HDR=YES'", filename);
    using (OleDbConnection connection = new OleDbConnection(conexion))
    {
        connection.Open();
        OleDbCommand command = new OleDbCommand("SELECT * FROM [BaseSitiosTelemetria$]", connection);
        OleDbDataAdapter adaptador = new OleDbDataAdapter { SelectCommand = command };
        DataSet ds = new DataSet();
        adaptador.Fill(ds);
        DataTable dt = ds.Tables[0];
        using (OleDbDataReader dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                var SiteID = dr[1];
                var ID_AA_FB = dr[2];
                var Address = dr[3];
                var CreateDate = dr[5];
                var Tipo = dr[7];
                var Measures = dr[9];
                var Latitud = dr[10];
                var Longitud = dr[11];

                SitesMtto s = new SitesMtto();
                s.siteIDDatagate = SiteID.ToString();
                s.idFieldBeat = ID_AA_FB.ToString();
                s.addressDatagate = Address.ToString();
                s.createDateDatagate = Convert.ToDateTime(CreateDate);
                s.typeDevice = Tipo.ToString();
                s.MeasuresDevice = Measures.ToString();
                if (Latitud.ToString() != "" && Longitud.ToString() != "")
                {
                    s.latitudeSite = Convert.ToDouble(Latitud);
                    s.longitudeSite = Convert.ToDouble(Longitud);
                }
                db.SitesMtto.Attach(s);
                db.SitesMtto.Add(s);
                db.SaveChanges();
            }
            connection.Close();
            return ds.Tables[0].DefaultView;
        }
    }
}
One way is to set up a try/catch block and enforce your primary key with a unique index in T-SQL. When a constraint violation occurs, a database error will be thrown, which you can catch.
When it comes to an import process from an external source, I recommend using a staging table approach. Dump the raw data from Excel/file into a clean staging table (executing a TRUNCATE TABLE script against the staging table first). From there you can perform a query with a join against the real data table to detect and ignore/update possible duplicates, inserting real rows for any staged row that doesn't already have a corresponding value.
Depending on the number of rows, I would recommend batching up the read and insert. You also don't need to call both Attach() and Add(); simply adding the item to the DbSet is sufficient.
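Roughly, with the entity from the question:

SitesMtto s = new SitesMtto();
// ... populate the fields as before ...
db.SitesMtto.Add(s); // Add() alone marks the entity for insertion; Attach() is not needed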
Step 1: flush the staging table using db.Database.ExecuteSqlCommand("TRUNCATE TABLE stagingSitesMtto");
Step 2: Open the data reader and bulk-insert the rows into the stagingSitesMtto table. This assumes that the Excel/file source does not include duplicate rows within it.
Step 3: Query your stagingSitesMtto, joining your SitesMtto table on the PK/unique key. This is arguably a bit complex, as Join is normally used to perform an INNER JOIN, but we want an OUTER JOIN since we are interested in staging rows that have no corresponding site.
var query = db.StagingSitesMtto
    .GroupJoin(db.SitesMtto, // left outer join staging rows against existing sites
        staging => staging.SiteID,
        site => site.siteIDDatagate,
        (staging, sites) => new
        {
            Staging = staging,
            Sites = sites
        })
    .SelectMany(group => group.Sites.DefaultIfEmpty(),
        (group, site) => new
        {
            Staging = group.Staging,
            IsNew = site == null
        })
    .Where(x => x.IsNew)
    .Select(x => x.Staging)
    .ToList(); // Or run in a loop with Skip and Take
This will select all staging rows that do not have a corresponding real row. From there you can create new SitesMtto entities, copy the data across from the staging row, add them to db.SitesMtto, and save. If you want to update rows as well as insert, then you can return the Staging and Site along with the IsNew flag and update the .Site using the values from .Staging. With change tracking enabled, the existing Site will be updated on SaveChanges if values were altered.
Disclaimer: The above code wasn't tested, just written from memory as a reference for the outer join approach. See: How to make LEFT JOIN in Lambda LINQ expressions
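The follow-up insert step might then look roughly like this (equally untested; the staging column names are assumptions based on the Excel columns in the question):

foreach (var staged in query)
{
    db.SitesMtto.Add(new SitesMtto
    {
        siteIDDatagate = staged.SiteID,
        idFieldBeat = staged.ID_AA_FB,
        // ... copy the remaining staged columns across ...
    });
}
db.SaveChanges(); // one save for the whole batch instead of one per row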
Hopefully that gives you something to consider for handling imports.
I'm trying to retrieve all fields from two joined tables into any kind of C# object.
So I'm trying to run this code:
var query = @"EXEC('select *
              from persons p join students s on p.id=s.id
              where p.id = 21')";
var result = _context.Database.SqlQuery<?>(query).ToList();
But I don't know what should go in place of the question mark.
I've tried List<object> and Dictionary<string,string>, but since I couldn't work out exactly how the result is being mapped, I don't understand what it can be mapped to.
There is a somewhat similar question here but its solution only addresses two columns, and it apparently doesn't support returning nulls.
You can try creating a stored procedure or a function at the SQL level.
Then just select the generated table/result.
That way, you already have an idea of what class it maps to.
I frequently use the dynamic type like this:
var lst = _context.Database.SqlQuery<dynamic>(query).ToList();
foreach (var item in lst)
{
    var myVar = item.myfieldName;
}
It may be preferable to name each field in your query instead of using select *.
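For instance, if you name the columns, a small DTO maps cleanly (a sketch; the class name, column aliases, and nullable property are assumptions about your schema, and EF6's SqlQuery names raw parameters @p0, @p1, ...):

public class PersonStudentDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime? EnrolledOn { get; set; } // nullable so NULLs don't throw
}

var result = _context.Database
    .SqlQuery<PersonStudentDto>(
        @"SELECT p.id AS Id, p.name AS Name, s.enrolledOn AS EnrolledOn
          FROM persons p JOIN students s ON p.id = s.id
          WHERE p.id = @p0", 21)
    .ToList();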
I want to count the rows of a related table:
MainTable tbl = tblInfo(id);
var count = tbl.Related_Huge_Table_Data.Count();
The problem is that this takes too long (about 20 seconds) to execute, although when I run the equivalent query in SQL Server it executes in under one second. How can I optimize this query in LINQ? I also tried to use a stored procedure, but no luck.
This is the tblInfo method:
public MainTable tblInfo(int id)
{
    MyDataContext context = new MyDataContext();
    MainTable mt = (from c in context.MainTables
                    where c.Id == id
                    select c).SingleOrDefault();
    return mt;
}
I used LinqToSql, and the classes were generated by LinqToSql.
By running SingleOrDefault() you execute the query and have to deal with the results in memory after that; counting Related_Huge_Table_Data through the navigation property then loads every child row just to count them, which is why it takes 20 seconds. You need to stay with IQueryable until your query is fully constructed.
The easiest way to answer "how many child records this parent record has" is to approach it from the child side:
using (var dx = new MyDataContext())
{
    // If you have an association between the tables defined in the context
    int count = dx.Related_Huge_Table_Datas.Where(t => t.MainTable.id == 42).Count();

    // If you don't
    int count = dx.Related_Huge_Table_Datas.Where(t => t.parent_id == 42).Count();
}
If you insist on the parent side approach, you can do that too:
using (var dx = new MyDataContext())
{
    int count = dx.MainTables.Where(t => t.id == 42).SelectMany(t => t.Related_Huge_Table_Datas).Count();
}
If you want to keep a part of this query in a function like tblInfo, you can, but you can't instantiate MyDataContext inside such a function; otherwise you will get an exception when trying to use the query with another instance of MyDataContext. So either pass MyDataContext to tblInfo, or make tblInfo a member of partial class MyDataContext:
public static IQueryable<MainTable> tblInfo(MyDataContext dx, int id)
{
    return dx.MainTables.Where(t => t.id == id);
}

...

using (var dx = new MyDataContext())
{
    int count = tblInfo(dx, 42).SelectMany(t => t.Related_Huge_Table_Datas).Count();
}
Try this:
MyDataContext context = new MyDataContext();
var count = context.Related_Huge_Table_Data.Where(o => o.Parentid == id).Count();
// or, with raw SQL (using LINQ to SQL's DataContext.ExecuteQuery, parameterized rather than concatenated):
int count2 = context.ExecuteQuery<int>("SELECT COUNT(1) FROM Related_Huge_Table_Data WHERE Parentid = {0}", id).FirstOrDefault();
If you wish to take full advantage of your SQL database's performance, it may make sense to query it directly rather than through LINQ. It should be reasonably more performant :)
var Related_Huge_Table_Data = "TABLENAME"; // input table name here
var Id = "ID"; // input Id value here
var connectionString = "user id=USERNAME; password=PASSWORD; server=SERVERNAME; Trusted_Connection=YESORNO; database=DATABASE; connection timeout=30";

SqlCommand sCommand = new SqlCommand();
sCommand.Connection = new SqlConnection(connectionString);
sCommand.CommandType = CommandType.Text;
sCommand.CommandText = $"SELECT COUNT(*) FROM {Related_Huge_Table_Data} WHERE Id={Id}";
sCommand.Connection.Open();
SqlDataReader reader = sCommand.ExecuteReader();
var count = 0;
if (reader.HasRows)
{
    reader.Read();
    count = reader.GetInt32(0);
}
else
{
    Debug.WriteLine("Related_Huge_Table_Data: No rows returned in query.");
}
sCommand.Connection.Close();
Try this:
MyDataContext context = new MyDataContext();
var count = context.MainTables.GroupBy(x => x.ID).Distinct().Count();
GSerg's answer is the correct one in many cases. But when your table starts to be really huge, even a COUNT(1) directly in SQL Server is slow.
The best way to get around this is to query the database statistics directly, which is not possible with LINQ (as far as I know).
The best thing you can do is to create a static method (C#) on your table's definition which will return the result of the following query:
SELECT SUM(st.row_count)
FROM sys.dm_db_partition_stats st
WHERE object_name(st.object_id) = '{TableName}'
  AND st.index_id < 2
where {TableName} is the name of your table in the database.
Beware: this is an answer only for the case of counting all records in a table!
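A minimal sketch of that helper, assuming a LINQ to SQL DataContext (the method name is hypothetical):

public static long FastCount(MyDataContext dx, string tableName)
{
    // Reads the row count from the partition statistics instead of scanning the table
    return dx.ExecuteQuery<long>(
        @"SELECT SUM(st.row_count)
          FROM sys.dm_db_partition_stats st
          WHERE object_name(st.object_id) = {0}
            AND st.index_id < 2", tableName).Single();
}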
Is your linq2sql returning the recordset and then doing the .Count() locally, or is it sending SQL to the server to do the count on the server? There will be a big difference in performance there.
Also, have you inspected the SQL that's being generated when you execute the query? LINQ to SQL lets you see it by assigning a TextWriter to DataContext.Log. In Entity Framework, you can see it when debugging and inspecting the IQueryable<> object.
Way to view SQL executed by LINQ in Visual Studio?
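For LINQ to SQL specifically, a minimal sketch of the logging approach:

context.Log = Console.Out; // LINQ to SQL writes each generated SQL statement here
var count = tbl.Related_Huge_Table_Data.Count();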
Alternatively, use the SQL Server Profiler (if available), or somehow see what's being executed.
You may try the following:
var c = from rt in context.Related_Huge_Table_Data
        join t in context.MainTables on rt.MainTableId equals t.id
        where t.id == id
        select new { rt.id };
var count = c.Distinct().Count();
I have a List containing ids that I want to insert into a temp table using Dapper in order to avoid the SQL limit on parameters in the 'IN' clause.
So currently my code looks like this:
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"SELECT a.animalID
              FROM dbo.animalTypes [at]
              INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
              INNER JOIN edibleAnimals e on e.animalID = a.animalID
              WHERE at.animalId in @animalIds", new { animalIds }).ToList();
    }
}
The problem I need to solve is that when there are more than 2100 ids in the animalIds list then I get a SQL error "The incoming request has too many parameters. The server supports a maximum of 2100 parameters".
So now I would like to create a temp table populated with the animalIds passed into the method. Then I can join the animals table on the temp table and avoid having a huge "IN" clause.
I have tried various combinations of syntax but haven't got anywhere.
This is where I am now:
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        db.Execute(@"SELECT INTO #tempAnmialIds @animalIds");
        return db.Query<int>(
            @"SELECT a.animalID
              FROM dbo.animalTypes [at]
              INNER JOIN animals [a] on a.animalTypeId = at.animalTypeId
              INNER JOIN edibleAnimals e on e.animalID = a.animalID
              INNER JOIN #tempAnmialIds tmp on tmp.animalID = a.animalID").ToList();
    }
}
I can't get the SELECT INTO working with the list of IDs. Am I going about this the wrong way? Maybe there is a better way to avoid the IN clause limit.
I do have a backup solution in that I can split the incoming list of animalIDs into blocks of 1000, but I've read that a large IN clause suffers a performance hit, and joining a temp table should be more efficient. It also means I don't need extra 'splitting' code to batch up the IDs into blocks of 1000.
Ok, here's the version you want. I'm adding this as a separate answer, as my first answer using SP/TVP utilizes a different concept.
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        // This Open() call is vital! If you don't open the connection, Dapper will
        // open/close it automagically, which means that you'll lose the created
        // temp table directly after the statement completes.
        db.Open();
        // This temp table is created having a primary key. So make sure you don't pass
        // any duplicate IDs.
        db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");
        while (animalIds.Any())
        {
            // Build the statements to insert the IDs. For this, we need to split animalIds
            // into chunks of 1000, as this flavour of INSERT INTO is limited to 1000 values
            // at a time.
            var ids2Insert = animalIds.Take(1000);
            animalIds = animalIds.Skip(1000).ToList();

            StringBuilder stmt = new StringBuilder("INSERT INTO #tempAnimalIds VALUES (");
            stmt.Append(string.Join("),(", ids2Insert));
            stmt.Append(");");

            db.Execute(stmt.ToString());
        }
        return db.Query<int>("SELECT animalID FROM #tempAnimalIds").ToList();
    }
}
To test:
var ids = LoadAnimalTypeIdsFromAnimalIds(Enumerable.Range(1, 2500).ToList());
You just need to amend your select statement to what it originally was. As I don't have all your tables in my environment, I just selected from the created temp table to prove it works the way it should.
Pitfalls, see comments:
- Open the connection at the beginning; otherwise the temp table will be gone after Dapper automatically closes the connection right after creating the table.
- This particular flavour of INSERT INTO is limited to 1000 values at a time, so the passed IDs need to be split into chunks accordingly.
- Don't pass duplicate keys, as the primary key on the temp table will not allow that.
Edit
It seems Dapper supports a set-based operation which will make this work too:
public IList<int> LoadAnimalTypeIdsFromAnimalIdsV2(IList<int> animalIds)
{
    // This creates an IEnumerable of an anonymous type containing an Id property. This seems
    // to be necessary to be able to grab the Id by its name via Dapper.
    var namedIDs = animalIds.Select(i => new { Id = i });
    using (var db = new SqlConnection(this.connectionString))
    {
        // This is vital! If you don't open the connection, Dapper will open/close it
        // automagically, which means that you'll lose the created temp table directly
        // after the statement completes.
        db.Open();
        // This temp table is created having a primary key. So make sure you don't pass
        // any duplicate IDs.
        db.Execute("CREATE TABLE #tempAnimalIds(animalId int not null primary key);");
        // Using one of Dapper's convenient features, the INSERT becomes:
        db.Execute("INSERT INTO #tempAnimalIds VALUES(@Id);", namedIDs);
        return db.Query<int>("SELECT animalID FROM #tempAnimalIds").ToList();
    }
}
I don't know how well this will perform compared to the previous version (ie. 2500 single inserts instead of three inserts with 1000, 1000, 500 values each). But the doc suggests that it performs better if used together with async, MARS and Pipelining.
In your example, what I can't see is how your list of animalIds is actually passed to the query to be inserted into the #tempAnimalIDs table.
There is a way to do it without using a temp table, utilizing a stored procedure with a table value parameter.
SQL:
CREATE TYPE [dbo].[udtKeys] AS TABLE([i] [int] NOT NULL)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[myProc](@data as dbo.udtKeys readonly) AS
BEGIN
    select i from @data;
END
GO
This will create a user defined table type called udtKeys which contains just one int column named i, and a stored procedure that expects a parameter of that type. The proc does nothing else but to select the IDs you passed, but you can of course join other tables to it. For a hint regarding the syntax, see here.
C#:
var dataTable = new DataTable();
dataTable.Columns.Add("i", typeof(int));
foreach (var animalId in animalIds)
    dataTable.Rows.Add(animalId);
using (SqlConnection conn = new SqlConnection("connectionString goes here"))
{
    // For a stored procedure call, SQL Server resolves the parameter's table type itself;
    // for ad-hoc SQL you would pass dataTable.AsTableValuedParameter("dbo.udtKeys") instead.
    var r = conn.Query("myProc", new { data = dataTable }, commandType: CommandType.StoredProcedure);
    // r contains your results
}
The parameter within the procedure gets populated by passing a DataTable, and that DataTable's structure must match the one of the table type you created.
If you really need to pass more than 2100 values, you may want to consider indexing your table type to increase performance. You can actually give it a primary key if you don't pass any duplicate keys, like this:
CREATE TYPE [dbo].[udtKeys] AS TABLE(
    [i] [int] NOT NULL,
    PRIMARY KEY CLUSTERED
    (
        [i] ASC
    ) WITH (IGNORE_DUP_KEY = OFF)
)
GO
You may also need to assign execute permissions for the type to the database user you execute this with, like so:
GRANT EXEC ON TYPE::[dbo].[udtKeys] TO [User]
GO
See also here and here.
For me, the best way I was able to come up with was turning the list into a comma separated string in C# and then using string_split in SQL to insert the data into a temp table. There are probably upper limits to this, but in my case I was only dealing with 6,000 records and it worked really fast.
public IList<int> LoadAnimalTypeIdsFromAnimalIds(IList<int> animalIds)
{
    using (var db = new SqlConnection(this.connectionString))
    {
        return db.Query<int>(
            @"-- Create a temp table to join to later. An index on this would probably be good too.
            CREATE TABLE #tempAnimals (Id INT)
            INSERT INTO #tempAnimals (ID)
            SELECT value FROM string_split(@animalIdStrings, ',')

            SELECT at.animalTypeID
            FROM dbo.animalTypes [at]
            JOIN animals [a] ON a.animalTypeId = at.animalTypeId
            JOIN #tempAnimals temp ON temp.ID = a.animalID -- <-- added this
            JOIN edibleAnimals e ON e.animalID = a.animalID",
            new { animalIdStrings = string.Join(",", animalIds) }).ToList();
    }
}
It might be worth noting that string_split is only available in SQL Server 2016 or higher, or, if using Azure SQL, compatibility level 130 or higher. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15
I need to extract, from a simple string that represents an SQL query, the tables that are used in the query, without executing the query itself, in C#.
Example:
string strQuery = "SELECT * FROM table1 LEFT JOIN (SELECT * FROM table2) tt WHERE tt.name IN (SELECT name FROM table3)";
ArrayList arrUsedTables = GetUsedTablesFromQuery(strQuery);
and after this line the object arrUsedTables would contain:
table1,table2,table3
Remember that the query may be much more complicated!
Without going to the DB you can't know for certain the names of the tables used in the query.
What if your query uses a view or a stored procedure?
Without consulting the database, these are transparent to the consumer.
The only way to be certain is to query the list of tables from the database and then attempt to parse them from your inline SQL.
You will have to add references and directives for the following assemblies:
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;
using System.IO;
Then, you may create the GetUsedTablesFromQuery method:
private static ArrayList GetUsedTablesFromQuery(string strQuery)
{
    var parser = new TSql100Parser(true);
    IList<ParseError> errors = new List<ParseError>();
    using (TextReader r = new StringReader(strQuery))
    {
        var result = parser.GetTokenStream(r, out errors);
        // Grab the token two positions past each FROM keyword (FROM, whitespace, identifier)
        var tables = result
            .Select((i, index) => (i.TokenType == TSqlTokenType.From) ? result[index + 2].Text : null)
            .Where(i => i != null)
            .ToArray();
        return new ArrayList(tables);
    }
}
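A quick usage sketch against the query from the question; note that this token scan only looks two tokens past each FROM, so a table introduced solely by a JOIN clause would be missed:

var arrUsedTables = GetUsedTablesFromQuery(
    "SELECT * FROM table1 LEFT JOIN (SELECT * FROM table2) tt " +
    "WHERE tt.name IN (SELECT name FROM table3)");
// Contains table1, table2, table3 here, since each of them follows a FROM keyword.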
You can certainly use a SQL parser such as ANTLR, as described in this question and answer, in order to get a full parse of the SQL, and then extract the table names.
Another option is to execute some raw SQL to get the execution plan of the query (using the instructions here). The execution plan is in XML, and then you can use Linq to XML to query the plan for all Table attributes on any ColumnReference tag.
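A rough sketch of that approach (assuming System.Data.SqlClient and System.Xml.Linq, plus SHOWPLAN permission; SET SHOWPLAN_XML makes the server return the estimated plan instead of executing the statement):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    // SET SHOWPLAN_XML must be the only statement in its batch
    new SqlCommand("SET SHOWPLAN_XML ON", conn).ExecuteNonQuery();
    string planXml;
    using (var cmd = new SqlCommand(strQuery, conn))
        planXml = (string)cmd.ExecuteScalar(); // returns the plan XML, not the query results
    new SqlCommand("SET SHOWPLAN_XML OFF", conn).ExecuteNonQuery();

    XNamespace ns = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    var tables = XDocument.Parse(planXml)
        .Descendants(ns + "ColumnReference")
        .Select(c => (string)c.Attribute("Table"))
        .Where(t => t != null)
        .Distinct()
        .ToList(); // e.g. [table1], [table2], [table3]
}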