I have the following code that works fine. My problem is that the insert takes more than three hours.
How can I optimize the insert query into the SQL table?
foreach (var sheetName in GetExcelSheetNames(connectionString))
{
    using (OleDbConnection con1 = new OleDbConnection(connectionString))
    {
        var dt = new DataTable();
        string query = string.Format("SELECT * FROM [{0}]", sheetName);
        con1.Open();
        OleDbDataAdapter adapter = new OleDbDataAdapter(query, con1);
        adapter.Fill(dt);

        using (SqlConnection con = new SqlConnection(consString))
        {
            con.Open();
            for (int i = 2; i < dt.Rows.Count; i++)
            {
                for (int j = 1; j < dt.Columns.Count; j += 3)
                {
                    try
                    {
                        var s = dt.Rows[i][0].ToString();
                        var dt1 = DateTime.Parse(s, CultureInfo.GetCultureInfo("fr-FR"));
                        var s1 = dt.Rows[i][j].ToString();
                        var s2 = dt.Rows[i][j + 1].ToString();
                        var s3 = sheetName.Remove(sheetName.Length - 1);

                        SqlCommand command = new SqlCommand(
                            "INSERT INTO [Obj CA MPX] ([CA TTC],[VAL MRG TTC],[CA HT],[VAL MRG HT],[Rayon],[Date],[Code Site]) " +
                            "VALUES(@ca, @val, @catHT, @valHT, @rayon, @date, @sheetName)", con);
                        command.Parameters.Add("@date", SqlDbType.Date).Value = dt1;
                        command.Parameters.AddWithValue("@ca", s1);
                        command.Parameters.AddWithValue("@val", s2);
                        command.Parameters.AddWithValue("@rayon", dt.Rows[0][j].ToString());
                        command.Parameters.AddWithValue("@sheetName", s3);
                        command.Parameters.AddWithValue("@catHT", DBNull.Value);
                        command.Parameters.AddWithValue("@valHT", DBNull.Value);
                        command.ExecuteNonQuery();
                    }
                    catch (Exception)
                    {
                        // error handling omitted in the original snippet
                    }
                }
            }
        }
    }
}
Maybe you should save it as a file and use a bulk insert:
https://msdn.microsoft.com/de-de/library/ms188365%28v=sql.120%29.aspx
SQL Server has the option of using a BULK INSERT.
Here is a good article on importing a CSV.
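If the data is already in a DataTable, as in the question's code, another option worth mentioning is SqlBulkCopy, which streams the rows to SQL Server without writing an intermediate file first. A minimal sketch, assuming the DataTable has been reshaped so its columns line up with the destination table from the question:
// Sketch only: assumes dt's columns already match the destination columns.
using (var con = new SqlConnection(consString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "[Obj CA MPX]";
        bulk.BatchSize = 5000;                       // rows per round-trip
        bulk.ColumnMappings.Add("Date", "Date");     // one mapping per column you load
        bulk.ColumnMappings.Add("CA TTC", "CA TTC");
        bulk.WriteToServer(dt);                      // one streamed bulk operation instead of row-by-row INSERTs
    }
}
The column names above simply mirror the question's table; adjust the mappings to whatever your reshaped DataTable actually contains.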
You should first read this article from Eric Lippert: Which is faster?
Keep it in mind while trying to optimize your process.
The insert took 3 hours, but have you inserted 10 items or 900,000,000,000 items?
If it's the latter, maybe 3 hours is pretty good.
What is your database? SQL Server 2005 Express? SQL Server 2014 Enterprise?
The advice could differ.
Without more details, we can only give you suggestions that may or may not apply to your configuration.
Here are some off the top of my head:
Is the bottleneck on the DB side? Check the execution plan and add indexes if needed.
Beware of AddWithValue; it can prevent the use of indexes in your query.
If you are loading a lot of data into a non-live database, you could use a lighter recovery model to avoid generating a lot of useless log (a bulk load will automatically use BULK_LOGGED), or you could switch to the SIMPLE recovery model (ALTER DATABASE [YourDB] SET RECOVERY SIMPLE; don't forget to re-enable the FULL recovery model afterwards).
Are there alternatives to loading the data from an Excel file? Can't you use another database instead, or convert the Excel file to a CSV?
What does the performance monitor tell you? Maybe you need better hardware (more RAM, faster disks, RAID), or you could move some heavily used files (mdf, ldf) onto separate disks.
You could copy the Excel file several times and use parallelization, loading into different tables that will be partitions of your final table.
This list could continue forever.
Here is an interesting article about optimizing data loading: We Loaded 1TB in 30 Minutes with SSIS, and So Can You
This article is focused on SSIS, but some of the advice is not SSIS-specific.
You can put several (e.g. 100) INSERTs into a single command using a StringBuilder. Use an index in the parameter names. Note that you can have a maximum of 2100 parameters in one query.
StringBuilder batch = new StringBuilder();
for (int i = 0; i < pageSize; i++)
{
    batch.AppendFormat(
        @"INSERT INTO [Obj CA MPX] ([CA TTC],[VAL MRG TTC], ...) VALUES(@ca{0},@val{0}, ...)",
        i);
    batch.AppendLine();
    batch.AppendLine();
}
SqlCommand command = new SqlCommand(batch.ToString(), con);
// append parameters, using the index
for (int i = 0; i < pageSize; i++)
{
    command.Parameters.Add("@date" + i, SqlDbType.Date).Value = dt1[i];
    command.Parameters.AddWithValue("@ca" + i, s1[i]);
    // ...
}
command.ExecuteNonQuery();
Of course this is not finished; you have to integrate the pages into your existing loops, which may not be too simple.
Alternatively, you can skip the parameters and put the values directly into the query. This way you can create much larger batches (I would put 1,000 to 10,000 inserts into one batch) and it's much easier to implement.
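A rough sketch of that parameter-free variant against the question's table (values escaped by doubling single quotes, a shortened column list for readability, and an arbitrary batch size of 1000):
// Sketch: build large INSERT batches with literal values instead of parameters.
Func<string, string> esc = s => s.Replace("'", "''");   // escape single quotes in string literals

var batch = new StringBuilder();
int rowsInBatch = 0;

// ... inside the existing row/column loops:
batch.AppendFormat(
    "INSERT INTO [Obj CA MPX] ([CA TTC],[VAL MRG TTC],[Rayon],[Date],[Code Site]) " +
    "VALUES('{0}','{1}','{2}','{3:yyyy-MM-dd}','{4}');",
    esc(s1), esc(s2), esc(dt.Rows[0][j].ToString()), dt1, esc(s3));
batch.AppendLine();

if (++rowsInBatch == 1000)
{
    using (var cmd = new SqlCommand(batch.ToString(), con))
        cmd.ExecuteNonQuery();
    batch.Clear();
    rowsInBatch = 0;
}
// after the loops, execute whatever is left in 'batch'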
Related
I am trying to insert some data into two MySQL tables.
The second table stores the first table's row Id as a foreign key.
I have this code that works, but it is super slow. What is the best/fastest way to make it faster?
string ConnectionString = "server=localhost; password = 1234; database = DB ; user = Jack";
MySqlConnection mConnection = new MySqlConnection(ConnectionString);
mConnection.Open();
int index = 1;
for (int i = 0; i < 100000; i++)
{
string insertPerson = "INSERT INTO myentities(Name) VALUES (@first_name);"
    + "INSERT INTO secondtable(Id, Name, myentities) VALUES (@ID, @city, LAST_INSERT_ID());";
MySqlCommand command = new MySqlCommand(insertPerson, mConnection);
command.Parameters.AddWithValue("@first_name", "Jack");
command.Parameters.AddWithValue("@ID", i+1);
command.Parameters.AddWithValue("@city", "Frank");
command.ExecuteNonQuery();
command.Parameters.Clear();
}
I found the following code in one of the Stack Overflow questions, but it was inserting data into a single table only, not into multiple tables connected through a foreign key.
This code is pretty fast, but I was not sure how I can make it work with multiple tables.
public static void BulkToMySQL()
{
string ConnectionString = "server=192.168.1xxx";
StringBuilder sCommand = new StringBuilder("INSERT INTO User (FirstName, LastName) VALUES ");
using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
{
List<string> Rows = new List<string>();
for (int i = 0; i < 100000; i++)
{
Rows.Add(string.Format("('{0}','{1}')", MySqlHelper.EscapeString("test"), MySqlHelper.EscapeString("test")));
}
sCommand.Append(string.Join(",", Rows));
sCommand.Append(";");
mConnection.Open();
using (MySqlCommand myCmd = new MySqlCommand(sCommand.ToString(), mConnection))
{
myCmd.CommandType = CommandType.Text;
myCmd.ExecuteNonQuery();
}
}
}
The fastest way possible is to craft a strategy for not calling MySQL in a loop via the .NET MySQL Connector, especially for i = 0 to 99999. The way you achieve this is either through CASE A: direct db table manipulation, or CASE B: CSV-to-db imports with LOAD DATA INFILE.
For CASE B: it is often wise to bring the data into a staging table (or tables) first. Checks can then be made for data readiness, depending on the particular circumstances. That means you may be getting external data that needs scrubbing (ETL). Other benefits include not committing unholy data, not fit for consumption, to your production tables, and it leaves an abort option open to you.
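From .NET, the Connector exposes MySqlBulkLoader, which wraps LOAD DATA INFILE. A sketch of CASE B, assuming the rows have already been written to a tab-separated file and that a staging table (here the hypothetical staging_myentities) with matching columns exists:
// Sketch: bulk load a delimited file into a staging table via LOAD DATA INFILE.
using (var mConnection = new MySqlConnection(ConnectionString))
{
    mConnection.Open();
    var loader = new MySqlBulkLoader(mConnection)
    {
        TableName = "staging_myentities",     // hypothetical staging table
        FileName = @"c:\temp\myentities.tsv",
        FieldTerminator = "\t",
        LineTerminator = "\n",
        NumberOfLinesToSkip = 0,
        Local = true                          // the file lives on the client machine
    };
    int rows = loader.Load();                 // one server-side bulk operation
    // scrub/verify in staging, then INSERT ... SELECT into the production tables
}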
Now onto performance anecdotes. With MySQL and .NET Connector version 6.9.9.0 in late 2016, I can achieve up to 40x performance gains by going this route. It may seem unnatural not to call an INSERT query, but I don't in loops. OK, sure, in small loops, but not in bulk data ingest - not even for 500 rows. You will see a noticeable UX improvement if you re-craft some routines.
The above is for data that truly comes from external sources. For CASE A, data that is already in your db, the above does not apply. In those situations you strive to craft your SQL to massage your data as much as possible (read: 100%) on the server side, without bringing the data back to the client only to loop over it with the Connector and push it back into the server. This does not necessarily mandate stored procedures, or require them at all. Client-side calls that operate on the data in place, without transfers toward the client and back up, are what you shoot for.
You can gain some improvement by moving unnecessary operations out of the loop, since anything you do there is repeated 100,000 times:
string insertPerson =
    "INSERT INTO myentities(Name) VALUES (@first_name);"
    + "INSERT INTO secondtable(Id, Name, myentities) VALUES (@ID, @city, LAST_INSERT_ID());";
string ConnectionString = "server=localhost; password = 1234; database = DB ; user = Jack";

using (var connection = new MySqlConnection(ConnectionString))
using (var command = new MySqlCommand(insertPerson, connection))
{
    // guessing at column types and lengths here
    command.Parameters.Add("@first_name", MySqlDbType.VarChar, 50).Value = "Jack";
    var id = command.Parameters.Add("@ID", MySqlDbType.Int32);
    command.Parameters.Add("@city", MySqlDbType.VarChar, 50).Value = "Frank";

    connection.Open();
    for (int i = 1; i <= 100000; i++)
    {
        id.Value = i;
        command.ExecuteNonQuery();
    }
}
But mostly, you try to avoid this scenario. Instead, you'd do something like use a numbers table to project the results for both tables in advance. There are some things you can do with foreign key constraints around locking (you need to lock the whole table to avoid bad keys if someone else inserts or reads partially inserted records), transaction logging (you can set it to only log the batch rather than each change) and foreign key enforcement (you can turn it off while you handle the insert).
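To illustrate the "project the results for both tables in advance" idea: if you pre-assign the parent ids instead of relying on LAST_INSERT_ID(), both tables can be filled with a handful of multi-row INSERTs. A sketch, assuming myentities.Id can be set explicitly and that mConnection is already open as in the question:
// Sketch: pre-assign parent ids so both tables can be written with multi-row INSERTs.
// In practice you would split this into chunks of a few thousand rows to stay under max_allowed_packet.
var parents = new StringBuilder("INSERT INTO myentities(Id, Name) VALUES ");
var children = new StringBuilder("INSERT INTO secondtable(Id, Name, myentities) VALUES ");
for (int i = 1; i <= 100000; i++)
{
    string sep = i == 1 ? "" : ",";
    parents.AppendFormat("{0}({1},'{2}')", sep, i, MySqlHelper.EscapeString("Jack"));
    children.AppendFormat("{0}({1},'{2}',{1})", sep, i, MySqlHelper.EscapeString("Frank"));
}
using (var cmd = new MySqlCommand(parents.ToString() + ";" + children.ToString() + ";", mConnection))
    cmd.ExecuteNonQuery();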
I have a list that I am pulling things out of to insert into a database. This is not going to be a web app, so I have just been doing it as follows:
string sqlStorage = "(null,'asd')";
for (int i = 1; i < listsize; i++)
{
    sqlStorage = sqlStorage + ",(null,'" + someVariableFromLoop + "')";
}
string connString = "Server=localhost;...........";
MySqlConnection conn = new MySqlConnection(connString);
MySqlCommand command = conn.CreateCommand();
command.CommandText = @"INSERT INTO table1 VALUES " + sqlStorage;
// etc etc...
However,
"someVariableFromLoop"
is a large amount of text which includes all kinds of horrible code-breaking characters, quotation marks, etc.
So I looked into parameters (the way I should be doing SQL, I know, I know), but I was unable to find a way to store these parameters inside the loop; I don't want to hit the DB on every single iteration. I had a go at something along the lines of
"@variable" + i.ToString();
but could not get it to work at all.
So does anyone have any idea how I would go about storing the parameters and then executing the query? Thanks in advance!
So I looked into parameters (the way I should be doing SQL, I know, I know), but I was unable to find a way to store these parameters inside the loop; I don't want to hit the DB on every single iteration. I had a go at something along the lines of
"@variable" + i.ToString();
but could not get it to work at all.
Well, what was the error you received? Because that's the way you do it. Here's an example for MSSQL, and I know the technique works because I've done something similar before:
int i = 0;
List<string> clauses = new List<string>() { "(@key0, @value0)" };
List<SqlParameter> paramList = new List<SqlParameter> {
    new SqlParameter("@key0", DBNull.Value),
    new SqlParameter("@value0", "asd")
};
for (i = 1; i < listSize; i++) {
    clauses.Add("(@key" + i + ", @value" + i + ")");
    paramList.Add(new SqlParameter("@key" + i, someKey));
    paramList.Add(new SqlParameter("@value" + i, someValue));
}
SqlConnection conn = new SqlConnection(connString);
SqlCommand command = new SqlCommand(@"INSERT INTO table1 VALUES " + String.Join(", ", clauses), conn);
foreach (SqlParameter param in paramList) command.Parameters.Add(param);
command.ExecuteNonQuery();
Note, the above code is quick and dirty. Obviously, using statements and various other best practices should be incorporated as well for production code.
Also look at this: How do you use the MySQL IN clause? It has an example of dynamically creating and passing parameters to a query, but for a SELECT...IN clause rather than INSERT...VALUES.
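Since the question is actually using the MySQL connector, the same pattern with MySqlParameter might look like this (a sketch; "values" stands in for whatever list you are looping over):
// Sketch: one multi-row INSERT with indexed parameter names (MySQL flavour of the example above).
var clauses = new List<string>();
var command = new MySqlCommand { Connection = conn };
for (int i = 0; i < values.Count; i++)
{
    clauses.Add("(NULL, @value" + i + ")");
    command.Parameters.AddWithValue("@value" + i, values[i]);
}
command.CommandText = "INSERT INTO table1 VALUES " + string.Join(", ", clauses) + ";";
command.ExecuteNonQuery();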
To ensure secure code (and avoid malformed queries), use SQL command objects with parameters. There is nothing horribly wrong with executing the command once for every record - a little extra overhead for round-trips over the network - but if the text is long, you might have to do this anyway, since queries do have a character limit.
I want to delete all rows in a DataTable.
I use something like this:
foreach (DataRow row in dt.Rows)
{
row.Delete();
}
TableAdapter.Update(dt);
It works, but it takes a lot of time if there are many rows.
Is there any way to delete all rows at once?
If you are running your code against a SQL Server database, then use this command:
string sqlTrunc = "TRUNCATE TABLE " + yourTableName;
SqlCommand cmd = new SqlCommand(sqlTrunc, conn);
cmd.ExecuteNonQuery();
This will be the fastest method; it will delete everything from your table and reset the identity counter to zero.
The TRUNCATE keyword is also supported by other RDBMSs.
5 years later:
Looking back at this answer, I need to add something. The answer above is good only if you are absolutely sure about the source of the value in the yourTableName variable. This means that you shouldn't take this value from your users, because they can type anything, and this leads to the SQL injection problems well described in this famous comic strip. Always present the user with a choice of hard-coded names (tables or other symbolic values) through a non-editable UI.
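One way to keep the speed and convenience of TRUNCATE without the injection risk is to validate the incoming name against a hard-coded whitelist before it ever reaches the SQL string. A sketch with made-up table names:
// Sketch: only table names from a hard-coded whitelist ever reach the TRUNCATE statement.
private static readonly HashSet<string> AllowedTables =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Orders", "OrderLines" };

public static void TruncateTable(SqlConnection conn, string tableName)
{
    if (!AllowedTables.Contains(tableName))
        throw new ArgumentException("Unknown table: " + tableName);

    // Safe to concatenate here: the value can only be one of the literals above.
    using (var cmd = new SqlCommand("TRUNCATE TABLE [" + tableName + "]", conn))
        cmd.ExecuteNonQuery();
}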
This will allow you to clear all the rows and maintain the format of the DataTable.
dt.Rows.Clear();
There is also
dt.Clear();
However, calling Clear() on the DataTable (dt) will remove the Columns and formatting from the DataTable.
Per code found in an MSDN question, an internal method is called by both the DataRowCollection and the DataTable, with a different boolean parameter:
internal void Clear(bool clearAll)
{
if (clearAll) // true is sent from the Data Table call
{
for (int i = 0; i < this.recordCapacity; i++)
{
this.rows[i] = null;
}
int count = this.table.columnCollection.Count;
for (int j = 0; j < count; j++)
{
DataColumn column = this.table.columnCollection[j];
for (int k = 0; k < this.recordCapacity; k++)
{
column.FreeRecord(k);
}
}
this.lastFreeRecord = 0;
this.freeRecordList.Clear();
}
else // False is sent from the DataRow Collection
{
this.freeRecordList.Capacity = this.freeRecordList.Count + this.table.Rows.Count;
for (int m = 0; m < this.recordCapacity; m++)
{
if ((this.rows[m] != null) && (this.rows[m].rowID != -1))
{
int record = m;
this.FreeRecord(ref record);
}
}
}
}
As someone mentioned, just use:
dt.Rows.Clear()
That's the easiest way to delete all rows from the table in the DBMS via a DataAdapter. But if you want to do it in one batch, you can set the DataAdapter's UpdateBatchSize to 0 (unlimited).
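For illustration, a sketch of that batched-adapter variant; the table and key column names are placeholders, and UpdatedRowSource must be None for command batching to work:
// Sketch: delete every row through the adapter in a single batch (placeholder table MyTable with an Id key).
using (var con = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("SELECT Id FROM MyTable", con))
{
    var delete = new SqlCommand("DELETE FROM MyTable WHERE Id = @Id", con);
    delete.Parameters.Add("@Id", SqlDbType.Int, 4, "Id");
    delete.UpdatedRowSource = UpdateRowSource.None;   // required when batching commands
    adapter.DeleteCommand = delete;
    adapter.UpdateBatchSize = 0;                      // 0 = send all commands in one batch

    var dt = new DataTable();
    adapter.Fill(dt);
    foreach (DataRow row in dt.Rows) row.Delete();
    adapter.Update(dt);                               // one round-trip instead of one per row
}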
Another way would be to use a simple SqlCommand with CommandText DELETE FROM Table:
using(var con = new SqlConnection(ConfigurationSettings.AppSettings["con"]))
using(var cmd = new SqlCommand())
{
cmd.CommandText = "DELETE FROM Table";
cmd.Connection = con;
con.Open();
int numberDeleted = cmd.ExecuteNonQuery(); // all rows deleted
}
But if you instead only want to remove the DataRows from the DataTable, you just have to call DataTable.Clear. That prevents any rows from being deleted in the DBMS.
Why don't you just do it in SQL?
DELETE FROM SomeTable
Just use dt.Clear()
Also you can set your TableAdapter/DataAdapter to clear before it fills with data.
TableAdapter.Update(dt.Clone());
//or
dt=dt.Clone();
TableAdapter.Update(dt);
//or
dt.Rows.Clear();
dt.AcceptChanges();
TableAdapter.Update(dt);
If you are really concerned about speed and not worried about the data you can do a Truncate. But this is assuming your DataTable is on a database and not just a memory object.
TRUNCATE TABLE tablename
The difference is that this removes all rows without logging the individual row deletes, making the transaction faster.
Here we have the same question. You can use the following code:
SqlConnection con = new SqlConnection();
con.ConnectionString = ConfigurationManager.ConnectionStrings["yourconnectionstringnamehere"].ConnectionString;
con.Open();
SqlCommand com = new SqlCommand();
com.Connection = con;
com.CommandText = "DELETE FROM [tablenamehere]";
com.ExecuteNonQuery();
con.Close();
But before that, you need to add the following using directives to your project:
using System.Configuration;
using System.Data.SqlClient;
The part of the code that actually deletes all rows in the table is:
DELETE FROM [tablenamehere]
Here, tablenamehere must be replaced with the name of your table.
I am using MDaf; just use this code:
DataContext db = new DataContext(ConfigurationManager.ConnectionStrings["con"].ConnectionString);
db.TruncateTable("TABLE_NAME_HERE");
//or
db.Execute("TRUNCATE TABLE TABLE_NAME_HERE ");
Here is a clean and modern way to do it using Entity Framework, without SQL injection or raw T-SQL:
using (Entities dbe = new Entities())
{
dbe.myTable.RemoveRange(dbe.myTable.ToList());
dbe.SaveChanges();
}
Is there a Clear() method on the DataTable class? I think there is. If there is, use that: the DataTable.Clear() method.
I have a SQL SELECT statement which will not be known until runtime, and which could contain JOINs and inner selects. I need to determine the names and data types of each column of the returned result of the statement from within C#. I am inclined to do something like:
string orginalSelectStatement = "SELECT * FROM MyTable";
string selectStatement = string.Format("SELECT TOP 0 * FROM ({0}) s", orginalSelectStatement);
SqlConnection connection = new SqlConnection(#"MyConnectionString");
SqlDataAdapter adapter = new SqlDataAdapter(selectStatement, connection);
DataTable table = new DataTable();
adapter.Fill(table);
foreach (DataColumn column in table.Columns)
{
Console.WriteLine("Name: {0}; Type: {1}", column.ColumnName, column.DataType);
}
Is there a better way to do what I am trying to do? By "better" I mean either a less resource-intensive way of accomplishing the same task or a more sure way of accomplishing the same task (i.e. for all I know the code snippet I just gave will fail in some situations).
SOLUTION:
First of all, my TOP 0 hack is bad, namely for something like this:
SELECT TOP 0 * FROM (SELECT 0 AS A, 1 AS A) S
In other words, in a sub-select, if two things are aliased to the same name, that throws an error, so the hack is out of the picture. However, for completeness' sake, I went ahead and tested it, along with the two proposed solutions: SET FMTONLY ON and GetSchemaTable.
Here are the results (in milliseconds for 1,000 queries, each):
Schema Time: 3130
TOP 0 Time: 2808
FMTONLY ON Time: 2937
My recommendation would be GetSchemaTable, since it is more likely to remain future-proof should SET FMTONLY ON ever be removed as valid SQL, and it solves the aliasing problem, even though it is slightly slower. However, if you "know" that duplicate column names will never be an issue, then TOP 0 is faster than GetSchemaTable and more future-proof than SET FMTONLY ON.
Here is my experimental code:
int schemaTime = 0;
int topTime = 0;
int fmtOnTime = 0;
SqlConnection connection = new SqlConnection(#"MyConnectionString");
connection.Open();
SqlCommand schemaCommand = new SqlCommand("SELECT * FROM MyTable", connection);
SqlCommand topCommand = new SqlCommand("SELECT TOP 0 * FROM (SELECT * FROM MyTable) S", connection);
SqlCommand fmtOnCommand = new SqlCommand("SET FMTONLY ON; SELECT * FROM MyTable", connection);
for (int i = 0; i < 1000; i++)
{
{
DateTime start = DateTime.Now;
using (SqlDataReader reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
{
DataTable table = reader.GetSchemaTable();
}
DateTime stop = DateTime.Now;
TimeSpan span = stop - start;
schemaTime += span.Milliseconds;
}
{
DateTime start = DateTime.Now;
DataTable table = new DataTable();
SqlDataAdapter adapter = new SqlDataAdapter(topCommand);
adapter.Fill(table);
DateTime stop = DateTime.Now;
TimeSpan span = stop - start;
topTime += span.Milliseconds;
}
{
DateTime start = DateTime.Now;
DataTable table = new DataTable();
SqlDataAdapter adapter = new SqlDataAdapter(fmtOnCommand);
adapter.Fill(table);
DateTime stop = DateTime.Now;
TimeSpan span = stop - start;
fmtOnTime += span.Milliseconds;
}
}
Console.WriteLine("Schema Time: " + schemaTime);
Console.WriteLine("TOP 0 Time: " + topTime);
Console.WriteLine("FMTONLY ON Time: " + fmtOnTime);
connection.Close();
You could use GetSchemaTable to do what you want.
There is an example of how to use it here.
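For reference, a short sketch of that approach: execute the statement with CommandBehavior.SchemaOnly and read each column's name and CLR type from the schema table, without fetching any rows:
// Sketch: fetch only metadata for an arbitrary SELECT and print name/type per column.
using (var connection = new SqlConnection(@"MyConnectionString"))
using (var command = new SqlCommand(orginalSelectStatement, connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader(CommandBehavior.SchemaOnly | CommandBehavior.KeyInfo))
    {
        DataTable schema = reader.GetSchemaTable();
        foreach (DataRow col in schema.Rows)
        {
            Console.WriteLine("Name: {0}; Type: {1}",
                col["ColumnName"],        // column name as returned by the query
                (Type)col["DataType"]);   // CLR type of the column
        }
    }
}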
If using SQL Server, I would try using SET FMTONLY ON
Returns only metadata to the client. Can be used to test the format of
the response without actually running the query.
Apparently on SQL Server 2012, there's a better way. All is specified in the linked MSDN article.
BTW, this technique is what LINQ To SQL uses internally to determine the result set returned by a stored procedure, etc.
Dynamic SQL is always a bit of a minefield, but you could use SET FMTONLY ON on your query - this means the query will only return metadata, the same as if no results were returned. So:
string selectStatement = string.Format("SET FMTONLY ON; {0}", orginalSelectStatement);
Alternatively, if you aren't tied to ADO.NET, you could go down the LINQ to SQL route and generate a data context which will map out all of your database schema into code with the relevant types. You could also have a look at some of the micro-ORMs out there, such as Dapper.NET.
There are plenty of other ORMs out there too.
I want to load large .DBF (Visual FoxPro) files into a DataTable.
For smaller files (< 300 MB) it works fine with a Fill command, and it runs pretty fast.
But for larger files I run out of memory and need to load them in smaller parts
(loading rows 0...1000, then 1001...2000, and so on).
Based on some code found on the internet I made this method; the input start is the row to start reading from and max is the number of rows I want to read.
The problem is that even if I just want to read 5 rows, it takes around 30-60 seconds on my machine due to the very slow execution of Command.ExecuteReader.
public DataTable LoadTable2(string folder, string table, int start, int max)
{
string ConnectionString = "Provider=vfpoledb.1;Data Source="+folder+"\\"+table;
OleDbConnection Connection = new OleDbConnection(ConnectionString);
Connection.Open();
string dataString = String.Format("Select * from {0}", table);
OleDbCommand Command = new OleDbCommand(dataString, Connection);
//Takes very long time on large files.
OleDbDataReader Reader = Command.ExecuteReader(CommandBehavior.SequentialAccess);
DataSet ds = new DataSet();
var dt = ds.Tables.Add(table);
// Add the table columns.
for (int i = 0; i < Reader.FieldCount; i++)
{
dt.Columns.Add(Reader.GetName(i), Reader.GetFieldType(i));
}
int intIdx = 0;
int cnt = 0;
while (Reader.Read())
{
if (intIdx >= start)
{
DataRow r = dt.NewRow();
// Assign DataReader values to DataRow.
for (int i = 0; i < Reader.FieldCount; i++)
r[i] = Reader[i];
dt.Rows.Add(r);
cnt++;
}
if (cnt >= max)
{
break;
}
intIdx++;
}
Reader.Close();
Connection.Close();
return dt;
}
I have tested with both OLE DB and ODBC connections; there is no big difference.
Files are all on local disc.
Does anyone have a good idea for how to make this much faster?
Best regards
Anders
I believe that with that driver (VFPOLEDB), you can change your query to specify the record numbers of interest. That way it is not necessary to read through a bunch of records to get to the starting point, or to skip over any records; you just read the entire requested result set. The query might look like this:
SELECT * FROM thetable WHERE RECNO() >= 5000 AND RECNO() <= 5500
I realized that I have this driver installed and just tested it, and it does work. However, I don't think it "optimizes" that statement. In theory, it could compute the record offsets directly from the record numbers, but (based on simple observation of a query on a larger DBF) it seems to do a full table scan. However, with FoxPro you could create an index on RECNO(), and then it would be optimized.
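Applied to the LoadTable2 method from the question, the paging could look roughly like this (a sketch; it assumes start is 1-based and, as noted above, the provider may still scan the table unless RECNO() is indexed):
// Sketch: page through the DBF with RECNO() instead of skipping rows client-side.
public DataTable LoadPage(string folder, string table, int start, int max)
{
    string connectionString = "Provider=vfpoledb.1;Data Source=" + folder + "\\" + table;
    using (var connection = new OleDbConnection(connectionString))
    {
        string query = string.Format(
            "SELECT * FROM {0} WHERE RECNO() >= {1} AND RECNO() < {2}",
            table, start, start + max);
        var adapter = new OleDbDataAdapter(query, connection);
        var dt = new DataTable(table);
        adapter.Fill(dt);   // only the requested page is materialized in memory
        return dt;
    }
}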