Speed up SqlBulkCopy .NETCore - c#

I am really clueless here.
I import ~2 million rows into my Azure SQL database. I create a temp table where I put my values, then I use a MERGE statement to insert only the rows that are not already in the destination table. My code and scripts are shown below.
public async Task BulkImportWithoutDuplicates(DataTable reader)
{
var tableName = "##tempImport";
using (var connection = new SqlConnection(sqlCOnn.ConnectionString))
{
using (SqlCommand command = new SqlCommand("", sqlCOnn))
{
try
{
sqlCOnn.Open();
// Creating temp table on database
command.CommandText = Scripts.GetTempTableScript();
command.ExecuteNonQuery();
// Bulk insert into temp table
using (SqlBulkCopy b = new SqlBulkCopy(conString, SqlBulkCopyOptions.TableLock))
{
b.BulkCopyTimeout = 0;
b.BatchSize = reader.Rows.Count;
b.DestinationTableName = tableName;
//dataTable
await b.WriteToServerAsync(reader);
b.Close();
}
// Updating destination table, and dropping temp table
command.CommandText = Scripts.GetMergeScript();
var rows = command.ExecuteNonQuery();
}
catch (Exception ex)
{
// Handle exception properly
}
finally
{
connection.Close();
}
}
}
}
public static string GetTempTableScript()
{
return $@"
IF OBJECT_ID('tempdb.dbo.##tempImport', 'U') IS NOT NULL
BEGIN
DROP TABLE ##tempImport;
END
CREATE TABLE ##tempImport ( ... all the columns);";
}
public static string GetMergeScript()
{
return $@"MERGE INTO dbo.Data AS target
USING ##tempImport AS source
ON (source.TransactionId = target.TransactionId AND source.UserId = target.UserId)
WHEN NOT MATCHED THEN
INSERT (Start, Spend, UserId, Product, Shop, ClientId, UploadDataId, UniqueId, TransactionId, q, cq, c2)
VALUES (source.Start, source.Spend, source.UserId, source.Product, source.Shop,
source.ClientId, source.UploadDataId, source.UniqueId, source.TransactionId, source.q, source.c1, source.c2);
";
}
I really do not understand why it takes ages to finish. The bulk copy into the temporary table alone took 24 minutes.
According to this article, it shouldn't take that long: https://www.adathedev.co.uk/2011/01/sqlbulkcopy-to-sql-server-in-parallel.html?m=1
What am I doing wrong here? How can I improve the speed?
I tried using both IDataReader and DataTable, but neither performs well for me.

Related

Sqlite DataAdapter isn't updating when called

I have to write identical tables to two different SQLite databases. The tables rewrite existing data and ids, so I currently just delete the entire table and rewrite it. Depending on the order in which I write to the databases, the C# SQLiteCommandBuilder does not update the second database called.
If I call Db_A before Db_B, then A gets written and B only gets deleted, and vice versa. Can anyone tell me why the code deletes the second table's contents, but the SQLite adapter never writes to the second table? It doesn't throw an error either.
public static bool WriteDt_Name(DataTable dt)
{
using (connA = GetDbAConn())
{
SaveDataTable(connA, dt);
}
using (connB = GetDbBConn())
{
SaveDataTable(connB, dt);
}
return true;
}
public static void SaveDataTable(SQLiteConnection conn, DataTable dt)
{
string table = dt.TableName;
var cmd = conn.CreateCommand();
cmd.CommandText = string.Format("DELETE FROM {0}", table);
int val = cmd.ExecuteNonQuery();
cmd.CommandText = string.Format("SELECT * FROM {0}", table);
using (var adapter = new SQLiteDataAdapter(cmd))
{
using (SQLiteCommandBuilder builder = new SQLiteCommandBuilder(adapter))
{
adapter.Update(dt);
conn.Close();
}
}
}
public static SQLiteConnection GetDbAConn()
{
string base_dir = System.AppDomain.CurrentDomain.BaseDirectory;
string path = Directory.GetParent(Directory.GetParent(Directory.GetParent(Directory.GetParent(Directory.GetParent(base_dir).ToString()).ToString()).ToString()).ToString()).ToString();
path = path + "\\db\\DbA.sqlite;";
SQLiteConnection conn = new SQLiteConnection("Data Source=" + path + "Version=3;");
return conn;
}
I have tried splitting SaveDataTable into a SaveDt_A and SaveDt_B and calling it that way. I still get the same result.

List to SqlBulkCopy without datatable

We have a big list of around 100,000 records and want to insert it into a SQL table.
What we are doing is converting that list into a DataTable and passing the DataTable to the SqlBulkCopy method.
This conversion from list to DataTable takes a long time. We tried using Parallel, but since DataTable is not thread safe we dropped that idea.
Below is sample POC code that generates an integer list and inserts it into a temp table:
static void Main(string[] args)
{
List<int> valueList = GenerateList(100000);
Console.WriteLine("Starting with Bulk Insert ");
DateTime startTime = DateTime.Now;
int recordCount = BulkInsert(valueList);
TimeSpan ts = DateTime.Now.Subtract(startTime);
Console.WriteLine("Bulk insert for {0} records in {1} milliseconds.-> ", recordCount, ts.TotalMilliseconds);
Console.WriteLine("Done.");
Console.ReadLine();
}
private static int BulkInsert(List<int> valueList)
{
SqlBulkHelper sqlBulkHelper = new SqlBulkHelper();
var eventIdDataTable = CreateIdentityDataTable(valueList, "SqlTable", "Id");
// The destination must match the temp table created in FillBulkPoundTable
return FillBulkPoundTable(eventIdDataTable, "#EventIds");
}
private static List<int> GenerateList(int size)
{
return Enumerable.Range(0, size).ToList();
}
private static DataTable CreateIdentityDataTable(List<int> ids, string dataTableName, string propertyName)
{
if (ids == null) return null;
using (var dataTable = new DataTable(dataTableName))
{
dataTable.Locale = CultureInfo.CurrentCulture;
var dtColumn = new DataColumn(propertyName, Type.GetType("System.Int32"));
dataTable.Columns.Add(dtColumn);
foreach (int id in ids)
{
DataRow row = dataTable.NewRow();
row[propertyName] = id;
dataTable.Rows.Add(row);
}
return dataTable;
}
}
private static int FillBulkPoundTable(DataTable dataTable, string destinationTableName)
{
int totalInsertedRecordCount = 0;
using (SqlConnection _connection = new SqlConnection(CongifUtil.sqlConnString))
{
string sql =
@"If object_Id('tempdb..#EventIds') is not null drop table #EventIds
CREATE TABLE #EventIds(EvId int) ";
_connection.Open();
using (var command = new SqlCommand(sql, _connection))
{
command.ExecuteNonQuery();
}
using (var sqlBulkCopy = new SqlBulkCopy(_connection))
{
sqlBulkCopy.BulkCopyTimeout = 0;
sqlBulkCopy.DestinationTableName = destinationTableName;
sqlBulkCopy.WriteToServer(dataTable);
}
using (var command = new SqlCommand(sql, _connection))
{
command.CommandText = "Select Count(1) as RecordCount from #EventIds";
SqlDataReader reader = command.ExecuteReader();
if (reader.HasRows)
{
while (reader.Read())
{
totalInsertedRecordCount = Convert.ToInt32(reader["RecordCount"]);
}
}
}
}
return totalInsertedRecordCount;
}
Currently it takes around 8 seconds, but we need to make it faster. The reason is that our target is to insert 900,000 records, divided into batches of 100,000 each.
Can you give us any hints on how to make it faster?
PS. We tried a Dapper insert too, but it is not faster than SqlBulkCopy.
First convert your list into XML, something like:
List<int> Branches = new List<int>();
Branches.Add(1);
Branches.Add(2);
Branches.Add(3);
XElement xmlElements = new XElement("Branches", Branches.Select(i => new
XElement("branch", i)));
Then pass the XML to a stored procedure as a parameter and insert it directly into your table. Example:
DECLARE @XML XML
SET @XML = '<Branches>
<branch>1</branch>
<branch>2</branch>
<branch>3</branch>
</Branches>'
DECLARE @handle INT
DECLARE @PrepareXmlStatus INT
EXEC @PrepareXmlStatus = sp_xml_preparedocument @handle OUTPUT, @XML
SELECT * FROM OPENXML(@handle, '/Branches/branch', 2)
WITH (
-- '.' maps the column to the text of the <branch> element itself
branch int '.'
)
EXEC sp_xml_removedocument @handle
Batch Size
From what I understand, you are inserting with a BatchSize of 100,000. Higher is not always better.
Try lowering it to 5,000 instead and check the performance difference.
You increase the number of database round trips, but it may still be faster (too many factors, such as the row size, are involved to say in advance).
TableLock
Using SqlBulkCopyOptions.TableLock will improve your insert performance.
using (var sqlBulkCopy = new SqlBulkCopy(_connection, SqlBulkCopyOptions.TableLock, null))
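The batch-size and table-lock advice above can be combined roughly as follows. This is only a sketch: the class and method names (BulkCopySketch, Configure, Load) are made up for illustration, the Microsoft.Data.SqlClient namespace is an assumption (the older System.Data.SqlClient exposes the same types), and 5,000 is just the starting value suggested above, not a measured optimum.

```csharp
using System.Data;
using Microsoft.Data.SqlClient; // assumption: System.Data.SqlClient has the same types

public static class BulkCopySketch
{
    // Hypothetical helper: returns a SqlBulkCopy configured with a table lock
    // and a modest batch size. Nothing is sent to the server here.
    public static SqlBulkCopy Configure(SqlConnection connection, string destinationTable, int batchSize = 5000)
    {
        // The overload taking a SqlConnection plus options also takes a transaction; null means none.
        return new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null)
        {
            DestinationTableName = destinationTable,
            BatchSize = batchSize, // rows per batch; 0 (the default) sends everything as one batch
            BulkCopyTimeout = 0    // no timeout, as in the question's code
        };
    }

    // Hypothetical helper: copies the rows over an already opened connection.
    public static void Load(SqlConnection openConnection, DataTable rows, string destinationTable)
    {
        using (SqlBulkCopy bulkCopy = Configure(openConnection, destinationTable))
        {
            bulkCopy.WriteToServer(rows);
        }
    }
}
```

Timing a few different batchSize values against your own table is the only reliable way to pick one.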

How can I make the MySql insert more efficient/faster? [duplicate]

I am migrating my program from Microsoft SQL Server to MySQL. Everything works well except one issue with bulk copy.
In the solution with MS SQL the code looks like this:
connection.Open();
SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "testTable";
bulkCopy.WriteToServer(rawData);
Now I try to do something similar for MySQL. Because I think there would be bad performance I don't want to write the DataTable to a CSV file and do the insert from there with the MySqlBulkLoader class.
Any help would be highly appreciated.
You wrote: "Because I think there would be bad performance I don't want to write the DataTable to a CSV file and do the insert from there with the MySqlBulkLoader class."
Don't rule out a possible solution based on unfounded assumptions. I just tested the insertion of 100,000 rows from a System.Data.DataTable into a MySQL table using a standard MySqlDataAdapter#Update() inside a Transaction. It consistently took about 30 seconds to run:
using (MySqlTransaction tran = conn.BeginTransaction(System.Data.IsolationLevel.Serializable))
{
using (MySqlCommand cmd = new MySqlCommand())
{
cmd.Connection = conn;
cmd.Transaction = tran;
cmd.CommandText = "SELECT * FROM testtable";
using (MySqlDataAdapter da = new MySqlDataAdapter(cmd))
{
da.UpdateBatchSize = 1000;
using (MySqlCommandBuilder cb = new MySqlCommandBuilder(da))
{
da.Update(rawData);
tran.Commit();
}
}
}
}
(I tried a couple of different values for UpdateBatchSize but they didn't seem to have a significant impact on the elapsed time.)
By contrast, the following code using MySqlBulkLoader took only 5 or 6 seconds to run ...
string tempCsvFileSpec = @"C:\Users\Gord\Desktop\dump.csv";
using (StreamWriter writer = new StreamWriter(tempCsvFileSpec))
{
Rfc4180Writer.WriteDataTable(rawData, writer, false);
}
var msbl = new MySqlBulkLoader(conn);
msbl.TableName = "testtable";
msbl.FileName = tempCsvFileSpec;
msbl.FieldTerminator = ",";
msbl.FieldQuotationCharacter = '"';
msbl.Load();
System.IO.File.Delete(tempCsvFileSpec);
... including the time to dump the 100,000 rows from the DataTable to a temporary CSV file (using code similar to this), bulk-loading from that file, and deleting the file afterwards.
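The Rfc4180Writer class used above is not shown in this answer. A minimal stand-in could look like the sketch below; the name CsvDump is hypothetical, and it assumes that lines end with "\n", that fields are quoted only when needed, and that DBNull values are written as empty fields (which the loader will not necessarily treat as NULL).

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;
using System.Linq;

public static class CsvDump
{
    // Quote a field only when it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static void WriteDataTable(DataTable table, TextWriter writer, bool includeHeaders)
    {
        if (includeHeaders)
        {
            var names = table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName));
            writer.Write(string.Join(",", names));
            writer.Write('\n');
        }
        foreach (DataRow row in table.Rows)
        {
            // Invariant culture keeps decimal separators and dates independent of the machine locale.
            var fields = row.ItemArray.Select(v =>
                v is DBNull ? "" : Escape(Convert.ToString(v, CultureInfo.InvariantCulture) ?? ""));
            writer.Write(string.Join(",", fields));
            writer.Write('\n');
        }
    }
}
```

Calling CsvDump.WriteDataTable(rawData, writer, false) would then take the place of the Rfc4180Writer call.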
Similar to SqlBulkCopy, there is MySqlBulkCopy for MySQL.
Here is an example of how to use it.
public async Task<bool> MySqlBulkCopyAsync(DataTable dataTable)
{
    // AllowLoadLocalInfile is enabled in the connection string for the bulk copy
    using (var connection = new MySqlConnector.MySqlConnection(_connString + ";AllowLoadLocalInfile=True"))
    {
        await connection.OpenAsync();
        var bulkCopy = new MySqlBulkCopy(connection);
        bulkCopy.DestinationTableName = "yourtable";
        // the column mapping is required if you have an identity column in the table
        bulkCopy.ColumnMappings.AddRange(GetMySqlColumnMapping(dataTable));
        await bulkCopy.WriteToServerAsync(dataTable);
        return true;
    }
}
private List<MySqlBulkCopyColumnMapping> GetMySqlColumnMapping(DataTable dataTable)
{
    // Map each source column ordinal to the destination column with the same name
    List<MySqlBulkCopyColumnMapping> colMappings = new List<MySqlBulkCopyColumnMapping>();
    int i = 0;
    foreach (DataColumn col in dataTable.Columns)
    {
        colMappings.Add(new MySqlBulkCopyColumnMapping(i, col.ColumnName));
        i++;
    }
    return colMappings;
}
You can skip the column mapping if your table has no identity column.
If it does have an identity column, you must use the column mapping; otherwise no records are inserted into the table
and you just get a message like "x rows were copied but only 0 rows were inserted".
This class is available in the following library:
Assembly MySqlConnector, Version=1.0.0.0
With one of the BulkOperation NuGet packages, you can do this easily.
Here is an example using the package from https://www.nuget.org/packages/Z.BulkOperations/2.14.3/
MySqlConnection conn = DbConnection.OpenConnection();
DataTable dt = new DataTable("testtable");
MySqlDataAdapter da = new MySqlDataAdapter("SELECT * FROM testtable", conn);
MySqlCommandBuilder cb = new MySqlCommandBuilder(da);
da.Fill(dt);
Instead of using
......
da.UpdateBatchSize = 1000;
......
da.Update(dt)
just use the following two lines
var bulk = new BulkOperation(conn);
bulk.BulkInsert(dt);
will take only 5 seconds to copy the whole DataTable into MySQL without first dumping the 100,000 rows from the DataTable to a temporary CSV file.

Fastest way to update more than 50.000 rows in a mdb database c#

I searched the net but found nothing that really helped. I want to update a database from a list of articles, but the approach I found is really slow.
This is my code:
List<Article> costs = GetIdCosts(); //here there are 70.000 articles
conn = new OleDbConnection(string.Format(MDB_CONNECTION_STRING, PATH, PSW));
conn.Open();
transaction = conn.BeginTransaction();
using (var cmd = conn.CreateCommand())
{
cmd.Transaction = transaction;
cmd.CommandText = "UPDATE TABLE_RO SET TABLE_RO.COST = ? WHERE TABLE_RO.ID = ?;";
for (int i = 0; i < costs.Count; i++)
{
double cost = costs[i].Cost;
int id = costs[i].Id;
cmd.Parameters.AddWithValue("data", cost);
cmd.Parameters.AddWithValue("id", id);
if (cmd.ExecuteNonQuery() != 1) throw new Exception();
}
}
transaction.Commit();
But this approach takes a long time, something like 10 minutes or more. Is there another way to speed up this update? Thanks.
Try modifying your code to this:
List<Article> costs = GetIdCosts(); // here there are 70.000 articles
// Set up and open the database connection
conn = new OleDbConnection(string.Format(MDB_CONNECTION_STRING, PATH, PSW));
conn.Open();
// Begin the transaction before preparing, so the command is prepared with its transaction already assigned
OleDbTransaction transaction = conn.BeginTransaction();
// Set up a command
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
cmd.Transaction = transaction;
cmd.CommandText = "UPDATE TABLE_RO SET TABLE_RO.COST = ? WHERE TABLE_RO.ID = ?;";
// Set up the parameters (OleDb binds them by position) and prepare the command to be executed
cmd.Parameters.Add("?", OleDbType.Currency);
cmd.Parameters.Add("?", OleDbType.Integer);
cmd.Prepare();
// Start the loop, reusing the same two parameters for every row
for (int i = 0; i < costs.Count; i++)
{
    cmd.Parameters[0].Value = costs[i].Cost;
    cmd.Parameters[1].Value = costs[i].Id;
    try
    {
        cmd.ExecuteNonQuery();
    }
    catch (Exception ex)
    {
        // handle any exception here
    }
}
transaction.Commit();
conn.Close();
The cmd.Prepare method will speed things up since it creates a compiled version of the command on the data source.
Small change option:
Using StringBuilder and string.Format, construct one big command text:
var sb = new StringBuilder();
for(....){
sb.AppendLine(string.Format("UPDATE TABLE_RO SET TABLE_RO.COST = '{0}' WHERE TABLE_RO.ID = '{1}';",cost, id));
}
Even faster option:
As in the first example, construct a SQL string, but this time make the result look like:
-- declaring table variable
declare @data table (id int primary key, cost decimal(10,8))
-- insert union selected variables into the table
insert into @data
select 1121 as id, 10.23 as cost
union select 1122 as id, 58.43 as cost
union select ...
-- update TABLE_RO using update join syntax where inner join data
-- and copy value from column in @data to column in TABLE_RO
update dest
set dest.cost = source.cost
from TABLE_RO dest
inner join @data source on dest.id = source.id
This is the fastest you can get without using bulk inserts.
Performing mass-updates with Ado.net and OleDb is painfully slow. If possible, you could consider performing the update via DAO. Just add the reference to the DAO-Library (COM-Object) and use something like the following code (caution -> untested):
// Import Reference to "Microsoft DAO 3.6 Object Library" (COM)
string TargetDBPath = "insert Path to .mdb file here";
DAO.DBEngine dbEngine = new DAO.DBEngine();
DAO.Database daodb = dbEngine.OpenDatabase(TargetDBPath, false, false, "MS Access;pwd=" + "insert your db password here (if you have any)");
DAO.Recordset rs = daodb.OpenRecordset("insert target Table name here", DAO.RecordsetTypeEnum.dbOpenDynaset);
if (rs.RecordCount > 0)
{
    rs.MoveFirst();
    while (!rs.EOF)
    {
        // Load id of row
        int rowid = (int)rs.Fields["Id"].Value;
        // Iterate List to find entry with matching ID
        for (int i = 0; i < costs.Count; i++)
        {
            double cost = costs[i].Cost;
            int id = costs[i].Id;
            if (rowid == id)
            {
                // Save the changed value to the Cost field (not the Id field)
                rs.Edit();
                rs.Fields["Cost"].Value = cost;
                rs.Update();
                break; // assumes each Id appears only once in the list
            }
        }
        rs.MoveNext();
    }
}
rs.Close();
Note the fact that we are doing a full table scan here. But, unless the total number of records in the table is many orders of magnitude bigger than the number of updated records, it should still outperform the Ado.net approach significantly...

C# loops and mass insert

The following code inserts some values into my database. It gets 6 random values, puts them in an array and then inserts them into the database.
public void LottoTest(object sender, EventArgs e)
{
Dictionary<int, int> numbers = new Dictionary<int, int>();
Random generator = new Random();
while (numbers.Count < 6)
{
numbers[generator.Next(1, 49)] = 1;
}
string[] lotto = numbers.Keys.OrderBy(n => n).Select(s => s.ToString()).ToArray();
foreach (String _str in lotto)
{
Response.Write(_str);
Response.Write(",");
}
var connectionstring = "Server=C;Database=lotto;User Id=lottoadmin;Password=password;";
using (var con = new SqlConnection(connectionstring)) // Create connection with automatic disposal
{
con.Open();
using (var tran = con.BeginTransaction()) // Open a transaction
{
// Create command with parameters (DO NOT PUT VALUES IN LINE!!!!!)
string sql =
"insert into CustomerSelections(val1,val2,val3,val4,val5,val6) values (@val1,@val2,@val3,@val4,@val5,@val6)";
var cmd = new SqlCommand(sql, con);
cmd.Parameters.AddWithValue("val1", lotto[0]);
cmd.Parameters.AddWithValue("val2", lotto[1]);
cmd.Parameters.AddWithValue("val3", lotto[2]);
cmd.Parameters.AddWithValue("val4", lotto[3]);
cmd.Parameters.AddWithValue("val5", lotto[4]);
cmd.Parameters.AddWithValue("val6", lotto[5]);
cmd.Transaction = tran;
cmd.ExecuteNonQuery(); // Insert Record
tran.Commit(); // commit transaction
Response.Write("<br />");
Response.Write("<br />");
Response.Write("Ticket has been registered!");
}
}
}
What is the best way to loop and insert a MASS of entries into the database, let's say 100,000 records, via C#? I want to generate the random numbers with my method and reuse the insert I already have.
For true large scale inserts, SqlBulkCopy is your friend. The easy but inefficient way to do this is just to fill a DataTable with the data, and throw that at SqlBulkCopy, but it can be done twice as fast (trust me, I've timed it) by spoofing an IDataReader. I recently moved this code into FastMember for convenience, so you can just do something like:
class YourDataType {
public int val1 {get;set;}
public string val2 {get;set;}
... etc
public DateTime val6 {get;set;}
}
then create an iterator block (i.e. a non-buffered forwards only reader):
public IEnumerable<YourDataType> InventSomeData(int count) {
for(int i = 0 ; i < count ; i++) {
var obj = new YourDataType {
... initialize your random per row values here...
};
yield return obj;
}
}
then:
var data = InventSomeData(1000000);
using(var bcp = new SqlBulkCopy(connection))
using(var reader = ObjectReader.Create(data))
{ // note that you can be more selective with the column map
bcp.DestinationTableName = "CustomerSelections";
bcp.WriteToServer(reader);
}
You need Sql bulk insert. There is a nice tutorial on msdn http://blogs.msdn.com/b/nikhilsi/archive/2008/06/11/bulk-insert-into-sql-from-c-app.aspx
MSDN: Table-Valued Parameters
Basically, you fill a DataTable with the data you want to put into SQL Server.
DataTable tvp = new DataTable("LottoNumbers");
foreach (var numberSet in numbers)
// add the data to the DataTable
Then you pass the data through ADO.NET using code similar to this...
command.Parameters.Add("@CustomerLottoNumbers", SqlDbType.Structured);
command.Parameters["@CustomerLottoNumbers"].Value = tvp;
Then you could use SQL similar to this...
INSERT CustomerSelections
SELECT * from @CustomerLottoNumbers
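A slightly fuller sketch of the table-valued parameter approach is below. It rests on several assumptions that are not in the question: a table type named dbo.LottoNumbersType already exists on the server, the six columns are int, and the Microsoft.Data.SqlClient namespace is used (System.Data.SqlClient exposes the same types). The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Microsoft.Data.SqlClient; // assumption: System.Data.SqlClient has the same types

public static class TvpSketch
{
    // Assumes a table type created beforehand on the server, for example:
    //   CREATE TYPE dbo.LottoNumbersType AS TABLE
    //     (val1 int, val2 int, val3 int, val4 int, val5 int, val6 int);

    // Builds one DataTable row per ticket; each ticket must hold exactly six numbers.
    public static DataTable BuildTable(IEnumerable<int[]> tickets)
    {
        var table = new DataTable("LottoNumbers");
        for (int i = 1; i <= 6; i++) table.Columns.Add("val" + i, typeof(int));
        foreach (int[] ticket in tickets)
        {
            // int[] is not an object[], so box the values before handing them to Rows.Add
            table.Rows.Add(ticket.Cast<object>().ToArray());
        }
        return table;
    }

    // Wraps the DataTable in a structured parameter bound to the hypothetical table type.
    public static SqlParameter BuildParameter(DataTable tvp)
    {
        return new SqlParameter("@CustomerLottoNumbers", SqlDbType.Structured)
        {
            TypeName = "dbo.LottoNumbersType",
            Value = tvp
        };
    }

    // Inserts all rows in one round trip over an already opened connection.
    public static int Insert(SqlConnection openConnection, DataTable tvp)
    {
        const string sql =
            "INSERT INTO CustomerSelections (val1, val2, val3, val4, val5, val6) " +
            "SELECT val1, val2, val3, val4, val5, val6 FROM @CustomerLottoNumbers;";
        using (var command = new SqlCommand(sql, openConnection))
        {
            command.Parameters.Add(BuildParameter(tvp));
            return command.ExecuteNonQuery();
        }
    }
}
```

For very large row counts SqlBulkCopy is usually the first thing to try; a table-valued parameter is handy when the insert has to go through a stored procedure or be combined with other SQL.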
