Firebird .NET batch inserts - C#

Can someone help me speed this up?
I have a DataSet (from a CSV file) called dsResult and I want to pump it into a Firebird table.
Currently I am doing it one row at a time, but I would prefer to do this in batches of 500 rows.
I am using the Firebird .NET provider.
string connectionString = "ServerType=1;User=SYSDBA;Password=masterkey;Dialect=3;Database=MYDB.FDB";
string sql = "INSERT INTO POSTIN (NUMID, CHANGE, PLACENAME, BOXCODE, STRCODE, TOWN) VALUES (@NUMID, @CHANGE, @PLACENAME, @BOXCODE, @STRCODE, @TOWN)";
FbConnection conn = new FbConnection(connectionString);
conn.Open();
FbCommand command = new FbCommand(sql, conn);
foreach (DataRow r in dsResult.Tables[0].Rows)
{
command.Parameters.AddWithValue("@NUMID", r["NUMID"]);
command.Parameters.AddWithValue("@CHANGE", r["CHANGE"]);
command.Parameters.AddWithValue("@PLACENAME", r["PLACENAME"]);
command.Parameters.AddWithValue("@BOXCODE", r["BOXCODE"]);
command.Parameters.AddWithValue("@STRCODE", r["STRCODE"]);
command.Parameters.AddWithValue("@TOWN", r["TOWN"]);
command.ExecuteNonQuery();
}
It takes ages to run.
In Delphi I would have just used cached updates:
post 500 records at a time
and commit on the 500th.
Thanks

try something like this:
using (FbConnection c = new FbConnection(csb.ToString()))
{
c.Open();
FbBatchExecution fbe = new FbBatchExecution(c);
// loop through your generated INSERT statements here
foreach (string insertSql in insertStatements)
{
fbe.SqlStatements.Add(insertSql);
}
fbe.Execute();
}

Firebird's wire protocol doesn't support sending multiple commands in one batch (and in one round trip). Probably the best idea is to use EXECUTE BLOCK (essentially an anonymous stored procedure) and send the inserts there.
For example:
execute block
as
begin
insert into ...;
insert into ...;
...
end
and execute this.
BTW, FbBatchExecution will send one command at a time as well.
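Here is a minimal sketch (untested) of building such a block in C# for the POSTIN table from the question. It assumes all the columns are character data and embeds the values as escaped literals; a production version would declare typed input parameters in the block header instead. The batch must also stay small enough to keep the generated statement under Firebird's DSQL statement-size limit (roughly 64 KB before Firebird 3).

using System;
using System.Collections.Generic;
using System.Data;
using System.Text;
using FirebirdSql.Data.FirebirdClient;

static string Quote(object value) =>
    "'" + Convert.ToString(value).Replace("'", "''") + "'";

// Packs one batch of rows into a single EXECUTE BLOCK, i.e. one round trip.
static void InsertBatch(FbConnection conn, FbTransaction tx, IEnumerable<DataRow> rows)
{
    var sb = new StringBuilder("execute block as\nbegin\n");
    foreach (DataRow r in rows)
    {
        sb.AppendFormat(
            "insert into POSTIN (NUMID, CHANGE, PLACENAME, BOXCODE, STRCODE, TOWN) " +
            "values ({0}, {1}, {2}, {3}, {4}, {5});\n",
            Quote(r["NUMID"]), Quote(r["CHANGE"]), Quote(r["PLACENAME"]),
            Quote(r["BOXCODE"]), Quote(r["STRCODE"]), Quote(r["TOWN"]));
    }
    sb.Append("end");

    // Run the whole batch in the caller's transaction, then commit per batch.
    using (var cmd = new FbCommand(sb.ToString(), conn, tx))
        cmd.ExecuteNonQuery();
}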

You should add the parameters only once, and just change their values in the loop, so something like:
Create the parameters once:
command.Parameters.Add("@NUMID", FbDbType.Integer);
....
in the loop do:
foreach (DataRow r in dsResult.Tables[0].Rows)
{
command.Parameters["@NUMID"].Value = r["NUMID"];
....
}
This should really speed things up.
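Putting that together with a single transaction around the whole run, a rough sketch might look like the following (the FbDbType values are guesses; adjust them to the actual POSTIN column types):

using (var conn = new FbConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var command = new FbCommand(sql, conn, tx))
    {
        // create the parameters once, outside the loop
        command.Parameters.Add("@NUMID", FbDbType.Integer);
        command.Parameters.Add("@CHANGE", FbDbType.VarChar);
        command.Parameters.Add("@PLACENAME", FbDbType.VarChar);
        command.Parameters.Add("@BOXCODE", FbDbType.VarChar);
        command.Parameters.Add("@STRCODE", FbDbType.VarChar);
        command.Parameters.Add("@TOWN", FbDbType.VarChar);
        command.Prepare();

        foreach (DataRow r in dsResult.Tables[0].Rows)
        {
            command.Parameters["@NUMID"].Value = r["NUMID"];
            command.Parameters["@CHANGE"].Value = r["CHANGE"];
            command.Parameters["@PLACENAME"].Value = r["PLACENAME"];
            command.Parameters["@BOXCODE"].Value = r["BOXCODE"];
            command.Parameters["@STRCODE"].Value = r["STRCODE"];
            command.Parameters["@TOWN"].Value = r["TOWN"];
            command.ExecuteNonQuery();
        }

        tx.Commit(); // one commit for the whole run instead of one per row
    }
}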

Related

How do I batch 1000 inserts in the given loop scenario?

ORIGINAL QUESTION:
I have some code which looks like this:
for (int i = start_i; i <= i_s; i++)
{
var json2 = JObject.Parse(RequestServer("query_2", new List<JToken>(){json1["result"]}));
foreach (var data_1 in json2["result"]["data_1"])
{
var json3 = JObject.Parse(RequestServer("query_3", new List<JToken>(){data_1, 1}));
foreach (var data_2 in json3["result"]["data_2"])
{
var data_2_id = data_2["id"];
var index = data_2["other"];
}
foreach (var other in json3["result"]["other"])
{
var data_3_1_list = other["data_3"]["data_3_1"];
var data_4 = other["data_4"];
var data_5 = other["data_5"];
foreach (var data_3_1 in data_3_1_list)
{
//Console.WriteLine(data_3_1); <- very fast
insert_data((string)data_3_1); // <- very slow
}
}
}
}
This code was able to generate about 5000 WriteLines in less than a minute. However, I now want to insert that data into a database. When I try to do that, the code takes much, much longer to get through the 5000 sets of data.
My question is, how do I batch the database inserts into about 1000 inserts at a time, instead of doing one at a time? I have tried creating the insert statement using a StringBuilder, which is fine; what I can't figure out is how to generate 1000 at a time. I have tried using for loops up to 1000 and then trying to break out of the foreach loop before starting with the next 1000, but it just makes a big mess.
I have looked at questions like this example, but they are no good for my loop scenario. I know how to do bulk inserts at the SQL level; I just can't seem to figure out how to generate the bulk SQL inserts using the unique loop situation I have above, using those very specific loops in the example code.
The 5000 records were just a test run. The end code will have to deal with millions, if not billions of inserts. Based on rough calculations, the end result will use about 500GB of drive space when inserted into a database, so I will need to batch an optimum amount into RAM before inserting into the database.
UPDATE 1:
This is what happens in insert_data:
public static string insert_data(string data_3_1)
{
string str_conn = @"server=localhost;port=3306;uid=username;password=password;database=database";
MySqlConnection conn = null;
conn = new MySqlConnection(str_conn);
conn.Open();
MySqlCommand cmd = new MySqlCommand();
cmd.Connection = conn;
cmd.CommandText = "INSERT INTO database_table (data_3_1) VALUES (@data_3_1)";
cmd.Prepare();
cmd.Parameters.AddWithValue("@data_3_1", data_3_1);
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
return null;
}
You're correct that doing bulk inserts in batches can be a big throughput win. Here's why it's a win: When you do INSERT operations one at a time, the database server does an implicit COMMIT operation after every insert, and that can be slow. So, if you can wrap every hundred or so INSERTs in a single transaction, you'll reduce that overhead.
Here's an outline of how to do that. I'll try to put it in the context of your code, but you didn't show your MySqlConnection object or query objects, so this solution of mine will necessarily be incomplete.
var batchSize = 100;
var batchCounter = batchSize;
var beginBatch = new MySqlCommand("START TRANSACTION;", conn);
var endBatch = new MySqlCommand("COMMIT;", conn);
beginBatch.ExecuteNonQuery();
for (int i = start_i; i <= i_s; i++)
{
....
foreach (var data_1 in json2["result"]["data_1"])
{
...
foreach (var other in json3["result"]["other"])
{
...
foreach (var data_3_1 in other["data_3"]["data_3_1"])
{
//Console.WriteLine(data_3_1); <- very fast
/****************** batch handling **********************/
if ( --batchCounter <= 0) {
/* commit one batch, start the next */
endBatch.ExecuteNonQuery();
beginBatch.ExecuteNonQuery();
batchCounter = batchSize;
}
insert_data((string)data_3_1); // <- very slow
}
}
}
}
/* commit the last batch. It's OK if it contains no records */
endBatch.ExecuteNonQuery();
If you want, you can try different values of batchSize to find a good value. But generally something like the 100 I suggest works well.
Batch sizes of 1000 are also OK. But the larger each transaction gets, the more server RAM it uses before it's committed, and the longer it might block other programs using the same MySQL server.
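One caveat worth adding: a MySQL transaction only covers statements issued on the connection that started it, so for the batching above to help, insert_data has to reuse that same open connection (and ideally the same prepared command) instead of opening a new one per call. A rough sketch of that rework, with the helper name PrepareInsert being my own:

// Prepare the command once against the shared, already-open connection.
static MySqlCommand PrepareInsert(MySqlConnection conn)
{
    var cmd = new MySqlCommand(
        "INSERT INTO database_table (data_3_1) VALUES (@data_3_1)", conn);
    cmd.Parameters.Add("@data_3_1", MySqlDbType.VarChar);
    cmd.Prepare();
    return cmd;
}

// Called inside the loop: only the parameter value changes per row.
static void insert_data(MySqlCommand cmd, string data_3_1)
{
    cmd.Parameters["@data_3_1"].Value = data_3_1;
    cmd.ExecuteNonQuery();
}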
There's a nice and popular extension called MoreLinq that offers an extension method called Batch(int batchSize). To get an IEnumerable containing up to 1000 elements:
foreach (var upTo1000 in other["data_3"]["data_3_1"].Batch(1000))
{
// Build a query using the (up to) 1000 elements in upTo1000
}
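One way to use each batch is to turn it into a single multi-row INSERT. A sketch along those lines, assuming conn is an open MySqlConnection and the same single-column table as in insert_data (the @vN parameter names are made up):

foreach (var batch in other["data_3"]["data_3_1"].Batch(1000))
{
    var values = batch.Select(t => (string)t).ToList();
    var sb = new StringBuilder("INSERT INTO database_table (data_3_1) VALUES ");
    using (var cmd = new MySqlCommand { Connection = conn })
    {
        for (int i = 0; i < values.Count; i++)
        {
            // appends (@v0), (@v1), ... and binds each value
            sb.Append(i == 0 ? "(@v" : ", (@v").Append(i).Append(")");
            cmd.Parameters.AddWithValue("@v" + i, values[i]);
        }
        cmd.CommandText = sb.ToString();
        cmd.ExecuteNonQuery(); // one round trip per batch of up to 1000 rows
    }
}

Keep an eye on the server's max_allowed_packet setting; a single statement carrying 1000 rows has to fit within it.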
The best approach for me was using the LOAD DATA LOCAL INFILE statement. To make it work, you first have to turn on the MySQL server parameter local_infile.
I used the mysql2 package for Node.js and its query function:
db.query({
sql: "LOAD DATA LOCAL INFILE .......",
infileStreamFactory: <readable stream which provides your data in flat file format>
}, function(err, results) {....});
The trick is to provide a readable stream properly. By default, LOAD DATA expects a tab-delimited text file. Also, LOAD DATA expects some file name, and in your case, if you provide a stream, the file name can be an arbitrary string.
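If you are doing the same thing from C# with MySql.Data, the MySqlBulkLoader class wraps LOAD DATA [LOCAL] INFILE. A rough sketch (the file path and column name are placeholders, and recent connector versions also require AllowLoadLocalInfile=true in the connection string):

// Bulk-load a tab-delimited file into database_table via LOAD DATA LOCAL INFILE.
var loader = new MySqlBulkLoader(conn)
{
    TableName = "database_table",
    FileName = @"C:\temp\data.tsv",
    Local = true,                  // LOCAL: the file lives on the client
    NumberOfLinesToSkip = 0
};
loader.Columns.Add("data_3_1");
int rowsInserted = loader.Load();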

Backup SQL Server Schema With Data

I've been tasked with creating a backup of the data in our "default schema" database dbo to the same database using a new schema called dbobackup.
I honestly do not understand what this means as far as a database goes. Apparently, it is like having a database backup inside the existing database. I guess there is some advantage to doing that.
Anyway, I can't seem to find anything online that shows how to do this.
I have found a few posts on here about copying the schema without data, but I need the data too.
Backup SQL Schema Only?
How do I check to see if a schema exists, delete it if it does, and then create a schema that accepts data in the current database?
Once I have the new schema created, can I dump data in there with a simple command like this?
SELECT * INTO [dbobackup].Table1 FROM [dbo].Table1;
That line only backs up one table, though. If I need to do this to 245 tables for this particular customer, I'd need a script.
We have several customers, too, and their databases are not structured identically.
Could I do something along these lines?
I was thinking about creating a small console program to walk through the tables.
How would I modify something like the code below to do what I want?
public static void Backup(string sqlConnection)
{
using (var conn = new SqlConnection(sqlConnection))
{
conn.Open();
var tables = new List<String>();
var sqlSelectTables = "SELECT TableName FROM [dbo];";
using (var cmd = new SqlCommand(sqlSelectTables, conn))
{
using (var r = cmd.ExecuteReader())
{
while (r.Read())
{
var item = String.Format("{0}", r["TableName"]).Trim();
tables.Add(item);
}
}
}
var fmtSelectInto = "SELECT * INTO [dbobackup].{0} FROM [dbo].{0}; ";
using (var cmd = new SqlCommand(null, conn))
{
foreach (var item in tables)
{
cmd.CommandText = String.Format(fmtSelectInto, item);
cmd.ExecuteNonQuery();
}
}
}
}
SQL Server already has this built in. If you open SQL Server Management Studio and right click on the database you want to back up, then select all tasks then backup, you will get an option to back up your database into an existing database.
This is the important part and why you should use the built in functionality: You must copy the data from one DB to the other DB in the correct order or you'll get foreign key errors all over the place. If you have a lot of data tables with a lot of relationships, this will really be hard to nail down on your own. You could write code to make a complete graph of all of the dependencies and then figure out what order to copy the table data (which is essentially what SQL Server already does).
Additionally, there are third-party programs available to do this type of backup as well (see: Google).
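If you do end up going the console-program route from the question, a hedged sketch of what it might look like is below. It enumerates the dbo base tables through INFORMATION_SCHEMA and runs SELECT ... INTO for each one; note that SELECT INTO creates the target tables without constraints, indexes, or identity seeds, so this is a data snapshot rather than a full structural copy, which also sidesteps the foreign-key ordering problem mentioned above.

public static void Backup(string sqlConnection)
{
    using (var conn = new SqlConnection(sqlConnection))
    {
        conn.Open();

        // Create the backup schema if it does not exist yet.
        using (var cmd = new SqlCommand(
            "IF SCHEMA_ID('dbobackup') IS NULL EXEC('CREATE SCHEMA dbobackup');", conn))
        {
            cmd.ExecuteNonQuery();
        }

        // Enumerate the base tables in the dbo schema.
        var tables = new List<string>();
        using (var cmd = new SqlCommand(
            "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES " +
            "WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE';", conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                tables.Add((string)reader["TABLE_NAME"]);
            }
        }

        // Copy the data of each table into the dbobackup schema.
        foreach (var table in tables)
        {
            var copySql = String.Format(
                "SELECT * INTO [dbobackup].[{0}] FROM [dbo].[{0}];", table);
            using (var cmd = new SqlCommand(copySql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}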
This is sort of a "work in progress" approach I got started with that looks promising:
public static void CopyTable(
string databaseName, // i.e. Northwind
string tableName, // i.e. Employees
string schema1, // i.e. dbo
string schema2, // i.e. dboarchive
SqlConnection sqlConn)
{
var conn = new Microsoft.SqlServer.Management.Common.ServerConnection(sqlConn);
var server = new Microsoft.SqlServer.Management.Smo.Server(conn);
var db = new Microsoft.SqlServer.Management.Smo.Database(server, databaseName);
db.Tables.Refresh();
for (var itemId = 0; itemId < db.Tables.Count; itemId++)
{
var table = db.Tables.ItemById(itemId);
if (table.Name == tableName)
{
table.Schema = String.Format("{0}", DatabaseSchema.dboarchive);
table.Create();
}
}
}
The only issue I am currently running into is that my db variable always comes back with Tables.Count == 0.
If I get a chance to fix this, I will update.
For now, I've been told to remove this piece of code and check my code in.

SQLite C# extremely slow on update

I'm really struggling to iron out this issue. When I use the following code to update my database for large numbers of records it runs extremely slow. I've got 500,000 records to update which takes nearly an hour. During this operation, the journal file grows slowly with little change on the main SQLite db3 file - is this normal?
The operation only seems to be a problem when I have large numbers of records to update - it runs virtually instantly on smaller numbers of records.
Some other operations are performed on the database prior to this code running, so could they somehow be tying up the database? I've tried to ensure that all other connections are closed properly.
Thanks for any suggestions
using (SQLiteConnection sqLiteConnection = new SQLiteConnection("Data Source=" + _case.DatabasePath))
{
sqLiteConnection.Open();
using (SQLiteCommand sqLiteCommand = new SQLiteCommand("begin", sqLiteConnection))
{
sqLiteCommand.ExecuteNonQuery();
sqLiteCommand.CommandText = "UPDATE CaseFiles SET areaPk = @areaPk, KnownareaPk = @knownareaPk WHERE mhash = @mhash";
var pcatpk = sqLiteCommand.CreateParameter();
var pknowncatpk = sqLiteCommand.CreateParameter();
var pmhash = sqLiteCommand.CreateParameter();
pcatpk.ParameterName = "@areaPk";
pknowncatpk.ParameterName = "@knownareaPk";
pmhash.ParameterName = "@mhash";
sqLiteCommand.Parameters.Add(pcatpk);
sqLiteCommand.Parameters.Add(pknowncatpk);
sqLiteCommand.Parameters.Add(pmhash);
foreach (CatItem CatItem in _knownFiless)
{
if (CatItem.FromMasterHashes == true)
{
pcatpk.Value = CatItem.areaPk;
pknowncatpk.Value = CatItem.areaPk;
pmhash.Value = CatItem.mhash;
}
else
{
pcatpk.Value = CatItem.areaPk;
pknowncatpk.Value = null;
pmhash.Value = CatItem.mhash;
}
sqLiteCommand.ExecuteNonQuery();
}
sqLiteCommand.CommandText = "end";
sqLiteCommand.ExecuteNonQuery();
sqLiteCommand.Dispose();
sqLiteConnection.Close();
}
sqLiteConnection.Close();
}
The first thing is to ensure that you have an index on mhash.
Group commands into batches.
Use more than one thread.
Or:
Bulk import the records to a temporary table. Create an index on the mhash column. Perform a single update statement to update the records.
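A hedged sketch of that last approach with System.Data.SQLite is below. The table and column names follow the question; the temp-table name and the assumption that mhash is text are mine.

using (var conn = new SQLiteConnection("Data Source=" + _case.DatabasePath))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // Staging table; the PRIMARY KEY doubles as the index on mhash.
        new SQLiteCommand(
            "CREATE TEMP TABLE Incoming (mhash TEXT PRIMARY KEY, areaPk INTEGER, knownAreaPk INTEGER);",
            conn, tx).ExecuteNonQuery();

        // Bulk import all records into the staging table inside one transaction.
        using (var ins = new SQLiteCommand(
            "INSERT INTO Incoming (mhash, areaPk, knownAreaPk) VALUES (@mhash, @areaPk, @knownAreaPk);",
            conn, tx))
        {
            ins.Parameters.Add(new SQLiteParameter("@mhash"));
            ins.Parameters.Add(new SQLiteParameter("@areaPk"));
            ins.Parameters.Add(new SQLiteParameter("@knownAreaPk"));
            foreach (CatItem item in _knownFiless)
            {
                ins.Parameters["@mhash"].Value = item.mhash;
                ins.Parameters["@areaPk"].Value = item.areaPk;
                ins.Parameters["@knownAreaPk"].Value =
                    item.FromMasterHashes ? (object)item.areaPk : DBNull.Value;
                ins.ExecuteNonQuery();
            }
        }

        // One set-based update instead of 500,000 individual ones.
        new SQLiteCommand(
            "UPDATE CaseFiles SET " +
            "areaPk = (SELECT areaPk FROM Incoming WHERE Incoming.mhash = CaseFiles.mhash), " +
            "KnownareaPk = (SELECT knownAreaPk FROM Incoming WHERE Incoming.mhash = CaseFiles.mhash) " +
            "WHERE mhash IN (SELECT mhash FROM Incoming);",
            conn, tx).ExecuteNonQuery();

        tx.Commit();
    }
}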
You need to wrap everything inside a transaction, otherwise I believe SQLite will create and commit one for you for every update, hence the slowness. You clearly know that, looking at your code, but I am not sure that using "begin" and "end" commands achieves the same result here; you might end up with an empty transaction at the start and finish instead of one wrapping everything. Try something like this instead, just in case:
using (SQLiteTransaction mytransaction = myconnection.BeginTransaction())
{
using (SQLiteCommand mycommand = new SQLiteCommand(myconnection))
{
SQLiteParameter myparam = new SQLiteParameter();
mycommand.CommandText = "YOUR QUERY HERE";
mycommand.Parameters.Add(myparam);
foreach (CatItem CatItem in _knownFiless)
{
...
mycommand.ExecuteNonQuery();
}
}
mytransaction.Commit();
}
This part is most certainly your problem.
foreach (CatItem CatItem in _knownFiless)
{
....
sqLiteCommand.ExecuteNonQuery();
}
You are looping over a List(?) and executing a query against the database each time. That is not a good way to do it, because database calls are quite expensive. So you might consider using another way of updating these items.
The SQL code appears to be okay. The C# code is not wrong, but it has some redundancy (explicit close/dispose is not needed since you're using a using already).
There is a loop over _knownFiless (intended with the double s?); could that possibly be what runs slowly? It is unusual to run a query against the DB inside a loop; rather, you should create a query with the respective set of parameters. Consider that (especially without an index on the hash) you will perform n * m operations (n being the run count of the loop, m being the table size).
Considering that m is around 500k, and assuming that m = n, you will get 250,000,000,000 operations. That may well last an hour.
Former connections or operations should have no effect as far as I know.
You should also ensure that the internal structure of the database is not causing problems. Is there a compound index that is affected by this operation? Any foreign keys / complex constraints?

How to optimise LINQ to SQL in C#

I am attempting to update approximately 150,000 records in a table in SQL Server 2005 using LINQ to SQL. When it comes to xx.SubmitChanges() it is taking about 45 minutes.
I am running SQL Server as a local instance on a quad-core PC.
Does anyone know why this is taking so long? Or is that normal?
Code Sample:
var y = db.x.Where(j => j.NumberOfOrders > 0).Select(k => k);
foreach (var item in y)
{
try
{
item.k = "bla";
}
catch (Exception ex)
{
//
}
}
db.SubmitChanges();
This will take a lot of time because there is no bulk insert in LINQ to SQL. In this case it is adding the records to your context one by one, and they only get saved to your database when you call SubmitChanges(). That is why it is taking so long.
If you have a large number of records, like 150,000, it is better to use a bulk insert at the SQL level. That will take only a fraction of the time.
You don't need the Select() because it is projecting the same thing as the Where().
And there's no need for a try-catch around a simple assignment.
But definitely the best thing to do is the bulk insert approach that anishmarokey is talking about.
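For the bulk route on SQL Server, one common pattern is to push the changed rows into a staging table with SqlBulkCopy and then apply them with a single set-based UPDATE. A sketch under the assumption that a staging table dbo.x_staging exists with the same Id/k columns:

// Load the new values into a staging table in one streamed operation.
using (var bulk = new SqlBulkCopy(conn))
{
    bulk.DestinationTableName = "dbo.x_staging";
    bulk.BatchSize = 10000;
    bulk.WriteToServer(changedRows);   // a DataTable holding Id and the new k values
}

// Apply all changes with one set-based statement.
using (var cmd = new SqlCommand(
    "UPDATE t SET t.k = s.k FROM dbo.x AS t JOIN dbo.x_staging AS s ON s.Id = t.Id;",
    conn))
{
    cmd.ExecuteNonQuery();
}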
A large update such as this would be done with an UPDATE query (or stored proc) that can use the database to do the heavy lifting (and transaction management/consistency). I know you're simplifying the example, but what about something like this:
string CommandText = "UPDATE x SET k = @k WHERE NumberOfOrders > 0";
using (SqlConnection conn = new SqlConnection(My.Settings.DatabaseConnection)) {
using (SqlCommand cmd = new SqlCommand(CommandText, conn)) {
cmd.Parameters.AddWithValue("@k", "bla");
conn.Open();
cmd.ExecuteNonQuery();
}
}

What am I doing wrong with this query?

I can't seem to find why this function doesn't insert records into the database. :(
I get no error messages whatsoever, just nothing in the database.
EDIT: this is how my query looks now .. still nothing ..
connection.Open();
XmlNodeList nodeItem = rssDoc.SelectNodes("/edno23/posts/post");
foreach (XmlNode xn in nodeItem)
{
cmd.Parameters.Clear();
msgText = xn["message"].InnerText;
C = xn["user_from"].InnerText;
avatar = xn["user_from_avatar"].InnerText;
string endhash = GetMd5Sum(msgText.ToString());
cmd.Parameters.Add("@endhash", endhash);
cmd.CommandText = "Select * FROM posts Where hash=@endhash";
SqlCeDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
string msgs = reader["hash"].ToString();
if (msgs != endhash || msgs == null)
{
sql = "INSERT INTO posts([user],msg,avatar,[date],hash) VALUES(@username,@messige,@userpic,@thedate,@hash)";
cmd.CommandText = sql;
cmd.Parameters.Add("@username", C);
cmd.Parameters.Add("@messige", msgText.ToString());
cmd.Parameters.Add("@userpic", avatar.ToString());
cmd.Parameters.Add("@thedate", dt);
cmd.Parameters.Add("@hash", endhash);
cmd.ExecuteNonQuery();// executes query
adapter.Update(data);// saves the changes
}
}
reader.Close();
}
connection.Close();
Does nodeItem actually have any items in it? If not, the contents of the foreach loop aren't being executed.
What's the adapter and data being used for? The queries and updates seem be done via other commands and readers.
What does 'hash' actually contain? If it's a hash, why are you hashing the content of the hash inside the while loop? If not, why is it being compared against a hash in the query SELECT * FROM posts WHERE hash = @endhash?
Won't closing the connection before the end of the while loop invalidate the reader used to control the loop?
Lots of things going on here...
You are using the command 'cmd' to loop over records with a datareader, and then using the same 'cmd' command inside the while statement to execute an insert statement. You declared another command 'cmdAdd' before but don't seem to use it anywhere; is that what you intended to use for the insert statement?
You also close your data connection inside the while loop that iterates over your datareader. You are only going to read one record and then close the connection to your database that way; if your conditional for inserting is not met, you're not going to write anything to the database.
EDIT:
You really should open and close the connection to the database outside the foreach on the xmlnodes. If you have 10 nodes to loop over, the db connection is going to be opened and closed 10 times (well, connection pooling will probably prevent that, but still...)
You are also loading the entire 'posts' table into a dataset for seemingly no reason. You're not changing any of the values in the dataset, yet you are calling an update on it repeatedly (at "saves the changes"). If the 'posts' table is even remotely large, this is going to suck a lot of memory for no reason (on a handheld device, no less).
Is anything returned from "Select * FROM posts Where hash=@endhash"?
If not, nothing inside the while loop matters....
Why are you closing the Database Connection inside the while loop?
The code you posted should throw an exception when you try to call cmd.ExecuteNonQuery() with an unopen DB connection object.
SqlCeCommand.ExecuteNonQuery() method returns the number of rows affected.
Why don't you check whether it is returning 1 or not in the debugger as shown below?
int rowsAffectedCount = cmd.ExecuteNonQuery();
Hope it helps :-)
You've got some issues with not implementing "using" blocks. I've added some to your inner code below. The blocks for the connection and select command are more wishful thinking on my part. I hope you're doing the same with the data adapter.
using (var connection = new SqlCeConnection(connectionString))
{
connection.Open();
var nodeItem = rssDoc.SelectNodes("/edno23/posts/post");
foreach (XmlNode xn in nodeItem)
{
using (
var selectCommand =
new SqlCeCommand(
"Select * FROM posts Where hash=@endhash",
connection))
{
var msgText = xn["message"].InnerText;
var c = xn["user_from"].InnerText;
var avatar = xn["user_from_avatar"].InnerText;
var endhash = GetMd5Sum(msgText);
selectCommand.Parameters.Add("@endhash", endhash);
selectCommand.CommandText =
"Select * FROM posts Where hash=@endhash";
using (var reader = selectCommand.ExecuteReader())
{
while (reader.Read())
{
var msgs = reader["hash"].ToString();
if (msgs == endhash && msgs != null)
{
continue;
}
const string COMMAND_TEXT =
"INSERT INTO posts([user],msg,avatar,[date],hash) VALUES(@username,@messige,@userpic,@thedate,@hash)";
using (
var insertCommand =
new SqlCeCommand(
COMMAND_TEXT, connection))
{
insertCommand.Parameters.Add("@username", c);
insertCommand.Parameters.Add(
"@messige", msgText);
insertCommand.Parameters.Add(
"@userpic", avatar);
insertCommand.Parameters.Add("@thedate", dt);
insertCommand.Parameters.Add(
"@hash", endhash);
insertCommand.ExecuteNonQuery();
// executes query
}
adapter.Update(data); // saves the changes
}
reader.Close();
}
}
}
connection.Close();
}
Of course with the additional nesting, parts should be broken out as separate methods.
I suspect your problem is that you're trying to reuse the same SqlCeCommand instances.
Try making a new SqlCeCommand within the while loop. Also, you can use the using statement to close your data objects.
Why are you calling adapter.Update(data) since you're not changing the DataSet at all? I suspect you want to call adapter.Fill(data). The Update method will save any changes in the DataSet to the database.
How to debug programs: http://www.drpaulcarter.com/cs/debug.php
Seriously, can you post some more information about where it's working? Does it work if you use SQL Server Express instead of SQL CE? If so, can you break out SQL Profiler and take a look at the SQL commands being executed?
