Million inserts: SqlBulkCopy timeout - c#

We already have a running system that handles connection strings for multiple databases (DB2, Oracle, MS SQL Server).
Currently, we are using ExecuteNonQuery() to do the inserts.
We want to improve the performance by using SqlBulkCopy() instead of ExecuteNonQuery(). Some of our clients have more than 50 million records.
We don't want to use SSIS, because our system supports multiple databases.
I created a sample project to test the performance of SqlBulkCopy(), with a simple read-and-insert function for MS SQL Server.
Here's the small function:
public void insertIntoSQLServer()
{
    using (SqlConnection SourceConnection = new SqlConnection(_sourceConnectionString))
    {
        // Open the connection to get the data from the source table
        SourceConnection.Open();
        using (SqlCommand command = new SqlCommand("select * from " + _sourceSchemaName + "." + _sourceTableName + ";", SourceConnection))
        {
            // Read from the source table
            command.CommandTimeout = 2400;
            SqlDataReader reader = command.ExecuteReader();
            using (SqlConnection DestinationConnection = new SqlConnection(_destinationConnectionString))
            {
                DestinationConnection.Open();
                // Clean the destination table
                new SqlCommand("delete from " + _destinationSchemaName + "." + _destinationTableName + ";", DestinationConnection).ExecuteNonQuery();
                using (SqlBulkCopy bc = new SqlBulkCopy(DestinationConnection))
                {
                    bc.DestinationTableName = string.Format("[{0}].[{1}]", _destinationSchemaName, _destinationTableName);
                    bc.NotifyAfter = 10000;
                    //bc.SqlRowsCopied += bc_SqlRowsCopied;
                    bc.WriteToServer(reader);
                }
            }
        }
    }
}
When my dummy table has fewer than 200,000 rows, the bulk copy works fine. But when it has more than 200,000 records, I get the following errors:
Attempt to invoke bulk copy on an object that has a pending operation.
OR
The wait operation timed out (for the IDataReader)
I increased the CommandTimeout for the reader, and that seems to have solved the timeout issue related to the IDataReader.
Am I doing something wrong in the code?

Can you try adding the following before the call to WriteToServer ...
bc.BatchSize = 10000;
bc.BulkCopyTimeout = 0;
I don't know what the default batch size or timeout is, but I suspect this might be your issue.
Hope that helps
Also, you can try playing with different Batch Sizes for optimal performance.
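For illustration, here is a rough sketch of how those settings could be wired into the original method (a sketch only; it reuses the question's fields, and EnableStreaming assumes .NET 4.5 or later):
// Sketch only: assumes System.Data.SqlClient and the connection strings / names from the question.
using (SqlConnection source = new SqlConnection(_sourceConnectionString))
using (SqlConnection destination = new SqlConnection(_destinationConnectionString))
{
    source.Open();
    destination.Open();
    using (SqlCommand select = new SqlCommand("select * from " + _sourceSchemaName + "." + _sourceTableName + ";", source))
    {
        select.CommandTimeout = 2400;
        using (SqlDataReader reader = select.ExecuteReader())
        using (SqlBulkCopy bc = new SqlBulkCopy(destination))
        {
            bc.DestinationTableName = string.Format("[{0}].[{1}]", _destinationSchemaName, _destinationTableName);
            bc.BatchSize = 10000;      // rows sent to the server per batch
            bc.BulkCopyTimeout = 0;    // 0 = no timeout for the whole copy
            bc.EnableStreaming = true; // stream rows from the reader instead of buffering them (.NET 4.5+)
            bc.WriteToServer(reader);
        }
    }
}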

You can try this
bc.BatchSize = 100000; // How many rows you want to insert at a time
bc.BulkCopyTimeout = 60; // Time in seconds. If you want an infinite wait, assign 0.

Related

Timeout expired with SQL Server insert

I am trying to insert 200,000 documents from a folder into a varbinary column in a SQL Server database. I get a timeout expiration message after inserting 80,000 documents. The average file size is about 250 KB and the maximum file size is 50 MB. I am running this C# program on the server where the database is located.
Please suggest.
The error:
The timeout period elapsed prior to completion of the operation or the server is not responding.
The code:
string spath = @"c:\documents";
string[] files = Directory.GetFiles(spath, "*.*", SearchOption.AllDirectories);
Console.Write("Files Count:" + files.Length);
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    string insertSQL = "INSERT INTO table_Temp(doc_content, doc_path) values(@File, @path)";
    SqlCommand cmd = new SqlCommand(insertSQL, con);
    var pFile = cmd.Parameters.Add("@File", SqlDbType.VarBinary, -1);
    var pPath = cmd.Parameters.Add("@path", SqlDbType.Text);
    var tran = con.BeginTransaction();
    var fn = 0;
    foreach (string docPath in files)
    {
        string newPath = docPath.Remove(0, spath.Length);
        string archive = new DirectoryInfo(docPath).Parent.Name;
        fn += 1;
        using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
        {
            pFile.Value = stream;
            pPath.Value = newPath;
            cmd.Transaction = tran;
            cmd.ExecuteNonQuery();
            if (fn % 10 == 0)
            {
                tran.Commit();
                tran = con.BeginTransaction();
                Console.Write("|");
            }
            Console.Write(".");
        }
    }
    tran.Commit();
}
For this, I would suggest using SqlBulkCopy, since it should be able to handle the data insertion much more easily. Further, as others have pointed out, you might want to increase the timeout for your command.
While I agree some sort of bulk copy may be best, if you must do this in the C# program, your only option is probably to increase the timeout value. You can do this after your SqlCommand object has been created via cmd.CommandTimeout = <new timeout>; The CommandTimeout property is an integer representing the number of seconds before the timeout, or zero if you never want it to time out.
See the MSDN docs for details
You should be able to set the timeout for the transaction directly on that object in your application code; this way you are not changing SQL Server settings.
Secondly, you can also build the batches in your application. You say you can get 80k docs in before a timeout, so set your batch size at 50k, process them, commit them, and grab the next batch. Having your application manage the batching also lets you catch SQL errors, such as a timeout, and then dynamically adjust the batch size and retry without ever crashing. This is the entire reason for writing your application in the first place; otherwise you could just use the wizard in Management Studio and manually insert your files.
I highly recommend batching over other options.
@Shubham Pandey also provides a great link to SQL bulk copy info, which in turn has links to more. You should definitely experiment with the bulk copy class and see if you can get additional gains with it as well.
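To make the batching idea concrete, here is a minimal sketch that reuses the question's con, cmd, pFile, pPath, spath and files variables; the 5,000-row batch size is only an assumption to tune:
// Sketch only: application-managed batching with an explicit commit every N rows.
const int batchSize = 5000;          // assumed starting point; adjust based on testing
int inBatch = 0;
cmd.CommandTimeout = 0;              // or a generous finite value
SqlTransaction tran = con.BeginTransaction();
cmd.Transaction = tran;
foreach (string docPath in files)
{
    using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
    {
        pFile.Value = stream;
        pPath.Value = docPath.Remove(0, spath.Length);
        cmd.ExecuteNonQuery();
    }
    if (++inBatch == batchSize)
    {
        tran.Commit();                    // flush this batch to the server
        tran = con.BeginTransaction();    // start the next batch
        cmd.Transaction = tran;
        inBatch = 0;
    }
}
tran.Commit();                            // commit the final partial batch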

C# Low in memory and Insert into Oracle DB data - 1 million rows - OutOfMemory Exception

The Problem:
I have a web application where people can upload xml, xmls, csv files.
I then take their content and insert it into my Oracle DB.
Technical details:
I recently ran into a problem where I get an OutOfMemoryException when trying to use the data.
The previous developer created a list of lists to manage the data. However, this is giving us an OutOfMemoryException.
We are using the LinqToExcel library.
Sample code:
excel = new ExcelQueryFactory(excelFile);
IEnumerable<RowNoHeader> data = from row in excel.WorksheetNoHeader(sheetName)
                                select row;
List<List<string>> d = new List<List<string>>(data.Count());
foreach (RowNoHeader row in data)
{
    List<string> list = new List<string>();
    foreach (Cell cell in row)
    {
        string cellValue = cell.Value.ToString().Trim(' ').Trim(null);
        list.Add(cellValue);
    }
    d.Add(list);
}
I have tried to change the code and instead did this:
string connectionstring = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0;HDR=YES;';", excelFile);
OleDbConnection connection = new OleDbConnection();
connection.ConnectionString = connectionstring;
OleDbCommand excelCommand = new OleDbCommand();
excelCommand.Connection = connection;
excelCommand.CommandText = String.Format("Select * FROM [{0}$]", sheetName);
connection.Open();
DataTable dtbl = CreateTable(TableColumns);
OleDbDataReader reader = excelCommand.ExecuteReader();
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    dtbl.Rows.Add(row);
}
using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
{
    using (OracleDataAdapter adapter = new OracleDataAdapter(command))
    {
        using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
        {
            OracleTransaction trans = _oracleConnection.BeginTransaction();
            command.Transaction = trans;
            adapter.InsertCommand = builder.GetInsertCommand(true);
            adapter.Update(dtbl);
            trans.Commit();
        }
    }
}
However, I still get the same OutOfMemoryException.
I have read online that I should make my project x64 and use the following:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
However, I can't change my web application to run on x64.
My solution was to process the data in batches, like this:
int rowCount = 0;
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    dtbl.Rows.Add(row);
    rowCount++;
    if (rowCount % _batches == 0 && rowCount != 0)
    {
        DBInsert(dtbl, selectCommand);
        dtbl = CreateTable(TableColumns);
    }
}

private void DBInsert(DataTable dt, string selectCommand)
{
    using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
    {
        using (OracleDataAdapter adapter = new OracleDataAdapter(command))
        {
            using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
            {
                OracleTransaction trans = _oracleConnection.BeginTransaction();
                command.Transaction = trans;
                adapter.InsertCommand = builder.GetInsertCommand(true);
                adapter.Update(dt);
                trans.Commit();
            }
        }
    }
}
It works, but it is very slow. I was wondering if there is a way to either solve the memory problem while processing serially, or to write to the database in parallel.
I have tried inserting the data in parallel using threads, but this takes a lot of memory and throws an OutOfMemoryException as well.
Just don't load 1M rows into a DataTable. Use whatever bulk import mechanism is available to load a stream of rows. Oracle, like SQL Server, offers several ways to bulk import data.
Collections like List or DataTable use an internal buffer to store data, which they reallocate at twice the previous size whenever it fills up. With 1M rows that leads to a lot of reallocations and a lot of memory fragmentation. The runtime may no longer be able to find a contiguous block of memory large enough to store 2M entries. That's why it's important to set the capacity parameter when creating a new List.
Apart from that, it serves no purpose to load everything into memory and then send it to the database. It's actually faster to send the data as soon as each file is read, or as soon as a sufficiently large number of rows has been loaded. Instead of trying to load 1M rows at once, read 500 or 1000 of them at a time and send them to the database.
Furthermore, Oracle's ADO.NET provider includes the OracleBulkCopy class, which works in a way similar to SqlBulkCopy for SQL Server. Its WriteToServer method can accept a DataTable or a DataReader. You can use the DataTable overload to send batches of items, or, even better, use the overload that accepts a reader and let the class collect the batches and send them to the database.
E.g.:
using (var bcp = new OracleBulkCopy(connectionString))
{
    bcp.BatchSize = 5000;
    bcp.DestinationTableName = "MyTable";
    // For each source/target column pair, add a mapping
    bcp.ColumnMappings.Add("ColumnA", "ColumnA");

    var reader = excelCommand.ExecuteReader();
    bcp.WriteToServer(reader);
}

DataAdapter.Update() performance

I have a relatively simple routine that looks at database entries for media files, calculates the width, height and file size, and writes them back into the database.
The database is SQLite, using the System.Data.SQLite library, and there are ~4000 rows to process. I load all rows into an ADO.NET DataTable, update the rows/columns with the new values, then run adapter.Update(table); on it.
Loading the dataset from the DB takes half a second or so, and updating all the rows with the image width/height and getting the file length from FileInfo took maybe 30 seconds. Fine.
The adapter.Update(table); command took somewhere in the vicinity of 5 to 7 minutes to run.
That seems awfully excessive. The ID is a PK INTEGER and thus, according to SQLite's docs, inherently indexed, yet even so I can't help but think that if I were to run a separate update command for each individual update, this would have completed much faster.
I had considered ADO/adapters to be relatively low level (as opposed to ORMs, anyway), and this terrible performance surprised me. Can anyone shed some light on why it would take 5-7 minutes to update a batch of ~4000 records against a local SQLite database?
As a possible aside, is there some way to "peek into" how ADO is processing this? Internal library step-throughs or...?
Thanks
public static int FillMediaSizes() {
    // returns the count of records updated
    int recordsAffected = 0;
    DataTable table = new DataTable();
    SQLiteDataAdapter adapter = new SQLiteDataAdapter();
    using (SQLiteConnection conn = new SQLiteConnection(Globals.Config.dbAppNameConnectionString))
    using (SQLiteCommand cmdSelect = new SQLiteCommand())
    using (SQLiteCommand cmdUpdate = new SQLiteCommand()) {
        cmdSelect.Connection = conn;
        cmdSelect.CommandText =
            "SELECT ID, MediaPathCurrent, MediaWidth, MediaHeight, MediaFilesizeBytes " +
            "FROM Media " +
            "WHERE MediaType = 1 AND (MediaWidth IS NULL OR MediaHeight IS NULL OR MediaFilesizeBytes IS NULL);";
        cmdUpdate.Connection = conn;
        cmdUpdate.CommandText =
            "UPDATE Media SET MediaWidth = @w, MediaHeight = @h, MediaFilesizeBytes = @b WHERE ID = @id;";
        cmdUpdate.Parameters.Add("@w", DbType.Int32, 4, "MediaWidth");
        cmdUpdate.Parameters.Add("@h", DbType.Int32, 4, "MediaHeight");
        cmdUpdate.Parameters.Add("@b", DbType.Int32, 4, "MediaFilesizeBytes");
        SQLiteParameter param = cmdUpdate.Parameters.Add("@id", DbType.Int32);
        param.SourceColumn = "ID";
        param.SourceVersion = DataRowVersion.Original;
        adapter.SelectCommand = cmdSelect;
        adapter.UpdateCommand = cmdUpdate;
        try {
            conn.Open();
            adapter.Fill(table);
            conn.Close();
        }
        catch (Exception e) {
            Core.ExceptionHandler.HandleException(e, true);
            throw new DatabaseOperationException("", e);
        }
        foreach (DataRow row in table.Rows) {
            try {
                using (System.Drawing.Image img = System.Drawing.Image.FromFile(row["MediaPathCurrent"].ToString())) {
                    System.IO.FileInfo fi;
                    fi = new System.IO.FileInfo(row["MediaPathCurrent"].ToString());
                    if (img != null) {
                        int width = img.Width;
                        int height = img.Height;
                        long length = fi.Length;
                        row["MediaWidth"] = width;
                        row["MediaHeight"] = height;
                        row["MediaFilesizeBytes"] = (int)length;
                    }
                }
            }
            catch (Exception e) {
                Core.ExceptionHandler.HandleException(e);
                DevUtil.Print(e);
                continue;
            }
        }
        try {
            recordsAffected = adapter.Update(table);
        }
        catch (Exception e) {
            Core.ExceptionHandler.HandleException(e);
            throw new DatabaseOperationException("", e);
        }
    }
    return recordsAffected;
}
Use Connection.BeginTransaction() to speed up the DataAdapter update.
conn.Open() 'open connection
Dim myTrans As SQLiteTransaction
myTrans = conn.BeginTransaction()
'Associate the transaction with the select command object of the DataAdapter
objDA.SelectCommand.Transaction = myTrans
objDA.Update(objDT)
Try
myTrans.Commit()
Catch ex As Exception
myTrans.Rollback()
End Try
conn.Close()
This vastly speeds up the update.
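In C#, the same idea applied to the FillMediaSizes method above might look roughly like this (an untested sketch; it assumes the conn, adapter, table and recordsAffected variables from the question):
// Sketch only: wrap adapter.Update in one SQLite transaction so the ~4000
// UPDATEs share a single journal flush instead of one commit per statement.
conn.Open();
using (SQLiteTransaction tran = conn.BeginTransaction())
{
    adapter.UpdateCommand.Transaction = tran;   // cmdUpdate from the method above
    try
    {
        recordsAffected = adapter.Update(table);
        tran.Commit();
    }
    catch
    {
        tran.Rollback();
        throw;
    }
}
conn.Close();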
Loading the dataset from the DB takes half a second or so
This is a single SQL statement (so it's fast). Execute the SQL SELECT, populate the dataset, done.
updating all the rows with image width/height and getting the file
length from FileInfo took maybe 30 seconds. Fine.
This is updating the in-memory data (so that's fast too); you're changing rows in the dataset and not talking to SQL at all.
The adapter.Update(table); command took somewhere in the vicinity of 5
to 7 minutes to run.
This will run a SQL UPDATE for every updated row, which is why it's slow.
yet even so I can't help but think that if I were to run a separate
update command for each individual update, this would have completed
much faster.
This is basically what it's doing anyway!
From MSDN
The update is performed on a by-row basis. For every inserted,
modified, and deleted row, the Update method determines the type of
change that has been performed on it (Insert, Update or Delete).
Depending on the type of change, the Insert, Update, or Delete command
template executes to propagate the modified row to the data source.
When an application calls the Update method, the DataAdapter examines
the RowState property, and executes the required INSERT, UPDATE, or
DELETE statements iteratively for each row, based on the order of the
indexes configured in the DataSet.
is there some way to "peek into" how ADO is processing this?
Yes: Debug .NET Framework Source Code in Visual Studio 2012?

Efficiently inserting multiple records in oracle db using Oracle client

I have a C#-WCF/WinForms application. It inserts 450,000+ records into a staging table in an Oracle database using the Oracle client API and a stored proc containing a simple insert query.
It takes about 15 minutes to insert the records, and sometimes the records don't get inserted at all, giving all sorts of timeout errors on the WCF side.
Is there any efficient way of doing these inserts?
Thanks for reading.
Here's my code which does the batch insert:
OracleTransaction tran = null;
UpdateRowSource oldURS = this.cmd.UpdatedRowSource;
OracleCommand oldCmd = this.dbAdapter.InsertCommand;
int oldUBS = this.dbAdapter.UpdateBatchSize;
try
{
    SetOutputParams();
    this.OpenDBConnection();
    tran = this.dbConn.BeginTransaction();
    this.cmd.Transaction = tran;
    this.cmd.UpdatedRowSource = UpdateRowSource.OutputParameters;
    this.dbAdapter.InsertCommand = this.cmd;
    this.dbAdapter.UpdateBatchSize = size;
    this.dbAdapter.Update(data);
    tran.Commit();
    SetOutputParamValues();
}
catch (OracleException ex)
{
    if (tran != null) {
        tran.Rollback();
    }
    throw;
}
finally
{
    this.CloseDBConnection();
    this.cmd.Parameters.Clear();
    this.cmd.UpdatedRowSource = oldURS;
    this.dbAdapter.InsertCommand = oldCmd;
    this.dbAdapter.UpdateBatchSize = oldUBS;
}
The fastest way to load data into a table is datapump (the impdp utility).
Another fast way is SQL*Loader.
If you want to stick with C#, look into bulk operations.
My google karma found the following examples
I have used SQL*Loader frequently from C# to bulk-load records into a staging table.
The meat of the code goes like this:
public string RunSqlLdr(string user, string password, string dsn, string fileNameCtl, string fileNameLog)
{
    // Redirect both streams so we can write/read them.
    var cmdProcessInfo = new System.Diagnostics.ProcessStartInfo("cmd.exe")
    {
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
        UseShellExecute = false
    };
    // Start the process.
    var process = System.Diagnostics.Process.Start(cmdProcessInfo);
    // Issue the sqlldr command and exit.
    process.StandardInput.WriteLine("cd " + _directoryPath);
    process.StandardInput.WriteLine("sqlldr " + user + "/" + password + "@" + dsn + " control=" + fileNameCtl + " log=" + fileNameLog);
    process.StandardInput.WriteLine("exit");
    // Read all the output generated from it.
    var output = process.StandardOutput.ReadToEnd();
    process.Dispose();
    return output;
}
This will return the output from the command line, but you'll also want to check the generated log file for records loaded and error counts etc.
I am inserting a lot of data into an Oracle database that is located in Australia, far away from where I am running the client application in C#.
Here's a summary of how I do it. It amazes me how fast I can insert hundreds of thousands of records using array binding.
This is not the exact code, but you get the idea:
using System.Data;                  // CommandType, ParameterDirection
using Oracle.DataAccess.Client;     // ODP.NET

int numRecords = 2;
int[] DISTRIBNO = new int[numRecords];
DISTRIBNO[0] = 100;
DISTRIBNO[1] = 101;
string sql = "INSERT INTO Distributors (distribno) VALUES (:DISTRIBNO)";
cnn = new Oracle.DataAccess.Client.OracleConnection(conString);
cnn.Open();
using (Oracle.DataAccess.Client.OracleCommand cmd = cnn.CreateCommand())
{
    cmd.CommandText = sql;
    cmd.CommandType = CommandType.Text;
    cmd.BindByName = true;
    // To use ArrayBinding, we need to set ArrayBindCount
    cmd.ArrayBindCount = numRecords;
    cmd.CommandTimeout = 0;
    cmd.Parameters.Add(
        ":DISTRIBNO",
        Oracle.DataAccess.Client.OracleDbType.Int32,
        DISTRIBNO,                  // the array of values to bind
        ParameterDirection.Input);
    cmd.ExecuteNonQuery();
} // using
Carlos Merighe.
You will probably have to abandon the "DataAdapter" code to get any real performance.
Here is the .NET class that seems to be on target:
Oracle.DataAccess.Client.OracleBulkCopy
OracleBulkCopy Class
An OracleBulkCopy object efficiently bulk loads or copies data into an Oracle table from another data source.
https://docs.oracle.com/cd/E11882_01/win.112/e23174/OracleBulkCopyClass.htm#ODPNT7446
https://docs.oracle.com/cd/E85694_01/ODPNT/BulkCopyCtor3.htm
This is from the (Oracle-owned) ODP.NET.
See
https://www.nuget.org/packages/Oracle.ManagedDataAccess/
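A rough sketch of how it could be used (table name and data source are placeholders; check which ODP.NET package and version you have, since bulk copy support differs between the unmanaged and managed drivers):
// Sketch only: assumes Oracle.DataAccess.Client (ODP.NET) and a staging DataTable.
using (var conn = new OracleConnection(connectionString))
{
    conn.Open();
    using (var bulkCopy = new OracleBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "STAGING_TABLE"; // placeholder table name
        bulkCopy.BatchSize = 10000;
        bulkCopy.BulkCopyTimeout = 0;                    // no timeout
        bulkCopy.WriteToServer(stagingDataTable);        // placeholder DataTable; an IDataReader also works
    }
}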

Retrieve & update 20000 data records stops working

I am using the following code to get all data records from an MS SQL database, and I try to update every single record. The code is used in a web service. The issue is that the code runs fine with 1,000 data records, but now I have 20,000 data records and the code first returned a timeout. Then I set cmd.CommandTimeout to zero to have no timeout. Now when I invoke the function in the IE web service test page, the window stays blank and keeps trying to load, but nothing happens. Only 150 data records are updated.
Do you have any idea where the issue might be? Is the code not the best; what should I change?
Thank you very much!
WorldSignia
MyCode:
private string AddNewOrgBez()
{
    try
    {
        SqlConnection sqlconn = new SqlConnection(this.connectionString);
        SqlCommand cmd;
        SqlDataReader reader;
        sqlconn.Open();
        cmd = new SqlCommand("SELECT * FROM dbo.mydata", sqlconn);
        cmd.CommandTimeout = 0;
        reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            // Fetch the fields
            string okuerzel = reader["O_KURZ"].ToString();
            string bezeichnung = reader["O_BEZ"].ToString();
            string[] lines = CreateNewOrgBez(bezeichnung);
            string sqlcmd = "UPDATE dbo.mydata SET WEB_OBEZ1 = '" + lines[0] + "', WEB_OBEZ2 = '" + lines[1] + "', WEB_OBEZ3 = '" + lines[2] + "' WHERE O_KURZ = '" + okuerzel + "'";
            SqlConnection sqlconn2 = new SqlConnection(this.connectionString);
            sqlconn2.Open();
            SqlCommand cmd2 = new SqlCommand(sqlcmd, sqlconn2);
            cmd2.CommandTimeout = 0;
            cmd2.ExecuteNonQuery();
            sqlconn2.Close();
        }
        reader.Close();
        sqlconn.Close();
        return "OK";
    }
    catch (Exception ex)
    {
        return ex.Message;
    }
}
You are leaking every SqlCommand here. I suggest you review your use of the SqlClient classes to find the ones that are IDisposable and restructure your code so they are always freed, using the using construct.
For example, this ensures Dispose gets called even if there is an exception in the bracketed code:
using (SqlCommand cmd2 = new SqlCommand(sqlcmd, sqlconn2))
{
cmd2.CommandTimeout = 0;
cmd2.ExecuteNonQuery();
}
Using a new SqlConnection for every UPDATE is expensive too; this should be done outside the loop. Redundant connection establishment is likely the explanation for your timeout.
Take note of @ck's comment that, for efficiency, this type of piecemeal client-side operation is not as good as doing the heavy lifting server-side. You should be able to get this code working better, but that does not mean it is the ideal/fastest solution.
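As a sketch of that restructuring (one read connection, one write connection opened once, and a parameterized UPDATE reused per row; the parameter types and sizes here are guesses and should match your schema):
// Sketch only: table/column names come from the question, parameter sizes are assumptions.
using (var readConn = new SqlConnection(this.connectionString))
using (var writeConn = new SqlConnection(this.connectionString))
using (var readCmd = new SqlCommand("SELECT O_KURZ, O_BEZ FROM dbo.mydata", readConn))
using (var updateCmd = new SqlCommand(
    "UPDATE dbo.mydata SET WEB_OBEZ1 = @l1, WEB_OBEZ2 = @l2, WEB_OBEZ3 = @l3 WHERE O_KURZ = @kurz",
    writeConn))
{
    // Parameters also remove the string concatenation (and the SQL injection risk).
    updateCmd.Parameters.Add("@l1", SqlDbType.NVarChar, 255);
    updateCmd.Parameters.Add("@l2", SqlDbType.NVarChar, 255);
    updateCmd.Parameters.Add("@l3", SqlDbType.NVarChar, 255);
    updateCmd.Parameters.Add("@kurz", SqlDbType.NVarChar, 50);

    readConn.Open();
    writeConn.Open();   // opened once, not once per row

    using (SqlDataReader reader = readCmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string[] lines = CreateNewOrgBez(reader["O_BEZ"].ToString());
            updateCmd.Parameters["@l1"].Value = lines[0];
            updateCmd.Parameters["@l2"].Value = lines[1];
            updateCmd.Parameters["@l3"].Value = lines[2];
            updateCmd.Parameters["@kurz"].Value = reader["O_KURZ"].ToString();
            updateCmd.ExecuteNonQuery();
        }
    }
}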
I found the issue.
You need to first get all the data records, for example into a new DataTable. The structure I used does not work, because it reads data from the database and updates the database at the same time. After changing it to the new structure, it works.
You were using two different connections, one to read and one to update, and one of them was blocking the other. That is why it began to work once you read all your data first.
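For reference, the buffer-first structure could be as simple as this (a sketch; it reuses the table and column names from the question):
// Sketch only: buffer the SELECT results first, then update afterwards,
// so the reader is closed before any UPDATE runs.
var buffered = new DataTable();
using (var conn = new SqlConnection(this.connectionString))
using (var adapter = new SqlDataAdapter("SELECT O_KURZ, O_BEZ FROM dbo.mydata", conn))
{
    adapter.Fill(buffered);   // Fill opens and closes the reader internally
}
// ...then loop over buffered.Rows on a single open connection and run the
// parameterized UPDATE from the sketch above for each row.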
I wonder if you are running into an OutOfMemoryException. Can you profile your application and check the memory usage?
Since you are just overwriting the variables in the while loop, why don't you try taking them out of the loop?
