Timeout expired with SQL Server insert - c#

I am trying to insert 200,000 documents from a folder into a SQL Server varbinary(max) column. I get a timeout error after inserting about 80,000 documents. The average file size is about 250 KB and the maximum is 50 MB. I am running this C# program on the server where the database is located.
Please suggest a fix.
The error:
The timeout period elapsed prior to completion of the operation or the server is not responding.
The code:
string spath = @"c:\documents";
string[] files = Directory.GetFiles(spath, "*.*", SearchOption.AllDirectories);
Console.Write("Files Count:" + files.Length);
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    string insertSQL = "INSERT INTO table_Temp(doc_content, doc_path) values(@File, @path)";
    SqlCommand cmd = new SqlCommand(insertSQL, con);
    var pFile = cmd.Parameters.Add("@File", SqlDbType.VarBinary, -1);
    var pPath = cmd.Parameters.Add("@path", SqlDbType.Text);
    var tran = con.BeginTransaction();
    var fn = 0;
    foreach (string docPath in files)
    {
        string newPath = docPath.Remove(0, spath.Length);
        string archive = new DirectoryInfo(docPath).Parent.Name;
        fn += 1;
        using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
        {
            pFile.Value = stream;
            pPath.Value = newPath;
            cmd.Transaction = tran;
            cmd.ExecuteNonQuery();
            if (fn % 10 == 0)
            {
                tran.Commit();
                tran = con.BeginTransaction();
                Console.Write("|");
            }
            Console.Write(".");
        }
    }
    tran.Commit();
}

For this, I would suggest using SqlBulkCopy, since it handles large-volume inserts much more efficiently. Further, as others have pointed out, you may want to increase the timeout on your command.
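A minimal sketch of what SqlBulkCopy could look like for this scenario, reusing spath, files, and connectionString from the question's code (the 500-file flush threshold is an arbitrary illustration, not a recommendation):

// Rough sketch, not the poster's code: load files into a DataTable in batches
// and push each batch with SqlBulkCopy. Table/column names match the question.
var table = new DataTable();
table.Columns.Add("doc_content", typeof(byte[]));
table.Columns.Add("doc_path", typeof(string));

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "table_Temp";
        bulk.BulkCopyTimeout = 0; // no timeout
        bulk.ColumnMappings.Add("doc_content", "doc_content");
        bulk.ColumnMappings.Add("doc_path", "doc_path");

        foreach (string docPath in files)
        {
            table.Rows.Add(File.ReadAllBytes(docPath), docPath.Remove(0, spath.Length));
            if (table.Rows.Count >= 500) // flush a batch to keep memory bounded
            {
                bulk.WriteToServer(table);
                table.Clear();
            }
        }
        if (table.Rows.Count > 0)
            bulk.WriteToServer(table);
    }
}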

While I would agree this may be best handled as a bulk copy of some sort, if you must do this in the C# program, your only option is probably to increase the timeout value. You can do this after your SqlCommand object has been created via cmd.CommandTimeout = <new timeout>;. The CommandTimeout property is an integer representing the number of seconds for the timeout, or zero if you never want it to time out.
See the MSDN docs for details.
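For example, right after creating the command in the question's code (the 600-second value is just an illustration):

SqlCommand cmd = new SqlCommand(insertSQL, con);
cmd.CommandTimeout = 600; // timeout in seconds; 0 means wait indefinitely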

You should be able to set the timeout for the transaction directly on that object in your application code; this way you are not changing SQL Server settings.
Secondly, you can also build batches in your application. You say you can get through about 80k docs before a timeout, so set your batch size at 50k, process them, commit them, and grab the next batch; a rough sketch of this loop follows below. Having your application manage batching also allows you to catch SQL errors, such as timeouts, and then dynamically adjust the batch size and retry without ever crashing. This is the entire reason for writing your application in the first place; otherwise you could just use the wizard in Management Studio and manually insert your files.
I highly recommend batching over other options.
@Shubham Pandey also provides a great link to SqlBulkCopy info, which in turn links to more information. You should definitely experiment with the bulk copy class and see whether you can get additional gains from it as well.
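A minimal sketch of the batching-with-retry idea, applied to the question's loop (the batch size, the error-number check, and the InsertBatch helper are illustrative assumptions, not a drop-in implementation):

// requires System.Linq for Skip/Take
int batchSize = 50000;
int index = 0;
while (index < files.Length)
{
    var batch = files.Skip(index).Take(batchSize).ToArray();
    try
    {
        // hypothetical helper: opens a transaction, runs the parameterized
        // INSERT for each file in the batch (as in the question's code), then commits
        InsertBatch(con, batch);
        index += batch.Length;
    }
    catch (SqlException ex) when (ex.Number == -2) // -2 is the client-side timeout error
    {
        batchSize = Math.Max(1000, batchSize / 2); // shrink the batch and retry
    }
}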

Related

Reducing CPU Load on a remote select from a database

From a C# application, I am remotely selecting results from a custom production database, using a criterion spanning around three minutes.
Every time the select command is executed, the CPU on the PC I am running the application from goes up to around 50%. But surely the load should be on the database server that I am connecting to?
Why would the C# application's CPU usage rocket to 50% until the data is retrieved for reading?
Some background
I have worked out from debugging that the select statement on the remote database takes around 30-40 seconds, bearing in mind I am selecting with a criterion that uses the indexed column.
While selecting data from the remote DB, I have monitored Task Manager and the CPU sits at 50% until the select has completed; this can last around 30-40 seconds each loop.
If I run the select in the remote DB's native SQL engine, there is no lag; the data (if any) is returned immediately.
I know it's not the parsing of the result set that is taking up the CPU load, as some selects return nothing.
Here is some code I am using.
OdbcConnection remoteConn = new OdbcConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString());
remoteConn.Open();
OdbcCommand remoteCommand = new OdbcCommand();
remoteCommand.Connection = remoteConn;
using (remoteConn)
{
string localSql = "";
string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
remoteCommand.CommandText = remoteSql;
OdbcDataReader remoteReader;
remoteReader = remoteCommand.ExecuteReader();
while (remoteReader.Read())
{
for (int i = 0; i < 68; i++)
{
localSql += ",'" + remoteReader[i].ToString() + "'";
}
}
}
I ran a performance and diagnostic test on the application and it yielded this result.
How, if at all, can I reduce this CPU load or even eliminate it completely? It is completely out of the ordinary and I have no clue how to go about it.
Thanks
Thanks for the information, Dan. Here are my thoughts...
I believe the primary reason your app is consuming so much CPU is the drivers you are using.
Since you are connecting to a SQL Server database, you should use the SQL Server drivers, which know how to optimize the transport of data between client and server.
To use the appropriate drivers, make sure you use SqlConnection, SqlCommand, etc.
This will allow SQL Server to stream the results to your client as you query the data reader.
Secondly, do not use the ExecuteReader() method on the DbCommand object. One of the many wonderful features of the SQL Server drivers is the ExecuteReaderAsync() method.
Since this command is an I/O-bound operation (not compute-bound), there is no need to block the calling thread. When the results come back, they will arrive on an I/O completion thread.
Here is a code sample of what your code might look like after the change.
using (var remoteConn = new SqlConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString()))
{
remoteConn.Open();
using (var remoteCommand = new SqlCommand())
{
remoteCommand.Connection = remoteConn;
string localSql = "";
string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
remoteCommand.CommandText = remoteSql;
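// note: 'await' requires the containing method to be declared 'async';
// the loop below could also use 'await remoteReader.ReadAsync()'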
var remoteReader = await remoteCommand.ExecuteReaderAsync();
while (remoteReader.Read())
{
for (int i = 0; i < 68; i++)
{
localSql += ",'" + remoteReader[i].ToString() + "'";
}
}
}
}
I am assuming that the select command is run from a separate computer from the one hosting the SQL DB.
Frankly, it is a bit baffling. The only thing that comes to mind is that it might be spending time passing and processing the metadata. Have you tried changing your query to return, say, only one piece of data?
For example:
select top 1 last_update from tracking where last_update > 212316247440000000
select count (*) from tracking where last_update > 212316247440000000
This will clarify whether the data plus metadata is causing the issue.
Could it be parsing of the result? Are you using some sort of data layer that automatically parses result sets into domain objects? That could be very CPU intensive.
The best way to handle your I/O-bound operation is to use a Task, because it does not require changing the existing code:
var task = Task.Factory.StartNew(() => {
//your code, or @Randy's code with a small change, goes here:
using (var remoteConn = new SqlConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString()))
{
remoteConn.Open();
using (var remoteCommand = new SqlCommand())
{
remoteCommand.Connection = remoteConn;
string localSql = "";
string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
remoteCommand.CommandText = remoteSql;
var remoteReader = remoteCommand.ExecuteReader();
while (remoteReader.Read())
{
for (int i = 0; i < 68; i++)
{
localSql += ",'" + remoteReader[i].ToString() + "'";
}
}
}
}
});
//you can use 'task.Wait();' to wait for the task to complete.

DataAdapter.Update() performance

I have a relatively simple routine that looks at database entries for media files, calculates the width, height and file size, and writes them back into the database.
The database is SQLite, using the System.Data.SQLite library, processing ~4000 rows. I load all rows into an ADO table, update the rows/columns with the new values, then run adapter.Update(table); on it.
Loading the dataset from the db takes half a second or so; updating all the rows with image width/height and getting the file length from FileInfo took maybe 30 seconds. Fine.
The adapter.Update(table); command took somewhere in the vicinity of 5 to 7 minutes to run.
That seems awfully excessive. The ID is an INTEGER primary key and thus, according to SQLite's docs, is inherently indexed; yet even so I can't help but think that if I were to run a separate update command for each individual update, this would have completed much faster.
I had considered ADO/adapters to be relatively low level (as opposed to ORMs, anyway), and this terrible performance surprised me. Can anyone shed some light on why it would take 5-7 minutes to update a batch of ~4000 records against a local SQLite database?
As a possible aside, is there some way to "peek into" how ADO is processing this? Internal library step-throughs, or...?
Thanks
public static int FillMediaSizes() {
// returns the count of records updated
int recordsAffected = 0;
DataTable table = new DataTable();
SQLiteDataAdapter adapter = new SQLiteDataAdapter();
using (SQLiteConnection conn = new SQLiteConnection(Globals.Config.dbAppNameConnectionString))
using (SQLiteCommand cmdSelect = new SQLiteCommand())
using (SQLiteCommand cmdUpdate = new SQLiteCommand()) {
cmdSelect.Connection = conn;
cmdSelect.CommandText =
"SELECT ID, MediaPathCurrent, MediaWidth, MediaHeight, MediaFilesizeBytes " +
"FROM Media " +
"WHERE MediaType = 1 AND (MediaWidth IS NULL OR MediaHeight IS NULL OR MediaFilesizeBytes IS NULL);";
cmdUpdate.Connection = conn;
cmdUpdate.CommandText =
"UPDATE Media SET MediaWidth = #w, MediaHeight = #h, MediaFilesizeBytes = #b WHERE ID = #id;";
cmdUpdate.Parameters.Add("#w", DbType.Int32, 4, "MediaWidth");
cmdUpdate.Parameters.Add("#h", DbType.Int32, 4, "MediaHeight");
cmdUpdate.Parameters.Add("#b", DbType.Int32, 4, "MediaFilesizeBytes");
SQLiteParameter param = cmdUpdate.Parameters.Add("#id", DbType.Int32);
param.SourceColumn = "ID";
param.SourceVersion = DataRowVersion.Original;
adapter.SelectCommand = cmdSelect;
adapter.UpdateCommand = cmdUpdate;
try {
conn.Open();
adapter.Fill(table);
conn.Close();
}
catch (Exception e) {
Core.ExceptionHandler.HandleException(e, true);
throw new DatabaseOperationException("", e);
}
foreach (DataRow row in table.Rows) {
try {
using (System.Drawing.Image img = System.Drawing.Image.FromFile(row["MediaPathCurrent"].ToString())) {
System.IO.FileInfo fi;
fi = new System.IO.FileInfo(row["MediaPathCurrent"].ToString());
if (img != null) {
int width = img.Width;
int height = img.Height;
long length = fi.Length;
row["MediaWidth"] = width;
row["MediaHeight"] = height;
row["MediaFilesizeBytes"] = (int)length;
}
}
}
catch (Exception e) {
Core.ExceptionHandler.HandleException(e);
DevUtil.Print(e);
continue;
}
}
try {
recordsAffected = adapter.Update(table);
}
catch (Exception e) {
Core.ExceptionHandler.HandleException(e);
throw new DatabaseOperationException("", e);
}
}
return recordsAffected;
}
Use Connection.BeginTransaction() to speed up the DataAdapter update.
conn.Open() 'open connection
Dim myTrans As SQLiteTransaction
myTrans = conn.BeginTransaction()
'Associate the transaction with the select command object of the DataAdapter
objDA.SelectCommand.Transaction = myTrans
objDA.Update(objDT)
Try
myTrans.Commit()
Catch ex As Exception
myTrans.Rollback()
End Try
conn.Close()
This vastly speeds up the update.
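A rough C# equivalent for the routine in the question (a sketch only; it reuses the conn, adapter, table and recordsAffected variables from FillMediaSizes, and assigns the transaction to both commands to be safe):

conn.Open();
using (SQLiteTransaction tran = conn.BeginTransaction())
{
    // one journal/fsync cycle for the whole batch instead of one per row
    adapter.SelectCommand.Transaction = tran;
    adapter.UpdateCommand.Transaction = tran;
    try
    {
        recordsAffected = adapter.Update(table);
        tran.Commit();
    }
    catch
    {
        tran.Rollback();
        throw;
    }
}
conn.Close();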
Loading the dataset from the db takes half a second or so
This is a single SQL statement (so it's fast). Execute the SQL SELECT, populate the dataset, done.
updating all the rows with image width/height and getting the file
length from FileInfo took maybe 30 seconds. Fine.
This is updating the in-memory data (so that's fast too); it changes rows in the dataset and doesn't talk to SQL at all.
The adapter.Update(table); command took somewhere in the vicinity of 5
to 7 minutes to run.
This will run a SQL UPDATE for every modified row, which is why it's slow.
yet even so I can't help but think that if I were to run a separate
update command for each individual update, this would have completed
much faster.
This is basically what it's doing anyway!
From MSDN
The update is performed on a by-row basis. For every inserted,
modified, and deleted row, the Update method determines the type of
change that has been performed on it (Insert, Update or Delete).
Depending on the type of change, the Insert, Update, or Delete command
template executes to propagate the modified row to the data source.
When an application calls the Update method, the DataAdapter examines
the RowState property, and executes the required INSERT, UPDATE, or
DELETE statements iteratively for each row, based on the order of the
indexes configured in the DataSet.
is there some way to "peek into" how ADO is processing this?
Yes: Debug .NET Framework Source Code in Visual Studio 2012?

Million inserts: SqlBulkCopy timeout

We already have a running system that handles all connection strings (DB2, Oracle, SQL Server).
Currently, we are using ExecuteNonQuery() to do some inserts.
We want to improve performance by using SqlBulkCopy() instead of ExecuteNonQuery(). We have some clients with more than 50 million records.
We don't want to use SSIS, because our system supports multiple databases.
I created a sample project to test the performance of SqlBulkCopy(). I created a simple read-and-insert function for SQL Server.
Here's the small function:
public void insertIntoSQLServer()
{
using (SqlConnection SourceConnection = new SqlConnection(_sourceConnectionString))
{
//Open the connection to get the data from the source table
SourceConnection.Open();
using (SqlCommand command = new SqlCommand("select * from " + _sourceSchemaName + "." + _sourceTableName + ";", SourceConnection))
{
//Read from the source table
command.CommandTimeout = 2400;
SqlDataReader reader = command.ExecuteReader();
using (SqlConnection DestinationConnection = new SqlConnection(_destinationConnectionString))
{
DestinationConnection.Open();
//Clean the destination table
new SqlCommand("delete from " + _destinationSchemaName + "." + _destinationTableName + ";", DestinationConnection).ExecuteNonQuery();
using (SqlBulkCopy bc = new SqlBulkCopy(DestinationConnection))
{
bc.DestinationTableName = string.Format("[{0}].[{1}]", _destinationSchemaName, _destinationTableName);
bc.NotifyAfter = 10000;
//bc.SqlRowsCopied += bc_SqlRowsCopied;
bc.WriteToServer(reader);
}
}
}
}
}
When I have fewer than 200,000 records in my dummyTable, the bulk copy works fine. But when there are over 200,000 records, I get the following errors:
Attempt to invoke bulk copy on an object that has a pending operation.
OR
The wait operation timed out (for the IDataReader)
I increased the CommandTimeout for the reader, and it seems that this solved the timeout issue related to the IDataReader.
Am I doing something wrong in the code?
Can you try adding the following before the call to WriteToServer ...
bc.BatchSize = 10000;
bc.BulkCopyTimeout = 0;
I don't know what the default batch size or timeout is, but I suspect this might be your issue.
Hope that helps
Also, you can try playing with different Batch Sizes for optimal performance.
You can try this
bc.BatchSize = 100000; // how many rows you want to insert at a time
bc.BulkCopyTimeout = 60; // time in seconds; assign 0 if you want an infinite wait

Reading large volume of data from oracle database and export it as .dat file using C#

We have a query that is executed on a monthly basis and returns about 1 GB of data.
The query is just a select with inner joins; no cursors are involved.
Currently they execute this query in Toad and export the data from the output window as a .dat file.
Doing this manually in Toad takes about 2 hours.
After that they change the header text in the .dat file to meaningful names before sharing it with our clients.
I want to automate this by creating an exe that performs the whole process.
The code looks like this:
using (OracleConnection conn = new OracleConnection())
{
conn.ConnectionString = ConfigurationManager.ConnectionStrings["connectionString"].ConnectionString;
conn.Open();
using (OracleCommand cmd = new OracleCommand(commandText))
{
cmd.Connection = conn;
using (OracleDataReader dtReader = cmd.ExecuteReader())
{
outputContent = new StringBuilder();
while (dtReader != null && dtReader.Read())
{
for (int i = 0; i < dtReader.FieldCount; i++)
{
outputContent.Append(dtReader[i]);
outputContent.Append(delimiter);
}
outputContent = outputContent.Replace(delimiter, Environment.NewLine, outputContent.Length - 1, 1);
}
}
}
}
outputPath = string.Format(ConfigurationManager.AppSettings["OutputPath"], DateTime.Now.Ticks);
outputStream = new StreamWriter(outputPath, true);
//Export
outputStream.Write(outputContent.ToString());
outputStream.Close();
From the log, I got to know that the ExecuteReader statement completes within seconds.
But reading the data from the data reader throws "Exception message is ORA-03113: end-of-file on communication channel at System.Data.OracleClient.OracleConnection.CheckError(OciErrorHandle errorHandle, Int32 rc)" after about 8 hours.
Could anyone please let me know whether the above approach is good for handling 1 GB of data, or is there a better way of doing this?
Thanks,
Gayathri
Maybe you can try
CommandBehavior.SequentialAccess
From MSDN:
Use SequentialAccess to retrieve large values and binary data.
A sample of how to use it is sketched below.
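A minimal sketch of that idea applied to the question's code, also streaming each row straight to the output file instead of accumulating 1 GB in a StringBuilder (commandText, delimiter, outputPath and the connection string key are taken from the question; treat this as an illustration, not a drop-in replacement):

// Sketch: read with SequentialAccess and write each row to the file as it is
// read, so the full result set is never held in memory.
using (OracleConnection conn = new OracleConnection(
           ConfigurationManager.ConnectionStrings["connectionString"].ConnectionString))
using (OracleCommand cmd = new OracleCommand(commandText, conn))
using (StreamWriter writer = new StreamWriter(outputPath))
{
    conn.Open();
    using (OracleDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            var fields = new string[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
                fields[i] = reader.GetValue(i).ToString(); // read columns in order
            writer.WriteLine(string.Join(delimiter, fields));
        }
    }
}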
You can export the data directly from a PL/SQL procedure and have a shell script (instead of an exe) that launches it from SQL*Plus.
See this question on SO for what to put in the procedure to export data.

.net application's oracle connection timing out

I have the code below trying to do a bulk copy from Oracle to SQL Server 2005, and it keeps timing out. How can I extend the Oracle connection timeout? From what I have read on the web, it seems I can't.
OracleConnection source = new OracleConnection(GetOracleConnectionString());
source.Open();
SqlConnection dest = new SqlConnection(GetSQLConnectionString() );
dest.Open();
OracleCommand sourceCommand = new OracleCommand(@"select * from table", source);
using (OracleDataReader dr = sourceCommand.ExecuteReader())
{
using (SqlBulkCopy s = new SqlBulkCopy(dest))
{
s.DestinationTableName = "Defects";
s.NotifyAfter = 100;
s.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
s.WriteToServer(dr);
s.Close();
}
}
source.Close();
dest.Close();
Here is my Oracle connection string:
return "User Id=USER;Password=pass;Data Source=(DESCRIPTION=" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=14.12.7.2)(PORT=1139))" +
"(CONNECT_DATA=(SID=QCTRP1)));";
You can set the s.BulkCopyTimeout option.
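For instance (a sketch against the question's code; the 600-second value is arbitrary):

using (SqlBulkCopy s = new SqlBulkCopy(dest))
{
    s.BulkCopyTimeout = 600; // seconds per batch; 0 waits indefinitely
    s.DestinationTableName = "Defects";
    s.WriteToServer(dr);
}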
In your connection string there are 'Connection Lifetime' and 'Connection Timeout' parameters that you can set accordingly. See here for the full reference.
BTW, I know you didn't ask this, but have you considered an ETL tool for migrating your DB records (e.g. Informatica, FME, etc.)? While your approach is valid, it isn't going to be very performant since you are hydrating all of the records from one DB to the client and then serializing them to another DB. For small bulk sets, this isn't a big issue, but if you were processing hundreds of thousands of rows, you might want to consider an official ETL tool.
