I have a relatively simple routine that looks at database entries for media files, reads the width, height and file size of each file, and writes them back into the database.
The database is SQLite, using the System.Data.SQLite library, and I'm processing ~4000 rows. I load all the rows into an ADO.NET DataTable, update the rows/columns with the new values, then run adapter.Update(table); on it.
Loading the dataset from the db takes half a second or so, and updating all the rows with image width/height and getting the file length from FileInfo took maybe 30 seconds. Fine.
The adapter.Update(table); command took somewhere in the vicinity of 5 to 7 minutes to run.
That seems awfully excessive. The ID is an INTEGER PRIMARY KEY and thus, according to SQLite's docs, inherently indexed, yet even so I can't help but think that if I were to run a separate update command for each individual update, this would have completed much faster.
I had considered ADO/adapters to be relatively low level (as opposed to ORMs, anyway), and this terrible performance surprised me. Can anyone shed some light on why it would take 5-7 minutes to update a batch of ~4000 records against a local SQLite database?
As a possible aside, is there some way to "peek into" how ADO is processing this? Internal library step-throughs, or...?
Thanks
public static int FillMediaSizes() {
    // returns the count of records updated
    int recordsAffected = 0;
    DataTable table = new DataTable();
    SQLiteDataAdapter adapter = new SQLiteDataAdapter();

    using (SQLiteConnection conn = new SQLiteConnection(Globals.Config.dbAppNameConnectionString))
    using (SQLiteCommand cmdSelect = new SQLiteCommand())
    using (SQLiteCommand cmdUpdate = new SQLiteCommand()) {

        cmdSelect.Connection = conn;
        cmdSelect.CommandText =
            "SELECT ID, MediaPathCurrent, MediaWidth, MediaHeight, MediaFilesizeBytes " +
            "FROM Media " +
            "WHERE MediaType = 1 AND (MediaWidth IS NULL OR MediaHeight IS NULL OR MediaFilesizeBytes IS NULL);";

        cmdUpdate.Connection = conn;
        cmdUpdate.CommandText =
            "UPDATE Media SET MediaWidth = @w, MediaHeight = @h, MediaFilesizeBytes = @b WHERE ID = @id;";

        cmdUpdate.Parameters.Add("@w", DbType.Int32, 4, "MediaWidth");
        cmdUpdate.Parameters.Add("@h", DbType.Int32, 4, "MediaHeight");
        cmdUpdate.Parameters.Add("@b", DbType.Int32, 4, "MediaFilesizeBytes");
        SQLiteParameter param = cmdUpdate.Parameters.Add("@id", DbType.Int32);
        param.SourceColumn = "ID";
        param.SourceVersion = DataRowVersion.Original;

        adapter.SelectCommand = cmdSelect;
        adapter.UpdateCommand = cmdUpdate;

        try {
            conn.Open();
            adapter.Fill(table);
            conn.Close();
        }
        catch (Exception e) {
            Core.ExceptionHandler.HandleException(e, true);
            throw new DatabaseOperationException("", e);
        }

        foreach (DataRow row in table.Rows) {
            try {
                using (System.Drawing.Image img = System.Drawing.Image.FromFile(row["MediaPathCurrent"].ToString())) {
                    System.IO.FileInfo fi = new System.IO.FileInfo(row["MediaPathCurrent"].ToString());
                    if (img != null) {
                        int width = img.Width;
                        int height = img.Height;
                        long length = fi.Length;
                        row["MediaWidth"] = width;
                        row["MediaHeight"] = height;
                        row["MediaFilesizeBytes"] = (int)length;
                    }
                }
            }
            catch (Exception e) {
                Core.ExceptionHandler.HandleException(e);
                DevUtil.Print(e);
                continue;
            }
        }

        try {
            recordsAffected = adapter.Update(table);
        }
        catch (Exception e) {
            Core.ExceptionHandler.HandleException(e);
            throw new DatabaseOperationException("", e);
        }
    }
    return recordsAffected;
}
Use Connection.BeginTransaction() to speed up the DataAdapter update.
conn.Open() 'open connection
Dim myTrans As SQLiteTransaction
myTrans = conn.BeginTransaction()
'Associate the transaction with the select command object of the DataAdapter
objDA.SelectCommand.Transaction = myTrans
objDA.Update(objDT)
Try
    myTrans.Commit()
Catch ex As Exception
    myTrans.Rollback()
End Try
conn.Close()
This vastly speeds up the update.
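Translated to the C# code in the question, a minimal sketch of the same idea might look like this (it reuses conn, adapter, cmdUpdate, table and recordsAffected from the question; untested):
conn.Open();
using (SQLiteTransaction tx = conn.BeginTransaction())
{
    // Enlist the hand-built UpdateCommand in the transaction, otherwise each
    // row update is still committed (and synced to disk) individually.
    cmdUpdate.Transaction = tx;
    try
    {
        recordsAffected = adapter.Update(table);
        tx.Commit();
    }
    catch
    {
        tx.Rollback();
        throw;
    }
}
conn.Close();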
Loading the dataset from the db takes half a second or so
This is a single SQL statement (so it's fast). Execute the SQL SELECT, populate the dataset, done.
updating all the rows with image width/height and getting the file
length from FileInfo took maybe 30 seconds. Fine.
This is updating the in-memory data (so that's fast too): each row is changed in the DataSet, without talking to SQLite at all.
The adapter.Update(table); command took somewhere in the vicinity of 5
to 7 minutes to run.
This runs a separate SQL UPDATE for every modified row, which is why it's slow.
yet even so I can't help but think that if I were to run a separate
update command for each individual update, this would have completed
much faster.
This is basically what it's doing anyway! And it would be no faster: since each of those UPDATEs runs outside an explicit transaction, SQLite wraps every single statement in its own implicit transaction and (by default) syncs to disk on every commit, which is where the time goes. Wrapping the whole Update call in one transaction, as in the other answer, avoids that.
From MSDN
The update is performed on a by-row basis. For every inserted,
modified, and deleted row, the Update method determines the type of
change that has been performed on it (Insert, Update or Delete).
Depending on the type of change, the Insert, Update, or Delete command
template executes to propagate the modified row to the data source.
When an application calls the Update method, the DataAdapter examines
the RowState property, and executes the required INSERT, UPDATE, or
DELETE statements iteratively for each row, based on the order of the
indexes configured in the DataSet.
is there some way to "peek into" how ADO is processing this?
Yes: Debug .NET Framework Source Code in Visual Studio 2012?
Related
I am trying to insert 200,000 documents from a folder into a SQL Server varbinary column. I get a timeout expiration message after inserting 80,000 documents. The average file size is about 250 KB and the maximum file size is 50 MB. I am running this C# program on the server where the database is located.
Please suggest.
The error:
The timeout period elapsed prior to completion of the operation or the server is not responding.
The code:
string spath = @"c:\documents";
string[] files = Directory.GetFiles(spath, "*.*", SearchOption.AllDirectories);
Console.Write("Files Count:" + files.Length);

using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    string insertSQL = "INSERT INTO table_Temp(doc_content, doc_path) values(@File, @path)";
    SqlCommand cmd = new SqlCommand(insertSQL, con);
    var pFile = cmd.Parameters.Add("@File", SqlDbType.VarBinary, -1);
    var pPath = cmd.Parameters.Add("@path", SqlDbType.Text);
    var tran = con.BeginTransaction();
    var fn = 0;
    foreach (string docPath in files)
    {
        string newPath = docPath.Remove(0, spath.Length);
        string archive = new DirectoryInfo(docPath).Parent.Name;
        fn += 1;
        using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
        {
            pFile.Value = stream;
            pPath.Value = newPath;
            cmd.Transaction = tran;
            cmd.ExecuteNonQuery();
            if (fn % 10 == 0)
            {
                tran.Commit();
                tran = con.BeginTransaction();
                Console.Write("|");
            }
            Console.Write(".");
        }
    }
    tran.Commit();
}
For this, I would suggest using SqlBulkCopy, since it handles bulk data insertion much more efficiently. Further, as others have pointed out, you might want to increase the timeout on your command.
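As a rough sketch of what that could look like here (it reuses files, spath and connectionString from the question's code and the doc_content/doc_path columns of table_Temp; the batch size of 500 is an arbitrary starting point, not a recommendation):
// Push the files into table_Temp in batches via SqlBulkCopy.
var batch = new DataTable();
batch.Columns.Add("doc_content", typeof(byte[]));
batch.Columns.Add("doc_path", typeof(string));

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "table_Temp";
        bulk.BulkCopyTimeout = 0; // no time limit
        bulk.ColumnMappings.Add("doc_content", "doc_content");
        bulk.ColumnMappings.Add("doc_path", "doc_path");

        foreach (string docPath in files)
        {
            batch.Rows.Add(File.ReadAllBytes(docPath), docPath.Remove(0, spath.Length));
            if (batch.Rows.Count == 500)
            {
                bulk.WriteToServer(batch);
                batch.Clear();
            }
        }
        if (batch.Rows.Count > 0)
            bulk.WriteToServer(batch); // flush the last partial batch
    }
}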
While I would agree this may be best handled by a bulk copy of some sort, if you must do this in the C# program, your only option is probably to increase the timeout value. You can do this after your SqlCommand object has been created via cmd.CommandTimeout = <new timeout>; The CommandTimeout property is an integer representing the number of seconds before the command times out, or zero if you never want it to time out.
See the MSDN docs for details
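For example, with the cmd object from the question (600 seconds is just an illustrative value):
cmd.CommandTimeout = 600; // wait up to 10 minutes; 0 means wait indefinitely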
You should be able to set the timeout for the transaction directly on that object in your application code; that way you are not changing SQL Server settings.
Secondly, you can build batches in your application as well. You say you can get 80k docs before a timeout, so set your batch size at 50k, process them, commit them, and grab the next batch. Having your application manage batching also lets you catch SQL errors, such as timeouts, and then dynamically adjust the batch size and retry without ever crashing. This is the entire reason for writing your application in the first place; otherwise you could just use the wizard in Management Studio and manually insert your files.
I highly recommend batching over the other options.
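A rough sketch of that batch-and-retry idea, reusing con, cmd, pFile, pPath, files and spath from the question's code (the 50,000 batch size and the halving-on-timeout policy are illustrative assumptions, not tuned values):
int batchSize = 50000;
int done = 0;
while (done < files.Length)
{
    int take = Math.Min(batchSize, files.Length - done);
    try
    {
        using (var tran = con.BeginTransaction())
        {
            cmd.Transaction = tran;
            for (int i = done; i < done + take; i++)
            {
                using (var stream = new FileStream(files[i], FileMode.Open, FileAccess.Read))
                {
                    pFile.Value = stream;
                    pPath.Value = files[i].Remove(0, spath.Length);
                    cmd.ExecuteNonQuery();
                }
            }
            tran.Commit();
        }
        done += take;
    }
    catch (SqlException)
    {
        // e.g. a timeout: disposing the transaction rolled the batch back,
        // so shrink the batch size and retry the same range.
        batchSize = Math.Max(1000, batchSize / 2);
    }
}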
@Shubham Pandey also provides a great link to SQL bulk copy info, which in turn links to more information. You should definitely experiment with the bulk copy class and see if you can get additional gains there as well.
The Problem:
I have a web application where people can upload xml, xmls, csv files.
I then take their content and insert it into my Oracle DB.
Technical details:
I recently ran into an OutOfMemoryException when trying to work with the data.
The previous developer created a list of lists over the data in order to manage it. However, this is giving us an OutOfMemoryException.
We are using the LinqToExcel library.
Sample code:
excel = new ExcelQueryFactory(excelFile);
IEnumerable<RowNoHeader> data = from row in excel.WorksheetNoHeader(sheetName)
                                select row;

List<List<string>> d = new List<List<string>>(data.Count());
foreach (RowNoHeader row in data)
{
    List<string> list = new List<string>();
    foreach (Cell cell in row)
    {
        string cellValue = cell.Value.ToString().Trim(' ').Trim(null);
        list.Add(cellValue);
    }
    d.Add(list);
}
I have tried to change the code and instead did this:
string connectionstring = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0;HDR=YES;';", excelFile);
OleDbConnection connection = new OleDbConnection();
connection.ConnectionString = connectionstring;
OleDbCommand excelCommand = new OleDbCommand();
excelCommand.Connection = connection;
excelCommand.CommandText = String.Format("Select * FROM [{0}$]", sheetName);
connection.Open();
DataTable dtbl = CreateTable(TableColumns);
OleDbDataReader reader = excelCommand.ExecuteReader();
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    // copy the current reader row's values into the new row
    for (int i = 0; i < reader.FieldCount; i++)
        row[i] = reader[i];
    dtbl.Rows.Add(row);
}

using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
{
    using (OracleDataAdapter adapter = new OracleDataAdapter(command))
    {
        using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
        {
            OracleTransaction trans = _oracleConnection.BeginTransaction();
            command.Transaction = trans;
            adapter.InsertCommand = builder.GetInsertCommand(true);
            adapter.Update(dtbl);
            trans.Commit();
        }
    }
}
However, I still get the same OutOfMemoryException.
I have read online that I should make my project x64 and use the following:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
However, I can't change my web application to run on x64.
My solution was to do this in batches, like this:
int rowCount = 0;
while (reader.Read())
{
    DataRow row = dtbl.NewRow();
    // copy the current reader row's values into the new row
    for (int i = 0; i < reader.FieldCount; i++)
        row[i] = reader[i];
    dtbl.Rows.Add(row);
    rowCount++;
    if (rowCount % _batches == 0 && rowCount != 0)
    {
        DBInsert(dtbl, selectCommand);
        dtbl = CreateTable(TableColumns);
    }
}
if (dtbl.Rows.Count > 0)
    DBInsert(dtbl, selectCommand); // flush the final partial batch

private void DBInsert(DataTable dt, string selectCommand)
{
    using (OracleCommand command = new OracleCommand(selectCommand, _oracleConnection))
    {
        using (OracleDataAdapter adapter = new OracleDataAdapter(command))
        {
            using (OracleCommandBuilder builder = new OracleCommandBuilder(adapter))
            {
                OracleTransaction trans = _oracleConnection.BeginTransaction();
                command.Transaction = trans;
                adapter.InsertCommand = builder.GetInsertCommand(true);
                adapter.Update(dt);
                trans.Commit();
            }
        }
    }
}
It works; however, it is very slow. I was wondering if there is a way to either solve the memory problem while processing serially, or to write to the database in parallel.
I have tried to insert the data in parallel using threads, but that uses a lot of memory and throws an OutOfMemoryException as well.
Just don't load 1M rows into a DataTable. Use whatever bulk import mechanism is available to load a stream of rows. Oracle, like SQL Server, offers several ways to bulk import data.
Collections like List or DataTable use an internal buffer to store data that they reallocate when it fills up, using twice the original size. With 1M rows that leads to a lot of reallocations and a lot of memory fragmentation. The runtime may no longer be able to even find a contiguous block of memory large enough to store 2M entries. That's why it's important to set the capacity parameter when creating a new List.
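For example, with the outer list from the question (1,000,000 is just the row count used in this discussion; use whatever count you expect):
// Preallocate the outer list so it never has to grow and reallocate.
List<List<string>> d = new List<List<string>>(1000000);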
Apart from that, it doesn't serve any purpose to load everything in memory and then send it to the database. It's actually faster to send the data as soon as each file is read, or as soon as a sufficiently large number is loaded. Instead of trying to load 1M rows at once, read 500 or 1000 of them each time and send them to the database.
Furthermore, Oracle's ADO.NET provider includes the OracleBulkCopy class, which works similarly to SqlBulkCopy for SQL Server. Its WriteToServer method accepts a DataTable or a DataReader. You can use the DataTable overload to send batches of items. An even better idea is to use the overload that accepts a reader and let the class collect the batches and send them to the database.
E.g.:
using (var bcp = new OracleBulkCopy(connectionString))
{
    bcp.BatchSize = 5000;
    bcp.DestinationTableName = "MyTable";
    // For each source/target column pair, add a mapping
    bcp.ColumnMappings.Add("ColumnA", "ColumnA");

    var reader = excelCommand.ExecuteReader();
    bcp.WriteToServer(reader);
}
I have been tasked with creating an application that monitors any "INSERT" events on a specific table. I was going to go about this using SqlDependency to create a notification link between the DB and the C# app, but it turns out I am not able to do this due to security issues.
Due to this, I have modeled my application as follows:
This is well and good, but as it turns out, the SQL table I am querying is rather large: nearly 3.5 million rows and 55 columns. When loading it into the C# DataTable object, I get an out of memory exception.
internal static DataTable ExecuteQuery(string query, Dictionary<string, string> parameters = null)
{
    try
    {
        using (SqlConnection dbconn = new SqlConnection(SQLServer.Settings.ConnectionString))
        using (SqlCommand cmd = new SqlCommand())
        {
            dbconn.Open();           // Open the connection
            cmd.CommandText = query; // Set the query text
            cmd.Connection = dbconn;
            if (parameters != null)
            {
                foreach (var parameter in parameters) // Add filter parameters
                    cmd.Parameters.AddWithValue(parameter.Key, parameter.Value);
            }
            var dt = new DataTable();
            using (SqlDataAdapter adpt = new SqlDataAdapter(cmd)) { adpt.Fill(dt); } // MY ERROR OCCURS HERE!
            dbconn.Close();
            queryError = false;
            return dt;
        }
    }
    catch (Exception ex)
    {
        queryError = true;
        EventLogger.WriteToLog("ExecuteQuery()", "Application", "Error: An error has occured while performing a database query.\r\nException: " + ex.Message);
        return null;
    }
}
When running the code above, I get the following error at the line for SqlDataAdapter.Fill(dt)
Exception of type 'System.OutOfMemoryException' was thrown.
Is there a way I can either restructure my application or prevent this incredibly high memory consumption from the DataTable class? SQL Server seems perfectly capable of doing a select * from the table, but when I fill a DataTable with the same data I use up over 6 GB of RAM! Why is there so much overhead when using a DataTable?
Here is a link to my flowchart.
I was able to resolve this issue by making use of the SqlDataReader class. This class lets you "stream" the SQL result set row by row rather than bringing back the entire result set all at once and loading it into memory.
So now in step 5 from the flow chart, I can query for only the very first row. Then in step 6, I can query again at a later date and iterate through the new result set one row at a time until I find the original row I started at, all the while filling a DataTable with the new results. This accomplishes two things:
1. I don't need to load all the data from the query into local memory at once.
2. I can immediately get the "inverse" DataSet, i.e. the newly inserted rows that didn't exist the first time I checked.
Which is exactly what I was after. Here is just a portion of the code:
private static SqlDataReader reader;
private static SqlConnection dbconn = new SqlConnection(SQLServer.Settings.ConnectionString);

private void GetNextRows(int numRows)
{
    if (dbconn.State != ConnectionState.Open)
        OpenConnection();

    // Pull rows one by one, up to the specified limit.
    int rowCnt = 0;
    while (rowCnt < numRows && reader.Read())
    {
        object[] row = new object[reader.FieldCount];
        reader.GetValues(row);
        resultsTable.LoadDataRow(row, LoadOption.PreserveChanges);
        rowCnt++;
        sessionRowPosition++;
    }
}
The whole class would be too large to post here, but one of the caveats was that the interval between checks was long for me, on the order of days, so I needed to close the connection between checks. When you close the connection with a SqlDataReader, you lose your row position, so I needed to add a counter to keep track of it.
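A hypothetical sketch of that re-positioning step (cmd is assumed to be the command the reader was created from, and skipping ahead by the saved sessionRowPosition count is an assumption about how the counter is used):
// After reopening the connection, re-run the query and skip past the rows
// that were already processed in the previous session.
reader = cmd.ExecuteReader();
for (long i = 0; i < sessionRowPosition; i++)
{
    if (!reader.Read())
        break; // fewer rows than last time; nothing left to skip
}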
Check your SELECT query. You are probably pulling back far too many rows from the database.
Here's what I've got: the user selects, from a checked list box of database names, the one they'd like to archive. A switch statement is in place to catch the selection.
case "userSelection":
sqlAdapter = CreateMyAdapter("dbName", true, sqlconn, null);
sqlAdapter.SelectCommand.CommandText += "";
sqlAdapter.Fill(myDS.tableName);
sqlAdapter.Dispose();
The adapter:
private SqlDataAdapter CreateMyAdapter(string TableName, bool IncludeUpdates, SqlConnection sqlConn, SqlTransaction sqlTran)
{
    SqlDataAdapter sqlAdapter = null;
    SqlConnection sqlConnArchive = new SqlConnection();

    strSQL = "SELECT " + TableName + ".* FROM " + TableName;
    sqlAdapter = new SqlDataAdapter(strSQL, sqlConn);

    // Right here, I create another sqlConnection that is pointed to
    // another datasource.
    sqlConnArchive = getThisOtherConnection();

    SqlCommand sqlComm;
    if (IncludeUpdates)
    {
        string strInsertSQL = "<insertQuery>";
        sqlComm = new SqlCommand(strInsertSQL, sqlConnArchive);
        sqlComm.Parameters.Add("@TableID", SqlDbType.Int, 0, "TableID");
        // More params here...
        sqlAdapter.InsertCommand = sqlComm;
        // Update
        // Delete
    }

    return sqlAdapter;
}
The issue:
As you can see, sqlConn is the connection tied to the SELECT command, and sqlConnArchive is tied to the INSERT. The thought here is that I could select the data from DB_1, if you will, and insert it into DB_2 using the same SqlDataAdapter. The issue I'm running into is the insert. The select works fine: at the line sqlAdapter.Fill(myDS.tableName); the data is there once Fill executes. But the INSERT isn't working.
A few things:
I tested whether the SqlDataAdapter perhaps couldn't handle multiple data sources/connections by switching things around so it pointed to the same DB, just different tables, and I'm seeing the same results.
I've confirmed that the issue does not reside within the INSERT query.
There are no errors; in the debugger it just steps right over the insert.
I have tried several permutations of .Update() and none of them worked. Throughout the entire project I've been assigned, it appears that .Fill() is what submits the data back to the DB.
I've tested the database side and connectivity is a go. No issues with login, etc etc..
Any help is greatly appreciated.
Please note - I tried to place an even larger emphasis on the word "greatly" but was limited by my toolset. Apparently SOF doesn't support bold, blink, underline, flames, or embedded music.
I think you want ExecuteNonQuery.
var rowsAffected = sqlAdapter.InsertCommand.ExecuteNonQuery();
This executes the statement and returns the number of rows affected. The Fill method only runs the SelectCommand; it never runs the InsertCommand.
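For the code in the question, that could look roughly like this (a sketch only: it reuses sqlAdapter and myDS.tableName from above, the parameter list is abbreviated, and the values must be set manually because SourceColumn mappings are only applied by DataAdapter.Update):
// Push each filled row to the archive DB by executing the InsertCommand per row.
var insert = sqlAdapter.InsertCommand;
insert.Connection.Open();
foreach (DataRow row in myDS.tableName.Rows)
{
    insert.Parameters["@TableID"].Value = row["TableID"];
    // ... set the remaining parameters from the row the same way ...
    insert.ExecuteNonQuery();
}
insert.Connection.Close();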
We already have a running system that handles all connection strings (DB2, Oracle, MS SQL Server).
Currently, we are using ExecuteNonQuery() to do some inserts.
We want to improve the performance by using SqlBulkCopy() instead of ExecuteNonQuery(). We have some clients with more than 50 million records.
We don't want to use SSIS, because our system supports multiple databases.
I created a sample project to test the performance of SqlBulkCopy(). I created a simple read-and-insert function for MS SQL Server.
Here's the small function:
public void insertIntoSQLServer()
{
    using (SqlConnection SourceConnection = new SqlConnection(_sourceConnectionString))
    {
        // Open the connection to get the data from the source table
        SourceConnection.Open();
        using (SqlCommand command = new SqlCommand("select * from " + _sourceSchemaName + "." + _sourceTableName + ";", SourceConnection))
        {
            // Read from the source table
            command.CommandTimeout = 2400;
            SqlDataReader reader = command.ExecuteReader();

            using (SqlConnection DestinationConnection = new SqlConnection(_destinationConnectionString))
            {
                DestinationConnection.Open();
                // Clean the destination table
                new SqlCommand("delete from " + _destinationSchemaName + "." + _destinationTableName + ";", DestinationConnection).ExecuteNonQuery();

                using (SqlBulkCopy bc = new SqlBulkCopy(DestinationConnection))
                {
                    bc.DestinationTableName = string.Format("[{0}].[{1}]", _destinationSchemaName, _destinationTableName);
                    bc.NotifyAfter = 10000;
                    //bc.SqlRowsCopied += bc_SqlRowsCopied;
                    bc.WriteToServer(reader);
                }
            }
        }
    }
}
When I have fewer than 200,000 records in my dummy table, the bulk copy works fine. But when it's over 200,000 records, I get one of the following errors:
Attempt to invoke bulk copy on an object that has a pending operation.
OR
The wait operation timed out (for the IDataReader)
I increased the CommandTimeout for the reader, which seems to have solved the timeout issue related to the IDataReader.
Am I doing something wrong in the code?
Can you try adding the following before the call to WriteToServer ...
bc.BatchSize = 10000;
bc.BulkCopyTimeout = 0;
By default, BatchSize is 0 (everything is sent as a single batch) and BulkCopyTimeout is 30 seconds, so I suspect this might be your issue.
Hope that helps
Also, you can try playing with different Batch Sizes for optimal performance.
You can try this
bc.BatchSize = 100000;   // how many rows to insert per batch
bc.BulkCopyTimeout = 60; // time in seconds; assign 0 for an infinite wait