I am writing a database export solution that converts a database, exporting and processing a few hundred gigabytes of data in a multithreaded environment. All threads work in their own environment with their own connections, but the export is orchestrated through a SQLite "lookup" database that is shared by all threads to distribute work and to encode a specific field to a new id. This happens quite rarely (about once every 50k exported rows), so it should not really slow down the process even if locks are used.
For some reason, about every 10-15 minutes this lookup database throws the exception "could not open database" with error code 14. The exception occurs randomly on any of the ExecuteReader() calls. I tried locking all methods that access this database, but it still crashes every 10-15 minutes. Why? When I simply press Resume in the debugger, everything works again, so it seems to be a temporary problem.
ExecuteLookup() is called repeatedly by the "main" exporting thread.
DBQueryLookupDb() is called by any of the worker threads (something like INSERT INTO progress ...).
// using Microsoft.Data.Sqlite
private long ExecuteLookup(string value)
{
lock (this)
{
using (var cmd = GetSelectLookupCmd()) // "SELECT id, original FROM lookup_id WHERE original = @original"; the parameter is added inside; I tried re-using this command but got the same problem
{
cmd.Parameters["#original"].Value = value;
using (var res = cmd.ExecuteReader())
{
if (res.Read())
{
return res.GetInt64(0);
}
}
}
using (var cmd = GetInsertLookupCmd()) // INSERT INTO ...;SELECT last_insert_rowid();
{
cmd.Parameters["#original"].Value = value;
using (var res = cmd.ExecuteReader())
{
if (res.Read())
{
return res.GetInt64(0);
}
else { throw new Exception("Unexpected fail on lookup insert"); }
}
}
}
}
public void DBQueryLookupDb(string sql)
{
lock (this)
{
using (SqliteCommand c = new SqliteCommand())
{
c.Connection = lookupDb;
c.CommandType = System.Data.CommandType.Text;
c.CommandText = sql;
c.ExecuteNonQuery();
}
}
}
"PRAGMA JOURNAL_MODE ='PERSIST'" resolves the problem. TRUNCATE may work too. It may be a sync problem coming from the network drive (Sqlite deletes the journal and immediatley creates a new one before the network drive is ready again)
Related
I have a C# project that works with TCP sockets asynchronously.
Every request that comes in from a client queries a SQL Server stored procedure, opening a SQL connection and closing it when the query finishes.
I've used this code:
using (var con = new SqlConnection(setting.ConnectionString))
{
try
{
//some codes (edited)
SqlCommand command = new SqlCommand("procedurename1", con);
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(new SqlParameter("#name", sb.ToString()));
SqlDataAdapter adapter = new SqlDataAdapter(command);
try
{
adapter.Fill(dataSet);
}
catch (Exception ex)
{
con.Close();
con.Dispose();
throw ex;
}
finally {
con.Close();
con.Dispose();
}
}
catch(Exception ex)
{}
finally
{
con.Close();
con.Dispose();
}
}
I've used
netstat -a -n | find /c "1433"
to count the SQL connections being opened and closed.
The problem is that the SQL connection count keeps increasing and only rarely decreases.
The main problem is that when my program has been working under a heavy request load for about 30 minutes, I get a
SqlCommand timeout error (the default 30 seconds elapsed)
and after restarting my C# program the SqlCommand timeouts are gone.
Is this a problem in my program or on the SQL Server side?
Remember that it always calls a stored procedure in SQL Server; it never executes a query directly.
main method:
public void main()
{
Task.Factory.StartNew(() =>
{
allDone.Reset();
mySocket.AcceptAsync(e);
allDone.WaitOne();
});
}
public void e_Completed(object sender, SocketAsyncEventArgs e)
{
var socket = (Socket)sender;
ThreadPool.QueueUserWorkItem(HandleTcpRequest, e.AcceptSocket);
e.AcceptSocket = null;
socket.AcceptAsync(e);
}
public void HandleTcpRequest(object state)
{
//do some code and connection to SQL server
DLL.Request httprequest = new DLL.Request(dataSet.Tables[0], fileDt);
DLL.IHttpContext _context = new DLL.HttpContext(httprequest);
_context.GetResults();
}
The main problem is that when my program has been working under a heavy request load for about 30 minutes...
To isolate the root cause of the time-out, I suggest testing the SQL query of the stored procedure independently of the TCP socket calls for 30 minutes,
and logging the time-out exception details for inspection.
Run something like the following repeatedly for 30 minutes to simulate your working environment:
public void RunQuery()
{
using (var con = new SqlConnection(setting.ConnectionString))
{
try
{
//some codes
}
catch(SqlException ex)
{
//test for timeout
if (ex.Number == -2) {
Console.WriteLine ("Timeout occurred");
// log ex details for more inspection
}
}
}
}
Read How to handle the CommandTimeout properly?
Since you use async calls, I suggest you try asynchronous database calls with the Task-based Asynchronous Programming model (TAP).
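As a rough sketch only (the stored procedure and parameter names are taken from your snippet, and setting.ConnectionString is assumed to be the same setting you already use), the data access could look like this with TAP:
// Sketch: open the connection and run the stored procedure asynchronously,
// so no thread sits blocked while waiting for SQL Server.
public async Task<DataTable> GetResultsAsync(string name)
{
    var table = new DataTable();
    using (var con = new SqlConnection(setting.ConnectionString))
    using (var command = new SqlCommand("procedurename1", con))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add(new SqlParameter("@name", name));
        await con.OpenAsync();
        using (var reader = await command.ExecuteReaderAsync())
        {
            table.Load(reader);
        }
    }
    return table;
}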
I'm going to take a long shot, based on the way the limited SQL-related code we can see is written, since we can't see "//some codes".
I'm going to guess that some of the disposable things like SqlCommand, DataReader, SqlDataAdapter, TransactionScope, etc. are not in 'using' blocks and so are holding resources open on the database.
It may also be worth raising the possibility that this kind of problem could be in the code shown in the question or any other program accessing that database, including your own applications and SSMS (e.g. if a developer has an uncommitted transaction running in a window).
P.S. I would suggest deleting everything in the using block except the "//some codes" part.
UPDATE after more code was added
Here is your code after correction. This will ensure that the resources are disposed, which will prevent the resource leaks that are probably causing your problem.
using (var con = new SqlConnection(setting.ConnectionString))
{
//some codes (edited)
using (SqlCommand command = new SqlCommand("procedurename1", con))
{
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(new SqlParameter("#name", sb.ToString()));
using (var adapter = new SqlDataAdapter(command))
{
adapter.Fill(dataSet);
}
}
}
P.S. don't ever write "throw ex;" from inside a catch ever again. It causes the stack trace to be lost - just use "throw;".
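In other words, a catch block should look like this (Log is just a hypothetical placeholder for whatever logging you do):
try
{
    adapter.Fill(dataSet);
}
catch (SqlException ex)
{
    Log(ex);  // hypothetical logging helper
    throw;    // rethrows the same exception and preserves its stack trace
}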
I recently solved an issue where my program was crashing while using a BackgroundWorker.
I do not fully understand why there was a problem in the first place. When the thread has its apartment state set to STA, the process memory increases until the program throws the exception "Unable to allocate environment handle"; at first I thought it was an issue with the database not being able to process the queries quickly enough. The bottom chart in the picture below shows the program running with the thread set to STA. You can see a steady increase in usage until it drops off almost completely, which is when the exception is thrown.
When the thread is running with the ApartmentState set to MTA, the top chart shows it behaving as expected: there is an increase in usage, and then it settles into a pattern of using and freeing memory.
TLDR:
Why does a thread using the apartment state STA have this issue?
Below I have included some relevant code.
Thread myWorkerBee;
List<Customers> AllCustomers;
private void btnStartConversion_Click(object sender, RoutedEventArgs e)
{
myWorkerBee = new Thread(myWorkerBee_DoWork);
myWorkerBee.SetApartmentState(ApartmentState.STA);
myWorkerBee.Start();
}
private void myWorkerBee_DoWork()
{
GetOldData(); //Creates Customer object and fills AllCustomers list
AddCustomers();
}
There is no issue with the program if it doesn't use the AddNewCustomerConvert() method.
private void AddCustomers()
{
for (int i = 0; i < AllCustomers.Count; i++)
{
AllCustomers[i].AddNewCustomerConvert();
}
}
This method exclusively calls the RawQuery() method, or methods that themselves only call RawQuery().
//REFERENCED PROGRAM
public void AddNewCustomerConvert()
{
//40 or so insert statements.
databaseConnection.RawQuery("//INSERT STATEMENT");
}
RawQuery() sends the queries to the database:
//REFERENCE DLL
public OdbcDataReader RawQuery(string query_to_perform)
{
// This method executes a query in the specific database that you
// are connected to.
System.Data.Odbc.OdbcCommand command = null;
// holds the query sent to the database
System.Data.Odbc.OdbcDataReader result_reader = null;
// The query is put into an OdbcCommand object and sent to the database. The
// return result will then be given back to the caller.
try
{
if (loggingEnabled)
{
myLog = File.AppendText(loggingFileName);
myLog.WriteLine(query_to_perform);
myLog.Close();
myLog.Dispose();
}
command = new System.Data.Odbc.OdbcCommand(query_to_perform, this.database_connection);
result_reader = command.ExecuteReader();
this.successful_query = true;
this.error_message = "";
}
catch (System.Data.Odbc.OdbcException ex)
{
this.successful_query = false;
this.error_message = ex.Message;
//destroy the connection on a failure
database_connection = new OdbcConnection();
throw;
}
return result_reader;
}
I am having some trouble trying to move a piece of code onto another thread to increase performance.
I have the following code (with the thread additions commented), where I parse a large XML file (final goal: 100,000 rows) and then write it to a SQL Server CE 3.5 database file (.sdf) using record-and-insert (SqlCeResultSet/SqlCeUpdatableRecord).
Two lines of code in the if statement inside the while loop,
xElem = (XElement)XNode.ReadFrom(xmlTextReader);
and
rs.Insert(record);
take about the same amount of time to execute. I was thinking of running rs.Insert(record); while parsing the next row of the XML file. However, I was unable to get this working with either Thread or ThreadPool.
I have to make sure that the record I pass to the thread is not changed until rs.Insert(record); has finished executing in the other thread. So I tried placing thread.Join() before writing the new record (record.SetValue(i, values[i]);), but I still get a conflict when I run the program: it crashes with a bunch of errors because it tries to write the identical row several times (especially for the index).
Can anyone help me with some advice? How can I move rs.Insert(record); onto another thread to increase performance?
XmlTextReader xmlTextReader = new XmlTextReader(modFunctions.InFName);
XElement xElem = new XElement("item");
using (SqlCeConnection cn = new SqlCeConnection(connectionString))
{
if (cn.State == ConnectionState.Closed)
cn.Open();
using (SqlCeCommand cmd = new SqlCeCommand())
{
cmd.Connection = cn;
cmd.CommandText = "item";
cmd.CommandType = CommandType.TableDirect;
using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable))
{
SqlCeUpdatableRecord record = rs.CreateRecord();
// Thread code addition
Thread t = new Thread(new ThreadStart(() => rs.Insert(record)));
while (xmlTextReader.Read())
{
if (xmlTextReader.NodeType == XmlNodeType.Element &&
xmlTextReader.LocalName == "item" &&
xmlTextReader.IsStartElement() == true)
{
xElem = (XElement)XNode.ReadFrom(xmlTextReader);
values[0] = (string)xElem.Element("Index"); // 0
values[1] = (string)xElem.Element("Name"); // 1
~~~
values[13] = (string)xElem.Element("Notes"); // 13
// Thread code addition -- Wait until previous thread finishes
if (ThreadStartedS == 1)
{
t.Join();
}
// SetValues to record
for (int i = 0; i < values.Length; i++)
{
record.SetValue(i, values[i]); // 0 to 13
}
// Thread code addition -- Start thread to execute rs.Insert(record)
ThreadStartedS = 1;
t.Start();
// Original code without threads
// Insert Record
//rs.Insert(record);
}
}
}
}
}
If all of your processing is going to be done on the device (reading from the XML file on the device then parsing the data on the device), then you will see no performance increase from threading your work.
These Windows Mobile devices only have a single processor, so multithreading on them means one thread works for a while, then another thread works for a while. You will never have two threads truly running at the same time.
On the other hand, if the data from your XML file were located on a remote server, you could call the data in chunks. As a chunk arrives, you could process that data in another thread while waiting on the next chunk of data to arrive in the main thread.
If all of this work is being done on one device, you will not have good luck with multithreading.
You can still display a progress bar (from 0 to NumberOfRecords) with a cancel button so the person waiting for the data collection to complete does not go insane with anticipation.
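To illustrate the "process the previous chunk while waiting for the next one" idea for the remote-server case, here is a rough, generic sketch. FetchNextChunk and ProcessChunk are hypothetical placeholders, and you may need to adjust the synchronization primitives to what the Compact Framework supports:
// Rough sketch: the main thread keeps downloading chunks while a worker thread
// processes the ones that have already arrived.
Queue<byte[]> pending = new Queue<byte[]>();
AutoResetEvent chunkArrived = new AutoResetEvent(false);
bool finished = false;
object gate = new object();
Thread worker = new Thread(() =>
{
    while (true)
    {
        chunkArrived.WaitOne();
        while (true)
        {
            byte[] chunk;
            lock (gate)
            {
                if (pending.Count == 0) break;
                chunk = pending.Dequeue();
            }
            ProcessChunk(chunk);          // placeholder: parse this chunk and insert its rows
        }
        lock (gate) { if (finished) return; }
    }
});
worker.Start();
byte[] next;
while ((next = FetchNextChunk()) != null) // placeholder: returns null when there is no more data
{
    lock (gate) { pending.Enqueue(next); }
    chunkArrived.Set();
}
lock (gate) { finished = true; }
chunkArrived.Set();
worker.Join();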
I have written a simple Mono C# application for writing to an SQLite database using the Mono implementation present in the Mono.Data.Sqlite package:
using System;
using Mono.Data.Sqlite;
class MainClass
{
public static void Main (string[] args)
{
using (var dbConnection = new SqliteConnection (@"Data Source=/var/log/gmblog;Version=3;"))
{
dbConnection.Open();
string sql = @"INSERT INTO ""queue"" (""data"") VALUES(""Test"")";
using (var insertCommand = new SqliteCommand (sql, dbConnection))
{
insertCommand.ExecuteNonQuery();
}
}
}
}
This works fine until I do an insert from another application such as sqlite3 and keep that application running:
sqlite> insert into queue ("data") VALUES("test2");
Now the C# program hangs until it gives the following error:
Unhandled Exception: Mono.Data.Sqlite.SqliteException: The database file is locked
I don't have any problems writing to the table from other instances of sqlite3 or from a C++ application I created.
If I close the sqlite3 instance then the C# application works again.
Doing a lsof /var/log/gmblog shows that sqlite3 has obtained a reader lock after performing the INSERT:
sqlite3 15578 cup 3ur REG 8,17 13312 4988505 /var/log/gmblog
Before the INSERT it didn't have this lock:
sqlite3 15578 cup 3u REG 8,17 13312 4988505 /var/log/gmblog
But as I pointed out, other applications have no problem writing to the table while other applications are using the database.
Any ideas on what is wrong with my C# code? Is it a bug in the Mono implementation of SQLite?
Update 25/11
Note that it is the dbConnection.Open(); call that results in the "database is locked" error, not insertCommand.ExecuteNonQuery();. I.e. the following code doesn't work either:
using System;
using Mono.Data.Sqlite;
class MainClass
{
public static void Main (string[] args)
{
using (var dbConnection = new SqliteConnection (@"Data Source=/var/log/gmblog;Version=3;"))
{
dbConnection.Open();
}
}
}
Please see the SQLiteConnection documentation: "If the SQLiteConnection goes out of scope, it is not closed. Therefore, you must explicitly close the connection by calling Close.". The example shows the connection being closed in a finally block. I can't see anywhere in your code sample where you close the connection.
You should also be able to use the "using" block which automatically calls close when the object goes out of scope:
using (var dbConnection = new SqliteConnection (@"..."))
{
dbConnection.Open();
string sql = @"INSERT INTO ""queue"" (""data"") VALUES(""Test"")";
using (var insertCommand = new SqliteCommand (sql, dbConnection))
{
insertCommand.ExecuteNonQuery();
}
}
This makes sure everything gets released properly and in a timely fashion.
I found a similar discussion on SQLite's website about multi-threading. I understand that you are not multi-threading here but instead trying to get a write lock, but the discussion still has some useful hints that may be helpful.
Each thread then proceeds to insert a number of records, let's say 1000. The problem you will encounter is the following: one thread will get control over the database by setting a lock on the file. This is fine, but the rest of the threads will keep on failing for each attempted INSERT while the lock is active.
Solution
Test for SQLITE_BUSY, which I didn't do originally. Here's some
pseudo-code to illustrate a solution:
while (continueTrying) {
retval = sqlite_exec(db, sqlQuery, callback, 0, &msg);
switch (retval) {
case SQLITE_BUSY:
Log("[%s] SQLITE_BUSY: sleeping fow a while...", threadName);
sleep a bit... (use something like sleep(), for example)
break;
case SQLITE_OK:
continueTrying = NO; // We're done
break;
default:
Log("[%s] Can't execute \"%s\": %s\n", threadName, sqlQuery, msg);
continueTrying = NO;
break;
}
}
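In C# with Mono.Data.Sqlite the same retry idea could look roughly like this. This is only a sketch: 5 is the SQLITE_BUSY result code, the retry count and delay are arbitrary, and it assumes the exception's ErrorCode carries the SQLite result code in your version:
// Sketch: retry the statement while the database reports SQLITE_BUSY.
bool done = false;
int attempts = 0;
while (!done && attempts < 10)
{
    try
    {
        insertCommand.ExecuteNonQuery();
        done = true;
    }
    catch (SqliteException ex)
    {
        if ((int)ex.ErrorCode == 5)   // 5 = SQLITE_BUSY: another connection holds the lock
        {
            attempts++;
            Thread.Sleep(100);        // back off briefly and try again
        }
        else
        {
            throw;                    // anything else is a real error
        }
    }
}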
You may also want to try the busy_timeout parameter on the connection string, as shown here and here.
The busy_timeout parameter is implemented as a call to sqlite3_busy_timeout. The default value is 0, which means to throw a SqliteBusyException immediately if the database is locked.
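For example (a sketch; the 5000 ms value is arbitrary):
// Wait up to 5 seconds for the lock to clear instead of failing immediately.
using (var dbConnection = new SqliteConnection(@"Data Source=/var/log/gmblog;Version=3;busy_timeout=5000;"))
{
    dbConnection.Open();
    // ... same INSERT as before ...
}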
A while back I asked a question about TransactionScope escalating to MSDTC when I wasn't expecting it to. (Previous question)
What it boiled down to was, in SQL2005, in order to use a TransactionScope, you can only instance and open a single SqlConnection within the life of the TransactionScope. With SQL2008, you can instance multiple SqlConnections, but only a single one can be open at any given time. SQL2000 will always escalate to DTC...we don't support SQL2000 in our application, a WinForms app, BTW.
Our solution to the single-connection-only problem was to create a TransactionScope helper class, called LocalTransactionScope (aka 'LTS'). It wraps a TransactionScope and, most importantly, creates and maintains a single SqlConnection instance for our application. The good news is, it works: we can use LTS across disparate pieces of code and they all join the ambient transaction. Very nice. The trouble is, every root LTS instance created will create and effectively kill a connection from the connection pool. By 'effectively kill' I mean it will instance a SqlConnection, which will open a new connection (for whatever reason, it never reuses a connection from the pool), and when that root LTS is disposed, it closes and disposes the SqlConnection, which is supposed to release the connection back to the pool so that it can be reused; however, it clearly never is reused. The pool bloats until it's maxed out, and then the application fails when a max-pool-size+1 connection is requested.
Below I've attached a stripped-down version of the LTS code and a sample console application class that will demonstrate the connection pool exhaustion. In order to watch your connection pool bloat, use SQL Server Management Studio's 'Activity Monitor' or this query:
SELECT DB_NAME(dbid) as 'DB Name',
COUNT(dbid) as 'Connections'
FROM sys.sysprocesses WITH (nolock)
WHERE dbid > 0
GROUP BY dbid
I'm attaching LTS here, and a sample console application that you can use to demonstrate for yourself that it will consume connections from the pool and never re-use nor release them. You will need to add a reference to System.Transactions.dll for LTS to compile.
Things to note: It's the root-level LTS that opens and closes the SqlConnection, which always opens a new connection in the pool. Having nested LTS instances makes no difference because only the root LTS instance establishes a SqlConnection. As you can see, the connection string is always the same, so it should be reusing the connections.
Is there some arcane condition we're not meeting that causes the connections not to be re-used? Is there any solution to this other than turning pooling off entirely?
public sealed class LocalTransactionScope : IDisposable
{
private static SqlConnection _Connection;
private TransactionScope _TransactionScope;
private bool _IsNested;
public LocalTransactionScope(string connectionString)
{
// stripped out a few cases that need to throw an exception
_TransactionScope = new TransactionScope();
// we'll use this later in Dispose(...) to determine whether this LTS instance should close the connection.
_IsNested = (_Connection != null);
if (_Connection == null)
{
_Connection = new SqlConnection(connectionString);
// This Has Code-Stink. You want to open your connections as late as possible and hold them open for as little
// time as possible. However, in order to use TransactionScope with SQL2005 you can only have a single
// connection, and it can only be opened once within the scope of the entire TransactionScope. If you have
// more than one SqlConnection, or you open a SqlConnection, close it, and re-open it, it more than once,
// the TransactionScope will escalate to the MSDTC. SQL2008 allows you to have multiple connections within a
// single TransactionScope, however you can only have a single one open at any given time.
// Lastly, let's not forget about SQL2000. Using TransactionScope with SQL2000 will immediately and always escalate to DTC.
// We've dropped support of SQL2000, so that's not a concern we have.
_Connection.Open();
}
}
/// <summary>'Completes' the <see cref="TransactionScope"/> this <see cref="LocalTransactionScope"/> encapsulates.</summary>
public void Complete() { _TransactionScope.Complete(); }
/// <summary>Creates a new <see cref="SqlCommand"/> from the current <see cref="SqlConnection"/> this <see cref="LocalTransactionScope"/> is managing.</summary>
public SqlCommand CreateCommand() { return _Connection.CreateCommand(); }
void IDisposable.Dispose() { this.Dispose(); }
public void Dispose()
{
Dispose(true); GC.SuppressFinalize(this);
}
private void Dispose(bool disposing)
{
if (disposing)
{
_TransactionScope.Dispose();
_TransactionScope = null;
if (!_IsNested)
{
// last one out closes the door, this would be the root LTS, the first one to be instanced.
LocalTransactionScope._Connection.Close();
LocalTransactionScope._Connection.Dispose();
LocalTransactionScope._Connection = null;
}
}
}
}
This is a Program.cs that will exhibit the connection pool exhaustion:
class Program
{
static void Main(string[] args)
{
// fill in your connection string, but don't monkey with any pooling settings, like
// "Pooling=false;" or the "Max Pool Size" stuff. Doesn't matter if you use
// Doesn't matter if you use Windows or SQL auth, just make sure you set a Data Soure and an Initial Catalog
string connectionString = "your connection string here";
List<string> randomTables = new List<string>();
using (var nonLTSConnection = new SqlConnection(connectionString))
using (var command = nonLTSConnection.CreateCommand())
{
command.CommandType = CommandType.Text;
command.CommandText = #"SELECT [TABLE_NAME], NEWID() AS [ID]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [TABLE_SCHEMA] = 'dbo' and [TABLE_TYPE] = 'BASE TABLE'
ORDER BY [ID]";
nonLTSConnection.Open();
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
string table = (string)reader["TABLE_NAME"];
randomTables.Add(table);
if (randomTables.Count > 200) { break; } // got more than enough to test.
}
}
nonLTSConnection.Close();
}
// we're going to assume your database had some tables.
for (int j = 0; j < 200; j++)
{
// At j = 100 you'll see it pause, and you'll shortly get an InvalidOperationException with the text of:
// "Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool.
// This may have occurred because all pooled connections were in use and max pool size was reached."
string tableName = randomTables[j % randomTables.Count];
Console.Write("Creating root-level LTS " + j.ToString() + " selecting from " + tableName);
using (var scope = new LocalTransactionScope(connectionString))
using (var command = scope.CreateCommand())
{
command.CommandType = CommandType.Text;
command.CommandText = "SELECT TOP 20 * FROM [" + tableName + "]";
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
Console.Write(".");
}
Console.Write(Environment.NewLine);
}
scope.Complete();
}
Thread.Sleep(50);
}
Console.ReadKey();
}
}
The expected TransactionScope/SqlConnection pattern is, according to MSDN:
using(TransactionScope scope = ...) {
using (SqlConnection conn = ...) {
conn.Open();
SqlCommand.Execute(...);
SqlCommand.Execute(...);
}
scope.Complete();
}
So in the MSDN example the connection is disposed inside the scope, before the scope is completed. Your code, though, is different: it disposes the connection after the scope is completed. I'm not an expert in matters of TransactionScope and its interaction with SqlConnection (I know some things, but your question goes pretty deep), and I can't find any specification of what the correct pattern is. But I'd suggest you revisit your code and dispose the singleton connection before the outermost scope is completed, similarly to the MSDN sample.
Also, I hope you realize your code will fall apart the moment a second thread comes into play in your application.
Is this code legal?
using(TransactionScope scope = ..)
{
using (SqlConnection conn = ..)
using (SqlCommand command = ..)
{
conn.Open();
SqlCommand.Execute(..);
}
using (SqlConnection conn = ..) // the same connection string
using (SqlCommand command = ..)
{
conn.Open();
SqlCommand.Execute(..);
}
scope.Complete();
}