I'm a novice C# dev and I'm writing a database app that performs updates on two different tables and inserts on another two tables, with each process running on its own separate thread. So I have two threads handling inserts on two different tables and two threads handling updates on two different tables. Each process is updating or inserting approximately 4 or 5 times per second, so I don't close the connection until the complete session is over, then I close the entire app. I wanted to know if I should be closing the connection after each insert and update even though I'm performing these operations so frequently. Second, should each thread be running on its own connection and command object?
By the way I'm writing the app in C# and the database is MySQL. Also, as of now I'm using one connection and command object for all four threads. I keep getting an error message saying "There is already an open DataReader associated with this connection that must be closed first", that's why I'm asking if I should be using multiple connection and command objects.
Thanks
-Donld
If you enable connection pooling, it should give you optimal use of MySQL connections for your scenario. Either way, the best general pattern to follow is:
Acquire and open connection
Do work
Close/release connection
Something similar to (I'm a bit rusty on the class names for the MySql connector, so this may not be exactly correct, but you should get the general idea!):
private void DoMyPieceOfWork(int value1, int value2)
{
    using (MySqlConnection connection = new MySqlConnection(CONNECTION_STRING_GOES_HERE))
    {
        connection.Open();
        using (MySqlCommand command = new MySqlCommand(
            "INSERT INTO `blah` (Column1, Column2) VALUES (@column1, @column2)", connection))
        {
            command.Parameters.Add("@column1", MySqlDbType.Int32).Value = value1;
            command.Parameters.Add("@column2", MySqlDbType.Int32).Value = value2;
            command.ExecuteNonQuery();
        }
        connection.Close();
    }
}
Of course this is a contrived, simplistic example, but the gist of it stands.
You either have to create a new connection for each thread, or (just an idea) create a synchronized queue of commands and process the queue in a single worker thread.
You may also take a look at the Task class of .NET Framework 4.
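For example, here is a rough sketch of the single-worker queue idea. The WorkItem type, table, and column names are made up for illustration; adjust them to your actual schema:
using System.Collections.Concurrent;
using System.Threading.Tasks;
using MySql.Data.MySqlClient;

public class WorkItem
{
    public int Value1;
    public int Value2;
}

public class InsertQueue
{
    private readonly BlockingCollection<WorkItem> _queue = new BlockingCollection<WorkItem>();
    private readonly string _connectionString;

    public InsertQueue(string connectionString)
    {
        _connectionString = connectionString;
        // One long-running worker thread owns the single connection and drains the queue.
        Task.Factory.StartNew(ProcessQueue, TaskCreationOptions.LongRunning);
    }

    // Your four producer threads just call Enqueue; they never touch the connection or command.
    public void Enqueue(WorkItem item)
    {
        _queue.Add(item);
    }

    private void ProcessQueue()
    {
        using (var connection = new MySqlConnection(_connectionString))
        {
            connection.Open();
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                using (var command = new MySqlCommand(
                    "INSERT INTO `blah` (Column1, Column2) VALUES (@column1, @column2)", connection))
                {
                    command.Parameters.AddWithValue("@column1", item.Value1);
                    command.Parameters.AddWithValue("@column2", item.Value2);
                    command.ExecuteNonQuery();
                }
            }
        }
    }
}
Because only the worker thread ever uses the connection, the "already an open DataReader" error from sharing one connection across threads goes away.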
Related
Suppose I have an application that extracts some data from an internet site and adds it to a database. This application runs in multiple instances, and each instance extracts data for a specific country.
Some data are linked to a master table called rounds, which has an auto-increment field as its PK; my doubt comes from this code:
using (MySqlConnection connection = new DBConnection().Connect())
{
    using (MySqlCommand command = new MySqlCommand())
    {
        command.Connection = connection;
        command.CommandText = @"INSERT IGNORE INTO competition_rounds (round_id, season_id, `name`)
                                VALUES (@round_id, @season_id, @round_name)";
        command.Parameters.Add("@round_id", MySqlDbType.Int32).Value = round.Id;
        command.Parameters.Add("@season_id", MySqlDbType.Int32).Value = round.seasonId;
        command.Parameters.Add("@round_name", MySqlDbType.VarChar).Value = round.Name;
        command.ExecuteNonQuery();
        return Convert.ToInt32(command.LastInsertedId);
    }
}
The code above adds a new round to the rounds table, and this works well. But if I have multiple instances running, is it possible that the application will fire the same code (in both instances) and return the same id to both? E.g.:
instance 1 -> fire round insert -> return 3
instance 2 -> fire round insert -> return 3
Both instances have executed the same method at the exact same time. Could this situation happen? Is it possible to prevent it? Should I create a GUID or a composite PK?
The client loads the LastInsertedId property from the OK_PACKET:
An OK packet is sent from the server to the client to signal
successful completion of a command. As of MySQL 5.7.5, OK packets are
also used to indicate EOF, and EOF packets are deprecated.
On the server side, from the documentation:
You can retrieve the most recent automatically generated
AUTO_INCREMENT value with the LAST_INSERT_ID() SQL function or the
mysql_insert_id() C API function. These functions are
connection-specific, so their return values are not affected by
another connection which is also performing inserts.
In other words, this kind of situation is accounted for (in any respectable DB system).
You'll be fine.
Database Management Systems (DBMSs) such as MySQL operate on the basis of ACID (Atomicity, Consistency, Isolation, Durability) transactions. These transactions are scheduled in a sequential, all-or-nothing fashion. Therefore, you don't need to worry about parallel transactions.
That said, with multiple application instances you may need to worry about which transaction is processed first. That is, UserA of application instanceA may send insert A, and UserB of application instanceB may send insert B some time after UserA. Even though UserA sent the request first, the requests can be received and processed in B-then-A order, perhaps due to network latency.
I have read and implemented several different versions of Microsoft's suggested methods for querying a SQL Server database. In everything I have read, each query is surrounded by a using statement, e.g. in some method DoQuery:
List<List<string>> DoQuery(string cStr, string query)
{
    using (SqlConnection c = new SqlConnection(cStr))
    {
        c.Open();
        using (SqlCommand cmd = new SqlCommand(query, c))
        {
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    ...
                    // read columns and put into list to return
                }
                // close all of the using blocks
            }
        }
    }
    // return the list of rows containing the list of column values.
}
I need to run this code several hundred times for different query strings against the same database. It seems that creating a new connection each time would be inefficient and dropping it each time wasteful.
How should I structure this so that it is efficient? When I tried not using a using block and passing the connection into the DoQuery method, I got messages saying the connection had not been closed. If I closed it after the query, then I got messages saying it wasn't open.
I'm also trying to improve this because I keep getting somewhat random
IOException: Unable to read data from the transport connection: Operation on non-blocking socket would block.
I'm the only user of the database at this time and I'm not doing anything in multiple threads or async, etc. Just looping through query strings and running DoQuery on them.
Could my structure be part of that problem, i.e. not releasing the resources fast enough and thereby seeing the connection blocked?
I'm stuck here on efficiency and this blocking problem. Thanks in advance.
As it turns out, the query structure was fine and the queries were fine. The problem was that I had an 'ORDER BY X DESC' on each query and that column was not indexed. This caused a full table scan to order the rows, even when only 2 were returned. The table has about 3 million rows and I thought it could handle that better than it does. It timed out even with a 360-second connection timeout! I indexed the column and no more 'blocking' nonsense, which, BTW, is a horrible message to return when it was actually a timeout. The queries now run fine if I index every column that appears in a WHERE clause.
I have a .NET Core C# console application that performs a large number of calculations and then writes the results to a SQL Server 2016 Developer edition database using Dapper (and Dapper.Contrib). The issue I'm having is that when I run a lot of items in parallel (greater than 1000, for example), I start getting intermittent connection failures on the .Open() call, saying
A network-related or instance-specific error occurred...
This often happens after several thousand rows have already been inserted successfully.
A simplified version of the code would look like the following:
Parallel.ForEach(collection, (item) =>
{
    var results = item.Calculate(parameters);
    dal.Results.Insert(results);
    allResults.AddRange(results);
});
And inside the Insert method, it looks like this:
public override void Insert(IEnumerable<Result> entities)
{
    using (var connection = GetConnection())
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted))
        {
            connection.Insert(entities, transaction);
            transaction.Commit();
        }
    }
}
Some other things about the code that I don't think are affecting this but might be relevant:
dal.Results is simply a repository that contains that Insert() method and is preinitialized with a connection string that is used to instantiate a new SqlConnection(connectionString) every time GetConnection() is called.
allResults is a ConcurrentBag<Result> that I'm using to store all the results for later use outside the Parallel.ForEach
I'm using a transaction because it seems to perform better this way, but I'm open to suggestions if that could be causing problems.
Thanks in advance for any guidance on this issue!
There is no advantage to executing heavily IO-bound database operations in parallel.
You should create fewer but bigger batches of data to be inserted, with a minimum number of database transactions. That can be achieved in several ways:
With the SQL bulk insert operations provided by the .NET Framework
By using an external library that specializes in high-speed bulk operations
By crafting an SQL stored procedure that takes an array of data as a parameter. More information about table-valued parameters can be found at https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-table-valued-parameters-database-engine
So try the following: execute the CPU-intensive calculations in the parallel loop and save allResults into the database after the loop.
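For illustration, here is a rough sketch of the "calculate in parallel, insert once after the loop" approach using SqlBulkCopy. The Result properties, the dbo.Results table, and its columns are assumptions standing in for your real schema:
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

var allResults = new ConcurrentBag<Result>();

// CPU-bound work runs in parallel; no database access inside the loop.
Parallel.ForEach(collection, item =>
{
    foreach (var result in item.Calculate(parameters))
        allResults.Add(result);
});

// Stage the results in a DataTable whose columns match the destination table.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Value", typeof(decimal));
foreach (var result in allResults)
    table.Rows.Add(result.Id, result.Value);

// One bulk insert after the loop instead of thousands of small transactions.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Results";
        bulkCopy.WriteToServer(table);
    }
}
This keeps the connection count at one, so the intermittent "network-related or instance-specific error" on Open() under heavy parallelism should disappear.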
I have the following code that takes about an hour to run through a few hundred thousand rows:
public void Recording(int rowindex)
{
    using (OleDbCommand cmd = new OleDbCommand())
    {
        try
        {
            using (OleDbConnection connection = new OleDbConnection(Con))
            {
                cmd.Connection = connection;
                connection.Open();
                using (OleDbTransaction Scope = connection.BeginTransaction(SD.IsolationLevel.ReadCommitted))
                {
                    try
                    {
                        string Query = @"UPDATE [" + SetupAction.currentTable + "] set Description=@Description, Description_Department=@Description_Department, Accounts=@Accounts where ID=@ID";
                        cmd.Parameters.AddWithValue("@Description", VirtualTable.Rows[rowindex][4].ToString());
                        cmd.Parameters.AddWithValue("@Description_Department", VirtualTable.Rows[rowindex][18].ToString());
                        cmd.Parameters.AddWithValue("@Accounts", VirtualTable.Rows[rowindex][22].ToString());
                        cmd.Parameters.AddWithValue("@ID", VirtualTable.Rows[rowindex][0].ToString());
                        cmd.CommandText = Query;
                        cmd.Transaction = Scope;
                        cmd.ExecuteNonQuery();
                        Scope.Commit();
                    }
                    catch (OleDbException odex)
                    {
                        MessageBox.Show(odex.Message);
                        Scope.Rollback();
                    }
                }
            }
        }
        catch (OleDbException ex)
        {
            MessageBox.Show("SQL: " + ex);
        }
    }
}
It works as I expect it to. However, today my program crashed while running the query (in a for loop where rowindex is the index of a DataTable); the computer crashed, and when I rebooted and ran the program again, it said:
Multi-step OleDB operation generated errors: followed by my connection string.
What happened is that the database can't be interacted with at all; even Microsoft Access's recovery methods can't seem to help here.
I've read that this may be caused when the data structure of the database is altered from what it is expected to be. My question is: how do I prevent this, given that I can't really detect whether my program stopped functioning all of a sudden?
There could be a way for me to restructure it somehow; maybe there's a function I don't know about. Perhaps it is sending something like an empty query when the crash happens, but I don't know how to stop it.
The Jet/ACE database engine already attempts to avoid corruption and to automatically recover from catastrophic events (lost connections, the computer crashing). Transactions can further protect against inconsistent data by committing (or discarding) multiple operations as a whole. But eventually there may be some coincidental system failure that terminates an operation at some critical write position, creating critical inconsistencies in the database file. Making regular and timely backups is part of an overall solution. For very long operations it might be worth making an automated copy of the entire database file prior to the operation.
Otherwise, an extreme alternative is to
Create a second intermediate database into which all data is first inserted. (Only needs to be done once.)
In this intermediate database, create linked tables to relevant tables in the permanent, working database.
Also in the intermediate database, create an indexed local table that mirrors the linked table structure into which data will be inserted. OR if the intermediate database and table already exist, clear the local table (i.e. delete all rows).
Have your current software insert into the local intermediate table.
Run a single query which then updates the linked table from the temporary table. Wrap that update in a transaction.
Here's where the linked table has the benefit that it can be referenced in an SQL query just like any local table. You only have to explicitly open the intermediate database. In other words, just perform a simple query like UPDATE LocalTable INNER JOIN LinkedTable ON LocalTable.UpdateID = LinkedTable.ID SET LinkedTable.Data = LocalTable.Data
The benefit of this process is that the single query that updates one Access table from another can be very fast, possibly much faster than the multiple update operations in your code. This could reduce the likelihood that errors in your update code will negatively affect your database. It of course doesn't completely eliminate the random computer crash that can affect the database, but reducing the time that multiple connections and update queries are executing might make such problems less likely.
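As a rough sketch of that final step from C#, assuming the intermediate database lives at a made-up path and using the table/column names from the example query above:
using System.Data;
using System.Data.OleDb;

// Connect to the intermediate database; LinkedTable points at the permanent
// database, LocalTable holds the freshly inserted rows.
string intermediateCon = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\Intermediate.accdb;";

using (var connection = new OleDbConnection(intermediateCon))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted))
    {
        using (var cmd = new OleDbCommand(
            "UPDATE LocalTable INNER JOIN LinkedTable ON LocalTable.UpdateID = LinkedTable.ID " +
            "SET LinkedTable.Data = LocalTable.Data", connection, transaction))
        {
            cmd.ExecuteNonQuery();
        }
        transaction.Commit();
    }
}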
I think your catch block is wrong, because if you get an exception other than OleDbException, you will not roll back the transaction:
try
{
    // ...
    Scope.Commit();
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
    Scope.Rollback();
}
That is, Exception instead of OleDbException. Exceptions could come from anywhere and not necessarily Ole DB, and you still want to roll back everything you've done so far in that case.
That being said, if you have a few hundred thousand rows, I would seriously consider batching the updates and processing just a few thousand per iteration, with a transaction per iteration.
In terms of transactional behavior, the main question would be: do you really want to roll back everything you have updated so far in case of failure, or just retry/continue where you left off? If the answer is that you want to retry/continue, then you will likely want to create a BatchUpdateTask table or similar, with all the information you need for each iteration.
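For example, a rough sketch of the batching idea, reusing the VirtualTable, Con, SetupAction, and SD names from your code (the batch size is arbitrary):
const int batchSize = 2000;

for (int start = 0; start < VirtualTable.Rows.Count; start += batchSize)
{
    int end = Math.Min(start + batchSize, VirtualTable.Rows.Count);

    using (var connection = new OleDbConnection(Con))
    {
        connection.Open();
        // SD is your alias for System.Data.
        using (var transaction = connection.BeginTransaction(SD.IsolationLevel.ReadCommitted))
        {
            try
            {
                for (int rowindex = start; rowindex < end; rowindex++)
                {
                    using (var cmd = new OleDbCommand(
                        "UPDATE [" + SetupAction.currentTable + "] SET Description=@Description, " +
                        "Description_Department=@Description_Department, Accounts=@Accounts WHERE ID=@ID",
                        connection, transaction))
                    {
                        cmd.Parameters.AddWithValue("@Description", VirtualTable.Rows[rowindex][4].ToString());
                        cmd.Parameters.AddWithValue("@Description_Department", VirtualTable.Rows[rowindex][18].ToString());
                        cmd.Parameters.AddWithValue("@Accounts", VirtualTable.Rows[rowindex][22].ToString());
                        cmd.Parameters.AddWithValue("@ID", VirtualTable.Rows[rowindex][0].ToString());
                        cmd.ExecuteNonQuery();
                    }
                }
                // One commit per batch of a few thousand rows.
                transaction.Commit();
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
                transaction.Rollback();
                // If you want retry/continue semantics, record this batch range
                // (e.g. in a BatchUpdateTask table) instead of just rolling back.
            }
        }
    }
}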
On an ASP.NET website I use LINQ to SQL to get data. This operation is somewhat long (it can take up to 3 seconds), and often the user clicks a link a second time, which gives me:
There is already an open DataReader associated with this Command
which must be closed first.
I looked at "DataReader already open when using LINQ" and other similar threads, but I do not understand how to handle/fix this.
Should I get rid of LINQ to SQL altogether? What is the proper way to handle this?
EDIT:
Code that I call from Page_Load
using (var wdc = new WebDataContext())
{
    // Expensive operation, increase timeout
    wdc.CommandTimeout = 120;

    // First need to update data for this customer
    wdc.Web_WrkTrackShipment_Update((int)this.Parent.ProviderUserKey, sessionId);

    // Return set of this data based on parameters.
    return wdc.Web_WrkTrackShipment_Load(sessionId, pageNo, pageSize, searchCriteria, dateFrom, dateTo, ref pageCount_Null).ToList();
}
You can fix this by setting Multiple Active Result Sets (MARS) to true in the connection string. Note that this might indicate an N+1 query problem; it really depends on the query.
string connectionString = "Data Source=MSSQL1;" +
    "Initial Catalog=AdventureWorks;Integrated Security=SSPI;" +
    "MultipleActiveResultSets=True";
I suspect the underlying reason is that DataContext is not thread-safe, and different requests run on different threads.
Don't share your DataContext between threads. Either create one in BeginRequest and dispose of it in EndRequest, or create one locally whenever you need it and wrap it in a using statement so it is disposed as soon as your code has finished using it.
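A minimal sketch of the per-request option, assuming your WebDataContext type and using HttpContext.Items as the per-request store (names are illustrative):
// Global.asax.cs
protected void Application_BeginRequest(object sender, EventArgs e)
{
    // One DataContext per request, never shared across threads or requests.
    HttpContext.Current.Items["WebDataContext"] = new WebDataContext();
}

protected void Application_EndRequest(object sender, EventArgs e)
{
    var wdc = HttpContext.Current.Items["WebDataContext"] as WebDataContext;
    if (wdc != null)
        wdc.Dispose();
}

// In a page or control, grab the request-scoped context:
var wdcForThisRequest = (WebDataContext)HttpContext.Current.Items["WebDataContext"];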