I'm using the MySQL .Net libraries in an older C# application I'm rewriting. The Data Access Layer is rather obsolete but I'm trying to make the best of it. But now I ran into some really nasty threading issues.
I have a series of about 20 Select statements which are used to process a report. They take about 5 seconds to complete and I'm displaying a progress bar while the Select statements run. I'm launching the operations via a simple ThreadPool call:
[LATER EDIT: What happens is that I called the method below twice due to a bug in my UI - this doesn't devalue the question though, merely explains why my threads were racing against each other.]
ThreadPool.QueueUserWorkItem(new WaitCallback(UpdateChart));
Sometimes it works.
Sometimes it crashes with a "possible IO stream race condition".
Sometimes it crashes with "connection should be open and valid".
Sometimes it crashes with "object reference not set...".
All classes in my DAL are Static because I thought this is a good way of improving performance (not having to create new class instances for every little operation).
And all my DAL classes use the same "root" DAL class which builds Connections:
public static class MySQLConnectionBuilder
{
    private static MySqlConnectionStringBuilder ConnectionStringBuilder = new MySqlConnectionStringBuilder();
    // I'm initializing the ConnectionStringBuilder with my server password & address.

    public static MySqlConnection GetConnection()
    {
        return new MySqlConnection(ConnectionStringBuilder.ConnectionString);
    }
}
All my DAL classes have functions which are similar to the crashing function. The crashing function looks like this:
public static STDS.UserPresence.user_presenceDataTable GetPresence(int aUserID, DateTime aStart, DateTime aEnd)
{
    ta.Connection = MySQLConnectionBuilder.GetConnection();
    ds = ta.GetPresenceForUserBetweenDates(aUserID, aStart, aEnd);
    ta.Connection.Close();
    return ds;
}
Ideas? Tips on improvement? Will the threading issue go away if I switch to a more object-oriented (instance-driven) DAL?
The line
ta.Connection.Close()
Closes the connection last assigned to ta.Connection - not always the connection created in the same thread. This may close a connection on which a query is currently running in another thread.
If you want to quickly determine if this is what's happening, mark the connection variable with a [ThreadStatic] attribute in the class ta points to:
[ThreadStatic]
private static MySqlConnection connection;
I wouldn't use that approach for your final solution though, as it may keep the GC from collecting the connections.
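To see what [ThreadStatic] actually does, here is a minimal sketch. The class and method names are made up for illustration, and a plain string stands in for the connection so the snippet runs without a database. It only shows that each thread gets its own copy of the field, which starts out as null on a new thread.

```csharp
using System;
using System.Threading;

public static class ThreadStaticDemo
{
    // Each thread sees its own copy of this field; on a new thread it starts as null.
    [ThreadStatic]
    private static string current;

    // Sets the field on the calling thread, then reads it from a second thread.
    // Returns what the second thread saw.
    public static string ReadFromOtherThread(string valueOnThisThread)
    {
        current = valueOnThisThread;
        string seenByOther = "unset";
        var worker = new Thread(() => { seenByOther = current; });
        worker.Start();
        worker.Join();
        return seenByOther;
    }

    // Reads the calling thread's own copy.
    public static string ReadFromThisThread()
    {
        return current;
    }
}
```

If the crashes disappear once the field is thread-static, the shared connection was very likely the culprit.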
A simple solution (for that problem, I can't determine if your classes have other multithreading issues) is to add the connection as a parameter to each of your DAL methods, allowing you to remove the class global Connection:
public static STDS.UserPresence.user_presenceDataTable GetPresence(int aUserID, DateTime aStart, DateTime aEnd)
{
    using (MySqlConnection connection = MySQLConnectionBuilder.GetConnection())
    {
        ds = ta.GetPresenceForUserBetweenDates(connection, aUserID, aStart, aEnd);
        return ds;
    }
}
Threading issues never simply go away - they require attention. If you are unsure about what's happening, forget about the slight performance boost (if a query takes 5 seconds, any possible performance gain of using static classes would be below 1% anyway).
I was looking into the possibility that one of my applications might have a memory leak, so started playing about with some very basic code samples. One I ended up with, when left over time, started to increase greatly in terms of the number of Handles (>3000). It is a very simple Console application with the code as follows:
public static void Main(string[] args)
{
    using (SqlConnection sqlConnection = new SqlConnection())
    {
    }
    Console.ReadLine();
}
Taking out the SqlConnection call removes any Handle increase, so I am assuming it has something to do with the connection pool. But as this only runs once before basically going into a wait for input, why would the Handle count keep increasing?
Thanks.
If you are running it on .NET 4.0, this might be the case
https://connect.microsoft.com/VisualStudio/feedback/details/691725/sqlconnection-handle-leak-net-4-0
You will find that the majority of the object cache is composed of framework objects, such as those created so you can access the config files and resources without having to manually parse the files yourself.
IIRC the default object cache is about 4000 objects.
You have to remember that just because you're only creating and disposing of a single object doesn't mean that's all the framework is doing.
I have worked with C# code for the past 4 years, but recently I ran into a scenario I had never encountered before. I got a damn project to troubleshoot an "Index out of range" error. The code looks crazy and is full of unnecessary things, but it's been in production for the past 3 years, so I just need to fix this issue. Coming to the problem:
class FilterCondition
{
    // ...
    public string DataSetName { get; set; }

    public bool IsFilterMatch()
    {
        // some code here
        DataSet dsDataSet = FilterDataSources.GetDataSource(DataSetName); // static class and static collection
        var filter = "columnname filtername";
        // some code here
        dsDataSet.DefaultView.Filter = filter;
        var isValid = dsDataSet.DefaultView.RowCount > 0;
        return isValid;
    }
}

// from an outside function they put this in a parallel loop
Parallel.ForEach(/* collection of FilterCondition items */, item =>
{
    // at some point it's calling
    item.IsFilterMatch();
});
When I debugged, I saw that dsDataSet is modified by multiple threads. That's why the race condition happens: it fails to apply the filter and fails with "Index out of range".
My question is: my method is non-static and thread safe, so how is this race condition happening, since dsDataSet is a local variable inside my member function? Strange. I suspect it has something to do with Parallel.ForEach.
And when I put a normal lock there, the issue got resolved, and I have no explanation for that either. Why should I have to put a lock on a non-static member function?
Can anyone give me an answer for this? I am new to the group. If I am missing anything in the question, please let me know. I can't copy the whole code because of client restrictions. Thanks for reading.
Because it's not thread safe.
You're accessing a static collection from multiple threads.
You have a misconception about local variables. Although the variable is local, it's pointing at an object which is not.
What you should do is add a lock around the places where you read and write to the static collection.
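A minimal sketch of that advice, under some loud assumptions: the real FilterDataSources class was never posted, so SharedDataSources, Register, GetCopy and HasMatch are made-up names, and a DataTable stands in for the cached data. Every access to the static dictionary goes through one lock, and callers get a private copy so that setting a filter never touches the shared instance.

```csharp
using System.Collections.Generic;
using System.Data;

public static class SharedDataSources
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<string, DataTable> Tables =
        new Dictionary<string, DataTable>();

    // Writes to the static collection happen under the lock.
    public static void Register(string name, DataTable table)
    {
        lock (Gate) { Tables[name] = table; }
    }

    // Reads happen under the same lock and hand back a private copy,
    // so callers can filter it without mutating shared state.
    public static DataTable GetCopy(string name)
    {
        lock (Gate) { return Tables[name].Copy(); }
    }

    // Applies a row filter to the caller's own copy only.
    public static bool HasMatch(string name, string rowFilter)
    {
        DataTable local = GetCopy(name);
        var view = new DataView(local) { RowFilter = rowFilter };
        return view.Count > 0;
    }
}
```

Copying on every call costs memory and time, so for large tables it may be cheaper to hold the lock around the filtering itself. Which trade-off is right depends on code that was not posted.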
Problem: the problem lies within this call
FilterDataSources.GetDataSource(DataSetName);
Inside this method you are writing to a resource that is shared.
Solution:
You need to know which field is being written here and need to implement locking on it.
Note: If you could post your code for the above method we would be in a better position to help you.
I believe this is because of the specific (not stateless, not thread safe, etc.) implementation of FilterDataSources.GetDataSource(DataSetName); judging by the call alone, it appears to be a static method. This method could do many different things: return a cached DataSet instance, intercept calls to the data set's items, or return a DataSet wrapper so that you are working with a wrapper rather than a data set. If you want to find, let's say, the "exact line of code" which causes this, please show us the implementation of the GetDataSource() method and all the underlying static context of the FilterDataSources class (static fields, constructor, other static methods which are called by GetDataSource(), if any exist...).
my problem: Earlier this week, I got the task to speed up a task in our program. I looked at it and immediately got the idea of using a parallel foreach loop for a function in that task.
I implemented it, went through the function (including all sub-functions) and changed the SqlConnections (and other stuff) so it'd be able to run in parallel. I started the whole thing and all went well and fast (alone that reduced the time for that task by ~45%)
Now, yesterday we wanted to try the same thing with some more data and ... I got a weird problem: whenever the parallel function was called, it did its work ... but sometimes one of the threads would hang for at least 4 minutes (timeouts are set to one minute, for connection AND command).
If I pause the program during that, I see that only one thread is still active from that loop and it hangs on
connection.Open()
After ~4 minutes the program simply proceeds without throwing an error (aside from a message in the Output box saying that an exception occurred somewhere, but it wasn't caught by my application; it was caught somewhere in the SqlConnection/SqlCommand object).
I can kill all connections on the MSSQLServer without anything happening. Also, the MSSQLServer does nothing during those 4 minutes; all connections are idle.
This is the procedure that is used for sending Update/Insert/Delete statements to the database:
int i = 80;
bool itDidntWork = true;
Random random = new Random();
while (itDidntWork && i > 0)
{
    try
    {
        using (SqlConnection connection = new SqlConnection(sqlConnectionString))
        {
            connection.Open();
            lock (connection)
            {
                command.Connection = connection;
                command.ExecuteNonQuery();
            }
            itDidntWork = false;
        }
    }
    catch (Exception ex)
    {
        if (ex is SqlException && ((SqlException)ex).ErrorCode == -2146232060)
        {
            Thread.Sleep(random.Next(500, 5000));
        }
        else
        {
            SqlConnection.ClearAllPools();
        }
        Thread.Sleep(random.Next(50, 110));
        i--;
        if (i == 0)
        {
            writeError(ex);
        }
    }
}
Just in case: on smaller databases deadlocks can occur (error code -2146232060), so if one occurs, I have to make the colliding statements run at different times. Works great even on small databases/small servers. If the error wasn't caused by a deadlock, chances are that the connection was faulty, so I'm cleaning all broken connections.
Similar functions exist for executing scalars, filling datatables/datasets (yes, the application is that old) and executing stored procedures.
And yes all of those are used in the parallel loop.
Has someone any idea what could be going on there? Or an idea on how I can find out what is going on there?
*edit about the command object:
it is given to the function, the command object is always a new object when it is given into the function.
about the lock: If I take the lock away, I get dozens and hundreds of 'connection is closed' or 'connection is already open' errors, because the Open() function just gets a connection from .NET's connection pool. The lock does work as intended.
Example code:
using(SqlCommand deleteCommand = new SqlCommand(sqlStatement))
{
ExecuteNonQuerySafely(deleteCommand); // that's the function that contains the body I posted above
}
*edit 2
I've to make a correction: It hangs on this
command.Connection = connection;
at least I guess it does, because when I pause the application, the 'step' mark thingi is green and on
command.ExecuteNonQuery();
saying that that is the statement that'll be executed next.
*edit 3
just to be sure I just started another test without any locks around the connection object...will take some minutes to get the results.
*edit 4
well, I was wrong. I removed the lock statements and...it still worked. Maybe the first time I tried it there was a reused connection or something. Thanks for pointing it out.
*edit 5
I'm getting the feeling that this occurs only with one specific call to a specific database procedure. I don't know why. C# wise there is no difference between that call and other calls see edit 6. And since it didn't execute the statement at that point (I guess. Maybe someone can correct me on that. If, in debug mode, a line is green marked (instead of yellow) it didn't execute that statement yet but waits for the statement before that line to finish, is that correct?) it's strange.
*edit 6
There were 3 command objects that were reused the whole time. They were defined above the parallel function. I don't know how bad that is/was. They were only used to call one stored procedure (each of them called a different procedure), of course with different parameters and a new connection (through the above mentioned method).
*edit 7
ok, it's really only when one specific stored procedure is called. Except that it's on the assignment of the connection object that it hangs (next line is marked green).
Trying to figure out what the cause for that is atm.
*edit 8
yay, it just happened at another command. So that was that.
*edit 9
ok. Problem solved. The 'hangs' were actually CommandTimeouts that were set to 10 minutes(!). They were only set for two commands (the one I mentioned in edit 7 and the one that I mentioned in edit 8). Since I found both of them while I was restructuring my commands to make them like devundef suggested, I marked his answer as the one that solved my problem. Also his suggestion of limiting the amounts of threads my for-loop was using sped up the process even more.
Special thanks to Marc Gravell for explaining stuff and hanging in here with me on a Saturday ;)
I think the problem can be found in your edit 6: "...3 command objects were reused the whole time."
Any data that's used inside a parallel loop must be created inside the loop or it must have the proper synchronization code in place to ensure only 1 thread at a time has access to that particular object. I don't see such code inside the ExecuteNonQuerySafely.
a) Locking the connection has no effect there because the connection object is created inside the method.
b) Locking the command will not guarantee thread safety: you're probably setting the command parameters before locking it inside the method. A lock(command) will work if you lock the command before calling ExecuteNonQuerySafely; however, taking locks inside a parallel loop is not a good thing to do. It's the definition of anti-parallel, after all. Better to avoid this altogether and create a new command for each iteration. Better yet would be a little refactoring of ExecuteNonQuerySafely: it could accept a callback action instead of an SqlCommand. Example:
public void ExecuteCommandSafely(Action<SqlCommand> callback) {
    // ... do init stuff ...
    using (var connection = new SqlConnection(...)) {
        using (var command = new SqlCommand()) {
            command.Connection = connection;
            try {
                callback(command);
            }
            // ... error handling stuff ...
        }
    }
}
And use :
ExecuteCommandSafely((command) => {
    command.CommandText = "...";
    // ... set parameters ...
    command.ExecuteNonQuery();
});
Last, the fact that you're getting errors when executing the commands in parallel is a sign that parallel execution may not be a good fit in this case. You're wasting server resources to get errors. Connections are expensive; try the MaxDegreeOfParallelism option to tune the workload for this particular loop (remembering that the optimal value will change according to the hardware, server, network, etc.). The Parallel.ForEach method has an overload that accepts a ParallelOptions parameter, where you can set how many threads you want to execute in parallel
(http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism.aspx).
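As a rough sketch of how that overload is used: the ThrottledLoop class, its Run method and the peak counter are invented for this illustration, and only Parallel.ForEach and ParallelOptions.MaxDegreeOfParallelism are real framework API. The work delegate stands in for whatever database call each iteration makes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Runs `work` for every item with at most `maxThreads` iterations in flight
    // and returns the highest concurrency that was actually observed.
    public static int Run(IEnumerable<int> items, int maxThreads, Action<int> work)
    {
        int inFlight = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref inFlight);

            // Record the peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            try { work(item); }
            finally { Interlocked.Decrement(ref inFlight); }
        });

        return peak;
    }
}
```

Calling ThrottledLoop.Run(ids, 4, id => SaveRow(id)), where SaveRow is a placeholder for your own method, would keep at most four iterations running at once. The right limit has to be found by measuring against your own server.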
I have a class containing methods to fill DropDowns, return a DataSet, return a scalar or simply execute a query. In one of my older posts on StackOverflow, I submitted buggy code from the same class. Based on the advice of the contributors, I have improved the code and want to know whether this class is suitable for use in a highly concurrent environment:
public sealed class reuse
{
    public void FillDropDownList(string Query, DropDownList DropDownName)
    {
        using (TransactionScope transactionScope = new TransactionScope())
        {
            using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString.ToString()))
            {
                SqlDataReader dr;
                try
                {
                    if (DropDownName.Items.Count > 0)
                        DropDownName.Items.Clear();
                    SqlCommand cmd = new SqlCommand(Query, con);
                    dr = cmd.ExecuteReader();
                    while (dr.Read())
                        DropDownName.Items.Add(dr[0].ToString());
                    dr.Close();
                }
                catch (Exception ex)
                {
                    CustomErrorHandler.GetScript(HttpContext.Current.Response, ex.Message.ToString());
                }
            }
        }
    }
}
I want to know whether to dispose Command and DataReader objects as well or they too will get automatically disposed with USING?
Re the command/reader: they would be disposed by "using", but only if you use "using" for them, which you should.
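A small sketch of that point, with made-up types so it runs without a database: TrackedResource is a hypothetical stand-in for SqlConnection, SqlCommand and SqlDataReader, and it just records its name when Dispose is called. Only the objects that sit in their own using get disposed, and they are disposed in reverse order of creation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a connection, command or reader.
public sealed class TrackedResource : IDisposable
{
    private readonly string name;
    private readonly List<string> log;

    public TrackedResource(string name, List<string> log)
    {
        this.name = name;
        this.log = log;
    }

    // Records that this resource was disposed.
    public void Dispose()
    {
        log.Add(name);
    }
}

public static class UsingDemo
{
    // Returns the resource names in the order they were disposed.
    public static List<string> Run()
    {
        var disposed = new List<string>();
        using (var connection = new TrackedResource("connection", disposed))
        using (var command = new TrackedResource("command", disposed))
        using (var reader = new TrackedResource("reader", disposed))
        {
            // Not wrapped in using, so nothing ever calls Dispose on it.
            var forgotten = new TrackedResource("forgotten", disposed);
        }
        return disposed;
    }
}
```

UsingDemo.Run() yields reader, command, connection. The "forgotten" object never shows up, which is the situation of the command and reader in the posted code.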
Criticisms:
you are mixing UI and data access horribly - the exception handling in particular gives no indication to the calling code (although personally I'd keep the control code separate too), and assumes the caller always wants that script-based approach (to me, if this code fails, things are very wrong: let that exception bubble upwards!)
no mechanism for proper parameters; my suspicion then, is that you're concatenating strings to make a query - potential (but very real) risk of SQL injection
you mention high-concurrent; if so, I would expect to see some cache involvement here
for code maintenance reasons, I'd move all "create a connection" code to a central point - "DRY" etc; I wouldn't expect an individual method like this to concern itself with details like where the connection-string comes from
Frankly I'd just use dapper here, and avoid all these issues:
using (var connection = Config.OpenConnection()) {
    return connection.Query<string>(tsql, args).ToList();
}
(and let the caller iterate over the list, or use AddRange, or data-binding, whatever)
Generally agree with Marc's answer but I have some other comments and different angle. Hope my answer will be useful for you.
First, there is nothing wrong in using static classes and methods in concurrent environment as long as there is no need for any state information and no data is shared. In your case, filling up DropDownList, it is perfectly fine because you only need a list of strings and once that's done you can forget all about how you got it. There is also no interference between concurrent calls to static method if they do not access any static fields. Static methods are common across .NET framework and they are thread safe.
In my example below I do use one static field - log4net logger. It is still thread-safe because it does not carry any state and is merely a jump point to log4net library which itself is thread-safe. Do recommend at least looking at log4net - great logging lib.
It could only be unsafe if you tried to fill the same drop down list from two threads but then it would be also unsafe if this class was not static. Make sure drop downs are filled from one (main) thread.
Back to your code. Mixing UI and data retrieval is not a good practice as it makes code much less maintainable and less stable. Separate those two. Dapper library might be a good way to simplify things. I have not used it myself so all I can tell is that it looks very handy and efficient. If you want/need to learn how stuff works don't use it though. At least not at first.
Having non-parametrized query in one string is potentially prone to SQL injection attacks but if that query is not constructed based on any direct user input, it should be safe. Of course you can always adopt parametrization to be sure.
Handling exception using
CustomErrorHandler.GetScript(HttpContext.Current.Response, ex.Message.ToString());
feels flaky and too complex for this place and may result in another exception. Exception when handling another exception means panic. I would move that code outside. If you need something here let it be a simple log4net error log and re-throw that exception.
If you only do one DB read, there is no need for an explicit transaction. As for the connection object, it should not be static in any situation and should be created on demand. There is no performance penalty in that, because .NET keeps a pool of ready-to-use connections and recycles those that were 'disposed'.
I believe that an example is always better than just explanations so here is how I would re-arrange your code.
public static class reuse
{
    public static readonly log4net.ILog log = log4net.LogManager.GetLogger("GeneralLog");

    public static void FillDropDownList(string query, SqlParameter[] parms, DropDownList dropDown)
    {
        dropDown.Items.Clear();
        dropDown.DataSource = GetData(query, parms);
        dropDown.DataBind();
    }

    private static IEnumerable<string> GetData(string query, SqlParameter[] parms)
    {
        using (SqlConnection con = new SqlConnection(GetConnString()))
        {
            try
            {
                List<string> result = new List<string>();
                using (SqlCommand cmd = new SqlCommand(query, con))
                {
                    // AddRange expects SqlParameter objects, not plain strings.
                    cmd.Parameters.AddRange(parms);
                    con.Open(); // the connection must be open before ExecuteReader
                    using (SqlDataReader dr = cmd.ExecuteReader())
                    {
                        if (dr.VisibleFieldCount > 0)
                        {
                            while (dr.Read())
                                result.Add(dr[0].ToString());
                        }
                    }
                }
                return result;
            }
            catch (Exception ex)
            {
                log.Error("Exception in GetData()", ex);
                throw;
            }
        }
    }

    private static string GetConnString()
    {
        return ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString;
    }
}
Hey all - I have an app where I'm authenticating the user. They pass username and password. I pass the username and password to a class that has a static method. For example, I'm calling a method with the signature below:
public class Security
{
    public static bool Member_Authenticate(string username, string password)
    {
        // do stuff
    }
}
If I have 1000 people hitting this at once, will I have any problems with the returns of the method bleeding into others? I mean, since the method is static, could a person get authenticated when they shouldn't be, because the person before them was successfully authenticated and ASP.NET returns a mismatched result? I've read of issues with static properties vs. viewstate but am a bit confused about static methods. If this is a bad way of doing this, what's the preferred way?
This will not happen. When a method is Static (or Shared in VB.NET), then you're safe as long as the method doesn't rely on anything other than the inputs to figure something out. As long as you're not modifying any public variables or objects from anywhere else, you're fine.
A static method is just fine as long as it is not using any of persistent data between successive calls. I am guessing that your method simply runs a query on the database and returns true/false based on that.
In this scenario, I think the static method should work without a problem, regardless of how many calls you make to it.
ASP.net does use all sorts of under-the-hood thread pooling, which can make static methods and fields dicey.
However, you can avoid most threading issues with a static method by using only locally-scoped variables in that method. That way, each thread (user) will have their own in-memory copy of all the variables being used.
If you use higher-scoped variables, make sure to make all access to them thread-conscious.
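Here is a small sketch of the "locals only" case. Everything in it is hypothetical: Authenticate stands in for Member_Authenticate, and its rule (the password is the username plus "!") is a placeholder with no relation to real authentication. The method reads only its parameters and locals, so concurrent callers cannot see each other's values.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class StaticLocalsDemo
{
    // Touches only its parameters and locals; no static fields are read or written.
    public static bool Authenticate(string username, string password)
    {
        string expected = username + "!"; // placeholder rule for the sketch
        return password == expected;
    }

    // Calls Authenticate from many threads and counts answers that do not
    // match what each caller should have received.
    public static int CountMismatches(int calls)
    {
        int mismatches = 0;
        Parallel.For(0, calls, i =>
        {
            string user = "user" + i;
            bool shouldPass = i % 2 == 0;
            string password = shouldPass ? user + "!" : "wrong";
            if (Authenticate(user, password) != shouldPass)
                Interlocked.Increment(ref mismatches);
        });
        return mismatches;
    }
}
```

CountMismatches should come back as 0 however many calls run at once. The picture would change as soon as Authenticate stored the username or the result in a static field.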
Throwing exceptions is not a good practice, as it makes the .NET runtime create extra infrastructure for catching them. To verify this, create a class and populate it with some random values using a loop. Make the loop iterate for a large count, like 10,000. Record the time it takes to create the instances. Now enclose the instance creation in a try..catch block and record the time again. Then compare the two timings.
e.g
for (int i = 0; i < 10000; i++) {
    Employee emp = new Employee();
    emp.Name = "Random Name" + i.ToString();
}
Versus
for (int i = 0; i < 10000; i++) {
    try {
        Employee emp = new Employee();
        emp.Name = "Random Name" + i.ToString();
    } catch {}
}
Although there is no fixed solution whether to throw exception or not, it is a best practice to create alternate flows in your program and handle every condition with proper return values. Exceptions should be thrown only when the situation can be justified as exceptional.
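One way to picture the two flows is parsing user input. This is only an illustration: ParseDemo and its method names are made up, while int.TryParse and int.Parse are the real framework calls. The first method reports bad input through its return value, the second lets the exception escape.

```csharp
using System;

public static class ParseDemo
{
    // Alternate flow: bad input is an expected condition, so it is handled
    // with a return value and no exception is thrown.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }

    // Exception flow: reserved for input that is never supposed to be invalid.
    // int.Parse throws FormatException for text such as "abc".
    public static int ParseOrThrow(string text)
    {
        return int.Parse(text);
    }
}
```

Which of the two fits depends on whether invalid input is part of normal operation or a genuine surprise.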
While I can see the value of the static method in regard to the perceived performance gains, I believe the real issue here is whether the gains (and risks) are worth the maintenance kludge and security weakness you are potentially creating. I believe that most people would warn you away from providing a public method that accepts user credentials and returns success or failure. It potentially provides an easy method for hacking.
So, my point is philosophical. Otherwise, I agree with others who have pointed out that restricting the code to use local variables should ensure that you do not have any problems with side effects due to concurrent access of the method, even on different threads, i.e., if you invoke the method on a ThreadPool thread.
Maybe it's better to use public static void Authenticate(string, string) which throws an exception if something goes wrong (return false in original method) ?
This is a good .NET style. Boolean function return-type is the C style and is obsolete.
Why don't you have the user class with username and password and a method that is called authenticate?