Sync Lock issue in Parallel.Foreach - c#

I have worked with c# code for past 4 years, but recently I went through a scenario which I never pass through. I got a damn project to troubleshoot the "Index out of range error". The code looks crazy and all the unnecessary things were there but it's been in production for past 3 years I just need to fix this issue. Coming to the problem.
class FilterCondition
{
.....
public string DataSetName {get; set;}
public bool IsFilterMatch()
{
//somecode here
Dataset dsDataSet = FilterDataSources.GetDataSource(DataSetName); // Static class and Static collection
var filter = "columnname filtername"
//some code here
ds.defaultview.filter= filter;
var isvalid = ds.defaultView.rowcount > 0? true : false;
return isValid;
}
}
// from a out side function they put this in a parallel loop
Parallel.ForEach()
{
// at some point its calling
item.IsFiltermatch();
}
When I debug, dsDataSet I saw that dsDataSet is modified my multiple threads. That's why race condition happens and it failed to apply the filter and fails with index out of Range.
My question here is, my method is Non-static and thread safe, then how this race condition happening since dsDataset is a local variable inside my member function. Strange, I suspect something to do with Parallel.Foreach.
And when I put a normal lock over there issue got resolved, for that also I have no answer. Why should I put lock on a non-static member function?
Can anyone give me an answer for this. I am new to the group. if I am missing anything in the question please let me know. I can't copy the whole code since client restrictions there. Thanks for reading.

Because it's not thread safe.
You're accessing a static collection from multiple threads.
You have a misconception about local variables. Although the variable is local, it's pointing at an object which is not.
What you should do is add a lock around the places where you read and write to the static collection.

Problem: the problem lies within this call
FilterDataSources.GetDataSource(DataSetName);
Inside this method you are writing to a resource that is shared.
Solution:
You need to know which field is being written here and need to implement locking on it.
Note: If you could post your code for the above method we would be in a better position to help you.

I believe this is because of specific (not-stateless, not thread safe, etc) implementation of FilterDataSources.GetDataSource(DataSetName), even by a method call it seems this is a static method. This method can do different things even return cached DataSet instance, intercept calls to a data set items, return a DataSet wrapper so you are working with a wrapper not a data set, so a lot of stuff can be there. If you want to fine let's say "exact line of code" which causes this please show us implementation of GetDataSource() method and all underlying static context of FilterDataSource class (static fields, constructor, other static methods which are being called by GetDataSource() if such exists...)

Related

Is using property getters for initialization (to avoid having to call methods in specific order) bad practice?

Suppose I have a class that provides some data to my application. Data initially comes from database, and I provide it through some methods that handle the whole database thing and present the result as a usable class instead of raw query result. This class has to do some setup (not complex) to make sure any method called can use the database (e.g. connect to database, make sure it contains some critical info, etc). So, were I to put it in a method (say, method Init(), that would handle checking for database, connecting to it, verifying that it does contain the info), I would have to make sure that this method is called before any other method.
So, I usually find that instead of doing this:
public class DataProvider
{
private SqlController controller;
public void Init()
{
controller = new SqlController();
controller.Init();
controller.ConnectToDataBase();
CheckForCriticalInfoInDatabase();
}
public Data GetData()
{
// get data from database (not actually going to use raw queries like that, just an example)
var queryResult = sqlController.RunQuery("SELECT something FROM SOME_TABLE");
// and present it as usable class
Data usefulData = QueryResultToUsefulData(queryResult);
return usefulData;
}
...
}
and then always making sure I call Init() before GetData(), i do something like
private SqlController _controller;
private SqlController controller
{
get
{
if (_controller == null)
{
_controller = new SqlController();
_controller.Init();
_controller.ConnectToDataBase();
CheckForCriticalInfoInDatabase();
}
return controller;
}
}
So, now i can be sure that i won't use an uninitialised SqlController, and I don't have to do that same null check in every method that uses it. However, I never noticed getters being used this way in other peoples' code.
Is there some pitfall I don't see? To me it looks like it's the same as lazy initialization, with the exception being that I use it not because the initialization is heavy or long, but because I don't want to check the order in which I call methods. This question points out that it's not thread-safe (not a concern in my case, plus I imagine it could be made thread-safe with some locks) and that setting the property to null will result in unintuitive behaviour (not a concern, because I don't have a setter at all and the backing field shouldn't be touched either way).
Also, if this kind of code IS bas practice, what is the proper way to ensure that my methods don't rely on order in which they are called?
As #madreflection said in the OP comments, use a method for anything that is possibly going to be slow. Getters and setters should just be quick ways of getting and setting a value.
Connections to dbs can be slow or fail to connect so you may have catches setup to try different connection methods etc.
You could also have the checking occur in the constructor of the object, that way the object cannot be used without init() being run in a different function, saving on time tracing where an error is actually occurring.
For example if you had one function create the object, do a bunch of 'stuff' then try to use the object without running init(), then you get the error after all of the 'stuff' not where you created the object. This could lead you to think there is something wrong in whatever way you are using the object, not that it has not been initialised.

Are Private properties of a class called within a Parallel.Foreach body Thread Safe?

I am tasked with writing a system to process result files created by a different process(which I have no control over) and and trying to modify my code to make use of Parallel.Foreach. The code works fine when just calling a foreach but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?" or is this going to cause everything to go sideways on me.
I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do alot towards assuring thread safety.
I have basically the following, edited for brevity
static void Main(string[] args)
{
MyProcess process = new MyProcess();
process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
public void DoThings()
{
//Get some list of things
List<Thing> things = getThings();
Parallel.Foreach(things, item => {
//based on some criteria, take actions from MyActionClass
MyActionClass myAct = new MyActionClass(item);
string tempstring = myAct.DoOneThing();
if(somecondition)
{
MyAct.DoOtherThing();
}
...other similar calls to myAct below here
};
}
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
private Thing _thing;
public MyActionClass(Thing item)
{
_thing = item;
}
public string DoOneThing()
{
return _thing.GetSubThings().FirstOrDefault();
}
public void DoOtherThing()
{
_thing.property1 = "Somenewvalue";
}
}
If I can explain this any better I'll try, but I think that's the basics of my needs
EDIT:
Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.Foreach (in this case, a string value that gets written to a database inside the loop), will that have any affect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But as it has been mentioned there is nothing shared that can be seen. It doesn't mean that in the actual code you use everything is as good as it seems here.
Or that nothing will be changed by you or your coworker that will make some state both shared and mutable (in the Thing, for example), and now you start getting difficult to reproduce crashes at best or just plain wrong behaviour at worst that can be left undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet, and it is not always easy to use and implement, or that every task can be reasonably expressed through immutable objects. And even that accidental "make shared and mutable" change may happen to it as well, though much less likely.
It should at least be considered as a possible option/alternative.
About the EDIT
If I change the value of a property of the item I'm working with while
inside the Parallel.Foreach (in this case, a string value that gets
written to a database inside the loop), will that have any affect on
the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
a string value that gets written to a database inside the loop - depending on the used data access technology and how you use it, you may be in trouble, because most of them are not designed for multithreaded environment, like EF DbContext, for example. And obviously do not forget that dealing with concurrent access in database is not always easy, though that is a bit away from our original theme.
Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case - if there is no risk of external concurrent changes, then it is just an unnecessary work. And if there is a chance of another threads(not Parallel.For) making changes to those objects that are being persisted, then you already have bigger problems than Parallel.For.
Objects should always have observable consistent state (unlike when half of properties set by one thread, and half by another, while you try to persist that who-knows-what), and if they are used by many threads, then they should be already thread-safe - there should be no way to put them into inconsistent state.
And if they want to be persisted by external code, such objects should probably provide:
Either SyncRoot property to synchronize property reading code.
Or some current state snapshot DTO that is created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Or something more exotic.

Knowing in what part the method returned

Is it possible to get run-time information about where a method has returned?
I mean, if the method returned after running all its lines of code, or because of an earlier
return statement that occurred due to some condition.
The scenario is using interceptor for creating UnitOfWork that should exists in method scope.
For example, lets consider this code:
[UnitOfWork]
public void Foo()
{
// insert some values to the database, using repositories interfaces...
DoSomeChangesInTheDataBaseUsingRepositories();
var result = DoSomethingElse();
if (!result) return;
DoMoreLogicBecuseTheResultWasTrue();
}
I have interceptor class that opens thread static unit of work for methods that are flagged with [UnitOfWork] and when the scope of the method ends it run commit on the UoW and dispose it.
This is fine, but lets consider the scenario above, where for some reason a programmer decided to return in the middle of the method due to some condition, and in that scenario the changes made by the repositories should not be persisted.
I know that this can indicate wrong design of the method, but be aware that it is a possible scenario to be written by a programmer and I want to defend my database from these kind of scenarios.
Also, I don't want to add code to the method itself that will tell me where it ended. I want to infer by the method info somehow its returned point, and if it is not at the end of its scope the interceptor will know not to commit.
The simple answer is use BREAKPOINTS and Debugging.
Edit:- As mentioned by Mels in the comments. This could be a useful suggestion.
If your application is very timing-sensitive, set conditional breakpoints such that they never actually stop the flow of execution. They do keep track of Hit Count, which you can use to backtrace the flow of execution.
Just for your attention. From the microsoft site:-
For those out there who have experience debugging native C++ or VB6
code, you may have used a feature where function return values are
provided for you in the Autos window. Unfortunately, this
functionality does not exist for managed code. While you can work
around this issue by assigning the return values to a local variable,
this is not as convenient because it requires modifying your code. In
managed code, it’s a lot trickier to determine what the return value
of a function you’ve stepped over. We realized that we couldn’t do the
right thing consistently here and so we removed the feature rather
than give you incorrect results in the debugger. However, we want to
bring this back for you and our CLR and Debugger teams are looking at
a number potential solutions to this problem. Unfortunately this is
will not be part of Visual Studio 11.
There are a couple ways that normally indicate that a method exited early for some reason, one is to use the actual return value, if the value is a valid result that then your method probably finished correctly, if its another value then probably not, this is the pattern that most TryXXX methods follow
int i;
//returns false as wasn't able to complete
bool passed = int.TryParse("woo", out i);
the other is to catch/trhow an exception, if an exception is found, then the method did not complete as you'd expect
try
{
Method();
}
catch(Exception e)
{
//Something went wrong (e.StackTrace)
}
Note: Catching Exception is a bad idea, the correct exceptions should be caught, i.e NullReferenceException
EDIT:
In answer to your update, if your code is dependant on the success of your method you should change the return type to a boolean or otherwise, return false if unsuccessful
Generally you should use trace logs to watch you code flow if you cant debug it.
You could always do something like this:
private Tuple<int, MyClass> MyMethod()
{
if (condition)
{
return new Tuple<int, MyClass>(0,new MyClass());
}
else if(condition)
{
return new Tuple<int, MyClass>(1, new MyClass());
}
return new Tuple<int, MyClass>(2,new MyClass());
}
This way you´ll have an index of which return was returning your MyClass object. All depends on what you are trying to accomplish and why - which is at best unclear. As someone else mentioned - that is what return values are for.
I am curios to know what you are trying to do...

When does a param that is passed by reference get updated?

Suppose I have a method like this:
public void MyCoolMethod(ref bool scannerEnabled)
{
try
{
CallDangerousMethod();
}
catch (FormatException exp)
{
try
{
//Disable scanner before validation.
scannerEnabled = false;
if (exp.Message == "FormatException")
{
MessageBox.Show(exp.Message);
}
}
finally
{
//Enable scanner after validation.
scannerEnabled = true;
}
}
And it is used like this:
MyCoolMethod(ref MyScannerEnabledVar);
The scanner can fire at any time on a separate thread. The idea is to not let it if we are handling an exception.
The question I have is, does the call to MyCoolMethod update MyScannerEnabledVar when scannerEnabled is set or does it update it when the method exits?
Note: I did not write this code, I am just trying to refactor it safely.
You can think of a ref as making an alias to a variable. It's not that the variable you pass is "passed by reference", it's that the parameter and the argument are the same variable, just with two different names. So updating one immediately updates the other, because there aren't actually two things here in the first place.
As SLaks notes, there are situations in VB that use copy-in-copy-out semantics. There are also, if I recall correctly, rare and obscure situations in which expression trees may be compiled into code that does copy-in-copy-out, but I do not recall the details.
If this code is intended to update the variable for reading on another thread, the fact that the variable is "immediately" updated is misleading. Remember, on multiple threads, reads and writes can be observed to move forwards and backwards in time with respect to each other if the reads and writes are not volatile. If the intention is to use the variable as a cross-thread communications mechanism them use an object actually designed for that purpose which is safe for that purpose. Use some sort of wait handle or mutex or whatever.
It gets updated live, as it is assigned inside the method.
When you pass a parameter by reference, the runtime passes (an equivalent to) a pointer to the field or variable that you referenced. When the method assigns to the parameter, it assigns directly to whatever the reference is pointing to.
Note, by the way, that this is not always true in VB.
Yes, it will be set when the variable is set within the method. Perhaps it would be best to return true or false whether the scanner is enabled rather than pass it in as a ref arg
The situation calls for more than a simple refactor. The code you posted will be subject to race conditions. The easy solution is to lock the unsafe method, thereby forcing threads to hop in line. The way it is, there's bound to be some bug(s) in the application due to this code, but its impossible to say what exactly they are without knowing a lot more about your requirements and implementation. I recommend you proceed with caution, a mutex/lock is an easy fix, but may have a great impact on performance. If this is a concern for you, then you all should review a better thread safe solution.

Static methods in a class - okay in this situation?

Hey all - I have an app where I'm authenticating the user. They pass username and password. I pass the username and password to a class that has a static method. For example it'm calling a method with the signature below:
public class Security
{
public static bool Security.Member_Authenticate (string username, string password)
{ //do stuff}
}
If I have 1000 people hitting this at once, will I have any problems with the returns of the method bleeding into others? I mean, since the methods are static, will there be issues with the a person getting authenticated when in fact they shouldn't be but the person before them was successfully authenticated ASP.Net returns a mismatched result due to the method being static? I've read of issues with static properties vs viewstate but am a bit confused on static methods. If this is a bad way of doing this,what's the prefered way?
This will not happen. When a method is Static (or Shared in VB.NET), then you're safe as long as the method doesn't rely on anything other than the inputs to figure something out. As long as you're not modifying any public variables or objects from anywhere else, you're fine.
A static method is just fine as long as it is not using any of persistent data between successive calls. I am guessing that your method simply runs a query on the database and returns true/false based on that.
In this scenario, I think the static method should work without a problem, regardless of how many calls you make to it.
ASP.net does use all sorts of under-the-hood thread pooling, which can make static methods and fields dicey.
However, you can avoid most threading issues with a static method by using only locally-scoped variables in that method. That way, each thread (user) will have their own in-memory copy of all the variables being used.
If you use higher-scoped variables, make sure to make all access to them thread-conscious.
Throwing exceptions is not a good practice as it makes the .net runtime to create extra infrastructure for catching them. To verify this create a class and and populate it with some random values using a loop. Make the loop iterate for a large counter like 10,000. Record the time it takes to create the list. Now enclose the instance creation in a try..catch block and record the time. Now, you can see the exceptionally large difference.
e.g
for(int i=0; i<10000; i++){
Employee emp = new Employee();
emp.Name = "Random Name" + i.ToString();
}
Versus
for(int i=0; i<10000; i++){
try{
Employee emp = new Employee();
emp.Name = "Random Name" + i.ToString();
}catch{}
}
Although there is no fixed solution whether to throw exception or not, it is a best practice to create alternate flows in your program and handle every condition with proper return values. Exceptions should be thrown only when the situation can be justified as exceptional.
While I can see the value of the static method in regards to the perceived performance gains, I believe the real issue here is whether the gains (and risks) are worth the maintenance kludge and security weakness you are potentially creating. I believe that most people would warn you away from providing a public method that accepts an user credentials and returns success or failure. It potentially provides an easy a method for hacking.
So, my point is philosophical. Otherwise, I agree with others who have pointed out that restricting the code to use local variables should ensure that you do not have any problems with side effects due to concurrent access of the method, even on different threads, i.e., if you invoke the method on a ThreadPool thread.
Maybe it's better to use public static void Authenticate(string, string) which throws an exception if something goes wrong (return false in original method) ?
This is a good .NET style. Boolean function return-type is the C style and is obsolete.
Why don't you have the user class with username and password and a method that is called authenticate?

Categories