I want to know the consequence of using a class level variable across different functions in a multi-threaded app.
I am creating a class variable and sharing it across the Get and Set functions.
The variable will hold a value like:
"testuser-2.3" {username-projectversion}
Code:
class Test
{
    private string key;
    public void Get(string something)
    {
        key = setToSOmething();
    }
    public void Set(string something)
    {
        key = setToSOmething();
    }
}
Is this code prone to fail in multithreaded environments? For example, if two users are accessing different versions of the project, could the "key" value be different at any random given point?
Thanks in advance.
What you're doing in your code will work, sort of, but it doesn't demonstrate the consequence of allowing multiple threads to modify a variable. It doesn't answer your question. It just means that you'll be okay with the particular thing you're doing with this particular variable.
In this case you're just assigning a different reference to your string variable. That's safe, in a way. You won't get any mangled strings, but it means that you don't know which string a given function will get when it reads the variable. In some scenarios that's not so bad, but it's a little chaotic.
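A minimal sketch of that point, assuming .NET 6+ with C# top-level statements (the two values are made up): one thread keeps swapping key between two complete strings while another reads it. Because a reference assignment is atomic, the reader only ever sees one of the two whole strings, never a torn mix, but which one it sees at any given moment is unpredictable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Two complete strings; the writer only ever swaps the reference between them.
const string A = "testuser-2.3";
const string B = "otheruser-1.0";
string key = A;      // captured by both lambdas, so both threads share it
int unexpected = 0;

var writer = Task.Run(() =>
{
    for (int i = 0; i < 100_000; i++)
        Volatile.Write(ref key, i % 2 == 0 ? A : B);
});

var reader = Task.Run(() =>
{
    for (int i = 0; i < 100_000; i++)
    {
        string seen = Volatile.Read(ref key);
        // Reference writes are atomic, so "seen" is always exactly A or B.
        if (!ReferenceEquals(seen, A) && !ReferenceEquals(seen, B))
            Interlocked.Increment(ref unexpected);
    }
});

Task.WaitAll(writer, reader);
Console.WriteLine($"Unexpected values seen: {unexpected}"); // always 0
```

What the sketch cannot tell you is whether a given read got A or B; that is the "chaotic" part.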
The problem occurs when multiple threads interact with your variable in a way that isn't thread safe.
Here's a really simple test method I wrote without actually knowing what was going to happen:
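using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestClass]
public class MultithreadedStringTest
{
    // Shared by all iterations of the parallel loop, with no synchronization.
    private string _sharedString;
    [TestMethod]
    public void DoesntMessUpStrings()
    {
        var inputStrings = "The quick fox jumped over the lazy brown dog".Split(' ');
        Parallel.ForEach(Enumerable.Range(0, 1000), x =>
        {
            // Random.Shared (.NET 6+) is thread-safe; one "new Random()" shared by all threads is not.
            // "+=" reads _sharedString, concatenates and writes it back, so concurrent appends can overwrite each other.
            _sharedString += " " + inputStrings[Random.Shared.Next(0, inputStrings.Length)];
        });
        var outputStrings = _sharedString.Trim().Split(' ');
        var nonMangledStrings = outputStrings.Count(s => inputStrings.Contains(s));
        Assert.AreEqual(1000, outputStrings.Length,
            $"We lost {1000 - outputStrings.Length} strings!");
        Assert.AreEqual(outputStrings.Length, nonMangledStrings,
            $"{outputStrings.Length - nonMangledStrings} strings got mangled.");
    }
}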
I'm starting with 9 words, and then, in a Parallel.ForEach loop, appending 1000 words selected from those 9 to a single string from concurrent threads.
I expected the string to get mangled. It didn't. Instead, out of the 1000 words that I added, typically a few hundred just got lost. That's because += is a read, a concatenation and a write: when two threads read the same old string, whichever one writes last overwrites the other's word.
We lost 495 strings!
Obviously that's not the particular operation that you're performing. But what it shows is that when we perform concurrent operations, we need to know that we're either calling a thread safe method or we're using some mechanism to prevent conflicts. We want to know how our code will behave and not cross our fingers and hope for the best.
If we're not careful, the results will be unpredictable and inconsistent. It might work when you test it and fail later, and when it does, it will be difficult to diagnose because you won't be able to see it happen.
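One way to apply that to the test above is to call a thread-safe method instead of doing += on a shared string. A sketch, assuming .NET 6+ (for Random.Shared) and C# top-level statements: each iteration enqueues its word into a ConcurrentQueue<string>, whose Enqueue is safe to call from many threads, and the final string is built once, on one thread, after the loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var inputStrings = "The quick fox jumped over the lazy brown dog".Split(' ');

// Thread-safe collection: concurrent Enqueue calls don't lose items.
var words = new ConcurrentQueue<string>();
Parallel.ForEach(Enumerable.Range(0, 1000), _ =>
{
    words.Enqueue(inputStrings[Random.Shared.Next(inputStrings.Length)]);
});

// Build the string once, after the parallel loop has finished.
string sharedString = string.Join(" ", words);
int wordCount = sharedString.Split(' ').Length;
Console.WriteLine($"Words kept: {wordCount}"); // prints "Words kept: 1000"
```

The order of the words is still nondeterministic; only the lost appends go away.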
Leaving aside the fact that Get and Set are both setting key in your code...
This code won't be prone to failure because of the nature of string. String is an immutable class: the data is constructed elsewhere, and then your key assignment happens as a single, atomic operation (it's a reference assignment, basically a pointer assignment).
So... even if you were to have two setters, key will always reference a valid string. Depending on how this code is used, though, the order in which the assignments actually happen could be counterintuitive. Say your get actually returns the string: set(get() + "X") could eventually lose Xes if called multiple times from multiple threads, because several get calls could return the same old string and each perform the string addition on that. The mistake there is assuming that get AND set together are an atomic operation. The accepted answer here:
reference assignment is atomic so why is Interlocked.Exchange(ref Object, Object) needed?
explains this better than I'm doing.
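A minimal sketch of making that get-then-set append atomic without a lock, assuming C# top-level statements; AppendAtomically is a hypothetical helper, not part of the question's code. It reads the current reference, builds the new string, and publishes it with Interlocked.CompareExchange only if no other thread changed key in the meantime, retrying otherwise.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

string key = "";

// Appends suffix to target without losing concurrent updates: CompareExchange only
// stores "updated" if target still holds "current"; otherwise another thread won
// the race, so we re-read and try again.
static void AppendAtomically(ref string target, string suffix)
{
    while (true)
    {
        string current = Volatile.Read(ref target);
        string updated = current + suffix;
        if (ReferenceEquals(Interlocked.CompareExchange(ref target, updated, current), current))
            return;
    }
}

Parallel.For(0, 1000, _ => AppendAtomically(ref key, "X"));
Console.WriteLine(key.Length); // prints 1000
```

With a plain key += "X" in the loop instead, some of the Xes can be lost, for the reason described above.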
The opposite example would be if you were to use a StringBuilder and actually modify the data inside the class: that would not be thread-safe and would certainly require a lock.
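A sketch of that opposite case, assuming C# top-level statements: StringBuilder.Append changes the object's internal buffer in place, so unsynchronized concurrent appends can lose characters or even throw. Taking the same lock around every access makes the result deterministic.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

var builder = new StringBuilder();
var gate = new object();

Parallel.For(0, 1000, _ =>
{
    // StringBuilder is not thread-safe, so every access goes through the same lock.
    lock (gate)
    {
        builder.Append('X');
    }
});

// Parallel.For has returned, so no other thread is touching the builder now.
Console.WriteLine(builder.Length); // prints 1000
```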
Update: to explain the reasoning behind my argument that the OP's code, considered on its own, is fundamentally thread-safe, consider the following code, which attempts to add thread-safety:
public partial class MainWindow : Window
{
    private object emergencyLock = new object();
    private string _status;
    public string status
    {
        get
        {
            //make sure if an emergency is happening we get the best possible string.
            lock (emergencyLock)
            {
                return _status;
            }
        }
        set
        {
            //we could also lock here instead of in GetEmergencyString()..which would fix the get+set=atomic issue
            _status = value;
        }
    }
    private string GetEmergencyString()
    {
        //this function understands an emergency is happening
        lock (emergencyLock)
        {
            //Maybe I'm fetching this string from a database, or building it by hand
            //It takes a while...We'll simulate this here
            Thread.Sleep(1000);
            return "Oh crap, emergency!";
        }
    }
    private void Normal_Button_Click(object sender, RoutedEventArgs e)
    {
        status = "Nothing much going on.";
    }
    private void Emergency_Button_Click(object sender, RoutedEventArgs e)
    {
        //GetEmergencyString() is evaluated first..finally returns a string,
        //and THEN the assignment occurs as a single operation
        status = GetEmergencyString();
    }
}
I'll make the following points about this code:
It does prevent a status seeker from getting a "boring" status during an emergency. It also potentially forces the status seeker to wait a full second before getting that status, which most likely solves nothing.
Also consider that even single-threaded, there's a fundamental issue here, and (in my opinion) it is NOT thread safety. The fundamental issue is delay. Better solutions? Fixing the delay. Actively notifying listeners of the new state (events, pub/sub, etc.). A state machine. Any active, intelligent logic in the code. Even a volatile bool IsEmergency is much better than the "thread-safety" I've added. Maybe you don't want the emergency state to be overwritten by the normal state? Again, not a threading issue.
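A sketch of the volatile bool IsEmergency idea, assuming C# top-level statements; StatusBoard and RaiseEmergency are hypothetical names. The flag is raised before the slow fetch starts, so readers learn about the emergency immediately, without taking a lock and waiting a full second for the message.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var board = new StatusBoard();

// Simulates the emergency button: the slow fetch runs on a background task.
var emergency = Task.Run(() => board.RaiseEmergency(() =>
{
    Thread.Sleep(1000); // stands in for the slow database lookup
    return "Oh crap, emergency!";
}));

// Readers never block; they can see the flag long before the detailed message exists.
while (!board.IsEmergency) Thread.Yield();
Console.WriteLine($"Emergency flagged; status so far: {board.Status}");

emergency.Wait();
Console.WriteLine($"Status now: {board.Status}");

// Hypothetical class: the flag is published first, the slow details later.
public sealed class StatusBoard
{
    private volatile bool _isEmergency;
    private volatile string _status = "Nothing much going on.";

    public bool IsEmergency => _isEmergency;
    public string Status => _status;

    public void RaiseEmergency(Func<string> fetchDetails)
    {
        _isEmergency = true;        // visible to readers right away
        _status = fetchDetails();   // the slow part no longer hides the emergency
    }
}
```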
Related
I have a static class with properties to store user's inputs:
public static class UserData
{
    public static double UserInput1 { get; set; }
}
And I have nested methods that need the user's inputs:
public static double Foo()
{
    [...]
    var input1 = UserData.UserInput1;
    var bar = Bar();
    [...]
}
private static double Bar()
{
    var input1 = UserData.UserInput1;
    [...]
}
The positive thing is that I do not have to pass all user inputs to Foo(), then to Bar() (and to further nested methods within Bar()).
The negative thing is that I have to get UserData.UserInput1 and other user inputs very often. I could change the code to get the user inputs only once:
public static double Foo()
{
    [...]
    var input1 = UserData.UserInput1;
    var bar = Bar(input1);
    [...]
}
private static double Bar(double input1)
{
    [...]
}
Which one is faster?
The second one is faster than the first one, because you avoid reading the static property from UserData over and over.
Statics aren't ideal when it comes to performance cost, because every access has to go out to shared memory. By passing input values as parameters, this is avoided and slightly better performance is achieved.
But both options are ok. It's more important to focus on code readability and maintainability rather than performance unless you are working on a critical performance issue.
Which one is faster?
Using static mutable state in this way will be way slower in the long run, because you will spend a bunch of time trying to find and fix bugs. This time could be better spent doing things that will actually help performance, like profiling and optimizing code.
Try to make methods that compute anything take the required inputs as parameters. Try to make input fields properties of the associated UI class. This should help keep the code simple and understandable.
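A minimal sketch of that advice, assuming C# top-level statements; Foo and Bar reuse the question's names, but their bodies and the input value are made up. The input is read once at the edge and passed explicitly from then on.

```csharp
using System;

// Read the user's input once, at the edge (e.g. in the UI event handler)...
double userInput = 2.5; // hypothetical value taken from a UI field

// ...and pass it explicitly from then on.
double result = Foo(userInput);
Console.WriteLine(result);

// Hypothetical calculations: they depend only on their parameters, so they are
// easy to unit test and safe to call from several threads at once.
static double Foo(double input1) => input1 + Bar(input1);
static double Bar(double input1) => input1 * 2;
```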
Accessing a static property will be translated to an indirect memory access. Passing a parameter to a method might be free if the parameter is already in a register, or might involve a bit more work if it needs to be loaded, moved or passed on the stack. But we are talking about single-digit cycles here; optimization at this level should only be done in super tight loops that run many millions of times each second, and then you should typically ensure that all methods can be inlined, sidestepping the problem.
If you're worried about such micro-optimizations (which you generally shouldn't need to be), consider using inlining:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.methodimploptions?view=net-7.0
PS: using one over the other, or using AggressiveInlining, will not save you anywhere close to the 1ms you are hoping for, under non-extreme/farfetched scenarios.
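For completeness, a sketch of where the attribute goes, assuming C# top-level statements (Calc and Bar are hypothetical). AggressiveInlining is a request to the JIT, not a guarantee, and as the PS says, don't expect a measurable win from it outside extreme cases.

```csharp
using System;
using System.Runtime.CompilerServices;

double total = 0;
for (int i = 0; i < 1_000_000; i++)
    total += Calc.Bar(i);
Console.WriteLine(total);

public static class Calc
{
    // Asks the JIT to inline this small method into its callers. The JIT may still
    // decline (for example for large or recursive methods).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static double Bar(double input1) => input1 * 2;
}
```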
I recently came across some dirty if-else code, so I looked for refactoring options and found a recommendation to use a state machine as an elegant replacement for dirty if-else code.
But something is hard for me to grasp: it looks like, as the client, I have the responsibility to move the machine from one state to the other. Now, if there are 2 transition options (depending on the result of the work done in the current state), do I need to use if-else as well? If so, what is the main benefit of the pattern? From my point of view, the machine could make the transition automatically from the starting state.
Before asking, I read the following, and it only strengthened my opinion:
Auto advancing state machine with Stateless
How to encapsulate .NET Stateless state machine
Statemachine that transitions to target state and fires transitions and states between?
In my example, I have a MarketPriceEvent which needs to be stored in Redis. Before it is stored, it has to pass through a validation path. The validation path states are:
Basic Validation
Comparison
Another comparison
Storing
Error auditing
The problem is that I have many decisions to make. For example, I'd like to move to Comparison only if BasicValidation passed successfully. Then, if Comparison succeeded, I'd like to move to Storing, otherwise to ErrorAuditing.
So, in code:
_machine.Configure(State.Validate)
    .PermitIf(Trigger.Validated, State.Compare1, () => isValid);
_machine.Configure(State.Compare1)
    .OnEntry(CompareWithResource1)
    .PermitIf(Trigger.Compared, State.Store, () => isValid)
    .PermitIf(Trigger.Compared, State.Compare2, () => !isValid);
And in my client/wrapper code I'll write:
//Stay at Validate state
var marketPriceProcessingMachine = new MarketPriceProcessingMachine();
if (marketPriceProcessingMachine.Permitted(Trigger.Validated))
    marketPriceProcessingMachine.Fire(Trigger.Validated);
//else
//    ...
In short: if I still need to use if-else, what benefit do I get from the state machine concept? If it's deterministic, why doesn't it move to the next state by itself? If I'm wrong, what am I missing?
One benefit of using a state machine is that you reduce the number of states an object can be in. I worked with someone who had 22 bool flags in a single class. There was a lot of if !(something && !somethingElse || !userClicked) …
This sort of code is hard to read, hard to debug, hard to unit test, and it's more or less impossible to reason about what the state of the class really is. 22 bool flags means that the class can be in over 4 million states (2^22 = 4,194,304). Try making unit tests for that...
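The same idea at a smaller scale, as a hypothetical sketch assuming .NET 5+ (for the generic Enum.GetValues) and C# top-level statements: three independent bool flags versus a single enum whose value can only be one named state at a time.

```csharp
using System;

// Three bool flags (IsLoading, IsLoaded, HasError) allow 2^3 combinations, even
// though several make no sense (what does IsLoading && HasError && IsLoaded mean?).
int flagCombinations = 1 << 3;

// An enum variable holds exactly one named state at a time.
int enumStates = Enum.GetValues<LoadState>().Length;

Console.WriteLine($"{flagCombinations} flag combinations vs {enumStates} states");

// Hypothetical states replacing the three flags.
public enum LoadState { Idle, Loading, Loaded, Failed }
```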
State machines can reduce the complexity of code, but they will almost always make it somewhat more complex at the beginning of a new project. However, in the long term I've found that the overall complexity ends up lower, because it's easy to extend the machine and add more states: the already defined states can be left alone.
What I've found over the years is that OOP and state machines are often two aspects of the same thing. I've also found that OOP is hard, and difficult to get 'right'.
I think the state machine, including its triggers, should not be visible outside of an object. You most likely want a public read-only State property.
I design classes in such a way that the caller can neither change the state directly nor call the Fire method directly. Instead I use methods named after actions (verbs), like Validate().
Your work flow needs conditionals, but you have some freedom of where to put them. I would suggest separating the business logic from the state machine configuration. I think this makes the state machine easier to read.
How about something like this:
namespace ConsoleApp1
{
    using Stateless;
    using System;
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Press Q to stop validating events");
            ConsoleKeyInfo c;
            do
            {
                var mpe = new MarketPriceEvent();
                mpe.Validate();
                c = Console.ReadKey();
            } while (c.Key != ConsoleKey.Q);
        }
    }
    public class MarketPriceEvent
    {
        public void Validate()
        {
            _machine.Fire(Trigger.Validate);
        }
        public enum State { Validate, Compare2, ErrorAuditing, Compare1, Storing }
        private enum Trigger { Validate, CompareOneOk, CompareTwoOk, Error, }
        private readonly StateMachine<State, Trigger> _machine;
        public MarketPriceEvent()
        {
            _machine = new StateMachine<State, Trigger>(State.Validate);
            _machine.Configure(State.Validate)
                .Permit(Trigger.Validate, State.Compare1);
            _machine.Configure(State.Compare1)
                .OnEntry(DoEventValidation)
                .Permit(Trigger.CompareOneOk, State.Compare2)
                .Permit(Trigger.Error, State.ErrorAuditing);
            _machine.Configure(State.Compare2)
                .OnEntry(DoEventValidationAgainstResource2)
                .Permit(Trigger.CompareTwoOk, State.Storing)
                .Permit(Trigger.Error, State.ErrorAuditing);
            _machine.Configure(State.Storing)
                .OnEntry(HandleStoring);
            _machine.Configure(State.ErrorAuditing)
                .OnEntry(HandleError);
        }
        private void DoEventValidation()
        {
            // Business logic goes here
            if (isValid())
                _machine.Fire(Trigger.CompareOneOk);
            else
                _machine.Fire(Trigger.Error);
        }
        private void DoEventValidationAgainstResource2()
        {
            // Business logic goes here
            if (isValid())
                _machine.Fire(Trigger.CompareTwoOk);
            else
                _machine.Fire(Trigger.Error);
        }
        private bool isValid()
        {
            // Returns false every five seconds...
            return (DateTime.UtcNow.Second % 5) != 0;
        }
        private void HandleStoring()
        {
            Console.WriteLine("Awesome, validation OK!");
        }
        private void HandleError()
        {
            Console.WriteLine("Oh noes, validation failed!");
        }
    }
}
I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling a foreach, but I have some concerns about thread safety when using the parallel version. The basic question I need answered here is: "Is the way I am doing this going to guarantee thread safety?", or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot towards assuring thread safety.
I have basically the following, edited for brevity:
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the class that does the actual processing I have:
public class MyProcess
{
    public void DoThings()
    {
        //Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item =>
        {
            //based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            //...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;
    public MyActionClass(Thing item)
    {
        _thing = item;
    }
    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }
    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better, I'll try, but I think that covers the basics of my needs.
EDIT:
Something else I just noticed: if I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But, as I said, that is only true of what can be seen here. It doesn't mean that everything in the actual code you use is as good as it seems here.
Nor does it mean that nothing will be changed by you or a coworker that makes some state both shared and mutable (in Thing, for example), at which point you start getting hard-to-reproduce crashes at best, or just plain wrong behaviour at worst that can go undetected for a long time.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can reasonably be expressed through immutable objects. And even immutable code can suffer that accidental "make it shared and mutable" change, though it is much less likely.
It should at least be considered as a possible option/alternative.
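If you want to try the immutable route, a sketch assuming C# 9+ records and top-level statements; Thing is redefined here as a hypothetical record, not the question's class. Instead of setting property1 in place, each iteration produces a modified copy with a with expression, so no two iterations can ever write to the same object.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var things = Enumerable.Range(1, 5).Select(i => new Thing($"item{i}", "original")).ToList();
var results = new ConcurrentBag<Thing>();

Parallel.ForEach(things, item =>
{
    // "with" creates a modified copy; the shared input item is never changed.
    Thing updated = item with { Property1 = "Somenewvalue" };
    results.Add(updated);
});

Console.WriteLine(things.All(t => t.Property1 == "original"));      // True
Console.WriteLine(results.All(t => t.Property1 == "Somenewvalue")); // True

// Hypothetical immutable version of Thing.
public sealed record Thing(string Name, string Property1);
```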
About the EDIT
If I change the value of a property of the item I'm working with while inside the Parallel.Foreach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?
If you change a property, that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, something like a public static Int32 ChangesCount that increments with each state change), then you should be safe.
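To illustrate that exception, a sketch with a hypothetical Thing (not the question's class) whose setter bumps a static counter shared by all instances, assuming C# top-level statements. Each instance is only touched by one iteration, yet the counter is shared, so a plain ++ could lose increments; Interlocked.Increment keeps it exact.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

Parallel.For(0, 1000, i =>
{
    var thing = new Thing();        // used by this iteration only
    thing.Property1 = $"value{i}";
});
Console.WriteLine(Thing.ChangesCount); // prints 1000

// Hypothetical Thing with hidden global mutable state.
public class Thing
{
    private static int _changesCount;
    private string _property1 = "";

    public static int ChangesCount => Volatile.Read(ref _changesCount);

    public string Property1
    {
        get => _property1;
        set
        {
            _property1 = value;
            // _changesCount++ is a read-modify-write and could lose updates.
            Interlocked.Increment(ref _changesCount);
        }
    }
}
```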
As for "a string value that gets written to a database inside the loop": depending on the data access technology you use and how you use it, you may be in trouble, because most of them are not designed for a multithreaded environment (EF's DbContext, for example). And obviously, don't forget that dealing with concurrent access in a database is not always easy, though that is a bit away from our original topic.
As for "Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case": if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not Parallel.For) making changes to the objects being persisted, then you already have bigger problems than Parallel.For.
Objects should always have an observably consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist that who-knows-what), and if they are used by many threads, then they should already be thread-safe: there should be no way to put them into an inconsistent state.
And if they need to be persisted by external code, such objects should probably provide one of the following:
A SyncRoot property to synchronize the property-reading code.
A snapshot DTO of the current state, created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
Something more exotic.
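A minimal sketch of the snapshot option, assuming C# 9+ and top-level statements; Thing, Update and its fields are hypothetical, while ThingSnapshot and GetCurrentData reuse the names suggested above. Because the writer and the snapshot method take the same lock, persistence code never sees a name from one update paired with a version from another.

```csharp
using System;
using System.Threading.Tasks;

var thing = new Thing();
int inconsistent = 0;

var writer = Task.Run(() =>
{
    for (int i = 0; i < 100_000; i++)
        thing.Update($"name{i}", i); // both values always change together
});

var persister = Task.Run(() =>
{
    for (int i = 0; i < 100_000; i++)
    {
        ThingSnapshot s = thing.GetCurrentData();
        if (s.Name != $"name{s.Version}")
            inconsistent++; // only this task touches the counter
    }
});

Task.WaitAll(writer, persister);
Console.WriteLine($"Inconsistent snapshots: {inconsistent}");

// Immutable DTO that can safely be handed to persistence code.
public sealed record ThingSnapshot(string Name, int Version);

// Hypothetical thread-safe Thing: all reads and writes of its state share one lock.
public sealed class Thing
{
    private readonly object _sync = new object();
    private string _name = "name0";
    private int _version;

    public void Update(string name, int version)
    {
        lock (_sync)
        {
            _name = name;
            _version = version;
        }
    }

    public ThingSnapshot GetCurrentData()
    {
        lock (_sync)
        {
            return new ThingSnapshot(_name, _version);
        }
    }
}
```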
I have worked with C# code for the past 4 years, but recently I ran into a scenario I had never encountered before. I got a damn project where I have to troubleshoot an "Index out of range" error. The code looks crazy and is full of unnecessary things, but it's been in production for the past 3 years and I just need to fix this issue. Coming to the problem:
class FilterCondition
{
    .....
    public string DataSetName { get; set; }
    public bool IsFilterMatch()
    {
        //somecode here
        Dataset dsDataSet = FilterDataSources.GetDataSource(DataSetName); // Static class and Static collection
        var filter = "columnname filtername";
        //some code here
        dsDataSet.defaultview.filter = filter;
        var isValid = dsDataSet.defaultView.rowcount > 0;
        return isValid;
    }
}
// from an outside function they put this in a parallel loop
Parallel.ForEach()
{
    // at some point it's calling
    item.IsFilterMatch();
}
When I debugged, I saw that dsDataSet is modified by multiple threads. That's why the race condition happens: it fails to apply the filter and throws an "Index out of range" error.
My question here is: my method is non-static and thread safe, so how is this race condition happening, given that dsDataSet is a local variable inside my member function? Strange. I suspect it has something to do with Parallel.ForEach.
And when I put a normal lock there, the issue got resolved, and I have no answer for that either. Why should I need a lock in a non-static member function?
Can anyone give me an answer for this? I am new to the group, so if I am missing anything in the question, please let me know. I can't copy the whole code because of client restrictions. Thanks for reading.
Because it's not thread safe.
You're accessing a static collection from multiple threads.
You have a misconception about local variables. Although the variable is local, it's pointing at an object which is not.
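The misconception can be shown without any threads at all. A sketch, assuming C# top-level statements, with a hypothetical Dictionary standing in for the static collection behind FilterDataSources: two separate local variables end up referring to the very same object, so a change made through one is visible through the other. In a Parallel.ForEach, each thread's "local" dsDataSet is likewise one shared object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the static collection behind FilterDataSources.
var cache = new Dictionary<string, List<int>> { ["prices"] = new List<int>() };

// Like GetDataSource(DataSetName), this hands back the object stored in the cache.
List<int> GetDataSource(string name) => cache[name];

// Two separate local variables...
List<int> first = GetDataSource("prices");
List<int> second = GetDataSource("prices");

// ...but one shared object.
first.Add(42);
Console.WriteLine(ReferenceEquals(first, second)); // True
Console.WriteLine(second.Count);                   // 1
```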
What you should do is add a lock around the places where you read and write to the static collection.
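A sketch of that advice, assuming C# 9+ and top-level statements; this FilterDataSources is a hypothetical stand-in that stores lists of ints instead of the question's DataSet. The lock covers not only the dictionary lookup but also everything done with the object that comes out of it (here, filtering and counting), which is what applying a filter to a shared data set needs too.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

Parallel.For(0, 100, i => FilterDataSources.AddRow("prices", i));
Console.WriteLine(FilterDataSources.CountMatching("prices", v => v % 2 == 0)); // prints 50

// Hypothetical stand-in: every read and write of the shared collection, and of the
// objects stored in it, happens under one lock.
public static class FilterDataSources
{
    private static readonly object Sync = new object();
    private static readonly Dictionary<string, List<int>> Sources = new();

    public static void AddRow(string name, int value)
    {
        lock (Sync)
        {
            if (!Sources.TryGetValue(name, out var rows))
                Sources[name] = rows = new List<int>();
            rows.Add(value);
        }
    }

    public static int CountMatching(string name, Predicate<int> filter)
    {
        lock (Sync)
        {
            return Sources.TryGetValue(name, out var rows) ? rows.FindAll(filter).Count : 0;
        }
    }
}
```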
Problem: the problem lies within this call
FilterDataSources.GetDataSource(DataSetName);
Inside this method you are writing to a resource that is shared.
Solution:
You need to know which field is being written here and need to implement locking on it.
Note: If you could post your code for the above method we would be in a better position to help you.
I believe this is caused by the specific (stateful, not thread safe, etc.) implementation of FilterDataSources.GetDataSource(DataSetName); judging by the call, it is a static method. This method can do many different things: return a cached DataSet instance, intercept calls to the data set's items, or return a DataSet wrapper so that you are working with a wrapper rather than the data set itself, so a lot of stuff can be in there. If you want to find, let's say, the "exact line of code" which causes this, please show us the implementation of the GetDataSource() method and all the underlying static context of the FilterDataSources class (static fields, the constructor, and any other static methods called by GetDataSource()).
I've written a helper class that takes a string in the constructor and provides a lot of Get properties to return various aspects of the string. Currently the only way to set the line is through the constructor, and once it is set it cannot be changed. Since this class only has one internal variable (the string), I was wondering whether I should keep it this way or allow the string to be set as well.
Some example code may help explain why I'm asking:
using StreamReader stream = new StreamReader("ScannedFile.dat");
ScannerLine line = null;
int responses = 0;
while (!stream.EndOfStream)
{
    line = new ScannerLine(stream.ReadLine());
    if (line.IsValid && !line.IsKey && line.HasResponses)
        responses++;
}
Above is a quick example of counting the number of valid responses in a given scanned file. Would it be more advantageous to code it like this instead?
using StreamReader stream = new StreamReader("ScannedFile.dat");
ScannerLine line = new ScannerLine();
int responses = 0;
while (!stream.EndOfStream)
{
    line.RawLine = stream.ReadLine();
    if (line.IsValid && !line.IsKey && line.HasResponses)
        responses++;
}
This code is used in the back end of an ASP.NET web application and needs to be somewhat responsive. I am aware that this may be a case of premature optimization, but I'm coding this for responsiveness on the client side and for maintainability.
Thanks!
EDIT: I decided to include the constructor of the class as well (yes, this is what it really is):
using System;
using System.Globalization;
public class ScannerLine
{
    private readonly string line;
    public ScannerLine(string line)
    {
        this.line = line;
    }
    /// <summary>Gets the date the exam was scanned.</summary>
    public DateTime ScanDate
    {
        get
        {
            // On failure, TryParseExact sets test to DateTime.MinValue.
            DateTime.TryParseExact(line.Substring(12, 6).Trim(), "MMddyy", CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime test);
            return test;
        }
    }
    /// <summary>Gets a value indicating whether to use raw scoring.</summary>
    public bool UseRaw { get { return line.Substring(112, 1) == "R"; } }
    /// <summary>Gets the raw points per question.</summary>
    public float RawPoints
    {
        get
        {
            // Parse "12.50"-style text with the invariant culture, so a server locale that uses ","
            // as the decimal separator doesn't break it. On failure TryParse would set the out
            // value to 0, so return float.MinValue explicitly instead.
            return float.TryParse(line.Substring(113, 4).Insert(2, "."), NumberStyles.Float,
                CultureInfo.InvariantCulture, out float test) ? test : float.MinValue;
        }
    }
}
EDIT 2: I included some sample properties of the class to help clarify. As you can see, the class takes a fixed string from a scanner and simply makes it easier to break the line apart into more useful chunks. The file is a line-delimited file from a Scantron machine, and the only way to parse it is with a bunch of string.Substring calls and conversions.
I would definitely stick with the immutable version if you really need the class at all. Immutability makes it easier to reason about your code: if you store a reference to a ScannerLine, it's useful to know that it's not going to change. The performance difference is almost certain to be insignificant; the IO involved in reading the line is likely to be more significant than creating a new object. If you're really concerned about performance, you should benchmark/profile the code before you make a design decision based on those performance worries.
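If you do want numbers before deciding, here is a crude sketch, assuming C# top-level statements and made-up in-memory lines; ImmutableLine and MutableLine are hypothetical minimal versions of the two designs. A single Stopwatch pass has no warm-up or repetition, so treat its numbers as rough (a harness such as BenchmarkDotNet is more trustworthy), and it deliberately leaves out the file IO, which is likely to dominate in the real code.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Fake fixed-width lines standing in for ScannedFile.dat (made-up data).
string[] lines = Enumerable.Range(0, 200_000)
    .Select(i => (i % 2 == 0 ? "R" : "S") + new string(' ', 20))
    .ToArray();

var sw = Stopwatch.StartNew();
int immutableCount = 0;
foreach (var l in lines)
    if (new ImmutableLine(l).UseRaw) immutableCount++;   // one new object per line
var immutableTime = sw.Elapsed;

sw.Restart();
int mutableCount = 0;
var reused = new MutableLine();
foreach (var l in lines)
{
    reused.RawLine = l;                                   // one object, reassigned
    if (reused.UseRaw) mutableCount++;
}
var mutableTime = sw.Elapsed;

Console.WriteLine($"new per line: {immutableTime.TotalMilliseconds:F1} ms, reused: {mutableTime.TotalMilliseconds:F1} ms");

// Hypothetical minimal versions of the two designs in the question.
public sealed class ImmutableLine
{
    private readonly string _line;
    public ImmutableLine(string line) => _line = line;
    public bool UseRaw => _line[0] == 'R';
}

public sealed class MutableLine
{
    public string RawLine { get; set; } = "";
    public bool UseRaw => RawLine[0] == 'R';
}
```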
However, if your state is just a string, are you really providing much benefit over just storing the strings directly and having appropriate methods to analyse them later? Does ScannerLine analyse the string and cache that analysis, or is it really just a bunch of parsing methods?
Your first approach is clearer. Performance-wise you could gain something, but I don't think it's worth it.
I would go with the second option. It's more efficient, and they're both equally easy to understand IMO. Plus, you probably have no way of knowing how many times those statements in the while loop are going to be called. So who knows? It could be a .01% performance gain, or a 50% performance gain (not likely, but maybe)!
Immutable classes have a lot of advantages. It makes sense for a simple value class like this to be immutable. The object creation time for classes is small for modern VMs. The way you have it is just fine.
I'd actually ditch the "instance" nature of the class entirely and use it as a static class, not an instance as you do right now. Every property is entirely independent of the others, EXCEPT for the string used. If these properties were related to each other, and/or there were other "hidden" variables that were set up every time the string was assigned (pre-processing the properties, for example), then there'd be reasons to choose one way or the other with re-assignment, but from what you're doing there, I'd change it to be 100% static methods of the class.
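A sketch of what that static version might look like, assuming C# top-level statements; ScannerLineParser is a hypothetical name, and the two members mirror the UseRaw and RawPoints properties from the question (with RawPoints parsed using the invariant culture). Every method takes the raw line as input, so there is no instance state to share, reassign or get wrong.

```csharp
using System;
using System.Globalization;

string sample = new string(' ', 112) + "R" + "1250"; // made-up 117-character line
Console.WriteLine(ScannerLineParser.UseRaw(sample));    // True
Console.WriteLine(ScannerLineParser.RawPoints(sample));

// Static version of ScannerLine: pure functions of the line they are given.
public static class ScannerLineParser
{
    public static bool UseRaw(string line) => line.Substring(112, 1) == "R";

    public static float RawPoints(string line) =>
        float.TryParse(line.Substring(113, 4).Insert(2, "."), NumberStyles.Float,
                       CultureInfo.InvariantCulture, out float points)
            ? points
            : float.MinValue;
}
```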
If you insist on having the class be an instance, then for pure performance reasons I'd allow re-assignment of the string, as then the CLR isn't creating and destroying instances of the same class continually (except for the string itself obviously).
At the end of the day, IMO this is something you can really do any way you want since there are no other class instance variables. There may be style reasons to do one or the other, but it'd be hard to be "wrong" when solving that problem. If there were other variables in the class that were set upon construction, then this'd be a whole different issue, but right now, code for what you see as the most clear.
I'd go with your first option. There's no reason for the class to be mutable in your example. Keep it simple unless you actually have a need to make it mutable. If you're really that concerned with performance, then run some performance analysis tests and see what the differences are.