Using ConcurrentBag correctly - c#

Edit: Thank you, you made me realise that the code below is not working as I assumed, since somehow I thought that cbag works like a hashset. Sorry about it, you saved me some headache :)
The following function is the only function that can change _currentSetOfStepsProcessing, and it can be called from different threads. I am not sure I have understood ConcurrentBag correctly, so please let me know whether you think this can work. The _stepsToDo data structure is never modified once the process starts.
void OnStepDone(InitialiseNewUserBase obj)
{
var stepToDo = _stepsToDo[_currentSetOfStepsProcessing];
stepToDo.TryTake(out obj);
if (stepToDo.Count == 0) //can I assume it will enter here once per ConcurrentBag?
{
if (_currentSetOfStepsProcessing < _stepsToDo.Count - 1)
{
_currentSetOfStepsProcessing++;
}
}
}
List<ConcurrentBag<InitialiseNewUserBase>> _stepsToDo = new List<ConcurrentBag<InitialiseNewUserBase>>();
Action _onFinish;
int _currentSetOfStepsProcessing;

stepToDo.TryTake(out obj); might fail, you don't handle that.
Why are you out-referencing the method argument? This simply overwrites the argument. Why take an argument if you throw it away? More likely, this is a misunderstanding of some kind.
"Can I assume it will enter here once per ConcurrentBag?" Since access to the bag is apparently concurrent, multiple accessing threads might see a count of 0, so you need to handle that case better.
Probably, you should not make things so difficult, and should use lock in combination with non-concurrent data structures instead. A concurrent collection would only be a good idea if there were a high frequency of bag operations, which seems unlikely.
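A minimal sketch of the lock-based alternative, assuming each set of steps can be held in a plain HashSet (the asker mentions expecting hash-set behaviour). StepTracker, Complete and CurrentSetIndex are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tracks sets of pending steps behind a single lock.
public sealed class StepTracker<T>
{
    private readonly object _gate = new object();
    private readonly List<HashSet<T>> _stepsToDo = new List<HashSet<T>>();
    private int _current;

    public StepTracker(IEnumerable<IEnumerable<T>> stepSets)
    {
        foreach (var set in stepSets)
        {
            _stepsToDo.Add(new HashSet<T>(set));
        }
    }

    // Index of the set currently being processed; equals the number of
    // sets once everything is done.
    public int CurrentSetIndex
    {
        get { lock (_gate) { return _current; } }
    }

    // Marks a step as done. Returns true only for the single call that
    // empties the current set, so the "set finished" branch runs once.
    public bool Complete(T step)
    {
        lock (_gate)
        {
            if (_current >= _stepsToDo.Count) return false; // all sets done
            HashSet<T> pending = _stepsToDo[_current];
            if (!pending.Remove(step)) return false;        // not pending here
            if (pending.Count > 0) return false;            // set not finished
            _current++;                                     // move to next set
            return true;
        }
    }
}
```

Because removal, the emptiness check and the index update all happen under the same lock, no two threads can both observe the set becoming empty.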
What about this:
foreach (/*processing step*/) {
Parallel.ForEach(/*item in the step*/, x => { ... });
}
Much simpler.
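The loop above could be fleshed out roughly as follows. This is only a sketch: StepRunner and RunAll are hypothetical names, and the per-item work is passed in as a delegate because the original does not show it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class StepRunner
{
    // Runs the sets of steps in order; items within one set run in parallel.
    // Parallel.ForEach blocks until every item of the set has finished,
    // so the next set never starts early and no shared index is needed.
    public static int RunAll<T>(
        IEnumerable<IReadOnlyCollection<T>> stepSets, Action<T> process)
    {
        int completed = 0;
        foreach (IReadOnlyCollection<T> set in stepSets)
        {
            Parallel.ForEach(set, item =>
            {
                process(item);
                Interlocked.Increment(ref completed); // thread-safe counter
            });
        }
        return completed;
    }
}
```

The delegate passed as process still has to be safe to call from several threads at once.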

Related

Is it safe for a property to update its elements every time it's called?

I have a property, IList<IMyPlayer> Players {}, which syncs with the game server every time it is read. I need to know whether it will update on every iteration when its Count is used as the limit of a for loop. The reason I ask is that I'm worried a player may leave the game during this loop.
Edit: this is a single-threaded application.
public static IList <IMyPlayer> Players
{
get
{
playersField.Clear(); //GetPlayers() just adds without overwriting so list must be cleared every time.
if (Debugging == false)
{
MyAPIGateway.Multiplayer.Players.GetPlayers (playersField); //everytime the project needs to see all players, this will update. Little heavier on performance but its polymorphic.
}
return playersField.AsReadOnly();
}
}
for (int i = 0; i < AttendanceManager.Players.Count; i++)
{
if (AttendanceManager.Players[i].SteamUserId == MyAPIGateway.Multiplayer.MyId)
{
//do stuff
}
}
I can see several potential problems with your approach:
You add items while you are looping over it, but only loop until the original count is reached. So any added items are not accessed by the for loop
Your getter is doing more than a "normal" getter, which could mean performance problems if clients are not aware of that
Using foreach would only call the getter once, which would behave differently than your for loop.
If you want to do this, I would instead make it a GetPlayers() function, which makes it clearer that you are creating something as part of the method, and not just getting the current value of a property. If a client wants to reload the list each time they are still free to do so, but it would be more obvious when looking at the code.
For example:
for (int i = 0; i < AttendanceManager.GetPlayers().Count; i++)
{
if (AttendanceManager.GetPlayers()[i].SteamUserId == MyAPIGateway.Multiplayer.MyId)
looks much more dodgy than a standard property getter.
I would most certainly not do this.
Every time you use your property it is going to call an API. That is going to be terrible for performance. Also it could be quite easy for your property to be called multiple times, even if you don't think it will be. One example I can think of is serialization, or if you are using this as an argument to say an MVC or Web API controller method.
This is what is commonly referred to as a side-effect, which is something you want to avoid in a getter at all costs.
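One way to act on this advice is to fetch a snapshot once and loop over that. The sketch below is self-contained and therefore uses stand-ins: FakeServer replaces the game API and player names replace IMyPlayer, and both are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the game API; counts how often it is queried.
public sealed class FakeServer
{
    public int Calls { get; private set; }

    public void GetPlayers(List<string> target)
    {
        Calls++;
        target.Add("alice");
        target.Add("bob");
    }
}

public sealed class AttendanceSketch
{
    private readonly FakeServer _server;

    public AttendanceSketch(FakeServer server) { _server = server; }

    // A method rather than a property: each call visibly does work
    // and returns a fresh, read-only snapshot.
    public IReadOnlyList<string> GetPlayers()
    {
        var players = new List<string>();
        _server.GetPlayers(players);
        return players.AsReadOnly();
    }

    public int CountMatching(string name)
    {
        // Fetched once; the list cannot change while the loop runs.
        IReadOnlyList<string> snapshot = GetPlayers();
        int matches = 0;
        for (int i = 0; i < snapshot.Count; i++)
        {
            if (snapshot[i] == name) matches++;
        }
        return matches;
    }
}
```

With the snapshot held in a local variable, the server is queried once per loop instead of twice per iteration.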

Switch inside loops impacting performance?

I'm in a scenario where I'm looping through data and formatting it in specific ways based on a setting, and I'm concerned that what I feel is best stylistically might impede performance.
The basic pattern of the code is as follows
enum setting {single, multiple, foo, bar};
Data data = getData(conn, id);
setting blah = data.getSetting();
foreach (Item item in data)
{
switch(blah)
{
case setting.single:
processDataSingle(item, blah);
break;
...
}
}
My concern is that there might be thousands, or even tens of thousands of items in data. I was wondering if having the switch inside the loop where it may be evaluated repeatedly might cause some serious performance issues. I know I could put the switch before the loop, but then each case contains it, which seems much less readable, in that it's less apparent that the basic function remains the same.
You could set up a delegate/action once, then call it every time in the loop:
Data data = getData(conn, id);
setting blah = data.getSetting();
Action<Item> doThis;
switch (blah)
{
case setting.single:
doThis = i => processDataSingle(i, blah);
break;
...
}
foreach (Item item in data)
{
doThis(item);
}
Basically, put the body of each "case" in an Action, select that Action in your switch outside the loop, and call the Action in the loop.
You could create a method to keep readability, then pass the data to the method:
void processAllData(IEnumerable<Item> data, setting blah)
{
switch(blah)
{
case setting.single:
foreach (Item item in data)
{
processDataSingle(item, blah);
}
break;
// next case, next loop ...
}
}
Then it's just a one-liner:
processAllData(data, blah);
This approach is readable since it encapsulates complexity, concise since you only see what you have to see, and efficient since you can optimize the cases.
By using an Action delegate this way, you can factor out a lot of your code:
enum setting {single, multiple, foo, bar};
Data data = getData(conn, id);
var processAll = new Action<Action<Item>>(action =>
{
foreach(var item in data)
action(item);
});
setting blah = data.getSetting();
switch(blah)
{
case setting.single:
processAll(item => processDataSingle(item, blah));
break;
...
}
It certainly does have the potential to affect performance if you're talking about possibly running the comparison tens of thousands of times or more. The other problem that could potentially arise in the code that you've written here is what happens if you need to add to your enum. Then you'd need to open up this code and adjust it to take care of that circumstance, which violates the Open/Closed Principle.
The best way, IMO, to solve both problems at once would be to use a Factory pattern to take care of this (see posts here and here for some advice on starting that). All you'd need to do is have an interface whose implementations would call the method that you'd want to call in your switch code above. Create a factory and have it pick which implementation to return back to your code (before the loop) based on the enum passed in. At that point all your loop needs to do is to call that interface method which will do exactly what you wanted.
Afterwards, any future feature additions will only require you to create another implementation of that interface, and adjust the enum accordingly. No muss, no fuss.
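A rough sketch of the factory idea described above. The interface, the two implementations and the factory are all hypothetical names, and items are plain strings to keep the example self-contained.

```csharp
using System;

public enum Setting { Single, Multiple }

// Hypothetical interface: one implementation per formatting mode.
public interface IItemProcessor
{
    string Process(string item);
}

public sealed class SingleProcessor : IItemProcessor
{
    public string Process(string item) { return "single:" + item; }
}

public sealed class MultipleProcessor : IItemProcessor
{
    public string Process(string item) { return "multiple:" + item; }
}

public static class ProcessorFactory
{
    // The only place that maps the enum to behaviour; called once,
    // before the loop.
    public static IItemProcessor Create(Setting setting)
    {
        switch (setting)
        {
            case Setting.Single: return new SingleProcessor();
            case Setting.Multiple: return new MultipleProcessor();
            default: throw new ArgumentOutOfRangeException(nameof(setting));
        }
    }
}
```

The loop then reduces to calling processor.Process(item) on whatever the factory returned; supporting a new mode means adding one class and one case in the factory.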
It's almost certainly slower to put the switch in the loop like that. Whether it's significant or not is impossible to say - use a Stopwatch to see.
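A small harness for that measurement might look like the sketch below. The two variants and the arithmetic inside them are placeholders for the real per-item work, and all names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class SwitchTiming
{
    public enum Setting { Single, Multiple }

    private static int ProcessSingle(int x) { return x + 1; }
    private static int ProcessMultiple(int x) { return x * 2; }

    // Variant 1: the switch is evaluated on every iteration.
    public static long SwitchInside(int[] data, Setting setting)
    {
        long total = 0;
        foreach (int x in data)
        {
            switch (setting)
            {
                case Setting.Single: total += ProcessSingle(x); break;
                case Setting.Multiple: total += ProcessMultiple(x); break;
            }
        }
        return total;
    }

    // Variant 2: the delegate is chosen once, before the loop.
    public static long DelegateOutside(int[] data, Setting setting)
    {
        Func<int, int> process;
        switch (setting)
        {
            case Setting.Single: process = ProcessSingle; break;
            default: process = ProcessMultiple; break;
        }
        long total = 0;
        foreach (int x in data) total += process(x);
        return total;
    }

    // Measures the wall-clock time of a single run of the given action.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Call, for example, SwitchTiming.Time(() => SwitchTiming.SwitchInside(data, setting)) for each variant on realistic data, in a release build, and repeat the runs a few times before drawing conclusions, since the first call includes JIT compilation.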
If the values in the switch statement are close to one another, the compiler may produce a jump table instead of N if statements. That improves performance, but it's hard to say when the compiler will decide to do this.
Instead, you can create a Dictionary<switchType, Action>, populate it with value-action pairs, and then selecting the appropriate action takes roughly O(1), since a dictionary is a hash table:
dictionary[value].Invoke();
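A minimal sketch of the dictionary approach, with the lookup done once before the loop. The enum values and handlers are made up for the example, and Func<int, int> is used instead of Action so that the result can be checked.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDispatch
{
    public enum Setting { Single, Multiple }

    // Value-to-handler table, built once.
    private static readonly Dictionary<Setting, Func<int, int>> Handlers =
        new Dictionary<Setting, Func<int, int>>
        {
            { Setting.Single, x => x + 1 },
            { Setting.Multiple, x => x * 2 },
        };

    public static long ProcessAll(IEnumerable<int> data, Setting setting)
    {
        // One hash lookup before the loop, none inside it.
        Func<int, int> handler;
        if (!Handlers.TryGetValue(setting, out handler))
        {
            throw new ArgumentOutOfRangeException(nameof(setting));
        }

        long total = 0;
        foreach (int x in data) total += handler(x);
        return total;
    }
}
```

TryGetValue is used rather than the indexer so that an unmapped value produces a descriptive exception instead of a KeyNotFoundException.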

Should I create a class or use ifs?

I have a situation:
I need to do something with a class.
Which would be more efficient: modifying the method this way with ifs, or creating a method for each action?
public value Value(int command)
{
if (command == 1)
{
DoSomething1();
}
if (command == 2)
{
DoSomething2();
}
else
{
return empty();
}
}
There are going to be 50 or more of these commands.
Which is better in terms of execution performance and size of the executable?
At a high-level, it looks like you're trying to implement some kind of dynamic-dispatch system? Or are you just wanting to perform a specified operation without any polymorphism? It's hard to tell.
Anyway, based on the example you've given, a switch block would be the most performant, as the compiler can turn a switch over dense integer or enum values into a jump table instead of a series of comparisons, so just do this:
enum Command { // observe how I use an enum instead of "magic" integers
DoSomethingX = 1,
DoSomethingY = 2
}
public Value GetValue(Command command) {
switch(command) {
case Command.DoSomethingX: return DoSomethingX();
case Command.DoSomethingY: return DoSomethingY();
default: return GetEmpty();
}
}
I also note that the switch block gives you more compact code.
This isn't a performance problem as much as it is a paradigm problem.
In C# a method should be an encapsulation of a task. What you have here is a metric boatload of tasks, each unrelated. That should not be in a single method. Imagine trying to maintain this method in the future. Imagine trying to debug this, wondering where you are in the method as each bit is called.
Your life will be much easier if you split this out, though the performance will probably make no difference.
Although separate methods will nearly certainly be better in terms of performance, it is highly unlikely that you should notice the difference. However, having separate methods should definitely improve readability a lot, which is a lot more important.

Does checking against null for 'success' count as "Double use of variables"?

I have read that a variable should never do more than one thing. Overloading a variable to do more than one thing is bad.
Because of that I end up writing code like this: (With the customerFound variable)
bool customerFound = false;
Customer foundCustomer = null;
if (currentCustomer.IsLoaded)
{
if (customerIDToFind == currentCustomer.ID)
{
foundCustomer = currentCustomer;
customerFound = true;
}
}
else
{
foreach (Customer customer in allCustomers)
{
if (customerIDToFind == customer.ID)
{
foundCustomer = customer;
customerFound = true;
}
}
}
if (customerFound)
{
// Do something
}
But deep down inside, I sometimes want to write my code like this: (Without the customerFound variable)
Customer foundCustomer = null;
if (currentCustomer.IsLoaded)
{
if (customerIDToFind == currentCustomer.ID)
{
foundCustomer = currentCustomer;
}
}
else
{
foreach (Customer customer in allCustomers)
{
if (customerIDToFind == customer.ID)
{
foundCustomer = customer;
}
}
}
if (foundCustomer != null)
{
// Do something
}
Do these secret desires make me an evil programmer?
(i.e. is the second case really bad coding practice?)
I think you've misunderstood the advice. In that case, you're only using the variable for one purpose - to store the customer being searched for. Your logic checks to see if the customer was found, but doesn't change the purpose of the variable.
The "don't use variables for more than one thing" is aimed at things like "temp" variables that store state for ten different things during the course of a function.
You're asking about and demonstrating 2 different things.
What you're asking about: Using the same variable for 2 different things. For example storing a user's age and also his height with a single double variable.
What you're demonstrating: Using 2 variables for the same purpose.
I like your second code variant better, you have 1 variable not 2 that are co-dependent. The first piece of code may have more problems as you have more state to manage to signify the same exact thing.
I think the root thing that you're asking about is: Is it ok to use a magic value instead of a separate variable? It depends on your situation, but if you are guaranteed that the magic value (null in this case) can't be used otherwise to signify anything else, then go ahead.
When you would use the first variant of code that you gave:
If you can have a null value even if an object is found, and you need to distinguish that between actually finding a customer or not, then you should use the 2 variable variant.
Personally, I'd consider refactoring this into methods to find and check your customer, thereby reducing this block length dramatically. Something like:
Customer foundCustomer = null;
if (!this.TryGetLoadedCustomer(out foundCustomer))
foundCustomer = this.FindCustomer();
if (foundCustomer != null)
{ // ...
That being said, you're using the foundCustomer variable for a single purpose here, in both cases. It's being used in multiple places, but it's used for a single purpose - to track the correct customer.
If you're going to use the code as you have it above, I personally prefer the second case over your first option - since a null check is probably going to be required in any case.
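The refactoring suggested above could look roughly like this. It is a sketch that keeps the asker's logic (a loaded current customer is the only candidate) and returns null for "not found"; the Customer class and the Find method are assumptions, since the original types are not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Customer type for the example.
public sealed class Customer
{
    public int ID { get; set; }
    public bool IsLoaded { get; set; }
}

public static class CustomerSearch
{
    // Returns the matching customer, or null when none is found.
    // The return value is the only piece of state, so there is no
    // separate flag that could get out of sync with it.
    public static Customer Find(
        int idToFind, Customer current, IEnumerable<Customer> all)
    {
        if (current != null && current.IsLoaded)
        {
            return current.ID == idToFind ? current : null;
        }

        foreach (Customer customer in all)
        {
            if (customer.ID == idToFind)
            {
                return customer; // stop at the first match
            }
        }
        return null;
    }
}
```

The caller then only needs a single null check on the result, as in the second variant of the question.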
The second way is better in my opinion as well. I'd say the first way is actually wrong, as you have two variables that depend on each other and give you redundant information. This opens the possibility of them being inconsistent - you can make a mistake and have customerFound be true, but foundCustomer be null. What do you do in that case? It's better for that state to be impossible to reach.
I would say the second case is better than the first. Checking a variable against NULL does not constitute an entire other usage in my book. The second case is better because you have copied code in the first where you have to set the flag and set the variable. This is error prone if you had another case where you set the Customer but then forgot to set the flag.
In the second piece of code, a null value of foundCustomer indicates that no customer was found. This sounds perfectly reasonable, and I would not consider that to be a double use of the variable at all.
I think the second option makes pretty good sense. Why waste a variable if you can live without it?
But in the foreach statement I would add a break if the customer is found.
HTH
I'd argue the opposite. You're actually adding additional lines of code to achieve the same result which makes your first code example more prone to errors and harder to maintain.
I agree with the consensus, you're doing fine checking for nulls, the advice is really warning against something horrid like:
float x;
x=37; // current age
if (x<50) {
x=x/10; // current height
if (x>3) {
x=3.14; // pi to the desired level of precision
}
}
if (x==3.14) {
// hooray pi for everyone old enough
}
else {
// no pi for you youngster!
}
btw, I know it's just a wee code snippet, but I can't help but think that there is something wrong with:
if (customerIDToFind == currentCustomer.ID)
{
foundCustomer = currentCustomer;
}
else {
// foundCustomer remains null
}
if (foundCustomer == null) {
// always true when currentCustomer.IsLoaded
}
That would mean that once you have a loaded customer then you'll never again search for another one. I'm guessing that you pruned a bit of handy code to make an example, if that's the case then please ignore this part of the comment! ;-)
I think that if you make it your mission to create simple code that other developers can understand easily, if you make that your beacon instead of a set of rigid rules from 1985, you will find your way.
There are a lot of practices that come from old school procedural development, where routines were more likely to be monolithic and extremely long. We can still learn from them, don't get me wrong, but we have many new strategies for handling complexity and creating human readable/self describing code, so to me the idea that this should be a hard and fast rule seems obsolescent at best.
That said, I would probably refactor this code into two or three smaller methods, and then the variable reuse question would probably go away. :)
I'd use the second version, but I'm not sure if your sample was very good because I don't think that it is doing two things. Checking a variable for null is standard practice.

How do I automatically reset a boolean when any other method is called in C#?

Using C#, I need to do some extra work if function A() was called right before function C(). If any other function was called in between A() and C() then I don't want to do that extra work. Any ideas that would require the least amount of code duplication?
I'm trying to avoid adding lines like flag = false; into every function B1..BN.
Here is a very basic example:
bool flag = false;
void A()
{
flag = true;
}
void B1()
{
...
}
void B2()
{
...
}
void C()
{
if (flag)
{
//do something
}
}
The above example was just using a simple case but I'm open to using something other than booleans. The important thing is that I want to be able to set and reset a flag of sorts so that C() knows how to behave accordingly.
Thank you for your help. If you require clarification I will edit my post.
Why not just factor your "Extra work" into a memoised function (i.e. one that caches its results)? Whenever you need that work you just call this function, which will short circuit if the cache is fresh. Whenever that work becomes stale, invalidate the cache. In your rather odd examples above, I presume you'll need a function call in each of the Bs, and one in C. Calls to A will invalidate the cache.
If you're looking for a way around that (i.e. some clever way to catch all function calls and insert this call), I really wouldn't bother. I can conceive of some insane runtime reflection proxy class generation, but you should make your code flow clear and obvious; if each function depends on the work being already done, just call "doWork" in each one.
Sounds like your design is way too tightly coupled if calling one method changes the behavior of another such that you have to make sure to call them in the right order. That's a major red flag.
Sounds like some refactoring is in order. It's a little tricky to give advice without seeing more of the real code, but here is a point in the right direction.
Consider adding a parameter to C like so:
void C(bool DoExtraWork) {
if (DoExtraWork)...
}
Of course "DoExtraWork" should be named something meaningful in the context of the caller.
I solved a problem with a similar situation (i.e., the need to know whether A was called directly before C) by having a simple state machine in place. Essentially, I built a state object using an enum and a property to manage/query the state.
When my equivalent of A() was called, it would have the business logic piece store off the state indicating that A was called. If other methods (your B's ) were called, it would toggle the state to one of a few other states (my situation was a bit more complicated) and then when C() was called, the business logic piece was queried to determine if we were going to call some method D() that held the "only if A was just called" functionality.
I suspect there are multiple ways to solve this problem, but I liked the state machine approach I took because it allowed me to expand what was initially a binary situation to handle a more complicated multi-state situation.
I was fortunate that multi-threading was not an issue in my case because that tends to make things more entertaining, but the state machine would likely work in that scenario as well.
Just my two cents.
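A very small sketch of that state-machine idea, reduced to the A/B/C example from the question. CallTracker, LastCall and ExtraWorkCount are hypothetical names, and the "extra work" is represented by a counter.

```csharp
using System;

public sealed class CallTracker
{
    // The state records which kind of method ran most recently.
    private enum LastCall { None, A, Other }

    private LastCall _last = LastCall.None;

    // Stands in for the extra work so that it can be observed.
    public int ExtraWorkCount { get; private set; }

    public void A() { _last = LastCall.A; }

    public void B1() { _last = LastCall.Other; }

    public void B2() { _last = LastCall.Other; }

    public void C()
    {
        if (_last == LastCall.A)
        {
            ExtraWorkCount++; // only when A was called directly before C
        }
        _last = LastCall.Other;
    }
}
```

Each B method still has to record its own call, so this does not remove the per-method line the asker wanted to avoid; what it adds is room to grow beyond a two-valued flag.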
I don't recommend this, but what the hell: If you're willing to replace all your simple method calls:
A();
... with syntax like this:
// _lastAction is a class-level Action member
(_lastAction = new Action(A)).Invoke();
... then inside of C() you can just do a check like this:
void C()
{
if (_lastAction.Method.Name == "A")
{
}
}
This probably isn't thread-safe (and it wouldn't work in code run through an obfuscator without a bit of tinkering), so I wouldn't use something like this without heavy testing. I also wouldn't use something like this, period.
Note: my ancient version of C# only has Action<T> (and not Action or Action<T, T> etc.), so if you're stuck there, too, you'd have to add a dummy parameter to each method to use this approach.
