Suppose we are defining a class where:
1. Many instances of that class will be created.
2. We must store fewer than 32 flags in each instance, keeping states, options, etc.
3. The number of flags is fixed at compile time, and we have no need to keep them in an enumerable variable at runtime (say, we could define separate bool variables rather than one bool array).
4. Some properties of each instance depend on our flags (or options), and the flag states will be read and written on a hot call path in our application.
Note: performance is important in this application.
As assumptions #1 and #4 dictate, we must care about both speed and memory footprint, in balance.
Obviously we can implement our class in several ways: for example, defining a flags enum field, using a BitVector32, defining separate (bool, enum, int, ...) variables, or defining a uint field and using bit masks to keep the state in each instance. But:
Which is the most efficient way to keep status flags in this situation?
Does it (= the most efficient way) depend deeply on the tools in use, such as the compiler or even the runtime (CLR)?
Since nobody answered my question, and I have since performed some tests and research, I will answer it myself and hope it will be useful to others:
Which is the most efficient way to keep status flags in this situation?
Because the computer aligns data in memory according to the processor architecture, it is generally good advice to avoid separate boolean fields in classes, even in a high-level language like C#.
Bit-mask-based solutions (a flags enum, BitVector32, or manual bit-mask operations) are preferable. For two or more boolean values they reduce the memory footprint and are fast; for a single boolean state variable they gain nothing.
Generally, if we choose a flags enum or BitVector32 as the solution, it should in most cases be almost as fast as manual bit-mask operations in C#.
When we need to store various small numeric ranges in addition to boolean values, BitVector32 is a ready-made utility that keeps those states in one variable and saves memory.
We may prefer a flags enum to make our code more maintainable and clear.
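For illustration, here is a minimal sketch of the approaches discussed above. All type and member names (Options, Item, PacketHeader, etc.) are hypothetical:

    using System;
    using System.Collections.Specialized;

    // A flags enum gives readable names to the individual bits.
    [Flags]
    enum Options
    {
        None     = 0,
        IsDirty  = 1 << 0,
        IsCached = 1 << 1,
        IsShared = 1 << 2
    }

    class Item
    {
        private Options options;   // one 32-bit field instead of separate bools

        public bool IsDirty
        {
            get { return (options & Options.IsDirty) != 0; }
            set { options = value ? (options | Options.IsDirty) : (options & ~Options.IsDirty); }
        }
    }

    // BitVector32 packs small numeric ranges next to boolean flags.
    class PacketHeader
    {
        private static readonly BitVector32.Section RetryCount =
            BitVector32.CreateSection(15);             // bits 0..3, values 0..15
        private static readonly BitVector32.Section IsUrgent =
            BitVector32.CreateSection(1, RetryCount);  // the next bit, values 0..1

        private BitVector32 bits = new BitVector32(0);

        public int Retries
        {
            get { return bits[RetryCount]; }
            set { bits[RetryCount] = value; }
        }

        public bool Urgent
        {
            get { return bits[IsUrgent] != 0; }
            set { bits[IsUrgent] = value ? 1 : 0; }
        }
    }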
As for the second part of the question:
Does it (= the most efficient way) depend deeply on the tools in use, such as the compiler or even the runtime (CLR)?
Partially, yes.
When we choose any of the mentioned solutions other than manual bitwise operations, performance depends on the optimizations the compiler performs (for example, inlining the method calls made when using BitVector32 or enum operations). These optimizations will boost our code, and they seem to be standard in C#; but for every solution other than manual bitwise operations, on toolchains other than the official .NET one, it is better to test for your particular case.
Need assistance in structuring my code with design patterns.
So I have a single method in a service class that looks like this. The constructor loads a JSON file into a List<Option> of Option class instances. Finally, when the service method is called, it does some logic to find the Option match based on the parameter values and returns a new instance of the 'Tool' class with the configured "options".
public Tool BestOptionFinderService.FindBestMatch(string value1, int value2, int value3, .. bool value20, etc...) {..}
I'm not sure if a "service" class is correct for this versus a "factory" or something else. I would appreciate your thoughts and suggestions on how you would design your code for this problem or similar.
I would say that having a single method with ~20 params is awful by itself, without even considering OOP patterns. Let's try to do better; more than that, let's try not to reinvent the wheel.
An important yet obvious observation: whatever matching logic one has, any object either matches or not, and never both. Thus it makes sense to stick with Boolean algebra and signatures like predicate: t -> bool or bool Predicate<T>(T obj). The other thing we know about predicates is that one can easily reduce any two of them into a single one: there are different ways to do it, but here you're obviously interested in the and (&&) operator.
Thus, instead of having 20 parameters, you could have twenty different simple, clear, self-describing predicates, later "reduced" into a single instance. Linq.Aggregate() can help you not only beautify your code but also make it parallel (if necessary). Then you can map your input to a sufficient number of flags (it's up to you to decide whether you need more or not), representing the "fitness" of a particular object you're examining.
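A minimal sketch of the idea, assuming a simple Option class; the rule bodies and names here are hypothetical:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Option
    {
        public bool IsActive { get; set; }
        public int MaxSize { get; set; }
        public string Category { get; set; }
    }

    static class OptionMatching
    {
        // Twenty simple, self-describing predicates instead of twenty parameters.
        static readonly List<Func<Option, bool>> Rules = new List<Func<Option, bool>>
        {
            o => o.IsActive,
            o => o.MaxSize >= 10,
            o => o.Category == "standard"
            // ... and so on, one small rule per former parameter
        };

        // Aggregate "reduces" the rules into a single predicate with &&.
        static readonly Func<Option, bool> Matches =
            Rules.Aggregate((acc, next) => o => acc(o) && next(o));

        public static Option FindBestMatch(IEnumerable<Option> options)
        {
            return options.FirstOrDefault(Matches);
        }
    }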
Such an approach would indeed be better because:
1) You stick to well-known Boolean algebra,
1.A) which, actually, forms a monoid under various operations, including and, which, in turn, makes it
1.B) easily distributable over any computation cluster
1.C) compact, expressive, self-explanatory.
2) Your code tells much more of a story: simple predicates are easy to read, maintain, unit-test and refactor, which is never the case for 20-param methods.
3) You clearly separate your primitives from compositions: more complex structures built up from known primitives. That's the way math goes, and thus the only known reliable way, proven over thousands of years, to tackle complexity and not get enslaved by it at some point.
4) There are more advantages, but I'm tired of typing ;)
Eric Lippert told me I should "try to always make value types immutable", so I figured I should try to always make value types immutable.
But, I just found this internal mutable struct, System.Web.Util.SimpleBitVector32, in the System.Web assembly, which makes me think that there must be a good reason for having a mutable struct. I'm guessing the reason they did it this way is that it performed better under testing, and they kept it internal to discourage its misuse. However, that's speculation.
I've C&P'd the source of this struct. What is it that justifies the design decision to use a mutable struct? In general, what sort of benefits can be gained by the approach and when are these benefits significant enough to justify the potential detriments?
[Serializable, StructLayout(LayoutKind.Sequential)]
internal struct SimpleBitVector32
{
    private int data;

    internal SimpleBitVector32(int data)
    {
        this.data = data;
    }

    internal int IntegerValue
    {
        get { return this.data; }
        set { this.data = value; }
    }

    internal bool this[int bit]
    {
        get
        {
            return ((this.data & bit) == bit);
        }
        set
        {
            int data = this.data;
            if (value) this.data = data | bit;
            else this.data = data & ~bit;
        }
    }

    internal int this[int mask, int offset]
    {
        get { return ((this.data & mask) >> offset); }
        set { this.data = (this.data & ~mask) | (value << offset); }
    }

    internal void Set(int bit)
    {
        this.data |= bit;
    }

    internal void Clear(int bit)
    {
        this.data &= ~bit;
    }
}
Given that the payload is a 32-bit integer, I'd say this could easily have been written as an immutable struct, probably with no impact on performance. Whether you're calling a mutator method that changes the value of a 32-bit field, or replacing a 32-bit struct with a new 32-bit struct, you're still doing the exact same memory operations.
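For reference, a minimal sketch of what that immutable variant might look like; the type and method names are hypothetical, and each "mutation" returns a new 32-bit struct instead of writing a field:

    internal struct ImmutableBitVector32
    {
        private readonly int data;

        internal ImmutableBitVector32(int data) { this.data = data; }

        internal bool this[int bit]
        {
            get { return (data & bit) == bit; }
        }

        // Instead of set-indexers, "with" methods return a modified copy.
        internal ImmutableBitVector32 WithBit(int bit)
        {
            return new ImmutableBitVector32(data | bit);
        }

        internal ImmutableBitVector32 WithoutBit(int bit)
        {
            return new ImmutableBitVector32(data & ~bit);
        }
    }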
Probably somebody wanted something that acted kind of like an array (while really just being bits in a 32-bit integer), so they decided they wanted to use indexer syntax with it, instead of a less-obvious .WithTheseBitsChanged() method that returns a new struct. Since it wasn't going to be used directly by anyone outside MS's web team, and probably not by very many people even within the web team, I imagine they had quite a bit more leeway in design decisions than the people building the public APIs.
So, no, probably not that way for performance -- it was probably just some programmer's personal preference in coding style, and there was never any compelling reason to change it.
If you're looking for design guidelines, I wouldn't spend too much time looking at code that hasn't been polished for public consumption.
Actually, if you search for all classes containing BitVector in the .NET framework, you'll find a bunch of these beasts :-)
System.Collections.Specialized.BitVector32 (the sole public one...)
System.Web.Util.SafeBitVector32 (thread safe)
System.Web.Util.SimpleBitVector32
System.Runtime.Caching.SafeBitVector32 (thread safe)
System.Configuration.SafeBitVector32 (thread safe)
System.Configuration.SimpleBitVector32
And if you look here, where the SSCLI (Microsoft Shared Source CLI, aka ROTOR) source of System.Configuration.SimpleBitVector32 resides, you'll find this comment:
//
// This is a cut down copy of System.Collections.Specialized.BitVector32. The
// reason this is here is because it is used rather intensively by Control and
// WebControl. As a result, being able to inline this operations results in a
// measurable performance gain, at the expense of some maintainability.
//
[Serializable()]
internal struct SimpleBitVector32
I believe this says it all. I think the System.Web.Util one is more elaborate but built on the same grounds.
SimpleBitVector32 is mutable, I suspect, for the same reasons that BitVector32 is mutable. In my opinion, the immutability guideline is just that, a guideline; however, one should have a really good reason to break it.
Consider, also, the Dictionary<TKey, TValue> - I go into some extended details here. The dictionary's Entry struct is mutable - you can change TValue at any time. But, Entry logically represents a value.
Mutability must make sense. I agree with @JoeWhite: somebody wanted something that acted kind of like an array (while really just being bits in a 32-bit integer); also that both BitVector structs could easily have been ... immutable.
But, as a blanket statement, I disagree with "it was probably just some programmer's personal preference in coding style" and lean more toward "there was never [nor is there] any compelling reason to change it". Simply know and understand the responsibility of using a mutable struct.
Edit
For the record, I do heartily agree that you should always try to make a struct immutable. If you find that requirements dictate member mutability, revisit the design decision and get peers involved.
Update
I was not initially confident in my assessment of performance when comparing a mutable value type to an immutable one. However, as @David points out, Eric Lippert writes this:
There are times when you need to wring every last bit of performance out of a system. And in those scenarios, you sometimes have to make a tradeoff between code that is clean, pure, robust, understandable, predictable, modifiable and code that is none of the above but blazingly fast.
I bolded pure because a mutable struct does not fit the pure ideal that a struct should be immutable. There are side effects to writing a mutable struct: understandability and predictability are compromised, as Eric goes on to explain:
Mutable value types ... behave in a manner that many people find deeply counterintuitive, and thereby make it easy to write buggy code (or correct code that is easily turned into buggy code by accident.) But yes, they are real fast.
The point Eric is making is that you, as the designer and/or developer, need to make a conscious and informed decision. How do you become informed? Eric explains that also:
I would consider coding up two benchmark solutions -- one using mutable structs, one using immutable structs -- and run some realistic user-scenario-focused benchmarks. But here's the thing: do not pick the faster one. Instead, decide BEFORE you run the benchmark how slow is unacceptably slow.
We know that altering a value type is faster than creating a new value type; but considering correctness:
If both solutions are acceptable, choose the one that is clean, correct and fast enough.
The key is being fast enough to offset the side effects of choosing mutable over immutable. Only you can determine that.
Using a struct for a 32- or 64-bit vector as shown here is reasonable, with a few caveats:
I would recommend using an Interlocked.CompareExchange loop when performing any updates to the structure, rather than just using the ordinary Boolean operators directly. If one thread tries to write bit 3 while another tries to write bit 8, neither operation should interfere with the other beyond delaying it a little. Use of an Interlocked.CompareExchange loop avoids the possibility of errant behavior (thread 1 reads the value, thread 2 reads the old value, thread 1 writes the new value, thread 2 writes a value computed from the old value and undoes thread 1's change) without needing any other type of locking.
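A minimal sketch of such a loop, written as a free-standing helper with hypothetical names:

    using System.Threading;

    static class AtomicBits
    {
        // Atomically sets the given bit(s) in a shared 32-bit flags field.
        public static void SetBits(ref int flags, int bits)
        {
            int oldValue, newValue;
            do
            {
                oldValue = flags;            // snapshot the current value
                newValue = oldValue | bits;  // compute the desired value
                // If another thread changed 'flags' since the snapshot,
                // CompareExchange leaves it alone and we retry.
            }
            while (Interlocked.CompareExchange(ref flags, newValue, oldValue) != oldValue);
        }
    }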
Structure members, other than property setters, that modify "this" should be avoided. It's better to use a static method that accepts the structure as a ref parameter. Invoking a structure member that modifies "this" is generally identical to calling a static method that accepts the structure as a ref parameter, from both a semantic and a performance standpoint, but there's one key difference: if one tries to pass a read-only structure by reference to a static method, one will get a compiler error. By contrast, if one invokes on a read-only structure a method that modifies "this", there won't be any compiler error, but the intended modification won't happen (it is applied to a copy). Since even mutable structures can get treated as read-only in certain contexts, it's far better to get a compiler error when this happens than to have code that compiles but won't work.
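A short sketch of that difference, using a hypothetical Counter struct; the static ref version turns the silent no-op into a compiler error:

    struct Counter
    {
        public int Value;

        // Modifies "this": silently operates on a copy when invoked
        // on a read-only field.
        public void Increment() { Value++; }

        // Static alternative: the compiler refuses a read-only argument.
        public static void Increment(ref Counter c) { c.Value++; }
    }

    class Demo
    {
        readonly Counter a;
        Counter b;

        void Run()
        {
            a.Increment();               // compiles, but mutates a copy; a.Value stays 0
            // Counter.Increment(ref a); // compile error: 'a' is readonly
            Counter.Increment(ref b);    // works as intended
        }
    }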
Eric Lippert likes to rail on about how mutable structures are evil, but it's important to recognize his relation to them: he's one of the people on the C# team charged with making the language support features like closures and iterators. Because of some design decisions early in the creation of .NET, properly supporting value-type semantics is difficult in some contexts; his job would be a lot easier if there weren't any mutable value types. I don't begrudge Eric his point of view, but it's important to note that some principles which are important in framework and language design are not so applicable to application design.
If I understand correctly, you cannot make a serializable immutable struct simply by using SerializableAttribute. This is because during deserialization, the serializer instantiates a default instance of the struct, then sets all the fields following instantiation. If they are readonly, deserialization will fail.
Thus, the struct had to be mutable, else a complex serialization system would have been necessary.
I started working on a large C# code base and found a static class with several const int fields. This class acts exactly as an enum would.
I would like to convert the class to an actual enum, but the powers that be said no. The main reason I would like to convert it is so that I could have the enum as the data type instead of int. This would help a lot with readability.
Is there any reason to not use enums and to use const ints instead?
This is currently how the code is:
public int FieldA { get; set; }
public int FieldB { get; set; }

public static class Ids
{
    public const int ItemA = 1;
    public const int ItemB = 2;
    public const int ItemC = 3;
    public const int ItemD = 4;
    public const int ItemE = 5;
    public const int ItemF = 6;
}
However, I think it should be the following instead:
public Ids FieldA { get; set; }
public Ids FieldB { get; set; }
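For clarity, a sketch of the conversion being proposed; the containing class name here is hypothetical:

    public enum Ids
    {
        ItemA = 1,
        ItemB = 2,
        ItemC = 3,
        ItemD = 4,
        ItemE = 5,
        ItemF = 6
    }

    public class Record   // hypothetical class holding the fields
    {
        public Ids FieldA { get; set; }
        public Ids FieldB { get; set; }
    }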
I think many of the answers here ignore the implications of the semantics of enums.
You should consider using an enum when the entire set of all valid values (Ids) is known in advance, and is small enough to be declared in program code.
You should consider using an int when the set of known values is a subset of all the possible values - and the code only needs to be aware of this subset.
With regards to refactoring: when time and business constraints allow, it's a good idea to clean code up when the new design/implementation has a clear benefit over the previous implementation and where the risk is well understood. In situations where the benefit is low or the risk is high (or both), it may be better to take the position of "do no harm" rather than "continuously improve". Only you are in a position to judge which case applies to your situation.
By the way, a case where neither enums nor constant ints is necessarily a good idea is when the IDs represent the identifiers of records in an external store (like a database). It's often risky to hardcode such IDs in the program logic, as these values may differ between environments (e.g. Test, Dev, Production). In such cases, loading the values at runtime may be a more appropriate solution.
Your suggested solution looks elegant, but won't work as it stands, as you can't use instances of a static type. It's a bit trickier than that to emulate an enum.
There are a few possible reasons for choosing enum or const-int for the implementation, though I can't think of many strong ones for the actual example you've posted - on the face of it, it seems an ideal candidate for an enum.
A few ideas that spring to mind are:
Enums
They provide type-safety. You can't pass any old number where an enum value is required.
Values can be autogenerated
You can use reflection to easily convert between the 'values' and 'names' (see the sketch after this list)
You can easily enumerate the values in an enum in a loop, and then if you add new enum members the loop will automatically take them into account.
You can insert new enum values without worrying about clashes occurring if you accidentally repeat a value.
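A minimal sketch of the enumeration and reflection points above, using a hypothetical Color enum:

    using System;

    enum Color { Red, Green, Blue }

    class Demo
    {
        static void Main()
        {
            // The loop automatically picks up newly added members.
            foreach (Color c in Enum.GetValues(typeof(Color)))
                Console.WriteLine("{0}: {1}", (int)c, c);   // value and name

            // Converting between names and values.
            Color parsed = (Color)Enum.Parse(typeof(Color), "Green");
            Console.WriteLine(parsed);                      // Green
        }
    }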
const-ints
If you don't understand how to use enums (e.g. not knowing how to change the underlying data type of an enum, how to set explicit values for enum members, or how to assign the same value to multiple constants), you might mistakenly believe you're achieving something with a const that you couldn't achieve with an enum.
If you're used to other languages you may just naturally approach the problem with consts, not realising that a better solution exists.
You can derive from classes to extend them, but annoyingly you can't derive a new enum from an existing one (which would be a really useful feature). Potentially you could therefore use a class (but not the one in your example!) to achieve an "extendable enum".
You can pass ints around easily. Using an enum may require you to be constantly casting (e.g.) data you receive from a database to and from the enumerated type. What you lose in type-safety you gain in convenience. At least until you pass the wrong number somewhere... :-)
If you use readonly rather than const, the values are stored in actual memory locations that are read when needed. This allows you to publish constants to another assembly that are read and used at runtime, rather than built into the other assembly, which means that you don't have to recompile the dependent assembly when you change any of the constants in your own assembly. This is an important consideration if you want to be able to patch a large application by just releasing updates for one or two assemblies.
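A short sketch of that distinction (hypothetical names): the const is baked into referencing assemblies at compile time, while the readonly field is read at runtime:

    public static class Limits
    {
        // Copied into every referencing assembly when it is compiled;
        // changing it requires recompiling the dependents.
        public const int MaxUsers = 100;

        // Stored in this assembly and read at runtime; dependents pick up
        // a new value when only this assembly is patched.
        public static readonly int MaxSessions = 500;
    }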
I guess it is a way of making it clearer that the enum values must stay unchanged. With an enum another programmer will just drop in a new value without thinking, but a list of consts makes you stop and think "why is it like this? How do I add a new value safely?". But I'd achieve this by putting explicit values on the enums and adding a clear comment, rather than resorting to consts.
Why should you leave the implementation alone?
The code may well have been written by an idiot who has no good reason for what he did. But changing his code and showing him he's an idiot isn't a smart or helpful move.
There may be a good reason it's like that, and you will break something if you change it (e.g. it may need to be a class due to being accessed through reflection, being exposed through external interfaces, or to stop people easily serializing the values because they'll be broken by the obfuscation system you're using). No end of unnecessary bugs are introduced into systems by people who don't fully understand how something works, especially if they don't know how to test their changes to ensure they haven't broken anything.
The class may be autogenerated by an external tool, so it is the tool you need to fix, not the source code.
There may be a plan to do something more with that class in future (?!)
Even if it's safe to change, you will have to re-test everything that is affected by the change. If the code works as it stands, is the gain worth the pain? When working on legacy systems we will often see existing code of poor quality or just done a way we don't personally like, and we have to accept that it is not cost effective to "fix" it, no matter how much it niggles. Of course, you may also find yourself biting back an "I told you so!" when the const-based implementation fails due to lacking type-safety. But aside from type-safety, the implementation is ultimately no less efficient or effective than an enum.
If it ain't broke, don't fix it.
I don't know the design of the system you're working on, but I suspect that the fields are integers that just happen to have a number of predefined values. That's to say they could, in some future state, contain more than those predefined values. While an enum allows for that scenario (via casting), it implies that only the values the enumeration contains are valid.
Overall, the change is a semantic one but it is unnecessary. Unnecessary changes like this are often a source of bugs, additional test overhead and other headaches with only mild benefits. I say add a comment expressing that this could be an enum and leave it as it is.
Yes, it does help with readability, and no I cannot think of any reason against it.
Using const int is a very common old-school programming practice from C++.
The reason I see is loose coupling: if you want to be loosely coupled to another system that uses the same constants, const ints let you avoid sharing the same enum type and thus being tightly coupled.
Like in RPC calls or something...
I have a large list of error messages that my biz code can return based on what's entered. The list may end up with more than a thousand.
I'd like to just enum these all out, using the [Description("")] attribute to record the friendly message.
Something like:
public enum ErrorMessage
{
    [Description("A first name is required for users.")]
    User_FirstName_Required = 1,

    [Description("The first name is too long. It cannot exceed 32 characters.")]
    User_FirstName_Length = 2,

    ...
}
I know enums are primitive types, integers specifically. There shouldn't be any problem with that many integers, right?
Is there something I'm not thinking of? It seems like this should be okay, but I figured I should ask the community before spending the time to do it this way.
Does .Net care about enum types differently when they have lots of values?
Update
The reason I didn't want to use Resources is because
a) I need to be able to reference each unique error message with an integer value. The biz layer services an API, in addition to other things, and a list of integer values has to be returned denoting the errors. I don't believe Resources allows you to address a resource value with an integer. Am I wrong?
b) There are no localization requirements.
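For reference, a minimal sketch of how the Description text could be read back via reflection; the helper name is hypothetical:

    using System;
    using System.ComponentModel;
    using System.Reflection;

    static class EnumDescriptions
    {
        // Returns the [Description] text of an enum member, or the
        // member name if no attribute is present.
        public static string GetDescription(Enum value)
        {
            FieldInfo field = value.GetType().GetField(value.ToString());
            var attribute = (DescriptionAttribute)Attribute.GetCustomAttribute(
                field, typeof(DescriptionAttribute));
            return attribute != null ? attribute.Description : value.ToString();
        }
    }

    // Usage:
    //   EnumDescriptions.GetDescription(ErrorMessage.User_FirstName_Required)
    //   returns "A first name is required for users."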
I think a design that has 1,000+ values in an enum needs some more thought. Sounds like a "God Enum" anti-pattern will have to be invented for this case.
The main downside I'd point out with having the friendly description in an Attribute is that this will cause challenges if you ever need to localize your app for another language. If this is a consideration, it would be a good idea to put the strings in a resource file.
The enum itself should not be a problem, though having all of your error codes in one master list can be confusing. You may consider creating separate enums for separate categories of return codes, as this will make it easier for developers to understand the possible return values for a particular function. You can still give them distinct numeric values (by specifying the numeric values explicitly) if it's important that the codes be unique.
On a side note, the .NET BCL does not make much use of return codes and return codes are somewhat discouraged in modern .NET development. They create maintainability issues (you can almost never remove old return codes or risk breaking backwards compatibility) and they require special validation logic to handle the returns for every call. Stateful validation can be accomplished with IDataErrorInfo, where you use an intermediate class that can represent invalid states, but that only allows a Commit of changes that are validated. This allows you to manipulate the object freely, but also provide feedback to the user as to the validity of its state. The equivalent logic with error codes often requires a switch statement for each use.
1000 is not many; you should just make sure that the underlying integer type is big enough (don't use too small an underlying type, such as byte, for your enum).
On second thought, 1000 is tons if you're entering them manually; if they are generated from some data set, it could make sense.
I fully agree with duffymo. An enum with 1000+ values smells bad from a design point of view. Not to mention that it would be quite nasty for the developer to use IntelliSense on such a God Enum :-)
I would rather go with resources.
I think it's very bad. For error handling you can simply use resources; and since, as I see it, you want to use reflection to fetch the description, that's bad too.
If you don't want to use resources, you can define a different enum for each of your business rules. One business area doesn't need another's error messages (and shouldn't be exposed to them).
I was wondering if the enum type has a limit on its members. I have a very large list of "variables" that I need to store inside an enum or as constants in a class. I finally decided to store them inside a class; however, I'm a little curious about the limit on an enum's members (if any).
So, do enums have a limit on .Net?
Yes. The number of members with distinct values is limited by the underlying type of the enum. By default this is Int32, so you can have that many distinct members (2^32; I find it hard to believe you will reach that limit), but you can explicitly specify the underlying type like this:
enum Foo : byte { /* can have at most 256 members with distinct values */ }
Of course, you can have as many members as you want if they all have the same value:
enum Bar { A, B = A, C = A, ... }
In either case, there is probably some implementation-defined limit in the C# compiler, but I would expect it to be MIN(range-of-Int32, free-memory) rather than a hard limit.
Due to a limit in the PE file format, you probably can't exceed some 100,000,000 values. Maybe more, maybe less, but definitely not a problem.
From the C# Language Specification 3.0, 1.10:
An enum type's storage format and range of possible values are determined by its underlying type.
While I'm not 100% sure, I would expect the Microsoft C# compiler to only allow non-negative enum values, so if the underlying type is Int32 (it is, by default) then I would expect about 2^31 possible values; but this is an implementation detail, as it is not specified. If you need more than that, you're probably doing something wrong.
You could theoretically use Int64 as the underlying type of your enum and get 2^64 possible entries. Others have given you excellent answers on this.
I think there is a second implied question of should you use an enum for something with a huge number of items. This actually directly applies to your project in many ways.
One of the biggest considerations would be long-term maintainability. Do you think the company will ever change the list of values you are using? If so, will there need to be backward compatibility with previous lists? How significant a problem could this be? In general, a larger number of members in an enum correlates with a higher probability that the list will need to be modified at some future date.
Enums are great for many things. They are clean, quick and simple to implement. They work great with IntelliSense and make the next programmer's job easier, especially if the names are clear, concise and if needed, well documented.
The problem is an enumeration also comes with drawbacks. They can be problematic if they ever need to be changed, especially if the classes using them are being persisted to storage.
In most cases enums are persisted to storage as their underlying values, not as their friendly names.
enum InsuranceClass
{
    Home,    // value = 0 (int32)
    Vehicle, // value = 1 (int32)
    Life,    // value = 2 (int32)
    Health   // value = 3 (int32)
}
In this example the value InsuranceClass.Life would get persisted as a number 2.
If another programmer makes a small change to the system and adds Pet to the enum like this;
enum InsuranceClass
{
    Home,    // value = 0 (int32)
    Vehicle, // value = 1 (int32)
    Pet,     // value = 2 (int32)
    Life,    // value = 3 (int32)
    Health   // value = 4 (int32)
}
All of the data coming out of the storage will now show the Life policies as Pet policies. This is an extremely easy mistake to make and can introduce bugs that are difficult to track down.
The second major issue with enums is that every change of the data will require you to rebuild and redeploy your program. This can cause varying degrees of pain. On a web server that may not be a big issue, but if this is an app used on 5000 desktop systems you have an entirely different cost to redeploy your minor list change.
If your list is likely to change periodically, you should really consider a system that stores the list in some other form, most likely outside your code. Databases were specifically designed for this scenario, or even a simple config file could be used (not the preferred solution). Smart planning for changes can reduce or avoid the problems associated with rebuilding and redeploying your software.
This is not a suggestion to prematurely optimize your system for the possibility of change, but rather to structure the code so that a likely future change doesn't create a major problem. Different situations will require different decisions.
Here are my rough rules of thumb for the use of enums:

1. Use them to classify and define other data, but not as data themselves. To be clearer, I would use InsuranceClass.Life to determine how the other data in a class should be used, but I would not make the underlying value of InsuranceClass.Life = $653.00 (pseudocode) and use the value itself in calculations. Enums are not constants; doing this creates confusion.

2. Use enums when the enum list is unlikely to change. Enums are great for fundamental concepts but poor for constantly changing ideas. When you create an enumeration, this is a contract with future programmers that you want to avoid breaking.

3. If you must change an enum, have a rule everyone follows: add to the end, not the middle. The alternative is to assign specific values to each member and never change those (see the sketch after this list). The point is that you are unlikely to know how others are using your enumeration's underlying values, and changing them can cause misery for anyone else using your code. This is an order of magnitude more important for any system that persists data.

4. The corollary to #2 and #3 is to never delete a member of an enum. There are specific circles of hell for programmers who do this in a codebase used by others.
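A sketch of rule #3's alternative, pinning explicit values so later additions can't shift existing ones:

    enum InsuranceClass
    {
        Home    = 0,
        Vehicle = 1,
        Life    = 2,
        Health  = 3,
        Pet     = 4   // added later; persisted values of the others are unaffected
    }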
Hopefully that expanded on the answers in a helpful way.