I'm working with the Umbraco CMS which holds lots of data as strings.
Sometimes I need to compare a stored string value (an int stored as a string) to an enum, but is it best to compare them as strings:
if ( stringValue == ( (int) Enum.Option ).ToString() ){
}
Or to parse and compare as ints:
if ( int.Parse(stringValue) == (int) Enum.Option ){
}
Or does it just not matter either way?
You should compare data in its native/canonical form. So use integers. Performance is usually a second-order concern in such cases. Correctness is first.
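For instance, a minimal sketch of that advice (using a hypothetical MyOption enum in place of the question's Enum.Option, and guarding against non-numeric input):
enum MyOption { First = 0, Second = 1 }

// Parse the stored string once, then compare in the enum's native integer form.
string stringValue = "1";
if (int.TryParse(stringValue, out int parsed) && parsed == (int)MyOption.Second)
{
    // values match
}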
Maybe you want to try to use Enum.Parse?
enum MyEnum
{
    Option,
    Option1 = 1,
    Option2 = 2
}

string stringValue = "0";

if ((MyEnum)Enum.Parse(typeof(MyEnum), stringValue) == MyEnum.Option)
{
    // Do what you need
}
Note:
The value parameter contains the string representation of an enumeration member's underlying value or named constant, or a list of named constants delimited by commas (,).
So stringValue can be "Option" or "0".
It's even better if you compare enum values directly.
For the sake of code readability, I'd choose the second approach: it makes clear beyond doubt that your string is expected to contain an integer in that particular context, and you're treating it as such.
The second approach also lets you handle error cases more explicitly (what if your string isn't an integer? The second block would throw, while the first would silently behave as if your data simply differed from the enum).
Also, as already stated, comparing integers is always better performance-wise than comparing strings, but I believe there wouldn't be much real-world difference in this case.
Casting from int to an enum is extremely cheap... it'll be faster than a dictionary lookup. Basically it's a no-op, just copying the bits into a location with a different notional type.
Parsing a string into an enum value will be somewhat slower.
from this SO answer.
If you want to check the validity, you can use
int value;
Option option;
if (int.TryParse(stringValue, out value) &&
    Enum.IsDefined(typeof(Option), value))
{
    option = (Option)value;
}
Related
Let's say, for example, I have two implementations, one with bit flags and one with a simple enum:
1) bit flags
[Flags]
public enum LettersBitFlags
{
    A = 1,
    B = 2,
    C = 4,
    D = 8
}

public static bool IsLetterBBitFlags(LettersBitFlags letters)
{
    return (letters & LettersBitFlags.B) != 0;
}
2) simple enum
public enum Letters
{
    A,
    B,
    C,
    D
}

public static bool IsLetterBParams(params Letters[] letters)
{
    return letters.Any(x => x == Letters.B);
}
They obviously can be used like this:
Console.WriteLine(IsLetterBBitFlags(LettersBitFlags.A | LettersBitFlags.D)); //false
Console.WriteLine(IsLetterBBitFlags(LettersBitFlags.B | LettersBitFlags.C)); //true
Console.WriteLine(IsLetterBParams(Letters.A, Letters.D)); //false
Console.WriteLine(IsLetterBParams(Letters.B, Letters.C)); //true
Is there any practical reason to choose one over the other? As I see it, they give the same result, but maybe performance or something else should make me use one and not the other?
They have different meanings. Flags are characterised by the fact that it's meaningful to OR them together; enums are simply discrete lumps of data, and the fact that they are numerical under the covers is nothing more than an implementation detail.
If you use flags where it's not meaningful to OR two of them together, you will badly mislead anyone who comes to use your data type. Conversely, if you use an enum where you meant a flag, then you'll have to manually capture and name exponentially many enum cases.
Advantages of bit flags:
1) The flags attribute gives you nicer ToString() behaviour than an array of an enum.
2) You can store multiple values in one variable (hence why you don't need an array). This could be important in a memory bound operation.
3) The method Enum.HasFlag exists, so you don't actually have to implement your static method (see the sketch after this list).
Disadvantages:
1) A flag is just on or off - you cannot store multiple instances of the same value, whereas your array could contain the same value more than once.
2) Any = 0 value will have to be checked separately if you want one.
3) In my opinion it's not always as clear what you're doing when using bit flags - there's an extra element of knowledge that you need.
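As a rough sketch of advantage 3 above (reusing the question's LettersBitFlags type), Enum.HasFlag replaces the hand-written bitwise check; note that on older runtimes HasFlag boxes its argument, so the explicit bitwise test can be slightly faster:
var letters = LettersBitFlags.B | LettersBitFlags.C;

// Equivalent checks: the built-in HasFlag call and the explicit bitwise test.
bool viaHasFlag = letters.HasFlag(LettersBitFlags.B);   // true
bool viaBitwise = (letters & LettersBitFlags.B) != 0;   // true

Console.WriteLine(viaHasFlag == viaBitwise);             // True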
The first option is good when you represent data as the bits of a byte or int, where the data may contain one or more of the enum values and you can then check whether a given flag is set.
I want to have a value object that represents length. I would prefer to use a struct given that it is a value type, but since zero length does not make sense I am forced to use a class. Adding two lengths together seems like a reasonable operation, so I want to overload the + operator. I am curious though, how should I handle adding null?
Adding null to an existing string returns a string with the same content as the existing string. Adding null to an int? that has a value returns null.
I can see a case where adding null to an existing length simply returns a new length with the same value as the existing length. At the same time, I can see a case where adding null would be considered a bug. I have been trying to find some guidance but have not been able to find any. Is there a common guideline for this, or is it different for each application?
I would highly recommend using struct for your length, and treating the default representation as zero length.
"since zero length does not make sense I am forced to use a class"
It is up to your code to treat the default representation of the length struct as a representation of some specific length. In addition to treating it as zero length, you have at least two options:
You can treat default length as an unknown, in which case any operation with it would produce an unknown, or
You can treat it as a "trap representation" of length, in which case any operation with it would produce an exception.
It is probably a design mistake not to treat zero uniformly with all other numbers. Specifically, zero length becomes handy when you subtract length values, because subtracting two equal lengths would otherwise have nothing to produce.
As far as "unknown" length is concerned, using struct gives you a convenient standard representation of Nullable<length> immediately familiar to users of your length structure.
Simple Answer:
If you're allowed to add nulls in your system, then you should probably keep the existing value and treat the null like a 0, like so:
public static NullNumber operator +(NullNumber b, NullNumber c)
{
    // Treat a null operand as zero (assumes NullNumber wraps a numeric Value property).
    return new NullNumber((b?.Value ?? 0) + (c?.Value ?? 0));
}
Advanced Answer:
You are probably correct that a length of 0 doesn't make sense, and you are right that adding nulls seems like a bug.
I can't see where the field is populated, but I suspect either:
you don't have a constructor that requires you to pass in a length even though one is required,
or you have a faulty class that sometimes has a length and sometimes doesn't, which sounds closer to two classes.
Strictly speaking, a null length doesn't exist in reality; everything has length. Getting a null return or a NullReferenceException when working with your struct would lead me to think I had messed up the constructor or instantiation. In other words, the null reference would be employed in the scope of the application and not exposed to the client.
MyStruct length = new MyStruct();              // no!
MyStruct length = new MyStruct(feet, inches);  // better...
MyStruct length = 34.5;                        // ok... (requires an implicit conversion)
Good evening!
Which types of values can be directly stored into an Excel worksheet using Range.Value2 and how do I quickly check if a particular value can?
Suppose I have an array of objects, perhaps multityped (e.g. one int, one double and one Foo stored in an object[]).
If I choose a range of width 3 and try to store this array using Range.Value2, this results in an exception (of course, Excel doesn't know what a Foo is).
I came up with the idea of checking each value in the array and, if it's not storable, converting it to its string representation using ToString(). But how do I check whether it's storable in the first place?
It would be horrible to end up doing something like that:
public bool storable<T>(T value)
{
    return value is int ||
           value is uint ||
           value is short ||
           value is byte ||
           ...
           value is string;
}
...especially knowing that each is check performs a runtime type test on the variable and could seriously affect performance.
On the other hand, I can't afford pre-casting each value to the string type as I sometimes want to be able to do graphs and diagrams with numeric values, not strings.
Can you tell me whether I am mistaken, or offer a solution to this problem?
Thank you!
I think you're going to have to do what you're unkeen to do (all the "is" checks), unless you can somehow make your input array a bit more strongly typed. Your best bet might be just to order the checks so that the most common types are tested first.
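A minimal sketch of that idea - the set of types below is an assumption about what round-trips through Range.Value2, not an authoritative list, so adjust it to whatever your interop version actually accepts - using a HashSet lookup instead of a long chain of is checks:
using System;
using System.Collections.Generic;

static class ExcelValueHelper
{
    // Assumed set of CLR types that Excel accepts via Range.Value2.
    private static readonly HashSet<Type> Storable = new HashSet<Type>
    {
        typeof(int), typeof(uint), typeof(short), typeof(byte),
        typeof(long), typeof(float), typeof(double), typeof(decimal),
        typeof(bool), typeof(string), typeof(DateTime)
    };

    // Returns the value unchanged if its runtime type is known to be storable,
    // otherwise falls back to its string representation.
    public static object ToStorable(object value)
    {
        if (value == null) return null;
        return Storable.Contains(value.GetType()) ? value : value.ToString();
    }
}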
I have a dataset. This dataset will serve as a lookup table. Given a number, I should be able to look up a corresponding value for that number.
The dataset (let's say it's CSV) has a few caveats, though. Instead of:
1,ABC
2,XYZ
3,LMN
The numbers are ranges (- being "through", not minus):
1-3,ABC // 1, 2, and 3 = ABC
4-8,XYZ // 4, 5, 6, 7, 8 = XYZ
11-11,LMN // 11 = LMN
All the numbers are signed ints. No ranges overlap with another ranges. There are some gaps; there are ranges that aren't defined in the dataset (like 9 and 10 in the last snippet above).
How might I model this dataset in C# so that I have the most-performant lookup while keeping my in-memory footprint low?
The only option I've come up with suffers from overconsumption of memory. Let's say my dataset is:
1-2,ABC
4-6,XYZ
Then I create a Dictionary<int,string>() whose key/values are:
1/ABC
2/ABC
4/XYZ
5/XYZ
6/XYZ
Now I have hash-lookup performance, but tons of wasted space in the hash table.
Any ideas? Maybe just use PLINQ instead and hope for good performance? ;)
If your dictionary is going to truly store a wide range of key values, an approach that expands all possible ranges into explicit keys will rapidly consume more memory than you likely have available.
Your best option is to use a data structure that supports some variation of binary search (or another O(log N) lookup technique). Here's a link to a generic RangeDictionary for .NET that uses an OrderedList internally and has O(log N) performance.
Achieving constant-time O(1) lookup requires that you expand all ranges into explicit keys. This requires both a lot of memory, and can actually degrade performance when you need to split or insert a new range. This probably isn't what you want.
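A minimal sketch of the binary-search approach (this is just an illustration, not the linked RangeDictionary): keep the ranges sorted by start value and binary-search the starts on each lookup.
using System;
using System.Collections.Generic;
using System.Linq;

class RangeLookup
{
    private readonly int[] starts;
    private readonly (int End, string Value)[] ranges;

    public RangeLookup(IEnumerable<(int Start, int End, string Value)> source)
    {
        var sorted = source.OrderBy(r => r.Start).ToArray();
        starts = sorted.Select(r => r.Start).ToArray();
        ranges = sorted.Select(r => (r.End, r.Value)).ToArray();
    }

    public bool TryGetValue(int key, out string value)
    {
        value = null;
        int i = Array.BinarySearch(starts, key);
        if (i < 0) i = ~i - 1;             // index of the last start <= key
        if (i < 0 || key > ranges[i].End)  // key falls before all ranges or in a gap
            return false;
        value = ranges[i].Value;
        return true;
    }
}

// var lookup = new RangeLookup(new[] { (1, 3, "ABC"), (4, 8, "XYZ"), (11, 11, "LMN") });
// lookup.TryGetValue(5, out var v);   // true, v == "XYZ"
// lookup.TryGetValue(9, out var v2);  // false (9 falls in a gap)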
You can create a doubly-indirected lookup:
Dictionary<int, int> keys = new Dictionary<int, int>();
Dictionary<int, string> values = new Dictionary<int, string>();
Then store the data like this:
keys.Add(1, 1);
keys.Add(2, 1);
keys.Add(3, 1);
//...
keys.Add(11, 3);
values.Add(1, "ABC");
//...
values.Add(3, "LMN");
And then look the data up:
return values[keys[3]]; //returns "ABC"
I'm not sure how much memory footprint this will save with trivial strings, but once you get beyond "ABC" it should help.
EDIT
After Dan Tao's comment below, I went back and checked on what he was asking about. The following code:
var abc = "ABC";
var def = "ABC";
Console.WriteLine(ReferenceEquals(abc, def));
will write "True" to the console. Which means that the either the compiler or the runtime (clarification?) is maintaining the reference to "ABC", and assigns it as the value of both variables.
After reading up some more on Interned strings, if you're using string literals to populate the dictionary, or Interning computed strings, it will in fact take more space to implement my suggestion than the original dictionary would have taken. If you're not using Interned strings, then my solution should take less space.
FINAL EDIT
If you're treating your strings correctly, there should be no excess memory usage from the original Dictionary<int, string> because you can assign them to a variable and then assign that reference as the value (or, if you need to, because you can Intern them)
Just make sure your assignment code includes an intermediate variable assignment:
while (thereAreStringsLeftToAssign)
{
    var theString = theStringToAssign;
    foreach (var i in range)
    {
        strings.Add(i, theString);
    }
}
As arootbeer has mentioned in his answer, the following code does not create multiple instances of the string "ABC"; rather, it interns a single instance and assigns a reference to that instance to each KeyValuePair<int, string> in dictionary:
var dictionary = new Dictionary<int, string>();
dictionary[0] = "ABC";
dictionary[1] = "ABC";
dictionary[2] = "ABC";
// etc.
OK, so in the case of string literals, you're only using one string instance per range of keys. Is there a scenario where this wouldn't be the case--that is, where you would be using a separate string instance for each key within the range (this is what I assume you're concerned about when you speak of "overconsumption of memory")?
Honestly, I don't think so. There are scenarios where multiple equivalent string instances may be created without the benefit of interning, yes. But I can't imagine these scenarios would affect what you're trying to do here.
My reasoning is this: you want to assign certain values to different ranges of keys, right? So any time you are defining a key-range-value pairing of this sort, you have a single value and several keys. The single part is what leads me to doubt that you'll ever have multiple instances of the same string, unless it is defined as the value for more than one range.
To illustrate: yes, the following code will instantiate two identical strings:
string x = "ABC";
Console.Write("Type 'ABC' and press Enter: ");
string y = Console.ReadLine();
Console.WriteLine(Equals(x, y));
Console.WriteLine(ReferenceEquals(x, y));
The above program, assuming the user follows instructions and types "ABC," outputs True, then False. So you might think, "Ah, so when a string is only provided at run-time, it isn't interned! So this could be where my values could be duplicated!"
But... again: I don't think so. It all comes back to the fact that you are going to be assigning a single value to a range of keys. So let's say your values come from user input; then your code would look something like this:
var dictionary = new Dictionary<int, string>();
int start, count;
GetRange(out start, out count);
string value = GetValue();
foreach (int key in Enumerable.Range(start, count))
{
    // Look, you're using the same string instance to assign
    // to each key... how could it be otherwise?
    dictionary[key] = value;
}
Now, if you were actually thinking more along the lines of what LBushkin mentions in his answer--that you may potentially have huge ranges, making it impractical to define a KeyValuePair<int, string> for each key within that range (e.g., if you have a range of 1-1000000)--then I would agree that you're best off with some sort of data structure that bases its lookup on a binary search. If that's more your scenario, say so and I will be happy to offer more ideas on that front. (Or you could just take a look at the link LBushkin already posted.)
Use a balanced ordered tree (or something similar) mapping start-of-range to end-of-range and data. This will be easy to implement for non-overlapping ranges.
arootbeer has a good solution, but one you may find confusing to work with.
Another choice is to wrap the string in a reference type of your own, so that every key points to the same wrapper instance:
class StringContainer
{
    public string Value { get; set; }
}

Dictionary<int, StringContainer> values = new Dictionary<int, StringContainer>();

var value1 = new StringContainer { Value = "ABC" };
values.Add(1, value1);
values.Add(2, value1);
They will both point to the same instance of StringContainer.
EDIT: Thanks for the comments everyone. This method handles value types other than string, so it might be useful for more than the given example. Also, it is my understanding that strings don't always behave in the manner you would expect from reference values, but I could be wrong.
This seems so trivial but I'm not finding an answer with Google.
I'm after a high value for a string for a semaphore at the end of a sorted list of strings.
It seems to me that char.MaxValue.ToString() should do it--but this compares low, not high.
Obviously it's not truly possible to create a highest possible string because it would always be lower than the same thing + more data but the strings I'm sorting are all valid pathnames and thus the symbols used are constrained.
In response to the comments:
In the pre-unicode days in Delphi I would simply have used #255. I simply want a string that will compare higher than any possible pathname. This should be trivial--why isn't it??
Response #2:
It's not the sorting that requires the sentinel, it's the processing afterwards. I have multiple lists that I am sort-of merging (a simplistic merge won't do the job), and either I duplicate code or I have dummy values that always compare high.
A string representation of the highest character will only be one character long.
Why don't you just append it as a semaphore after sorting, rather than trying to make it something that will sort afterwards?
Alternatively, you could specify your own comparator that sorts your token after any other string, and calls the default comparator otherwise.
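A rough sketch of that comparer idea (the sentinel value "<END>" is just an illustrative choice):
using System;
using System.Collections.Generic;

// Sorts a designated sentinel string after every other string,
// delegating to ordinal comparison otherwise.
class SentinelLastComparer : IComparer<string>
{
    private readonly string sentinel;

    public SentinelLastComparer(string sentinel) => this.sentinel = sentinel;

    public int Compare(string x, string y)
    {
        if (x == sentinel && y == sentinel) return 0;
        if (x == sentinel) return 1;    // sentinel sorts after everything else
        if (y == sentinel) return -1;
        return string.CompareOrdinal(x, y);
    }
}

// var list = new List<string> { "b", "<END>", "a" };
// list.Sort(new SentinelLastComparer("<END>"));   // a, b, <END>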
I had the same problem when trying to put null values at the bottom of a list in a LINQ OrderBy() statement. I ended up using...
Char.ConvertFromUtf32(0x10ffff)
...which worked a treat.
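For instance, something along these lines (the Entry type and Path property are hypothetical, and ordinal comparison is used so the substitute string reliably sorts last):
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type holding a pathname that may be null.
class Entry { public string Path { get; set; } }

static class PathSorting
{
    // Sort ascending, pushing null paths to the bottom by substituting a string
    // that compares higher than any real pathname (under ordinal comparison).
    public static IEnumerable<Entry> NullsLast(IEnumerable<Entry> items) =>
        items.OrderBy(e => e.Path ?? Char.ConvertFromUtf32(0x10FFFF),
                      StringComparer.Ordinal);
}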
Something like this?
public static String Highest(this String value)
{
    Char highest = '\0';
    foreach (Char c in value)
    {
        // Math.Max works on ints, so cast the result back to Char
        highest = (Char)Math.Max(c, highest);
    }
    return new String(new Char[] { highest });
}