Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I've seen this a couple of times recently in high profile code, where constant values are defined as variables, named after the value, then used only once. I wondered why it gets done?
E.g. Linux Source (resize.c)
unsigned five = 5;
unsigned seven = 7;
E.g. C#.NET Source (Quaternion.cs)
double zero = 0;
double one = 1;
Naming numbers is terrible practice, one day something will need to change, and you'll end up with unsigned five = 7.
If it has some meaning, give it a meaningful name. The 'magic number' five is no improvement over the magic number 5, it's worse because it might not actually equal 5.
This kind of thing generally arises from some cargo-cult style programming style guidelines where someone heard that "magic numbers are bad" and forbade their use without fully understanding why.
Well named variables
Giving proper names to variables can dramatically clarify code, such as
constant int MAXIMUM_PRESSURE_VALUE=2;
This gives two key advantages:
The value MAXIMUM_PRESSURE_VALUE may be used in many different places, if for whatever reason that value changes you need to change it in only one place.
Where used it immediately shows what the function is doing, for example the following code obviously checks if the pressure is dangerously high:
if (pressure>MAXIMUM_PRESSURE_VALUE){
//without me telling you you can guess there'll be some safety protection in here
}
Poorly named variables
However, everything has a counter argument and what you have shown looks very like a good idea taken so far that it makes no sense. Defining TWO as 2 doesn't add any value
constant int TWO=2;
The value TWO may be used in many different places, perhaps to double things, perhaps to access an index. If in the future you need to change the index you cannot just change to int TWO=3; because that would affect all the other (completely unrelated) ways you've used TWO, now you'd be tripling instead of doubling etc
Where used it gives you no more information than if you just used "2". Compare the following two pieces of code:
if (pressure>2){
//2 might be good, I have no idea what happens here
}
or
if (pressure>TWO){
//TWO means 2, 2 might be good, I still have no idea what happens here
}
Worse still (as seems to be the case here) TWO may not equal 2, if so this is a form of obfuscation where the intention is to make the code less clear: obviously it achieves that.
The usual reason for this is a coding standard which forbids magic numbers but doesn't count TWO as a magic number; which of course it is! 99% of the time you want to use a meaningful variable name but in that 1% of the time using TWO instead of 2 gains you nothing (Sorry, I mean ZERO).
this code is inspired by Java but is intended to be language agnostic
Short version:
A constant five that just holds the number five is pretty useless. Don't go around making these for no reason (sometimes you have to because of syntax or typing rules, though).
The named variables in Quaternion.cs aren't strictly necessary, but you can make the case for the code being significantly more readable with them than without.
The named variables in ext4/resize.c aren't constants at all. They're tersely-named counters. Their names obscure their function a bit, but this code actually does correctly follow the project's specialized coding standards.
What's going on with Quaternion.cs?
This one's pretty easy.
Right after this:
double zero = 0;
double one = 1;
The code does this:
return zero.GetHashCode() ^ one.GetHashCode();
Without the local variables, what does the alternative look like?
return 0.0.GetHashCode() ^ 1.0.GetHashCode(); // doubles, not ints!
What a mess! Readability is definitely on the side of creating the locals here. Moreover, I think explicitly naming the variables indicates "We've thought about this carefully" much more clearly than just writing a single confusing return statement would.
What's going on with resize.c?
In the case of ext4/resize.c, these numbers aren't actually constants at all. If you follow the code, you'll see that they're counters and their values actually change over multiple iterations of a while loop.
Note how they're initialized:
unsigned three = 1;
unsigned five = 5;
unsigned seven = 7;
Three equals one, huh? What's that about?
See, what actually happens is that update_backups passes these variables by reference to the function ext4_list_backups:
/*
* Iterate through the groups which hold BACKUP superblock/GDT copies in an
* ext4 filesystem. The counters should be initialized to 1, 5, and 7 before
* calling this for the first time. In a sparse filesystem it will be the
* sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
* For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
*/
static unsigned ext4_list_backups(struct super_block *sb, unsigned *three,
unsigned *five, unsigned *seven)
They're counters that are preserved over the course of multiple calls. If you look at the function body, you'll see that it's juggling the counters to find the next power of 3, 5, or 7, creating the sequence you see in the comment: 1, 3, 5, 7, 9, 25, 27, &c.
Now, for the weirdest part: the variable three is initialized to 1 because 30 = 1. The power 0 is a special case, though, because it's the only time 3x = 5x = 7x. Try your hand at rewriting ext4_list_backups to work with all three counters initialized to 1 (30, 50, 70) and you'll see how much more cumbersome the code becomes. Sometimes it's easier to just tell the caller to do something funky (initialize the list to 1, 5, 7) in the comments.
So, is five = 5 good coding style?
Is "five" a good name for the thing that the variable five represents in resize.c? In my opinion, it's not a style you should emulate in just any random project you take on. The simple name five doesn't communicate much about the purpose of the variable. If you're working on a web application or rapidly prototyping a video chat client or something and decide to name a variable five, you're probably going to create headaches and annoyance for anyone else who needs to maintain and modify your code.
However, this is one example where generalities about programming don't paint the full picture. Take a look at the kernel's coding style document, particularly the chapter on naming.
GLOBAL variables (to be used only if you really need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should not call it "cntusr()".
...
LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.
If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).
Part of this is C-style coding tradition. Part of it is purposeful social engineering. A lot of kernel code is sensitive stuff, and it's been revised and tested many times. Since Linux is a big open-source project, it's not really hurting for contributions — in most ways, the bigger challenge is checking those contributions for quality.
Calling that variable five instead of something like nextPowerOfFive is a way to discourage contributors from meddling in code they don't understand. It's an attempt to force you to really read the code you're modifying in detail, line by line, before you try to make any changes.
Did the kernel maintainers make the right decision? I can't say. But it's clearly a purposeful move.
My organisation have certain programming guidelines, one of which is the use of magic numbers...
eg:
if (input == 3) //3 what? Elephants?....3 really is the magic number here...
This would be changed to:
#define INPUT_1_VOLTAGE_THRESHOLD 3u
if (input == INPUT_1_VOLTAGE_THRESHOLD) //Not elephants :(
We also have a source file with -200,000 -> 200,000 #defined in the format:
#define MINUS_TWO_ZERO_ZERO_ZERO_ZERO_ZERO -200000
which can be used in place of magic numbers, for example when referencing a specific index of an array.
I imagine this has been done for "Readability".
The numbers 0, 1, ... are integers. Here, the 'named variables' give the integer a different type. It might be more reasonable to specify these constant (const unsigned five = 5;)
I've used something akin to that a couple times to write values to files:
const int32_t zero = 0 ;
fwrite( &zero, sizeof(zero), 1, myfile );
fwrite accepts a const pointer, but if some function needs a non const pointer, you'll end up using a non const variable.
P.S.: That always keeps me wondering what may be the sizeof zero .
How do you come to a conslusion that it is used only once? It is public, it could be used any number of times from any assembly.
public static readonly Quaternion Zero = new Quaternion();
public static readonly Quaternion One = new Quaternion(1.0f, 1.0f, 1.0f, 1.0f);
Same thing applies to .Net framework decimal class. which also exposes public constants like this.
public const decimal One = 1m;
public const decimal Zero = 0m;
Numbers are often given a name when these numbers have special meaning.
For example in the Quaternion case the identity quaternion and unit length quaternion have special meaning and are frequently used in a special context. Namely Quaternion with (0,0,0,1) is an identity quaternion so it's a common practice to define them instead of using magic numbers.
For example
// define as static
static Quaternion Identity = new Quaternion(0,0,0,1);
Quaternion Q1 = Quaternion.Identity;
//or
if ( Q1.Length == Unit ) // not considering floating point error
One of my first programming jobs was on a PDP 11 using Basic. The Basic interpreter allocated memory to every number required, so every time the program mentioned 0, a byte or two would be used to store the number 0. Of course back in those days memory was a lot more limited than today and so it was important to conserve.
Every program in that work place started with:
10 U0%=0
20 U1%=1
That is, for those who have forgotten their Basic:
Line number 10: create an integer variable called U0 and assign it the number 0
Line number 20: create an integer variable called U1 and assign it the number 1
These variables, by local convention, never held any other value, so they were effectively constants. They allowed 0 and 1 to be used throughout the program without wasting any memory.
Aaaaah, the good old days!
some times it's more readable to write:
double pi=3.14; //Constant or even not constant
...
CircleArea=pi*r*r;
instead of:
CircleArea=3.14*r*r;
and may be you would use pi more again (you are not sure but you think it's possible later or in other classes if they are public)
and then if you want to change pi=3.14 into pi=3.141596 it's easier.
and some other like e=2.71, Avogadro and etc.
Is there a generally accepted best approach to coding complex math? For example:
double someNumber = .123 + .456 * Math.Pow(Math.E, .789 * Math.Pow((homeIndex + .22), .012));
Is this a point where hard-coding the numbers is okay? Or should each number have a constant associated with it? Or is there even another way, like storing the calculations in config and invoking them somehow?
There will be a lot of code like this, and I'm trying to keep it maintainable.
Note: The example shown above is just one line. There would be tens or hundreds of these lines of code. And not only could the numbers change, but the formula could as well.
Generally, there are two kinds of constants - ones with the meaning to the implementation, and ones with the meaning to the business logic.
It is OK to hard-code the constants of the first kind: they are private to understanding your algorithm. For example, if you are using a ternary search and need to divide the interval in three parts, dividing by a hard-coded 3 is the right approach.
Constants with the meaning outside the code of your program, on the other hand, should not be hard-coded: giving them explicit names gives someone who maintains your code after you leave the company non-zero chances of making correct modifications without having to rewrite things from scratch or e-mailing you for help.
"Is it okay"? Sure. As far as I know, there's no paramilitary police force rounding up those who sin against the one true faith of programming. (Yet.).
Is it wise?
Well, there are all sorts of ways of deciding that - performance, scalability, extensibility, maintainability etc.
On the maintainability scale, this is pure evil. It make extensibility very hard; performance and scalability are probably not a huge concern.
If you left behind a single method with loads of lines similar to the above, your successor would have no chance maintaining the code. He'd be right to recommend a rewrite.
If you broke it down like
public float calculateTax(person)
float taxFreeAmount = calcTaxFreeAmount(person)
float taxableAmount = calcTaxableAmount(person, taxFreeAmount)
float taxAmount = calcTaxAmount(person, taxableAmount)
return taxAmount
end
and each of the inner methods is a few lines long, but you left some hardcoded values in there - well, not brilliant, but not terrible.
However, if some of those hardcoded values are likely to change over time (like the tax rate), leaving them as hardcoded values is not okay. It's awful.
The best advice I can give is:
Spend an afternoon with Resharper, and use its automatic refactoring tools.
Assume the guy picking this up from you is an axe-wielding maniac who knows where you live.
I usually ask myself whether I can maintain and fix the code at 3 AM being sleep deprived six months after writing the code. It has served me well. Looking at your formula, I'm not sure I can.
Ages ago I worked in the insurance industry. Some of my colleagues were tasked to convert the actuarial formulas into code, first FORTRAN and later C. Mathematical and programming skills varied from colleague to colleague. What I learned was the following reviewing their code:
document the actual formula in code; without it, years later you'll have trouble remember the actual formula. External documentation goes missing, become dated or simply may not be accessible.
break the formula into discrete components that can be documented, reused and tested.
use constants to document equations; magic numbers have very little context and often require existing knowledge for other developers to understand.
rely on the compiler to optimize code where possible. A good compiler will inline methods, reduce duplication and optimize the code for the particular architecture. In some cases it may duplicate portions of the formula for better performance.
That said, there are times where hard coding just simplify things, especially if those values are well understood within a particular context. For example, dividing (or multiplying) something by 100 or 1000 because you're converting a value to dollars. Another one is to multiply something by 3600 when you'd like to convert hours to seconds. Their meaning is often implied from the greater context. The following doesn't say much about magic number 100:
public static double a(double b, double c)
{
return (b - c) * 100;
}
but the following may give you a better hint:
public static double calculateAmountInCents(double amountDue, double amountPaid)
{
return (amountDue - amountPaid) * 100;
}
As the above comment states, this is far from complex.
You can however store the Magic numbers in constants/app.config values, so as to make it easier for the next developer to maitain your code.
When storing such constants, make sure to explain to the next developer (read yourself in 1 month) what your thoughts were, and what they need to keep in mind.
Also ewxplain what the actual calculation is for and what it is doing.
Do not leave in-line like this.
Constant so you can reuse, easily find, easily change and provides for better maintaining when someone comes looking at your code for the first time.
You can do a config if it can/should be customized. What is the impact of a customer altering the value(s)? Sometimes it is best to not give them that option. They could change it on their own then blame you when things don't work. Then again, maybe they have it in flux more often than your release schedules.
Its worth noting that the C# compiler (or is it the CLR) will automatically inline 1 line methods so if you can extract certain formulas into one liners you can just extract them as methods without any performance loss.
EDIT:
Constants and such more or less depends on the team and the quantity of use. Obviously if you're using the same hard-coded number more than once, constant it. However if you're writing a formula that its likely only you will ever edit (small team) then hard coding the values is fine. It all depends on your teams views on documentation and maintenance.
If the calculation in your line explains something for the next developer then you can leave it, otherwise its better to have calculated constant value in your code or configuration files.
I found one line in production code which was like:
int interval = 1 * 60 * 60 * 1000;
Without any comment, it wasn't hard that the original developer meant 1 hour in milliseconds, rather than seeing a value of 3600000.
IMO May be leaving out calculations is better for scenarios like that.
Names can be added for documentation purposes. The amount of documentation needed depends largely on the purpose.
Consider following code:
float e = m * 8.98755179e16;
And contrast it with the following one:
const float c = 299792458;
float e = m * c * c;
Even though the variable names are not very 'descriptive' in the latter you'll have much better idea what the code is doing the the first one - arguably there is no need to rename the c to speedOfLight, m to mass and e to energy as the names are explanatory in their domains.
const float speedOfLight = 299792458;
float energy = mass * speedOfLight * speedOfLight;
I would argue that the second code is the clearest one - especially if programmer can expect to find STR in the code (LHC simulator or something similar). To sum up - you need to find an optimal point. The more verbose code the more context you provide - which might both help to understand the meaning (what is e and c vs. we do something with mass and speed of light) and obscure the big picture (we square c and multiply by m vs. need of scanning whole line to get equation).
Most constants have some deeper meening and/or established notation so I would consider at least naming it by the convention (c for speed of light, R for gas constant, sPerH for seconds in hour). If notation is not clear the longer names should be used (sPerH in class named Date or Time is probably fine while it is not in Paginator). The really obvious constants could be hardcoded (say - division by 2 in calculating new array length in merge sort).
From this post (and not only) we got to the point that the ++ operator cannot be applied on expressions returning value.
And it's really obvious that 5++ is better to write as 5 + 1. I just want to summarize the whole thing around the increment/decrement operator. So let's go through these snippets of code that could be helpful to somebody stuck with the ++ first time at least.
// Literal
int x = 0++; // Error
// Constant
const int Y = 1;
double f = Y++; // error. makes sense, constants are not variables actually.
int z = AddFoo()++; // Error
Summary: ++ works for variables, properties (through a synthetic sugar) and indexers(the same).
Now the interest part - any literal expressions are optimized in CSC and, hence when we write, say
int g = 5 + 1; // This is compiled to 6 in IL as one could expect.
IL_0001: ldc.i4.6 // Pushes the integer value of 6 onto the evaluation stack as an int32.
For 5++ doesn't mean 5 becomes 6, it could be a shorthand for 5 + 1, like for x++ = x + 1
What's the real reason behind this restriction?
int p = Foo()++ //? yes you increase the return value of Foo() with 1, what's wrong with that?
Examples of code that can lead to logical issues are appreciated.
One of real-life example could be, perform one more actions than in the array.
for (int i = 0; i < GetCount()++; i++) { }
Maybe the lack of usage opts compiler teams to avoid similar features?
I don't insist this is a feature we lack of, just want to understand the dark side of this for compiler writers perhaps, though I'm not. But I know c++ allows this when returning a reference in the method. I'm neither a c++ guy(very poor knowledge) just want to get the real gist of the restriction.
Like, is it just because c# guys opted to restrict the ++ over value expressions or there are definite cases leading to unpredictable results?
In order for a feature to be worth supporting, it really needs to be useful. The code you've presented is in every case less readable than the alternative, which is just to use the normal binary addition operator. For example:
for (int i = 0; i < GetCount() + 1; i++) { }
I'm all in favour of the language team preventing you from writing unreadable code when in every case where you could do it, there's a simpler alternative.
Well before using these operators you should try to read up on how they do what they do. In particular you should understand the difference between postfix and prefix, which could help figure out what is and isn't allowed.
The ++ and -- operators modify their operands. Which means that the operand must be modifiable. If you can assign a value to the expression in question then it is modifiable, and is probably a variable(c#).
Taking a look at what these operators actually do. The postfix operators should increment after your line of code executes. As for the prefix operators, well they would need to have access to the value before the method had even been called yet. The way I read the syntax is ++lvalue (or ++variable) converting to memory operations:[read, write, read] or for lvalue++ [read, read, write] Though many compilers probably optimize secondary reads.
So looking at foo()++; the value is going to be plopped dead in the center of executing code. Which would mean the compiler would need to save the value somewhere more long-term in order for operations to be performed on said value, after the line of code has finished executing. Which is no doubt the exact reason C++ does not support this syntax either.
If you were to be returning a reference the compiler wouldn't have any trouble with the postfix. Of course in C# value types (ie. int, char, float, etc) cannot be passed by reference as they are value types.
I will have literally tens of millions of instances of some class MyClass and want to minimize its memory size. The question of measuring how much space an object takes in the memory was discussed in Find out the size of a .net object
I decided to follow Jon Skeet's suggestion, and this is my code:
// Edit: This line is "dangerous and foolish" :-)
// (However, commenting it does not change the result)
// [StructLayout(LayoutKind.Sequential, Pack = 1)]
public class MyClass
{
public bool isit;
public MyClass nextRight;
public MyClass nextDown;
}
class Program
{
static void Main(string[] args)
{
var a1 = new MyClass(); //to prevent JIT code mangling the result (Skeet)
var before = GC.GetTotalMemory(true);
MyClass[] arr = new MyClass[10000];
for (int i = 0; i < 10000; i++)
arr[i] = new MyClass();
var after = GC.GetTotalMemory(true);
var per = (after - before) / 10000.0;
Console.WriteLine("Before: {0} After: {1} Per: {2}", before, after, per);
Console.ReadLine();
}
}
I run the program on 64 bit Windows, Choose "release", platform target: "any cpu", and choose "optimize code" (The options only matter if I explicitly target x86) The result is, sadly, 48 bytes per instance.
My calculation would be 8 bytes per reference, plus 1 byte for bool plus some ~8byte overhead. What is going on? Is this a conspiracy to keep RAM prices high and/or let non-Microsoft code bloat? Well, ok, I guess my real question is: what am I doing wrong, or how can I minimize the size of MyClass?
Edit: I apologize for being sloppy in my question, I edited a couple of identifier names. My concrete and immediate concern was to build a "2-dim linked-list" as a sparse boolean matrice implementation, where I can get an enumeration of set values in a given row/column easily. [Of course that means I have to also store the x,y coordinates on the class, which makes my idea even less feasible]
Approach the problem from the other end. Rather than asking yourself "how can I make this data structure smaller and still have tens of millions of them allocated?" ask yourself "how can I represent this data using a completely different data structure that is far more compact?"
It looks like you are building a doubly-linked list of bools, which, as you note, uses thirty to fifty times more memory than it needs to. Is there some reason why you're not simply using a BitArray to store your list of bools?
UPDATE:
in fact I was trying to implement a sparse boolean two-dimensional matrix
Well why didn't you say so in the first place?
When I want to make a sparse Boolean two-d matrix of enormous size, I build an immutable persistent boolean quadtree with a memoized factory. If the array is sparse, or even if it is dense but self-similar in some way, you can achieve enormous compressions. Square arrays of 264 x 264 Booleans are easily representable even though obviously as a real array, that would be more memory than exists in the world.
I have been toying with the idea of doing a series of blog articles on this technique; I will likely do so in late March. (UPDATE: I did not write that article in March 2012; I wrote it in August 2020. https://ericlippert.com/2020/08/17/life-part-32/)
Briefly, the idea is to make an abstract class Quad that has two subclasses: Single, and Multi. "Single" is a doubleton -- like a singleton, but with exactly two instances, called True and False. A Multi is a Quad that has four sub-quads, called NorthEast, SouthEast, SouthWest and NorthWest.
Each Quad has an integer "level"; the level of a Single is zero, and a multi of level n is required to have all of its children be Quads of level n-1.
The Multi factory is memoized; when you ask it to make a new Multi with four children, it consults a cache to see if it has made it before. If it has, it does not construct a new one; it hands out the old one. Since Quads are immutable, you do not have to worry about someone changing the Quad on you after it is in the cache.
Consider now how many memory words (a word is 4 or 8 bytes depending on architecture) an "all false" Multi of level n consumes. A level 1 "all false" multi consumes four words for the links to its children, a word for the level count (if necessary; you are not required to keep the level in the multi, though it helps for debugging) and a couple words for the sync block and so on. Let's call it eight words. (Plus the memory for the False Single quad, which we can assume is a constant two or three words, and thereby may be ignored.)
A level 2 "all false" multi consumes the same eight words, but each of its four children is the same level 1 multi. Therefore the total consumption of the level 2 "all false" multi is let's say 16 words.
The same for the level 3, 4,... and so on. The total memory consumption for a level 64 multi that is logically a 264 x 264 square array of Booleans is only 64 x 16 memory words!
Make sense? Hopefully that is enough of a sketch to get you going. If not, see my blog link above.
8 (object reference) + 8 (object reference) + 1 (bool) + 16 (header) + 8 (reference in array itself) = 41
Even if it's misaligned internally, each will be aligned on the heap. So we're looking at least 48bytes.
I can't for the life of me see why you'd want a linked list of bools though. A list of them would take 48times less space, and that's before you get to optimisations of storing a bool per bit which would make it 384 times smaller. And easier to manipulate.
If these hundreds of millions of instances of the class are mostly copies of the class with minor variations in class property values, then your system is a prime candidate to use what is called the Flyweight pattern. This pattern minimizes memory use by using the same instanes over and over, and just changing the properties as needed...
I was reading a couple of threads on here about structs and there is/was one about structs and how they should be representing immutable values (eg like a digit - 1) because of their value type behaviour/semantics.
But on the other hand, structs represent things like phone numbers, which can change for the same household.
Is this a hard and fast rule?
A phone number does not change; you just get a different one and discard the old one. The old one is still the same it always was. Same with dates, numbers, etc. - think of this when approaching structs. They are a way to encapsulate a value - which simply is; not the usage of the value, which changes.
Yes, structs should always be immutable! Mutable structs can cause terrible headaches as their usage can create very strange behavoir.
Yes, structs should almost always be immutable. For example, in your phone number case, the phone number itself doesn't mutate: what happens is that the household is allocated a new phone number. The phone number 555-555-1234 is still the phone number 555-555-1234, but the household's phone number is the different number 555-555-5678.
Note that you can find violations of this guideline in the .NET Framework. For example, the WPF Point and Size structs are mutable. This is not a good practice to follow, as one finds out when one tries to write something.Location.X = newX.
Immutable value != Immutable variable
What I means is, even if your variable contains a value that can't be changed, you can still change your variable to have different contents. int x = 5; x++; is legal. 5++; is not.
If your struct contains an integer, and you assign a new value to that integer (e.g. myStruct.MyInt++. you might think you are changing the value of MyInt. Really, you're storing a new value that's one greater than the old value.
Why does it matter? Because there might be another thread accessing myStruct.MyInt concurrently, and you don't want the value it's working with to suddenly change int the middle of being used.