What is the more common and correct way to define constants? What is the cost, in terms of compilation, linking, etc., of defining constants with #define? Is there another, less expensive way?
The best way to define any const is to write
const int m = 7;
const float pi = 3.1415926f;
const char x = 'F';
Using #define is bad C++ style. It is impossible to confine a #define to a namespace scope.
Compare
#define pi 3.1415926
with
namespace myscope {
const float pi = 3.1415926f;
}
The second way is obviously better.
The compiler itself never sees a #define; the preprocessor expands all macros before they're passed to the compiler. One of the side effects, though, is that the values are repeated, and two identical string literals are not necessarily the exact same string. If you write
#define SOME_STRING "Just an example"
it's perfectly legal for the compiler to add a copy of the string to the output file each time it sees the string. A good compiler will probably eliminate duplicate literals, but that's extra work it has to do. If you use a const instead, the compiler doesn't have to worry about that as much.
The cost is only to the preprocessor, when #defines are resolved (ignoring the additional debugging cost of dealing with a project full of #defines for constants, of course).
#define macros are processed by the preprocessor; they are not visible to the compiler. And since a macro is not visible to the compiler as a symbol, it is hard to debug anything that involves one.
The preferred way of defining constants is using the const keyword along with proper type information.
const unsigned int ArraySize = 100;
Even better is
static const unsigned int ArraySize = 100;
when the constant is used only in a single file.
#define will increase compilation time, but it can be faster in execution.
#define is generally used for conditional compilation,
whereas const is used for ordinary computation with numbers.
The choice depends on your requirements.
#define is textual replacement, so if you make a mistake in a macro it will only show up as an error later, at the point of use. Incorrect types and malformed expressions are the most common problems.
For conditional compilation, preprocessor macros work best. For other constants that are to be used in computation, const works well.
CPU time isn't really the cost of using #define or macros. The "cost" as a developer is as follows:
If there is an error in your macro, the compiler will flag it where you referenced the macro, not where you defined it.
You will lose type safety and scoping for your macro.
Debugging tools will not know the value of the macro.
These things may not burn CPU cycles, but they can burn developer cycles.
For constants, declaring const variables is preferable, and for little type-independent functions, inline functions and templates are preferable.
I created a DLL in Visual C++ 2012, and when I used
Dumpbin /Exports filename
the name of the function inside the DLL file had an equals sign in it. I had to use Common Language Runtime Support (/clr) because I used a DLL from C#. Is this why the name of the function would show up with an equals sign? My header file:
#ifdef ColorDLL_EXPORTS
#define ColorDLL_API __declspec(dllexport)
#else
#define ColorDLL_API __declspec(dllimport)
#endif
extern "C"{
ColorDLL_API int ColorSelect(int i);
}
ColorDLL.cpp
#include "stdafx.h"
#include "ColorDLL.h"
#using <ColorDiologeClass.dll>
extern "C"{
ColorDLL_API int ColorSelect(int i){
ColorDiologeClass::Class1::ColorReturn(1);
return 1;
}
}
When I used Dumpbin the name showed up as this:
Name
ColorSelect = _ColorSelect
Why is this? I am expecting it to show up as ColorSelect, not ColorSelect = _ColorSelect. And if I were to leave it this way, how would I call this function from a program like JMP where it needs the exact function name? Would it be ColorSelect? Or would it be ColorSelect = _ColorSelect?
The name is "mangled" - the return type and the parameters are enchoded into the name of the function. Should you wish to NOT have that, you would use extern "C" before the function name (or around a block of functions).
That would be name mangling, which is the under-the-covers feature of C++ that allows it to support function overloading (since it incorporates the argument types of the function into its name).
Here's another question that goes into greater detail.
Microsoft calls this "decorating" instead of mangling. They include a command line tool named "undname" that will produce the original name from the decorated name:
C:\>undname ?ColorSelect@@YAHXZ
Microsoft (R) C++ Name Undecorator
Copyright (C) Microsoft Corporation. All rights reserved.
Undecoration of :- "?ColorSelect##YAHXZ"
is :- "int __cdecl ColorSelect(void)"
If you want to do the same in your own code, you can do that too, using UnDecorateSymbolName.
For what it's worth, decorating/mangling supports not only overloading, but typesafe linking. Typesafe linking stems from function overloading though it isn't really function overloading in itself.
Specifically, typesafe linking deals with (for example) C++ code that has overloads of, say, sqrt for float, double, long double, and probably complex as well, but links against a C library that provides a double sqrt(double) and none of the other overloads. In this case, we typically want that function to be used when the right arguments are passed, but not otherwise.
This can (or could) arise even without function overloading being involved. For example, in pure C you could do something like this:
#include <stdio.h>
extern int sqrt(int);
// ...
printf("%d", sqrt(100));
Now, we've told the compiler we're using a version of sqrt that takes (and returns) an int. Unfortunately, the linker doesn't realize that, so it still links with the sqrt in the standard library that takes and returns double. As a result, the code above will print some thoroughly useless result (typically 0, not that it matters a lot).
Typesafe linkage prevents that -- even though it isn't exactly function overloading, we still have two functions with the same name, but different types by the time we're linking. By encoding the parameter type(s) into the name, the linker can keep this sorted out just as well as the compiler can.
The same can (and frequently does) arise in C when we have name collisions between different libraries. With a traditional C compiler, straightening out this sort of mess can be extremely difficult (at best). With a C++ compiler, unless the two libraries use not only the same names, but identical number and types of parameters, it's never a problem at all.
After seeing how double.NaN == double.NaN is always false in C#, I became curious how the equality was implemented under the hood. So I used Resharper to decompile the Double struct, and here is what I found:
public struct Double : IComparable, IFormattable, IConvertible, IComparable<double>, IEquatable<double>
{
// stuff removed...
public const double NaN = double.NaN;
// more stuff removed...
}
This seems to indicate that the struct Double declares a constant that is defined in terms of this special lowercase double, though I'd always thought that the two were completely synonymous. What's more, if I Go To Implementation on the lowercase double, Resharper simply scrolls me to the declaration at the top of the file. Similarly, jumping to the implementation of the lowercase NaN just takes me to the constant declaration earlier in the line!
So I'm trying to understand this seemingly recursive definition. Is this just an artefact of the decompiler? Perhaps a limitation in Resharper? Or is this lowercase double actually a different beast altogether - representing something at a lower level from the CLR/CTS?
Where does NaN really come from?
Beware looking at decompiled code, especially if it is for something inbuilt. The actual IL here (for .NET 4.5, at least) is:
.field public static literal float64 NaN = float64(NaN)
{
.custom instance void __DynamicallyInvokableAttribute::.ctor()
}
i.e. this is handled directly in IL via the NaN token.
However, because it is a const (literal in IL), it will get "burned into" the call site; anywhere else that uses double.NaN will also be using float64(NaN). Similarly, for example, if I do:
const int I = 2;
int i = I;
int j = 2;
both of these assignments will look identical in the final IL (they will both be ldc.i4.2).
Because of this, most decompilers will recognise the IL pattern NaN and represent it with the language's equivalent of double.NaN. But that doesn't mean that the code is itself recursive; they probably just don't have a check for "but is it double.NaN itself?". Ultimately, this is simply a special case, where float64(NaN) is a recognised value in IL.
Incidentally, reflector decompiles it as:
[__DynamicallyInvokable]
public const double NaN = (double) 1.0 / (double) 0.0;
That again doesn't mean that this is the truth :p Merely that it is something which may have the same end result.
By far the best source you can get for .NET assemblies is the actual source code that was used to build them. It beats any decompiler for accuracy, and the comments can be quite useful as well. Download the Reference Source.
You'll then also see that Double.NaN isn't defined in IL as Marc assumed; it's actually in a C# source code file. The net/clr/bcl/system/double.cs source file shows the real declaration:
public const double NaN = (double)0.0 / (double)0.0;
Which takes advantage of the C# compiler evaluating constant expressions at compile time. Or to put it tongue-in-cheek, NaN is defined by the C++ compiler since that's the language that was used to write the C# compiler ;)
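As a small aside illustrating the observation that started this question, here is a sketch (assuming only the standard System.Double API) of how NaN behaves in comparisons: == follows IEEE 754 rules and is always false against NaN, while Equals and IsNaN treat NaN specially.
using System;

class NaNDemo
{
    static void Main()
    {
        Console.WriteLine(double.NaN == double.NaN);      // False: == follows IEEE 754 rules
        Console.WriteLine(double.NaN.Equals(double.NaN)); // True: Equals special-cases NaN
        Console.WriteLine(double.IsNaN(0.0 / 0.0));       // True: the same constant expression as in the source
    }
}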
In C++ we can do this:
struct {
#if defined (BIGENDIAN)
uint32_t h;
uint32_t l;
#else
uint32_t l;
uint32_t h;
#endif
} dw;
Now, in C# it's not so simple. I have a method to test for big-endianness, but how can we get the same effect of defining the struct at compile time in C#? I was thinking that I could have classes like "BoardBig" and "BoardLittle" and use a factory to get the class I need based on the IsBigEndian check. And for _WIN64 checks, I could have classes like "Position_64" and "Position_32", something like that. Is this a good approach? Since C# cannot define statements like #define IsBigEndian 1, I'm not sure what to do.
Update: And as other posters have pointed out (upvoted), this is not a solution for endianness in C#.
C# has conditional compilation directives:
#if BIGENDIAN
uint h;
uint l;
#else
uint l;
uint h;
#endif
BTW, you should avoid these if you can. Makes code harder to test.
Since you cannot "memory-map" the C# structures to raw data, there is no real advantage is using preprocessor for this purpose. So while C# does have preprocessor features that can be used for other purposes, I don't think they will be valuable to you here.
Instead, just work with one preferred structure and bury the low-level bit-twiddling for the special cases. Here is an example of big-endian and little-endian handling for a structure:
Marshalling a big-endian byte collection into a struct in order to pull out values
There is conditional compilation in C#, but you can't use it to get different code depending on the endianness. For managed languages the endianness of the system is not known at compile time.
The compiler produces IL code, which can be executed both on big-endian and little-endian systems. It's the JIT compiler that takes care of turning the IL code into native machine code, and turning numeric literals into the correct format.
You can use BitConverter.IsLittleEndian to find out the endianness at runtime.
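For instance, here is a minimal sketch of that runtime approach (the helper name and buffer layout are just for illustration, not from the question):
using System;

static class EndianHelper
{
    // Reads a 32-bit value stored in big-endian byte order,
    // regardless of the endianness of the machine this runs on.
    public static uint ReadBigEndianUInt32(byte[] buffer, int offset)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Reverse the four bytes into a temporary buffer before converting.
            byte[] tmp = { buffer[offset + 3], buffer[offset + 2],
                           buffer[offset + 1], buffer[offset] };
            return BitConverter.ToUInt32(tmp, 0);
        }
        return BitConverter.ToUInt32(buffer, offset);
    }
}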
Why does LayoutKind.Sequential work differently if a struct contains a DateTime field?
Consider the following code (a console app which must be compiled with "unsafe" enabled):
using System;
using System.Runtime.InteropServices;
namespace ConsoleApplication3
{
static class Program
{
static void Main()
{
Inner test = new Inner();
unsafe
{
Console.WriteLine("Address of struct = " + ((int)&test).ToString("X"));
Console.WriteLine("Address of First = " + ((int)&test.First).ToString("X"));
Console.WriteLine("Address of NotFirst = " + ((int)&test.NotFirst).ToString("X"));
}
}
}
[StructLayout(LayoutKind.Sequential)]
public struct Inner
{
public byte First;
public double NotFirst;
public DateTime WTF;
}
}
Now if I run the code above, I get output similar to the following:
Address of struct = 40F2CC
Address of First = 40F2D4
Address of NotFirst = 40F2CC
Note that the address of First is NOT the same as the address of the struct; however, the address of NotFirst is the same as the address of the struct.
Now comment out the "DateTime WTF" field in the struct, and run it again.
This time, I get output similar to this:
Address of struct = 15F2E0
Address of First = 15F2E0
Address of NotFirst = 15F2E8
Now "First" does have the same address as the struct.
I find this behaviour surprising given the use of LayoutKind.Sequential. Can anyone provide an explanation? Does this behaviour have any ramifications when doing interop with C/C++ structs that use the Com DATETIME type?
[EDIT] NOTE: I have verified that when you use Marshal.StructureToPtr() to marshal the struct, the data is marshalled in the correct order, with the "First" field being first. This seems to suggest that it will work fine with interop. The mystery is why the internal layout changes - but of course, the internal layout is never specified, so the compiler can do what it likes.
[EDIT2] Removed "unsafe" from struct declaration (it was leftover from some testing I was doing).
[EDIT3] The original source for this question was from the MSDN C# forums:
http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/fb84bf1d-d9b3-4e91-823e-988257504b30
Why does LayoutKind.Sequential work differently if a struct contains a DateTime field?
It is related to the (surprising) fact that DateTime itself has layout "Auto" (link to SO question by myself). This code reproduces the behavior you saw:
static class Program
{
static unsafe void Main()
{
Console.WriteLine("64-bit: {0}", Environment.Is64BitProcess);
Console.WriteLine("Layout of OneField: {0}", typeof(OneField).StructLayoutAttribute.Value);
Console.WriteLine("Layout of Composite: {0}", typeof(Composite).StructLayoutAttribute.Value);
Console.WriteLine("Size of Composite: {0}", sizeof(Composite));
var local = default(Composite);
Console.WriteLine("L: {0:X}", (long)(&(local.L)));
Console.WriteLine("M: {0:X}", (long)(&(local.M)));
Console.WriteLine("N: {0:X}", (long)(&(local.N)));
}
}
[StructLayout(LayoutKind.Auto)] // also try removing this attribute
struct OneField
{
public long X;
}
struct Composite // has layout Sequential
{
public byte L;
public double M;
public OneField N;
}
Sample output:
64-bit: True
Layout of OneField: Auto
Layout of Composite: Sequential
Size of Composite: 24
L: 48F050
M: 48F048
N: 48F058
If we remove the attribute from OneField, things behave as expected. Example:
64-bit: True
Layout of OneField: Sequential
Layout of Composite: Sequential
Size of Composite: 24
L: 48F048
M: 48F050
N: 48F058
These examples are with x64 platform compilation (so the size 24, three times eight, is unsurprising), but also with x86 we see the same "disordered" pointer addresses.
So I guess I can conclude that the layout of OneField (resp. DateTime in your example) has influence on the layout of the struct containing a OneField member even if that composite struct itself has layout Sequential. I am not sure if this is problematic (or even required).
According to a comment by Hans Passant in the other thread, the runtime no longer makes an attempt to keep the layout sequential when one of the members is an Auto-layout struct.
Go read the specification for layout rules more carefully. Layout rules only govern the layout when the object is exposed in unmanaged memory. This means that the compiler is free to place the fields however it wants until the object is actually exported. Somewhat to my surprise, this is even true for FixedLayout!
Ian Ringrose is right about compiler efficiency issues, and that does account for the final layout that is being selected here, but it has nothing to do with why the compiler is ignoring your layout specification.
A couple of people have pointed out that DateTime has Auto layout. That is the ultimate source of your surprise, but the reason is a bit obscure. The documentation for Auto layout says that "objects defined with [Auto] layout cannot be exposed outside of managed code. Attempting to do so generates an exception." Also note that DateTime is a value type. By incorporating a value type having Auto layout into your structure, you inadvertently promised that you would never expose the containing structure to unmanaged code (because doing so would expose the DateTime, and that would generate an exception). Since the layout rules only govern objects in unmanaged memory, and your object can never be exposed to unmanaged memory, the compiler is not constrained in its choice of layout and is free to do whatever it wants. In this case it is reverting to the Auto layout policy in order to achieve better structure packing and alignment.
There! Wasn't that obvious!
All of this, by the way, is recognizable at static compile time. In fact, the compiler is recognizing it in order to decide that it can ignore your layout directive. Having recognized it, a warning here from the compiler would seem to be in order. You haven't actually done anything wrong, but it's helpful to be told when you've written something that has no effect.
The various comments here recommending Fixed layout are generally good advice, but in this case that wouldn't necessarily have any effect, because including the DateTime field exempted the compiler from honoring layout at all. Worse: the compiler isn't required to honor layout, but it is free to honor layout. Which means that successive versions of CLR are free to behave differently on this.
The treatment of layout, in my view, is a design flaw in CLI. When the user specifies a layout, the compiler shouldn't go lawyering around them. Better to keep things simple and have the compiler do what it is told. Especially so where layout is concerned. "Clever", as we all know, is a four letter word.
A few factors:
doubles are a lot faster if they are aligned
CPU caches may work better if there are no “holes” in the struct
So the C# compiler has a few undocumented rules it uses to try to get the “best” layout of structs; these rules may take into account the total size of a struct, and/or whether it contains another struct, etc. If you need to know the layout of a struct, then you should specify it yourself rather than letting the compiler decide.
However, LayoutKind.Sequential does stop the compiler from changing the order of the fields.
To answer my own questions (as advised):
Question: "Does this behaviour have any ramifications when doing interop with C/C++ structs that use the Com DATETIME type?"
Answer: No, because the layout is respected when using Marshalling. (I verified this empirically.)
Question "Can anyone provide an explanation?".
Answer: I'm still not sure about this, but since the internal representation of a struct is not defined, the compiler can do what it likes.
You're checking the addresses as they are within the managed structure. Marshal attributes make no guarantees about the arrangement of fields within managed structures.
The reason it marshals correctly into native structures is that the data is copied into native memory according to the marshaling attributes you set.
So the arrangement of the managed structure has no impact on the arrangement of the native structure. Only the attributes affect the arrangement of the native structure.
If fields set up with marshal attributes were stored in managed data the same way as native data, then there would be no point in Marshal.StructureToPtr; you'd simply byte-copy the data over.
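As a rough illustration of that distinction, here is a sketch using a simplified, hypothetical struct (without the DateTime field, since an Auto-layout member cannot be marshaled): Marshal.OffsetOf reports the unmanaged offsets, which follow the declared order regardless of what the CLR chose for the managed layout.
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Sample
{
    public byte First;
    public double NotFirst;
}

static class OffsetDemo
{
    static void Main()
    {
        // Offsets in the unmanaged (marshaled) representation follow declaration order.
        Console.WriteLine(Marshal.OffsetOf(typeof(Sample), "First"));    // 0
        Console.WriteLine(Marshal.OffsetOf(typeof(Sample), "NotFirst")); // 8 (double aligned to 8)
        Console.WriteLine(Marshal.SizeOf(typeof(Sample)));               // 16
    }
}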
If you're going to interop with C/C++, I would always be specific with the StructLayout. Instead of Sequential, I would go with Explicit, and specify each position with FieldOffset. In addition, add your Pack variable.
[StructLayout(LayoutKind.Explicit, Pack=1, CharSet=CharSet.Unicode)]
public struct Inner
{
[FieldOffset(0)]
public byte First;
[FieldOffset(1)]
public double NotFirst;
[FieldOffset(9)]
public DateTime WTF;
}
It sounds like DateTime can't be Marshaled anyhow, only to a string (bingle Marshal DateTime).
The Pack variable is especially important in C++ code that might be compiled on different systems that have different word sizes.
I would also ignore the addresses that can be seen when using unsafe code. It doesn't really matter what the compiler does as long as the Marshaling is correct.
Having a separate helper assembly containing only P/Invoke declarations for legacy 3rd party components, I wonder which of these two ways is The Better One™ if the assembly must be marked CLS compliant:
Use Int32 in a public P/Invoke declaration where the unmanaged declaration has unsigned int.
Use UInt32 in an internal P/Invoke declaration where the unmanaged declaration has unsigned int, and wrap it in a public method that takes an Int32 and converts it to UInt32 when calling the internal method.
What are the up- and downsides of these?
The P/Invoke marshaller isn't going to complain when the uint gets too big; you'll just end up with a negative int. The extra layer does allow you to use the checked keyword to generate an OverflowException, which is fairly desirable.
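For illustration, a minimal sketch of that second, wrapped approach ("legacy.dll" and "SomeFunction" are placeholder names, not a real API); the checked cast is what produces the OverflowException:
using System;
using System.Runtime.InteropServices;

public static class NativeMethods
{
    // Internal declaration matches the unmanaged signature exactly
    // (uint is not CLS-compliant, but this member is never exposed publicly).
    [DllImport("legacy.dll")]
    internal static extern int SomeFunction(uint size);

    // Public, CLS-compliant wrapper: takes an Int32 and converts it with
    // overflow checking, so a negative value throws instead of wrapping around.
    public static int SomeFunctionChecked(int size)
    {
        return SomeFunction(checked((uint)size));
    }
}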
Whether it is worth the hassle is a secondary question. Lots of APIs, like Win32, use unsigned as a logical constraint: the length of a string or the size of a block of memory can never be negative. In practice, such a number can never overflow, because it isn't possible to allocate that much memory. I can't remember once running into an API where it was a slam-dunk that uint should be used. As such, I think you're fine just using a straight P/Invoke declaration with ints.
I don't think you'd get correct behavior if you went with option 1. Int32 can only go as high as 2,147,483,647, whereas unsigned int goes up to 4,294,967,295. As long as you KNOW you don't need any values above 2 billion, it doesn't really matter. But to be technically correct, the public interface should expose a larger type and perform bounds checking to make sure the value fits in an unsigned int, throwing an exception if it doesn't. An Int64 will do (9,223,372,036,854,775,807).