I've done a few years of C# now, and I'm trying to learn some new stuff. So I decided to have a look at C++, to get to know programming in a different way.
I've been doing loads of reading, but I just started writing some code today.
On my Windows 7 64-bit machine, running VS2010, I created two projects:
1) A C# project that lets me write things the way I'm used to.
2) A C++ "makefile" project that lets me play around, trying to implement the same thing. From what I understand, this ISN'T a .NET project.
I got to trying to populate a dictionary with 10K values. For some reason, the C++ is orders of magnitude slower.
Here's the C# below. Note I put in a function after the time measurement to ensure it wasn't "optimized" away by the compiler:
var freq = System.Diagnostics.Stopwatch.Frequency;
int i;
Dictionary<int, int> dict = new Dictionary<int, int>();
var clock = System.Diagnostics.Stopwatch.StartNew();
for (i = 0; i < 10000; i++)
dict[i] = i;
clock.Stop();
Console.WriteLine(clock.ElapsedTicks / (decimal)freq * 1000M);
Console.WriteLine(dict.Average(x=>x.Value));
Console.ReadKey(); //Don't want results to vanish off screen
Here's the C++; not much thought has gone into it (trying to learn, right?):
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
int i;
boost::unordered_map<int, int> dict;
// start timer
QueryPerformanceCounter(&t1);
for (i=0;i<10000;i++)
dict[i]=i;
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
cout << elapsedTime << " ms insert time\n";
int input;
cin >> input; //don't want console to disappear
Now, some caveats. I managed to find this related SO question. One of the guys wrote a long answer mentioning WOW64 skewing the results. I've set the project to Release and gone through the "Properties" tab of the C++ project, enabling everything that sounded like it would make it fast. I changed the platform to x64, though I'm not sure whether that addresses his WOW64 issue. I'm not that experienced with the compiler options; perhaps you guys have more of a clue?
Oh, and the results: C#: 0.32 ms, C++: 8.26 ms. This is a bit strange. Have I misinterpreted something about what .QuadPart means? I copied the C++ timer code from somewhere on the web, going through all the Boost installation and include/lib-file rigmarole. Or perhaps I am unwittingly using different instruments? Or is there some critical compile option I haven't used? Or maybe the C# code is optimized because the average is a constant?
Here's the c++ command line, from the Property page->C/C++->Command Line:
/I"C:\Users\Carlos\Desktop\boost_1_47_0" /Zi /nologo /W3 /WX- /MP /Ox /Oi /Ot /GL /D "_MBCS" /Gm- /EHsc /GS- /Gy- /arch:SSE2 /fp:fast /Zc:wchar_t /Zc:forScope /Fp"x64\Release\MakeTest.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gd /errorReport:queue
Any help would be appreciated, thanks.
A simple allocator change will cut that time down a lot.
boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, boost::fast_pool_allocator<std::pair<const int, int>>> dict;
0.9 ms on my system (down from 10 ms). This suggests to me that the vast, vast majority of your time is not spent in the hash table at all, but in the allocator. The reason this is an unfair comparison is that your GC will never collect in such a trivial program, giving it an undue performance advantage; native allocators do significant caching of free memory, but that never comes into play here, because you've never allocated or deallocated anything, so there's nothing to cache.
Finally, the Boost pool implementation is thread-safe, whereas you never touch threads, so the GC can fall back to a single-threaded implementation, which will be much faster.
I resorted to a hand-rolled, non-freeing, non-thread-safe pool allocator and got down to 0.525 ms for C++ versus 0.45 ms for C# (on my machine). Conclusion: your original results were heavily skewed by the different memory allocation schemes of the two languages, and once that was resolved, the difference became relatively minimal.
A custom hasher (as described in Alexandre's answer) dropped my C++ time to 0.34 ms, which is now faster than C#.
static const int MaxMemorySize = 800000;
static int FreedMemory = 0;
static int AllocatorCalls = 0;
static int DeallocatorCalls = 0;
template <typename T>
class LocalAllocator
{
public:
std::vector<char>* memory;
int* CurrentUsed;
typedef T value_type;
typedef value_type * pointer;
typedef const value_type * const_pointer;
typedef value_type & reference;
typedef const value_type & const_reference;
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
template <typename U> struct rebind { typedef LocalAllocator<U> other; };
template <typename U>
LocalAllocator(const LocalAllocator<U>& other) {
CurrentUsed = other.CurrentUsed;
memory = other.memory;
}
LocalAllocator(std::vector<char>* ptr, int* used) {
CurrentUsed = used;
memory = ptr;
}
template<typename U> LocalAllocator(LocalAllocator<U>&& other) {
CurrentUsed = other.CurrentUsed;
memory = other.memory;
}
pointer address(reference r) { return &r; }
const_pointer address(const_reference s) { return &s; }
size_type max_size() const { return MaxMemorySize; }
void construct(pointer ptr, value_type&& t) { new (ptr) T(std::move(t)); }
void construct(pointer ptr, const value_type & t) { new (ptr) T(t); }
void destroy(pointer ptr) { static_cast<T*>(ptr)->~T(); }
bool operator==(const LocalAllocator& other) const { return memory == other.memory; }
bool operator!=(const LocalAllocator& other) const { return !(*this == other); }
pointer allocate(size_type count) {
AllocatorCalls++;
if (*CurrentUsed + (count * sizeof(T)) > MaxMemorySize)
throw std::bad_alloc();
if (*CurrentUsed % std::alignment_of<T>::value) {
*CurrentUsed += (std::alignment_of<T>::value - *CurrentUsed % std::alignment_of<T>::value);
}
auto val = &((*memory)[*CurrentUsed]);
*CurrentUsed += (count * sizeof(T));
return reinterpret_cast<pointer>(val);
}
void deallocate(pointer ptr, size_type n) {
DeallocatorCalls++;
FreedMemory += (n * sizeof(T));
}
pointer allocate() {
return allocate(1);
}
void deallocate(pointer ptr) {
return deallocate(ptr, 1);
}
};
int main() {
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
std::vector<char> memory;
int CurrentUsed = 0;
memory.resize(MaxMemorySize);
struct custom_hash {
size_t operator()(int x) const { return x; }
};
boost::unordered_map<int, int, custom_hash, std::equal_to<int>, LocalAllocator<std::pair<const int, int>>> dict(
std::unordered_map<int, int>().bucket_count(),
custom_hash(),
std::equal_to<int>(),
LocalAllocator<std::pair<const int, int>>(&memory, &CurrentUsed)
);
// start timer
QueryPerformanceCounter(&t1);
for (int i=0;i<10000;i++)
dict[i]=i;
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = ((t2.QuadPart - t1.QuadPart) * 1000.0) / frequency.QuadPart;
std::cout << elapsedTime << " ms insert time\n";
int input;
std::cin >> input; //don't want console to disappear
}
Storing a consecutive sequence of integral keys added in ascending order is definitely NOT what hash tables are optimized for.
Use an array, or else generate random values.
And do some retrievals. Hash tables are highly optimized for retrieval.
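To make that concrete, here's a minimal sketch of what a fairer benchmark might look like on the C# side (the key count and seed are arbitrary); the same shape applies to the C++ version:
var rng = new Random(12345); // arbitrary fixed seed so runs are repeatable
var keys = new int[10000];
for (int k = 0; k < keys.Length; k++)
    keys[k] = rng.Next(); // scattered keys instead of 0..9999
var dict = new Dictionary<int, int>();
var clock = System.Diagnostics.Stopwatch.StartNew();
foreach (var key in keys)
    dict[key] = key; // insertion with random keys
long sum = 0;
foreach (var key in keys)
    sum += dict[key]; // retrieval, which hash tables are tuned for
clock.Stop();
Console.WriteLine("{0} ms, checksum {1}", clock.ElapsedMilliseconds, sum);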
You can try dict.rehash(n) with different (large) values of n before inserting elements, and see how this impacts performance. Memory allocations (which take place when the container fills buckets) are generally more expensive in C++ than in C#, and rehashing is also heavy. For std::vector, the analogous member function is reserve (std::deque has no reserve).
Different rehash policies and load-factor thresholds (have a look at the max_load_factor member function) will also greatly impact unordered_map's performance.
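The effect of pre-sizing is easy to demonstrate on the C# side too, since Dictionary<TKey, TValue> accepts an initial capacity; a minimal sketch (the capacity value is just illustrative):
// Pre-sizing avoids repeated internal resizes during insertion;
// it's the C# analogue of calling rehash(n) before inserting in C++.
var presized = new Dictionary<int, int>(10000); // capacity chosen up front
for (int i = 0; i < 10000; i++)
    presized[i] = i; // no resize work happens mid-loop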
Next, since you're using VS2010, I suggest you use std::unordered_map from the <unordered_map> header. Don't use Boost when you can use the standard library.
The actual hash function used may greatly impact performance. You may try with the following:
struct custom_hash { size_t operator()(int x) const { return x; } };
and use std::unordered_map<int, int, custom_hash>.
Finally, I agree that this is a poor usage of hash tables. Use random values for insertion; you'll get a more precise picture of what is going on. Testing the insertion speed of hash tables isn't stupid at all, but hash tables are not meant to store consecutive integers. Use a vector for that.
Visual Studio's TR1 unordered_map is the same as stdext::hash_map.
Another thread asked why it performs slowly; see my answer there, with links to others who have discovered the same issue. The conclusion is to use another hash_map implementation when in C++:
Alternative to stdext::hash_map for performance reasons
Btw, remember that in C++ there is a big difference between an optimized Release build and a non-optimized Debug build, compared to C#.
Let's say we have this structure:
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want: direct access to the Ident field in this structure both as a string and as an int, without breaking the 8-byte size of the structure, and without having to recompute the string value from the int value each time.
Having Ident as an int is interesting because I can quickly compare it against other idents to see whether they match; those other idents may come from data unrelated to this structure, but they are in the same int format.
Question: is there a way to define a field that is not part of the structure layout? Like:
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there anything like this I can do?
string _identStr;
public string IdentStr {
get { // EDIT ! missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS: I use pointers ('*' and '&', unsafe) because I need to deal with endianness (local system, binary files/file formats, network) and fast type conversions and array filling. I also use many flavours of Marshal methods (fixing structures on byte arrays), and a little P/Invoke and COM interop. Too bad some assemblies I'm dealing with don't have their .NET counterparts yet.
TL;DR: the rest is detail only.
The question above is all this is about; I just don't know the answer. The following should answer most follow-ups like "other approaches" or "why not do this instead", but can be skipped. Anyway, I'm preemptively including everything so it's clear from the start what I'm trying to do. :)
Options/workarounds I'm currently using (or thinking of using):
Create a getter (not a field) that computes the string value each time:
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach, while doing the job, performs poorly: the GUI displays aircraft from a database of default flights, and injects other flights from the network at a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within an area, relating to 2400 airports (departure and arrival), meaning 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class) whose only purpose is to manage data on the GUI side, when not reading/writing to a stream or file. That means: read the data with the explicit-layout struct, create another struct with the string version of the field, and work with that in the GUI. This will perform better overall, but in the process of defining structures for the game binaries I'm already at 143 structures of this kind (just for older versions of the game data; there are a bunch I haven't written yet, and I plan to add structures for the newest data types). At the moment, more than half of them require one or more extra fields to be of meaningful use. It would be okay if I were the only one using the assembly, but other users will probably get lost among AirportHeader, AirportHeaderEx, AirportEntry, AirportEntryEx, AirportCoords, AirportCoordsEx... I would rather avoid that.
Optimize option 1 so the computation performs faster (thanks to SO, there are a bunch of ideas to look into; I'm currently working on this). For the Ident field, I guess I could use pointers (and I will); I'm already doing that for fields I must display in little-endian and read/write in big-endian. There are other values, like 4x4 grid information packed in a single Int64 (ulong), that need bit shifting to expose the actual values. Same for GUIDs or object pitch/bank/yaw.
Take advantage of overlapping fields (under study). That would work for GUIDs, and perhaps for the Ident example, if MarshalAs can constrain the value to an ASCII string; then I would just need to specify the same FieldOffset, '0' in this case. But I'm unsure whether setting the field value (entry.FieldStr = "FMEP";) actually applies the Marshal constraint on the managed side. My understanding is that the managed side would store the string as Unicode (?). Furthermore, that wouldn't work for packed bits (bytes that contain several values, or consecutive bytes hosting values that have to be bit-shifted). I believe it is impossible to specify value position, length and format at the bit level.
Why bother? Context:
I'm defining a bunch of structures to parse binary data related to a game from byte arrays (IO.File.ReadAllBytes) or streams, and to write it back. Application logic should use the structures to quickly access and manipulate the data on demand. The assembly's expected capabilities are read, validate, edit, create and write, outside the scope of the game (addon building, control) and inside the scope of the game (API, live modding or monitoring). Another purpose is to understand the content of the binaries (hex) and use that understanding to build what's missing in the game.
The purpose of the assembly is to provide ready-to-use base components for a C# addon contributor (I don't plan to make the code portable): creating applications for the game, or processing an addon from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file into memory, but some contexts require you not to do that, and to retrieve from the file only what is necessary, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data), but that's outside the scope of the main concern. If it matters, Microsoft has over the years provided public, freely accessible SDKs exposing binary structures of previous versions of the game, for the purpose of what I'm doing (I'm not the first and probably not the last to do so). Though I wouldn't dare expose undocumented binaries (for the latest game data, for instance), nor facilitate a breach of copyrighted materials/binaries.
I'm just asking for confirmation of whether there is a way to have private fields that are not part of the structure layout. My naive belief at the moment is "that's impossible, but there are workarounds". It's just that my C# experience is pretty sparse, so maybe I'm wrong, which is why I ask. Thanks!
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with within the structure. I'll measure how each performs in various scenarios later. The dict approach is very seductive, as in many scenarios I need a directly accessible global database of (59000) airports with runways and parking spots (not just the ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle = default(GCHandle); // initialized to avoid CS0165 (definite assignment) on C# 5
try { // Fast if no exception, (very) slow if exception thrown
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned);
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4);
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value = value.PadRight(4); // Must fill the blanks; PadRight returns a new string (initial while loop replaced, per Charlieface)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // Put a try as a matter of habit, but not convinced it's gonna throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = Marshal.ReadInt32(intValuePtr, 0).BinaryConvertToUInt32(); // Extension method to convert type.
} finally { Marshal.FreeHGlobal(intValuePtr); } // freeing the right pointer
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary access is O(1), and hashing an int just returns itself, so it will be very, very fast, but will take up some memory.
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(Ident.GetBytes()));
}
This is not possible, because a structure must contain all of its values in a specific order. Usually this order is controlled by the CLR itself; if you want to change the order of the data, you can use StructLayout. However, you cannot exclude a field, or that data would simply not exist in memory.
Instead of a string (which is a reference type), you can use a pointer that points directly to that string and use it in your structure in combination with StructLayout. To read the string value, you can use a get-only property that reads directly from unmanaged memory.
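A rough sketch of that idea, assuming the four ident bytes live in unmanaged (or pinned) memory and are ANSI-encoded; the names are illustrative, and note that adding the pointer grows the struct, so it trades away the fixed 8-byte layout the question wants to keep:
[StructLayout(LayoutKind.Explicit, Size=16)] // using System.Runtime.InteropServices;
public struct AirportHeaderWithPtr {
    [FieldOffset(0)]
    public int Ident;
    [FieldOffset(4)]
    public int Offset;
    [FieldOffset(8)]
    public IntPtr IdentPtr; // points at 4 ANSI bytes in unmanaged memory
    // Get-only property: reads the string out of unmanaged memory on demand.
    public string IdentStr {
        get {
            return IdentPtr == IntPtr.Zero
                ? string.Empty
                : Marshal.PtrToStringAnsi(IdentPtr, 4);
        }
    }
}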
I'm trying to copy an array of floats from my C# application to an array in a C-coded DLL.
I'm used to programming in C#, not so much with C. However, I have no problem doing the reverse procedure, i.e. reading an array of floats from a C-coded DLL into my C# application. I've read several threads on this site but can't work out where I'm going wrong.
C# CODE
[DllImport(@"MyDll")]
static extern int CopyArray(double[] MyArray);
double[] myValues = new double[100];
int a = 0;
while(a < 100)
{
myValues[a] = a;
a++;
}
CopyArray(myValues);
C++ DLL
This is the function header;
__declspec(dllexport) int CopyArray(float* Wavelengths);
This is the function code;
float theValues[100];
int CopyArray(float* theArray)
{
int status = 0;
int count = 0;
while (count < 100)
{
theValues[count] = theArray[count];
++count;
}
return(status);
}
I'm expecting my C# array to end up in the C array "theValues" but that's not happening. There is nothing getting into "theValues" array.
A couple of things.
You are mixing data types, and they are different lengths (float is 32-bit and double is 64-bit). Both types exist in both languages, but your caller and callee need to agree on the data type. Here is a list of data types and their managed/unmanaged equivalents.
The parameter you are sending is not a pointer. It might be translated to one automatically by the compiler, but there are several options, and it is entirely possible that the compiler will pick one you don't want (more info here). The one you are probably looking for is [In]. If you want to get data back from C to C#, you will also want [Out]:
[DllImport(@"MyDll")]
static extern int CopyArray([In, Out]double[] MyArray);
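Alternatively, keep the C side as float and change the C# declaration to match; a sketch against the same MyDll export:
// Matches the existing C export: int CopyArray(float* theArray);
// float on both sides, [In] because the data only flows from C# to C.
[DllImport(@"MyDll")]
static extern int CopyArray([In] float[] MyArray);

float[] myValues = new float[100]; // build the array as float now
for (int a = 0; a < 100; a++)
    myValues[a] = a;
CopyArray(myValues);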
I am writing a method to measure the frequency of a sampled sine wave. It takes a somewhat large 1D array (on the order of 10^3 to 10^4 samples) and returns a double. A helper method is also called within the body that checks whether the wave is crossing zero. Here is an example of what I have written:
public static double Hertz(float[] v, int n) {
int nZCros = 0;
for (int i = 1; i < n; i++) {
if (IsZeroCrossing(v.Skip(i - 1).ToArray())) {
++nZCros;
}
}
return (double)nZCros / 2.0;
}
private static bool IsZeroCrossing(float[] v) {
bool cross;
//checks if the first two elements of the array are opposite sign or not
return cross;
}
My problem is that method takes 200-300 ms to run. So I decided to try using unsafe and pointers, like this,
public unsafe static double Hertz(float* v, int n) {
int nZCros = 0;
for (int i = 1; i < n; i++) {
if (IsZeroCrossing(&v[i - 1])) {
++nZCros;
}
}
return (double)nZCros / 2.0;
}
private unsafe static bool IsZeroCrossing(float* v) {
bool cross;
//checks if the first two elements of the array are opposite sign or not
return cross;
}
which runs in 2-4 ms.
However, I am not really comfortable venturing outside the recommended bounds. Is there a way to achieve the same speed in a safe context? And if there isn't, does that defeat the purpose of using C#? Should I really be using C# for this kind of signal processing and scientific application?
This is just one of many DSP methods I'm writing which take a large array of samples as an input. But this one got me to realize there was a problem, because I accidentally put in 48000 samples instead of 4800 when I was testing this method and it took 20 seconds to return a value.
Thank you.
UPDATE: I tried adding Take(2) after Skip(i - 1) in the former snippet. This brought it down to 90-100 ms, but the question still stands.
You don't need to pass a copy of the array elements to IsZeroCrossing().
Instead, just pass the two elements you are interested in:
private static bool IsZeroCrossing(float elem1, float elem2)
{
return elem1*elem2 < 0.0f; // Quick way to check if signs differ.
}
And call it like so:
if (IsZeroCrossing(v[i - 1], v[i])) {
It's possible that such a simple method will be inlined in a release build, making it as fast as possible.
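Putting it together, the safe version of the method from the question might look like this sketch:
public static double Hertz(float[] v, int n) {
    int nZCros = 0;
    for (int i = 1; i < n; i++) {
        // Pass just the two samples of interest; no per-iteration allocation.
        if (IsZeroCrossing(v[i - 1], v[i]))
            ++nZCros;
    }
    return (double)nZCros / 2.0;
}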
Before you react from the gut, as I did initially, please read the whole question. I know they make you feel dirty, I know we've all been burned before, and I know it's not "good style", but: are public fields ever OK?
I'm working on a fairly large-scale engineering application that creates and works with an in-memory model of a structure (anything from a high-rise building to a bridge to a shed, it doesn't matter). There is a TON of geometric analysis and calculation involved in this project. To support this, the model is composed of many tiny immutable read-only structs representing things like points, line segments, etc. Some of the values of these structs (like the coordinates of the points) are accessed tens or hundreds of millions of times during a typical program execution. Because of the complexity of the models and the volume of calculation, performance is absolutely critical.
I feel that we're doing everything we can to optimize our algorithms, performance-test to find bottlenecks, use the right data structures, etc. I don't think this is a case of premature optimization. Performance tests show order-of-magnitude (at least) boosts when accessing fields directly rather than through a property on the object. Given this information, and the fact that we can also expose the same information as properties to support data binding and other situations... is this OK? Remember: read-only fields on immutable structs. Can anyone think of a reason I'm going to regret this?
Here's a sample test app:
struct Point {
public Point(double x, double y, double z) {
_x = x;
_y = y;
_z = z;
}
public readonly double _x;
public readonly double _y;
public readonly double _z;
public double X { get { return _x; } }
public double Y { get { return _y; } }
public double Z { get { return _z; } }
}
class Program {
static void Main(string[] args) {
const int loopCount = 10000000;
var point = new Point(12.0, 123.5, 0.123);
var sw = new Stopwatch();
double x, y, z;
double calculatedValue;
sw.Start();
for (int i = 0; i < loopCount; i++) {
x = point._x;
y = point._y;
z = point._z;
calculatedValue = point._x * point._y / point._z;
}
sw.Stop();
double fieldTime = sw.ElapsedMilliseconds;
Console.WriteLine("Direct field access: " + fieldTime);
sw.Reset();
sw.Start();
for (int i = 0; i < loopCount; i++) {
x = point.X;
y = point.Y;
z = point.Z;
calculatedValue = point.X * point.Y / point.Z;
}
sw.Stop();
double propertyTime = sw.ElapsedMilliseconds;
Console.WriteLine("Property access: " + propertyTime);
double totalDiff = propertyTime - fieldTime;
Console.WriteLine("Total difference: " + totalDiff);
double averageDiff = totalDiff / loopCount;
Console.WriteLine("Average difference: " + averageDiff);
Console.ReadLine();
}
}
result:
Direct field access: 3262
Property access: 24248
Total difference: 20986
Average difference: 0.00020986
It's only 21 seconds, but why not?
Your test isn't really being fair to the property-based versions. The JIT is smart enough to inline simple properties so that they have a runtime performance equivalent to that of direct field access, but it doesn't seem smart enough (today) to detect when the properties access constant values.
In your example, the entire loop body of the field access version is optimized away, becoming just:
for (int i = 0; i < loopCount; i++)
00000025 xor eax,eax
00000027 inc eax
00000028 cmp eax,989680h
0000002d jl 00000027
}
whereas the second version is actually performing the floating-point division on each iteration:
for (int i = 0; i < loopCount; i++)
00000094 xor eax,eax
00000096 fld dword ptr ds:[01300210h]
0000009c fdiv qword ptr ds:[01300218h]
000000a2 fstp st(0)
000000a4 inc eax
000000a5 cmp eax,989680h
000000aa jl 00000096
}
Making just two small changes to make your application more realistic renders the two operations practically identical in performance.
First, randomize the input values so that they aren't constants and the JIT isn't smart enough to remove the division entirely.
Change from:
Point point = new Point(12.0, 123.5, 0.123);
to:
Random r = new Random();
Point point = new Point(r.NextDouble(), r.NextDouble(), r.NextDouble());
Secondly, ensure that the results of each loop iteration are used somewhere:
Before each loop, set calculatedValue = 0 so they both start at the same point. After each loop call Console.WriteLine(calculatedValue.ToString()) to make sure that the result is "used" so the compiler doesn't optimize it away. Finally, change the body of the loop from "calculatedValue = ..." to "calculatedValue += ..." so that each iteration is used.
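Concretely, with those changes the field-access loop from the test app becomes something like this (the property loop changes the same way):
var r = new Random();
var point = new Point(r.NextDouble(), r.NextDouble(), r.NextDouble());
double calculatedValue = 0; // both loops start from the same value
sw.Start();
for (int i = 0; i < loopCount; i++) {
    // Accumulate so every iteration's result is actually used.
    calculatedValue += point._x * point._y / point._z;
}
sw.Stop();
Console.WriteLine(calculatedValue.ToString()); // stops the result being optimized away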
On my machine, these changes (with a release build) yield the following results:
Direct field access: 133
Property access: 133
Total difference: 0
Average difference: 0
Just as we expect, the x86 for each of these modified loops is identical (except for the loop address)
000000dd xor eax,eax
000000df fld qword ptr [esp+20h]
000000e3 fmul qword ptr [esp+28h]
000000e7 fdiv qword ptr [esp+30h]
000000eb fstp st(0)
000000ed inc eax
000000ee cmp eax,989680h
000000f3 jl 000000DF (This loop address is the only difference)
Given that you're dealing with immutable objects with readonly fields, I would say that you've hit the one case where I don't find public fields to be a dirty habit.
IMO, the "no public fields" rule is one of those rules which are technically correct, but unless you are designing a library intended to be used by the public, breaking it is unlikely to cause you any problems.
Before I get too massively downvoted, I should add that encapsulation is a good thing. Given the invariant "the Value property must be null if HasValue is false", this design is flawed:
class A {
public bool HasValue;
public object Value;
}
However, given that invariant, this design is equally flawed:
class A {
public bool HasValue { get; set; }
public object Value { get; set; }
}
The correct design is:
class A {
public bool HasValue { get; private set; }
public object Value { get; private set; }
public void SetValue(bool hasValue, object value) {
if (!hasValue && value != null)
throw new ArgumentException();
this.HasValue = hasValue;
this.Value = value;
}
}
(and even better would be to provide an initializing constructor and make the class immutable).
I know you feel kind of dirty doing this, but it isn't uncommon for rules and guidelines to get shot to hell when performance becomes an issue. For example, quite a few high traffic websites using MySQL have data duplication and denormalized tables. Others go even crazier.
Moral of the story - it may go against everything you were taught or advised, but the benchmarks don't lie. If it works better, just do it.
If you really need that extra performance, then it's probably the right thing to do. If you don't need the extra performance then it's probably not.
Rico Mariani has a couple of related posts:
Ten Questions on Value-Based Programming
Ten Questions on Value-Based Programming : Solution
Personally, the only time I would consider using public fields is in a very implementation-specific private nested class.
Other times it just feels too "wrong" to do it.
The CLR will take care of performance by optimising out the method/property (in release builds) so that shouldn't be an issue.
Not that I disagree with the other answers, or with your conclusion... but I'd like to know where you got the order-of-magnitude performance difference stat from. As I understand the C# compiler, any simple property (with no additional code other than direct access to the field) should be inlined by the JIT compiler as a direct access anyway.
The advantage of using properties even in these simple cases (in most situations) is that, by writing it as a property, you allow for future changes that might modify it. (Although in your case there would, of course, be no such future changes.)
Try compiling a release build and running directly from the exe instead of through the debugger. If the application is run through a debugger, the JIT compiler will not inline the property accessors. I was not able to replicate your results; in fact, each test I ran indicated that there was virtually no difference in execution time.
But, like the others, I am not completely opposed to direct field access. Especially because it is easy to make the field private and add a public property accessor later, without needing any further code modifications to get the application to compile.
Edit: Okay, my initial tests used an int data type instead of double. I see a huge difference when using doubles. With ints, direct vs. property access is virtually the same; with doubles, property access is about 7x slower than direct access on my machine. This is somewhat puzzling to me.
Also, it is important to run the tests outside of the debugger. Even in release builds the debugger adds overhead which skews the results.
Here are some scenarios where it is OK (from the Framework Design Guidelines book):
DO use constant fields for constants that will never change.
DO use public static readonly fields for predefined object instances.
And where it is not:
DO NOT assign instances of mutable types to readonly fields.
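For illustration, a minimal sketch of the two "DO" cases, reusing the Point struct from the question (the names are made up):
public static class GeometryDefaults {
    // DO: a constant field for a value that will never change.
    public const double Tolerance = 0.000001;
    // DO: a public static readonly field for a predefined object instance.
    public static readonly Point Origin = new Point(0.0, 0.0, 0.0);
}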
From what you have stated, I don't get why your trivial properties don't get inlined by the JIT.
If you modify your test to use the temporary variables you assign, rather than accessing the properties directly in your calculation, you will see a large performance improvement:
sw.Start();
for (int i = 0; i < loopCount; i++)
{
x = point._x;
y = point._y;
z = point._z;
calculatedValue = x * y / z;
}
sw.Stop();
double fieldTime = sw.ElapsedMilliseconds;
Console.WriteLine("Direct field access: " + fieldTime);
sw.Reset();
sw.Start();
for (int i = 0; i < loopCount; i++)
{
x = point.X;
y = point.Y;
z = point.Z;
calculatedValue = x * y / z;
}
sw.Stop();
Perhaps I'll repeat someone else, but here's my point too, if it may help.
Teachings are there to give you the tools you need to achieve a certain level of ease when encountering situations such as this.
The Agile software development methodology says that you first have to deliver the product to your client, no matter what your code looks like. Second, you may optimize and make your code "beautiful", according to the state of the art.
Here, either you or your client requires performance. Within your project, PERFORMANCE is CRUCIAL, if I understand correctly.
So I guess you'll agree with me that we don't care what the code looks like or whether it respects the "art". Do what you have to do to make it performant and powerful! Properties allow your code to "format" the data I/O if required, but a property is an accessor method under the hood: if the getter isn't inlined, reading a value through it costs a method call on top of the field read, two steps instead of one. If performance is that critical, just do it, and make your immutable members public. :-)
This reflects some others' points of view too, if I read correctly. :)
Have a good day!
Types which encapsulate functionality should use properties. Types which only serve to hold data should use public fields, except in the case of immutable classes (where wrapping fields in read-only properties is the only way to reliably protect them against modification). Exposing members as public fields essentially proclaims "these members may be freely modified at any time without regard for anything else". If the type in question is a class type, it further proclaims "anyone who exposes a reference to this thing will be allowing the recipient to change these members at any time in any fashion they see fit." While one shouldn't expose public fields in cases where such a proclamation would be inappropriate, one should expose public fields in cases where such a proclamation would be appropriate and client code could benefit from the assumptions enabled thereby.
I have an object for which I want to generate a unique hash (overriding GetHashCode()), but I want to avoid overflows or anything unpredictable.
The code should be the result of combining the hash codes of a small collection of strings.
The hash codes will be part of generating a cache key, so ideally they should be unique; however, the number of possible values being hashed is small, so I THINK probability is in my favour here.
Would something like this be sufficient AND is there a better way of doing this?
int hash = 0;
foreach(string item in collection){
hash += (item.GetHashCode() / collection.Count);
}
return hash;
EDIT: Thanks for the answers so far.
@Jon Skeet: No, order is not important.
I guess this is almost another question, but since I am using the result to generate a cache key (a string), would it make sense to use a cryptographic hash function like MD5, or just use the string representation of this int?
The fundamentals pointed out by Marc and Jon are not bad, but they are far from optimal in terms of the evenness of the distribution of the results. Sadly, the 'multiply by primes' approach copied by so many people from Knuth is not the best choice; in many cases a better distribution can be achieved by cheaper-to-calculate functions (though the difference is very slight on modern hardware). In fact, throwing primes into many aspects of hashing is no panacea.
If this data is used for significantly sized hash tables, I recommend reading Bret Mulvey's excellent study and explanation of various modern (and not-so-modern) hashing techniques, handily done in C#.
Note that the behaviour of hash functions on strings is heavily biased by whether the strings are short (roughly speaking, by how many characters are hashed before the bits begin to overflow) or long.
One of the simplest and easiest to implement is also one of the best: the Jenkins one-at-a-time hash.
private static unsafe void Hash(byte* d, int len, ref uint h)
{
for (int i = 0; i < len; i++)
{
h += d[i];
h += (h << 10);
h ^= (h >> 6);
}
}
public unsafe static void Hash(ref uint h, string s)
{
fixed (char* c = s)
{
byte* b = (byte*)(void*)c;
Hash(b, s.Length * 2, ref h);
}
}
public unsafe static int Avalanche(uint h)
{
h += (h<< 3);
h ^= (h>> 11);
h += (h<< 15);
return *((int*)(void*)&h);
}
You can then use this like so:
uint h = 0;
foreach(string item in collection)
{
Hash(ref h, item);
}
return Avalanche(h);
You can merge multiple different types like so:
public unsafe static void Hash(ref uint h, int data)
{
byte* d = (byte*)(void*)&data;
Hash(d, sizeof(int), ref h);
}
public unsafe static void Hash(ref uint h, long data)
{
byte* d= (byte*)(void*)&data;
Hash(d, sizeof(long), ref h);
}
If you only have access to the field as an object, with no knowledge of its internals, you can simply call GetHashCode() on each one and combine the values like so:
uint h = 0;
foreach(var item in collection)
{
Hash(ref h, item.GetHashCode());
}
return Avalanche(h);
Sadly you can't do sizeof(T), so you must handle each struct type individually.
If you wish to use reflection, you can construct, per type, a function which does structural identity and hashing on all fields.
If you wish to avoid unsafe code, you can use bit-masking techniques to pull out individual bytes from ints (and chars, if dealing with strings) without too much extra hassle, as sketched below.
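As a sketch, here is one possible safe rendering of the same one-at-a-time hash, using shifts and masks instead of pointers (the method names mirror the unsafe versions above):
// Safe variant: feed the hash one byte at a time via shifts and masks.
private static void HashByte(ref uint h, byte b)
{
    h += b;
    h += (h << 10);
    h ^= (h >> 6);
}
public static void Hash(ref uint h, int data)
{
    // The four bytes of the int, low byte first.
    for (int shift = 0; shift < 32; shift += 8)
        HashByte(ref h, (byte)((data >> shift) & 0xFF));
}
public static void Hash(ref uint h, string s)
{
    // Two bytes per UTF-16 char, matching the unsafe version's byte order.
    foreach (char c in s)
    {
        HashByte(ref h, (byte)(c & 0xFF));
        HashByte(ref h, (byte)(c >> 8));
    }
}
public static int Avalanche(uint h)
{
    h += (h << 3);
    h ^= (h >> 11);
    h += (h << 15);
    return unchecked((int)h);
}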
Hashes aren't meant to be unique; they're just meant to be well distributed in most situations, and consistent. Note that overflows shouldn't be a problem.
Just adding isn't generally a good idea, and dividing certainly isn't. Here's the approach I usually use:
int result = 17;
foreach (string item in collection)
{
result = result * 31 + item.GetHashCode();
}
return result;
If you're otherwise in a checked context, you might want to deliberately make it unchecked.
Note that this assumes that order is important, i.e. that { "a", "b" } should be different from { "b", "a" }. Please let us know if that's not the case.
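For instance, if the project is otherwise compiled with checked arithmetic, the combination can be wrapped explicitly (a sketch):
int result = 17;
unchecked // let the multiply wrap around instead of throwing OverflowException
{
    foreach (string item in collection)
    {
        result = result * 31 + item.GetHashCode();
    }
}
return result;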
There is nothing wrong with this approach as long as the members whose hash codes you are combining follow the rules of hash codes. In short:
The hash code of the private members should not change for the lifetime of the object.
The container must not change the objects the private members point to, lest it in turn change the hash code of the container.
If the order of the items is not important (i.e. {"a","b"} is the same as {"b","a"}) then you can use exclusive or to combine the hash codes:
hash ^= item.GetHashCode();
[Edit: As Marc pointed out in a comment on a different answer, this has the drawback of also giving collections like {"a"} and {"a","b","b"} the same hash code.]
If the order is important, you can instead multiply by a prime number and add:
hash *= 11;
hash += item.GetHashCode();
(When you multiply, you will sometimes get an overflow that is ignored, but multiplying by a prime number loses a minimum of information. If you instead multiplied by a number like 16, you would lose four bits each time, so after eight items the hash code from the first item would be completely gone.)