I want to create a byte sequence with a fixed length out of a string which has a variable length. What is the best way to achieve this? All bytes should be as different as possible.
The code is only for my own research, nothing for production.
This has been my first approach for the generation of the bytes:
static byte[] GenerateBytes(string password, Int32 strength)
{
    Byte[] result = new byte[strength];
    Byte[] pwBytes = Encoding.ASCII.GetBytes(password);
    Int32 prime = GetLowerPrime(pwBytes.Length);
    // Offset count to avoid values
    Int32 count = prime;
    Int32 sum = 0;
    for (int i = 0; i < result.Length; i++) {
        sum += (result[i] = pwBytes[(count++ % pwBytes.Length)]);
    }
    count += prime;
    Int32 pcount = prime;
    for (int i = 0; i < result.Length * 7; i++) {
        result[(i % result.Length)] ^= (Byte)(pwBytes[(count++ % pwBytes.Length)] ^ ((pcount += pwBytes[(count % pwBytes.Length)]) % 255));
    }
    return result;
}
I generated some samples with 256 / 128 / 64 bytes and counted the unique bytes in each:
Password "Short": 170 103 60
Password "LongerX": 173 101 55
Password "Really Long": 169 100 57
Password "Unbelivable Safe!0§$": 162 101 56
Password "MCV": 119 113 61
Password "AAA": 50 51 50
Password "BBB": 67 67 52
Password "AAAAAA": 48 48 48
I tried changing the prime selector a bit; this improves the generation with short keys but partly hurts long ones. I also tracked some statistics on the generated bytes: each byte value is used between 9 and 30 times.
What do you think about the results? How can I improve the generation of the bytes?
You seem to be reinventing the wheel. If you need to make a key from a password, use a hashing function or, best of all, one of the standard password-based key derivation functions. Search for PBKDF2.
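For example, a minimal sketch using .NET's built-in PBKDF2 implementation, Rfc2898DeriveBytes (the salt size and iteration count below are just placeholder values, and you would need to store the salt alongside the output to reproduce the same bytes later):
using System.Security.Cryptography;

static byte[] DeriveKey(string password, int strength)
{
    // Placeholder salt size and iteration count, for illustration only.
    byte[] salt = new byte[16];
    using (var rng = RandomNumberGenerator.Create())
        rng.GetBytes(salt);

    using (var kdf = new Rfc2898DeriveBytes(password, salt, 10000))
        return kdf.GetBytes(strength);   // fixed-length, well-distributed output
}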
Well, if you really want to roll your own solution that has no real practical use other than theoretical interest (because this sounds like a homework question), just start off with a one-time pad of random bytes and XOR the password into the first few bytes. That should give you reasonably high entropy even for short passwords.
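A rough sketch of that idea in C# (note you would have to keep the random pad around if you ever need to reproduce the same output for a given password):
using System.Security.Cryptography;
using System.Text;

static byte[] PadAndXor(string password, int strength)
{
    byte[] result = new byte[strength];
    using (var rng = RandomNumberGenerator.Create())
        rng.GetBytes(result);                        // the one-time pad of random bytes

    byte[] pwBytes = Encoding.ASCII.GetBytes(password);
    for (int i = 0; i < pwBytes.Length && i < strength; i++)
        result[i] ^= pwBytes[i];                     // fold the password into the first bytes

    return result;
}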
I am getting this string
8802000030000000C602000033000000000000800000008000000000000000001800000000000
and this is what I am expecting to convert from the string:
88020000 long in little endian => 648
30000000 long in little endian => 48
C6020000 long in little endian => 710
33000000 long in little endian => 51
The left side is the value I am getting from the string and the right side is the value I am expecting. The right-side values might be wrong, but is there any way I can get the right-side values from the left?
I went through several threads here like
How to convert an int to a little endian byte array?
C# Big-endian ulong from 4 bytes
I tried quite a few different functions, but nothing gives me values anywhere near what I am expecting.
Update:
I am reading a text file as shown below. Most of the data is in text format, but all of a sudden I get a bunch of GRAPHICS info, and I am not sure how to handle it.
RECORD=28
cVisible=1
dwUser=0
nUID=23
c_status=1
c_data_validated=255
c_harmonic=0
c_dlg_verified=0
c_lock_sizing=0
l_last_dlg_updated=0
s_comment=
s_hlinks=
dwColor=33554432
memUsr0=
memUsr1=
memUsr2=
memUsr3=
swg_bUser=0
swg_dConnKVA=L0
swg_dDemdKVA=L0
swg_dCodeKVA=L0
swg_dDsgnKVA=L0
swg_dConnFLA=L0
swg_dDemdFLA=L0
swg_dCodeFLA=L0
swg_dDsgnFLA=L0
swg_dDiversity=L4607182418800017408
cStandard=0
guidDB={901CB951-AC37-49AD-8ED6-3753E3B86757}
l_user_selc_rating=0
r_user_selc_SCkA=
a_conn1=21
a_conn2=11
a_conn3=7
l_ct_ratio_1=x44960000
l_ct_ratio_2=x40a00000
l_set_ct_ratio_1=
l_set_ct_ratio_2=
c_ct_conn=0
ENDREC
GRAPHICS0=8802000030000000C602000033000000000000800000008000000000000000001800000000000
EOF
Depending on how you want to parse up the input string, you could do something like this:
string input = "8802000030000000C6020000330000000000008000000080000000000000000018000000";
// Requires "using System.Globalization;" for NumberStyles.HexNumber.
for (int i = 0; i < input.Length; i += 8)
{
    string subInput = input.Substring(i, 8);
    byte[] bytes = new byte[4];
    for (int j = 0; j < 4; ++j)
    {
        // Each pair of hex digits is one byte, low byte first.
        string toParse = subInput.Substring(j * 2, 2);
        bytes[j] = byte.Parse(toParse, NumberStyles.HexNumber);
    }
    uint num = BitConverter.ToUInt32(bytes, 0);
    Console.WriteLine(subInput + " --> " + num);
}
88020000 --> 648
30000000 --> 48
C6020000 --> 710
33000000 --> 51
00000080 --> 2147483648
00000080 --> 2147483648
00000000 --> 0
00000000 --> 0
18000000 --> 24
Do you really literally mean that that's a string? What it looks like is this: You have a bunch of 32-bit words, each represented by 8 hex digits. Each one is presented in little-endian order, low byte first. You need to interpret each of those as an integer. So, e.g., 88020000 is 88 02 00 00, which is to say 0x00000288.
If you can clarify exactly what it is you've got -- a string, an array of some kind of numeric type, or what -- then it'll be easier to advise you further.
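Assuming it really is a plain hex string, here is a small sketch of how each 8-digit group could be turned into a number (the helper name is just for illustration):
using System.Globalization;

static uint ParseLittleEndianGroup(string group)   // e.g. "88020000"
{
    uint value = 0;
    for (int i = 0; i < 4; i++)
    {
        byte b = byte.Parse(group.Substring(i * 2, 2), NumberStyles.HexNumber);
        value |= (uint)b << (8 * i);               // low byte comes first in the string
    }
    return value;
}

// ParseLittleEndianGroup("88020000") == 0x288 == 648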
I've been working on converting a C++ crypting method to C#. The problem is, I can't get it to encrypt/decrypt the way I want it to.
The idea is simple, I capture a packet, and decrypt it. The output will be:
Packet Size - Command/Action - Null (End)
(The decryptor cuts off the first and last 2 bytes)
The C++ code is this:
// Crypt the packet with Xor operator
void cryptPacket(char *packet)
{
    unsigned short paksize = (*((unsigned short*)&packet[0])) - 2;
    for (int i = 2; i < paksize; i++)
    {
        packet[i] = 0x61 ^ packet[i];
    }
}
So I thought this would work in C# if I didn't want to use pointers:
public static char[] CryptPacket(char[] packet)
{
    ushort paksize = (ushort)(packet.Length - 2);
    for (int i = 2; i < paksize; i++)
    {
        packet[i] = (char)(0x61 ^ packet[i]);
    }
    return packet;
}
-but it isn't; the value returned is just another line of rubbish instead of the decrypted value. The output given is: ..O♦&/OOOe.
Well... at least the '/' is in the right place, for some reason.
Some more information:
The test packet I'm using is this:
Hex value: 0C 00 E2 66 65 47 4E 09 04 13 65 00
Plain text: ...feGN...e.
Decrypted: XX/hereXX
X = unknown value, I can't really remember, but it doesn't matter.
Using Hex Workshop you can decrypt the packet this way:
Special Paste the hex value as CF_TEXT, making sure the 'treat as hexadecimal value' box is checked.
Afterwards, select everything from the hexadecimal value you just pasted, except the first and last 2 bytes.
Go to Tools > Operations > Xor.
Select 'Treat data as 8 bit data' and set the value to '61'.
Press 'OK', and you're done.
That's all the information I can give at the moment, because I'm writing this off the top of my head.
Thank you for your time.
In case you don't see a question in this:
It would be great if someone could take a look at the code to see what's wrong with it, or if there's another way to do it. I'm converting this code because I'm horrible with C++, and want to create a C# application with that code.
Ps: The code tags and such were a pain, so I'm sorry if the spacing etc. is a little messed up.
Your problem might be that .NET's char is Unicode, so some characters take up more than one byte, while your bitmask is only one byte long. The most significant byte will therefore be left unaltered.
I just tried your function and it seems ok:
class Program
{
    // OP's method: http://stackoverflow.com/questions/4815959
    public static byte[] CryptPacket(byte[] packet)
    {
        int paksize = packet.Length - 2;
        for (int i = 2; i < paksize; i++)
        {
            packet[i] = (byte)(0x61 ^ packet[i]);
        }
        return packet;
    }

    // http://stackoverflow.com/questions/321370 :)
    public static byte[] StringToByteArray(string hex)
    {
        return Enumerable.Range(0, hex.Length).
            Where(x => 0 == x % 2).
            Select(x => Convert.ToByte(hex.Substring(x, 2), 16)).
            ToArray();
    }

    static void Main(string[] args)
    {
        string hex = "0C 00 E2 66 65 47 4E 09 04 13 65 00".Replace(" ", "");
        byte[] input = StringToByteArray(hex);
        Console.WriteLine("Input: " + ASCIIEncoding.ASCII.GetString(input));
        byte[] output = CryptPacket(input);
        Console.WriteLine("Output: " + ASCIIEncoding.ASCII.GetString(output));
        Console.ReadLine();
    }
}
Console output:
Input: ...feGN.....
Output: ...../here..
(where '.' represents funny ascii characters)
It seems a bit smelly that your CryptPacket method is overwriting the initial array with the output values. And that irrelevant characters are not trimmed. But if you are trying to port something, I guess you should know what you are doing.
You could also consider trimming the input array to remove the unwanted bytes first, and then use a generic XOR method (like this ROT13-style one). Right now you have your own "specialized" version with the 2-byte offsets baked into the crypt function itself; instead, you could use something like:
public static byte[] CryptPacket(byte[] packet)
{
    // create a new instance
    byte[] output = new byte[packet.Length];
    // process ALL array items
    for (int i = 0; i < packet.Length; i++)
    {
        output[i] = (byte)(0x61 ^ packet[i]);
    }
    return output;
}
Here's an almost literal translation from C++ to C#, and it seems to work:
var packet = new byte[] {
    0x0C, 0x00, 0xE2, 0x66, 0x65, 0x47,
    0x4E, 0x09, 0x04, 0x13, 0x65, 0x00
};
CryptPacket(packet);
// displays "....../here." where "." represents an unprintable character
Console.WriteLine(Encoding.ASCII.GetString(packet));
// ...
void CryptPacket(byte[] packet)
{
    int paksize = (packet[0] | (packet[1] << 8)) - 2;
    for (int i = 2; i < paksize; i++)
    {
        packet[i] ^= 0x61;
    }
}
Consider the following unit test:
[TestMethod]
public void TestByteToString()
{
    var guid = new Guid("61772f3ae5de5f4a8577eb1003c5c054");
    var guidString = guid.ToString("n");
    var byteString = ToHexString(guid.ToByteArray());
    Assert.AreEqual(guidString, byteString);
}

private String ToHexString(Byte[] bytes)
{
    var hex = new StringBuilder(bytes.Length * 2);
    foreach (var b in bytes)
    {
        hex.AppendFormat("{0:x2}", b);
    }
    return hex.ToString();
}
Here's the result:
Assert.AreEqual failed. Expected:<61772f3ae5de5f4a8577eb1003c5c054>. Actual:<3a2f7761dee54a5f8577eb1003c5c054>.
Well, they are the same after the first 8 bytes. And those first 8 bytes contain the same values, just with groups of them in reverse order.
Basically, when created from the string, it's assumed to be in "big-endian" format: Highest byte to the left. However, when stored internally (on an Intel-ish machine), the bytes are ordered "little-endian": highest order byte to the right.
If you compare the results, you can see that the first three groups are reversed:
61 77 2f 3a e5 de 5f 4a 8577eb1003c5c054
3a 2f 77 61 de e5 4a 5f 8577eb1003c5c054
That's because in the GUID structure, these 3 groups are defined as DWORD and two WORDs rather than bytes:
{0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}}
so in memory, an Intel processor stores them in Little-endian order (the most significant byte the last).
A GUID is structured as follows:
int a
short b
short c
byte[8] d
So for the parts represented by a, b, and c, your code gets the bytes reversed; only the d part keeps the same order.
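For example, one way to make the hex string from ToByteArray() line up with ToString("n") is to reverse those first three groups by hand; a sketch (the helper name is just for illustration):
using System;
using System.Text;

static String ToGuidHexString(Guid guid)
{
    byte[] b = guid.ToByteArray();
    byte[] ordered =
    {
        b[3], b[2], b[1], b[0],     // int a, byte order reversed
        b[5], b[4],                 // short b, reversed
        b[7], b[6],                 // short c, reversed
        b[8], b[9], b[10], b[11],   // byte[8] d, unchanged
        b[12], b[13], b[14], b[15]
    };
    var hex = new StringBuilder(32);
    foreach (var x in ordered)
        hex.AppendFormat("{0:x2}", x);
    return hex.ToString();
}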
Make sure you run outside of the IDE. That is key.
-edit- I LOVE SLaks comment. "The amount of misinformation in these answers is staggering." :D
Calm down guys. Pretty much all of you were wrong. I DID make optimizations.
It turns out whatever optimizations I made weren't good enough.
I ran the code with GCC using gettimeofday (code pasted below), compiled with g++ -O2 file.cpp, and got slightly faster results than C#.
Maybe MS didn't implement the optimizations needed in this specific case, but after downloading and installing MinGW I tested it and found the speed to be nearly identical.
Justicle seems to be right. I could have sworn I used clock() on my PC to time it and found it was slower, but problem solved. C++ isn't almost twice as slow with the MS compiler.
When my friend informed me of this I couldn't believe it. So I took his code and put some timers on it.
Instead of Boo I used C#. I consistently got faster results in C#. Why? The .NET version took nearly half the time no matter what number I used.
C++ version (bad version):
#include <iostream>
#include <stdio.h>
#include <intrin.h>
#include <windows.h>
using namespace std;

int fib(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    __int64 time = 0xFFFFFFFF;
    while (1)
    {
        int n;
        //cin >> n;
        n = 41;
        if (n < 0) break;
        __int64 start = __rdtsc();
        int res = fib(n);
        __int64 end = __rdtsc();
        cout << res << endl;
        cout << (float)(end-start)/1000000<<endl;
        break;
    }
    return 0;
}
C++ version (better version):
#include <iostream>
#include <stdio.h>
#include <intrin.h>
#include <windows.h>
using namespace std;

int fib(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    __int64 time = 0xFFFFFFFF;
    while (1)
    {
        int n;
        //cin >> n;
        n = 41;
        if (n < 0) break;
        LARGE_INTEGER start, end, delta, freq;
        ::QueryPerformanceFrequency( &freq );
        ::QueryPerformanceCounter( &start );
        int res = fib(n);
        ::QueryPerformanceCounter( &end );
        delta.QuadPart = end.QuadPart - start.QuadPart;
        cout << res << endl;
        cout << ( delta.QuadPart * 1000 ) / freq.QuadPart <<endl;
        break;
    }
    return 0;
}
C# version:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using System.ComponentModel;
using System.Threading;
using System.IO;
using System.Diagnostics;

namespace fibCSTest
{
    class Program
    {
        static int fib(int n)
        {
            if (n < 2) return n;
            return fib(n - 1) + fib(n - 2);
        }

        static void Main(string[] args)
        {
            //var sw = new Stopwatch();
            //var timer = new PAB.HiPerfTimer();
            var timer = new Stopwatch();
            while (true)
            {
                int n;
                //cin >> n;
                n = 41;
                if (n < 0) break;
                timer.Start();
                int res = fib(n);
                timer.Stop();
                Console.WriteLine(res);
                Console.WriteLine(timer.ElapsedMilliseconds);
                break;
            }
        }
    }
}
GCC version:
#include <iostream>
#include <stdio.h>
#include <sys/time.h>
using namespace std;

int fib(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    timeval start, end;
    while (1)
    {
        int n;
        //cin >> n;
        n = 41;
        if (n < 0) break;
        gettimeofday(&start, 0);
        int res = fib(n);
        gettimeofday(&end, 0);
        int sec = end.tv_sec - start.tv_sec;
        int usec = end.tv_usec - start.tv_usec;
        cout << res << endl;
        cout << sec << " " << usec <<endl;
        break;
    }
    return 0;
}
EDIT: TL/DR version: CLR JIT will inline one level of recursion, MSVC 8 SP1 will not without #pragma inline_recursion(on). And you should run the C# version outside of a debugger to get the fully optimized JIT.
I got similar results to acidzombie24 with C# vs. C++ using VS 2008 SP1 on a Core2 Duo laptop running Vista plugged in with "high performance" power settings (~1600 ms vs. ~3800 ms). It's kind of tricky to see the optimized JIT'd C# code, but for x86 it boils down to this:
00000000 55 push ebp
00000001 8B EC mov ebp,esp
00000003 57 push edi
00000004 56 push esi
00000005 53 push ebx
00000006 8B F1 mov esi,ecx
00000008 83 FE 02 cmp esi,2
0000000b 7D 07 jge 00000014
0000000d 8B C6 mov eax,esi
0000000f 5B pop ebx
00000010 5E pop esi
00000011 5F pop edi
00000012 5D pop ebp
00000013 C3 ret
return fib(n - 1) + fib(n - 2);
00000014 8D 7E FF lea edi,[esi-1]
00000017 83 FF 02 cmp edi,2
0000001a 7D 04 jge 00000020
0000001c 8B DF mov ebx,edi
0000001e EB 19 jmp 00000039
00000020 8D 4F FF lea ecx,[edi-1]
00000023 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h]
00000029 8B D8 mov ebx,eax
0000002b 4F dec edi
0000002c 4F dec edi
0000002d 8B CF mov ecx,edi
0000002f FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h]
00000035 03 C3 add eax,ebx
00000037 8B D8 mov ebx,eax
00000039 4E dec esi
0000003a 4E dec esi
0000003b 83 FE 02 cmp esi,2
0000003e 7D 04 jge 00000044
00000040 8B D6 mov edx,esi
00000042 EB 19 jmp 0000005D
00000044 8D 4E FF lea ecx,[esi-1]
00000047 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h]
0000004d 8B F8 mov edi,eax
0000004f 4E dec esi
00000050 4E dec esi
00000051 8B CE mov ecx,esi
00000053 FF 15 F8 2F 12 00 call dword ptr ds:[00122FF8h]
00000059 03 C7 add eax,edi
0000005b 8B D0 mov edx,eax
0000005d 03 DA add ebx,edx
0000005f 8B C3 mov eax,ebx
00000061 5B pop ebx
00000062 5E pop esi
00000063 5F pop edi
00000064 5D pop ebp
00000065 C3 ret
In contrast to the C++ generated code (/Ox /Ob2 /Oi /Ot /Oy /GL /Gr):
int fib(int n)
{
00B31000 56 push esi
00B31001 8B F1 mov esi,ecx
if (n < 2) return n;
00B31003 83 FE 02 cmp esi,2
00B31006 7D 04 jge fib+0Ch (0B3100Ch)
00B31008 8B C6 mov eax,esi
00B3100A 5E pop esi
00B3100B C3 ret
00B3100C 57 push edi
return fib(n - 1) + fib(n - 2);
00B3100D 8D 4E FE lea ecx,[esi-2]
00B31010 E8 EB FF FF FF call fib (0B31000h)
00B31015 8D 4E FF lea ecx,[esi-1]
00B31018 8B F8 mov edi,eax
00B3101A E8 E1 FF FF FF call fib (0B31000h)
00B3101F 03 C7 add eax,edi
00B31021 5F pop edi
00B31022 5E pop esi
}
00B31023 C3 ret
The C# version basically inlines fib(n-1) and fib(n-2). For a function that is so call heavy, reducing the number of function calls is the key to speed. Replacing fib with the following:
int fib(int n);

int fib2(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int fib(int n)
{
    if (n < 2) return n;
    return fib2(n - 1) + fib2(n - 2);
}
Gets it down to ~1900 ms. Incidentally, if I use #pragma inline_recursion(on) I get similar results with the original fib. Unrolling it one more level:
int fib(int n);

int fib3(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int fib2(int n)
{
    if (n < 2) return n;
    return fib3(n - 1) + fib3(n - 2);
}

int fib(int n)
{
    if (n < 2) return n;
    return fib2(n - 1) + fib2(n - 2);
}
Gets it down to ~1380 ms. Beyond that it tapers off.
So it appears that the CLR JIT for my machine will inline recursive calls one level, whereas the C++ compiler will not do that by default.
If only all performance critical code were like fib!
EDIT:
While the original C++ timing is wrong (comparing cycles to milliseconds), better timing does show C# is faster with vanilla compiler settings.
OK, enough random speculation, time for some science. After getting weird results with existing C++ code, I just tried running:
#include <iostream>
#include <windows.h>
using namespace std;

int fib(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    __int64 time = 0xFFFFFFFF;
    while (1)
    {
        int n;
        //cin >> n;
        n = 41;
        if (n < 0) break;
        LARGE_INTEGER start, end, delta, freq;
        ::QueryPerformanceFrequency( &freq );
        ::QueryPerformanceCounter( &start );
        int res = fib(n);
        ::QueryPerformanceCounter( &end );
        delta.QuadPart = end.QuadPart - start.QuadPart;
        cout << res << endl;
        cout << ( delta.QuadPart * 1000 ) / freq.QuadPart <<endl;
        break;
    }
    return 0;
}
EDIT:
MSN pointed out you should time C# outside the debugger, so I re-ran everything:
Best Results (VC2008, running release build from commandline, no special options enabled)
C++ Original Code - 10239
C++ QPF - 3427
C# - 2166 (was 4700 in debugger).
The original C++ code (with rdtsc) wasn't returning milliseconds, just a scaled count of clock cycles, so comparing it directly to Stopwatch results is invalid. The original timing code is just wrong.
Note StopWatch() uses QueryPerformance* calls:
http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx
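A quick way to confirm that on your own machine is to check Stopwatch's static fields:
using System;
using System.Diagnostics;

class StopwatchInfo
{
    static void Main()
    {
        // True when Stopwatch is backed by the high-resolution performance counter.
        Console.WriteLine("High resolution: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Frequency: " + Stopwatch.Frequency + " ticks/sec");
    }
}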
So in this case C++ is faster than C#.
It depends on your compiler settings - see MSN's answer.
I don't understand the answers about garbage collection or console buffering.
It could be that your timer mechanism in C++ is inherently flawed.
According to http://en.wikipedia.org/wiki/Rdtsc, it is possible that you get wrong benchmark results.
Quoted:
While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
I think the problem is your timing code in C++.
From the MS docs for __rdtsc:
Generates the rdtsc instruction, which returns the processor time stamp.
The processor time stamp records the number of clock cycles since the last reset.
Perhaps try GetTickCount().
Not saying that's the issue, but you may want to read How to: Use the High-Resolution Timer
Also see this...
http://en.wikipedia.org/wiki/Comparison_of_Java_and_C%2B%2B#Performance
Several studies of mostly numerical benchmarks argue that Java could potentially be faster than C++ in some circumstances, for a variety of reasons:[8][9]
Pointers make optimization difficult since they may point to arbitrary data, though many C++ compilers provide the C99 keyword restrict which corrects this problem.[10]
Compared to C++ implementations which make unrestrained use of standard implementations of malloc/new for memory allocation, implementations of Java garbage collection may have better cache coherence as its allocations are generally made sequentially.
Run-time compilation can potentially use additional information available at run-time to optimise code more effectively, such as knowing what processor the code will be executed on.
It's about Java but begins to tackle the issue of Performance between C runtimes and JITed runtimes.
Maybe C# is able to unroll the stack in recursive calls? I think that also reduces the number of computations.
One important thing to remember when comparing languages is that if you do a simple line-by-line translation, you're not comparing apples to apples.
What makes sense in one language may have horrible side effects in another. To really compare the performance characteristics you need a C# version and a C++ version, and the code for those versions may be very different. For example, in C# I wouldn't even use the same function signature. I'd go with something more like this:
IEnumerable<int> Fibonacci()
{
    int n1 = 0;
    int n2 = 1;
    // yield fib(0) and fib(1) first so that Skip(n).First() lines up with fib(n)
    yield return 0;
    yield return 1;
    while (true)
    {
        int n = n1 + n2;
        n1 = n2;
        n2 = n;
        yield return n;
    }
}
and then wrap that like this:
public static int fib(int n)
{
    return Fibonacci().Skip(n).First();
}
That will do much better, because it works from the bottom up to take advantage of the calculations in the last term to help build the next one, rather than two separate sets of recursive calls.
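For comparison, a plain loop doing the same bottom-up computation without the iterator machinery might look like this:
static int FibIterative(int n)
{
    int a = 0, b = 1;            // fib(0), fib(1)
    for (int i = 0; i < n; i++)
    {
        int next = a + b;        // advance the pair one step
        a = b;
        b = next;
    }
    return a;                    // fib(n)
}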
And if you really want screaming performance in C++ you can use meta-programming to make the compiler pre-compute your results like this:
template<int N> struct fibonacci
{
    static const int value = fibonacci<N - 1>::value + fibonacci<N - 2>::value;
};

template<> struct fibonacci<1>
{
    static const int value = 1;
};

template<> struct fibonacci<0>
{
    static const int value = 0;
};
It could be that the methods are pre-JITted at runtime prior to running the test, or that Console is a wrapper around the API for outputting to the console while C++'s cout is buffered... I guess.
Hope this helps,
Best regards,
Tom.
You are calling a static function in the C# code, which will be inlined, and in C++ you use a non-static function. I get ~1.4 sec for C++; with g++ -O3 you can get 1.21 sec.
You just can't compare C# with C++ using badly translated code.
If that code is truly 1/2 the execution time then some possible reasons are:
Garbage collection speeds up execution of C# code over C++ code if that were happening anywhere in the above code.
The C# writing to the console may be buffered (C++ might not, or it might just not be as efficient)
Speculation 1
Garbage collection procedure might play a role.
In the C++ version all memory management would occur inline while the program is running, and that would count into the final time.
In .NET, the Garbage Collector (GC) of the Common Language Runtime (CLR) runs on a separate thread within the same process and often cleans up your program after it's completed. Therefore your program will finish and the times will print out before memory is freed, especially for small programs, which usually won't be cleaned up at all until completion.
It all depends on details of the Garbage Collection implementation (and if it optimizes for the stack in the same way as the heap) but I assume this plays a partial role in the speed gains. If the C++ version was also optimized to not deallocate/clean up memory until after it finished (or push that step until after the program completed) then I'm sure you would see C++ speed gains.
To Test GC: To see the "delayed" .NET GC behaviour in action, put a breakpoint in some of your object's destructor/finalizer methods. The debugger will come alive and hit those breakpoints after the program is completed (yes, after Main is completed).
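A minimal sketch of that test (keeping in mind that finalizers are not guaranteed to run at all, so this only shows the timing when they do):
using System;

class Tracked
{
    ~Tracked()
    {
        // Runs when the GC finalizes the object, often only at process shutdown.
        Console.WriteLine("Tracked finalized");
    }
}

class Program
{
    static void Main()
    {
        new Tracked();
        Console.WriteLine("Main finished");
        // Expected order: "Main finished" first, then (possibly)
        // "Tracked finalized" once pending finalizers run on exit.
    }
}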
Speculation 2
Otherwise, the C# source code is compiled by the programmer down to IL code (Microsoft byte code instructions), and at runtime those are in turn compiled by the CLR's Just-In-Time compiler into a processor-specific instruction set (as with classic compiled programs), so there's really no reason a .NET program should be slower once it gets going and has run the first time.
I think everyone here has missed the "secret ingredient" that makes all the difference: The JIT compiler knows exactly what the target architecture is, whereas a static compiler does not. Different x86 processors have very different architectures and pipelines, so a sequence of instructions that is the fastest possible on one CPU might be relatively slower on another.
In this case the Microsoft C++ compiler's optimization strategy was targeted to a different processor than the CPU acidzombie24 was actually using, but gcc chose instructions more suited to his CPU. On a newer, older, or different-manufacturer CPU it is likely Microsoft C++ would be faster than gcc.
JIT has the best potential of all: Since it knows exactly what CPU is being targeted it has the ability to generate the very best possible code in every situation. Thus C# is inherently (in the long term) likely to be faster than C++ for such code.
Having said this, I would guess that the fact that CLR's JIT picked a better instruction sequence than Microsoft C++ was more a matter of luck than knowing the architecture. This is evidenced by the fact that on Justicle's CPU the Microsoft C++ compiler selected a better instruction sequence than the CLR JIT compiler.
A note on _rdtsc vs QueryPerformanceCounter: Yes _rdtsc is broken, but when you're talking a 3-4 second operation and running it several times to validate consistent timing, any situation that causes _rdtsc to give bogus timings (such as processor speed changes or processor changes) should cause outlying values in the test data that will be thrown out, so assuming acidzombie24 did his original benchmarks properly I doubt the _rdtsc vs QueryPerformanceCounter question really had any impact.
I know that the .NET compiler has an Intel optimization.