Get Calling Object's HashCode? - c#

This might be a duplicate, but I haven't seen this exact question or a similar one asked/answered with a date newer than the release of .NET 4.
I'm looking for a temporary hack that allows me to look through the call stack and get all calling objects (not methods, but the instances that hold the methods) in the stack. Ultimately I need their hashcodes.
Is this possible?
EDIT:
Whether it came across in my question or not, I was really asking if there was a simple/built-in way to do this. Really, it's just a stop-gap fix until I can make breaking changes to other parts of the system. Thanks for the great answers. After seeing them, I think I'll wait . . . :)

What are you trying to achieve here?
Have a look at a similar question I answered about a month ago: How to get current value of EIP in managed code?. You might get some inspiration from that. Or you might decide it is too ugly (+1 for the latter).
If all you want to do is assemble 'unique' call paths within a program session, go right ahead: I'd definitely use an AOP weaver and thread-local storage. It wouldn't be too hard that way.
Caveat 1: Hashes are not very useful for generic .NET objects
A random object's hashcode may vary with its location on the heap to begin with. For reference: on Mono, with the moving heap allocator disabled, Object::GetHash is this pretty blob of code (mono/metadata/monitor.c):
#else
/*
* Wang's address-based hash function:
* http://www.concentric.net/~Ttwang/tech/addrhash.htm
*/
return (GPOINTER_TO_UINT (obj) >> MONO_OBJECT_ALIGNMENT_SHIFT) * 2654435761u;
#endif
Of course, with the moving allocator things are slightly more complex to guarantee a constant hash over the lifetime of the object, but you get the point: each runtime will generate different hashes, and the number of allocations done will alter the future default hash codes of otherwise identical objects.
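If you did go down this road, you would at least want to supply deterministic, content-based hash functions yourself. A minimal sketch (the Order type and its fields are hypothetical):

using System;

public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }

    // A deterministic, content-based hash. The default object.GetHashCode
    // is identity-based and can differ across runtimes and runs, as the
    // Mono snippet above shows. (Overriding Equals to match is advisable.)
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Id;
            hash = hash * 31 + (Customer != null ? Customer.GetHashCode() : 0);
            return hash;
        }
    }
}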
Caveat 2: Your stack will contain alien frames
Even if you got that part fixed by supplying proper deterministic hash functions, you will require each stack frame to be of 'recognizable' type. This is probably not going to be the case. Certainly not if you use anything akin to LINQ, anonymous types, static constructors, or delegates; all kinds of things could be interleaving stack frames with those of (anonymous) helper types, or even performance trampolines invented by the JIT compiler to optimize tail recursion, a large switch jump table, or sharing code between multiple overloads.
Takeaway: stack analysis is hard: you should definitely use the proper API if you are going to undertake it.
Conclusion:
By all means, have a ball. But heed the advice:
Your requirements are non-standard (underlined by the runtime library not supporting them). This is usually a sign that you are solving a unique problem (but reconsider the tool chosen?) or that you are solving it the wrong way.
You could perhaps get a lot more info by generating a flow graph with some handwritten simulation code instead of trying to hook into the CLR VM.
If you're going to do it, use the proper API (probably the profiler API, since a sampling profiler will save exactly this: stack 'fingerprints' every so-many instructions).
If you really must do it by instrumenting your code, consider using AOP.

You can get the call stack by creating an instance of the StackTrace class and inspecting the StackFrame objects within it. Looking at the member list, this doesn't seem to reveal the instances, though, just the classes and methods.
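For illustration, here is a minimal sketch of walking the stack with StackTrace. Note that it yields methods and their declaring types only, not the instances, which is exactly the limitation described above:

using System;
using System.Diagnostics;

class StackWalkDemo
{
    static void Inner()
    {
        // Capture the current call stack; each frame exposes the method
        // and its declaring type, but not the object it was called on.
        var trace = new StackTrace();
        foreach (StackFrame frame in trace.GetFrames())
        {
            var method = frame.GetMethod();
            Console.WriteLine("{0}.{1}", method.DeclaringType, method.Name);
        }
    }

    static void Outer() { Inner(); }
    static void Main() { Outer(); }
}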

This is possible only with unmanaged APIs, specifically the CLR profiling API. I know nothing about it, other than that it is used to implement profiling and debugging tools. You'll have to google it and be comfortable with burning a week bringing it to production. If at all possible, give up your plan and find an alternative. Tell us what you want to do and we can help!

Try Environment.StackTrace.


C# classes - Why so many static methods?

I'm pretty new to C# so bear with me.
One of the first things I noticed about C# is that many of the classes are static method heavy. For example...
Why is it:
Array.ForEach(arr, proc)
instead of:
arr.ForEach(proc)
And why is it:
Array.Sort(arr)
instead of:
arr.Sort()
Feel free to point me to some FAQ on the net. If a detailed answer is in some book somewhere, I'd welcome a pointer to that as well. I'm looking for the definitive answer on this, but your speculation is welcome.
Because those are utility classes. The class construction is just a way to group them together, considering there are no free functions in C#.
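Note that since C# 3.0 you can recover the instance-call syntax the question asks about with an extension method, which still compiles down to a static call. A quick sketch (ArrayExtensions is a made-up name):

using System;

static class ArrayExtensions
{
    // Extension method: gives instance-style syntax over the
    // existing static Array.ForEach helper.
    public static void ForEach<T>(this T[] array, Action<T> action)
    {
        Array.ForEach(array, action);
    }
}

class Demo
{
    static void Main()
    {
        int[] arr = { 1, 2, 3 };
        arr.ForEach(x => Console.WriteLine(x)); // instance-style call
    }
}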
Assuming this answer is correct, instance methods require additional space in a "method table." Making array methods static may have been an early space-saving decision.
This, along with avoiding the this pointer check that Amitd references, could provide significant performance gains for something as ubiquitous as arrays.
Also see this rule from FxCop:
CA1822: Mark members as static
Rule Description
Members that do not access instance data or call instance methods can
be marked as static (Shared in Visual Basic). After you mark the
methods as static, the compiler will emit nonvirtual call sites to
these members. Emitting nonvirtual call sites will prevent a check at
runtime for each call that makes sure that the current object pointer
is non-null. This can achieve a measurable performance gain for
performance-sensitive code. In some cases, the failure to access the
current object instance represents a correctness issue.
Perceived functionality.
"Utility" functions are unlike much of the functionality OO is meant to target.
Think about the case with collections, I/O, math and just about all utility.
With OO you generally model your domain. None of those things really fit in your domain--it's not like you are coding and go "Oh, we need to order a new hashtable, ours is getting full". Utility stuff often just doesn't fit.
We get pretty close, but it's still not very OO to pass around collections (where is your business logic? where do you put the methods that manipulate your collection and that other little piece or two of data you are always passing around with it?)
Same with numbers and math. It's kind of tough to have Integer.sqrt() and Long.sqrt() and Float.sqrt()--it just doesn't make sense, nor does "new Math().sqrt()". There are a lot of areas it just doesn't model well. If you are looking for mathematical modeling then OO may not be your best bet. (I made a pretty complete "Complex" and "Matrix" class in Java and made them fairly OO, but making them really taught me some of the limits of OO and Java--I ended up "Using" the classes from Groovy mostly)
I've never seen anything anywhere NEAR as good as OO for modeling your business logic, being able to demonstrate the connections between code and managing your relationship between data and code though.
So we fall back on a different model when it makes more sense to do so.
The classic motivations against static:
Hard to test
Not thread-safe
Increases code size in memory
1) C# has several tools available that make testing static methods relatively easy. A comparison of C# mocking tools, some of which support static mocking: https://stackoverflow.com/questions/64242/rhino-mocks-typemock-moq-or-nmock-which-one-do-you-use-and-why
2) There are well-known, performant ways to do static object creation/logic without losing thread safety in C# (a sketch follows below). For example, implementing the Singleton pattern with a static class in C# (you can jump to the fifth method if the inadequate options bore you): http://www.yoda.arachsys.com/csharp/singleton.html
3) As @K-ballo mentions, every method contributes to code size in memory in C#, rather than instance methods getting special treatment.
That said, the two specific examples you pointed out are just a matter of legacy support for the static Array class from before generics and other code sugar were introduced back in the C# 1.0 days, as @Inerdia said. I tried to answer assuming you had more code in mind, possibly including outside libraries.
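To illustrate point 2 above, here is a minimal sketch of a thread-safe static singleton, one of the simpler variants discussed on the linked page (Config is a made-up name):

public sealed class Config
{
    // The CLR runs the type initializer exactly once, so this is
    // thread-safe without any explicit locking.
    private static readonly Config instance = new Config();

    public static Config Instance
    {
        get { return instance; }
    }

    private Config() { }
}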
The Array class isn't generic and can't be made fully generic because this would break backwards compatibility. There's some sort of magic going on where arrays implement IList<T>, but that's only for single-dimension arrays with a lower bound of 0 – "list-ish" arrays.
I'm guessing the static methods are the only way to add generic methods that work over any shape of array regardless of whether it qualifies for the above-mentioned compiler magic.
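A small demonstration of both points, using only BCL calls:

using System;
using System.Collections.Generic;

class ArrayDemo
{
    static void Main()
    {
        int[] numbers = { 3, 1, 2 };

        // Single-dimension, zero-based arrays implement IList<T>
        // through runtime magic rather than ordinary source code.
        IList<int> asList = numbers;
        Console.WriteLine(asList.Contains(2));          // True

        // The static generic method works over any element type,
        // even though the Array class itself is not generic.
        Array.Sort(numbers);
        Console.WriteLine(string.Join(", ", numbers));  // 1, 2, 3
    }
}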

How to identify what state variables are read/written in a given method in C#

What is the simplest way to identify if a given method is reading or writing a member variable or property?
I am writing a tool to assist in an RPC system, in which access to remote objects is expensive. Being able to detect that a given object is not used in a method would allow us to avoid serializing its state. Doing it on source code is perfectly reasonable (but being able to do it on compiled code would be amazing).
I think I can either write my own simple parser, or I can try to use one of the existing C# parsers and work with the AST. I am not sure if it is possible to do this with assemblies using reflection. Are there any other ways? What would be the simplest?
EDIT: Thanks for all the quick replies. Let me give some more information to make the question clearer. I definitely prefer correct, but it shouldn't be extremely complex. What I mean is that we can't go too far checking for extremes or impossibilities (such as the passed-in delegates that were mentioned, which is a great point). It would be enough to detect those cases, assume everything could be used, and not optimize there. I would assume that those cases are relatively uncommon.
The idea is for this tool to be handed to developers outside our team, who should not have to be concerned about this optimization. The tool takes their code and generates proxies for our own RPC protocol. (We are using protobuf-net for serialization only, with neither WCF nor .NET Remoting.) For this reason, anything we use has to be free, or we wouldn't be able to deploy the tool because of licensing issues.
You can have simple or you can have correct - which do you prefer?
The simplest way would be to parse the class and the method body. Then identify the set of tokens which are properties and field names of the class. The subset of those tokens which appears in the method body are the properties and field names you care about.
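A sketch of that trivial token-based matching, knowingly naive (all names are hypothetical):

using System;
using System.Linq;
using System.Text.RegularExpressions;

class NaiveFieldUsage
{
    static void Main()
    {
        // Tokens gathered from parsing the class declaration.
        string[] fieldNames = { "Length", "counter" };
        string methodBody = "void M() { int x = \"\".Length; }";

        // Flag any field name that appears as a token in the body.
        var used = fieldNames.Where(f =>
            Regex.IsMatch(methodBody, @"\b" + Regex.Escape(f) + @"\b"));

        // Prints "Length" - a false positive, as explained below.
        Console.WriteLine(string.Join(", ", used));
    }
}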
This trivial analysis of course is not correct. If you had
class C
{
    int Length;
    void M() { int x = "".Length; }
}
Then you would incorrectly conclude that M references C.Length. That's a false positive.
The correct way to do it is to write a full C# compiler, and use the output of its semantic analyzer to answer your question. That's how the IDE implements features like "go to definition".
Before attempting to write this kind of logic yourself, I would check to see if you can leverage NDepend to meet your needs.
NDepend is a code dependency analysis tool ... and much more. It implements a sophisticated analyzer for examining relationships between code constructs and should be able to answer that question. It also operates on both source and IL, if I'm not mistaken.
NDepend exposes CQL - Code Query Language - which allows you to write SQL-like queries against the relationships between structures in your code. NDepend has some support for scripting and is capable of being integrated with your build process.
To complete LBushkin's answer on NDepend (disclaimer: I am one of the developers of this tool), NDepend can indeed help you with that. The Code Query over LINQ (CQLinq) rule below actually matches methods that:
shouldn't provoke any RPC calls, but
are reading/writing any fields of any RPC types,
or are reading/writing any properties of any RPC types.
Notice how we first define the 4 sets: typesRPC, fieldsRPC, propertiesRPC, methodsThatShouldntUseRPC - and then we match the methods that violate the rule. Of course this CQLinq rule needs to be adapted to match your own typesRPC and methodsThatShouldntUseRPC:
warnif count > 0
// First define which types are RPC types
let typesRPC = Types.WithNameIn("MyRpcClass1", "MyRpcClass2")
// Define instance fields of RPC types
let fieldsRPC = typesRPC.ChildFields()
                        .Where(f => !f.IsStatic).ToHashSet()
// Define instance property getters and setters of RPC types
let propertiesRPC = typesRPC.ChildMethods()
                            .Where(m => !m.IsStatic && (m.IsPropertyGetter || m.IsPropertySetter))
                            .ToHashSet()
// Define methods that shouldn't provoke RPC calls
let methodsThatShouldntUseRPC =
    Application.Methods.Where(m => m.NameLike("XYZ"))
// Match methods that shouldn't do any RPC calls
// but that use any RPC fields (reading or writing) or properties
from m in methodsThatShouldntUseRPC.UsingAny(fieldsRPC).Union(
          methodsThatShouldntUseRPC.UsingAny(propertiesRPC))
let fieldsRPCUsed = m.FieldsUsed.Intersect(fieldsRPC)
let propertiesRPCUsed = m.MethodsCalled.Intersect(propertiesRPC)
select new { m, fieldsRPCUsed, propertiesRPCUsed }
My intuition is that detecting which member variables will be accessed is the wrong approach. My first guess at a way to do this would be to just request serialized objects on an as-needed basis (preferably at the beginning of whatever function needs them, not piecemeal). Note that TCP/IP (i.e. Nagle's algorithm) should batch these requests together if they are made in rapid succession and are small.
Eric has it right: to do this well, you need what amounts to a compiler front end. What he didn't emphasize enough is the need for strong flow-analysis capabilities (or a willingness to accept very conservative answers, possibly alleviated by user annotations). Maybe he meant that by the phrase "semantic analysis", although his example of "go to definition" just needs a symbol table, not flow analysis.
A plain C# parser could only be used to get very conservative answers (e.g., if method A in class C contains identifier X, assume it reads class member X; if A contains no calls then you know it can't read member X).
The first step beyond this is having a compiler's symbol table and type information (if method A refers to class member X directly, then assume it reads member X; if A contains *no* calls and mentions identifier X only in the context of accesses to objects which are not of this class type, then you know it can't read member X). You have to worry about qualified references, too: Q.X may read member X if Q is compatible with C.
The sticky point is calls, which can hide arbitrary actions. An analysis based on just parsing and symbol tables could determine that if there are calls, the arguments refer only to constants or to objects which are not of the class which A might represent (possibly inherited).
If you find an argument that has a C-compatible class type, now you have to determine whether that argument can be bound to this, requiring control and data flow analysis:
method A()
{
    Object q = this;
    ...
    ... q = that; ...
    ...
    foo(q);
}
foo might hide an access to X. So you need two things: flow analysis to determine whether the initial assignment to q can reach the call foo (it might not; q=that may dominate all calls to foo), and call graph analysis to determine what methods foo might actually invoke, so that you can analyze those for accesses to member X.
You can decide how far you want to go with this by simply making the conservative assumption "A reads X" any time you don't have enough information to prove otherwise. This will give you a "safe" answer (if not a "correct" one, or what I'd prefer to call "precise").
Of frameworks that might be helpful, you might consider Mono, which surely parses and builds symbol tables. I don't know what support it provides for flow analysis or call graph extraction; I would not expect the Mono-to-IL front-end compiler to do a lot of that, as people usually hide that machinery in the JIT part of JIT-based systems. A downside is that Mono may be behind the "modern C#" curve; last time I heard, it handled only C# 2.0 but my information may be stale.
An alternative is our DMS Software Reengineering Toolkit and its C# Front End.
(Not an open source product).
DMS provides general source code parsing, tree building/inspection/analysis, general symbol table support, and built-in machinery for implementing control-flow analysis, data-flow analysis, points-to analysis (needed for "What does object O point to?"), and call graph construction. This machinery has all been tested by fire with DMS's Java and C front ends, and the symbol table support has been used to implement full C++ name and type resolution, so it's pretty effective. (You don't want to underestimate the work it takes to build all that machinery; we've been working on DMS since 1995.)
The C# Front End provides for full C# 4.0 parsing and full tree building. It presently does not build symbol tables for C# (we're working on this) and that's a shortcoming compared to Mono. With such a symbol table, however, you would have access to all that flow analysis machinery (which has been tested with DMS's Java and C front ends) and that might be a big step up from Mono if it doesn't provide that.
If you want to do this well, you have a considerable amount of work in front of you. If you want to stick with "simple", you'll have to make do with just parsing the tree and accept being very conservative.
You didn't say much about knowing whether a method writes to a member. If you are going to minimize traffic the way you describe, you want to distinguish the "read", "write" and "update" cases and optimize messages in both directions. The analysis is obviously pretty similar for the various cases.
Finally, you might consider processing MSIL directly to get the information you need; you'll still have the flow analysis and conservative-analysis issues. You might find the following technical paper interesting; it describes a fully distributed Java object system that has to do the same basic analysis you want to do, and does so, IIRC, by analyzing class files and doing massive bytecode rewriting:
Java Orchestra System
By RPC do you mean .NET Remoting? Or DCOM? Or WCF?
All of these offer the opportunity to monitor cross process communication and serialization via sinks and other constructs, but they are all platform specific, so you'll need to specify the platform...
You could listen for an event that a property is being read/written to, with an interface similar to INotifyPropertyChanged (although you obviously won't know which method performed the read/write).
I think the best you can do is explicitly maintain a dirty flag.
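A minimal sketch of that dirty-flag idea (all names hypothetical): route every write through a property setter so the RPC layer can check whether re-serialization is needed.

public class RemoteState
{
    private int counter;

    // Set whenever a tracked property is written; the RPC layer
    // can test and reset it to decide whether to re-serialize.
    public bool IsDirty { get; private set; }

    public int Counter
    {
        get { return counter; }
        set { counter = value; IsDirty = true; }
    }

    public void MarkClean()
    {
        IsDirty = false;
    }
}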

C# running faster than C++?

A friend and I have written an encryption module and we want to port it to multiple languages so that it's not platform-specific encryption. Originally written in C#, I've ported it into C++ and Java. C# and Java will both encrypt at about 40 MB/s, but C++ will only encrypt at about 20 MB/s. Why is C++ running this much slower? Is it because I'm using Visual C++?
What can I do to speed up my code? Is there a different compiler that will optimize C++ better?
I've already tried optimizing the code itself, such as using x >> 3 instead of x / 8 (integer division), or y & 63 instead of y % 64, and other techniques. How can I build the project differently so that it is more performant in C++?
EDIT:
I must admit that I have not looked into how the compiler optimizes code. I have classes that I will be taking here in college that are dedicated to learning about compilers and interpreters.
As for my code in C++, it's not very complicated. There are NO includes; there is "basic" math along with something we call "state jumping" to produce pseudo-random results. The most complicated things we do are the bitwise operations that actually do the encryption and unchecked multiplication during an initial hashing phase. There are dynamically allocated 2D arrays which stay alive through the lifetime of the Encryption object (and are properly released in a destructor). There are only 180 lines in this. OK, so my micro-optimizations aren't necessary, but I believe they aren't the problem either. To really drill the point in, here is the most complicated line of code in the program:
input[L + offset] ^= state[state[SIndex ^ 255] & 63];
I'm not moving arrays, or working with objects.
Syntactically the entire set of code runs perfectly, and it'll work seamlessly if I encrypt something with C# and decrypt it with C++ or Java; all 3 languages interact as you'd expect them to.
I don't necessarily expect C++ to run faster than C# or Java (which are within 1 MB/s of each other), but I'm sure there's a way to make C++ run just as fast, or at least faster than it does now. I admit I'm not a C++ expert, and I'm certainly not as seasoned in it as many of you seem to be, but if I can cut and paste 99% of the code from C# to C++ and get it to work in 5 minutes, then I'm a little put out that it takes twice as long to execute.
RE-EDIT:
I found an optimization in Visual Studio I forgot to set before. Now C++ is running 50% faster than C#. Thanks for all the tips; I've learned a lot about compilers in my research.
Without source code it's difficult to say anything about the performance of your encryption algorithm/program.
I reckon, though, that you made a "mistake" while porting it to C++, meaning that you used it in an inefficient way (e.g. lots of copying of objects). Maybe you also used VC6, whereas VC9 would/could produce much better code.
As for the "x >> 3" optimization... modern compilers do convert integer division to bitshifts by themselves. Needless to say that this optimization may not be the bottleneck of your program at all. You should profile it first to find out where you're spending most of your time :)
The question is extremely broad. Something that's efficient in C# may not be efficient in C++, and vice versa.
You're making micro-optimisations, but you need to examine the overall design of your solution to make sure that it makes sense in C++. It may be a good idea to re-design large parts of your solution so that it works better in C++.
As with all things performance related, profile the code first, then modify, then profile again. Repeat until you've got to an acceptable level of performance.
Things that are 'relatively' fast in C# may be extremely slow in C++.
You can write 'faster' code in C++, but you can also write much slower code. Debug builds in particular can be extremely slow in C++. So look at the kind of optimizations your compiler performs.
When porting applications, C# programmers usually tend to use the 'create a million newed objects' approach, which really makes C++ programs slow. You would rewrite these algorithms to use pre-allocated arrays and run tight loops over them.
With pre-allocated memory you leverage the strengths of C++ in using pointers to memory, by casting these to the right POD-structured data.
But it really depends on what you have written in your code.
So measure your code, see where the implementation burns the most CPU, and then structure your code to use the right algorithms.
Your timing results are definitely not what I'd expect with well-written C++ and well-written C#. You're almost certainly writing inefficient C++. (Either that, or you're not compiling with the same sort of options. Make sure you're testing the release build, and check the optimization options.)
However, micro-optimizations, like you mention, are going to do effectively nothing to improve the performance. You're wasting your time doing things that the compiler will do for you.
Usually you start by looking at the algorithm, but in this case we know the algorithm isn't causing the performance issue. I'd advise using a profiler to see if you can find a big time sink, but it may not find anything different from in C# or Java.
I'd suggest looking at how C++ differs from Java and C#. One big thing is objects. In Java and C#, objects are represented in the same way as C++ pointers to objects, although it isn't obvious from the syntax.
If you're moving objects about in Java and C++, you're moving pointers in Java, which is quick, and objects in C++, which can be slow. Look for where you use medium or large objects. Are you putting them in container classes? Those classes move objects around. Change those to pointers (preferably smart pointers, like std::tr1::shared_ptr<>).
If you're not experienced in C++ (and an experienced and competent C++ programmer would be highly unlikely to be microoptimizing), try to find somebody who is. C++ is not a really simple language, having a lot more legacy baggage than Java or C#, and you could be missing quite a few things.
Free C++ profilers:
What's the best free C++ profiler for Windows?
"Porting" performance-critical code from one language to another is usually a bad idea. You tend not to use the target language (C++ in this case) to its full potential.
Some of the worst C++ code I've seen was ported from Java. There was "new" for almost everything - normal for Java, but a sure performance killer for C++.
You're usually better off not porting, but reimplementing the critical parts.
The main reason C#/Java programs do not translate well (assuming everything else is correct) is that C#/Java developers have not grokked the concept of objects and references correctly. Note that in C#/Java all objects are passed by (the equivalent of) a pointer.
class Message
{
    char buffer[10000];
};

Message Encrypt(Message message)   // Here you are making a copy of the message
{
    for (int loop = 0; loop < 10000; ++loop)
    {
        plop(message.buffer[loop]);
    }
    return message;                // Here you are making another copy of the message
}
To re-write this in a (more) C++ style you should probably be using references:
Message& Encrypt(Message& message)  // pass a reference to the message
{
    ...
    return message;                 // return the same reference
}
The second thing that C#/Java programmers have a hard time with is the lack of garbage collection. If you are not releasing memory correctly, you could start running low on memory and the C++ version will start thrashing. In C++ we generally allocate objects on the stack (i.e. no new). If the lifetime of the object extends beyond the current scope of the method/function, then we use new, but we always wrap the returned variable in a smart pointer (so that it will be correctly deleted).
void myFunc()
{
    Message m;
    // read message into m
    Encrypt(m);
}

void alternative()
{
    boost::shared_ptr<Message> m(new Message);
    EncryptUsingPointer(m);
}
Show your code. We can't tell you how to optimize your code if we don't know what it looks like.
You're absolutely wasting your time converting divisions by constants into shift operations. Those kinds of braindead transformations can be made even by the dumbest compiler.
Where you can gain performance is in optimizations that require information the compiler doesn't have. The compiler already knows that division by a power of two is equivalent to a right shift (for unsigned values; signed division needs only a small fix-up).
Apart from this, there is little reason to expect C++ to be faster. C++ is much more dependent on you writing good code. C# and Java will produce pretty efficient code almost no matter what you do. But in C++, just one or two missteps will cripple performance.
And honestly, if you expected C++ to be faster because it's "native" or "closer to the metal", you're about a decade too late. JIT'ed languages can be very efficient, and with one or two exceptions, there's no reason why they must be slower than a native language.
You might find these posts enlightening.
They show, in short, that yes, ultimately, C++ has the potential to be faster, but for the most part, unless you go to extremes to optimize your code, C# will be just as fast, or faster.
If you want your C++ code to compete with the C# version, then a few suggestions:
Enable optimizations (you've hopefully already done this)
Think carefully about how you do disk I/O (IOStreams isn't exactly an ideal library to use)
Profile your code to see what needs optimizing.
Understand your code. Study the assembler output, and see what can be done more efficiently.
Many common operations in C++ are surprisingly slow. Dynamic memory allocation is a prime example. It is almost free in C# or Java, but very costly in C++. Stack-allocation is your friend.
Understand your code's cache behavior. Is your data scattered all over the place? It shouldn't be a surprise then that your code is inefficient.
Totally off topic, but...
I found some info on the encryption module on the homepage you link to from your profile: http://www.coreyogburn.com/bigproject.html
(quote)
Put together by my buddy Karl Wessels and I, we believe we have quite a powerful new algorithm.
What separates our encryption from the many existing encryptions is that ours is both fast AND secure. Currently, it takes 5 seconds to encrypt 100 MB. It is estimated that it would take 4.25 * 10^143 years to decrypt it!
[...]
We're also looking into getting a copyright and eventual commercial release.
I don't want to discourage you, but getting encryption right is hard. Very hard.
I'm not saying it's impossible for a twenty-year-old web developer to create an encryption algorithm that outshines all existing algorithms, but it's extremely unlikely, and I'm very skeptical; I think most people would be.
Nobody who cares about encryption would use an algorithm that's unpublished. I'm not saying you have to open up your sourcecode, but the workings of the algorithm must be public, and scrutinized, if you want to be taken seriously...
There are areas where a language running on a VM outperforms C/C++, for example heap allocation of new objects. You can find more details here.
There is a somewhat old article in Dr. Dobb's Journal named Microbenchmarking C++, C#, and Java where you can see some actual benchmarks, and you will find that C# is sometimes faster than C++. One of the more extreme examples is the single hash map benchmark: .NET 1.1 is a clear winner at 126 and VC++ is far behind at 537.
Some people will not believe you if you claim that a language like C# can be faster than C++, but it actually can. However, using a profiler and the very high level of fine-grained control that C++ offers should enable you to rewrite your application to be very performant.
When serious about performance you might want to be serious about profiling.
Separately, the "string" object implementation used in C# Java and C++, is noticeably slower in C++.
There are some cases where VM-based languages such as C# or Java can be faster than a C++ version, at least if you don't put much work into optimization and don't have good knowledge of what is going on in the background. One reason is that the VM can optimize bytecode at runtime, figure out which parts of the program are used often, and change its optimization strategy accordingly. On the other hand, an old-fashioned compiler has to decide how to optimize the program at compile time and may not find the best solution.
The C# JIT probably noticed at run time that the CPU is capable of running some advanced instructions, and compiled to something better than what the C++ was compiled to.
You can probably (surely with enough effort) outperform this by compiling with the most sophisticated instructions available to the target CPU and using knowledge of the algorithm to tell the compiler to use SIMD instructions at specific stages.
But before any fancy changes to your code, make sure you are compiling your C++ for your CPU, not for something much more primitive (Pentium?).
Edit:
If your C++ program does a lot of unwise allocations and deallocations, this would also explain it.
In another thread, I pointed out that doing a direct translation from one language to another will almost always result in the version in the new language running more poorly.
Different languages take different techniques.
Try the Intel compiler. It's much better than VC or GCC. As for the original question, I would be skeptical. Try to avoid using any containers and minimize the memory allocations in the offending function.
[Joke]There is an error in line 13[/Joke]
Now, seriously, no one can answer the question without the source code.
But as a rule of thumb, the fact that the C++ version is that much slower than the managed ones most likely points to differences in memory management and object ownership.
For instance, if your algorithm does any dynamic memory allocation inside the processing loop, this will affect performance. If you pass heavy structures by value, this will affect performance. If you make unnecessary copies of objects, this will affect performance. Exception abuse will cause performance to go south. And the list goes on.
I know of cases where a forgotten "&" after a parameter name resulted in weeks of profiling/debugging:
void DoSomething(const HeavyStructure param); // Heavy structure will be copied
void DoSomething(const HeavyStructure& param); // No copy here
So, check your code to find possible bottlenecks.
C++ is not a language where you must use classes. In my opinion it's not logical to use OOP methodologies where they don't really help. For an encrypter/decrypter it's best not to use classes: use arrays and pointers, and use as few functions/classes/files as possible. The best encryption systems consist of a single file containing a few functions. After your function works nicely, you can wrap it into classes if you wish. Also check the release build; there is a huge speed difference.
Nothing is faster than good machine/assembly code, so my goal when writing C/C++ is to write my code in such a way that the compiler understands my intentions to generate good machine code. Inlining is my favorite way to do this.
First, here's an aside. Good machine code:
uses registers more often than memory
rarely branches (if/else, for, and while)
uses memory more often than function calls
rarely dynamically allocates any more memory (from the heap) than it already has
If you have a small class with very little code, then implement its methods in the body of the class definition and declare it locally (on the stack) when you use it. If the class is simple enough, then the compiler will often only generate a few instructions to effect its behavior, without any function calls or memory allocation to slow things down, just as if you had written the code all verbose and non-object oriented. I usually have assembly output turned on (/FAs /Fa with Visual C++) so I can check the output.
It's nice to have a language that allows you to write high-level, encapsulated object-oriented code and still translate into simple, pure, lightning fast machine code.
Here's my 2 cents.
I wrote a BlowFish cipher in C (and C#). The C# was almost 'identical' to the C.
How I compiled (I can't remember the exact numbers now, so these are just recalled ratios):
C native: 50
C managed: 15
C#: 10
As you can see, the native compilation outperforms any managed version. Why?
I am not 100% sure, but my C version compiled to very optimized assembly code; the assembler output looked almost the same as a hand-written assembler version I found.

C# - Default library has better performance?

Earlier today I made myself a lightweight memory stream, which basically writes to a byte array. I thought I'd benchmark the two of them to see if there's any difference - and there was:
(writing 1 byte to the array)
MemoryStream: 1.0001ms
mine: 3.0004ms
Everyone tells me that MemoryStream basically provides a byte array and a bunch of methods to work with it.
My question: Does the default C# library have slightly better performance than the code we write? (Maybe it runs in release rather than debug?)
The .NET implementation was probably a bit better than your own, but also, how did you benchmark? A couple of million iterations, or just a few? Remember that you need to use a large test base so that you can eliminate some data (CPU being called away for a moment, etc) that will give false results.
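For reference, a rough harness along those lines: warm up first so JIT compilation doesn't pollute the numbers, then time millions of iterations with Stopwatch (the iteration count here is arbitrary):

using System;
using System.Diagnostics;
using System.IO;

class StreamBenchmark
{
    static void Main()
    {
        const int Iterations = 10000000;

        // Warm-up pass so the JIT cost isn't included in the timing.
        using (var warmup = new MemoryStream())
            for (int i = 0; i < 1000; i++) warmup.WriteByte(42);

        var sw = Stopwatch.StartNew();
        using (var stream = new MemoryStream())
            for (int i = 0; i < Iterations; i++) stream.WriteByte(42);
        sw.Stop();

        Console.WriteLine("{0} writes in {1} ms", Iterations, sw.ElapsedMilliseconds);
    }
}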
The folks at Microsoft are much smarter than you and I and most likely have written a better optimized wrapper over Byte[], much better than something that you or I would implement.
If you are curious, I would suggest that you disassemble the types that you have recreated to see how exactly Microsoft has implemented them. In some of the more important areas of the framework (such as this I would imagine) you will find that the BCL calls out to unmanaged code to accomplish its goals.
Unmanaged code has a much better chance of outperforming managed code in cases like this since you can freely work with arrays without the overhead of a managed runtime (for things like bounds checking and such).
Many of the framework assemblies are NGENed, which may give them a small boost by bypassing the initial JIT time. This is unlikely to be the cause of a 2ms difference, especially if you'd already warmed up your methods before starting the stopwatch, but I mention it for completeness.
Also, yes, the framework assemblies are built in "release" mode (optimisations on and checks off), not "debug."
You probably used Array.Copy() instead of the faster Buffer.BlockCopy(). The fastest way is to use unsafe code with pointers. Check out how they do this in the Mono project (search for memcpy).
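For example, both of these are real BCL calls; Buffer.BlockCopy copies raw bytes between primitive-typed arrays and is typically the faster of the two for byte[] workloads like a memory stream:

using System;

class CopyDemo
{
    static void Main()
    {
        byte[] source = new byte[4096];
        byte[] dest = new byte[4096];

        // Works for any array type, with more per-call checking.
        Array.Copy(source, 0, dest, 0, source.Length);

        // Byte-level copy between primitive arrays; usually faster.
        Buffer.BlockCopy(source, 0, dest, 0, source.Length);
    }
}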
I'd wager that Microsoft's implementation is a wee bit better than yours. ;)
Did you check the source?

How does const correctness help write better programs?

This question is from a C# guy asking the C++ people. (I know a fair bit of C but only have some general knowledge of C++).
A lot of C++ developers who come to C# say they miss const correctness, which seems rather strange to me. In .NET, in order to disallow changing of things you need to create immutable objects or objects with read-only properties/fields, or to pass copies of objects (by default, structs are copied).
Is C++ const correctness a mechanism for creating immutable/read-only objects, or is there more to it? If the purpose is to create immutable/read-only objects, the same thing can be done in an environment like .NET.
A whole section is devoted to Const Correctness in the FAQ. Enjoy!
const correctness in C++ is at heart simply a way of saying exactly what you mean:
void foo( const sometype & t ) {
    ...
}
says "I will not change the object referred to by t, and if I try to, please Mr. Compiler warn me about it".
It is not a security feature or a way of (reliably) creating read-only objects, because it is trivial to subvert using C-style casts.
I bet this is a matter of mindsets. I work with long-time C++ people myself and have noticed they seem to think in C++ (of course they do!). It will take time for them to start seeing multiple answers to questions they've grown used to having 'stock' answers for, like const correctness. Give them time, and try to educate them gently.
Having said that, I do like the C++ 'const' way of making guarantees. It mainly promises the user of an interface that the object, the data behind the pointer, or whatever else won't be messed with. I would see this as the #1 value of const correctness. The #2 value is what it can allow the compiler in terms of optimization.
The problem is, of course, if the whole software stack hasn't been built with const in mind from the ground up.
I'm sure C# has its own set of paradigms to do the same. This shows why it's so important to "learn one new (computing or natural) language a year": simply to exercise one's mind in other ways of seeing the world, and solving its problems.
With const correctness, you can expose a mutable object as read-only without making a copy of it and wasting precious memory (especially relevant for collections). I know, there is this CPU-and-memory-are-cheap philosophy among desktop developers, but it still matters in the embedded world.
On top of that, once you add the complexities of memory ownership in C++, it is almost always better not to copy non-trivial objects that contain or reference other objects, hence the need for a way of exposing existing objects as read-only.
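For comparison, the closest .NET idiom wraps the existing collection rather than copying it. A small sketch (Container is a made-up name):

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Container
{
    private readonly List<int> items = new List<int>();

    // Exposes the same underlying list without copying it; callers
    // can read, but any attempt to modify the wrapper throws
    // NotSupportedException.
    public ReadOnlyCollection<int> Items
    {
        get { return items.AsReadOnly(); }
    }
}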
Scott Meyers's excellent book Effective C++ dedicates an item to this issue:
ITEM 3: Use const whenever possible.
It really enlightens you :-)
The real answer, even if it is really hard to grasp if you have not used it, is that you lose expressiveness. The language has concepts that you cannot possibly express in C#, and they have meaning; they are part of the design, but lost in the translation to code.
This is not an answer, but rather an example:
Consider that you have a container that is sorted on a field of the elements that are stored. Those objects are really big. You need to offer access to the data for readers (consider showing the information in the UI).
Now, in C#/Java, you can go one of two ways: either you make a copy for the caller to use (which guarantees that the data will not change, but is inefficient) or you return a reference to your internally held object (and just hope the caller will not change your data through its setters).
If the user gets the reference and changes through it the field that serves as index, then your container invariants are broken.
In C++ you can return a constant reference/pointer, and the compiler will disallow calling setter (mutating) methods on the instance you return. You get both the security that the user will not make changes (*) and efficiency in the call.
The third option, not mentioned before, is making the object immutable, but that disables changes everywhere. Perfectly controlled changes to the object will be disallowed, and the only way to make a change is to create a new element with the changes applied. That costs time and space.
C++ provides plenty of ways you can mess up your code. I see const correctness as a first, efficient way of putting constraints on the code, in order to control the "side" effects (unwanted changes).
Marking a parameter as const (usually a reference to a const object or a pointer to a const object) will ensure that the passed object can't be changed, so you have no "side" effect from that function/method.
Marking a method as const guarantees that the method will not change the state of the object it works on. If it does, the compiler will generate an error.
Marking a data member const ensures that it can only be initialized in the constructor's initialization list and can't be changed afterwards.
Also, smart compilers can use constness as a hint for various performance optimizations.
Of course, you can override constness. Constness is a compile-time check. You can't break the rules by default, but you can use casting (e.g. const_cast) to make a const object non-const so it can be changed.
Just enlisting the compiler's help when writing code would be enough for me to advocate const correctness. But today there is an additional advantage: multi-threaded code is generally easier to write when you know where your objects can change and where they cannot.
I think it's a pity C# doesn't support const correctness the way C++ does. 95% of all parameters I pass to a function are constant values; the const feature guarantees that data you pass by reference won't be modified after the call. The "const" keyword provides compiler-checked documentation, which is a great help in large programs.
