Good morning,
I am writing a spell checker which, in my case, is performance-critical. Since I am planning to connect to a DB and build the GUI in C#, I wrote an edit-distance calculation routine in C and compiled it to a DLL, which I call from C# using DllImport. The problem is that I think (though I may be wrong) that marshalling words one by one from String to char * is causing a lot of overhead. So I thought about using C++/CLI so that I can work with the .NET String type directly... My question, then, is how C++/CLI performance compares to native C code for heavy mathematical calculations and array access.
Thank you very much.
C++/CLI will have to do some kind of marshaling too.
Like all performance-related problems, you should measure and optimize. Are you sure C# is not going to be fast enough for your purposes? Don't underestimate the optimizations that the JIT compiler is going to do. Don't speculate on the overhead of a language implementation solely because it is managed, without trying it. If it's not enough, have you considered unsafe C# code (with pointers) before trying unmanaged code?
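To make the "measure first" advice concrete, here is a minimal sketch in C++ of timing an edit-distance routine over a word list. The asker's actual routine is not shown in the thread, so edit_distance below is a textbook two-row Levenshtein implementation standing in for it, and time_all_ms is a hypothetical helper name.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Textbook two-row Levenshtein distance (a stand-in for the asker's routine,
// which is not shown in the thread).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            // deletion, insertion, substitution
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: runs `rounds` passes over `words` and returns the
// elapsed wall-clock time in milliseconds. The sum of all distances is
// written to `checksum` so the compiler cannot discard the work.
double time_all_ms(const std::string& query,
                   const std::vector<std::string>& words,
                   int rounds,
                   std::size_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t sum = 0;
    for (int r = 0; r < rounds; ++r)
        for (const auto& w : words) sum += edit_distance(query, w);
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running the same word list through the C# version and through the DllImport path, and comparing the numbers, would show whether marshalling is really where the time goes.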
Regarding the performance profile of C++/CLI, it really depends on the way it's used. If you compile to managed code (CIL) with /clr:pure, it's not going to be very different from C#. Native C++ functions in C++/CLI will have similar performance characteristics to plain C++. Passing objects between native C++ and the CLI environment will have some overhead.
I would not expect the bottleneck to be DllImport.
I have written programs which call through DllImport several hundred times per second, and it works just fine. You will pay a performance penalty, but it is small.
Don't assume you know what needs to be optimized. Let sampling tell you.
I've done a couple spelling-correctors, and the way I did it (outlined here) was to organize the dictionary as a trie in memory, and search upon that.
If the number of words is large, the size of the trie can be much reduced by sharing common suffixes.
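As a rough illustration of the dictionary-as-trie idea, here is a minimal sketch in C++ with insertion and exact lookup only. The names TrieNode and Trie are hypothetical, and the suffix sharing mentioned above is not implemented; a spelling corrector would walk the same structure while tolerating a bounded number of edits.

```cpp
#include <map>
#include <memory>
#include <string>

// One node per character position; is_word marks the end of a dictionary word.
struct TrieNode {
    bool is_word = false;
    std::map<char, std::unique_ptr<TrieNode>> children;
};

class Trie {
public:
    // Adds a word, creating nodes along its path as needed.
    void insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->is_word = true;
    }

    // Exact lookup: true only if the whole word was inserted earlier.
    bool contains(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->is_word;
    }

private:
    TrieNode root_;
};
```

Words with a common prefix share nodes, so after inserting "car" and "cart" the lookup for "ca" fails (it is only a prefix) while both full words succeed.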
I have a large application written in C++. For various reasons I am going to rewrite it in C#. I have done plenty of Delphi to C# and VB to C# conversions, but never C++ to C#. Although I am competent in C++, I want this to be as smooth a conversion as possible.
Mainly, what I am asking is what pitfalls await me in this conversion. Are there any key areas I should be aware of, or any advice you can give me?
This article is quite good, but is there anything else I should be wary of?
http://msdn.microsoft.com/en-us/magazine/cc301520.aspx
Main pitfall: do not think it's an upgrade. These are DIFFERENT languages, and in many places you will need a completely different approach to the problems. So you should think of it as a reimplementation with minimal code reuse.
This article is decent.
I'd advise you to pay attention to the object lifecycle.
In C++ you destroy objects explicitly when you're done with them. In C# (.NET) you don't. An object may hold on to some important resource (file handle, database connection etc.) until the garbage collector gets around to it. If that is an issue, make use of the using statement, which calls Dispose() at the end of the block.
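For contrast, here is a small sketch of the C++ side of this point: a destructor that releases a resource at a known moment. ScopedResource and demo_lifetime are hypothetical names, and the "resource" is just a log entry. When porting, this is the behaviour you have to recreate in C# with IDisposable and using, because the garbage collector gives no such timing guarantee.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper: logs when the resource is acquired and released.
class ScopedResource {
public:
    ScopedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("open " + name_);
    }
    ~ScopedResource() { log_.push_back("close " + name_); }

    // Non-copyable, so the resource is released exactly once.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the sequence of events, showing that the release happens
// at the closing brace of the inner scope.
std::vector<std::string> demo_lifetime() {
    std::vector<std::string> log;
    {
        ScopedResource file("file", log);
        log.push_back("work");
    }  // destructor runs here
    log.push_back("after scope");
    return log;
}
```

The returned log reads "open file", "work", "close file", "after scope", in that order.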
You need to translate the spirit of the code, not the code itself. You need to leave behind all the things you had to do in C++ because that was how it was done there. Good translation is a highly creative process, so be creative.
The handling of strings was a pitfall for me at the beginning. While in Visual C++ you work through pointers and modify the string in place, the string methods in C# return the result as a new value, which you have to assign:
dummy = dummy.Replace("a", "b");
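For comparison, a sketch of the C++ habit that causes this pitfall: the function below (replace_char is a hypothetical name) changes the caller's buffer through a pointer and returns nothing. Carrying that expectation over to C#, where strings are immutable, leads to calling Replace and forgetting to assign the result.

```cpp
#include <string>

// Replaces every occurrence of `from` with `to` directly in the caller's buffer.
void replace_char(char* s, char from, char to) {
    for (; *s != '\0'; ++s) {
        if (*s == from) *s = to;
    }
}

// Small demonstration: the buffer itself is modified, no return value needed.
std::string replace_demo() {
    char buffer[] = "banana";
    replace_char(buffer, 'a', 'b');
    return std::string(buffer);
}
```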
If you have C++ DLLs and you want to use them in your C# project, you can call them through P/Invoke with DllImport.
There will be a lot of differences when you try to convert or rewrite unmanaged code as managed code. Here is a C++ to C# converter which is quite good for converting your C++ code to C#, though you cannot expect to convert the whole project using it.
I want to know which is beneficial in terms of performance and execution time: calling C code from C# or converting C code into C# code.
Note: My C code uses Linear Algebra Library.
In terms of performance, unmanaged code is supposed to be faster than managed code; at least that's what the theory says (depending on the scenario this may or may not be the case, and only profiling can show). Now, if you have many calls between those two worlds, marshaling will occur, which will slow down your code and probably make it slower than a 100% managed solution. If, on the other hand, you have, say, a single call to an unmanaged function that does the heavy lifting, it will probably perform better. So if you decide to go the interop path, make sure you keep the number of calls to a minimum.
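One way to keep the number of boundary crossings down is to give the native library a batched entry point, so that one call processes a whole array. The sketch below assumes an edit-distance workload purely for illustration; edit_distances_batch and edit_distance_c are hypothetical names, and the managed-side declaration is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Two-row Levenshtein distance on NUL-terminated strings.
static int edit_distance_c(const char* a, const char* b) {
    const std::size_t n = std::strlen(a);
    const std::size_t m = std::strlen(b);
    std::vector<int> prev(m + 1), cur(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        prev.swap(cur);
    }
    return prev[m];
}

// Batched entry point with C linkage: one call from the managed side
// fills out_distances[0..count-1], instead of `count` separate calls.
extern "C" void edit_distances_batch(const char* query,
                                     const char* const* words,
                                     int count,
                                     int* out_distances) {
    for (int i = 0; i < count; ++i) {
        out_distances[i] = edit_distance_c(query, words[i]);
    }
}
```

Whether the batching pays off still has to be measured, since the caller now has to marshal an array of strings in one go.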
First, make sure that's where you need to optimize.
Then, write and measure both solutions (or more if you can) on your deployment computer(s).
The only way to have a definitive answer to your specific case is to do actual performance timings for both. Having said that, I'd recommend using a more C# approach for most cases. The fact is that managed code is extremely fast. Also, there is going to be a performance penalty by going from managed code and calling out to unmanaged code.
Bottom line: the default approach should be to keep everything in managed code, IMO. Only look to write something in pure C if the situation truly calls for it. There are many extremely fast, high-performance solutions in production today written in pure managed C# code.
The only real answer is "it depends". How often are you calling the C code? Are you passing huge data structures back and forth? Are there C'isms in the code that'd take lots of extra processing to emulate in C#? There are dozens of other questions that come into play here.
I will mention, though, that .net is not nearly as slow as you seem to think.
In general, calling C code from C# is faster than the other way 'round.
There is virtually no overhead in calling C functions; in fact, the OS functions are all called using C linkage.
C# however needs some wrapper for the "managed" part, and again some wrapper for Datatypes.
If you have an existing C library that works fine, save yourself the trouble of porting it. You will only lose efficiency there - instead, make it usable from C# with some stubs and wrapper classes, that's the best solution if you have to or want to use C# with the existing C library.
Pure C code will be the fastest and most efficient way; however, you should be aware that you'll lack many of the conveniences of C# if you choose to go that way.
In the C++ world there are a variety of ways to create an exploitable vulnerability: buffer overflow, unsafe string handling, various arithmetic tricks, printf issues, strings not ending with '\0' and many more. Although most of these problems were solved in Java, there are still some things to talk about.
But is there any list of typical C#-specific coding vulnerabilities? (and not related to .NET platform itself)
Here are a few issues you can run into:
If you've got any sort of language interpreter (HTML, JavaScript, and SQL being the big three) then you can still have injection or XSS vulnerabilities.
P/Invoke can cause problems, especially if you're doing any custom marshalling. Even if you're calling a "safe" API through P/Invoke, your marshalling code could contain a bug that corrupts or exposes memory.
If you're doing file access then you need to make sure your files are always in acceptable directories. Be sure to sanitize against bad absolute and relative paths.
Cryptography. Good cryptographic programming is really hard, and .Net's various safety features do nothing against crypto attacks.
C# is based on .NET and .NET is supposed to be type-safe, which means none of your list of horrors applies to C# or any .NET language.
But then again, C# has an unsafe keyword and after that all bets are off.
It allows real pointers and everything that comes with them.
Not really. I'm going to make a bold statement here:
There's no such thing as a "C#-specific coding vulnerability that isn't related to the .net platform".
A program written in C++ is compiled directly into a machine executable, so the language compiler is directly responsible for the creation of the executed code, hence the way C++ can be easily capable of "creating an exploitable vulnerability".
A program written in C# however is compiled into IL, which is the only language that the .net platform works with. The .net environment creates a machine executable based on that IL code. Everything that C# can do is merely a subset of what the .net platform is capable of. This is how I can make my bold statement. Anything you could possibly do with C# that created a coding vulnerability would be one of:
1) A bug in the .net platform
or
2) Executing code outside of the .net platform
So the way your question is currently phrased leads me to believe that either you're not fully aware of the huge differences between "writing code in C" and "writing code for the .net platform" or I'm misunderstanding your question. Perhaps a bit of both! 8 )
Hope this helps!
Probably none from your list of concerns but this is the one to be careful with: void*
Don't forget, you can call any C++ from C#. I do it all the time. So all the buffer overrun issues and so on for C++ are relevant for C# as well, even if you don't directly call C++, because C# calls C++ to do its work.
Think about it. Any COM calls and Marshal calls are just as open to attack as normal. On Linux you can use the _r routines, and from version 8 up in VC++ you can use the _s routines, to lessen the chance of buffer overflow (they require user buffers and/or max sizes). About the only way to stop vulnerabilities is to turn off your computer and read a paperback book (unless it too has a virus).
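The "user buffer plus max size" idea can be sketched portably with std::snprintf rather than the vendor-specific _s routines. copy_bounded is a hypothetical helper name, not a standard function.

```cpp
#include <cstddef>
#include <cstdio>

// Copies src into dst without ever writing more than dst_size bytes.
// dst is always NUL-terminated when dst_size > 0.
// Returns true if the whole string fit, false if it was truncated
// or the arguments were unusable.
bool copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    if (dst == nullptr || src == nullptr || dst_size == 0) return false;
    const int needed = std::snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

The caller states the buffer size explicitly, so an over-long input is truncated and reported instead of overrunning the buffer.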
What are the advantages (the list of possible disadvantages is lengthy) of doing 100% managed development using C++/CLI (that is, compiling with /clr:safe, which "generates ... assemblies, like those written in ... C#")? Especially when compared to C# (note that "C++/CLI : Advantages over C#" and "Is there any advantage to using C++/CLI over either standard C++ or C#?" are mostly about managed/unmanaged interop).
For example, here are a few off the top of my head:
C++-style references for managed types, not as elegant as full blown non-nullable references but better than nothing or using a work-around.
templates which are more powerful than generics
preprocessor (this may be a disadvantage!, but macros can be useful for code generation)
stack semantics for reference types--automatically calling IDisposable::Dispose()
easier implementation of Dispose() via C++ destructor
C# 3.0 added auto-implemented properties, so that is no longer a C++/CLI advantage.
I would think that the single biggest advantage is the managed/unmanaged interop. Writing pure managed C++/CLI without interoping with C# or other .NET languages seems (to me at least) like missing the point entirely. Yeah, you could do this, but why would you?
If you're going to write pure managed code, why not use C#? Especially (like nobugs said) if VS2010 drops IntelliSense support for C++/CLI. Also, in VS2008 the IntelliSense for C++/CLI isn't as good as the C# IntelliSense, so from a developer standpoint it's easier to work/explore/refactor in C# than in C++/CLI.
If you want some of the C++ benefits you list like the preprocessor, stack semantics and templates, then why not use C++?
Odd, I like C++/CLI but you listed exactly its features I dislike. My criticisms:
On C++-style references: okay. But accidental use of the hat (^) is pretty widespread, and it gets a value of a value type boxed without warning. There is no way to diagnose this mistake.
On templates: power that comes at a high price; templates you write are not usable in any other .NET language. If anything, it worsens the C++ template export problem. The complete failure of STL/CLR is worth pondering too.
On the preprocessor: erm, no.
On stack semantics: this was IMO a serious mistake. It already is difficult to avoid problems with accidental boxing, as outlined in the first point. Stack semantics makes it seriously difficult for any starting programmer to sort this out. This was a design decision to placate C++ programmers; that's okay, but the using statement was a better solution.
On Dispose() via the destructor: not sure how it is easier. The GC.SuppressFinalize() call is automatic, that's all. It is very rare for anybody to write a finalizer, but you can't stop the auto-generated code from making the call. That's inefficient and a violation of the 'you don't pay for what you don't use' principle. Add to this that writing the destructor also forces a default finalizer to be auto-generated, one you'd never use and wouldn't want to be used if you forgot or omitted to use the destructor.
Well, that's all very subjective perhaps. The death-knell will come with VS2010, it will ship without IntelliSense support for C++/CLI.
In C++/CLI you can define functions outside of classes; you can't do that in C#. But I don't know if that is an advantage.
Like others here, I can't think of any general cases where a clear advantage exists, so my thinking turned to situational advantages -- are there any cases where there is an advantage in a particular scenario?
Advantage: Leverage the C++ skill set of technical staff in a rapid prototyping scenario.
Let me elaborate ...
I have worked quite a bit with scientists and (non-software) engineers who aren't formally trained programmers. Many of these people use C++ for developing specific modules involving high-end physics/mathematics. If a pure .NET module is required in a rapid prototyping scenario and the skill set of the scientist/engineer responsible for the module is C++, I would teach them a small amount of additional syntax (public ref, ^ and % and gcnew) and get them to program up their module as a 100% managed C++/CLI DLL.
I recognize there are a whole heap of possible "Yes, but ..." responses, but I think leveraging the C++ skill set of technical staff is a possible advantage of C++/CLI.
I agree with what you have mentioned, and as an example of preprocessor use I'd point to the Boost Preprocessor library for generating a set of types based on a list of basic types, e.g. PointI32, PointF32 etc., in C++/CLI.
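A rough idea of what such type generation looks like, using a plain "X-macro" instead of Boost Preprocessor and ordinary C++ structs instead of C++/CLI value types (both are simplifications made for this sketch; POINT_TYPES, DEFINE_POINT and sum are hypothetical names):

```cpp
#include <cstdint>

// The list of basic types: one X(...) entry per generated point type.
#define POINT_TYPES(X)     \
    X(I32, std::int32_t)   \
    X(F32, float)          \
    X(F64, double)

// Expands to a struct Point<Suffix> plus a small helper overload for it.
#define DEFINE_POINT(Suffix, Scalar)                                   \
    struct Point##Suffix {                                             \
        Scalar x;                                                      \
        Scalar y;                                                      \
    };                                                                 \
    inline Scalar sum(const Point##Suffix& p) { return p.x + p.y; }

// Generates PointI32, PointF32 and PointF64.
POINT_TYPES(DEFINE_POINT)

#undef DEFINE_POINT
```

Adding another basic type then means adding one line to POINT_TYPES, which is the kind of code generation the preprocessor point is about.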
You can have enums and delegates as generic constraints in C++/CLI, but not in C#.
https://connect.microsoft.com/VisualStudio/feedback/details/386194/allow-enum-as-generic-constraint-in-c
There is a library to simulate these constraints in C#.
http://code.google.com/p/unconstrained-melody/
One could imagine the following requirements for a hypothetical product:
1) Quick time-to-market on Windows
2) Eventual deployment to non-Windows platforms
3) Must not rely on Mono for non-Windows
In such a scenario, using e.g. C# for 1 would stymie you on 2 and 3 without a rewrite. So one could develop in C++/CLI, suitably munged with macros and template shenanigans to look as much like ordinary C++ as possible, to hit requirement 1. Then, to hit requirement 2, one would need to (a) reimplement said macros and template shenanigans to map to pukka C++ and (b) implement the .NET framework classes used in pukka C++. Note that (a) and (b), once done, could be reused in future.
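A very small sketch of what such a macro layer might look like. The macro names (PORTABLE_CLASS, HANDLE, MAKE) are invented for illustration, only the plain C++ branch has been exercised here, and the C++/CLI branch is an untested assumption about how the mapping could be written. __cplusplus_cli is the macro the compiler predefines when building with /clr.

```cpp
#include <memory>

#ifdef __cplusplus_cli
  // C++/CLI build (assumed mapping): managed class, tracking handle, gcnew.
  #define PORTABLE_CLASS ref class
  #define HANDLE(T) T^
  #define MAKE(T) gcnew T
#else
  // Plain C++ build: ordinary class, shared_ptr, make_shared.
  #define PORTABLE_CLASS class
  #define HANDLE(T) std::shared_ptr<T>
  #define MAKE(T) std::make_shared<T>
#endif

// Application code is written once against the macros.
PORTABLE_CLASS Counter {
public:
    Counter() : value_(0) {}
    void increment() { ++value_; }
    int value() { return value_; }

private:
    int value_;
};

int count_to(int n) {
    HANDLE(Counter) c = MAKE(Counter)();
    for (int i = 0; i < n; ++i) c->increment();
    return c->value();
}
```

This covers only part (a) of the plan; part (b), standing in for the .NET framework classes, would be by far the larger job.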
The most obvious objection would be "well why not do the whole thing in native C++ then?"; well maybe there's lots of good stuff in the vast .NET class library that you want to use to get to market asap.
All a bit tenuous, I admit, so I very much doubt this has ever been done, but it'd be a fun thing to try out!
Recently I was talking with a friend of mine who had started a C++ class a couple of months ago (his first exposure to programming). We got onto the topic of C# and .NET generally, and he made the point to me that he felt it was 'doomed' because of all the commonly cited issues (low speed, breakable bytecode, etc.). I agreed with him on all those issues, but I held back from saying it was doomed, only because I felt that, in time, languages like C# could instead become native code (if Microsoft chose to change the implementation of .NET from a bytecode, JIT runtime environment to one which compiles directly to native code like your C++ program does).
My question is, am I out to lunch here? I mean, it may take a lot of work (and may break too many things), but there isn't some type of magical barrier which prevents C# code from being compiled natively (if one wanted to do it), right? There was a time where C++ was considered a very high-level language (which it still is, but not as much as in the past) yet now it's the bedrock (along with C) for Microsoft's native APIs. The idea that .NET could one day be on the same level as C++ in that respect seems only to be a matter of time and effort to me, not some fundamental flaw in the design of the language.
EDIT: I should add that if native compilation of .NET is possible, why does Microsoft choose not to go that route? Why have they chosen the JIT bytecode path?
Java uses bytecode. C#, while it uses IL as an intermediate step, has always compiled to native code. IL is never directly interpreted for execution as Java bytecode is. You can even pre-compile the IL before distribution, if you really want to (hint: performance is normally better in the long run if you don't).
The idea that C# is slow is laughable. Some of the winforms components are slow, but if you know what you're doing C# itself is a very speedy language. In this day and age it generally comes down to the algorithm anyway; language choice won't help you if you implement a bad bubble sort. If C# helps you use more efficient algorithms from a higher level (and in my experience it generally does) that will trump any of the other speed concerns.
Based on your edit, I also want to explain the (typical) compilation path again.
C# is compiled to IL. This IL is distributed to local machines. A user runs the program, and each method is JIT-compiled to native code for that machine the first time it is called, so from then on the process is executing native code. By default the JIT output is not saved between runs; if you want a persistent native image you can pre-compile with a tool such as NGEN. There is also a JIT optimizer that can muddy things a bit, but that's the general picture.
The reason you do it this way is to allow individual machines to make compile-time optimizations appropriate to that machine. You end up with faster code on average than if you distributed the same fully-compiled app to everyone.
Regarding decompilation:
The first thing to note is that you can pre-compile to native code before distribution if you really want to. At this point you're close to the same level as if you had distributed a native app. However, that won't stop a determined individual.
It also largely misunderstands the economics at play. Yes, someone might perhaps reverse-engineer your work. But this assumes that all the value of the app is in the technology. It's very common for a programmer to over-value the code, and undervalue the execution of the product: interface design, marketing, connecting with users, and on-going innovation. If you do all of that right, a little extra competition will help you as much as it hurts by building up demand in your market. If you do it wrong, hiding your algorithm won't save you.
If you're more worried about your app showing up on warez sites, you're even more misguided. It'll show up there anyway. A much better strategy is to engage those users.
At the moment, the biggest impediment to adoption (imo) is that the framework redistributable has become mammoth in size. Hopefully they'll address that in a relatively near release.
Are you suggesting that the fact that C# is managed code is a design flaw??
C# can be natively compiled using a tool such as NGEN, and the Mono (open source .NET framework) team has developed full AOT (ahead-of-time) compilation, which allows C# to run on the iPhone. However, full compilation is cumbersome because it destroys cross-platform compatibility, and some machine-specific optimizations cannot be done. It is also important to note that .NET is not interpreted but JIT (just-in-time) compiled, which means it runs natively on the machine.
Dude, FYI, you can always compile your C# assemblies into a native image using ngen.exe.
And are you suggesting .NET is a flawed design? It was .NET which brought MS back into the game from their crappy VB 5, VB 6, COM days. It was one of their biggest bets.
Java does the same stuff, so are you suggesting Java too is a mistake?
Regarding big vendors: please note .NET has been hugely successful across companies of all sizes (except for those open source guys; nothing wrong with that). All these companies have made significant investments in the .NET framework.
And comparing C# speed with C++ is a crazy idea in my opinion. Does C++ give you a managed environment along with a world-class, powerful framework?
And you can always obfuscate your assemblies if you are so paranoid about decompilation.
It's not about C++ vs. C#, or managed vs. unmanaged. Both are equally good and equally powerful in their own domains.
C# could be natively compiled but it is unlikely the base class library will ever go there. On the flip side, I really don't see much advantage to moving beyond JIT.
It certainly could, but the real question is why? I mean, sure, it can be slow(er), but most of the time any major differences in performance come down to design problems (wrong algorithms, thread contention, hogging resources, etc.) rather than issues with the language. As for the "breakable" bytecode, it doesn't really seem to be a huge concern of most companies, considering adoption rates.
What it really comes down to is, what's the best tool for the job? For some, it's C++; for others, Java; for others, C#, or Python, or Erlang.
Doomed? Because of supposed performance issues?
How about comparing the price of:
programmer's hour
hardware components
If you have performance issues with applications, it's much cheaper to just buy yourself better hardware, compared to the benefits you lose in switching from a higher-abstraction language to a lower one (and I don't have anything against C++; I've been a C++ developer for a long time).
How about comparing maintenance problems when trying to find memory leaks in C++ code compared to garbage-collected C# code?
"Hardware is Cheap, Programmers are Expensive": http://www.codinghorror.com/blog/archives/001198.html