C# running faster than C++?

A friend and I have written an encryption module and we want to port it to multiple languages so that it's not platform specific encryption. Originally written in C#, I've ported it into C++ and Java. C# and Java will both encrypt at about 40 MB/s, but C++ will only encrypt at about 20 MB/s. Why is C++ running this much slower? Is it because I'm using Visual C++?
What can I do to speed up my code? Is there a different compiler that will optimize C++ better?
I've already tried optimizing the code itself, such as using x >> 3 instead of x / 8 (integer division), or y & 63 instead of y % 64, and other techniques. How can I build the project differently so that it is more performant in C++?
EDIT:
I must admit that I have not looked into how the compiler optimizes code. I have classes that I will be taking here in College that are dedicated to learning about compilers and interpreters.
As for my code in C++, it's not very complicated. There are NO includes; there is "basic" math along with something we call "state jumping" to produce pseudo-random results. The most complicated things we do are the bitwise operations that actually do the encryption and unchecked multiplication during an initial hashing phase. There are dynamically allocated 2D arrays which stay alive through the lifetime of the Encryption object (and are properly released in a destructor). There are only 180 lines in this. OK, so my micro-optimizations aren't necessary, but I believe they aren't the problem either; the problem is the running time. To really drill the point in, here is the most complicated line of code in the program:
input[L + offset] ^= state[state[SIndex ^ 255] & 63];
I'm not moving arrays, or working with objects.
Syntactically the entire set of code runs perfectly, and it works seamlessly if I encrypt something with C# and decrypt it with C++ or Java; all 3 languages interact as you'd expect they would.
I don't necessarily expect C++ to run faster than C# or Java (which are within 1 MB/s of each other), but I'm sure there's a way to make C++ run just as fast, or at least faster than it is now. I admit I'm not a C++ expert, and I'm certainly not as seasoned in it as many of you seem to be, but if I can cut and paste 99% of the code from C# to C++ and get it to work in 5 minutes, then I'm a little put out that it takes twice as long to execute.
RE-EDIT:
I found an optimization in Visual Studio I forgot to set before. Now C++ is running 50% faster than C#. Thanks for all the tips, I've learned a lot about compilers in my research.

Without source code it's difficult to say anything about the performance of your encryption algorithm/program.
I reckon though that you made a "mistake" while porting it to C++, meaning that you used it in an inefficient way (e.g. lots of copying of objects happens). Maybe you also used VC 6, whereas VC 9 would/could produce much better code.
As for the "x >> 3" optimization... modern compilers do convert integer division to bitshifts by themselves. Needless to say that this optimization may not be the bottleneck of your program at all. You should profile it first to find out where you're spending most of your time :)
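To make the point about shifts concrete, here is a minimal C++ sketch (the function names are invented for this illustration). For unsigned integers, x / 8 and x >> 3 give the same result, as do y % 64 and y & 63, which is why an optimizing compiler is free to pick whichever form is cheaper.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration.
// For unsigned operands the two forms in each pair are equivalent,
// so writing the readable form costs nothing in an optimized build.
inline std::uint32_t div8_plain(std::uint32_t x) { return x / 8; }
inline std::uint32_t div8_shift(std::uint32_t x) { return x >> 3; }
inline std::uint32_t mod64_plain(std::uint32_t y) { return y % 64; }
inline std::uint32_t mod64_mask(std::uint32_t y) { return y & 63; }

// Checks that both forms agree for every value below `limit`.
inline bool forms_agree(std::uint32_t limit) {
    for (std::uint32_t v = 0; v < limit; ++v) {
        if (div8_plain(v) != div8_shift(v)) return false;
        if (mod64_plain(v) != mod64_mask(v)) return false;
    }
    return true;
}
```

For signed values that may be negative the two forms can differ (for example, -1 / 8 is 0 while -1 & 63 is 63), which is one more reason to write the division and leave the choice to the compiler.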

The question is extremely broad. Something that's efficient in C# may not be efficient in C++ and vice versa.
You're making micro-optimisations, but you need to examine the overall design of your solution to make sure that it makes sense in C++. It may be a good idea to re-design large parts of your solution so that it works better in C++.
As with all things performance related, profile the code first, then modify, then profile again. Repeat until you've got to an acceptable level of performance.
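Before reaching for a full profiler, a rough throughput measurement in the same MB/s unit the question uses can serve as the "measure, modify, measure again" baseline. The sketch below is only an assumption of how such a harness might look: toy_transform is a stand-in XOR loop, not the poster's cipher, and the function names are invented.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in for the real cipher: XORs every byte with a constant.
// This is an assumption for illustration, not the poster's algorithm.
inline void toy_transform(std::vector<unsigned char>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) buf[i] ^= 0x5A;
}

// Runs the transform `passes` times over `buf` and returns the throughput
// in MB/s (1 MB = 1,000,000 bytes). Returns 0 if no time elapsed.
inline double measure_mb_per_s(std::vector<unsigned char>& buf, int passes) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    for (int p = 0; p < passes; ++p) toy_transform(buf);
    const std::chrono::duration<double> elapsed = clock::now() - start;
    if (elapsed.count() <= 0.0) return 0.0;
    const double bytes = static_cast<double>(buf.size()) * passes;
    return bytes / 1e6 / elapsed.count();
}
```

Run it on a release build with a buffer large enough that the loop takes at least a few milliseconds, and repeat the measurement several times; a single short run says very little.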

Things that are 'relatively' fast in C# may be extremely slow in C++.
You can write 'faster' code in C++, but you can also write much slower code. Debug builds especially may be extremely slow in C++, so look at which optimizations your compiler is applying.
When porting applications, C# programmers mostly tend to use the 'create a million newed objects' approach, which really makes C++ programs slow. You would rewrite these algorithms to use pre-allocated arrays and run tight loops over them.
With pre-allocated memory you leverage the strengths of C++ in using pointers to memory by casting these to the right POD-structured data.
But it really depends on what you have written in your code.
So measure your code and see where the implementations burn the most CPU, and then structure your code to use the right algorithms.
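As a rough sketch of the difference between the 'newed objects' style and a pre-allocated buffer, the two hypothetical functions below compute the same value. The first allocates a fresh heap object for every block, the way a line-by-line port might; the second allocates one scratch buffer up front and reuses it. The checksum itself is made up purely so there is something to compare.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ported style: a fresh heap allocation for every block processed.
inline unsigned long checksum_allocating(const std::vector<unsigned char>& data,
                                         std::size_t block) {
    if (block == 0) return 0;
    unsigned long sum = 0;
    for (std::size_t off = 0; off + block <= data.size(); off += block) {
        const unsigned char* first = data.data() + off;
        std::vector<unsigned char>* tmp =
            new std::vector<unsigned char>(first, first + block);
        for (std::size_t i = 0; i < tmp->size(); ++i) sum += (*tmp)[i] ^ 0x0F;
        delete tmp;
    }
    return sum;
}

// Same work with one buffer allocated up front and reused in a tight loop.
inline unsigned long checksum_preallocated(const std::vector<unsigned char>& data,
                                           std::size_t block) {
    if (block == 0) return 0;
    unsigned long sum = 0;
    std::vector<unsigned char> scratch(block);
    for (std::size_t off = 0; off + block <= data.size(); off += block) {
        const unsigned char* first = data.data() + off;
        std::copy(first, first + block, scratch.begin());
        for (std::size_t i = 0; i < block; ++i) sum += scratch[i] ^ 0x0F;
    }
    return sum;
}
```

How much the second version gains depends on the allocator and the block size, so it is worth timing both on the real workload rather than assuming.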

Your timing results are definitely not what I'd expect with well-written C++ and well-written C#. You're almost certainly writing inefficient C++. (Either that, or you're not compiling with the same sort of options. Make sure you're testing the release build, and check the optimization options.)
However, micro-optimizations, like you mention, are going to do effectively nothing to improve the performance. You're wasting your time doing things that the compiler will do for you.
Usually you start by looking at the algorithm, but in this case we know the algorithm isn't causing the performance issue. I'd advise using a profiler to see if you can find a big time sink, but it may not find anything different from in C# or Java.
I'd suggest looking at how C++ differs from Java and C#. One big thing is objects. In Java and C#, objects are represented in the same way as C++ pointers to objects, although it isn't obvious from the syntax.
If you're moving objects about in Java and C++, you're moving pointers in Java, which is quick, and objects in C++, which can be slow. Look for where you use medium or large objects. Are you putting them in container classes? Those classes move objects around. Change those to pointers (preferably smart pointers, like std::tr1::shared_ptr<>).
If you're not experienced in C++ (and an experienced and competent C++ programmer would be highly unlikely to be microoptimizing), try to find somebody who is. C++ is not a really simple language, having a lot more legacy baggage than Java or C#, and you could be missing quite a few things.

Free C++ profilers:
What's the best free C++ profiler for Windows?

"Porting" performance-critical code from one language to another is usually a bad idea. You tend not to use the target language (C++ in this case) to its full potential.
Some of the worst C++ code I've seen was ported from Java. There was "new" for almost everything - normal for Java, but a sure performance killer for C++.
You're usually better off not porting, but reimplementing the critical parts.

The main reason C#/Java programs do not translate well (assuming everything else is correct) is that C#/Java developers have not grokked the concept of objects and references correctly. Note that in C#/Java all objects are passed by (the equivalent of) a pointer.
class Message
{
public:
    char buffer[10000];
};
Message Encrypt(Message message) // Here you are making a copy of message
{
    for (int loop = 0; loop < 10000; ++loop)
    {
        plop(message.buffer[loop]); // plop() stands for whatever per-byte work you do
    }
    return message; // Here you are making another copy of message
}
To re-write this in a (more) C++ style you should probably be using references:
Message& Encrypt(Message& message) // pass a reference to the message
{
    ...
    return message; // return the same reference.
}
The second thing that C#/Java programmers have a hard time with is the lack of garbage collection. If you are not releasing memory correctly, you could start running low on memory, and the C++ version would then be thrashing. In C++ we generally allocate objects on the stack (i.e. no new). If the lifetime of the object extends beyond the current scope of the method/function then we use new, but we always wrap the returned pointer in a smart pointer (so that it will be correctly deleted).
void myFunc()
{
    Message m;
    // read message into m
    Encrypt(m);
}
void alternative()
{
    boost::shared_ptr<Message> m(new Message);
    EncryptUsingPointer(m);
}
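The cost of passing by value can be made visible by counting copy-constructor calls. The sketch below is an illustration under assumptions: CountedMessage and both encrypt functions are invented names, and the XOR stands in for real work.

```cpp
#include <cstddef>

// Hypothetical message type that counts how often it is copied.
struct CountedMessage {
    static int copies;  // incremented by every copy construction
    char buffer[10000];
    CountedMessage() : buffer() {}  // zero-initializes the buffer
    CountedMessage(const CountedMessage& other) {
        for (std::size_t i = 0; i < sizeof buffer; ++i) buffer[i] = other.buffer[i];
        ++copies;
    }
};
int CountedMessage::copies = 0;

// By value: the argument is copied in, and typically copied again on return.
inline CountedMessage encrypt_by_value(CountedMessage message) {
    for (std::size_t i = 0; i < sizeof message.buffer; ++i) message.buffer[i] ^= 0x2A;
    return message;
}

// By reference: works directly on the caller's object, so nothing is copied.
inline CountedMessage& encrypt_by_reference(CountedMessage& message) {
    for (std::size_t i = 0; i < sizeof message.buffer; ++i) message.buffer[i] ^= 0x2A;
    return message;
}
```

Calling encrypt_by_reference leaves CountedMessage::copies unchanged, while encrypt_by_value increases it by at least one per call; the exact number depends on what the compiler is allowed to elide.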

Show your code. We can't tell you how to optimize your code if we don't know what it looks like.
You're absolutely wasting your time converting divisions by constants into shift operations. Those kinds of braindead transformations can be made even by the dumbest compiler.
Where you can gain performance is in optimizations that require information the compiler doesn't have. The compiler knows that division by a power of two is equivalent to a right-shift.
Apart from this, there is little reason to expect C++ to be faster. C++ is much more dependent on you writing good code. C# and Java will produce pretty efficient code almost no matter what you do. But in C++, just one or two missteps will cripple performance.
And honestly, if you expected C++ to be faster because it's "native" or "closer to the metal", you're about a decade too late. JIT'ed languages can be very efficient, and with one or two exceptions, there's no reason why they must be slower than a native language.
You might find these posts enlightening.
They show, in short, that yes, ultimately, C++ has the potential to be faster, but for the most part, unless you go to extremes to optimize your code, C# will be just as fast, or faster.
If you want your C++ code to compete with the C# version, then a few suggestions:
Enable optimizations (you've hopefully already done this)
Think carefully about how you do disk I/O (IOStreams isn't exactly an ideal library to use)
Profile your code to see what needs optimizing.
Understand your code. Study the assembler output, and see what can be done more efficiently.
Many common operations in C++ are surprisingly slow. Dynamic memory allocation is a prime example. It is almost free in C# or Java, but very costly in C++. Stack-allocation is your friend.
Understand your code's cache behavior. Is your data scattered all over the place? It shouldn't be a surprise then that your code is inefficient.
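The question mentions dynamically allocated 2D arrays. One common way to keep such data together in memory is to store the whole table in a single contiguous block and compute the index by hand. This is a sketch under assumptions (FlatTable and sum_all are invented names, and nothing is known about how the poster's arrays are actually laid out):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D table stored as one contiguous row-major block,
// instead of an array of separately allocated rows.
class FlatTable {
public:
    FlatTable(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0) {}
    unsigned char& at(std::size_t r, std::size_t c) { return cells_[r * cols_ + c]; }
    unsigned char at(std::size_t r, std::size_t c) const { return cells_[r * cols_ + c]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<unsigned char> cells_;
};

// Row-by-row traversal reads memory sequentially, which suits the cache.
inline unsigned long sum_all(const FlatTable& t) {
    unsigned long total = 0;
    for (std::size_t r = 0; r < t.rows(); ++r)
        for (std::size_t c = 0; c < t.cols(); ++c)
            total += t.at(r, c);
    return total;
}
```

With one allocation there is also only one thing to release, so the destructor logic disappears along with the scattered rows.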

Totally off topic but...
I found some info on the encryption module on the homepage you link to from your profile http://www.coreyogburn.com/bigproject.html
(quote)
Put together by my buddy Karl Wessels and I, we believe we have quite a powerful new algorithm.
What separates our encryption from the many existing encryptions is that ours is both fast AND secure. Currently, it takes 5 seconds to encrypt 100 MB. It is estimated that it would take 4.25 * 10^143 years to decrypt it!
[...]
We're also looking into getting a copyright and eventual commercial release.
I don't want to discourage you, but getting encryption right is hard. Very hard.
I'm not saying it's impossible for a twenty-year-old web developer to develop an encryption algorithm that outshines all existing algorithms, but it's extremely unlikely, and I'm very skeptical; I think most people would be.
Nobody who cares about encryption would use an algorithm that's unpublished. I'm not saying you have to open up your sourcecode, but the workings of the algorithm must be public, and scrutinized, if you want to be taken seriously...

There are areas where a language running on a VM outperforms C/C++, for example heap allocation of new objects. You can find more details here.

There is a somewhat old article in Dr. Dobb's Journal named Microbenchmarking C++, C#, and Java where you can see some actual benchmarks, and you will find that C# is sometimes faster than C++. One of the more extreme examples is the single hash map benchmark. .NET 1.1 is a clear winner at 126 and VC++ is far behind at 537.
Some people will not believe you if you claim that a language like C# can be faster than C++, but it actually can. However, using a profiler and the very high level of fine-grained control that C++ offers should enable you to rewrite your application to be very performant.

When serious about performance you might want to be serious about profiling.
Separately, of the "string" object implementations used in C#, Java and C++, the C++ one is noticeably slower.

There are some cases where VM-based languages such as C# or Java can be faster than a C++ version, at least if you don't put much work into optimization and don't have a good knowledge of what is going on in the background. One reason is that a VM can optimize byte-code at run time, figure out which parts of the program are used often, and change its optimization strategy accordingly. An old-fashioned compiler, on the other hand, has to decide how to optimize the program at compile time and may not find the best solution.

The C# JIT probably noticed at run time that the CPU is capable of running some advanced instructions, and is compiling to something better than what the C++ was compiled to.
You can probably (surely, with enough effort) outperform this by compiling with the most sophisticated instructions available on the target CPU and using knowledge of the algorithm to tell the compiler to use SIMD instructions at specific stages.
But before making any fancy changes to your code, make sure your C++ is being compiled for your CPU, not for something much more primitive (Pentium?).
Edit:
If your C++ program does a lot of unwise allocations and deallocations this will also explain it.

In another thread, I pointed out that doing a direct translation from one language to another will almost always end up in the version in the new language running more poorly.
Different languages take different techniques.

Try the Intel compiler. It's much better than VC or gcc. As for the original question, I would be skeptical. Try to avoid using any containers and minimize the memory allocations in the offending function.

[Joke]There is an error in line 13[/Joke]
Now, seriously, no one can answer the question without the source code.
But as a rule of thumb, the fact that C++ is that much slower than the managed versions most likely points to differences in memory management and object ownership.
For instance, if your algorithm is doing any dynamic memory allocations inside the processing loop, this will affect the performance. If you pass heavy structures by the value, this will affect the performance. If you do unnecessary copies of objects, this will affect the performance. Exception abuse will cause performance to go south. And still counting.
I know of cases where a forgotten "&" after the parameter type resulted in weeks of profiling/debugging:
void DoSomething(const HeavyStructure param); // Heavy structure will be copied
void DoSomething(const HeavyStructure& param); // No copy here
So, check your code to find possible bottlenecks.

C++ is not a language where you must use classes. In my opinion it's not logical to use OOP methodologies where they don't really help. For an encrypter/decrypter it's best not to use classes; use arrays and pointers, and use as few functions/classes/files as possible. The best encryption system consists of a single file containing a few functions. After your function works nicely you can wrap it in classes if you wish. Also check the release build. There is a huge speed difference.

Nothing is faster than good machine/assembly code, so my goal when writing C/C++ is to write my code in such a way that the compiler understands my intentions to generate good machine code. Inlining is my favorite way to do this.
First, here's an aside. Good machine code:
uses registers more often than memory
rarely branches (if/else, for, and while)
uses memory more often than functions calls
rarely dynamically allocates any more memory (from the heap) than it already has
If you have a small class with very little code, then implement its methods in the body of the class definition and declare it locally (on the stack) when you use it. If the class is simple enough, then the compiler will often only generate a few instructions to effect its behavior, without any function calls or memory allocation to slow things down, just as if you had written the code all verbose and non-object oriented. I usually have assembly output turned on (/FAs /Fa with Visual C++) so I can check the output.
It's nice to have a language that allows you to write high-level, encapsulated object-oriented code and still translate into simple, pure, lightning fast machine code.
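A minimal sketch of the kind of small class described above, with its methods defined in the class body (which makes them implicitly inline) and the object declared on the stack. Accumulator and mix_range are invented for this illustration; whether the calls are actually inlined is up to the optimizer, so checking the assembly output is still the way to be sure.

```cpp
#include <cstdint>

// Hypothetical small class; methods defined in the class body are
// implicitly inline, so the compiler is free to expand them in place.
class Accumulator {
public:
    Accumulator() : value_(0) {}
    // Rotates the current value left by 5 bits and XORs in x.
    void mix(std::uint32_t x) { value_ = ((value_ << 5) | (value_ >> 27)) ^ x; }
    std::uint32_t value() const { return value_; }
private:
    std::uint32_t value_;
};

// The object lives on the stack; no heap allocation is involved.
inline std::uint32_t mix_range(std::uint32_t n) {
    Accumulator acc;
    for (std::uint32_t i = 0; i < n; ++i) acc.mix(i);
    return acc.value();
}
```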

Here's my 2 cents.
I wrote a BlowFish cipher in C (and C#). The C# was almost 'identical' to the C.
How I compiled (I can't remember the exact numbers now, so these are just the ratios as I recall them):
C native: 50
C managed: 15
C#: 10
As you can see, the native compilation outperforms any managed version. Why?
I am not 100% sure, but my C version compiled to very optimised assembly code; the assembler output looked almost the same as a hand-written assembler version I found.

Related

Accessing math coprocessor from C#

How can I access the math coprocessor from C# code? I would like to make some calculations on integers as fast as possible. I know it's possible with C++ compilers to use assembler code inline, but what about .NET?

The JIT compiler knows about the math coprocessor and will use it. What you really want is to use the SIMD engine, not the math coprocessor. This was part of the promise of JIT-compilation, that the runtime could pick the fastest hardware acceleration available on each computer, but I don't think .NET actually does that, at least in v4.
Or are you using the term "math coprocessor" to mean something other than the x87 FPU? There are some FPGA boards marketed as accelerator/coprocessor systems. If that's what you mean, you'll need to consult the programming manual that comes with the particular product. There are no special CPU instructions for accessing those, inline assembler wouldn't be helpful in this case.
For example, the GPU is even faster at math on large datasets than the CPU's SIMD engine, and you can access that from .NET using DirectX Compute Shaders (or p/invoking OpenCL), no assembler required.

I don't think that this would be possible to do directly from managed code. You could still call unmanaged code which does those calculations but whether the cost of interop marshaling is worth it is difficult to say. You will have to minimize it as much as possible and do all the calculations in unmanaged code and do only a single call to minimize overhead.

No, you cannot directly use inline assembler in C# managed code.
Your best bet is to make sure your general approach/algorithm is clean and efficient, and your math operations are clean and efficient, and then rely on the compiler to make efficient use of the available coprocessor.

This is not natively supported by C# as a language, nor .NET as a framework.
If you need that kind of speed or prowess, use something else altogether.

I know this is an old post, but for those coming here for similar reason of speeding up maths operations, for example a large number of vector operations.
To get the greatest speed from C# in maths you should convert your formulae to the logarithmic equivalent. This takes some practice, but once you have the idea you can do it with every formula. Then decide to keep your values in log form, only converting to human-readable form those values the user needs to see.
The reason logs work faster is that they are all addition and subtraction (subtraction just being the addition of a complement number); your processor can do these in large numbers with ease.
If you have not done this sort of maths before there are lessons online that will lead you through it, it has a learning curve but for maths/graphics programmers the learning curve is worth it.

C# - Default library has better performance?

Earlier today I made myself a lightweight memory stream, which basically writes to a byte array. I thought I'd benchmark the two of them to see if there's any difference, and there was:
(writing 1 byte to the array)
MemoryStream: 1.0001ms
mine: 3.0004ms
Everyone tells me that MemoryStream basically provides a byte array and a bunch of methods to work with it.
My question: Does the default C# library have a slightly better performance than the code we write? (maybe it runs in release rather than debug?)

The .NET implementation was probably a bit better than your own, but also, how did you benchmark? A couple of million iterations, or just a few? Remember that you need to use a large test base so that you can eliminate some data (CPU being called away for a moment, etc) that will give false results.

The folks at Microsoft are much smarter than you and I and most likely have written a better optimized wrapper over Byte[], much better than something that you or I would implement.
If you are curious, I would suggest that you disassemble the types that you have recreated to see how exactly Microsoft has implemented them. In some of the more important areas of the framework (such as this I would imagine) you will find that the BCL calls out to unmanaged code to accomplish its goals.
Unmanaged code has a much better chance of outperforming managed code in cases like this since you can freely work with arrays without the overhead of a managed runtime (for things like bounds checking and such).

Many of the framework assemblies are NGENed, which may give them a small boost by bypassing the initial JIT time. This is unlikely to be the cause of a 2ms difference, especially if you'd already warmed up your methods before starting the stopwatch, but I mention it for completeness.
Also, yes, the framework assemblies are built in "release" mode (optimisations on and checks off), not "debug."

You probably used Array.Copy() instead of the faster Buffer.BlockCopy(). The fastest way is to use unsafe code with pointers. Check out how they do this in the Mono project (search for memcpy).

I'd wager that Microsoft's implementation is a wee bit better than yours. ;)
Did you check the source?

Pointers in C# and how frequently it is used in the application?

For me, pointers were one of the hardest concepts in C++. When I was learning C++, I spent a tremendous amount of time learning them. However, I now primarily work on projects that are written entirely in languages like C# and VB.NET. As a matter of fact, I have NOT touched C++ for almost 4 years. Even though C# has pointers, I have not encountered a situation where I must use a pointer in C#. So my question is: what kind of productivity can we gain in C# by using pointers? In what situations is using a pointer a must?

You're already using lots of pointers in C#, except that they don't look like pointers. Every time you do something with an instance of a class, that's a pointer. You're getting almost all the potential benefit already, without the hassle.
It is possible to use pointers more explicitly in C#, which is what most people mean by C# pointers, but I would think the occasions would be very rare. They may be useful to link to C libraries and the like, but other than that I don't see much use for them.

Personally, I've never had a need for using pointers in .NET, but if you're dealing with absolute performance critical code, you'd use pointers. If you look at the System.String class, you'll see that a lot of the methods that handle the string manipulation, use pointers. Also, when dealing with image processing, very often it's useful to use pointers. Now, one can definitely argue whether those sort of applications should be written in .NET in the first place (I think they should), but at least if you need to squeeze out that extra bit of speed, you can.

I use pointers in C# only in rare circumstances that mostly have to do with sending/receiving data, where you have to convert a byte array to a struct and vice-versa. Though even then, you don't have to deal with pointers directly typically.
In some cases, you can use pointers to improve performance, because with the Marshaller you sometimes have to copy memory to access data, while with pointers you can access it directly (think Bitmap.LockBits()).

Personally, I've never needed to use a pointer in C#. If I need that kind of functionality, I write that code in C++/CLI, and call it from C#. If I need to pass pointers from C# to C++/CLI or vice-versa, I pass them around as an IntPtr and cast it to the type I need in C++/CLI.
In my opinion - if you're using pointers in C#, in 99% of cases, you're using the language wrong.
Edit:
The nice thing about C++/CLI is that you can mark individual classes for native-only compilation. I do a lot of image processing work which needs to happen very quickly; it uses a lot of pointer-based code. I generally have a managed C++/CLI object forward calls to a native C++ object where my processing takes place. I turn on optimizations for that native code and voilà, I get a nice performance gain.
Granted, this only matters if the performance gain you get by executing native, optimized code can offset the overhead of managed to unmanaged transitions. In my case, it always does.

Is Python slower than Java/C#? [closed]

Is Python slower than Java/C#?
performance-comparison-c-java-python-ruby-jython-jruby-groovy
Here is a project that optimizes CPython: unladen-swallow

Don't conflate Language and Run-Time.
Python (the language) has many run-time implementations.
CPython is usually interpreted, and will be slower than native-code C#. It might be slower than Java, depending on the Java JIT compiler.
Jython is interpreted in the JVM and has the same performance profile as Java.
IronPython relies on the same .NET libraries and IL as C#, so the performance difference will be relatively small.
Python can be translated to native code via PyREX, PyToC, and others. In this case, it will generally perform as well as C++. You can -- to an extent -- further optimize C++ and perhaps squeeze out a little bit better performance than unoptimized output from PyREX.
For more information, see http://arcriley.blogspot.com/2009/03/so-long-pyrex.html
Note that Python (the language) is not slow. Some Python run-times (CPython, for example) will be slower than native-code C++.

It is not really correct to ask why Python is slower than Java/C#. How fast is Java? Well, naive interpreters are around ten times slower than optimised compilers. I believe there is a Java bytecode interpreter written in JavaScript; that probably isn't very fast. So, the intended question appears to be "Why is the CPython language system slower than the equivalent Sun, IBM and Oracle JRE and Microsoft .NET runtime?"
I believe the correct answer is non-technical. The fastest Java and .NET runtimes are faster because they have large full-time technical teams developing them in a performance-competitive environment.
Dynamic language systems are easy to implement. Any idiot can do it. I have. Static language systems are more complex to design and implement. A simple static system will tend to run much faster than the equivalent just-working dynamic system. However, it is possible for highly optimised dynamic systems to run almost as fast. I understand some Smalltalk implementations were quite good. An often quoted example of a developed dynamic system is the MIT Lisp Machine.
In addition if the real grunt is being done by library code, then the language system may not matter. Alternatively, the language may encourage (or give time(!)) to develop more efficient algorithms which can easily wipe out constant factor performance differences.

As mentioned in the other answers, this depends on the run-time system as well as the task at hand. So the standard (C)Python is not necessarily slower than Java or C#. Some of its modules are implemented in C, thus combining the speed of a native implementation with Python's language.
We did a small experiment: We compared the execution time of a Factorial computation in different languages. The test was actually intended to evaluate the performance of arbitrary-precision integers implementations.
testee  language  arbitrary-precision integers  run-time
1.      Java      java.math.BigInteger          JRE 6.13
2.      .NET      System.Numerics.BigInteger    MS CLR 4.0
3.      Python    long                          Active Python 2.6.2.2
4.      Squeak    BigInt                        Squeak 3.10.2
5.      .NET      Mono.Math.BigInteger          MS CLR 4.0
results:
          1)          2)         3)         4)          5)
10.000!   343 ms      137 ms     91 ms      1.200 ms    169 ms
20.000!   1.480 ms    569 ms     372 ms     1.457 ms    701 ms
30.000!   3.424 ms    1.243 ms   836 ms     3.360 ms    1.675 ms
40.000!   6.340 ms    2.101 ms   1.975 ms   6.738 ms    3.042 ms
50.000!   10.493 ms   3.763 ms   3.658 ms   10.019 ms   5.242 ms
60.000!   15.586 ms   7.683 ms   5.788 ms   14.241 ms   10.000 ms
(source: mycsharp.de)
The bar chart shows the results. Python is the clear winner. As far as I know Python uses the Karatsuba-algorithm to multiply large integers, which explains the speed.
Besides, Python's "arbitrary-precision integers"-type is the built-in long. Hence you don't even need special type handling which is required for Java's BigInteger-class.

Simply: Python is slow.
No matter which (currently available) interpreter you use, it is slower than Java and C. In various benchmarks, it's slower than Ruby and PHP.
Do not depend on others' answers; check and verify for yourself.
http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=python3&lang2=java&data=u64q
Personally, I do not think much serious work is being done on making Python faster. Since productivity is good in Python and it solves some problems in a straightforward way, speed/performance is not taken seriously. There are also some architectural issues that prevent Python from getting performance tweaks.
Disclaimer: this answer will probably hurt Python lovers. I am a Python developer too, and I love developing web apps in Django/Flask/Pyramid rather than Spring (Java). But I see in practice, in my work and experience, how much slower Python is. Speed is not always my priority. But I do stand with those who say the Python interpreter should get some oiling and greasing, or a total engine change, to at least keep up in the marathon. It's a mainstream programming language.

As suggested in comments, you should really provide a test case to reason about. Reasons behind performance differences will change depending on the test being executed.
However, I'd suggest that the static vs dynamic nature may well have a lot to do with it. For non-virtual calls, JIT-compiled C#/Java is extremely cheap because the call target can be determined accurately at JIT time. Even virtual calls just involve a single level of indirection. When binding becomes dynamic, there's a wider range of things to consider.
I don't know enough details about Python to claim to understand its exact runtime behaviour, which I suspect may vary with version and implementation too. There is such a thing as "python byte code" which is then executed by a virtual machine - whether this virtual machine actually performs JIT-compilation or not is another matter.

It boils down to the fact that the compilation phase has less information to work with, and hence the runtime needs to do more work in the case of duck-typed (dynamically typed) languages.
Thus if I am making the method invocation foo.bar(), in case of Java or C++ the invocation to bar can be optimized in the compilation process by discovering the type of "foo" and then directly invoking the method at the memory location where the compiler knows it will be found. Since a python or any other dynamically typed language compiler does not know what type the object foo belongs to, it has to do a type check at runtime and then look up the address of the bar method and then invoke it.
There are other difficulties a python compiler writer struggles with as well, though the one above hopefully adequately gives an indication. So even with the best compiler writers, statically typed languages are likely to perform much better at runtime.
Where dynamically typed languages typically score is in development time. With fewer lines of code to write and maintain, and no compile wait times for developers, development often goes much faster.
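The foo.bar() difference can be modelled roughly in C++ itself. In the sketch below, StaticFoo::bar is resolved by the compiler, while DynamicObject finds the method by name in a table at run time. This is a deliberate simplification with invented names, not a description of how any particular Python implementation works.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Statically typed call: the compiler knows exactly which function bar is.
struct StaticFoo {
    int bar() const { return 42; }
};

// Rough model of a dynamically typed object: methods are looked up by name
// in a per-object table at run time, and the lookup can fail.
struct DynamicObject {
    std::unordered_map<std::string, std::function<int()>> methods;

    int call(const std::string& name) const {
        auto it = methods.find(name);
        if (it == methods.end()) throw std::runtime_error("no such method: " + name);
        return it->second();
    }
};

// Builds an object whose only method is "bar".
inline DynamicObject make_dynamic_foo() {
    DynamicObject obj;
    obj.methods["bar"] = [] { return 42; };
    return obj;
}
```

Both StaticFoo().bar() and make_dynamic_foo().call("bar") return the same value, but the second pays for a string hash, a table lookup and an indirect call on every invocation, and a misspelled name is only detected when the call runs.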

What you've got there is a clear example of writing Java in Python:
    def __init__(self, size):
        self.first = None
        last = None
        for i in range(size):
            current = Person(i)
            if self.first == None: self.first = current
            if last != None:
                last.next = current
                current.prev = last
            last = current
        self.first.prev = last
        last.next = self.first
A bit more pythonic:
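    def __init__(self, size):
        chain = [Person(i) for i in range(size)]
        self.first = chain[0]
        # Pair each person with its successor; the last one wraps around to the first.
        # (list.append returns None, so the successor list is built by concatenation.)
        for p, n in zip(chain, chain[1:] + chain[:1]):
            p.next = n
            n.prev = p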
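For anyone who wants to run the more Pythonic variant, here is a self-contained sketch. The `Person` class is not shown in the question, so the minimal version below is an assumption, as is the wrapper class name `Chain`. The successor list is built with slice concatenation, since `list.append` returns `None` and cannot be used inline for this.

```python
class Person:
    """Minimal stand-in node; the real Person class is not shown."""

    def __init__(self, ident):
        self.ident = ident
        self.prev = None
        self.next = None


class Chain:
    """Circular doubly linked list of Person objects (hypothetical name)."""

    def __init__(self, size):
        people = [Person(i) for i in range(size)]
        self.first = people[0]
        # Pair every person with its successor; the last wraps to the first.
        for p, n in zip(people, people[1:] + people[:1]):
            p.next = n
            n.prev = p


if __name__ == "__main__":
    chain = Chain(4)
    node = chain.first
    for _ in range(4):
        print(node.ident, "->", node.next.ident)
        node = node.next
```

Like the original, this assumes `size` is at least 1; an empty chain would need separate handling.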

I think it's ultimately that Python doesn't go as far as it can with optimizations. Most of the common optimization techniques are for static languages. There are optimization techniques for dynamic languages, but modern dynamic languages don't seem to make as much use of them as they could. Steve Yegge has an excellent blog post on the subject.
EDIT: I just wanted to point out that I'm not necessarily stating this to be critical of Python. I prefer simplicity over unnecessary speed any day.

It doesn't have anything to do with the languages themselves; it's just that the Java implementation and runtime system (the JVM) are of very high quality, and a lot of resources have been invested in stability, scalability and performance improvements over the years.
Contrast that with the fact that the CPython implementation only recently added, e.g., threaded dispatch to its interpreter, which gave it a performance boost of up to 20% for certain problems. That is not as good as it sounds; it is bad, because that kind of basic optimization should have been there from day one.

I think the opposite. I can write a simple program faster in Python than in Java,
and those Python scripts run really fast.
Of course, your question is hard to answer without examples. Maybe you have hit a slow library, a bug, etc. Please give us more details.

Since it's interpreted and not compiled, it should be slower in execution time.
According to a table in the book Code Complete (second edition), page 600,
C# equals C++ in execution time (1:1), and Python is more than a hundred times slower than C++ (>100:1).
Java is one and a half times slower than C++ (1.5:1).
These statistics are averages. I don't know who made this study, but it seems interesting.

This type of question can't be answered by qualitative reasoning alone; you need good benchmarks to back it up. Here's one set that compares Python 3 with C# Mono and finds Python to be 3 to 300 times slower. The Python vs. Java results are similar. (The usual cautions about interpreting benchmarks apply.)
These benchmarks also report the source code size, and Python was significantly more concise than Java and C#.
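If you want numbers for your own workload rather than someone else's, a minimal timing harness is easy to write with the standard-library `timeit` module. The workload `sum_of_squares` and the helper `best_time` below are illustrative assumptions, not part of any published benchmark, and the sketch deliberately reports no expected figures because they depend entirely on the machine and interpreter.

```python
import timeit


def sum_of_squares(n):
    # Deliberately plain interpreted loop: the kind of code where
    # CPython tends to lag behind JIT-compiled languages.
    total = 0
    for i in range(n):
        total += i * i
    return total


def best_time(func, arg, repeats=5, number=10):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeats, number=number)
    return min(timings) / number


if __name__ == "__main__":
    seconds = best_time(sum_of_squares, 100_000)
    print(f"sum_of_squares(100_000): {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; the same loop ported to Java or C# would have to be timed with that platform's own tools, after warm-up, for a fair comparison.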

I would argue that the ease and simplicity of writing Python code makes it possible to write more complex code; for example, code that takes advantage of multi-core processors. Since per-core performance has been mostly stagnant for the past 5-10 years, I don't think it's clear that Python programs (whether they're running on CPython or something else) are slower in the long run.

C# / F# Performance comparison

Is there any C#/F# performance comparison available on the web that shows proper usage of the new F# language?

Natural F# code (e.g. functional/immutable) is slower than natural (imperative/mutable object-oriented) C# code. However, this kind of F# is much shorter than the usual C# code.
Obviously, there is a trade-off.
On the other hand, you can, in most cases, make F# code perform as well as C# code. This will usually require coding in an imperative or mutable object-oriented style, then profiling and removing bottlenecks. You use the same tools that you would otherwise use in C#: e.g. .NET Reflector and a profiler.
That said, it pays to be aware of some high-productivity constructs in F# that decrease performance. In my experience I have seen the following cases:
References (vs. class instance variables), only in code executed billions of times.
F# comparison (<=) vs. System.Collections.Generic.Comparer, for example in binary search or sort.
Tail calls: only in certain cases that cannot be optimized by the compiler or the .NET runtime. As noted in the comments, this depends on the .NET runtime.
F# sequences are about half as fast as LINQ. This is due to references and the use of functions in the F# library to implement translation of seq<_>. This is easily fixable, as you can replace the Seq module with one that has the same signatures and uses LINQ, PLINQ or DryadLINQ.
Tuples: an F# tuple is a class allocated on the heap. In some cases, e.g. an int*int tuple, it might pay to use a struct.
Allocations: it's worth remembering that a closure is a class, created with the new operator, which remembers the accessed variables. It might be worth "lifting" the closure out, or replacing it with a function that explicitly takes the accessed variables as arguments.
Try using inline to improve performance, especially for generic code.
My approach is to code in F# first and optimize only the parts that matter. In certain cases, it might be easier to write the slow functions in C# rather than try to tweak the F#. However, from a programmer-efficiency point of view, it makes sense to start or prototype in F#, then profile, disassemble and optimize.
The bottom line is that your F# code might end up slower than C# because of program design decisions, but ultimately efficiency can be obtained.

See these questions that I asked recently:
Is a program F# any more efficient (execution-wise) than C#?
How can I use functional programming in the real world?
Is it possible that F# will be optimized more than other .Net languages in the future?

Here are a few links on (or related to) this topic:
http://cs.hubfs.net/forums/thread/3207.aspx
http://strangelights.com/blog/archive/2007/06/17/1588.aspx
http://khigia.wordpress.com/2008/03/30/ocaml-vs-f-for-big-integer-surprising-performance-test/
http://cs.hubfs.net/blogs/f_team/archive/2006/08/15/506.aspx
http://blogs.msdn.com/jomo_fisher/
What I seem to remember from another post on Robert Pickering's blog (or was it Scott Hanselman's?) is that in the end, because both are sitting on the same framework, you can get the same performance from both, but you sometimes have to 'twist' the natural expression of the language to do so. In the example I recall, he had to twist F# to get comparable performance with C#...
