I am working on a project where I am heavily using regexes. The regular expressions that I am using are quite complicated and I have to set an appropriate timeout to stop the execution, so that it doesn't try to match the string for long time.
The problem is that I have noticed that running the same regular expression (compiled) on the same string is being executed with different running times, varying from 17ms to 59ms.
Do you have any idea why it is the case? I am measuring the run time using Stopwatch like this:
for (int i = 0; i < 15; i++)
{
sw.Start();
regex.IsMatch(message);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds);
sw.Reset();
}
For reference I am using the default regular expressions library from .NET in System.Text.RegularExpressions.
According to the comments, I modified the code in the following way:
List<long> results = new List<long>();
for (int i = 0; i < 150; i++)
{
sw.Start();
for (int j = 0; j < 20; j++ )
{
regex.IsMatch(message);
}
sw.Stop();
results.Add(sw.ElapsedMilliseconds);
sw.Reset();
}
Debug.WriteLine(results.Max());
Debug.WriteLine(results.Average());
Debug.WriteLine(results.Min());
and the output for this was:
790
469,086666666667
357
Still the difference is very significant for me.
Since you say you are using RegexOptions.Compiled, please refer to the regex performance tips from David Gutierrez's blog:
In this case, we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to startup, but yields 30% better runtime performance. There are even more costs for compilation that should mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back... The bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.
That means that running the regex match first time, this additional work ("compile time") is performed, and all subsequent times the regex is executed without that preparation.
However, beginning with .NET 2.0, the behavior of caching has modified a bit:
In the .NET Framework 2.0, only regular expressions used in static method calls are cached. By default, the last 15 regular expressions are cached, although the size of the cache can be adjusted by setting the value of the CacheSize property.
It's common situation for any managed platform Java/.NET - while they do some things behind the scene GC for example, and while we use concurent OS-es (win, linux) such tests are not exactly measeare. You think that you are testing regex itself - but you test .NET, Windows, and your antivirus at same time too.
One valid way is execute regex for 50-1000 times, summarize time and eval average duration. For example rewrite:
sw.Start();
for (int i = 0; i < 1000; i++)
{
regex.IsMatch(message);
}
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds / 1000);
and i think you result will be much stable. But you still will get some range of values for ex [15ms .. 18ms], and that is described upper.
If you want really perfect measure (but your question... sory man... show that you not really want it). You require to use PROFILER that will give you exactly measure of time inside regex call without anything except it.
Related
I have the two following pieces of code in C# and D, the goal was to compare speed in a simple loop.
D:
import std.stdio;
import std.datetime;
void main() {
StopWatch timer;
long summer = 0;
timer.start();
for (long i = 0; i < 10000000000; i++){
summer++;
}
timer.stop();
long interval_t = timer.peek().msecs;
writeln(interval_t);
}
Output: about 30 seconds
C#:
using System;
using System.Diagnostics;
class Program{
static void Main(){
Stopwatch timer = new Stopwatch();
timer.Start();
long summer = 0;
for(long i = 0; i < 10000000000; i++){
summer++;
}
timer.Stop();
Console.WriteLine(timer.ElapsedMilliseconds);
}
}
Output: about 8 seconds
Why is the C# code so much faster?
There's a little more to this than just saying: "You didn't turn on the optimizer."
At least at a guess, you didn't (initially) turn on the optimizer in either case. Despite this, the C# version without optimization turned on ran almost as fast as the D version with optimization. Why would that be?
The answer stems from the difference in compilation models. D does static compilation, so the source is translated to an executable containing machine code, which then executes. The only optimization that happens is whatever is done during that static compilation.
C#, by contrast, translates from source code to MSIL, an intermediate language (i.e., basically a bytecode). That is then translated to machine language by the JIT compiler built into the CLR (common language runtime--Microsoft's virtual machine for MSIL). You can specify optimization when you run the C# compiler. That only controls optimization when doing the initial compilation from source to byte code. When you run the code, the JIT compiler does its thing--and it does its optimization whether you specify optimization in the initial translation from source to byte code or not. That's why you get much faster results with C# than with D when you didn't specify optimization with either one.
I feel obliged to add, however, that both results you got (7 and 8 seconds for D and C# respectively) are really pretty lousy. A decent optimizer should recognize that the final output didn't depend on the loop at all, and based on that it should eliminate the loop completely. Just for comparison, I did (about) the most straightforward C++ translation I could:
#include <iostream>
#include <time.h>
int main() {
long summer = 0;
auto start = clock();
for (long i = 0; i < 10000000000; i++)
summer++;
std::cout << double(clock() - start) / CLOCKS_PER_SEC;
}
Compiled with VC++ using cl /O2b2 /GL, this consistently shows a time of 0.
I believe your question should be titled:
Why are for loops compiled by <insert your D compiler here> so much slower than for loops compiled by <insert your C# compiler/runtime here>?
Performance can vary dramatically across implementations, and is not a trait of the language itself. You are probably using DMD, the reference D compiler, which is not known for using a highly-optimizing backend. For best performance, try the GDC or LDC compilers.
You should also post the compilation options you used (optimizations may have been enabled with only one compiler).
See this question for more information:
How fast is D compared to C++?
Several answers have suggested that an optimizer would optimize the entire loop away.
Mostly they explicitly don't do that as they expect the programmer coded the loop that way as a timing loop.
This technique is often used in hardware drivers to wait for time periods shorter than the time taken to set a timer and handle the timer interrupt.
This is the reason for the "bogomips" calculation at linux boot time... To calibrate how many iterations of a tight loop per second this particular CPU/compiler can do.
I'm refactoring my app to make it faster. I was looking for tips on doing so, and found this statement:
"ForEach can simplify the code in a For loop but it is a heavy object and is slower than a loop written using For."
Is that true? If it was true when it was written, is it still true today, or has foreach itself been refactored to improve performance?
I have the same question about this tip from the same source:
"Where possible use arrays instead of collections. Arrays are normally more efficient especially for value types. Also, initialize collections to their required size when possible."
UPDATE
I was looking for performance tips because I had a database operation that was taking several seconds.
I have found that the "using" statement is a time hog.
I completely solved my performance problem by reversing the for loop and the "using" (of course, refactoring was necessary for this to work).
The slower-than-molasses code was:
for (int i = 1; i <= googlePlex; i++) {
. . .
using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
. . .
InsertRecord();
. . .
The faster-than-a-speeding-bullet code is:
using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
for (int i = 1; i <= googlePlex; i++) {
. . .
InsertRecord();
. . .
Short answer:
Code that is hard to read eventually results in software that behaves and performs poorly.
Long answer:
There was a culture of micro-optimization suggestions in early .NET. Partly it was because a few Microsoft's internal tools (such as FxCop) had gained popularity among general public. Partly it was because C# had and has aspirations to be a successor to assembly, C, and C++ regarding the unhindered access to raw hardware performance in the few hottest code paths of a performance critical application. This does require more knowledge and discipline than a typical application, of course. The consequences of performance related decisions in framework code and in app code are also quite different.
The net impact of this on C# coding culture has been positive, of course; but it would be ridiculous to stop using foreach or is or "" just in order to save a couple CIL instructions that your recent jitter could probably optimize away completely if it wanted to.
There are probably very many loops in your app and probably at most one of them might be a current performance bottleneck. "Optimizing" a non-bottleck for perfomance at the expense of readability is a very bad deal.
It's true in many cases that foreach is slower than an equivalent for. It's also true that
for (int i = 0; i < myCollection.Length; i++) // Compiler must re-evaluate getter because value may have changed
is slower than
int max = myCollection.Length;
for (int i = 0; i < max; i++)
But that probably will not matter at all. For a very detailed discussion see Performance difference for control structures 'for' and 'foreach' in C#
Have you done any profiling to determine the hot spots of your application? I would be astonished if the loop management overhead is where you should be focusing your attention.
You should try profiling your code with Red Gate ANTS or something of that ilk - you will be surprised.
I found that in an application I was writing it was the parameter sniffing in SQL that took up 25% of the processing time. After writing a command cache which sniffed the params at the start of the application, there was a big speed boost.
Unless you are doing a large amount of nested for loops, I don't think you will see much of a performance benefit from changing your loops. I can't imagine anything but a real time application such as a game or a large number crunching or scientific application would need that kind of optimisation.
Yes. The classic for is a bit faster than a foreach as the iteration is index based instead of access the element of the collection thought an enumerator
static void Main()
{
const int m = 100000000;
//just to create an array
int[] array = new int[100000000];
for (int x = 0; x < array.Length; x++) {
array[x] = x;
}
var s1 = Stopwatch.StartNew();
var upperBound = array.Length;
for (int i = 0; i < upperBound; i++)
{
}
s1.Stop();
GC.Collect();
var s2 = Stopwatch.StartNew();
foreach (var item in array) {
}
s2.Stop();
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds *
1000000) / m).ToString("0.00 ns"));
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds *
1000000) / m).ToString("0.00 ns"));
Console.Read();
//2.49 ns
//4.68 ns
// In Release Mode
//0.39 ns
//1.05 ns
}
We were having a performance issue in a C# while loop. The loop was super slow doing only one simple math calc. Turns out that parmIn can be a huge number anywhere from 999999999 to MaxInt. We hadn't anticipated the giant value of parmIn. We have fixed our code using a different methodology.
The loop, coded for simplicity below, did one math calc. I am just curious as to what the actual execution time for a single iteration of a while loop containing one simple math calc is?
int v1=0;
while(v1 < parmIn) {
v1+=parmIn2;
}
There is something else going on here. The following will complete in ~100ms for me. You say that the parmIn can approach MaxInt. If this is true, and the ParmIn2 is > 1, you're not checking to see if your int + the new int will overflow. If ParmIn >= MaxInt - parmIn2, your loop might never complete as it will roll back over to MinInt and continue.
static void Main(string[] args)
{
int i = 0;
int x = int.MaxValue - 50;
int z = 42;
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
while (i < x)
{
i += z;
}
st.Stop();
Console.WriteLine(st.Elapsed.Milliseconds.ToString());
Console.ReadLine();
}
Assuming an optimal compiler, it should be one operation to check the while condition, and one operation to do the addition.
The time, small as it is, to execute just one iteration of the loop shown in your question is ... surprise ... small.
However, it depends on the actual CPU speed and whatnot exactly how small it is.
It should be just a few machine instructions, so not many cycles to pass once through the iteration, but there could be a few cycles to loop back up, especially if branch prediction fails.
In any case, the code as shown either suffers from:
Premature optimization (in that you're asking about timing for it)
Incorrect assumptions. You can probably get a much faster code if parmIn is big by just calculating how many loop iterations you would have to perform, and do a multiplication. (note again that this might be an incorrect assumption, which is why there is only one sure way to find performance issues, measure measure measure)
What is your real question?
It depends on the processor you are using and the calculation it is performing. (For example, even on some modern architectures, an add may take only one clock cycle, but a divide may take many clock cycles. There is a comparison to determine if the loop should continue, which is likely to be around one clock cycle, and then a branch back to the start of the loop, which may take any number of cycles depending on pipeline size and branch prediction)
IMHO the best way to find out more is to put the code you are interested into a very large loop (millions of iterations), time the loop, and divide by the number of iterations - this will give you an idea of how long it takes per iteration of the loop. (on your PC). You can try different operations and learn a bit about how your PC works. I prefer this "hands on" approach (at least to start with) because you can learn so much more from physically trying it than just asking someone else to tell you the answer.
The while loop is couple of instructions and one instruction for the math operation. You're really looking at a minimal execution time for one iteration. it's the sheer number of iterations you're doing that is killing you.
Note that a tight loop like this has implications on other things as well, as it bogs down one CPU and it blocks the UI thread (if it's running on it). Thus, not only it is slow due to the number of operations, it also adds a perceived perf impact due to making the whole machine look unresponsive.
If you're interested in the actual execution time, why not time it for yourself and find out?
int parmIn = 10 * 1000 * 1000; // 10 million
int v1=0;
Stopwatch sw = Stopwatch.StartNew();
while(v1 < parmIn) {
v1+=parmIn2;
}
sw.Stop();
double opsPerSec = (double)parmIn / sw.Elapsed.TotalSeconds;
And, of course, the time for one iteration is 1/opsPerSec.
Whenever someone asks about how fast control structures in any language you know they are trying to optimize the wrong thing. If you find yourself changing all your i++ to ++i or changing all your switch to if...else for speed you are micro-optimizing. And micro optimizations almost never give you the speed you want. Instead, think a bit more about what you are really trying to do and devise a better way to do it.
I'm not sure if the code you posted is really what you intend to do or if it is simply the loop stripped down to what you think is causing the problem. If it is the former then what you are trying to do is find the largest value of a number that is smaller than another number. If this is really what you want then you don't really need a loop:
// assuming v1, parmIn and parmIn2 are integers,
// and you want the largest number (v1) that is
// smaller than parmIn but is a multiple of parmIn2.
// AGAIN, assuming INTEGER MATH:
v1 = (parmIn/parmIn2)*parmIn2;
EDIT: I just realized that the code as originally written gives the smallest number that is a multiple of parmIn2 that is larger than parmIn. So the correct code is:
v1 = ((parmIn/parmIn2)*parmIn2)+parmIn2;
If this is not what you really want then my advise remains the same: think a bit on what you are really trying to do (or ask on Stackoverflow) instead of trying to find out weather while or for is faster. Of course, you won't always find a mathematical solution to the problem. In which case there are other strategies to lower the number of loops taken. Here's one based on your current problem: keep doubling the incrementer until it is too large and then back off until it is just right:
int v1=0;
int incrementer=parmIn2;
// keep doubling the incrementer to
// speed up the loop:
while(v1 < parmIn) {
v1+=incrementer;
incrementer=incrementer*2;
}
// now v1 is too big, back off
// and resume normal loop:
v1-=incrementer;
while(v1 < parmIn) {
v1+=parmIn2;
}
Here's yet another alternative that speeds up the loop:
// First count at 100x speed
while(v1 < parmIn) {
v1+=parmIn2*100;
}
// back off and count at 50x speed
v1-=parmIn2*100;
while(v1 < parmIn) {
v1+=parmIn2*50;
}
// back off and count at 10x speed
v1-=parmIn2*50;
while(v1 < parmIn) {
v1+=parmIn2*10;
}
// back off and count at normal speed
v1-=parmIn2*10;
while(v1 < parmIn) {
v1+=parmIn2;
}
In my experience, especially with graphics programming where you have millions of pixels or polygons to process, speeding up code usually involve adding even more code which translates to more processor instructions instead of trying to find the fewest instructions possible for the task at hand. The trick is to avoid processing what you don't have to.
What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning a string by string[index]?
Use regex's. Personally, I don't like regex as I find them difficult to maintain, but are these likely to be faster than manually scanning the string?
Use unsafe code and pointers. This would eliminate the index range checking but I've read that unsafe code wont run in untrusted environments. What exactly are the implications of this? Does this mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so to be able to fall back to a slower but more compatible mode would be nice.
What else might I consider?
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and in a custom format for which there is no standard .NET parser. Also, performance of this code is not super critical, so this partly just a theoretical question of curiosity.
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the indexer takes... 3ms:
var watch = Stopwatch.StartNew();
char[] chars = new char[128 * 1024];
Random rand = new Random(); // fill with junk
for (int i = 0; i < chars.Length; i++) chars[i] =
(char) ((int) 'a' + rand.Next(26));
int sum = 0;
string s = new string(chars);
int len = s.Length;
for(int i = 0 ; i < len ; i++)
{
sum += (int) chars[i];
}
watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
"Parsing" is quite an inexact term. Since you talks of 30k, it seems that you might be dealing with some sort of structured string which can be covered by creating a parser using a parser generator tool.
A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/
This can help you create code which is efficient and correct for many textual parsing needs.
As for your points:
is usually not useful for parsing which goes further than splitting a string.
is better suited if there are no recursions or too complex rules.
is basically a no-go if you haven't really identified this as a serious problem. The JIT can take care of doing the range checks only when needed, and indeed for simple loops (the typical for loop) this is handled pretty well.
I was trying to figure out if a for loop was faster than a foreach loop and was using the System.Diagnostics classes to time the task. While running the test I noticed that which ever loop I put first always executes slower then the last one. Can someone please tell me why this is happening? My code is below:
using System;
using System.Diagnostics;
namespace cool {
class Program {
static void Main(string[] args) {
int[] x = new int[] { 3, 6, 9, 12 };
int[] y = new int[] { 3, 6, 9, 12 };
DateTime startTime = DateTime.Now;
for (int i = 0; i < 4; i++) {
Console.WriteLine(x[i]);
}
TimeSpan elapsedTime = DateTime.Now - startTime;
DateTime startTime2 = DateTime.Now;
foreach (var item in y) {
Console.WriteLine(item);
}
TimeSpan elapsedTime2 = DateTime.Now - startTime2;
Console.WriteLine("\nSummary");
Console.WriteLine("--------------------------\n");
Console.WriteLine("for:\t{0}\nforeach:\t{1}", elapsedTime, elapsedTime2);
Console.ReadKey();
}
}
}
Here is the output:
for: 00:00:00.0175781
foreach: 00:00:00.0009766
Probably because the classes (e.g. Console) need to be JIT-compiled the first time through. You'll get the best metrics by calling all methods (to JIT them (warm then up)) first, then performing the test.
As other users have indicated, 4 passes is never going to be enough to to show you the difference.
Incidentally, the difference in performance between for and foreach will be negligible and the readability benefits of using foreach almost always outweigh any marginal performance benefit.
I would not use DateTime to measure performance - try the Stopwatch class.
Measuring with only 4 passes is never going to give you a good result. Better use > 100.000 passes (you can use an outer loop). Don't do Console.WriteLine in your loop.
Even better: use a profiler (like Redgate ANTS or maybe NProf)
I am not so much in C#, but when I remember right, Microsoft was building "Just in Time" compilers for Java. When they use the same or similar techniques in C#, it would be rather natural that "some constructs coming second perform faster".
For example it could be, that the JIT-System sees that a loop is executed and decides adhoc to compile the whole method. Hence when the second loop is reached, it is yet compiled and performs much faster than the first. But this is a rather simplistic guess of mine. Of course you need a far greater insight in the C# runtime system to understand what is going on. It could also be, that the RAM-Page is accessed first in the first loop and in the second it is still in the CPU-cache.
Addon: The other comment that was made: that the output module can be JITed a first time in the first loop seams to me more likely than my first guess. Modern languages are just very complex to find out what is done under the hood. Also this statement of mine fits into this guess:
But also you have terminal-outputs in your loops. They make things yet more difficult. It could also be, that it costs some time to open the terminal a first time in a program.
I was just performing tests to get some real numbers, but in the meantime Gaz beat me to the answer - the call to Console.Writeline is jitted at the first call, so you pay that cost in the first loop.
Just for information though - using a stopwatch rather than the datetime and measuring number of ticks:
Without a call to Console.Writeline before the first loop the times were
for: 16802
foreach: 2282
with a call to Console.Writeline they were
for: 2729
foreach: 2268
Though these results were not consistently repeatable because of the limited number of runs, but the magnitude of difference was always roughly the same.
The edited code for reference:
int[] x = new int[] { 3, 6, 9, 12 };
int[] y = new int[] { 3, 6, 9, 12 };
Console.WriteLine("Hello World");
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 4; i++)
{
Console.WriteLine(x[i]);
}
sw.Stop();
long elapsedTime = sw.ElapsedTicks;
sw.Reset();
sw.Start();
foreach (var item in y)
{
Console.WriteLine(item);
}
sw.Stop();
long elapsedTime2 = sw.ElapsedTicks;
Console.WriteLine("\nSummary");
Console.WriteLine("--------------------------\n");
Console.WriteLine("for:\t{0}\nforeach:\t{1}", elapsedTime, elapsedTime2);
Console.ReadKey();
The reason why is there are several forms of overhead in the foreach version that are not present in the for loop
Use of an IDisposable.
An additional method call for every element. Each element must be accessed under the hood by using IEnumerator<T>.Current which is a method call. Because it's on an interface it cannot be inlined. This means N method calls where N is the number of elements in the enumeration. The for loop just uses and indexer
In a foreach loop all calls go through an interface. In general this a bit slower than through a concrete type
Please note that the things I listed above are not necessarily huge costs. They are typically very small costs that can contribute to a small performance difference.
Also note, as Mehrdad pointed out, the compilers and JIT may choose to optimize a foreach loop for certain known data structures such as an array. The end result may just be a for loop.
Note: Your performance benchmark in general needs a bit more work to be accurate.
You should use a StopWatch instead of DateTime. It is much more accurate for performance benchmarks.
You should perform the test many times not just once
You need to do a dummy run on each loop to eliminate the problems that come with JITing a method the first time. This probably isn't an issue when all of the code is in the same method but it doesn't hurt.
You need to use more than just 4 values in the list. Try 40,000 instead.
You should be using the StopWatch to time the behavior.
Technically the for loop is faster. Foreach calls the MoveNext() method (creating a method stack and other overhead from a call) on the IEnumerable's iterator, when for only has to increment a variable.
I don't see why everyone here says that for would be faster than foreach in this particular case. For a List<T>, it is (about 2x slower to foreach through a List than to for through a List<T>).
In fact, the foreach will be slightly faster than the for here. Because foreach on an array essentially compiles to:
for(int i = 0; i < array.Length; i++) { }
Using .Length as a stop criteria allows the JIT to remove bounds checks on the array access, since it's a special case. Using i < 4 makes the JIT insert extra instructions to check each iteration whether or not i is out of bounds of the array, and throw an exception if that is the case. However, with .Length, it can guarantee you'll never go outside of the array bounds so the bounds checks are redundant, making it faster.
However, in most loops, the overhead of the loop is insignificant compared to the work done inside.
The discrepancy you're seeing can only be explained by the JIT I guess.
I wouldn't read too much into this - this isn't good profiling code for the following reasons
1. DateTime isn't meant for profiling. You should use QueryPerformanceCounter or StopWatch which use the CPU hardware profile counters
2. Console.WriteLine is a device method so there may be subtle effects such as buffering to take into account
3. Running one iteration of each code block will never give you accurate results because your CPU does a lot of funky on the fly optimisation such as out of order execution and instruction scheduling
4. Chances are the code that gets JITed for both code blocks is fairly similar so is likely to be in the instruction cache for the second code block
To get a better idea of timing, I did the following
Replaced the Console.WriteLine with a math expression ( e^num)
I used QueryPerformanceCounter/QueryPerformanceTimer through P/Invoke
I ran each code block 1 million times then averaged the results
When I did that I got the following results:
The for loop took 0.000676 milliseconds
The foreach loop took 0.000653 milliseconds
So foreach was very slightly faster but not by much
I then did some further experiments and ran the foreach block first and the for block second
When I did that I got the following results:
The foreach loop took 0.000702 milliseconds
The for loop took 0.000691 milliseconds
Finally I ran both loops together twice i.e for + foreach then for + foreach again
When I did that I got the following results:
The foreach loop took 0.00140 milliseconds
The for loop took 0.001385 milliseconds
So basically it looks to me that whatever code you run second, runs very slightly faster but not
enough to be of any significance.
--Edit--
Here are a couple of useful links
How to time managed code using QueryPerformanceCounter
The instruction cache
Out of order execution