Are foreach and the use of collections slow? - c#

I'm refactoring my app to make it faster. I was looking for tips on doing so, and found this statement:
"ForEach can simplify the code in a For loop but it is a heavy object and is slower than a loop written using For."
Is that true? If it was true when it was written, is it still true today, or has foreach itself been refactored to improve performance?
I have the same question about this tip from the same source:
"Where possible use arrays instead of collections. Arrays are normally more efficient especially for value types. Also, initialize collections to their required size when possible."
UPDATE
I was looking for performance tips because I had a database operation that was taking several seconds.
I have found that the "using" statement is a time hog.
I completely solved my performance problem by swapping the nesting of the for loop and the "using" block (of course, some refactoring was necessary for this to work).
The slower-than-molasses code was:
for (int i = 1; i <= googlePlex; i++)
{
    . . .
    using (OracleCommand ocmd = new OracleCommand(insert, oc))
    {
        . . .
        InsertRecord();
        . . .
    }
}
The faster-than-a-speeding-bullet code is:
using (OracleCommand ocmd = new OracleCommand(insert, oc))
{
    for (int i = 1; i <= googlePlex; i++)
    {
        . . .
        InsertRecord();
        . . .
    }
}
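For illustration, a minimal sketch of this reuse pattern: the command and its bind parameter are created once, and only the parameter value changes inside the loop. The table, column, and parameter name here are made up, and the types assume an ODP.NET-style provider (Oracle.DataAccess.Client or Oracle.ManagedDataAccess.Client), not the asker's actual schema:
// using Oracle.ManagedDataAccess.Client;  (or Oracle.DataAccess.Client, depending on the provider)
static void InsertMany(OracleConnection oc, int googlePlex)
{
    // Hypothetical table and column; bind parameters let Oracle reuse the statement.
    string insert = "INSERT INTO MY_TABLE (ID) VALUES (:id)";

    using (OracleCommand ocmd = new OracleCommand(insert, oc))
    {
        OracleParameter idParam = new OracleParameter("id", OracleDbType.Int32);
        ocmd.Parameters.Add(idParam);

        for (int i = 1; i <= googlePlex; i++)
        {
            idParam.Value = i;       // only the value changes per iteration
            ocmd.ExecuteNonQuery();
        }
    }
}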

Short answer:
Code that is hard to read eventually results in software that behaves and performs poorly.
Long answer:
There was a culture of micro-optimization suggestions in early .NET. Partly this was because a few of Microsoft's internal tools (such as FxCop) had gained popularity with the general public. Partly it was because C# had, and still has, aspirations of succeeding assembly, C, and C++ in offering unhindered access to raw hardware performance in the few hottest code paths of a performance-critical application. That does require more knowledge and discipline than a typical application, of course. The consequences of performance-related decisions are also quite different in framework code than in application code.
The net impact of this on C# coding culture has been positive, of course; but it would be ridiculous to stop using foreach, or is, or "", just to save a couple of CIL instructions that a recent JIT could probably optimize away completely if it wanted to.
There are probably very many loops in your app, and probably at most one of them is a current performance bottleneck. "Optimizing" a non-bottleneck for performance at the expense of readability is a very bad deal.

It's true in many cases that foreach is slower than an equivalent for. It's also true that
for (int i = 0; i < myCollection.Length; i++) // Compiler must re-evaluate getter because value may have changed
is slower than
int max = myCollection.Length;
for (int i = 0; i < max; i++)
But that probably will not matter at all. For a very detailed discussion see Performance difference for control structures 'for' and 'foreach' in C#
Have you done any profiling to determine the hot spots of your application? I would be astonished if the loop management overhead is where you should be focusing your attention.

You should try profiling your code with Red Gate ANTS or something of that ilk - you will be surprised.
I found that in an application I was writing, it was the parameter sniffing in SQL that took up 25% of the processing time. After writing a command cache which sniffed the parameters once at the start of the application, there was a big speed boost.
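The original answer doesn't show that cache, but as a rough sketch it could look like the following: each distinct SQL text gets one prepared command, so the parameter/metadata work is paid once rather than on every call. The provider (System.Data.SqlClient) and the class shape are assumptions for illustration only:
using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical command cache; assumes the connection is already open.
class CommandCache
{
    private readonly Dictionary<string, SqlCommand> _commands = new Dictionary<string, SqlCommand>();
    private readonly SqlConnection _connection;

    public CommandCache(SqlConnection connection)
    {
        _connection = connection;
    }

    public SqlCommand GetOrCreate(string sql)
    {
        SqlCommand command;
        if (!_commands.TryGetValue(sql, out command))
        {
            command = new SqlCommand(sql, _connection);
            command.Prepare();        // pay the metadata/plan work once, up front
            _commands[sql] = command;
        }
        return command;
    }
}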
Unless you are doing a large number of nested for loops, I don't think you will see much of a performance benefit from changing your loops. I can't imagine anything but a real-time application such as a game, or heavy number-crunching or scientific code, needing that kind of optimisation.

Yes. The classic for is a bit faster than foreach, as the iteration is index-based instead of accessing the elements of the collection through an enumerator:
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        const int m = 100000000;

        // Just create and fill an array.
        int[] array = new int[m];
        for (int x = 0; x < array.Length; x++)
        {
            array[x] = x;
        }

        var s1 = Stopwatch.StartNew();
        var upperBound = array.Length;
        for (int i = 0; i < upperBound; i++)
        {
        }
        s1.Stop();

        GC.Collect();

        var s2 = Stopwatch.StartNew();
        foreach (var item in array)
        {
        }
        s2.Stop();

        Console.WriteLine((s1.Elapsed.TotalMilliseconds * 1000000 / m).ToString("0.00 ns"));
        Console.WriteLine((s2.Elapsed.TotalMilliseconds * 1000000 / m).ToString("0.00 ns"));
        Console.Read();

        // Debug build:   2.49 ns (for)   4.68 ns (foreach)
        // Release build: 0.39 ns (for)   1.05 ns (foreach)
    }
}

Related

Which takes less CPU/memory/RAM... etc?

I have some coding-resource related questions:
Q1 :
A:
for (int i = 0; i < 15; i++)
{
FunA(i);
FunB(i);
}
Or
B:
for (int i = 0; i < 15; i++)
{
FunA(i);
}
for (int i = 0; i < 15 ; i++)
{
FunB(i);
}
Does B take more resources than A because there are two loops? And why?
Q2:
A:
FunA(10*2-15+15/X);
FunA(10*2-15+15/X);
FunA(10*2-15+15/X);
B:
int result=10*2-15+15/X;
FunA(result);
FunA(result);
FunA(result);
Does A take more resources because it has to calculate the result every time, or does B take more? Does the compiler understand that the results are the same and make a single variable out of them?
Use a profiler.
For your second question, the C# compiler will fold the constant part of the expression (10*2-15) at compile time.
Your second example (B) is better code despite the compiler potentially performing that optimization on example A for you.
Performance aside, the A version of the second example is just bad coding practice: unnecessarily duplicated expressions mean more chances for error, especially if you change this code later on.
That said, a good rule of thumb is: don't rely on compiler optimizations. Make the code readable, and for things that actually are constant, literally define them as const:
const int WHATEVER = 10 * 2 - 15;   // the part of the expression that is actually constant
int result = WHATEVER + 15 / X;     // 15 / X depends on X, so it stays in the calculation
FunA(result);
FunA(result);
FunA(result);
Finally, here's the obligatory mention that profiling is almost always more accurate than rolling your own benchmarks (which I see written incorrectly more often than not). Visual Studio 2017 has a built-in profiler, or you can use a variety of other ones out there.
When in doubt about performance, don't guess. Get actual metrics.
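If you do want numbers rather than a profiler trace, a benchmarking harness handles warm-up, iteration counts, and statistics for you. A minimal sketch using BenchmarkDotNet (assuming the BenchmarkDotNet NuGet package is referenced; the loop bodies are just placeholders):
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class LoopBenchmarks
{
    private readonly int[] _data = new int[100_000];

    [Benchmark]
    public int ForLoop()
    {
        int total = 0;
        for (int i = 0; i < _data.Length; i++) total += _data[i];
        return total; // returning the result keeps the loop from being eliminated
    }

    [Benchmark]
    public int ForEachLoop()
    {
        int total = 0;
        foreach (int value in _data) total += value;
        return total;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<LoopBenchmarks>();
}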

Why doesn't C# simply translate to the best available option?

I was wondering how it is possible for two exactly equivalent pieces of code to differ in performance.
I don't know if it's still the case, but I know that a for loop used to be faster than the foreach equivalent.
The same goes for LINQ.
I don't quite understand why this
for (int i = 0; i < 100000; i++)
{
    ListDec = new List<decimal>();
    foreach (string s in myStrings)
        ListDec.Add(decimal.Parse(s));
}
is consistently 6% to 8% faster than
for (int i = 0; i < 100000; i++)
ListDec = myStrings.Select(decimal.Parse).ToList();
Probably I don't understand how compilers work, but intuitively I tend to ask myself: if you design a language, and it turns out that one construct is consistently slower than one of its equivalents, why don't you just make the compiler of your language translate the slower one into the faster one?
A compiler could simply convert a foreach loop into the equivalent for loop, and a LINQ expression into its vanilla equivalent. This would make differences in performance almost impossible, because the fastest option would always be picked in the end, at execution. Meanwhile the coder could just pick whichever form they prefer, without having to care about performance.
Sounds like an ideal world, but since our friends at Microsoft are not exactly dumb, there must be something I'm missing.
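For what it's worth, foreach over an array is already lowered by the C# compiler to an index-based loop; it's foreach over something like a List<T> that goes through the enumerator pattern. The enumerator calls are part of the language's observable behavior, which is one reason the compiler can't silently replace them with indexing. A rough sketch of what the lowering looks like (GetEnumerator/MoveNext/Current are real; the Use method and the surrounding class are placeholders):
using System.Collections.Generic;

class ForeachLowering
{
    static void Use(decimal d) { }  // placeholder for the loop body

    static void Consume(List<decimal> list)
    {
        // foreach (decimal d in list) Use(d);
        // is compiled approximately as:
        List<decimal>.Enumerator e = list.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                decimal d = e.Current;
                Use(d);
            }
        }
        finally
        {
            e.Dispose();
        }
    }
}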

Same regular expression executed with different running times in .NET

I am working on a project where I am heavily using regexes. The regular expressions I am using are quite complicated, and I have to set an appropriate timeout to stop the execution so that it doesn't try to match the string for a long time.
The problem is that I have noticed that the same (compiled) regular expression run on the same string executes with different running times, varying from 17ms to 59ms.
Do you have any idea why this is the case? I am measuring the run time using Stopwatch like this:
for (int i = 0; i < 15; i++)
{
sw.Start();
regex.IsMatch(message);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds);
sw.Reset();
}
For reference I am using the default regular expressions library from .NET in System.Text.RegularExpressions.
According to the comments, I modified the code in the following way:
List<long> results = new List<long>();
for (int i = 0; i < 150; i++)
{
sw.Start();
for (int j = 0; j < 20; j++ )
{
regex.IsMatch(message);
}
sw.Stop();
results.Add(sw.ElapsedMilliseconds);
sw.Reset();
}
Debug.WriteLine(results.Max());
Debug.WriteLine(results.Average());
Debug.WriteLine(results.Min());
and the output for this was:
790
469,086666666667
357
Still the difference is very significant for me.
Since you say you are using RegexOptions.Compiled, please refer to the regex performance tips from David Gutierrez's blog:
In this case, we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to startup, but yields 30% better runtime performance. There are even more costs for compilation that should be mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back... The bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.
That means that when the regex match runs for the first time, this additional work ("compile time") is performed, and all subsequent times the regex executes without that preparation.
However, beginning with .NET 2.0, the behavior of caching has changed a bit:
In the .NET Framework 2.0, only regular expressions used in static method calls are cached. By default, the last 15 regular expressions are cached, although the size of the cache can be adjusted by setting the value of the CacheSize property.
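One way to separate that one-time compilation cost from steady-state matching is to construct the regex once (with RegexOptions.Compiled and a match timeout, since the question mentions timeouts), call it once to warm up, and only then time it. A minimal sketch with a placeholder pattern and input:
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class RegexTiming
{
    static void Main()
    {
        // Placeholder pattern and input -- substitute the real ones.
        var regex = new Regex(@"\d+", RegexOptions.Compiled, TimeSpan.FromSeconds(1));
        string message = "order 12345";

        regex.IsMatch(message); // warm-up: pays the one-time Reflection.Emit cost

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            regex.IsMatch(message);
        }
        sw.Stop();

        Console.WriteLine("{0:0.000} ms per match", sw.Elapsed.TotalMilliseconds / 1000);
    }
}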
It's a common situation for any managed platform (Java/.NET): the runtime does some things behind the scenes (GC, for example), and we run on concurrent OSes (Windows, Linux), so such tests are not exact measurements. You think you are testing the regex itself, but you are testing .NET, Windows, and your antivirus at the same time too.
One valid way is to execute the regex 50-1000 times, sum the time, and compute the average duration. For example, rewrite:
sw.Start();
for (int i = 0; i < 1000; i++)
{
    regex.IsMatch(message);
}
sw.Stop();
// Average per call; TotalMilliseconds avoids integer-division truncation.
Debug.WriteLine(sw.Elapsed.TotalMilliseconds / 1000);
and I think your results will be much more stable. But you will still get some range of values, e.g. [15ms .. 18ms], for the reasons described above.
If you really want a perfectly precise measurement (though, sorry, your question suggests you don't really need it), you need to use a profiler, which will give you an exact measurement of the time inside the regex call and nothing else.

Performance analyze for loop and foreach [duplicate]

This question already has answers here:
Performance difference for control structures 'for' and 'foreach' in C#
(9 answers)
Closed 9 years ago.
In this thread we discuss the performance of the for loop versus foreach.
Which one gives better performance - the for or foreach?
Here are two simple methods:
public static void TestFor()
{
Stopwatch stopwatch = Stopwatch.StartNew();
int[] myInterger = new int[1];
int total = 0;
for (int i = 0; i < myInterger.Length; i++)
{
total += myInterger[i];
}
stopwatch.Stop();
Console.WriteLine("for loop Time Elapsed={0}", stopwatch.Elapsed);
}
public static void TestForeach()
{
Stopwatch stopwatchForeach = Stopwatch.StartNew();
int[] myInterger1 = new int[1];
int totall = 0;
foreach (int i in myInterger1)
{
totall += i;
}
stopwatchForeach.Stop();
Console.WriteLine("foreach loop Time Elapsed={0}", stopwatchForeach.Elapsed);
}
Then I ran the above code. The result was foreach loop Time Elapsed=00:00:00.0000003,
for loop Time Elapsed=00:00:00.0001462. I think we want high-performance code, so we would use foreach.
My decision would not be based on a simple performance loop like this. I am assuming that you make frequent use of loops and/or large data sets. You will not notice the difference until we start talking about iterations in the hundreds of thousands (at a minimum).
1) If you are writing applications for potentially memory-pressured frameworks (Xbox, Windows Phone, Silverlight), I would use the for loop, as foreach can leave lightweight "garbage" behind for collection. When I was doing Xbox game development years ago, a common trick was to initialize a fixed array of the items shown on screen using a for loop and keep that array in memory; then you don't have to worry about garbage collection and memory adjustments. This can be an issue if you have a loop like this called 60+ times/second (i.e. games).
2) If you are iterating a very large set AND performance is your key decision driver (remember these numbers are not going to be noticeable unless they are large), then you may want to look at parallelizing your code. The difference then might not be for vs foreach, but Parallel.For vs Parallel.ForEach vs PLINQ (the AsParallel() method); you can have different threads tackle the problem (see the sketch below).
Edit: In a production application, you are more than likely going to have some kind of logic in your loops which takes far longer than iterating an item. Once you add that to the mix, the performance drivers usually shift to the actual logic, not to optimizing iterations (which the compiler handles pretty well).
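Here is the minimal sketch referred to in point 2: the same per-item work expressed as Parallel.For, Parallel.ForEach, and PLINQ. The Process method is a stand-in for whatever expensive work the real loop body does:
using System.Linq;
using System.Threading.Tasks;

class ParallelShapes
{
    // Placeholder for the expensive per-item work.
    static int Process(int value) => value * 2;

    static void Demo(int[] data)
    {
        // Parallel.For: index-based partitioning.
        Parallel.For(0, data.Length, i => Process(data[i]));

        // Parallel.ForEach: element-based partitioning.
        Parallel.ForEach(data, item => Process(item));

        // PLINQ: declarative, collects the results.
        int[] results = data.AsParallel().Select(Process).ToArray();
    }
}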

Fast string parsing in C#

What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning a string by string[index]?
Use regexes. Personally, I don't like regexes as I find them difficult to maintain, but are they likely to be faster than manually scanning the string?
Use unsafe code and pointers. This would eliminate the index range checking, but I've read that unsafe code won't run in untrusted environments. What exactly are the implications of this? Does it mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so being able to fall back to a slower but more compatible mode would be nice.
What else might I consider?
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and in a custom format for which there is no standard .NET parser. Also, the performance of this code is not super critical, so this is partly just a theoretical question out of curiosity.
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the indexer takes... 3ms:
var watch = Stopwatch.StartNew();
char[] chars = new char[128 * 1024];
Random rand = new Random(); // fill with junk
for (int i = 0; i < chars.Length; i++)
{
    chars[i] = (char)('a' + rand.Next(26));
}

int sum = 0;
string s = new string(chars);
int len = s.Length;
for (int i = 0; i < len; i++)
{
    sum += s[i]; // sum the code-points via the string indexer
}
watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
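For completeness, a minimal sketch of that reader approach: scanning a large file in chunks instead of holding one big string. The path and the character being counted are placeholders:
using System;
using System.IO;

class ReaderScan
{
    static void Main()
    {
        long commaCount = 0;

        // "input.dat" is a placeholder path; read in fixed-size chunks.
        using (var reader = new StreamReader("input.dat"))
        {
            char[] buffer = new char[8 * 1024];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == ',') commaCount++;
                }
            }
        }

        Console.WriteLine(commaCount);
    }
}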
"Parsing" is quite an inexact term. Since you talks of 30k, it seems that you might be dealing with some sort of structured string which can be covered by creating a parser using a parser generator tool.
A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/
This can help you create code which is efficient and correct for many textual parsing needs.
As for your points:
1) IndexOf()/IndexOfAny() is usually not useful for parsing that goes further than splitting a string.
2) Regexes are better suited if there are no recursions or overly complex rules.
3) Unsafe code is basically a no-go if you haven't really identified this as a serious problem. The JIT can take care of doing the range checks only when needed, and indeed for simple loops (the typical for loop) this is handled pretty well.
