Simple Basic Question - C#

Sorry for asking such a simple question, but I wanted to clear up a concept.
Below is my code, where I am creating a dictionary inside a for loop:
if(condition)
{
// some code here
for(int i=0; i<length; i++)
{
Dictionary<string, string> parameter = new Dictionary<string, string>();
parameter.Add("ServiceTypeID", "1");
parameter.Add("Description", "disc");
}
}
Instead of creating the dictionary object every time, should I create the dictionary object before the for loop and call the Clear method on it, like this:
if(condition)
{
Dictionary<string, string> parameter = new Dictionary<string, string>();
// some code here
for(int i=0; i<length; i++)
{
parameter.Clear();
parameter.Add("ServiceTypeID", "1");
parameter.Add("Description", "disc");
}
}
Out of these two options, which one will be better for performance?
Thanks,
nil

In most practical scenarios the difference is close to zero.
One may think that clearing a data structure is quicker than initializing an empty one. This is not always the case. Note that in modern languages (C#, Java) the memory manager is optimized for allocating many small objects (this is related to the way garbage collectors work). In C++, due to the lack of a GC, the memory manager is tuned for allocating a few large objects. Thus, re-constructing the Dictionary inside the loop is comparable (performance-wise) with clearing it.
Moreover, Clear() may not necessarily free all allocated memory. It may only reset some pointers/indices. Therefore, if you use Clear(), your Dictionary may still occupy large chunks of memory, which may slow down other parts of your code.
Bottom line: don't worry about it unless a profiler has told you that this is the bottleneck of your program.
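If you want to measure it yourself, a rough Stopwatch sketch along these lines (the iteration count and the printed labels are arbitrary choices, not from the question) will usually show both variants within noise of each other:
using System;
using System.Collections.Generic;
using System.Diagnostics;

class DictionaryBenchmark
{
    static void Main()
    {
        const int iterations = 1000000;

        // Variant 1: construct a new Dictionary on every iteration.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var parameter = new Dictionary<string, string>();
            parameter.Add("ServiceTypeID", "1");
            parameter.Add("Description", "disc");
        }
        sw.Stop();
        Console.WriteLine("New each time:   " + sw.ElapsedMilliseconds + " ms");

        // Variant 2: reuse one Dictionary and Clear() it on every iteration.
        var reused = new Dictionary<string, string>();
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            reused.Clear();
            reused.Add("ServiceTypeID", "1");
            reused.Add("Description", "disc");
        }
        sw.Stop();
        Console.WriteLine("Clear and reuse: " + sw.ElapsedMilliseconds + " ms");
    }
}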

If both of these solutions work for you, you should keep two things in mind:
The first one creates a new dictionary object on every loop iteration, so it is slower because memory is allocated each time through the loop.
The second one is faster, but it keeps the Dictionary object alive for longer, so the memory stays in use until the GC collects it after the enclosing scope ends. In long blocks of code this means the memory is held for more time.

In a few words, for performance the second version is clearly better, because you create only one object and then just add items inside the loop.
However, in the first version the parameter variable is of limited use, because at the end of the for loop it no longer exists...
The second version has a similar problem: at the end of the if block that reference is no longer usable...
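If the dictionary is needed after the if block, a sketch of the wider-scope variant could look like this (condition and length are placeholders carried over from the question):
Dictionary<string, string> parameter = null;
if (condition)
{
    parameter = new Dictionary<string, string>();
    for (int i = 0; i < length; i++)
    {
        parameter.Clear();
        parameter.Add("ServiceTypeID", "1");
        parameter.Add("Description", "disc");
    }
}
// parameter is still reachable here (it is null if the condition was false)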

Related

Is it better to Clear lists and refill them or simply assign an existing collection to the list?

I'm working on copying/modifying a package for Unity (xNode) and I'm at the point where NodePortDictionary is defined.
In OnBeforeSerialize the original author clears both the Key and Value lists and iterates through the dictionary adding the key and value pairs to the lists.
public void OnBeforeSerialize() {
keys.Clear();
values.Clear();
foreach(KeyValuePair<string, NodePort> pair in this) {
keys.Add(pair.Key);
values.Add(pair.Value);
}
}
Is there a reason that it should be done that way over this way?
public void OnBeforeSerialize() {
keys = Keys.ToList();
values = Values.ToList();
}
I'm not asking if it's 'a better practice'; I'm trying to understand if it's better from a performance perspective. More specifically, is there enough of a difference in performance to be concerned about?
These questions are relevant only if your lists have thousands or millions of entries, or if your project is extremely time sensitive (like trying to stay under a certain time threshold, which normally happens in C or C++).
In your first code
public void OnBeforeSerialize() {
keys.Clear();
values.Clear();
foreach(KeyValuePair<string, NodePort> pair in this) {
keys.Add(pair.Key);
values.Add(pair.Value);
}
}
Your loop runs once over the whole list of pairs, which is O(n). In the second code
public void OnBeforeSerialize() {
keys = Keys.ToList();
values = Values.ToList();
}
each .ToList() call runs over the whole list of pairs, and you have two of them, so you are at O(2n). But O(n) = O(2n), so unless you have millions of entries you won't see a big difference.
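To make the allocation difference concrete, here is a minimal sketch of the two variants (keys and values are the serialized list fields from the question):
// Variant 1: reuse the existing lists. No new List objects are allocated
// as long as their capacity is already large enough.
keys.Clear();
values.Clear();
foreach (KeyValuePair<string, NodePort> pair in this)
{
    keys.Add(pair.Key);
    values.Add(pair.Value);
}

// Variant 2: allocate two brand-new List objects on every call and leave
// the old ones as garbage for the GC to collect later.
keys = Keys.ToList();
values = Values.ToList();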
Now in terms of memory allocation, it is better to clean the variables and re-use them because of memory swap. There are concepts called the memory heap and the memory stack, where one is accessed faster than the other. There are limits on how much you can keep on the stack, and it's not a lot. (Homework: check in which memory you create local variables and global variables, and where variables can be resized and where they cannot. Also check whether a List can be resized, or whether it creates a new instance.)
Regarding whether there is enough of a difference to be concerned about:
Right now, if both approaches work, it means you haven't exceeded the stack size. In that case your second approach will be faster than the first one; the compiler will only assign a new memory address.
If you exceed the stack size, your second approach won't work and you will run into a memory exception. Your only option will be the first approach.
In your example, NodePort and string may not be memory aligned (which means there is a chance the compiler does bit padding to align to nice lengths, e.g. 64 bits). So you would have to see what the compiler is doing behind the scenes for these variables (you would have to read the assembly code for this, which again, if your project is not time sensitive, is a waste of time).
Visual Studio has excellent profiling tools, so you can play a bit and learn about the memory allocation. I also suggest reading about heap vs. stack; you will see that there is a lot more than meets the eye.
Check these, they might help you:
https://www.guru99.com/stack-vs-heap.html
https://www.c-sharpcorner.com/article/stack-vs-heap-memory-c-sharp/

Reducing allocation and GC while frequently iterating through Dictionary

Context: I'm using Unity3D's IMGUI, where the OnGUI() method is called VERY often (a few times per frame) to keep the GUI content up to date. I need to iterate through a Dictionary with data and display that data, but because I will also be making changes to the Dictionary content (Add/Remove), I have to iterate through a separate List/Array/whatever.
So in other words right now I have:
foreach (string line in new List<string>(this.myDic.Keys))
{
//fill GUI
//edit Dictionary content if needed
}
The problem here is that it allocates a short-lived List multiple times per frame, thousands and thousands of times per second and an insane amount in general, producing GC pressure. What I want is to avoid this allocation by reusing the same List I initialize at the start. However, another issue came up:
tempList.Clear();
foreach (KeyValuePair<string,string> pair in myDic)
{
tempList.Add(pair.Key);
}
var j = tempList.Count;
for (int i = 0; i < j; i++)
{
//fill GUI
//edit Dictionary content if needed
}
As you can see, now I basically have two loops, both processing the same amount of data. Which leads me to the question: is it a moot point to try to optimize the allocation issue here with a reusable List? Or, even if it looks scary, is the double-loop variant still the better solution?
P.S. Yes, I know, best option would be switch from IMGUI but right now I'm kinda limited to it.
First of all, the only object you allocate by calling new List<string>(this.myDic.Keys) (or this.myDic.Keys.ToArray() alternatively) is an array containing references to already existing objects (strings in your case). So, the GC collects only one object when the scope ends.
The size of that object is about equal to objectCount*referenceSize. The reference size depends on the selected platform: 32bit or 64bit.
Speaking formally, you can save some memory traffic by reusing an existing list, but I guess it's not worth it.
Anyway, if you do want to do that, please note that your approach to refilling the list isn't optimal. I suggest using .AddRange (it internally uses Array.Copy):
tempList.Clear();
tempList.AddRange(myDic.Keys);
foreach (string key in tempList) { ... } // replace with 'for' if you also want to avoid allocating an enumerator
Most likely it's a premature optimization, please do some performance benchmark to test if it makes any sense for your application.
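If you do go that route, a minimal sketch of the reuse pattern inside a component might look like this (the class name, dictionary contents and the GUILayout call are illustrative, not from the question):
using System.Collections.Generic;
using UnityEngine;

public class DataPanel : MonoBehaviour
{
    private readonly Dictionary<string, string> myDic = new Dictionary<string, string>();
    private readonly List<string> tempKeys = new List<string>();   // allocated once, reused every frame

    private void OnGUI()
    {
        tempKeys.Clear();
        tempKeys.AddRange(myDic.Keys);   // copies the keys without allocating a new list

        for (int i = 0; i < tempKeys.Count; i++)
        {
            string key = tempKeys[i];
            if (!myDic.TryGetValue(key, out string value))
                continue;                          // the entry may have been removed earlier in this loop
            GUILayout.Label(key + ": " + value);   // fill GUI
            // it is safe to Add/Remove on myDic here, because we are not iterating it directly
        }
    }
}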

Does a foreach loop work more slowly when used with a list or array that is not stored?

I am wondering whether a foreach loop works more slowly if the list or array it iterates over is not stored in a variable first.
I mean like this:
foreach (var number in list.OrderBy(x => x.Value))
{
// DoSomething();
}
Does the loop in this code recalculate the sorting on every iteration or not?
The loop using a stored value:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>;
foreach (var number in list)
{
// DoSomething();
}
And if it does, which code gives the better performance: storing the value or not?
This is often counter-intuitive, but generally speaking, the option that is best for performance is to wait as long as possible to materialize results into a concrete structure like a list or array. Please keep in mind that this is a generalization, so there are plenty of cases where it doesn't hold. Nevertheless, the first instinct should be to avoid creating the list for as long as possible.
To demonstrate with your sample, we have these two options:
var list = tours.OrderBy(x => x.Value).ToList();
foreach (var number in list)
{
// DoSomething();
}
vs this option:
foreach (var number in list.OrderBy(x => x.Value))
{
// DoSomething();
}
To understand what is going on here, you need to look at the .OrderBy() extension method. Reading its documentation, you'll see it returns an IOrderedEnumerable<TSource> object. With an IOrderedEnumerable, all of the sorting needed for the foreach loop is finished by the time you start iterating over the object (and that, I believe, is the crux of your question: no, it does not re-sort on each iteration). Also note that both samples use the same OrderBy() call. Therefore, both samples have the same ordering work to do, and they accomplish it the same way, meaning they take exactly the same amount of time to reach that point in the code.
The difference in the code samples, then, is entirely in using the foreach loop directly vs first calling .ToList(), because in both cases we start from an IOrderedEnumerable. Let's look closely at those differences.
When you call .ToList(), what do you think happens? This method is not magic. There is still code which must execute in order to produce the list. This code still effectively uses its own foreach loop that you can't see. Additionally, where once you only needed to worry about enough RAM to handle one object at a time, you are now forcing your program to allocate a new block of RAM large enough to hold references for the entire collection. Moving beyond references, you may also need to create new memory allocations for the full objects if you were reading from a stream or database reader that really only needed one object in RAM at a time. This is an especially big deal on systems where memory is the primary constraint, which is often the case with web servers, where you may be serving and maintaining session RAM for many, many sessions, but each session only occasionally uses any CPU time to request a new page.
Now I am making one assumption here: that you are working with something that is not already a list. What I mean by this is that the previous paragraphs talked about needing to convert an IOrderedEnumerable into a List, but not about converting a List into some form of IEnumerable. I need to admit that there is some small overhead in creating and operating the state machine that .NET uses to implement those objects. However, I think this is a good assumption. It turns out to be true far more often than we realize. Even in the samples for this question, we're paying this cost regardless, by the simple virtue of calling the OrderBy() function.
In summary, there can be some additional overhead in using a raw IEnumerable vs converting to a List, but there probably isn't. Additionally, you are almost certainly saving yourself some RAM by avoiding the conversions to List whenever possible... potentially a lot of RAM.
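To see the deferred behaviour concretely, here is a small sketch (Tour and its Value property are taken from the question; the comments describe standard LINQ semantics):
var ordered = tours.OrderBy(x => x.Value);   // nothing is sorted yet (deferred execution)

foreach (var tour in ordered)
{
    // the sort runs once, when this foreach starts - not on every iteration
}

foreach (var tour in ordered)
{
    // starting a second foreach over the same query sorts the data again
}

var list = tours.OrderBy(x => x.Value).ToList();   // sorted once and materialized here

foreach (var tour in list)
{
    // no further sorting, but the list now holds references to every element in memory
}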
Yes and no.
Yes, the foreach statement will seem to work more slowly.
No, your program has the same total amount of work to do, so you will not be able to measure a difference from the outside.
What you need to focus on is not using a lazy operation (in this case OrderBy) multiple times without a .ToList() or .ToArray(). In this case you are only using it once (in the foreach), but it is an easy thing to miss.
Edit: Just to be clear, the as statement in the question will not work as intended, but my answer assumes there is no .ToList() after OrderBy.
This line won't run:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>; // Returns null.
Instead, you want to store the results this way:
List<Tour> list = tours.OrderBy(x => x.Value).ToList();
And yes, the second option (storing the results) will enumerate faster, because the sorting has already been done by the time the loop starts.

How to effectively discard an array and fill it with new values?

I've got a few global arrays I use in a simple WinForms game. The arrays are initialized when a new game starts. When a player is in the middle of the game (the arrays are filled with data) he clicks on the StartNewGame() button (restarts the game). What to do next?
Is it OK to reinitialize the whole array for the new game, or should I just set every array item to null and reuse the already initialized array (which would be slower)?
I mean, is it okay to do something like this?
MyClass[,] gameObjects;
public Form1()
{
StartNewGame();
// game flow .. simplified here .. normally divided into functions and events ..
StartNewGame();
// other game flow
}
public void StartNewGame()
{
gameObjects = new MyClass[10,10];
// some work with gameObjects
}
This almost entirely depends upon MyClass, specifically how many data members it contains, how much processing its constructor (and its members' constructors) requires, and whether it is a relatively simple operation to (re)set an object of this class to an "initialized" state. A more objective answer can be obtained through benchmarking.
From your question, I understand that there are not that many arrays - in that case I would say reinitialize the whole array.
In cases where you have a lot of setup work that can take, say, 30 seconds, you might do a clean-up instead of reinitializing everything.
If you choose to set items to null, you can get some ugly exceptions, so I think you should clean the objects inside the array rather than set them to null.
If there are only 100 elements as in your example, then there shouldn't really be a noticeable performance hit.
If you reinitialize the array, you will perform n constructions for n objects. The garbage collector will come clean up the old array and de-allocate those old n objects at some later time. (So you have n allocations upfront, and n deallocations by the GC).
If you set each reference in the array to null, the garbage collector will still do the same amount of work and come clean up those n objects at some later time. The only difference is that you're not deallocating the array here, but that single deallocation is negligible.
From my point of view, the best way to achieve performance in this case is not to reallocate the objects at all, but to reuse the same ones. Add a valid bit to mark whether or not an object is valid (in use); to reinitialize, you simply set all the valid bits to false. In a similar fashion, programs don't go through and write 0's to all your memory when it's not in use. They just leave it as garbage and overwrite the data as necessary.
But again, if your number of objects isn't going into the thousands, I'd say you really won't notice the performance hit.
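As a minimal sketch of that valid-bit idea (the IsActive flag and the loop bounds are assumptions for illustration, not part of the original code):
public class MyClass
{
    public bool IsActive;   // hypothetical flag marking whether this object is currently in use
    // ... other game data ...
}

public void StartNewGame()
{
    // Assumes gameObjects was allocated and filled once; reuse the objects instead of reallocating.
    for (int x = 0; x < gameObjects.GetLength(0); x++)
    {
        for (int y = 0; y < gameObjects.GetLength(1); y++)
        {
            gameObjects[x, y].IsActive = false;
        }
    }
}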
gameObjects = new MyClass[10,10];
... is the way to go. This is definitely faster than looping through the array and setting the items to null. It is also simpler to code and to understand. But both variants are very fast anyway, unless you have tens of millions of entries! [10, 10] is very small, so forget about performance and do what seems more appropriate and more understandable to you. Clean code is more important than performance in most cases.

Will declaring a variable inside/outside a loop change the performance?

Is this:
foreach(Type item in myCollection)
{
StringBuilder sb = new StringBuilder();
}
much slower than:
StringBuilder sb = new StringBuilder();
foreach(Type item in myCollection)
{
sb = new StringBuilder();
}
In other words, will it really matter where I declare my StringBuilder?
No, it will not matter performance-wise where you declare it.
For general code cleanliness, you should declare it in the innermost scope in which it is used - i.e. your first example.
You could maybe gain some performance if you write this:
StringBuilder sb = new StringBuilder();
foreach(Type item in myCollection)
{
sb.Length = 0;
}
This way you instantiate the StringBuilder just once and reset its length in the loop, which should be slightly faster than instantiating a new object each time.
In the 2nd example you're creating an extra instance of StringBuilder. Apart from that they are both the same, so the performance difference is negligible.
There isn't enough code here to clearly indicate a performance difference in your specific case. Having said that, the difference between declaring a reference variable inside of a loop like this vs. outside is trivial for most cases.
The effective difference between your two code samples is that the second will allocate 1 more instance of StringBuilder than the first. The performance impact of this as compared to the rest of your application is essentially nothing.
The best way to check is by trying both methods in a loop, about 100,000 iterations each. Measure the amount of time each set of 100,000 iterations takes and compare them. I don't think there is a lot of difference.
There is a small difference, though. The first example declares a new variable on every iteration; the second example has just one variable. The compiler is smart enough to do some optimizations here, so you won't notice a speed improvement.
However, if you don't need the last object generated inside the loop once you're outside the loop again, the first solution is better. In the second solution it just takes a while before the garbage collector frees the last object created; in the first example the garbage collector will be a bit quicker to free it. It depends on the rest of the code, but if you store a lot of data in this StringBuilder object, the second example might hold on to that memory for a lot longer, decreasing the performance of your code after leaving the loop! Then again, if the object eats up 100 KB and you have 16 GB in your machine, no one cares... The garbage collector will eventually free it, probably as soon as you leave the method which contains this loop.
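A rough sketch of such a benchmark (the 100,000 iteration count and the Append call are arbitrary choices):
using System;
using System.Diagnostics;
using System.Text;

class StringBuilderBenchmark
{
    static void Main()
    {
        const int iterations = 100000;

        // Variant 1: declare the StringBuilder inside the loop.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append(i);
        }
        sw.Stop();
        Console.WriteLine("Declared inside:  " + sw.ElapsedMilliseconds + " ms");

        // Variant 2: declare it outside and assign a new instance on each iteration.
        StringBuilder outer = new StringBuilder();
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            outer = new StringBuilder();
            outer.Append(i);
        }
        sw.Stop();
        Console.WriteLine("Declared outside: " + sw.ElapsedMilliseconds + " ms");
    }
}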
If you have other similar type code segments, you could always profile or put some timers around the code and run a benchmark type test to see for yourself. Another factor would be the memory footprint, which others have commented on.
