C# foreach memory cache and string memory optimization - c#

foreach (string sn in MACOrSerial.Split(','))
{
MACOrSerial = sn.Trim();
}
MACOrSerial contains a string of text (e.g. AA123241, BB123431, CC1231243) separated by commas.
I grab one substring and place that into the same MACOrSerial.
This will not cause me any problems, as the foreach will still be using the original MACOrSerial in memory.
While I think this is the most memory efficient approach, is it proper or should I just make another string with a new name such as
MacORSerialSubString = sn.Trim()?
I am not having any memory issues. I just want to make sure my code is clean and concise.

Your assumption is incorrect - the loop goes over the string[] resulting from the Split - these are all new string instances.
You are not saving any memory by reassigning a string to the original variable and you are losing readability by reuse of a variable.
Here is one approach that is more readable and uses some of the built in capabilities:
string[] serialNumbers = MACOrSerial.Split(new [] {',', ' ', '\r', '\n' },
StringSplitOptions.RemoveEmptyEntries);
foreach (string sn in serialNumbers)
{
// do stuff
}

My $.02- the code as you posted is extremely un-intuitive, because it breaks with the expected pattern.
IMHO it is more important to write the code in a way that other developers can understand at a glance than it is to (actually / attempt to) micro-optimize for memory, especially when (as others have pointed out) your micro-optimization does not actually reduce the amount of memory used.

I would create a new variable so that your code is clearer. It doesn't have any noticeable impact on memory.

For clarity (and later maintaining) of the code, I would prefer use of a separate variable. Plus I would remove the empty strings up front:
foreach (string sn in MACOrSerial.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
{
string MacORSerialSubString = sn.Trim();
}

In reality, your iteration will be using the result of the MACOrSerial.Split() call, which is an array that is independent from the MACOrSerial variable.
There will be no difference between using MACOrSerial or another string variable in terms of memory usage, each time sn.Trim() is called, a new string is generated and it's just the reference to this new string that gets placed in your string variable.
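To make the point above concrete, here is a minimal sketch. The class and method names (SplitDemo, TrimEntries) are made up for illustration and are not from the question. It reassigns the source variable inside the loop on purpose and still visits every entry, because Split has already returned its own array before the first iteration.

```csharp
using System;
using System.Collections.Generic;

public static class SplitDemo
{
    // Hypothetical helper: trims each comma-separated entry and collects it,
    // reassigning the source variable inside the loop on purpose.
    public static List<string> TrimEntries(string macOrSerial)
    {
        var seen = new List<string>();
        // Split runs once, before the first iteration, and returns a new string[].
        foreach (string sn in macOrSerial.Split(','))
        {
            // Reassigning the parameter only changes which string it refers to;
            // the array being iterated is unaffected.
            macOrSerial = sn.Trim();
            seen.Add(macOrSerial);
        }
        return seen;
    }

    public static void Main()
    {
        var entries = TrimEntries("AA123241, BB123431, CC1231243");
        Console.WriteLine(string.Join("|", entries)); // AA123241|BB123431|CC1231243
    }
}
```

It works, but as the answers say, it saves nothing and reads worse than a separately named variable.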

Related

C# big string array to string

I have a string array of about 20,000,000 values,
and I need to convert it to a single string.
I've tried:
string data = "";
foreach (var i in tm)
{
data = data + i;
}
But that takes too long.
Does anyone know a faster way?
Try StringBuilder:
StringBuilder sb = new StringBuilder();
foreach (var i in tm)
{
sb.Append(i);
}
To get the resulting String use ToString():
string result = sb.ToString();
The answer is going to depend on the size of the output string and the amount of memory you have available and usable. The hard limit on string length appears to be 2^31-1 (int.MaxValue) characters, which at two bytes per character is about 4 GB of memory. Whether you can actually allocate that depends on your framework version, etc. If you're going to be producing a larger output then you can't put it into a single string anyway.
You've already discovered that naive concatenation is going to be tragically slow. The problem is that every pass through the loop creates a new string, then immediately discards it on the next iteration. This is going to fill up memory pretty quickly, forcing the Garbage Collector to work overtime finding old strings to clear out of memory, not to mention the amount of memory fragmentation and all that stuff that modern programmers don't pay much attention to.
A StringBuilder is a reasonable solution. Internally it allocates blocks of characters that it then stitches together at the end using pointers and memory copies. Saves a lot of hassles that way and is quite speedy.
As for String.Join... it uses a StringBuilder. So does String.Concat although it is certainly quicker when not inserting separator characters.
For simplicity I would use String.Concat and be done with it.
But then I'm not much for simplicity.
Here's an untested and possibly horribly slow answer using LINQ. When I get time I'll test it and see how it performs, but for now:
string result = new String(lines.SelectMany(l => (IEnumerable<char>)l).ToArray());
Obviously there is a potential overflow here since the ToArray call can potentially create an array larger than the String constructor can handle. Try it out and see if it's as quick as String.Concat.
So you can do it in LINQ, like such.
string data = tm.Aggregate("", (current, i) => current + i);
Or you can use the string.Join function
string data = string.Join("", tm);
Can't check it right now, but I'm curious how this option would perform:
var data = String.Join(string.Empty, tm);
Is Join optimized so that it skips concatenating the String.Empty separator?
For this big data unfortunately memory based methods will fail and this will be a real headache for GC. For this operation create a file and put every string in it. Like this:
using (StreamWriter sw = new StreamWriter("some_file_to_write.txt")){
for (int i=0; i<tm.Length;i++)
sw.Write(tm[i]);
}
Try to avoid using "var" on this performance demanding approach. Correction: "var" does not affect performance; "dynamic" does.
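For the in-memory route, a minimal sketch of the StringBuilder idea with the capacity computed up front, so the buffer is allocated once. JoinDemo and ConcatAll are made-up names, and the guard against int.MaxValue is only a rough check: the practical limit on your runtime may well be lower.

```csharp
using System;
using System.Text;

public static class JoinDemo
{
    // Hypothetical helper: sums the lengths first so the builder allocates once.
    public static string ConcatAll(string[] tm)
    {
        long total = 0;
        foreach (string s in tm)
            total += s?.Length ?? 0;
        if (total > int.MaxValue)
            throw new InvalidOperationException("Result would not fit in a single string.");

        var sb = new StringBuilder((int)total);
        foreach (string s in tm)
            sb.Append(s); // appending null is a no-op
        return sb.ToString();
    }

    public static void Main()
    {
        string[] tm = { "ab", "cd", "ef" };
        Console.WriteLine(ConcatAll(tm));                      // abcdef
        Console.WriteLine(ConcatAll(tm) == string.Concat(tm)); // True
    }
}
```

Whether this beats String.Concat or String.Join for 20,000,000 entries is something to measure rather than assume.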

how to convert part of the string to int/float/vector3 etc. without creating a temp string?

In C#, I have a string like this:
"1 3.14 (23, 23.2, 43,88) 8.27"
I need to convert this string to other types according to the value, like int/float/vector3. Right now I have some code like this:
public static int ReadInt(this string s, ref string op)
{
s = s.Trim();
string ss = "";
int idx = s.IndexOf(" ");
if (idx > 0)
{
ss = s.Substring(0, idx);
op = s.Substring(idx);
}
else
{
ss = s;
op = "";
}
return Convert.ToInt32(ss);
}
This reads the first int value out; I have similar functions to read float, vector3, etc. The problem is that in my application I have to do this a lot: I receive the string from a plugin and need to parse it every single frame, so I create a lot of strings, which causes a lot of GC work and hurts performance. Is there a way I can do this without creating temp strings?
Generation 0 objects such as those created here may well not impact performance too much, as they are relatively cheap to collect. I would change from using Convert to calling int.Parse() with the invariant culture before I started worrying about the GC overhead of the extra strings.
Also, you don't really need to create a new string to accomplish the Trim() behavior. After all, you're scanning and indexing the string anyway. Just do your initial scan for whitespace, and then for the space delimiter between ss and op, so you get just the substrings you need. Right now you're creating 50% more string instances than you really need.
All that said, no...there's not anything built into the basic .NET framework that would parse a substring without actually creating a new string instance. You would have to write your own parsing routines to accomplish that.
You should measure the actual real-world performance impact first, to make sure these substrings really are a significant issue.
I don't know what the "some plugin" is or how you have to handle the input from it, but I would not be surprised to hear that the overhead in acquiring the original input string(s) for this scenario swamps the overhead of the substrings for parsing.
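If measurement does show the substrings matter, a hand-written routine along the lines described above could look like the sketch below. TokenReader and this ReadInt signature are hypothetical: it tracks a position in the original string instead of returning the remainder as a new string, and it leaves out overflow checks and culture handling.

```csharp
using System;

public static class TokenReader
{
    // Hypothetical helper: reads a base-10 int starting at 'pos', skipping leading
    // spaces, and advances 'pos' past the digits. No substrings are allocated.
    // Overflow checking and culture handling are deliberately left out.
    public static int ReadInt(string s, ref int pos)
    {
        while (pos < s.Length && s[pos] == ' ') pos++;

        bool negative = false;
        if (pos < s.Length && (s[pos] == '-' || s[pos] == '+'))
        {
            negative = s[pos] == '-';
            pos++;
        }

        int start = pos;
        int value = 0;
        while (pos < s.Length && s[pos] >= '0' && s[pos] <= '9')
        {
            value = value * 10 + (s[pos] - '0');
            pos++;
        }
        if (pos == start)
            throw new FormatException("Expected a digit at position " + start);

        return negative ? -value : value;
    }

    public static void Main()
    {
        string input = "1 3.14 42";
        int pos = 0;
        Console.WriteLine(ReadInt(input, ref pos)); // 1
        Console.WriteLine(pos);                     // 1
    }
}
```

The float and vector3 readers would follow the same pattern, each continuing from wherever pos was left. On newer runtimes, int.Parse also accepts a ReadOnlySpan<char>, which covers the same ground where it is available.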

Why is StringBuilder faster than string operations, but a List<T> is faster than a LinkedList<T>?

So we are told that StringBuilder should be used when you are doing more than a few operations on a string (I've heard as low as three). Therefore we should replace this:
string s = "";
foreach (var item in items) // where items is IEnumerable<string>
s += item;
With this:
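string s = new StringBuilder().AppendJoin(string.Empty, items).ToString(); // AppendJoin needs .NET Core 2.0 or later; on older frameworks, Append each item in a loop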
I assume that internally StringBuilder holds references to each appended string, combining them on request. Let's compare this to the HybridDictionary, which uses a LinkedList for the first 10 elements, then swaps to a HashTable when the list grows to more than 10. As we can see, the same kind of pattern is here: small number of references = linked list, else make ever-increasing blocks of arrays.
Let's look at how a List works. Start off with a list size (internal default is 4). Add elements to the internal array; if the array is full, make a new array of double the size of the current array, copy the current array's elements across, then add the new element and make the new array the current array.
Can you see my confusion as to the performance benefits? For all elements besides strings, we make new arrays, copy old values and add the new value. But for strings that's bad? Because we know that "a" + "b" makes a new string reference from the two old references, "a" and "b".
Hope my question isn't too confusing. Why does there seem to be a double standard between string concatenation and array concatenation (I know strings are arrays of chars)?
String: Making new references is bad!
T : where T != String: Making new references is good!
Edit: Maybe what I'm really asking here is: when does making new, bigger arrays and copying the old values across start being faster than having references to randomly placed objects all over the heap?
Double edit: By faster I mean reading, writing and finding variables, not inserting or removing (i.e. LinkedList would kick ass at inserting, for example, but I don't care about that).
Final edit: I don't care about StringBuilder. I'm interested in the trade-off between the time taken to copy data from one part of the heap to another for cache alignment, versus just taking the cache misses from the CPU and having references all over the heap. When does one become faster than the other?
Therefore we should replace this:
No, you shouldn't. The first case you showed is string concatenation that can take place at compile time, and you have replaced it with string concatenation that takes place at runtime. The former is much more desirable, and will execute faster than the latter.
It's important to use a string builder when the number of strings being concatenated is not known at compile time. Often (but not always) this means concatenating strings in a loop.
Earlier versions of StringBuilder (before 4.0, if memory serves) did internally look more or less like a List<char>, and it's correct that post 4.0 it looks more like a LinkedList<char[]>. However, the key difference here between using a StringBuilder and using regular string concatenation in a loop is not the difference between a linked list style in which objects contain references to the next object in the "chain" and an array-based style in which an internal buffer overallocates space and is reallocated occasionally as needed, but rather the difference between a mutable object and an immutable object. The problem with traditional string concatenation is that, since strings are immutable, each concatenation must copy all of the memory from both strings into a new string. When using a StringBuilder the new string only needs to be copied onto the end of some type of data structure, leaving all of the existing memory as it is. What type of data structure that is isn't terribly important here; we can rely on Microsoft to use a structure/algorithm that has been proven to have the best performance characteristics for the most common situations.
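To put rough numbers on the mutable versus immutable difference, here is a back-of-the-envelope sketch that just counts characters copied. CopyCost and its methods are invented for illustration, and the second function is a deliberately simplified model: it ignores the builder's internal growth and the final ToString copy.

```csharp
using System;

public static class CopyCost
{
    // Characters copied when appending 'count' pieces of 'pieceLength' chars
    // with s += piece: step i builds a new string of i * pieceLength chars.
    public static long NaiveConcat(int count, int pieceLength)
    {
        long copied = 0;
        for (int i = 1; i <= count; i++)
            copied += (long)i * pieceLength;
        return copied;
    }

    // Simplified model of a mutable buffer: each piece is copied in once.
    // Real StringBuilder growth and the final ToString copy are ignored here.
    public static long MutableBuffer(int count, int pieceLength)
    {
        return (long)count * pieceLength;
    }

    public static void Main()
    {
        Console.WriteLine(NaiveConcat(1000, 10));   // 5005000
        Console.WriteLine(MutableBuffer(1000, 10)); // 10000
    }
}
```

The first count grows with the square of the number of pieces, the second only linearly, which is why the gap only shows up once the loop gets long.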
It seems to me that you are conflating the resizing of a list with the evaluation of a string expression, and assuming that the two should behave the same way.
Consider your example: string s = "a" + "b" + "c" + "d"
Assuming no optimisations of the constant expression (which the compiler would handle automatically), what this will do is evaluate each operation in turn:
string s = (("a" + "b") + "c") + "d"
This results in the strings "ab" and "abc" being created as part of that single expression. This has to happen because strings in .NET are immutable, which means their values cannot be changed once created. This is because, if strings were mutable, you'd have code like this:
string a = "hello";
string b = a; // would assign b the same reference as a
b += "world"; // would update the string it references
// now a == "helloworld"
If this were a List, the code would make more sense, and doesn't even need explanation:
var a = new List<int> { 1, 2, 3 };
var b = a;
b.Add(4);
// now a == { 1, 2, 3, 4 }
So the reason that non-string "list" types allocate extra memory early is for reasons of efficiency, and to reduce allocations when the list is extended. The reason that a string does not do that is because a string's value is never updated once created.
Your assumption about the operation of the StringBuilder is irrelevant, but the purpose of a StringBuilder is essentially to create a non-immutable object that reduces the overhead of multiple string operations.
The backing store of a StringBuilder is a char[] that gets resized as needed. Nothing is turned into a string until you invoke StringBuilder.ToString() on it.
The backing store of List<T> is a T[] that gets resized as needed.
The problem with something like
string s = a + b + c + d ;
is that the compiler parses it left to right, as
      +
     / \
    +   d
   / \
  +   c
 / \
a   b
and, unless it can see opportunities for optimization, does something like
string t1 = a + b ;
string t2 = t1 + c ;
string s = t2 + d ;
thus creating two temporaries and the final string. With a StringBuilder, though, it's going to build out the character array it needs and at the end create one string.
This is a win because strings, once created, are immutable (can't be changed), so every intermediate concatenation has to allocate a new string. String literals are additionally interned in the string pool, meaning that there is only ever one instance of a given literal: no matter how many times "abc" appears in your source, every occurrence is a reference to the same object in the string pool.
Strings built at run time are not interned automatically. That only happens if you call String.Intern, which adds cost: the runtime has to check the string pool to see if the string already exists. If it does, that reference is used; if it does not, the string is added to the string pool.
Your example, though:
string s = "a" + "b" + "c" + "d" ;
is a non sequitur: the compiler sees the constant expression and does an optimization called constant folding, so it becomes (even in debug mode):
string s = "abcd" ;
Similar optimizations happen with arithmetic expressions:
int x = 12 / 3 ;
is going to be optimized away to
int x = 4 ;
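One way to see constant folding at work is a reference comparison. This is only a sketch with made-up names (FoldingDemo and its two methods); it assumes a standard runtime where identical literals in one assembly resolve to the same string object.

```csharp
using System;

public static class FoldingDemo
{
    public static bool LiteralsShareInstance()
    {
        string folded = "a" + "b" + "c" + "d"; // folded to "abcd" at compile time
        return object.ReferenceEquals(folded, "abcd");
    }

    public static bool RuntimeResultIsSeparateInstance(string left, string right)
    {
        string built = left + right; // concatenated at run time, not folded
        return built == "abcd" && !object.ReferenceEquals(built, "abcd");
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());                     // True
        Console.WriteLine(RuntimeResultIsSeparateInstance("ab", "cd")); // True
    }
}
```

The second check shows the other half: a string concatenated at run time has the same value as the literal but is a different object.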

Variable scope and performance

Below is the code I am using:
private void TestFunction()
{
foreach (MySampleClass c in dictSampleClass)
{
String sText = c.VAR1 + c.VAR2 + c.VAR3;
PerformSomeTask(sText,c.VAR4);
}
}
My friend has suggested changing it to the following to improve performance (dictSampleClass is a dictionary with 10K objects):
private void TestFunction()
{
String sText="";
foreach (MySampleClass c in dictSampleClass)
{
sText = c.VAR1 + c.VAR2 + c.VAR3;
PerformSomeTask(sText,c.VAR4);
}
}
My question is: does the above change improve performance? If yes, how?
Wow, that's more of a response than I expected. Most people said "the C# compiler would take care of that". So what about a C compiler?
The compiler has intelligence to move variable declarations into/out of loops where required. In your example however, you are using a string, which is immutable. By declaring it outside I believe you are trying to "create once, use many", however strings are created each time they are modified so you can't achieve that.
Not to sound harsh, but this is a premature optimisation, and likely a failing one, due to string immutability.
If the collection is large, go down the route of appending many strings into a StringBuilder - separated by a delimiter. Then split the string on this delimiter and iterate the array to add them, instead of concatenating them and adding them in one loop.
StringBuilder sb = new StringBuilder();
foreach (MySampleClass c in dictSampleClass)
{
sb.Append(c.VAR1);
sb.Append(c.VAR2);
sb.Append(c.VAR3);
sb.Append("|");
}
string[] results = sb.ToString().Split('|');
for (int i = 0; i < dictSampleClass.Count; i++)
{
string s = results[i];
MySampleClass c = dictSampleClass[i];
PerformSomeTask(s,c.VAR4);
}
I see no benefit to using this code, and it most likely doesn't even work!
UPDATE: in light of the fast performance of string creation from multiple strings, if PerformSomeTask is your bottleneck, try to break the iteration of the strings into multiple threads - it won't improve the efficiency of the code but you'll be able to utilise multiple cores.
Run the two functions through Reflector and have a look at the generated assembly. It might give some insights, but at best the performance improvement would be minimal.
I'd go for this instead:
private void TestFunction()
{
foreach (MySampleClass c in dictSampleClass)
{
PerformSomeTask(c.VAR1 + c.VAR2 + c.VAR3, c.VAR4);
}
}
There's still probably no real performance benefit, but it removes creating a variable that you don't really need.
No, it does not. Things like these are handled by the C# compiler, which is very intelligent, and you basically do not need to care about these details.
Always use code that promotes readability.
You can check this by disassembling the program.
It will not improve performance. But on another note: assumed improvements are error-prone. You have to measure your application and only optimise the slow part if needed. I don't believe that loop is your application's bottleneck.
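If you do want to measure rather than assume, a rough harness like the one below is enough to start with. Everything in it is illustrative (ScopeTiming, the two method names, the 10,000-element input standing in for the dictionary), and a single Stopwatch run like this is noisy, so treat the numbers as a sanity check and not a benchmark.

```csharp
using System;
using System.Diagnostics;

public static class ScopeTiming
{
    // Variant A: the string variable is declared inside the loop.
    public static int DeclaredInside(string[] parts)
    {
        int total = 0;
        foreach (string p in parts)
        {
            string sText = p + p + p;
            total += sText.Length;
        }
        return total;
    }

    // Variant B: the string variable is declared once, outside the loop.
    public static int DeclaredOutside(string[] parts)
    {
        int total = 0;
        string sText = "";
        foreach (string p in parts)
        {
            sText = p + p + p;
            total += sText.Length;
        }
        return total;
    }

    public static void Main()
    {
        var parts = new string[10000];
        for (int i = 0; i < parts.Length; i++) parts[i] = i.ToString();

        var sw = Stopwatch.StartNew();
        int a = DeclaredInside(parts);
        long ticksA = sw.ElapsedTicks;

        sw.Restart();
        int b = DeclaredOutside(parts);
        long ticksB = sw.ElapsedTicks;

        Console.WriteLine(a == b); // True
        Console.WriteLine("inside: " + ticksA + " ticks, outside: " + ticksB + " ticks");
    }
}
```

Both variants allocate one new string per iteration either way; only where the reference lives differs.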

String Concatenation Vs String Builder Append

So... I have this scenario where I have a foreach loop that loops through a List of Checkboxes to check which are selected. For every selected checkbox, I have to do a pretty long string concatenation, involving 30 different strings of an average length of 20 characters, and then send it out as an HTTP request. 2 of the strings are dependent on the index/value of the checkbox selected.
The length of the List of Checkboxes is also variable depending upon the user's data. I would say the average length of the List would be 20, but it can go up to 50-60. So the worst case scenario would be performing the whole string concatenation 60 or so times.
For now I'm doing it with simple string concatenation via the '+' operator, but I'm wondering if it would be faster to do it with StringBuilder. Of course, that means I'd have to either create a StringBuilder object within the loop, or create it before the loop and call StringBuilder.Remove at the end of it after sending out the HTTP request.
I appreciate any insights anybody can share regarding this issue.
EDIT
Thanks for all the replies everybody, so from what I've gathered, the best way for me to go about doing this would be something like:
StringBuilder sb = new StringBuilder();
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
sb.Append(string1);
sb.Append(string2);
sb.Append(string3);
.
.
.
sb.Append(stringLast);
SendRequest(sb.ToString());
sb.Length = 0;
}
}
Use StringBuilder. That's what it's for.
Strings are immutable. String concatenation creates a new string, needing more memory, and is generally considered slow:
string a = "John" + " " + "Saunders";
This creates a string "John ", then creates another string "John Saunders", then finally, assigns that to "a". The "John " is left for garbage collection.
string a = "John";
a += " ";
a += "Saunders";
This is about the same, as "John" is replaced by a new string "John ", which is replaced by a new string "John Saunders". The originals are left to be garbage collected.
On the other hand, StringBuilder is designed to be appended, removed, etc.
Example:
StringBuilder sb = new StringBuilder();
for (int i=0; i<n; i++)
{
sb.Length = 0;
sb.Append(field1[i]);
sb.Append(field2[i]);
...
sb.Append(field30[i]);
// Do something with sb.ToString();
}
This topic has been analysed to death over the years. The end result is that if you are doing a small, known number of concatenations, use '+'; otherwise use StringBuilder. From what you've said, concatenating with '+' should be faster. There are a gazillion (give or take) sites out there analysing this. Google it.
For the size of string you are talking about, it's negligible anyway.
EDIT: on second thought, SB is probably faster. But like I said, who cares?
I know this has been answered, but I wanted to point out that I actually think the "blindly accepted as gospel" approach of always using StringBuilder is sometimes wrong, and this is one case.
For background, see this blog entry: http://geekswithblogs.net/johnsperfblog/archive/2005/05/27/40777.aspx
The short of it is, for this particular case, as described, you will see better performance by avoiding StringBuilder and making use of the + operator thusly:
foreach (CheckBox item in FriendCheckboxList)
{
if (item.Checked)
{
string request = string1 +
string2 +
string3 +
.
.
.
stringLast;
SendRequest(request);
}
}
The reason is that the C# compiler (as of .NET 1.1) will convert that statement into a single IL call to String.Concat, passing an array of Strings as an argument. The blog entry does an excellent job outlining the implementation details of String.Concat, but suffice to say, it is extremely efficient for this case.
If you're asking this question, chances are you should use StringBuilder for many reasons, but I'll provide two.
When you use string concatenation it has to allocate a new buffer and copy the data from the other string into the new string variable. So you are going to incur many repeated allocations, which in the end fragments the memory, uses up heap space, and makes more work for the garbage collector.
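For comparison, a small sketch of the options discussed in this thread, side by side. RequestBuilder and its method names are invented, and five short strings stand in for the thirty in the question. All three produce the same result; which is fastest for your sizes is something to measure.

```csharp
using System;
using System.Text;

public static class RequestBuilder
{
    // The + chain in a single expression; the compiler emits a String.Concat call for it.
    public static string WithPlus(string a, string b, string c, string d, string e)
    {
        return a + b + c + d + e;
    }

    // The explicit equivalent using the params string[] overload.
    public static string WithConcat(params string[] parts)
    {
        return string.Concat(parts);
    }

    // Reusing one builder across iterations, as in the question's edit.
    public static string WithBuilder(StringBuilder sb, params string[] parts)
    {
        sb.Length = 0; // reset the contents without creating a new builder
        foreach (string p in parts) sb.Append(p);
        return sb.ToString();
    }

    public static void Main()
    {
        var sb = new StringBuilder();
        Console.WriteLine(WithPlus("u=", "1", "&v=", "2", "&w=3"));        // u=1&v=2&w=3
        Console.WriteLine(WithConcat("u=", "1", "&v=", "2", "&w=3"));      // u=1&v=2&w=3
        Console.WriteLine(WithBuilder(sb, "u=", "1", "&v=", "2", "&w=3")); // u=1&v=2&w=3
    }
}
```

The single-expression form only applies when all the pieces are known in one statement; once the pieces are appended in a loop or across statements, the builder is the natural fit.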
The StringBuilder on the other hand pre-allocates a buffer and as you add strings to it doesn't need to keep re-allocating (assuming initial buffer is large enough). Which increases performance and is far less taxing on memory.
As developers we should try to anticipate future growth. Let's say that your list grows substantially over time and then all of a sudden starts performing slowly. If you can prevent this with little effort now, why wouldn't you do it?
In general I would recommend to use a StringBuilder.
Have you tested this and checked the performance? Is the performance an issue vs how long it will take you to rewrite the code?
