Why is doing the following so bad?
String val = null;
String someOtherValue = "hello";
val += someOtherValue;
It must be pretty bad, but why? I had this line in my program and it slowed everything down immensely!
I'm assuming it's because it keeps re-creating the string? Is this the only reason, though?
That exact code is perfectly fine; the compiler will optimize it away.
Doing that in a loop can be slow, since that creates a separate (immutable) string object for each concatenation.
Instead, use StringBuilder.
Yes, the reason is that strings are immutable in C#, which means they can't be changed. The framework is forced to allocate a new string every time you do the +=.
Try using StringBuilder instead.
The difference is very noticeable in long loops.
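As a minimal sketch of the difference (the loop count is an arbitrary number chosen for illustration):
using System.Text;

// Each += allocates a brand-new string and copies every character
// accumulated so far, so this loop does O(n^2) work overall.
string slow = "";
for (int i = 0; i < 10000; i++)
    slow += "x";

// StringBuilder appends into an internal, growable buffer,
// so the same result is built in roughly O(n) work.
var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
    sb.Append('x');
string fast = sb.ToString();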
I don't think there's a silver bullet that says "this is better than that". It depends on the scenario in which you need to concatenate the strings. Performance vs. readability is also an issue here. Sometimes it's better to write well-readable code by compromising a little on performance.
Referring to the article from James Michael Hare:
The fact is, the concatenation operator (+) has been optimized for
speed and looks the cleanest for joining together a known set of
strings in the simplest manner possible.
StringBuilder, on the other hand, excels when you need to build a
string of indeterminate length. Use it in those times when you are
looping till you hit a stop condition and building a result and it
won't steer you wrong.
String.Format seems to be the loser from the stats, but consider
which of these is more readable. Yes, ignore the fact that you could
do this with ToString() on a DateTime.
Have a look at the article; it's worth reading.
Related
I'm doing a bit of coding, where I have to write this sort of code:
if( array[i]==false )
array[i]=true;
I wonder if it should be re-written as
array[i]=true;
This raises the question: are comparisons faster than assignments?
What about differences from language to language? (Contrast Java and C++, for example.)
NOTE: I've heard that "premature optimization is the root of all evil." I don't think that applies here :)
This isn't just premature optimization; this is micro-optimization, which is an irrelevant distraction.
Assuming your array is of boolean type, your comparison is unnecessary, which is the only relevant observation.
Well, since you say you're sure that this matters, you should just write a test program and measure the difference.
Comparison can be faster if this code is executed on multiple variables allocated at scattered addresses in memory. With a comparison, you only read data from memory into the processor cache, and if you don't change the variable's value, then when the cache decides to flush the line it will see that the line was not changed and there's no need to write it back to memory. This can speed up execution.
Edit: I wrote a script in PHP. I just noticed that there was a glaring error in it, meaning the best-case runtime was being calculated incorrectly (scary that nobody else noticed!).
Best case just beats outright assignment, but worst case is a lot worse than plain assignment. Assignment is likely fastest in terms of real-world data.
Output:
assignment in 0.0119960308075 seconds
worst case comparison in 0.0188510417938 seconds
best case comparison in 0.0116770267487 seconds
Code:
<?php
// Reset must take the array by reference; otherwise it only
// modifies a local copy and the caller's array is untouched.
function reset_arr(&$arr) {
    for ($i = 0; $i < 10000; $i++)
        $arr[$i] = false;
}

$arr = array();

reset_arr($arr);
$starttime = microtime(true);
for ($i = 0; $i < 10000; $i++)
    $arr[$i] = true;               // unconditional assignment
$totaltime = microtime(true) - $starttime;
echo "assignment in ".$totaltime." seconds<br />";

reset_arr($arr);
$starttime = microtime(true);
for ($i = 0; $i < 10000; $i++)
    if (!$arr[$i])                 // always true after a reset,
        $arr[$i] = true;           // so: compare AND assign each time
$totaltime = microtime(true) - $starttime;
echo "worst case comparison in ".$totaltime." seconds<br />";

reset_arr($arr);
$starttime = microtime(true);
for ($i = 0; $i < 10000; $i++)
    if ($arr[$i])                  // always false after a reset,
        $arr[$i] = true;           // so: compare only, never assign
$totaltime = microtime(true) - $starttime;
echo "best case comparison in ".$totaltime." seconds<br />";
I believe that if comparison and assignment statements are both atomic (i.e., one processor instruction each) and the loop executes n times, then in the worst case comparing-then-assigning requires 2n instructions (a comparison on every iteration plus an assignment on every iteration), whereas unconditionally assigning the bool requires only n. Therefore the second one is more efficient.
Depends on the language. However, looping through arrays can be costly as well. If the array is in consecutive memory, the fastest approach is to write 1-bits (255s) across the entire array with memset, assuming your language/compiler can do this.
That performs no per-element reads and one bulk write, with none of the per-iteration reads and writes of the loop variable and array element (2 reads/2 writes each loop) repeated several hundred times.
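In C#, the analogous bulk operation is Array.Fill (available in .NET Core 2.0 and later); a minimal sketch, not from the original answer:
// Fill the whole array in one call instead of a hand-written loop;
// the runtime can use block writes internally.
bool[] flags = new bool[10000];
Array.Fill(flags, true);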
I really wouldn't expect there to be any kind of noticeable performance difference for something as trivial as this, so surely it comes down to what gives you clear, more readable code. In my opinion that would be always assigning true.
Might give this a try:
if(!array[i])
array[i]=true;
But really the only way to know for sure is to profile. I'm sure pretty much any compiler would see the comparison to false as unnecessary and optimize it out.
It all depends on the data type. Assigning booleans is faster than first comparing them. But that may not be true for larger value-based datatypes.
As others have noted, this is micro-optimization.
(In politics or journalism, this is known as navel-gazing ;-)
Is the program large enough to have more than a couple layers of function/method/subroutine calls?
If so, it probably has some avoidable calls, and those can waste hundreds of times as much time as low-level inefficiencies.
On the assumption that you have removed those (which few people do), then by all means run it 10^9 times under a stopwatch, and see which is faster.
Why would you even write the first version? What's the benefit of checking whether something is false before setting it to true? If you are always going to set it true, then always set it true.
When you have a performance bottleneck that you've traced back to setting a single boolean value unnecessarily, come back and talk to us.
I remember that in one book about assembly language the author claimed that if conditions should be avoided when possible.
A branch is much slower if the condition is false and execution has to jump to another line, considerably slowing down performance. Also, since programs are ultimately executed as machine code, an 'if' is likely to be slower in every (compiled) language, unless its condition is true almost all the time.
If you just want to flip the values, then do:
array[i] = !array[i];
Performance using this is actually worse, though: instead of a single check for a true/false value followed by a set, it has to both read the value and write its negation.
If you declare a 1,000,000-element array with a true, false, true, false pattern, comparison is slower; (var b = !b) essentially does a read plus a write instead of just a write.
I was wondering which of the following options is more efficient. Any suggestions?
Listing 1
string header = "Header 1"; // create a string variable and reuse
client.AddMessage(header, ...);
client.AddMessage(header, ...);
client.AddMessage(header, ...);
client.AddMessage(header, ...);
client.AddMessage(header, ...);
...
Listing 2
client.AddMessage("Header 1", ...); // hard code the string in each call
client.AddMessage("Header 1", ...);
client.AddMessage("Header 1", ...);
client.AddMessage("Header 1", ...);
client.AddMessage("Header 1", ...);
....
You should probably not care about this kind of (possible) micro-optimization: what matters here is maintainability:
do you have only one string?
or can you have several distinct values, one day or another?
(The compiler should optimize that for you anyway, I suppose.)
Strings are interned in the .NET world, so either one will work the same way.
In other words - it makes no difference in regards to performance.
As for maintainability - if you need to change the header name, option 1 is better (DRY).
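As an illustration of interning (a made-up snippet, not from the original posts):
using System;

// Identical string literals in an assembly are interned, so both
// variables below reference the same object.
string a = "Header 1";
string b = "Header 1";
Console.WriteLine(object.ReferenceEquals(a, b));                 // True

// A string built at runtime is a distinct instance unless interned.
string c = string.Concat("Header ", "1");
Console.WriteLine(object.ReferenceEquals(a, c));                 // False
Console.WriteLine(object.ReferenceEquals(a, string.Intern(c)));  // True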
I doubt there's much in it, but I would say Listing 1, because all those individual strings in Listing 2 would have to be created in turn unless the optimiser is doing things behind the scenes.
I'd go with version one, but use a const instead if the value is not going to change.
const string HEADER = "Header 1";
Strings will be re-used (interned), so they should both be equally efficient. I'd argue that Listing 1 is more maintainable.
Both are the same. Strings are interned.
I'm not 100% sure, but I think it's about the same thing. You could add const to your string declaration if you want more performance, but I don't think you will be able to see any measurable difference.
Listing 1 has one benefit, though: you can change the value by changing just one part of the code.
The first option is more efficient for the purposes of refactoring: you only have one line to change if you ever wish to change the code. Additionally, if it's always going to be the same value defined at compile time, you may want to consider using a constant.
If you really believe this piece of code is a performance bottleneck, you should try it both ways, profile, and then you'll know which is more efficient.
If you don't care enough to measure, do not optimise.
If you're just asking out of curiosity, Strings get interned in .NET so it's exactly the same from a performance viewpoint either way - i.e. don't do the second one because then whoever has to maintain it will kill you.
This is an almost academic question but I'm curious as to its answer.
Suppose you have a loop that performs a routine replace on every row in a dataset. Let's say there are 10,000 such rows.
Is it more efficient to have something like this:
Row = Row.Replace('X', 'Y');
Or to check whether the row even contains the character that is to be replaced in the first place, like this:
if (Row.Contains('X')) Row = Row.Replace('X', 'Y');
Is there any difference in terms of efficiency? I realize that the difference might be very minor, but I'm interested in knowing whether one way is better than the other regardless of how much better it may be. Also, would your answer be different if the probability of finding the character to be replaced was 10% versus 90%?
Your check, Row.Contains('X'), is an O(n) operation, which means that it iterates over the entire string one character at a time to see whether that character exists.
Row.Replace('X', 'Y') works exactly the same way: it checks every single character, one at a time.
So, if you have that check in place, you potentially iterate over the string twice. If you just replace, you iterate over the string once.
You need to measure first on a realistic dataset, then decide which is higher performance. If your typical dataset doesn't often have anything, then having the Contains() call may be faster (because although Replace also iterates through all chars in the string, there will be an extra string object created and garbage collected due to the immutability of strings), but if "X" is often present, the check becomes a waste and actually slows things down.
Also, this typically isn't the first place to look for and worry about performance problems. Things like chatty interfaces, network I/O, web services, databases, file I/O and GUI updates are going to hurt you orders of magnitude more than stuff like this.
If you were going to do stuff like this, and if Row came back from a database (as its name suggests), then getting the database to do the query might be another approach to save performance. E.g.
select MyTextColumn from MyTable where MyTextColumn like '%X%'
Then perform the replacement on all the results, because you know you only returned results where the replacement was needed.
This does introduce other concerns though - for example, in SQL Server, if the above example included an index on MyTextColumn, SQL Server won't be able to use that index because the like argument starts with a wildcard (it's not considered to be "sargable").
In summary, write for correctness, readability and maintenance first, then measure performance and make targeted improvements where they are found to be required.
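If you do want to measure, here is a minimal Stopwatch sketch (the row data and the 10% hit rate are invented for illustration; string.Contains(char) needs .NET Core 2.1 or later, otherwise use IndexOf('X') >= 0):
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical data: vary the fraction of rows containing 'X'
// to model the 10% vs 90% scenarios from the question.
string[] BuildRows() => Enumerable.Range(0, 100000)
    .Select(i => i % 10 == 0 ? "alpha X beta" : "alpha beta")
    .ToArray();

var rows = BuildRows();
var sw = Stopwatch.StartNew();
for (int i = 0; i < rows.Length; i++)
    rows[i] = rows[i].Replace('X', 'Y');      // always replace
sw.Stop();
Console.WriteLine($"unconditional: {sw.ElapsedMilliseconds} ms");

rows = BuildRows();                           // fresh data for a fair run
sw.Restart();
for (int i = 0; i < rows.Length; i++)
    if (rows[i].Contains('X'))                // check first
        rows[i] = rows[i].Replace('X', 'Y');
sw.Stop();
Console.WriteLine($"checked: {sw.ElapsedMilliseconds} ms");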
The first option is faster. In order to check whether the character is present, Contains first has to find it. As there is no caching mechanism, why not replace directly? Otherwise you'd be searching twice. If 'X' is present many times, you would basically be doubling the effort.
Don't forget that strings in C# are IMMUTABLE. That means they cannot change.
For it to replace anything, it has to create a new string in memory and copy the data across, with the old string garbage collected later on.
Using Contains() first will prevent needless creation, copying, and garbage collection of string data, and will therefore perform faster.
Hi, I have this code below and am looking for a prettier/faster way to do this.
Thanks!
string value = "HelloGoodByeSeeYouLater";
string[] y = new string[]{"Hello", "You"};
foreach(string x in y)
{
value = value.Replace(x, "");
}
You could do:
y.ToList().ForEach(x => value = value.Replace(x, ""));
Although I think your variant is more readable.
Forgive me, but someone's gotta say it,
value = Regex.Replace( value, string.Join("|", y.Select(Regex.Escape)), "" );
Possibly faster, since it creates fewer strings.
EDIT: Credit to Gabe and lasseespeholt for Escape and Select.
While not any prettier, there are other ways to express the same thing.
In LINQ:
value = y.Aggregate(value, (acc, x) => acc.Replace(x, ""));
With String methods:
value = String.Join("", value.Split(y, StringSplitOptions.None));
I don't think anything is going to be faster in managed code than a simple Replace in a foreach though.
It depends on the size of the string you are searching. The foreach example is perfectly fine for small operations but creates a new instance of the string each time it operates because the string is immutable. It also requires searching the whole string over and over again in a linear fashion.
The basic solutions have all been proposed. The Linq examples provided are good if you are comfortable with that syntax; I also liked the suggestion of an extension method, although that is probably the slowest of the proposed solutions. I would avoid a Regex unless you have an extremely specific need.
So let's explore more elaborate solutions and assume you needed to handle a string that was thousands of characters in length and had many possible words to be replaced. If this doesn't apply to the OP's need, maybe it will help someone else.
Method #1 is geared towards large strings with few possible matches.
Method #2 is geared towards short strings with numerous matches.
Method #1
I have handled large-scale parsing in C# using char arrays and pointer math with intelligent seek operations that are optimized for the length and potential frequency of the term being searched for. It follows this methodology:
Extremely cheap peeks, one character at a time
Only investigate potential matches
Modify output when a match is found
For example, you might read through the whole source array and only add words to the output when they are NOT found. This would remove the need to keep redimensioning strings.
A simple example of this technique is looking for a closing HTML tag in a DOM parser. For example, I may read an opening STYLE tag and want to skip through (or buffer) thousands of characters until I find a closing STYLE tag.
This approach provides incredibly high performance, but it's also incredibly complicated if you don't need it (plus you need to be well-versed in memory manipulation/management or you will create all sorts of bugs and instability).
I should note that the .Net string libraries are already incredibly efficient but you can optimize this approach for your own specific needs and achieve better performance (and I have validated this firsthand).
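A heavily simplified sketch of that seek-and-peek approach (the tag and method name are invented; a real parser needs far more care with casing, quoting, and malformed input):
// Skip from 'pos' to just past a closing </style> tag, peeking one
// character at a time and only comparing the full tag on a potential match.
static int SkipPastClosingStyleTag(char[] source, int pos)
{
    const string closing = "</style>";
    while (pos <= source.Length - closing.Length)
    {
        if (source[pos] == '<')                          // cheap peek
        {
            bool match = true;
            for (int j = 1; j < closing.Length; j++)     // investigate only now
                if (char.ToLowerInvariant(source[pos + j]) != closing[j])
                {
                    match = false;
                    break;
                }
            if (match)
                return pos + closing.Length;             // index just past the tag
        }
        pos++;
    }
    return -1;                                           // closing tag not found
}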
Method #2
Another alternative involves storing search terms in a Dictionary containing Lists of strings. Basically, you decide how long your search prefix needs to be, and read characters from the source string into a buffer until you meet that length. Then you search your dictionary for all terms that match that string. If a match is found, you explore further by iterating through that List; if not, you know you can discard the buffer and continue.
Because the Dictionary matches strings based on hash, the search is non-linear and ideal for handling a large number of possible matches.
I'm using this methodology to allow instantaneous (<1ms) searching of every airfield in the US by name, state, city, FAA code, etc. There are 13K airfields in the US, and I've created a map of about 300K permutations (again, a Dictionary with prefixes of varying lengths, each corresponding to a list of matches).
For example, Phoenix, Arizona's main airfield is called Sky Harbor with the short ID of KPHX. I store:
KP
KPH
KPHX
Ph
Pho
Phoe
Ar
Ari
Ariz
Sk
Sky
Ha
Har
Harb
There is a cost in terms of memory usage, but string interning probably reduces this somewhat, and the resulting speed justifies the memory usage on data sets of this size. Searching happens as the user types and is so fast that I have actually introduced an artificial delay to smooth out the experience.
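A minimal sketch of the prefix-map idea (the airfield data, the 2-to-4-character prefix lengths, and the method names here are all hypothetical, chosen just to show the shape of the structure):
using System;
using System.Collections.Generic;

static class PrefixSearch
{
    // Map every 2- to 4-character prefix of each searchable term
    // to the list of airfield IDs it could refer to.
    public static Dictionary<string, List<string>> BuildPrefixMap(
        IEnumerable<(string Id, string[] Terms)> airfields)
    {
        var map = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (var (id, terms) in airfields)
            foreach (var term in terms)
                foreach (var word in term.Split(' '))
                    for (int len = 2; len <= Math.Min(4, word.Length); len++)
                    {
                        string prefix = word.Substring(0, len);
                        if (!map.TryGetValue(prefix, out var ids))
                            map[prefix] = ids = new List<string>();
                        if (!ids.Contains(id))
                            ids.Add(id);
                    }
        return map;
    }
}
A lookup such as map.TryGetValue("Pho", out var hits) is then a single hash probe regardless of how many airfields are indexed, returning the short candidate list ("KPHX" in the Sky Harbor example) to be refined further.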
Send me a message if you have the need to dig into these methodologies.
Extension method for elegance
(arguably "prettier" at the call level)
I'll implement an extension method that allows you to call your implementation directly on the original string as seen here.
value = value.Remove(y);
// or
value = value.Remove("Hello", "You");
// effectively
string value = "HelloGoodByeSeeYouLater".Remove("Hello", "You");
The extension method is in fact callable on any string value, and is therefore easily reusable.
Implementation of Extension method:
I'm going to wrap your own implementation (shown in your question) in an extension method for pretty/elegant points, and also employ the params keyword to provide some flexibility in passing the arguments. You can substitute somebody else's faster implementation body into this method.
static class StringExtensions {
    public static string Remove(this string thisString, params string[] arrItems) {
        // Whatever implementation you like:
        if (thisString == null)
            return null;
        var temp = thisString;
        foreach (string x in arrItems)
            temp = temp.Replace(x, "");
        return temp;
    }
}
That's the brightest idea I can come up with right now that nobody else has touched on.
In a web app I am splitting strings and assigning them to link names or to collections of strings. Is there a significant performance benefit to using StringBuilder for a web application?
EDIT: Two functions: splitting up a link into 5-10 strings, then repackaging them into another string. Also, I append one string at a time to a link every time the link is clicked.
How many strings will you be concatenating? Do you know for sure how many there will be, or does it depend on how many records are in the database etc?
See my article on this subject for more details and guidelines - but basically, being in a web app makes no difference to how expensive string concatenation is vs using a StringBuilder.
EDIT: I'm afraid it's still not entirely clear from the question exactly what you're doing. If you've got a fixed set of strings to concatenate, and you can do it all in one go, then it's faster and probably more readable to do it using concatenation. For instance:
string faster = first + " " + second + " " + third + "; " + fourth;
string slower = new StringBuilder().Append(first)
.Append(" ")
.Append(second)
.Append(" ")
.Append(third)
.Append("; ")
.Append(fourth)
.ToString();
Another alternative is to use a format string of course. This may well be the slowest, but most readable:
string readable = string.Format("{0} {1} {2}; {3}",
first, second, third, fourth);
The part of your question mentioning "adding a link each time" suggests using a StringBuilder for that aspect though - anything which naturally leads to a loop is more efficient (for moderate to large numbers) using StringBuilder.
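For that "append on every click" case, a minimal sketch (the link values are invented for illustration):
using System.Text;

// With an open-ended number of appends, let StringBuilder grow its
// internal buffer rather than allocating a new string per append.
var trail = new StringBuilder();
foreach (string link in new[] { "home", "products", "detail" })
{
    if (trail.Length > 0)
        trail.Append(" > ");
    trail.Append(link);
}
string breadcrumb = trail.ToString();   // "home > products > detail"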
You should take a look at this excellent article by Jon Skeet about concatenating strings.
Yes, concatenating regular strings is expensive (really, appending one string onto the end of another). Each time a string is "changed", .NET drops the old string and creates a new one with the new values. It is an immutable object.
EDIT:
StringBuilder should be used with caution and evaluated like any other approach. Sometimes concatenating two strings together will be more efficient; it should be evaluated on a case-by-case basis.
Atwood has an interesting article related to this.
Why would the performance be any different in a web application versus a WinForms application?
Using StringBuilder is a matter of good practice because of memory and object allocation; the rules apply no matter what kind of application you are building.
If you're building the string in a loop with a high number of iterations, then it's a good idea to use StringBuilder. Otherwise, string concatenation is your best bet.
FIRSTLY, are you still writing this application? If yes, then STOP performance tuning!
SECONDLY, prioritise correctness over speed. Readability is way more important in the long run, for obvious reasons.
THIRDLY, we don't know the exact situation and code you are writing, so we can't really tell you whether this micro-optimisation matters to the performance of your code or not. MEASURE the difference. I highly recommend Red Gate's ANTS Profiler.