I have an equation string, and when I split it with my pattern I get the following string array:
string[] equationList = {"code1","+","code2","-","code3"};
Then from this I create a list which only contains the codes.
List<string> codeList = new List<string> {"code1","code2","code3"};
The existing code then loops through codeList, retrieves the value of each code and replaces the matching entry in equationList with it, using the code below.
foreach (var code in codeList ){
var codeVal = GetCodeValue(code);
for (var i = 0; i < equationList.Length; i++){
if (!equationList[i].Equals(code,StringComparison.InvariantCultureIgnoreCase)) continue;
equationList[i] = codeVal;
break;
}
}
I am trying to improve the efficiency, and I believe I can get rid of the for loop inside the foreach by using LINQ.
My question is: would that be any better in terms of speeding up the process?
If yes, can you please help with the LINQ statement?
Before jumping to LINQ (which doesn't solve any of the problems you've described), let's look at the logic you have here.
We split a string with a 'pattern'. How?
We then create a new list of codes. How?
We then loop through those codes and decode them. How?
But since we forgot to keep track of where those codes came from, we now loop through equationList (which is an array, not a List<T>) to substitute the results.
Seems a little convoluted to me.
Maybe a simpler solution would be:
Take in a string, and return an IEnumerable<string> of words (similar to what you do now).
Take in an IEnumerable<string> of words, and return an IEnumerable<?> of values.
That is to say with this second step iterate over the strings, and simply return the value you want to return - rather than trying to extract certain values out, parsing them, and then inserting them back into a collection.
// Ideally we return something more specific, e.g. IEnumerable<Token>
// IsOperator, ToOperator, IsCode and ToCode stand for your own helper methods
public IEnumerable<string> ParseEquation(IEnumerable<string> words)
{
    foreach (var word in words)
    {
        if (IsOperator(word)) yield return ToOperator(word);
        else if (IsCode(word)) yield return ToCode(word);
        else throw new FormatException("Unexpected token: " + word);
    }
}
This is essentially what LINQ's Select does; if one insisted on LINQ, I would suggest writing something like this:
var tokens = equationList.Select(ToToken);
...
public Token ToToken(string word)
{
    if (IsOperator(word)) return ToOperator(word);
    else if (IsCode(word)) return ToCode(word);
    else throw new FormatException("Unexpected token: " + word);
}
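For illustration, here is a minimal self-contained sketch of the same mapping with Select. It assumes the operators are exactly +, -, * and /, and it replaces the question's GetCodeValue with a hypothetical dictionary lookup (the real lookup isn't shown). Each word is visited once, so there is no inner search loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EquationSketch
{
    // Hypothetical code values standing in for the question's GetCodeValue lookup.
    private static readonly Dictionary<string, string> Values =
        new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase)
        {
            ["code1"] = "10",
            ["code2"] = "20",
            ["code3"] = "5",
        };

    // Assumed operator set for this sketch.
    private static readonly HashSet<string> Operators = new HashSet<string> { "+", "-", "*", "/" };

    public static bool IsOperator(string word) => Operators.Contains(word);

    public static string GetCodeValue(string code) =>
        Values.TryGetValue(code, out var value)
            ? value
            : throw new KeyNotFoundException("Unknown code: " + code);

    // One pass over the words: operators pass through, everything else is resolved as a code.
    public static string[] Resolve(IEnumerable<string> words) =>
        words.Select(word => IsOperator(word) ? word : GetCodeValue(word)).ToArray();
}
```

Because each element is mapped independently, the result keeps the original order and no position bookkeeping is needed.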
If GetCodeValue(code) doesn't already do so, I suggest it could probably use some sort of caching/dictionary in its implementation, though the specifics will dictate this.
The benefits of this approach are that it is flexible (we can easily add more processing steps), simple to follow (we put in these values and get these as a result, with no mutating state) and easy to write. It also breaks the problem down into nice little chunks that each solve their own task, which will help immensely when trying to refactor, or to find niggly bugs/performance issues.
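On the caching suggestion for GetCodeValue: one possible shape is a small memoizing wrapper around whatever lookup you already have. The class name CachedLookup is hypothetical, and this sketch is not thread-safe:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: remembers the result of an expensive per-code lookup.
public sealed class CachedLookup
{
    private readonly Func<string, string> _lookup;
    private readonly Dictionary<string, string> _cache;

    public CachedLookup(Func<string, string> lookup, IEqualityComparer<string> comparer = null)
    {
        _lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
        _cache = new Dictionary<string, string>(comparer ?? StringComparer.Ordinal);
    }

    public string Get(string code)
    {
        if (!_cache.TryGetValue(code, out var value))
        {
            value = _lookup(code);   // only runs the first time a code is seen
            _cache.Add(code, value);
        }
        return value;
    }
}
```

Usage would be along the lines of `var codes = new CachedLookup(GetCodeValue);` followed by `codes.Get(word)` wherever the raw lookup was called.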
If your array always alternates code then operator, this LINQ should do what you want:
string[] equationList = { "code1", "+", "code2", "-", "code3" };
var processedList = equationList.Select((s, j) => (j % 2 == 1) ? s : GetCodeValue(s)).ToArray();
You will need to check whether it is faster.
I think the fastest solution will be this:
var codeCache = new Dictionary<string, string>();
for (var i = equationList.Length - 1; i >= 0; --i)
{
var item = equationList[i];
if (!IsCode(item)) // IsCode is a placeholder: you know which items are codes because you created the codeList
continue;
string codeVal;
if (!codeCache.TryGetValue(item, out codeVal))
{
codeVal = GetCodeValue(item);
codeCache.Add(item, codeVal);
}
equationList[i] = codeVal;
}
You don't need a codeList. If every code is unique, you can remove the codeCache.
In my project I am looping over a DataView result.
string html = string.Empty;
DataView dV = data.DefaultView;
for(int i=0;i< dV.Count;i++)
{
DataRowView rv = dV[i];
html += rv.Row["X"].ToString();
}
The number of rows in dV will always be 3 or 4.
Is it better to use the string concatenation += operator or StringBuilder in this case, and why?
I would use StringBuilder here, just because it describes what you're doing.
For a simple concatenation of 3 or 4 strings, it probably won't make any significant difference, and string concatenation may even be slightly faster - but if you're wrong and there are lots of rows, StringBuilder will start getting much more efficient, and it's always more descriptive of what you're doing.
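As a sketch, the question's loop written with StringBuilder might look like this. It assumes the column is named "X" as in the question and that the caller supplies the DataTable; the class and method names are made up for illustration:

```csharp
using System.Data;
using System.Text;

public static class HtmlBuilder
{
    // Concatenates column "X" of every row visible in the table's default view.
    public static string BuildHtml(DataTable data)
    {
        DataView dV = data.DefaultView;
        var builder = new StringBuilder();
        for (int i = 0; i < dV.Count; i++)
        {
            DataRowView rv = dV[i];
            builder.Append(rv.Row["X"]);   // Append(object) calls ToString() on the value
        }
        return builder.ToString();
    }
}
```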
Alternatively, use something like:
string html = string.Join("", dV.Cast<DataRowView>()   // needs using System.Linq;
                                .Select(rv => rv.Row["X"]));
Note that you don't have any sort of separator between the strings at the moment. Are you sure that's what you want? (Also note that your code doesn't make a lot of sense at the moment - you're not using i in the loop. Why?)
I have an article about string concatenation which goes into more detail about why it's worth using StringBuilder and when.
EDIT: For those who doubt that string concatenation can be faster, here's a test - with deliberately "nasty" data, but just to prove it's possible:
using System;
using System.Diagnostics;
using System.Text;
class Test
{
static readonly string[] Bits = {
"small string",
"string which is a bit longer",
"string which is longer again to force yet another copy with any luck"
};
static readonly int ExpectedLength = string.Join("", Bits).Length;
static void Main()
{
Time(StringBuilderTest);
Time(ConcatenateTest);
}
static void Time(Action action)
{
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
// Make sure it's JITted
action();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
{
action();
}
sw.Stop();
Console.WriteLine("{0}: {1} millis", action.Method.Name,
(long) sw.Elapsed.TotalMilliseconds);
}
static void ConcatenateTest()
{
string x = "";
foreach (string bit in Bits)
{
x += bit;
}
// Force a validation to prevent dodgy optimizations
if (x.Length != ExpectedLength)
{
throw new Exception("Eek!");
}
}
static void StringBuilderTest()
{
StringBuilder builder = new StringBuilder();
foreach (string bit in Bits)
{
builder.Append(bit);
}
string x = builder.ToString();
// Force a validation to prevent dodgy optimizations
if (x.Length != ExpectedLength)
{
throw new Exception("Eek!");
}
}
}
Results on my machine (compiled with /o+ /debug-):
StringBuilderTest: 2245 millis
ConcatenateTest: 989 millis
I've run this several times, including reversing the order of the tests, and the results are consistent.
StringBuilder is recommended, but why don't you do an analysis yourself and then decide what is best for you?
// Assumes using System; using System.Data; using System.Diagnostics; using System.Text;
// With only 3-4 rows a single pass will likely report 0 ms, so repeat each loop many times for a meaningful figure.
DataView dV = data.DefaultView;

var concatWatch = Stopwatch.StartNew();
string html = string.Empty;
for (int i = 0; i < dV.Count; i++)
{
    html += dV[i].Row["X"].ToString();
}
concatWatch.Stop();
Console.WriteLine(concatWatch.ElapsedMilliseconds);

var builderWatch = Stopwatch.StartNew();
var builder = new StringBuilder();
for (int i = 0; i < dV.Count; i++)
{
    builder.Append(dV[i].Row["X"].ToString());
}
var finalHtml = builder.ToString();
builderWatch.Stop();
Console.WriteLine(builderWatch.ElapsedMilliseconds);
From the Documentation:
The String class is preferable for a concatenation operation if a
fixed number of String objects are concatenated. In that case, the
individual concatenation operations might even be combined into a
single operation by the compiler.
A StringBuilder object is preferable for a concatenation operation if
an arbitrary number of strings are concatenated; for example, if a
loop concatenates a random number of strings of user input.
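To make the "fixed number of strings" case concrete: when the number of parts is known at compile time, the C# compiler typically turns a + b + c into a single string.Concat(a, b, c) call, which sizes the result once. The two hypothetical helpers below show that equivalence:

```csharp
public static class FixedConcat
{
    // With a fixed number of parts the compiler emits one string.Concat call,
    // so the result is allocated once at its final size.
    public static string Join3(string a, string b, string c) => a + b + c;

    // The explicit form of the same operation; null parts are treated as empty strings.
    public static string Join3Explicit(string a, string b, string c) => string.Concat(a, b, c);
}
```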
So in your case I would say String is better.
EDIT:
This is an endless discussion; anyway, I would recommend checking how many operations you have on average and testing the performance of each approach to compare the results.
Check this nice link on the issue, which includes some performance test code.
StringBuilder for sure. Strings are immutable, remember!
EDIT: For 3-4 rows, concatenation will be the preferred choice, as Jon Skeet has said in his answer.
StringBuilder is recommended. It is mutable. It should place much less stress on the memory allocator :-)
A string instance is immutable. You cannot change it after it was created.
Any operation that appears to change the string instead returns a new instance.
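A small sketch of what that means in practice: string.Replace leaves its receiver untouched and returns a new string, while StringBuilder.Replace edits the builder's own buffer and returns the same builder. The demo names are made up:

```csharp
using System.Text;

public static class ImmutabilityDemo
{
    // string.Replace never changes its receiver; it returns a new instance.
    public static (string Original, string Changed) ReplaceOnString(string s)
    {
        string changed = s.Replace("cat", "dog");
        return (s, changed);
    }

    // StringBuilder.Replace edits the builder in place and returns that same builder.
    public static (StringBuilder Builder, StringBuilder Returned) ReplaceOnBuilder(string s)
    {
        var builder = new StringBuilder(s);
        StringBuilder returned = builder.Replace("cat", "dog");
        return (builder, returned);
    }
}
```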
StringBuilder is what you are looking for. In general, if there is a function for some job, try to use it instead of writing a procedure that does much the same job.
I've written a class for processing strings and I have the following problem: the string passed in can come with spaces at the beginning and at the end of the string.
I need to trim the spaces from the strings and convert them to lower case. My code so far:
var searchStr = wordToSearchReplacemntsFor.ToLower();
searchStr = searchStr.Trim();
I couldn't find any function to help me in StringBuilder. The problem is that this class is supposed to process a lot of strings as quickly as possible, so I don't want to create 2 new strings for each string the class processes.
If this isn't possible, I'll go deeper into the processing algorithm.
Try method chaining, for example:
var s = " YoUr StRiNg".Trim().ToLower();
Cyberdrew has the right idea. With string being immutable, you'll be allocating memory during both of those calls regardless. One thing I'd like to suggest, if you're going to call string.Trim().ToLower() in many locations in your code, is to simplify your calls with extension methods. For example:
public static class MyExtensions
{
public static string TrimAndLower(this String str)
{
return str.Trim().ToLower();
}
}
Here's my attempt. But before I checked this in, I would ask two very important questions.
Are sequential "String.Trim" and "String.ToLower" calls really impacting the performance of my app? Would anyone notice if this algorithm were twice as slow or twice as fast? The only way to know is to measure the performance of my code and compare it against preset performance goals. Otherwise, micro-optimizations will generate micro-performance gains.
Just because I wrote an implementation that appears faster doesn't mean that it really is. The compiler and runtime may have optimizations around common operations that I don't know about. I should compare the running time of my code to what already exists.
static public string TrimAndLower(string str)
{
if (str == null)
{
return null;
}
int i = 0;
int j = str.Length - 1;
StringBuilder sb;
while (i < str.Length)
{
if (Char.IsWhiteSpace(str[i])) // or say "if (str[i] == ' ')" if you only care about spaces
{
i++;
}
else
{
break;
}
}
while (j > i)
{
if (Char.IsWhiteSpace(str[j])) // or say "if (str[j] == ' ')" if you only care about spaces
{
j--;
}
else
{
break;
}
}
if (i > j)
{
return "";
}
sb = new StringBuilder(j - i + 1);
while (i <= j)
{
// I was originally checking IsUpper before calling ToLower; probably not needed
sb.Append(Char.ToLower(str[i]));
i++;
}
return sb.ToString();
}
If the strings use only ASCII characters, you can look at the C# ToLower Optimization. You could also try a lookup table if you know the character set ahead of time.
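A minimal sketch of the lookup-table idea, assuming the input is pure ASCII (this is an illustration of the technique, not the code from the referenced article):

```csharp
using System;

public static class AsciiLower
{
    // 128-entry table: 'A'..'Z' map to 'a'..'z', every other ASCII char maps to itself.
    private static readonly char[] Table = BuildTable();

    private static char[] BuildTable()
    {
        var table = new char[128];
        for (int c = 0; c < 128; c++)
        {
            table[c] = (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : (char)c;
        }
        return table;
    }

    // Assumes pure ASCII input; throws on anything else rather than guessing.
    public static string ToLowerAscii(string s)
    {
        var buffer = new char[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c >= 128) throw new ArgumentException("Non-ASCII character", nameof(s));
            buffer[i] = Table[c];
        }
        return new string(buffer);
    }
}
```

Whether this beats the built-in ToLowerInvariant is something to measure rather than assume.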
So first of all, trim first and lowercase second, so that ToLower() iterates over a smaller string.
Other than that, I think your best algorithm would look like this:
Iterate over the string once, and check:
  whether there are any upper-case characters
  whether there is whitespace at the beginning and end (and count how many characters that is)
If none of the above, return the original string.
If there are upper-case characters but no whitespace, do ToLower and return.
If there is whitespace:
  allocate a new string of the right size (original length minus the number of whitespace characters)
  fill it in while doing the ToLower
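The steps above could be sketched as follows. This assumes .NET Core 2.1 or later, for string.Create, and uses invariant lowercasing. Note that the question's ToLower() is culture-sensitive, so results can differ for some cultures. The class name is hypothetical:

```csharp
using System;

public static class SinglePassTrimLower
{
    // Behaves like s.Trim().ToLowerInvariant(), but allocates at most one new string
    // and returns the original instance when nothing needs to change.
    public static string TrimAndLower(string s)
    {
        if (s == null) return null;

        // Find the first and last non-whitespace characters (Trim uses char.IsWhiteSpace too).
        int start = 0, end = s.Length - 1;
        while (start <= end && char.IsWhiteSpace(s[start])) start++;
        while (end >= start && char.IsWhiteSpace(s[end])) end--;

        // Does anything inside the trimmed range change when lowercased?
        bool needsLower = false;
        for (int i = start; i <= end; i++)
        {
            if (char.ToLowerInvariant(s[i]) != s[i]) { needsLower = true; break; }
        }

        int length = end - start + 1;
        bool needsTrim = length != s.Length;

        if (!needsTrim && !needsLower) return s;        // nothing to do: no allocation
        if (!needsTrim) return s.ToLowerInvariant();    // upper case, but no surrounding whitespace
        if (length == 0) return string.Empty;           // the string was all whitespace

        // Allocate a string of exactly the trimmed length and lowercase while filling it.
        return string.Create(length, (Source: s, Start: start), (span, state) =>
        {
            for (int k = 0; k < span.Length; k++)
            {
                span[k] = char.ToLowerInvariant(state.Source[state.Start + k]);
            }
        });
    }
}
```

Returning the original instance in the common "already clean" case is where most of the saving would come from.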
You can try this:
public static void Main (string[] args) {
var str = "fr, En, gB";
Console.WriteLine(str.Replace(" ","").ToLower());
}
I want to take and edit a string in-place in a .NET app. I know that StringBuilder allows me to do in-place appends, inserts, and replaces, but it does not allow an easy way of doing stuff like this:
while (script.IndexOf("#Unique", StringComparison.InvariantCultureIgnoreCase) != -1)
{
int Location = script.IndexOf("#Unique", StringComparison.InvariantCultureIgnoreCase);
script = script.Remove(Location, 7);
script = script.Insert(Location, Guid.NewGuid().ToString());
}
There is no IndexOf in StringBuilder. Does anyone have an efficient way to do in-place editing of textual information?
Edit #1:
Changed the sample to make it more obvious that each 'replace' needs to have a different result.
If your code really is this straightforward then why not just use one of the built-in Replace methods, either on string, StringBuilder or Regex?
EDIT FOLLOWING COMMENT...
You can replace each occurrence with a separate value by using one of the overloads of Regex.Replace that takes a MatchEvaluator argument:
// requires: using System.Text.RegularExpressions;
string foo = "blah blah #Unique blah #Unique blah blah #Unique blah";
// replace each occurrence of "#Unique" with a separate guid
string bar = Regex.Replace(foo, "#Unique",
    new MatchEvaluator(m => Guid.NewGuid().ToString()),
    RegexOptions.IgnoreCase);
How many replacements will you be doing?
If it's not four figures, then just accept the new string instances; you may be prematurely optimising...
Another solution: split on "#Unique", then rejoin with a StringBuilder, adding your separator on each iteration.
How about the StringBuilder Replace method:
StringBuilder script = new StringBuilder(originalScript); // originalScript: the text to edit
script.Replace("#Unique", GetGuidString()); // GetGuidString() runs once, so every match gets the same value
StringBuilder is made so that you can easily add to it, but at the tradeoff that it's difficult to search in it - and especially, it's more difficult (i.e. slower) to index it.
If you need to modify some characters "in-place", it's best to do it on the resulting string.
But it's difficult to know from your question what the right answer is for you; my feeling is that you shouldn't need in-place replacement in a StringBuilder, and that the problem is somewhere else or you're doing something else wrong.
User Dennis has provided an IndexOf extension method for StringBuilder. With this, you should be able to use StringBuilder in this manner.
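For illustration only, here is a naive IndexOf extension of my own (not the one referred to above), plus the question's loop rewritten to edit a StringBuilder. The names are hypothetical. Also, StringBuilder's indexer may be slow on builders made of many chunks, so measure before relying on this:

```csharp
using System;
using System.Text;

public static class StringBuilderSearch
{
    // Naive search starting at startIndex; optionally ignores case via ToUpperInvariant.
    public static int IndexOf(this StringBuilder sb, string value, int startIndex = 0, bool ignoreCase = false)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (value.Length == 0) return startIndex;
        for (int i = startIndex; i <= sb.Length - value.Length; i++)
        {
            int j = 0;
            while (j < value.Length && CharsEqual(sb[i + j], value[j], ignoreCase)) j++;
            if (j == value.Length) return i;
        }
        return -1;
    }

    private static bool CharsEqual(char a, char b, bool ignoreCase) =>
        ignoreCase ? char.ToUpperInvariant(a) == char.ToUpperInvariant(b) : a == b;

    // The question's loop, editing one builder instead of creating a new string per step.
    public static string ReplaceEachWithGuid(string script, string token)
    {
        var sb = new StringBuilder(script);
        int location = sb.IndexOf(token, 0, ignoreCase: true);
        while (location != -1)
        {
            string guid = Guid.NewGuid().ToString();
            sb.Remove(location, token.Length).Insert(location, guid);
            // continue after the inserted GUID so it is never rescanned
            location = sb.IndexOf(token, location + guid.Length, ignoreCase: true);
        }
        return sb.ToString();
    }
}
```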
Can you use a string split to do this efficiently?
Something like:
var sections = "a-#Unique-b-#Unique-c".Split(new string[] { "#Unique" }, StringSplitOptions.None);
int i;
StringBuilder builder = new StringBuilder();
for(i = 0; i < sections.Length - 1; i++)
{
builder.Append(sections[i]);
builder.Append(Guid.NewGuid().ToString());
}
builder.Append(sections[i]);
Console.WriteLine(builder.ToString());
Console.ReadKey(true);
A more complex solution, but it should be performant:
using System;
using System.Text;

public static class StringBuilderExtensions
{
    public static StringBuilder Replace(this StringBuilder sb, string toReplace, Func<string> getReplacement)
    {
        if (string.IsNullOrEmpty(toReplace)) throw new ArgumentException("Nothing to replace", nameof(toReplace));
        for (int i = 0; i < sb.Length; i++)
        {
            bool replacementFound = true;
            for (int toReplaceIndex = 0; toReplaceIndex < toReplace.Length; toReplaceIndex++)
            {
                int sbIndex = toReplaceIndex + i;
                if (sbIndex >= sb.Length)
                {
                    // not enough characters left for another match
                    return sb;
                }
                if (sb[sbIndex] != toReplace[toReplaceIndex])
                {
                    replacementFound = false;
                    break;
                }
            }
            if (replacementFound)
            {
                string replacement = getReplacement();
                // reuse the space of the toReplace string
                for (int replacementIndex = 0; replacementIndex < toReplace.Length && replacementIndex < replacement.Length; replacementIndex++)
                {
                    sb[i + replacementIndex] = replacement[replacementIndex];
                }
                // remove toReplace string remainders
                if (replacement.Length < toReplace.Length)
                {
                    sb.Remove(i + replacement.Length, toReplace.Length - replacement.Length);
                }
                // insert chars not yet inserted
                if (replacement.Length > toReplace.Length)
                {
                    sb.Insert(i + toReplace.Length, replacement.ToCharArray(toReplace.Length, replacement.Length - toReplace.Length));
                }
                // skip past the replacement so it is never rescanned
                i += replacement.Length - 1;
            }
        }
        return sb;
    }
}
Usage:
var sb = new StringBuilder(script);
script = sb.Replace("#Unique", () => Guid.NewGuid().ToString()).ToString();
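You are going to need to use an unsafe code block.
It's as simple as pinning your string with fixed, taking a pointer to it, and manipulating it in memory.
Example:
// requires compiling with unsafe code enabled; mutating a string breaks the immutability other code relies on (e.g. interned literals)
unsafe
{
    fixed (char* ip = yourString)
    {
        ip[0] = 'X'; // overwrites the first character of yourString in place
    }
}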
Suppose I have a StringBuilder in C# that does this:
StringBuilder sb = new StringBuilder();
string cat = "cat";
sb.Append("The ").Append(cat).Append(" in the hat");
string s = sb.ToString();
would that be as efficient as, or more efficient than, having:
string cat = "cat";
string s = String.Format("The {0} in the hat", cat);
If so, why?
EDIT
After some interesting answers, I realised I probably should have been a little clearer in what I was asking. I wasn't so much asking for which was quicker at concatenating a string, but which is quicker at injecting one string into another.
In both cases above I want to inject one or more strings into the middle of a predefined template string.
Sorry for the confusion
NOTE: This answer was written when .NET 2.0 was the current version. This may no longer apply to later versions.
String.Format uses a StringBuilder internally:
public static string Format(IFormatProvider provider, string format, params object[] args)
{
if ((format == null) || (args == null))
{
throw new ArgumentNullException((format == null) ? "format" : "args");
}
StringBuilder builder = new StringBuilder(format.Length + (args.Length * 8));
builder.AppendFormat(provider, format, args);
return builder.ToString();
}
The above code is a snippet from mscorlib, so the question becomes "is StringBuilder.Append() faster than StringBuilder.AppendFormat()"?
Without benchmarking I'd probably say that the code sample above would run more quickly using .Append(). But it's a guess, try benchmarking and/or profiling the two to get a proper comparison.
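A rough harness for running that comparison yourself might look like this. The names are made up, and Stopwatch timing is only indicative; a dedicated tool such as BenchmarkDotNet controls for JIT, GC and noise far better:

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class AppendVsAppendFormat
{
    public static string WithAppend(string cat) =>
        new StringBuilder().Append("The ").Append(cat).Append(" in the hat").ToString();

    public static string WithAppendFormat(string cat) =>
        new StringBuilder().AppendFormat("The {0} in the hat", cat).ToString();

    // Rough timing only: warm up once so JIT compilation is not measured.
    public static long TimeMs(Func<string, string> build, int iterations)
    {
        build("warm-up");
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            build("cat");
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, `Console.WriteLine(AppendVsAppendFormat.TimeMs(AppendVsAppendFormat.WithAppend, 1000000));` and the same for WithAppendFormat, run several times in both orders.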
This chap, Jerry Dixon, did some benchmarking:
http://jdixon.dotnetdevelopersjournal.com/string_concatenation_stringbuilder_and_stringformat.htm
Updated:
Sadly the link above has since died. However, there's still a copy on the Wayback Machine:
http://web.archive.org/web/20090417100252/http://jdixon.dotnetdevelopersjournal.com/string_concatenation_stringbuilder_and_stringformat.htm
At the end of the day it depends on whether your string formatting is going to be called repetitively, i.e. you're doing some serious text processing over hundreds of megabytes of text, or whether it's being called when a user clicks a button now and again. Unless you're doing some huge batch-processing job I'd stick with String.Format; it aids code readability. If you suspect a perf bottleneck, stick a profiler on your code and see where it really is.
From the MSDN documentation:
The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.
I ran some quick performance benchmarks, and for 100,000 operations averaged over 10 runs, the first method (String Builder) takes almost half the time of the second (String Format).
So, if this is infrequent, it doesn't matter. But if it is a common operation, then you may want to use the first method.
I would expect String.Format to be slower - it has to parse the string and then concatenate it.
Couple of notes:
Format is the way to go for user-visible strings in professional applications; this avoids localization bugs
If you know the length of the resultant string beforehand, use the StringBuilder(Int32) constructor to predefine the capacity
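A small sketch of the capacity tip, using the template from the question (the class name is hypothetical). Passing the known final length to StringBuilder(int) lets the builder size its buffer up front instead of growing it:

```csharp
using System.Text;

public static class PresizedBuilder
{
    public static string Build(string cat)
    {
        const string prefix = "The ";
        const string suffix = " in the hat";
        // The final length is known up front, so reserve exactly that capacity.
        var sb = new StringBuilder(prefix.Length + cat.Length + suffix.Length);
        sb.Append(prefix).Append(cat).Append(suffix);
        return sb.ToString();
    }
}
```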
I think in most cases like this clarity, and not efficiency, should be your biggest concern. Unless you're crushing together tons of strings, or building something for a lower powered mobile device, this probably won't make much of a dent in your run speed.
I've found that, in cases where I'm building strings in a fairly linear fashion, either doing straight concatenations or using StringBuilder is your best option. I suggest this in cases where the majority of the string that you're building is dynamic. Since very little of the text is static, the most important thing is that it's clear where each piece of dynamic text is being put, in case it needs updating in the future.
On the other hand, if you're talking about a big chunk of static text with two or three variables in it, even if it's a little less efficient, I think the clarity you gain from string.Format makes it worth it. I used this earlier this week when I had to place one bit of dynamic text in the center of a 4-page document. It'll be easier to update that big chunk of text if it's in one piece than to update three pieces that you concatenate together.
If only because string.Format doesn't exactly do what you might think, here is a rerun of the tests 6 years later on .NET 4.5.
Concat is still fastest but really it's less than 30% difference. StringBuilder and Format differ by barely 5-10%. I got variations of 20% running the tests a few times.
Milliseconds, a million iterations:
Concatenation: 367
New stringBuilder for each key: 452
Cached StringBuilder: 419
string.Format: 475
The lesson I take away is that the performance difference is trivial and so it shouldn't stop you writing the simplest readable code you can. Which for my money is often but not always a + b + c.
const int iterations=1000000;
var keyprefix= this.GetType().FullName;
var maxkeylength = keyprefix.Length + 1 + 1 + (int)Math.Log10(iterations);
Console.WriteLine("KeyPrefix \"{0}\", Max Key Length {1}",keyprefix, maxkeylength);
var concatkeys= new string[iterations];
var stringbuilderkeys= new string[iterations];
var cachedsbkeys= new string[iterations];
var formatkeys= new string[iterations];
var stopwatch= new System.Diagnostics.Stopwatch();
Console.WriteLine("Concatenation:");
stopwatch.Start();
for(int i=0; i<iterations; i++){
var key1= keyprefix+":" + i.ToString();
concatkeys[i]=key1;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
Console.WriteLine("New stringBuilder for each key:");
stopwatch.Restart();
for(int i=0; i<iterations; i++){
var key2= new StringBuilder(keyprefix).Append(":").Append(i.ToString()).ToString();
stringbuilderkeys[i]= key2;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
Console.WriteLine("Cached StringBuilder:");
var cachedSB= new StringBuilder(maxkeylength);
stopwatch.Restart();
for(int i=0; i<iterations; i++){
var key2b= cachedSB.Clear().Append(keyprefix).Append(":").Append(i.ToString()).ToString();
cachedsbkeys[i]= key2b;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
Console.WriteLine("string.Format");
stopwatch.Restart();
for(int i=0; i<iterations; i++){
var key3= string.Format("{0}:{1}", keyprefix,i.ToString());
formatkeys[i]= key3;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
var referToTheComputedValuesSoCompilerCantOptimiseTheLoopsAway= concatkeys.Union(stringbuilderkeys).Union(cachedsbkeys).Union(formatkeys).LastOrDefault(x=>x[1]=='-');
Console.WriteLine(referToTheComputedValuesSoCompilerCantOptimiseTheLoopsAway);
String.Format uses StringBuilder internally, so logically that leads to the idea that it would be a little less performant due to more overhead. However, a simple string concatenation is the fastest method of injecting one string between two others, by a significant degree. This was demonstrated by Rico Mariani in his very first Performance Quiz, years ago. The simple fact is that concatenations, when the number of string parts is known (without limitation: you could concatenate a thousand parts, as long as you know it's always 1000 parts), are always faster than StringBuilder or String.Format. They can be performed with a single memory allocation and a series of memory copies. Here is the proof.
And here is the actual code for some String.Concat methods, which ultimately call FillStringChecked, which uses pointers to copy memory (extracted via Reflector):
public static string Concat(params string[] values)
{
int totalLength = 0;
if (values == null)
{
throw new ArgumentNullException("values");
}
string[] strArray = new string[values.Length];
for (int i = 0; i < values.Length; i++)
{
string str = values[i];
strArray[i] = (str == null) ? Empty : str;
totalLength += strArray[i].Length;
if (totalLength < 0)
{
throw new OutOfMemoryException();
}
}
return ConcatArray(strArray, totalLength);
}
public static string Concat(string str0, string str1, string str2, string str3)
{
if (((str0 == null) && (str1 == null)) && ((str2 == null) && (str3 == null)))
{
return Empty;
}
if (str0 == null)
{
str0 = Empty;
}
if (str1 == null)
{
str1 = Empty;
}
if (str2 == null)
{
str2 = Empty;
}
if (str3 == null)
{
str3 = Empty;
}
int length = ((str0.Length + str1.Length) + str2.Length) + str3.Length;
string dest = FastAllocateString(length);
FillStringChecked(dest, 0, str0);
FillStringChecked(dest, str0.Length, str1);
FillStringChecked(dest, str0.Length + str1.Length, str2);
FillStringChecked(dest, (str0.Length + str1.Length) + str2.Length, str3);
return dest;
}
private static string ConcatArray(string[] values, int totalLength)
{
string dest = FastAllocateString(totalLength);
int destPos = 0;
for (int i = 0; i < values.Length; i++)
{
FillStringChecked(dest, destPos, values[i]);
destPos += values[i].Length;
}
return dest;
}
private static unsafe void FillStringChecked(string dest, int destPos, string src)
{
int length = src.Length;
if (length > (dest.Length - destPos))
{
throw new IndexOutOfRangeException();
}
fixed (char* chRef = &dest.m_firstChar)
{
fixed (char* chRef2 = &src.m_firstChar)
{
wstrcpy(chRef + destPos, chRef2, length);
}
}
}
So, then:
string what = "cat";
string inthehat = "The " + what + " in the hat!";
Enjoy!
Oh also, the fastest would be:
string cat = "cat";
string s = "The " + cat + " in the hat";
It really depends. For small strings with few concatenations, it's actually faster just to append the strings.
String s = "String A" + "String B";
But for larger strings (very, very large strings), it's more efficient to use StringBuilder.
In both cases above I want to inject one or more strings into the middle of a predefined template string.
In that case, I would suggest String.Format is the quickest, because it is designed for that exact purpose.
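To illustrate the template use case, here is a sketch with a made-up template. The static text stays in one piece, and placeholders can repeat or appear in any order. Passing CultureInfo.InvariantCulture is an assumption here, just to keep number formatting predictable:

```csharp
using System.Globalization;

public static class TemplateDemo
{
    // Hypothetical template: {0} is used twice, {1} once.
    public const string Template = "Dear {0}, your order {1} ships today. Thanks again, {0}!";

    public static string Fill(string name, int orderNumber) =>
        string.Format(CultureInfo.InvariantCulture, Template, name, orderNumber);
}
```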
It really depends on your usage pattern.
A detailed benchmark of string.Join, string.Concat and string.Format can be found here: String.Format Isn't Suitable for Intensive Logging
I would suggest not, since String.Format was not designed for concatenation; it was designed for formatting the output of various inputs, such as a date.
String s = String.Format("Today is {0:dd-MMM-yyyy}.", DateTime.Today);