Klocwork reports issues with string concatenation in a loop - C#

String concatenation in a loop: I get this bug when I test my project with Klocwork. I am concatenating many strings in a loop. Is that a big mistake?
for(int j=0;j<nAttr;j++)
{
    builder = new StringBuilder();
    size=rnd.Next(1,10);
    for(int k=0; k<size; k++)
    {
        ch = Convert.ToChar(Convert.ToInt32(26 * rnd.NextDouble() + 65)) ;
        if(ch=='[' || ch==']')
            j--;
        else
            builder.Append(ch);
    }
    strXml+=" "+builder.ToString();//here the bug arises
    strXml+="="+"\"";
I also got this warning when I tested:
Class/struct data member is hidden by a local variable
What does it mean?
private void TraverseValues(XmlNode n, ArrayList arr)
{
    if (n.HasChildNodes)
    {
        for (int i = 0; i < n.ChildNodes.Count; i++)
        {
            if (n.ChildNodes[i].Name == "#text")
                arr.Add(n.ChildNodes[i].InnerText);
            else
                TraverseValues(n.ChildNodes[i], arr); //here the warning arises
        }
    }
}
I completed my project and it works fine. Then I tested it with Klocwork, which reported these bugs, but I don't understand why it says my code has critical errors.

Klocwork makes a static analysis tool.
I think Arunachalam is calling a static analysis warning a "bug".
If strXml is a string, static analysis may well flag strXml+=" "+builder.ToString(); as bad code (a "bug"), because each += inside the loop creates a new string and copies everything accumulated so far.
From the code posted, it is definitely not a bug in the core .NET library.

It sounds like you might be defining a variable in your loop that has the same name as one in the class?
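To illustrate what the "data member is hidden by a local variable" warning is about, here is a minimal sketch with a hypothetical class `Counter`: a local variable declared with the same name as a field takes precedence inside the method, so the field is silently not used. Renaming the local, or writing `this.` when the field is meant, removes the ambiguity.

```csharp
using System;

public class Counter
{
    // Class data member (field).
    private int count = 10;

    public int AddLocal(int amount)
    {
        // This local hides the field: inside this method, "count"
        // refers to the local variable, not to this.count.
        int count = amount;
        count += 1;
        return count;
    }

    public int AddToField(int amount)
    {
        // "this." makes it explicit that the field is meant.
        this.count += amount;
        return this.count;
    }
}
```

With a fresh `Counter`, `AddLocal(5)` returns 6 and leaves the field at 10, so a following `AddToField(5)` returns 15.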

What kind of bug do you expect there to be? Could you post more of your code, so that we can see what should come out of this operation?
I just copied your code into a new project and cleaned it up a little, and it seems to work as expected.
StringBuilder builder;
int nAttr = 5;
int size;
char ch;
Random rnd = new Random();
for(int j=0;j<nAttr;j++)
{
    builder = new StringBuilder();
    builder.Append(" ");
    size=rnd.Next(1,10);
    for(int k=0; k<size; k++)
    {
        ch = Convert.ToChar(Convert.ToInt32(26 * rnd.NextDouble() + 65)) ;
        if(ch=='[' || ch==']')
            j--;
        else
            builder.Append(ch);
    }
    builder.Append("=\"");
    Console.WriteLine(builder.ToString());
}
One thing I don't understand is why you use a StringBuilder, then concatenate its result with a literal into a new string, and then concatenate that again. You should use your StringBuilder to build the whole string, then export the result.

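You might want to have
if (ch=='[' || ch==']')
    k--;
instead of j--, so that only the current character is retried rather than the outer attribute loop being disturbed.
But even this barely makes sense: 26 * rnd.NextDouble() + 65 lies in [65, 91), and Convert.ToInt32 rounds to the nearest integer, so you only get 'A' to 'Z' plus, occasionally, '[' (91). ']' (93) can never occur.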
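A simpler way to get a random upper-case letter is to draw the code point directly with `Random.Next(minValue, maxValue)`, whose upper bound is exclusive (hence `'Z' + 1`), so no character ever has to be rejected and no loop counter needs adjusting. A minimal sketch; the helper name `RandomLetter` is hypothetical.

```csharp
using System;

public static class RandomLetters
{
    // Returns a uniformly distributed letter in 'A'..'Z'.
    // Random.Next's upper bound is exclusive, hence 'Z' + 1.
    public static char RandomLetter(Random rnd)
    {
        return (char)rnd.Next('A', 'Z' + 1);
    }
}
```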

If I understand it correctly, it sounds like your tool is trying to tell you to only update strXml once:
builder = new StringBuilder();
for(int j=0;j<nAttr;j++)
{
    builder.Append(" ");
    size=rnd.Next(1,10);
    for(int k=0; k<size; k++)
    {
        ch = Convert.ToChar(Convert.ToInt32(26 * rnd.NextDouble() + 65)) ;
        if(ch=='[' || ch==']')
            j--;
        else
            builder.Append(ch);
    }
    builder.Append("=\"");
    //...
}
strXml += builder.ToString();
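Putting these suggestions together, here is a self-contained sketch that builds all the attributes in one StringBuilder and creates a string only once at the end. The method name `BuildAttributes` and the placeholder attribute value `"v"` are assumptions, since the original snippet is cut off before the value is written.

```csharp
using System;
using System.Text;

public static class AttributeBuilder
{
    // Builds nAttr random attributes such as  ABC="v"  in a single
    // StringBuilder. The value "v" is a placeholder.
    public static string BuildAttributes(int nAttr, Random rnd)
    {
        var builder = new StringBuilder();
        for (int j = 0; j < nAttr; j++)
        {
            builder.Append(' ');
            int size = rnd.Next(1, 10);          // name length 1..9
            for (int k = 0; k < size; k++)
            {
                // Draw 'A'..'Z' directly, so no retry (no j-- or k--) is needed.
                builder.Append((char)rnd.Next('A', 'Z' + 1));
            }
            builder.Append("=\"v\"");
        }
        // Only one string is created, at the very end.
        return builder.ToString();
    }
}
```

The caller would then do a single `strXml += AttributeBuilder.BuildAttributes(nAttr, rnd);` outside any loop.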

Related

C# Extension method slower than chained Replace unless in tight loop. Why?

I have an extension method to remove certain characters from a string (a phone number), and it performs much slower than I think it should compared with chained Replace calls. The weird bit is that in a loop it overtakes the Replace version once the loop runs for around 3000 iterations, and after that it's faster; below that, chaining Replace is faster. It's as if my code has a fixed overhead that Replace doesn't have. What could this be?
For a quick comparison: when testing only 10 numbers, mine takes about 0.3ms while Replace takes only 0.01ms, a massive difference! But with 5 million, mine takes around 1700ms while Replace takes about 2500ms.
Phone numbers will only contain 0-9, +, -, ( and ).
Here's the relevant code.
Building the test cases (I vary testNums):
int testNums = 5_000_000;
Console.WriteLine("Building " + testNums + " tests");
Random rand = new Random();
string[] tests = new string[testNums];
char[] letters =
{
    '0','1','2','3','4','5','6','7','8','9',
    '+','-','(',')'
};
for(int t = 0; t < tests.Length; t++)
{
    int length = rand.Next(5, 20);
    char[] word = new char[length];
    for(int c = 0; c < word.Length; c++)
    {
        word[c] = letters[rand.Next(letters.Length)];
    }
    tests[t] = new string(word);
}
Console.WriteLine("Tests built");
string[] stripped = new string[tests.Length];
Using my extension method:
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
    stripped[i] = tests[i].CleanNumberString();
}
stopwatch.Stop();
Console.WriteLine("Clean: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Using chained Replace:
stripped = new string[tests.Length];
stopwatch = Stopwatch.StartNew();
for (int i = 0; i < stripped.Length; i++)
{
    stripped[i] = tests[i].Replace(" ", string.Empty)
        .Replace("-", string.Empty)
        .Replace("(", string.Empty)
        .Replace(")", string.Empty)
        .Replace("+", string.Empty);
}
stopwatch.Stop();
Console.WriteLine("Replace: " + stopwatch.Elapsed.TotalMilliseconds + "ms");
Extension method in question:
public static string CleanNumberString(this string s)
{
    Span<char> letters = stackalloc char[s.Length];
    int count = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] >= '0' && s[i] <= '9')
            letters[count++] = s[i];
    }
    return new string(letters.Slice(0, count));
}
What I've tried:
I've run them the other way around. It makes a tiny difference, but not enough.
Making it a normal static method, which was significantly slower than the extension method. Passing the string as a ref parameter was slightly slower, and as an in parameter it was about the same as the extension method.
Aggressive inlining. It doesn't make any real difference. I'm in release mode, so I suspect the JIT compiler inlines it anyway.
I have also looked at memory allocations, and they're as I expect. Mine allocates only one string per iteration on the managed heap (the new string at the end), whereas Replace allocates a new string for each call. So the Replace version uses much more memory, but it's still faster!
Is it calling native code and doing something crafty there? Is the higher memory usage triggering the GC and slowing it down? (That still doesn't explain the very fast time for only one or two iterations.)
Any ideas?
(Yes, I know not to bother optimising things like this; it's just bugging me because I don't know why it's happening.)
After running some benchmarks, I think I can safely say that your initial statement is wrong, for the exact reason you mentioned in your deleted answer: the method's loading (JIT) time is the only thing that misled you.
Here's the full benchmark on a simplified version of the problem:
static void Main(string[] args)
{
    // Build string of n consecutive "ab"
    int n = 1000;
    Console.WriteLine("N: " + n);
    char[] c = new char[n];
    for (int i = 0; i < n; i+=2)
        c[i] = 'a';
    for (int i = 1; i < n; i += 2)
        c[i] = 'b';
    string s = new string(c);
    Stopwatch stopwatch;
    // Make sure everything is loaded
    s.CleanNumberString();
    s.Replace("a", "");
    s.UnsafeRemove();
    // Tests to remove all 'a' from the string
    // Unsafe remove
    stopwatch = Stopwatch.StartNew();
    string a1 = s.UnsafeRemove();
    stopwatch.Stop();
    Console.WriteLine("Unsafe remove:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
    // Extension method
    stopwatch = Stopwatch.StartNew();
    string a2 = s.CleanNumberString();
    stopwatch.Stop();
    Console.WriteLine("Clean method:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
    // String replace
    stopwatch = Stopwatch.StartNew();
    string a3 = s.Replace("a", "");
    stopwatch.Stop();
    Console.WriteLine("String.Replace:\t" + stopwatch.Elapsed.TotalMilliseconds + "ms");
    // Make sure the returned strings are identical
    Console.WriteLine(a1.Equals(a2) && a2.Equals(a3));
    Console.ReadKey();
}
public static string CleanNumberString(this string s)
{
    char[] letters = new char[s.Length];
    int count = 0;
    for (int i = 0; i < s.Length; i++)
        if (s[i] == 'b')
            letters[count++] = 'b';
    return new string(letters.SubArray(0, count));
}
public static T[] SubArray<T>(this T[] data, int index, int length)
{
    T[] result = new T[length];
    Array.Copy(data, index, result, 0, length);
    return result;
}
// Taken from https://stackoverflow.com/a/2183442/6923568
public static unsafe string UnsafeRemove(this string s)
{
    int len = s.Length;
    char* newChars = stackalloc char[len];
    char* currentChar = newChars;
    for (int i = 0; i < len; ++i)
    {
        char c = s[i];
        switch (c)
        {
            case 'a':
                continue;
            default:
                *currentChar++ = c;
                break;
        }
    }
    return new string(newChars, 0, (int)(currentChar - newChars));
}
When run with different values of n, it is clear that your extension method (or at least my roughly equivalent version of it) is faster than String.Replace(). In fact, it performs better on both small and large strings:
N: 100
Unsafe remove: 0,0024ms
Clean method: 0,0015ms
String.Replace: 0,0021ms
True
N: 100000
Unsafe remove: 0,3889ms
Clean method: 0,5308ms
String.Replace: 1,3993ms
True
I strongly suspect that the optimizations String.Replace() has for replacing strings (as opposed to removing characters) are the culprit here. I also added a method from this answer to get another comparison for character removal. Its timings behave similarly to your method's, but it gets faster at higher values of n (80k+ in my tests).
With all that said, since your question is based on an assumption we found to be false, if you want an explanation of why the opposite holds (i.e. "Why is String.Replace() slower than my method?"), plenty of in-depth benchmarks of string manipulation already cover that.
I ran the clean method a couple more times. Interestingly, it is a lot faster than Replace; only the first run was slower. Sorry that I can't explain why it's slower the first time, but when I ran the method more times, the results were as expected.
Building 100 tests
Tests built
Replace: 0.0528ms
Clean: 0.4526ms
Clean: 0.0413ms
Clean: 0.0294ms
Replace: 0.0679ms
Replace: 0.0523ms
(Using .NET Core 2.1.)
So, with help from daehee Kim and Mat below, I've found that the slowdown happens only on the first run, and it shows up across the whole first loop. Every loop after that is fine.
I use the following line to force the JIT to do its thing and initialise this method:
RuntimeHelpers.PrepareMethod(typeof(CleanExtension).GetMethod("CleanNumberString", BindingFlags.Public | BindingFlags.Static).MethodHandle);
I find the JIT usually takes about 2-3ms to do its thing here (including about 0.1ms of Reflection time). Note that you probably shouldn't do this in production code, because you now pay the Reflection cost as well and the JIT would be invoked right afterwards anyway, but it's a good idea in benchmarks so the comparison is fair.
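A lighter-weight way to keep JIT time out of a hand-rolled measurement is to call each candidate once, untimed, before starting the stopwatch. A minimal sketch, assuming a hypothetical `Measure` helper; for serious measurements, a dedicated library such as BenchmarkDotNet takes care of warm-up and statistics.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Calls action once untimed (so JIT compilation happens there),
    // then times `iterations` further calls and returns the average in ms.
    public static double Measure(Action action, int iterations)
    {
        action(); // warm-up call

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, `MiniBench.Measure(() => "(01) 234-5".CleanNumberString(), 10000)` and the same call with the chained Replace would then compare steady-state cost only.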
The more you know!
My benchmark for a loop of 5000 iterations, repeated 5000 times with random strings and averaged is:
Clean: 0.41078ms
Replace: 1.4974ms

Efficient Replace Characters in a string from one array for another

The specific problem I have is that I have to replace the numbers in chemical formulae with the equivalent Unicode subscripts, so H2SO4 => H₂SO₄. (Those subscripts are not font adjustments, they are special unicode characters.)
So my initial cut was:
return unit.Replace("2", "₂")
    .Replace("3", "₃")
    .Replace("4", "₄")
    .Replace("5", "₅")
    .Replace("6", "₆")
    .Replace("7", "₇");
Which works, but obviously isn't particularly efficient. Any suggestions for a more optimal algorithm?
There are only 10 possible subscript characters that need replacement and most chemical formulas are not too long. For this reason, I think your implementation is not horribly inefficient and I would suggest benchmarking your code before trying to optimize it.
But here's my attempt to create a method that does what you need:
public string ToSubscriptFormula(string input)
{
    var characters = input.ToCharArray();
    for (var i = 0; i < characters.Length; i++)
    {
        switch (characters[i])
        {
            case '2':
                characters[i] = '₂';
                break;
            case '3':
                characters[i] = '₃';
                break;
            // case statements omitted
        }
    }
    return new string(characters);
}
I would recommend avoiding StringBuilder unless you're appending a large number of strings, as the overhead of creating an instance would actually make your code less efficient. See this post by Jon Skeet for a detailed explanation of when it should be used.
Also, given the limited number of case statements, I personally don't think using a Dictionary<char,char> would add any readability or performance benefit, but under different scenarios it might be useful to consider using one.
But if you really had to super-optimize your method, you could replace the switch statement with the following code (thanks to andrew for the suggestion):
public string ToSubscriptFormula(string input)
{
    var characters = input.ToCharArray();
    const int distance = '₀' - '0'; // distance of subscript from digit
    for (var i = 0; i < characters.Length; i++)
    {
        if(char.IsDigit(characters[i]))
        {
            characters[i] = (char) (characters[i] + distance);
        }
    }
    return new string(characters);
}
The trick here is that the subscript digits ₀ to ₉ are consecutive code points (U+2080 to U+2089), and that casting an int to char gives you the corresponding character.
Finally, as @nwellnhof pointed out in the comments, char.IsDigit() also returns true for some non-Latin digit characters in the Unicode Nd category.
If your chemical formulas might contain such characters, the check should be replaced with c >= '0' && c <= '9'. This will probably be slightly faster than char.IsDigit, but I doubt it makes a difference in most practical scenarios.
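To see the difference concretely: U+0663 (ARABIC-INDIC DIGIT THREE) is in the Unicode Nd category, so `char.IsDigit` accepts it, whereas the ASCII range check rejects it. A small sketch; the helper name `IsAsciiDigit` is hypothetical.

```csharp
public static class DigitChecks
{
    // True only for the ASCII digits '0'..'9'.
    public static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

With this, `char.IsDigit('\u0663')` is true while `DigitChecks.IsAsciiDigit('\u0663')` is false, so only the range check is safe to combine with the fixed `'₀' - '0'` offset.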
I would be tempted to do something like this:
public string replace(string input)
{
    StringBuilder sb = new StringBuilder();
    Dictionary<char, char> map = new Dictionary<char, char>();
    map.Add('2', '₂');
    map.Add('3', '₃');
    map.Add('4', '₄');
    map.Add('5', '₅');
    map.Add('6', '₆');
    map.Add('7', '₇');
    char tmp;
    foreach(char c in input)
    {
        if (map.TryGetValue(c, out tmp))
            sb.Append(tmp);
        else
            sb.Append(c);
    }
    return sb.ToString();
}
The Dictionary is defined inside the method here for simplicity, but it should really be defined once elsewhere in scope (e.g. as a static field) so it isn't rebuilt on every call.
So, very simply, iterate the input string only once. For every character, find the matching Dictionary entry if it exists, and append either that or the original character to a StringBuilder in order to avoid creating multiple string objects.
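Following the note about scope, here is a sketch with the map held in a static readonly field so it is built only once. The class and field names are hypothetical; unlike the answer's map it covers all ten digits (so formulas like C10H8 work), and the StringBuilder is pre-sized to the input length because each input character produces exactly one output character.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Subscripts
{
    // Built once and shared by all calls.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['0'] = '₀', ['1'] = '₁', ['2'] = '₂', ['3'] = '₃', ['4'] = '₄',
        ['5'] = '₅', ['6'] = '₆', ['7'] = '₇', ['8'] = '₈', ['9'] = '₉',
    };

    public static string Replace(string input)
    {
        // One output character per input character.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(Map.TryGetValue(c, out char sub) ? sub : c);
        }
        return sb.ToString();
    }
}
```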
My first thought was: what about formulae with balancing coefficients (leading numbers)?
E.g. 2H₂(g) + O₂(g) → 2H₂O(g)
Presumably you don't want this to replace the leading numbers?
Also, I'm not sure why it is mentioned above that only 8 digits (or even only 6 digits) need replacing. Aren't all digits (0-9) required? Sure, you don't have 0 and 1 on their own, but you need them for, e.g., 10.
Anyway, notwithstanding the above (which I didn't attempt to implement since it wasn't the question), avoiding StringBuilder and operating on a char array seemed to make sense, and I preferred to avoid a large switch statement.
public class Program
{
    public static void Main()
    {
        Console.WriteLine(SubscriptNums("C6H12O6"));
    }
    public static string SubscriptNums(string input)
    {
        char[] replacementChars = { '₀', '₁', '₂', '₃', '₄', '₅', '₆', '₇', '₈', '₉' };
        int zeroCharIndex = (int)'0';
        char[] inputCharArray = input.ToCharArray();
        for(int i = 0; i < inputCharArray.Length; i++)
        {
            if (inputCharArray[i] >= '0' && inputCharArray[i] <= '9')
            {
                inputCharArray[i] = replacementChars[(int)inputCharArray[i] - zeroCharIndex];
            }
        }
        return new string(inputCharArray);
    }
}
Edit 1 - removed magic number for numeric value of '0'.
Edit 2 - removed use of IsDigit.
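The balancing-coefficient case raised above is left unimplemented in that answer. One possible rule, offered only as a sketch under that assumption: subscript a digit only when it directly follows a letter, a closing parenthesis, or a digit that was itself subscripted, so coefficients at the start of a term stay full-size. The method name `SubscriptFormula` is hypothetical, and charges, hydrates and other notation are not handled.

```csharp
public static class Formula
{
    public static string SubscriptFormula(string input)
    {
        const int distance = '₀' - '0';   // offset from '0' to '₀'
        char[] chars = input.ToCharArray();
        bool inSubscript = false;          // was the previous char subscripted?

        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            bool isDigit = c >= '0' && c <= '9';
            bool afterElementOrGroup = i > 0 &&
                (char.IsLetter(input[i - 1]) || input[i - 1] == ')');

            if (isDigit && (afterElementOrGroup || inSubscript))
            {
                chars[i] = (char)(c + distance);
                inSubscript = true;
            }
            else
            {
                inSubscript = false;
            }
        }
        return new string(chars);
    }
}
```

With this rule, "2H2(g) + O2(g) → 2H2O(g)" becomes "2H₂(g) + O₂(g) → 2H₂O(g)", and "C6H12O6" becomes "C₆H₁₂O₆".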
You could iterate over the string and check each char. If it needs replacing, append the corresponding character to the StringBuilder; if not, just append the original character. This way you iterate over the string only once, rather than once per replacement. Furthermore, since strings are immutable, each call to String.Replace() creates a new copy of the string, which becomes garbage as soon as the next call runs.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < unit.Length; i++) {
    switch(unit[i]) {
        case '2': sb.Append('₂'); break;
        case '3': sb.Append('₃'); break;
        ...
        default: sb.Append(unit[i]); break;
    }
}
output = sb.ToString();
You could also introduce a replacement dictionary, as Abdullah Nehir suggested:
StringBuilder sb = new StringBuilder();
Dictionary<char, char> replacements = new Dictionary<char, char>();
//put in the pairs
for (int i = 0; i < unit.Length; i++) {
    if (replacements.ContainsKey(unit[i]))
        sb.Append(replacements[unit[i]]);
    else
        sb.Append(unit[i]);
}
Instead of accessing the characters by index, you can also iterate over the string with a foreach loop:
foreach (char c in unit) {
    if (replacements.ContainsKey(c))
        sb.Append(replacements[c]);
    else
        sb.Append(c);
}
If you were looking for some elegant code where you don't have to type string.Replace for each character, then this would help you:
public static string Replace(string input)
{
    char[] inputCharArr = input.ToCharArray();
    StringBuilder sb = new StringBuilder();
    foreach (var c in inputCharArr)
    {
        int intC = (int)c;
        //If the character is a digit ([0-9] are [48-57] in Unicode),
        //replace the old char with the new char
        //(adding 8272 to the code of [0-9] gives the matching subscript)
        if (intC > 47 && intC < 58)
            sb.Append((char)(intC + 8272));
        else sb.Append(c);
    }
    return sb.ToString();
}

C# Reverse() function not working properly

I'm really confused about why the Reverse function isn't working properly.
I currently have:
List<string> decimalVector = new List<string>();
string tempString = "10";
//For Vector Representation
for (int i = 0; i < tempString.Length; i++)
{
    //As long as we aren't at the last digit...
    if (i != (tempString.Length-1))
    {
        decimalVector.Add(tempString[i].ToString() + ",");
    }
    else
    {
        decimalVector.Add(tempString[i].ToString());
    }
}
Console.Write("Decimal: " + decimalOutput);
Console.Write(" Vector Representation: [");
decimalVector.Reverse();
for (int i = 0; i < decimalVector.Count; i++)
{
    Console.Write(decimalVector[i]);
}
Console.Write("]");
For some reason, instead of outputting [0,1] as it should (since that is the reverse of what is currently in decimalVector, [1,0]), it prints [01,]. I'm so confused. Why is it moving my comma out of place? Am I doing something really stupid and not seeing it?
You're reversing the order of the elements, not the order of the characters. The list holds "1," followed by "0". When reversed, it's "0" followed by "1,". When you print that, you get 01,.
You should not include the separating , as part of the list elements, but rather only add it when printing.
By the way, there is the string.Join method, which solves your problem elegantly (it needs using System.Linq for Select and Reverse):
string.Join(",", tempString.Select(c => c.ToString()).Reverse())
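Applied to the question's input, a minimal self-contained sketch (the method name `ToVectorString` is hypothetical): the digits are reversed as separate elements, and string.Join puts the commas only between them.

```csharp
using System.Linq;

public static class VectorFormat
{
    // "10" -> "[0,1]": reverse the digits, then let string.Join
    // insert commas between elements only.
    public static string ToVectorString(string digits)
    {
        return "[" + string.Join(",", digits.Select(c => c.ToString()).Reverse()) + "]";
    }
}
```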
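Try this (List<T>.Reverse() reverses the list in place and returns void, so it can't be used in a foreach; call the LINQ Enumerable.Reverse explicitly, which needs using System.Linq):
foreach (string s in Enumerable.Reverse(decimalVector))
{
    Console.Write(s);
}
Note that this still writes the elements with their embedded commas, so the separator has to be moved out of the list elements as well.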

Replace strings in C#

This might be a very basic question. I need to write code that works like the string replace algorithm.
static string stringReplace(string s, string stringOld, string stringNew)
{
    string newWord = "";
    int oldMax = stringOld.Length;
    int index = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (index != oldMax && s[i] == stringOld[index])
        {
            if (stringOld[index] < stringNew[index])
            {
                newWord = newWord + stringNew[index];
                index++;
            }
            else
            {
                newWord = newWord + stringNew[index];
            }
        }
        else
        {
            newWord = newWord + s[i];
        }
    }
    return newWord;
}
Since it's 3am, the code above is probably buggy. When the new word is shorter than the old one, it goes wrong, and likewise when it's longer. When the index variable is the same for both stringOld and stringNew, it does the swap, I think... Please don't post "use string.Replace()"; I have to write the algorithm myself.
I don't know what you're trying to do with your code, but the problem is not a small one.
Think logically about what you are trying to do.
It is a two-step process:
Find the starting index of stringOld in s.
If found, replace stringOld with stringNew.
Step 1:
There are many rather complex (and elegant) efficient string search algorithms; you can search for them online or look at the popular 'Introduction to Algorithms' by Cormen, Leiserson, Rivest & Stein. The naive approach, however, involves two nested loops and is pretty simple. It is also described in that book (and online).
Step 2:
If a match is found at index i, simply copy characters 0 to i-1 of s to newWord, followed by stringNew, and then the rest of the characters in s starting at index i + stringOld.Length.
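The two steps can be sketched with the naive search (compare stringOld at each position) and extended to replace every non-overlapping occurrence. This is one straightforward way to do it, not the algorithm String.Replace uses internally; the method signature mirrors the question's stringReplace, and rejecting an empty stringOld is an assumption.

```csharp
using System;
using System.Text;

public static class NaiveReplace
{
    public static string StringReplace(string s, string stringOld, string stringNew)
    {
        if (string.IsNullOrEmpty(stringOld))
            throw new ArgumentException("stringOld must be non-empty", nameof(stringOld));

        var result = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            // Step 1: does stringOld start at position i? (naive comparison)
            bool match = i + stringOld.Length <= s.Length;
            for (int k = 0; match && k < stringOld.Length; k++)
            {
                if (s[i + k] != stringOld[k])
                    match = false;
            }

            if (match)
            {
                // Step 2: emit stringNew and skip past the matched text.
                result.Append(stringNew);
                i += stringOld.Length;
            }
            else
            {
                result.Append(s[i]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

Because the output is built separately from the input, it does not matter whether stringNew is shorter or longer than stringOld, which is where the question's version went wrong.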

Use of unassigned local variable (string) error

I get this error at the text box assignment, but I don't understand why. Can anyone give me some advice on what to do?
private void button2_Click(object sender, EventArgs e)
{
    int n = Convert.ToInt32(textBox16.Text);
    int t = Convert.ToInt32(textBox17.Text);
    matrix.CalculeazaQR(n, t);
    string temp;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            temp = matrix.q[i, j].ToString("0.00");
            if (j % (n - 1) == 0)
                temp += "\n";
            temp += ",";
        }
    }
    textBox3.Text = temp;
}
You are assigning temp inside the for loop, and the compiler can't determine whether the loop body will ever run. You can initialize temp at the top, like:
string temp = string.Empty;
Statements inside the loop only execute if the loop condition is true, and since the compiler can't determine at compile time whether it will be, it considers temp possibly unassigned, hence the error.
Initialize it like this:
string temp="";
You have to assign a value to a string (or any local variable), even just an empty one, before you read it.
"can anyone give me an advice on what to do?"
Well, initialize temp:
string temp = string.Empty;
The compiler has no way of knowing if temp (which is used on textBox3.Text = temp;) has a value after the loops (for instance, when n < 1).
For one thing, your loop is broken to start with - only the very last iteration will matter (i.e. when both i and j are n - 1) as you're replacing the value of temp completely.
But the compiler doesn't know that n is positive - it doesn't know that you'll ever get into the loop. In general, the compiler will never assume that you enter the body of an if statement, a for statement, a while statement or a foreach loop - so any assignments made within those bodies don't affect whether a local variable is definitely assigned or not at the end of the statement... and a local variable has to be definitely assigned before you can read from it (as you're doing at the end of the method).
I suspect you actually want a StringBuilder which you append to within the loop:
StringBuilder builder = new StringBuilder();
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
    {
        builder.AppendFormat("{0:0.00},", matrix.q[i, j]);
    }
    builder.Append("\n");
}
textBox3.Text = builder.ToString();
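A variation on that idea, as a sketch: it takes a plain `double[,]` standing in for `matrix.q` (an assumption, since the matrix type isn't shown) and uses string.Join per row so no trailing comma is left on each line. It uses `Environment.NewLine` because a multiline WinForms TextBox generally expects "\r\n" rather than a bare "\n", and the invariant culture so the decimal separator is always a dot; both are choices you may want to change.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class MatrixText
{
    // Formats each row as "a,b,c" with two decimals, one row per line.
    public static string Format(double[,] q)
    {
        int rows = q.GetLength(0);
        int cols = q.GetLength(1);
        var builder = new StringBuilder();
        for (int i = 0; i < rows; i++)
        {
            // string.Join enumerates the cells immediately, within this iteration.
            var cells = Enumerable.Range(0, cols)
                .Select(j => q[i, j].ToString("0.00", CultureInfo.InvariantCulture));
            builder.Append(string.Join(",", cells));
            builder.Append(Environment.NewLine);
        }
        return builder.ToString();
    }
}
```

Usage would then be `textBox3.Text = MatrixText.Format(matrix.q);`, assuming `q` is a `double[,]`.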
