Inverse String.Replace - Faster way of doing it? - c#

I have a method to replace every character except those I specify. For example,
ReplaceNot("test. stop; or, not", ".;/\\".ToCharArray(), '*');
would return
"****.*****;********" (the comma is not in the pattern, so it is replaced as well).
Now, this is not an instance of premature optimization. I call this method quite a few times during a network operation. I found that on longer strings, it is causing some latency, and removing it helped a bit. Any help to speed this up would be appreciated.
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
int old = -1;
StringBuilder sb = new StringBuilder(original.Length);
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb.Append(new string(replacement, index - old - 1));
sb.Append(original[index]);
old = index++;
}
if (original.Length - old > 1)
{
sb.Append(new string(replacement, original.Length - (old + 1)));
}
return sb.ToString();
}
Final numbers. I also added a test case for a 3K-character string, run 100K times instead of 1M, to see how well each of these scales. The only surprise was that the regular expression "scaled better" than the others, but that is no help since it is very slow to begin with:
User       | Short * 1M | Long * 100K | Scale
-----------|------------|-------------|------
John       | 319        | 2125        | 6.66
Luke       | 360        | 2659        | 7.39
Guffa      | 409        | 2827        | 6.91
Mine       | 447        | 3372        | 7.54
DirkGently | 1094       | 9134        | 8.35
Michael    | 1591       | 12785       | 8.04
Peter      | 21106      | 94386       | 4.47
Update: To be fair, I made the regular expression in Peter's version a static variable and created it with RegexOptions.Compiled:
User       | Short * 1M | Long * 100K | Scale
-----------|------------|-------------|------
Peter      | 8997       | 74715       | 8.30
Pastebin link to my testing code, please correct me if it is wrong:
http://pastebin.com/f64f260ee

Can't you use Regex.Replace like so:
Regex regex = new Regex(@"[^.;/\\]");
string s = regex.Replace("test. stop; or, not", "*");
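The update to the question mentions making the regular expression static and compiled. A minimal sketch of that arrangement might look like the following; the class and field names are made up for illustration, and the character class is hard-coded to the example pattern rather than built from a char[].

```csharp
using System.Text.RegularExpressions;

public static class RegexReplaceNotSketch
{
    // Hypothetical field name. Built once and reused, so the cost of
    // constructing and compiling the regex is not paid on every call.
    private static readonly Regex NotKept =
        new Regex(@"[^.;/\\]", RegexOptions.Compiled);

    // Replaces every character that is not '.', ';', '/' or '\' with replacement.
    public static string ReplaceNot(string original, char replacement)
    {
        string s = replacement.ToString();
        // A MatchEvaluator is used so that a replacement such as '$' is
        // never interpreted as a substitution pattern.
        return NotKept.Replace(original, _ => s);
    }
}
```

Calling RegexReplaceNotSketch.ReplaceNot("test. stop; or, not", '*') keeps only the '.' and ';' from the example string.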

Alright, on a ~60KB string, this will perform about 40% faster than your version:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
StringBuilder sb = new StringBuilder(new string(replacement, original.Length));
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb[index] = original[index++];
}
return sb.ToString();
}
The trick is to initialize a new string with all replacement characters, since most of them will be replaced.

I don't know if this will be any faster, but it avoids newing up strings just so they can be appended to the string builder, which may help:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
StringBuilder sb = new StringBuilder(original.Length);
foreach (char ch in original) {
if (Array.IndexOf( pattern, ch) >= 0) {
sb.Append( ch);
}
else {
sb.Append( replacement);
}
}
return sb.ToString();
}
If the number of chars in pattern will be of any size (which I'm guessing it generally won't), it might pay to sort it and perform an Array.BinarySearch() instead of the Array.IndexOf().
For such a simple transformation, I'd bet that it'll have no problem being faster than a regex, too.
Also, since your set of characters in pattern are likely to usually come from a string anyway (at least that's been my general experience with this type of API), why don't you have the method signature be:
public static string ReplaceNot(this string original, string pattern, char replacement)
or better yet, have an overload where pattern can be a char[] or string?
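The two suggestions above (sorting the pattern for a binary search, and offering both a string and a char[] overload) could be combined roughly as follows. This is only a sketch: the class name and the private helper are invented here, and whether the sort pays off for small patterns is something to measure rather than assume.

```csharp
using System;

public static class SortedPatternSketch
{
    // Overload taking the pattern as a string.
    public static string ReplaceNot(this string original, string pattern, char replacement)
    {
        char[] sorted = pattern.ToCharArray();
        Array.Sort(sorted);
        return ReplaceNotSorted(original, sorted, replacement);
    }

    // Overload taking the pattern as a char[]; the array is cloned so the
    // caller's copy is not reordered by the sort.
    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {
        char[] sorted = (char[])pattern.Clone();
        Array.Sort(sorted);
        return ReplaceNotSorted(original, sorted, replacement);
    }

    // Hypothetical helper: keeps characters found in sortedPattern,
    // replaces everything else.
    private static string ReplaceNotSorted(string original, char[] sortedPattern, char replacement)
    {
        char[] buffer = new char[original.Length];
        for (int i = 0; i < original.Length; i++)
        {
            char ch = original[i];
            buffer[i] = Array.BinarySearch(sortedPattern, ch) >= 0 ? ch : replacement;
        }
        return new string(buffer);
    }
}
```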

Here's another version for you. My tests suggest that its performance is pretty good.
public static string ReplaceNot(
this string original, char[] pattern, char replacement)
{
char[] buffer = new char[original.Length];
for (int i = 0; i < buffer.Length; i++)
{
bool replace = true;
for (int j = 0; j < pattern.Length; j++)
{
if (original[i] == pattern[j])
{
replace = false;
break;
}
}
buffer[i] = replace ? replacement : original[i];
}
return new string(buffer);
}

StringBuilder.Append has an overload that takes a character and a count, so you don't have to create intermediate strings to add to the StringBuilder. I get about 20% improvement by replacing this:
sb.Append(new string(replacement, index - old - 1));
with:
sb.Append(replacement, index - old - 1);
and this:
sb.Append(new string(replacement, original.Length - (old + 1)));
with:
sb.Append(replacement, original.Length - (old + 1));
(I tested the code that you said was about four times faster, and I find it about 15 times slower...)

It's going to be O(n). You seem to be replacing all letters and whitespace with *, so why not just test whether the current character is a letter or whitespace and replace it?
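A rough sketch of that idea, using char.IsLetter and char.IsWhiteSpace. Note that this is a different rule from the pattern-based ReplaceNot: every character that is neither a letter nor whitespace (digits, commas, and so on) is kept. The class and method names are invented for illustration.

```csharp
public static class LetterMaskSketch
{
    // Replaces letters and whitespace; leaves every other character untouched.
    public static string MaskLettersAndWhitespace(string original, char replacement)
    {
        char[] buffer = original.ToCharArray();
        for (int i = 0; i < buffer.Length; i++)
        {
            if (char.IsLetter(buffer[i]) || char.IsWhiteSpace(buffer[i]))
            {
                buffer[i] = replacement;
            }
        }
        return new string(buffer);
    }
}
```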

Related

Efficient Replace Characters in a string from one array for another

The specific problem I have is that I have to replace the numbers in chemical formulae with the equivalent Unicode subscripts, so H2SO4 => H₂SO₄. (Those subscripts are not font adjustments, they are special unicode characters.)
So my initial cut was:
return unit.Replace("2", "₂").
Replace("3", "₃").
Replace("4", "₄").
Replace("5", "₅").
Replace("6", "₆").
Replace("7", "₇");
Which works, but obviously isn't particularly efficient. Any suggestions for a more optimal algorithm?
There are only 10 possible subscript characters that need replacement and most chemical formulas are not too long. For this reason, I think your implementation is not horribly inefficient and I would suggest benchmarking your code before trying to optimize it.
But here's my attempt to create a method that does what you need:
public string ToSubscriptFormula(string input)
{
var characters = input.ToCharArray();
for (var i = 0; i < characters.Length; i++)
{
switch (characters[i])
{
case '2':
characters[i] = '₂';
break;
case '3':
characters[i] = '₃';
break;
// case statements omitted
}
}
return new string(characters);
}
I would recommend avoiding the use of StringBuilder unless you're appending a large amount of strings, as the overhead of creating an instance would actually make your code less efficient. See this post by Jon Skeet for a detailed explanation of when it should be used.
Also, given the limited number of case statements, I personally don't think using a Dictionary<char,char> would add any readability or performance benefit, but under different scenarios it might be useful to consider using one.
But if you really had to super-optimize your method, you could replace the case statement with the following code (thanks to andrew for the suggestion):
public string ToSubscriptFormula(string input)
{
var characters = input.ToCharArray();
const int distance = '₀' - '0'; // distance of subscript from digit
for (var i = 0; i < characters.Length; i++)
{
if(char.IsDigit(characters[i]))
{
characters[i] = (char) (characters[i] + distance);
}
}
return new string(characters);
}
The trick here is that all subscript characters are successive and that casting an int to char will give you the corresponding character.
Finally, as @nwellnhof has suggested in the comments, char.IsDigit() would return true for some non-Latin digit characters in the Unicode Nd category.
If your chemical formula contains such characters, the statement should be replaced with c >= '0' && c<='9'. This will probably be slightly faster than char.IsDigit but I'm not sure if it would make a difference in most practical scenarios.
I would be tempted to do something like this:
public string replace(string input)
{
StringBuilder sb = new StringBuilder();
Dictionary<char, char> map = new Dictionary<char, char>();
map.Add('2', '₂');
map.Add('3', '₃');
map.Add('4', '₄');
map.Add('5', '₅');
map.Add('6', '₆');
map.Add('7', '₇');
char tmp;
foreach(char c in input)
{
if (map.TryGetValue(c, out tmp))
sb.Append(tmp);
else
sb.Append(c);
}
return sb.ToString();
}
The Dictionary is defined inside the method here for simplicity, but should be defined somewhere else in scope.
So, very simply, iterate the input string only once. For every character, find the matching Dictionary entry if it exists, and append either that or the original character to a StringBuilder in order to avoid creating multiple string objects.
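A minimal sketch of the same approach with the Dictionary moved out of the method into a static field, as suggested above. The class and field names are hypothetical, and the map is extended to all ten digits as an assumption (the original only lists 2 through 7).

```csharp
using System.Collections.Generic;
using System.Text;

public static class SubscriptMapSketch
{
    // Built once; assumed to cover every digit rather than only 2-7.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { '0', '₀' }, { '1', '₁' }, { '2', '₂' }, { '3', '₃' }, { '4', '₄' },
        { '5', '₅' }, { '6', '₆' }, { '7', '₇' }, { '8', '₈' }, { '9', '₉' }
    };

    public static string Replace(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char mapped;
            // Append the mapped character if there is one, otherwise the original.
            sb.Append(Map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```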
My first thought was what about formulae with balancing prefix numbers:
E.g. 2H₂(g) + O₂(g) → 2H₂O(g)
Presumably you don't want this to replace the leading numbers?
Also, I'm not sure why it is mentioned above that only 8 digits (or even only 6 digits) need replacement - aren't all digits required (0-9)? Sure, you don't have 0 and 1 by themselves, but you need them for, e.g., 10.
Anyway, notwithstanding the above (which I didn't attempt to implement since it wasn't the question), avoiding StringBuilder and operating on a char array seemed to make sense, and I preferred to avoid a large switch statement.
public class Program
{
public static void Main()
{
Console.WriteLine(SubscriptNums("C6H12O6"));
}
public static string SubscriptNums(string input)
{
char[] replacementChars = { '₀', '₁', '₂', '₃', '₄', '₅', '₆', '₇', '₈', '₉' };
int zeroCharIndex = (int)'0';
char[] inputCharArray = input.ToCharArray();
for(int i = 0; i < inputCharArray.Length; i++)
{
if (inputCharArray[i] >= '0' && inputCharArray[i] <= '9')
{
inputCharArray[i] = replacementChars[(int)inputCharArray[i] - zeroCharIndex];
}
}
return new string(inputCharArray);
}
}
Edit 1 - removed magic number for numeric value of '0'.
Edit 2 - removed use of IsDigit.
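The leading-coefficient concern raised above was left unimplemented. One possible way to handle it, sketched under the assumption that a digit is a subscript only when it follows a letter, a closing parenthesis, or another digit that was itself subscripted (real formula notation has more cases than this; the names are invented for illustration):

```csharp
public static class CoefficientAwareSketch
{
    public static string SubscriptCounts(string formula)
    {
        const int distance = '₀' - '0'; // distance of subscript from digit
        char[] chars = formula.ToCharArray();
        bool previousWasSubscripted = false;
        for (int i = 0; i < chars.Length; i++)
        {
            char c = formula[i];
            bool isDigit = c >= '0' && c <= '9';
            // Look at the original text, not the partially rewritten buffer.
            bool followsElement = i > 0 &&
                (char.IsLetter(formula[i - 1]) || formula[i - 1] == ')');
            if (isDigit && (followsElement || previousWasSubscripted))
            {
                chars[i] = (char)(c + distance);
                previousWasSubscripted = true;
            }
            else
            {
                // Leading coefficients and non-digits end any subscript run.
                previousWasSubscripted = false;
            }
        }
        return new string(chars);
    }
}
```

With this rule the coefficient in "2H2O" is left alone while the count after H is subscripted.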
You could iterate over the string and check each char. If it is to replace, append the according character to the StringBuilder. If not, just add the original character. This way, you only have to iterate over the string once, and not once for each replacement. Furthermore, as strings are immutable, each call of String.Replace() will create a new copy of the string for the result, which will immediately be GC'ed again.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < unit.Length; i++) {
switch(unit[i]) {
case '2': sb.Append('₂'); break;
case '3': sb.Append('₃'); break;
...
default: sb.Append(unit[i]); break;
}
}
output = sb.ToString();
You could also introduce some replacement dictionary, like Abdullah Nehir suggested
StringBuilder sb = new StringBuilder();
Dictionary<char, char> replacements = new Dictionary<char, char>();
//put in the pairs
for (int i = 0; i < unit.Length; i++) {
if (replacements.ContainsKey(unit[i]))
sb.Append(replacements[unit[i]]);
else
sb.Append(unit[i]);
}
Instead of accessing the values via index, you can also iterate the string with a foreach loop
foreach (char c in unit) {
if (replacements.ContainsKey(c))
sb.Append(replacements[c]);
else
sb.Append(c);
}
If you were looking for some elegant code where you don't have to type string.Replace for each character, then this would help you:
public static string Replace(string input)
{
char[] inputCharArr = input.ToCharArray();
StringBuilder sb = new StringBuilder();
foreach (var c in inputCharArr)
{
int intC = (int)c;
//If the digit was a number ([0-9] are [48-57] in unicode),
//replace the old char with the new char
//(8272 when added to the unicode of [0-9] gives the desired result)
if (intC > 47 && intC < 58)
sb.Append((char)(intC + 8272));
else sb.Append(c);
}
return sb.ToString();
}

Is there a better way than String.Replace to remove backspaces from a string?

I have a string read from another source such as "\b\bfoo\bx". In this case, it would translate to the word "fox" as the first 2 \b's are ignored, and the last 'o' is erased, and then replaced with 'x'. Also another case would be "patt\b\b\b\b\b\b\b\b\b\bfoo" should be translated to "foo"
I have come up with something using String.Replace, but it is complex and I am worried it is not working correctly, also it is creating a lot of new string objects which I would like to avoid.
Any ideas?
Probably the easiest is to just iterate over the entire string. Given your inputs, the following code does the trick in 1-pass
public string ReplaceBackspace(string hasBackspace)
{
if( string.IsNullOrEmpty(hasBackspace) )
return hasBackspace;
StringBuilder result = new StringBuilder(hasBackspace.Length);
foreach (char c in hasBackspace)
{
if (c == '\b')
{
if (result.Length > 0)
result.Length--;
}
else
{
result.Append(c);
}
}
return result.ToString();
}
The way I would do it is low-tech, but easy to understand.
Create a stack of characters. Then iterate through the string from beginning to end. If the character is a normal character (non-slash), push it onto the stack. If it is a slash, and the next character is a 'b', pop the top of the stack. If the stack is empty, ignore it.
At the end, pop each character in turn, add it to a StringBuilder, and reverse the result.
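The stack approach described above could be sketched as follows. As in the description, it treats a backspace as the two literal characters backslash and 'b' (not the single '\b' control character). It uses Stack<char>.ToArray plus Array.Reverse instead of popping into a StringBuilder, which is a substitution made here for brevity; the class name is invented.

```csharp
using System;
using System.Collections.Generic;

public static class StackBackspaceSketch
{
    public static string Process(string source)
    {
        var stack = new Stack<char>();
        for (int i = 0; i < source.Length; i++)
        {
            bool isEscapedBackspace =
                source[i] == '\\' && i + 1 < source.Length && source[i + 1] == 'b';
            if (isEscapedBackspace)
            {
                // Erase the previous character; ignore if nothing is left.
                if (stack.Count > 0) stack.Pop();
                i++; // skip the 'b' that belongs to this backspace
            }
            else
            {
                stack.Push(source[i]);
            }
        }
        // ToArray returns the top of the stack first, so reverse it.
        char[] chars = stack.ToArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```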
Regular expressions version:
var data = @"patt\b\b\b\b\b\b\b\b\b\bfoo";
var regex = new Regex(@"(^|[^\\b])\\b");
while (regex.IsMatch(data))
{
data = regex.Replace(data, "");
}
Optimized version (and this one works with backspace '\b' and not with string "\b"):
var data = "patt\b\b\b\b\b\b\b\b\b\bfoo";
var regex = new Regex(@"[^\x08]\x08", RegexOptions.Compiled);
while (data.Contains('\b'))
{
data = regex.Replace(data.TrimStart('\b'), "");
}
public static string ProcessBackspaces(string source)
{
char[] buffer = new char[source.Length];
int idx = 0;
foreach (char c in source)
{
if (c != '\b')
{
buffer[idx] = c;
idx++;
}
else if (idx > 0)
{
idx--;
}
}
return new string(buffer, 0, idx);
}
EDIT
I've done a quick, rough benchmark of the code posted in answers so far (processing the two example strings from the question, one million times each):
ANSWER | TIME (ms)
------------------------|-----------
Luke (this one) | 318
Alexander Taran | 567
Robert Paulson | 683
Markus Nigbur | 2100
Kamarey (new version) | 7075
Kamarey (old version) | 30902
You could iterate through the string backward, making a character array as you go. Every time you hit a backspace, increment a counter, and every time you hit a normal character, skip it if your counter is non-zero and decrement the counter.
I'm not sure what the best C# data structure is to manage this and then be able to get the string in the right order afterward quickly. StringBuilder has an Insert method but I don't know if it will be performant to keep inserting characters at the start or not. You could put the characters in a stack and hit ToArray() at the end -- that might or might not be faster.
String myString = "patt\b\b\b\b\b\b\b\b\b\bfoo";
List<char> chars = myString.ToCharArray().ToList();
int delCount = 0;
for (int i = chars.Count -1; i >= 0; i--)
{
if (chars[i] == '\b')
{
delCount++;
chars.RemoveAt(i);
} else {
if (delCount > 0) {
chars.RemoveAt(i);
delCount--;
}
}
}
string result = new string(chars.ToArray());
I'd go like this (code is not tested):
char[] result = new char[input.Length];
int r = 0;
for (int i = 0; i < input.Length; i++) {
    if (input[i] == '\b') {
        // A backspace drops the previously kept character, if there is one
        if (r > 0) r--;
    }
    else result[r++] = input[i];
}
string resultString = new string(result, 0, r);
Create a StringBuilder and copy over everything but backspace chars.

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.
If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)
If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
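A minimal sketch of the hash-set idea, relying on HashSet<char>.Add returning false when the item is already present. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

public static class HashSetRepeatSketch
{
    // Returns true as soon as any character is seen a second time.
    public static bool HasRepeats(string text)
    {
        var seen = new HashSet<char>();
        foreach (char c in text)
        {
            if (!seen.Add(c))
            {
                return true;
            }
        }
        return false;
    }
}
```

Unlike the sorted-string checks, this does not depend on the order of the characters.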
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun<T>(this IEnumerable<T> source)
{
    bool first = true;
    T current = default(T);
    int currentRun = 0;
    int bestRun = 0;
    foreach (T element in source)
    {
        // Start a new run on the first element or whenever the value changes
        if (first || !EqualityComparer<T>.Default.Equals(element, current))
        {
            first = false;
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = element;
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.
This will tell you very quickly if a string contains duplicates:
string s = "ABCDEA";
bool containsDups = s.Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.
Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurrences.
I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, @"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, @"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndex = match.Index; // 6
int matchLength = match.Length; // 2
}
How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}
Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing each unique character. Run over the string once, turning a character's bit on when you encounter it. The mapping between bit array index and character set is up to you to decide. Break if you see that a particular bit is already on.
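A minimal sketch of the bit-array idea, assuming the legal character set is the 26 uppercase letters A-Z so that a single uint can hold all the bits. That mapping, and the class and method names, are assumptions made for illustration; it only detects repeats and does not count them.

```csharp
using System;

public static class BitVectorRepeatSketch
{
    public static bool HasRepeats(string text)
    {
        uint seen = 0;
        foreach (char c in text)
        {
            if (c < 'A' || c > 'Z')
            {
                throw new ArgumentException("Only A-Z is supported in this sketch.");
            }
            uint bit = 1u << (c - 'A'); // bit 0 is 'A', bit 25 is 'Z'
            if ((seen & bit) != 0)
            {
                return true; // this bit was already on, so the character repeats
            }
            seen |= bit;
        }
        return false;
    }
}
```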
/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...
I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) // If the value for the key is greater than 1, the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.
When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}
The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))

What is the most efficient (read time) string search method? (C#)

I find that my program is searching through lots of lengthy strings (20,000+) trying to find a particular unique phrase.
What is the most efficient method for doing this in C#?
Below is the current code, which works like this:
The search begins at startPos because the target area is somewhat removed from the start.
It loops through the string; at each step it checks whether the substring from that point starts with startMatchString, which indicates that the start of the target string has been found. (The length of the target string varies.)
From here it creates a new substring (chopping off the 11 characters that mark the start of the target string) and searches for endMatchString.
I already know that this is a horribly complex and possibly very inefficient algorithm.
What is a better way to accomplish the same result?
string result = string.Empty;
for (int i = startPos; i <= response.Length - 1; i++)
{
    if (response.Substring(i).StartsWith(startMatchString))
    {
        result = response.Substring(i).Substring(11);
        for (int j = 0; j <= result.Length - 1; j++)
        {
            if (result.Substring(j).StartsWith(endMatchString))
            {
                return result.Remove(j);
            }
        }
    }
}
return result;
You can use String.IndexOf, but make sure you use StringComparison.Ordinal or it may be one order of magnitude slower.
private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
    int startMatch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
    if (startMatch != -1) {
        // Skip past the start marker itself
        startMatch += startMatchString.Length;
        int endMatch = response.IndexOf(endMatchString, startMatch, StringComparison.Ordinal);
        if (endMatch != -1) { return response.Substring(startMatch, endMatch - startMatch); }
    }
    return string.Empty;
}
Searching 1000 times for a string located about 40% of the way into a 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal it took about 2000 milliseconds.
Searching once with your method took over 60 seconds, because it creates a new string (an O(n) operation) on each iteration, which makes your method O(n^2).
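To illustrate where the per-iteration cost comes from: the Substring call copies the rest of the string at every position. If a hand-written loop is wanted at all, the comparison can be done in place with string.CompareOrdinal, which takes offsets and a length and allocates nothing. This is only a sketch (the class and method names are invented, and startPos is assumed to be non-negative); IndexOf with StringComparison.Ordinal as shown above remains the simpler choice.

```csharp
public static class InPlaceCompareSketch
{
    // Returns the first index at or after startPos where marker occurs, or -1.
    public static int FindStart(string response, string marker, int startPos)
    {
        for (int i = startPos; i <= response.Length - marker.Length; i++)
        {
            // Compares marker.Length characters at offset i without creating a substring.
            if (string.CompareOrdinal(response, i, marker, 0, marker.Length) == 0)
            {
                return i;
            }
        }
        return -1;
    }
}
```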
There are a whole bunch of algorithms:
Boyer-Moore
Sunday
Knuth-Morris-Pratt
Rabin-Karp
I would recommend the simplified Boyer-Moore variant, called Boyer–Moore–Horspool.
C code for it is available on Wikipedia.
For the java code look at
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html
A nice article about these is available under
http://www.ibm.com/developerworks/java/library/j-text-searching.html
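For reference, a compact C# sketch of Boyer–Moore–Horspool is given below. It is not taken from any of the linked sources; the class name is invented, a Dictionary stands in for the usual fixed-size shift table so that any char value works, and comparison is ordinal.

```csharp
using System.Collections.Generic;

public static class HorspoolSketch
{
    // Returns the index of the first ordinal occurrence of pattern in text at or
    // after start, or -1 if there is none. Assumes 0 <= start <= text.Length.
    public static int IndexOf(string text, string pattern, int start)
    {
        int m = pattern.Length;
        int n = text.Length;
        if (m == 0) return start;

        // Bad-character table: for each character of the pattern except the last,
        // the distance from its rightmost occurrence to the end of the pattern.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
        {
            shift[pattern[i]] = m - 1 - i;
        }

        int pos = start;
        while (pos <= n - m)
        {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // Shift by the table entry for the window's last character,
            // or by the whole pattern length if that character is not in the table.
            int step;
            pos += shift.TryGetValue(text[pos + m - 1], out step) ? step : m;
        }
        return -1;
    }
}
```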
If you want to use built-in stuff go for regular expressions.
It depends on what you're trying to find in the string. If you're looking for a specific sequence IndexOf/Contains are fast, but if you're looking for wild card patterns Regex is optimized for this kind of search.
I would try to use a Regular Expression instead of rolling my own string search algorithm. You can precompile the regular expression to make it run faster.
For very long strings you cannot beat the boyer-moore search algorithm. It is more complex than I might try to explain here, but The CodeProject site has a pretty good article on it.
You could use a regex; it’s optimized for this kind of searching and manipulation.
You could also try IndexOf ...
string result = string.Empty;
if (startPos >= response.Length)
return result;
int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;
if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
if (endingIndex > -1)
result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}
return result;
Here's an example using IndexOf (beware: written from the top of my head, didn't test it):
int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
int end = response.IndexOf(endMatchString, start + skip);
if (end >= 0)
return response.Substring(start + skip, end - start - skip);
else
return response.Substring(start + skip);
}
return string.Empty;
As said before regex is your friend.
You might want to look at RegularExpressions.Group.
This way you can name part of the matched resultset.
Here is an example
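A minimal sketch of the named-group idea, assuming the start and end markers are plain strings that should be matched literally (hence Regex.Escape). The group name "target" and the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // Returns the text between the first startMatchString found at or after
    // startPos and the next endMatchString, or an empty string if not found.
    public static string Extract(string response, string startMatchString,
                                 string endMatchString, int startPos)
    {
        var regex = new Regex(
            Regex.Escape(startMatchString) + "(?<target>.*?)" + Regex.Escape(endMatchString),
            RegexOptions.Singleline); // let '.' match newlines inside the target

        Match match = regex.Match(response, startPos);
        return match.Success ? match.Groups["target"].Value : string.Empty;
    }
}
```

If the same markers are used for many searches, the Regex instance could be created once and reused instead of being rebuilt on every call.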

Performance issues with finding nth occurrence of a character with a regular expression

I have a regex to find the nth occurrence of a character in a string, here's the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
{
return m.Groups[2].Captures[n - 1].Index;
}
else
{
return -1;
}
}
Now, I have 1594 entries in this string, with 1593 semicolons. If I write:
tempstring.NthIndexOf(";", 1593)
The answer comes back immediately and correctly. If I give it anything over 1594 it hangs. Does anyone know how to fix this?
Test Case
string holder = "test;test2;test3";
string test = "";
for (int i = 0; i < 600; i++)
{
test += holder;
}
int index = test.NthIndexOf(";", 2000);
This takes a very long time. Change 600 to 6 and it is very fast. Make 2000 to 1700 and it is very fast as well.
Why is my regular expression so slow?
If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace your method with something simple like
public static int NthIndexOf(this string target, char testChar, int n)
{
int count = 0;
for(int i=0; i<target.Length; i++)
{
if(target[i] == testChar)
{
count++;
if(count == n) return i;
}
}
return -1;
}
and use that. It should have far fewer limitations.
As for why your original regex is going slow, here's what I suspect:
The fast case works because it can find a match on its first pass through (with each group matching exactly one character).
The slow case is slow because it can't find a match (and never will, because there aren't enough semicolons to satisfy the regex), but it recursively tries every possible way to break up the string, which is a really big operation.
Try to use a more distinct and efficient regular expression:
"^(?:[^" + value + "]*" + value + "){" + (n - 1) + "}([^" + value + "]*)"
This will build the following regular expression for tempstring.NthIndexOf(";", 1593):
^(?:[^;]*;){1592}([^;]*)
But this will only work for single characters as separator.
Another approach would be to step through each character and count the occurrences of the character you are looking for.
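For multi-character values, the same counting idea can be sketched with repeated IndexOf calls instead of a regular expression. This is an illustration rather than a drop-in replacement: the class name is invented, matching is ordinal, and occurrences are counted without overlap.

```csharp
using System;

public static class NthIndexSketch
{
    // Returns the index of the nth (1-based) occurrence of value in target, or -1.
    public static int NthIndexOf(string target, string value, int n)
    {
        if (string.IsNullOrEmpty(value) || n < 1) return -1;

        int start = 0;
        int index = -1;
        for (int found = 0; found < n; found++)
        {
            index = target.IndexOf(value, start, StringComparison.Ordinal);
            if (index < 0) return -1; // fewer than n occurrences
            start = index + value.Length; // continue after this occurrence
        }
        return index;
    }
}
```

Because each step only moves forward through the string, a request for more occurrences than exist returns -1 after a single pass instead of backtracking.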
