I'm aiming for speed; this needs to be ultra fast.
string s = something;
for (int j = 0; j < s.Length; j++)
{
    if (s[j] == 'ь')
        if (s.Length > (j + 1))
            if (s[j + 1] != 'о')
                s[j] = 'ъ';
}
It gives me the error: "Property or indexer 'string.this[int]' cannot be assigned to -- it is read only".
How do I do it the fastest way?
Fast way? Use a StringBuilder.
Fastest way? Always pass around a char* and a length instead of a string so you can modify the buffer in-place, but make sure you don't ever modify any string object.
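As a sketch of the StringBuilder route (keeping the exact condition from the question, so a trailing 'ь' is left untouched; the method name is just for illustration):

```csharp
using System;
using System.Text;

class Program
{
    // Same logic as the question's loop, but on a mutable StringBuilder:
    // replace 'ь' with 'ъ' when a next character exists and it is not 'о'.
    static string ReplaceSoftSign(string s)
    {
        var sb = new StringBuilder(s);
        for (int j = 0; j < sb.Length - 1; j++)
        {
            if (sb[j] == 'ь' && sb[j + 1] != 'о')
                sb[j] = 'ъ';
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ReplaceSoftSign("тьма льод"));  // тъма льод
    }
}
```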
There are at least two options:
Use a StringBuilder and keep track of the previous character.
You could just use a regular expression "ь(?!о)" or a simple string replacement of "ьо" depending on what your needs are (your question seems self-contradictory).
I tested the performance of a StringBuilder approach versus regular expressions and there is very little difference - at most a factor of 2:
Method                Iterations per second
StringBuilder               153,480.094
Regex (uncompiled)           90,021.978
Regex (compiled)            136,355.787
string.Replace            1,427,605.174
If performance is critical for you I would strongly recommend making some performance measurements before jumping to conclusions about what the fastest approach is.
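For reference, a minimal sketch of the regex option using the "ь(?!о)" pattern mentioned above; RegexOptions.Compiled corresponds to the "compiled" row in the table:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // "ь(?!о)" matches 'ь' NOT followed by 'о' (negative lookahead).
        var regex = new Regex("ь(?!о)", RegexOptions.Compiled);
        Console.WriteLine(regex.Replace("тьма льод", "ъ"));  // тъма льод
    }
}
```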
Strings in .NET are immutable. You could use a StringBuilder.
Related
I'm developing an application in C# that replaces variables across strings.
Any suggestions on how to do that efficiently?
private string MyStrTr(string source, string frm, string to)
{
    char[] input = source.ToCharArray();
    // elements of a new bool[] default to false, so no explicit initialization is needed
    bool[] replaced = new bool[input.Length];
    for (int i = 0; i < frm.Length; i++)
    {
        for (int j = 0; j < input.Length; j++)
        {
            if (!replaced[j] && input[j] == frm[i])
            {
                input[j] = to[i];
                replaced[j] = true;
            }
        }
    }
    return new string(input);
}
The above code works fine, but the string has to be traversed once per variable.
Here is the exact requirement:
parent.name = am;
parent.number = good;
I {parent.name} a {parent.number} boy.
The output should be: I am a good boy.
Assume the source will be huge.
For example, if I have 5 different variables, I need to traverse the full string 5 times.
Any suggestions on how to process all the variables in parallel during a single traversal?
I think you're suffering from premature optimization. Write the simplest thing first and see if it works for you. If you don't suffer a performance problem, then you're done. Don't waste time trying to make it faster when you don't know how fast it is, or if it's the cause of your performance problem.
By the way, Facebook's Terms of Service, the entire HTML page that includes a lot of Javascript, is only 164 kilobytes. That's not especially large.
String.Replace should work quite well, even if you have multiple strings to replace. That is, you can write:
string result = source.Replace("{parent.name}", "am");
result = result.Replace("{parent.number}", "good");
// more replacements here
return result;
That will exercise the garbage collector a little bit, but it shouldn't be a problem unless you have a truly massive page or a whole mess of replacements.
You can potentially save yourself some garbage collection by converting the string to a StringBuilder, and calling StringBuilder.Replace multiple times. I honestly don't know, though, whether that will have any appreciable effect. I don't know how StringBuilder.Replace is implemented.
There is a way to make this faster, by writing code that will parse the string and do all the replacements in a single pass. It's a lot of code, though. You have to build a state machine from the multiple search strings and go through the source text one character at a time. It's doable, but it's a difficult enough task that you probably don't want to do it unless the simple method flat doesn't work quickly enough.
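Short of a full hand-written state machine, a single Regex pass with a match evaluator gets you one-traversal replacement; this sketch assumes every variable is written as {...}, as in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var vars = new Dictionary<string, string>
        {
            ["{parent.name}"] = "am",
            ["{parent.number}"] = "good",
        };
        // One pass over the source: each {...} token is looked up in the
        // dictionary; unknown tokens are left as-is.
        string result = Regex.Replace(
            "I {parent.name} a {parent.number} boy.",
            @"\{[^}]+\}",
            m => vars.TryGetValue(m.Value, out var v) ? v : m.Value);
        Console.WriteLine(result);  // I am a good boy.
    }
}
```

Regex still scans character by character internally, but only once, regardless of how many variables you have.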
I'm working with huge string data for a project in C#. I'm not sure which approach I should use to manipulate my string data.
First Approach:
StringBuilder myString = new StringBuilder().Append(' ', 1024);
int i = 0;
while (someString[++counter] != someChar)
    myString[i++] = someString[counter];
Second Approach:
string myString;
int i = counter;
while (someString[++counter] != someChar) ;
myString = someString.Substring(i, counter - i);
Which of the two would be faster (and more efficient), considering the strings I'm working with are huge?
The strings are already in the RAM.
The size of the string can vary from 32MB-1GB.
You should use IndexOf rather than doing individual character manipulations in a loop, and add whole chunks of string to the result:
StringBuilder myString = new StringBuilder();
int pos = someString.IndexOf(someChar, counter);
myString.Append(someString.Substring(counter, pos - counter));
For "huge" strings, it may make sense to take a streamed approach and not load the whole thing into memory. For the best raw performance, you can sometimes squeeze a little more speed out by using pointer math to search and capture pieces of strings.
To be clear, I'm stating two completely different approaches.
1 - Stream
The OP doesn't say how big these strings are, but it may be impractical to load them into memory. Perhaps they are being read from a file, from a data reader connected to a DB, from an active network connection, etc.
In this scenario, I would open a stream, read forward, buffering my input in a StringBuilder until the criteria was met.
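A minimal sketch of that streamed approach; the stop criterion here is a single example character, and TextReader stands in for whatever file/network reader you actually have:

```csharp
using System;
using System.IO;
using System.Text;

class Program
{
    // Buffers characters from any TextReader until the stop character is
    // seen, so the full input never has to be resident in memory.
    static string ReadUntil(TextReader reader, char stop)
    {
        var sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1 && (char)c != stop)
            sb.Append((char)c);
        return sb.ToString();
    }

    static void Main()
    {
        // StringReader stands in for a StreamReader over a file or socket.
        using (var reader = new StringReader("aaaxbbb"))
            Console.WriteLine(ReadUntil(reader, 'x'));  // aaa
    }
}
```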
2 - Unsafe Char Manipulation
This requires that you do have the complete string. You can obtain a char* to the start of a string quite simply:
// fix entire string in memory so that we can work w/ memory range safely
// fix entire string in memory so that we can work w/ memory range safely
fixed (char* pStart = bigString)
{
    char* pChar = pStart; // movable copy of the pointer to the start of the string
    char* pEnd = pStart + bigString.Length;
}
You can now increment pChar and examine each character. You can buffer it (e.g. if you want to examine multiple adjacent characters) or not as you choose. Once you determine the ending memory location, you now have a range of data that you can work with.
Unsafe Code and Pointers in C#
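To make the pattern above concrete, here is a small illustrative example that only reads through the pinned string with pointer math (the string is never written, per the warning above); it needs unsafe compilation enabled:

```csharp
using System;

class Program
{
    // Counts occurrences of 'target' by walking the pinned string with
    // pointer math. Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
    static unsafe int CountChar(string bigString, char target)
    {
        int count = 0;
        fixed (char* pStart = bigString)
        {
            char* pChar = pStart;
            char* pEnd = pStart + bigString.Length;
            while (pChar < pEnd)
            {
                if (*pChar == target) count++;
                pChar++;
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountChar("abcabca", 'a'));  // 3
    }
}
```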
2.1 - A Safer Approach
If you are familiar with unsafe code, it is very fast, expressive, and flexible. If not, I would still use a similar approach, but without the pointer math. This is similar to the approach which @supercat suggested, namely:
Get a char[].
Read through it character by character.
Buffer where needed. StringBuilder is good for this; set an initial size and reuse the instance.
Analyze buffer where needed.
Dump buffer often.
Do something with the buffer when it contains the desired match.
And an obligatory disclaimer for unsafe code: The vast majority of the time the framework methods are a better solution. They are safe, tested, and invoked millions of times per second. Unsafe code puts all of the responsibility on the developer. It does not make any assumptions; it's up to you to be a good framework/OS citizen (e.g. not overwriting immutable strings, allowing buffer overruns, etc.). Because it does not make any assumptions and removes the safeguards, it will often yield a performance increase. It's up to the developer to determine if there is indeed a benefit, and to decide if the advantages are significant enough.
Per request from OP, here are my test results.
Assumptions:
Big string is already in memory, no requirement for reading from disk
Goal is to not use any native pointers/unsafe blocks
The "checking" process is simple enough that something like Regex is not needed. For now simplifying to a single char comparison. The below code can easily be modified to consider multiple chars at once, this should have no effect on the relative performance of the two approaches.
public static void Main()
{
    string bigStr = GenString(100 * 1024 * 1024);

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10; i++)
    {
        int counter = -1;
        StringBuilder sb = new StringBuilder();
        while (bigStr[++counter] != 'x')
            sb.Append(bigStr[counter]);
        Console.WriteLine(sb.ToString().Length);
    }
    sw.Stop();
    Console.WriteLine("StringBuilder: {0}", sw.Elapsed.TotalSeconds);

    sw = Stopwatch.StartNew();
    for (int i = 0; i < 10; i++)
    {
        int counter = -1;
        while (bigStr[++counter] != 'x') ;
        Console.WriteLine(bigStr.Substring(0, counter).Length);
    }
    sw.Stop();
    Console.WriteLine("Substring: {0}", sw.Elapsed.TotalSeconds);
}

public static string GenString(int size)
{
    StringBuilder sb = new StringBuilder(size);
    for (int i = 0; i < size - 1; i++)
    {
        sb.Append('a');
    }
    sb.Append('x');
    return sb.ToString();
}
Results (release build, .NET 4):
StringBuilder ~7.9 sec
Substring ~1.9 sec
StringBuilder was consistently > 3x slower, with a variety of different sized strings.
There's an IndexOf operation which would search more quickly for someChar, but I'll assume your real function to find the desired length is more complicated than that. In that scenario, I would recommend copying someString to a Char[], doing the search, and then using the new String(Char[], Int32, Int32) constructor to produce the final string. Indexing a Char[] is going to be so much more efficient than indexing an String or StringBuilder that unless you expect that you'll typically be needing only a small fraction of the string, copying everything to the Char[] will be a 'win' (unless, of course, you could simply use something like IndexOf).
Even if the length of the string will often be much larger than the length of interest, you may still be best off using a Char[]. Pre-initialize the Char[] to some size, and then do something like:
Char[] temp = new Char[1024];
int i=0;
while (i < theString.Length)
{
int subLength = theString.Length - i;
if (subLength > temp.Length) // May impose other constraints on subLength, provided
subLength = temp.Length; // it's greater than zero.
theString.CopyTo(i, temp, 0, subLength);
... do stuff with the array
i+=subLength;
}
Once you're all done, you may then use a single Substring call to construct a string with the necessary characters from the original. If your application requires building a string whose characters differ from the original, you could use a StringBuilder and, within the above loop, use the Append(Char[], Int32, Int32) method to add processed characters to it.
Note also that with the above loop construct, one may decide to reduce subLength at any point in the loop, provided it is not reduced to zero. For example, if one is trying to find whether the string contains a prime number of sixteen or fewer digits enclosed by parentheses, one could start by scanning for an open-paren; if one finds it and it's possible that the data one is looking for might extend beyond the array, set subLength to the position of the open-paren, and reloop. Such an approach will result in a small amount of redundant copying, but not much (often none), and will eliminate the need to keep track of parsing state between loops. A very convenient pattern.
You always want to use a StringBuilder when manipulating strings. This is because strings are immutable, so every manipulation has to create a new object.
I'm trying to parse a large text string. I need to split the original string into blocks of 15 characters (and the next block might contain white spaces, so the trim function is used). I'm using two strings, the original and a temporary one. This temp string is used to store each 15-length block.
I wonder if I could fall into a performance issue because strings are immutable. This is the code:
string original = "THIS IS SUPPOSE TO BE A LONG STRING AN I NEED TO SPLIT IT IN BLOCKS OF 15 CHARACTERS.SO";
string temp = string.Empty;
while (original.Length != 0)
{
    int len = Math.Min(15, original.Length);
    temp = original.Substring(0, len).Trim();
    original = original.Substring(len).Trim();
}
I appreciate your feedback in order to find a best way to achieve this functionality.
You'll get slightly better performance like this (but whether the performance gain will be significant is another matter entirely):
for (var startIndex = 0; startIndex < original.Length; startIndex += 15)
{
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
}
This performs better because you're not copying the last all-but-15-characters of the original string with each loop iteration.
EDIT
To advance the index to the next non-whitespace character, you can do something like this:
for (var startIndex = 0; startIndex < original.Length; )
{
    if (char.IsWhiteSpace(original, startIndex))
    {
        startIndex++;
        continue;
    }
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
    startIndex += 15;
}
I think you are right about the immutable issue - recreating 'original' each time is probably not the fastest way.
How about passing 'original' into a StringReader class?
If your original string is longer than a few thousand chars, you'll have noticeable (>0.1s) processing time and a lot of GC pressure. The first Substring call is fine, and I don't think you can avoid it unless you go deep inside System.String and mess around with m_FirstChar. The second Substring can be avoided completely by going char-by-char and iterating over an int index.
In general, if you run this on bigger data, such code might be problematic; it of course depends on your needs.
It might be a good idea to use the StringBuilder class, which allows you to operate on strings in a "more mutable" way without a performance hit, e.g. removing from its beginning without reallocating the whole string.
In your example, however, I would consider throwing out the line that takes a substring of original and substituting some code that updates an index pointing to where the next substring should start. The while condition would then just check whether that index is at the end of the string, and the temp assignment would take the substring not from 0 to 14 but starting from i, where i is this index.
However - don't optimize code if you don't have to, I'm assuming here that you need more performance and you want to sacrifice some time and/or write a bit less understandable code for more efficiency.
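A sketch of that index-based idea, assuming the fixed block size of 15 from the question; only the small block substrings are ever allocated, and original is never rebuilt:

```csharp
using System;

class Program
{
    static void Main()
    {
        string original = "THIS IS SUPPOSE TO BE A LONG STRING AN I NEED TO SPLIT IT IN BLOCKS OF 15 CHARACTERS.SO";
        // Walk an index forward instead of shrinking 'original' each pass.
        for (int i = 0; i < original.Length; i += 15)
        {
            int len = Math.Min(15, original.Length - i);
            string temp = original.Substring(i, len).Trim();
            Console.WriteLine(temp);  // first block: "THIS IS SUPPOSE"
        }
    }
}
```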
I'm trying to speed up the following:
string s; //--> s is never null
if (s.Length != 0)
{
<do something>
}
Problem is, it appears the .Length actually counts the characters in the string, and this is way more work than I need. Anybody have an idea on how to speed this up?
Or, is there a way to determine if s[0] exists, w/out checking the rest of the string?
EDIT: Now that you've provided some more context:
Trying to reproduce this, I failed to find a bottleneck in string.Length at all. The only way of making it faster was to comment out both the test and the body of the if block - which isn't really fair. Just commenting out the condition slowed things down, i.e. unconditionally copying the reference was slower than checking the condition.
As has been pointed out, using the overload of string.Split which removes empty entries for you is the real killer optimization.
You can go further by avoiding creating a new char array with just a space in it every time. You're always going to pass effectively the same thing, so why not take advantage of that?
Empty arrays are effectively immutable. You can optimize the null/empty case by always returning the same thing.
The optimized code becomes:
private static readonly char[] Delimiters = " ".ToCharArray();
private static readonly string[] EmptyArray = new string[0];

public static string[] SplitOnMultiSpaces(string text)
{
    if (string.IsNullOrEmpty(text))
    {
        return EmptyArray;
    }
    return text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
}
String.Length absolutely does not count the letters in the string. The value is stored as a field - although I seem to remember that the top bit of that field is used to remember whether or not all characters are ASCII (or used to be, anyway) to enable other optimisations. So the property access may need to do a bitmask, but it'll still be O(1) and I'd expect the JIT to inline it, too. (It's implemented as an extern, but hopefully that wouldn't affect the JIT in this case - I suspect it's a common enough operation to potentially have special support.)
If you already know that the string isn't null, then your existing test of
if (s.Length != 0)
is the best way to go if you're looking for raw performance IMO. Personally in most cases I'd write:
if (s != "")
to make it clearer that we're not so much interested in the length as a value as whether or not this is the empty string. That will be slightly slower than the length test, but I believe it's clearer. As ever, I'd go for the clearest code until you have benchmark/profiling data to indicate that this really is a bottleneck. I know your question is explicitly about finding the most efficient test, but I thought I'd mention this anyway. Do you have evidence that this is a bottleneck?
EDIT: Just to give clearer reasons for my suggestion of not using string.IsNullOrEmpty: a call to that method suggests to me that the caller is explicitly trying to deal with the case where the variable is null, otherwise they wouldn't have mentioned it. If at this point of the code it counts as a bug if the variable is null, then you shouldn't be trying to handle it as a normal case.
In this situation, the Length check is actually better in one way than the inequality test I've suggested: it acts as an implicit assertion that the variable isn't null. If you have a bug and it is null, the test will throw an exception and the bug will be detected early. If you use the equality test it will treat null as being different to the empty string, so it will go into your "if" statement's body. If you use string.IsNullOrEmpty it will treat null as being the same as empty, so it won't go into the block.
String.IsNullOrEmpty is the preferred method for checking for null or zero length strings.
Internally, it will use Length. The Length property of a string is not calculated on the fly, though.
If you're absolutely certain that the string will never be null and you have some strong objection to String.IsNullOrEmpty, the most efficient code I can think of would be:
if(s.Length > 0)
{
// Do Something
}
Or, possibly even better:
if(s != "")
{
// Do Something
}
Accessing the Length property shouldn't do a count -- .NET strings store a count inside the object.
The SSCLI/Rotor source code contains an interesting comment which suggests that String.Length is (a) efficient and (b) magic:
// Gets the length of this string
//
/// This is an EE implemented function so that the JIT can recognise it specially
/// and eliminate checks on character fetches in a loop like:
/// for(int i = 0; i < str.Length; i++) str[i]
/// The actual code generated for this will be one instruction and will be inlined.
//
public extern int Length {
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    get;
}
Here is how to use String.IsNullOrEmpty:
if (!String.IsNullOrEmpty(yourstring))
{
// your code
}
String.IsNullOrWhiteSpace(s);
Returns true if s is null or Empty, or if s consists exclusively of white-space characters.
As always with performance: benchmark.
Using .NET 3.5 or before, you'll want to test yourString.Length against String.IsNullOrEmpty(yourString).
Using .NET 4, do both of the above and add String.IsNullOrWhiteSpace(yourString).
Of course, if you know your string will never be empty, you could just attempt to access s[0] and handle the exception when it's not there. That's not normally good practice, but it may be closer to what you need (if s should always have a non-blank value).
for (int i = 0; i < 100; i++)
{
    System.Diagnostics.Stopwatch timer = new System.Diagnostics.Stopwatch();
    string s = "dsfasdfsdafasd";

    timer.Start();
    if (s.Length > 0)
    {
    }
    timer.Stop();
    System.Diagnostics.Debug.Write(String.Format("s.Length != 0 {0} ticks ", timer.ElapsedTicks));

    timer.Reset();
    timer.Start();
    if (s == String.Empty)
    {
    }
    timer.Stop();
    System.Diagnostics.Debug.WriteLine(String.Format("s == String.Empty {0} ticks", timer.ElapsedTicks));
}
Using the Stopwatch, s.Length != 0 takes fewer ticks than s == String.Empty (measured after fixing the code).
Based on your intent described in your answer, why don't you just try using this built-in option on Split:
s.Split(new[]{" "}, StringSplitOptions.RemoveEmptyEntries);
Just use String.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries) and it will do it all for you.
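A quick illustration that RemoveEmptyEntries discards the empty entries produced by runs of spaces:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Consecutive spaces would otherwise yield empty strings in the result.
        string[] parts = "a   b  c".Split(new[] { ' ' },
            StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(string.Join(",", parts));  // a,b,c
    }
}
```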
What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning a string by string[index]?
Use regex's. Personally, I don't like regex as I find them difficult to maintain, but are these likely to be faster than manually scanning the string?
Use unsafe code and pointers. This would eliminate the index range checking but I've read that unsafe code wont run in untrusted environments. What exactly are the implications of this? Does this mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so to be able to fall back to a slower but more compatible mode would be nice.
What else might I consider?
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and are in a custom format for which there is no standard .NET parser. Also, performance of this code is not super critical, so this is partly just a theoretical question of curiosity.
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the indexer takes... 3ms:
var watch = Stopwatch.StartNew();
char[] chars = new char[128 * 1024];
Random rand = new Random(); // fill with junk
for (int i = 0; i < chars.Length; i++)
    chars[i] = (char)('a' + rand.Next(26));
int sum = 0;
string s = new string(chars);
int len = s.Length;
for (int i = 0; i < len; i++)
{
    sum += s[i]; // sum via the string indexer
}
watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
"Parsing" is quite an inexact term. Since you talk of 30k, it seems that you might be dealing with some sort of structured string which can be covered by creating a parser using a parser-generator tool.
A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/
This can help you create code which is efficient and correct for many textual parsing needs.
As for your points:
1. IndexOf/IndexOfAny is usually not useful for parsing that goes further than splitting a string.
2. Regex is better suited if there are no recursions or overly complex rules.
3. Unsafe code is basically a no-go if you haven't really identified this as a serious problem. The JIT can take care of doing the range checks only when needed, and indeed for simple loops (the typical for loop) this is handled pretty well.