I'm developing an application in C# that replaces variables across strings. Any suggestions on how to do this efficiently?
private string MyStrTr(string source, string frm, string to)
{
    char[] input = source.ToCharArray();
    bool[] replaced = new bool[input.Length]; // elements default to false
    for (int i = 0; i < frm.Length; i++)
    {
        for (int j = 0; j < input.Length; j++)
        {
            if (!replaced[j] && input[j] == frm[i])
            {
                input[j] = to[i];
                replaced[j] = true;
            }
        }
    }
    return new string(input);
}
The above code works fine, but each variable requires its own full traversal of the string.
Exact requirement:
parent.name = am;
parent.number = good;
Input: I {parent.name} a {parent.number} boy.
Output should be: I am a good boy.
Assume the source string will be huge. For example, if I have 5 different variables, I need to traverse the full string 5 times.
Any suggestions on how to process all the variables in parallel during a single traversal?
I think you're suffering from premature optimization. Write the simplest thing first and see if it works for you. If you don't suffer a performance problem, then you're done. Don't waste time trying to make it faster when you don't know how fast it is, or if it's the cause of your performance problem.
By the way, Facebook's Terms of Service (the entire HTML page, including a lot of JavaScript) is only 164 kilobytes. That's not especially large.
String.Replace should work quite well, even if you have multiple strings to replace. That is, you can write:
string result = source.Replace("{parent.name}", "am");
result = result.Replace("{parent.number}", "good");
// more replacements here
return result;
That will exercise the garbage collector a little bit, but it shouldn't be a problem unless you have a truly massive page or a whole mess of replacements.
You can potentially save yourself some garbage collection by converting the string to a StringBuilder, and calling StringBuilder.Replace multiple times. I honestly don't know, though, whether that will have any appreciable effect. I don't know how StringBuilder.Replace is implemented.
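If you want to try the StringBuilder route, a minimal sketch would look like this; whether it actually beats chained String.Replace calls is something to measure rather than assume:

```csharp
using System.Text;

static class TemplateDemo
{
    // Do all replacements on one mutable buffer, then materialize
    // a single string at the end.
    public static string ReplaceWithBuilder(string source)
    {
        var sb = new StringBuilder(source);
        sb.Replace("{parent.name}", "am");
        sb.Replace("{parent.number}", "good");
        // more replacements here
        return sb.ToString();
    }
}
```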
There is a way to make this faster, by writing code that will parse the string and do all the replacements in a single pass. It's a lot of code, though. You have to build a state machine from the multiple search strings and go through the source text one character at a time. It's doable, but it's a difficult enough task that you probably don't want to do it unless the simple method flat doesn't work quickly enough.
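For the {placeholder} format in the question, though, you don't need a full state machine: a single left-to-right scan that looks each brace-delimited name up in a dictionary already gives you one pass regardless of how many variables there are. A minimal sketch, assuming placeholders are always delimited by '{' and '}':

```csharp
using System.Collections.Generic;
using System.Text;

static class TemplateReplacer
{
    // Single pass: scan for '{...}' placeholders and look each one up in
    // the dictionary, so N variables no longer mean N traversals.
    public static string ReplaceAll(string source, IDictionary<string, string> vars)
    {
        var sb = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (c == '{')
            {
                int close = source.IndexOf('}', i + 1);
                if (close > i)
                {
                    string key = source.Substring(i + 1, close - i - 1);
                    string value;
                    if (vars.TryGetValue(key, out value))
                    {
                        sb.Append(value);   // emit the replacement
                        i = close + 1;      // skip past the placeholder
                        continue;
                    }
                }
            }
            sb.Append(c);   // ordinary character, or an unknown placeholder
            i++;
        }
        return sb.ToString();
    }
}
```

Unknown placeholders are left in the output as-is, which seems like the safest default for a template replacer.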
Related
In C#, I have a string like this:
"1 3.14 (23, 23.2, 43,88) 8.27"
I need to convert this string to other types (int/float/Vector3) according to the value. Right now I have some code like this:
public static int ReadInt(this string s, ref string op)
{
    s = s.Trim();
    string ss = "";
    int idx = s.IndexOf(" ");
    if (idx > 0)
    {
        ss = s.Substring(0, idx);
        op = s.Substring(idx);
    }
    else
    {
        ss = s;
        op = "";
    }
    return Convert.ToInt32(ss);
}
This reads the first int value out, and I have similar functions to read float, Vector3, etc. The problem is that in my application I have to do this a lot: I receive the string from a plugin and need to parse it every single frame. All the temporary strings cause a lot of GC pressure, which hurts performance. Is there a way I can do the same thing without creating temporary strings?
Generation 0 objects such as those created here may well not impact performance too much, as they are relatively cheap to collect. I would change from Convert.ToInt32 to int.Parse() with the invariant culture before I started worrying about the GC overhead of the extra strings.
Also, you don't really need to create a new string to accomplish the Trim() behavior. After all, you're scanning and indexing the string anyway. Just do your initial scan for whitespace, and then for the space delimiter between ss and op, so you get just the substrings you need. Right now you're creating 50% more string instances than you really need.
All that said, no...there's not anything built into the basic .NET framework that would parse a substring without actually creating a new string instance. You would have to write your own parsing routines to accomplish that.
You should measure the actual real-world performance impact first, to make sure these substrings really are a significant issue.
I don't know what the "some plugin" is or how you have to handle the input from it, but I would not be surprised to hear that the overhead in acquiring the original input string(s) for this scenario swamps the overhead of the substrings for parsing.
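If profiling does show the substrings matter, the only way around them is a hand-rolled routine that reads digits straight out of the source string. A sketch follows; the ref-position signature is my own invention rather than an established API, and it handles an optional sign but does no overflow checking:

```csharp
static class NumberScanner
{
    // Parse an int starting at 'pos', advancing 'pos' past the digits.
    // No substrings are created, so there is nothing for the GC to collect.
    public static int ReadInt(string s, ref int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++; // replaces Trim()
        bool negative = false;
        if (pos < s.Length && (s[pos] == '-' || s[pos] == '+'))
        {
            negative = s[pos] == '-';
            pos++;
        }
        int value = 0;
        while (pos < s.Length && s[pos] >= '0' && s[pos] <= '9')
        {
            value = value * 10 + (s[pos] - '0'); // accumulate digits in place
            pos++;
        }
        return negative ? -value : value;
    }
}
```

The same pattern extends to floats and vectors: keep one running position into the string and never slice it.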
I'm trying to parse a large text string. I need to split the original string into blocks of 15 characters (the next block might begin with white space, so Trim is used). I'm using two strings, the original and a temporary one; the temp string stores each 15-character block.
I wonder if I could run into a performance issue because strings are immutable. This is the code:
string original = "THIS IS SUPPOSE TO BE A LONG STRING AN I NEED TO SPLIT IT IN BLOCKS OF 15 CHARACTERS.SO";
string temp = string.Empty;
while (original.Length != 0)
{
    int take = Math.Min(15, original.Length);
    temp = original.Substring(0, take).Trim();
    original = original.Substring(take).Trim();
}
I appreciate your feedback in order to find a best way to achieve this functionality.
You'll get slightly better performance like this (but whether the performance gain will be significant is another matter entirely):
for (var startIndex = 0; startIndex < original.Length; startIndex += 15)
{
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
}
This performs better because you're not copying the remaining all-but-15 characters of the original string on each loop iteration.
EDIT
To advance the index to the next non-whitespace character, you can do something like this:
for (var startIndex = 0; startIndex < original.Length; )
{
    if (char.IsWhiteSpace(original, startIndex))
    {
        startIndex++;
        continue;
    }
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
    startIndex += 15;
}
I think you are right about the immutability issue - recreating 'original' each time is probably not the fastest way.
How about passing 'original' into a StringReader class?
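For what it's worth, the StringReader idea would look something like this: read fixed-size blocks into one reusable char buffer instead of repeatedly re-creating original. A sketch; the callback shape is my own assumption:

```csharp
using System;
using System.IO;

static class BlockSplitter
{
    // Read 15-char blocks through a StringReader into one reusable buffer,
    // so the original string is never re-sliced.
    public static void Process(string original, Action<string> handleBlock)
    {
        using (var reader = new StringReader(original))
        {
            var buffer = new char[15];
            int read;
            while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                handleBlock(new string(buffer, 0, read).Trim());
            }
        }
    }
}
```

This still allocates one string per block (for the Trim/handoff), but it drops the repeated copy of the ever-shrinking tail.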
If your original string is longer than a few thousand chars, you'll have noticeable (>0.1 s) processing time and a lot of GC pressure. The first Substring call is fine, and I don't think you can avoid it unless you go deep inside System.String and mess around with m_FirstChar. The second Substring can be avoided completely by going char-by-char with an int index.
In general, running this on bigger data could be problematic; it of course depends on your needs.
It might also be a good idea to use the StringBuilder class, which lets you operate on strings in a "more mutable" way without a performance hit, e.g. removing from the beginning without reallocating the whole string.
In your example, however, I would consider throwing out the line that takes the substring of original, and substituting code that updates an index pointing to where the next substring should start. The while condition would then just check whether that index is at the end of the string, and temp would take its substring not from 0 to 14 but from that index.
However, don't optimize code if you don't have to. I'm assuming here that you need more performance and are willing to spend some time and/or write slightly less understandable code for more efficiency.
I have some code that removes HTML tags from text. I don't care about the content (script, css, text etc), the important thing, at least for now, is that the tags themselves are stripped out.
This may be entering the theatre of micro-optimisation, however this code is among a small number of functions that will be running very often against large amounts of data, so any percentage saving may carry through to a useful saving from the overall application's perspective.
The code at present looks like this:
public static string StripTags(string html)
{
    var currentIndex = 0;
    var insideTag = false;
    var output = new char[html.Length];
    for (int i = 0; i < html.Length; i++)
    {
        var c = html[i];
        if (c == '>')
        {
            insideTag = false;
            continue;
        }
        if (!insideTag)
        {
            if (c == '<')
            {
                insideTag = true;
                continue;
            }
            output[currentIndex] = c;
            currentIndex++;
        }
    }
    return new string(output, 0, currentIndex);
}
Are there any obvious .net tricks I'm missing out on here? For info this is using .net 4.
Many thanks.
In this code you copy chars one by one. You might be able to speed it up considerably by only scanning for where the current section (inside or outside a tag) ends, and then using Array.Copy to move that whole chunk in one go; that enables lower-level optimizations (for instance, on 64-bit it can move 4 UTF-16 chars, 4 × 2 bytes, per operation). The runs of text between the tags are probably quite large, so this could add up.
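A sketch of that chunk-copying idea, using String.CopyTo to move each text run in one call. It assumes tags are well-formed and non-nested; a stray '<' with no closing '>' drops the remainder, which the char-by-char version also does:

```csharp
static class TagStripper
{
    // Copy whole runs of text between tags instead of one char at a time.
    public static string StripTagsChunked(string html)
    {
        var output = new char[html.Length];
        int outIndex = 0, i = 0;
        while (i < html.Length)
        {
            int open = html.IndexOf('<', i);
            if (open < 0) open = html.Length;          // no more tags: take the rest
            int runLength = open - i;
            if (runLength > 0)
            {
                html.CopyTo(i, output, outIndex, runLength); // bulk-copy the text run
                outIndex += runLength;
            }
            if (open == html.Length) break;
            int close = html.IndexOf('>', open);
            if (close < 0) break;                      // unterminated tag: drop the remainder
            i = close + 1;                             // resume after the tag
        }
        return new string(output, 0, outIndex);
    }
}
```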
Also, the StringBuilder documentation mentions somewhere that because it's implemented inside the framework and not in C#, it has performance you can't replicate in managed C#. I'm not sure how you would append a chunk with it, but you might look into that.
Regards Gert-Jan
You should take a look at the following library as it seems to be the best way to interact with html files in .NET: http://htmlagilitypack.codeplex.com/
Do not solve a nonexistent problem.
How many times will this method be called? Many! How many? Several thousands? Not enough to warrant optimization.
Can you just do a Parallel.For and speed it up 3-5 times depending on machine? Possibly.
Is your code dependent on lots of other code? Certainly.
Is it possible that you have this:
// Some slow code
StripTags(s); // Super fast version
// Some more slow code here
Will it matter then how fast is your StripTags?
Are you getting them from a file? Are you getting them from a network? Very rarely the bottleneck is your raw CPU power.
Let me repeat myself:
Do not solve a nonexistent problem!
You can also encode it:
string encodedString = Server.HtmlEncode(stringToEncode);
Have a look here: http://msdn.microsoft.com/en-us/library/ms525347%28v=vs.90%29.aspx
Googling for "remove html from string" yields many links that suggest Regular Expressions, all similar to the following:
public string Strip(string text)
{
    return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}
What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning a string by string[index]?
Use regex's. Personally, I don't like regex as I find them difficult to maintain, but are these likely to be faster than manually scanning the string?
Use unsafe code and pointers. This would eliminate the index range checking but I've read that unsafe code wont run in untrusted environments. What exactly are the implications of this? Does this mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so to be able to fall back to a slower but more compatible mode would be nice.
What else might I consider?
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and in a custom format for which there is no standard .NET parser. Also, performance of this code is not super critical, so this is partly just a theoretical question of curiosity.
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate char array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the string indexer takes... 3ms:
var watch = Stopwatch.StartNew();
char[] chars = new char[128 * 1024];
Random rand = new Random(); // fill with junk
for (int i = 0; i < chars.Length; i++)
{
    chars[i] = (char) ('a' + rand.Next(26));
}
string s = new string(chars);
int sum = 0;
int len = s.Length;
for (int i = 0; i < len; i++)
{
    sum += (int) s[i]; // read through the string indexer
}
watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
"Parsing" is quite an inexact term. Since you talk of 30k, it seems you might be dealing with some sort of structured string, which could be handled by creating a parser with a parser-generator tool.
A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/
This can help you create code which is efficient and correct for many textual parsing needs.
As for your points:
1. IndexOf/IndexOfAny is usually not useful for parsing that goes further than splitting a string.
2. Regex is better suited when there are no recursions or overly complex rules.
3. Unsafe code is basically a no-go unless you have really identified this as a serious problem. The JIT can omit the range checks where it can prove they are unnecessary, and for simple loops (the typical for loop) this is handled pretty well.
Below is the code I am using:
private void TestFunction()
{
    foreach (MySampleClass c in dictSampleClass)
    {
        String sText = c.VAR1 + c.VAR2 + c.VAR3;
        PerformSomeTask(sText, c.VAR4);
    }
}
My friend has suggested changing it to the following to improve performance (dictSampleClass is a dictionary; it has 10K objects):
private void TestFunction()
{
    String sText = "";
    foreach (MySampleClass c in dictSampleClass)
    {
        sText = c.VAR1 + c.VAR2 + c.VAR3;
        PerformSomeTask(sText, c.VAR4);
    }
}
My question is: does the above change improve performance? If yes, how?
Wow, that's a bigger response than expected. Most of you said "the C# compiler will take care of that". So what about the C compiler?
The compiler is intelligent enough to move variable declarations into/out of loops where required. In your example, however, you are using a string, which is immutable. By declaring it outside the loop I believe you are trying to "create once, use many", but strings are created anew each time they are modified, so you can't achieve that.
Not to sound harsh, but this is a premature optimisation, and likely a failing one, due to string immutability.
If the collection is large, go down the route of appending many strings into a StringBuilder - separated by a delimiter. Then split the string on this delimiter and iterate the array to add them, instead of concatenating them and adding them in one loop.
StringBuilder sb = new StringBuilder();
foreach (MySampleClass c in dictSampleClass)
{
    sb.Append(c.VAR1);
    sb.Append(c.VAR2);
    sb.Append(c.VAR3);
    sb.Append("|");
}
string[] results = sb.ToString().Split('|');
for (int i = 0; i < dictSampleClass.Count; i++)
{
    string s = results[i];
    MySampleClass c = dictSampleClass[i];
    PerformSomeTask(s, c.VAR4);
}
I can see no real benefit to using this code - most likely it doesn't even work!
UPDATE: in light of how fast string creation from multiple strings actually is, if PerformSomeTask is your bottleneck, try breaking the iteration into multiple threads - it won't improve the efficiency of the code, but you'll be able to utilise multiple cores.
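The multi-threading suggestion could be sketched with Parallel.ForEach, assuming PerformSomeTask is thread-safe; MySampleClass and the delegate here stand in for the asker's types:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class MySampleClass
{
    public string VAR1, VAR2, VAR3, VAR4;
}

static class ParallelDemo
{
    // Concatenation itself is cheap, so parallelize the (presumably
    // expensive) per-item work instead. performSomeTask must be thread-safe.
    public static void ProcessAll(IEnumerable<MySampleClass> items,
                                  Action<string, string> performSomeTask)
    {
        Parallel.ForEach(items, c =>
        {
            performSomeTask(c.VAR1 + c.VAR2 + c.VAR3, c.VAR4);
        });
    }
}
```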
Run the two functions through Reflector and have a look at the generated IL. It might give some insights, but at best the performance improvement would be minimal.
I'd go for this instead:
private void TestFunction()
{
    foreach (MySampleClass c in dictSampleClass)
    {
        PerformSomeTask(c.VAR1 + c.VAR2 + c.VAR3, c.VAR4);
    }
}
There's still probably no real performance benefit, but it removes creating a variable that you don't really need.
No, it does not. Things like these are handled by the C# compiler, which is very intelligent, and you basically do not need to care about these details.
Always use code that promotes readability.
You can check this by disassembling the program.
It will not improve performance. But on another note: assumed improvements are error-prone. You have to measure your application and only optimize the slow parts if needed. I don't believe that loop is your application's bottleneck.