I'm developing an application in C# to replace variables across strings. I need suggestions on how to do that very efficiently.
// Replaces each character in `source` that appears in `frm` with the
// character at the same position in `to` (like PHP's strtr).
private string MyStrTr(string source, string frm, string to)
{
    char[] input = source.ToCharArray();
    // One flag per character so an already-replaced position is never replaced again.
    bool[] replaced = new bool[input.Length]; // elements default to false

    for (int i = 0; i < frm.Length; i++)
    {
        for (int j = 0; j < input.Length; j++)
        {
            if (!replaced[j] && input[j] == frm[i])
            {
                input[j] = to[i];
                replaced[j] = true;
            }
        }
    }
    return new string(input);
}
The code above works fine, but each variable requires a full pass over the string, so the number of traversals grows with the variable count.
Exact requirement:
parent.name = am;
parent.number = good;
I {parent.name} a {parent.number} boy.
The output should be: I am a good boy.
Assume the source string will be huge. For example, if I have 5 different variables, I need to traverse the full string 5 times. I need suggestions on how to process all the variables in parallel during a single traversal.
I think you're suffering from premature optimization. Write the simplest thing first and see if it works for you. If you don't suffer a performance problem, then you're done. Don't waste time trying to make it faster when you don't know how fast it is, or if it's the cause of your performance problem.
By the way, Facebook's Terms of Service page (the entire HTML, including a lot of JavaScript) is only 164 kilobytes. That's not especially large.
String.Replace should work quite well, even if you have multiple strings to replace. That is, you can write:
string result = source.Replace("{parent.name}", "am");
result = result.Replace("{parent.number}", "good");
// more replacements here
return result;
That will exercise the garbage collector a little bit, but it shouldn't be a problem unless you have a truly massive page or a whole mess of replacements.
You can potentially save yourself some garbage collection by converting the string to a StringBuilder, and calling StringBuilder.Replace multiple times. I honestly don't know, though, whether that will have any appreciable effect. I don't know how StringBuilder.Replace is implemented.
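A minimal sketch of that StringBuilder variant, using the same replacements as above:
var sb = new StringBuilder(source);
sb.Replace("{parent.name}", "am");
sb.Replace("{parent.number}", "good");
// more replacements here
return sb.ToString();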
There is a way to make this faster, by writing code that will parse the string and do all the replacements in a single pass. It's a lot of code, though. You have to build a state machine from the multiple search strings and go through the source text one character at a time. It's doable, but it's a difficult enough task that you probably don't want to do it unless the simple method flat doesn't work quickly enough.
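Short of hand-rolling that state machine, one common way to get a single left-to-right pass is Regex.Replace with a MatchEvaluator over an alternation of all the placeholders. A sketch (ReplaceAll and its map parameter are names of my own choosing):
// requires System.Collections.Generic, System.Linq, System.Text.RegularExpressions
static string ReplaceAll(string source, IDictionary<string, string> map)
{
    // One pattern that matches any placeholder key, escaped for regex safety.
    string pattern = string.Join("|", map.Keys.Select(Regex.Escape));
    // Regex.Replace scans the source once; each match is looked up in the map.
    return Regex.Replace(source, pattern, m => map[m.Value]);
}
Usage: ReplaceAll(source, new Dictionary<string, string> { ["{parent.name}"] = "am", ["{parent.number}"] = "good" });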
I have a list of Equity Analytics (stock) objects and I'm doing a calculation for daily returns. I was thinking there must be a pairwise solution to do this:
for (int i = 0; i < sdata.Count; i++)
{
    sdata[i].DailyReturn = (i > 0)
        ? (sdata[i - 1].AdjClose / sdata[i].AdjClose) - 1
        : 0.0;
}
LINQ stands for "Language-Integrated Query".
LINQ should not be used, and mostly cannot be used, for assignments: LINQ doesn't mutate the given IEnumerable, it creates a new one.
As suggested in a comment below, there is a way to create a new IEnumerable with LINQ (see the sketch below), but it will be slower and a lot less readable.
Though LINQ is nice, an important thing is to know when not to use it.
Just use the good old for loop.
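For completeness, that projection-based version might look like this (a sketch only; it builds a new sequence of returns instead of assigning in place):
var dailyReturns = sdata
    .Select((s, i) => i > 0 ? (sdata[i - 1].AdjClose / s.AdjClose) - 1 : 0.0)
    .ToList();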
I'm new to LINQ; I started using it because of Stack Overflow, so I tried to play with your question. I know this may attract downvotes, but I tried it and it does what you want, as long as there is at least one element in the list:
sdata[0].DailyReturn = 0.0;
// Note: IndexOf rescans the list for every element, which makes this O(n^2).
sdata.GetRange(1, sdata.Count - 1)
     .ForEach(c => c.DailyReturn = (sdata[sdata.IndexOf(c) - 1].AdjClose / c.AdjClose) - 1);
But I must say that avoiding for loops isn't best practice. From my point of view, LINQ should be used where convenient, not everywhere. Good old loops are sometimes easier to maintain.
I'm refactoring my app to make it faster. I was looking for tips on doing so, and found this statement:
"ForEach can simplify the code in a For loop but it is a heavy object and is slower than a loop written using For."
Is that true? If it was true when it was written, is it still true today, or has foreach itself been refactored to improve performance?
I have the same question about this tip from the same source:
"Where possible use arrays instead of collections. Arrays are normally more efficient especially for value types. Also, initialize collections to their required size when possible."
UPDATE
I was looking for performance tips because I had a database operation that was taking several seconds.
I have found that the "using" statement is a time hog.
I completely solved my performance problem by reversing the for loop and the "using" (of course, refactoring was necessary for this to work).
The slower-than-molasses code was:
for (int i = 1; i <= googlePlex; i++) {
    . . .
    using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
        . . .
        InsertRecord();
        . . .
The faster-than-a-speeding-bullet code is:
using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
    for (int i = 1; i <= googlePlex; i++) {
        . . .
        InsertRecord();
        . . .
Short answer:
Code that is hard to read eventually results in software that behaves and performs poorly.
Long answer:
There was a culture of micro-optimization suggestions in early .NET. Partly this was because a few of Microsoft's internal tools (such as FxCop) had gained popularity with the general public. Partly it was because C# had, and has, aspirations to succeed assembly, C, and C++ in offering unhindered access to raw hardware performance in the few hottest code paths of a performance-critical application. This does require more knowledge and discipline than a typical application, of course. The consequences of performance-related decisions in framework code and in application code are also quite different.
The net impact of this on C# coding culture has been positive, of course; but it would be ridiculous to stop using foreach, or is, or "", just to save a couple of CIL instructions that a recent jitter could probably optimize away completely if it wanted to.
There are probably very many loops in your app, and at most one of them is likely to be the current performance bottleneck. "Optimizing" a non-bottleneck for performance at the expense of readability is a very bad deal.
It's true in many cases that foreach is slower than an equivalent for. It's also true that
for (int i = 0; i < myCollection.Length; i++) // Compiler must re-evaluate getter because value may have changed
is slower than
int max = myCollection.Length;
for (int i = 0; i < max; i++)
But that probably will not matter at all. For a very detailed discussion see Performance difference for control structures 'for' and 'foreach' in C#
Have you done any profiling to determine the hot spots of your application? I would be astonished if the loop management overhead is where you should be focusing your attention.
You should try profiling your code with Red Gate ANTS or something of that ilk - you will be surprised.
I found that in an application I was writing it was the parameter sniffing in SQL that took up 25% of the processing time. After writing a command cache which sniffed the params at the start of the application, there was a big speed boost.
Unless you are doing a large number of nested for loops, I don't think you will see much of a performance benefit from changing your loops. I can't imagine anything but a real-time application such as a game, or a heavy number-crunching or scientific application, needing that kind of optimisation.
Yes. The classic for is a bit faster than a foreach, since the iteration is index-based instead of accessing the elements of the collection through an enumerator:
static void Main()
{
    const int m = 100000000;

    // just to create an array
    int[] array = new int[m];
    for (int x = 0; x < array.Length; x++)
    {
        array[x] = x;
    }

    var s1 = Stopwatch.StartNew();
    var upperBound = array.Length;
    for (int i = 0; i < upperBound; i++)
    {
    }
    s1.Stop();

    GC.Collect();

    var s2 = Stopwatch.StartNew();
    foreach (var item in array)
    {
    }
    s2.Stop();

    Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / m).ToString("0.00 ns"));
    Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / m).ToString("0.00 ns"));
    Console.Read();

    // 2.49 ns
    // 4.68 ns
    // In Release mode:
    // 0.39 ns
    // 1.05 ns
}
What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning a string by string[index]?
Use regexes. Personally, I don't like regexes as I find them difficult to maintain, but are they likely to be faster than manually scanning the string?
Use unsafe code and pointers. This would eliminate the index range checking, but I've read that unsafe code won't run in untrusted environments. What exactly are the implications? Does the whole assembly fail to load/run, or does only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so being able to fall back to a slower but more compatible mode would be nice.
What else might I consider?
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and are in a custom format for which there is no standard .NET parser. Also, the performance of this code is not super critical, so this is partly just a theoretical question of curiosity.
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the indexer takes... 3ms:
var watch = Stopwatch.StartNew();

// Fill a 128k buffer with random lower-case junk (including the time to handle Random).
char[] chars = new char[128 * 1024];
Random rand = new Random();
for (int i = 0; i < chars.Length; i++)
{
    chars[i] = (char)('a' + rand.Next(26));
}

string s = new string(chars);

// Sum every code-point through the string indexer.
int sum = 0;
int len = s.Length;
for (int i = 0; i < len; i++)
{
    sum += s[i];
}

watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
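A minimal sketch of that reader approach, summing code-points in fixed-size chunks instead of loading the whole file (the buffer size here is an arbitrary choice):
static long SumCodePoints(string path)
{
    long sum = 0;
    char[] buffer = new char[8 * 1024];
    using (var reader = new StreamReader(path))
    {
        int read;
        // Read returns the number of chars copied into the buffer; 0 means end of file.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                sum += buffer[i];
        }
    }
    return sum;
}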
"Parsing" is quite an inexact term. Since you talks of 30k, it seems that you might be dealing with some sort of structured string which can be covered by creating a parser using a parser generator tool.
A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/
This can help you create code which is efficient and correct for many textual parsing needs.
As for your points:
IndexOf/IndexOfAny is usually not useful for parsing that goes further than splitting a string.
Regexes are better suited if there are no recursions or overly complex rules.
Unsafe code is basically a no-go if you haven't really identified this as a serious problem. The JIT can take care of doing the range checks only when needed; indeed, for simple loops (the typical for loop) this is handled pretty well.
I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.
Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.
The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.
Then, after all that is said and done, the final two lists are converted to arrays using ToArray().
Ok, bear with me, I'm finally getting to my question.
So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although, I'm sure with some working I could also put it into the single loop).
But then I read the article "What your computer does while you wait" by Gustav Duarte and started thinking about memory management and what the data is doing while it's being accessed in the for loop, where two lists are accessed simultaneously.
So my question is: what is the best approach for something like this? Try to condense the for loops so everything happens in as few loops as possible, causing multiple data accesses across the different lists? Or allow the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can be potentially large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3)??). But from what I understood in that article and from my CS classes, having to fetch data can be expensive too.
Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?
Thank you
The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.
If each of your loops is O(n), then you've still only got an O(n) operation.
Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
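For illustration, the LINQ version of the loops described might look something like this (a sketch only; we haven't seen the code, so the member names and the final date conversion are hypothetical stand-ins):
// Each pipeline replaces one of the foreach loops plus the final conversion.
DateTime[] dates = items
    .Select(o => (DateTime)o.RawDate)  // first loop: unbox the DateTime
    .Select(d => d.ToLocalTime())      // final loop: the extra date conversion
    .ToArray();
long[] values = items
    .Select(o => (long)o.RawValue)     // second loop: unbox the long
    .ToArray();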
For reference, the article is "What your computer does while you wait" by Gustav Duarte. There's also a guide to big-O notation.
It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.
Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.
Actually, these length tests could easily make the two loops quicker than a single loop. You might also get better memory fetch performance with 2 loops - i.e. you are looking at contiguous memory - i.e. A[0],A[1],A[2]... B[0],B[1],B[2]..., rather than A[0],B[0],A[1],B[1],A[2],B[2]...
So in every way, I'd go with 2 separate loops ;-p
Am I understanding you correctly in this?
You have these loops:
for (...) {
    // Do A
}
for (...) {
    // Do B
}
for (...) {
    // Do C
}
And you converted it into
for (...) {
    // Do A
    // Do B
}
for (...) {
    // Do C
}
and you're wondering which is faster?
If not, some pseudocode would be nice, so we could see what you meant. :)
Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.
If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.
My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
Aside from cache thrashing on large functions, there may be benefits on tiny functions as well. This applies on any auto-vectorizing compiler (not sure if Java JIT will do this yet, but you can count on it eventually).
Suppose this is your code:
// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }

Date timestamps[N];
Date mornings[N];

// ... then this loop can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
    mornings[i] = morningOf(timestamps[i]);

// ... and this one will just run like normal
for (int i = 0; i != N; ++i)
    doOtherCrap(mornings[i]);
For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.
This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.
Working on one piece of data with two functions can sometimes mean the code that acts on that data doesn't fit in the processor's low-level caches:
for(i=0, i<10, i++ ) {
myObject object = array[i];
myObject.functionreallybig1(); // pushes functionreallybig2 out of cache
myObject.functionreallybig2(); // pushes functionreallybig1 out of cache
}
vs
for(i=0, i<10, i++ ) {
myObject object = array[i];
myObject.functionreallybig1(); // this stays in the cache next time through loop
}
for(i=0, i<10, i++ ) {
myObject object = array[i];
myObject.functionreallybig2(); // this stays in the cache next time through loop
}
But that was probably a mistake (usually this kind of trick is commented).
When data is cyclically loaded and evicted like this, it's called cache thrashing, by the way.
This is a separate issue from the data these functions work on, as the processor typically caches that separately.
I apologize for not responding sooner or providing any kind of code. I got sidetracked on my project and had to work on something else.
To answer anyone still monitoring this question:
Yes, like jalf said, the function is something like:
PrepareData(vectorA, vectorB, xArray, yArray):
    listA
    listB
    foreach (value in vectorA)
        convert value, insert into listA
    foreach (value in vectorB)
        convert value, insert into listB
    listC
    listD
    for (int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
I changed it to:
PrepareData(vectorA, vectorB, ref xArray, ref yArray):
    listA
    listB
    for (int i = 0; i < vectorA.count && i < vectorB.count; i++)
        convert values, insert into listA
        convert values, insert into listB
    listC
    listD
    for (int i = 0; i < listB.count; i++)
        listC[i] = listB[i] converted to something
        listD[i] = listA[i]
    xArray = listC.ToArray()
    yArray = listD.ToArray()
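In concrete C#, that fused first loop might look something like this (a sketch; the element types and converter calls are hypothetical stand-ins for the real conversions):
// Sketch only: vectorA/vectorB are the boxed-object vectors described above.
int n = Math.Min(vectorA.Count, vectorB.Count);
var listA = new List<DateTime>(n); // pre-sized to avoid regrowth
var listB = new List<long>(n);

// One pass over both vectors instead of one foreach per vector.
for (int i = 0; i < n; i++)
{
    listA.Add(Convert.ToDateTime(vectorA[i])); // hypothetical conversion
    listB.Add(Convert.ToInt64(vectorB[i]));    // hypothetical conversion
}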
Keep in mind that the vectors can potentially hold a large number of items. I figured the second version would be better, so the program wouldn't have to loop n times 2 or 3 different times. But then I started to wonder about the effects of memory fetching, prefetching, and so on.
So, I hope this helps to clear up the question, although a good number of you have provided excellent answers.
Thank you everyone for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I'm going to put the code back the way it was; I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will provide more references so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).
Thank you again.