Why does my object take a long time to be created? - c#

I'm writing code that scans large sections of text and performs some basic statistics on it, such as number of upper and lower case characters, punctuation characters etc.
Originally my code looked like this:
foreach (var character in stringToCount)
{
if (char.IsControl(character))
{
controlCount++;
}
if (char.IsDigit(character))
{
digitCount++;
}
if (char.IsLetter(character))
{
letterCount++;
} //etc.
}
And then from there I was creating a new object like this, which simply reads the local variables and passes them to the constructor:
var result = new CharacterCountResult(controlCount, highSurrogateCount, lowSurrogateCount, whiteSpaceCount,
symbolCount, punctuationCount, separatorCount, letterCount, digitCount, numberCount, letterAndDigitCount,
lowerCaseCount, upperCaseCount, tempDictionary);
However a user over on Code Review Stack Exchange pointed out that I can just do the following. Great, I've saved myself a load of code which is good.
var result = new CharacterCountResult(stringToCount.Count(char.IsControl),
stringToCount.Count(char.IsHighSurrogate), stringToCount.Count(char.IsLowSurrogate),
stringToCount.Count(char.IsWhiteSpace), stringToCount.Count(char.IsSymbol),
stringToCount.Count(char.IsPunctuation), stringToCount.Count(char.IsSeparator),
stringToCount.Count(char.IsLetter), stringToCount.Count(char.IsDigit),
stringToCount.Count(char.IsNumber), stringToCount.Count(char.IsLetterOrDigit),
stringToCount.Count(char.IsLower), stringToCount.Count(char.IsUpper), tempDictionary);
However creating the object the second way takes approximately (on my machine) an extra ~200ms.
How can this be? While it might not seem a significant amount of extra time, it soon adds up when I leave it running to process text.
What should I be doing differently?

You are using method groups (syntactic sugar hiding a lambda or delegate) and iterating over the characters many times, whereas you could get it done with one pass (as in your original code).
I remember your previous question, and I recall seeing the recommendation to use the method group string.Count(char.IsLetterOrDigit) and thinking "yeah, that looks pretty, but it won't perform well", so it was amusing to see you find exactly that.
If performance is important, I would drop the delegates entirely and do it the traditional way: one giant loop, a single pass, no method groups and no repeated iterations. You can tune it further by organizing the logic so that mutually exclusive cases short-circuit one another. For example, if you know a character is whitespace, don't check for digit or letter at all; and if you know it is a letter or digit, nest the digit and letter checks inside that condition.
Something like:
foreach (var ch in stringToCount)
{
    if (char.IsWhiteSpace(ch))
    {
        whiteSpace++;
        // ...
    }
    else
    {
        if (char.IsLetterOrDigit(ch))
        {
            letterOrDigit++;
            if (char.IsDigit(ch)) digit++;
            if (char.IsLetter(ch)) letter++;
        }
    }
}
If you REALLY want to micro-optimize, write a program to pre-calculate all of the options and emit a huge switch statement which does table lookups.
switch (ch)
{
    case 'A':
        isLetter++;
        isUpper++;
        isLetterOrDigit++;
        break;
    case 'a':
        isLetter++;
        isLower++;
        isLetterOrDigit++;
        break;
    case '!':
        isPunctuation++;
        break;
    // ... and so on for every character of interest
}
Now if you want to get REALLY crazy, organize the switch statement according to the real-life frequency of occurrence, putting the most common letters at the top of the "tree", and so forth. Of course, if you care that much about speed, it might be a job for plain C.
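An alternative to emitting a giant switch, in the same precompute spirit (a sketch, not the original suggestion): build a flags table once for all 65,536 char values, then classify each character with a single array lookup.
[Flags]
enum CharFlags : byte { None = 0, Letter = 1, Digit = 2, Upper = 4, Lower = 8 }

static readonly CharFlags[] Table = BuildTable();

static CharFlags[] BuildTable()
{
    // precompute the classification of every possible char exactly once
    var table = new CharFlags[char.MaxValue + 1];
    for (int c = 0; c <= char.MaxValue; c++)
    {
        char ch = (char)c;
        if (char.IsLetter(ch)) table[c] |= CharFlags.Letter;
        if (char.IsDigit(ch)) table[c] |= CharFlags.Digit;
        if (char.IsUpper(ch)) table[c] |= CharFlags.Upper;
        if (char.IsLower(ch)) table[c] |= CharFlags.Lower;
    }
    return table;
}

// usage: one pass over the text, one lookup per character
foreach (var ch in stringToCount)
{
    if ((Table[ch] & CharFlags.Letter) != 0) letterCount++;
    if ((Table[ch] & CharFlags.Digit) != 0) digitCount++;
}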
But I've wandered a bit far afield from your original question. :)

The old way, you walked through the text once, increasing all of your counters as you went. The new way, you walk through the text 13 times (once for each call to stringToCount.Count()) and update only one counter per pass.
However, this kind of problem is the perfect situation for Parallel.ForEach. You can walk through the text with multiple threads (being sure your increments are thread safe) and get your totals faster.
Parallel.ForEach(stringToCount, character =>
{
if (char.IsControl(character))
{
//Interlocked.Increment gives you a thread safe ++
Interlocked.Increment(ref controlCount);
}
if (char.IsDigit(character))
{
Interlocked.Increment(ref digitCount);
}
if (char.IsLetter(character))
{
Interlocked.Increment(ref letterCount);
} //etc.
});
var result = new CharacterCountResult(controlCount, highSurrogateCount, lowSurrogateCount, whiteSpaceCount,
symbolCount, punctuationCount, separatorCount, letterCount, digitCount, numberCount, letterAndDigitCount,
lowerCaseCount, upperCaseCount, tempDictionary);
It still walks through the text once, but many workers will be walking through various parts of the text at the same time.
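A variation on the same idea, sketched under the assumption that the counters are declared as long fields: Interlocked.Increment on every character makes the workers contend on the same shared memory, whereas the localInit/localFinally overload of Parallel.ForEach lets each worker count into its own plain locals and merge only once per thread.
long controlCount = 0, digitCount = 0, letterCount = 0;

Parallel.ForEach(
    stringToCount,
    // localInit: each worker starts with its own zeroed counters
    () => (control: 0L, digit: 0L, letter: 0L),
    // body: bump the worker-local counters; no synchronization needed here
    (character, state, local) =>
    {
        if (char.IsControl(character)) local.control++;
        if (char.IsDigit(character)) local.digit++;
        if (char.IsLetter(character)) local.letter++;
        return local;
    },
    // localFinally: merge each worker's totals exactly once
    local =>
    {
        Interlocked.Add(ref controlCount, local.control);
        Interlocked.Add(ref digitCount, local.digit);
        Interlocked.Add(ref letterCount, local.letter);
    });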


C# Fastest way to determine if a string contains all elements of a list

Quick background: I have a string of words, which I separate out into a List (I've tried a HashSet; it makes no difference, and you lose the ordered nature of a List).
I then manipulate the original words in many dull ways and create thousands of "new strings" - all of which are built up in a StringBuilder and extracted with .ToString().
At the end of the manipulation, I want to QC those new strings - and be sure that every word that was in the original set - is still somewhere in those new strings and I have not accidentally lost a word.
That original string, can run to hundreds of individual words.
Short Example:
List<string> uniqueWords = new List<string> { "two", "three", "weather sunday" };
string final = "two and tomorrow\n\rtwo or wednesday\n\rtwo with thursday\n\rtwo without friday\n\rthree gone tomorrow\n\rthree weather saturday\n\rthree timely sunday";
The output string can run to tens of millions of characters, millions of words, and 200,000+ rows of data (when split). You may notice that some entries are actually two words separated by a space, so I cannot simply split the text on spaces and compare individual words against the originals. I need to confirm each entry appears exactly as it did originally: having "weather" somewhere and "sunday" somewhere is not the same as having "weather sunday", for my purposes.
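The short example above already shows the trap: both individual words occur in final, but the phrase does not, so the QC has to use whole-substring tests.
// "weather" and "sunday" each appear somewhere in final...
bool wordsAppear = final.Contains("weather") && final.Contains("sunday"); // true
// ...but the original phrase never appears contiguously, so it must be flagged as lost
bool phraseAppears = final.Contains("weather sunday"); // false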
The code I have tried so far and benchmarked:
First attempt:
var allWords = uniqueWords.Where(substring => final.Contains(substring, StringComparison.CurrentCultureIgnoreCase)).ToList();
Second Attempt:
List<string> removeableList = new(uniqueWords);
foreach (var item in uniqueWords)
{
if (removeableList.Count == 0)
{
break;
}
if (final.Contains(item))
{
removeableList.Remove(item);
}
}
Third Attempt:
List<string> removeableList = new(uniqueWords);
for (int i = uniqueWords.Count - 1; i >= 0; i--)
{
if (removeableList.Count == 0)
{
break;
}
if (final.Contains(uniqueWords[i]))
{
removeableList.Remove(uniqueWords[i]);
}
}
The results (benchmark screenshots not reproduced here) are repeatable, though the First Attempt tends to fluctuate quite a lot, while the Second and Third Attempts stay at about the same level; the Third Attempt does seem to do consistently better than the Second.
Are there any options that I am missing?
I have tried it using a Regex Matches collection into a HashSet - oh that was bad, 4 times worse than the First Attempt.
If there is a way to improve the performance on this task I would love to find it.
Your attempt #1 uses CurrentCultureIgnoreCase which will be slow. But even after removing that, you are adding to the list, rather than removing, and therefore the list might need to be resized.
You are also measuring two different things: option #1 is getting the list of words which are in final, the others get the list of words which are not.
Further options include:
Use List.RemoveAll
List<string> remainingWords = new(uniqueWords);
remainingWords.RemoveAll(final.Contains); // use delegate directly, without anonymous delegate
Use a pre-sized list and use Linq
List<string> remainingWords = new(uniqueWords.Count);
remainingWords.AddRange(uniqueWords.Where(s => !final.Contains(s)));
Each of these two options can be flipped depending on what result you are trying to achieve, as mentioned.
List<string> words = new(uniqueWords);
words.RemoveAll(s => !final.Contains(s));
List<string> words = new(uniqueWords.Count);
words.AddRange(uniqueWords.Where(final.Contains)); // use delegate directly, without anonymous delegate
@Charlieface, thanks for that - I tried those, and I think you have a point about adding to a list, as that appears much slower. For me it doesn't matter whether it is adding or removing; the result is a True/False return, depending on whether the list ends up empty or the size of the original list.
Sixth Attempt:
List<string> removeableList = new(uniqueWords.Count);
removeableList.AddRange(uniqueWords.Where(s => !parsedTermsComplete!.Contains(s)));
Seventh Attempt:
List<string> removeableList = new(uniqueWords);
removeableList.RemoveAll(parsedTermsComplete!.Contains);
Results in comparison to the Third Attempt (generally the fastest; screenshots omitted): the adding does appear slower, and memory is a little higher for the RemoveAll, but the timing is consistent - bearing in mind it fluctuates depending on what Windows decides to do at any given moment...
Here is an interesting implementation of the AhoCorasickTree method, which I saw mentioned somewhere else on this site.
My knowledge of this is extremely limited, so it may not be a good implementation at all - I am only saying that it works. It comes from a NuGet package, but I am unsure of SO's policy on NuGet package links, so I won't link it for now. In testing, creating an array was faster than creating a list.
Eighth Attempt:
var wordArray = uniqueWords.ToArray();
int i = uniqueWords.Count - 1;
foreach (var item in wordArray)
{
var keyWords = new AhoCorasickTree(new[] { item });
if (keyWords.Contains(parsedTermsComplete))
{
uniqueWords.RemoveAt(i);
}
i--;
}
I noticed in testing that creating a "removableList" was actually slower than creating a removableArray (found this out implementing the above Aho run). I updated the Third Attempt to incorporate this:
var removeableArray = uniqueWords.ToArray();
for (int i = removeableArray.Length - 1; i >= 0; i--)
{
if (!uniqueWords.Any())
{
break;
}
if (parsedTermsComplete!.Contains(removeableArray[i]))
{
uniqueWords.RemoveAt(i);
}
}
The benchmarks come out like this (screenshots omitted): the Third Attempt is updated to an array, the Seventh Attempt is the AhoCorasick implementation on a list, and the Eighth Attempt is the AhoCorasick implementation on an array.
The ToArray does seem faster than a List, which is good to know.
My only issue with AhoCorasick is that in practice - in a WASM application - it is actually much slower, so it's not a good option for me. I include it here because it does seem much faster in benchmarks (it may be using multiple threads, where WASM is limited to one) and doesn't appear to allocate any memory, so it might be useful to someone. Interestingly, the Third Attempt also appears to allocate no memory with the array implementation, whereas with a list it did.
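Since in the end only a true/false answer is needed, one more low-risk variation worth benchmarking is a sketch like this (assuming parsedTermsComplete is the output string): ordinal substring search avoids culture-sensitive comparison costs and any list mutation entirely.
// true only if every original word/phrase still appears verbatim in the output
bool allWordsPresent = uniqueWords.TrueForAll(
    w => parsedTermsComplete!.IndexOf(w, StringComparison.Ordinal) >= 0);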

Redundant/Better Performance Code VS Optimized/Less Performance Code

In my case, I'm using C#, but the concept of the question would apply to Java as well. Hopefully the answer would be generic enough to cover both languages. Otherwise it's better to split the question into two.
I've always wondered which one is the better practice.
Does the compiler take care of enhancing the 'second' code so its performance would be as good as the 'first' code?
Could it be worked around to get a 'better performance' and 'optimized' code at the same time?
Redundant/Better Performance Code:
string name = GetName(); // returned string could be empty
List<string> myListOfStrings = GetListOfStrings();
if (string.IsNullOrWhiteSpace(name))
{
foreach(string s in myListOfStrings)
Console.WriteLine(s);
}
else
{
foreach(string s in myListOfStrings)
Console.WriteLine(s + " (Name is: " + name);
}
Optimized/Less Performance Code:
string name = GetName(); // returned string could be empty
List<string> myListOfStrings = GetListOfStrings();
foreach(string s in myListOfStrings)
Console.WriteLine(string.IsNullOrWhiteSpace(name) ? s : s + " (Name is: " + name + ")");
Obviously the execution time of the 'first' code is less because it evaluates the condition string.IsNullOrWhiteSpace(name) only once, before the loop, whereas the 'second' code (which is nicer) evaluates it on every iteration.
Please consider a long loop execution time not a short one because I know that when it is short, the performance won't differ.
Does the compiler take care of enhancing the 'second' code so its performance would be as good as the 'first' code?
No, it cannot.
It doesn't know that the boolean expression will not change between iterations of the loop. It's possible for the code to not return the same value each time, so it is forced to perform the check in each iteration.
It's also possible that the boolean expression could have side effects. In this case it doesn't, but there's no way for the compiler to know that. It's important that such side effects would be performed in order to meet the specs, so it needs to execute the check in each iteration.
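To make the side-effect point concrete, here is a small illustrative sketch (with hypothetical names): hoisting the check out of the loop would change how many times the state mutates, so the compiler must call it every iteration.
int calls = 0;
bool Check() { calls++; return calls < 3; } // observable side effect

foreach (var item in collection)
{
    if (Check()) // must run once per iteration to preserve the side effect
        doStuff(item);
    else
        doOtherStuff(item);
}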
So, the next question you need to ask is, in a case such as this, is it important to perform the optimization that you've mentioned? In any situation I can imagine for the exact code you showed, probably not. The check is simply going to be so fast that it's almost certainly not going to be a bottleneck. If there are performance problems there are almost certainly bigger fish.
That said, with only a few changes to the example it can be made to matter. If the boolean expression itself is computationally expensive (i.e. it is the result of a database call, a web service call, some expensive CPU computation, etc.) then it could be a performance optimization that matters. Another case to consider is what would happen if the boolean expression had side effects. What if it was a MoveNext call on an IEnumerator? If it was important that it only be executed exactly once because you don't want the side effects to happen N times then that makes this a very important issue.
There are several possible solutions in such a case.
The easiest is most likely to just compute the boolean expression once and then store it in a variable:
bool someValue = ComputeComplexBooleanValue();
foreach(var item in collection)
{
if(someValue)
doStuff(item);
else
doOtherStuff(item);
}
If you want the boolean expression evaluated at most once (i.e. to avoid calling it even once when the collection is empty), you can use Lazy to defer the computation while still ensuring it runs no more than one time:
var someValue = new Lazy<bool>(() => ComputeComplexBooleanValue());
foreach (var item in collection)
{
if (someValue.Value)
doStuff(item);
else
doOtherStuff(item);
}
You should always go the way that is easier to understand and maintain first. This means reducing duplicate code to the absolute minimum (DRY). In addition, this kind of micro-optimization is not that important for many systems. Also note that shorter code is not always better.
I think I would go with something like this:
string name = GetName(); // returned string could be empty
bool nameIsEmpty = string.IsNullOrWhiteSpace(name);
foreach (string s in GetListOfStrings()) {
string messageAddition = "";
if (!nameIsEmpty) {
messageAddition = " (Name is: " + name + ")";
}
Console.WriteLine(s + messageAddition);
// more code which uses the computed value...
// otherwise the condition can be moved out of the loop
}
I find an extra if statement easier to read than the ?: operator within a method call but this might be a personal taste.
If you want to improve performance later, you should profile your application and start optimizing the slowest sections of code first. Maybe your GetListOfStrings() method is so slow that the performance of the other code is totally irrelevant. If you have measured that duplicating the loop improves performance by a significant amount, you can think about changing it.
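Along the same lines, the appended text itself never changes inside the loop, so the concatenation can be hoisted too; a minimal sketch of that variant:
string name = GetName(); // returned string could be empty
string suffix = string.IsNullOrWhiteSpace(name) ? "" : " (Name is: " + name + ")";
foreach (string s in GetListOfStrings())
{
    Console.WriteLine(s + suffix); // the condition (and suffix) were computed only once
}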

C# *Strange* problem with StopWatch and a foreach loop

I have this code:
var options = GetOptions(From, Value, SelectedValue);
var stopWatch = System.Diagnostics.Stopwatch.StartNew();
foreach (Option option in options)
{
stringBuilder.Append("<option");
stringBuilder.Append(" value=\"");
stringBuilder.Append(option.Value);
stringBuilder.Append("\"");
if (option.Selected)
stringBuilder.Append(" selected=\"selected\"");
stringBuilder.Append('>');
stringBuilder.Append(option.Text);
stringBuilder.Append("</option>");
}
HttpContext.Current.Response.Write("<b>" + stopWatch.Elapsed.ToString() + "</b><br>");
It is writing:
00:00:00.0004255 in the first try (not in debug)
00:00:00.0004260 in the second try and
00:00:00.0004281 in the third try.
Now, if I change the code so the measure will be inside the foreach loop:
var options = GetOptions(From, Value, SelectedValue);
foreach (Option option in options)
{
var stopWatch = System.Diagnostics.Stopwatch.StartNew();
stringBuilder.Append("<option");
stringBuilder.Append(" value=\"");
stringBuilder.Append(option.Value);
stringBuilder.Append("\"");
if (option.Selected)
stringBuilder.Append(" selected=\"selected\"");
stringBuilder.Append('>');
stringBuilder.Append(option.Text);
stringBuilder.Append("</option>");
HttpContext.Current.Response.Write("<b>" + stopWatch.Elapsed.ToString() + "</b><br>");
}
...I get
[00:00:00.0000014, 00:00:00.0000011] = 00:00:00.0000025 in the first try (not in debug),
[00:00:00.0000016, 00:00:00.0000011] = 00:00:00.0000027 in the second try and
[00:00:00.0000013, 00:00:00.0000011] = 00:00:00.0000024 in the third try.
?!
It makes no sense given the first results... I've heard that the foreach loop is slow, but I never imagined it was this slow... Is that what's happening?
options has 2 options.
Here's the option class, if it is needed:
public class Option
{
public Option(string text, string value, bool selected)
{
Text = text;
Value = value;
Selected = selected;
}
public string Text
{
get;
set;
}
public string Value
{
get;
set;
}
public bool Selected
{
get;
set;
}
}
Thanks.
The foreach loop itself has nothing to do with the time difference.
What is the GetOptions method returning? My guess is that it's not returning a collection of options, but rather an enumerator that is capable of getting the options. That means that actually fetching the options are not done until you start to iterate them.
In the first case you are starting the clock before starting iterating the options, which means that the time for fetching the options is included in the time.
In the second case you are starting the clock after starting iterating the options, which means that the time for fetching the options is not included in the time.
So, the time difference that you see is not due to the foreach loop itself; it's the time it takes to fetch the options.
You can make sure that the options are fetched immediately by reading them into a collection:
var options = GetOptions(From, Value, SelectedValue).ToList();
Now measure the performance, and you will see very little difference.
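For illustration, this is the kind of GetOptions implementation (purely hypothetical; the question doesn't show it) that produces exactly this effect: with yield return, none of the work runs until the foreach starts pulling items.
IEnumerable<Option> GetOptions(string from, string value, string selectedValue)
{
    // Nothing in here executes until the caller begins iterating.
    foreach (var row in FetchRows(from, value, selectedValue)) // hypothetical expensive fetch
    {
        yield return new Option(row.Text, row.Value, row.Selected);
    }
}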
If you measure the time taken to do something 160 times, it will usually take on the order of 160 times longer than doing it once. Are you suggesting that the contents of the loop are only executed once, or are you trying to compare chalk and cheese?
In the first case, try changing the last line of your code from using
stopWatch.Elapsed.ToString()
to
new TimeSpan(stopWatch.Elapsed.Ticks / options.Count).ToString()
That will at least mean you are comparing one iteration with one iteration.
However, your results will still be useless. Timing a very short operation once gives poor results - you have to repeat such things tens of thousands of times to get a statistically meaningful average time. Otherwise the inaccuracy of the system clock and the overheads involved in starting and stopping your timer will swamp your results.
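A rough sketch of what that repetition looks like (the body under test here is a hypothetical stand-in for the option-building loop):
const int iterations = 100000;
var stopWatch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    BuildOptionsMarkup(); // hypothetical: the StringBuilder loop under test
}
stopWatch.Stop();
// report the per-iteration average rather than a single sub-millisecond timing
Console.WriteLine("average: {0} ms", stopWatch.Elapsed.TotalMilliseconds / iterations);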
Also, what is the PC doing while all this is happening? If there are other processes loading the CPU, then they could easily interfere with your timings. If you're running this on a busy server then you may get completely random results.
Lastly, how you execute the tests can alter things. If you always run test 1 followed by test 2, it's possible that running the first test affects CPU caches (e.g. of the data in the options list) so that the following code is able to execute faster. If garbage collection occurs during one of your tests, it will skew the results.
You need to eliminate all these factors before you have numbers that are worth comparing. Only then should you ask "why is test 1 running so much slower than test 2"?
The first code example doesn't output anything until all the options have been iterated while the second one outputs a time after the first option has been processed. If there are multiple options, you would expect to see such a difference.
Just pause it a few times in the IDE and you'll see where the time goes.
There's a very natural and strong temptation to think that the time things take is proportional to how much code they are. For example, which do you think is faster?
for (MyClass x in y)
for (MyClass theParticularInstanceOfClass in MyCollectionOfInstances)
It is natural to think that the first is faster, when in fact the code size is irrelevant and could be hiding a multitude of expensive operations.

Performance issue: comparing to String.Format

A while back, a post by Jon Skeet planted the idea in my head of building a CompiledFormatter class, to use in a loop instead of String.Format().
The idea is the portion of a call to String.Format() spent parsing the format string is overhead; we should be able to improve performance by moving that code outside of the loop. The trick, of course, is the new code should exactly match the String.Format() behavior.
This week I finally did it. I went through the .NET Framework source provided by Microsoft and did a direct adaptation of their parser (it turns out String.Format() actually farms the work out to StringBuilder.AppendFormat()). The code I came up with works, in that my results are accurate within my (admittedly limited) test data.
Unfortunately, I still have one problem: performance. In my initial tests the performance of my code closely matches that of the normal String.Format(). There's no improvement at all; it's even consistently a few milliseconds slower. At least it's still in the same order (ie: the amount slower doesn't increase; it stays within a few milliseconds even as the test set grows), but I was hoping for something better.
It's possible that the internal calls to StringBuilder.Append() are what actually drive the performance, but I'd like to see if the smart people here can help improve things.
Here is the relevant portion:
private class FormatItem
{
public int index; //index of item in the argument list. -1 means it's a literal from the original format string
public char[] value; //literal data from original format string
public string format; //simple format to use with supplied argument (ie: {0:X} for hex)
// for fixed-width format (examples below)
public int width; // {0,7} means it should be at least 7 characters
public bool justify; // {0,-7} would use opposite alignment
}
//this data is all populated by the constructor
private List<FormatItem> parts = new List<FormatItem>();
private int baseSize = 0;
private string format;
private IFormatProvider formatProvider = null;
private ICustomFormatter customFormatter = null;
// the code in here very closely matches the code in the String.Format/StringBuilder.AppendFormat methods.
// Could it be faster?
public String Format(params Object[] args)
{
if (format == null || args == null)
throw new ArgumentNullException((format == null) ? "format" : "args");
var sb = new StringBuilder(baseSize);
foreach (FormatItem fi in parts)
{
if (fi.index < 0)
sb.Append(fi.value);
else
{
//if (fi.index >= args.Length) throw new FormatException(Environment.GetResourceString("Format_IndexOutOfRange"));
if (fi.index >= args.Length) throw new FormatException("Format_IndexOutOfRange");
object arg = args[fi.index];
string s = null;
if (customFormatter != null)
{
s = customFormatter.Format(fi.format, arg, formatProvider);
}
if (s == null)
{
if (arg is IFormattable)
{
s = ((IFormattable)arg).ToString(fi.format, formatProvider);
}
else if (arg != null)
{
s = arg.ToString();
}
}
if (s == null) s = String.Empty;
int pad = fi.width - s.Length;
if (!fi.justify && pad > 0) sb.Append(' ', pad);
sb.Append(s);
if (fi.justify && pad > 0) sb.Append(' ', pad);
}
}
return sb.ToString();
}
//alternate implementation (for comparative testing)
// my own test call String.Format() separately: I don't use this. But it's useful to see
// how my format method fits.
public string OriginalFormat(params Object[] args)
{
return String.Format(formatProvider, format, args);
}
Additional notes:
I'm wary of providing the source code for my constructor, because I'm not sure of the licensing implications from my reliance on the original .Net implementation. However, anyone who wants to test this can just make the relevant private data public and assign values that mimic a particular format string.
Also, I'm very open to changing the FormatItem class and even the parts List if anyone has a suggestion that could improve the build time. Since my primary concern is sequential iteration time from front to end, maybe a LinkedList would fare better?
[Update]:
Hmm... something else I can try is adjusting my tests. My benchmarks were fairly simple: composing names to a "{lastname}, {firstname}" format and composing formatted phone numbers from the area code, prefix, number, and extension components. Neither of those have much in the way of literal segments within the string. As I think about how the original state machine parser worked, I think those literal segments are exactly where my code has the best chance to do well, because I no longer have to examine each character in the string.
Another thought:
This class is still useful, even if I can't make it go faster. As long as performance is no worse than the base String.Format(), I've still created a strongly-typed interface which allows a program to assemble its own "format string" at run time. All I need to do is provide public access to the parts list.
Here's the final result:
I changed the format string in a benchmark trial to something that should favor my code a little more:
The quick brown {0} jumped over the lazy {1}.
As I expected, this fares much better compared to the original: 2 million iterations in 5.3 seconds for this code vs 6.1 seconds for String.Format. This is an undeniable improvement. You might even be tempted to start using this as a no-brainer replacement for many String.Format situations. After all, you'll do no worse and you might even get a small performance boost: as much as 14%, and that's nothing to sneeze at.
Except that it is. Keep in mind, we're still talking less than half a second difference for 2 million attempts, under a situation specifically designed to favor this code. Not even busy ASP.Net pages are likely to create that much load, unless you're lucky enough to work on a top 100 web site.
Most of all, this omits one important alternative: you can create a new StringBuilder each time and manually handle your own formatting using raw Append() calls. With that technique my benchmark finished in only 3.9 seconds. That's a much greater improvement.
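For reference, the raw-Append version of the benchmark's format string would look something like this (a sketch; arg0 and arg1 stand in for the two formatted values):
var sb = new StringBuilder(64);
sb.Append("The quick brown ");
sb.Append(arg0);
sb.Append(" jumped over the lazy ");
sb.Append(arg1);
sb.Append('.');
string result = sb.ToString();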
In summary, if performance doesn't matter as much, you should stick with the clarity and simplicity of the built-in option. But when in a situation where profiling shows this really is driving your performance, there is a better alternative available via StringBuilder.Append().
Don't stop now!
Your custom formatter might only be slightly more efficient than the built-in API, but you can add more features to your own implementation that would make it more useful.
I did a similar thing in Java, and here are some of the features I added (besides just pre-compiled format strings):
1) The format() method accepts either a varargs array or a Map (in .NET, it'd be a dictionary). So my format strings can look like this:
StringFormatter f = StringFormatter.parse(
"the quick brown {animal} jumped over the {attitude} dog"
);
Then, if I already have my objects in a map (which is pretty common), I can call the format method like this:
String s = f.format(myMap);
2) I have a special syntax for performing regular expression replacements on strings during the formatting process:
// After calling obj.toString(), all space characters in the formatted
// object string are converted to underscores.
StringFormatter f = StringFormatter.parse(
"blah blah blah {0:/\\s+/_/} blah blah blah"
);
3) I have a special syntax that allows the formatter to check the argument for null-ness, applying a different format depending on whether the object is null or non-null.
StringFormatter f = StringFormatter.parse(
"blah blah blah {0:?'NULL'|'NOT NULL'} blah blah blah"
);
There are a zillion other things you can do. One of the tasks on my todo list is to add a new syntax where you can automatically format Lists, Sets, and other Collections by specifying a formatter to apply to each element as well as a string to insert between all elements. Something like this...
// Wraps each element in single-quote chars, separating
// adjacent elements with a comma.
StringFormatter f = StringFormatter.parse(
"blah blah blah {0:#['$'][,]} blah blah blah"
);
But the syntax is a little awkward and I'm not in love with it yet.
Anyhow, the point is that your existing class might not be much more efficient than the framework API, but if you extend it to satisfy all of your personal string-formatting needs, you might end up with a very convenient library in the end. Personally, I use my own version of this library for dynamically constructing all SQL strings, error messages, and localization strings. It's enormously useful.
It seems to me that in order to get actual performance improvement, you'd need to factor out any format analysis done by your customFormatter and formattable arguments into a function that returns some data structure that tells a later formatting call what to do. Then you pull those data structures in your constructor and store them for later use. Presumably this would involve extending ICustomFormatter and IFormattable. Seems kinda unlikely.
Have you accounted for the time to do the JIT compile as well? After all, the framework will be ngen'd which could account for the differences?
The framework provides explicit overrides to the format methods that take fixed-sized parameter lists instead of the params object[] approach to remove the overhead of allocating and collecting all of the temporary object arrays. You might want to consider that for your code as well. Also, providing strongly-typed overloads for common value types would reduce boxing overhead.
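A minimal sketch of how that suggestion could be applied to this class (assuming single-threaded callers, since the scratch arrays are shared per instance):
private readonly object[] oneArg = new object[1];
private readonly object[] twoArgs = new object[2];

public string Format(object arg0)
{
    oneArg[0] = arg0;
    return Format(oneArg); // binds to the existing params overload without allocating a new array
}

public string Format(object arg0, object arg1)
{
    twoArgs[0] = arg0;
    twoArgs[1] = arg1;
    return Format(twoArgs);
}
To also address the boxing point, generic overloads such as Format<T0>(T0 arg0) would be needed in addition to these.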
I gotta believe that spending as much time optimizing data IO would earn exponentially bigger returns!
This is surely a kissin' cousin to YAGNI for this. Avoid Premature Optimization. APO.

Back to basics; for-loops, arrays/vectors/lists, and optimization

I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.
Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.
The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.
Then after all that is said and done, The final two lists are converted to an array using ToArray().
Ok, bear with me, I'm finally getting to my question.
So, I decided to make a single for-loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although I'm sure with some work I could also fold it into the single loop).
But then I read the article "What your computer does while you wait" by Gustav Duarte and started thinking about memory management and what the data was doing while it's being accessed in the for-loop where two lists are being accessed simultaneously.
So my question is: what is the best approach for something like this? Try to condense the for-loops so the work happens in as few loops as possible, at the cost of touching multiple lists in each iteration? Or allow the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can be potentially large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3)??). But from what I understood in said article and from my CS classes, having to fetch data can be expensive too.
Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?
Thank you
The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.
If each of your loops is O(n), then you've still only got an O(n) operation.
Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
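To make that concrete, the described loops would collapse into pipelines shaped roughly like this (the converter names are entirely assumed, since the original code was never posted):
// one Select chain per output; each pipeline is still a single O(n) pass
DateTime[] xArray = items.Select(o => ConvertToDate(o))  // first conversion
                         .Select(d => FinalDateFixup(d)) // the "one final conversion"
                         .ToArray();
long[] yArray = items.Select(o => ConvertToLong(o)).ToArray();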
For reference, the article is "What your computer does while you wait" by Gustav Duarte; there is also a guide to big-O notation.
It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.
Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions for the two different lengths.
Actually, avoiding those length tests could easily make the two loops quicker than a single loop. You might also get better memory-fetch performance with 2 loops - i.e. you are looking at contiguous memory: A[0], A[1], A[2]... B[0], B[1], B[2]..., rather than A[0], B[0], A[1], B[1], A[2], B[2]...
So in every way, I'd go with 2 separate loops ;-p
Am I understanding you correctly in this?
You have these loops:
for (...){
// Do A
}
for (...){
// Do B
}
for (...){
// Do C
}
And you converted it into
for (...){
// Do A
// Do B
}
for (...){
// Do C
}
and you're wondering which is faster?
If not, some pseudocode would be nice, so we could see what you meant. :)
Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.
If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.
My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
Aside from cache thrashing on large functions, there may be benefits for tiny functions as well. This applies to any auto-vectorizing compiler (not sure whether the Java JIT will do this yet, but you can count on it eventually).
Suppose this is your code:
// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }
Date timestamps[N];
Date mornings[N];
// ... then this can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
mornings[i] = morningOf(timestamps[i]);
// ... and this will just run like normal
for (int i = 0; i != N; ++i)
doOtherCrap(mornings[i]);
For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.
This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.
... working on one piece of data but with two functions can sometimes make it so that code to act on that data doesn't fit in the processor's low level caches.
for (int i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionReallyBig1(); // pushes functionReallyBig2 out of cache
    obj.functionReallyBig2(); // pushes functionReallyBig1 out of cache
}
vs
for (int i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionReallyBig1(); // this stays in the cache next time through the loop
}
for (int i = 0; i < 10; i++) {
    myObject obj = array[i];
    obj.functionReallyBig2(); // this stays in the cache next time through the loop
}
But it was probably a mistake (usually this type of trick is commented).
When data is cyclically loaded and evicted like this, it is called cache thrashing, by the way.
This is a separate issue from the data these functions are working on, as the processor typically caches that separately.
I apologize for not responding sooner and providing any kind of code. I got sidetracked on my project and had to work on something else.
To answer anyone still monitoring this question;
Yes, like jalf said, the function is something like:
PrepareData(vectorA, VectorB, xArray, yArray):
listA
listB
foreach(value in vectorA)
convert values insert in listA
foreach(value in vectorB)
convert values insert in listB
listC
listD
for(int i = 0; i < listB.count; i++)
listC[i] = listB[i] converted to something
listD[i] = listA[i]
xArray = listC.ToArray()
yArray = listD.ToArray()
I changed it to:
PrepareData(vectorA, vectorB, ref xArray, ref yArray):
listA
listB
for(int i = 0; i < vectorA.count && i < vectorB.count; i++)
convert values insert in listA
convert values insert in listB
listC
listD
for(int i = 0; i < listB.count; i++)
listC[i] = listB[i] converted to something
listD[i] = listA[i]
xArray = listC.ToArray()
yArray = listD.ToArray()
Keeping in mind that the vectors can potentially have a large number of items, I figured the second one would be better, so that the program wouldn't have to loop over n items 2 or 3 different times. But then I started to wonder about the effects of memory fetching, prefetching, and so on.
So, I hope this helps to clear up the question, although a good number of you have provided excellent answers.
Thank you everyone for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I am going to put the code back to the way it was; I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will include more references so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).
Thank you again.
