C# better way to do this?

Hi, I have the code below and am looking for a prettier/faster way to do this.
Thanks!
string value = "HelloGoodByeSeeYouLater";
string[] y = new string[] { "Hello", "You" };
foreach (string x in y)
{
    value = value.Replace(x, "");
}

You could do:
y.ToList().ForEach(x => value = value.Replace(x, ""));
Although I think your variant is more readable.

Forgive me, but someone's gotta say it,
value = Regex.Replace(value, string.Join("|", y.Select(Regex.Escape)), "");
Possibly faster, since it creates fewer strings.
EDIT: Credit to Gabe and lasseespeholt for Escape and Select.

While not any prettier, there are other ways to express the same thing.
In LINQ:
value = y.Aggregate(value, (acc, x) => acc.Replace(x, ""));
With String methods:
value = String.Join("", value.Split(y, StringSplitOptions.None));
I don't think anything is going to be faster in managed code than a simple Replace in a foreach though.

It depends on the size of the string you are searching. The foreach example is perfectly fine for small operations, but it creates a new instance of the string on every pass because strings are immutable. It also requires searching the whole string over and over again in a linear fashion.
The basic solutions have all been proposed. The Linq examples provided are good if you are comfortable with that syntax; I also liked the suggestion of an extension method, although that is probably the slowest of the proposed solutions. I would avoid a Regex unless you have an extremely specific need.
So let's explore more elaborate solutions and assume you needed to handle a string that was thousands of characters in length and had many possible words to be replaced. If this doesn't apply to the OP's need, maybe it will help someone else.
Method #1 is geared towards large strings with few possible matches.
Method #2 is geared towards short strings with numerous matches.
Method #1
I have handled large-scale parsing in C# using char arrays and pointer math with intelligent seek operations that are optimized for the length and potential frequency of the term being searched for. It follows this methodology:
Extremely cheap peeks, one character at a time
Only investigate potential matches
Modify output when match is found
For example, you might read through the whole source array and only add words to the output when they are NOT found. This would remove the need to keep redimensioning strings.
A simple example of this technique is looking for a closing HTML tag in a DOM parser. For example, I may read an opening STYLE tag and want to skip through (or buffer) thousands of characters until I find a closing STYLE tag.
This approach provides incredibly high performance, but it's also incredibly complicated if you don't need it (plus you need to be well-versed in memory manipulation/management or you will create all sorts of bugs and instability).
I should note that the .Net string libraries are already incredibly efficient but you can optimize this approach for your own specific needs and achieve better performance (and I have validated this firsthand).
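As a minimal sketch of that peek-and-copy idea (my own illustration under the simplifying assumption of a single search term, not the production technique described above):

// Scans the source once, copying characters through and skipping them
// only where the term matches, so no intermediate strings are created.
static string RemoveTerm(string source, string term)
{
    var output = new char[source.Length]; // removal can only shrink the result
    int outLen = 0;
    for (int i = 0; i < source.Length; )
    {
        // Cheap peek: only investigate a potential match on a first-char hit.
        if (source[i] == term[0]
            && i + term.Length <= source.Length
            && string.CompareOrdinal(source, i, term, 0, term.Length) == 0)
        {
            i += term.Length; // match found: skip it in the output
        }
        else
        {
            output[outLen++] = source[i++]; // no match: copy through
        }
    }
    return new string(output, 0, outLen);
}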
Method #2
Another alternative involves storing search terms in a Dictionary containing Lists of strings. Basically, you decide how long your search prefix needs to be, and read characters from the source string into a buffer until you meet that length. Then you search your dictionary for all terms that match that string. If a match is found, you explore further by iterating through that List; if not, you know you can discard the buffer and continue.
Because the Dictionary matches strings based on hash, the search is non-linear and ideal for handling a large number of possible matches.
I'm using this methodology to allow instantaneous (<1ms) searching of every airfield in the US by name, state, city, FAA code, etc. There are 13K airfields in the US, and I've created a map of about 300K permutations (again, a Dictionary with prefixes of varying lengths, each corresponding to a list of matches).
For example, Phoenix, Arizona's main airfield is called Sky Harbor with the short ID of KPHX. I store:
KP
KPH
KPHX
Ph
Pho
Phoe
Ar
Ari
Ariz
Sk
Sky
Ha
Har
Harb
There is a cost in terms of memory usage, but string interning probably reduces this somewhat and the resulting speed justifies the memory usage on data sets of this size. Searching happens as the user types and is so fast that I have actually introduced an artificial delay to smooth out the experience.
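A hedged sketch of such a prefix map (the term list, prefix lengths, and names here are my own assumptions, not the production code; requires System.Collections.Generic and System.Linq):

var index = new Dictionary<string, List<string>>();
foreach (var term in searchTerms) // e.g. "KPHX", "Phoenix", "Sky Harbor"
{
    for (int len = 2; len <= Math.Min(4, term.Length); len++)
    {
        string prefix = term.Substring(0, len).ToLowerInvariant();
        List<string> bucket;
        if (!index.TryGetValue(prefix, out bucket))
            index[prefix] = bucket = new List<string>();
        bucket.Add(term);
    }
}

// Lookup: the dictionary hash rejects most buffers instantly; only a
// matching bucket's short list is scanned linearly.
bool Matches(string buffer)
{
    List<string> bucket;
    return index.TryGetValue(buffer.ToLowerInvariant(), out bucket)
        && bucket.Any(t => t.StartsWith(buffer, StringComparison.OrdinalIgnoreCase));
}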
Send me a message if you have the need to dig into these methodologies.

Extension method for elegance
(arguably "prettier" at the call level)
I'll implement an extension method that allows you to call your implementation directly on the original string, as seen here:
value = value.Remove(y);
// or
value = value.Remove("Hello", "You");
// effectively
string value = "HelloGoodByeSeeYouLater".Remove("Hello", "You");
In fact, the extension method is callable on any string value, and is therefore easily reusable.
Implementation of Extension method:
I'm going to wrap your own implementation (shown in your question) in an extension method for pretty or elegant points, and also employ the params keyword to provide some flexibility in passing the arguments. You can substitute somebody else's faster implementation body into this method.
static class EXTENSIONS
{
    static public string Remove(this string thisString, params string[] arrItems)
    {
        // Whatever implementation you like:
        if (thisString == null)
            return null;

        var temp = thisString;
        foreach (string x in arrItems)
            temp = temp.Replace(x, "");
        return temp;
    }
}
That's the brightest idea I can come up with right now that nobody else has touched on.

Related

How to split the string more efficiently?

I have a JSON string which looks like:
{"Detail": [
{"PrimaryKey":111,"Date":"2016-09-01","Version":"7","Count":2,"Name":"Windows","LastAccessTime":"2016-05-25T21:49:52.36Z"},
{"PrimaryKey":222,"Date":"2016-09-02","Version":"8","Count":2,"Name":"Windows","LastAccessTime":"2016-07-25T21:49:52.36Z"},
{"PrimaryKey":333,"Date":"2016-09-03","Version":"9","Count":3,"Name":"iOS","LastAccessTime":"2016-08-22T21:49:52.36Z"},
.....( *many values )
]}
The array Detail has lots of PrimaryKeys. Sometimes, it is about 500K PrimaryKeys. The system we use can only process JSON strings with certain length, i.e. 128KB. So I have to split this JSON string into segments (each one is 128KB or fewer chars in length).
Regex reg = new Regex(@"\{"".{0," + (128*1024).ToString() + @"}""\}");
MatchCollection mc = reg.Matches(myListString);
Currently, I use regular expression to do this. It works fine. However, it uses too much memory. Is there a better way to do this (unnecessary to be regular expression)?
*** Added more info.
The 'system' I mentioned above is Azure DocumentDB. By default, a document can only be 512KB (as of now). Although we could ask Microsoft to increase this, the JSON files we get are always much, much larger than 512KB. That's why we need to figure out a way to do this.
If possible, we want to keep using documentDB, but we are open to other suggestions.
*** Some info to make things clear: 1) the values in the array are different, not duplicated; 2) yes, I use StringBuilder whenever I can; 3) yes, I tried IndexOf & Substring, but based on tests the performance was no better than the regular expression in this case (although that could be down to the way I implemented it).
*** The JSON object is complex, but all I care about is the "Detail" array, so we can assume the string looks just like the example, with only "Detail". We need to split this JSON array string into segments smaller than 512KB. Basically we can treat it as a plain string rather than JSON; but since it is JSON, maybe some library can do this better.
Take a look at Json.NET (available via NuGet).
It has a JsonReader class, which allows you to create the required objects by reading the JSON token by token (see the example of JSON reading with JsonReader). Note that if you pass an invalid JSON string (e.g. without an "end array" or "end object" character) to JsonReader, it will only throw an exception when it reaches the invalid item, so you can pass different substrings to it.
Also, I guess that your system has something similar to JsonReader, so you could use that.
Reading a string with StringReader should not require too much application memory, and it should be faster than iterating through regular expression matches.
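For example, with Json.NET's JsonTextReader you could stream the Detail array one element at a time (a sketch; json is assumed to hold the full document string):

// Sketch: stream the "Detail" array one element at a time, so the whole
// document never has to be handled as a single huge string.
// (Requires Newtonsoft.Json and Newtonsoft.Json.Linq.)
using (var reader = new JsonTextReader(new StringReader(json)))
{
    while (reader.Read())
    {
        // Each detail object starts at depth 2: root object -> Detail array -> element.
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
        {
            JObject detail = JObject.Load(reader);
            // Append detail.ToString(Formatting.None) to the current segment,
            // starting a new segment whenever the 512KB budget would be exceeded.
        }
    }
}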
Here is a hacky solution assuming data contains your JSON data:
var details = data
    .Split('[')[1]
    .Split(']')[0]
    .Split(new[] { "}," }, StringSplitOptions.None)
    .Select(d => d.Trim())
    .Select(d => d.EndsWith("}") ? d : d + "}");

foreach (var detail in details)
{
    // Now process "detail" with your JSON library.
}
Working example: https://dotnetfiddle.net/sBQjyi
Obviously you should only do this if you really can't use a normal JSON library. See Mikhail Neofitov's answer for library suggestions.
If you are reading the JSON data from a file or the network, you should implement more stream-like processing: read one detail line, deserialize it with your JSON library, and yield it to the caller. When the caller requests the next detail object, read the next line, deserialize it, and so on. This way you can minimize the memory footprint of your deserializer.
You might want to consider storing each detail in a separate document. It means two round trips to get both the header and all of the detail documents, but it means you are never dealing with a really large JSON document. Also, if Detail is added to incrementally, it'll be much more efficient for writes because there is no way to just add another row. You have to rewrite the entire document. Your read/write ratio will determine the break even point in overall efficiency.
Another argument for this is that the complexity of regex parsing, feeding it through your JSON parser, then reassembling it goes away. You never know if your regex parser will deal with all cases (commas inside of quotes, international characters, etc.). I've seen many folks think they have a good regex only to find odd cases in production.
If your Detail array can grow unbounded (or even with a large bound), then you should definitely make this change regardless of your JSON parser limitations or read/write ratio because eventually, you'll exceed the limit.

Efficient method for checking substrings C#

I have a bunch of txt files that contain 300K lines. Each line has a URL, e.g. http://www.ieee.org/conferences_events/conferences/conferencedetails/index.html?Conf_ID=30718
In some string[] array I have a list of web-sites
amazon.com
google.com
ieee.org
...
I need to check whether each URL contains one of the web-sites and update the counter that corresponds to that web-site.
For now I'm using the Contains method, but it is very slow. There are ~900 records in the array, so the worst case is 900 * 300K comparisons (for one file). I believe IndexOf will be slow as well.
Can someone help me with faster approach? Thank you in advance
A good solution would leverage hashing. My approach would be the following:
Hash all your known hosts (the string[] collection that you mention).
Store the hashes in a List<int>: hashes.Add("www.ieee.com".GetHashCode())
Sort the list: hashes.Sort()
When looking up a URL:
Parse the host name out of the URL. You can use new Uri("http://www.ieee.com/...").Host to get www.ieee.com.
Preprocess it to always expect the same case. Use lower case (if you have http://www.IEee.COM/, take www.ieee.com).
Hash the parsed host name and look for it in the hashes list, using the BinarySearch method to find the hash.
If the hash exists, then you have this host in your list.
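A minimal sketch of those steps (names assumed):

// Build phase: hash and sort the known hosts (knownHosts is your string[]).
var hashes = new List<int>();
foreach (var host in knownHosts)
    hashes.Add(host.ToLowerInvariant().GetHashCode());
hashes.Sort();

// Lookup phase: parse, normalize, hash, binary-search.
string urlHost = new Uri(url).Host.ToLowerInvariant(); // e.g. "www.ieee.com"
bool probablyKnown = hashes.BinarySearch(urlHost.GetHashCode()) >= 0;
// Hash collisions can yield false positives, so confirm against the
// original list if an exact answer is required.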
An even faster and more memory-efficient way is to use Bloom filters. I suggest you read about them on Wikipedia, and there's even a C# implementation of a Bloom filter on CodePlex. Of course, you need to take into account that a Bloom filter allows false positive results (it can tell you that a value is in a collection even though it's not), so it's used for optimization only: it may fail to report that a value is absent when it really is absent, but it will never report a present value as absent.
Using a Dictionary<TKey, TValue> is also an option, but if you only need to count number of occurrences, it's more efficient to maintain collection of hashes yourself.
Create a Dictionary of domain to counter.
For each URL, extract the domain (I'll leave that part to you to figure out), then look up the domain in the Dictionary and increment the counter.
I assume we're talking about domains, since this is what you showed in your array as examples. If this can be any part of the URL instead, storing all your strings in a trie-like structure could work, as in the sketch below.
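A hedged sketch of that trie idea (type and method names are my own; requires System.Collections.Generic):

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsTerminal; // true if a complete stored string ends here
}

static void Insert(TrieNode root, string word)
{
    TrieNode node = root;
    foreach (char c in word)
    {
        TrieNode next;
        if (!node.Children.TryGetValue(c, out next))
            node.Children[c] = next = new TrieNode();
        node = next;
    }
    node.IsTerminal = true;
}

// Walk the trie from every position of the text; each walk stops as soon
// as no stored string continues, so mismatches are rejected cheaply.
static bool ContainsAny(TrieNode root, string text)
{
    for (int i = 0; i < text.Length; i++)
    {
        TrieNode node = root;
        for (int j = i; j < text.Length && node.Children.TryGetValue(text[j], out node); j++)
        {
            if (node.IsTerminal)
                return true;
        }
    }
    return false;
}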
You can read this question, the answers will be help you:
High performance "contains" search in list of strings in C#
Well, for a somewhat similar need (though with IndexOf), I achieved a huge performance improvement with a simple loop,
as in something like:
int l = url.Length;
int position = 0;
while (position < l)
{
    if (url[position] == website[0])
    {
        // test the rest of the web site from this position in another loop
        if (exactMatch(url, position, website))
            break; // match found
    }
    position++;
}
It seems a bit wrong, but in an extreme case (searching for a set of about 10 strings in a large, structured 1.2MB file, so regex was out) I went from 3 minutes to under 1 second.
Your problem as you describe it should not involve searching for substrings at all. Split your source file up into lines (or read it in line by line), each of which you already know will contain a URL, run each through some function to extract the domain name, then compare this against a fast-access tally of your target domains, such as a Dictionary<string, int>, incrementing as you go, e.g.:
var source = Enumerable.Range(0, 300000)
    .Select(x => Guid.NewGuid().ToString())
    .Select(x => x.Substring(0, 4) + ".com/" + x.Substring(4, 10));
var targets = Enumerable.Range(0, 900)
    .Select(x => Guid.NewGuid().ToString().Substring(0, 4) + ".com")
    .Distinct();
var tally = targets.ToDictionary(x => x, x => 0);
Func<string, string> naiveDomainExtractor = x => x.Split('/')[0];

foreach (var line in source)
{
    var domain = naiveDomainExtractor(line);
    if (tally.ContainsKey(domain)) tally[domain]++;
}
...which takes a third of a second on my not particularly speedy machine, including generation of test data.
Admittedly your domain extractor may be a bit more sophisticated, but it will probably not be very processor-intensive, and if you've got multiple cores at your disposal you can speed things up further by using a ConcurrentDictionary<string, int> and Parallel.ForEach.
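A hedged sketch of that parallel variant, reusing the names from the snippet above (requires System.Collections.Concurrent and System.Threading.Tasks):

// Thread-safe tally seeded with a zero count per target domain.
var tally = new ConcurrentDictionary<string, int>(
    targets.Select(t => new KeyValuePair<string, int>(t, 0)));

Parallel.ForEach(source, line =>
{
    var domain = naiveDomainExtractor(line);
    if (tally.ContainsKey(domain))
        tally.AddOrUpdate(domain, 1, (key, count) => count + 1);
});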
You'd have to test the performance, but you might try converting the URLs to actual System.Uri objects.
Store the list of websites as a HashSet<string> - then use the HashSet to look up the Uri's Host:
IEnumerable<Uri> inputUrls = File.ReadAllLines(@"c:\myFile.txt").Select(e => new Uri(e));
string[] myUrls = new[] { "amazon.com", "google.com", "stackoverflow.com" };
HashSet<string> urls = new HashSet<string>(myUrls);
IEnumerable<Uri> matches = inputUrls.Where(e => urls.Contains(e.Host));

Unsafe string creation from char[]

I'm working on high-performance code in which this construct is part of the performance-critical section.
This is what happens in some section:
A string is 'scanned' and metadata is stored efficiently.
Based upon this metadata, chunks of the main string are separated into a char[][].
That char[][] should be transferred into a string[].
Now, I know you can just call new string(char[]) but then the result would have to be copied.
To avoid this extra copy step I guess it must be possible to write directly to the string's internal buffer, even though this would be an unsafe operation (and I know this brings lots of implications, like overflow and forward compatibility).
I've seen several ways of achieving this, but none I'm really satisfied with.
Does anyone have true suggestions as to how to achieve this?
Extra information:
The actual process doesn't necessarily include converting to char[]; it's practically a 'multi-substring' operation, like 3 indexes and their lengths, appended.
The StringBuilder has too much overhead for the small number of concats.
EDIT:
Due to some vague aspects of what it is exactly that I'm asking, let me reformulate it.
This is what happens:
Main string is indexed.
Parts of the main string are copied to a char[].
The char[] is converted to a string.
What I'd like to do is merge step 2 and 3, resulting in:
Main string is indexed.
Parts of the main string are copied to a string (and the GC can keep its hands off of it during the process by proper use of the fixed keyword?).
And a note is that I cannot change the output type from string[], since this is an external library, and projects depend on it (backward compatibility).
I think that what you are asking to do is to 'carve up' an existing string in-place into multiple smaller strings without re-allocating character arrays for the smaller strings. This won't work in the managed world.
For one reason why, consider what happens when the garbage collector comes by and collects or moves the original string during a compaction- all of those other strings 'inside' of it are now pointing at some arbitrary other memory, not the original string you carved them out of.
EDIT: In contrast to the character-poking involved in Ben's answer (which is clever but IMHO a bit scary), you can allocate a StringBuilder with a pre-defined capacity, which eliminates the need to re-allocate the internal arrays. See http://msdn.microsoft.com/en-us/library/h1h0a5sy.aspx.
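For example (a trivial sketch):

// Pre-sizing the builder avoids re-allocating its internal buffer as content grows.
var sb = new StringBuilder(4096); // capacity chosen from the known output size
sb.Append("Result");
string s = sb.ToString();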
What happens if you do:
string s = GetBuffer();
fixed (char* pch = s)
{
    pch[0] = 'R';
    pch[1] = 'e';
    pch[2] = 's';
    pch[3] = 'u';
    pch[4] = 'l';
    pch[5] = 't';
}
I think the world will come to an end (or at least the .NET managed portion of it), but that's very close to what StringBuilder does.
Do you have profiler data to show that StringBuilder isn't fast enough for your purposes, or is that an assumption?
Just create your own addressing system instead of trying to use unsafe code to map to an internal data structure.
Mapping a string (which is also readable as a char[]) to an array of smaller strings is no different from building a list of address information (index & length of each substring). So make a new List<Tuple<int,int>> instead of a string[] and use that data to return the correct string from your original, unaltered data structure. This could easily be encapsulated into something that exposed string[].
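For instance, a sketch of that addressing idea (materializing a piece with Substring still copies, but only when and if a piece is actually requested):

// Each entry records an (offset, length) pair into the original string.
var spans = new List<Tuple<int, int>>
{
    Tuple.Create(0, 5),  // hypothetical first piece
    Tuple.Create(12, 3), // hypothetical second piece
};

// Materialize a piece only on demand; the original string is never altered.
static string GetPiece(string source, Tuple<int, int> span)
{
    return source.Substring(span.Item1, span.Item2);
}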
In .NET, there is no way to create an instance of String which shares data with another string. Some discussion on why that is appears in this comment from Eric Lippert.

Matching Two Large Sets of Strings in C#

Here is the situation:
I have a webpage that I have scraped as a string.
I have several fields in a MSSQL database. For example, car model: it has an ID and a Name, such as Mustang or Civic, and is pre-filled with most models of car.
I want to find any match for any row in my models table. So if I have Civic, Mustang and E350 in my Model table, I want to find any occurrence of any of the three on the page I have scraped.
What is an efficient way to do this in C#? I am using LINQ to SQL to interface with the database.
Does creating a dictionary of all models, tokenizing the page and iterating through the tokens make sense? Or should I just iterate through the tokens and use a WHERE clause and ask the database if there is a match?
// Dictionary dic contains all models from the DB, with the name being the key and the id being the value...
foreach (string pageToken in pageTokens)
{
    if (dic.ContainsKey(pageToken))
    {
        // Do what I need to do
    }
}
Both of these methods seem terrible to me. Any suggestions on what I should do? Something with set intersection I would imagine might be nice?
Neither of these methods addresses what happens when a model name is more than one word, like "F150 Extended Cab". Thoughts on that?
Searching for multiple strings in a larger text is a well-understood problem, and significant research has gone into making it fast. The two most popular and effective methods for this are the Aho-Corasick algorithm (I'd recommend this one) and the Rabin-Karp algorithm. They use a little preprocessing, but are orders of magnitude less complex and faster than the naive method (the naive method is worst-case O(m*n^2*p), where m is the length of the long string [the webpage you scraped], n is the average length of the needles, and p is the number of needles). Aho-Corasick is linear. A C# implementation of it can be found at CodeProject for free.
Edit: Oops, I was wrong about the complexity of Aho-Corasick -- it's linear in the number & length of the input strings plus the size of the string being analyzed [the scraped text] plus the number of matches. But it's still linear, and linear is a lot better than cubic :-).
My first approach would be super-simple:
foreach (string carModel in listOfCarModelsFromDatabase) {
    if (pageText.Contains(carModel)) {
        // do something
    }
}
I'd only start worrying about making it faster if the above weren't fast enough. The list of car models just can't possibly be that large (< 10000?) and it's only one page of text.
You should be using Regex, not tokenizing based on space.
With Regex you could keep the spaces and be just fine, and I believe it would be faster than tokenizing and looping through the list of possible values.
How you construct that Regex, though, I am not sure.
Most simply, you could build a Regex with every model, like
(Model 1|Model 2|Model 3)
But I am sure there are more efficient ways to do this in regex.
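For instance, a hedged sketch that joins the escaped names into one compiled alternation (models and pageText are assumed names):

// Word boundaries keep "Civic" from matching inside "Civics"; Regex.Escape
// guards against metacharacters in model names. Multi-word names work too.
string pattern = @"\b(?:" + string.Join("|", models.Select(Regex.Escape)) + @")\b";
var modelRegex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);

foreach (Match m in modelRegex.Matches(pageText))
{
    // m.Value is a model name found on the page.
}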
For a really simple solution that does substring matches (that should perform reasonably well), you could use a parameterized SQL query like this:
select ModelID, ModelName
from Model
where ? like '%' + ModelName + '%'
where the ? is a parameter that gets replaced with the entire webpage text.
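A hedged sketch of running that query from C# with ADO.NET; SQL Server uses named parameters, so the ? becomes @page here (connectionString and pageText are assumed):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "select ModelID, ModelName from Model where @page like '%' + ModelName + '%'",
    conn))
{
    cmd.Parameters.AddWithValue("@page", pageText);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Each row is a model that appears somewhere in the page text.
            Console.WriteLine("{0}: {1}", reader["ModelID"], reader["ModelName"]);
        }
    }
}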

Performance issue: comparing to String.Format

A while back a post by Jon Skeet planted the idea in my head of building a CompiledFormatter class, for using in a loop instead of String.Format().
The idea is the portion of a call to String.Format() spent parsing the format string is overhead; we should be able to improve performance by moving that code outside of the loop. The trick, of course, is the new code should exactly match the String.Format() behavior.
This week I finally did it. I worked through the .Net framework source provided by Microsoft to do a direct adaptation of their parser (it turns out String.Format() actually farms the work out to StringBuilder.AppendFormat()). The code I came up with works, in that my results are accurate within my (admittedly limited) test data.
Unfortunately, I still have one problem: performance. In my initial tests the performance of my code closely matches that of the normal String.Format(). There's no improvement at all; it's even consistently a few milliseconds slower. At least it's still in the same order (ie: the amount slower doesn't increase; it stays within a few milliseconds even as the test set grows), but I was hoping for something better.
It's possible that the internal calls to StringBuilder.Append() are what actually drive the performance, but I'd like to see if the smart people here can help improve things.
Here is the relevant portion:
private class FormatItem
{
    public int index;     // index of item in the argument list. -1 means it's a literal from the original format string
    public char[] value;  // literal data from original format string
    public string format; // simple format to use with supplied argument (ie: {0:X} for hex)

    // for fixed-width format (examples below)
    public int width;     // {0,7} means it should be at least 7 characters
    public bool justify;  // {0,-7} would use opposite alignment
}

// this data is all populated by the constructor
private List<FormatItem> parts = new List<FormatItem>();
private int baseSize = 0;
private string format;
private IFormatProvider formatProvider = null;
private ICustomFormatter customFormatter = null;

// the code in here very closely matches the code in the String.Format/StringBuilder.AppendFormat methods.
// Could it be faster?
public String Format(params Object[] args)
{
    if (format == null || args == null)
        throw new ArgumentNullException((format == null) ? "format" : "args");

    var sb = new StringBuilder(baseSize);
    foreach (FormatItem fi in parts)
    {
        if (fi.index < 0)
            sb.Append(fi.value);
        else
        {
            //if (fi.index >= args.Length) throw new FormatException(Environment.GetResourceString("Format_IndexOutOfRange"));
            if (fi.index >= args.Length) throw new FormatException("Format_IndexOutOfRange");

            object arg = args[fi.index];
            string s = null;
            if (customFormatter != null)
            {
                s = customFormatter.Format(fi.format, arg, formatProvider);
            }
            if (s == null)
            {
                if (arg is IFormattable)
                {
                    s = ((IFormattable)arg).ToString(fi.format, formatProvider);
                }
                else if (arg != null)
                {
                    s = arg.ToString();
                }
            }
            if (s == null) s = String.Empty;

            int pad = fi.width - s.Length;
            if (!fi.justify && pad > 0) sb.Append(' ', pad);
            sb.Append(s);
            if (fi.justify && pad > 0) sb.Append(' ', pad);
        }
    }
    return sb.ToString();
}

// alternate implementation (for comparative testing)
// my own tests call String.Format() separately: I don't use this. But it's useful to see
// how my format method fits.
public string OriginalFormat(params Object[] args)
{
    return String.Format(formatProvider, format, args);
}
Additional notes:
I'm wary of providing the source code for my constructor, because I'm not sure of the licensing implications from my reliance on the original .Net implementation. However, anyone who wants to test this can just make the relevant private data public and assign values that mimic a particular format string.
Also, I'm very open to changing the FormatItem class and even the parts List if anyone has a suggestion that could improve the build time. Since my primary concern is sequential iteration time from front to end, maybe a LinkedList would fare better?
[Update]:
Hmm... something else I can try is adjusting my tests. My benchmarks were fairly simple: composing names to a "{lastname}, {firstname}" format and composing formatted phone numbers from the area code, prefix, number, and extension components. Neither of those have much in the way of literal segments within the string. As I think about how the original state machine parser worked, I think those literal segments are exactly where my code has the best chance to do well, because I no longer have to examine each character in the string.
Another thought:
This class is still useful, even if I can't make it go faster. As long as performance is no worse than the base String.Format(), I've still created a strongly-typed interface which allows a program to assemble its own "format string" at run time. All I need to do is provide public access to the parts list.
Here's the final result:
I changed the format string in a benchmark trial to something that should favor my code a little more:
The quick brown {0} jumped over the lazy {1}.
As I expected, this fares much better compared to the original: 2 million iterations in 5.3 seconds for this code vs 6.1 seconds for String.Format. This is an undeniable improvement. You might even be tempted to start using this as a no-brainer replacement for many String.Format situations. After all, you'll do no worse and you might even get a small performance boost: as much as 14%, and that's nothing to sneeze at.
Except that it is. Keep in mind, we're still talking about less than half a second's difference over 2 million attempts, in a situation specifically designed to favor this code. Not even busy ASP.Net pages are likely to create that much load, unless you're lucky enough to work on a top-100 web site.
Most of all, this omits one important alternative: you can create a new StringBuilder each time and manually handle your own formatting using raw Append() calls. With that technique my benchmark finished in only 3.9 seconds. That's a much greater improvement.
In summary, if performance doesn't matter as much, you should stick with the clarity and simplicity of the built-in option. But when in a situation where profiling shows this really is driving your performance, there is a better alternative available via StringBuilder.Append().
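As a concrete example of that manual approach, the phone-number case from my benchmark might be hand-rolled like this (field names assumed):

var sb = new StringBuilder(24);
sb.Append('(').Append(areaCode).Append(") ")
  .Append(prefix).Append('-').Append(number);
if (extension != null)
    sb.Append(" x").Append(extension); // optional extension component
string formatted = sb.ToString();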
Don't stop now!
Your custom formatter might only be slightly more efficient than the built-in API, but you can add more features to your own implementation that would make it more useful.
I did a similar thing in Java, and here are some of the features I added (besides just pre-compiled format strings):
1) The format() method accepts either a varargs array or a Map (in .NET, it'd be a dictionary). So my format strings can look like this:
StringFormatter f = StringFormatter.parse(
    "the quick brown {animal} jumped over the {attitude} dog"
);
Then, if I already have my objects in a map (which is pretty common), I can call the format method like this:
String s = f.format(myMap);
2) I have a special syntax for performing regular expression replacements on strings during the formatting process:
// After calling obj.toString(), all space characters in the formatted
// object string are converted to underscores.
StringFormatter f = StringFormatter.parse(
    "blah blah blah {0:/\\s+/_/} blah blah blah"
);
3) I have a special syntax that allows the formatter to check the argument for null-ness, applying a different format depending on whether the object is null or non-null.
StringFormatter f = StringFormatter.parse(
    "blah blah blah {0:?'NULL'|'NOT NULL'} blah blah blah"
);
There are a zillion other things you can do. One of the tasks on my todo list is to add a new syntax where you can automatically format Lists, Sets, and other Collections by specifying a formatter to apply to each element as well as a string to insert between all elements. Something like this...
// Wraps each element in single-quote chars, separating
// adjacent elements with a comma.
StringFormatter f = StringFormatter.parse(
    "blah blah blah {0:#['$'][,]} blah blah blah"
);
But the syntax is a little awkward and I'm not in love with it yet.
Anyhow, the point is that your existing class might not be much more efficient than the framework API, but if you extend it to satisfy all of your personal string-formatting needs, you might end up with a very convenient library in the end. Personally, I use my own version of this library for dynamically constructing all SQL strings, error messages, and localization strings. It's enormously useful.
It seems to me that in order to get actual performance improvement, you'd need to factor out any format analysis done by your customFormatter and formattable arguments into a function that returns some data structure that tells a later formatting call what to do. Then you pull those data structures in your constructor and store them for later use. Presumably this would involve extending ICustomFormatter and IFormattable. Seems kinda unlikely.
Have you accounted for the time to do the JIT compile as well? After all, the framework will be ngen'd which could account for the differences?
The framework provides explicit overrides to the format methods that take fixed-sized parameter lists instead of the params object[] approach to remove the overhead of allocating and collecting all of the temporary object arrays. You might want to consider that for your code as well. Also, providing strongly-typed overloads for common value types would reduce boxing overhead.
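A hedged sketch of that suggestion against the class from the question: a strongly-typed overload for a single int argument avoids both the params object[] allocation and the boxing of the value type (padding logic omitted for brevity):

public string Format(int arg0)
{
    var sb = new StringBuilder(baseSize);
    foreach (FormatItem fi in parts)
    {
        if (fi.index < 0)
            sb.Append(fi.value);
        else
            sb.Append(arg0.ToString(fi.format, formatProvider)); // no boxing
    }
    return sb.ToString();
}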
I gotta believe that spending as much time optimizing data IO would earn exponentially bigger returns!
This is surely a kissin' cousin to YAGNI: Avoid Premature Optimization. APO.
