C# Named parameters to a string that replace to the parameter values - c#

I want in a good performance way (I hope) replace a named parameter in my string to a named parameter from code, example, my string:
"Hi {name}, do you like milk?"
How could I replace the {name} by code, Regular expressions? To expensive? Which way do you recommend?
How do they in example NHibernates HQL to replace :my_param to the user defined value? Or in ASP.NET (MVC) Routing that I like better, "{controller}/{action}", new { controller = "Hello", ... }?

Have you confirmed that regular expressions are too expensive?
The cost of regular expressions is greatly exaggerated. For such a simple pattern performance will be quite good, probably only slightly less good than direct search-and-replace, in fact. Also, have you experimented with the Compiled flag when constructing the regular expression?
That said, can't you just use the simplest way, i.e. Replace?
string varname = "name";
string pattern = "{" + varname + "}";
Console.WriteLine("Hi {name}".Replace(pattern, "Mike"));

Regex is certainly a viable option, especially with a MatchEvaluator:
Regex re = new Regex(#"\{(\w*?)\}", RegexOptions.Compiled); // store this...
string input = "Hi {name}, do you like {food}?";
Dictionary<string, string> vals = new Dictionary<string, string>();
vals.Add("name", "Fred");
vals.Add("food", "milk");
string q = re.Replace(input, delegate(Match match)
{
string key = match.Groups[1].Value;
return vals[key];
});

Now if you have you replacements in a dictionary, like this:
var replacements = new Dictionary<string, string>();
replacements["name"] = "Mike";
replacements["age"]= "20";
then the Regex becomes quite simple:
Regex regex = new Regex(#"\{(?<key>\w+)\}");
string formattext = "{name} is {age} years old";
string newStr = regex.Replace(formattext,
match=>replacements[match.Groups[1].Captures[0].Value]);

After thinking about this, I realized what I actually wished for, was that String.Format() would take an IDictionary as argument, and that templates could be written using names instead of indexes.
For string substitutions with lots of possible keys/values, the index numbers result in illegible string templates - and in some cases, you may not even know which items are going to have what number, so I came up with the following extension:
https://gist.github.com/896724
Basically this lets you use string templates with names instead of numbers, and a dictionary instead of an array, and lets you have all the other good features of String.Format(), allowing the use of a custom IFormatProvider, if needed, and allowing the use of all the usual formatting syntax - precision, length, etc.
The example provided in the reference material for String.Format is a great example of how templates with many numbered items become completely illegible - porting that example to use this new extension method, you get something like this:
var replacements = new Dictionary<String, object>()
{
{ "date1", new DateTime(2009, 7, 1) },
{ "hiTime", new TimeSpan(14, 17, 32) },
{ "hiTemp", 62.1m },
{ "loTime", new TimeSpan(3, 16, 10) },
{ "loTemp", 54.8m }
};
var template =
"Temperature on {date1:d}:\n{hiTime,11}: {hiTemp} degrees (hi)\n{loTime,11}: {loTemp} degrees (lo)";
var result = template.Subtitute(replacements);
As someone pointed out, if what you're writing needs to be highly optimized, don't use something like this - if you have to format millions of strings this way, in a loop, the memory and performance overhead could be significant.
On the other hand, if you're concerned about writing legible, maintainable code - and if you're doing, say, a bunch of database operations, in the grand scheme of things, this function will not add any significant overhead.
...
For convenience, I did attempt to add a method that would accept an anonymous object instead of a dictionary:
public static String Substitute(this String template, object obj)
{
return Substitute(
template,
obj.GetType().GetProperties().ToDictionary(p => p.Name, p => p.GetValue(obj, null))
);
}
For some reason, this doesn't work - passing an anonymous object like new { name: "value" } to that extension method gives a compile-time error message saying the best match was the IDictionary version of that method. Not sure how to fix that. (anyone?)

How about
stringVar = "Hello, {0}. How are you doing?";
arg1 = "John"; // or args[0]
String.Format(stringVar, arg1)
You can even have multiple args, just increment the {x} and add another parameter to the Format() method. Not sure the different but both "string" and "String" have this method.

A compiled regex might do the trick , especially if there are many tokens to be replaced. If there are just a handful of them and performance is key, I would simply find the token by index and replace using string functions. Believe it or not this will be faster than a regex.

Try using StringTemplate. It's much more powerful than that, but it does the job flawless.

or try this with Linq if you have all your replace values stored in a Dictionary obj.
For example:
Dictionary<string,string> dict = new Dictionary<string,string>();
dict.add("replace1","newVal1");
dict.add("replace2","newVal2");
dict.add("replace3","newVal3");
var newstr = dict.Aggregate(str, (current, value) => current.Replace(value.Key, value.Value));
dict is your search-replace pairs defined Dictionary object.
str is your string which you need to do some replacements with.

I would go for the mindplay.dk solution... Works quite well.
And, with a slight modification, it supports templates-of-templates, like
"Hi {name}, do you like {0}?", replacing {name} but retaining {0}:
In the given source (https://gist.github.com/896724), replace as follows:
var format = Pattern.Replace(
template,
match =>
{
var name = match.Groups[1].Captures[0].Value;
if (!int.TryParse(name, out parsedInt))
{
if (!map.ContainsKey(name))
{
map[name] = map.Count;
list.Add(dictionary.ContainsKey(name) ? dictionary[name] : null);
}
return "{" + map[name] + match.Groups[2].Captures[0].Value + "}";
}
else return "{{" + name + "}}";
}
);
Furthermore, it supports a length ({name,30}) as well as a formatspecifier, or a combination of both.

UPDATE for 2022 (for both .NET 4.8 and .NET 6):
Especially when multi-line string templates are needed, C# 6 now offers us both $ and # used together like:
(You just need to escape quotes by replacing " with "")
string name = "Mike";
int age = 20 + 14; // 34
string product = "milk";
var htmlTemplateContent = $#"
<!DOCTYPE html>
<html>
<head>
<meta charset=""utf-8"" />
<title>Sample HTML page</title>
</head>
<body>
Hi {name}, now that you're {age.ToString()}, how do you like {product}?
</body>
</html>";

Related

Replace string with regular expression and my own parameter

In my html I've serval token like this:
{PROP_1_1}, {PROP_1_2}, {PROP_37871_1} ...
Actually I replace that token with the following code:
htmlBuffer = htmlBuffer.Replace("{PROP_" + prop.PropertyID + "_1}", prop.PropertyDefaultHtml);
where prop is a custom object. But in this case it affects only the tokens ending with '_1'. I would like to propagate this logic to all the rest ending up with '_X' where X is numeric.
How could I implement a regexp pattern to achieve this?
You can use Regex.Replace():
Regex rgx = new Regex("{PROP_" + prop.PropertyID + "_\d+}");
htmlBuffer = rgx.Replace(htmlBuffer, prop.PropertyDefaultHtml);
You can do even better, you can catch both identifiers in a regular expression. That way you can loop through the references that exist in the string and get the properties for those, instead of looping through all the properties that you have and check if there is any reference for them in the string.
Example:
htmlBuffer = Regex.Replace(htmlBuffer, #"{PROP_(\d+)_(\d+)}", m => {
int id = Int32.Parse(m.Groups[1].Value);
int suffix = Int32.Parse(m.Groups[2].Value);
return properties[id].GetValue(suffix);
});

Find/parse server-side <?abc?>-like tags in html document

I guess I need some regex help. I want to find all tags like <?abc?> so that I can replace it with whatever the results are for the code ran inside. I just need help regexing the tag/code string, not parsing the code inside :p.
<b><?abc print 'test' ?></b> would result in <b>test</b>
Edit: Not specifically but in general, matching (<?[chars] (code group) ?>)
This will build up a new copy of the string source, replacing <?abc code?> with the result of process(code)
Regex abcTagRegex = new Regex(#"\<\?abc(?<code>.*?)\?>");
StringBuilder newSource = new StringBuilder();
int curPos = 0;
foreach (Match abcTagMatch in abcTagRegex.Matches(source)) {
string code = abcTagMatch.Groups["code"].Value;
string result = process(code);
newSource.Append(source.Substring(curPos, abcTagMatch.Index));
newSource.Append(result);
curPos = abcTagMatch.Index + abcTagMatch.Length;
}
newSource.Append(source.Substring(curPos));
source = newSource.ToString();
N.B. I've not been able to test this code, so some of the functions may be slightly the wrong name, or there may be some off-by-one errors.
var new Regex(#"<\?(\w+) (\w+) (.+?)\?>")
This will take this source
<b><?abc print 'test' ?></b>
and break it up like this:
Value: <?abc print 'test' ?>
SubMatch: abc
SubMatch: print
SubMatch: 'test'
These can then be sent to a method that can handle it differently depending on what the parts are.
If you need more advanced syntax handling you need to go beyond regex I believe.
I designed a template engine using Antlr but thats way more complex ;)
exp = new Regex(#"<\?abc print'(.+)' \?>");
str = exp.Replace(str, "$1")
Something like this should do the trick. Change the regexes how you see fit

.NET String parsing performance improvement - Possible Code Smell

The code below is designed to take a string in and remove any of a set of arbitrary words that are considered non-essential to a search phrase.
I didn't write the code, but need to incorporate it into something else. It works, and that's good, but it just feels wrong to me. However, I can't seem to get my head outside the box that this method has created to think of another approach.
Maybe I'm just making it more complicated than it needs to be, but I feel like this might be cleaner with a different technique, perhaps by using LINQ.
I would welcome any suggestions; including the suggestion that I'm over thinking it and that the existing code is perfectly clear, concise and performant.
So, here's the code:
private string RemoveNonEssentialWords(string phrase)
{
//This array is being created manually for demo purposes. In production code it's passed in from elsewhere.
string[] nonessentials = {"left", "right", "acute", "chronic", "excessive", "extensive",
"upper", "lower", "complete", "partial", "subacute", "severe",
"moderate", "total", "small", "large", "minor", "multiple", "early",
"major", "bilateral", "progressive"};
int index = -1;
for (int i = 0; i < nonessentials.Length; i++)
{
index = phrase.ToLower().IndexOf(nonessentials[i]);
while (index >= 0)
{
phrase = phrase.Remove(index, nonessentials[i].Length);
phrase = phrase.Trim().Replace(" ", " ");
index = phrase.IndexOf(nonessentials[i]);
}
}
return phrase;
}
Thanks in advance for your help.
Cheers,
Steve
This appears to be an algorithm for removing stop words from a search phrase.
Here's one thought: If this is in fact being used for a search, do you need the resulting phrase to be a perfect representation of the original (with all original whitespace intact), but with stop words removed, or can it be "close enough" so that the results are still effectively the same?
One approach would be to tokenize the phrase (using the approach of your choice - could be a regex, I'll use a simple split) and then reassemble it with the stop words removed. Example:
public static string RemoveStopWords(string phrase, IEnumerable<string> stop)
{
var tokens = Tokenize(phrase);
var filteredTokens = tokens.Where(s => !stop.Contains(s));
return string.Join(" ", filteredTokens.ToArray());
}
public static IEnumerable<string> Tokenize(string phrase)
{
return string.Split(phrase, ' ');
// Or use a regex, such as:
// return Regex.Split(phrase, #"\W+");
}
This won't give you exactly the same result, but I'll bet that it's close enough and it will definitely run a lot more efficiently. Actual search engines use an approach similar to this, since everything is indexed and searched at the word level, not the character level.
I guess your code is not doing what you want it to do anyway. "moderated" would be converted to "d" if I'm right. To get a good solution you have to specify your requirements a bit more detailed. I would probably use Replace or regular expressions.
I would use a regular expression (created inside the function) for this task. I think it would be capable of doing all the processing at once without having to make multiple passes through the string or having to create multiple intermediate strings.
private string RemoveNonEssentialWords(string phrase)
{
return Regex.Replace(phrase, // input
#"\b(" + String.Join("|", nonessentials) + #")\b", // pattern
"", // replacement
RegexOptions.IgnoreCase)
.Replace(" ", " ");
}
The \b at the beginning and end of the pattern makes sure that the match is on a boundary between alphanumeric and non-alphanumeric characters. In other words, it will not match just part of the word, like your sample code does.
Yeah, that smells.
I like little state machines for parsing, they can be self-contained inside a method using lists of delegates, looping through the characters in the input and sending each one through the state functions (which I have return the next state function based on the examined character).
For performance I would flush out whole words to a string builder after I've hit a separating character and checked the word against the list (might use a hash set for that)
I would create A Hash table of Removed words parse each word if in the hash remove it only one time through the array and I believe that creating a has table is O(n).
How does this look?
foreach (string nonEssent in nonessentials)
{
phrase.Replace(nonEssent, String.Empty);
}
phrase.Replace(" ", " ");
If you want to go the Regex route, you could do it like this. If you're going for speed it's worth a try and you can compare/contrast with other methods:
Start by creating a Regex from the array input. Something like:
var regexString = "\\b(" + string.Join("|", nonessentials) + ")\\b";
That will result in something like:
\b(left|right|chronic)\b
Then create a Regex object to do the find/replace:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(regexString, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Then you can just do a Replace like so:
string fixedPhrase = regex.Replace(phrase, "");

Remove all problematic characters in an intelligent way in C#

Is there any .Net library to remove all problematic characters of a string and only leave alphanumeric, hyphen and underscore (or similar subset) in an intelligent way? This is for using in URLs, file names, etc.
I'm looking for something similar to stringex which can do the following:
A simple prelude
"simple English".to_url =>
"simple-english"
"it's nothing at all".to_url =>
"its-nothing-at-all"
"rock & roll".to_url =>
"rock-and-roll"
Let's show off
"$12 worth of Ruby power".to_url =>
"12-dollars-worth-of-ruby-power"
"10% off if you act now".to_url =>
"10-percent-off-if-you-act-now"
You don't even wanna trust Iconv for this next part
"kick it en Français".to_url =>
"kick-it-en-francais"
"rock it Español style".to_url =>
"rock-it-espanol-style"
"tell your readers 你好".to_url =>
"tell-your-readers-ni-hao"
You can try this
string str = phrase.ToLower(); //optional
str = str.Trim();
str = Regex.Replace(str, #"[^a-z0-9\s_]", ""); // invalid chars
str = Regex.Replace(str, #"\s+", " ").Trim(); // convert multiple spaces into one space
str = str.Substring(0, str.Length <= 400 ? str.Length : 400).Trim(); // cut and trim it
str = Regex.Replace(str, #"\s", "-");
Perhaps this question here can help you on your way. It gives you code on how Stackoverflow generates its url's (more specifically, how question names are turned into nice urls.
Link to Question here, where Jeff Atwood shows their code
From your examples, the closest thing I've found (although I don't think it does everything that you're after) is:
My Favorite String Extension Methods in C#
and also:
ÜberUtils - Part 3 : Strings
Since neither of these solutions will give you exactly what you're after (going from the examples in your question) and assuming that the goal here is to make your string "safe", I'd second Hogan's advice and go with Microsoft's Anti Cross Site Scripting Library, or at least use that as a basis for something that you create yourself, perhaps deriving from the library.
Here's a link to a class that builds a number of string extension methods (like the first two examples) but leverages Microsoft's AntiXSS Library:
Extension Methods for AntiXss
Of course, you can always combine the algorithms (or similar ones) used within the AntiXSS library with the kind of algorithms that are often used in websites to generate "slug" URL's (much like Stack Overflow and many blog platforms do).
Here's an example of a good C# slug generator:
Improved C# Slug Generator
You could use HTTPUtility.UrlEncode, but that would encode everything, and not replace or remove problematic characters. So your spaces would be + and ' would be encoded as well. Not a solution, but maybe a starting point
If the goal is to make the string "safe" I recommend Mirosoft's anti-xss libary
There will be no library capable of what you want since you are stating specific rules that you want applied, e.g. $x => x-dollars, x% => x-percent. You will almost certainly have to write your own method to acheive this. It shouldn't be too difficult. A string extension method and use of one or more Regex's for making the replacements would probably be quite a nice concise way of doing it.
e.g.
public static string ToUrl(this string text)
{
return text.Trim().Regex.Replace(text, ..., ...);
}
Something the Ruby version doesn't make clear (but the original Perl version does) is that the algorithm it's using to transliterate non-Roman characters is deliberately simplistic -- "better than nothing" in both senses. For example, while it does have a limited capability to transliterate Chinese characters, this is entirely context-insensitive -- so if you feed it Japanese text then you get gibberish out.
The advantage of this simplistic nature is that it's pretty trivial to implement. You just have a big table of Unicode characters and their corresponding ASCII "equivalents". You could pull this straight from the Perl (or Ruby) source code if you decide to implement this functionality yourself.
I'm using something like this in my blog.
public class Post
{
public string Subject { get; set; }
public string ResolveSubjectForUrl()
{
return Regex.Replace(Regex.Replace(this.Subject.ToLower(), "[^\\w]", "-"), "[-]{2,}", "-");
}
}
I couldn't find any library that does it, like in Ruby, so I ended writing my own method. This is it in case anyone cares:
/// <summary>
/// Turn a string into something that's URL and Google friendly.
/// </summary>
/// <param name="str"></param>
/// <returns></returns>
public static string ForUrl(this string str) {
return str.ForUrl(true);
}
public static string ForUrl(this string str, bool MakeLowerCase) {
// Go to lowercase.
if (MakeLowerCase) {
str = str.ToLower();
}
// Replace accented characters for the closest ones:
char[] from = "ÂÃÄÀÁÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ".ToCharArray();
char[] to = "AAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaceeeeiiiidnoooooouuuuyy".ToCharArray();
for (int i = 0; i < from.Length; i++) {
str = str.Replace(from[i], to[i]);
}
// Thorn http://en.wikipedia.org/wiki/%C3%9E
str = str.Replace("Þ", "TH");
str = str.Replace("þ", "th");
// Eszett http://en.wikipedia.org/wiki/%C3%9F
str = str.Replace("ß", "ss");
// AE http://en.wikipedia.org/wiki/%C3%86
str = str.Replace("Æ", "AE");
str = str.Replace("æ", "ae");
// Esperanto http://en.wikipedia.org/wiki/Esperanto_orthography
from = "ĈĜĤĴŜŬĉĝĥĵŝŭ".ToCharArray();
to = "CXGXHXJXSXUXcxgxhxjxsxux".ToCharArray();
for (int i = 0; i < from.Length; i++) {
str = str.Replace(from[i].ToString(), "{0}{1}".Args(to[i*2], to[i*2+1]));
}
// Currencies.
str = new Regex(#"([¢€£\$])([0-9\.,]+)").Replace(str, #"$2 $1");
str = str.Replace("¢", "cents");
str = str.Replace("€", "euros");
str = str.Replace("£", "pounds");
str = str.Replace("$", "dollars");
// Ands
str = str.Replace("&", " and ");
// More aesthetically pleasing contractions
str = str.Replace("'", "");
str = str.Replace("’", "");
// Except alphanumeric, everything else is a dash.
str = new Regex(#"[^A-Za-z0-9-]").Replace(str, "-");
// Remove dashes at the begining or end.
str = str.Trim("-".ToCharArray());
// Compact duplicated dashes.
str = new Regex("-+").Replace(str, "-");
// Let's url-encode just in case.
return str.UrlEncode();
}

Highlight a list of words using a regular expression in c#

I have some site content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression which will allow me to replace all of the recognised abbreviations found in the content with some markup.
For example:
content: This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.
abbreviations: memb = Member; deb = Debut;
result: This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up.
[a title="Debut"]Deb[/a] of course should also be caught here.
(This is just example markup for simplicity).
Thanks.
EDIT:
CraigD's answer is nearly there, but there are issues. I only want to match whole words. I also want to keep the correct capitalisation of each word replaced, so that deb is still deb, and Deb is still Deb as per the original text. For example, this input:
This is just a little test of the memb.
And another memb, but not amemba.
Deb of course should also be caught here.deb!
First you would need to Regex.Escape() all the input strings.
Then you can look for them in the string, and iteratively replace them by the markup you have in mind:
string abbr = "memb";
string word = "Member";
string pattern = String.Format("\b{0}\b", Regex.Escape(abbr));
string substitue = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
string output = Regex.Replace(input, pattern, substitue);
EDIT: I asked if a simple String.Replace() wouldn't be enough - but I can see why regex is desirable: you can use it to enforce "whole word" replacements only by making a pattern that uses word boundary anchors.
You can go as far as building a single pattern from all your escaped input strings, like this:
\b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b
and then using a match evaluator to find the right replacement. This way you can avoid iterating the input string more than once.
Not sure how well this will scale to a big word list, but I think it should give the output you want (although in your question the 'result' seems identical to 'content')?
Anyway, let me know if this is what you're after
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var input = #"This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.";
var dictionary = new Dictionary<string,string>
{
{"memb", "Member"}
,{"deb","Debut"}
};
var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
foreach (Match metamatch in Regex.Matches(input
, regex /*#"(memb)|(deb)"*/
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
{
input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
}
Console.Write (input);
Console.ReadLine();
}
}
}
For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving correct casing for matched strings.
public partial class Abbreviations : System.Web.UI.UserControl
{
private Dictionary<String, String> dictionary = DataHelper.GetAbbreviations();
protected void Page_Load(object sender, EventArgs e)
{
string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
litContent.Text = input;
}
private string GetExplanationMarkup(Match m)
{
return string.Format("<b title='{0}'>{1}</b>", dictionary[m.Value.ToLower()], m.Value);
}
}
The output looks like this (below). Note that it only matches full words, and that the casing is preserved from the original string:
This is just a little test of the <b title='Member'>memb</b>. And another <b title='Member'>memb</b>, but not amemba to see if it gets picked up. <b title='Debut'>Deb</b> of course should also be caught here.<b title='Debut'>deb</b>!
I doubt it will perform better than just doing normal string.replace, so if performance is critical measure (refactoring a bit to use a compiled regex). You can do the regex version as:
var abbrsWithPipes = "(abbr1|abbr2)";
var regex = new Regex(abbrsWithPipes);
return regex.Replace(html, m => GetReplaceForAbbr(m.Value));
You need to implement GetReplaceForAbbr, which receives the specific abbr being matched.
I'm doing pretty exactly what you're looking for in my application and this works for me:
the parameter str is your content:
public static string GetGlossaryString(string str)
{
List<string> glossaryWords = GetGlossaryItems();//this collection would contain your abbreviations; you could just make it a Dictionary so you can have the abbreviation-full term pairs and use them in the loop below
str = string.Format(" {0} ", str);//quick and dirty way to also search the first and last word in the content.
foreach (string word in glossaryWords)
str = Regex.Replace(str, "([\\W])(" + word + ")([\\W])", "$1<span class='glossaryItem'>$2</span>$3", RegexOptions.IgnoreCase);
return str.Trim();
}

Categories