I need to resolve a huge load of placeholders (about 250) in a plain text.
A placeholder is defined as %ThisIsAPlaceholder%, an example would be %EmailSender%.
Now it gets a bit tricky: the code should handle case-insensitive placeholders too, so %EmailSender%, %EMAILSENDER% and %emailsender% are the same placeholder. I think that's where it gets complicated.
My first approach was something like:
public string ResolvePlaceholders(string text)
{
var placeholders = new List<string>
{
"%EmailSender%",
"%ErrorMessage%",
"%ActiveUser%"
};
var resolvedText = text;
foreach(var placeholder in placeholders)
{
if(!resolvedText.Contains(placeholder))
continue;
var value = GetValueByPlaceholder(placeholder);
resolvedText = resolvedText.Replace(placeholder, value);
}
return resolvedText;
}
But, as you may notice, I can't handle case-insensitive placeholders.
I also check every placeholder to see whether it is used in the text. With more than 200 placeholders in a text of about 10,000 words, I don't think this solution is very fast.
How can this be solved in a better way? A solution that supports case-insensitive placeholders would be appreciated.
A really basic but efficient replacement scheme for your case would be something like this:
private readonly static Regex regex = new Regex("%(?<name>.+?)%");
private static string Replace(string input, ISet<string> replacements)
{
string result = regex.Replace(input, m => {
string name = m.Groups["name"].Value;
if (replacements.Contains(name))
{
return GetValueByPlaceholder(name);
}
else
{
return m.Captures[0].Value;
}
});
return result;
}
public static void Main(string[] args)
{
var replacements = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase)
{
"EmailSender", "ErrorMessage", "ActiveUser"
};
string text = "Hello %ACTIVEUSER%, There is a message from %emailsender%. %errorMessage%";
string result = Replace(text, replacements);
Console.WriteLine(result);
}
It will use a regular expression to go through the input text once. Note that we are getting case-insensitive comparisons via the equality comparer passed to the HashSet that we constructed in Main. Any unrecognized items will be ignored. For more general cases, the Replace method could take a dictionary:
private static string Replace(string input, IDictionary<string, string> replacements)
{
string result = regex.Replace(input, m => {
string name = m.Groups["name"].Value;
string value;
if (replacements.TryGetValue(name, out value))
{
return value;
}
else
{
return m.Captures[0].Value;
}
});
return result;
}
A typical recommendation when matching using quantifiers on input from an untrusted source (e.g. users over the internet) is to specify a match timeout for the regular expression. You would have to catch the RegexMatchTimeoutException that is thrown and do something in that case.
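As a minimal sketch of the timeout recommendation above, the dictionary-based Replace could be wrapped like this. The class name SafePlaceholderReplacer, the one-second limit and the fallback of returning the input unchanged are assumptions made for illustration, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SafePlaceholderReplacer
{
    // The third constructor argument is the match timeout (1 second is an arbitrary choice).
    private static readonly Regex PlaceholderRegex =
        new Regex("%(?<name>.+?)%", RegexOptions.None, TimeSpan.FromSeconds(1));

    public static string Replace(string input, IDictionary<string, string> replacements)
    {
        try
        {
            return PlaceholderRegex.Replace(input, m =>
            {
                string value;
                // Unknown placeholders are left exactly as they appear in the text.
                return replacements.TryGetValue(m.Groups["name"].Value, out value)
                    ? value
                    : m.Value;
            });
        }
        catch (RegexMatchTimeoutException)
        {
            // Assumed fallback: give up and return the text unresolved.
            return input;
        }
    }
}
```

Passing a dictionary built with StringComparer.OrdinalIgnoreCase keeps the lookup case-insensitive, in the same way as the HashSet in Main.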
Regex solution
private static string ReplaceCaseInsensitive(string input, string search, string replacement)
{
string result = Regex.Replace(
input,
Regex.Escape(search),
replacement.Replace("$","$$"),
RegexOptions.IgnoreCase
);
return result;
}
Non regex solution
public static string Replace(this string str, string old, string @new, StringComparison comparison)
{
@new = @new ?? "";
if (string.IsNullOrEmpty(str) || string.IsNullOrEmpty(old) || old.Equals(@new, comparison))
return str;
int foundAt = 0;
// Resume searching just past each replacement so the loop ends even when @new contains old.
while ((foundAt = str.IndexOf(old, foundAt, comparison)) != -1)
{
str = str.Remove(foundAt, old.Length).Insert(foundAt, @new);
foundAt += @new.Length;
}
return str;
}
Seems like a duplicate question / answer
String.Replace ignoring case
I have this function to extract all words from text
public static string[] GetSearchWords(string text)
{
string pattern = @"\S+";
Regex re = new Regex(pattern);
MatchCollection matches = re.Matches(text);
string[] words = new string[matches.Count];
for (int i=0; i<matches.Count; i++)
{
words[i] = matches[i].Value;
}
return words;
}
and I want to exclude a list of words from the returned array. The word list looks like this:
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
How can I modify the above function to avoid returning words that are in my list?
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
return words.Except(ignoredWords).ToArray();
I think the Except method fits your needs.
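One thing to be aware of: Except is a set operation, so it also drops repeated words from the result and compares case-sensitively by default. If duplicates and the original order should be kept, a filter over a case-insensitive set is one alternative. This is only a sketch; WordFilter and ExcludeWords are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Hypothetical helper: removes excluded words but keeps duplicates and word order.
    public static string[] ExcludeWords(string[] words, string excludeCsv)
    {
        // The HashSet gives constant-time, case-insensitive lookups.
        var ignored = new HashSet<string>(excludeCsv.Split(','), StringComparer.OrdinalIgnoreCase);
        return words.Where(w => !ignored.Contains(w)).ToArray();
    }
}
```

For example, WordFilter.ExcludeWords(GetSearchWords(text), strWordsToExclude) would filter the array returned by the function in the question.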
If you aren't forced to use Regex, you can use a little LINQ:
void Main()
{
var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');
string str = "if you read about cooking you can cook";
var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}
string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
var words = text.Split();
return words.Where(word => !toExclude.Contains(word)).ToArray();
}
I'm assuming a word is a series of non-whitespace characters.
Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters? I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry. I'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it would be great if the substring to find could itself contain ignored characters, e.g. given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped! This is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
And here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods that I'm not including, but I believe they should be self-explanatory (I will add them if you like).
I've taken a lot of your ideas for the implementation and the tests, but I'm giving the answer to @PierrOz because he was one of the first and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions, or comments on the current state of the implementation, if you like.
In your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
To do this dynamically, build the [-,]* part from all the characters to ignore and insert it after every character of your query.
Take care with '-' inside the [] class: put it at the beginning or at the end.
More generically, it would look something like this:
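public string Test(string query, string input, char[] ignoreList)
{
// Build a character class such as [,-]* from the ignored characters.
string ignorePattern = "";
bool hasDash = false;
foreach (char c in ignoreList)
{
if (c == '-')
hasDash = true; // appended last so it is not read as a range
else if (c == ']')
ignorePattern += @"\]"; // Regex.Escape does not escape ']'
else
ignorePattern += Regex.Escape(c.ToString());
}
if (hasDash)
ignorePattern += "-";
// An empty class "[]" is not a valid pattern, so only wrap when there is something to ignore.
if (ignorePattern.Length > 0)
ignorePattern = "[" + ignorePattern + "]*";
// Allow any run of ignored characters after each character of the query.
string pattern = "";
foreach (char c in query)
{
pattern += Regex.Escape(c.ToString()) + ignorePattern;
}
Regex r = new Regex(pattern);
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
}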
Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
// Mismatch: restart the scan one character after the failed start.
i = startIndex;
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}
EDIT: here's an updated solution addressing the points in your recent update. The idea is the same, except that for a single substring the ignore pattern is inserted between each character. If the substring contains spaces, it is split on the spaces and the ignore pattern is inserted between the words. If you don't need the latter behaviour (which was more in line with your original question), you can remove the Split and the if check that selects the pattern.
Note that this approach is not going to be the most efficient.
string input = @"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(@"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
\W+ matches one or more non-word characters (anything other than letters, digits and the underscore). If you would rather specify them yourself, replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash at the start or end to avoid unintended range specifications). Also, if you need case to be ignored, use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is a single string, such as "Hello this", you can easily turn it into the array form used for searchStrings like this:
string[] searchStrings = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")
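Stripping the ignored characters first only tells you whether the substring exists. To also recover the original span (ignored characters included), one option is to remember where each kept character came from. The sketch below assumes the query itself contains no ignored characters, so it does not cover the bonus case from the question; IgnoringSearch and FindIgnoring are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class IgnoringSearch
{
    // Returns the part of input that matches query once ignored characters
    // are skipped, or null when there is no match.
    public static string FindIgnoring(string input, string query, ICollection<char> ignored)
    {
        if (string.IsNullOrEmpty(query)) return null;

        var stripped = new StringBuilder();
        var positions = new List<int>(); // positions[k] = index in input of stripped[k]
        for (int i = 0; i < input.Length; i++)
        {
            if (ignored.Contains(input[i])) continue;
            stripped.Append(input[i]);
            positions.Add(i);
        }

        int at = stripped.ToString().IndexOf(query, StringComparison.Ordinal);
        if (at < 0) return null;

        // Map the match in the stripped text back to indices in the original text.
        int start = positions[at];
        int end = positions[at + query.Length - 1];
        return input.Substring(start, end - start + 1);
    }
}
```

The input is scanned once to build the stripped copy, and the index list costs one int per kept character.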
Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
string searchString = "Hello, -this- is a string";
string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
string desiredString = string.Empty;
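if(searchStringWithoutUnwantedChars.Contains("Hello this"))
{
int start = searchString.IndexOf("Hello");
int end = searchString.IndexOf("this") + "this".Length;
// Substring takes a length, not an end index.
desiredString = searchString.Substring(start, end - start);
}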
return desiredString;
}
You could do something like this, since almost all of these answers require rebuilding the string in some form.
myString is the string you want to look through.
//Create a List<string> that contains the ignored characters
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}
You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C\\)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A3C412&
was matched
I would like to find all special characters in a string and replace them with a hyphen (-).
I am using the code below:
string content = "foo,bar,(regular expression replace) 123";
string pattern = "[^a-zA-Z]"; //regex pattern
string result = System.Text.RegularExpressions.Regex.Replace(content,pattern, "-");
Output:
foo-bar--regular-expression-replace----
I am getting multiple consecutive hyphens (---) in the output.
I would like to get something like this:
foo-bar-regular-expression-replace
How do I achieve this?
Any help would be appreciated
Thanks
Deepu
Why not just do this:
public static string ToSlug(this string text)
{
StringBuilder sb = new StringBuilder();
var lastWasInvalid = false;
foreach (char c in text)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(c);
lastWasInvalid = false;
}
else
{
if (!lastWasInvalid)
sb.Append("-");
lastWasInvalid = true;
}
}
return sb.ToString().ToLowerInvariant().Trim('-');
}
Try the pattern: "[^a-zA-Z]+" - i.e. replace one-or-more non-alpha (you might allow numeric, though?).
Wouldn't this work?
string pattern = "[^a-zA-Z]+";
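With the + quantifier each run of non-letters collapses into a single hyphen, but a run at the very start or end of the input still leaves a leading or trailing hyphen. A small sketch that also trims those (HyphenSlug and Hyphenate are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class HyphenSlug
{
    public static string Hyphenate(string content)
    {
        // Replace every run of one or more non-letters with a single hyphen.
        string collapsed = Regex.Replace(content, "[^a-zA-Z]+", "-");
        // Drop the hyphen left behind by a run at either end of the input.
        return collapsed.Trim('-');
    }
}
```

For the input in the question this yields foo-bar-regular-expression-replace. Note that digits are treated as separators by this pattern; use [^a-zA-Z0-9]+ to keep them.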
The input string is something like this:
LineA: 50
LineB: 120
LineA: 12
LineB: 53
I would like to replace the LineB values with a result of MultiplyCalculatorMethod(LineAValue), where LineAValue is the value of the line above LineB and MultiplyCalculatorMethod is my other, complicated C# method.
In semi-code, I would like to do something like this:
int MultiplyCalculatorMethod(int value)
{
return 2 * Math.Max(3,value);
}
string ReplaceValues(string Input)
{
Matches mat = Regex.Match(LineA:input_value\r\nLineB:output_value)
foreach (Match m in mat)
{
m.output_value = MultiplyCalculatorMethod(m.input_value)
}
return m.OutputText;
}
Example:
string Text = "LineA:5\r\nLineB:2\r\nLineA:2\r\nLineB:7";
string Result = ReplaceValues(Text);
//Result = "LineA:5\r\nLineB:10\r\nLineA:2\r\nLineB:6";
I wrote a Regex.Match to match LineA: value\r\nLineB: value and get these values in groups. But when I use Regex.Replace, I can only provide a "static" result that combines groups from the match; I cannot call C# methods there.
So my question is: how can I use Regex.Replace so that the replacement is the result of a C# method whose input is the LineA value?
You can use a MatchEvaluator like this:
public static class Program
{
public static void Main()
{
string input = "LineA:5\r\nLineB:2\r\nLineA:2\r\nLineB:7";
string output = Regex.Replace(input, @"LineA:(?<input_value>\d+)\r\nLineB:\d+", new MatchEvaluator(MatchEvaluator));
Console.WriteLine(output);
}
private static string MatchEvaluator(Match m)
{
int inputValue = Convert.ToInt32(m.Groups["input_value"].Value);
int outputValue = MultiplyCalculatorMethod(inputValue);
return string.Format("LineA:{0}\r\nLineB:{1}", inputValue, outputValue);
}
static int MultiplyCalculatorMethod(int value)
{
return 2 * Math.Max(3, value);
}
}
Try using the following Replace overload:
public static string Replace( string input, string pattern, MatchEvaluator evaluator);
MatchEvaluator has access to Match contents and can call any other methods to return the replacement string.
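Because MatchEvaluator is a delegate, a lambda can be passed directly. The sketch below assumes the LineA:/LineB: layout from the question and captures everything before the LineB value in a group named prefix (a name chosen here for illustration), so only the number is recomputed. LineBReplacer is a hypothetical class name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBReplacer
{
    // Stand-in for the asker's more complicated method.
    static int MultiplyCalculatorMethod(int value)
    {
        return 2 * Math.Max(3, value);
    }

    public static string ReplaceValues(string input)
    {
        return Regex.Replace(
            input,
            @"(?<prefix>LineA:(?<input_value>\d+)\r\nLineB:)\d+",
            // The lambda is converted to a MatchEvaluator; it runs once per match.
            m => m.Groups["prefix"].Value
                 + MultiplyCalculatorMethod(int.Parse(m.Groups["input_value"].Value)));
    }
}
```

Reusing the captured prefix means the LineA text is copied through unchanged instead of being rebuilt with string.Format.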