Simple regex - replace quote information in forum topic in C#

This should be pretty straightforward I would think.
I have this string:
[quote=Joe Johnson|1]Hi![/quote]
Which should be replaced with something like
<div class="quote">Hi!<div>JoeJohnson</div></div>
I'm pretty sure this is not going very well. So far I have this:
Regex regexQuote = new Regex(@"\[quote\=(.*?)\|(.*?)\](.*?)\[\/quote\]");
Can anyone point me in the right direction?
Any help appreciated!

Try this:
string pattern = @"\[quote=(.*?)\|(\d+)\]([\s\S]*?)\[/quote\]";
string replacement =
@"<div class=""quote"">$3<div>$1</div></div>";
Console.WriteLine(
Regex.Replace(input, pattern, replacement));

Why didn't you say you wanted to deal with nested tags as well?
I've barely ever worked with regex, but here's the thing:
static string ReplaceQuoteTags(string input)
{
const string closeTag = @"[/quote]";
const string pattern = @"\[quote=(.*?)\|(\d+?)\](.*?)\[/quote\]"; //or whatever you prefer
const string replacement = @"<div class=""quote"">{0}<div>{2}</div></div>";
int searchStartIndex = 0;
int closeTagIndex = input.IndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
while (closeTagIndex > -1)
{
Regex r = new Regex(pattern, RegexOptions.RightToLeft | RegexOptions.IgnoreCase);
bool found = false;
input = r.Replace(input,
x =>
{
found = true;
return string.Format(replacement, x.Groups[3], x.Groups[2], x.Groups[1]);
}
, 1, closeTagIndex + closeTag.Length);
if (!found)
{
searchStartIndex = closeTagIndex + closeTag.Length;
//in case there is a close tag without a proper corresponding open tag.
}
closeTagIndex = input.IndexOf(closeTag, searchStartIndex, StringComparison.OrdinalIgnoreCase);
}
return input;
}

This should be your regex in .NET:
\[quote\=(?<name>(.*))\|(?<id>(.*))\](?<content>(.*))\[\/quote\]
// Match once against the input string, then read the named groups.
Match match = regexQuote.Match(input);
string name = match.Groups["name"].Value;
string id = match.Groups["id"].Value;
//..
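For nested quotes, one regex-only option is to convert the innermost quotes first and repeat until nothing changes. The following is only a sketch under assumptions: the class and method names (QuoteConverter, ReplaceQuotes) are made up for illustration, the post id is assumed to be numeric, and the body is inserted as-is without any HTML encoding.

```csharp
using System.Text.RegularExpressions;

public static class QuoteConverter
{
    // Matches a quote block whose body contains no further "[quote=" opener,
    // i.e. an innermost quote. Group 1 = author, 2 = post id, 3 = body.
    private static readonly Regex InnermostQuote = new Regex(
        @"\[quote=([^|\]]*)\|(\d+)\]((?:(?!\[quote=)[\s\S])*?)\[/quote\]",
        RegexOptions.IgnoreCase);

    public static string ReplaceQuotes(string input)
    {
        string previous;
        do
        {
            previous = input;
            // Each pass converts the current innermost quotes; outer quotes
            // become innermost on the next pass.
            input = InnermostQuote.Replace(
                input, "<div class=\"quote\">$3<div>$1</div></div>");
        } while (input != previous);
        return input;
    }
}
```

Called as QuoteConverter.ReplaceQuotes("[quote=Joe Johnson|1]Hi![/quote]"), this keeps the author name exactly as written in the tag, including the space.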

Related

Remove trailing pipes - '|' in c#

I have a string that looks something like this:
"PID||000000|Z123345|23345|SOMEONE^FIRSTNAME^^^MISS^||150|F|1111||1 DREYFUS CLOSE^SOUTH CITY^COUNTY^^POST CODE^^^||0123 45678910^PRN^PH^^^^0123 45678910^^~^^CP^^^^^^~^NET^^^^^^^||||1A|||||A||||||||N||||||||||";
I am trying to remove any separating '|' characters after the 30th '|' in the string so that the output string looks like this:
"PID||000000|Z123345|23345|SOMEONE^FIRSTNAME^^^MISS^||150|F|1111||1 DREYFUS CLOSE^SOUTH CITY^COUNTY^^POST CODE^^^||0123 45678910^PRN^PH^^^^0123 45678910^^~^^CP^^^^^^~^NET^^^^^^^||||1A|||||A||||||||N";
I am trying to do it using as little code as possible, but not having much luck. Any help or ideas would be great.
You can use the TrimEnd method
string text = "stuff||||N||||||||||";
string result = text.TrimEnd('|'); //Result is stuff||||N
Brute force but only a little bit of code:
string s2 = string.Join("|", s1.Split('|').Take(31));
If you need any other processing of this kind of data (it looks like a kind of nested CSV) then string.Split() is useful to know.
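As a rough illustration of the string.Split() idea for this kind of nested data, here is a minimal sketch. It assumes that '|' separates fields and '^' separates components inside a field, which is only a guess from the sample string; SegmentParser and Parse are hypothetical names.

```csharp
using System.Linq;

public static class SegmentParser
{
    // Splits a segment into fields on '|', then each field into
    // components on '^'. Empty fields and components are preserved.
    public static string[][] Parse(string segment)
    {
        return segment
            .Split('|')
            .Select(field => field.Split('^'))
            .ToArray();
    }
}
```

With the input "PID||000000|SOMEONE^FIRSTNAME^^^MISS^", Parse(...)[3][1] is "FIRSTNAME".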
string str = "PID||000000|Z123345|23345|SOMEONE^FIRSTNAME^^^MISS^||150|F|1111||1 DREYFUS CLOSE^SOUTH CITY^COUNTY^^POST CODE^^^||0123 45678910^PRN^PH^^^^0123 45678910^^~^^CP^^^^^^~^NET^^^^^^^||||1A|||||A||||||||N||||||||||";
int c = 0;
int after = 30;
StringBuilder newStr = new StringBuilder();
for (int i = 0; i < str.Length; i++) {
if (str[i] == '|') {
// Keep only the first 30 pipes; later ones are dropped.
if (after != c) {
newStr.Append(str[i]);
c++;
}
} else {
newStr.Append(str[i]);
}
}
results in
newStr == "PID||000000|Z123345|23345|SOMEONE^FIRSTNAME^^^MISS^||150|F|1111||1 DREYFUS CLOSE^SOUTH CITY^COUNTY^^POST CODE^^^||0123 45678910^PRN^PH^^^^0123 45678910^^~^^CP^^^^^^~^NET^^^^^^^||||1A|||||A||||||||N";
A regex should do the trick:
var regex = new Regex(@"^([^\|]*\|){0,30}[^\|]*");
var match = regex.Match(input);
if(match.Success)
{
var val = match.Value;
}
If what you really want is that everything after the 30th chunk loses its '|', then try:
var chunks = input.Split('|');
var output = String.Join("|", chunks.Take(30)) + String.Concat(chunks.Skip(30));
That said, I think it sounds like what you're really looking for is probably something like:
var output = input.TrimEnd('|');
// Get the indexes of all the | characters.
int[] pipeIndexes = Enumerable.Range(0, s.Length).Where(i => s[i] == '|').ToArray();
// If there are more than thirty pipes:
if (pipeIndexes.Length > 30)
{
// The former part of the string remains intact.
string formerPart = s.Substring(0, pipeIndexes[30]);
// The latter part needs to have all | characters removed.
string latterPart = s.Substring(pipeIndexes[30]).Replace("|", "");
s = formerPart + latterPart;
}

Replace named group in regex with value

I want to use a regular expression the same way as string.Format. Let me explain.
I have:
string pattern = "^(?<PREFIX>abc_)(?<ID>[0-9])+(?<POSTFIX>_def)$";
string input = "abc_123_def";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
string replacement = "456";
Console.WriteLine(regex.Replace(input, string.Format("${{PREFIX}}{0}${{POSTFIX}}", replacement)));
This works, but I must provide "input" to regex.Replace, and I do not want that. I want to use the pattern for matching but also for creating strings, the same way as with string.Format, replacing the named group "ID" with a value. Is that possible?
I'm looking for something like:
string pattern = "^(?<PREFIX>abc_)(?<ID>[0-9])+(?<POSTFIX>_def)$";
string result = ReplaceWithFormat(pattern, "ID", 999);
Result will contain "abc_999_def". How to accomplish this?
Yes, it is possible:
public static class RegexExtensions
{
public static string Replace(this string input, Regex regex, string groupName, string replacement)
{
return regex.Replace(input, m =>
{
return ReplaceNamedGroup(input, groupName, replacement, m);
});
}
private static string ReplaceNamedGroup(string input, string groupName, string replacement, Match m)
{
string capture = m.Value;
capture = capture.Remove(m.Groups[groupName].Index - m.Index, m.Groups[groupName].Length);
capture = capture.Insert(m.Groups[groupName].Index - m.Index, replacement);
return capture;
}
}
Usage:
Regex regex = new Regex("^(?<PREFIX>abc_)(?<ID>[0-9]+)(?<POSTFIX>_def)$");
string oldValue = "abc_123_def";
var result = oldValue.Replace(regex, "ID", "456");
Result is: abc_456_def
No, it's not possible to use a regular expression without providing input. It has to have something to work with, the pattern can not add any data to the result, everything has to come from the input or the replacement.
Instead of using String.Format, you can use a lookbehind and a lookahead to specify the part between "abc_" and "_def", and replace it:
string result = Regex.Replace(input, @"(?<=abc_)\d+(?=_def)", "999");
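If you would rather keep the named groups from the question, another option is a match evaluator that rebuilds the match from the groups you want to keep. This is only a sketch: IdReplacer and ReplaceId are illustrative names, and the pattern is the one from the question with the quantifier moved inside the ID group.

```csharp
using System.Text.RegularExpressions;

public static class IdReplacer
{
    private static readonly Regex Pattern = new Regex(
        "^(?<PREFIX>abc_)(?<ID>[0-9]+)(?<POSTFIX>_def)$",
        RegexOptions.IgnoreCase);

    // Keeps PREFIX and POSTFIX from the match and swaps in the new ID.
    // Input that does not match the pattern is returned unchanged.
    public static string ReplaceId(string input, string newId)
    {
        return Pattern.Replace(
            input,
            m => m.Groups["PREFIX"].Value + newId + m.Groups["POSTFIX"].Value);
    }
}
```

Because the new value is concatenated in the evaluator rather than placed in a replacement pattern, characters such as '$' in newId are not treated as substitutions.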
There was a problem in user1817787's answer, and I had to modify the ReplaceNamedGroup function as follows.
private static string ReplaceNamedGroup(string input, string groupName, string replacement, Match m)
{
string capture = m.Value;
capture = capture.Remove(m.Groups[groupName].Index - m.Index, m.Groups[groupName].Length);
capture = capture.Insert(m.Groups[groupName].Index - m.Index, replacement);
return capture;
}
Another edited version of the original method by @user1817787. This one supports multiple instances of the named group, and it also includes a fix similar to the one @Justin posted (it returns the result using {match.Index, match.Length} instead of {0, input.Length}):
public static string ReplaceNamedGroup(
string input, string groupName, string replacement, Match match)
{
var sb = new StringBuilder(input);
var matchStart = match.Index;
var matchLength = match.Length;
var captures = match.Groups[groupName].Captures.OfType<Capture>()
.OrderByDescending(c => c.Index);
foreach (var capt in captures)
{
if (capt == null)
continue;
matchLength += replacement.Length - capt.Length;
sb.Remove(capt.Index, capt.Length);
sb.Insert(capt.Index, replacement);
}
var end = matchStart + matchLength;
sb.Remove(end, sb.Length - end);
sb.Remove(0, matchStart);
return sb.ToString();
}
I shortened ReplaceNamedGroup, still supporting multiple captures.
private static string ReplaceNamedGroup(string input, string groupName, string replacement, Match m)
{
string result = m.Value;
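// Walk the captures from last to first so earlier indexes stay valid when the
// replacement length differs from the capture length (needs using System.Linq).
foreach (Capture cap in m.Groups[groupName].Captures.Cast<Capture>().OrderByDescending(c => c.Index))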
{
result = result.Remove(cap.Index - m.Index, cap.Length);
result = result.Insert(cap.Index - m.Index, replacement);
}
return result;
}
The simple solution is to refer to the matched groups in the replacement: the prefix is $1 and the postfix is $3.
I haven't tested the code below, but it should work similarly to a regex I wrote recently:
string pattern = "^(?<PREFIX>abc_)(?<ID>[0-9])+(?<POSTFIX>_def)$";
string input = "abc_123_def";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
// ${1} and ${3} are braced so the digits of the new value are not read as part of the group number.
string replacement = String.Format("${{1}}{0}${{3}}", "456");
Console.WriteLine(regex.Replace(input, replacement));
In case this helps anyone, I enhanced the answer with the ability to replace multiple named capture groups in one go, which this answer helped massively to achieve.
public static class RegexExtensions
{
public static string Replace(this string input, Regex regex, Dictionary<string, string> captureGroupReplacements)
{
string temp = input;
foreach (var key in captureGroupReplacements.Keys)
{
temp = regex.Replace(temp, m =>
{
return ReplaceNamedGroup(key, captureGroupReplacements[key], m);
});
}
return temp;
}
private static string ReplaceNamedGroup(string groupName, string replacement, Match m)
{
string capture = m.Value;
capture = capture.Remove(m.Groups[groupName].Index - m.Index, m.Groups[groupName].Length);
capture = capture.Insert(m.Groups[groupName].Index - m.Index, replacement);
return capture;
}
}
Usage:
var regex = new Regex(@"C={BasePath:""(?<basePath>[^\""].*)"",ResultHeadersPath:""ResultHeaders"",CORS:(?<cors>true|false)");
content = content.Replace(regex, new Dictionary<string, string>
{
{ "basePath", "www.google.com" },
{ "cors", "false" }
});
All credit should go to user1817787 for this one.
You should check the documentation about RegEx replace here
I created this to replace a named group. I cannot use a solution that loops over all group names because I have cases where not all of the expression is grouped.
public static string ReplaceNamedGroup(this Regex regex, string input, string namedGroup, string value)
{
var replacement = Regex.Replace(regex.ToString(),
@"((?<GroupPrefix>\(\?)\<(?<GroupName>\w*)\>(?<Eval>.[^\)]+)(?<GroupPostfix>\)))",
@"${${GroupName}}").TrimStart('^').TrimEnd('$');
replacement = replacement.Replace("${" + namedGroup + "}", value);
return Regex.Replace(input, regex.ToString(), replacement);
}

Find substring ignoring specified characters

Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters? I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry. I'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it would be great if the solution did not assume that the substring to find is free of the ignored chars, e.g. given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped! This is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
And here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods I'm not including, but I believe they should be self-explanatory (I will add them if you like).
I've taken a lot of your ideas for the implementation and the tests, but I'm giving the answer to @PierrOz because he was one of the first and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions or comments on the current state of the impl. if you like.
in your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
Dynamically you would build the part [-, ] with all the characters to ignore and you would insert this part between all the characters of your query.
Take care of '-' in the class []: put it at the beginning or at the end
So more generically, it would give something like:
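public string Test(string query, string input, char[] ignoreList)
{
// Build a character class of the ignored characters.
string ignorePattern = "[";
for (int i = 0; i < ignoreList.Length; i++)
{
if (ignoreList[i] == '-')
{
// Keep '-' first in the class so it is not read as a range.
ignorePattern = ignorePattern.Insert(1, "-");
}
else
{
ignorePattern += Regex.Escape(ignoreList[i].ToString());
}
}
ignorePattern += "]*";
// Allow any run of ignored characters after each character of the query.
string pattern = "";
for (int i = 0; i < query.Length; i++)
{
pattern += Regex.Escape(query[i].ToString()) + ignorePattern;
}
Regex r = new Regex(pattern);
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
}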
Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}
EDIT: here's an updated solution addressing the points in your recent update. The idea is the same except if you have one substring it will need to insert the ignore pattern between each character. If the substring contains spaces it will split on the spaces and insert the ignore pattern between those words. If you don't have a need for the latter functionality (which was more in line with your original question) then you can remove the Split and if checking that provides that pattern.
Note that this approach is not going to be the most efficient.
string input = @"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(@"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
\W+ matches one or more non-word characters (anything other than letters, digits, and the underscore). If you feel like specifying them yourself, you can replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash character at the start or end to avoid unintended range specifications). Also, if you need case to be ignored, use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is in the form of a complete string, such as "Hello this", you can easily get it into an array form for searchStrings in this way:
string[] searchStrings = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")
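The strip-then-search idea can also return the original substring (with the ignored characters still in it) if you remember where each kept character came from. Here is a minimal sketch of that approach; IgnoringSearch and Find are made-up names, the comparison is ordinal and case-sensitive, and ignored characters trailing the match are not included in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class IgnoringSearch
{
    // Returns the span of input that matches query once ignored characters
    // are skipped, or null when there is no match.
    public static string Find(string input, string query, ISet<char> ignored)
    {
        var stripped = new StringBuilder();
        var originalIndex = new List<int>(); // originalIndex[k] = position in input of stripped[k]
        for (int i = 0; i < input.Length; i++)
        {
            if (ignored.Contains(input[i])) continue;
            stripped.Append(input[i]);
            originalIndex.Add(i);
        }

        // Strip the query as well, so it may itself contain ignored characters.
        var strippedQuery = new StringBuilder();
        foreach (char c in query)
        {
            if (!ignored.Contains(c)) strippedQuery.Append(c);
        }
        if (strippedQuery.Length == 0) return null;

        int at = stripped.ToString().IndexOf(strippedQuery.ToString(), StringComparison.Ordinal);
        if (at < 0) return null;

        // Map the first and last matched characters back to the original string.
        int start = originalIndex[at];
        int end = originalIndex[at + strippedQuery.Length - 1];
        return input.Substring(start, end - start + 1);
    }
}
```

For example, Find("Hello, -this- is a string", "Hello this", new HashSet<char> { ',', '-' }) returns "Hello, -this".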
Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
string searchString = "Hello, -this- is a string";
string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
string desiredString = string.Empty;
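if(searchStringWithoutUnwantedChars.Contains("Hello this"))
{
// Substring takes a start index and a length, so compute the length from both ends.
int start = searchString.IndexOf("Hello");
int end = searchString.IndexOf("this", start) + "this".Length;
desiredString = searchString.Substring(start, end - start);
}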
return desiredString;
}
You could do something like this, since almost all of these answers require rebuilding the string in some form.
string1 is your string you want to look through
//Create a List(Of string) that contains the ignored characters'
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}
You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C\\)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A3C412&
was matched

Search keyword highlight in ASP.Net

I am outputting a list of search results for a given string of keywords, and I want any matching keywords in my search results to be highlighted. Each word should be wrapped in a span or similar. I am looking for an efficient function to do this.
E.g.
Keywords: "lorem ipsum"
Result: "Some text containing lorem and ipsum"
Desired HTML output: "Some text containing <span class="hit">lorem</span> and <span class="hit">ipsum</span>"
My results are case insensitive.
Here's what I've decided on. An extension function that I can call on the relevant strings within my page / section of my page:
public static string HighlightKeywords(this string input, string keywords)
{
if (input == string.Empty || keywords == string.Empty)
{
return input;
}
string[] sKeywords = keywords.Split(' ');
foreach (string sKeyword in sKeywords)
{
try
{
input = Regex.Replace(input, sKeyword, string.Format("<span class=\"hit\">{0}</span>", "$0"), RegexOptions.IgnoreCase);
}
catch
{
//
}
}
return input;
}
Any further suggestions or comments?
try highlighter from Lucene.net
http://incubator.apache.org/lucene.net/docs/2.0/Highlighter.Net/Lucene.Net.Highlight.html
How to use:
http://davidpodhola.blogspot.com/2008/02/how-to-highlight-phrase-on-results-from.html
EDIT:
Since the Lucene.net highlighter is not suitable here, a new link:
http://mhinze.com/archive/search-term-highlighter-httpmodule/
Use the jquery highlight plugin.
For highlighting it at server side
protected override void Render( HtmlTextWriter writer )
{
StringBuilder html = new StringBuilder();
HtmlTextWriter w = new HtmlTextWriter( new StringWriter( html ) );
base.Render( w );
html.Replace( "lorem", "<span class=\"hit\">lorem</span>" );
writer.Write( html.ToString() );
}
You can use regular expressions for advanced text replacing.
You can also write the above code in an HttpModule so that it can be re used in other applications.
An extension to the answer above (I don't have enough reputation to comment).
To avoid the inserted span markup being matched when the search criteria are [span pan an a], each found word is first replaced with a placeholder and then replaced back. Not very efficient, though.
public string Highlight(string input)
{
if (input == string.Empty || searchQuery == string.Empty)
{
return input;
}
string[] sKeywords = searchQuery.Replace("~",String.Empty).Replace("  "," ").Trim().Split(' ');
int totalCount = sKeywords.Length + 1;
string[] sHighlights = new string[totalCount];
int count = 0;
input = Regex.Replace(input, Regex.Escape(searchQuery.Trim()), string.Format("~{0}~", count), RegexOptions.IgnoreCase);
sHighlights[count] = string.Format("<span class=\"highlight\">{0}</span>", searchQuery);
foreach (string sKeyword in sKeywords.OrderByDescending(s => s.Length))
{
count++;
input = Regex.Replace(input, Regex.Escape(sKeyword), string.Format("~{0}~", count), RegexOptions.IgnoreCase);
sHighlights[count] = string.Format("<span class=\"highlight\">{0}</span>", sKeyword);
}
for (int i = totalCount - 1; i >= 0; i--)
{
input = Regex.Replace(input, "\\~" + i + "\\~", sHighlights[i], RegexOptions.IgnoreCase);
}
return input;
}
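Another way to avoid re-matching the inserted markup is to highlight all keywords in a single pass with one alternation pattern. This is just a sketch, assuming that keywords are space-separated and that the input is plain text that has not been HTML-encoded yet; KeywordHighlighter and Highlight are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    public static string Highlight(string input, string keywords)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrWhiteSpace(keywords))
        {
            return input;
        }

        // Escape each keyword and try longer ones first, so "span" wins
        // over "pan" when both could match at the same position.
        var alternatives = keywords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k));

        string pattern = string.Join("|", alternatives);

        // One pass over the original text: inserted markup is never rescanned,
        // and m.Value keeps the original casing of the hit.
        return Regex.Replace(
            input,
            pattern,
            m => "<span class=\"hit\">" + m.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Usage would be something like KeywordHighlighter.Highlight("Some text containing lorem and ipsum", "lorem ipsum").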

Replace all Special Characters in a string IN C#

I would like to find all special characters in a string and replace with a Hyphen (-)
I am using the below code
string content = "foo,bar,(regular expression replace) 123";
string pattern = "[^a-zA-Z]"; //regex pattern
string result = System.Text.RegularExpressions.Regex.Replace(content,pattern, "-");
Output
foo-bar--regular-expression-replace----
I am getting multiple occurrences of the hyphen (---) in the output.
I would like to get something like this
foo-bar-regular-expression-replace
How do I achieve this
Any help would be appreciated
Thanks
Deepu
why not just do this:
public static string ToSlug(this string text)
{
StringBuilder sb = new StringBuilder();
var lastWasInvalid = false;
foreach (char c in text)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(c);
lastWasInvalid = false;
}
else
{
if (!lastWasInvalid)
sb.Append("-");
lastWasInvalid = true;
}
}
return sb.ToString().ToLowerInvariant().Trim();
}
Try the pattern: "[^a-zA-Z]+" - i.e. replace one-or-more non-alpha (you might allow numeric, though?).
Wouldn't this work?
string pattern = "[^a-zA-Z]+";
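Note that the + alone still leaves a single hyphen at the end of the sample string, because it ends in non-letters. A small sketch that also trims leading and trailing hyphens (Slugifier and Hyphenate are just illustrative names, and digits are treated as special characters, as in the question's pattern):

```csharp
using System.Text.RegularExpressions;

public static class Slugifier
{
    public static string Hyphenate(string content)
    {
        // Collapse every run of non-letters into a single hyphen...
        string collapsed = Regex.Replace(content, "[^a-zA-Z]+", "-");
        // ...then drop hyphens left over at either end.
        return collapsed.Trim('-');
    }
}
```

With the string from the question, Slugifier.Hyphenate("foo,bar,(regular expression replace) 123") gives foo-bar-regular-expression-replace.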
