asp.net regex.replace() - c#

I have the following code to first remove html tags and then highlight the search term within the resulting text:
protected void ListView1_ItemDataBound(object sender, ListViewItemEventArgs e)
{
try
{
// get search value query string
string searchText = Request.QueryString["search"].Trim();
string encodedValue = Server.HtmlEncode(searchText);
Literal Content = e.Item.FindControl("Content") as Literal;
string contentText = Content.Text;
Content.Text = Regex.Replace(contentText, @"<(.|\n)*?>", string.Empty).Replace(encodedValue, "<font class='highlight2'>" + encodedValue + "</font>");
}
catch
{
// do nothing
}
}
This works to a degree but the second replace is not case insensitive. How can I do the second replace also with regex.replace() so case sensitivity is not an issue? Thank you!

Use the Regex.Replace overload that takes a RegexOptions argument. You'll want the RegexOptions.IgnoreCase value.
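A minimal sketch of that overload in use. The class and method names (HighlightSketch, Highlight) are made up for illustration, and it assumes the search term should be matched literally rather than as a pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HighlightSketch
{
    // Wraps every case-insensitive occurrence of term in a highlight tag.
    // Regex.Escape makes the term match literally; $& re-inserts the text
    // that actually matched, so the casing found in the content is kept.
    public static string Highlight(string text, string term)
    {
        if (string.IsNullOrEmpty(term))
        {
            return text; // nothing to highlight
        }

        return Regex.Replace(
            text,
            Regex.Escape(term),
            "<font class='highlight2'>$&</font>",
            RegexOptions.IgnoreCase);
    }
}
```

Calling HighlightSketch.Highlight("Hello hello", "HELLO") wraps both words and leaves their original casing alone.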

First let's talk about the regex you're using to remove the tags, <(.|\n)*?>. If you want the dot to match anything including a newline, you should use Singleline mode. It's also known as DOTALL mode in some flavors, because that's what it does: allows the dot to match newlines. You can use the RegexOptions.Singleline flag for that, or embed it in the regex with an inline modifier:
`(?s)<.*?>`
This is still pretty fragile, but I'll leave it at that because there's no way to make it bulletproof; regexes and HTML are fundamentally incompatible.
As for the second replacement, the first thing you need to do is break up those chained method calls--in fact, I would say they never should have been chained. Feeding the result of a Regex.Replace directly to String.Replace is either an error or excessively clever. In either case, you have to split them up if you want to call Regex.Replace twice.
You also need to escape any regex metacharacters in the search expression, assuming you still want to do a literal search and not a regex search. You can use the Regex.Escape method for that.
string searchText = Request.QueryString["search"].Trim();
string encodedValue = Server.HtmlEncode(searchText);
string escapedValue = Regex.Escape(encodedValue);
string contentText = Content.Text;
contentText = Regex.Replace(contentText, @"(?s)<.*?>", string.Empty);
contentText = Regex.Replace(contentText, escapedValue,
"<font class='highlight2'>$&</font>", RegexOptions.IgnoreCase);
Content.Text = contentText;
There are a few other things in your code that don't seem right to me (like why you seem to be permanently removing all the tags), but I'm trying to stay focused on your actual question. To that end, I've tried to make the minimum necessary changes in the code to illustrate my answer. But there's one more thing I just have to comment on:
catch
{
// do nothing
}
Don't do that. At the very least, send an error message to the console or rethrow the exception for the calling code to deal with, but never silently swallow them.
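As a rough sketch of the alternative, here is one way to report the failure and still let the caller see it. The helper name (SafeCall.Run) is hypothetical, and writing to Console.Error is just a stand-in for whatever logging the application really uses:

```csharp
using System;

public static class SafeCall
{
    // Runs the action, reports any failure to the error stream, then
    // rethrows so the calling code can still decide how to handle it.
    public static void Run(Action action)
    {
        try
        {
            action();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Highlighting failed: " + ex);
            throw; // a bare throw preserves the original stack trace
        }
    }
}
```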

Related

Regex in C# - remove quotes and escaped quotes from a value after another value

I am using HighCharts and am generating script from C# and there's an unfortunate thing where they use inline functions for formatters and events. Unfortunately, I can't output JSON like that from any serializer I know of. In other words, they want something like this:
"labels":{"formatter": function() { return Highcharts.numberFormat(this.value, 0); }}
And with my serializers available to me, I can only get here:
"labels":{"formatter":"function() { return Highcharts.numberFormat(this.value, 0); }"}
These are used for click events as well as formatters, and I absolutely need them.
So I'm thinking regex, but it's been years and years and also I was never a regex wizard.
What kind of Regex replace can I use on the final serialized string to replace any quoted value that starts with function() with the unquoted version of itself? Also, the function itself may have " in it, in which case the quoted string might have \" in it, which would need to also be replaced back down to ".
I'm assuming I can use a variant of the first answer here:
Finding quoted strings with escaped quotes in C# using a regular expression
but I can't seem to make it happen. Please help me for the love of god.
I've put more sweat into this, and I've come up with
serialized = Regex.Replace(serialized, @"""function\(\)[^""\\]*(?:\\.[^""\\]*)*""", "function()$1");
However, my end result is always:
formatter:function()$1
This tells me I'm matching the proper stuff, but my capture isn't working right. Now I feel like I'm probably being an idiot with some C# specific regex situation.
Update: Yes, I was being an idiot. I didn't have a capture around what I really wanted.
serialized = Regex.Replace(serialized, @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1");
that gets my match, but in a case like this:
"formatter":"function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }"
it returns:
"formatter":function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }
and I need to get those nasty backslashes out of there. Now I think I'm truly stuck.
Regexp for match
"function\(\) (?<code>.*)"
Replace expression
function() ${code}
Try this : http://regexr.com?30jpf
What it does :
Finds double quotes JUST before a function declaration and immediately after it.
Regex :
(")(?=function()).+(?<=\})(")
Replace groups 1 & 3 with nothing :
3 capturing groups:
group 1: (")
group 2: ()
group 3: (")
string serialized = JsonSerializer.Serialize(chartDefinition);
serialized = Regex.Replace(serialized, @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1").Replace("\\\"", "\"");
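One thing to watch with the chained String.Replace above: it unescapes every \" in the whole serialized string, including ones inside ordinary string values. A possible variation, sketched here with a hypothetical FunctionUnquoter class, uses a MatchEvaluator so the unescaping only touches the matched function text. It only handles \" and assumes no other JSON escapes (such as \\ or \n) need undoing:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FunctionUnquoter
{
    // Matches a quoted value that starts with function() and captures
    // everything between the surrounding quotes, escaped quotes included.
    private static readonly Regex QuotedFunction = new Regex(
        @"""(function\(\)[^""\\]*(?:\\.[^""\\]*)*)""");

    public static string Unquote(string serialized)
    {
        // The lambda is the MatchEvaluator: it runs once per match and
        // turns \" back into " inside that match only.
        return QuotedFunction.Replace(
            serialized,
            m => m.Groups[1].Value.Replace("\\\"", "\""));
    }
}
```

Escaped quotes in other values are left as they were, so the rest of the output stays valid JSON.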

C# Regular expression problem

I have the following string:
http://www.powerwXXe.com/text1 123-456 text2 text3/
Can someone give me advice on how to get the value of text1, text2 and text3 and put them into a string. I have heard of regular expressions but have no idea how to use them.
Instead of going the RegEx route, if you know that the string will always be of a similar format, you can use string.Split, first on /, then on space, and retrieve the results from the resulting string arrays.
string[] slashes = myString.Split('/');
string[] textVals = slashes[3].Split(' ');
// at this point:
// textVals[0] = "text1"
// textVals[1] = "123-456"
// textVals[2] = "text2"
// textVals[3] = "text3"
Here is a link on getting started with regular expressions in C#: Regular Expression Tutorial
I don't think it is appropriate to write out a tutorial here since the information is online, so please check out the link and let me know if you have a specific question.
Instead of using regex, you can use string.Format("http://myurl.com/{0}{1}{2}", value1, textbox2.Text, textbox3.Text) and format the URL in whatever fashion you like. If you are looking to go the regex route, you can always check regexlib.
The use of regular expressions relies on patterns you see in your strings - you need to be able to generalize the pattern of strings you're looking for before you can use a regular expression.
For a problem of this scope, if you can pin down the pattern, you're probably better off using other string parsing methods, such as String.IndexOf and String.Split.
Regular expressions are a powerful tool, and certainly worth learning, but they might not be necessary here.
Based on the example you gave, it looks as though text1, text2 and text3 are separated by spaces? If so, and if you always know the positions they'll be in, you may want to skip regular expressions and just use .Split(' ') to split the string into an array of strings and then grab the pertinent items from there. Something like this:
string foo = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
string[] fooParts = foo.Split(' ');
string text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
string text2 = fooParts[2];
string text3 = fooParts[3].Replace("/", "");
You'd want to perform bounds checking on the string[] before trying to grab anything from it, but this would work. Regex is awesome for string parsing, but when it's simple stuff you need to do, sometimes it's overkill when simple methods from the string class will do.
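Here is roughly what the bounds-checked version could look like. SplitParser.TryParse is a made-up name, and the sketch assumes the input always has the fixed prefix followed by exactly four space-separated parts:

```csharp
using System;

public static class SplitParser
{
    private const string Prefix = "http://www.powerwXXe.com/";

    // Returns false instead of throwing when the input does not have the
    // expected shape: the known prefix, then four space-separated parts.
    public static bool TryParse(string input, out string text1, out string text2, out string text3)
    {
        text1 = text2 = text3 = null;

        if (input == null || !input.StartsWith(Prefix, StringComparison.Ordinal))
        {
            return false;
        }

        string[] parts = input.Substring(Prefix.Length).TrimEnd('/').Split(' ');
        if (parts.Length != 4)
        {
            return false; // the bounds check: refuse anything unexpected
        }

        text1 = parts[0];
        text2 = parts[2]; // parts[1] is the "123-456" token, which is skipped
        text3 = parts[3];
        return true;
    }
}
```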
It all depends on how much you know about the string you are parsing. Where does the string come from, and how much do you know about its formatting?
Based on your example string you could get away with something as simple as
string pattern = @"http://www\.powerwXXe\.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
var reg = new System.Text.RegularExpressions.Regex(pattern);
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
System.Text.RegularExpressions.Match myMatch = reg.Match(input);
The captured strings would then be contained in myMatch.Groups["myGroup1"], ["myGroup2"] and ["myGroup3"] respectively.
This however assumes that your string always begins with http://www.powerwXXe.com/, that there will always be three groups to capture, and that the groups are separated by a space (which is an illegal character in URLs and would in almost all cases be converted to %20, which would have to be accounted for in the pattern).
So, how much do you know about your string? And, as others have already stated, do you really need regular expressions?
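To make those assumptions explicit in code, a sketch like the following checks Match.Success before reading any group. The class, method and group names are invented for the example, and the pattern is anchored so anything that doesn't fit the expected shape is rejected rather than partially matched:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlParts
{
    // Anchored version of the pattern: fixed prefix, then text1, a token
    // that is skipped, text2, text3 and a trailing slash.
    private static readonly Regex Pattern = new Regex(
        @"^http://www\.powerwXXe\.com/(?<first>\S+)\s\S+\s(?<second>\S+)\s(?<third>[^\s/]+)/$");

    public static bool TryExtract(string input, out string[] parts)
    {
        Match m = Pattern.Match(input);
        parts = m.Success
            ? new[] { m.Groups["first"].Value, m.Groups["second"].Value, m.Groups["third"].Value }
            : new string[0];
        return m.Success;
    }
}
```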

Nothing happens when using Regex in asp.net

Regex does nothing when I run this code:
input contains: "geeeeekdldn"
Regex.Replace(input, @"g(.|\n)*?n", string.Empty);
After the replace I expect the value of input to be "", but I still get "geeeeekdldn".
Can someone help me, please?
You need to assign the output of the Replace to a new string:
string output = Regex.Replace(input, @"g(.|\n)*?n", string.Empty);
Replace doesn't update the input string - see the MSDN documentation - because (as Hans points out) .NET strings are immutable and cannot, therefore, be changed. So any method that manipulates a string must return a new string rather than updating the supplied string.
Regex.Replace is a function which has the string with the replacement made as its return value. At the moment you are discarding this return value. You probably want
string processedInput = Regex.Replace(input, @"g(.|\n)*?n", string.Empty);
In addition to all the (correct) answers: the String type in .Net is immutable, meaning that a string value can only be replaced, not changed. So all functions that work on a string always return a new one instead of changing the argument.

help with a tag removal regex

I have strings in the form: "[user:fred][priority:3]Lorem ipsum dolor sit amet." where the area enclosed in square brackets is a tag (in the format [key:value]). I need to be able to remove a specific tag given its key with the following extension method:
public static void RemoveTagWithKey(this string message, string tagKey) {
if (message.ContainsTagWithKey(tagKey)) {
var regex = new Regex(@"\[" + tagKey + @":[^\]]");
message = regex.Replace(message, string.Empty);
}
}
public static bool ContainsTagWithKey(this string message, string tagKey) {
return message.Contains(string.Format("[{0}:", tagKey));
}
Only the tag with the specified key should be removed from the string. My regex doesn't work because it's daft. I need help to write it properly. Alternatively, an implementation without regex is welcome.
I know there are much more feature-rich tools out there, but I like the simplicity and cleanliness of Code Architects Regex Tester (aka YART: Yet Another Regex Tester). Shows groups and captures in a tree view, quite fast, very small, open source. It also generates code in C++, VB, and C# and can automatically escape or unescape regexes for these languages. I dump it in my VS tools folder (C:\Program Files\Microsoft Visual Studio 9.0\Common7\Tools) and set a menu item to it in the Tools menu with Tools > External Tools so I can fire it up quickly from inside VS.
Regexes can be really hard to write sometimes and I know it really helps to be able to test the regex and see the results as you go.
Another really popular (but not free) option is Regex Buddy.
If you want to do this without a Regex it isn't difficult. You're already searching for a specific tag key, so you can just search for "[" + tagKey, then search from there for the closing "]", and remove everything between those offsets. Something like...
int posStart = message.IndexOf("[" + tagKey + ":");
if(posStart >= 0)
{
int posEnd = message.IndexOf("]", posStart);
if(posEnd > posStart)
{
message = message.Remove(posStart, posEnd - posStart + 1);
}
}
Is that better than a Regex solution? Since you're only looking for a specific key I think it probably is, on the grounds of simplicity. I love Regexes but they're not always the clearest answer.
Edit: Another reason the IndexOf() solution could be seen as better is that it means there is only one rule for finding the start of the tag, whereas the original code uses a Contains() which searches for something like '[tag:' and then uses a regex which uses a slightly different expression to do the substitution / removal. In theory you could have text which matches one criterion but not the other.
Try this instead:
new Regex(@"\[" + tagKey + @":[^\]]+\]");
I changed two things: I added + after the [^\]] character class, so it matches one or more characters that are not a closing bracket, and I added \] at the end so the closing bracket itself is removed as well.
I think this is the regex you're looking for:
string regex = @"\[" + tagKey + @":[^\]]+\]";
Also, you don't need to do a separate check to see if there are tags of that type. Just do a regex replace; if there are no matches, the original string is returned.
public static string RemoveTagWithKey(string message, string tagKey) {
string regex = @"\[" + tagKey + @":[^\]]+\]";
return Regex.Replace(message, regex, string.Empty);
}
You seem to be writing an extension method, but I wrote this as a static utility method to keep things simple.
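For what it's worth, the extension-method form could look something like this sketch. The original signature returns void and reassigns the message parameter, which can never affect the caller's string, so this version returns the result instead. The class name TagExtensions is an assumption, and Regex.Escape is added in case a key ever contains regex metacharacters:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagExtensions
{
    // Returns a copy of message with every [tagKey:value] tag removed.
    // Strings are immutable, so the result has to be returned to the caller.
    public static string RemoveTagWithKey(this string message, string tagKey)
    {
        string pattern = @"\[" + Regex.Escape(tagKey) + @":[^\]]*\]";
        return Regex.Replace(message, pattern, string.Empty);
    }
}
```

Usage would be along the lines of message = message.RemoveTagWithKey("priority");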

Easiest way to convert a URL to a hyperlink in a C# string?

I am consuming the Twitter API and want to convert all URLs to hyperlinks.
What is the most effective way you've come up with to do this?
from
string myString = "This is my tweet check it out http://tinyurl.com/blah";
to
This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
Regular expressions are probably your friend for this kind of task:
Regex r = new Regex(@"(https?://[^\s]+)");
myString = r.Replace(myString, "<a href=\"$1\">$1</a>");
The regular expression for matching URLs might need a bit of work.
I did this exact same thing with jquery consuming the JSON API here is the linkify function:
String.prototype.linkify = function() {
return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m) {
return m.link(m);
});
};
This is actually an ugly problem. URLs can contain (and end with) punctuation, so it can be difficult to determine where a URL actually ends, when it's embedded in normal text. For example:
http://example.com/.
is a valid URL, but it could just as easily be the end of a sentence:
I buy all my witty T-shirts from http://example.com/.
You can't simply parse until a space is found, because then you'll keep the period as part of the URL. You also can't simply parse until a period or a space is found, because periods are extremely common in URLs.
Yes, regex is your friend here, but constructing the appropriate regex is the hard part.
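One common compromise, sketched below, is to match up to the next whitespace and then peel trailing punctuation back off the match. This is a heuristic, not a real URL parser: it guesses that a trailing period or comma belongs to the sentence, which will be wrong for URLs that genuinely end in punctuation. The names (Linkifier, Linkify) are illustrative, and the input is assumed to be plain text that has not been HTML-encoded yet:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class Linkifier
{
    private static readonly Regex UrlPattern = new Regex(@"https?://\S+");

    // Characters treated as sentence punctuation when they end a match.
    private static readonly char[] TrailingPunctuation = { '.', ',', ';', ':', '!', '?', ')' };

    public static string Linkify(string text)
    {
        return UrlPattern.Replace(text, m =>
        {
            // Split the match into the URL proper and the stripped tail.
            string url = m.Value.TrimEnd(TrailingPunctuation);
            string tail = m.Value.Substring(url.Length);
            string safe = WebUtility.HtmlEncode(url);
            return "<a href=\"" + safe + "\">" + safe + "</a>" + tail;
        });
    }
}
```

With the sentence above as input, the final period ends up outside the link.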
Check out this as well: Expanding URLs with Regex in .NET.
You can add some more control on this by using MatchEvaluator delegate function with regular expression:
suppose i have this string:
find more on http://www.stackoverflow.com
now try this code
private void ModifyString()
{
string input = "find more on http://www.authorcode.com ";
Regex regx = new Regex(@"\b((http|https|ftp|mailto)://)?(www.)+[\w-]+(/[\w- ./?%&=]*)?");
string result = regx.Replace(input, new MatchEvaluator(ReplaceURl));
}
static string ReplaceURl(Match m)
{
string x = m.ToString();
x = "<a href=\"" + x + "\">" + x + "</a>";
return x;
}
/cheer for RedWolves
from: this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m){...
see: /[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/
There's the code for the addresses "anyprotocol"://"anysubdomain/domain"."anydomainextension and address",
and it's a perfect example for other uses of string manipulation. you can slice and dice at will with .replace and insert proper "a href"s where needed.
I used jQuery to change the attributes of these links to "target=_blank" easily in my content-loading logic even though the .link method doesn't let you customize them.
I personally love tacking on a custom method to the string object for on the fly string-filtering (the String.prototype.linkify declaration), but I'm not sure how that would play out in a large-scale environment where you'd have to organize 10+ custom linkify-like functions. I think you'd definitely have to do something else with your code structure at that point.
Maybe a vet will stumble along here and enlighten us.
