Regex group capturing problem - c#

If I had an html string containing this somewhere in the middle of it:
<img src="http://images.domain.com/Images/hello.jpg" alt="Failed to Load" />
What regex would I use in order to just obtain the name of the image file? i.e. hello.jpg
Currently I am using this:
(?<front>.*<img.*src="http://images.domain.com/Images/)(?<imgName>.*)"(?<end>.*)
However the value that it finds for the imgName group is:
hello.jpg" alt="Failed to Load
Does anyone know how to fix that?

The easiest fix is to have the imgName group match anything except for quotes by changing .* to [^"]*:
(?<front>.*<img.*src="http://images.domain.com/Images/)(?<imgName>[^"]*)"(?<end>.*)

Parsing HTML with a regex is fragile, so consider whether you should be doing this at all.
Anyway, try a lazy quantifier, (?<imgName>.*?), instead.
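
Both suggestions above stop the imgName group at the closing quote. Here is a minimal C# sketch of the negated-class variant, assuming the fixed URL prefix from the question; the class and method names (ImageNameDemo, ExtractImageName) are made up for illustration, and the front/end groups are dropped because only the file name is needed:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageNameDemo
{
    // Hypothetical helper: returns the file name that follows the fixed
    // URL prefix, or an empty string when no matching <img> tag is found.
    public static string ExtractImageName(string html)
    {
        // [^""]* (that is, [^"]* in the regex) cannot run past the closing quote,
        // which is what the greedy .* in the question did.
        Match match = Regex.Match(
            html,
            @"<img[^>]*\bsrc=""http://images\.domain\.com/Images/(?<imgName>[^""]*)""");
        return match.Success ? match.Groups["imgName"].Value : string.Empty;
    }

    public static void Main()
    {
        string html = "<p>intro</p><img src=\"http://images.domain.com/Images/hello.jpg\" alt=\"Failed to Load\" /><p>outro</p>";
        Console.WriteLine(ExtractImageName(html)); // prints "hello.jpg"
    }
}
```

Swapping [^""]* for .*? gives the lazy variant from the second answer; for this input both capture the same text.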


How to change rel to nofollow with Regex - c#

I want to mark some URLs as nofollow, while leaving certain other URLs as dofollow.
I am trying to do it with this regex:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!blogs.cc)[^"]+)"([^>]*)>
With this I can exempt one URL so that it stays dofollow (in this example, "blogs.cc").
What do I do if I want to exempt more than one?
I tried:
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)(((?!blogs.cc)[^"]+)||((?!wikipedia.org)[^"]+))"([^>]*)>
but it doesn't give the correct result.
What is the solution?
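I resolved it, and I am putting my solution here for anybody who has the same question.
Put all the exempt domains in a single alternation inside the negative lookahead (with the dots escaped so they match literally):
(<a\s*(?!.*\brel=)[^>]*)(href="https?://)((?!(?:blogs\.cc|wikipedia\.org|moreUrls\.com))[^"]+)"([^>]*)>
C# sample code (note that \b must be written \\b in a regular string literal, otherwise it is a backspace character):
Regex.Replace(str, "(<a\\s*(?!.*\\brel=)[^>]*)(href=\"https?://)((?!(?:blogs\\.cc|wikipedia\\.org))[^\"]+)\"([^>]*)>", "<a $2$3\" $4 rel=\"nofollow\">")
I hope this is useful.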
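
As a rough sketch of the same idea, the alternation inside the lookahead can be built from a list of domains. Everything named here (NoFollowDemo, AddNoFollow, the followDomains parameter) is hypothetical. The sketch also differs from the pattern above in two ways, both assumptions on my part: the rel= check is limited to the current tag with [^>]* instead of .*, and the replacement keeps group 1 so attributes before href are not lost.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoFollowDemo
{
    // Hypothetical helper: adds rel="nofollow" to every <a href="http(s)://...">
    // tag that has no rel attribute and whose host does not start with one of
    // followDomains. Assumes at least one domain is passed.
    public static string AddNoFollow(string html, params string[] followDomains)
    {
        // Regex.Escape turns "blogs.cc" into "blogs\.cc" so the dot is literal.
        string allowed = string.Join("|", followDomains.Select(Regex.Escape));

        string pattern =
            @"(<a\s+(?![^>]*\brel=)[^>]*?)(href=""https?://)((?!(?:" + allowed + @"))[^""]+)""([^>]*)>";

        // $1 = "<a " plus any attributes before href, $2 = href="http(s)://,
        // $3 = rest of the URL, $4 = attributes after href.
        return Regex.Replace(html, pattern, "$1$2$3\" rel=\"nofollow\"$4>");
    }

    public static void Main()
    {
        string html = "<a href=\"http://example.com/page\">x</a> <a href=\"http://blogs.cc/post\">y</a>";
        Console.WriteLine(AddNoFollow(html, "blogs.cc", "wikipedia.org"));
        // prints: <a href="http://example.com/page" rel="nofollow">x</a> <a href="http://blogs.cc/post">y</a>
    }
}
```

The lookahead only tests the start of the host, so a URL such as http://www.blogs.cc/ would still get nofollow; add the www. form to the list if that matters.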

Extracts all sub strings between string separators in a string (C#)

I'm trying to parse a string to see whether it contains URLs, and then convert the whole string to HTML so that the links are clickable.
I'm not sure if there is a smarter way of doing this, but I started by trying to write a parser with the string Split method, or Regex.Split in C#. I can't find a good way of doing it, though.
(It is an ASP.NET MVC application, so perhaps there is some smarter way of doing this.)
For example, I want to convert the string:
"Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}"
Into
"Customer office is responsible for this. Contact info can be found <a href=www.customerservice.com>here!</a> More info can be found <a href=www.customerservice.com/moreinfo>here!</a>"
i.e.
{link}{url}{text}{link} --> <a href=url>text</a>
Anyone have a good suggestion? I can also change the way the input string is formatted.
You can use the following to match:
{link}{([^}]*)}{([^}]*)}{link}
And replace with:
<a href=$1>$2</a>
Explanation:
{link} match {link} literally
{([^}]*)} match all characters except } in capturing group 1 (for url)
{([^}]*)} match all characters except } in capturing group 2 (for value)
{link} match {link} literally again
You can use this regex:
{link}{(.*?)}{(.*?)}{link}
and this substitution (in .NET the groups are referenced as $1 and $2 in the replacement, not \1 and \2):
<a href=$1>$2</a>
Regex
For your simple link format {link}{url}{text}{link} you can use a simple Regex.Replace:
Regex.Replace(input, @"\{link\}\{([^}]*)\}\{([^}]*)\}\{link\}", "<a href=$1>$2</a>");
Also this non-regex idea may help
var input = "Customer office is responsible for this. Contact info can be found {link}{www.customerservice.com}{here!}{link} More info can be found {link}{www.customerservice.com/moreinfo}{here!}{link}";
var output = input.Replace("{link}{", "<a href=")
.Replace("}{link}", "</a>")
.Replace("}{", ">");
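
Since the result is going into an ASP.NET MVC page, it may be worth HTML-encoding the captured url and text while replacing. The sketch below does that with a MatchEvaluator; the names LinkMarkupDemo and ConvertLinks are invented for this example, and quoting the href value is my own choice rather than part of the question's format:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkMarkupDemo
{
    // Hypothetical helper: turns {link}{url}{text}{link} into <a href="url">text</a>.
    public static string ConvertLinks(string input)
    {
        const string pattern = @"\{link\}\{([^}]*)\}\{([^}]*)\}\{link\}";

        // The lambda runs once per match; group 1 is the url, group 2 the text.
        return Regex.Replace(input, pattern, m =>
            "<a href=\"" + WebUtility.HtmlEncode(m.Groups[1].Value) + "\">" +
            WebUtility.HtmlEncode(m.Groups[2].Value) + "</a>");
    }

    public static void Main()
    {
        string input = "Contact info can be found {link}{www.customerservice.com}{here!}{link} now";
        Console.WriteLine(ConvertLinks(input));
        // prints: Contact info can be found <a href="www.customerservice.com">here!</a> now
    }
}
```

Only the captured pieces are encoded here; the surrounding text is passed through unchanged, so it would still need encoding before being rendered.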

Too short control escape. How to get Regex for this?

So, let's say I have a result from a search that comes back as:
\\my.test.site#SSL\JohnDoe\SusanSmith\courses\PDFs\Science_Math\BIOL\S12014 Syllabi\BIOL-1322-S12014-John-Doe.pdf
Whenever the result is listed in a text box I get the entire path instead of just the file name. This is functioning as designed, since I can't use .Select(Path.GetFileName) while enumerating directories: the search needs the full path.
So, I was going to use a regex to do a replace at the end, when the results are displayed. However, when I went to Rubular it didn't like either my expression or the test string (I can't figure out which).
I basically want to cut down everything except the file name and extension.
So my Regex was supposed to be something like:
\\my.test.site#SSL\JohnDoe\SusanSmith\courses\PDFs\.+\.+\.+\
So that I get everything up to the file name and extension for deletion. However Rubular doesn't like something as I get a "too short control escape" error. I don't want to test this in C# without verifying in Rubular since I use it heavily and figure if it won't work there it won't work at runtime.
Any ideas? Thanks.
Remember to escape the \ characters, as well as the literal . characters:
\\\\my\.test\.site#SSL\\JohnDoe\\SusanSmith\\courses\\PDFs\\.+\\.+\\.+\\
Also note, you probably want to avoid over-matching on the .+ by using non-greedy quantifiers:
\\\\my\.test\.site#SSL\\JohnDoe\\SusanSmith\\courses\\PDFs\\.+?\\.+?\\.+?\\
Or using character classes:
\\\\my\.test\.site#SSL\\JohnDoe\\SusanSmith\\courses\\PDFs\\[^\\]+\\[^\\]+\\[^\\]+\\
Maybe I'm misinterpreting the question, but it sounds like your approach has been overly complicated.
Can't you simply match this: .+\\
And then replace with '' (nothing)?
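
A small C# sketch of that last suggestion, assuming the path always uses backslashes as separators. FileNameDemo and StripDirectory are illustrative names only; the verbatim string (@"...") is what keeps the backslashes from needing a second level of escaping in C#:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Hypothetical helper: removes everything up to and including the last
    // backslash. The greedy .* is what makes it reach the LAST backslash.
    public static string StripDirectory(string fullPath)
    {
        return Regex.Replace(fullPath, @"^.*\\", "");
    }

    public static void Main()
    {
        string path = @"\\my.test.site#SSL\JohnDoe\SusanSmith\courses\PDFs\Science_Math\BIOL\S12014 Syllabi\BIOL-1322-S12014-John-Doe.pdf";
        Console.WriteLine(StripDirectory(path)); // prints "BIOL-1322-S12014-John-Doe.pdf"
    }
}
```

A string with no backslash at all is returned unchanged, because the pattern then has nothing to match.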

Using regex to split a formatted string to URL like StackOverFlow

I'm trying to write a parser that will create links found in posted text that are formatted like so:
[Site Description](http://www.stackoverflow.com)
to be rendered as a standard HTML link like this:
<a href="http://www.stackoverflow.com">Site Description</a>
So far what I have is the expression listed below. It works on the example above, but it will not work if the URL has anything after the ".com". Obviously there is no single regex that will find every URL, but I would like to be able to match as many as I can.
(\[)([A-Za-z0-9 -_]*)(\])(\()((http|https|ftp)\://[A-Za-z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?)(\))
Any help would be greatly appreciated. Thanks.
Darn. It seems @Jerry and @MikeH beat me to it. My answer is best, however, as the link tags are all uppercase ;)
Find what: \[([^]]+)\]\(([^)]+)\)
Replace with: <A HREF="$2">$1</A>
http://regex101.com/r/cY7lF0
Well, you could try negated classes so you don't have to worry about the parsing of the url itself?
\[([^]]+)\]\(([^)]+)\)
And replace with:
<a href="$2">$1</a>
Or maybe use only the beginning parts to identify a url?
\[([^]]+)\]\(((?:https?|ftp)://[^)]+)\)
The replace is the same.
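
For reference, here is roughly how the scheme-restricted pattern could be applied from C#. This is only a sketch: MarkdownLinkDemo and ToHtmlLinks are hypothetical names, the closing bracket in the first class is written as \] for readability, and the URL class also excludes whitespace, which is my own tightening of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkdownLinkDemo
{
    // Hypothetical helper: converts [text](url) into <a href="url">text</a>
    // for http, https and ftp URLs only.
    public static string ToHtmlLinks(string text)
    {
        // Group 1 = link text (anything but ']'),
        // group 2 = URL (scheme, then anything but ')' or whitespace).
        const string pattern = @"\[([^\]]+)\]\(((?:https?|ftp)://[^)\s]+)\)";
        return Regex.Replace(text, pattern, "<a href=\"$2\">$1</a>");
    }

    public static void Main()
    {
        string post = "See [Site Description](http://www.stackoverflow.com/questions?sort=new) for details.";
        Console.WriteLine(ToHtmlLinks(post));
        // prints: See <a href="http://www.stackoverflow.com/questions?sort=new">Site Description</a> for details.
    }
}
```

Because the URL part is a negated class rather than a full URL grammar, anything after the ".com" is accepted, which was the sticking point in the question.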

Visual Studio Regex Find and Replace function

I'm trying to find all var.AppendLine("..."); and replace them with Append("...\n");
I've been fooling around with regexes but don't seem to be getting anywhere. Does anyone have a suggestion on what regular expression to use here?
var can be a variable name and I need to select the ... for replace with Append("$1\n");
I assume you actually don't want to get rid of var:
Search: <{[a-zA-Z0-9]+}\.AppendLine\("{[^"]+}"\)
Replace with: \1.Append("\2\\n")
I think you mean the regex in the search and replace window of VS? Then something like
<{[a-zA-Z]+\.}{AppendLine\("}{[^"]+}{"\)}
to replace with
\1Append("\3\\n")
(remove the \1 if you want to remove the "var." part, not clear in your question)
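
The patterns above use the Visual Studio Find and Replace dialect, with {...} for groups and \1 for back-references. If the same rewrite were done from code with .NET's Regex class instead, it might look like the sketch below. AppendLineDemo and Rewrite are made-up names, and the pattern assumes the string literal contains no escaped quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AppendLineDemo
{
    // Hypothetical helper: rewrites  name.AppendLine("text");
    // into                          name.Append("text\n");
    public static string Rewrite(string source)
    {
        // Group 1 = variable name, group 2 = contents of the string literal.
        const string pattern = @"\b(\w+)\.AppendLine\(""([^""]*)""\);";

        // In a .NET replacement string only '$' is special, so \\n below
        // emits a literal backslash followed by 'n' into the rewritten source.
        return Regex.Replace(source, pattern, "$1.Append(\"$2\\n\");");
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("sb.AppendLine(\"Hello\");"));
        // prints: sb.Append("Hello\n");
    }
}
```

Calls whose argument is not a plain string literal, such as AppendLine(someVariable), are left untouched by this pattern.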
