How to replace raw URLs inside paragraphs with HTML links using regex - C#

How do I change an absolute URL within a paragraph:
<p>http://www.google.com</p>
into an HTML link inside the paragraph:
<p><a href="http://www.google.com">http://www.google.com</a></p>
There can be a lot of paragraphs. I want the regex to pick out the generic URL value from this: <p>url</p>, and put it into a template like this: <p><a href="url">url</a></p>
What is the shortest way to do it? Can it be done using the Regex.Replace() method?
BTW: a regular expression for matching absolute URLs can look like this: ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$ (taken from MSDN)

Try this regex:
(?<!\")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?(?!\")
to avoid matching strings like <a href="http://www.google.com"> (where the URL is enclosed by ").
Sample code:
var inputString = @"<p>http://www.google.com</p><p><a href=""http://www.google.com"">my web link</a></p>";
var pattern = @"(?<url>(?<!"")(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?(?!""))";
var result = Regex.Replace(inputString, pattern, "<a href=\"${url}\">${url}</a>");
Explanation:
(?<!subexpression) Zero-width negative lookbehind assertion.
(?!subexpression) Zero-width negative lookahead assertion.
(?<name>subexpression) Captures the matched subexpression into a named group.

From your regex: remove the leading ^ and the trailing $; together they mean "match the whole input string from start to end".
string regexPattern = @"(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?";
string input = @"<p>http://www.google.com</p>";
var reg = new Regex(regexPattern, RegexOptions.IgnoreCase);
// $0 is a substitution that refers to the text matched by the whole pattern
var output = reg.Replace(input, "<a href=\"$0\">$0</a>");
More about substitutions: http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx
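As a compact illustration of the approach in both answers, here is a minimal sketch. It assumes a deliberately simplified URL pattern instead of the long MSDN one, and the names UrlLinker and Linkify are hypothetical. A URL that directly follows a double quote (for example an existing href value) is left alone. A raw URL used as the text of an existing link would still be wrapped again, so treat this as a starting point rather than a full HTML-aware solution.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlLinker
{
    // Simplified stand-in for the long MSDN pattern (an assumption of this sketch):
    // a scheme, then everything up to whitespace, a double quote or an angle bracket.
    // The (?<!"") lookbehind skips URLs that directly follow a double quote,
    // such as the value of an existing href attribute.
    private static readonly Regex RawUrl = new Regex(
        @"(?<!"")(?<url>(?:https?|ftp)://[^\s""<>]+)",
        RegexOptions.IgnoreCase);

    // Wraps every raw URL in an anchor tag; ${url} refers to the named group.
    public static string Linkify(string html)
    {
        return RawUrl.Replace(html, "<a href=\"${url}\">${url}</a>");
    }
}
```

Calling UrlLinker.Linkify on a string of paragraphs returns the same markup with each unquoted URL wrapped in an anchor tag.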

Related

Simplify Regex grouping

var pattern = @"(?:[P|p]rint\("")(.+)(?:""\);?)";
var input = @"Print(""Hello World"");";
This results in two groups. The second one captures exactly what I want to capture and the first one is completely useless; how do I remove the first one?
I tried (?:ABC) but it didn't work.

Your pattern uses one capturing group () and two non-capturing groups (?:).
You can omit those two non-capturing groups, as well as the | in the character class. I think you would also want to make the .+ non-greedy, like .+?, to prevent overmatching.
Then your pattern could look like this (matching an optional semicolon at the end):
[Pp]rint\("(.+?)"\);?
You might also use a version with a negated character class that matches anything except a double quote:
[Pp]rint\("([^"]+)"\);?

Try the following:
string input = "var input = Print(\"Hello World\");";
string pattern = "[Pp]rint\\(\"(?'message'[^\"]+)";
Match match = Regex.Match(input, pattern);
string message = match.Groups["message"].Value;

C# Capturing the first match with regex

I've got an input string that looks like this:
url=https%3A%2F%2Fdomain.com%2Fsale-deal%3Futm_source%3Dinsider-primary-action%3Dinsider-primary-action&utm_source=FB
or
url=https%3A%2F%2Fdomain.com%2Fsale&utm_source=FB&sub_id1=M12
The string sometimes contains %3Futm_source and sometimes does not.
How can I get the link between url= and either %3Futm_source or &utm_source?
Regex reg = new Regex(@"url=(https%3A%2F%2Fdomain.com[a-zA-Z0-9-_/%\.]+)%3Futm_source|&utm_source");
Match result = reg.Match(inPut);
Console.WriteLine(result.Groups[1].Value);
It always captures everything from url= to &utm_source.

You can use this:
(?<=url=).*?(?=%3Futm_source|&utm_source)
(?<=url=) - Positive lookbehind. Requires url= immediately before the match.
.*? - Lazily matches any characters except a newline.
(?=%3Futm_source|&utm_source) - Positive lookahead. Requires %3Futm_source or &utm_source immediately after the match.
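A minimal sketch of how that lookaround pattern could be used from C#. The class and method names (TrackedUrl, Extract) are hypothetical, and returning an empty string when nothing matches is an assumption of this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrackedUrl
{
    // Lazily match everything after "url=" up to the first
    // "%3Futm_source" or "&utm_source", whichever comes first.
    private static readonly Regex Target = new Regex(
        @"(?<=url=).*?(?=%3Futm_source|&utm_source)");

    // Returns the still percent-encoded link, or an empty string if nothing matches.
    public static string Extract(string query)
    {
        Match m = Target.Match(query);
        return m.Success ? m.Value : string.Empty;
    }
}
```

If the decoded link is needed, the result can be passed to Uri.UnescapeDataString.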

C# Regex, match but not include the first character before matched string

How can I make this C# regex not include the first character before the URL in the match:
((?!\").)https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)
This will match:
Xhttps://twitter.com/oppomobileindia/status/798397636780953600
Notice the leading X.
I want it to match only URLs that are not preceded by a double quote, and the match should not include the character before https.
An actual example from my code:
var str = @"<div id=""content"">
<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>
<p>""https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";
var pattern = @"(?<!""')https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
var rgx = new Regex(pattern);
var results = rgx.Replace(str, "XXX");
In the above example, only the first URL should be replaced, because the second one has a double quote before the URL. The replacement should also cover exactly the URL, without the character before it.

Use a (?<!") negative lookbehind:
var re = @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
The (?<!") means that there cannot be a " immediately before the current location.
In C#, you do not need to escape / inside the pattern since regex delimiters are not used when defining the regex.
Note on the C# syntax: if you want to define a " inside a verbatim string literal, double it. In a regular string literal, escape the " and \:
var re = "(?<!\")https?://twitter\\.com/(?:#!/)?(\\w+)/status(?:es)?/(\\d+)";
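To see the lookbehind in action on input like the question's, here is a small sketch. TweetMasker and Mask are hypothetical names, and "XXX" is the placeholder replacement taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TweetMasker
{
    // (?<!"") is zero-width: it only checks that the character before
    // "http" is not a double quote and adds nothing to the match itself.
    private static readonly Regex UnquotedStatusUrl = new Regex(
        @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)");

    // Replaces every status URL that is not preceded by a double quote.
    public static string Mask(string html)
    {
        return UnquotedStatusUrl.Replace(html, "XXX");
    }
}
```

Because the lookbehind consumes no characters, the character in front of an unquoted URL stays in the output untouched.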

Regular Expression to match the pattern

I am looking for a regular expression search pattern to find data within $< and >$.
string pattern = "\b\$<[^>]*>\$";
is not working.
Thanks,

You can make use of a tempered greedy token:
\$<(?:(?!\$<|>\$)[\s\S])*>\$
See demo
This way, you will match only the closest boundaries.
Your regex does not match because you do not allow > in-between your markers, and you are using \b where you most probably do not have a word boundary.
If you do not want to get the delimiters in the output, use capturing group:
\$<((?:(?!\$<|>\$)[\s\S])*)>\$
   ^                      ^
And the result will be in Group 1.
In C#, you should consider declaring all regex patterns (whenever possible) as verbatim string literals (with @"") because then you won't have to worry about doubling backslashes:
var rx = new Regex(@"\$<(?:(?!\$<|>\$)[\s\S])*>\$");
Or, since there is a singleline flag (and this is preferable):
var rx = new Regex(@"\$<((?:(?!\$<|>\$).)*)>\$", RegexOptions.Singleline | RegexOptions.CultureInvariant);
var res = rx.Matches(text).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
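A self-contained sketch of the tempered greedy token approach, collecting the contents of every $<...>$ pair. PlaceholderScanner and Extract are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderScanner
{
    // Each character inside the delimiters is consumed only if neither "$<"
    // nor ">$" starts at that position, so a match never spans two markers.
    private static readonly Regex Placeholder = new Regex(
        @"\$<((?:(?!\$<|>\$).)*)>\$",
        RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // Returns the text found between each closest pair of $< and >$.
    public static List<string> Extract(string text)
    {
        return Placeholder.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

Unlike [^>]*, the tempered token still accepts a lone > inside the markers, which is the case the original pattern in the question could not handle.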

This pattern will do the work:
(?<=\$<).*(?=>\$)
Demo: https://regex101.com/r/oY6mO2/1

To find this pattern in PHP, you can use the following regex (note that $ must be escaped, otherwise it acts as an end-of-string anchor):
/\$<(.*?)>\$/s
For example:
$arrayWhichStoreKeyValueArrayOfYourPattern= array();
preg_match_all('/\$<(.*?)>\$/s',
$yourcontentinwhichyoufind,
$arrayWhichStoreKeyValueArrayOfYourPattern);
// Start from the original text so that every replacement accumulates in $content
$content=$yourcontentinwhichyoufind;
for($i=0;$i<count($arrayWhichStoreKeyValueArrayOfYourPattern[0]);$i++)
{
$content=
str_replace(
$arrayWhichStoreKeyValueArrayOfYourPattern[0][$i],
constant($arrayWhichStoreKeyValueArrayOfYourPattern[1][$i]),
$content);
}
This example replaces each $<NAME>$ placeholder found in $yourcontentinwhichyoufind with the value of the constant of the same name and stores the result in $content.
For example, suppose you have a string like this, with constants of the same names defined:
**global.php**
// Constants are declared in this file.
define("MYNAME","Hiren Raiyani");
define("CONSTANT_VAL","constant value");
**demo.php**
$content="Hello this is $<MYNAME>$ and this is simple demo to replace $<CONSTANT_VAL>$";
$myarray= array();
preg_match_all('/\$<(.*?)>\$/s', $content, $myarray);
for($i=0;$i<count($myarray[0]);$i++)
{
$content=str_replace(
$myarray[0][$i],
constant($myarray[1][$i]),
$content);
}
As far as I know, that's all.

C# RegEx - get only first match in string

I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.

Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
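The difference between the greedy and the non-greedy quantifier can be checked with a short sketch like the one below. DeviceTags, Find, Greedy and Lazy are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DeviceTags
{
    // Greedy: \S+ runs to the end of the input and backtracks to the LAST '>'.
    public const string Greedy = @"<device\[\d+\]\.\S+>";

    // Lazy: \S+? stops at the FIRST '>' that lets the pattern succeed.
    public const string Lazy = @"<device\[\d+\]\.\S+?>";

    // Returns the text of every match of the given pattern in the data.
    public static List<string> Find(string data, string pattern)
    {
        return Regex.Matches(data, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

With the question's input, the greedy pattern yields a single match covering both tags, while the lazy pattern yields the two tags separately.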

It depends on how much of the structure of the angle-bracket blocks you need to match, but you can do:
"\\<device.+?\\>"

I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1:
(<device[^>]*>)
String literal for use in programs:
@"(<device[^>]*>)"

Change your repetition operator and use \w instead of \S:
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>

Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device['
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace only lets the pattern be laid out and commented; it does not affect matching.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
