c# regex to extract link after = - c#

I couldn't find a better title, but I need a regex to extract the link from the sample below.
snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip
This assumes regex is the best way.
Thanks.

Consider the following sample code. It shows how one might extract the link from the string you supplied, although I have expanded the test data somewhat. In general, .* is too inclusive, as the second test string below demonstrates.
The main point is that there are several ways to do what you are asking: the first answer given uses "look-around", while the second suggests the "Groups" approach. The choice depends mainly on your actual data.
string[] tests = {
@"snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip",
@"snip... flashvars.image_url = 'http://domain.com/test.jpg' flashvars2.image_url = 'http://someother.domain.com/test.jpg'",
};
string[] patterns = {
@"(?<==\s')[^']*(?=')",
@"=\s*'(.*)'",
@"=\s*'([^']*)'",
};
foreach (string pattern in patterns)
{
Console.WriteLine();
foreach (string test in tests)
foreach (Match m in Regex.Matches(test, pattern))
{
if (m.Groups.Count > 1)
Console.WriteLine("{0}", m.Groups[1].Value);
else
Console.WriteLine("{0}", m.Value);
}
}

A simple regex for this would be @"=\s*'(.*)'".

Edit: New regex matching your edited question:
You need to match what's between quotes, after a =, right?
@"(?<==\s*')[^']*(?=')"
should do.
(?<==\s*') asserts that there is a =, optionally followed by whitespace, followed by a ', just before our current position (positive lookbehind).
[^']* matches any number of non-' characters.
(?=') asserts that the match stops before the next '.
This regex doesn't check if there is indeed a URL inside those quotes. If you want to do that, use
@"(?<==\s*')(?=(?:https?|ftp|mailto)\b)[^']*(?=')"
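To compare the two approaches side by side, here is a minimal sketch. The class and method names (LinkExtractor, WithLookaround, WithGroup) are made up for illustration; the patterns are the ones discussed above, and the input is the expanded test string with two links.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class LinkExtractor
{
    // Look-around version: the quotes are only asserted, so m.Value is the link itself.
    public static string[] WithLookaround(string input)
    {
        return Regex.Matches(input, @"(?<==\s*')[^']*(?=')")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Capture-group version: the quotes are consumed, the link is in Groups[1].
    public static string[] WithGroup(string input)
    {
        return Regex.Matches(input, @"=\s*'([^']*)'")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "flashvars.image_url = 'http://domain.com/test.jpg' " +
                       "flashvars2.image_url = 'http://someother.domain.com/test.jpg'";

        foreach (string link in WithLookaround(input))
            Console.WriteLine(link);
        foreach (string link in WithGroup(input))
            Console.WriteLine(link);
    }
}
```

Both methods return the two URLs without the surrounding quotes. The difference is only in where the result lives: in m.Value for the look-around pattern, in m.Groups[1].Value for the group pattern.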

Related

Use RegEx to extract specific part from string

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(string, @".*(\(\d*\))");
But I get also the brackets with the result. Is there a way to extract the strings and get it without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This captures anything between the final pair of parentheses (not nested!), whether digits or anything else, and anchors the match to the end of the string. If there may be trailing whitespace, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\(([^()]+)\)$";
string input = @"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
// Groups[1] holds the text inside the parentheses; m.Value would include the brackets.
Console.WriteLine("'{0}' found at index {1}.", m.Groups[1].Value, m.Groups[1].Index);
}
}
}
See a demo on regex101.com.
Please use the regex \(([^)]*)\)[^(]*$. It works as expected on the sample strings.
You can extract the number between the parentheses without worrying about capturing groups by using the following regex.
(?<=\()\d+(?=\)$)
Explanation:
(?<=\() : positive lookbehind for (, meaning the match starts right after a ( without including it in the result.
\d+ : matches a run of consecutive digits, stopping at the first non-digit character.
(?=\)$) : positive lookahead for ) followed by the end of the line, meaning the match ends just before a closing ) at the line end, without including either in the result.
Edit: If the number can appear in parentheses that are not at the end of the line, remove $ from the regex.
var match = Regex.Match(string, @".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
"Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
var match = Regex.Match(input, @"\(\d+\)");
Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, @"\((\d+)\)");
Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, @"(?<=\()\d+(?=\))");
Console.WriteLine(match.Value);
}
Console.WriteLine();

Search string using Pattern within long string in C#

I need to search for a pattern within a string.
For example:
string big = "Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
Now I need to extract/find words based on the pattern XXX00000, i.e. 3 letters and then 5 digits.
Is there any way to do this?
Even extraction will be okay for me.
Please help.
foreach (Match m in Regex.Matches(big, "([A-Za-z]{3}[0-9]{5})"))
{
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value); // here is your match
}
}
How about this one?
([XYZ]{3}[0-9]{5})
You can use Regex Tester to test your expressions.
You can use a simple regular expression to match the ticket numbers in your string:
([A-Za-z]{3}[0-9]{5})
the full code will be:
string strRegex = @"([A-Za-z]{3}[0-9]{5})";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase);
string strTargetString = @"Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
You could always use a chatbot extension for the requests.
As for extracting the required information from a sentence without any context, you can use a regex for that.
You can use http://rubular.com/ to test it.
An example would be
...[0-9]{5}
which matches any three characters followed by five digits, so it would find XXX00000.
Hope that helps.
Use a regex:
string ticketNumber = string.Empty;
var match = Regex.Match(myString, @"[A-Za-z]{3}\d{5}");
if(match.Success)
{
ticketNumber = match.Value;
}
Here's a regex:
var str = "ABCD12345 ABC123456 ABC12345 XYZ98765";
foreach (Match m in Regex.Matches(str, @"(?<![A-Z])[A-Z]{3}[0-9]{5}(?![0-9])"))
Console.WriteLine(m.Value);
The extra bits are the zero-width negative look-behind ((?<![A-Z])) and look-ahead ((?![0-9])) expressions, which make sure you don't capture extra letters or numbers. The example above only catches the third and fourth parts, not the first and second. A plain [A-Z]{3}[0-9]{5} would also match inside longer runs, for example BCD12345 within ABCD12345 and ABC12345 within ABC123456.

How to find a string with missing fragments?

I'm building a chatbot in C# using AIML files, at the moment I've this code to process:
<aiml>
<category>
<pattern>a * is a *</pattern>
<template>when a <star index="1"/> is not a <star index="2"/>?</template>
</category>
</aiml>
I would like to do something like:
if (user_string == pattern_string) return template_string;
but I don't know how to tell the computer that the star character can be anything, and especially that it can be more than one word!
I was thinking of doing it with regular expressions, but I don't have enough experience with them. Can somebody help me? :)
Using Regex
static bool TryParse(string pattern, string text, out string[] wildcardValues)
{
// ^ and $ means that whole string must be matched
// Regex.Escape (http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape(v=vs.110).aspx)
// (.+) means capture at least one character and place it in match.Groups
var regexPattern = string.Format("^{0}$", Regex.Escape(pattern).Replace(@"\*", "(.+)"));
var match = Regex.Match(text, regexPattern, RegexOptions.Singleline);
if (!match.Success)
{
wildcardValues = null;
return false;
}
//skip the first one since it is the whole text
wildcardValues = match.Groups.Cast<Group>().Skip(1).Select(i => i.Value).ToArray();
return true;
}
Sample usage
string[] wildcardValues;
if(TryParse("Hello *. * * to *", "Hello World. Happy holidays to all", out wildcardValues))
{
//it's a match
//wildcardValues contains the values of the wildcard which is
//['World','Happy','holidays','all'] in this sample
}
By the way, you don't really need Regex for this; it's overkill. You can implement your own algorithm by splitting the pattern into tokens using string.Split and then finding each token using string.IndexOf, although using Regex does result in shorter code.
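A rough sketch of that Split/IndexOf idea follows. The names WildcardMatcher and TryMatch are hypothetical, and the sketch makes a few assumptions that are not in the answer: comparison is ordinal and case-sensitive, every * must stand for at least one character, and each wildcard takes the shortest possible text (unlike the greedy (.+) in the Regex version, so captured values can differ when a literal occurs more than once).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class WildcardMatcher
{
    public static bool TryMatch(string pattern, string text, out string[] wildcardValues)
    {
        wildcardValues = null;

        // The literal pieces of the pattern; wildcards sit between them.
        string[] literals = pattern.Split('*');

        // No wildcard at all: fall back to a plain comparison.
        if (literals.Length == 1)
        {
            if (pattern != text) return false;
            wildcardValues = new string[0];
            return true;
        }

        // The text must begin with the first literal.
        if (!text.StartsWith(literals[0], StringComparison.Ordinal)) return false;
        int pos = literals[0].Length;

        var values = new List<string>();
        for (int i = 1; i < literals.Length; i++)
        {
            string literal = literals[i];
            int next;
            if (i == literals.Length - 1)
            {
                // The last literal must sit at the very end of the text.
                if (!text.EndsWith(literal, StringComparison.Ordinal)) return false;
                next = text.Length - literal.Length;
            }
            else
            {
                // Search from pos + 1 so the wildcard covers at least one character.
                if (pos + 1 > text.Length) return false;
                next = text.IndexOf(literal, pos + 1, StringComparison.Ordinal);
                if (next < 0) return false;
            }

            // Reject an empty wildcard (also covers overlapping first/last literals).
            if (next - pos < 1) return false;

            values.Add(text.Substring(pos, next - pos));
            pos = next + literal.Length;
        }

        wildcardValues = values.ToArray();
        return true;
    }

    public static void Main()
    {
        string[] values;
        if (TryMatch("a * is a *", "a big cat is a small dog", out values))
            Console.WriteLine(string.Join(" | ", values));
    }
}
```

With the AIML-style pattern above, the two wildcards come back as "big cat" and "small dog", so multi-word values work without any regular expression.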
Do you think this should work for you?
Match match = Regex.Match(pattern_string, @"<pattern>a [^<]+ is a [^<]+</pattern>");
if (match.Success)
{
// do something...
}
Here [^<]+ stands for one or more characters that are not <.
If you think the text matched by * may contain a < character, you can simply use .+ instead of [^<]+.
But this is risky, as .+ matches one or more of any character.

Regex collection groups in C# when using an OR

If I have the following code:
Regex xp = new Regex(@"(\*\*)(.+?)\*\*|(\*)([^\*]+)\*");
string text = @"*hello* **world**";
MatchCollection r_Matches = xp.Matches(text);
foreach (Match m in r_Matches)
{
Console.WriteLine(m.Groups[1].ToString());
Console.WriteLine(m.Groups[3].ToString());
}
// Outputs:
// ''
// '*'
// '**'
// ''
How can I run the above regular expression and have the first group from either side of the OR appear in the same place? (i.e. .Groups[1] returns either ** or *. I gather this isn't how regexes in C# work, but is this achievable, and if so, how?)
You can use a backreference:
Regex xp = new Regex(@"(\*{1,2})(.+?)\1");
string text = @"*hello* **world**";
MatchCollection r_Matches = xp.Matches(text);
foreach (Match m in r_Matches)
{
Console.WriteLine(m.Groups[1].ToString());
}
This will match ** or * followed by one or more of any character until it finds exactly what it matched before (** or *).
As one of the commenters said, you can use named groups for this. .NET is more flexible than most of the other regex flavors in that it allows you to use the same name in different parts of the regex, with no restrictions. With this regex:
@"(?<delim>\*\*)(?<content>.+?)\*\*|(?<delim>\*)(?<content>[^*]+)\*"
...you can extract the parts that interest you like this:
foreach (Match m in r_Matches)
{
Console.WriteLine("Delimiter: {0}\nContent: {1}",
m.Groups["delim"].Value,
m.Groups["content"].Value);
}
And that's all there is to it. Contrary to one of the other comments, you don't have to muck about with GroupCollections or CaptureCollections, or whatever.
Be aware that this particular problem can be solved easily in almost any flavor. It's just that .NET is more flexible than most.
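Put together as a complete program, the named-group approach might look like the sketch below. The class name EmphasisParser and the tuple-returning Parse method are invented for illustration; the regex is the one given above, and tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the answer.
public static class EmphasisParser
{
    // Both alternatives reuse the names "delim" and "content", which .NET allows.
    private static readonly Regex Pattern = new Regex(
        @"(?<delim>\*\*)(?<content>.+?)\*\*|(?<delim>\*)(?<content>[^*]+)\*");

    public static List<(string Delim, string Content)> Parse(string text)
    {
        var result = new List<(string Delim, string Content)>();
        foreach (Match m in Pattern.Matches(text))
        {
            // Whichever alternative matched, the values are read by name.
            result.Add((m.Groups["delim"].Value, m.Groups["content"].Value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var item in Parse("*hello* **world**"))
            Console.WriteLine("Delimiter: {0}, Content: {1}", item.Delim, item.Content);
        // Delimiter: *, Content: hello
        // Delimiter: **, Content: world
    }
}
```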

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
Regex regex = new Regex(pattern,RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection collection = regex.Matches(input);
return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe you're just looking for \.\w+?
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, @"\.\w+").Dump();
Result: three matches, .bruce, .lee and .good.
Note:
If you don't want to find foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ additionally match only at \n, not at \r\n.
Thus you get .good because it is preceded by \n, which is matched by ^, but you don't get .bruce because it is followed by \r, which is not matched by $.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
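The effect of the \r?$ change can be checked with a small sketch like the one below. DotWordFinder, OriginalPattern and FixedPattern are hypothetical names; the first pattern is the one from the question and the second differs only in the final alternative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical test harness for the two patterns.
public static class DotWordFinder
{
    public const string OriginalPattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
    // Same pattern, but an optional \r is allowed before the end of the line.
    public const string FixedPattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|\r?$)";

    public static string[] Find(string input, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        return regex.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Groups["slickText"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
        Console.WriteLine(string.Join(" ", Find(input, OriginalPattern))); // .lee .good
        Console.WriteLine(string.Join(" ", Find(input, FixedPattern)));    // .bruce .lee .good
    }
}
```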
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.
