Match text separated by SINGLE forward slash only - c#

I am trying to split strings similar to this using Regex.Split:
https://www.linkedin.com/in/someone
To return this:
https://www.linkedin.com
in
someone
Effectively, ignoring double forward slash and only worrying about a single forward slash.
I know I should be using something like this /(?!/) negative look ahead - but can't get it to work.
This is not a duplicate of this Similar Question, because if you run that regular expression through Regex.Split, it does not give the required result.

How about this: (?<!/)/(?!/)
Breaking it down:
(?<!/): negative lookbehind for / characters
/: match a single / character
(?!/): negative lookahead for / characters
Taken together, we match a / character that has no / immediately before it and no / immediately after it, so neither slash of a // pair is matched.
Example usage:
string text = "https://www.linkedin.com/in/someone";
string[] tokens = Regex.Split(text, "(?<!/)/(?!/)");
foreach (var token in tokens)
{
Console.WriteLine($"Token: {token}");
}
Output:
Token: https://www.linkedin.com
Token: in
Token: someone
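One edge case worth noting: if the input ends with a single slash, Regex.Split returns an empty final token. A minimal sketch of one way to handle that, assuming empty tokens are unwanted (the SlashSplitter class and SplitOnSingleSlash method are hypothetical names, not part of the answer above):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlashSplitter
{
    // Splits on a '/' that has no '/' directly before or after it,
    // then drops empty tokens (such as the one a trailing slash produces).
    public static string[] SplitOnSingleSlash(string text)
    {
        return Regex.Split(text, "(?<!/)/(?!/)")
                    .Where(token => token.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        foreach (var token in SplitOnSingleSlash("https://www.linkedin.com/in/someone/"))
        {
            Console.WriteLine($"Token: {token}");
        }
    }
}
```

Filtering after the split keeps the pattern itself unchanged, which may be easier to read than extending the lookarounds.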

Also you can do it using this code :
string pattern = @"([^\/]+(\/{2,}[^\/]+)?)";
string input = @"https://www.linkedin.com/in/someone";
foreach(Match match in Regex.Matches(input, pattern)) {
Console.WriteLine(match);
}
Output :
https://www.linkedin.com
in
someone

As mentioned by @Panagiotis Kanavos in the comments section above, why make things complicated when you can use the Uri class:
Provides an object representation of a uniform resource identifier (URI) and easy access to the parts of the URI.
public static void Main()
{
Uri myUri = new Uri("https://www.linkedin.com/in/someone");
string host = myUri.Scheme + Uri.SchemeDelimiter + myUri.Host;
Console.WriteLine(host);
}
Output:
https://www.linkedin.com
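The Uri class can also supply the remaining path pieces that the question asks for. The following is a sketch under the assumption that the input is always an absolute URL; UriParts and its Split method are hypothetical names chosen for illustration:

```csharp
using System;

public static class UriParts
{
    // Returns the scheme-and-host prefix followed by each non-empty path segment.
    public static string[] Split(string url)
    {
        var uri = new Uri(url);

        // "https://www.linkedin.com" for the sample input
        string root = uri.GetLeftPart(UriPartial.Authority);

        // "/in/someone" becomes { "in", "someone" }
        string[] segments = uri.AbsolutePath.Split(
            new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        var result = new string[segments.Length + 1];
        result[0] = root;
        Array.Copy(segments, 0, result, 1, segments.Length);
        return result;
    }

    public static void Main()
    {
        foreach (string part in Split("https://www.linkedin.com/in/someone"))
        {
            Console.WriteLine(part);
        }
    }
}
```

Note that AbsolutePath is returned in escaped form and excludes any query string, so this may differ from a plain string split for URLs containing those.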

Related

Use RegEx to extract specific part from string

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(input, @".*(\(\d*\))");
But I get also the brackets with the result. Is there a way to extract the strings and get it without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This captures anything between the final pair of parentheses (nesting is not supported), whether digits or anything else, and anchors the match to the end of the string. If there may be trailing whitespace, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\(([^()]+)\)$";
string input = @"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
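// Groups[1] holds the text between the parentheses; m.Value would include them.
Console.WriteLine("'{0}' found at index {1}.", m.Groups[1].Value, m.Groups[1].Index);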
}
}
}
See a demo on regex101.com.
Please use the regex \(([^)]*)\)[^(]*$. It works as expected; I have tested it here.
You can extract the number between the parentheses without having to read capturing groups by using the following regex.
(?<=\()\d+(?=\)$)
demo
Explanation:
(?<=\() : positive lookbehind for (, meaning the match starts right after a ( without including it in the result.
\d+ : matches a run of one or more digits.
(?=\)$) : positive lookahead for ) followed by the end of the line, meaning the match ends right before a closing ) at the line end, without including either in the result.
Edit: If the number can be within parentheses that are not at the end of the line, remove $ from the regex.
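In C#, the lookaround pattern explained above could be used roughly as follows. This is only a sketch: the TrailingNumber class and Extract method are hypothetical names, and returning null for "no match" is an assumption about what the caller wants.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumber
{
    // Returns the digits inside a "(...)" at the end of the input,
    // or null when the input does not end that way.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, @"(?<=\()\d+(?=\)$)");
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("Augustin Ralf (050288)"));
        Console.WriteLine(Extract("45 Max Müller (4563)"));
        Console.WriteLine(Extract("Hans (Adam) Meider (056754)"));
    }
}
```

Because the parentheses are only asserted and never consumed, match.Value is already the bare number and no group lookup is needed.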
var match = Regex.Match(input, @".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
    "Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"\(\d+\)");
    Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"\((\d+)\)");
    Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"(?<=\()\d+(?=\))");
    Console.WriteLine(match.Value);
}
Console.WriteLine();

Check pattern of HTTP GET using regex - c#

I'm new to regex and am having trouble getting a pattern right.
I have a request whose first line looks like
GET /someFolder/someSubfolder/someFile.fileExtenstion?param1=abc HTTP/1.1
I would like to check that the correct pattern exists,
meaning the first word is GET, followed by some valid URL, then HTTP/version.
What I have so far is
string input = line;
Match match = Regex.Match(input, @"GET /([A-Za-z0-9-.+!*'();:@&=+$,/?%#[]])\ HTTP/1.1",
RegexOptions.IgnoreCase);
// check the Match instance.
if (match.Success)
{
string URL = match.Groups[1].Value;
}
But I get No match
What am I missing ?
You can simplify the regex a lot as
^GET.*HTTP\/1\.1$
^ anchors the regex at the start of the string.
.* matches anything
$ anchors the regex at the end of the string, ensuring that nothing follows the matched text.
Regex Example
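As a rough sketch, the simplified pattern could be applied with Regex.IsMatch as shown here. The RequestLineCheck class and both method names are hypothetical, and the stricter variant is an illustrative assumption rather than something proposed in the answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequestLineCheck
{
    // The simplified pattern: starts with GET, ends with HTTP/1.1.
    // The forward slash needs no escaping in a .NET pattern.
    public static bool IsGetRequestLine(string line)
    {
        return Regex.IsMatch(line, @"^GET.*HTTP/1\.1$");
    }

    // Hypothetical stricter variant: GET, one space, a target without
    // whitespace, one space, then the version.
    public static bool IsGetRequestLineStrict(string line)
    {
        return Regex.IsMatch(line, @"^GET \S+ HTTP/1\.1$");
    }

    public static void Main()
    {
        string line = "GET /someFolder/someSubfolder/someFile.fileExtenstion?param1=abc HTTP/1.1";
        Console.WriteLine(IsGetRequestLine(line));
        Console.WriteLine(IsGetRequestLineStrict(line));
    }
}
```

The loose pattern accepts anything between GET and HTTP/1.1, including a line such as "GETTING HTTP/1.1", which is why a tighter middle section may be preferable when the check is meant as validation.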
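This is an old question, but it deserves a new answer for anyone who wants to match the HTTP start line correctly and extract values from it.
A bare .* only confirms that the line matches; it does not separate the method, URL and version. Also, escaping the forward slash is not necessary in C#: the .NET regex engine accepts \/ but the backslash is redundant.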
Here is sample code with named capturing group:
var httpRegex = new Regex(@"^(?<method>[a-zA-Z]+)\s(?<url>.+)\sHTTP/(?<major>\d)\.(?<minor>\d+)$");
var match = httpRegex.Match("GET http://www.google.com HTTP/1.1");
if (match.Success)
{
Console.WriteLine(
$"Method: {match.Groups["method"].Value}\r\n" +
$"Url: {match.Groups["url"].Value}\r\n" +
$"httpVersion: HTTP/{match.Groups["major"].Value}.{match.Groups["minor"].Value}"
);
}
Escaping the forward slash is required in languages like PHP and JavaScript when / is used as the regex delimiter. Here is the same pattern for PHP, with escaping: https://regex101.com/r/2l7k83/1/

How to find a string with missing fragments?

I'm building a chatbot in C# using AIML files, at the moment I've this code to process:
<aiml>
<category>
<pattern>a * is a *</pattern>
<template>when a <star index="1"/> is not a <star index="2"/>?</template>
</category>
</aiml>
I would like to do something like:
if (user_string == pattern_string) return template_string;
but I don't know how to tell the computer that the star character can be anything, and especially that it can be more than one word!
I was thinking to do it with regular expressions, but I don't have enough experience with it. Can somebody help me? :)
Using Regex
static bool TryParse(string pattern, string text, out string[] wildcardValues)
{
// ^ and $ means that whole string must be matched
// Regex.Escape (http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape(v=vs.110).aspx)
// (.+) means capture at least one character and place it in match.Groups
var regexPattern = string.Format("^{0}$", Regex.Escape(pattern).Replace(@"\*", "(.+)"));
var match = Regex.Match(text, regexPattern, RegexOptions.Singleline);
if (!match.Success)
{
wildcardValues = null;
return false;
}
//skip the first one since it is the whole text
wildcardValues = match.Groups.Cast<Group>().Skip(1).Select(i => i.Value).ToArray();
return true;
}
Sample usage
string[] wildcardValues;
if(TryParse("Hello *. * * to *", "Hello World. Happy holidays to all", out wildcardValues))
{
//it's a match
//wildcardValues contains the values of the wildcard which is
//['World','Happy','holidays','all'] in this sample
}
By the way, you don't really need Regex for this; it's overkill. You can implement your own algorithm by splitting the pattern into tokens with string.Split and then locating each token with string.IndexOf, although Regex does result in shorter code.
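A minimal sketch of the Split/IndexOf approach described above might look like the following. WildcardMatcher and TryMatch are hypothetical names, and the sketch assumes each * must stand for at least one character, mirroring (.+). Unlike the greedy regex, it takes the earliest occurrence of each literal piece, so the captured values can differ when a literal appears more than once in the text.

```csharp
using System;
using System.Collections.Generic;

public static class WildcardMatcher
{
    public static bool TryMatch(string pattern, string text, out string[] wildcardValues)
    {
        wildcardValues = null;

        // The literal pieces of the pattern, in order, with '*' removed.
        string[] literals = pattern.Split('*');

        // No wildcard at all: the text must equal the pattern.
        if (literals.Length == 1)
        {
            if (pattern != text) return false;
            wildcardValues = new string[0];
            return true;
        }

        if (!text.StartsWith(literals[0], StringComparison.Ordinal)) return false;

        var values = new List<string>();
        int pos = literals[0].Length; // where the current wildcard starts

        for (int i = 1; i < literals.Length; i++)
        {
            int next; // where the literal following the wildcard starts
            if (i == literals.Length - 1)
            {
                // The last literal must sit at the very end of the text.
                next = text.Length - literals[i].Length;
                if (next <= pos || !text.EndsWith(literals[i], StringComparison.Ordinal)) return false;
            }
            else
            {
                // Search from pos + 1 so the wildcard covers at least one character.
                if (pos >= text.Length) return false;
                next = text.IndexOf(literals[i], pos + 1, StringComparison.Ordinal);
                if (next < 0) return false;
            }

            values.Add(text.Substring(pos, next - pos));
            pos = next + literals[i].Length;
        }

        wildcardValues = values.ToArray();
        return true;
    }

    public static void Main()
    {
        string[] values;
        if (TryMatch("a * is a *", "a dog is a cat", out values))
        {
            Console.WriteLine(string.Join(", ", values));
        }
    }
}
```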
Do you think this should work for you?
Match match = Regex.Match(pattern_string, @"<pattern>a [^<]+ is a [^<]+</pattern>");
if (match.Success)
{
// do something...
}
Here [^<]+ stands for one or more characters that are not <.
If you think you may have a < character in your *, you can simply use .+ instead of [^<]+.
But this is riskier, as .+ matches one or more of any character and may consume more than intended.

c# regex to extract link after =

I couldn't find a better title, but I need a regex to extract the link from the sample below.
snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip
assuming regex is the best way.
thanks
Consider the following sample code. It shows how one might extract the link from your supplied string, although I have expanded the test strings somewhat. Generally, .* is too inclusive (as the example below demonstrates).
The main point is that there are several ways to do what you are asking: the first answer given uses look-around, while the second suggests the Groups approach. The choice mainly depends on your actual data.
string[] tests = {
    @"snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip",
    @"snip... flashvars.image_url = 'http://domain.com/test.jpg' flashvars2.image_url = 'http://someother.domain.com/test.jpg'",
};
string[] patterns = {
    @"(?<==\s')[^']*(?=')",
    @"=\s*'(.*)'",
    @"=\s*'([^']*)'",
};
foreach (string pattern in patterns)
{
Console.WriteLine();
foreach (string test in tests)
foreach (Match m in Regex.Matches(test, pattern))
{
if (m.Groups.Count > 1)
Console.WriteLine("{0}", m.Groups[1].Value);
else
Console.WriteLine("{0}", m.Value);
}
}
A simple regex for this would be @"=\s*'(.*)'".
Edit: New regex matching your edited question:
You need to match what's between quotes, after a =, right?
@"(?<==\s*')[^']*(?=')"
should do.
(?<==\s*') asserts that there is a =, optionally followed by whitespace, followed by a ', just before our current position (positive lookbehind).
[^']* matches any number of non-' characters.
(?=') asserts that the match stops before the next '.
This regex doesn't check if there is indeed a URL inside those quotes. If you want to do that, use
@"(?<==\s*')(?=(?:https?|ftp|mailto)\b)[^']*(?=')"
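As an alternative to checking the scheme inside the pattern, the captured text can be handed to Uri.TryCreate afterwards. This is a sketch only: the LinkExtractor class and ExtractUrl method are hypothetical names, and restricting the result to http and https is an assumption about what counts as a valid link here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Returns the first single-quoted value after '=' if it parses as an
    // absolute http(s) URL, otherwise null.
    public static string ExtractUrl(string text)
    {
        Match match = Regex.Match(text, @"=\s*'([^']*)'");
        if (!match.Success) return null;

        string candidate = match.Groups[1].Value;
        Uri uri;
        bool isWebUrl = Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

        return isWebUrl ? candidate : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractUrl("snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip"));
    }
}
```

Keeping the regex simple and validating separately tends to make each step easier to adjust if the set of accepted schemes changes.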

Regular expression to use which matches text before .html and after /

With this string
http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html
I need to get sdf-as
with this
hellow-1/yo-sdf.html
I need yo-sdf
This should get you what you need:
Regex re = new Regex(@"/([^/]*)\.html$");
Match match = re.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
Console.WriteLine(match.Groups[1].Value); //Or do whatever you want with the value
This needs using System.Text.RegularExpressions; at the top of the file to work.
There are many ways to do this. The following uses lookarounds to match only the filename portion. It actually allows no / if such is the case:
string[] urls = {
    @"http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html",
    @"hellow-1/yo-sdf.html",
    @"noslash.html",
    @"what-is/this.lol",
};
foreach (string url in urls) {
    Console.WriteLine("[" + Regex.Match(url, @"(?<=/|^)[^/]*(?=\.html$)") + "]");
}
This prints:
[sdf-as]
[yo-sdf]
[noslash]
[]
How the pattern works
There are 3 parts:
(?<=/|^) : a positive lookbehind to assert that we're preceded by a slash /, or we're at the beginning of the string
[^/]* : match anything but slashes
(?=\.html$): a positive lookahead to assert that we're followed by ".html" (literally on the dot)
References
regular-expressions.info/Lookarounds, Anchors
A non-regex alternative
Knowing regex is good, and it can do wonderful things, but you should always know how to do basic string manipulations without it. Here's a non-regex solution:
static String getFilename(String url, String ext) {
if (url.EndsWith(ext)) {
int k = url.LastIndexOf("/");
return url.Substring(k + 1, url.Length - ext.Length - k - 1);
} else {
return "";
}
}
Then you'd call it as:
getFilename(url, ".html")
API links
String.Substring, EndsWith, and LastIndexOf
Attachments
Source code and output on ideone.com
Try this:
string url = "http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html";
Match match = Regex.Match(url, @"/([^/]+)\.html$");
if (match.Success)
{
string result = match.Groups[1].Value;
Console.WriteLine(result);
}
Result:
sdf-as
However, it would be a better idea to use the System.Uri class to parse the string so that you correctly handle things like http://example.com/foo.html?redirect=bar.html.
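A sketch of that Uri-based approach, assuming the input is always an absolute URL (a relative string such as hellow-1/yo-sdf.html would make the Uri constructor throw). HtmlFileName and FromUrl are hypothetical names:

```csharp
using System;
using System.IO;

public static class HtmlFileName
{
    // Returns the file name without ".html" from the path part of the URL,
    // or an empty string when the path does not end in ".html".
    public static string FromUrl(string url)
    {
        var uri = new Uri(url);

        // AbsolutePath excludes the query string, so "?redirect=bar.html" is ignored.
        string path = uri.AbsolutePath;

        return path.EndsWith(".html", StringComparison.OrdinalIgnoreCase)
            ? Path.GetFileNameWithoutExtension(path)
            : "";
    }

    public static void Main()
    {
        Console.WriteLine(FromUrl("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html"));
        Console.WriteLine(FromUrl("http://example.com/foo.html?redirect=bar.html"));
    }
}
```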
using System.Text.RegularExpressions;
Regex pattern = new Regex(@".*/([a-z\-]+)\.html");
Match match = pattern.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
}
else
{
Console.WriteLine("Not found :(");
}
This one makes the slash and dot parts optional, and allows the file to have any extension:
new Regex(@"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);
But I still prefer Substring(LastIndexOf(...)) because it is far more readable.
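For completeness, the named group in the pattern above could be read like this. It is a sketch only, with FileNameRegex and GetFileName as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameRegex
{
    // With ExplicitCapture, only the named group "fileName" is captured.
    private static readonly Regex Pattern = new Regex(
        @"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$",
        RegexOptions.ExplicitCapture);

    // Returns the last path segment with its final extension removed.
    public static string GetFileName(string path)
    {
        return Pattern.Match(path).Groups["fileName"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(GetFileName("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html"));
        Console.WriteLine(GetFileName("noslash.html"));
        Console.WriteLine(GetFileName("what-is/this.lol"));
    }
}
```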
