check pattern of HTTP GET using REGEXc# - c#

I'm new to RegEx and having trouble getting pattern
have request with first line that look like
GET /someFolder/someSubfolder/someFile.fileExtenstion?param1=abc HTTP/1.1
I would like to check that the correct pattren exist
meaning first word GET later some valid URL than HTTP/verison
What I have till now is
string input = line;
Match match = Regex.Match(input, #"GET /([A-Za-z0-9-.+!*'();:#&=+$,/?%#[]])\ HTTP/1.1",
RegexOptions.IgnoreCase);
// check the Match instance.
if (match.Success)
{
string URL = match.Groups[1].Value;
}
But I get No match
What am I missing ?

You can simplify the regex a lot as
^GET.*HTTP\/1\.1$
^ anchors the regex at the start of the string.
.* matches anything
$ anchors the regex at end of string. Ensures that nothing followes the matched string
Regex Example

Old question but it deserve new answer for anyone looking for correctly matching HTTP Start Line and extract values from it.
The (.*) will not match white space, also escaping forward slash not necessary in C# and will lead to not match .
Here is sample code with named capturing group:
var httpRegex = new Regex(#"^(?<method>[a-zA-Z]+)\s(?<url>.+)\sHTTP/(?<major>\d)\.(?<minor>\d+)$");
var match = httpRegex.Match("GET http://www.google.com HTTP/1.1");
if (match.Success)
{
Console.WriteLine(
$"Method: {match.Groups["method"].Value}\r\n" +
$"Url: {match.Groups["url"].Value}\r\n" +
$"httpVersion: HTTP/{match.Groups["major"].Value}.{match.Groups["minor"].Value}"
);
}
Escaping forward slash required in languages like PHP and JavaScript, and here the same code for PHP with escaping https://regex101.com/r/2l7k83/1/

Related

C# Capturing the first match with regex

I've got an input string that looks like this:
url=https%3A%2F%2Fdomain.com%2Fsale-deal%3Futm_source%3Dinsider-primary-action%3Dinsider-primary-action&utm_source=FB
or
url=https%3A%2F%2Fdomain.com%2Fsale&utm_source=FB&sub_id1=M12
the string sometimes has or non %3Futm_source
how to get link between url= and %3Futm_source% or &utm_source
Regex reg = new Regex(#"url=(https%3A%2F%2Fdomain.com[a-zA-Z0-9-_/%\.]+)%3Futm_source|&utm_source");
Match result = reg.Match(inPut);
Console.WriteLine(result.Groups[1].Value));
it always get from url= to &utm_source
You can use this
(?<=url=).*?(?=%3Futm_source|&utm_source)
(?<=url=) Positive look behind. matches url=.
.* - Matches anything except new line.
(?=%3Futm_source|&utm_source) - Positive look ahead. Matches %3Futm_source or &utm_source
Demo

C# Regex, match but not include the first character before matched string

How can I make this C# Regex to not include the first character before the URL in the matching results:
((?!\").)https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)
This will match:
Xhttps://twitter.com/oppomobileindia/status/798397636780953600
Notice the first X letter.
I want it to match the URLs that start without double quotes. Also not include the first character before the https for those URLs that do not start with double quotes.
An actual example that I use in my code:
var str = "<div id=\"content\">
<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>
<p>\"https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";
var pattern = #"(?<!""')https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";//
var rgx = new Regex(pattern);
var results = rgx.Replace(str, "XXX");
In the above example, only the first URL should be replaces, because the second one has double quotation before the URL. It also should be replaced at the exact match, without the first letter before the matches string.
Use a (?<!") negative lookbehind:
var re = #"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
The (?<!") means that there cannot be a " immediately before the current location.
In C#, you do not need to escape / inside the pattern since regex delimiters are not used when defining the regex.
Note on the C# syntax: if you want to define a " inside a verbatim string literal, double it. In a regular string literal, escape the " and \:
var re = "(?<!\")https?://twitter\\.com/(?:#!/)?(\\w+)/status(?:es)?/(\\d+)";

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I wanted to know if there is substrings which starts with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = #"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but failed at input= $(a$() , which inside the expression is empty. I wanted NOT to match when input is $().[ there is nothing between start and end identifiers].
What is wrong with my regex?
Note: [^$] matches a single character but not of $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \(([^$][^\s]*)\) is wrong. It won't allow $ as a first character inside () but it allows it as second or third ,, etc. See the demo here. You need to combine the negated classes in your regex inorder to match any character not of a space or $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = #"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the options.
string pattern = #"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

Trouble creating a Regex expression

I'm trying to create a regex expression what will accept a certain format of command. The pattern is as follows:
Can start with a $ and have two following value 0-9,A-F,a-f (ie: $00 - $FF)
or
Can be any value except for "&<>'/"
*if the value start with $ the next two values after need to be a valid hex value from 00-ff
So far I have this
Regex correctValue = new Regex("($[0-9a-fA-F][0-9a-fA-F])");
Any help will be greatly appreciated!
You just need to add "\" symbol before your "$" and it works:
string input = "$00";
Match m = Regex.Match(input, #"^\$[0-9a-fA-F][0-9a-fA-F]$");
if (m.Success)
{
foreach (Group g in m.Groups)
Console.WriteLine(g.Value);
}
else
Console.WriteLine("Didn't match");
If I'm following you correctly, the net result you're looking for is any value that is not in the list "&<>'/", since any combination of $ and two alphanumeric characters would also not be in that list. Thus you could make your expression:
Regex correctValue = new Regex("[^&<>'/]");
Update: But just in case you do need to know how to properly match the $00 - $FF, this would do the trick:
Regex correctValue = new Regex("\$[0-9A-Fa-f]{2}");
In Regular Expression $ use for Anchor assertion, and means:
The match must occur at the end of the string or before \n at the end of the line or string.
try using [$] (Character Class for single character) or \$ (Character Escape) instead.

C# regex pattern to extract urls from given string - not full html urls but bare links as well

I need a regex which will do the following
Extract all strings which starts with http://
Extract all strings which starts with www.
So i need to extract these 2.
For example there is this given string text below
house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue
So from the given above string i will get
www.monstermmorpg.com
http://www.monstermmorpg.com
http://www.monstermmorpg.commerged
Looking for regex or another way. Thank you.
C# 4.0
You can write some pretty simple regular expressions to handle this, or go via more traditional string splitting + LINQ methodology.
Regex
var linkParser = new Regex(#"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
MessageBox.Show(m.Value);
Explanation
Pattern:
\b -matches a word boundary (spaces, periods..etc)
(?: -define the beginning of a group, the ?: specifies not to capture the data within this group.
https?:// - Match http or https (the '?' after the "s" makes it optional)
| -OR
www\. -literal string, match www. (the \. means a literal ".")
) -end group
\S+ -match a series of non-whitespace characters.
\b -match the closing word boundary.
Basically the pattern looks for strings that start with http:// OR https:// OR www. (?:https?://|www\.) and then matches all the characters up to the next whitespace.
Traditional String Options
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
MessageBox.Show(s);
Using Nikita's reply, I get the url in string very easy :
using System.Text.RegularExpressions;
string myString = "test =) https://google.com/";
Match url = Regex.Match(myString, #"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?");
string finalUrl = url.ToString();
Does not work with html containing URL
For e.g.
<table><tr><td class="sub-img car-sm" rowspan ="1"><img src="https://{s3bucket}/abc/xyzxyzxyz/subject/jkljlk757cc617-a560-48f5-bea1-f7c066a24350_202008210836495252.jpg?X-Amz-Expires=1800&X-Amz-Algorithm=abcabcabc&X-Amz-Credential=AKIAVCAFR2PUOE4WV6ZX/20210107/ap-south-1/s3/aws4_request&X-Amz-Date=20210107T134049Z&X-Amz-SignedHeaders=host&X-Amz-Signature=3cc6301wrwersdf25fb13sdfcfe8c26d88ca1949e77d9e1d9af4bba126aa5fa91a308f7883e"></td><td class="icon"></td></tr></table>
For that need to use below Regular Expression
Regex regx = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

Categories