In my application, I must read a URL and do something if the URL contains Basic authentication credentials. An example of such a URL is
http://username:password@example.com
Is the regular expression below a good fit for my task? I am to capture four groups into local variables. The URL is passed to another internal library that will do further work to ensure the URL is valid before opening a connection.
^(.+?//)(.+?):(.+?)@(.+)$
It looks ok, and I think that a regular expression is good to use in this case. A couple of suggestions:
1) I think that named groups would make your code more readable, i.e:
^(?<protocol>.+?//)(?<username>.+?):(?<password>.+?)@(?<address>.+)$
Then you can simply write
Match match = Regex.Match(input, pattern);
if (match.Success) {
    string user = match.Groups["username"].Value;
}
2) You could also make the expression a little stricter, e.g. by using \w instead of . where possible:
^(?<protocol>\w+://)...
Your regex seems OK, but why not use the thoroughly-tested and nearly-compliant Uri class? It's then trivial to access the pieces you want without worrying about spec-compatibility:
var url = new Uri("http://username:password@example.com");
var userInfo = url.UserInfo.Split(':');
var username = userInfo[0];
var password = userInfo[1];
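For comparison, the same idea can be sketched outside .NET. This is a minimal Python sketch, assuming the standard library's urllib.parse is an acceptable stand-in for the Uri class; it only illustrates reading the user-info part of the example URL, not the rest of the application:

```python
from urllib.parse import urlsplit

# Example URL from the question, carrying Basic authentication credentials.
parts = urlsplit("http://username:password@example.com")

username = parts.username   # None when the URL carries no credentials
password = parts.password
host = parts.hostname

# The "do something if credentials are present" check from the question.
has_credentials = username is not None
```

As with the Uri class, no hand-written pattern is involved, so there is no regex to keep in line with the URL syntax.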
I want to set up a redirect from
www.somesite.com/products/dynamicstring/randomtext1/randomtext2
to www.somesite.com/products/dynamicstring
Is it possible to do that with a regex?
That is, if my incoming URL is
www.somesite.com/products/myproducts/test1/test2 it should redirect to www.somesite.com/products/myproducts/
To explain a bit more:
@TomLord I am using HttpContext.Current.Response.RedirectPermanent(matchingDefinition.To). I have all the redirects' "From" and "To" values in a class object, in the form of regular expressions, for example From "/product/*" and To "/products". I read these objects and try to redirect them, but I am not able to redirect something like /products/dynamicstring/randomtext1/ to /products/dynamicstring, where dynamicstring is a random string. I can't find a regular expression that can be used to do this. For example, /products/samples/randomtext1 should redirect to /products/samples/
Redirection cannot be done with regex alone. Look up what a regular expression really is. The short answer: it is a string-like expression that describes a search pattern. So it can't redirect, or even replace one substring with another; all it can do is match and capture parts of the matched string.
That being said, regex can help us do what you want. I am going to assume you can use JavaScript, because I can't give a solution in every language. I am also going to assume you will work through the code rather than copy, paste and press Enter. If that is all you need, hire a programmer. If you use another language, the principle should be the same:
obtain URL
define regex
use capture group to extract the part of your URL that you need
construct a new URL
redirect to it
Matching URLs in general is a fair bit more complex, for example:
^(?:https?://)?(?:[\w]+\.)(?:\.?[\w]{2,})+$
As long as you are sure you will only be getting URLs, and in the format you want, we can do it far more simply.
Basically, let's say you have:
some text with 2 dots that ends in com
then a /products/dynamicstring/
then text
then /
then text
As a regex that is:
/\w*\.\w*\.com\/products\/dynamicstring\/\w*\/\w*/g
Crude matching is done, but we still need to add a capture group that we will use to extract the part of the string we need:
/(\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/g
OK, now let's leverage this regex to do the rest of the work:
Define the regex:
var regex = /(\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/g;
Get current URL. If you already have URL use it.
var currUrl = window.location.href;
Extract capture group from string:
var match = regex.exec(currUrl);
Use that to get a new URL from old one:
var redirectUrl = match[1] + "myproducts/";
Finally, we redirect with:
window.location.replace(redirectUrl);
I wrote all this off the top of my head, so I recommend you go over each step, see how it works, and read the documentation for the functions used. You might find an error, and you will learn a lot.
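The steps above can also be sketched with the dynamic segment captured instead of hard-coded, here in Python rather than JavaScript. It assumes the path always has the shape /products/<segment>/<anything>, and redirect_target is a hypothetical helper name; the actual redirect call is left to whatever framework is in use:

```python
import re

# base = optional scheme, host, "/products/", and the first (dynamic) segment.
# Anything after that segment is what gets dropped by the redirect.
PRODUCT_URL = re.compile(r"^(?P<base>(?:https?://)?[^/]+/products/[^/]+)/.+$")

def redirect_target(url):
    """Return the URL to redirect to, or None when no redirect is needed."""
    match = PRODUCT_URL.match(url)
    if match is None:
        return None
    return match.group("base") + "/"

target = redirect_target("www.somesite.com/products/myproducts/test1/test2")
```

A URL that is already in the short form does not match, so the rule should not redirect to itself.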
I have a URL and need to extract the port, username and password from it and put them into an array. It looks like the following.
http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts
Can I use some other method without replaces or substring?
One way in C#:
Parse the URL with Uri, then parse its query string (ParseQueryString expects only the query part, not the whole URL):
Uri uri = new Uri("http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts");
var parsedQuery = HttpUtility.ParseQueryString(uri.Query);
Then this gives the username:
parsedQuery["username"]
And the password:
parsedQuery["password"]
The port comes from the Uri:
uri.Port
Put them in an array, or whatever structure you need.
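The same parse-then-look-up approach, sketched in Python for comparison. This is a minimal sketch assuming the standard library's urllib.parse; the variable names are illustrative only:

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://myproject.ddns.net:8080/get.php"
       "?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts")

parts = urlsplit(url)
query = parse_qs(parts.query)   # maps each parameter name to a list of values

# Collect the three requested values into one list.
result = [parts.port, query["username"][0], query["password"][0]]
```

Here the port comes back as an integer while the query values are strings, so convert it if the array has to be uniform.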
I don't know C#, but here's one that works for Python. It's pretty straightforward so you should be able to convert.
:(?P<port>[0-9]+).*username=(?P<username>[a-zA-Z0-9]+).*password=(?P<password>[a-zA-Z0-9]+)
The (?P<foo>bar) syntax is a named capture group: the text that matches the pattern 'bar' is stored in a group called 'foo', which you can read back by name after matching.
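A short usage sketch for the pattern above, assuming the URL always contains a port and that username comes before password in the query string:

```python
import re

URL = ("http://myproject.ddns.net:8080/get.php"
       "?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts")

PATTERN = re.compile(
    r":(?P<port>[0-9]+).*username=(?P<username>[a-zA-Z0-9]+)"
    r".*password=(?P<password>[a-zA-Z0-9]+)"
)

match = PATTERN.search(URL)
# Each named group is read back by the name given in (?P<name>...).
fields = [match.group("port"), match.group("username"), match.group("password")]
```

If the parameters can appear in a different order, this pattern will not match and a query-string parser is the safer choice.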
Here is another possible solution with pure C# regex:
var url = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
var urlRegex = new Regex(@"(?<=(http(s)?://)?\w+(\.\w+)*:)\d+(?=/.*)?");
var usernameRegex = new Regex(@"(?<=(\?|&)username=).*?(?=&|$)", RegexOptions.IgnoreCase);
var passwordRegex = new Regex(@"(?<=(\?|&)password=).*?(?=&|$)", RegexOptions.IgnoreCase);
Console.WriteLine(urlRegex.Match(url));
Console.WriteLine(usernameRegex.Match(url));
Console.WriteLine(passwordRegex.Match(url));
If there are any parts that don't change, e.g. if it's always the same URL, you could just remove them like this:
string str = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
str = str.Replace("http://myproject.ddns.net", "");
This would leave you with ":8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts"
There is nothing stopping you repeating the process with another section.
As for regex you could use Regex.Match https://msdn.microsoft.com/en-us/library/twcw2f1c(v=vs.110).aspx to get the parts you want.
You could use ":\d{4}/" to get the port - you'd have to strip the leading ":" and trailing "/" though; this "username=\w*\&" to get the username - you'd have to strip the leading "username=" and trailing "&" though; and for the password you could use "password=\w*\&" - you'd have to strip the leading "password=" and trailing "&" though.
If you'd like to experiment with regex this site https://regex101.com/ is pretty good.
Though I've been looking through some of the classes I've been having a hard time finding an efficient way to parse/regex domains (both root and subdomains, while including things like .co.uk, etc).
Is there a function that can validate whether or not it is a proper domain/URL without actually connecting to the site? My goal is to use this for a large list of URLs to grab pretty much anything before (and including) the TLD.
You'll have to tweak the regex for your particular situation, but this gives you a point where to start:
const string pattern = @"http[s]?://(?<Domain>([a-zA-Z0-9\-]+?\.)*([a-zA-Z0-9\-]+\.)*([a-zA-Z]{3,61}|[a-zA-Z]{1,}\.[a-zA-Z]{2,3}))";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(myURL);
var domain = match.Groups["Domain"].Value;
How to validate by a single regular expression the urls:
http://83.222.4.42:8880/listen.pls
http://www.my_site.com/listen.pls
http://www.my.site.com/listen.pls
to be true?
I see that I did not formulate the question precisely, sorry. The idea is that I want to use a regexp to validate URLs, whether the host is an external IP address or a domain name. Other valid URLs to consider:
http://93.122.34.342/
http://193.122.34.342/abc/1.html
http://www.my_site.com/listen2.pls
http://www.my.site.com/listen.php
and so on.
The road to hell is paved with string parsing.
URL parsing in particular is the source of many, many exploited security issues. Don't do it.
For example, do you want this to match?
Note the uppercase scheme section. Remember that some parts of a URL are case sensitive, and some are not. Then there's encoding rules. Etc.
Start by using System.Uri to parse the URLs you provide:
var uri = new Uri("http://83.222.4.42:8880/listen.pls");
Then you can write things like:
if (uri.Scheme == "http" &&
uri.Host == "83.222.4.42" &&
uri.AbsolutePath == "/listen.pls"
)
{
// ...
}
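The same component-by-component check, sketched in Python for comparison. This is a minimal sketch assuming urllib.parse stands in for System.Uri; is_listen_url is a hypothetical helper name:

```python
from urllib.parse import urlsplit

def is_listen_url(url):
    """Compare parsed components instead of matching the raw string."""
    parts = urlsplit(url)
    return (
        parts.scheme == "http"                 # urlsplit lowercases the scheme
        and parts.hostname == "83.222.4.42"    # hostname is lowercased, port removed
        and parts.path == "/listen.pls"        # the path is compared case-sensitively
    )
```

Because the parser normalises the case-insensitive parts, a URL written with an uppercase scheme is treated the same as the lowercase one, while a differently cased path is not.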
^http://.+/listen\.pls$
If there are strictly only 3 of them don't bother with a regular expression because there is not necessarily a good pattern match when everything is already strictly known - in fact you might accidentally match more than these three urls - which becomes a problem if the urls are intended for security purposes or something equally important. Instead, test the three cases directly - maybe put them in a configuration file.
In the future if you want to add more URLs to the list you'll likely end up with an overly complicated regular expression that's increasingly hard to maintain and takes the place of a simpler check against a small list.
You won't necessarily get speed gains by running Regex to find these three strings - in fact it might be quite expensive.
Note: If you want URI regular expressions, also try websites hosting libraries like Regex Library; there are many to pick and choose from if your needs change.
/^http:\/\/[-_a-zA-Z0-9.]+(:\d+)?\/listen\.pls$/
Do you mean any URL ending with /listen.pls? In that case try this:
^http://[^/]+/listen\.pls$
or if the protocol identifier must be optional:
^(http://)?[^/]+/listen\.pls$
Anyway take a look here, maybe it is useful for you: Url and Email validation using Regex
A modified version based on Jay Bazuzi's solution above, since I can't post code in a comment. It checks against a blacklist of extensions (I do this only for demonstration purposes; you should strongly consider building a whitelist rather than a blacklist):
string myurl = "http://www.my_site.com/listen.pls";
Uri myUri = new Uri(myurl);
string[] invalidExtensions = {
".pls",
".abc"
};
foreach(string invalidExtension in invalidExtensions) {
if (invalidExtension.Equals(System.IO.Path.GetExtension(myUri.AbsolutePath), System.StringComparison.OrdinalIgnoreCase)) {
//Logic here
}
}
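The whitelist variant recommended above could look roughly like this. It is a Python sketch, and both the ALLOWED_EXTENSIONS set and the has_allowed_extension name are hypothetical; the real list depends on the application:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical whitelist; everything not listed here is rejected.
ALLOWED_EXTENSIONS = {".mp3", ".m3u"}

def has_allowed_extension(url):
    """True only when the URL path ends in a whitelisted extension."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()   # "" when there is none
    return extension in ALLOWED_EXTENSIONS
```

With a whitelist, an extension nobody thought of is rejected by default, which is the safer failure mode.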
Given this regex:
^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})
Reformatted for readability:
@"^((https?|ftp):(\/{2}))?" + // http://, https://, ftp:// - Protocol Optional
@"(" + // Begin URL payload format section
@"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" + // IPv4 Address support
@")|("+ // Delimit supported payload types
@"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}" + // FQDNs
@")"; // End URL payload format section
How can I make it fail (i.e. not match) on this "fail" test case?
http://www.google
As I am specifying {1} on the TLD section, I would think it would fail without the extension. Am I wrong?
Edit: These are my PASS conditions:
"http://www.zi255.com?Req=Post&PID=4",
"http://www.zi255.com?Req=Post&ID=4",
"http://www.zi255.com/?Req=Post&PID=4",
"http://www.zi255.com?Req=Post&PostID=4",
"http://www.zi255.com/?Req=Post&ID=4"
"http://www.zi255.com?Req=Post&Post=4",
"http://www.zi255.com?Req=Post&Entry=4",
"http://www.zi255.com?PID=4"
"http://www.zi255.com/Post.aspx?Req=Post&ID=4",
"http://www.zi255.com/Post.aspx?Req=Post&PID=4",
"http://www.zi255.com/Post.aspx?Req=Post&Post=4",
"http://www.zi255.com/Post.aspx?Req=Post&Title=Random%20Post%20Name"
"http://www.zi255.com/?Req=Post&Title=Random%20Post%20Name",
"http://www.zi255.com?Req=Post&Title=Random%20Post%20Name",
"http://www.zi255.com?Req=Post&PostID=4",
"http://www.zi255.com?Req=Post&Post=4",
"http://www.zi255.com?Req=Post&Entry=4",
"http://www.zi255.com?PID=4"
"http://www.zi255.com",
"http://www.damnednice.com"
These are my FAIL conditions:
"http://.com",
"http://.com/",
"http:/www.google.com",
"http:/www.google.com/",
"http://www.google",
"http://www.googlecom",
"http://www.google.c",
".com",
"https://www..."
I'll throw out an alternative suggestion. You may want to use a combination of the parsing of the built-in System.Uri class and a couple targeted regexes (or simple string checks when appropriate).
Example:
string uriString = "...";
Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
// Uri is totally invalid!
}
else
{
// validate the scheme
if (!uri.Scheme.Equals("http", StringComparison.OrdinalIgnoreCase))
{
// not http!
}
// validate the authority ('www.blah.com:1234' portion)
if (uri.Authority // ...)
{
}
// ...
}
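A rough sketch of the "parse first, then run small targeted checks" idea, written in Python with urllib.parse standing in for System.Uri. The TLD list is the one from the question's regex; looks_valid is a hypothetical name, and the sketch assumes a scheme is always present and ignores IP-address hosts:

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "ftp"}
# Generic TLDs listed in the question's regex; any two-letter TLD is also accepted.
KNOWN_TLDS = {"com", "org", "net", "gov", "mil", "biz", "info",
              "mobi", "name", "aero", "jobs", "museum"}

def looks_valid(url):
    try:
        parts = urlsplit(url)
    except ValueError:
        return False                      # not parseable at all
    # Targeted check 1: scheme and presence of a host.
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    # Targeted check 2: at least two non-empty alphanumeric labels.
    labels = parts.hostname.split(".")
    if len(labels) < 2 or not all(label.isalnum() for label in labels):
        return False
    # Targeted check 3: the last label must be an accepted TLD.
    tld = labels[-1]
    return tld in KNOWN_TLDS or (len(tld) == 2 and tld.isalpha())
```

Each check is small enough to read and test on its own, which is the main advantage over one catch-all pattern.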
Sometimes one catch-all regex is not the best solution, however tempting. While debugging this regex is feasible (see Greg Hewgill's answer), consider doing a couple of tests for different categories of problems, e.g. one test for numerical addresses and one test for named addresses.
You need to force your regex to match up until the end of the string. Add a $ at the very end of it. Otherwise, your regex is probably just matching http://, or something else shorter than your whole string.
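To see why the missing end anchor matters, here is a small Python sketch. HOST is a simplified stand-in for the host part of the question's pattern (an assumption, not the original regex), and the optional tail for a path or query string is an addition so that URLs such as http://www.zi255.com?PID=4 still pass:

```python
import re

# Simplified stand-in: one or more labels, then a two-letter or listed TLD.
HOST = r"(?:[a-zA-Z0-9]+\.)+(?:[a-z]{2}|com|org|net)"

# Without an end anchor the match may stop in the middle of the host name.
unanchored = re.compile(r"^(?:https?://)?" + HOST)
# With "$" the whole string must be consumed (an optional path/query is allowed).
anchored = re.compile(r"^(?:https?://)?" + HOST + r"(?:[/?#].*)?$")

partial = unanchored.match("http://www.google")   # matches only a prefix
full = anchored.match("http://www.google")        # no match at all
```

The unanchored pattern is satisfied by the prefix "http://www.go", because "go" looks like a two-letter TLD; the anchored one rejects the string.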
The "validate a url" problem has been solved* numerous times. I suggest you use the System.Uri class, it validates more cases than you can shake a stick at.
The code Uri uri = new Uri("http://whatever"); throws a UriFormatException if it fails validation. That is probably what you'd want.
*) Or kind of solved. It's actually pretty tricky to define what is a valid url.
It's all about definitions. A "valid URL" should give you an IP address when you do a DNS lookup; you should be able to connect to that IP, and when a request is sent out you should get a reply in the form of HTML that you can use.
So what we are looking for is a "valid URL format", and that is where System.Uri comes in very handy. But if the URL is hidden in a large piece of text, you first need to find something that could pass as a valid URL format.
The thing that distinguishes a URL from ordinary readable text is a dot not followed by whitespace. "123.com" could validate as a real URL.
Use the regex
[a-z_\.\-0-9]+\.[a-z]+[^ ]*
to find any possible URL in a text, then do a System.Uri check to see whether it is a valid URL format, and then do a lookup. Only when the lookup gives you a result do you know the URL is valid.
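The first two stages of that pipeline (find candidates, then check the format) can be sketched as follows. This is a Python sketch with urllib.parse standing in for System.Uri; find_url_candidates is a hypothetical name, candidates are assumed to be http when no scheme is present, and the DNS lookup stage is left out because it needs network access:

```python
import re
from urllib.parse import urlsplit

# The candidate pattern from the text above.
CANDIDATE = re.compile(r"[a-z_\.\-0-9]+\.[a-z]+[^ ]*")

def find_url_candidates(text):
    """Return substrings that look like URLs and also parse with a host."""
    candidates = []
    for match in CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            # Assumption: a candidate without a scheme is treated as http.
            parts = urlsplit("http://" + candidate)
        except ValueError:
            continue                      # looked like a URL but does not parse
        if parts.hostname:
            candidates.append(candidate)
    return candidates

found = find_url_candidates("visit 123.com or www.example.org/page today")
```

Anything that survives this filter is only well-formed; whether it resolves is a separate question for the lookup step.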