How to Exclude Subdomain From URI [duplicate] - c#

I get a URL such as
http://orders.mealsandyou.com/default.php
I don't want to use string functions to get the main domain, i.e.
mealsandyou.com
Is there any function in C# to do that? Uri.Authority and the like return the subdomain too...
Suggestions welcome, not workarounds.

.Net doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.

The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, e.g. .com, .net, .uk etc. Everything else under that depends on the particular TLD for its position, so you can't assume the next-to-last part is the "domain name": for .co.uk it would be .co.
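To make the point about suffixes concrete, here is a minimal sketch of the string-manipulation route. It assumes a tiny hand-picked set of suffixes (knownSuffixes is hypothetical; a real application would load the full Public Suffix List), and GetRegistrableDomain is an illustrative helper, not a framework method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hand-picked sample of public suffixes; a real application
// would load the full Public Suffix List instead.
var knownSuffixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "com", "net", "uk", "co.uk", "com.au"
};

// Returns the longest known suffix plus the one label to its left,
// or null when the host matches no suffix in the sample set.
string GetRegistrableDomain(Uri uri)
{
    string[] labels = uri.Host.Split('.');
    // Start at index 1 so at least one label remains in front of the suffix;
    // scanning left to right means the first hit is the longest suffix.
    for (int i = 1; i < labels.Length; i++)
    {
        string candidate = string.Join(".", labels, i, labels.Length - i);
        if (knownSuffixes.Contains(candidate))
        {
            return labels[i - 1] + "." + candidate;
        }
    }
    return null;
}

Console.WriteLine(GetRegistrableDomain(new Uri("http://orders.mealsandyou.com/default.php"))); // mealsandyou.com
Console.WriteLine(GetRegistrableDomain(new Uri("http://shop.example.co.uk/")));                 // example.co.uk
```

Hosts that match nothing in the sample set, such as bare IP addresses, come back as null.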
In any case I think you're taking the wrong approach. URL rewriting is far more suited to this sort of thing. Have a read of this: learn.iis.net/page.aspx/460/using-the-url-rewrite-module

Related

Regular expressions redirection

I want to set redirection from
www.somesite.com/products/dynamicstring/randomtext1/randomtext2
to www.somesite.com/products/dynamicstring
Is it possible to do that with a regex?
That is, if my incoming URL is
www.somesite.com/products/myproducts/test1/test2 it should redirect to www.somesite.com/products/myproducts/
A bit more detail:
@TomLord I am using HttpContext.Current.Response.RedirectPermanent(matchingDefinition.To). I have all the redirects' "From" and "To" values in a class object, in the form of regex expressions, for example From "/product/*" and To "/products". I read these objects and try to redirect them, but I am not able to redirect something like /products/dynamicstring/randomtext1/ to /products/dynamicstring, where dynamicstring is a random string. I can't find any regular expression that can be used to do this. For example, /products/samples/randomtext1 should redirect to /products/samples/
Redirection cannot be done with regex alone. Google a bit about what a regular expression really is. The short answer: it's a string-like expression that describes a search pattern. So it can't redirect, or even replace one substring with another, or do anything other than match and capture parts of the matched string.
That being said, regex can help us do what you want. I am going to assume you can use JavaScript, because I can't put a solution in every language. I am also going to assume you will go over the code rather than copy, paste and press enter. If that's all you need, hire a programmer. If you use another language, the principle should be the same:
obtain URL
define regex
use capture group to extract the part of your URL that you need
construct a new URL
redirect to it
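The steps above, sketched in C# since the question redirects through HttpContext. The pattern, the group name keep and the helper GetRedirectTarget are assumptions made for illustration, not part of the original answer. The final redirect call is left out so the snippet runs on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Capture everything up to and including the first segment after /products/;
// whatever follows is matched but left out of the group.
var pattern = new Regex(@"^(?<keep>.*?/products/[^/]+)/.+$", RegexOptions.IgnoreCase);

// Returns the shortened URL, or null when the URL needs no redirect.
string GetRedirectTarget(string url)
{
    Match match = pattern.Match(url);
    return match.Success ? match.Groups["keep"].Value + "/" : null;
}

Console.WriteLine(GetRedirectTarget("www.somesite.com/products/myproducts/test1/test2"));
// www.somesite.com/products/myproducts/
```

Because the pattern requires at least one character after the kept segment's trailing slash, a URL that is already in the short form returns null, which avoids a redirect loop. In the asker's setup the non-null result would be passed to Response.RedirectPermanent.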
While matching the URLs in general is a fair bit more complex, like:
^(?:https?://)?(?:[\w]+\.)(?:\.?[\w]{2,})+$
As long as you are sure you will only be getting URLs, and in the format you want, we can do it far more simply.
Basically, let's say you have:
some text with 2 dots that ends in com
then a /products/dynamicstring/
then text
then /
then text
As a regex that is:
/\w*\.\w*\.com\/products\/dynamicstring\/\w*\/\w*/
Crude matching is done, but we still need to add a capture group that we will use to extract the part of the string we need:
/(\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/
OK, now let's leverage this regex to do the rest of the work:
Define the regex (with the capture group, and with the dots escaped so they match only literal dots):
var regex = /(\w*\.\w*\.com\/products\/)dynamicstring\/\w*\/\w*/;
Get the current URL. If you already have the URL, use it.
var currUrl = window.location.href;
Extract the capture group from the string (exec returns null when the URL does not match, so check for that):
var match = regex.exec(currUrl);
Use that to build the new URL from the old one. The capture group starts at the host name, so put the scheme back in front:
var redirectUrl = window.location.protocol + "//" + match[1] + "dynamicstring/";
Finally, we redirect with:
window.location.replace(redirectUrl);
I wrote all this straight from my head so I recommend you go over each step, look how it works, read some documentation about functions used. You might find an error as well as learn a lot.

Regular Expression Help Needed to Parse Domain Name

I have a regular expression that returns the top level domain of a URL regardless of whether it is .com, .com.au, etc. and parses out any subdomains. I need to modify it to return both the top level domain and the first subdomain. So basically if I have for the input
http://test1.hello.mydomain.com.au
it should return
hello.mydomain
Can someone help me with this? Here is what I have for grabbing just the top level domain:
(?<=^(?:(?:ht|f)tps?)?://)[^/]+?(?=(?:\.(?:[a-z]{2,3}?\.[a-z]{2}|[a-z]{2,3}))(?:/|$))
This is not a problem that can be solved using regular expressions alone. You are looking for the Public Suffix List, which contains program-readable information about how to split up domain names in the way you describe.
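A minimal sketch of how a suffix list would be used for the example above. The three-entry suffixes set is a hypothetical excerpt standing in for the real Public Suffix List, and LabelsBeforeSuffix is an illustrative helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical excerpt; the real Public Suffix List has thousands of entries.
var suffixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "com", "au", "com.au" };

// Returns up to labelCount labels immediately left of the longest known suffix,
// or null when no suffix in the set matches.
string LabelsBeforeSuffix(string url, int labelCount)
{
    string[] labels = new Uri(url).Host.Split('.');
    for (int i = 1; i < labels.Length; i++)
    {
        if (suffixes.Contains(string.Join(".", labels, i, labels.Length - i)))
        {
            int start = Math.Max(0, i - labelCount);
            return string.Join(".", labels, start, i - start);
        }
    }
    return null;
}

Console.WriteLine(LabelsBeforeSuffix("http://test1.hello.mydomain.com.au", 2)); // hello.mydomain
```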

Need C# regexp for URL validation

How to validate by a single regular expression the urls:
http://83.222.4.42:8880/listen.pls
http://www.my_site.com/listen.pls
http://www.my.site.com/listen.pls
to be true?
I see that I didn't formulate the question precisely :(, sorry, my mistake. The idea is that I want to use a regexp to validate URLs, whether the host is an external IP address or a domain name. Other valid URLs to consider:
http://93.122.34.342/
http://193.122.34.342/abc/1.html
http://www.my_site.com/listen2.pls
http://www.my.site.com/listen.php
and so on.
The road to hell is paved with string parsing.
URL parsing in particular is the source of many, many exploited security issues. Don't do it.
For example, do you want this to match?
Note the uppercase scheme section. Remember that some parts of a URL are case sensitive, and some are not. Then there's encoding rules. Etc.
Start by using System.Uri to parse the URLs you provide:
var uri = new Uri("http://83.222.4.42:8880/listen.pls");
Then you can write things like:
if (uri.Scheme == "http" &&
    uri.Host == "83.222.4.42" &&
    uri.AbsolutePath == "/listen.pls")
{
    // ...
}
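For validating arbitrary input rather than inspecting one known URL, a sketch along the same lines could use Uri.TryCreate, which reports failure instead of throwing. IsValidHttpUrl is a hypothetical helper, and which schemes and host types count as valid is an assumption to adjust.

```csharp
using System;

// Accepts absolute http/https URLs whose host is a DNS name or an IPv4 literal.
bool IsValidHttpUrl(string candidate)
{
    return Uri.TryCreate(candidate, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
        && (uri.HostNameType == UriHostNameType.Dns || uri.HostNameType == UriHostNameType.IPv4);
}

Console.WriteLine(IsValidHttpUrl("http://83.222.4.42:8880/listen.pls")); // True
Console.WriteLine(IsValidHttpUrl("http://www.my.site.com/listen.pls"));  // True
Console.WriteLine(IsValidHttpUrl("listen.pls"));                         // False
```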
^http://.+/listen\.pls$
If there are strictly only 3 of them don't bother with a regular expression because there is not necessarily a good pattern match when everything is already strictly known - in fact you might accidentally match more than these three urls - which becomes a problem if the urls are intended for security purposes or something equally important. Instead, test the three cases directly - maybe put them in a configuration file.
In the future if you want to add more URLs to the list you'll likely end up with an overly complicated regular expression that's increasingly hard to maintain and takes the place of a simpler check against a small list.
You won't necessarily get speed gains by running Regex to find these three strings - in fact it might be quite expensive.
Note: If you want URI regular expressions, also try websites hosting libraries like Regex Library - there are many to pick and choose from if your needs change.
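Testing the known URLs directly might look like the sketch below. The allowedUrls set is hypothetical and hard-coded here; as suggested above, it could equally be loaded from a configuration file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical allow-list; in practice it might come from a configuration file.
var allowedUrls = new HashSet<Uri>
{
    new Uri("http://83.222.4.42:8880/listen.pls"),
    new Uri("http://www.my.site.com/listen.pls")
};

// True only when the input parses as an absolute URI that is in the list.
bool IsAllowed(string candidate)
{
    return Uri.TryCreate(candidate, UriKind.Absolute, out var uri) && allowedUrls.Contains(uri);
}

Console.WriteLine(IsAllowed("http://www.my.site.com/listen.pls")); // True
Console.WriteLine(IsAllowed("http://www.my.site.com/other.pls"));  // False
```

Storing parsed Uri values rather than raw strings leaves the comparison rules to System.Uri instead of hand-written string matching.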
/^http:\/\/[-_a-zA-Z0-9.]+(:\d+)?\/listen\.pls$/
Do you mean any URL ending with /listen.pls? In that case try this:
^http://[^/]+/listen\.pls$
or if the protocol identifier must be optional:
^(?:http://)?[^/]+/listen\.pls$
Anyway take a look here, maybe it is useful for you: Url and Email validation using Regex
A modified version based upon Jay Bazuzi's solution above, since I can't post code in a comment. It checks against blacklisted extensions (I do this only for demonstration purposes; you should strongly consider building a whitelist rather than a blacklist):
string myurl = "http://www.my_site.com/listen.pls";
Uri myUri = new Uri(myurl);
string[] invalidExtensions = {
    ".pls",
    ".abc"
};
// Compare case-insensitively so that ".PLS" is caught as well.
string extension = System.IO.Path.GetExtension(myUri.AbsolutePath);
foreach (string invalidExtension in invalidExtensions) {
    if (string.Equals(invalidExtension, extension, StringComparison.OrdinalIgnoreCase)) {
        // Logic here
    }
}

Regular expression to define format of backup filenames

In the application I am currently working on, I have an option to create automatic backups of a certain file on the hard disk. What I would like to do is offer the user the possibility to configure the name of the file and its extension.
For example, the backup filename could be something like : "backup_month_year_username.bak". I had the idea to save the format in the form of a regular expression. For the example above, the regexp would look like :
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w).(?<extension>bak)$"
I thought about using regex because I will also have to browse through the directory of backed-up files to delete those older than a certain date. The main trouble I have now is how to create a filename using the regex. In a way I should replace the tags with the information. I could do that using Regex.Replace and another regex, but I feel it's a bit weird doing that and there might be a better way.
Thanks
[Edit] Maybe I wasn't really clear in the first go, but the idea is of course that the user (in this case an admin that will know regex syntax) will have the possibility to modify the form of the filename, that's all the idea behind it[/Edit]
... and if the regex changes, it is next to impossible to reconstruct a string from a given regex.
Edit:
Create some predefined "place-holders": %u could be the user's name, %y could be the year, etc.:
backup_%m_%y_%u.bak
and then simply replace each %? with its actual value.
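A minimal sketch of that substitution, assuming two-digit month and year as in the question's pattern. BuildBackupName is an illustrative name, and the sample date and user are made up.

```csharp
using System;
using System.Globalization;

// Expands %m, %y and %u in a user-supplied template.
string BuildBackupName(string template, DateTime when, string userName)
{
    return template
        .Replace("%m", when.ToString("MM", CultureInfo.InvariantCulture))
        .Replace("%y", when.ToString("yy", CultureInfo.InvariantCulture))
        .Replace("%u", userName); // substituted last so the name itself is never re-expanded
}

Console.WriteLine(BuildBackupName("backup_%m_%y_%u.bak", new DateTime(2009, 4, 17), "alice"));
// backup_04_09_alice.bak
```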
It sounds like you're trying to use the regular expression to create the file name from a pattern which the user should be able to specify.
Regular expressions can - AFAIK - not be used to create output, but only to validate input, so you'd have the user specify two things:
a file name production pattern like Bart suggested
a validation pattern in form of a regular expression that helps you split the file names into their parts
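The second item could look like the sketch below, which pulls the date parts back out of a file name so old backups can be found. It uses a corrected form of the question's pattern (\w+ and an escaped dot). TryGetBackupMonth is a hypothetical helper, and reading two-digit years as 20yy is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$");

// Returns true and the first day of the backup's month when the name matches.
bool TryGetBackupMonth(string fileName, out DateTime month)
{
    month = default;
    Match m = pattern.Match(fileName);
    if (!m.Success
        || !int.TryParse(m.Groups["Month"].Value, out int mm)
        || !int.TryParse(m.Groups["Year"].Value, out int yy)
        || mm < 1 || mm > 12)
    {
        return false;
    }
    month = new DateTime(2000 + yy, mm, 1); // assumption: two-digit years mean 20yy
    return true;
}

if (TryGetBackupMonth("backup_04_09_alice.bak", out DateTime stamp))
{
    // A backup is "old" when its month lies before some cutoff date.
    Console.WriteLine(stamp < new DateTime(2010, 1, 1)); // True
}
```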
EDIT
By the way, your sample regex contains an error: the "." is used for "any character", and \w only matches one word character, so I guess you meant to write
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
If the filename is always in this form, there is no reason for a regex, as it's easier to process with string.Split ...
With Bart's solution it is easy enough to split (using string.Split) the generated file name using underscore as the delimiter, to get back the information.
Ok, I think I have found a way to use only the regex. As I am using groups to get the information, I will use another regular expression to match the regular expression and replace the groups with the value:
// Verbatim strings (@"...") keep the regex backslashes intact.
Regex rgx = new Regex(@"\(\?<Month>.+?\)");
string fileName = rgx.Replace(@"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$",
    DateTime.Now.Month.ToString());
Ok, it's really a hack, but at least it works and I have only one pattern defined by the user. It might not work if the regex is too complex, but I think I can deal with that problem.
What do you think?

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification. The reason I need this is to allow me to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regex's. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages. For example:
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code (e.g. use http://www.codeplex.com/NetMassDownloader to download the .NET source).
Change the code to have a read-only property with the failure index.
Make sure your code uses that Regex rather than Microsoft's.
I guess such an index would only have meaning in some simple cases, like in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", which index are you talking about?
It will depend on the algorithm used for matching...
It could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
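One workaround that stays within the stock Regex class, sketched under the assumption that the syntax can be written as a fixed sequence of sub-patterns (as with the token sequence in the question): nest each later part inside an optional group, so the engine matches as many leading parts as it can, then read off where it stopped. FailureIndex is a hypothetical helper, not a framework API. Trailing input is not checked, and with heavy backtracking the reported position is only the end of the prefix the engine settled on.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Returns -1 when every part matches in order, otherwise the input index
// at which the first unmatched part was expected to start.
int FailureIndex(string input, string[] parts)
{
    if (parts.Length == 0) return -1;

    // Builds e.g. ^(?<p0>a)(?:(?<p1>b)(?:(?<p2>c))?)? for the parts a, b, c.
    var sb = new StringBuilder("^");
    for (int i = 0; i < parts.Length; i++)
    {
        if (i > 0) sb.Append("(?:");
        sb.Append("(?<p").Append(i).Append('>').Append(parts[i]).Append(')');
    }
    for (int i = 1; i < parts.Length; i++) sb.Append(")?");

    Match m = Regex.Match(input, sb.ToString());
    if (!m.Success) return 0;                                  // the very first part failed
    if (m.Groups["p" + (parts.Length - 1)].Success) return -1; // the last part matched
    return m.Length;                                           // end of the matched prefix
}

Console.WriteLine(FailureIndex("abd", new[] { "a", "b", "c" }));                     // 2
Console.WriteLine(FailureIndex("INP :x:", new[] { "[A-Z]{3}", @"\s+", @":\w+:" })); // -1
Console.WriteLine(FailureIndex("INP x", new[] { "[A-Z]{3}", @"\s+", @":\w+:" }));   // 4
```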
It is not possible to tell where a regex fails, so you need to take a different approach: compare strings. Use a regex to remove all the things that could vary, and compare the result with the string that you know does not change.
I ran into the same problem, came across your answer, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
hope it helps
