Regex: find a string within a URL, case-insensitive - C#

I have the URL:
http://primarydomain.com/sites/secondarydomain/?foo=bar
What regex could I use to match sites/secondarydomain in this URL, case-insensitively? (This is for a rule in a web.config file, but it requires standard regex.)
To put it into context, I am writing a web.config URL rewrite rule to remove sites/secondarydomain from all URLs (because multiple sites are hosted on the same package).
<rule name="Remove full hosting path">
<match url="***Regex goes here***" ignoreCase="true"/>
<action type="Redirect" url="http://secondary.com/{R:1}" redirectType="Permanent" />
</rule>
I am looking to match only the directories (not the query string) in order to redirect the user (hence removing the sites/secondarydomain).
Update: it looks like I want to rewrite the URL, not redirect it. Here is the current web.config rule, which doesn't quite work:
<rule name="TestRule">
<match url=".*" />
<conditions>
<add input="{PATH_INFO}" pattern="^(/hostedsites/clemones_htdocs)(/.*)"/>
</conditions>
<action type="Rewrite" url="\{C:2}" appendQueryString="true" />
</rule>
Where my secondary domain is http://clemones.com/
and the path I'm trying to get rid of: http://clemones.com/hostedsites/clemones_htdocs/
For testing, http://clemones.com/shizzle works as a destination (but, sadly, so does http://clemones.com/hostedsites/clemones_htdocs/shizzle).
Thanks in advance

Have you tried the following? It applies the regex only to the path, not to the root URL:
<rule name="TestRule">
<match url=".*" />
<conditions>
<add input="{PATH_INFO}" pattern="^(/sites/secondarydomain)(/.*)"/>
</conditions>
<action type="Rewrite" url="\{C:2}" appendQueryString="true" />
</rule>
The condition produces multiple groups: {C:1} holds "/sites/secondarydomain", and {C:2} holds everything after it (starting with the slash). The query string is not part of {PATH_INFO}; it is appended back because of appendQueryString="true".
This lets you break out the parts you want to act on, so yes, it is different from applying a regular expression to the entire URL.
Here is an article that explains how this works:
http://weblogs.asp.net/owscott/archive/2010/01/26/iis-url-rewrite-hosting-multiple-domains-under-one-site.aspx
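To make the group numbering concrete, here is a minimal TypeScript sketch of what the rule does. JavaScript's RegExp stands in for IIS's default ECMAScript pattern syntax, and rewritePath is a hypothetical name. The condition runs on the path only, {C:2} keeps its leading slash, and the query string is re-attached separately.

```typescript
// Hypothetical helper that mimics the rule: the condition runs against
// {PATH_INFO} (path only, no query string), and the action rewrites to
// {C:2} with the original query string appended (appendQueryString="true").
function rewritePath(pathInfo: string, queryString: string): string | null {
  // Same pattern as the condition; "i" mirrors IIS's case-insensitive default.
  const condition = new RegExp("^(/sites/secondarydomain)(/.*)", "i");
  const m = condition.exec(pathInfo);
  if (m === null) {
    return null; // condition not met, so the rule does not fire
  }
  const c2 = m[2]; // {C:2}: everything after {C:1}, leading slash included
  return queryString.length > 0 ? c2 + "?" + queryString : c2;
}

console.log(rewritePath("/sites/secondarydomain/page.aspx", "foo=bar")); // "/page.aspx?foo=bar"
console.log(rewritePath("/other/page.aspx", "foo=bar"));                  // null
```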

Try a lookbehind:
(?<=(http://primarydomain.com/))[^\b]*
EDIT:
If you want to exclude the query string:
(?<=(http://primarydomain.com/))[^?]*
If you want to be stricter for whatever reason (like only allowing alphabetic characters in the directory), you can try something like this:
(?<=(http://primarydomain.com/))[a-zA-Z/]*[a-zA-Z]
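Here is a quick TypeScript check of the three patterns against the URL from the question. It uses JavaScript's RegExp, which has supported lookbehind since ES2018, as a stand-in for the ECMAScript-style syntax IIS uses by default. firstMatch is a hypothetical helper.

Two details are worth knowing. Inside a character class, \b means a backspace character rather than a word boundary, so [^\b]* simply runs to the end of the string. The unescaped dots in primarydomain.com also match any character.

Keep in mind that IIS applies <match url> to the path only. These lookbehinds therefore need an input that still contains the full URL.

```typescript
// Returns the whole first match of a pattern, or null (hypothetical helper).
function firstMatch(pattern: RegExp, input: string): string | null {
  const m = pattern.exec(input);
  return m === null ? null : m[0];
}

const url = "http://primarydomain.com/sites/secondarydomain/?foo=bar";

// 1. Everything after the domain: [^\b] is "not a backspace", i.e. almost anything.
const everything = new RegExp("(?<=(http://primarydomain.com/))[^\\b]*");
// 2. Everything up to the query string.
const noQuery = new RegExp("(?<=(http://primarydomain.com/))[^?]*");
// 3. Letters and slashes only, ending on a letter (so the trailing slash is dropped).
const lettersOnly = new RegExp("(?<=(http://primarydomain.com/))[a-zA-Z/]*[a-zA-Z]");

console.log(firstMatch(everything, url));  // "sites/secondarydomain/?foo=bar"
console.log(firstMatch(noQuery, url));     // "sites/secondarydomain/"
console.log(firstMatch(lettersOnly, url)); // "sites/secondarydomain"
```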

If the URL is always going to start with http://primarydomain.com/sites/, then I would attack it like this:
<match url="http://primarydomain.com/sites/([A-Za-z0-9_]+)/.*" />
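IIS tests <match url> against the path only (for example sites/secondarydomain/index.aspx). A pattern that begins with http://primarydomain.com/ may therefore never match there.

The TypeScript sketch below compares the answer's pattern with a path-only adaptation, ^sites/([A-Za-z0-9_]+)/.*. It uses JavaScript's RegExp as a stand-in for IIS's default ECMAScript pattern syntax. The adaptation is an assumption about what the rule would need, not part of the answer.

```typescript
// Path-only adaptation of the pattern (an assumption, not the answer's exact regex):
// IIS gives <match url> something like "sites/secondarydomain/index.aspx".
const sitesPattern = new RegExp("^sites/([A-Za-z0-9_]+)/.*", "i");
// The pattern as written in the answer, which expects the scheme and host.
const fullUrlPattern = new RegExp("http://primarydomain.com/sites/([A-Za-z0-9_]+)/.*", "i");

// Returns what {R:1} would hold for the path-only pattern, or null.
function siteName(path: string): string | null {
  const m = sitesPattern.exec(path);
  return m === null ? null : m[1];
}

console.log(siteName("sites/secondarydomain/index.aspx"));            // "secondarydomain"
console.log(fullUrlPattern.test("sites/secondarydomain/index.aspx")); // false
```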

A combination of lookbehind and lookahead will match the string you want:
(?<=.\w+/)\w+/\w+(?=/.*)
That being said, the {R:1} in your example really looks like a Regex backreference, so maybe that's why things aren't working as expected. If this is true, you may need something like this instead:
.\w+/(\w+/\w+)
I've never done IIS rewriting, so YMMV. Both regular expressions do work (tested) on the examples you've given so far, and on more generic URLs like:
http://primarydomain.com/hostedsites/clemones_htdocs/index.aspx?foo=bar
http://anydomain.net/sites/secondarydomain/index.aspx?foo=bar
...
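The TypeScript sketch below runs both patterns over the sample URLs. JavaScript's RegExp stands in for IIS's ECMAScript pattern syntax, and viaLookaround and viaGroup are hypothetical helper names.

The last two lines use a bare path such as sites/secondarydomain/index.aspx, which is closer to what IIS actually passes to <match url>. On that input the two patterns no longer agree with the full-URL results.

```typescript
// The answer's two patterns (lookbehind needs an ES2018+ engine).
const lookaround = new RegExp("(?<=.\\w+/)\\w+/\\w+(?=/.*)");
const withGroup = new RegExp(".\\w+/(\\w+/\\w+)");

// Whole match of the lookaround pattern, or null.
function viaLookaround(input: string): string | null {
  const m = lookaround.exec(input);
  return m === null ? null : m[0];
}

// Group 1 of the back-reference pattern, i.e. what {R:1} would hold, or null.
function viaGroup(input: string): string | null {
  const m = withGroup.exec(input);
  return m === null ? null : m[1];
}

const samples: string[] = [
  "http://primarydomain.com/sites/secondarydomain/?foo=bar",
  "http://primarydomain.com/hostedsites/clemones_htdocs/index.aspx?foo=bar",
  "http://anydomain.net/sites/secondarydomain/index.aspx?foo=bar",
];
for (const s of samples) {
  console.log(viaLookaround(s), viaGroup(s)); // both give the two directories
}

// A bare path, closer to what IIS passes to <match url>:
console.log(viaLookaround("sites/secondarydomain/index.aspx")); // null
console.log(viaGroup("sites/secondarydomain/index.aspx"));      // "secondarydomain/index"
```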

Related

IIS redirect rule with regex inside condition

I get a 500 error with this rule:
<rule name="Remove Query String" stopProcessing="true">
<match url="(.*)" />
<conditions>
<add input="{QUERY_STRING}" pattern="^url=[^&]+" />
</conditions>
<action type="Redirect" url="{C:1}webp" appendQueryString="false" />
</rule>
The problem is inside
<add input="{QUERY_STRING}" pattern="^url=[^&]+" />
The pattern seems to be wrong, yet it works correctly when I check it online; every regex tester I tried accepts it.
What I want to achieve is to redirect every URL whose query string contains url= to the value of that url parameter from the original request. Everything after & should be ignored, or everything after some other marker (such as webp). That is why I want to split the pattern into multiple logical groups.
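Two things in the rule look suspicious.

First, web.config is XML, and a bare & is not allowed inside an attribute value. In the file, the pattern has to be written as ^url=[^&amp;]+. An unescaped & makes the configuration itself malformed, which is one plausible source of the 500 error even though online regex testers accept the pattern.

Second, the action uses {C:1}, but the pattern defines no capture group. Wrapping the value as ^url=([^&]+) gives the condition a group 1.

The TypeScript sketch below uses JavaScript's RegExp as a stand-in for IIS's ECMAScript pattern syntax. The helper name capture and the sample query string are made up. The webp variant is only one guess at the "stop at something else" requirement.

```typescript
// Pattern from the question with the value wrapped in a capture group.
// In web.config itself, each & must be written as &amp;.
const urlParam = new RegExp("^url=([^&]+)");
// Hypothetical variant: stop at the first "&" or "webp", whichever comes first.
const urlParamUntilMarker = new RegExp("^url=(.*?)(?=&|webp|$)");

// Returns what {C:1} would hold, or null if the condition fails.
function capture(pattern: RegExp, queryString: string): string | null {
  const m = pattern.exec(queryString);
  return m === null ? null : m[1];
}

const qs = "url=https://example.com/img/photo.webp&w=200";
console.log(capture(urlParam, qs));            // "https://example.com/img/photo.webp"
console.log(capture(urlParamUntilMarker, qs)); // "https://example.com/img/photo."
console.log(capture(urlParam, "w=200&url=x")); // null: ^ anchors at the start
```

With the marker variant, an action of {C:1}webp would rebuild the full image URL.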

Redirect www to non www urls

My goal is to redirect all www.* urls to non-www urls. For example:
If the url is www.mydomain.com/users it should redirect to mydomain.com/users.
In order to achieve that I have written the following code in my web.config:
<rule name="Redirect www.* urls to non www" stopProcessing="true">
<match url="*" />
<conditions>
<add input="{HTTP_HOST}" pattern="^www$" />
</conditions>
<action type="Redirect" url="{HTTP_HOST}/{R:0}" redirectType="Permanent"/>
</rule>
but it does nothing: the www URLs are not redirected to non-www URLs.
Can you tell me what I am doing wrong?
Note that I don't want any hard-coded domain or protocol in the rule; I need a generic solution.
Well, here is the solution I came up with.
I have provided the solution with all the details, along with comments explaining the regexes, capturing groups, etc. used in the rule:
<rule name="Redirect www.* urls to non www" enabled="true">
<!--Match all urls-->
<match url="(.*)"/>
<!--We will be capturing two groups from the below conditions.
One will be domain name (foo.com) and the other will be the protocol (http|https)-->
<!--trackAllCaptures added for tracking Capture Groups across all conditions-->
<conditions trackAllCaptures="true">
<!-- Capture the host.
The first group {C:1} will be captured inside parentheses of ^www\.(.+)$ condition,
It will capture the domain name, example: foo.com. -->
<add input="{HTTP_HOST}" negate="false" pattern="^www\.(.+)$"/>
<!-- Capture the protocol.
The second group {C:2} will be captured inside parentheses of ^(.+):// condition.
It will capture protocol, i.e http or https. -->
<add input="{CACHE_URL}" pattern="^(.+)://" />
</conditions>
<!-- Redirect the url to {C:2}://{C:1}{REQUEST_URI}.
{C:2} captured group will have the protocol and
{C:1} captured group will have the domain name.
"appendQueryString" is set to false because "REQUEST_URI" already contains the original url along with the querystring.
redirectType="Permanent" is added so as to make a 301 redirect. -->
<action type="Redirect" url="{C:2}://{C:1}{REQUEST_URI}" appendQueryString="false" redirectType="Permanent"/>
</rule>
It will do the following redirects:
http://www.foo.com -> http://foo.com
https://www.foo.com -> https://foo.com
http://www.foo.com?a=1 -> http://foo.com?a=1
https://www.foo.com?a=1 -> https://foo.com?a=1
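The rule's logic can be traced with a small TypeScript sketch. JavaScript's RegExp stands in for IIS's ECMAScript pattern syntax, and the function name wwwRedirect is hypothetical. The three string arguments are stand-in values for {HTTP_HOST}, {CACHE_URL} and {REQUEST_URI}, not values read from a real server.

The last lines show one edge case of ^(.+)://. The greedy (.+) backtracks to the last "://" in its input, so if that input ever carried a second "://" (say, inside a query string), the protocol capture would swallow too much. The lazy form ^(.+?):// stops at the first one.

```typescript
// Sketch of the rule's logic. Hypothetical function; the arguments stand in
// for the {HTTP_HOST}, {CACHE_URL} and {REQUEST_URI} server variables.
function wwwRedirect(httpHost: string, cacheUrl: string, requestUri: string): string | null {
  const host = /^www\.(.+)$/i.exec(httpHost); // condition 1: {C:1} = domain
  const proto = /^(.+):\/\//.exec(cacheUrl);  // condition 2: {C:2} = protocol
  if (host === null || proto === null) {
    return null; // a condition failed, so the rule does not fire
  }
  // Action url="{C:2}://{C:1}{REQUEST_URI}" with appendQueryString="false".
  return proto[1] + "://" + host[1] + requestUri;
}

console.log(wwwRedirect("www.foo.com", "https://www.foo.com/?a=1", "/?a=1")); // "https://foo.com/?a=1"
console.log(wwwRedirect("foo.com", "http://foo.com/", "/"));                  // null

// Greedy (.+) backtracks to the LAST "://"; lazy (.+?) stops at the first.
const greedyProto = /^(.+):\/\//;
const lazyProto = /^(.+?):\/\//;
const tricky = "http://www.foo.com/?next=https://bar.com";
console.log(greedyProto.exec(tricky)?.[1]); // "http://www.foo.com/?next=https"
console.log(lazyProto.exec(tricky)?.[1]);   // "http"
```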

URL Rewriting - Append page name along with query string in URL

I want to add a URL rewrite rule to my web.config.
The source URL:
http://constant.com/caam/verifying/?token=kpG1TwYo2KqTS%2bKg%2fY6lVm2Gt
Need to convert it to URL:
http://constant.com/caam/verifying/default.aspx?token=kpG1TwYo2KqTS%2bKg%2fY6lVm2Gt
Any ideas on how to accomplish this, or other suggestions, would be much appreciated.
A basic redirect rule in IIS (web.config) might look like:
<rule name="Token Redirect" stopProcessing="true">
<match url="caam/verifying.*" />
<conditions trackAllCaptures="true">
<add input="{QUERY_STRING}" pattern="&amp;?(token=[^&amp;]+)&amp;?" />
<add input="{REQUEST_URI}" pattern="default.aspx" negate="true" />
</conditions>
<action type="Redirect" url="/caam/verifying/default.aspx?{C:1}" appendQueryString="false" redirectType="Found" />
</rule>
You can change the match url, but basically this matches every path containing caam/verifying.
It then also checks that the query string contains "token=" somewhere and captures the whole token=... pair. It goes into capture 1, {C:1}, since there are no other groups here.
We then output the redirect as /caam/verifying/default.aspx?{C:1} (where {C:1} is, for example, "token=12345").
Note that this rule is only hit if the URL matches (the caam/verifying part) and the parameters match (the "token=" part); otherwise the rule is skipped.
EDIT
I've added a "negate" condition so the rule does not match requests for the default.aspx page itself (otherwise the redirect would loop).
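Here is a TypeScript sketch of how the match, the two conditions and the action combine. JavaScript's RegExp stands in for IIS's ECMAScript pattern syntax, and tokenRedirect is a hypothetical name. Its arguments stand in for the path seen by <match url> (no leading slash), {QUERY_STRING} and {REQUEST_URI}.

In web.config itself, each & in the condition pattern has to be written as &amp;, because a bare & is not allowed in an XML attribute value.

```typescript
// Sketch of the "Token Redirect" rule. Hypothetical function; the arguments
// stand in for the path seen by <match url>, {QUERY_STRING} and {REQUEST_URI}.
function tokenRedirect(path: string, queryString: string, requestUri: string): string | null {
  // <match url="caam/verifying.*" /> is not anchored, so it matches anywhere in the path.
  if (!/caam\/verifying.*/i.test(path)) {
    return null;
  }
  // Condition 1: capture the whole "token=..." pair as {C:1}.
  const token = /&?(token=[^&]+)&?/i.exec(queryString);
  if (token === null) {
    return null;
  }
  // Condition 2 (negate="true"): skip requests already aimed at default.aspx,
  // which would otherwise redirect to themselves forever.
  if (/default.aspx/i.test(requestUri)) {
    return null;
  }
  // Action with appendQueryString="false": only {C:1} is carried over.
  return "/caam/verifying/default.aspx?" + token[1];
}

const query = "token=kpG1TwYo2KqTS%2bKg%2fY6lVm2Gt";
console.log(tokenRedirect("caam/verifying/", query, "/caam/verifying/?" + query));
// "/caam/verifying/default.aspx?token=kpG1TwYo2KqTS%2bKg%2fY6lVm2Gt"
console.log(tokenRedirect("caam/verifying/default.aspx", query, "/caam/verifying/default.aspx?" + query));
// null
```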

Regex validates in tests but not in IIS

I have the following Match expression:
((?:[a-z0-9\-]*\.){1,}[a-z0-9\-]*)/training/([A-Za-z0-9]+)/$
Which works for:
http://training.dev.local/training/xxxxxxx/
However, when the rewrite rule is added to web.config, it is not recognised in a C# web application.
<rule name="Train redirect" stopProcessing="true">
<match url="((?:[a-z0-9\-]*\.){1,}[a-z0-9\-]*)/training/([A-Za-z0-9]+)/$" ignoreCase="true" />
<action type="Rewrite" url="train-redirect/?code={R:2}" />
</rule>
I'm using regex101 to test: https://regex101.com/r/sL2nA6/3
Looking at Creating Rewrite Rules for the URL Rewrite Module tutorial, it is shown that only the path in the URL is matched against the regular expression. Therefore, your regex can be written as:
^training/([A-Za-z0-9]+)/$
(ignoreCase="true" is unnecessary with the above regex: it is already the default for <match>, and [A-Za-z0-9] covers both cases anyway)
and the rewrite action should be changed accordingly to:
<action type="Rewrite" url="train-redirect/?code={R:1}" />
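To see why the original pattern fails in IIS, here is a TypeScript sketch. JavaScript's RegExp stands in for IIS's ECMAScript pattern syntax, and the variable names are illustrative.

regex101 was given the whole URL, while IIS hands the rule only the path, training/xxxxxxx/. That path contains no host, and therefore no dot for the first group to match.

```typescript
// The question's pattern; "i" mirrors ignoreCase="true".
const hostPattern = new RegExp("((?:[a-z0-9\\-]*\\.){1,}[a-z0-9\\-]*)/training/([A-Za-z0-9]+)/$", "i");
// The answer's path-only pattern.
const pathPattern = new RegExp("^training/([A-Za-z0-9]+)/$");

const fullUrl = "http://training.dev.local/training/xxxxxxx/"; // what regex101 was given
const pathInput = "training/xxxxxxx/";                         // what IIS passes to <match url>

console.log(hostPattern.test(fullUrl));   // true
console.log(hostPattern.test(pathInput)); // false: no "host." segment to match
const trainingMatch = pathPattern.exec(pathInput);
console.log(trainingMatch === null ? null : trainingMatch[1]); // "xxxxxxx", i.e. {R:1}
```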

Reading capture groups from regexes that are also URL rewrite rules: possible?

Given a URL rewrite rule like the following:
<rule name="RewriteUserFriendlyThings" stopProcessing="true">
<match url="^cat/sub-cat/(\d+)/([^/]+)/?$" />
<conditions>
<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
</conditions>
<action type="Rewrite" url="cat/sub-cat/detail.aspx?id={R:2}" />
</rule>
In C# code I need to read out the value in the second group of the pattern (which is my ID) so that my Bookmark button (don't ask) works with dynamic pages like this. I'm using a CMS that does its work at publish time, and we overlooked bookmarking for dynamic content.
So what I've done is load the web.config as XML and match the current URL against the url attribute of the match element. However, I can't figure out how to get at the group. Bear in mind that this needs to be generic: in another rule, the ID could be in the first or the third group.
I have a white list of rules which I do this against.
I tried using named capture groups, such as (?<id>\d+), but web.config doesn't allow them.
The way I've got around this is using the Server Variables feature of the IIS 7 URL Rewriting module.
I defined a new allowed server variable and then captured the value I needed into it using the IIS URL Rewrite syntax {R:2}. I could then access the value through the HttpContext.Request.ServerVariables collection.
Using named groups seems like a reasonably good way to keep the rules flexible. I don't currently have access to IIS to test rewrite rules, but you might want to try the alternate syntax for named capture groups; maybe that will get through.
Try this alternate pattern instead, which uses single quotes in place of the angle brackets:
(?'id'\d+)
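One plausible reason web.config refused (?<id>\d+) is that a literal < is not allowed inside an XML attribute value. In the file it would have to be escaped, as in (?&lt;id>\d+). The (?'id'\d+) form sidesteps the < altogether; whether the URL Rewrite module's engine accepts either spelling is untested here.

On the C# side, .NET's Regex supports both spellings, and the value can be read by name with match.Groups["id"].Value. The lookup then no longer depends on the group's position.

Below is a TypeScript sketch of that name-based lookup. JavaScript only understands the (?<id>...) spelling, and the whitelist and findId are hypothetical.

```typescript
// Loosens the exec() result type so .groups can be read even when the
// compiler's lib setting predates ES2018 named groups.
type WithGroups = RegExpExecArray & { groups?: { [key: string]: string | undefined } };

// Hypothetical whitelist: every rule names its ID group "id",
// wherever that group happens to sit in the pattern.
const whitelistedPatterns: string[] = [
  "^cat/sub-cat/(\\d+)/(?<id>[^/]+)/?$",
  "^(?<id>\\d+)/products/([^/]+)/?$",
];

// Returns the "id" group of the first whitelisted pattern that matches, or null.
function findId(path: string): string | null {
  for (const source of whitelistedPatterns) {
    const m = new RegExp(source, "i").exec(path) as WithGroups | null;
    const id = m === null || m.groups === undefined ? undefined : m.groups["id"];
    if (id !== undefined) {
      return id;
    }
  }
  return null;
}

console.log(findId("cat/sub-cat/42/some-title")); // "some-title"
console.log(findId("17/products/widget/"));       // "17"
console.log(findId("about/"));                    // null
```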
