I have been tracking down a bug on a Url Rewriting application. The bug showed up as an encoding problem on some diacritic characters in the querystring.
Basically, the problem was that a request which was basically /search.aspx?search=heřmánek was getting rewritten with a querystring of "search=he%c5%99m%c3%a1nek"
The correct value (using some different, working code) was a rewrite of the querystring as "search=he%u0159m%u00e1nek"
Note the difference between the two strings. However, if you post both you'll see that the Url Encoding reproduces the same string. It's not until you use the context.Rewrite function that the encoding breaks. The broken string returns 'heÅmánek' (using Request.QueryString["Search"] and the working string returns 'heřmánek'. This change happens after the call to the rewrite function.
I traced this down to one set of code using Request.QueryString (working) and the other using Request.Url.Query (request.Url returns a Uri instance).
While I have worked out the bug there is a hole in my understanding here, so if anyone knows the difference, I'm ready for the lesson.
Your question really sparked my interest, so I've done some reading for the past hour or so. I'm not absolutely positive I've found the answer, but I'll throw it out there to see what you think.
From what I've read so far, Request.QueryString is actually "a parsed version of the QUERY_STRING variable in the ServerVariables collection" [reference] , where as Request.Url is (as you stated) the raw URL encapsulated in the Uri object. According to this article, the Uri class' constructor "...parses the [url string], puts it in canonical format, and makes any required escape encodings."
Therefore, it appears that Request.QueryString uses a different function to parse the "QUERY_STRING" variable from the ServerVariables constructor. This would explain why you see the difference between the two. Now, why different encoding methods are used by the custom parsing function and the Uri object's parsing function is entirely beyond me. Maybe somebody a bit more versed on the aspnet_isapi DLL could provide some answers with that question.
Anyway, hopefully my post makes sense. On a side note, I'd like to add another reference which also provided for some very thorough and interesting reading: http://download.microsoft.com/download/6/c/a/6ca715c5-2095-4eec-a56f-a5ee904a1387/Ch-12_HTTP_Request_Context.pdf
What you indicated as the "broken" encoded string is actually the correct encoding according to standards. The one that you indicated as "correct" encoding is using a non-standard extension to the specifications to allow a format of %uXXXX (I believe it's supposed to indicate UTF-16 encoding).
In any case, the "broken" encoded string is ok. You can use the following code to test that:
Uri uri = new Uri("http://www.example.com/test.aspx?search=heřmánek");
Console.WriteLine(uri.Query);
Console.WriteLine(HttpUtility.UrlDecode(uri.Query));
Works fine. However... on a hunch, I tried UrlDecode with a Latin-1 codepage specified, instead of the default UTF-8:
Console.WriteLine(HttpUtility.UrlDecode(uri.Query,
Encoding.GetEncoding("iso-8859-1")));
... and I got the bad value you specified, 'heÅmánek'. In other words, it looks like the call to HttpContext.RewritePath() somehow changes the urlencoding/decoding to use the Latin-1 codepage, rather than UTF-8, which is the default encoding used by the UrlEncode/Decode methods.
This looks like a bug if you ask me. You can look at the RewritePath() code in reflector and see that it is definitely playing with the querystring - passing it around to all kinds of virtual path functions, and out to some unmanaged IIS code.
I wonder if somewhere along the way, the Uri at the core of the Request object gets manipulated with the wrong codepage? That would explain why Request.Querystring (which is simply the raw values from the HTTP headers) would be correct, while the Uri using the wrong encoding for the diacriticals would be incorrect.
I have done a bit of research over the past day or so and I think I have some information on this.
When you use Request.Querystring or HttpUtility.UrlDecode (or Encode) it is using the Encoding that is specified in the element (specifically the requestEncoding attribute) of the web.config (or the .config hierarchy if you haven't specified) ---NOT the Encoding.Default which is the default encoding for your server.
When you have the encoding set to UTF-8, a single unicode character can be encoded as 2 %xx hex values. It will also be decoded that way when given the whole value.
If you are UrlDecoding with a different Encoding than the url was encoded with, you will get a different result.
Since HttpUtility.UrlEncode and UrlDecode can take an encoding parameter, its tempting to try to encode using an ANSI codepage, but UTF-8 is the right way to go if you have the browser support (apparently old versions don't support UTF-8). You just need to make sure that the is properly set and both sides will work fine.
UTF-8 Seems to be the default encoding: (from .net reflector System.Web.HttpRequest)
internal Encoding QueryStringEncoding
{
get
{
Encoding contentEncoding = this.ContentEncoding;
if (!contentEncoding.Equals(Encoding.Unicode))
{
return contentEncoding;
}
return Encoding.UTF8;
}
}
Following the path to find out the this.ContentEncoding leads you to (also in HttpRequest)
public Encoding ContentEncoding
{
get
{
if (!this._flags[0x20] || (this._encoding == null))
{
this._encoding = this.GetEncodingFromHeaders();
if (this._encoding == null)
{
GlobalizationSection globalization = RuntimeConfig.GetLKGConfig(this._context).Globalization;
this._encoding = globalization.RequestEncoding;
}
this._flags.Set(0x20);
}
return this._encoding;
}
set
{
this._encoding = value;
this._flags.Set(0x20);
}
}
To answer your specific question on the difference betwen Request.Url.Quer and Request.QueryString... here is how HttpRequest builds its Url Property:
public Uri Url
{
get
{
if ((this._url == null) && (this._wr != null))
{
string queryStringText = this.QueryStringText;
if (!string.IsNullOrEmpty(queryStringText))
{
queryStringText = "?" + HttpEncoder.CollapsePercentUFromStringInternal(queryStringText, this.QueryStringEncoding);
}
if (AppSettings.UseHostHeaderForRequestUrl)
{
string knownRequestHeader = this._wr.GetKnownRequestHeader(0x1c);
try
{
if (!string.IsNullOrEmpty(knownRequestHeader))
{
this._url = new Uri(this._wr.GetProtocol() + "://" + knownRequestHeader + this.Path + queryStringText);
}
}
catch (UriFormatException)
{
}
}
if (this._url == null)
{
string serverName = this._wr.GetServerName();
if ((serverName.IndexOf(':') >= 0) && (serverName[0] != '['))
{
serverName = "[" + serverName + "]";
}
this._url = new Uri(this._wr.GetProtocol() + "://" + serverName + ":" + this._wr.GetLocalPortAsString() + this.Path + queryStringText);
}
}
return this._url;
}
}
You can see it is using the HttpEncoder class to do the decoding, but it uses the same QueryStringEncoding value.
Since I am already posting a lot of code here and anyone can get .NET Reflector, I'm going to snippet up the rest. The QueryString property comes from the HttpValueCollection which uses FillFromEncodedBytes method to eventually call HttpUtility.UrlDecode (with the QueryStringEncoding value set above), which eventually calls the HttpEncoder to decode it.
They do seem to use different methodology to decode the actual bytes of the querystring, but the encoding they use to do it seems to be the same.
It is interesting to me that the HttpEncoder has so many functions that seem to do the same thing, so its possible there are differences in those methods which can cause an issue.
Related
In my ASP.NET WebForm application I have simple rule:
routes.MapPageRoute("RouteSearchSimple", "search/{SearchText}", "~/SearchTicket.aspx");
As "SearchText" param I need to use cyrillic words, so to create Url I use:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
GetRouteUrl automatically encode searchText value and as a result
url = /search/%D1%82%D0%B5%D0%BA%D1%81%D1%82
but I need -> /search/текст
How is it possible to get it by Page.GetRouteUrl function.
Thanks a lot!
Actually, I believe Alexei Levenkov is close to the answer. Ultimately, a URL may only contain ASCII characters, so anything beyond alphanumeric characters will be URL encoded (even things like spaces).
Now, to your point, there are browsers out there that will display non-ASCII characters, but that is up to the implementation of the browser (behind the scenes, it is still performing the encoding). GetRouteUrl, however, will return the ASCII-encoded form every time because that is a requirement for URLs.
(As an aside, that "some 8 year old document" defines URLs. It's written by Tim Berners Lee. He had a bit of an impact on the Internet.)
Update
And because you got me interested, I did a bit more research. It looks as though Internationalized Domain Names do exist. However, from what I understand from the article, underneath the covers, ToASCII or ToUnicode are applied to the names. More can be read in this spec: RFC 3490. So, again, you're still at the same point. More discussion can be found at this Stackoverflow question.
Ok, guys, thank you for replies, it helps much. Simple answer is: it's impossible to do that by Page.GetRouteUrl() function. It's very strange why it hasn't beed developed in way to rely Encoding/Decoding params on developers like we have it in Request.Params or .QueryString, or at least it would be some alternate routing function where developers could control that.
One way I found is getting Url from RouteTable and parse it manually, in my case it would be like:
string url = (System.Web.Routing.RouteTable.Routes["RouteSearchSimple"] as System.Web.Routing.Route).Url.Replace("{SearchText}", "текст");
or simplest way is just creating url via string concatenation:
string param = "текст";
string url = "/search/" + param;
what I already did, but in that case you will need change the code in all places where it appears if you change your route url, therefore better create some static function like GetSearchUrl(string searchText) in one place.
And it works like a charm, Url's looks human readable and I can read params via RouteData.Values
The most simple solution is to decode with UrlDecode method:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
string decodedUrl = Server.UrlDecode(url); // => /search/текст
Assume the following Url:
"http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents & Functions + Properties.docx&Save=true"
I use HttpUtility.UrlEncode() to encode the value of the Filename parameter and I create the following Url:
"http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents%20%26%20Functions%20%2B%20Properties.docx&Save=true"
I send the following (encoded version) of request from a client to a C# Web Application. On the server when I process the request I have a problem. The HttpRequest variable contains the query string partially decoded. That is to say when I try to use or quick watch the following properties of HttpRequest they have the following values.
Property = Value
================
HttpRequest.QueryString = "{Library=Testing&Filename=Documents+&+Functions+++Properties.docx&Save=true}"
HttpRequest.Url = "{http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents & Functions + Properties.docx&Save=true}"
HttpRequest.Url.AbsoluteUri = "http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents%20&%20Functions%20+%20Properties.docx&Save=true"
I have also checked the following properties but all of them have the & value decoded. However all other values remain properly encoded (e.g. space is %20).
HttpRequest.Url.OriginalString
HttpRequest.Url.Query
HttpRequest.Url.PathAndQuery
HttpRequest.RawUrl
There is no way I can read the value of the parameter Filename properly. Am I missing something?
The QueryString property returns a NameValueCollection object that maps the querystring keys to fully-decoded values.
You need to write Request.QueryString["FileName"].
I'm answering this question many years later because I just had this problem and figured out the solution. The problem is that HttpRequest.Url isn't really the value that you gave. HttpRequest.Url is a Uri class and that value is the ToString() value of that class. ToString() for the Uri class decodes the Url. Instead, what you want to use is HttpRequest.Url.OriginalString. That is the encoded version of the URL that you are looking for. Hope this helps some future person having this problem.
What happens when you don't use UrlEncode? You didn't show how exactly you are using the url that you created using UrlEncode, so it is quite possible that things are just being double encoded (lots of the framework will encode the URLs for you automatically).
FWIW I ran into this same problem with RavenDB (version 960). They implement their own HttpRequest object that behaves just like this -- it first decodes just the ampersands (from %26 to &) and then decodes the entire value. I believe this is a bug.
A couple of workarounds to this problem:
Implement your own query string parsing on the server. It's not fun but it is effective.
Double-encode ampersands. First encode just the ampersands in the string, then encode the entire string. (It's an easy solution but not extensible because it puts the burden on the client.)
I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(encryptedPassword, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
URI uri = new URI("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.
First lets start with the UTF8 business. Esentially in this case there isn't any. When a string contains characters with in the standard ASCII character range (as your password does) a UTF8 encoding of that string is identical to a single byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
Its encoded the # and " however only # has special meaning in a URL. Hence when you view string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a url must be properly encoded thats part of its being valid Url. Hence the webserver can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%3723" where %37 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have check the documentation I guess. BTW use Request.QueryString["Password"] the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference but its better for clarity since its easier to make the distinction between QueryString values and Form values.
if I don't use UFT8 the characters are being filtered out.
Aare you sure that non-ASCII characters may be present in the password? Can you provide an example you current example does not need encoding with UTF-8?
If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also the # syntax is username:password#hostname, but it has been disabled in IE7 and above IIRC.
I'm working on a quick captcha generator for a simple site I'm putting together, and I'm hoping to pass an encrypted key in the url of the page. I could probably do this as a query string parameter easy enough, but I'm hoping not too (just because nothing else runs off the query string)...
My encryption code produces a byte[], which is then transformed using Convert.ToBase64String(byte[]) into a string. This string, however, is still not quite url friendly, as it can contain things like '/' and '='. Does anyone know of a better function in the .NET framework to convert a byte array to a url friendly string?
I know all about System.Web.HttpUtility.UrlEncode() and its equivalents, however, they only work properly with query string parameters. If I url encode an '=' inside of the path, my web server brings back a 400 Bad Request error.
Anyways, not a critical issue, but hoping someone can give me a nice solution
**EDIT: Just to be absolutely sure exactly what I'm doing with the string, I figured I would supply a little more information.
The byte[] that results from my encryption algorithm should be fed through some sort of algorithm to make it into a url friendly string. After this, it becomes the content of an XElement, which is then used as the source document for an XSLT transformation, and is used as a part of the href attribute for an anchor. I don't believe the xslt transformation is causing the issues, since what is coming through on the path appears to be an encoded query string parameter, but causes the HTTP 400
I've also tried HttpUtility.UrlPathEncode() on a base64 string, but that doesn't seem to do the trick either (I still end up with '/'s in my url)**
You're looking for HttpServerUtility.UrlTokenEncode and HttpServerUtility.UrlTokenDecode, in System.Web.
They encode in base64, replacing the potentially dangerous '+' and '/' chars with '-' and '_' instead.
MSDN documentation
For ASP.NET Core 6.0+ use Microsoft.AspNetCore.WebUtilities.WebEncoders:
byte[] bytes = RandomNumberGenerator.GetBytes(64);
string encoded = WebEncoders.Base64UrlEncode(bytes);
byte[] decoded = WebEncoders.Base64UrlDecode(encoded);
Have a look at System.BitConverter.ToString(myByteArray)
Handy for one way encoding for things like hashes but as pointed out by ssg it's not very efficient. I wouldn't recommend it for large amounts of data.
HttpServerUtility.UrlTokenEncode and similar are no longer available in .NET Core as of .NET 6.
The following method should provide an RFC4648§5-compliant base64 string based on a quick reading.
This code allocates at least four objects on the heap, so ultra-high performance use cases should tune it a bit and perhaps wait for .NET 7, which is expected to introduce a means to get the random bytes into a Span for stack allocation.
private string GetUrlSafeRandomBytes(int byteCount)
{
var salt = new byte[byteCount];
using var rng = RandomNumberGenerator.Create();
rng.GetBytes(salt);
var asBase64 = Convert.ToBase64String(salt);
var asUrlSafeString = asBase64.Replace('+', '-').Replace('/', '_');
return asUrlSafeString;
}
I have an MVC route like this www.example.com/Find?Key= with the Key being a Base64 string. The problem is that the Base64 string sometimes has a trailing equal sign (=) such as:
huhsdfjbsdf2394=
When that happens, for some reason my route doesn't get hit anymore.
What should I do to resolve this?
My route:
routes.MapRoute(
"FindByKeyRoute",
"Find",
new { controller = "Search", action = "FindByKey" }
);
If I have http://www.example.com/Find?Key=bla then it works.
If I have http://www.example.com/Find?Key=bla= then it doesn't work anymore.
Important Addition:
I'm writing against an IIS7 instance that doesn't allow % or similar encoding. That's why I didn't use UrlEncode to begin with.
EDIT: Original suggestion which apparently doesn't work
I'm sure the reason is that it thinks it's a query parameter called Key. Could you make it a parameter, with that part being the value, e.g.
www.example.com/Find?Route=Key=
I expect that would work (as the parser would be looking for an & to start the next parameter) but it's possible it'll confuse things still.
Suggestion which I believe will work
Alternatively, replace "=" in the base64 encoded value with something else on the way out, and re-replace it on the way back in, if you see what I mean. Basically use a different base64 decodabet.
Alternative suggestion which should work
Before adding base64 to the URL:
private static readonly char[] Base64Padding = new char[] { '=' };
...
base64 = base64.TrimEnd(Base64Padding);
Then before calling Convert.FromBase64String() (which is what I assume you're doing) on the inbound request:
// Round up to a multiple of 4 characters.
int paddingLength = (4 - (base64.Length % 4)) % 4;
base64 = base64.PadRight(base64.Length + paddingLength, '=');
IF you're passing data in the URL you should probably URL Encode it which would take care of the trailing =.
http://www.albionresearch.com/misc/urlencode.php
UrlEncode the encrypted (it is encrypted, right?) parameter.
If it is an encrypted string, beware that spaces and the + character will also get in your way.
Ok, so IIS 7 won't allow some special characters as part of your path. However, it would allow them if they were part of the querystring.
It is apparently, possible, to change this with a reg hack, but I wouldn't recommend that.
What I would suggest, then, is to use an alternate token, as suggested by Mr Skeet, or simply do not use it in your path, use it as querystring, where you CAN url encode it.
If it is an encrypted string, you haven't verified that it is or is not, you may in some cases get other 'illegal' characters. Querystring really would be the way to go.
Except your sample shows it as querystring... So what gives? Where did you find an IIS that won't allow standard uri encoding as part of the querystring??
Ok then. Thanks for the update.
RequestFiltering?
I see. Still that mentions double-encoded values that it blocks. Someone created a URL Sequence to deny any request with the '%' characters? At that point you might want to not use the encrypted string at all, but generate a GUID or something else that is guaranteed to not contain special characters, yet is not trivial to guess.