The code that responds to a request is the following:
    // Add the custom header before returning; a statement after the return would never run.
    Response.AddHeader("X-Status", ex.Detail.ReasonPhrase);
    return new HttpStatusCodeResult(ex.Detail.HttpStatusCode, ex.Detail.ReasonPhrase);
The funny thing is that, on the browser/client, when a special character is involved (such as ç or é), the reason phrase renders as expected but the X-Status header does NOT!
Here's a screenshot for the non-believers.
I've tested countless encoding combinations but none worked; the X-Status header just fails.
I have to use the custom X-Status header because Safari overrides whatever message comes with the StatusText.
The HTTP specification does not define a character encoding for header fields or the status line (well, not beyond US-ASCII). If you need non-ASCII characters, you're on your own. One reliable way is to percent-encode, for instance.
Also note that HTTP/2 doesn't have a status line, thus trying to use that to return information is a non-starter anyway.
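As a rough illustration of the percent-encoding suggestion, here is a minimal sketch. The class and method names (HeaderText, Encode, Decode) are made up for this example; only Uri.EscapeDataString and Uri.UnescapeDataString are real framework calls. It assumes the client is able to decode the header value itself.

```csharp
using System;

// Hypothetical helper, not part of any framework.
public static class HeaderText
{
    // Percent-encodes the UTF-8 bytes of the value, so only ASCII goes on the wire.
    public static string Encode(string value) => Uri.EscapeDataString(value);

    // Reverses Encode on the receiving side.
    public static string Decode(string value) => Uri.UnescapeDataString(value);
}
```

On the server this would be used as Response.AddHeader("X-Status", HeaderText.Encode(ex.Detail.ReasonPhrase)); a JavaScript client could then restore the text with decodeURIComponent.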
I'm generating an encoded value that gets passed in my URL. The problem is that our SEO manager configured the application to lowercase all URLs, and he says he won't change that configuration. So I have to encode my URL in some way that survives lowercasing, for example by encoding uppercase letters (or the whole string) by character code, so that I can pass it without ruining the original value.
For example, my resulting Base64 string is as follows:
aHR0cDovL2xvY2FsaG9zdDoxMzUwL2hvdGVscy9nMy8xMzk1LTA1LTEwLzEvOTI3MjIyZmY
But it turns into this when it is passed to the controller:
ahr0cdovl2xvy2fsag9zddoxmzuwl2hvdgvscy9nmy8xmzk1lta1ltewlzevoti3mjiyzmy
which can't be read; the changed case breaks decoding.
You cannot encode it using Base64 if it will be transformed to lowercase outside of your control, because Base64 distinguishes between uppercase and lowercase characters.
If the configuration your manager insists on means that incoming or outgoing query string parameters are incorrectly lowercased, however, you should inform him that this violates the URI specification, specifically the query string section. Of course it is ultimately up to your company whether you want only lowercase in your internal URIs, but you should not assume that other applications handling URIs will operate like this.
As @sachin stated above, you can send this data via POST instead, if you can make this a POST request (rather than a GET, as I assume it is now) and provided that your manager is not lowercasing those as well. :/
Alternatively, you could use Base32 to get around this. Its alphabet contains only uppercase letters (plus the digits 2 to 7), so you can simply convert the received value to uppercase before decoding the (now Base32) string. This is a pretty ridiculous solution though...
Just to be clear: "lol" would encode in Base32 to "NRXWY===", which would then be lowercased to "nrxwy===", which you could then uppercase back to "NRXWY===" prior to decoding.
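To make the round trip concrete, here is a minimal sketch of a Base32 encoder/decoder using the RFC 4648 alphabet. It is hand-rolled for illustration only (the class name Base32Sketch is made up), and a tested package is preferable in real code. The decoder uppercases its input first, which is the step that makes a lowercased URL harmless.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only; not a substitute for a tested Base32 library.
public static class Base32Sketch
{
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder();
        int buffer = 0, bits = 0;
        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bits += 8;
            while (bits >= 5)
            {
                // Emit the top 5 pending bits, then keep only the remainder.
                sb.Append(Alphabet[(buffer >> (bits - 5)) & 31]);
                bits -= 5;
                buffer &= (1 << bits) - 1;
            }
        }
        if (bits > 0)
            sb.Append(Alphabet[(buffer << (5 - bits)) & 31]); // zero-fill the last group
        while (sb.Length % 8 != 0)
            sb.Append('='); // pad to a multiple of 8 characters
        return sb.ToString();
    }

    public static byte[] Decode(string text)
    {
        var output = new List<byte>();
        int buffer = 0, bits = 0;
        // Uppercasing first is what makes a lowercased URL value decodable.
        foreach (char c in text.ToUpperInvariant().TrimEnd('='))
        {
            int value = Alphabet.IndexOf(c);
            if (value < 0)
                throw new FormatException($"Invalid Base32 character '{c}'.");
            buffer = (buffer << 5) | value;
            bits += 5;
            if (bits >= 8)
            {
                output.Add((byte)(buffer >> (bits - 8)));
                bits -= 8;
                buffer &= (1 << bits) - 1;
            }
        }
        return output.ToArray();
    }
}
```

With this sketch, Base32Sketch.Encode(Encoding.UTF8.GetBytes("lol")) gives "NRXWY===", and decoding the lowercased "nrxwy===" returns the bytes of "lol" again.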
These are two NuGet packages that do Base32 encoding:
Base32 as per RFC4648 here and the author claims it's tested and working correctly.
Another package, which looks appealing because it supports zBase32, here. The advantage of zBase32 is that it already uses lowercase characters only, so you won't have to worry about changing the case. The porter/author has included instructions on how to get zBase32 encoding.
Both of these (Base32 and zBase32) use a subset of the Base64 characters, so they'll both work fine with URIs; all of the characters used are valid in URIs. (The UTF-8 content is irrelevant since you're just encoding bytes, so you'll get the same bytes back when you decode from Base32.)
I have a .NET Web API v2 and need to define a route that can contain forward slashes at any point in the route. I have it configured (something) like this in WebApiConfig:
    config.Routes.MapHttpRoute(
        name: "SomeRouteName",
        routeTemplate: "api/summary/{*token}",
        defaults: new { controller = "Summary" });
Unfortunately the token contains slashes and there's nothing I can do about it at this point. The above route works in most cases, as long as the slash is not at the beginning of the token. But if the token begins with one or more slashes, it doesn't work: I assume the first slash gets interpreted as part of the URL and gets eaten. So in my controller action, I have the following code (admittedly a hack that I'm trying to avoid):
    if (summary == null)
    {
        summary = _repo.GetSummary($"/{token}");
    }
Obviously, this will only work for a single slash. I could loop and add more, but there isn't a way to know how many there could be. Currently no tokens in my DB begin with two slashes, so this bad code works for now. It was implemented as a band-aid until a better solution is found.
Edit: This references the * route, which mostly fixed my original issue, but still doesn't match the first slashes: URLs with slash in parameter?
Since OP said in some comment:
There is no purpose -- but the tokens are generated and not
necessarily by my code. So I don't have control over the token
generation.
and also:
I've attempted to UrlEncode /Decode to see if this works, but the
slash (encoded as %2F) still gets eaten for whatever reason. As a side
note, I can say that Base64 encoding it will fix this but that I can't
change this at this point because it would break the API for existing
apps.
I would say that one of the best ways to avoid issues with special characters is to first encode the whole token as a Base64 string, and then URL-encode it to escape characters like =.
That way, you can configure a route where you don't need {*token} but just {token}. If you want to simplify the token decoding, you can implement an ActionFilterAttribute that decodes the bound parameter under the hood.
Others have already done it this way...
HTTP Basic authentication, which OAuth2 also uses for client credentials, already sends credentials as a Base64 string (you convert the whole user:password string to Base64).
It's a common way of avoiding these issues, and Base64 strings can be decoded on every modern client and server platform.
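A minimal sketch of this idea follows. Note one swap: standard Base64 can itself produce a /, which would bring back the original problem once it is percent-encoded as %2F, so this sketch uses the URL-safe variant from RFC 4648 (- and _ instead of + and /, padding removed) rather than Base64 followed by URL encoding. The class and method names (TokenCodec, ToUrlSafe, FromUrlSafe) are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for passing arbitrary tokens as a single route segment.
public static class TokenCodec
{
    // Encodes a token so that it contains only letters, digits, '-' and '_'.
    public static string ToUrlSafe(string token)
    {
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(token));
        return base64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    // Restores the original token from the URL-safe form.
    public static string FromUrlSafe(string encoded)
    {
        string base64 = encoded.Replace('-', '+').Replace('_', '/');
        // Re-add the padding that ToUrlSafe removed.
        int padding = (4 - base64.Length % 4) % 4;
        base64 = base64.PadRight(base64.Length + padding, '=');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

A client would call TokenCodec.ToUrlSafe before building the URL, and the controller (or an action filter) would call TokenCodec.FromUrlSafe on the bound parameter, so leading slashes in the token never reach the router.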
I have a large URL that I am encoding using System.Web.HttpUtility.UrlEncode. When I encode it, the result doesn't match the working example I have. I am not sure what the problem is, maybe a different character type or something. Below is an example of what is supposed to be created and what is actually being created. Thanks for any help, I am lost on this one.
Working example (note how this one has Did%252Citag%252 and the other doesn't):
22%7Chttp%3A%2F%2Fv17.nonxt1.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D22%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D8AD67D74F34FBAFBBA87616C0AED4A336DF0982A.129E2B5E648F8A2F35A34F312AC5C3C957A1C40A%26key%3Dlh1%2C35%7Chttp%3A%2F%2Fv18.nonxt3.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D35%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D7A58A11994C710872E945D0EAA6E43B6BFB8A648.B9C1D9FB377E1A49EBF3DC6C166C0B6E3E94EC24%26key%3Dlh1%2C34%7Chttp%3A%2F%2Fv6.nonxt1.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D34%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D260B10850A3448C849B8B8F1F2AF5E31244E71BC.6D7420FD66B85D40982BFB2C847EDB46021C63AE%26key%3Dlh1%2C5%7Chttp%3A%2F%2Fv23.nonxt7.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D5%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D9894DCDA7D2634EE0006CE0F6E0E29ABF7A8F253.18765D7CD7BDE80ED1A47DC8EC559C3E05C92F56%26key%3Dlh1
Here is an example of the one I am creating (note how this one encodes as did%2citag%2):
5%7chttp%3a%2f%2fv23.nonxt7.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d5%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3dC0E2993011931D9F5FCAFAF54E821415F6042DDD.477CD23B021563A6DE30E858E35C21046E0B0BA6%26key%3dlh1%2c18%7chttp%3a%2f%2fv11.nonxt4.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d18%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d696501A8ACBA0E1246173B040E0FB81DA8EBCDC7.944BA6C08C630EFFC2456D66BAD12376D7E377B2%26key%3dlh1%2c34%7chttp%3a%2f%2fv6.nonxt1.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d34%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3dDDD3D9081F7F2FF462D17CFAE6CAB72AEB86DEA9.3275E0EE8921EF728132035FC94BEF5926A0B7C1%26key%3dlh1%2c35%7chttp%3a%2f%2fv18.nonxt3.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d35%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d7826E7470450F9F473BC7A845967EF3AC655CFB.3850F952F5D68151D325CD754C581CD66B0BC4D7%26key%3dlh1%2c22%7chttp%3a%2f%2fv17.nonxt1.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d22%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d32FAAE6AE74B22BFB3DBD4300CEEDBC1A12A9ED4.8014678ABB1AEE93FB4B1C36E2C74C89102DC112%26key%3dlh1
It looks like the URL in the first example is double encoded. If you look at the sparams parameter after decoding once, in the first example it is
sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire
whereas in your second example it is
sparams=id,itag,source,ip,ipbits,expire
So in the first example they UrlEncode the value first, construct the URL using that encoded value, and then UrlEncode the constructed URL as a whole.
UPDATE: This is the general practice to follow if your query string values contain characters that need to be URL-encoded (e.g. , & space ? etc.).
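The two-step encoding can be sketched as follows. This is only an illustration: the host example.com and the names DoubleEncodingDemo, BuildInnerUrl and EncodeForTransport are made up, and it uses Uri.EscapeDataString rather than HttpUtility.UrlEncode, so the hex digits come out in uppercase.

```csharp
using System;

// Hypothetical demo of encoding a value, then encoding the URL that contains it.
public static class DoubleEncodingDemo
{
    // Step 1: encode the parameter value and place it in the inner URL.
    public static string BuildInnerUrl(string sparams) =>
        "http://example.com/videoplayback?sparams=" + Uri.EscapeDataString(sparams);

    // Step 2: encode the whole inner URL so it can travel as a single value.
    public static string EncodeForTransport(string innerUrl) =>
        Uri.EscapeDataString(innerUrl);
}
```

After step 1 the comma appears as %2C; after step 2 the % of that sequence is itself encoded as %25, which is where the %252C in the working example comes from.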
According to W3C standards, your example is fine. There is no %252 symbol.
I'm not sure exactly what you are expecting, but when you fire these strings into a URL Decoder, this is what you get:
String 1
22|http://v17.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=22&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=8AD67D74F34FBAFBBA87616C0AED4A336DF0982A.129E2B5E648F8A2F35A34F312AC5C3C957A1C40A&key=lh1,35|http://v18.nonxt3.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=35&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=7A58A11994C710872E945D0EAA6E43B6BFB8A648.B9C1D9FB377E1A49EBF3DC6C166C0B6E3E94EC24&key=lh1,34|http://v6.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=34&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=260B10850A3448C849B8B8F1F2AF5E31244E71BC.6D7420FD66B85D40982BFB2C847EDB46021C63AE&key=lh1,5|http://v23.nonxt7.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=5&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=9894DCDA7D2634EE0006CE0F6E0E29ABF7A8F253.18765D7CD7BDE80ED1A47DC8EC559C3E05C92F56&key=lh1
String 2
5|http://v23.nonxt7.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=5&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=C0E2993011931D9F5FCAFAF54E821415F6042DDD.477CD23B021563A6DE30E858E35C21046E0B0BA6&key=lh1,18|http://v11.nonxt4.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=18&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=696501A8ACBA0E1246173B040E0FB81DA8EBCDC7.944BA6C08C630EFFC2456D66BAD12376D7E377B2&key=lh1,34|http://v6.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=34&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=DDD3D9081F7F2FF462D17CFAE6CAB72AEB86DEA9.3275E0EE8921EF728132035FC94BEF5926A0B7C1&key=lh1,35|http://v18.nonxt3.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=35&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=7826E7470450F9F473BC7A845967EF3AC655CFB.3850F952F5D68151D325CD754C581CD66B0BC4D7&key=lh1,22|http://v17.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=22&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=32FAAE6AE74B22BFB3DBD4300CEEDBC1A12A9ED4.8014678ABB1AEE93FB4B1C36E2C74C89102DC112&key=lh1
Your URLs are quite different, and they also have leading characters that I'm not sure you want.
I am writing a crawler for a website.
Its response is gzip encoded.
I am not able to correctly parse one particular field, even though the decompression is successful.
I am also using htmlagilitypack to parse it.
The parsed value of the field is only a part of the original value.
As an example:
I am getting only /wEWAwKc04vTCQKb86mzBwKln/PuCg==
whereas Firebug shows the actual value as much longer:
/wEWBgKj7IuJCgKb86mzBwKln/PuCgLT250qAtC0+8cMAvimiNYD
What does the '==' at the end mean?
I am assuming that it's an error on the decompressor's part?
The character = is added by the Base64 encoding.
Encoding the following sentence
Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
you would get
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
The = character can only be present at the end of a Base64 string. If you see it, you are probably getting all the characters. The converse is not true: the character is only used for padding, and not all Base64 implementations require it.
You don't have a problem with decompression. The page has obviously been correctly decompressed. Otherwise your software would likely throw an error or you'd see just a bunch of strange characters.
However, what you get is an ASCII string that's obviously in Base 64 encoding. The equal signs at the end appear if the original binary data is not a multiple of 3 bytes. So that's all perfect Base 64 data.
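The padding rule is easy to check with the framework's own encoder. The sketch below is only an illustration (the class name Base64Padding and the method PaddingFor are made up); Convert.ToBase64String is the real framework call.

```csharp
using System;

// Hypothetical demo of how Base64 padding depends on the input length.
public static class Base64Padding
{
    // Number of '=' characters appended for an input of the given byte length.
    public static int PaddingFor(int byteCount) => (3 - byteCount % 3) % 3;

    // Thin wrapper around the framework encoder, for comparison.
    public static string Encode(byte[] data) => Convert.ToBase64String(data);
}
```

For example, the one-byte input { 1 } encodes to "AQ==", the two-byte input { 1, 2 } to "AQI=", and the three-byte input { 1, 2, 3 } to "AQID" with no padding at all.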
As to why your crawler gets different data than Firefox with Firebug: I don't know, but I can imagine many reasons. These are two separate browsing sessions, and the web site might just assign them different session IDs or somehow record some history of the session.
Anyhow, at the end of the day I don't understand your problem. What exactly are you unable to parse? Do you get some kind of error? What do you mean by field? Are you talking about a field of an HTML form?
In my web app, my parameters can contain all sorts of crazy characters (Russian characters, slashes, spaces etc.) and can therefore not always be represented as-is in a URL.
Sending them on their merry way works in about 50% of the cases. Some things, like spaces, are already encoded somewhere (I'm guessing in Html.BuildUrlFromExpression). Other things though (like "/" and "*") are not.
Now I don't know what to do anymore, because if I encode them myself, my encoding gets partially encoded again and ends up wrong. If I don't encode them, some characters will not get through.
What I did is manually .replace() the characters I had problems with.
This is of course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the mvc framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?
Parameters should be escaped using Uri.EscapeDataString:
    // Escape each value on its own before placing it in the query string.
    string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
        Uri.EscapeDataString("adlknad /?? lkm#"),
        Uri.EscapeDataString(" qeio103 8182"));
    Console.WriteLine(url);
    Uri uri = new Uri(url);
    // Splitting on '?' and '&' yields an empty first entry, which the length check below skips.
    string[] options = uri.Query.Split('?', '&');
    foreach (string option in options)
    {
        string[] parts = option.Split('=');
        if (parts.Length == 2)
        {
            // Unescape each value to get the original text back.
            Console.WriteLine("{0} = {1}", parts[0],
                Uri.UnescapeDataString(parts[1]));
        }
    }
As others have mentioned, if you encode your string first you avoid the issue.
The MVC Framework encodes the characters that it knows it needs to encode, but leaves those that are valid URL characters (e.g. & % ? * /). These are valid in a URL, but they have a special meaning there, so leaving them unescaped might not achieve the result you are after.
Try using the Microsoft Anti-Cross Site Scripting library. It contains several Encode methods, which encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded URL just fine; however, if you need to manually decode the URL, use Uri.UnescapeDataString.
Hope that helps.
Escaping of forward slashes and dots in the path part of a URL is prohibited for security reasons (although it works in Mono).
Html.BuildUrlFromExpression needs to be fixed then; I would submit this upstream to the MVC project. Alternatively, encode the string before passing it to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be decoding the URL string beforehand. You may need to do some more advanced encoding/decoding for alternative path characters in the utility methods, and decode on your behalf coming out.
I've seen similar posts on this. To me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". What's worse is that the called function needs to decode its input parameters. Yuck.
If there is any overlap between the characters encoded by BuildUrlFromExpression() and the characters encoded by the caller (who, I think, might fairly just encode anything non-alphanumeric for simplicity's sake), then you have potential for nasty bugs.
Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now. I didn't realize the question was specific to MVC. It looks like a limitation of that part of the MVC framework: BuildUrlFromExpression does some URL encoding, but it also knows that it needs some of that punctuation as part of the framework URLs.
Unfortunately, URL encoding also isn't idempotent, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
Wouldn't that be nice? Then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into Base64 or something that is URLEncode-invariant.
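A small sketch of both points: encoding twice changes the result, while a representation that uses only letters and digits passes through URL encoding unchanged. Hex is used here as a stand-in for "something that is URLEncode-invariant" (my choice, not something the framework prescribes), and the names EncodeInvariance, UrlEncode, ToHex and FromHex are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo: URL encoding is not idempotent, but hex text survives it unchanged.
public static class EncodeInvariance
{
    public static string UrlEncode(string value) => Uri.EscapeDataString(value);

    // Hex uses only 0-9 and A-F, which URL encoding leaves untouched.
    public static string ToHex(string value) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(value)).Replace("-", "");

    // Turns the hex text back into the original string.
    public static string FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Here UrlEncode("a b") gives "a%20b" and encoding that again gives "a%2520b", whereas ToHex("a/b") gives "612F62", which stays the same no matter how many times it is URL-encoded. The price is that the value is no longer human-readable and doubles in length.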
Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.