case sensitive to URL case insensitive - c#

I'm generating an encoded value to get passed within my URL, the issue is, our SEO manager configure the application, to pass lowercase URL, and he says he won't change the configuration. now i have to somehow encode my url, that uppercase, or whole string get encoded by their character code, so i can pass it without ruin the main value,
for example, my resulting base64 string is as following:
aHR0cDovL2xvY2FsaG9zdDoxMzUwL2hvdGVscy9nMy8xMzk1LTA1LTEwLzEvOTI3MjIyZmY
but it turn to be like this, when is passed to controller:
ahr0cdovl2xvy2fsag9zddoxmzuwl2hvdgvscy9nmy8xmzk1lta1ltewlzevoti3mjiyzmy
which can't be read... the case cause issue while decode.

You cannot encode it using base64 if it will be transformed to lowercase out of your control, base64 relies upon using uppercase characters.
If the configuration your manager is insisting on is that incoming or outgoing query string parameters be incorrectly lower cased, however, you should inform him that he is in violation of the URI specification, specifically the query string section. Of course it is ultimately up to your own internal company choices whether you want only lower case in your internal URIs, but you should not assume that other applications handling URIs will operate like this.
As #sachin stated above, if you can make this a POST request (instead of a GET like I assume it is now), and provided that your manager is not lower casing those upon sending them as well :/ You can send this data via POST.
Alternatively, you could use Base32 instead to get around this, it does rely on uppercase characters only, but you can simply transform the recieved value to upper case upon recieveing it prior to decoding the (now Base32) string. This is a pretty ridiculous solution though...
Just to be clear: "lol" would encode in Base32 to "NRXWY===" which would then be lower cased to "nrxwy===" which you could then uppercase back to "NRXWY===" prior to decoding.
These are two NuGet packages that do Base32 encoding:
Base32 as per RFC4648 here and the author claims it's tested and working correctly.
Another package, which looks appealing because it supports zBase32 here, the advantage with zBase32 is that it already uses lowercase characters only, so you won't have to worry about changing the case. The porter/author has included instructions on how to get zBase32 encoding
Both of the these (Base32 and zBase32) use a subset of Base64 characters, so they'll both work fine with URIs, all of the charcaters used are valid in URIs (the utf-8 content is irrelevant since you're just encoding bytes, so you'll get the same bytes back when you decode from Base32)

Related

Encoding JSON serialized for URL

I have to create a webrequest using HTTP GET and I have to add on the url a JSON serialized and encoded with all parameters.
But I have a problem when I'm encoding the serialized object, they encode also symbols like "{" and ":"
I would like to know what I have to do in order to encode the serialized object like the followed above:
Serialized Object:
{\"Name\":\"Bob\"}
Encoded With HttpUtility.Utility, or another encoder will encode all symbols:like "{" ":"
"%7b%22Name%22%3a%22Bob%22%2
What I'm looking for:
http://tmpserviceURL.test?parameters={%20%22Name%22:%20%22Bob%22}
#Ant P is correct: you want those characters to be encoded. It is a bad idea not to encode them.
HttpUtility.UrlEncode and other similar methods encode {, } and : because they MUST do so per section 2.2 of the Uniform Resource Locators specification (RFC 1738).
From page 2:
Octets [within a URL] must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.
The spec goes on to define : as being in the set of "reserved" characters (those that have special meaning within a URL), and defines { and } as being in the set of "unsafe" characters (those that are known to be sometimes modified by gateways and other transport agents).
So, in short, if you send these characters unencoded in a URL, then you risk the URL not being interpretted correctly, or the data being corrupted by the time it reaches its destination. It may work sometimes, but you can't rely on it always working.
If you really feel you must ignore the URL spec, then you will have to roll your own URL encoder that does not encode those particular characters. I doubt you're going to find an off-the-shelf encoder that allows you to do this.

How do I correctly parse a URI query string into a name-value collection in C#?

I'm using .NET 4.5 and I'm trying to parse a URI query string into a NameValueCollection. The right way seems to be to use HttpUtility.ParseQueryString(string query) which takes the string obtained from Uri.Queryand returns a NameValueCollection. Uri.Query returns a string that is escaped according to RFC 2396, and HttpUtility.ParseQueryString(string query) expects a string that is URL-encoded. Assuming RFC 2396 and URL-encoding are the same thing, this should work fine.
However, the documentation for ParseQueryString claims that it "uses UTF8 format to parse the query string". There is also an overloaded method which takes a System.Text.Encoding and then uses that instead of UTF8.
My question is: what does it mean to use UTF8 as the encoding? The input is a string, which by definition (in C#) is UTF-16. How is that interpreted as UTF-8? What is the difference between using UTF8 and UTF16 as the encoding in this case? My concern is that since I'm accepting arbitrary user input, there might be some security risk if I botch the encoding (i.e. the user might be able to slip through some script exploit).
There is a previous question on this topic (How to parse a query string into a NameValueCollection in .NET) but it doesn't specifically adress the encoding problem.
When parsing encoded values, it treats those values as UTF-8. Take the character ยข, for example. The UTF-8 encoding is C2 A2. So if it were in a query string, it would be encoded as %C2%A2.
Now, when ParseQueryString is decoding, it needs to know what encoding to use. The default is UTF-8, meaning that the character would be decoded correctly. But perhaps the user was using Microsoft's Cyrillic code page (Windows-1251), where C2 and A2 are two different characters. In that case, interpreting it as UTF-8 would be an error.
If this is a user interface application (i.e. the user is entering data directly), then you probably want to use whatever encoding is defined for the current UI culture. If you're getting this information from Web pages, then you'll want to use whatever encoding the page uses. And if you're writing a Web service then you can tell the users that their input has to be UTF-8 encoded.

Is there cross-platform method to encode a string into another string without any whitespaces and then decode it back?

I am trying to pass a block of text to a system I do not own, which will pass the data to a system I do own.
Unfortunately, when the first system talks to the second system, it uses a TSV format. Thus, I wonder if there's a convenient way to take my block of text and encode it in an ASCII format without any kind of whitespace (mostly newlines and tabs, of course), and then later decode it.
When I'm doing the encoding, I'm working in C#. When I'm doing the decoding, I'm working in Javascript.
I realize that I can write my own code to essentially "manually" perform the encoding and decoding by creating my own scheme, but I wonder if there already exists one for this purpose.
One option which would blow up the size of your data but be really simple to implement: UTF-8 encode all the text, base64-encode that:
byte[] utf8 = Encoding.UTF8.GetBytes(text);
string base64 = Convert.ToBase64(utf);
That won't contain any whitespace, and can be converted back. It'll be significantly larger than the original string, and unreadable... but it'll work.
You could try using HttpUtility.UrlEncode(string) or Uri.EscapeDataString(string), which would percent-encode any whitespace in the passed in text (as well as other special characters, which means the encoded text may be much larger than the original).
On the javascript side you could then use decodeURIComponent(string) to decode it back to the original text.

How to use strange characters in a query string

I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(encryptedPassword, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
URI uri = new URI("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.
First lets start with the UTF8 business. Esentially in this case there isn't any. When a string contains characters with in the standard ASCII character range (as your password does) a UTF8 encoding of that string is identical to a single byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
Its encoded the # and " however only # has special meaning in a URL. Hence when you view string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a url must be properly encoded thats part of its being valid Url. Hence the webserver can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%3723" where %37 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have check the documentation I guess. BTW use Request.QueryString["Password"] the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference but its better for clarity since its easier to make the distinction between QueryString values and Form values.
if I don't use UFT8 the characters are being filtered out.
Aare you sure that non-ASCII characters may be present in the password? Can you provide an example you current example does not need encoding with UTF-8?
If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also the # syntax is username:password#hostname, but it has been disabled in IE7 and above IIRC.

Double/incomplete Parameter Url Encoding

In my web app, my parameters can contain all sorts of crazy characters (russian chars, slashes, spaces etc) and can therefor not always be represented as-is in a URL.
Sending them on their merry way will work in about 50% of the cases. Some things like spaces are already encoded somewhere (I'm guessing in the Html.BuildUrlFromExpression does). Other things though (like "/" and "*") are not.
Now I don't know what to do anymore because if I encode them myself, my encoding will get partially encoded again and end up wrong. If I don't encode them, some characters will not get through.
What I did is manually .replace() the characters I had problems with.
This is off course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the mvc framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?
Parameters should be escaped using Uri.EscapeDataString:
string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
Uri.EscapeDataString("adlknad /?? lkm#"),
Uri.EscapeDataString(" qeio103 8182"));
Console.WriteLine(url);
Uri uri = new Uri(url);
string[] options = uri.Query.Split('?','&');
foreach (string option in options)
{
string[] parts = option.Split('=');
if (parts.Length == 2)
{
Console.WriteLine("{0} = {1}",parts[0],
Uri.UnescapeDataString(parts[1]));
}
}
AS others have mentioned, if you encode your string first you aviod the issue.
The MVC Framework is encoding characters that it knows it needs to encode, but leaving those that are valid URL characters (e.g. & % ? * /). This is because these are valid URL characters, although they are special chracters in a URL that might not acheive the result you are after.
Try using the Microsoft Anti-Cross Site Scripting library. It contains several Encode methods, which encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded Url just fine, however if you need to manually decode the Url, use Uri.UnescapeDataString
Hope that helps.
Escaping of forward slahes and dots in path part of url is prohibited by security reason (althrough, it works in mono).
Html.BuildUrlFromExpression needs to be fixed then, would submit this upstream to the MVC project... alternatively do the encoding to the string before passing to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be handling the decoding of the url string beforehand... may need to do some more advanced encoding/decoding for alternative path characters in the utility methods, and decode on your behalf coming out.
I've seen similar posts on this. Too me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". Whats worse, is that the called function needs to decode its input parameters. Yuk.
If there is any overlap between the characters encoded BuildUrlFromExpression() and the characters encoded by the caller (who, I think might fairly just encode any non-alphanumeric for simplicities sake) then you have potential for nasty bugs.
Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now - I didn't realize the question was specific to MVC. Looks like a limitation of that part of the MVC framework - particularly BuildUrlFromExpression is doing some URL encoding, but it knows that also needs some of those punctation as part of the framework URLs.
And also unfortunately, URLEncoding doesn't produce an invariant, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
Wouldn't that be nice. Then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into base64 or something that is URLEncode-invariant.
Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.

Categories