Url Unicode characters encoding - c#

How to encode URLs containing Unicode? I would like to pass it to a command line utility and I need to encode it first.
Example: http://zh.wikipedia.org/wiki/白雜訊
becomes http://zh.wikipedia.org/wiki/%E7%99%BD%E9%9B%9C%E8%A8%8A.

You can use the HttpUtility.UrlPathEncode method in the System.Web assembly (requires the full .NET Framework 4 profile):
var encoded = HttpUtility.UrlPathEncode("http://zh.wikipedia.org/wiki/白雜訊");

According to MSDN, UrlPathEncode should no longer be used.
So the correct way of doing it now is:
var urlString = Uri.EscapeUriString("http://zh.wikipedia.org/wiki/白雜訊");

I had a problem with Turkish characters. Using <a href="/@Html.Raw(string)"> solved it.

Server.UrlEncode(s);
.NET strings are natively Unicode strings (UTF-16 encoded, to be specific), so you need do nothing more than invoke HttpServerUtility.UrlEncode (the so-called "intrinsic" Server property will be available in most contexts in ASP.NET where you may want to do this).
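For anyone who wants to try this outside of ASP.NET, here is a minimal sketch that uses Uri.EscapeDataString (from System.Uri, so no System.Web reference is needed) on the last path segment only. The helper name EncodeLastSegment is made up for illustration, and it assumes everything before the last slash is already a valid URL.

```csharp
using System;

public static class WikiUrlSketch
{
    // Hypothetical helper: percent-encodes only the text after the last '/'.
    // Assumes the scheme, host and leading path are already valid as-is.
    public static string EncodeLastSegment(string url)
    {
        int slash = url.LastIndexOf('/');
        string prefix = url.Substring(0, slash + 1);
        string segment = url.Substring(slash + 1);
        // EscapeDataString converts the segment to UTF-8 and escapes each byte as %XX.
        return prefix + Uri.EscapeDataString(segment);
    }

    public static void Main()
    {
        Console.WriteLine(EncodeLastSegment("http://zh.wikipedia.org/wiki/白雜訊"));
        // prints http://zh.wikipedia.org/wiki/%E7%99%BD%E9%9B%9C%E8%A8%8A
    }
}
```

Encoding only the segment avoids escaping the :// and the slashes that are part of the URL structure.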

Related

C# metro No mapping for the Unicode character exists in the target multi-byte code page

Line:
IList<string> text = await FileIO.ReadLinesAsync(file);
causes exception No mapping for the Unicode character exists in the target multi-byte code page
When I remove chars like ąśźćóż from my file it runs ok, but the problem is that I can't guarantee that those chars won't happen in future.
I tried changing the encoding in advanced save options but it is already
Unicode (UTF-8 with signature) - Codepage 65001
I have a hard time trying to figure this one out.
Make FileIO.ReadLinesAsync use a matching encoding. I don't know what your custom class does, but according to the error message it does not use any Unicode encoding.
I think those characters ąśźćóż are UTF-16 encoded, so it's better to use UTF-16. Use the overload ReadLinesAsync(IStorageFile, UnicodeEncoding) and set the UnicodeEncoding parameter to UnicodeEncoding.Utf16BE.
From MSDN :
This method uses the character encoding of the specified file. If you
want to specify different encoding, call ReadLinesAsync(IStorageFile,
UnicodeEncoding) instead.

Converting special characters to regular c#

Is there a command in C# to convert strings like : https%3A%2F%2Fwww.google.com back to https://www.google.com?
some sort of "decryption" method maybe?
You need to use System.Web.HttpUtility.UrlDecode for this:
string real = System.Web.HttpUtility.UrlDecode(encodedString);
You can use the reverse function System.Web.HttpUtility.UrlEncode to encode.
This is not a matter of encryption or decryption. It is just that some characters cannot appear literally in parameters or other parts of a URL. For instance, a colon (:) cannot be part of a URL tail because it is used in the prefix (http:), so it gets encoded as %3A.
In the same way, a slash gets encoded as %2F. Hence, %3A%2F%2F means ://.
You can use HttpUtility.UrlDecode
You can try
HttpUtility.UrlDecode(url);
or
Uri.UnescapeDataString(url);
If you're not working on a web application, I suggest you use the WebUtility class instead, as you don't have to reference the entire System.Web assembly to access UrlDecode, which is required for the HttpUtility class. (You'll need to be targeting .NET 4.5 or later, where WebUtility.UrlDecode was added.)
string unencoded = WebUtility.UrlDecode("https%3A%2F%2Fwww.google.com");
You can also use Uri.UnescapeDataString if you don't require any other HTML encoding/decoding methods. This is in System.Uri, so you don't need to reference any other assembly.
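To compare the two options side by side, here is a small sketch. The class name DecodeSketch is just for illustration. It also shows one difference worth knowing: WebUtility.UrlDecode turns a plus sign into a space, while Uri.UnescapeDataString leaves it alone.

```csharp
using System;
using System.Net;

public static class DecodeSketch
{
    public static void Main()
    {
        string encoded = "https%3A%2F%2Fwww.google.com";

        // Both calls reverse the percent-encoding of ':' and '/'.
        Console.WriteLine(WebUtility.UrlDecode(encoded));   // https://www.google.com
        Console.WriteLine(Uri.UnescapeDataString(encoded)); // https://www.google.com

        // They differ on '+': form-style decoding treats it as a space.
        Console.WriteLine(WebUtility.UrlDecode("a+b"));     // a b
        Console.WriteLine(Uri.UnescapeDataString("a+b"));   // a+b
    }
}
```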

Replace ASCII characters with their equivalent

I am setting a value in the cookie using JavaScript and getting the contents of the cookie in the code behind.
But the problem is that if I store a string with some special characters or whitespace characters, then when I retrieve the contents of the cookie the special symbols have been converted into their ASCII equivalents.
For example, if I want to store Adam - (SET) in the cookie, it gets converted into Adam%20-%20%28SET%29 and stored, and when I retrieve it I get the same Adam%20-%20%28SET%29. But I want to get Adam - (SET) in the code behind.
How do I get this? Please help.
In C#
Use:
String decoded = HttpUtility.UrlDecode(EncodedString);
HttpUtility.UrlDecode() is the underlying function used by most of the other alternatives you can use in the .NET Framework (see below).
You may want to specify an encoding, if necessary.
Or:
String decoded = Uri.UnescapeDataString(s);
See Uri.UnescapeDataString()'s documentation for some caveats.
In JavaScript
var decoded = decodeURIComponent(s);
Before jumping on using unescape as recommended in other questions, read decodeURIComponent vs unescape, what is wrong with unescape? . You may also want to read What is the difference between decodeURIComponent and decodeURI? .
You can use the unescape function in JS to do that.
var str = unescape('Adam%20-%20%28SET%29');
You are looking for HttpUtility.UrlDecode() (in the System.Web namespace, I think).
In JavaScript you can use the built-in decodeURIComponent, but I suspect that the string encoding is happening when the value is sent to the server, so the C# answers are what you want.
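As a quick check on the server side, here is a minimal sketch using the cookie value from the question. DecodeCookieValue is a made-up helper name, and it assumes the value was percent-encoded on the client before it was stored.

```csharp
using System;

public static class CookieValueSketch
{
    // Hypothetical helper: reverses percent-encoding of a raw cookie value.
    public static string DecodeCookieValue(string raw)
    {
        return Uri.UnescapeDataString(raw);
    }

    public static void Main()
    {
        Console.WriteLine(DecodeCookieValue("Adam%20-%20%28SET%29"));
        // prints Adam - (SET)
    }
}
```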

How to use strange characters in a query string

I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(password, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
Uri uri = new Uri("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.
First let's start with the UTF8 business. Essentially, in this case there isn't any. When a string contains only characters within the standard ASCII range (as your password does), the UTF8 encoding of that string is identical to a single-byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
It has encoded both the # and the "; however, only # has special meaning in a URL. Hence when you view the string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a URL must be properly encoded; that's part of its being a valid URL. Hence the web server can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%2523", where %25 is %.
Is that a documented feature of Request["Password"] that it decodes it?
Well, I dunno, you'd have to check the documentation I guess. BTW, use Request.QueryString["Password"]; the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference, but it's better for clarity since it's easier to make the distinction between QueryString values and Form values.
if I don't use UTF8 the characters are being filtered out.
Are you sure that non-ASCII characters may be present in the password? Can you provide an example? Your current example does not need encoding with UTF-8.
If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also, the userinfo syntax in a URL is username:password@hostname, but it has been disabled in IE7 and above, IIRC.
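To see the encoding steps from this thread in one place, here is a minimal sketch. It swaps in Uri.EscapeDataString instead of HttpUtility.UrlEncode so it runs without System.Web, and BuildUrl is a made-up helper. The host and page name are the placeholders from the question.

```csharp
using System;

public static class QueryValueSketch
{
    // Hypothetical helper: puts an escaped value after '?' so the server
    // sees it as a query string parameter.
    public static string BuildUrl(string password)
    {
        return "http://www.url.com/page.aspx?password=" + Uri.EscapeDataString(password);
    }

    public static void Main()
    {
        string password = "vtakyoj#\"5";

        Console.WriteLine(BuildUrl(password));
        // prints http://www.url.com/page.aspx?password=vtakyoj%23%225

        // A literal "%23" in the data has its '%' escaped as %25.
        Console.WriteLine(Uri.EscapeDataString("%23"));
        // prints %2523

        // Decoding once restores the original value.
        Console.WriteLine(Uri.UnescapeDataString("vtakyoj%23%225") == password);
        // prints True
    }
}
```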

Double/incomplete Parameter Url Encoding

In my web app, my parameters can contain all sorts of crazy characters (Russian chars, slashes, spaces etc.) and can therefore not always be represented as-is in a URL.
Sending them on their merry way will work in about 50% of the cases. Some things like spaces are already encoded somewhere (I'm guessing in Html.BuildUrlFromExpression). Other things though (like "/" and "*") are not.
Now I don't know what to do anymore because if I encode them myself, my encoding will get partially encoded again and end up wrong. If I don't encode them, some characters will not get through.
What I did is manually .replace() the characters I had problems with.
This is of course not a good idea.
Ideas?
--Edit--
I know there are a multitude of encoding/decoding libraries at my disposal.
It just looks like the mvc framework is already trying to do it for me, but not completely.
<a href="<%=Html.BuildUrlFromExpression<SearchController>(c=>c.Search("", 1, "a \v/&irdStr*ng"))%>" title="my hat's awesome!">
will render me
<a href="/Search.mvc/en/Search/1/a%20%5Cv/&irdStr*ng" title="my hat's awesome!">
Notice how the forward slash, asterisk and ampersand are not escaped.
Why are some escaped and others not? How can I now escape this properly?
Am I doing something wrong or is it the framework?
Parameters should be escaped using Uri.EscapeDataString:
string url = string.Format("http://www.foo.bar/page?name={0}&address={1}",
    Uri.EscapeDataString("adlknad /?? lkm#"),
    Uri.EscapeDataString(" qeio103 8182"));
Console.WriteLine(url);
Uri uri = new Uri(url);
string[] options = uri.Query.Split('?','&');
foreach (string option in options)
{
    string[] parts = option.Split('=');
    if (parts.Length == 2)
    {
        Console.WriteLine("{0} = {1}", parts[0],
            Uri.UnescapeDataString(parts[1]));
    }
}
As others have mentioned, if you encode your string first you avoid the issue.
The MVC Framework is encoding characters that it knows it needs to encode, but leaving those that are valid URL characters (e.g. & % ? * /). These are valid URL characters, although they are special characters in a URL, so leaving them as-is might not achieve the result you are after.
Try using the Microsoft Anti-Cross Site Scripting library. It contains several Encode methods, which encode all the characters (including #, and characters in other languages). As for decoding, the browser should handle the encoded Url just fine, however if you need to manually decode the Url, use Uri.UnescapeDataString
Hope that helps.
Escaping of forward slashes and dots in the path part of a URL is prohibited for security reasons (although it works in Mono).
Html.BuildUrlFromExpression needs to be fixed then, would submit this upstream to the MVC project... alternatively do the encoding to the string before passing to BuildUrlFromExpression, and decode it when it comes back out on the other side.
It may not be readily fixable, as IIS may be handling the decoding of the url string beforehand... may need to do some more advanced encoding/decoding for alternative path characters in the utility methods, and decode on your behalf coming out.
I've seen similar posts on this. To me, it looks like a flaw in MVC. The function would be more appropriately named "BuildUrlFromEncodedExpression". What's worse is that the called function needs to decode its input parameters. Yuk.
If there is any overlap between the characters encoded by BuildUrlFromExpression() and the characters encoded by the caller (who, I think, might fairly just encode any non-alphanumeric for simplicity's sake), then you have potential for nasty bugs.
Server.URLEncode or HttpServerUtility.UrlEncode
I see what you're saying now - I didn't realize the question was specific to MVC. Looks like a limitation of that part of the MVC framework - in particular, BuildUrlFromExpression is doing some URL encoding, but it knows that it also needs some of that punctuation as part of the framework URLs.
Also, unfortunately, URL encoding is not idempotent, i.e.
URLEncode(x) != URLEncode(URLEncode(x))
It would be nice if it were idempotent: then you could pre-encode your variables and they wouldn't be double encoded.
There's probably an ASP.NET MVC framework best practice for this. I guess another thing you could do is encode into base64 or something that is URLEncode-invariant.
Have you tried using the Server.UrlEncode() method to do the encoding, and the Server.UrlDecode() method to decode?
I have not had any issues with using it for passing items.
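To make the double-encoding problem from this thread concrete, here is a minimal sketch using Uri.EscapeDataString. The class name and sample value are made up for illustration. Encoding an already encoded value escapes the % signs again, so the receiver has to decode exactly as many times as the value was encoded.

```csharp
using System;

public static class DoubleEncodingSketch
{
    public static void Main()
    {
        string raw = "a b/c";

        string once = Uri.EscapeDataString(raw);
        string twice = Uri.EscapeDataString(once);

        Console.WriteLine(once);   // a%20b%2Fc
        Console.WriteLine(twice);  // a%2520b%252Fc

        // One decode of the double-encoded value only gets back to the
        // single-encoded form, not to the original string.
        Console.WriteLine(Uri.UnescapeDataString(twice));                         // a%20b%2Fc
        Console.WriteLine(Uri.UnescapeDataString(Uri.UnescapeDataString(twice))); // a b/c
    }
}
```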
