Unescape an escaped url in c#

Unescape an escaped url in c# - c#

I have urls which is escaped in this form:
http://www.someurl.com/profile.php?mode=register&agreed=true
I want to convert it to unescaped form
http://www.someurl.com/profile.php?mode=register&agreed=true
is this the same thing as escapped html?
how do i do this?
thanks

& is an HTML entity and is used when text is encoded into HTML because you have to "escape" the & that has a special meaning in HTML. Apparently, this escaping mechanism was used on the URL presumably because it is used in some HTML for instance in a link. I'm not sure why you want to decode it because the browser will do the proper decoding when the link is clicked. But anyway, to revert it you can use HttpUtility.HtmlDecode in the System.Web namespace:
var encoded = "http://www.someurl.com/profile.php?mode=register&agreed=true";
var decoded = HttpUtility.HtmlDecode(encoded);
The value of decoded is:
http://www.someurl.com/profile.php?mode=register&agreed=true
Another form of encoding/decoding used is URL encoding. This is used to be able to include special characters in parts of the URL. For instance the characters /, ? and & have a special meaning in a URL. If you need to include any of these characters in a say a query parameter you will have to URL encode the parameter to not mess up the URL. Here is an example of an URL where URL escaping has been used:
http://www.someurl.com/profile.php?company=Barnes+%26+Noble
The company name Barnes & Noble was encoded as Barnes+%26+Noble. If the & hadn't been escaped the URL would have contained not one but two query parameters because & is used as a delimiter between query parameters.

not sure why but decode from #Martin's answer doesn't work in my case (filename in my case is "%D1%8D%D1%84%D1%84%D0%B5%D0%BA%D1%82%D0%B8%D0%BD%D0%BE%D0%B2%D1%81%D1%82%D1%8C%20%D0%BF%D1%80%D0%BE%D0%B4%D0%B0%D0%B6%202020%20(1)-8.xml").
For me works method - https://learn.microsoft.com/en-us/dotnet/api/system.uri.unescape?view=netcore-3.1 .
Be aware that this is obsolete.

Related

Replacing illegal or undesired characters in requested URL before processing it .net

My site throws an exception every time a special kind of character is included in the request, or when the size of the URL exceeds a certain length.
How can I control the URL and transform it before processing it (For example : if the request was http://xwz.com/"ert I want to turn it into http://xwz.com/ert). Something like that.
I am using .net and c#

use this : HttpServerUtility.UrlEncode Method (String)
You can use it like this :
System.Web.HttpUtility.UrlEncode("test t");
You will need this library : UrlEncode usesSystem.Web.HttpUtility.UrlEncodeto encode strings.

Looking for HttpUtility.UrlEncode
The UrlEncode(String) method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream without encoding, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.

The code below will replace any invalid characters in your URL by an empty space
string url = System.Text.RegularExpressions.Regex.Replace(url , #"/^[!#$&-;=?-[]_a-z~]+$/", "");

I think this is what you're looking for:
System.Web.HttpUtility.UrlEncode(string url)

QueryString converting %E1 to %ufffd

I have a URL like the following
http://mysite.com/default.aspx?q=%E1
Where %E1 is supposed to be á. When I call Request.QueryString from my C# page I receive
http://mysite.com/default.aspx?q=%ufffd
It does this for any accented character. %E1, %E3, %E9, %ED etc. all get passed as %ufffd. Normal encoded values (%2D, %2E, %27) all get passed correctly.
The config file already has the responseEncoding/requestEncoding in the globalization section set to UTF-8.
How could I read the correct values?
Please note that I'm not the one generating the query string and I have no control over it.

While it's true that á is encoded as U+00E1, the UTF-8 encoding (which is relevant for URL parameters) is 0xC3 0xA1.
You can verify by called a Wikipedia entry on an accented letter, such as http://en.wikipedia.org/wiki/%C3%81
U+FFFD is the Unicode Replacement Character which indicates the a given character value cannot be correctly encoded in Unicode.
Update:
Your question has two points.
First: How do I encode a Unicode string as parameter. Use
"?q=" + HttpUtility.UrlEncode(value)
Second: How do I retrieve a Unicode value? Use:
Request["q"]
If you receive the %E1 from some other source you do not control, maybe the RawUrl can help you. (I have not tried)

UrlEncoding - which encoding should I use?

Using HttpUtility.UrlEncode and passing via the URL the receiving page sees the variables as:
brand new -> brand+new
Airconaire+Ltd -> Airconaire+Ltd
Can you see how the first and the second both have a + in them where they didn't at the start? I'm assuming this is something to do with the encoding (specifically RFC3986 or RFC2396) but how do I solve this?
I think ideally the spaces should be converted to %20 but is this the best way forward?

Try using HttpUtility.UrlPathEncode rather than URLEncode.

The UrlEncode() method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
You can encode a URL using with the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx

C# mvc2 encode url

i'm trying to encode an url with the code below;
var encodedUrl = HttpUtility.UrlEncode("http://www.example.com");
var decodedUrl = HttpUtility.UrlDecode("http%3A%2F%2Fwww%2Eexample%2Ecom%2F");
I'm working with the google webmaster tools api and this api expects an URL as shown in the decodedUrl variable above. Every single character is encoded there.
When i use the httputility encode function i get the following result;
http%3a%2f%2fwww.example.com
How can i use the encoding variable in such a way that every character in the url is encoded?

I'm pretty sure that HtmlUtility and AntiXss (another MS tool for encoding urls) aren't going to help here. A "." in a url is considered valid and so doesn't need to be encoded.
I think you're going to have to post-process your encoded string to further encode other characters that are not valid within teh google webmaster tools API.
i.e. do something like this...
var encodedUrl = HttpUtility.UrlEncode("http://www.example.com")
.Replace(".", "%2E");
... assuming that "." is the only character you're having problems with.

The period is not a reserved character in a URL, so it won't be encoded. See this question and answer for an elegant solution.

How to use strange characters in a query string

I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(encryptedPassword, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
URI uri = new URI("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.

First lets start with the UTF8 business. Esentially in this case there isn't any. When a string contains characters with in the standard ASCII character range (as your password does) a UTF8 encoding of that string is identical to a single byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
Its encoded the # and " however only # has special meaning in a URL. Hence when you view string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a url must be properly encoded thats part of its being valid Url. Hence the webserver can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%3723" where %37 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have check the documentation I guess. BTW use Request.QueryString["Password"] the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference but its better for clarity since its easier to make the distinction between QueryString values and Form values.
if I don't use UFT8 the characters are being filtered out.
Aare you sure that non-ASCII characters may be present in the password? Can you provide an example you current example does not need encoding with UTF-8?

If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also the # syntax is username:password#hostname, but it has been disabled in IE7 and above IIRC.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.