How to include number (hash) character # in path segment? - c#

I have to download a file (using existing Flurl.Http endpoints [1]) whose name contains a "#", which of course has to be escaped to %23 so that it does not conflict with URI fragment detection.
But Flurl escapes everything else and leaves this character alone, resulting in a non-working URI where half of the path and all query parameters are missing because they get parsed as the URI fragment:
Url url = "http://server/api";
url.AppendPathSegment("item #123.txt");
Console.WriteLine(url.ToString());
Returns: http://server/api/item%20#123.txt
This means an HTTP request (using Flurl.Http) would only try to download the non-existing resource http://server/api/item%20.
Even when I pre-escape the segment, the result still becomes exactly the same:
url.AppendPathSegment("item %23123.txt");
Console.WriteLine(url.ToString());
Again returns: http://server/api/item%20#123.txt.
Is there any way to stop this "magic" from happening?
[1] This means I have delegates/interfaces where input is an existing Flurl.Url instance which I have to modify.

It looks like you've uncovered a bug. Here are the documented encoding rules Flurl follows:
Query string values are fully URL-encoded.
For path segments, reserved characters such as / and % are not encoded.
For path segments, illegal characters such as spaces are encoded.
For path segments, the ? character is encoded, since query strings get special treatment.
According to the 2nd point, it shouldn't encode # in the path, so how it handles AppendPathSegment("item #123.txt") is correct. However, when you encode the # to %23 yourself, Flurl certainly shouldn't unencode it. But I've confirmed that's what's happening. I invite you to create an issue on GitHub and it'll be addressed.
In the meantime, you could write your own extension method to cover this case. Something like this should work (and you wouldn't even need to pre-encode #):
public static Url AppendFileName(this Url url, string fileName) {
url.Path += "/" + WebUtility.UrlEncode(fileName);
return url;
}

I ended up using Uri.EscapeDataString(foo) because the suggested WebUtility.UrlEncode replaces spaces with '+', which I didn't want.
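The difference between the two encoders mentioned above can be seen in a minimal sketch. The class and method names (FileNameEncoding, EncodeSegment) are hypothetical and only serve the illustration; the sketch deliberately avoids Flurl so that it runs on its own:

```csharp
using System;
using System.Net;

public static class FileNameEncoding
{
    // Hypothetical helper: percent-encodes one path segment.
    // Uri.EscapeDataString turns a space into %20 and '#' into %23.
    public static string EncodeSegment(string fileName)
    {
        return Uri.EscapeDataString(fileName);
    }
}

public static class Program
{
    public static void Main()
    {
        string fileName = "item #123.txt";

        // WebUtility.UrlEncode uses form-style encoding: space becomes '+'.
        Console.WriteLine(WebUtility.UrlEncode(fileName));           // item+%23123.txt

        // Uri.EscapeDataString keeps the space as %20.
        Console.WriteLine(FileNameEncoding.EncodeSegment(fileName)); // item%20%23123.txt
    }
}
```

The encoded segment could then be appended to the path in place of the WebUtility.UrlEncode call in the extension method above.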

Related

C# - Decoding a string does not return the original encoded one

I have a random generated string that I need to put it in a URL, so I encode it like this:
var encodedToken = System.Web.HttpUtility.UrlEncode(token, System.Text.Encoding.UTF8);
In an ASP.NET action method, I receive this token and decode it:
var token = System.Web.HttpUtility.UrlDecode(encodedToken, System.Text.Encoding.UTF8);
but these tokens are not the same. For example the ab+cd string would encode to ab%2bcd and decoding the result would give me the ab cd string (the plus character changed to whitespace).
So far I have only noticed the + character problem, there may be others.
How can I solve this issue?
In your context, it appears that you don't need to call UrlDecode at all: %2b decodes to +, and + in turn decodes to a space, so you have decoded twice.
Given that the framework appears to have already decoded the value for you, you can remove your call to UrlDecode.
According to the Microsoft documentation:
You can encode a URL using with the UrlEncode method or the UrlPathEncode method. However, the methods return different results. The UrlEncode method converts each space character to a plus character (+). The UrlPathEncode method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=netframework-4.7.2
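The double decoding described in this answer can be reproduced outside ASP.NET. The following is a minimal sketch, assuming the framework has already decoded the value once before the action method sees it; the class name TokenRoundTrip is hypothetical:

```csharp
using System;
using System.Text;
using System.Web;

public static class TokenRoundTrip
{
    // Encodes the token the same way the question does.
    public static string Encode(string token)
    {
        return HttpUtility.UrlEncode(token, Encoding.UTF8);
    }

    // A single decode step, as performed by the framework or by user code.
    public static string DecodeOnce(string value)
    {
        return HttpUtility.UrlDecode(value, Encoding.UTF8);
    }
}

public static class Program
{
    public static void Main()
    {
        string encoded = TokenRoundTrip.Encode("ab+cd");
        Console.WriteLine(encoded);                               // ab%2bcd

        // First decode (what the framework already did): the token is restored.
        string once = TokenRoundTrip.DecodeOnce(encoded);
        Console.WriteLine(once);                                  // ab+cd

        // Second decode (the extra UrlDecode call): '+' becomes a space.
        Console.WriteLine(TokenRoundTrip.DecodeOnce(once));       // ab cd
    }
}
```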

Replacing illegal or undesired characters in requested URL before processing it .net

My site throws an exception every time a special kind of character is included in the request, or when the size of the URL exceeds a certain length.
How can I control the URL and transform it before processing it (For example : if the request was http://xwz.com/"ert I want to turn it into http://xwz.com/ert). Something like that.
I am using .net and c#
Use this: HttpServerUtility.UrlEncode Method (String)
You can use it like this:
System.Web.HttpUtility.UrlEncode("test t");
You will need this library: UrlEncode uses System.Web.HttpUtility.UrlEncode to encode strings.
Looking for HttpUtility.UrlEncode
The UrlEncode(String) method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream without encoding, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
The code below removes any invalid characters from your URL by replacing them with an empty string:
url = System.Text.RegularExpressions.Regex.Replace(url, @"[^!#$&-;=?-\[\]_a-z~]", "");
I think this is what you're looking for:
System.Web.HttpUtility.UrlEncode(string url)

Replacing URL encoded data with their symbols

Hi, I am trying to format my URL so that it looks more user friendly. So far I have managed to replace spaces with "-", but it seems that special characters like # and , are displayed as encoded data. This is what I mean:
http://localhost:51208/Home/Details/C%23-in-Depth%2c-Second-Edition/BookId-3
The "#" symbol is displayed as %23 and the "," is displayed as %2c. I would like to be able to replace this encoding with the original symbols.
Does such a way exist?
Oh no, you totally don't want to replace it with #. This symbol has a special meaning in an url. It represents the fragment identifier and its value is never sent to the server. This basically means that if there's a # symbol in your url, everything that follows it gets truncated and never sent to the server. You may take a look at the following post to see what StackOverflow uses to format the slug in the question title. You could run your string through this replace function in order to make sure that no dangerous characters are left.
I would also recommend reading the following blog post from Scott Hanselman, where he covers the various scenarios you might encounter with IIS if you attempt to send special characters in the path portion of your URL. I am quoting his conclusion here:
After ALL this effort to get crazy stuff in the Request Path, it's
worth mentioning that simply keeping the values as a part of the Query
String (remember WAY back at the beginning of this post?) is easier,
cleaner, more flexible, and more secure.
Just replace "#" with "sharp" and ":" with "-", you cannot just put those special characters in the url
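As a rough sketch of the replacement approach suggested above (this is not the StackOverflow slug function the earlier answer refers to), a hypothetical ToSlug helper could substitute "sharp" for "#" and then collapse every remaining run of non-alphanumeric characters into a single hyphen:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Hypothetical helper: builds a URL-friendly slug from a title.
    public static string ToSlug(string title)
    {
        // Give '#' a readable replacement before other symbols are stripped.
        string s = title.Replace("#", "sharp");

        // Collapse every run of characters other than ASCII letters and digits
        // (spaces, commas, colons, ...) into a single hyphen.
        s = Regex.Replace(s, "[^A-Za-z0-9]+", "-");

        // Avoid leading or trailing hyphens.
        return s.Trim('-');
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SlugHelper.ToSlug("C# in Depth, Second Edition"));
        // Csharp-in-Depth-Second-Edition
    }
}
```

Because the slug no longer contains reserved characters, nothing in it needs percent-encoding when it is placed in the path.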

UrlEncoding - which encoding should I use?

Using HttpUtility.UrlEncode and passing via the URL the receiving page sees the variables as:
brand new -> brand+new
Airconaire+Ltd -> Airconaire+Ltd
Can you see how the first and the second both have a + in them where they didn't at the start? I'm assuming this is something to do with the encoding (specifically RFC3986 or RFC2396) but how do I solve this?
I think ideally the spaces should be converted to %20 but is this the best way forward?
Try using HttpUtility.UrlPathEncode rather than UrlEncode.
The UrlEncode() method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e.
You can encode a URL using with the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
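For the two values from the question, a round trip that keeps a literal + distinguishable from a space might look like the sketch below. It swaps in Uri.EscapeDataString and Uri.UnescapeDataString (not the HttpUtility methods discussed above), which encode a space as %20 and + as %2B; the class name ValueRoundTrip is hypothetical:

```csharp
using System;

public static class ValueRoundTrip
{
    // Encodes a value for use in a URL: space -> %20, '+' -> %2B.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses Encode; a literal '+' is left untouched.
    public static string Decode(string value)
    {
        return Uri.UnescapeDataString(value);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string original in new[] { "brand new", "Airconaire+Ltd" })
        {
            string encoded = ValueRoundTrip.Encode(original);
            string decoded = ValueRoundTrip.Decode(encoded);
            Console.WriteLine(original + " -> " + encoded + " -> " + decoded);
        }
        // brand new -> brand%20new -> brand new
        // Airconaire+Ltd -> Airconaire%2BLtd -> Airconaire+Ltd
    }
}
```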

QueryString malformed after URLDecode

I'm trying to pass in a Base64 string into a C#.Net web application via the QueryString. When the string arrives the "+" (plus) sign is being replaced by a space. It appears that the automatic URLDecode process is doing this. I have no control over what is being passed via the QueryString. Is there any way to handle this server side?
Example:
http://localhost:3399/Base64.aspx?VLTrap=VkxUcmFwIHNldCB0byAiRkRTQT8+PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
Produces:
VkxUcmFwIHNldCB0byAiRkRTQT8 PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
People have suggested URLEncoding the querystring:
System.Web.HttpUtility.UrlEncode(yourString)
I can't do that as I have no control over the calling routine (which is working fine with other languages).
There was also the suggestion of replacing spaces with a plus sign:
Request.QueryString["VLTrap"].Replace(" ", "+");
I had thought of this, but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
My main goal is to intercept the QueryString before it is run through the decoder.
To this end I tried looking at Request.QueryString.ToString(), but this contained the same malformed information. Is there any way to look at the raw QueryString before it is URL-decoded?
After further testing, it appears that .NET expects everything coming in from the QueryString to be URL-encoded, but the browser does not automatically URL-encode GET requests.
The suggested solution:
Request.QueryString["VLTrap"].Replace(" ", "+");
Should work just fine. As for your concern:
I had thought of this, but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
This is easy to alleviate by reading about base64. The only non alphanumeric characters that are legal in modern base64 are "/", "+" and "=" (which is only used for padding).
Of those, "+" is the only one that has special meaning as an escaped representation in URLs. While the other two have special meaning in URLs (path delimiter and query string separator), they shouldn't pose a problem.
So I think you should be OK.
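The repair described above can be sketched as a small helper that restores the plus signs and then decodes the Base64 payload. The helper name DecodeMangledBase64 is hypothetical, and the sketch assumes the payload is UTF-8 text:

```csharp
using System;
using System.Text;

public static class Base64Repair
{
    // Hypothetical helper: undoes the '+' -> ' ' substitution made by the
    // automatic URL decoding, then decodes the Base64 payload.
    // This is safe because valid Base64 never contains spaces.
    public static byte[] DecodeMangledBase64(string value)
    {
        return Convert.FromBase64String(value.Replace(' ', '+'));
    }
}

public static class Program
{
    public static void Main()
    {
        // The value as it arrives after automatic decoding (note the space).
        string mangled = "VkxUcmFwIHNldCB0byAiRkRTQT8 PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==";

        byte[] bytes = Base64Repair.DecodeMangledBase64(mangled);

        // Assumption: the payload is UTF-8 text.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```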
You could manually replace the value (argument.Replace(' ', '+')) or consult the HttpRequest.ServerVariables["QUERY_STRING"] (even better the HttpRequest.Url.Query) and parse it yourself.
You should however try to solve the problem where the URL is given; a plus sign needs to get encoded as "%2B" in the URL because a plus otherwise represents a space.
If you don't control the inbound URLs, the first option would be preferred as you avoid the most errors this way.
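Parsing the raw query string yourself could look like the following sketch. It takes the raw text (for example the value of HttpRequest.Url.Query mentioned above) and decodes each part with Uri.UnescapeDataString, which leaves a literal + untouched. The class name RawQueryParser is hypothetical, and duplicate keys simply overwrite each other here:

```csharp
using System;
using System.Collections.Generic;

public static class RawQueryParser
{
    // Hypothetical helper: splits a raw query string into key/value pairs
    // without treating '+' as an encoded space.
    public static Dictionary<string, string> Parse(string rawQuery)
    {
        var result = new Dictionary<string, string>();
        string trimmed = rawQuery.TrimStart('?');

        foreach (string pair in trimmed.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first '=' only, so Base64 padding ("==") stays in the value.
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);

            result[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
        }

        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string raw = "?VLTrap=VkxUcmFwIHNldCB0byAiRkRTQT8+PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==";
        Console.WriteLine(RawQueryParser.Parse(raw)["VLTrap"]);
        // VkxUcmFwIHNldCB0byAiRkRTQT8+PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
    }
}
```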
I'm having this exact same issue except I have control over my URL. Even with Server.URLDecode and Server.URLEncode it doesn't convert it back to a + sign, even though my query string looks as follows:
http://localhost/childapp/default.aspx?TokenID=0XU%2fKUTLau%2bnSWR7%2b5Z7DbZrhKZMyeqStyTPonw1OdI%3d
When I perform the following.
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
it still does not convert the %2b back into a + sign. Instead I have to do the following:
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
tokenID = tokenID.Replace(" ", "+");
Then it works correctly. Really odd.
I had a similar problem with a parameter that contains a Base64 value whenever it includes a '+'.
Only Request.QueryString["VLTrap"].Replace(" ", "+"); worked for me.
Neither UrlEncode nor any other encoding helped, because even if you render the link on the page yourself with '+' encoded as '%2b', the browser first changes it back to '+' when the link is shown, and then changes it to a space when you click it. So, as the original poster says, there is no way to control this even if you generate the links yourself. The same thing happens with such links in HTML emails.
If you use System.Uri.UnescapeDataString(yourString), it will ignore the +. This method should only be used in cases like yours, where the string was encoded using some sort of legacy approach on either the client or the server.
See this blog post:
http://blogs.msdn.com/b/yangxind/archive/2006/11/09/don-t-use-net-system-uri-unescapedatastring-in-url-decoding.aspx
If you URLEncode the string before adding it to the URL you will not have any of those problems (the automatic URLDecode will return it to the original state).
Well, obviously you should have the Base64 string URLEncoded before sending it to the server.
If you cannot accomplish that, I would suggest simply replacing any embedded spaces with +; since Base64 strings are not supposed to contain spaces, it's a legitimate tactic...
System.Web.HttpUtility.UrlEncode(yourString) will do the trick.
As a quick hack you could replace space with plus character before base64-decoding.
I am by no means a C# developer but it looks like you need to url ENCODE your Base64 string before sending it as a url.
Can't you just assume a space is a + and replace it?
Request.QueryString["VLTrap"].Replace(" ", "+");
;)
