How to encode a path that contains a hash? - c#

How do you properly encode a path that includes a hash (#) in it? Note the hash is not the fragment (bookmark?) indicator but part of the path name.
For example, if there is a path like this:
http://www.contoso.com/code/c#/somecode.cs
It causes problems when you for example try do this:
Uri myUri = new Uri("http://www.contoso.com/code/c#/somecode.cs");
It would seem that it interprets the hash as the fragment indicator.
It feels wrong to manually replace # with %23. Are there other characters that should be replaced?
There are some escaping methods in Uri and HttpUtility but none seem to do the trick.

There are a few characters you are not supposed to use. You can try to work your way through this very dry documentation, or refer to this handy URL summary on Stack Overflow.
If you check out this very website, you'll see that their C# questions are encoded %23.
Stack Overflow C# Questions
You can do this using either (for ASP.NET):
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
Server.UrlEncode("c#")
);
Or for class libraries / desktop:
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
HttpUtility.UrlEncode("c#")
);

Did some more digging friends and found a duplicate question for Java:
HTTP URL Address Encoding in Java
However, the .Net Uri class does not offer the constructor we need, but the UriBuilder does.
So, in order to construct a proper URI where the path contains illegal characters, do this:
// Build Uri by explicitly specifying the constituent parts. This way, the hash is not confused with fragment identifier
UriBuilder uriBuilder = new UriBuilder("http", "www.contoso.com", 80, "/code/c#/somecode.cs");
Debug.WriteLine(uriBuilder.Uri);
// This outputs: http://www.contoso.com/code/c%23/somecode.cs
Notice how it does not unnecessarily escape parts of the URI that does not need escaping (like the :// part) which is the case with HttpUtility.UrlEncode. It would seem that the purpose of this class is actually to encode the querystring/fragment part of the URL - not the scheme or hostname.

Use UrlEncode: System.Web.HttpUtility.UrlEncode(string)
class Program
{
static void Main(string[] args)
{
string url = "http://www.contoso.com/code/c#/somecode.cs";
string enc = HttpUtility.UrlEncode(url);
Console.WriteLine("Original: {0} ... Encoded {1}", url, enc);
Console.ReadLine();
}
}

Related

Correct way of manipulating the url coming from App.config

I have a URL ("http://localhost:2477/") on which I do get and post request. I have stored this URL in the app.config file of my project.
In the code, depending on the function, I add the string "getValue?id={0}" or "postValue" to this URL. But I later ran into an issue when I changed the URL to "http://localhost:2477" (no forward slash in the end) in the app.config.
Took me some embarrassing amount of time to figure out this issue, which made me wonder if there is a good way to handle this case.
Irrespective of the case when there is a forward slash or not in the URL, I want my code to change it to a proper URL.
Always use Path.Combine(string, string). This method will conform a valid path and should add the / if needed.
edit
I realized my answer does not work for URL, just for file paths.
What you’re looking for is Uri constructor instead.
Uri baseUri = new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri, "catalog/shownew.htm");
Using the Uri class you can modify your URL more elegantly. You can access the Host, Port, Query, etc. with ease. A similar question was asked here.
Try to use the UriBuilder, it's far more flexible as the Uri Constructor.
See https://stackoverflow.com/a/20164328/10574963

C# Percentage Decoding

I am trying to decode a percentage encoded string passed from a PHP script to my C# application. The PHP encrypts the data, so there are many special characters that I percentage encode.
Here's the string I'm passing in the URL:
%C9%90%04L%EFEA%D1U%AFi%CBc%3A%E5%D0%40Q%D6%1Bn%C9%C3%B5%0FT%FC%E5h%95m%EF%BF%24tB%A6%D1%08%3B%83%A1%CF%1B%99Zo%02
But it has trouble percentage decoding parts of it: when I fetch the query string, like so:
var queryString = HttpContext.Current.Request.QueryString;
var token = queryString["token"];
The variable token, though, equals this:
%C9%90%04L%EF%BF%BDEA%EF%BF%BDU%EF%BF%BDI%EF%BF%BDC%3A%EF%BF%BD%EF%BF%BD%40Q%EF%BF%BD%1BN%EF%BF%BD%C3%B5%0FT%EF%BF%BD%EF%BF%BDH%EF%BF%BDM%EF%BF%BD%24TB%EF%BF%BD%EF%BF%BD%08%3B%EF%BF%BD%EF%BF%BD%EF%BF%BD%1B%EF%BF%BDZO%02
This is definitely not what I put into the query string. When looking at it, the first time it messes up is %EF(starts 11th character into the original query string). Instead of %EF it shows: %EF%BF%BD. When I searched a little, I found this webpage which says the "Hex UTF-8 Bytes" are EF BF BD.
EDIT:
Forgot to mention, QueryString looks like this:
token=%c9%90%04L%ef%bf%bdEA%ef%bf%bdU%ef%bf%bdi%ef%bf%bdc%3a%ef%bf%bd%ef%bf%bd%40Q%ef%bf%bd%1bn%ef%bf%bd%c3%b5%0fT%ef%bf%bd%ef%bf%bdh%ef%bf%bdm%ef%bf%bd%24tB%ef%bf%bd%ef%bf%bd%08%3b%ef%bf%bd%ef%bf%bd%ef%bf%bd%1b%ef%bf%bdZo%02&oauth_token_secret=S%23%2bw%ef%bf%bd%ef%bf%bdX%17%ef%bf%bd0%ef%bf%bd%60%ef%bf%bd%ef%bf%bd%ef%bf%bd%ef%bf%bd*%ef%bf%bdi%08%ef%bf%bd%ef%bf%bd%ef%bf%bd%ef%bf%bd%07%ef%bf%bd%12RS07%ef%bf%bdgl%1e%ef%bf%bd%d7%832%d1%a1%ef%bf%bd%275%ef%bf%bdv%ef%bf%bd
You might be looking for the HttpServerUtility.UrlDecode method:
HttpContext.Current.Server.UrlDecode(HttpContext.Current.Request.QueryString["token"]);
Your error is somewhere else...
I created a new web page in a .Net 4.0 project, and put this in the Page_Load:
protected void Page_Load(object sender, EventArgs e)
{
var queryString = HttpContext.Current.Request.QueryString;
var token = queryString["token"];
throw new Exception(token);
}
Then I ran the page by going to this URL which matches the querystring you gave above:
http://localhost:27151/test.aspx?token=%c9%90%04L%ef%bf%bdEA%ef%bf%bdU%ef%bf%bdi%ef%bf%bdc%3a%ef%bf%bd%ef%bf%bd%40Q%ef%bf%bd%1bn%ef%bf%bd%c3%b5%0fT%ef%bf%bd%ef%bf%bdh%ef%bf%bdm%ef%bf%bd%24tB%ef%bf%bd%ef%bf%bd%08%3b%ef%bf%bd%ef%bf%bd%ef%bf%bd%1b%ef%bf%bdZo%02
The page decoded the token and displayed it in the exception message as this:
ɐL�EA�U�i�c:��#Q�n�õT��h�m�$tB��;����Zo
(The URL is encoding a binary string so when the actual string is printed, there are some characters that don't get displayed.)
If you run this and the token actually prints "%C9%90%04L..." then your token has probably been double encoded. All of the percent signs will be replaced with "%25" so your URL would look like this:
http://localhost:27151/test.aspx?token=%25c9%2590%2504L%25ef%25bf%25bdEA%25ef%25bf%25bdU%25ef%25bf%25bdi%25ef%25bf%25bdc%253a%25ef%25bf%25bd%25ef%25bf%25bd%2540Q%25ef%25bf%25bd%251bn%25ef%25bf%25bd%25c3%25b5%250fT%25ef%25bf%25bd%25ef%25bf%25bdh%25ef%25bf%25bdm%25ef%25bf%25bd%2524tB%25ef%25bf%25bd%25ef%25bf%25bd......
Since you didn't mention the "%25", the error is somewhere else in your code. The URL you think you are using is not the one being decoded.

QueryString Out of Encoded URL

I have an encoded URL.
http%3a%2f%myurl.test.me%2fSometjing%2fProduct%2fSearch%3fq=Tomato
I am trying to get query string out of the url which is "Tomato". I am using the following code but it returns null.
var parsedQuery = HttpUtility.ParseQueryString((url));
Console.Write(parsedQuery["q"]); // null
You're missing a few steps. You need to decode the URL, then pull out the query string, and then parse the query string:
string decoded =
HttpUtility.UrlDecode("http%3a%2f%2fmyurl.test.me%2fSometjing%2fProduct%2fSearch%3fq=Tomato");
var uri = new Uri(decoded);
var parsedQuery = HttpUtility.ParseQueryString(uri.Query);
Console.WriteLine (parsedQuery["q"]); // Tomato
Also, your encoded URL is a little malformed. The one in your post decoded looks like this:
http:/%myurl.test.me/Sometjing/Product/Search?q=Tomato
I think you just missed a 2f after the % right before myurl.test:
http%3a%2f%2fmyurl.test.me%2fSometjing%2fProduct%2fSearch%3fq=Tomato
The URL needs to decoded first before you can use the HttpUtility.ParseQueryString().
Fair warning though mentioned directly from MSDN.
The ParseQueryString method uses query strings that might contain user input, which is a potential security threat. By default, ASP.NET Web pages validate that user input does not include script or HTML elements. MSDN.

Box API v2 creating folder with cyrillic letters in it's name

I'm trying to create folder using new API.
If folder name contains cyrillic letters, I receive HTTP 400 Bad Request.
However it works fine with latin letters.
Is it known issue?
I found correct answer here: Detecting the character encoding of an HTTP POST request
the default encoding of a HTTP POST is ISO-8859-1.
The only thing I need is to manually set encoding of the request.
By the way, here is working code:
public static Task<string> Post(string url, string data, string authToken) {
var client = new WebClient { Encoding = Encoding.UTF8 };
client.Headers.Add("Content-Type:application/x-www-form-urlencoded");
client.Headers.Add(AuthHeader(authToken));
return client.UploadStringTaskAsync(new Uri(url), "POST", data);
}
Usually, complications involving international characters in Box API calls just need minor adjustments to the encoding of the requests. I'm guessing you'll just have to encode the target folder name with a urlencode.
If that doesn't do the trick, we may be able to help more if you send a sample request or code snippet. If you do, keep the api key and auth token to yourself.

Getting U+fffd/65533 instead of special character from Query String

I have a C# .net web project that have a globalization tag set to:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
When this URL a Flash application (you get the same problem when you enter the URL manually in a browser): c_product_search.aspx?search=kjøkken (alternatively: c_product_search-aspx?search=kj%F8kken
Both return the following character codes:
k U+006b 107
j U+006a 106
� U+fffd 65533
k U+006b 107
k U+006b 107
e U+0065 101
n U+006e 110
I don't know too much about character encoding, but it seems that the ø has been given a unicode replacement character, right?
I tried to change the globalization tag to:
<globalization requestEncoding="iso-8859-1" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
That made the request work. However, now, other searches on my page stopped working.
I also tried the following with similar results:
NameValueCollection qs = HttpUtility.ParseQueryString(Request.QueryString.ToString(), Encoding.GetEncoding("iso-8859-1"));
string search = (string)qs["search"];
What should I do?
Kind Regards,
nitech
The problem comes from the combination Firefox/Asp.Net. When you manually entered a URL in Firefox's address bar, if the url contains french or swedish characters, Firefox will encode the url with "ISO-8859-1" by default.
But when asp.net recieves such a url, it thinks that it's utf-8 encoded ... And encoded characters become "U+fffd". I couldn't find a way in asp.net to detect that the url is "ISO-8859-1". Request.Encoding is set to utf-8 ... :(
Several solutions exist :
put <globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"/> in your Web.config. But your may comme with other problems, and your application won't be standard anymore (it will not work with languages like japanese) ... And anyway, I prefer using UTF-8 !
go to about:config in Firefox and set the value of network.standard-url.encode-query-utf8 to true. It will now work for you (Firefox will encode all your url with utf-8). But not for anybody else ...
The least worst solution I could come with was to handle this with code. If the default decoding didn't work, we reparse QueryString with iso8859-1 :
string query = Request.QueryString["search"];
if (query.Contains("%ufffd"))
query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))["search"];
query = HttpUtility.UrlDecode(query);
It works with hyperlinks and manually-entered url, in french, english, or japanese. But I don't know how it will handle other encodings like ISO8859-5 (russian) ...
Does anyone have a better solution ?
This solves only the problem of manually-entered url. In your hyperlinks, don't forget to encode url parameters with HttpUtility.UrlEncode on the server, or encodeURIComponent on the javascript code. And use HttpUtility.UrlDecode to decode it.
public string GetEncodedQueryString(string key)
{
string query = Request.QueryString[key];
if (query != null)
if (query.Contains((char)0xfffd))
query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))[key];
return query;
}
i think your problem is in the flash, not the .net.
it sends the special character in a weird way.
try to urlencode the search string bevore you send it to the server.
If the app is expecting the URL-encoded request to be based on UTF-8, the character "ø" should be "%C3%B8", not "%F8". Whatever function you're using to escape/encode that request, you probably need to pass it the name of the underlying character encoding, "UTF-8".
It turns out that ActionScript 2.0 will send the URL encoded/escaped with UTF-8 while ActionScript 3.0 used ISO-8859-1. The way to solve this was to change the Request.Encoding value inside Global.asax if an encoding is specified in the URL:
void Application_BeginRequest(object sender, EventArgs e)
{
HttpContext ctx = HttpContext.Current;
// encoding specified?
if (!String.IsNullOrEmpty(Request["encoding"]))
{
ctx.Request.ContentEncoding = System.Text.Encoding.GetEncoding(ctx.Request["encoding"]);
}
}
Could it be done differently?
Regards,
nitech

Categories