Preserve an escaped Uri with HttpClient - c#

I'm trying to use HttpClient to create a GET request with the following Uri:
http://test.com?action=enterorder&ordersource=acme&resid=urn%3Auuid%3A0c5eea50-9116-414e-8628-14b89849808d
As you can see, the resid parameter is escaped with %3A, i.e. the ":" character.
When I use this Uri in the HttpClient request, the URL becomes:
http://test.com?action=enterorder&ordersource=acme&resid=urn:uuid:0c5eea50-9116-414e-8628-14b89849808d and I receive an error from the server, because %3A is expected.
Does anyone have a clue how to preserve the escaped Uri when sending the request? It seems HttpClient always unescapes the characters in the string before sending it.
Here is the code used:
Uri uri = new Uri("http://test.com?action=enterorder&ordersource=acme&resid=urn%3Auuid%3A0c5eea50-9116-414e-8628-14b89849808d");
using (HttpClient client = new HttpClient())
{
    var resp = client.GetAsync(uri);
    if (resp.Result.IsSuccessStatusCode)
    {
        var responseContent = resp.Result.Content;
        string content = responseContent.ReadAsStringAsync().Result;
    }
}

You may want to test with .NET 4.5, as a number of improvements were made there to how Uri parses escaped characters.
You can also check out the SO question "GETting a URL with an url-encoded slash", which has a hack you can use to force the Uri to be left untouched.
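A minimal sketch, assuming .NET 4.5 or later: escape only the parameter value with Uri.EscapeDataString and check Uri.AbsoluteUri, which on these runtimes should keep %3A intact. Uri.ToString() returns an unescaped display form, so printing the Uri can make it look decoded even when the escaped form is still there. EscapedQuery is a hypothetical helper; the base URL and parameters come from the question.

```csharp
using System;

public static class EscapedQuery
{
    // Builds the request URI from the raw resid value, escaping it exactly once.
    // Hypothetical helper; host and parameter names are taken from the question.
    public static Uri Build(string resid)
    {
        string url = "http://test.com/?action=enterorder&ordersource=acme&resid="
                     + Uri.EscapeDataString(resid); // ':' becomes %3A
        return new Uri(url);
    }
}
```

Usage: pass the Uri object itself (not its ToString()) to the client, e.g. `await client.GetAsync(EscapedQuery.Build("urn:uuid:0c5eea50-9116-414e-8628-14b89849808d"))`.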

As a workaround, you could try encoding this part of the URL a second time to get around the issue: %3A would become %253A.
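The double-encoding workaround can be sketched as below (DoubleEscape is a hypothetical helper). It only helps on a runtime that really unescapes %3A once before sending; on a runtime that preserves escapes, the server would receive %253A and decode it to the literal text %3A.

```csharp
using System;

public static class DoubleEscape
{
    // Escapes the raw value twice: ':' -> "%3A" -> "%253A" (the '%' itself becomes %25).
    public static string Twice(string rawValue) =>
        Uri.EscapeDataString(Uri.EscapeDataString(rawValue));
}
```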

Related

WebRequest returns unreadable string [duplicate]

I'm trying to download an HTML document from Amazon, but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
    var url = "https://www.amazon.com/dp/B07H256MBK/";
    webClient.Encoding = Encoding.UTF8;
    var result = webClient.DownloadString(url);
}
Same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and converting it back to UTF-8, but I still get the same result. Also note that this does NOT always happen. For example, yesterday I ran this code for about 2 hours and got a correctly encoded HTML document, but today I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
However, when I use HtmlAgilityPack's wrapper, it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to return a badly encoded string even when I explicitly set the correct encoding? And how does HtmlAgilityPack's wrapper work by default?
Thanks for any help!
I fired up Firefox's web dev tools, requested that page, and looked at the response headers. They include content-encoding: gzip, which means the response is gzip-compressed.
It turns out that Amazon gives you a response compressed with gzip even when you don't send an Accept-Encoding: gzip header (verified with another tool). This is a bit naughty, but not that uncommon, and easy to work around.
This wasn't a problem with character encodings at all. HttpClient is good at figuring out the correct encoding from the Content-Type header.
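To see why no Encoding setting can fix this, here is an offline sketch (GzipDemo and its methods are hypothetical names) that gzip-compresses some HTML, as the server does, and then either decodes the compressed bytes directly or decompresses them first, which is the step AutomaticDecompression performs for you.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Simulates a gzip-compressed response body for the given HTML.
    public static byte[] Compress(string html)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(html);
            gzip.Write(bytes, 0, bytes.Length);
        } // disposing the GZipStream writes the gzip footer
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    // Without decompression: the compressed bytes are decoded as text, giving garbage.
    public static string DecodeWithoutDecompressing(byte[] body) =>
        Encoding.UTF8.GetString(body);

    // With decompression: inflate first, then decode the UTF-8 text.
    public static string DecompressAndDecode(byte[] body)
    {
        using (var input = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```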
You can tell HttpClient to un-zip responses with:
HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.GZip,
};
using (var client = new HttpClient(handler))
{
    // your code
}
This will be set automatically if you're using the NuGet package versions 4.1.0 to 4.3.2; otherwise, you'll need to do it yourself.
You can do the same with WebClient, but it's harder.
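One common way to do it with WebClient is to subclass it and set AutomaticDecompression on the HttpWebRequest it creates internally. This is a sketch: DecompressingWebClient and CreateRequestForInspection are hypothetical names, and WebClient is marked obsolete in .NET 6 and later, so HttpClient is the better choice for new code.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass: WebClient itself has no AutomaticDecompression
// property, but the HttpWebRequest it creates for each call does.
public class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
        {
            // Inflate gzip/deflate response bodies before WebClient decodes them as text.
            http.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }

    // Exposed only so the configuration can be inspected without sending a request.
    public WebRequest CreateRequestForInspection(Uri address) => GetWebRequest(address);
}
```

Use it like a normal WebClient: `using (var wc = new DecompressingWebClient()) { string html = wc.DownloadString(url); }`.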

HttpClient GetAsync with a hash in URL

.NET Core 2.2 console application on Windows.
I'm exploring how to use HttpClient GetAsync on a Stack Overflow share-style URL, e.g. https://stackoverflow.com/a/29809054/26086, which returns a 302 redirect to a URL with a hash in it:
static async Task Main()
{
    var client = new HttpClient();
    // 1. Doesn't work - has a hash in URL
    var url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054#29809054";
    HttpResponseMessage rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request
    // 2. Does work - no hash
    url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 200 Okay
    // 3. Doesn't work as the 302 redirect goes to the first URL above with a hash
    url = "https://stackoverflow.com/a/29809054/26086";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request
}
I'm crawling my blog which has many SO short codes in it.
Update/Workaround
With thanks to @rohancragg, I found that turning off AllowAutoRedirect and then getting the URI from the returned Location header worked:
// as some autoredirects fail due to #fragments in url, handle redirects manually
var handler = new HttpClientHandler { AllowAutoRedirect = false };
var client = new HttpClient(handler);
var url = "https://stackoverflow.com/a/29809054/26086";
HttpResponseMessage rm = await client.GetAsync(url);
// gives the desired new URL which can then GetAsync
Uri u = rm.Headers.Location;
As @Damien_The_Unbeliever implies in a comment, you'll just need to strip off the hash and everything after it; all it does is tell the browser to jump to that anchor in the HTML page (see: https://w3schools.com/jsref/prop_anchor_hash.asp).
You could also use the Uri class to parse the URI and ignore any fragment: https://learn.microsoft.com/en-us/dotnet/api/system.uri.fragment
Because the share-style URLs only ever return a 302, I'd suggest capturing the URI the 302 points to and, as suggested above, taking just the path and ignoring the fragment.
So you need some mechanism (which I'm just looking up!) to handle the 302 gracefully, followed by option 2.
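A minimal sketch of the "take the Location, drop the fragment" step, assuming the redirect response was obtained with AllowAutoRedirect = false as in the question's workaround. RedirectTarget is a hypothetical helper: it resolves a relative Location against the original request URL and uses Uri.GetLeftPart(UriPartial.Query) to discard the #fragment.

```csharp
using System;

public static class RedirectTarget
{
    // requestUri: the URL passed to GetAsync; location: rm.Headers.Location (may be relative).
    public static Uri Resolve(Uri requestUri, Uri location)
    {
        Uri absolute = location.IsAbsoluteUri ? location : new Uri(requestUri, location);
        // Keep scheme, host, path and query; drop the #fragment.
        return new Uri(absolute.GetLeftPart(UriPartial.Query));
    }
}
```

Usage: check that `rm.Headers.Location` is not null, then `rm = await client.GetAsync(RedirectTarget.Resolve(new Uri(url), rm.Headers.Location));`.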
Update: this looks relevant! How can I get System.Net.Http.HttpClient to not follow 302 redirects?
Update 2: Steve Guidi has a very important bit of advice in a comment here: https://stackoverflow.com/a/17758758/5351
In response to the advice that you need to use HttpResponseMessage.RequestMessage.RequestUri:
it is very important to add HttpCompletionOption.ResponseHeadersRead
as the second parameter of the GetAsync() call
Disclaimer - I've not tried the above, this is just based on reading ;-)
Maybe you need to encode your URL before sending the request, using the HttpUtility class; that way any special character will be escaped.
using System.Web;
var url = $"https://myurl.com/{HttpUtility.UrlEncode("#1234567")}";
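If the "#" really is part of the data (rather than a fragment you want to drop), the effect of escaping it can be checked offline. This sketch uses Uri.EscapeDataString instead of HttpUtility.UrlEncode so it needs no System.Web reference; both turn "#" into %23. HashEscaping is a hypothetical helper, and https://myurl.com is the answer's placeholder host.

```csharp
using System;

public static class HashEscaping
{
    // Appends the segment escaped, so '#' is sent as %23 instead of starting a fragment.
    public static Uri BuildUri(string baseUrl, string segment) =>
        new Uri(baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(segment));
}
```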

Is there any issue or risk of data loss if I send URL-encoded data with UTF-8 encoding inside WebClient()?

I am working on an ASP.NET MVC 4 web application that integrates with a third-party API. The third-party API expects the JSON object in URL-encoded format, and it allows non-ASCII characters to be passed.
So, to be able to send URL-encoded JSON data containing non-ASCII characters, I ended up with the following WebClient code:
using (WebClient wc = new WebClient())
{
    var data = JsonConvert.SerializeObject(mainresourceinfo);
    string url = currentURL + "resources?AUTHTOKEN=" + pmtoken;
    Uri uri = new Uri(url);
    wc.Encoding = System.Text.Encoding.UTF8;
    wc.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
    crudoutput = wc.UploadString(uri, "INPUT_DATA=" + HttpUtility.UrlEncode(data));
}
As shown above, I have set:
wc.Encoding = System.Text.Encoding.UTF8;
to allow passing non-ASCII characters.
Also, since the third-party API expects the JSON in URL-encoded format, I call HttpUtility.UrlEncode(data) inside UploadString and set wc.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
First question: is combining all this encoding and URL encoding a valid approach?
Second question: I thought that setting
wc.Encoding = System.Text.Encoding.UTF8;
would ONLY affect how the data is uploaded, mainly to allow non-ASCII characters. But it seems it also affects how the output of the UploadString call is handled. In my case, if I remove wc.Encoding = System.Text.Encoding.UTF8; and the API replies "The resource named ££123 has been added successfully", then the crudoutput variable holds a value such as "The resource named ÂÂ123 has been added successfully", i.e. the ££ comes back as ÂÂ. So does wc.Encoding = System.Text.Encoding.UTF8; affect both how the data is uploaded and how the output is handled? Is this correct?
So can anyone advise on my two questions above?
Thanks
Your approach is valid as long as your pmtoken doesn't contain any characters such as & or +, which have special meaning in a URL. If it does, I'd recommend using UrlEncode() to encode the pmtoken. As for your second question: yes, it affects both the request and the response. Also make sure your data is not extremely long, because browsers may limit URL length and that may cause issues. AFAIK, Microsoft has a 2K character limit on URLs.
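A sketch of the answer's first point, plus a check on the second. FormPayload is a hypothetical helper; currentURL, pmtoken and INPUT_DATA come from the question, and Uri.EscapeDataString is swapped in for HttpUtility.UrlEncode (it uses uppercase hex and %20 rather than + for spaces). Escaping the token keeps & or + from splitting the query string, and because percent-encoding turns every non-ASCII character into UTF-8 escapes, the body itself is plain ASCII. The last method reproduces the response-side symptom, assuming the fallback encoding is a single-byte Western code page such as ISO-8859-1; under that assumption each £ comes back as Â£.

```csharp
using System;
using System.Text;

public static class FormPayload
{
    // Request URL with the token escaped, so '&' or '+' inside it cannot break the query string.
    public static string BuildUrl(string currentUrl, string pmtoken) =>
        currentUrl + "resources?AUTHTOKEN=" + Uri.EscapeDataString(pmtoken);

    // Form body: percent-encoding turns non-ASCII characters into UTF-8 escapes,
    // so the resulting string contains only ASCII characters.
    public static string BuildBody(string json) =>
        "INPUT_DATA=" + Uri.EscapeDataString(json);

    // Response side: UTF-8 bytes from the server decoded with a single-byte encoding.
    public static string DecodeUtf8AsLatin1(string serverText)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(serverText);          // '£' is 0xC2 0xA3
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8); // 0xC2 -> 'Â', 0xA3 -> '£'
    }
}
```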

Box API v2: creating a folder with Cyrillic letters in its name

I'm trying to create a folder using the new API.
If the folder name contains Cyrillic letters, I receive HTTP 400 Bad Request.
However, it works fine with Latin letters.
Is this a known issue?
I found the correct answer here: Detecting the character encoding of an HTTP POST request
The default encoding of an HTTP POST is ISO-8859-1.
The only thing I needed was to set the encoding of the request manually.
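The effect of that default can be reproduced offline. This sketch (CyrillicDemo is a hypothetical name) round-trips a Cyrillic folder name through ISO-8859-1, which has no Cyrillic letters, so each one is replaced with "?", and through UTF-8, which preserves it.

```csharp
using System.Text;

public static class CyrillicDemo
{
    // Encodes text to bytes and decodes it again with the same encoding,
    // as happens when a request body is written in one encoding and read back.
    // Characters the encoding cannot represent are replaced with '?'.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));

    public static Encoding Latin1 => Encoding.GetEncoding("ISO-8859-1");
}
```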
By the way, here is working code:
public static Task<string> Post(string url, string data, string authToken) {
    // Send the body as UTF-8 instead of WebClient's default encoding
    var client = new WebClient { Encoding = Encoding.UTF8 };
    client.Headers.Add("Content-Type:application/x-www-form-urlencoded");
    client.Headers.Add(AuthHeader(authToken)); // AuthHeader: the author's own helper, not shown
    return client.UploadStringTaskAsync(new Uri(url), "POST", data);
}
Usually, complications involving international characters in Box API calls just need minor adjustments to how the requests are encoded. I'm guessing you'll just have to URL-encode the target folder name.
If that doesn't do the trick, we may be able to help more if you send a sample request or code snippet. If you do, keep the API key and auth token to yourself.
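If you do URL-encode the name, a sketch with Uri.EscapeDataString (a swap-in for whichever urlencode function you use; FolderNameEncoding is a hypothetical helper) shows that each Cyrillic letter becomes a UTF-8 percent-escape, so the value is plain ASCII on the wire.

```csharp
using System;

public static class FolderNameEncoding
{
    // Percent-encodes the folder name as UTF-8; 'П' (U+041F) becomes %D0%9F.
    public static string Encode(string folderName) => Uri.EscapeDataString(folderName);
}
```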

URI formatting URL

All
I have the following lines of code (.NET 3.5):
string URL = "http://api.linkedin.com/v1/people/url=http%3a%2f%2fuk.linkedin.com%2fpub%2fjulian-welby-everard%2f0%2fb97%2f416";
UriBuilder uri = new UriBuilder(URL);
This returns a URL in the Uri object of http://api.linkedin.com/v1/people/url=http://uk.linkedin.com/pub/julian-welby-everard/0/b97/416, which has been decoded. I do not want this to happen,
so I tried encoding the data twice, giving:
string URL = "http://api.linkedin.com/v1/people/url=http%253a%252f%252fuk.linkedin.com%252fpub%252fjulian-welby-everard%252f0%252fb97%252f416";
UriBuilder uri = new UriBuilder(URL);
This now returns the URL as follows: http://api.linkedin.com/v1/people/url=http%253a%252f%252fuk.linkedin.com%252fpub%252fjulian-welby-everard%252f0%252fb97%252f416. Note that it has not decoded anything this time. I was hoping it would decode just like the first attempt, but because the data had been double-encoded, it would return the string in the correct format.
So the question is: how can I stop the Uri object from decoding the supplied URL, so I can pass the correct data across to the HttpWebRequest?
Julian
I believe you are looking for HttpUtility.UrlEncode("http://www.google.com/") which returns http%3a%2f%2fwww.google.com%2f.
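For comparison, Uri.EscapeDataString produces the same escaping with uppercase hex digits. Below is a sketch of building the LinkedIn URL from the raw profile URL; LinkedInUrl is a hypothetical helper. Whether the .NET 3.5 Uri class then leaves %2F alone is a separate matter, since older runtimes may still unescape it when the string is turned into a Uri.

```csharp
using System;

public static class LinkedInUrl
{
    // Escapes the whole profile URL as a single value, so ':' and '/' become %3A and %2F.
    public static string Build(string profileUrl) =>
        "http://api.linkedin.com/v1/people/url=" + Uri.EscapeDataString(profileUrl);
}
```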
