I'm trying to download an HTML document from Amazon, but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
    var url = "https://www.amazon.com/dp/B07H256MBK/";
    webClient.Encoding = Encoding.UTF8;
    var result = webClient.DownloadString(url);
}
Same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and then converting it to UTF-8, but I still get the same result. Also note that this DOES NOT always happen. For example, yesterday I ran this code for ~2 hours and got a correctly encoded HTML document. Today, however, I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
==================================================================
However, when I use HtmlAgilityPack's wrapper, it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to return a badly encoded string even when I explicitly set the correct encoding? And why does HtmlAgilityPack's wrapper work by default?
Thanks for any help!
I fired up Firefox's web dev tools, requested that page, and looked at the response headers:
See that content-encoding: gzip? That means the response is gzip-encoded.
It turns out that Amazon gives you a response compressed with gzip even when you don't send an Accept-Encoding: gzip header (verified with another tool). This is a bit naughty, but not that uncommon, and easy to work around.
This wasn't a problem with character encodings at all. HttpClient is good at figuring out the correct encoding from the Content-Type header.
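To see why this looks like an encoding problem but isn't one, here is a small offline sketch (no network involved; the HTML string is made up for illustration). It gzips a string, decodes the compressed bytes directly as UTF-8, which yields garbage much like the string in the question, and then decompresses first to recover the text:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Made-up stand-in for the page body (an assumption, not Amazon's actual HTML).
string html = "<html><body>Hello</body></html>";

// Compress the text the way a server does when it sends Content-Encoding: gzip.
byte[] compressed;
using (var buffer = new MemoryStream())
{
    using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
    {
        byte[] raw = Encoding.UTF8.GetBytes(html);
        gzip.Write(raw, 0, raw.Length);
    }
    // MemoryStream.ToArray still works after the stream has been closed.
    compressed = buffer.ToArray();
}

// Decoding the compressed bytes as UTF-8 does not give the original text back.
string garbled = Encoding.UTF8.GetString(compressed);

// Decompressing first, then decoding, recovers the original text.
string restored;
using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
using (var reader = new StreamReader(gzip, Encoding.UTF8))
{
    restored = reader.ReadToEnd();
}

Console.WriteLine(garbled == html);
Console.WriteLine(restored == html);
```

No choice of Encoding can repair the first result, because the bytes are compressed data rather than text.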
You can tell HttpClient to un-zip responses with:
HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.GZip,
};
using (var client = new HttpClient(handler))
{
    // your code
}
This will be set automatically if you're using the NuGet package versions 4.1.0 to 4.3.2, otherwise you'll need to do it yourself.
You can do the same with WebClient, but it's harder.
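One common way to do it with WebClient is to subclass it and enable decompression on the underlying HttpWebRequest. This is a minimal sketch; the class name GzipWebClient is hypothetical, and the commented-out URL is the one from the question:

```csharp
using System;
using System.Net;

using (var client = new GzipWebClient())
{
    // your code, e.g.:
    // var html = client.DownloadString("https://www.amazon.com/dp/B07H256MBK/");
}

// Hypothetical helper: a WebClient whose HTTP requests decompress responses automatically.
class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

The extra class is why this is more work than with HttpClient: WebClient has no decompression property of its own, so the setting has to be applied to each request it creates.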
I want to post one or more values to a PHP file on a Strato-hosted server (on an HTTPS domain) from a C# WPF desktop application. However, after several attempts with a test program, nothing seems to work: the test value is not posted to the server, and the $_POST array is empty. I do get an echoed answer from the server, but it's always 'Error' (see script below).
I tried several techniques:
WebClient / HttpClient
HttpWebRequest
Adjusting SecurityProtocolType
sending value as byte[]
and what not.
$_GET, by the way, works perfectly fine: I always get back the computed test value from the PHP script. However, I would like to use a POST request since I am sending user data to the server.
More precisely, I already have tried these solutions (and several similar ones):
How to make HTTP POST web request (basically all of them except 3rd Party)
C# HttpClient not sending POST variables
https://holger.coders-online.net/118
http://www.howtosolvenow.com/how-to-send-https-post-request-with-c/
Testing PHP Script:
$newnumber = 'Error';
if (isset($_POST['number']))
{
    $newnumber = $_POST['number'] + 1;
}
echo $newnumber;
// keeps on returning 'Error'
Latest attempt of C# code:
string resultString = null;
string _url = "https://myURL.com/test.php";
//ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls11;
HttpClient client = new HttpClient();
var content = new FormUrlEncodedContent(new[] {
    new KeyValuePair<string, string>("number", "2"),
});
var response = await client.PostAsync(_url, content);
resultString = await response.Content.ReadAsStringAsync();
this._txt.Text = resultString;
I'm trying to upload a crash log manually to HockeyApp using the public API. When I call the API endpoint from Postman and upload a crash.log file, it works fine, but when I try to do the same from C# code, I get a 404 error.
Here is my code:
string log = ""; // log content
using (HttpClient client = new HttpClient())
{
    client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("*/*"));
    var content = new MultipartFormDataContent();
    var stringContent = new StringContent(log);
    stringContent.Headers.ContentType = System.Net.Http.Headers.MediaTypeHeaderValue.Parse("text/plain");
    content.Add(stringContent, "log", "crash.log");
    var response = await this.client.PostAsync("https://rink.hockeyapp.net/api/2/apps/[APP_ID]/crashes/upload", content);
}
I used Wireshark to analyse the request that Postman sends and tried to make mine look exactly the same. The only difference I see is that the request from the C# code has a filename* field in the Content-Disposition header for the attachment, while the one from Postman doesn't:
Content-Disposition: form-data; name="log"; filename="crash.log"; filename*=utf-8''%22crash.log%22
It might be worth mentioning that the code is written in a portable library in a Xamarin project.
Following @Lukas Spieß's suggestion, I asked the question on HockeyApp support. Apparently they don't handle quotes around the boundary parameter of the Content-Type header. That was the one thing I missed when comparing the Postman request with mine.
Here is the solution:
var contentTypeString = content.Headers.ContentType.ToString().Replace("\"", "");
content.Headers.Remove("Content-Type");
content.Headers.TryAddWithoutValidation("Content-Type", contentTypeString);
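The effect of these three lines can be checked offline. The following is a minimal sketch, not the original upload code: the boundary "abc123" and the log text are arbitrary values chosen for illustration. It prints the Content-Type header before and after the quotes are stripped:

```csharp
using System;
using System.Linq;
using System.Net.Http;

// "abc123" is an arbitrary boundary picked for this illustration.
var content = new MultipartFormDataContent("abc123");
content.Add(new StringContent("log text"), "log", "crash.log");

// HttpClient wraps the boundary parameter in double quotes by default.
string before = content.Headers.ContentType.ToString();

// Same steps as the solution above: strip the quotes and re-add the header unvalidated.
string unquoted = before.Replace("\"", "");
content.Headers.Remove("Content-Type");
content.Headers.TryAddWithoutValidation("Content-Type", unquoted);

string after = content.Headers.GetValues("Content-Type").Single();

Console.WriteLine(before);
Console.WriteLine(after);
```

Both forms are valid HTTP. The workaround is only needed for servers that, like the one described here, do not accept the quoted form.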
I'm familiar with WinForms and WPF, but new to web development. One day I saw WebClient.UploadValues and decided to try it.
static void Main(string[] args)
{
    using (var client = new WebClient())
    {
        var values = new NameValueCollection();
        values["thing1"] = "hello";
        values["thing2"] = "world";
        // A single file that contains plain HTML
        var response = client.UploadValues("D:\\page.html", values);
        var responseString = Encoding.Default.GetString(response);
        Console.WriteLine(responseString);
    }
    Console.ReadLine();
}
After running it, nothing was printed, and the HTML file's content became:
thing1=hello&thing2=world
Could anyone explain this? Thanks!
The UploadValues method is intended to be used with the HTTP protocol. This means that you need to host your HTML on a web server and make the request like this:
var response = client.UploadValues("http://some_server/page.html", values);
In this case the method will send the values to the server by using application/x-www-form-urlencoded encoding and it will return the response from the HTTP request.
I have never used UploadValues with a local file, and the documentation doesn't seem to mention anything about it; it only mentions the HTTP and FTP protocols. So I suppose this is a side effect of using it with a local file: it simply overwrites the contents of the file with the payload being sent.
You are not using WebClient as it was intended.
The purpose of WebClient.UploadValues is to upload the specified name/value collection to the resource identified by the specified URI.
That resource should not be a local file on your disk; it should be a web service that listens for requests and issues responses.
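To make this concrete, here is a minimal sketch that stands in for "some web service" with a local HttpListener which echoes the request body back. The port 5005 and the path "page" are arbitrary assumptions for this illustration:

```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

// Hypothetical local endpoint; the port is an arbitrary choice and must be free.
const string prefix = "http://localhost:5005/";

var listener = new HttpListener();
listener.Prefixes.Add(prefix);
listener.Start();

// Tiny stand-in for a web service: read the posted body and echo it back.
Task serverTask = Task.Run(async () =>
{
    HttpListenerContext context = await listener.GetContextAsync();
    string body;
    using (var reader = new StreamReader(context.Request.InputStream, Encoding.UTF8))
    {
        body = await reader.ReadToEndAsync();
    }
    byte[] reply = Encoding.UTF8.GetBytes("received: " + body);
    context.Response.ContentLength64 = reply.Length;
    await context.Response.OutputStream.WriteAsync(reply, 0, reply.Length);
    context.Response.Close();
});

string responseString;
using (var client = new WebClient())
{
    var values = new NameValueCollection();
    values["thing1"] = "hello";
    values["thing2"] = "world";

    // POSTs the values as application/x-www-form-urlencoded and returns the response body.
    byte[] response = client.UploadValues(prefix + "page", values);
    responseString = Encoding.UTF8.GetString(response);
}

await serverTask;
listener.Close();

Console.WriteLine(responseString);
```

With a server on the other end, the form-encoded payload travels in the request body and the method returns whatever the server answers, instead of overwriting a file.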
I am writing C# code for a WinRT Surface tablet in Visual Studio Express 2012 for Windows 8. Although my XML is correctly formatted (I am porting from apps on other platforms that work fine), I am apparently having trouble with the request syntax.
I've tried several different approaches and hit dead ends because of the limited set of methods available to Windows Store apps. The latest attempt uses HttpClient, HttpContent and HttpRequestMessage (omitting the actual XML and URLs, obviously):
string xmlSOAP = "..............[my soap xml].................";
string url = "http://example.domain.com/myMagicalwebservice.asmx";
string SOAPAction = "www.blahblah.com/doXMLStuff";
HttpClient hc = new HttpClient();
HttpContent content = new StringContent(xmlSOAP);
HttpRequestMessage req = new HttpRequestMessage(HttpMethod.Post, url);
req.Headers.Add("SOAPAction", SOAPAction);
req.Method = HttpMethod.Post;
req.Content = content;
req.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/soap+xml;charset=UTF-8");
hc.SendAsync(req).ContinueWith(responseTask =>
{
    System.Diagnostics.Debug.WriteLine(responseTask.Result);
});
This results in a System.FormatException of "The format of value 'application/soap+xml;charset=UTF-8' is invalid."
If I instead add the content type directly to the HttpContent instead of to HttpRequestMessage, I get the same outcome.
If I simply comment out the line adding the content type (just doing dumb trial and error here) I receive a result with statuscode 415: "Unsupported Media Type."
I have tried posting using the PostAsync method of HttpClient but I am unsure how to get the response using that.
Any help would be very much appreciated, and I thank you in advance for your time!
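The MediaTypeHeaderValue constructor accepts only a bare media type and rejects parameters such as charset. Use Parse for the full header value: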
req.Content.Headers.ContentType = MediaTypeHeaderValue.Parse("application/soap+xml;charset=UTF-8");
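The difference can be seen without sending a request. This is a minimal sketch showing the failing constructor call from the question, the Parse call, and two alternatives that set the charset separately; the "<soap/>" body is a placeholder, not real SOAP XML:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

const string headerValue = "application/soap+xml;charset=UTF-8";

// The constructor validates its argument as a media type only; per the question,
// a value that also carries parameters fails with a FormatException.
bool constructorThrew = false;
try
{
    new MediaTypeHeaderValue(headerValue);
}
catch (FormatException)
{
    constructorThrew = true;
}

// Parse accepts the full header value, parameters included.
MediaTypeHeaderValue parsed = MediaTypeHeaderValue.Parse(headerValue);

// Alternative 1: pass only the media type and set the charset as a property.
var built = new MediaTypeHeaderValue("application/soap+xml") { CharSet = "UTF-8" };

// Alternative 2: let StringContent build the header from an encoding and a media type.
var content = new StringContent("<soap/>", Encoding.UTF8, "application/soap+xml");

Console.WriteLine(constructorThrew);
Console.WriteLine(parsed.MediaType + " / " + parsed.CharSet);
Console.WriteLine(built);
Console.WriteLine(content.Headers.ContentType);
```

Any of the three non-throwing variants can be assigned to req.Content.Headers.ContentType.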
I'm trying to port a .NET application from Windows to Mono, but certain code that was working on Windows is no longer working (as expected) on mono:
WebClient client = new WebClient ();
Console.WriteLine (client.DownloadString("http://www.maxima.fm/51Chart/"));
It seems to detect the encoding correctly as UTF-8 (manually setting the encoding to UTF-8 or ASCII doesn't work either), but there are still '?' characters in the output.
You are writing to the console. Maybe your console is not configured properly to show certain characters. Make sure by debugging and storing the result into an intermediary variable.
Also, the site you gave as an example is completely messed up. The web server sends a Content-Type: text/html; charset=iso-8859-1 HTTP header, while the resulting HTML contains <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, which is of course completely incoherent. You cannot expect an HTTP client to behave correctly when confronted with a non-standard site; what you get is unexpected behavior.
Try testing on some web site that respects a minimum of web standards.
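The effect of decoding with the wrong charset can be reproduced offline. This is a minimal sketch with a made-up sample word, not the actual page content:

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// "café" encoded as ISO-8859-1: the é becomes the single byte 0xE9.
byte[] bytes = latin1.GetBytes("caf\u00E9");

// 0xE9 on its own is not a valid UTF-8 sequence, so the decoder
// substitutes U+FFFD, the replacement character.
string wrong = Encoding.UTF8.GetString(bytes);

// Decoding with the encoding the bytes were written in gives the text back.
string right = latin1.GetString(bytes);

Console.WriteLine(wrong == "caf\uFFFD");
Console.WriteLine(right == "caf\u00E9");
```

A console that cannot display U+FFFD typically shows it as '?', which matches the symptom in the question.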
Remark: WebClient implements IDisposable, so make sure you wrap it in a using statement.
UPDATE:
To make it work with this particular site you may try downloading the response manually and specifying the correct encoding:
// You may try different encodings here (for me it worked with iso-8859-1)
var encoding = Encoding.GetEncoding("iso-8859-1");
using (var client = new WebClient())
{
    using (var stream = client.OpenRead("http://www.maxima.fm/51Chart/"))
    using (var reader = new StreamReader(stream, encoding))
    {
        var result = reader.ReadToEnd();
        Console.WriteLine(result);
    }
}
using (var client = new WebClient())
{
    client.Encoding = Encoding.UTF8;
    Console.WriteLine(client.DownloadString("http://www.maxima.fm/51Chart/"));
}