Windows 8: Download string with encoding (WinRT) - c#

I use this code to download a string from the Internet:
public static async Task<string> DownloadPageAsync(string url)
{
    HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
    HttpClient client = new HttpClient(handler);
    client.MaxResponseContentBufferSize = 196608;
    HttpResponseMessage response = await client.GetAsync(url);
    response.EnsureSuccessStatusCode();
    string responseBody = await response.Content.ReadAsStringAsync();
    return responseBody;
}
but it only works for UTF-8 documents. Where do I set the encoding?

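Change ReadAsStringAsync to ReadAsBufferAsync and decode the result with the encoding you need:
// Requires: using System.Text; using Windows.Storage.Streams;
var buffer = await response.Content.ReadAsBufferAsync();
byte[] rawBytes = new byte[buffer.Length];
using (var reader = DataReader.FromBuffer(buffer))
{
    // Copy the whole buffer into the managed byte array
    reader.ReadBytes(rawBytes);
}
// Replace Encoding.UTF8 with the encoding the document actually uses
var res = Encoding.UTF8.GetString(rawBytes, 0, rawBytes.Length);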
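For comparison, here is a minimal sketch of the same idea using only System.Net.Http instead of the WinRT DataReader, assuming a runtime where HttpContent.ReadAsByteArrayAsync is available (as in .NET Core / .NET 5+). EncodedDownloader and FixedBodyHandler are hypothetical names; FixedBodyHandler is only a test double so the example can run without a network.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class EncodedDownloader
{
    // Downloads the raw bytes and decodes them with the caller-supplied encoding,
    // ignoring whatever charset (if any) the server declared.
    public static async Task<string> DownloadPageAsync(HttpClient client, string url, Encoding encoding)
    {
        using HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        byte[] rawBytes = await response.Content.ReadAsByteArrayAsync();
        return encoding.GetString(rawBytes, 0, rawBytes.Length);
    }
}

// Test double (hypothetical): answers every request with the same body.
public sealed class FixedBodyHandler : HttpMessageHandler
{
    private readonly byte[] _body;

    public FixedBodyHandler(byte[] body) => _body = body;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new ByteArrayContent(_body)
        });
}
```

In real code you would pass a normal HttpClient and, for example, Encoding.GetEncoding("iso-8859-1") for a Latin-1 page.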

In WinRT, HttpContent reads the encoding from its Headers property (the charset parameter of the Content-Type header). If the server's response doesn't set a charset in the Content-Type header, it looks for a byte order mark (BOM) at the start of the stream, and if there is no BOM it defaults to UTF-8.
If the server is not sending the right Content-Type header, use the HttpContent.ReadAsStreamAsync() method and decode the data yourself with your own instance of the Encoding class.
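A minimal sketch of decoding the stream yourself, assuming you pass in the charset from the Content-Type header (for example response.Content.Headers.ContentType?.CharSet) and a fallback encoding of your choice. BodyDecoder and ReadBody are hypothetical names. Note that in this sketch a BOM wins over the header, which is a deliberate simplification and not necessarily the exact order HttpContent uses.

```csharp
using System;
using System.IO;
using System.Text;

public static class BodyDecoder
{
    // Uses the header charset if it is present and known, otherwise the fallback.
    // A byte order mark at the start of the stream overrides either choice,
    // because the StreamReader is created with detectEncodingFromByteOrderMarks: true.
    public static string ReadBody(Stream body, string headerCharset, Encoding fallback)
    {
        Encoding encoding = fallback;
        if (!string.IsNullOrWhiteSpace(headerCharset))
        {
            try
            {
                // Some servers quote the value, e.g. charset="iso-8859-1"
                encoding = Encoding.GetEncoding(headerCharset.Trim('"', ' '));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the fallback
            }
        }

        using var reader = new StreamReader(body, encoding, detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd();
    }
}
```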

Set the "ContentEncoding" property of your HttpResponse object:
http://msdn.microsoft.com/en-us/library/system.web.httpresponse.contentencoding%28v=vs.71%29.aspx
Values include:
http://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.71%29.aspx
System.Text.ASCIIEncoding
System.Text.UnicodeEncoding
System.Text.UTF7Encoding
System.Text.UTF8Encoding
PS:
This really isn't "Metro" per se - just C#/.Net (albeit .Net 4.x)

Related

C# HttpClient save response with MIME "text/plain" as an UTF-8 string

I'm sending a request with HttpClient to a remote endpoint. I want to download the content and save it to a file as a UTF-8 string.
If the server responded with the proper Content-Type text/plain; charset=utf-8, the following code would process it just fine:
HttpClient client = new();
HttpResponseMessage res = await client.GetAsync(url);
string text = await res.Content.ReadAsStringAsync();
File.WriteAllText("file.txt", text);
However, the server always returns the basic Content-Type text/plain and I'm unable to get the content as a UTF-8 string.
HttpClient cl = new();
HttpResponseMessage res = await cl.GetAsync(url);
string attempt1 = await res.Content.ReadAsStringAsync();
string attempt2 = Encoding.UTF8.GetString(await res.Content.ReadAsByteArrayAsync());
Stream stream = await res.Content.ReadAsStreamAsync();
byte[] bytes = ((MemoryStream)stream).ToArray();
string attempt3 = Encoding.UTF8.GetString(bytes);
I tried all three approaches; all of them produced scrambled characters due to the encoding mismatch. I don't have control over the server, so I can't change the headers.
Is there any way to force HttpClient to parse it as UTF-8? Why are the manual approaches not working?
I've built a Cloudflare worker to demonstrate this behavior and allow you to easily debug:
https://headers.briganreiz.workers.dev/charset-in-header
https://headers.briganreiz.workers.dev/no-charset
Edit: It turned out the main server was GZip-compressing the response, which I hadn't noticed. This question solved it for me: Decompressing GZip Stream from HTTPClient Response
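Since the cause turned out to be GZip compression, all three manual attempts were decoding compressed bytes rather than UTF-8 text, which is why they all produced scrambled characters. Here is a minimal sketch that checks for the GZip magic bytes (0x1F 0x8B) and decompresses with GZipStream before decoding; GzipText and its methods are hypothetical names. In production code it is usually simpler to let HttpClientHandler decompress for you by setting its AutomaticDecompression property, which is off by default.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipText
{
    // Returns the UTF-8 text of a response body, handling a body that is
    // still GZip-compressed (identified by the 0x1F 0x8B magic bytes).
    public static string DecodeUtf8MaybeGzip(byte[] body)
    {
        bool isGzip = body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;
        if (!isGzip)
        {
            return Encoding.UTF8.GetString(body);
        }

        using var compressed = new MemoryStream(body);
        using var gzip = new GZipStream(compressed, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    // Helper for trying the function out: GZip-compresses a UTF-8 string.
    public static byte[] GzipUtf8(string text)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        } // disposing the GZipStream flushes the final block into output
        return output.ToArray(); // ToArray still works after the stream is closed
    }
}
```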
I find it works well with these different classes, WebRequest and HttpWebResponse. I have not added handling for resp.StatusCode etc., and obviously presuming all went well is a tad naive.
Give it a go; I'm fairly sure you'll find WebRequest and HttpWebResponse more capable for dynamic requests.
// ResponseTimeoutMilliseconds is a constant you define yourself, e.g. const int ResponseTimeoutMilliseconds = 30000;
var req = WebRequest.CreateHttp(url);
var getResponse = req.GetResponseAsync();
if (!getResponse.Wait(ResponseTimeoutMilliseconds))
{
    throw new TimeoutException("No response within the timeout.");
}
var resp = (HttpWebResponse)getResponse.Result;
using (Stream responseStream = resp.GetResponseStream())
using (var reader = new StreamReader(responseStream, Encoding.UTF8))
{
    // Decode the body as UTF-8 regardless of the Content-Type charset
    string content = reader.ReadToEnd();
}
Obviously, once you have things working you should use the ...Async versions throughout. For debugging, though, since we have already waited for the response, I find it more convenient to simply step through. Feel free to skip that middle step :)
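An async counterpart to the code above, sketched with HttpClient because WebRequest and HttpWebRequest are marked obsolete in .NET 6 and later. It also turns on AutomaticDecompression, which addresses the GZip issue from the question. Utf8Download and its methods are hypothetical names; like the StreamReader(responseStream, Encoding.UTF8) approach above, it decodes the body as UTF-8 regardless of the declared charset.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class Utf8Download
{
    // A handler that undoes gzip/deflate Content-Encoding before the body reaches us.
    public static HttpClientHandler CreateHandler() => new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };

    // Reads a stream to the end as UTF-8, ignoring any charset the server sent.
    public static async Task<string> ReadUtf8Async(Stream stream)
    {
        using var reader = new StreamReader(stream, Encoding.UTF8);
        return await reader.ReadToEndAsync();
    }

    public static async Task<string> DownloadUtf8Async(HttpClient client, string url)
    {
        using HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode(); // the status check the snippet above skips
        using Stream body = await response.Content.ReadAsStreamAsync();
        return await ReadUtf8Async(body);
    }
}
```

Typical use would be new HttpClient(Utf8Download.CreateHandler()) created once and reused, then await Utf8Download.DownloadUtf8Async(client, url).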

UTF-8 URL Encode

I am having issues encoding my query params with HttpUtility.UrlEncode(): the value is not getting converted to UTF-8.
query["agent"] = HttpUtility.UrlEncode("{\"mbox\":\"mailto: UserName@company.com\"}");
I tried the overload that takes an Encoding and passed UTF-8, but it still does not work.
Expected result:
?agent=%7B%22mbox%22%3A%22mailto%3AUserName%40company.com%22%7D
Actual result:
?agent=%257b%2522mbox%2522%253a%2522mailto%253aUserName%2540company.com%2522%257d
public StatementService(HttpClient client, IConfiguration conf)
{
    configuration = conf;
    var BaseAddress = "https://someurl.com/statements?";
    client.BaseAddress = new Uri(BaseAddress);
    client.DefaultRequestHeaders.Add("Custom-Header", "customheadervalue");
    Client = client;
}

public async Task<object> GetStatements()
{
    var query = HttpUtility.ParseQueryString(Client.BaseAddress.Query);
    query["agent"] = HttpUtility.UrlEncode("{\"mbox\":\"mailto:UserName@company.com\"}");
    var longuri = new Uri(Client.BaseAddress + query.ToString());
    var response = await Client.GetAsync(longuri);
    response.EnsureSuccessStatusCode();
    // Read the response body as a JSON string
    string JsonString = await response.Content.ReadAsStringAsync();
    // Convert the JSON string to an object
    JObject JsonLinq = JObject.Parse(JsonString);
    // LINQ to JSON (requires using System.Linq): first child of the first statement
    var res = JsonLinq["statements"]?[0]?.FirstOrDefault();
    return res;
}
The method HttpUtility.ParseQueryString internally returns an HttpValueCollection. HttpValueCollection.ToString() already performs URL encoding, so you don't need to do that yourself. If you encode the value yourself as well, the encoding is applied twice (every % from the first pass becomes %25), which produces the wrong result you see.
I don't see the relation to UTF-8. The value you use ({"mbox":"mailto: UserName@company.com"}) doesn't contain any characters that would look different in UTF-8.
References:
HttpValueCollection and NameValueCollection
ParseQueryString source
HttpValueCollection source
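The double encoding described above can be reproduced in a few lines. This sketch uses Uri.EscapeDataString for both passes purely for illustration (HttpUtility.UrlEncode would produce lowercase hex digits, as in the question's actual result), and DoubleEncodingDemo is a hypothetical name. The fix in the question's code is to assign the raw string to query["agent"] and let query.ToString() do the single encoding pass.

```csharp
using System;

public static class DoubleEncodingDemo
{
    // The question's value, with '@' as implied by the expected output (%40).
    public const string Agent = "{\"mbox\":\"mailto:UserName@company.com\"}";

    // Escapes once: what should end up in the URL.
    public static string EncodeOnce(string value) => Uri.EscapeDataString(value);

    // Escapes twice: what happens when you encode manually and the collection
    // encodes again in ToString(). Every '%' from the first pass becomes "%25".
    public static string EncodeTwice(string value) => Uri.EscapeDataString(Uri.EscapeDataString(value));
}
```

EncodeTwice(Agent) yields %257B%2522mbox%2522..., which matches the question's actual result apart from letter case.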
I strongly suggest a different approach: the Uri.EscapeDataString method. Uri lives in the System namespace, so you don't need a reference to System.Web, which is a heavy DLL. In addition, HttpUtility.UrlEncode emits lowercase hex digits in its escapes (%7b), whereas Uri.EscapeDataString emits uppercase (%7B), which RFC 3986 recommends; the difference can matter in certain cases when implementing HTTP-based protocols.
Uri.EscapeDataString("{\"mbox\":\"mailto: UserName@company.com\"}")
"%7B%22mbox%22%3A%22mailto%3A%20UserName%40company.com%22%7D"

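Building on the Uri.EscapeDataString answer above, here is a sketch of assembling a whole query string so that each key and value is escaped exactly once. QueryBuilder and BuildQuery are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryBuilder
{
    // Builds "?k1=v1&k2=v2", escaping each key and value exactly once.
    // An empty parameter list yields just "?".
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var pairs = parameters.Select(p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));
        return "?" + string.Join("&", pairs);
    }
}
```

With the question's setup this would be used as new Uri("https://someurl.com/statements" + QueryBuilder.BuildQuery(...)), passing the raw, unencoded JSON value.
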
HttpClient throws System.ArgumentException: 'windows-1251' is not a supported encoding name

I am writing a WinPhone 8.1 app.
Code is very simple and works in most cases:
string htmlContent;
using (var client = new HttpClient())
{
    htmlContent = await client.GetStringAsync(GenerateUri());
}
_htmlDocument.LoadHtml(htmlContent);
But sometimes exception is thrown at
htmlContent = await client.GetStringAsync(GenerateUri());
InnerException {System.ArgumentException: 'windows-1251' is not a supported encoding name.
Parameter name: name
   at System.Globalization.EncodingTable.internalGetCodePageFromName(String name)
   at System.Globalization.EncodingTable.GetCodePageFromName(String name)
   at System.Net.Http.HttpContent.<>c__DisplayClass1.b__0(Task task)} System.Exception {System.ArgumentException}
Does HttpClient support the windows-1251 encoding? If it doesn't, how can I avoid this problem? Or is it a problem with the target page? Or am I doing something wrong?
Get the response as an IBuffer and then convert it using the .NET encoding classes:
// Requires: using System.Runtime.InteropServices.WindowsRuntime; (for IBuffer.ToArray())
//           using System.Text; using Windows.Storage.Streams; using Windows.Web.Http;
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.GetAsync(uri);
IBuffer buffer = await response.Content.ReadAsBufferAsync();
byte[] bytes = buffer.ToArray();
// Decode with the encoding the page actually uses
Encoding encoding = Encoding.GetEncoding("windows-1251");
string responseString = encoding.GetString(bytes, 0, bytes.Length);
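On .NET Core and .NET 5+, legacy code pages such as windows-1251 are not available by default, which is one plausible cause of the "not a supported encoding name" error; they become resolvable after registering CodePagesEncodingProvider. I can't confirm how this applies to Windows Phone 8.1 itself, so the sketch below assumes a modern .NET runtime, and Cp1251 is a hypothetical name.

```csharp
using System.Text;

public static class Cp1251
{
    public static readonly Encoding Encoding1251;

    static Cp1251()
    {
        // Makes legacy Windows code pages resolvable by Encoding.GetEncoding.
        // Done in the static constructor so it runs exactly once.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding1251 = Encoding.GetEncoding("windows-1251");
    }

    // Decodes a body that the server sent as windows-1251.
    public static string Decode(byte[] bytes) => Encoding1251.GetString(bytes, 0, bytes.Length);
}
```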

How to retrieve HttpResponseMessage content headers for a live media stream (in a windows store app using c#)

I am making an online radio app for Windows 8.1 and want to communicate with SHOUTcast servers using the new Windows.Web.Http API (in order to send custom HTTP headers to get metadata from the live media stream).
The response headers are empty, and I need to read the content headers before starting to read the stream data.
This is the code I tried to use:
Uri uri = new Uri(Url);
var baseFilter = new HttpBaseProtocolFilter();
ShoutcastHttpFilter filter = new ShoutcastHttpFilter(baseFilter);
var HClient = new HttpClient(filter);
HttpResponseMessage response = await HClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead);
// Await the operation instead of calling GetResults() on one that may not have completed yet
Stream stream = (await response.Content.ReadAsInputStreamAsync()).AsStreamForRead();
string str = response.Content.Headers["Icy-MetaInt"];
When I run/debug the code, the content appears as "unbuffered" and has no headers.
How can I get the content headers and the stream?
This is the code I used in ShoutcastHttpFilter:
request.Headers.Clear();
request.Headers.Add("Icy-MetaData", "1");
request.Headers["User-Agent"] = "VLC media player";
request.Headers["Connection"] = "Close";
HttpResponseMessage response = await innerFilter.SendRequestAsync(request).AsTask(cancellationToken, progress);
cancellationToken.ThrowIfCancellationRequested();
return response;
If the header name does not start with Content-, the header is in the response headers (response.Headers), not in the content headers.
Do this:
Uri uri = new Uri("http://example.com");
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.GetAsync(
uri,
HttpCompletionOption.ResponseHeadersRead);
string value = response.Headers["Icy-MetaInt"];
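The same split exists in System.Net.Http, where the two collections are response.Headers and response.Content.Headers. Here is a sketch of a lookup that checks the response headers first and then the content headers; it uses System.Net.Http rather than Windows.Web.Http, and HeaderLookup and FindHeader are hypothetical names. Header names are matched case-insensitively.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;

public static class HeaderLookup
{
    // Returns the first value of the named header, or null if it is absent.
    // Non-Content-* headers such as Icy-MetaInt live in response.Headers;
    // Content-* headers live in response.Content.Headers.
    public static string FindHeader(HttpResponseMessage response, string name)
    {
        IEnumerable<string> values;
        if (response.Headers.TryGetValues(name, out values))
        {
            return values.FirstOrDefault();
        }
        if (response.Content != null && response.Content.Headers.TryGetValues(name, out values))
        {
            return values.FirstOrDefault();
        }
        return null;
    }
}
```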

Download a webpage in UTF-8

I'm using the code below to download this XML file:
private static async Task<string> DownloadPageAsync(string url)
{
    try
    {
        HttpClientHandler handler = new HttpClientHandler();
        handler.UseDefaultCredentials = true;
        handler.AllowAutoRedirect = true;
        handler.UseCookies = true;
        HttpClient client = new HttpClient(handler);
        client.MaxResponseContentBufferSize = 10000000;
        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        string responseBody = await response.Content.ReadAsStringAsync();
        return responseBody;
    }
    catch (Exception ex)
    {
        return "error" + ex.Message;
    }
}
but the document I'm getting seems to have encoding problems. Apart from the document not being well formatted, I'm guessing my downloaded webpage is not in UTF-8 either. How can I return a UTF-8 string? Thanks.
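The document at your link is encoded as ISO-8859-1. Use
XmlDocument.Load(uriString)
or
XDocument.Load(uriString)
so that the XML parser reads the encoding from the document's XML declaration instead of assuming UTF-8.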
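If you still want to download with HttpClient, you can get the same behaviour by handing the raw bytes (for example from GetByteArrayAsync) to the XML parser instead of calling ReadAsStringAsync, which decodes before the parser ever sees the declaration. A minimal sketch; XmlDecoding and LoadFromBytes are hypothetical names.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlDecoding
{
    // Parses XML from raw bytes. XDocument.Load(Stream) lets the XML reader pick
    // the encoding from the BOM or the <?xml ... encoding="..."?> declaration,
    // so an ISO-8859-1 document is decoded correctly without manual conversion.
    public static XDocument LoadFromBytes(byte[] xmlBytes)
    {
        using var stream = new MemoryStream(xmlBytes);
        return XDocument.Load(stream);
    }
}
```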
I suggest using the HTML Agility Pack to download and parse the document for you - it will automatically detect the encoding (where possible), so this shouldn't be a problem for you.
If this is not an option, you need to know which encoding the document uses and decode its raw bytes with the matching Encoding instance (for example Encoding.GetEncoding("iso-8859-1")). The resulting .NET string can then be written out as UTF-8 if you need UTF-8 output.
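When you need UTF-8 bytes rather than a string (for example to save the document to a file), Encoding.Convert re-encodes the raw bytes in one step. A minimal sketch, assuming you already know the source encoding; Transcode and ToUtf8 are hypothetical names.

```csharp
using System.Text;

public static class Transcode
{
    // Re-encodes raw bytes from the document's original encoding to UTF-8.
    // If you only need a .NET string, source.GetString(bytes) is enough,
    // since .NET strings are UTF-16 internally.
    public static byte[] ToUtf8(byte[] sourceBytes, Encoding source) =>
        Encoding.Convert(source, Encoding.UTF8, sourceBytes);
}
```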
