Uncompressing xml feed - c#

My application downloads a zipped XML file from the web and tries to create an XmlReader from it:
var fullReportUrl = "http://..."; // valid url here
//client below is an instance of HttpClient
var fullReportResponse = client.GetAsync(fullReportUrl).Result;
var zippedXmlStream = fullReportResponse.Content.ReadAsStreamAsync().Result;
XmlReader xmlReader = null;
using(var gZipStream = new GZipStream(zippedXmlStream, CompressionMode.Decompress))
{
try
{
xmlReader = XmlReader.Create(gZipStream, settings);
}
catch (Exception xmlEx)
{
}
}
When I try to create the XML reader I get an error:
"The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
When I open the URL in a browser I successfully download a zip file with well-formed XML in it. My OS is able to unzip it without any issues. I examined the first two characters of the downloaded file and they appear to be 'PK', which is consistent with the ZIP format.
I might be missing a step in the stream transformations. What am I doing wrong?

You don't need to use GZipStream to decompress an HTTP response with HttpClient. You can set AutomaticDecompression on HttpClientHandler to make HttpClient decompress the response automatically for you.
HttpClientHandler handler = new HttpClientHandler()
{
// both gzip and deflate
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using (var client = new HttpClient(handler))
{
var fullReportResponse = client.GetAsync(fullReportUrl).Result;
}
Edit 1:
Web servers don't gzip the output of every request. They first check the Accept-Encoding request header. If the header is set to something like Accept-Encoding: deflate, gzip;q=1.0, *;q=0.5, the server knows the client supports gzip or deflate, so it might (depending on the app logic or server configuration) compress the output with gzip or deflate. In your scenario I don't think you have set the Accept-Encoding header, so the response will come back uncompressed. I still recommend you try the code above.
Read more about Accept-Encoding on MDN.
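The question notes that the downloaded file starts with 'PK', which marks a ZIP archive rather than a gzip stream, and GZipStream cannot read ZIP archives. The following is a minimal sketch of reading the first entry of a ZIP archive with ZipArchive from System.IO.Compression. The class name ZipXmlFeed and the method ReadRootElementName are hypothetical names used only for illustration, and the sketch assumes the archive's first entry is the XML document.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class ZipXmlFeed
{
    // Hypothetical helper: opens the first entry of a ZIP archive and
    // returns the name of the XML root element.
    // Assumption: the first entry in the archive is the XML document.
    public static string ReadRootElementName(Stream zippedStream)
    {
        using (var archive = new ZipArchive(zippedStream, ZipArchiveMode.Read))
        {
            if (archive.Entries.Count == 0)
            {
                throw new InvalidDataException("The ZIP archive contains no entries.");
            }

            // The reader must be consumed while the archive is still open.
            using (Stream entryStream = archive.Entries[0].Open())
            using (XmlReader reader = XmlReader.Create(entryStream))
            {
                reader.MoveToContent();
                return reader.Name;
            }
        }
    }
}
```

With the code from the question, this could be called as ZipXmlFeed.ReadRootElementName(zippedXmlStream). Note that all reading happens inside the using blocks; an XmlReader that outlives the archive it reads from would fail once the archive is disposed.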

Related

C# WebClient unable to download csv file from link

Here is my code, which tries to download the CSV file from the URL below but fails with an error.
string remoteUri = "https://www.nseindia.com/api/corporates-corporateActions?index=equities&from_date=30-07-2020&to_date=06-08-2020&csv=true";
string fileName = @"C:\test.csv";
WebClient myWebClient = new WebClient();
myWebClient.DownloadFile(remoteUri,fileName);
I get the error "Fatal Error: Execution time limit was exceeded."
But opening the above URL in a browser downloads the CSV file.
This is because the HTTP server is expecting the following headers in your request:
Accept-Language: fr,fr-FR;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
You can try them in another REST client. I've been able to reproduce your problem with ARC.
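As a rough sketch of sending browser-like headers from C#, the snippet below uses HttpClient rather than the WebClient from the question, because HttpClientHandler can also decompress gzip and deflate responses. The class name BrowserLikeClient is hypothetical, the header values are shortened from the ones listed above, and whether the server in the question accepts them is an assumption that has not been verified here.

```csharp
using System.Net;
using System.Net.Http;

public static class BrowserLikeClient
{
    // Hypothetical factory: builds an HttpClient that sends Accept and
    // Accept-Language headers and transparently decompresses responses.
    public static HttpClient Create()
    {
        var handler = new HttpClientHandler
        {
            // Setting this also makes the handler send a matching
            // Accept-Encoding header (gzip, deflate) on each request.
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept-Language", "en-US,en;q=0.8");
        return client;
    }
}
```

Brotli (br) is deliberately left out of this sketch, since advertising an encoding the client cannot decode would produce an unreadable body.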
There is some issue with the URL you are trying to get the CSV file from. I checked your code with another CSV URL and it works fine. Below is the test CSV URL.
string remoteUri = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv";
string fileName = @"Y:\Users Data\Sagar\Test\test.csv";
WebClient myWebClient = new WebClient();
myWebClient.Headers.Add("contentType", "text/csv");
await myWebClient.DownloadFileTaskAsync(remoteUri, fileName);

WebRequest returns unreadable string [duplicate]

I'm trying to download an HTML document from Amazon but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
var url = "https://www.amazon.com/dp/B07H256MBK/";
webClient.Encoding = Encoding.UTF8;
var result = webClient.DownloadString(url);
}
Same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and then converting it back to UTF-8, but I still get the same result. Also note that this DOES NOT always happen. For example, yesterday I was running this code for ~2 hours and I was getting a correctly encoded HTML document. However, today I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
==================================================================
However, when I use the HtmlAgilityPack's wrapper it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to get a badly encoded string even when I explicitly define the correct encoding? And how does the HtmlAgilityPack's wrapper work by default?
Thanks for any help!
I fired up Firefox's web dev tools, requested that page, and looked at the response headers:
See that content-encoding: gzip? That means the response is gzip-encoded.
It turns out that Amazon gives you a response compressed with gzip even when you don't send an Accept-Encoding: gzip header (verified with another tool). This is a bit naughty, but not that uncommon, and easy to work around.
This wasn't a problem with character encodings at all. HttpClient is good at figuring out the correct encoding from the Content-Type header.
You can tell HttpClient to un-zip responses with:
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.GZip,
};
using (var client = new HttpClient(handler))
{
// your code
}
This will be set automatically if you're using the NuGet package versions 4.1.0 to 4.3.2, otherwise you'll need to do it yourself.
You can do the same with WebClient, but it's harder.
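One common way to do this with WebClient is to subclass it and enable decompression on the underlying HttpWebRequest. The sketch below assumes that approach; the class name DecompressingWebClient is hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical subclass: a WebClient that asks for gzip/deflate and
// decompresses the response before DownloadString or DownloadData see it.
public class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        return request;
    }
}
```

It can then replace System.Net.WebClient in the question's code, for example using (var webClient = new DecompressingWebClient()) { ... }.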

Detect GZIP compression in HTTP Module

How can I detect whether or not GZIP is enabled for a particular request in my HTTP Module? I apply a filter to the output, and when it acts on gzipped content, it screws up the compression somehow, and the client browser throws an error that it can't decode the content.
public void Init(HttpApplication context)
{
// if(HttpContext.Current.IsCompressed) // Check for compressed content here
// Set up the filter / replacement.
context.PostReleaseRequestState += (_, __) =>
{
var filterStream = new ResponseFilterStream(HttpContext.Current.Response.Filter);
filterStream.TransformString += FilterPage;
HttpContext.Current.Response.Filter = filterStream;
};
}
ResponseFilterStream is a custom stream which caches all stream writes and presents the contents as an event in order to allow a method to modify the contents of the stream. It works great for modifying HTML requests (which is what I want), but I don't want it to act on gzip-compressed responses. How can I detect a gzipped response and prevent the filter stream from being hooked up to the response?
For a response, you can check the Content-Encoding HTTP header for a value of gzip or deflate.
For a request, you need to check the Accept-Encoding HTTP header for a value of gzip or deflate.
HTTP Compression
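The header check described above could be sketched as a small helper. The class name CompressionCheck and the method IsCompressed are hypothetical; the helper only inspects a header value and makes no assumption about how that value is obtained.

```csharp
using System;

public static class CompressionCheck
{
    // Hypothetical helper: returns true when a Content-Encoding (or
    // Accept-Encoding) header value mentions gzip or deflate.
    public static bool IsCompressed(string headerValue)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
        {
            return false;
        }

        // The header may list several encodings, e.g. "gzip, deflate",
        // and each may carry a quality value such as "gzip;q=1.0".
        foreach (string token in headerValue.Split(','))
        {
            string encoding = token.Split(';')[0].Trim();
            if (encoding.Equals("gzip", StringComparison.OrdinalIgnoreCase) ||
                encoding.Equals("deflate", StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }

        return false;
    }
}
```

In the module from the question, the value would presumably come from HttpContext.Current.Response.Headers["Content-Encoding"], which requires the IIS integrated pipeline. Whether the header is already set when PostReleaseRequestState fires depends on where compression is applied, so treat that as an assumption to verify.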

jQuery POST to a .NET HttpListener and back again

I have a chunk of javascript code that uses jQuery.post to send some data to a .NET app that's using an HttpListener.
Here's the js:
$.post("http://localhost:8080/catch", { name: "John", time: "2pm" },
function(data) {
alert(data);
});
and the C#:
HttpListenerContext context = listener.GetContext();
HttpListenerRequest request = context.Request;
StreamReader reader = new StreamReader(request.InputStream);
string s2 = reader.ReadToEnd();
Console.WriteLine("Data received:" + s2);
// Obtain a response object.
HttpListenerResponse response = context.Response;
// Construct a response.
string responseString = "<HTML><BODY> Hello world!</BODY></HTML>";
byte[] buffer = System.Text.Encoding.UTF8.GetBytes(responseString);
// Get a response stream and write the response to it.
response.ContentLength64 = buffer.Length;
System.IO.Stream output = response.OutputStream;
output.Write(buffer, 0, buffer.Length);
// You must close the output stream.
output.Close();
The POST request goes out OK, and the .NET app reads in the data OK, but the JS code doesn't seem to get the response. The callback function passed to jQuery.post fires, but data is always undefined. For brevity I have omitted some C# above where I set the prefixes on the listener.
Any ideas why I'm not getting my data back client-side?
EDIT: I should add that when I run the JS with HttpFox running I get HTTP code 200, 'NS_ERROR_DOM_BAD_URI', which I thought had something to do with the "http://localhost:8080/catch" I was targeting, but when I hit that resource in Firefox, I get the HTML response just fine and it registers as a GET, 200.
EDIT: I simplified the response to just 'meow', and this is what fiddler is giving me for the full response:
HTTP/1.1 200 OK
Content-Length: 4
Content-Type: text/html
Server: Microsoft-HTTPAPI/2.0
Date: Fri, 15 Apr 2011 12:58:49 GMT
meow
Don't forget about the same-origin policy restriction. Unless your JavaScript is hosted on http://localhost:8080, you won't be able to send AJAX requests to this URL. A different port number is not allowed either. You will need to host your JavaScript file on an HTML page served from http://localhost:8080 if you want this to work. Alternatively, have your server send JSONP, but this works only with GET requests.
Remark: make sure you properly dispose of disposable resources on your server by wrapping them in using statements, or your server might start leaking network connection handles.
Don't forget to release the resources by closing the response.
Calling Close on the response will force the response to be sent through the underlying socket and will then Dispose all of its disposable objects.
In your example, the Close method is only called on the Output stream. This will send the response through the socket, but will not dispose any resources related to the response, which includes the output stream you referenced.
// Complete async GetContext and reference required objects
HttpListenerContext Context = Listener.EndGetContext(Result);
HttpListenerRequest Request = Context.Request;
HttpListenerResponse Response = Context.Response;
// Process the incoming request here
// Complete the request and release its resources by calling the Close method
Response.Close();
I do not see setting of content-type. Set the content-type to text/html.
response.ContentType = "text/html";
You can simplify the writing code a lot. Just use this:
// Construct a response.
string responseString = "<HTML><BODY> Hello world!</BODY></HTML>";
context.Response.Write(responseString);
No need for the OutputStream or most of that other code. If you do have a reason to use it, note that you actually should not close the OutputStream. When you use Response.OutputStream you're retrieving a reference to it, but you're not taking ownership. It's still owned by the Response object and will be closed properly when the Response is disposed at the end of the request.

Downloading a Cab file using WebClient gives too few bytes

I need to download a Cab file from a Url into a stream.
using (WebClient client = new WebClient())
{
client.Credentials = CredentialCache.DefaultCredentials;
byte[] fileContents = client.DownloadData("http://localhost/sites/hfsc/FormServerTemplates/HfscInspectionForm.xsn");
using (MemoryStream ms = new MemoryStream(fileContents))
{
FormTemplate = formExtractor.ExtractFormTemplateComponent(ms, "template.xml");
}
}
This is fairly straightforward; however, my cab extractor (CabLib) is throwing an exception saying that it's not a valid cabinet.
I was previously using a SharePoint call to get the byte stream and that was returning 30942 bytes. The stream I get through that method worked correctly with CabLib. The stream I get with the WebClient returns only 28087 bytes.
I have noticed that the response header content-type is coming back as text/html; charset=utf-8
I'm not too sure why but I think it's what's affecting the data I get back.
I believe the problem is that SharePoint is passing the xsn to the Forms Server to render as an InfoPath form in HTML for you. You need to stop this from happening. You can do this by adding some query string parameters to the URL request.
These can be found at:
http://msdn.microsoft.com/en-us/library/ms772417.aspx
I suggest you use NoRedirect=true
