Downloading a Cab file using WebClient gives too few bytes - c#

I need to download a Cab file from a Url into a stream.
using (WebClient client = new WebClient())
{
    client.Credentials = CredentialCache.DefaultCredentials;
    byte[] fileContents = client.DownloadData("http://localhost/sites/hfsc/FormServerTemplates/HfscInspectionForm.xsn");
    using (MemoryStream ms = new MemoryStream(fileContents))
    {
        FormTemplate = formExtractor.ExtractFormTemplateComponent(ms, "template.xml");
    }
}
This is fairly straightforward; however, my cab extractor (CabLib) is throwing an exception saying that it's not a valid cabinet.
I was previously using a SharePoint call to get the byte stream and that was returning 30942 bytes. The stream I get through that method worked correctly with CabLib. The stream I get with the WebClient returns only 28087 bytes.
I have noticed that the response's Content-Type header is coming back as text/html; charset=utf-8
I'm not too sure why but I think it's what's affecting the data I get back.

I believe the problem is that SharePoint is passing the xsn to the Forms Server, which renders it for you as an InfoPath form in HTML. You need to stop this from happening. You can do this by adding some query string parameters to the URL request.
These can be found at:
http://msdn.microsoft.com/en-us/library/ms772417.aspx
I suggest you use NoRedirect=true
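A minimal sketch of that suggestion, assuming the parameter is passed as a plain ?NoRedirect=true query string on the original .xsn URL (check the linked page for the exact options). XsnDownloader and AppendQuery are hypothetical names, not part of WebClient or SharePoint:

```csharp
using System;
using System.Net;

static class XsnDownloader
{
    // Hypothetical helper: appends one query-string parameter to a URL.
    // Assumes the URL has no #fragment part.
    public static string AppendQuery(string url, string name, string value)
    {
        string separator = url.Contains("?") ? "&" : "?";
        return url + separator + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }

    // Downloads the raw .xsn bytes, asking the server not to hand the
    // request over to the HTML form renderer.
    public static byte[] DownloadRawXsn(string xsnUrl)
    {
        using (WebClient client = new WebClient())
        {
            client.Credentials = CredentialCache.DefaultCredentials;
            return client.DownloadData(AppendQuery(xsnUrl, "NoRedirect", "true"));
        }
    }
}
```

If the byte count now matches what the SharePoint call returned, the HTML rendering was the cause.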


WebRequest returns unreadable string [duplicate]

I'm trying to download an HTML document from Amazon, but for some reason I get a badly encoded string like "��K��g��g�e".
Here's the code I tried:
using (var webClient = new System.Net.WebClient())
{
    var url = "https://www.amazon.com/dp/B07H256MBK/";
    webClient.Encoding = Encoding.UTF8;
    var result = webClient.DownloadString(url);
}
Same thing happens when using HttpClient:
var url = "https://www.amazon.com/dp/B07H256MBK/";
var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(url);
I also tried reading the result as bytes and then converting it back to UTF-8, but I still get the same result. Also note that this DOES NOT always happen. For example, yesterday I ran this code for ~2 hours and got a correctly encoded HTML document. Today, however, I always get a badly encoded result. It happens every other day, so it's not a one-time thing.
==================================================================
However, when I use HtmlAgilityPack's wrapper, it works as expected every time:
var url = "https://www.amazon.com/dp/B07H256MBK/";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);
What causes WebClient and HttpClient to return a badly encoded string even when I explicitly set the correct encoding? And how does HtmlAgilityPack's wrapper work by default?
Thanks for any help!
I fired up Firefox's web dev tools, requested that page, and looked at the response headers:
See that content-encoding: gzip? That means the response is gzip-encoded.
It turns out that Amazon gives you a response compressed with gzip even when you don't send an Accept-Encoding: gzip header (verified with another tool). This is a bit naughty, but not that uncommon, and easy to work around.
This wasn't a problem with character encodings at all. HttpClient is good at figuring out the correct encoding from the Content-Type header.
You can tell HttpClient to un-zip responses with:
HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.GZip,
};
using (var client = new HttpClient(handler))
{
    // your code
}
This will be set automatically if you're using the System.Net.Http NuGet package, versions 4.1.0 to 4.3.2; otherwise you'll need to do it yourself.
You can do the same with WebClient, but it's harder.
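One common way to do it, sketched here rather than taken from the original answer, is to subclass WebClient and switch on decompression on the underlying HttpWebRequest. The class name DecompressingWebClient is made up for this example:

```csharp
using System;
using System.Net;

// Hypothetical subclass: a WebClient whose HTTP requests transparently
// decompress gzip- or deflate-encoded responses.
class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}
```

You would then use new DecompressingWebClient() in place of new System.Net.WebClient() in the question's code; DownloadString should then see the decompressed body.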

How do I create a System.IO.Stream from blob:https://localhost:5001/2b28e86c-fef1-482e-ae16-12466e6f729f

I'm submitting a trusted URL path to my Web API and then trying to upload the image to Azure Storage. The URI I have to upload starts with blob:https://localhost/..., which points to an image stored locally. I need to read this stream, but I'm getting an exception on the first line of code:
"The URI prefix is not recognized."
var req = System.Net.WebRequest.Create("blob:https://localhost:5001/2b28e86c-fef1-482e-ae16-12466e6f729f");
using (var stream = req.GetResponse().GetResponseStream())
{
    containerClient.UploadBlob(image.guid.ToString(), stream);
}
I did a bunch more searching and discovered that only the browser can access blob: URLs. I ended up switching back to FormData.

Why does WebClient.UploadValues overwrite my html web page?

I'm familiar with WinForms and WPF, but new to web development. One day I came across WebClient.UploadValues and decided to try it.
static void Main(string[] args)
{
    using (var client = new WebClient())
    {
        var values = new NameValueCollection();
        values["thing1"] = "hello";
        values["thing2"] = "world";
        //A single file that contains plain html
        var response = client.UploadValues("D:\\page.html", values);
        var responseString = Encoding.Default.GetString(response);
        Console.WriteLine(responseString);
    }
    Console.ReadLine();
}
After running it, nothing is printed, and the content of the html file becomes this:
thing1=hello&thing2=world
Could anyone explain this? Thanks!
The UploadValues method is intended to be used with the HTTP protocol. This means that you need to host your html on a web server and make the request like this:
var response = client.UploadValues("http://some_server/page.html", values);
In this case the method will send the values to the server by using application/x-www-form-urlencoded encoding and it will return the response from the HTTP request.
I have never used UploadValues with a local file, and the documentation doesn't seem to mention anything about it; it only mentions the HTTP and FTP protocols. So I suppose this is a side effect of using it with a local file: it simply overwrites the contents of the file with the payload that is being sent.
You are not using WebClient as it was intended.
The purpose of WebClient.UploadValues is to upload the specified name/value collection to the resource identified by the specified URI.
That resource should not be a local file on your disk; it should be a web service that listens for requests and issues responses.
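One way to guard against this mistake, as a rough sketch: check the address scheme before calling UploadValues. FormPoster, IsHttpAddress and PostValues are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

static class FormPoster
{
    // True only for absolute http:// or https:// addresses.
    public static bool IsHttpAddress(string address)
    {
        Uri uri;
        return Uri.TryCreate(address, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Posts the values as a form, refusing anything that is not a web address
    // so that a local path such as D:\page.html is never overwritten.
    public static byte[] PostValues(string address, NameValueCollection values)
    {
        if (!IsHttpAddress(address))
        {
            throw new ArgumentException("Expected an http:// or https:// address.", "address");
        }
        using (var client = new WebClient())
        {
            return client.UploadValues(address, values);
        }
    }
}
```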

How to check the modify time of a remote file

I need to know the last modification DateTime of a remote file before downloading its entire content, so that I don't download bytes I am never going to need.
Currently I am using WebClient to download the file, but I don't have to keep using WebClient specifically. The Last-Modified key can be found in the response headers, but by that point the entire file has already been downloaded.
WebClient webClient = new WebClient();
byte[] buffer = webClient.DownloadData( uri );
WebHeaderCollection webClientHeaders = webClient.ResponseHeaders;
// GetKey takes a numeric index; look the header up by name instead.
String modified = webClientHeaders[ "Last-Modified" ];
Also, I am not sure whether that header is always included for every file on the internet.
You can use the HTTP "HEAD" method to just get the file's headers.
...
var request = WebRequest.Create(uri);
request.Method = "HEAD";
...
Then you can extract the last modified date and check whether to download the file or not.
Just be aware that not all servers implement Last-Modified properly.
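A fuller sketch of the HEAD approach, under the assumption that the server sends Last-Modified in the standard HTTP date format (for example "Wed, 21 Oct 2015 07:28:00 GMT"). RemoteFileInfo and its methods are hypothetical names; a null result means the header was missing or unparseable:

```csharp
using System;
using System.Globalization;
using System.Net;

static class RemoteFileInfo
{
    // Parses an HTTP date header value; returns null if absent or malformed.
    public static DateTimeOffset? ParseLastModified(string headerValue)
    {
        DateTimeOffset parsed;
        if (!string.IsNullOrEmpty(headerValue) &&
            DateTimeOffset.TryParseExact(headerValue, "r", CultureInfo.InvariantCulture,
                                         DateTimeStyles.AssumeUniversal, out parsed))
        {
            return parsed;
        }
        return null;
    }

    // Sends a HEAD request, so only the headers are transferred, not the body.
    public static DateTimeOffset? GetLastModified(string uri)
    {
        var request = WebRequest.Create(uri);
        request.Method = "HEAD";
        using (var response = request.GetResponse())
        {
            return ParseLastModified(response.Headers["Last-Modified"]);
        }
    }
}
```

The caller can compare the returned value with the timestamp of its local copy and only download when the remote file is newer or the value is null.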

How do I check for binary vs. text in an HttpWebRequest in c#?

Is there a way to determine if the response from an HttpWebRequest in C# contains binary data vs. text? Or is there another class or function I should be using to do this?
Here's some sample code. I'd like to know before reading the StreamReader if the content is not text.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.someurl.com");
request.Method = WebRequestMethods.Http.Get;
using (WebResponse response = request.GetResponse())
{
    // check somewhere in here if the response is binary data and ignore it
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        string responseDetails = reader.ReadToEnd().Trim();
    }
}
In general, web sites will tell you in the Content-Type header what kind of data they're returning. You can determine that by getting the ContentType property from the response.
But sites have been known to lie. Or not say anything. I've seen both. If there is no Content-Type header or you don't want to trust it, then the only way you can tell what kind of data is there, is by reading it.
But then, if you don't trust the site, why are you reading data from it?
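As a rough illustration of the header check, the helper below treats a response as text when its media type starts with text/ or is one of a few common text-based types. The list of types is an assumption made for this sketch, and ContentTypeSniffer and LooksLikeText are hypothetical names:

```csharp
using System;

static class ContentTypeSniffer
{
    // Returns true if the Content-Type value names a text-like media type.
    // A missing or empty header is treated as "not known to be text".
    public static bool LooksLikeText(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType))
        {
            return false;
        }
        // Drop parameters such as "; charset=utf-8".
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.StartsWith("text/", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/json", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/xml", StringComparison.OrdinalIgnoreCase)
            || mediaType.EndsWith("+json", StringComparison.OrdinalIgnoreCase)
            || mediaType.EndsWith("+xml", StringComparison.OrdinalIgnoreCase);
    }
}
```

In the question's code this could be called as ContentTypeSniffer.LooksLikeText(response.ContentType) before creating the StreamReader. It only reports what the server claims, so the caveat above about sites that lie still applies.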
