web request for html only response excluding images - c#

Using .NET's WebClient (or any other way in .NET), I need to make a web request for a URL so that the response is the HTML and text only, without the images.
This is the same as selecting "No Images" in the Internet Explorer or Opera browsers, so that I get the smallest possible response size.
I've tried setting the Accept header to "text/html", but the images still come with the response.
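For what it's worth, an HTML response never contains the image bytes themselves, only <img> tags pointing at separate URLs; a browser issues additional requests for those, which is what "No Images" switches off. So a single WebClient or HttpClient GET already gives the smallest, image-free response. A minimal sketch (the URL is a placeholder):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // Advertise a preference for HTML; the server decides whether to honor it.
        client.DefaultRequestHeaders.Accept.ParseAdd("text/html");

        // A single GET returns only the HTML document itself. Image bytes never
        // arrive unless you issue further requests for each <img> URL.
        string html = await client.GetStringAsync("https://example.com/");
        Console.WriteLine(html.Length);
    }
}
```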

Setting POST data's encoding on WPF WebBrowser

Background
I've been using my own site to send a POST request to another site that requires a specific sender URL, which is my site's URL. Now I'm trying to send that automated POST request from my WPF application. I can't use HttpClient's SendAsync because that way I can't include a valid site URL in the header. So I'm trying to send the request from a WebBrowser control by loading the page, filling in the input values, and then submitting the form. And here's my problem.
Problem
The target server that receives the POST request requires a specific encoding, something other than UTF-8. My site is configured for that encoding and works fine in actual web browsers like Chrome or IE. But when I send the exact same request from the WPF WebBrowser by loading the same page, the encoding somehow goes wrong. I'd like my WPF WebBrowser to send the POST request with the specific encoding, but I can't find a way to do so. I'd be grateful for any advice. Thank you in advance.
What I've tried
I've inserted <meta encoding="~"> into my HTML and also set <form charset="~"> on the form, but it still sends the request with the wrong encoding.
I inspected my request's broken text and ran it through a decoder that restores broken text to the original, so I could be confident that my WPF request was being encoded as UTF-8. Then I found the WebBrowser.Document.encoding option. My default encoding was set to UTF-8, and that caused all the problems. Setting document.encoding = "myEncoding"; solved the problem.
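A minimal sketch of that fix, assuming a WPF WebBrowser control named browser; the charset value "euc-kr" and the form id "myForm" are placeholders, and the underlying mshtml document is accessed via dynamic (IHTMLDocument2.charset is the property corresponding to the encoding setting mentioned above):

```csharp
using System.Windows.Navigation;

// Inside the window hosting the WebBrowser control.
private void Browser_LoadCompleted(object sender, NavigationEventArgs e)
{
    // WPF's WebBrowser.Document exposes the underlying mshtml document.
    dynamic doc = browser.Document;

    // Force the page's encoding before touching the form; otherwise the
    // default (UTF-8) is used when the POST body is built.
    doc.charset = "euc-kr";   // placeholder for the server's required charset

    // Fill the inputs as before, then submit the form.
    doc.getElementById("myForm").submit();
}
```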

C# windows service: HTML page with picture, CSS and JS as a response

I am learning to build a Windows service, but it also takes requests from the browser (e.g. localhost:8081/index). Therefore, the HTTP response should contain an HTML page.
The HTML page looks okay when I double-click the index.html file, but it loses all the CSS and JS when I request it from the web browser. I opened the developer tools in Chrome and found that all the CSS and JS files were corrupted and contained the code from my HTML page (weird).
I used the HttpListenerContext class to listen for the http://localhost/index request, then opened the index.html file with File.ReadAllBytes(file). When composing the response, I used the following code:
responseBytes = File.ReadAllBytes(file);
response.ContentLength64 = responseBytes.Length;
await response.OutputStream.WriteAsync(responseBytes, 0, responseBytes.Length);
response.OutputStream.Close();
Can anyone help me to figure out why this is happening?
So I figured out the answer myself by tracing the code. When the request for localhost/index comes in from the browser, a GET first fetches index.html, and then further GET requests come in for the CSS, images and JS as well. (It's weird that I didn't see those later requests before.)
Then I updated the handler to compose responses containing the CSS, image and JS files. Everything works perfectly now.
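A sketch of what such a handler can look like; the "wwwroot" folder name and the MIME table are assumptions for illustration. The key point is that each requested path must map to its own file and Content-Type, rather than every request being answered with index.html (which is why the browser saw HTML inside the "CSS" and "JS" responses):

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class StaticServer
{
    static readonly string Root = "wwwroot";   // placeholder content folder

    static async Task HandleAsync(HttpListenerContext ctx)
    {
        // Map "/" and "/index" to index.html; everything else to a file on disk.
        string path = ctx.Request.Url.AbsolutePath.TrimStart('/');
        if (path == "" || path == "index") path = "index.html";
        string file = Path.Combine(Root, path);

        if (!File.Exists(file))
        {
            ctx.Response.StatusCode = 404;
            ctx.Response.OutputStream.Close();
            return;
        }

        // Without the right Content-Type the browser may refuse the CSS/JS.
        ctx.Response.ContentType = Path.GetExtension(file) switch
        {
            ".html" => "text/html",
            ".css"  => "text/css",
            ".js"   => "application/javascript",
            ".png"  => "image/png",
            ".jpg"  => "image/jpeg",
            _       => "application/octet-stream",
        };

        byte[] bytes = File.ReadAllBytes(file);
        ctx.Response.ContentLength64 = bytes.Length;
        await ctx.Response.OutputStream.WriteAsync(bytes, 0, bytes.Length);
        ctx.Response.OutputStream.Close();
    }
}
```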

Html document not downloading from google search url

I used the code below to download the HTML content from a Google search URL:
HttpClient client = new HttpClient();
var response = await client.GetAsync("https://www.google.co.in/?gws_rd=ssl#q=chiranjeevi+movies");
var sdata = await response.Content.ReadAsStringAsync();
When I inspect in the browser I find the "klitem" class on a div tag, but it is not in the response I read. I don't know why it doesn't show up when I download the URL content; can anyone please help?
When you inspect in the browser, you can find some HTML elements. Those are actually elements of the DOM tree rendered by the web browser; they are not in the response content.
If you look at the Network tab of the developer tools, you will find that when you call your URL, the browser sends several requests, and the first request returns an HTML page. That is the same content you read from your response. Unlike a web browser, HttpClient only gets the response of a single request; we cannot get the final DOM tree from it. Google won't return the search results directly. As @Mehrzad said, the final result is most likely created dynamically via JavaScript. If you check the Network tool, you can find a lot of JavaScript requests there rather than one static page.

Access YouTube with ASP.NET and YoutubeFisher but gets HTTP Error 403

I'm trying to use the YoutubeFisher library with ASP.NET. I make an HttpWebRequest to grab the HTML content, process the content to extract the video links, and display the links on the web page. I managed to make it work on localhost: I can retrieve the video links and download the video there. But when I push it to the server, it works only if I send the request from the server itself. If the page is accessed by a client browser, the client can see the links properly, but when a link is clicked the client gets an HTTP Error 403, every time, even though the link is correct.
My analysis is that when the server makes the HttpWebRequest to grab the HTML content, it sends its own HTTP headers. The HTML content (links to the video file) that is sent back from the YouTube server will, I think, only work for requests matching the headers sent from the server. So when the client clicks on the link, it sends a request to the YouTube server with different HTTP headers.
So I'm thinking of getting the HTTP headers from the client, then modifying the server's HTTP headers to include the client's header info before making the HttpWebRequest. I'm not quite sure if this will work; as far as I know, HTTP headers cannot be modified.
Below is the code that makes the HttpWebRequest in the YouTubeFisher library:
public static YouTubeService Create(string youTubeVideoUrl)
{
    YouTubeService service = new YouTubeService();
    service.videoUrl = youTubeVideoUrl;
    service.GetVideoPageHtmlSource();
    service.GetVideoTitle();
    service.GetDownloadUrl();
    return service;
}

private void GetVideoPageHtmlSource()
{
    HttpWebRequest req = HttpWebRequest.Create(videoUrl) as HttpWebRequest;
    HttpWebResponse resp = req.GetResponse() as HttpWebResponse;
    videoPageHtmlSource = new StreamReader(resp.GetResponseStream(), Encoding.UTF8).ReadToEnd();
    resp.Close();
}
When the client browses the page, the links are there but give HTTP 403:
Browsing the page from the server itself, everything works as expected:
How do I make the HttpWebRequest on behalf of the client, then? Is my analysis of this problem correct?
Thank you for your input.
Use an HTTP monitor such as Charles, Fiddler or even Firebug to find out what additional headers are being sent from the browser in the success case. I suspect you'll need to duplicate one or more of Accept, User-Agent or Referer.
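As a sketch, once the monitor shows which headers matter, they can be set on the HttpWebRequest before the fetch. The parameter values here are placeholders to be filled with whatever Fiddler captured:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class Fetcher
{
    // Replays the headers observed in the successful browser request.
    public static string GetHtml(string url, string userAgent, string referer)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Accept = "text/html";
        req.UserAgent = userAgent;   // e.g. the client browser's User-Agent
        req.Referer = referer;       // e.g. the page the click came from

        using var resp = (HttpWebResponse)req.GetResponse();
        using var reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```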
In the past I've just assumed that YouTube has those links encoded so that they only work for the original request IP. If that's the case it will be far more difficult. I have no clue whether it is or not; try forwarding all the header elements you can before going down that route...
The only possibility that comes to mind is that you'd have to use a javascript request to download the page to the client's browser, then upload that to your server for processing, or do the processing in javascript.
Or you could have the client download the video stream via your server, so your server would pass through the data. This would obviously use a ton of your bandwidth.

Download office document without the web server trying to render it

I'm trying to download an InfoPath template that's hosted on SharePoint. If I hit the URL in Internet Explorer it asks me where to save it, and I get the correct file on my disk. If I try to do this programmatically with WebClient or HttpWebRequest, I get HTML back instead.
How can I make my request so that the web server returns the actual .xsn file and doesn't try to render it as HTML? If Internet Explorer can do this, it's logical to think that I can too.
I've tried setting the Accept property of the request to application/x-microsoft-InfoPathFormTemplate but that hasn't helped. It was a shot in the dark.
I'd suggest using Fiddler or WireShark, to see exactly how IE is sending the request, then duplicating that.
Have you tried spoofing Internet Explorer's User-Agent?
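A minimal sketch of that idea; the User-Agent string and URL are placeholders (capture the real UA with Fiddler if IE succeeds where WebClient fails):

```csharp
using System;
using System.Net;

class Downloader
{
    static void Main()
    {
        using var client = new WebClient();
        // Present an IE-style User-Agent so the server treats us like a browser.
        client.Headers[HttpRequestHeader.UserAgent] =
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
        // Placeholder URL; use the template's real SharePoint address.
        client.DownloadFile("https://sharepoint.example/forms/template.xsn",
                            "template.xsn");
    }
}
```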
There is an HTTP response header that makes an HTTP user agent download a file instead of trying to display it:
Content-Disposition: attachment; filename=paper.doc
I understand that you may not have access to the server, but this is one straightforward way to do it if you can access the server scripts.
See the HTTP/1.1 specification and/or say, Google, for more details on the header.
This is VB.NET, but you should get the point. I've done this with an .aspx page that you pass the filename into; it returns the content type of the file and adds a header to mark it as an attachment, which prompts the browser to treat it as a download.
Response.AddHeader("Content-Disposition", "attachment;filename=filename.xsn")
Response.ContentType = "application/x-microsoft-InfoPathFormTemplate"
Response.WriteFile(FilePath) ' Where FilePath is... the path to your file ;)
Response.Flush()
Response.End()
