How to get the webpage source code using C#

How to get the webpage source code using C# - c#

I know about the WebRequest and the WebResponse objects. The problem is that I do not really want to get the source code of the webpage, I only want to check to see if the link exists or not. The thing is, if I use the GetResponse method, it goes an pull the entire source code of the site.
I am creating a broken link checker with many links. It takes quite a while to check them all. If there a way to to get MINIMAL information from a weblink? Only enough information to see if the link is valid or broken (not the entire source code).
An answer (BESIDES USING ASYNCHRONOUS TRANSFER) would be greatly appreciated!

WebRequest request = HttpWebRequest.Create("http://www.foo.com/");
request.Method = "HEAD"; // Just get the document headers, not the data.
HEAD is similar to GET, only that instead of getting the file contents, we get just the headers.

A standard way of checking the existence of a link is to use a HEAD request, which causes the remote server to send the headers for the requested object, but not the object itself. If you thus requested an object that is not on the server, the server gives you the normal 404 response, but if it does exist, you get a 200 response and no data after the headers. This way very little uninteresting data goes over the wire.

Related

Pass data to a URL on a different web server without the QueryString

I've got an .ashx handler which, upon finishing processing will redirect to a success or error page, based on how the processing went. The handler is in my site, but the success or error pages might not be (this is something the user can configure).
Is there any way that I can pass the error details to the error page without putting it in the query string?
I've tried:
Adding a custom header that contains the error details, but since I'm using a Response.Redirect, the headers get cleared
Using Server.Transfer, instead of Response.Redirect, but this will not work for URLs not in my site
I know that I can pass data in the query string, but in some cases the data I need to pass might be too long for the query string. Do I have any other options?

Essentially, no. The only way to pass additional data in a GET request (i.e. a redirect) is to pass it in the query string.
The important thing to realise is that this is not a limitation of WebForms, this is just how HTTP works. If you're redirecting to another page that's outside of your site (and thus don't have the option of cookies/session data), you're going to have to send information directly in the request and that means using a query string.
Things like Server.Transfer and Response.Redirect are just abstractions over a simple HTTP request; no framework feature can defy how HTTP actually works.
You do, of course, have all kinds of options as to what you pass in the query string, but you're going to have to pass something. If you really want to shorten the URL, maybe you can pass an error code and expose an API that will let the receiving page fetch further information:
Store transaction information (or detailed error messages) in a database with an ID.
Pass the ID in the query string.
Expose a web method or similar API to allow the receiving page to request additional information.
There are plenty of hacky ways you could create the illusion of passing data in a redirect outside of a form post (such as returning a page containing a form and Javascript to immediately do a cross-domain form post) but the query string is the proper way of passing data in a GET request, so why try to hack around it?

If you must perform a redirect, you will need to pass some kind of information in the Query String, because that's how browser redirects work. You can be creative about how you pass it, though.
You could pass an error code, and have the consuming system know what various error codes mean.
You could pass a token, and have the consuming system know how to ask your system about the error information for the given token behind-the-scenes.
Also, if you have any flexibility around whether it's actually performing a redirect, you could use an AJAX request in the first place, and send back some kind of JSON object that the browser's javascript could interpret and send via a POST parameter or something like that.

A redirect is executed by most browsers as a GET, which means you'd have to put the data in the query string.
One trick (posted in two other answers) to do a "redirect" as a POST is to turn the response into a form that POSTs itself to the target site:
Response.Clear();
StringBuilder sb = new StringBuilder();
sb.Append("<html>");
sb.AppendFormat(#"<body onload='document.forms[""form""].submit()'>");
sb.AppendFormat("<form name='form' action='{0}' method='post'>",postbackUrl);
<!-- POST values go here -->
sb.AppendFormat("<input type='hidden' name='id' value='{0}'>", id);
sb.Append("</form>");
sb.Append("</body>");
sb.Append("</html>");
Response.Write(sb.ToString());
Response.End();
But I would read the comments on both to understand the limitations.

Basically there are two usual HTTP ways to send some data - GET and POST.
When you redirect to another URL with additional parameters, you make the client browser to send the GET request to the target server. Technically, your server responds to the browser with specific HTTP error code 307 + the URL to go (including the GET parameters).
Alternatively, you may want/need to make a POST request to the target URL. In that case you should respond with a simple HTML form, which consists of several hidden fields pre-filled with certain values. The form's action should point the target URL, method should be "POST", and of course your HTML should include javascript, which automatically submits the form once the document is loaded. This way the client browser would send the POST request instead of the GET one.

I have httpwebrequest object, want to upload it in parts

I have below code in a third party function, it posts file to a webserver. I want to post data in parts, what change should i do in the code. Below code is working and the "request" object contains everything.
private static HttpWebResponse GetRawResponse(HttpWebRequest request)
{
return (HttpWebResponse)request.GetResponse();
}
Also is there a way in which I can find out what is the full name (with path) of the file which is going to be uploaded from the httpwebrequest object.
Thanks.

HttpWebRequest does not spawn multiple HTTP requests. You cannot upload a file in chunks unless you actually create multiple HttpWebRequests. I don't think that would be useful unless you have a server side process to stitch them back together.
If you still want true chunked uploading you might want to look at raw TCP or some other mechanism.
Hope that helps. See this post as well.

C# Sending an HttpWebRequest without the servername

Basically what I've been trying to do is download a file off a server. The server sends a redirect automatically which is fine, but through packet sniffing a program that does successfully download the file I've found that the Headers (for the second request) are:
GET /path/to/file.txt
...
Host: server.com
Rather than the current response being generated (what I thought was standard):
GET www.server.com/path/to/file.txt
Using the normal HttpWebRequest method results in a 500 server error, and I get exceptions thrown when trying to use just the relative path as one would expect.
Using AllowAutoRedirect does not work for this scenario as the cookies are not handled properly, but even if I handle it manually the same error occurs.
How does one go about doing this (preferably without sockets :D)?

To be honest, I'm really not sure what you are asking, but you mentioned cookie troubles. As a total shot in the dark guess, are you setting the CookieContainer on your WebRequest?
request.CookieContainer = new CookieContainer();
request.AllowAutoRedirect = true;

Getting a raw socket from an IIS request

I'm trying to get the raw data sent to IIS using a HttpHandler. However, because the request is an "GET"-request without the "Content-Length" header set it reports that there is no data to read (TotalBytes), and the inputstream is empty. Is there any way I can plug into the IIS-pipeline (maybe even before the request is parsed) and just kind of take control over the request and read it's raw data? I don't care if I need to parse headers and stuff like that myself, I just want to get my hands on the actual request and tell IIS to ignore this one. Is that at all possible? Cause right now it looks like I need to do the alternative, which is developing a custom standalone server, and I really don't want to do that.

Most web servers will ignore (and rarely give you access to) the body of a GET request, because the HTTP semantics imply that it is to be ignored anyway. You should consider another method (for example POST or PUT).
See this question and the link in this answer:
HTTP GET with request body

Get html that is generated via AJAX in webclient

I often go to a site to look stuff up. I thought to myself: "Hold on. I can program. Why am I going to this site manually when I can write a piece of software that does it for me?".
And so I started. I'm using C#, so I found WebClient and Uri.
I've managed to get the source code for the site, yet the problem occurred that the specific data I'm looking for is generated via AJAX, after the source code has loaded.
So that's my problem. How can I get that code, if it needs to be requested via an AJAX call first?

The general approach is this:
using a tool like Fiddler, find out which HTTP requests are made by the browser in order to fetch the data you're looking for.
use WebClient to fetch the HTTP request(s) you need.
Take a look at my answer to this question for more info about HTML screen scraping for more details and how to work around various issues you may run across.
For #1 above, here's how to use fiddler to understand how a specific request is being made:
First, find the request you care about (the request which contains the data you want in its response). You can do this by inspecting each request by double-clicking it on the left pane in fiddler and looking inside the "text fiew" tab on the lower-right pane. You can also use CTRL+F to find content across multiple requests, but some requests are compressed so you'll want to ensure the "autodecode" button is selected in the toolbar before making your requests if you want to be sure you can text-search across all of them.
Once you've found the request you want, double-click it in Fiddler and select the "headers" tab in the upper-right pane. Those are the headers being sent. If your client sends exactly these headers to the server, you should get back the same data. But usually not all the headers are needed, so you'll want to figure out which ones are needed. You do this using Fiddler's Request Builder tab in the upper-right pane. Select that tab and drag your data request over from the left pane onto the request builder. Then submit the request to validate that it returns the correct results. Then start deleting headers, one header at a time, until the request stops working-- you know that that header was required. Try to delete each header until you find the ones that are required.
Then, you'll need to write code to generate the right header. Don't worry about the Host: header, that's generated automatically for you. For the Cookie: header, you'll need to generate it using the CookieContainer class. For the other headers (e.g. UserAgent:, Accept:, etc. you can generally copy them and add them to your request as-is.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.