How to read content from a webpage in C#

I want to access a webpage and store its contents in a database.
This is the code I have tried for reading the contents of the webpage:
public static WebClient wClient = new WebClient();
public static TextWriter textWriter;
public static String readFromLink()
{
string url = "http://www.ncedc.org/cgi-bin/catalog-search2.pl";
HttpWebRequest webRequest = WebRequest.Create(url) as HttpWebRequest;
webRequest.Method = "POST";
System.Net.WebClient client = new System.Net.WebClient();
byte[] data = client.DownloadData(url);
string html = System.Text.Encoding.UTF8.GetString(data);
return html;
}
public static bool WriteTextFile(String fileName, String t)
{
try
{
textWriter = new StreamWriter(fileName);
}
catch (Exception)
{
return false;
Console.WriteLine("Data Save Unsuccessful: Could Not create File");
}
try
{
textWriter.WriteLine(t);
}
catch (Exception)
{
return false;
Console.WriteLine("Data Save UnSuccessful: Could Not Save Data");
}
textWriter.Close();
return true;
Console.WriteLine("Data Save Successful");
}
static void Main(string[] args)
{
String saveFile = "E:/test.txt";
String reSultString = readFromLink();
WriteTextFile(saveFile, reSultString);
Console.ReadKey();
}
But this code gives me the following output: This script should be referenced with a METHOD of POST. REQUEST_METHOD=GET
Please tell me how to resolve this.

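You are mixing HttpWebRequest code with System.Net.WebClient code. They are two different APIs: the webRequest object you set to POST is never sent, and client.DownloadData issues a plain GET. You can use WebClient.UploadValues to send a POST with WebClient. You will also need to provide some POST data: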
System.Net.WebClient client = new System.Net.WebClient();
NameValueCollection postData = new NameValueCollection();
postData.Add("format","ncread");
postData.Add("mintime","2002/01/01,00:00:00");
postData.Add("minmag","3.0");
postData.Add("etype","E");
postData.Add("outputloc","web");
postData.Add("searchlimit","100000");
byte[] data = client.UploadValues(url, "POST", postData);
string html = System.Text.Encoding.UTF8.GetString(data);
You can find out which parameters to pass by inspecting the POST message in Fiddler. And yes, as @Chris Pitman commented, use File.WriteAllText(path, html); to save the result.
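Putting the pieces of this answer together, a minimal sketch might look like the following. The form fields are the ones listed above; whether the server still accepts exactly these values is an assumption, and the class name, method names, and the output path catalog.txt are made up for illustration.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Text;

public static class CatalogSearch
{
    // Form fields copied from the answer above; adjust them to your own search.
    public static NameValueCollection BuildPostData()
    {
        var postData = new NameValueCollection();
        postData.Add("format", "ncread");
        postData.Add("mintime", "2002/01/01,00:00:00");
        postData.Add("minmag", "3.0");
        postData.Add("etype", "E");
        postData.Add("outputloc", "web");
        postData.Add("searchlimit", "100000");
        return postData;
    }

    // UploadValues sends the collection as a form-encoded POST body
    // and returns the raw response bytes.
    public static string PostForm(string url, NameValueCollection postData)
    {
        using (var client = new WebClient())
        {
            byte[] data = client.UploadValues(url, "POST", postData);
            return Encoding.UTF8.GetString(data);
        }
    }

    public static void Main()
    {
        string url = "http://www.ncedc.org/cgi-bin/catalog-search2.pl";
        string saveFile = "catalog.txt"; // hypothetical output path
        try
        {
            string html = PostForm(url, BuildPostData());
            File.WriteAllText(saveFile, html); // replaces the hand-written StreamWriter code
            Console.WriteLine("Saved " + html.Length + " characters to " + saveFile);
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

Keeping the field list in its own method makes it easy to compare against what Fiddler shows for the real form submission.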

I'm not sure if it's a fault on your side as I get the same message just by opening the page. The page source does not contain any html so I don't think you can do webRequest.Method = "POST". Have you spoken to the administrators of the site?

The .NET framework provides a rich set of methods to access data stored on the web. First you will have to include the right namespaces:
using System.Text;
using System.Net;
using System.IO;
The HttpWebRequest object allows us to create a request to the URL, and the WebResponse allows us to read the response to the request.
We’ll use a StreamReader object to read the response into a string variable.
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
In this code sample, the URL variable should contain the URL that you want to get, and the result variable will contain the contents of the web page. You may want to add some error handling as well for a real application.
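As a rough illustration of the error handling mentioned above, the same GET can be wrapped in using blocks and a catch for WebException. This is only a sketch: the names PageReader and ReadPage, the timeout parameter, and the choice to return null on failure are assumptions, and example.com stands in for the real URL.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageReader
{
    // Returns the page body, or null if the request fails.
    public static string ReadPage(string url, int timeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Timeout = timeoutMs;
        try
        {
            // The using blocks close the response and reader even if reading throws.
            using (WebResponse response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // Covers connection failures, timeouts, and 4xx/5xx responses.
            Console.Error.WriteLine("Request failed (" + ex.Status + "): " + ex.Message);
            return null;
        }
    }

    public static void Main()
    {
        string result = ReadPage("http://example.com/", 10000); // placeholder URL
        Console.WriteLine(result ?? "No content retrieved.");
    }
}
```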

As far as I can see, the URL you're requesting is a Perl script. I think it requires a POST carrying the search arguments and only then delivers search results.

Related

How to Get Data From PHP C#

I have a PHP script on my host that holds the link to the new version of my program. How can I get that link from the PHP script? I want to fetch the link and save it in a string.
I often use this code for something like this:
webbrowser.Navigate("MyPhp Uri");
webbrowser.Document.ExecCommand("SelectAll", false, null);
webbrowser.Document.ExecCommand("Copy", false, null);
Then I paste it into a TextBox:
textbox1.Paste();
But this doesn't seem like a proper way to get data from PHP.
Can you help me?
You should use WebRequest instead.
I'm not posting a complete solution because I'm pretty sure you will find one as soon as you know what to search for:
using System;
using System.IO; // needed for Stream and StreamReader below
using System.Net;
//create a request object and server call
Uri requestUri = new Uri("MyPhp Uri");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
//set all properties you need for the request, like
request.Method = "GET";
request.BeginGetResponse(new AsyncCallback(ProcessResponse), request);
//handle response
private void ProcessResponse(IAsyncResult asynchronousResult)
{
string responseData = string.Empty;
HttpWebRequest myrequest = (HttpWebRequest)asynchronousResult.AsyncState;
using (HttpWebResponse response = (HttpWebResponse)myrequest.EndGetResponse(asynchronousResult))
{
Stream responseStream = response.GetResponseStream();
using (var reader = new StreamReader(responseStream))
{
responseData = reader.ReadToEnd();
}
responseStream.Close();
}
//TODO: do something with your responseData
}
Please note: you should definitely add some try/catch blocks. This is only a short example to point you in the right direction.

Scraping of a website on a AJAX based Request C#

I have a website like this: http://www.lfp.fr/ligue1/feuille_match/52255 and I want to switch between the infoMatch and Statistiques tabs, but my request only returns the data of the first tab. When I use Firebug to check the response, it shows me this:
GET showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes
string url="http://www.lfp.fr/ligue1/feuille_match/52255";
string getData = "?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes";
System.Uri uriObj = new System.Uri(url);
String Methode = "GET";
lgRequest = (HttpWebRequest)WebRequest.CreateDefault(uriObj);
lgRequest.Method = Methode;
lgRequest.ContentType = "text/html";
SetRequestHeader("Accept", "text/html");
SetRequestHeader("Cache-Control", "no-cache");
SetRequestHeader("Content-Length", getData.Length.ToString());
StreamWriter stream = new StreamWriter
(lgRequest.GetRequestStream(), Encoding.ASCII);
stream.Write(body);
stream.Close();
lgResponse = (HttpWebResponse)lgRequest.GetResponse();
But it gives me the error "Cannot send a content-body with this verb-type." When I use "POST" as the method, I get an HTML response, but it contains only the first page's data, not Statistiques.
Try the following address: http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes
Like this:
using System;
using System.Net;
class Program
{
static void Main()
{
using (var client = new WebClient())
{
string result = client.DownloadString("http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes");
Console.WriteLine(result);
}
}
}
Notice that I have used a WebClient instead of WebRequest which makes the code much shorter and easier to understand.
Once you have downloaded the HTML from the remote site you might consider using an HTML parsing library such as HTML Agility Pack to extract the useful information from the markup you have scraped.
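The "Cannot send a content-body with this verb-type" error in the question comes from writing the parameters into the request stream of a GET. With GET, the parameters belong in the URL itself. A small sketch of building such a URL from name/value pairs follows; the helper name QueryUrl.Build is hypothetical, and the parameter values are the ones from the question.

```csharp
using System;
using System.Collections.Generic;

public static class QueryUrl
{
    // Appends percent-encoded name=value pairs to baseUrl as a query string.
    public static string Build(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var parts = new List<string>();
        foreach (KeyValuePair<string, string> p in parameters)
        {
            parts.Add(Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));
        }
        return baseUrl + "?" + string.Join("&", parts);
    }

    public static void Main()
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("matchId", "52255"),
            new KeyValuePair<string, string>("domId", "112"),
            new KeyValuePair<string, string>("extId", "24"),
            new KeyValuePair<string, string>("live", "0"),
            new KeyValuePair<string, string>("domNomClub", "AJ Auxerre"),
            new KeyValuePair<string, string>("extNomClub", "FC Nantes")
        };
        string url = Build("http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch", parameters);
        Console.WriteLine(url); // pass this to WebClient.DownloadString
    }
}
```

Note that Uri.EscapeDataString encodes a space as %20 rather than +, which is a different but equally valid spelling of the query shown by Firebug.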

How to POST Raw Data using C# HttpWebRequest

I am trying to make a POST request in which I am supposed to send raw POST data.
Which property should I modify to achieve this?
Is it the HttpWebRequest.ContentType property? If so, what value should I assign to it?
public static string HttpPOST(string url, string querystring)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "POST"; // the default is GET, which cannot carry a request body
request.ContentType = "application/x-www-form-urlencoded"; // or whatever - application/json, etc, etc
// Disposing the writer flushes the body and closes the request stream.
using (StreamWriter requestWriter = new StreamWriter(request.GetRequestStream()))
{
requestWriter.Write(querystring);
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
return sr.ReadToEnd();
}
}
You want to set the ContentType property to the MIME type of the data. If it's a file, it depends on the type of file; if it's plain text, use text/plain; and if it's arbitrary binary data for your own local purposes, use application/octet-stream. For text-based formats you'll want to include the charset along with the content type, e.g. "text/plain; charset=UTF-8".
You'll then want to call GetRequestStream() and write the data to the stream returned.
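The two steps described here (set ContentType with a charset, then write to the stream from GetRequestStream()) can be sketched as below. The class and method names are made up for illustration, the UTF-8 charset is an assumption about your data, and example.com is a placeholder that would need to be replaced by an endpoint that accepts POST.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class RawPost
{
    // Creates a POST request whose Content-Type names both the media type and the charset.
    public static HttpWebRequest Prepare(string url, string mediaType)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = mediaType + "; charset=UTF-8";
        return request;
    }

    // Writes the body as raw UTF-8 bytes and returns the response text.
    public static string Send(HttpWebRequest request, string body)
    {
        byte[] payload = Encoding.UTF8.GetBytes(body);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Only prepares the request; call Send(request, "...") against a real endpoint.
        HttpWebRequest request = Prepare("http://example.com/api", "text/plain");
        Console.WriteLine(request.Method + " " + request.ContentType);
    }
}
```

Encoding the body to bytes first keeps the charset declared in the header and the bytes actually sent in agreement.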

Reading remote file [C#]

I am trying to read a remote file using HttpWebRequest in a C# Console Application. But for some reason the request is empty - it never finds the URL.
This is my code:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://uo.neverlandsreborn.org:8000/botticus/status.ecl");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
How come this is not possible?
The file only contains a string. Nothing more!
How are you reading the response data? Does it come back as successful but empty, or is there an error status?
If that doesn't help, try Wireshark, which will let you see what's happening at the network level.
Also, consider using WebClient instead of WebRequest - it does make it incredibly easy when you don't need to do anything sophisticated:
string url = "http://uo.neverlandsreborn.org:8000/botticus/status.ecl";
WebClient wc = new WebClient();
string data = wc.DownloadString(url);
You have to get the response stream and read the data out of that. Here's a function I wrote for one project that does just that:
private static string GetUrl(string url)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusCode != HttpStatusCode.OK)
throw new ServerException("Server returned an error code (" + ((int)response.StatusCode).ToString() +
") while trying to retrieve a new key: " + response.StatusDescription);
using (var sr = new StreamReader(response.GetResponseStream()))
{
return sr.ReadToEnd();
}
}
}

WebRequest to connect to the Wikipedia API

This may be a pathetically simple problem, but I cannot seem to format the post webrequest/response to get data from the Wikipedia API. I have posted my code below if anyone can help me see my problem.
string pgTitle = txtPageTitle.Text;
Uri address = new Uri("http://en.wikipedia.org/w/api.php");
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
string action = "query";
string query = pgTitle;
StringBuilder data = new StringBuilder();
data.Append("action=" + HttpUtility.UrlEncode(action));
data.Append("&query=" + HttpUtility.UrlEncode(query));
byte[] byteData = UTF8Encoding.UTF8.GetBytes(data.ToString());
request.ContentLength = byteData.Length;
using (Stream postStream = request.GetRequestStream())
{
postStream.Write(byteData, 0, byteData.Length);
}
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
// Get the response stream.
StreamReader reader = new StreamReader(response.GetResponseStream());
divWikiData.InnerText = reader.ReadToEnd();
}
You might want to try a GET request first because it's a little simpler (you will only need to POST for the Wikipedia login). For example, try to simulate this request:
http://en.wikipedia.org/w/api.php?action=query&prop=images&titles=Main%20Page
Here's the code:
HttpWebRequest myRequest =
(HttpWebRequest)WebRequest.Create("http://en.wikipedia.org/w/api.php?action=query&prop=images&titles=Main%20Page");
using (HttpWebResponse response = (HttpWebResponse)myRequest.GetResponse())
{
string ResponseText;
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
ResponseText = reader.ReadToEnd();
}
}
Edit: The other problem he was experiencing with the POST request was this exception: The remote server returned an error: (417) Expectation failed. It can be solved by setting:
System.Net.ServicePointManager.Expect100Continue = false;
(This is from: HTTP POST Returns Error: 417 "Expectation Failed.")
I'm currently in the final stages of implementing a C# MediaWiki API which allows easy scripting of most MediaWiki viewing and editing actions.
The main API is here: http://o2platform.googlecode.com/svn/trunk/O2%20-%20All%20Active%20Projects/O2_XRules_Database/_Rules/APIs/OwaspAPI.cs and here is an example of the API in use:
var wiki = new O2MediaWikiAPI("http://www.o2platform.com/api.php");
wiki.login(userName, password);
var page = "Test"; // "Main_Page";
wiki.editPage(page,"Test content2");
var rawWikiText = wiki.raw(page);
var htmlText = wiki.html(page);
return rawWikiText.line().line() + htmlText;
You seem to be pushing the input data on HTTP POST, but it seems you should use HTTP GET.
From the MediaWiki API docs:
The API takes its input through parameters in the query string. Every module (and every action=query submodule) has its own set of parameters, which is listed in the documentation and in action=help, and can be retrieved through action=paraminfo.
http://www.mediawiki.org/wiki/API:Data_formats
