Extract contents from HTTP request and then get selected contents from it - c#

Just for learning purposes, I am playing with page requests and responses, and I need to know how to achieve this. What I want to do is make an HTTP request from a Windows application and extract some content from the response. For example,
I am calling http://stackoverflow.com/questions
Now, from the response I want to extract all the question nodes inside <div id="questions">, format them, and display them in a table. Can somebody guide me on how to do that? I hear that I can do the formatting and extracting with regular expressions too, but I'm not sure how.
Thanks in advance
Lura

I suggest using the HTML Agility Pack - it will allow you to get the page directly and query it using XPath, similar to how XmlDocument works.

You can use HttpWebRequest to get the source of the page as follows.
string url = @"http://stackoverflow.com/users";
System.Net.WebRequest request = System.Net.WebRequest.Create(url);
System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)request.GetResponse();
System.IO.StreamReader stream = new System.IO.StreamReader(
    response.GetResponseStream(), System.Text.Encoding.UTF8);
// XmlDocument.Load only succeeds if the response is well-formed XML (an RSS feed, for example).
// Most HTML pages are not, so use an HTML parser for those.
XmlDocument rssDoc = new XmlDocument();
rssDoc.Load(stream);

Related

How to read a page that uses JS to block scraping?

I'm using Xamarin (C#). I tried this code to get data I need in my app:
String url = "http://mmehdirajabiigdl.gigfa.com/VideoImageDownloader.php?link=https://www.instagram.com/tv/CJ_YPTLJ8Zx/?igshid=1jcvto1p6ekxx";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
string mm = sr.ReadToEnd();
The response is wrong because the page uses JavaScript, which prevents my code from getting the HTML.
How can I fix this? I know WebRequest has no option for enabling JavaScript. Maybe I should use WebBrowser, but Xamarin has no WebBrowser.
As far as I know, HttpWebRequest just performs an HTTP request, so you can only scrape the static HTML. It does not execute JavaScript.
You could execute JavaScript with the Xamarin.Forms WebView.
For more details, please check the link below. https://www.xamarinhelp.com/xamarin-forms-webview-executing-javascript/

How do I call httpwebrequest c# .net

I am trying to learn how to use proxies.
My main goal is to be able to enter a proxy address in a text box and use that input as the actual proxy address for the WebBrowser in C#.
But first I need to figure out how to call HttpWebRequest.
I was looking at this question and the answers below it and trying to follow along, but whenever I try to use HttpWebRequest it doesn't even show up in IntelliSense.
I'm referring to this line:
HttpWebRequest request = WebRequest.Create(postUrl) as HttpWebRequest;
how to use http post with proxy support in c#
Here is the code from my button click handler. It uses HttpWebRequest to request the Google home page; the response can be read as XML or HTML, and you can also redirect to the page.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.google.co.in");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Wrap the response stream in a reader
StreamReader resStream = new StreamReader(response.GetResponseStream());
// Read the whole response body into a string
string xmlResult = resStream.ReadToEnd();
// Create the XML document
XmlDocument doc = new XmlDocument();
// Load the response text; LoadXml throws an XmlException if the text is not well-formed XML
doc.LoadXml(xmlResult);
Please refer to image 1 and snapshot 2.
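For the proxy part of the question, here is a minimal sketch, assuming the proxy address arrives as a plain string (for example from a text box). The class name ProxyRequest and its Create method are made up for illustration; WebRequest.Create, HttpWebRequest.Proxy and WebProxy are the standard System.Net types. One likely reason HttpWebRequest does not show up in IntelliSense is a missing using System.Net; directive, so the sketch starts with it.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of any library.
public static class ProxyRequest
{
    // Creates a request for the given URL that will be sent through the given proxy.
    // proxyAddress is expected to look like "http://host:port".
    public static HttpWebRequest Create(string url, string proxyAddress)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Route this request through the proxy instead of the system default.
        request.Proxy = new WebProxy(proxyAddress);
        return request;
    }
}
```

Nothing is sent until GetResponse() is called on the returned request, so an invalid proxy usually only shows up as a WebException at that point.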

C# loading html of a webpage currently on

I am trying to make a small app that can log in to a website automatically, get certain text from the site, and return it to the user.
Here is what I have so far. To log in, I did the following:
System.Windows.Forms.HtmlDocument doc = logger.Document as System.Windows.Forms.HtmlDocument;
try
{
doc.GetElementById("loginUsername").SetAttribute("value", "myusername");
doc.GetElementById("loginPassword").SetAttribute("value", "mypassword");
doc.GetElementById("loginSubmit").InvokeMember("click");
And the following to load the HTML of the page:
WebClient myClient = new WebClient();
Stream response = myClient.OpenRead(webbrowser.Url);
StreamReader reader = new StreamReader(response);
string src = reader.ReadToEnd(); // finally reading html and saving in variable
Now, it successfully loads the HTML, but it is the HTML of the page as seen when not logged in. Is there a way to refer to the current HTML somehow? Or is there another way to achieve my goal? Thank you for reading!
Use the WebClient class with session and cookie support.
Check this Q&A: Using WebClient or WebRequest to login to a website and access data
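WebClient does not keep cookies between requests on its own, which is why the second download above looks logged out. A common workaround, sketched here under that assumption, is a small subclass that attaches one shared CookieContainer to every request. The class name CookieAwareWebClient is hypothetical; GetWebRequest, HttpWebRequest.CookieContainer and CookieContainer are standard System.Net members.

```csharp
using System;
using System.Net;

// Hypothetical subclass: a WebClient that remembers cookies across requests.
public class CookieAwareWebClient : WebClient
{
    // One container shared by every request made through this client.
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; leave other schemes untouched.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }
}
```

With such a client you would first post the login form (for example with UploadValues) and then call DownloadString on the same instance, so the session cookie set by the login is sent with the second request. This only works when the site logs in through a plain form post rather than through JavaScript.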
Why don't you make REST API calls and send data such as the username and password from your code itself?
Is there a Web API for the URL? If so, you can simply call the service and pass the required parameters. The API should return JSON or XML, which you can parse to extract the information.

get query results from a web site in c#

I am using c#. I have imei number of a phone. Need to get details of the phone from http://www.imei.info web site in my c# application.
When I go to the web site and search the imei number of my phone; I see the following URL http://www.imei.info/?imei=356061042215493 with my phone details.
How can I do this in my c# application?
You can build the URL at run time, download the HTML page, and then parse it and extract the information you want using HtmlAgilityPack. See the code below as an example; you can then parse the returned data to extract your information.
// WebPage is assumed to be a field holding the base URL, e.g. "http://www.imei.info/?imei="
private List<HtmlNode> GetPageData(string imei)
{
HtmlDocument doc = new HtmlDocument();
WebClient webClient = new WebClient();
string strPage = webClient.DownloadString(
string.Format("{0}{1}", WebPage, imei));
doc.LoadHtml(strPage);
//Change parsing schema down here
return doc.DocumentNode.SelectNodes("//table[@class='sortable autostripe']//tbody//tr//td").ToList();
}
Unless they have an API, you're going to need to read the page details using an XML parser such as LINQ to XML or XmlReader.
See WebClient.DownloadString and HtmlAgilityPack

C# HTTP programming

I want to build a piece of software that will process some HTML forms. It will be a kind of bot that processes some forms on my website automatically.
Can anyone give me some basic steps for doing this? Any tutorials, samples, books, or anything else would help.
Can someone post working code that uses the POST method?
Check out How to: Send Data Using the WebRequest Class. It gives an example of how create a page that posts to another page using the HttpWebRequest class.
To fill out the form...
Find all of the INPUT or TEXTAREA elements that you want to fill out.
Build the data string that you are going to send back to the server. The string is formatted like "name1=value1&name2=value2" (just like in the querystring). Each value will need to be URL encoded.
If the form's "method" attribute is "GET", then take the URL in the "action" attribute, add a "?" and the data string, then make a "GET" web request to the URL.
If the form's "method" is "POST", then the data is submitted in a different area of the web request. Take a look at this page for the C# code.
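The steps above can be sketched roughly as follows. This is only an illustration: the FormData class and its method names are made up, and it uses Uri.EscapeDataString for the URL encoding, which writes a space as %20 rather than the + that browsers send for forms (servers normally accept both).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FormData
{
    // Joins name/value pairs into "name1=value1&name2=value2",
    // URL-encoding every name and value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }

    // For a GET form: append "?" and the data string to the form's action URL.
    public static string BuildGetUrl(string actionUrl,
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        return actionUrl + "?" + Build(fields);
    }
}
```

For a POST form, the same data string would instead be converted to bytes and written to the request body, with the request's Method set to "POST" and its ContentType set to "application/x-www-form-urlencoded".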
To expand on David's and JP's answers:
Assuming you're working with forms whose contents you're not familiar with, you can probably...
1. Pull the page with the form via an HttpWebRequest.
2. Load it into an XmlDocument.
3. Use XPath to traverse/select the form elements.
4. Build your query string/post data based on the elements.
5. Send the data with HttpWebRequest.
If the form's structure is known in advance, you can really just start at #4.
(untested) example (my XPath is not great, so the syntax is almost certainly not quite right):
HttpWebRequest request;
HttpWebResponse response;
XmlDocument xml = new XmlDocument();
string form_url = "http://...."; // you supply this
string form_submit_url;
XmlNodeList element_nodes;
XmlElement form_element;
StringBuilder query_string = new StringBuilder();
// #1
request = (HttpWebRequest)WebRequest.Create(form_url);
response = (HttpWebResponse)request.GetResponse();
// #2 (Load throws an XmlException if the page is not well-formed XML)
xml.Load(response.GetResponseStream());
// #3a
form_element = (XmlElement)xml.SelectSingleNode("//form[@name='formname']");
form_submit_url = form_element.GetAttribute("action");
// #3b
element_nodes = form_element.SelectNodes(".//input | .//select | .//textarea");
// #4
foreach (XmlElement input_element in element_nodes) {
if (query_string.Length > 0) { query_string.Append("&"); }
// MyFormElementValue() is a function/value you need to provide/define.
string name = input_element.GetAttribute("name");
query_string.Append(Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(MyFormElementValue(name)));
}
// #5
// This is a GET request, you can figure out POST as needed, and deduce the submission type via the <form> element's attribute.
request = (HttpWebRequest)WebRequest.Create(form_submit_url + "?" + query_string.ToString());
References:
http://www.developerfusion.com/forum/thread/26371/
http://msdn.microsoft.com/en-us/library/system.xml.xmlelement.getattribute.aspx
http://msdn.microsoft.com/en-us/library/system.xml.xmlelement.selectnodes.aspx
If you don't want to go the HttpWebRequest route, I would suggest WatiN. It makes it very easy to automate IE or Firefox without worrying about the internals of the HTTP requests.
