How to read a page that uses JS to block scraping? - c#

I'm using Xamarin (C#). I tried this code to get data I need in my app:
String url = "http://mmehdirajabiigdl.gigfa.com/VideoImageDownloader.php?link=https://www.instagram.com/tv/CJ_YPTLJ8Zx/?igshid=1jcvto1p6ekxx";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
string mm = sr.ReadToEnd();
The response is wrong: the page uses JavaScript, which prevents my code from getting the real HTML.
How can I fix this? I know WebRequest has no option to enable JavaScript. Maybe I should use WebBrowser, but Xamarin has no WebBrowser control.

As far as I know, HttpWebRequest only performs an HTTP request, so you can scrape the static HTML it returns. It does not execute JavaScript.
You can execute JavaScript with the Xamarin.Forms WebView.
For more details, see https://www.xamarinhelp.com/xamarin-forms-webview-executing-javascript/
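A minimal sketch of that approach, assuming a Xamarin.Forms version that provides WebView.EvaluateJavaScriptAsync and a WebView that is already part of a displayed page. The class and method names (RenderedHtml, ReadAsync) are made up for illustration. It navigates to the URL, waits for the Navigated event, then asks the page for its DOM after the scripts have had a chance to run:

```csharp
using System;
using System.Threading.Tasks;
using Xamarin.Forms;

// Hypothetical helper, not part of Xamarin.Forms.
public static class RenderedHtml
{
    // Navigates webView to url and returns the document's HTML once
    // navigation has finished.
    public static Task<string> ReadAsync(WebView webView, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        EventHandler<WebNavigatedEventArgs> onNavigated = null;

        onNavigated = async (sender, e) =>
        {
            // Handle only the first completed navigation.
            webView.Navigated -= onNavigated;

            if (e.Result != WebNavigationResult.Success)
            {
                tcs.TrySetException(
                    new InvalidOperationException("Navigation failed: " + e.Result));
                return;
            }

            try
            {
                // Runs inside the page, so it sees the DOM as modified by scripts.
                string html = await webView.EvaluateJavaScriptAsync(
                    "document.documentElement.outerHTML");
                tcs.TrySetResult(html);
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex);
            }
        };

        webView.Navigated += onNavigated;
        webView.Source = url; // string converts implicitly to a WebViewSource
        return tcs.Task;
    }
}
```

If the page first redirects through a script-set cookie, Navigated may fire for the intermediate page, so you may need to wait for a later navigation or check the returned HTML before using it.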

Related

How do I call httpwebrequest c# .net

I am trying to learn how to use proxies.
My main goal is to be able to enter a proxy address in a text box and use that input as the actual proxy address for the WebBrowser control in C#.
But first I need to figure out how to call HttpWebRequest.
I was looking at this question and the answers below it and trying to follow along, but whenever I try to use HttpWebRequest it doesn't even show up in IntelliSense.
I'm referring to this line:
HttpWebRequest request = WebRequest.Create(postUrl) as HttpWebRequest;
how to use http post with proxy support in c#
Here is the code from my button click handler. It uses HttpWebRequest to request the Google home page; you can get either XML or HTML back, and the request also follows redirects.
using System.IO;
using System.Net;
using System.Xml;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.google.co.in");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
//Get the response as a stream from the HttpWebResponse
StreamReader resStream = new StreamReader(response.GetResponseStream());
//Create an instance of the XML document
XmlDocument doc = new XmlDocument();
//Read the response stream into a string
string xmlResult = resStream.ReadToEnd();
//Load xmlResult into the XML document (throws XmlException if the response is not well-formed XML, e.g. an HTML page)
doc.LoadXml(xmlResult);
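HttpWebRequest lives in the System.Net namespace, so a common reason it does not show up in IntelliSense is a missing using System.Net; at the top of the file. For the proxy part, here is a rough sketch, assuming the text box holds something like host:port. ProxySketch, ParseProxy and CreateRequest are hypothetical names used only for this example:

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
public static class ProxySketch
{
    // Parses "host:port" (for example the Text of a text box) into a WebProxy.
    public static WebProxy ParseProxy(string input)
    {
        string[] parts = input.Trim().Split(':');
        if (parts.Length != 2)
        {
            throw new FormatException("Expected the form host:port");
        }

        int port = int.Parse(parts[1]); // throws FormatException if not a number
        return new WebProxy(parts[0], port);
    }

    // Creates a request that will be sent through the given proxy.
    public static HttpWebRequest CreateRequest(string url, string proxyText)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = ParseProxy(proxyText);
        return request;
    }
}
```

Note that this only sets the proxy for the HttpWebRequest. As far as I know, the WebBrowser control does not pick this setting up, so that part would need a separate solution.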

Is it possible to access a webpage without a webbrowser?

I want to visit a web page (it has to be accessed, nothing needs to be read, modified, etc. Just accessed). I don't want to use webbrowser.
Just do a cURL GET request.
curl http://example.com/
And if you want to use C#, then
using System.IO;
using System.Net;
string url = "https://www.example.com/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// GetResponse() sends the request; the page has been accessed once it returns
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream resStream = response.GetResponseStream();
Also, you can use Fiddler to send requests to the remote server (it is very helpful for debugging services).
Try the WebClient class, for example:
WebClient client = new WebClient();
// address is the URL of the page to access
string reply = client.DownloadString(address);

Login to website using C#

Before everyone gets upset that this has already been answered: I have scoured the web looking for how to do this and have tried a number of methods. I looked at "Login to website, via C#" and "How to programmatically log in to a website to screenscape?". Both were helpful, but I cannot figure out why I cannot get past the login page. Here is my code:
string url = "https://www.advocare.com/login.aspx";
string url2 = "https://url.after.login";
HttpWebRequest wReq = WebRequest.Create(url) as HttpWebRequest;
wReq.KeepAlive = true;
wReq.Method = "POST";
wReq.AllowAutoRedirect = false;
wReq.ContentType = "application/x-www-form-urlencoded";
string postData = "ctl00$cphContent$txtUserName=Username&ctl00$cphContent$txtPassword=Password";
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
wReq.ContentLength = dataBytes.Length;
using (Stream postStream = wReq.GetRequestStream())
{
postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse wResp = wReq.GetResponse() as HttpWebResponse;
string pageSource;
wReq = WebRequest.Create(url2) as HttpWebRequest;
wReq.CookieContainer = new CookieContainer();
wReq.CookieContainer.Add(wResp.Cookies);
HttpWebResponse wResp2 = wReq.GetResponse() as HttpWebResponse;
using (StreamReader sr = new StreamReader(wResp2.GetResponseStream()))
{
pageSource = sr.ReadToEnd();
}
Every time I look at pageSource it is the HTML for the login.aspx page. I must be missing something here. Maybe it's not taking the cookie, I don't know. One question I have, aside from why this doesn't work, is about the string postData = "". Are those supposed to be the name or the id portion of the HTML tag? Any help on this is greatly appreciated, as I am stumped and will otherwise have to find a different way. I would like to continue with WebRequest and WebResponse instead of using WebBrowser. If I can't, oh well. Thanks again for any help!
What are you trying to do besides log in? If it's something like QAing a site programmatically, I would suggest using Selenium and creating a C# app based on that. If you want, I can post a link to a base project for a Selenium-based project.
Don't necessarily view the page source, but look at the actual HTTP POST. Install an HTTP proxy such as Fiddler and then revisit the page you are trying to emulate. Complete the HTTP POST request and check the results captured by the proxy. From there you'll be able to see the actual parameters, cookies, headers, etc. that are being passed, and you can then attempt to replicate them in your code. It's easy to miss something when simply viewing the HTML source, but monitoring the network traffic is pretty straightforward.
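On the two concrete points in the question: form fields are submitted by their name attribute, not their id, and HttpWebResponse.Cookies stays empty unless a CookieContainer was assigned to the request before GetResponse() was called. Below is a sketch of sharing one CookieContainer across both requests. LoginSketch, BuildFormBody and LoginAndFetch are hypothetical names, and whether this particular site needs additional hidden fields (ASP.NET WebForms pages commonly expect ones such as __VIEWSTATE and __EVENTVALIDATION) is something to confirm in Fiddler:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper for illustration only.
public static class LoginSketch
{
    // Encodes name/value pairs as application/x-www-form-urlencoded,
    // so characters such as '$' and '&' in names or values are escaped.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    // Posts the login form, then requests a second page with the same cookies.
    public static string LoginAndFetch(string loginUrl, string protectedUrl,
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        var cookies = new CookieContainer(); // shared by both requests

        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies; // must be set before GetResponse()

        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));
        login.ContentLength = body.Length;
        using (Stream stream = login.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        // Cookies sent back by the server are stored in the shared container.
        using (login.GetResponse()) { }

        var page = (HttpWebRequest)WebRequest.Create(protectedUrl);
        page.CookieContainer = cookies; // sends the login cookies back
        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the result is still the login page, comparing the request this code sends with the browser's request in Fiddler should show which field, header or cookie is missing.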

Issues retrieving facebook social plugin comments for page, C# HttpWebRequest class

I'm hoping I've done something knuckle-headed here and there is an easy answer. I'm simply trying to retrieve the list of comments for a page on my site. I use the social plug-in and then retrieve the comment id via the edge event. Server side, I send the page id back and do a simple request using an HttpWebRequest. This worked well back in October, but now I get an 'internal error' response from FB. If I paste the same URL string into a browser, I get the comments back as JSON.
StringBuilder url = new StringBuilder();
url.Append("https://graph.facebook.com/comments/?ids=" + comment.page);
string requestString = url.ToString();
HttpWebRequest request = WebRequest.Create(requestString) as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Ideas? Thanks much in advance.
Since you're using the Facebook C# SDK (per your tag), try:
var url = "{your url}";
var api = new Facebook.FacebookClient(appId, appSec);
dynamic commentsObj = api.Get("/comments/?ids=" + url);
dynamic arrayOfComments = commentsObj[url].data;

How can I get a value from a HTTP Request from a windows forms client

How can I hit a link such as http://somewhere.com/client.php?locationID=1
and return the value of the location id from a C# windows forms application?
Trying to get an HTTPGetRequest from a C# Windows Forms Application.
Not sure where to start or how this would be done.
Thanks
Try this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://somewhere.com/client.php?locationID=1");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string content = new StreamReader(response.GetResponseStream()).ReadToEnd();
I believe if you use the HttpWebRequest class, this information will be in the referer of the header:
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx
private void printReferer(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
Console.WriteLine(req.Referer);
}
If you are trying to get the data from the page, use the WebClient class:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.80%29.aspx
It is a wrapper for HttpWebRequest/HttpWebResponse that makes life a little easier.
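If the goal is instead to read the locationID value out of the URL itself, rather than the body the server returns, the query string can be parsed locally without any request. A small sketch; QuerySketch and GetQueryValue are hypothetical names used only for this example:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class QuerySketch
{
    // Returns the value of the given query-string key, or null if it is absent.
    public static string GetQueryValue(string url, string key)
    {
        string query = new Uri(url).Query.TrimStart('?');
        string[] pairs = query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Split on the first '=' only, so values may contain '='.
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (Decode(parts[0]) == key)
            {
                return parts.Length > 1 ? Decode(parts[1]) : "";
            }
        }
        return null;
    }

    // In query strings '+' stands for a space; the rest is percent-encoded.
    private static string Decode(string s)
    {
        return Uri.UnescapeDataString(s.Replace('+', ' '));
    }
}
```

For example, GetQueryValue("http://somewhere.com/client.php?locationID=1", "locationID") gives the string "1".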
