Return Javascript managed cookies programmatically using C# - c#

I'm trying to programmatically ping a website (through a console application) and return details of all the cookies being used by that site.
The approach below only captures cookies set through the HTTP response headers and misses the ones set using JavaScript:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.CookieContainer = new CookieContainer();
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
request.Method = "GET";
// response and cookie are declared elsewhere in the program
response = (HttpWebResponse)request.GetResponse();
foreach (Cookie c in response.Cookies)
{
    cookie.Add(c);
}
Can someone suggest how this can be extended to include cookies set by JavaScript?
Thanks!

Well, not sure how much help this will be but...
You are asking whether there is an easy way to get cookies that are created dynamically by client-side JavaScript, and the answer is no, there isn't (unless I'm missing something).
Is there a harder way? Maybe: you could wrap the .NET browser control, let the JavaScript execute through automated web scripts and then scrape the DOM. That doesn't sound like a good idea to me, though.
Any other thoughts welcome.

Just managed to achieve something close to what I wanted (still testing the solution, and it seems to be missing some tracking cookies from 3rd parties!). What I did was use Selenium and the ChromeDriver executable. It opens up an instance of Chrome, navigates to the URL, and pulls back all the dynamically generated information, which can then be interrogated using C#. :)
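For anyone who wants to try the Selenium route, here is a minimal sketch of what it might look like. It assumes the Selenium.WebDriver NuGet package and a chromedriver that matches the installed Chrome; the class and method names (BrowserCookieDump, Describe, CookiesFor) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using OpenQA.Selenium;          // NuGet package Selenium.WebDriver (assumed installed)
using OpenQA.Selenium.Chrome;   // also needs a chromedriver matching the local Chrome

public static class BrowserCookieDump
{
    // Formats one cookie for display; Domain can be null when none was reported.
    public static string Describe(Cookie c)
    {
        return string.Format("{0}={1} (domain: {2})", c.Name, c.Value, c.Domain ?? "n/a");
    }

    // Loads the page in a real Chrome instance so its JavaScript runs,
    // then returns a description of every cookie the driver reports.
    public static List<string> CookiesFor(string url)
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(url);
            var result = new List<string>();
            foreach (Cookie c in driver.Manage().Cookies.AllCookies)
            {
                result.Add(Describe(c));
            }
            return result;
        }
    }
}
```

As far as I know, the driver only reports cookies that are visible to the page currently loaded, which would explain why third-party tracking cookies set for other domains do not show up.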

Related

Screen scrape that bypass older browser detection

I am trying to screen-scrape two airline sites in C# so I can compare their fares over many different dates. I managed to do it on qua.com, but when I try it on amadeus.net, the site responds with
older browser not supported
So using the WebBrowser class doesn't work, and using HttpWebRequest doesn't work either.
So I want to use WebClient, but because amadeus.net is heavily based on JS or something, I do not know where to post the URL.
Any suggestions?
Edit: WebClient.DownloadString also doesn't work.
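Try using the Navigate overload that accepts additional headers and pass the user agent there:
string useragent = "Mozilla/5.0 (Windows NT 6.0; rv:39.0) Gecko/20100101 Firefox/39.0";
// The fourth argument is a raw header string, so it needs the header name
webBrowser.Navigate(url, null, null, "User-Agent: " + useragent);
An alternative is to use another web browser control such as Awesomium.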
I looked into passing a fake user agent (from Jodrell) in HttpWebRequest. This works, but I had to deal with cookies, so that can get complicated.
Graffito suggested overriding the user agent within a WebBrowser, but that didn't work: it gave me lots of JS loading errors, because the website itself requires a proper modern browser to work.
I found out that my IE itself was version 9, so I upgraded it to IE 11. Then I tried Graffito's solution again, but that didn't work either.
So in the end I thought I might as well update WebBrowser to the correct version by following this article

getting source code of redirected http site via c# webclient

I have a problem with a certain site. I am provided with a list of product ID numbers (about 2000) and my job is to pull data from the producer's site. I already tried forming the URLs of the product pages, but there are some unknown variables that I can't fill in to get results. However, there is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen - the problem is that this page displays some info (probably via JavaScript) and then redirects straight to the desired page, the one that I need to pull data from.
Is there any way of tracking this redirection?
I would like to post some of my code, but everything I have so far is unhelpful because it just downloads the source of the page before the redirect.
public static string Download(string uri)
{
    using (WebClient client = new WebClient())
    {
        client.Encoding = Encoding.UTF8;
        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        return client.DownloadString(uri);
    }
}
Also, the suggested answer is not helpful in this case, because the redirection doesn't come with the HTTP response - the page is redirected after a few seconds of loading the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen URL.
I just found a solution, and since I'm new and have to wait a few hours to answer my own question, it will end up here.
I hope that other users will find it useful:
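// Sketch: webBrowser1 is a WinForms WebBrowser control, url is the search URL
webBrowser1.Navigate(url);
// Keep pumping messages until the JavaScript redirect changes the address
while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == url)
{
    Application.DoEvents();
}
string desiredUri = webBrowser1.Url.AbsoluteUri;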
Thanks for answers.
Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and certainly not with WebClient. The problem appears to be that some JavaScript does the redirection. And since all WebClient does is download the page, it's not even going to download the JavaScript, much less parse and execute it.
You might be able to do this by creating a program that uses the WebBrowser class. You can have it load the page. It should do the redirect and then you can inspect the result, which should be the page you were looking for. I haven't actually done this, but it does seem possible.
Your other option is to fire up your Web browser's developer tools (like IE's F12 Developer Tools) and watch what's happening. You can then inspect the Javascript that's being executed as well as the modified DOM, and see where the redirect happens.
Yes, it's tedious work. But once you figure out the redirect for one page, you can probably generate the URL for the other pages you want automatically.
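If the developer tools show that the redirect is a plain assignment to location in an inline script, something like the sketch below could pull the target out of the downloaded source. That is an assumption about how the page redirects, not something confirmed for this site, and the names JsRedirect and FindTarget are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsRedirect
{
    // Matches location = "...", location.href = '...',
    // location.replace("...") and location.assign("...").
    private static readonly Regex Pattern = new Regex(
        @"location(?:\.href)?\s*(?:=|\.replace\(|\.assign\()\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the redirect target resolved against pageUri,
    // or null when no such assignment is found in the HTML.
    public static Uri FindTarget(string html, Uri pageUri)
    {
        Match m = Pattern.Match(html);
        return m.Success ? new Uri(pageUri, m.Groups[1].Value) : null;
    }
}
```

You would feed it the string returned by the Download method from the question. If the script builds the URL dynamically or loads it from an external file, this will find nothing and you are back to a real browser.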

Facebook Links without browser

I've read a lot of posts on here and on other sites, but I'm still not getting a clear answer to my question. So here goes.
I have a Facebook link that requires you to be logged in. Is there a way using .Net (C#) that I can use Facebook API or something to "click" this link without a browser control.
Essentially, I have an application that I wrote that detects certain Farmville links. Right now I'm using a browser control to process the links. However, it's messy and crashes a lot.
Is there a way I can send the url along with maybe a token and api key to process the link?
Does anyone even understand what I'm asking? lol.
Disclaimer: I don't know what Facebook's API looks like, but I'm assuming it involves sending HTTP requests to their servers. I am also not 100% sure on your question.
You can do that with the classes in the System.Net namespace, specifically WebRequest and WebResponse:
using System.Net;
using System.IO;
...
// WebRequest.Create returns a WebRequest, so a cast is needed
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://apiurl.com?args=go.here");
req.UserAgent = "The name of my program";
WebResponse resp = req.GetResponse();
string responseData = null;
using (TextReader reader = new StreamReader(resp.GetResponseStream()))
{
    responseData = reader.ReadToEnd();
}
// do whatever with responseData
You could put that in a method for easy access.
Sounds like you are hacking something... but here goes.
You might want to try a solution like Selenium, which is normally used for testing websites.
It is a little trouble to set up, but you can programmatically launch the Facebook website in a browser of your choosing for login, programmatically enter the username and password, programmatically click the login button, then navigate to your link.
It is kind of clunky, but it gets the job done no matter what, since it appears to Facebook that you are accessing their site from a browser.
I've tried similar sneaky tricks to enter my name over and over for Publisher's Clearing House, but they eventually wised up and banned my IP.

Loading an xml from an http url

I am using xml.net in a web application.
When I try to load XML from an HTTP internet URL using:
xmlDoc.Load("http://....")
I get an error: "connected host has failed to respond"
Does anyone know the fix for this?
Thanks
"Connected host has failed to respond" means you've not got the URI correct, or you're not allowed to access it, or it's not responding to you, or it's down. HTTP doesn't really care what it transmits.
It probably means exactly what it says: the web server responsible for requests at the URL you specify isn't sending back responses. Something's going wrong on the web server, and if so, you can't do anything about someone's web server out there in the cloud not functioning properly.
You can, however, accept the fact that not every URL will work, and that you'll have to catch the exception that the XmlDocument or XDocument is throwing. It's reasonable to expect that this scenario may occur, so you need to program defensively and include the appropriate exception handling for such cases.
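A rough sketch of that kind of defensive loading is below. The names SafeXml and TryLoad are made up, and the set of exceptions is my best guess at what can surface: WebException on the classic .NET Framework, HttpRequestException on newer runtimes (which on the older framework needs a reference to System.Net.Http), and XmlException when the response is not well-formed XML.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Xml;

public static class SafeXml
{
    // Returns the loaded document, or null (with a message in error)
    // when the URL cannot be reached or does not contain valid XML.
    public static XmlDocument TryLoad(string url, out string error)
    {
        error = null;
        try
        {
            var doc = new XmlDocument();
            doc.Load(url);   // accepts a URL or a local file path
            return doc;
        }
        catch (WebException ex)
        {
            error = "Network error: " + ex.Message;
        }
        catch (HttpRequestException ex)
        {
            error = "Network error: " + ex.Message;
        }
        catch (XmlException ex)
        {
            error = "Not well-formed XML: " + ex.Message;
        }
        return null;
    }
}
```

The caller can then log the message and move on to the next URL instead of crashing.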
EDIT: So you can access it from outside the .NET framework eh? Perhaps try using an HTTP debugger, like Fiddler, and compare the request your XML document object makes to the request your browser makes. What header fields are different? Is there a header that the browser includes that the XML document object doesn't? Or are there different header values between the two, that may be causing the .NET request not to be responded to? Go figure.
If the page is accessible through a web browser but not through the Load method, it sounds like the method isn't making a proper HTTP request to the web server for the page wanted.
You can try using an HttpWebRequest with a standard GET method to make a proper HTTP request for the web page. You can then pass the response to the XmlDocument.Load method as a stream and it should then load up fine.
HTTPWebRequest Class MSDN.com
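The suggestion above (make the GET request yourself and hand the response stream to XmlDocument.Load) could look roughly like this. The names XmlOverHttp, Fetch and FromStream are invented for the sketch, and nothing here guarantees the server will answer any differently than before.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

public static class XmlOverHttp
{
    // Parses XML from any stream; kept separate so it can be tried without a network.
    public static XmlDocument FromStream(Stream stream)
    {
        var doc = new XmlDocument();
        doc.Load(stream);
        return doc;
    }

    // Issues a plain GET and loads the response body as XML.
    public static XmlDocument Fetch(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return FromStream(body);
        }
    }
}
```

Having the request object in hand also makes it easy to set headers such as UserAgent before calling GetResponse.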
Try making a WebRequest to the url and set its UserAgent property to something like "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)" . If it works load the text you get in the xmldoc.
I tried loading the XML using the .NET HttpWebRequest and also tried setting the UserAgent property.
But it's still giving me the error message:
"Unable to connect to the remote server"
The XML is, however, accessible through the browser.
Here is the code:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)";
string result = string.Empty;
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    // Get the response stream
    StreamReader reader = new StreamReader(response.GetResponseStream());
    // Read the whole contents and return as a string
    result = reader.ReadToEnd();
}
Thanks.
Is your browser using a proxy?
Try telnet to see whether an application other than the browser is able to connect to the web server.
If you are using a URL like http://www.xmlserver.com/file.xml, then try the following at the command prompt:
telnet xmlserver.com 80
A big difference between your request and a browser request could be bridged with the following line:
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

Is WebRequest The Right C# Tool For Interacting With Websites?

I'm writing a small tool in C# which will need to send and receive data to/from a website using POST and json formatting. I've never done anything like this before in C# (or any language really) so I'm struggling to find some useful information to get me started.
I've found some information on the WebRequest class in C# (specifically from here) but before I start diving into it, I wondered if this was the right tool for the job.
I've found plenty of tools to convert data into the json format but not much else, so any information would be really helpful here in case I end up down a dead end.
WebRequest and more specifically the HttpWebRequest class is a good starting point for what you want to achieve. To create the request you will use the WebRequest.Create and cast the created request to an HttpWebRequest to actually use it. You will then create your post data and send it to the stream like:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://mysite.com/index.php");
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
string postData = "var=value1&var2=value2";
req.ContentLength = postData.Length;
StreamWriter stOut = new StreamWriter(req.GetRequestStream(), System.Text.Encoding.ASCII);
stOut.Write(postData);
stOut.Close();
Similarly you can read the response back by using the GetResponse method which will allow you to read the resultant response stream and do whatever else you need to do. You can find more info on the class at:
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx
WebClient is sometimes easier to use than WebRequest. You may want to take a look at it.
For JSON deserialization you are going to want to look at the JavaScriptSerializer class.
WebClient example:
using (WebClient client = new WebClient())
{
    // manipulate request headers (optional)
    client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    // execute request and read response as string to console
    using (StreamReader reader = new StreamReader(client.OpenRead(targetUri)))
    {
        string s = reader.ReadToEnd();
        Console.WriteLine(s);
    }
}
Marked as wiki in case someone wants to update the code
When it comes to POSTing data to a web site, System.Net.HttpWebRequest (the HTTP-specific implementation of WebRequest) is a perfectly decent solution. It supports SSL, async requests and a bunch of other goodies, and is well-documented on MSDN.
The payload can be anything: data in JSON format or whatever -- as long as you set the ContentType property to something the server expects and understands (most likely application/json, text/json or text/x-json), all will be fine.
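To make the JSON case concrete, here is a minimal sketch of posting a JSON string and reading the reply. The URL and the JSON text are supplied by the caller, the names JsonPost, Create and Send are invented for the example, and application/json is just the most likely content type, so check what your server expects.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class JsonPost
{
    // Builds the request without sending anything yet.
    public static HttpWebRequest Create(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "POST";
        req.ContentType = "application/json";
        req.Accept = "application/json";
        return req;
    }

    // Sends the JSON text as UTF-8 and returns the response body as a string.
    public static string Send(string url, string json)
    {
        byte[] body = Encoding.UTF8.GetBytes(json);
        HttpWebRequest req = Create(url);
        req.ContentLength = body.Length;   // length in bytes, not characters
        using (Stream s = req.GetRequestStream())
        {
            s.Write(body, 0, body.Length);
        }
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Counting bytes rather than characters for ContentLength matters as soon as the payload contains non-ASCII text.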
One potential issue when using HttpWebRequest from a system service: since it uses the IE proxy and credential information, default behavior may be a bit strange when running as the LOCALSYSTEM user (or basically any account that doesn't log on interactively on a regular basis). Setting the Proxy and Credentials properties to Nothing (or, as you C# folks prefer to call it, null, I guess) should avoid that.
I have used WebRequest for interacting with websites. It is the right 'tool'.
I can't comment on the JSON aspect of your question.
The currently highest rated answer is helpful, but it doesn't send or receive JSON.
Here is an example that uses JSON for both sending and receiving:
How to post json object in web service
And here is the StackOverflow question that helped me most to solve this problem:
Problems sending and receiving JSON between ASP.net web service and ASP.Net web client
And here is another related question:
json call with C#
To convert from instance object to json formatted string and vice-versa, try out Json.NET:
http://json.codeplex.com/
I am currently using it for a project and it's easy to learn and work with and offers some flexibility in terms of serializing and custom type converters. It also supports a LINQ syntax for querying json input.
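For a feel of the basic round trip, here is a small sketch. It assumes the Newtonsoft.Json NuGet package is installed; the Message type and the JsonRoundTrip helper are hypothetical and exist only for this example.

```csharp
using System;
using Newtonsoft.Json;   // NuGet package Newtonsoft.Json (Json.NET), assumed installed

// Hypothetical payload type, used only for illustration.
public class Message
{
    public string Text { get; set; }
    public int Priority { get; set; }
}

public static class JsonRoundTrip
{
    // Object instance to JSON string.
    public static string ToJson(Message m)
    {
        return JsonConvert.SerializeObject(m);
    }

    // JSON string back to an object instance.
    public static Message FromJson(string json)
    {
        return JsonConvert.DeserializeObject<Message>(json);
    }
}
```

The string from ToJson is what you would write to the request stream, and FromJson is what you would call on the response text.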
In .NET 3.5 there is a built-in JSON serializer. WebRequest is the right class you're looking for.
A few examples:
Link
http://dev.aol.com/blog/markdeveloper/ShareFileWithNETFramework
Link
