Retrieve AJAX (XHR) info from a dynamic website in C#

I'm trying to create an app in C# to retrieve my data-cap information from my ISP's site. The page is this one, but I suspect it can't be accessed from outside their network, so if anyone needs more information, just ask.
The page loads the remaining traffic quota through AJAX and displays it once the page has loaded. I already have a working app that uses HtmlAgilityPack, but it's pretty hideous: I load the page in a background WebBrowser control, wait five seconds, parse the page's HTML with the library, and check whether the necessary HTML string is there; if not, the timer resets and the process repeats until the JavaScript has done its thing and loaded the data-cap information.
I want to replicate what the web page does and ask the server directly for the information, without creating a web browser instance in the background and waiting for it to load.
Is it possible?
URL http://internet.tre.it/calls/checkMSISDN.aspx?g=2518607185932962118&h=UItDOr88/CtwONsfqfLgblVuTAysHYKc3kh6mLgiX0He49TU0I9lc56O8mWVhxzd3yFUDFF08P/Ng/5cg2nLtefFfjUIBq/QNQalmmSnKkQ=&mc=22299&acid=0&_=1541582209456
Headers
Host: internet.tre.it
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://internet.tre.it/
X-Requested-With: XMLHttpRequest
DNT: 1
Connection: keep-alive
Cookie: cookiesAccepted=1; _iub_cs-160673=%7B%22id%22%3A160673%2C%22consent%22%3Atrue%2C%22timestamp%22%3A%222018-04-16T15%3A42%3A10.978Z%22%2C%22version%22%3A%220.13.22%22%7D; ASP.NET_SessionId=n2wz2brfaepfj2klo0nqfwaw; pageVisit=c73074b54dbe40d49a715aeb9a0f4ea8; 148162__148162_d32646f682e342dba303540b0f10dac1=1
Response
Album of the JSON response. (I blacked out two lines because they were my name and my phone number.)

Since the response is a JSON string, I recommend the following:
Write code to download the JSON string from the URL. See for instance
https://stackoverflow.com/a/11891101/4180382
Copy the whole JSON string from your F12 Response tab.
In Visual Studio, create a new class file.
Click Edit > Paste Special > Paste JSON As Classes.
In your code you will need the name of the first class that you pasted; it is the parent class. I would say it is 'Banners', but verify:
var obj = JsonConvert.DeserializeObject<Banners>(downloadedJson);
Now you can loop through the Menu array to extract all of the info you need.
And you are done! If all the info is in the JSON, there is no need for HtmlAgilityPack. A sketch of the whole flow follows below. Let me know how you fare.
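A minimal sketch of those steps, assuming Json.NET (Newtonsoft.Json) for the deserialization and WebClient for the download; the Banners root class, its Menu property, and the header set are guesses to verify against what Paste Special and the F12 network tab actually show:

using System;
using System.Net;
using Newtonsoft.Json;

// Placeholder shape; replace with the classes Visual Studio generated
// via Edit > Paste Special > Paste JSON As Classes.
public class Banners
{
    public string[] Menu { get; set; }
}

class Program
{
    static void Main()
    {
        // The checkMSISDN.aspx URL captured in the F12 network tab goes here.
        string url = "http://internet.tre.it/calls/checkMSISDN.aspx?...";

        using (var client = new WebClient())
        {
            // Mimic the headers the browser sends with the XHR call.
            client.Headers[HttpRequestHeader.Accept] = "application/json, text/javascript, */*";
            client.Headers["X-Requested-With"] = "XMLHttpRequest";
            client.Headers[HttpRequestHeader.Referer] = "http://internet.tre.it/";

            string downloadedJson = client.DownloadString(url);
            var obj = JsonConvert.DeserializeObject<Banners>(downloadedJson);

            foreach (var item in obj.Menu)
                Console.WriteLine(item);
        }
    }
}

Note that the g, h, and _ query parameters in the captured URL look session-bound, so they may need to be obtained fresh (e.g. from the page itself) rather than hard-coded.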

Related

Cannot download web page html, times out

I've tried everything I could find on these pages: WebClient, HttpClient, HttpWebRequest, etc. I've looked at Fiddler and copied all the cookies, added all the same headers and user agents, and so on. These all work for almost every other page I try, but for some reason Toys'R'Us just times out. Everything was working fine yesterday; it just seems to have stopped working today.
First time I've posted here, but could somebody just try any method for downloading the HTML of a random example:
http://www.toysrus.co.uk/toys/transformers-robots-in-disguise-strongarm-one-step-changers/0148891
No matter what I've tried, it just seems to hang and time out. The really annoying thing is that it loads fine in Chrome, Internet Explorer, Fiddler... I just can't seem to get it to download in C#, regardless of DownloadString, HttpWebRequest, etc.; it just hangs and times out.
To be clear, I had this working yesterday and it had been for months; something must have changed on their servers, but I can't figure out what.
If somebody could try it and test if it's just my setup that's failing I'd greatly appreciate it.
Very odd, this: I finally got it working by adding these specific headers:
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-GB
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
on a WebClient DownloadString. I'm sure I tried this yesterday... but it's working today. They must have changed something back at their end, I guess. Anyway... I can carry on now. Thanks again Gianlucca.
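For reference, a sketch of that WebClient call with those headers. One assumption worth flagging: WebClient does not gunzip responses by itself, so instead of setting Accept-Encoding by hand, this sketch enables automatic decompression, which makes the framework send the header and unpack the body:

using System;
using System.Net;

// WebClient subclass that enables gzip/deflate support; with
// AutomaticDecompression set, Accept-Encoding is added automatically
// and the response is decompressed before DownloadString returns.
class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = (HttpWebRequest)base.GetWebRequest(address);
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}

class Program
{
    static void Main()
    {
        using (var client = new DecompressingWebClient())
        {
            client.Headers[HttpRequestHeader.Accept] = "text/html, application/xhtml+xml, */*";
            client.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB";
            client.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";

            string html = client.DownloadString(
                "http://www.toysrus.co.uk/toys/transformers-robots-in-disguise-strongarm-one-step-changers/0148891");
            Console.WriteLine(html.Length + " characters downloaded");
        }
    }
}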

Webservice returning 304

I have a UserControl which, on load, gets the number of items in a cart.
C# web project, VS 2013.
I add this control on the master page.
I create a new aspx page and write code to add a product to a basket.
I create an asmx file to return the count, add the .cs file to the App_Code folder, and point my asmx file at the CS file in App_Code. I can navigate to the asmx web service and confirm it's working.
Looking at this page http://www.c-sharpcorner.com/UploadFile/0c1bb2/calling-web-service-using-scriptmanager580/
I now add a ScriptManager to the masterpage to call the webservice.
What I'm trying to do: when the user adds an item to their cart, the web service is called and the control reloads with the correct number of items.
At present, when the user adds an item, the service is called but returns:
myservice.asmx/js
HTTP/1.1 304 Not Modified
Cache-Control: private
X-SourceFiles: =?UTF-8?B?Qz4XGpz?=
So I navigate to the UserControl page directly (by adding the control to a page) and can see the correct number of items, but I can't work out why the web service returns 304 when called.
My understanding is that 304 means the page is cached, but I'm not sure what I need to do next.
I can provide more code, but I'm hoping there is something simple I may have missed here.
Edit 1
Request Headers
GET /myService.asmx/js HTTP/1.1
Host: localhost:12345
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Accept: */*
Referer: http://localhost:12345/myproductpage
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: XSRF-TOKEN=vSc1; XSRF-V=fiwSIljhKu74EC; __atuvc=4%7C26; __atuvs=5773d8720cbfe159003; SiteCookie=886c194f-e921-4587
Response Headers
HTTP/1.1 304 Not Modified
Cache-Control: private
X-SourceFiles: =?UTF-8?B?Qz4XGpz?=
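For background, that understanding of 304 is right: it is the server's reply to a conditional GET, telling the client its cached copy is still valid, and it carries no body. A hypothetical illustration against the service URL from the capture above (HttpWebRequest surfaces the 304 as a WebException):

using System;
using System.Net;

class Program
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://localhost:12345/myService.asmx/js");
        request.IfModifiedSince = DateTime.UtcNow.AddDays(-1); // pretend we cached it yesterday

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine((int)response.StatusCode); // 200: a fresh copy was sent
            }
        }
        catch (WebException ex)
        {
            var response = ex.Response as HttpWebResponse;
            if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                Console.WriteLine("304: the browser should reuse its cached copy");
            else
                throw;
        }
    }
}

Note that the captured request above carries no If-Modified-Since or If-None-Match header, which would be worth a closer look, since a server should normally only answer 304 to a conditional request.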

Blank HTTP request to localhost server

I was looking to learn more about web servers, so I followed this tutorial on CodeGuru to build a sample one. However, whenever I run it and load the default page, I get the standard GET HTTP request; then another socket connection is accepted and a blank HTTP request is shown. The console output is below:
Web Server now listening on port 7070, press ^C to stop...
Socket Type: Stream
Client Connected!
=================
Client IP: 127.0.0.1:56310
Message received: "GET / HTTP/1.1
Host: localhost:7070
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
"
Directory requested: /
File Requested: C:\_WebServer\default.html
No. of bytes sent: 103
Total Bytes: 191
No. of bytes sent: 191
Socket Type: Stream
Client Connected!
=================
Client IP: 127.0.0.1:56311
Message received: " "
Only GET is supported :(
What is this blank HTTP request, and why is there a second socket connection even though I only made one request from my browser? Is this a sort of keep-alive thing?
Edit: Could this maybe be a timeout thing?
Edit2:
I think a reason for this might be the following:
server socket receives 2 http requests when I send from chrome and receives one when I send from firefox
It seems like Chrome opens a sort of "placeholder" socket connection to optimise later transfers. I get this behaviour in IE as well, though, so maybe IE now does something similar.
That code has so many errors that I would not recommend anyone read it. The most severe is that it expects everything to be sent and received in a single operation, which is only true for the most trivial cases.
If you want to learn how to code a web server, I suggest that you look at my code: https://github.com/jgauffin/Griffin.Framework/tree/master/src/Griffin.Framework/Griffin.Core/Net/Protocols/Http
It's released under the Apache License, and I've separated the HTTP protocol parsing from the network IO, which hopefully makes it easier to understand than the article you linked to.
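To illustrate the single-operation point with a minimal sketch (not code from either project): TCP is a stream, so a server has to keep reading until the header terminator arrives, and a connection the browser opened speculatively and then closed shows up as a zero-byte read rather than a blank request:

using System;
using System.Net.Sockets;
using System.Text;

static class HttpReader
{
    // Reads from the socket until the blank line (\r\n\r\n) that ends the
    // HTTP headers has arrived. Returns null if the peer closes without
    // sending anything, which is what Chrome's speculative socket looks like.
    public static string ReadHttpHeaders(Socket client)
    {
        var buffer = new byte[8192];
        var builder = new StringBuilder();

        while (builder.ToString().IndexOf("\r\n\r\n", StringComparison.Ordinal) < 0)
        {
            int read = client.Receive(buffer);
            if (read == 0)
                return null; // connection closed by the client
            builder.Append(Encoding.ASCII.GetString(buffer, 0, read));
        }
        return builder.ToString();
    }
}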

Get the url of page

I use this to navigate to a website:
doc = web.Load("http://google.com/search?btnI=1&q=[my keyword]"); // I'm Feeling Lucky
Then I need the URL of the website it actually navigated to. How can I get it?
You could use the HtmlWeb.ResponseUri property, which gets the URI of the Internet resource that actually responded to the request.
An example, googling for "cookies":
var web = new HtmlWeb();
var doc = web.Load("http://google.com/search?btnI=1&q=cookies");
var responseUrl = web.ResponseUri;
This gets http://en.wikipedia.org/wiki/HTTP_cookie.
In an ASP.NET context, you can get the URL of the current request with:
string url = HttpContext.Current.Request.Url.AbsoluteUri;
Looks like Sam1 might have given you the right answer (I have no real experience with the HTML Agility Pack) for a one- or two-instance endeavor.
That being said, if you intend to make a lot of calls to Google using keywords so that you can retrieve the top result (i.e. the "I'm Feeling Lucky" result), then I would highly suggest you use Google's Custom Search API (https://developers.google.com/custom-search/v1/overview).
It would use MUCH less bandwidth if you pull JSON results through this API.
Usage of the API only allows for 100 free queries per day. This might fall within your application requirements, but it also might not. If you have the means, I would suggest supporting Google by paying if you intend to make thousands of queries.
There are two things to note here. First, using "http://google.com" in the above URL without the "www" forces a 301 redirect to "http://www.google.com", so you should include the www to keep things simple.
The second is that opening the URL (with the www) performs a 302 redirect, and the destination is inside the response headers. So if you can catch that 302 response, you can get the URL Google will send you to before it sends you there; a sketch of doing this in code follows the header dumps below.
Here are the response and request headers for the first request, in which Google performs a 301 redirect to the www domain.
Response Headers
Cache-Control public, max-age=2592000
Content-Length 244
Content-Type text/html; charset=UTF-8
Date Mon, 18 Feb 2013 14:14:40 GMT
Expires Wed, 20 Mar 2013 14:14:40 GMT
Location http://www.google.com/search?btnI=1&q=html5
Server gws
X-Frame-Options SAMEORIGIN
X-XSS-Protection 1; mode=block
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Cookie PREF=ID=5d01155d00a8d706:U=49fab5927df1f8ad:FF=0:TM=1359732743:LM=1360874099:S=byw-1-fgfbcRWdPN; NID=67=NpFNjRkjTFtyrcYPE-pQeJiMFEgWMWdyVMVpbYATZySlsw63Hz4FCw2Tcr4tynhAhyq1vnuPqmdFBOC65Nd-048ZxrgP_HVtKbVCe7psi-G2aMvsOUbiBl1xYks2xK2K
DNT 1
Host google.com
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
And here are the response/request headers for the 302 that takes me to the destination page. You can see the destination URL returned in the Location header.
Response Headers
Cache-Control private
Content-Length 231
Content-Type text/html; charset=UTF-8
Date Mon, 18 Feb 2013 14:14:41 GMT
Location http://en.wikipedia.org/wiki/HTML5
Server gws
X-Frame-Options SAMEORIGIN
X-XSS-Protection 1; mode=block
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Cookie PREF=ID=5d01155d00a8d706:U=49fab5927df1f8ad:FF=0:TM=1359732743:LM=1360874099:S=byw-1-fgfbcRWdPN; NID=67=NpFNjRkjTFtyrcYPE-pQeJiMFEgWMWdyVMVpbYATZySlsw63Hz4FCw2Tcr4tynhAhyq1vnuPqmdFBOC65Nd-048ZxrgP_HVtKbVCe7psi-G2aMvsOUbiBl1xYks2xK2K
DNT 1
Host www.google.com
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
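As mentioned above, a minimal sketch of catching that 302 yourself: with AllowAutoRedirect turned off, HttpWebRequest hands back the 302 response instead of following it, so the Location header can be read directly.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create(
            "http://www.google.com/search?btnI=1&q=html5");
        request.AllowAutoRedirect = false; // stop at the 302 instead of following it

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine(response.StatusCode);          // Found (302)
            Console.WriteLine(response.Headers["Location"]); // the destination URL
        }
    }
}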

C# WebRequest.GetResponse(): 400 Bad Request

I am trying to download a file from a server using System.Net.
It actually works, but some links give me trouble. The links look like this:
http://cdn.somesite.com/r1KH3Z%2FaMY6kLQ9Y4nVxYtlfrcewvKO9HLTCUBjU8IBAYnA3vzE1LGrkqMrR9Nh3jTMVFZzC7mxMBeNK5uY3nx5K0MjUaegM3crVpFNGk6a6TW6NJ3hnlvFuaugE65SQ4yM5754BM%2BLagqYvwvLAhG3DKU9SGUI54UAq3dwMDU%2BMl9lUO18hJF3OtzKiQfrC/the_file.ext
The code looks basically like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
WebResponse response = request.GetResponse();
GetResponse() always throws an exception (Error 400: Bad Request).
However, I know the link works, because I can download the file with Firefox without problems.
I also tried to decode the link with Uri.UnescapeDataString(link), but that link won't even work in Firefox.
Other links work perfectly fine this way; just these won't.
Edit:
Okay, I found something out using Wireshark:
If I open the link using Firefox, this is sent:
&ME3#"dM*PNyAo PA:]GET /r1KH3Z%2FaMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs4QgbudzcrJivrAaOTYkEnozqmdoSCCY8yb1i22YtEAV/epd_outpost_12adb.flv HTTP/1.1
Host: cdn.somesite.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
I think only the first line is the problem, because WebRequest.Create(link) decodes the URL:
&MEz.#!dM/nP9#~P>.GET /r1KH3Z/aMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs6Mmh1EsQQ4vJVYUwtbLBDNx9AwCHlWDfzfSWIHzaaIo/epd_outpost_12adb.flv HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Host: cdn.somesite.com
( %2F is replaced with / )
Another edit:
I found out that the Uri class decodes the URL automatically:
Uri uri = new Uri(link); // link is not decoded here
Debug.WriteLine(uri.ToString()); // but it is decoded here
How can I prevent this?
Thanks in advance for your help.
By default, the Uri class will not preserve an escaped / character (%2f) in a URI; it unescapes it to a plain slash (even though the escaped form appears to be legal in my reading of RFC 3986).
Uri uri = new Uri("http://example.com/embed%2fded");
Console.WriteLine(uri.AbsoluteUri); // prints: http://example.com/embed/ded
(Note: don't use Uri.ToString to print URIs.)
According to the bug report for this issue on Microsoft Connect, this behaviour is by design, but you can work around it by adding the following to your app.config or web.config file:
<uri>
  <schemeSettings>
    <add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
  </schemeSettings>
</uri>
(Since WebRequest.Create(string) just delegates to WebRequest.Create(Uri), you would need to use this workaround no matter which method you call.)
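With that setting in place, the slash should stay escaped; a quick check (a sketch reusing the example URI from above):

using System;

class Program
{
    static void Main()
    {
        var uri = new Uri("http://example.com/embed%2fded");
        // With DontUnescapePathDotsAndSlashes configured for the http
        // scheme, this prints http://example.com/embed%2fded; without it,
        // the %2f is unescaped to a plain slash.
        Console.WriteLine(uri.AbsoluteUri);
    }
}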
This has now changed in .NET 4.5: by default, escaped slashes are allowed. I posted more info on this (including screenshots) in the comments here: GETting a URL with a URL-encoded slash
