I am trying to download a file from a server using System.Web.
It actually works, but some links give me trouble. The links look like this:
http://cdn.somesite.com/r1KH3Z%2FaMY6kLQ9Y4nVxYtlfrcewvKO9HLTCUBjU8IBAYnA3vzE1LGrkqMrR9Nh3jTMVFZzC7mxMBeNK5uY3nx5K0MjUaegM3crVpFNGk6a6TW6NJ3hnlvFuaugE65SQ4yM5754BM%2BLagqYvwvLAhG3DKU9SGUI54UAq3dwMDU%2BMl9lUO18hJF3OtzKiQfrC/the_file.ext
The code looks basically like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
WebResponse response = request.getResponse();
getResponse() always throws an exception (Error 400 Bad Request).
However, I know the link works because I can download the file with Firefox without problems.
I also tried to decode the link with Uri.UnescapeDataString(link), but that link wont even work in Firefox.
Other links work perfectly fine this way.. just these won't work.
Edit:
Okay, i found something out using wireshark:
If i open the link using Firefox, this is sent:
&ME3#"dM*PNyAo PA:]GET /r1KH3Z%2FaMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs4QgbudzcrJivrAaOTYkEnozqmdoSCCY8yb1i22YtEAV/epd_outpost_12adb.flv HTTP/1.1
Host: cdn.somesite.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
I think only the first line is the problem, because WebRequest.Create(link) decodes the url:
&MEz.#!dM/nP9#~P>.GET /r1KH3Z/aMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs6Mmh1EsQQ4vJVYUwtbLBDNx9AwCHlWDfzfSWIHzaaIo/epd_outpost_12adb.flv HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Host: cdn.somesite.com
( %2F is replaced with / )
Another edit:
I found out that the Uri class decodes the url automatically:
Uri uri = new Uri(link); //link is not decoded
Debug.WriteLine(uri.ToString()); //link is decoded here.
How can I prevent this?
Thanks in advance for your help.
By default, the Uri class will not allow an escaped / character (%2f) in a URI (even though this appears to be legal in my reading of RFC 3986).
Uri uri = new Uri("http://example.com/embed%2fded");
Console.WriteLine(uri.AbsoluteUri); // prints: http://example.com/embed/ded
(Note: don't use Uri.ToString to print URIs.)
According to the bug report for this issue on Microsoft Connect, this behaviour is by design, but you can work around it by adding the following to your app.config or web.config file:
<uri>
<schemeSettings>
<add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
</schemeSettings>
</uri>
(Since WebRequest.Create(string) just delegates to WebRequest.Create(Uri), you would need to use this workaround no matter which method you call.)
This has now changed in .NET 4.5. By default you can now use escaped slashes. I posted more info on this (including screenshots) in the comments here: GETting a URL with an url-encoded slash
Related
I'm trying to create an app using C# to retrieve my data cap infos from my ISP site. The page is this one but I suspect it can't be accessed from outside their network so if anyone needs more information just ask.
The page loads through AJAX the remaining traffic quota and displays it in the page after it's loaded. Right now I already have a working app using HtmlAgilityPack but it pretty hideous considering I'm using a WebBrowser control to load the page in background, wait for five seconds, parse the page's HTML with the library and see if it finds the necessary HTML string; if not the timer resets and repeats itself until the javascript has done its thing and loaded the data cap infos.
I want to somehow replicate what the webpage does and call the server and ask directly for the infos without creating a web browser instance in background and waiting for the infos to load.
Is it possible?
URL http://internet.tre.it/calls/checkMSISDN.aspx?g=2518607185932962118&h=UItDOr88/CtwONsfqfLgblVuTAysHYKc3kh6mLgiX0He49TU0I9lc56O8mWVhxzd3yFUDFF08P/Ng/5cg2nLtefFfjUIBq/QNQalmmSnKkQ=&mc=22299&acid=0&_=1541582209456
Headers
Host: internet.tre.it
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://internet.tre.it/
X-Requested-With: XMLHttpRequest
DNT: 1
Connection: keep-alive
Cookie: cookiesAccepted=1; _iub_cs-160673=%7B%22id%22%3A160673%2C%22consent%22%3Atrue%2C%22timestamp%22%3A%222018-04-16T15%3A42%3A10.978Z%22%2C%22version%22%3A%220.13.22%22%7D; ASP.NET_SessionId=n2wz2brfaepfj2klo0nqfwaw; pageVisit=c73074b54dbe40d49a715aeb9a0f4ea8; 148162__148162_d32646f682e342dba303540b0f10dac1=1
Response
Album of the JSON response. (I blacked out those two lines because they were respectively my own name and my phone number)
The response being a json string, I recommend the following :
Write code to download the json string from the URL. See for instance
https://stackoverflow.com/a/11891101/4180382
Copy the whole json string from your F12 response tab
In Visual Studio create a new class file
Click Edit > Paste special > Paste Json as classes.
In your code you will need the name of the first class that you pasted. It is the parent class. I would say it is 'Banners', but verify.
var obj = JsonConvert.DeserializeObject < Banners>(downloadedJson);
Now you can loop through the Menu array to extract all of the info you need.
And you are done! If all the info is in the json, no need to use HtmlAgilityPack. Let me know how you fare.
I need to send a series of get/post requests for an application I'm making (a custom wrapper for an online chat). I completed the site login process and initial chat loading, by loosely simulating requests logged from Telerik Fiddler.
Now I'm having trouble with a different post request, that registers the user as online.
It's a connection to a socket.io server, but I know for a fact it's possible to do without a socket connection, because everything worked fine when I sent my requests with Fiddler's "composer" feature.
Here's the request I'm trying to simulate
POST http://events.********.com/socket.io/1/xhr-polling/vLaINOG3fKixnNs-oTWq?t=1498442322413 HTTP/1.1
Host: events.********.com
Connection: keep-alive
Content-Length: 144
Origin: http://www.********.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Content-type: text/plain;charset=UTF-8
Accept: */*
Referer: http://www.********.com/home.php
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
5:::{"name":"updateUserStatus","args":[{"status":"online"}]}
Here's how it looks trying to simulate it (ignore the different url, it should work with this)
POST http://events.********.com/socket.io/1/xhr-polling/owR02QZlrwKOwcLjoTW8 HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Host: events.********.com
Content-Length: 60
Expect: 100-continue
5:::{"name":"updateUserStatus","args":[{"status":"online"}]}
Clearly a lot of stuff missing, but I don't think most of it matters. What I've noticed is that the original request's header has content-type set to "text/plain," and even though I've tried many ways to change the accept and content-type headers to match, it always sends as application/json and results in a 404.image
I feel like I'm missing something really obvious and stupid, but I've been troubleshooting for the past couple hours and can't figure anything out.
Here's the code I'm using for the request (I took out the "text/plain" content-type and other stuff i had before that didnt work so it's somewhat cleaner)
chatreq = (HttpWebRequest)WebRequest.Create("http://events.********.com/socket.io/1/xhr-polling/" + socket);
chatreq.CookieContainer = cookieContainer;
postData = "5:::{\"name\":\"updateUserStatus\",\"args\":[{\"status\":\"online\"}]}";
data = Encoding.ASCII.GetBytes(postData);
chatreq.Method = "POST";
chatreq.ContentType = "application/x-www-form-urlencoded";
chatreq.ContentLength = data.Length;
using (var stream = chatreq.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
response = (HttpWebResponse)chatreq.GetResponse();
Or, if there's a simple way to send a raw http request, that would be great.
Thanks.
I used Headers.Clear() first, and then set all the headers, and it went through normally.
I ended up using a socket.io library for this project though, since it ended up being a lot easier.
I have implemented Unity.mvc3 into my project but now my Knockout AJAX methods are starting a Longpolling process and I do not know why, can anyone help please?
This is the JSON response (I have not got a clue why I am getting
this)- {"C":"d-13044D90-B,0|E,2|F,0","M":[]}
A%2F%2Flocalhost%3A1764%2Findex.html&browserName=Chrome&tid=5&_=1403698144789
Request Method:GET Status Code:200 OK Request Headersview source
Accept:text/plain, /; q=0.01 Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8 Cache-Control:no-cache
Connection:keep-alive Content-Type:application/x-www-form-urlencoded;
charset=UTF-8
Cookie:__RequestVerificationToken=j6VFekQ7Po1EfD9wSUK4e4A_ts1SVuGIbRwDG727whnb8l--9P5v5maO-FhCOjFLitRIegjYixEX9698kZR_JWHvo7lUmFYfOwVNjwQ7Hhg1
Host:localhost:7356
Pragma:no-cache
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/35.0.1916.153 Safari/537.36 Query String
Parametersview sourceview URL encoded transport:longPolling
connectionToken:AQAAANCMnd8BFdERjHoAwE/Cl+sBAAAANVlw/ETQ70WfNHf1KMvu6gAAAAACAAAAAAAQZgAAAAEAACAAAACnUs1Jjk7unYzODmvpJNj7Nbvay/Dx4kSOH+V/CtVDTwAAAAAOgAAAAAIAACAAAAAm7NaW0uCoayuqpNT8z8+uLy/Uio+Nbh8g+VDE0X1/8TAAAAAKXR/1gbPBSIz2WagA2zJI6Te45f63pWeiYkXBGYlYeO+WbWbkTycWNrGmRqaUY8JAAAAAsCIr6JEw/gAUfIClKEjm3cBXG3+I33yhob1f3jMrvmLQTeDC7hphp1SAz+utVN28VstEmeExHyyuycRP/upWIw==
messageId:d-13044D90-B,0|E,2|F,0
browserName:Chrome tid:5
_:1403698144789 Response Headersview source Access-Control-Allow-Credentials:true
Content-Type:application/json; charset=UTF-8 Date:Wed, 25 Jun 2014
12:09:10 GMT Server:Microsoft-HTTPAPI/2.0 Transfer-Encoding:chunked
X-Content-Type-Options:nosniff
And this is the answer, add key="vs:EnableBrowserLink" value="false" to the web config between the appSettings tags the automatic polling has stopped, I found out this happens by default in VS2012 and now no more auto polling :-) - . I also cannot believe no one posted anything for this but anyways this worked a treat.
I use this to navigate to a website
doc = web.Load("http://google.com/search?btnI=1&q=[my keyword]") //I'm Feeling Lucky
Then I need the url of navigated website... How can I get it?
You could use HtmlWeb.ResponseUri property which gets the URI of the Internet resource that actually responded to the request.
An example - googling for "cookies":
var web = new HtmlWeb();
var doc = web.Load("http://google.com/search?btnI=1&q=cookies");
var responseUrl = web.ResponseUri;
gets the http://en.wikipedia.org/wiki/HTTP_cookie.
You can get the url of current browser by :
string url = HttpContext.Current.Request.Url.AbsoluteUri;
Looks like Sam1 might have given you the right answer (I have no real experience with the HTML Agility Pack) for a one or two instance endeavor.
That being said, if you intend to make a lot of calls to Google using keywords so that you can retrieve the top result (i.e. the "I'm Feeling Lucky" result), then I would highly suggest you use Google's Custom Search API (https://developers.google.com/custom-search/v1/overview).
It would use MUCH less bandwidth if you are pulling JSON results using this API.
Usage of the API only allows for 100 free queries per day. This might fall within your application requirements, but it also might not. If you have the means, I would suggest supporting Google by paying if you intend to make thousands of queries.
There are two things to note here. First - using "http://google.com" in the above URL without the "www" forces a 301 redirect to "http://www.google.com" so you should include the www to keep things simple.
The second is that opening the URL (with the www) performs a 302 redirect. The destination is inside the response headers. So if you can catch that 302 response, you can get the URL Google will send you to, before it sends you there.
Here are the response and request headers for the first request, in which Google performs a 301 redirect to the www domain.
Response Headers
Cache-Control public, max-age=2592000
Content-Length 244
Content-Type text/html; charset=UTF-8
Date Mon, 18 Feb 2013 14:14:40 GMT
Expires Wed, 20 Mar 2013 14:14:40 GMT
Location http://www.google.com/search?btnI=1&q=html5
Server gws
X-Frame-Options SAMEORIGIN
X-XSS-Protection 1; mode=block
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Cookie PREF=ID=5d01155d00a8d706:U=49fab5927df1f8ad:FF=0:TM=1359732743:LM=1360874099:S=byw-1-fgfbcRWdPN; NID=67=NpFNjRkjTFtyrcYPE-pQeJiMFEgWMWdyVMVpbYATZySlsw63Hz4FCw2Tcr4tynhAhyq1vnuPqmdFBOC65Nd-048ZxrgP_HVtKbVCe7psi-G2aMvsOUbiBl1xYks2xK2K
DNT 1
Host google.com
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
And the response/request headers for the 302 that takes me to the destination page. You can see the destination URL is returned. I've bolded it in the copy.
Response Headers
Cache-Control private
Content-Length 231
Content-Type text/html; charset=UTF-8
Date Mon, 18 Feb 2013 14:14:41 GMT
Location http://en.wikipedia.org/wiki/HTML5
Server gws
X-Frame-Options SAMEORIGIN
X-XSS-Protection 1; mode=block
Request Headers
Accept text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Connection keep-alive
Cookie PREF=ID=5d01155d00a8d706:U=49fab5927df1f8ad:FF=0:TM=1359732743:LM=1360874099:S=byw-1-fgfbcRWdPN; NID=67=NpFNjRkjTFtyrcYPE-pQeJiMFEgWMWdyVMVpbYATZySlsw63Hz4FCw2Tcr4tynhAhyq1vnuPqmdFBOC65Nd-048ZxrgP_HVtKbVCe7psi-G2aMvsOUbiBl1xYks2xK2K
DNT 1
Host www.google.com
User-Agent Mozilla/5.0 (Windows NT 6.2; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
Context:
Hi everyone, i am trying to simulate a query on this website, but i am failing to do so.
I am using C# and a custom self developed library to Wrap the WebRequests actions making it easier to simulate Posts and Gets for Strings and Bitmaps.
Also, i'm using Fiddler2 Web Debugger to debug the web requests of the service
How to test the service Yourself:
Link to the service
Use this document on the first white box : 04034872000121
Write the captcha and click at "Consultar"
Thats it.
Problem:
After Debuging the requests with fiddler, and replicated everything on code (Cookies, Origin, Host, Postdata with a huge json and so on).
The request for the query, still not working, it redirects me to the home page again, instead of querying the document. (I am allowing "AutoRedirect" on web request object).
The only parameter that i'm not beeing able to replicate is the : GxAjaxRequest: 1
Here is the Fiddler debug feedback of the request:
POST http://sefaznet.ac.gov.br/sefazonline/servlet/hpfsincon?0898a16d81a4e94896958b17b52f252d,gx-no-cache=1354713117196 HTTP/1.1
Host: sefaznet.ac.gov.br
Connection: keep-alive
Content-Length: 1337
Origin: http://sefaznet.ac.gov.br
GxAjaxRequest: 1 **Weird Parameter. I've never saw it before.**
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11
Content-Type: application/x-www-form-urlencoded
Accept: */*
Referer: http://sefaznet.ac.gov.br/sefazonline/servlet/hpfsincon
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: GX_SESSION_ID=vSLRLKed3eXJGMBorGepVtQkJOQ1I3o0EBUVzT0g%2BI8%3D; JSESSIONID=af2ba968b7889ec8869caaaba281
vNUMDOC=04034872000121&cfield=chin&BUTTON1=Consultar&BTN_VOLTAR=Retornar&GXState=%7B%22_EventName%22%3A%22E'VISUALIZADADOS'.%22%2C%22_EventGridId%22%3A44%2C%22_EventRowId%22%3Aundefined%2C%22nRC_Duplicados%22%3A%220%22%2C%22CAPTCHA1_Reloadimagetext%22%3A%22Obter%20nova%20imagem!%22%2C%22CAPTCHA1_Validationresult%22%3A1%2C%22GX_FocusControl%22%3A%22vNUMDOC%22%2C%22GX_AJAX_KEY%22%3A%2264FFFF0AFF7A4DFF2655FFFFFF26FF77%22%2C%22AJAX_SECURITY_TOKEN%22%3A%221a9634f566dcd40d12bb8146fd7ff6edca12ae737a3743d79b4b826c3bd4a604%22%2C%22GX_CMP_OBJS%22%3A%7B%7D%2C%22sCallerURL%22%3A%22%2Fsefazonline%2Fservlet%2Fhpfsindado%3FeTlFtl5mBgEOtpLCt8Q02bMjmN3K93hV7i2Uxq_rHv0%3D%22%2C%22GX_RES_PROVIDER%22%3A%22com.genexus.webpanels.GXResourceProvider%22%2C%22GX_THEME%22%3A%22GeneXusX%22%2C%22_MODE%22%3A%22%22%2C%22Mode%22%3A%22%22%2C%22IsModified%22%3A%221%22%2C%22MESSAGE_Width%22%3A%22100%22%2C%22MESSAGE_Height%22%3A%22100%22%2C%22MESSAGE_Show%22%3A%22false%22%2C%22MESSAGE_Title%22%3A%22Title%22%2C%22MESSAGE_Message%22%3A%22This%20is%20the%20message%22%2C%22MESSAGE_Type%22%3A%22alert%22%2C%22MESSAGE_Icon%22%3A%22info%22%2C%22MESSAGE_Cls%22%3A%22%22%2C%22MESSAGE_Position%22%3A%22t%22%2C%22MESSAGE_Duration%22%3A1%2C%22MESSAGE_Visible%22%3A1%2C%22CAPTCHA1_Width%22%3A%22140%22%2C%22CAPTCHA1_Height%22%3A%2239%22%2C%22CAPTCHA1_Visible%22%3A1%7D&
Question:
How do i actually replicate/add this parameter to my webrequest via code ?
Is there any way to do so ?
By the way, the site messes alot with scripts which was hard to "figure out" the origin from most parameters used on the requests.
I hope someone might help me out.
Thanks in advance.
I've figured out.
The problem was that i've had to add a custom header to each request.
webRequest.Headers.Add ("customheadertext and value");
Now fiddler shows correctly my new request, with the added header
POST http://sefaznet.ac.gov.br/sefazonline/servlet/hpfsincon?0898a16d81a4e94896958b17b52f252d,gx-no-cache=1354721123208 HTTP/1.1
GxAjaxRequest: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.107 Safari/535.1
Content-Type: application/x-www-form-urlencoded
Referer: http://sefaznet.ac.gov.br/sefazonline/servlet/hpfsincon
Host: sefaznet.ac.gov.br
Cookie: GX_SESSION_ID=B8w8AQ4W%2FLzLHIpBor3JwJDQAWGy1xRqCYUMnzF14Yk%3D; JSESSIONID=c19564cbebfab1911442fd64a0bb
Content-Length: 1291
Expect: 100-continue
Accept-Encoding: gzip, deflate