Doing a simple GET from C#
var webClient = new WebClient();
webClient.Headers.Add("Accept", "*/*");
webClient.Headers.Add("Accept-Encoding", "gzip, deflate"); // note: WebClient won't decompress gzip automatically
webClient.Headers.Add("User-Agent", "runscope/0.1");
var response = webClient.DownloadString("http://booking.frederiksberg.dk/NetInterBook/SearchScheme/SimpleSearch.aspx");
I get a response that is different from the same request executed from Chrome's Advanced Rest Client / Postman / http://Hurl.it
I still get a website, but it doesn't contain the form information that I am looking for (the items with IDs similar to drplFacility_item_1).
I've tried using RestSharp and HttpWebResponse as well, with the same results. What am I not doing that these other HTTP clients are? According to Chrome's network tab, they seem to be doing pretty vanilla GETs. Thanks!
Here's the page I get from the WebClient: http://pastebin.com/5PjxejKT
It was a Visual Studio GUI bug that was tripping me up. I did use inspectors before posting this question, and I was just really baffled as to why I was getting a different response for the same GET from .NET than from everywhere else. It turns out I wasn't. (Thanks, WireShark!)
Here's the active bug report: https://connect.microsoft.com/VisualStudio/feedback/details/2016177/text-visualizer-misses-corrupts-text-in-long-strings
Hope this helps anyone who might come across this; it took me a long time to figure this one out...
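If you hit something similar, one way to take the IDE out of the equation is to dump the raw response to disk and compare it with what Postman or curl returns; a minimal sketch (file name is arbitrary):
// using System.Net; using System.IO;
var webClient = new WebClient();
string body = webClient.DownloadString("http://booking.frederiksberg.dk/NetInterBook/SearchScheme/SimpleSearch.aspx");
// inspect the file outside the debugger - the Text Visualizer was the unreliable part here
File.WriteAllText("response.html", body);
Console.WriteLine("Length: " + body.Length);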
Related
I have a problem with a certain site: I am provided with a list of product ID numbers (about 2000), and my job is to pull data from the producer's site. I already tried forming the URLs of the product pages directly, but there are some unknown variables that I can't fill in to get results. However, there is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen - the problem is that this page displays some info (probably JavaScript) and then redirects straight to the desired page - the one that I need to pull data from.
Is there any way of tracking this redirection?
I would like to post some of my code, but everything I have so far is unhelpful, because it just downloads the source of the pre-redirect page.
public static string Download(string uri)
{
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8;
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    string s = client.DownloadString(uri);
    return s;
}
Also, the suggested answer is not helpful in this case, because the redirection doesn't happen at the HTTP level - the page redirects after a few seconds of loading the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen url.
I just found a solution. Since I'm new and have to wait a few hours to answer my own question, it will end up here.
I hope other users will find it useful:
// pseudocode made concrete: run on a UI thread with a message loop
webBrowser1.Navigate(url);
// wait until the JavaScript redirect moves the browser off the search URL
while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == url)
{
    Application.DoEvents(); // let the WebBrowser control process navigation
}
string desiredUri = webBrowser1.Url.AbsoluteUri; // the redirect target
Thanks for the answers.
Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and certainly not with WebClient. The problem appears to be that some JavaScript does the redirection. And since all WebClient does is download the page, it's not even going to download the JavaScript, much less parse and execute it.
You might be able to do this by creating a program that uses the WebBrowser class. You can have it load the page. It should do the redirect and then you can inspect the result, which should be the page you were looking for. I haven't actually done this, but it does seem possible.
Your other option is to fire up your Web browser's developer tools (like IE's F12 Developer Tools) and watch what's happening. You can then inspect the JavaScript that's being executed as well as the modified DOM, and see where the redirect happens.
Yes, it's tedious work. But once you figure out the redirect for one page, you can probably generate the URL for the other pages you want automatically.
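If you go the WebBrowser route, here's a sketch (untested, per the caveat above) that catches the post-redirect URL with the DocumentCompleted event instead of a busy-wait loop:
// using System.Windows.Forms; requires an STA thread and a message loop
var browser = new WebBrowser { ScriptErrorsSuppressed = true };
string startUrl = "http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen";
browser.DocumentCompleted += (s, e) =>
{
    // fires for the search page first, then again after the JavaScript redirect
    if (e.Url.AbsoluteUri != startUrl)
        Console.WriteLine("Redirected to: " + e.Url.AbsoluteUri);
};
browser.Navigate(startUrl);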
I am having a lot of trouble using web requests in MonoDroid, and I'm getting timeouts at random. My code works fine, then sometimes all requests just time out and don't work.
I have verified that the web services used in my requests are not the problem.
Here is an example of some code that I may use to request data from a web service using MonoDroid:
bool bolOk = false;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.website.com/service/");
request.Timeout = 20000;
request.Credentials = gv_objCredentials;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    bolOk = response.StatusCode == HttpStatusCode.OK;
}
As you can see, it is basic stuff. I always run code like the above on a thread other than the UI thread, using ThreadPool.QueueUserWorkItem or TaskFactory.
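Roughly like this (a sketch: UpdateUi is a hypothetical UI callback, RunOnUiThread is the Android Activity helper):
ThreadPool.QueueUserWorkItem(_ =>
{
    var request = (HttpWebRequest)WebRequest.Create("http://www.website.com/service/");
    request.Timeout = 20000;
    request.Credentials = gv_objCredentials;
    try
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            bool ok = response.StatusCode == HttpStatusCode.OK;
            RunOnUiThread(() => UpdateUi(ok)); // hypothetical UI update
        }
    }
    catch (WebException) { /* timed out or failed */ }
});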
What I have noticed is that if the requests start timing out from my app, and I then plug the device into my computer and debug the app from MonoDevelop, the requests work without timing out. I am not sure if this means anything. This is similar to testing the web services from my computer using a browser on the same network as the phone: the web services always work without any issues.
What is the best way to make web requests from MonoDroid?
How can I ensure my requests always succeed and won't time out when the web service is operating correctly?
I had the issue on Xamarin 4.2.6 and 4.2.8.
Thanks to Xamarin support, they identified the issue and suggested I target my build to armeabi-v7a rather than armeabi in my project properties (a multi-core processor issue, described here).
Depending on whether you plan to support multi-core processors or not, you should check out this post and may need to manually edit your .csproj file.
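For reference, the ABI lives in the project file; the manual edit should look something like this (AndroidSupportedAbis is the Xamarin.Android build property):
<PropertyGroup>
  <AndroidSupportedAbis>armeabi-v7a</AndroidSupportedAbis>
</PropertyGroup>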
There's a new version of Mono for Android (4.2.5) that fixes a number of bugs with WebRequest and the request stream. You can check the release notes here: http://docs.xamarin.com/android/Releases/Mono_For_Android_4/Mono_for_Android_4.2.5
I suggest downloading the latest bits and checking if it works. If not, please file a bug and they will surely fix it in the next version of the product.
I've been looking all over and I can't find anything to fix this problem.
IE6 only is having an issue with the response of an AJAX call, and it's erroring because of invalid data (alerting the data shows a single beautiful weird square).
Locally, IE6 works perfectly (same page and same data). I've checked and rechecked that it's calling the correct URLs on the server vs. locally, and it's not cross-domain or anything like that. It's also actually making the call fine and getting a response (OK status).
Calling the AJAX URL on the live server in IE6 via the address bar works perfectly and shows all the data fine too.
I've tried forcing the content-type and charset, and I've tried turning off compression in the web.config, and it's still dead... and I'm running out of time :(
FF3.5, IE7, and IE8 are all perfectly fine, locally and on the server. The server is Windows 2008 (Rackspace Cloud) and local is just the built-in dev server from Web Dev Express 2008.
It's an MVC C# app... any ideas would be appreciated!
EDIT:
There's not a lot I can paste, but OK. The controller:
public ActionResult TEST()
{
    return Content("HI THERE!", "text/plain");
}
It was text/html, of course, to start with.
The JavaScript is a simple ajaxURL function which loads XMLHttpRequest, MSXML2.XMLHTTP, or Microsoft.XMLHTTP. On ready state change, if readyState == 4, it alerts the result, and it's a square (unless you load the URL in the browser directly).
Again, the entire site loads fine in all other browsers, so I think it's more a server/config issue, especially as even IE6 works locally. I've made the AJAX URLs fully referenced just in case (using code to get the host, checking for a non-default port and adding it for localhost, etc.).
Viewing source, all the calls to ajaxURL are perfect... I'll try another encoding option other than UTF-8 perhaps, and check more compression options.
Surely I'm not the only one to ever come across this? Heh.
Found the issue; putting it here in case others need the info :)
Content-Encoding: gzip
This was killing my IE6 AJAX calls; I just needed to get it turned off on the host. It also explains why it works locally without returning garbage, as gzip isn't enabled locally.
Now I have to find out why the web.config httpCompression settings aren't affecting whether it's compressed.
EDIT:
<system.webServer>
  <urlCompression doDynamicCompression="false" doStaticCompression="false" />
</system.webServer>
This did it - IE6 AJAX working on the server nicely again. 2:30am; now I can go to bed! :D
Anyone having a similar issue with IE6 and responseText, where responseText is a weird square-looking character: it's the same problem! I just found this out after hours of searching.
The solution is to send your response with no Content-Encoding. In PHP, for example, use the following header:
header('Content-Encoding: none');
Thanks to White Dragon for finally solving this for me.
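For an ASP.NET MVC app like the one above, a rough equivalent of that PHP header would be the following sketch (the cleaner fix is still the urlCompression setting shown earlier):
public ActionResult TEST()
{
    // explicitly mark the response as uncompressed so IE6 doesn't choke
    Response.AppendHeader("Content-Encoding", "none");
    return Content("HI THERE!", "text/plain");
}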
I've been experiencing the same sort of problems and discovered that they only existed when using IE6 in IETester. When testing on a separate workstation (IE6 on XP SP2), the AJAX responses were fine.
I have the same problem, too.
But I tried White Dragon's fix, and it didn't work!
I found that the data returned by the AJAX call was the same every time, so I think it's about the request cache. I set it to no-cache, and that solved the problem.
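The poster doesn't say where they set no-cache; one way to get the same effect from the server side in ASP.NET MVC is a sketch like this:
[OutputCache(NoStore = true, Duration = 0, VaryByParam = "*")]
public ActionResult TEST()
{
    return Content("HI THERE!", "text/plain"); // response is never cached by the client
}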
I have a rather simple program which takes in a URL and spits out the first place it redirects to. Anyhow, I've been testing it on some links and noticed it gets 400 errors on some URLs. I tried testing such URLs by pasting them into my browser, and that worked fine.
static string getLoc(string curLoc, out string StatusDescription, int timeoutmillseconds)
{
    HttpWebRequest x = (HttpWebRequest)WebRequest.Create(curLoc);
    x.UserAgent = "Opera/9.52 (Windows NT 6.0; U; en)";
    x.Timeout = timeoutmillseconds;
    x.AllowAutoRedirect = false; // we want the redirect target, not the final page
    HttpWebResponse y = null;
    try
    {
        y = (HttpWebResponse)x.GetResponse(); // at this point it throws a 400 Bad Request exception
        StatusDescription = y.StatusDescription;
        return y.Headers["Location"]; // the first place it redirects to
    }
    finally
    {
        if (y != null) y.Close();
    }
}
I think something weird is happening with cookies. It turns out that, due to the way I was testing the link, the necessary cookies for it to work were in my browser but not in the program. It turns out some of the links I was testing manually (when the other links failed) were generating cookies.
It's slightly convoluted what happened, but the short answer is that my browser had cookies, the program did not, and maintaining the cookies between redirects did not solve the problem.
The underlying problem is that the link I am testing requires either an extra parameter or a cookie or both. I was trying to avoid both in my tests, since the parameter/cookie were for tracking and I didn't want to break tracking.
In short, I know what the problem is, but it's not a solvable one.
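For anyone who wants to try the cookie angle anyway, attaching a CookieContainer is how HttpWebRequest carries cookies between requests (the URLs below are illustrative); it just didn't help in this particular case:
var cookies = new CookieContainer();
var first = (HttpWebRequest)WebRequest.Create("http://example.com/landing"); // illustrative URL
first.CookieContainer = cookies; // Set-Cookie headers from the response land here
using (var r1 = (HttpWebResponse)first.GetResponse()) { }

var second = (HttpWebRequest)WebRequest.Create("http://example.com/target"); // illustrative URL
second.CookieContainer = cookies; // cookies gathered above are sent automatically
using (var r2 = (HttpWebResponse)second.GetResponse()) { }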
Use HttpWebRequest to download web pages without case-sensitivity issues
[update: I don't know why, but both examples below now work fine! Originally I was also seeing a 403 on the page2 example. Maybe it was a server issue?]
First, WebClient is easier. Actually, I've seen this before; it turned out to be case sensitivity in the URL when accessing Wikipedia. Try ensuring that you have used the same case in your request to Wikipedia.
[updated] As Bruno Conde and gimel observe, using %27 should help make it consistent (the intermittent behaviour suggests that maybe some Wikipedia servers are configured differently from others).
I've just checked, and in this case the case issue doesn't seem to be the problem... however, if it worked (it doesn't), this would be the easiest way to request the page:
using (WebClient wc = new WebClient())
{
    string page1 = wc.DownloadString("http://en.wikipedia.org/wiki/Algeria");
    string page2 = wc.DownloadString("http://en.wikipedia.org/wiki/%27Abadilah");
}
I'm afraid I can't think what to do about the leading apostrophe that is breaking things...
I also got strange results... First,
http://en.wikipedia.org/wiki/'Abadilah
didn't work, and after some failed tries it started working.
The second URL,
http://en.wikipedia.org/wiki/'t_Zand_(Alphen-Chaam)
always failed for me...
The apostrophe seems to be responsible for these problems. If you replace it with %27, all the URLs work fine.
Try escaping the special characters using percent-encoding (RFC 3986, section 2.1). For example, a single quote is represented by %27 in the URL (IRI).
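In C#, the simplest fix matching that observation is to replace the quote before requesting; a sketch:
using (WebClient wc = new WebClient())
{
    // percent-encode the apostrophe explicitly before requesting
    string url = "http://en.wikipedia.org/wiki/'Abadilah".Replace("'", "%27");
    string page = wc.DownloadString(url);
}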
I'm sure the OP has this sorted by now, but I've just run across the same kind of problem - intermittent 403s when downloading from Wikipedia via a WebClient. Setting a user-agent header sorts it out:
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");