C# Link Analyzer getting Bad Request Errors?

I have a rather simple program which takes in a URL and spits out the first place it redirects to. Anyhow, I've been testing it on some links and noticed it gets 400 errors on some URLs. I tried testing those URLs by pasting them into my browser, and they worked fine.
static string getLoc(string curLoc, out string statusDescription, int timeoutMilliseconds)
{
    HttpWebRequest x = (HttpWebRequest)WebRequest.Create(curLoc);
    x.UserAgent = "Opera/9.52 (Windows NT 6.0; U; en)";
    x.Timeout = timeoutMilliseconds;
    x.AllowAutoRedirect = false; // we want the Location header itself, not the page it leads to
    try
    {
        using (HttpWebResponse y = (HttpWebResponse)x.GetResponse()) // At this point it throws a 400 Bad Request exception.
        {
            statusDescription = y.StatusDescription;
            return y.Headers["Location"]; // the first redirect target
        }
    }
    catch (WebException ex)
    {
        statusDescription = ex.Message;
        return null;
    }
}

I think something weird is happening with cookies. It turns out that, because of the way I was testing the link, the cookies necessary for it to work were in my browser but not in my program; some of the links I was testing manually (when the other links failed) were generating those cookies.
What happened is slightly convoluted, but the short answer is that my browser had the cookies and the program did not, and maintaining the cookies between redirects did not solve the problem.
The underlying problem is that the link I am testing requires an extra parameter, a cookie, or both. I was trying to avoid both in my tests, since the parameter/cookie were for tracking and I didn't want to break the tracking.
In short, I know what the problem is, but it's not a solvable one.
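For reference, here is roughly how I was maintaining cookies between redirects, as a minimal sketch with illustrative names (startUrl is assumed); in my case this still did not fix the 400:
CookieContainer jar = new CookieContainer();
string url = startUrl;
for (int hop = 0; hop < 10 && url != null; hop++)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.CookieContainer = jar;      // carry cookies across every hop
    req.AllowAutoRedirect = false;
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        // Note: a relative Location would need resolving against the current url.
        url = resp.Headers["Location"]; // null once there is no further redirect
    }
}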


Poor HTTP request performance unless using Fiddler

Edit: after talking it over with a couple of IT guys, I've realized it's only the poll requests that are having issues. The images I'm fetching via GET requests go through quickly and as expected, whether or not the poll messages are having trouble.
I'm working on a client to interface with an IP camera in C#.
It's all working dandy except that I can get really poor http request performance when I'm not using Fiddler (a web traffic inspection proxy).
I'm using an HttpClient to send my requests; this is the code that actually initiates the poll request:
public async Task<bool> SetPoll(int whichpreset)
{
    string action = "set";
    string resource = presetnames[whichpreset];
    string value = presetvalues[whichpreset];
    var request = new HttpRequestMessage
    {
        RequestUri = new Uri("http://" + ipadd + "/res.php"),
        Method = HttpMethod.Post,
        Content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("action", action),
            new KeyValuePair<string, string>("resource", resource),
            new KeyValuePair<string, string>("value", value)
        }),
        Version = new System.Version("1.1"),
    };
    HttpResponseMessage mess = await client.SendAsync(request);
    return mess.IsSuccessStatusCode; // true only for 2xx responses
}
When Fiddler is up, all my http requests go through quickly, and without a hitch (I'm making about 20 post requests upon connecting). Without it, they only go through as expected ~1/5 of the time, and the rest of the time they're never completed, which is a big issue. Additionally, the initial connection request often takes 1+ minutes when not using Fiddler, and consistently only takes a few seconds when I am, so it doesn't seem to be a timing issue of sending requests too soon after connecting.
This leads me to think that the request, as written, is fairly poorly behaved, and perhaps Fiddler's requests behave better. I'm a newbie to HTTP, so I'm not sure exactly why this would be. My questions:
1. Does Fiddler modify HTTP requests (e.g. different headers) as they are sent to the server?
2. Even if it doesn't modify the requests, are Fiddler's requests in some way better behaved than what I'd be getting out of .NET 4.0 in C# in VS2013?
3. Is there a way to improve the behavior of my requests to emulate whatever Fiddler is doing? Ideally while still working within the stock HTTP namespace, but I'm open to using others if necessary.
I'll happily furnish more code if helpful (though tomorrow).
Inserting
await Task.Delay(50);
between all requests fixed the problem (I haven't yet tested different delays). Because Fiddler smoothed the problem out, I suspect the camera has an issue with requests sent in too quick a succession, and Fiddler relayed them at a more tolerable rate. Because it's an async await, there is no noticeable performance impact, other than it taking a little while to get all ~20 (30 now) requests through on startup, which is not an issue for my app.
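For illustration, a sketch of how the delay fits into the startup sequence (SetPoll is the method above; the loop itself is assumed):
for (int i = 0; i < presetnames.Length; i++)
{
    await SetPoll(i);     // fire each poll request in turn
    await Task.Delay(50); // give the camera breathing room between requests
}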
Fiddler installs itself as a system proxy. It is possible that the Fiddler process has better access to the network than your application's process.
Fiddler might be configured to bypass your normal system proxy (check the gateway tab under options) and perhaps the normal system proxy has issues.
Fiddler might be running as a different user with a different network profile, e.g. could be using a different user cert store or different proxy settings such as exclusion list.
Fiddler might be configured to override your hosts file and your hosts file may contain errors.
Your machine might be timing out trying to reach the servers necessary to check for certificate revocation. Fiddler has CRL checking disabled by default (check the HTTPS tab).
Fiddler has a ton of options and the above are just some guesses.
My recommendation would be to check and/or toggle the above options to see if any of them apply. If you can't get anywhere, you may have to forget Fiddler exists and troubleshoot your network problems independently, e.g. by using NSLOOKUP, PING, TRACERT, and possibly TELNET to isolate the problem.
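If you want to rule the proxy in or out from code, here is a minimal sketch that forces HttpClient to bypass any system (or Fiddler) proxy; HttpClientHandler ships alongside HttpClient in System.Net.Http:
var handler = new HttpClientHandler { UseProxy = false }; // ignore any configured proxy
var probe = new HttpClient(handler);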
There is nothing in your code sample that suggests a code flaw that could cause intermittent network failures of the kind you are describing. In fact it is hard to imagine any code flaw that would cause that sort of behavior.

getting source code of redirected http site via c# webclient

I have a problem with a certain site. I am provided with a list of product ID numbers (about 2000), and my job is to pull data from the producer's site. I already tried forming the URLs of the product pages directly, but there are some unknown variables that I can't supply to get results. However, there is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen. The problem is that the given page displays some info (probably JavaScript) and then redirects straight to the desired page, the one I need to pull data from.
Is there any way of tracking this redirect?
I would like to include some of my code, but everything I have so far seems unhelpful, because it just downloads the source of the page before the redirect.
public static string Download(string uri)
{
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8;
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    string s = client.DownloadString(uri);
    return s;
}
Also, the suggested answer is not helpful in this case, because the redirect doesn't come with the HTTP response; the page redirects a few seconds after loading the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen URL.
I just found a solution, and since I'm new and have to wait a few hours to answer my own question, it will end up here. I hope other users will find it useful:
// Navigate with a WebBrowser control so the page's JavaScript runs,
// then wait until the script-driven redirect away from the search URL happens.
webBrowser1.Navigate(url);
while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == url)
{
    Application.DoEvents(); // keep the message loop alive so navigation can proceed
}
string desiredUri = webBrowser1.Url.AbsoluteUri;
Thanks for the answers.
Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and certainly not with WebClient. The problem appears to be that some Javascript does the redirection. And since all WebClient does is download the page, it's not even going to download the Javascript. Much less parse and execute it.
You might be able to do this by creating a program that uses the WebBrowser class. You can have it load the page. It should do the redirect and then you can inspect the result, which should be the page you were looking for. I haven't actually done this, but it does seem possible.
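If you go the WebBrowser route, one way to catch the post-redirect address is the DocumentCompleted event. A sketch, untested, and it assumes a Windows Forms message loop is running:
var browser = new WebBrowser();
browser.DocumentCompleted += (s, e) =>
{
    // Fires after each navigation, including a script-driven redirect.
    Console.WriteLine(e.Url.AbsoluteUri);
};
browser.Navigate("http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen");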
Your other option is to fire up your Web browser's developer tools (like IE's F12 Developer Tools) and watch what's happening. You can then inspect the Javascript that's being executed as well as the modified DOM, and see where the redirect happens.
Yes, it's tedious work. But once you figure out the redirect for one page, you can probably generate the URL for the other pages you want automatically.
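For instance, once the redirect pattern is known for one product, the search URLs for the rest of the list can be generated in a loop (productIds is assumed; Download is the method from the question):
foreach (string id in productIds)
{
    string searchUrl = "http://www.hansgrohe.de/suche.htm?searchtext=" + id + "&searchSubmit=Suchen";
    // resolve searchUrl to the final product page, then fetch it with Download(...)
}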

Asp.net ajax - losing session variables - Internet Explorer 8 and prior versions

I have been working on a legacy project (although in C#) and trying to solve a session problem that has gone unsolved for years. It happens on IE8 and prior versions; on IE9, Google Chrome, Firefox and Safari it works fine.
In other words, we have a management application that works fine on all browsers, but there is a specific page that makes tons of Ajax requests, and at some point it loses the session data.
I have checked for cookie problems with Fiddler, but the cookies are always sent and always the same.
These clues make us think the problem is within the application; but remembering that it occurs only in IE8 and prior versions, the issue is probably in the browsers.
We also use a legacy Ajax library, but the problem shouldn't be there, as many of our applications use it and they don't have this issue.
We are using IIS7 with State Server.
I'm almost out of ideas. I hope you have some.
I got it!
Using Fiddler, I saw a very suspect request for "/": something was requesting the site's base URL. And I remembered that the default page of this particular web application kills the session data; in other words, calling the login page also logs the user off.
After some hours of debugging and sniffing, I found what was making that request.
There is a JavaScript function that creates some image tags. Sometimes those tags were created with an empty address; in other words, the src property of the img tag was a zero-length string.
It must be a bug in IE8 and older versions, as they request the website root instead of requesting nothing. Maybe it's not a bug, but the behavior is certainly unexpected.
Phew! I still can't believe I found it.
Losing session state can be the result of an application error, but since you say this happens only on IE8 and older versions, that is probably not the case here...
So I would suggest you use the page's ViewState instead of session state. Let me know if it did the trick for you.
Here is a sample of how to create a property backed by the page's ViewState; just make sure ViewState is enabled at the page level:
public string MyProperty
{
    get
    {
        return ViewState["MyProperty"] as string;
    }
    set
    {
        ViewState["MyProperty"] = value;
    }
}
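Usage is then an ordinary assignment; the value round-trips with the page instead of living in the session (Page_Load is the standard ASP.NET page event):
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        MyProperty = "some value"; // stored in the page's ViewState, not the session
    }
}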

Why is WebClient (System.Net) getting from URL twice?

I've got a method like this:
private string getFromURL(string url)
{
    using (WebClient myClient = new WebClient()) // WebClient is IDisposable
    {
        return myClient.DownloadString(url);
    }
}
using WebClient from System.Net. It appears to be hitting the URL twice (I'm also watching the log of the web server in question, and it records two hits). Any idea why this might be?
EDIT: the answer was in fact programmer error. I no longer have any reason to think this is behaving strangely. Thanks for the answers.
Or, if the URL is subtly different in the two cases, it could be responding to an HTTP redirect.
My guess is that it's doing a HEAD before the GET. Does your log show the HTTP method being used?
Check out tcpmon:
https://tcpmon.dev.java.net/
It's a Java tool, but you can run it easily without being a "Java" guy.
Chances are there's a redirect or something pointing back to itself, so you should be able to see whether the two HTTP requests are identical or slightly different.
Also, check out curl (available through Cygwin): you can send the requests from there and see if there's a redirect.

Use HttpWebRequest to download web pages without key sensitive issues

[update: I don't know why, but both examples below now work fine! Originally I was also seeing a 403 on the page2 example. Maybe it was a server issue?]
First, WebClient is easier. Actually, I've seen this before: it turned out to be case sensitivity in the URL when accessing Wikipedia; try ensuring that you have used the same case in your request to Wikipedia.
[updated] As Bruno Conde and gimel observe, using %27 should help make it consistent (the intermittent behaviour suggests that maybe some Wikipedia servers are configured differently from others).
I've just checked, and in this case the case issue doesn't seem to be the problem... however, if it worked (it doesn't), this would be the easiest way to request the page:
using (WebClient wc = new WebClient())
{
    string page1 = wc.DownloadString("http://en.wikipedia.org/wiki/Algeria");
    string page2 = wc.DownloadString("http://en.wikipedia.org/wiki/%27Abadilah");
}
I'm afraid I can't think what to do about the leading apostrophe that is breaking things...
I also got strange results... First,
http://en.wikipedia.org/wiki/'Abadilah
didn't work, and then after some failed tries it started working. The second URL,
http://en.wikipedia.org/wiki/'t_Zand_(Alphen-Chaam)
always failed for me...
The apostrophe seems to be responsible for these problems. If you replace it with %27, all the URLs work fine.
Try escaping the special characters using Percent Encoding (paragraph 2.1). For example, a single quote is represented by %27 in the URL (IRI).
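As a sketch, the encoding can be done explicitly before building the request (the title variable is illustrative):
string title = "'Abadilah"; // a page title with a leading apostrophe
string url = "http://en.wikipedia.org/wiki/" + title.Replace("'", "%27");
// Note: whether Uri.EscapeDataString encodes the apostrophe depends on the
// .NET version, so the explicit Replace is the more predictable option here.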
I'm sure the OP has this sorted by now, but I've just run across the same kind of problem: intermittent 403s when downloading from Wikipedia via a WebClient. Setting a user-agent header sorts it out:
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
