Problem with C# proxy in a test console script

I have a project that has to fetch hundreds of pages of data from a site each day. I use a paid-for proxy with login details, wait 5 seconds between requests so I don't hammer their site, and pass a referer and user-agent; it is a simple GET request.
However, I tried to make a little C# console script to test various ways of adding proxies, e.g. with or without credentials, and got a working IP:Port from http://www.freeproxylists.net/ to test with, as my own details didn't work in this test. I am at a loss as to why this test script isn't working when my main project is.
I am accessing an old site I own anyway, so it isn't blocking my home IP: I can access it (or any other page or site) in a browser easily.
Without a proxy I just get a 30-second wait (the timeout length) followed by a "Timeout Error"; with a proxy (the free one OR my own with credentials) I get NO wait at all before the "Timeout Error". So whether I use a proxy or not, it fails to return a response.
I am probably just sleep-deprived, but I would like to know what I am doing wrong. I copied my "MakeHTTPGetRequest" method from my main project's Scraper class, removed all the case statements in the try/catch that check for Connection/Timeout/404/Service/Server errors etc., and put it into one simple Main method here...
public static void Main(string[] args)
{
    string url = "https://www.strictly-software.com"; // a site I own
    //int port = ????; // working in main project crawler
    int port = 3128; // from a list of working free proxies
    string proxyUser = "????"; // working in main project crawler
    string proxyPassword = "????"; // working in main project crawler
    string proxyIP = "167.99.230.151"; // from a list of working proxies

    ShowDebug("Make a request to: " + url + " with proxy:" + proxyIP + ":" + port.ToString());

    // use basic IP and Port proxy with no login
    WebProxy proxy = new WebProxy(proxyIP, port);

    /*
    // use default port, username and password to login
    // get same error with correct personal proxy and login but not
    // in main project
    WebProxy proxy = new WebProxy(proxyIP, port)
    {
        Credentials = new NetworkCredential(proxyUser, proxyPassword)
    };
    */

    ShowDebug("Use Proxy: " + proxy.Address.ToString());

    HttpWebRequest client = (HttpWebRequest)WebRequest.Create(url);
    client.Referer = "https://www.strictly-software.com";
    client.Method = "GET";
    client.ContentLength = 0;
    client.ContentType = "application/x-www-form-urlencoded;charset=UTF-8";
    client.Proxy = proxy;
    client.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0";
    client.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    client.Headers.Add("Accept-Encoding", "gzip,deflate");
    client.KeepAlive = true;
    client.Timeout = 30;

    ShowDebug("make request with " + client.UserAgent.ToString());

    try
    {
        // tried adding this to see if it would help but didn't
        //ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        // get the response
        HttpWebResponse response = (HttpWebResponse)client.GetResponse();

        ShowDebug("response.ContentEncoding = " + response.ContentEncoding.ToString());
        ShowDebug("response.ContentType = " + response.ContentType.ToString());
        ShowDebug("Status Desc: " + response.StatusDescription.ToString());
        ShowDebug("HTTP Status Code: " + response.StatusCode.ToString());
        ShowDebug("Now get the full response back");

        // old method not working with £ signs
        StreamReader ResponseStream = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
        string ResponseContent = ResponseStream.ReadToEnd().Trim();

        ShowDebug("content from response == " + Environment.NewLine + ResponseContent);

        ResponseStream.Close();
        response.Close();
    }
    catch (WebException ex)
    {
        ShowDebug("An error occurred");
        ShowDebug("WebException " + ex.Message.ToString());
        ShowDebug(ex.Status.ToString());
    }
    catch (Exception ex)
    {
        ShowDebug("An error occurred");
        ShowDebug("Exception " + ex.Message.ToString());
    }
    finally
    {
        ShowDebug("At the end");
    }
}
The error messages from the console (ShowDebug is just a wrapper that prefixes the message with the time)...
02/08/2020 00:00:00: Make a request to: https://www.strictly-software.com with proxy:167.99.230.151:3128
02/08/2020 00:00:00: Use Proxy: http://167.99.230.151:3128/
02/08/2020 00:00:00: make request with Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0
02/08/2020 00:00:00: An error occurred
02/08/2020 00:00:00: WebException The operation has timed out
02/08/2020 00:00:00: Timeout
02/08/2020 00:00:00: At the end
I am sure it is just something I have missed, but I know this code was copied from my main project, which is currently crawling through hundreds of pages with the same code, using a proxy with my credentials, and returning data as I write this.
I can ping the IP address of the proxy, but going to it in a browser returns a connection error, despite my big project using the same proxy to ripple through tons of pages and return HTML all night long...
I just wanted to update my main project by adding new methods to pass in custom proxies, or to skip the proxy on the first attempt but use one for a final attempt if it fails, or to use a default proxy:port, etc. Something like the hypothetical sketch below is the flow I am after.
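Just to illustrate the idea (GetWithProxyFallback and GetHtml are made-up names, not my real code, and I have left my timeout and header setup out):

using System;
using System.IO;
using System.Net;

public static class ProxyFallbackExample
{
    // First attempt goes direct with no proxy; if that throws,
    // a final attempt is made through the supplied proxy.
    public static string GetWithProxyFallback(string url, WebProxy fallbackProxy)
    {
        try
        {
            return GetHtml(url, null); // 1st attempt: no proxy
        }
        catch (WebException)
        {
            return GetHtml(url, fallbackProxy); // final attempt: with proxy
        }
    }

    private static string GetHtml(string url, WebProxy proxy)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = proxy; // null means "go direct", bypassing any default proxy

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}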

You set your timeout to 30 milliseconds: client.Timeout = 30;
HttpWebRequest.Timeout is specified in milliseconds, so that could be causing your timeouts.
More info in the HttpWebRequest.Timeout documentation.
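For example, a sketch of the corrected line, keeping the 30-second timeout the question describes:

// Timeout is in milliseconds: 30 means 30 ms, which fails almost instantly.
// For a 30-second timeout, use 30000.
client.Timeout = 30000;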

Not sure if this solves your problem, but:
The documentation for HttpWebRequest states the following:
The local computer or application config file may specify that a default proxy be used. If the Proxy property is specified, then the proxy settings from the Proxy property override the local computer or application config file and the HttpWebRequest instance will use the proxy settings specified. If no proxy is specified in a config file and the Proxy property is unspecified, the HttpWebRequest class uses the proxy settings inherited from Internet Explorer on the local computer. If there are no proxy settings in Internet Explorer, the request is sent directly to the server.
Maybe there is a proxy configured in the IE settings? This does not explain why the request fails using the custom proxy, but it may be worth a shot; you can rule it out as shown below.
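A quick sketch of that check (it only diagnoses the no-proxy case, it is not a fix for the proxied one):

// Setting Proxy to null bypasses any proxy inherited from the config
// file or the Internet Explorer settings, so the request goes straight
// to the server.
client.Proxy = null;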
I also suggest that you test the free proxy using something like Postman. In my experience, these free proxies don't work at least half of the time.
My best guess is that when not using the proxy, the request fails because of the IE settings, and when using the proxy, the proxy itself is simply not working.
Hopefully this solves the problem...

Related

WebClient returning 403 error only for this website?

I am trying to download a file from these links using the C# WebClient, but I am getting a 403 error.
https://www.digikey.com/product-search/download.csv?FV=ffe00035&quantity=0&ColumnSort=0&page=5&pageSize=500
https://www.digikey.com/product-search/download.csv?FV=ffe00035&quantity=0&ColumnSort=0&page=4&pageSize=500
I tried different user agents, accept-encoding values, etc.
I tried replacing https with http in the URLs, but no success.
When I paste these URLs into Chrome, Firefox or IE, I am able to download the file; sometimes it gives a 403 error, then I change https to http and it downloads. But no success in WebClient.
I tried Fiddler to inspect the traffic, no success.
Can someone try this on your system and help solve the problem?
Here is my code:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
WebClient client = new WebClient();
Uri request_url = new Uri("https://www.digikey.com/product-search/download.csv?FV=ffe00035&quantity=0&ColumnSort=0&page=5&pageSize=500");
//tried http also: http://www.digikey.com/product-search/download.csv?FV=ffe00035&quantity=0&ColumnSort=0&page=5&pageSize=500
client.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0");
client.DownloadFile(request_url, @"E:\123.csv");
I know there are many threads related to this topic and I have tried all of them with no success, so please don't mark this as a duplicate; please try it on your system, it is fewer than 10 lines of code.
Note: the same code works for other websites; only this website gives the error.
As I mentioned in my comment, the issue here is that the server expects a cookie (specifically 'i10c.bdddb') to be present and gives a 403 error when it's not. However, the cookie is sent with the 403 response, so you can make an initial junk request that will fail but give you the cookie. After this you can proceed as normal.
Through some trial and error I was able to get the CSV using the code below:
System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;

CookieContainer cookieContainer = new CookieContainer();
Uri baseUri = new Uri("https://www.digikey.com");

using (HttpClientHandler handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (HttpClient client = new HttpClient(handler) { BaseAddress = baseUri })
{
    //The User-Agent is required (what values work would need to be tested)
    client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0");

    //Make our initial junk request that will fail but get the cookie
    HttpResponseMessage getCookiesResponse = await client.GetAsync("/product-search/download.csv");

    //Check if we actually got cookies
    if (cookieContainer.GetCookies(baseUri).Count > 0)
    {
        //Try getting the data
        HttpResponseMessage dataResponse = await client.GetAsync("product-search/download.csv?FV=ffe00035&quantity=0&ColumnSort=0&page=4&pageSize=500");
        if (dataResponse.StatusCode == HttpStatusCode.OK)
        {
            Console.Write(await dataResponse.Content.ReadAsStringAsync());
        }
    }
    else
    {
        throw new Exception("Failed to get cookies!");
    }
}
Notes
Even with the right cookie, if you don't send a User-Agent header the server will return a 403. I'm not sure what the server expects in terms of a user agent; I just copied the value my browser sends.
In the check to see whether cookies have been set, it would be a good idea to verify you actually have the 'i10c.bdddb' cookie instead of just checking that there are any cookies at all.
This is just a quick bit of sample code, so it's not the cleanest. You may want to look into FormUrlEncodedContent to send the page number and other parameters, as sketched below.
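For illustration, FormUrlEncodedContent can be used to URL-encode the query parameters instead of hand-building the string; a sketch reusing the client from the code above, with the parameter values from the question:

// Build the query string from a dictionary; FormUrlEncodedContent
// handles the URL encoding of keys and values.
var parameters = new Dictionary<string, string>
{
    ["FV"] = "ffe00035",
    ["quantity"] = "0",
    ["ColumnSort"] = "0",
    ["page"] = "4",
    ["pageSize"] = "500"
};
string query = await new FormUrlEncodedContent(parameters).ReadAsStringAsync();
HttpResponseMessage dataResponse = await client.GetAsync("product-search/download.csv?" + query);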
I tested with your URL and was able to reproduce your error. Any request I try with the querystring parameter quantity=0 seems to fail with an HTTP 403 error.
I would suggest requesting a quantity greater than zero.
An HTTP 403 status code means forbidden, so there is a problem with your credentials; it doesn't look like you're sending any. If you add them into your headers this should work fine, like this:
client.Headers.Add("Authorization", "token");
or by sending them like this:
client.UseDefaultCredentials = true;
client.Credentials = new NetworkCredential("username", "password");
Most likely the links work through web browsers because you have already authenticated and the browser is sending the credentials/token.
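If the server did expect HTTP Basic authentication specifically, the Authorization header would need the scheme prefix and the Base64 of "username:password"; a sketch, with placeholder credentials:

// HTTP Basic auth: "Basic " + base64("username:password").
// The credentials here are placeholders.
string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes("username:password"));
client.Headers.Add("Authorization", "Basic " + encoded);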
I have this issue with Digi-key too.
The solution for me is to turn off my VPN service.

Why does my WebClient return a 404 error most of the time, but not always?

I want to get information about a Microsoft Update in my program. However, the server returns a 404 error about 80% of the time. I boiled the problematic code down to this console application:
using System;
using System.Net;

namespace WebBug
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                try
                {
                    WebClient client = new WebClient();
                    Console.WriteLine(client.DownloadString("https://support.microsoft.com/api/content/kb/3068708"));
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                Console.ReadKey();
            }
        }
    }
}
When I run the code, I have to get through the loop a few times until I get an actual response:
The remote server returned an error: (404) Not found.
The remote server returned an error: (404) Not found.
The remote server returned an error: (404) Not found.
<div kb-title title="Update for customer experience and diagnostic telemetry [...]
I can open and force-refresh (Ctrl + F5) the link in my browser as often as I want to, and it shows fine.
The problem occurs on two different machines with two different internet connections.
I've also tested this case using the Html Agility Pack, but with the same result.
The problem does not occur with other websites. (The root https://support.microsoft.com works fine 100% of the time.)
Why do I get this weird result?
Cookies. It's because of cookies.
As I started to dig into this problem I noticed that the first time I opened the site in a new browser I got a 404, but after refreshing (sometimes once, sometimes a few times) the site continued to work.
That's when I busted out Chrome's Incognito mode and the developer tools.
There wasn't anything too fishy with the network: there was a simple redirect to the https version if you loaded http.
But what I did notice was that the cookies changed: on the first load of the page there were only a couple of cookie entries, and after a refresh (or a few) several more had been added. The site must be trying to read those, not finding them, and "blocking" you. This might be a bot-prevention device or bad programming, I'm not sure.
Anyways, here's how to make your code work. This example uses the HttpWebRequest/Response, not WebClient.
string url = "https://support.microsoft.com/api/content/kb/3068708";

//this holds all the cookies we need to add
//notice the values match the ones the browser ends up with
CookieContainer cookieJar = new CookieContainer();
cookieJar.Add(new Cookie("SMCsiteDir", "ltr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("SMCsiteLang", "en-US", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smc_f", "upr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smcexpsessionticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcexpticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcflighting", "wwp", "/", ".microsoft.com"));

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

//attach the cookie container
request.CookieContainer = cookieJar;

//and now go to the internet, fetching back the contents
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    string site = sr.ReadToEnd();
}
If you remove the request.CookieContainer = cookieJar;, it will fail with a 404, which reproduces your issue.
Most of the legwork for the code example came from this post and this post.

WebRequest.GetResponse() returns 404 on valid URl

I'm trying to scrape a web page via a C# application, but it keeps responding with
"The remote server returned an error: (404) Not Found."
The web page is accessible through a browser, but the app keeps failing. Any help appreciated.
var d = DateTime.UtcNow.Date;
var AddressString = @"http://www.booking.com/searchresults.html?src=searchresults&si=ai%2Cco%2Cci%2Cre%2Cdi&ss={0}&checkin_monthday={1}&checkin_year_month={2}&checkout_monthday={3}&checkout_year_month={4}";
var URi = String.Format(AddressString, "Prague", d.Day, d.Year + "-" + d.Month, d.Day + 1, d.Year + "-" + d.Month);

var request = (HttpWebRequest)WebRequest.Create(URi);
request.Timeout = 5000;
request.UserAgent = "Fiddler"; //I tried to set the next three rows so they are not null
request.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = WebProxy.GetDefaultProxy();

try
{
    var response = (HttpWebResponse)request.GetResponse();
}
catch (WebException e)
{
    var response = (HttpWebResponse)e.Response; //e.Response contains the web page, but it is incomplete
    StreamReader sr = new StreamReader(response.GetResponseStream());
    HtmlDocument doc = new HtmlDocument();
    doc.Load(sr);
    var a = doc.DocumentNode.SelectNodes("div[@class='resut-details']"); //fails, as not all desired nodes are in the response
}
EDIT:
Hi guys, thanks for the suggestions.
I added the header "Accept-Encoding: gzip,deflate,sdch" according to David Martin's reply, but it didn't help on its own.
I used Fiddler to try to get some info about the problem, but I was seeing that app for the first time and it didn't make me any smarter. On the other hand, I tried changing request.UserAgent to the one sent by my browser ("User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36") and voila, I am not getting the 404 exception any more, but the document is not readable, as it is filled with chars like: ¿½O~���G�. I tried setting request.TransferEncoding = "UTF-8", but to enable this property request.SendChunked must be set to true, which ends in
ProtocolViolationException
Additional information: Content-Length or Chunked Encoding cannot be set for an operation that does not write data.
EDIT 2:
I was forgetting something and couldn't figure out what: I was getting a compressed response and needed to decode it first to read it correctly. Even in Fiddler, when I want to see the response, I need to confirm decoding to inspect the result. After decoding it in Fiddler, I get just what I want to get into my application...
So, after trying the suggestions from Jon Skeet and David Martin, I got somewhat further and found the relevant answer in another topic. If anyone is ever looking for something similar, the answer is here:
.NET: Is it possible to get HttpWebRequest to automatically decompress gzip'd responses?
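The fix from that answer boils down to letting HttpWebRequest decompress the response itself; applied to the request above it is a one-liner:

// Advertise gzip/deflate support and transparently decompress the
// response body instead of returning the raw compressed bytes.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;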

400 bad request invalid hostname only in android application

I have an ASP.NET MVC website deployed on a server, providing a few web interfaces to others, for example getting the current user's information. My test C# console application looks like this:
using (var client = new WebClient())
{
    try
    {
        var url = "http://api.fake.mysite.com/v1.0/user/current";
        var token = "e0034e1c082de62b74e361b15f9c6471";
        var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(token));
        client.Headers["Authorization"] = encoded;
        client.Headers["Content-Type"] = "application/json";
        Console.WriteLine(client.DownloadString(url));
    }
    catch (WebException e)
    {
        //log the exception
    }
}
You can see the usage is pretty simple: just request the URL via HTTP GET and set the Authorization header to the encoded token. It works fine on my machine, but someone else meets a strange issue when visiting this URL from an Android application. Here is the Java code:
HttpClient httpClient = new DefaultHttpClient();
String token = "e0034e1c082de62b74e361b15f9c6471";
String url = "http://api.fake.mysite.com/v1.0/user/current";
HttpGet httpGet = new HttpGet(url);
String encoded = Base64.encodeToString(token.getBytes(), Base64.DEFAULT);
httpGet.addHeader("Authorization", encoded);
httpGet.addHeader("Content-Type", "application/json");
try {
    HttpResponse httpResponse = httpClient.execute(httpGet);
    int responseCode = httpResponse.getStatusLine().getStatusCode();
    String response = EntityUtils.toString(httpResponse.getEntity());
} catch (ClientProtocolException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Then he gets a "400 bad request invalid host name" error. I've tried the following:
(1) made sure the variable "encoded" has the same value in the C# and Java code.
(2) made sure the website's domain name is correctly set in the server's IIS.
(3) all PCs/mobile phones can visit the test index page (http://api.fake.mysite.com).
(4) ping api.fake.mysite.com works fine.
(5) if httpGet.addHeader("Authorization", encoded); is removed, the Java program gets a 401 Unauthorized result as expected (the server code, which is under my control, returns that result).
(6) some other applications using C# and PHP can use the web methods fine; only the Android application can't (tested on two totally different Android phones; the Android emulator gets 400 invalid host name as well).
(7) using the IP instead of the domain name (http://xx.xx.xx.xx/v1.0/user/current) behaves exactly the same (xx.xx.xx.xx stands for the IP address).
(8) checked the IIS log: all requests to /v1.0/user/current return 200/401/500, no 400 results.
(9) made sure the Android application has internet permissions (actually we've added all permissions).
Does anyone know the reason or help to find the reason? Thank you very much, this issue is driving me crazy.
It should be httpGet.addHeader("Authorization", "basic " + encoded); and String encoded = Base64.encodeToString(token.getBytes(), Base64.NO_WRAP);. With Base64.DEFAULT the encoded value ends with a line break, which corrupts the header; NO_WRAP omits line terminators, and the Authorization value also needs the scheme prefix.
I struggled with the very same problem: I could send an HTTP POST from Fiddler or any other tool to my ASP.NET Web API in debug mode, but I could not access it from my Android application.
I made sure I could connect from my computer's browser to the Web API interface.
I made sure I could connect from the Android emulator's web browser (AEWB). I then deployed my Web API to IIS so I had a definite address to access it from the AEWB.
I was able to access this address from my AEWB:
http://10.0.2.2:8088/api/tran
http://10.0.2.2 -> this is your localhost address as seen from the Android emulator
8088 -> this is the port of the Web API hosted on IIS
/api -> this is the Web API
/tran -> this is your controller

web service - 400 bad request

I want to consume a PHP web service from C#; it is protected by htaccess.
I added the service by adding a web reference in my VS 2010.
mywsfromwsdl myws = new mywsfromwsdl();

System.Net.CredentialCache myCredentials = new System.Net.CredentialCache();
NetworkCredential netCred = new NetworkCredential("user", "pass");
myCredentials.Add(new Uri(myws.Url), "Basic", netCred);
myws.Credentials = myCredentials;
myws.PreAuthenticate = true;

tbxIN.Text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> " +
             " <test> " +
             "   <parm1>4</parm1> " +
             "   <parm1>2</parm1> " +
             " </test>";

tbxOUT.Text = myws.func1(tbxIN.Text.ToString());
VS shows a 400 Bad Request error on the last row.
If I delete the .htaccess file on the server, the program works fine, but I can't delete it because other PHP users use the service.
Can anybody tell me how to send the credentials correctly?
Sometimes C# and Apache clash a bit: in this case, it might be that your client is expecting a 100 Continue response because authentication is active, but the server doesn't send one.
This kind of behavior is toggled by this line:
ServicePointManager.Expect100Continue = false;
Add it before executing the request.
It's also worth pointing out that when a 400 Bad Request happens, you might find some useful details in the server's logs.
According to MSDN you shouldn't need the credential cache:
// Set the client-side credentials using the Credentials property.
ICredentials credentials = new NetworkCredential("Joe", SecurelyStoredPassword, "mydomain");
math.Credentials = credentials;
Have you tried this method instead of the cache object? More info can be found in the MSDN documentation.
