HTTPS request/response in C#

I'm trying to view a captcha from a site, but I'm getting something wrong, because the answer is rejected when I submit it, even though I get the session and everything. So I decided to replay the traffic request by request, exactly the way it appears in Fiddler. But the site is HTTPS, and I can't find a tutorial or explanation or anything about HTTPS requests in C#. For example, the first request is:
CONNECT passport.abv.bg:443 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0
Connection: keep-alive
Host: passport.abv.bg
So I tried to do it like this:
HttpWebRequest req0 = (HttpWebRequest)WebRequest.Create("https://passport.abv.bg:443/");
req0.Method = "CONNECT";
req0.KeepAlive = true;
req0.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0";
req0.Host = "passport.abv.bg";
HttpWebResponse resp0 = (HttpWebResponse)req0.GetResponse();
StreamReader Reader0 = new StreamReader(resp0.GetResponseStream());
string thePage0 = Reader0.ReadToEnd();
Reader0.Close();
and of course it won't work: I can't even see the result, as it's not a string, and the application freezes.
Can you give me some info, please? I really can't find any explanation of how to use HTTPS requests in C#.
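A minimal sketch of the usual approach, assuming the goal is just to fetch the page over HTTPS and keep the session: the CONNECT entry in a Fiddler capture is the tunnel Fiddler itself sets up as a proxy, so it is not replayed by hand; HttpWebRequest negotiates the TLS connection on its own when given an https:// URL, and a shared CookieContainer carries the session cookies between requests.
using System.IO;
using System.Net;

// Sketch: fetch the page over HTTPS directly; no CONNECT request is issued by hand.
CookieContainer jar = new CookieContainer();   // holds the session cookies across calls

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://passport.abv.bg/");
req.Method = "GET";
req.KeepAlive = true;
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0";
req.CookieContainer = jar;                     // reuse the same container for every later request

using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
{
    string page = reader.ReadToEnd();          // the HTML; the captcha image itself needs its own GET
}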

Related

Collecting cookies that are not set by HttpWebResponse

I need to scrape a table of info from a site for which I have valid credentials because the owners of the site do not provide an API.
I performed a login and saved the traffic with Fiddler, and am trying to replicate the key steps.
I'm going to show the steps I've done so far, and get to where I am stuck.
Log into the base url
CookieContainer jar = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlBase);
request.CookieContainer = jar;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string newUrl = response.ResponseUri.ToString();
Along with the return, a cookie is set. When I look at the CookieContainer, it has a count of 1 after the call.
Interestingly, the response object does not contain the cookie, but I think all is okay because I can use jar.
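As a small aside, a sketch of how the stored cookie can be inspected: the cookies live in the CookieContainer rather than on the response object, so GetCookies on the jar shows what will be sent for a given URL.
// Sketch: list whatever the jar currently holds for the base URL.
CookieCollection stored = jar.GetCookies(new Uri(urlBase));
foreach (Cookie c in stored)
    Console.WriteLine(c.Name + "=" + c.Value);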
2nd call
I'm not yet at the page where the name and password are presented, that doesn't happen until the 4th call.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlBase +
secondCallFolderAddition);
CookieCollection bakery = new CookieCollection();
request.KeepAlive = true;
request.Headers.Add("Upgrade-Insecure-Requests", #"1");
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.57";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp, image/apng,*/*;q=0.8";
request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip, deflate, br");
request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string newURL = response.ResponseUri.ToString();
I get an OK status, and the response looks good compared to the original Fiddler traffic capture. In the original this 2nd call does not set a cookie, and no cookie is set here.
Third call
But here's where I get lost: the browser sent cookie data with three values (I've obfuscated):
__utma=1.123456789.123456789.123456789.123456789.1
olfsk=olfsk12345678901234567890123456789
hblid=abCDl11ABCabXabc1aABv1FLFX1RE1OS
I don't know where those values get set. They seem to relate to Google Analytics (from articles I've found) but I don't know how to collect them so that I can attach them to the call I make.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(newUrl);
request.KeepAlive = true;
request.Headers.Add("Upgrade-Insecure-Requests", "1");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.57";
request.Accept = "text/html,application/xhtml+xml,application/xml;
q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip, deflate, br");
request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
//request.Headers.Set(HttpRequestHeader.Cookie,
//    @"__utma=1.123456789.123456789.123456789.123456789.1; olfsk=olfsk12345678901234567890123456789; hblid=abCDl11ABCabXabc1aABv1FLFX1RE1OS");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string newURL = response.ResponseUri.ToString();
Please note the commented-out lines with the cookie data; I've tried this with those lines uncommented as well.
What happens is that I never get a response to the call.
I am very appreciative of any insights.
I am guessing that the cookie data in the third call is needed, and that it is set by a client-side script that runs between the 2nd and 3rd call, but I am new to this and unsure.
Also, if it is set on the client side, how can I get valid cookies that will get me past this roadblock? (There is another roadblock coming in the next call, where more cookies are used that I do not see set in a server response, but I am not there yet.)
I know I can solve this by using a WebBrowser object, but that seems like a clumsy solution. Is there a less clumsy way to go? Are there other objects or libraries I should try? (RestSharp? Postman? A WebRequest object instead of HttpWebRequest?)
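A hedged sketch of one way forward, for what it's worth: cookies such as __utma are created by JavaScript in the browser, so they never arrive in a Set-Cookie header; if the server actually checks for them, they can be injected into the CookieContainer by hand. Note also that the 2nd call above does not appear to attach the jar, so the session cookie from the first call is not being resent. The values below are the obfuscated ones from the capture, used purely as placeholders.
// Hypothetical sketch: one shared CookieContainer for the whole session,
// with the script-set cookies added manually in case the server requires them.
Uri site = new Uri(newUrl);
jar.Add(site, new Cookie("__utma", "1.123456789.123456789.123456789.123456789.1"));
jar.Add(site, new Cookie("olfsk", "olfsk12345678901234567890123456789"));
jar.Add(site, new Cookie("hblid", "abCDl11ABCabXabc1aABv1FLFX1RE1OS"));

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(newUrl);
request.CookieContainer = jar;                 // same jar as the earlier calls
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 OPR/46.0.2597.57";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();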

XML to URL issues: byte size differs between Chrome extension and project

Hey guys, I am trying to send an XML document to a URL and I am getting back a 500 error. I am not sure why, since this is an external service I am trying to use.
I used the Chrome extension DHC - REST/HTTP API Client to test the URL and the XML, and it gets a 200 response! When I compare the two requests, mine from my .NET project and the one from the Chrome extension, these are the differences.
Google Chrome Extension Feed: 200 response
POST https://togatest.efiletexas.gov/EPayments/Webs/EPayment.aspx HTTP/1.1
Host: togatest.efiletexas.gov
Connection: keep-alive
Content-Length: 328
Origin: chrome-extension://aejoelaoggembcahagimdiliamlcdmfm
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36
Content-Type: text/plain
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: es,en-US;q=0.8,en;q=0.6
Cookie: _ga=GA1.2.1578362496.1438095893; _mkto_trk=id:659-PBW-104&token:_mch-efiletexas.gov-1436559093618-37466; ASP.NET_SessionId=ug3njkuiz23oq255ouwsax3w
<PaymentRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<ClientKey>CJOFSTEXFILE123</ClientKey>
<TransactionID>TESTESTTEST123</TransactionID>
<RedirectURL>https://localhost:44300/efile</RedirectURL>
<Amount>-1</Amount>
<GetToken>1</GetToken>
</PaymentRequest>
My Project Feed: 500 Response
POST https://togatest.efiletexas.gov/EPayments/Webs/EPayment.aspx HTTP/1.1
Content-Type: text/plain
Accept: */*
Host: togatest.efiletexas.gov
Content-Length: 334
Connection: Keep-Alive
<PaymentRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<ClientKey>CJOFSTEXFILE123</ClientKey>
<TransactionID>TESTESTTEST123</TransactionID>
<RedirectURL>https://localhost:44300/efile</RedirectURL>
<Amount>-1</Amount>
<GetToken>1</GetToken>
</PaymentRequest>
For some reason I can't get the page to load; I am getting a 500 error. I believe this is because the content length is different, and I don't know why. The XML documents are exactly the same, but perhaps the Chrome extension is doing something different when opening the connection. Any help and explanation would be awesome. If someone can tell me how to manually change the Content-Length, that would be really cool as well.
UPDATE:
Here is the code I am using to serialize my C# object and open the connection to the URL:
PaymentRequest tokenPaymentRequest = new PaymentRequest();
//Here we will set the values of our object
tokenPaymentRequest.ClientKey = "CJOFSTEXFILE123";
tokenPaymentRequest.TransactionID = "TESTESTTEST123";
tokenPaymentRequest.RedirectURL = "https://localhost:44300/efile";
tokenPaymentRequest.Amount = "-1";
tokenPaymentRequest.GetToken = "1";
var doc = new XDocument();
var xmlSerializer = new XmlSerializer(typeof(PaymentRequest));
using (var writer = doc.CreateWriter())
{
xmlSerializer.Serialize(writer, tokenPaymentRequest);
}
string xml = doc.ToString();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(destinationUrl);
//string s = "id="+Server.UrlEncode(xml);
byte[] requestBytes = System.Text.Encoding.ASCII.GetBytes(xml);
req.Method = "POST";
req.ContentType = "text/plain";
req.Accept = "*/*";
req.ContentLength = requestBytes.Length;
Stream requestStream = req.GetRequestStream();
requestStream.Write(requestBytes, 0, requestBytes.Length);
requestStream.Close();
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream(), System.Text.Encoding.Default);
string backstr = sr.ReadToEnd();
sr.Close();
res.Close();
return string.Empty;
Hey guys, after much tampering with the code I figured out why the external service was not accepting my request.
Basically it needed this added to the headers:
req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36";
With this header in place the request was accepted and the much-desired 200 response came back. I was confused at first, since I thought the UserAgent property was restricted and could not be changed manually; after much research I realized it can be set if you declare your HttpWebRequest properly.
Here is a good answer for people having similar issues with sending XML's to a URL
Cannot set some HTTP headers when using System.Net.WebRequest
This answer was also very helpful when trying to understand why and how you are supposed to declare an HttpWebRequest object
How to modify request headers in c#,ASP .NET
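For later readers, a rough equivalent of the corrected request using HttpClient instead of HttpWebRequest; this is not what the answer above used, the endpoint and User-Agent are simply taken from the capture, and the snippet assumes it runs inside an async method with the xml string produced by the serializer in the update.
// Needs: using System.Net.Http; using System.Text;
var client = new HttpClient();
client.DefaultRequestHeaders.UserAgent.ParseAdd(
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36");

var content = new StringContent(xml, Encoding.ASCII, "text/plain");   // xml built by the XmlSerializer above
HttpResponseMessage response = await client.PostAsync(
    "https://togatest.efiletexas.gov/EPayments/Webs/EPayment.aspx", content);
string body = await response.Content.ReadAsStringAsync();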

Error 500 Web Request Can't Scrape WebSite

I have no problem accessing the website with a browser, but when I programmatically try to access the website for scraping, I get the following error.
The remote server returned an error: (500) Internal Server Error.
Here is the code I'm using.
using System.IO;
using System.Net;
string strURL1 = "http://www.covers.com/index.aspx";
WebRequest req = WebRequest.Create(strURL1);
// Get the stream from the returned web response
StreamReader stream = new StreamReader(req.GetResponse().GetResponseStream());
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string strLine;
// Read the stream a line at a time and append each non-empty line to the StringBuilder
while ((strLine = stream.ReadLine()) != null)
{
if (strLine.Length > 0)
sb.Append(strLine + Environment.NewLine);
}
stream.Close();
This one has me stumped. TIA
It's the user agent.
Many sites like the one you're attempting to scrape will validate the user-agent string in an attempt to stop you from scraping them. As it has with you, this quickly stops junior programmers from attempting the scrape. It's not really a very solid way of stopping a scrape, but it stumps some people.
Setting the User-Agent string will work. Change the code to:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"; // Chrome user agent string
...and it will be fine.
It looks like it's doing some sort of user-agent checking. I was able to replicate your problem in PowerShell, but I noticed that the PowerShell cmdlet Invoke-WebRequest was working fine.
So I hooked up Fiddler, reran it, and stole the user-agent string out of Fiddler.
Try to set the UserAgent property to:
User-Agent: Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0
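Applied to the request from the question, that would look something like the sketch below; note that the UserAgent property takes only the value, without the "User-Agent: " prefix.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
req.UserAgent = "Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0";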

DownloadString returns a 404 Error: Site needs a User-Agent Header

I have a C# program which worked fine until a day or two ago. I use the following snippet to grab a page:
string strSiteListPath = @"http://www.ngs.noaa.gov/CORS/dates_sites.txt";
Uri uriSiteListPath = new Uri(strSiteListPath);
System.Net.WebClient oWebClient = new System.Net.WebClient();
strStationList = oWebClient.DownloadString(uriSiteListPath);
But it consistently returns a 404 Not Found error. The page definitely exists; you are welcome to try it yourself. Because it worked days ago and nothing in my code changed, I am inclined to think the web server changed in some way. That's fine, it happens, but what exactly has happened here?
Why can I browse to the file manually, but DownloadString fails to get it?
EDIT:
For completeness, the code now looks like:
string strSiteListPath = @"http://www.ngs.noaa.gov/CORS/dates_sites.txt";
Uri uriSiteListPath = new Uri(strSiteListPath);
System.Net.WebClient oWebClient = new System.Net.WebClient();
oWebClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0");
strStationList = oWebClient.DownloadString(uriSiteListPath);
Thanks again, Thomas Levesque!
Apparently the site requires that you have a valid User-Agent header. If you set that header to something like this:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0
Then the request works fine.
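For completeness, a sketch of the same fix with HttpClient, where the User-Agent is set once on DefaultRequestHeaders and sent with every request the client makes; it assumes an async context.
// Needs: using System.Net.Http;
var client = new HttpClient();
client.DefaultRequestHeaders.UserAgent.ParseAdd(
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0");
string strStationList = await client.GetStringAsync("http://www.ngs.noaa.gov/CORS/dates_sites.txt");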

Which HTTP headers must be sent when calling WebRequest or WebClient?

I am creating a web robot. The HTTP tools usually return quite a lot of header information, and some of these headers are read-only in .NET (e.g. Connection: keep-alive). How do I know which ones are required?
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset: ISO-8859-9,utf-8;q=0.7,*;q=0.3
Accept-Encoding: gzip,deflate,sdch
Accept-Language: tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control: max-age=0
Content-Length: 269
Content-Type: application/x-www-form-urlencoded
Host: closure-compiler.appspot.com
Origin: null
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1
Usually the code looks like the following. Someone pointed out that the following code had missed setting Content-Type?
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://closure-compiler.appspot.com/compile");
req.KeepAlive = true; // "Connection: keep-alive"; the Connection property cannot be assigned this value directly
req.Headers.Add("Cache-Control", "max-age=0");
req.Headers.Add("Origin","null");
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1";
req.ContentType = "application/x-www-form-urlencoded";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
req.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
req.Headers.Add("Accept-Language", "tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4");
req.Headers.Add("Accept-Charset", " ISO-8859-9,utf-8;q=0.7,*;q=0.3");
req.Method = "POST";
Stream reqStr = req.GetRequestStream();
No headers are required for general requests. Particular resources may require specific headers. The right way is to ask the owner of the resource what headers are needed. But if you want to cheat in some sort of game/forum, you will have to figure out the headers and other values yourself.
According to w3.org, the simplest request should look somewhat similar to this:
GET <uri> CrLf
That's all.
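To put that in C# terms, a small sketch: only the method, the Content-Type and the body are set by hand, and HttpWebRequest adds the request line, Host and Content-Length itself. The form field below is a placeholder, not one of the service's documented parameters.
// Needs: using System.IO; using System.Net; using System.Text;
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://closure-compiler.appspot.com/compile");
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";

byte[] body = Encoding.UTF8.GetBytes("some_field=some+value");   // placeholder form data
using (Stream reqStr = req.GetRequestStream())
    reqStr.Write(body, 0, body.Length);

using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
    Console.WriteLine(reader.ReadToEnd());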
