Getting a different page from HttpWebRequest in C#

I am making an HttpWebRequest to retrieve web data from americanapparel.net using this code:
var request = (HttpWebRequest)WebRequest.Create("http://store.americanapparel.net/en/sports-bra_rsaak301?c=White");
request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36";
using (var response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    var data = reader.ReadToEnd();
    return data;
}
I am receiving data from this URL:
http://store.americanapparel.net/en/sports-bra_rsaak301?c=White
But the live page is different from the data my HttpWebRequest receives.
How can I get the exact page data in C#?
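A mismatch between what a browser shows and what HttpWebRequest returns usually comes from missing cookies, compressed response bodies, redirects, or JavaScript that rewrites the page after load (which no raw HTTP client can reproduce). A minimal sketch of a more browser-like request; the helper name is mine, and this only helps when the difference is server-side:

```csharp
using System;
using System.Net;

class BrowserLikeRequestDemo
{
    // Configure the request to behave more like a browser: keep cookies
    // across redirects and transparently decompress gzip/deflate bodies.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.108 Safari/537.36";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.CookieContainer = new CookieContainer();   // without this, Set-Cookie headers are ignored
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.AllowAutoRedirect = true;
        return request;
    }

    static void Main()
    {
        var request = Create("http://store.americanapparel.net/en/sports-bra_rsaak301?c=White");
        Console.WriteLine(request.AutomaticDecompression); // GZip, Deflate
    }
}
```

If the page still differs after this, compare the raw responses in a proxy such as Fiddler; a remaining difference is almost certainly client-side JavaScript.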

Related

WPF c# WebCrawler

I'm trying to develop a web crawler to extract some information from my company's web site, but I'm getting the error below:
An exception of type System.Net.WebException occurred in System.dll but was not handled in user code
Additional information: The remote server returned an error: (500) Internal Server Error.
Here are the request headers sent to the website:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:WASReqURL=https://:9446/ProcessPortal/jsp/index.jsp; com_ibm_bpm_process_portal_hash=null
Host:ca8webp.itau:9446
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17
Here is my code:
public bool Acessar(CredencialWAO credencial)
{
    bool logado = true;
    #region INITIAL PAGE REQUEST
    ParametrosCrawler parametrosRequest = new ParametrosCrawler("https://ca8webp.itau:9446/ProcessPortal/login.jsp");
    parametrosRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    parametrosRequest.AcceptEncoding = "gzip, deflate, sdch";
    parametrosRequest.AcceptLanguage = "en-US,en;q=0.8";
    parametrosRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17";
    parametrosRequest.CacheControl = "max-age=0";
    #endregion
    return logado;
}
protected override WebResponse GetWebResponse(WebRequest request)
{
    var response = base.GetWebResponse(request);
    AtualizarCookies(((HttpWebResponse)response).Cookies);
    return response;
}
The error occurs when calling "base.GetWebResponse(request)"
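A 500 response usually carries a descriptive error page from the server, and `GetResponse()` throws before you ever see it. Reading the body out of the `WebException` often pinpoints the real cause. A self-contained sketch; the local `HttpListener` below is just a stand-in for the failing server:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

class ErrorBodyDemo
{
    // Read the response body that accompanies a non-2xx status;
    // returns null when the exception carries no HTTP response at all.
    public static string ReadErrorBody(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null)
            return null;
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        // Local stand-in for the failing server, so the sketch is runnable.
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:18231/");
        listener.Start();
        Task.Run(() =>
        {
            HttpListenerContext context = listener.GetContext();
            context.Response.StatusCode = 500;
            byte[] body = Encoding.UTF8.GetBytes("detailed server-side error message");
            context.Response.OutputStream.Write(body, 0, body.Length);
            context.Response.Close();
        });

        var request = (HttpWebRequest)WebRequest.Create("http://localhost:18231/");
        try
        {
            request.GetResponse();
        }
        catch (WebException ex)
        {
            Console.WriteLine(ReadErrorBody(ex)); // the 500 page body
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

With the real portal, the body returned for the 500 will usually name the missing header, cookie, or session state that the server expected.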

LINQ to XML User-Agent header value

How can I specify HTTP User-Agent header for LINQ to XML to use for its requests when I call XElement.Load(url)?
I use it for calls to a Web API, and it is required that my client identify itself properly in the User-Agent header.
You could use WebClient to specify the user agent:
using (var webClient = new WebClient())
{
    webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    using (var stream = webClient.OpenRead("http://server.com"))
    {
        var element = XElement.Load(stream);
    }
}
or
using (var webClient = new WebClient())
{
    webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    var element = XElement.Parse(webClient.DownloadString(url));
}

UPS: AWB tracking via web crawling

Problem
Here at work, people spend a lot of time tracking AWBs (air waybills) from different sources (UPS, FedEx, DHL, ...). I was asked to improve the process in order to save valuable time. I was thinking of accomplishing this using Excel as the platform with Excel-DNA and C#, but I have been running some tests (crawling UPS) with no success.
Tests
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://wwwapps.ups.com/WebTracking/track?HTMLVersion=5.0&loc=es_MX&Requester=UPSHome&WBPM_lid=homepage%2Fct1.html_pnl_trk&trackNums=5007052424&track.x=Rastrear");
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Language: es-ES,es;q=0.8");
// Let the framework request and transparently decompress gzip/deflate,
// instead of adding Accept-Encoding by hand and getting compressed bytes back.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.KeepAlive = false;
request.Referer = @"http://www.ups.com/";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
Or...
using (var client = new WebClient())
{
    var values = new NameValueCollection();
    values.Add("HTMLVersion", "5.0");
    values.Add("loc", "es_MX");
    values.Add("Requester", "UPSHome");
    values.Add("WBPM_lid", "homepage/ct1.html_pnl_trk");
    values.Add("trackNums", "5007052424");
    values.Add("track.x", "Rastrear");
    client.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    client.Headers[HttpRequestHeader.AcceptLanguage] = "es-ES,es;q=0.8";
    client.Headers[HttpRequestHeader.Referer] = @"http://www.ups.com/";
    client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    string url = @"https://wwwapps.ups.com/WebTracking/track?";
    // Note: UploadValues sends a POST with the values as the form body,
    // unlike the GET with a query string in the first example.
    byte[] result = client.UploadValues(url, values);
    System.IO.File.WriteAllText(@"C:\UPSText.txt", Encoding.UTF8.GetString(result));
}
But none of the above examples worked as expected.
Question
Is it possible to web-crawl UPS in order to keep track of AWBs?
Note
Currently, I have no access to the UPS API.
I just finished writing my script for it. The trick is that there is another URL where you can include the tracking number directly in the query string and land straight on the results page. You will then have to parse the tables by string offsets, since treating the markup as XML won't work; just offset off of a known header.
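The answer doesn't give the alternate URL, so it stays unspecified here. The "offset off of a header" idea itself can be sketched like this: find a known label in the raw HTML and take the text of the table cell that follows it. The class and method names, and the sample markup, are mine:

```csharp
using System;

class OffsetParseDemo
{
    // "Offset off of a header": locate a known label in the raw HTML and
    // return the text of the next <td> cell. Crude, but it works on
    // markup that is not well-formed XML.
    public static string ValueAfterHeader(string html, string header)
    {
        int at = html.IndexOf(header, StringComparison.OrdinalIgnoreCase);
        if (at < 0) return null;
        int cellStart = html.IndexOf("<td>", at, StringComparison.OrdinalIgnoreCase);
        if (cellStart < 0) return null;
        cellStart += "<td>".Length;
        int cellEnd = html.IndexOf("</td>", cellStart, StringComparison.OrdinalIgnoreCase);
        if (cellEnd < 0) return null;
        return html.Substring(cellStart, cellEnd - cellStart).Trim();
    }

    static void Main()
    {
        string html = "<table><tr><th>Status</th><td> Delivered </td></tr></table>";
        Console.WriteLine(ValueAfterHeader(html, "Status")); // Delivered
    }
}
```

For anything less regular than this, an HTML parser that tolerates broken markup is a safer bet than string offsets.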

How to read the HTML source of a page that requires NTLM authentication

I need to get the HTML source of the web page.
This web page is a part of the web site that requires NTLM authentication.
This authentication is silent because Internet Explorer can use Windows log-in credentials.
Is it possible to reuse this silent authentication (i.e. reuse Windows log-in credentials), without making the user enter his/her credentials manually?
The options I have tried are below.
string url = @"http://myWebSite";
// Works fine:
System.Diagnostics.Process.Start("IExplore.exe", url);

InternetExplorer ie = new SHDocVw.InternetExplorer();
ie.Navigate(url);
// Works up to here, but I do not know how to read the HTML source with SHDocVw.
NHtmlUnit.WebClient webClient = new NHtmlUnit.WebClient(BrowserVersion.INTERNET_EXPLORER_8);
HtmlPage htmlPage = webClient.GetHtmlPage(url);
string html = htmlPage.WebResponse.ContentAsString; // Error 401
System.Net.WebClient client = new System.Net.WebClient();
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;
// DefaultNetworkCredentials and DefaultCredentials are empty
client.Headers.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)");
string reply = client.DownloadString(url); // Error 401
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
IWebProxy proxy = request.Proxy;
if (proxy != null)
{
    // Use the default credentials of the logged-on user.
    proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
    // DefaultNetworkCredentials are empty
}
request.UserAgent = "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)";
request.Accept = "*/*";
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream stream = response.GetResponseStream(); // Error 401
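One note on the "DefaultNetworkCredentials are empty" observation: that object deliberately exposes empty `UserName`/`Password` properties; the real Windows secret is never readable from managed code, so "empty" does not mean the credentials are missing. The usual way to get the silent IE-style NTLM handshake from HttpWebRequest is `UseDefaultCredentials`. A sketch, with a helper name of my choosing and the question's placeholder URL:

```csharp
using System;
using System.Net;

class NtlmRequestDemo
{
    // Configure a request to send the current Windows log-in silently,
    // the way Internet Explorer does for intranet NTLM sites.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UseDefaultCredentials = true;   // current Windows identity
        request.PreAuthenticate = true;         // re-send credentials on every request
        // If traffic goes through a proxy that also demands NTLM:
        if (request.Proxy != null)
            request.Proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
        return request;
    }

    static void Main()
    {
        var request = Create("http://myWebSite");
        Console.WriteLine(request.UseDefaultCredentials); // True
    }
}
```

This only authenticates silently when the machine is on the domain and the site is in a zone allowed to receive integrated authentication; otherwise the server will still answer 401.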

Can't download utf-8 web content

I have simple code for getting a response from a Vietnamese website, http://vnexpress.net , but there is a small problem. The first time, it downloads OK, but after that the content contains unknown symbols like this: �\b\0\0\0\0\0\0�\a`I�%&/m... What is the problem?
string address = "http://vnexpress.net";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
webClient.Encoding = System.Text.Encoding.UTF8;
return webClient.DownloadString(address);
You'll find that the response is GZipped. There doesn't appear to be a way to download that with WebClient, unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.
Here's how you'd do that:
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var req = base.GetWebRequest(address) as HttpWebRequest;
        req.AutomaticDecompression = DecompressionMethods.GZip;
        return req;
    }
}
And to use it:
string address = "http://vnexpress.net";
MyWebClient webClient = new MyWebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
webClient.Encoding = System.Text.Encoding.UTF8;
return webClient.DownloadString(address);
Try this code and you'll be fine:
string address = "http://vnexpress.net";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
return Encoding.UTF8.GetString(Encoding.Default.GetBytes(webClient.DownloadString(address)));
DownloadString requires that the server correctly indicate the charset in the Content-Type response header. If you watch in Fiddler, you'll see that this server instead sends the charset inside a <meta> tag in the HTML response body:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If you need to handle responses like this, you need to either parse the HTML yourself or use a library like FiddlerCore to do this for you.
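Parsing it yourself can be as simple as downloading the raw bytes (e.g. with `WebClient.DownloadData`), sniffing the charset out of the `<meta>` tag, and decoding with the matching `Encoding`. A sketch; the class name, regex, and 2 KB probe window are my own choices:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

class CharsetSniffDemo
{
    // Decode raw response bytes by sniffing the charset from the HTML
    // <meta> tag when the Content-Type header doesn't declare one.
    public static string DecodeHtml(byte[] raw)
    {
        // Decode only a short prefix as ASCII to find the meta tag;
        // the tag itself is plain ASCII in practice.
        string probe = Encoding.ASCII.GetString(raw, 0, Math.Min(raw.Length, 2048));
        Match match = Regex.Match(probe, @"charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

        Encoding encoding = Encoding.UTF8; // sensible default
        if (match.Success)
        {
            try { encoding = Encoding.GetEncoding(match.Groups[1].Value); }
            catch (ArgumentException) { /* unknown charset name: keep UTF-8 */ }
        }
        return encoding.GetString(raw);
    }

    static void Main()
    {
        byte[] page = Encoding.UTF8.GetBytes(
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" /><p>Tiếng Việt</p>");
        Console.WriteLine(DecodeHtml(page).Contains("Tiếng Việt")); // True
    }
}
```

This handles both the `http-equiv` form above and the shorter HTML5 `<meta charset="...">` form, since the regex only looks for the `charset=` attribute.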
