Problem
Here at work, people spend a lot of time tracking AWBs (Air Waybills) from different carriers (UPS, FedEx, DHL, ...). I was asked to improve the process in order to save valuable time. I was planning to accomplish this using Excel as the platform with Excel-DNA and C#, but my tests so far (crawling UPS) have had no success.
Tests
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://wwwapps.ups.com/WebTracking/track?HTMLVersion=5.0&loc=es_MX&Requester=UPSHome&WBPM_lid=homepage%2Fct1.html_pnl_trk&trackNums=5007052424&track.x=Rastrear");
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Language: es-ES,es;q=0.8");
request.Headers.Add("Accept-Encoding: gzip,deflate,sdch");
request.KeepAlive = false;
request.Referer = @"http://www.ups.com/";
request.ContentType = "text/html; charset=utf-8";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
Or...
using (var client = new WebClient())
{
var values = new NameValueCollection();
values.Add("HTMLVersion", "5.0");
values.Add("loc", "es_MX");
values.Add("Requester", "UPSHome");
values.Add("WBPM_lid", "homepage/ct1.html_pnl_trk");
values.Add("trackNums", "5007052424");
values.Add("track.x", "Rastrear");
client.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
client.Headers[HttpRequestHeader.AcceptLanguage] = "es-ES,es;q=0.8";
client.Headers[HttpRequestHeader.Referer] = @"http://www.ups.com/";
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
string url = @"https://wwwapps.ups.com/WebTracking/track?";
byte[] result = client.UploadValues(url, values);
System.IO.File.WriteAllText(@"C:\UPSText.txt", Encoding.UTF8.GetString(result));
}
But none of the above examples worked as expected.
Question
Is it possible to web-crawl UPS in order to keep track of AWBs?
Note
Currently, I have no access to UPS API.
I just finished writing my script for it. The trick is that there is another URL where you can include the tracking number directly in the query string and land on the results page. You will then have to parse the tables, since matching on XML tags won't work; just read at an offset from a known header.
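A minimal sketch of that approach, assuming the trackNums query-string pattern from the question's URL is the direct link the answer refers to; the "Status" anchor used for parsing is a hypothetical placeholder, so inspect the real page to pick one:

using System;
using System.IO;
using System.Net;

class UpsTracker
{
    // Fetch the UPS tracking page for one AWB and pull the text that follows
    // a known header, instead of parsing tags.
    static string TrackAwb(string trackingNumber)
    {
        string url = "https://wwwapps.ups.com/WebTracking/track?trackNums="
                     + Uri.EscapeDataString(trackingNumber);

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Mozilla/5.0";
        // Decompress gzip/deflate automatically so the HTML arrives as text.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            // "Offset off of a header": locate a known label, then read the
            // text at a fixed offset from it. "Status" is an assumption.
            int i = html.IndexOf("Status", StringComparison.OrdinalIgnoreCase);
            return i < 0 ? null : html.Substring(i, Math.Min(200, html.Length - i));
        }
    }
}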
Related
I tried to get the source of a particular site page using the code below, but it failed.
I was able to get the page source in 1~2 seconds using a WebBrowser control or WebDriver, but HttpWebRequest failed.
I tried putting the actual browser cookies into the HttpWebRequest, but it failed, too.
(Exception - The operation has timed out)
I wonder why it failed and want to learn from the failure.
Thank you in advance!
string Html = String.Empty;
CookieContainer cc = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
req.Method = "GET";
req.Host = "www.coupang.com";
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
req.Headers.Add("Accept-Language", "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7");
req.CookieContainer = cc;
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader str = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
Html = str.ReadToEnd();
}
Removing req.Host from your code should do the trick.
According to the documentation:
If the Host property is not set, then the Host header value to use in an HTTP request is based on the request URI.
You already set the URI in (HttpWebRequest)WebRequest.Create("https://www.coupang.com/"), so I don't think setting it again is necessary.
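A compact sketch of that behavior, grounded in the documentation quoted above: when Host is never assigned, the framework fills it in from the request URI.

using System;
using System.Net;

class HostDemo
{
    static void Main()
    {
        var req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
        // No req.Host assignment: with Host unset, the header value is
        // derived from the request URI automatically.
        Console.WriteLine(req.Host); // prints "www.coupang.com"
    }
}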
Please let me know if it helps.
I used the Fiddler extension RequestToCode to replay a POST from logging into Yahoo.
When I run the code, I can see in Fiddler that the login was successful and there are 10 cookies in the response.
In my code though, the response.Cookies had a count of 0.
So I updated my HttpWebRequest and set:
request.CookieContainer = new CookieContainer();
When I run the code again and look at it in Fiddler I see the login failed because the response navigates to a failed login url.
My ultimate goal is to get the cookies from the login attempt to use in a later Get request to Yahoo.
Why is setting the cookie container causing a failure?
Maybe it's because you are initializing a new CookieContainer on every request.
Declare a public variable: CookieContainer cookies = new CookieContainer();
Now all your requests will use the same CookieContainer. Example:
var request = (HttpWebRequest)WebRequest.Create("https://www.yahoo.com/");
request.CookieContainer = cookies;
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add("accept-language", "en,hr;q=0.9");
request.Headers.Add("accept-encoding", "");
request.Headers.Add("Upgrade-Insecure-Requests", "1");
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
string responseFromServer = reader.ReadToEnd();
reader.Close();
response.Close();
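A sketch of the full flow under that approach, toward the asker's goal of reusing the login cookies in a later GET: the login POST and the subsequent GET share one CookieContainer, so cookies set during login ride along automatically. The login URL and form fields below are hypothetical placeholders, not Yahoo's real ones:

using System;
using System.IO;
using System.Net;
using System.Text;

class CookieFlow
{
    // One shared container for the whole session.
    static readonly CookieContainer cookies = new CookieContainer();

    static void Main()
    {
        // 1) POST the login form. URL and body are hypothetical placeholders.
        var login = (HttpWebRequest)WebRequest.Create("https://login.example.com/");
        login.CookieContainer = cookies; // cookies from the response land here
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes("user=me&pass=secret");
        using (Stream s = login.GetRequestStream())
            s.Write(body, 0, body.Length);
        using (var res = (HttpWebResponse)login.GetResponse())
            Console.WriteLine("Cookies after login: " + cookies.Count);

        // 2) A later GET reuses the same container, so the session cookies are sent.
        var page = (HttpWebRequest)WebRequest.Create("https://www.yahoo.com/");
        page.CookieContainer = cookies;
        using (var res = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
            Console.WriteLine(reader.ReadToEnd().Length);
    }
}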
I am doing an HttpWebRequest to receive web data from americanapparel.net using this code:
var request = (HttpWebRequest)WebRequest.Create("http://store.americanapparel.net/en/sports-bra_rsaak301?c=White");
request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36";
var response = request.GetResponse();
//cli.Headers.Add ("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
using (var reader = new StreamReader(response.GetResponseStream()))
{
var data = reader.ReadToEnd();
return data;
}
I am receiving data from this URL:
http://store.americanapparel.net/en/sports-bra_rsaak301?c=White
But the live page content is different from the data received by my HttpWebRequest.
How could I get the exact page data in C#?
I am performing the following HttpWebRequest:
private static void InvokeHealthCheckApi()
{
var webRequest = (HttpWebRequest)WebRequest.Create(Url);
string sb = JsonConvert.SerializeObject(webRequest);
webRequest.Method = "GET";
webRequest.KeepAlive = true;
webRequest.AllowAutoRedirect = true;
webRequest.ContentType = "application/json";
webRequest.UserAgent =
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
webRequest.CookieContainer = cookieJar; // cookieJar: a CookieContainer declared elsewhere in the class
using (HttpWebResponse response = webRequest.GetResponse() as HttpWebResponse)
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
File.AppendAllText("C:\\httpResponse1.txt", response.Headers.ToString());
File.AppendAllText("C:\\httpResponse2.html", reader.ReadToEnd());
}
}
The response from the request comes back as a web page that reads:
"Script is disabled. Click Submit to continue."
(Submit Button)
After clicking the submit button i get a prompt that reads:
"Do you want to open or save healthcheck.json(297 bytes) from fr74a87d9.work.corp.net?
After clicking the Open button I receive the json data that I am expecting to receive.
My question is: how do I parse the response to get to the JSON data that I need? Is it normal to get a web page as the initial response and have to drill down to get the JSON? StreamReader can't parse the response, because it's a web page and not JSON data.
I found a resolution to my problem here: http://blogs.microsoft.co.il/applisec/2013/06/03/passive-federation-client/
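That "Script is disabled. Click Submit to continue." page is the JavaScript-free fallback of a passive federation (WS-Federation/ADFS) sign-in: it carries a hidden form that a browser would auto-submit. A rough sketch of the general idea behind the linked resolution, assuming the intermediate page contains hidden input fields and a form action; the regexes and field handling here are illustrative, not the blog's exact code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

class PassiveFederationClient
{
    static readonly CookieContainer cookieJar = new CookieContainer();

    // Fetch a URL; if an auto-submit form page comes back, re-post the
    // hidden fields to the form action, mimicking the browser's "Submit".
    static string GetWithFormRepost(string url)
    {
        string page = Download(url, null);
        Match form = Regex.Match(page, "action=\"(?<action>[^\"]+)\"", RegexOptions.IgnoreCase);
        if (!form.Success)
            return page; // no intermediate form; this is already the payload

        var fields = new List<string>();
        foreach (Match m in Regex.Matches(page,
            "<input[^>]+name=\"(?<n>[^\"]+)\"[^>]+value=\"(?<v>[^\"]*)\"",
            RegexOptions.IgnoreCase))
        {
            fields.Add(Uri.EscapeDataString(m.Groups["n"].Value) + "=" +
                       Uri.EscapeDataString(m.Groups["v"].Value));
        }
        return Download(WebUtility.HtmlDecode(form.Groups["action"].Value),
                        string.Join("&", fields));
    }

    static string Download(string url, string postBody)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookieJar; // keep the federation session cookies
        if (postBody != null)
        {
            req.Method = "POST";
            req.ContentType = "application/x-www-form-urlencoded";
            byte[] bytes = Encoding.UTF8.GetBytes(postBody);
            using (Stream s = req.GetRequestStream())
                s.Write(bytes, 0, bytes.Length);
        }
        using (var res = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(res.GetResponseStream()))
            return reader.ReadToEnd();
    }
}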
I have an HttpWebRequest with a StreamReader that works fine without a WebProxy. When I use a WebProxy, the StreamReader reads strange characters instead of the actual HTML. Here is the code.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://URL");
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10";
req.Accept = "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
req.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
req.Headers.Add("Accept-Language", "en-US,en;q=0.8");
req.Method = "GET";
req.CookieContainer = new CookieContainer();
WebProxy proxy = new WebProxy("proxyIP:proxyPort");
proxy.Credentials = new NetworkCredential("proxyUser", "proxyPass");
req.Proxy = proxy;
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader reader = new StreamReader(res.GetResponseStream());
string html = reader.ReadToEnd();
Without the WebProxy, the variable html holds the expected HTML string from the URL. But with the WebProxy, html holds a value like this:
"�\b\0\0\0\0\0\0��]r����s�Y����\0\tP\"]ki���ػ��-��X�\0\f���/�!�HU���>Cr���P$%�nR�� z�g��3�t�~q3�ٵȋ(M���14&?\r�d:�ex�j��p������.��Y��o�|��ӎu�OO.�����\v]?}�~������E:�b��Lן�Ԙ6+�l���岳�Y��y'ͧ��~#5ϩ�it�2��5��%�p��E�L����t&x0:-�2��i�C���$M��_6��zU�t.J�>C-��GY��k�O�R$�P�T��8+�*]HY\"���$Ō�-�r�ʙ�H3\f8Jd���Q(:�G�E���r���Rܔ�ڨ�����W�<]$����i>8\b�p� �\= 4\f�> �&��$��\v��C��C�vC��x�p�|\"b9�ʤ�\r%i��w#��\t�r�M�� �����!�G�jP�8.D�k�Xʹt�J��/\v!�r��y\f7<����\",\a�/IK���ۚ�r�����ҿ5�;���}h��+Q��IO]�8��c����n�AGڟu2>�
Since you are passing
req.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
I would say your proxy compresses the stream before sending it back to you.
Inspect the headers of the response to check the encoding.
Then just use gzip to decompress it.
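A minimal sketch of both options: letting HttpWebRequest decompress transparently, or wrapping the response stream in a GZipStream by hand after checking Content-Encoding. The URL placeholder is the question's own.

using System;
using System.IO;
using System.IO.Compression;
using System.Net;

class GzipResponse
{
    static void Main()
    {
        var req = (HttpWebRequest)WebRequest.Create("https://URL");
        req.Headers.Add("Accept-Encoding", "gzip,deflate");

        // Option 1: let the framework decompress gzip/deflate transparently.
        req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var res = (HttpWebResponse)req.GetResponse())
        {
            Stream body = res.GetResponseStream();

            // Option 2: decompress by hand when Content-Encoding says gzip.
            // (Redundant when AutomaticDecompression is set, since the header
            // is then cleared by the framework; shown for completeness.)
            if ("gzip".Equals(res.ContentEncoding, StringComparison.OrdinalIgnoreCase))
                body = new GZipStream(body, CompressionMode.Decompress);

            using (var reader = new StreamReader(body))
                Console.WriteLine(reader.ReadToEnd());
        }
    }
}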