I'm trying to download the HTML of a website as a string. The website has the following URL:
https://www.gastrobern.ch/de/service/aus-weiterbildung/wirtekurs/234/?oid=1937&lang=de
First I tried a simple WebClient request:
var wc = new WebClient();
string websitenstring = "";
websitenstring = wc.DownloadString("http://www.gastrosg.ch/default.asp?id=3020000&siteid=1&langid=de");
But websitenstring was empty. Then I read in some posts that I have to send some additional header information:
var wc = new WebClient();
string websitenstring = "";
wc.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
wc.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate, br";
wc.Headers[HttpRequestHeader.AcceptLanguage] = "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7";
wc.Headers[HttpRequestHeader.CacheControl] = "max-age=0";
wc.Headers[HttpRequestHeader.Host] = "www.gastrobern.ch";
wc.Headers[HttpRequestHeader.Upgrade] = "www.gastrobern.ch";
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
websitenstring = wc.DownloadString("https://www.gastrobern.ch/de/service/aus-weiterbildung/wirtekurs/234/?oid=1937&lang=de");
I tried this, but still got no response. Then I also tried to set some cookies:
wc.Headers.Add(HttpRequestHeader.Cookie,
"CFID=10609582;" +
"CFTOKEN=32721418;" +
"_ga=GA1.2.37" +
"_ga=GA1.2.379124242.1539000256;" +
"_gid=GA1.2.358798732.1539000256;" +
"_dc_gtm_UA-1237799-1=1;");
But this didn't work either. I also found out that the browser somehow makes multiple requests, while my C# application makes just one and shows the headers of the first response.
I don't know how to make a follow-up request. I'm thankful for every answer.
Try HttpClient instead.
Here is an example of how to use it:
public static async Task<string> GetString(string url)
{
    HttpClient client = new HttpClient();
    // ConfigureAwait(false) avoids a deadlock when the caller blocks on .Result
    HttpResponseMessage message = await client.GetAsync(url).ConfigureAwait(false);
    return await message.Content.ReadAsStringAsync().ConfigureAwait(false);
}
To call this method:
string dataFromServer = GetString("https://www.gastrobern.ch/de/service/aus-weiterbildung/wirtekurs/234/?oid=1937&lang=de").Result;
I checked this, and dataFromServer contains the HTML content of that page.
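Where the calling code can itself be async, the same idea can be written without blocking on .Result. The following is only a sketch under a few assumptions: the names PageDownloader, GetStringAsync and StubHandler are made up for illustration, an async Main requires C# 7.1 or later, and the call to EnsureSuccessStatusCode is a design choice that is not part of the answer above. StubHandler is a hypothetical helper that exists only so the method can be tried without touching the network.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class PageDownloader
{
    // Downloads the body at `url` as a string using the supplied client.
    public static async Task<string> GetStringAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url).ConfigureAwait(false))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }

    public static async Task Main()
    {
        // Create the client once and pass it to every call instead of creating one per request.
        using (var client = new HttpClient())
        {
            string html = await GetStringAsync(client,
                "https://www.gastrobern.ch/de/service/aus-weiterbildung/wirtekurs/234/?oid=1937&lang=de");
            Console.WriteLine(html.Length);
        }
    }
}

// Hypothetical stub: answers every request with a fixed body (for offline experiments only).
public sealed class StubHandler : HttpMessageHandler
{
    private readonly string _body;

    public StubHandler(string body) { _body = body; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_body)
        };
        return Task.FromResult(response);
    }
}
```

Awaiting all the way up means no thread blocks while the request is in flight, so the deadlock that ConfigureAwait(false) guards against cannot occur in the first place.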
I have to work with JSON data from an API (in my Windows app), and I am trying to make a POST request using WebClient.UploadString().
Below is my code, but it throws an error. I tried various options but was not able to pass the JSON as a string.
string result = "";
string url = "https://30prnabicq-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia for vanilla JavaScript (lite) 3.24.12;JS Helper 2.24.0;vue-instantsearch 1.5.0&x-algolia-application-id=30PRNABICQ&x-algolia-api-key=dcccebe87b846b64f545bf63f989c2b1";
string json = "{\"requests\":[{\"indexName\":\"vacatures\",\"params\":\"query=&hitsPerPage=20&page=0&highlightPreTag=__ais-highlight__&highlightPostTag=__/ais-highlight__&facets=[\"category\",\"contract\",\"experienceNeeded\",\"region\"]&tagFilters=\"}]}";
using (var client = new WebClient())
{
client.Headers[HttpRequestHeader.Host] = "30prnabicq-dsn.algolia.net";
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0";
client.Headers[HttpRequestHeader.Accept] = "application/json";
client.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.5";
client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate, br";
client.Headers[HttpRequestHeader.Referer] = "https://bouwjobs.be/";
client.Headers[HttpRequestHeader.ContentType] = "application/json";
client.Headers[HttpRequestHeader.ContentLength] = "249";
client.Headers[HttpRequestHeader.Origin] = "https://bouwjobs.be";
client.Headers[HttpRequestHeader.Connection] = "keep-alive";
client.Headers[HttpRequestHeader.CacheControl] = "max-age=0";
result = client.UploadString(url, "POST", json);
return result;
}
Please guide me in correcting my code.
Note: I have included some restricted headers in my code, but it throws an error even after I comment those out.
You do not seem to be uploading valid JSON with WebClient. The double quotes inside the facets array are not escaped at the JSON level, so the first one ends the params string value early. Remove those quotes:
string json = "{\"requests\":[{\"indexName\":\"vacatures\",\"params\":\"query=&hitsPerPage=20&page=0&highlightPreTag=__ais-highlight__&highlightPostTag=__/ais-highlight__&facets=[category,contract,experienceNeeded,region]&tagFilters=\"}]}";
This is valid JSON and should work fine.
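If the quotes around the facet names need to stay in the request (the thread does not say whether the API requires them), another option is to let a serializer do the escaping rather than writing the JSON by hand. This is a minimal sketch, assuming System.Text.Json is available (it ships with .NET Core 3.0 and later, or as a NuGet package); the class name AlgoliaBody and the method Build are made up for illustration.

```csharp
using System;
using System.Text.Json;

public static class AlgoliaBody
{
    public static string Build()
    {
        // The inner quotes belong to the params string value itself;
        // the serializer escapes them so the surrounding JSON stays valid.
        string queryParams =
            "query=&hitsPerPage=20&page=0" +
            "&highlightPreTag=__ais-highlight__&highlightPostTag=__/ais-highlight__" +
            "&facets=[\"category\",\"contract\",\"experienceNeeded\",\"region\"]&tagFilters=";

        var body = new
        {
            requests = new[]
            {
                // @params lets the C# keyword be used as the property name "params".
                new { indexName = "vacatures", @params = queryParams }
            }
        };

        return JsonSerializer.Serialize(body);
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The resulting string can be passed to client.UploadString(url, "POST", json) in place of the hand-written literal. By default the serializer writes some characters as \u escapes, which is still valid JSON and decodes back to the original text.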
I used the following code to get the source code of a SharePoint 2010 site:
try {
WebRequest req = HttpWebRequest.Create("myLink");
req.Method = "GET";
req.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;
string source = "";
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream())) {
source += reader.ReadToEnd();
}
}
From the source string I was able to search for the keywords I was looking for on the website.
Now the SharePoint has been migrated to 2016, and I can no longer see the specific content in the source code.
However, it is possible to use, for example, Chrome's integrated web developer tools to view the structure of the site. The content I am looking for is also visible there.
How is it possible to get this information programmatically using C#?
Try this:
using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
    client.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
    string htmlCode = client.DownloadString("myLink");
    //...
}
I'm practicing my skills with HttpClient on .NET 4.5, but I ran into trouble trying to get the HTML content of a web page that I can view without problems in my browser: I get a 410 Gone when I use HttpClient.
This is the first page, which I can actually get with HttpClient:
https://selfservice.mypurdue.purdue.edu/prod/bwckschd.p_disp_detail_sched?term_in=201610&crn_in=20172
But for the links inside the page at the above URL, such as the one labeled "View Catalog Entry",
I get a 410 Gone when I try to access them with HttpClient.
I used a list of Tuples to set up the series of requests:
var requests = new List<Tuple<Httpmethod, string, FormUrlEncodedContent, string>>()
{
new Tuple<Httpmethod, string, FormUrlEncodedContent, string>(Httpmethod.GET, "https://selfservice.mypurdue.purdue.edu/prod/bwckschd.p_disp_detail_sched?term_in=201610&crn_in=20172", null, ""),
new Tuple<Httpmethod, string, FormUrlEncodedContent, string>(Httpmethod.GET, "https://selfservice.mypurdue.purdue.edu/prod/bwckctlg.p_display_courses?term_in=201610&one_subj=EPCS&sel_crse_strt=10100&sel_crse_end=10100&sel_subj=&sel_levl=&sel_schd=&sel_coll=&sel_divs=&sel_dept=&sel_attr=", null, ""),
new Tuple<Httpmethod, string, FormUrlEncodedContent, string>(Httpmethod.GET, "https://selfservice.mypurdue.purdue.edu/prod/bwckctlg.p_disp_listcrse?term_in=201610&subj_in=EPCS&crse_in=10100&schd_in=%", null, "")
};
Then I use a foreach loop to pass each element of requests to a function:
HttpClientHandler handler = new HttpClientHandler()
{
CookieContainer = cookies,
AllowAutoRedirect = false,
AutomaticDecompression = DecompressionMethods.GZip
};
HttpClient client = new HttpClient(handler as HttpMessageHandler)
{
BaseAddress = new Uri(url),
Timeout = TimeSpan.FromMilliseconds(20000)
};
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, sdch");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4");
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36");
client.DefaultRequestHeaders.Connection.Add("keep-alive");
if (referrer.Length > 0)
client.DefaultRequestHeaders.Referrer = new Uri(referrer);
System.Diagnostics.Debug.WriteLine("Navigating to '" + url + "...");
HttpResponseMessage result = null;
//List<Cookie> cook = GetAllCookies(cookies);
try
{
switch (method)
{
case Httpmethod.POST:
result = await client.PostAsync(url, post_content);
break;
case Httpmethod.GET:
result = await client.GetAsync(url);
break;
}
}
catch (HttpRequestException ex)
{
throw new ApplicationException(ex.Message);
}
I have no clue what is wrong with my method, since I have successfully used it to handle requests that require a login.
I intend to build some sort of web crawler with this.
Please help, thanks in advance.
I forgot to mention: the links inside the above URL can be accessed by clicking on them in a web browser, but simply copying them into the address bar gives a 410 Gone as well.
EDIT: I saw some JavaScript code in the page source that creates an XMLHttpRequest to get the content, so it might be that it only refreshes part of the page instead of loading a new web page. But the URL does change. How do I get HttpClient to do the clicking?
I'm using an HttpClient to create an HTTP request; the client comes from the Windows.Web.Http namespace.
Everything works when I post the request without the Content-Type header, but the server does not return what I need because it requires that header. After working out which headers need to be sent, I'm facing another problem: I'm not able to set the Content-Type header.
Here is my code (the error occurs inside the try block):
using (var wp = new Windows.Web.Http.HttpClient())
{
HttpRequestMessage mSent = new HttpRequestMessage(HttpMethod.Post, new Uri(url));
//mSent.Headers.Add("Host", "academicos.ubi.pt");
//mSent.Headers.Add("Connection", "keep-alive");
//mSent.Headers.Add("Content-Length", "18532");
//mSent.Headers.Add("Origin", "https://academicos.ubi.pt");
//mSent.Headers.Add("X-Requested-With", "XMLHttpRequest");
//mSent.Headers.Add("Cache-Control", "no-cache");
//mSent.Headers.Add("X-MicrosoftAjax", "Delta=True");
mSent.Headers.Add("User-Agent", " Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36");
try
{
mSent.Content.Headers.ContentType = new Windows.Web.Http.Headers.HttpMediaTypeHeaderValue("application/x-www-form-urlencoded");
}
catch (Exception ex) { ex.ToString(); }
//mSent.Headers.Add("Accept", "*/*");
//mSent.Headers.Add("Referer", "https://academicos.ubi.pt/online/horarios.aspx");
//mSent.Headers.Add("Accept-Encoding", "gzip,deflate");
//mSent.Headers.Add("Accept-Language", "pt-PT,pt;q=0.8,en-US;q=0.6,en;q=0.4");
mSent.Headers.Add("Cookie", "the cookie string is big, so I will not post it here");
mSent.Content = new HttpStringContent("the content is well defined, but I will not post it here it's huge", Windows.Storage.Streams.UnicodeEncoding.Utf8);
HttpResponseMessage mReceived = await wp.SendRequestAsync(mSent, HttpCompletionOption.ResponseContentRead);
if (mReceived.IsSuccessStatusCode)
{
htmlPage = await mReceived.Content.ReadAsStringAsync();
}
}
The error I'm receiving is: Object reference not set to an instance of an object.
When I tried setting this header the same way I set the User-Agent, I got another exception saying that the content type has to be set under the content headers.
Any ideas? I tried searching for answers to this problem, but so far I have come up empty-handed.
Pedro, you need to set the Content property to something first; it is null until you assign it, which is why accessing mSent.Content.Headers throws. For example:
HttpRequestMessage mSent = new HttpRequestMessage(
HttpMethod.Post,
new Uri(url));
mSent.Content = new HttpStringContent(
"Name=Jonathan+Doe&Age=23",
UnicodeEncoding.Utf8,
"application/x-www-form-urlencoded");
There are other kinds of IHttpContent, such as HttpBufferContent, HttpFormUrlEncodedContent, HttpMultipartContent, HttpMultipartFormDataContent and HttpStreamContent.
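The ordering issue (assign the content first, and let the content object carry the Content-Type) is not specific to Windows.Web.Http. The sketch below reproduces it with System.Net.Http instead, which is a deliberate substitution so that the example runs outside a Windows Runtime app; StringContent here plays the role that HttpStringContent plays above. The class name ContentTypeDemo and the example.invalid URL are placeholders, not anything from the question.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class ContentTypeDemo
{
    public static HttpRequestMessage BuildRequest(string url, string formBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(url));

        // At this point request.Content is still null, so touching
        // request.Content.Headers here would throw a NullReferenceException.

        // Passing the media type to the content constructor sets Content-Type.
        request.Content = new StringContent(
            formBody, Encoding.UTF8, "application/x-www-form-urlencoded");

        return request;
    }

    public static void Main()
    {
        HttpRequestMessage request =
            BuildRequest("https://example.invalid/form", "Name=Jonathan+Doe&Age=23");

        // The header now exists on the content, not on the request itself.
        Console.WriteLine(request.Content.Headers.ContentType);
    }
}
```

Once the content exists, its Headers collection can also be adjusted afterwards if a different media type or charset is needed.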
You do not set the headers on the .Content member. You could instead use the multipart content object and set the content similar to this:
HttpMultipartFormDataContent form = new HttpMultipartFormDataContent();
form.Add(new HttpStringContent(RequestBodyField.Text), "data");
HttpResponseMessage response = await httpClient.PostAsync(resourceAddress, form).AsTask(cts.Token);
Ref: http://code.msdn.microsoft.com/windowsapps/HttpClient-sample-55700664/sourcecode?fileId=98924&pathId=1116044733
Problem
Here at work, people spend a lot of time tracking AWBs (air waybills) from different sources (UPS, FedEx, DHL, ...). I was asked to improve the process in order to save valuable time. I was thinking of accomplishing this using Excel as the platform, with Excel-DNA and C#, but the tests I have tried so far (crawling UPS) have had no success.
Tests
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://wwwapps.ups.com/WebTracking/track?HTMLVersion=5.0&loc=es_MX&Requester=UPSHome&WBPM_lid=homepage%2Fct1.html_pnl_trk&trackNums=5007052424&track.x=Rastrear");
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Language: es-ES,es;q=0.8");
request.Headers.Add("Accept-Encoding: gzip,deflate,sdch");
request.KeepAlive = false;
request.Referer = @"http://www.ups.com/";
request.ContentType = "text/html; charset=utf-8";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
Or...
using (var client = new WebClient())
{
var values = new NameValueCollection();
values.Add("HTMLVersion", "5.0");
values.Add("loc", "es_MX");
values.Add("Requester", "UPSHome");
values.Add("WBPM_lid", "homepage/ct1.html_pnl_trk");
values.Add("trackNums", "5007052424");
values.Add("track.x", "Rastrear");
client.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
client.Headers[HttpRequestHeader.AcceptLanguage] = "es-ES,es;q=0.8";
client.Headers[HttpRequestHeader.Referer] = @"http://www.ups.com/";
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
string url = @"https://wwwapps.ups.com/WebTracking/track?";
byte[] result = client.UploadValues(url, values);
System.IO.File.WriteAllText(@"C:\UPSText.txt", Encoding.UTF8.GetString(result));
}
But none of the above examples worked as expected.
Question
Is it possible to web-crawl UPS in order to keep a track of AWB?
Note
Currently, I have no access to UPS API.
I just finished writing my script for it. The trick is that there is another URL where you can include the tracking number directly in the URL and land straight on the results page. You will then have to parse the tables, as XML tags won't work. Just offset from a header.
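One way to read "offset from a header" is: find the header text in the downloaded HTML, then take the first table cell that follows it. The sketch below shows that idea with plain string searches. Everything specific in it is an assumption: the markup and the "Status" label are made up, since the post shows neither the alternative URL nor the layout of the page, and the names TableScraper and CellAfterHeader are hypothetical.

```csharp
using System;

public static class TableScraper
{
    // Returns the trimmed text of the first <td> cell that follows headerLabel,
    // or null when the label or a following cell cannot be found.
    public static string CellAfterHeader(string html, string headerLabel)
    {
        int headerAt = html.IndexOf(headerLabel, StringComparison.OrdinalIgnoreCase);
        if (headerAt < 0) return null;

        // Look for the next opening <td ...> tag after the header text.
        int cellOpen = html.IndexOf("<td", headerAt + headerLabel.Length,
                                    StringComparison.OrdinalIgnoreCase);
        if (cellOpen < 0) return null;

        // Skip past the end of the opening tag (it may carry attributes).
        int contentStart = html.IndexOf('>', cellOpen);
        if (contentStart < 0) return null;
        contentStart++;

        int contentEnd = html.IndexOf("</td>", contentStart,
                                      StringComparison.OrdinalIgnoreCase);
        if (contentEnd < 0) return null;

        return html.Substring(contentStart, contentEnd - contentStart).Trim();
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the real tracking page.
        string html = "<table><tr><th>Status</th><td> Delivered </td></tr></table>";
        Console.WriteLine(CellAfterHeader(html, "Status")); // prints "Delivered"
    }
}
```

This only handles cells without nested tags; for anything more involved, a proper HTML parser is likely the safer choice.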