I want to load this http://www.yellowpages.ae/categories-by-alphabet/h.html url, but it returns null
In some question I have heard about adding Cookie container but it is already there in my code.
var MainUrl = "http://www.yellowpages.ae/categories-by-alphabet/h.html";
HtmlWeb web = new HtmlWeb();
web.PreRequest += request =>
{
request.CookieContainer = new System.Net.CookieContainer();
return true;
};
web.CacheOnly = false;
var doc = web.Load(MainUrl);
the website opens perfectly fine in browser.
You need CookieCollection to get cookies and set UseCookie to true in HtmlWeb.
CookieCollection cookieCollection = null;
var web = new HtmlWeb
{
//AutoDetectEncoding = true,
UseCookies = true,
CacheOnly = false,
PreRequest = request =>
{
if (cookieCollection != null && cookieCollection.Count > 0)
request.CookieContainer.Add(cookieCollection);
return true;
},
PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};
var doc = web.Load("https://www.google.com");
I doubt it is a cookie issue. Looks like a gzip encryption since I got nothing but gibberish when I tried to fetch the page. If it was a cookie issue the response should return an error saying so. Anyhow. Here is my solution to your problem.
public static void Main(string[] args)
{
HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.ae/categories-by-alphabet/h.html");
request.Method = "GET";
request.ContentType = "text/html;charset=utf-8";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var stream = response.GetResponseStream())
{
doc.Load(stream, Encoding.GetEncoding("utf-8"));
}
}
}
catch (WebException ex)
{
Console.WriteLine(ex.Message);
}
Console.WriteLine(doc.DocumentNode.InnerHtml);
Console.ReadKey();
}
All it does is that it decrypts/extracts the gzip message that we receive.
How did I know it was GZIP you ask? The response stream from the debugger said that the ContentEncoding was gzip.
Basically just add:
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
To your code and you're good.
Related
First of all: I know this has been asked over 100 times, but most of these questions were eigher caused by timeout problems, by incorrect Url or by foregetting to close a stream (and belive me, I tried ALL the samples and none of them worked).
So, now to my question: in my Windows Phone app I'm using the HttpWebRequest to POST some data to a php web service. That service should then save the data in some directories, but to simplify it, at the moment, it only echos "hello".
But when I use the following code, I always get a 404 complete with an apache 404 html document. Therefor I think I can exclude the possibility of a timeout. It seems like the request reaches the server, but for some reason, a 404 is returned. But what really makes me be surprised is, if I use a get request, everything works fine. So here is my code:
HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.CreateHttp(server + "getfeaturedpicture.php?randomparameter="+ Environment.TickCount);
webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0";
webRequest.Method = "POST";
webRequest.ContentType = "text/plain; charset=utf-8";
StreamWriter writer = new StreamWriter(await Task.Factory.FromAsync<Stream>(webRequest.BeginGetRequestStream, webRequest.EndGetRequestStream, null));
writer.Write(Encoding.UTF8.GetBytes("filter=" + Uri.EscapeDataString(filterML)));
writer.Close();
webRequest.BeginGetResponse(new AsyncCallback((res) =>
{
string strg = getResponseString(res);
Stator.mainPage.Dispatcher.BeginInvoke(() => { MessageBox.Show(strg); });
}), webRequest);
Although I don't think this is the reason, here's the source of getResponseString:
public static string getResponseString(IAsyncResult asyncResult)
{
HttpWebRequest webRequest = (HttpWebRequest)asyncResult.AsyncState;
HttpWebResponse webResponse;
try
{
webResponse = (HttpWebResponse)webRequest.EndGetResponse(asyncResult);
}
catch (WebException ex)
{
webResponse = ex.Response as HttpWebResponse;
}
MemoryStream tempStream = new MemoryStream();
webResponse.GetResponseStream().CopyTo(tempStream);
tempStream.Position = 0;
webResponse.Close();
return new StreamReader(tempStream).ReadToEnd();
}
This is tested code work fine in Post method with some body. May this gives you an idea.
public void testSend()
{
try
{
string url = "abc.com";
string str = "test";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "POST";
req.ContentType = "text/plain; charset=utf-8";
req.BeginGetRequestStream(SendRequest, req);
}
catch (WebException)
{
}
}
//Get Response and write body
private void SendRequest(IAsyncResult asyncResult)
{
string str = "test";
string Data = "data=" + str;
HttpWebRequest req= (HttpWebRequest)asyncResult.AsyncState;
byte[] postBytes = Encoding.UTF8.GetBytes(Data);
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = postBytes.Length;
Stream requestStream = req.GetRequestStream();
requestStream.Write(postBytes, 0, postBytes.Length);
requestStream.Close();
request.BeginGetResponse(SendResponse, req);
}
//Get Response string
private void SendResponse(IAsyncResult asyncResult)
{
try
{
MemoryStream ms;
HttpWebRequest request = (HttpWebRequest)asyncResult.AsyncState;
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
HttpWebResponse httpResponse = (HttpWebResponse)response;
string _responestring = string.Empty;
using (Stream data = response.GetResponseStream())
using (var reader = new StreamReader(data))
{
_responestring = reader.ReadToEnd();
}
}
catch (WebException)
{
}
}
I would suggest you to use RestSharp for your POST requests in windows phone. I am making an app for a startup and i faced lots of problems while using a similar code as yours. heres an example of a post request using RestSharp. You see, instead of using 3 functions it can be done in a more concise form. Also the response can be handled efficiently. You can get RestSharp from Nuget.
RestRequest request = new RestRequest("your url", Method.POST);
request.AddParameter("key", value);
RestClient restClient = new RestClient();
restClient.ExecuteAsync(request, (response) =>
{
if (response.StatusCode == HttpStatusCode.OK)
{
StoryBoard2.Begin();
string result = response.Content;
if (result.Equals("success"))
message.Text = "Review submitted successfully!";
else
message.Text = "Review could not be submitted.";
indicator.IsRunning = false;
}
else
{
StoryBoard2.Begin();
message.Text = "Review could not be submitted.";
}
});
It turned out the problem was on the server-side: it tried it on the server of a friend and it worked fine, there. I'll contact the support of the hoster and provide details as soon as I get a response.
I am encountering a problem getting the access_token in client application using oauth.
The returned response has empty body though in API I can see the response is not empty.
tokenresponse = {
"access_token":"[ACCESSTOKENVALUE]",
"token_type":"bearer",
"expires_in":"1200",
"refresh_token":"[REFRESHTOKENVALUE]",
"scope":"[SCOPEVALUE]"
}
The method from API that returns the token http://api.sample.com/OAuth/Token:
public ActionResult Token()
{
OutgoingWebResponse response =
this.AuthorizationServer.HandleTokenRequest(this.Request);
string tokenresponse = string.Format("Token({0})", response!=null?response.Body:""));
return response.AsActionResult();
}
The client method that requests the token is:
public string GetAuthorizationToken(string code)
{
string Url = ServerPath + "OAuth/Token";
string redirect_uri_encode = UrlEncode(ClientPath);
string param = string.Format("code={0}&client_id={1}&client_secret={2}&redirect_uri={3}&grant_type=authorization_code",code, ClientId, ClientSecret, redirect_uri_encode);
HttpWebRequest request = HttpWebRequest.Create(Url) as HttpWebRequest;
string result = null;
request.Method = "POST";
request.KeepAlive = true;
request.ContentType = "application/x-www-form-urlencoded";
request.Timeout = 10000;
request.Headers.Remove(HttpRequestHeader.Cookie);
var bs = Encoding.UTF8.GetBytes(param);
using (Stream reqStream = request.GetRequestStream())
{
reqStream.Write(bs, 0, bs.Length);
}
using (WebResponse response = request.GetResponse())
{
var sr = new StreamReader(response.GetResponseStream());
result = sr.ReadToEnd();
sr.Close();
}
if (!string.IsNullOrEmpty(result))
{
TokenData tokendata = JsonConvert.DeserializeObject<TokenData>(result);
return UpdateAuthorizotionFromToken(tokendata);
}
return null;
}
The result variable is empty.
Please let me know if you have any idea what could cause this. Initially I assumed is because of the cookies so I tried to remove them from request.
Thanks in advance.
Dear just create webclient using following code and you will get json info in tokeninfo.I used it and simply its working perfect.
WebClient client = new WebClient();
string postData = "client_id=" + ""
+ "&client_secret=" + ""
+ "&grant_type=password&username=" + "" //your username
+ "&password=" + "";//your password :)
string soundCloudTokenRes = "https://api.soundcloud.com/oauth2/token";
string tokenInfo = client.UploadString(soundCloudTokenRes, postData);
You can then use substring that contains only token from tokeninfo.
To upload tracks on sound cloud.
private void TestSoundCloudupload()
{
System.Net.ServicePointManager.Expect100Continue = false;
var request = WebRequest.Create("https://api.soundcloud.com/tracks") as HttpWebRequest;
//some default headers
request.Accept = "*/*";
request.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
request.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
request.Headers.Add("Accept-Language", "en-US,en;q=0.8,ru;q=0.6");
//file array
var files = new UploadFile[] { new UploadFile(Server.MapPath("Downloads//0.mp3"), "track[asset_data]", "application/octet-stream") };
//other form data
var form = new NameValueCollection();
form.Add("track[title]", "Some title");
form.Add("track[sharing]", "public");
form.Add("oauth_token", "");
form.Add("format", "json");
form.Add("Filename", "0.mp3");
form.Add("Upload", "Submit Query");
try
{
using (var response = HttpUploadHelper.Upload(request, files, form))
{
using (var reader = new StreamReader(response.GetResponseStream()))
{
Response.Write(reader.ReadToEnd());
}
}
}
catch (Exception ex)
{
Response.Write(ex.ToString());
}
}
I have to scrape a table from a secure site and I'm having trouble logging in to the page and retrieving the authentication token and any other associated cookies. Am I doing something wrong here?
public NameValueCollection LoginToDatrose()
{
var loginUriBuilder = new UriBuilder();
loginUriBuilder.Host = DatroseHostName;
loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
loginUriBuilder.Scheme = "https";
var boundary = Guid.NewGuid().ToString();
var postData = new NameValueCollection();
postData.Add("LoginName", DatroseUserName);
postData.Add("Password", DatrosePassword);
var data = Encoding.ASCII.GetBytes(postData.ToQueryString(false));
var request = WebRequest.Create(loginUriBuilder.Uri) as HttpWebRequest;
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var d = request.GetRequestStream())
{
d.Write(data, 0, data.Length);
}
var response = request.GetResponse() as HttpWebResponse;
var responseCookies = new NameValueCollection();
foreach (var nvp in response.Cookies.OfType<Cookie>())
{
responseCookies.Add(nvp.Name, nvp.Value);
}
//using (var responseData = response.GetResponseStream())
//using (var responseReader = new StreamReader(responseData))
//{
// var theResponse = responseReader.ReadToEnd();
// Debug.WriteLine(theResponse);
//}
return responseCookies;
}
I get no values in the return object. It does not fail. The value of theResponse (when not commented out) seems to be the HTML of the login page.
Any assistance would be greatly appreciated.
OK, the problem here seems related to the 302 redirect that would occur after the credentials were passed. The HttpWebRequest would automatically follow the 302.
Ultimately, I ended up doing things a little differently. First, I subclassed the WebClient class as follows:
public class CookiesAwareWebClient : WebClient
{
private CookieContainer outboundCookies = new CookieContainer();
private CookieCollection inboundCookies = new CookieCollection();
public CookieContainer OutboundCookies
{
get
{
return outboundCookies;
}
}
public CookieCollection InboundCookies
{
get
{
return inboundCookies;
}
}
public bool IgnoreRedirects { get; set; }
protected override WebRequest GetWebRequest(Uri address)
{
WebRequest request = base.GetWebRequest(address);
if (request is HttpWebRequest)
{
(request as HttpWebRequest).CookieContainer = outboundCookies;
(request as HttpWebRequest).AllowAutoRedirect = !IgnoreRedirects;
}
return request;
}
protected override WebResponse GetWebResponse(WebRequest request)
{
WebResponse response = base.GetWebResponse(request);
if (response is HttpWebResponse)
{
inboundCookies = (response as HttpWebResponse).Cookies ?? inboundCookies;
}
return response;
}
}
This allowed me to use a WebClient class that was cookies-aware as well as one that I could control the redirect. Then I rewrote my code for logging in as follows:
public NameValueCollection LoginToDatrose()
{
var loginUriBuilder = new UriBuilder();
loginUriBuilder.Host = DatroseHostName;
loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
loginUriBuilder.Scheme = "https";
var postData = new NameValueCollection();
postData.Add("LoginName", DatroseUserName);
postData.Add("Password", DatrosePassword);
var responseCookies = new NameValueCollection();
using (var client = new CookiesAwareWebClient())
{
client.IgnoreRedirects = true;
var clientResponse = client.UploadValues(loginUriBuilder.Uri, "POST", postData);
foreach (var nvp in client.InboundCookies.OfType<Cookie>())
{
responseCookies.Add(nvp.Name, nvp.Value);
}
}
return responseCookies;
}
...and everything worked swimmingly.
I found that this code works as expected:
var url = "https://limal.info/efulfilment.php";
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
var alternativeAnswer = Encoding.UTF8.GetString(new WebClient().UploadValues(url, new NameValueCollection() { { "xml", "test" } }));
The following code however gives me a headache:
var url = "https://limal.info/efulfilment.php";
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
var request = (HttpWebRequest)WebRequest.Create(url);
request.ContentType = "application/x-www-form-urlencoded";
request.Method = "POST";
request.Timeout = 5000; // I added this for you, so you only need to wait 5 sec...
using (var requestStream = request.GetRequestStream())
{
var writer = new StreamWriter(requestStream);
writer.Write("xml=test");
}
using (var response = request.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
var reader = new StreamReader(responseStream);
var answer = reader.ReadToEnd();
}
}
Somehow the post parameters don't get recognized and I get the response:
"limal.info bridge error: Missing 'xml' variable in post request."
(The correct answer would be that the XML data is wrongly formatted, as test is invalid XML...)
Now to the next Problem:
When I use a different url a timeout exception occurs. It hangs at UploadValues in the following code. (The other example that uses HttpWebRequest hangs at GetResponse, which I tried, too.)
var url = "https://sys.efulfilment.de/rt/";
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
var alternativeAnswer = Encoding.UTF8.GetString(new WebClient().UploadValues(url, new NameValueCollection() { { "xml", "test" } }));
I read about similar problems here and on other sites. It seems that using Http POST with SSL is a problem in .NET.
WHY?? :(
I'm trying to make a function that checks if a site is online or not, but is having some problem with the timeout. I want to limit it to a max 3 sec, if there is no respons within 3 sec I should see the page as offline.
My try:
class OnlineCheck
{
public static bool IsOnline(string url)
{
try
{
WebClient webclient = new WebClient();
webclient.Headers.Add(HttpRequestHeader.KeepAlive, "1000");
webclient.OpenRead(url);
}
catch { return false; }
return true;
}
}
The WebClient doesn't support timeout. But you can use the HttpWebRequest!
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Endpoint);
request.Timeout=3000;
request.GetResponse();
If you want to check that the site is online, you are not really interested in the content of the page, just that you get a response. To make that more efficient, you should only request the http headers. Here is a quick example on how you could do:
private static IEnumerable<HttpStatusCode> onlineStatusCodes = new[]
{
HttpStatusCode.Accepted,
HttpStatusCode.Found,
HttpStatusCode.OK,
// add more codes as needed
};
private static bool IsSiteOnline(string url, int timeout)
{
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
{
if (request != null)
{
request.Method = "HEAD"; // get headers only
request.Timeout = timeout;
using (var response = request.GetResponse() as HttpWebResponse)
{
return response != null && onlineStatusCodes.Contains(response.StatusCode);
}
}
}
return false;
}
Use HttpWebRequest rather than WebClient. HttpWebRequest class has a timeout property.
You can try this code:
System.Net.WebRequest r = System.Net.WebRequest.Create("http://www.google.com");
r.Timeout = 3000;
System.Net.WebProxy proxy = new System.Net.WebProxy("<proxy address>");
System.Net.NetworkCredential credentials = new System.Net.NetworkCredential();
credentials.Domain = "<domain>";
credentials.UserName = "<login>";
credentials.Password = "<pass>";
proxy.Credentials = credentials;
r.Proxy = proxy;
try
{
System.Net.WebResponse rsp = r.GetResponse();
}
catch (Exception)
{
MessageBox.Show("Is not avaliable");
return;
}
MessageBox.Show("Avaliable!");
static bool isOnline (string URL)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
request.Timeout = 3000;
try
{
WebResponse resp = request.GetResponse();
}
catch (WebException e)
{
if (((HttpWebResponse)e.Response).StatusCode == HttpStatusCode.NotFound)
{
return false;
}
}
return true;
}