LINQ to XML User-Agent header value - c#

How can I specify the HTTP User-Agent header that LINQ to XML uses for its requests when I call XElement.Load(url)?
I use it for calls to a Web API, which requires my client to identify itself properly in the User-Agent header.

You can use WebClient to specify the user agent:
using (var webClient = new WebClient())
{
    // Replace this example value with a string that identifies your client.
    webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    using (var stream = webClient.OpenRead("http://server.com"))
    {
        XElement root = XElement.Load(stream);
    }
}
or download the document as a string and parse it:
using (var webClient = new WebClient())
{
    webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    XElement root = XElement.Parse(webClient.DownloadString(url));
}
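On .NET 6 and later, WebClient is marked obsolete (warning SYSLIB0014) in favour of HttpClient. Below is a minimal sketch of the same idea using HttpClient. It sets the header once through DefaultRequestHeaders.UserAgent and hands the response stream to XElement.Load. The names XmlFetcher and StubHandler, the product token MyApiClient/1.0, and any URLs are hypothetical. StubHandler exists only so the sketch can run without a network connection.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class XmlFetcher
{
    // Creates a client that sends the given User-Agent on every request.
    // ParseAdd validates the value against the header's product/comment grammar.
    public static HttpClient CreateClient(HttpMessageHandler handler, string userAgent)
    {
        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.UserAgent.ParseAdd(userAgent);
        return client;
    }

    // Downloads the document and parses it with LINQ to XML.
    public static async Task<XElement> LoadAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                return XElement.Load(stream);
            }
        }
    }
}

// Hypothetical test double: records the User-Agent it receives and returns fixed XML.
public sealed class StubHandler : HttpMessageHandler
{
    public string SeenUserAgent { get; private set; } = "";

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        SeenUserAgent = request.Headers.UserAgent.ToString();
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("<root><item>1</item></root>")
        };
        return Task.FromResult(response);
    }
}
```

In real code, pass `new HttpClientHandler()` (or nothing, using `new HttpClient()` plus the header) instead of the stub. HttpClient is designed to be created once and reused, which is why the header is set on DefaultRequestHeaders rather than per call.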

Related

GET method returns root URL content

I am trying to get the content of this URL: https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1
but with the following code, the response contains the content of the home page instead: https://www.eganba.com
In addition, when I request the first URL with Postman, the response is correct.
Do you have any idea why?
WebRequest request = WebRequest.Create("https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
request.Method = "GET";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
WebResponse response = request.GetResponse();
Use the WebClient class from System.Net. I think this code gives you what you need; it returns the page's HTML:
using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    client.Headers.Add("accept", "text/html");
    var htmlCode = client.DownloadString("https://www.eganba.com/?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
    // true if the page contains the text "Stokta var"
    var result = htmlCode.Contains("Stokta var");
}
Hope it helps.
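For comparison, here is a sketch of the same request built with HttpClient. Microsoft's documentation recommends HttpClient over WebClient for new development. The sketch only sets the User-Agent and Accept headers used in the answer above, and I have not verified whether that is enough for this particular site. ProductPageRequest and ContainsStockMarker are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ProductPageRequest
{
    // Builds a GET request carrying the same User-Agent and Accept headers
    // that the WebClient answer sends.
    public static HttpRequestMessage Create(string url, string userAgent)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd(userAgent);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));
        return request;
    }

    // Mirrors the answer's check: true if the page contains the marker text.
    public static bool ContainsStockMarker(string html)
    {
        return html.Contains("Stokta var");
    }
}
```

To send the request, pass the message to a shared HttpClient with SendAsync and read the body with response.Content.ReadAsStringAsync().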

I am getting a different page with HttpWebRequest in C#

I am making an HttpWebRequest to receive web data from americanapparel.net using this code:
var request = (HttpWebRequest)WebRequest.Create("http://store.americanapparel.net/en/sports-bra_rsaak301?c=White");
request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36";
//cli.Headers.Add ("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
using (var response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    var data = reader.ReadToEnd();
    return data;
}
I am receiving data from this URL:
http://store.americanapparel.net/en/sports-bra_rsaak301?c=White
But the live page and the data received by my HttpWebRequest are different.
How can I get the exact page data in C#?

How to read the HTML source of a page that requires NTLM authentication

I need to get the HTML source of a web page.
This web page is a part of the web site that requires NTLM authentication.
This authentication is silent because Internet Explorer can use Windows log-in credentials.
Is it possible to reuse this silent authentication (i.e. reuse Windows log-in credentials), without making the user enter his/her credentials manually?
The options I have tried are below.
string url = @"http://myWebSite";
//works fine
System.Diagnostics.Process.Start("IExplore.exe", url);
InternetExplorer ie = null;
ie = new SHDocVw.InternetExplorer();
ie.Navigate(url);
//Works up to here, but I do not know how to read the HTML source with SHDocVw
NHtmlUnit.WebClient webClient = new NHtmlUnit.WebClient(BrowserVersion.INTERNET_EXPLORER_8);
HtmlPage htmlPage = webClient.GetHtmlPage(url);
string ghjg = htmlPage.WebResponse.ContentAsString; // Error 401
System.Net.WebClient client = new System.Net.WebClient();
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;
// DefaultNetworkCredentials and DefaultCredentials are empty
client.Headers.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)");
string reply = client.DownloadString(url); // Error 401
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
IWebProxy proxy = request.Proxy;
// If a proxy is configured, authenticate to it as the logged-on user.
if (proxy != null)
{
    proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
    // DefaultNetworkCredentials are empty
}
request.UserAgent = "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)";
request.Accept = "*/*";
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream stream = response.GetResponseStream(); // Error 401
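The thread has no answer to this question. One commonly used option, offered here as an untested suggestion, is to ask the HTTP handler to use the logged-on user's credentials through UseDefaultCredentials. WebClient and HttpWebRequest expose a property with the same name. The credentials returned by CredentialCache.DefaultCredentials and DefaultNetworkCredentials always look empty when inspected, because by design they do not expose the user name, password, or domain. The "are empty" observations in the code above are therefore expected, not a sign of failure. Whether the 401 goes away depends on the server's configuration. IntranetPageReader is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class IntranetPageReader
{
    // Handler that answers NTLM/Negotiate challenges with the security context
    // of the account running the process, without prompting for a password.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler { UseDefaultCredentials = true };
    }

    // Downloads the page's HTML; HttpClient disposes the handler when it is disposed.
    public static async Task<string> GetHtmlAsync(string url)
    {
        using (var client = new HttpClient(CreateHandler()))
        {
            return await client.GetStringAsync(url);
        }
    }
}
```

Usage would be along the lines of `string html = await IntranetPageReader.GetHtmlAsync("http://myWebSite");`.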

WebClient.DownloadString(url) does not work with 2nd parameter

I am using the WebClient.DownloadString() method to download some data. I am using the following code:
static void Main(string[] args)
{
    string query = "select+%3farticle+%3fmesh+where+{+%3farticle+a+npg%3aArticle+.+%3farticle+npg%3ahasRecord+[+dc%3asubject+%3fmesh+]+.+filter+regex%28%3fmesh%2c+\"blood\"%2c+\"i\"%29+}";
    NameValueCollection queries = new NameValueCollection();
    queries.Add("query", query);
    //queries.Add("output", "sparql_json");
    using (WebClient wc = new WebClient())
    {
        wc.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        wc.QueryString = queries;
        string result = wc.DownloadString("http://data.nature.com/sparql");
        Console.WriteLine(result);
    }
    Console.ReadLine();
}
With this code, it works fine and gives me an XML string as output. But I would like to get JSON output, so I uncommented the line
queries.Add("output", "sparql_json");
and ran the same program, but it seems to fetch an error message from the server.
However, if I open the same URL (given below) in a web browser, it returns JSON as expected:
URL that works in browsers
I am wondering what the problem could be, especially since it works in a browser but not with WebClient. Is WebClient doing something different here?
Please note that I also tried to specify the query as
query + "&output=sparql_json"
But that does not work either.
Could someone please tell me what the problem might be?
Thanks
Add wc.Headers.Add("Accept", "application/json"); to ask the server for JSON. Here is the full source I tested:
string query = "select ?article ?mesh where { ?article a npg:Article . ?article npg:hasRecord [ dc:subject ?mesh ] . filter regex(?mesh, \"blood\", \"i\") }";
NameValueCollection queries = new NameValueCollection();
queries.Add("query", query);
queries.Add("output", "sparql_json");
using (WebClient wc = new WebClient())
{
    wc.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    wc.Headers.Add("Accept", "application/json");
    wc.QueryString = queries;
    string result = wc.DownloadString("http://data.nature.com/sparql");
    Console.WriteLine(result);
}
Console.ReadLine();
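Here is a sketch of the same request with HttpClient, assuming you build the query string yourself. Passing the raw SPARQL text through Uri.EscapeDataString avoids hand-encoding it as the question does, and the Accept header carries the fix from the answer. SparqlRequest and BuildUrl are hypothetical names, and I have not checked how the endpoint currently responds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;

public static class SparqlRequest
{
    // Percent-encodes every key and value, so raw SPARQL text can be passed in unescaped.
    public static string BuildUrl(string endpoint, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return endpoint + "?" + query;
    }

    // GET request that asks for JSON both via the output parameter and the Accept header.
    public static HttpRequestMessage Create(string endpoint, string sparql)
    {
        var parameters = new[]
        {
            new KeyValuePair<string, string>("query", sparql),
            new KeyValuePair<string, string>("output", "sparql_json"),
        };
        var request = new HttpRequestMessage(HttpMethod.Get, BuildUrl(endpoint, parameters));
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }
}
```

The message can then be sent with a shared HttpClient via SendAsync.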

Why Does my HttpWebRequest Return 400 Bad request?

The following code fails with a 400 Bad Request exception. My network connection is good and I can browse to the site, but I cannot get this URI with HttpWebRequest.
private void button3_Click(object sender, EventArgs e)
{
    WebRequest req = HttpWebRequest.Create(@"http://www.youtube.com/");
    try
    {
        //returns a 400 bad request... Any ideas???
        WebResponse response = req.GetResponse();
    }
    catch (WebException ex)
    {
        Log(ex.Message);
    }
}
First, cast the WebRequest to an HttpWebRequest like this:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(@"http://www.youtube.com/");
Then, add this line of code:
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
Set UserAgent and Referer in your HttpWebRequest:
var request = (HttpWebRequest)WebRequest.Create(@"http://www.youtube.com/");
request.Referer = "http://www.youtube.com/"; // optional
request.UserAgent =
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; " +
    "Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; " +
    ".NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30618; " +
    "InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0)";
try
{
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        var html = reader.ReadToEnd();
    }
}
catch (WebException ex)
{
    Log(ex);
}
There could be many causes for this problem. Do you have any more details about the WebException?
One cause, which I've run into before, is a bad user agent string. Some websites (Google, for instance) check that requests come from known user agents to keep automated bots off their pages.
In fact, you may want to check that YouTube's user agreement does not preclude what you're doing. If it does, you may be better off going through approved channels such as its web services.
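To get more details than the exception message, look at the response the server actually sent. With HttpWebRequest, WebException.Response (cast to HttpWebResponse, when it is not null) gives you the status code and body. With HttpClient, a non-success status does not throw unless you call EnsureSuccessStatusCode, so you can inspect it directly. The sketch below shows the HttpClient variant. ErrorDetails and BadRequestHandler are hypothetical names, and the handler only fakes a 400 response so the code can run offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ErrorDetails
{
    // Sends a GET and, instead of throwing on a non-success status, returns a
    // description with the status code, reason phrase and the start of the body.
    public static async Task<string> DescribeAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            string body = await response.Content.ReadAsStringAsync();
            if (response.IsSuccessStatusCode)
            {
                return "OK";
            }
            string snippet = body.Length > 200 ? body.Substring(0, 200) : body;
            return $"{(int)response.StatusCode} {response.ReasonPhrase}: {snippet}";
        }
    }
}

// Hypothetical handler that always answers 400 with a short body, for offline testing.
public sealed class BadRequestHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.BadRequest)
        {
            Content = new StringContent("missing header")
        };
        return Task.FromResult(response);
    }
}
```

The body of a 400 response often explains what the server objected to, which narrows down whether the user agent, a missing header, or something else is at fault.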
Maybe you've got a proxy server running, and you haven't set the Proxy property of the HttpWebRequest?
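If a proxy is the cause, it has to be configured explicitly. With HttpWebRequest, that is the Proxy property mentioned above. With HttpClient, it is the handler's Proxy. The sketch assumes a hypothetical proxy at http://proxy.example.local:8080 that accepts the logged-on user's Windows credentials. ProxyClientFactory is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ProxyClientFactory
{
    // Routes requests through the given proxy; proxyAddress is a placeholder value.
    public static HttpClientHandler CreateHandler(string proxyAddress)
    {
        var proxy = new WebProxy(proxyAddress)
        {
            // Authenticate to the proxy as the current user (assumption: the proxy accepts this).
            UseDefaultCredentials = true
        };
        return new HttpClientHandler { Proxy = proxy, UseProxy = true };
    }
}
```

Pass the handler to `new HttpClient(handler)` and send requests as usual.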
