How to read HTML source of a page that requires NTLM authentication - c#

I need to get the HTML source of the web page.
This web page is a part of the web site that requires NTLM authentication.
This authentication is silent because Internet Explorer can use Windows log-in credentials.
Is it possible to reuse this silent authentication (i.e. reuse Windows log-in credentials), without making the user enter his/her credentials manually?
The options I have tried are below.
string url = @"http://myWebSite";
//works fine
System.Diagnostics.Process.Start("IExplore.exe", url);
SHDocVw.InternetExplorer ie = new SHDocVw.InternetExplorer();
ie.Navigate(url);
//Works up to here, but I do not know how to read the HTML source with SHDocVw
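// Hedged aside: once the page has loaded (poll ie.Busy / ie.ReadyState),
// the markup can usually be read through the COM document object, e.g.:
//   var doc = (mshtml.HTMLDocument)ie.Document;   // requires a reference to mshtml
//   string source = doc.documentElement.outerHTML;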
NHtmlUnit.WebClient webClient = new NHtmlUnit.WebClient(BrowserVersion.INTERNET_EXPLORER_8);
HtmlPage htmlPage = webClient.GetHtmlPage(url);
string html = htmlPage.WebResponse.ContentAsString; // Error 401
System.Net.WebClient client = new System.Net.WebClient();
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;
// DefaultNetworkCredentials and DefaultCredentials are empty
client.Headers.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)");
string reply = client.DownloadString(url); // Error 401
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
IWebProxy proxy = request.Proxy;
if (proxy != null)
{
// Use the default credentials of the logged on user.
proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
// DefaultNetworkCredentials are empty
}
request.UserAgent = "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)";
request.Accept = "*/*";
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream stream = response.GetResponseStream(); // Error 401
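For what it's worth, the last attempt only ever attaches the default credentials to the proxy, never to the request itself, and DefaultCredentials/DefaultNetworkCredentials always inspect as empty by design; that alone does not mean the NTLM handshake will fail. Below is a minimal sketch of what usually makes silent NTLM work with HttpWebRequest, assuming the server actually challenges with NTLM/Negotiate:
using System;
using System.IO;
using System.Net;

class NtlmFetch
{
    static void Main()
    {
        string url = @"http://myWebSite"; // placeholder host from the question
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Attach the current Windows logon token to the request itself;
        // equivalent to request.Credentials = CredentialCache.DefaultCredentials.
        request.UseDefaultCredentials = true;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd(); // the page source the question asks for
            Console.WriteLine(html.Length);
        }
    }
}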

Related

HTTPWebRequest fails when setting CookieContainer

I used the Fiddler extension RequestToCode to replay a POST from logging into Yahoo.
When I run the code, I can see in Fiddler that the login was successful and there are 10 cookies in the response.
In my code though, the response.Cookies had a count of 0.
So I updated my HttpWebRequest and set:
request.CookieContainer = new CookieContainer();
When I run the code again and look at it in Fiddler, I see that the login failed because the response navigates to a failed-login URL.
My ultimate goal is to get the cookies from the login attempt to use in a later Get request to Yahoo.
Why is setting the cookie container causing a failure?
It may be because you are initializing a new CookieContainer on every request.
Declare the container once, e.g. as a field: CookieContainer cookies = new CookieContainer();
Now all of your requests will use the same CookieContainer. For example:
var request = (HttpWebRequest)WebRequest.Create("https://www.yahoo.com/");
request.CookieContainer = cookies;
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add("accept-language", "en,hr;q=0.9");
request.Headers.Add("accept-encoding", "");
request.Headers.Add("Upgrade-Insecure-Requests", "1");
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
string responseFromServer = reader.ReadToEnd();
reader.Close();
response.Close();
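To carry the login cookies into a later GET, the login POST has to share that same container. A rough sketch of the same idea; the login URL and form fields below are placeholders, not Yahoo's real ones (take the actual values from the Fiddler capture):
using System;
using System.IO;
using System.Net;
using System.Text;

// One container shared by the login POST and every later request.
CookieContainer cookies = new CookieContainer();

var loginRequest = (HttpWebRequest)WebRequest.Create("https://login.example.com/"); // placeholder
loginRequest.Method = "POST";
loginRequest.CookieContainer = cookies; // server-set cookies accumulate here
loginRequest.ContentType = "application/x-www-form-urlencoded";
byte[] body = Encoding.UTF8.GetBytes("username=me&passwd=secret"); // placeholder fields
loginRequest.ContentLength = body.Length;
using (Stream stream = loginRequest.GetRequestStream())
    stream.Write(body, 0, body.Length);

using (var loginResponse = (HttpWebResponse)loginRequest.GetResponse())
    Console.WriteLine(loginResponse.Cookies.Count); // non-zero once a container is attached

// The later GET then reuses the same container, exactly as in the answer above:
// request.CookieContainer = cookies;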

GET method returns root URL content

I am trying to get the content of this URL: https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1
But as a result of the code below, the response contains the content of the home page, https://www.eganba.com, instead.
In addition, when I request the first URL via the Postman application, the response is correct.
Do you have any idea why?
WebRequest request = WebRequest.Create("https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
request.Method = "GET";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
WebResponse response = request.GetResponse();
Use the WebClient class, which lives in System.Net. I think the code below gives you what you need; it returns the page's HTML.
using (WebClient client = new WebClient())
{
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
client.Headers.Add("accept", "text/html");
var htmlCode = client.DownloadString("https://www.eganba.com/?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
var result = htmlCode.Contains("Stokta var"); // "Stokta var" is Turkish for "in stock"
}
Hope it helps.
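If the WebClient call still returns the home page, one plausible cause (an assumption on my part; Postman may simply be replaying cookies it already holds) is that the site redirects cookie-less requests to the root. A sketch that collects a session cookie first and then requests the page:
using System;
using System.IO;
using System.Net;

CookieContainer cookies = new CookieContainer();

// First hit the root only to harvest any Set-Cookie headers.
var first = (HttpWebRequest)WebRequest.Create("https://www.eganba.com");
first.CookieContainer = cookies;
first.GetResponse().Close();

// Then request the product page with the session cookie replayed.
var second = (HttpWebRequest)WebRequest.Create(
    "https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
second.CookieContainer = cookies;
using (var reader = new StreamReader(second.GetResponse().GetResponseStream()))
    Console.WriteLine(reader.ReadToEnd().Contains("Stokta var"));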

"The underlying connection was closed: The connection was closed unexpectedly"

When I try to get an HTML page I get this error:
The underlying connection was closed: The connection was closed unexpectedly
I think the site I'm fetching is using some protection based on IP.
WebClient single_page_client = new WebClient();
single_page_client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");
string cat_page_single = single_page_client.DownloadString(the_url);
How can I get around this?
What about using a proxy with WebClient?
EDIT
If I use this code instead, it works. Why?
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(current_url);
webrequest.KeepAlive = true;
webrequest.Method = "GET";
webrequest.ContentType = "text/html";
webrequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
//webrequest.Connection = "keep-alive";
webrequest.Host = "cat.sabresonicweb.com";
webrequest.Headers.Add("Accept-Language", "en-US,en;q=0.5");
webrequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0";
HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
Console.Write(webresponse.StatusCode);
Stream receiveStream = webresponse.GetResponseStream();
Encoding enc = System.Text.Encoding.GetEncoding(1252); // Windows-1252
StreamReader loResponseStream = new StreamReader(receiveStream, enc);
string current_page = loResponseStream.ReadToEnd();
loResponseStream.Close();
webresponse.Close();
The first request does not use a header that indicates the length of the result, so the server simply closes the connection when it finishes. The second request uses the length header, reads the designated number of bytes, and then closes the connection gracefully (under client-side control instead of a server-driven disconnection).
-or-
The URL you sent caused an error on the server. Is there an error in the server log or Event Viewer?
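If WebClient has to be kept, a common workaround for this error is to subclass it and shape the underlying HttpWebRequest the same way the working code above does. A sketch, assuming the server mishandles persistent connections:
using System;
using System.Net;

// WebClient exposes no KeepAlive or protocol-version knobs directly,
// but GetWebRequest can be overridden to set them on every request it makes.
class TweakedWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = (HttpWebRequest)base.GetWebRequest(address);
        request.KeepAlive = true;
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        return request;
    }
}
If the server is the one dropping connections early, the opposite settings (request.KeepAlive = false, or request.ProtocolVersion = HttpVersion.Version10) are the usual fix, so it is worth trying both variants.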

How to access external live meeting url via HttpWebRequest

I'm trying to access an external Live Meeting URL using HttpWebRequest and I am getting a 401 Unauthorized error. The same code works on my local system.
Code:
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(PostingUrl);
CredentialCache CredMCCache = new CredentialCache();
myReq.PreAuthenticate = true;
CredMCCache.Add(new System.Uri(PostingUrl), "Basic", new System.Net.NetworkCredential("username", "password"));
myReq.Credentials = CredMCCache;
myReq.KeepAlive = true;
myReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)";
myReq.Accept = "*/*";
myReq.Headers.Add("Accept-Language", "en-us");
myReq.Headers.Add("Accept-Encoding", "gzip, deflate");
WebProxy proxyObject = new WebProxy("proxy url with port", false);
myReq.Proxy = proxyObject;
myReq.Proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
myReq.Method = "GET";
HttpWebResponse myResp = null;
// Get the response from the conference center
myResp = (HttpWebResponse)myReq.GetResponse();
I am getting the error in the above line. Any pointers will be helpful.
Why are you setting the proxy, e.g.
myReq.Proxy = proxyObject;
Do you need to do this? If you are indeed going through a corporate proxy, you shouldn't need to set it on the HttpWebRequest, as the request will pick up the settings (if any) from IE.
Secondly, are you trying to use Basic authentication with the remote server? It looks like you are, so use this instead to set the authentication details in the header:
string authInfo = userName + ":" + userPassword;
// Base64-encode "user:password" as the Basic scheme requires
authInfo = Convert.ToBase64String(Encoding.Default.GetBytes(authInfo));
myReq.Headers["Authorization"] = "Basic " + authInfo;
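Note that setting the header yourself sends the credentials pre-emptively on the very first request, whereas a CredentialCache normally waits for the server's 401 challenge before answering (PreAuthenticate only takes effect after that first challenge). For servers that fail the request outright instead of challenging, the manual header is the more reliable route.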

Why does my HttpWebRequest return 400 Bad Request?

The following code fails with a 400 Bad Request exception. My network connection is good and I can browse to the site, but I cannot get this URI with HttpWebRequest.
private void button3_Click(object sender, EventArgs e)
{
WebRequest req = HttpWebRequest.Create(@"http://www.youtube.com/");
try
{
//returns a 400 bad request... Any ideas???
WebResponse response = req.GetResponse();
}
catch (WebException ex)
{
Log(ex.Message);
}
}
First, cast the WebRequest to an HttpWebRequest like this:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(@"http://www.youtube.com/");
Then, add this line of code:
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
Set UserAgent and Referer in your HttpWebRequest:
var request = (HttpWebRequest)WebRequest.Create(@"http://www.youtube.com/");
request.Referer = "http://www.youtube.com/"; // optional
request.UserAgent =
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; " +
"Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; " +
".NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30618; " +
"InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0)";
try
{
var response = (HttpWebResponse)request.GetResponse();
using (var reader = new StreamReader(response.GetResponseStream()))
{
var html = reader.ReadToEnd();
}
}
catch (WebException ex)
{
Log(ex);
}
There could be many causes for this problem. Do you have any more details about the WebException?
One cause, which I've run into before, is that you have a bad user agent string. Some websites (google for instance) check that requests are coming from known user agents to prevent automated bots from hitting their pages.
In fact, you may want to check that the user agreement for YouTube does not preclude you from doing what you're doing. If it does, then what you're doing may be better accomplished by going through approved channels such as web services.
Maybe you've got a proxy server running, and you haven't set the Proxy property of the HttpWebRequest?
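If a proxy is the culprit, attaching one to the request from the question is a one-liner; the address below is a placeholder:
req.Proxy = new WebProxy("http://myproxy:8080") // placeholder proxy address
{
    Credentials = CredentialCache.DefaultCredentials // if the proxy requires auth
};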
