Detecting top level frame BHO - c#

Sorry for my ignorance; you'll have to explain things to me, as I'm in unfamiliar waters. I have some background in Java, but mostly PHP and JavaScript.
http://www.codeproject.com/Articles/19971/How-to-attach-to-Browser-Helper-Object-BHO-with-C
I followed this article with some modifications of my own. My question is specifically: how do I detect the "top level frame" of the web page, i.e. the parent document? Any code I put in OnDocumentComplete also runs whenever an iframe on the page completes.
My function, including the workaround I implemented, does not produce the correct results:
public class BHO:IObjectWithSite
{
WebBrowser webBrowser;
HTMLDocument document;
public void OnDocumentComplete(object pDisp, ref object URL)
{
document = (HTMLDocument)webBrowser.Document;
string href = document.location.href;
//get top level page
if (href == URL.ToString())
{
HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create("http://mysite.com");
WebReq.Method = "POST";
WebReq.ContentType = "application/x-www-form-urlencoded";
byte[] buffer = Encoding.ASCII.GetBytes("string");
WebReq.ContentLength = buffer.Length;
Stream PostData = WebReq.GetRequestStream();
PostData.Write(buffer, 0, buffer.Length);
PostData.Close();
// Prepare web request and send the data.
HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse();
StreamReader streamResponse = new StreamReader(WebResp.GetResponseStream(), true);
string Response = streamResponse.ReadToEnd();
Newtonsoft.Json.Linq.JObject json = Newtonsoft.Json.Linq.JObject.Parse(Response);
string active = json["active"].ToString();
//print to screen
System.Windows.Forms.MessageBox.Show(active, "Title");
}
}
}
Checking whether document.location.href matches URL works in most cases but is not guaranteed, so I end up with multiple web requests and popups on a single page load.

The easiest way is to store the web browser object (IWebBrowser2) in an object property in the SetSite method (examples in C++ but should be straightforward to translate to C#):
CComQIPtr<IServiceProvider> pServiceProvider(pUnkSite);
if (!pServiceProvider) {
return E_FAIL;
}
pServiceProvider->QueryService(SID_SWebBrowserApp, IID_IWebBrowser2, (LPVOID*)&m_WebBrowser.p);
if (!m_WebBrowser) {
return E_FAIL;
}
This will store the browser pointer in the object member m_WebBrowser. Then you can compare with the pDisp parameter to OnDocumentComplete:
CComQIPtr<IWebBrowser2> webBrowser(pDisp);
if (webBrowser == m_WebBrowser) {
// This is the top-level page.
}
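Since the question uses C#, the same comparison could look roughly like the sketch below. It assumes the BHO keeps the object received in SetSite in a field (as the linked article does with webBrowser) and that the runtime hands back the same wrapper for the same COM object, so a reference comparison is enough. TopLevelFrame and IsTopLevel are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of the article's BHO code.
public static class TopLevelFrame
{
    // pDisp is the object passed to OnDocumentComplete;
    // browser is the object stored in SetSite.
    // Assumption: both refer to the same runtime wrapper when the event
    // comes from the top-level browser, so reference equality suffices.
    public static bool IsTopLevel(object pDisp, object browser)
    {
        return pDisp != null && ReferenceEquals(pDisp, browser);
    }
}

// Intended use inside the BHO (sketch, needs the SHDocVw interop types):
//
//   public void OnDocumentComplete(object pDisp, ref object URL)
//   {
//       if (!TopLevelFrame.IsTopLevel(pDisp, webBrowser))
//           return; // event fired for an iframe, ignore it
//       // ... top-level page logic ...
//   }
```

With this guard in place, the web request and the message box would only be reached once per page load, provided the assumption about wrapper identity holds in your setup.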

Related

content from a website in a text file

My aim is to get content from a website (for instance, a league table from a sports website) and put it in a .txt file so that I can work with a local file.
I have tried several pieces of code and other examples, such as:
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
WebRequest.Create("http://www.stackoverflow.com");
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
WebRequest.Create("http://www.stackoverflow.com");
// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// we will read data via the response stream
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
// fill the buffer with data
count = resStream.Read(buf, 0, buf.Length);
// make sure we read some data
if (count != 0)
{
// translate from bytes to ASCII text
tempString = Encoding.ASCII.GetString(buf, 0, count);
// continue building the string
sb.Append(tempString);
}
while (count > 0); // any more data to read?
}
My issue is that when I try this, the words request and response are underlined in red and all the tokens are reported as invalid.
Is there a better method to get content from a website into a .txt file, or is there a way to fix the code supplied?
Thanks
"Is there a way to fix the code supplied?"
The code you submitted works for me. Make sure you have imported the proper namespaces, in this case: using System.Net; (Stream and Encoding additionally need System.IO and System.Text.)
Or could it be that the duplicate declaration of the variable request isn't a typo? If it isn't, remove one of the two declarations.
"Is there a better method to get content from a website to a .txt file?"
Since you're reading all the content from the site anyway there isn't really a need for the while loop. Instead you can use the ReadToEnd method supplied by the StreamReader.
string siteContent = "";
using (StreamReader reader = new StreamReader(resStream)) {
siteContent = reader.ReadToEnd();
}
Also be sure to dispose of the WebResponse. Other than that, your code should work fine.
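Putting the pieces together, a minimal sketch of saving a page to a .txt file might look like this. The PageSaver class, its SaveText and Download methods, and the output path are hypothetical names, and the sketch assumes the page is UTF-8 text. The response, its stream, and the reader are all wrapped in using so that they are disposed.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for illustration only.
public static class PageSaver
{
    // Reads the whole stream as text and writes it to the given file.
    public static void SaveText(Stream source, string path)
    {
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            File.WriteAllText(path, reader.ReadToEnd(), Encoding.UTF8);
        }
    }

    // Downloads url and stores the response body in path.
    public static void Download(string url, string path)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        {
            SaveText(resStream, path);
        }
    }
}
```

A call such as PageSaver.Download("http://www.stackoverflow.com", "page.txt"); would then leave the page source in page.txt for local processing.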

How to cancel large file download yet still get page source in C#?

I'm working in C# on a program to list all course resources for a MOOC (e.g. Coursera). I don't want to download the content, just get a listing of all the resources (e.g. pdf, videos, text files, sample files, etc...) which are made available to the course.
My problem lies in parsing the html source (currently using HtmlAgilityPack) without downloading all the content.
For example, if you go to this intro video for a banking course on Coursera and check the source (F12 in Chrome for Developer Tools), you can see the page source. I can stop the video download which autoplays, but still see the source.
How can I get the source in C# without downloading all the content?
I've looked in the HttpWebRequest headers (problem: time out), and DownloadDataAsync with Cancel (problem: the Completed Result object is invalid when cancelling the async request). I've also tried various Loads from HtmlAgilityPack but with no success.
Time out:
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Timeout = TIMEOUT * 1000000; //Really long
postRequest.Referer = "https://www.coursera.org";
if (headers != null)
{
//headers here
}
//Deal with cookies
if (cookie != null)
{ cookieJar.Add(cookie); }
postRequest.CookieContainer = cookieJar;
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse)postRequest.GetResponse();
Any tips on how to proceed?
There are at least two ways to do what you're asking. The first is to use a range GET, that is, to specify the range of the file you want to read. You do that by calling AddRange on the HttpWebRequest. So if you want, say, the first 10 kilobytes of the file, you'd write:
request.AddRange(0, 10239);
Read carefully what the documentation says about the single-argument overload: AddRange(-10240) sends Range: bytes=-10240, which under the HTTP specification is a suffix range, i.e. the last 10,240 bytes rather than the first. There are also other overloads of AddRange that you might be interested in.
Not all servers support range gets, though. If that doesn't work, you'll have to do it another way.
What you can do is call GetResponse and then start reading data. Once you've read as much data as you want, you can stop reading and close the stream. I've modified your sample slightly to show what I mean.
string url = "https://www.coursera.org/course/money";
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = true; //allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse) postRequest.GetResponse();
int maxBytes = 1024*1024;
int totalBytesRead = 0;
var buffer = new byte[maxBytes];
using (var s = postResponse.GetResponseStream())
{
int bytesRead;
// read up to `maxBytes` bytes from the response
while (totalBytesRead < maxBytes && (bytesRead = s.Read(buffer, 0, Math.Min(buffer.Length, maxBytes - totalBytesRead))) != 0)
{
// Here you can save the bytes read to a persistent buffer,
// or write them to a file.
Console.WriteLine("{0:N0} bytes read", bytesRead);
totalBytesRead += bytesRead;
}
}
Console.WriteLine("total bytes read = {0:N0}", totalBytesRead);
That said, I ran this sample and it downloaded about 6 kilobytes and stopped. I don't know why you're having trouble with timeouts or too much data.
Note that sometimes trying to close the stream before the entire response is read will cause the program to hang. I'm not sure why that happens at all, and I can't explain why it only happens sometimes. But you can solve it by calling request.Abort before closing the stream. That is:
using (var s = postResponse.GetResponseStream())
{
// do stuff here
// abort the request before continuing
postRequest.Abort();
}

HttpWebClient has High Memory Use in MonoTouch

I have a MonoTouch-based universal iOS app. It uses REST services to get data, and I'm using the HttpWebRequest class to build and make my calls. Everything works great, except that it seems to be holding onto memory. I've got usings all over the code to limit the scope of things. I've also avoided anonymous delegates, as I had heard they can be a problem. I have a helper class that builds up each call to my REST service, and memory appears to be retained as I make calls.
I'm curious whether anyone has run into similar issues with HttpWebRequest and what to do about it. I'm currently looking to see if I can make the call using an NSMutableUrlRequest and avoid HttpWebRequest altogether, but I'm struggling to get it to work with NTLM authentication. Any advice is appreciated.
protected T IntegrationCall<T,I>(string methodName, I input) {
HttpWebRequest invokeRequest = BuildWebRequest<I>(GetMethodURL(methodName),"POST",input, true);
WebResponse response = invokeRequest.GetResponse();
T result = DeserializeResponseObject<T>((HttpWebResponse)response);
invokeRequest = null;
response = null;
return result;
}
protected HttpWebRequest BuildWebRequest<T>(string url, string method, T requestObject, bool IncludeCredentials)
{
ServicePointManager.ServerCertificateValidationCallback = Validator;
var invokeRequest = WebRequest.Create(url) as HttpWebRequest;
if (invokeRequest == null)
return null;
if (IncludeCredentials)
{
invokeRequest.Credentials = CommonData.IntegrationCredentials;
}
if( !string.IsNullOrEmpty(method) )
invokeRequest.Method = method;
else
invokeRequest.Method = "POST";
invokeRequest.ContentType = "text/xml";
invokeRequest.Timeout = 40000;
using( Stream requestObjectStream = new MemoryStream() )
{
DataContractSerializer serializedObject = new DataContractSerializer(typeof(T));
serializedObject.WriteObject(requestObjectStream, requestObject);
requestObjectStream.Position = 0;
using(StreamReader reader = new StreamReader(requestObjectStream))
{
string strTempRequestObject = reader.ReadToEnd();
//byte[] requestBodyBytes = Encoding.UTF8.GetBytes(strTempRequestObject);
Encoding enc = new UTF8Encoding(false);
byte[] requestBodyBytes = enc.GetBytes(strTempRequestObject);
invokeRequest.ContentLength = requestBodyBytes.Length;
using (Stream postStream = invokeRequest.GetRequestStream())
{
postStream.Write(requestBodyBytes, 0, requestBodyBytes.Length);
}
}
}
return invokeRequest;
}
Using using is the right thing to do, but your code seems to duplicate the same content several times (which it should not do).
requestObjectStream is turned into a string, which is then turned into a byte[] before being written to another stream. And that's without considering what the extra code (e.g. ReadToEnd and UTF8Encoding.GetBytes) might allocate itself (e.g. more strings, byte[]...).
So if what you serialize is large, you'll consume a lot of extra memory (for nothing). It's even a bit worse for string and byte[], since you can't dispose of them manually (the GC will decide when, which makes measurement harder).
I would try (but did not ;-) something like:
...
using (Stream requestObjectStream = new MemoryStream ()) {
DataContractSerializer serializedObject = new DataContractSerializer(typeof(T));
serializedObject.WriteObject(requestObjectStream, requestObject);
requestObjectStream.Position = 0;
invokeRequest.ContentLength = requestObjectStream.Length;
using (Stream postStream = invokeRequest.GetRequestStream())
requestObjectStream.CopyTo (postStream);
}
...
That would let the MemoryStream copy itself to the request stream. An alternative is to call ToArray on the MemoryStream (but that creates another copy of the serialized object that the GC will have to track and free).
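The idea can be checked without a network connection. In the sketch below, RequestBody and Serialize are hypothetical names; the helper returns the rewound MemoryStream so the caller can read Length for ContentLength before copying it to the stream from GetRequestStream.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical helper for illustration only.
public static class RequestBody
{
    // Serializes requestObject into a MemoryStream and rewinds it,
    // without going through an intermediate string or byte[].
    // The caller owns the returned stream and should dispose it.
    public static MemoryStream Serialize<T>(T requestObject)
    {
        MemoryStream buffer = new MemoryStream();
        DataContractSerializer serializer = new DataContractSerializer(typeof(T));
        serializer.WriteObject(buffer, requestObject);
        buffer.Position = 0; // rewind, otherwise CopyTo copies nothing
        return buffer;
    }
}

// Intended use inside BuildWebRequest (sketch):
//
//   using (MemoryStream body = RequestBody.Serialize(requestObject))
//   {
//       invokeRequest.ContentLength = body.Length;
//       using (Stream postStream = invokeRequest.GetRequestStream())
//           body.CopyTo(postStream);
//   }
```

Whether this removes the memory growth seen in the question is untested here; it only avoids the extra string and byte[] copies described above.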

My post request to https://qrng.physik.hu-berlin.de/ failed, why?

The page at https://qrng.physik.hu-berlin.de/ provides a high bit rate quantum random number generator web service, and I'm trying to access that service.
However, I could not manage to do so. This is my current code:
using System;
using System.Collections.Generic;
using System.Linq;
using S=System.Text;
using System.Security.Cryptography;
using System.IO;
namespace CS_Console_App
{
class Program
{
static void Main()
{
System.Net.ServicePointManager.Expect100Continue = false;
var username = "testuser";
var password = "testpass";
System.Diagnostics.Debug.WriteLine(Post("https://qrng.physik.hu-berlin.de/", "username="+username+"&password="+password));
Get("http://qrng.physik.hu-berlin.de/download/sampledata-1MB.bin");
}
public static void Get(string url)
{
var my_request = System.Net.WebRequest.Create(url);
my_request.Credentials = System.Net.CredentialCache.DefaultCredentials;
var my_response = my_request.GetResponse();
var my_response_stream = my_response.GetResponseStream();
var stream_reader = new System.IO.StreamReader(my_response_stream);
var content = stream_reader.ReadToEnd();
System.Diagnostics.Debug.WriteLine(content);
stream_reader.Close();
my_response_stream.Close();
}
public static string Post(string url, string data)
{
string vystup = null;
try
{
//Our postvars
byte[] buffer = System.Text.Encoding.ASCII.GetBytes(data);
//Initialisation, we use localhost, change if appliable
System.Net.HttpWebRequest WebReq = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);
//Our method is post, otherwise the buffer (postvars) would be useless
WebReq.Method = "POST";
//We use form contentType, for the postvars.
WebReq.ContentType = "application/x-www-form-urlencoded";
//The length of the buffer (postvars) is used as contentlength.
WebReq.ContentLength = buffer.Length;
//We open a stream for writing the postvars
Stream PostData = WebReq.GetRequestStream();
//Now we write, and afterwards, we close. Closing is always important!
PostData.Write(buffer, 0, buffer.Length);
PostData.Close();
//Get the response handle, we have no true response yet!
System.Net.HttpWebResponse WebResp = (System.Net.HttpWebResponse)WebReq.GetResponse();
//Let's show some information about the response
Console.WriteLine(WebResp.StatusCode);
Console.WriteLine(WebResp.Server);
//Now, we read the response (the string), and output it.
Stream Answer = WebResp.GetResponseStream();
StreamReader _Answer = new StreamReader(Answer);
vystup = _Answer.ReadToEnd();
//Congratulations, you just requested your first POST page, you
//can now start logging into most login forms, with your application
//Or other examples.
}
catch (Exception ex)
{
throw ex;
}
return vystup.Trim() + "\n";
}
}
}
I get a 403 Forbidden error when I try to do a GET request on http://qrng.physik.hu-berlin.de/download/sampledata-1MB.bin.
After debugging a bit, I've realised that even though I supplied a valid username and password, the HTML returned in response to my POST request indicates that I was not actually logged in to the system.
Does anyone know why this is the case, and how I can work around it to call the service?
Bump. Can anyone get this to work, or is the site just a scam?
The site is surely not a scam. I developed the generator and I put my scientific reputation in it. The problem is that you are trying to use the service in a way that was not intended. The sample files were really only meant to be downloaded manually for basic test purposes. Automated access to fetch data into an application was meant to be implemented through the DLLs we provide.
On the other hand, I do not know of any explicit intent to prevent your implementation from working. I suppose if a web browser can log in and fetch data, some program should be able to do the same. Maybe the login request is just a little more complicated. No idea. The server software was developed by someone else and I cannot bother him with this right now.
Mick
Actually, the generator can now also be purchased. See here:
http://www.picoquant.com/products/pqrng150/pqrng150.htm
Have you tried changing this
my_request.Credentials = System.Net.CredentialCache.DefaultCredentials;
to
my_request.Credentials = new System.Net.NetworkCredential(UserName, Password);
as described on the MSDN page?

asp.net Page loading twice when making an underlying connection

I have an .aspx page that seems to load twice when I enter the URL of the page.
In this page's Load event, I make a connection to a server to retrieve a document and then write the downloaded bytes to the page's output stream.
This causes the page to load twice for some strange reason. If I hard-code a byte array without making this connection, the page loads once and all is well.
Here are the methods used to retrieve the external document. Maybe you can see something I can't.
public static byte[] GetDocument(string url)
{
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
Stream stream = myHttpWebResponse.GetResponseStream();
byte[] _Data = StreamToBytes(stream);
return _Data;
}
private static byte[] StreamToBytes(System.IO.Stream theStream)
{
if (theStream == null)
throw new ArgumentException("URL null.");
int bytesRead = 0;
byte[] buffer = new byte[8096];
MemoryStream bufferStream = new MemoryStream();
try
{
do
{
bytesRead = theStream.Read(buffer, 0, 8096);
bufferStream.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
}
finally
{
bufferStream.Flush();
theStream.Close();
theStream.Dispose();
}
return bufferStream.ToArray();
}
The likely culprit is having the page directive AutoEventWireup="true" in addition to OnInit() containing this.Load += Page_Load;
Auto event wireup does what it sounds like: if there is a method that follows the naming convention, the event is wired up automatically.
You also often see this with button handlers. The handler is set explicitly, and the page also wires up a handler if the name follows the convention buttonname_OnClick(sender, args).
This kind of problem also often happens because of img tags that have an empty src ...
