Get opened tabs information of Firefox? - c#

I'm trying to write a simple program that searches a Firefox window for duplicated tabs (by comparing each tab's URL) and then closes the duplicates.
The idea is simple, but the implementation is turning into a nightmare.
After a lot of research and experimenting with the WinAPI, I found the NDde library, which can retrieve the URL of the current tab as easily as this:
VB.NET
Imports NDde.Client
Using dde As New DdeClient("Firefox", "WWW_GetWindowInfo")
dde.Connect()
Dim Url As String = dde.Request("URL", Integer.MaxValue).
Trim({ControlChars.NullChar, ControlChars.Quote, ","c})
MessageBox.Show(Url)
dde.Disconnect()
End Using
C#:
using NDde.Client;
using (DdeClient dde = new DdeClient("Firefox", "WWW_GetWindowInfo"))
{
    dde.Connect();
    // Strip the NUL terminator, quotes and commas that surround the returned URL.
    string url = dde.Request("URL", int.MaxValue).Trim('\0', '"', ',');
    MessageBox.Show(url);
    dde.Disconnect();
}
But I know nothing about this library or about DDE in general, so at the moment I send Ctrl+Tab to Firefox to cycle through the tabs, read the URL of each one, and close any duplicate I find by sending Ctrl+W. The problem is that I have no reference point: I can't tell which tab was the starting point, so I don't know when to stop searching (the first URL I check could itself have a duplicate), and I can't find out how many tabs are open to use as an index.
I'm lost.
My question is: can this library (or another DDE-related library, or a completely different approach) retrieve at least one of the following dynamically?
· The URL of the first tab, meaning the leftmost of all open tabs.
· The total number of open tabs.
· The URLs of all tabs.
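To make the goal concrete: once the URLs of all tabs are available as an ordered list (however they end up being retrieved, which is the open question here), the duplicate check itself is small. The sketch below is in Java and ports directly to C# or VB.NET; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds which tab positions hold a URL seen earlier.
class DuplicateTabs {
    /** Returns the indexes of tabs whose URL already appeared at a lower index. */
    static List<Integer> duplicateIndexes(List<String> urls) {
        Set<String> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            // Set.add returns false when the URL was already present.
            if (!seen.add(urls.get(i))) {
                duplicates.add(i);
            }
        }
        return duplicates;
    }
}
```

Closing the reported tabs from the highest index down keeps the remaining indexes valid while tabs disappear.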

There is already a Firefox extension that does this:
https://addons.mozilla.org/en-US/firefox/addon/duplicate-tabs-closer/

Related

Cannot access same source HTML as a browser

I am coming back to work on a BOT that scraped data from a site once a day for my personal use.
However they have changed the code during COVID and now it seems they are loading in a lot of the content with Ajax/JavaScript.
I thought that if I did a WebRequest and obtained the response HTML from a URL, it should match the same content that I see in a browser (FF/Chrome) when I right click and "view source". I thought the actual DOM and generated source code would come later when those files were loaded as onload events fired, scripts lazily loaded and so on.
However, the HTML my BOT obtains is NOT the same as the HTML I see when viewing the source, so my regular expressions no longer find the links they are looking for.
Why am I seeing a difference between "view source" and a download of the HTML?
I can only think that when the page loads, SCRIPTS run that load other content into the page and that when I view source I am actually seeing a partial generated source rather than the original source code. Therefore is there a way I can call the page with my BOT, wait X seconds before obtaining the response to get this "onload" generated HTML?
Or, even better, is there a way for MY BOT (not someone else's) to view the generated source?
This BOT runs as a web service. I can find another site to scrape but it's just painful when I have all the regular expressions working on the source I see, except it's NOT the source my BOT obtains.
I'm a bit confused about why my browser shows me more content in view source (not generated source) than my BOT gets when making a valid request.
Any help would be much appreciated. This is an almost 8-year-old project that I have been working on on and off, and this change has ruined one of the core parts of the system.
In response to OP's comment, here is the Java code for how to click at different parts on the screen to do this:
You could use Java's Robot class. I just learned about it a few days ago:
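// Imports
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
// Code (inside your class)
// One shared Robot, also used by key() below. new Robot() throws AWTException,
// so the enclosing class needs a constructor that declares it.
private final Robot robot = new Robot();
void click(int x, int y, int btn) {
    robot.mouseMove(x, y);
    robot.mousePress(btn);
    robot.mouseRelease(btn);
}
You would then run the click function with the x and y position to click, as well as the button mask (InputEvent.BUTTON1_DOWN_MASK, InputEvent.BUTTON2_DOWN_MASK, etc.; Robot expects these masks rather than MouseEvent.BUTTON1).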
After stringing together the right positions (this will vary depending on the screen) you could do just about anything.
To use shortcuts, just use the keyPress and keyRelease functions. Here is a good way to do this:
void key(int keyCode, boolean ctrl, boolean alt, boolean shift) {
if (ctrl)
robot.keyPress(KeyEvent.VK_CONTROL);
if (alt)
robot.keyPress(KeyEvent.VK_ALT);
if (shift)
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(keyCode);
robot.keyRelease(keyCode);
if (ctrl)
robot.keyRelease(KeyEvent.VK_CONTROL);
if (alt)
robot.keyRelease(KeyEvent.VK_ALT);
if (shift)
robot.keyRelease(KeyEvent.VK_SHIFT);
}
Thus, something like Ctrl+Shift+I to open the inspect menu would look like this:
key(KeyEvent.VK_I, true, false, true);
Here are the steps to copy a website's code (from the inspector) with Google Chrome:
Ctrl + Shift + I
Right click the HTML tag
Select "Edit as HTML"
Ctrl + A
Ctrl + C
Then you can read the content back from the clipboard (a technique from another Stack Overflow answer):
Clipboard c = Toolkit.getDefaultToolkit().getSystemClipboard();
String text = (String) c.getData(DataFlavor.stringFlavor);
Using something like FileOutputStream to put the info into a file:
FileOutputStream output = new FileOutputStream(new File( PATH HERE ));
output.write(text.getBytes());
output.close();
I hope this helps!
I seem to have fixed it just by enabling cookie storage in my custom HTTP (Bot/Scraper) class, which is called from the class that tries to obtain the data. The site probably has a defense that rejects visitors who request pages without the JS/CSS and arrive with a different session ID on each request.
However, I would like to see some other examples, because if it is just cookies they could also use JavaScript to test for JavaScript, e.g. an AJAX call to log whether JS is actually on, or some DOM manipulation to determine whether you are really human, which would break it again.
Every site uses different methods to stop scrapers, email harvesters, job-listing scrapers, link harvesters, etc., including measuring the typical time between requests for verified humans and for BOTS and then using those values to help detect spoofed user-agents. I wrote a whole system to stop BOTS at my last place of work, and it was a layered approach. I'm just glad that enabling cookies solved it on this site, but the defense could easily be beefed up with other tricks to tell BOTS from HUMANS.
I do know some Java, enough to work out what is going on anyway. My BOT is in C#.
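For readers who want to see the cookie fix in code, here is a minimal sketch in Java (11 or later), since this thread already uses Java. It is not the poster's actual class: CookieScraper and its methods are hypothetical names. The only point is that a single client with a cookie store is reused, so the session cookie from the first response is sent back on later requests. In C# the counterpart would be a CookieContainer attached to the request or handler.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical scraper class that keeps cookies between requests.
class CookieScraper {
    // In-memory cookie store shared by every request made through this client.
    private final CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(cookies)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Downloads the page body; cookies set by the server are stored and replayed. */
    String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Number of cookies currently held, useful when debugging a session problem. */
    int cookieCount() {
        return cookies.getCookieStore().getCookies().size();
    }
}
```

Creating a new client for every request would discard the store and bring back the "different session ID on each request" behaviour described above.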

load bloomberg page from c#

I have an old excel workbook that I am trying to replace with a c# application. The only bit of functionality that I have not been able to replicate is the code below.
So the code below takes a bloomberg ticker (i.e. "VOD LN") and then with DDEInitiate it loads the bloomberg page.
I have read that C# doesn't support DDE or even if it does it is best avoided. In which case how can I do this via C#?
Public Sub LoadBbergPage(ticker As String)
' loads bberg page
Dim strExe As String
Dim channelGP As Long
channelGP = DDEInitiate("Winblp", "BBK")
strExe = "<blp-2><home>" & Strings.Trim(ticker) & "<EQUITY><GO>"
DDEExecute channelGP, strExe
DDETerminate channelGP
End Sub
If you're trying to make it easier for your users to launch data into the terminal, you can use 'B-links'. Access it like any other web link. Below is an example for "IBM US Equity" - replace spaces with %20
https://blinks.bloomberg.com/securities/[ticker]/[function]
https://blinks.bloomberg.com/securities/IBM%20US%20Equity/DES
It will ask the user the first time to allow / remember settings and then should launch to terminal. If there are issues, you can go to https://blinks.bloomberg.com/help. Documentation is available on terminal via DOCS BLINKS<GO> (tons more special syntax)
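As a rough sketch of how such a link could be assembled in code (shown in Java; the same string handling applies in C#), the helper below follows the pattern above: trim the ticker, replace spaces with %20, and append the function. BLink and securityUrl are hypothetical names, not part of any Bloomberg API.

```java
// Hypothetical helper that builds a B-link from a security name and a function.
class BLink {
    static String securityUrl(String ticker, String function) {
        // Spaces in the security name must be written as %20, as noted above.
        String encoded = ticker.trim().replace(" ", "%20");
        return "https://blinks.bloomberg.com/securities/" + encoded + "/" + function;
    }
}
```

For example, BLink.securityUrl("IBM US Equity", "DES") produces the IBM link shown above; the result can then be opened like any other web link.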
But if you're trying to do some kind of screen scraping etc via DDE, don't bother; just use the Reference Data API instead: https://www.bloomberg.com/professional/support/api-library/

How can I enter an email address into text input field in Edge using Selenium WebDriver?

I have the following program:
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Edge;
namespace ConsoleApplication1
{
static class Program
{
static void Main()
{
//var driver = new ChromeDriver();
var driver = new EdgeDriver();
driver.Manage().Timeouts().ImplicitlyWait(TimeSpan.FromSeconds(20));
driver.Navigate().GoToUrl("http://www.cornelsen.de/shop/registrieren-lehrer");
driver.FindElement(By.Id("email")).SendKeys("dummy@user.de");
}
}
}
When I run this in Chrome or any other browser aside from Edge, the email address is entered correctly. But if I try the same thing in Edge, the "@" character is missing. The field displays only "dummyuser.de".
Any idea what I can do?
As a workaround, you can set the input value directly via ExecuteScript():
IWebElement email = driver.FindElement(By.Id("email"));
IJavaScriptExecutor js = driver as IJavaScriptExecutor;
// arguments[1] must not be quoted, otherwise the literal text "arguments[1]" is stored.
string script = "arguments[0].setAttribute('value', arguments[1]);";
js.ExecuteScript(script, email, "dummy@user.de");
Or, what you can do is to create a fake input element with a predefined value equal to the email address. Select the text in this input, copy and paste into the target input.
Not pretty, but should only serve as a workaround:
// create element
IJavaScriptExecutor js = driver as IJavaScriptExecutor;
string script = @"
    var el = document.createElement('input');
    el.type = 'text';
    el.value = arguments[0];
    el.id = 'mycustominput';
    document.body.appendChild(el);
";
js.ExecuteScript(script, "dummy@user.de");
// locate the input, select and copy
IWebElement myCustomInput = driver.FindElement(By.Id("mycustominput"));
myCustomInput.SendKeys(Keys.Control + "a"); // select
myCustomInput.SendKeys(Keys.Control + "c"); // copy
// locate the target input and paste
IWebElement email = driver.FindElement(By.Id("email"));
email.SendKeys(Keys.Control + "v"); // paste
It wasn't as easy as I thought after all. Issues with alecxe's answer:
1. arguments[0].setAttribute('value', '...'); works only the first time you call it. After calling element.Clear();, it doesn't work any more. Workaround: arguments[0].value='...';
2. The site doesn't react to the JavaScript call the way it does to element.SendKeys();, e.g. the change event is not fired. Workaround: send the first part of the string, up to the last "forbidden" character, via JavaScript and the rest via WebElement.SendKeys (in this particular order, because if you make another JavaScript call on the same field after SendKeys(), no change event occurs either).
I also realized that there are more "forbidden" characters in Edge, e.g. accented or Eastern European ones (I'm Central European). The problem with 2. is that the last character might be a forbidden character. In this case, I append a whitespace. Which of course affects the test case behavior, but I haven't had any other idea.
Full C# code:
public static void SendKeys(this IWebElement element, TestTarget target, string text)
{
    if (target.IsEdge)
    {
        int index = text.LastIndexOfAny(new[] { '@', 'Ł', 'ó', 'ź' }) + 1;
        if (index > 0)
        {
            ((IJavaScriptExecutor) target.Driver).ExecuteScript(
                "arguments[0].value='" + text.Substring(0, index) + "';", element);
            text = index == text.Length ? Keys.Space : text.Substring(index);
        }
    }
    element.SendKeys(text);
}
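The splitting rule in that extension method can be looked at on its own, without a browser. The Java sketch below is a hypothetical helper (EdgeTextSplit is not part of Selenium) and its character set is an assumption taken from the list above. It returns the part to set via JavaScript and the part to type via sendKeys, falling back to a single space when the text ends in a forbidden character.

```java
// Hypothetical helper mirroring the split logic of the C# extension method.
class EdgeTextSplit {
    // '@' plus L-stroke, o-acute and z-acute, written as escapes to avoid encoding issues.
    static final String FORBIDDEN = "@\u0141\u00F3\u017A";

    /** Returns { partForJavaScript, partForSendKeys }. */
    static String[] split(String text) {
        int cut = 0;
        // Find the position just after the last forbidden character.
        for (int i = text.length() - 1; i >= 0; i--) {
            if (FORBIDDEN.indexOf(text.charAt(i)) >= 0) {
                cut = i + 1;
                break;
            }
        }
        if (cut == 0) {
            return new String[] {"", text}; // nothing forbidden: type everything
        }
        // If the text ends with a forbidden character, type a space so that
        // sendKeys still runs and the page sees a change event.
        String viaSendKeys = (cut == text.length()) ? " " : text.substring(cut);
        return new String[] {text.substring(0, cut), viaSendKeys};
    }
}
```

With the email address from the question, the JavaScript part is "dummy@" and the typed part is "user.de".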
This problem used to occur in old browsers. Apparently it returned in Edge.
You can try sending the string in pieces:
IWebElement email = driver.FindElement(By.Id("email"));
email.SendKeys("dummy");
email.SendKeys("@");
email.SendKeys("user.de");
Or try using the ASCII code of @:
driver.FindElement(By.Id("email")).SendKeys("dummy" + (char)64 + "user.de");
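Try to clear the text field first.
Try the following:
IWebElement email = driver.FindElement(By.Id("email"));
email.Clear(); // Clear() returns void, so it cannot be chained with SendKeys()
email.SendKeys("dummy@user.de");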
Have you tried Copy Paste?
Clipboard.SetText("dummy@user.de");
email.SendKeys(OpenQA.Selenium.Keys.Control + "v");
Hope it could help.
I just added one extra line to click on text field and then send keys, I tried this and its working for me.
Code is written in java, you can change that to any other, if you want.
//INITIALISE DRIVER
WebDriver driver = null;
driver = new FirefoxDriver();
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.navigate().to("http://www.cornelsen.de/shop/registrieren-lehrer");
driver.manage().window().maximize();
//CLICK EMAIL FIELD, JUST TO HAVE FOCUS ON TEXT FIELD
driver.findElement(By.id("email")).click();
driver.findElement(By.id("email")).sendKeys("dummy@user.de");
I'm the Program Manager for WebDriver at Microsoft. I just tried to reproduce your issue on my home machine (Windows 10 build 10586) and couldn't reproduce it. Your exact test entered the '@' symbol fine.
You should check if you have the latest version of Windows 10 and WebDriver. If you hit the Windows key and type "winver" and hit enter it'll open a popup with the Windows version info. You want it to say
Microsoft Windows
Version 1511 (OS Build 10586.104)
This is the latest version of Windows 10 released to the public. If you have this version you'll also need the corresponding version of WebDriver found here:
http://www.microsoft.com/en-us/download/details.aspx?id=49962
Note that if the build is 10240 that you're on the original release build. Our November update added substantial support for new features (like finding elements by XPath and more!) along with bug fixes which might explain your issues.
Lastly I should note we have an Insiders release as well for WebDriver to match with the Insiders program. If you're subscribed to the Insiders program and want to see the newer features and bug fixes for WebDriver you can find the download here:
https://www.microsoft.com/en-us/download/details.aspx?id=48740
Note that it currently supports build 10547 which was actually before the November update. It'll be updated very shortly (next couple of days) to support the latest Windows Insiders flight, build 14267.
Sorry, but I do not agree with the previous answer (from the Program Manager for WebDriver at Microsoft). I can reproduce the problem. This is my configuration:
Target Machine (Hub node where tests are run):
Win 10 build 10585.104
MS Edge 25.10586.0.0
MS EdgeHTML 13.10586
Selenium framework:
SeleniumHQ (for Java): 2.48.0
I am using Selenium Grid to run my suite. In this case I was only doing a proof-of-concept test of Edge with a basic test:
1. Start the Hub on the local machine (Win 7) from a console (administrator privileges).
2. Register the Node with the Hub on the remote target machine (Win 10 build 10585) from a console (in this case without administrator privileges, because otherwise Edge hangs when creating a new session).
With the grid set up and everything checked, when I try to type my account name into the login page I cannot see the @ and my basic test fails (wrong credentials).
I typed @ by hand while Edge was open (at a breakpoint) and the symbol does appear.
I sent "@@@@@@@@@@@@@@@" to the text field and none of them appear. In summary, I have tried many things and I cannot get the @ to show up.
When I started with web automation testing using Selenium (Java), I remember this behaviour in old versions of Firefox and Chrome. I am not really sure which one, but it was reproducible in an old version.
This partial basic code (implemented with page objects) IS WORKING with Firefox 35.0 and Chrome 48.0.2564.109 but IS NOT WORKING with the Edge version listed at the beginning of my comment.
WebElement element = WebDriverExtensions.findElement(context, By.cssSelector("input[name='username'][type='email']"));
element.clear();
element.sendKeys(email);
The front-end developers are using AngularJS and validate the user's text input to check that it is a well-formatted email.
I am afraid that the current Edge version does not support sendKeys with this kind of character. Maybe the problem is the front-end live validation, and Edge has to handle these situations because they are really common.
Best regards
None of the above worked for me with the version 2.52. This worked for me :
EdgeDriver edgeDriver = new EdgeDriver("folder of my edge driver containing MicrosoftWebDriver.exe");
IJavaScriptExecutor js = edgeDriver as IJavaScriptExecutor;
js.ExecuteScript("document.getElementById('Email').value = 'some@email.com'");
Make sure to replace the ".getElementById('Email')" with what you should use to find your field with javascript and replace the "folder of my edge driver containing MicrosoftWebDriver.exe" with the correct path.
Good luck!

How to click on a link using Webkit Browser?

I want to click on link after navigating to a website
webKitBrowser1.Navigate("http://www.somesite.com");
How to click on a link on this website assuming that the link's id is lnkId ?
Go to Google
In the default browser control that comes with Visual Studio, I can do that using the code below :
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("a")) {
    if (el.GetAttribute("id") == "lnkId") {
        el.InvokeMember("click");
    }
}
What is the equivalent of the code above when I'm using WebkitDotNet control?
As the WebKit doesn't provide a Click() event (see here for details), you cannot do that in the above way. But a small trick may work as an equivalent of the original winforms way as below:
foreach (Node el in webKitBrowser1.Document.GetElementsByTagName("a"))
{
if (((Element) el).GetAttribute("id") == "lnkId")
{
string urlString = ((Element) el).Attributes["href"].NodeValue;
webKitBrowser1.Navigate(urlString);
}
}
Here what I am doing is casting the WebKit.DOM.Node object to its subclass WebKit.DOM.Element to get its Attributes. Then providing href to the NamedNodeMap, i.e. Attributes as the NodeName, you can easily extract the NodeValue, which is the target url in this case. You can then simply invoke the Navigate(urlString) method on the WebKitBrowser instance to replicate the click event.
I don't work with Windows and all my experience is on Webkit GTK. Following comments are based on that experience.
I am not sure which webkit .NET version you are using. Looks like there are multiple implementations. Assuming you are using the one mentioned by Wasif, you can evaluate javascript as mentioned in the example https://code.google.com/p/open-webkit-sharp/source/browse/JavaScriptExample/Form1.cs.
Actually if implementation is supporting javascript execution then you can do most, if not all the DOM operations. The API functions are usually same as javascript functions and most of the time call exact same functions internally despite of origination. Communication between your application and javascript can be little challenging, but if you can read alert messages, that also can be solved. It looks like this library does support alert handling mechanism. A tool I wrote at https://github.com/nhrdl/notesMD will show some examples of achieving this communication though it uses GTK version and is written in python.
Incidentally if you know the id of the element, then Document.GetElementById will save you the loop.
webKitBrowser1.StringByEvaluatingJavaScriptFromString("var inpt = document.createElement(\"input\"); inpt.setAttribute(\"type\", \"submit\"); inpt.setAttribute(\"id\", \"nut\"); inpt.setAttribute(\"type\", \"submit\"); inpt.setAttribute(\"name\", \"tmp\"); inpt.setAttribute(\"value\", \"tmp\"); var element = document.getElementById(\"lnk\"); element.appendChild(inpt);");
webKitBrowser1.StringByEvaluatingJavaScriptFromString("document.getElementById('nut').click();");

Webbrowser control is not showing Html but shows webpage

I am automating a task using the WebBrowser control; the site displays its pages using frames.
My issue is that I reach a point where I can see the web page loaded properly in the WebBrowser control, but when I inspect the HTML from code I see nothing.
I have seen other examples here too, but none of them return all of the browser's HTML.
What I get by using this:
HtmlWindow frame = webBrowser1.Document.Window.Frames[1];
string str = frame.Document.Body.OuterHtml;
is just the main frame tag with attributes such as SRC. Is there any way to handle this? Since I can see the web page completely loaded, why do I not see the HTML? When I do this in Internet Explorer I do see the page's source once it has loaded, so why not here?
ADDITIONAL INFO
There are two frames on the page.
I use the same code as above for the first one:
HtmlWindow frame = webBrowser1.Document.Window.Frames[0];
string str = frame.Document.Body.OuterHtml;
And I get the correct HTML for the first frame, but for the second one I only see:
<FRAMESET frameSpacing=1 border=1 borderColor=#ffffff frameBorder=0 rows=29,*><FRAME title="Edit Search" marginHeight=0 src="http://web2.westlaw.com/result/dctopnavigation.aspx?rs=WLW12.01&ss=CXT&cnt=DOC&fcl=True&cfid=1&method=TNC&service=Search&fn=_top&sskey=CLID_SSSA49266105122&db=AK-CS&fmqv=s&srch=TRUE&origin=Search&vr=2.0&cxt=RL&rlt=CLID_QRYRLT803076105122&query=%22LAND+USE%22&mt=Westlaw&rlti=1&n=1&rp=%2fsearch%2fdefault.wl&rltdb=CLID_DB72585895122&eq=search&scxt=WL&sv=Split" frameBorder=0 name=TopNav marginWidth=0 scrolling=no><FRAME title="Main Document" marginHeight=0 src="http://web2.westlaw.com/result/dccontent.aspx?rs=WLW12.01&ss=CXT&cnt=DOC&fcl=True&cfid=1&method=TNC&service=Search&fn=_top&sskey=CLID_SSSA49266105122&db=AK-CS&fmqv=s&srch=TRUE&origin=Search&vr=2.0&cxt=RL&rlt=CLID_QRYRLT803076105122&query=%22LAND+USE%22&mt=Westlaw&rlti=1&n=1&rp=%2fsearch%2fdefault.wl&rltdb=CLID_DB72585895122&eq=search&scxt=WL&sv=Split" frameBorder=0 borderColor=#ffffff name=content marginWidth=0><NOFRAMES></NOFRAMES></FRAMESET>
UPDATE
The URLs of the two frames are as follows:
Frame 1, whose HTML I can see:
http://web2.westlaw.com/nav/NavBar.aspx?RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=
Frame 2, whose HTML I cannot see:
http://web2.westlaw.com/result/result.aspx?RP=/Search/default.wl&action=Search&CFID=1&DB=AK%2DCS&EQ=search&fmqv=s&Method=TNC&origin=Search&Query=%22LAND+USE%22&RLT=CLID%5FQRYRLT302424536122&RLTDB=CLID%5FDB6558157526122&Service=Search&SRCH=TRUE&SSKey=CLID%5FSSSA648523536122&RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=
And the properties of the second frame, whose HTML I do not get, are in the picture below:
Thank you
I paid for a solution to the question above and it works 100%.
What I did was use the function below, which returned the count from the tag I was looking for and could not find. Call it like this:
FillFrame(webBrowser1.Document.Window.Frames);
// doccount is presumably a string field declared elsewhere on the form.
private void FillFrame(HtmlWindowCollection hwc)
{
    if (hwc == null) return;
    foreach (HtmlWindow hw in hwc)
    {
        HtmlElement getSpanid = hw.Document.GetElementById("mDisplayCiteList_ctl00_mResultCountLabel");
        if (getSpanid != null)
        {
            doccount = getSpanid.InnerText.Replace("Documents", "").Replace("Document", "").Trim();
            break;
        }
        // Not found in this frame: search its nested frames recursively.
        if (hw.Frames.Count > 0) FillFrame(hw.Frames);
    }
}
Hope it helps people .
Thank you
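The recursive walk above is a depth-first search over nested frames. As a language-neutral illustration, here is a small Java sketch of the same idea in which Frame is a hypothetical stand-in for HtmlWindow (a map of element text by id plus child frames). It returns the value instead of writing to a field, so the search stops at the first match at any depth.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a browser frame: element text by id, plus child frames.
class Frame {
    final Map<String, String> elementsById = new HashMap<>();
    final List<Frame> children = new ArrayList<>();
}

class FrameSearch {
    /** Depth-first search; returns the text of the first element with the given id. */
    static Optional<String> findText(List<Frame> frames, String id) {
        for (Frame frame : frames) {
            String text = frame.elementsById.get(id);
            if (text != null) {
                return Optional.of(text);
            }
            // Not in this frame: look inside its nested frames before moving on.
            Optional<String> nested = findText(frame.children, id);
            if (nested.isPresent()) {
                return nested;
            }
        }
        return Optional.empty();
    }
}
```

Returning the result also avoids a later sibling frame overwriting a value that was found earlier, which a shared field would allow.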
To get the HTML, you can do it this way:
WebClient client = new WebClient();
string html = client.DownloadString(@"http://stackoverflow.com");
That's just an example, of course; you can change the address.
By the way, you need using System.Net;
This works just fine...gets BODY element with all inner elements:
Somewhere in your Form code:
wb.Url = new Uri("http://stackoverflow.com");
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wbDocumentCompleted);
And here is wbDocumentCompleted:
void wbDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var yourBodyHtml = wb.Document.Body.OuterHtml;
}
wb is System.Windows.Forms.WebBrowser
UPDATE:
As with the document, I think your second frame is not loaded at the time you check for its content. You can try the solutions from this link. You will have to wait for your frames to load in order to see their content.
The most likely reason is that frame index 0 has the same domain name as the main/parent page, while the frame index 1 has a different domain name. Am I correct?
This creates a cross-frame security issue, and the WB control just leaves you high and dry and doesn't tell you what on earth went wrong, and just leaves your objects, properties and data empty (will say "No Variables" in the watch window when you try to expand the object).
The only thing you can access in this situation is pretty much the URL and iFrame properties, but nothing inside the iFrame.
Of course, there are ways to overcome the cross-frame security issues, but they are not built into the WebBrowser control; they are external solutions that depend on which WB control you are using (as in, .NET version or pre-.NET version).
Let me know if I have correctly identified your problem, and if so, if you would like me to tell you about the solution tailored to your setup & instance of the WB control.
UPDATE: I have noticed that you're doing a .getElementByTagName("HTML")(0).outerHTML to get the HTML, all you need to do is call this on the document object, or the .body object and that should do it. MyDoc.Body.innerHTML should get the the content you want. Also, notice that there are additional iFrames inside these documents, in case that is of relevance. Can you give us the main document URL that has these two URL's in it so we / I can replicate what you're doing here? Also, not sure why you are using DomElement but you should just cast it to the native object it wants to be cast to, either a IHTMLDocument2 or the object you see in the watch window, which I think is IHTMLFrameElement (if i recall correctly, but you will know what i mean once you see it). If you are trying to use an XML object, this could be the reason why you aren't able to get the HTML content, change the object declaration and casting if there is one, and give it a go & let us know :). Now I'm curious too :).
