I want to create an application which gets the word under the cursor (not only for text fields), but I can't find out how to do that. Using OCR is pretty hard. The only thing I've seen working is the Deskperience components. They support a 'native' way, but they cost a lot. Now I'm trying to figure out what this 'native' way is (maybe some kind of hooking). Any help will be appreciated.
EDIT:
I found a way, but it gets only the whole text of the control. Any idea how to get only the word under the cursor from the whole text?
On recent versions of Windows, the recommended way to gather information from another application (if you don't own the targeted application, of course) is to use the UI Automation technology.
Wikipedia is pretty good for more information on this: Microsoft UI Automation
Basically, UI Automation will use all necessary means to gather what can be gathered.
Here is a small console application that spies on the UI of other apps. Run it and move the mouse over different applications. Each application has different support for the various "UI automation patterns". For example, there is the Value pattern and the Text pattern, as demonstrated here.
// references needed: UIAutomationClient, UIAutomationTypes, WindowsBase,
// System.Windows.Forms and System.Drawing
using System;
using System.Threading;
using System.Windows.Automation;
using System.Windows.Automation.Text;

static void Main(string[] args)
{
    do
    {
        System.Drawing.Point mouse = System.Windows.Forms.Cursor.Position; // use Windows Forms mouse code instead of WPF
        AutomationElement element = AutomationElement.FromPoint(new System.Windows.Point(mouse.X, mouse.Y));
        if (element == null)
        {
            // no element under the mouse
            return;
        }
        Console.WriteLine("Element at position " + mouse + " is '" + element.Current.Name + "'");

        object pattern;
        // the "Value" pattern is supported by many applications (including IE & FF)
        if (element.TryGetCurrentPattern(ValuePattern.Pattern, out pattern))
        {
            ValuePattern valuePattern = (ValuePattern)pattern;
            Console.WriteLine(" Value=" + valuePattern.Current.Value);
        }

        // the "Text" pattern is supported by some applications (including Notepad)
        // and returns the current selection, for example
        if (element.TryGetCurrentPattern(TextPattern.Pattern, out pattern))
        {
            TextPattern textPattern = (TextPattern)pattern;
            foreach (TextPatternRange range in textPattern.GetSelection())
            {
                Console.WriteLine(" SelectionRange=" + range.GetText(-1));
            }
        }

        Thread.Sleep(1000);
        Console.WriteLine();
        Console.WriteLine();
    }
    while (true);
}
UI automation is actually supported by Internet Explorer and Firefox, but not by Chrome to my knowledge. See this link: When will Google Chrome be accessible?
Now, this is just the beginning of work for you :-), because:
Most of the time, all this has heavy security implications. Using this technology (or direct Windows APIs such as WindowFromPoint) will require sufficient rights to do so (such as being an administrator). And I don't think DExperience has any way to overcome these limitations, unless they install a kernel driver on the computer.
Some applications will not expose anything to anyone, even with proper rights. For example, if I'm writing a banking application, I don't want you to spy on what my application will display :-). Other applications such as Outlook with DRM will not expose anything for the same reasons.
Only the UI Automation Text pattern can give more information (like the word under the cursor) than just the whole text. Alas, this specific pattern is not supported by IE or FF, even though they support UI Automation globally.
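For elements that do support the Text pattern, a hypothetical GetWordUnderCursor helper could look roughly like this (a minimal sketch, using the same namespaces as the code above plus System.Windows.Automation.Text; RangeFromPoint and ExpandToEnclosingUnit do the actual work):

// Sketch: word under the mouse via the Text pattern. Only works where the
// element actually implements TextPattern (e.g. Notepad, WordPad).
static string GetWordUnderCursor()
{
    System.Drawing.Point mouse = System.Windows.Forms.Cursor.Position;
    System.Windows.Point point = new System.Windows.Point(mouse.X, mouse.Y);
    AutomationElement element = AutomationElement.FromPoint(point);
    if (element == null)
        return null;

    object pattern;
    if (!element.TryGetCurrentPattern(TextPattern.Pattern, out pattern))
        return null; // no Text pattern support on this element

    TextPattern textPattern = (TextPattern)pattern;
    TextPatternRange range = textPattern.RangeFromPoint(point);
    if (range == null)
        return null;

    // grow the empty range at the mouse position until it covers the whole word
    range.ExpandToEnclosingUnit(TextUnit.Word);
    return range.GetText(-1).Trim();
}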
So, if all this does not work for you, you will have to dive deeper and use OCR or Shape recognition techniques. Even with this, there will be some cases where you won't be able to do it at all (because of security rights).
This is non-trivial if the application you want to "spy" on draws the text itself. One possible solution is to trigger the other application to paint a portion of its window by invalidating the area directly under the cursor.
When the other application paints, you will have to intercept the text-drawing calls. One way to do so is to inject code into the other application and intercept calls to the GDI functions that draw text. This is what Visual Studio does to implement breakpoints when you debug native applications. To test the idea you could use a library like Detours (but that's not free for commercial use).
You could also check if the application supports one of the accessibility APIs that Windows provides to facilitate things like screen readers for blind people.
One word of caution: I have not done any of this myself.
If the app needs to handle not only .NET apps, I would start with importing these functions (P/Invoke):
WindowFromPoint
ChildWindowFromPointEx
Later you can iterate over the controls and try to get the text from them based on their type, as sketched below. If I find some time I will try to publish more complete code.
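Here is a rough sketch of that route (assuming the control under the cursor is a standard Win32 control that answers WM_GETTEXT; owner-drawn UIs will return nothing useful; the NativeTextReader name is just for illustration):

using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Text;

// Sketch: classic Win32 route - find the window under the cursor and ask it
// for its text with WM_GETTEXT.
static class NativeTextReader
{
    const uint WM_GETTEXT = 0x000D;
    const uint WM_GETTEXTLENGTH = 0x000E;

    [DllImport("user32.dll")]
    static extern IntPtr WindowFromPoint(Point point);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, StringBuilder lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    public static string GetTextUnderCursor()
    {
        IntPtr hWnd = WindowFromPoint(System.Windows.Forms.Cursor.Position);
        if (hWnd == IntPtr.Zero)
            return null;

        int length = (int)SendMessage(hWnd, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero);
        StringBuilder buffer = new StringBuilder(length + 1);
        SendMessage(hWnd, WM_GETTEXT, (IntPtr)buffer.Capacity, buffer);
        return buffer.ToString();
    }
}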
After some checking it looks like the best way (unfortunately also the hardest) is to hook into GDI text rendering: some discussion
I'd echo what Patricker said, but I think there is no reliable way to do what you want.
You probably obtained the window text or something like that. But what if the cursor is over a window that doesn't use the window text to store its content? Windows are under no obligation to store their data in a particular way.
This ends up pointing you towards character recognition where you look at the pixels under the cursor and try and figure out what words are there. But not only is this very non-trivial, it also is not foolproof. What if part of the word is not visible because it extends out of the window?
This is definitely not trivial. There are a couple of ways to approach it. But there is no reliable way that will work with all windows.
There is an SDK for getting the text using OCR. It's not free, but it's quite cheap compared to other products: http://www.screenocr.com/screen-ocr-library-sdk.htm They also have an application that provides the same features, so you can try the demo too.
To achieve this you need a multi-pronged approach.
UIA does work in many applications, but you need to experiment to see where the text is returned. It may be in Element, Value or Range; there is no consistency, even across Office applications.
If UIA fails, then enumerate the Running Object Table (ROT) and retrieve the COM pointers to the various apps registered in the ROT. You can then cast these pointers to the underlying Office types, for example:

// enumerate the ROT, then:
wb = (Excel._Workbook)enumerator.Value;
string strText = wb.Application.ActiveCell.Text.ToString();
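A slightly more concrete sketch of the ROT part (the RotWalker name is just for illustration; casting the returned objects to Excel._Workbook as above requires a reference to the Excel interop assembly):

using System;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Sketch: walk the Running Object Table and print the display name of every
// registered object (for an open workbook this is typically its file path).
static class RotWalker
{
    [DllImport("ole32.dll")]
    static extern int GetRunningObjectTable(int reserved, out IRunningObjectTable prot);

    [DllImport("ole32.dll")]
    static extern int CreateBindCtx(int reserved, out IBindCtx ppbc);

    public static void DumpRunningObjects()
    {
        IRunningObjectTable rot;
        GetRunningObjectTable(0, out rot);

        IEnumMoniker enumMoniker;
        rot.EnumRunning(out enumMoniker);

        IMoniker[] monikers = new IMoniker[1];
        while (enumMoniker.Next(1, monikers, IntPtr.Zero) == 0)
        {
            IBindCtx bindCtx;
            CreateBindCtx(0, out bindCtx);

            string displayName;
            monikers[0].GetDisplayName(bindCtx, null, out displayName);
            Console.WriteLine(displayName);

            object runningObject;
            rot.GetObject(monikers[0], out runningObject);
            // runningObject can now be cast to the relevant interop type,
            // e.g. Excel._Workbook, to read the active cell as shown above
        }
    }
}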
If the above two methods fail, then make use of the free OCR engine in MODI (Microsoft Office Document Imaging 12.0 Type Library).
Related
I am struggling to find a reliable way to get the content/text of the window that is currently in the foreground. It should be able to determine the text of every program a user might currently be using, if possible.
What I tried:
Take a screenshot of the currently active window, apply some filters and run an OCR algorithm (Tesseract .NET wrapper). This works, but it takes a long time and is not very accurate.
Then I tried some Windows API functions (FindWindow and SendMessage), as described here. I could make it work for the standard editor (Notepad), for example, but not for most other programs.
I also tried to make it work with AutoHotKey and the WinGetText function, again via a .NET wrapper. Here, I just get some info about the window, but not its text...
Unfortunately, I now don't have any other idea what to do, as I am stuck in every way... Does someone have experience with this or know a way that works? Any suggestion is really much appreciated.
It will be difficult to find a single solution for retrieving text from applications. Different methods will be required for different programs.
For AutoHotkey, AccViewer, which makes use of Acc.ahk, is the best method of first resort. Acc works on a large variety of controls and also on elements within controls; it can cover far more control types than AutoHotkey's ControlGet command.
Acc Library [AHK_L] (updated 09/27/2012) - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/77303-acc-library-ahk-l-updated-09272012/
Accessible Info Viewer - Alpha Release (2012-09-20) - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/77888-accessible-info-viewer-alpha-release-2012-09-20/
A link describing some further text retrieval methods:
AutoHotKey ControlGet
Note also:
COM (Component Object Model) is handled natively by AutoHotkey. It can be used to retrieve the text from web elements in Internet Explorer, and via VBA code, text can be retrieved from MS Office programs such as MS Excel and MS Word.
So I'm writing an accessibility app that needs to know the location of the text entry caret. I tried GUIThreadInfo, but while that works in basic apps like Notepad, it fails in more complex ones like Chrome, iTunes, etc. that handle their own UI.
Is there even a way to get the caret position from apps like this?
Yes, that doesn't work. The caret is an implementation detail of user32, associated with a window. Applications like browsers don't use window controls; they're far too expensive. And they don't have to: there's a separate API that allows such programs to provide an interface to accessibility apps like screen readers. Start reading here. It's not easy to use from a C# app; this project can lessen the pain. No endorsement, I've never actually used it myself.
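As a starting point, one managed route that sometimes works is to go through the UI Automation client API and ask the focused element for its Text pattern (a sketch; only valid where the focused control actually implements that pattern, and GetCaretRect is a made-up helper name):

using System;
using System.Windows;
using System.Windows.Automation;
using System.Windows.Automation.Text;

// Sketch: approximate the caret position through UI Automation. Works only
// when the focused control implements the Text pattern; many custom-drawn
// UIs do not expose it.
static Rect? GetCaretRect()
{
    AutomationElement focused = AutomationElement.FocusedElement;
    if (focused == null)
        return null;

    object pattern;
    if (!focused.TryGetCurrentPattern(TextPattern.Pattern, out pattern))
        return null;

    TextPattern textPattern = (TextPattern)pattern;
    TextPatternRange[] selection = textPattern.GetSelection();
    if (selection == null || selection.Length == 0)
        return null;

    // a collapsed selection marks the caret; its bounding rectangle can be
    // empty, so fall back to the enclosing character
    TextPatternRange caretRange = selection[0];
    Rect[] rects = caretRange.GetBoundingRectangles();
    if (rects.Length == 0)
    {
        caretRange.ExpandToEnclosingUnit(TextUnit.Character);
        rects = caretRange.GetBoundingRectangles();
    }
    return rects.Length > 0 ? rects[0] : (Rect?)null;
}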
Hi all,
I want to write a tool for GUI automation which can locate a text label on the current screen (its absolute location) so that I can drive the mouse cursor to click on it.
The signature of the needed function should look like this:
Point GetTextCoordination(string text)
Does anyone have an idea how to implement this? I don't want to use OCR or computer vision technology, for performance reasons. Is hooking the TextOut Win32 API function a feasible way?
I don't think that hooking the TextOut function is a feasible solution (though it's certainly possible). You don't have any guarantee that the text you want to find was drawn using this function. Trying to use OCR would be similarly fraught with difficulties.
I suspect that for your purposes, it would be sufficient to enumerate the windows of the target application (using GetWindow and related functions) and examine the text of each (using GetWindowText) to look for the one you want. That would give you a window handle, and from that you could get the window boundaries or send it a message directly.
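A rough sketch of that idea (hypothetical TextLocator class; it only finds text that is stored as an actual window caption, e.g. labels and buttons, not owner-drawn text):

using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Text;

// Sketch: walk the child windows of a target window and return the centre of
// the first one whose caption matches the text.
static class TextLocator
{
    [StructLayout(LayoutKind.Sequential)]
    struct RECT { public int Left, Top, Right, Bottom; }

    delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll")]
    static extern bool EnumChildWindows(IntPtr hWndParent, EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    [DllImport("user32.dll")]
    static extern bool GetWindowRect(IntPtr hWnd, out RECT lpRect);

    public static Point? GetTextCoordination(IntPtr parentWindow, string text)
    {
        Point? result = null;
        EnumChildWindows(parentWindow, (hWnd, lParam) =>
        {
            StringBuilder caption = new StringBuilder(256);
            GetWindowText(hWnd, caption, caption.Capacity);
            if (caption.ToString() == text)
            {
                RECT r;
                GetWindowRect(hWnd, out r);
                result = new Point((r.Left + r.Right) / 2, (r.Top + r.Bottom) / 2);
                return false; // stop enumerating
            }
            return true; // keep going
        }, IntPtr.Zero);
        return result;
    }
}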
You want to use a GUI automation toolkit, such as the UIAutomation library, the White library (which is a wrapper around UIAutomation), or AutoIt.
(Alternatively there are commercial tools for this - if you are looking into setting up a test automation program, then you'd be better off with one of the commercial tools, as they have a lot of features that make this kind of thing easier.)
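For example, with the UIAutomation library the search could look something like this (a sketch; searching the whole desktop from RootElement is slow, so in practice you would start from the target application's main window):

using System.Windows;
using System.Windows.Automation;

// Sketch: find an element by its Name property and return its bounding
// rectangle in screen coordinates, ready for a mouse move and click.
static Rect? FindTextOnScreen(string text)
{
    AutomationElement match = AutomationElement.RootElement.FindFirst(
        TreeScope.Descendants,
        new PropertyCondition(AutomationElement.NameProperty, text));

    if (match == null)
        return null;

    return match.Current.BoundingRectangle;
}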
I think it's possible to somehow hook into the Windows environment (specifically explorer.exe) and trigger specific things, for example launching Control Panel and using it as if I had a mouse (meaning I'm clicking the interface from code).
Basically what I'm trying to do is automate some redundant tasks I do often; I just don't know how it's done, or even what it's called. Can anyone point me in the right direction?
Thanks!
Forget about "automated clicking". GUI tools are just front-ends for controlling the system. You can control the system like they do; it will be much easier.
Microsoft Management Console can give you huge possibilities. Each "snap-in" can be accessed via the COM model. Some of them have GUI front-ends; find and run "*.msc" files (somewhere in the Windows directory) to try them.
There are many command line tools too, e.g. the "net" command has huge abilities related to networking.
PowerShell may be a better choice than C# or C++; it's designed for task automation. You can easily use COM, .NET, MMC...
Windows Explorer has a COM object model that you can call from both C# and C++. (Most of the examples on MSDN are in JavaScript or VBScript, which I guess aren't your languages of choice, but they demonstrate that the API is straightforward to call.)
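For example, a minimal late-bound call to the Shell.Application object from C# could look like this (a sketch; the OpenExplorerWindow name is just for illustration):

using System;

// Sketch: drive Windows Explorer through its Shell.Application COM object
// using late binding (no interop assembly reference needed, requires 'dynamic').
static void OpenExplorerWindow(string folderPath)
{
    Type shellType = Type.GetTypeFromProgID("Shell.Application");
    dynamic shell = Activator.CreateInstance(shellType);

    shell.Explore(folderPath); // open an Explorer window on the folder
    // other members include shell.Open(...), shell.Windows(), shell.MinimizeAll()
}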
AutoHotKey is a scripting environment specifically designed for this sort of task
If you mostly want to launch Control Panel applets, you can do so using the RunDll32 interface that exists in most of them. See http://www.osattack.com/windows-7/huge-list-of-windows-7-shell-commands/ , http://support.microsoft.com/kb/167012 or http://www.winvistaclub.com/t57.html for example. For the corresponding API see http://support.microsoft.com/kb/164787.
Another option is to use control.exe (see http://msdn.microsoft.com/en-us/library/cc144191.aspx and http://vlaurie.com/computers2/Articles/control.htm).
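As a small illustration of both options, launching applets from C# boils down to starting the right process (the applet names here are just examples):

using System.Diagnostics;

// Sketch: launching Control Panel applets via control.exe and rundll32.
static void OpenControlPanelApplets()
{
    // Programs and Features applet via control.exe
    Process.Start("control.exe", "appwiz.cpl");

    // Display settings applet via the RunDll32 interface
    Process.Start("rundll32.exe", "shell32.dll,Control_RunDLL desk.cpl");
}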
If you google further, you will find many more examples showing how to automate a lot of things without resorting to general GUI automation techniques.
At more or less the lowest level within Win32, you can use the SendMessage() API to send raw click messages to windows of interest. This will rely on a lot of intrusive knowledge about the apps you intend to drive. However, you could easily implement a "click recorder" that could replay click sequences captured from user interaction.
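A minimal sketch of that approach (client coordinates are assumed; many modern applications ignore synthesized mouse messages, so treat this as a last resort):

using System;
using System.Runtime.InteropServices;

// Sketch: send a raw left-click to a window at client coordinates (x, y).
static class ClickSender
{
    const uint WM_LBUTTONDOWN = 0x0201;
    const uint WM_LBUTTONUP = 0x0202;
    const int MK_LBUTTON = 0x0001;

    [DllImport("user32.dll")]
    static extern IntPtr SendMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    public static void Click(IntPtr hWnd, int x, int y)
    {
        IntPtr lParam = (IntPtr)((y << 16) | (x & 0xFFFF)); // MAKELPARAM(x, y)
        SendMessage(hWnd, WM_LBUTTONDOWN, (IntPtr)MK_LBUTTON, lParam);
        SendMessage(hWnd, WM_LBUTTONUP, IntPtr.Zero, lParam);
    }
}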
My main language is VB/C#.NET and I'd like to make a console program, but with a menu system.
If any of you have worked with "DOS"-like programs or the iSeries from IBM, then that's the style I am going for.
So, I was wondering if anyone knows of a "WinForms" library that will make my form look like this. I don't mind a "fake WinForms look" or a console application, but that's the look I'd like.
I've used the iSeries extensively and I remember exactly what you're talking about. To simulate this look and feel in a C# app, you'll want to create a console project and write text to different areas of the screen with the help of the Console.CursorTop and Console.CursorLeft properties, then call Console.Write or Console.WriteLine to write out the text at the previously set position. To change colors, set the Console.ForegroundColor and Console.BackgroundColor properties before calling WriteLine.
You'll need to listen for input and upon finding a tab character, your program can use its own internal logic to determine where the cursor should appear next (on the next line in the same column, for instance, to simulate those left columns of input fields in your screenshot).
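A minimal sketch of that idea (Console.SetCursorPosition is equivalent to setting CursorLeft/CursorTop; the menu text and field positions are made up):

using System;

// Sketch: an iSeries-style screen drawn at fixed positions, with Tab moving
// between two input fields and Enter accepting.
static void DrawMenuScreen()
{
    Console.Clear();
    Console.BackgroundColor = ConsoleColor.Black;
    Console.ForegroundColor = ConsoleColor.Green;

    Console.SetCursorPosition(25, 1);
    Console.Write("MAIN MENU");

    string[] options = { "1. Work with files", "2. Work with jobs", "3. Sign off" };
    for (int i = 0; i < options.Length; i++)
    {
        Console.SetCursorPosition(5, 4 + i);
        Console.Write(options[i]);
    }

    // two "input fields"; Tab jumps between them, Enter accepts
    int[] fieldRows = { 10, 12 };
    int current = 0;
    Console.ForegroundColor = ConsoleColor.White;
    Console.SetCursorPosition(5, fieldRows[current]);
    while (true)
    {
        ConsoleKeyInfo key = Console.ReadKey(true);
        if (key.Key == ConsoleKey.Tab)
        {
            current = (current + 1) % fieldRows.Length;
            Console.SetCursorPosition(5, fieldRows[current]);
        }
        else if (key.Key == ConsoleKey.Enter)
        {
            break;
        }
        else
        {
            Console.Write(key.KeyChar); // echo into the current field
        }
    }
}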
Doing this with a Windows Forms app will be a little trickier and you'd definitely want to write your own control for it (possibly sub-classed from one of the many types of standard multi-line text controls already available).
It's a good question. For many Use Cases the standard Windows (or other windowing) paradigm can be overkill, intimidating, and confusing.
Back in DOS days there were a number of "Windowing" libraries that created various abstractions for doing this.
[After Googling]
Here's a site that lists various libraries, including several that appear to be of interest.
A resource like this would also be handy for Mobile apps, where mouse-driven window apps tend to be not the best fit, especially for workflow-type processes. The Console is a pretty universal lowest-common-denominator abstraction available in most every environment.
You are looking for a curses-like library, but for Windows, and usable from VB & C#.
Curses provides an even richer text-based UI than the iSeries. All sorts of widgetry!
Windows is not really supportive of text interfaces, whether on purpose or not, so you are out of luck.
But ...
Well, how about MonoCurses? I don't know if it will work though. Also look at PDCurses.
And if you don't mind using Python for just the front-end see this.
There are a couple of webifiers or screen-scraping programs for the iSeries that will create a web or Windows user interface on top of your iSeries application. I have never used any of them, so there is not a particular one that I can recommend, but you might want to look there for inspiration or reuse.