all.
i wanna write a tool for GUI automation which can locate text label on current screen(the absolute location) so that i can drive the mouse cursor to click on it.
The signature of the needed function should like this:
Point GetTextCoordination(string text)
Any one have an idea how implement this? I don't wanna use the OCR or computer vision technology for performance issue. Is the hooking of the TextOut win32api function a feasible way?
I don't think that hooking the TextOut function is a feasible solution (though it's certainly possible). You don't have any guarantee that the text you want to find was drawn using this function. Trying to use OCR would be similarly fraught with difficulties.
I suspect that for your purposes, it would be sufficient to enumerate the windows of the target application (using GetWindow and related functions) and examing the text of each (using GetWindowText) to look for the one you want. That would give you a window handle and from that you could get the window boundaries or send it a message directly.
You want to use a GUI automation toolkit, such as the UIAutomation library or the white library (which is a wrapper around UIAutomation), or AutoIT.
(Alternatively there are commercial tools for this - if you are looking into setting up a test automation program, then you'd be better off with one of the commercial tools, as they have a lot of features that make this kind of thing easier.)
Related
I am struggling to find a reliable way to get the content/text of the window that is currently in the foreground. It should be able to determine the text from every possible program that a user is currently using, if possible
What I tried:
Take a screenshot of the currently active window, apply some filters and run an OCR algorithm (tesseract .Net wrapper). This works, but takes a long time and is not very accurate.
Then I tried some Windows API functions (FindWindow and SendMessage), as described here. I could make it run for the standard Editor (notepad) for example, but not for most other programs
I also tried to make it work with AutoHotKey and the WinGetText function and again a .Net Wrapper. Here, I just get some info about the window, but in no way the text of it...
Unfortunately, now, I don't have any other idea what to do as I am stuck in every way... Does someone have experience with this or knows a way that works? Any suggestion is really much appreciated
It will be difficult to find a single solution to retrieve text from applications. Different methods for different programs will be required.
For AutoHotkey, AccViewer, which makes use of Acc.ahk is the best method of first resort. Acc works on a large variety of controls and also elements within controls, it can cover far more control types than AutoHotkey's ControlGet command.
Acc Library [AHK_L] (updated 09/27/2012) - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/77303-acc-library-ahk-l-updated-09272012/
Accessible Info Viewer - Alpha Release (2012-09-20) - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/77888-accessible-info-viewer-alpha-release-2012-09-20/
A link describing some further text retrieval methods:
AutoHotKey ControlGet
Note also:
COM (Component Object Model), is handled natively by AutoHotkey. It can be used to retrieve the text from web elements in Internet Explorer, and via VBA code, text can be retrieved from MS Office programs such as MS Excel and MS Word.
i need to block any screen capture software on the computer from taking screen shots. Since all of them are work on standard API-functions, i think i could monitor and block them.
I need to use C#.
All i have found is how to monitor and block them in a certain program (screen capture program). They are looking for a function in the program, then they change it address on mine function address.
But how can i do it, if i haven't any certain programs? I need to block anyone which tries to take a screenshot.
If your final goal is possible or not I don't know, but for the hooking the API portion I can help you out.
I have used the library EasyHook many times in the past, this will let you hook and intercept system function calls from C# code fairly easily. Just read through the PDF tutorial for setup instructions.
For actually finding the API's I recommend Rohitab's API Monitor, it's still in Alpha stages but it works really well and is free. You just hook it on to a processes and it tells you every external DLL call it makes (with the parameters it passed if you have the xml definition file for the DLL, the program comes with almost all of the windows API dll's pre-defined).
The combination of EasyHook and API Monitor is a great 1-2 punch for mucking with other program's calls.
It is not possible to prevent screenshots from being taken. The battle is already lost because of the DWM (Desktop Window Manager). It's lower level than Win32 and device contexts.
If you want to protect the text in your program, there are a lot easier ways to extract it than doing screenshots and OCR. TextOut and/or Direct2D hooking and accessibility APIs.
If there's a lot of IP in your program. Then don't make it all available onscreen. Make sure it's tedious to crawl the GUI for text, and hard to automate it. And don't load whole texts in memory of the program.
Possible solutions:
1. To prevent copying of text. Draw the text as an image.
2. To prevent accessibility technologies, like screen readers - override WndProc in your control, handle and ignore the window message WM_GETOBJECT.
3. To make it harder if they try to use OCR. Draw graphics behind the text. Human readable, but much harder for a machine to interpret it.
Neither of these methods are invasive for the user.
** A very invasive suggestion **:
If you are really serious about preventing anyone from "stealing" your content.
Implement mouse and keyboard hooks. Filter out typical copy shortcuts. Prevent the mouse from leaving the boundaries of your application.
Allow your application to only run when the OS runs well-known processes and services.
If any process starts which you don't recognize, black out the application and notify the user about it, and request the user to close it. And ofc make sure someone is not just spoofing a well-known process.
Monitor the clipboard as you suggested yourself.
You can ofc soften some of these suggestions based on the context of your application.
As Scott just posted it likely can be prevented with API hooks to see that paint events only go to desktop bound handles and not others, and refuse to paint otherwise. However, you need to consider the following scenarios and see if they're relevant threat to your approach or not:
Your software may be running in a virtual machine like VMWare. Such software has capapbilities to capture screen that does so at "virtual hardware" level, and your API hooks will not be able to discern it - and this would be the easiest way approach if I wanted to bypass your protections.
As a post suggests here, nothing also prevents someone to take monitor cable and plug it into another computer's capture card, and take screenshot that way. Again, your hooks will be helpless here.
Bottom line, you can make it somewhat harder to do, but bypassing such protection may be pretty trivial thing to do.
My 2c.
Does anyone know how to apply effects to the entire screen, in c# or any programming language.
I'm mostly interested in making the screen monochrome (specifically green-white instead of black white).
I know a cross-graphic card solution is possible because I found a program that can do it:
http://www.freedomscientific.com/products/lv/magic-bl-product-page.asp
Anyone knows how to accomplish something this or where to look?
Thanks !!
There is no easy Windows API to modify the entire screen contents. But this could be done at the device driver level.
Otherwise you have to resort to some Windows API tricks: place a "fake" window over the entire desktop, in a loop: grab the entire screen contents without grabbing fake window contents, do your processing to get the monochrome effect, then display that on the fake window. Yes, it's hacky and slow, but possible. Even more hacky, when you get mouse clicks to "go through" the fake window (lots of SetWindowsRgn calls).
So cross-platform here means using GDI, though some older DirectDraw APIs might work, in that case, you have a much easier time with hardware overlays (and better performance). Though I'm not sure how many cards actually support hardware overlays, and if newer versions of windows support the older DirectDraw APIs.
One more possibility is if the video card has a C# or C++ or C API, then you can do whatever you want with the card without writing device driver code.
Then there's CUDA, but I haven't yet tried that out. I know it's for stream processing on nVidia boards, but I wonder if it could get you an easy backdoor into the video display stuff.
To help people in the future who are interested in this:
This is possible with the Magnification API's color effect method. This allows one to use a matrix that can be applied to the whole screen.
NegativeScreen is an open source project that implements the feature you are describing in C#.
Unfortunately, this only works with affine transformations, as the API takes only an augmented matrices rather than a delegate or something.
I want to create a application which gets the word under the cursor (not only for text fields), but I can't find how to do that. Using OCR is pretty hard. The only thing I've seen working is the Deskperience components. They support a 'native' way, but I they cost a lot. Now I'm trying to figure out what is this 'native' way (maybe somehow of hooking). Any help will be appreciated.
EDIT:
I found a way, but it gets only the whole text of the control. Any idea how to get only the word under the cursor from the whole text?
On recent versions of Windows, the recommended way to gather information from one application to another (if you don't own the targeted application of course) is to use the UI Automation technology.
Wikipedia is pretty good for more information on this: Microsoft UI Automation
Basically, UI automation will use all necessary means to gather what can be gathered
Here is a small console application code that will spy the UI of other apps. Run it and move the mouse over to different applications. Each application has a different support for various "UI automation patterns". For example, there is the Value pattern and the Text pattern as demonstrated here.
static void Main(string[] args)
{
do
{
System.Drawing.Point mouse = System.Windows.Forms.Cursor.Position; // use Windows forms mouse code instead of WPF
AutomationElement element = AutomationElement.FromPoint(new System.Windows.Point(mouse.X, mouse.Y));
if (element == null)
{
// no element under mouse
return;
}
Console.WriteLine("Element at position " + mouse + " is '" + element.Current.Name + "'");
object pattern;
// the "Value" pattern is supported by many application (including IE & FF)
if (element.TryGetCurrentPattern(ValuePattern.Pattern, out pattern))
{
ValuePattern valuePattern = (ValuePattern)pattern;
Console.WriteLine(" Value=" + valuePattern.Current.Value);
}
// the "Text" pattern is supported by some applications (including Notepad)and returns the current selection for example
if (element.TryGetCurrentPattern(TextPattern.Pattern, out pattern))
{
TextPattern textPattern = (TextPattern)pattern;
foreach(TextPatternRange range in textPattern.GetSelection())
{
Console.WriteLine(" SelectionRange=" + range.GetText(-1));
}
}
Thread.Sleep(1000);
Console.WriteLine(); Console.WriteLine();
}
while (true);
}
UI automation is actually supported by Internet Explorer and Firefox, but not by Chrome to my knowledge. See this link: When will Google Chrome be accessible?
Now, this is just the beginning of work for you :-), because:
Most of the time, all this has heavy security implication. Using this technology (or direct Windows technology such as WindowFromPoint) will require sufficient rights to do so (such as being an administrator). And I don't think DExperience has any way to overcome these limitations, unless they install a kernel driver on the computer.
Some applications will not expose anything to anyone, even with proper rights. For example, if I'm writing a banking application, I don't want you to spy on what my application will display :-). Other applications such as Outlook with DRM will not expose anything for the same reasons.
Only the UI automation Text pattern support can give more information (like the word) than just the whole text. Alas, this specific pattern is not supported by IE nor FF even if they support UI automation globally.
So, if all this does not work for you, you will have to dive deeper and use OCR or Shape recognition techniques. Even with this, there will be some cases where you won't be able to do it at all (because of security rights).
This is non-trivial if the application you want to "spy" on is drawing the text themselves. One possible solution is to trigger the other application to paint a portion of it's window by invalidating the area directly under the cursor.
When the other application paints, you will have to intercept the text drawing calls. One way to do so is to inject code in the other application, and intercept calls into GDI functions that draw text. When you debug native applications, this is what visual studio does to implement breakpoints. To test the idea you could use a library like detours (but that's not free for commercial use).
You could also check if the application supports one of the accessability API's that are in Windows to facilitate things like screen readers for blind people.
One word of caution: I have not done any of this myself.
If the app need to handle not only .Net apps I would start with importing functions (P/Invoke):
WindowFromPoint
ChildWindowFromPointEx
Later you can iterate over the controls and try to get the text from inside based on type. If I will find some time I will try to publish such code.
After some checking it looks like the best way (unfortunately the hard also) is to hook into GDI text rendering some discussion
I'd echo what Patricker said, but I think there is no reliable way to do what you want.
You probably obtained the window text or something like that. But what if the cursor is over a window that doesn't use the window text to store its content? Windows are under no obligation to store their data in a particular way.
This ends up pointing you towards character recognition where you look at the pixels under the cursor and try and figure out what words are there. But not only is this very non-trivial, it also is not foolproof. What if part of the word is not visible because it extends out of the window?
This is definitely not trivial. There are a couple of ways to approach it. But there is no reliable way that will work with all windows.
There is an sdk for getting the text using OCR. It's not free but it's quite cheap compared to other products: http://www.screenocr.com/screen-ocr-library-sdk.htm They have an application which provides the same features so you can try the demo too.
To achieve this you need a multi-pronged approach.
UIA does work in many applications but you need to experiment to see where the text is returned. It may be in Element, Value or Range. There is no consistency even across office applications.
If UIA fails then enumerate the running object table (ROT) and retreive the COM pointers to various apps registered in the ROT. You can then cast these pointers to the underlying office types:
for example:
enumerate ROT - then
wb = (Excel._Workbook)enumerator.Value;
string strText = wb.Application.ActiveCell.Text.ToString();
If the above two methods fail then make use of the free OCR system in MODI (Microsoft Office Document Imaging 12.0 Type Library)
I think it's possible to somehow hook with the windows environment (specifically explorer.exe) and trigger specific things, for example launching control panel and using it as if I had mouse (meaning I'm clicking the interface from the code).
Basically what I'm trying to do is automate some redundant tasks I do often, just I don't know how it's done, or even how it's called. Anyone can point me in right direction?
Thanks!
Forget about "automated clicking". GUI tools are just front-ends to control the system. You can control the system like they do, it will be much easier.
Huge possibilities can give you Microsoft Management Console. Each "snap-in" can be accessed via COM model. Some of them have GUI front-ends, find and fire "*.msc" files (somewhere in Windows directory) to try them.
There is many command line tools i.e. "net" command has huge abilities related to networking.
PowerShell may be a better choice instead of C# or C++, it's designed for task automation. You can easily use COM, .NET, MMC ...
Windows Explorer has a COM object model that you can call from both C# and C++. (Most of the examples on MSDN are in Javascript or VBScript, which I guess aren't your languages of choice, but they demonstrate that the API is straightforward to call.)
AutoHotKey is a scripting environment specifically designed for this sort of task
If you want mostly to launch control panel you can do using RunDll32 interface existing in the most control panel applets. See http://www.osattack.com/windows-7/huge-list-of-windows-7-shell-commands/ , http://support.microsoft.com/kb/167012 or http://www.winvistaclub.com/t57.html for example. For the corresponding API see http://support.microsoft.com/kb/164787.
Another option is usage of control.exe (see http://msdn.microsoft.com/en-us/library/cc144191.aspx and http://vlaurie.com/computers2/Articles/control.htm).
If you google more you will find much more examples which you can to automate a lot of things without using of some general ways to automate GUI.
At more or less the lowest level within Win32, you can use the SendMessage() API to send raw click messages to windows of interest. This will rely on a lot of intrusive knowledge about the apps you intend to drive. However, you could easily implement a "click recorder" that could replay click sequences captured from user interaction.