FindWindowByClassNameAndRegex issue with special characters - c#

For our software testing we have a test that checks whether certain windows are open using the FindWindowByClassNameAndRegex P/Invoke call. The issue is that when a window title contains more than a certain number of special characters, we always get IntPtr.Zero back. Are there any known issues with this? Here's some of the code we use to find the window (in this case a Firefox window):
Regex windowTitleRegex = new Regex(Regex.Escape(fullWindowTitle).Replace("\\?", "."), RegexOptions.IgnoreCase | RegexOptions.ECMAScript);
curWindowHandle = NativeMethods.FindWindowByClassNameAndRegex("MozillaUIWindowClass", windowTitleRegex);
Where the title of the window is ~`!@#$%^&*()_-+={[}]|:;'<,>.?/\"àëÉùÙâÏûâÏûÊÛçîÀË«éïÂλœÇÔêôÈŒ\
(There's no actual line break it's just a formatting thing)

There is no Windows API function by that name. I'm guessing you've found some DLL that exports this function. Odds are always good that the regex this DLL uses is of some kind that doesn't quite match the syntax that .NET's Regex class uses. There are a lot of dialects.
Best thing to do is to pinvoke EnumWindows(). You can use your own Regex in the callback to filter, GetClassName() gets you the window class name. If you already know the window name then just use FindWindow().
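A minimal sketch of the EnumWindows() approach described above, assuming the standard user32.dll exports. The class name WindowFinder and the helper TitleToRegex are hypothetical names chosen for illustration; they are not part of any library. The filtering runs entirely in .NET, so the regex dialect is the one you already know.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;

static class WindowFinder
{
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetClassName(IntPtr hWnd, StringBuilder lpClassName, int nMaxCount);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetWindowTextLength(IntPtr hWnd);

    // Hypothetical helper: a regex that matches exactly the given title, ignoring case.
    public static Regex TitleToRegex(string title)
    {
        return new Regex("^" + Regex.Escape(title) + "$",
                         RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Returns the first top-level window with the given class whose title
    // matches titleRegex, or IntPtr.Zero if there is none. Windows only.
    public static IntPtr Find(string className, Regex titleRegex)
    {
        IntPtr found = IntPtr.Zero;
        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            var cls = new StringBuilder(256);
            GetClassName(hWnd, cls, cls.Capacity);
            if (cls.ToString() != className) return true;   // keep enumerating

            var title = new StringBuilder(GetWindowTextLength(hWnd) + 1);
            GetWindowText(hWnd, title, title.Capacity);
            if (!titleRegex.IsMatch(title.ToString())) return true;

            found = hWnd;
            return false;                                    // stop enumerating
        };
        EnumWindows(callback, IntPtr.Zero);
        GC.KeepAlive(callback);  // keep the delegate alive for the native call
        return found;
    }
}
```

Usage would look like WindowFinder.Find("MozillaUIWindowClass", WindowFinder.TitleToRegex(fullWindowTitle)).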

Running AutoIT scripts in c#.Net - issue with WinWaitActive

I am building a windows form application using c#.NET and would like to use AutoITX3.dll to run/perform simple window scripts. I have properly referenced the AutoItX3Lib and following is my code.
I tab through the program to make window "Title" active and yet, the program wouldn't continue with the script and never prints the "ITS ACTIVE" line. It seems to get stuck at WinWaitActive and I am unsure why.
autoit = new AutoItX3Lib.AutoItX3Class();
autoit.AutoItSetOption("WinTitleMatchMode", 2);
autoit.WinActivate("Title");
autoit.WinWaitActive("Title");
System.Console.WriteLine("ITS ACTIVE");
Okay, here's the deal with WinWaitActive("sometitle"), it will freeze if it can't see the title -- that is, it will wait indefinitely for the window to become active, because it's not seeing it with the parameters that you specified.
Solution:
Use the AutoIt inspector to get the title of the window, and place all that code you have into an actual AutoIt script (So you can rule out the possibility of there being an error with the library itself). Then, use that AutoIt script and tune it.
Sometimes it is hard to match window titles in AutoIt, so what you can do is something like the following:
While True
    $winTitle = WinGetTitle("[ACTIVE]")
    $matchText = "Title" ; put the title you want to match here
    $match = WinActive($matchText) ; If this is 0, then it didn't match.
    ConsoleWrite($winTitle & " : " & $match & @CRLF)
    Sleep(250) ; avoid a busy loop
WEnd
I noticed that you also want to match substrings of the title. I've found it's best to leave that option alone and use regular expressions to match the title.
If you post the title, I'll help you find a regexp or some code to match it and give you the full code to use that in C#.
[Note: I haven't verified any of the code I posted, so I can't guarantee that it will be error-free. I'm on a Mac & I can't test it right now]
Good luck!

Regular expression for valid filename

I have already gone through some questions on Stack Overflow about this, but nothing helped much in my case.
I want to restrict the user to providing a filename that contains only alphanumeric characters, -, _, . and space.
I'm not good at regular expressions, and so far I have come up with this: ^[a-zA-Z0-9.-_]$. Can somebody help me?
This is the correct expression:
string regex = @"^[\w\-. ]+$";
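\w is equivalent to [0-9a-zA-Z_] when RegexOptions.ECMAScript is set; by default in .NET it also matches letters and digits outside ASCII.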
To validate a file name I would suggest using the function provided by .NET rather than a regex:
if (filename.IndexOfAny(System.IO.Path.GetInvalidFileNameChars()) != -1)
{
    // filename contains at least one character that is not allowed in a file name
}
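The two ideas above (a whitelist regex and GetInvalidFileNameChars) can be combined. This is only a sketch: FileNameCheck and IsAllowed are hypothetical names, and the character class spells out the ASCII ranges instead of using \w so that the whitelist does not depend on how \w is interpreted.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class FileNameCheck
{
    // Explicit ASCII ranges: letters, digits, underscore, hyphen, dot and space.
    // \A and \z anchor to the very start and end of the string.
    private static readonly Regex Allowed = new Regex(@"\A[A-Za-z0-9_\-. ]+\z");

    public static bool IsAllowed(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return false;

        // Reject anything the current platform forbids in a file name.
        if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) != -1) return false;

        // Then apply the stricter whitelist from the question.
        return Allowed.IsMatch(fileName);
    }
}
```

With this, FileNameCheck.IsAllowed("my file-1.txt") is accepted, while a name such as "æøå.txt" is rejected even though ^[\w\-. ]+$ with default options would accept it.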
While what the OP asks is close to what the currently accepted answer uses (^[\w\-. ]+$), there might be others seeing this question who have even more specific constraints.
First off, \w in .NET matches letters and digits from any script, not just ASCII (unless RegexOptions.ECMAScript is set), so it allows a wide range of characters from other languages that the OP's requirements rule out.
Secondly, if the file extension is included in the name, this allows all sorts of weird looking, though valid, filenames like file .txt or file...txt.
Thirdly, if you're simply uploading the files to your file system, you might want a blacklist of files and/or extensions like these:
web.config, hosts, .gitignore, httpd.conf, .htaccess
However, that is considerably out of scope for this question; it would require all sorts of info about the setup for good guidance on security issues. I thought I should raise the matter none the less.
So for a solution where the user can input the full file name, I would go with something like this:
^[a-zA-Z0-9](?:[a-zA-Z0-9 ._-]*[a-zA-Z0-9])?\.[a-zA-Z0-9_-]+$
It ensures that only the English alphabet is used, that there are no leading or trailing spaces, and that the name has a file extension at least one character long with no whitespace in it.
I've tested this on Regex101, but for future reference, this was my "test-suite":
## THE BELOW SHOULD MATCH
web.config
httpd.conf
test.txt
1.1
my long file name.txt
## THE BELOW SHOULD NOT MATCH - THOUGH VALID
æøå.txt
hosts
.gitignore
.htaccess
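For anyone who wants to re-run that test suite in C# rather than on Regex101, here is a small sketch. The class and member names (FullFileNamePattern, AllAsExpected) are made up for this example; the pattern and the sample names are the ones listed above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class FullFileNamePattern
{
    public static readonly Regex Pattern = new Regex(
        @"^[a-zA-Z0-9](?:[a-zA-Z0-9 ._-]*[a-zA-Z0-9])?\.[a-zA-Z0-9_-]+$");

    public static readonly string[] ShouldMatch =
    {
        "web.config", "httpd.conf", "test.txt", "1.1", "my long file name.txt"
    };

    public static readonly string[] ShouldNotMatch =
    {
        "æøå.txt", "hosts", ".gitignore", ".htaccess"
    };

    // True when every sample behaves as the lists above say it should.
    public static bool AllAsExpected()
    {
        return ShouldMatch.All(s => Pattern.IsMatch(s))
            && !ShouldNotMatch.Any(s => Pattern.IsMatch(s));
    }
}
```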
In case someone else needs to validate filenames (including Windows reserved words and such), here's a full expression:
\A(?!(?:COM[0-9]|CON|LPT[0-9]|NUL|PRN|AUX|com[0-9]|con|lpt[0-9]|nul|prn|aux)|[\s\.])[^\\\/:*"?<>|]{1,254}\z
Extended expression (don't allow filenames starting with 2 dots, don't allow filenames ending in dots or whitespace):
\A(?!(?:COM[0-9]|CON|LPT[0-9]|NUL|PRN|AUX|com[0-9]|con|lpt[0-9]|nul|prn|aux)|\s|[\.]{2,})[^\\\/:*"?<>|]{1,254}(?<![\s\.])\z
Edit:
For the interested, here's a link to Windows file naming conventions:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
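One caveat with the expressions above: the negative lookahead only checks how the name starts, so a harmless name such as CONFIG.txt is rejected because it begins with CON. A sketch of an alternative that compares the part before the first dot against the device names instead (ReservedNames and IsReserved are hypothetical names, and the device list simply mirrors the one in the regex):

```csharp
using System;
using System.Collections.Generic;

static class ReservedNames
{
    private static readonly HashSet<string> Devices = BuildDevices();

    private static HashSet<string> BuildDevices()
    {
        // Case-insensitive set: CON, PRN, AUX, NUL, COM0..COM9, LPT0..LPT9.
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL"
        };
        for (int i = 0; i <= 9; i++)
        {
            set.Add("COM" + i);
            set.Add("LPT" + i);
        }
        return set;
    }

    // True when the name is a device name, with or without an extension
    // (for example "NUL" or "nul.txt"), but not when it merely starts with one.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Devices.Contains(stem.TrimEnd(' '));
    }
}
```

This check only covers the reserved words; the forbidden characters still need a separate test such as the character class in the expressions above.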
use this regular expression ^[a-zA-Z0-9._ -]+$
This is a minor change to Engineer's answer.
string regex = @"^[\w\- ]+[\w\-. ]*$";
This will block ".txt" which isn't valid.
Trouble is, it does block "..txt" which is valid
For full character set (Unicode) use
^[\p{L}0-9_\-.~]+$
or perhaps
^[\p{L}\p{N}_\-.~]+$
would be more accurate if we are talking about Unicode.
I added a '~' simply because I have some files using that character.
I've just created this. It prevents two dots and dot at end and beginning. It doesn't allow any two dots though.
^([a-zA-Z0-9_]+)\.(?!\.)([a-zA-Z0-9]{1,5})(?<!\.)$
When used in HTML5 via pattern:
<form action="" method="POST">
<fieldset>
<legend>Export Configuration</legend>
<label for="file-name">File Name</label>
<input type="text" required pattern="^[\w\-. ]+$" id="file-name" name="file_name"/>
</fieldset>
<button type="submit">Export Settings</button>
</form>
This validates the input against the same pattern as above. You can remove required to allow an empty value; the pattern is still checked for any non-empty input.
I may be saying something stupid here, but it seems to me that these answers aren't correct. Firstly, are we talking Linux or Windows here (or another OS)?
Secondly, in Windows it is (I believe) perfectly legitimate to include a "$" in a filename, not to mention Unicode in general. It certainly seems possible.
I tried to get a definitive source on this and ended up at the Wikipedia Filename page. In particular, the section "Reserved characters and words" seems relevant: it is, clearly, a list of things which you are NOT allowed to put in.
I'm in the Java world. And I naturally assumed that Apache Commons would have something like validateFilename, maybe in FilenameUtils... but it appears not (if it had done, this would still be potentially useful to C# programmers, as the code is usually pretty easy to understand, and could therefore be translated). I did do an experiment, though, using the method normalize: to my disappointment it allowed perfectly invalid characters (?, etc.) to "pass".
The part of the Wikipedia Filename page referenced above shows that this question depends on the OS you're using... but it should be possible to concoct some simple regex for Linux and Windows at least.
Then I found a Java way (at least):
Path path = java.nio.file.FileSystems.getDefault().getPath("bobb??::mouse.blip");
output:
java.nio.file.InvalidPathException: Illegal char at index 4:
bobb??::mouse.blip
... presumably different FileSystem objects will have different validation rules
Copied from @Engineer for future reference as the dot was not escaped (as it should be) in the most voted answer.
This is the correct expression:
string regex = @"^[\w\-\. ]+$";

How to detect a C++ identifier string?

E.g:
isValidCppIdentifier("_foo") // returns true
isValidCppIdentifier("9bar") // returns false
isValidCppIdentifier("var'") // returns false
I wrote some quick code but it fails:
my regex is "[a-zA-Z_$][a-zA-Z0-9_$]*"
and I simply do regex.IsMatch(inputString).
Thanks..
It should work with some added anchoring:
"^[a-zA-Z_][a-zA-Z0-9_]*$"
If you really need to support ludicrous identifiers using Unicode, feel free to read one of the various versions of the standard and add all the ranges into your regexp (for example, pages 713 and 714 of http://www-d0.fnal.gov/~dladams/cxx_standard.pdf)
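In C# the anchored version could be wrapped like this. It is a sketch only: the class name CppIdentifier is made up, the pattern covers ASCII identifiers, and it does not reject C++ keywords such as int or class. It uses \A and \z rather than ^ and $ because $ in .NET also matches just before a trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

static class CppIdentifier
{
    // A letter or underscore, then any number of letters, digits or underscores,
    // anchored to the whole string.
    private static readonly Regex Pattern = new Regex(@"\A[A-Za-z_][A-Za-z0-9_]*\z");

    public static bool IsValidCppIdentifier(string s)
    {
        return s != null && Pattern.IsMatch(s);
    }
}
```

The unanchored pattern from the question fails because IsMatch succeeds as soon as any substring matches: for "9bar" it finds "bar" and reports a match.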
Matti's answer will work to sanitize identifiers before inserting into C++ code, but won't handle C++ code as input very well. It will be annoying to separate things like L"wchar_t string", where L is not an identifier. And there's Unicode.
Clang, Apple's compiler which is built on a philosophy of modularity, provides a set of tokenizer functions. It looks like you would want clang_createTranslationUnitFromSourceFile and clang_tokenize.
I didn't check to see if it handles \Uxxxx or anything. Can't make any kind of guarantees. Last time I used LLVM was five years ago and it wasn't the greatest experience… but not the worst either.
On the other hand, GCC certainly has it, although you have to figure out how to use cpp_lex_direct.

Regex to parse C/C++ functions declarations

I need to parse and split C and C++ functions into the main components (return type, function name/class and method, parameters, etc).
I'm working from either headers or a list where the signatures take the form:
public: void __thiscall myClass::method(int, class myOtherClass * )
I have the following regex, which works for most functions:
(?<expo>public\:|protected\:|private\:) (?<ret>(const )*(void|int|unsigned int|long|unsigned long|float|double|(class .*)|(enum .*))) (?<decl>__thiscall|__cdecl|__stdcall|__fastcall|__clrcall) (?<ns>.*)\:\:(?<class>(.*)((<.*>)*))\:\:(?<method>(.*)((<.*>)*))\((?<params>((.*(<.*>)?)(,)?)*)\)
There are a few functions that it doesn't like to parse, but appear to match the pattern. I'm not worried about matching functions that aren't members of a class at the moment (can handle that later). The expression is used in a C# program, so the <label>s are for easily retrieving the groups.
I'm wondering if there is a standard regex to parse all functions, or how to improve mine to handle the odd exceptions?
C++ is notoriously hard to parse; it is impossible to write a regex that catches all cases. For example, there can be an unlimited number of nested parentheses, which shows that even this subset of the C++ language is not regular.
But it seems that you're going for practicality, not theoretical correctness. Just keep improving your regex until it catches the cases it needs to catch, and try to make it as stringent as possible so you don't get any false matches.
Without knowing the "odd exceptions" that it doesn't catch, it's hard to say how to improve the regex.
Take a look at Boost.Spirit, it is a boost library that allows the implementation of recursive descent parsers using only C++ code and no preprocessors. You have to specify a BNF Grammar, and then pass a string for it to parse. You can even generate an Abstract-Syntax Tree (AST), which is useful to process the parsed data.
The BNF specification for a list of integers or words separated by whitespace might look like:
using spirit::alpha_p;
using spirit::digit_p;
using spirit::anychar_p;
using spirit::end_p;
using spirit::space_p;
// Inside the definition...
integer = +digit_p; // One or more digits.
word = +alpha_p; // One or more letters.
token = integer | word; // An integer or a word.
token_list = token >> *(+space_p >> token); // A token, followed by 0 or more tokens.
For more information refer to the documentation, the library is a bit complex at the beginning, but then it gets easier to use (and more powerful).
No. Even function prototypes can have arbitrary levels of nesting, so cannot be expressed with a single regular expression.
If you really are restricting yourself to things very close to your example (exactly 2 arguments, etc.), then could you provide an example of something that doesn't match?
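To illustrate the nesting problem mentioned in the answers above: one practical option is to let the regex capture the whole parameter list and then split it with a small depth counter instead of more regex. The sketch below is an assumption about how one might do that, not a full parser; ParameterSplitter and SplitTopLevel are hypothetical names, and it will be confused by things like operator< or comparison operators in default arguments.

```csharp
using System;
using System.Collections.Generic;

static class ParameterSplitter
{
    // Splits "a, b<c, d>, e(f, g)" on commas that are not nested inside
    // parentheses, angle brackets or square brackets.
    public static List<string> SplitTopLevel(string parameters)
    {
        var parts = new List<string>();
        int depth = 0;
        int start = 0;

        for (int i = 0; i < parameters.Length; i++)
        {
            char c = parameters[i];
            if (c == '(' || c == '<' || c == '[')
            {
                depth++;
            }
            else if (c == ')' || c == '>' || c == ']')
            {
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                parts.Add(parameters.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }

        // Whatever follows the last top-level comma is the final parameter.
        string last = parameters.Substring(start).Trim();
        if (last.Length > 0) parts.Add(last);
        return parts;
    }
}
```

For the signature in the question, the params group "int, class myOtherClass * " would come back as two entries, and a template argument list containing commas stays in one piece.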

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification. The reason I need this is to allow me to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to a syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages, i.e.
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
1. Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .NET source).
2. Change the code to have a readonly property with the failure index.
3. Make sure your code uses that Regex rather than Microsoft's.
I guess such an index would only have meaning in some simple case, like in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what should the index you are talking about be?
It will depend on the algorithm used for matching...
Could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
It is not possible to tell where a regex fails. As a result, you need to take a different approach: compare strings. Use a regex to remove all the things that could vary, and compare the result with the string that you know does not change.
I ran into the same problem, came across this question, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
hope it helps
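For the tokenizer use case in the question, where a statement is a fixed sequence of parts, one workaround is to match the parts one after another and report the position at which the first one fails. This is a sketch under that assumption, not a general answer for arbitrary patterns: SequentialMatcher and FirstFailureIndex are hypothetical names, and there is no backtracking across parts.

```csharp
using System;
using System.Text.RegularExpressions;

static class SequentialMatcher
{
    // Matches each part in order, each one starting exactly where the previous
    // one ended. Returns -1 when the whole input is consumed, otherwise the
    // index at which matching could not continue.
    public static int FirstFailureIndex(string input, params string[] parts)
    {
        int pos = 0;
        foreach (string part in parts)
        {
            // \G anchors the match at the start position passed to Match().
            Match m = new Regex(@"\G(?:" + part + ")").Match(input, pos);
            if (!m.Success) return pos;
            pos += m.Length;
        }
        // Leftover input after the last part also counts as a failure.
        return pos == input.Length ? -1 : pos;
    }
}
```

For the example in the question, FirstFailureIndex("abd", "a", "b", "c") gives 2, and a statement like "INP x" checked against the parts "[A-Z]{3}", @"\s+", @":\w+:" fails at index 4, where the label was expected.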
