Regex misunderstanding - c#

I'm trying to use regex to check for letters only, so I used the method below. The problem is that if I have a number before or after the letter, the number is ignored and nothing happens, and that's not what I'm trying to do. I'm trying to check for letters ONLY, so if the input contains anything other than letters, an error message should pop up. If I have letters only it works fine, and if I have numbers only it also works fine. The problem is that if I have a letter and a number it doesn't work correctly; other than that everything works fine.
Regex _regex = new Regex("[A-Z]");
Match Instruction_match = _regex.Match(Instruction_Seperator[1]);
if (!Instruction_match.Success) // "A," or "B," or "C,"...etc.
{
    MessageBox.Show("Error, Please letters only");
}
Note that Instruction_Seperator[1] is taken from the user through a textbox, where the user MUST input letters only, with nothing before or after the letters. Do you have any idea why the message box doesn't pop up when I input letters and numbers?
Looking forward to your replies :)
Also, can I enforce a specific size, so that an error pops up if the user exceeds it? For example, if the user is only allowed to input 3 Latin letters and nothing else, is there a length constraint in regex? :)

That pattern will match any string that contains a capital Latin letter; if it happens to contain any other characters, they will be ignored. If you want a pattern that matches only when the string consists entirely of capital Latin letters, you'll want to use start (^) and end ($) anchors, as well as a one-or-more quantifier (+) after your character class, like this:
^[A-Z]+$
In the end your code should look like this:
Regex _regex = new Regex("^[A-Z]+$");
Match Instruction_match = _regex.Match(Instruction_Seperator[1]);
if (!Instruction_match.Success) // "A," or "B," or "C,"...etc.
{
    MessageBox.Show("Error, Please letters only");
}
Given the update to your question and some other comments you've made, here are some more patterns you might need to use instead:
^[A-Z]{3}$ - This pattern will match exactly three capital Latin characters
^[A-Z]{1,3}$ - This pattern will match one, two, or three capital Latin characters
^[A-Z]([A-Z]{2})?$ - This pattern will match one or three capital Latin characters
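As a rough sketch of the difference between the unanchored and anchored patterns, the snippet below wraps the length-limited check in a helper. The name IsCapitalLettersOnly and its maxLength parameter are made up for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): true when the input
// consists of between one and maxLength capital Latin letters and nothing else.
static bool IsCapitalLettersOnly(string input, int maxLength)
{
    string pattern = "^[A-Z]{1," + maxLength + "}$";
    return Regex.IsMatch(input, pattern);
}

// Unanchored: succeeds because the string merely contains a capital letter.
Console.WriteLine(Regex.IsMatch("A1", "[A-Z]"));     // True
// Anchored: the whole string must be capital letters.
Console.WriteLine(Regex.IsMatch("A1", "^[A-Z]+$"));  // False
Console.WriteLine(IsCapitalLettersOnly("ABC", 3));   // True
Console.WriteLine(IsCapitalLettersOnly("ABCD", 3));  // False
```

In the original code, the error message would then be shown whenever the helper returns false.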

Change your pattern to:
Regex _regex = new Regex("^[A-Z]+$");

The regex you have used, [A-Z], matches only a single capital letter, anywhere in the input. Use [A-Z]+ to match a continuous run of capital letters of any length within the input. Use ^[A-Z]+$ to require that the run is anchored at both the start and the end of the input string, so that the whole input must consist of capital letters.

I was originally assuming that you would only like to match one letter; if you want any number of letters, use ^[A-Z]+$. The example below uses ^[A-Z]{1,3}$, so the matched strings are "D", "A", "AB" and "ABC":
var patterns = new string[] { "12ABC", "D", "A","AB","ABC","A2B3","A1BC", "A123", "123ABC12" };
var regex = new Regex(@"^[A-Z]{1,3}$");
foreach (var pattern in patterns)
{
    var isMatch = regex.Match(pattern);
    if (isMatch.Success)
        Console.WriteLine("Found Matching string {0}", pattern);
}
Please look at the modified code: the change for your question is to add {1,3} to the regex pattern, which means between one and three occurrences of a capital Latin letter.

Related

Trying to space words using Regex

I have a regex that is able to space words correctly; however, if something starts with a capitalized shortcode, it does not work.
What I'm trying to do is turn something like "TSTApplicationType" into "TST Application Type".
Currently, I'm using Regex.Replace(value, "([a-z])_?([A-Z])", "$1 $2") to add the spaces to the words; however, this just turns it into "TSTApplication Type".
You may use either of the two:
// Details on Approach 1
Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ")
// Details on Approach 2
Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ")
See regex demo #1 and regex demo #2
Details on Approach 1
\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$) matches
\p{Lu}{2,}(?=\p{Lu}) - 2 or more uppercase letters followed with an uppercase letter
| - or
(?>\p{Lu}\p{Ll}*)(?!$) - an uppercase letter and then 0 or more lowercase letters not at the end of string.
The replacement is the whole match (referenced with $&) and a space.
Details on Approach 2
This is a common approach that is basically inserting a space in between an uppercase letter and an uppercase letter followed with a lowercase letter ((?<=\p{Lu})(?=\p{Lu}\p{Ll})) or (|) between a lowercase letter and an uppercase letter (see (?<=\p{Ll})(?=\p{Lu})).
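A small sketch of Approach 2 in use. The helper name SpaceWords is hypothetical, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: inserts a space at every position that sits either
// between an uppercase letter and an "Uppercase + lowercase" pair,
// or between a lowercase letter and an uppercase letter.
static string SpaceWords(string text) =>
    Regex.Replace(
        text,
        @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})",
        " ");

Console.WriteLine(SpaceWords("TSTApplicationType")); // TST Application Type
Console.WriteLine(SpaceWords("ApplicationType"));    // Application Type
```

Because both alternatives are zero-width lookarounds, nothing is consumed: the replacement only inserts spaces and never removes characters.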
If you don't mind using Humanizer, it also offers this when you call .Humanize() on a string. It doesn't preserve the original casing, though, so it is mainly another option if you actually want to change the casing.
"TSTApplicationType".Humanize(LetterCasing.Title); // TST Application Type

Split String At Every Non-Letter/Non-Number Character

Imagine a string that contains special characters like $§%%,., numbers and letters.
I want to receive the letter and number chunks of an arbitrary string as an array of strings.
A good solution seems to be the use of regex, but I don't know how to express [numbers and letters].
// example
"abc" = {"abc"};
"ab .c" = {"ab", "c"}
"ab123,cd2, ,,%&$§56" = {"ab123", "cd2", "56"}
// try
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));
To extract chunks of 1 or more letters/digits you may use
[A-Za-z0-9]+ # ASCII only letters/digits
[\p{L}0-9]+ # Any Unicode letters and ASCII only digits
[\p{L}\p{N}]+ # Any Unicode letters/digits
See a regex demo.
C# usage:
string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
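For reference, here is a self-contained sketch of that call run against the examples from the question. The helper name ExtractChunks is an assumption made for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of Unicode letters/digits in the input.
static string[] ExtractChunks(string input) =>
    Regex.Matches(input, @"[\p{L}\p{N}]+")
         .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
         .Select(m => m.Value)   // keep only the matched text
         .ToArray();

Console.WriteLine(string.Join("|", ExtractChunks("ab123,cd2, ,,%&$§56"))); // ab123|cd2|56
Console.WriteLine(string.Join("|", ExtractChunks("ab .c")));               // ab|c
```

Collecting matches avoids the empty entries that splitting on the separators can produce when the input starts or ends with special characters.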
Yes, regex is indeed a good solution for this.
And in fact, to just match all standard words in the input sequence, this is all you need:
(\w+)
Let me quickly explain
\w matches any word character. With RegexOptions.ECMAScript it is equivalent to [a-zA-Z0-9_], matching a through z, A through Z, 0-9 or _; by default in .NET it also matches Unicode letters and digits. You might wanna go with [a-zA-Z0-9] instead to avoid that underscore.
Wrapping an expression in () means that you want to capture that part as a group.
The + means that you want sequences of 1 or more of the preceding characters.
Refer to a regular expression cheat sheet to see all the possibilities, such as
https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Or any that you find online.
Also there are tools available to quickly test out your regular expressions, such as
https://regex101.com/ (quite well visualised matching)
or http://regexstorm.net/tester specifically for .NET

Regex to match trimmed string consisting of words separated by only 1 space char

I am looking for a regex to validate input in C#. The regex has to match an arbitrary number of words which are separated with only 1 space character in between. The matched string cannot start or end with whitespace characters (this is where my problem is).
Example: some sample input 123
What I've tried: /^(\S+[ ]{0,1})+$/gm this pattern almost does what is required but it also matches 1 trailing space.
Any ideas? Thanks.
I tried this one and it seems to work:
Regex regex = new Regex(@"^\S+([ ]{1}\S+)*$");
It checks if your string starts with a word followed by zero or more entities of a single white space followed by a word. So trailing white spaces are not allowed.
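A quick sketch that exercises this pattern on a few inputs. The helper name IsTrimmedSingleSpaced is hypothetical, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the input is a sequence of words
// separated by exactly one space, with no leading or trailing whitespace.
static bool IsTrimmedSingleSpaced(string input) =>
    Regex.IsMatch(input, @"^\S+([ ]{1}\S+)*$");

Console.WriteLine(IsTrimmedSingleSpaced("some sample input 123")); // True
Console.WriteLine(IsTrimmedSingleSpaced("some sample "));          // False (trailing space)
Console.WriteLine(IsTrimmedSingleSpaced(" some sample"));          // False (leading space)
Console.WriteLine(IsTrimmedSingleSpaced("some  sample"));          // False (two spaces)
```

Note that [ ]{1} behaves the same as a single literal space; the quantifier is only there for readability.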

Regex, replace group between other groups?

I have such a regex:
string ipPort = @"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}[\s\S]*?[0-9]{1,5}";
Regex Rx = new Regex(ipPort,RegexOptions.Singleline);
List<string> catched = new List<string>();
foreach (Match ItemMatch in Rx.Matches(page))
{
    catched.Add(ItemMatch.ToString());
}
It finds an IP address, followed by any number of characters, followed by a port number. I want this "any number of characters" part replaced by a single colon ":". How can I do that? I'm not very experienced with regular expressions...
You can use this general expression that uses lookarounds in order to find a pattern between a prefix and a suffix:
(?<=prefix)find(?=suffix)
Applied to your specific problem:
(?<=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})[^0-9].*?(?=[0-9]{1,5})
Note that I have added a [^0-9] meaning "not a digit". There must be at least one non-digit character there, because otherwise the search cannot distinguish between digits belonging to the last ip-group and the port number.
You can also repeat the number-dot group three times and then append the fourth number
(?<=([0-9]{1,3}\.){3}[0-9]{1,3})[^0-9].*?(?=[0-9]{1,5})
You can also replace [\s\S] (space or non-space character) by . (any character); since you are using RegexOptions.Singleline, the dot matches newlines as well.
Applied to our general expression, now we have:
prefix (ip): ([0-9]{1,3}\.){3}[0-9]{1,3}
find (stuff to be replaced by colon): [^0-9].*?
suffix (port): [0-9]{1,5}
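As a sketch of how the lookaround pattern could be plugged into Regex.Replace, under the assumption that the goal is a cleaned-up copy of the page text: the helper name JoinIpAndPort and the sample inputs are made up for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces whatever sits between an IP address and the
// following port number with a single colon.
static string JoinIpAndPort(string page)
{
    const string pattern =
        @"(?<=([0-9]{1,3}\.){3}[0-9]{1,3})[^0-9].*?(?=[0-9]{1,5})";
    // Singleline lets "." cross line breaks, as in the question's code.
    return Regex.Replace(page, pattern, ":", RegexOptions.Singleline);
}

Console.WriteLine(JoinIpAndPort("192.168.0.1 port 8080"));  // 192.168.0.1:8080
Console.WriteLine(JoinIpAndPort("10.0.0.1</td><td>80"));    // 10.0.0.1:80
```

Only the text between the lookbehind and the lookahead is consumed, so the IP address and the port themselves are left untouched. This relies on .NET supporting variable-length lookbehind, which many other regex engines do not.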

Regex match if a string has length 2 and contains 1 letter and 1 number

Guys I hate Regex and I suck at writing.
I have a string that is space-separated and contains several codes that I need to pull out. Each code is marked by beginning with a capital letter and ending with a number. Each code is only two characters long.
I'm trying to create an array of strings from the initial string and I can't get the regular expression right.
Here is what I have
String[] test = Regex.Split(originalText, "([a-zA-Z0-9]{2})");
I also tried:
String[] test = Regex.Split(originalText, "([A-Z]{1}[0-9]{1})");
I don't have any experience with Regex as I try to avoid writing them whenever possible.
Anyone have any suggestions?
Example input:
AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80
I need to pull out F7, A4, B7. E0 should be ignored.
You want to collect the results, not split on them, right?
Regex regexObj = new Regex(@"\b[A-Z][0-9]\b");
MatchCollection allMatchResults = regexObj.Matches(subjectString);
should do this. The \bs are word boundaries, making sure that only entire strings (like A1) are extracted, not substrings (like the A1 in TWA101).
If you also need to exclude "words" with non-word characters in them (like E0.M80 in your comment), you need to define your own word boundary, for example:
Regex regexObj = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");
Now A1 only matches when surrounded by whitespace (or start/end-of-string positions).
Explanation:
(?<= # Assert that we can match the following before the current position:
^ # Start of string
| # or
\s # whitespace.
)
[A-Z] # Match an uppercase ASCII letter
[0-9] # Match an ASCII digit
(?= # Assert that we can match the following after the current position:
\s # Whitespace
| # or
$ # end of string.
)
If you also need to find non-ASCII letters/digits, you can use
\p{Lu}\p{N}
instead of [A-Z][0-9]. This finds all uppercase Unicode letters and Unicode digits (like Ä٣), but I guess that's not really what you're after, is it?
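To make the difference between the two boundary definitions concrete, here is a sketch that runs both patterns on the sample input from the question. The helper name FindCodes is hypothetical, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the text of every match of the pattern.
static string[] FindCodes(string text, string pattern) =>
    Regex.Matches(text, pattern)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string sample = "AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80";

// \b version: E0 is also returned, because "." ends a word.
Console.WriteLine(string.Join(" ", FindCodes(sample, @"\b[A-Z][0-9]\b")));
// prints: F7 A4 Y7 B7 E0

// Whitespace-delimited version: E0.M80 is skipped entirely.
Console.WriteLine(string.Join(" ", FindCodes(sample, @"(?<=^|\s)[A-Z][0-9](?=\s|$)")));
// prints: F7 A4 Y7 B7
```

Note that Y7 is returned by both patterns, since it has the same shape as the other codes.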
Do you mean that each code looks like "A00"?
Then this is the regex:
"[A-Z][0-9][0-9]"
Very simple... By the way, there's no point writing {1} in a regex. [0-9]{1} means "match exactly one digit", which is exactly like writing [0-9].
Don't give up, simple regexes make perfect sense.
This should be ok:
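String[] all_codes = Regex.Matches(originalText, @"\b[A-Z]\d\b").Cast<Match>().Select(m => m.Value).ToArray();
It gives you an array with all codes made of a capital letter followed by a digit, delimited by some kind of word boundary (whitespace etc.). Regex.Matches is used here rather than Regex.Split, because Split returns the text between the codes instead of the codes themselves; Cast and Select require using System.Linq.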
