Trying to space words using Regex - c#

I have a regex that spaces words correctly; however, it does not work when the string contains a capitalized shortcode.
What I'm trying to do is turn something like "TSTApplicationType" into "TST Application Type".
Currently, I'm using Regex.Replace(value, "([a-z])_?([A-Z])", "$1 $2") to add the spaces between the words; however, this just turns it into "TSTApplication Type".

You may use either of the two:
// Approach 1
Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ")
// Approach 2
Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ")
See regex demo #1 and regex demo #2
Details on Approach 1
\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$) matches
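\p{Lu}{2,}(?=\p{Lu}) - two or more uppercase letters followed by an uppercase letter (the lookahead checks for it without consuming it)
| - or
(?>\p{Lu}\p{Ll}*)(?!$) - an uppercase letter and then zero or more lowercase letters, not at the end of the string. (?>...) is an atomic group, so when a word ends the string the engine cannot backtrack into it to try a shorter match, and the last word gets no trailing space.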
The replacement is the whole match (referenced with $&) and a space.
Details on Approach 2
This is a common approach that inserts a space between an uppercase letter and an uppercase letter followed by a lowercase letter ((?<=\p{Lu})(?=\p{Lu}\p{Ll})), or (|) between a lowercase letter and an uppercase letter ((?<=\p{Ll})(?=\p{Lu})). Both alternatives are zero-width, so the replacement is just the space.
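A minimal sketch for comparing the two approaches side by side. The helper names SpaceWordsApproach1 and SpaceWordsApproach2 are hypothetical, chosen for illustration, and the sketch assumes .NET 5 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Approach 1: append a space after every word chunk that is not at the end of the string.
static string SpaceWordsApproach1(string text) =>
    Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ");

// Approach 2: insert a space at zero-width positions where a new word starts.
static string SpaceWordsApproach2(string text) =>
    Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ");

foreach (var input in new[] { "TSTApplicationType", "XMLParser", "ABC" })
{
    Console.WriteLine($"{input}: \"{SpaceWordsApproach1(input)}\" vs \"{SpaceWordsApproach2(input)}\"");
}
```

Both give "TST Application Type" and "XML Parser", but they differ on an all-caps input: Approach 1 turns "ABC" into "AB C", while Approach 2 leaves it unchanged, so it is worth testing with your own data.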

If you don't mind using Humanizer, it also offers this when you call .Humanize() on a string. By default this doesn't preserve casing, but it is another option if you actually want to change the casing.
"TSTApplicationType".Humanize(LetterCasing.Title); // TST Application Type

Related

Regex, ignore non letter characters at before capital letter

I need to write a regex that matches a server address taken from a file. The address always starts with a capital letter. The lines in the file have the form:
# note: first entry will be initial default
London, lonxx:33333
New York, NyC:222222
~CloudLondon, Clon:55555
I want a regex that matches each line starting from the first uppercase letter, so in the case of CloudLondon it should match only "CloudLondon, Clon:55555" without the "~".
I have the regex for the rest:
^[A-Z](?<Location>[\w\s]+)\s*,\s*(?<Server>\w+):(?<Port>\d+)$
but how can I ignore the characters at the beginning of the line until the first Capital letter?
Thanks to anybody who is going to answer.
You can remove the ^ anchor, put a word boundary \b in its place, and move the [A-Z] character class into the Location group:
\b(?<Location>[A-Z][\w\s]+)\s*,\s*(?<Server>\w+):(?<Port>\d+)$
See a regex demo for the group values.
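The pattern is meant to be applied one line at a time. A minimal C# sketch, assuming the lines are already in memory (the lines array stands in for something like File.ReadLines(path)), could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample lines copied from the question; real code might read them from the file instead.
string[] lines =
{
    "# note: first entry will be initial default",
    "London, lonxx:33333",
    "New York, NyC:222222",
    "~CloudLondon, Clon:55555",
};

var serverLine = new Regex(@"\b(?<Location>[A-Z][\w\s]+)\s*,\s*(?<Server>\w+):(?<Port>\d+)$");
var entries = new List<(string Location, string Server, string Port)>();

foreach (var line in lines)
{
    Match m = serverLine.Match(line);
    if (!m.Success)
        continue; // e.g. the comment line, which contains no uppercase letter

    entries.Add((m.Groups["Location"].Value, m.Groups["Server"].Value, m.Groups["Port"].Value));
}

foreach (var (location, server, port) in entries)
    Console.WriteLine($"{location} -> {server}:{port}");
```

Because $ is used without RegexOptions.Multiline, it anchors only at the end of each individual line string; running the pattern over the whole file contents at once would need that option.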

Match vocabulary words and phrases

I am writing an application/logic that takes a vocabulary word/phrase as an input parameter, and I am having trouble writing validation logic for this parameter's value!
These are the rules I've come up with:
can be up to 4 words (with hyphens or not)
one apostrophe is allowed
only regular letters are allowed (no special characters like !@#$%^&*()={}[]"";|/>/? ¶ © etc.)
numbers are disallowed
case insensitive
multiple language support (English, Russian, Norwegian, etc.) (so non-ASCII letters, including Cyrillic, must be supported)
either whole string matches or nothing
A few examples (in 3 languages):
// match:
one two three four
one-two-three-four
one-two-three four
vær så snill
тест регекс
re-read
under the hood
ONe
rabbit's lair
// not-match:
one two three four five
one two three four#
one-two-three-four five
rabbit"s lair
one' two's
one1
1900
Given the expected results above, could someone point me in the right direction on how to create a validation rule like that? If it matters, I will be writing the validation logic in C#, so I have more tools than just Regex at my disposal.
In case it helps, I have been testing several solutions, like ^[\p{Ll}\p{Lt}]+$ and (?=\S*['-])([a-zA-Z'-]+)$. The first regex seems to do a great job of allowing just the letters I need (English, Norwegian and Russian), whereas the second one makes good use of a lookahead.
\p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
\p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
\p{Lt} or \p{Titlecase_Letter}: a letter that appears at the start of a word when only the first letter of the word is capitalized.
\p{L&} or \p{Letter&}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
\p{Lm} or \p{Modifier_Letter}: a special character that is used like a letter.
\p{Lo} or \p{Other_Letter}: a letter or ideograph that does not have lowercase and uppercase variants.
Needless to say, neither of the solutions I have been testing takes into account all the rules I defined above.
You can use
\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z
See the regex demo. Details:
\A - start of string
(?!(?:[^']*'){2}) - the string cannot contain two or more apostrophes
\p{L}+ - one or more Unicode letters
(?:[\s'-]\p{L}+){0,3} - zero to three occurrences of
[\s'-] - a whitespace, ' or - char
\p{L}+ - one or more Unicode letters
\z - the very end of the string.
In C#, you can use it as
var isValid = Regex.IsMatch(text, @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");
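To check the pattern against every example listed in the question, a small harness like the following can help. The two arrays are copied from the question; the rest is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer, compiled once and reused.
var vocabulary = new Regex(@"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");

// Examples copied from the question.
string[] shouldMatch =
{
    "one two three four", "one-two-three-four", "one-two-three four", "vær så snill",
    "тест регекс", "re-read", "under the hood", "ONe", "rabbit's lair",
};
string[] shouldNotMatch =
{
    "one two three four five", "one two three four#", "one-two-three-four five",
    "rabbit\"s lair", "one' two's", "one1", "1900",
};

int failures = 0;
foreach (var s in shouldMatch)
{
    if (!vocabulary.IsMatch(s)) { failures++; Console.WriteLine($"Expected a match: {s}"); }
}
foreach (var s in shouldNotMatch)
{
    if (vocabulary.IsMatch(s)) { failures++; Console.WriteLine($"Expected no match: {s}"); }
}
Console.WriteLine(failures == 0 ? "All examples behave as listed." : $"{failures} example(s) differ.");
```

Note that the apostrophe counts as a separator here, so "rabbit's" uses up two of the four allowed segments.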

Regex for Name, Streetname, Cityname, etc

I'm programming a web application with ASP.NET MVC and C#.
In a form, a user should enter a name, a street name and a city in different fields.
Start: The entered value has to start with an 'alphabetic' character, no matter whether the language is English, Chinese, French or anything else. So characters like é and Chinese characters are OK, but characters like *¥@#1 are not allowed.
Middle: The same as the start, plus spaces (but not two spaces after each other).
End: The same as the start.
This is correct:
A b c
Abcd ef
Abcdef
This is not correct:
1abc
A1 bc
1 2 3
" a b c" (space at the start)
Question:
What is the correct regex for this?
How can I set the length?
In a second case, I want to allow the digits 0123456789 too (like the letters).
This is what I have: '^[a-zA-Z][a-zA-Z ][a-zA-Z]$'
Thank you
You want to validate strings that only contain letter words separated with a single space between them.
You may use a regex like
^\p{L}+(?: \p{L}+)*$
Or, if any whitespace is allowed:
^\p{L}+(?:\s\p{L}+)*$
See the regex demo
To make it only match strings of 3 or more chars, use
^(?=.{3})\p{L}+(?:\s\p{L}+)*$
(The added part is the (?=.{3}) lookahead.)
Details
^ - start of a string
(?=.{3}) - a positive lookahead that requires at least 3 characters (other than newlines) immediately after the start of the string
\p{L}+ - one or more Unicode letters
(?:\s\p{L}+)* - zero or more repetitions of
\s - any whitespace
\p{L}+ - one or more Unicode letters
$ - end of string
Note that if you need to use it in ASP.NET, only use this regex to validate on the server side, as on the client side, this pattern might not be correctly handled by JavaScript regex.
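A minimal sketch of such server-side validation, combining the pattern with a length check done in plain C#. The limits (3 and 50), the helper name IsValid, and the digits variant for the "second case" (treating ASCII digits 0-9 exactly like letters) are assumptions for illustration, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Words made of letters, separated by single spaces (pattern from the answer above).
var lettersOnly = new Regex(@"^\p{L}+(?: \p{L}+)*$");

// Hypothetical "second case": ASCII digits 0-9 are allowed wherever a letter is.
var lettersAndDigits = new Regex(@"^[\p{L}0-9]+(?: [\p{L}0-9]+)*$");

// Hypothetical length limits; the question does not give concrete values.
const int MinLength = 3;
const int MaxLength = 50;

// Length is checked in plain C#; string.Length counts UTF-16 code units.
static bool IsValid(Regex pattern, string value, int min, int max) =>
    value != null && value.Length >= min && value.Length <= max && pattern.IsMatch(value);

foreach (var name in new[] { "A b c", "Abcd ef", "Abcdef", "1abc", "A1 bc", "1 2 3", " a b c", "A  b" })
{
    Console.WriteLine($"\"{name}\": letters={IsValid(lettersOnly, name, MinLength, MaxLength)}, " +
                      $"letters+digits={IsValid(lettersAndDigits, name, MinLength, MaxLength)}");
}
```

Keeping the length check outside the pattern keeps the regex simple; since Length counts UTF-16 code units, characters outside the Basic Multilingual Plane count as two.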
You can use this regex:
^(?:\p{L}+ )*\p{L}+$
\p{L} matches all Unicode code points in the "Letter" category.
The regex matches zero or more repetitions of \p{L}+ followed by a space (one or more letters plus a space) and then requires one or more letters at the end, so the string cannot start or end with a space or contain two spaces in a row.
Demo
Example code:
Console.WriteLine(Regex.IsMatch("abc def", @"^(?:\p{L}+ )*\p{L}+$"));

Adding Tab space between String using Regular Expressions

I am trying to add a tab character to a string that looks like this:
String 1: 1.1_1ATitle of the Chapter
String 2: 1.1_1Title of the Chapter
There is no space between "_1A" and "T", or between "_1" and "T".
The desired output is
1.1_1A Title of the Chapter.
1.1_1 Title of the Chapter.
Here is what I tried:
string output= Regex.Replace(input, "^([\\d.]+)", "$\t");
also
string output= Regex.Replace(input, "^([\\d[A-Z]]+)", "$1 \t");
also
string output= Regex.Replace(input, "^([\\d.]+)", "\\t");
Can I have a single Regex for both the inputs?
Many Thanks
You've complicated things quite a bit by introducing a letter in the version/index (or whatever the first part is). You might get it to work with this, though:
([\d._]+(?:[A-Z](?=[A-Z]))?)
(Note: this is the raw pattern, without the backslash escaping a C# string literal would need. Check the ideone example for that.)
It captures the run of digits, dots and underscores. Then, in an optional non-capturing group (still inside the capture group), it matches a capital letter if it is followed by another capital letter (positive lookahead).
This does, however, assume that the title always starts with a capital letter, i.e. if the numeric part is followed by two capital letters, the first one is assumed to be part of the numeric part. Because the pattern is not anchored, it also assumes the title contains no further digits, dots or underscores.
Replace with $1\t to get the desired effect.
See it here at ideone.
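A C# version of the replacement, as a sketch. It adds a ^ anchor (an assumption not in the answer) so that only the leading section number is affected; without it, digits or dots inside the title, such as "v1.2", would also get a tab after them. The helper name InsertTab is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern plus a ^ anchor, so only the leading number is touched.
static string InsertTab(string input) =>
    Regex.Replace(input, @"^([\d._]+(?:[A-Z](?=[A-Z]))?)", "$1\t"); // group 1, then a real tab character

Console.WriteLine(InsertTab("1.1_1ATitle of the Chapter"));
Console.WriteLine(InsertTab("1.1_1Title of the Chapter"));
```

The replacement is a regular (non-verbatim) string literal, so \t is turned into a tab by the C# compiler and $1 is expanded by Regex.Replace.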

Regex misunderstanding

I'm trying to use a regex to check for letters only, so I used the code below. The problem is that if I have a number before or after the letter, the number is ignored and nothing happens, which is not what I want. I'm trying to check for letters ONLY, so if the input contains anything other than letters, an error message should pop up. If I enter letters only, it works fine, and if I enter numbers only, it also works fine; the problem is that when I enter both letters and numbers, it doesn't work correctly.
Regex _regex = new Regex("[A-Z]");
Match Instruction_match = _regex.Match(Instruction_Seperator[1]);
if (!Instruction_match.Success) // "A," or "B," or "C,"...etc.
{
    MessageBox.Show("Error, Please letters only");
}
Note that Instruction_Seperator[1] is taken from the user through a text box, where the user MUST input only letters, with nothing before or after them. Do you have any idea why the message box doesn't pop up when I input letters and numbers?
Looking forward to your replies :)
Can I also set a specific size, so that an error pops up if the user exceeds it? For example, if the user is allowed to input only 3 Latin letters and nothing else, is there a length constraint in regex? :)
That pattern will match any string that contains a capital Latin letter; if it happens to contain any other characters they will be ignored. If you want pattern that will match if the string contains only capital Latin letters, you'll want to use start (^) and end ($) anchors, as well as a one-or-more quantifier (+) after your character class, like this:
^[A-Z]+$
In the end your code should look like this:
Regex _regex = new Regex("^[A-Z]+$");
Match Instruction_match = _regex.Match(Instruction_Seperator[1]);
if (!Instruction_match.Success) // "A," or "B," or "C,"...etc.
{
    MessageBox.Show("Error, Please letters only");
}
Given the update to your question and some other comments you've made, here are some more patterns you might need to use instead:
^[A-Z]{3}$ - This pattern will match exactly three capital Latin characters
^[A-Z]{1,3}$ - This pattern will match one, two, or three capital Latin characters
^[A-Z]([A-Z]{2})?$ - This pattern will match one or three capital Latin characters
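The three patterns can be compared side by side with Regex.IsMatch; this sketch only prints which inputs each one accepts. The input strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The three patterns from the answer above.
var exactlyThree = new Regex(@"^[A-Z]{3}$");       // exactly three capital letters
var oneToThree = new Regex(@"^[A-Z]{1,3}$");       // one, two or three
var oneOrThree = new Regex(@"^[A-Z]([A-Z]{2})?$"); // one or three, never two

foreach (var input in new[] { "A", "AB", "ABC", "ABCD", "A1B", "abc" })
{
    Console.WriteLine($"{input,-5} exactlyThree={exactlyThree.IsMatch(input)} " +
                      $"oneToThree={oneToThree.IsMatch(input)} oneOrThree={oneOrThree.IsMatch(input)}");
}
```

In .NET, $ also matches before a final newline, so "ABC\n" is accepted by these patterns; use \z instead of $ if that must be rejected.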
Change your pattern to:
Regex _regex = new Regex("^[A-Z]+$");
The regex you have used, [A-Z], matches a single capital letter anywhere in the input, so any input containing at least one capital letter succeeds. Use [A-Z]+ to match a run of one or more consecutive capital letters, and ^[A-Z]+$ to anchor that run at both the start and the end of the input string.
I am assuming that you want to match one to three letters, so in the following example the matched strings are "D", "A", "AB" and "ABC"; if you want any number of letters, use ^[A-Z]+$.
var patterns = new string[] { "12ABC", "D", "A", "AB", "ABC", "A2B3", "A1BC", "A123", "123ABC12" };
var regex = new Regex(@"^[A-Z]{1,3}$");
foreach (var pattern in patterns)
{
    var isMatch = regex.Match(pattern);
    if (isMatch.Success)
        Console.WriteLine("Found Matching string {0}", pattern);
}
Please look at the modified code: the change for your question is to add {1,3} to the regex pattern, which means one to three occurrences of a capital Latin letter.
