I am trying to add a Tab space in a string which looks like this:
String 1: 1.1_1ATitle of the Chapter
String 2: 1.1_1Title of the Chapter
There is no space between "_1A" and "T".
or between "_1" and "T".
The desired output is
1.1_1A Title of the Chapter.
1.1_1 Title of the Chapter.
Here is what I tried:
string output= Regex.Replace(input, "^([\\d.]+)", "$\t");
also
string output= Regex.Replace(input, "^([\\d[A-Z]]+)", "$1 \t");
also
string output= Regex.Replace(input, "^([\\d.]+)", "\\t");
Can I have a single Regex for both the inputs?
Many Thanks
You've complicated things quite a bit with the introduction of a letter in the version/index (or whatever the first part is). You might get it to work with this though:
([\d._]+(?:[A-Z](?=[A-Z]))?)
(Note! The backslashes are not escaped for a C# string literal here. Check the ideone example for that.)
It grabs everything consisting of digits, dots and underscore. Then, in an optional non-capturing group, it matches (included in previous capture group) a capital letter, if it is followed by another capital letter (positive look-ahead).
This does however assume that the title always starts with a capital letter. I.e. if the numeric part is followed by two capital letters, it's assumed that the first is part of the numeric part.
Replace with $1\t to get the desired effect.
See it here at ideone.
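A minimal C# sketch of the replacement described above. The class and method names are made up for illustration, and the leading ^ anchor is an assumption added here (it is not in the pattern above) so that only the index at the start of the string is touched:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChapterIndexTab
{
    // The pattern from the answer as a verbatim string, so the backslashes
    // need no extra escaping. The leading ^ is an assumption added here.
    private static readonly Regex IndexPrefix =
        new Regex(@"^([\d._]+(?:[A-Z](?=[A-Z]))?)");

    // Inserts a tab after the leading index. "$1\t" is a regular C# string,
    // so \t is already a real tab character when the regex engine sees it.
    public static string InsertTab(string input)
    {
        return IndexPrefix.Replace(input, "$1\t");
    }

    public static void Main()
    {
        Console.WriteLine(InsertTab("1.1_1ATitle of the Chapter"));
        Console.WriteLine(InsertTab("1.1_1Title of the Chapter"));
    }
}
```

Both sample inputs come out as the index, a tab, and then the title. The same caveat applies as above: a title starting with two capital letters would lose its first letter to the index.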
Related
I have a regex that is able to space words correctly, however, if something has a capitalized shortcode, it will not work.
What I'm trying to do is turn something like "TSTApplicationType" into "TST Application Type".
Currently, I'm using Regex.Replace(value, "([a-z])_?([A-Z])", "$1 $2") to add the spaces to the words, however this just turns it into "TSTApplication Type".
You may use either of the two:
// Details on Approach 1
Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ")
// Details on Approach 2
Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ")
See regex demo #1 and regex demo #2
Details on Approach 1
\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$) matches
\p{Lu}{2,}(?=\p{Lu}) - 2 or more uppercase letters followed with an uppercase letter
| - or
(?>\p{Lu}\p{Ll}*)(?!$) - an uppercase letter and then 0 or more lowercase letters not at the end of string.
The replacement is the whole match (referenced with $&) and a space.
Details on Approach 2
This is a common approach that is basically inserting a space in between an uppercase letter and an uppercase letter followed with a lowercase letter ((?<=\p{Lu})(?=\p{Lu}\p{Ll})) or (|) between a lowercase letter and an uppercase letter (see (?<=\p{Ll})(?=\p{Lu})).
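To make the two approaches easier to compare, here is a small self-contained sketch that runs both on the input from the question. The class and method names are hypothetical; the patterns are the ones explained above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSpacer
{
    // Approach 1: match each chunk that needs a trailing space, then
    // re-emit the whole match ($&) followed by a space.
    public static string SplitByMatching(string text)
    {
        return Regex.Replace(
            text,
            @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)",
            "$& ");
    }

    // Approach 2: match the empty positions between words and put a
    // space there; no characters are consumed.
    public static string SplitByLookarounds(string text)
    {
        return Regex.Replace(
            text,
            @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})",
            " ");
    }

    public static void Main()
    {
        Console.WriteLine(SplitByMatching("TSTApplicationType"));    // TST Application Type
        Console.WriteLine(SplitByLookarounds("TSTApplicationType")); // TST Application Type
    }
}
```

Approach 2 never consumes text, so it tends to be the easier one to extend with further alternatives.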
If you don't mind using Humanizer they also have this as well when you try to do .Humanize() on a string. This however doesn't preserve casing, but would be another option if you actually had wanted to change the casing.
"TSTApplicationType".Humanize(LetterCasing.Title); // TST Application Type
I would like to write a regular expression to replace all dashes that are immediately preceded or followed by a string of only digits.
See this example, I have highlighted the dashes I wish to replace:
Scotland Primary School (4-11) - (PPI)
holiday-castle@cwwdssaa.org 14-19 Holiday & Clusters - (FR)
SF-00014
www7902-az2388
793902-SS2388
7902-az2388
The dashes I'd like to replace are formatted -
In bold is the string of adjacent digits indicating it should be formatted. As you can see there are dashes in the text that should not be formatted i.e. the ones in the email address, surrounded by spaces or not adjacent to a complete set of digits.
So far I have written this, but not sure how to take it further:
(-\b\d+\b|\b\d+\b-)
You could use lookarounds to check for digits on either side:
string input = "Scotland Primary School (4-11) - (PPI)";
string result = Regex.Replace(input, @"(?<=(^|\s)\d+)-|-(?=\d+(\s|$))", ",");
Console.WriteLine(result);
Demo
I have assumed here that the replacement is comma, since I didn't actually see anything in your question about what the replacement should be.
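One caveat: the lookarounds above require the digit run to be delimited by whitespace or a string edge, so as far as I can tell the dash in "(4-11)" is left alone, because the digits there touch parentheses. A possible variant, offered only as a sketch, swaps those checks for word boundaries (\b). It assumes that "a string of only digits" means a digit run with no adjacent letters or underscores; the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashReplacer
{
    // Variant pattern (an assumption, not the pattern from the answer):
    // a dash right after a whole-word digit run, or right before one.
    private static readonly Regex DashNextToNumber =
        new Regex(@"(?<=\b\d+)-|-(?=\d+\b)");

    public static string ReplaceDashes(string input, string replacement)
    {
        return DashNextToNumber.Replace(input, replacement);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Scotland Primary School (4-11) - (PPI)",
            "SF-00014",
            "www7902-az2388"
        };
        foreach (string sample in samples)
        {
            Console.WriteLine(ReplaceDashes(sample, ","));
        }
    }
}
```

With this variant the dash in "(4-11)" and in "SF-00014" is replaced, while the one in "www7902-az2388" is not, since neither side of it is a standalone number.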
I have a string "myname 18-may 1234" and I want only "myname" from whole string using a regex.
I tried using the \b(^[a-zA-Z]*)\b regex and that gave me "myname" as a result.
But when the string changes to "1234 myname 18-may" the regex does not return "myname". Please suggest the correct way to select only "myname" whole word.
Is it also possible, given a string in the "1234 myname 18-may" format, to get only myname and not may?
UPDATE
Judging by your feedback to your other question you might need
(?<!\p{L})\p{L}+(?!\p{L})
ORIGINAL ANSWER
I have come up with a lighter regex that relies on the specific nature of your data (just a couple of words in the string, only one is whole word):
\b(?<!-)\p{L}+\b
See demo
Or even a more restrictive regex that finds a match only between (white)spaces and string start/end:
(?<=^|\s)\p{L}+(?=\s|$)
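A short sketch of how the whitespace-delimited pattern could be used from C#. The helper name is hypothetical, and returning null when nothing matches is just one possible convention:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordFinder
{
    // A run of letters that has whitespace or a string edge on both sides.
    private static readonly Regex LettersOnlyToken =
        new Regex(@"(?<=^|\s)\p{L}+(?=\s|$)");

    // Returns the first letters-only token, or null if there is none.
    public static string FirstWord(string input)
    {
        Match match = LettersOnlyToken.Match(input);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstWord("myname 18-may 1234"));
        Console.WriteLine(FirstWord("1234 myname 18-may"));
    }
}
```

In both sample strings "may" is skipped because it is preceded by a hyphen rather than whitespace.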
The following regex is context-dependent:
\p{L}+(?=\s+\d{1,2}-\p{L}{3}\b)
See demo
This will match only the word myname.
The regex means:
\p{L}+ - Match 1 or more Unicode letters...
(?=\s+\d{1,2}-\p{L}{3}\b) - until it finds 1 or more whitespaces (\s+) followed with 1 or 2 digits, followed with a hyphen and 3 Unicode letters (\p{L}{3}) which is a whole word (\b). This construction is a positive look-ahead that only checks if something can be found after the current position in the string, but it does not "consume" text.
Since the date may come before the string, you can add an alternation:
\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+
See another demo
The (?<=\d{1,2}-\p{L}{3}[ ]+) is a look-behind that checks for almost the same thing as the look-ahead, but before myname.
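For completeness, a sketch that applies the alternation to the date-after and date-before layouts. The class and method names are hypothetical, and the "18-may myname 1234" sample is made up here to exercise the look-behind branch:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameNearDate
{
    // Letters followed by a "dd-mmm" date, or letters preceded by one.
    private static readonly Regex NamePattern = new Regex(
        @"\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+");

    // Returns the matched name, or null if no name sits next to a date.
    public static string Extract(string input)
    {
        Match match = NamePattern.Match(input);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("myname 18-may 1234")); // date after the name
        Console.WriteLine(Extract("1234 myname 18-may")); // date after the name
        Console.WriteLine(Extract("18-may myname 1234")); // date before the name
    }
}
```

Note that the variable-length look-behind relies on the .NET regex engine; many other engines would reject it.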
Here is a solution without regex (it needs using System.Linq):
string input = "myname 18-may 1234";
string result = input.Split(' ').Where(x => x.All(y => char.IsLetter(y))).FirstOrDefault();
Replace every match of this regex with an empty string:
(\s*\d+\-.{3}\s*|\s*.{3}\-\d+\s*)|(\s*\d+\s*)
You will end up with just the name.
Demo
Trying to learn a little more about using Regex (Regular expressions). Using Microsoft's version of Regex in C# (VS 2010), how could I take a simple string like:
"Hello"
and change it to
"H e l l o"
This could be a string of any letter or symbol, capitals, lowercase, etc., and there are no other letters or symbols following or leading this word. (The string consists of only the one word).
(I have read the other posts, but I can't seem to grasp Regex. Please be kind :) ).
Thanks for any help with this. (an explanation would be most useful).
You could do this with a regex alone; there is no need for built-in C# string functions.
Use the regex below and replace the matched boundaries with a space.
(?<=.)(?!$)
DEMO
string result = Regex.Replace(yourString, @"(?<=.)(?!$)", " ");
Explanation:
(?<=.) Positive lookbehind asserts that the match must be preceded by a character.
(?!$) Negative lookahead which asserts that the match won't be followed by an end of the line anchor. So the boundaries next to all the characters would be matched but not the one which was next to the last character.
OR
You could also use word boundaries.
(?<!^)(\B|\b)(?!$)
DEMO
string result = Regex.Replace(yourString, @"(?<!^)(\B|\b)(?!$)", " ");
Explanation:
(?<!^) Negative lookbehind which asserts that the match won't be at the start.
(\B|\b) Matches the boundary between two word characters or between two non-word characters (\B), or the boundary between a word character and a non-word character (\b).
(?!$) Negative lookahead asserts that the match won't be followed by an end of the line anchor.
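Putting both variants into one small program may help; this is only a sketch, and the class and method names are invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterSpacer
{
    // Variant 1: every position that has a character before it and is
    // not the end of the string.
    public static string SpaceOut(string text)
    {
        return Regex.Replace(text, @"(?<=.)(?!$)", " ");
    }

    // Variant 2: every boundary or non-boundary position except the
    // very start and the very end.
    public static string SpaceOutWithBoundaries(string text)
    {
        return Regex.Replace(text, @"(?<!^)(\B|\b)(?!$)", " ");
    }

    public static void Main()
    {
        Console.WriteLine(SpaceOut("Hello"));               // H e l l o
        Console.WriteLine(SpaceOutWithBoundaries("Hello")); // H e l l o
    }
}
```

Both patterns match empty positions only, so nothing is removed from the input; a space is simply inserted at each matched position.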
Regex.Replace("Hello", "(.)", "$1 ").TrimEnd();
Explanation
The dot character class matches every character of your string "Hello".
The parentheses around the dot character are required so that we can refer to the captured character through the $n notation.
Each captured character is replaced by the replacement string. Our replacement string is "$1 " (notice the space at the end). Here $1 represents the first captured group in the input, therefore our replacement string will replace each character by that character plus one space.
This technique will add one space after the final character "o" as well, so we call TrimEnd() to remove that.
A demo can be seen here.
For the enthusiast, the same effect can be achieved through LINQ using this one-liner:
String.Join(" ", YourString.AsEnumerable())
or if you don't want to use the extension method:
String.Join(" ", YourString.ToCharArray())
It's very simple. To match any character, use the dot (.) and then replace it with that character followed by one extra space.
Here the parentheses (...) are used for grouping, and the group can be accessed by $index.
Find what : "(.)"
Replace with "$1 "
DEMO
How to check if a string contains a pattern separated by whitespace?
Examples:
"abc ef ds ab "
Now I would like to check if the given string consists only of the pattern [a-z] separated by whitespace. My try: ^\s*[a-z]*\s*$. But this checks only whitespace in the beginning and end, not if the whitespaces is used for separation of the content.
Try this regular expression:
/^[a-z\s]+$/
^(\s|[a-z])*$
Zero or more characters, each of which is either whitespace or a-z.
If you want to make sure there's at least one thing other than white space, then:
^\s*[a-z]+(\s*|[a-z])*$
Zero or more whitespace characters, at least one character a-z, then the same as above.
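A quick sketch to show how the two patterns differ on empty or whitespace-only input. It uses C# as an assumption, since the question does not name a language, and the helper names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LowercaseWordsCheck
{
    // Accepts any mix of whitespace and a-z, including the empty string.
    public static bool AllowsEmpty(string text)
    {
        return Regex.IsMatch(text, @"^(\s|[a-z])*$");
    }

    // Same idea, but at least one a-z character must be present.
    public static bool RequiresLetter(string text)
    {
        return Regex.IsMatch(text, @"^\s*[a-z]+(\s*|[a-z])*$");
    }

    public static void Main()
    {
        string[] samples = { "abc ef ds ab ", "   ", "abc EF" };
        foreach (string sample in samples)
        {
            Console.WriteLine("[{0}] {1} {2}",
                sample, AllowsEmpty(sample), RequiresLetter(sample));
        }
    }
}
```

The sample from the question passes both checks, a whitespace-only string passes only the first, and anything containing an uppercase letter fails both.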