Trying to learn a little more about using Regex (Regular expressions). Using Microsoft's version of Regex in C# (VS 2010), how could I take a simple string like:
"Hello"
and change it to
"H e l l o"
This could be a string of any letter or symbol, capitals, lowercase, etc., and there are no other letters or symbols following or leading this word. (The string consists of only the one word).
(I have read the other posts, but I can't seem to grasp Regex. Please be kind :) ).
Thanks for any help with this. (an explanation would be most useful).
You could do this with regex alone, without any built-in C# string functions.
Use the regex below and replace the matched boundaries with a space.
(?<=.)(?!$)
string result = Regex.Replace(yourString, @"(?<=.)(?!$)", " ");
Explanation:
(?<=.) Positive lookbehind asserts that the match must be preceded by a character.
(?!$) Negative lookahead which asserts that the match won't be followed by an end of the line anchor. So the boundaries next to all the characters would be matched but not the one which was next to the last character.
OR
You could also use word boundaries.
(?<!^)(\B|\b)(?!$)
string result = Regex.Replace(yourString, @"(?<!^)(\B|\b)(?!$)", " ");
Explanation:
(?<!^) Negative lookbehind which asserts that the match won't be at the start.
(\B|\b) Matches either a non-boundary position, which lies between two word characters or between two non-word characters (\B), or a word boundary, which lies between a word character and a non-word character (\b).
(?!$) Negative lookahead asserts that the match won't be followed by an end of the line anchor.
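As a quick check of the two patterns above, here is a minimal sketch. The class and method names (SpaceOutDemo, WithLookarounds, WithBoundaries) are made up for illustration and are not part of either answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceOutDemo
{
    // Matches the empty position after any character, except the end of the input.
    public static string WithLookarounds(string s)
    {
        return Regex.Replace(s, @"(?<=.)(?!$)", " ");
    }

    // Same positions, expressed as "any boundary or non-boundary
    // that is neither at the start nor at the end".
    public static string WithBoundaries(string s)
    {
        return Regex.Replace(s, @"(?<!^)(\B|\b)(?!$)", " ");
    }

    public static void Main()
    {
        Console.WriteLine(WithLookarounds("Hello")); // H e l l o
        Console.WriteLine(WithBoundaries("a+b"));    // a + b
    }
}
```

Both patterns match only zero-width positions, so nothing in the input is consumed; the replacement simply inserts a space at each matched position.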
Regex.Replace("Hello", "(.)", "$1 ").TrimEnd();
Explanation
The dot character class matches every character of your string "Hello".
The parentheses around the dot are required so that we can refer to the captured character through the $n notation.
Each captured character is replaced by the replacement string. Our replacement string is "$1 " (notice the space at the end). Here $1 represents the first captured group in the input, therefore our replacement string will replace each character by that character plus one space.
This technique will add one space after the final character "o" as well, so we call TrimEnd() to remove that.
A demo can be seen here.
For the enthusiast, the same effect can be achieved through LINQ using this one-liner:
String.Join(" ", YourString.AsEnumerable())
or if you don't want to use the extension method:
String.Join(" ", YourString.ToCharArray())
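To compare the capture-group replacement with the String.Join variant, here is a small sketch. JoinDemo, WithCapture and WithJoin are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class JoinDemo
{
    // Replace every character with itself plus a space, then drop the trailing space.
    public static string WithCapture(string s)
    {
        return Regex.Replace(s, "(.)", "$1 ").TrimEnd();
    }

    // Join the individual characters with a single space; no regex involved.
    public static string WithJoin(string s)
    {
        return string.Join(" ", s.AsEnumerable());
    }

    public static void Main()
    {
        Console.WriteLine(WithCapture("Hello")); // H e l l o
        Console.WriteLine(WithJoin("Hello"));    // H e l l o
    }
}
```

One difference worth knowing: TrimEnd() also strips whitespace that was already at the end of the input, while the String.Join version keeps every original character. For a single word with no surrounding whitespace, as in the question, both give the same result.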
It's very simple. To match any character, use . (dot), then replace each match with that character plus one extra space.
Here the parentheses (...) form a group whose contents can be accessed by $index.
Find what : "(.)"
Replace with "$1 "
I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string,
where the following conditions should be matched:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12(Capital and normal alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The regex above works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
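Note: \w is shorthand for [a-zA-Z0-9_] in engines such as PCRE, unless the u modifier is set. In .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is specified.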
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you only want to disallow inputs made up entirely of underscores (like ____) or entirely of whitespace (like " "), split this lookahead into (?!_+$) and (?!\s+$).
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
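The pattern above can be tried against the sample inputs from the question with a sketch like the following. NameValidator and IsValid are hypothetical names for illustration; in an actual model the same pattern string would go into the data annotation instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Two negative lookaheads reject digit-only and underscore/whitespace-only
    // input; the character class then has to consume the whole string.
    private static readonly Regex Pattern =
        new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123", "123", "_", " ", "ameya#1" };
        foreach (string sample in samples)
        {
            Console.WriteLine("\"{0}\" -> {1}", sample, IsValid(sample));
        }
    }
}
```

The first five samples are accepted and the last four are rejected, which matches the valid and invalid lists in the question.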
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores and spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*
I have a string which looks like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString.0.Id"
and I want the result to look like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id"
Basically, wherever a single digit is preceded and followed by a period, I need to change it to [digit] followed by a period, i.e. "[digit].". I have seen tons of examples where people only replace the matched string.
How can I do this using Regex.Replace in C#?
Regex.Replace(input, @"\.(\d)(?=\.)", "[$1]")
\. - match a "."
(\d) - then a single digit in a capturing group ($1 in the replacement)
(?= - start a positive lookahead
\. - that matches a "."
) - end the lookahead
So, it means : (match a dot followed by a digit in a capturing group) only if it is followed by a dot
So we matched ".0" and captured "0". We replace the entire match with "[$1]", where $1 refers to the first captured group.
See "Grouping Constructs in Regular Expressions" : https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx for information about the different grouping constructs that I use in this solution.
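Here is a minimal sketch of that replacement applied to the path from the question, plus a few edge cases. PathIndexer and Bracketize are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathIndexer
{
    // Turn ".<digit>" into "[<digit>]" when another "." follows.
    // The lookahead does not consume that dot, so it stays in the output
    // and can also start the next match.
    public static string Bracketize(string path)
    {
        return Regex.Replace(path, @"\.(\d)(?=\.)", "[$1]");
    }

    public static void Main()
    {
        string input = "$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString.0.Id";
        Console.WriteLine(Bracketize(input));
        // $.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id

        Console.WriteLine(Bracketize("a.1.2.b")); // a[1][2].b
        Console.WriteLine(Bracketize("a.12.b"));  // a.12.b (two digits, no match)
    }
}
```

Note that a digit at the very end of the string (for example "a.0") is left alone, because the lookahead requires a following dot.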
I'm trying to write a regular expression to transform words written like "H e l l o Everyone" to "Hello Everyone".
If it is words separated by spaces like "Hello everyone, how are you?", nothing should happen.
Basically, all single characters should be squeezed together to make a word, and we only consider it a match if more than 2 characters follow this pattern.
If it is like "a b cdef" - Nothing should happen
But "a b c def" -> "abc def"
I tried something like this "^\w(?:(\s)\w)*$" but it is matching with "Hello world" as well.
And also, I'm not sure on how to squeeze these single characters.
Any help is greatly appreciated.
Thanks!
I suggest matching chunks of single word chars separated by single whitespaces and then removing the spaces within each match using a match evaluator.
The regex is
(?<!\S)\w(?:\s\w){2,}(?!\S)
See its demo at RegexStorm. The (?<!\S) and (?!\S) make sure these chunks are enclosed with whitespaces (or are at string start/end).
Details:
(?<!\S) - a negative lookbehind making sure there is a whitespace or start of string immediately before the current location
\w - a word char (letter/digit/underscore, to match a letter, use \p{L} instead)
(?:\s\w){2,} - 2 or more sequences of:
\s - a whitespace
\w - a word char
(?!\S) - a negative lookahead making sure there is a whitespace or end of string immediately after the current location
See the C# demo:
var res = Regex.Replace(s, @"(?<!\S)\w(?:\s\w){2,}(?!\S)", m =>
    new string(m.Value
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray()));
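For reference, the match-evaluator approach can be wrapped in a self-contained sketch and run against the examples from the question. Squeezer and Squeeze are hypothetical names, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Squeezer
{
    // Find runs of three or more single word characters separated by single
    // whitespace characters, then strip the whitespace inside each run.
    public static string Squeeze(string s)
    {
        return Regex.Replace(s, @"(?<!\S)\w(?:\s\w){2,}(?!\S)", m =>
            new string(m.Value
                .Where(c => !char.IsWhiteSpace(c))
                .ToArray()));
    }

    public static void Main()
    {
        Console.WriteLine(Squeeze("H e l l o Everyone"));           // Hello Everyone
        Console.WriteLine(Squeeze("a b c def"));                    // abc def
        Console.WriteLine(Squeeze("a b cdef"));                     // a b cdef
        Console.WriteLine(Squeeze("Hello everyone, how are you?")); // unchanged
    }
}
```

The {2,} quantifier is what enforces the "more than 2 characters" rule: "a b" followed by a longer word never reaches the required three single characters, so it is left untouched.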
If you're looking for a pure regex solution,
Regex.Replace(s, @"(?<=^\w|(\s\w)+)\s(?=(\w\s)+|\w$)", string.Empty);
replaces a space with at least one space and letter pair on each side with nothing (with a little extra to handle start/end of the string).
I'm looking for a regexp (or any other solution) that would let me replace all whitespace characters between specific non whitespace chars. Eg:
instance. method
instance .method
"instance" .method
instance. "method"
Is it possible?
EDIT:
In other words - I want to throw out whitespace if it's between letter and dot, dot and letter, quotation mark and dot or dot and quotation mark.
Using lookaheads and lookbehinds:
var regex = new Regex("(?<=[a-zA-Z])\\s+(?=\\.)|(?<=\\.)\\s+(?=[a-zA-Z])|(?<=\")\\s+(?=\\.)|(?<=\\.)\\s+(?=\")");
Console.WriteLine(regex.Replace("instance. method", ""));
Console.WriteLine(regex.Replace("instance .method", ""));
Console.WriteLine(regex.Replace("\"instance\" .method", ""));
Console.WriteLine(regex.Replace("instance. \"method\"", ""));
Result:
instance.method
instance.method
"instance".method
instance."method"
The regex has four parts:
(?<=[a-zA-Z])\s+(?=\.) //Matches [a-zA-Z] before and . after:
(?<=\.)\s+(?=[a-zA-Z]) //Matches . before and [a-zA-Z] after
(?<=")\s+(?=\.) //Matches " before and . after
(?<=\.)\s+(?=") //Matches . before and " after
I want to throw out whitespace if it's between letter and dot, dot and letter, quotation mark and dot or dot and quotation mark.
I would use something like this:
@"(?i)(?:(?<=\.) (?=[""a-z])|(?<=[""a-z]) (?=\.))"
regex101 demo
Or broken down:
(?i) // makes the regex case insensitive.
(?:
(?<=\.) // ensure there's a dot before the match
[ ] // space (enclose it in [] if you use the expanded mode; otherwise you don't need the [])
(?=[a-z""]) // ensure there's a letter or quote after the match
| // OR
(?<=[a-z""]) // ensure there's a letter or quote before the match
[ ] // space
(?=\.) // ensure there's a dot after the match
)
In a variable:
var reg = new Regex(@"(?i)(?:(?<=\.) (?=[""a-z])|(?<=[""a-z]) (?=\.))");
What you are looking for (and what to search for on Google) is "character lookahead and lookbehind". Basically, you use regex to find all instances of whitespace characters, or split the string by whitespace (I prefer this one), then look ahead and behind each match to see whether the previous and next characters meet your criteria, and replace at that position if necessary.
Unfortunately I do not know of a "single statement" solution for what you are attempting to do.
Is this what you seek? (regex101 link)
[A-Za-z"](\s)\.|\.(\s)[A-Za-z"]
You can parse the string with word bounds:
^([\w\".]*)([\s])([\w\".]*)$
$1 will give you the first part.
$2 will give you the white space.
$3 will give you the end part.
Regex.Replace(instance, "([\\w\\d\".])\\s([\\w\\d\".])", "$1$2");
One alternate and simple solution would be to split the string on dot and then trim them.
I want to search for a string like "$12,56,450" using Regex in C#, but my pattern doesn't match the string.
Here is my code:
string input="Total earn for the year $12,56,450";
string pattern = @"\b(?mi)($12,56,450)\b";
Regex regex = new Regex(pattern);
if (regex.Match(input).Success)
{
return true;
}
This Regex will do the job, (?mi)(\$\d{2},\d{2},\d{3}), and here's a Regex 101 to prove it.
Now let's break it down a little:
\$ matches the literal $ at the beginning of the amount
\d{2} matches any two digits
, matches the literal ,
\d{2} matches any two digits
, matches the literal ,
\d{3} matches any three digits
Now, for the purposes of the demonstration I removed the word boundaries, \b, but I'm also pretty confident you don't need them anyway. See, word boundaries aren't generally necessary for such a finite string match. Consider their definition:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
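To see how the escaping and the word boundaries interact, here is a small sketch using the input from the question. CurrencyDemo and its method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyDemo
{
    public const string Input = "Total earn for the year $12,56,450";

    // Escaped $ and digit counts, no word boundaries.
    public static bool GeneralPattern()
    {
        return Regex.IsMatch(Input, @"\$\d{2},\d{2},\d{3}");
    }

    // Unescaped $ is the end-of-input anchor, so digits can never follow it.
    public static bool UnescapedDollar()
    {
        return Regex.IsMatch(Input, @"\b$12,56,450\b");
    }

    // Escaped $, but a leading \b: a space followed by '$' is two non-word
    // characters in a row, so there is no word boundary at that position.
    public static bool LeadingBoundary()
    {
        return Regex.IsMatch(Input, @"\b\$12,56,450\b");
    }

    // Escaped $ with only the trailing \b, which sits after the final digit.
    public static bool TrailingBoundaryOnly()
    {
        return Regex.IsMatch(Input, @"\$12,56,450\b");
    }

    public static void Main()
    {
        Console.WriteLine(GeneralPattern());       // True
        Console.WriteLine(UnescapedDollar());      // False
        Console.WriteLine(LeadingBoundary());      // False
        Console.WriteLine(TrailingBoundaryOnly()); // True
    }
}
```

So escaping the $ is necessary but not sufficient here: a \b placed directly before a $ that follows a space can never match, which is another reason to drop it.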
You need to escape $ and some other special regex characters.
Try this: @"(?mi)(\$12,56,450)\b"; (the leading \b has to go as well, because there is no word boundary between the space and the $).
If you want, you can use \d to match a digit, and \d{2,3} to match a run of 2 or 3 digits.