I would like to write a regular expression to replace all dashes that are immediately preceded or followed by a string of only digits.
See this example, I have highlighted the dashes I wish to replace:
Scotland Primary School (4-11) - (PPI)
holiday-castle@cwwdssaa.org 14-19 Holiday & Clusters - (FR)
SF-00014
www7902-az2388
793902-SS2388
7902-az2388
The dashes I'd like to replace are formatted -
In bold is the string of adjacent digits indicating it should be formatted. As you can see there are dashes in the text that should not be formatted i.e. the ones in the email address, surrounded by spaces or not adjacent to a complete set of digits.
So far I have written this, but not sure how to take it further:
(-\b\d+\b|\b\d+\b-)
You could use lookarounds to check for digits on either side:
string input = "Scotland Primary School (4-11) - (PPI)";
string result = Regex.Replace(input, @"(?<=(^|\s)\d+)-|-(?=\d+(\s|$))", ",");
Console.WriteLine(result);
Demo
I have assumed here that the replacement is comma, since I didn't actually see anything in your question about what the replacement should be.
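Note that the pattern above only accepts digit runs bounded by whitespace or the string edges, so in the sample input the dash in (4-11) is left alone because of the parentheses. A possible variant, sketched below, uses \b word boundaries instead, as in the question's own attempt. The class and method names are made up for illustration, and the comma default is the same assumption as above.

```csharp
using System.Text.RegularExpressions;

public static class DashReplacer
{
    // A dash is replaced when a complete run of digits sits directly
    // before it (\b\d+ in the lookbehind) or directly after it (\d+\b).
    private static readonly Regex DigitDash =
        new Regex(@"(?<=\b\d+)-|-(?=\d+\b)");

    // The replacement is a parameter because the question does not say
    // what the dashes should become; "," is only a placeholder default.
    public static string ReplaceDigitDashes(string input, string replacement = ",")
    {
        return DigitDash.Replace(input, replacement);
    }
}
```

With this sketch, DashReplacer.ReplaceDigitDashes("SF-00014") changes the dash, while "www7902-az2388" is returned unchanged because neither side of the dash is a complete run of digits.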
I have a lot of movie files and I want to get the production year from each file name, as below:
Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media
Output: 2005
This code just splits all the numbers:
string[] result = Regex.Split(str, @"(\d+:)");
You must be more specific about which numbers you want. E.g.
Regex to find the year (not for splitting):
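\b(?:19\d\d|20\d\d)\b
19\d\d selects numbers like 1948, 1989.
20\d\d selects numbers like 2001, 2022.
\b marks a word boundary. It excludes years embedded in longer numbers or words, such as numbers with 5 or more digits.
| means or.
(?: ... ) groups the two alternatives so that both \b apply to each of them; without the group, | would split the pattern into \b(19\d\d) and (20\d\d)\b.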
But it is difficult to make a fool proof algorithm without knowing how exactly the filename is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968. So, 2001 is not a correct result here.
To omit the movie name, you could search backwards like this:
string productionYear =
    Regex.Match(str, @"\b(?:19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;
If instead of 720p we had a resolution of 2048p for instance, this would not be a problem, because the 2nd \b requires the number to be at the word end.
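The right-to-left search can be wrapped in a small helper, roughly as below. This is only a sketch: the YearFinder class and its method name are made up for illustration, and the pattern groups the two alternatives so that both \b anchors apply to each of them.

```csharp
using System.Text.RegularExpressions;

public static class YearFinder
{
    // RightToLeft starts the search at the end of the string, so the
    // last year-like token in the file name is the one returned.
    private static readonly Regex Year =
        new Regex(@"\b(?:19\d\d|20\d\d)\b", RegexOptions.RightToLeft);

    // Returns the matched year, or an empty string when nothing matches.
    public static string FindProductionYear(string fileName)
    {
        Match m = Year.Match(fileName);
        return m.Success ? m.Value : string.Empty;
    }
}
```

For a name such as "2001.A.Space.Odyssey.1968.720p", the search from the right reaches 1968 before it reaches 2001.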
If the production year was always the 4th item from the right, then a better way to get this year would be:
string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework
Note that the regular expression you pass to Regex.Split designates the separators, not the returned values.
I would not try to split the string; rather, match a field. Also, consider matching \d{4} and not \d+ if you want to be sure to get years and not other fields, like the resolution in your example.
You can try this:
string str = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
string year = Regex.Match(str, @"(?<=\.)(\d{4})(?=\.)").Groups[1].Value;
Console.WriteLine("Year: " + year);
Output: Year: 2005
Demo: https://dotnetfiddle.net/KM2PNk
\d{4}: This matches any sequence of four digits.
(?<=\.): This is a positive lookbehind assertion, which means that the preceding pattern must be present, but is not included in the match. In this case, the preceding pattern is a dot, so the regular expression will only match a sequence of four digits if it is preceded by a dot.
(?=\.): This is a positive lookahead assertion, which means that the following pattern must be present, but is not included in the match. In this case, the following pattern is a dot, so the regular expression will only match a sequence of four digits if it is followed by a dot.
I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions should be matched:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (uppercase and lowercase letters and numbers)
Ameya_123 (letters, underscore and numbers)
Ameya_ 123 (letters, underscore and white space)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
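Note: \w is a shorthand for [a-zA-Z0-9_] in ASCII-only engines (in PCRE, unless the u modifier is set). In .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is used.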
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w, when parsed by the .NET regex engine for server-side validation, also allows any Unicode letters, digits and some more characters, so I'd rather stick to [A-Za-z0-9_] to be consistent between server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$).
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
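To try the pattern outside of a data annotation, something like the following could be used. It is only a sketch: the NameValidator class is made up for illustration, and it assumes that Regex.IsMatch with the ^ and $ anchors is a fair stand-in for the whole-string match that the annotation requires.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Same pattern as above: reject digits-only input, reject input made
    // of underscores and/or whitespace only, then require the whole string
    // to consist of ASCII letters, digits, whitespace and underscore.
    private static readonly Regex Allowed =
        new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```

The valid samples from the question (ameya123, ameya, AMeya12, Ameya_123, Ameya_ 123) pass this check, and 123, _ and a whitespace-only string are rejected.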
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores or spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*
I am trying to add a Tab space in a string which looks like this:
String 1: 1.1_1ATitle of the Chapter
String 2: 1.1_1Title of the Chapter
There is no space between "_1A" and "T".
or between "_1" and "T".
The desired output is
1.1_1A Title of the Chapter.
1.1_1 Title of the Chapter.
Here is what I tried:
string output= Regex.Replace(input, "^([\\d.]+)", "$\t");
also
string output= Regex.Replace(input, "^([\\d[A-Z]]+)", "$1 \t");
also
string output= Regex.Replace(input, "^([\\d.]+)", "\\t");
Can I have a single Regex for both the inputs?
Many Thanks
You've complicated things quite a bit with the introduction of a letter in the version/index (or whatever the first part is). You might get it to work with this though:
([\d._]+(?:[A-Z](?=[A-Z]))?)
(Note! No C escaping of \. Check ideone example for that.)
It grabs everything consisting of digits, dots and underscore. Then, in an optional non-capturing group, it matches (included in previous capture group) a capital letter, if it is followed by another capital letter (positive look-ahead).
This does however assume that the title always starts with a capital letter. I.e. if the numeric part is followed by two capital letters, it's assumed that the first is part of the numeric part.
Replace with $1\t to get desired effect.
See it here at ideone.
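In C# the replacement might look roughly like this. It is a sketch, not the ideone code: the ChapterFormatter class is made up for illustration, and the leading ^ is an addition so that digits or dots later in the title are left alone.

```csharp
using System.Text.RegularExpressions;

public static class ChapterFormatter
{
    // ^ restricts the match to the leading index. The optional group takes
    // one capital letter into the index only when another capital follows.
    private static readonly Regex Index =
        new Regex(@"^([\d._]+(?:[A-Z](?=[A-Z]))?)");

    public static string InsertTab(string input)
    {
        // $1 re-inserts the captured index; \t is the C# escape for a tab.
        return Index.Replace(input, "$1\t");
    }
}
```

The verbatim string (@"...") avoids doubling the backslashes in the pattern, while the replacement is a regular string so that \t becomes an actual tab character.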
I am looking for a regex to validate input in C#. The regex has to match an arbitrary number of words which are separated with only 1 space character in between. The matched string cannot start or end with whitespace characters (this is where my problem is).
Example: some sample input 123
What I've tried: /^(\S+[ ]{0,1})+$/gm this pattern almost does what is required but it also matches 1 trailing space.
Any ideas? Thanks.
I tried this one and it seems to work:
Regex regex = new Regex(@"^\S+([ ]{1}\S+)*$");
It checks if your string starts with a word followed by zero or more entities of a single white space followed by a word. So trailing white spaces are not allowed.
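A quick way to check the pattern against a few inputs is sketched below. The WordListValidator class and its method name are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class WordListValidator
{
    // One word, then zero or more groups of exactly one space plus a word.
    private static readonly Regex SingleSpaced =
        new Regex(@"^\S+([ ]{1}\S+)*$");

    public static bool IsValid(string input)
    {
        return SingleSpaced.IsMatch(input);
    }
}
```

With this, "some sample input 123" is accepted, while the same text with a leading space, a trailing space, or two spaces between words is rejected.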
How to check if a string contains a pattern separated by whitespace?
Examples:
"abc ef ds ab "
Now I would like to check if the given string consists only of the pattern [a-z] separated by whitespace. My try: ^\s*[a-z]*\s*$. But this only checks for whitespace at the beginning and end, not whether whitespace is used to separate the content.
Try this regular expression:
/^[a-z\s]+$/
^(\s|[a-z])*$
Zero or more characters that are either whitespace or a-z.
If you want to make sure there's at least one thing other than white space, then:
^\s*[a-z]+(\s|[a-z])*$
Zero or more whitespace, at least one character a-z, then the same as above.
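If it helps, the same check can also be written so that the word/separator structure is spelled out. The pattern below is not from the answers above; it is a hedged variant that should accept the same strings, and the LowercaseWordsValidator class is made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LowercaseWordsValidator
{
    // Optional leading whitespace, one lowercase word, then any number of
    // "whitespace + word" groups, then optional trailing whitespace.
    private static readonly Regex Words =
        new Regex(@"^\s*[a-z]+(?:\s+[a-z]+)*\s*$");

    public static bool IsValid(string input)
    {
        return Words.IsMatch(input);
    }
}
```

The example string "abc ef ds ab " passes, while a whitespace-only string or one containing uppercase letters or digits does not.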