Find 3 or more whitespaces with regex in C# [duplicate]

This question already has answers here:
Regex to validate string for having three non white-space characters
(2 answers)
Closed 3 years ago.
As the title says, I want to find 3 or more whitespace characters with a regex in C#. So far I have tried
\s{3,} and [ ]{3,} on "Somestreet 155/ EG 47". Neither worked. What did I do wrong?

\s{3,} matches 3 or more whitespace characters in a row. To match a string that has 3 whitespace characters anywhere, you need a pattern such as \s.*\s.*\s.
So this would match any string that contains at least three whitespace characters in total, whether or not they are adjacent. For example:
a b c d
abc d e f
a   b        (3 spaces in a row)
"a   "       (ends in 3 spaces)
"   "        (just 3 spaces)
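Here is a minimal C# sketch that contrasts the two patterns on the question's sample string. The class and method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceCount
{
    // True only if three or more whitespace characters are adjacent.
    public static bool HasRunOfThree(string s) => Regex.IsMatch(s, @"\s{3,}");

    // True if the string holds at least three whitespace characters anywhere.
    // Between two consecutive whitespace characters there is never a newline
    // (a newline is itself whitespace), so "." not matching \n doesn't matter here.
    public static bool HasThreeAnywhere(string s) => Regex.IsMatch(s, @"\s.*\s.*\s");
}
```

For "Somestreet 155/ EG 47", which has three single spaces, `HasRunOfThree` returns False and `HasThreeAnywhere` returns True.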

LINQ is an alternative way to count spaces:
using System.Linq;
...
string source = "Somestreet 155/ EG 47";
bool result = source
    .Where(c => c == ' ') // spaces only
    .Skip(2)              // skip the first 2 of them
    .Any();               // is there at least 1 more (i.e. a 3rd space)?
Edit: if you want to count all whitespace characters, not just spaces, the Where clause should be
...
.Where(c => char.IsWhiteSpace(c))
...
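The same LINQ idea can be generalised to "at least n whitespace characters". This is a sketch that assumes `count >= 1`. The class and method names are hypothetical.

```csharp
using System.Linq;

public static class WhitespaceLinq
{
    // True if `source` contains at least `count` whitespace characters (assumes count >= 1).
    // Skip(count - 1) discards the first count-1 hits and Any() stops at the next one,
    // so the string is only scanned as far as necessary.
    public static bool HasAtLeast(string source, int count) =>
        source.Where(char.IsWhiteSpace).Skip(count - 1).Any();
}
```

With a `count` of 0 or less this returns False for a string without whitespace. Callers that need that case should handle it separately.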

You could count the whitespace matches:
if (Regex.Matches(yourString, @"\s+").Count >= 3) { ... }
The + makes a run of consecutive whitespace characters count as a single match. So "Somestreet 155/ EG 47" has three matches, but "Somestreet 155/ EG47" only has two.
If the string is long, collecting all the matches and then counting them can take longer than necessary. An alternative is to get one match at a time and bail out early once the required number of matches has been found:
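using System;
using System.Text.RegularExpressions;

class Program
{
    static bool MatchesAtLeast(string s, Regex re, int matchCount)
    {
        bool success = false;
        int startPos = 0;
        while (!success)
        {
            // Search for the next match, starting where the previous one ended.
            Match m = re.Match(s, startPos);
            if (!m.Success) { break; }      // no more matches
            matchCount--;
            success = (matchCount <= 0);    // stop as soon as enough matches were found
            // Advance past this match (at least one character, so an empty match can't loop forever).
            startPos = m.Index + Math.Max(m.Length, 1);
            if (startPos > s.Length) { break; }
        }
        return success;
    }

    static void Main(string[] args)
    {
        Regex re = new Regex(@"\s+");
        string s = "Somestreet 155/ EG\t47";
        Console.WriteLine(MatchesAtLeast(s, re, 3)); // outputs True
        Console.ReadLine();
    }
}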
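A shorter variant of the same early-exit idea walks the matches with `Match.NextMatch()` and lets the regex engine track the position. This is a sketch, not the answer's original code, and the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexEarlyExit
{
    // True if `re` matches `s` at least `matchCount` times.
    // NextMatch() continues from the end of the previous match (and steps past
    // empty matches), so the loop stops as soon as enough matches have been seen.
    public static bool MatchesAtLeast(string s, Regex re, int matchCount)
    {
        for (Match m = re.Match(s); m.Success; m = m.NextMatch())
        {
            if (--matchCount <= 0) return true;
        }
        return matchCount <= 0;
    }
}
```

Avoiding `MatchCollection.Count` matters here because reading `Count` forces every match in the string to be computed.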

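Try ^\S*\s\S*\s\S*\s\S*$ instead. It matches strings that contain exactly three whitespace characters.
\S matches non-whitespace characters, \s matches a whitespace character, ^ matches the beginning of the string and $ matches the end of the string.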
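A minimal C# check of the anchored pattern. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class ExactlyThree
{
    // ^ and $ anchor the whole string, and the \S* runs absorb everything that is not
    // whitespace, so the pattern only succeeds with exactly three whitespace characters.
    // Note: $ also matches just before a final "\n"; use \z instead if that matters.
    public static bool HasExactlyThreeWhitespace(string s) =>
        Regex.IsMatch(s, @"^\S*\s\S*\s\S*\s\S*$");
}
```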

Related

How to capitalize 1st letter (ignoring non a-z) with regex in c#?

There are tons of posts about how to capitalize the first letter with C#, but I am specifically struggling with how to do this while ignoring leading non-letter characters and tags. E.g.,
<style=blah>capitalize the word, 'capitalize'</style>
How can I ignore any <> tags and their contents (or non-letter characters before the word, like an asterisk *), and THEN capitalize "capitalize"?
I tried:
public static string CapitalizeFirstCharToUpperRegex(string str)
{
    // Check for empty string.
    if (string.IsNullOrEmpty(str))
        return string.Empty;
    // Return char and concat substring.
    // Start @ first char, no matter what (avoid <tags>, etc)
    string pattern = @"(^.*?)([a-z])(.+)";
    // Extract middle, then upper 1st char
    string middleUpperFirst = Regex.Replace(str, pattern, "$2");
    middleUpperFirst = CapitalizeFirstCharToUpper(str); // Works
    // Inject the middle back in
    string final = $"$1{middleUpperFirst}$3";
    return Regex.Replace(str, pattern, final);
}
EDIT:
Input: <style=foo>first non-tagged word 1st char upper</style>
Expected output: <style=foo>First non-tagged word 1st char upper</style>
You may use
<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)
The regex does the following:
<[^<>]*> - matches <, any 0+ chars other than < and > and then >
| - or
(?<!\p{L}) - finds a position not immediately preceded with a letter
(\p{L}) - captures into Group 1 any letter
(\p{L}*) - captures into Group 2 any 0+ letters (that is necessary if you want to lowercase the rest of the word).
Then check whether Group 1 matched. If it did, capitalize the Group 1 value and lowercase the Group 2 value. Otherwise, return the whole value:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() + m.Groups[2].Value.ToLower() :
        m.Value);
If you do not need to lowercase the rest of the word, remove the second group and the code related to it:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() : m.Value);
To only replace the first occurrence using this approach, you need to set a flag and reverse it once the first match is found:
var s = "<style=foo>first non-tagged word 1st char upper</style>";
var found = false;
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m => {
    if (m.Groups[1].Success && !found) {
        found = true; // only the first letter outside a tag is capitalized
        return m.Groups[1].Value.ToUpper();
    } else {
        return m.Value;
    }
});
Console.WriteLine(result); // => <style=foo>First non-tagged word 1st char upper</style>
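As an alternative to the flag, a single anchored pattern can skip everything before the first letter outside a tag, so at most one replacement happens. This is a sketch of a different technique from the answer above. The class, method and group names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class FirstLetter
{
    // ^ anchors at the start. The atomic group (?>...) skips any run of <tags> or
    // non-letters (\P{L}) and cannot give characters back, so the engine can never
    // backtrack into a tag and pick a letter from inside it.
    private static readonly Regex Rx =
        new Regex(@"^(?<prefix>(?>(?:<[^<>]*>|\P{L})*))(?<letter>\p{L})");

    public static string CapitalizeFirstLetter(string s) =>
        Rx.Replace(s, m => m.Groups["prefix"].Value + m.Groups["letter"].Value.ToUpperInvariant());
}
```

Because \P{L} also skips digits, an input that starts with "1st" would become "1St". Whether that is desirable depends on your data.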
Using a lookbehind, you can match the first word after the > bracket (without including the > in the match) and then capitalize it.
The regex is the following:
(?<=<.*>)\w+
It matches a word that immediately follows a > bracket; Regex.Match returns the first such word.
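One way to apply that lookbehind is to use the instance `Regex.Replace` overload that takes a maximum replacement count. This is a sketch, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class FirstWordAfterTag
{
    // (?<=<.*>) is a variable-length lookbehind, which .NET supports: the word must be
    // directly preceded by something that looks like a tag. \w+ grabs the word itself.
    private static readonly Regex Rx = new Regex(@"(?<=<.*>)\w+");

    // The count argument (1) limits Replace to the first match only.
    public static string CapitalizeFirstWord(string s) =>
        Rx.Replace(s, m => char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1), 1);
}
```

This only handles words that directly follow a tag. Input such as "*capitalize" with no tag is left unchanged.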

Trim String value with particular pattern in C#.NET

I have a string which is 900-1000 characters long.
The string follows the pattern
"Number:something,somestringNumber:something,somestring"
and so on. Example string:
"23:value,ordernew14:valueagain,orderagain"
The requirement is that whenever the string exceeds 1000 characters, I have to remove the first 500 characters. Then, if it doesn't start with a Number, I have to remove characters until the first character is a digit:
sortinfo = sortinfo.Remove(0, 500);
sortinfo = new string(sortinfo.SkipWhile(c => !char.IsDigit(c)).ToArray());
I am able to do this with the code above.
In the example above, if I remove 5 characters, the output is
14:valueagain,orderagain
which is perfectly fine.
But if the string has the value
23:value,or3dernew14:valueagain,orderagain
and I remove 5 characters, the output is
3dernew14:valueagain,orderagain
whereas the requirement is to get
14:valueagain,orderagain
This breaks everything, because the result is not in the correct format.
Please help me with how I can do this.
My full code:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string str;
        str = TrimSortInfo("23:value,ord4er24:valueag4ain,order6again15:value,order"); // breaking value
        //str = TrimSortInfo("23:value,order24:valueagain,orderagain15:value,order"); // working value
        Console.WriteLine(str);
        Console.ReadLine();
    }

    static string TrimSortInfo(string sortinfo)
    {
        if (sortinfo.Length > 15)
        {
            sortinfo = sortinfo.Remove(0, 15);
            sortinfo = new string(sortinfo.SkipWhile(c => !char.IsDigit(c)).ToArray());
            return sortinfo;
        }
        return sortinfo;
    }
}
Using a regex:
static Regex rx = new Regex("(?<=.*?)[0-9]+:.*");

static string TrimSortInfo(string sortinfo, int trimLength = 15)
{
    if (sortinfo.Length > trimLength)
    {
        return rx.Match(sortinfo, trimLength).Value;
    }
    return sortinfo;
}
Note that there is a big risk here: you could trim "in the middle" of the number.
So you could trim a "xxxxxxxxxxxxxx24:something" to "4:something".
The regex means: look for a sequence of one or more digits 0-9 ([0-9]+), followed by a :, followed by all the remaining characters (.*). Before this sequence there can be any other characters, but only as few as possible (.*?). This prefix is not part of the match, because it sits inside a lookbehind (?<=...).
In the end, the regex can be simplified to:
static Regex rx = new Regex("[0-9]+:.*");
because the pattern is unanchored, so the match simply begins at the first position where it can succeed.
To solve this problem:
static Regex rx = new Regex("(?:[^0-9])([0-9]+:.*)");

static string TrimSortInfo(string sortinfo, int trimLength = 15)
{
    if (sortinfo.Length > trimLength)
    {
        return rx.Match(sortinfo, trimLength - 1).Groups[1].Value;
    }
    return sortinfo;
}
We cheat a little. To trim 15 characters, we skip only 14 (trimLength - 1). Then we match a non-digit character, which we ignore ((?:[^0-9])), followed by the digits, the : and everything else, which we capture ([0-9]+:.*). Note the use of Groups[1].Value.
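The "skip one fewer character" trick can also be written with a negative lookbehind. `Regex.Match(input, startat)` does not return matches that start before `startat`, but lookbehinds can still inspect the characters before it. Below is a sketch under that approach. It also applies the question's rule "longer than 1000, remove 500". The class name, method layout and parameter defaults are hypothetical. It assumes `trimLength <= maxLength`.

```csharp
using System.Text.RegularExpressions;

public static class SortInfoTrimmer
{
    // (?<![0-9]) rejects a number whose preceding character is a digit,
    // i.e. one we would be cutting in half.
    private static readonly Regex Rx = new Regex(@"(?<![0-9])[0-9]+:.*");

    // maxLength/trimLength default to the question's 1000/500; they are parameters
    // so the behaviour can be tried on short strings.
    public static string TrimSortInfo(string sortinfo, int maxLength = 1000, int trimLength = 500)
    {
        if (sortinfo.Length <= maxLength) return sortinfo;
        Match m = Rx.Match(sortinfo, trimLength);   // search starts after the trimmed part
        return m.Success ? m.Value : string.Empty;
    }
}
```

For example, `TrimSortInfo("23:value,or3dernew14:valueagain,orderagain", 5, 5)` skips the "3" (which is not followed by ":") and returns "14:valueagain,orderagain".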

Match content inside brackets [duplicate]

This question already has an answer here:
How to get text between nested parentheses?
(1 answer)
Closed 5 years ago.
I have a good regex that works well for most of my cases:
"\(.*\)"
This regex matches nested brackets, which is good: "ABC ( DEF (GHI) JKL ) MNO"
But there is a tricky case: "This is ABC (XXX) DEF (XXX) (XXX)". As you can see, this regex also matches DEF, but it shouldn't.
Any ideas of how I can adjust my regex?
If you don't insist on regular expressions, you can use a simple stack-based implementation:
using System.Collections.Generic;
using System.Linq;
...
private static IEnumerable<string> EnumerateEnclosed(string value) {
    if (null == value)
        yield break;

    // Positions of the currently open "(" characters.
    Stack<int> positions = new Stack<int>();

    for (int i = 0; i < value.Length; ++i) {
        char ch = value[i];

        if (ch == '(')
            positions.Push(i);
        else if (ch == ')')
            if (positions.Any()) {
                int from = positions.Pop();

                if (!positions.Any()) // <- outermost ")"
                    yield return value.Substring(from, i - from + 1);
            }
    }
}
Test:
// Let's combine both examples into one and elaborate it a bit further:
string test = "ABC (DEF (GHI) J(RT(123)L)KL) MNO (XXX1) DEF (XXX2) (XXX3)";
Console.WriteLine(string.Join(Environment.NewLine, EnumerateEnclosed(test)));
Outcome:
(DEF (GHI) J(RT(123)L)KL)
(XXX1)
(XXX2)
(XXX3)
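If a regex is preferred after all, .NET balancing groups can do the same depth counting as the stack. This is a sketch of that different technique, not the answer's code. The class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // (?<Depth>) pushes a capture on "(" and (?<-Depth>) pops one on ")".
    // (?(Depth)(?!)) fails if anything is still open, so each match is an
    // outermost, fully balanced (...) group, however deeply nested.
    private static readonly Regex Rx = new Regex(
        @"\((?>[^()]+|\((?<Depth>)|\)(?<-Depth>))*(?(Depth)(?!))\)");

    public static IEnumerable<string> EnumerateEnclosed(string value) =>
        Rx.Matches(value).Cast<Match>().Select(m => m.Value);
}
```

On the test string above it yields the same four groups as the stack version.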
Regex: \([^)]+\)[^(]+\)|\([^)]+\)
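Details:
\( Match an opening "("
[^)]+ Match one or more characters not present in the list ")"
\) Match a ")"
[^(]+ Match one or more characters not present in the list "("
\) Match the closing ")" of the outer group (this handles one level of nesting)
| or
\([^)]+\) Match a simple, non-nested (...) group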
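A minimal C# sketch that runs this pattern on both strings from the question. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OneLevelParens
{
    // First alternative: "(", text up to the first ")", then more text without "("
    // up to the next ")" (exactly one nested pair). Second alternative: a plain (...) group.
    private static readonly Regex Rx = new Regex(@"\([^)]+\)[^(]+\)|\([^)]+\)");

    public static List<string> FindGroups(string s) =>
        Rx.Matches(s).Cast<Match>().Select(m => m.Value).ToList();
}
```

Only one level of nesting is handled. For deeper nesting, the stack-based answer is the safer choice.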

Regex for alphanumeric, at least 1 number and special chars

I am trying to find a regex which will give me the following validation:
the string should contain at least 1 digit and at least 1 special character; alphanumeric characters are allowed.
I tried the following, but it fails:
@"^[a-zA-Z0-9@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+$]"
I tried "password1$", but that failed.
I also tried "Password1!", but that also failed.
Any ideas?
UPDATE
I need the solution to work with C#; currently, the suggestions posted as of Oct 22 2013 do not appear to work.
Try this:
Regex rxPassword = new Regex(@"
    ^                 # start-of-line, followed by
    [a-zA-Z0-9!@#]+   # a sequence of one or more characters drawn from the set consisting of ASCII letters, digits or the punctuation characters ! @ and #
    (?<=[0-9].*)      # at least one of which is a decimal digit
    (?<=[!@#].*)      # at least one of which is one of the special characters
    (?<=[a-zA-Z].*)   # at least one of which is an upper- or lower-case letter
    $                 # followed by end-of-line
    ", RegexOptions.IgnorePatternWhitespace);
The construct (?<=regular-expression) is a zero-width positive lookbehind assertion. Because .NET supports variable-length lookbehind, each (?<=X.*) placed just before $ checks that an X occurs somewhere in the preceding sequence.
Sometimes it's a lot simpler to do things one step at a time. The static constructor builds the escaped character-class contents from a simple list of allowed special characters. The built-in Regex.Escape method doesn't work here: it escapes characters for use outside a character class and leaves ] and - unescaped, and both of those can break a [...] class.
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PasswordValidator {
    private const string ALLOWED_SPECIAL_CHARS = @"@#$%&*+_()':;?.,![]\-";
    private static readonly string ESCAPED_SPECIAL_CHARS;

    static PasswordValidator() {
        // Backslash-escape the characters that are special inside a [...] class.
        var escapedChars = new List<char>();
        foreach (char c in ALLOWED_SPECIAL_CHARS) {
            if (c == '[' || c == ']' || c == '\\' || c == '-')
                escapedChars.AddRange(new[] { '\\', c });
            else
                escapedChars.Add(c);
        }
        ESCAPED_SPECIAL_CHARS = new string(escapedChars.ToArray());
    }

    public static bool IsValidPassword(string input) {
        // Length requirement?
        if (input == null || input.Length < 8) return false;
        // First just check for a digit
        if (!Regex.IsMatch(input, @"\d")) return false;
        // Then check for special character
        if (!Regex.IsMatch(input, "[" + ESCAPED_SPECIAL_CHARS + "]")) return false;
        // Require a letter?
        if (!Regex.IsMatch(input, "[a-zA-Z]")) return false;
        // DON'T allow anything else:
        if (Regex.IsMatch(input, @"[^a-zA-Z\d" + ESCAPED_SPECIAL_CHARS + "]")) return false;
        return true;
    }
}
This may work. There are two possibilities: the digit comes before the special character, or it comes after it. You should use DOTALL mode (so that the dot matches every character, including newlines):
^((.*?[0-9].*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*)|(.*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*?[0-9].*))$
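In .NET, DOTALL mode is `RegexOptions.Singleline`. Here is a minimal sketch that wraps the pattern above. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class OrderedAlternatives
{
    // The answer's pattern: digit-then-special OR special-then-digit.
    private const string Pattern =
        @"^((.*?[0-9].*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*)|(.*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*?[0-9].*))$";

    // RegexOptions.Singleline makes "." match newlines too (.NET's DOTALL).
    public static bool IsValid(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.Singleline);
}
```

Both "password1$" and "Password1!" from the question pass this check.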
This worked for me:
@"(?=^[!@#$%\^&*()_\-+=\[{\]};:<>|./?a-zA-Z\d]{8,}$)(?=.*\W)(?=.*[0-9])[!@#$%\^&*()_\-+=\[{\]};:<>|./?a-zA-Z\d]*$"
It allows alphanumeric characters and requires at least 1 digit and at least 1 special character, with a minimum length of 8.
This should do the job:
(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+
I tested it with JavaScript. I'm not sure about C#; it may need some small adjustments.
It uses positive lookaheads to find the required elements of the password.
EDIT
A regular expression tests whether there are matches. Since all the patterns here are lookaheads, no characters are actually consumed and the matches are empty. But if the expression matches, the password is valid.
But since the question is about C# (sorry, I don't know C#; I'm just improvising and adapting samples):
string input = "password1!";
string pattern = @"^(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+.*$";
Regex rgx = new Regex(pattern, RegexOptions.None);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0) {
    Console.WriteLine("{0} ({1} matches):", input, matches.Count);
    foreach (Match match in matches)
        Console.WriteLine("   " + match.Value);
}
By adding a start-of-line anchor (^) at the beginning and .*$ at the end, the expression matches only when the password is valid, and the match value is the password itself.
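If you only need a yes/no answer, `Regex.IsMatch` is enough, and the outer `(?:...)+` and the `+` inside the lookaheads can be dropped. Here is a sketch of that simplification. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadPassword
{
    // Each (?=.*X) lookahead scans ahead from the start without consuming anything,
    // so the three requirements can be satisfied in any order.
    private const string Pattern =
        @"^(?=.*[0-9])(?=.*[a-zA-Z])(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/])";

    public static bool IsValid(string input) => Regex.IsMatch(input, Pattern);
}
```

This only checks for required characters. To also reject characters outside the allowed set, add a final `[...]+$` part, as the step-by-step validator does.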

How to parse a comma delimited string when comma and parenthesis exists in field

I have this string in C#
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
I want to use a RegEx to parse it to get the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
In addition to the above example, I tested with the following, but am still unable to parse it correctly.
"%exc.uns: 8 hours let # = ABC, DEF", "exc_it = 1 day" , " summ=graffe ", " a,b,(c,d)"
The new text will be in one string:
string mystr = @"""%exc.uns: 8 hours let # = ABC, DEF"", ""exc_it = 1 day"" , "" summ=graffe "", "" a,b,(c,d)""";
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == ',' && scopeLevel == 0)
    {
        // Top-level comma: emit the text since the previous top-level comma.
        resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
        firstIndex = i + 1;
    }
    else if (str[i] == '(') scopeLevel++;   // entering parentheses
    else if (str[i] == ')') scopeLevel--;   // leaving parentheses
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));
Even faster:
([^,]*\x28[^\x29]*\x29|[^,]+)
That should do the trick. Basically, it looks for either a "function thumbprint" (\x28 and \x29 are ( and )) or anything without a comma.
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
Each match stops at one of the five top-level commas (the ones outside parentheses) or at the end of the string.
Just this regex:
[^,()]+(\([^()]*\))?
A test example:
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
    .Cast<Match>()
    .Select(m => m.Value);
returns
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
If you simply must use Regex, then you can split the string on the following:
, # match a comma
(?= # that is followed by
(?: # either
[^\(\)]* # no parens at all
| # or
(?: #
[^\(\)]* # ...
\( # (
[^\(\)]* # stuff in parens
\) # )
[^\(\)]* # ...
)+ # any number of times
)$ # until the end of the string
)
It breaks your input into the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
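The commented pattern only works if `RegexOptions.IgnorePatternWhitespace` is set, so that the spaces, line breaks and `#` comments are ignored. Here is a sketch of the split in C#. The class and method names are hypothetical. The pieces are trimmed because the input has spaces after some commas.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TopLevelCommaSplit
{
    private static readonly Regex Rx = new Regex(@"
        ,                   # match a comma
        (?=                 # that is followed by
          (?:               # either
            [^\(\)]*        # no parens at all
          |                 # or
            (?:
              [^\(\)]*      # ...
              \(            # (
              [^\(\)]*      # stuff in parens
              \)            # )
              [^\(\)]*      # ...
            )+              # any number of times
          )$                # until the end of the string
        )", RegexOptions.IgnorePatternWhitespace);

    public static string[] Split(string s) =>
        Rx.Split(s).Select(t => t.Trim()).ToArray();
}
```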
You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
Another way to implement what Snowbear was doing:
// Extension methods must live in a static class.
public static class StringExtensions
{
    public static string[] SplitNest(this string s, char src, string nest, string trg)
    {
        int scope = 0;
        if (s == null || trg == null || nest == null) return null;
        if (trg.Length == 0 || nest.Length < 2) return null;
        if (trg.IndexOf(src) >= 0) return null;
        if (nest.IndexOf(src) >= 0) return null;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == src && scope == 0)
            {
                // Swap the top-level delimiter for trg and skip over the inserted text.
                s = s.Remove(i, 1).Insert(i, trg);
                i += trg.Length - 1;
            }
            else if (s[i] == nest[0]) scope++;
            else if (s[i] == nest[1]) scope--;
        }
        return s.Split(new[] { trg }, StringSplitOptions.None);
    }
}
The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");
The function would first turn your string into
adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO
then split on the ~, ignoring the nested commas.
Assuming non-nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:
MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join(Environment.NewLine, Regex.Split(s, @",\s|(?<=\)),"));
The pattern matches a comma followed by a whitespace character
or
a comma immediately preceded by ). The lookbehind (?<=\)) checks for the ) without including it in the match.
result =
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
The TextFieldParser class (documented on MSDN, in the Microsoft.VisualBasic.FileIO namespace) seems to have this functionality built in:
TextFieldParser Class: Provides methods and properties for parsing structured text files.
Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.
The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.
Here's a stronger option, which parses the whole text, including nested parentheses:
string pattern = @"
    \A
    (?>
        (?<Token>
            (?:
                [^,()]              # Regular character
                |
                (?<Paren> \( )      # Opening paren - push to stack
                |
                (?<-Paren> \) )     # Closing paren - pop
                |
                (?(Paren),)         # If inside parentheses, match comma.
            )*?
        )
        (?(Paren)(?!))              # If we are not inside parentheses,
        (?:,|\Z)                    # match a comma or the end
    )*?                             # lazy just to avoid an extra empty match at the end,
                                    # though it removes a last empty token.
    \Z
    ";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);
You can get all the tokens by iterating over match.Groups["Token"].Captures.
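A minimal sketch that wraps the balancing-group pattern and collects the `Token` captures. The class and method names, and the choice to trim each token and return null on failure, are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedCommaTokenizer
{
    // The pattern from the answer; only the comments on the last lines are reworded.
    private static readonly Regex Rx = new Regex(@"
        \A
        (?>
            (?<Token>
                (?:
                    [^,()]              # Regular character
                    |
                    (?<Paren> \( )      # Opening paren - push to stack
                    |
                    (?<-Paren> \) )     # Closing paren - pop
                    |
                    (?(Paren),)         # If inside parentheses, match comma.
                )*?
            )
            (?(Paren)(?!))              # Fail if still inside parentheses,
            (?:,|\Z)                    # otherwise match a comma or the end
        )*?
        \Z", RegexOptions.IgnorePatternWhitespace);

    // Returns null when the pattern doesn't match (for example, unbalanced parentheses).
    public static List<string> Tokenize(string data)
    {
        Match match = Rx.Match(data);
        if (!match.Success) return null;
        var tokens = new List<string>();
        foreach (Capture c in match.Groups["Token"].Captures)
            tokens.Add(c.Value.Trim());
        return tokens;
    }
}
```

Unlike the non-nested regexes above, this version also keeps nested groups such as "f(a,(b,c))" together as one token.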
