Regex to match a string after colon - c#

The input string is something like this: OU=TEST:This001. We need to extract "This001". Preferably in C#.

What about :
/OU=.*?:(.*)/
Here is how it works:
OU= // Must contain OU=
. // Any character
* // Repeated but not mandatory
? // Ungreedy (lazy) (Don't try to match everything)
: // Match the colon
( // Start to capture a group
. // Any character
* // Repeated but not mandatory
) // End of the group
The / characters are delimiters that mark where the regex starts and where it ends (and where options can be added). In C# you leave them out and pass only the pattern between them to the Regex class.
The captured group will contain This001.
But it would be faster with a simple Substring().
yourString.Substring(yourString.IndexOf(":")+1);
Resources :
regular-expressions.info
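For what it's worth, here is a minimal sketch of both approaches side by side in C#. The class and method names (AfterColonDemo, WithRegex, WithSubstring) are made up for illustration. The Substring version adds a guard, because IndexOf returns -1 when there is no colon and the unguarded one-liner would then return the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterColonDemo
{
    // Regex version: group 1 holds everything after the first colon that follows "OU=".
    public static string WithRegex(string input)
    {
        Match m = Regex.Match(input, @"OU=.*?:(.*)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Substring version: returns null instead of the whole string when no colon is present.
    public static string WithSubstring(string input)
    {
        int i = input.IndexOf(':');
        return i >= 0 ? input.Substring(i + 1) : null;
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex("OU=TEST:This001"));     // This001
        Console.WriteLine(WithSubstring("OU=TEST:This001")); // This001
    }
}
```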

"OU=" smells like you're doing an Active Directory or LDAP search and responding to the results. While regex is a brilliant tool, I just wanted to make sure that you're also aware of the excellent System.DirectoryServices.Protocols classes that were made for parsing, filtering and manipulating just this sort of data.
The SearchResult, SearchResultEntry and DirectoryAttribute in particular would be the friends you might be looking for. I don't doubt that you can regex or substring as cleverly as the next guy but it's also nice to have another good tool in the toolbox.
Have you tried these classes?

A solution without regex:
var str = "OU=TEST:This00:1";
var result = str.Split(new char[] { ':' }, 2)[1];
// result == This00:1
Regex vs Split vs IndexOf
Split
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Split(new char[] { ':' }, 2)[1];
sw.Stop();
// sw.ElapsedTicks == 15
Regex
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = (new Regex(":(.*)", RegexOptions.Compiled)).Match(str).Groups[1];
sw.Stop();
// sw.ElapsedTicks == 7000 (Compiled)
IndexOf
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Substring(str.IndexOf(":") + 1);
sw.Stop();
// sw.ElapsedTicks == 40
Winner: Split

If OU=TEST: is required before the string you want to match, use this regex:
(?<=OU\s*=\s*TEST\s*:\s*).*
That regex matches any length of text after the colon; the text before the colon is only a requirement and is not part of the match.
You can replace TEST with [A-Za-z]+ to match any alphabetic text other than TEST, or with [\w]+ to match any combination of letters, digits and underscores.
\s* means there may be any number of whitespace characters, or none, in that position. Remove it if you don't need such a check.
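A rough sketch of how the lookbehind could be used from C# follows. LookbehindDemo and ValueAfterOu are hypothetical names, and \w+ stands in for TEST as an assumption. As far as I can tell, the match starts directly after the colon, so whitespace following the colon ends up in the match; the sketch trims it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    public static string ValueAfterOu(string input)
    {
        // The lookbehind only asserts that "OU=<word>:" precedes the match;
        // it is not part of the returned value.
        Match m = Regex.Match(input, @"(?<=OU\s*=\s*\w+\s*:\s*).*");
        // Trim() drops whitespace that directly follows the colon.
        return m.Success ? m.Value.Trim() : null;
    }

    public static void Main()
    {
        Console.WriteLine(ValueAfterOu("OU=TEST:This001"));     // This001
        Console.WriteLine(ValueAfterOu("OU = TEST : This001")); // This001
    }
}
```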

Related

Get number between characters in Regex

Having difficulty creating a regex.
I have this text:
"L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433"
I need only the number after C\": extracted, this is what I currently have.
var regex = new Regex(@"(?<=C\\"":)\d +.\d + (?=\s *,\\)");
var test = regex.Match(content).ToString();
decimal.TryParse(test, out decimal closingPrice);
To extract the number after C\":, you can capture (\d+\.\d+) in a group (the dot is escaped so that it matches only a literal decimal point):
C\\":(\d+\.\d+)
You could also use a positive lookbehind:
(?<=C\\":)\d+\.\d+
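Put together in C#, the capture-group variant might look like the sketch below. ClosingPriceDemo and TryGetClosingPrice are invented names. I made the backslash optional (\\?) on the assumption that the content may or may not contain literal backslashes, and the number is parsed with the invariant culture so the decimal point is read the same way on every machine.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ClosingPriceDemo
{
    public static bool TryGetClosingPrice(string content, out decimal price)
    {
        price = 0m;
        // Matches C": or C\": followed by digits, a literal dot, and more digits.
        Match m = Regex.Match(content, @"C\\?"":(\d+\.\d+)");
        return m.Success
            && decimal.TryParse(m.Groups[1].Value, NumberStyles.Number,
                                CultureInfo.InvariantCulture, out price);
    }

    public static void Main()
    {
        // Verbatim string: the content contains literal backslashes before each quote.
        string content = @"L\"":0.01690502,\""C\"":0.01690502,\""V\"":33.76590433";
        if (TryGetClosingPrice(content, out decimal closingPrice))
            Console.WriteLine(closingPrice.ToString(CultureInfo.InvariantCulture)); // 0.01690502
    }
}
```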
You can use this code to fetch all pairs of letter and number.
var regex = new Regex("(?<letter>[A-Z])[^:]+:(?<number>[^,\"]+)");
var input = "L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433";
var matches = regex.Matches(input).Cast<Match>().ToArray();
foreach (var match in matches)
Console.WriteLine($"Letter: {match.Groups["letter"].Value}, number: {match.Groups["number"].Value}");
If you only need the number for the letter "C", you can use this LINQ expression:
var cNumber = matches.FirstOrDefault(m => m.Groups["letter"].Value == "C")?.Groups["number"].Value ?? "";
Regex explanation:
(?<letter>[A-Z]) // capture single letter
[^:]+ // skip all chars until ':'
: // colon
(?<number>[^,"]+) // capture all until ',' or '"'
Working demo
Fixed it with this.
var regex = new Regex("(?<=C\\\":)\\d+.\\d+(?=\\s*,)");
var test = regex.Match(content).ToString();
String literal to use for C#:
@"C\\"":([.0-9]*),"
If you wish to match only valid numbers:
@"C\\"":([0-9]+\.[0-9]+),"

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, SubString etc.). I am looking for more elegant solution so as to replace this part of string:
-password=AnyPassword
to:
-password=*******
And keep other part of string intact. I am looking if String.Replace or Regex replace may help.
What I've tried (not much of error-checks):
var pwd_index = argv.IndexOf("-password=");
string converted;
if (pwd_index >= 0)
{
    var leftPart = argv.Substring(0, pwd_index);
    var pwdStr = argv.Substring(pwd_index);
    var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
    converted = leftPart + "-password=********\n" + rightPart;
}
else
    converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1 and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to hackers anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
( // Store the following match in capture group $1.
password= // Match "password=" literally.
)
[ // Match one from a set of characters.
^ // Negate a set of characters (i.e., match anything not
// contained in the following set).
\n // The character set: consists only of the new line character.
]
* // Match the preceding element (the character set) 0 to n times.
This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
    match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also remove the new String() part and replace it with a string constant.

Retrieve Alphabet with white space

I would like to retrieve the alphabet only but the code is not enough to make it.
What am I missing?
[A-Öa-ö]+$
16440 dallas
23941 cityO < You also have white space after "O"
931 00 Texas
10581 New Orleans
It's because you specify a sequence from the ASCII character table. And åäö is not directly after Z in the ascii table.
You can see it here: http://www.asciitable.com/
So what you need is a regex that specifies those separately:
[A-Za-zåäöÅÄÖ]+$
So the complete regex is:
var re = new Regex("([A-Za-zåäöÅÄÖ]+)$", RegexOptions.Multiline);
var matches = re.Matches(data);
Console.WriteLine(matches[0].Groups[1].Value);
However, since you want to allow white spaces within the name (as for "New Orleans") you need to allow it, simply include it in the regex:
var re = new Regex("([A-Za-zåäöÅÄÖ ]+)$", RegexOptions.Multiline);
Unfortunately that also includes white spaces in the beginning and the end:
" New Orleans "
To fix that, you start by making the quantifier lazy (non-greedy), i.e. telling it to use as few characters as possible:
new Regex("([A-Za-zåäöÅÄÖ ]+?)$", RegexOptions.Multiline)
The problem with that is that it does not match any line other than New Orleans. Don't ask me why. To fix that I told the regex that it must have a space between the digits and the text and that there may be a space after the text:
var re = new Regex("\\s([A-Za-zåäöÅÄÖ ]+?)[\\s]*$", RegexOptions.Multiline);
which works with all lines.
Regex breakdown:
\\s A single whitespace character (which is not included in the capture, since it is outside the parenthesized expression)
([A-Za-zåäöÅÄÖ ]+?)
Find a character which is either in the alphabet or a space
+ There must be one or more
? Use lazy (non-greedy) matching, i.e. as few characters as possible
[\\s]*
[\\s] Find a white space character
* There must be zero or more of it
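To try the final pattern against the sample lines, something like this small sketch should do. CityNameDemo and ExtractCity are names I made up, and the lines are matched one at a time here (so RegexOptions.Multiline is not needed), which is a simplification on my part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CityNameDemo
{
    // Same pattern as above, written as a verbatim string.
    private static readonly Regex CityPattern =
        new Regex(@"\s([A-Za-zåäöÅÄÖ ]+?)\s*$");

    public static string ExtractCity(string line)
    {
        Match m = CityPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] lines = { "16440 dallas", "23941 cityO ", "931 00 Texas", "10581 New Orleans" };
        foreach (string line in lines)
            Console.WriteLine("[" + ExtractCity(line) + "]");
        // [dallas]
        // [cityO]
        // [Texas]
        // [New Orleans]
    }
}
```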
Alternative
As an alternative to regex you can do something like this:
public IEnumerable<string> GetCodes(string data)
{
var lines = data.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
foreach (var line in lines)
{
for (var i = 0; i < line.Length; i++)
{
if (!char.IsLetter(line[i]))
continue;
var text = line.Substring(i).TrimEnd(' ');
yield return text;
break;
}
}
}
Which is invoked like:
var codes = GetCodes(yourData).ToList();
In C#, you can use the \p{L} Unicode category class to match any Unicode letter. You may match zero or more whitespace characters with \s*. End of string is $ (or \Z or \z). The word you need can be captured, and this capture can easily be retrieved from the match result via GroupCollection.
Thus, you can use
(\p{L}+)\s*$
or - if you plan to match specific Finnish, etc. letters:
(?i)([A-ZÅÄÖ]+)\s*$
See the regex demo
C# demo:
var strs = new string[] {"16440 dallas", "23941 cityO ", "931 00 Texas", "10581 New Orleans"};
foreach (var s in strs) {
    var match = Regex.Match(s, @"(\p{L}+)\s*$");
    if (match.Success)
    {
        Console.WriteLine(match.Groups[1].Value);
    }
}

Would like to split a string using a regex pattern

I have a string that I would like to split into
var finalQuote = "2012-0001-1";
var quoteNum = "2012-0001";
var revision = "1";
I used something like this:
var quoteNum = finalQuote.Substring(0, 9);
var revision = finalQuote.Substring(finalQuote.LastIndexOf("-") + 1);
But can't it be done using regex more efficiently? I come across patterns like this that need to be split into two.
var finalQuote = "2012-0001-1";
string pat = @"(\d|[A-Z]){4}-\d{4}";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Value;
So far I have reached here. But I feel I am not using the correct method. Please guide me.
EDIT: I want to split by the pattern. Splitting on dashes is not an option as the first part of the split contains a dash, i.e. "2012-0001".
I would simply go with:
var quoteNum = finalQuote.Substring(0,9);
var revision = finalQuote.Substring(10);
quoteNum would consist of the first 9 characters, and revision of the 10th and everything that may follow the 10th, e.g. if the revision is 10 or higher it would still work.
Using complicated regexes or extension methods quickly becomes overkill; sometimes the simple methods are efficient enough on their own.
I would agree with others that using substring is a better solution than regex for this.
But if you're insisting on using regex you can use something like:
^(\d{4}-\d{4})-(\d)$
Untested since I don't have a C# environment installed:
var finalQuote = "2012-0001-1";
string pat = @"^(\d{4}-\d{4})-(\d)$";
Regex r = new Regex(pat);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;
Alternatively, if you want a string[] you could try (again, untested):
string[] data = Regex.Split("2012-0001-1", @"-(?=\d$)");
data[0] would be quoteNum and data[1] would be revision.
Update:
Explanation of the Regex.Split:
From the Regex.Split documentation: The Regex.Split methods are similar to the String.Split method, except that Regex.Split splits the string at a delimiter determined by a regular expression instead of a set of characters.
The regex -(?=\d$) matches a single - given it is followed by a digit followed by the end of the string so it would only match the last dash in the string. The last digit is not consumed because we use a zero-width lookahead assertion (?=)
It would be easier to maintain in the future if you use something that a newcomer would understand.
You could use:
var finalQuote = "2012-0001-1";
string[] parts = finalQuote.Split('-');
var quoteNum = parts[0] + "-" + parts[1];
var revision = parts[2];
However, if you insist that you need a regex, then:
(\d{4}-\d{4})-(\d)
There are two groups in this expression: group 1 captures the first part and group 2 captures the second part.
var finalQuote = "2012-0001-1";
string pat = @"(\d{4}-\d{4})-(\d)";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;
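If the input might not match at all, or the revision might grow past one digit, a guarded variant of the same idea could look like the sketch below. QuoteSplitDemo and TrySplitQuote are hypothetical names, and using \d+ instead of \d for the revision is my assumption, not something the question states.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteSplitDemo
{
    public static bool TrySplitQuote(string finalQuote, out string quoteNum, out string revision)
    {
        // Anchored so that the whole string must have the form dddd-dddd-<digits>.
        Match m = Regex.Match(finalQuote, @"^(\d{4}-\d{4})-(\d+)$");
        quoteNum = m.Success ? m.Groups[1].Value : null;
        revision = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        if (TrySplitQuote("2012-0001-1", out string quoteNum, out string revision))
            Console.WriteLine(quoteNum + " / " + revision); // 2012-0001 / 1
    }
}
```

Checking the return value avoids reading empty groups when the pattern does not match.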

Regex to strip characters except given ones?

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a character, then can have alphanumeric, underscores, and dashes. How can I do this with RegEx or another function?
Because everything in the second part of the regex is in the first part, you could do something like this:
String foo = "_-abc.!@#$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.
Assuming that you've got the strings in a collection, I would do it this way:
foreach element in the collection try match the regex
if !success, remove the string from the collection
Or the other way round - if it matches, add it to a new collection.
If the strings are not in a collection can you add more details as to what your input looks like ?
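In C#, the "add matches to a new collection" variant of that pseudocode could be sketched roughly as follows. IdentifierFilterDemo and KeepValid are made-up names, and I added the ^ and $ anchors on the assumption that each string must match the pattern as a whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierFilterDemo
{
    // The pattern from the question, anchored at both ends.
    private static readonly Regex Identifier =
        new Regex(@"^[a-zA-Z]+[_a-zA-Z0-9-]*$");

    // Returns a new list with only the strings that match; the input is left untouched.
    public static List<string> KeepValid(IEnumerable<string> candidates)
    {
        return candidates.Where(s => Identifier.IsMatch(s)).ToList();
    }

    public static void Main()
    {
        var kept = KeepValid(new[] { "abc_1", "_bad", "x-9", "9lives" });
        Console.WriteLine(string.Join(", ", kept)); // abc_1, x-9
    }
}
```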
If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
Console.WriteLine( match.Value );
Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.
Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);
