Would like to split a string using a regex pattern - c#

I have a string, finalQuote, that I would like to split into quoteNum and revision:
var finalQuote = "2012-0001-1";
var quoteNum = "2012-0001";
var revision = "1";
I used something like this:
var quoteNum = finalQuote.Substring(0, 9);
var revision = finalQuote.Substring(finalQuote.LastIndexOf("-") + 1);
But can't it be done more efficiently using a regex? I often come across patterns like this that need to be split in two.
var finalQuote = "2012-0001-1";
string pat = @"(\d|[A-Z]){4}-\d{4}";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Value;
This is as far as I have got, but I feel I am not using the correct method. Please guide me.
EDIT: I want to split by the pattern. Splitting on dashes is not an option because the first part of the split contains a dash, i.e. "2012-0001".

I would simply go with:
var quoteNum = finalQuote.Substring(0,9);
var revision = finalQuote.Substring(10);
quoteNum consists of the first 9 characters, and revision of everything from index 10 onward (the characters after the second dash), so it still works if the revision is 10 or higher.
Complicated regexes or extension methods quickly become overkill; sometimes the simple methods are efficient enough by themselves.
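If the first part is not always exactly 9 characters long, the LastIndexOf idea from the question can be combined with Substring. The following is only a minimal sketch; the class and method names (QuoteParts, SplitAtLastDash) are made up for illustration, and it assumes the revision is whatever follows the last dash.

```csharp
using System;

public static class QuoteParts
{
    // Hypothetical helper: splits at the LAST dash, so the first part may itself contain dashes.
    public static (string QuoteNum, string Revision) SplitAtLastDash(string finalQuote)
    {
        int lastDash = finalQuote.LastIndexOf('-');
        if (lastDash < 0)
        {
            // No dash at all: the input does not have the assumed shape.
            throw new FormatException("Expected at least one '-' in: " + finalQuote);
        }
        return (finalQuote.Substring(0, lastDash), finalQuote.Substring(lastDash + 1));
    }
}
```

Calling QuoteParts.SplitAtLastDash("2012-0001-10") would give "2012-0001" and "10" without relying on fixed positions.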

I would agree with others that using substring is a better solution than regex for this.
But if you insist on using a regex, you can use something like:
^(\d{4}-\d{4})-(\d)$
Untested since I don't have a C# environment installed:
var finalQuote = "2012-0001-1";
string pat = @"^(\d{4}-\d{4})-(\d)$";
Regex r = new Regex(pat);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;
Alternatively, if you want a string[] you could try (again, untested):
string[] data = Regex.Split("2012-0001-1", @"-(?=\d$)");
data[0] would be quoteNum and data[1] would be revision.
Update:
Explanation of the Regex.Split:
From the Regex.Split documentation: The Regex.Split methods are similar to the String.Split method, except that Regex.Split splits the string at a delimiter determined by a regular expression instead of a set of characters.
The regex -(?=\d$) matches a single - only if it is followed by a digit and then the end of the string, so it matches only the last dash in the string. The last digit is not consumed because (?=...) is a zero-width lookahead assertion. Note that \d$ assumes a single-digit revision; use \d+$ if the revision can be longer.
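For completeness, here is a small sketch of the capture-group approach that also checks whether the input matched at all. It assumes the format is always four digits, a dash, four digits, a dash, and then one or more revision digits (hence \d+ rather than \d); QuoteSplitter and TrySplit are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteSplitter
{
    // Assumed format: dddd-dddd-<one or more digits>.
    private static readonly Regex QuotePattern = new Regex(@"^(\d{4}-\d{4})-(\d+)$");

    // Returns false (and null outputs) when the input does not match the assumed format.
    public static bool TrySplit(string finalQuote, out string quoteNum, out string revision)
    {
        Match m = QuotePattern.Match(finalQuote);
        quoteNum = m.Success ? m.Groups[1].Value : null;
        revision = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}
```

Checking m.Success avoids silently reading empty groups when the string has an unexpected shape.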

It would be easier to maintain in the future if you use something that a newcomer would understand.
You could use:
var finalQuote = "2012-0001-1";
string[] parts = finalQuote.Split('-');
var quoteNum = parts[0] + "-" + parts[1];
var revision = parts[2];
However, if you insist that you need a regex, then:
(\d{4}-\d{4})-(\d)
There are two groups in this expression: group 1 captures the first part and group 2 captures the second part.
var finalQuote = "2012-0001-1";
string pat = @"(\d{4}-\d{4})-(\d)";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;

Related

Can LINQ be used to search for Regex expressions in a string?

I have the following code that works, but I would like to change it to use LINQ to check whether any of the regex search strings occur in the target.
foreach (Paragraph comment in
wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>().Where<Paragraph>(comment => comment.InnerText.Contains("cmt")))
{
//print values
}
More precisely, I need to select through LINQ the strings that start with a letter or with one of the symbols - or •.
Is this regex correct for my case?
string pattern = @"^[a-zA-Z-]+$";
Regex rg = new Regex(pattern);
Any suggestions? Thanks in advance for any help.
You can. It would be better to use query syntax though, as described here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-combine-linq-queries-with-regular-expressions
Example:
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = System.IO.File.ReadAllText(file.FullName)
let matches = searchTerm.Matches(fileText)
where matches.Count > 0
select new
{
name = file.FullName,
matchedValues = from System.Text.RegularExpressions.Match match in matches
select match.Value
};
Your pattern is fine; just remove the $ from the end and allow any characters after it:
@"^[a-zA-Z-]+.*"
Your regex should be modified as
^[\p{L}•-]
To also allow whitespace at the start of the string add \s and use
^[\p{L}\s•-]
Details
^ - start of string
[\p{L}•-] - a letter, • or -
[\p{L}\s•-] - a letter, whitespace, • or -
In C#, use
var reg = new Regex(@"^[\p{L}•-]");
foreach (Paragraph comment in
wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>()
.Where<Paragraph>(comment => reg.IsMatch(comment.InnerText)))
{
//print values
}
If you want to match items that contain cmt and also match this regex, you may adjust the pattern to
var reg = new Regex(@"^(?=.*cmt)[\p{L}\s•-]", RegexOptions.Singleline);
If you need to only allow cmt at the start of the string:
var reg = new Regex(@"^(?:cmt|[\p{L}\s•-])");
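To try the pattern without a Word document, the same Where filter can be run over plain strings. This is only a sketch: it assumes the paragraph texts have already been extracted into strings, and ParagraphFilter and Filter are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParagraphFilter
{
    // Matches strings whose first character is a letter, a bullet (•) or a dash.
    private static readonly Regex StartsWithLetterOrSymbol = new Regex(@"^[\p{L}•-]");

    // Keeps only the texts that start with one of the allowed characters.
    public static List<string> Filter(IEnumerable<string> texts)
    {
        return texts.Where(t => StartsWithLetterOrSymbol.IsMatch(t)).ToList();
    }
}
```

With this pattern a string such as "1. numbered" or one that starts with a space is filtered out; use the \s variant above if leading whitespace should be accepted.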

Substring from path string

In strings like this (I get strings from Directory.GetFiles())
string temp = "\\folder_name\\file_name.filetype.[somename#somedomain].wallet";
What is the best way to extract the substring file_name.filetype?
I could do something like this:
const string source = ".[somename#somedomain].wallet";
temp.Substring(0, temp.IndexOf(source, StringComparison.Ordinal));
... but the problem is that the "mail" part of ".[xxxx#xxxx].wallet" changes. In other words, my source string should be something like this:
const string source = ".[*].wallet"; // so all strings of the form .[any_string].wallet
Is there an easy way to do something like this (with an asterisk "*"), or will I have to extract the substring piece by piece and concatenate the result?
You can construct a regex that requires a backslash before the substring of interest, and a text in square brackets followed by .wallet at the end.
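Here is how you can do it with the C# regex APIs (backslashes are excluded from the character classes so that the match cannot start in the folder name):
string temp = @"\folder_name\file_name.filetype.[somename#somedomain].wallet";
var m = Regex.Match(temp, @"(?<=\\)[^.\\]*\.[^.\\]*(?=\.\[[^\]]*\]\.wallet)");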
if (m.Success) {
Console.WriteLine(m.Value);
} else {
Console.WriteLine("<no match>");
}
The (?<=...) and (?=...) constructs are zero-width look-behind and look-ahead assertions, respectively. The text they match is not included in m.Value.
You could search for the second occurrence of . and take everything before it.
string temp = "\\folder_name\\file_name.filetype.[somename#somedomain].wallet";
var filename = Path.GetFileName(temp);
var lastIndex = filename.IndexOf('.', filename.IndexOf('.') + 1);
var fileYouAreLookingFor = filename.Substring(0, lastIndex);
You could also use a regex to achieve this. The first group of the following one should be what you are looking for.
string temp = "\\folder_name\\file_name.filetype.[somename#somedomain].wallet";
var filenameRegex = new Regex("^.*\\\\(.*)\\.\\[.*\\]\\.wallet$");
var match = filenameRegex.Match(temp);
var result = match.Groups[1].Value;
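Another way to look at the problem is to strip the variable suffix rather than to match the name. The sketch below assumes the suffix always has the form .[anything].wallet at the very end; WalletNames and StripSuffix are illustrative names. It finds the last separator by hand instead of calling Path.GetFileName, because on non-Windows platforms Path.GetFileName does not treat a backslash as a directory separator.

```csharp
using System.Text.RegularExpressions;

public static class WalletNames
{
    // Assumed suffix shape: ".[<anything without ']'>].wallet" at the end of the name.
    private static readonly Regex Suffix = new Regex(@"\.\[[^\]]*\]\.wallet$");

    public static string StripSuffix(string path)
    {
        // Take everything after the last '\' or '/' (the whole string if there is none).
        int lastSeparator = path.LastIndexOfAny(new[] { '\\', '/' });
        string fileName = path.Substring(lastSeparator + 1);
        return Suffix.Replace(fileName, "");
    }
}
```

If the suffix is missing, the file name is returned unchanged, which may or may not be what you want.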

Regex match and replace operators in math operation

Given an input string
12/3
12*3/12
(12*54)/(3/4)
I need to find each operand (number) and replace it with a string that contains the operand:
some12text/some3text
some12text*some3text/some12text
(some12text*some54text)/(some3text/some4text)
Practical application:
From a backend (C#), I have the following string
34*157
which I need to translate to:
document.getElementById("34").value*document.getElementById("157").value
and return to the screen, where it can be run in an eval() function.
So far I have
var pattern = @"\d+";
var input = "12/3";
Regex r = new Regex(pattern);
var matches = r.Matches(input);
foreach (Match match in matches)
{
// I'm at a loss as to what to match and replace here
}
Caution: I cannot do a blanket input.Replace() in the foreach loop, as it may replace incorrectly in (12/123); it should only replace the first 12.
Caution 2: I can use string.Remove and string.Insert, but that mutates the string after the first match, which throws off the positions of the following matches.
Any pointers appreciated.
Here you go:
string pattern = @"\d+"; // matches one or more consecutive digits
var input = "(12*54)/(3/4)";
string result = Regex.Replace(input, pattern, "some$0text");
$0 refers to the entire match of the pattern \d+. You can also write
string result = Regex.Replace(input, pattern, m => "some" + m.Value + "text");
Fiddle: https://dotnetfiddle.net/JUknx2
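Applied to the practical case from the question, the same Regex.Replace call might look like the sketch below. FormulaTranslator and ToJavaScript are made-up names, and the code assumes every run of digits in the formula is an element id.

```csharp
using System.Text.RegularExpressions;

public static class FormulaTranslator
{
    // Wraps every run of digits in document.getElementById("<digits>").value.
    public static string ToJavaScript(string formula)
    {
        return Regex.Replace(
            formula,
            @"\d+",
            m => "document.getElementById(\"" + m.Value + "\").value");
    }
}
```

Regex.Replace finds all matches in the original input and builds a new string, so the inserted text is never rescanned and the positions of later matches are not thrown off.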

C# regular expression to find custom markers and take content

I have a string:
productDescription
In it are some custom tags such as:
[MM][/MM]
For example the string might read:
This product is [MM]1000[/MM] long
Using a regular expression how can I find those MM tags, take the content of them and replace everything with another string? So for example the output should be:
This product is 10 cm long
I think you'll need to pass a delegate to the regex for that.
Regex theRegex = new Regex(@"\[MM\](\d+)\[/MM\]");
text = theRegex.Replace(text, delegate(Match thisMatch)
{
int mmLength = Convert.ToInt32(thisMatch.Groups[1].Value);
int cmLength = mmLength / 10;
return cmLength.ToString() + " cm";
});
Using RegexDesigner.NET:
using System.Text.RegularExpressions;
// Regex Replace code for C#
void ReplaceRegex()
{
// Regex search and replace
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\[MM\](?<value>.*)\[\/MM\]", options);
string input = @"[MM]1000[/MM]";
string replacement = @"10 cm";
string result = regex.Replace(input, replacement);
// TODO: Do something with result
System.Windows.Forms.MessageBox.Show(result, "Replace");
}
Or if you want the original text back in the replacement:
Regex regex = new Regex(@"\[MM\](?<theText>.*)\[\/MM\]", options);
string replacement = @"${theText} cm";
A regex like this
\[(\w+)\](\d+)\[\/\w+\]
will find and collect the units (like MM) and the values (like 1000). That would at least allow you to use the pairs of parts intelligently to do the conversion. You could then put the replacement string together, and do a straightforward string replacement, because you know the exact string you're replacing.
I don't think you can do a simple RegEx.Replace, because you don't know the replacement string at the point you do the search.
Regex rex = new Regex(@"\[MM\]([0-9]+)\[\/MM\]");
string s = "This product is [MM]1000[/MM] long";
MatchCollection mc = rex.Matches(s);
This will match only integers.
mc[n].Groups[1].Value;
will then give the numeric part of the nth match.
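Putting the pieces together on a whole sentence, a sketch with a lambda as the match evaluator could look like this. It assumes the tag content is a whole number of millimetres and converts at 10 mm per cm (so 1000 becomes 100 cm); UnitTags and MmToCm are illustrative names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnitTags
{
    // Only ASCII digits are accepted between the tags.
    private static readonly Regex MmTag = new Regex(@"\[MM\]([0-9]+)\[/MM\]");

    // Replaces every [MM]n[/MM] tag with the value converted to centimetres.
    public static string MmToCm(string text)
    {
        return MmTag.Replace(text, m =>
        {
            double mm = double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            // Invariant culture keeps "." as the decimal separator on every machine.
            return (mm / 10).ToString(CultureInfo.InvariantCulture) + " cm";
        });
    }
}
```

Because the evaluator runs once per match, a description with several tags is converted in a single pass.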

Regex to match a string after colon

Input string is something like this: OU=TEST:This001. We need to extract "This001", preferably in C#.
What about :
/OU=.*?:(.*)/
Here is how it works:
OU= // Must contain OU=
. // Any character
* // Repeated but not mandatory
? // Ungreedy (lazy) (Don't try to match everything)
: // Match the colon
( // Start to capture a group
. // Any character
* // Repeated but not mandatory
) // End of the group
The / characters are delimiters that mark where the regex starts and ends (and allow options to be added); omit them when passing the pattern to the .NET Regex class.
The captured group will contain This001.
But it would be faster with a simple Substring().
yourString.Substring(yourString.IndexOf(":")+1);
Resources :
regular-expressions.info
"OU=" smells like you're doing an Active Directory or LDAP search and responding to the results. While regex is a brilliant tool, I just wanted to make sure that you're also aware of the excellent System.DirectoryServices.Protocols classes that were made for parsing, filtering and manipulating just this sort of data.
The SearchResult, SearchResultEntry and DirectoryAttribute in particular would be the friends you might be looking for. I don't doubt that you can regex or substring as cleverly as the next guy but it's also nice to have another good tool in the toolbox.
Have you tried these classes?
A solution without regex:
var str = "OU=TEST:This00:1";
var result = str.Split(new char[] { ':' }, 2)[1];
// result == This00:1
Regex vs Split vs IndexOf
Split
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Split(new char[] { ':' }, 2)[1];
sw.Stop();
// sw.ElapsedTicks == 15
Regex
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = (new Regex(":(.*)", RegexOptions.Compiled)).Match(str).Groups[1];
sw.Stop();
// sw.ElapsedTicks == 7000 (Compiled)
IndexOf
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Substring(str.IndexOf(":") + 1);
sw.Stop();
// sw.ElapsedTicks == 40
Winner: Split
If OU=TEST: is required before the string you want to match, use this regex:
(?<=OU\s*=\s*TEST\s*:\s*).*
This regex matches all the text after the colon; the text before the colon is only a requirement and is not part of the match.
You can replace TEST with [A-Za-z]+ to accept any letters instead of TEST, or with \w+ to accept any combination of letters, digits and underscores.
\s* allows any amount of whitespace (or none) at that position; remove it if you don't need such a check.
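A sketch of how a look-behind like this could be used from C#. It is a simplified variant, not the exact pattern above: the \s* parts are dropped, TEST is generalised to \w+, and a ^ anchor is added on the assumption that OU= starts the string. OuParser and ValueAfterColon are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class OuParser
{
    // Look-behind requires "OU=<word>:" at the start; only the text after it is matched.
    private static readonly Regex AfterPrefix = new Regex(@"(?<=^OU=\w+:).*");

    // Returns the text after the first colon, or null when the prefix is missing.
    public static string ValueAfterColon(string input)
    {
        Match m = AfterPrefix.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

Since the match starts at the first position where the look-behind succeeds, any later colons stay in the result.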
