c# regular expressions find and extract number of giving length - c#

I have a string such as:
"12/11/2015: Liefertermin 71994 : 30.11.2015 -> 27.11.2015"
And I want to extract the substring 71994, which will always be a number of 5 digits
I have tried the following with no success:
private string FindDispo_InInfo()
{
Regex pattern = new Regex("^[0-9]{5,5}$");
Match match = pattern.Match(textBox1.Text);
string stDispo = match.Groups[0].Value;
return stDispo;
}

Replace the anchors ^ and $ with a word boundary \b and use a verbatim string literal:
Regex pattern = new Regex(#"\b[0-9]{5}\b");
And you can access the value using match.Value:
string stDispo = match.Value;
Fixed code:
private static string FindDispo_InInfo(string text)
{
Regex pattern = new Regex(#"\b[0-9]{5}\b");
Match match = pattern.Match(text);
if (match.Success)
return match.Value;
else
return string.Empty;
}
And here is a C# demo:
Console.WriteLine(FindDispo_InInfo("12/11/2015: Liefertermin 71994 : 30.11.2015 -> 27.11.2015"));
// => 71994
However, creating a regex object inside the method might turn out inefficient. Better declare it as a static private read-only field, and then use inside the method as many times as necessary.

What you need is (\d{5}) which will capture a number of length 5

Related

Regex to parse URL from an excel formula

I have a formula in excel which upon reading from C# code looks like this
"=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")"
I want to use regex to fetch the URL from this string, i.e.
https://abc.efghi.rtyui.com/#/wqeqwq/#REF!/asdasd
The url can be different but the format of the formula will remain the same.
"=HYPERLINK(CONCATENATE(\"{SOME_STRING}\",#REF!,\"{SOME_STRING}\"), \"View asdas\")"
Try it like this:
(?<=HYPERLINK\(CONCATENATE\(")[^"]+
Demo
The positive lookbehind allows us to skip part in-front of the URL from the full match.
If you have an arbitrary number of whitespace in-between add some \s*, e.g. see this example that also shows the escaped = at the beginning of the string.
Sample Code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"(?<=HYPERLINK\(CONCATENATE\("")[^""]+";
string input = #"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Addendum: Here is another technique that uses capturing groups and regex Replace to extract the resulting URL string (after CONCATENATE would have happened):
^\=HYPERLINK\(CONCATENATE\("([^"]+)",([^,]+),"([^"]+)".*$
Demo2
string pattern = #"^\=HYPERLINK\(CONCATENATE\(""([^""]+)"",([^,]+),""([^""]+)"".*$";
string substitution = #"$1$2$3";
string input = #"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution, 1);
You can extract the URL from the formula using capturing groups in regular expression as given below:
string inputString = "=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")";
string regex = "CONCATENATE\\(\"([\\S]+)\",#REF!,\"([\\S]+)\"\\)";
Regex substringRegex = new Regex(regex, RegexOptions.IgnoreCase);
Match substringMatch = substringRegex.Match(inputString);
if (substringMatch.Success)
{
string url = substringMatch.Groups[1].Value + "#REF!" + substringMatch.Groups[2].Value;
}
I have defined two capturing groups in my regular expression. One for extracting part of the URL before #REF! and the other for extracting part of the URL after #REF!. Then I am concatenating all the extracted parts with #REF! to get the final URL.

How to remove substring after occurence of certain characters in a string?

I have the requirement as follows:
input => "Employee.Addresses[].Address.City"
output => "Empolyee.Addresses[].City"
(Address is removed which is present after [].)
input => "Employee.Addresses[].Address.Lanes[].Lane.Name"
output => "Employee.Addresses[].Lanes[].Name"
(Address is removed which is present after []. and Lane is removed which is present after [].)
How to do this in C#?
private static IEnumerable<string> Filter(string input)
{
var subWords = input.Split('.');
bool skip = false;
foreach (var word in subWords)
{
if (skip)
{
skip = false;
}
else
{
yield return word;
}
if (word.EndsWith("[]"))
{
skip = true;
}
}
}
And now you use it like this:
var filtered = string.Join(".", Filter(input));
How about a regular expression?
Regex rgx = new Regex(#"(?<=\[\])\..+?(?=\.)");
string output = rgx.Replace(input, String.Empty);
Explanation:
(?<=\[\]) //positive lookbehind for the brackets
\. //match literal period
.+? //match any character at least once, but as few times as possible
(?=\.) //positive lookahead for a literal period
Your description of what you need is lacking. Please correct me if I have understood it incorrectly.
You need to find the pattern "[]." and then remove everything after this pattern until the next dot .
If this is the case, I believe using a Regular Expression could solve the problem easily.
So, the pattern "[]." can be written in a regular expression as
"\[\]\."
Then you need to find everything after this pattern until the next dot: ".*?\." (The .*? means every character as many times as possible but in a non-greedy way, i.e. stopping at the first dot it finds).
So, the whole pattern would be:
var regexPattern = #"\[\]\..*?\.";
And you want to replace all matches of this pattern with "[]." (i.e. removing what was match after the brackets until the dot).
So you call the Replace method in the Regex class:
var result = Regex.Replace(input, regexPattern, "[].");

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.
You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}
You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", #".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it
You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
Above code snippet will print every thing upto 12345:
This is is a test....
For more detail about see regex positive lookahead
This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.
^ means start of string
$ means end of string
The asterisk tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more
{min,max} indicate the minimum/maximum number of matches.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] means not so exclude a
The dot matches a single character, except line break characters
In your case there many way to accomplish the task.
Eg excluding digit: ^[^\d]*
If you know the set of characters and they are not only digit, don't use regex but IndexOf(). If you know the separator between first and second part as "..." you can use Split()
Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Match.
MatchCollection matches = Regex.Matches(input, #"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}
You could just split, not as optimal as the indexOf solution
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0] //Take first part of the result, not the quickest but fairly simple

C# regular expression to find custom markers and take content

I have a string:
productDescription
In it are some custom tags such as:
[MM][/MM]
For example the string might read:
This product is [MM]1000[/MM] long
Using a regular expression how can I find those MM tags, take the content of them and replace everything with another string? So for example the output should be:
This product is 10 cm long
I think you'll need to pass a delegate to the regex for that.
Regex theRegex = new Regex(#"\[MM\](\d+)\[/MM\]");
text = theRegex.Replace(text, delegate(Match thisMatch)
{
int mmLength = Convert.ToInt32(thisMatch.Groups[1].Value);
int cmLength = mmLength / 10;
return cmLength.ToString() + "cm";
});
Using RegexDesigner.NET:
using System.Text.RegularExpressions;
// Regex Replace code for C#
void ReplaceRegex()
{
// Regex search and replace
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(#"\[MM\](?<value>.*)\[\/MM\]", options);
string input = #"[MM]1000[/MM]";
string replacement = #"10 cm";
string result = regex.Replace(input, replacement);
// TODO: Do something with result
System.Windows.Forms.MessageBox.Show(result, "Replace");
}
Or if you want the orginal text back in the replacement:
Regex regex = new Regex(#"\[MM\](?<theText>.*)\[\/MM\]", options);
string replacement = #"${theText} cm";
A regex like this
\[(\w+)\](\d+)\[\/\w+\]
will find and collect the units (like MM) and the values (like 1000). That would at least allow you to use the pairs of parts intelligently to do the conversion. You could then put the replacement string together, and do a straightforward string replacement, because you know the exact string you're replacing.
I don't think you can do a simple RegEx.Replace, because you don't know the replacement string at the point you do the search.
Regex rex = new Regex(#"\[MM\]([0-9]+)\[\/MM\]");
string s = "This product is [MM]1000[/MM] long";
MatchCollection mc = rex.Matches(s);
Will match only integers.
mc[n].Groups[1].Value;
will then give the numeric part of nth match.

Regular expression to use which matches text before .html and after /

With this string
http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html
I need to get sdf-as
with this
hellow-1/yo-sdf.html
I need yo-sdf
This should get you want you need:
Regex re = new Regex(#"/([^/]*)\.html$");
Match match = re.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
Console.WriteLine(match.Groups[1].Value); //Or do whatever you want with the value
This needs using System.Text.RegularExpressions; at the top of the file to work.
There are many ways to do this. The following uses lookarounds to match only the filename portion. It actually allows no / if such is the case:
string[] urls = {
#"http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html",
#"hellow-1/yo-sdf.html",
#"noslash.html",
#"what-is/this.lol",
};
foreach (string url in urls) {
Console.WriteLine("[" + Regex.Match(url, #"(?<=/|^)[^/]*(?=\.html$)") + "]");
}
This prints:
[sdf-as]
[yo-sdf]
[noslash]
[]
How the pattern works
There are 3 parts:
(?<=/|^) : a positive lookbehind to assert that we're preceded by a slash /, or we're at the beginning of the string
[^/]* : match anything but slashes
(?=\.html$): a positive lookahead to assert that we're followed by ".html" (literally on the dot)
References
regular-expressions.info/Lookarounds, Anchors
A non-regex alternative
Knowing regex is good, and it can do wonderful things, but you should always know how to do basic string manipulations without it. Here's a non-regex solution:
static String getFilename(String url, String ext) {
if (url.EndsWith(ext)) {
int k = url.LastIndexOf("/");
return url.Substring(k + 1, url.Length - ext.Length - k - 1);
} else {
return "";
}
}
Then you'd call it as:
getFilename(url, ".html")
API links
String.Substring, EndsWith, and LastIndexOf
Attachments
Source code and output on ideone.com
Try this:
string url = "http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html";
Match match = Regex.Match(url, #"/([^/]+)\.html$");
if (match.Success)
{
string result = match.Groups[1].Value;
Console.WriteLine(result);
}
Result:
sdf-as
However it would be a better idea to use the System.URI class to parse the string so that you correctly handle things like http://example.com/foo.html?redirect=bar.html.
using System.Text.RegularExpressions;
Regex pattern = new Regex(".*\/([a-z\-]+)\.html");
Match match = pattern.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
if (match.Success)
{
Console.WriteLine(match.Value);
}
else
{
Console.WriteLine("Not found :(");
}
This one makes the slash and dot parts optional, and allows the file to have any extension:
new Regex(#"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);
But I still prefer Substring(LastIndexOf(...)) because it is far more readable.

Categories