RegEx ignoring part of string to extract out text - c#

I have the following string:
#delimabc#delim#delim123#delim#delim456#delim
and I need to write a .NET regex that finds 3 matches in this example (but assume the number of matches will be variable):
abc
123
456
How can I write a regex that matches only the text between the first and second #delim, then between the third and fourth, and so on?
The following, of course, captures everything from the first to the last instance of the #delim string:
#delim(.+)+#delim

You could use look behind like:
(?<=#delim)\w+
(?<=#delim) is a positive lookbehind: it requires the characters #delim (case-sensitive) immediately before the match without including them in it,
while \w+ matches one or more word characters. In .NET, \w is Unicode-aware, so besides [a-zA-Z0-9_] it also matches non-ASCII letters and digits. To include or exclude characters, replace \w with an explicit class such as [a-zA-Z0-9_] and add the characters you need or remove those that should not be matched.
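To see the difference between \w and an explicit ASCII class, here is a small sketch; the class and method names are made up for illustration. It relies on the default Unicode-aware \w in .NET and on RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9].

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, only for comparing character classes.
public static class WordClassDemo
{
    // Default .NET \w is Unicode-aware, so it also accepts letters such as 'é'.
    public static bool DefaultWord(string s) => Regex.IsMatch(s, @"^\w+$");

    // An explicit ASCII class accepts only a-z, A-Z, 0-9 and '_'.
    public static bool AsciiClass(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9_]+$");

    // With RegexOptions.ECMAScript, \w behaves like [a-zA-Z_0-9].
    public static bool EcmaWord(string s) => Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
}
```

For example, DefaultWord("caf\u00e9") returns true, while AsciiClass and EcmaWord return false for the same input.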
Online Demo | .NET Online Demo
VB.NET version:
Imports System.Text.RegularExpressions

Dim sampleInput = "#delimabc#delim#delim123#delim#delim456#delim"
Dim results = Regex.Matches(sampleInput, "(?<=#delim)\w+")
For Each item As Match In results
    Console.WriteLine("Line: {0}", item.Value)
Next
C# version:
using System;
using System.Text.RegularExpressions;

var sampleInput = "#delimabc#delim#delim123#delim#delim456#delim";
var results = Regex.Matches(sampleInput, "(?<=#delim)\\w+");
foreach (Match item in results)
{
    Console.WriteLine("Line: {0}", item.Value);
}
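Updated version (accepts any character except # at the start of a value, so values containing spaces or hyphens also match; note that [^#].+? needs at least two characters, so use .*? instead of .+? if single-character values are possible):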
(?<=#delim)[^#].+?(?=#delim|$)
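A minimal sketch comparing the original and the updated pattern on an input whose values contain a space and a hyphen. The helper name and the extra input are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two lookaround patterns.
public static class DelimPatterns
{
    // First answer: only word characters after each #delim.
    public const string WordOnly = @"(?<=#delim)\w+";

    // Updated answer: a non-# character, then anything up to the next #delim or the end.
    public const string Updated = @"(?<=#delim)[^#].+?(?=#delim|$)";

    public static List<string> Extract(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();
}
```

For "#delimfoo bar#delim#delimx-y#delim", Extract with WordOnly yields foo and x, while Updated yields "foo bar" and "x-y". On the original sample both return abc, 123 and 456.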

#delim(.+?)#delim
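Try this. It is your regex with ? added after .+, which makes the quantifier lazy, so each match stops at the nearest #delim instead of the last one. Grab the captures (group 1). .NET has no g flag; use Regex.Matches to get every match. See demo: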
http://regex101.com/r/uH3tP3/1
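In C#, the counterpart of the g flag is Regex.Matches, which returns every non-overlapping match; the value sits in group 1. A minimal sketch (the helper name is hypothetical):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper name; the pattern is the one from the answer above.
public static class LazyDelim
{
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        // .+? is lazy, so each match ends at the nearest closing #delim
        // instead of the last one in the string.
        foreach (Match m in Regex.Matches(input, "#delim(.+?)#delim"))
        {
            values.Add(m.Groups[1].Value); // group 1 = text between the pair
        }
        return values;
    }
}
```

Because each match consumes both delimiters, this relies on every value having its own opening and closing #delim, as in the sample input; for "#delimabc#delim123#delim" it returns only abc.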

You can split the input on this regex:
(?:#delim)+
RegEx Demo
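Splitting leaves empty strings at the start and end (from the leading and trailing #delim), so filter those out. Alternatively, replace the regex pattern with an empty string (this joins the values into one string, e.g. abc123456).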
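A minimal C# sketch of the split approach (the helper name is hypothetical). Regex.Split returns empty strings for the delimiters at the very start and end of the input, so they are dropped with LINQ:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper name for illustration.
public static class DelimSplit
{
    public static string[] Extract(string input) =>
        Regex.Split(input, "(?:#delim)+")     // runs of #delim act as one separator
             .Where(part => part.Length > 0)  // drop the leading/trailing empty parts
             .ToArray();
}
```

For the sample input this returns abc, 123 and 456, regardless of how many values there are.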

Related

Regex only letters except set of numbers

I'm using Replace(@"[^a-zA-Z]+", ""); to leave only letters, but there is also a set of number sequences I want to keep, e.g. 122456 and 112466. I'm having trouble keeping digits only when they form one of these sequences:
ex input:
abc 1239 asm122456000
I want to get:
abcasm122456
I tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't apply Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
This matches (inside a non-capturing group) either a run of alphabetic characters or exactly 6 consecutive digits. Note that \d{6} accepts any 6 digits, not only 122456 or 112466.
Regex 101 & Test Result
Join all the matching values into a single string:
using System.Linq;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"(?:[a-zA-Z]+|\d{6})");
string input = "abc 1239 asm12245600";
// Cast<Match>() keeps this working on .NET Framework, where MatchCollection does not implement IEnumerable<Match>
string output = string.Join("", regex.Matches(input).Cast<Match>().Select(x => x.Value));
Sample .NET Fiddle
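Alternatively, using .Split() and .All():
using System.Linq;

string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
// output is "abcasm122456000": only all-digit tokens are dropped, so digits attached to letters remain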
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
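Note that the + quantifier has been removed: since [^a-zA-Z] also matches digits, [^a-zA-Z]+ could swallow one of the protected sequences when it follows another non-letter character (e.g. the space in "x 122456").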
You need to use $1 in the replacement:
var result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
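A self-contained sketch of this replacement (the class and method names are hypothetical). In .NET, $1 expands to an empty string when group 1 did not take part in the match, which is what deletes every other non-letter:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's Regex.Replace call.
public static class KeepLettersAndCodes
{
    // Group 1 keeps either protected sequence; any other non-letter is matched
    // by [^a-zA-Z] alone and replaced with an empty $1.
    public static string Clean(string text) =>
        Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
}
```

For the question's input "abc 1239 asm122456000" this yields abcasm122456.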

Get the middle part of a filename using regex

I need a regex that can return up to 10 characters in the middle of a file name.
filename -> returns
msl_0123456789_otherstuff.csv -> 0123456789
msl_test.xml -> test
anythingShort.w1 -> anythingSh
I can capture the beginning and end for removal with the following regex:
Regex.Replace(filename, "(^msl_)|([.][[:alnum:]]{1,3}$)", string.Empty);
but I also need to have only 10 characters when I am done.
Explanation of the regex above:
(^msl_) - match lines that start with "msl_"
| - or
([.] - match a period
[[:alnum:]]{1,3} - followed by 1-3 alphanumeric characters
$) - at the end of the line
Note [[:alnum:]] can't work in a .NET regex, because it does not support POSIX character classes. You may use \w (to match letters, digits, underscores) or [^\W_] (to match letters or digits).
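A quick sketch of the two suggested replacements for [[:alnum:]] (the class name is hypothetical): both accept letters and digits, but only \w also accepts the underscore. The last method plugs [^\W_] into the question's removal pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two classes.
public static class AlnumDemo
{
    // \w: letters, digits and underscore (plus other Unicode word characters).
    public static bool IsWordChar(string c) => Regex.IsMatch(c, @"^\w$");

    // [^\W_]: "not a non-word character and not an underscore" = letters or digits.
    public static bool IsLetterOrDigit(string c) => Regex.IsMatch(c, @"^[^\W_]$");

    // The question's pattern with [[:alnum:]] replaced by [^\W_].
    public static string StripPrefixAndExtension(string f) =>
        Regex.Replace(f, @"^msl_|\.[^\W_]{1,3}$", string.Empty);
}
```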
You can use your regex and just keep the first 10 chars in the string:
new string(Regex.Replace(s, @"^msl_|\.\w{1,3}$", "").Take(10).ToArray())
See the C# demo online:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strings = new List<string> { "msl_0123456789_otherstuff.csv", "msl_test.xml", "anythingShort.w1" };
foreach (var s in strings)
{
    Console.WriteLine("{0} => {1}", s, new string(Regex.Replace(s, @"^msl_|\.\w{1,3}$", "").Take(10).ToArray()));
}
Output:
msl_0123456789_otherstuff.csv => 0123456789
msl_test.xml => test
anythingShort.w1 => anythingSh
Using Replace with the alternation removes either alternative from the start or the end of the string. It also works when the extension is not present, but it does not take the number of characters in the middle into account.
If the file extension must be present, you might use a capturing group and make msl_ optional at the beginning.
Then match 1-10 word characters other than _, followed by optional word characters up to the dot before the extension:
^(?:msl_)?([^\W_]{1,10})\w*\.[^\W_]{2,}$
.NET regex demo (Click on the table tab)
A somewhat broader match could use \S instead of \w and match up to the last dot:
^(?:msl_)?(\S{1,10})\S*\.[^\W_]{2,}$
See another regex demo | C# demo
string[] strings = { "msl_0123456789_otherstuff.csv", "msl_test.xml", "anythingShort.w1", "123456testxxxxxxxx" };
string pattern = @"^(?:msl_)?(\S{1,10})\S*\.[^\W_]{2,}$";
foreach (string s in strings)
{
    Match match = Regex.Match(s, pattern);
    if (match.Success)
    {
        Console.WriteLine(match.Groups[1].Value);
    }
}
Output
0123456789
test
anythingSh

Regex get the text after the match which must be the last occurrence

I want to extract the string after the last occurrence of "cn=" using a regex in a C# application, i.e. the string between the last occurrence of "cn=" and the next \ character. Note that the source string may contain spaces.
Example:
ou=company\ou=country\ou=site\cn=office\cn=name\ou=pet
Result:
name
So far I've got (?<=cn=).* for selecting the text after cn= using a positive lookbehind, and (?:.(?!cn=))+$ for finding the last occurrence, but I don't know how to combine them to get the desired result.
You may try the following regex:
(?m)(?<=cn=)[\w\s]+(?=\\?(?:ou=)?[\w\s]*$)
see regex demo
C# ( demo )
using System;
using System.Text.RegularExpressions;

public class RegEx
{
    public static void Main()
    {
        string pattern = @"(?m)(?<=cn=)[\w\s]+(?=\\?(?:ou=)?[\w\s]*$)";
        string input = @"ou=company\ou=country\ou=site\cn=office\cn=name\ou=pet";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("{0}", m.Value);
        }
    }
}
You could use a negative lookahead:
cn=(?!.*cn=)([^\\]+)
Take group 1 and see a demo on regex101.com. For full C# code, see a demo on ideone.com.
To get the value as the whole match (so group 1 is no longer needed), turn cn= into a lookbehind as well:
(?<=cn=)(?!.*cn=)([^\\]+)
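A minimal sketch of both variants (the class and method names are hypothetical): the first reads group 1, the second uses the whole match. Because [^\\] only excludes the backslash, the value may contain spaces, as the question requires.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the patterns are the two from the answer above.
public static class LastCn
{
    // cn= not followed anywhere later by another cn=; the value is in group 1.
    public static string ViaGroup(string dn)
    {
        Match m = Regex.Match(dn, @"cn=(?!.*cn=)([^\\]+)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Same idea with a lookbehind, so the whole match is the value.
    public static string ViaMatch(string dn)
    {
        Match m = Regex.Match(dn, @"(?<=cn=)(?!.*cn=)([^\\]+)");
        return m.Success ? m.Value : null;
    }
}
```

Both return name for the question's example and null when there is no cn= at all.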
Another idea is to just use a capturing group to get the desired part:
string pattern = @"^.*cn=(\w+)";
^.*cn= consumes everything from the ^ start up to the last occurrence of cn= (because .* is greedy).
(\w+) captures one or more word characters into group 1 (use [^\\]+ instead if the value may contain spaces). Here is a demo at regex101.
The extracted value will be in m.Groups[1] (see demo).

REGEX help needed in c#

I am very new to regex and I'm not sure what's going on with this one. My friend gave me this to solve my issue, but somehow it is not working.
string: department_name:womens AND item_type_keyword:base-layer-underwear
reg-ex: (department_name:([\\w-]+))?(item_type_keyword:([\\w-]+))?
desired output: array OR group
1st element should be: department_name:womens
2nd should be: womens
3rd: item_type_keyword:base-layer-underwear
4th: base-layer-underwear
The string can contain department_name or item_type_keyword (neither is mandatory), in any order.
C# Code
Regex regex = new Regex(@"(department_name:([\w-]+))?(item_type_keyword:([\w-]+))?");
Match match = regex.Match(query);
if (match.Success)
{
    if (!String.IsNullOrEmpty(match.Groups[4].ToString()))
        d1.ItemType = match.Groups[4].ToString();
}
This C# code only returns 3 elements:
1: department_name:womens
2: department_name:womens
3: womens
Somehow it is duplicating the 1st and 2nd elements, and I don't know why. It also doesn't return the other elements that I expect.
Can someone help me, please?
When I test the regex online, it looks fine to me:
http://fiddle.re/crvw1
Thanks
You can use something like this to get the output you have in your question:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string txt = "department_name:womens AND item_type_keyword:base-layer-underwear";
var reg = new Regex(@"(?:department_name|item_type_keyword):([\w-]+)", RegexOptions.IgnoreCase);
var results = new List<string>();
foreach (Match match in reg.Matches(txt))
{
    results.Add(match.Groups[0].Value); // whole match, e.g. department_name:womens
    results.Add(match.Groups[1].Value); // captured value, e.g. womens
}
// results is your final list containing all results
foreach (string elem in results)
{
    Console.WriteLine(elem);
}
Prints:
department_name:womens
womens
item_type_keyword:base-layer-underwear
base-layer-underwear
match.Groups[0].Value gives the whole text that matched the pattern, while match.Groups[1].Value gives the part captured by the first group.
In your expression, group 0 is the whole match and group 1 wraps the entire department_name:... part, so both contain department_name:womens. Groups 3 and 4 stay empty because a single Match stops after womens: the optional item_type_keyword part would have to follow immediately, but " AND " comes in between.
Once you get the different elements, you can put them in an array or list for further processing.
The loop iterates over each of the matches, which you cannot really do with if and .Match(): .Match() is suited to a single match, whereas .Matches() finds every occurrence, so neither the order nor the number of matches matters.
ideone demo
(?:
department_name # Match department_name
| # Or
item_type_keyword # Match item_type_keyword
)
:
([\w-]+) # Capture \w and - characters
It's better to use the alternation (logical OR) operator |, because we don't know the order of the keys in the input string.
(department_name:([\w-]+))|(item_type_keyword:([\w-]+))
DEMO
String input = @"department_name:womens AND item_type_keyword:base-layer-underwear";
Regex rgx = new Regex(@"(?:(department_name:([\w-]+))|(item_type_keyword:([\w-]+)))");
foreach (Match m in rgx.Matches(input))
{
    // In each match only groups 1-2 or groups 3-4 are filled; the other pair is empty
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
    Console.WriteLine(m.Groups[4].Value);
}
IDEONE
Another idea using a lookahead for capturing and getting all groups in one match:
^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)
as a .NET String
"^(?!$)(?=.*(department_name:([\\w-]+))|)(?=.*(item_type_keyword:([\\w-]+))|)"
test at regexplanet (click on .NET); test at regex101.com
(add the m (multiline) modifier for multiline input: "(?m)^...")
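A minimal sketch of reading all four groups from that single zero-length match (the class and method names are hypothetical). When a lookahead takes its empty branch, the corresponding groups are unsuccessful and their Value is an empty string:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the lookahead pattern above.
public static class QueryParts
{
    const string Pattern =
        @"^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)";

    // Returns groups 1-4 in order; missing parts come back as empty strings.
    public static string[] Parse(string query)
    {
        Match m = Regex.Match(query, Pattern);
        if (!m.Success)
            return new string[0]; // (?!$) rejects an empty input
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value };
    }
}
```

Because each key is searched by its own lookahead from the start of the string, the keys may appear in either order.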
If the parts are always joined by AND (or OR, etc.), you can use:
(department_name:(.*?)) AND (item_type_keyword:(.*?)$)
•1: department_name:womens
•2: womens
•3: item_type_keyword:base-layer-underwear
•4: base-layer-underwear
(?=(department_name:\w+)).*?:([\w-]+)|(?=(item_type_keyword:.*)$).*?:([\w-]+)
Try this. It uses a lookahead to capture, then backtracks and captures again. See demo:
http://regex101.com/r/lS5tT3/52

Select only numeric part of a selection in a single regex

Well, I don't know how to explain that exactly, but I have this text:
abc=0;def=2;abc=1;ghi=4;jkl=2
What I want to do is select abc=0 and abc=1, but without the abc= part...
My regex is abc=\d+, but it includes the abc= part...
I read something about this, and the answer was (?!abc=)\d+, but it selects all the numbers in the text...
So, can somebody help me with this?
Thanks in advance.
If your language supports \K, you could use the regex below to match the number that comes right after the string abc=:
abc=\K\d+
DEMO
OR
use a positive lookbehind if your language doesn't support \K (.NET doesn't, so this is the option for C#):
(?<=abc=)\d+
DEMO
The C# code would be:
string str = "abc=0;def=2;abc=1;ghi=4;jkl=2";
Regex rgx = new Regex(@"(?<=abc=)\d+");
foreach (Match m in rgx.Matches(str))
{
    Console.WriteLine(m.Value);
}
IDEONE
Explanation:
(?<=abc=) Positive lookbehind: asserts that the current position is immediately preceded by abc=, without including abc= in the match.
\d+ Matches one or more digits.
You don't need a lookaround assertion here. You can simply use a capturing group to capture the matched context that you want and refer back to the matched group using the Match.Groups Property.
abc=(\d+)
Example:
string s = "abc=0;def=2;abc=1;ghi=4;jkl=2";
foreach (Match m in Regex.Matches(s, @"abc=(\d+)"))
    Console.WriteLine(m.Groups[1].Value);
Output
0
1
