I am trying to find the nth substring delimited by a special character. For example:
one|two|three|four|five
Say I want the string between the nth and (n+1)th occurrences of the '|' character, for example between the 2nd and 3rd, which turns out to be 'three'. I want to do it using a regex. Could someone guide me?
My current attempt is as follows:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to the C# code, consider a simple splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to the code (for example, if you use a tool that is powered by .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - exactly 2 sequences (adjust the number as needed) of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.
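As a rough sketch of how the lookbehind variant could be used from C#, the snippet below reads the whole match (m.Value) instead of a capture group. It relies on .NET supporting variable-length lookbehind, and assumes C# 9+ top-level statements. The helper name TryGetNthField is hypothetical and only serves the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: n is the zero-based index of the '|'-separated field.
bool TryGetNthField(string text, int n, out string result)
{
    // {{ and }} are literal braces in an interpolated string, so n = 2 yields {2}.
    var pattern = $@"(?<=^(?:[^|]*\|){{{n}}})[^|]*";
    var m = Regex.Match(text, pattern);
    result = m.Success ? m.Value : "";
    return m.Success;
}

if (TryGetNthField("zero|one|two|three|four", 2, out var found))
{
    Console.WriteLine(found); // => two
}
```

The bool return value distinguishes a missing field from an empty one, since an empty field is still a successful zero-length match.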
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^|]*"); // verbatim string: "\|" is not a valid escape in a regular C# string
var m = r.Matches(subtext)[2];
Related
I'm using Replace(@"[^a-zA-Z]+", ""); to leave only letters, but I also have a set of digit sequences that I want to keep, e.g. 122456 and 112466. I'm having trouble keeping the digits only when they form one of these sequences:
ex input:
abc 1239 asm122456000
I want to:
abcasm122456
I tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't use Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which uses a non-capturing group to match either one or more alphabetic characters or a run of exactly six digits.
Regex 101 & Test Result
Join all the matching values into a single string.
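using System.Linq;
using System.Text.RegularExpressions;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
// Cast<Match>() keeps this working on .NET Framework, where MatchCollection is not generic.
if (matches.Count > 0)
output = String.Join("", matches.Cast<Match>().Select(x => x.Value));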
Sample .NET Fiddle
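Alternatively, use .Split() and .All() to drop whitespace-separated tokens that consist only of digits (digits attached to letters, such as the trailing 000, are kept):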
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
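For a quick check of this capture-and-restore technique, here is a minimal runnable sketch applied to the sample input from the question. It assumes C# 9+ top-level statements. The helper name KeepLettersAndCodes is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps ASCII letters plus the two whitelisted digit runs.
string KeepLettersAndCodes(string text)
{
    // When Group 1 did not participate in a match, $1 expands to an empty string,
    // so every other non-letter character is simply removed.
    return Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
}

Console.WriteLine(KeepLettersAndCodes("abc 1239 asm122456000")); // => abcasm122456
```

The order of the alternatives matters: the digit sequences are tried first at each position, so they are captured before the single-character alternative can consume their first digit.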
I'm trying to come up with a regular expression that matches the text in bold in all the examples.
Between the string "JZ" and any character before "-"
JZ123456789-301A
JZ134255872-22013
Between the string "JZ" and the last character
JZ123456789D
I have tried the following but it only works for the first example
(?<=JZ).*(?=-)
You can use (?<=JZ)[0-9]+, presuming the desired text will always be numeric.
Try it out here
You may use
JZ([^-]*)(?:-|.$)
and grab Group 1 value. See the regex demo.
Details
JZ - a literal substring
([^-]*) - Capturing group 1: zero or more chars other than -
(?:-|.$) - a non-capturing group matching either - or any char at the end of the string
C# code:
var m = Regex.Match(s, @"JZ([^-]*)(?:-|.$)");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If, for some reason, you need to obtain the required value as a whole match, use lookarounds:
(?<=JZ)[^-]*(?=-|.$)
See this regex variation demo. Use m.Value in the code above to grab the value.
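A small sketch of the lookaround variant run against the three sample strings from the question may help to see the backtracking at work. It assumes C# 9+ top-level statements. The helper name ExtractAfterJz is invented for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text after "JZ", or "" when nothing matches.
string ExtractAfterJz(string text)
{
    var m = Regex.Match(text, @"(?<=JZ)[^-]*(?=-|.$)");
    return m.Success ? m.Value : "";
}

foreach (var sample in new[] { "JZ123456789-301A", "JZ134255872-22013", "JZ123456789D" })
{
    Console.WriteLine(ExtractAfterJz(sample));
}
// => 123456789
// => 134255872
// => 123456789
```

In the last sample there is no hyphen, so [^-]* first runs to the end of the string and then gives back one character, letting .$ in the lookahead match the final D.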
A one-line answer without regex:
string s,r;
// if your string always starts with JZ
s = "JZ123456789-301A";
r = string.Concat(s.Substring(2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
// if JZ can appear anywhere in the string
s = "A12JZ123456789-301A";
r = string.Concat(s.Substring(s.IndexOf("JZ") + 2).TakeWhile(char.IsDigit)); // + 2 skips past "JZ" itself
Console.WriteLine(r); // output : 123456789
Basically, we remove everything up to and including the delimiter "JZ", then take characters for as long as they are digits. Concat is used to turn the IEnumerable<char> back into a string. I think it is easier to read.
Try it online
I have a string
string str = "I am fine. How are you? You need exactly 4 pieces of sandwiches. Your ADAST Count is 5. Okay thank you ";
What I want is, get the ADAST count value. For the above example, it is 5.
The problem is the word after ADAST Count: it can be either is or =. The two words ADAST Count will always be there, though.
What I have tried is
var resultString = Regex.Match(str, @"ADAST\s+count\s+is\s+\d+", RegexOptions.IgnoreCase).Value;
var number = Regex.Match(resultString, @"\d+").Value;
How can I write the pattern which will search is or = ?
You may use
ADAST\s+count\s+(?:is|=)\s+(\d+)
See the regex demo
Note that (?:is|=) is a non-capturing group (i.e. it is used to only group alternations without pushing these submatches on to the capture stack for further retrieval) and | is an alternation operator.
Details:
ADAST - a literal string
\s+ - 1 or more whitespaces
count - a literal string
\s+ - 1 or more whitespaces
(?:is|=) - either is or =
\s+ - 1 or more whitespaces
(\d+) - Group 1 capturing one or more digits
C#:
var m = Regex.Match(s, @"ADAST\s+count\s+(?:is|=)\s+(\d+)", RegexOptions.IgnoreCase);
if (m.Success) {
Console.Write(m.Groups[1].Value);
}
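To see the alternation handle both separators, the pattern can be wrapped in a small helper and tried on an "is" input and an "=" input. This is only a sketch, assuming C# 9+ top-level statements. The helper name ReadAdastCount and the second and third sample strings are assumptions, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the digits after "ADAST Count is" / "ADAST Count =",
// or "" when the phrase is not present.
string ReadAdastCount(string text)
{
    var m = Regex.Match(text, @"ADAST\s+count\s+(?:is|=)\s+(\d+)", RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : "";
}

Console.WriteLine(ReadAdastCount("You need exactly 4 pieces. Your ADAST Count is 5. Okay thank you "));
// => 5
Console.WriteLine(ReadAdastCount("Your ADAST Count = 12")); // made-up input
// => 12
Console.WriteLine(ReadAdastCount("No count here") == ""); // made-up input
// => True
```

Note that the pattern requires whitespace on both sides of the separator, so an input such as "ADAST Count=12" would not match unless the \s+ around the group are relaxed to \s*.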
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
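The difference between the greedy and the non-greedy quantifier can be checked by counting matches on the input from the question. The following is a minimal sketch, assuming C# 9+ top-level statements. The helper name CountMatches is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: number of non-overlapping matches of pattern in text.
int CountMatches(string text, string pattern)
{
    return Regex.Matches(text, pattern).Count;
}

var input = "level=<device[195].level>&name=<device[195].name>";

// Greedy \S+ runs to the last '>' in the string, swallowing both tags.
Console.WriteLine(CountMatches(input, @"<device\[\d+\]\.\S+>"));  // => 1
// Lazy \S+? stops at the first '>' it can reach, so each tag matches on its own.
Console.WriteLine(CountMatches(input, @"<device\[\d+\]\.\S+?>")); // => 2
```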
It depends on how much of the structure of the angle-bracket blocks you need to match, but you can do:
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
Live demo
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device['
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace only lets us lay out and comment the pattern; it does not change what the pattern matches.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I can't understand how to solve the following problem:
I have input string "aaaabaa" and I'm trying to search for string "aa" (I'm looking for positions of characters)
Expected result is
0 1 2 5
aa aabaa
a aa abaa
aa aa baa
aaaab aa
This problem is already solved by me using another approach (non-RegEx).
But I need a regex. I'm new to regular expressions, so searching Google hasn't really helped me.
Any help appreciated! Thanks!
P.S.
I've tried to use (aa)* and "\b(\w+(aa))*\w+" but those expressions are wrong
You can solve this by using a lookahead
a(?=a)
will find every "a" that is followed by another "a".
If you want to do this more generally
(\p{L})(?=\1)
This will find every letter that is followed by the same letter. Each letter found is stored in a capturing group (because of the parentheses around it), and the positive lookahead assertion (the (?=...)) then reuses that group through \1, which holds the matched character.
\p{L} matches any Unicode code point in the category "letter".
Code
String text = "aaaabaa";
Regex reg = new Regex(#"(\p{L})(?=\1)");
MatchCollection result = reg.Matches(text);
foreach (Match item in result) {
Console.WriteLine(item.Index);
}
Output
0
1
2
5
The following code should work with any regular expression without having to change the actual expression:
Regex rx = new Regex(@"(a)\1"); // or any other pattern you're looking for; the verbatim string keeps \1 as a backreference
int position = 0;
string text = "aaaaabbbbccccaaa";
int textLength = text.Length;
Match m = rx.Match(text, position);
while (m != null && m.Success)
{
Console.WriteLine(m.Index);
if (m.Index <= textLength)
{
m = rx.Match(text, m.Index + 1);
}
else
{
m = null;
}
}
Console.ReadKey();
It uses the option to change the start index of a regex search for each consecutive search. The actual problem comes from the fact that the Regex engine, by default, will always continue searching after the previous match. So it will never find a possible match within another match, unless you instruct it to by using a Look ahead construction or by manually setting the start index.
Another, relatively easy, solution is to wrap the whole expression in a lookahead:
string expression = @"(a)\1";
Regex rx2 = new Regex("(?=" + expression + ")");
MatchCollection ms = rx2.Matches(text);
var indexes = ms.Cast<Match>().Select(match => match.Index);
That way the engine will automatically advance the index by one for every match it finds.
From the docs:
When a match attempt is repeated by calling the NextMatch method, the regular expression engine gives empty matches special treatment. Usually, NextMatch begins the search for the next match exactly where the previous match left off. However, after an empty match, the NextMatch method advances by one character before trying the next match. This behavior guarantees that the regular expression engine will progress through the string. Otherwise, because an empty match does not result in any forward movement, the next match would start in exactly the same place as the previous match, and it would match the same empty string repeatedly.
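Putting the lookahead wrapper together with the empty-match behaviour described in the docs, the positions from the original question can be reproduced with a short sketch like the one below. It assumes C# 9+ top-level statements and that the search term is meant literally, hence the Regex.Escape call. The helper name OverlappingIndexes is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: start indexes of all (possibly overlapping) occurrences of literal.
int[] OverlappingIndexes(string text, string literal)
{
    // The lookahead matches an empty string at every position where the literal starts,
    // and the engine then advances by one character after each empty match.
    var pattern = "(?=" + Regex.Escape(literal) + ")";
    return Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Index).ToArray();
}

Console.WriteLine(string.Join(" ", OverlappingIndexes("aaaabaa", "aa"))); // => 0 1 2 5
```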
Try this:
How can I find repeated characters with a regex in Java?
It is in Java, but it covers both the regex and the non-regex approach. C# regex syntax is very similar to Java's.