Regex - Substring based on condition - c#

I am trying to write a regex to use in C#.
The regex should return a substring of the input that depends on the input length.
The rules are:
If the input is shorter than 13 characters, return the full input.
Else, if the input is longer than 25 characters, return the substring from index 3 to index 16 (that is, skip the first three chars).
Here is what I have come up with so far:
(?(?=.{25,}).{3}(.{13})|(?(?=.{0,13})(.{0,13})))
This is not working: when the input is longer than 25 characters, the first three chars are not trimmed from the result.
Check it here

Note that a non-regex solution is rather trivial:
public string check(string s)
{
    var res = "";
    if (s.Length >= 25)
        res = s.Substring(3, 13);
    else if (s.Length <= 13)
        res = s;
    return res;
}
If you want to use a regex, you may use
^(?=.{25,}).{3}(?<res>.{13})|^(?=.{0,13}$)(?<res>.*)
See the regex demo. Compile with RegexOptions.Singleline to support newlines in the input.
Details
^ - start of string
(?=.{25,}) - if there are 25 or more chars after the start of string, match
.{3} - any 3 chars
(?<res>.{13}) - and capture 13 chars into res group
| - or
^(?=.{0,13}$) - make sure the whole string is 0 to 13 chars long and then
(?<res>.*) - grab the whole string (if no RegexOptions.Singleline is used, only 1 line will be matched).
Use it as
var res = "";
var m = Regex.Match(s, @"^(?=.{25,}).{3}(?<res>.{13})|^(?=.{0,13}$)(?<res>.*)");
if (m.Success)
{
    res = m.Groups["res"].Value;
}
See a C# demo.
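A minimal sketch that wraps the pattern above in a helper and exercises the three length cases. The names `SubstringDemo` and `Pick` are hypothetical, and `RegexOptions.Singleline` is added on the assumption that inputs may contain newlines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringDemo
{
    // Pattern from the answer above; Singleline lets '.' match newlines too.
    private static readonly Regex Pattern = new Regex(
        @"^(?=.{25,}).{3}(?<res>.{13})|^(?=.{0,13}$)(?<res>.*)",
        RegexOptions.Singleline);

    // Returns the "res" group, or "" when neither branch applies
    // (input lengths 14 to 24).
    public static string Pick(string s)
    {
        Match m = Pattern.Match(s);
        return m.Success ? m.Groups["res"].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(Pick("short"));                      // short
        Console.WriteLine(Pick("abcdefghijklmnopqrstuvwxyz")); // defghijklmnop
        Console.WriteLine(Pick("abcdefghijklmnopqrst") == ""); // True
    }
}
```

Both alternatives use the same group name `res`, which .NET allows, so the caller does not need to know which branch matched.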

Related

RegEx string between N and (N+1)th Occurrence

I am trying to find the substring between the nth and (n+1)th occurrence of a special character. For example:
one|two|three|four|five
Say I am looking for the string between the 2nd and 3rd occurrence of the '|' character (n and n+1), which turns out to be 'three'. I want to do it using RegEx. Could someone guide me?
My current attempt is as follows:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - exactly 2 (adjust the number as you need) sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.
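As a rough sketch of the lookbehind variant in C#, the helper below builds the pattern for a given number of pipes to skip and reads the result from `Match.Value` instead of a capture group. The names `NthFieldDemo` and `FieldAfter` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthFieldDemo
{
    // Returns the field that follows `skip` pipe characters,
    // or null when the input has fewer pipes than that.
    public static string FieldAfter(string text, int skip)
    {
        // {{ and }} are literal braces in an interpolated string,
        // so skip = 2 produces the quantifier {2}.
        string pattern = $@"(?<=^(?:[^|]*\|){{{skip}}})[^|]*";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FieldAfter("zero|one|two|three|four", 2)); // two
        Console.WriteLine(FieldAfter("zero|one", 2) == null);        // True
    }
}
```

Returning null rather than an empty string keeps a missing field distinguishable from an empty one such as the middle field of `a||b`.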
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2]; // zero-based: [0] = "one", [1] = "two", [2] = "three"

How to separate numbers from words, chars and any other marks with whitespace in string

I'm trying to separate numbers from words, characters, and any other punctuation with whitespace when they are written together in a string, e.g. the string is:
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
and desired output should be:
ok, here is 369 and 777 , and 20 k 0 10 + 1 .any word.
I'm not sure if I'm on the right track, but what I'm trying to do now is find out whether the string contains numbers and then replace them with the same values surrounded by whitespace. If that is possible, how can I find all individual numbers (whole numbers, not each digit), whether or not they are separated from words by whitespace, and replace each found number with the same number with spaces on both sides? The code below returns only the first occurrence of a number in the string:
class Program
{
    static void Main(string[] args)
    {
        string input = "here is 369 and 777 and 15 2080 and 579";
        string resultString = Regex.Match(input, @"\d+").Value;
        Console.WriteLine(resultString);
        Console.ReadLine();
    }
}
output:
369
I'm also not sure whether I can get every number found and build a separate replacement value for each one. It would be good to know which direction to go in.
If what we need is basically to add spaces around numbers, try this:
string tmp = Regex.Replace(input, @"(?<a>[0-9])(?<b>[^0-9\s])", @"${a} ${b}");
string res = Regex.Replace(tmp, @"(?<a>[^0-9\s])(?<b>[0-9])", @"${a} ${b}");
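The two `Regex.Replace` passes above can also be folded into one pass with zero-width lookarounds. This is only a sketch of that alternative, not part of the original answer; the names `SpaceAroundNumbersDemo`, `Boundary`, and `Separate` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceAroundNumbersDemo
{
    // Zero-width match at every boundary between a digit and a
    // non-digit, non-whitespace character (in either order).
    private static readonly Regex Boundary = new Regex(
        @"(?<=[0-9])(?=[^0-9\s])|(?<=[^0-9\s])(?=[0-9])");

    public static string Separate(string input)
    {
        // Each zero-width boundary is replaced with a single space.
        return Boundary.Replace(input, " ");
    }

    public static void Main()
    {
        Console.WriteLine(Separate("ok, here is369 and777, and 20k0 10+1.any word."));
        // ok, here is 369 and 777 , and 20 k 0 10 + 1 .any word.
    }
}
```

Because the lookarounds consume no characters, a single pass handles cases like `20k0`, where one character touches digits on both sides.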
Previous answer assumed that words, numbers and punctuation should be separated:
string input = "here is369 and777, and 20k0";
var matches = Regex.Matches(input, @"([A-Za-z]+|[0-9]+|\p{P})");
foreach (Match match in matches)
    Console.WriteLine("{0}", match.Groups[1].Value);
To construct the required result string in a short way:
string res = string.Join(" ", matches.Cast<Match>().Select(m => m.Groups[1].Value));
You were on the right path. Regex.Match only returns one match and you would have to use .NextMatch() to get the next value that matches your regular expression. Regex.Matches returns every possible match into a MatchCollection that you can then parse with a loop as I did in my example:
string input = "here is 369 and 777 and 15 2080 and 579";
foreach (Match match in Regex.Matches(input, @"\d+"))
{
    Console.WriteLine(match.Value);
}
Console.ReadLine();
This outputs:
369
777
15
2080
579
This gets close to the desired output (whitespace already present in the input is kept, so some gaps end up doubled):
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
var matches = Regex.Matches(input, @"([\D]+|[0-9]+)");
foreach (Match match in matches)
    Console.Write("{0} ", match.Groups[0].Value);
[\D] matches any non-digit character. Note the space after {0}.

Regex to extract specific numbers in a String

string temp = "12345&refere?X=Assess9677125?Folder_id=35478";
I need to extract the number 12345 alone and I don't need the numbers 9677125 and 35478.
What regex can I use?
Here is the regex for extracting a 5-digit number at the beginning of the string:
^(\d{5})&
If length is arbitrary:
^(\d+)&
If termination pattern is not always &:
^(\d+)[^\d]
Based on Sayse's comment, you can simply rewrite it as:
^(\d+)
and if the terminator is itself a number (for instance 999), then:
^(\d+)999
You don't need regex if you only want to extract the first number:
string temp = "12345&refere?X=Assess9677125?Folder_id=35478";
int first = Int32.Parse(String.Join("", temp.TakeWhile(c => Char.IsDigit(c))));
Console.WriteLine(first); // 12345
If the number you want is always at the beginning of the string and terminated by an ampersand (&) you don't need a regex at all. Just split the string on the ampersand and get the first element of the resulting array:
String temp = "12345&refere?X=Assess9677125?Folder_id=35478";
var splitArray = temp.Split('&');
var number = splitArray[0]; // returns 12345
Alternatively, you can get the index of the ampersand and substring up to that point:
String temp = "12345&refere?X=Assess9677125?Folder_id=35478";
var ampersandIndex = temp.IndexOf("&");
var number = temp.Substring(0, ampersandIndex); // returns 12345
From what you have given us, this is fairly simple:
var regex = new Regex(@"^(?<number>\d+)&");
var match = regex.Match("12345&refere?X=Assess9677125?Folder_id=35478");
if (match.Success)
{
    var number = int.Parse(match.Groups["number"].Value);
}
Edit: Of course you can replace the argument of new Regex with any of the combinations Giorgi has given.
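A small sketch that combines the anchored `^\d+` pattern from the first answer with `int.TryParse`, so that input without a leading number, or with a digit run too long for an `int`, is reported instead of throwing. The names `LeadingNumberDemo` and `TryGetLeadingNumber` are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingNumberDemo
{
    // Tries to read the run of digits at the very start of the input.
    public static bool TryGetLeadingNumber(string input, out int number)
    {
        number = 0;
        Match m = Regex.Match(input, @"^\d+");
        // TryParse returns false instead of throwing when the value does not fit.
        return m.Success && int.TryParse(m.Value, out number);
    }

    public static void Main()
    {
        string temp = "12345&refere?X=Assess9677125?Folder_id=35478";
        if (TryGetLeadingNumber(temp, out int first))
        {
            Console.WriteLine(first); // 12345
        }
        Console.WriteLine(TryGetLeadingNumber("refere?X=1", out _)); // False
    }
}
```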

Get a specific part from a string based on a pattern

I have a string in this format:
ABCD_EFDG20120700.0.xml
This has a pattern which has three parts to it:
First is the set of chars before the '_', the 'ABCD'
Second are the set of chars 'EFDG' after the '_'
Third are the remaining 20120700.0.xml
I can split the original string and get the number(s) from the second element in the split result using this switch:
\d+
Match m = Regex.Match(splitname[1], "\\d+");
That returns only '20120700'. But I need '20120700.0'.
How do I get the required string?
You can extend your regex to look for one or more digits, then a period, and then one or more digits again:
Match m = Regex.Match(splitname[1], "\\d+\\.\\d+");
Although with such regular expression you don't even need to split the string:
string s = "ABCD_EFDG20120700.0.xml";
Match m = Regex.Match(s, "\\d+\\.\\d+");
string result = m.Value; // result is 20120700.0
I suggest using a single regex operation to get everything you want, like this:
var rgx = new Regex(@"^([^_]+)_([^\d.]+)([\d.]+\d+)\.(.*)$");
var matches = rgx.Matches(input);
if (matches.Count > 0)
{
    Console.WriteLine("{0}", matches[0].Groups[0]); // All input string
    Console.WriteLine("{0}", matches[0].Groups[1]); // ABCD
    Console.WriteLine("{0}", matches[0].Groups[2]); // EFDG
    Console.WriteLine("{0}", matches[0].Groups[3]); // 20120700.0
    Console.WriteLine("{0}", matches[0].Groups[4]); // xml
}

Replace using Regular Expression - fixed digit location

In a 16-digit number, I would like to replace the 5th through 10th digits.
How can that be achieved with a regular expression (C#)?
The way to do it is to capture the inner and outer portions separately, like this:
// Split into 2 groups of 5 digits and 1 of 6
string regex = "(\\d{5})(\\d{5})(\\d{6})";
// Insert ABCDE between
// group 1 and group 3
string replaceRegex = "${1}ABCDE${3}";
string testString = "1234567890999999";
string result = Regex.Replace(testString, regex, replaceRegex);
// result = '12345ABCDE999999'
Why use a regular expression? If by "number of 16 digits", you mean a 16 character long string representation of a number, then you'd probably be better off just using substring.
string input = "0000567890000000";
var output = input.Substring(0, 4) + "222222" + input.Substring(10, 6);
Or did you mean you want to swap the 5th and 10th digits? Your question isn't very clear.
Use the regular expression (?<=^\d{4})\d{6}(?=\d{6}$) to achieve it without capture groups.
It looks for 6 consecutive digits (5th to 10th inclusive) that are preceded by the first 4 digits and followed by the last 6 digits of the string.
Regex.Replace("1234567890123456", @"(?<=^\d{4})\d{6}(?=\d{6}$)", "replacement");
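To make the effect of the lookaround pattern concrete, here is a hedged sketch that masks the 5th through 10th digits with stars. The class and method names (`MaskDemo`, `MaskMiddle`) are hypothetical; only the pattern comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskDemo
{
    // Replaces digits 5 to 10 of a 16-digit string with six stars.
    // Input that is not exactly 16 digits does not match and is returned unchanged.
    public static string MaskMiddle(string digits)
    {
        return Regex.Replace(digits, @"(?<=^\d{4})\d{6}(?=\d{6}$)", "******");
    }

    public static void Main()
    {
        Console.WriteLine(MaskMiddle("1234567890123456")); // 1234******123456
        Console.WriteLine(MaskMiddle("12345"));            // 12345
    }
}
```

Since the lookbehind and lookahead consume nothing, the replacement string needs no `$1` or `$3` references to restore the surrounding digits.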
Got it...
by creating 3 capturing groups:
([\d]{5})([\d]{5})([\d]{6})
keep capturing groups 1 and 3 and replace group 2 with stars (or whatever)
$1*****$3
C# code below
string resultString = null;
try {
    resultString = Regex.Replace(subjectString, @"([\d]{5})([\d]{5})([\d]{6})", "$1*****$3", RegexOptions.Singleline);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
