I'm searching for a regular expression that will extract 0263563
from
;010263563=2119?
and 0267829
from
%00000026782904?;010267829=4119?
(Must be the same regular expression).
Start at the 3rd character after the semicolon and take 7 characters.
;..(\d{7})
or more general:
;..(.{7})
Or according to comment : "To clarify characters to take are digits"
;\d\d(\d{7})
Well, you need this:
;\d{2}(\d+)
The First group will contain the number you want.
;[\S]{2}([\S]{7})
Since you said characters and not numbers, but it would work either way
For rules that are that precise (start 3 chars after ;, then next 7), you could use a plain substring:
string s = "%00000026782904?;010267829=4119?";
var pos = s.IndexOf(';');
var number = s.Substring(pos+3, 7);
And of course, test whether that IndexOf really found the ;
The below regex would exactly 7 digits which must be preceded by a ; , any two characters.
(?<=;.{2})\d{7}
Code:
String input = #";010263563=2119?
%00000026782904?;010267829=4119?";
Regex rgx = new Regex(#"(?<=;.{2})\d{7}");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
IDEONE
Output:
0263563
0267829
Assuming regex is not an absolute requirement, you could use:
var input = "%00000026782904?;010267829=4119?";
// output will be: 0267829
var digits = input.SkipWhile(x => x != ';').Skip(3).Take(7).ToArray();
var output = new string(digits);
Related
I'm using Replace(#"[^a-zA-Z]+", "");
leave only letters, but I have a set of numbers or characters that I want to keep as well, ex: 122456 and 112466. But I'm having trouble leaving it only if it's this sequence:
ex input:
abc 1239 asm122456000
I want to:
abscasm122456
tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't applying Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which captures the group (non-capturing group) with the alphabetic character(s) or a set of digits with 6 occurrences.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle
Alternate way,
using .Split() and .All(),
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, #"(122456|112466)|[^a-zA-Z]", "$1");
I'm trying to come up with a regular expression matches the text in bold in all the examples.
Between the string "JZ" and any character before "-"
JZ123456789-301A
JZ134255872-22013
Between the string "JZ" and the last character
JZ123456789D
I have tried the following but it only works for the first example
(?<=JZ).*(?=-)
You can use (?<=JZ)[0-9]+, presuming the desired text will always be numeric.
Try it out here
You may use
JZ([^-]*)(?:-|.$)
and grab Group 1 value. See the regex demo.
Details
JZ - a literal substring
([^-]*) - Capturing group 1: zero or more chars other than -
(?:-|.$) - a non-capturing group matching either - or any char at the end of the string
C# code:
var m = Regex.Match(s, #"JZ([^-]*)(?:-|.$)");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If, for some reason, you need to obtain the required value as a whole match, use lookarounds:
(?<=JZ)[^-]*(?=-|.$)
See this regex variation demo. Use m.Value in the code above to grab the value.
A one-line answer without regex:
string s,r;
// if your string always starts with JZ
s = "JZ123456789-301A";
r = string.Concat(s.Substring(2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
// if your string starts with anything
s = "A12JZ123456789-301A";
r = string.Concat(s.Substring(s.IndexOf("JZ")).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
Basically, we remove everything before and including the delimiter "JZ", then we take each char while they are digit. The Concat is use to transform the IEnumerable<char> to a string. I think it is easier to read.
Try it online
I am attempting to find nth occurrence of sub string between two special characters. For example.
one|two|three|four|five
Say, I am looking to find string between (n and n+1 th) 2nd and 3rd Occurrence of '|' character, which turns out to be 'three'.I want to do it using RegEx. Could someone guide me ?
My Current Attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(#"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - 2 (or adjust it as you need) or more sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $#"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex("(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2];
I need to make a validation in a string with positional information using regex.
Example:
020005254877841100557810AAAAAA841158891BBBB
I need to get match position 5 until 10 and has to be only numbers.
How can I do this using RegEx?
hmm does it really have to be regex?
I would have done like this.
var myString = "020005254877841100557810AAAAAA841158891BBBB";
var isValid = myString.Substring(4, 5).All(Char.IsDigit);
If you absolutely HAVE to use regex, here it is.... (Though I think you should go with something like Jonas W's answer).
Match m = Regex.Match(myString, "^.{4}\d{5}.*");
if(m.Success){
//do stuff
}
The regex means, "from the beginning of the string (^), match 4 of any character (.{4}), then five digits, (\d{5}), then however many of any other characters (.*)"
If you really feel you must use a regex this will be the one:
"^....\d{5}"
You can also do multiple checks like this:
"^....(\d{5}|\D{5})
That one will match all numbers or all non-digit characters, but not a mix or anything with whitespace.
To build on what FailedDev mentioned.
string myString = "020005254877841100557810AAAAAA841158891BBBB";
myString.Substring(5, 5); //will get the substring at index 5 to 10.
double Num;
bool isNum = double.TryParse(myString, out Num); //returns true of all numbers
Hope that helps,
I have a text file for processing, which has some numbers. I want JUST text in it, and nothing else. I managed to remove the punctuation marks, but how do I remove the numbers? I want this using C# code.
Also, I want to remove words with length greater than 10. How do I do that using Reg Expressions?
You can do this with a regex:
string withNumbers = // string with numbers
string withoutNumbers = Regex.Replace(withNumbers, "[0-9]", "");
Use this regex to remove words with more than 10 characters:
[\w]{10, 100}
100 defines the max length to match. I don't know if there is a quantifier for min length...
Only letters and nothing else (because I see you also want to remove the punctuation marks)
Regex.IsMatch(input, #"^[a-zA-Z]+$");
You can also use string.Join:
string s = "asdasdad34534t3sdf43534";
s = string.Join(null, System.Text.RegularExpressions.Regex.Split(s, "[\\d]"));
The Regex.Replace method should do the trick.
// regex to match any digit
var regex = new Regex("\d");
// replace all matches in input with empty string
var output = regex.Replace(input, String.Empty);