Hi, I am a newbie in regex operations. I have text like this:
[JUNCTIONS]
;ID Elev Demand Pattern
3 50 100 ;
4 50 30 ;
5 50 20 ;
6 40 20 ;
7 50 5 ;
8 30 5 ;
9 30 5 ;
2 50 80 ;
10 50 70 ;
11 50 30 ;
12 50 52 ;
13 50 40 ;
14 50 40 ;
15 50 10 ;
16 50 10 ;
17 50 10 ;
18 0 0 ;
19 0 0 ;
[RESERVOIRS]
;ID Head Pattern
1 100 ;
[TANKS]
I want to create a pattern that outputs the text between [JUNCTIONS] and [RESERVOIRS], then between [RESERVOIRS] and [TANKS], and so on. The section names ([XXXX]) are not known to me in advance; I want to get the text from one [XXX] to the next [XXX]. How can I do that?
Here is the regex:
(?=(\[\S+\].*?\[\S+\]))
or
(?=(\[(?:JUNCTIONS|RESERVOIRS)\].*?\[(?:RESERVOIRS|TANKS)\]))
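The first pattern assumes you want to handle all the [...] headers in your input.
Note: make sure your C# code lets the match span multiple lines (for example with RegexOptions.Singleline, so that . also matches newlines), and don't forget to escape the \ character if needed (or use a verbatim @"..." string).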
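A minimal C# sketch of applying the first pattern, assuming the whole file has already been read into one string (the class and method names are hypothetical). Because the capture sits inside a zero-width lookahead, consecutive spans can share a header:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionPairs
{
    // Hypothetical helper: returns every "[A] ... [B]" span, including both headers.
    public static List<string> Extract(string text)
    {
        // The capture is inside a lookahead, so the engine consumes nothing and
        // [RESERVOIRS] can end one span and start the next.
        // Singleline lets '.' match newlines, so a span can cross lines.
        MatchCollection matches = Regex.Matches(
            text, @"(?=(\[\S+\].*?\[\S+\]))", RegexOptions.Singleline);

        var result = new List<string>();
        foreach (Match m in matches)
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For the sample input this yields two spans: [JUNCTIONS] through [RESERVOIRS], and [RESERVOIRS] through [TANKS]. The last header produces no span because nothing follows it.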
Here is some c# code to do the match, and get the results.
Be sure to add error checking, for example to make sure that the match actually worked.
Note the Singleline flag: it lets the dot (.) match every character, including newlines. You'll also probably need to clean up and trim the output to remove any trailing newlines, etc.
MatchCollection matches = Regex.Matches(test, @"^\[JUNCTIONS\](.*)\[RESERVOIRS\](.*)\[TANKS\](.*)$", RegexOptions.Singleline);
GroupCollection groups = matches[0].Groups;
// JUNCTIONS text
Console.WriteLine(groups[1]);
// RESERVOIRS text
Console.WriteLine(groups[2]);
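Following the advice about error checking, here is a sketch of the same fixed-header match that tests Success before reading the groups and trims the bodies. The helper name and the null-on-failure return are assumptions, not part of the answer above:

```csharp
using System.Text.RegularExpressions;

public static class FixedSections
{
    // Hypothetical helper: returns the trimmed JUNCTIONS and RESERVOIRS bodies,
    // or null when the text does not contain the three headers in that order.
    public static (string Junctions, string Reservoirs)? Parse(string text)
    {
        Match m = Regex.Match(
            text,
            @"^\[JUNCTIONS\](.*)\[RESERVOIRS\](.*)\[TANKS\](.*)$",
            RegexOptions.Singleline);

        // Error check: indexing matches[0] on an empty MatchCollection throws,
        // so test Success before touching the groups.
        if (!m.Success)
        {
            return null;
        }

        // Trim() drops the line breaks left around each body.
        return (m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim());
    }
}
```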
Edit - Updated to match OP's changes
If you want to match an unspecified number of sections, it's a little trickier. This regex will match a [TEXT] block and anything that comes after it, until it hits a [ character. The way to use it is to loop over the MatchCollection, one match per region, and use .Groups[1] for the section name and .Groups[2] for the body.
MatchCollection matches =
Regex.Matches(test, @"\[([\w+]+)\]([^\[]+)?", RegexOptions.Singleline);
// for each block / section of the document
foreach(Match match in matches){
GroupCollection groups = match.Groups;
// [TEXT] part will be here
Console.WriteLine(groups[1]);
// The rest will be here
Console.WriteLine(groups[2]);
}
Why use a regex?
Assuming you can read this input text one line at a time, it will probably be quicker and easier to just loop over the lines, and output those you need. Some variant of:
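A minimal sketch of such a loop, assuming the lines come from something like File.ReadLines; it groups each line under the most recent [NAME] header. The class name SectionReader is hypothetical:

```csharp
using System.Collections.Generic;

public static class SectionReader
{
    // Hypothetical helper: groups lines under the most recent "[NAME]" header.
    // Blank lines and lines before the first header are ignored.
    public static Dictionary<string, List<string>> Read(IEnumerable<string> lines)
    {
        var sections = new Dictionary<string, List<string>>();
        List<string> current = null;

        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.StartsWith("[") && line.EndsWith("]"))
            {
                // New header: start collecting lines under its name.
                current = new List<string>();
                sections[line.Substring(1, line.Length - 2)] = current;
            }
            else if (current != null && line.Length > 0)
            {
                current.Add(line);
            }
        }
        return sections;
    }
}
```

For example, SectionReader.Read(File.ReadLines(path))["RESERVOIRS"] would hold the reservoir lines, where path is your input file.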
Update:
In response to your comment below: you can probably use this to skip any lines with [something] in them and print out the rest:
// Pattern: a '[' and a ']' with one or more characters between them
var pattern = @"\[.+\]";
string line;
// 'file' is assumed to be an open StreamReader (System.IO) over the input
while ((line = file.ReadLine()) != null)
{
    if (!Regex.IsMatch(line, pattern)) // Skip lines that match
    {
        Console.WriteLine(line);
    }
}
I have the following RegEx pattern:
@"^((\(?\+45\)?)?)(\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2})$/gm"
It's supposed to replace strings such as:
10203040
10 20 30 40
+45 10 20 30 40
+4510203040
This is my replace method:
var text = "10 10 10 10";
text = Regex.Replace(text, @"^((\(?\+45\)?)?)(\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2})$/gm", "****");
The above code returns "****" which is correct.
var text = "10 10 10 10 10203040";
text = Regex.Replace(text, @"^((\(?\+45\)?)?)(\s?\d{2}\s?\d{2}\s?\d{2}\s?\d{2})$/gm", "****");
The above code doesn't replace any text and just returns the original string. I need this code to return "**** ****", as there are two occurrences of the numbers I need to match.
I hope someone can help me - thanks in advance :)
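You've anchored your regex to the start (^) and end ($) of the line, so it will only perform a replacement if the matched string is the entire line. Remove the anchors and both occurrences will be found. Also drop the trailing /gm: .NET does not use JavaScript-style flags, so /gm is treated as literal text to match (pass RegexOptions.Multiline instead if you ever need ^ and $ to work per line).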
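Removing the anchors is the main fix, but there is one side effect worth checking: because the digit group starts with \s?, the second match can begin on the space between the two numbers and swallow it, so "10 10 10 10 10203040" becomes "********" rather than "**** ****". A sketch that only allows whitespace between digit pairs (and after the optional +45 prefix); the class name PhoneMasker is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class PhoneMasker
{
    // Optional "+45" or "(+45)" prefix with an optional space, then four digit pairs.
    // Whitespace is only allowed between the pairs, so a match never starts on
    // the space that separates two numbers.
    private static readonly Regex Phone =
        new Regex(@"(?:\(?\+45\)?\s?)?\d{2}(?:\s?\d{2}){3}");

    public static string Mask(string text) => Phone.Replace(text, "****");
}
```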
Text from txt file:
10 25
32 44
56 88
102 127
135 145
...
For the first line, prepend 0; for every other line, use the last number of the previous line as the first number of the new line. Is it possible to do this with a regex, or do I need to loop through the lines after the regex parse? Desired output:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 145
(?<Middle>\d+)\s(?<End>\d+) //(?<Start>...)
I would advise against using regex for readability reasons but this will work:
var input = ReadFromFile();
var regex = @"(?<num>\d*)[\n\r]+";
var replace = "${num}\n${num} ";
var output = Regex.Replace(input, regex, replace);
That will do everything apart from the first 0.
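Since only the leading 0 is missing, one option is to prepend it after the replacement. A sketch, assuming the input has no trailing line break (otherwise the last number is repeated on a dangling final line); the class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class ChainLines
{
    // Repeats the last number of each line at the start of the next line
    // (the answer's replacement), then prepends "0 " for the first line,
    // which the replacement cannot produce on its own.
    public static string Chain(string input)
    {
        var output = Regex.Replace(input, @"(?<num>\d*)[\n\r]+", "${num}\n${num} ");
        return "0 " + output;
    }
}
```

Note that [\n\r]+ consumes CRLF pairs too, so Windows line endings come out as plain \n.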
Note that a regex is not really a good fit for a task like this. It can be used for small input strings, but for larger ones it is better to write a little more logic and parse the text line by line.
So, more from academic interest, here is a regex solution showing how to replace with different replacement patterns based on whether the line matched is first or not:
var pat = @"(?m)(?:(\A)|^(?!\A))(.*\b\s+(\d+)\r?\n)";
var s = "10 25\n32 44\n56 88\n102 127\n135 14510 25\n32 44\n56 88\n102 127\n135 145";
var res = Regex.Replace(s, pat, m => m.Groups[1].Success ?
$"0 {m.Groups[2].Value}{m.Groups[3].Value} " : $"{m.Groups[2].Value}{m.Groups[3].Value} ");
Result of the C# demo:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 14510 25
25 32 44
44 56 88
88 102 127
127 135 145
Note the \n line breaks are hardcoded, but it is still just an illustration of regex capabilities.
Pattern details
(?m) - an inline RegexOptions.Multiline modifier
(?:(\A)|^(?!\A)) - a non-capturing group matching either
(\A) - start of string capturing it to Group 1
| - or
^(?!\A) - start of a line (but not string due to the (?!\A) negative lookahead)
(.*\b\s+(\d+)\r?\n) - Group 2:
.*\b - 0+ chars other than newline up to the last word boundary on a line followed with...
\s+ - 1+ whitespaces (may be replaced with [\p{Zs}\t]+ to only match horizontal whitespaces)
(\d+) - Group 3: one or more digits
\r?\n - a CRLF or LF line break.
The replacement logic is inside the match evaluator: if Group 1 matched (m.Groups[1].Success ?) replace with 0 and Group 2 + Group 3 values + space. Else, replace with Group 2 + Group 3 + space.
With plain C# (no regex):
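using System;
using System.IO;
using System.Linq;
using System.Text;

var lines = File.ReadLines(fileName); // fileName: path of the input file
var st = new StringBuilder(); // or a StreamWriter to write directly to disk, etc.
var last = "0";               // the first line gets a leading 0
foreach (var line in lines)
{
    st.AppendLine(last + " " + line);
    // Remember the last number on this line for the next one
    // (RemoveEmptyEntries ignores trailing spaces; a blank line keeps the previous value)
    last = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).LastOrDefault() ?? last;
}
var lines2 = st.ToString();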
Why doesn't this regex pattern parse the string "Season 02 Episode 01" properly?
For example, this is not a match:
var fileName = "Its Always Sunny in Philadelphia Season 02 Episode 01 - Charlie Gets Crippled.avi";
// Regex explanation:
// Starts with "S" and can contain more letters, can continue with space, then contains two numbers.
// Then starts with "E" again and can contain more letters, can continue with space, then contains two numbers.
var pattern = @"S\w?\s?(\d\d)\s?E\w?\s?(\d\d)";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(fileName);
Use * instead of ?
? is for 0 or 1 time. * is for 0 or more times.
Starts with "S" and can contain more letters [...]
You mean +, not ?.
var pattern = @"S\w+\s+(\d+)\s+E\w+\s+(\d+)";
Note that this regex is pretty unspecific. Watch out for false positives; I'd recommend making the expression more specific.
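One way to make it more specific, sketched here as an assumption rather than a drop-in replacement: accept only "S"/"Season" and "E"/"Episode", limit each number to one or two digits, and use \b so the match cannot start in the middle of a word. The EpisodeParser name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class EpisodeParser
{
    // "S" or "Season", 1-2 digits, then "E" or "Episode", 1-2 digits,
    // e.g. "Season 02 Episode 01" or "S02E01". \b stops a match starting mid-word.
    private static readonly Regex Pattern = new Regex(
        @"\bS(?:eason)?\s*(\d{1,2})\s*E(?:pisode)?\s*(\d{1,2})\b",
        RegexOptions.IgnoreCase);

    // Returns (season, episode), or null if the name contains no such marker.
    public static (int Season, int Episode)? Parse(string fileName)
    {
        Match m = Pattern.Match(fileName);
        if (!m.Success) return null;
        return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value));
    }
}
```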
I was wondering whether this is possible using a regex. I would like to exclude all letters (upper and lower case) and the following 14 characters: ! " & ' * + , : ; < = > # _
The problem is the equals sign. In the string I will be validating (which must be either 20 or 37 characters long), the equals sign must be in either the 17th or the 20th position, because it is used as a separator there. So the regex must reject an equals sign anywhere other than the 17th or 20th position (and it may not appear in both). Some examples:
pass: 1234567890123456=12345678901234567890
pass: 1234567890123456789=12345678901234567
don't pass: 123456=890123456=12345678901234567
don't pass: 1234567890123456=12=45678901234567890
I am having a hard time with the part that I must allow the equal sign in those two positions and not sure if that's possible with Regex. Adding an if-statement would require substantial code change and regression testing because this function that stores this regex currently is used by many different plug-ins.
I'll go for
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Explanation:
1) Start with your allowed character:
^[^a-zA-Z!"&'*+,:;<=>#_]$
[^xxx] means any character except xxx; here a-z is the lower-case letters, A-Z the upper-case ones, followed by your other characters.
2) Repeat it 16 times, then =, then more allowed characters (the allowed-character class followed by + means it is repeated 1 to n times):
^[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+$
At this point you'll match your first case, when = is at position 17.
3) Your second case will be:
^[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*$
with the last part followed by * instead of + to handle strings that are only 20 characters long and end with =.
4) Combine both with (case1|case2):
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Tested OK with Notepad++ and your examples.
Edit: to match exactly 20 or 37 characters:
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{3}|[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{20}|[^a-zA-Z!"&'*+,:;<=>#_]{19}=|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]{17})$
A more readable view with explanations:
^(
// 20 chars with = at 17
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 16 allowed chars
= // followed by =
[^a-zA-Z!"&'*+,:;<=>#_]{3} // followed by 3 allowed chars
|
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 37 chars with = at 17
=
[^a-zA-Z!"&'*+,:;<=>#_]{20}
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 20 chars with = at 20
=
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 37 chars with = at 20
=
[^a-zA-Z!"&'*+,:;<=>#_]{17}
)$
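To use the exact-length version from C#, here is a sketch that builds the same four alternatives from one shared character class, so the class only has to be written once. It uses \z rather than $, because in .NET $ also matches just before a final newline; the class and member names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class SeparatorValidator
{
    // One allowed character: anything except letters and the excluded symbols.
    // The class also excludes '=', so '=' can only appear where placed explicitly.
    private const string Allowed = "[^a-zA-Z!\"&'*+,:;<=>#_]";

    // The answer's four alternatives: 20 or 37 characters, '=' at position 17 or 20.
    private static readonly Regex Pattern = new Regex(
        "^(?:" +
        Allowed + "{16}=" + Allowed + "{3}|" +
        Allowed + "{16}=" + Allowed + "{20}|" +
        Allowed + "{19}=|" +
        Allowed + "{19}=" + Allowed + "{17}" +
        ")\\z");

    public static bool IsValid(string s) => Pattern.IsMatch(s);
}
```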
I've omitted the full set of allowed symbols and just used placeholders ([0-9\:\<\>] and [^=]); you should put a class there that matches all allowed symbols except =.
var r = new Regex(@"^(([0-9\:\<\>]{16,16}=(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))|(^[^=]{19,19}=(([0-9\:\<\>]{17}))?))$");
/*
@"^(
([0-9\:\<\>]{16,16}
=
(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))
|
(^[^=]{19,19}
=
(([0-9\:\<\>]{17}))?)
)$"
*/
Using {length,length} you can also fix the overall string length. The ^ at the beginning and the $ at the end are also important.
I have an input which can be one of the following three:
1-8, in other words 1, 2, 3, 4, 5, 6, 7, 8
A-Z, in other words A, B, C, D, etc.
01-98, in other words 01, 02, 03, 04, etc.
I came up with this regex, but it's not working and I'm not sure why:
@"[A-Z0-9][1-8]-
I am thinking of checking for corner cases like a lone 0 or a lone 9 after the regex check, because the regex isn't validating them.
Not sure I understand, but how about:
^(?:[A-Z]|[1-8]|0[1-9]|[1-8][0-9]|9[0-8])$
Explanation:
(?:...) is a group without capture.
| introduces an alternative
[A-Z] means one letter
[1-8] one digit between 1 and 8
0[1-9] a 0 followed by a digit between 1 and 9
[1-8][0-9] a digit between 1 and 8 followed by a digit between 0 and 9
9[0-8] 9 followed by a digit between 0 and 8
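A sketch for checking this pattern against the corner cases mentioned in the question (a lone 0 or 9, plus 00 and 99). It swaps $ for \z so that a trailing newline is not accepted; the class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class InputCode
{
    // A single letter A-Z, a single digit 1-8, or a two-digit number 01-98.
    private static readonly Regex Pattern =
        new Regex(@"^(?:[A-Z]|[1-8]|0[1-9]|[1-8][0-9]|9[0-8])\z");

    public static bool IsValid(string s) => Pattern.IsMatch(s);
}
```

Lower-case letters are rejected because no RegexOptions.IgnoreCase is passed.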
Or maybe this, depending on your real needs:
^(?:[A-Z]|[0-9]?[1-8])$
I think you may use this pattern
@"^([1-8A-Z]|0[1-9]|[1-9]{2})$"
What about ^([1-8]|([0-9][0-9])|[A-Z])$ ?
That will give a match for
A
8 (but not 9)
09
[1-8]{0,1}[A-Z]{0,1}\d{1,2}
matches all of the following
8A8 8B9 9 0 00
You can use following pattern:
^[1-8][A-Z](?:0[1-9]|[1-8][0-9]|9[0-8])$
^[1-8] - input should start with number 1-8
[A-Z] - then should be single letter A-Z
(0[1-9]|[1-8][0-9]|9[0-8])$ - and it should end with two digits in the range 01-09, 10-89 or 90-98
Test:
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"^[1-8][A-Z](0[1-9]|[1-8][0-9]|9[0-8])$";
Regex regex = new Regex(pattern);
string[] valid = { "1A01", "8Z98" };
bool allMatch = valid.All(regex.IsMatch);       // true: every valid input matches
string[] invalid = { "0A01", "11A01", "1A1", "1A99", "1A00", "101", "1AA01" };
bool allNotMatch = !invalid.Any(regex.IsMatch); // true: no invalid input matches