Regex in C# acting weird - c#

I've encountered a problem while working with regex in C#. Namely, the debugger shows correct (IMO) results, but when I print the results in my application they are different (and wrong). Code below:
Match match2 = Regex.Match("048 A Dream Within A Dream (satur) (123|433) K48", "(.*)(\\((.)*?\\))\\s\\((.)*?\\)\\s.*");
string nick = match2.Groups[1].Value;
string name = match2.Groups[0].Value;
Console.WriteLine("nick - '{0}', name - '{1}'", nick, name);
The expected results show up in the debugger, as in the following screenshot:
The console shows different (wrong) results:
nick - '048 A Dream Within A Dream ', name - '048 A Dream Within A Dream (satur) (123|433) K48'
How do I fix it? I want the results to be shown exactly as in debugger.

You're missing the fact that Groups[0] is always meant to represent the whole match. The first capturing group is in Groups[1]. You want:
string nick = match2.Groups[2].Value;
string name = match2.Groups[1].Value;
The reason it's showing what you expected in the debugger is that you're looking at the implementation detail of a field within GroupCollection; when it's asked for a group by number, it returns the match if the requested number is 0, or offsets the number by 1 otherwise.
From the documentation for GroupCollection:
If the match is successful, the first element in the collection contains the Group object that corresponds to the entire match. Each subsequent element represents a captured group, if the regular expression includes capturing groups.
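To make the indexing concrete, here is a minimal sketch that runs the question's own input and pattern and exposes the first three entries of Groups. The class and method names (GroupIndexDemo, Run) are made up for illustration; only Regex.Match and Match.Groups come from the framework.

```csharp
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    // Same input and pattern as in the question (written as a verbatim string).
    public const string Input = "048 A Dream Within A Dream (satur) (123|433) K48";
    public const string Pattern = @"(.*)(\((.)*?\))\s\((.)*?\)\s.*";

    // Hypothetical helper: returns the Match so the groups can be inspected.
    public static Match Run()
    {
        return Regex.Match(Input, Pattern);
    }
}
```

Calling GroupIndexDemo.Run() and reading Groups[0].Value gives the whole input string, Groups[1].Value gives "048 A Dream Within A Dream " and Groups[2].Value gives "(satur)", which is why the indices in the question need to be shifted by one.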

You are looking at the _groups field, but that is not exactly what the Groups property returns.
Change your code to use Groups[1] and Groups[2]:
string nick = match2.Groups[2].Value;
string name = match2.Groups[1].Value;

Related

Regex getting modified while passing from database to browser

I am trying to retrieve a regex from a database and pass it on to the client (browser). But the regex is getting slightly modified in between, due to intermediate parsing and regex processing in C#. Can anyone please tell me what I need to do in order to pass the regex correctly to the client? I have the following piece of code:
private static readonly Regex defaultObjectPattern = new Regex(@"([^\w\.$\]])(\['?[$\w \s]*'?\][^\.\[])", RegexOptions.Compiled);
var parsedFormula = new StringBuilder(defaultObjectPattern.Replace(rule.Rule, "$1item$2"));
where rule.Rule is
"Error(\"![Item].[XYZ ID].match(/^20[^\s]{4,}$|^$/)\", \"Invaid XYZ ID\",\"XYZ ID entered is Invalid. Please obtain a valid XYZ ID from your Supervisor or delete this entry.\", \"XYZ ID\",\"All\")"
By the time the above statement completes, the regex in the match method has been modified to /^20\['^\s']{4,}$|^$/. Single quotes are getting added within the square brackets.
The exact string I am storing in the database is
'Error("![Item].[XYZ ID].match(/^20[^\s]{4,}$|^$/)", "Invaid XYZ ID","XYZ ID entered is Invalid. Please obtain a valid XYZ ID from your Supervisor or delete this entry.", "XYZ ID","All")'
I cannot change the defaultObjectPattern, as it is used for a lot of other things. But I need the regex in the match method to come through unmodified (without the single quotes being added).
Thanks in advance for your help.

Regex: Find pagenumber from partial matching urls

As we all know, regex patterns will make your stomach turn the first time you see them (or the 10th time, since you never went in head first and truly learned them. Guilty.). I'm currently reading up on it, but since I'm on a tight deadline I'll check here in case I can get a quicker and better answer/explanation in the meantime.
I have some url to a forum thread, and I want to scan through the html and find the last page for the thread.
So say I have one of the following urls identifying the thread in question:
https://www.somesite.com/forum/thread-93912* (absolute url to the thread)
/forum/thread-93912 (relative url to the thread)
and I want to get all values (integers) that appear directly (next path) after any of the above "partial" match in the html-document.
So from any of the following hrefs located anywhere in the html-document (the doc is represented as a single string):
https://www.somesite.com/forum/thread-93912/34
https://www.somesite.com/forum/thread-93912/34/morestuffhere/whatevs
/forum/thread-93912/34
/forum/thread-93912/34/somethingheretoo
I want to extract the number 34 (only 34), so I can parse it to int.
EDIT
Okay, to make it simpler:
Say I have all the html in htmlString, and in this string I want to find all numbers x that appear after my inputString /forum/thread-93912.
These all appear in the htmlString, and I want to extract the numbers:
thread-93912/34
thread-93912/14
thread-93912/84
thread-93912/64
thread-93912/4
You don't need regex. Just use System.Uri.Segments
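Uri url = new Uri("https://www.somesite.com/forum/thread-93912/34");
// Segments here are "/", "forum/", "thread-93912/", "34", so the page number is at index 3.
Console.WriteLine(url.Segments[3].TrimEnd('/'));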
\b(\d+)\b(?=[^\d]*$)
Try this. See the demo and grab the capture.
http://regex101.com/r/sU3fA2/55
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Regex regex = new Regex(@"\b\d+\b(?=[^\d]*$)");
        Match match = regex.Match("/forum/thread-93912/34");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}
Since my question was a little hard to explain thoroughly (and since I "changed" my problem a little), I thought I'd add my own answer with the exact code I went with (which I came up with thanks to the other answers here, so I'll give you all an upvote!).
I'm sure this can be made prettier and more compact, but I went for clarity since I'm new to regex!
First, get all strings matching the url + some number (separated with a slash "/"), then extract that number to a group called "page".
Regex regex = new Regex(urlToThread + @"/(?<page>\d+)");
MatchCollection matches = regex.Matches(htmlString);
Then iterate over all matches, extract the "page" value (guaranteed to consist of digits only), and parse it to an integer. Add all parsed integers to a list and sort when done. The last one will be the greatest (last page).
List<int> pages = new List<int>();
foreach(Match match in matches)
pages.Add(int.Parse(match.Groups["page"].Value));
pages.Sort();
// And here we get the last page
int nrOfPages = pages[pages.Count-1];
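The same steps can be sketched a bit more compactly. This is only a variation on the code above, not a replacement for it: the class and method names (ThreadPages, LastPage) are made up, it assumes the thread URL should be matched literally (hence Regex.Escape), and it returns null instead of throwing when the page contains no matching link.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ThreadPages
{
    // Returns the highest page number that directly follows urlToThread
    // in htmlString, or null when no such link is found.
    public static int? LastPage(string htmlString, string urlToThread)
    {
        // Regex.Escape makes characters such as '.' or '?' in the URL match literally.
        var regex = new Regex(Regex.Escape(urlToThread) + @"/(?<page>\d+)");

        var pages = regex.Matches(htmlString)
                         .Cast<Match>()
                         .Select(m => int.Parse(m.Groups["page"].Value))
                         .ToList();

        return pages.Count > 0 ? pages.Max() : (int?)null;
    }
}
```

For example, ThreadPages.LastPage(htmlString, "/forum/thread-93912") scans the whole document once. Note that int.Parse would still throw on a digit run too long for an int.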

Regex pattern with parameter, wanna have special sign allowed inside word

Hello, please advise: in the parameter part I need to allow a > character inside a parameter, but only once.
Working one:
^(?<command>(name|Name))\s:\s(?<parameter>[\w#]([\s\w._#=*'^/\[\]]*[\w.^/\[\]])*(,\s[\w#]([\s\w._#=*'^/\[\]]*[\w.^/\[\]])*)*)(?<seperator>\s>>\s)*(?<description>\w([\s\w]*[\w.])*)?$
Not working one (I thought I could implement it like this, but it is wrong):
^(?<command>(name|Name))\s:\s(?<parameter>[\w#]([\s\w._>#=*'^/\[\]]*[\w.^/\[\]])*(,\s[\w#]([\s\w._>#=*'^/\[\]]*[\w.^/\[\]])*)*)(?<seperator>\s>>\s)*(?<description>\w([\s\w]*[\w.])*)?$
Expected Input:
Name : param > eter1, parameter2 >> description
Expected Output:
CommandPart: Name
ParameterPart1: param > eter1
ParameterPart2: parameter2
Description: description
I'm assuming your "expected input" isn't actually expected input, since your pattern will only match if command is replaced with name or Name.
It usually also helps to explain what doesn't go as expected, as we have no idea what you really want this regex to do.
It also really helps to state what language you're doing this in, as regex is implemented differently in almost all languages.
However, letting RegexBuddy chew on it, and adding your expected input makes me assume that your problem is that the capturing group named parameter eats up the remainder of the line, instead of giving up some content to seperator and description.
To fix this, you can make use of lazy multipliers (*? or +?) like so:
^(?<command>name|Name)\s:\s(?<parameter>[\w#](?:[\s\w._>#=*'^/\[\]]*?[\w.^/\[\]])*?(?:,\s[\w#](?:[\s\w._>#=*'^/\[\]]*?[\w.^/\[\]])*?)*?)(?<seperator>\s>>\s)*(?<description>\w[\s\w]*[\w.]*)?$
Note that I also removed some numbered capturing groups, and set some to non-capturing subgroups, as I assume you didn't really want them to capture, given that you use named groups.
Non-capturing subgroups are written as (?:something).
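To illustrate the greedy versus lazy difference in isolation, here is a small C# sketch (assuming .NET, which the question does not state). The two patterns are deliberately simplified stand-ins, not the patterns from the question; they differ only in .+ versus .+?, and the names LazyDemo, Parameter and Description are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Simplified, hypothetical patterns: identical except for ".+" vs ".+?".
    const string Greedy = @"^(?<parameter>.+)(?:\s>>\s(?<description>.*))?$";
    const string Lazy   = @"^(?<parameter>.+?)(?:\s>>\s(?<description>.*))?$";

    // Returns what the "parameter" group captured for the chosen variant.
    public static string Parameter(string line, bool lazy)
    {
        return Regex.Match(line, lazy ? Lazy : Greedy).Groups["parameter"].Value;
    }

    // Returns what the "description" group captured (empty if it did not participate).
    public static string Description(string line, bool lazy)
    {
        return Regex.Match(line, lazy ? Lazy : Greedy).Groups["description"].Value;
    }
}
```

With the line "param > eter1, parameter2 >> description", the greedy variant puts the entire line into parameter and leaves description empty, while the lazy variant yields "param > eter1, parameter2" and "description".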

Returning the regular expression match as part of a split (or equivalent functionality)

I am trying to parse through some log files and put them into a database for analysis. A single line looks something like this:
2012-09-30 17:16:27,213 [39] (boxes) ERROR Assembly.Places [(null)] - Error while displaying a thing
I have made a regular expression that works well for pulling out the date in front and breaking up the lines that way, but I lose the date itself. This is a pretty important bit of data, and I don't want to lose it!
I cannot just do this by \r\n, because some logs are fatal errors that include stack traces for the developers. Those, obviously, use \r\n to make them readable.
My current code looks like this for reference:
var logpath = Directory.GetFiles(@"C:\a\directory", "*.log");
foreach (var log in logpath)
{
var fileStream = new StreamReader(log);
var fileString = fileStream.ReadToEnd();
var records = Regex.Split(fileString, "[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}");
...
}
Split() will always remove the matched delimiter. The trick is not to match any actual text, but rather a position in the string.
This is done through zero-width look-ahead:
var datePattern = "^(?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})";
var datePositions = new Regex(datePattern, RegexOptions.Multiline);
// ...
var records = datePositions.Split(fileString);
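A self-contained sketch of the look-ahead split, under the assumption that every record starts with a timestamp at the beginning of a line. The names LogSplitter and SplitRecords are made up. Because the pattern also matches at index 0, Split returns an empty first element, which the sketch filters out.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LogSplitter
{
    // Matches the (zero-width) position at the start of a line that begins with a timestamp.
    static readonly Regex RecordStart = new Regex(
        @"^(?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})",
        RegexOptions.Multiline);

    // Splits the file content into records; each record keeps its timestamp
    // and any stack-trace lines that follow it.
    public static string[] SplitRecords(string fileString)
    {
        return RecordStart.Split(fileString)
                          .Where(r => r.Length > 0) // drop the empty piece before the first record
                          .ToArray();
    }
}
```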
You should match instead of splitting.
This is the regex. Use Singleline mode (RegexOptions.Singleline) so that . also matches newlines:
([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})(.*?)((?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}|$))
Group 1 contains the date
Group 2 contains the required data
NOTE
The regex is conceptually like this.
(yourDate)(.*?yourdata)(?=till the other date|$)
Don't forget to use Singleline mode.
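As a rough sketch of the matching approach in C#: the regex is the one above, just assembled from a shared date fragment, and the names LogMatcher and Parse are invented for the example. It assumes a timestamp never appears inside a message body, since such a timestamp would start a new record.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogMatcher
{
    const string Date = @"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}";

    // (date)(data, lazily)(look-ahead: next date or end of input)
    static readonly Regex Record = new Regex(
        "(" + Date + ")(.*?)(?=" + Date + "|$)",
        RegexOptions.Singleline);

    // Returns one (date, data) pair per log record, in file order.
    public static List<KeyValuePair<string, string>> Parse(string fileString)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Record.Matches(fileString))
        {
            result.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```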
Well, I'm not an expert on the subject, but I did find this: Regex.Match.
From what I see, you can get the first match of the date format as a Match object,
which has all kinds of useful properties that, put together, should let you cut out the parts you want.
P.S. There is also Regex.Matches, which returns all matches in the file and might be easier to use.
Sorry, I don't have time to put together a complete code example.
Good day.

Extract substring from string with Regex

Imagine that users are inserting strings in several computers.
On one computer, the pattern in the configuration will extract some characters of that string, let's say positions 4 to 5.
On another computer, the extract pattern will return other characters, for instance, last 3 positions of the string.
These configurations (the Regex patterns) are different for each computer, and should be available for change by the administrator, without having to change the source code.
Some examples:
Original_String Return_Value
User1 - abcd78defg123 78
User2 - abcd78defg123 78g1
User3 - mm127788abcd 12
User4 - 123456pp12asd ppsd
Can it be done with Regex?
Thanks.
Why do you want to use regex for this? What is wrong with:
string foo = s.Substring(4,2);
string bar = s.Substring(s.Length-3,3);
(you can wrap those up to do a bit of bounds-checking on the length easily enough)
If you really want, you could wrap it up in a Func<string,string> to put somewhere - not sure I'd bother, though:
Func<string, string> get4and5 = s => s.Substring(4, 2);
Func<string,string> getLast3 = s => s.Substring(s.Length - 3, 3);
string value = "abcd78defg123";
string foo = getLast3(value);
string bar = get4and5(value);
If you really want to use regex:
^...(..)
And:
.*(...)$
To have a regex capture values for further use, you wrap the part you want in parentheses (). Square brackets [] are not a capturing syntax; in .NET, as in most regex flavors, they define a character class.
Example
User4 - 123456pp12asd ppsd
is the most interesting, in that you have 2 separate capture areas here. Is there some default rule for how to join them together, or would you want to be able to specify how to build the result?
Perhaps something like
r/......(..)...(..)/\1\2/ for ppsd
r/......(..)...(..)/\2-\1/ for sd-pp
Do you want to run a regex to get the captures and handle them yourself, or do you want to run more advanced manipulation commands?
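If the default rule is simply "join the captures in order", a minimal C# sketch could look like the following. The pattern would come from each machine's configuration; ConfiguredExtractor and Extract are hypothetical names, and the patterns in the usage note are my own guesses at what would produce the question's sample outputs.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ConfiguredExtractor
{
    // Applies a pattern (e.g. read from per-machine configuration) and joins
    // all successful capturing groups, in order, into one result string.
    // Returns an empty string when the pattern does not match at all.
    public static string Extract(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return string.Empty;
        }

        return string.Concat(
            match.Groups.Cast<Group>()
                 .Skip(1)               // Groups[0] is the whole match, not a capture
                 .Where(g => g.Success)
                 .Select(g => g.Value));
    }
}
```

For instance, Extract("123456pp12asd", @"^.{6}(..).{3}(..)") gives "ppsd", and Extract("abcd78defg123", @"^.{4}(..)") gives "78". Reordering the captures, as in the \2-\1 example, would need a replacement template on top of this.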
I'm not sure what you are hoping to get by using RegEx. RegEx is used for pattern matching. If you want to extract based on position, just use substring.
It seems to me that Regex really isn't the solution here. To return a section of a string beginning at position pos (starting at 0) and of length length, you simply call the Substring function as such:
string section = str.Substring(pos, length);
Grouping. You could match on /^.{3}(.{2})/ and then look at group $1 for example.
The question is why? Normal string handling i.e. actual substring methods are going to be faster and clearer in intent.
