Convert C# regex Code to Java - c#

I have found this Regex extractor code in C#.
Can someone tell me how this works, and how do I write the equivalent in Java?
// extract songtitle from metadata header.
// Trim was needed, because some stations don't trim the songtitle
fileName =
Regex.Match(metadataHeader,
"(StreamTitle=')(.*)(';StreamUrl)").Groups[2].Value.Trim();

This should be what you want.
// Create the Regex pattern
Pattern p = Pattern.compile("(StreamTitle=')(.*)(';StreamUrl)");
// Create a matcher that matches the pattern against your input
Matcher m = p.matcher(metadataHeader);
// if we found a match
if (m.find()) {
// the file name is the second group (the `(.*)` part);
// trim() mirrors the Trim() call in the C# original
fileName = m.group(2).trim();
}

It pulls "MyTitle" from a string such as "StreamTitle='MyTitle';StreamUrl".
The parentheses define capture groups; there are three in your regex. The second one contains the string of interest and is retrieved with Groups[2].Value.
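As a self-contained sketch of the same idea in Java: the class and method names below are made up for illustration, and returning an empty string when nothing matches is an assumption chosen to resemble what the C# one-liner yields for a failed match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StreamTitleExtractor {
    // Same pattern as the C# original; group 2 is the text between the quotes.
    private static final Pattern TITLE =
            Pattern.compile("(StreamTitle=')(.*)(';StreamUrl)");

    /** Returns the trimmed title, or "" when the header does not match (assumed behaviour). */
    static String extractTitle(String metadataHeader) {
        Matcher m = TITLE.matcher(metadataHeader);
        // find() searches anywhere in the input, much like Regex.Match in C#.
        return m.find() ? m.group(2).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("StreamTitle=' MyTitle ';StreamUrl='';"));
    }
}
```

Compiling the pattern once in a static field avoids recompiling it for every header.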
There are a few very good regex designers out there. The one I use is Rad Software's Regular Expression Designer (www.radsoftware.com.au). It is very useful for figuring out stuff like this (and it uses C# regexes).

Related

Regex for Information Extraction from CS File

Below is a snapshot of lines from my C# (.cs) file. I'm trying to extract the mandatory or supported fields from my class file.
1) Is there a way for me to dynamically load the cs file into the .NET application and extract the information out, starting just by loading cs file from file path?
2) Following on from the question above, I'm currently resorting to extracting the information through Regex.
First Regex - (m_oSupportedFields.).+?(?=EnumSupported.Mandatory;|EnumSupported.Supported)
Second Regex - (..+)\=
What I'm trying to achieve is to extract Personal.Forename, Personal.Surname and other fields by a Regex (one Regex for EnumSupported.Mandatory, and one for EnumSupported.Supported).
Also, I'm trying to cater for malformed lines such as
m_oSupportedFields.Personal.DOB.Day.Supported=EnumSupported.Supported;
(Note the missing spaces around the equals sign)
or
m_oSupportedFields.Personal.DOB.Day.Supported  =  EnumSupported.Supported;
(Note the double space between)
or even
m_oSupportedFields.Personal.Surname.Supported =
EnumSupported.Mandatory;
(Note the Enum is on second line)
Please advise on how I should write the Regex for such situations.
Thanks.
UPDATED in TEXTUAL VERSION
m_oSupportedFields.Personal.Surname.Supported = EnumSupported.Mandatory;
m_oSupportedFields.Personal.Forename.Supported = EnumSupported.Mandatory;
m_oSupportedFields.Personal.MiddleName.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Day.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Month.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Year.Supported = EnumSupported.Supported;
So from each line, you want to extract the part after m_oSupportedFields. and before .Supported =, as well as the part after the =. You also want to allow only plain spaces before the =, but any whitespace (including line breaks) after it.
Your regular expression will be: ^m_oSupportedFields\.([\w\.]+)\.Supported *=\s*(EnumSupported\.\w+);
Omit the ^ if you don't want to require that the string start at the beginning of a line.
Using C#, you can access the match groups like this:
using System;
using System.Text.RegularExpressions;

string regex = @"^m_oSupportedFields\.([\w\.]+)\.Supported *=\s*(EnumSupported\.\w+);";
string input = @"m_oSupportedFields.Personal.DOB.Day.Supported=EnumSupported.Supported;";
foreach (Match m in Regex.Matches(input, regex))
{
    // Groups[0] is the whole match; Groups[1] and Groups[2] are the two capture groups
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
}
// Console:
// Personal.DOB.Day
// EnumSupported.Supported
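For comparison, here is a rough Java sketch of the same pattern applied to a whole file's text rather than a single line. The class name, the method name and the choice of a map as the result type are illustrative assumptions; the regex itself is the one given above, with MULTILINE added so that ^ can match at the start of every line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SupportedFieldParser {
    // MULTILINE lets ^ match at the start of each line, not only at the start of the input.
    private static final Pattern FIELD = Pattern.compile(
            "^m_oSupportedFields\\.([\\w.]+)\\.Supported *=\\s*(EnumSupported\\.\\w+);",
            Pattern.MULTILINE);

    /** Maps each field path (e.g. "Personal.DOB.Day") to its enum value, in source order. */
    static Map<String, String> parse(String source) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(source);
        while (m.find()) {
            // group(1) is the field path, group(2) the EnumSupported value
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }
}
```

Because \s* after the = also matches line breaks, an assignment whose enum value sits on the following line is still picked up as one match.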
1) Is there a way for me to dynamically load the cs file into the .NET application and extract the information out, starting just by loading cs file from file path?
Possibly. There is the .NET Compiler Platform ("Roslyn", the compiler as a service), which is now used by VS2015. Look into creating a stand-alone code analysis tool.
extract Personal.Forename, Personal.Surname and other fields by a Regex (one Regex for EnumSupported.Mandatory, and one for EnumSupported.Supported).
A pattern can be very general or very specific about what needs to be captured. As the pattern becomes more general, its complexity increases, along with the supporting code needed to extract the data.
Capture into Enumerable Dynamic Entities
This is a specific pattern that takes the results and places them into a LINQ set of dynamic entities. Note that it handles the possible line split.
string data = @"
m_oSupportedFields.Personal.Surname.Supported =
EnumSupported.Mandatory;
m_oSupportedFields.Personal.Forename.Supported=EnumSupported.Mandatory;
m_oSupportedFields.Personal.MiddleName.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Day.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Month.Supported = EnumSupported.Supported;
m_oSupportedFields.Personal.DOB.Year.Supported = EnumSupported.Supported;
";
string pattern = @"
Personal\. # Anchor for match
(?<Full> # Grouping for Or condition
(?<Name>[^.]+) # Just the name
| # Or
(?<Combined>[^.]+\.[^.]+) # Name/subname
) # End Or Grouping
(?=\.Supported) # Look ahead to anchor to Supported (does not capture)
\.Supported
\s*= # Possible whitespace and =
[\s\r\n]*EnumSupported\.
(?<SupportType>Mandatory|Supported) # Capture support type";
// IgnorePatternWhitespace allows us to comment the pattern instead of having
// it on one line. It does not affect regex pattern processing in any way.
Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
FullName = mt.Groups["Full"].Value,
IsName = mt.Groups["Name"].Success,
IsCombined = mt.Groups["Combined"].Success,
Type = mt.Groups["SupportType"].Value
})
Note that it can determine whether the extracted name is a single name (Forename) or a two-part name (DOB.Day). Either form is captured into the named group "Full", with the "Name" and "Combined" groups used as "is-a" booleans.
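Java offers comparable features: named groups and a COMMENTS flag that plays the role of IgnorePatternWhitespace. The sketch below is a simplified variant, not a port of the pattern above. It keeps only the "Full" and "SupportType" groups, and the class name and map result type are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupFields {
    // COMMENTS mode ignores whitespace in the pattern and treats '#' as a comment start.
    private static final Pattern FIELD = Pattern.compile(
            "Personal\\.                            # anchor for the match\n"
          + "(?<Full>                               # name, optionally followed by .subname\n"
          + "  [^.\\s]+ (?: \\.[^.\\s]+ )?\n"
          + ")\n"
          + "\\.Supported \\s* = \\s*               # '=' with optional whitespace or line break\n"
          + "EnumSupported\\.\n"
          + "(?<SupportType>Mandatory|Supported)    # capture the support type\n",
            Pattern.COMMENTS);

    /** Maps each name (e.g. "Surname" or "DOB.Day") to "Mandatory" or "Supported". */
    static Map<String, String> parse(String source) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(source);
        while (m.find()) {
            result.put(m.group("Full"), m.group("SupportType"));
        }
        return result;
    }
}
```

Whether a name is single or two-part can then be checked on the captured text itself, for example with `name.contains(".")`.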

how to handle new lines in Regular expressions

I have an application that reads codes from a text file written in C#.
The codes will generally follow the same pattern each time
example:
QUES10100
From what I have written so far, this results in the regular expression looking like this:
string expr = "^[A-Z]{4}[0-9]{5}$";
The question then is: when the codes are read from a text file (one per line), the codes have the \r carriage-return character appended. I found this by placing a breakpoint to see what was really being passed through.
What am I missing from the expression provided above?
Also, if I add the codes individually, the \r characters are not appended, so it's fine; in this case I would need an OR operator in there somewhere.
Summary
What I have so far: ^[A-Z]{4}[0-9]{5}$
What I need: ^[A-Z]{4}[0-9]{5}$ OR ^[A-Z]{4}[0-9]{5}$ with \r characters accounted for.
Thanks. If anything needs clarifying, please let me know, as my experience with regex is very limited.
Update
string expr = "^[A-Z]{4}[0-9]{5}";
Regex regex = new Regex(expr , RegexOptions.IgnoreCase);
Match match = regex.Match( code );
if (!match.Success) //Pattern must match
{
MessageBox.Show("Code does not match the necessary pattern");
return false;
}
return true;
Why do you want to use regex for that? Use File.ReadLines and use the regex for validation.
foreach (string line in File.ReadLines(@"c:\file path here")) {
    // File.ReadLines strips the line terminators, so no \r reaches the regex
    if (Regex.IsMatch(line, expr)) {
        Console.WriteLine(line);
    }
}
If you have no control over how the strings are read, you could also take a look at the String.Trim(params char[] trimChars) method, which would allow you to sanitize your string beforehand.
Something like the below:
string str = "....".Trim(new char[] {'\r', '\n'});
This is usually recommended (since almost anything is better than regex :)).
Then you would feed it to the regular expression you have built.
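The trim-then-validate approach might look like this in Java. The class and method names are hypothetical, and keeping the match case-sensitive is an assumption based on the original pattern rather than on the IgnoreCase option used in the update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CodeValidator {
    // matches() must consume the whole input, so no ^ or $ anchors are needed.
    private static final Pattern CODE = Pattern.compile("[A-Z]{4}[0-9]{5}");

    /** Validates one code, ignoring surrounding whitespace such as a trailing '\r'. */
    static boolean isValidCode(String raw) {
        // trim() drops leading and trailing whitespace, including '\r' and '\n'.
        return CODE.matcher(raw.trim()).matches();
    }

    /** Returns the valid codes found in text that holds one code per line. */
    static List<String> validCodes(String fileContents) {
        List<String> valid = new ArrayList<>();
        // Splitting on \r?\n copes with both Windows and Unix line endings.
        for (String line : fileContents.split("\\r?\\n")) {
            if (isValidCode(line)) {
                valid.add(line.trim());
            }
        }
        return valid;
    }
}
```

With this shape, a code typed in by hand and a code read from a file with a stray \r go through the same check.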

c# Regex: find placeholders as substring

I have the following string.
"hello [#NAME#]. nice to meet you. I heard about you via [#SOURCE#]."
In the above text I have two placeholders, NAME and SOURCE.
I want to extract these substrings using a regex.
What would be the regex pattern to find the list of these placeholders?
I tried
string pattern = @"\[#(\w+)#\]";
Result:
hello
NAME
. nice to meet you. I heard about you via
SOURCE
.
What I want is only
NAME
SOURCE
Sample code
string text = "hello [#NAME#]. nice to meet you. I heard about you via [#SOURCE#].";
string pattern = @"\[#(\w+)#\]";
var sp = Regex.Split(text, pattern);
sp.Dump();
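Your regex is working correctly. That's how Regex.Split() should behave (see the documentation): when the pattern contains capture groups, the captured text is included in the resulting array. If what you described is really what you want, you can use something like: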
var matches = from Match match in Regex.Matches(text, pattern)
select match.Groups[1].Value;
If, on the other hand, you wanted to replace the placeholders using some rules (e.g. using a Dictionary<string, string>), then you could do:
Regex.Replace(text, pattern, m => substitutions[m.Groups[1].Value]);
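Both operations, listing the placeholder names and substituting them from a dictionary, can be sketched in Java as follows. The class and method names are invented for this example, and leaving unknown placeholders untouched is a design assumption; the C# one-liner above would instead throw for a missing key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Placeholders {
    // Group 1 is the name between "[#" and "#]".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[#(\\w+)#\\]");

    /** Returns the placeholder names in order of appearance. */
    static List<String> names(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Replaces known placeholders; unknown ones are left as they are (assumed policy). */
    static String fill(String text, Map<String, String> substitutions) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = substitutions.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Iterating over matches, rather than splitting, is what keeps the surrounding text out of the result.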
Try this regex:
\[#([A-Z]+)#\]
^hello (.*?). nice to meet you. I heard about you via (.*?).$
Very simply, the () means you want to capture what's inside, the .*? is a lazy (also known as "ungreedy") match that captures as few characters as possible, and . means any character.
That applies unless your placeholders are always going to use the [# prefix and the #] postfix; in that case, see the other users' posts.

C# regex need characters after \player_n\

I need a regex pattern which will accommodate for the following.
I get a response from a UDP server, it's a very long string and each word is separated by \, for example:
\g79g97\g879o\wot87gord\player_0\name0\g6868o\g78og89\g79g79\player_1\name1\gyuvui\yivyil\player_2\name2\g7g87\g67og9o\v78v9i7
I need the strings after \player_n\, so in the above example I would need name0, name1 and name2.
I know this is the second regex question of the day but I have the book (Mastering Regular Expressions) on order! Thank you.
UPDATE: elusive's regex pattern will suffice, and I can add match(0) to a textbox. However, what if I want to add all the matches to the text box?
textBox1.Text += match.Captures[0].ToString(); //this works fine.
How do I add all of the match captures to the text box? Sorry for being so lame; this Regex class is brand new to me.
Try this one:
\\player_\d+\\([^\\]+)
I think this test sample can help you:
string inp = @"\g79g97\g879o\wot87gord\player_0\name0\g6868o\g78og89\g79g79\player_1\name1\gyuvui\yivyil\player_2\name2\g7g87\g67og9o\v78v9i7";
string rex = @"[\w]*[\\]player_[0-9]+[\\](?<name>[A-Za-z0-9]*)\b";
Regex re = new Regex(rex);
// Walk through every match, not just the first one
for (Match m = re.Match(inp); m.Success; m = m.NextMatch())
{
    Console.WriteLine(m.Groups["name"]);
}
You can take the name of the player from m.Groups["name"].
To get only the player name, you could use:
(?<=\\player_\d+\\)[^\\]+
This (?<=\\player_\d+\\) is something called a positive look-behind. It makes sure that the actual match [^\\]+ is preceded by the expression in the parentheses.
In this case, it's even specific to only a few regex engines (.NET being among them, luckily), in that it contains a variable length expression (due to \d+). Most regex engines only support fixed-length look-behind.
In any case, look-behind is not necessarily the best approach to this problem; match groups are simpler and easier to read.
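The capture-group approach, collecting every match instead of only the first, could be sketched in Java like this. The class and method names are hypothetical; the regex is the capture-group pattern suggested earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlayerNames {
    // As a regex: \\player_\d+\\([^\\]+)  (each backslash is doubled again for the Java string)
    private static final Pattern PLAYER =
            Pattern.compile("\\\\player_\\d+\\\\([^\\\\]+)");

    /** Returns the token that follows each \player_n\ marker, in order. */
    static List<String> extract(String response) {
        List<String> names = new ArrayList<>();
        Matcher m = PLAYER.matcher(response);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String response = "\\g79g97\\player_0\\name0\\g6868o\\player_1\\name1\\x\\player_2\\name2\\g7g87";
        // Joining the list gives a single string suitable for a text box.
        System.out.println(String.join(", ", extract(response)));
    }
}
```

Looping with find() is the Java counterpart of iterating over the matches in .NET, which also answers the "all matches" follow-up.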

C# reliable way to pattern match?

At the moment I am trying to match patterns such as
text text date1 date2
So I have regular expressions that do just that. However, the issue is for example if users input data with say more than 1 whitespace or if they put some of the text in a new line etc the pattern does not get picked up because it doesn't exactly match the pattern set.
Is there a more reliable way for pattern matching? The goal is to make it very simple for the user to write but make it easily matchable on my end. I was considering stripping out all the whitespace/newlines etc and then trying to match the pattern with no spaces i.e. texttextdate1date2.
Anyone got any better solutions?
Update
Here is a small example of the pattern I would need to match:
FIND me@test.com 01/01/2010 to 10/01/2010
Here is my current regex:
FIND [A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4} [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4} to [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}
This works fine 90% of the time. However, if users submit this information via email, it can have all different kinds of formatting and HTML I am not interested in. I am using a combination of the HtmlAgilityPack and an HTML-tag-removing regex to strip all the HTML from the email, but even then I can't seem to get a match on some occasions.
I believe this could be a more parsing related question than pattern matching, but I think maybe there is a better way of doing this...
To match one or more whitespace characters (space, tab, newline), use:
\s+
Substitute the above wherever you have the physical space in your pattern and you should be fine.
Example of matching multiple groups in a text with multiple whitespaces and/or newlines.
var txt = "text text date1\ndate2";
var matches = Regex.Match(txt, @"([a-z]+)\s+([a-z]+)\s+([a-z0-9]+)\s+([a-z0-9]+)", RegexOptions.Singleline);
matches.Groups[n].Value with n from 1 to 4 will contain your matches.
I would split the string into a string array and match each resulting string to the necessary Regular Expression.
\b(text)[\s]+(text)[\s]+(date1)[\s]+(date2)\b
It's a nasty expression, but here is something that will work for the input you provided:
^(\w+)\s+([\w@.]+)\s+(\d{2}\/\d{2}\/\d{4})[^\d]+(\d{2}\/\d{2}\/\d{4})$
This will work with variable amounts of whitespace between the capture groups as well.
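Applying the \s+ suggestion to the asker's own FIND pattern gives something like the Java sketch below. The class and method names are made up, the capture groups are an addition for illustration, and the email part assumes the usual @ separator in the address.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindCommandParser {
    // Every literal space of the original pattern is replaced by \s+ so that
    // runs of spaces, tabs and line breaks between the parts are all accepted.
    private static final Pattern FIND = Pattern.compile(
            "FIND\\s+([A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4})\\s+"
          + "(\\d{1,2}/\\d{1,2}/\\d{2,4})\\s+to\\s+(\\d{1,2}/\\d{1,2}/\\d{2,4})",
            Pattern.CASE_INSENSITIVE);

    /** Returns [email, fromDate, toDate], or an empty list when no command is found. */
    static List<String> parse(String text) {
        Matcher m = FIND.matcher(text);
        if (!m.find()) {
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1), m.group(2), m.group(3));
    }
}
```

Since find() scans the whole input, leftover text around the command (for example from a stripped HTML email) does not prevent a match.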
Through ORegex you can tokenize your string and just pattern match on token sequences:
var tokens = input.Split(new[]{' ','\t','\n','\r'}, StringSplitOptions.RemoveEmptyEntries);
var oregex = new ORegex<string>("{0}{0}{1}{1}", IsText, IsDate);
var matches = oregex.Matches(tokens); //here is your subsequence tokens.
...
public bool IsText(string str)
{
...
}
public bool IsDate(string str)
{
...
}
