Named group in regular expression match - c#

I'm trying to parse some source files for some standard information.
The source files could look like this:
// Name: BoltBait
// Title: Some cool thing
or
// Name :
// Title : Another thing
or
// Title:
// Name:
etc.
The code I'm using to parse for the information looks like this:
Regex REName = new Regex(@"\/{2}\s*Name\s*:\s*(?<nlabel>.*)\n", RegexOptions.IgnoreCase);
Match mname = REName.Match(ScriptText); // entire source code file
if (mname.Success)
{
Name.Text = mname.Groups["nlabel"].Value.Trim();
}
This works fine if the field has a value, but it doesn't work if the field is left blank.
For example, in the third example above, the Title field returns a match of "// Name:" and I want it to return the empty string.
I need help from a regex expert.
I thought the regex was too greedy, so I tried the following expression:
@"\/{2}\s*Name\s*:\s*(?<nlabel>.*?)\n"
However, it didn't help.

You can also use a class subtraction to avoid matching newline symbols:
//[\s-[\r\n]]*Name[\s-[\r\n]]*:[\s-[\r\n]]*(?<nlabel>.*)(?=\r?\n|$)
Note that:
[\s-[\r\n]]* - Matches any whitespace excluding newline symbols (a character class subtraction is used)
(?=\r?\n|$) - A positive look-ahead that checks if there is a line break or the end of the string.

\s includes line breaks, which is not wanted here.
It should suffice to match tabs and spaces explicitly after :
\/{2}\s*Name\s*:[\t ]*(?<nlabel>.*?)\n
This returns the empty string correctly in your third example (for both name and title).
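As a minimal sketch of this fix, the helper below allows only spaces and tabs around the colon and captures the rest of the current line, so a blank field yields an empty string. The HeaderParser class and the ReadField method are hypothetical names, not part of the question's code, and the pattern assumes each comment starts at the beginning of a line.

```csharp
using System.Text.RegularExpressions;

public static class HeaderParser
{
    // Hypothetical helper: returns the text after "// <field>:" on the same line,
    // or null when the field does not appear in the source at all.
    public static string ReadField(string source, string field)
    {
        // [\t ]* matches only spaces and tabs, so the match cannot run onto the next line.
        // [^\r\n]* captures the rest of the current line, which may be empty.
        string pattern = @"^//[\t ]*" + Regex.Escape(field) + @"[\t ]*:[\t ]*(?<value>[^\r\n]*)";
        Match m = Regex.Match(source, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        return m.Success ? m.Groups["value"].Value.Trim() : null;
    }
}
```

For the third sample in the question, HeaderParser.ReadField(ScriptText, "Title") and HeaderParser.ReadField(ScriptText, "Name") both return the empty string.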

My approach is to use an alternation in a non-capturing group to match the label from the colon to the end of the line. This matches either anything up to the end of the line, or nothing.
var text1 = "// Name: BoltBait" + Environment.NewLine + "// Title: Some cool thing" + Environment.NewLine;
var text2 = "// Name :" + Environment.NewLine + "// Title : Another thing" + Environment.NewLine;
var text3 = "// Title:" + Environment.NewLine + "// Name:" + Environment.NewLine;
var texts = new List<string>() { text1, text2, text3 };
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var regex = new Regex("^//\\s*?Name\\s*?:(?<nlabel>(?:.*$|$))", options );
foreach (var text in texts){
var match = regex.Match( text );
Console.WriteLine( "|" + match.Groups["nlabel"].Value.Trim() + "|" );
}
Produces:
|BoltBait|
||
||

Related

How to find and get string after a string known values in a text file c#

I want to find and extract the strings that follow known values in a text file, using C#.
My text file:
function PreloadFiles takes nothing returns nothing
call Preload( "=== Save ===" )
call Preload( "Player: Michael" )
call Preload( "-load1 UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F" )
call Preload( "-load2 IMdOIPKGSDFXStx4Zd4LAvAaBmHW19rxsvSNF6kaObSFyBzGq8skYGuq0T1eW" )
call Preload( "-load3 Bd6MoyqnfDydBbwqGApWii3mabJpwNvjcwrKLI0r6UU2wadrMV1h7WQ8D6" )
call Preload( "-load4 D5kI18Flk5bJ4Oi7vQw33b5LHDXHGgJNYsiC6VNJDAHe1" )
call Preload( "KEY PASS: 3568" )
endfunction
I want to get the strings that follow "-load1", "-load2", "-load3", "-load4" and "KEY PASS: " and put them into 5 text boxes,
like this:
UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F
IMdOIPKGSDFXStx4Zd4LAvAaBmHW19rxsvSNF6kaObSFyBzGq8skYGuq0T1eW
Bd6MoyqnfDydBbwqGApWii3mabJpwNvjcwrKLI0r6UU2wadrMV1h7WQ8D6
D5kI18Flk5bJ4Oi7vQw33b5LHDXHGgJNYsiC6VNJDAHe1
3568
Please help me
Thank you!
You can use
string Substring(int startIndex);
like this:
string in1 = "-load1 UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F";
string result = in1.Substring(7); // "-load1 " is 7 characters long
It returns:
"UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F"
This can also be done with the Regex class (from the System.Text.RegularExpressions namespace).
Pattern examples:
for the -loadN ... string: " [A-Za-z0-9]*\" ". This makes Regex look for a substring that starts with a space " ", contains any number of letters (A-Z, either case) or digits (0-9), and ends with a double quote \" followed by a space " ", such as your UvjkiJyjLlP..." .
for the KEY PASS: ... string: @"KEY PASS: (\d{4})". This makes Regex find a substring that consists of the text "KEY PASS: " (including its trailing space) followed by 4 digits.
But be aware that this is fragile, because Regex patterns are very sensitive to the exact input.
For example,
"-loaddd1 AbCdEfG..." (extra chars)
"-load1 AbCdEfG..." (multiple whitespaces)
"KEY PASS: 12345" (the pattern in the example below matches exactly 4 digits, so only 1234 would be captured)
"-LOAD1 AbCdEfG..." (uppercased)
etc.
Inputs like these may be ignored or only partly matched (the last one, by the way, can be handled by passing RegexOptions.IgnoreCase, as in Regex.Match(line, pattern, RegexOptions.IgnoreCase)). The others can be handled too, but you should know that such cases are possible.
For the example provided in the question, this code works fine:
string loadPattern = " [A-Za-z0-9]*\" ";
string keyPassPattern = @"KEY PASS: (\d{4})";
List<string> capturedValues = new List<string>();
foreach (string line in File.ReadAllLines("Preload.txt"))
{
    string s;
    if (Regex.IsMatch(line, loadPattern) && line.Contains("-load"))
    {
        // Getting captured substring and trimming from trailing whitespace and quote
        s = Regex.Match(line, loadPattern, RegexOptions.IgnoreCase).Value.Trim('\"', ' ');
        capturedValues.Add(s);
    }
    else if (Regex.IsMatch(line, keyPassPattern))
    {
        // Just replacing "KEY PASS: " to empty string
        s = Regex.Match(line, keyPassPattern).Value.Replace("KEY PASS: ", "");
        capturedValues.Add(s);
    }
}
string s1 = "-load1 UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F";
string[] parts = s1.Split(' ');
string value1 = parts[1];
This way, value1 holds
"UvjkiJyjLlPN1o7FCAwQ0en80t769u5uBKAL1t0u0Cajk86WNmp83F".
You can do the same for each of the other strings and combine the results.
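The answers above work on one prepared string at a time. As a hedged sketch of how the lines of the file itself could be handled in one pass, the code below uses a single pattern with a named capture group for the text after "-loadN " or "KEY PASS: ". The PreloadParser class and the ExtractValues method are hypothetical names, and the pattern assumes each value sits inside one pair of double quotes, as in the question's file.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PreloadParser
{
    // Matches a quoted argument that starts with "-load<digits>" or "KEY PASS:",
    // then captures everything up to the closing quote in the group "value".
    private static readonly Regex ValuePattern =
        new Regex(@"""(?:-load\d+|KEY PASS:)\s+(?<value>[^""]*)""", RegexOptions.IgnoreCase);

    public static List<string> ExtractValues(IEnumerable<string> lines)
    {
        var values = new List<string>();
        foreach (string line in lines)
        {
            Match m = ValuePattern.Match(line);
            if (m.Success)
            {
                // Lines such as "Player: Michael" do not match and are skipped.
                values.Add(m.Groups["value"].Value.Trim());
            }
        }
        return values;
    }
}
```

A call such as PreloadParser.ExtractValues(File.ReadAllLines("Preload.txt")) would then give the five values in file order, ready to assign to the text boxes.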

using Regex to iterate over a string and search for 3 consecutive hyphens and replace it with [space][hyphen][space]

I currently have a string which looks like this when it is returned :
//This is the url string
// the-great-debate---toilet-paper-over-or-under-the-roll
string name = string.Format("{0}",url);
name = Regex.Replace(name, "-", " ");
When I perform the Regex operation above, it becomes:
the great debate toilet paper over or under the roll
However, as I mentioned in the title, I want to apply a regex to the url string so that I get the following output:
the great debate - toilet paper over or under the roll
I would really appreciate any assistance.
[EDIT] However, not all the strings look like this. Some of them have only single hyphens, so the above method works:
world-water-day-2016
and it changes to
world water day 2016
but for this one:
the-great-debate---toilet-paper-over-or-under-the-roll
I need a way to check whether the string has 3 consecutive hyphens, and then replace those 3 hyphens with [space][hyphen][space], and then replace all the remaining single hyphens between the words with a space.
First of all, there is always a very naive solution to this kind of problem: you replace your specific matches in context with some chars that are not usually used in the current environment and after replacing generic substrings you may replace the temporary substrings with the necessary exception.
var name = url.Replace("---", "[ \uFFFD ]").Replace("-", " ").Replace("[ \uFFFD ]", " - ");
You may also use a regex based replacement that matches either a 3-hyphen substring capturing it, or just match a single hyphen, and then check if Group 1 matched inside a match evaluator (the third parameter to Regex.Replace can be a Match evaluator method).
It will look like
var name = Regex.Replace(url, @"(---)|-", m => m.Groups[1].Success ? " - " : " ");
See the C# demo.
So, when (---) part matches, the 3 hyphens are put into Group 1 and the .Success property is set to true. Thus, m => m.Groups[1].Success ? " - " : " " replaces 3 hyphens with space+-+space and 1 hyphen (that may be actually 1 of the 2 consecutive hyphens) with a space.
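The behaviour described above can be checked with a small sketch that wraps the same call in a method. The SlugFormatter class and the ToTitle method are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SlugFormatter
{
    public static string ToTitle(string slug)
    {
        // Alternation order matters: "---" is tried first at each position,
        // so a triple hyphen is consumed whole before the single-hyphen branch can see it.
        // Group 1 only succeeds for the triple hyphen, which selects the " - " replacement.
        return Regex.Replace(slug, @"(---)|-", m => m.Groups[1].Success ? " - " : " ");
    }
}
```

With the two sample strings from the question, ToTitle gives "the great debate - toilet paper over or under the roll" and "world water day 2016".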
Here's a solution using LINQ rather than Regex:
var str = "the-great-debate---toilet-paper-over-or-under-the-roll";
var result = str.Split(new string[] {"---"}, StringSplitOptions.None)
.Select(s => s.Replace("-", " "))
.Aggregate((c,n) => $"{c} - {n}");
// result = "the great debate - toilet paper over or under the roll"
Split the string up based on the ---, then remove hyphens from each substring, then join them back together.
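The easy way (note the verbatim string; in a regular string literal, \b is a backspace character, not a word boundary):
name = Regex.Replace(name, @"\b-|-\b", " ");
The show-off way:
name = Regex.Replace(name, @"(\b)?-(?(1)|\b)", " ");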

Regex string in IOS application

I am trying to convert a string that I take from an NSDictionary. I read the values through this method:
string NSDictionaryConverter(string name)
{
foreach (var a in str)
{
if (a.Key.Description.Equals(name))
{
result = a.Value.ToString();
}
Console.WriteLine(str.Keys);
}
return result;
}
With it I take whatever I need.
Why do I use a dictionary? This dictionary contains all the information that a map annotation carries.
The key FormattedAddressLines contains, for example:
FormattedAddressLines = (
"ZIP City Name",
Country
);
The value I have problems with is the address, because it contains a lot of details. I need all of them displayed nicely on the screen.
Namely, I need to remove the ", ( and ) characters, as well as line breaks with whitespace before punctuation.
After the regex it still looks messy:
string address = NSDictionaryConverter("FormattedAddressLines");
string city = NSDictionaryConverter("City");
string zip = NSDictionaryConverter("ZIP");
string country = NSDictionaryConverter("Country");
address = Regex.Replace(address, @"([()""])+", "");
fullAddress = address + ", " + city + ", " + zip + ", " + country;
addressLabel.Text = fullAddress;
How can I make it look like this:
Full Address value, - new line
XXX, - new line
XXX, - new Line
... - new line
N value - new line
It seems you need to remove specific special characters and whitespace before punctuation.
You need to add a \s*(?:\r?\n|\r)+\s*(?=\p{P}) alternative to your regex:
Regex.Replace(address, @"[()""]+|\s*(?:\r?\n|\r)+\s*(?=\p{P})", "")
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The \s* matches 0+ whitespaces, (?:\r?\n|\r)+ matches 1 or more line breaks and \s*(?=\p{P}) matches 0+ whitespaces that are followed with a punctuation symbol. It might be necessary to replace \p{P} with [\p{P}\p{S}] if you also want to include symbols.
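To see what the combined pattern does, here is a minimal sketch that applies it to a string shaped like the FormattedAddressLines dump in the question. The AddressCleaner class and the Clean method are hypothetical names, and the exact whitespace in the real dictionary description is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class AddressCleaner
{
    public static string Clean(string raw)
    {
        // First alternative: strip parentheses and double quotes.
        // Second alternative: strip line breaks (with surrounding whitespace),
        // but only when the next character is a punctuation symbol.
        return Regex.Replace(raw, @"[()""]+|\s*(?:\r?\n|\r)+\s*(?=\p{P})", "");
    }
}
```

For an input like the dump above, the parentheses, the quotes, and the line breaks that precede a quote or a closing parenthesis are removed, while the line break before Country is kept because Country does not start with punctuation.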

Retrieving specific characters from string separated by a delimiter

I want to retrieve characters separated by a specific delimiter.
Example :
Here, I want to access the string between the " " delimiters. But I want the 2nd set of characters between "".
abc"def"ghi"jklm // Output : ghi
"hello" yes "world" // output : world
How can I get that?
I know we can use split. But sometimes the string might not start with " character.
Can anyone please help me with this?
You can just find the first quote, and use your approach from there:
var firstQuote = str.IndexOf('"');
var startsWithQuote = str.Substring(firstQuote);
string valueStr = "abc\"def\"ghi\"jklm";
var result = valueStr.Split('"')[2];
Console.WriteLine(result);
https://dotnetfiddle.net/T3fMof
Obviously, check that the array has enough elements before accessing them.
You can use regular expressions to match them:
var test = "abc\"def\"ghi\"jklm";
var test2 = "\"hello\" yes \"world\"";
var match1 = Regex.Matches(test, ".+\"(.+)\"");
var match2 = Regex.Matches(test2, ".+\"(.+)\"");
Console.WriteLine("Match1: " + match1[0].Groups[1].Captures[0]);
Console.WriteLine("Match2: " + match2[0].Groups[1].Captures[0]);
// Match1: ghi
// Match2: world
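Both expected outputs in the question are the text between the last two quote characters, which is also what the regular expression above captures. Split('"')[2] gives ghi for the first sample but " yes " for the second. As a hedged alternative without Regex, the sketch below looks for the last two quotes directly. The QuotedText class and the BetweenLastTwoQuotes method are hypothetical names.

```csharp
public static class QuotedText
{
    // Returns the text between the last two double quotes,
    // or null when the input contains fewer than two quotes.
    public static string BetweenLastTwoQuotes(string input)
    {
        int close = input.LastIndexOf('"');
        if (close <= 0)
        {
            return null; // no quote at all, or a single quote at index 0
        }

        // Search backwards, starting just before the closing quote.
        int open = input.LastIndexOf('"', close - 1);
        if (open < 0)
        {
            return null; // only one quote in the string
        }

        return input.Substring(open + 1, close - open - 1);
    }
}
```

Because the search runs from the end, it does not matter whether the string starts with a quote character.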

how do i replace exact phrases in c# string.replace

I am trying to ensure that a list of phrases start on their own line by finding them and replacing them with \n + the phrase. eg
your name: joe your age: 28
becomes
your name: joe
your age: 28
I have a file with phrases that I read and loop through to do the replace. Because some phrases contain 2 words, I use \b to mark where each phrase starts and ends.
This doesn't seem to work, anybody know why?
example - String is 'Name: xxxxxx' does not get edited.
output = output.Replace('\b' + "Name" + '\b', "match");
Using regular expressions, accounts for any number of words with any number of spaces:
using System.Text.RegularExpressions;
Regex re = new Regex("(?<key>\\w+(\\b\\s+\\w+)*)\\s*:\\s*(?<value>\\w+)");
MatchCollection mc = re.Matches("your name: joe your age: 28 ");
foreach (Match m in mc) {
string key = m.Groups["key"].Value;
string value = m.Groups["value"].Value;
//accumulate into a list, but I'll just write to console
Console.WriteLine(key + " : " + value);
}
Here is some explanation:
Suppose what you want to the left of the colon (:) is called a key, and what is to the right - a value.
These key/value pairs are separated by at least one space. Because of this, the value has to be exactly one word (otherwise we'd have ambiguity).
The above regular expression uses named groups, to make code more readable.
got it
for (int headerNo=0; headerNo<headersArray.Length; headerNo++)
{
    string searchPhrase = @"\b" + PhraseArray[headerNo] + @"\b";
    string newPhrase = "match";
    output = Regex.Replace(output, searchPhrase, newPhrase);
}
Following your example, you can do this:
output = output.Replace("your", "\nyour");
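The two ideas above (word boundaries via Regex, and a newline in the replacement) can be combined. The following is a minimal sketch under stated assumptions: the PhraseBreaker class and the BreakBefore method are hypothetical names, and each phrase is assumed to begin and end with a letter or digit, since \b only matches next to a word character.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhraseBreaker
{
    public static string BreakBefore(string text, IEnumerable<string> phrases)
    {
        foreach (string phrase in phrases)
        {
            // Regex.Escape keeps characters such as '.' or '(' in a phrase
            // from being read as regex syntax.
            // \s* swallows the whitespace before the phrase so the previous line has no trailing space.
            string pattern = @"\s*\b(" + Regex.Escape(phrase) + @")\b";

            // "$1" writes the matched phrase back after the newline.
            text = Regex.Replace(text, pattern, "\n$1");
        }

        // A phrase at the very start of the text would otherwise leave a leading newline.
        return text.TrimStart('\n');
    }
}
```

For "your name: joe your age: 28" with the phrases "your name" and "your age", this puts each phrase at the start of its own line.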
