How to match a specific sentence with Regex

I'm new to Regex and I couldn't cope with matching this sort of sentence: Band Name #Venue 30 450, where the digits at the end represent price and quantity.
string input = "Band Name #City 25 3500";
Match m = Regex.Match(input, @"^[A-Za-z]+\s+[A-Za-z]+\s+[\d+]+\s+[\d+]$");
if (m.Success)
{
    Console.WriteLine("Success!");
}

You can use Regex with named groups, which makes it easier to extract the data later if you need it. For example:
string pattern = @"(Band) (?<Band>[A-Za-z ]+) (?<City>#[A-Za-z ]+) (?<Price>\d+) (?<Quantity>\d+)";
string input = "Band Name #City 25 3500";
Match match = Regex.Match(input, pattern);
Console.WriteLine(match.Groups["Band"].Value);
Console.WriteLine(match.Groups["City"].Value.TrimStart('#'));
Console.WriteLine(match.Groups["Price"].Value);
Console.WriteLine(match.Groups["Quantity"].Value);
The pattern contains several named groups, written as (?<GroupName>...). This is only a basic example; adjust it to fit your actual needs.

This one should work:
[A-Za-z ]+ [A-Za-z ]+ #[A-Za-z ]+ \d+ \d+
You can test it here.
With your code it would be:
string input = "Band Name #City 25 3500";
Match m = Regex.Match(input, @"[A-Za-z ]+ [A-Za-z ]+ #[A-Za-z ]+ \d+ \d+");
if (m.Success)
{
    Console.WriteLine("Success!");
}

Here is an older, more elaborate way. 1st way:
string txt = "Band Name #City 25 3500"; // the input to search
string re1 = ".*?"; // The part before #
string re2 = "(#)"; // Any Single Character 1
string re3 = "((?:[a-z][a-z]+))"; // Word 1, here the city
string re4 = "(\\s+)"; // White Space 1
string re5 = "(\\d+)"; // Integer Number 1, here 25
string re6 = "(\\s+)"; // White Space 2
string re7 = "(\\d+)"; // Integer Number 2, here 3500
Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
    String c1 = m.Groups[1].ToString();
    String word1 = m.Groups[2].ToString();
    String ws1 = m.Groups[3].ToString();
    String int1 = m.Groups[4].ToString();
    String ws2 = m.Groups[5].ToString();
    String int2 = m.Groups[6].ToString();
    Console.Write("(" + c1 + ")" + "(" + word1 + ")" + "(" + ws1 + ")" + "(" + int1 + ")" + "(" + ws2 + ")" + "(" + int2 + ")" + "\n");
}
This way you capture all the specific values in one pass. For example, Groups[6] holds 3500, or whatever value appears in that position.
You can create your own regex here: Regex
In short, the other answers are right. 2nd way:
Just create the regex with
@"([A-Za-z ]+) ([A-Za-z ]+) #([A-Za-z ]+) (\d+) (\d+)"
and match it against any string in this format. You can create your own regex and test it here: Regex Tester
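As a minimal sketch of the 2nd way, the numbered groups can be read back like this. The class name BandLine and the method Parse are made up for illustration, and the ^ and $ anchors are an added assumption that the whole line has to fit the format.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BandLine
{
    // Hypothetical helper: returns the five captured parts,
    // or null when the line does not match.
    // The ^ and $ anchors are an assumption, not part of the answer's pattern.
    public static string[] Parse(string line)
    {
        Match m = Regex.Match(line, @"^([A-Za-z ]+) ([A-Za-z ]+) #([A-Za-z ]+) (\d+) (\d+)$");
        if (!m.Success)
            return null;

        // Groups[0] is the whole match; the captures start at index 1.
        return new[]
        {
            m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value,
            m.Groups[4].Value, m.Groups[5].Value
        };
    }

    public static void Main()
    {
        string[] parts = Parse("Band Name #City 25 3500");
        Console.WriteLine(parts == null ? "No match" : string.Join(" | ", parts));
    }
}
```

For the sample input this prints Band | Name | City | 25 | 3500.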

That is the answer to what I was trying to do:
string input = "Band Name #Location 25 3500";
Match m = Regex.Match(input, @"([A-Za-z ]+) (#[A-Za-z ]+) (\d+) (\d+)");
if (m.Success)
{
    Console.WriteLine("Success!");
}

Related

Regex - Get digits after a colon

I have a regex:
var topPayMatch = Regex.Match(result, @"(?<=Top Pay)(\D*)(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);
I then convert the result to int:
topPayMatch = Convert.ToInt32(topPayMatchString.Groups[2].Value);
If the text is Top Pay: 1,000,000, it currently grabs only the first digit, which is 1. I want the whole number, 1000000.
If the text is Top Pay: 888,888, I want 888888.
What should I add to my regex?

You can use something as simple as @"(?<=Top Pay: )([0-9,]+)". Note that decimals will be ignored with this regex.
It matches all digits and commas after Top Pay:, which you can then parse to an integer.
Example:
Regex rgx = new Regex(@"(?<=Top Pay: )([0-9,]+)");
string str = "Top Pay: 1,000,000";
Match match = rgx.Match(str);
if (match.Success)
{
    string val = match.Value;
    int num = int.Parse(val, System.Globalization.NumberStyles.AllowThousands);
    Console.WriteLine(num);
}
Console.WriteLine("Ended");
Source:
Convert int from string with commas

If you use the lookbehind, you don't need the capture groups and you can move the \D* into the lookbehind.
To get the values, you can match 1+ digits followed by optional repetitions of , and 1+ digits.
Note that your example data contains commas and no dots, and that ? as a quantifier means 0 or 1 times.
(?<=Top Pay\D*)\d+(?:,\d+)*
The pattern matches:
(?<=Top Pay\D*) Positive lookbehind, assert what is to the left is Top Pay and optional non digits
\d+ Match 1+ digits
(?:,\d+)* Optionally repeat a , and 1+ digits
See a .NET regex demo and a C# demo
string pattern = @"(?<=Top Pay\D*)\d+(?:,\d+)*";
string input = @"Top Pay: 1,000,000
Top Pay: 888,888";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
    var topPayMatch = int.Parse(m.Value, System.Globalization.NumberStyles.AllowThousands);
    Console.WriteLine(topPayMatch);
}
Output
1000000
888888

Regex to find all placeholder occurrences in text

I'm struggling to create a Regex that finds all placeholder occurrences in a given text. Placeholders will have the following format:
[{PRE.Word1.Word2}]
Rules:
Delimited by "[{PRE." and "}]" ("PRE" upper case)
2 words (at least 1 char long each) separated by a dot. All chars valid on each word apart from newline.
word1: min 1 char, max 15 chars
word2: min 1 char, max 64 chars
word1 cannot have dots, if there are more than 2 dots inside placeholder extra ones will be part of word2. If less than 2 dots, placeholder is invalid.
Looking to get all valid placeholders regardless of what the 2 words are.
I'm not being lazy; I spent a horrible amount of time building the rule on regexr.com but could not satisfy all of these rules.
Looking forward to your suggestions.
The closest I got was the pattern below, and any attempt to expand on it breaks all valid matches.
\[\{OEP\.*\.*\}\]
Much appreciated!
Sample text where Regex should find matches:
Random text here
[{Test}] -- NO MATCH
[{PRE.TestTest3}] --NO MATCH
[{PRE.TooLong.12345678901234567890}] --NO MATCH
[{PRE.Address.Country}] --MATCH
[{PRE.Version.1.0}] --MATCH
Random text here

You can use
\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]
See the regex demo
Details
\[{ - a [{ string
PRE\. - PRE. text
([^][{}.]{1,15}) - Group 1: any one to fifteen chars other than [, ], {, } and .
\. - a dot
(.{1,64}?) - any one to 64 chars other than line break chars as few as possible
}] - a }] text.
If you need to get all matches in C#, you can use
var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
var matches = Regex.Matches(text, pattern);
See this C# demo:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var text = "[{PRE.Word1.Word2}] and [{PRE.Word 3.Word..... 2 %%%}]";
        var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
        var matches = Regex.Matches(text, pattern);
        var props = new List<Property>();
        foreach (Match m in matches)
            props.Add(new Property(m.Groups[1].Value, m.Groups[2].Value));
        foreach (var item in props)
            Console.WriteLine("Word1 = " + item.Word1 + ", Word2 = " + item.Word2);
    }
    public class Property
    {
        public string Word1 { get; set; }
        public string Word2 { get; set; }
        public Property()
        {}
        public Property(string w1, string w2)
        {
            this.Word1 = w1;
            this.Word2 = w2;
        }
    }
}
Output:
Word1 = Word1, Word2 = Word2
Word1 = Word 3, Word2 = Word..... 2 %%%
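To see how that pattern behaves on the sample text from the question, here is a minimal sketch. The class name PlaceholderScan and the method Find are hypothetical, and the "Word1|Word2" output format is just a convenience for printing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderScan
{
    // Hypothetical helper: returns "Word1|Word2" for every placeholder found.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]"))
            found.Add(m.Groups[1].Value + "|" + m.Groups[2].Value);
        return found;
    }

    public static void Main()
    {
        // The sample lines from the question, without the MATCH / NO MATCH notes.
        string sample = string.Join("\n", new[]
        {
            "Random text here",
            "[{Test}]",
            "[{PRE.TestTest3}]",
            "[{PRE.TooLong.12345678901234567890}]",
            "[{PRE.Address.Country}]",
            "[{PRE.Version.1.0}]",
            "Random text here"
        });

        foreach (string hit in Find(sample))
            Console.WriteLine(hit);
    }
}
```

Note that the TooLong line is reported as a match: its two words are 7 and 20 characters long, which is within the stated limits of 15 and 64, even though the question labels it NO MATCH.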

string input = "[{PRE.Word1.Word2}]";
// language=regex
string pattern = @"\[{ PRE \. (?'group1' .{1,15}? ) \. (?'group2' .{1,64}? ) }]";
var match = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(match.Groups["group1"].Value);
Console.WriteLine(match.Groups["group2"].Value);

Get particular parts from a string

I'm trying to get particular parts from a string. I have to get the part which starts after '#' and contains only letters from the Latin alphabet.
I suppose that I have to create a regex pattern, but I don't know how.
string test = "PQ#Alderaa1:30000!A!->20000";
var planet = "Alderaa"; //what I want to get
string test2 = "#Cantonica:3000!D!->4000NM";
var planet2 = "Cantonica";
There are some other parts which I have to get, but I will try to get them myself. (starts after ':' and is an Integer; may be "A" (attack) or "D" (destruction) and must be surrounded by "!" (exclamation mark); starts after "->" and should be an Integer)

You could get the separate parts using capturing groups:
#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)
That will match:
#([a-zA-Z]+) Match # and capture 1+ letters a-zA-Z in group 1
[^:]*: Match 0+ times not a : using a negated character class, then match a : (If what follows could be only optional digits, you might also match 0+ times a digit [0-9]*)
(\d+) Capture in group 2 1+ digits
!([AD])! Match !, capture either A or D in group 3, then match !
->(\d+) Match -> and capture in group 4 1+ digits
Demo | C# Demo
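A minimal sketch of reading those four groups in C#, using the two sample strings from the question. The class name PlanetMessage and the method Describe are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlanetMessage
{
    private static readonly Regex Pattern =
        new Regex(@"#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)");

    // Hypothetical helper: joins the four captured parts with spaces,
    // or returns null when the message does not match.
    public static string Describe(string message)
    {
        Match m = Pattern.Match(message);
        if (!m.Success)
            return null;

        return string.Format("{0} {1} {2} {3}",
            m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value);
    }

    public static void Main()
    {
        Console.WriteLine(Describe("PQ#Alderaa1:30000!A!->20000")); // Alderaa 30000 A 20000
        Console.WriteLine(Describe("#Cantonica:3000!D!->4000NM"));  // Cantonica 3000 D 4000
    }
}
```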

You can use this regex. It uses a positive lookbehind to ensure the matched text is preceded by #, matches one or more letters with [a-zA-Z]+, and uses a positive lookahead to ensure the letters are followed by some optional text, a colon, one or more digits, a !, either A or D, and another !.
(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)
Demo
C# code demo
string test = "PQ#Alderaa1:30000!A!->20000";
Match m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
test = "#Cantonica:3000!D!";
m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
Prints,
Alderaa
Cantonica

You already have good answers, but I would like to add one more to show named capturing groups.
You can create a class for your planets like
class Planet
{
    public string Name;
    public int Value1; // name is not clear from context
    public string Category; // as above: rename it
    public string Value2; // same problem
}
Now you can use regex with named groups
#(?<name>[a-z]+)[^:]*:(?<value1>\d+)!(?<category>[^!]+)!->(?<value2>[\da-z]+)
Demo
Usage:
var input = new[]
{
    "PQ#Alderaa1:30000!A!->20000",
    "#Cantonica:3000!D!->4000NM",
};
var regex = new Regex("#(?<name>[a-z]+)[^:]*:(?<value1>\\d+)!(?<category>[^!]+)!->(?<value2>[\\da-z]+)",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);
var planets = input
    .Select(p => regex.Match(p))
    .Select(m => new Planet
    {
        Name = m.Groups["name"].Value, // from here on, each part of the input string is accessed by name
        Value1 = int.Parse(m.Groups["value1"].Value),
        Category = m.Groups["category"].Value,
        Value2 = m.Groups["value2"].Value
    })
    .ToList();

How to separate numbers from words, chars and any other marks with whitespace in string

I'm trying to use whitespace to separate numbers from words, characters and any other punctuation in a string where they are written together, e.g. the string is:
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
and desired output should be:
ok, here is 369 and 777 , and 20 k 0 10 + 1 .any word.
I'm not sure if I'm on the right track. What I'm trying to do now is to find whether the string contains numbers and then replace them with the same values surrounded by whitespace. If it is possible, how can I find all the individual numbers (whole numbers, not each digit), whether or not they are separated by words or whitespace, and replace each one with the same number with spaces on both sides? The code below returns only the first occurrence of a number in the string:
class Program
{
    static void Main(string[] args)
    {
        string input = "here is 369 and 777 and 15 2080 and 579";
        string resultString = Regex.Match(input, @"\d+").Value;
        Console.WriteLine(resultString);
        Console.ReadLine();
    }
}
output:
369
I'm also not sure whether I can get every number found and use each one as its own replacement value. It would be good to know which direction to go in.

If what we need is basically to add spaces around numbers, try this:
string tmp = Regex.Replace(input, @"(?<a>[0-9])(?<b>[^0-9\s])", @"${a} ${b}");
string res = Regex.Replace(tmp, @"(?<a>[^0-9\s])(?<b>[0-9])", @"${a} ${b}");
Previous answer assumed that words, numbers and punctuation should be separated:
string input = "here is369 and777, and 20k0";
var matches = Regex.Matches(input, @"([A-Za-z]+|[0-9]+|\p{P})");
foreach (Match match in matches)
    Console.WriteLine("{0}", match.Groups[1].Value);
To construct the required result string in a short way:
string res = string.Join(" ", matches.Cast<Match>().Select(m => m.Groups[1].Value));

You were on the right path. Regex.Match returns only the first match, and you would have to call .NextMatch() to get the next value that matches your regular expression. Regex.Matches returns all matches in a MatchCollection that you can then walk with a loop, as in this example:
string input = "here is 369 and 777 and 15 2080 and 579";
foreach (Match match in Regex.Matches(input, @"\d+"))
{
    Console.WriteLine(match.Value);
}
Console.ReadLine();
This Outputs:
369
777
15
2080
579
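For comparison, the .NextMatch() route mentioned above could look like the sketch below. The class name NumberWalker and the method Collect are made up for illustration; Regex.Match and Match.NextMatch are the standard .NET calls.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberWalker
{
    // Hypothetical helper: collects every run of digits by stepping
    // through the matches one at a time.
    public static List<string> Collect(string input)
    {
        var numbers = new List<string>();
        Match match = Regex.Match(input, @"\d+");
        while (match.Success)
        {
            numbers.Add(match.Value);
            match = match.NextMatch(); // continue searching after the previous match
        }
        return numbers;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect("here is 369 and 777 and 15 2080 and 579")));
    }
}
```

This prints 369, 777, 15, 2080, 579, the same values the Regex.Matches loop produces.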

This provides the desired output:
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
var matches = Regex.Matches(input, @"([\D]+|[0-9]+)");
foreach (Match match in matches)
    Console.Write("{0} ", match.Groups[0].Value);
[\D] matches any non-digit character. Note the space after {0} in the format string.

regex to strip number from var in string

I have a long string and I have a var inside it
var abc = '123456'
Now I wish to get the 123456 from it.
I have tried a regex but it's not working properly
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time

var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456

Your regex is not correct:
(?<abc>+)=(?<var>+)
The + signs are quantifiers, meaning that the preceding element is repeated at least once. Here nothing precedes them inside the groups: (?<name> ... ) is the syntax for a named capture group, not a character that can be repeated.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
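A minimal sketch of that last pattern in C#. The class name VarReader and the method ReadAbc are hypothetical, and the sample body text is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarReader
{
    // Hypothetical helper: returns the quoted value assigned to abc,
    // or null if no such assignment is found.
    public static string ReadAbc(string body)
    {
        Match m = Regex.Match(body, @"(?<abc>abc)\s*=\s*'(?<var>[^']+)'");
        return m.Success ? m.Groups["var"].Value : null;
    }

    public static void Main()
    {
        // Invented sample text containing the assignment from the question.
        Console.WriteLine(ReadAbc("some text var abc = '123456' more text")); // 123456
    }
}
```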

From the example you provided, the number can be extracted like this:
Console.WriteLine(
    Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means 1 or more digits.
But I surmise your data doesn't look like your example.

Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+");
Match m = regex.Match(body);
