I have a question about .NET regular expressions.
I have several strings in a list. Each string may contain a number, and the rest of the string is the same, for example:
string[] strings = {"var1", "var2", "var3", "array[0]", "array[1]", "array[2]"};
I want the result to be {"var$i", "array[$i]"}, together with a record of the numbers that matched, like a dictionary:
var$i {1, 2, 3}
array[$i] {0, 1, 2}
I defined a regex like this:
var numberReg = new Regex(@".*(<number>\d+).*");
foreach (string str in strings) {
var matchResult = numberReg.Match(str);
if (matchResult.Success) {
var number = matchResult.Groups["number"].ToString();
//blablabla
}
}
But the regex does not seem to work (the match never succeeds). I am new to regex, and I want to solve this problem ASAP.
Try this as your regex:
(?<number>\d+)
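For the original goal (a template such as var$i plus the numbers seen for it), a minimal sketch could look like the following. The names NumberTemplates and GroupByTemplate are made up for illustration, and the sketch assumes each string has at most one number of interest, so only the first run of digits is replaced.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberTemplates
{
    // Note the "?" in (?<number>...): without it, "<number>" is matched literally.
    private static readonly Regex NumberReg = new Regex(@"(?<number>\d+)");

    // Hypothetical helper: maps each template ("var$i") to the numbers found for it.
    public static Dictionary<string, List<int>> GroupByTemplate(IEnumerable<string> inputs)
    {
        var result = new Dictionary<string, List<int>>();
        foreach (string str in inputs)
        {
            Match m = NumberReg.Match(str);
            if (!m.Success)
                continue; // no digits in this string

            var g = m.Groups["number"];
            // Replace only the matched digits with the "$i" placeholder.
            string template = str.Substring(0, g.Index) + "$i" + str.Substring(g.Index + g.Length);

            if (!result.TryGetValue(template, out List<int> numbers))
            {
                numbers = new List<int>();
                result[template] = numbers;
            }
            numbers.Add(int.Parse(g.Value));
        }
        return result;
    }
}
```

Called with the six strings from the question, this should give two entries: var$i with 1, 2, 3 and array[$i] with 0, 1, 2.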
It is not clear to me exactly what you want. However, looking at your code, I assume you need to extract the numbers (and maybe the variable names) from your list of values. Try this:
// values
string[] myStrings = { "var1", "var2", "var3", "array[0]", "array[1]", "array[2]" };
// matches; the lazy \w*? keeps "pre" from swallowing the leading digits of a multi-digit number
Regex x = new Regex(@"(?<pre>\w*?)(?<number>\d+)(?<post>\w*)");
MatchCollection matches = x.Matches(String.Join(",", myStrings));
// get the numbers
foreach (Match m in matches)
{
string number = m.Groups["number"].Value;
...
}
I have a list that holds several strings, for example:
List<string> list1 = new List<string>()
{
"REGISTER_OPTION_P2", "REGISTER_OPTION_P27", "REGISTER_OPTION_P254","REGISTER_OPTION_NOFW", "POWER_OPTION_P45JW"
};
I want to keep only the strings that end with _P*, where * is one or more digits and nothing else.
The result for the list above should contain the following:
"REGISTER_OPTION_P2", "REGISTER_OPTION_P27", "REGISTER_OPTION_P254"
I know there is char.IsDigit(), but it operates on a single character only. My case involves multiple digits.
Is there a way to do this?
You can use
var lst = new[] {"REGISTER_OPTION_P2", "REGISTER_OPTION_P27", "REGISTER_OPTION_P254","REGISTER_OPTION_NOFW", "POWER_OPTION_P45JW"};
var pattern = @"_P[0-9]*$";
var result = lst.Where(x => Regex.IsMatch(x, pattern, RegexOptions.RightToLeft));
foreach (var s in result)
Console.WriteLine(s);
Output:
REGISTER_OPTION_P2
REGISTER_OPTION_P27
REGISTER_OPTION_P254
Details:
_P - a fixed string
[0-9]* - zero or more digits
$ - end of string.
Note the use of RegexOptions.RightToLeft: it makes the regex engine scan from the end of the string toward the start, which suits a pattern anchored with $ and can avoid scanning the whole string.
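Since [0-9]* also accepts zero digits, a string ending in a bare _P would pass the filter above. If at least one digit is required, a variant along these lines might be closer to the question. This is only a sketch: the SuffixFilter class and the KeepNumberedOptions method are names invented here, and \z is used instead of $ so that a trailing newline is not tolerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SuffixFilter
{
    // "_P", then one or more ASCII digits, then the very end of the string.
    private static readonly Regex EndsWithPDigits = new Regex(@"_P[0-9]+\z");

    // Hypothetical helper: keeps only the items that end with _P followed by digits.
    public static List<string> KeepNumberedOptions(IEnumerable<string> items)
    {
        return items.Where(s => EndsWithPDigits.IsMatch(s)).ToList();
    }
}
```

With the list from the question, this should keep REGISTER_OPTION_P2, REGISTER_OPTION_P27 and REGISTER_OPTION_P254, and reject REGISTER_OPTION_NOFW and POWER_OPTION_P45JW.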
So the regular expression that will catch that is
P(\d+$)
\d stands for a digit, + means one or more, $ is the end of the string, and () specifies that the text should be captured. In C#, Regex.Matches returns all matches (the counterpart of findAll in other languages).
One tool that is really helpful for me (because I'm not great at regex) is
https://www.autoregex.xyz/
Use the String.Replace() function
"REGISTER_OPTION_P42".Replace("REGISTER_OPTION_P", string.Empty) // returns "42"
or use the String.Substring() function
"REGISTER_OPTION_P42".Substring(17) // returns "42"
and then use .All( (c)=>char.IsDigit(c) ) to check that all remaining characters are digits.
sample code
static void Main(string[] args)
{
var list = new List<string>(new string[] { "REGISTER_OPTION_P23", "REGISTER_OPTION_P823", "REGISTER_OPTION_P1Q6", "REGISTER_OPTION_P5" });
var filtered = list.Where((s) => s.Replace("REGISTER_OPTION_P", string.Empty).All((c)=>char.IsDigit(c))).ToList();
foreach (var item in filtered)
{
Console.WriteLine(item);
}
//REGISTER_OPTION_P23
//REGISTER_OPTION_P823
//REGISTER_OPTION_P5
}
New to using C# Regex, I am trying to capture two comma separated integers from a string into two variables.
Example: 13,567
I tried variations on
Regex regex = new Regex(@"(\d+),(\d+)");
var matches = regex.Matches("12,345");
foreach (var itemMatch in matches)
Debug.Print(itemMatch.Value);
This captures just one value, which is the entire string. I worked around this by changing the capture pattern to "(\d+)", but that ignores the middle comma entirely, so I would get a match even if there were other text between the integers.
How do I get it to extract both integers and also ensure there is a comma between them?
Can do this with String.Split
Why not just use a split and parse?
var results = "123,456".Split(',').Select(int.Parse).ToArray();
var left = results[0];
var right = results[1];
Alternatively, you can use a loop with int.TryParse to handle failures, but for what you're looking for this should cover it.
If you're really committed to a Regex
You can do this with a Regex too, just need to use groups of the match
Regex r = new Regex(@"(\d+)\,(\d+)", RegexOptions.Compiled);
var r1 = r.Match("123,456");
//first is total match
Console.WriteLine(r1.Groups[0].Value);
//Then first and second groups
var left = int.Parse(r1.Groups[1].Value);
var right = int.Parse(r1.Groups[2].Value);
Console.WriteLine("Left "+ left);
Console.WriteLine("Right "+right);
Made a dotnetfiddle you can test the solution in as well
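If the input must be exactly two integers separated by a comma, the group-based approach can be combined with anchors and int.TryParse. The following is a sketch under that assumption; PairParser and TryParsePair are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PairParser
{
    // \A and \z anchor the pattern to the whole input, so text before, between,
    // or after the two numbers (other than the single comma) makes the match fail.
    private static readonly Regex PairPattern =
        new Regex(@"\A(?<left>[0-9]+),(?<right>[0-9]+)\z");

    // Hypothetical helper: returns false instead of throwing on malformed input
    // or on numbers too large for an int.
    public static bool TryParsePair(string input, out int left, out int right)
    {
        left = 0;
        right = 0;
        Match m = PairPattern.Match(input);
        return m.Success
            && int.TryParse(m.Groups["left"].Value, out left)
            && int.TryParse(m.Groups["right"].Value, out right);
    }
}
```

For "12,345" this should return true with left = 12 and right = 345, while "12 and 345" or "12,abc" should return false.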
With Regex, you can use this:
Regex regex = new Regex(@"\d+(?=,)|(?<=,)\d+");
var matches = regex.Matches("12,345");
foreach (Match itemMatch in matches)
Console.WriteLine(itemMatch.Value);
prints:
12
345
This uses a look-ahead and a look-behind on the comma:
\d+(?=,) <---- // Match numbers followed by a ,
| <---- // OR
(?<=,)\d+ <---- // Match numbers preceded by a ,
Up front, here is the code and data to help visualize the problem I am facing:
This is the text that needs to be split.
:20:0444453880181732
:21:0444453880131350
:22:CANCEL/ABCDEF0131835055
:23:BUY/CALL/E/EUR
:82A:ABCDEFZZ80A
:87A:4444655604
:30:061123
:31G:070416/1000/USNY
:31E:070418
:26F:PRINCIPAL
:32B:EUR1000000,00
:36:1,31000000
:33B:USD1310000,00
:37K:PCT1,60000000
:34P:061127USD16000,00
:57A:ABCDEFZZ80A
This is my Regex
Regex r = new Regex(@"\:\d{2}\w*\:", RegexOptions.Multiline);
MatchCollection matches = r.Matches(Content);
string[] items = r.Split(Content);
// ----- Fix for first entry being empty string.
int index = items[0] == string.Empty ? 1 : 0;
foreach (Match match in matches)
{
MessageField field = new MessageField();
field.FieldIdExtended = match.Value;
field.Content = items[index];
Fields.Add(field);
index++;
}
As you can see from the comments the problem occurs with the splitting of the string.
It returns as first item an empty string.
Is there any elegant way to solve this?
Thanks, Dimi
The reason you are getting this behaviour is that the first delimiter found by the split has nothing before it, and thus the first entry is blank.
The way to solve this properly is probably to capture the value that you want in the regular expression and then just get it from your match set.
At a rough first guess you probably want something like:
Regex r = new Regex(@"^:(?<id>\d{2}\w*):(?<content>.*)$", RegexOptions.Multiline);
MatchCollection matches = r.Matches(Content);
foreach (Match match in matches)
{
MessageField field = new MessageField();
field.FieldIdExtended = match.Groups["id"].ToString();
field.Content = match.Groups["content"].ToString();
Fields.Add(field);
}
The use of named capture groups makes it easy to extract stuff. You may need to tweak the regex to be more as you want. Currently it gets 20 as id and 0444453880181732 as content. I wasn't 100% clear on what you needed to capture but you look ok with regex so I assume that isn't a problem. :)
Essentially here you are not really trying to split stuff but match stuff and pull it out.
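A self-contained sketch of this match-instead-of-split idea is shown below. FieldParser and Parse are hypothetical names, and key/value pairs stand in for the MessageField class from the question. It also uses [^\r\n]* rather than .* for the content, on the assumption that the text may have Windows line endings: in .NET, . matches \r, so .*$ would otherwise leave a trailing \r in each content value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // ^ with RegexOptions.Multiline matches at the start of every line.
    private static readonly Regex FieldPattern =
        new Regex(@"^:(?<id>\d{2}\w*):(?<content>[^\r\n]*)", RegexOptions.Multiline);

    // Hypothetical helper: returns (id, content) pairs in document order.
    public static List<KeyValuePair<string, string>> Parse(string content)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match m in FieldPattern.Matches(content))
        {
            fields.Add(new KeyValuePair<string, string>(
                m.Groups["id"].Value,
                m.Groups["content"].Value));
        }
        return fields;
    }
}
```

For the two lines ":20:0444453880181732" and ":82A:ABCDEFZZ80A", this should yield the pairs (20, 0444453880181732) and (82A, ABCDEFZZ80A), with no empty first entry to skip.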
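Regex.Split has no overload that takes StringSplitOptions (that parameter belongs to String.Split), so filter out the empty entries yourself, for example with LINQ (needs using System.Linq):
string[] items = r.Split(Content).Where(s => s.Length > 0).ToArray();
Be aware that this also drops any field whose content is genuinely empty, which would shift the indexes relative to the matches.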
In my current project I have to work a lot with substrings, and I'm wondering if there is an easier way to extract numbers from a string.
Example:
I have a string like this:
12 text text 7 text
I want to be able to get the first or the second set of digits.
So if I ask for number set 1 I will get 12 in return and if I ask for number set 2 I will get 7 in return.
Thanks!
This will create an array of integers from the string:
using System.Linq;
using System.Text.RegularExpressions;
class Program {
static void Main() {
string text = "12 text text 7 text";
int[] numbers = (from Match m in Regex.Matches(text, @"\d+") select int.Parse(m.Value)).ToArray();
}
}
Try using regular expressions. The pattern [0-9]+ matches any run of digits within your string. The C# code to use this regex is roughly as follows:
Match match = Regex.Match(input, "[0-9]+");
// Here we check the Match instance.
if (match.Success)
{
// The pattern has no capture groups, so read the whole match
// (match.Groups[1] would be empty). Use match.NextMatch() for later numbers.
string value = match.Value;
}
You will of course still have to parse the returned strings.
Looks like a good match for Regex.
The basic regular expression would be \d+ to match on (one or more digits).
You would iterate through the Matches collection returned from Regex.Matches and parse each returned match in turn.
var matches = Regex.Matches(input, @"\d+");
foreach (Match match in matches)
{
myIntList.Add(int.Parse(match.Value));
}
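You could use a regex. Note that, anchored with ^ and $, this pattern only matches when the entire string consists of digits, so apply it to individual tokens rather than to the whole sentence:
Regex regex = new Regex(@"^[0-9]+$");
You can split the string into parts using string.Split, and then traverse the parts with a foreach applying int.TryParse, something like this: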
string test = "12 text text 7 text";
var numbers = new List<int>();
int i;
foreach (string s in test.Split(' '))
{
if (int.TryParse(s, out i)) numbers.Add(i);
}
Now numbers holds the list of valid values.
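To answer the "give me number set 1 or 2" part directly, the regex approach can be wrapped in a small lookup. This is a sketch only: NumberSets and GetNumberSet are invented names, the index is assumed to be 1-based as in the question, and null is returned when the requested set does not exist or does not fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberSets
{
    // Hypothetical helper: returns the n-th run of digits in the text (1-based),
    // or null if there is no such run.
    public static int? GetNumberSet(string text, int oneBasedIndex)
    {
        MatchCollection matches = Regex.Matches(text, @"[0-9]+");
        if (oneBasedIndex < 1 || oneBasedIndex > matches.Count)
            return null;

        // TryParse guards against digit runs too long for an int.
        return int.TryParse(matches[oneBasedIndex - 1].Value, out int value)
            ? value
            : (int?)null;
    }
}
```

For "12 text text 7 text", asking for set 1 should give 12, set 2 should give 7, and set 3 should give null.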
I'm developing a simple little search mechanism and I want to allow the user to search for chunks of text with spaces. For example, a user can search for the name of a person:
Name: John Smith
I then "John Smith".Split(' ') into an array of two elements, {"John","Smith"}. I then return all of the records that match "John" AND "Smith" first followed by records that match either "John" OR "Smith." I then return no records for no matches. This isn't a complicated scenario and I have this part working.
I'd now like to be able to allow the user to ONLY return records that match "John Smith"
I'd like to use a basic quote syntax for searching. So if a user wants to search for "John Smith" OR Pocahontas they would enter: "John Smith" Pocahontas. The order of terms is absolutely irrelevant; "John Smith" does not receive priority over Pocahontas because he comes first in the list.
I have two main trains of thought on how I should parse the input.
A) Using regular expression then parsing stuff (IndexOf, Split)
B) Using only the parsing methods
I think a logical point of action would be to find the stuff in quotes; then remove it from the original string and insert it into a separate list. Then all the stuff left over from the original string could be split on the space and inserted into that separate list. If there is either 1 quote or an odd number, it is simply removed from the list.
How do I find the matches with a regex? I know about Regex.Replace, but how would I iterate through the matches and insert them into a list? I know there is some neat way to do this using the MatchEvaluator delegate and LINQ, but I know basically nothing about regex in C#.
EDIT: Came back to this tab without refreshing and didn't realize this question was already answered... the accepted answer is better.
I think pulling out the stuff in quotes first with regex is a good idea. Maybe something like this:
String sampleInput = "\"John Smith\" Pocahontas Bambi \"Jane Doe\" Aladin";
//Create regex pattern
Regex regex = new Regex("\"([^\"]+)\""); // a quote, one or more non-quote characters, a quote
List<string> searches = new List<string>();
//Loop through all matches from regex
foreach (Match match in regex.Matches(sampleInput))
{
//add the value of Groups[1] to the list
//(Groups[0] is the entire match, including the quotes)
//(Groups[1] is the first parenthesized group in the regex pattern,
// which in this case is the text inside the quotes)
searches.Add(match.Groups[1].Value);
}
//remove the matches from the input
sampleInput = regex.Replace(sampleInput, String.Empty);
//split the remaining input and add the result to our searches list
searches.AddRange(sampleInput.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries));
I needed the same functionality as Shawn, but I didn't want to use regex. Here is a simple solution that uses Split() instead of regex, for anyone else needing this functionality.
This works because the Split method, by default, creates empty entries in the array for consecutive delimiters in the source string. If we split on the quote character, the result is an array in which the even-indexed entries contain individual words and the odd-indexed entries are the quoted phrases.
Example:
"John Smith" Pocahontas
Results in
item(0) = (empty string)
item(1) = John Smith
item(2) = Pocahontas
And
1 2 "3 4" 5 "6 7" "8 9"
Results in (surrounding whitespace not shown)
item(0) = 1 2
item(1) = 3 4
item(2) = 5
item(3) = 6 7
item(4) = (a single space)
item(5) = 8 9
item(6) = (empty string)
Note that an unmatched quote will result in a phrase from the last quote to the end of the input string.
public static List<string> QueryToTerms(string query)
{
List<string> Result = new List<string>();
// split on the quote token
string[] QuoteTerms = query.Split('"');
// switch to denote if the current loop is processing words or a phrase
bool WordTerms = true;
foreach (string Item in QuoteTerms)
{
if (!string.IsNullOrWhiteSpace(Item))
{
if (WordTerms)
{
// Item contains words. Parse them and ignore empty entries.
string[] WTerms = Item.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string WTerm in WTerms)
Result.Add(WTerm);
}
else
{
// Item is a phrase.
Result.Add(Item);
}
}
// Alternate between words and phrases, even when the current item was skipped.
WordTerms = !WordTerms;
}
return Result;
}
Use a regex like this:
string input = "\"John Smith\" Pocahontas";
Regex rx = new Regex(@"(?<="")[^""]+(?="")|[^\s""]\S*");
for (Match match = rx.Match(input); match.Success; match = match.NextMatch()) {
// use match.Value here, it contains the string to be searched
}
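An alternative sketch that collects the terms into a list in one pass is shown below. It swaps the look-arounds for two named groups, one for quoted phrases and one for bare words. QueryParser and ToTerms are hypothetical names, and the handling of an unmatched quote (the quote is dropped and the following words are treated individually) is an assumption based on the behaviour described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryParser
{
    // Either a quoted phrase (group "phrase", quotes excluded) or a run of
    // characters that are neither whitespace nor quotes (group "word").
    private static readonly Regex TermPattern =
        new Regex(@"""(?<phrase>[^""]+)""|(?<word>[^\s""]+)");

    // Hypothetical helper: returns phrases and words in the order they appear.
    public static List<string> ToTerms(string query)
    {
        var terms = new List<string>();
        foreach (Match m in TermPattern.Matches(query))
        {
            Group phrase = m.Groups["phrase"];
            terms.Add(phrase.Success ? phrase.Value : m.Groups["word"].Value);
        }
        return terms;
    }
}
```

For the input "John Smith" Pocahontas (with the quotes), this should return two terms: John Smith and Pocahontas.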