Regexp find position of different characters in string - c#

I have a string conforming to the following pattern:
(cc)-(nr).(nr)M(nr)(cc)whitespace(nr)
where cc is an arbitrary number of letter characters, nr is an arbitrary number of numeric characters, and M is the literal letter M.
For example:
ASF-1.15M437979CA 100000
EU-12.15M121515PO 1145
I need to find the positions of -, . and M within the string. The problem is that the leading and trailing characters can contain the letter M as well, but I need only the one in the middle.
As an alternative, extracting the leading characters (up to -) and the first two numbers (as in (nr).(nr)M...) would be enough.

If you need a regex-based solution, you just need to use 3 capturing groups around the required patterns, and then access the Groups[n].Index property:
var rxt = new Regex(@"\p{L}*(-)\d+(\.)\d+(M)\d+\p{L}*\s*\d+");
// Collect matches
var matches = rxt.Matches(@"ASF-1.15M437979CA 100000 or EU-12.15M121515PO 1145");
// Now, we can get the indices
var posOfHyphen = matches.Cast<Match>().Select(p => p.Groups[1].Index);
var posOfDot = matches.Cast<Match>().Select(p => p.Groups[2].Index);
var posOfM = matches.Cast<Match>().Select(p => p.Groups[3].Index);
Output:
posOfHyphen => [3, 30]
posOfDot => [5, 33]
posOfM => [8, 36]

Regex:
string pattern = @"[A-Z]+(-)\d+(\.)\d+(M)\d+[A-Z]+";
string value = "ASF-1.15M437979CA 100000 or EU-12.15M121515PO 1145";
var match = Regex.Match(value, pattern);
if (match.Success)
{
    int sep1 = match.Groups[1].Index; // position of -
    int sep2 = match.Groups[2].Index; // position of .
    int sep3 = match.Groups[3].Index; // position of the middle M
}
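For the alternative mentioned in the question (only the leading letters and the first two numbers), named groups may be simpler than working with positions. The following is a minimal sketch, not taken from the answers above: the helper name ParseCode and the group names prefix, first and second are made up for illustration, and the pattern assumes the code sits at the start of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers): extracts the leading
// letters and the two numbers around the dot. Anchoring with ^ and requiring
// digits on both sides of M means an M in the prefix or suffix is never
// mistaken for the separator.
static (string Prefix, string First, string Second)? ParseCode(string input)
{
    var m = Regex.Match(input, @"^(?<prefix>\p{L}+)-(?<first>\d+)\.(?<second>\d+)M\d+");
    if (!m.Success)
        return null;
    return (m.Groups["prefix"].Value, m.Groups["first"].Value, m.Groups["second"].Value);
}

var parsed = ParseCode("MEM-12.15M121515PM 1145");
if (parsed.HasValue)
    Console.WriteLine($"{parsed.Value.Prefix} {parsed.Value.First} {parsed.Value.Second}"); // MEM 12 15
```

The numbers are kept as strings here so that very long digit runs cannot overflow; convert them with int.TryParse or long.TryParse if numeric values are needed.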

Related

Splitting a string at first number and then returning 2 strings

Having some trouble adapting my splitting of a string into 2 parts to do it from the first number. It's currently splitting on the first space, but that won't work long term because cities have spaces in them too.
Current code:
var parameters = "Chicago 1234 Anytown, NY";
var commands = parameters.Split(new[] { ' ' }, 2);
var originCity = commands[0];
var destination = commands[1];
This works great for a city that has a single name, but it breaks on:
var parameters = "Los Angeles 1234 Anytown, NY";
I've tried several different approaches that I just haven't been able to work out. Any ideas on being able to return 2 strings as the following:
originCity = Los Angeles
destination = 1234 Anytown, NY
You can't use .Split() for this.
Instead, you need to find the index of the first digit. You can use IndexOfAny() with an array of digit characters (a char[]) to do this. It returns -1 when the string contains no digit, so check for that before calling Substring.
int numberIndex = line.IndexOfAny("0123456789".ToCharArray());
You can then take two substrings: one before the index, the other from the index onward.
string before = line.Substring(0, numberIndex);
string after = line.Substring(numberIndex);
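Putting the IndexOfAny approach together, a minimal sketch might look like the following. The helper name SplitAtFirstDigit is hypothetical, and returning an empty destination when no digit is found is an assumption about what the caller wants.

```csharp
using System;

// Hypothetical helper: splits a line at the first digit.
// If the line has no digit, the whole line is treated as the origin.
static (string Origin, string Destination) SplitAtFirstDigit(string line)
{
    int i = line.IndexOfAny("0123456789".ToCharArray());
    if (i < 0)
        return (line.Trim(), string.Empty);
    // Trim removes the space that separated the city from the number.
    return (line.Substring(0, i).Trim(), line.Substring(i).Trim());
}

var (origin, destination) = SplitAtFirstDigit("Los Angeles 1234 Anytown, NY");
Console.WriteLine(origin);      // Los Angeles
Console.WriteLine(destination); // 1234 Anytown, NY
```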
You could use Regex. In the following, match is the first match in the regex results.
var match = Regex.Match(s, "[0-9]");
if (match.Success)
{
    int index = match.Index;
    originCity = s.Substring(0, index);
    destination = s.Substring(index, s.Length - index);
}
Or you can do it yourself:
int index = -1;
for (int i = 0; i < s.Length; i++)
{
    // char.IsDigit tests the character; the loop counter gives its position
    if (char.IsDigit(s[i]))
    {
        index = i;
        break;
    }
}
...
You should see if using a regular expression will do what you need here. At least with the sample data you're showing, the expression:
(\D+)(\d+)(\D+)
would group the results into non-numeric characters up to the first numeric character, the numeric characters until a non-numeric is encountered, and then the rest of the non-numeric characters. Here is how it would be used in code:
var pattern = @"(\D+)(\d+)(\D+)";
var input = "Los Angeles 1234 Anytown, NY";
var result = Regex.Match(input, pattern);
var city = result.Groups[1].Value.Trim();
var destination = result.Groups[2].Value + result.Groups[3].Value;
This falls apart for cases like "29 Palms, California", or if the numbers contain commas, decimal points, etc., so it is certainly not a silver bullet. I don't know your data, though, and such a simple solution may be good enough.

How to get two numerical values from a string in C#

I have a string like this :
X LIMITED COMPANY (52100000/58447000)
I want to extract X LIMITED COMPANY, 52100000 and 58447000 separately.
I'm extracting X LIMITED COMPANY like this:
companyName = Regex.Match(mystring4, @"[a-zA-Z\s]+").Value.Trim();
But I'm stuck on extracting the numbers; they can be 1, 2, or large numbers as in the example. Can you show me how to extract them? Thanks.
Try regular expressions with alternative | (or):
Either word symbols (but not digits): ([\w-[\d]][\w\s-[\d]]+)
Or digits only: ([0-9]+)
E.g.
string mystring4 = @"AKASYA CAM SANAYİ VE TİCARET LİMİTED ŞİRKETİ(52100000 / 58447000)";
string[] values = Regex
    .Matches(mystring4, @"([\w-[\d]][\w\s-[\d]]+)|([0-9]+)")
    .OfType<Match>()
    .Select(match => match.Value.Trim())
    .ToArray();
Test
// AKASYA CAM SANAYİ VE TİCARET LİMİTED ŞİRKETİ
// 52100000
// 58447000
Console.Write(string.Join(Environment.NewLine, values));
I suggested changing the initial pattern [a-zA-Z\s]+ into [a-zA-Z][a-zA-Z\s]+ in order to skip matches which contain separators only (e.g. " ")
Try using named groups:
var s = "X LIMITED COMPANY (52100000 / 58447000)";
var regex = new Regex(@"(?<CompanyName>[^\(]+)\((?<Num1>\d+)\s*/\s*(?<Num2>\d+)\)");
var match = regex.Match(s);
var companyName = match.Groups["CompanyName"];
If the format is fixed, you could try this:
var regex = new Regex(@"^(?<name>[^\(]+)\((?<n1>\d+)/(?<n2>\d+)\)");
var match = regex.Match(input);
var companyName = match.Groups["name"].Value;
var number1 = Convert.ToInt64(match.Groups["n1"].Value);
var number2 = Convert.ToInt64(match.Groups["n2"].Value);
This matches everything up to the opening parenthesis and puts it into a named group "name". Then it matches two numbers within the parentheses, separated by "/", and puts them into groups named "n1" and "n2" respectively.
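The snippets above read the groups without checking whether the match succeeded, and the fixed-format pattern does not allow spaces around the slash. A slightly more defensive sketch, under the assumption that optional whitespace should be tolerated, could be:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "X LIMITED COMPANY (52100000 / 58447000)";

// The lazy name group plus \s* keeps the trailing space out of the name;
// \s*/\s* tolerates optional spaces around the slash.
var match = Regex.Match(input, @"^(?<name>[^(]+?)\s*\((?<n1>\d+)\s*/\s*(?<n2>\d+)\)");

string companyName = "";
long number1 = 0, number2 = 0;

// TryParse avoids an exception if a digit run is too long for a long.
bool ok = match.Success
    && long.TryParse(match.Groups["n1"].Value, out number1)
    && long.TryParse(match.Groups["n2"].Value, out number2);

if (ok)
{
    companyName = match.Groups["name"].Value;
    Console.WriteLine($"{companyName}: {number1}, {number2}");
    // X LIMITED COMPANY: 52100000, 58447000
}
```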

regex to strip number from var in string

I have a long string and I have a var inside it
var abc = '123456'
Now I wish to get the 123456 from it.
I have tried a regex, but it's not working properly:
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time
var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456
Your regex is not correct:
(?<abc>+)=(?<var>+)
The + signs are quantifiers, meaning that the preceding element is repeated at least once. Here there is no preceding element, because (?<...>...) is a named capture group and its opening is not itself a character that can be repeated.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
From the example you provided, the number can be extracted like this:
Console.WriteLine(
    Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means one or more digits.
But I surmise your data doesn't look like your example.
Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+");
Match m = regex.Match(body);
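The lookbehind version stops before showing how the value is read. As a sketch, and assuming the variable of interest is abc and that the whitespace around = may vary, the result can be taken from m.Value. This relies on .NET allowing variable-length patterns inside a lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

var body = "my word 1, my word 2, my word var abc = '123456' 3, my word x";

// The lookbehind asserts the text before the digits without consuming it,
// so the match value is only the number itself.
var m = Regex.Match(body, @"(?<=\bvar\s+abc\s*=\s*')\d+");

string number = m.Success ? m.Value : "";
Console.WriteLine(number); // 123456
```

The earlier digits in the text ("1" and "2") are skipped because the lookbehind fails at those positions.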

Using Regular Expressions to extract groups of numbers from a string

I need to convert a string like,
"[1,2,3,4][5,6,7,8]"
into groups of integers, adjusted to be zero based rather than one based:
{0,1,2,3} {4,5,6,7}
The following rules also apply:
The string must contain at least 1 group of numbers with enclosing square brackets.
Each group must contain at least 2 numbers.
Every number must be unique (not something I'm attempting to achieve with the regex).
0 is not valid, but 10, 100 etc are.
Since I'm not that experienced with regular expressions, I'm currently using two:
@"^(?:\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\])+$";
and
@"\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\]";
I'm using the first one to check the input and the second to get all matches of a set of numbers inside square brackets.
I'm then using .Net string manipulation to trim off the square brackets and extract the numbers, parsing them and subtracting 1 to get the result I need.
I was wondering if I could get at the numbers better by using captures, but not sure how they work.
Final Solution:
In the end I used the following regular expression to validate the input string
@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$"
agent-j's pattern is fine for capturing the information needed but also matches a string like "[1,2,3,4][5]" and would require me to do some additional filtering of the results.
I access the captures via the named group 'set' and use a second simple regex to extract the numbers.
The '[1-9]\d{0,7}' simplifies parsing ints by limiting numbers to 99,999,999 and avoiding overflow exceptions.
MatchCollection matches = new Regex(@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$").Matches(inputText);
if (matches.Count != 1) return;
CaptureCollection captures = matches[0].Groups["set"].Captures;
var resultJArray = new int[captures.Count][];
var numbersRegex = new Regex(@"\d+");
for (int captureIndex = 0; captureIndex < captures.Count; captureIndex++)
{
    string capture = captures[captureIndex].Value;
    MatchCollection numberMatches = numbersRegex.Matches(capture);
    resultJArray[captureIndex] = new int[numberMatches.Count];
    for (int numberMatchIndex = 0; numberMatchIndex < numberMatches.Count; numberMatchIndex++)
    {
        string number = numberMatches[numberMatchIndex].Value;
        int numberAdjustedToZeroBase = Int32.Parse(number) - 1;
        resultJArray[captureIndex][numberMatchIndex] = numberAdjustedToZeroBase;
    }
}
string input = "[1,2,3,4][5,6,7,8][534,63433,73434,8343434]";
string pattern = @"\G(?:\[(?:(\d+)(?:,|(?=\]))){2,}\])";//\])+$";
MatchCollection matches = Regex.Matches(input, pattern);
To start out, any (regex) in plain parentheses is a capturing group. This means that the regex engine will capture (store the positions matched by) that group. To avoid this when you don't need it, use (?:regex). I did that above.
Index 0 is special: it refers to the whole match. That is, match.Groups[0].Value is always the same as match.Value and match.Groups[0].Captures[0].Value, so you can consider the Groups collection to start at index 1. The Captures collection of each group, however, starts at index 0.
As you can see below, each match contains one bracketed group of digits. You'll want to use all the captures from Group 1 of each match.
foreach (Match match in matches)
{
    // e.g. [1,2]
    // use every capture of the first group
    for (int i = 0; i < match.Groups[1].Captures.Count; i++)
    {
        int number = int.Parse(match.Groups[1].Captures[i].Value);
        if (number == 0)
            throw new Exception("Cannot be 0.");
    }
}
Match[0] => [1,2,3,4]
  Group[0] => [1,2,3,4]
    Capture[0] => [1,2,3,4]
  Group[1] => 4
    Capture[0] => 1
    Capture[1] => 2
    Capture[2] => 3
    Capture[3] => 4
Match[1] => [5,6,7,8]
  Group[0] => [5,6,7,8]
    Capture[0] => [5,6,7,8]
  Group[1] => 8
    Capture[0] => 5
    Capture[1] => 6
    Capture[2] => 7
    Capture[3] => 8
Match[2] => [534,63433,73434,8343434]
  Group[0] => [534,63433,73434,8343434]
    Capture[0] => [534,63433,73434,8343434]
  Group[1] => 8343434
    Capture[0] => 534
    Capture[1] => 63433
    Capture[2] => 73434
    Capture[3] => 8343434
The \G anchor requires each match to begin exactly where the previous match ended (so in [1,2] [3,4] the second group won't match, because of the space). The {2,} satisfies your requirement that there be at least 2 numbers per group.
The expression will match even if there is a 0. I suggest that you put that validation in with the other non-regex stuff. It will keep the regex simpler.
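To tie the captures back to the original goal (zero-based int arrays), here is a sketch that combines the \G pattern with the [1-9]\d{0,7} number rule from the question's final solution. The helper name ParseGroups, the choice of FormatException, and the length check used to reject trailing garbage are illustrative assumptions, not part of either answer. Uniqueness of the numbers is not checked.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: each match is one bracketed set, and the captures of
// group 1 are the numbers inside it. Because \G forces every match to start
// where the previous one ended, the matches cover the whole input only if
// their lengths add up to the input length.
static int[][] ParseGroups(string input)
{
    var matches = Regex.Matches(input, @"\G\[(?:([1-9]\d{0,7})(?:,(?=\d)|(?=\]))){2,}\]");
    int consumed = matches.Cast<Match>().Sum(m => m.Length);
    if (matches.Count == 0 || consumed != input.Length)
        throw new FormatException("Input is not a sequence of bracketed number groups.");

    return matches.Cast<Match>()
        .Select(m => m.Groups[1].Captures.Cast<Capture>()
            .Select(c => int.Parse(c.Value) - 1) // adjust to zero-based
            .ToArray())
        .ToArray();
}

var groups = ParseGroups("[1,2,3,4][5,6,7,8]");
foreach (var g in groups)
    Console.WriteLine("{" + string.Join(",", g) + "}");
// {0,1,2,3}
// {4,5,6,7}
```

An input such as "[1,2,3,4][5]" is rejected because the second set has fewer than two numbers, so the matches stop short of the end of the string.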
The following regex will validate the input and also capture each number inside the bracketed [] groups:
(?:([1-9][0-9]*)\,?){2,}
[1][5] - fail
[1] - fail
[] - fail
[a,b,c][5] - fail
[1,2,3,4] - pass
[1,2,3,4,5,6,7,8][5,6,7,8] - pass
[1,2,3,4][5,6,7,8][534,63433,73434,8343434] - pass
What about \d+ and a global flag?

Can I use regex to find the index of X?

I have a big string and want to find the first occurrence of X, where X is "numberXnumber", e.g. 3X3 or 4X9.
How could I do this in C#?
var s = "long string.....24X10 .....1X3";
var match = Regex.Match(s, @"\d+X\d+");
if (match.Success) {
    Console.WriteLine(match.Index); // 16
    Console.WriteLine(match.Value); // 24X10;
}
Also take a look at NextMatch, which is a handy function:
match = match.NextMatch();
match.Value; // 1X3;
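If every occurrence is needed rather than just the first, NextMatch can drive a loop. A minimal sketch follows; the list name found is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var s = "long string.....24X10 .....1X3";
var found = new List<(int Index, string Value)>();

// NextMatch returns a Match whose Success is false once no further
// occurrence exists, which ends the loop.
for (var m = Regex.Match(s, @"\d+X\d+"); m.Success; m = m.NextMatch())
    found.Add((m.Index, m.Value));

foreach (var (index, value) in found)
    Console.WriteLine($"{value} at {index}");
// 24X10 at 16
// 1X3 at 27
```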
For those who love extension methods:
// must be declared inside a static class
public static int RegexIndexOf(this string str, string pattern)
{
    var m = Regex.Match(str, pattern);
    return m.Success ? m.Index : -1;
}
Yes, regex could do that for you.
You could use ([0-9]+)X([0-9]+). If you know that the numbers are only single digits, you could use [0-9]X[0-9].
This may help you:
string myText = "33x99 lorem ipsum 004x44";
//the first matched group index
int firstIndex = Regex.Match(myText,"([0-9]+)(x)([0-9]+)").Index;
//first matched "x" (group = 2) index
int firstXIndex = Regex.Match(myText,"([0-9]+)(x)([0-9]+)").Groups[2].Index;
var index = new Regex("yourPattern").Match("X").Index;
http://www.regular-expressions.info/download/csharpregexdemo.zip
You can use this pattern:
\d([xX])\d
If I test
blaat3X3test
I get:
Match offset: 5
Match length: 3
Matched text: 3X3
Group 1 offset: 6
Group 1 length: 1
Group 1 text: X
Do you want the number, or the index of the number? You can get both of these, but you're probably going to want to take a look at System.Text.RegularExpressions.Regex
The actual pattern is going to be [0-9]x[0-9] if you want only single digits (in 89x72 it will match only 9x7), or [0-9]+x[0-9]+ to match the longest consecutive run of digits in both directions.
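The difference between the two patterns can be seen in a short sketch. The IgnoreCase variant at the end is an addition for inputs that use an uppercase X, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Single digit on each side: only the digits adjacent to x are matched.
var single = Regex.Match("89x72", "[0-9]x[0-9]");
Console.WriteLine($"{single.Value} at index {single.Index}"); // 9x7 at index 1

// One or more digits on each side: the whole number pair is matched.
var greedy = Regex.Match("89x72", "[0-9]+x[0-9]+");
Console.WriteLine($"{greedy.Value} at index {greedy.Index}"); // 89x72 at index 0

// RegexOptions.IgnoreCase lets the same pattern match both x and X.
var anyCase = Regex.Match("blaat3X3test", "[0-9]+x[0-9]+", RegexOptions.IgnoreCase);
Console.WriteLine($"{anyCase.Value} at index {anyCase.Index}"); // 3X3 at index 5
```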
