Regex: Parse specific string with one 18-digit number - c#

C#/.NET 4.0
I need to parse a string containing an 18-digit number. I also need the substrings to the left and right of it.
Example strings:
string a = "Frl Camp Gerbesklooster 871687120000000691 OPLDN 2010 H1";
string b = "some text with spaces 123456789012345678 more text";
How it should be parsed:
string aParsed[0] = "Frl Camp Gerbesklooster";
string aParsed[1] = "871687120000000691";
string aParsed[2] = "OPLDN 2010 H1";
string bParsed[0] = "some text with spaces";
string bParsed[1] = "123456789012345678";
string bParsed[2] = "more text";
There is always exactly one 18-digit number in the middle of the string. I'm an absolute newbie to regex, so I don't have an attempt of my own to show.
What is the best way to do this? Should I use regular expressions?
Thanks.

You can use something like the regex: (.*)(\d{18})(.*).
The key here is to use {18} to specify that there must be exactly 18 digits and to capture each part in a group.
var parts = Regex.Matches(s, @"(.*)(\d{18})(.*)")
    .Cast<Match>()
    .SelectMany(m => m.Groups.Cast<Group>().Skip(1).Select(g => g.Value))
    .ToArray();
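With the pattern above, the spaces next to the number stay inside the first and third groups. The following is a minimal sketch of one way to get the three parts exactly as listed in the question, assuming a single 18-digit run per string. The class and method names (EighteenDigitDemo, SplitAroundNumber) are hypothetical, and the lookarounds and Trim() calls are additions that are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class EighteenDigitDemo
{
    // Hypothetical helper: returns { left, number, right },
    // or null when the input holds no standalone 18-digit run.
    public static string[] SplitAroundNumber(string input)
    {
        // (?<!\d) and (?!\d) stop the match from landing inside a longer run of digits.
        Match m = Regex.Match(input, @"^(.*?)(?<!\d)(\d{18})(?!\d)(.*)$");
        if (!m.Success)
        {
            return null;
        }
        // Trim() drops the spaces that separate the number from the surrounding text.
        return new[]
        {
            m.Groups[1].Value.Trim(),
            m.Groups[2].Value,
            m.Groups[3].Value.Trim()
        };
    }

    static void Main()
    {
        string[] parts = SplitAroundNumber("Frl Camp Gerbesklooster 871687120000000691 OPLDN 2010 H1");
        Console.WriteLine(string.Join("|", parts));
        // prints: Frl Camp Gerbesklooster|871687120000000691|OPLDN 2010 H1
    }
}
```

Returning null for a non-matching input is just one possible convention; throwing an exception would work equally well if a missing number is considered an error.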

Daniël,
Although the question has already been answered, the following may be a useful reference for learning regular expressions.
http://txt2re.com
Regards,
Liam

Related

Find all occurrences of substrings matching a pattern

I would like to use C# to extract from a single string all occurrences of sub-strings with pattern which is: white space followed by any text.
So for example if I have a string “This is a very short sentence” then I want to be able to obtain 5 strings:
“is a very short sentence”
“a very short sentence”
“very short sentence”
“short sentence”
“sentence”
From the example above sub-strings should not include leading white space. Also being able to access each obtained string by index would be great.
I tried to use a regex, but I could not get past the first match.
Please help
Using Split and some Linq:
string text2 = "This is a very short sentence";
// Get all words except first one
var parts = text2.Split(' ').Skip(1);
// Generate various combinations
var result = Enumerable.Range(0, parts.Count())
.Select(i => string.Join(" ", parts.Skip(i)));
Try a loop with the Substring method:
string inputStr = "This is a very short sentence";
List<string> subStringList = new List<string>();
// Repeatedly cut off everything up to and including the next space
while (inputStr.IndexOf(' ') != -1)
{
    inputStr = inputStr.Substring(inputStr.IndexOf(' ') + 1);
    subStringList.Add(inputStr);
}
Console.WriteLine(String.Join("\n", subStringList));
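Since the question mentions getting stuck on the first match with a regex, here is a hedged sketch of a regex-only variant. Regex.Matches returns non-overlapping matches, so the idea is to consume only the whitespace and capture the rest of the string inside a lookahead. The names SuffixDemo and SuffixesAfterWhitespace are hypothetical, and the sketch assumes single-line input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SuffixDemo
{
    // Hypothetical helper: one entry per whitespace run, holding the text that follows it.
    public static List<string> SuffixesAfterWhitespace(string input)
    {
        var result = new List<string>();
        // \s+ consumes only the whitespace. The lookahead captures the remainder
        // without consuming it, so the next match can start inside that remainder.
        foreach (Match m in Regex.Matches(input, @"\s+(?=(\S.*))"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        List<string> parts = SuffixesAfterWhitespace("This is a very short sentence");
        Console.WriteLine(parts.Count); // prints: 5
        Console.WriteLine(parts[4]);    // prints: sentence
    }
}
```

The result is a List<string>, so each substring is available by index, as the question asks.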

Extracting only the substring containing letters from a string containing digits strings and symbols

I have a string that is like the following:
string str = "hello_16_0_2016";
What I want is to extract hello from the string. The string is autogenerated, so the letters can occur anywhere in it and I cannot rely on a fixed position.
For example, I could take the first five characters of the string above and store them in a new variable.
But since the position of the letters is random and I want to extract only the letters, can someone guide me to the correct way to do this?
Could you just use a simple regular expression to pull out only alphabetic characters, assuming you only need a-z?
using System.Text.RegularExpressions;
var str = "hello_16_0_2016";
var onlyLetters = Regex.Replace(str, @"[^a-zA-Z]", "");
// onlyLetters = "hello"
I'd use something like this (uses Linq):
var str = "hello_16_0_2016";
var result = string.Concat(str.Where(char.IsLetter));
Or, if performance is a concern (because you have to do this on a tight loop, or have to convert hundreds of thousands of strings), it'd probably be faster to do:
var result = new string(str.Where(char.IsLetter).ToArray());
But as occurring of letters is random and I want to extract only
letters from the string above, so can someone guide me to the correct
procedure to do this?
The following extracts the first run of letters, wherever it occurs in the string:
Console.WriteLine( Regex.Match("hello_16_0_2016", @"[A-Za-z]+").Value ); // "hello"
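The answers above behave differently once the input contains more than one group of letters: Regex.Replace keeps every letter, while Regex.Match stops after the first run. A small sketch of the difference follows. The input string "16_hello_0_world" and the names LettersDemo, AllLetters and FirstRun are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class LettersDemo
{
    // Removes everything that is not an ASCII letter, so all letter runs are joined.
    public static string AllLetters(string s)
    {
        return Regex.Replace(s, @"[^a-zA-Z]", "");
    }

    // Returns only the first contiguous run of ASCII letters ("" if there is none).
    public static string FirstRun(string s)
    {
        return Regex.Match(s, @"[A-Za-z]+").Value;
    }

    static void Main()
    {
        string input = "16_hello_0_world"; // hypothetical input with two letter runs
        Console.WriteLine(AllLetters(input)); // prints: helloworld
        Console.WriteLine(FirstRun(input));   // prints: hello
    }
}
```

Which one is right depends on whether the autogenerated string can ever contain a second word. Note also that char.IsLetter, used in the Linq answers, accepts letters outside a-z, whereas these two patterns do not.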

How to write regular expression to get the substring from the string using regular expression in c#?

I have following string
string s=@"\Users\Public\Roaming\Intel\Wireless\Settings";
I want output string like
string output="Wireless";
The substring I want comes right after "Intel\" and ends at the first "\" that follows it. The parts of the string before and after Intel may vary.
I have achieved this using string.Substring(), but I want to do it with a regular expression. What regular expression should I write to get that string?
For a regex solution you may use:
(?<=intel\\)([^\\]+?)(?:\\|$)
Note that the pattern needs the case-insensitive (i) flag, because it spells intel in lowercase.
BTW, Split is a much simpler and faster solution than a regex. Regexes are meant for matching patterns in strings; for a static, fixed string structure it is wiser to use plain string functions.
With regex, it will look like
var txt = @"\Users\Public\Roaming\Intel\Wireless\Settings";
var res = Regex.Match(txt, @"Intel\\([^\\]+)", RegexOptions.IgnoreCase).Groups[1].Value;
But usually, you should use string methods with such requirements. Here is a demo code (without error checking):
var strt = txt.IndexOf("Intel\\") + 6; // 6 is the length of "Intel\"
var end = txt.IndexOf("\\", strt + 1); // Look for the next "\"
var res2 = txt.Substring(strt, end - strt); // Get the substring
You could also use this if you want everything after Intel\:
/(?:intel\\)((\w+\\?)+)/gi
http://regexr.com/3blqm
You need the $1 capture. Note that $1 will be empty or nonexistent if the string does not contain Intel\ or anything after it.
Why not use Path.GetDirectoryName and Path.GetFileName for this:
string s = @"\Users\Public\Roaming\Intel\Wireless\Settings";
string output = Path.GetFileName(Path.GetDirectoryName(s));
Debug.Assert(output == "Wireless");
It is possible to iterate over directory components until you find the word Intel and return the next component.
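That iteration could look roughly like the sketch below, which splits the path on backslashes and returns the component after the marker. The names PathComponentDemo and ComponentAfter are hypothetical, and the case-insensitive comparison is an assumption borrowed from the regex answers above.

```csharp
using System;

static class PathComponentDemo
{
    // Hypothetical helper: returns the path component that follows `marker`,
    // or null if the marker is missing or is the last component.
    public static string ComponentAfter(string path, string marker)
    {
        string[] parts = path.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length - 1; i++)
        {
            // Assumption: the marker is matched case-insensitively.
            if (string.Equals(parts[i], marker, StringComparison.OrdinalIgnoreCase))
            {
                return parts[i + 1];
            }
        }
        return null;
    }

    static void Main()
    {
        string s = @"\Users\Public\Roaming\Intel\Wireless\Settings";
        Console.WriteLine(ComponentAfter(s, "Intel")); // prints: Wireless
    }
}
```

Unlike the Path.GetDirectoryName approach, this does not depend on how many components follow Intel.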

Regular expression question (C#)

How do I write a regular expression to match (_Rev. n.nn) in the following filenames (where n is a number):
Filename_Rev. 1.00
Filename_Rev. 1.10
Thanks
The following should work (for the whole line):
@"^Filename_Rev\.\s\d\.\d\d$"
Should capture versions >9
Edit: Fixed
string captureString = "abc123butts_Rev. 1.00";
Regex reg = new Regex(@"(.(?!_Rev))+\w_Rev\. (?<version>\d+\.\d+)");
string version = reg.Match(captureString).Groups["version"].Value;
Building off of @leppie's answer (give him the green check, not me), you can extract the numbers from your regex match by putting parentheses around the \d's.
Regex foo = new Regex(@"_Rev\.\s(\d)\.(\d\d)$");
GroupCollection groups = foo.Match("Filename_Rev. 1.00").Groups;
string majorNum = groups[1].Value;
string minorNum = groups[2].Value;
System.Console.WriteLine(majorNum);
System.Console.WriteLine(minorNum);
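The two ideas above (allowing more than one digit, and capturing the parts) can be combined. The following is a sketch only, assuming the revision always sits at the end of the name. The names RevisionDemo and TryParseRevision and the group names major and minor are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RevisionDemo
{
    // Hypothetical helper: reads "_Rev. n.nn" at the end of a file name.
    public static bool TryParseRevision(string fileName, out int major, out int minor)
    {
        major = 0;
        minor = 0;
        // \d+ instead of \d also accepts major versions above 9.
        Match m = Regex.Match(fileName, @"_Rev\. (?<major>\d+)\.(?<minor>\d+)$");
        if (!m.Success)
        {
            return false;
        }
        return int.TryParse(m.Groups["major"].Value, out major)
            && int.TryParse(m.Groups["minor"].Value, out minor);
    }

    static void Main()
    {
        int major, minor;
        if (TryParseRevision("Filename_Rev. 12.05", out major, out minor))
        {
            Console.WriteLine("{0}.{1}", major, minor); // prints: 12.5
        }
    }
}
```

Parsing to int drops leading zeros in the minor part; keep the group values as strings if "05" has to be preserved.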

C# regular expression to find custom markers and take content

I have a string:
productDescription
In it are some custom tags such as:
[MM][/MM]
For example the string might read:
This product is [MM]1000[/MM] long
Using a regular expression how can I find those MM tags, take the content of them and replace everything with another string? So for example the output should be:
This product is 10 cm long
I think you'll need to pass a delegate to the regex for that.
Regex theRegex = new Regex(@"\[MM\](\d+)\[/MM\]");
text = theRegex.Replace(text, delegate(Match thisMatch)
{
    // Convert the captured millimetre value to centimetres (integer division).
    int mmLength = Convert.ToInt32(thisMatch.Groups[1].Value);
    int cmLength = mmLength / 10;
    return cmLength.ToString() + " cm";
});
Using RegexDesigner.NET:
using System.Text.RegularExpressions;
// Regex Replace code for C#
void ReplaceRegex()
{
    // Regex search and replace
    RegexOptions options = RegexOptions.None;
    Regex regex = new Regex(@"\[MM\](?<value>.*)\[\/MM\]", options);
    string input = @"[MM]1000[/MM]";
    string replacement = @"10 cm";
    string result = regex.Replace(input, replacement);
    // TODO: Do something with result
    System.Windows.Forms.MessageBox.Show(result, "Replace");
}
Or if you want the original text back in the replacement:
Regex regex = new Regex(@"\[MM\](?<theText>.*)\[\/MM\]", options);
string replacement = @"${theText} cm";
A regex like this
\[(\w+)\](\d+)\[\/\w+\]
will find and collect the units (like MM) and the values (like 1000). That would at least allow you to use the pairs of parts intelligently to do the conversion. You could then put the replacement string together, and do a straightforward string replacement, because you know the exact string you're replacing.
I don't think you can do a simple RegEx.Replace, because you don't know the replacement string at the point you do the search.
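One way to build the replacement at match time is the Regex.Replace overload that takes a MatchEvaluator. The sketch below captures the unit and the value as described, then decides per match what to emit. It assumes a plain millimetre-to-centimetre conversion (so 1000 mm becomes 100 cm) and leaves unknown units untouched. The names UnitTagDemo and ConvertTags are hypothetical, and the \1 backreference, which requires the closing tag to repeat the opening one, is an addition to the pattern above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnitTagDemo
{
    // Hypothetical helper: rewrites [MM]n[/MM] tags, leaves any other tag as-is.
    public static string ConvertTags(string text)
    {
        // Group 1 = unit, group 2 = value; \1 forces the closing tag to match the opening tag.
        return Regex.Replace(text, @"\[(\w+)\]([0-9]+)\[/\1\]", m =>
        {
            int value;
            if (m.Groups[1].Value != "MM" || !int.TryParse(m.Groups[2].Value, out value))
            {
                return m.Value; // unknown unit or oversized number: keep the original text
            }
            // Assumption: millimetres are shown as centimetres.
            return (value / 10m).ToString(CultureInfo.InvariantCulture) + " cm";
        });
    }

    static void Main()
    {
        Console.WriteLine(ConvertTags("This product is [MM]1000[/MM] long"));
        // prints: This product is 100 cm long
    }
}
```

Adding further units would mean extending the check inside the evaluator, for example with a dictionary from unit name to conversion function.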
Regex rex = new Regex(@"\[MM\]([0-9]+)\[\/MM\]");
string s = "This product is [MM]1000[/MM] long";
MatchCollection mc = rex.Matches(s);
Will match only integers.
mc[n].Groups[1].Value;
will then give the numeric part of nth match.
