Locate RegEx match then extract

I am trying to read text from a RichTextBox in order to locate the first occurrence of a matched expression. I would then like to extract the string that satisfies the query so I can use it as a variable. Below is the basic code I have to start off with and build upon.
private string returnPostcode()
{
string[] allLines = rtxtDocViewer.Text.Split('\n');
string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";
foreach (string line in allLines)
{
if (Regex.Matches(line, expression, RegexOptions.Count > 0)
{
//extract and return the string that is found
}
}
}
Example of what's contained in the RichTextBox is below. I want to extract "E12 8SD" which the above regex should be able to find. Thanks
Damon Brown
Flat B University Place
26 Park Square
London
E12 8SD
Mobile: 1111 22222
Email: dabrown192882@gmail.com Date of birth: 21/03/1986
Gender: Male
Marital Status: Single
Nationality: English
Summary
I have acquired a multifaceted skill set with experience using several computing platforms.

You need to use Regex.IsMatch and remove the RegexOptions.Count > 0 part:
string[] allLines = s.Split('\n');
string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";
foreach (string line in allLines)
{
if (Regex.IsMatch(line, expression)) // Regex.IsMatch will check if a string matches the regex
{
Console.WriteLine(line); // Print the matched line
}
}
It is quite possible that your text contains CR+LF line breaks. If so, adjust your code as follows:
string[] allLines = s.Split(new[] {"\r\n"}, StringSplitOptions.RemoveEmptyEntries);
UPDATE
To just extract the code with your regex, you need not split the contents into lines, just use a Regex.Match on the whole text:
string s = "Damon Brown\nFlat B University Place\n26 Park Square \nLondon\nTW1 1AJ Twickenham Mobile: +44 (0) 7711223344\nMobile: 1111 22222\nEmail: dabrown192882@gmail.com Date of birth: 21/03/1986\nGender: Male\nMarital Status: Single\nNationality: English\nSummary\nI have acquired a multifaceted skill set with experience using several computing platforms.";
string expression = @"(?i)\b(gir 0a{2})|((([a-z][0-9]{1,2})|(([a-z][a-hj-y][0-9]{1,2})|(([a-z][0-9][a-z])|([a-z][a-hj-y][0-9]?[a-z])))) [0-9][a-z]{2})\b";
Match res = Regex.Match(s, expression);
if (res.Success)
Console.WriteLine(res.Value); // => TW1 1AJ
I also removed the uppercase ranges to replace them with a case-insensitive modifier (?i).
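To tie this back to the original returnPostcode() method, here is a minimal sketch of a helper that returns the first match or null. The class and method names (PostcodeFinder, FindFirst) are made up for illustration. The pattern is the one above with the alternation wrapped in a non-capturing group, so that both \b anchors apply to every alternative; that grouping is my assumption about the intent, not something the original pattern does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class PostcodeFinder
{
    // The whole alternation sits inside (?: ... ) so that the leading and
    // trailing \b apply to the "GIR 0AA" case and to the general case alike.
    private static readonly Regex Postcode = new Regex(
        @"\b(?:gir 0a{2}|(?:[a-z][0-9]{1,2}|[a-z][a-hj-y][0-9]{1,2}|[a-z][0-9][a-z]|[a-z][a-hj-y][0-9]?[a-z]) [0-9][a-z]{2})\b",
        RegexOptions.IgnoreCase);

    // Returns the first postcode found anywhere in the text, or null if none is found.
    public static string FindFirst(string text)
    {
        Match m = Postcode.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

In the form, this could be called as PostcodeFinder.FindFirst(rtxtDocViewer.Text), checking the result for null before using it.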

Related

Finding multiple occurrences of a word and printing out the entire line in which the word exists using c-sharp

Hey guys, this is something I have been trying to do for a while.
I have a text file online which can be accessed through a link like this: https://example.com/myfile.txt
The text file contains a few sentences; here are some of them:
This is a sentence
sentences are a combination of multiple words
we cannot imagine a world without sentences
words together form a sentence
here's a sentence orange is a fruit and I love it!
how's the day today?
Those are just random sentences in the text file. Now, if I enter the word 'words' as the input (string input), I want to print all the lines which contain it. Here's an example:
sentences are a combination of multiple words
words together form a sentence

This is an obvious case to use Regex.
using System.Text.RegularExpressions;
string text = @"This is a sentence
sentences are a combination of multiple words
we cannot imagine a world without sentences
words together form a sentence
here's a sentence orange is a fruit and I love it!
how's the day today?";
string input = "words";
Regex regex = new Regex(@"^.*" + input + @".*$", RegexOptions.Multiline);
foreach (Match match in regex.Matches(text))
{
Console.WriteLine(match.Value);
}
Explanation of @"^.*" + input + @".*$" with RegexOptions.Multiline:
^.* match zero or more characters from start of line
input match the string in input string
.*$ match zero or more characters at end of line
RegexOptions.Multiline makes ^ and $ match start and end of line (normally they match start and end of whole text).
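One caveat with building a pattern from user input: if the input contains regex metacharacters (for example "a+b"), they are interpreted as part of the pattern. The sketch below is one possible way to guard against that with Regex.Escape. The class and method names are hypothetical, and using [^\r\n]* instead of .* is my own choice, made so that a trailing '\r' from CR+LF text does not end up in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class LineSearch
{
    // Returns every line of 'text' that contains 'input' as a literal substring.
    public static List<string> LinesContaining(string text, string input)
    {
        // Regex.Escape makes characters such as + . * ( ) match literally.
        // [^\r\n]* matches within a single line and never swallows a '\r'.
        var regex = new Regex(
            @"^[^\r\n]*" + Regex.Escape(input) + @"[^\r\n]*",
            RegexOptions.Multiline);

        var result = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            result.Add(match.Value);
        }
        return result;
    }
}
```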

So, just for the fun of it, you can use these functions. They do what you asked, and the second one allows you to pass a URI, so it will download the text file from a web address and then proceed with the search.
public static global::System.Collections.Generic.List<string> SearchFor(string[] Lines, string Term)
{
global::System.Collections.Generic.List<string> r = new global::System.Collections.Generic.List<string>(Lines.Length);
foreach (string l in Lines) { if (l.Contains(Term)) { r.Add(l.Trim()); } }
r.TrimExcess();
return r;
}
public static global::System.Collections.Generic.List<string> SearchFor(string Text, string Term, bool TextIsUri = false)
{
if (TextIsUri) { using (global::System.Net.WebClient w = new global::System.Net.WebClient()) { Text = w.DownloadString(new global::System.Uri(Text)); } }
return SearchFor(Text.Split(new char[] { '\n' }, global::System.StringSplitOptions.RemoveEmptyEntries), Term);
}
Surely you can also add some safety measures, like testing whether the string is empty and such, but I left them out so you can see the simplest code and improve on it.

Get string between strings in c#

I am trying to get string between same strings:
The texts starts here ** Get This String ** Some other text ongoing here.....
I am wondering how to get the string between the stars. Should I use a regex or some other function?

You can try Split:
string source =
"The texts starts here** Get This String **Some other text ongoing here.....";
// 3: we need 3 chunks and we'll take the middle (1) one
string result = source.Split(new string[] { "**" }, 3, StringSplitOptions.None)[1];

You can use IndexOf to do the same without regular expressions.
This one returns the first occurrence of a string between two "**" delimiters, with surrounding whitespace trimmed. It also handles the case where no matching string exists.
public string FindTextBetween(string text, string left, string right)
{
// TODO: Validate input arguments
int beginIndex = text.IndexOf(left); // find occurrence of left delimiter
if (beginIndex == -1)
return string.Empty; // or throw exception?
beginIndex += left.Length;
int endIndex = text.IndexOf(right, beginIndex); // find occurrence of right delimiter
if (endIndex == -1)
return string.Empty; // or throw exception?
return text.Substring(beginIndex, endIndex - beginIndex).Trim();
}
string str = "The texts starts here ** Get This String ** Some other text ongoing here.....";
string result = FindTextBetween(str, "**", "**");
I usually prefer to not use regex whenever possible.

If you want to use regex, this could do:
.*\*\*(.*)\*\*.*
The first and only capture has the text between stars.
Another option would be using IndexOf to find the position of the first star, check if the following character is a star too and then repeat that for the second set. Substring the part between those indexes.
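A minimal sketch of how the greedy pattern above could be applied in C#. The class and method names are hypothetical, and the Trim() call is my addition. Note that because the leading .* is greedy, an input with several "**" pairs yields the text of the last pair, not the first.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class BetweenStars
{
    // Returns the trimmed text captured by group 1, or null when the pattern does not match.
    public static string Extract(string source)
    {
        Match m = Regex.Match(source, @".*\*\*(.*)\*\*.*");
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```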

If you can have multiple pieces of text to find in one string, you can use following regex:
\*\*(.*?)\*\*
Sample code:
string data = "The texts starts here ** Get This String ** Some other text ongoing here..... ** Some more text to find** ...";
Regex regex = new Regex(@"\*\*(.*?)\*\*");
MatchCollection matches = regex.Matches(data);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}

You could use split but this would only work if there is 1 occurrence of the word.
Example:
string output = "";
string input = "The texts starts here **Get This String **Some other text ongoing here..";
var splits = input.Split( new string[] { "**" }, StringSplitOptions.None );
//Check if the index is available
//if there are no '**' in the string the [1] index will fail
if ( splits.Length >= 2 )
output = splits[1];
Console.Write( output );
Console.ReadKey();

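You can use Substring for this:
string s = "The texts starts here ** Get This String ** Some other text ongoing here";
s = s.Substring(s.IndexOf("**") + 2); // drop everything up to and including the first "**"
s = s.Substring(0, s.IndexOf("**")); // keep everything before the next "**"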

Subtitle's Time Editor with Regular Expressions

I have a subtitle in my string
string subtitle = Encoding.ASCII.GetString(srt_text);
srt_text is a byte array, which I convert to a string as shown. The subtitle starts and finishes as follows.
Starts:
1
00:00:40,152 --> 00:00:43,614
Out west there was this fella,
2
00:00:43,697 --> 00:00:45,824
fella I want to tell you about,
Finish:
1631
01:52:17,016 --> 01:52:20,019
Catch ya later on
down the trail.
1632
01:52:20,102 --> 01:52:24,440
Say, friend, you got any more
of that good Sarsaparilla?
Now I want to extract the times and put them into an array. I tried
Regex rgx = new Regex(@"^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9][0-9][0-9]$", RegexOptions.IgnoreCase);
Match m = rgx.Match(subtitle);
I think this can find the times, but I don't know how to put them into an array.
Assume 'times' is my string array. I want the array to look like this:
times[0] = "00:00:40,152"
times[1] = "00:00:43,614"
...
times[n-1] = "01:52:20,102"
times[n] = "01:52:24,440"
It has to keep going until the subtitle finishes, so that all the times are included.
I am open to your advice. How can I do this? I am new, so I have probably made a lot of mistakes; I apologize. I hope you can understand and help me.

Using Regular Expressions
You can do this with Regex by collecting multiple matches with Regex.Matches.
The regex used is
(\d{2}:\d{2}:\d{2},\d+)
\d matches a digit
{2} is the repetition count
+ means one or more repetitions
: and , are plain characters with no special meaning.
Here is the syntax.
var matchList = Regex.Matches(subtitle, @"(\d{2}:\d{2}:\d{2},\d+)",RegexOptions.Multiline);
var times = matchList.Cast<Match>().Select(match => match.Value).ToList();
With this your times variable will be filled with all the time substrings.
Also note: The RegexOptions.Multiline part is optional in this scenario.
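Since the question asks for a string array, here is a minimal self-contained sketch of the same idea that returns string[] instead of a list. The class and method names are hypothetical, and requiring exactly three digits after the comma is my assumption based on the sample data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class SubtitleTimes
{
    // Returns every hh:mm:ss,fff timestamp in the subtitle text, in order of appearance.
    public static string[] Extract(string subtitle)
    {
        return Regex.Matches(subtitle, @"\d{2}:\d{2}:\d{2},\d{3}")
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }
}
```

With the sample subtitle, times[0] would be "00:00:40,152" and times[1] would be "00:00:43,614".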

This might help you get the times from the string you have.
string subtitle = @"1
00:00:40,152 --> 00:00:43,614
Out west there was this fella,
2
00:00:43,697 --> 00:00:45,824
fella I want to tell you about,";
List<string> timestrings = new List<string>();
List<string> splittedtimestrings = new List<string>();
List<string> splittedstring = subtitle.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries ).ToList();
foreach(string st in splittedstring)
{
if(st.Contains(" --> ")) // only timing lines contain the arrow separator
{
timestrings.Add(st);
}
}
foreach(string s in timestrings)
{
string[] foundstr = s.Split(new string[] { " --> " }, StringSplitOptions.RemoveEmptyEntries);
splittedtimestrings.Add(foundstr[0]);
splittedtimestrings.Add(foundstr[1]);
}
I split the string to get the time strings instead of using Regex, because I think Regex should be used to process text based on pattern matches rather than to compare and match literal text.

How to separate numbers from words, chars and any other marks with whitespace in string

I'm trying to use whitespace to separate numbers from the words, characters and punctuation they are written together with in a string. For example, the string is:
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
and desired output should be:
ok, here is 369 and 777 , and 20 k 0 10 + 1 .any word.
I'm not sure if I'm on the right track. What I'm trying to do now is find whether the string contains numbers and then replace them with the same values surrounded by whitespace. If that is possible, how can I find every individual number (whole numbers, not each digit), whether or not it is separated from words by whitespace, and replace each one with the same number padded with spaces? The code below returns only the first occurrence of a number in the string:
class Program
{
static void Main(string[] args)
{
string input = "here is 369 and 777 and 15 2080 and 579";
string resultString = Regex.Match(input, @"\d+").Value;
Console.WriteLine(resultString);
Console.ReadLine();
}
}
output:
369
I'm also not sure whether I can capture all the different numbers and use each one in its own replacement. It would be good to know which direction to go in.

If what we need is basically to add spaces around numbers, try this:
string tmp = Regex.Replace(input, @"(?<a>[0-9])(?<b>[^0-9\s])", @"${a} ${b}");
string res = Regex.Replace(tmp, @"(?<a>[^0-9\s])(?<b>[0-9])", @"${a} ${b}");
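As an alternative sketch (not part of the original answer), the two passes can be folded into a single Regex.Replace by matching the zero-width boundary between a digit and a non-digit, non-space character with lookarounds. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class NumberSpacer
{
    // Inserts one space at every position where a digit touches a character
    // that is neither a digit nor whitespace, in either order.
    public static string AddSpaces(string input)
    {
        return Regex.Replace(
            input,
            @"(?<=[0-9])(?=[^0-9\s])|(?<=[^0-9\s])(?=[0-9])",
            " ");
    }
}
```

For the input from the question this should give "ok, here is 369 and 777 , and 20 k 0 10 + 1 .any word.", the same result as the two-step version.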
Previous answer assumed that words, numbers and punctuation should be separated:
string input = "here is369 and777, and 20k0";
var matches = Regex.Matches(input, @"([A-Za-z]+|[0-9]+|\p{P})");
foreach (Match match in matches)
Console.WriteLine("{0}", match.Groups[1].Value);
To construct the required result string in a short way:
string res = string.Join(" ", matches.Cast<Match>().Select(m => m.Groups[1].Value));

You were on the right path. Regex.Match only returns one match and you would have to use .NextMatch() to get the next value that matches your regular expression. Regex.Matches returns every possible match into a MatchCollection that you can then parse with a loop as I did in my example:
string input = "here is 369 and 777 and 15 2080 and 579";
foreach (Match match in Regex.Matches(input, @"\d+"))
{
Console.WriteLine(match.Value);
}
Console.ReadLine();
This outputs:
369
777
15
2080
579
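For comparison, here is a minimal sketch of the Regex.Match / NextMatch() route mentioned above. It walks the matches one at a time and collects them. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class NumberFinder
{
    // Collects every run of digits by stepping from one match to the next.
    public static List<string> FindAll(string input)
    {
        var numbers = new List<string>();
        for (Match m = Regex.Match(input, @"\d+"); m.Success; m = m.NextMatch())
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }
}
```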

This provides the desired output:
string input = "ok, here is369 and777, and 20k0 10+1.any word.";
var matches = Regex.Matches(input, @"([\D]+|[0-9]+)");
foreach (Match match in matches)
Console.Write("{0} ", match.Groups[0].Value);
[\D] will match anything non digit. Please note space after {0}.

C# Regex Split - How do I split string into 2 words

I have the following string:
String myNarrative = "ID: 4393433 This is the best narration";
I want to split this into 2 strings;
myId = "ID: 4393433";
myDesc = "This is the best narration";
How do I do this in Regex.Split()?
Thanks for your help.

If it is a fixed format as shown, use Regex.Match with Capturing Groups (see Matched Subexpressions). Split is useful for dividing up a repeating sequence with unbound multiplicity; the input does not represent such a sequence but rather a fixed set of fields/values.
var m = Regex.Match(inp, @"ID:\s+(\d+)\s+(.*)"); // no trailing \s+, which would cut off the last word of the description
if (m.Success) {
var number = m.Groups[1].Value;
var rest = m.Groups[2].Value;
} else {
// Failed to match.
}
Alternatively, one could use Named Groups and have a read through the Regular Expression Language quick-reference.
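A minimal sketch of the named-group variant, assuming the input always has the shape "ID: <digits> <description>". The class, method and group names are hypothetical. Unlike the snippet above, the id group here includes the "ID: " prefix, since that is the myId value the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class NarrativeParser
{
    // Splits "ID: 123 some text" into "ID: 123" and "some text".
    // Returns false (and null outputs) when the input does not have that shape.
    public static bool TryParse(string narrative, out string id, out string description)
    {
        Match m = Regex.Match(narrative, @"^(?<id>ID:\s*\d+)\s+(?<desc>.*)$");
        id = m.Success ? m.Groups["id"].Value : null;
        description = m.Success ? m.Groups["desc"].Value : null;
        return m.Success;
    }
}
```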
