Extract data from a string using Regex.Matches

I have a string that always takes a general form. I wish to extract information from it and place it in an array.
Given the following input:
John Doe +22\r\nPong
I want the following output:
John Doe
+22
Pong
I'm using the following bit of code to extract the details I want.
public static string[] DetailExtractor(string input)
{
    return Regex.Matches(input, @"(.*(?=\s\+))|(\+\d{1,2}(?=\\r\\n))|((?<=\\r\\n).*)")
        .OfType<Match>()
        .Select(m => m.Value)
        .ToArray();
}
But it gives me the following output:
Player Name
""
However, the same regular expression matches all the elements I want in this online regex tester.
Why does it work for one and not the other? Does Regex.Matches not work the way I think it does?

Just taking a guess here, but I'm betting that you are using the following:
var details = DetailExtractor("John Doe +22\r\nPong");
The above would convert \r\n to a carriage return and a newline character, which would prevent the regex you wrote from working. Instead, you can use a verbatim string in C# or escape the \r\n:
var details = DetailExtractor(@"John Doe +22\r\nPong");
or
var details = DetailExtractor("John Doe +22\\r\\nPong");
As everyone else has pointed out, there are simpler regexes available to do the same type of matching, depending on your needs.
The regex below is slightly simpler, but the string array return is slightly more complex.
public static string[] DetailExtractor1(string input)
{
    var match = Regex.Match(input, @"^(?<name>\w+\s+\w+)\s+(?<num>\+\d+)\r\n(?<type>\w+)");
    if (match.Success)
    {
        return new string[] {
            match.Groups["name"].Value,
            match.Groups["num"].Value,
            match.Groups["type"].Value
        };
    }
    return null;
}
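The difference between the two kinds of string literal can be checked directly. The following is a minimal sketch, not part of the original answer; the class EscapeDemo and its two helper methods are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: true when the text contains a real CR LF pair.
    public static bool HasRealNewline(string text) => Regex.IsMatch(text, @"\r\n");

    // Hypothetical helper: true when the text contains the four literal
    // characters backslash, r, backslash, n.
    public static bool HasLiteralEscape(string text) => Regex.IsMatch(text, @"\\r\\n");

    public static void Main()
    {
        string regular = "John Doe +22\r\nPong";   // the compiler turns \r\n into CR LF
        string verbatim = @"John Doe +22\r\nPong"; // the backslashes are kept as written

        Console.WriteLine(regular.Length);             // 18
        Console.WriteLine(verbatim.Length);            // 20
        Console.WriteLine(HasRealNewline(regular));    // True
        Console.WriteLine(HasLiteralEscape(regular));  // False
        Console.WriteLine(HasLiteralEscape(verbatim)); // True
    }
}
```

A pattern written with \\r\\n therefore only matches input that still contains the backslashes, which is typically what an online tester receives when the text is pasted in as-is.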

You can try one of these (the first two need RegexOptions.IgnoreCase to match the uppercase letters in the sample input):
[a-z]+ [a-z]+ \+[0-9]{1,}\\r\\n[a-z]+
or:
[a-z\s\\]+\+[0-9]{1,}[a-z\s\\]+
or:
[\w\s]+\+\d{1,}\\r\\n[\w]+

Related

What regular expression needed to extract a part from a text to include splitting points too?

Here is the code where I get a string from a textbox:
string txtS = TextBoxS.Text;
Then I extract some text with a regex:
string[] splitS = Regex.Split(txtS, @"\s(we|tomorow)\s");
Text: Today we have a rainy day but maybe tomorow will be sunny.
After splitting, this gives me the text between the splitting points:
Output: have a rainy day but maybe
Which regular expression should I use to get output that includes the splitting points (delimiters)? I want this output: we have a rainy day but maybe tomorow. I tried some other regular expressions but didn't find the right one.

The C# code would be:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        // Group 1 spans from "we" to "tomorow", including both delimiters.
        string pattern = @"\s(we.+tomorow)\s";
        string input = @"Today we have a rainy day but maybe tomorow will be sunny.";
        RegexOptions options = RegexOptions.Multiline | RegexOptions.IgnoreCase;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            // Print the captured group; m.Value would also include the surrounding whitespace.
            Console.WriteLine(m.Groups[1].Value);
        }
        Console.ReadKey();
    }
}

Get only Whole Words from a .Contains() statement

I've used .Contains() to find out whether a sentence contains a specific word, but I found something weird.
I wanted to find out whether the word "hi" was present in the following sentences:
The child wanted to play in the mud
Hi there
Hector had a hip problem
if (sentence.Contains("hi"))
{
    //
}
I only want the second sentence to match, but all three match, because "child" contains "hi" and so does "hip". How do I use .Contains() so that only whole words are picked out?

Try using Regex:
if (Regex.Match(sentence, @"\bhi\b", RegexOptions.IgnoreCase).Success)
{
    //
}
This works just fine for me on your input text.

Here's a Regex solution:
Regex has a word boundary anchor, \b.
Also, if the search string might come from user input, consider escaping it with Regex.Escape.
This example should filter a list of strings the way you want.
string findme = "hi";
string pattern = @"\b" + Regex.Escape(findme) + @"\b";
Regex re = new Regex(pattern, RegexOptions.IgnoreCase);
List<string> data = new List<string> {
"The child wanted to play in the mud",
"Hi there",
"Hector had a hip problem"
};
var filtered = data.Where(d => re.IsMatch(d));
DotNetFiddle Example

You could split your sentence into words - you could split at each space and then trim any punctuation. Then check if any of these words are 'hi':
var punctuation = sentence.Where(Char.IsPunctuation).Distinct().ToArray();
var words = sentence.Split().Select(x => x.Trim(punctuation));
var containsHi = words.Contains("hi", StringComparer.OrdinalIgnoreCase);
See a working demo here: https://dotnetfiddle.net/AomXWx

You could write your own extension method for string like:
static class StringExtension
{
    public static bool ContainsWord(this string s, string word)
    {
        // Split on spaces and compare each piece to the word, ignoring case.
        string[] ar = s.Split(' ');
        foreach (string str in ar)
        {
            if (string.Equals(str, word, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
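Splitting on spaces alone treats "Hi," (with a trailing comma) as a different word from "hi". One way to combine the extension-method idea with the \b and Regex.Escape approach from the earlier answer is sketched below; the class name WordSearch is hypothetical, and the sketch assumes the search word starts and ends with a letter, digit or underscore, since \b only recognizes boundaries next to such characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSearch
{
    // Returns true when 'word' occurs in 'sentence' as a whole word, ignoring case.
    // Regex.Escape keeps any regex metacharacters in 'word' from being interpreted.
    public static bool ContainsWord(this string sentence, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(sentence, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine("The child wanted to play in the mud".ContainsWord("hi")); // False
        Console.WriteLine("Hi, there".ContainsWord("hi"));                           // True
        Console.WriteLine("Hector had a hip problem".ContainsWord("hi"));            // False
    }
}
```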

Search string using Pattern within long string in C#

I need to search for a pattern within a string.
For example:
string big = "Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
Now I need to find or extract words that match the pattern XXX00000, i.e. three letters followed by five digits.
Is there any way to do this?
Even extraction alone would be fine for me.
Please help.

foreach (Match m in Regex.Matches(big, "([A-Za-z]{3}[0-9]{5})"))
{
    if (m.Success)
    {
        string ticket = m.Groups[1].Value; // here is your match
    }
}

How about this one? Note that [XYZ] matches only the letters X, Y and Z, which is enough for both sample tickets:
([XYZ]{3}[0-9]{5})
You can use Regex Tester to test your expressions.

You can use a simple regular expression to match the strings you describe:
([A-Za-z]{3}[0-9]{5})
The full code would be:
string strRegex = @"([A-Za-z]{3}[0-9]{5})";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase);
string strTargetString = @"Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // Add your code here
    }
}

You could always use a chatbot extension for the requests.
As for extracting the required information from a sentence without any context, you can use a regex for that.
You can use http://rubular.com/ to test it.
An example would be
...[0-9]{5}
which matches any three characters followed by five digits, so it would find XXX00000.
Hope that helps.

Use a regex:
string ticketNumber = string.Empty;
var match = Regex.Match(myString, @"[A-Za-z]{3}\d{5}");
if (match.Success)
{
    ticketNumber = match.Value;
}

Here's a regex:
var str = "ABCD12345 ABC123456 ABC12345 XYZ98765";
foreach (Match m in Regex.Matches(str, @"(?<![A-Z])[A-Z]{3}[0-9]{5}(?![0-9])"))
    Console.WriteLine(m.Value);
The extra bits are the zero-width negative look-behind ((?<![A-Z])) and negative look-ahead ((?![0-9])) assertions, which make sure you don't match tickets that have extra letters before them or extra digits after them. The example above matches only the third and fourth items, not the first and second. A plain [A-Z]{3}[0-9]{5} would also match inside longer runs of letters and digits.
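To see what the look-arounds change, the two patterns can be run side by side on the same input. This is a small sketch; the class TicketDemo and the helper Find are hypothetical names used only for this comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TicketDemo
{
    // Hypothetical helper: returns the text of every match of 'pattern' in 'text'.
    public static string[] Find(string text, string pattern) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string str = "ABCD12345 ABC123456 ABC12345 XYZ98765";

        // Without look-arounds, parts of the two over-long items are matched too.
        Console.WriteLine(string.Join(", ", Find(str, @"[A-Z]{3}[0-9]{5}")));
        // BCD12345, ABC12345, ABC12345, XYZ98765

        // With look-arounds, only the items of exactly the right shape remain.
        Console.WriteLine(string.Join(", ", Find(str, @"(?<![A-Z])[A-Z]{3}[0-9]{5}(?![0-9])")));
        // ABC12345, XYZ98765
    }
}
```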

RegEx for split text on string .NET

I found this answer to my question, but it is for PHP. Is there an equivalent for .NET? I know about the Split method, but I don't understand how to keep the text outside my tags <#any_text#>, and I need to use a regular expression (it is a condition of the task).
For example:
string: aaa<#bbb#>aaa<#bb#>c
list: aaa
<#bbb#>
aaa
<#bb#>
c

Here is a passing test. It wasn't hard to find on the web, and it would definitely be faster and better for you to first try finding a solution yourself, try some code, and then ask a question. That way you will actually learn something.
[TestMethod]
public void TestMethod1()
{
    string source = "aaa<#bbb#>aaa<#bb#>c";
    Regex r = new Regex("(<#.+?#>)");
    string[] result = r.Split(source);
    Assert.AreEqual(5, result.Length);
}
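The test above passes because Regex.Split includes the text of capturing groups in its result, so wrapping the delimiter pattern in parentheses keeps the tags. A minimal sketch that prints the pieces follows; the class SplitDemo and the helper SplitKeepingTags are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on <#...#> tags and keeps the tags,
    // because the delimiter pattern is wrapped in a capturing group.
    public static string[] SplitKeepingTags(string source) =>
        Regex.Split(source, @"(<#.+?#>)");

    public static void Main()
    {
        foreach (string part in SplitKeepingTags("aaa<#bbb#>aaa<#bb#>c"))
        {
            Console.WriteLine(part);
        }
        // Prints aaa, <#bbb#>, aaa, <#bb#> and c on separate lines.
    }
}
```

If the input starts or ends with a tag, the array contains an empty string at that end, so the caller may want to filter those out.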

string input = @"aaa<#bbb#>aaa<#bb#>c";
var list = Regex.Matches(input, @"\<.+?\>|[^\<].+?[^\>]|.+?")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

regexp for find number in a string

I have the following string format:
session=11;reserID=1000001
How can I get a string array of the numbers?
My code:
var value = "session=11;reserID=1000001";
var numbers = Regex.Split(value, @"^\d+");

You probably were on the right track but forgot the character class:
Regex.Split(value, @"[^\d]+");
You can also write it more briefly with \D+, which is equivalent.
However, you'd get an empty element at the start of the returned array, so take care when consuming the result. Sadly, Regex.Split() doesn't have an option that removes empty elements (String.Split does, however). A not very pretty way of resolving that:
Regex.Replace(value, @"[^\d;]", "").Split(';');
based on the assumption that the semicolon is actually the relevant piece where you want to split.
Quick PowerShell test:
PS> 'session=11;reserID=1000001' -replace '[^\d;]+' -split ';'
11
1000001
Another option would be to just skip the element:
Regex.Split(...).Skip(1).ToArray();
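The empty leading element, and one way of dropping it that does not depend on its position, can be sketched as follows. The class NumberSplit and the helper SplitNumbers are hypothetical names, and filtering with Where is a swapped-in alternative to Skip(1), not something from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberSplit
{
    // Hypothetical helper: splits on runs of non-digits and drops empty pieces.
    public static string[] SplitNumbers(string value) =>
        Regex.Split(value, @"\D+").Where(s => s.Length > 0).ToArray();

    public static void Main()
    {
        string value = "session=11;reserID=1000001";

        // The raw split has three elements; the first one is the empty string.
        Console.WriteLine(Regex.Split(value, @"\D+").Length);     // 3

        Console.WriteLine(string.Join(",", SplitNumbers(value))); // 11,1000001
    }
}
```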

var numbers = Regex
    .Matches("session=11;reserID=1000001", @"\d+") // match every run of digits
    .Cast<Match>()        // convert the non-generic collection to IEnumerable<Match> so LINQ can be used
    .Select(m => m.Value) // for each Match, select its (string) Value
    .ToArray();           // convert to an array, as per the question

.NET has a built-in feature for this that doesn't need a regex. Try System.Web.HttpUtility.ParseQueryString, passing it the string. You need to reference the System.Web assembly, but it shouldn't require a web context.
var value = "session=11;reserID=1000001";
NameValueCollection numbers =
System.Web.HttpUtility.ParseQueryString(value.Replace(";","&"));
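Reading the values back out of the collection could look like the sketch below. The class QueryDemo and the helper Parse are hypothetical names; the sketch assumes System.Web.HttpUtility is available to the project, as the answer describes.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class QueryDemo
{
    // Hypothetical helper: turns "a=1;b=2" into a name/value collection by
    // rewriting it as a query string first.
    public static NameValueCollection Parse(string value) =>
        HttpUtility.ParseQueryString(value.Replace(";", "&"));

    public static void Main()
    {
        NameValueCollection numbers = Parse("session=11;reserID=1000001");
        Console.WriteLine(numbers["session"]); // 11
        Console.WriteLine(numbers["reserID"]); // 1000001
    }
}
```

Unlike the regex approaches, this keeps each number associated with its key, which may be useful if the order of the fields can change.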

I will re-use my code from another question:
private void button1_Click(object sender, EventArgs e)
{
    string sauce = htm.Text; // htm = textbox
    Regex myRegex = new Regex(@"[0-9]+(?:\.[0-9]*)?", RegexOptions.Compiled);
    foreach (Match iMatch in myRegex.Matches(sauce))
    {
        txt.AppendText(Environment.NewLine + iMatch.Value); // txt = textbox
    }
}
If you want to play around with regex, here is a good site: http://gskinner.com/RegExr/
They also have a desktop app: http://gskinner.com/RegExr/desktop/ (it uses Adobe AIR, so install that first).

var numbers = Regex.Split(value, @".*?(.\d+).*?");
or, to return each digit:
var numbers = Regex.Split(value, @".*?(\d).*?");
