Check if string contains character and number - c#

How do I check if a string contains the characters "RM" followed by any number or special character (-, _, etc.) and then followed by "T"?
Ex: thisIsaString ABRM21TC = yes, contains "RM" followed by a number and followed by "T"
Ex: thisIsaNotherString RM-T = yes, contains "RM" followed by a special character then followed by "T"

You're going to want to check the string using a regex (regular expression). See this MSDN page for info on how to do that:
http://msdn.microsoft.com/en-us/library/ms228595.aspx

Try this regexp.
[^RM]*RM[^RMT]+T[^RMT]*
Here is a sample program.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication12
{
    class Program
    {
        static void Main(string[] args)
        {
            String rg = "[^RM]*RM[^RMT]+T[^RMT]*";
            string input = "111RM----T222";
            Match match = Regex.Match(input, rg, RegexOptions.IgnoreCase);
            Console.WriteLine(match.Success);
        }
    }
}

You can do it with a simple regular expression:
var match = Regex.Match(s, "RM([^T]+)T");
Check if the pattern is present by calling match.Success.
Get the captured value by calling match.Groups[1].
Here is a demo:
foreach (var s in new[] {"ABRM21TC", "RM-T", "RxM-T", "ABR21TC"} ) {
    var match = Regex.Match(s, "RM([^T]+)T");
    Console.WriteLine("'{0}' - {1} (Captures '{2}')", s, match.Success, match.Groups[1]);
}
It prints
'ABRM21TC' - True (Captures '21')
'RM-T' - True (Captures '-')
'RxM-T' - False (Captures '')
'ABR21TC' - False (Captures '')
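Note that [^T]+ also accepts letters between "RM" and "T", so "RMabT" matches too. If only digits and special characters should count, a stricter pattern might look like the sketch below. The class name RmCheck and the method ContainsRmThenT are made up for illustration, and the character class [\d\W_] is one possible reading of "number or special character", not the only one.

```csharp
using System;
using System.Text.RegularExpressions;

static class RmCheck
{
    // "RM", then one or more digits, non-word characters or '_', then "T".
    // Assumption: "special character" means anything that is not a letter.
    private static readonly Regex Pattern = new Regex(@"RM[\d\W_]+T");

    public static bool ContainsRmThenT(string s) => Pattern.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "ABRM21TC", "RM-T", "RMabT", "RMT" })
        {
            Console.WriteLine("'{0}' - {1}", s, ContainsRmThenT(s));
        }
    }
}
```

With these inputs the first two strings match, while "RMabT" (letters in between) and "RMT" (nothing in between) do not.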

Use regular expressions
http://www.webresourcesdepot.com/learn-test-regular-expressions-with-the-regulator/
The Regulator is an advanced, free regular expressions testing and learning tool. It allows you to build and verify a regular expression against any text input, file or web, and displays matching, splitting or replacement results within an easy to understand, hierarchical tree.

You should play around with more sample data, especially regarding special characters. You can use regexpal; I have added the two cases and an expression to get you started.

Related

How to extract the number from a matched string in C#?

I want to extract emoji id from the input.
For example, inputs:
`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`
I know how to match those string, but I have no idea about how to extract those ids from the matched strings.
Use regex:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] inputs = {
                "<eid=1>",
                "<eid = >",
                "<exd = 1>",
                "< eid = 1000>"
            };
            // The named group 'number' captures the digits of the id
            string pattern = @"\<\s*eid\s*=\s*(?'number'\d+)\s*\>";
            foreach (string input in inputs)
            {
                Match match = Regex.Match(input, pattern);
                if (match.Success)
                {
                    Console.WriteLine("input : '{0}' Does Match, number = '{1}'", input, match.Groups["number"]);
                }
                else
                {
                    Console.WriteLine("input : '{0}' Does not Match", input);
                }
            }
            Console.ReadLine();
        }
    }
}
You can do something like this. If you don't want to store each item in an array (e.g. you have HTML code), you can store all the values as one string and use the following:
var input = @"`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`";
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
var matches = regex.Matches(input).Cast<Match>().Select(m => m.Groups["final"].Value).Distinct().ToList();
foreach (var match in matches)
{
    // Each match is the text between '=' and '>'.
    // TryParse skips values that are not numbers (e.g. the empty id in "<eid = >").
    if (int.TryParse(match.Trim(), out var id))
    {
        // use id here
    }
}
This method sets the opening and closing tags of the matches you want, where '\=' is the opening tag and '>' is the closing tag. Note that the pattern does not check the tag name, so <exd = 1> also yields a value:
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
You need to understand what a match is, what a capture is, and how to capture specific data within a match.
In regular expressions there is a difference between a match, a capture, and basic grouping.
You want to match the whole value <eid=8> but get the value 8 into a capture. That is done by adding a grouping ( ) construct to the pattern, which establishes one or more capture groups. A match can hold one or more groups, which are indexed from 1 to N. Group zero is a special group that is created automatically, as explained below.
So for the data <eid=8>, use the regex <\w+=(\d+)\> to capture the value (instead of the otherwise viable pattern <\w+=\d+\>). The grouping is what puts the number into capture group 1, with a value of 8.
So what are groups exactly?
Groups[0] is always the whole match, here <eid=8>.
Groups[1-N] are the individual captures when a ( ) construct is specified. So for our example Groups[1].Value is the number 8, which answers your question.
One can do a named capture by writing (?<{name here}>...). By that logic we can change our pattern to <\w+=(?<TheNumbers>\d+)\> and then extract with Groups["TheNumbers"].Value, or still with Groups[1].Value.
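The difference between Groups[0], Groups[1] and a named group can be seen in a small program. This is only a sketch: the class GroupDemo and the method MatchId are illustrative names, and the input "<eid=8>" is the example value used above.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Matches a tag such as <eid=8> and captures the digits in the named group "TheNumbers".
    public static Match MatchId(string input) =>
        Regex.Match(input, @"<\w+=(?<TheNumbers>\d+)>");

    static void Main()
    {
        Match m = MatchId("<eid=8>");
        Console.WriteLine(m.Groups[0].Value);            // whole match: <eid=8>
        Console.WriteLine(m.Groups[1].Value);            // first capture group: 8
        Console.WriteLine(m.Groups["TheNumbers"].Value); // same group, accessed by name: 8
    }
}
```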

Parsing a list of functions and their parameters from a string

I have a string which contains some functions (I know their names) and their parameters like this:
translate(700 210) rotate(-30)
I would like to parse each one of them in a string array starting with the function name followed by the parameters.
I don't know much about regex and so far I got this:
MatchCollection matches = Regex.Matches(attribute.InnerText, @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*");
for (int i = 0; i < matches.Count; i++)
{
    Console.WriteLine(matches[i].Value);
}
What this returns is:
translate(700 210)
[blank space]
rotate(-30)
[blank space]
This works for me because I can run another regular expression on each row from the resulting collection and get the contents. What I don't understand is why there are blank rows returned between the methods.
Also, is running a regex twice - once to separate the methods and once to actually parse them a good approach?
Thanks!
Regex.Matches will match your entire regular expression multiple times. It finds one match for the whole thing, then finds the next match for the whole thing.
The outermost parens with * indicate that you're willing to accept zero or more of the preceding group's contents as a match. So when it finds none of them, it happily returns that. That is not your intent. You want exactly one.
The blanks are harmless, but "zero or more" also includes two. Consider this string, with no space between the two functions:
var text = "translate(700 210)rotate(-30)";
That's one match, according to the regex you provided. You'll get "rotate" and "-30". If the missing space is an error, detect it and warn the user. If you're not going to do that, parse it correctly.
So let's get rid of the outermost parens and that *. We'll also name the capturing groups, for readability.
var matches = Regex.Matches(text, @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?<param>-?\s*\d+\s*\,*\s*)+\)");
foreach (Match match in matches)
{
    if (match.Groups["funcName"].Success)
    {
        var funcName = match.Groups["funcName"].Value;
        // Groups["param"].Value holds only the last captured parameter
        var param = Int32.Parse(match.Groups["param"].Value);
        Console.WriteLine($"{funcName}( {param} )");
    }
}
I also stuck in \s* after the optional -, just in case.
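Because the param group sits inside a repeated construct, Groups["param"].Value only returns its last capture. To read every parameter, one option is to walk the group's Captures collection. The sketch below assumes integer parameters separated by spaces or commas; the class TransformParser, the method Parse and the slightly restructured pattern (the separators moved outside the captured text) are illustrative choices, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

static class TransformParser
{
    // Returns each function name together with all of its integer parameters.
    public static List<KeyValuePair<string, int[]>> Parse(string text)
    {
        // Only the number itself is captured in "param"; separators stay outside the group.
        var pattern = @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?:(?<param>-?\d+)\s*,?\s*)+\)";
        var result = new List<KeyValuePair<string, int[]>>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            // Captures lists every value the repeated group matched, in order.
            int[] values = match.Groups["param"].Captures
                .Cast<Capture>()
                .Select(c => int.Parse(c.Value, CultureInfo.InvariantCulture))
                .ToArray();
            result.Add(new KeyValuePair<string, int[]>(match.Groups["funcName"].Value, values));
        }
        return result;
    }

    static void Main()
    {
        foreach (var f in Parse("translate(700 210) rotate(-30)"))
        {
            Console.WriteLine("{0}: {1}", f.Key, string.Join(", ", f.Value));
        }
    }
}
```

This also answers the "run a regex twice" question in one direction: a single pass can yield both the function name and its parameters.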
I like using Regex with a dictionary:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication56
{
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, string> dict = new Dictionary<string, string>();
            string input = "translate(700 210) rotate(-30)";
            string pattern = @"(?'command'[^\(]+)\((?'value'[^\)]+)\)";
            MatchCollection matches = Regex.Matches(input, pattern);
            foreach(Match match in matches.Cast<Match>())
            {
                // Trim removes the space that precedes every command after the first
                dict.Add(match.Groups["command"].Value.Trim(), match.Groups["value"].Value);
            }
        }
    }
}

Regex get the text after the match which must be the last occurrence

I want to extract the string after the last occurrence of "cn=" using regex in a C# application. So what I need is the string between the last occurrence of "cn=" and the \ character. Please note that the source string may contain spaces.
Example:
ou=company\ou=country\ou=site\cn=office\cn=name\ou=pet
Result:
name
So far I've got (?<=cn=).* for selecting the text after the cn= using positive lookbehind and (?:.(?!cn=))+$ for finding the last occurrence, but I don't know how to combine them to get the desired result.
You may try using the following regex ...
(?m)(?<=cn=)[\w\s]+(?=\\?(?:ou=)?[\w\s]*$)
see regex demo
C# ( demo )
using System;
using System.Text.RegularExpressions;
public class RegEx
{
    public static void Main()
    {
        string pattern = @"(?m)(?<=cn=)[\w\s]+(?=\\?(?:ou=)?[\w\s]*$)";
        string input = @"ou=company\ou=country\ou=site\cn=office\cn=name\ou=pet";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("{0}", m.Value);
        }
    }
}
You could use a negative lookahead:
cn=(?!.*cn=)([^\\]+)
Take group $1 and see a demo on regex101.com. As full C# code, see a demo on ideone.com.
To only have one group, add another lookaround:
(?<=cn=)(?!.*cn=)([^\\]+)
Another idea by just using a capturing group for getting the desired part.
string pattern = @"^.*cn=(\w+)";
^.*cn= will consume anything from the ^ start up to the last occurrence of cn= (see greed).
(\w+) first group captures one or more word characters. Here is a demo at regex101.
The extracted match will be in m.Groups[1] (see demo).
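Since the question mentions that the value may contain spaces, (\w+) would stop at the first space. A variant of the greedy approach that captures everything up to the next backslash could look like this. It is a sketch only: the class LastCn, the method Extract and the second sample string are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastCn
{
    // Greedy ^.* backtracks to the last "cn=", then the group takes everything up to the next '\'.
    public static string Extract(string input)
    {
        Match m = Regex.Match(input, @"^.*cn=([^\\]+)");
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Extract(@"ou=company\ou=country\ou=site\cn=office\cn=name\ou=pet")); // name
        Console.WriteLine(Extract(@"ou=company\cn=head office\cn=first name\ou=pet"));         // first name
    }
}
```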

Regex doesn't capture results accurately in C# but works fine on online regex tools

This regex - (.*).ap(.\d*)$ - works fine in online regex tools (for example, works here) but not in C#.
The intent is that for a string that ends with .ap<some number> (.ap followed by some number - .ap1, .ap123 etc), we want to capture everything on the left of the last .ap and everything on the right of the last .ap. For example for abcfile.csv.ap123, we want to capture abcfile.csv and 123.
(.*).ap(.\d*)$ captures the two groups as expected in online tools but in C#, it captures the entire name (captures abcfile.csv.ap123, in the above example).
The C# code is -
string renameSuffix = "ap";
Match match = Regex.Match(filename, @"(.*)\." + renameSuffix + @"(\d*)$");
match.Success is true but match.Captures.Count is 1 and match.Captures[0].Value contains the entire filename (I'm looking at this in a watch).
What could be wrong here?
More examples -
TestCashFile_10_12-25-2016_D.csv - Shouldn't match
TestCashFile_10_12-25-2016_D_A.csv.ap123 - Should match and capture TestCashFile_10_12-25-2016_D_A.csv and 123
TestCashFile_10_12-25-2016_D_A.csv.ap123.ds - Shouldn't match
TestCashFile_10_12-25-2016_D.csv.ap2.ap1 - Should match and capture TestCashFile_10_12-25-2016_D.csv.ap2 and 1
Your code is fine; just try match.Groups[1] and match.Groups[2].
Index 0 holds the full match, and only the subsequent indexes correspond to the groups in the regex.
Looks like it is working:
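The confusion comes from match.Captures versus match.Groups: on a Match object, Captures describes the whole match (group 0), which is why its count is 1. A short sketch of the difference follows. The class SuffixDemo and the method MatchSuffix are illustrative names, and the Regex.Escape call is an optional addition in case the suffix ever contains regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

static class SuffixDemo
{
    // Same pattern as in the question; Regex.Escape is an added safeguard, not part of the original code.
    public static Match MatchSuffix(string filename, string renameSuffix) =>
        Regex.Match(filename, @"(.*)\." + Regex.Escape(renameSuffix) + @"(\d*)$");

    static void Main()
    {
        Match match = MatchSuffix("abcfile.csv.ap123", "ap");
        Console.WriteLine(match.Captures.Count);  // 1: the capture of the whole match
        Console.WriteLine(match.Groups.Count);    // 3: group 0 plus the two ( ) groups
        Console.WriteLine(match.Groups[1].Value); // abcfile.csv
        Console.WriteLine(match.Groups[2].Value); // 123
    }
}
```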
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication42
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] filenames = {
                "TestCashFile_10_12-25-2016_D.csv", // - Shouldn't match
                "TestCashFile_10_12-25-2016_D_A.csv.ap123", // - Should match and capture TestCashFile_10_12-25-2016_D_A.csv and 123
                "TestCashFile_10_12-25-2016_D_A.csv.ap123.ds", // - Shouldn't match
                "TestCashFile_10_12-25-2016_D.csv.ap2.ap1" //- Should match and capture TestCashFile_10_12-25-2016_D.csv.ap2 and 1
            };
            string renameSuffix = "ap";
            string pattern = @"(?'filename'.*)\." + renameSuffix + @"(?'suffix'\d*)$";
            foreach (string filename in filenames)
            {
                Match match = Regex.Match(filename, pattern);
                Console.WriteLine("Match : {0}, Filename : {1}, Suffix : {2}", match.Success ? "True" : "False", match.Groups["filename"].Value, match.Groups["suffix"].Value);
            }
            Console.ReadLine();
        }
    }
}

searching a hash in a string with regex

I want to get this bold part from this string:
some other code src='/pages/captcha?t=c&s=**51afb384edfc&h=513cc6f5349b**' `</td><td><input type=text name=captchaenter id=captchaenter size=3`
This is my regex that is not working:
Regex("src=\\'/pages/captcha\\?t=c&s=([\\d\\w&=]+)\\'", RegexOptions.IgnoreCase)
In tool for regex testing it's working.
How can this be fixed?
Your string-based regex is different from the regex you tested in the tool. In your regex, you have [\d\w\W]+, which matches any character and is greedy (i.e. there is no ? after the + to make it lazy). So it may match a very long string, possibly all the way up to the last closing quote.
In your tool you have [\d\w&=], which only matches word characters (letters, digits, underscore), & and =, so it will stop when it hits the closing quote.
The regexes aren't the same. The one in code has a character class ([\\d\\w\\W]+) that is different from the one in the tool ([\\d\\w&=]+).
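The effect of greediness only shows up when the input contains more than one closing quote after the hash. The sketch below compares the three variants on a made-up input with an extra alt='x' attribute; the class GreedyDemo, the method FirstGroup and the shortened patterns (starting at &s= instead of src=) are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Returns the first capture group of the pattern, or null when there is no match.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Hypothetical input with a second quoted attribute after the one we want.
        string s = "src='/pages/captcha?t=c&s=abc&h=def' alt='x'";

        Console.WriteLine(FirstGroup(s, @"&s=([\d\w\W]+)'"));  // greedy: abc&h=def' alt='x
        Console.WriteLine(FirstGroup(s, @"&s=([\d\w\W]+?)'")); // lazy: abc&h=def
        Console.WriteLine(FirstGroup(s, @"&s=([\d\w&=]+)'"));  // restricted class: abc&h=def
    }
}
```

When the input has only one quote after the hash, as in the question's sample, all three variants return the same text.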
Works perfectly fine with this code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string s = "src='/pages/captcha?t=c&s=51afb384edfc&h=513cc6f5349b' </td><td><input type=text name=captchaenter id=captchaenter size=3";
            Regex rgx = new Regex("src=\\'/pages/captcha\\?t=c&s=([\\d\\w\\W]+)\\'", RegexOptions.IgnoreCase);
            Match m = rgx.Match(s);
            Console.Write(m.Groups[1]);
        }
    }
}
It outputs
51afb384edfc&h=513cc6f5349b
I despise regular expressions. I would do it similar to (but safer than) this:
private static string GetStuff(string source)
{
    var start = source.IndexOf("s=") + 2;
    var end = source.IndexOf('\'', start + 3);
    return source.Substring(start, end - start);
}
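One way the "safer" version might look is sketched below: it checks both IndexOf results before calling Substring, so a missing marker or missing closing quote returns null instead of throwing or returning the wrong slice. The class name CaptchaParser, the "&s=" marker and the null return convention are assumptions made for this example.

```csharp
using System;

static class CaptchaParser
{
    // Returns the text between "&s=" and the next single quote, or null if either is missing.
    public static string GetStuff(string source)
    {
        const string marker = "&s=";
        int markerIndex = source.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            return null; // marker not present
        }

        int start = markerIndex + marker.Length;
        int end = source.IndexOf('\'', start);
        if (end < 0)
        {
            return null; // no closing quote after the marker
        }

        return source.Substring(start, end - start);
    }

    static void Main()
    {
        string s = "src='/pages/captcha?t=c&s=51afb384edfc&h=513cc6f5349b' </td><td><input type=text name=captchaenter id=captchaenter size=3";
        Console.WriteLine(GetStuff(s)); // 51afb384edfc&h=513cc6f5349b
    }
}
```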
