Regular expression to extract based on capital letters - c#

Hi, please can someone help with a C# regex to split a string into just two words, as follows:
"SetTable" ->> ["Set", "Table"]
"GetForeignKey" ->> ["Get", "ForeignKey"] //No split on Key!

This can be solved in different ways; one method is the following:
string source = "GetForeignKey";
var result = Regex.Matches(source, "[A-Z]").OfType<Match>().Select(x => x.Index).ToArray();
string a, b;
if (result.Length > 1)
{
    a = source.Substring(0, result[1]);
    b = source.Substring(result[1]);
}
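The index-based approach above can be wrapped in a small helper. This is only a sketch: the class name CapitalSplitter and the method SplitAtSecondCapital are hypothetical, and it assumes the input starts with an uppercase letter and uses only ASCII capitals.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CapitalSplitter
{
    // Cuts the string at the second uppercase letter.
    // Returns the input as a single element when there is no second capital.
    public static string[] SplitAtSecondCapital(string source)
    {
        int[] capitals = Regex.Matches(source, "[A-Z]")
            .Cast<Match>()
            .Select(m => m.Index)
            .ToArray();

        if (capitals.Length < 2)
            return new[] { source };

        int cut = capitals[1];
        return new[] { source.Substring(0, cut), source.Substring(cut) };
    }
}
```

Calling CapitalSplitter.SplitAtSecondCapital("GetForeignKey") should give "Get" and "ForeignKey", because everything after the second capital stays together.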

Try the regex below:
(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key
C# code (Dump() is the LINQPad output helper):
var matches = Regex.Matches(input, @"(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key");
foreach (Match match in matches)
    match.Groups[0].Value.Dump();
To get the parts as an array:
matches.OfType<Match>().Select(x => x.Value).ToArray().Dump();

C# Regex - endless matching

I wanted to use Regex to get data from repeated XML tag:
<A>cat</A><A>dog</A>
So I've created Regex:
<A>(.*?)</A>
and code:
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
Regex rgx = new Regex(regex);
Match match = rgx.Match(text);
while(match.Success)
{
    i++;
    Console.WriteLine(match.Groups[1].Value);
    match.NextMatch();
}
But when I run the code, the loop is endless and never stops.
Can someone help me find what's wrong with the code, or suggest another solution?
(I don't want to deserialize the XML.)
This:
match.NextMatch();
just returns the next match, it doesn't change the state of match itself. You need to update the variable:
match = match.NextMatch();
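For reference, here is a minimal sketch of the loop with that reassignment in place. The class name TagReader and the method ReadTagContents are made up for illustration; the pattern is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the corrected loop.
public static class TagReader
{
    // Collects the text captured by group 1 for every <A>...</A> occurrence.
    public static List<string> ReadTagContents(string text)
    {
        var values = new List<string>();
        Match match = Regex.Match(text, @"<A>(.*?)</A>");
        while (match.Success)
        {
            values.Add(match.Groups[1].Value);
            // NextMatch returns a new Match object; reassign it so the loop advances.
            match = match.NextMatch();
        }
        return values;
    }
}
```

The loop ends once NextMatch returns a Match whose Success is false, which happens after the last occurrence.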
The loop fails because the next match is never assigned back to match. You can also use Regex.Matches to get all the substrings you need in one go, without an explicit loop:
var results = rgx.Matches(text)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
See the C# online demo:
var text = "<A>asdasd</A><A>dsfsd</A>";
var regex = @"<A>(.*?)</A>";
var rgx = new Regex(regex);
var results = rgx.Matches(text)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value);
Console.WriteLine(string.Join("\n", results));
// asdasd
// dsfsd
Just use Regex.Matches to load them all into a collection, and proceed to iterate it.
string text = "<A>asdasd</A><A>dsfsd</A>";
string regex = @"<A>(.*?)</A>";
foreach (Match m in Regex.Matches(text, regex))
{
    Console.WriteLine(m.Groups[1].Value);
}
Or as a single line using LINQ:
Regex.Matches(text, regex).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Groups[1].Value));

Extract multiple white spaces from string

I want to get the runs of white space that are more than 1 space long.
The following gets me the empty strings between each letter, and also the white spaces. However, I only want to extract the two-space string between c and d, and the three-space string between f and g.
string b = "ab c  def   gh";
List<string> c = Regex.Split(b, @"[^\s]").ToList();
UPDATE:
The following works, but I'm looking for a more elegant way of achieving this:
c.RemoveAll(x => x == "" || x == " ");
The desired result would be a List<string> containing "  " and "   "
If you want a List<String> as a result, you could execute this LINQ query:
string b = "ab c  def   gh";
List<String> c = Regex
    .Matches(b, @"\s{2,}")
    .OfType<Match>()
    .Select(match => match.Value)
    .ToList();
This should give you your desired List.
string b = "ab c  def   gh";
var regex = new Regex(@"\s\s+");
var result = new List<string>();
foreach (Match m in regex.Matches(b))
    result.Add(m.Value);
If all you are interested in are these groups of whitespaces, you could use
foreach(var match in Regex.Matches(b, @"\s\s+")) {
    // ... do something with match
}
This guarantees that you will match at least 2 whitespaces.
Rather than splitting using a Regex, try using Regex.Matches to get all items matching your pattern - in this case I've used a pattern to match two or more whitespace characters, which I think is what you want?
var matchValues = Regex.Matches("ab c  def   gh", "\\s\\s+")
    .OfType<Match>().Select(m => m.Value).ToList();
Annoyingly, the MatchCollection returned by Regex.Matches isn't IEnumerable<Match>, hence the need to use OfType<> in the LINQ expression.
You can use the following single line:
var list = Regex.Matches(value, @"[ ]{2,}").Cast<Match>().Select(match => match.Value).ToList();
Hope it will help you.
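The answers above use two slightly different patterns: \s{2,} (or \s\s+) matches runs of any whitespace, while [ ]{2,} matches runs of the space character only. The sketch below puts them side by side; the class WhitespaceRuns and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two patterns used in the answers.
public static class WhitespaceRuns
{
    // Runs of two or more whitespace characters of any kind (spaces, tabs, newlines).
    public static List<string> AnyWhitespace(string input)
    {
        return Runs(input, @"\s{2,}");
    }

    // Runs of two or more literal space characters only.
    public static List<string> SpacesOnly(string input)
    {
        return Runs(input, @"[ ]{2,}");
    }

    private static List<string> Runs(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

For an input that contains only spaces both methods return the same list; they differ only when tabs or line breaks appear in a run.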

How do I extract a string of text that lies between *>...* using .NET C# regex or anything else?

I have a string like this.
*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*
I want to extract the text between the *> and * characters.
I tried the pattern below, which is wrong:
string pattern = "\\*\\>..\\*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(seriGelen);
if (matches.Count > 0)
{
    foreach (Match match in matches)
        MessageBox.Show("{0}", match.Value);
}
You can use a simple regex:
(?<=\*>).*?(?=\*)
Sample code:
string text = "*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
string[] values = Regex.Matches(text, @"(?<=\*>).*?(?=\*)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
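If the extracted strings are meant to be used as numbers, the same lookaround pattern can feed a parser. This is a sketch under assumptions: the class DelimitedValues is hypothetical, and it assumes every value is a decimal number written with a dot, so it parses with the invariant culture.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper built on the lookaround pattern from the answer.
public static class DelimitedValues
{
    // Extracts the text between "*>" and the next "*" and parses it as a double.
    // double.Parse throws FormatException if a value is empty or not numeric.
    public static List<double> Parse(string text)
    {
        return Regex.Matches(text, @"(?<=\*>).*?(?=\*)")
            .Cast<Match>()
            .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

Using CultureInfo.InvariantCulture keeps the dot as the decimal separator regardless of the machine's regional settings.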
It looks like the values can vary a lot (UPD: there was a positive integer value), so I won't check the number format. I will also treat *>, > and * as variants of the same delimiter.
I'd like to suggest the following solution.
(?<=[>\*])([^>\*]+?)(?=[>\*]+)
(http://regex101.com/r/mM7nK1)
I'm not sure it is ideal. It only works if your input starts and ends with delimiters, but it lets you use matches instead of groups, as your code does.
========
But why not use the String.Split function?
var toprint = seriGelen.Split(new [] {'>', '*'}, StringSplitOptions.RemoveEmptyEntries);
Is there an error at the beginning of the string? Missing an asterisk after first number? >-0.0532>-0.0534*>
If not, try this.
>([-+]?[0-9]*\.?[0-9]+)\*
C# Code
string strRegex = @">([-+]?[0-9]*\.?[0-9]+)\*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
string strTargetString = @">-0.0532>-0.0534*>-0.0534*>-0.0532*>-0.0534*>-0.0534*>-0.0532*>-0.0532*>-0.0534*>-0.0534*>-0.0534*>-0.0532*>-0.0534*";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    if (myMatch.Success)
    {
        // Add your code here
    }
}

Regex to parse string input

I want to use a regex to parse it into groups:
string input = @"(1,2)(3,4)";
Regex.Matches(input, @"\((\d,\d)\)");
The results I get are not only 1,2 and 3,4 but also spaces. Can you help me?
EDIT:
I want to get 2 groups: 1,2 and 3,4.
string input = @"(1,2)(3,4)";
MatchCollection inputMatch = Regex.Matches(input, @"(?<=\().*?(?=\))");
For the current string you will get two matches:
inputMatch[0].Value; // 1,2
inputMatch[1].Value; // 3,4
Or you can use a foreach loop:
foreach (Match match in inputMatch)
{
    // use match.Value here
}
I have not tested this code.
My working example:
MatchCollection facilities = Regex.Matches(collegeRecord.ToString(), @"<td width=""38"">(.*?)image_tooltip");
foreach (Match facility in facilities)
{
    collegeDetailDH.InsertFacilityDetails(collegeDetailDH._CollegeID, facility.ToString().Replace("<td width=\"38\">", string.Empty).Replace("<span class=\"icon_", string.Empty).Replace("image_tooltip", string.Empty));
}
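How do you reach them? Try this:
Example:
MatchCollection matchs = Regex.Matches(input, @"\((\d,\d)\)");
foreach (Match m in matchs)
{
    // Groups[1] holds the text inside the parentheses; Captures[0] would be the whole match, e.g. "(1,2)".
    rtb1.Text += "\n\n" + m.Groups[1].Value;
}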
Try looking at this pattern:
(\((?:\d,\d)\))+
The + lets the group repeat, so it can occur one or more times.
You need to use lookarounds.
string input = @"(1,2)(3,4)";
foreach (Match match in Regex.Matches(input, @"(?<=\().*?(?=\))"))
    Console.WriteLine(match.Value);
If your string may have content other than digits in the brackets, and you need only the pairs with digits inside, you can use a more specific regex as follows.
string input = @"(1,2)(3,4)";
foreach (Match match in Regex.Matches(input, @"(?<=\()\d,\d(?=\))"))
    Console.WriteLine(match.Value);
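An alternative to lookarounds is to keep the question's original pattern and read capture group 1 instead of the whole match. The sketch below shows that; the class PairParser and the method Pairs are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper using the pattern from the question.
public static class PairParser
{
    // m.Value would include the parentheses; Groups[1] is only the "d,d" part.
    public static List<string> Pairs(string input)
    {
        return Regex.Matches(input, @"\((\d,\d)\)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For the input "(1,2)(3,4)" this should yield the two strings 1,2 and 3,4. Like the original pattern, it only accepts single digits on each side of the comma.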

Regex starting with a string

I want to filter the following string with a regular expression:
TEST^AB^^HOUSE-1234~STR2255
I want to get only the string "HOUSE-1234"; the match must always start with "TEST^AB^^" and end with "~".
Can you please tell me what the regex should look like?
You can use the \^\^(.*?)\~ pattern, which matches text that starts with ^^ and ends with ~:
string s = @"TEST^AB^^HOUSE-1234~STR2255";
Match match = Regex.Match(s, @"\^\^(.*?)\~", RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
Output will be:
HOUSE-1234
string input = "TEST^AB^^HOUSE-1234~STR2255";
var matches = Regex.Matches(input, @"TEST\^AB\^\^(.+?)~").Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
string pattern = @"\^\^(.*)\~";
Regex re = new Regex(pattern);
With the little information you've given us (and assuming that the TEST^AB isn't necessarily constant), this might work:
(?:\^\^).*(?:~)
Or, if TEST^AB is constant, you can throw it in too:
(?:TEST\^AB\^\^).*(?:~)
The important part is to remember that you need to escape the ^.
Don't even need the RegEx for something that well defined. If you want to simplify:
string[] splitString;
if (yourstring.StartsWith("TEST^AB^^"))
{
    yourstring = yourstring.Remove(0, 9);
    splitString = yourstring.Split('~');
    return splitString[0];
}
return null;
(TEST\^AB\^\^)((\w)+-(\w+))(\~.+)
There are three top-level groups:
(TEST\^AB\^\^) : matches your TEST^AB^^
((\w)+-(\w+)) : matches your HOUSE-1234
(\~.+) : matches the rest
You should do this without regex:
var str = "TEST^AB^^HOUSE-1234~STR2255";
var result = (str.StartsWith("TEST^AB^^") && str.IndexOf('~') > -1)
    ? new string(str.Skip(9).TakeWhile(c=>c!='~').ToArray())
    : null;
Console.WriteLine(result);
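For comparison with the non-regex version, here is a sketch of the regex route with the prefix anchored at the start of the string, as the question requires. The class PrefixExtractor and the method Extract are hypothetical; returning null for a non-matching input is an assumption carried over from the snippets above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the ^ at the start anchors the prefix to the beginning of the input.
public static class PrefixExtractor
{
    // Returns the text between "TEST^AB^^" and the first "~", or null if the input does not fit.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, @"^TEST\^AB\^\^(.*?)~");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

The unescaped leading ^ is the start-of-string anchor, while each \^ matches a literal caret; without the anchor the pattern would also match when the prefix appears in the middle of the string.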
