Splitting a String with conditions - c#

Given a string of:
"S1 =F A1 =T A2 =T F3 =F"
How can I split it so that the result is an array of four strings, where the individual strings look like this:
"S1=F"
"A1=T"
"A2=T"
"F3=F"
Thank you

You can match all Name = (T|F) conditions with a regular expression and then strip the white space from each match with the help of Linq:
using System.Linq;
using System.Text.RegularExpressions;
..
string source = "S1 \t = F A1 = T A2 = T F3 = F";
string[] result = Regex
.Matches(source, @"[A-Za-z][A-Za-z0-9]*\s*=\s*[TF]")
.OfType<Match>()
.Select(match => string.Concat(match.Value.Where(c => !char.IsWhiteSpace(c))))
.ToArray();
Console.WriteLine(string.Join(Environment.NewLine, result));
Outcome:
S1=F
A1=T
A2=T
F3=F
Edit: What's going on. The first part is the regular expression matching:
... Regex
.Matches(source, @"[A-Za-z][A-Za-z0-9]*\s*=\s*[TF]")
.OfType<Match>() ...
We are looking for fragments that match the pattern:
[A-Za-z] - Letter A..Z or a..z
[A-Za-z0-9]* - followed by zero or more letters or digits
\s* - zero or more white spaces (spaces, tabulations etc.)
= - a literal = sign
\s* - zero or more white spaces (spaces, tabulations etc.)
[TF] - either T or F
The second part is cleaning up each match: for each match found, e.g. S1 \t = F, we want to obtain the string "S1=F":
...
.Select(match => string.Concat(match.Value.Where(c => !char.IsWhiteSpace(c))))
.ToArray();
We use Linq here: for each character in the match we filter out all white spaces (take character c if and only if it's not a white space):
match.Value.Where(c => !char.IsWhiteSpace(c))
then combine (Concat) the filtered characters (IEnumerable<char>) of each match back into a string and collect these strings into an array (materialization):
.Select(match => string.Concat(...))
.ToArray();
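As a variation on the answer above (not the answer's own method), the same result can be sketched with named groups, so that no white space has to be filtered out afterwards. The group names name and value are made up for this illustration, and the input is the string from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = "S1 =F A1 =T A2 =T F3 =F";

// Capture the name and the T/F flag separately; the white space around '='
// is matched by \s* but never captured, so nothing needs stripping later.
string[] pairs = Regex
    .Matches(source, @"(?<name>[A-Za-z][A-Za-z0-9]*)\s*=\s*(?<value>[TF])")
    .Cast<Match>()
    .Select(m => m.Groups["name"].Value + "=" + m.Groups["value"].Value)
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, pairs));
```

This trades the character filter for two captures, which may be easier to extend if the name and the value are needed separately later on.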

Related

Regex split preserving strings and escape character

I need to split a string in C#, using space as the delimiter and preserving quoted substrings. This part is OK.
But additionally, I want to allow the escape sequence \" so that a quoted string can contain other quotes.
Example of what I need:
One Two "Three Four" "Five \"Six\""
To:
One
Two
Three Four
Five "Six"
This is the regex I am currently using; it works for all the cases except "Five \"Six\"":
//Split on spaces unless in quotes
List<string> matches = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(x => x.Value.Trim('"'))
.ToList();
I'm looking for any regex that would do the trick.
You can use
var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
// Console.WriteLine(input); // => One Two "Three Four" "Five \"Six\""
List<string> matches = Regex.Matches(input, @"(?s)""(?<r>[^""\\]*(?:\\.[^""\\]*)*)""|(?<r>\S+)")
.Cast<Match>()
.Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1"))
.ToList();
foreach (var s in matches)
Console.WriteLine(s);
See the C# demo.
The result is
One
Two
Three Four
Five "Six"
The (?s)"(?<r>[^"\\]*(?:\\.[^"\\]*)*)"|(?<r>\S+) regex matches
(?s) - a RegexOptions.Singleline equivalent to make . match newlines, too
"(?<r>[^"\\]*(?:\\.[^"\\]*)*)" - ", then Group "r" capturing any zero or more chars other than " and \ and then zero or more sequences of any escaped char and zero or more chars other than " and \, and then a " is matched
| - or
(?<r>\S+) - Group "r": one or more non-whitespace chars.
The .Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1")) takes the Group "r" value and unescapes (deletes the \ before) all escaped chars.
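The unescaping step can be looked at in isolation. The following is a minimal sketch, assuming the captured group value is already at hand; the variable names captured, unescaped and backslash are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

// Raw text of one quoted token after the surrounding quotes are dropped: Five \"Six\"
string captured = @"Five \""Six\""";

// \\(.) matches a backslash plus the char after it; $1 keeps only that char.
string unescaped = Regex.Replace(captured, @"\\(.)", "$1");

// An escaped backslash (two backslashes) collapses to a single backslash.
string backslash = Regex.Replace(@"a\\b", @"\\(.)", "$1");

Console.WriteLine(unescaped);
Console.WriteLine(backslash);
```

Because the backslash and the following char are consumed together, an escaped backslash is not mistaken for the start of another escape.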

Regex only letters except set of numbers

I'm using Replace(@"[^a-zA-Z]+", ""); to leave only letters, but I also have a set of digit sequences that I want to keep, e.g. 122456 and 112466. I'm having trouble keeping the digits only when they form one of these sequences.
Example input:
abc 1239 asm122456000
I want:
abcasm122456
I tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't use Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which matches (in a non-capturing group) either one or more alphabetic characters or a run of exactly six digits.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
using System.Text.RegularExpressions;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle
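Alternatively, use .Split() and .All() to drop the tokens that consist only of digits (note that this keeps the token asm122456000 whole, trailing zeros included):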
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note that the + quantifier is removed: [^a-zA-Z] also matches digits, so a quantified class could consume the digits of a sequence you want to keep.
You need to use $1 in the replacement:
var result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
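For reference, here is a self-contained sketch of that replacement applied to the input from the question. The variable text is assumed to hold that input; when group 1 does not take part in a match, $1 expands to an empty string, so the matched char is simply dropped:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "abc 1239 asm122456000";

// Group 1 keeps the whitelisted sequences; every other non-letter char
// is matched by the second alternative and replaced with an empty $1.
string result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");

Console.WriteLine(result); // abcasm122456
```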

Regex to match words between underscores after second occurrence of underscore

I would like to get the words between underscores after the second occurrence of an underscore.
This is my string:
ABC_BC_BE08_C1000004_0124
I've assembled this expression:
(?<=_)[^_]+
It matches what I need, but it only skips the first word, since there is no underscore before it. I would like it to skip ABC and BC and get just the last three strings. I've tried messing around, but I am stuck and can't make it work. Thanks!
You can use a non-regex approach here with Split and Skip:
var text = "ABC_BC_BE08_C1000004_0124";
var result = text.Split('_').Skip(2);
foreach (var s in result)
Console.WriteLine(s);
Output:
BE08
C1000004
0124
See the C# demo.
With regex, you can use
var result = Regex.Matches(text, @"(?<=^(?:[^_]*_){2,})[^_]+").Cast<Match>().Select(x => x.Value);
See the regex demo and the C# demo. The regex matches
(?<=^(?:[^_]*_){2,}) - a positive lookbehind that matches a location that matches the following patterns immediately to the left of the current location:
^ - start of string
(?:[^_]*_){2,} - two or more ({2,}) sequences of any zero or more chars other than _ ([^_]*) and then a _ char
[^_]+ - one or more chars other than _
In .NET there is also a Captures collection that you can use with a regex and a repeated capture group.
^[^_]*_[^_]*(?:_([^_]+))+
The pattern matches:
^ Start of string
[^_]*_[^_]* Match zero or more chars other than _, then match _ and again zero or more chars other than _
(?: Non capture group
_([^_]+) Match _ and capture 1 or more times any char except _ in group 1
)+ Close the non capture group and repeat 1 or more times
.NET regex demo | C# demo
For example:
var pattern = @"^[^_]*_[^_]*(?:_([^_]+))+";
var str = "ABC_BC_BE08_C1000004_0124";
var strings = Regex.Match(str, pattern).Groups[1].Captures.Select(c => c.Value);
foreach (String s in strings)
{
Console.WriteLine(s);
}
Output
BE08
C1000004
0124
If you want to match only word characters in between the underscores, another option for a pattern could be using a negated character class [^\W_] excluding the underscore from the word characters in between:
^[^\W_]*_[^\W_]*(?:_([^\W_]+))+
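A minimal sketch of this last pattern, run against the string from the question. The variable names are illustrative, and Cast<Capture>() is used only so that the snippet does not depend on a runtime where CaptureCollection is already generic:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string str = "ABC_BC_BE08_C1000004_0124";

// [^\W_] is "a word char that is not an underscore", i.e. a letter or a digit.
string wordPattern = @"^[^\W_]*_[^\W_]*(?:_([^\W_]+))+";

// Every repetition of group 1 is kept in the Captures collection.
string[] parts = Regex.Match(str, wordPattern)
    .Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, parts));
```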

Problem with brackets in regular expression in C#

Can anybody help me with a regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem is the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It only works for
{ab}
I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures one group that matches either:
[A-Fa-f0-9?]+ - one or more characters from the list (hex digits or ?), or
[^}]+ - one or more characters other than }
If you allow leading/trailing whitespace within the {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
See this regex demo
If you only allow a single regular space only between the values inside {...} and no space after { and before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
See this regex demo. Note this one is much stricter. Details of the first pattern:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd
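To see how the two patterns from this answer differ, here is a small sketch that checks both against a tightly formatted input and a padded one. The variable names and the padded sample string are made up for the illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Lenient: any amount of white space around each pair.
string lenient = @"{(?:\s*([A-Fa-f0-9?]{2}))+\s*}";
// Strict: exactly one space between pairs, none next to the braces.
string strict = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";

string tight = "{a? ab 12 ?? cd}";
string padded = "{ a?  ab 12 }";

bool lenientTight = Regex.IsMatch(tight, lenient);   // pairs separated by one space
bool strictTight = Regex.IsMatch(tight, strict);
bool lenientPadded = Regex.IsMatch(padded, lenient); // extra spaces are tolerated
bool strictPadded = Regex.IsMatch(padded, strict);   // space after { is rejected

Console.WriteLine($"{lenientTight} {strictTight} {lenientPadded} {strictPadded}");
```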
If you want to capture pairs of chars between the curly braces, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Regex demo | C# demo
Example code
string pattern = @"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = @"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab

How to extract text.text information using regular expressions?

I have the following sample string:
ptv.test foo bar cc.any more words
I want a regular expression which can extract the pattern text.text. For example, in the above string it should match ptv.test and cc.any
Thanks
You can use the following code:
string s = "ptv.test foo bar cc.any more words";
var matches = Regex.Matches(s, @"\w+\.\w+");
foreach(Match match in matches)
{
Console.WriteLine(match.Value);
}
Which outputs:
ptv.test
cc.any
\w+\.\w+
(one or more word characters, the period, one or more word characters)
[A-Za-z]+\.[A-Za-z]+
You need to escape the period because it is a regex special character that matches any character.
Your question is a vague one. The answer depends on what "text" actually means. Some possibilities are below:
[a-z]+\.[a-z]+ English lower case letters a..z
[A-Za-z]+\.[A-Za-z]+ English letters A..Z or a..z
\p{L}+\.\p{L}+ Any unicode letters
\w+\.\w+ Any word symbols (letters, digits and underscore)
...
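Another detail to consider is whether "text" should be preceded / followed by white space or string start/end. E.g. for the given
pt???v.test foo bar cc.an!!!y more words
should "v.test" or "cc.an" be considered matches? Adding \b before and after the pattern ensures that a match doesn't start or end in the middle of a word, e.g.:
\b[a-z]+\.[a-z]+\b
Note that \b alone still matches "v.test" and "cc.an" here, since ? and ! are not word characters; to require white space or string start/end around the match, use (?<!\S) and (?!\S) instead of \b.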
The implementation can be something like this:
string source = #"ptv.test foo bar cc.any more words";
string pattern = #"\b[a-z]+\.[a-z]+\b";
string[] matches = Regex
.Matches(source, pattern)
.Cast<Match>()
.Select(match => match.Value)
.ToArray(); // let's organize matches as an array
// ptv.test
// cc.any
Console.Write(String.Join(Environment.NewLine, matches));
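The difference between word boundaries and whitespace boundaries can be checked with a short sketch. The lookaround pattern below is a swapped-in alternative to the \b pattern above, not part of the original implementation, and the helper name Find is made up for the illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string noisy = "pt???v.test foo bar cc.an!!!y more words";

// Local helper: collect all match values for a pattern.
Func<string, string[]> Find = pattern => Regex
    .Matches(noisy, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// \b only needs a word/non-word transition, so '?' and '!' count as boundaries.
string[] wordBounded = Find(@"\b[a-z]+\.[a-z]+\b");

// (?<!\S) and (?!\S) need white space or string start/end on each side.
string[] spaceBounded = Find(@"(?<!\S)[a-z]+\.[a-z]+(?!\S)");

Console.WriteLine(string.Join(", ", wordBounded)); // v.test, cc.an
Console.WriteLine(spaceBounded.Length);            // 0
```

On the clean input from the question both patterns return ptv.test and cc.any, so the choice only matters when punctuation can touch the tokens.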
