Remove part of string between 2 brackets containing a specific word

I need to make a function called RemoveError that checks whether a string contains the word "Error" inside 2 brackets together with other text. If so, I need to remove the 2 brackets surrounding "Error" and everything inside them.
Example:
var Result = RemoveError("Lorem Ipsum (Status: Hello) (Error: 14) (Comment: Some text)");
Result will return:
"Lorem Ipsum (Status: Hello) (Comment: Some text)"
Hope someone can help

You could try this Regex pattern:
public string RemoveError(string input) {
    return Regex.Replace(input, @"\(Error\:\s[0-9]{1,3}\)\s", "");
}
I am assuming that your error code is numeric and between 1 and 3 digits long. If that is not the case, you need to adapt that part of the expression. I am additionally removing one extra whitespace after the error part, because otherwise you would end up with 2 whitespaces in between.
\( - match an opening parenthesis
Error - match the word Error
\: - match the colon
\s - match a whitespace
[0-9]{1,3} - match 1 to 3 characters in the range from 0-9
\) - match a closing parenthesis
\s - match a whitespace
Output:
Lorem Ipsum (Status: Hello) (Comment: Some text)
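If the text after "Error" is not always a one-to-three-digit number, the same idea can be loosened. The following is only a minimal sketch, not the answerer's code: the class name ErrorStripper and the pattern are assumptions made for illustration. It removes any parenthesised group that starts with the whole word Error, together with the whitespace in front of it, and it assumes the groups are never nested.

```csharp
using System.Text.RegularExpressions;

public static class ErrorStripper
{
    // \s*        optional whitespace before the group (avoids a double space)
    // \(Error\b  an opening parenthesis followed by the whole word "Error"
    // [^()]*     anything up to the closing parenthesis (no nesting assumed)
    // \)         the closing parenthesis
    private static readonly Regex ErrorGroup =
        new Regex(@"\s*\(Error\b[^()]*\)");

    public static string RemoveError(string input)
    {
        return ErrorGroup.Replace(input, "");
    }
}
```

Calling ErrorStripper.RemoveError on the string from the question gives "Lorem Ipsum (Status: Hello) (Comment: Some text)". A string without an Error group comes back unchanged. If the Error group can be the very first thing in the string, a leading space may remain, so a .Trim() on the result might be wanted.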

Related

how to find a word in one sentence using Regex

I have this sample data:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Re: Krishna P Mohan (31231231 / NA0031212301)
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
This is what I expect and currently get:
expected op - Krishna P Mohan
output - Krishna P Mohan (31231231 / NA0031212301)
I need to find the name that comes after Re: and runs up to the (. I'm getting the rest of the line instead of only the name up to where the bracket starts.
Code:
var regex = new Regex(@"[\n\r].*Re:\s*([^\n\r]*)");
var fullNameText = regex.Match(extractedDocContent).Value;

If you want a match only, you can use a lookbehind assertion:
(?<=\bRe:\s*)[^\s()]+(?:[^\n\r()]*[^\s()])?
Explanation
(?<=\bRe:\s*) Positive lookbehind, assert the word Re: followed by optional whitespace chars to the left
[^\s()]+ Match 1 or more non whitespace chars except for ( and )
(?: Non capture group
[^\n\r()]* Optionally repeat matching any char except newlines and ( or )
[^\s()] Match a non whitespace character except for ( and )
)? Close the non capture group
If you want the capture group value, and you are matching only word characters:
\bRe:\s*([^\n\r(]+)\b
Else you can use:
\bRe:\s*([^\s()]+(?:[^\n\r()]*[^\s()])?)
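As a rough sketch of how the capture-group variant might be used from C#: Match.Value holds the entire match (including "Re:"), while Groups[1].Value holds only the captured name. The class and method names below (NameExtractor, ExtractName) are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Group 1 captures the text after "Re:" up to, but not including,
    // the first parenthesis, without trailing whitespace.
    private static readonly Regex ReLine =
        new Regex(@"\bRe:\s*([^\s()]+(?:[^\n\r()]*[^\s()])?)");

    // Returns the captured name, or an empty string when there is no "Re:" part.
    public static string ExtractName(string text)
    {
        Match m = ReLine.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the sample data in the question, NameExtractor.ExtractName(extractedDocContent) yields Krishna P Mohan.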

How can I match four number fields or a text string using regex

So a sensor I'm interfacing to either outputs 4 multi-digit integers (separated by spaces) or an error string.
Ideally my regex would return a match for either of the above scenarios and reject any other outputs, e.g. if only 3 numbers are output. I can then check if there are 4 groups (number output) or 1 group (error string output) in the following C#.
The regex I have matches all the time and returns spaces when there are fewer than 4 numbers, so I still need to check everything.
I've tried putting in ?: but the format breaks. Any regex whizzes up to the challenge? Thanks in advance.
([0-9]+\s)([0-9]+\s)([0-9]+\s)([0-9]+)|([a-zA-Z\s_!]+)
So a valid numeric example would be 11 222 33 4444, and a valid error string would be Sensor is in an error state! An incorrect output would be 222 11 3333, as it only has 3 fields.
Also - I need to capture the four numbers (but not the spaces) or the error string.

You can capture either the 4 groups with only digits and match the whitespace chars outside of the group.
Or else match 1+ times any of the listed characters in the character class. Note that \s can also match a newline, and as the \s is in the character class the match can also consist of only spaces for example.
To match the whole string, you can add anchors.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|[a-zA-Z\s_!]+)$
Another option to match the error string is to start by matching word characters without digits, optionally followed by repetitions of whitespace chars and more word characters without digits, with an optional ! at the end.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|[^\W\d]+(?:\s+[^\W\d]+)*!?)$
If there can be a digit in the error message, but you don't want to match only digits or whitespace chars, you can exclude that using a negative lookahead.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|(?![\d\s]*$).+)$
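On the C# side, telling the two valid cases apart comes down to checking whether group 1 took part in the match. The sketch below uses the first anchored pattern; the names SensorParser, Classify and ReadNumbers are hypothetical, and it assumes every number fits into an int.

```csharp
using System.Text.RegularExpressions;

public static class SensorParser
{
    private static readonly Regex Output = new Regex(
        @"^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|[a-zA-Z\s_!]+)$");

    // Returns "numbers", "error" or "invalid" for one line of sensor output.
    public static string Classify(string line)
    {
        Match m = Output.Match(line);
        if (!m.Success) return "invalid";
        // Group 1 only participates when the four-number alternative matched.
        return m.Groups[1].Success ? "numbers" : "error";
    }

    // Returns the four fields, or an empty array if the line is not numeric output.
    public static int[] ReadNumbers(string line)
    {
        Match m = Output.Match(line);
        if (!m.Success || !m.Groups[1].Success) return new int[0];
        var values = new int[4];
        for (int i = 0; i < 4; i++)
        {
            // Assumption: each field is small enough for int.Parse.
            values[i] = int.Parse(m.Groups[i + 1].Value);
        }
        return values;
    }
}
```

With this, 11 222 33 4444 is classified as numbers, Sensor is in an error state! as error, and 222 11 3333 as invalid.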

Regex replace all non-numeric characters except for certain character patterns

I am having a hard time trying to figure out this regex pattern. I want to replace all non-numeric characters in a string except for certain alpha character patterns.
For example, I am trying:
string str = "The sky is blue 323.05 lnp days of the year";
str = Regex.Replace(str, "(^blue|lnp|days)[^.0-9]", "", RegexOptions.IgnoreCase);
I would like it to return:
"blue 323.05 lnp days"
but I can't figure out how to get it to match the entire character pattern in the expression.

I'd suggest capturing what you need to keep and just matching what you need to remove:
var result = Regex.Replace(text, @"(\s*\b(?:blue|lnp|days)\b\s*)|[^.0-9]", "$1").Trim();
Note that any leading/trailing spaces left over will be removed by .Trim().
The regex means:
(\s*\b(?:blue|lnp|days)\b\s*) - Group 1 ($1):
\s* - 0+ whitespaces
\b(?:blue|lnp|days)\b - one of the three words as whole words
\s* - 0+ whitespaces
| - or
[^.0-9] - any char but . and ASCII digit.
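The "capture what you keep, match what you drop" idea can be made more explicit with a MatchEvaluator instead of the "$1" replacement string. This is just an equivalent sketch under the same pattern; the class name KeepWordsAndNumbers and the method name Clean are invented for the example, and RegexOptions.IgnoreCase is carried over from the question.

```csharp
using System.Text.RegularExpressions;

public static class KeepWordsAndNumbers
{
    private static readonly Regex Pattern = new Regex(
        @"(\s*\b(?:blue|lnp|days)\b\s*)|[^.0-9]",
        RegexOptions.IgnoreCase);

    public static string Clean(string text)
    {
        // If group 1 matched, the match is a keyword (plus surrounding
        // whitespace) and is put back; otherwise it is a single unwanted
        // character and is replaced with nothing.
        string cleaned = Pattern.Replace(
            text,
            m => m.Groups[1].Success ? m.Groups[1].Value : string.Empty);
        return cleaned.Trim();
    }
}
```

For "The sky is blue 323.05 lnp days of the year" this returns "blue 323.05 lnp days". Digits and dots are never matched by either alternative, so they pass through untouched.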

Regex that removes the 2 trailing letters from a string not preceded with other letters

This is in C#. I've been racking my brain but no luck so far.
So for example
123456BVC --> 123456BVC (keep the same)
123456BV --> 123456 (remove trailing letters)
12345V --> 12345V (keep the same)
12345 --> 12345 (keep the same)
ABC123AB --> ABC123 (remove trailing letters)
It can start with anything.
I've tried @".*[a-zA-Z]{2}$" but no luck
This is in C# so that I always return a string removing the two trailing letters if they do exist and are not preceded with another letter.
Match result = Regex.Match(mystring, pattern);
return result.Value;

Your @".*[a-zA-Z]{2}$" regex matches any 0+ characters other than a newline (as many as possible) and 2 ASCII letters at the end of the string. You do not check the context, so the 2 letters are matched regardless of what comes before them.
You need a regex that will match the last two letters not preceded with a letter:
(?<!\p{L})\p{L}{2}$
Details:
(?<!\p{L}) - fails the match if a letter (\p{L}) is found before the current position (you may use [a-zA-Z] if you only want to deal with ASCII letters)
\p{L}{2} - 2 letters
$ - end of string.
In C#, use
var result = Regex.Replace(mystring, @"(?<!\p{L})\p{L}{2}$", string.Empty);
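To check the lookbehind pattern against the inputs listed in the question, it can be wrapped in a small helper. The names TrailingLetters and Strip are placeholders chosen for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class TrailingLetters
{
    // Removes exactly two letters at the end of the string, but only when
    // the character in front of them is not itself a letter.
    public static string Strip(string value)
    {
        return Regex.Replace(value, @"(?<!\p{L})\p{L}{2}$", string.Empty);
    }
}
```

Strip("123456BV") gives 123456 and Strip("ABC123AB") gives ABC123, while 123456BVC, 12345V and 12345 come back unchanged: in 123456BVC the only two-letter run touching the end is VC, and it is preceded by the letter B.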

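If you're looking to remove the last two letters regardless of what precedes them, you can simply do this:
string result = Regex.Replace(originalString, @"[A-Za-z]{2}$", string.Empty);
Note that this also turns 123456BVC into 123456B, which the question wants to keep unchanged. Remember that in .NET regex $ matches at the end of the input or before a final trailing newline.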

Regex match if a string has length 2 and contains 1 letter and 1 number

Guys I hate Regex and I suck at writing.
I have a string that is space separated and contains several codes that I need to pull out. Each code is marked by beginning with a capital letter and ending with a number. Each code is only two characters long.
I'm trying to create an array of strings from the initial string and I can't get the regular expression right.
Here is what I have
String[] test = Regex.Split(originalText, "([a-zA-Z0-9]{2})");
I also tried:
String[] test = Regex.Split(originalText, "([A-Z]{1}[0-9]{1})");
I don't have any experience with Regex as I try to avoid writing them whenever possible.
Anyone have any suggestions?
Example input:
AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80
I need to pull out F7, A4, B7. E0 should be ignored.

You want to collect the results, not split on them, right?
Regex regexObj = new Regex(@"\b[A-Z][0-9]\b");
MatchCollection allMatchResults = regexObj.Matches(subjectString);
should do this. The \bs are word boundaries, making sure that only entire strings (like A1) are extracted, not substrings (like the A1 in TWA101).
If you also need to exclude "words" with non-word characters in them (like E0.M80 in your comment), you need to define your own word boundary, for example:
Regex regexObj = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");
Now A1 only matches when surrounded by whitespace (or start/end-of-string positions).
Explanation:
(?<= # Assert that we can match the following before the current position:
^ # Start of string
| # or
\s # whitespace.
)
[A-Z] # Match an uppercase ASCII letter
[0-9] # Match an ASCII digit
(?= # Assert that we can match the following after the current position:
\s # Whitespace
| # or
$ # end of string.
)
If you also need to find non-ASCII letters/digits, you can use
\p{Lu}\p{N}
instead of [A-Z][0-9]. This finds all uppercase Unicode letters and Unicode digits (like Ä٣), but I guess that's not really what you're after, is it?
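Put together, collecting the whitespace-delimited codes into a list could look roughly like this. CodeFinder and FindCodes are illustrative names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeFinder
{
    // A capital ASCII letter followed by an ASCII digit, surrounded by
    // whitespace or the start/end of the string.
    private static readonly Regex Code =
        new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");

    public static List<string> FindCodes(string text)
    {
        var codes = new List<string>();
        foreach (Match m in Code.Matches(text))
        {
            codes.Add(m.Value);
        }
        return codes;
    }
}
```

For the example input AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80 this finds F7, A4, Y7 and B7. AA2410 and E0.M80 are skipped because the two-character candidates inside them are not delimited by whitespace on both sides.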

Do you mean that each code looks like "A00"?
Then this is the regex:
"[A-Z][0-9][0-9]"
Very simple... By the way, there's no point writing {1} in a regex. [0-9]{1} means "match exactly one digit", which is exactly like writing [0-9].
Don't give up, simple regexes make perfect sense.

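This should be OK:
string[] all_codes = Regex.Matches(originalText, @"\b[A-Z]\d\b").Cast<Match>().Select(m => m.Value).ToArray();
It gives you an array with all codes consisting of a capital letter followed by a digit and delimited by some kind of word boundary (whitespace etc.). Note that Regex.Split with the same pattern would return the text between the codes rather than the codes themselves, and that Cast and Select require System.Linq.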
