Regex to remove a specific string if it exists - C#

I want to remove the -L from the end of my string if it exists.
So
ABCD => ABCD
ABCD-L => ABCD
At the moment I'm using something like the line below, which uses an if/else construct in my Regex. However, I have a feeling that it should be much easier than this.
var match = Regex.Match("...", @"(?(\S+-L$)\S+(?=-L)|\S+)");

How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex(@"-L(\b|$)"); // verbatim string: in a regular literal "\b" is a backspace
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous. See the .NET regular expression documentation for the list of characters that are considered word characters (\w) in C#.
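To make the difference between the two patterns concrete, here is a minimal sketch using the sample input from this answer. The class name SuffixDemo is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixDemo
{
    public static void Main()
    {
        string input = "ABCD-L ABCD ABCD-L";

        // Only the -L at the very end of the string is removed.
        Console.WriteLine(Regex.Replace(input, "-L$", ""));        // ABCD-L ABCD ABCD

        // The verbatim string (@"...") keeps \b as a regex word boundary,
        // so every -L that ends a word is removed.
        Console.WriteLine(Regex.Replace(input, @"-L(\b|$)", ""));  // ABCD ABCD ABCD
    }
}
```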

You can also use the String.Replace() method to find a specific string inside a string and replace it with another string, in this case an empty string. Note that it replaces every occurrence, not just one at the end of the string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx

Use the Regex.Replace function:
Regex.Replace(input, @"(\S+?)-L(?=\s|$)", "$1")
Explanation:
(          group and capture to \1:
  \S+?     non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)          end of \1
-L         '-L'
(?=        look ahead to see if there is:
  \s       whitespace (\n, \r, \t, \f, and " ")
  |        OR
  $        before an optional \n, and the end of the string
)          end of look-ahead

You certainly can use Regex for this, but why when using normal string functions is clearer?
Compare this:
text = text.EndsWith("-L")
    ? text.Substring(0, text.Length - "-L".Length)
    : text;
to this:
text = Regex.Replace(text, @"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
// Must be declared inside a static class to work as an extension method.
public static string RemoveIfEndsWith(this string text, string suffix)
{
    return text.EndsWith(suffix)
        ? text.Substring(0, text.Length - suffix.Length)
        : text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.
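As a rough sketch of that last point, the extension method could wrap the Regex like this. The names StringSuffixExtensions and RemoveSuffixRegex are hypothetical. Using \z instead of $ is my own choice here, so that only the absolute end of the input matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringSuffixExtensions
{
    // Hypothetical Regex-based counterpart of RemoveIfEndsWith.
    public static string RemoveSuffixRegex(this string text, string suffix)
    {
        // Regex.Escape treats the suffix as literal text;
        // \z anchors the match at the very end of the input.
        return Regex.Replace(text, Regex.Escape(suffix) + @"\z", "");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("ABCD-L".RemoveSuffixRegex("-L")); // ABCD
        Console.WriteLine("ABCD".RemoveSuffixRegex("-L"));   // ABCD
    }
}
```

The calling code stays the same shape as text.RemoveIfEndsWith("-L"), so the implementation can be swapped without touching callers.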

Related

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I get from my first Console.WriteLine is correct, so I get things like >[123 sesame St]. But my second output shows that the replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something? Is it perhaps implicit in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question, I realize (if I understand it correctly) that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[        # literals: >[
(          # open capture group 1
  [^]]*    # everything that is not a ]
)          # close capture group 1
]          # literal ]
The replacement string refers to capture group 1 with $1.
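Here is a small sketch of that single replacement applied to a sample string. The input text and the class name are made up for illustration, and the pattern is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    public static void Main()
    {
        // Hypothetical input containing two >[...] placeholders.
        string text = "Address: >[123 sesame St] and >[Main St]";

        // $1 inserts whatever capture group 1 matched, so the spaces
        // inside the brackets are preserved.
        string result = Regex.Replace(text, @">\[([^]]*)]", "$1");

        Console.WriteLine(result); // Address: 123 sesame St and Main St
    }
}
```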
By defining [>, \[, \]] in pattern2 you define a character class consisting of the single characters >, comma, space, [ and ]. But I guess you don't want to match the space and the comma, so leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.

Replacing special chars in a string with a single unique char

I have a string like so:
string inputStr = "Name*&^%LastName*##";
The following Regex will replace all the special chars with a '-'
Regex rgx = new Regex("[^a-zA-Z0-9 - _]");
someStr = rgx.Replace(someStr, "-");
That produces an output something like:
Name---LastName---
How do I replace '---' with a single '-' so the output looks like this:
Name-LastName
So the question is how do I replace all the special chars with a single '-'?
Regards.
Try this
Regex rgx = new Regex(@"[^a-zA-Z0-9 \- _]+"); // note: the - character is escaped (verbatim string, so \- reaches the regex)
or
Regex rgx = new Regex("[^a-zA-Z0-9 _-]+");//or use - as last character
But this will give Name-LastName-. Is that okay?
If you don't want the - in the last position, you can use the following code as well. Credit goes to @MatthewStrawbridge; see the comments.
string someStr = rgx.Replace(inputStr, "-").TrimEnd('-');
will output Name-LastName.
Edit: As @pguardiario pointed out in the comments, I updated my answer to escape the -, since it has a special meaning (range) inside a character class ([]). To use - as a literal, either escape it or make it the first or last character of the character class.
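Putting the pieces of this answer together, here is a minimal runnable sketch using the input string from the question. The class name is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseDemo
{
    public static void Main()
    {
        string inputStr = "Name*&^%LastName*##";

        // The + collapses each run of disallowed characters into one match;
        // the - is last in the class, so it is a literal rather than a range.
        Regex rgx = new Regex("[^a-zA-Z0-9 _-]+");

        string collapsed = rgx.Replace(inputStr, "-");
        Console.WriteLine(collapsed);              // Name-LastName-

        // TrimEnd drops the trailing separator.
        Console.WriteLine(collapsed.TrimEnd('-')); // Name-LastName
    }
}
```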

Trim string by strings

How can I trim a string by a whole string instead of a list of single characters?
I want to remove all &nbsp; and whitespace at the beginning and end of an HTML string. But String.Trim() only has overloads that take a set of characters, not a set of strings.
You could use HttpUtility.HtmlDecode(String) and use the resultant as an input for String.Trim()
HttpUtility.HtmlDecode on MSDN
HttpServerUtility.HtmlDecode on MSDN (a wrapper you can access through the Page.Server property)
string stringWithNonBreakingSpaces = "&nbsp; some text &nbsp;"; // sample input
string trimmedString = HttpUtility.HtmlDecode(stringWithNonBreakingSpaces).Trim(); // Trim() is an instance method
Note: This solution would decode all the HTML strings in the input.
The Trim method removes from the current string all leading and trailing white-space characters by default.
Edit: Solution for your problem AFTER your edit:
string input = @"&nbsp; <a href='#'>link</a> &nbsp;";
Regex regex = new Regex(@"^(&nbsp;|\s)*|(&nbsp;|\s)*$");
string result = regex.Replace(input, String.Empty);
This will remove all leading and trailing whitespace and &nbsp;. You can add any string or character group to the expression. For example, to list tabs explicitly (although \s already matches them), the regex would become:
Regex regex = new Regex(@"^(&nbsp;|\s|\t)*|(&nbsp;|\s|\t)*$");
Not sure if this is what you're looking for?
string str = "hello&nbsp;";
str = str.Replace("&nbsp;", ""); // strings are immutable, so assign the result
str = str.Trim();
Use RegEx, as David Heffernan said. It is rather easy to select all spaces and &nbsp; at the start of the string: ^(\ |&nbsp;)*

Using Regex to match quoted string with embedded, non-escaped quotes

I am trying to match a string in the following pattern with a regex.
string text = "'Emma','The Last Leaf','Gulliver's travels'";
string pattern = @"'(.*?)',?";
foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
{
    Console.WriteLine(match + " " + match.Index);
    Console.WriteLine(match.Groups[1].Captures[0]);
}
This matches "Emma" and "The Last Leaf" correctly, but the third match is "Gulliver". The desired match is "Gulliver's travels". How can I build a regex for a pattern like this?
Since , is your delimiter, you can try changing your pattern like this. It should work.
string pattern = @"'(.*?)'(?:,|$)";
The way this works is, it looks for a single quote followed by a comma or end of the line.
I think this can work '(.*?)',|'(.*)' as regular expression.
You may consider using lookbehind/lookahead:
"(?<=^'|',').*?(?='$|',')"
test with grep:
kent$ echo "'Emma','The Last Leaf','Gulliver's travels'"|grep -Po "(?<=^'|',').*?(?='$|',')"
Emma
The Last Leaf
Gulliver's travels
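The same lookaround pattern should carry over to C#, since .NET supports lookbehind. Here is a minimal sketch using the string from the question. The class name is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedListDemo
{
    public static void Main()
    {
        string text = "'Emma','The Last Leaf','Gulliver's travels'";

        // Lookbehind: the item starts after the opening ' or after ',' .
        // Lookahead: the item ends before the closing ' or before ',' .
        string pattern = @"(?<=^'|',').*?(?='$|',')";

        foreach (Match match in Regex.Matches(text, pattern))
        {
            Console.WriteLine(match.Value);
        }
        // Prints:
        // Emma
        // The Last Leaf
        // Gulliver's travels
    }
}
```

Like the grep version, this only works as long as no item itself contains the sequence ',' .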
You can't. If you have single-quote-delimited strings and Gulliver's contains a single, unescaped quote, there's no way to distinguish it from the end of a string. You could always just split on commas and trim the 's from either side, but I'm not sure that's what you want:
string text = "'Emma','The Last Leaf','Gulliver's travels'";
foreach (string s in text.Split(','))
{
    Console.WriteLine(s.Trim('\''));
}

problem in regular expression

I am having a regular expression
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)", RegexOptions.IgnoreCase);
and having a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This works fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it fails to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
Your negative lookahead (?!.*[DFIOQU]) asserts that none of the characters D, F, I, O, Q, U appears anywhere after that position.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from the negative lookahead, it checks only the directly following character and not the rest of the string (is this what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
Then it works (tested on Regexr). It now checks that the character directly after the digit is not one of the characters in the class; I don't know if this is intended.
By the way, I removed the | from your first character class, since it is not needed, and also the unneeded groups around your whitespace.
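To check the modified pattern against both test strings from the question, something like the following sketch could be used. The class name is only for illustration, and the pattern is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostalCodeDemo
{
    public static void Main()
    {
        // Lookahead without .* : only the character right after the digit is checked.
        string pattern = @"\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*";

        string[] tests =
        {
            "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ",
            "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON"
        };

        foreach (string test in tests)
        {
            Match m = Regex.Match(test, pattern, RegexOptions.IgnoreCase);
            // Group 1 holds the code without the surrounding whitespace.
            Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
        }
        // Prints "C1C 1C1" for both inputs.
    }
}
```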
As I understand it, you need to find the text C1C 1C1 in your string.
I used this regex to do it:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
After that you can extract the text from the named groups:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);
